I am using an OMAP3515 (ARM Cortex-A8) with the I-cache, D-cache, branch prediction and MMU enabled.
I get a data abort if I try to copy a 600 KB frame buffer from one external memory region to another. After the data abort, the SDR (i.e. the SDRAM) is no longer accessible.
I have set up the MMU so that PA = VA.
There is no issue if I copy a smaller amount of data.
Also, if I disable the D-cache there is no abort and it works fine, but I would like to keep the D-cache enabled for faster access.
Thanks and regards,
Gopu
Hello Gopu,
How did you solve the problem where internal RAM access was slower than external RAM when the MMU was enabled?
In the successful case, how much data did you transfer?
Is it less than the D-cache size?
By the way, which board are you using with the OMAP3515?
What is the LPDDR part?
Best regards,
Yasuhiko Koumoto.
Hello,
Thanks for the reply. Here are the details.
Internal RAM slow:
This was due to the C and B bit settings in the MMU translation table. By mistake I had not set the cacheable and bufferable bits in the table entries. After setting these bits, the internal RAM started working fine.
The OMAP3515 internal RAM is 64 KB. I tested a NEONCopyPLD of 47 KB from one region to another and it worked fine, i.e. src = 0x40204000, dest = 0x40204400, size = 1024*47.
External RAM and abort:
This issue still exists. I want to copy 600 KB from one region of SDRAM to another, but it aborts if the D-cache is enabled.
I tried 300 KB: it works fine sometimes, but most of the time it ends up in the prefetch/data abort handler. These are the status register values after the abort:
Data Fault Status Register is 0
Instruction Fault Status Register is 0x1008
Instruction Fault Address Register is 0x80437314 (this is my code region)
Data Auxiliary Fault Status Register is 0
On the board we have single-data-rate LPSDR SDRAM connected at CS0, i.e. starting at 0x80000000.
I am using only the first-level translation table in the MMU, and I have not enabled the L2 cache.
Here is my code:
.arm
.sect EntryOnReset
;// Module Imports and exports
.global ResetHandler
.global RelocateImage
.global OasysEntry
.global BoardInit
.global __stack
.global __STACK_SIZE
.global NEONCopyPLD
.global EnableCaches
.global DisableCaches
.global MemmoryTest
.global core_init
;// Stack Size Definition.
UND_Stack_Size .EQU 0x00000100
SVC_Stack_Size .EQU 0x00001000 ;//Only main function stack is to be mentioned here.
ABT_Stack_Size .EQU 0x00000100
FIQ_Stack_Size .EQU 0x00000100
IRQ_Stack_Size .EQU 0x00000100
USR_Stack_Size .EQU 0x00000100
;//******************************************************************************
;// EQUATES
;// Standard definitions of Mode bits and Interrupt (I & F) flags in PSRs
Mode_USR .EQU 0x10
Mode_FIQ .EQU 0x11
Mode_IRQ .EQU 0x12
Mode_SVC .EQU 0x13
Mode_ABT .EQU 0x17
Mode_UND .EQU 0x1B
Mode_SYS .EQU 0x1F
I_Bit .EQU 0x80 ;// when I bit is set, IRQ is disabled
F_Bit .EQU 0x40 ;// when F bit is set, FIQ is disabled
FIQ_IRQ_DISABLE .EQU 0xC0 ;//Disable both FIQ and IRQ.
;// CODE GENERATION DIRECTIVES
;// Area Definition and Entry Point
;// Startup Code must be linked first at Address at which it expects to run.
;//------------------------------------------------------------------------------
;// Routine name : ENTRY
;// Description : Entry point for software.
;// Assumptions : <none>
;// Tainted registers : <none>
;// Functions called :
;// Low level
;// Requirements :
;//
;// 1. This routine shall initialize the Stack pointer.
;// 2. This routine shall perform architecture specific
;// initializations by calling InitCPU routine.
ResetHandler:
;==================================================================
; Enable access to NEON/VFP by enabling access to coprocessors 10 and 11.
; Enables full access, i.e. in both privileged and non-privileged modes.
MRC p15, #0, r0, c1, c0, #2 ; Read Coprocessor Access Control Register (CPACR)
ORR r0, r0, #(0xF << 20) ; Enable access to CP 10 & 11
MCR p15, #0, r0, c1, c0, #2 ; Write Coprocessor Access Control Register (CPACR)
ISB
; Switch on the VFP and NEON hardware
;=================================================================
MOV r0, #0x40000000
VMSR FPEXC, r0 ; Write FPEXC register, EN bit set
;@ Disable MMU.
MRC p15, #0, r1, c1, c0, #0 ;@ Read Control Register configuration data.
BIC r1, r1, #0x1
MCR p15, #0, r1, c1, c0, #0 ;@ Write Control Register configuration data.
;@ Disable L1 Caches.
BIC r1, r1, #(0x1 << 12) ;@ Disable I Cache.
BIC r1, r1, #(0x1 << 2) ;@ Disable D Cache.
MCR p15, #0, r1, c1, c0, #0 ;@ Write Control Register configuration data
;@ Invalidate L1 Caches.
;@ Invalidate Instruction cache.
MOV r1, #0
MCR p15, #0, r1, c7, c5, #0
;@ Invalidate Data cache.
;@ To make the code general purpose, calculate the
;@ cache size first and loop through each set + way.
MOV r0, #0
MCR p15, #2, r0, c0, c0, #0 ;@ Write CSSELR: select the L1 data cache.
ISB ;@ Make the CSSELR write visible before reading CCSIDR.
MRC p15, #1, r0, c0, c0, #0 ;@ Read Cache Size ID (CCSIDR).
AND r4, r0, #0x7 ;@ r4 = LineSize field = log2(words per line) - 2.
ADD r4, r4, #4 ;@ r4 = log2(bytes per line): 6 for 64-byte lines.
MOV r3, #0x1FF
AND r0, r3, r0, LSR #13 ;@ r0 = no. of sets - 1.
MOV r1, #0 ;@ r1 = way counter way_loop.
way_loop:
MOV r3, #0 ;@ r3 = set counter set_loop.
set_loop:
MOV r2, r1, LSL #30 ;@ Way number in bits [31:30] (4-way cache).
ORR r2, r2, r3, LSL r4 ;@ Set number starts at bit log2(line length).
MCR p15, #0, r2, c7, c6, #2 ;@ Invalidate the line described by r2.
ADD r3, r3, #1 ;@ Increment set counter.
CMP r0, r3 ;@ Last set (r0) done yet?
BGE set_loop ;@ If not, iterate set_loop,
ADD r1, r1, #1 ;@ else, next.
CMP r1, #4 ;@ Last way reached yet?
BNE way_loop ;@ if not, iterate way_loop.
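;@ Invalidate TLB
MCR p15, #0, r1, c8, c7, #0 ;@ TLBIALL: invalidate the entire unified TLB (the value in r1 is ignored).
;@ Branch Prediction Enable.
MRC p15, #0, r1, c1, c0, #0 ;@ Read System Control Register.
ORR r1, r1, #(0x1 << 11) ;@ Global BP Enable bit (Z).
MCR p15, #0, r1, c1, c0, #0 ;@ Write System Control Register.
;@ Enable D-side Prefetch
;@ ACTLR bit 2 is the D-side prefetch enable on the Cortex-A9; check the
;@ Cortex-A8 TRM bit assignments before relying on it on the OMAP3515.
MRC p15, #0, r1, c1, c0, #1 ;@ Read Auxiliary Control Register.
ORR r1, r1, #(0x1 << 2) ;@ Enable D-side prefetch.
MCR p15, #0, r1, c1, c0, #1 ;@ Write Auxiliary Control Register.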
DSB
ISB
;@ DSB causes completion of all cache maintenance operations appearing in program
;@ order before the DSB instruction.
;@ An ISB instruction causes the effect of all branch predictor maintenance
;@ operations before the ISB instruction to be visible to all instructions
;@ after the ISB instruction.
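;@ Initialize PageTable.
;@ It would be faster to create this in a read-only section in an assembly file.
;@ Each of the 4096 entries is a flat-mapped (VA = PA) 1 MB section descriptor:
;@ 0x0E = section type (bits [1:0] = 0b10) with B (bit 2) and C (bit 3) set,
;@ AP = 0b11 (bits [11:10]) for full access, domain 0.
;@ Note: this marks every section, including peripheral regions, as cacheable.
LDR r1, tlb_l1_base
MOV r2, #0 ;@ r2 = section index (VA/PA bits [31:20]).
MOV r4, #1
write_pte:
MOV r0, #0x0E
ORR r0, r0, r4, LSL #0xA ;@ AP[0]
ORR r0, r0, r4, LSL #0xB ;@ AP[1]
ORR r0, r0, r2, LSL #20 ;@ Section base address = index << 20.
STR r0, [r1]
ADD r1, r1, #4
ADD r2, r2, #1 ;@ Increment section index.
SUBS r3, r2, #4096 ;@ Stop after 4096 entries (4 GB).
BNE write_pte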
;@ Initialize MMU.
MOV r1, #0x0
MCR p15, #0, r1, c2, c0, #2 ;@ Write Translation Table Base Control Register (N = 0: TTBR0 only).
LDR r1, tlb_l1_base ;@ Base address of the table built above.
MCR p15, #0, r1, c2, c0, #0 ;@ Write Translation Table Base Register 0.
;@ In this simple example, do not use TRE or Normal Memory Remap Register.
;@ Set all Domains to Manager.
MOV r1, #0xFFFF ;@ Manager access, so the access permission (AP) bits are not checked.
ORR r1, r1, r1, LSL #0x10
MCR p15, #0, r1, c3, c0, #0 ;@ Write Domain Access Control Register.
;@ Enable MMU
MRC p15, #0, r1, c1, c0, #0 ;@ Read System Control Register.
ORR r1, r1, #0x1 ;@ Bit 0 is the MMU enable.
MCR p15, #0, r1, c1, c0, #0 ;@ Write System Control Register.
ISB
B StackInit ;@ Skip over the literal word below.
tlb_l1_base .word 0x40200000
;// SETUP STACK POINTERS FOR USR MODE
;*------------------------------------------------------
;* INITIALIZE THE USER MODE STACK
StackInit:
LDR sp, c_stack
LDR r0, c_STACK_SIZE
ADD sp, sp, r0
;*-----------------------------------------------------
;* Clear the lower 3 bits for 8-byte (64-bit) alignment.
BIC sp, sp, #0x07
;// Set IRQ and FIQ bits in CPSR to disable all interrupts.
MRS R0, CPSR
STMFD SP!, {R0} ;// Store it onto stack
ORR R1, R0, #FIQ_IRQ_DISABLE
MSR CPSR_c, R1
;// SETUP STACK FOR OTHER MODES
MOV R0, SP
;// Enter Undefined Instruction Mode and set its Stack Pointer
MSR CPSR_c, #Mode_UND|I_Bit|F_Bit
MOV SP, R0
SUB R0, R0, #UND_Stack_Size
;// Enter Abort Mode and set its Stack Pointer
MSR CPSR_c, #Mode_ABT|I_Bit|F_Bit
MOV SP, R0
SUB R0, R0, #ABT_Stack_Size
;// Enter FIQ Mode and set its Stack Pointer
MSR CPSR_c, #Mode_FIQ|I_Bit|F_Bit
MOV SP, R0
SUB R0, R0, #FIQ_Stack_Size
;// Enter IRQ Mode and set its Stack Pointer
MSR CPSR_c, #Mode_IRQ|I_Bit|F_Bit
MOV SP, R0
SUB R0, R0, #IRQ_Stack_Size
;// Enter Supervisor Mode and set its Stack Pointer
MSR CPSR_c, #Mode_SVC|I_Bit|F_Bit
MOV SP, R0
SUB R0, R0, #SVC_Stack_Size
MSR CPSR_c, #Mode_SVC|I_Bit
;// MOVE TO myfunc
;// The following routine copies the loaded image to execution region.
BL RelocateImage
;// The following routine initialises the Omap3515.
BL BoardInit
;// The following routine enables the MMU.
;BL EnableMMU
;// The following routine enables the I cache.
BL EnableICaches
;// The following routine enables branch prediction.
BL EnableBrachPrediction
;// The following routine enables the D cache.
BL EnableDCaches
;// The following routine enables the L2 cache.
;BL EnableL2UnifiedCache
;// The following routine starts the OS.
BL MemmoryTest
;===================================================================
; Enable MMU and Branch to __main
; Leaving the caches disabled until after scatter loading.
.global EnableMMU
;******************************************************************************
; c1, Control Register
; [0] M bit Banked Enables the MMU:
; 0 = MMU disabled, reset value
; 1 = MMU enabled.
EnableMMU:
;Read the c1 register
mrc p15, #0, r0, c1, c0, #0
;Set bit 0 - Enables the MMU
orr r0, r0, #0x1
; Write back to c1 register to enable MMU
mcr p15, #0, r0, c1, c0, #0
isb
BX lr
.global EnableICaches
; This API enables instruction cache.
EnableICaches:
MRC p15, #0, r0, c1, c0, #0 ; Read System Control Register
ORR r0, r0, #(0x1 << 12) ; Set I bit 12 to enable I Cache
;BIC r0, r0, #(0x1 << 12) ; Clear bit 12 to disable the I Cache
MCR p15, #0, r0, c1, c0, #0 ; Write System Control Register
BX lr
.global EnableDCaches
; This API enables data cache.
EnableDCaches:
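MRC p15, #0, r0, c1, c0, #0 ; Read System Control Register
ORR r0, r0, #(0x1 << 2) ; Set C bit 2 to enable D Cache
;BIC r0, r0, #(0x1 << 2) ; Clear bit 2 to disable the D Cache
;BIC r0, r0, #(0x1 << 1) ; Clear bit 1 to disable alignment checks
MCR p15, #0, r0, c1, c0, #0 ; Write System Control Register
BX lr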
.global EnableL2UnifiedCache
; Enable Cortex-A8 Level2 Unified Cache
EnableL2UnifiedCache:
MRC p15, #0, r0, c1, c0, #1 ; Read Auxiliary Control Register
ORR r0, r0, #2 ; L2EN bit, enable L2 cache
;BIC r0, r0, #(0x1 << 1) ; L2EN bit, disable L2 cache
ORR r0, r0, #(0x1 << 4) ;Enables speculative accesses on AXI
ORR r0, r0, #(0x1 << 5) ;Enables caching NEON data within the L1 data cache
MCR p15, #0, r0, c1, c0, #1 ; Write Auxiliary Control Register
BX lr
.global EnableBrachPrediction
; This API enables branch prediction
EnableBrachPrediction:
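MRC p15, #0, r0, c1, c0, #0 ; Read System Control Register
ORR r0, r0, #(0x1 << 11) ; Set Z bit 11 to enable branch prediction
;BIC r0, r0, #(0x1 << 11) ; Disable all forms of branch prediction
MCR p15, #0, r0, c1, c0, #0 ; Write System Control Register
BX lr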
c_stack: .long __stack
c_STACK_SIZE: .long __STACK_SIZE
;// POINTERS TO VARIABLES
; ENDIF
.END
Thank you for your detailed explanations.
May I ask why the board is fitted with LPSDR SDRAM?
According to the OMAP3515 reference manual (http://www.tij.co.jp/jp/lit/ds/symlink/omap3515.pdf),
section "6.4.2 SDRAM Controller Subsystem (SDRC)" says:
The SDRC module only supports low-power double-data-rate (LPDDR) SDRAM devices.
Although I don't know the LPDDR specifications well, the commands will be double data rate.
I guess the transactions succeed by chance in the D-cache-off case because each transaction is a single access.
I am afraid the burst transactions in the D-cache-on case do not match the SDRAM specifications.
The OMAP3515 reference manual does not describe the SDRAM timing, so I am not sure whether my guess is correct.
Can't you replace the SDRAM with LPDDR?
I'm sorry, but I can only go by the information you have given me.
Hi,
Thanks a lot for the reply. The OMAP3515 supports both SDR and DDR, but we have mounted SDR for now. We will get a new board with DDR in about a month, so maybe I can check it then. Other than that, do you think I am missing some MMU settings?
Thanks a lot again,
I'm sorry, I had not looked at your code before; I have looked at it now.
I wonder why you enable the D-cache and the MMU separately.
Because the D-cache only takes effect once the MMU is enabled, I think the D-cache and the MMU should be enabled at the same time.
However, this may have nothing to do with the problem.
As for the other parts, I think they are OK.
By the way, can you try an experiment with write-through?
Bit 31 (WT) of the Cache Size Identification Register tells you whether the processor supports write-through (i.e. C=1, B=0).
Although write-through is treated as uncacheable on the Cortex-A9, I could not find such a statement in the Cortex-A8 TRM.
But I'm not sure the behavior would change.
Thanks a lot for going through the code. I had originally implemented it by enabling the I-cache, D-cache and MMU at the same time,
but I changed it after going through these links:
1. ARM Information Center
2. ARM Information Center
I have already tried all combinations of the C and B bits, but the problem still exists.
Hello vskgopu,
The contents of link 1 (ARM Information Center) look correct, but the code that enables the I- and D-caches is not shown there.
Link 2 (ARM Information Center) also looks correct.
Unless the I- or D-cache is enabled at the same time as the MMU, the 'Not Allowed' state described in link 2 could occur.
Of course, I know these comments may not be related to your problem.
Can I confirm one thing?
What was the result of your experiment with C=0 and B=0?
In this case the region is Strongly-ordered, i.e. uncached, which is effectively the same as disabling the D-cache.
Did that combination succeed?
Here is the test report:
Test 1:
MMU - Enabled
I Cache - Enabled
D Cache - Enabled
C Bit - Disabled
B Bit - Disabled
Branch Prediction - Enabled
Time taken to execute the code in internal RAM is 312 milliseconds.
Test 2:
D Cache - Disabled
C Bit - Enabled
B Bit - Enabled
Time taken to execute the code in internal RAM is 7 milliseconds.
Here are further test results:
Time (ms)  Region  MMU       I Cache   D Cache   C Bit     B Bit     Branch Prediction
7          SRAM    Enabled   Enabled   Enabled   Enabled   Enabled   Enabled
292        SRAM    Enabled   Disabled  Disabled  Enabled   Enabled   Enabled
7          SRAM    Disabled  Enabled   Disabled  Enabled   Enabled   Enabled
292        SRAM    Disabled  Disabled  Disabled  Enabled   Enabled   Enabled
312        SRAM    Enabled   Enabled   Enabled   Disabled  Disabled  Enabled
7          SRAM    Enabled   Enabled   Enabled   Enabled   Disabled  Enabled
312        SRAM    Enabled   Enabled   Enabled   Disabled  Enabled   Enabled
7          SRAM    Enabled   Enabled   Disabled  Enabled   Enabled   Enabled
Thank you for the information. It's interesting.
According to the results, the performance seems to depend only on the C bit, regardless of whether the D-cache is enabled or disabled.
I checked the OMAP3515 block diagram.
It shows an L2 cache between the CPU and the SRAM.
I guess the L1 caching attribute is absorbed by the L2 cache: because the L2 cache is disabled, it would react only to the C bit (which is part of ARCACHE/AWCACHE on the interconnect between the CPU and the L2 cache).
That is, from the SRAM's point of view, the L1 D-cache state (enabled or disabled) does not affect SRAM performance when the MMU is enabled.
This is just an assumption, and only the following case cannot be explained:
7 ms, SRAM: MMU disabled, I-cache enabled, D-cache disabled, C=1, B=1, branch prediction enabled
By the way, I wanted to know the SDRAM results for the C-bit and B-bit combinations.
In the SDRAM case, did the transaction always fail, or were there conditions under which it succeeded?
Thanks a lot for the analysis. Here is the report for code running in SDRAM:
Time (ms)  Region  MMU       I Cache   D Cache   C Bit     B Bit     Branch Prediction
7          SDRAM   Enabled   Enabled   Enabled   Enabled   Enabled   Enabled
483        SDRAM   Enabled   Disabled  Disabled  Enabled   Enabled   Enabled
7          SDRAM   Disabled  Enabled   Disabled  Enabled   Enabled   Enabled
483        SDRAM   Disabled  Disabled  Disabled  Enabled   Enabled   Enabled
483        SDRAM   Enabled   Enabled   Enabled   Disabled  Disabled  Enabled
7          SDRAM   Enabled   Enabled   Enabled   Enabled   Disabled  Enabled
483        SDRAM   Enabled   Enabled   Enabled   Disabled  Enabled   Enabled
7          SDRAM   Enabled   Enabled   Disabled  Enabled   Enabled   Enabled
Thank you.
Regarding the SDRAM case, the following two cases contradict my assumption.
Can anyone explain them consistently?
Could it be that the C bit is ignored when both caches are disabled?
Anyway, the second case cannot be explained, just as in the SRAM case.
483 ms, SDRAM: MMU enabled, I-cache disabled, D-cache disabled, C=1, B=1, branch prediction enabled
7 ms, SDRAM: MMU disabled, I-cache enabled, D-cache disabled, C=1, B=1, branch prediction enabled
Do the other conditions that are not listed cause the SDRAM to crash?
Best regards.
To enable the L2 cache, is it enough to do the following, or do I need other settings as well?
First, you should clear the C bit in the CP15 Control Register c1 before initializing the L2 cache.
Second, you should invalidate the L2 cache using a method similar to the one for the L1 cache.
These two steps are missing.
Finally, you should set the C bit in the CP15 Control Register c1.
For your reference, below are the L2 cache enable/disable sequences from the "Cortex™-A8 Technical Reference Manual, Revision r3p2".
8.3 Enabling and disabling the L2 cache controller
To enable the L2 cache following a reset or to change the settings of the L2 Cache Auxiliary Control Register, you must use the following sequence:
1. Complete the processor reset sequence or disable the L2 cache.
2. Program the L2 Cache Auxiliary Control Register. See c9, L2 Cache Auxiliary Control Register on page 3-95 for details.
Note: If you have configured the processor to support parity or ECC memory, you must enable those features before you can program the C bit.
MRC p15, 1, <Rd>, c9, c0, 2 ; Read L2 Cache Auxiliary Control Register
MCR p15, 1, <Rd>, c9, c0, 2 ; Write L2 Cache Auxiliary Control Register
3. Program the Auxiliary Control Register to set the L2EN bit to 1. See c1, Auxiliary Control Register on page 3-47 for details.
MRC p15, 0, <Rd>, c1, c0, 1 ; Read Auxiliary Control Register
MCR p15, 0, <Rd>, c1, c0, 1 ; Write Auxiliary Control Register
4. Program the C bit in the CP15 Control Register c1. See c1, Control Register on page 3-44 for details.
MRC p15, 0, <Rd>, c1, c0, 0 ; Read Control Register
MCR p15, 0, <Rd>, c1, c0, 0 ; Write Control Register
To disable the L2 cache, but leave the L1 data cache enabled, use the following sequence:
1. Disable the C bit.
2. Clean and invalidate the L1 and L2 caches.
3. Disable the L2 cache by clearing the L2EN bit to 0.
4. Enable the C bit.
Note: To keep memory coherent when using cache maintenance operations, you must follow the L2 cache disabling sequence. Cache maintenance operations have an effect on the L1 and L2 caches when they are disabled. A cache maintenance operation can evict a cache line from the L1 data cache. If the L2EN bit is set to 1, the evicted cache line can be allocated to the L2 cache. If the L2EN bit is not set to 1, then evictions from the L1 data cache are sent directly to external memory using the AXI interface.