I am using an OMAP3515 (ARM Cortex-A8) with the I-cache, D-cache, branch prediction, and MMU enabled.
I get a data abort when I copy a 600 KB frame buffer from one external memory region to another. After the data abort, I notice that the SDR (i.e. the SDRAM) is no longer accessible.
I have configured the MMU so that PA = VA.
There is no problem when I copy a smaller amount of data.
Also, if I disable the D-cache, there is no abort and the copy works fine. However, I would like to keep the D-cache enabled for faster access.
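For readers following along, a flat PA = VA mapping built from 1 MB sections could look roughly like the sketch below. It is only an illustration, not the setup actually used in this post: the helper name `fill_identity_sections` and the `attrs` parameter are hypothetical, and the caller is assumed to pass sensible attribute bits (AP, domain, TEX/C/B) for each region.

```c
#include <stdint.h>
#include <stddef.h>

#define SECTION_COUNT 4096u        /* 4 GB address space / 1 MB per section */
#define SECTION_SHIFT 20u          /* 1 MB = 1 << 20 */
#define DESC_SECTION  0x2u         /* bits[1:0] = 0b10 marks a section entry */
#define ATTR_MASK     0x000BFFFCu  /* bits[19:2] without bit 18 (supersection) */

/* Hypothetical helper: make 'count' first-level entries, starting at index
 * 'first', map each 1 MB virtual section onto the same physical address.
 * 'attrs' carries the caller-chosen attribute bits of the descriptor. */
static void fill_identity_sections(uint32_t *table, uint32_t first,
                                   uint32_t count, uint32_t attrs)
{
    for (uint32_t n = 0; n < count; n++) {
        uint32_t index = first + n;
        if (index >= SECTION_COUNT)
            break;                               /* stay inside the table */
        uint32_t base = index << SECTION_SHIFT;  /* physical == virtual base */
        table[index] = base | (attrs & ATTR_MASK) | DESC_SECTION;
    }
}
```

With such a helper, the SDRAM range and the peripheral ranges can be filled by separate calls with different `attrs`, so that only the memory regions are cacheable.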
Thanks and regards,
Gopu
Hello Gopu,
Thank you.
Regarding the SDRAM case, the following two cases contradict my assumption.
Can anyone explain the behaviour without inconsistency?
Might it be that the C bit is ignored when both caches are disabled?
In any case, the second case cannot be explained that way, and it would be the same as the SRAM case.
483 SDRAM Enabled Disabled Disabled Enabled Enabled Enabled
7 SDRAM Disabled Enabled Disabled Enabled Enabled Enabled
Do the other conditions that are not listed also cause the SDRAM to crash?
Best regards.
Yasuhiko Koumoto.
Hello,
For enabling L2 cache, is it enough to do the following or do I have to do some other settings as well ?
;==================================================================
; Enable Cortex-A8 Level2 Unified Cache
EnableL2UnifiedCache:
MRC p15, #0, r0, c1, c0, #1 ; Read Auxiliary Control Register
ORR r0, r0, #2 ; L2EN bit, enable L2 cache
;BIC r0, r0, #(0x1 << 1) ; L2EN bit, disable L2 cache
;ORR r0, r0, #(0x1 << 4) ;Enables speculative accesses on AXI
ORR r0, r0, #(0x1 << 4) ;Enables speculative accesses on AXI
ORR r0, r0, #(0x1 << 5) ;Enables caching NEON data within the L1 data cache
MCR p15, #0, r0, c1, c0, #1 ; Write Auxiliary Control Register
BX lr
First, you should clear the C bit in the CP15 Control Register c1 before initializing the L2 cache.
Second, you should invalidate the L2 cache in a similar way to the L1 cache.
These two steps are missing.
Finally, you should set the C bit in the CP15 Control Register c1.
For your reference, below are the L2 cache enable/disable sequences extracted from the "Cortex™-A8 Technical Reference Manual, Revision: r3p2".
8.3 Enabling and disabling the L2 cache controller
To enable the L2 cache following a reset or to change the settings of the L2 Cache Auxiliary Control Register, you must use the following sequence:
1. Complete the processor reset sequence or disable the L2 cache.
2. Program the L2 Cache Auxiliary Control Register. See c9, L2 Cache Auxiliary Control Register on page 3-95 for details.
Note: If you have configured the processor to support parity or ECC memory, you must enable those features before you can program the C bit.
MRC p15, 1, <Rd>, c9, c0, 2 ; Read L2 Cache Auxiliary Control Register
MCR p15, 1, <Rd>, c9, c0, 2 ; Write L2 Cache Auxiliary Control Register
3. Program the Auxiliary Control Register to set the L2EN bit to 1. See c1, Auxiliary Control Register on page 3-47 for details.
MRC p15, 0, <Rd>, c1, c0, 1 ; Read Auxiliary Control Register
MCR p15, 0, <Rd>, c1, c0, 1 ; Write Auxiliary Control Register
4. Program the C bit in the CP15 Control Register c1. See c1, Control Register on page 3-44 for details.
MRC p15, 0, <Rd>, c1, c0, 0 ; Read Control Register
MCR p15, 0, <Rd>, c1, c0, 0 ; Write Control Register
To disable the L2 cache, but leave the L1 data cache enabled, use the following sequence:
1. Disable the C bit.
2. Clean and invalidate the L1 and L2 caches.
3. Disable the L2 cache by clearing the L2EN bit to 0.
4. Enable the C bit.
Note: To keep memory coherent when using cache maintenance operations, you must follow the L2 cache disabling sequence. Cache maintenance operations have an effect on the L1 and L2 caches when they are disabled. A cache maintenance operation can evict a cache line from the L1 data cache. If the L2EN bit is set to 1, the evicted cache line can be allocated to the L2 cache. If the L2EN bit is not set to 1, then evictions from the L1 data cache are sent directly to external memory using the AXI interface.
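To make the ordering of the TRM steps easier to see, here is a small host-side sketch that applies them to plain variables instead of real CP15 registers. The `cp15_model` struct and `l2_enable_sequence` are hypothetical names used only for illustration; on the target each assignment would be an MRC/MCR pair as quoted above, and the cache maintenance step is only a comment here.

```c
#include <stdint.h>

#define SCTLR_C     (1u << 2)  /* Control Register: C bit (data cache enable) */
#define ACTLR_L2EN  (1u << 1)  /* Auxiliary Control Register: L2EN bit */

/* Hypothetical in-memory stand-in for the three CP15 registers involved. */
struct cp15_model {
    uint32_t control;      /* c1, c0, 0 */
    uint32_t aux_control;  /* c1, c0, 1 */
    uint32_t l2_aux;       /* c9, c0, 2 with opcode1 = 1 */
};

/* Apply the enable sequence to the model. 'l2_aux_value' is whatever the
 * caller wants in the L2 Cache Auxiliary Control Register (an assumption). */
static void l2_enable_sequence(struct cp15_model *r, uint32_t l2_aux_value)
{
    r->control     &= ~SCTLR_C;      /* clear the C bit first */
    /* On hardware: clean and invalidate the L1 and L2 caches here. */
    r->aux_control &= ~ACTLR_L2EN;   /* step 1: L2 cache disabled */
    r->l2_aux       = l2_aux_value;  /* step 2: program the L2 aux register */
    r->aux_control |= ACTLR_L2EN;    /* step 3: set L2EN */
    r->control     |= SCTLR_C;       /* step 4: program the C bit */
}
```

All other bits of the two control registers are left untouched, which mirrors the read-modify-write style of the assembly posted earlier.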
Best regards,
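Note that in ARMv7 the S, TEX, C, and B bits of a translation table entry together determine the cacheability and external behaviour of a memory region. When TEX=0, the C and B bits provide ARMv4/v5-compatible behaviour:
C=0, B=0: strongly-ordered
C=0, B=1: device
C=1, B=0: normal, write-through cacheable
C=1, B=1: normal, write-back cacheable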
But for normal memory regions, new encodings exist which allow you to separately specify the L1 and L2 cache policy. Note that device and strongly-ordered regions (TEX=0,C=0) should only be used for peripherals, and incur a serious performance penalty. They should never be used for code execution.
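As a rough illustration of how these encodings decode, the sketch below classifies a TEX/C/B combination, assuming TEX remap is disabled. The type and function names (`mem_attr`, `decode_tex_cb`) are made up for this example; "inner" and "outer" correspond to what is called the L1 and L2 policy above.

```c
#include <stdint.h>

enum mem_type { MEM_STRONGLY_ORDERED, MEM_DEVICE, MEM_NORMAL, MEM_RESERVED };

/* Two-bit policy encoding used by the TEX = 1BB form. */
enum cache_policy { POLICY_NC = 0, POLICY_WBWA = 1, POLICY_WT = 2, POLICY_WB = 3 };

struct mem_attr {
    enum mem_type     type;
    enum cache_policy inner;  /* meaningful only for MEM_NORMAL */
    enum cache_policy outer;  /* meaningful only for MEM_NORMAL */
};

/* Hypothetical decoder for the TEX[2:0], C, B fields (TEX remap off). */
static struct mem_attr decode_tex_cb(unsigned tex, unsigned c, unsigned b)
{
    struct mem_attr a = { MEM_RESERVED, POLICY_NC, POLICY_NC };
    unsigned cb = ((c & 1u) << 1) | (b & 1u);

    tex &= 7u;
    if (tex & 4u) {                 /* TEX = 1BB: outer = BB, inner = {C,B} */
        a.type  = MEM_NORMAL;
        a.outer = (enum cache_policy)(tex & 3u);
        a.inner = (enum cache_policy)cb;
    } else if (tex == 0u) {         /* ARMv4/v5-compatible encodings */
        switch (cb) {
        case 0: a.type = MEM_STRONGLY_ORDERED; break;
        case 1: a.type = MEM_DEVICE; break;
        case 2: a.type = MEM_NORMAL; a.inner = a.outer = POLICY_WT; break;
        default: a.type = MEM_NORMAL; a.inner = a.outer = POLICY_WB; break;
        }
    } else if (tex == 1u) {
        if (cb == 0u) {
            a.type = MEM_NORMAL;    /* non-cacheable */
        } else if (cb == 3u) {
            a.type = MEM_NORMAL;
            a.inner = a.outer = POLICY_WBWA;
        }                           /* other values stay MEM_RESERVED */
    } else if (tex == 2u && cb == 0u) {
        a.type = MEM_DEVICE;        /* non-shareable device */
    }
    return a;
}
```

Encodings that are reserved or implementation defined are deliberately reported as `MEM_RESERVED` rather than guessed at.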
If the MMU is disabled, then instruction fetches behave as if the target memory region is configured as normal memory, L1/L2 cacheable (write policy irrelevant), while data accesses behave as if the target memory region is configured as strongly-ordered. This incoherency means you have to be a bit careful when the MMU is disabled, and I recommend configuring and enabling the MMU at the earliest convenient moment.
The timing measurements mostly show that terrible performance results if instruction caching is disabled (by either disabling it explicitly in the system control register, or by enabling the MMU but running code out of memory marked device or strongly-ordered). This is not very surprising.
Addendum: I've just uploaded a small example that may be helpful in constructing section translation tables. It is for the am335x but should be easily adaptable to omap3.
Hi matthijs,
I cannot agree with you.
Note that device and strongly-ordered regions (TEX=0,C=0) should only be used for peripherals, and incur a serious performance penalty. They should never be used for code execution.
How can you be so sure?
Have you not written any application programs?
How should we configure normal memory as uncached?
Best regards,
Strongly-ordered and device memory impose additional constraints compared to normal uncacheable memory, which adversely impact performance without any benefit when targeting memory. In essence, accesses to device memory behave more like remote procedure calls. For example, performing 8 sequential byte-stores to normal uncacheable memory can be (and often is) merged to become a single dword-store. To device memory, they will always remain 8 individual byte-stores on the AXI bus. If strongly-ordered, then moreover the CPU will wait for each store to complete.
Normal uncacheable memory is configured by setting TEX=100 (binary), C=0, B=0. Or, in my example C code, type_normal( nc, nc ). It should obviously also not be used for code execution, as the performance impact of running without instruction caching is really severe.
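For illustration, the TEX=1BB form can be packed into the attribute bits of a first-level section entry as sketched below. This is not the `type_normal` code referred to above; `normal_attr_bits` and the `policy` enum are hypothetical names, and the bit positions assume a section descriptor (TEX in bits 14:12, C in bit 3, B in bit 2).

```c
#include <stdint.h>

/* Two-bit cache policy encoding used by the TEX = 1BB form. */
enum policy { NC = 0, WBWA = 1, WT = 2, WB = 3 };

/* Hypothetical helper: TEX/C/B bits of a section descriptor for normal
 * memory with separately chosen inner and outer cache policies. */
static uint32_t normal_attr_bits(enum policy inner, enum policy outer)
{
    uint32_t tex = 0x4u | ((uint32_t)outer & 3u);  /* TEX = 1BB, BB = outer */
    uint32_t c   = ((uint32_t)inner >> 1) & 1u;    /* C = inner bit 1 */
    uint32_t b   = (uint32_t)inner & 1u;           /* B = inner bit 0 */
    return (tex << 12) | (c << 3) | (b << 2);
}
```

Here `normal_attr_bits(NC, NC)` yields `0x4000`, i.e. TEX=100, C=0, B=0; the result still has to be combined with the section base address, access permissions and domain.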
If at all practical, the use of explicit cache maintenance should be considered preferable for performance reasons. On the Cortex-A8, you can moreover use the Preload Engine to fetch data to and/or evict data from L2 cache in the background while software is performing other tasks (this is especially useful when processing data in a streaming fashion).
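Explicit maintenance is usually done per cache line over the buffer's address range. The sketch below only shows the range walk, assuming the 64-byte line length of the Cortex-A8; `for_each_cache_line` and the `line_op_fn` callback are hypothetical, and on the target the callback would have to wrap the actual by-address clean or invalidate operation plus whatever barriers are needed afterwards.

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 64u  /* assumed line length in bytes (Cortex-A8) */

typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/* Visit every cache line that overlaps [addr, addr + size) and call 'op'
 * on its line-aligned address. 'op' may be NULL to just count lines.
 * Returns the number of lines visited. */
static size_t for_each_cache_line(uintptr_t addr, size_t size,
                                  line_op_fn op, void *ctx)
{
    if (size == 0)
        return 0;

    uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1u);
    uintptr_t line = addr & mask;               /* first line touched */
    uintptr_t last = (addr + size - 1u) & mask; /* last line touched */
    size_t visited = 0;

    for (;;) {
        if (op)
            op(line, ctx);
        visited++;
        if (line == last)
            break;
        line += CACHE_LINE;
    }
    return visited;
}
```

For a line-aligned 600 KB frame buffer this walks 9600 lines, which gives a feel for the cost of maintaining the whole buffer versus only the part that was touched.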
Are there any problems other than the performance issue?
If there is no other problem, the choice of memory attributes should be left to the developer.
The SDRC should work correctly under any memory attributes.
I think the aim of this thread is not how we should configure memory attributes but how we can solve Gopu's problem.
I wonder why you mentioned another MMU setting example.
Most likely the only reason why cache settings have an effect on whether or not the error occurs is because they affect the type (e.g. burst or not) and rate of requests to the SDRC.
Your comment above could be right, because the SDRC worked well with the uncached attribute.
However, I think the problem has nothing to do with the memory type (or attribute).
Anyway, the SDRAM mode register setting might need to be revised.
yasuhikokoumoto wrote: are there any problems other than performance issue? If there would be no problem, the memory attribute should be left to the developer.
Performance timings were being discussed. I raised the issue of memory attributes in that context.
But yes, there are more restrictions w.r.t. device and strongly-ordered memory. For example, unaligned access is forbidden (regardless of the strict alignment checking flag). More restrictions can be found by browsing the ARM architecture reference manual, for example I just found "in a VMSA implementation when any associated MMU is enabled, any multi-access instruction that loads or stores the PC must access only Normal memory. If the instruction accesses Device or Strongly-ordered memory the result is UNPREDICTABLE." Such a load is common for returning from a function, which means that putting your stack in device or strongly-ordered memory is a bad idea.
I agree, which is why I said the main issue lies there, but since I have no experience with the omap3 sdrc (only with the substantially different emif4d found in later devices) I don't immediately have any suggestions as to what might be the cause. Gopu said he will investigate it once his DDR memory arrives, so I will await that.
Misconfiguration of the memory typically causes data corruption rather than bus errors, since the memory controller has no way to verify the integrity of the response.