
ARM Cortex-A8: Enabling the D-Cache causes data aborts

I am using an OMAP3515 (ARM Cortex-A8) and have enabled the I-cache, D-cache, branch prediction, and the MMU.

I get a data abort if I try to copy a 600 KB frame buffer from one external memory region to another. After the data abort, I notice that the SDR (i.e. the SDRAM) is no longer accessible.

I have set up the MMU with a flat mapping (PA = VA).

There is no issue if I copy a smaller amount of data.

Also, if I disable the D-cache there is no abort and everything works fine, but I would like to keep the D-cache enabled for faster access.
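
(A minimal sketch of this kind of enable sequence, purely for illustration since the actual code is not shown in this thread; the label name is made up, and it assumes the translation table, TTBR0 and domain access control are already set up and the caches and TLB have been invalidated.)

;==================================================================
; Illustrative sketch only - enable MMU, caches and branch prediction
;==================================================================
EnableMmuAndCaches:
        MRC     p15, #0, r0, c1, c0, #0      ; Read CP15 Control Register
        ORR     r0, r0, #0x1                 ; M bit  - enable MMU
        ORR     r0, r0, #(0x1 << 2)          ; C bit  - enable D-cache
        ORR     r0, r0, #(0x1 << 11)         ; Z bit  - enable branch prediction
        ORR     r0, r0, #(0x1 << 12)         ; I bit  - enable I-cache
        MCR     p15, #0, r0, c1, c0, #0      ; Write CP15 Control Register
        BX      lr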

Thanks and regards,

Gopu

  • Hello vskgopu,


    The contents of

    1. ARM Information Center

    appear to have no errors, but the I-cache and D-cache enabling code is not shown there.

    The contents of

    2. ARM Information Center

    also appear to have no errors.

    Unless the I- or D-cache and the MMU are enabled at the same time, the 'Not Allowed' state described in link 2 would occur.

    Of course, I know these comments may not be related to your problem.

    Best regards,

    Yasuhiko Koumoto.

  • Hello,

    can I confirm one thing?

    What was the result of your experiment when C=0 and B=0?

    In that case it is equivalent to having the D-cache disabled.

    Did that combination succeed?

    Best regards,

    Yasuhiko Koumoto.

  • Hello,

    Here is the test report:

    Test 1:

      MMU - Enabled
      I-Cache - Enabled
      D-Cache - Enabled
      C Bit - Disabled
      B Bit - Disabled
      Branch Prediction - Enabled

      Time taken to execute the code in internal RAM: 312 milliseconds.

    Test 2:

      MMU - Enabled
      I-Cache - Enabled
      D-Cache - Disabled
      C Bit - Enabled
      B Bit - Enabled
      Branch Prediction - Enabled

      Time taken to execute the code in internal RAM: 7 milliseconds.

    The following are further test results:

    Time (ms)   Execution Region   MMU        I-Cache    D-Cache    C Bit      B Bit      Branch Prediction
    7           SRAM               Enabled    Enabled    Enabled    Enabled    Enabled    Enabled
    292         SRAM               Enabled    Disabled   Disabled   Enabled    Enabled    Enabled
    7           SRAM               Disabled   Enabled    Disabled   Enabled    Enabled    Enabled
    292         SRAM               Disabled   Disabled   Disabled   Enabled    Enabled    Enabled
    312         SRAM               Enabled    Enabled    Enabled    Disabled   Disabled   Enabled
    7           SRAM               Enabled    Enabled    Enabled    Enabled    Disabled   Enabled
    312         SRAM               Enabled    Enabled    Enabled    Disabled   Enabled    Enabled
    7           SRAM               Enabled    Enabled    Disabled   Enabled    Enabled    Enabled

    Thanks and regards,

    Gopu

  • Hello Gopu,

    thank you for the information. It's interesting.

    According to the results, it seems that the performance depends only on the C bit, regardless of whether the D-cache is enabled or disabled.

    I checked the OMAP3515 block diagram shown below.

    [Image: OMAP3515 block diagram (OMAP3515.jpg)]

    As you can see, there is an L2 cache between the CPU and the SRAM.

    I guess the L1 caching attribute was absorbed by the L2 cache, because the L2 cache was disabled and the L2 cache would react only to the C bit (which is part of ARCACHE/AWCACHE on the interconnect between the CPU and the L2 cache).

    That is, from the SRAM's point of view, the L1 D-cache state (i.e. enabled or disabled) will not affect SRAM performance when the MMU is enabled.

    This is just an assumption, and only the following case cannot be explained by it:

    7 ms: SRAM, MMU disabled, I-cache enabled, D-cache disabled, C=1, B=1, branch prediction enabled

    By the way, I wanted to know the results for SDRAM and for the combinations of the C bit and B bit.

    In the SDRAM case, did the transaction always fail? Or were there any conditions under which it succeeded?

    Best regards,

    Yasuhiko Koumoto.


  • Hello,

    Thanks a lot for the analysis. Here is the report for code running in SDRAM:

    Time (ms)   Execution Region   MMU        I-Cache    D-Cache    C Bit      B Bit      Branch Prediction
    7           SDRAM              Enabled    Enabled    Enabled    Enabled    Enabled    Enabled
    483         SDRAM              Enabled    Disabled   Disabled   Enabled    Enabled    Enabled
    7           SDRAM              Disabled   Enabled    Disabled   Enabled    Enabled    Enabled
    483         SDRAM              Disabled   Disabled   Disabled   Enabled    Enabled    Enabled
    483         SDRAM              Enabled    Enabled    Enabled    Disabled   Disabled   Enabled
    7           SDRAM              Enabled    Enabled    Enabled    Enabled    Disabled   Enabled
    483         SDRAM              Enabled    Enabled    Enabled    Disabled   Enabled    Enabled
    7           SDRAM              Enabled    Enabled    Disabled   Enabled    Enabled    Enabled

    Thanks and regards,

    Gopu

  • Hello Gopu,

    thank you.

    Regarding the SDRAM case, the following two cases go against my assumption.

    Can anyone explain these phenomena without inconsistency?

    Might it be possible that, if both caches were disabled, the C bit would be ignored?

    Anyway, the second case cannot be explained, and it is the same as in the SRAM case.

    483 ms: SDRAM, MMU enabled, I-cache disabled, D-cache disabled, C=1, B=1, branch prediction enabled
    7 ms:   SDRAM, MMU disabled, I-cache enabled, D-cache disabled, C=1, B=1, branch prediction enabled

    Do the other conditions, which are not listed, cause the SDRAM crash?

    Best regards.

    Yasuhiko Koumoto.

  • Hello,

    For enabling the L2 cache, is it enough to do the following, or do I have to do some other settings as well?

    ;==================================================================

    ; Enable Cortex-A8 Level2 Unified Cache

    ;==================================================================

    EnableL2UnifiedCache:

            MRC     p15, #0, r0, c1, c0, #1      ; Read Auxiliary Control Register

            ORR     r0, r0, #(0x1 << 1)          ; Set L2EN bit, enable L2 cache

            ;BIC    r0, r0, #(0x1 << 1)          ; Clear L2EN bit, disable L2 cache

            ORR     r0, r0, #(0x1 << 4)          ; Enable speculative accesses on AXI

            ORR     r0, r0, #(0x1  << 5)        ;Enables caching NEON data within the L1 data cache

            MCR     p15, #0, r0, c1, c0, #1      ; Write Auxiliary Control Register

            BX      lr

    Thanks and regards,

    Gopu

  • Hello Gopu,

    you should clear the C bit in the CP15 Control Register c1 before initializing the L2 cache.

    Second, you should invalidate the L2 cache by a method similar to the one used for the L1 cache.

    These two steps are missing.

    Finally, you should set the C bit in the CP15 Control Register c1.

    For your reference, below are the L2 cache enable/disable sequences extracted from the "Cortex-A8 Technical Reference Manual", revision r3p2 (a combined sketch follows the quoted text).

    8.3 Enabling and disabling the L2 cache controller

    To enable the L2 cache following a reset or to change the settings of the L2 Cache Auxiliary Control Register, you must use the following sequence:

    1. Complete the processor reset sequence or disable the L2 cache.
    2. Program the L2 Cache Auxiliary Control Register. See c9, L2 Cache Auxiliary Control Register on page 3-95 for details.

    Note
    If you have configured the processor to support parity or ECC memory, you must enable those features before you can program the C bit.

        MRC p15, 1, <Rd>, c9, c0, 2 ; Read L2 Cache Auxiliary Control Register
        MCR p15, 1, <Rd>, c9, c0, 2 ; Write L2 Cache Auxiliary Control Register

    3. Program the Auxiliary Control Register to set the L2EN bit to 1. See c1, Auxiliary Control Register on page 3-47 for details.

        MRC p15, 0, <Rd>, c1, c0, 1 ; Read Auxiliary Control Register
        MCR p15, 0, <Rd>, c1, c0, 1 ; Write Auxiliary Control Register

    4. Program the C bit in the CP15 Control Register c1. See c1, Control Register on page 3-44 for details.

        MRC p15, 0, <Rd>, c1, c0, 0 ; Read Control Register
        MCR p15, 0, <Rd>, c1, c0, 0 ; Write Control Register

    To disable the L2 cache, but leave the L1 data cache enabled, use the following sequence:
    1. Disable the C bit.
    2. Clean and invalidate the L1 and L2 caches.
    3. Disable the L2 cache by clearing the L2EN bit to 0.
    4. Enable the C bit.
    Note
    To keep memory coherent when using cache maintenance operations, you must follow the L2 cache disabling sequence. Cache maintenance operations have an effect on the L1 and L2 caches when they are disabled. A cache maintenance operation can evict a cache line from the L1 data cache. If the L2EN bit is set to 1, the evicted cache line can be allocated to the L2 cache. If the L2EN bit is not set to 1, then evictions from the L1 data cache are sent directly to external memory using the AXI interface.
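
    Putting those steps together, a rough sketch (illustrative only, not verified on hardware) might look like the following. The L2 invalidation that should precede this is only indicated by a comment, and the bit positions assumed are C = CP15 Control Register bit 2 and L2EN = Auxiliary Control Register bit 1, as in the TRM text above; the label name is made up.

    EnableL2Sketch:
            ; (L1/L2 caches are assumed to have been invalidated already)
            MRC     p15, #0, r0, c1, c0, #0      ; Read CP15 Control Register
            BIC     r0, r0, #(0x1 << 2)          ; Clear C bit before touching L2 settings
            MCR     p15, #0, r0, c1, c0, #0
            MRC     p15, #0, r1, c1, c0, #1      ; Read Auxiliary Control Register
            ORR     r1, r1, #(0x1 << 1)          ; Set L2EN bit
            MCR     p15, #0, r1, c1, c0, #1      ; Write Auxiliary Control Register
            MRC     p15, #0, r0, c1, c0, #0
            ORR     r0, r0, #(0x1 << 2)          ; Set C bit to enable data/L2 caching
            MCR     p15, #0, r0, c1, c0, #0      ; Write CP15 Control Register
            BX      lr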

    Best regards,

    Yasuhiko Koumoto.

  • Note that in ARMv7 the S, TEX, C, and B bits of a translation table entry together determine the cacheability and external behaviour of a memory region. When TEX=0, the C and B bits provide ARMv4/v5-compatible behaviour:

    • C=0,B=0: strongly-ordered
    • C=0,B=1: device
    • C=1,B=0: normal memory, L1 and L2 write-through cacheable, read-allocate
    • C=1,B=1: normal memory, L1 and L2 write-back cacheable, read/write-allocate

    But for normal memory regions, new encodings exist which allow you to separately specify the L1 and L2 cache policy. Note that device and strongly-ordered regions (TEX=0,C=0) should only be used for peripherals, and incur a serious performance penalty. They should never be used for code execution.
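
    For illustration only (these constants are not from the thread), the four classic TEX=0 encodings above could be expressed as armasm-style attribute constants for ARMv7 short-descriptor section entries, assuming AP=0b11 (full read/write access), domain 0, XN=0, and a non-shared, global section:

        ; Illustrative section-descriptor attribute constants: bits[1:0]=0b10 (section),
        ; AP[1:0]=0b11 at bits[11:10], domain 0; memory type comes from TEX (bits[14:12]),
        ; C (bit 3) and B (bit 2).
        ATTR_STRONGLY_ORDERED   EQU     0x00000C02   ; TEX=000, C=0, B=0
        ATTR_DEVICE             EQU     0x00000C06   ; TEX=000, C=0, B=1 (shareable device)
        ATTR_NORMAL_WT          EQU     0x00000C0A   ; TEX=000, C=1, B=0 (write-through)
        ATTR_NORMAL_WB          EQU     0x00000C0E   ; TEX=000, C=1, B=1 (write-back)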

    If the MMU is disabled, then instruction fetches behave as if the target memory region is configured as normal memory, L1/L2 cacheable (write policy irrelevant), while data accesses behave as if the target memory region is configured as strongly-ordered. This incoherency means you have to be a bit careful when the MMU is disabled, and I recommend configuring and enabling the MMU at the earliest convenient moment.

    The timing measurements mostly show that terrible performance results if instruction caching is disabled (by either disabling it explicitly in the system control register, or by enabling the MMU but running code out of memory marked device or strongly-ordered). This is not very surprising.


    Addendum: I've just uploaded a small example that may be helpful in constructing section translation tables. It is for the am335x but should be easily adaptable to omap3.
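
    (Not the uploaded example itself, just a rough sketch of the general idea, reusing the illustrative write-back attribute value from above; the register usage and label names are made up.)

    ;==================================================================
    ; Sketch: fill a first-level table with 4096 flat-mapped (VA = PA)
    ; 1MB section entries. r0 is assumed to hold a 16KB-aligned table base.
    ;==================================================================
    CreateFlatSectionTable:
            MOV     r1, #0                       ; section index = VA[31:20]
            LDR     r2, =0x00000C0E              ; illustrative attributes: Normal write-back, AP=RW, domain 0
    FillLoop:
            ORR     r3, r2, r1, LSL #20          ; descriptor = (index << 20) | attributes
            STR     r3, [r0, r1, LSL #2]         ; table[index] = descriptor
            ADD     r1, r1, #1
            CMP     r1, #0x1000                  ; 4096 entries cover the whole 4GB address space
            BNE     FillLoop
            BX      lr

    After filling the table you would still need to write its base address to TTBR0, program the Domain Access Control Register, and invalidate the TLBs before setting the M bit in the CP15 Control Register.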

  • Hi matthijs,

    I cannot agree with you.

    Note that device and strongly-ordered regions (TEX=0,C=0) should only be used for peripherals, and incur a serious performance penalty. They should never be used for code execution.

    How come you are so sure?

    Have you never made any application programs?

    How should we configure the attributes to set normal memory as uncached?

    Best regards,

    Yasuhiko Koumoto.

  • Strongly-ordered and device memory impose additional constraints compared to normal uncacheable memory, which adversely impact performance without any benefit when targeting memory. In essence, accesses to device memory behave more like remote procedure calls. For example, performing 8 sequential byte-stores to normal uncacheable memory can be (and often is) merged to become a single dword-store. To device memory, they will always remain 8 individual byte-stores on the AXI bus. If strongly-ordered, then moreover the CPU will wait for each store to complete.

    Normal uncacheable memory is configured by setting TEX=100 (binary), C=0, B=0. Or, in my example C code, type_normal( nc, nc ). It should obviously also not be used for code execution, as the performance impact of running without instruction caching is really severe.
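
    (As an illustration only, with a made-up base address, such a region might be encoded as a section entry like this:)

        ; Hypothetical example: 1MB section mapped as Normal, non-cacheable
        ; (TEX=0b100, C=0, B=0), AP=0b11 (read/write), domain 0.
        ATTR_NORMAL_NC  EQU     0x00004C02       ; TEX bits[14:12]=100, C=0, B=0, AP=11, section bits[1:0]=10
        ; A section covering the 1MB at example address 0x80000000 would then be
        ; 0x80000000 | ATTR_NORMAL_NC = 0x80004C02.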

    If at all practical, the use of explicit cache maintenance should be considered preferable for performance reasons. On the Cortex-A8, you can moreover use the Preload Engine to fetch data to and/or evict data from L2 cache in the background while software is performing other tasks (this is especially useful when processing data in a streaming fashion).

  • Hi matthijs,

    are there any problems other than the performance issue?

    If there is no problem, the memory attribute should be left to the developer.

    The SDRC should work correctly under any memory attributes.

    I think this thread is not about how we should configure memory attributes, but about how we can solve Gopu's problem.

    I wonder why you mentioned another MMU setting example.

    Most likely the only reason why cache settings have an effect on whether or not the error occurs is because they affect the type (e.g. burst or not) and rate of requests to the SDRC.

    Your comment above would be possible because the SDRC worked well under the uncached attribute.

    However, I think the problem has nothing to do with the memory type (or attribute).

    Anyway, the SDRAM mode register setting might need to be revised.

    Best regards,

    Yasuhiko Koumoto.

  • yasuhikokoumoto wrote:

    are there any problems other than performance issue?

    If there would be no problem, the memory attribute should be left to the developer.

    Performance timings were being discussed. I raised the issue of memory attributes in that context.

    But yes, there are more restrictions w.r.t. device and strongly-ordered memory. For example, unaligned access is forbidden (regardless of the strict alignment checking flag). More restrictions can be found by browsing the ARM Architecture Reference Manual; for instance, I just found: "in a VMSA implementation when any associated MMU is enabled, any multi-access instruction that loads or stores the PC must access only Normal memory. If the instruction accesses Device or Strongly-ordered memory the result is UNPREDICTABLE." Such a load is common when returning from a function, which means that putting your stack in device or strongly-ordered memory is a bad idea.

    The SDRC should work correctly under any memory attributes.

    I agree, which is why I said the main issue lies there, but since I have no experience with the omap3 sdrc (only with the substantially different emif4d found in later devices) I don't immediately have any suggestions as to what might be the cause. Gopu said he will investigate it once his DDR memory arrives, so I will await that.


    Anyway, the SDRAM mode register setting might need to be revised.

    Misconfiguration of the memory typically causes data corruption rather than bus errors, since the memory controller has no way to verify the integrity of the response.