ARM Cortex A8 : Enabling D Cache aborts

I am using an OMAP3515 (ARM Cortex-A8) with the I-cache, D-cache, branch prediction and MMU enabled.

I get a data abort when I try to copy a 600 KB frame buffer from one external memory region to another. After the data abort, I noticed that the SDR (i.e. the SDRAM) is no longer accessible.

I have set up the MMU so that PA = VA.

There is no issue if I copy a smaller amount of data.

Also, if I disable the D-cache there is no abort and everything works fine, but I would like to keep the D-cache enabled for faster access.

Thanks and regards,

Gopu

  • Hello Gopu,

    how did you solve the problem where internal RAM access was slower than external RAM access when the MMU was enabled?

    In the cases that succeeded, how much data did you transfer?

    Is it less than the D-cache size?

    By the way, which OMAP3515 board are you using?

    What is the LPDDR part?

    Best regards,

    Yasuhiko Koumoto.

  • Hello,

    Thanks for the reply. Here are the details.

    Internal RAM slow:

    This was due to the C bit and B bit settings in the MMU translation table entries. By mistake I had not set the cacheable and bufferable bits. After setting these bits, internal RAM started working fine.

    The OMAP3515 has 64 KB of internal RAM. I tested a 47 KB NEONCopyPLD from one region to another and it worked fine, i.e. src = 0x40204000, dest = 0x40204400, size = 1024*47.

    External RAM and abort:

    This issue still exists. I want to copy 600 KB from one region of SDRAM to another region of SDRAM, but it aborts if the D-cache is enabled.

    I tried 300 KB; it works some of the time, but most of the time it goes to the prefetch/data abort handler. The status register values after the abort are as follows:

        

          Data Fault Status Register is 0

          Instruction Fault Status Register is 0x1008

          Instruction Fault Address Register is 0x80437314 (this is my code region)

          Data Auxiliary Fault Status Register is 0
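The IFSR value reported above can be decoded by hand. The following is a host-side sketch, not target code. It assumes the ARMv7-A short-descriptor IFSR layout (FS[3:0] in bits [3:0], FS[4] in bit 10, an external-abort qualifier in bit 12) and lists only a few common status codes. `decode_ifsr` and `FAULT_STATUS` are hypothetical names introduced for illustration.

```python
# Host-side helper: decode an ARMv7-A IFSR value (short-descriptor format).
# Assumption: FS[4] is bit 10, FS[3:0] are bits [3:0], bit 12 qualifies
# external aborts. Only a subset of status codes is listed here.
FAULT_STATUS = {
    0b00010: "debug event",
    0b00101: "translation fault (section)",
    0b00111: "translation fault (page)",
    0b01000: "synchronous (precise) external abort",
    0b01001: "domain fault (section)",
    0b01011: "domain fault (page)",
    0b01100: "external abort on translation table walk (L1)",
    0b01101: "permission fault (section)",
    0b01110: "external abort on translation table walk (L2)",
    0b01111: "permission fault (page)",
}


def decode_ifsr(ifsr):
    """Split an IFSR value into its 5-bit status code and external-abort bit."""
    fs = (((ifsr >> 10) & 1) << 4) | (ifsr & 0xF)
    ext = (ifsr >> 12) & 1
    meaning = FAULT_STATUS.get(fs, "unlisted status 0b{:05b}".format(fs))
    return {"fs": fs, "ext": ext, "meaning": meaning}
```

On that reading, `decode_ifsr(0x1008)` reports a precise external abort on an instruction fetch rather than an MMU translation or permission fault. The exact meaning of bit 12 is implementation defined and should be checked in the Cortex-A8 TRM.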

    On this board a single-data-rate LPSDR is connected to CS0, i.e. starting at 0x80000000.

        

    I am using only the L1 translation table in the MMU, and I have not enabled the L2 cache.

    Thanks and regards,

    Gopu

  • Here is my code

                .arm

                .sect     EntryOnReset

    ;//         Module Imports and exports          

                .global   ResetHandler

                .global   RelocateImage

                .global   OasysEntry

                .global   BoardInit

                .global    __stack

                .global   __STACK_SIZE

      .global   NEONCopyPLD

      .global   EnableCaches

      .global   DisableCaches

      .global   MemmoryTest

                .global   core_init

    ;// Stack Size Definition.

    UND_Stack_Size  .EQU     0x00000100

    SVC_Stack_Size  .EQU     0x00001000      ;//Only main function stack is to be mentioned here.

    ABT_Stack_Size  .EQU     0x00000100

    FIQ_Stack_Size  .EQU     0x00000100

    IRQ_Stack_Size  .EQU     0x00000100

    USR_Stack_Size  .EQU     0x00000100

    ;//******************************************************************************

    ;//                                EQUATES

    ;//******************************************************************************

    ;// Standard definitions of Mode bits and Interrupt (I & F) flags in PSRs

    Mode_USR        .EQU     0x10

    Mode_FIQ        .EQU     0x11

    Mode_IRQ        .EQU     0x12

    Mode_SVC        .EQU     0x13

    Mode_ABT        .EQU     0x17

    Mode_UND        .EQU     0x1B

    Mode_SYS        .EQU     0x1F

    I_Bit           .EQU     0x80            ;// when I bit is set, IRQ is disabled

    F_Bit           .EQU     0x40            ;// when F bit is set, FIQ is disabled

    ;//******************************************************************************

    ;//                                EQUATES

    ;//******************************************************************************

    FIQ_IRQ_DISABLE .EQU  0xC0               ;//Disable both FIQ and IRQ.

    ;//******************************************************************************

    ;//                             CODE GENERATION DIRECTIVES

    ;//******************************************************************************

    ;// Area Definition and Entry Point

    ;// Startup Code must be linked first at Address at which it expects to run.

    ;//------------------------------------------------------------------------------

    ;//******************************************************************************

    ;// Routine name    : ENTRY

    ;// Description     : Entry point for software.

    ;// Assumptions     : <none>

    ;// Tainted registers  :  <none>

    ;//  Functions called  :

    ;// Low level

    ;// Requirements       :

    ;//

    ;//    1. This routine shall initialize the Stack pointer.

    ;//

    ;//    2. This routine shall perform architecture specific

    ;//       initializations by calling InitCPU routine.

    ;//******************************************************************************

    ResetHandler:

    ;==================================================================

    ; Enable access to NEON/VFP by enabling access to Coprocessors 10 and 11.

    ; Enables Full Access i.e. in both privileged and non privileged modes

    ;==================================================================

            MRC     p15, #0, r0, c1, c0, #2      ; Read Coprocessor Access Control Register (CPACR)

            ORR     r0, r0, #(0xF << 20)       ; Enable access to CP 10 & 11

            MCR     p15, #0, r0, c1, c0, #2      ; Write Coprocessor Access Control Register (CPACR)

            ISB

    ;==================================================================

    ; Switch on the VFP and NEON hardware

    ;=================================================================

            MOV     r0, #0x40000000

            VMSR    FPEXC, r0                   ; Write FPEXC register, EN bit set

        ;@ Disable MMU.

        MRC p15, #0, r1, c1, c0, #0                               ;@ Read Control Register configuration data.

        BIC r1, r1, #0x1

        MCR p15, #0, r1, c1, c0, #0                               ;@ Write Control Register configuration data.

          

        ;@ Disable L1 Caches.

        MRC p15, #0, r1, c1, c0, #0                               ;@ Read Control Register configuration data.

        BIC r1, r1, #(0x1 << 12)                                ;@ Disable I Cache.

        BIC r1, r1, #(0x1 << 2)                                 ;@ Disable D Cache.

        MCR p15, #0, r1, c1, c0, #0                               ;@ Write Control Register configuration data

       

        ;@ Invalidate L1 Caches.

        ;@ Invalidate Instruction cache.

        MOV r1, #0

        MCR p15, #0, r1, c7, c5, #0

        ;@ Invalidate Data cache.

        ;@ To make the code general purpose, calculate the

        ;@ cache size first and loop through each set + way.

        MRC p15, #1, r0, c0, c0, #0                               ;@ Read Cache Size ID.

    ;TBR LDR r3, #0x1FF

        MOV r3, #0x1FF

        AND r0, r3, r0, LSR #13                                 ;@ r0 = no. of sets - 1.

        MOV r1, #0                              ;@ r1 = way counter way_loop.

    way_loop:

        MOV r3, #0                              ;@ r3 = set counter set_loop.

    set_loop:

        MOV r2, r1, LSL #30

    ;TBR ORR r2, r3, LSL #5                              ;@ r2 = set/way cache operation format.

        ORR r2, r2, r3, LSL #6                              ;@ r2 = set/way cache operation format (set index starts at bit 6 for the Cortex-A8's 64-byte lines).

        MCR p15, #0, r2, c7, c6, #2                               ;@ Invalidate the line described by r2.

        ADD r3, r3, #1                              ;@ Increment set counter.

        CMP r0, r3                              ;@ Last set reached yet?

        BGE set_loop                                ;@ If not, iterate set_loop (BGE so that the last set, r0 = sets - 1, is included),

        ADD r1, r1, #1                              ;@ else, next.

        CMP r1, #4                              ;@ Last way reached yet?

        BNE way_loop                                ;@ if not, iterate way_loop.

        ;@ Invalidate TLB

        MCR p15, #0, r1, c8, c7, #0

        ;@ Branch Prediction Enable.

        MOV r1, #0

        MRC p15, #0, r1, c1, c0, #0                               ;@ Read Control Register configuration data.

        ORR r1, r1, #(0x1 << 11)                                ;@ Global BP Enable bit.

        MCR p15, #0, r1, c1, c0, #0                               ;@ Write Control Register configuration data.

        ;@ Enable D-side Prefetch

        MRC p15, #0, r1, c1, c0, #1                               ;@ Read Auxiliary Control Register.

        ORR r1, r1, #(0x1 <<2)                              ;@ Enable D-side prefetch.

        MCR p15, #0, r1, c1, c0, #1;                              ;@ Write Auxiliary Control Register.

        DSB

        ISB

        ;@ DSB causes completion of all cache maintenance operations appearing in program

        ;@ order before the DSB instruction.

        ;@ An ISB instruction causes the effect of all branch predictor maintenance

        ;@ operations before the ISB instruction to be visible to all instructions

        ;@ after the ISB instruction.

        ;@ Initialize PageTable.

        ;@ It would be faster to create this in a read-only section in an assembly file.

                                                        ;@ descriptor.

        LDR r1, tlb_l1_base

        MOV r2, #0

        MOV r4, #1

    write_pte

        MOV r0, #0x0E

        ORR r0, r0, r4, LSL #0xA

        ORR r0, r0, r4, LSL #0xB

        ORR r0, r0, r2, LSL #20

        STR r0, [r1]

        ADD r1, r1, #4

        ADD r2, r2, #1                                 ;@ Increment section index (loop counter).

        SUBS r3, r2, #4096

        BNE write_pte

        ;@ Initialize MMU.

        MOV r1,#0x0

        MCR p15, #0, r1, c2, c0, #2                               ;@ Write Translation Table Base Control Register.

        LDR r1, tlb_l1_base

        MCR p15, #0, r1, c2, c0, #0                               ;@ Write Translation Table Base Register 0.

        ;@ In this simple example, do not use TRE or Normal Memory Remap Register.

        ;@ Set all Domains to Manager.

        MOV r1, #0xFFFF             ; Provide Manager access, so the access permission (AP) bits are not checked.

        ORR r1, r1, r1, LSL #0x10

        MCR p15, #0, r1, c3, c0, #0                                   ;@ Write Domain Access Control Register.

        ;@ Enable MMU

        MRC p15, #0, r1, c1, c0, #0                                   ;@ Read Control Register configuration data.

        ORR r1, r1, #0x1                                    ;@ Bit 0 is the MMU enable.

        MCR p15, #0, r1, c1, c0, #0                                   ;@ Write Control Register configuration data.

        B   StackInit                                               ;@ Branch over the literal below so it is not executed as an instruction.

    tlb_l1_base .word 0x40200000

    ;//******************************************************************************

    ;//                              SETUP STACK POINTERS FOR USR MODE

    ;//******************************************************************************

    ;*------------------------------------------------------

    ;* INITIALIZE THE USER MODE STACK

    ;*------------------------------------------------------

    StackInit:

      LDR     sp, c_stack

      LDR     r0, c_STACK_SIZE

      ADD     sp, sp, r0

    ;*-----------------------------------------------------

    ;* Clear lower 3 bits for 64-bit alignment.

    ;*-----------------------------------------------------

      BIC     sp, sp, #0x07

      ;// Set IRQ and FIQ bits in CPSR to disable all interrupts.

      MRS     R0, CPSR

      STMFD   SP!, {R0}                   ;// Store it onto stack

      ORR     R1, R0, #FIQ_IRQ_DISABLE

      MSR     CPSR_c, R1

    ;//******************************************************************************

    ;//                              SETUP STACK FOR OTHER MODES

    ;//******************************************************************************

      MOV     R0, SP

    ;//  Enter Undefined Instruction Mode and set its Stack Pointer

            MSR     CPSR_c, #Mode_UND|I_Bit|F_Bit

            MOV     SP, R0

            SUB     R0, R0, #UND_Stack_Size

    ;//  Enter Abort Mode and set its Stack Pointer

            MSR     CPSR_c, #Mode_ABT|I_Bit|F_Bit

            MOV     SP, R0

            SUB     R0, R0, #ABT_Stack_Size

    ;//  Enter FIQ Mode and set its Stack Pointer

            MSR     CPSR_c, #Mode_FIQ|I_Bit|F_Bit

            MOV     SP, R0

            SUB     R0, R0, #FIQ_Stack_Size

    ;//  Enter IRQ Mode and set its Stack Pointer

            MSR     CPSR_c, #Mode_IRQ|I_Bit|F_Bit

            MOV     SP, R0

            SUB     R0, R0, #IRQ_Stack_Size

    ;//  Enter Supervisor Mode and set its Stack Pointer

            MSR     CPSR_c, #Mode_SVC|I_Bit|F_Bit

            MOV     SP, R0

            SUB     R0, R0, #SVC_Stack_Size

            MSR     CPSR_c, #Mode_SVC|I_Bit

    ;//******************************************************************************

    ;//                                   MOVE TO myfunc

    ;//******************************************************************************

    ;//  The following routine copies the loaded image to execution region.

                    BL      RelocateImage

    ;//  The following routine initialises the Omap3515.

         BL BoardInit

    ;//  The following routine enables the MMU.

                    ;BL      EnableMMU

    ;//  The following routine enables the I cache.

                    BL      EnableICaches

    ;//  The following routine enables branch prediction.

                    BL      EnableBrachPrediction

    ;//  The following routine enables the D cache.

                    BL      EnableDCaches

    ;//  The following routine enables the L2 cache.

                    ;BL      EnableL2UnifiedCache

    ;//  The following routine runs the memory test.

                    BL      MemmoryTest

    ;===================================================================

    ; Enable MMU and Branch to __main

    ; Leaving the caches disabled until after scatter loading.

    ;===================================================================

        .global EnableMMU

    ;******************************************************************************

    ;               c1, Control Register

    ;       [0] M bit Banked                  Enables the MMU:

    ;                                         0 = MMU disabled, reset value

    ;                                         1 = MMU enabled.

    ;******************************************************************************

    EnableMMU:

        ;Read the c1 register

        mrc p15, #0, r0, c1, c0, #0

        ;Set bit 0 (M bit) - Enables the MMU

        orr r0, r0, #0x1

        ; Write back to c1 register to enable MMU

        mcr p15, #0, r0, c1, c0, #0

        BX      lr

        .global EnableICaches

    ;==================================================================

    ;  This API enables instruction cache.

    ;==================================================================

    EnableICaches:

            MRC     p15, #0, r0, c1, c0, #0      ; Read System Control Register

            ORR     r0, r0, #(0x1 << 12)         ; Set I bit 12 to enable I Cache

            ;BIC   r0, r0, #(0x1  <<12)         ; Clear bit 0

            MCR     p15, #0, r0, c1, c0, #0      ; Write System Control Register

            BX      lr

        .global EnableDCaches

    ;==================================================================

    ;  This API enables data cache.

    ;==================================================================

    EnableDCaches:

            MRC     p15, #0, r0, c1, c0, #0      ; Read System Control Register

           ORR     r0, r0, #(0x1 << 2)          ; Set C bit  2 to enable D Cache

           ;BIC   r0, r0, #(0x1  << 2)           ; Clear bit 0              

           ;BIC     r0, r0, #(0x1 << 1)          ; disable alignment checks

            MCR     p15, #0, r0, c1, c0, #0      ; Write System Control Register

            BX      lr

        .global EnableL2UnifiedCache

    ;==================================================================

    ; Enable Cortex-A8 Level2 Unified Cache

    ;==================================================================

    EnableL2UnifiedCache:

            MRC     p15, #0, r0, c1, c0, #1      ; Read Auxiliary Control Register

            ORR     r0, r0, #2                     ; L2EN bit, enable L2 cache

            ;BIC   r0, r0, #(0x1  << 1)         ; L2EN bit, disable L2 cache

            ;ORR     r0, r0, #(0x1  << 4)        ;Enables speculative accesses on AXI

            ORR     r0, r0, #(0x1  << 4)        ;Enables speculative accesses on AXI

            ORR     r0, r0, #(0x1  << 5)        ;Enables caching NEON data within the L1 data cache

            MCR     p15, #0, r0, c1, c0, #1      ; Write Auxiliary Control Register

            BX      lr

        .global EnableBrachPrediction

    ;==================================================================

    ;  This API enables branch prediction

    ;==================================================================

    EnableBrachPrediction:

            MRC     p15, #0, r0, c1, c0, #0      ; Read System Control Register

            ORR     r0, r0, #(0x1 << 11)        ; Set Z bit 11 to enable branch prediction

            ;BIC    r0, r0, #(0x1  << 11)       ; Disable all forms of branch prediction

            MCR     p15, #0, r0, c1, c0, #0      ; Write System Control Register

            BX      lr

          

    c_stack:        .long   __stack

    c_STACK_SIZE:   .long   __STACK_SIZE

    ;//******************************************************************************

    ;//                                POINTERS TO VARIABLES

    ;//******************************************************************************

    ;    ENDIF

        .END
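The D-cache invalidate loop in the startup code builds each set/way operand from a way counter and a set counter. The host-side sketch below (not target code) shows how the set shift and the loop bound affect which lines such a loop reaches. It assumes a 16 KB, 4-way L1 D-cache with 64-byte lines (64 sets); those figures are assumptions and should be checked against the Cache Size ID register on the actual part. `touched_lines` is a hypothetical helper.

```python
def touched_lines(num_sets, num_ways, line_bytes, set_shift, sets_iterated):
    """Return the (way, set) pairs the hardware would decode from the operands
    written by a set/way loop that shifts the set counter left by set_shift
    and iterates set counters 0 .. sets_iterated - 1 for every way."""
    line_shift = line_bytes.bit_length() - 1        # log2(line size in bytes)
    way_shift = 32 - (num_ways - 1).bit_length()    # way field sits at the top of the word
    touched = set()
    for way in range(num_ways):
        for set_counter in range(sets_iterated):
            operand = (way << way_shift) | (set_counter << set_shift)
            hw_set = (operand >> line_shift) & (num_sets - 1)
            hw_way = operand >> way_shift
            touched.add((hw_way, hw_set))
    return touched


# Assumed geometry: 64 sets, 4 ways, 64-byte lines (256 lines in total).
shift5_last_set_skipped = len(touched_lines(64, 4, 64, 5, 63))
shift6_all_sets = len(touched_lines(64, 4, 64, 6, 64))
```

Under these assumptions, a shift of 5 with the last set skipped reaches 128 of the 256 lines, while a shift of 6 over all 64 sets reaches all 256. Lines that are never invalidated keep whatever state they had at power-up.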

  • Hello Gopu,

    thank you for your detailed explanations.

    May I ask why the board is fitted with LPSDR SDRAM?

    According to the OMAP3515 reference manual (http://www.tij.co.jp/jp/lit/ds/symlink/omap3515.pdf),

    section "6.4.2 SDRAM Controller Subsystem (SDRC)" contains the following description:

    The SDRC module only supports lowpower double-data-rate (LPDDR) SDRAM devices.

    Although I don't know the LPDDR specification well, its commands would be double data rate.

    My guess is that with the D-cache off the transactions succeed by chance, because each transaction is a single access.

    I am afraid that the burst transactions issued with the D-cache on do not match the SDRAM specification.

    The OMAP3515 reference manual does not describe the SDRAM timing chart, so I am not sure whether my guess is correct.

    Can you replace the SDRAM with LPDDR?

    I'm sorry, but this is all I can say from the information you have given.

    Best regards,

    Yasuhiko Koumoto.

  • Hi,

    Thanks a lot for the reply. The OMAP3515 supports both SDR and DDR, but we have mounted SDR for now. We will be getting a new board with DDR in about a month, so I can check it then. Other than that, do you think I am missing any MMU settings?

    Thanks a lot again,

    Gopu
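On the question of MMU settings: the `write_pte` loop in the startup code gives all 4096 sections the same attributes (C=1, B=1), so peripheral and unmapped address ranges are also treated as cacheable normal memory, for which the architecture permits speculative accesses. A common alternative is to mark only RAM as cacheable. The host-side sketch below generates such a table. The SDRAM size (128 MB) and the `RAM_REGIONS` list are assumptions made for illustration, and whether this is related to the abort is not established in this thread.

```python
SECTION = 0x2            # bits [1:0] = 0b10: 1 MB section descriptor
B_BIT = 1 << 2           # bufferable
C_BIT = 1 << 3           # cacheable
AP_FULL = 0x3 << 10      # AP[1:0] = 0b11, the same bits the write_pte loop sets

# Hypothetical RAM map as (first MB index, last MB index), inclusive.
RAM_REGIONS = [
    (0x402, 0x402),      # internal SRAM at 0x40200000 (fits inside one section)
    (0x800, 0x87F),      # ASSUMED 128 MB of SDRAM on CS0 at 0x80000000
]


def section_descriptor(mb_index):
    """Flat-mapped (PA = VA) section descriptor for the given megabyte."""
    desc = (mb_index << 20) | AP_FULL | SECTION
    if any(lo <= mb_index <= hi for lo, hi in RAM_REGIONS):
        desc |= C_BIT | B_BIT          # cacheable, bufferable normal memory
    return desc                        # otherwise C=0, B=0 (strongly-ordered when TEX=0)


def build_l1_table():
    """All 4096 first-level entries, in table order."""
    return [section_descriptor(i) for i in range(4096)]
```

For example, `section_descriptor(0x800)` gives 0x80000C0E, the same value the posted loop writes for that section, while `section_descriptor(0x480)` gives 0x48000C02 with the C and B bits clear.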

  • Hello Gopu,

    I'm sorry, I had not looked at your code before; I have looked at it now.

    I wonder why you enable the D-cache and the MMU separately.

    Because the D-cache takes effect when the MMU is enabled, I think the D-cache and the MMU should be enabled at the same time.

    However, this may have nothing to do with the problem.

    The other parts look OK to me.

    By the way, can you try an experiment with write-through?

    Bit 31 of the Cache Size Identification Register tells you whether the processor supports write-through (i.e. C=1, B=0).

    In the Cortex-A9, write-through mode is identical to non-cacheable, but I could not find such a statement in the Cortex-A8 TRM.

    I'm not sure whether the behaviour would change, though.

    Best regards,

    Yasuhiko Koumoto.
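The Cache Size Identification Register mentioned above can be decoded on a host once its value has been read on the target. The following is a sketch that assumes the ARMv7-A field layout (WT in bit 31, WB in bit 30, RA in bit 29, WA in bit 28, NumSets in bits [27:13], Associativity in bits [12:3], LineSize in bits [2:0]). `decode_ccsidr` is a hypothetical helper, and the sample value is constructed for a 16 KB, 4-way cache with 64-byte lines rather than read from hardware.

```python
def decode_ccsidr(value):
    """Decode a Cache Size ID register value into cache geometry and policy flags."""
    line_bytes = 1 << ((value & 0x7) + 4)        # LineSize = log2(words per line) - 2
    ways = ((value >> 3) & 0x3FF) + 1            # Associativity field holds ways - 1
    num_sets = ((value >> 13) & 0x7FFF) + 1      # NumSets field holds sets - 1
    return {
        "write_through": bool((value >> 31) & 1),
        "write_back": bool((value >> 30) & 1),
        "read_allocate": bool((value >> 29) & 1),
        "write_allocate": bool((value >> 28) & 1),
        "num_sets": num_sets,
        "ways": ways,
        "line_bytes": line_bytes,
        "size_bytes": num_sets * ways * line_bytes,
    }


# Sample value built for illustration: 64 sets, 4 ways, 64-byte lines.
sample = decode_ccsidr(0xE007E01A)
```

With the sample value, `sample["write_through"]` is true and `sample["size_bytes"]` is 16384. The line size reported here is also what determines the set shift in a set/way maintenance loop.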

  • Hi,

    Thanks a lot for going through the code. I had originally implemented it by enabling the I-cache, D-cache and MMU at the same time,

    but modified it after going through the links below: 1.  ARM Information Center

    and 2. ARM Information Center

    I have already tried all the combinations of the C and B bits, but the problem still exists.

    Thanks and regards,

    Gopu

  • Hello vskgopu,


    The contents of

    1.  ARM Information Center

    appear to have no errors, but the code that enables the I-cache and D-cache is not shown.

    The contents of

    2. ARM Information Center

    also appear to have no errors.

    Unless the I-cache or D-cache and the MMU are enabled at the same time, the 'Not Allowed' state of link 2 would occur.

    Of course, I know these comments are probably not related to your problem.

    Best regards,

    Yasuhiko Koumoto.

  • Hello,

    may I confirm one thing?

    What was the result of your experiment with C=0 and B=0?

    That case is equivalent to disabling the D-cache.

    Did that combination succeed?

    Best regards,

    Yasuhiko Koumoto.

  • Hello,

    Here is the test report.

    Test 1:

      MMU - Enabled

      I Cache - Enabled

      D Cache - Enabled

      C Bit - Disabled

      B Bit - Disabled

      Branch Prediction - Enabled

      Time taken to execute the code in internal RAM is 312 milliseconds.

    Test 2:

      MMU - Enabled

      I Cache - Enabled

      D Cache - Disabled

      C Bit - Enabled

      B Bit - Enabled

      Branch Prediction - Enabled

      Time taken to execute the code in internal RAM is 7 milliseconds.

    The following are further test results:

    Time (ms)         Execution Region      MMU       I Cache     D Cache    C Bit      B Bit         Branch Prediction

    7                 SRAM                 Enabled   Enabled     Enabled    Enabled    Enabled       Enabled

    292               SRAM                 Enabled   Disabled    Disabled   Enabled    Enabled       Enabled

    7                 SRAM                 Disabled  Enabled     Disabled   Enabled    Enabled       Enabled

    292               SRAM                 Disabled  Disabled    Disabled   Enabled    Enabled       Enabled

    312               SRAM                 Enabled   Enabled     Enabled    Disabled   Disabled      Enabled

    7                 SRAM                 Enabled   Enabled     Enabled    Enabled    Disabled      Enabled

    312               SRAM                 Enabled   Enabled     Enabled    Disabled   Enabled       Enabled

    7                 SRAM                 Enabled   Enabled     Disabled   Enabled    Enabled       Enabled

    Thanks and regards,

    Gopu

  • Hello Gopu,

    thank you for the information. It's interesting.

    According to the results, the performance seems to depend only on the C bit, regardless of whether the D-cache is enabled or disabled.

    I checked the OMAP3515 block diagram shown below.

    OMAP3515.jpg

    As you can see, there is an L2 cache between the CPU and the SRAM.

    My guess is that the L1 caching attribute is absorbed by the L2 cache: because the L2 cache is disabled, it would react only to the C bit (which is part of ARCACHE/AWCACHE on the interconnect between the CPU and the L2 cache).

    That is, from the SRAM's point of view, the L1 D-cache state (enabled or disabled) does not affect SRAM performance when the MMU is enabled.

    This is only an assumption, and the following case alone cannot be explained by it.

    7  SRAM Disabled Enabled  Disabled C=1 B=1 Enabled

    By the way, I wanted to know the SDRAM results for each combination of the C bit and B bit.

    In the SDRAM case, did the transfer always fail? Or were there any conditions under which it succeeded?

    Best regards,

    Yasuhiko Koumoto.


  • Hello,

    Thanks a lot for the analysis. Here is the report of code in SDRAM

    Time (ms)         Execution Region      MMU       I Cache     D Cache    C Bit      B Bit         Branch Prediction

    7                 SDRAM                Enabled   Enabled     Enabled    Enabled    Enabled       Enabled

    483               SDRAM                Enabled   Disabled    Disabled   Enabled    Enabled       Enabled

    7                 SDRAM                Disabled  Enabled     Disabled   Enabled    Enabled       Enabled

    483               SDRAM                Disabled  Disabled    Disabled   Enabled    Enabled       Enabled

    483               SDRAM                Enabled   Enabled     Enabled    Disabled   Disabled      Enabled

    7                 SDRAM                Enabled   Enabled     Enabled    Enabled    Disabled      Enabled

    483               SDRAM                Enabled   Enabled     Enabled    Disabled   Enabled       Enabled

    7                 SDRAM                Enabled   Enabled     Disabled   Enabled    Enabled       Enabled

    Thanks and regards,

    Gopu

  • Hello Gopu,

    thank you.

    In the SDRAM case, the following two cases contradict my assumption.

    Can anyone explain these results consistently?

    Could it be that the C bit is ignored when both caches are disabled?

    Even so, the second case cannot be explained, just as in the SRAM case.

    483              SDRAM                Enabled  Disabled    Disabled  Enabled    Enabled      Enabled

    7                SDRAM                Disabled  Enabled    Disabled  Enabled    Enabled      Enabled

    Do the other conditions, which are not listed, cause the SDRAM crash?

    Best regards.

    Yasuhiko Koumoto.
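One reading that fits every row of the SDRAM table is that the benchmark time is dominated by whether instruction fetches are cached, not by the D-cache. The sketch below tests that hypothesis against the reported rows. It assumes the ARMv7-A behaviour that with the MMU disabled and the I-cache enabled instruction fetches are cacheable, and that with the MMU enabled the section's C bit decides. This is offered as an assumption to check, not as a confirmed diagnosis, and the function names are hypothetical.

```python
# Rows from the SDRAM report: (ms, mmu, icache, dcache, c_bit, b_bit).
ROWS = [
    (7,   True,  True,  True,  True,  True),
    (483, True,  False, False, True,  True),
    (7,   False, True,  False, True,  True),
    (483, False, False, False, True,  True),
    (483, True,  True,  True,  False, False),
    (7,   True,  True,  True,  True,  False),
    (483, True,  True,  True,  False, True),
    (7,   True,  True,  False, True,  True),
]


def instruction_fetch_cached(mmu, icache, c_bit):
    """Hypothesis: fetches hit the I-cache only if it is enabled and the
    region is cacheable (always, when the MMU is off; per C bit otherwise)."""
    if not icache:
        return False
    return c_bit if mmu else True


def hypothesis_matches(rows=ROWS, fast_ms=7):
    """True if 'fast' rows are exactly those with cached instruction fetches."""
    return all(
        instruction_fetch_cached(mmu, icache, c_bit) == (ms == fast_ms)
        for ms, mmu, icache, _dcache, c_bit, _b_bit in rows
    )
```

`hypothesis_matches()` holds for all eight rows, including the two cases called out above, and the SRAM table follows the same pattern of settings.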

  • Hello,

    To enable the L2 cache, is it enough to do the following, or do I need other settings as well?

    ;==================================================================

    ; Enable Cortex-A8 Level2 Unified Cache

    ;==================================================================

    EnableL2UnifiedCache:

            MRC     p15, #0, r0, c1, c0, #1      ; Read Auxiliary Control Register

            ORR     r0, r0, #2                     ; L2EN bit, enable L2 cache

            ;BIC   r0, r0, #(0x1  << 1)         ; L2EN bit, disable L2 cache

            ;ORR     r0, r0, #(0x1  << 4)        ;Enables speculative accesses on AXI

            ORR     r0, r0, #(0x1  << 4)        ;Enables speculative accesses on AXI

            ORR     r0, r0, #(0x1  << 5)        ;Enables caching NEON data within the L1 data cache

            MCR     p15, #0, r0, c1, c0, #1      ; Write Auxiliary Control Register

            BX      lr

    Thanks and regards,

    Gopu

  • Hello Gopu,

    you should clear the C bit in the CP15 Control Register c1 before initializing the L2 cache.

    Second, you should invalidate the L2 cache using a method similar to the one used for the L1 cache.

    These two steps are missing.

    Finally, you should set the C bit in the CP15 Control Register c1.

    For your reference, below are the L2 cache enable/disable sequences extracted from "Cortex™-A8 Technical Reference Manual Revision: r3p2".

    8.3 Enabling and disabling the L2 cache controller

    To enable the L2 cache following a reset or to change the settings of the L2 Cache Auxiliary Control Register, you must use the following sequence:

    1. Complete the processor reset sequence or disable the L2 cache.
    2. Program the L2 Cache Auxiliary Control Register. See c9, L2 Cache Auxiliary Control Register on page 3-95 for details.

    Note
    If you have configured the processor to support parity or ECC memory, you must enable those features before you can program the C bit.

        MRC p15, 1, <Rd>, c9, c0, 2 ; Read L2 Cache Auxiliary Control Register
        MCR p15, 1, <Rd>, c9, c0, 2 ; Write L2 Cache Auxiliary Control Register

    3. Program the Auxiliary Control Register to set the L2EN bit to 1. See c1, Auxiliary Control Register on page 3-47 for details.

        MRC p15, 0, <Rd>, c1, c0, 1 ; Read Auxiliary Control Register
        MCR p15, 0, <Rd>, c1, c0, 1 ; Write Auxiliary Control Register

    4. Program the C bit in the CP15 Control Register c1. See c1, Control Register on page 3-44 for details.

        MRC p15, 0, <Rd>, c1, c0, 0 ; Read Control Register
        MCR p15, 0, <Rd>, c1, c0, 0 ; Write Control Register

    To disable the L2 cache, but leave the L1 data cache enabled, use the following sequence:
    1. Disable the C bit.
    2. Clean and invalidate the L1 and L2 caches.
    3. Disable the L2 cache by clearing the L2EN bit to 0.
    4. Enable the C bit.
    Note
    To keep memory coherent when using cache maintenance operations, you must follow the L2 cache disabling sequence. Cache maintenance operations have an effect on the L1 and L2 caches when they are disabled. A cache maintenance operation can evict a cache line from the L1 data cache. If the L2EN bit is set to 1, the evicted cache line can be allocated to the L2 cache. If the L2EN bit is not set to 1, then evictions from the L1 data cache are sent directly to external memory using the AXI interface.

    Best regards,

    Yasuhiko Koumoto.
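The ordering described in this reply can be written down as a host-side checklist. The sketch below only models the order of register writes and the bit values involved; it does not touch hardware. The bit positions (C = bit 2 of the Control Register, L2EN = bit 1 of the Auxiliary Control Register) are the ones used in the code posted in this thread. The step names, including the `L2_INVALIDATE` placeholder, are hypothetical labels.

```python
SCTLR_C = 1 << 2       # data cache enable, as set in EnableDCaches
ACTLR_L2EN = 1 << 1    # L2 enable, as set in EnableL2UnifiedCache


def l2_enable_steps(sctlr, actlr, l2_aux):
    """Return the ordered (step, value) list for enabling the L2 cache:
    clear C, invalidate L2, program the L2 Cache Auxiliary Control Register,
    set L2EN, then set C again."""
    steps = []
    sctlr = sctlr & ~SCTLR_C & 0xFFFFFFFF
    steps.append(("SCTLR", sctlr))              # 1. C bit cleared first
    steps.append(("L2_INVALIDATE", None))       # 2. placeholder for the set/way loop
    steps.append(("L2ACTLR", l2_aux))           # 3. L2 Cache Auxiliary Control Register
    actlr = (actlr | ACTLR_L2EN) & 0xFFFFFFFF
    steps.append(("ACTLR", actlr))              # 4. L2EN set
    sctlr = sctlr | SCTLR_C
    steps.append(("SCTLR", sctlr))              # 5. C bit set last
    return steps
```

For example, `l2_enable_steps(0x1805, 0x0, 0x0)` starts by writing 0x1801 to the Control Register and ends by writing 0x1805, with the L2EN write in between.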