
L2 Cache (PL310) initialisation sequence

Hi,

I would like to validate the L2 cache using U-Boot code running on a dual-core Cortex-A9.

Here is my L2 cache initialisation code. While reading from and writing to a DDR memory location, I don't see the Drhit or Dwhit event counter registers being updated.

Kindly let me know: is MMU initialisation required to observe the event counter registers, or is the initialisation sequence wrong?

{no format}
l2_cache_init:
        
        MRC p15, 0, r0, c1, c0, 0 ;@ Read System Control Register
        ORR r0, r0, #(0x1 << 12) ;@ Set I bit 12 to enable I Cache
        ORR r0, r0, #(0x1 << 2) ;@ Set C bit 2 to enable D Cache
        ORR r0, r0, #(0x1 << 11) ;@ Set Z bit 11 to enable branch prediction
        MCR p15, 0, r0, c1, c0, 0 ;@ Write System Control Register
        ldr r0,L2CC_PL310
        @ Set aux cntrl
        @ Way size = 64KB
        ldr     r1, =0x31160000
        str     r1, [r0,#0x104]
        @ Set tag RAM latency
        @ 8 cycles RAM write access latency
        @ 8 cycles RAM read access latency
        @ 8 cycles RAM setup latency
        ldr     r1, =0x00000777
        str     r1, [r0,#0x108]
        @ Set Data RAM latency
        @ 8 cycles RAM write access latency
        @ 8 cycles RAM read access latency
        @ 8 cycles RAM setup latency
        ldr     r1, =0x00000777
        str     r1, [r0,#0x10C]
        @Cache maintenance - invalidate by way (0xff) - base offset 0x77C
        ldr     r1, =0xFF
        str     r1, [r0,#0x77C]
poll_invalidate:
        ldr     r1, [r0,#0x77C]
        tst     r1, #1
        bne     poll_invalidate
        @ Enable the event counters (bit 0) and reset counters 0 and 1 (bits 1 and 2)
        ldr     r1, =0x007
        str     r1, [r0,#0x200]
        @ Counter 1 (configuration register at 0x204): count Drhit events
        LDR     r1, =0x008
        STR     r1, [r0,#0x204]
        @ Counter 0 (configuration register at 0x208): count Dwhit events
        LDR     r1, =0x010
        STR     r1, [r0,#0x208]
        @ Enable the L2 cache controller
        LDR     r1, =0x1
        STR     r1, [r0,#0x100]
        mov     pc, lr
 
L2CC_PL310:
             .word   0xD46F4000
{no format}
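
The register values in the listing above can be cross-checked against its comments by decoding them. The following is a minimal sketch, assuming the PL310 field layout (way size in bits [19:17] and associativity in bit 16 of the Auxiliary Control Register, and 3-bit latency fields that hold cycles minus one). The helper names are hypothetical and are not part of U-Boot.

```c
#include <assert.h>
#include <stdint.h>

/* Way size in KB from the Auxiliary Control Register (offset 0x104).
 * Bits [19:17]: encodings 1..6 select 16KB..512KB. */
unsigned pl310_way_size_kb(uint32_t aux)
{
    unsigned field = (aux >> 17) & 0x7u;
    assert(field >= 1 && field <= 6);   /* 0 and 7 are reserved encodings */
    return 8u << field;                 /* 1 -> 16, 2 -> 32, 3 -> 64, ... */
}

/* Associativity from bit 16: 0 = 8-way, 1 = 16-way. */
unsigned pl310_ways(uint32_t aux)
{
    return (aux & (1u << 16)) ? 16u : 8u;
}

/* Latency in cycles from a Tag/Data RAM Latency Control Register
 * (offsets 0x108 / 0x10C). shift = 0 (setup), 4 (read) or 8 (write);
 * each 3-bit field stores (cycles - 1). */
unsigned pl310_latency_cycles(uint32_t reg, unsigned shift)
{
    assert(shift == 0 || shift == 4 || shift == 8);
    return ((reg >> shift) & 0x7u) + 1u;
}
```

With the values from the listing, 0x31160000 decodes to a 64KB way size across 8 ways (512KB of L2 in total), and 0x777 decodes to 8 cycles for each of the setup, read and write latencies, which agrees with the comments in the code.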
Parents
  • Hi MarekByKowki

    After enabling the MMU, I observed the Drhit and Dwhit event counter registers updating.
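
For reference, the counter setup used in this thread can be written out as named constants. This is only a sketch, assuming the PL310 register map (event source in bits [5:2] of each counter configuration register); the names are made up for illustration and do not come from U-Boot or any Arm header.

```c
#include <assert.h>
#include <stdint.h>

/* PL310 event counter register offsets from the controller base. */
enum {
    PL310_EV_CTRL     = 0x200,  /* bit 0 enable, bits 1/2 reset counter 0/1 */
    PL310_EV_CNT1_CFG = 0x204,
    PL310_EV_CNT0_CFG = 0x208,
    PL310_EV_CNT1_VAL = 0x20C,
    PL310_EV_CNT0_VAL = 0x210
};

/* Event source encodings used in this thread. */
enum {
    PL310_EV_SRC_DRHIT = 0x2,   /* data read hit  */
    PL310_EV_SRC_DWHIT = 0x4    /* data write hit */
};

/* Build a counter configuration value: the event source sits in
 * bits [5:2]; bits [1:0] (interrupt generation) are left at 0. */
uint32_t pl310_event_cfg(unsigned source)
{
    assert(source <= 0xFu);
    return (uint32_t)source << 2;
}
```

Under these assumptions the Drhit source encodes as 0x08 and the Dwhit source as 0x10, the same values the initialisation code writes to 0x204 and 0x208. With the base address 0xD46F4000 used in the thread, the counter 1 and counter 0 value registers fall at 0xD46F420C and 0xD46F4210, which are the addresses the test reads.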

    I ran a test that writes 512 KB of data to a cacheable region in DDR and then reads the same 512 KB 100 times, and observed a huge time difference. The results are below.

    Test 1:

    D-Cache, branch prediction and MMU enabled:

    Time taken is 15.917 seconds

    Test 2:

    D-Cache, branch prediction and MMU disabled:

    Time taken is 9.742 seconds

    I am not sure why the time taken is so much higher with the D-cache, branch prediction and MMU enabled (Test 1) than with them disabled (Test 2).

    As I understand it, enabling the D-cache, branch prediction and MMU should make reads and writes faster, but I observed this odd behaviour.

    Is this expected, or is my understanding wrong?
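
One way to check which features are actually enabled in Test 1 is to replay the System Control Register (SCTLR) updates from the initialisation code posted below. The sketch here models only the ORR/BIC steps applied to SCTLR, in the order they appear in that code. `sctlr_after_init` is a hypothetical helper written for this illustration, and it assumes nothing else modifies SCTLR in between.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)    /* MMU enable */
#define SCTLR_C (1u << 2)    /* data cache enable */
#define SCTLR_Z (1u << 11)   /* branch prediction enable */
#define SCTLR_I (1u << 12)   /* instruction cache enable */

/* Replay the SCTLR read-modify-write steps of l2_cache_init in order. */
uint32_t sctlr_after_init(uint32_t sctlr)
{
    sctlr |= SCTLR_I | SCTLR_C | SCTLR_Z;  /* entry: set I, C and Z */
    sctlr &= ~SCTLR_M;                     /* "Disable MMU" */
    sctlr &= ~(SCTLR_I | SCTLR_C);         /* "Disable L1 Caches" */
    sctlr |= SCTLR_Z;                      /* "Branch Prediction Enable" */
    sctlr |= SCTLR_M;                      /* "Enable MMU" */
    return sctlr;
}
```

Whatever value SCTLR starts with, this model ends with M and Z set and with C and I clear, because the caches are disabled partway through and no later step sets those bits again. If that matches what runs on the hardware, Test 1 may be executing with the L1 caches switched off, which would be worth ruling out before comparing the timings.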

    Here is my code to initialise the L2 cache and MMU:

    {no format}

    l2_cache_init:

    MRC p15, 0, r0, c1, c0, 0 ;@ Read System Control Register
    ORR r0, r0, #(0x1 << 12) ;@ Set I bit 12 to enable I Cache
    ORR r0, r0, #(0x1 << 2) ;@ Set C bit 2 to enable D Cache
    ORR r0, r0, #(0x1 << 11) ;@ Set Z bit 11 to enable branch prediction
    MCR p15, 0, r0, c1, c0, 0 ;@ Write System Control Register


    ldr r0,L2CC_PL310

    @ Set aux cntrl
    @ Way size = 64KB

    ldr r1, =0x31160000
    str r1, [r0,#0x104]

    @ Set tag RAM latency
    @ 8 cycles RAM write access latency
    @ 8 cycles RAM read access latency
    @ 8 cycles RAM setup latency

    ldr r1, =0x00000777
    str r1, [r0,#0x108]

    @ Set Data RAM latency
    @ 8 cycles RAM write access latency
    @ 8 cycles RAM read access latency
    @ 8 cycles RAM setup latency

    ldr r1, =0x00000777
    str r1, [r0,#0x10C]

    @Cache maintenance - invalidate by way (0xff) - base offset 0x77C
    ldr r1, =0xFF
    str r1, [r0,#0x77C]

    poll_invalidate:
    ldr r1, [r0,#0x77C]
    tst r1, #1
    bne poll_invalidate

    @ Enable Event Counter Control Register. Reset counter 0 and 1 values

    ldr r1, =0x007
    str r1, [r0,#0x200]

    @ Counter 1. Count Drhit event

    LDR r1, =0x008
    STR r1, [r0,#0x204]

    @ Counter 0. Count Dwhit event
    LDR r1, =0x010
    STR r1, [r0,#0x208]

    @ Ensure L2 remains disabled for the time being
    LDR r1, =0x0
    STR r1, [r0,#0x100]

    MOVW R9, #0x1080 ;@ Setting for CPU Config Address 0 register
    MOVT R9, #0xD456
    LDR R8,[R9]
    ORR R8, R8, #(1<<1) ;@ Setting for L2CC Cache frequency as 400MHz
    STR R8, [R9]


    ;@ Disable MMU.
    MRC p15, 0, r1, c1, c0, 0 ;@ Read Control Register configuration data.
    BIC r1, r1, #0x1
    MCR p15, 0, r1, c1, c0, 0 ;@ Write Control Register configuration data.

    ;@ Disable L1 Caches.
    MRC p15, 0, r1, c1, c0, 0 ;@ Read Control Register configuration data.
    BIC r1, r1, #(0x1 << 12) ;@ Disable I Cache.
    BIC r1, r1, #(0x1 << 2) ;@ Disable D Cache.
    MCR p15, 0, r1, c1, c0, 0 ;@ Write Control Register configuration data

    ;@ Invalidate L1 Caches.
    ;@ Invalidate Instruction cache.
    MOV r1, #0
    MCR p15, 0, r1, c7, c5, 0

    ;@ Invalidate Data cache.
    ;@ To make the code general purpose, calculate the
    ;@ cache size first and loop through each set + way.

    MRC p15, 1, r0, c0, c0, 0 ;@ Read Cache Size ID.
    LDR r3,=0x1ff
    AND r0, r3, r0, LSR #13 ;@ r0 = no. of sets - 1.

    MOV r1, #0
    way_loop:
    MOV r3, #0 @ r3 = set counter set_loop.
    set_loop:
    MOV r2, r1, LSL #30
    ORR r2, r3, LSL #5 @ r2 = set/way cache operation format.
    MCR p15, 0, r2, c7, c6, 2 @ Invalidate the line described by r2.
    ADD r3, r3, #1 @ Increment set counter.
    CMP r0, r3 @ Last set reached yet?
    BGT set_loop @ If not, iterate set_loop,
    ADD r1, r1, #1 @ else, next.
    CMP r1, #4 @ Last way reached yet?
    BNE way_loop @ if not, iterate way_loop.

    @ Invalidate TLB
    MCR p15, 0, r1, c8, c7, 0

    @ Branch Prediction Enable.
    MOV r1, #0
    MRC p15, 0, r1, c1, c0, 0 @ Read Control Register configuration data.
    ORR r1, r1, #(0x1 << 11) @ Global BP Enable bit.
    MCR p15, 0, r1, c1, c0, 0

    @ Create the translation tables. ttb_address denotes the address of the
    @ initial translation table. This must be a 16KB area of memory whose
    @ start address is aligned to a 16KB boundary, to which an L1 translation
    @ table can be written.
    @ Enable D-side Prefetch
    MRC p15, 0, r1, c1, c0, 1 @ Read Auxiliary Control Register.
    ORR r1, r1, #(0x1 <<2) @ Enable D-side prefetch.
    MCR p15, 0, r1, c1, c0, 1; @ Write Auxiliary Control Register.
    DSB
    ISB
    @ DSB causes completion of all cache maintenance operations appearing in program
    @ order before the DSB instruction.
    @ An ISB instruction causes the effect of all branch predictor maintenance
    @ operations before the ISB instruction to be visible to all instructions
    @ after the ISB instruction.
    @ Initialize PageTable.

    @ Create a basic L1 page table in RAM, with 1MB sections containing a flat
    @ (VA=PA) mapping, all pages Full Access, Strongly Ordered.

    @ It would be faster to create this in a read-only section in an assembly file.

    LDR r0, =0xDE2 @ r0 is the non-address part of
    @ descriptor.
    LDR r1, ttb_address
    LDR r3, = 4095
    write_pte:
    ORR r2, r0, r3, LSL #20 @ OR together address & default PTE bits.
    STR r2, [r1, r3, LSL #2] @ Write PTE to TTB.
    SUBS r3, r3, #1 @ Decrement loop counter.
    BNE write_pte

    @ For the first entry in the table, make it cacheable, Normal,
    @ write-back, write-allocate.
    BIC r0, r0, #0xc @ Clear CB bits.
    ORR r0, r0, #0x4 @ inner write-back, write allocate
    BIC r0, r0, #0x7000 @ Clear TEX bits.
    ORR r0, r0, #0x5000 @ set TEX as write-back, write allocate
    ORR r0, r0, #0x10000 @ shareable.
    STR r0, [r1]

    LDREQ r0, L2CC_PL310
    LDREQ r1, =0x1
    STREQ r1, [r0,#0x100]

    @ Initialize MMU.
    MOV r1,#0x0
    MCR p15, 0, r1, c2, c0, 2 @ Write Translation Table Base Control Register.
    LDR r1, ttb_address
    MCR p15, 0, r1, c2, c0, 0 @ Write Translation Table Base Register 0.

    @ In this simple example, do not use TRE or Normal Memory Remap Register.
    @ Set all Domains to Client.
    LDR r1, =0x55555555
    MCR p15, 0, r1, c3, c0, 0 @ Write Domain Access Control Register.

    @ Enable MMU
    MRC p15, 0, r1, c1, c0, 0 @ Read Control Register configuration data.
    ORR r1, r1, #0x1 @ Bit 0 is the MMU enable.
    MCR p15, 0, r1, c1, c0, 0

    mov pc, lr
    ttb_address:
    .word 0x18000

    L2CC_PL310:
    .word 0xD46F4000

    {no format}
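
The page table built above decides which addresses are cacheable, so it can help to decode the two descriptor values involved. This is a minimal sketch, assuming the ARMv7-A short-descriptor section format with TEX remap disabled, as the code comments state. All names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute fields of a first-level section descriptor. */
typedef struct {
    unsigned tex, c, b, s, ap, domain;
} section_attrs;

section_attrs decode_section(uint32_t desc)
{
    section_attrs a;
    assert((desc & 0x3u) == 0x2u);   /* bits [1:0] = 0b10 marks a section */
    a.b      = (desc >> 2)  & 0x1u;
    a.c      = (desc >> 3)  & 0x1u;
    a.domain = (desc >> 5)  & 0xFu;
    a.ap     = (desc >> 10) & 0x3u;
    a.tex    = (desc >> 12) & 0x7u;
    a.s      = (desc >> 16) & 0x1u;
    return a;
}

/* With TEX remap off, TEX = 0, C = 0, B = 0 means Strongly-ordered. */
int section_is_strongly_ordered(section_attrs a)
{
    return a.tex == 0 && a.c == 0 && a.b == 0;
}

/* Replay the BIC/ORR steps the listing applies to build entry 0. */
uint32_t first_entry(uint32_t d)
{
    d &= ~0xCu;       /* clear C and B */
    d |= 0x4u;        /* B = 1 */
    d &= ~0x7000u;    /* clear TEX */
    d |= 0x5000u;     /* TEX = 0b101 */
    d |= 0x10000u;    /* S = 1 (shareable) */
    return d;
}

/* Index of the first-level entry that covers a virtual address (1MB each). */
unsigned l1_index(uint32_t va)
{
    return va >> 20;
}
```

Under these assumptions the default value 0xDE2 decodes as Strongly-ordered with full access in domain 15, and entry 0 becomes 0x15DE6 (TEX = 0b101, C = 0, B = 1, shareable), which is write-back, write-allocate Normal memory. Only addresses below 0x00100000 map to entry 0, so whether the test buffer is cached depends on where DDR_BASE_ADDRESS points, and the post does not show that value.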

    --------------------------------------------------------------------------------------------------------------------------------------------------

    DDR Test Case(Application) :

    {no format}

    int l2ccddr_test(unsigned int data)
    {
        unsigned int data_should, *address, i, j = 0;
        unsigned int errors = 0;

        address = (u32 *)DDR_BASE_ADDRESS;
        data_should = data;
        printf("1.Drhit = %x Dwhit = %x \n", readl(0xd46f420c), readl(0xd46f4210));

        /* Write pass: fill the test region with the pattern, one word per iteration. */
        for (i = 0; i < 0x12000; i = i + 4)
        {
            get_timer(0);
            *address = data;
            address = address + 1;
        }

        printf("2.Drhit = %x Dwhit = %x \n", readl(0xd46f420c), readl(0xd46f4210));

        /* Read pass: read the same region back 100 times and verify each word. */
        for (j = 0; j < 100; j++) {
            address = (u32 *)DDR_BASE_ADDRESS;
            for (i = 0; i < 0x12000; i = i + 4)
            {
                get_timer(0);
                data = *address;
                if (data != data_should)
                    printf("ERROR: Address 0x%p, should be: 0x%x, is: 0x%x\r\n", address, data_should, data);
                address = address + 1;
            }
        }
        printf("3.Drhit = %x Dwhit = %x \n", readl(0xd46f420c), readl(0xd46f4210));
        return 0;
    }

    {no format}
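
The amount of memory the test loops actually touch can be worked out from the loop bounds. The sketch below does that arithmetic for a loop of the form used above, where the counter advances by a fixed step and the pointer advances by one 32-bit word per iteration. The function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations of: for (i = 0; i < limit; i += step) */
uint32_t loop_iterations(uint32_t limit, uint32_t step)
{
    assert(step != 0);
    return (limit + step - 1u) / step;
}

/* Bytes touched when each iteration accesses one 32-bit word and the
 * pointer then moves on to the next word. */
uint32_t bytes_touched(uint32_t limit, uint32_t step)
{
    return loop_iterations(limit, step) * (uint32_t)sizeof(uint32_t);
}
```

For the bounds in the test (limit 0x12000, step 4) this gives 18432 iterations and 73728 bytes, which is 72KB per pass. Touching 512KB with the same loop shape would need a limit of 0x80000. Each iteration also calls get_timer(0), so the measured time includes that call as well as the memory access.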

Children
  • Enabling the MMU lets you treat a region as cacheable memory. In other words, you cannot have the D-cache without the MMU enabled. I think you may have been observing the correct behavior. Treating a region as Normal memory allows a number of optimizations (out-of-order execution, merging, speculation, multi-issuing), plus faster memory access if the region is cacheable.