Hi,
I would like to validate the L2 cache using U-Boot code running on a dual-core Cortex-A9.
Here is my L2 cache initialisation code. While reading from and writing to a DDR memory location, I don't see the Drhit or Dwhit event counter registers being updated.
Kindly let me know: is MMU initialisation required to observe the event counter registers, or is the initialisation sequence wrong?
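For reference, the values the initialisation code in this thread writes to the event counter registers (0x007, 0x008, 0x010) can be reproduced with a small helper. This is only a sketch: the macro and function names are made up for illustration, and it assumes the event source sits in bits [5:2] of each counter configuration register, which is what the values and comments in the listing imply.

```c
#include <assert.h>
#include <stdint.h>

/* PL310 offsets as used in the listing (names are illustrative only). */
#define EV_COUNTER_CTRL  0x200u  /* enable / reset bits */
#define EV_COUNTER1_CFG  0x204u  /* listing programs Drhit here */
#define EV_COUNTER0_CFG  0x208u  /* listing programs Dwhit here */
#define EV_COUNTER1      0x20Cu  /* read back as 0xd46f420c in the test */
#define EV_COUNTER0      0x210u  /* read back as 0xd46f4210 in the test */

/* Event source codes, assumed to occupy bits [5:2] of the cfg register. */
enum pl310_event { PL310_EV_DRHIT = 0x2, PL310_EV_DWHIT = 0x4 };

/* Build a counter configuration value with interrupt generation off. */
uint32_t pl310_event_cfg(enum pl310_event src)
{
    assert((unsigned)src <= 0xFu);   /* field is four bits wide */
    return (uint32_t)src << 2;
}

/* Build the control value: bit 0 enable, bit 1 reset ctr0, bit 2 reset ctr1. */
uint32_t pl310_counter_ctrl(int enable, int reset0, int reset1)
{
    return (enable ? 1u : 0u) | (reset0 ? 2u : 0u) | (reset1 ? 4u : 0u);
}
```

Calling `pl310_event_cfg(PL310_EV_DRHIT)` and `pl310_event_cfg(PL310_EV_DWHIT)` gives the two configuration values, and `pl310_counter_ctrl(1, 1, 1)` gives the control value.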
Hi MarekByKowki,
After enabling the MMU, I observed the Drhit and Dwhit event counter registers being updated.
I ran a test that writes 512 KB of data to a cacheable region in DDR and then reads the same 512 KB region 100 times. I observed a large time difference; the results are below.
Test 1:
D-cache, branch prediction and MMU enabled:
Time taken: 15.917 seconds
Test 2:
D-cache, branch prediction and MMU disabled:
Time taken: 9.742 seconds
I am not sure why the time taken is so much higher with the D-cache, branch prediction and MMU enabled than in Test 2 (D-cache, branch prediction and MMU disabled).
As I understand it, enabling the D-cache, branch prediction and MMU should make reads and writes faster, but I observed the opposite.
Is this expected, or is my understanding wrong?
Here is my code to initialise the L2 cache and MMU:
{no format}
l2_cache_init:
    MRC p15, 0, r0, c1, c0, 0 ;@ Read System Control Register
    ORR r0, r0, #(0x1 << 12) ;@ Set I bit 12 to enable I Cache
    ORR r0, r0, #(0x1 << 2) ;@ Set C bit 2 to enable D Cache
    ORR r0, r0, #(0x1 << 11) ;@ Set Z bit 11 to enable branch prediction
    MCR p15, 0, r0, c1, c0, 0 ;@ Write System Control Register
    ldr r0,L2CC_PL310
    @ Set aux cntrl
    @ Way size = 64KB
    ldr r1, =0x31160000
    str r1, [r0,#0x104]
    @ Set tag RAM latency
    @ 8 cycles RAM write access latency
    @ 8 cycles RAM read access latency
    @ 8 cycles RAM setup latency
    ldr r1, =0x00000777
    str r1, [r0,#0x108]
    @ Set Data RAM latency
    @ 8 cycles RAM write access latency
    @ 8 cycles RAM read access latency
    @ 8 cycles RAM setup latency
    ldr r1, =0x00000777
    str r1, [r0,#0x10C]
    @ Cache maintenance - invalidate by way (0xff) - base offset 0x77C
    ldr r1, =0xFF
    str r1, [r0,#0x77C]
poll_invalidate:
    ldr r1, [r0,#0x77C]
    tst r1, #1
    bne poll_invalidate
    @ Enable Event Counter Control Register. Reset counter 0 and 1 values
    ldr r1, =0x007
    str r1, [r0,#0x200]
    @ Counter 1. Count Drhit event
    LDR r1, =0x008
    STR r1, [r0,#0x204]
    @ Counter 0. Count Dwhit event
    LDR r1, =0x010
    STR r1, [r0,#0x208]
    @ Ensure L2 remains disabled for the time being
    LDR r1, =0x0
    STR r1, [r0,#0x100]
    MOVW R9, #0x1080 ;@ Setting for CPU Config Address 0 register
    MOVT R9, #0xD456
    LDR R8,[R9]
    ORR R8, R8, #(1<<1) ;@ Setting for L2CC Cache frequency as 400MHz
    STR R8, [R9]
    ;@ Disable MMU.
    MRC p15, 0, r1, c1, c0, 0 ;@ Read Control Register configuration data.
    BIC r1, r1, #0x1
    MCR p15, 0, r1, c1, c0, 0 ;@ Write Control Register configuration data.
    ;@ Disable L1 Caches.
    MRC p15, 0, r1, c1, c0, 0 ;@ Read Control Register configuration data.
    BIC r1, r1, #(0x1 << 12) ;@ Disable I Cache.
    BIC r1, r1, #(0x1 << 2) ;@ Disable D Cache.
    MCR p15, 0, r1, c1, c0, 0 ;@ Write Control Register configuration data
    ;@ Invalidate L1 Caches.
    ;@ Invalidate Instruction cache.
    MOV r1, #0
    MCR p15, 0, r1, c7, c5, 0
    ;@ Invalidate Data cache.
    ;@ To make the code general purpose, calculate the
    ;@ cache size first and loop through each set + way.
    MRC p15, 1, r0, c0, c0, 0 ;@ Read Cache Size ID.
    LDR r3,=0x1ff
    AND r0, r3, r0, LSR #13 ;@ r0 = no. of sets - 1.
    MOV r1, #0
way_loop:
    MOV r3, #0 @ r3 = set counter set_loop.
set_loop:
    MOV r2, r1, LSL #30
    ORR r2, r3, LSL #5 @ r2 = set/way cache operation format.
    MCR p15, 0, r2, c7, c6, 2 @ Invalidate the line described by r2.
    ADD r3, r3, #1 @ Increment set counter.
    CMP r0, r3 @ Last set reached yet?
    BGT set_loop @ If not, iterate set_loop,
    ADD r1, r1, #1 @ else, next.
    CMP r1, #4 @ Last way reached yet?
    BNE way_loop @ if not, iterate way_loop.
    @ Invalidate TLB
    MCR p15, 0, r1, c8, c7, 0
    @ Branch Prediction Enable.
    MOV r1, #0
    MRC p15, 0, r1, c1, c0, 0 @ Read Control Register configuration data.
    ORR r1, r1, #(0x1 << 11) @ Global BP Enable bit.
    MCR p15, 0, r1, c1, c0, 0
    @ The following table shows the code you must use to create your translation tables.
    @ Use the variable ttb_address to denote the address for the initial translation table.
    @ This must be a 16KB area of memory whose start address is aligned to a 16KB boundary,
    @ to which an L1 translation table can be written.
    @ Example 4.2. Create translation tables
    @ Enable D-side Prefetch
    MRC p15, 0, r1, c1, c0, 1 @ Read Auxiliary Control Register.
    ORR r1, r1, #(0x1 <<2) @ Enable D-side prefetch.
    MCR p15, 0, r1, c1, c0, 1; @ Write Auxiliary Control Register.
    DSB
    ISB
    @ DSB causes completion of all cache maintenance operations appearing in program
    @ order before the DSB instruction.
    @ An ISB instruction causes the effect of all branch predictor maintenance
    @ operations before the ISB instruction to be visible to all instructions
    @ after the ISB instruction.
    @ Initialize PageTable.
    @ Create a basic L1 page table in RAM, with 1MB sections containing a flat
    @ (VA=PA) mapping, all pages Full Access, Strongly Ordered.
    @ It would be faster to create this in a read-only section in an assembly file.
    LDR r0, =0xDE2 @ r0 is the non-address part of
                   @ descriptor.
    LDR r1, ttb_address
    LDR r3, = 4095
write_pte:
    ORR r2, r0, r3, LSL #20 @ OR together address & default PTE bits.
    STR r2, [r1, r3, LSL #2] @ Write PTE to TTB.
    SUBS r3, r3, #1 @ Decrement loop counter.
    BNE write_pte
    @ For the first entry in the table, You can make it cacheable, normal,
    @ write-back, write allocate.
    BIC r0, r0, #0xc @ Clear CB bits.
    ORR r0, r0, #0x4 @ inner write-back, write allocate
    BIC r0, r0, #0x7000 @ Clear TEX bits.
    ORR r0, r0, #0x5000 @ set TEX as write-back, write allocate
    ORR r0, r0, #0x10000 @ shareable.
    STR r0, [r1]
    LDREQ r0, L2CC_PL310
    LDREQ r1, =0x1
    STREQ r1, [r0,#0x100]
    @ Initialize MMU.
    MOV r1,#0x0
    MCR p15, 0, r1, c2, c0, 2 @ Write Translation Table Base Control Register.
    LDR r1, ttb_address
    MCR p15, 0, r1, c2, c0, 0 @ Write Translation Table Base Register 0.
    @ In this simple example, do not use TRE or Normal Memory Remap Register.
    @ Set all Domains to Client.
    LDR r1, =0x55555555
    MCR p15, 0, r1, c3, c0, 0 @ Write Domain Access Control Register.
    @ Enable MMU
    MRC p15, 0, r1, c1, c0, 0 @ Read Control Register configuration data.
    ORR r1, r1, #0x1 @ Bit 0 is the MMU enable.
    MCR p15, 0, r1, c1, c0, 0
    mov pc, lr
ttb_address:
    .word 0x18000
L2CC_PL310:
    .word 0xD46F4000
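To make the page table attributes in the listing above easier to check, here is a rough host-side sketch that replays the BIC/ORR sequence applied to r0 and decodes the resulting section descriptor. The struct and function names are made up for illustration; the field positions assume the ARMv7-A short-descriptor section format (B bit 2, C bit 3, domain bits [8:5], AP bits [11:10], TEX bits [14:12], S bit 16).

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a 1MB section descriptor (illustrative struct). */
struct section_attrs {
    unsigned b, c, xn, domain, ap, tex, s;
    uint32_t base;
};

struct section_attrs decode_section(uint32_t desc)
{
    struct section_attrs a;
    assert((desc & 0x3u) == 0x2u);   /* bits [1:0] = 0b10 marks a section */
    a.b      = (desc >> 2) & 0x1u;
    a.c      = (desc >> 3) & 0x1u;
    a.xn     = (desc >> 4) & 0x1u;
    a.domain = (desc >> 5) & 0xFu;
    a.ap     = (desc >> 10) & 0x3u;
    a.tex    = (desc >> 12) & 0x7u;
    a.s      = (desc >> 16) & 0x1u;
    a.base   = desc & 0xFFF00000u;   /* 1MB-aligned physical base */
    return a;
}

/* Same bit operations the listing applies before "STR r0, [r1]". */
uint32_t first_entry_from(uint32_t r0)
{
    r0 &= ~0xCu;      /* BIC r0, r0, #0xc     */
    r0 |= 0x4u;       /* ORR r0, r0, #0x4     */
    r0 &= ~0x7000u;   /* BIC r0, r0, #0x7000  */
    r0 |= 0x5000u;    /* ORR r0, r0, #0x5000  */
    r0 |= 0x10000u;   /* ORR r0, r0, #0x10000 */
    return r0;
}
```

Starting from 0xDE2, `first_entry_from` yields 0x15DE6, which decodes as TEX=0b101, C=0, B=1, S=1. In the listing that value is stored only to entry 0, so only the first 1MB section (0x00000000 to 0x000FFFFF) gets the cacheable attributes; entries 1 to 4095 keep 0xDE2 (TEX=0, C=0, B=0). The thread does not show the value of DDR_BASE_ADDRESS, so whether the test buffer falls inside that section cannot be told from the post alone.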
--------------------------------------------------------------------------------------------------------------------------------------------------
DDR Test Case(Application) :
{no format }
int l2ccddr_test(unsigned int data)
{
    unsigned int data_should, *address, i, j = 0;
    unsigned int errors = 0;

    address = (u32 *)DDR_BASE_ADDRESS;
    data_should = data;
    printf("1.Drhit = %x Dwhit = %x \n", readl(0xd46f420c), readl(0xd46f4210));
    for (i = 0; i < 0x12000; i = i + 4) {
        get_timer(0);
        *address = data;
        address = address + 1;
    }

    printf("2.Drhit = %x Dwhit = %x \n", readl(0xd46f420c), readl(0xd46f4210));
    for (j = 0; j < 100; j++) {
        address = (u32 *)DDR_BASE_ADDRESS;
        for (i = 0; i < 0x12000; i = i + 4) {
            get_timer(0);
            data = *address;
            if (data != data_should)
                printf("ERROR: Addres 0x%p ,Should be :0x%x Is: 0x%x\r\n", address, data_should, data);
            address = address + 1;
        }
    }
    printf("3.Drhit = %x Dwhit = %x \n", readl(0xd46f420c), readl(0xd46f4210));
    return 0;
}
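As a quick sanity check on how much memory each pass of the test loops actually touches, the arithmetic can be sketched as below. The function name is hypothetical, and it assumes `u32` is four bytes wide, as the pointer arithmetic in the test suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes touched by a loop of the form
 *     for (i = 0; i < limit; i = i + step) { access one element; ptr++; }
 * where each element is elem_size bytes.
 */
size_t loop_footprint_bytes(uint32_t limit, uint32_t step, size_t elem_size)
{
    assert(step != 0);
    /* Number of iterations is ceil(limit / step). */
    size_t iterations = ((size_t)limit + step - 1) / step;
    return iterations * elem_size;
}
```

With the bound 0x12000, a step of 4 and four-byte elements, `loop_footprint_bytes(0x12000, 4, 4)` comes to 73,728 bytes (72 KB) per pass. Under the same assumptions, a 512 KB pass would correspond to a bound of 0x80000.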
Enabling the MMU lets you mark a region as cacheable memory. In other words, you cannot use the D-cache without the MMU enabled. I think you may have been observing the correct behavior. Treating a region as Normal memory allows a number of optimizations (out-of-order execution, merging, speculation, multi-issuing), plus faster memory access if the region is cacheable.
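To illustrate what marking a region as cacheable could look like with the flat 1MB-section table from this thread, here is a rough host-side sketch that fills a 4096-entry table and gives a chosen range the cacheable attributes. It is only an illustration under assumptions: the function and macro names are invented, the two attribute values are the ones the posted assembly uses (0xDE2 for the default and 0x15DE6 for its first entry), and the range passed in is a placeholder, not the real DDR address on this board.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_STRONGLY_ORDERED 0x00000DE2u  /* default value in the listing */
#define SECTION_NORMAL_WBWA      0x00015DE6u  /* value the listing stores in entry 0 */
#define NUM_SECTIONS             4096u        /* 4GB / 1MB */

/*
 * Fill a flat (VA = PA) first-level table. Sections inside
 * [cache_base, cache_base + cache_size) get the cacheable attributes,
 * everything else stays Strongly Ordered.
 */
void build_flat_table(uint32_t table[], uint32_t cache_base, uint32_t cache_size)
{
    /* Both values must be 1MB aligned because sections are 1MB each. */
    assert((cache_base & 0xFFFFFu) == 0);
    assert((cache_size & 0xFFFFFu) == 0);

    uint32_t first = cache_base >> 20;
    uint32_t count = cache_size >> 20;
    assert(first + count <= NUM_SECTIONS);

    for (uint32_t i = 0; i < NUM_SECTIONS; i++) {
        uint32_t attrs = (i >= first && i < first + count)
                             ? SECTION_NORMAL_WBWA
                             : SECTION_STRONGLY_ORDERED;
        table[i] = (i << 20) | attrs;
    }
}
```

For example, `build_flat_table(table, 0x00100000, 0x00200000)` would mark the two sections starting at 1MB as cacheable and leave the rest Strongly Ordered. On the target, the table would still have to sit at the 16KB-aligned address loaded into TTBR0 before the MMU is enabled, as in the posted assembly.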