Hi,
I would like to validate the L2 cache using U-Boot code running on a dual-core Cortex-A9.
My L2 cache initialisation code is below. While reading from and writing to DDR memory locations, I don't see the Drhit and Dwhit event counter registers being updated.
Kindly let me know whether MMU initialisation is required to observe the event counter registers, or whether the initialisation sequence is wrong.
Hi MarekByKowki
After enabling the MMU, I observed the Drhit and Dwhit event counter registers being updated.
I then ran a test: I wrote 512 KB of data to a cacheable region in DDR and read the same 512 KB region 100 times. I observed a large time difference; the results are below.
Test 1:
Enabled D-Cache, Branch Prediction and MMU:
Time taken is 15.917 seconds
Test 2:
Disabled D-Cache, Branch Prediction and MMU:
Time taken is 9.742 seconds
I am not sure why the time taken is so much higher with the D-Cache, Branch Prediction and MMU enabled (Test 1) than with them disabled (Test 2).
As per my understanding, enabling the D-Cache, Branch Prediction and MMU should make reads and writes faster, but I observed the opposite.
Is this expected, or is my understanding wrong?
Here is my code to initialise the L2 cache and MMU:
{no format}
l2_cache_init:
    MRC p15, 0, r0, c1, c0, 0    ;@ Read System Control Register
    ORR r0, r0, #(0x1 << 12)     ;@ Set I bit 12 to enable I Cache
    ORR r0, r0, #(0x1 << 2)      ;@ Set C bit 2 to enable D Cache
    ORR r0, r0, #(0x1 << 11)     ;@ Set Z bit 11 to enable branch prediction
    MCR p15, 0, r0, c1, c0, 0    ;@ Write System Control Register
    ldr r0, L2CC_PL310

    @ Set aux cntrl
    @ Way size = 64KB
    ldr r1, =0x31160000
    str r1, [r0,#0x104]

    @ Set tag RAM latency
    @ 8 cycles RAM write access latency
    @ 8 cycles RAM read access latency
    @ 8 cycles RAM setup latency
    ldr r1, =0x00000777
    str r1, [r0,#0x108]

    @ Set Data RAM latency
    @ 8 cycles RAM write access latency
    @ 8 cycles RAM read access latency
    @ 8 cycles RAM setup latency
    ldr r1, =0x00000777
    str r1, [r0,#0x10C]

    @ Cache maintenance - invalidate by way (0xff) - base offset 0x77C
    ldr r1, =0xFF
    str r1, [r0,#0x77C]

poll_invalidate:
    ldr r1, [r0,#0x77C]
    tst r1, #0xFF                @ Wait for all eight way bits to clear (was #1, way 0 only)
    bne poll_invalidate
    @ Enable Event Counter Control Register. Reset counter 0 and 1 values
    ldr r1, =0x007
    str r1, [r0,#0x200]

    @ Counter 1. Count Drhit event
    LDR r1, =0x008
    STR r1, [r0,#0x204]

    @ Counter 0. Count Dwhit event
    LDR r1, =0x010
    STR r1, [r0,#0x208]

    @ Ensure L2 remains disabled for the time being
    LDR r1, =0x0
    STR r1, [r0,#0x100]

    MOVW R9, #0x1080             ;@ Setting for CPU Config Address 0 register
    MOVT R9, #0xD456
    LDR R8, [R9]
    ORR R8, R8, #(1<<1)          ;@ Setting for L2CC Cache frequency as 400MHz
    STR R8, [R9]
    ;@ Disable MMU.
    MRC p15, 0, r1, c1, c0, 0    ;@ Read Control Register configuration data.
    BIC r1, r1, #0x1
    MCR p15, 0, r1, c1, c0, 0    ;@ Write Control Register configuration data.

    ;@ Disable L1 Caches.
    MRC p15, 0, r1, c1, c0, 0    ;@ Read Control Register configuration data.
    BIC r1, r1, #(0x1 << 12)     ;@ Disable I Cache.
    BIC r1, r1, #(0x1 << 2)      ;@ Disable D Cache.
    MCR p15, 0, r1, c1, c0, 0    ;@ Write Control Register configuration data

    ;@ Invalidate L1 Caches.
    ;@ Invalidate Instruction cache.
    MOV r1, #0
    MCR p15, 0, r1, c7, c5, 0

    ;@ Invalidate Data cache.
    ;@ To make the code general purpose, calculate the
    ;@ cache size first and loop through each set + way.
    MRC p15, 1, r0, c0, c0, 0    ;@ Read Cache Size ID.
    LDR r3, =0x1ff
    AND r0, r3, r0, LSR #13      ;@ r0 = no. of sets - 1.

    MOV r1, #0
way_loop:
    MOV r3, #0                   @ r3 = set counter for set_loop.
set_loop:
    MOV r2, r1, LSL #30
    ORR r2, r3, LSL #5           @ r2 = set/way cache operation format.
    MCR p15, 0, r2, c7, c6, 2    @ Invalidate the line described by r2.
    ADD r3, r3, #1               @ Increment set counter.
    CMP r0, r3                   @ Last set reached yet?
    BGE set_loop                 @ If not, iterate set_loop (was BGT, which skipped the last set),
    ADD r1, r1, #1               @ else, next way.
    CMP r1, #4                   @ Last way reached yet?
    BNE way_loop                 @ If not, iterate way_loop.

    @ Invalidate TLB
    MCR p15, 0, r1, c8, c7, 0

    @ Branch Prediction Enable.
    MOV r1, #0
    MRC p15, 0, r1, c1, c0, 0    @ Read Control Register configuration data.
    ORR r1, r1, #(0x1 << 11)     @ Global BP Enable bit.
    MCR p15, 0, r1, c1, c0, 0
    @ The following table shows the code you must use to create your translation
    @ tables. Use the variable ttb_address to denote the address for the initial
    @ translation table. This must be a 16KB area of memory whose start address is
    @ aligned to a 16KB boundary, to which an L1 translation table can be written.
    @ Example 4.2. Create translation tables

    @ Enable D-side Prefetch
    MRC p15, 0, r1, c1, c0, 1    @ Read Auxiliary Control Register.
    ORR r1, r1, #(0x1 << 2)      @ Enable D-side prefetch.
    MCR p15, 0, r1, c1, c0, 1    @ Write Auxiliary Control Register.
    DSB
    ISB
    @ DSB causes completion of all cache maintenance operations appearing in program
    @ order before the DSB instruction.
    @ An ISB instruction causes the effect of all branch predictor maintenance
    @ operations before the ISB instruction to be visible to all instructions
    @ after the ISB instruction.

    @ Initialize PageTable.
    @ Create a basic L1 page table in RAM, with 1MB sections containing a flat
    @ (VA=PA) mapping, all pages Full Access, Strongly Ordered.
    @ It would be faster to create this in a read-only section in an assembly file.
    LDR r0, =0xDE2               @ r0 is the non-address part of the descriptor.
    LDR r1, ttb_address
    LDR r3, =4095
write_pte:
    ORR r2, r0, r3, LSL #20      @ OR together address & default PTE bits.
    STR r2, [r1, r3, LSL #2]     @ Write PTE to TTB.
    SUBS r3, r3, #1              @ Decrement loop counter.
    BNE write_pte

    @ For the first entry in the table, you can make it cacheable, normal,
    @ write-back, write allocate.
    BIC r0, r0, #0xc             @ Clear CB bits.
    ORR r0, r0, #0x4             @ inner write-back, write allocate
    BIC r0, r0, #0x7000          @ Clear TEX bits.
    ORR r0, r0, #0x5000          @ set TEX as write-back, write allocate
    ORR r0, r0, #0x10000         @ shareable.
    STR r0, [r1]

    LDREQ r0, L2CC_PL310
    LDREQ r1, =0x1
    STREQ r1, [r0,#0x100]
    @ Initialize MMU.
    MOV r1, #0x0
    MCR p15, 0, r1, c2, c0, 2    @ Write Translation Table Base Control Register.
    LDR r1, ttb_address
    MCR p15, 0, r1, c2, c0, 0    @ Write Translation Table Base Register 0.

    @ In this simple example, do not use TRE or Normal Memory Remap Register.
    @ Set all Domains to Client.
    LDR r1, =0x55555555
    MCR p15, 0, r1, c3, c0, 0    @ Write Domain Access Control Register.

    @ Enable MMU
    MRC p15, 0, r1, c1, c0, 0    @ Read Control Register configuration data.
    ORR r1, r1, #0x1             @ Bit 0 is the MMU enable.
    MCR p15, 0, r1, c1, c0, 0

    mov pc, lr

ttb_address:
    .word 0x18000
L2CC_PL310:
    .word 0xD46F4000
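To make the translation table values in the listing easier to check, here is a minimal C sketch that rebuilds the two section descriptors from their fields. It assumes the ARMv7-A short-descriptor section format; the macro and function names are hypothetical and only for illustration. It reproduces 0xDE2 (Strongly Ordered, full access, domain 15) for entries 1 to 4095 and the cacheable value stored in entry 0, so only virtual addresses 0x00000000 to 0x000FFFFF get cacheable attributes in the posted code. DDR_BASE_ADDRESS is not shown in the thread, so whether the test buffer falls inside that first megabyte is something to verify on the target.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor first-level "section" fields (assumed format). */
#define SECTION_TYPE       0x2u                      /* bits [1:0] = 0b10: 1MB section */
#define SECTION_B          (1u << 2)                 /* B bit */
#define SECTION_C          (1u << 3)                 /* C bit */
#define SECTION_DOMAIN(d)  (((uint32_t)(d) & 0xFu) << 5)
#define SECTION_AP_FULL    (0x3u << 10)              /* AP[1:0] = 0b11: full access */
#define SECTION_TEX(t)     (((uint32_t)(t) & 0x7u) << 12)
#define SECTION_S          (1u << 16)                /* shareable */

/* Attribute bits the write_pte loop uses for entries 1..4095 (0xDE2). */
uint32_t strongly_ordered_attrs(void)
{
    return SECTION_TYPE | SECTION_DOMAIN(15) | SECTION_AP_FULL;
}

/* Attribute bits the listing builds for entry 0: TEX=0b101, C=0, B=1, S=1. */
uint32_t cacheable_attrs(void)
{
    return strongly_ordered_attrs() | SECTION_B | SECTION_TEX(5) | SECTION_S;
}

/* Flat (VA=PA) section descriptor for the given megabyte index. */
uint32_t section_entry(uint32_t mb_index, uint32_t attrs)
{
    assert(mb_index < 4096u);
    return (mb_index << 20) | attrs;
}

/* Descriptor the posted listing ends up with for an arbitrary address. */
uint32_t entry_for_address(uint32_t addr)
{
    uint32_t mb = addr >> 20;
    return section_entry(mb, mb == 0u ? cacheable_attrs() : strongly_ordered_attrs());
}
```

For example, entry_for_address(0x00000000) gives 0x15DE6, while entry_for_address(0x80000000) gives 0x80000DE2, which is still Strongly Ordered.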
--------------------------------------------------------------------------------------------------------------------------------------------------
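The PL310 values written in the initialisation code above can be decoded with a small C sketch like the following. It assumes the PL310 field layout as I understand it from the TRM (event source in bits [5:2] of the counter configuration registers, way size in Auxiliary Control bits [19:17], associativity in bit 16); the function names are hypothetical. It confirms that 0x008 and 0x010 select DRHIT and DWHIT, and that 0x31160000 describes 8 ways of 64KB, i.e. a 512KB L2.

```c
#include <assert.h>
#include <stdint.h>

/* Event source codes for bits [5:2] of an event counter configuration register. */
#define PL310_EV_DRHIT 0x2u   /* data read hit */
#define PL310_EV_DWHIT 0x4u   /* data write hit */

/* Value to write to an event counter configuration register (no interrupt). */
uint32_t pl310_event_cfg(uint32_t source)
{
    assert(source <= 0xFu);
    return source << 2;
}

/* Way size in KB from Auxiliary Control bits [19:17]: 1 -> 16KB ... 6 -> 512KB. */
uint32_t pl310_way_size_kb(uint32_t aux)
{
    uint32_t field = (aux >> 17) & 0x7u;
    assert(field >= 1u && field <= 6u);   /* 0 and 7 are reserved encodings */
    return 8u << field;
}

/* Associativity from Auxiliary Control bit 16: 0 -> 8-way, 1 -> 16-way. */
uint32_t pl310_ways(uint32_t aux)
{
    return (aux & (1u << 16)) ? 16u : 8u;
}

/* Total L2 size in KB implied by the Auxiliary Control value. */
uint32_t pl310_total_kb(uint32_t aux)
{
    return pl310_ways(aux) * pl310_way_size_kb(aux);
}
```

Note that with only two counters programmed for hits, the counts alone do not give a hit ratio; the matching request events would be needed for that.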
DDR Test Case (Application):
{no format }
int l2ccddr_test(unsigned int data)
{
    unsigned int data_should, *address, i, j = 0;
    unsigned int errors = 0;

    address = (u32 *)DDR_BASE_ADDRESS;
    data_should = data;
    printf("1.Drhit = %x Dwhit = %x \n", readl(0xd46f420c), readl(0xd46f4210));

    /* Write pass: fill the region with the test pattern. */
    for (i = 0; i < 0x12000; i = i + 4) {
        get_timer(0);
        *address = data;
        address = address + 1;
    }

    printf("2.Drhit = %x Dwhit = %x \n", readl(0xd46f420c), readl(0xd46f4210));

    /* Read pass: read the region back 100 times and compare each word. */
    for (j = 0; j < 100; j++) {
        address = (u32 *)DDR_BASE_ADDRESS;
        for (i = 0; i < 0x12000; i = i + 4) {
            get_timer(0);
            data = *address;
            if (data != data_should)
                printf("ERROR: Address 0x%p, Should be: 0x%x Is: 0x%x\r\n", address, data_should, data);
            address = address + 1;
        }
    }

    printf("3.Drhit = %x Dwhit = %x \n", readl(0xd46f420c), readl(0xd46f4210));
    return 0;
}
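One thing that may help when comparing Test 1 and Test 2: the posted loops call get_timer(0) on every iteration and discard the result, so the timer call itself is part of what is being measured. Also, the loop bound 0x12000 with a 4-byte step covers 0x12000 bytes (72KB), not 512KB. Below is a minimal sketch of timing the read passes from outside the loop. The timer is passed in as a callback with the same shape as U-Boot's get_timer(base); host_timer_ms, fill_words, timed_read_passes and words_for_bytes are hypothetical names, and the host timer is only a stand-in so the sketch can be tried off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Same shape as U-Boot's get_timer(base): milliseconds elapsed since base. */
typedef unsigned long (*timer_fn)(unsigned long base);

/* Host stand-in for get_timer (assumption, for running outside U-Boot). */
unsigned long host_timer_ms(unsigned long base)
{
    unsigned long now = (unsigned long)(clock() * 1000.0 / CLOCKS_PER_SEC);
    return now - base;
}

/* Number of 32-bit words needed to cover a region of the given size. */
size_t words_for_bytes(size_t bytes)
{
    return bytes / sizeof(uint32_t);
}

/* Write pass: fill the buffer with the test pattern. */
void fill_words(volatile uint32_t *buf, size_t words, uint32_t pattern)
{
    for (size_t i = 0; i < words; i++)
        buf[i] = pattern;
}

/* Read the buffer `passes` times, count mismatches, and report the elapsed
 * time for all passes. The timer is sampled only before and after the loops. */
unsigned long timed_read_passes(const volatile uint32_t *buf, size_t words,
                                unsigned passes, uint32_t expect,
                                timer_fn timer, unsigned long *elapsed_ms)
{
    unsigned long mismatches = 0;

    assert(buf != NULL && timer != NULL && elapsed_ms != NULL);
    unsigned long start = timer(0);
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            if (buf[i] != expect)
                mismatches++;
    *elapsed_ms = timer(start);
    return mismatches;
}
```

On the target, get_timer would be passed as the callback and the buffer would be (u32 *)DDR_BASE_ADDRESS with words_for_bytes(512 * 1024) words for a 512KB region.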
Enabling the MMU lets you mark a region as cacheable Normal memory. In other words, you cannot use the D-cache without the MMU enabled. I think you may have been observing the correct behaviour. Treating a region as Normal memory allows a number of optimizations (out-of-order execution, merging, speculation, multi-issuing), plus faster memory access if the region is cacheable.
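To tie this back to the posted initialisation code, here is a rough C sketch that replays the System Control Register (SCTLR) read-modify-write steps of the listing in order and checks whether data caching can be active afterwards. The bit positions are the ones named in the listing's own comments (M bit 0, C bit 2, Z bit 11, I bit 12); the function names are hypothetical. As posted, the routine sets the C bit at the start, clears it under "Disable L1 Caches", and never sets it again, so the D-cache would have to be enabled somewhere else, which the thread does not show.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)    /* MMU enable */
#define SCTLR_C (1u << 2)    /* data cache enable */
#define SCTLR_Z (1u << 11)   /* branch prediction enable */
#define SCTLR_I (1u << 12)   /* instruction cache enable */

/* Data accesses can only be cached when both the MMU and the D-cache are on. */
int data_caching_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_M) != 0u && (sctlr & SCTLR_C) != 0u;
}

/* Replays the SCTLR updates made by l2_cache_init, in program order. */
uint32_t sctlr_after_listing(uint32_t sctlr)
{
    sctlr |= SCTLR_I | SCTLR_C | SCTLR_Z;   /* entry: enable I, C and Z */
    sctlr &= ~SCTLR_M;                       /* "Disable MMU" */
    sctlr &= ~(SCTLR_I | SCTLR_C);           /* "Disable L1 Caches" */
    sctlr |= SCTLR_Z;                        /* "Branch Prediction Enable" */
    sctlr |= SCTLR_M;                        /* "Enable MMU" */
    assert((sctlr & SCTLR_M) && (sctlr & SCTLR_Z));
    return sctlr;
}
```

Whatever the starting value, data_caching_enabled(sctlr_after_listing(x)) is 0, so it is worth reading SCTLR back on the target right before the test to confirm which bits are really set in each of the two runs.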