I've been studying and experimenting with the caches on an ARM Cortex-A9 (on a Zynq SoC) for the past week, with the main objective of loading part of my code into L2 (PL310) and locking it there. The steps I take to achieve this are:
The code for loading is:
extern uint32_t code_start;
extern uint32_t code_end;

void PreloadCode() {
    uint32_t* temp;
    uint32_t dummy;

    //invalidate all ways and L1 data cache
    L1ICacheInvalidate();
    *REG7_CLEAN_INV_WAY = 0xffff;
    while(*REG7_CLEAN_INV_WAY);
    *REG9_CACHE_SYNC = 0;
    while(*REG9_CACHE_SYNC);

    //unlock all ways
    *REG9_D_LOCKDOWN0 = 0x0000;
    *REG9_I_LOCKDOWN0 = 0x0000;
    asm volatile ("dsb");
    asm volatile ("isb");

    for(temp = &code_start; temp < &code_end; temp += 1){
        asm volatile ("ldr %0, [%1]" : "=r"(dummy) : "r"(temp));
        // asm volatile ("pld [%0]" :: "r"(temp) : "memory");
        // asm volatile ("pli [%0]" :: "r"(temp) : "memory");
    }
    asm volatile ("dsb");
    asm volatile ("isb");

    //lock all ways
    *REG9_D_LOCKDOWN0 = 0xFFFF;
    *REG9_I_LOCKDOWN0 = 0xFFFF;
}
I also set up the event counters in the PL310 to count IRHIT (instruction read hits) and IRREQ (instruction read requests). I run a piece of code periodically, resetting the counters and invalidating the L1 instruction cache on each loop.
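For reference, the counter setup described above could look roughly like the sketch below. It is not the code actually used in this post. The struct and function names are hypothetical, and the register layout (offsets 0x200 to 0x210 from the PL310 base) and event source encodings (IRHIT = 0x7, IRREQ = 0x8, placed in bits [5:2]) are taken from my reading of the PL310 TRM, so they should be checked against it. The register block is passed in as a pointer so the same code can be pointed at the controller's event counter registers or at a plain struct in RAM.

```c
#include <stdint.h>

/* Assumed layout of the PL310 event counter registers,
 * starting at offset 0x200 from the controller base. */
typedef struct {
    volatile uint32_t ev_counter_ctrl;  /* 0x200: enable and reset bits   */
    volatile uint32_t ev_counter1_cfg;  /* 0x204: event source, counter 1 */
    volatile uint32_t ev_counter0_cfg;  /* 0x208: event source, counter 0 */
    volatile uint32_t ev_counter1;      /* 0x20C: counter 1 value         */
    volatile uint32_t ev_counter0;      /* 0x210: counter 0 value         */
} pl310_event_regs;

#define PL310_EV_SRC_SHIFT   2u          /* event source lives in bits [5:2] */
#define PL310_EV_SRC_IRHIT   0x7u        /* instruction read hit             */
#define PL310_EV_SRC_IRREQ   0x8u        /* instruction read request         */
#define PL310_EV_CTRL_ENABLE (1u << 0)
#define PL310_EV_CTRL_RESET0 (1u << 1)
#define PL310_EV_CTRL_RESET1 (1u << 2)

/* Counter 0 counts IRHIT, counter 1 counts IRREQ; both start from zero. */
void pl310_count_instr_reads(pl310_event_regs *regs)
{
    regs->ev_counter_ctrl = 0u;  /* stop counting while reconfiguring */
    regs->ev_counter0_cfg = PL310_EV_SRC_IRHIT << PL310_EV_SRC_SHIFT;
    regs->ev_counter1_cfg = PL310_EV_SRC_IRREQ << PL310_EV_SRC_SHIFT;
    regs->ev_counter_ctrl = PL310_EV_CTRL_ENABLE
                          | PL310_EV_CTRL_RESET0
                          | PL310_EV_CTRL_RESET1;
}

/* Instruction hit rate in percent, or -1 if no requests were counted. */
int pl310_instr_hit_rate(const pl310_event_regs *regs)
{
    uint32_t hits = regs->ev_counter0;
    uint32_t reqs = regs->ev_counter1;

    if (reqs == 0u) {
        return -1;
    }
    return (int)(((uint64_t)hits * 100u) / reqs);
}
```

Calling `pl310_count_instr_reads()` at the top of each loop and `pl310_instr_hit_rate()` at the bottom gives the per-loop percentages quoted in this post.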
I was hoping to verify that, after each loop, the numbers of instruction hits and instruction requests in L2 would be the same. However, this does not happen. The number of hits is always 0, which suggests that I've locked all of L2 but the code was not loaded.
When I run the exact same code without locking L2 at the end, I get a 0 % hit rate on the first loop, but all subsequent loops show a 100 % hit rate.
Do you have any idea what I'm doing wrong?
Note: I'm only using one of the CPUs. Also, the region I want to load is configured in the page table as Outer and Inner Write-Back, Write-Allocate.
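To make the memory attributes in the note concrete, here is a minimal sketch of how such an entry could be encoded. It assumes ARMv7-A short-descriptor 1 MB section entries with TEX remap disabled (SCTLR.TRE = 0), where TEX = 0b001, C = 1, B = 1 selects Outer and Inner Write-Back, Write-Allocate. The helper name, the full-access permission bits and domain 0 are hypothetical choices for illustration, not the values from the actual translation table.

```c
#include <stdint.h>

#define SECTION_TYPE    0x2u                     /* bits [1:0] = 0b10: section */
#define SECTION_B       (1u << 2)                /* bufferable                 */
#define SECTION_C       (1u << 3)                /* cacheable                  */
#define SECTION_AP_RW   (0x3u << 10)             /* AP[1:0] = 0b11 (assumed)   */
#define SECTION_TEX(x)  ((uint32_t)(x) << 12)    /* TEX[2:0] in bits [14:12]   */

/* Build a 1 MB section descriptor for phys_base with
 * TEX = 0b001, C = 1, B = 1 (Write-Back, Write-Allocate, inner and outer).
 * Domain 0 and a non-shareable mapping are assumed. */
uint32_t section_wbwa(uint32_t phys_base)
{
    return (phys_base & 0xFFF00000u)  /* section base address, bits [31:20] */
         | SECTION_TEX(1u)
         | SECTION_AP_RW
         | SECTION_C
         | SECTION_B
         | SECTION_TYPE;
}
```

Comparing the TEX, C and B bits of this value against the entry that actually covers `code_start` is one way to confirm that the region really has the attributes stated in the note.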