I've been studying and experimenting with the caches on an ARM Cortex-A9 (on a Zynq SoC) for the past week, with the main objective of loading part of my code into L2 (PL310) and locking it there. The steps I take to achieve this are:
The code for loading is:
extern uint32_t code_start;
extern uint32_t code_end;

void PreloadCode()
{
    uint32_t* temp;
    uint32_t dummy;

    //invalidate all ways and L1 data cache
    L1ICacheInvalidate();
    *REG7_CLEAN_INV_WAY = 0xffff;
    while(*REG7_CLEAN_INV_WAY);
    *REG9_CACHE_SYNC = 0;
    while(*REG9_CACHE_SYNC);

    //unlock all ways
    *REG9_D_LOCKDOWN0 = 0x0000;
    *REG9_I_LOCKDOWN0 = 0x0000;
    asm volatile ("dsb");
    asm volatile ("isb");

    for(temp = &code_start; temp < &code_end; temp += 1){
        asm volatile ("ldr %0, [%1]" : "=r"(dummy) : "r"(temp));
        // asm volatile ("pld [%0]" :: "r"(temp) : "memory");
        // asm volatile ("pli [%0]" :: "r"(temp) : "memory");
    }
    asm volatile ("dsb");
    asm volatile ("isb");

    //lock all ways
    *REG9_D_LOCKDOWN0 = 0xFFFF;
    *REG9_I_LOCKDOWN0 = 0xFFFF;
}
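For context on the lockdown writes above: in the PL310 lockdown registers each bit corresponds to one way, and a set bit blocks new allocations into that way. A minimal sketch of helpers for building such masks follows. The helper names and `L2_NUM_WAYS` are hypothetical and not part of the code above, and the value 8 assumes the 8-way L2 configuration of the Zynq-7000.

```c
#include <stdint.h>

/* Assumption: 8-way PL310, as on Zynq-7000. Adjust for other configurations. */
#define L2_NUM_WAYS 8u

/* Hypothetical helper: mask with one bit set per way, i.e. every way locked. */
static inline uint32_t l2_lock_all_mask(void)
{
    return (1u << L2_NUM_WAYS) - 1u;
}

/* Hypothetical helper: mask that locks every way except `way`,
 * so that new allocations can only land in `way`. */
static inline uint32_t l2_lock_all_except(unsigned way)
{
    return l2_lock_all_mask() & ~(1u << way);
}
```

For example, `l2_lock_all_except(0)` yields a mask that leaves only way 0 open for allocation, which could be written to both the data and instruction lockdown registers before the preload loop.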
I also set up the event counters in the PL310 to count IRHIT (instruction read hits) and IRREQ (instruction read requests). I run a piece of code periodically; on each loop I reset the counters and invalidate the L1 instruction cache.
I was hoping to verify that after each loop the number of instruction hits in L2 would equal the number of instruction requests. However, this does not happen. The number of hits is always 0, which suggests that I've locked all of L2 but the code was never loaded.
When I run the exact same code without locking L2 at the end, I get a 0% hit rate on the first loop, but all subsequent loops show a 100% hit rate.
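The hit rates quoted above are the ratio of the two counter values. A minimal sketch of that calculation, assuming the IRHIT and IRREQ counts have already been read from the PL310 event counter registers (the function name is hypothetical):

```c
#include <stdint.h>

/* Hypothetical helper: L2 instruction hit rate in whole percent.
 * Returns 0 when no requests were counted, to avoid dividing by zero. */
static inline uint32_t l2_hit_rate_percent(uint32_t hits, uint32_t requests)
{
    if (requests == 0u) {
        return 0u;
    }
    /* Widen to 64 bits so hits * 100 cannot overflow. */
    return (uint32_t)(((uint64_t)hits * 100u) / requests);
}
```

The result is truncated toward zero, so a loop with 1 hit out of 3 requests reports 33.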
Do you have any idea what I'm doing wrong?
Note: I'm only using one of the CPUs. Also, the region I want to load is configured in the page table as Outer and Inner Write-Back, Write-Allocate.