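I've been studying and experimenting with the caches on an ARM Cortex-A9, namely a Zynq SoC, for the past week, with the main goal of loading part of my code into the L2 cache (PL310) and locking it there. The steps I take to achieve this are:

1. Invalidate the L1 instruction cache, then clean and invalidate all L2 ways.
2. Unlock all L2 ways.
3. Read every word of the region I want to keep, so that it gets allocated into L2.
4. Lock all L2 ways.

The code for loading is: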
    extern uint32_t code_start;
    extern uint32_t code_end;

    void PreloadCode(void)
    {
        uint32_t* temp;
        uint32_t dummy;

        // invalidate L1 instruction cache, then clean+invalidate all L2 ways
        L1ICacheInvalidate();
        *REG7_CLEAN_INV_WAY = 0xffff;
        while(*REG7_CLEAN_INV_WAY);
        *REG9_CACHE_SYNC = 0;
        while(*REG9_CACHE_SYNC);

        // unlock all ways
        *REG9_D_LOCKDOWN0 = 0x0000;
        *REG9_I_LOCKDOWN0 = 0x0000;
        asm volatile ("dsb");
        asm volatile ("isb");

        // read every word of the region so it is allocated into L2
        for(temp = &code_start; temp < &code_end; temp += 1){
            asm volatile ("ldr %0, [%1]" : "=r"(dummy) : "r"(temp));
            // asm volatile ("pld [%0]" :: "r"(temp) : "memory");
            // asm volatile ("pli [%0]" :: "r"(temp) : "memory");
        }
        asm volatile ("dsb");
        asm volatile ("isb");

        // lock all ways
        *REG9_D_LOCKDOWN0 = 0xFFFF;
        *REG9_I_LOCKDOWN0 = 0xFFFF;
    }
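Since the PL310 allocates whole 32-byte lines, one read per line should be enough to bring each line in; the loop above reads every word, which also works but issues eight loads per line. Below is a minimal sketch of a per-line touch loop, assuming a 32-byte line size. `touch_lines` and `L2_LINE_SIZE` are hypothetical names, not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: PL310 (and Cortex-A9 L1) line size of 32 bytes. */
#define L2_LINE_SIZE 32u

/* Read one word from every cache line overlapping [start, end).
 * On the target, each volatile read that misses causes the whole line
 * to be allocated (region mapped Write-Back, Write-Allocate).
 * Returns the number of lines touched. */
static size_t touch_lines(const volatile uint32_t *start,
                          const volatile uint32_t *end)
{
    uintptr_t addr = (uintptr_t)start & ~(uintptr_t)(L2_LINE_SIZE - 1u);
    uintptr_t stop = (uintptr_t)end;
    size_t lines = 0;

    for (; addr < stop; addr += L2_LINE_SIZE) {
        uint32_t word = *(const volatile uint32_t *)addr;  /* the actual load */
        (void)word;
        lines++;
    }
    return lines;
}
```

On the target this would be called as `touch_lines(&code_start, &code_end)` between the unlock and lock steps. It only reduces the number of loads issued; by itself it is not expected to change which ways the lines end up in.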
I also set up the PL310 event counters to count IRHIT (instruction read hits) and IRREQ (instruction read requests). I run a piece of code periodically, and on each iteration I reset the counters and invalidate the L1 instruction cache.
I was hoping to verify that, after each iteration, the number of L2 instruction read hits equals the number of instruction read requests. However, this does not happen: the number of hits is always 0, which suggests that all of L2 is locked but the code was never loaded into it.
When I run exactly the same code without locking L2 at the end, the first iteration shows a 0% hit rate, but every subsequent iteration shows a 100% hit rate.
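The check described above (hits equal to requests on every iteration once the code is locked in L2) can be written as a small helper over one iteration's counter readings. This is a sketch, assuming the IRHIT and IRREQ values have already been read out of the PL310 event counters; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Percentage of L2 instruction reads that hit during one iteration,
 * given that iteration's IRHIT and IRREQ counter values. */
static double l2_instr_hit_rate(uint32_t irhit, uint32_t irreq)
{
    if (irreq == 0)
        return 0.0;  /* no instruction reads reached L2 */
    return 100.0 * (double)irhit / (double)irreq;
}

/* Expected result once the code is preloaded and locked: every
 * instruction read that reaches L2 is a hit. */
static bool l2_fully_resident(uint32_t irhit, uint32_t irreq)
{
    return irreq != 0 && irhit == irreq;
}
```

With the locked setup, the observed readings correspond to `l2_instr_hit_rate(0, n) == 0.0` on every iteration; without locking, only the first iteration does, and the later ones satisfy `l2_fully_resident`.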
Do you have any idea what I'm doing wrong?
Note: I'm only using one of the CPUs. Also, the region I want to load is configured in the page table as Outer and Inner Write-Back, Write-Allocate.
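For reference, one way to express that attribute in the ARMv7 short-descriptor translation table format is a 1 MB section entry with TEX = 0b001, C = 1, B = 1, assuming TEX remap is disabled (SCTLR.TRE = 0). This is a sketch only: the macro and function names are hypothetical, and the domain, access permissions and shareability are chosen purely for illustration; an existing table may encode the same memory type differently.

```c
#include <stdint.h>

/* ARMv7 short-descriptor 1 MB section fields (assumes SCTLR.TRE = 0). */
#define SECT_TYPE     (2u << 0)                 /* bits[1:0] = 0b10: section */
#define SECT_B        (1u << 2)
#define SECT_C        (1u << 3)
#define SECT_AP_RW    (3u << 10)                /* AP[1:0] = 0b11, AP[2] = 0 */
#define SECT_TEX(x)   ((uint32_t)(x) << 12)     /* TEX[2:0] in bits[14:12]   */

/* Normal memory, Outer and Inner Write-Back, Write-Allocate
 * (TEX = 0b001, C = 1, B = 1). Domain 0, executable, non-shareable, global. */
static uint32_t section_wbwa(uint32_t phys_addr)
{
    return (phys_addr & 0xFFF00000u)
         | SECT_TEX(1) | SECT_AP_RW | SECT_C | SECT_B | SECT_TYPE;
}
```

For example, `section_wbwa(0x31000000u)` gives 0x31001C0E.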
Although it doesn't make much sense (i.e. locking before fetching), I tried that approach too. No results. By the way, judging by the code Xilinx provides at www.wiki.xilinx.com/Zynq-7000 AP SoC Boot - Locking and Executing out of L2 Cache Tech Tip, it seems I'm following the correct steps:
    int preload_funct(unsigned int uiSrcAddress, unsigned int uiSize)
    {
        // static unsigned int uiAlreadyProgrammed;
        unsigned int i=0;
        // unsigned int uiNumofWays=0;
        // unsigned int uiVariable=0;
        // unsigned int uiValue0=0;
        // unsigned int uiValue1=0;

        fsbl_printf(DEBUG_GENERAL,"\n\rInside Preload Functions \n\r");

        // Disable FIQ and IRQ interrupt
        Xil_ExceptionDisableMask(XIL_EXCEPTION_ALL);

        /*
         * UnLock Data and Instruction from way 1 to7 and unlock Data and instruction for Way 0.
         * The PL310 has 8 sets of registers, one per possible CPU.
         */
        for(i=0;i<8;i++)
        {
            Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_DLCKDWN_0_WAY_OFFSET + (i*8)) ), (0x00000000));
            Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_ILCKDWN_0_WAY_OFFSET + (i*8)) ), (0x00000000));
        }

        /* Flush the Caches */
        Xil_DCacheFlush();
        Xil_DCacheInvalidate();

        fsbl_printf(DEBUG_GENERAL,"\n\r Invalidate D cache \n\r");

        /*Preload instruction from section starts from 0x31000000 to Cache Way 0*/
        {
            // Copy Applciation source adress to ro register
            asm volatile ("mov r0,%0":: "r"(uiSrcAddress));
            //Copy application size to r1 register
            asm volatile ("mov r1,%0":: "r"(uiSize));
            // Offset register i.e. r2
            asm volatile ("mov r2, #0");
            // Label
            asm ("preload_inst:");
            // Load r4 register from the r0+r2 (Source address + offset)
            // This step create an valid entry of the address (Source address + offset) in L2 cache
            asm volatile ("ldr r4, [r0,r2]");
            // Increment the offset by one cache line
            asm volatile ("add r2,r2,#4");
            // Compare the offset with the Application size.
            asm volatile ("cmp r1, r2");
            // If not equal jump to Label
            asm volatile ("bge preload_inst");
        }

        // lock both Data and instruction caches from Way 1 to 7.
        // Lock Data and Instruction Caches for Way 0
        for(i=0;i<8;i++)
        {
            Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_DLCKDWN_0_WAY_OFFSET + (i*8)) ), 0xffff);
            Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_ILCKDWN_0_WAY_OFFSET + (i*8)) ), 0xffff);
        }

        // Enable all the Interrupts
        Xil_ExceptionEnableMask(XIL_EXCEPTION_ALL);

        // uiAlreadyProgrammed=uiVariable;
        return - XST_SUCCESS;
    }
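For comparison with the loop in `PreloadCode`, the Xilinx assembly loop translates roughly into the C below, assuming `uiSize` is a byte count. Two details stand out: the step is 4 bytes (one word), not one cache line as its comment says; and because the test comes after the load and `bge` keeps looping while size >= offset, it reads uiSize/4 + 1 words, one past the end. Also, GCC documents that separate `asm` statements are not guaranteed to stay consecutive and that jumping from one `asm` statement to a label in another is not supported, so the original relies on the compiler leaving r0 to r4 and the instruction order alone. `xilinx_preload_equiv` is a hypothetical name used only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Rough C equivalent of the Xilinx preload loop (uiSize in bytes):
 *   preload_inst: ldr r4, [r0, r2]   read word at src + offset
 *                 add r2, r2, #4     advance one word, not one line
 *                 cmp r1, r2
 *                 bge preload_inst   repeat while size >= offset
 * Returns the number of loads issued. */
static size_t xilinx_preload_equiv(const volatile uint32_t *src, uint32_t size)
{
    uint32_t offset = 0;
    size_t loads = 0;

    do {
        uint32_t word = src[offset / 4u];  /* ldr r4, [r0, r2] */
        (void)word;
        loads++;
        offset += 4u;                      /* add r2, r2, #4 */
    } while (offset <= size);  /* cmp/bge: bge is signed, same result for sizes well below 2 GiB */
    return loads;
}
```

For a 64-byte region (uiSize = 64) this issues 17 loads, the last one at offset 64, just past the region.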
However, they run this function with the L1 cache disabled. I tried that as well, with no results.
Completely lost.
Thank you for your efforts trying to help me.