I've been studying and experimenting with the caches on an ARM Cortex-A9, namely a Zynq SoC, for the past week with the main objective of loading and locking part of my code to L2 (PL310). The steps I take to achieve this are:
The code for loading is:
    extern uint32_t code_start;
    extern uint32_t code_end;

    void PreloadCode()
    {
        uint32_t* temp;
        uint32_t dummy;

        //invalidate all ways and L1 data cache
        L1ICacheInvalidate();
        *REG7_CLEAN_INV_WAY = 0xffff;
        while(*REG7_CLEAN_INV_WAY);
        *REG9_CACHE_SYNC = 0;
        while(*REG9_CACHE_SYNC);

        //unlock all ways
        *REG9_D_LOCKDOWN0 = 0x0000;
        *REG9_I_LOCKDOWN0 = 0x0000;
        asm volatile ("dsb");
        asm volatile ("isb");

        for(temp = &code_start; temp < &code_end; temp += 1){
            asm volatile ("ldr %0, [%1]" : "=r"(dummy) : "r"(temp));
            // asm volatile ("pld [%0]" :: "r"(temp) : "memory");
            // asm volatile ("pli [%0]" :: "r"(temp) : "memory");
        }
        asm volatile ("dsb");
        asm volatile ("isb");

        //lock all ways
        *REG9_D_LOCKDOWN0 = 0xFFFF;
        *REG9_I_LOCKDOWN0 = 0xFFFF;
    }
I also set up the event counters in the PL310 to count the number of IRHIT (instruction read hits) and IRREQ (instruction read requests) events. I run a piece of code periodically, resetting the counters and invalidating the L1 instruction cache on each loop.
I was hoping to see, after each loop, the same number of instruction read hits and instruction read requests in L2. However, this does not happen. The number of hits is always 0, which suggests that I've locked all of L2 but the code was not loaded.
When I run the exact same code without locking L2 at the end, the first loop shows a 0% hit rate, but all subsequent loops show a 100% hit rate.
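For reference, here is a minimal sketch of that kind of counter setup. The register offsets and event source codes are my reading of the PL310 TRM and should be checked against your revision. The function names, and the idea of passing the register block in as a pointer, are hypothetical and only for illustration; they do not come from any Xilinx library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word indices into the PL310 register block (byte offset / 4).
 * ASSUMPTION: offsets as listed in the PL310 TRM register summary. */
#define L2_EV_CTRL      (0x200u / 4u)  /* event counter control       */
#define L2_EV_CNT1_CFG  (0x204u / 4u)  /* counter 1 configuration     */
#define L2_EV_CNT0_CFG  (0x208u / 4u)  /* counter 0 configuration     */
#define L2_EV_CNT1      (0x20Cu / 4u)  /* counter 1 value             */
#define L2_EV_CNT0      (0x210u / 4u)  /* counter 0 value             */

/* ASSUMPTION: event source encodings, placed in bits [5:2] of the
 * configuration registers. */
#define L2_EV_SRC_IRHIT 0x7u           /* instruction read hit        */
#define L2_EV_SRC_IRREQ 0x8u           /* instruction read request    */
#define L2_EV_SRC_SHIFT 2u

#define L2_EV_CTRL_ENABLE     0x1u     /* bit 0: enable counting      */
#define L2_EV_CTRL_RESET_BOTH 0x6u     /* bits 2:1: reset both counters */

/* Configure counter 0 for IRHIT and counter 1 for IRREQ, then reset
 * and enable both. 'regs' points at the start of the PL310 block. */
void l2_counters_start(volatile uint32_t *regs)
{
    assert(regs != NULL);
    regs[L2_EV_CTRL] = 0u;  /* stop counting while reconfiguring */
    regs[L2_EV_CNT0_CFG] = L2_EV_SRC_IRHIT << L2_EV_SRC_SHIFT;
    regs[L2_EV_CNT1_CFG] = L2_EV_SRC_IRREQ << L2_EV_SRC_SHIFT;
    regs[L2_EV_CTRL] = L2_EV_CTRL_ENABLE | L2_EV_CTRL_RESET_BOTH;
}

/* Return the instruction hit rate in percent (0 if no requests). */
uint32_t l2_hit_percent(const volatile uint32_t *regs)
{
    assert(regs != NULL);
    uint32_t hits = regs[L2_EV_CNT0];
    uint32_t reqs = regs[L2_EV_CNT1];
    if (reqs == 0u) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)hits * 100u) / reqs);
}
```

On the Zynq, `regs` would be the PL310 base address cast to `volatile uint32_t *` (the same base that `XPS_L2CC_BASEADDR` names in the Xilinx headers). I call something like `l2_counters_start()` at the top of each loop and read the hit percentage at the end.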
Do you have any idea what I'm doing wrong?
Note: I'm only using one of the CPUs. Also, the region I want to load is configured in the page table as Outer and Inner Write-Back, Write-Allocate.
Despite it not making any sense (i.e. locking before fetching), I tried this approach, with no results. By the way, looking at the code provided by Xilinx at www.wiki.xilinx.com/Zynq-7000 AP SoC Boot - Locking and Executing out of L2 Cache Tech Tip, it seems like I'm following the correct steps:
    int preload_funct(unsigned int uiSrcAddress, unsigned int uiSize)
    {
        // static unsigned int uiAlreadyProgrammed;
        unsigned int i=0;
        // unsigned int uiNumofWays=0;
        // unsigned int uiVariable=0;
        // unsigned int uiValue0=0;
        // unsigned int uiValue1=0;

        fsbl_printf(DEBUG_GENERAL,"\n\rInside Preload Functions \n\r");

        // Disable FIQ and IRQ interrupt
        Xil_ExceptionDisableMask(XIL_EXCEPTION_ALL);

        /*
         * UnLock Data and Instruction from way 1 to7 and unlock Data and instruction for Way 0.
         * The PL310 has 8 sets of registers, one per possible CPU.
         */
        for(i=0;i<8;i++)
        {
            Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_DLCKDWN_0_WAY_OFFSET + (i*8)) ), (0x00000000));
            Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_ILCKDWN_0_WAY_OFFSET + (i*8)) ), (0x00000000));
        }

        /* Flush the Caches */
        Xil_DCacheFlush();
        Xil_DCacheInvalidate();
        fsbl_printf(DEBUG_GENERAL,"\n\r Invalidate D cache \n\r");

        /*Preload instruction from section starts from 0x31000000 to Cache Way 0*/
        {
            // Copy Applciation source adress to ro register
            asm volatile ("mov r0,%0":: "r"(uiSrcAddress));
            //Copy application size to r1 register
            asm volatile ("mov r1,%0":: "r"(uiSize));
            // Offset register i.e. r2
            asm volatile ("mov r2, #0");
            // Label
            asm ("preload_inst:");
            // Load r4 register from the r0+r2 (Source address + offset)
            // This step create an valid entry of the address (Source address + offset) in L2 cache
            asm volatile ("ldr r4, [r0,r2]");
            // Increment the offset by one cache line
            asm volatile ("add r2,r2,#4");
            // Compare the offset with the Application size.
            asm volatile ("cmp r1, r2");
            // If not equal jump to Label
            asm volatile ("bge preload_inst");
        }

        // lock both Data and instruction caches from Way 1 to 7.
        // Lock Data and Instruction Caches for Way 0
        for(i=0;i<8;i++)
        {
            Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_DLCKDWN_0_WAY_OFFSET + (i*8)) ), 0xffff);
            Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_ILCKDWN_0_WAY_OFFSET + (i*8)) ), 0xffff);
        }

        // Enable all the Interrupts
        Xil_ExceptionEnableMask(XIL_EXCEPTION_ALL);
        // uiAlreadyProgrammed=uiVariable;
        return - XST_SUCCESS;
    }
However, they run this function with L1 disabled. I also tried this with no results.
Completely lost.
Thank you for your efforts trying to help me.