ARM Cortex-A9 Preload and Lock Code in L2C-310

For the past week I've been studying and experimenting with the caches on an ARM Cortex-A9 (on a Zynq SoC), with the main objective of loading part of my code into L2 (PL310) and locking it there. The steps I take to achieve this are:

  • Set TTBR0 and invalidate the TLBs
  • Invalidate the L1 instruction and data caches and the L2 cache
  • Initialize and enable the L2 cache
  • Enable the L1 data and instruction caches and the MMU
  • Unlock all L2 ways, run a loop that loads the code (using symbols defined in the linker script for the memory region I target), then lock all L2 ways. I've tried three instructions for the loading: LDR, PLD and PLI.
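
The last step works with the per-way lockdown bitmasks, where a 1 bit blocks allocation into the corresponding way. As a minimal sketch of how such masks can be formed, assuming the 512 KB, 8-way L2 of the Zynq-7000 (so 64 KB per way): the helper names `l2_ways_needed` and `l2_mask_allow_first` are hypothetical and not taken from any library.

```c
#include <stdint.h>

/* Assumed geometry: Zynq-7000 L2C-310, 512 KB, 8 ways, 64 KB per way. */
#define L2_NUM_WAYS   8u
#define L2_WAY_BYTES  (64u * 1024u)
#define L2_ALL_WAYS   ((1u << L2_NUM_WAYS) - 1u)   /* 0xFF */

/* Number of ways needed to hold `bytes` of code, rounded up. */
uint32_t l2_ways_needed(uint32_t bytes)
{
    return (bytes + L2_WAY_BYTES - 1u) / L2_WAY_BYTES;
}

/* Lockdown mask that leaves only ways [0, n) open for allocation.
 * A 1 bit in a lockdown register blocks allocation into that way. */
uint32_t l2_mask_allow_first(uint32_t n)
{
    if (n >= L2_NUM_WAYS)
        return 0u;                        /* nothing locked */
    return L2_ALL_WAYS & ~((1u << n) - 1u);
}
```

With these, `l2_mask_allow_first(0)` gives the "lock everything" value for an 8-way cache and `l2_mask_allow_first(8)` the "unlock everything" value, which is a quick way to check that the region between `code_start` and `code_end` fits in the ways it is meant to occupy.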

The code for loading is:

extern uint32_t code_start;
extern uint32_t code_end;

void PreloadCode() {
    uint32_t* temp;
    uint32_t dummy;

    //invalidate L1 instruction cache, then clean and invalidate all L2 ways
    L1ICacheInvalidate();
    *REG7_CLEAN_INV_WAY = 0xffff;
    while(*REG7_CLEAN_INV_WAY);
    *REG9_CACHE_SYNC = 0;
    while(*REG9_CACHE_SYNC);

    //unlock all ways
    *REG9_D_LOCKDOWN0 = 0x0000;
    *REG9_I_LOCKDOWN0 = 0x0000;

    asm volatile ("dsb");
    asm volatile ("isb");

    for(temp = &code_start; temp < &code_end; temp += 1){
        asm volatile ("ldr %0, [%1]" : "=r"(dummy) : "r"(temp));
    //  asm volatile ("pld [%0]" :: "r"(temp) : "memory");
    //  asm volatile ("pli [%0]" :: "r"(temp) : "memory");
    }

    asm volatile ("dsb");
    asm volatile ("isb");

    //lock all ways
    *REG9_D_LOCKDOWN0 = 0xFFFF;
    *REG9_I_LOCKDOWN0 = 0xFFFF;

}
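
The loop above issues one load per 32-bit word. Since the L2C-310 line length is 32 bytes, one load per line should be enough to request each line. Here is a sketch of that variant in plain C, with the hypothetical name `preload_lines`; it only changes the stride and is not meant as a fix for the problem described below.

```c
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_BYTES 32u   /* L2C-310 cache line length in bytes */

/* Read one word from every 32-byte line that overlaps [start, end).
 * Returns the number of lines touched. */
size_t preload_lines(const void *start, const void *end)
{
    /* Align the first address down to the start of its cache line. */
    uintptr_t addr = (uintptr_t)start & ~(uintptr_t)(L2_LINE_BYTES - 1u);
    uintptr_t stop = (uintptr_t)end;
    size_t lines = 0;

    for (; addr < stop; addr += L2_LINE_BYTES) {
        /* The volatile access keeps the compiler from dropping the unused load. */
        (void)*(const volatile uint32_t *)addr;
        lines++;
    }
    return lines;
}
```

It would be called as `preload_lines(&code_start, &code_end)` between the unlock and lock steps; the returned count times 32 gives the number of bytes requested.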

I also set up the event counters in the PL310 to count IRHIT (instruction read hits) and IRREQ (instruction read requests). I run a piece of code periodically; on each iteration I reset the counters and invalidate the L1 instruction cache.
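
For reference, a counter setup along these lines can be sketched as follows. The register offsets and event codes are the ones I understand from the L2C-310 TRM; which counter tracks which event, the function names, and passing the controller base as a pointer (so the code can also run against a plain array) are my own assumptions.

```c
#include <stdint.h>

/* L2C-310 event counter registers, as byte offsets from the controller base. */
#define L2C_EV_COUNTER_CTRL  0x200u
#define L2C_EV_COUNTER1_CFG  0x204u
#define L2C_EV_COUNTER0_CFG  0x208u
#define L2C_EV_COUNTER1      0x20Cu
#define L2C_EV_COUNTER0      0x210u

/* Event source codes, written to bits [5:2] of a counter config register. */
#define L2C_EV_IRHIT  0x7u   /* instruction read hit */
#define L2C_EV_IRREQ  0x8u   /* instruction read request */

#define L2C_EV_ENABLE      (1u << 0)
#define L2C_EV_RESET_CNT0  (1u << 1)
#define L2C_EV_RESET_CNT1  (1u << 2)

static volatile uint32_t *l2c_reg(volatile void *base, uint32_t off)
{
    return (volatile uint32_t *)((volatile uint8_t *)base + off);
}

/* Assumption: counter 0 counts IRHIT, counter 1 counts IRREQ. */
void l2c_counters_start(volatile void *base)
{
    *l2c_reg(base, L2C_EV_COUNTER_CTRL) = 0u;   /* stop while reconfiguring */
    *l2c_reg(base, L2C_EV_COUNTER0_CFG) = L2C_EV_IRHIT << 2;
    *l2c_reg(base, L2C_EV_COUNTER1_CFG) = L2C_EV_IRREQ << 2;
    *l2c_reg(base, L2C_EV_COUNTER_CTRL) = L2C_EV_RESET_CNT0 | L2C_EV_RESET_CNT1;
    *l2c_reg(base, L2C_EV_COUNTER_CTRL) = L2C_EV_ENABLE;
}

void l2c_counters_read(volatile void *base, uint32_t *hits, uint32_t *reqs)
{
    *hits = *l2c_reg(base, L2C_EV_COUNTER0);
    *reqs = *l2c_reg(base, L2C_EV_COUNTER1);
}

/* Integer hit rate in percent; 0 when there were no requests. */
uint32_t l2c_hit_percent(uint32_t hits, uint32_t reqs)
{
    return reqs ? (uint32_t)(((uint64_t)hits * 100u) / reqs) : 0u;
}
```

On the Zynq-7000 the base would be the PL310 address (0xF8F02000 in the Xilinx headers, as `XPS_L2CC_BASEADDR`), cast to a pointer.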

I was hoping to see that, after each iteration, the number of instruction hits in L2 equals the number of instruction requests. However, this does not happen. The number of hits is always 0, which suggests that I've locked all of L2 but the code was never loaded into it.

When I run the exact same code without locking L2 at the end, the first iteration shows a 0 % hit rate, but all subsequent iterations show a 100 % hit rate.

Do you have any idea what I'm doing wrong?

Note: I'm only using one of the CPUs. Also, the region I want to load is configured in the page table as Outer and Inner Write-Back, Write-Allocate.
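
To make the attribute setting concrete, here is a sketch of a first-level entry with those attributes. It assumes the ARMv7 short-descriptor format with 1 MB sections, TEX remap disabled, domain 0, a non-shareable mapping and full read/write access; the actual page table may differ on any of these, and `section_wbwa` is a hypothetical name.

```c
#include <stdint.h>

/* ARMv7-A short-descriptor, first-level section entry fields. */
#define SECT_TYPE    (2u << 0)    /* bits [1:0] = 0b10: section */
#define SECT_B       (1u << 2)    /* bufferable */
#define SECT_C       (1u << 3)    /* cacheable */
#define SECT_AP_RW   (3u << 10)   /* AP[1:0] = 0b11 with AP[2] = 0: full access */
#define SECT_TEX(x)  ((uint32_t)(x) << 12)

/* Section entry for Normal memory, Outer and Inner Write-Back,
 * Write-Allocate: TEX = 0b001, C = 1, B = 1 (TEX remap off). */
uint32_t section_wbwa(uint32_t phys_addr)
{
    return (phys_addr & 0xFFF00000u)   /* 1 MB aligned section base */
         | SECT_TEX(1)
         | SECT_AP_RW
         | SECT_C
         | SECT_B
         | SECT_TYPE;
}
```

For the section at physical address 0x00100000 this gives 0x00101C0E, which can be compared against the entry actually present in the table pointed to by TTBR0.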

  • Although it doesn't make sense to me (i.e. locking before fetching), I tried this approach, with no results. By the way, looking at some code provided by Xilinx at www.wiki.xilinx.com/Zynq-7000 AP SoC Boot - Locking and Executing out of L2 Cache Tech Tip, it seems like I'm following the correct steps:

     

    int preload_funct(unsigned int uiSrcAddress, unsigned int uiSize)
    {
    //	static unsigned int uiAlreadyProgrammed;
    	unsigned int 		i=0;
    //	unsigned int  		uiNumofWays=0;
    //	unsigned int 		uiVariable=0;
    //	unsigned int 		uiValue0=0;
    //	unsigned int 		uiValue1=0;
    
    
    	fsbl_printf(DEBUG_GENERAL,"\n\rInside  Preload Functions \n\r");
    	// Disable FIQ and IRQ interrupt
    	Xil_ExceptionDisableMask(XIL_EXCEPTION_ALL);
    	/*
    	 * UnLock Data and Instruction from way 1 to7 and unlock Data and instruction for Way 0.
    	 * The PL310 has 8 sets of registers, one per possible CPU.
    	 */
    	for(i=0;i<8;i++)
    	{
    		Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_DLCKDWN_0_WAY_OFFSET + (i*8)) ), (0x00000000));
    		Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_ILCKDWN_0_WAY_OFFSET + (i*8)) ), (0x00000000));
    
    	}
    
    
    	/* Flush the Caches */
    	Xil_DCacheFlush();
    	Xil_DCacheInvalidate();
    	fsbl_printf(DEBUG_GENERAL,"\n\r Invalidate D cache \n\r");
    
    	/*Preload instruction from section starts from 0x31000000 to Cache Way 0*/
    	{
    	// Copy application source address to the r0 register
    	 asm volatile ("mov r0,%0":: "r"(uiSrcAddress));
    	 // Copy application size to the r1 register
    	 asm volatile ("mov r1,%0":: "r"(uiSize));
    	 // Offset register, i.e. r2
    	 asm volatile  ("mov r2, #0");
    	 // Label
    	 asm ("preload_inst:");
    	 // Load r4 from r0+r2 (source address + offset)
    	 // This step creates a valid entry for the address (source address + offset) in the L2 cache
    	 asm volatile ("ldr r4, [r0,r2]");
    	 // Increment the offset by one word (4 bytes)
    	 asm volatile ("add r2,r2,#4");
    	 // Compare the application size with the offset
    	 asm volatile ("cmp r1, r2");
    	 // If the size is greater than or equal to the offset, jump to the label
    	 asm volatile ("bge preload_inst");
    
    	}
    	// lock both Data and instruction caches from Way 1 to 7.
    	// Lock Data and Instruction Caches for Way 0
    	for(i=0;i<8;i++)
    		{
    			Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_DLCKDWN_0_WAY_OFFSET + (i*8)) ), 0xffff);
    			Xil_Out32((XPS_L2CC_BASEADDR + (XPS_L2CC_CACHE_ILCKDWN_0_WAY_OFFSET + (i*8)) ), 0xffff);
    		}
    	// Enable all the Interrupts
    	Xil_ExceptionEnableMask(XIL_EXCEPTION_ALL);
    //	uiAlreadyProgrammed=uiVariable;
    
    	return - XST_SUCCESS;
    
    }

    However, they run this function with L1 disabled. I also tried this with no results.

    Completely lost.

    Thank you for your efforts trying to help me.
