ARM Cortex-A9 Preload and Lock Code in L2C-310

josecm over 8 years ago

I've been studying and experimenting with the caches on an ARM Cortex-A9 (specifically a Zynq SoC) for the past week, with the main objective of loading part of my code into L2 (PL310) and locking it there. The steps I take to achieve this are:

Set TTBR0 and invalidate the TLBs
Invalidate the L1 instruction and data caches and the L2 cache
Initialize and enable the L2 cache
Enable the L1 data and instruction caches and the MMU
Unlock all L2 ways
Run a loop that loads the code (using symbols defined in the linker script for the memory region I target); I've tried three instructions for the load: LDR, PLD and PLI
Lock all L2 ways
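
The unlock and lock steps come down to writing a per-way bit mask to the lockdown registers. Below is a minimal sketch of building such masks, assuming the lockdown-by-way convention in which a set bit blocks new allocations into that way, and assuming an 8-way or 16-way configuration (as far as I know the Zynq-7000 L2 is 8-way, so 0xFFFF also sets bits for ways that do not exist there). `all_ways_mask` and `lock_all_except` are hypothetical helpers, and the register writes themselves are left out.

```c
#include <stdint.h>

/* Mask with one bit set per implemented way (num_ways is typically 8 or 16). */
static uint32_t all_ways_mask(unsigned num_ways)
{
    return (num_ways >= 32u) ? 0xFFFFFFFFu : ((1u << num_ways) - 1u);
}

/*
 * Mask that locks every way except `open_way`, so that new fills can only
 * be allocated into that single way. If `open_way` is out of range, every
 * way stays locked.
 */
static uint32_t lock_all_except(unsigned num_ways, unsigned open_way)
{
    uint32_t all = all_ways_mask(num_ways);

    if (open_way >= num_ways)
        return all;
    return all & ~(1u << open_way);
}
```

In these terms, the 0x0000 and 0xFFFF values written in the code below correspond to an empty mask and `all_ways_mask(16)`.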

The code for loading is:

extern uint32_t code_start;
extern uint32_t code_end;

void PreloadCode() {
    uint32_t* temp;
    uint32_t dummy;

    //invalidate L1 instruction cache, then clean and invalidate all L2 ways
    L1ICacheInvalidate();
    *REG7_CLEAN_INV_WAY = 0xffff;
    while(*REG7_CLEAN_INV_WAY);
    *REG9_CACHE_SYNC = 0;
    while(*REG9_CACHE_SYNC);

    //unlock all ways
    *REG9_D_LOCKDOWN0 = 0x0000;
    *REG9_I_LOCKDOWN0 = 0x0000;

    asm volatile ("dsb");
    asm volatile ("isb");

    for(temp = &code_start; temp < &code_end; temp += 1){
        asm volatile ("ldr %0, [%1]" : "=r"(dummy) : "r"(temp));
    //  asm volatile ("pld [%0]" :: "r"(temp) : "memory");
    //  asm volatile ("pli [%0]" :: "r"(temp) : "memory");
    }

    asm volatile ("dsb");
    asm volatile ("isb");

    //lock all ways
    *REG9_D_LOCKDOWN0 = 0xFFFF;
    *REG9_I_LOCKDOWN0 = 0xFFFF;

}
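
The loop above issues one load per 32-bit word. Because the L2C-310 allocates whole 32-byte lines, one access per line should be enough to pull the region in. The following is a minimal sketch of that address arithmetic only: `CACHE_LINE_BYTES`, `line_count` and `touch_region` are hypothetical names, and the plain volatile read stands in for whichever of LDR, PLD or PLI is being tested.

```c
#include <stddef.h>
#include <stdint.h>

/* L2C-310 cache line length in bytes. */
#define CACHE_LINE_BYTES 32u

/* Number of cache lines overlapped by the half-open range [start, end). */
static size_t line_count(uintptr_t start, uintptr_t end)
{
    uintptr_t line_mask = ~(uintptr_t)(CACHE_LINE_BYTES - 1u);

    if (end <= start)
        return 0;
    return (size_t)((((end - 1u) & line_mask) - (start & line_mask))
                    / CACHE_LINE_BYTES) + 1u;
}

/* Read one word from every cache line in [start, end); returns lines touched. */
static size_t touch_region(const uint32_t *start, const uint32_t *end)
{
    uintptr_t addr = (uintptr_t)start;
    uintptr_t stop = (uintptr_t)end;
    size_t touched = 0;

    while (addr < stop) {
        (void)*(volatile const uint32_t *)addr;  /* one read per line */
        touched++;
        /* Advance to the first byte of the next 32-byte line. */
        addr = (addr & ~(uintptr_t)(CACHE_LINE_BYTES - 1u)) + CACHE_LINE_BYTES;
    }
    return touched;
}
```

This does not change which lines end up in L2; it only reduces the number of accesses by a factor of eight for a word-aligned region.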

I also set up the PL310 event counters to count IRHIT (instruction read hits) and IRREQ (instruction read requests). I run a piece of code periodically, resetting the counters and invalidating the L1 instruction cache on each iteration.

I expected the number of instruction hits and instruction requests in L2 to be equal after each iteration. However, this does not happen: the number of hits is always 0, which suggests that I've locked all of L2 but the code was never loaded.

When I run the exact same code without locking L2 at the end, the first iteration shows a 0% hit rate, but all subsequent iterations show a 100% hit rate.
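
The comparison described above reduces to a ratio of two counter readings. Here is a minimal sketch, assuming the IRHIT and IRREQ values have already been read from the PL310 event counters into plain integers; `hit_rate_percent` is a hypothetical helper, not part of any vendor library.

```c
#include <stdint.h>

/*
 * Instruction-read hit rate in whole percent (rounded down).
 * Returns -1 when no requests were counted, so "no traffic reached L2"
 * can be told apart from "0% hits".
 */
static int hit_rate_percent(uint32_t hits, uint32_t requests)
{
    if (requests == 0u)
        return -1;
    /* Widen before multiplying so large counter values cannot overflow. */
    return (int)(((uint64_t)hits * 100u) / requests);
}
```

Distinguishing the zero-request case is a deliberate choice here: with the L1 instruction cache invalidated on every iteration, a result of -1 would point at the measurement setup rather than at the lockdown.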

Do you have any idea what I'm doing wrong?

Note: I'm only using one of the CPUs. Also, the region I want to load is configured in the page table as Outer and Inner Write-Back, Write-Allocate.

Top replies

Norbert Goldstein over 8 years ago in reply to josecm +1 verified

Otherwise, your code may be locked in the L2 cache instead of the instructions that are to be locked down. See below for the high-level flow (Stage 2): 1. Ensure that no processor exceptions can occur during...

0 josecm over 8 years ago in reply to Norbert Goldstein

"If an instruction cache is being locked down, use the prefetch instruction cache line operation to fetch the memory cache line into the cache." By this, do you mean the PLI instruction? Or is there another prefetch instruction that I'm not aware of?

I opted for the LDR instruction because the Xilinx example uses this.

0 42Bastian Schick over 8 years ago in reply to josecm

I'd say the "PLI" instruction works for both the L1 and L2 caches, whereas "LDR" only helps for the L2 cache (because it is unified).
Anyway, please keep us informed about the final solution.