Cortex-M3 pipelining of consecutive LDR instructions to different memory regions?

Hi all,

Recently I took some measurements with the SysTick timer to count consumed clock cycles (for performance reasons).

I wrote a simple function in assembly that is called from a C file. Before and after the call I read the value of the SysTick timer to determine the cycles needed for loading the parameter value into register r0, the call itself, and all the assembly code in the function.
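As a rough illustration of the arithmetic behind such a measurement, here is a minimal C sketch. It relies on SysTick being a 24-bit down-counter. The helper name `systick_elapsed` is hypothetical, and the wrap handling assumes the reload value is 0x00FFFFFF and that the counter wraps at most once between the two reads.

```c
#include <stdint.h>

/* SysTick is a 24-bit down-counter, so only the low 24 bits are valid. */
#define SYSTICK_MASK 0x00FFFFFFu

/* Cycles elapsed between two SysTick reads (hypothetical helper name).
 * 'before' is the value read first, 'after' the value read second.
 * The counter counts DOWN, so the difference is before - after.
 * Masking to 24 bits handles a single wrap, ASSUMING the reload value
 * is 0x00FFFFFF; with another reload value this arithmetic is wrong. */
static uint32_t systick_elapsed(uint32_t before, uint32_t after)
{
    return (before - after) & SYSTICK_MASK;
}
```

On the target, the two values would typically come from reading the current-value register (`SysTick->VAL` in CMSIS) immediately before and after the call. The result also includes the cost of the reads themselves, so it may help to measure two back-to-back reads first and subtract that baseline.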

Two consecutive (simple) LDR instructions should be able to pipeline, yet judging by the clock cycles they do not seem to be pipelined.

Am I right in assuming that loads from different memory regions (here, the SysTick timer and the stack) never get pipelined? And a slightly different question: do loads get pipelined when they cross boundaries of the "minimum memory part sizes" (AHB-Lite) within the same memory region?

Thanks in advance,

Alex

  • Hi Alex,

    You do have a point that needs clarification. Here is what I imagine can be done (a slight chance you have already seen and tried this). In the Load/Store Timing section of the documentation, possible scenarios involving an LDR instruction are explained as follows:

    • LDR [any] are pipelined when possible. This means that if the next instruction is an LDR or STR, and the destination of the first LDR is not used to compute the address for the next instruction, then one cycle is removed from the cost of the next instruction. So, an LDR might be followed by an STR, so that the STR writes out what the LDR loaded. Furthermore, multiple LDRs can be pipelined together. Some optimized examples are:
      • LDR R0,[R1]; LDR R1,[R2] - normally three cycles total
      • LDR R0,[R1,R2]; STR R0,[R3,#20] - normally three cycles total
      • LDR R0,[R1,R2]; STR R1,[R3,R2] - normally three cycles total
      • LDR R0,[R1,R5]; LDR R1,[R2]; LDR R2,[R3,#4] - normally four cycles total.

    Given the above explanation, I would recommend looking at the whole of your code at the assembly level to check whether the destination of the first LDR is being used to compute the address of the next LDR. If it is, the two loads cannot be pipelined. Hope this helps.
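The rule quoted above can be turned into a small cycle estimator, which may make the address-dependency check easier to reason about. This is only a sketch of the quoted rule, not a model of the real core: it assumes each single-register LDR or STR costs two cycles when not pipelined, and it ignores wait states, bus contention, instruction fetch effects, and any other timing rules in the documentation. The `mem_op` type and the `estimate_cycles` function are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_REG 0xFFu /* marks an unused index-register slot */

/* Hypothetical description of one single-register LDR or STR. */
typedef struct {
    bool    is_load; /* true = LDR, false = STR */
    uint8_t rt;      /* register that is loaded or stored */
    uint8_t base;    /* base register of the address */
    uint8_t index;   /* index register of the address, or NO_REG */
} mem_op;

/* Estimate total cycles for a back-to-back LDR/STR sequence using only
 * the quoted rule: an LDR/STR that follows an LDR is one cycle cheaper
 * unless its address uses the register that the previous LDR loaded.
 * ASSUMPTION: an unpipelined LDR or STR costs two cycles. */
static unsigned estimate_cycles(const mem_op *ops, size_t n)
{
    unsigned cycles = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned cost = 2; /* unpipelined cost (assumption) */
        if (i > 0 && ops[i - 1].is_load) {
            uint8_t dest = ops[i - 1].rt;
            bool addr_dep = (ops[i].base == dest) || (ops[i].index == dest);
            if (!addr_dep) {
                cost -= 1; /* pipelined behind the previous LDR */
            }
        }
        cycles += cost;
    }
    return cycles;
}
```

For example, describing `LDR R0,[R1]; LDR R1,[R2]` as `{true, 0, 1, NO_REG}` followed by `{true, 1, 2, NO_REG}` gives three cycles under this model, while `LDR R0,[R1]; LDR R2,[R0]` gives four, because the second address depends on the first load. If measured counts exceed the estimate, the difference would have to come from effects the sketch leaves out.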

    Regards,

    Sadanand Gulwadi

    ARM University Program Manager, Bangalore
