Cortex-M3 pipelining of consecutive LDR instructions to different memory regions?

Hi all,

Recently I took some measurements with the SysTick timer to count consumed clock cycles (for performance reasons).

I wrote a simple function in assembly, which gets called from a C file. Before and after the call I read the value of the SysTick timer to determine the cycles needed for loading the parameter value into register r0, the call, and all the assembly code in the function.
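
For reference, the before/after subtraction can be sketched in C roughly as follows. This is only a sketch under assumptions that are not stated above: `systick_elapsed` is a hypothetical helper name, the SysTick reload value is assumed to be the maximum 0x00FFFFFF, and the counter is assumed to wrap at most once between the two reads. SysTick counts down, so the elapsed cycles are the first reading minus the second.

```c
#include <assert.h>
#include <stdint.h>

/* SysTick is a 24-bit down counter; this mask is only a valid modulus
   if the reload value is 0x00FFFFFF (assumption). */
#define SYSTICK_MASK 0x00FFFFFFu

/* Hypothetical helper: cycles elapsed between two SysTick readings.
   'before' is the first reading, 'after' the second. */
uint32_t systick_elapsed(uint32_t before, uint32_t after)
{
    /* Both readings must fit in 24 bits. */
    assert((before & ~SYSTICK_MASK) == 0u && (after & ~SYSTICK_MASK) == 0u);

    /* Down counter: first minus second, reduced modulo 2^24
       so that a single wrap through zero is handled. */
    return (before - after) & SYSTICK_MASK;
}
```

On the target, the two arguments would be the two readings of the SysTick current value register. The result still includes the overhead of the reads themselves.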

Two consecutive (simple) LDR instructions should be able to pipeline, but judging by the clock cycles, they don't seem to.

Am I right in assuming that loads from different memory regions (here the SysTick timer and the stack) never get pipelined? And a slightly different question: do loads get pipelined when they cross "minimum memory part size" boundaries (AHB-Lite) within the same memory region?

Thanks in advance,

Alex

Parents
  • > Concerning jensbauer's suggested measurements with disabling the 2 loads:

    > The suggested code shows a cycle count of 4 with the 2 loads disabled, 6 cycles with just one load disabled and 7 when executing the loads in the middle, which show expected and reasonable results.

    I suggested the dummy instructions to get accurate measurements; in other words, to "synchronize", so that reading the cycle counter does not disturb your results.

    If 4 cycles are used with no loads enabled and 6 cycles with 1 load enabled, that suggests the first load takes 2 clock cycles, correct?

    If 6 cycles are used with 1 load enabled and 7 cycles with 2 loads enabled, that suggests the second load takes 1 clock cycle, correct?

    If the above is true, then I believe the second instruction is being pipelined as expected.

    Or am I misinterpreting the results?

    If you need to measure whether reading the cycle counter affects the pipelining, you could add a duplicate load:

        LDR R1,[R0]     ;read systick val reg
        SUB R4,R5,R6    ;dummy instruction to flush the pipeline
    ;   LDR R4,[R0]     ;dummy read of systick val reg
    ;   LDR R2,[R2]     ;read variable 1
    ;   LDR R3,[R3]     ;read variable 2
        SUB R4,R5,R6    ;dummy instruction to flush the pipeline
        LDR R0,[R0]     ;read systick val reg
        SUB R0,R1,R0    ;subtract second read value from first read value

    With all 3 loads enabled, I would expect the result to be:

    • 8 if an instruction can be pipelined after reading the systick value.
    • 9 if an instruction cannot be pipelined after reading the systick value.

    Note: Remember that the first load in a sequence is never pipelined, so the first load will always use 2 clock cycles.
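
    As a sanity check on the arithmetic above, here is a minimal C sketch of the cycle model I am assuming. It is not a statement of how the core actually behaves: `expected_cycles` is a hypothetical helper, the baseline of 4 cycles is simply the measured value with all loads disabled, and the "stall" flag models the case where the load following the SysTick read cannot be pipelined.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical helper: cycle count predicted by the simple model.
       baseline            cycles measured with all loads disabled
       n_loads             number of enabled loads between the two dummy SUBs
       load_after_stalls   true if the load right after the SysTick read
                           cannot be pipelined (costs 2 cycles instead of 1) */
    int expected_cycles(int baseline, int n_loads, bool load_after_stalls)
    {
        assert(baseline >= 0 && n_loads >= 0);

        if (n_loads == 0) {
            return baseline;            /* nothing added to the baseline */
        }

        int cycles = baseline + 2;      /* first load is never pipelined */
        cycles += n_loads - 1;          /* each following load: 1 cycle */

        if (load_after_stalls && n_loads >= 2) {
            cycles += 1;                /* second load loses its overlap */
        }
        return cycles;
    }
    ```

    With a baseline of 4, this gives 6 for one load and 7 for two loads, matching the measurements quoted above, and 8 or 9 for three loads depending on the flag.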
