Instruction timings - ARM Cortex-M3

newbie over 11 years ago

I am using the following assembly sections on an ARM Cortex-M3 to read a memory-mapped I/O register: one reads the register and saves each value to RAM, the other reads the same register into multiple registers. I want to know exactly how many CPU cycles each takes to complete, or in other words, how fast I am reading the register.

1) Read and save to memory: Can an LDR-STR-LDR-STR sequence be tightly pipelined (with the address phase of one instruction overlapping the data phase of the previous instruction), in which case the following would take only 9 cycles?

486: 781a ldrb r2, [r3, #0]

488: 7002 strb r2, [r0, #0]

48a: 781a ldrb r2, [r3, #0]

48c: 7042 strb r2, [r0, #1]

48e: 781a ldrb r2, [r3, #0]

490: 7082 strb r2, [r0, #2]

492: 781a ldrb r2, [r3, #0]

494: 70c2 strb r2, [r0, #3]

2) Read into multiple registers: I am assuming these four instructions take 5 cycles.

486: 781a ldrb r2, [r3, #0]

488: 781c ldrb r4, [r3, #0]

48a: 781d ldrb r5, [r3, #0]

48c: 781e ldrb r6, [r3, #0]

I appreciate any insight you can provide.

Thanks,

Joseph Yiu over 11 years ago in reply to Jens Bauer

Hi Jens,
For best performance, back-to-back (pipelined) LDR and STR instructions generally work well on Cortex-M3/M4. (This does not apply to Cortex-M0, M0+, or M7.)
This reduces each subsequent LDR/STR instruction to 1 cycle (assuming zero wait states and no unaligned or bit-band transfers).
If you insert other operations between the LDR/STR instructions, each LDR takes 2 cycles (an STR can still take 1 cycle because of the write buffer).
Ideally, use the 16-bit LDR/STR encodings for this (which also benefits code size).
If you need the 32-bit encodings, try to make sure that the pipelined LDR/STR instructions are aligned to 32-bit addresses.
Regards,
Joseph

Jens Bauer over 11 years ago in reply to Joseph Yiu

Thank you for the detailed reply; this sounds great, because often I do the following ...
     .rept 200
     ldr rX,[rS,#imm]
     str rX,[rT,#imm]
     (optionally integer instructions here)
     .endr
...so that integer instructions (e.g. ADD, SUB, AND, ORR, EOR, shifts, etc.) come right after an STR and before the next LDR, never directly after an LDR.
To make my question clearer:
A:
     .rept 20
          .rept 10
               ldr rX,[rS,#imm]
               str rX,[rT,#imm]
          .endr
          .rept 10
               and rX,rX,#imm
          .endr
     .endr
B:
     .rept 20
          .rept 10
               ldr rX,[rS,#imm]
               str rX,[rT,#imm]
               and rX,rX,#imm
          .endr
     .endr
As I understand it, example B would not suffer from bubbles in the pipeline: nothing gets pipelined after an STR, so it does not matter which instruction we place after the STR. Is that correct?
Would bubbles appear in the pipeline in example A (because of the long run of LDR/STR pairs), or would the two examples be equally efficient?