I am using the following assembly sequences on an ARM Cortex-M3: one reads a memory-mapped I/O register and saves it to RAM, and the other reads the same I/O register into multiple registers. I want to know exactly how many CPU cycles each would take to complete, or in other words, how fast I am reading the register.
1) Read and save to memory: Can LDR-STR-LDR-STR be tightly pipelined (with the address phase of one instruction overlapping the data phase of the previous instruction), in which case the following would take only 9 cycles?
486: 781a ldrb r2, [r3, #0]
488: 7002 strb r2, [r0, #0]
48a: 781a ldrb r2, [r3, #0]
48c: 7042 strb r2, [r0, #1]
48e: 781a ldrb r2, [r3, #0]
490: 7082 strb r2, [r0, #2]
492: 781a ldrb r2, [r3, #0]
494: 70c2 strb r2, [r0, #3]
2) read to multiple registers: I am assuming these instructions take 5 cycles.
ldrb r4, [r3, #0]
ldrb r5, [r3, #0]
ldrb r6, [r3, #0]
I appreciate any insight you can provide.
Thanks,
Thank you, Joseph, this definitely helps a lot in understanding what to do and how to do it.
As I understand it, it sounds like it's a good idea to use 16-bit instructions (and align them on a 32-bit boundary if one can swap two instructions).
I was very much surprised with the LPC1768 - perhaps my measuring results weren't all wrong after all.
When reading your reply, it appears that it's good to place some integer instructions (e.g. add, sub, cmp, and, orr, eor, shifts, etc.) right after STR-type instructions, because then there will be no pipeline bubbles: the LDR/STR block will be short, so it won't be "exhausted". Or am I getting this part wrong?
Hi Jens,
For best performance, placing LDR and STR instructions back to back so that they pipeline is generally good on Cortex-M3/M4. (Not applicable to Cortex-M0, M0+, M7.)
This reduces each subsequent LDR/STR instruction to 1 cycle (assuming 0 wait states and no unaligned or bit-band transfers).
If you insert other operations between the LDR/STR instructions, then each LDR will take 2 cycles (an STR could still take 1 cycle because of the write buffer).
Ideally, use 16-bit LDR/STR for this (also for the code size benefit).
If you need to use the 32-bit versions, then try to make sure that the pipelined LDR/STR instructions are aligned to 32-bit addresses.
regards.
Joseph
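The rules in the reply above can be turned into a rough host-side estimate. The following is only a minimal sketch in C, not a cycle-accurate model: the names `op_t` and `estimate_cycles` are hypothetical, and it assumes zero wait states, no unaligned or bit-band transfers, 16-bit encodings, a write buffer that never fills, and no address dependency between neighbouring loads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classes for the estimate. */
typedef enum { OP_LDR, OP_STR, OP_ALU } op_t;

/*
 * Rough cycle estimate for a straight-line sequence, using the
 * simplified rules from the reply above (all are assumptions):
 *   - an LDR costs 2 cycles, or 1 if it directly follows an LDR/STR
 *   - an STR costs 1 cycle (pipelined, or absorbed by the write buffer)
 *   - an integer (ALU) instruction costs 1 cycle and breaks the pipelining
 */
unsigned estimate_cycles(const op_t *ops, size_t n)
{
    unsigned cycles = 0;
    int prev_was_mem = 0;   /* was the previous instruction an LDR/STR? */

    assert(ops != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        switch (ops[i]) {
        case OP_LDR:
            cycles += prev_was_mem ? 1u : 2u;
            prev_was_mem = 1;
            break;
        case OP_STR:
            cycles += 1u;
            prev_was_mem = 1;
            break;
        case OP_ALU:
            cycles += 1u;
            prev_was_mem = 0;
            break;
        }
    }
    return cycles;
}
```

Under these assumptions, the eight-instruction LDRB/STRB sequence from the question comes out at 9 cycles and three back-to-back LDRBs at 4. A real memory-mapped peripheral may add wait states on its bus, so the figures are a lower bound at best; measuring on the target (for example with the DWT cycle counter) is the way to confirm them.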
Thank you for the detailed reply; this sounds great, because often I do the following ...
.rept 200
ldr rX,[rS,#imm]
str rX,[rT,#imm]
(optionally integer instructions here)
.endr
...so that integer instructions (e.g. ADD, SUB, AND, ORR, EOR, shifts, etc.) come right after the STR and before the next LDR, never after an LDR.
But in order to make my question more clear:
A:
.rept 20
.rept 10
and rX,rX,#imm
B:
As I understand it, example B would not suffer from bubbles in the pipeline: nothing gets pipelined after an STR, so it does not matter which instruction we place after the STR, correct?
Would bubbles appear in the pipeline in example A (because of the long list of LDR/STR), or would the two be equally efficient?