Instruction timings - arm cortex m3

I am using the following assembly sequences on an ARM Cortex-M3: one reads a memory-mapped I/O register and saves each value to RAM, the other reads the same I/O register into multiple registers. I want to know exactly how many CPU cycles each takes to complete; in other words, how fast am I reading the register?

1) Read and save to memory: can LDR-STR-LDR-STR be tightly pipelined (with the address phase of one instruction overlapping the data phase of the previous instruction), in which case the following would take only 9 cycles?

     486:   781a      ldrb        r2, [r3, #0]
     488:   7002      strb        r2, [r0, #0]
     48a:   781a      ldrb        r2, [r3, #0]
     48c:   7042      strb        r2, [r0, #1]
     48e:   781a      ldrb        r2, [r3, #0]
     490:   7082      strb        r2, [r0, #2]
     492:   781a      ldrb        r2, [r3, #0]
     494:   70c2      strb        r2, [r0, #3]

2) Read into multiple registers: I am assuming these instructions take 5 cycles.

     486:   781a      ldrb        r2, [r3, #0]
     48a:   781a      ldrb        r4, [r3, #0]
     48e:   781a      ldrb        r5, [r3, #0]
     492:   781a      ldrb        r6, [r3, #0]

I appreciate any insight you can provide.

Thanks,

Parents
  • Thank you, Joseph, this definitely helps a lot in understanding what to do and how to do it.

    As I understand it, it's a good idea to use 16-bit instructions (and to align them on a 32-bit boundary when that can be done by swapping two instructions).

    I was very much surprised by the LPC1768; perhaps my measurement results weren't all wrong after all.

    From your reply, it appears to be a good idea to place some integer instructions (e.g. ADD, SUB, CMP, AND, ORR, EOR, shifts) right after STR-type instructions: the LDR/STR block then stays short and is not "exhausted", so there should be no pipeline bubbles. Or am I getting this part wrong?

Children
  • Hi Jens,

    For best performance on Cortex-M3/M4, it is generally good to pipeline LDR and STR instructions, i.e. to place them back to back. (This does not apply to Cortex-M0, M0+ or M7.)

    This reduces each subsequent LDR/STR instruction to 1 cycle (assuming zero wait states and no unaligned or bit-band transfers).

    If you insert other operations between the LDR/STR instructions, each LDR takes 2 cycles (an STR can still take 1 cycle because of the write buffer).

    Ideally, use the 16-bit LDR/STR encodings for this (which also benefits code size).

    If you need the 32-bit versions, try to make sure that the pipelined LDR/STR instructions are aligned to 32-bit addresses.

    regards.

    Joseph
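
    The rule of thumb in this reply can be turned into a small host-side estimator. The following is only a sketch under the stated assumptions (zero wait states, aligned accesses, no bit-band transfers, no fetch or alignment stalls); the names `op_t` and `estimate_cycles` are hypothetical and not part of any ARM tool or library.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction classes distinguished by the rule of thumb in the reply. */
typedef enum { OP_LDR, OP_STR, OP_ALU } op_t;

/*
 * Estimate the cycle count of a straight-line sequence, ASSUMING:
 *   - an LDR directly after another LDR/STR is pipelined: 1 cycle;
 *   - any other LDR (first in the sequence, or after an ALU op): 2 cycles;
 *   - an STR costs 1 cycle (write buffer); an ALU op costs 1 cycle.
 * This is a simplified model, not a statement about any specific chip.
 */
unsigned estimate_cycles(const op_t *seq, size_t n)
{
    assert(seq != NULL || n == 0);

    unsigned cycles = 0;
    int prev_was_mem = 0;   /* nothing precedes the first instruction */

    for (size_t i = 0; i < n; i++) {
        if (seq[i] == OP_LDR)
            cycles += prev_was_mem ? 1u : 2u;
        else
            cycles += 1u;   /* STR and ALU ops are both modelled as 1 cycle */
        prev_was_mem = (seq[i] != OP_ALU);
    }
    return cycles;
}
```

    Fed with the two sequences from the question (four LDRB/STRB pairs, and four LDRB in a row), this model gives 9 and 5 cycles, matching the figures assumed there. Reads from a memory-mapped peripheral may add wait states that the model ignores, so a measurement on the actual part is still needed.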

  • Thank you for the detailed reply; this sounds great, because often I do the following ...

         .rept 200
         ldr rX,[rS,#imm]
         str rX,[rT,#imm]
         (optionally integer instructions here)
         .endr

    ...so that integer instructions (e.g. ADD, SUB, AND, ORR, EOR, shifts) come right after STR and before LDR, never after LDR.

    But in order to make my question more clear:

    A:

         .rept 20
              .rept 10
                   ldr rX,[rS,#imm]
                   str rX,[rT,#imm]
              .endr
              .rept 10
                   and rX,rX,#imm
              .endr
         .endr

    B:

         .rept 20
              .rept 10
                   ldr rX,[rS,#imm]
                   str rX,[rT,#imm]
                   and rX,rX,#imm
              .endr
         .endr

    As I understand it, example B would not suffer from bubbles in the pipeline: nothing gets pipelined after an STR, so it does not matter which instruction is placed after the STR. Is that correct?

    Would bubbles appear in the pipeline in example A (because of the long run of LDR/STR), or would the two examples be equally efficient?
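
    To make the comparison concrete, the two patterns can be costed with simple arithmetic. This is a rough sketch, not a measurement: it assumes zero wait states, 1 cycle per STR and per AND, 2 cycles for an LDR that follows an AND (or starts the code), and it ignores fetch and alignment effects. The cost of an LDR that directly follows an STR is left as a parameter, since whether it is 1 cycle (pipelined) or 2 cycles (nothing pipelined after an STR) is exactly what is being asked. The function names are hypothetical.

```c
#include <assert.h>

/* Example A: 20 x ( 10 x (LDR, STR) followed by 10 x AND ). */
unsigned cycles_example_a(unsigned ldr_after_str)
{
    const unsigned outer = 20, pairs = 10, alus = 10;
    assert(ldr_after_str == 1 || ldr_after_str == 2);

    /* First LDR of each block follows an AND (or nothing): 2 cycles.   */
    /* The other pairs-1 LDRs each follow an STR: cost is the parameter. */
    unsigned block = 2 + (pairs - 1) * ldr_after_str  /* loads  */
                   + pairs                            /* stores */
                   + alus;                            /* ANDs   */
    return outer * block;
}

/* Example B: 20 x 10 x (LDR, STR, AND); every LDR follows an AND. */
unsigned cycles_example_b(void)
{
    const unsigned outer = 20, inner = 10;
    return outer * inner * (2 + 1 + 1);
}
```

    Under these assumptions, example B costs 800 cycles either way. Example A costs 620 cycles if an LDR after an STR takes 1 cycle, and 800 cycles if it takes 2, in which case the two examples come out equal. Measuring both on the target would show which case applies.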