Instruction timings - Arm Cortex-M3

I am using the following assembly sequences on an ARM Cortex-M3. The first reads a memory-mapped I/O register and saves each value to RAM; the second reads the same I/O register into multiple registers. I want to know exactly how many CPU cycles each sequence takes to complete, or in other words, how fast I am reading the register.

1) Read and save to memory: can LDR-STR-LDR-STR be tightly pipelined (with the address phase of one instruction overlapping the data phase of the previous instruction), in which case the following would take only 9 cycles?

     486:   781a      ldrb        r2, [r3, #0]
     488:   7002      strb        r2, [r0, #0]
     48a:   781a      ldrb        r2, [r3, #0]
     48c:   7042      strb        r2, [r0, #1]
     48e:   781a      ldrb        r2, [r3, #0]
     490:   7082      strb        r2, [r0, #2]
     492:   781a      ldrb        r2, [r3, #0]
     494:   70c2      strb        r2, [r0, #3]

2) Read to multiple registers: I am assuming these four instructions take 5 cycles.

     486:   781a      ldrb        r2, [r3, #0]
     488:   781c      ldrb        r4, [r3, #0]
     48a:   781d      ldrb        r5, [r3, #0]
     48c:   781e      ldrb        r6, [r3, #0]
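For reference, the arithmetic behind my two estimates is simply N + 1 cycles for N back-to-back single loads/stores. Here is a minimal Python sketch of it. The function name and the 2-cycle cost for an unpipelined load/store are my own assumptions for illustration. The sketch also assumes zero wait states on the peripheral and RAM accesses and no instruction-fetch stalls, which may not hold on real hardware.

```python
def estimate_cycles(n_mem_ops: int, pipelined: bool = True, single_cost: int = 2) -> int:
    """Estimate cycles for n back-to-back single LDR/STR instructions.

    Assumptions (not measured): zero wait states, no fetch stalls, and an
    unpipelined load/store costs `single_cost` cycles.
    """
    if n_mem_ops <= 0:
        return 0
    if pipelined:
        # Every address phase overlaps the previous data phase, so only the
        # final data phase adds an extra cycle: N + 1 in total.
        return n_mem_ops + 1
    # No overlap at all: every instruction pays its full cost.
    return n_mem_ops * single_cost


print(estimate_cycles(8))                   # sequence 1, fully pipelined: 9
print(estimate_cycles(4))                   # sequence 2, fully pipelined: 5
print(estimate_cycles(8, pipelined=False))  # sequence 1, no overlap: 16
```

So my question comes down to whether the real figure for sequence 1 is closer to 9 or to 16. This is only a paper estimate, not a measurement.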

I appreciate any insight you can provide.

Thanks,

Parents
  • Thank you for the detailed reply. This sounds great, because I often do the following:

         .rept 200
         ldr rX,[rS,#imm]
         str rX,[rT,#imm]
         @ (optionally integer instructions here)
         .endr

    ...so that integer instructions (e.g. ADD, SUB, AND, ORR, EOR, shifts) come right after an STR and before the next LDR, never directly after an LDR.

    To make my question clearer:

    A:

         .rept 20
              .rept 10
                   ldr rX,[rS,#imm]
                   str rX,[rT,#imm]
              .endr
              .rept 10
                   and rX,rX,#imm
              .endr
         .endr

    B:

         .rept 20
              .rept 10
                   ldr rX,[rS,#imm]
                   str rX,[rT,#imm]
                   and rX,rX,#imm
              .endr
         .endr

    As I understand it, example B would not suffer from bubbles in the pipeline: nothing gets pipelined after an STR, so it does not matter which instruction we place after the STR. Is that correct?

    Would bubbles appear in the pipeline in example A (because of the long run of LDR/STR pairs), or would the two be equally efficient?
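To make the difference between the two layouts concrete, here is a small Python sketch that expands both nested `.rept` structures into flat instruction lists and compares them. The helper names are mine and purely illustrative. It makes no claim about cycle counts and only shows how the instruction ordering differs.

```python
from collections import Counter


def expand_a(outer: int = 20, inner: int = 10) -> list:
    """Layout A: a block of LDR/STR pairs, then a block of ANDs, repeated."""
    seq = []
    for _ in range(outer):
        seq += ["ldr", "str"] * inner
        seq += ["and"] * inner
    return seq


def expand_b(outer: int = 20, inner: int = 10) -> list:
    """Layout B: LDR, STR, AND interleaved, repeated."""
    seq = []
    for _ in range(outer):
        seq += ["ldr", "str", "and"] * inner
    return seq


def longest_mem_run(seq: list) -> int:
    """Length of the longest run of consecutive load/store instructions."""
    best = run = 0
    for op in seq:
        run = run + 1 if op in ("ldr", "str") else 0
        best = max(best, run)
    return best


a, b = expand_a(), expand_b()
print(Counter(a) == Counter(b))  # True: same mix, 200 of each instruction
print(longest_mem_run(a))        # 20 consecutive loads/stores in A
print(longest_mem_run(b))        # 2 consecutive loads/stores in B
```

Both layouts execute the same 600 instructions. The only difference is that A has runs of 20 back-to-back loads/stores, while in B every STR is followed by an AND.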
