Cortex-M7 load instruction latency and pairing

Hello,

What is the latency for the LDR instruction when the result is used for integer arithmetic operations (for example DSP MAC instructions)?

Also, can 64-bit loads (LDRD) be paired with another instruction? Can I do for example a 64-bit load and an integer MAC at the same time?

I hope ARM will add the latencies and more detailed pairing information to the reference manual soon.

Antti

  • Hello Antti,

    As I don't know the details of the Cortex-M7 microarchitecture, I measured the latencies on a real chip.

    What is the latency for the LDR instruction when the result is used for integer arithmetic operations (for example DSP MAC instructions)?

    LDR r1,[r0]      0.900 cycles

    ----------------

    LDR r1,[r0]      1.000 cycles
    MLA r3,r2,r2,r1

    ----------------

    LDR r1,[r0]      2.000 cycles
    MLA r3,r1,r1,r2

    ----------------

    LDR r1,[r0]      0.999 cycles
    MLA r3,r2,r2,r2

    The results show that when there is no register dependency, LDR and MLA execute concurrently, and when the MLA uses the loaded register as a multiplicand, wait cycles occur. However, when the LDR result is used as the addend of the MLA, the wait cycles are hidden.

    Also, can 64-bit loads (LDRD) be paired with another instruction? Can I do for example a 64-bit load and an integer MAC at the same time?

    LDRD r2,r3,[r0]  0.999 cycles

    ----------------

    LDRD r2,r3,[r0]  1.900 cycles
    MLA  r1,r0,r0,r2

    ----------------

    LDRD r2,r3,[r0]  1.900 cycles
    MLA  r1,r0,r0,r3

    ----------------

    LDRD r2,r3,[r0]  2.900 cycles
    MLA  r1,r2,r2,r0

    ----------------

    LDRD r2,r3,[r0]  2.900 cycles
    MLA  r1,r3,r3,r0

    ----------------

    LDRD r2,r3,[r0]  1.900 cycles
    MLA  r1,r0,r0,r0

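    The results show that LDRD and MLA could not be executed concurrently, and that the MLA operand position in which a loaded register is used strongly affects the latency: using it as the addend adds no extra cycle, while using it as a multiplicand adds one.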

    I hope ARM will add the latencies and more detailed pairing information to the reference manual soon.

    I think that would be too difficult, because each instruction has many timing variations depending on the conditions.

    Best regards,

    Yasuhiko Koumoto.
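
A note for anyone trying to reproduce these figures: the reply does not say how the cycle counts were obtained. Fractional values such as 0.999 or 1.900 are consistent with timing many back-to-back copies of a sequence and dividing by the repetition count, but that is an assumption. A minimal C sketch of that arithmetic follows; the function and parameter names are hypothetical, and on a Cortex-M7 the two readings would typically come from the DWT cycle counter (exposed by CMSIS as DWT->CYCCNT), which is not shown here.

```c
#include <stdint.h>

/*
 * Average cycles per repetition from two cycle-counter readings.
 * Every name here is illustrative; none of it comes from the thread.
 *
 *   start, end : 32-bit counter values read before and after the timed block
 *   overhead   : cycles measured for the same harness with an empty body
 *   reps       : number of copies of the instruction sequence that were timed
 */
double cycles_per_rep(uint32_t start, uint32_t end,
                      uint32_t overhead, uint32_t reps)
{
    if (reps == 0) {
        return 0.0;                 /* nothing was timed */
    }

    /* Unsigned subtraction stays correct if the counter wrapped once. */
    uint32_t elapsed = end - start;

    /* Remove harness overhead, clamping at zero if it was overestimated. */
    uint32_t net = (elapsed > overhead) ? (elapsed - overhead) : 0u;

    return (double)net / (double)reps;
}
```

For example, a counter delta of 1009 cycles with 10 cycles of harness overhead over 1000 repetitions gives 0.999 cycles per repetition. A small error in the overhead estimate shifts the result by a fraction of a cycle, so differences of about 0.1 between such figures may not be meaningful.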
