Pipeline Stage Read and Write

Note: This was originally posted on 5th March 2011 at http://forums.arm.com

I'm still trying to understand the cycle timing tables of the Cortex-A8.

Most of the tests I've made suggest this:
- source registers are needed at the beginning of the stage
- destination registers are released at the end of the stage.

With those rules, the results seem quite good.

For example, this code takes 3 cycles:

    add   r5, r5, #1
    mov   r6, r5

because ADD releases r5 at the end of stage 2, while MOV needs it at the beginning of stage 1.

That works, and it matches the real execution timing.

But I have a problem with the MLA shortcut:
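
To make the rule concrete, here is a rough Python sketch of how I count it. The function names and the 1-based cycle numbering are my own, and it assumes one instruction issued per cycle (no dual issue) with stage En of an instruction running n-1 cycles after it issues. It is only a model of the two rules above, not a description of the real hardware.

```python
def stage_cycle(issue_cycle, stage):
    """Cycle in which stage E<stage> runs for an instruction issued in issue_cycle (1-based)."""
    return issue_cycle + stage - 1


def earliest_issue(producer_issue, result_stage, source_stage):
    """Earliest cycle a dependent single-cycle instruction can issue (assumed model)."""
    # The result is released at the END of the producer's result stage,
    # so the first cycle in which it can be read is the following one.
    ready = stage_cycle(producer_issue, result_stage) + 1
    # The consumer needs the value at the BEGINNING of its source stage:
    # consumer_issue + source_stage - 1 >= ready
    by_dependency = ready - (source_stage - 1)
    # In-order issue: never earlier than the cycle after the producer.
    return max(producer_issue + 1, by_dependency)


# add r5, r5, #1  issues in cycle 1 and releases r5 at the end of stage 2
# mov r6, r5      needs r5 at the beginning of stage 1
print(earliest_issue(producer_issue=1, result_stage=2, source_stage=1))  # 3
```

The MOV cannot issue before cycle 3, which is the 3 cycles measured for the pair.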


    mul   r4, r5, r4
    mla   r0, r6, r7, r4


The MUL should release r4 at the end of stage 5 (of the second cycle of the MUL).
The MLA needs r4 at the beginning of stage 4 (due to the MLA shortcut).

So the code should take 5 cycles, but in fact it takes only 4 cycles.

Is it possible that r4 is only needed at the beginning of stage 4 of the second cycle of the MLA?
Or maybe the forwarding is done at the end of stage 4, which I could treat as the same thing as the beginning of stage 5.

Either would explain the missing cycle.
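
Here is the same counting written out as a small Python sketch, extended to instructions that take several issue cycles. Everything in it is an assumption of my model: the function names, the 1-based cycle numbering, MUL and MLA each taking 2 issue cycles, and the result of a multi-cycle instruction being released relative to its last cycle. It compares the rule as I first applied it with the two guesses above.

```python
def earliest_issue_multicycle(prod_issue, prod_cycles, result_stage,
                              source_stage, source_pass=1):
    """Earliest issue cycle of the consumer under the assumed model.

    prod_cycles  : number of issue cycles the producer takes
    result_stage : stage (of the producer's LAST cycle) at whose end the result is released
    source_stage : stage at whose beginning the consumer reads the register
    source_pass  : which issue cycle of the consumer does the read (1 = first)
    """
    last_pass_issue = prod_issue + prod_cycles - 1
    # Released at the end of cycle last_pass_issue + result_stage - 1,
    # so readable from the next cycle on.
    ready = last_pass_issue + result_stage
    by_dependency = ready - (source_stage - 1) - (source_pass - 1)
    # In-order issue: not before the producer has finished issuing.
    return max(prod_issue + prod_cycles, by_dependency)


def total_cycles(cons_issue, cons_cycles):
    """Cycle in which the consumer's last issue cycle happens."""
    return cons_issue + cons_cycles - 1


# mul r4, r5, r4      : issues in cycle 1, 2 cycles, r4 released at end of stage 5
# mla r0, r6, r7, r4  : 2 cycles, accumulator r4 read in stage 4 (MLA shortcut)
rule = earliest_issue_multicycle(1, 2, 5, source_stage=4)
guess_a = earliest_issue_multicycle(1, 2, 5, source_stage=4, source_pass=2)
guess_b = earliest_issue_multicycle(1, 2, 5, source_stage=5)

print(rule, total_cycles(rule, 2))        # 4 5  (what the rule predicts)
print(guess_a, total_cycles(guess_a, 2))  # 3 4  (r4 read in stage 4 of the MLA's second cycle)
print(guess_b, total_cycles(guess_b, 2))  # 3 4  (forwarding seen at the beginning of stage 5)
```

Both guesses give the measured 4 cycles, so this timing alone cannot tell them apart.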
  • Note: This was originally posted on 23rd March 2011 at http://forums.arm.com

    mul     r4, r5, r4
    mla     r0, r6, r7, r4


    We start in cycle 1.
    The mul instruction blocks r4 until E5, so r4 will be available in cycle 1+6=7.
    But the mla needs r4 only in E4, so here we gain 3 cycles (during these 3 cycles the two instructions execute in "parallel"),
    so the mla can use r4 only in cycle 7-3=4.
    But on your site, http://pulsar.webshaker.net/, the mla starts to execute in cycle 3. Can you please explain why?