Pipeline Stage Read and Write

Note: This was originally posted on 5th March 2011 at http://forums.arm.com

I'm still trying to understand the cycle table of the Cortex-A8.

Most of the tests I've made assume this:
- source registers are needed at the beginning of the stage
- destination registers are released at the end of the stage.

With those rules, the results seem quite good.

For example, this code takes 3 cycles:

    add   r5, r5, #1
    mov   r6, r5

because ADD releases r5 at the end of stage 2, while MOV needs it at the beginning of stage 1.

That works, and it matches the real execution timing.

But I have a problem with the MLA shortcut:


    mul   r4, r5, r4
    mla   r0, r6, r7, r4


The MUL should release r4 at the end of stage 5 (of the second cycle of the MUL).
The MLA needs r4 at the beginning of stage 4 (due to the MLA shortcut).

So the code should take 5 cycles, but in fact it takes only 4 cycles.

Is it possible that r4 is only needed at the beginning of stage 4 of the second cycle of the MLA?
Or maybe the forwarding is done at the end of stage 4, which I could treat as the same thing as the beginning of stage 5.

That could explain the missing cycle.
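
To make the rule concrete, here is a minimal Python sketch of the timing model described above. It is only an illustration of the two rules (sources read at the beginning of a stage, results released at the end of a stage), not something taken from the TRM. The names `Producer`, `Consumer` and `total_cycles` are made up for this sketch, the count of 2 issue cycles for MUL and MLA is an assumption, and dual issue is ignored.

```python
from dataclasses import dataclass


@dataclass
class Producer:
    issue_cycles: int      # issue cycles the instruction occupies (assumed)
    result_stage: int      # result released at the END of this stage...
    result_cycle: int = 1  # ...counted from this issue cycle (1 = first)


@dataclass
class Consumer:
    issue_cycles: int      # issue cycles the instruction occupies (assumed)
    source_stage: int      # source read at the BEGINNING of this stage...
    source_cycle: int = 1  # ...counted from this issue cycle (1 = first)


def total_cycles(p: Producer, c: Consumer) -> int:
    """Cycles for a producer/consumer pair under the simple rule.

    The producer issues in cycle 0. Stage En of issue cycle k runs
    (k - 1) + (n - 1) cycles after the instruction first issues.
    """
    # First cycle in which the result can be used (the one after release).
    ready = (p.result_cycle - 1) + p.result_stage
    # How long after its own issue the consumer reads the source.
    read_offset = (c.source_cycle - 1) + (c.source_stage - 1)
    # The consumer cannot issue before the producer has finished issuing.
    issue = max(p.issue_cycles, ready - read_offset)
    return issue + c.issue_cycles


# add r5, r5, #1 ; mov r6, r5
add_mov = total_cycles(Producer(1, 2), Consumer(1, 1))

# mul r4, r5, r4 ; mla r0, r6, r7, r4 (MUL result at E5 of its second cycle)
mul = Producer(2, 5, result_cycle=2)
mla_e4_first = total_cycles(mul, Consumer(2, 4, source_cycle=1))
mla_e4_second = total_cycles(mul, Consumer(2, 4, source_cycle=2))
mla_e5_first = total_cycles(mul, Consumer(2, 5, source_cycle=1))

print(add_mov)        # 3
print(mla_e4_first)   # 5
print(mla_e4_second)  # 4
print(mla_e5_first)   # 4
```

In this model, both hypotheses (r4 read at stage 4 of the second MLA cycle, or read at the beginning of stage 5) give the measured 4 cycles, so the cycle count alone cannot tell them apart.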
  • Note: This was originally posted on 6th March 2011 at http://forums.arm.com

    Hello

    I think that the first explanation,
      r4 is only needed at the beginning of stage 4 of the second cycle of the MLA,
    is what makes the execution take 4 cycles.

    But could you please tell me where in the TRM you found that the first multiply has its result ready at E5 of the second cycle? It makes sense, and I remember reading such a detail once, but I'm unable to find it again. I would like to reread the part of the documentation that contains this information; the answer is surely hidden somewhere in there. Thanks.