
Pipeline Stage Read and Write

Note: This was originally posted on 5th March 2011 at http://forums.arm.com

I'm still trying to understand the cycle timing table of the Cortex-A8.

Most of the tests I've made suggest this:
- source registers are needed at the beginning of the stage
- destination registers are released at the end of the stage.

With those rules, the results seem quite good.

For example, this code takes 3 cycles:

    add   r5, r5, #1
    mov   r6, r5

because ADD releases r5 at the end of stage 2, while MOV needs it at the beginning of stage 1.

That works, and it matches the real execution timing.
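The two rules above can be written as a small timing model. This is only a sketch of the rule of thumb from this post, not of the real Cortex-A8 issue logic: the function name `earliest_issue` is hypothetical, and it assumes a single in-order pipeline where an instruction issued in cycle t sits in stage k during cycle t + k - 1.

```python
def earliest_issue(producer_issue, result_stage, source_stage):
    """Earliest cycle at which a dependent instruction can issue.

    Assumed model: the producer's result is ready at the END of
    result_stage, and the consumer reads its source at the BEGINNING
    of source_stage.
    """
    # First cycle in which the produced value can be consumed.
    ready = producer_issue + result_stage
    # The consumer reaches source_stage (source_stage - 1) cycles after it issues.
    dependent = ready - (source_stage - 1)
    # In-order issue: the consumer can never issue before the following cycle.
    return max(producer_issue + 1, dependent)


# add r5, r5, #1  -> r5 released at the end of stage 2
# mov r6, r5      -> r5 needed at the beginning of stage 1
add_issue = 0
mov_issue = earliest_issue(add_issue, result_stage=2, source_stage=1)
total_cycles = mov_issue + 1  # the mov occupies one issue cycle
print(mov_issue, total_cycles)  # prints: 2 3
```

With these assumptions the MOV stalls for one cycle, which gives the 3 cycles measured for the pair.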

But I have a problem with the MLA shortcuts:


    mul   r4, r5, r4
    mla   r0, r6, r7, r4


The MUL should release r4 at the end of stage 5 (of the second cycle of the MUL).
The MLA needs r4 at the beginning of stage 4 (due to the MLA shortcut).

So the code should take 5 cycles, but in fact it takes only 4 cycles.

Is it possible that r4 is only needed at the beginning of stage 4 of the second cycle of the MLA?
Or maybe the forwarding is done at the end of stage 4, which I could treat as the same thing as the beginning of stage 5.

That could explain the missing cycle.
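To check that arithmetic, here is a minimal sketch of the same rule of thumb extended to multi-cycle instructions. The function `pair_cycles` and its parameters are hypothetical names, the stage numbers are the ones quoted in this post, and the model assumes one in-order pipeline with the producer issuing in cycle 0. It is not a description of the actual forwarding hardware.

```python
def pair_cycles(prod_cycles, result_stage, cons_cycles, source_stage, source_cycle=1):
    """Total issue cycles for a producer followed by one dependent consumer.

    prod_cycles, cons_cycles: issue cycles each instruction occupies.
    result_stage: stage, in the producer's LAST cycle, at whose end the result is ready.
    source_stage: stage at whose beginning the consumer reads the operand.
    source_cycle: which of the consumer's cycles (1-based) performs that read.
    """
    # The producer's last pass enters stage 1 in cycle prod_cycles - 1.
    ready = (prod_cycles - 1) + result_stage
    # Cycles between the consumer's issue and the moment it reads the operand.
    needed_offset = (source_cycle - 1) + (source_stage - 1)
    # The consumer cannot issue before the producer has finished issuing.
    issue = max(prod_cycles, ready - needed_offset)
    return issue + cons_cycles


# mul r4, r5, r4 : 2 cycles, r4 ready at the end of stage 5 of the second cycle
# mla r0, r6, r7, r4 : 2 cycles, r4 read according to each hypothesis below
print(pair_cycles(2, 5, 2, source_stage=4))                   # stage 4, first cycle: 5
print(pair_cycles(2, 5, 2, source_stage=4, source_cycle=2))   # stage 4, second cycle: 4
print(pair_cycles(2, 5, 2, source_stage=5))                   # stage 5, first cycle: 4
```

Under this model both hypotheses remove the stall and give the measured 4 cycles, so this pair alone cannot tell them apart.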
  • Note: This was originally posted on 23rd March 2011 at http://forums.arm.com


    Sorry, but what do you mean by the word "shortcut"?

    And how do you know that mul followed by mla takes 4 cycles?
    I mean, how do you test it?


    Oops, RUBO!
    I didn't see that you had already given the explanation of shortcuts to Vahag. Sorry!

    For a real benchmark, the best solution I have found is to put the instructions into a loop and measure the real time taken ;)

    There is a cycle counter register on the Cortex-A8, but I never succeeded in using it on my Linux distribution!
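Turning a loop timing into a cycle count is simple arithmetic. The sketch below shows one way to do it, assuming the core clock is fixed and known and that an otherwise identical empty loop has been timed to estimate the loop overhead. The helper names and all numbers in the example are made up for illustration and are not measurements.

```python
def cycles_per_iteration(elapsed_s, iterations, clock_hz):
    """Convert a wall-clock loop time into core cycles per iteration."""
    return elapsed_s * clock_hz / iterations


def sequence_cycles(with_seq_s, empty_loop_s, iterations, clock_hz):
    """Cycles attributable to the tested sequence, after subtracting
    the time of an empty loop with the same iteration count."""
    return cycles_per_iteration(with_seq_s - empty_loop_s, iterations, clock_hz)


# Illustrative values only: a 1 GHz clock, one million iterations,
# 2 ms for the empty loop and 6 ms with the sequence inside it.
estimate = sequence_cycles(with_seq_s=0.006, empty_loop_s=0.002,
                           iterations=1_000_000, clock_hz=1_000_000_000)
print(round(estimate, 2))
```

The subtraction is only approximate, because the loop's own instructions may overlap with the tested sequence, so it helps to repeat the sequence several times inside the loop body and to keep the clock frequency from changing during the run.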