Cycle penalty before using a register as a pointer

Note: This was originally posted on 28th January 2011 at http://forums.arm.com

Hi.

It seems that you can't use a just-modified register as a load address in the very next cycle (on the Cortex-A8).


For example

ADD   r0, r0, #16
LDR   r1,[r0]


will execute in 3 cycles, not 2.
I've looked through the ARM documentation for where this penalty cycle is explained, but I can't find it.

If I simply use the Cortex-A8 cycle table:
ADD writes its result in E2,
while LDR needs r0 in E1.

So if I just apply those rules, the two instructions should execute in 2 cycles.

Can anybody tell me where this pipeline-dependent latency is explained (or even just mentioned)?

Thanks
  • Note: This was originally posted on 30th January 2011 at http://forums.arm.com

    In the 5th example in the TRM,

    ADD   r0, r0, #16
    LDR   r1,[r0]

    My understanding is that the load needs the register at the beginning of its cycle for address calculation, whereas the add only produces the new value of r0 at the end of its cycle. Forwarding therefore can't happen within the same cycle because of the limited cycle time, and an extra cycle is needed for the forwarding to take place.

    So am I right, or should we consider the real latency of the ALU and the address generation unit, and whether they can complete back to back in the same cycle?
    But are you sure you got 2 cycles when you tested again?
    If what I said holds, the same logic applies to the other two pieces of code.
    I think that is where the extra cycle comes from!
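
The end-of-cycle versus start-of-cycle argument above can be written as a small model. This is only a sketch, not the Cortex-A8's actual interlock logic: the function name `stall_cycles` and the `issue_gap` parameter are made up for illustration, and it assumes a result only becomes forwardable at the end of its producing stage while a source register is needed at the start of the consuming stage. The stage numbers (ADD result in E2, LDR base register in E1) are the ones quoted in the question.

```python
def stall_cycles(result_stage: int, source_stage: int, issue_gap: int = 1) -> int:
    """Stall cycles for a consumer issued `issue_gap` cycles after its producer.

    Assumed model (hypothetical, for illustration only):
    - the producer occupies stage E<k> during cycle k;
    - its result exists only at the END of `result_stage`;
    - the consumer needs the register at the START of `source_stage`.
    """
    first_usable = result_stage + 1      # first cycle the forwarded value can be read
    wanted = issue_gap + source_stage    # cycle the consumer would reach its source stage
    return max(0, first_usable - wanted)


# ADD writes r0 in E2; LDR needs its base register in E1.
add_then_ldr = stall_cycles(result_stage=2, source_stage=1)
print("stall:", add_then_ldr, "total cycles:", 2 + add_then_ldr)
```

Under these assumptions the pair costs one stall cycle, which matches the 3 cycles measured. With `issue_gap=2` (one independent instruction placed between the ADD and the LDR) the same model gives no stall.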