Cortex A8 Instruction Cycle Timing

Note: This was originally posted on 17th March 2011 at http://forums.arm.com

Hi, sorry for my bad English.

I need to work out the latency of two instructions, and all I have is the ARM Cortex-A8 documentation (chapter 16).
But I have no idea how to do this using that documentation.
  • Note: This was originally posted on 11th April 2011 at http://forums.arm.com

    The problem is caused by your branch.

    You can't simply assume that a branch takes 1 cycle just because it is in the cache.
    Branches are more complex than they seem.

    Try this test procedure. It will make it easier to understand the time taken by your program.


    movw r0, #0x0500                   @ loop count 0x04F60500: the loop repeats 83232000 times
    movt r0, #0x04F6
    .loop:
    nop                                    @ your NOPs under test go here
    nop
    nop
    nop
    nop
    nop
    nop
    nop

    smuad   r1, r1, r1                 @ the ending code (from here to bgt) reliably takes 5 cycles
    nop
    nop
    smuad   r2, r2, r2
    nop
    subs   r0, r0, #1
    smuad   r3, r3, r3
    bgt   .loop

    bx lr


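    If you don't put in any NOPs (I mean your own NOPs; don't remove the NOPs between the smuad instructions),
    the program should take about 0.52 s. This makes sense because your BeagleBoard runs at 800 MHz and 5 * 83232000 ~= 416M cycles.
    Every time you add 2 NOPs, your program will take about 0.10 s more (the Cortex-A8 can issue two instructions per cycle, so a pair of NOPs costs one extra cycle per iteration).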
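
    As a rough cross-check of these figures, the expected run time is simply iterations * cycles per iteration / clock frequency. The Python sketch below only redoes that arithmetic. The function name `expected_seconds` is made up for illustration, and the 5-cycle and 800 MHz values are the ones assumed in this post, not measurements.

```python
def expected_seconds(iterations, cycles_per_iteration, clock_hz):
    """Estimated run time of a loop, ignoring setup code and cache misses."""
    return iterations * cycles_per_iteration / clock_hz


ITERATIONS = 0x04F60500   # value loaded by the movw/movt pair (83232000)
CLOCK_HZ = 800_000_000    # BeagleBoard clock assumed in this post

base = expected_seconds(ITERATIONS, 5, CLOCK_HZ)      # no extra NOPs
plus_two = expected_seconds(ITERATIONS, 6, CLOCK_HZ)  # assumes 2 NOPs add 1 cycle

print(f"{base:.2f} s, {plus_two - base:.2f} s per NOP pair")
# prints "0.52 s, 0.10 s per NOP pair"
```

    If the measured time grows by noticeably more or less than this per NOP pair, the 5-cycle assumption for the ending code probably does not hold on your setup.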

    You could get more readable results if you repeat your loop 80,000,000 times.
    In that case use


    movw r0, #0xB400
    movt r0, #0x04C4


    instead of

    movw r0, #0x0500
    movt r0, #0x04F6
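
    The movw/movt pair loads a 32-bit constant in two 16-bit halves: movw writes the low half (and clears the high half), then movt writes the high half. A small Python sketch for deriving both immediates from any loop count follows; the helper name `movw_movt_immediates` is hypothetical.

```python
def movw_movt_immediates(value):
    """Split a 32-bit constant into the (movw, movt) 16-bit immediates."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    low = value & 0xFFFF            # goes into movw (bits 0-15)
    high = (value >> 16) & 0xFFFF   # goes into movt (bits 16-31)
    return low, high


for count in (80_000_000, 83_232_000):
    low, high = movw_movt_immediates(count)
    print(f"{count}: movw r0, #0x{low:04X} / movt r0, #0x{high:04X}")
# 80000000: movw r0, #0xB400 / movt r0, #0x04C4
# 83232000: movw r0, #0x0500 / movt r0, #0x04F6
```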