Cortex A8 Instruction Cycle Timing

Note: This was originally posted on 17th March 2011 at http://forums.arm.com

Hi, and sorry for my bad English.

I need to work out the latency of two instructions, and all I have is the ARM Cortex-A8 documentation (chapter 16).
But I have no idea how to do this using that documentation.
  • Note: This was originally posted on 27th April 2011 at http://forums.arm.com


    Hum !!!
    You "just need" that ;)

    I can't give you the source code of the cycle counter, but I can explain how it works.
    There are two parts:
    - the general case
    - the specific case (register restriction, shortcuts, ...)

    You are at cycle #10

    1 - Before starting an instruction, the ARM checks that all the registers will be available when the instruction needs them.
    For example:
    you want to execute MUL Rd, Rm, Rs
    Rm must be available at cycle #11 (#10 + 1, see the MUL cycle table http://infocenter.ar...ch16s02s03.html)
    If at least one register is not available, the ARM does not start the instruction and you have a stall cycle.
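
As a rough illustration of the check described above, here is a minimal sketch. It is not the poster's cycle counter, whose source was not shared. The names `stall_cycles`, `sources` and `ready_at` are hypothetical, and the only timing figure taken from the post is that Rm is needed one cycle after the instruction starts. Offsets for other registers would have to come from the cycle timing tables.

```python
def stall_cycles(issue_cycle, sources, ready_at):
    """Return how many cycles an instruction must wait before it can start.

    sources:  {register: offset}, the register is needed at issue_cycle + offset
    ready_at: {register: cycle}, the cycle at which a pending result is available
    (Both tables are assumptions for illustration, not a documented model.)
    """
    stalls = 0
    for reg, offset in sources.items():
        needed = issue_cycle + offset
        # A register with no pending write is treated as already available.
        available = ready_at.get(reg, 0)
        stalls = max(stalls, available - needed)
    return stalls


# MUL at cycle #10 with Rm needed at #10 + 1, as in the post.
# Here Rm is held in r1 and an earlier instruction delivers it at cycle #13.
mul_sources = {"r1": 1}
late = stall_cycles(10, mul_sources, {"r1": 13})
on_time = stall_cycles(10, mul_sources, {"r1": 11})
```

With Rm arriving at cycle #13 the instruction waits two cycles. With Rm arriving at cycle #11 it starts immediately.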


    As far as I know, the Cortex-A8 implements forwarding hardware, a static scheduling scoreboard, replay, and a pending queue. These help to avoid data hazards between instructions. So is it correct that the kind of latency you described above is not counted?

    In my opinion, to calculate the number of executed cycles, we only have to care about penalty and stall cycles, that is: the branch-taken penalty, the replay penalty, and the branch mispredict penalty. Why didn't you mention branch penalties in your explanation?
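
The accounting proposed in this paragraph could be sketched as below. This only illustrates the idea of summing penalty events. The function name `penalty_cycles`, the event names and the per-event cycle values are all placeholders, and real values would have to be taken from the Cortex-A8 documentation.

```python
def penalty_cycles(events, penalty):
    """Sum the cycles lost to pipeline events.

    events:  {event name: number of occurrences}
    penalty: {event name: cycles lost per occurrence}
    """
    return sum(count * penalty[name] for name, count in events.items())


# Placeholder values for illustration only, not figures from the documentation.
example_penalty = {"branch_taken": 1, "replay": 2, "branch_mispredict": 3}
example_events = {"branch_taken": 4, "replay": 1, "branch_mispredict": 2}
lost = penalty_cycles(example_events, example_penalty)
```

This counts only the events listed. Any stall caused by waiting for a register, as in the quoted explanation, would have to be added separately.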

    I am new to ARM, so please forgive any silly misunderstanding.