
Cortex-A8 Instruction Cycle Timing

Note: This was originally posted on 17th March 2011 at http://forums.arm.com

Hi, sorry for my bad English.

I need to calculate the latency of two instructions, and all I have is the ARM Cortex-A8 documentation (chapter 16)!
But I have no idea how to do this using that documentation.
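A minimal sketch of the usual stage rule for two dependent instructions, assuming each timing-table entry is reduced to two numbers: the stage at the END of which the producer's result is available, and the stage at the START of which the consumer needs its operand. The function names and the stage numbers in the example are hypothetical illustrations, not values copied from chapter 16; look up the real numbers for your instruction pair.

```python
# Stage rule for a producer -> consumer pair (illustrative sketch).
# ASSUMPTIONS:
#   result_stage: producer's result is available at the END of stage E<result_stage>
#   source_stage: consumer needs the operand at the START of stage E<source_stage>
#   Both instructions go through E1, E2, ... at one stage per cycle, with forwarding.
#   A dependent pair is assumed never to start in the same cycle (simplification).


def issue_distance(result_stage: int, source_stage: int) -> int:
    """Minimum number of cycles between the producer's start and the consumer's start."""
    # Producer started in cycle 0 finishes stage Ep at the end of cycle p - 1.
    # Consumer started in cycle d begins stage Ec at the start of cycle d + c - 1.
    # The operand must exist by then: d + c - 1 >= p, so d >= p - c + 1.
    return max(1, result_stage - source_stage + 1)


def stall_cycles(result_stage: int, source_stage: int) -> int:
    """Cycles lost compared with starting the consumer in the very next cycle."""
    return issue_distance(result_stage, source_stage) - 1


# Hypothetical stage numbers, NOT taken from the TRM.
pairs = {
    "alu -> alu": (2, 2),    # result end of E2, operand needed at start of E2
    "load -> alu": (3, 2),   # result end of E3, operand needed at start of E2
    "alu -> shift": (2, 1),  # result end of E2, shifted operand needed at start of E1
}
for name, (p, c) in pairs.items():
    print(f"{name}: issue distance {issue_distance(p, c)}, stall {stall_cycles(p, c)}")
```

With these made-up numbers, the first pair runs back to back and the other two lose one cycle each.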
  • Note: This was originally posted on 27th April 2011 at http://forums.arm.com


    I used the above link to check the cycle counts of some ARM instructions. However, I am confused about the pipeline column.
    For example, there are "n0, n1, 0, 1" entries that happen in 1 cycle. They seem to be pipeline stages. However, the Cortex-A8 has a 13-stage pipeline and none of its stages has names like these. Also, one stage takes one cycle, right?

    Please give me some explanation.


    Hmm, you are starting with very complex questions!



    First, I do not understand what you mean by "static scheduling scoreboard, replay and pending queue".
    I also do not really understand what ARM calls a "data hazard" ;(

    What I can say is that if you apply the stage rules described in the ARM documentation to count cycles, you'll get a result that is "quite" correct.

    Beyond that, there are a lot of special cases (not always documented) that you can handle to improve the accuracy of the count,
    for example result shortcuts (forwarding).



    Branch mispredict penalty: you can't account for this kind of stall because you can't know when the ARM will mispredict a branch. It's the same problem with memory reads that miss the cache!
    So you can just assume that in most cases you don't get those stall cycles, and ignore them.
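The advice above (leave unpredictable stalls out of the count) can be sketched by keeping the table-derived count separate from optional penalty terms. This is a hypothetical helper, not something from the ARM documentation; the penalty values in the usage line are made-up inputs you would replace with your own measurements.

```python
from dataclasses import dataclass


@dataclass
class CycleEstimate:
    """Static cycle count plus optional, externally supplied stall terms (hypothetical)."""
    static_cycles: int           # counted with the stage rules, assuming no surprises
    mispredicts: int = 0         # expected number of mispredicted branches (your guess)
    mispredict_penalty: int = 0  # cycles lost per mispredict (your input, not from the TRM)
    cache_misses: int = 0        # expected number of loads that miss the cache (your guess)
    miss_penalty: int = 0        # cycles lost per miss (your input, not from the TRM)

    def best_case(self) -> int:
        # The usual approach: ignore the stalls you cannot predict.
        return self.static_cycles

    def with_penalties(self) -> int:
        # Rough estimate that adds the unpredictable stalls back in.
        return (self.static_cycles
                + self.mispredicts * self.mispredict_penalty
                + self.cache_misses * self.miss_penalty)


# Made-up numbers, only to show the arithmetic.
est = CycleEstimate(static_cycles=40, mispredicts=1, mispredict_penalty=10,
                    cache_misses=2, miss_penalty=20)
print(est.best_case(), est.with_penalties())
```

Keeping the two figures separate makes it clear which part of the estimate comes from the timing tables and which part is a guess.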



    About 0 / 1 / n0 / n1: these are not pipeline stages.
    They are the names of the 2 ARM pipelines (0 and 1) and the 2 NEON pipelines (n0 and n1).

    The Cortex "can start" 4 instructions in the same cycle.
    Don't believe you'll be able to execute 4 instructions every cycle! That's wrong!
    But in some cases, in some cycles, the Cortex can start 4 instructions (2 ARM and 2 NEON) in the same cycle.
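A much simplified check of whether a group of instructions could start in the same cycle, following only the two limits described above: at most 2 ARM and 2 NEON instructions, and no instruction reading a register written earlier in the same group. The `Instr` type and `can_start_together` function are hypothetical; the real pairing rules in the TRM add many restrictions that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    text: str
    unit: str                               # "arm" (pipes 0/1) or "neon" (n0/n1)
    dests: set = field(default_factory=set)
    srcs: set = field(default_factory=set)


def can_start_together(group: list[Instr]) -> bool:
    """Simplified same-cycle start check (NOT the full Cortex-A8 pairing rules)."""
    arm = sum(1 for i in group if i.unit == "arm")
    neon = sum(1 for i in group if i.unit == "neon")
    if arm > 2 or neon > 2:                 # only 2 ARM and 2 NEON pipelines
        return False
    written = set()
    for instr in group:                     # program order
        if instr.srcs & written:            # read-after-write inside the group
            return False
        written |= instr.dests              # register aliasing (q0 = d0:d1) is ignored
    return True


a = Instr("add r0, r1, r2", "arm", {"r0"}, {"r1", "r2"})
b = Instr("sub r3, r4, r5", "arm", {"r3"}, {"r4", "r5"})
c = Instr("add r6, r0, r1", "arm", {"r6"}, {"r0", "r1"})
n1 = Instr("vadd.i32 q0, q1, q2", "neon", {"q0"}, {"q1", "q2"})
n2 = Instr("vld1.32 {d6}, [r7]", "neon", {"d6"}, {"r7"})
print(can_start_together([a, b, n1, n2]))  # True: 2 ARM + 2 NEON, independent
print(can_start_together([a, c]))          # False: c reads r0, written by a
```

Even when this check passes, the real core may still refuse to start the group together, so treat it as a first filter only.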

    Note: I am not talking about VFP here, because VFP and NEON interaction is another problem!



    About the cycle counter:
    I do not model the 13 pipeline stages. I handle instructions when they enter a functional unit.
    The cycle counter is not that complex (in fact, I guess the decode stages are not useful for counting cycles).
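A toy version of that kind of counter: it skips fetch and decode and only tracks the cycle in which each instruction enters execution, using the "result available / operand needed" stage rule. Everything here is an assumption for illustration: the `Op` format, two-wide in-order issue, the rule that a dependent instruction never starts in the same cycle as its producer, and the stage numbers in the example. It ignores NEON, VFP, write-after-write hazards and the TRM pairing restrictions.

```python
from typing import NamedTuple


class Op(NamedTuple):
    dest: str
    srcs: tuple
    source_stage: int  # operands needed at the start of E<source_stage>
    result_stage: int  # result available at the end of E<result_stage>


def count_cycles(ops: list[Op], issue_width: int = 2) -> list[int]:
    """Return the cycle in which each op enters E1 (in-order, up to issue_width per cycle)."""
    avail_end = {}       # register -> cycle at whose end its new value is available
    producer_start = {}  # register -> start cycle of the op that writes it
    starts = []
    cycle, used = 0, 0   # current issue cycle and ops already started in it
    for op in ops:
        earliest = cycle  # in-order: never before the previous op
        for r in op.srcs:
            if r in avail_end:
                # Need cycle + source_stage - 1 >= avail_end + 1, and no same-cycle pairing.
                earliest = max(earliest,
                               avail_end[r] - op.source_stage + 2,
                               producer_start[r] + 1)
        if earliest > cycle:
            cycle, used = earliest, 0
        if used == issue_width:  # both issue slots taken in this cycle
            cycle, used = cycle + 1, 0
        starts.append(cycle)
        used += 1
        avail_end[op.dest] = cycle + op.result_stage - 1
        producer_start[op.dest] = cycle
    return starts


# Hypothetical stage numbers, NOT from the TRM.
ops = [
    Op("r0", ("r1", "r2"), 2, 2),  # add r0, r1, r2
    Op("r3", ("r4", "r5"), 2, 2),  # sub r3, r4, r5 (independent, pairs with the add)
    Op("r6", ("r0", "r3"), 2, 2),  # add r6, r0, r3 (needs both results)
    Op("r7", ("r8",), 1, 3),       # load-like op: address at E1, result at end of E3
    Op("r9", ("r7",), 2, 2),       # uses the loaded value
]
print(count_cycles(ops))  # [0, 0, 1, 1, 3]
```

In the example, cycle 2 stays empty because the last op waits for the load-like result, which is the kind of stall the stage rule is meant to expose.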

    None of this is very easy to understand.
    To start, forget NEON and its 2 pipelines (n0 and n1).
    Do some tests if you have a Cortex-A8 device.

    Etienne
