Cortex A8 Instruction Cycle Timing

Note: This was originally posted on 17th March 2011 at http://forums.arm.com

Hi, sorry for my bad English.

I need to count the latency of two instructions, and all I have is the ARM Cortex-A8 documentation (chapter 16).
But I have no idea how to do this using that documentation.
  • Note: This was originally posted on 28th April 2011 at http://forums.arm.com


    Hum, you start with very complex questions!

    First, I do not understand what you mean by "static scheduling scoreboard, replay and pending queue".
    I also do not really understand what ARM calls a "data hazard" ;(

    What I can say is that if you apply the stage rules described in the ARM documentation to count cycles, you'll get a "quite" correct result.

    After that, there are a lot of special cases (and they are not always documented) that can improve the quality of the counting process:
    shortcuts (or fast forwarding), for example.



    Thank you very much, Etienne. I am sorry for my unclear questions.

    About "static scheduling scoreboard, replay and pending queue": they are parts of the Cortex-A8 pipeline. You can refer to this document: here

    Is it true that you ignore shortcuts (forwarding, or bypassing) in your method of counting cycles? If so, an instruction whose operand is the result of the previous instruction may have to wait 1 cycle (or more).
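
To make the data-hazard question concrete, here is a minimal sketch of an in-order, single-issue cycle counter in which an instruction waits until its source registers are ready. The `Instr` type, the `count_issue_cycles` function, and the latencies in the example are hypothetical and chosen only for illustration; they are not taken from the Cortex-A8 timing tables, and the model ignores dual issue, forwarding paths, and every other special case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    dst: Optional[str]      # destination register, or None
    srcs: Tuple[str, ...]   # source registers
    latency: int            # cycles from issue until the result is usable (made-up values)


def count_issue_cycles(program):
    """Return the cycle in which each instruction issues (toy in-order model)."""
    ready = {}        # register -> first cycle in which its value can be read
    next_cycle = 0    # earliest cycle in which the next instruction may issue
    issue_cycles = []
    for ins in program:
        # Stall until every source operand has been produced.
        start = max([next_cycle] + [ready.get(r, 0) for r in ins.srcs])
        issue_cycles.append(start)
        if ins.dst is not None:
            ready[ins.dst] = start + ins.latency
        next_cycle = start + 1   # at most one instruction per cycle
    return issue_cycles


program = [
    Instr("mul r0, r1, r2", "r0", ("r1", "r2"), 3),  # hypothetical 3-cycle result latency
    Instr("add r3, r0, r4", "r3", ("r0", "r4"), 1),  # needs r0, so it must wait for the mul
    Instr("sub r5, r6, r7", "r5", ("r6", "r7"), 1),  # independent of both
]
print(count_issue_cycles(program))  # [0, 3, 4]
```

In this sketch the `add` issues in cycle 3 instead of cycle 1, so the dependency costs two stall cycles. A forwarding path would be modelled by lowering the producer's `latency` for that particular consumer.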


    Branch mispredict penalty: you can't handle this kind of stall cycle because you can't know when the ARM will mispredict a branch. It's the same problem with memory reads that miss the cache!
    So you can just assume that in most cases you don't have those stall cycles, and ignore them.



    I found a description of the mispredict penalty in the Cortex-A8 documentation. It happens when the target address predicted by "program flow prediction" differs from the target address generated in Execution (E5). I think we can trap instructions that cause a mispredict penalty. Please refer to here


    The Cortex "can start" 4 instructions in the same cycle.
    Don't believe you'll be able to execute 4 instructions every cycle! That's wrong!
    But in some cases, in some cycles, the Cortex can start 4 instructions (2 ARM and 2 NEON) in the same cycle.
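
The difference between "can start N instructions per cycle" and "does start N instructions every cycle" can be sketched with a toy issue model. The `count_issue_groups` function and the `(dst, srcs)` tuple format are assumptions made for this illustration; the only pairing rule modelled is a register dependency inside the group, whereas the real Cortex-A8 pairing rules are more involved.

```python
def count_issue_groups(program, width=2):
    """Count issue cycles for a list of (dst, srcs) tuples (toy model).

    Up to `width` consecutive instructions issue together, but a group ends
    early when an instruction reads or rewrites a register that an earlier
    instruction in the same group writes.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written = set()   # registers written by the current group
        issued = 0
        while i < len(program) and issued < width:
            dst, srcs = program[i]
            if dst in written or any(s in written for s in srcs):
                break     # dependency inside the group: wait for the next cycle
            written.add(dst)
            issued += 1
            i += 1
        cycles += 1
    return cycles


independent = [("r0", ("r1",)), ("r2", ("r3",)), ("r4", ("r5",)), ("r6", ("r7",))]
chain = [("r0", ("r1",)), ("r2", ("r0",)), ("r3", ("r2",)), ("r4", ("r3",))]

print(count_issue_groups(independent))  # 2
print(count_issue_groups(chain))        # 4
```

With four independent instructions the full width is used in every cycle; with a dependency chain the same hardware starts only one instruction per cycle.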



    Yes, I still can't picture that. In particular, in the IF stage, the pipeline fetches 4 instructions in the same cycle. I don't know how it handles the case where one of these 4 instructions is a branch. If you have any related material, please let me know.


    I do not handle the 13 pipeline stages. I handle instructions when they enter a functional unit.
    The cycle counter is not so complex (in fact, the decode steps are not useful for counting cycles, I guess).



    What do you mean by "functional unit"? I agree that the ID step is not useful. But in the IF step, a taken branch causes a 1-cycle penalty. I think we need to care about this case, right?

    Dung!