Cortex A8 Instruction Cycle Timing

Note: This was originally posted on 17th March 2011 at http://forums.arm.com

Hi, sorry for my bad English.

I need to work out the latency of two instructions, and all I have is the ARM Cortex-A8 documentation (chapter 16).
But I have no idea how to do this using that documentation.

Dung Tran over 10 years ago

Note: This was originally posted on 28th April 2011 at http://forums.arm.com

Hum. You start with very complex questions!

First, I do not understand what you mean by "static scheduling scoreboard, replay and pending queue".
And I do not really understand what ARM calls a "data hazard" ;(

What I can say is that if you apply the stage rules described in the ARM documentation to count cycles, you will get a "fairly" correct result.

After that, there are a lot of special cases (not always documented) that can improve the accuracy of the count:
shortcuts (or fast forwarding), for example.

Thank you very much, Etienne. I am sorry for my unclear questions.

"Static scheduling scoreboard, replay and pending queue" are parts of the Cortex-A8 pipeline. You can refer to this document: here

Is it true that you ignore shortcuts (forwarding, or bypassing) in your method of counting cycles? If so, an instruction whose operand is produced by the previous instruction may have to wait 1 cycle (or more).
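
To make the dependency question concrete, here is a rough sketch of counting issue cycles with result latencies. It assumes a single-issue, in-order pipeline, and the `Instr` type, the `issue_cycles` helper and the latency numbers are all made up for illustration; they are not taken from the Cortex-A8 documentation, and real figures would have to come from the cycle timing chapter.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    dest: Optional[str]      # destination register, or None
    srcs: Tuple[str, ...]    # source registers
    latency: int             # hypothetical: cycles from issue until a dependent can issue


def issue_cycles(program):
    """Return [(name, issue_cycle)] for a single-issue, in-order model."""
    ready = {}   # register -> first cycle at which its value can be consumed
    cycle = 0    # earliest cycle the next instruction may issue (in-order)
    schedule = []
    for ins in program:
        earliest = cycle
        for reg in ins.srcs:
            # Stall until every source operand is available.
            earliest = max(earliest, ready.get(reg, 0))
        schedule.append((ins.name, earliest))
        if ins.dest is not None:
            ready[ins.dest] = earliest + ins.latency
        cycle = earliest + 1
    return schedule


program = [
    Instr("mul", "r0", ("r1", "r2"), 3),   # hypothetical 3-cycle result latency
    Instr("add", "r3", ("r0", "r4"), 1),   # depends on the mul result
    Instr("sub", "r5", ("r6", "r7"), 1),   # independent
]
schedule = issue_cycles(program)
```

In this model the `add` waits for `r0`, and because issue is in order the independent `sub` waits behind it. A forwarding path would show up simply as a smaller `latency` value for the producing instruction.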

Branch mispredict penalty: you can't account for this kind of stall because you can't know when the ARM will mispredict a branch. It's the same problem with memory reads that miss the cache!
So you can just assume that in most cases you don't get those stall cycles, and ignore them.

I found a description of the mispredict penalty in the Cortex-A8 documentation. It occurs when the target address predicted by "program flow prediction" differs from the target address generated in Execution (E5). I think we can trap the instructions that cause a mispredict penalty. Please refer to here

The Cortex "can start" 4 instructions in the same cycle.
Don't believe you'll be able to execute 4 instructions every cycle! That's wrong!
But in some cases, in some cycles, the Cortex can start 4 instructions (2 ARM and 2 NEON) in the same cycle.

Yes, I still can't picture that. In particular, in the IF stage the pipeline fetches 4 instructions in the same cycle. I don't know how it handles the case where one of these 4 instructions is a branch. If you have any related material, please let me know.

I do not model the 13 pipeline stages. I handle instructions when they enter a functional unit.
The cycle counter is not so complex (in fact, the decode steps are not useful for counting cycles, I guess).

What do you mean by "functional unit"? I agree that the ID step is not useful. But in the IF step, a taken branch causes a 1-cycle penalty. I think we need to account for this case, right?
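
As a sketch of how branch costs could be added on top of the issue count, assuming a recorded trace of branch outcomes is available: the 1-cycle taken-branch cost is the figure mentioned above, while the 13-cycle mispredict cost is my assumption and should be checked against the Cortex-A8 documentation. The names `branch_penalty_cycles`, `TAKEN_BRANCH_PENALTY` and `MISPREDICT_PENALTY` are made up for this example.

```python
TAKEN_BRANCH_PENALTY = 1   # figure quoted in this post for a taken branch
MISPREDICT_PENALTY = 13    # assumption: verify against the Cortex-A8 documentation


def branch_penalty_cycles(trace):
    """Sum the extra cycles for a trace of (taken, predicted_correctly) pairs."""
    total = 0
    for taken, predicted_ok in trace:
        if not predicted_ok:
            # A mispredicted branch pays the full pipeline refill cost.
            total += MISPREDICT_PENALTY
        elif taken:
            # A correctly predicted taken branch only pays the fetch bubble.
            total += TAKEN_BRANCH_PENALTY
    return total


# Hypothetical trace: taken and predicted, not taken and predicted, taken but mispredicted.
trace = [(True, True), (False, True), (True, False)]
extra = branch_penalty_cycles(trace)
```

The total would then be the issue-cycle count plus `extra`. Without a trace, the same constants can only give an estimate based on an assumed misprediction rate.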

Dung!
