NEON pipeline stages in instruction timing

Note: This was originally posted on 3rd April 2012 at http://forums.arm.com

I'm trying to understand instruction timing on the Cortex-A8/A9 in more detail.

In the Cortex-A8 TRM, instruction timing is described in terms of stages such as E1 or N2. Do these mean pipeline stage "Execution 1" in the ARM pipeline and "Execution 2" in the NEON pipeline?
I assume some cycles are spent on fetch and decode before execution. How many cycles do fetch and decode take? Are they the same for ARM and NEON instructions?

I found a diagram of the pipeline after googling.

Is it a correct description of the A8 pipeline?

Assuming it is right, NEON instructions are decoded after the ARM pipeline. Does that mean NEON instructions have to pass through the entire ARM pipeline before they are decoded? When does dual issue happen: after decode, before entering the pipeline? Why do NEON instructions need to be decoded twice? Isn't that a waste of time and die area?

To sum up: how do I calculate the total number of cycles a NEON instruction takes, from fetch to write-back, taking dual issue into account?

Thank you so much.