Assuming that's right, NEON instructions are decoded after the ARM pipeline. Does that mean NEON instructions have to pass through the entire ARM pipeline first and only then get decoded? And when does dual issue happen: after decoding, before the NEON pipeline? Why do NEON instructions need to be decoded twice? Isn't that a waste of time and die size?
To sum up: how do I calculate the total number of cycles a NEON instruction takes, from fetch to write-back, taking dual issue into account?