How many cycles do fetching and decoding take? Is the count the same for ARM and NEON instructions?
Assuming it is correct that NEON instructions are decoded after the ARM pipeline, does that mean NEON instructions have to pass through the entire ARM pipeline first and only then get decoded?
And when does dual issue happen: after decoding but before the pipeline?
Why do NEON instructions need to be decoded twice? Isn't that a waste of time and die size?