I'm looking at the Cortex-A7 cycle-timing table here:
http://hardwarebug.org/2014/05/15/cortex-a7-instruction-cycle-timings/
For example,
VADD.F32 Dd, Dn, Dm takes 2 cycles
VADD.F32 Qd, Qn, Qm takes 4 cycles
The same goes for VMUL.
Is this really the case? I seem to remember that both take 1 cycle on Cortex-A8.
If so, I'm wondering what the benefit of using NEON is on Cortex-A7, other than perhaps compatibility, given that the same code could run 4 times faster on some other NEON implementations.
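To put those figures in per-lane terms, here is a small C sketch of the arithmetic behind my question. It only divides the quoted cycle counts by the number of 32-bit float lanes (2 in a D register, 4 in a Q register). The helper name `cycles_per_lane` is made up for illustration, and the 1-cycle Cortex-A8 figure is just my recollection, not something I have verified.

```c
#include <assert.h>

/* Cycle counts quoted from the linked Cortex-A7 table. */
#define A7_VADD_F32_D_CYCLES 2   /* VADD.F32 Dd, Dn, Dm */
#define A7_VADD_F32_Q_CYCLES 4   /* VADD.F32 Qd, Qn, Qm */

/* Assumption: my recollection of Cortex-A8, not a verified figure. */
#define A8_VADD_F32_Q_CYCLES 1

/* A D register holds 2 single-precision floats, a Q register holds 4. */
#define F32_LANES_D 2
#define F32_LANES_Q 4

/* Hypothetical helper: cost in cycles of one float lane of work. */
static double cycles_per_lane(int cycles, int lanes)
{
    assert(lanes > 0);
    return (double)cycles / (double)lanes;
}

/* How many times faster the recalled A8 figure would be for a Q-register add. */
static double a8_over_a7_q_speedup(void)
{
    return cycles_per_lane(A7_VADD_F32_Q_CYCLES, F32_LANES_Q) /
           cycles_per_lane(A8_VADD_F32_Q_CYCLES, F32_LANES_Q);
}
```

With the table's numbers, both the D and Q forms come out at 1 cycle per lane on Cortex-A7, so going wider buys nothing per element, and the ratio against the recalled Cortex-A8 figure is the factor of 4 I mention above.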
Thank you, Peter, this definitely clears up a lot of things.