I'm seeing Cortex-A7 cycle-timing table here :
http://hardwarebug.org/2014/05/15/cortex-a7-instruction-cycle-timings/
For example,
VADD.F32 Dd, Dn, Dm takes 2 cycles
VADD.F32 Qd, Qn, Qm takes 4 cycles
same goes for VMUL..
Is this really the case…