NEON: Is Cortex-A7 4 times slower than Cortex-A8?

I'm looking at the Cortex-A7 cycle-timing table here:

http://hardwarebug.org/2014/05/15/cortex-a7-instruction-cycle-timings/


For example:

VADD.F32 Dd, Dn, Dm takes 2 cycles

VADD.F32 Qd, Qn, Qm takes 4 cycles

The same goes for VMUL.

Is this really the case? I seem to remember that both take 1 cycle on Cortex-A8.

I'm wondering what the benefit of using NEON is in this case, except for compatibility reasons maybe, where some parts could run 4 times faster on some other NEON implementations?
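To spell out the arithmetic behind the question, here is a minimal sketch in C. It takes the cycle counts quoted above at face value and assumes that the 1-cycle Cortex-A8 figure, which is only my recollection, is correct. The helper names `lanes_per_cycle` and `speedup` are made up for illustration; this is not a benchmark and says nothing about pipelining or dual issue.

```c
#include <assert.h>

/* Number of 32-bit float lanes in a D (64-bit) and a Q (128-bit) register. */
enum { D_LANES = 2, Q_LANES = 4 };

/* Float lanes processed per cycle by one instruction,
   treating the quoted cycle count as the full cost of the instruction. */
double lanes_per_cycle(int lanes, int cycles)
{
    assert(cycles > 0);
    return (double)lanes / (double)cycles;
}

/* How many times faster a core needing cycles_fast is than a core
   needing cycles_slow, for the same instruction form. */
double speedup(int lanes, int cycles_slow, int cycles_fast)
{
    return lanes_per_cycle(lanes, cycles_fast)
         / lanes_per_cycle(lanes, cycles_slow);
}
```

With the table's numbers, `lanes_per_cycle(D_LANES, 2)` and `lanes_per_cycle(Q_LANES, 4)` both come out at one float per cycle. If the Cortex-A8 really needed 1 cycle for either form, `speedup(Q_LANES, 4, 1)` would be 4 and `speedup(D_LANES, 2, 1)` would be 2, so the "4 times" in the title applies to the Q-register form only.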

Parents
  • > except for compatibility reason maybe,


    We have a winner. App developers hate recompiling apps for 10 different variants of an architecture, so compatibility and ensuring that all apps run is a really important design objective.


    > where some parts could run 4 times faster on some other NEON implementations ?


    Cortex-A7 is much smaller and lower power than Cortex-A8 - if you want pure clock-for-clock performance we have plenty of cores which are faster than Cortex-A8, so it really depends what tradeoffs your design is trying to make in terms of silicon area, power, and performance.


    HTH,

    Pete

