NEON: Is Cortex-A7 4 times slower than Cortex-A8?

I'm looking at the Cortex-A7 cycle-timing table here:

http://hardwarebug.org/2014/05/15/cortex-a7-instruction-cycle-timings/


For example, 

VADD.F32 Dd, Dn, Dm takes 2 cycles

VADD.F32 Qd, Qn, Qm takes 4 cycles

The same goes for VMUL.

Is this really the case? I seem to remember that both take 1 cycle on Cortex-A8.

What's the benefit of using NEON in this case, then? Perhaps only compatibility, so that some parts could run 4 times faster on other NEON implementations?
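To make the arithmetic behind the "4 times" explicit, here is a small Python sketch that converts the quoted cycle counts into floats processed per cycle. It assumes the table's figures are issue costs (throughput) rather than result latencies, and it takes the 1-cycle Cortex-A8 figure from the question as given, not verified; the helper name `lanes_per_cycle` is hypothetical.

```python
def lanes_per_cycle(lanes: int, cycles: int) -> float:
    """Float lanes processed per cycle, assuming `cycles` is the issue cost
    (throughput) of one instruction, not its result latency."""
    return lanes / cycles

# Cortex-A7 figures quoted from the hardwarebug.org table.
a7_d = lanes_per_cycle(2, 2)  # VADD.F32 Dd, Dn, Dm: 2 x f32 in 2 cycles
a7_q = lanes_per_cycle(4, 4)  # VADD.F32 Qd, Qn, Qm: 4 x f32 in 4 cycles

# The poster's recollection for Cortex-A8 (1 cycle for both forms), unverified.
a8_d = lanes_per_cycle(2, 1)
a8_q = lanes_per_cycle(4, 1)

print(f"A7: D {a7_d:.1f}, Q {a7_q:.1f} floats/cycle")
print(f"A8 (as recalled): D {a8_d:.1f}, Q {a8_q:.1f} floats/cycle")
print(f"Q-form ratio A8/A7: {a8_q / a7_q:.1f}x")
```

Under those assumptions both Cortex-A7 forms work out to one float per cycle, and the 4x gap appears only for the Q form (2x for the D form).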

  • If you're running your board/device 'bare metal' and writing purely in assembly language (so you know exactly which registers are used for what), you could keep values in registers and gain a little extra speed by not reloading them all the time.

    In other words, if you're using an operating system such as Linux, this isn't really possible.

    peterharris: I've understood that basically the higher the number a core has, the faster it is.

    This seems to be true for the Cortex-A5, Cortex-A7 and Cortex-A8, but some people claim that parts of the Cortex-A8 are faster than the Cortex-A9?

    Once we move up to Cortex-A12, Cortex-A15 and Cortex-A17, my knowledge runs thin (other than that Cortex-A17 uses less power than Cortex-A15).

    Is there a good overview of the 'core speeds' and the 'NEON speeds' for the Cortex-A cores?

  • I've understood that basically the higher the number a core has, the faster it is.

    I'm not sure how well it holds up now that we have the Cortex-A50 and Cortex-A70 series, but as a very rough rule of thumb it's probably not too far off "on average". As always, there will be parts which perform slightly worse and parts which perform slightly better than that.

    but some people claim that parts of the Cortex-A8 are faster than the Cortex-A9?

    Cortex-A8 NEON is dual-issue for some pairings of instructions, whereas Cortex-A9 NEON is single-issue, so Cortex-A9 can be slower. What Cortex-A9 does a lot better than Cortex-A8 is interworking between ARM and NEON code, so the cost of moving from ARM to NEON/VFP is much lower. This is very important for normal "float" programming in C, because the soft-float ABI, which passes floating-point arguments and return values in ARM core registers, was very common back when these cores were released.
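A toy model of what dual issue buys: the sketch below counts issue cycles for an in-order instruction stream, once with pairing allowed and once without. The pairing rule (one arithmetic instruction, tagged "alu", next to one load/store instruction, tagged "ls") is a simplifying assumption for illustration, not the exact Cortex-A8 pairing rule, and the function `issue_cycles` is hypothetical. Data dependencies and multi-cycle instructions are ignored.

```python
from typing import List

def issue_cycles(seq: List[str], dual_issue: bool) -> int:
    """Toy in-order issue model: every instruction takes one issue cycle.
    With dual_issue=True, an adjacent ("alu", "ls") or ("ls", "alu") pair
    issues together in one cycle (assumed pairing rule, not the real one)."""
    cycles = 0
    i = 0
    while i < len(seq):
        if dual_issue and i + 1 < len(seq) and {seq[i], seq[i + 1]} == {"alu", "ls"}:
            i += 2  # the pair issues in the same cycle
        else:
            i += 1  # single issue
        cycles += 1
    return cycles

loop_body = ["ls", "alu", "ls", "alu", "alu", "alu"]
print(issue_cycles(loop_body, dual_issue=True))   # 4
print(issue_cycles(loop_body, dual_issue=False))  # 6
```

With this example loop body, pairing saves two of six cycles; in practice the gain depends on how well the code interleaves the two instruction classes.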

    Is there a good overview of the 'core speeds' and the 'NEON speeds' for the Cortex-A cores?

    Nothing specific for NEON that I'm aware of.

    HTH,

    Pete

  • Thank you, Peter, this definitely clears up a lot of things.