NEON: Is Cortex-A7 4 times slower than Cortex-A8?

Laurent over 10 years ago

I'm looking at the Cortex-A7 cycle-timing table here:

http://hardwarebug.org/2014/05/15/cortex-a7-instruction-cycle-timings/

For example,

VADD.F32 Dd, Dn, Dm takes 2 cycles

VADD.F32 Qd, Qn, Qm takes 4 cycles

The same goes for VMUL.

Is this really the case? I seem to remember that both take 1 cycle on Cortex-A8.

I'm wondering what the benefit of using NEON is in this case, other than perhaps compatibility, where some parts could run 4 times faster on other NEON implementations.
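
To put the quoted figures side by side, here is a rough sketch in C that turns each cycle count into cycles per 32-bit float lane (2 lanes for a D register, 4 for a Q register). The Cortex-A7 numbers are the ones from the table linked above; the Cortex-A8 number is only the 1-cycle figure recalled in this post and has not been checked. The function names are made up for illustration.

```c
#include <assert.h>

/* Cycles per 32-bit float lane for one NEON instruction.
   cycles: cycle count quoted for the instruction
   lanes:  float lanes it processes (2 for a D register, 4 for a Q register) */
double cycles_per_lane(int cycles, int lanes)
{
    assert(cycles > 0 && lanes > 0);
    return (double)cycles / (double)lanes;
}

/* Cortex-A7 figures as quoted from the timing table. */
double a7_vadd_d(void) { return cycles_per_lane(2, 2); } /* VADD.F32 Dd, Dn, Dm */
double a7_vadd_q(void) { return cycles_per_lane(4, 4); } /* VADD.F32 Qd, Qn, Qm */

/* Cortex-A8 figure as recalled in the post: an unverified assumption. */
double a8_vadd_q(void) { return cycles_per_lane(1, 4); }
```

On these numbers the A7 spends one cycle per lane whichever register width is used, and the 4x gap to the A8 holds only if the recalled 1-cycle figure is right.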

Peter Harris over 10 years ago in reply to Laurent

> Now I realize although NEON instructions are the same, its implementation in the silicon must be quite different
Yep, exactly this.
> But it looks like I will not gain much on using NEON on the A7

If you know you are only ever going to use Cortex-A7 then you are unlikely to gain much, but NEON has some nice instructions which are not always available as scalar integer or VFP equivalents, so it does tend to be a little faster over a whole algorithm (just not by the same multiplier you would get on a bigger core). What you would buy yourself is a little future portability - you could run the same app on a different platform with a wider NEON implementation and it would automatically go faster without any extra work.
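
As a rough illustration of the portability point, the sketch below adds two float arrays with NEON intrinsics from `arm_neon.h` when the compiler targets NEON, and falls back to plain C otherwise. `add_f32` is a hypothetical helper name, not something from this thread.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four lanes per iteration using Q registers. The source is identical
       on every core; the cycle cost depends on the NEON implementation. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The same source builds for a Cortex-A7 or for a core with a wider NEON datapath; only the cycles spent per `vaddq_f32` differ.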

HTH,
Pete

Jens Bauer over 10 years ago in reply to Peter Harris

If you're using your board / device as 'bare metal', writing purely in assembly language (thus knowing exactly which registers are used for what), then you could keep things in registers and thus gain a little extra speed by not reloading all the time.
That means: If you're using an operating system - such as Linux - then this isn't really possible.
peterharris - I've understood that basically the higher number that a core has, the faster it is.
This seems to hold for the Cortex-A5, Cortex-A7 and Cortex-A8, but some people claim that parts of the Cortex-A8 are faster than the Cortex-A9?
When moving up to Cortex-A12, Cortex-A15 and Cortex-A17, my knowledge starts to run out (except that Cortex-A17 uses less power than Cortex-A15).
Is there a good overview of the 'core speeds' and the 'NEON speeds' for the Cortex-A architectures?

Peter Harris over 10 years ago in reply to Jens Bauer

> I've understood that basically the higher number that a core has, the faster it is.

Not sure how well it holds up now that we have Cortex-A50 and A70 series, but as a very rough rule of thumb it's probably not too far off "on average". As always there will be bits which perform slightly worse and bits which perform slightly better than that.

> but some people claim that parts of the Cortex-A8 are faster than the Cortex-A9?

Cortex-A8 NEON is dual issue for some pairings of instructions, whereas Cortex-A9 NEON is single issue, so Cortex-A9 can be slower on NEON code. What Cortex-A9 does a lot better than Cortex-A8 is interoperate between ARM and NEON code, so the cost of moving from ARM to NEON/VFP is much lower. This is very important for normal "float" programming in C, as the soft-float ABI was so common back when these cores were released.
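
For context on the soft-float point: with the softfp calling convention, float arguments are passed in core (integer) registers even when the code itself uses VFP/NEON instructions, so calls involve ARM-to-VFP transfers. Below is a minimal sketch for checking which convention a file was built for, assuming GCC or Clang and their predefined macros. `__ARM_PCS_VFP` comes from the ARM C Language Extensions; treating `__SOFTFP__` as "pure soft-float" reflects GCC's behaviour as far as I know. `float_abi_name` is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical helper: names the float calling convention of this build.
   The macro checks are assumptions about GCC/Clang predefined macros. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM build";
#elif defined(__ARM_PCS_VFP)
    return "hard: float arguments are passed in VFP/NEON registers";
#elif defined(__SOFTFP__)
    return "soft: no FP instructions, float math is done in library code";
#else
    return "softfp: FP instructions used, float arguments passed in core registers";
#endif
}
```

With GCC the convention is chosen with `-mfloat-abi=soft`, `-mfloat-abi=softfp` or `-mfloat-abi=hard`.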

> Is there a good overview of the 'core speeds' and the 'NEON speeds' for the Cortex-A architectures?

Nothing specific I am aware of for NEON.
HTH,
Pete

Jens Bauer over 10 years ago in reply to Peter Harris

Thank you, Peter. This definitely clears up a lot of things.