I'm looking at the Cortex-A7 cycle-timing table here:
http://hardwarebug.org/2014/05/15/cortex-a7-instruction-cycle-timings/
For example,
VADD.F32 Dd, Dn, Dm takes 2 cycles
VADD.F32 Qd, Qn, Qm takes 4 cycles
The same goes for VMUL.
Is this really the case? I seem to remember that both take 1 cycle on Cortex-A8.
I'm wondering what the benefit of using NEON is in this case, except perhaps for compatibility, where some parts could run 4 times faster on other NEON implementations.
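As a rough sanity check on the figures quoted above, the per-element cost can be worked out directly. This is only a sketch: the helper name `cycles_per_element` is made up for illustration, and it assumes the cycle counts in the table can be read as the cost of issuing one instruction back to back, which the table itself may or may not intend.

```c
#include <assert.h>

/* Hypothetical helper: cycles spent per float element for one
 * vector instruction, given its cycle count and number of lanes. */
double cycles_per_element(int cycles, int lanes)
{
    assert(lanes > 0);
    return (double)cycles / (double)lanes;
}
```

With the quoted numbers, `cycles_per_element(2, 2)` for the D form (two 32-bit floats) and `cycles_per_element(4, 4)` for the Q form (four 32-bit floats) both come out at 1.0, so under this reading the wider Q register does twice the work in twice the time and gives no per-element gain over the D register on this core.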
> Now I realize although NEON instructions are the same, its implementation in the silicon must be quite different
Yep, exactly this.
> But it looks like I will not gain much on using NEON on the A7
If you know you are going to use only Cortex-A7 then it is unlikely you will gain too much, but NEON has some nice instructions which are not always available as scalar integer or VFP equivalents, so it does tend to be a little faster over a whole algorithm (just not the same multiplier you would get on a bigger core). What you would buy yourself is a little future portability - you could just run the same app on a different platform with a wider NEON implementation and it would automatically go faster without any extra work.
HTH,
Pete
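The portability point above can be illustrated with a small sketch using NEON intrinsics. The function name `add_f32` is hypothetical, and the scalar fallback is only there so the same source also builds on targets without NEON. The idea is that the source stays the same while the cost of each `vaddq_f32` depends on the core it runs on.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Element-wise add: dst[i] = a[i] + b[i] for i in [0, n). */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Process four floats per iteration using one Q register each. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif

    /* Scalar tail (and the whole loop when NEON is unavailable). */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Built for a core with a wider NEON implementation, the same loop should simply run faster per iteration without any source changes, which is the "future portability" described above.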
If you're using your board / device as 'bare metal', writing purely in assembly language (thus knowing exactly which registers are used for what), then you could keep things in registers and thus gain a little extra speed by not reloading all the time.
That means: If you're using an operating system - such as Linux - then this isn't really possible.
peterharris - I've understood that basically the higher number that a core has, the faster it is.
This seems to be true for the Cortex-A5, Cortex-A7 and Cortex-A8, but some people claim that parts of the Cortex-A8 are faster than the Cortex-A9?
When moving up to Cortex-A12, Cortex-A15 and Cortex-A17, my knowledge starts to run out (except that Cortex-A17 uses less power than Cortex-A15).
Is there a good overview of the 'core speeds' and the 'NEON speeds' for the Cortex-A architectures?
> I've understood that basically the higher number that a core has, the faster it is.
Not sure how well it holds up now that we have Cortex-A50 and A70 series, but as a very rough rule of thumb it's probably not too far off "on average". As always there will be bits which perform slightly worse and bits which perform slightly better than that.
> but some people claim that parts of the Cortex-A8 are faster than the Cortex-A9?
Cortex-A8 NEON is dual issue for some pairings of instruction, whereas Cortex-A9 NEON is single issue, so Cortex-A9 can go slower. What it does do a lot better than Cortex-A8 is interoperate ARM and NEON code, so the cost of moving from ARM to NEON/VFP is much lower. This is very important for normal "float" programming in C, as soft-float ABI was so common back when these cores were released.
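> Is there a good overview of the 'core speeds' and the 'NEON speeds' for the Cortex-A architectures?
Nothing specific that I am aware of for NEON.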
Thank you Peter, this definitely clears up a lot of things.