Questions regarding NEON

CFriebel over 10 years ago

Hi,

For a project on digital signal processing on ARM SoCs, I'm currently gathering information about the ARM NEON engine and would like to check whether my assumptions are correct.

I found an instruction timing table in the "Cortex-A9 NEON Media Processing Engine Technical Reference Manual" with columns like "Cycles", "Result" and "Writeback".

For example, these values are given for the VMLA Advanced SIMD floating-point instruction:

VMLA | Dd,Dn,Dm | 1 | 3,2,2 | 9 | 10 |

Do I need to add the values of the Cycles, Result and Writeback fields to get the duration of the VMLA instruction, so that it takes 20 cycles in total (1 + 9 + 10) until the result is written back to the register file? Or is the result already in the register file 10 cycles after execution of the instruction begins?

In other words: do the 10 Writeback cycles cover only the writeback itself, or do they already include the Result and Cycles durations?

I read that with NEON it's possible to do SIMD single precision x4.

Am I correct in assuming that "single precision" in NEON means 32-bit floating point (IEEE-754)?

For a MAC (VMLA), this would mean a 32-bit x 32-bit multiply with a 64-bit product that is added to a 64-bit accumulator, correct?

And does the x4 mean that this can be done 4 times in parallel?

How many cycles would it take to have the 4 results in the register file then?
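
To make the "x4" question concrete, here is a minimal C sketch of a four-lane single-precision multiply-accumulate. It assumes the `vmlaq_f32` intrinsic from `arm_neon.h` when compiled for a NEON target and falls back to a plain loop elsewhere; `mac_f32x4` is a made-up helper name for this sketch only. In the intrinsic's type signature the accumulator lanes are `float32`, the same as the operands. The sketch says nothing about cycle counts, which depend on the timing table.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: acc[i] += a[i] * b[i] for four float32 lanes. */
void mac_f32x4(float acc[4], const float a[4], const float b[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Load four 32-bit floats into each 128-bit Q register. */
    float32x4_t vacc = vld1q_f32(acc);
    float32x4_t va   = vld1q_f32(a);
    float32x4_t vb   = vld1q_f32(b);

    /* Per lane: vacc = vacc + va * vb. */
    vacc = vmlaq_f32(vacc, va, vb);

    vst1q_f32(acc, vacc);
#else
    /* Portable fallback with the same per-lane arithmetic. */
    for (size_t i = 0; i < 4; ++i)
        acc[i] += a[i] * b[i];
#endif
}
```

Calling `mac_f32x4` with `acc = {1, 2, 3, 4}`, `a = {1, 2, 3, 4}` and `b = {2, 2, 2, 2}` leaves `{3, 6, 9, 12}` in `acc`, one independent multiply-accumulate per lane.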

Thank you.