Understanding the link between VFP and NEON

Note: This was originally posted on 14th March 2011 at http://forums.arm.com

I'd like to understand exactly how NEON can be used instead of VFP.

I understand that
FADDS
is now replaced by
VADD.f32

But I have some questions about that.
- Do the two syntaxes have the same encoding (the same hexadecimal representation)? I guess yes, but I'd like to be sure.
- FADDS can be conditional. Can anybody give me an example?
- If FADDS can be conditional, then VADD.f32 should be too. What is the correct syntax for a conditional VADD.f32?
- FADDS is now VADD.f32 and is executed in the NEON pipeline. Does that mean VADD.f32 (and FADDS) executes in 1 cycle instead of 9?

If I'm right,
FADDD is replaced by VADD.f64 but is not executed in the NEON pipeline, so the VFP cycle table must be used.
FADDD does not seem to be a conditional instruction. Is that correct?

Thanks,
Etienne

Etienne SOBOLE over 12 years ago

Note: This was originally posted on 14th March 2011 at http://forums.arm.com

OK, I've run some tests.

In this code

    fmsr  s14, r0
    ...
    fmuls s0, s14, s14
    ...
    fmrs  r0, s0
    mov   pc, lr

fmuls takes 12 cycles. That is the same time as vmul.f32 (because, as you said, it is the same instruction).
I don't know whether it uses the NEON pipeline or not. I suppose not, because it should take only 7 cycles if it did.

and in this code

    fmsr     s14, r0
    ...
    vmul.f32 d0, d7, d7
    ...
    fmrs     r0, s0
    mov      pc, lr

The vmul.f32 takes only 1 cycle!

What can be faster?
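
On the point that FADDS and the scalar VADD.f32 are the same instruction, here is a minimal Python sketch that builds the ARM-state encoding of the scalar single-precision add by hand. It assumes the field layout given in the ARMv7-A architecture manual for the VFP form of VADD. The helper name encode_vadd_f32 and the small COND table are made up for illustration. Since the top four bits are an ordinary condition field, the conditional form (FADDSEQ in the old syntax, VADDEQ.f32 in the new one) should differ from the unconditional one only in those bits.

```python
# Illustrative only: hand-encode the VFP scalar add Sd = Sn + Sm (ARM state).
# Assumed field layout: cond 1110 0 D 11 Vn Vd 101 sz N 0 M 0 Vm, with sz = 0.

# A few ARM condition codes (hypothetical helper table, not exhaustive).
COND = {"eq": 0x0, "ne": 0x1, "al": 0xE}


def encode_vadd_f32(sd, sn, sm, cond="al"):
    """Return the 32-bit word for FADDS / VADD.f32 Sd, Sn, Sm."""
    for reg in (sd, sn, sm):
        if not 0 <= reg <= 31:
            raise ValueError("single-precision registers are s0..s31")
    word = COND[cond] << 28      # condition field, bits 31:28
    word |= 0b1110 << 24         # fixed bits 27:24
    word |= (sd & 1) << 22       # D: low bit of the Sd number
    word |= 0b11 << 20           # fixed bits 21:20 for add/subtract
    word |= (sn >> 1) << 16      # Vn: upper four bits of the Sn number
    word |= (sd >> 1) << 12      # Vd: upper four bits of the Sd number
    word |= 0b101 << 9           # fixed bits 11:9; bit 8 (sz) stays 0 for f32
    word |= (sn & 1) << 7        # N: low bit of the Sn number
    word |= (sm & 1) << 5        # M: low bit of the Sm number; bit 6 = 0 is add
    word |= sm >> 1              # Vm: upper four bits of the Sm number
    return word


if __name__ == "__main__":
    # fadds s0, s14, s14  /  vadd.f32 s0, s14, s14
    print(f"{encode_vadd_f32(0, 14, 14):08x}")
    # faddseq s0, s14, s14  /  vaddeq.f32 s0, s14, s14
    print(f"{encode_vadd_f32(0, 14, 14, 'eq'):08x}")
```

This covers only the scalar S-register form. The vmul.f32 d0, d7, d7 in the second test works on D registers, which is the NEON vector form and has a different encoding.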