NEON vdiv.f32 syntax

Note: This was originally posted on 17th April 2012 at http://forums.arm.com

I am (re)coding a 3D math library with inline NEON assembly for iOS using the Apple LLVM compiler 3.1.

I get an error message on the following instruction:

    "vdiv.f32 q0, q1, q2 \n\t"

VFP single or double precision register expected -- `vdiv.f32 q0,q1,q2'

According to the 'Assembler Reference', page 4-76, you should specify single-precision registers. The following code works (s0-s3, s4-s7 and s8-s11 are the lanes of q0, q1 and q2):

    "vdiv.f32 s0, s4, s8 \n\t"
    "vdiv.f32 s1, s5, s9 \n\t"
    "vdiv.f32 s2, s6, s10 \n\t"

I am confused because now the divide is not computed in parallel, which was the reason to use inline assembly.

Also the following instructions work as expected:

    // component-wise add
    "vadd.f32 q0, q1, q2 \n\t"

    // component-wise subtract
    "vsub.f32 q0, q1, q2 \n\t"

    // component-wise multiply
    "vmul.f32 q0, q1, q2 \n\t"
Why do I get an error message on vdiv but not on vadd, vsub and vmul? Is this a compiler bug?


Etienne SOBOLE over 12 years ago

Note: This was originally posted on 17th April 2012 at http://forums.arm.com

> Do you mean that these instructions don't exist:
>     VDIV.f32 d0, d1, d2 // NEON 2 float operation
>     VDIV.f32 q0, q1, q2 // NEON 4 float operation

Yes, that's exactly what I mean!

> And these instructions do exist:
>     VADD.f32 d0, d1, d2 // NEON 2 float operation
>     VADD.f32 q0, q1, q2 // NEON 4 float operation

Yes, that's correct!

> Why is it not mentioned in the documentation? Does the divider use too much space on chip?

It is clear from the documentation: it simply does not list a VDIV instruction for NEON. VDIV exists only as a VFP instruction working on one register at a time, so vdiv.f32 accepts S registers only (and vdiv.f64 D registers), which is exactly what the error message says.

> So you have to trade speed for accuracy?

In fact, yes and no. A short instruction sequence gives you a very accurate reciprocal, which you then multiply by the numerator to get the quotient:

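    vrecpe.f32 d1, d5        @ d1 = estimate of 1/d5
    vrecps.f32 d2, d1, d5    @ d2 = 2 - d1*d5
    vmul.f32   d1, d1, d2    @ d1 = refined estimate (1st Newton-Raphson step)
    vrecps.f32 d2, d1, d5    @ d2 = 2 - d1*d5
    vmul.f32   d5, d1, d2    @ d5 = 1/d5 after the 2nd step (overwrites d5)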
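Why two refinement steps are enough: vrecps.f32 computes 2 - a*b, so each vrecps/vmul pair in the sequence above performs one Newton-Raphson step towards 1/d, where d stands for one lane of d5. Writing the current estimate as x_n = (1 - e_n)/d, with e_n its relative error, the error is squared at every step. Assuming vrecpe starts with roughly 8 correct bits, one step gives about 16 and two steps go past the 24-bit float32 significand, so the result is then limited by rounding rather than by the method.

```latex
\begin{aligned}
x_{n+1} &= x_n\,(2 - d\,x_n), \qquad x_n = \frac{1 - e_n}{d} \\
x_{n+1} &= \frac{1 - e_n}{d}\,\bigl(2 - (1 - e_n)\bigr)
         = \frac{(1 - e_n)(1 + e_n)}{d}
         = \frac{1 - e_n^2}{d} \\
\Rightarrow\; e_{n+1} &= e_n^2, \qquad
e_0 \approx 2^{-8} \;\Rightarrow\; e_1 \approx 2^{-16},\; e_2 \approx 2^{-32}
\end{aligned}
```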

You can then choose between a very fast division (the vrecpe estimate alone) and a very accurate one (with the refinement steps)!
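To see the speed/accuracy trade-off without ARM hardware, here is a portable C sketch that mimics the sequence lane by lane for one q register's worth of floats. It is an illustration under assumptions, not NEON code: recip_estimate is a hypothetical stand-in for vrecpe (it truncates a true 1/d to 8 fraction bits; the real instruction uses a lookup table, so its values differ), recip_step does the 2 - a*b that vrecps computes, and divide_lanes and worst_rel_error are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4 /* one q register holds four float32 lanes */

/* Hypothetical stand-in for VRECPE: the true 1/d truncated to 8 fraction
   bits (about 8-bit accuracy). The real instruction uses a lookup table,
   so its exact results differ. */
static float recip_estimate(float d) {
    assert(d != 0.0f);
    float r = 1.0f / d;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= ~(uint32_t)0x7FFF; /* clear the low 15 of the 23 fraction bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Same arithmetic as VRECPS: 2 - a*b (rounding details may differ). */
static float recip_step(float a, float b) {
    return 2.0f - a * b;
}

/* out[i] = num[i] / den[i], computed as num[i] * (1/den[i]) where the
   reciprocal is the estimate refined by 'steps' Newton-Raphson steps. */
static void divide_lanes(const float num[LANES], const float den[LANES],
                         float out[LANES], int steps) {
    for (int i = 0; i < LANES; ++i) {
        float x = recip_estimate(den[i]);      /* like vrecpe */
        for (int s = 0; s < steps; ++s)
            x = x * recip_step(x, den[i]);     /* like vrecps + vmul */
        out[i] = num[i] * x;                   /* multiply by the numerator */
    }
}

/* Largest relative error of out[] against num[]/den[] computed in double;
   assumes no num[i] is zero. */
static double worst_rel_error(const float num[LANES], const float den[LANES],
                              const float out[LANES]) {
    double worst = 0.0;
    for (int i = 0; i < LANES; ++i) {
        double exact = (double)num[i] / (double)den[i];
        double rel = ((double)out[i] - exact) / exact;
        if (rel < 0.0)
            rel = -rel;
        if (rel > worst)
            worst = rel;
    }
    return worst;
}
```

Calling divide_lanes with steps = 0, 1 and 2 corresponds to the fast, intermediate and accurate variants; for typical inputs the worst error should fall from roughly 2^-8 to roughly 2^-16 and then to float32 rounding level. On real hardware the same structure maps onto the q-register forms of vrecpe/vrecps/vmul (or the vrecpeq_f32 and vrecpsq_f32 intrinsics), which handle all four lanes at once.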

Etienne