NEON vdiv.f32 syntax

Note: This was originally posted on 17th April 2012 at http://forums.arm.com

I am (re)coding a 3D math library with inline NEON assembly for iOS using the Apple LLVM compiler 3.1.

I get an error message on the following instruction:

    "vdiv.f32 q0, q1, q2 \n\t"

VFP single or double precision register expected -- `vdiv.f32 q0,q1,q2'

According to page 4-76 of the 'Assembler Reference', you have to specify single-precision registers for this instruction. The following code works:

    "vdiv.f32 s0, s4,  s8 \n\t"
    "vdiv.f32 s1, s5,  s9 \n\t"
    "vdiv.f32 s2, s6, s10 \n\t"

I am confused because now the divide is not computed in parallel, which was the reason to use inline assembly.

Also the following instructions work as expected:

    // component wise add
    "vadd.f32 q0, q1, q2 \n\t"

    // component wise subtract
    "vsub.f32 q0, q1, q2 \n\t"

    // component wise multiply
    "vmul.f32 q0, q1, q2 \n\t"

Why do I get an error message on the vdiv and not on the vadd, vsub and vmul? Is this a compiler error?

  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com

    Do you mean that these instructions don't exist:

    VDIV.f32 d0, d1, d2  // NEON 2 float operation
    VDIV.f32 q0, q1, q2  // NEON 4 float operation

    And these instructions exist:

    VADD.f32 d0, d1, d2  // NEON 2 float operation
    VADD.f32 q0, q1, q2  // NEON 4 float operation

    Why is it not mentioned in the documentation? Does the divider use too much space on chip?

    So you have to trade speed for accuracy?
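
On the speed versus accuracy question, here is a minimal sketch of the usual workaround. ARMv7 NEON has no vector divide; vdiv.f32 is a VFP instruction, which is why the assembler asks for s or d registers. The common substitute is a reciprocal estimate (VRECPE) refined with Newton-Raphson steps (VRECPS), followed by a multiply. The sketch uses the arm_neon.h intrinsics instead of inline assembly. The function name div4_approx and the choice of two refinement steps are assumptions made for illustration, not something taken from the posts above. When NEON is not available, it falls back to plain scalar division so the file still builds.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (name is an assumption, not from the thread):
 * computes out[i] ~= a[i] / b[i] for four floats. */
void div4_approx(float out[4], const float a[4], const float b[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t num = vld1q_f32(a);
    float32x4_t den = vld1q_f32(b);

    /* VRECPE: low-precision estimate of 1/den in all four lanes. */
    float32x4_t recip = vrecpeq_f32(den);

    /* VRECPS returns (2 - den * recip); multiplying that by recip is one
     * Newton-Raphson step. Two steps are an assumed choice here; fewer
     * steps are faster but less accurate. */
    recip = vmulq_f32(vrecpsq_f32(den, recip), recip);
    recip = vmulq_f32(vrecpsq_f32(den, recip), recip);

    /* a / b is approximated as a * (1 / b). */
    vst1q_f32(out, vmulq_f32(num, recip));
#else
    /* Fallback for builds without NEON: ordinary scalar division. */
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] / b[i];
    }
#endif
}
```

Each refinement step roughly doubles the number of correct bits, so the number of steps is where speed is traded for accuracy. The result is not guaranteed to match vdiv.f32 bit for bit, and special cases such as a zero divisor should be checked separately if they matter.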
