NEON vdiv.f32 syntax

Note: This was originally posted on 17th April 2012 at http://forums.arm.com

I am (re)coding a 3D math library with inline NEON assembly for iOS using the Apple LLVM compiler 3.1.

I get an error message on the following instruction:

    "vdiv.f32 q0, q1, q2 \n\t"

VFP single or double precision register expected -- `vdiv.f32 q0,q1,q2'

According to the 'Assembler Reference', page 4-76, you should specify a single-precision register. The following code works:

    "vdiv.f32 s0, s4,  s8 \n\t"
    "vdiv.f32 s1, s5,  s9 \n\t"
    "vdiv.f32 s2, s6, s10 \n\t"

I am confused because now the divide is not computed in parallel, which was the reason to use inline assembly.

Also the following instructions work as expected:

    // component wise add
    "vadd.f32 q0, q1, q2 \n\t"

    // component wise subtract
    "vsub.f32 q0, q1, q2 \n\t"

    // component wise multiply
    "vmul.f32 q0, q1, q2 \n\t"

Why do I get an error message on the vdiv and not on the vadd, vsub and vmul? Is this a compiler error?

  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com

    Do you mean that these instructions don't exist:

    VDIV.f32 d0, d1, d2  // NEON 2 float operation
    VDIV.f32 q0, q1, q2  // NEON 4 float operation

    And these instructions exist:

    VADD.f32 d0, d1, d2  // NEON 2 float operation
    VADD.f32 q0, q1, q2  // NEON 4 float operation

    Why is it not mentioned in the documentation? Does the divider use too much space on chip?

    So you have to trade speed for accuracy?
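
For anyone landing here with the same question, a rough sketch of the usual workaround may help. ARMv7 NEON has no vector floating-point divide, so a four-lane a / b is normally built from a reciprocal estimate (VRECPE) refined by Newton-Raphson steps (VRECPS, which computes 2 - b*r) and then a multiply. The version below uses the `arm_neon.h` intrinsics rather than inline assembly. The names `div4`, `coarse_recip` and `max_rel_error` are made up for this illustration, and the non-NEON branch is only a stand-in that assumes the estimate is good to about 8 bits, so the sketch can also be tried on a desktop machine.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* out[i] = a[i] / b[i] for four lanes, via reciprocal estimate + refinement. */
static void div4(const float a[4], const float b[4], float out[4], int steps)
{
    float32x4_t vb = vld1q_f32(b);
    float32x4_t r  = vrecpeq_f32(vb);            /* VRECPE: coarse 1/b */
    for (int i = 0; i < steps; ++i)
        r = vmulq_f32(r, vrecpsq_f32(vb, r));    /* VRECPS gives (2 - b*r) */
    vst1q_f32(out, vmulq_f32(vld1q_f32(a), r));  /* a * (1/b) */
}
#else
/* Stand-in for VRECPE on non-NEON hosts (assumption: roughly 8 good bits). */
static float coarse_recip(float b)
{
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF8000u;  /* keep sign, exponent, top 8 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

static void div4(const float a[4], const float b[4], float out[4], int steps)
{
    for (int lane = 0; lane < 4; ++lane) {
        float r = coarse_recip(b[lane]);
        for (int i = 0; i < steps; ++i)
            r = r * (2.0f - b[lane] * r);        /* same Newton-Raphson step */
        out[lane] = a[lane] * r;
    }
}
#endif

/* Largest relative error of div4 against the C '/' operator on sample data. */
static float max_rel_error(int steps)
{
    const float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    const float b[4] = { 3.0f, 7.0f, 0.3f, 11.0f };
    float out[4], worst = 0.0f;
    div4(a, b, out, steps);
    for (int i = 0; i < 4; ++i) {
        float err = fabsf(out[i] - a[i] / b[i]) / fabsf(a[i] / b[i]);
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling `max_rel_error(0)` and `max_rel_error(2)` from a small `main` shows the trade-off asked about above: each refinement step costs two more vector instructions and roughly doubles the number of correct bits, so the step count is chosen according to how much accuracy the caller needs.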
