NEON vdiv.f32 syntax

Note: This was originally posted on 17th April 2012 at http://forums.arm.com

I am (re)coding a 3D math library with inline NEON assembly for iOS using the Apple LLVM compiler 3.1.

I get an error message on the following instruction:

    "vdiv.f32 q0, q1, q2 \n\t"

VFP single or double precision register expected -- `vdiv.f32 q0,q1,q2'

According to the 'Assembler Reference', page 4-76, you have to specify single-precision registers. The following code works:

    "vdiv.f32 s0, s4,  s8 \n\t"
    "vdiv.f32 s1, s5,  s9 \n\t"
    "vdiv.f32 s2, s6, s10 \n\t"

I am confused because now the divide is not computed in parallel, which was the reason to use inline assembly.

Also the following instructions work as expected:

    // component-wise add
    "vadd.f32 q0, q1, q2 \n\t"

    // component-wise subtract
    "vsub.f32 q0, q1, q2 \n\t"

    // component-wise multiply
    "vmul.f32 q0, q1, q2 \n\t"

Why do I get an error message on the vdiv and not on the vadd, vsub and vmul? Is this a compiler error?

  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com


    Do you mean that these instructions don't exist:
    VDIV.f32 d0, d1, d2  // NEON 2 float operation
    VDIV.f32 q0, q1, q2  // NEON 4 float operation


    Yes that's exactly what I mean !

    And these instructions exist:
    VADD.f32 d0, d1, d2  // NEON 2 float operation
    VADD.f32 q0, q1, q2  // NEON 4 float operation


    Yes that's correct !


    Why is it not mentioned in the documentation? Does the divider use too much space on chip?


    This is clearly stated in the documentation.
    To be exact, the documentation never says that NEON has a VDIV instruction !


    So you have to trade speed for accuracy?


    In fact, yes and no.
    There is a short code sequence that lets you perform a very accurate division.


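    vrecpe.f32          d1, d5        // d1 = rough estimate of 1/d5
    vrecps.f32          d2, d1, d5    // d2 = 2 - d1*d5
    vmul.f32            d1, d1, d2    // d1 = refined estimate (1st Newton-Raphson step)
    vrecps.f32          d2, d1, d5    // d2 = 2 - d1*d5
    vmul.f32            d5, d1, d2    // d5 = 1/d5 (2nd step); multiply by the numerator to divide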


    You can then decide if you want a very fast division, or a very accurate one !
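
For anyone who prefers C over inline assembly, here is a minimal sketch of the same reciprocal trick applied to four floats at once, with the number of refinement steps as a parameter so you can pick speed or accuracy. The function name div4_recip and its parameters are my own, not from any library. On a NEON target it uses the intrinsics vrecpeq_f32, vrecpsq_f32 and vmulq_f32 from arm_neon.h. On any other target there is no hardware estimate, so the fallback fakes one by perturbing an exact reciprocal by about 0.4%, which is an assumption chosen to roughly resemble a coarse estimate, only so the refinement arithmetic can be tried anywhere. Zero, infinite and NaN divisors are not handled.

```c
#include <assert.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = num[i] / den[i] for i = 0..3.
 * The quotient is computed as num * (1/den), where 1/den starts as a
 * rough estimate and is refined by `steps` Newton-Raphson iterations:
 *     x = x * (2 - x * den)
 * Each iteration roughly squares the relative error of x. */
static void div4_recip(const float num[4], const float den[4],
                       float out[4], int steps)
{
    assert(steps >= 0);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t d = vld1q_f32(den);
    float32x4_t x = vrecpeq_f32(d);            /* rough estimate of 1/d    */
    for (int i = 0; i < steps; ++i) {
        /* vrecpsq_f32(x, d) returns 2 - x*d */
        x = vmulq_f32(x, vrecpsq_f32(x, d));
    }
    vst1q_f32(out, vmulq_f32(vld1q_f32(num), x));
#else
    /* Portable stand-in, one lane at a time (assumption: the rough
     * estimate is faked as an exact reciprocal that is 0.4% too large). */
    for (int k = 0; k < 4; ++k) {
        float d = den[k];
        float x = (1.0f / d) * 1.004f;         /* deliberately coarse      */
        for (int i = 0; i < steps; ++i) {
            x = x * (2.0f - x * d);            /* same step as vrecps+vmul */
        }
        out[k] = num[k] * x;
    }
#endif
}
```

Calling div4_recip(num, den, out, 0) gives the fastest and least accurate result, while 2 steps matches the assembly sequence above and should be close to full single precision for ordinary inputs.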

    Etienne