NEON vdiv.f32 syntax

Note: This was originally posted on 17th April 2012 at http://forums.arm.com

I am (re)coding a 3D math library with inline NEON assembly for iOS using the Apple LLVM compiler 3.1.

I get an error message on the following instruction:

    "vdiv.f32 q0, q1, q2 \n\t"

VFP single or double precision register expected -- `vdiv.f32 q0,q1,q2'

According to the 'Assembler Reference', page 4-76, you should specify a single-precision register. The following code works:

    "vdiv.f32 s0, s4,  s8 \n\t"
    "vdiv.f32 s1, s5,  s9 \n\t"
    "vdiv.f32 s2, s6, s10 \n\t"

I am confused because now the divide is not computed in parallel, which was the reason to use inline assembly.

Also the following instructions work as expected:

    // component wise add
    "vadd.f32 q0, q1, q2 \n\t"

    // component wise subtract
    "vsub.f32 q0, q1, q2 \n\t"

    // component wise multiply
    "vmul.f32 q0, q1, q2 \n\t"

Why do I get an error message on the vdiv and not on the vadd, vsub and vmul? Is this a compiler error?
  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com

    Do you mean that these instructions don't exist:
    VDIV.f32 d0, d1, d2  // NEON 2 float operation
    VDIV.f32 q0, q1, q2  // NEON 4 float operation

    And these instructions exist:


    VADD.f32 d0, d1, d2  // NEON 2 float operation
    VADD.f32 q0, q1, q2  // NEON 4 float operation



    Why is it not mentioned in the documentation? Does the divider use too much space on chip?


    So you have to trade speed for accuracy?

  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com

    Thanks,


    I still don't see any difference in the documentation, however. VADD, VSUB and VDIV are mentioned on the same page.
    It does not mention that you can use quad registers. Am I using the wrong documentation? Are NEON and VFP instructions not the same thing?

    See documentation

    The following instructions compile and work on the iPhone and iPad hardware.

        // component wise add
        "vadd.f32 q0, q1, q2 \n\t"

        // component wise subtract
        "vsub.f32 q0, q1, q2 \n\t"
  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com

    Thanks again. I just started with NEON yesterday, and it's twenty years since I last used assembly. Using inline NEON assembly for simple vector math already pays off. I also saw some very interesting instructions for coding out complete loops and making them even faster. I am thinking of moving some code from the GPU to NEON so the two work in parallel.
  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com


    Did you have an Acorn Archimedes?



    No, I used to program microcontrollers in assembly to control all kinds of machines: safety software for boilers, etc. Those little guys with a minimal amount of ROM/RAM. To erase the program memory you had to give them a UV sunbath. In those days you had to program your own 16-bit multiply and divide, etc. I did a good job, because they still manufacture thousands of boilers with the majority of the code 15 years old.

  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com

    There is no NEON VDIV instruction!

    For most NEON/VFP instructions, the registers define the unit used:


    Vxxx.f32 s0, s1, s2  // VFP 1 float operation
    Vxxx.f64 d0, d1, d2  // VFP 1 double operation
    Vxxx.f32 d0, d1, d2  // NEON 2 float operation
    Vxxx.f32 q0, q1, q2  // NEON 4 float operation


    But there is no NEON division.

    Have a look at VRECPE.
    This instruction returns an estimate of the reciprocal. The precision of the result is about 8 bits.
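
An estimate of that precision is usually refined with Newton-Raphson steps. As a rough sketch, write $d$ for the divisor, $x_n$ for the current approximation of $1/d$, and $e_n = 1 - d\,x_n$ for its relative error (these symbols are chosen here for illustration and are not taken from the ARM documentation). One refinement step and its effect on the error are then:

```latex
\begin{align*}
x_{n+1} &= x_n\,(2 - d\,x_n) \\
e_{n+1} &= 1 - d\,x_{n+1}
         = 1 - 2\,d\,x_n + (d\,x_n)^2
         = (1 - d\,x_n)^2
         = e_n^2
\end{align*}
```

Since the error is squared at every step, an estimate good to about 8 bits should reach roughly 16 bits after one step and be limited by single-precision rounding after two. The factor $(2 - d\,x_n)$ is what the NEON VRECPS instruction computes.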

    Etienne.
  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com


    Do you mean that these instructions don't exist:
    VDIV.f32 d0, d1, d2  // NEON 2 float operation
    VDIV.f32 q0, q1, q2  // NEON 4 float operation


    Yes, that's exactly what I mean!

    And these instructions exist:
    VADD.f32 d0, d1, d2  // NEON 2 float operation
    VADD.f32 q0, q1, q2  // NEON 4 float operation


    Yes, that's correct!


    Why is it not mentioned in the documentation? Does the divider use too much space on chip?


    This is clearly mentioned in the documentation.
    To be exact, the documentation never says that NEON has a VDIV instruction!


    So you have to trade speed for accuracy?


    In fact, yes and no.
    There is a short code sequence that lets you compute a very accurate reciprocal, which you then multiply by the numerator to get the division:


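    vrecpe.f32          d1, d5       // d1 = estimate of 1/d5 (about 8 bits)
    vrecps.f32          d2, d1, d5   // d2 = 2 - d1*d5
    vmul.f32            d1, d1, d2   // d1 = first refinement of 1/d5
    vrecps.f32          d2, d1, d5   // d2 = 2 - d1*d5
    vmul.f32            d5, d1, d2   // d5 = second refinement of 1/d5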


    You can then decide whether you want a very fast division or a very accurate one!
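
For the four-lane (q register) case asked about in this thread, the same sequence could be written with NEON intrinsics instead of inline assembly. The following is only a sketch under assumptions: `div4_recip` and `recip_estimate_model` are hypothetical names, and the non-NEON branch is a host-only stand-in that truncates an exact reciprocal to 8 mantissa bits so the refinement can be tried on a machine without NEON. It is not how VRECPE computes its estimate.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

#if !USE_NEON
/* Host-only stand-in for VRECPE (an assumption, for testing off-target):
   take the exact reciprocal and keep only the top 8 mantissa bits. */
static float recip_estimate_model(float d)
{
    float x = 1.0f / d;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFF8000u;             /* sign, exponent, 8 mantissa bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}
#endif

/* Hypothetical helper: out[i] ~= num[i] / den[i] for four lanes,
   computed as num * (1/den) with two Newton-Raphson refinements. */
void div4_recip(const float num[4], const float den[4], float out[4])
{
#if USE_NEON
    float32x4_t n = vld1q_f32(num);
    float32x4_t d = vld1q_f32(den);
    float32x4_t x = vrecpeq_f32(d);          /* rough estimate of 1/d */
    x = vmulq_f32(x, vrecpsq_f32(d, x));     /* x *= (2 - d*x), step 1 */
    x = vmulq_f32(x, vrecpsq_f32(d, x));     /* x *= (2 - d*x), step 2 */
    vst1q_f32(out, vmulq_f32(n, x));         /* num * (1/den) */
#else
    for (int i = 0; i < 4; ++i) {
        float x = recip_estimate_model(den[i]);
        x = x * (2.0f - den[i] * x);         /* step 1 */
        x = x * (2.0f - den[i] * x);         /* step 2 */
        out[i] = num[i] * x;                 /* num * (1/den) */
    }
#endif
}
```

Calling `div4_recip(num, den, out)` with two arrays of four floats fills `out` with the lane-wise quotients. Dropping one or both refinement lines trades accuracy for speed. The sketch has not been checked against edge cases such as zero, infinite or denormal divisors.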

    Etienne
  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com

    I see the problem

    Use this PDF documentation instead
    http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html

    chapter A8.8.312
    It says "Encoding T1/A1 VFPv2, VFPv3, VFPv4"

    VDIV is not a NEON instruction.

    VFP and NEON are not the same computing unit.
    ARM has decided to unify the instruction syntax, but the two units are very different!

    Etienne
  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com

    Did you have an Acorn Archimedes?