
Division with NEON

Note: This was originally posted on 30th September 2011 at http://forums.arm.com

Hi.

I have 4 unsigned 16-bit values in a Dn register (or 8 in a Qn register):

[v1] [v2] [v3] [v4]

I'm looking for code that finally produces

[65536 / v1] [65536 / v2] [65536 / v3] [65536 / v4]

in another (or the same) Dn (or Qn) register.
Thanks

Etienne
  • Note: This was originally posted on 26th September 2012 at http://forums.arm.com

    When dividing signed numbers by an unsigned value, one approach is to negate the negative numbers before the division and negate their results afterwards, so that the division itself still operates on unsigned values. This can be done with the following code:


    // 8x16-bit signed inputs are in q0
    // Elements in q1 are 0xFFFF for negative values, 0x0000 for positive (or zero) values
    vclt.s16 q1, q0, #0
    // Make negative values positive
    vabs.s16 q0, q0

    // ... Division performed here, results in q0 ...

    // Negate values that were negative. This is done by observing that neg(x) = not(x) + 1.
    // For values that were negative the field in q1 was 0xFFFF, therefore we get ((x ^ 0xFFFF) - 0xFFFF); subtracting 0xFFFF (-1 in 16-bit arithmetic) adds 1, so this is not(x) + 1.
    // For values that were positive the field in q1 was 0x0000, therefore we get (x ^ 0x0000) - 0x0000 which is just x.
    // If you can, put some other operation between these two instructions to avoid a stall.
    veor.s16 q0, q0, q1
    vsub.s16 q0, q0, q1


    Note that if the unsigned division rounds towards zero, this also rounds the negative results towards zero. This is how most CPU integer divide instructions work; if you want negative results rounded towards negative infinity instead, you'll have to do this differently.

    For the actual division, please refer to my post from October 3. Note that webshaker didn't need fully accurate results, so he was able to use the reciprocal approximation instruction by itself; if you need exact results, that won't work. He must also have been starting with 4x32-bit unsigned values, since VRECPE works on 32-bit lanes, so he probably did the conversion from 16-bit to 32-bit earlier in code he didn't show (with vmovl.u16 or similar).