Division with NEON
Etienne SOBOLE
over 12 years ago
Note: This was originally posted on 30th September 2011 at
http://forums.arm.com
Hi.
I have 4 unsigned 16-bit values in a Dn register (or 8 in a Qn register):
[v1] [v2] [v3] [v4]
I'm looking for the code to end up with
[65536 / v1] [65536 / v2] [65536 / v3] [65536 / v4]
in another (or the same) Dn (or Qn) register.
Thanks
Etienne
Gilead Kutnick
over 12 years ago
Note: This was originally posted on 3rd October 2011 at
http://forums.arm.com
vrecpe.u32 takes normalized inputs, similar to how a floating-point significand is usually stored. That means the input has no leading zeroes: the top bit (bit 31) is always 1. An input with the top bit clear just returns 0xFFFFFFFF.
Another way to look at it is that vrecpe.u32 works on values from 0.5 up to (but not including) 1.0, in the format 0.0.32: no sign bit, no whole bits, and 32 fraction bits. The result comes back in the format 0.1.31: no sign bit, 1 whole bit, and 31 fraction bits, with the top bit always 1.
The reason for this format is to limit the possible range of the calculated reciprocal, which you'll notice must be between 1.0 and 2.0. The one whole-number bit in the result is there to cover this range. Without this range limiting you couldn't define a very useful data representation for integer reciprocals, since the reciprocal of any whole number greater than 1 is a fraction.
What normalization does is convert an input x to the format:
x_normalized = x * 2^shift
x = x_normalized * 2^-shift
where the multiplication can be performed by a bit shift. Note that for the reciprocal:
x_reciprocal = 1 / x = 1 / (x_normalized * 2^-shift) = (1 / x_normalized) * 2^shift
This means that you end up performing a left shift at the end to undo the normalization. It is a left shift rather than a right shift because taking the reciprocal changes the sign of the exponent.
Then for the actual division:
a = y / x
a = y * (1 / x)
a = y * ((1 / x_normalized) * 2^shift)
a = (y * (1 / x_normalized)) * 2^shift
You can find the normalization shift with a count-leading-zeroes instruction. In your case you'll want to use vclz.u16. vrecpe.u32 needs bit 31 of each 32-bit lane set, so there is no spare integer bit to leave on the input: once the 16-bit value is widened to 32 bits, set shift equal to clz(x) + 16, where clz(x) is the 16-bit count.
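To make the shift bookkeeping concrete, here is a minimal scalar sketch in C of the normalize / estimate / denormalize sequence for one lane. It assumes that vrecpe.u32 requires bit 31 of its operand to be set and returns 0xFFFFFFFF otherwise. The names clz16, recip_estimate_u32 and approx_65536_div are hypothetical helpers, and recip_estimate_u32 is only a stand-in modelled on my reading of the architecture's estimate pseudocode, so real hardware may differ in the low bits. Because the estimate carries a 2^31 scale, the final "left shift" from the algebra folds into a single right shift by 31 - clz(x).

```c
#include <assert.h>
#include <stdint.h>

/* Leading-zero count of a non-zero 16-bit value (what vclz.u16 gives per lane). */
static unsigned clz16(uint16_t x)
{
    assert(x != 0);
    unsigned n = 0;
    while ((x & 0x8000u) == 0) {
        x = (uint16_t)(x << 1);
        n++;
    }
    return n;
}

/* Assumed scalar model of one vrecpe.u32 lane: operand is 0.32 fixed point in
 * [0.5, 1.0), result is 1.31 fixed point in [1.0, 2.0) with 8 fraction bits. */
static uint32_t recip_estimate_u32(uint32_t operand)
{
    if ((operand & 0x80000000u) == 0)
        return 0xFFFFFFFFu;                   /* input was not normalized */
    uint32_t a = (operand >> 23) * 2u + 1u;   /* top 9 bits, taken at mid-interval */
    uint32_t b = (1u << 19) / a;
    uint32_t r = (b + 1u) / 2u;               /* 9-bit estimate in [256, 511] */
    return r << 23;
}

/* Approximate 65536 / x for non-zero x, accurate to roughly 8 bits. */
static uint32_t approx_65536_div(uint16_t x)
{
    unsigned n = clz16(x);
    uint32_t xn = (uint32_t)x << (16u + n);   /* normalized: bit 31 is now set */
    uint32_t est = recip_estimate_u32(xn);    /* est / 2^31 ~= 2^32 / xn */
    return est >> (31u - n);                  /* undo normalization and scaling */
}
```

For example, approx_65536_div(3) returns 21824 against the exact 21845, which shows why the estimate alone is not enough for 16-bit results.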
However, you will not always get the correct answer using vrecpe.u32, because it's only accurate to about 8 bits. To get results that are correct to 16 bits you need to refine the estimate with Newton-Raphson iteration. That is, for an estimate y of 1 / x (here y is the reciprocal estimate, not the dividend from above),
y_refined = y * (2 - (x * y))
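As an illustration of that refinement step in fixed point, the following C sketch performs one Newton-Raphson iteration on a single lane. It assumes x is a normalized 0.32 value with bit 31 set and y is a 1.31 estimate with bit 31 set. The function name refine_reciprocal_q31 is hypothetical, and the truncating shifts are just one possible rounding choice. Each multiply needs the full 64-bit product, most of which is then discarded.

```c
#include <assert.h>
#include <stdint.h>

/* One Newton-Raphson step: y' = y * (2 - x * y).
 * x: 0.32 fixed point in [0.5, 1.0); y: 1.31 fixed point in [1.0, 2.0). */
static uint32_t refine_reciprocal_q31(uint32_t x, uint32_t y)
{
    assert(x & 0x80000000u);
    assert(y & 0x80000000u);

    /* 32 x 32 -> 64-bit product has 63 fraction bits; keep the top 31. */
    uint64_t xy = ((uint64_t)x * y) >> 32;

    /* 2.0 with 31 fraction bits is 2^32, so this is (2 - x*y), close to 1.0. */
    uint64_t t = (1ull << 32) - xy;

    /* 1.31 * 2.31 product has 62 fraction bits; drop the low 31 again. */
    uint64_t yr = ((uint64_t)y * t) >> 31;

    /* Truncation can push the result to 2.0 when x is exactly 0.5: saturate. */
    return yr > 0xFFFFFFFFull ? 0xFFFFFFFFu : (uint32_t)yr;
}
```

Starting from an 8-bit estimate, one step gives roughly 16 correct bits and a second step roughly 31, since the relative error is squared each time.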
This is kind of a pain to do in integer on NEON, because there's no integer equivalent of vrecps, and since this is a fixed-point multiplication you need the long (double-width) product only to throw out the bottom bits. Honestly, you're probably better off just converting to floating point and back. You don't even have to do the final multiplication: vcvt can convert between floating point and fixed point, so you get the multiplication by 65536 (a left shift by 16) for free. Of course, you can do something similar if you stick with integer.
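The floating-point route might look roughly like the sketch below, written with NEON intrinsics from arm_neon.h and a plain C fallback so that it also builds on other targets. The function name recip_q16_x4 is hypothetical, and the choices of two refinement steps and a saturating narrow are assumptions rather than anything stated above. Because the refined reciprocal can land slightly below the exact value and vcvt truncates, the NEON path may be one less than the exact quotient in some lanes.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] ~= 65536 / in[i] for four lanes, saturated to 0xFFFF
 * (so inputs of 0 and 1 both give 0xFFFF). */
static void recip_q16_x4(const uint16_t in[4], uint16_t out[4])
{
    assert(in != 0 && out != 0);
#if defined(__ARM_NEON)
    float32x4_t x = vcvtq_f32_u32(vmovl_u16(vld1_u16(in))); /* u16 -> u32 -> f32 */
    float32x4_t y = vrecpeq_f32(x);                         /* ~8-bit estimate */
    y = vmulq_f32(y, vrecpsq_f32(x, y));                    /* y * (2 - x*y) */
    y = vmulq_f32(y, vrecpsq_f32(x, y));                    /* second refinement */
    uint32x4_t q = vcvtq_n_u32_f32(y, 16);                  /* f32 -> fixed point, 16 fraction bits */
    vst1_u16(out, vqmovn_u32(q));                           /* saturating narrow to u16 */
#else
    /* Portable stand-in for the same computation, one lane at a time. */
    for (int i = 0; i < 4; i++) {
        if (in[i] == 0) {
            out[i] = 0xFFFFu;
            continue;
        }
        uint32_t q = (uint32_t)(65536.0f / (float)in[i]);
        out[i] = (uint16_t)(q > 0xFFFFu ? 0xFFFFu : q);
    }
#endif
}
```

The vcvtq_n_u32_f32 call with 16 fraction bits is what provides the multiplication by 65536 for free. If exact quotients matter, a final correction step (multiply back and compare against 65536) would still be needed.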