This discussion has been locked.

You can no longer post new replies to this discussion. If you have a question you can start a new discussion

speed up square root computation (approximation)

Hello,

I have to compute the square root of a floating point number.
A function sqrt() already exists in the math.h, but this function is too slow and the computation time depends on the input value (because of an iterative method used to compute the square root). The accuracy requirements in my case are not very high, so 2 digits after decimal point would be enough.

So I seached for an alternative method to compute a square root and found an interessting method, that uses an approximation to compute the square root (see below). But this only works on 32 bit CPUs (little endian), so I tried to port it to a ST10.

I thought this is not very difficult because the ST10 also uses little endian and is IEEE-754 compatible.

The function to approximate the square root:
float fastsqrt(float val) {
int tmp = *(int *)&val;
tmp -= 1<<23; /* Remove IEEE bias from exponent (-2^23) */
/* tmp is now an appoximation to logbase2(val) */
tmp = tmp >> 1; /* divide by 2 */
tmp += 1<<23; /* restore the IEEE bias from the exponent (+2^23) */
return *(float *)&tmp;
}

I replaced all int with long (because long is 32bit on ST10), but this doesn't work anyway.

It seems that the behavior of the shifts is different on the ST10 in combination with long numbers than on a 32-bit CPU.

Does anybody has same hints for me to make this function working?
Or has somebody an alternative function for approximate or compute a square root in a quick way on a ST10?

Thanks a lot...

Alexander

Parents

0 Drew Davis over 19 years ago

This function simply divides the exponent of the floating point number by 2. Multiplying two numbers together means you add their exponents. Dividing the exponent by two is a way to find two numbers with equal exponents that multiply to the original exponent, which is roughly equivalent to taking the square root

Consider sqrt(100). That's 10^2, which is to say 10^1 * 10^1. 2 / 2 = 1, the new exponent of 10 in the result.

Since this method neglects the mantissa entirely, it won't be very precise, nor accurate for numbers without large exponents. The example above happens to be exactly accurate, because the input is an even power of the base I'm using (10).

Consider sqrt(500). 5 * 10^2 -> 5 * 10^1. This function would produce the result "50" rather than ~22, because it doesn't take the root of the mantissa.

For really large numbers, this effect might not be as important. The square root of 5,000,000,000,000 is about 5,000,000, and the error factor of around 2 perhaps doesn't matter compared to the different of six orders of magnitude between the input and the root. For numbers close to 2^1, the error is likely much more significant.

A 256-entry lookup table for the mantissa should get you around 2 decimal digits of accuracy there (with a maximum error around 1/256th, or ~0.4%). You could look up the 8 most significant bits of the mantissa and replace it with a pre-calculated 8-bit value from the LUT. This will still be pretty quick, but cost 256 bytes of RAM.

Interpolation between table entries would further improve accuracy at the cost of speed.
Cancel
Vote up 0 Vote down

Cancel

Reply

0 Drew Davis over 19 years ago

This function simply divides the exponent of the floating point number by 2. Multiplying two numbers together means you add their exponents. Dividing the exponent by two is a way to find two numbers with equal exponents that multiply to the original exponent, which is roughly equivalent to taking the square root

Consider sqrt(100). That's 10^2, which is to say 10^1 * 10^1. 2 / 2 = 1, the new exponent of 10 in the result.

Since this method neglects the mantissa entirely, it won't be very precise, nor accurate for numbers without large exponents. The example above happens to be exactly accurate, because the input is an even power of the base I'm using (10).

Consider sqrt(500). 5 * 10^2 -> 5 * 10^1. This function would produce the result "50" rather than ~22, because it doesn't take the root of the mantissa.

For really large numbers, this effect might not be as important. The square root of 5,000,000,000,000 is about 5,000,000, and the error factor of around 2 perhaps doesn't matter compared to the different of six orders of magnitude between the input and the root. For numbers close to 2^1, the error is likely much more significant.

A 256-entry lookup table for the mantissa should get you around 2 decimal digits of accuracy there (with a maximum error around 1/256th, or ~0.4%). You could look up the 8 most significant bits of the mantissa and replace it with a pre-calculated 8-bit value from the LUT. This will still be pretty quick, but cost 256 bytes of RAM.

Interpolation between table entries would further improve accuracy at the cost of speed.
Cancel
Vote up 0 Vote down

Cancel

Children

0 Alexander Laue over 19 years ago in reply to Drew Davis

Hi Mike,

your modified code works pretty well now. Thanks a lot. :D

I did a little benchmark with this code and measure the execution time over 100 square-root computations with a timer.

This fastsqrt-function is more than 3 times faster than the normal sqrt-function and has a fixed computation time.

The only drawback is the low accuracy for small numbers. But I'm dealing with more or less large numbers and my accuracy-requirements are not very high, so this functions works fine for me.

If you need high accuracy for small numbers, Drews tip using a lookup-table for the mantissa is great.

Thanks a lot to everyone who participates on this discussion.

regards...

Alexander
Cancel
Vote up 0 Vote down

Cancel
0 Mik Kleshov over 19 years ago in reply to Alexander Laue
You can optimize the function further if you write it in assembly:

MOV R4,R8 MOV R5,R9 SUB R5,#0x3F80 SHR R4,#1 BMOV R4.15,R5.0 ASHR R5,#1 ADD R5,#0x3F80
Only 7 instructions: this should be very fast. With optimization switched on, the compiler should generate similar code. Try raising optimization level and see the generated code. If the compiler doesn't do a good job, consider writing the function in assembly.

Regards,
- mike
Cancel
Vote up 0 Vote down

Cancel