This discussion has been locked.

You can no longer post new replies to this discussion. If you have a question you can start a new discussion

speed up square root computation (approximation)

Hello,

I have to compute the square root of a floating point number.
A function sqrt() already exists in the math.h, but this function is too slow and the computation time depends on the input value (because of an iterative method used to compute the square root). The accuracy requirements in my case are not very high, so 2 digits after decimal point would be enough.

So I seached for an alternative method to compute a square root and found an interessting method, that uses an approximation to compute the square root (see below). But this only works on 32 bit CPUs (little endian), so I tried to port it to a ST10.

I thought this is not very difficult because the ST10 also uses little endian and is IEEE-754 compatible.

The function to approximate the square root:
float fastsqrt(float val) {
int tmp = *(int *)&val;
tmp -= 1<<23; /* Remove IEEE bias from exponent (-2^23) */
/* tmp is now an appoximation to logbase2(val) */
tmp = tmp >> 1; /* divide by 2 */
tmp += 1<<23; /* restore the IEEE bias from the exponent (+2^23) */
return *(float *)&tmp;
}

I replaced all int with long (because long is 32bit on ST10), but this doesn't work anyway.

It seems that the behavior of the shifts is different on the ST10 in combination with long numbers than on a 32-bit CPU.

Does anybody has same hints for me to make this function working?
Or has somebody an alternative function for approximate or compute a square root in a quick way on a ST10?

Thanks a lot...

Alexander

Parents

0 Alexander Laue over 19 years ago in reply to Mik Kleshov

Hi Mike,

thanks for your answer.
Yout hint doesn't work, I'm afraid.

I don't know why this function work, but it does work. It is a special case of newton's method in combination with the bit-format of an IEEE-floating-point-number.

Please referr to http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Approximations_that_depend_on_IEEE_representation for more information.

regards...

Alexander
Cancel
Vote up 0 Vote down

Cancel

Reply

0 Alexander Laue over 19 years ago in reply to Mik Kleshov

Hi Mike,

thanks for your answer.
Yout hint doesn't work, I'm afraid.

I don't know why this function work, but it does work. It is a special case of newton's method in combination with the bit-format of an IEEE-floating-point-number.

Please referr to http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Approximations_that_depend_on_IEEE_representation for more information.

regards...

Alexander
Cancel
Vote up 0 Vote down

Cancel

Children

0 Keil Software Support Intl. over 19 years ago in reply to Alexander Laue

The algorithm assumes that 'int' is a 32-bit number. Replace 'int' with 'long'.
Cancel
Vote up 0 Vote down

Cancel
0 erik malund over 19 years ago in reply to Keil Software Support Intl.

Jack Gannsle has had a few articles about fast approximations in embedded magazine (I then to remember feb-apr) go to http://www.embedded.com and hunt them up

Erik
Cancel
Vote up 0 Vote down

Cancel
0 Mik Kleshov over 19 years ago in reply to Alexander Laue
Hi Alexander,

I read the Wikipedia article you mentioned. Seems like this should work. The function will return a denormalized result, but this should not be a problem. Please confirm that this works (or doesn't work):

float fastsqrt(float val) { long tmp = *(long *)&val; tmp -= 1L<<23; /* Remove IEEE bias from exponent (-2^23) */ /* tmp is now an appoximation to logbase2(val) */ tmp = tmp >> 1; /* divide by 2 */ tmp += 1L<<23; /* restore the IEEE bias from the exponent (+2^23) */ return *(float *)&tmp; }

If not, how do you know that it doesn't work?

- mike
Cancel
Vote up 0 Vote down

Cancel

0 Alexander Laue over 19 years ago in reply to Mik Kleshov

Hello together,

I have tested the modified function Mike posted above. This function doesn't work.

On my uC-board is a display (16x2 chars) so I'm able to see the result of the fastsqrt-computation on the display and the result is always zero (independed of input-value).

I used this quick&dirty hacked program:
So there are same variables and #include I have to use to control the display. But this has no effect on the fastsqrt-function.

#include "stdio.h"
#include "math.h"
#include "Intrins.h"			// intrinsic commands (nop..)
#include "regst10F269.h"		// Register Set of ST10F268 controller

// variables for the display
unsigned int far display_count1,display_count2,x,disp_pos,disp_data;
unsigned char far c;
bit far disp_busy, disp_tmp1, disp_tmp2;

#include "display_2zeilig_4bit.h"	// display control

float fastsqrt(float val) {
    long tmp = *(long *)&val;
    tmp -= 1L<<23; /* Remove IEEE bias from exponent (-2^23) */
    /* tmp is now an appoximation to logbase2(val) */
    tmp = tmp >> 1; /* divide by 2 */
    tmp += 1L<<23; /* restore the IEEE bias from the exponent (+2^23) */
    return *(float *)&tmp;
}

void main(void)	{

	float sqrtval = 10.0;
	float result = fastsqrt(sqrtval);
	float result2 = sqrt(sqrtval);

	init_display();
	printf("%f, %f", result, result2);
}

This is the generated assembler-code for the fastsqrt-function.

	fastsqrt  PROC  FAR
	PUBLIC  fastsqrt
; FUNCTION fastsqrt (BEGIN  RMASK = @0x0330)
	MOV	[-R0],R9
	MOV	[-R0],R8
	SUB	R0,#4
	MOV	R4,[R0+#4]                 ; val
	MOV	R5,[R0+#6]                 ; val+2
	MOV	[R0],R4                    ; tmp
	MOV	[R0+#2],R5                 ; tmp+2
	MOV	R8,R4
	MOV	R9,R5
	SUB	R9,#128
	MOV	[R0],R4                    ; tmp
	MOV	[R0+#2],R9                 ; tmp+2
	MOV	R5,R9
	SHR	R4,#1
	BMOV	R4.15,R5.0
	ASHR	R5,#1
	MOV	[R0],R4                    ; tmp
	MOV	[R0+#2],R5                 ; tmp+2
	ADD	R5,#128
	MOV	[R0],R4                    ; tmp
	MOV	[R0+#2],R5                 ; tmp+2
	MOV	R4,[R0]                    ; tmp
	MOV	R5,[R0+#2]                 ; tmp+2
	ADD	R0,#8
	RETS
; FUNCTION fastsqrt (END    RMASK = @0x0330)
	fastsqrt  ENDP

The input-value (in my example 10.0) is stored in R8 and R9:

	MOV	R8,#0
	MOV	R9,#16672
	CALL	fastsqrt

Thanks for your help.

regards

Alexander

0 Mik Kleshov over 19 years ago in reply to Alexander Laue

Hi Alexander,

I've played around with the code a bit. I believe that you need to replace 1L<<23 with 127L<<23. That's the correct way of removing and restoring IEEE exponent bias, if I am not mistaken, based on the discription of float representation in memory from Keil's docs. I tried it and it gave sensible results.
Googling around I found this link, which seemed interesting:
http://www.mactech.com/articles/mactech/Vol.14/14.01/FastSquareRootCalc/

Regards,
- mike
Cancel
Vote up 0 Vote down

Cancel