This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

speed up square root computation (approximation)

Hello,

I have to compute the square root of a floating point number.
A function sqrt() already exists in the math.h, but this function is too slow and the computation time depends on the input value (because of an iterative method used to compute the square root). The accuracy requirements in my case are not very high, so 2 digits after decimal point would be enough.

So I seached for an alternative method to compute a square root and found an interessting method, that uses an approximation to compute the square root (see below). But this only works on 32 bit CPUs (little endian), so I tried to port it to a ST10.

I thought this is not very difficult because the ST10 also uses little endian and is IEEE-754 compatible.

The function to approximate the square root:
float fastsqrt(float val) {
int tmp = *(int *)&val;
tmp -= 1<<23; /* Remove IEEE bias from exponent (-2^23) */
/* tmp is now an appoximation to logbase2(val) */
tmp = tmp >> 1; /* divide by 2 */
tmp += 1<<23; /* restore the IEEE bias from the exponent (+2^23) */
return *(float *)&tmp;
}

I replaced all int with long (because long is 32bit on ST10), but this doesn't work anyway.

It seems that the behavior of the shifts is different on the ST10 in combination with long numbers than on a 32-bit CPU.

Does anybody has same hints for me to make this function working?
Or has somebody an alternative function for approximate or compute a square root in a quick way on a ST10?

Thanks a lot...

Alexander

Parents Reply Children
  • The algorithm assumes that 'int' is a 32-bit number. Replace 'int' with 'long'.

  • Jack Gannsle has had a few articles about fast approximations in embedded magazine (I then to remember feb-apr) go to http://www.embedded.com and hunt them up

    Erik

  • Hi Alexander,

    I read the Wikipedia article you mentioned. Seems like this should work. The function will return a denormalized result, but this should not be a problem. Please confirm that this works (or doesn't work):

    float fastsqrt(float val) {
        long tmp = *(long *)&val;
        tmp -= 1L<<23; /* Remove IEEE bias from exponent (-2^23) */
        /* tmp is now an appoximation to logbase2(val) */
        tmp = tmp >> 1; /* divide by 2 */
        tmp += 1L<<23; /* restore the IEEE bias from the exponent (+2^23) */
        return *(float *)&tmp;
    }
    

    If not, how do you know that it doesn't work?

    - mike

  • Hello together,

    I have tested the modified function Mike posted above. This function doesn't work.

    On my uC-board is a display (16x2 chars) so I'm able to see the result of the fastsqrt-computation on the display and the result is always zero (independed of input-value).

    I used this quick&dirty hacked program:
    So there are same variables and #include I have to use to control the display. But this has no effect on the fastsqrt-function.

    #include "stdio.h"
    #include "math.h"
    #include "Intrins.h"			// intrinsic commands (nop..)
    #include "regst10F269.h"		// Register Set of ST10F268 controller
    
    // variables for the display
    unsigned int far display_count1,display_count2,x,disp_pos,disp_data;
    unsigned char far c;
    bit far disp_busy, disp_tmp1, disp_tmp2;
    
    #include "display_2zeilig_4bit.h"	// display control
    
    float fastsqrt(float val) {
        long tmp = *(long *)&val;
        tmp -= 1L<<23; /* Remove IEEE bias from exponent (-2^23) */
        /* tmp is now an appoximation to logbase2(val) */
        tmp = tmp >> 1; /* divide by 2 */
        tmp += 1L<<23; /* restore the IEEE bias from the exponent (+2^23) */
        return *(float *)&tmp;
    }
    
    void main(void)	{
    
    	float sqrtval = 10.0;
    	float result = fastsqrt(sqrtval);
    	float result2 = sqrt(sqrtval);
    
    	init_display();
    	printf("%f, %f", result, result2);
    }
    


    This is the generated assembler-code for the fastsqrt-function.
    	fastsqrt  PROC  FAR
    	PUBLIC  fastsqrt
    ; FUNCTION fastsqrt (BEGIN  RMASK = @0x0330)
    	MOV	[-R0],R9
    	MOV	[-R0],R8
    	SUB	R0,#4
    	MOV	R4,[R0+#4]                 ; val
    	MOV	R5,[R0+#6]                 ; val+2
    	MOV	[R0],R4                    ; tmp
    	MOV	[R0+#2],R5                 ; tmp+2
    	MOV	R8,R4
    	MOV	R9,R5
    	SUB	R9,#128
    	MOV	[R0],R4                    ; tmp
    	MOV	[R0+#2],R9                 ; tmp+2
    	MOV	R5,R9
    	SHR	R4,#1
    	BMOV	R4.15,R5.0
    	ASHR	R5,#1
    	MOV	[R0],R4                    ; tmp
    	MOV	[R0+#2],R5                 ; tmp+2
    	ADD	R5,#128
    	MOV	[R0],R4                    ; tmp
    	MOV	[R0+#2],R5                 ; tmp+2
    	MOV	R4,[R0]                    ; tmp
    	MOV	R5,[R0+#2]                 ; tmp+2
    	ADD	R0,#8
    	RETS
    ; FUNCTION fastsqrt (END    RMASK = @0x0330)
    	fastsqrt  ENDP
    

    The input-value (in my example 10.0) is stored in R8 and R9:
    	MOV	R8,#0
    	MOV	R9,#16672
    	CALL	fastsqrt
    

    Thanks for your help.

    regards

    Alexander

  • Hi Alexander,

    I've played around with the code a bit. I believe that you need to replace 1L<<23 with 127L<<23. That's the correct way of removing and restoring IEEE exponent bias, if I am not mistaken, based on the discription of float representation in memory from Keil's docs. I tried it and it gave sensible results.
    Googling around I found this link, which seemed interesting:
    http://www.mactech.com/articles/mactech/Vol.14/14.01/FastSquareRootCalc/

    Regards,
    - mike