<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://community.arm.com/utility/feedstylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>speed up square root computation (approximation)</title><link>https://community.arm.com/developer/tools-software/tools/f/keil-forum/20313/speed-up-square-root-computation-approximation</link><description> Hello, 
 
I have to compute the square root of a floating point number. 
A function sqrt() already exists in the math.h, but this function is too slow and the computation time depends on the input value (because of an iterative method used to compute</description><dc:language>en-US</dc:language><generator>Telligent Community 10</generator><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/98222?ContentTypeID=1</link><pubDate>Fri, 30 Jun 2006 04:54:20 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:8f7143fa-296f-4321-ab7a-bbef530ed48d</guid><dc:creator>Mik Kleshov</dc:creator><description>&lt;p&gt;You can optimize the function further if you write it in assembly:&lt;br /&gt;
&lt;pre&gt;
MOV	R4,R8
MOV	R5,R9
SUB	R5,#0x3F80
SHR	R4,#1
BMOV	R4.15,R5.0
ASHR	R5,#1
ADD	R5,#0x3F80
&lt;/pre&gt;
Only 7 instructions, so this should be very fast. With optimization switched on, the compiler should generate similar code. Try raising the optimization level and check the generated code. If the compiler doesn&amp;#39;t do a good job, consider writing the function in assembly.&lt;br /&gt;
&lt;br /&gt;
Regards,&lt;br /&gt;
- mike&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/74150?ContentTypeID=1</link><pubDate>Fri, 30 Jun 2006 02:56:56 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:61bdd102-122d-48b5-9d68-925dafc051af</guid><dc:creator>Alexander Laue</dc:creator><description>&lt;p&gt;Hi Mike,&lt;br /&gt;
&lt;br /&gt;
your modified code works pretty well now. Thanks a lot. :D&lt;br /&gt;
&lt;br /&gt;
I did a little benchmark with this code and measured the execution time over 100 square-root computations with a timer.&lt;br /&gt;
&lt;br /&gt;
This fastsqrt-function is more than 3 times faster than the normal sqrt-function and has a fixed computation time.&lt;br /&gt;
&lt;br /&gt;
The only drawback is the low accuracy for small numbers. But I&amp;#39;m dealing with more or less large numbers and my accuracy requirements are not very high, so this function works fine for me.&lt;br /&gt;
&lt;br /&gt;
If you need high accuracy for small numbers, Drew&amp;#39;s tip of using a lookup table for the mantissa is great.&lt;br /&gt;
&lt;br /&gt;
Thanks a lot to everyone who participated in this discussion.&lt;br /&gt;
&lt;br /&gt;
regards...&lt;br /&gt;
&lt;br /&gt;
Alexander&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/123307?ContentTypeID=1</link><pubDate>Thu, 29 Jun 2006 10:50:18 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:1870d785-351a-4135-b45c-68583022d6de</guid><dc:creator>Mik Kleshov</dc:creator><description>&lt;p&gt;Hi Alexander,&lt;br /&gt;
&lt;br /&gt;
I&amp;#39;ve played around with the code a bit. I believe that you need to replace &lt;b&gt;1L&amp;lt;&amp;lt;23&lt;/b&gt; with &lt;b&gt;127L&amp;lt;&amp;lt;23&lt;/b&gt;. That&amp;#39;s the correct way of removing and restoring the IEEE exponent bias, if I am not mistaken, based on the description of the &lt;b&gt;float&lt;/b&gt; representation in memory from Keil&amp;#39;s docs. I tried it and it gave sensible results.&lt;br /&gt;
Googling around I found this link, which seemed interesting:&lt;br /&gt;
&lt;a href="http://www.mactech.com/articles/mactech/Vol.14/14.01/FastSquareRootCalc/" target="_blank"&gt;http://www.mactech.com/articles/mactech/Vol.14/14.01/FastSquareRootCalc/&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
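A minimal sketch of the function with that change applied, for reference. It assumes a C99 stdint.h so that the integer is 32 bits wide on any host (with the C166 compiler used in this thread, long plays that role). The names fastsqrt_biased, float_bits and FLOAT_BIAS_BITS are made up for illustration. The constant 0x3F800000 is 127 * 2^23, and a union stands in for the pointer cast:&lt;br /&gt;

```c
#include "stdint.h"   /* assumption: a C99 toolchain; quoted include as in the thread */

/* The exponent bias 127 placed in bits 23..30, i.e. 127 * 2^23. */
#define FLOAT_BIAS_BITS 0x3F800000L

/* Overlay a float with a 32-bit integer to get at its raw bit pattern. */
typedef union {
    float   f;
    int32_t i;
} float_bits;

/* Approximate square root for positive, normalized floats. */
float fastsqrt_biased(float val)
{
    float_bits v;

    v.f = val;
    v.i -= FLOAT_BIAS_BITS;   /* remove the IEEE exponent bias */
    v.i >>= 1;                /* halve the exponent (and the mantissa field) */
    v.i += FLOAT_BIAS_BITS;   /* restore the bias */
    return v.f;
}
```

With this version an input of 10.0 gives 3.25 (the true root is about 3.162) and 4.0 gives exactly 2.0. Right-shifting a negative signed value is implementation-defined in C, so inputs below 1.0 rely on the compiler using an arithmetic shift.&lt;br /&gt;
&lt;br /&gt;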
Regards,&lt;br /&gt;
- mike&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/112186?ContentTypeID=1</link><pubDate>Thu, 29 Jun 2006 02:52:46 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:495a869e-7ee1-4545-a3f4-219918dda14f</guid><dc:creator>Alexander Laue</dc:creator><description>&lt;p&gt;Hello together,&lt;br /&gt;
&lt;br /&gt;
I have tested the modified function Mike posted above. This function doesn&amp;#39;t work.&lt;br /&gt;
&lt;br /&gt;
My uC board has a display (16x2 chars), so I can see the result of the fastsqrt computation there, and the result is always zero (independent of the input value).&lt;br /&gt;
&lt;br /&gt;
I used this quick-and-dirty test program. It contains some variables and an #include that I need to control the display, but these have no effect on the fastsqrt function:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
#include &amp;quot;stdio.h&amp;quot;
#include &amp;quot;math.h&amp;quot;
#include &amp;quot;Intrins.h&amp;quot;			// intrinsic commands (nop..)
#include &amp;quot;regst10F269.h&amp;quot;		// Register Set of ST10F268 controller

// variables for the display
unsigned int far display_count1,display_count2,x,disp_pos,disp_data;
unsigned char far c;
bit far disp_busy, disp_tmp1, disp_tmp2;

#include &amp;quot;display_2zeilig_4bit.h&amp;quot;	// display control

float fastsqrt(float val) {
    long tmp = *(long *)&amp;val;
    tmp -= 1L&amp;lt;&amp;lt;23; /* Remove IEEE bias from exponent (-2^23) */
    /* tmp is now an approximation to logbase2(val) */
    tmp = tmp &amp;gt;&amp;gt; 1; /* divide by 2 */
    tmp += 1L&amp;lt;&amp;lt;23; /* restore the IEEE bias from the exponent (+2^23) */
    return *(float *)&amp;tmp;
}

void main(void)	{

	float sqrtval = 10.0;
	float result = fastsqrt(sqrtval);
	float result2 = sqrt(sqrtval);

	init_display();
	printf(&amp;quot;%f, %f&amp;quot;, result, result2);
}
&lt;/pre&gt;
&lt;br /&gt;
&lt;br /&gt;
This is the generated assembler-code for the fastsqrt-function.&lt;br /&gt;
&lt;pre&gt;
	fastsqrt  PROC  FAR
	PUBLIC  fastsqrt
; FUNCTION fastsqrt (BEGIN  RMASK = @0x0330)
	MOV	[-R0],R9
	MOV	[-R0],R8
	SUB	R0,#4
	MOV	R4,[R0+#4]                 ; val
	MOV	R5,[R0+#6]                 ; val+2
	MOV	[R0],R4                    ; tmp
	MOV	[R0+#2],R5                 ; tmp+2
	MOV	R8,R4
	MOV	R9,R5
	SUB	R9,#128
	MOV	[R0],R4                    ; tmp
	MOV	[R0+#2],R9                 ; tmp+2
	MOV	R5,R9
	SHR	R4,#1
	BMOV	R4.15,R5.0
	ASHR	R5,#1
	MOV	[R0],R4                    ; tmp
	MOV	[R0+#2],R5                 ; tmp+2
	ADD	R5,#128
	MOV	[R0],R4                    ; tmp
	MOV	[R0+#2],R5                 ; tmp+2
	MOV	R4,[R0]                    ; tmp
	MOV	R5,[R0+#2]                 ; tmp+2
	ADD	R0,#8
	RETS
; FUNCTION fastsqrt (END    RMASK = @0x0330)
	fastsqrt  ENDP
&lt;/pre&gt;
&lt;br /&gt;
The input-value (in my example 10.0) is stored in R8 and R9:&lt;br /&gt;
&lt;pre&gt;
	MOV	R8,#0
	MOV	R9,#16672
	CALL	fastsqrt
&lt;/pre&gt;
&lt;br /&gt;
Thanks for your help.&lt;br /&gt;
&lt;br /&gt;
regards&lt;br /&gt;
&lt;br /&gt;
Alexander&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/98220?ContentTypeID=1</link><pubDate>Wed, 28 Jun 2006 12:34:33 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:f05997f4-5203-4fc5-a6a4-92fc0bdbcc02</guid><dc:creator>Mik Kleshov</dc:creator><description>&lt;p&gt;Hi Alexander,&lt;br /&gt;
&lt;br /&gt;
I read the Wikipedia article you mentioned. Seems like this should work. The function will return a denormalized result, but this should not be a problem. Please confirm that this works (or doesn&amp;#39;t work):&lt;br /&gt;
&lt;pre&gt;
float fastsqrt(float val) {
    long tmp = *(long *)&amp;val;
    tmp -= 1L&amp;lt;&amp;lt;23; /* Remove IEEE bias from exponent (-2^23) */
    /* tmp is now an approximation to logbase2(val) */
    tmp = tmp &amp;gt;&amp;gt; 1; /* divide by 2 */
    tmp += 1L&amp;lt;&amp;lt;23; /* restore the IEEE bias from the exponent (+2^23) */
    return *(float *)&amp;tmp;
}
&lt;/pre&gt;
&lt;br /&gt;
If not, how do you know that it doesn&amp;#39;t work?&lt;br /&gt;
&lt;br /&gt;
- mike&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/46291?ContentTypeID=1</link><pubDate>Wed, 28 Jun 2006 12:34:18 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:1fa55035-ee9b-4470-9e34-410f330d2722</guid><dc:creator>Drew Davis</dc:creator><description>&lt;p&gt;This function simply divides the exponent of the floating point number by 2.  Multiplying two numbers together means you add their exponents.  Dividing the exponent by two is a way to find two numbers with equal exponents that multiply to the original exponent, which is roughly equivalent to taking the square root&lt;br /&gt;
&lt;br /&gt;
Consider sqrt(100).  That&amp;#39;s 10^2, which is to say 10^1 * 10^1.  2 / 2 = 1, the new exponent of 10 in the result.&lt;br /&gt;
&lt;br /&gt;
Since this method neglects the mantissa entirely, it won&amp;#39;t be very precise, nor accurate for numbers without large exponents.  The example above happens to be exactly accurate, because the input is an even power of the base I&amp;#39;m using (10).&lt;br /&gt;
&lt;br /&gt;
Consider sqrt(500).  5 * 10^2 -&amp;gt; 5 * 10^1.  This function would produce the result &amp;quot;50&amp;quot; rather than ~22, because it doesn&amp;#39;t take the root of the mantissa.&lt;br /&gt;
&lt;br /&gt;
For really large numbers, this effect might not be as important. For 5,000,000,000,000 the method gives about 5,000,000 where the true root is about 2,236,000, and an error factor of around 2 perhaps doesn&amp;#39;t matter compared to the difference of six orders of magnitude between the input and the root. For numbers close to 2^1, the error is likely much more significant.&lt;br /&gt;
&lt;br /&gt;
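For binary floats the picture is somewhat better than the decimal analogy suggests, because the shift in the posted function moves the whole bit pattern: the lowest exponent bit drops into the mantissa field and the mantissa is halved as well. Worked through for 500 with a bias constant of 127 * 2^23: 500 = 1.953125 * 2^8, so the combined exponent.mantissa field reads 8.953125. Halving gives 4.4765625, which decodes as 1.4765625 * 2^4 = 23.625 against a true root of about 22.36. The sketch below is one way to inspect those fields. The helper names float_exponent, float_fraction and float_fields are hypothetical, and a C99 stdint.h is assumed:&lt;br /&gt;

```c
#include "stdint.h"   /* assumption: a C99 toolchain */

/* Overlay a float with an unsigned 32-bit integer to read its bits. */
typedef union {
    float    f;
    uint32_t u;
} float_fields;

/* Unbiased exponent: bits 23..30 minus the bias of 127.
   Division and modulo by powers of two act as shift and mask here. */
int float_exponent(float val)
{
    float_fields v;

    v.f = val;
    return (int)((v.u / 0x800000u) % 256u) - 127;
}

/* Stored mantissa bits 0..22 as a fraction in [0, 1);
   a normalized value is (1 + fraction) * 2^exponent. */
float float_fraction(float val)
{
    float_fields v;

    v.f = val;
    return (float)(v.u % 0x800000u) / 8388608.0f;   /* 8388608 = 2^23 */
}
```

For 500.0 these return 8 and 0.953125, and for 23.625 they return 4 and 0.4765625, matching the worked numbers.&lt;br /&gt;
&lt;br /&gt;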
A 256-entry lookup table for the mantissa should get you around 2 decimal digits of accuracy there (with a maximum error around 1/256th, or ~0.4%).  You could look up the 8 most significant bits of the mantissa and replace it with a pre-calculated 8-bit value from the LUT.  This will still be pretty quick, but cost 256 bytes of RAM.&lt;br /&gt;
&lt;br /&gt;
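One way such a table could look, as a sketch under stated assumptions rather than a tested implementation for the target. The root of the mantissa depends on whether the exponent is odd or even, so this variant indexes the 256 entries by the lowest exponent bit plus the top 7 mantissa bits, which deviates from using 8 mantissa bits. The names sqrt_lut, sqrt_lut_init, lut_sqrt and newton_sqrt are hypothetical, a C99 stdint.h is assumed, and only positive, normalized inputs are handled:&lt;br /&gt;

```c
#include "stdint.h"   /* assumption: a C99 toolchain */

typedef union {
    float    f;
    uint32_t u;
} float_bits;

/* Top 8 mantissa bits of sqrt(r) for r in [1, 4), one entry per bucket. */
static unsigned char sqrt_lut[256];

/* Plain Newton iteration for sqrt(r); used only while filling the table. */
static double newton_sqrt(double r)
{
    double s = r;
    int i;

    for (i = 0; i != 8; i++) {
        s = 0.5 * (s + r / s);
    }
    return s;
}

/* Fill the table once at startup. */
void sqrt_lut_init(void)
{
    unsigned int idx;

    for (idx = 0; idx != 256; idx++) {
        unsigned int odd = idx / 128;                       /* exponent parity */
        double frac = ((double)(idx % 128) + 0.5) / 128.0;  /* bucket midpoint */
        double r = (1.0 + frac) * (odd ? 2.0 : 1.0);        /* r in [1, 4) */
        double q = (newton_sqrt(r) - 1.0) * 256.0 + 0.5;    /* round to 8 bits */

        if (q > 255.0) {
            q = 255.0;                                      /* clamp last bucket */
        }
        sqrt_lut[idx] = (unsigned char)q;
    }
}

/* Table-based square root; no handling of zero, negatives or denormals. */
float lut_sqrt(float val)
{
    float_bits v;
    uint32_t exp_biased, fraction, odd, idx;

    v.f = val;
    exp_biased = v.u / 0x800000u;           /* biased exponent, bits 23..30 */
    fraction = v.u % 0x800000u;             /* mantissa bits 0..22 */
    odd = (exp_biased + 1u) % 2u;           /* parity of the unbiased exponent */
    idx = odd * 128u + fraction / 0x10000u; /* parity bit + top 7 mantissa bits */

    /* New biased exponent is floor(e / 2) + 127; the table supplies the
       top 8 bits of the new mantissa. */
    v.u = ((exp_biased + 127u - odd) / 2u) * 0x800000u
        + (uint32_t)sqrt_lut[idx] * 0x8000u;
    return v.f;
}
```

Call sqrt_lut_init() once before the first lookup. On a small target the table would more likely be precomputed and stored as a const array in ROM, so it need not cost RAM. The divisions and modulo operations are all by powers of two, which an optimizing compiler typically reduces to shifts and masks.&lt;br /&gt;
&lt;br /&gt;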
Interpolation between table entries would further improve accuracy at the cost of speed.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/112177?ContentTypeID=1</link><pubDate>Wed, 28 Jun 2006 07:17:58 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:089d525a-278a-47d2-9feb-bb50cc6d47f6</guid><dc:creator>erik  malund</dc:creator><description>&lt;p&gt;Jack Ganssle has had a few articles about fast approximations in embedded magazine (Feb-Apr, as far as I remember). Go to &lt;a href="http://www.embedded.com" target="_blank"&gt;http://www.embedded.com&lt;/a&gt; and hunt them up.&lt;br /&gt;
&lt;br /&gt;
Erik&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/98219?ContentTypeID=1</link><pubDate>Wed, 28 Jun 2006 04:27:28 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:e65f4a31-d504-4d74-ada1-471c23d96e45</guid><dc:creator>Keil Software Support Intl.</dc:creator><description>&lt;p&gt;The algorithm assumes that &amp;#39;int&amp;#39; is a 32-bit number.  Replace &amp;#39;int&amp;#39; with &amp;#39;long&amp;#39;.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/74148?ContentTypeID=1</link><pubDate>Wed, 28 Jun 2006 03:08:02 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:f7bd555f-8b0f-4c67-8ee9-ea1ebd59d896</guid><dc:creator>Alexander Laue</dc:creator><description>&lt;p&gt;Hi Mike,&lt;br /&gt;
&lt;br /&gt;
thanks for your answer.&lt;br /&gt;
Your hint doesn&amp;#39;t work, I&amp;#39;m afraid.&lt;br /&gt;
&lt;br /&gt;
I don&amp;#39;t know why this function works, but it does work. It is a special case of Newton&amp;#39;s method in combination with the bit format of an IEEE floating-point number.&lt;br /&gt;
&lt;br /&gt;
Please refer to &lt;a href="http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Approximations_that_depend_on_IEEE_representation" target="_blank"&gt;http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Approximations_that_depend_on_IEEE_representation&lt;/a&gt; for more information.&lt;br /&gt;
&lt;br /&gt;
regards...&lt;br /&gt;
&lt;br /&gt;
Alexander&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: speed up square root computation (approximation)</title><link>https://community.arm.com/thread/46293?ContentTypeID=1</link><pubDate>Tue, 27 Jun 2006 23:49:55 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:56cadb11-efc4-4fa4-82ed-235342a048b5</guid><dc:creator>Mik Kleshov</dc:creator><description>&lt;p&gt;I think you need to replace &lt;b&gt;1&amp;lt;&amp;lt;23&lt;/b&gt; with &lt;b&gt;1L&amp;lt;&amp;lt;23&lt;/b&gt;. Consult your favourite C textbook to see why.&lt;br /&gt;
Not sure what this function does, but I doubt it can provide an accuracy of 2 decimal digits.&lt;br /&gt;
&lt;br /&gt;
Regards,&lt;br /&gt;
- mike&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>