<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://community.arm.com/utility/feedstylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/developer/tools-software/tools/f/armds-forum/47684/on-cortex-m4f-microcontrollers-is-fixed-point-math-faster-or-floating-point</link><description> Hi, 
 I am using S32K14x controllers (Cortex-M4F), which have a floating-point unit (FPU). I need to perform many mathematical operations as fast as possible. Which will be faster: fixed-point q16, fixed-point q32, or single-precision (32-bit) floating point?</description><dc:language>en-US</dc:language><generator>Telligent Community 10</generator><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167821?ContentTypeID=1</link><pubDate>Wed, 30 Sep 2020 09:23:42 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:5c9bf975-6766-4e50-a08e-2aabd3dc48f4</guid><dc:creator>Ronan Synnott</dc:creator><description>&lt;p&gt;Intrinsic functions are more commonly used for &amp;#39;non-C&amp;#39; actions, such as barrier instructions. In higher-level code, the compiler generates VFP instructions automatically when compiling for a target with VFP. If you really want to hand-craft a function, you would use assembler rather than intrinsics. For example:&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="c_cpp"&gt;float foo(float a, float b){
  return (a+b);
}&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Compiled with:&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="text"&gt;armclang -c -O2 --target=arm-arm-none-eabi -mcpu=cortex-m4 -mfpu=vfpv3 float.c&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;outputs:&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="text"&gt;    foo
        0x00000000:    ee001a10    ....    VMOV     s0,r1
        0x00000004:    ee010a10    ....    VMOV     s2,r0
        0x00000008:    ee310a00    1...    VADD.F32 s0,s2,s0
        0x0000000c:    ee100a10    ....    VMOV     r0,s0
        0x00000010:    4770        pG      BX       lr&lt;/pre&gt;&lt;/p&gt;
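&lt;p&gt;This automatic VFP code generation applies to single precision only. The Cortex-M4 FPU (FPv4-SP) has no double-precision instructions, so an unsuffixed constant such as 0.1 typically turns a float expression into a double one that is computed through library calls. The sketch below is minimal and assumes a Cortex-M4F build with the FPU enabled. The function names are hypothetical.&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="c_cpp"&gt;
/* Hypothetical example functions, assuming a Cortex-M4F build with the FPU
   enabled (for example -mfpu=fpv4-sp-d16). */

/* The f suffix keeps the constant in single precision, so the multiply can
   be done by the FPU with a single VMUL.F32. */
float scale_single(float x)
{
    return x * 0.1f;
}

/* 0.1 is a double constant: x is promoted to double and the multiply is done
   in double precision. This FPU cannot execute that, so the compiler emits
   library calls instead (for example __aeabi_f2d, __aeabi_dmul, __aeabi_d2f). */
float scale_double(float x)
{
    return x * 0.1;
}
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;GCC&amp;#39;s -Wdouble-promotion warning (also accepted by Clang-based compilers) flags the second case. GCC&amp;#39;s -fsingle-precision-constant option makes unsuffixed floating constants single precision.&lt;/p&gt;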
&lt;p&gt;For completeness, when the code is compiled without VFP, the compiler calls a library function instead, which takes many more cycles:&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="text"&gt;    foo
        0x00000000:    f7ffbffe    ....    B.W      __aeabi_fadd&lt;/pre&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167819?ContentTypeID=1</link><pubDate>Wed, 30 Sep 2020 08:59:12 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:f8ac0d90-616c-41ea-8f7c-9afb0089f741</guid><dc:creator>Pramod Ranade</dc:creator><description>[quote userid="5916" url="~/developer/tools-software/tools/f/armds-forum/47684/on-cortex-m4f-microcontrollers-is-fixed-point-math-faster-or-floating-point/167817"]Hypothetically yes, however do your ISRs use the FPU registers?[/quote]
&lt;p&gt;Yes!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167817?ContentTypeID=1</link><pubDate>Wed, 30 Sep 2020 08:29:35 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:c66e4c07-44c5-4538-88a1-65bae89fa61c</guid><dc:creator>Ronan Synnott</dc:creator><description>&lt;p&gt;Hi Pramod,&lt;/p&gt;
&lt;p&gt;Hypothetically, yes. However, do your ISRs use the FPU registers?&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167814?ContentTypeID=1</link><pubDate>Wed, 30 Sep 2020 06:07:43 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:b9bb831e-eacd-40a9-94da-b18d2792e37f</guid><dc:creator>Pramod Ranade</dc:creator><description>[quote userid="5397" url="~/developer/tools-software/tools/f/armds-forum/47684/on-cortex-m4f-microcontrollers-is-fixed-point-math-faster-or-floating-point/167797"]An issue often noted with fixed point is that, aside from the actual calculations, it adds overhead &amp;amp; complexity to the code which needs to supply the input data and/or use the results.[/quote]
&lt;p&gt;I agree.&lt;/p&gt;
&lt;p&gt;Another problem with fixed point is that the code won&amp;#39;t be portable, because it relies on non-standard intrinsic functions such as __SMMLA.&lt;/p&gt;
&lt;p&gt;One problem with float is that it increases ISR entry and exit times, because the FPU registers must be saved and restored.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167797?ContentTypeID=1</link><pubDate>Tue, 29 Sep 2020 11:08:41 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:3944b202-9ba7-45bf-9291-b7829e7667fb</guid><dc:creator>Andy Neil</dc:creator><description>[quote userid="69019" url="~/developer/tools-software/tools/f/armds-forum/47684/on-cortex-m4f-microcontrollers-is-fixed-point-math-faster-or-floating-point/167782"]fixed point math (using the DSP instructions) &lt;em&gt;is&lt;/em&gt; faster than floating point math (using FPU). But the difference is marginal[/quote]
&lt;p&gt;Thanks for the feedback.&lt;/p&gt;
&lt;p&gt;An issue often noted with fixed point is that, aside from the actual calculations, it adds overhead &amp;amp; complexity to the code which needs to supply the input data and/or use the results. So I guess one would need a wider benchmark to see if that tips the balance for the overall &lt;em&gt;system&lt;/em&gt; ... ?&lt;/p&gt;
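&lt;p&gt;Here is what that boundary cost looks like in practice. Every sample entering a Q31 pipeline has to be scaled and saturated, and every result has to be scaled back. The sketch below is a minimal per-sample version. The helper names are hypothetical, it assumes 32-bit int as on Cortex-M4, and it does not handle NaN inputs. CMSIS-DSP provides buffer versions of the same conversions, arm_float_to_q31() and arm_q31_to_float().&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="c_cpp"&gt;
/* Hypothetical helper: float in [-1, 1) to Q31, saturating out-of-range input.
   Assumes 32-bit int (true on Cortex-M4). NaN input is not handled. */
int q31_from_float(float f)
{
    if (f >= 1.0f)
        return 2147483647;            /* largest Q31 value, just below +1.0 */
    if (-1.0f >= f)
        return -2147483647 - 1;       /* -1.0 is the most negative Q31 value */
    return (int)(f * 2147483648.0f);  /* scale by 2^31, truncating toward zero */
}

/* Hypothetical helper: Q31 back to float, i.e. scale by 2^-31. */
float q31_to_float(int q)
{
    return (float)q * (1.0f / 2147483648.0f);
}
&lt;/pre&gt;&lt;/p&gt;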
&lt;p&gt;#FloatingVsFixedPoint&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167782?ContentTypeID=1</link><pubDate>Tue, 29 Sep 2020 04:47:13 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:821a4862-ad42-47a7-86fd-5715ebfc8ebd</guid><dc:creator>Pramod Ranade</dc:creator><description>&lt;p&gt;Update:&lt;/p&gt;
&lt;p&gt;I tried a MAC operation on q31 and float32 numbers, using a DSP instruction for q31 and an FPU instruction for float32, compiled with gcc at -O3. The statement&lt;/p&gt;
&lt;p&gt;a = __SMMLA(x, y, a); // a += high 32 bits of the 64-bit product x * y&lt;/p&gt;
&lt;p&gt;requires 9 clock cycles to execute, where a, x and y are local q31 variables.&lt;/p&gt;
&lt;p&gt;The equivalent statement for floating-point variables:&lt;/p&gt;
&lt;p&gt;f32Var1 += (f32Var2 * f32Var3);&lt;/p&gt;
&lt;p&gt;requires 10 clock cycles to execute, where f32Var1, f32Var2 and f32Var3 are local float variables.&lt;/p&gt;
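&lt;p&gt;For reference, __SMMLA does not compute a full a += x * y. It adds only the high 32 bits of the 64-bit product to the accumulator. For two Q31 operands, the added term is therefore in Q30, and the result has to be doubled (or the accumulator kept in Q30) to stay in Q31. Putting the intrinsic behind a wrapper with a plain C fallback also addresses the portability concern raised in this thread. The sketch below is minimal and assumes the CMSIS-Core headers are on the include path for the Arm build. mac_hi32 is a hypothetical name.&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="c_cpp"&gt;
/* Assumes 32-bit int and 64-bit long long, as with GCC or armclang on Cortex-M4. */
#if defined(__ARM_FEATURE_DSP)
#include "cmsis_compiler.h"  /* CMSIS-Core: provides the __SMMLA intrinsic */
#endif

/* Hypothetical wrapper: returns acc + the high 32 bits of the 64-bit signed
   product x * y, which is what the SMMLA instruction computes.
   Accumulator overflow is not handled. */
int mac_hi32(int acc, int x, int y)
{
#if defined(__ARM_FEATURE_DSP)
    return __SMMLA(x, y, acc);  /* one SMMLA instruction on Cortex-M4 */
#else
    /* Portable fallback. Right-shifting a negative value is
       implementation-defined in C; GCC and Clang use an arithmetic shift. */
    return acc + (int)(((long long)x * y) >> 32);
#endif
}
&lt;/pre&gt;&lt;/p&gt;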
&lt;p&gt;So fixed-point math (using the DSP instructions) &lt;em&gt;is&lt;/em&gt; faster than floating-point math (using the FPU), but the difference is marginal.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167733?ContentTypeID=1</link><pubDate>Mon, 28 Sep 2020 03:17:51 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:3479ca57-f8d4-4c85-b93d-360b7ed538d8</guid><dc:creator>Pramod Ranade</dc:creator><description>[quote userid="5916" url="~/developer/tools-software/tools/f/armds-forum/47684/on-cortex-m4f-microcontrollers-is-fixed-point-math-faster-or-floating-point/167714"]By mathematical operations do you mean low level operations (MAC etc) or higher level operations (FFT or similar)[/quote]
&lt;p&gt;Low-level operations, to start with. I may evaluate higher-level math functions later (e.g. matrix multiplication).&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167732?ContentTypeID=1</link><pubDate>Mon, 28 Sep 2020 03:15:13 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:97882f95-7f83-49c9-b1cf-820f41af5f6a</guid><dc:creator>Pramod Ranade</dc:creator><description>&lt;p&gt;ARM documentation says that most FPU instructions (except division and square root) complete in 1 clock cycle. However, there is an overhead in moving operands between the core (Rx) registers and the FPU registers. The DSP instructions also seem to perform most basic arithmetic on q31 numbers in a single clock cycle, but the compiler can&amp;#39;t generate DSP instructions, which means we must use the CMSIS DSP library. In both cases there is some overhead, but I don&amp;#39;t know which is worse. Hence the question.&lt;/p&gt;
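&lt;p&gt;Part of that register-moving overhead comes from the calling convention rather than from the FPU itself. Under the soft-float ABI, float arguments and return values are passed in core registers, which is why VMOV instructions surround the VADD.F32 in the armclang listing earlier in this thread. The command below sketches a build with the hard-float ABI, which passes float arguments and results in S registers instead. Note that every object and library in the final link must then be built for the same ABI.&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="text"&gt;armclang -c -O2 --target=arm-arm-none-eabi -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard float.c&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Inside a function body the compiler generally keeps float values in S registers under either ABI. The choice therefore matters mostly for small, frequently called functions that are not inlined.&lt;/p&gt;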
[quote userid="5397" url="~/developer/tools-software/tools/f/armds-forum/47684/on-cortex-m4f-microcontrollers-is-fixed-point-math-faster-or-floating-point/167617"]why don&amp;#39;t you run some tests to find out ?[/quote]
&lt;p&gt;Yes, I am planning to do exactly that. I am using the CMSIS DSP library, which has functions for arithmetic on q31 as well as float32 data. I assume they use the DSP and FPU instructions, respectively. I will compare the performance of the _q31 and _f32 variants of the same functions and post the results here when done.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167714?ContentTypeID=1</link><pubDate>Sun, 27 Sep 2020 14:19:32 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:74a705b6-9996-4cf2-b8e4-c29009c2e1cc</guid><dc:creator>Ronan Synnott</dc:creator><description>&lt;p&gt;Hi Pramod,&lt;/p&gt;
&lt;p&gt;Further to Andy&amp;#39;s excellent reply above, the CPU has a cycle count register you can use to easily compare code performance&lt;br /&gt;&lt;a href="https://developer.arm.com/documentation/ddi0439/b/Data-Watchpoint-and-Trace-Unit/DWT-Programmers-Model"&gt;https://developer.arm.com/documentation/ddi0439/b/Data-Watchpoint-and-Trace-Unit/DWT-Programmers-Model&lt;/a&gt;&lt;br /&gt;(enabled by bit 0, CYCCNTENA, of DWT_CTRL; the TRCENA bit in DEMCR must also be set). Most development tools (such as Keil MDK) have this integrated into the environment.&lt;br /&gt;&lt;br /&gt;By mathematical operations, do you mean low-level operations (MAC etc.) or higher-level operations (FFT or similar)? The CMSIS DSP library contains a number of optimized routines (with and without VFP) to further help you analyze.&lt;br /&gt;&lt;a href="https://www.keil.com/pack/doc/cmsis/dsp/html/index.html"&gt;www.keil.com/.../index.html&lt;/a&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: On Cortex-M4F microcontrollers: is fixed point math faster or floating point?</title><link>https://community.arm.com/thread/167617?ContentTypeID=1</link><pubDate>Tue, 22 Sep 2020 16:52:42 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:9bcef00d-3f33-4f89-8fa3-ea5454f49d1b</guid><dc:creator>Andy Neil</dc:creator><description>[quote userid="69019" url="~/developer/tools-software/tools/f/armds-forum/47684/on-cortex-m4f-microcontrollers-is-fixed-point-math-faster-or-floating-point"] On Cortex-M4F microcontrollers: is fixed point math faster ? [/quote]
&lt;p&gt;Probably not:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blogs.sw.siemens.com/embedded-software/2012/09/10/the-floating-point-argument/"&gt;https://blogs.sw.siemens.com/embedded-software/2012/09/10/the-floating-point-argument/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Why don&amp;#39;t you run some tests to find out?&lt;/p&gt;
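&lt;p&gt;On a Cortex-M4, the DWT cycle counter (CYCCNT) makes such tests easy to time. The sketch below is minimal and assumes the standard ARMv7-M debug register addresses. The struct and function names are hypothetical. The register pointers are passed in only so the logic can also be checked off-target.&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="c_cpp"&gt;
/* ARMv7-M debug register addresses (Cortex-M4). */
#define DEMCR_ADDR      0xE000EDFCu  /* Debug Exception and Monitor Control Register */
#define DWT_CTRL_ADDR   0xE0001000u
#define DWT_CYCCNT_ADDR 0xE0001004u
#define DEMCR_TRCENA    0x01000000u  /* bit 24: enable the DWT and ITM units */
#define DWT_CYCCNTENA   0x00000001u  /* bit 0: enable the cycle counter */

/* Hypothetical handle holding the three register pointers. */
typedef struct {
    volatile unsigned int *demcr;
    volatile unsigned int *dwt_ctrl;
    volatile unsigned int *cyccnt;
} cycle_counter;

/* Enables the trace block, clears CYCCNT and starts counting core cycles. */
void cycle_counter_start(cycle_counter cc)
{
    *cc.demcr |= DEMCR_TRCENA;
    *cc.cyccnt = 0u;
    *cc.dwt_ctrl |= DWT_CYCCNTENA;
}

/* Current CYCCNT value. */
unsigned int cycle_counter_now(cycle_counter cc)
{
    return *cc.cyccnt;
}

/* Cycles elapsed since an earlier reading; unsigned subtraction stays
   correct across a single wrap-around of the 32-bit counter. */
unsigned int cycle_counter_elapsed(cycle_counter cc, unsigned int start)
{
    return *cc.cyccnt - start;
}

/* On target:
   cycle_counter cc = { (volatile unsigned int *)DEMCR_ADDR,
                        (volatile unsigned int *)DWT_CTRL_ADDR,
                        (volatile unsigned int *)DWT_CYCCNT_ADDR };
   cycle_counter_start(cc);
   unsigned int t0 = cycle_counter_now(cc);
   ... code under test ...
   unsigned int cycles = cycle_counter_elapsed(cc, t0);                 */
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Reading the counter itself costs a few cycles. Timing an empty region the same way and subtracting that figure gives a cleaner comparison between the fixed-point and float versions.&lt;/p&gt;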
&lt;p&gt;But you do have to be careful to stick to single precision:&lt;/p&gt;
&lt;p&gt;&lt;a href="/developer/ip-products/processors/b/processors-ip-blog/posts/10-useful-tips-to-using-the-floating-point-unit-on-the-arm-cortex--m4-processor"&gt;https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/10-useful-tips-to-using-the-floating-point-unit-on-the-arm-cortex--m4-processor&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dzone.com/articles/be-aware-floating-point-operations-on-arm-cortex-m"&gt;https://dzone.com/articles/be-aware-floating-point-operations-on-arm-cortex-m&lt;/a&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>