
Cortex M4 FPU against fixed point math

Note: This was originally posted on 1st March 2012 at http://forums.arm.com

Hi,
I'm working with three Cortex parts: an STM32F027 (Cortex-M3), a TI Cortex-M4 and an Infineon Cortex-M4.
I would like to move from a TI C2000 TMS320F2810 (32-bit fixed-point core) to an M4 to control a 3-phase power bridge.
My algorithms currently work in fixed-point math (IQ22) and are 98% simple multiplications plus some sine/cosine calculations: PI, PID, PLL, low-pass filter, notch, etc.
I ported the algorithms to the Cortex mainly by redefining IQmpy (multiplication) and IQsin (sine calculation), first in fixed point and then in floating point.
I was expecting a speed improvement in floating point, because every fixed-point multiplication requires a shift while a floating-point one does not, but instead I'm experiencing a dramatic slowdown of the algorithm when running in floating point.
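For reference, the two multiply variants being compared can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the names `iq22_t`, `iq22_mpy`, `f32_mpy` and `iq22_from_int` are hypothetical stand-ins for the redefined IQmpy, and the 64-bit intermediate is an assumption about how the product is kept from overflowing.

```c
#include <stdint.h>

#define IQ_FRAC_BITS 22            /* IQ22: 22 fractional bits */

typedef int32_t iq22_t;            /* hypothetical type name */

/* Fixed-point multiply: widen to 64 bits, multiply, then shift the
 * product back down by the number of fractional bits.
 * Note: right-shifting a negative value is implementation-defined in C;
 * compilers for Cortex-M commonly use an arithmetic shift. */
static inline iq22_t iq22_mpy(iq22_t a, iq22_t b)
{
    return (iq22_t)(((int64_t)a * (int64_t)b) >> IQ_FRAC_BITS);
}

/* Floating-point multiply: no rescaling shift is needed. */
static inline float f32_mpy(float a, float b)
{
    return a * b;
}

/* Hypothetical helper: build an IQ22 value from a small integer. */
static inline iq22_t iq22_from_int(int32_t n)
{
    return (iq22_t)(n * (1 << IQ_FRAC_BITS));
}
```

With these definitions, `iq22_mpy(iq22_from_int(3), iq22_from_int(2))` yields the IQ22 encoding of 6, and `f32_mpy(1.5f, 2.0f)` yields `3.0f`.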
I'm doing my test in IAR.
I checked the assembler and I verified the compiler is using the floating point.
My only explanation is that the FPU doesn't have, as far as I know, direct access to the CPU registers, so every FPU multiplication requires two loads into the FPU registers and another move to bring the result back to a CPU register.
Can anybody confirm this?
Thank you very much
michele

michele corradin over 12 years ago

Note: This was originally posted on 1st March 2012 at http://forums.arm.com

Dear Joseph,
I'll post a couple of examples with the assembler output.
In my case I'm switching all my code from IQ22 to float, and I verified that it is using the floating-point unit.
Let me ask a "simple" question: is it true that the FPU doesn't have direct access to the core registers, so that to perform an operation in the FPU I have to move the data from the CPU registers to the FPU and back?
I checked the manual but I'm not sure I understood it correctly: in the assembler output I seem to see some load operations.
Thank you very much for your help
michele

michele corradin over 12 years ago

Note: This was originally posted on 2nd March 2012 at http://forums.arm.com

Dear Joseph
I found a cast error in my algorithms, introduced when moving from IQ math to float: now the floating-point code runs 10-20% slower than the IQ one.
I notice some vmov and vstr operations; I guess that explains what I said before.
I downloaded the ARM DSP library: is there a speed report comparing IQ and float operations?
Thanks a lot for your support
Michele

michele corradin over 12 years ago

Note: This was originally posted on 5th March 2012 at http://forums.arm.com

Yes, I know ARM provides Q15, Q31 and single-precision floating-point libraries. What I mean is whether there is any comparison of the execution time of those libraries in Q15, Q31 and floating point, for example on a sine wave calculation or a PID.
Thanks
Michele

Joseph Yiu over 12 years ago

Note: This was originally posted on 1st March 2012 at http://forums.arm.com

Hi Michele,

Without looking at the code I cannot be sure what is happening, but switching between IQ22 and single precision is possibly the main issue. It is not just a matter of copying the data from an integer register to a floating-point register and back: you also need to add the exponent and sign bit, and the IEEE 754 single-precision format uses 23 fraction bits rather than 22, so you might have additional shift operations there.

Can you change all the operations to single precision floating point?

regards,
Joseph

Joseph Yiu over 12 years ago

Note: This was originally posted on 1st March 2012 at http://forums.arm.com

Hi Michele,

Yes, you are correct.

The floating-point instructions operate on the floating-point register bank, and there are instructions to transfer floating-point data directly to and from memory. So in theory floating-point data does not have to go through the integer register bank at all. But when floating point is mixed with IQ22 or other fixed-point data, which (I assume) is processed in the integer registers, the data has to be transferred and converted between the two register banks. Instructions to convert between floating point and fixed point are available, so even when the conversion is needed it shouldn't be too much worse.
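As a rough illustration of the conversion step, the sketch below converts between IQ22 and single precision in plain C. The function names and the `IQ22_ONE` constant are hypothetical, and whether a given compiler maps this to a single fixed-point conversion instruction or to an integer conversion plus a multiply is an assumption to check in the generated assembler.

```c
#include <stdint.h>

#define IQ22_ONE (1 << 22)         /* value of 1.0 in IQ22 (hypothetical name) */

/* IQ22 -> float: convert the raw integer, then scale by 2^-22.
 * A float has a 24-bit significand, so large 32-bit IQ22 values
 * lose their lowest bits in this conversion. */
static inline float iq22_to_float(int32_t q)
{
    return (float)q * (1.0f / (float)IQ22_ONE);
}

/* float -> IQ22: scale by 2^22, then convert (truncates toward zero).
 * The caller must keep f within the IQ22 range (about -512 to +512). */
static inline int32_t float_to_iq22(float f)
{
    return (int32_t)(f * (float)IQ22_ONE);
}
```

Each call at a fixed-point/floating-point boundary costs a conversion plus a register-bank transfer, which is why keeping a whole control loop in one representation avoids the overhead.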

The instruction set of the Cortex-M4 floating point unit can be found in this pdf document:
http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/DUI0553A_cortex_m4_dgug.pdf
or from ARM Infocenter:
http://infocenter.arm.com/help/index.jsp
-> Developer Guides and Articles
-> Software Development
-> Cortex-M4 Devices Generic User Guide

Potentially there are other areas that can make the performance worse:
- accidentally using double-precision data/functions
- compiler/run-time library settings (e.g. hard VFP vs. soft VFP)
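The first point is easy to hit by accident in C. The sketch below (function names are hypothetical) shows how an unsuffixed literal silently promotes a single-precision expression to double, which a single-precision-only FPU such as the one in the Cortex-M4 cannot execute in hardware, so the operation falls back to a software library routine.

```c
/* Accidental double precision: 0.5 is a double literal, so x is
 * promoted to double, multiplied in double precision, and the result
 * is converted back to float. */
static inline float half_via_double(float x)
{
    return (float)(x * 0.5);
}

/* Single precision throughout: the f suffix keeps the whole
 * expression in float. */
static inline float half_via_float(float x)
{
    return x * 0.5f;
}
```

Both functions return the same value for typical inputs; only the type of the intermediate expression differs, which is visible as `sizeof(x * 0.5) == sizeof(double)` versus `sizeof(x * 0.5f) == sizeof(float)`. The same applies to library calls: `sin()` takes and returns double, while `sinf()` is the single-precision variant.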

regards,
Joseph

Joseph Yiu over 12 years ago

Note: This was originally posted on 9th March 2012 at http://forums.arm.com

Hi Michele,

The information available in the public domain is limited, but there is some. For example:

http://www.emcu.it/STM32/STM32_Journal/stm32_journal_1_1.pdf

http://www.embedded-world.eu/fileadmin/user_upload/pdf/arm_entwicklerkonferenz_2011/Session_3/01%20-%20Developing%20Advanced%20Signal%20Processing%20_Johnson_ARM.pdf

I know that this might not be exactly what you want, but you can generate the data yourself using the instruction set simulator in Keil MDK if needed.
regards,
Joseph
