10 useful tips for using the floating point unit on the Cortex-M4

October 25, 2013


The Arm Cortex-M4 processor addresses application areas requiring a blend of traditional microcontroller functionality and digital signal processing - this is often called a Digital Signal Controller.

One of the optional features which a licensee of the Cortex-M4 can include in their design is a powerful Floating Point Unit (FPU).

This document gives 10 tips on how best to use the FPU on a Cortex-M4 processor.

1. Floating Point Unit (FPU) is optional

The FPU is an optional feature of the Cortex-M4 processor. Some microcontrollers based on the Cortex-M4 processor do not include an FPU, so check the datasheet carefully. If the FPU is not present, most toolchains can emulate floating point using integer operations in their C run-time library.
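
As a rough compile-time check, the sketch below reports whether the compiler has been told to generate hardware floating point instructions. It assumes an ACLE-compliant compiler that defines the `__ARM_FP` macro; the function names are made up for illustration. The macro reflects the compiler options rather than the silicon, so the datasheet remains the authority on whether the FPU exists.

```c
/* Assumption: the compiler follows ACLE and defines __ARM_FP as a bit mask
 * when hardware floating point code generation is enabled:
 *   bit 2 (0x04) = single precision, bit 3 (0x08) = double precision.
 * When hardware floating point is not in use the macro is left undefined. */

/* Hypothetical helper: 1 if single precision is compiled for hardware. */
int hw_single_precision(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x04)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if double precision is compiled for hardware. */
int hw_double_precision(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x08)
    return 1;
#else
    return 0;
#endif
}
```

On a Cortex-M4 build with the FPU options switched on, one would expect the first helper to return 1 and the second to return 0.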

2. The FPU in Cortex-M4 supports single precision FP operations, but not double precision

Floating point numbers can be single precision (“float” in C) or double precision (“double” in C). The FPU in the Cortex-M4 processor supports single precision operations, but not double precision. If a double precision calculation is present, the C compiler uses runtime library functions to perform it in software. For best performance, keep calculations in single precision where possible. Note that most compilers emulate double precision using integer operations (not single precision FP).

3. Check the compiler’s output

You may use a double precision calculation accidentally without knowing it. This can be due to the implicit widening of types required by the C language standard; for example, a floating point constant without an “f” suffix has type double. It is therefore useful to check the output of the compilation process to see whether it calls double precision runtime functions. The exact way to do this is toolchain-dependent, so check the tool’s documentation. One common way is to generate a disassembled listing of the compiled image and inspect it. Some compilers have an option to force every floating point operation to single precision, or to generate notification messages when a double precision calculation is used.
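
To illustrate the kind of accidental widening described above, here is a minimal sketch with two hypothetical functions that compute the same expression. The only difference is the suffix on the constants. The comment about GCC's `-Wdouble-promotion` warning and the AEABI helper names describes typical toolchain behaviour and should be verified against your own compiler.

```c
/* Accidentally double precision: 0.25 and 1.5 are double constants, so x is
 * widened to double, the arithmetic is done in double, and the result is
 * narrowed back to float. On a Cortex-M4 this typically turns into calls to
 * runtime helpers (on AEABI toolchains, names such as __aeabi_f2d,
 * __aeabi_dmul, __aeabi_dadd and __aeabi_d2f), which show up in a
 * disassembly listing. GCC can warn about this with -Wdouble-promotion. */
float scale_promoted(float x)
{
    return x * 0.25 + 1.5;
}

/* Single precision throughout: the 'f' suffix keeps every operand a float,
 * so the single precision FPU can do all of the work. */
float scale_single(float x)
{
    return x * 0.25f + 1.5f;
}
```

The same care applies to math functions: `sinf()` takes and returns a float, whereas `sin()` works in double.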

4. Using floating point calculations in an Interrupt Service Routine

Floating point calculations are performed on a separate register bank inside the floating point unit. If both the main thread (e.g. the main program) and interrupt service routines (ISRs) use the FPU, extra context saving and restoring is required to ensure that the ISR does not corrupt the data used by the main thread. This takes extra clock cycles, so if you want fast ISR response time, one approach is to avoid floating point calculations inside an ISR. When the ISR executes no floating point instruction, a feature called Lazy Stacking (see Arm application note AN298) avoids the stacking and unstacking of the FPU registers.
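
One way to keep floating point out of an ISR is to capture raw integer data in the handler and defer the conversion to thread mode. The sketch below is only an illustration: every name is hypothetical, and it assumes a 12-bit ADC with a 3.3 V reference. It also assumes the compiler does not choose to use FPU registers inside the handler for other purposes.

```c
#include <stdint.h>

/* All names are hypothetical; a 12-bit ADC with a 3.3 V reference is assumed. */
static volatile int32_t  g_raw_sample;   /* last raw reading captured by the ISR */
static volatile uint32_t g_sample_ready; /* set by the ISR, cleared by the thread */

/* Body of the interrupt handler: integer-only work, so the handler itself
 * contains no floating point calculation. */
void adc_isr_body(int32_t raw_reading)
{
    g_raw_sample = raw_reading;
    g_sample_ready = 1u;
}

/* Called from thread mode. Returns 1 and writes the voltage if a new sample
 * is available, otherwise returns 0. The floating point work happens here.
 * A real design would also protect against the ISR firing between the two
 * reads below; that is omitted to keep the sketch short. */
int poll_voltage(float *volts)
{
    int32_t raw;

    if (g_sample_ready == 0u) {
        return 0;
    }
    raw = g_raw_sample;
    g_sample_ready = 0u;
    *volts = (float)raw * (3.3f / 4096.0f);
    return 1;
}
```

In a real project the vector table entry would read the ADC data register and pass the value to `adc_isr_body()`, while the main loop calls `poll_voltage()`.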

5. Reserve extra stack space

When floating point operations are carried out in thread mode and an interrupt occurs, the Lazy Stacking feature (see tip 4) reserves space on the stack for the FPU registers so that they can be pushed later if necessary. Check your stack size allocation to make sure there is enough space to accommodate the larger stack frame (26 words instead of 8 words).
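
The arithmetic behind the larger frame can be sketched as follows. The function name is hypothetical, and the estimate covers only the exception stack frames themselves, not the stack used by the handlers' own code. It conservatively adds one padding word per frame, since the processor may insert one to keep the stack 8-byte aligned.

```c
#include <stddef.h>

#define BASIC_FRAME_WORDS    8u   /* R0-R3, R12, LR, PC, xPSR */
#define EXTENDED_FRAME_WORDS 26u  /* basic frame + S0-S15 + FPSCR + 1 reserved word */
#define ALIGN_PAD_WORDS      1u   /* possible padding word for 8-byte alignment */
#define BYTES_PER_WORD       4u

/* Hypothetical helper: upper bound, in bytes, on the stack consumed by
 * exception frames for a given interrupt nesting depth. */
size_t exception_frame_bytes(unsigned nesting_levels, int fpu_in_use)
{
    size_t words = fpu_in_use ? EXTENDED_FRAME_WORDS : BASIC_FRAME_WORDS;

    return (size_t)nesting_levels * (words + ALIGN_PAD_WORDS) * BYTES_PER_WORD;
}
```

For example, three levels of nesting with the FPU in use would be budgeted as `exception_frame_bytes(3, 1)` bytes on top of whatever the handlers themselves need.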

6. Runtime library options

Many toolchains provide a choice of C runtime libraries for different processing requirements. For example, with the Arm C compiler/Keil MDK you can select between the standard C library (for higher performance) and MicroLib (for smaller size). With GCC, you can choose between Newlib and Newlib-nano (a small memory footprint library). In most cases, the size-optimized libraries provide all the features that embedded applications require. However, be aware that a runtime library optimized for size might not have full IEEE 754 support (e.g. corner cases such as NaN handling).
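
If IEEE 754 corner cases matter to your application, a small self-check run against the chosen library can be a useful sanity test. The sketch below is an assumption about what such a check might look like, not a conformance suite. It uses double on purpose, because double arithmetic on the Cortex-M4 is carried out by library code rather than by the FPU.

```c
#include <math.h>

/* Hypothetical self-check: returns 1 if a few NaN and infinity corner cases
 * behave as IEEE 754 specifies, 0 otherwise. The volatile variables stop the
 * compiler from evaluating the divisions at compile time. */
int ieee754_corner_cases_ok(void)
{
    volatile double zero = 0.0;
    volatile double one = 1.0;
    double nan_value = zero / zero;  /* 0/0 should give NaN */
    double inf_value = one / zero;   /* 1/0 should give +infinity */

    if (!isnan(nan_value)) {
        return 0;
    }
    if (nan_value == nan_value) {    /* NaN never compares equal, even to itself */
        return 0;
    }
    if (!isinf(inf_value)) {
        return 0;
    }
    if (!isnan(inf_value - inf_value)) {  /* infinity - infinity should give NaN */
        return 0;
    }
    return 1;
}
```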

7. CMSIS-Core and FPU

The CMSIS header files provide a C-level abstraction of the underlying Cortex-M4 core.

Please note that the CMSIS-Core header files use two C macros:

- __FPU_PRESENT : Set to 1 by the device header file if the FPU is present
- __FPU_USED : Set to 1 if the FPU is used. When this is set, the system initialization function “SystemInit()” enables the FPU.

The FPU must be enabled before any FPU instruction is executed, otherwise a fault exception will be raised (a UsageFault, or a HardFault if the UsageFault handler is not enabled).
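
Enabling the FPU comes down to granting full access to coprocessors CP10 and CP11 in the Coprocessor Access Control Register (CPACR). The sketch below expresses the bit manipulation as a plain function so that it can be checked off-target; the function name is hypothetical, and the on-target usage in the comment assumes the CMSIS-Core names `SCB->CPACR`, `__DSB()` and `__ISB()`.

```c
#include <stdint.h>

/* CPACR bits [23:20] hold the access fields for CP10 and CP11 (the FPU).
 * Writing 0b11 to both fields grants full access. */
#define CPACR_CP10_CP11_FULL_ACCESS (UINT32_C(0xF) << 20)

/* Hypothetical helper: returns the CPACR value with the FPU enabled,
 * leaving all other bits unchanged. */
uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL_ACCESS;
}

/* On the target, assuming the CMSIS-Core device header is included:
 *
 *     SCB->CPACR = cpacr_with_fpu_enabled(SCB->CPACR);
 *     __DSB();   // make sure the write has completed
 *     __ISB();   // make sure following instructions see the new setting
 */
```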

8. Turn off the FPU if not used

If your application does not need to handle any floating point calculations, you can leave the FPU switched off all the time by not defining the __FPU_USED macro. This can reduce power consumption. In some applications, you might only need the FPU for a short time and can switch it off again when the FPU operations are completed. In this case, however, you should create a HardFault or UsageFault handler that checks the fault status and re-enables the FPU in case any floating point code is executed accidentally while the FPU is disabled.
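
The fault handler's check can be sketched as below. The helper names are hypothetical. The bit positions are taken from the Armv7-M register layout, where the NOCP ("no coprocessor") flag of the UsageFault status sits at bit 19 of the Configurable Fault Status Register (CFSR). The handler outline in the comment is an assumption about how the pieces might fit together, using the CMSIS-Core names `SCB->CFSR` and `SCB->CPACR`.

```c
#include <stdint.h>

#define CPACR_CP10_CP11_MASK (UINT32_C(0xF) << 20) /* FPU access fields in CPACR */
#define CFSR_NOCP            (UINT32_C(1) << 19)   /* UsageFault: coprocessor access denied */

/* Hypothetical helper: CPACR value with the FPU switched off. */
uint32_t cpacr_with_fpu_disabled(uint32_t cpacr)
{
    return cpacr & ~CPACR_CP10_CP11_MASK;
}

/* Hypothetical helper: CPACR value with the FPU switched on. */
uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_MASK;
}

/* Hypothetical helper: 1 if the fault status shows an attempt to use a
 * disabled coprocessor, which includes a disabled FPU. */
int fault_is_fpu_access(uint32_t cfsr)
{
    return (cfsr & CFSR_NOCP) != 0u;
}

/* Possible outline of a UsageFault handler on the target:
 *
 *     if (fault_is_fpu_access(SCB->CFSR)) {
 *         SCB->CFSR = CFSR_NOCP;   // write 1 to clear the flag
 *         SCB->CPACR = cpacr_with_fpu_enabled(SCB->CPACR);
 *         __DSB();
 *         __ISB();
 *     }
 */
```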

9. Hardfp and softfp linkage

Even if you have the FPU in your microcontroller, some floating point calculations (e.g. sine, cosine) still need to be handled by C runtime library functions. In those cases, parameters and results must be passed between the program code and the C runtime library functions. Be aware that there are two options in the ABI for the Arm architecture, namely the hard ABI and the soft ABI. In the hard ABI, values are passed via the FPU registers, and in the soft ABI, values are passed via integer registers. If you are creating a code library that needs to run on multiple targets that may or may not have an FPU, you should use the soft ABI. More details on this topic are covered in chapter 13 of “The Definitive Guide to Arm Cortex-M3 and Cortex-M4 Processors, 3rd edition”.
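
A library can record which calling convention it was built with so that a mismatch is easier to spot. The sketch below assumes an ACLE-compliant compiler, which defines `__ARM_PCS_VFP` when floating point arguments are passed in FPU registers; the function name is hypothetical. With GCC, the convention is selected by `-mfloat-abi=soft`, `-mfloat-abi=softfp` (FPU instructions allowed, values passed in integer registers) or `-mfloat-abi=hard`.

```c
/* Assumption: the compiler follows ACLE and defines __ARM_PCS_VFP when the
 * hard-float calling convention is in use. */

/* Hypothetical helper: 1 if this code was built for the hard ABI
 * (values passed in FPU registers), 0 if built for the soft ABI
 * (values passed in integer registers) or for a non-Arm target. */
int built_with_hard_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return 1;
#else
    return 0;
#endif
}
```

Object files built for the two conventions are not link-compatible, so the whole image, including the C runtime library, needs to agree on one of them.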

10. Rounding modes

The IEEE 754 standard defines several rounding modes for floating point calculations. The FPU in the Cortex-M4 supports:

- Round to nearest (default)
- Round toward plus infinity
- Round toward minus infinity
- Round toward zero

You can use the “fesetround” function (declared in the C header fenv.h) to select which rounding mode to use.


Robmar over 2 years ago in reply to Jens Bauer

Floats that are NaN (not a number) can be loaded, but maths operations on them will fail.

You can use isnan(float) to check, but it's inefficient; you should write your code so that this does not occur.
Robmar over 2 years ago

Why hasn't Arm published two routines to save and restore the FPU status, given that the Cortex-M4 only supports partial context switching?

I see people asking how to do this through the years, and no one bothers to post a sample routine.

One post says that a bus fault occurs when reading all the registers, so that context switching has to be handled in the fault handler.

Just to close this, please, those of you at Arm, please give a damn about your clients and post said routines, once and for all. Thanks.
Jens Bauer over 11 years ago

This information is very valuable to me. Thank you so much Joseph!
Often it won't pay to use the FPU for copying data; if you have just a single M4 register available, it would be quicker to just repeat LDR/STR than loading multiple FPU registers. But if you've run completely out of integer registers, it might pay to use a list of floats.
I'd like to give other readers a hint:
If you don't use the FPU anyway, you could initialize a structure and read it into a list of floating point registers ("ink the stamp").
Then you could use these registers as a 'data-stamp'. For most people, this would be initializing arrays.
For people like me, I'll keep at least one or 2 'data-stamps' ready, I'll reserve space for copying data and also a few registers for normal float calculations if I would ever need it.
Hopefully someone will benefit from this post.
Joseph Yiu over 11 years ago

It is fine to use floating point registers to store values that are not valid floating point numbers. The check for validity of the floating point format takes place when you try to carry out a floating point calculation, but not for data transfers or memory accesses. So the data will not be modified provided that you limit the operations to data transfers.
In fact, C compilers sometimes make use of floating point registers for non-floating point processing (but this could affect interrupt latency, as you might need to save/restore more registers).
Jens Bauer over 11 years ago
Very good article.
What happens if I load a value that is not a valid floating point value into a floating point register?
Is the value modified?
Is the value modified when I store the value in memory?
What I'm getting at is that in some cases it might pay to use the FPU for copying or zeroing blocks of memory.