How much stack memory do I need for my Arm Cortex-M applications?

March 21, 2016


Overview of stack size requirement estimations in Cortex-M based applications

1 - Overview

“How much stack memory do I need for this application?” This is a common question for software developers working on applications that run on microcontroller devices. If the reserved stack size is insufficient, the stack can overflow into memory reserved for other data. As a result, the program could crash, produce incorrect results, or both. For systems that have security requirements, stack overflow can also result in security vulnerabilities.

In most microcontroller software development environments, the development tools require the stack size(s) to be defined by the software developers. So it is important for software developers to understand the stack size requirements of their applications and set up the stack sizes of those projects accordingly. The tools typically also allow a developer to define the size of the heap memory (heap memory is used for functions like “malloc()”), but this topic is not covered in this article.

Many factors can affect the stack size requirements, such as the compilation tools and compilation options and the choice of Real Time Operating System (RTOS), so there is no simple golden rule for determining stack size. Nevertheless, this article gives an overview of the stack size requirements in embedded applications for Arm Cortex-M based systems, and should help developers decide the stack sizes for their microcontroller projects.

2 - Stack Memory layout

First let us have a quick overview of how stack operations work in Arm Cortex-M processors. In these processors, the stack operations are based on a “Full Descending” model. This means that each stack pointer points to the last filled stack location, and when new data is pushed onto the stack, the corresponding stack pointer is decremented first and the data is then stored in the memory location that the stack pointer now points to (Figure 1). As a result, the initial value for each stack pointer is set to the top of each stack space.

Figure 1: Full descending stack operation concept
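The full descending behaviour can be illustrated with a small host-side model. This is only a sketch: the `model_stack_t` type, the function names, and the 8-word region size are illustrative assumptions, and the stack pointer is modelled as an array index rather than a real address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 8u  /* illustrative size of the modelled stack region */

/* Hypothetical model of one stack region. sp is an index into mem[]:
 * it refers to the last filled word, or STACK_WORDS when the stack is empty. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    size_t   sp;
} model_stack_t;

/* The initial stack pointer value is the top of the stack space. */
static void stack_init(model_stack_t *s)
{
    s->sp = STACK_WORDS;
}

/* Push: decrement the stack pointer first, then store (full descending). */
static void stack_push(model_stack_t *s, uint32_t value)
{
    assert(s->sp > 0);            /* going below index 0 would be an overflow */
    s->sp--;
    s->mem[s->sp] = value;
}

/* Pop: read the last filled word, then increment the stack pointer. */
static uint32_t stack_pop(model_stack_t *s)
{
    assert(s->sp < STACK_WORDS);  /* nothing to pop from an empty stack */
    uint32_t value = s->mem[s->sp];
    s->sp++;
    return value;
}
```

After `stack_init()`, two calls to `stack_push()` leave `sp` two words below the top, with the first value stored in the highest word of the region.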

There are two stack pointers in processors based on Armv6-M and Armv7-M architectures. In the latest Armv8-M architecture, the maximum number of stack pointers is increased to 4 (Table 1) when the optional Security extension is implemented. The Armv6-M architecture covers the Cortex-M0, Cortex-M0+ and Cortex-M1 processors, and Armv7-M architecture covers the Cortex-M3, Cortex-M4 and Cortex-M7 processors.

Stack type	Stack Pointers in Armv6-M, Armv7-M	Stack Pointers in Armv8-M
Main stack (default) for bare metal applications (applications without any OS), exception handlers.	Main Stack Pointer (MSP)	Secure Main Stack Pointer (MSP_S) for Secure software. Available only if TrustZone® technology is included. Non-secure Main Stack Pointer (MSP_NS) for Non-secure software
Process Stack for application tasks inside an OS environment	Process Stack Pointer (PSP)	Secure Process Stack Pointer (PSP_S) for Secure software. Available only if TrustZone® technology is included. Non-secure Process Stack Pointer (PSP_NS) for Non-secure software

Table 1: Stack Pointers in Cortex-M Processors

In most simple applications without an RTOS, we can use the MSP for all operations. This means that PSP can be unused and ignored. In this case we only need to have one stack space in the application.

In applications with an RTOS, typically there is one stack memory area for the main stack, and a number of process stack areas, one for each application thread. The PSP is dynamically switched to each of these stack spaces when the OS switches between different threads.

The Main Stack Pointer (MSP) is initialized by the hardware reset sequence. The initial value is fetched from the first word in the vector table (which defaults to address 0x00000000 in Cortex-M0/M0+/M3/M4, and is configurable by chip designers in newer processors such as the Cortex-M7 processor). Typically, the vector table for booting is defined in a piece of startup code, and a tool chain specific arrangement allows software developers to define the size of the main stack, and hence the initial value of MSP, in the startup code. Startup code is usually supplied by microcontroller vendors or development suites, and most startup code is based on the CMSIS-CORE software framework. (The Cortex Microcontroller Software Interface Standard, or “CMSIS”, is developed by Arm to help the interoperability and portability of software and tools.)

Depending on the tool chain you are using, you might find that the memory layout in the SRAM is based on one of the arrangements shown in Figure 2.

Figure 2: Typical stack layouts

In many software development tools, there can be a second step of stack pointer initialization before entering the main() application. This overrides the stack pointer value which was set from the vector table, even though in many cases the new value is identical to the one in the vector table. Such an arrangement is useful for applications that place the stack in external memory, where the external memory interface needs to be configured before it can be used. With this arrangement, the initial stack (from the vector table) can be set to an internal SRAM area so that the reset handler can run and initialize the external memory interface, before the external memory is used for stack storage within the main application code.

3 - Stack size reporting features in development tools

Most modern compilation tools can generate stack usage reports:

3.1 Keil Microcontroller Development Kit (MDK-Arm)

Keil MDK uses the Arm Compiler 5 (and in future releases Arm Compiler 6), which is the same compiler engine used in Arm Development Studio 5 (DS-5). The tool chain supports the following linker options:

--callgraph : produces a static callgraph (html or text format) which displays stack usage
--info=stack : lists the stack usage of all global symbols

The Keil MDK environment utilizes the stack reporting feature by default. After a project is compiled, an HTML file is generated and you can locate the stack usage information from there (Table 2):

…

main (Thumb, 196 bytes, Stack size 16 bytes, char_lcd.o(.text))

[Stack]

- Max Depth = 56

- Call Chain = main ⇒ clcd_set_cg ⇒ clcd_set_dd_ram_addr ⇒ clcd_write_ir ⇒ clcd_wait_if_busy ⇒ clcd_read_ir

…

Table 2: Example stack usage report from Keil MDK 5

You can also see the stack usage for each function in this file.

The Arm Compiler also has two compiler options (reference 3) which can be useful:

--protect_stack : inserts a guard variable on the stack frame for each vulnerable function.
--protect_stack_all : inserts a guard variable on the stack frame for all functions.

These options place a guard variable in the stack frame of each protected function and set it to a value (specified by void *__stack_chk_guard). At the end of the function the guard is checked, and if it has been overwritten (indicating that the stack has been corrupted), a callback function (void __stack_chk_fail(void)) is executed.

3.2 - gcc

A compilation option called “-fstack-usage” is available to enable stack reports.

This enables the stack usage information to be generated on a per function basis (you still need to trace the call tree to find the maximum stack usage).
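Tracing the call tree by hand amounts to adding each function's own frame size to the deepest of its callees. The sketch below shows that calculation on a host machine. The `func_info_t` structure, its field names, and the fixed limit of four callees are hypothetical, and the per-function sizes would have to be copied in from the compiler's report. It assumes a call tree without recursion and without calls through function pointers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4  /* illustrative limit on direct callees per function */

/* Hypothetical per-function record: the function's own stack frame size
 * (as a per-function usage report would list it) and its direct callees. */
typedef struct func_info {
    const char             *name;
    unsigned                frame_bytes;
    const struct func_info *callees[MAX_CALLEES]; /* NULL marks the end */
} func_info_t;

/* Worst-case depth of a call tree: own frame plus the deepest callee.
 * Assumes no recursion and no indirect calls in the analysed program. */
static unsigned max_stack_depth(const func_info_t *f)
{
    assert(f != NULL);
    unsigned deepest = 0;
    for (size_t i = 0; i < MAX_CALLEES && f->callees[i] != NULL; i++) {
        unsigned depth = max_stack_depth(f->callees[i]);
        if (depth > deepest)
            deepest = depth;
    }
    return f->frame_bytes + deepest;
}
```

Calling `max_stack_depth()` on the record for `main` gives the figure that a linker-generated call graph would report as the maximum depth, provided every function in the tree has a known frame size.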

A stack protection feature is available on gcc using the following options (reference 4):

-fstack-protector: inserts a guard variable on the stack frame for each vulnerable function.
-fstack-protector-all: inserts a guard variable on the stack frame for all functions.

When these options are used, the guards are inserted to the bottom of the stack and initialized when a function is entered and then checked when the function exits.

3.3 - IAR

IAR Embedded Workbench for Arm (EWARM) provides stack size report in the linker map file. To enable this, the following project settings are required:

Enable linker map file generation in Linker settings (List tab)
Enable stack usage analysis (Advanced tab).

The report shows the maximum stack usage for the call tree.

3.4 - Limitations of stack usage report generation

It is important to understand that stack usage reports from development tools only cover the stack usage for each function or a call tree. They do not include additional stack spaces needed by exception handlers.

In addition, software developers might find that there are many cases where the reports are unable to provide information about the stack requirements for some parts of the applications. There are many reasons why stack usage report generation does not work for some code:

The use of function pointers in the application can mean that the tool cannot generate a call tree.
In many tools the stack usage of functions in C runtime libraries is unknown.
The application contains recursive function calls or self-modifying code.

In these cases, you might have to calculate the maximum stack usage for these functions manually, or estimate it by experiment. For example, you can use a debugger to fill the stack memory space with a known data pattern before running a program, then execute your code, and examine the stack memory space to see how much of it has been modified by the program execution.

It is also possible to handle the stack estimation by adding instrumentation code to your project. For example, Appendix I shows stack check utility code for gcc Arm Embedded (with newlib).
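The fill-and-inspect approach can be modelled as two small routines, sketched here on a plain array so that it runs on a host machine. The pattern value and the function names are arbitrary choices for illustration. On a target the region would be the real stack area, and the fill would have to stop below the current stack pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILL_PATTERN 0xDEADBEEFu  /* arbitrary; any unlikely value works */

/* Fill the whole stack region with the pattern before the program runs. */
static void stack_paint(uint32_t *region, size_t words)
{
    assert(region != NULL);
    for (size_t i = 0; i < words; i++)
        region[i] = FILL_PATTERN;
}

/* region[0] is the lowest address (the stack limit). Because the stack
 * grows downwards, the untouched words form a run starting at region[0];
 * everything above the end of that run counts as used. */
static size_t stack_bytes_used(const uint32_t *region, size_t words)
{
    assert(region != NULL);
    size_t untouched = 0;
    while (untouched < words && region[untouched] == FILL_PATTERN)
        untouched++;
    return (words - untouched) * sizeof(uint32_t);
}
```

The result is a high-water mark for the runs that were actually exercised, not a guaranteed maximum. It can also under-report if the program happens to write the pattern value at the deepest point of the stack.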

4 - Finding the maximum stack usage – bare metal scenario

In simple applications that only use the main stack (without any use of an RTOS, and without separating stack areas for thread and handler modes), the maximum stack size can be calculated by adding:

the maximum stack size required by the applications in Thread mode, plus
the maximum stack size required by exception handling.

The maximum stack size required in Thread mode can be obtained by stack usage reports or by trials. The maximum stack size required by the exception handling is more complicated because:

Interrupt handlers can be nested. This includes the fault exception(s) such as HardFault handler and Non-Maskable Interrupt (NMI).
The size of the exception stack frames needs to be taken into account.
The size of exception stack frame depends on whether the interrupt process was using the floating point unit (FPU) or not.
Due to double-word stack alignment defined by the AAPCS (reference 1), the exception stack frames are required to be double word aligned and therefore a padding word of stack space might be added before the stack frame in the exception entry sequence.
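As a rough illustration of the last two points, the stack space consumed by one exception entry can be computed from the interrupted code's stack pointer value. This sketch uses the frame sizes for Armv6-M, Armv7-M and Non-Secure Armv8-M software (8 words without FPU state, 26 words with it) and assumes that double-word stack alignment is enabled. The function and macro names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAME_WORDS_BASIC 8u   /* exception stack frame without FPU state */
#define FRAME_WORDS_FPU   26u  /* exception stack frame with FPU state */

/* Bytes pushed at exception entry, given the stack pointer value of the
 * interrupted code. Assumes double-word stack alignment is enabled and
 * that sp is at least word aligned. */
static uint32_t exception_entry_bytes(uint32_t sp, bool fpu_active)
{
    assert((sp & 3u) == 0u);
    /* One padding word is needed when sp is not a multiple of eight. */
    uint32_t padding = (sp & 4u) ? 4u : 0u;
    uint32_t words = fpu_active ? FRAME_WORDS_FPU : FRAME_WORDS_BASIC;
    return padding + words * 4u;
}
```

Because both frame sizes are multiples of eight bytes, the padding word is the only thing that depends on the alignment of the interrupted code's stack pointer. A worst-case estimate can therefore round each level's stack usage up to a multiple of eight instead of tracking the padding separately.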

Assuming that an application only utilizes two interrupt priority levels for peripheral interrupts, there could be 4 levels of nested exceptions due to potential occurrence of the HardFault exception and NMI exception (if utilized by the application), as shown in Figure 3.

Figure 3: Worst case stack size requirement when considering maximum number of nested exceptions

In such cases, the maximum stack size required can be calculated by adding:

Maximum stack size for Thread mode application code, rounded up to a multiple of eight (double word alignment).
Maximum size of stack frame for each level of exception entrance (this needs to consider if the FPU is used or not).
Stack usage for the exception handler with the maximum stack usage at each priority level.

In some cases it can be tricky to handle the calculation: if exception handler X (using the FPU) uses less stack space than handler Y (not using the FPU) and both are at the same exception priority level, the total stack size when combining the stack usage for handler X and the stack frame for the next level of higher priority exception can be higher than the equivalent for handler Y. Therefore, we can use a trick to simplify the calculation flow: adding the increase of stack frame size for FPU registers stacking into the function that uses the FPU.

We can then devise a stack size calculation flow chart as shown in Figure 4.

Figure 4: Stack calculation flow for bare metal applications (Armv6-M, Armv7-M and Non-Secure software in Armv8-M)
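One possible reading of this calculation, for Armv6-M, Armv7-M and Non-Secure Armv8-M software, is sketched below. It is not a transcription of the flow chart: the `usage_t` type, the function names, and the ordering convention (levels listed from lowest to highest priority, with the last one never preempted) are assumptions made for illustration. The FPU part of a stack frame (26 - 8 = 18 words) is charged to the interrupted code that uses the FPU, as described in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BASIC_FRAME_BYTES 32u  /* 8-word exception stack frame */
#define FPU_EXTRA_BYTES   72u  /* 18 extra words when FPU state is stacked */

/* Hypothetical record: stack used by a piece of code (Thread mode code, or
 * the largest handler at one priority level) and whether it uses the FPU. */
typedef struct {
    unsigned code_bytes;
    bool     uses_fpu;
} usage_t;

/* Round up to a multiple of eight (double-word alignment). */
static unsigned round_up8(unsigned n)
{
    return (n + 7u) & ~7u;
}

/* Worst-case main stack size. levels[] is ordered from the lowest to the
 * highest exception priority; the last level is assumed never to be
 * preempted. */
static unsigned worst_case_main_stack(usage_t thread,
                                      const usage_t *levels, size_t n_levels)
{
    assert(levels != NULL || n_levels == 0);
    unsigned total = 0;
    usage_t current = thread;  /* the code that is about to be preempted */
    for (size_t i = 0; i < n_levels; i++) {
        /* Charge the FPU part of the next frame to the interrupted code. */
        unsigned bytes = current.code_bytes +
                         (current.uses_fpu ? FPU_EXTRA_BYTES : 0u);
        total += round_up8(bytes) + BASIC_FRAME_BYTES;
        current = levels[i];
    }
    return total + round_up8(current.code_bytes);
}
```

For example, a thread using 100 bytes with four nested levels using 40, 24 (with FPU), 16 and 8 bytes gives 136 + 72 + 128 + 48 + 8 = 392 bytes.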

If an application utilizes a large number of priority levels, potentially this can significantly increase the stack size requirement. Therefore a software developer might want to investigate reducing the number of exception priority levels used to reduce the stack size usage.

The method shown in Figure 4 describes a worst case scenario, which might not be reachable in actual operation. For example, certain exceptions (e.g. SVCall) may never occur when executing certain application code. Further adjustment might be needed to relax the stack size requirement based on the actual application requirements.

In the Armv8-M architecture, the exception stack frames created when running Secure software are larger than the stack frames for the Non-Secure state, because the processor needs to reserve enough stack space for all registers in the event of a Non-Secure interrupt occurring.

	FPU not available / disabled / not active in current context (CONTROL.FPCA==0)	FPU enabled and active in current context (CONTROL.FPCA==1)
In Armv6-M/Armv7-M architecture, or when running software in Armv8-M architecture in Non-Secure state.	8 words	26 words
Running software in Secure state in Armv8-M architecture	8 words if exception is Secure; 18 words if exception is Non-Secure	Default: 42 words if exception is Secure; 52 words if exception is Non-Secure. Otherwise, if the FPU is configured as always Non-Secure: 26 words if exception is Secure; 36 words if exception is Non-Secure

Table 3: Size of exception stack frames
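For use in a calculation script, the values in Table 3 can be wrapped in a lookup function. The sketch below simply encodes the table; the enum, the function name, and the parameter names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { STATE_NON_SECURE, STATE_SECURE } sec_state_t;

/* Exception stack frame size in words, following Table 3.
 * running:       security state of the interrupted software (use
 *                STATE_NON_SECURE for Armv6-M and Armv7-M)
 * exception:     security state targeted by the exception
 * fpca:          true if the FPU is active in the current context
 * fpu_always_ns: true if the FPU is configured as always Non-Secure */
static unsigned frame_words(sec_state_t running, sec_state_t exception,
                            bool fpca, bool fpu_always_ns)
{
    assert(running == STATE_SECURE || running == STATE_NON_SECURE);
    if (running == STATE_NON_SECURE)
        return fpca ? 26u : 8u;

    bool to_secure = (exception == STATE_SECURE);
    if (!fpca)
        return to_secure ? 8u : 18u;
    if (fpu_always_ns)
        return to_secure ? 26u : 36u;
    return to_secure ? 42u : 52u;
}
```

Multiplying the result by four gives the frame size in bytes.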

In addition, since the system can use two separate stacks (Secure and Non-Secure), the stack size calculation flow for Secure software in the Armv8-M architecture (Figure 5) differs from the flow in Figure 4 because of the different stack frame sizes. In some cases, MCU software developers might not have visibility of the Secure state software, so some of the steps are optional (it might be impossible to obtain stack size requirements for Secure state software).


Figure 5: Stack calculation flow for bare metal applications (software in Armv8-M with Secure state)

Again, this flow provides a pessimistic estimation of stack usage in scenarios which might be impossible to reach in real world applications. The flow illustrated in Figure 5 also assumes that exception handlers do not have function calls across the security domains (i.e. use only one stack). If there are function calls across domains in these handlers and if the stack usage information for Secure software is available, it might be better to go through the estimation flow twice: once for scenarios with worst cases for Non-Secure stack usage, and the other one for worst cases in Secure stack usage.

5 - Finding the maximum stack usage – RTOS scenario

If an RTOS is used in an application, we need to reserve stack space for the main stack as well as a process stack space for each of the threads.

In most cases, the RTOS vendor should provide an estimate of the main stack requirements. However, the stack size requirement can depend on the OS features being used. You should contact the RTOS vendor if you need additional information in this area. You must also include additional main stack space for exception handlers. The calculation flow shown in Figure 4 can be used for the main stack size estimation.

Most of the RTOS for Arm Cortex-M processors use the PSP when running application threads. In this case, for each of the application threads the required stack size comprises the following stack contents (Figure 6):

Maximum stack size required by the application thread (obtained from the stack usage report from compilation tools), and rounded up to multiple of eight (double word alignment).
Stack size for the exception stack frame (8 words if FPU is not used in this thread, or 26 words if FPU is used).
Additional stack size needed for extra data saving operations performed by the OS. During context switching, the OS might use the process stack of each thread to save additional data for that thread. The exact size needed is OS dependent, but often this includes space for the callee saved registers (R4 to R11, and S16 to S31 if the FPU is used in the thread). You should check with the OS vendor to find out the exact stack size requirements for this additional data.

Figure 6: Stack size needed by an application thread/task in a typical RTOS
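The three contributions can be added up as in the following sketch. The OS part is an assumption, not a property of any particular RTOS: it models an OS that saves R4 to R11 (8 words) on the thread's stack, plus S16 to S31 (16 words) when the thread uses the FPU. The function names are illustrative, and the frame sizes are those for Armv6-M, Armv7-M and Non-Secure Armv8-M software.

```c
#include <assert.h>
#include <stdbool.h>

/* Round up to a multiple of eight (double-word alignment). */
static unsigned round_up8(unsigned n)
{
    return (n + 7u) & ~7u;
}

/* ASSUMPTION: context saved by a hypothetical OS on the thread's stack:
 * R4-R11 (8 words), plus S16-S31 (16 words) if the thread uses the FPU.
 * The real figure must come from the RTOS documentation or vendor. */
static unsigned assumed_os_context_bytes(bool uses_fpu)
{
    return (8u + (uses_fpu ? 16u : 0u)) * 4u;
}

/* Process stack size for one thread: thread code usage rounded up to a
 * multiple of eight, plus one exception stack frame, plus OS-saved data. */
static unsigned thread_stack_bytes(unsigned thread_code_bytes, bool uses_fpu,
                                   unsigned os_extra_bytes)
{
    assert(os_extra_bytes % 4u == 0u);  /* saved registers are whole words */
    unsigned frame_bytes = (uses_fpu ? 26u : 8u) * 4u;
    return round_up8(thread_code_bytes) + frame_bytes + os_extra_bytes;
}
```

For a thread whose code needs 200 bytes, this gives 200 + 32 + 32 = 264 bytes without the FPU and 200 + 104 + 96 = 400 bytes with it, before any minimum size or granularity imposed by the RTOS is applied.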

Please note that each RTOS may have additional process stack size requirements, such as a minimum size or a granularity of size steps. An RTOS for the Armv6-M and Armv7-M architectures that uses the MPU for memory protection might also have additional programming steps for the configuration of the MPU.

In the less common case where the RTOS uses the MSP when running application threads, you should contact the RTOS vendor for information regarding stack requirements. The stack area for each thread might need to provide enough space for nested interrupt handling, as covered in Figure 3.

6 - Additional Information

A number of additional hardware and software features are available to help stack overflow detection or stack management:

Many RTOS products include stack checking features. The stack size checks take place when OS code is executed, for example at context switching and potentially when the OS tick handler is executed. Depending on the implementation, such a feature might not detect a stack overflow that only lasts for a short time, and it cannot prevent data corruption once a stack overflow has happened. Nevertheless, it can detect many stack overflow scenarios and is still a useful feature.
Some compilation toolchains have runtime stack checking features. This adds a small amount of runtime overhead but can be very useful for certain applications where the stack size requirements can be very unpredictable.
Some debuggers also have additional features to help stack usage analysis. For example, starting from Keil MDK 5.14 (with RTX 4.77), a stack usage watermarking feature is introduced to allow software developers to determine the maximum stack usage for each thread.

Figure 7: Stack usage watermarking feature in Keil MDK

More information on this feature can be found on the Keil website.

A new stack limit checking feature is available in the Armv8-M architecture. This feature allows stack overflow in the Secure state to be detected, which helps prevent the leakage of Secure data caused by stack overflow errors. For processors based on the Armv8-M Mainline architecture, each of the stack pointers has a corresponding stack limit register which allows software to define watermark levels for stack overflow detection. When a stack overflow occurs, a UsageFault or HardFault exception is triggered. For Armv8-M Baseline, the stack limit registers are available only for Secure stack pointers (MSP_S and PSP_S). Software running in the Non-secure state can still use the Memory Protection Unit (MPU) to define memory areas available for stack usage as a way to detect stack overflow. More information about the Armv8-M architecture can be found in the Armv8-M Architecture Technical Overview (reference 2).

It is possible to use both MSP and PSP even if an application does not use an RTOS. This allows fault exception handlers to execute even if there has been a stack overflow in the thread. If using both stack pointers, be careful to make sure that the main stack and process stack areas do not overlap.

7 - Summary

It is important to allocate enough memory space for stack operations. In this article we have covered features in a number of compilation tool chains that can help in determining the stack size requirements. We have also provided suggestions of how to determine the stack size required for whole applications, including the stack required for exception handling and scenarios when using an RTOS.

8 - References

The following documents are referenced in this article:

Ref	Document
1	Arm Architecture Procedure Call Standard (AAPCS) http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf
2	Armv8-M Architecture Technical Overview https://community.arm.com/docs/DOC-10896
3	Arm Compiler armcc user guide : --protect_stack option http://infocenter.arm.com/help/topic/com.arm.doc.dui0472l/chr1359124940593.html
4	gcc : -fstack-protector option https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Optimize-Options.html#Optimize-Options
5	Cortex Microcontroller Software Interface Standard (CMSIS-CORE) http://www.arm.com/cmsis http://www.keil.com/pack/doc/CMSIS/Core/html/index.html

9 - Appendix I

Instrumentation code for gcc for stack and heap size check.

This utility code has been tested on gcc Arm Embedded 5-2015-q4 release.

Assumptions

printf is supported
There is a symbol “end” as start of heap, which is the way newlib works
init_array and fini_array are supported, as in newlib
The main program does exit, instead of ending in a dead loop

To use it:

Use GCC to compile and link it with your application. After the application exits, the result is printed out as shown below:

Hello world

** TEST ENDED **

Heap used in bytes: 1720

Stack used in bytes: 288

Note:

The stack/heap size doesn’t include printf, if printf was not used in the tested application.
This code will increase execution time at startup and exit

Acknowledgement:

Many thanks to joeyye Ye in Arm gcc team for contributing this code.

Code:

/* This file is to calculate actual stack/heap size used in a user
* program.
*
* It hooks into init_array to initialize, and hijacks exit() to
* calculate stack/heap usage.
*
* To use it, just simply include it in the source file list or the
* Makefile. To measure stack size more accurately, please use -Os to
* build exit.c
*
* Pre-condition:
*   it uses I/O, so either semihosting or retargeting must be enabled.
*/
#include <stdio.h>
#include <stdlib.h>
#ifndef QEMU_STACK_BASE
// Qemu by default set stack base to 0x8000000 by semihosting
#define QEMU_STACK_BASE 0x8000000
#endif
extern int * __stack asm ("__stack");
#define STACK_BASE (&__stack)
#define MAGIC_WORD 0xbeabbeac
extern void * _sbrk(int);
extern char end asm ("end");
static void exit_check(int r)
{
    char * heap_start = &end;
    char * heap_end = (char *)_sbrk(0);
    if (heap_end == (char *)-1)
    {
        printf("Heap overflow\n");
    }
    else {
        unsigned heap_size = heap_end - heap_start;
        /* Word-align both ends of the region between heap end and stack base. */
        int * stack_limit = (int *)(((unsigned int)heap_end + 3) & ~3);
        int * stack_base = (int *)(((unsigned int)STACK_BASE + 3) & ~3);
        unsigned int max_stack_size = (stack_base - stack_limit) * 4;
        unsigned int stack_size;
        /* Skip the words that still hold the fill pattern (never written). */
        while (stack_limit < stack_base && *stack_limit == MAGIC_WORD)
            stack_limit++;
        stack_size = (stack_base - stack_limit) * 4;
        /* No fill pattern left at all: stack and heap have met. */
        if (stack_size == max_stack_size)
            printf("Stack/heap overflow\n");
        printf("\n");
        printf("Heap used in bytes:  %u\n", heap_size);
        printf("Stack used in bytes: %u\n", stack_size);
    }
}
static void * atexit_dummy = &atexit;
register int * sp asm ("sp");
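/* Fill the memory between the start of the heap and the current stack
 * pointer with MAGIC_WORD, so that exit_check() can find what was used. */
static void init_stack_check(void)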
{
    char * heap_start = &end;
    int * stack_limit = (int *)(((unsigned int)heap_start + 3) & ~3);
    int * stack_base = sp;
    int magic_word = MAGIC_WORD;
    while (stack_limit < stack_base) *stack_limit++ = magic_word;
}
// Be sure to have "used" attribute, otherwise lto will optimize it away!!!
static void * __attribute__((used, section(".init_array")))
init_stack_check_p=init_stack_check;
static void * __attribute__((used, section(".fini_array")))
exit_stack_check_p=exit_check;
