Overview of stack size requirement estimations in Cortex-M based applications
“How much stack memory do I need for this application?” This is a common question for many software developers working on applications that run on microcontroller devices. If the reserved stack size is insufficient, the stack can overflow into memory reserved for other data. As a result, a program could crash, produce incorrect results, or both. For systems that have security requirements, stack overflow can also result in security vulnerabilities.
In most microcontroller software development environments, the development tools require the stack size(s) to be defined by the software developers. So it is important for software developers to understand the stack size requirements of their applications and set up the stack sizes of those projects accordingly. The tools typically also allow a developer to define the size of the heap memory (heap memory is used for functions like “malloc()”), but this topic is not covered in this article.
Many factors can affect the stack size requirements, such as the compilation tools, the compilation options, and the choice of Real Time Operating System (RTOS), so there is no simple golden rule for determining stack size. Nevertheless, this article gives an overview of the stack size requirements in embedded applications for Arm Cortex-M based systems, and should help developers decide the stack size requirements of their microcontroller projects.
First let us have a quick overview of how stack operations work in Arm Cortex-M processors. In these processors, the stack operations are based on a “Full Descending” model, which means that each stack pointer points to the last filled stack location. When new data is pushed onto the stack, the corresponding stack pointer is decremented first, and the data is then stored into the new memory location pointed to by the stack pointer (Figure 1). As a result, the initial value for each stack pointer is set to the top of its stack space.
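The full descending behaviour described above can be illustrated with a small host-side model. This is only a sketch: the type `fd_stack`, the functions `fd_init`, `fd_push` and `fd_pop`, and the size `STACK_WORDS` are hypothetical names introduced for illustration and do not correspond to any Cortex-M register or CMSIS API.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_WORDS 8u  /* hypothetical stack size, in 32-bit words */

/* Hypothetical model of one stack region (not a real register set). */
typedef struct {
    uint32_t mem[STACK_WORDS]; /* index 0 is the lowest address */
    size_t   sp;               /* index of the last filled word;
                                  STACK_WORDS means the stack is empty */
} fd_stack;

/* The initial stack pointer is the top of the stack space. */
static void fd_init(fd_stack *s)
{
    s->sp = STACK_WORDS;
}

/* Push: decrement the stack pointer first, then store.
   Returns 0 on success, -1 if the push would overflow the region. */
static int fd_push(fd_stack *s, uint32_t value)
{
    if (s->sp == 0u) {
        return -1;
    }
    s->sp--;
    s->mem[s->sp] = value;
    return 0;
}

/* Pop: load from the last filled location, then increment.
   Returns 0 on success, -1 if the stack is empty. */
static int fd_pop(fd_stack *s, uint32_t *value)
{
    if (s->sp == STACK_WORDS) {
        return -1;
    }
    *value = s->mem[s->sp];
    s->sp++;
    return 0;
}
```

Real hardware does not perform the bounds check shown in `fd_push`; without a stack limit mechanism, a push beyond the reserved region simply overwrites whatever is stored below it.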
Figure 1: Full descending stack operation concept
There are two stack pointers in processors based on Armv6-M and Armv7-M architectures. In the latest Armv8-M architecture, the maximum number of stack pointers is increased to 4 (Table 1) when the optional Security extension is implemented. The Armv6-M architecture covers the Cortex-M0, Cortex-M0+ and Cortex-M1 processors, and Armv7-M architecture covers the Cortex-M3, Cortex-M4 and Cortex-M7 processors.
Table 1: Stack Pointers in Cortex-M Processors
In most simple applications without an RTOS, we can use the MSP for all operations. This means that PSP can be unused and ignored. In this case we only need to have one stack space in the application.
In applications with an RTOS, typically there is one stack memory area for the main stack, and a number of process stack areas, one for each application thread. The PSP is dynamically switched to each of these stack spaces when the OS switches between different threads.
The Main Stack Pointer (MSP) is initialized by the hardware reset sequence. The initial value is fetched from the first word in the vector table (which defaults to address 0x00000000 in Cortex-M0/M0+/M3/M4, and is configurable by chip designers in newer processors such as the Cortex-M7 processor). Typically, the vector table for booting is defined in a piece of startup code, and a tool chain specific arrangement allows software developers to define the size of the main stack, and hence the initial value of MSP, in the startup code. Startup code is usually supplied by microcontroller vendors or development suites, and most startup code is based on the CMSIS-CORE software framework. (Cortex Microcontroller Software Interface Standard, or “CMSIS”, is developed by Arm to help the interoperability and portability of software and tools.)
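As a rough illustration of the relationship between the reserved stack size and the first vector table entry, the sketch below models it on a host machine. The names `STACK_LIMIT`, `STACK_SIZE`, `initial_msp` and `msp_after_reset` are assumptions made for this example; real startup code obtains these values from tool chain specific symbols.

```c
#include <stdint.h>

/* Hypothetical values, chosen only for illustration. */
#define STACK_LIMIT 0x20000000u  /* lowest address of the main stack region */
#define STACK_SIZE  0x00000400u  /* 1 KB reserved for the main stack */

/* With a full descending stack, the initial MSP is the address
   immediately above the reserved stack region. */
static uint32_t initial_msp(uint32_t stack_limit, uint32_t stack_size)
{
    return stack_limit + stack_size;
}

/* Models the step of the reset sequence that loads MSP from
   word 0 of the vector table. */
static uint32_t msp_after_reset(const uint32_t *vector_table)
{
    return vector_table[0];
}
```

In this model, a vector table whose first word is `initial_msp(STACK_LIMIT, STACK_SIZE)` gives an MSP of 0x20000400 after reset, so changing the reserved stack size changes the first vector table entry.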
Depending on the tool chain you are using, you might find that the memory layout in the SRAM is based on one of the arrangements shown in Figure 2.
Figure 2: Typical stack layouts
In many software development tools, there can be a second step of stack pointer initialization before entering the main() application. This overrides the stack pointer value which was set in the vector table, even though the new value can be identical to the one in the vector table in many cases. Such an arrangement is useful for applications that place the stack in external memory, where the external memory interface needs to be configured before it is used. With this arrangement, the initial stack (within the vector table) can be set to an internal SRAM area to allow the reset handler to operate and initialize the external memory interface, before the external memory is used for stack storage within the main application code.
Most modern compilation tools can generate stack usage reports:
Keil MDK uses the Arm Compiler 5 (and in future releases Arm Compiler 6), which is the same compiler engine used in Arm Development Studio 5 (DS-5). The tool chain supports the following linker options:
The Keil MDK environment utilizes the stack reporting feature by default. After a project is compiled, an HTML file is generated and you can locate the stack usage information from there (Table 2):
…
main (Thumb, 196 bytes, Stack size 16 bytes, char_lcd.o(.text)) [Stack]
Table 2: Example stack usage report from Keil MDK 5
You can also see the stack usage for each function in this file.
The Arm Compiler also has two compiler options (reference 3) which can be useful:
These options create a guard variable at the bottom of the stack and set it to a value (specified by void *__stack_chk_guard). At the end of the function a stack check is carried out, and if a stack overflow is detected, a callback function (void __stack_chk_fail(void)) is executed.
A compilation option called “-fstack-usage” is available to enable stack reports.
This enables the stack usage information to be generated on a per function basis (you still need to trace the call tree to find the maximum stack usage).
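Tracing the call tree by hand amounts to finding the deepest path through it. The sketch below shows one way to do that once the per-function figures are known. The `func_node` type, the `max_stack_depth` function and the limit `MAX_CALLEES` are hypothetical names for this example, and the sketch assumes the call tree contains no recursion and no calls through function pointers.

```c
#include <stddef.h>

#define MAX_CALLEES 4  /* hypothetical limit, for illustration only */

/* Hypothetical record: stack bytes used by one function (for example,
   taken from a per-function usage report) and the functions it calls. */
typedef struct func_node {
    const char *name;
    unsigned stack_bytes;
    const struct func_node *callees[MAX_CALLEES]; /* NULL marks the end */
} func_node;

/* Worst-case stack depth of f and everything it calls.
   Assumes the call tree is acyclic (no recursion). */
static unsigned max_stack_depth(const func_node *f)
{
    unsigned deepest = 0;
    for (size_t i = 0; i < MAX_CALLEES && f->callees[i] != NULL; i++) {
        unsigned d = max_stack_depth(f->callees[i]);
        if (d > deepest) {
            deepest = d;
        }
    }
    /* Only the deepest callee matters, because sibling calls
       reuse the same stack space one after another. */
    return f->stack_bytes + deepest;
}
```

Note that the result covers the call tree only; stack space for exception handling still has to be added separately.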
A stack protection feature is available on gcc using the following options (reference 4):
When these options are used, the guards are inserted to the bottom of the stack and initialized when a function is entered and then checked when the function exits.
IAR Embedded Workbench for Arm (EWARM) provides a stack size report in the linker map file. To enable this, the following project settings are required:
The report shows the maximum stack usage for the call tree.
It is important to understand that stack usage reports from development tools only cover the stack usage for each function or a call tree. They do not include additional stack spaces needed by exception handlers.
In addition, software developers might find that there are many cases where the reports are unable to provide information about the stack requirements for some parts of the applications. There are many reasons why stack usage report generation does not work for some code:
In these cases, you might have to calculate the maximum stack usage for these functions manually, or estimate it by trial. For example, you can use a debugger to fill the stack memory space with a known data pattern before running a program, then execute your code, and examine the stack memory space to see how much of it has been modified by the program execution.
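The fill-and-inspect approach can also be done in software rather than with a debugger. The following is a minimal sketch, assuming the stack region can be addressed as an array of 32-bit words; `FILL_PATTERN`, `stack_fill` and `stack_bytes_used` are hypothetical names for this example.

```c
#include <stdint.h>
#include <stddef.h>

#define FILL_PATTERN 0xDEADBEEFu  /* arbitrary, unlikely data value */

/* Fill the whole stack region with the pattern before the code under
   test runs. stack_limit is the lowest address of the region. */
static void stack_fill(uint32_t *stack_limit, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack_limit[i] = FILL_PATTERN;
    }
}

/* Scan upward from the lowest address. Words that still hold the
   pattern were never written; everything above the first modified
   word is counted as used, because the stack grows downward. */
static size_t stack_bytes_used(const uint32_t *stack_limit, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack_limit[untouched] == FILL_PATTERN) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

This measures only the paths actually executed during the test run, and it can under-report if the deepest stack locations were reserved but never written, or if the program happens to store the pattern value itself. It is usually sensible to add some margin to the measured figure.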
It is also possible to handle the stack estimation by adding instrumentation code to your project. For example, Appendix I shows stack check utility code for gcc Arm Embedded (with NewLib).
In simple applications that only use the main stack (without any use of an RTOS, and without separating stack areas for thread and handler modes), the maximum stack size can be calculated by adding:
The maximum stack size required in Thread mode can be obtained by stack usage reports or by trials. The maximum stack size required by the exception handling is more complicated because:
Assuming that an application only utilizes two interrupt priority levels for peripheral interrupts, there could be 4 levels of nested exceptions due to potential occurrence of the HardFault exception and NMI exception (if utilized by the application), as shown in Figure 3.
Figure 3: Worst case stack size requirement when considering maximum number of nested exceptions
In such cases, the maximum stack size required can be calculated by adding:
In some cases it can be tricky to handle the calculation: if exception handler X (using the FPU) uses less stack space than handler Y (not using the FPU) and both are at the same exception priority level, the total stack size when combining the stack usage for handler X and the stack frame for the next level of higher priority exception can be higher than the equivalent for handler Y. Therefore, we can use a trick to simplify the calculation flow: adding the increase of stack frame size for FPU registers stacking into the function that uses the FPU.
We can then devise a stack size calculation flow chart as shown in Figure 4.
Figure 4: Stack calculation flow for bare metal applications (Armv6-M, Armv7-M and Non-Secure software in Armv8-M)
If an application utilizes a large number of priority levels, potentially this can significantly increase the stack size requirement. Therefore a software developer might want to investigate reducing the number of exception priority levels used to reduce the stack size usage.
The method shown in Figure 4 produces a worst-case estimate. Note that this worst case may not be reachable in actual operation: for example, certain exceptions (e.g. SVCall) may never occur when executing certain application code. Further adjustment might be needed to relax the stack size requirement based on the actual application requirements.
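The worst-case arithmetic for a bare metal application can be sketched as a small calculation. This sketch assumes Armv7-M frame sizes as I understand them: a basic exception stack frame of 8 words (32 bytes), 18 extra words (72 bytes) when floating point context is stacked, and up to 4 bytes of padding to keep the frame 8-byte aligned. Please check these against the architecture reference manual for your processor. The names `level_usage`, `adjusted_usage` and `worst_case_main_stack` are hypothetical.

```c
#include <stddef.h>

/* Assumed Armv7-M values; verify for your processor. */
#define BASIC_FRAME_BYTES   32u  /* R0-R3, R12, LR, return address, xPSR */
#define FPU_FRAME_EXTRA     72u  /* S0-S15, FPSCR and one reserved word */
#define ALIGN_PADDING_BYTES  4u  /* possible padding for 8-byte alignment */

/* Hypothetical record for the code running at one priority level:
   Thread mode, or the most demanding handler at an exception level. */
typedef struct {
    unsigned usage_bytes; /* worst-case stack used by the code itself */
    int uses_fpu;         /* non-zero if the code uses the FPU */
} level_usage;

/* Charge the FPU part of the stack frame to the code that uses the
   FPU, as suggested in the text, so every frame has the same size. */
static unsigned adjusted_usage(level_usage u)
{
    return u.usage_bytes + (u.uses_fpu ? FPU_FRAME_EXTRA : 0u);
}

/* Thread mode usage plus, for each exception priority level that can
   nest, one stack frame and the handler usage at that level. */
static unsigned worst_case_main_stack(level_usage thread,
                                      const level_usage *levels,
                                      size_t num_levels)
{
    unsigned total = adjusted_usage(thread);
    for (size_t i = 0; i < num_levels; i++) {
        total += BASIC_FRAME_BYTES + ALIGN_PADDING_BYTES;
        total += adjusted_usage(levels[i]);
    }
    return total;
}
```

When several handlers share a priority level, pass the one with the largest adjusted usage for that level. With four nested levels, four stack frames are counted, which shows why reducing the number of priority levels reduces the requirement.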
In the Armv8-M architecture, the exception stack frames used when running Secure software are larger than the stack frames for the Non-Secure state, because the processor needs to reserve enough stack space for all registers in the event of a Non-Secure interrupt occurring.
Table 3: Size of exception stack frames
In addition, since the system can use two separate stacks (Secure and Non-Secure), the stack size calculation flow for Secure software in the Armv8-M architecture (Figure 5) differs from the flow in Figure 4 because of the different stack frame sizes. In some cases, MCU software developers might not have visibility of the Secure state software, so some of the steps are optional (it might be impossible to obtain stack size requirements for Secure state software).
Figure 5: Stack calculation flow for bare metal applications (software in Armv8-M with Secure state)
Again, this flow provides a pessimistic estimation of stack usage in scenarios which might be impossible to reach in real world applications. The flow illustrated in Figure 5 also assumes that exception handlers do not have function calls across the security domains (i.e. use only one stack). If there are function calls across domains in these handlers and if the stack usage information for Secure software is available, it might be better to go through the estimation flow twice: once for scenarios with worst cases for Non-Secure stack usage, and the other one for worst cases in Secure stack usage.
If an RTOS is used in an application, we need to reserve stack space for the main stack as well as process stack space for each of the threads.
In most cases, the RTOS vendor should provide an estimate of the main stack requirements. However, the stack size requirement could depend on the OS features being used, so you should contact the RTOS vendor if you need additional information in this area. You must also include additional main stack space for exception handlers. The calculation flow shown in Figure 4 could be used for the main stack size estimation.
Most RTOSes for Arm Cortex-M processors use the PSP when running application threads. In this case, for each of the application threads the required stack size comprises the following stack contents (Figure 6):
Figure 6: Stack size needed by an application thread/task in a typical RTOS
Please note that each RTOS may have additional process stack size requirements, such as a minimum size and a granularity of size steps, and these can differ between RTOSes. An RTOS for Armv6-M and Armv7-M architectures that uses the MPU for memory protection might also have additional programming steps for the configuration of the MPU.
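A per-thread estimate that respects a minimum size and a size granularity could be sketched as follows. The breakdown used here is an assumption for illustration: the thread's own worst-case usage, one exception stack frame, and the extra register context that a typical context switch saves on the thread stack (R4-R11, plus S16-S31 when the FPU is used). The frame sizes are the Armv7-M values as I understand them, and the names `round_up` and `thread_stack_bytes` are hypothetical. Your RTOS documentation is the authority on what it actually stores on each thread stack.

```c
/* Assumed Armv7-M values; verify for your processor and RTOS. */
#define FRAME_BYTES        36u  /* 32-byte basic frame + 4 bytes padding */
#define FRAME_FPU_EXTRA    72u  /* S0-S15, FPSCR and one reserved word */
#define CONTEXT_BYTES      32u  /* R4-R11 saved by the context switch */
#define CONTEXT_FPU_EXTRA  64u  /* S16-S31 saved by the context switch */

/* Round value up to the next multiple of step (step must be non-zero). */
static unsigned round_up(unsigned value, unsigned step)
{
    return ((value + step - 1u) / step) * step;
}

/* Estimated process stack size for one thread, in bytes. */
static unsigned thread_stack_bytes(unsigned thread_usage, int uses_fpu,
                                   unsigned min_bytes, unsigned granularity)
{
    unsigned total = thread_usage + FRAME_BYTES + CONTEXT_BYTES;
    if (uses_fpu) {
        total += FRAME_FPU_EXTRA + CONTEXT_FPU_EXTRA;
    }
    if (total < min_bytes) {
        total = min_bytes;  /* apply the RTOS minimum stack size */
    }
    return round_up(total, granularity);
}
```

For example, a thread that uses 300 bytes without the FPU needs 368 bytes under these assumptions, which becomes 384 bytes if the RTOS allocates stacks in 64-byte steps.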
In the less common case where the RTOS uses the MSP when running application threads, you should contact the RTOS vendor for information regarding stack requirements. The stack area for each thread might need to provide enough space for nested interrupt handling, as covered in Figure 3.
A number of additional hardware and software features are available to help stack overflow detection or stack management:
Figure 7: Stack usage watermarking feature in Keil MDK
More information on this feature can be found on the Keil website.
It is important to allocate enough memory space for stack operations. In this article we have covered features in a number of compilation tool chains that can help in determining the stack size requirements. We have also provided suggestions of how to determine the stack size required for whole applications, including the stack required for exception handling and scenarios when using an RTOS.
The following documents are referenced in this article:
1. Arm Architecture Procedure Call Standard (AAPCS), http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf
2. Armv8-M Architecture Technical Overview, https://community.arm.com/docs/DOC-10896
3. Arm Compiler armcc user guide: --protect_stack option, http://infocenter.arm.com/help/topic/com.arm.doc.dui0472l/chr1359124940593.html
4. gcc: -fstack-protector option, https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Optimize-Options.html#Optimize-Options
5. Cortex Microcontroller Software Interface Standard (CMSIS-CORE), http://www.arm.com/cmsis and http://www.keil.com/pack/doc/CMSIS/Core/html/index.html
Instrumentation code for gcc for stack and heap size check.
This utility code has been tested on gcc Arm Embedded 5-2015-q4 release.
Assumptions
To use it:
Hello world
** TEST ENDED **
Heap used in bytes: 1720
Stack used in bytes: 288
Note:
Acknowledgement:
Many thanks to joeyye Ye in Arm gcc team for contributing this code.
Code:
/* This file is to calculate actual stack/heap size used in a user
 * program.
 *
 * It hooks into init_array to initialize, and hijacks exit() to
 * calculate stack/heap usage.
 *
 * To use it, just simply include it in the source file list or the
 * Makefile. To measure stack size more accurately, please use -Os to
 * build exit.c
 *
 * Pre-condition:
 * it use IO so either semihosting or retarget must be enabled.
 */
#include <stdio.h>
#include <stdlib.h>

#ifndef QEMU_STACK_BASE
// Qemu by default set stack base to 0x8000000 by semihosting
#define QEMU_STACK_BASE 0x8000000
#endif

extern int * __stack asm ("__stack");
#define STACK_BASE (&__stack)
#define MAGIC_WORD 0xbeabbeac

extern void * _sbrk(int);
extern char end asm ("end");

static void exit_check(int r)
{
  char * heap_start = &end;
  char * heap_end = (char *)_sbrk(0);

  if (heap_end == (char *)-1)
    {
      printf("Heap overflow\n");
    }
  else
    {
      unsigned heap_size = heap_end - heap_start;
      int * stack_limit = (int *)(((unsigned int)heap_end + 3) & ~3);
      int * stack_base = (int *)(((unsigned int)STACK_BASE + 3) & ~3);
      unsigned int max_stack_size = (stack_base - stack_limit) * 4;
      unsigned int stack_size;

      while (stack_limit < stack_base && *stack_limit == MAGIC_WORD)
        stack_limit++;
      stack_size = (stack_base - stack_limit) * 4;

      if (stack_size == max_stack_size)
        printf("Stack/heap overflow\n");

      printf("\n");
      printf("Heap used in bytes: %d\n", heap_size);
      printf("Stack used in bytes: %d\n", stack_size);
    }
}

static void * atexit_dummy = &atexit;

register int * sp asm ("sp");

static void init_stack_check()
{
  char * heap_start = &end;
  int * stack_limit = (int *)(((unsigned int)heap_start + 3) & ~3);
  int * stack_base = sp;
  int magic_word = MAGIC_WORD;

  while (stack_limit < stack_base)
    *stack_limit++ = magic_word;
}

// Be sure to have "used" attribute, otherwise lto will optimize it away!!!
static void * __attribute__((used, section(".init_array"))) init_stack_check_p=init_stack_check;
static void * __attribute__((used, section(".fini_array"))) exit_stack_check_p=exit_check;
Hello. I'm a little confused about the Stack Size, Thumb and Max Depth values shown in the <Static Call Chain> file. Where can I find out exactly what these terms mean? I understand that Max Depth is the stack needed by the longest call chain. Does that mean the stack needed for this function is exactly equal to Max Depth? What about 'Stack Size': is that the stack needed by the current function itself? If so, what is 'Thumb'? Does it refer to the Thumb instruction set? Could I say that Max Depth = Thumb + Stack Size + some other unknown exception?