Decoding the Startup file for Arm Cortex-M4

January 5, 2015


Introduction

This is my attempt to understand the startup file for an Arm Cortex-M4 processor, specifically the STM32F4 microcontroller, which is built around a Cortex-M4 core. This document should give a feel for Arm assembly language and show how the Cortex-M4 processor starts. Familiarity with the Cortex-M4 architecture will help in following it.

More importantly, I am looking forward to expert comments and corrections that will help me fill the gaps in my knowledge.

I am not reproducing the startup code in full here to avoid clutter; please refer to the attached file, startup_stm32f40xx.s. This file is part of the STMicroelectronics software pack supplied with Keil MDK-Arm, which means it uses the Arm assembler (armasm) syntax and not the GNU assembler syntax.


Organization of the Startup code

The startup code has five parts:

Declaration of the Stack area
Declaration of the Heap area
Vector table
Reset handler code
Other exception handler code

Stack Area

The assembly code is usually divided into different sections by the AREA directive. Let's first look at how the stack area is declared.

Stack_Size     EQU     0x00000400

This line defines a constant called Stack_Size with the value 0x00000400 (1024). EQU is an assembler directive similar to the #define preprocessor directive in C: it gives a name to a value but does not allocate any memory.

AREA     STACK, NOINIT, READWRITE, ALIGN=3

Next comes the declaration of the stack area, using the AREA directive. This directive starts a separate section, which the linker places in memory as a unit. STACK is just the name of the section; it is followed by the attributes of the section.

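NOINIT indicates that the section contains no initial data: it is either left uninitialized or zero-initialized, and which of the two is decided at link time. The initial contents of a stack do not matter, so nothing needs to be stored for it in flash.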

READWRITE as the name implies, this section is allowed to be written to and read from.

ALIGN=3 aligns the start of this section on an 8-byte boundary (2^3 = 8).
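To make the ALIGN=n arithmetic concrete, here is a small C sketch of how a start address is rounded up to the next 2^n boundary. The function name align_up_pow2 and the sample addresses are my own, for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of 2^n, which is the placement
   that ALIGN=n on an AREA requests for the section's start address. */
static uint32_t align_up_pow2(uint32_t addr, unsigned n)
{
    assert(n < 32u);                        /* 2^n must fit in 32 bits */
    uint32_t boundary = (uint32_t)1u << n;
    return (addr + boundary - 1u) & ~(boundary - 1u);
}
```

For example, align_up_pow2(0x20000001, 3) gives 0x20000008, while an address that is already a multiple of 8 is left unchanged.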

Stack_Mem          SPACE          Stack_Size
__initial_sp

This line reserves 0x400 bytes (1 KB) for the stack, starting at the label Stack_Mem. SPACE is an assembler directive that reserves a zeroed block of memory of the specified size in bytes.

__initial_sp is a label that is later used in the vector table. It is placed right after the reserved block, so it equates to the first address after the stack space in this area. Since the stack grows downwards (on a push, the stack pointer is decremented first and the value is then stored), this address serves as the initial stack pointer.
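The stack declarations can be mimicked on a PC with a rough C model. It assumes a C11 compiler for _Alignas; STACK_SIZE, stack_mem, initial_sp and push_word are hypothetical stand-ins for Stack_Size, Stack_Mem, __initial_sp and the PUSH instruction, not code from the startup file:

```c
#include <assert.h>
#include <stdint.h>

/* Stack_Size EQU 0x00000400: like EQU, #define names a value, no memory. */
#define STACK_SIZE 0x00000400u

/* Stack_Mem SPACE Stack_Size in an AREA with ALIGN=3 (8-byte aligned). */
static _Alignas(8) uint32_t stack_mem[STACK_SIZE / sizeof(uint32_t)];

/* __initial_sp: the first address after the reserved block. */
static uint32_t *const initial_sp = stack_mem + STACK_SIZE / sizeof(uint32_t);

/* Full-descending push: decrement SP first, then store the value. */
static uint32_t *push_word(uint32_t *sp, uint32_t value)
{
    --sp;
    assert(sp >= stack_mem);   /* this model checks overflow; the core does not */
    *sp = value;
    return sp;
}
```

The first push therefore lands in the last word of stack_mem, never at initial_sp itself, which is why the label can point just past the reserved space.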

Vector Table

Ignore the heap section for now. Let's now look at the vector table.

The vector table is placed in a section called RESET, declared by this line:

AREA           RESET,     DATA,     READONLY

RESET is the name of the area. DATA indicates that this section will contain data and not instructions. This is true because the vector table contains only the addresses of the handlers and initial stack pointer value.

READONLY marks the section as read-only, which lets the linker place it in read-only memory such as flash.

This area is placed at the start of the on-chip flash memory, which begins at 0x08000000 on this device (refer to the memory map in the datasheet). The placement is specified in the linker options, either in a scatter file or on the linker command line. When the device boots from main flash, the flash is also aliased at address 0x00000000, so the vector table appears at offset 0 of the address space. Since the Vector Table Offset Register (VTOR) resets to 0, the processor uses this vector table at startup.
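As a sketch of the address arithmetic (assuming boot from main flash; the helper names are hypothetical), the core fetches the vector for exception number n from VTOR + 4 * n. With VTOR at its reset value of 0, the fetch goes through the flash alias at address 0; if software later sets VTOR to 0x08000000, the same table is found at its flash address:

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_BASE 0x08000000u   /* start of main flash on the STM32F4 */

/* Address the core reads a vector from. Entry 0 holds the initial SP;
   entry n (n >= 1) holds the vector for exception number n. */
static uint32_t vector_address(uint32_t vtor, uint32_t n)
{
    return vtor + 4u * n;
}

/* With BOOT0 = 0, addresses in the boot alias region at 0x00000000 map
   to main flash. Assumes addr lies within the aliased flash. */
static uint32_t boot_alias_to_flash(uint32_t addr)
{
    return FLASH_BASE + addr;
}
```

Either way the reset vector is read from the same physical flash word, 0x08000004.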

The vector table contains:

Initial value of the Stack Pointer
Starting address of the reset handler i.e. the code which will be executed on reset
Starting addresses of all other exceptions and interrupts including the NMI handler, Hard fault handler and so on.

DCD          __initial_sp

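This line places the value of the label __initial_sp as the first word of the RESET area. DCD is an assembler directive that allocates one or more 32-bit words of memory and sets their initial values; if necessary, it first pads to a 4-byte boundary.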
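A rough C model of what one DCD emits follows. emit_dcd is a hypothetical helper, the zero padding value is an assumption of this sketch, and the byte order is little-endian, which is what the STM32F4 uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates one "DCD word": pad to a 4-byte boundary (zeros in this sketch),
   then store the 32-bit value least significant byte first.
   Returns 0 on success, -1 if buf is too small. */
static int emit_dcd(uint8_t *buf, size_t cap, size_t *pos, uint32_t word)
{
    size_t aligned = (*pos + 3u) & ~(size_t)3u;
    if (aligned > cap || cap - aligned < 4u)
        return -1;
    while (*pos < aligned)
        buf[(*pos)++] = 0u;                          /* alignment padding */
    for (unsigned i = 0; i < 4u; ++i)
        buf[(*pos)++] = (uint8_t)(word >> (8u * i)); /* little-endian */
    return 0;
}
```

With a made-up __initial_sp of 0x20000400, the first four bytes of the table would be 00 04 00 20.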

DCD          Reset_Handler

Similarly, the next word stored is the address of Reset_Handler. This is a forward reference, because the label Reset_Handler is defined further down in the file. (The assembler processes the file in two passes, which lets it resolve such forward references. Because Reset_Handler is in a different area, its final address is filled in by the linker.)
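The two-pass idea can be illustrated with a toy model in C. Everything here (the item kinds, pass1, pass2, the 0x1000 base address) is hypothetical and far simpler than a real assembler; in particular it resolves symbols within a single area, whereas Reset_Handler lives in another area and is finally fixed up by the linker:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum kind { LABEL, DCD_SYMBOL, SPACE_BYTES };

struct item {
    enum kind kind;
    const char *name;   /* label defined (LABEL) or referenced (DCD_SYMBOL) */
    uint32_t size;      /* bytes reserved (SPACE_BYTES) */
};

struct symbol { const char *name; uint32_t addr; };

/* Pass 1: walk the source and record the address of every label. */
static int pass1(const struct item *src, int n, uint32_t base,
                 struct symbol *syms, int max_syms)
{
    int count = 0;
    uint32_t addr = base;
    for (int i = 0; i < n; ++i) {
        if (src[i].kind == LABEL) {
            if (count == max_syms) return -1;
            syms[count].name = src[i].name;
            syms[count].addr = addr;
            ++count;
        } else if (src[i].kind == DCD_SYMBOL) {
            addr += 4u;
        } else {
            addr += src[i].size;
        }
    }
    return count;
}

static int lookup(const struct symbol *syms, int count, const char *name,
                  uint32_t *out)
{
    for (int i = 0; i < count; ++i)
        if (strcmp(syms[i].name, name) == 0) { *out = syms[i].addr; return 0; }
    return -1;
}

/* Pass 2: every label is now known, so forward references resolve too. */
static int pass2(const struct item *src, int n, const struct symbol *syms,
                 int count, uint32_t *words, int max_words)
{
    int w = 0;
    for (int i = 0; i < n; ++i) {
        if (src[i].kind != DCD_SYMBOL) continue;
        if (w == max_words || lookup(syms, count, src[i].name, &words[w]) != 0)
            return -1;
        ++w;
    }
    return w;
}
```

A single pass would fail on the first DCD, because Reset_Handler has not been seen yet when that word must be emitted.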

These are followed by the starting addresses of the other handlers, such as NMI_Handler, HardFault_Handler and so on. The entries up to and including SysTick_Handler are the Cortex-M4 processor's own system exceptions (a few of these slots are reserved and hold 0). After that, the table continues with the external interrupts. Here 'external' means external to the Arm processor core, not to the STM32 MCU: these interrupts are connected to the MCU's peripherals, such as the watchdog, DMA, RTC and so on. The list continues up to FPU_IRQHandler (Floating-Point Unit interrupt).
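For reference, the numbering behind the table can be written as a C enum. The enum and helper names are my own; the exception numbers are those defined by the Armv7-M architecture. Entry n of the table holds the handler for exception number n and entry 0 holds the initial stack pointer, so the first external interrupt (WWDG on the STM32F4) occupies entry 16:

```c
#include <assert.h>

enum exception_number {
    EXC_RESET      = 1,
    EXC_NMI        = 2,
    EXC_HARDFAULT  = 3,
    EXC_MEMMANAGE  = 4,
    EXC_BUSFAULT   = 5,
    EXC_USAGEFAULT = 6,
    /* 7-10 reserved */
    EXC_SVCALL     = 11,
    EXC_DEBUGMON   = 12,
    /* 13 reserved */
    EXC_PENDSV     = 14,
    EXC_SYSTICK    = 15,
    EXC_IRQ0       = 16   /* first "external" interrupt */
};

/* Vector-table index of external interrupt irqn, using the device
   header's IRQ numbering (WWDG_IRQn = 0 on the STM32F4). */
static int irq_vector_index(int irqn)
{
    assert(irqn >= 0);
    return EXC_IRQ0 + irqn;
}
```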

The vector table, and especially its first two entries, is essential for the core to start executing a program and to use the stack (for example, for PUSH/POP instructions). When the Cortex-M4 comes out of reset, it first loads the first entry of the vector table into the stack pointer (the Main Stack Pointer, MSP). It then loads the second entry into the PC (Program Counter) and starts executing from that address. This is why the second entry holds the address of our Reset_Handler: it is the first code the processor executes.
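The reset sequence itself can be modelled on a PC. This sketch follows the Armv7-M behaviour: the low two bits of the initial SP are ignored, and bit 0 of the reset vector sets the Thumb state bit and is cleared before the branch. If that bit were 0, the core would fault as soon as it tried to execute the first instruction, which is why the linker sets it on the addresses of Thumb functions. struct core_state and take_reset are hypothetical names and the table values are made-up examples:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model; nothing here touches real hardware. */
struct core_state {
    uint32_t msp;   /* Main Stack Pointer */
    uint32_t pc;    /* Program Counter */
    int thumb;      /* EPSR.T */
};

static struct core_state take_reset(const uint32_t *vector_table)
{
    struct core_state s;
    s.msp   = vector_table[0] & ~3u;        /* word 0: initial MSP */
    s.pc    = vector_table[1] & ~1u;        /* word 1: reset vector, bit 0 cleared */
    s.thumb = (int)(vector_table[1] & 1u);  /* bit 0 must be 1 (Thumb state) */
    return s;
}
```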

Reset Handler

After the vector table, the actual code starts. It is contained in a CODE area:

AREA    |.text|, CODE, READONLY

This defines an area of memory containing code and marks it as read-only, so the linker can place it in flash. The name .text is a convention and could be anything you like. The vertical bars around the name are required because it does not start with a letter (it starts with a dot); this is a rule of the AREA directive.

In this area, the reset handler first calls a function called SystemInit, which configures the MCU's clock system, and then branches to __main, the C library entry point, which eventually calls your main() function. Control thus ends up in main().

IMPORT   SystemInit

tells the assembler that SystemInit is defined elsewhere in the project (in ST's pack it is in system_stm32f4xx.c); the linker resolves the reference.

IMPORT __main

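This line refers to __main in the C library. Among other things, __main copies the initial values of initialized variables from flash to RAM, zeroes the zero-initialized variables, initializes the C library and finally calls the main() function defined elsewhere in your project.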
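The data initialization part of that work can be sketched in C. This is a toy model, not the Arm C library's implementation: struct init_regions and init_data are hypothetical, and the "flash" and "RAM" are ordinary arrays:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct init_regions {
    const uint8_t *rw_load;  /* initial values of initialized variables, in "flash" */
    uint8_t *rw_exec;        /* where those variables live at run time, in "RAM" */
    size_t rw_size;
    uint8_t *zi_exec;        /* zero-initialized variables, in "RAM" */
    size_t zi_size;
};

static void init_data(const struct init_regions *r)
{
    memcpy(r->rw_exec, r->rw_load, r->rw_size);  /* copy RW data to RAM */
    memset(r->zi_exec, 0, r->zi_size);           /* clear ZI data */
}
```

Only after this step do global variables hold the values written in the C source, which is why main() must not run before __main.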

If you write the program entirely in assembly, without the C library's __main, you need to place an ENTRY directive in the reset handler. This lets the linker and debugger locate the entry point of the program.

LDR     R0, =SystemInit

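is a pseudo-instruction: the assembler stores the address of SystemInit in a nearby literal pool and generates a PC-relative LDR that loads it into R0. The following instruction, BLX R0, then branches to that address and saves the return address in LR, so SystemInit can return to the reset handler.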

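After SystemInit returns, the handler loads the address of __main the same way (LDR R0, =__main) and jumps to it with BX R0. BX is used instead of BLX because __main never returns, so there is no need to save a return address. __main then eventually calls main().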
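Taken together, the Reset_Handler logic roughly corresponds to the C sketch below, in which each LDR R0, =symbol becomes a function-pointer assignment. SystemInit, c_runtime_entry (standing in for __main) and app_main (standing in for main) are hypothetical stubs that only record the call order, so the flow can be checked on a PC:

```c
#include <assert.h>
#include <string.h>

static char call_log[64];   /* records the order of calls */

static void log_call(const char *name)
{
    strncat(call_log, name, sizeof call_log - strlen(call_log) - 1);
    strncat(call_log, ";", sizeof call_log - strlen(call_log) - 1);
}

/* Hypothetical stubs standing in for the real functions. */
static void SystemInit(void) { log_call("SystemInit"); }
static void app_main(void)   { log_call("main"); }
static void c_runtime_entry(void)   /* plays the role of __main */
{
    log_call("__main");
    app_main();
}

static void Reset_Handler(void)
{
    void (*r0)(void);

    r0 = SystemInit;        /* LDR R0, =SystemInit */
    r0();                   /* BLX R0: call, return address saved in LR */

    r0 = c_runtime_entry;   /* LDR R0, =__main */
    r0();                   /* BX R0 in the startup file is a plain branch,
                               since __main never returns; here it is an
                               ordinary call so the test can observe it */
}
```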

Exception Handlers

Once the code is running, exceptions may occur, so you need exception handlers. For example, look at the NMI handler:

NMI_Handler     PROC
                EXPORT     NMI_Handler     [WEAK]
                B     .
                ALIGN
                ENDP

The first line NMI_Handler is the label for this small function. PROC is an assembly directive which defines start of a procedure or a function.

The next line, EXPORT, makes the label NMI_Handler visible to other parts of the program. The [WEAK] attribute allows the handler to be redefined elsewhere in the project: if another file defines a non-weak NMI_Handler, the linker uses that definition instead. This lets you have your own custom handler in your project, and even different handlers for different projects, while keeping the same startup file. This is somewhat similar to overriding a virtual function in C++, except that the choice is made once, by the linker, at build time.
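For comparison, GCC and Clang offer the same mechanism in C through the weak attribute (a compiler extension, so this sketch assumes one of those compilers). The counter is only there so the default can be observed on a PC; the real default handler loops forever:

```c
#include <assert.h>

static int default_nmi_calls;   /* lets the sketch observe which handler ran */

/* Weak default, like EXPORT NMI_Handler [WEAK]. If another file in the
   program defines a non-weak NMI_Handler, the linker picks that one and
   this definition is not used. */
__attribute__((weak)) void NMI_Handler(void)
{
    ++default_nmi_calls;   /* the real default is an endless loop (B .) */
}
```

To override it, you would define void NMI_Handler(void) without the attribute in any other source file of the project; this file stays unchanged.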

Of course if you want to have the same handler for all your projects, then this startup file can be modified to call your own function from here or add your code here itself.

By default, each handler is just an endless loop created by the instruction B . (branch to the current address), so the processor stays in the handler forever.

ENDP denotes end of the procedure.

ALIGN is an assembler directive that aligns the current location to the next word (4-byte) boundary. If the current location is not already on that boundary, padding (NOP instructions or zero data) is inserted. ALIGN can also align to other boundaries and can pad with specified data instead of NOPs or zeros.

The same pattern is repeated for each of the other processor exception handlers (HardFault_Handler, SysTick_Handler and so on).

For the external interrupts, the startup file defines only one procedure, Default_Handler (the same endless loop). All the external interrupt handler labels are defined at the same address as Default_Handler, so any peripheral interrupt that you have not written a handler for ends up executing Default_Handler. Again, all these labels are exported as weak, so you can redefine any of them in your project.
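In C, GCC and Clang can express the same idea with weak aliases. This sketch uses their weak and alias attributes, which work on ELF targets but not on macOS's Mach-O; the two handler names are taken from the STM32F4 vector table, and the counter is again only for observation:

```c
#include <assert.h>

static int default_handler_calls;

/* One shared default for every peripheral interrupt. */
void Default_Handler(void)
{
    ++default_handler_calls;   /* the real one is an endless loop */
}

/* Each handler name is a weak alias: it resolves to Default_Handler
   unless a non-weak definition exists elsewhere in the program. */
void WWDG_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));
void DMA1_Stream0_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));
```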

Note that even Reset_Handler is exported as weak, so you can provide your own reset handler if you wish.

Heap Area

The heap section is defined in the same way as the stack area. The two labels __heap_base and __heap_limit mark the start and the end of the heap area respectively. If the Arm Microlib is used, the labels for the initial stack pointer and for the start and end of the heap area are simply exported. Otherwise, the file instead provides a __user_initial_stackheap function, which passes these bounds to the standard C library. I have yet to explore this in depth, so I will add more details later.
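To picture what the two labels delimit, here is a deliberately simple bump allocator in C that hands out blocks between a base and a limit. The names are hypothetical stand-ins for __heap_base and __heap_limit, and this is not how Microlib or the standard library implement malloc; it only shows the role of the region:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* heap_area's start plays __heap_base, heap_limit plays __heap_limit. */
static _Alignas(8) uint8_t heap_area[256];
static uint8_t *heap_next = heap_area;
static uint8_t *const heap_limit = heap_area + sizeof heap_area;

/* Returns 8-byte-aligned blocks between base and limit, never frees,
   and returns NULL once the region is exhausted. */
static void *bump_alloc(size_t size)
{
    size_t avail = (size_t)(heap_limit - heap_next);
    if (size > avail)
        return NULL;
    size_t rounded = (size + 7u) & ~(size_t)7u;   /* keep the next block aligned */
    if (rounded > avail)
        return NULL;
    void *p = heap_next;
    heap_next += rounded;
    return p;
}
```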

Miscellaneous

Two more directives in the startup file are worth mentioning.

PRESERVE8

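This directive tells the assembler and linker that the code in this file preserves 8-byte alignment of the stack. The linker uses this to check that code requiring 8-byte stack alignment is only called from code that preserves it. 8-byte stack alignment at public interfaces is a requirement of the Procedure Call Standard for the Arm Architecture (AAPCS).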
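One practical consequence, sketched in C with hypothetical helper names: each pushed register takes 4 bytes, so a function that starts with an 8-byte-aligned SP keeps it aligned only if it pushes an even number of registers. This is why compiled code often pushes an otherwise unneeded register (for example PUSH {r3, lr} instead of just PUSH {lr}):

```c
#include <assert.h>
#include <stdint.h>

/* AAPCS rule: at a public function call, SP must be a multiple of 8. */
static int sp_ok_for_call(uint32_t sp)
{
    return (sp & 7u) == 0u;
}

/* SP after pushing n_regs 32-bit registers on a full-descending stack. */
static uint32_t sp_after_push(uint32_t sp, unsigned n_regs)
{
    return sp - 4u * n_regs;
}
```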

THUMB

This directive tells the assembler to assemble the following code as Thumb instructions. Thumb is the only instruction set available on Cortex-M processors, which do not support the Arm instruction set.

I hope this information will be useful in understanding a bit of the processor and startup code.

Any comments and especially corrections are welcome.


startup_stm32f40xx.s.zip

Top Comments

harshan over 10 years ago

Nice Article very useful.
Gopal Amlekar over 10 years ago

Ah.. Didn't know about the label conventions. Yet at nascent stage in Arm assembly coding..
Jens Bauer over 10 years ago

The interrupt service routine will be invoked, yes.
The name of the label is correct, '1' (one). If you have a numeric label between 1 and 9, then you can branch backwards or forwards to the nearest label of that kind by supplying 'b' or 'f'.
For instance...
1:      subs    r2,r2,#1
        bpl     1b
1:      subs    r3,r3,#1
        bne     1b
1:      subs    r4,r4,#1
        bge     1b
... will be completely valid, though it looks very odd, compared to what you're used to.
Gopal Amlekar over 10 years ago

In this case, when an interrupt arrives, will it not serve that interrupt?
By the way, the label should be lb instead of l (OR the instruction should be b l )
Jens Bauer over 10 years ago

I think I wrote the above comment a bit too quickly.
The loop would look like this:
1: wfi
b 1b
-That will branch back to the wfi, so when we actually get an interrupt, we'll go back and wait for another one.