Arm Community
Architectures and Processors blog

Decoding the Startup file for Arm Cortex-M4

Gopal Amlekar
January 5, 2015
7 minute read time.

Introduction

This is my attempt to understand the startup file for an Arm Cortex-M4 processor, specifically the one in the STM32F4 microcontroller. This document should give a feel for Arm assembly language and explain how the Cortex-M4 processor starts. Familiarity with the Cortex-M4 architecture is needed to follow it.

More importantly, I am looking forward to expert comments and corrections, which will help me fill in the gaps in my knowledge.

I am not reproducing the startup code entirely here, to avoid clutter. Please refer to the uploaded file. This file is part of the STMicroelectronics software pack that comes with Keil MDK-Arm, which means it uses the Arm assembler and not the GNU assembler.

Please ignore the line numbers appearing in the code snippets mentioned below. They do not correspond to the line numbers in the startup file.

Organization of the Startup code

The startup code has five parts.

  1. Declaration of the Stack area
  2. Declaration of the Heap area
  3. Vector table
  4. Reset handler code
  5. Other exception handler code

Stack Area

The assembly code is usually divided into different sections by the AREA directive. Let's first look at how the stack area is declared.

Stack_Size     EQU     0x00000400



This line declares a constant called Stack_Size with the value 0x00000400. EQU is an assembler directive that is similar to the #define preprocessor directive in C.

AREA     STACK, NOINIT, READWRITE, ALIGN=3



Next comes the declaration of the area for the stack. This is done with the assembler directive AREA, which denotes a separate section in memory. STACK in this case is just the name of the section. The name is followed by some attributes for this section.

NOINIT indicates that the data in this section is uninitialized or initialized to zero, so the section contains only space reservations (such as the SPACE directive used below) and no initial data.

READWRITE, as the name implies, allows this section to be read from and written to.

ALIGN=3 aligns the start of this section on an 8-byte boundary (2^3 = 8).

Stack_Mem          SPACE          Stack_Size



This line allocates 0x0400 bytes (1 KB) in the stack area. SPACE is an assembler directive that reserves a zeroed block of memory of the specified size.

The line that follows, __initial_sp, declares a label which is later used in the vector table. This label equates to the next address after the stack space in this area. Since the stack grows downwards, this serves as the initial stack pointer.
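
Putting the lines above together, the stack declaration can be sketched roughly as follows. This is a minimal sketch assembled from the snippets discussed in this section; the comments are added for explanation, and the layout in the uploaded file may differ slightly.

```armasm
Stack_Size      EQU     0x00000400          ; stack size in bytes (1 KB)

                AREA    STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem       SPACE   Stack_Size          ; reserve Stack_Size bytes
__initial_sp                                ; label at the address just past Stack_Mem
```

As a worked example, if the linker placed Stack_Mem at 0x20000000 (a hypothetical address chosen only for illustration), __initial_sp would be 0x20000000 + 0x400 = 0x20000400, and the first push would store data just below that address.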

Vector Table

Ignore the heap section for now. Let's now look at the vector table.

The vector table is in a section called RESET. The section is declared by the line:

AREA           RESET,     DATA,     READONLY



RESET is the name of the area. DATA indicates that this section will contain data and not instructions. This is true because the vector table contains only the addresses of the handlers and initial stack pointer value.

READONLY, as the name indicates, marks this area as one that must not be written to, which lets the linker place it in read-only memory such as flash.

This area is placed at the start of the code region of the flash memory, which is at 0x08000000 for this particular device. (Refer to the memory map of the MCU in the datasheet.) This address is specified in the linker options, either in a scatter file or on the linker command line. When the device boots from main flash, which is the usual boot configuration, this flash region is also mapped at address 0x00000000, so the vector table appears at address 0. Since the vector table offset register (VTOR) defaults to 0, the processor uses this vector table at startup.

The vector table contains:

  • Initial value of the Stack Pointer
  • Starting address of the reset handler i.e. the code which will be executed on reset
  • Starting addresses of all other exceptions and interrupts including the NMI handler, Hard fault handler and so on.

DCD          __initial_sp



This line stores the value of the label __initial_sp in the RESET area. DCD is an assembler directive that stores a 32-bit word in memory.

DCD          Reset_Handler



Similarly, the next word stored is the address of Reset_Handler. This is a forward reference because the label Reset_Handler is declared further down in the code. (The assembler processes the file in two passes, which helps it resolve such forward references.)

Following these are the labels that give the starting addresses of the various handlers such as NMI_Handler, HardFault_Handler and so on. The entries up to SysTick_Handler are the Arm processor's own exceptions. After that, the table continues with the external interrupts. Here 'external' means external to the Arm processor core, not external to the STM32 MCU. These interrupts are connected to various peripherals in the MCU such as the watchdog, DMA, RTC and so on. The list continues up to FPU_IRQHandler (Floating Point Unit IRQ).

The vector table, and especially its first two entries, is essential for the core to start executing a program and to handle PUSH/POP instructions, which need a valid stack pointer. This is because when the Cortex-M4 starts, it first copies the first entry in the vector table to the stack pointer (the Main Stack Pointer, or MSP). Next, it copies the second entry into the PC (program counter), and execution starts from this address. So we specify the address of our reset handler there, which is the first code the processor will execute.
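
The beginning of the vector table can be sketched as below. This is only an illustrative excerpt: the label __Vectors is an assumption based on common Keil startup files, the comments are added for explanation, and the complete list of entries should be taken from the uploaded file.

```armasm
                AREA    RESET, DATA, READONLY

__Vectors       DCD     __initial_sp        ; offset 0x00: initial value of MSP
                DCD     Reset_Handler       ; offset 0x04: loaded into PC at reset
                DCD     NMI_Handler         ; offset 0x08: NMI
                DCD     HardFault_Handler   ; offset 0x0C: hard fault
                ; ... remaining processor exceptions up to SysTick_Handler,
                ; ... then the external interrupts up to FPU_IRQHandler
```

Each DCD occupies four bytes, which is why the offsets in the comments increase in steps of 4.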

Reset Handler

After defining the vector table, actual code starts. This is contained in a CODE region.

AREA    |.text|, CODE, READONLY



This defines an area of memory that contains code and is marked as read-only so that it is not overwritten by the program itself. The name of the section is .text by convention but could be anything you wish. The vertical bars around the name are necessary because the name does not start with an alphabetic character. This is a requirement of the AREA directive.

In this region, the code first calls a function called SystemInit, which configures the clocks of the MCU, and then calls the main() function. Thus control is transferred to main().

IMPORT   SystemInit



This line refers to the function SystemInit, which is defined elsewhere in the project.

IMPORT __main



This line refers to __main in the C library, which sets up the C runtime environment and eventually calls the main() function defined elsewhere in your project.

If you are using plain assembly, you will need to place an ENTRY directive in the reset handler in the absence of __main. This allows the linker and the debugger to locate the entry point of the program.

LDR     R0, =SystemInit



This is a pseudo-instruction that loads the address of the SystemInit function into R0. The following instruction, BLX     R0, then branches to that address and saves the return address in the link register (LR).

Similarly, after control returns from SystemInit, the code branches to __main, which in turn calls the main() function.
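
The steps described above can be sketched as one procedure. This is a sketch under assumptions rather than a copy of the uploaded file: in particular, the use of BX (rather than BLX) for the final branch follows common Keil startup files and should be checked against the actual source.

```armasm
Reset_Handler   PROC
                EXPORT  Reset_Handler       [WEAK]
                IMPORT  SystemInit
                IMPORT  __main

                LDR     R0, =SystemInit     ; R0 = address of SystemInit
                BLX     R0                  ; call SystemInit; return address saved in LR
                LDR     R0, =__main         ; R0 = address of the C library entry point
                BX      R0                  ; branch to __main (assumed not to return here)
                ENDP
```

A plain branch is enough for the last step because nothing in the reset handler needs to run after __main has taken over.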

Exception Handlers

Once the code starts executing, exceptions may occur, and therefore you need exception handlers. For example, look at the NMI handler.

NMI_Handler     PROC
                EXPORT     NMI_Handler     [WEAK]
                B     .
                ALIGN
                ENDP



The first line NMI_Handler is the label for this small function. PROC is an assembly directive which defines start of a procedure or a function.

The next line, EXPORT, makes the label NMI_Handler available to other parts of the program. The attribute [WEAK] is added so that the handler can be redefined elsewhere in the project: if another definition of the same symbol exists, the linker uses that one instead of this one. This lets you have your own custom handler in your project, and even different handlers for different projects, while keeping the same startup file. This is somewhat similar to overriding virtual functions in C++.

Of course if you want to have the same handler for all your projects, then this startup file can be modified to call your own function from here or add your code here itself.

By default, the handlers are defined only as an endless loop using the instruction B . (branch to dot). This instruction branches to its own address, resulting in an infinite loop.

ENDP denotes end of the procedure.

ALIGN is an assembler directive which aligns the current memory location to the next word boundary. NOP instructions (or zero data) are inserted to achieve this if the current location is not already on the boundary. It can also be used to align to different boundaries, and even to pad with specified data instead of just NOPs or zeros.
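
A small data example may make the effect of ALIGN easier to see. This is a hedged sketch, not part of the startup file: the section name AlignDemo and the labels First, Second and Third are hypothetical. The section is declared with ALIGN=3 because, as far as I understand the assembler documentation, the section itself must be aligned at least as coarsely as the largest boundary requested inside it.

```armasm
                AREA    AlignDemo, DATA, READONLY, ALIGN=3  ; hypothetical section
First           DCB     1       ; one byte: the location is now off the word boundary
                ALIGN           ; pad up to the next 4-byte (word) boundary
Second          DCB     2       ; placed at the start of the next word
                ALIGN   8       ; pad up to the next 8-byte boundary
Third           DCB     3       ; placed on an 8-byte boundary
```

With the section starting on an 8-byte boundary, First lands at offset 0, Second at offset 4 and Third at offset 8.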

The same handler pattern is used for all the processor exceptions.

For the external interrupt handlers, the startup file defines only one procedure (the same endless loop), Default_Handler. All the external interrupt handler labels are defined at the same address as this Default_Handler. This means that for any interrupt coming from the MCU peripherals, the code will execute this Default_Handler. Again, all of these labels are exported as weak so you can redefine them in your project.
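
The shared default handler can be sketched as follows. Only two interrupt labels are shown; FPU_IRQHandler is the last entry mentioned earlier, while WWDG_IRQHandler is used here as an illustrative name for the watchdog interrupt and should be checked against the uploaded file.

```armasm
Default_Handler PROC
                EXPORT  WWDG_IRQHandler     [WEAK]
                EXPORT  FPU_IRQHandler      [WEAK]
                ; ... one EXPORT per external interrupt

WWDG_IRQHandler
FPU_IRQHandler
                ; ... one label per external interrupt, all at the same address
                B       .                   ; endless loop shared by all of them
                ENDP
```

Because every label sits at the same address, an interrupt without a custom handler in your project ends up in this single loop.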

Note that the Reset_Handler is also exported as weak, so you can have your own reset handler if you wish.

Heap Area

The heap section is defined similarly to the stack area. The two labels __heap_base and __heap_limit indicate the start and the end of the heap area respectively. If you are using the Arm MicroLIB, the labels for the initial stack pointer and for the start and end of the heap area are simply exported. Otherwise, they need to be handled differently. I have yet to explore this in depth, so I will add more details later.
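
By analogy with the stack area, the heap declaration and the MicroLIB exports might look like the sketch below. The names Heap_Size, Heap_Mem and HEAP and the value 0x00000200 are assumptions made for illustration, and the branch for the non-MicroLIB case is deliberately left out.

```armasm
Heap_Size       EQU     0x00000200          ; assumed heap size, for illustration only

                AREA    HEAP, NOINIT, READWRITE, ALIGN=3
__heap_base                                 ; label at the start of the heap
Heap_Mem        SPACE   Heap_Size           ; reserve Heap_Size bytes
__heap_limit                                ; label just past the end of the heap

                IF      :DEF:__MICROLIB     ; only when building with MicroLIB
                EXPORT  __initial_sp
                EXPORT  __heap_base
                EXPORT  __heap_limit
                ENDIF
```

Note that, unlike the stack, the heap has a label on both sides of the reserved space, since the library needs to know both where it starts and where it ends.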

Miscellaneous

Two more directives in the startup file are worth mentioning.

PRESERVE8

This directive tells the linker that the code in this file preserves 8-byte alignment of the stack. This is a requirement of the Procedure Call Standard for the Arm Architecture (AAPCS).

THUMB

This directive tells the assembler to treat the instructions that follow as Thumb instructions. Thumb is the only instruction set available on Cortex-M processors, which do not support the Arm instruction set.

I hope this information will be useful in understanding a bit of the processor and startup code.

Any comments and especially corrections are welcome.

startup_stm32f40xx.s.zip

Top Comments

  • Jens Bauer
    Jens Bauer over 10 years ago +1
    Nice article. I can't write so much comments, because I am using the GNU assembler myself, and it differs a bit from the one you're using (I believe that's Arm's own assembler). But one thing I will...
  • harshan
    harshan over 10 years ago

    Nice Article very useful.

  • Gopal Amlekar
    Gopal Amlekar over 10 years ago

    Ah.. Didn't know about the label conventions. Yet at nascent stage in Arm assembly coding..

  • Jens Bauer
    Jens Bauer over 10 years ago

    The interrupt service routine will be invoked, yes.

    The name of the label is correct, '1' (one). If you have a numeric label between 1 and 9, then you can branch backwards or forwards to the nearest label of that kind by supplying 'b' or 'f'.

    For instance...

    1:      subs    r2,r2,#1
            bpl     1b
    1:      subs    r3,r3,#1
            bne     1b
    1:      subs    r4,r4,#1
            bge     1b

    ... will be completely valid, though it looks very odd, compared to what you're used to.

  • Gopal Amlekar
    Gopal Amlekar over 10 years ago

    In this case, when an interrupt arrives, will it not serve that interrupt?

    By the way, the label should be lb instead of l (OR the instruction should be b l )

  • Jens Bauer
    Jens Bauer over 10 years ago

    I think I wrote the above comment a bit too quickly.

    The loop would look like this:

    1:      wfi
            b       1b

    -That will branch back to the wfi, so when we actually get an interrupt, we'll go back and wait for another one.
