Writing your own startup code for Cortex-M

Jens Bauer
December 15, 2014
11 minute read time.

Introduction

This document is designed as a tutorial on how to write assembly code for the Cortex-M series.

I only know the assembler syntax for the GNU assembler, but as there are many different assemblers available, you might need to consult the documentation for the one you will be using.

This tutorial will teach you what different directives in the GNU Assembler (GAS) do, and also teach you a few basic Cortex-M instructions.

The code was written for Cortex-M3 and will work on Cortex-M4. It will require minor modification for Cortex-M0, since I've used instructions that are only present in Cortex-M3 and Cortex-M4.

In addition, you will also learn how to define simple macros that take parameters.

And finally, you will learn how a Cortex-M microcontroller starts up.

Note: Throughout the document, I've used C-style comments. In many assemblers, a comment starts with a semicolon and extends to the end of the line; leading whitespace is ignored.

But when using GAS, the rules are different. GAS expects comments to start with a '@' instead of a semicolon.

That is a little inconvenient when writing documents like this, and the C-style comments should work if you use a C pre-processor.

You might want to write comments in this way, in order to make them more compatible:

                    ;/* This comment should work in most assemblers. */

Assembler Directives

You'll want to be familiar with these assembler directives. I'll explain some of them in detail later, and you will see how they are used:

                    .syntax             unified                     /* use modern assembler syntax + auto-generate IT instructions. Put at the top of your source file */

                    .weak               label{,label}               /* allow 'label' to be undefined. If it's undefined, it will have the value NULL (0x00000000). */

                    .weakref            label,defaultLabel          /* allow 'label' to be undefined. If it's undefined, it will have the value of another label. */

                    .section            sectionName                 /* all output from now on goes into a section called 'sectionName' */

                    .align              [bitposition]               /* align the output offset */

                    .long               value                       /* output a 32-bit value */

                    .text                                           /* all output from now on goes into a section called '.text' (same as '.section .text') */

                    .func               label[,actualLabel]         /* mark the beginning of function 'label' for the debugger (this only emits debug information) */

                    .endfunc                                        /* mark the end of the function */

                    .pool                                           /* allow the assembler to place constants here */

                    .size               label,size                  /* tell the linker how long the block that this symbol points to is (in Bytes) */

                    .thumb_func                                     /* mark the next label as a thumb function (required if the function is called by using 'bx' or 'blx') */

                    .type               label,%type                 /* specify the type of the symbol. Required if there is a pointer to the function somewhere. */

                    .cpu                cpuType                     /* cpuType may for instance be cortex-m0, cortex-m3 or cortex-m4. */

 

Some of the directives above may take more parameters than I've shown. '.align' and '.section' are two such directives.

See the GNU Assembler Manual for more information on these directives.

The above are not instructions; they're assembler-directives, which means they tell the assembler something, without producing "code".

We've already been looking at ".syntax unified", which allows us to use modern unified syntax and can automate the generation of IT instructions (If-Then instructions).

In order to let the assembler generate the IT instructions automatically, you may have to enable this feature on the compiler's command-line.

For gcc, this option is as follows: -Wa,-mimplicit-it=always

The .weak directive allows a label to be undefined. If it's undefined, the value will default to NULL. You can specify multiple labels in one .weak directive; for instance:

                    .weak               LowLevelInit,SystemInit,cab,cap,car,cat

If any of the above-mentioned labels are not defined anywhere when the program is linked, their values will default to NULL (0x00000000).

That means you can check at run-time if they're there.
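
The same run-time check can be written in C. This is only a minimal sketch, assuming GCC or Clang producing ELF output (the 'weak' attribute is a compiler extension); run_optional_hooks is a hypothetical helper name and not part of the startup code in this tutorial:

```c
#include <stddef.h>

/* Weak declarations: if no object file defines these functions,
   their addresses resolve to NULL instead of causing a link error. */
extern void LowLevelInit(void) __attribute__((weak));
extern void SystemInit(void) __attribute__((weak));

/* Hypothetical helper: call each hook only if a definition was linked in.
   Returns the number of hooks that were actually called. */
int run_optional_hooks(void)
{
    int called = 0;

    if (LowLevelInit != NULL) {
        LowLevelInit();
        called++;
    }
    if (SystemInit != NULL) {
        SystemInit();
        called++;
    }
    return called;
}
```

If neither function is linked into the program, the helper calls nothing and returns 0; adding a file that defines SystemInit makes that call happen without touching this code.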

The .weakref directive allows you to specify a default-value instead of NULL. This is especially useful for exception vectors.

For instance...

                    .weakref            Reset_Handler,defaultResetHandler

                    .weakref            HardFault_Handler,defaultExceptionHandler

                    .weakref            NMI_Handler,defaultExceptionHandler

                    ...

                    ...

                    .weakref            SysTick_Handler,defaultExceptionHandler

                    ...

Here we provide default handlers in our startup.S file (capital .S is recommended due to case-problems with some compilers on Windows; with gcc, a capital .S also runs the file through the C pre-processor).

Our default handlers are called defaultResetHandler and defaultExceptionHandler.

All defaultExceptionHandler does is go into an infinite loop, so you could think of this as "stopping" the microcontroller (although the program is in fact still running).

Normally, we would just use our standard 'defaultResetHandler', but in case we need to do things differently, it's nice to be able to override the default handler, without having to copy-and-edit the startup code.

So all exceptions that we haven't implemented in our code will point to the same handler.
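
For comparison, the idea of "use the default handler unless the program provides its own" can be sketched in C with GCC's weak and alias attributes. This is an illustration under assumptions (GCC or Clang, ELF output), not the mechanism used by the assembly file above; the counter default_handler_calls is hypothetical and exists only so the sketch can be run on a PC, where a real default handler would loop forever:

```c
/* Hypothetical counter, only here to make the sketch observable on a host. */
volatile unsigned default_handler_calls = 0;

/* Stand-in for the default handler; on a target it would never return. */
void defaultExceptionHandler(void)
{
    default_handler_calls++;
}

/* Each handler is a weak alias: it resolves to defaultExceptionHandler
   unless another object file defines a function with the same name. */
void NMI_Handler(void)       __attribute__((weak, alias("defaultExceptionHandler")));
void HardFault_Handler(void) __attribute__((weak, alias("defaultExceptionHandler")));
void SysTick_Handler(void)   __attribute__((weak, alias("defaultExceptionHandler")));
```

Defining a normal (non-weak) SysTick_Handler in any other source file replaces the alias for that one vector, while the remaining handlers keep pointing at the default.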

The Exception Vector Table

Knowing the above, we can start creating our exception vector table using the '.long' directive, which will output a 32-bit value to the binary file directly:

                    .section            isr_vector,"a",%progbits    /* Put everything in an allocatable section called "isr_vector" from now on... */

                    .align              2                           /* Make sure the output goes on an address divisible by 4 (that's 1 << 2) */

                                                                    /* Address:   Exception Vector Description: */

                    .long               _stack                      /* 0x00000000 The initial stack pointer (defined by the linker-script) */

                    .long               Reset_Handler               /* 0x00000004 The startup-code, the code that runs on power-on or RESET */

                    .long               NMI_Handler                 /* 0x00000008 Non-Maskable Interrupt, this can not be stopped, preempted or prevented */

                    .long               HardFault_Handler           /* 0x0000000c Hard Fault, all classes of Fault */

                    .long               MemManage_Handler           /* 0x00000010 Memory Management, MPU mismatch, including Access Violation and No Match */

                    .long               BusFault_Handler            /* 0x00000014 Bus Fault, Pre-Fetch Fault, Memory Access Fault, other address/memory related Fault */

                    .long               UsageFault_Handler          /* 0x00000018 Usage Fault, e.g. Undefined Instructions, Illegal State Transitions */

                    .long               0                           /* 0x0000001c */

                    .long               0                           /* 0x00000020 */

                    .long               0                           /* 0x00000024 */

                    .long               0                           /* 0x00000028 */

                    .long               SVC_Handler                 /* 0x0000002c Supervisor Call */

                    .long               DebugMon_Handler            /* 0x00000030 Debug Monitor */

                    .long               0                           /* 0x00000034 */

                    .long               PendSV_Handler              /* 0x00000038 Pending Service, pending requests for system service */

                    .long               SysTick_Handler             /* 0x0000003c System Tick Timer (this may not exist on all implementations) */

/*                  .long       ..._IRQHandler */                   /* 0x00000040 and forward. IRQ vectors specific to your microcontroller follows here... */

                    .text                                           /* Put everything in the text-section from now on... */

                    .align                                          /* Make sure address is aligned for code output */

                    /* you can place your startup code here if you wish. */

As the exception vectors must be the very first data in the flash-memory, we've created a special section, which we call "isr_vector".

In our linker-script, we can tell the linker that the 'isr_vector' must be the very first section in the output file, so we're sure it's placed correctly every time. We should only have one block of data/code in the "isr_vector" section.

The .align directive ensures that the next output from the assembler will be placed on an address, which is suitable for an assembler-instruction.

Usually such addresses must be divisible by 4; .align takes care of that.
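
The rounding that '.align 2' performs can be written out as a small calculation. The function below is only an illustration in C; align_up is a hypothetical name, and the exponent is assumed to be smaller than 32:

```c
#include <stdint.h>

/* Round 'address' up to the next multiple of 2^exponent.
   With exponent = 2 this yields the next address divisible by 4,
   which is the effect of '.align 2' in the vector table above. */
uint32_t align_up(uint32_t address, unsigned exponent)
{
    uint32_t mask = (UINT32_C(1) << exponent) - 1u;

    return (address + mask) & ~mask;
}
```

For example, an output offset of 0x1001 is moved to 0x1004, while an offset that is already divisible by 4 is left unchanged.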

The .text directive will do the same as ".section .text", it switches to the section called '.text', which usually contains normal code.

The .text section is by default read-only and executable.

On all Cortex-M microcontrollers, the first 16 vectors are always located at the same addresses.

Some processors may not implement every one of them; the SysTick timer, for instance, may be absent.

After the first 16 vectors, there's space for IRQ vectors. Those usually differ between different kinds of devices; even within the same vendor.

For instance, one Cortex-M4 based microcontroller may have Timer0_IRQHandler, Timer1_IRQHandler, Timer2_IRQHandler, Timer3_IRQHandler, while another one from the same vendor would start by having a WDT_IRQHandler there instead; the timers could be placed 13 vectors later, and not necessarily contiguous.
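
Since every vector is a 32-bit value, the position of any entry follows directly from its index. The C sketch below just restates the offsets from the table above; the function and constant names are hypothetical:

```c
#include <stdint.h>

enum {
    SYSTEM_VECTOR_COUNT = 16,  /* initial stack pointer + 15 system exception vectors */
    VECTOR_SIZE_BYTES   = 4    /* each entry is one '.long' */
};

/* Offset of table entry 'index' from the start of the vector table. */
uint32_t vector_offset(uint32_t index)
{
    return index * VECTOR_SIZE_BYTES;
}

/* Offset of the vector for device interrupt number 'irq' (IRQ 0 is entry 16). */
uint32_t irq_vector_offset(uint32_t irq)
{
    return vector_offset(SYSTEM_VECTOR_COUNT + irq);
}
```

Entry 15 (SysTick_Handler) therefore sits at 0x3c, and the first device-specific IRQ vector at 0x40.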

The Startup Code

You can place your startup-code in the .text section. For instance, this is how a standard startup code for C could be implemented.

The startup code will start by looking for the LowLevelInit and SystemInit functions. If they're found, they will be executed (in that order).

After that, the .data section will be copied to SRAM, and an optional .fastcode section will be copied to SRAM as well.

The .fastcode section is often an "advanced topic", as it usually requires you to tweak the linker-scripts.

It would work just as well for assembly language...

                    .macro             FUNCTION name                /* this macro makes life less tedious. =) */

                    .func              \name,\name                  /* this tells a debugger that the function starts here */

                    .type              \name,%function              /* when a function is pointed to from a table, this is mandatory */

                    .thumb_func                                     /* when a function is called by using 'bx' or 'blx' this is mandatory */

                    .align                                          /* make sure the address is aligned for code output */

\name\():                                                           /* this defines the label. the \() is necessary to separate the colon from the label */

                    .endm

                    .macro             ENDFUNC name                 /* FUNCTION and ENDFUNC must always be paired */

                    .size              \name, . - \name             /* tells the linker how big the code block for the function is */

                    .pool                                           /* let the assembler place constants here */

                    .endfunc                                        /* mark the end of the function, so a debugger can display it better */

                    .endm

                    .text                                           /* switch to text section, so code will be placed there. */

                    FUNCTION            defaultResetHandler

                    ldr                 r0,=LowLevelInit            /* get the address of the LowLevelInit routine */

                    cmp                 r0,#0                       /* is it NULL ? */

                    blxne               r0                          /* if not, call the routine. */

                    ldr                 r0,=SystemInit              /* get the address of the SystemInit routine */

                    cmp                 r0,#0                       /* is it NULL ? */

                    blxne               r0                          /* if not, call the routine. */

                    /* copy the .data section from Flash memory to SRAM (this allows us to pre-initialize variables) */

                    ldr                 r1,=_sidata                 /* point our source register to the start of the .data section */

                    ldr                 r2,=_sdata                  /* point our destination register to the SRAM dedicated to our writable data */

                    ldr                 r3,=_edata                  /* point r3 after the last byte we will be writing */

                    bl                  copy                        /* call a subroutine that copies the memory block */

                    /* copy code from Flash memory to SRAM. Code often executes faster from SRAM, because Flash memory may need wait states. */

                    ldr                 r1,=_sifastcode             /* source address */

                    ldr                 r2,=_sfastcode              /* destination start */

                    ldr                 r3,=_efastcode              /* destination end */

                    bl                  copy                        /* copy code */

                    /* Now zero the .bss section. This is a section of memory, which is dedicated to our zero-initialized variables. */

                    movs                r0,#0                       /* zero r0 */

                    ldr                 r1,=_sbss                   /* point r1 to BSS starting address (in SRAM) */

                    ldr                 r2,=_ebss                   /* point r2 to BSS ending address (in SRAM) */

1:                  cmp                 r1,r2                       /* check if end is reached */

                    strlo               r0,[r1],#4                  /* if end not reached, store zero and advance pointer */

                    blo                 1b                          /* if end not reached, branch back to loop */

                    ldr                 r0,=main                    /* get the address of the C 'main' code */

                    blx                 r0                          /* jump to code */

                    b                   defaultExceptionHandler     /* if we ever get here, we'll continue into an infinite loop for safety reasons */

copy:               cmp                 r2,r3                       /* check if we've reached the end */

                    ldrlo               r0,[r1],#4                  /* if end not reached, get word and advance source pointer */

                    strlo               r0,[r2],#4                  /* if end not reached, store word and advance destination pointer */

                    blo                 copy                        /* if end not reached, branch back to loop */

                    bx                  lr                          /* return to caller */

                    ENDFUNC             defaultResetHandler

                    FUNCTION            defaultExceptionHandler

                    wfi                                             /* wait for an interrupt, in order to save power */

                    b                   defaultExceptionHandler     /* go round loop */

                    ENDFUNC             defaultExceptionHandler

Notice that the conditional ldrlo and strlo instructions allow us to copy or zero blocks of zero length: if the start address equals the end address, nothing is read or written.

Normally, on other microcontrollers, we would have to make an initial branch forward, wasting both space and CPU-time.

That's not necessary on a Cortex-M3 or Cortex-M4, because we can have conditional execution on all the instructions in the loop.
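
To make the loop behaviour easier to follow, here is a rough C model of the 'copy' subroutine and the .bss loop. It is a sketch for reading alongside the assembly, not a replacement for it; copy_words and zero_words are hypothetical names:

```c
#include <stdint.h>

/* Models 'copy': r1 = src, r2 = dst, r3 = dst_end.
   The test comes first, so an empty block copies nothing. */
void copy_words(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {   /* cmp r2,r3 ... blo copy */
        *dst++ = *src++;      /* ldrlo r0,[r1],#4 then strlo r0,[r2],#4 */
    }
}

/* Models the .bss loop: r1 = start, r2 = end, r0 = 0. */
void zero_words(uint32_t *start, const uint32_t *end)
{
    while (start < end) {     /* cmp r1,r2 ... blo 1b */
        *start++ = 0;         /* strlo r0,[r1],#4 */
    }
}
```

Calling either function with identical start and end pointers does nothing, which is the zero-length case mentioned above.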

You can implement the above defaultResetHandler.

This would allow you to write a small program easily, without worrying about the basic setup.

If at some point you need it, you can write your own Reset_Handler, which will override (and discard) the defaultResetHandler.

Thus the defaultResetHandler will not even be present in the final file.

Related articles

  • Arm Cortex-M0 assembly programming tips and tricks
  • A fairly quick Count Leading Zeroes for Cortex-M0

Top Comments

  • Alan
    Alan over 10 years ago

    Hi Jens,

    Nice tutorial. Anyway, I have a simple question:

    Why continue to write runtime code in assembler? You can reach the same goal by using the C compiler in the correct way.

    Personally I use the open source RTOS uKOS (http://www.ukos.ch/). These guys use the same crt0 for all the cpus/platforms. In short, you can have a single piece of code for all your targets. Advantage:

    1. portability
    2. understandable code
    3. easy to maintain
    4. easy to implement specific functionalities
    5. it can be as small and fast as the assembler code

    OK, this can be easily achieved with gcc ... maybe this is more complicated (or even impossible) with other compilers.

    Here is an example for the cortex-M4 and for a stm32F429 controller

    The crt0.c

    crt0.c.jpg

    Well, this was just a suggestion.

    Please check the package I suggested (uKOS) ... really brilliant implementations of the runtime.

    Regards,

    Alan

  • Jens Bauer
    Jens Bauer over 10 years ago

    Yes, it's the IRQ table; correct.

    It's actually very easy to use.

    Pick a name in the list, for instance SysTick_Handler (well, because SysTick_Handler is one of the easiest ones, heh).

    Write your C-function with the name exactly like the one in the IRQ handler table, eg. like this:

    volatile uint32_t gSysTickCount = 0;

    void SysTick_Handler(void)

    {

        gSysTickCount++;

    }

    Use your preferred way of starting the SysTick interrupt (for instance by using the driver libraries), and you should have a working tick-timer.

    The SysTick interrupt does not require clearing a pending bit, but almost all other interrupts do, so if you're using for instance a Timer-interrupt, an ADC-interrupt, a DMA-interrupt or any other 'repeatable' interrupt, you should make sure you clear its pending bits. I prefer clearing the pending bits as the very first thing in the Interrupt Service Routine, whenever possible.

  • Jo Van Montfort
    Jo Van Montfort over 10 years ago

    Nice tutorial Jens!

    I have this in my startup code.

    Default_Handler:

    b .

    .size Default_Handler, . - Default_Handler

    /*    Macro to define default handlers. Default handler

    *    will be weak symbol and just dead loops. They can be

    *    overwritten by other handlers */

    .macro def_irq_handler handler_name

    .weak \handler_name

    .set \handler_name, Default_Handler

    .endm

    def_irq_handler NMI_Handler

    def_irq_handler HardFault_Handler

    def_irq_handler MemManage_Handler

    def_irq_handler BusFault_Handler

    def_irq_handler UsageFault_Handler

    def_irq_handler SVC_Handler

    def_irq_handler DebugMon_Handler

    def_irq_handler PendSV_Handler

    def_irq_handler SysTick_Handler

    def_irq_handler DEF_IRQHandler

    .end

    Is this the IRQ vector table? How does it work?

  • Jens Bauer
    Jens Bauer over 10 years ago

    Almost all the startup code provided by microcontroller vendors is different in some way.

    One major difference is that if you write your own, you will have to add the exception vectors yourself from vector 16 and forward.

    The reason that I personally write my own startup code (like the one in the tutorial), is that I like having the optional LowLevelInit function, where I can set the speed of the microcontroller for instance; before I start other program execution. I can also initialize external memory here, before copying code to - say - SDRAM from the Flash memory.

    SystemInit might set the speed of the microcontroller by default (if SystemInit is provided / linked); this depends on what (library) files you link with.

    In addition to the above-mentioned differences, I also support .fastcode; this is code that is copied to SRAM from Flash memory during startup.

    Code normally executes much faster from SRAM than from Flash; it also allows you to have writable variables close to the code if required.

    There is another detail, which I would like to ask all vendors to add: The defaultResetHandler. If for instance, a vendor has startup code that only includes basic initialization, and the developer needs to use a language such as C++ (which needs to initialize constructors), then the developer needs to rewrite the startup code (or copy-and-modify).

    Or ... If the developer needs to reduce the startup-code, so that main() is called immediately at reset, it would also be necessary to modify the startup file.

    Now imagine that the developer (me for instance) often needs to execute code from RAM; then he would need to copy and modify the startup code every time he needs this functionality. Instead, it would be wise to have one central startup code file, which is capable of handling all the situations the developer needs.

    I am only using a single startup.s file in my system; I am not copying it to my project, so it's not a 'template' file as you've seen it in most setups.

    If I need to distribute the source code, I have the option to copy the startup.s file and include it with the distribution, but I don't like having multiple copies of the same file all over my harddrive, especially not when I need to fix a bug, which is common to all of the files.

    The actual reason for posting the tutorial was that another member of the community needed an example on how this could be done; so I decided to write it in a tutorial-style, to explain each step. It's possible to write the startup code in C as well, while keeping the functionality presented in this one.

  • Teddy Zhai
    Teddy Zhai over 10 years ago

    Hi Jens,

    What is the difference between your startup code here and the standard startup code normally provided by SoC vendors? I seem to have seen this provided in the STM32 SoC family.

    Best regards. Teddy
