This document is designed as a tutorial on how to write assembly code for the Cortex-M series.
I only know the assembler syntax for the GNU assembler, but as there are many different assemblers available, you might need to consult the documentation for the one you will be using.
This tutorial will teach you what different directives in the GNU Assembler (GAS) do, and also teach you a few basic Cortex-M instructions.
The code was written for Cortex-M3 and will work on Cortex-M4. It will require minor modification for Cortex-M0, since I've used instructions that are only present in Cortex-M3 and Cortex-M4.
In addition, you will also learn how to define simple macros that take parameters.
And finally, you will learn how a Cortex-M microcontroller starts up.
Note: Throughout the document, I've used C-style comments. In many assemblers, a comment starts with a semicolon and extends to the end of the line.
But when using GAS, the rules are different. GAS expects comments to start with a '@' instead of a semicolon.
That is a little inconvenient when writing documents like this, and the C-comments should work if you use a C pre-processor.
You might want to write comments in this way, in order to make them more compatible:
;/* This comment should work in most assemblers. */
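As a small illustration of the comment styles mentioned above, here is a sketch for GAS targeting a Cortex-M3 (the instructions themselves are arbitrary and only there to carry the comments). The reason the ';/* ... */' form is tolerated by GAS is that, for Arm targets, ';' acts as a statement separator rather than a comment character:

```asm
    .syntax unified
    .cpu    cortex-m3
    .thumb
    .text

    movs    r0,#1       @ line comment in GAS style for Arm targets
    movs    r1,#2       /* C-style comment, accepted by GAS */
    movs    r2,#3       ;/* ';' ends the statement in GAS; other assemblers see a comment start */
```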
You'll want to be familiar with these assembler directives. I'll explain some of them in detail later, and you will see how they are used:
.syntax unified /* use modern assembler syntax + auto-generate IT instructions. Put in top of your source file */
.weak label{,label} /* allow 'label' to be undefined. If it's undefined, it will have the value NULL (0x00000000). */
.weakref label,defaultLabel /* allow 'label' to be undefined. If it's undefined, it will have the value of another label. */
.section sectionName /* all output from now on goes into a section called 'sectionName' */
.align [bitposition] /* align the output offset */
.long value /* output a 32-bit value */
.text /* all output from now on goes into a section called '.text' (same as '.section .text') */
.func label[,actualLabel] /* mark the beginning of function 'label' for debug information (ignored unless assembling with debugging enabled) */
.endfunc /* mark the end of the function */
.pool /* allow the assembler to place constants here */
.size label,size /* tell the linker how long the block that this symbol points to is (in Bytes) */
.thumb_func /* mark the label that follows as a thumb function (required if the function is called by using 'bx' or 'blx') */
.type label,%type /* specify the type of the symbol. Required if there is a pointer to the function somewhere. */
.cpu cpuType /* cpuType may for instance be cortex-m0, cortex-m3 or cortex-m4. */
Some of the directives above may take more parameters than I've shown. '.align' and '.section' are two such directives.
See the GNU Assembler Manual for more information on these directives.
The above are not instructions; they're assembler-directives, which means they tell the assembler something, without producing "code".
We've already looked at ".syntax unified", which allows us to use the modern unified syntax and can automate the generation of IT instructions (If-Then instructions).
In order to let the assembler generate the IT instructions automatically, you may have to enable this feature on the compiler's command-line.
For gcc, this option is as follows: -Wa,-mimplicit-it=always
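To make it clearer what the assembler generates for you, here is a minimal sketch of two hand-written IT blocks for a Cortex-M3 (the register usage is only an example). With implicit IT generation enabled, the 'it' and 'ite' lines can be left out and the assembler inserts them; without it, GAS normally rejects a conditional Thumb instruction that is not covered by an IT block:

```asm
    .syntax unified
    .cpu    cortex-m3
    .thumb
    .text

    /* r0 = max(r0, r1), signed */
    cmp     r0,r1
    it      lt              /* If-Then: the next instruction executes only when r0 < r1 */
    movlt   r0,r1

    /* r2 = (r0 == 0) ? 1 : 0 */
    cmp     r0,#0
    ite     eq              /* If-Then-Else: first instruction on 'eq', second on 'ne' */
    moveq   r2,#1
    movne   r2,#0
```

Writing the IT instructions explicitly keeps the source independent of the command-line option, at the cost of a little extra typing.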
The .weak directive allows a label to be undefined. If it's undefined, the value will default to NULL. You can specify multiple labels in one .weak directive; for instance:
.weak LowLevelInit,SystemInit,cab,cap,car,cat
If any of the above mentioned labels do not exist at compile (assemble) time, their values will default to NULL (0x00000000).
That means you can check at run-time if they're there.
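A run-time check of this kind could look like the sketch below. The wrapper name callSystemInitIfPresent is hypothetical and only used for illustration. It uses 'cbz' (compare and branch on zero), which exists on Cortex-M3 and Cortex-M4 but not on Cortex-M0:

```asm
    .syntax unified
    .cpu    cortex-m3
    .thumb
    .text

    .weak   SystemInit              /* SystemInit may be missing; then its address is 0 */

    .type   callSystemInitIfPresent,%function
    .thumb_func
callSystemInitIfPresent:            /* hypothetical helper, not part of the startup code */
    push    {r4,lr}                 /* save LR; r4 keeps the stack 8-byte aligned */
    ldr     r0,=SystemInit          /* get the address (0 if the symbol is undefined) */
    cbz     r0,1f                   /* skip the call if the address is NULL */
    blx     r0                      /* otherwise call the routine */
1:  pop     {r4,pc}                 /* return to caller */
    .pool                           /* the literal for 'ldr r0,=...' goes here */
```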
The .weakref directive allows you to specify a default-value instead of NULL. This is especially useful for exception vectors.
For instance...
.weakref Reset_Handler,defaultResetHandler
.weakref HardFault_Handler,defaultExceptionHandler
.weakref NMI_Handler,defaultExceptionHandler
...
.weakref SysTick_Handler,defaultExceptionHandler
... Here we provide default handlers in our startup.S file (capital .S is recommended due to case-problems with some compilers on Windows)
Our default handlers are called defaultResetHandler and defaultExceptionHandler.
All defaultExceptionHandler does is go into an infinite loop, so you could think of this as "stopping" the microcontroller (although the program is in fact still running).
Normally, we would just use our standard 'defaultResetHandler', but in case we need to do things differently, it's nice to be able to override the default handler, without having to copy-and-edit the startup code.
So all exceptions that we haven't implemented in our code will point to the same handler.
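If the override behaviour does not work as expected with your toolchain version, a different technique that appears in several vendor-supplied GCC startup files is to combine '.weak' with '.thumb_set'. This is an alternative pattern, not the method used in this tutorial. A minimal sketch, reusing the handler names from above:

```asm
    .syntax unified
    .cpu    cortex-m3
    .thumb
    .text

    .type   defaultExceptionHandler,%function
    .thumb_func
defaultExceptionHandler:
    b       defaultExceptionHandler     /* infinite loop */

    /* Each handler is a weak alias for the default handler. */
    /* A normal (strong) definition elsewhere replaces the alias at link time. */
    .weak       NMI_Handler
    .thumb_set  NMI_Handler,defaultExceptionHandler
    .weak       HardFault_Handler
    .thumb_set  HardFault_Handler,defaultExceptionHandler
    .weak       SysTick_Handler
    .thumb_set  SysTick_Handler,defaultExceptionHandler
```

'.thumb_set' works like '.set', but also marks the alias as a thumb function, so the vector table gets the 'odd' address it needs.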
Knowing the above, we can start creating our exception vector table using the '.long' directive, which will output a 32-bit value to the binary file directly:
.section isr_vector /* Put everything in a section called "isr_vector" from now on... */
.align 2 /* Make sure the output goes on an address divisible by 4 (that's 1 << 2) */
/* Address: Exception Vector Description: */
.long _stack /* 0x00000000 The initial stack pointer (defined by the linker-script) */
.long Reset_Handler /* 0x00000004 The startup-code, the code that runs on power-on or RESET */
.long NMI_Handler /* 0x00000008 Non-Maskable Interrupt, this can not be stopped, preempted or prevented */
.long HardFault_Handler /* 0x0000000c Hard Fault, all classes of Fault */
.long MemManage_Handler /* 0x00000010 Memory Management, MPU mismatch, including Access Violation and No Match */
.long BusFault_Handler /* 0x00000014 Bus Fault, Pre-Fetch or Memory Access Fault, other address/memory related Fault */
.long UsageFault_Handler /* 0x00000018 Usage Fault, i.e. Undefined Instructions, Illegal State Transitions */
.long 0 /* 0x0000001c */
.long 0 /* 0x00000020 */
.long 0 /* 0x00000024 */
.long 0 /* 0x00000028 */
.long SVC_Handler /* 0x0000002c Supervisor Call */
.long DebugMon_Handler /* 0x00000030 Debug Monitor */
.long 0 /* 0x00000034 */
.long PendSV_Handler /* 0x00000038 Pending Service, pending requests for system service */
.long SysTick_Handler /* 0x0000003c System Tick Timer (this may not exist on all implementations) */
/* .long ..._IRQHandler */ /* 0x00000040 and forward. IRQ vectors specific to your microcontroller follows here... */
.text /* Put everything in the text-section from now on... */
.align /* Make sure address is aligned for code output */
/* you can place your startup code here if you wish. */
As the exception vectors must be the very first data in the flash-memory, we've created a special section, which we call "isr_vector".
In our linker-script, we can tell the linker that the 'isr_vector' must be the very first section in the output file, so we're sure it's placed correctly every time. We should only have one block of data/code in the "isr_vector" section.
The .align directive ensures that the next output from the assembler will be placed on an address, which is suitable for an assembler-instruction.
Usually such addresses must be divisible by 4; .align takes care of that.
The .text directive will do the same as ".section .text", it switches to the section called '.text', which usually contains normal code.
The .text section is by default read-only and executable.
On all Cortex-M microcontrollers, the first 16 vectors are always located at the same addresses.
Some processors may choose to not implement a SysTick handler for instance.
After the first 16 vectors, there's space for IRQ vectors. Those usually differ between different kinds of devices; even within the same vendor.
For instance, one Cortex-M4 based microcontroller may have Timer0_IRQHandler, Timer1_IRQHandler, Timer2_IRQHandler, Timer3_IRQHandler, while another one from the same vendor would start by having a WDT_IRQHandler there instead; the timers could be placed 13 vectors later, and not necessarily contiguous.
You can place your startup-code in the .text section, for instance, this is how a standard startup code for C could be implemented.
The startup code will start by looking for the LowLevelInit and SystemInit functions. If they're found, they will be executed (in that order).
After that, the .data section will be copied to SRAM, and an optional .fastcode section will be copied to SRAM as well.
The .fastcode section is often an "advanced topic", as it usually requires you to tweak the linker-scripts.
It would work just as well for assembly language...
    .macro  FUNCTION name           /* this macro makes life less tedious. =) */
    .func   \name,\name             /* this tells a debugger that the function starts here */
    .type   \name,%function         /* when a function is pointed to from a table, this is mandatory */
    .thumb_func                     /* when a function is called by using 'bx' or 'blx' this is mandatory */
    .align                          /* make sure the address is aligned for code output */
\name\():                           /* this defines the label. the \() is necessary to separate the colon from the label */
    .endm

    .macro  ENDFUNC name            /* FUNCTION and ENDFUNC must always be paired */
    .size   \name, . - \name        /* tells the linker how big the code block for the function is */
    .pool                           /* let the assembler place constants here */
    .endfunc                        /* mark the end of the function, so a debugger can display it better */
    .endm

    .text                           /* switch to text section, so code will be placed there. */

    FUNCTION defaultResetHandler
    ldr     r0,=LowLevelInit        /* get the address of the LowLevelInit routine */
    cmp     r0,#0                   /* is it NULL ? */
    blxne   r0                      /* if not, call the routine. */
    ldr     r0,=SystemInit          /* get the address of the SystemInit routine */
    cmp     r0,#0                   /* is it NULL ? */
    blxne   r0                      /* if not, call the routine. */

    /* copy the .data section from Flash memory to SRAM (this allows us to pre-initialize variables) */
    ldr     r1,=_sidata             /* point our source register to the start of the .data section */
    ldr     r2,=_sdata              /* point our destination register to the SRAM dedicated to our writable data */
    ldr     r3,=_edata              /* point r3 after the last byte we will be writing */
    bl      copy                    /* call a subroutine that copies the memory block */

    /* copy code from Flash memory to SRAM. Code executes faster in SRAM than in Flash memory. */
    ldr     r1,=_sifastcode         /* source address */
    ldr     r2,=_sfastcode          /* destination start */
    ldr     r3,=_efastcode          /* destination end */
    bl      copy                    /* copy code */

    /* Now zero the .bss section. This is a section of memory, which is dedicated to our variables. */
    movs    r0,#0                   /* zero r0 */
    ldr     r1,=_sbss               /* point r1 to BSS starting address (in SRAM) */
    ldr     r2,=_ebss               /* point r2 to BSS ending address (in SRAM) */
1:  cmp     r1,r2                   /* check if end is reached */
    strlo   r0,[r1],#4              /* if end not reached, store zero and advance pointer */
    blo     1b                      /* if end not reached, branch back to loop */

    ldr     r0,=main                /* get the address of the C 'main' code */
    blx     r0                      /* jump to code */
    b       defaultExceptionHandler /* if we ever get here, we'll continue into an infinite loop for safety reasons */

copy:
    cmp     r2,r3                   /* check if we've reached the end */
    ldrlo   r0,[r1],#4              /* if end not reached, get word and advance source pointer */
    strlo   r0,[r2],#4              /* if end not reached, store word and advance destination pointer */
    blo     copy                    /* if end not reached, branch back to loop */
    bx      lr                      /* return to caller */
    ENDFUNC defaultResetHandler

    FUNCTION defaultExceptionHandler
    wfi                             /* wait for an interrupt, in order to save power */
    b       defaultExceptionHandler /* go round loop */
    ENDFUNC defaultExceptionHandler
Notice that the use of ldrlo and strlo lets the loops handle zero-byte blocks correctly: if the start and end addresses are equal, nothing is copied or written.
Normally, on other microcontrollers, we would have to make an initial branch forward, wasting both space and CPU-time.
That's not necessary on a Cortex-M3 or later, because we can have conditional execution on all the instructions in the loop.
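For comparison, here is a sketch of how the BSS-clearing loop might look on a Cortex-M0, which has neither IT instructions nor post-indexed 'str'. It needs the initial forward branch mentioned above, and uses 'stmia' to store and advance the pointer. The label zeroBss is hypothetical; _sbss and _ebss are the same linker-script symbols as before:

```asm
    .syntax unified
    .cpu    cortex-m0
    .thumb
    .text

    .type   zeroBss,%function
    .thumb_func
zeroBss:                            /* hypothetical name, for illustration only */
    movs    r0,#0                   /* zero r0 */
    ldr     r1,=_sbss               /* point r1 to BSS starting address */
    ldr     r2,=_ebss               /* point r2 to BSS ending address */
    b       2f                      /* initial branch forward, so an empty block is skipped */
1:  stmia   r1!,{r0}                /* store zero and advance pointer by 4 */
2:  cmp     r1,r2                   /* check if end is reached */
    blo     1b                      /* if end not reached, branch back to loop */
    bx      lr                      /* return to caller */
    .pool
```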
... You can implement the above defaultResetHandler.
This would allow you to write a small program easily, without worrying about the basic setup.
If at some point, you need it, you can write your own Reset_Handler, which will override (and discard) the defaultResetHandler.
Thus the defaultResetHandler will not even be present in the final file.
Thank you, Jens!
Removing __attribute__ ((naked)) did the trick. This is not meant to be portable code, so I'll just keep it attribute free.
/Bo
For Cortex-M, an ISR is just a simple subroutine. You do not need a special prologue/epilogue.
Often subroutines (and thereby ISRs) would use PUSH/POP, which would include the LR/PC.
So at the beginning, they would push some registers and LR; at the end they would pop the same registers pushed, except that they would pop PC instead of LR.
If they have neither POP nor BX LR, then make sure you have not specified __attribute__((naked)) in front of the subroutine.
Also, if you're writing an ISR in assembly language, you should make sure you specify
.type functionname,%function
(where functionname is the name of the subroutine being called).
In short, the .type directive makes code-pointers in tables become 'odd' addresses, which means they're thumb functions.
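As an illustration of the points above, a minimal ISR in assembly language might look like this sketch. The counter variable tickCount is hypothetical. r4 is used on purpose, to show a register that the handler must save itself:

```asm
    .syntax unified
    .cpu    cortex-m3
    .thumb
    .text

    .global SysTick_Handler
    .type   SysTick_Handler,%function   /* makes the vector table entry an 'odd' address */
    .thumb_func
SysTick_Handler:
    push    {r4,lr}                 /* save r4 and the return value in LR */
    ldr     r4,=tickCount           /* address of the (hypothetical) counter */
    ldr     r0,[r4]                 /* read counter */
    adds    r0,r0,#1                /* increment */
    str     r0,[r4]                 /* write it back */
    pop     {r4,pc}                 /* restore r4; popping into PC returns from the ISR */
    .pool

    .bss
    .align  2
tickCount:
    .space  4                       /* one 32-bit word, cleared by the startup code */
```

The hardware already saves r0-r3, r12, LR, PC and xPSR on exception entry, so a handler that only touches r0-r3 could simply end with 'bx lr' and skip the push/pop.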
If you are using GCC and you're re-using the code on other architectures than Cortex-M, you can use either __attribute__((interrupt)) or __attribute__((isr)), to tell the compiler to generate interrupt-prologue/epilogue code for those architectures (that could be for instance Arm7TDMI). But again, it's not necessary at all to use special compiler directives for Cortex-M.
Is there a special directive to let the compiler know that a function is indeed an ISR function?
I'm having trouble with my ISR functions in that they lack the finishing "bx lr" instructions. I have to add it manually to my C-code.
Jens,
I agree, there is place for all the "shapes and colours".
However, I fully agree with Alan. Today, you can reach the same runtime performance (and control) with a C implementation. At this point, it is difficult to justify assembler approaches.
I am also an experienced engineer in the embedded world and I was an assembler guru for many decades. I changed my mindset a bit, simply because C is the best candidate (even for the runtime).
Today, the most important thing is the reliability, the usability and the re-usability of your code, and it is difficult to justify runtime code in assembler. You can have full control of the generated code (just dump the listing). You could be surprised how "clever" the compiler can sometimes be.
Probably, the only place where assembler is still unavoidable is inside the uKernel (for the context switching). For this particular example, assembler remains mandatory for nearly all CPU cores (C is simply inappropriate).
Rachele
The original reason for me to write this article, was because I was asked how to write startup code in assembly language for a Cortex-M4 microcontroller.
In addition, it is meant to be a real-world example on how assembly language could look, and something that would be usable no matter which Cortex-M flavour you prefer.
The standard crt.c does not call LowLevelInit, nor does it call SystemInit. These are necessary to set up the microcontroller's CPU frequency, because it usually starts at a slow speed. If it's going to copy a large data section and clear a large BSS section, this can be sped up significantly if the CPU is running at a higher speed.
The job of SystemInit is to initialize the CPU speed and start up SRAM and SDRAM, otherwise you might have no place for your .data and .bss sections, depending on the microcontroller you're using.
You can find C-code for doing these things everywhere, including in all the programming tools that you can get for free, so posting another C-code example would make no sense.
For microcontrollers, which require a small footprint, precision and fine control are important and sometimes necessary.
Assembly language can be made just as portable, if you have the discipline to make it portable.
(I've been working for 10 years on a project which uses portable assembly language between different architectures, so I can say for certain that it is possible)
Since C is a general language, and it is not Arm specific, I've chosen to focus my articles on what is specific to Arm, and how you can benefit from using Arm specific instructions.
That said, C is a neat language, I use it a lot. It's fairly easy to learn, it's fairly quick to write general-purpose code in, but I will probably not post any C-code in this community, because it would only be working on a specific vendor's microcontroller. For instance: If the C-code is using a timer, you can only use it on a particular microcontroller, because the timer's hardware often differs from microcontroller to microcontroller (even if it's the same vendor).
If writing code in C, chances that you are pushing the microcontroller to its limits are very slim. You would normally use something like 10..20% of its potential.
With assembly language, you can push the limits, go in details and you can no longer be sure that the word "impossible" is justified to use.
Besides, if someone tells me "shut up and use what you're given", I say: "No! I will not! I want full control. I want to see what's inside. I want to go beyond and fit the triangular peg in the round hole. I will prove that what you say can't be done, can be done anyway."
Thus... There is no "one size fits all", but if we provide different sizes, shapes and colours, then I'm sure that everyone can find something that fits.