Arm is a 32-bit CPU architecture where every instruction is 32 bits long. Any constant that is part of an instruction must be encoded within those 32 bits, which naturally limits the range of constants that can be represented in one instruction. This post shows how to deal with these limitations and how the latest revision of the Arm architecture (Armv7) provides a simple and efficient solution.
Most arithmetic and logical Arm instructions accept three parameters: a destination register, a first operand register, and a second operand (Operand 2) that can be a register or an immediate constant:
add r0, r1, r2 @ r0 = r1 + r2
sub r0, r1, #3 @ r0 = r1 - 3
An Operand 2 immediate must obey the following rule to fit in the instruction: an 8-bit value rotated right by an even number of bits between 0 and 30 (inclusive). This allows for constants such as 0xFF (0xFF rotated right by 0), 0xFF00 (0xFF rotated right by 24) or 0xF000000F (0xFF rotated right by 4).
Operand 2 immediates are also valid immediates for mov instructions, making it possible to move constant values into registers without performing any other computation:
mov r0, #0xFF0 @ r0 = 0xFF0
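The Operand 2 rule can be checked mechanically. The following C sketch is not taken from any assembler; the names rotl32 and encode_operand2 are hypothetical. It tries every even rotation from 0 to 30 and reports whether the value fits in 8 bits once the rotation is undone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n must be in 0..31). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    assert(n < 32u);
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/*
 * Hypothetical helper (illustration only): returns true if `value` can be
 * written as an 8-bit constant rotated right by an even amount in 0..30.
 * On success, *imm8 and *ror receive the first encoding found.
 */
static bool encode_operand2(uint32_t value, uint32_t *imm8, unsigned *ror)
{
    for (unsigned r = 0; r <= 30u; r += 2u) {
        /* Undo a rotate-right by r with a rotate-left by r. */
        uint32_t candidate = rotl32(value, r);
        if (candidate <= 0xFFu) {
            *imm8 = candidate;
            *ror = r;
            return true;
        }
    }
    return false;
}
```

With this sketch, 0xFF00 is reported as 0xFF rotated right by 24 and 0xFF0 as 0xFF rotated right by 28, while a value such as 0x12345678 is rejected.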
In software - especially in languages like C - constants tend to be small. When they are not small they tend to be bit masks. Operand 2 immediates provide a reasonable compromise between constant coverage and encoding space; most common constants can be encoded directly.
What happens if you need a constant which cannot be expressed as an Operand 2 immediate? The constant has to be moved into a register before use and there are many ways to do so. The traditional solution is to load the constant from memory.
Loading a value from memory requires a pointer to the value's memory location. Pointers need to be held in a register, so we are back to the same problem: an extra register is needed. However, in Arm, the program counter (pc) can generally be used like any other register and can therefore serve as the base pointer for a load operation. This allows you to store the constant at a known offset from the instruction that loads it. Loading the constant into a register then becomes something like this:
ldr r0, [pc, #offset]
Here #offset is the offset in bytes of the constant relative to the program counter (pc). When executing an Arm instruction, pc reads as the address of the current instruction plus 8. #offset can take any value between -4095 and +4095 (inclusive).
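As a rough illustration of that arithmetic, the C sketch below computes #offset from the address of the ldr instruction and the address of the stored constant. The function name pc_relative_offset and the macro ARM_PC_READ_AHEAD are hypothetical, and the range check simply mirrors the limits stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In Arm state, pc reads as the address of the current instruction plus 8. */
#define ARM_PC_READ_AHEAD 8

/*
 * Hypothetical helper (illustration only): computes the #offset to use in
 * `ldr rX, [pc, #offset]` for an instruction at instr_addr loading a
 * constant stored at literal_addr. Returns false when the constant is
 * outside the -4095..+4095 range.
 */
static bool pc_relative_offset(uint32_t instr_addr, uint32_t literal_addr,
                               int32_t *offset)
{
    assert(offset != NULL);
    int64_t delta = (int64_t)literal_addr
                  - ((int64_t)instr_addr + ARM_PC_READ_AHEAD);
    if (delta < -4095 || delta > 4095)
        return false;
    *offset = (int32_t)delta;
    return true;
}
```

A constant placed 8 bytes after the load instruction therefore gets an offset of 0, because pc already reads 8 bytes ahead.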
Knowing where to store the constants in memory (and keeping track of them) can be a tedious task. Thankfully most assemblers provide pseudo-instructions to simplify the operation. For example, in GNU assembler you can write this:
returns_0x12345678:
    ldr r0, =0x12345678
    bx lr               @ function return
The above will assemble to this:
returns_0x12345678:
    ldr r0, [pc, #0]    @ remember pc is 8 bytes ahead
    bx lr               @ function return
    .word 0x12345678
In fact the ldr= pseudo-instruction is a bit more clever than it looks: it checks whether the given constant can be represented as an Operand 2 immediate and generates a mov instruction if it can. A mov instruction is faster than an ldr instruction because the constant does not have to be read from memory, and it also saves the memory the constant would have occupied.
As mentioned earlier, there are other ways to load a constant. In the latest version of the Arm architecture, Armv7, two new instructions were introduced to improve the situation:
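movw (move wide) writes a 16-bit constant into the lower half of a register and zeroes the upper 16 bits.
movt (move top) writes a 16-bit constant into the upper half of a register and leaves the lower 16 bits unchanged.
A 32-bit constant can then be built in two instructions: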
movw r0, #0x5678 @ r0 = 0x00005678
movt r0, #0x1234 @ r0 = (r0 & 0x0000FFFF) | 0x12340000 (=0x12345678)
Note that the order matters since movw will zero the upper 16 bits. Here again the GNU assembler provides some syntactic sugar: the prefixes :upper16: and :lower16: allow you to extract the corresponding half from a 32-bit constant:
.equ label, 0x12345678
movw r0, #:lower16:label
movt r0, #:upper16:label
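To make the register effects concrete, here is a small C model of the two instructions and of the two assembler prefixes. It is only a sketch: the functions movw, movt, lower16, upper16 and load_constant are hypothetical stand-ins that mimic the behaviour described in the comments of the listing above, not real APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the :lower16: prefix: the bottom half of a 32-bit constant. */
static uint16_t lower16(uint32_t c) { return (uint16_t)(c & 0xFFFFu); }

/* Model of the :upper16: prefix: the top half of a 32-bit constant. */
static uint16_t upper16(uint32_t c) { return (uint16_t)(c >> 16); }

/* Model of movw: the result is imm16 with the upper 16 bits zeroed,
 * whatever the register held before. */
static uint32_t movw(uint16_t imm16) { return (uint32_t)imm16; }

/* Model of movt: replaces the upper 16 bits and keeps the lower 16 bits. */
static uint32_t movt(uint32_t reg, uint16_t imm16)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}

/* Builds a 32-bit constant the way the movw/movt pair does. */
static uint32_t load_constant(uint32_t c)
{
    uint32_t r0 = movw(lower16(c)); /* low half first: movw clears the top */
    r0 = movt(r0, upper16(c));      /* then fill in the top half */
    assert(r0 == c);
    return r0;
}
```

Because the modelled movw ignores the previous register contents, issuing it after movt would discard the upper half again, which is why the order matters.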
While this approach takes two instructions, it does not require any extra space to store the constant, so the movw/movt method and the ldr method end up using the same amount of memory. Memory bandwidth is precious, and the movw/movt approach avoids an extra read on the data side, a read that could also have missed the cache.
If you know you can use it, movw/movt is the recommended way to load a 32-bit constant. However, if the constant can be encoded as an 8-bit immediate, rotated right if necessary, use it directly as Operand 2 and avoid the need for an extra register.
Two additional notes:
1: According to ARMv7-M_ARM.pdf, page 121, "Modified Immediate", it's only possible to shift an 8-bit constant between 0 and 24, so the given example 0xF000000F would not be possible on ARM Cortex-M. However, it's possible to also use one of these constants:
%00000000abcdefgh00000000abcdefgh, %abcdefgh00000000abcdefgh00000000 or %abcdefghabcdefghabcdefghabcdefgh.
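The rule in note 1 can be sketched in C as follows. This is an interpretation of the note, not a transcription of the architecture manual: the function name is hypothetical, and the shift is assumed to be a left shift of an 8-bit value by 0 to 24 bits with no wrap-around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): returns true if v matches one of
 * the forms listed in note 1, under the assumptions stated in the lead-in.
 */
static bool is_thumb2_modified_immediate(uint32_t v)
{
    uint32_t lo = v & 0xFFu;        /* candidate byte abcdefgh in bits 7..0 */
    uint32_t hi = (v >> 8) & 0xFFu; /* candidate byte abcdefgh in bits 15..8 */

    if (v == lo)                      /* 0x000000ab */
        return true;
    if (v == (lo | (lo << 16)))       /* 0x00ab00ab */
        return true;
    if (v == ((hi << 8) | (hi << 24))) /* 0xab00ab00 */
        return true;
    if (v == lo * 0x01010101u)        /* 0xabababab */
        return true;

    /* An 8-bit value shifted left by 1..24 bits, without wrapping around. */
    for (unsigned s = 1; s <= 24u; s++) {
        if (((v >> s) << s) == v && (v >> s) <= 0xFFu)
            return true;
    }
    return false;
}
```

Under these assumptions 0xFF000000 and 0x00FF00FF are accepted, while 0xF000000F is rejected because it needs the 8-bit value to wrap around the top of the word.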
2: In some cases, it might be quicker to load 32-bit immediate values from memory instead of using MOVW+MOVT. This occurs if you have two consecutive load instructions that can be pipelined: the first load takes 2 clock cycles and the next takes 1. This requires the two load instructions to be right next to each other, with no other instructions in between; otherwise each load takes 2 clock cycles, for a total of 4 clock cycles instead of 3. With MOVW+MOVT, each instruction takes one clock cycle. But if you're only loading a single 32-bit immediate value, I recommend using MOVW+MOVT.