Branch and Call Sequences Explained

September 25, 2013

What Does a Branch Do?

A branch, quite simply, is a break in the sequential flow of instructions that the processor is executing. Some other architectures call them jumps, but they're essentially the same thing. The following is a trivial and hopefully familiar example of a branch:

entry_point:
  mov     r0, #0      @ Set r0 to 0.
  b       target      @ Jump forward to 'target'.
  mov     r0, #1      @ Set r0 to 1.
target:
  ...                 @ At this point, r0 holds the value 0.
  ...                 @ The second mov instruction did not execute.

There are several variants of branches in the Arm and Thumb instruction sets. Several of these variants are shared with many other CPU architectures, but a few are specific to Arm. Each variant is explained in detail below:

Relative and Absolute Branch Targets

A relative branch is one where the target address is calculated based on the value of the current pc (program counter). Given the example above, an assembler would work out that the target label is eight bytes ahead of the b target instruction (in Arm code) and then generate a relative branch which means 'jump forward by eight bytes'. Relative branches are essential for position-independent code, which is expected to run correctly at any location in memory. The most common relative branches on Arm are single instructions and tend to be the most efficient branches available, though they have limited range.
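As a rough illustration of the offset calculation an assembler performs, the following Python sketch encodes an unconditional Arm-state b instruction. The function name and the example addresses are hypothetical. The sketch assumes the classic Arm encoding (a signed 24-bit word offset in the low bits of the instruction) and that pc reads as the address of the branch plus eight, which is why the encoded offset for the example above is zero even though the target is eight bytes ahead.

```python
def encode_arm_b(branch_addr, target_addr):
    """Return the 32-bit encoding of an unconditional Arm-state 'b'.

    Hypothetical helper for illustration only. Both arguments are byte
    addresses and must be 4-byte aligned.
    """
    if branch_addr % 4 or target_addr % 4:
        raise ValueError("Arm instructions are 4-byte aligned")
    # In Arm state, pc reads as the address of the branch plus 8.
    offset = target_addr - (branch_addr + 8)
    # The instruction stores the offset in words, as a signed 24-bit field.
    imm24 = offset >> 2
    if not -(1 << 23) <= imm24 < (1 << 23):
        raise ValueError("target is out of range for a single 'b'")
    # 0xEA000000 is the 'always' condition plus the 'b' opcode bits.
    return 0xEA000000 | (imm24 & 0x00FFFFFF)


# The example above, assuming entry_point is placed at address 0:
# 'b target' sits at 0x4 and 'target' at 0xC, eight bytes ahead.
print(hex(encode_arm_b(0x4, 0xC)))        # 0xea000000
# A branch to itself needs an offset of -8 bytes (-2 words).
print(hex(encode_arm_b(0x1000, 0x1000)))  # 0xeafffffe
```

Because only the difference between the two addresses is encoded, the same instruction word works wherever the code is loaded, which is what makes relative branches suitable for position-independent code.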

An absolute branch will always jump to the specified address, regardless of the current pc. Absolute branches are used when the address of the target is provided as a function pointer, for example. However, because an absolute branch requires a full 32-bit target address, absolute branches usually require a load or some other constant-loading mechanism in addition to the branch instruction itself.

In many cases, the programmer (or compiler) may not actually care whether a branch is relative or absolute, and might just use whichever is most efficient on a case-by-case basis.

Branch Range

Because the Arm instruction set is fixed-width at 32 bits (and Thumb has either 16 or 32 bits), it is not possible to encode a full 32-bit branch offset in a single instruction. Relative branches can be encoded using a limited-range offset from the current pc. In assembly code, this is usually written as a branch to a label (as in the example above). The assembler will work out the required offset.

The range available varies between Arm and Thumb (and in a few cases also between instruction variants) but is usually very large (about ±32MB for an Arm-state b or bl, for example) and quite sufficient for most branches within a program. By using various combinations of additional instructions and literal pool loads, it is also possible to construct arbitrary full-range branches in case the single-instruction range is not sufficient. All practical absolute branches are necessarily full-range, since a 32-bit target address needs to be loaded.
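The range check itself can be sketched in a few lines of Python. This is only a model, and the names are hypothetical. It assumes the Arm-state b encoding, whose signed 24-bit word offset gives a reach of roughly ±32MB from the pc value (the branch address plus eight), and it falls back to the literal-pool form when that is not enough.

```python
ARM_PC_OFFSET = 8  # In Arm state, pc reads 8 bytes ahead of the branch.


def arm_b_in_range(branch_addr, target_addr):
    """True if a single Arm-state 'b' at branch_addr can reach target_addr."""
    offset = target_addr - (branch_addr + ARM_PC_OFFSET)
    # Signed 24-bit word offset: -2^25 to +2^25 - 4 bytes, word aligned.
    return offset % 4 == 0 and -(1 << 25) <= offset <= (1 << 25) - 4


def choose_branch(branch_addr, target_addr):
    """Pick a branch form (hypothetical policy, for illustration)."""
    if arm_b_in_range(branch_addr, target_addr):
        return "b label"
    # Full-range fallback: load the target address from a literal pool.
    return "ldr pc, =address"


print(choose_branch(0x8000, 0x9000))      # b label
print(choose_branch(0x8000, 0x40000000))  # ldr pc, =address
```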

Function Calls

Almost every modern programming language has some concept of functions. Any given function can (in general) be called from any part of a program, so processor architectures need some way to store the address of the caller. On Arm processors, this return address is stored in lr (the link register). Branch instructions with an l suffix -- like bl and blx -- work just like a standard b or bx branch, but also store a return address in lr.

If a function does not modify lr, then the return sequence can (and should) be a simple "bx lr". Otherwise, lr can be pushed onto the stack at function entry. In that case, the best return sequence is usually to pop directly into pc, though a number of other options are possible depending on the situation.
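To see why a function that makes its own calls has to save lr, here is a small Python model of the two return sequences described above. The Core class and all addresses are hypothetical. The model assumes 4-byte Arm instructions and ignores interworking.

```python
class Core:
    """Tiny, hypothetical model of pc, lr and a stack."""

    def __init__(self, pc=0):
        self.pc = pc
        self.lr = None
        self.stack = []

    def bl(self, target):
        # Store the address of the next instruction in lr, then branch.
        self.lr = self.pc + 4
        self.pc = target

    def bx_lr(self):
        # Return to whatever address lr currently holds.
        self.pc = self.lr

    def push_lr(self):
        self.stack.append(self.lr)

    def pop_pc(self):
        self.pc = self.stack.pop()


def call_leaf():
    core = Core(pc=0x100)
    core.bl(0x200)   # Call a leaf function; lr = 0x104.
    core.bx_lr()     # The leaf never touched lr, so 'bx lr' is enough.
    return core.pc


def call_nested():
    core = Core(pc=0x100)
    core.bl(0x200)   # Call the outer function; lr = 0x104.
    core.push_lr()   # The outer function saves lr at entry.
    core.pc = 0x210  # It runs until it reaches its own call...
    core.bl(0x300)   # ...which overwrites lr with 0x214.
    core.bx_lr()     # The inner function returns to 0x214.
    core.pop_pc()    # The outer function returns by popping into pc.
    return core.pc


print(hex(call_leaf()))    # 0x104
print(hex(call_nested()))  # 0x104
```

Without the push in call_nested, the outer function's "bx lr" would jump back to 0x214 rather than to its caller.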

Interworking Branches (Between Arm and Thumb Code)

Programs on Arm processors can use either the Arm or Thumb instruction set, or both. Whilst Arm and Thumb instructions cannot be directly interleaved, it is possible to switch (or interwork) between Arm and Thumb states at run-time. This interworking is most notably achieved using special branch instructions with an x suffix, like bx and blx. Several other branch mechanisms are also capable of interworking. For example, the return sequence which writes to the pc using pop (or any other load) can interwork, and will always return in the appropriate state.

Branch instructions fall into three classes: instructions that never change state (like "b label"), instructions that always change state (like "blx label"), and instructions that automatically change state based on the target address (like "bx register").

Address-based interworking uses the lowest bit of the address to determine the instruction set at the target. If the lowest bit is 1, the branch will switch to Thumb state. If the lowest bit is 0, the branch will switch to Arm state. Note that the lowest bit is never actually used as part of the address as all instructions are either 4-byte aligned (as in Arm) or 2-byte aligned (as in Thumb).
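The rule is simple enough to model directly. The following Python sketch (the function name is hypothetical) splits a branch target into the instruction set state selected by bit 0 and the address that is actually fetched from. It only clears bit 0 and does not check that an Arm target is 4-byte aligned.

```python
def interwork_target(address):
    """Return (state, fetch_address) for an address-based interworking branch."""
    # Bit 0 selects the instruction set; it is not part of the address.
    state = "Thumb" if address & 1 else "Arm"
    return state, address & ~1


print(interwork_target(0x8001))  # ('Thumb', 32768)
print(interwork_target(0x8000))  # ('Arm', 32768)
```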

Arm Branch Instructions

The following table lists the branch instructions commonly used on Arm processors:

Instruction	Relativity	Linkage	Interworking	Notes
b label	Relative	Simple (none)	Never
bx register	Absolute	Simple (none)	Address-based ²
bl label	Relative	Function call (lr)	Never	Note that assemblers will generally select between bl label and blx label automatically, regardless of which instruction you use.
blx label	Relative	Function call (lr)	Always	Note that assemblers will generally select between bl label and blx label automatically, regardless of which instruction you use.
blx register	Absolute	Function call (lr)	Address-based ²
pop {..., pc}	Absolute	Simple (none)	Address-based ² (since Armv5T)	A common return sequence in cases where lr has been pushed onto the stack at the start of the function.
ldr pc, =address	Absolute	Simple (none)	Address-based ² (since Armv5T)	Load from a literal pool directly into pc.

It is also possible to write into the pc using arithmetic instructions, but this is useful only in specific cases¹, and use of the normal branch instructions is advisable where possible.

Most of the interworking branches were added in Armv5T. The only way to interwork on Armv4T was to use the bx instruction. Armv4T interworking branch sequences are often much less efficient than the Armv5T versions, so it's best to use Armv5T branches unless you really need Armv4T compatibility.

Using More Complex Branches

To encode more complex branches than those listed above, a combination of instructions must be used. In cases like this, where the target address must be calculated in advance of the branch instruction, the normal methods for loading and calculating values are used. Arithmetic might be used for long-range relative branches, for example, and a constant pool load might be used for an absolute branch.

Thumb-2 Special-Purpose Branches

Finally, there are a few branches available specifically in the Thumb-2 instruction set that are designed for specific use-cases. These are not available to the Arm instruction set (or to the original Thumb instruction set), and so I will give them only a brief mention, but if you're writing Thumb-2 code they can be very useful. For further details, refer to the Armv7-A/R Architecture Reference Manual.

For each special-purpose branch, I will also give a roughly equivalent Arm implementation. The Arm implementations have different limitations (such as branch range) and have other side effects (such as requiring a scratch register). Nevertheless, they should serve to clarify the behaviour of the Thumb-2 instructions.

`cbnz` and `cbz`

The cbnz (compare, branch on non-zero) and cbz (compare, branch on zero) instructions are useful for very short-range forward branches, such as loop terminations, that would otherwise require two or more instructions. The two-instruction version is still available, of course, and may be useful if more range is required, or if a more complicated comparison is required.

Arm Implementation              Thumb-2 Implementation

cmp     rA, #0                  cbz   rA, label
beq     label

cmp     rA, #0                  cbnz  rA, label
bne     label
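To make "very short-range" concrete, the Python sketch below checks whether a cbz or cbnz can reach a given target. The function name is hypothetical. The sketch assumes the Thumb-2 encoding allows only even, forward offsets from 0 to 126 bytes, measured from the pc value (the instruction's address plus four).

```python
def cbz_can_reach(instr_addr, target_addr):
    """True if a cbz/cbnz at instr_addr can branch to target_addr."""
    # In Thumb state, pc reads as the instruction address plus 4.
    offset = target_addr - (instr_addr + 4)
    # Forward only, halfword aligned, at most 126 bytes.
    return offset % 2 == 0 and 0 <= offset <= 126


print(cbz_can_reach(0x100, 0x120))  # True
print(cbz_can_reach(0x100, 0x0F0))  # False: backward branch
print(cbz_can_reach(0x100, 0x200))  # False: too far ahead
```

When the check fails, the two-instruction cmp and beq (or bne) sequence is the usual fallback.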

`tbb` and `tbh`

The tbb (table branch byte) and tbh (table branch halfword) instructions are useful for the implementation of jump tables. One argument register is a base pointer to a table, and the second argument is an index into the table. The value loaded from the table is then doubled and added to the pc.

Arm Implementation              Thumb-2 Implementation

ldrb    ip, [rA, rB]            tbb   rA, rB
add     pc, pc, ip, lsl #1

ldrh    ip, [rA, rB, lsl #1]    tbh   rA, rB, lsl #1
add     pc, pc, ip, lsl #1
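The address arithmetic can be modelled in Python as follows. The helper names and the example table are hypothetical. The sketch assumes Thumb state, where pc reads as the instruction's address plus four, and it treats the table as a plain list of unsigned entries.

```python
def build_branch_table(instr_addr, targets, entry_bits=8):
    """Build table entries for tbb (entry_bits=8) or tbh (entry_bits=16)."""
    pc = instr_addr + 4  # Thumb pc value seen by the table branch.
    table = []
    for target in targets:
        offset = target - pc
        if offset < 0 or offset % 2:
            raise ValueError("targets must be forward and halfword aligned")
        entry = offset // 2  # Entries store half the byte offset.
        if entry >= (1 << entry_bits):
            raise ValueError("target too far for this entry size")
        table.append(entry)
    return table


def table_branch_target(instr_addr, table, index):
    """Address reached by a tbb/tbh: the entry is doubled and added to pc."""
    pc = instr_addr + 4
    return pc + 2 * table[index]


cases = [0x1010, 0x1020, 0x1040]  # Hypothetical case labels.
table = build_branch_table(0x1000, cases)
print(table)                                       # [6, 14, 30]
print(hex(table_branch_target(0x1000, table, 1)))  # 0x1020
```

Storing halved offsets is what lets a byte-sized entry cover up to 510 bytes of code and a halfword entry cover much larger switch statements.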

¹A typical example of where arithmetic-based branches are useful is in the implementation of jump tables, but they are occasionally useful in other cases.

²Bit 0 of the address indicates the instruction set of the target. If 1, the target is Thumb. If 0, the target is Arm.

Jacob Bramley over 4 years ago in reply to marcusob

Sorry, I didn't see this when it was posted!

By "ldr PC,lr+4" I think you mean "mov pc, lr + 4". However, the +4 isn't necessary; the call sequence sets up lr so that it points to the next instruction. The callee could do a simple "mov pc, lr" to return, but it would be much better to use "bx lr" for this.

Regarding the +2 vs +4 offset: Arm instructions read PC+8 because the original Arm processors had a three-stage pipeline, so the PC value read by an instruction was two instructions (or eight bytes) ahead. The original Thumb instruction set only had two-byte instructions, so it read PC+4. When four-byte Thumb instructions were introduced, this behaviour was preserved, so Thumb always reads PC+4 irrespective of the size of the instruction used to read it.

In your specific case, you have a calling function written in C, which will almost certainly use "bl" or "blx" to call your assembly function. Your assembly function doesn't have to know exactly how this was done, because "bl" and "blx" will both set LR to the next instruction. Your assembly function should just use "bx lr" to return.
marcusob over 5 years ago

If you want to jump back to a calling function in C from assembly, but to the next instruction, we do a ldr PC,lr+4 for Arm. But if we want to jump back to the next instruction and it's Thumb, it could be lr+2 or lr+4, as Thumb-2 instructions can be 2 or 4 bytes. So how do we know whether the compiler of the C function used a 2-byte or 4-byte instruction, and therefore whether to add 2 or 4 to lr?
