Code reuse attacks: the compiler story

May 23, 2019

How does a developer prevent an attacker from taking control of a program when the developer is providing the tools to the attacker? What are code reuse attacks and how can the Arm Architecture remove the vulnerabilities? How can someone use these architectural security features in their code? How effective are these features? Are they worth the code size increase?

In this blog post we answer these questions and demonstrate how modern compilers can do all the heavy lifting!

Code Reuse Attacks: A history

Attacking a piece of software was once as straightforward as finding a buffer overflow exploit, filling the buffer with arbitrary code to execute and replacing the return address to point to the beginning of this new code. Fortunately, we now prevent areas of memory from being both writable and executable: either an attacker cannot overwrite the code that exists, or they cannot execute the code they have injected.

Not to be deterred, attackers kept looking for exploits and found code reuse attacks. These attacks chain together several small snippets of code that already exist in a program. The attack still has to begin with some buffer overflow exploit, but no new code needs to be injected.

Imagine a program running as root on a server. Once in control of the flow of execution, an attacker can string together a sequence of gadgets where every operation operates with elevated privileges. If the attacker can launch a shell, they’ve won.

For the rest of this blog post to make sense, we first need to discuss what gadgets are and why they are dangerous.

Gadgets: What are they - and why are they dangerous?

Gadgets are sequences of instructions that end in an indirect branch, that is, an instruction that branches to an address stored in a register. In the AArch64 instruction set, these instructions are:

BR <Rn>
BLR <Rn>
RET (BR LR)

Shorter gadgets are more useful to an attacker. An ideal gadget is only 2 or 3 instructions long. As it gets longer, it becomes unwieldy, hard to reason about and difficult to use. A useful gadget performs some operation such as arithmetic, memory reads and writes, function calls and so on.

One of the most important things to note is that these gadgets exist in every program! They occur in both hand-written assembly and compiler generated code. Let’s look at the following assembly:

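foo:
    SUB SP, SP, #16        // allocate 16 bytes of stack
    STR X0, [SP, #8]       // spill the pointer argument
    STR W1, [SP, #4]       // spill the integer argument
    LDR W1, [SP, #4]       // reload the integer
    LDR X0, [SP, #8]       // reload the pointer
    STR W1, [X0]           // store the integer through the pointer
    LDR W0, [SP, #4]       // potential gadget starts here: load W0 from the stack
    ADD SP, SP, #16        // release the stack frame
    RET                    // branch to the address in the link register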

The final 3 instructions are a potential gadget. A value is loaded into W0 from the stack, the stack pointer is increased and the gadget branches to the value in the link register. A longer gadget could just as well include any number of the preceding instructions in the function.

Each gadget is harmless on its own; it is the composition of gadgets that makes them attractive to an attacker. We really need to look at how somebody might join these gadgets together! Let’s look at some code reuse attacks.

ROP Attacks

Return-Oriented Programming (ROP) is a code reuse attack. Each gadget used in the attack ends in a return instruction, employing the return register (link register) to control the flow of execution.

The following figure helps illustrate how a ROP attack operates.

[Figure: ROP attack]

In this attack the stack is loaded with the addresses of the gadgets in order of execution, any data that a gadget requires, and any padding that may be necessary between executing one gadget and the next.

Each gadget executes, consumes its data from the stack, pops the next address off the stack and “returns” to the next gadget in the sequence.
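To make that flow concrete, here is a minimal sketch in Python of how a ROP chain walks the stack. It is a toy model rather than exploit code: the gadget functions, their addresses (0x1000 and so on) and the register dictionary are all hypothetical stand-ins for short instruction sequences ending in RET.

```python
# Toy model of a ROP chain. Each Python function stands in for a few
# instructions followed by RET. All names and addresses are hypothetical.

def gadget_load(state, stack):
    # Stands in for "pop one data word from the stack into X0, then RET".
    state["x0"] = stack.pop(0)

def gadget_add(state, stack):
    # Stands in for "ADD X0, X0, #1, then RET" (consumes no stack data).
    state["x0"] += 1

def gadget_store(state, stack):
    # Stands in for "store X0 somewhere useful, then RET".
    state["result"] = state["x0"]

# Hypothetical addresses at which the gadgets live in the victim program.
GADGETS = {0x1000: gadget_load, 0x2000: gadget_add, 0x3000: gadget_store}

def run_chain(stack):
    """Pop an address, run the gadget found there, repeat until the stack is empty."""
    state = {"x0": 0, "result": None}
    trace = []
    while stack:
        address = stack.pop(0)          # the "return" into the next gadget
        trace.append(address)
        GADGETS[address](state, stack)  # the gadget may pop its own data
    return state, trace

# The attacker-controlled stack: gadget addresses interleaved with data.
attacker_stack = [0x1000, 41, 0x2000, 0x3000]
final_state, trace = run_chain(list(attacker_stack))
```

Note that no new code is introduced by the attacker here: only the contents of the stack decide which existing snippets run and in which order.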

Preventing ROP Attacks

The Armv8.3-A Pointer Authentication extension introduced new features to protect the integrity of data and instructions. We’re most interested in those that focus on protecting the return register. These instructions are:

PACIASP, PACIBSP - sign the return register
AUTIASP, AUTIBSP - authenticate the return register
RETAA, RETAB - authenticate the return register and return, fused into one instruction

Pointer signing works by creating a pointer authentication code (PAC) from the value in the input register, some secret key value and some context. In the case of PACIASP/PACIBSP these values are the return address in the link register, the value in the A/B key system registers and the value of the stack pointer. The PAC is then stored in the top bits of the link register.

When authenticating, another PAC is constructed from the same registers and if this matches the PAC in the register then the PAC is removed from the value in the register. Otherwise, the value is corrupted such that a translation fault will occur when the value is used in a branching instruction.
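The sign and authenticate steps can be sketched as a small Python model. This is only an illustration under loud assumptions: HMAC-SHA256 stands in for the hardware's PAC algorithm, the pointer is assumed to have 48 address bits with a 16-bit PAC above them, and a failed authentication is modelled by setting a single "poison" bit. The function names and the key value are hypothetical.

```python
import hashlib
import hmac

ADDRESS_BITS = 48                       # assumption: 48-bit virtual addresses
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
PAC_BYTES = 2                           # assumption: a 16-bit PAC in the top bits
POISON_BIT = 1 << 62                    # toy marker for a failed authentication

def compute_pac(address, context, key):
    """Derive a PAC from the address, the context (e.g. SP) and a secret key."""
    message = address.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()  # stand-in algorithm
    return int.from_bytes(digest[:PAC_BYTES], "little")

def sign(pointer, context, key):
    """Model of PACIASP: place the PAC in the top bits of the pointer."""
    address = pointer & ADDRESS_MASK
    return (compute_pac(address, context, key) << ADDRESS_BITS) | address

def authenticate(signed_pointer, context, key):
    """Model of AUTIASP: strip the PAC if it matches, otherwise poison the pointer."""
    address = signed_pointer & ADDRESS_MASK
    if signed_pointer >> ADDRESS_BITS == compute_pac(address, context, key):
        return address
    return address | POISON_BIT         # would fault when used as a branch target

A_KEY = b"hypothetical A key"
return_address = 0x0000_0000_0040_1234
stack_pointer = 0x0000_7FFF_FFFF_F000

signed_lr = sign(return_address, stack_pointer, A_KEY)
restored = authenticate(signed_lr, stack_pointer, A_KEY)
```

Because the PAC depends on the address, the key and the stack pointer together, an attacker who overwrites only the address bits has to guess the matching PAC, which in this toy model succeeds only about once in 65,536 attempts.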

The architecture provides two keys, A and B, for signing return addresses. One could imagine this allowing one program to sign and authenticate its own addresses but not those of another. A typical use of this would be to have userspace programs use the A key whilst the kernel uses the B key, which avoids needing to clear and set registers on context switches.

The PAC and AUT instructions listed above live in the encoding space that previous architectures reserved as NOP space, so they behave as NOPs on earlier implementations and code using them remains backwards compatible with Armv8-A architectures. The fused authenticate and return instructions (RETAA and RETAB) are not part of the NOP space and can only be used on Armv8.3-A and later.

Pointer authentication, or return address signing, is best displayed with an example:

foo: 
  SUB SP, SP, #32
  STP X29, X30, [SP, #16]
  ADD X29, SP, #16
  STR X0, [SP, #8]
  LDR X0, [SP, #8]
  BL bar
  LDP X29, X30, [SP, #16]
  ADD SP, SP, #32
  RET

The code pushes the return address to the stack, branches to bar, pops the return address off the stack into the return register and returns. Unfortunately, we have no way of knowing if the return address has been corrupted. So, trying again with pointer authentication:

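foo:
  PACIASP                      // sign LR using the A key, with SP as context
  SUB SP, SP, #32              // allocate the stack frame
  STP X29, X30, [SP, #16]      // save the frame pointer and the signed LR
  ADD X29, SP, #16             // set up the frame pointer
  STR X0, [SP, #8]
  LDR X0, [SP, #8]
  BL bar                       // call bar (overwrites LR)
  LDP X29, X30, [SP, #16]      // restore the frame pointer and the signed LR
  ADD SP, SP, #32              // release the frame; SP is back to its value at PACIASP
  AUTIASP                      // authenticate LR; corrupt it if the PAC does not match
  RET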

Now the return address is signed before being saved to the stack. When it’s loaded back into the return address register the PAC will be recomputed from the current value of the link register, the A key and the stack pointer. If the return address has been corrupted in some way, it’s unlikely that the PAC will still match, so the return register will be forced to a value that ensures a translation fault. If all is fine, execution will continue as expected.

JOP Attacks

ROP attacks have been dealt with, but unfortunately there is more than one kind of code reuse attack. Another, more sophisticated, type of code reuse attack is the jump-oriented programming (JOP) attack.

With the link register protected by return address signing, an attacker looks for another means to control the flow of execution. There are two remaining branching instructions, BR and BLR. These instructions can branch using any register, so if the target register can be loaded with some attacker-controlled value then it's vulnerable to attack.

This brings us to looking at JOP attacks. Again, better explained with a diagram:

[Figure: JOP attack]

There are three components to this attack:

Dispatcher table - a location in memory that holds the addresses of the functional gadgets and the data for the attack. The attacker constructs it by overflowing a buffer with addresses and data.
Dispatcher gadget - a gadget that can iterate through the dispatcher table and branch to the next functional gadget in it. An attacker must find this gadget in memory; its access pattern dictates the shape of the dispatcher table and the locations of its entries.
Functional gadgets - any gadget that performs some operation (for example arithmetic or a memory read/write) and ends in an indirect branch instruction that branches back to the dispatcher gadget.

We can draw parallels between JOP and ROP attacks. In a ROP attack the return address controls the flow of execution but in a JOP attack the flow is controlled via some other register and updated by f(ptr). The dispatcher table that we see in a JOP attack exists on the stack in a ROP attack.
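The interplay of the three components can be sketched with another toy Python model. As before, this is an illustration only: the gadget functions and their addresses (0x4000, 0x5000) are hypothetical, and f(ptr) is assumed to be the simplest possible update, "advance to the next table entry".

```python
# Toy model of a JOP attack's control flow. All names and addresses are hypothetical.

def gadget_set(state):
    # Stands in for "MOV X0, #40" followed by an indirect branch back to the dispatcher.
    state["x0"] = 40

def gadget_inc(state):
    # Stands in for "ADD X0, X0, #1" followed by an indirect branch back to the dispatcher.
    state["x0"] += 1

# Functional gadgets found in the victim program, keyed by (made-up) address.
FUNCTIONAL_GADGETS = {0x4000: gadget_set, 0x5000: gadget_inc}

def dispatcher_gadget(table, state):
    """Walk the dispatcher table, branching to each functional gadget in turn."""
    ptr = 0
    visited = []
    while ptr < len(table):
        target = table[ptr]                # load the next gadget address
        visited.append(target)
        FUNCTIONAL_GADGETS[target](state)  # indirect branch; the gadget branches back
        ptr += 1                           # f(ptr): assumed to be "next entry"
    return state, visited

# The dispatcher table the attacker wrote into memory via the overflow.
dispatcher_table = [0x4000, 0x5000, 0x5000]
final_state, visited = dispatcher_gadget(dispatcher_table, {"x0": 0})
```

The link register never takes part in this loop, which is why return address signing alone does not stop it.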

Preventing JOP Attacks

Although there are similarities in both the attacks, the tactic to prevent ROP attacks won’t work to prevent JOP attacks: there is no address to preserve across a function call. Instead, observe that the JOP attack takes advantage of the freedom that allows any indirect branch to legally land anywhere in the program. What if we could ensure that indirect branches can only go to certain places?

Armv8.5-A introduces a new feature that ensures indirect branches must land on corresponding instructions. This is the branch target identification (BTI) mechanism, which pairs every indirect branch instruction with a corresponding legal landing instruction:

BTI <target> where target is one of:

‘c’ : Target of indirect calls (BLR Rn).
‘j’ : Target of indirect jumps (BR Rn).
‘jc’: Target of indirect jumps or calls.

Almost all other instructions^[5] are invalid branch targets and branching to an incompatible instruction raises a branch target exception.

All the BTI instructions are in the NOP space which means binaries protected with BTI are backward compatible.

Forcing branches to land on certain instructions makes it difficult to find desirable gadgets. Most shorter gadgets no longer exist as they don’t start on a BTI instruction. The gadgets that do start on BTI instructions are often long and manipulate registers and memory in ways that make them incompatible with each other. To make matters worse for an attacker, the dispatcher gadget only has access to a subset of the remaining gadgets: the branching instruction in the dispatcher gadget must match the leading BTI instruction in each functional gadget.

BTI protected code may look as follows:

foo:
  BTI  C                    // branch target
  SUB  SP, SP, #32
  ADRP X8, .Ltmp2
  ADD  X8, X8, :lo12:.Ltmp2
  STR  W0, [SP, #28]
  STR  X8, [SP, #16]
  LDR  X8, [SP, #16]
  STR  X8, [SP, #8]
  B    .LBB0_2
.Ltmp2:
  BTI  J                    // branch target
  LDR  W0, [SP, #28]
  ADD  SP, SP, #32
  RET
.LBB0_2:
  LDR  X8, [SP, #8]
  BR   X8

There are only two valid target instructions to land on in this function: the BTI C at the function entry and the BTI J at .Ltmp2.
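The landing check can be modelled in a few lines of Python. This sketch is a deliberate simplification: it follows only the three BTI variants listed earlier, ignores the other instructions that may also be acceptable landing points, and represents the function above as a plain list of instruction strings. The helper names and the exception class are hypothetical.

```python
# Which indirect branch each BTI variant accepts, per the list in the text.
# Simplifying assumption: no other instruction is a valid landing point.
ACCEPTS = {
    "bti c": {"BLR"},
    "bti j": {"BR"},
    "bti jc": {"BLR", "BR"},
}

class BranchTargetException(Exception):
    """Stands in for the hardware's branch target exception."""

def normalise(instruction):
    # Collapse whitespace and case so "BTI  C" and "bti c" compare equal.
    return " ".join(instruction.split()).lower()

def indirect_branch(branch, program, target):
    """Return the target index if 'branch' may land there, else raise."""
    if branch not in ACCEPTS.get(normalise(program[target]), set()):
        raise BranchTargetException(f"{branch} may not land on '{program[target]}'")
    return target

def valid_targets(program, branch):
    """List every index in the program that 'branch' is allowed to land on."""
    return [i for i, ins in enumerate(program)
            if branch in ACCEPTS.get(normalise(ins), set())]

# The function foo from the listing above, one instruction per entry.
FOO = [
    "BTI  C", "SUB  SP, SP, #32", "ADRP X8, .Ltmp2", "ADD  X8, X8, :lo12:.Ltmp2",
    "STR  W0, [SP, #28]", "STR  X8, [SP, #16]", "LDR  X8, [SP, #16]",
    "STR  X8, [SP, #8]", "B    .LBB0_2", "BTI  J", "LDR  W0, [SP, #28]",
    "ADD  SP, SP, #32", "RET", "LDR  X8, [SP, #8]", "BR   X8",
]

blr_targets = valid_targets(FOO, "BLR")
br_targets = valid_targets(FOO, "BR")
```

In this model an attacker who wants the short LDR/ADD/RET tail of the function as a gadget can no longer branch straight to the LDR; the nearest legal entry is the BTI J in front of it, and only via BR.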

Protecting Your Code

Whilst the Arm architecture provides hardware features to prevent these attacks, a toolchain provides developers a means to effortlessly leverage the protective abilities of these features.

What kind of support exists for Armv8.3-A Pointer Authentication and Armv8.5-A Branch Target Identification in Arm Compiler 6, GNU and LLVM?

Both of these features can be easily used through a single command-line option:

-mbranch-protection=<protection>

Where <protection> can be any combination of:

'pac-ret{+leaf+b-key}', where:
- 'pac-ret' enables return address signing for non-leaf functions using the A-key.
- '+leaf' increases the scope of return address signing to include leaf functions.
- '+b-key' uses B-key instructions to sign addresses instead of A-key instructions.
'bti' protects code using Branch Target Identification.
'standard' turns on all types of branch protection. Currently 'standard' implies 'pac-ret+bti'.
'none' turns off all types of branch protection.

This option is available for AArch64 for Armv8-A architecture onward.
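As a way of reading the option syntax, the following Python sketch interprets a value for -mbranch-protection according to the description above. It models only what this post states and is not the parser of any real compiler; the function name and the result dictionary are hypothetical, and the rule that '+leaf' and '+b-key' must directly follow 'pac-ret' is an assumption drawn from the 'pac-ret{+leaf+b-key}' notation.

```python
def parse_branch_protection(value):
    """Interpret a -mbranch-protection value as described in the post (a model only)."""
    result = {"pac_ret": False, "leaf": False, "b_key": False, "bti": False}
    if value == "none":
        return result                      # all protections off
    if value == "standard":
        value = "pac-ret+bti"              # per the post: 'standard' implies 'pac-ret+bti'
    current = None                         # the protection type last seen
    for token in value.split("+"):
        if token == "pac-ret":
            result["pac_ret"] = True
            current = "pac-ret"
        elif token == "bti":
            result["bti"] = True
            current = "bti"
        elif token in ("leaf", "b-key"):
            # Assumption: these modifiers only make sense right after pac-ret.
            if current != "pac-ret":
                raise ValueError(f"'{token}' must follow 'pac-ret'")
            result["leaf" if token == "leaf" else "b_key"] = True
        else:
            raise ValueError(f"unknown protection '{token}'")
    return result

standard = parse_branch_protection("standard")
everything = parse_branch_protection("pac-ret+leaf+b-key+bti")
```

A build script could use a check like this to reject a mistyped value before it reaches the compiler, although the compiler's own diagnostics remain the authority on what is accepted.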

When compiling for an architecture version of Armv8.3-A or later, the compiler can take advantage of the non-NOP space Pointer Authentication instructions such as RETAA and RETAB to optimize the branch protection code. This option is available in Arm Compiler 6.11^[2], GCC-9.1^[3] and LLVM 7^[4]. GCC-9.1 also introduces a configure option ‘--enable-standard-branch-protection’ to turn on both protections by default.

Pointer Authentication has been available in GCC since GCC-7, without B-key support, via -msign-return-address=[none|non-leaf|all]. From GCC-9.1 onward this option is deprecated and is replaced by the new option. The options translate as follows:

Old Option                        New Option
-msign-return-address=non-leaf    -mbranch-protection=pac-ret
-msign-return-address=all         -mbranch-protection=pac-ret+leaf
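For a build system that still carries the old spelling, the translation can be sketched as a simple lookup in Python. The helper name is hypothetical, and only the two mappings given in the table are included; any other flag, including -msign-return-address=none, is passed through untouched because the table does not cover it.

```python
# The two translations given in the table above; nothing else is assumed.
OLD_TO_NEW = {
    "-msign-return-address=non-leaf": "-mbranch-protection=pac-ret",
    "-msign-return-address=all": "-mbranch-protection=pac-ret+leaf",
}

def modernise_flags(flags):
    """Replace deprecated return-address-signing flags, leaving all others as-is."""
    return [OLD_TO_NEW.get(flag, flag) for flag in flags]

old_flags = ["-O2", "-msign-return-address=all", "-c", "foo.c"]
new_flags = modernise_flags(old_flags)
```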

How Secure is Your Code?

The features discussed have been designed to reduce the number of gadgets available to an attacker. So we now explore just how many gadgets these features remove.

GLIBC is a big library that is ubiquitous throughout C/C++ applications. This makes it an ideal target for attackers looking for vulnerabilities. We start investigating the effectiveness of return address signing and BTI by identifying how many gadgets exist in GLIBC before and after the mitigations are applied.

Finding gadgets in code is incredibly easy and automatable for programs built for AArch64. A scanner needs only to find an indirect branch and then report all gadgets, up to some depth, that end on that instruction. One existing tool is ROPgadget.py^[1]. Running ROPgadget.py on GLIBC gives an insight into the scale of the problem.
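The scanning idea fits in a few lines of Python. This is a minimal sketch and not how ROPgadget.py itself is implemented: it assumes the code has already been disassembled into a list of instruction strings, and the function names are hypothetical. Every suffix of up to max_depth instructions that ends on an indirect branch is reported as a candidate gadget.

```python
INDIRECT_BRANCHES = ("BR", "BLR", "RET")

def is_indirect_branch(instruction):
    """True if the instruction's mnemonic is one of the AArch64 indirect branches."""
    return instruction.split()[0].upper() in INDIRECT_BRANCHES

def find_gadgets(instructions, max_depth=3):
    """Report every run of 1..max_depth instructions that ends on an indirect branch."""
    gadgets = []
    for end, instruction in enumerate(instructions):
        if not is_indirect_branch(instruction):
            continue
        for length in range(1, max_depth + 1):
            start = end - length + 1
            if start < 0:
                break                      # ran off the start of the listing
            gadgets.append(instructions[start:end + 1])
    return gadgets

# The unprotected function foo shown earlier in the post.
FOO = [
    "SUB SP, SP, #16", "STR X0, [SP, #8]", "STR W1, [SP, #4]",
    "LDR W1, [SP, #4]", "LDR X0, [SP, #8]", "STR W1, [X0]",
    "LDR W0, [SP, #4]", "ADD SP, SP, #16", "RET",
]

gadgets = find_gadgets(FOO, max_depth=3)
```

Even this nine-instruction function yields three candidates at depth 3, which hints at why a library the size of GLIBC contains so many.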

[Figure: gadget reduction graph]

There are a huge number of gadgets to be found in an unprotected GLIBC, around 16,500. Yet a compiler using the mitigations we've described is able to reduce this by a whopping 97.65%, leaving only a small fraction of those we had in the beginning. This results in ~200 gadgets, and these are mostly just short functions that are indistinguishable from accidental gadgets. Reducing to this number allowed us to manually inspect the remaining gadgets in GLIBC (specifically the JOP dispatcher gadgets) and reverse engineer the binary to identify coding patterns that lead to a compiler emitting exploitable code.

What About the Side Effects?

This protection is great, but there are going to be costs associated with these benefits. The obvious and initial cost is the increase in code size. What follows is an analysis of this cost:

[Figure: graph showing code size effect on GLIBC]

The graph shows the code size effect on GLIBC is not that bad. Even though turning on both the mitigations leads to a 2.9% code size increase, this increase is less dramatic when compiling with -march=armv8.3-a. Compiling for Armv8.3-A allows the compiler to use fused authenticate and return instructions; in this case the code size increase is only 1.6%.

What's the Outcome?

Whilst the fight against attackers is a continuous effort, this blog shows that we have made substantial progress in mitigating a certain class of attack. Programmers normally have little to no control over the gadgets that appear in their final binary. With the compilers we have developed that use return address signing and BTI, a programmer can rely on the toolchain to reduce the attack surface of their applications.

References

For more reading on ROP and JOP attacks: https://www.comp.nus.edu.sg/~liangzk/papers/asiaccs11.pdf
[1] ROPgadget.py (we modified this tool to fit our requirements): https://github.com/JonathanSalwan/ROPgadget
Compilers with these features:
- [2] Arm Compiler 6.11: https://developer.arm.com/tools-and-software/embedded/arm-compiler/downloads/version-6
- [3] GCC 9.1: https://gcc.gnu.org/gcc-9/
- [4] LLVM 7.0.1: http://releases.llvm.org/download.html#7.0.1
[5] BRK, BTI, HLT, PACIASP and PACIBSP may be valid branch targets. See the individual instructions for details.

Alexei Fedorov over 6 years ago

Hi Luke,

It seems that "+b-key" option is not supported in GCC9.1, but only GCC10.

The following error is generated when compiled with '-mbranch-protection=pac-ret+b-key' with aarch64-none-elf-gcc (fsf-9.37) 9.1.1 20190704:

cc1: error: invalid arg 'b-key' for '-mbranch-protection='