How does a developer prevent an attacker from taking control of a program when the developer is providing the tools to the attacker? What are code reuse attacks and how can the Arm Architecture remove the vulnerabilities? How can someone use these architectural security features in their code? How effective are these features? Are they worth the code size increase?
In this blog post we answer these questions and demonstrate how modern compilers can do all the heavy lifting!
Attacking a piece of software was once as straightforward as finding a buffer overflow exploit, filling the buffer with arbitrary code to execute and replacing the return address to point to the beginning of this new code. Fortunately, we now prevent areas of memory from being both writable and executable: an attacker either cannot overwrite the code that already exists or cannot execute the code they have injected.
Undeterred, attackers kept looking for exploits and found code reuse attacks. These attacks rely on chaining together several small snippets of code that already exist in a program. It's still necessary to begin the attack through some buffer overflow exploit, but no new code needs to be injected.
Imagine a program running as root on a server. Once in control of the flow of execution, an attacker can string together a sequence of gadgets where every operation operates with elevated privileges. If the attacker can launch a shell, they’ve won.
For the rest of this blog post to make sense, we first need to discuss what gadgets are and why they are dangerous.
Gadgets are sequences of instructions that end in an indirect branch, that is an instruction that branches to an address stored in a register. In the AArch64 instruction set, these instructions are:
BR <Rn>
BLR <Rn>
RET (BR LR)
Shorter gadgets are more useful to an attacker. An ideal gadget is only 2 or 3 instructions long. As it gets longer, it starts to become unwieldy, hard to reason about and difficult to use. A useful gadget performs some operation like arithmetic, memory read/writes, function calls and so on.
One of the most important things to note is that these gadgets exist in every program! They occur in both hand-written assembly and compiler generated code. Let’s look at the following assembly:
foo:
    SUB SP, SP, #16
    STR X0, [SP, #8]
    STR W1, [SP, #4]
    LDR W1, [SP, #4]
    LDR X0, [SP, #8]
    STR W1, [X0]
    LDR W0, [SP, #4]
    ADD SP, SP, #16
    RET
The final 3 instructions are a potential gadget. A value is loaded into W0 from the stack, the stack pointer is increased and the gadget branches to the address in the link register. Any number of the preceding instructions in the function could also have been included in the gadget.
Each gadget is harmless on its own; it is the composition of gadgets that makes them attractive to an attacker. To see why, we need to look at how somebody might join these gadgets together, so let’s look at some code reuse attacks.
Return-oriented programming (ROP) is a code reuse attack. Each gadget used in the attack ends in a return instruction, employing the return register (link register) to control the flow of execution.
The following figure helps illustrate how a ROP attack operates.
In this attack the stack is loaded up with the addresses of gadgets, in order of execution, any data that is required for a gadget and the padding that may be necessary between executing one gadget and the next.
Each gadget executes, consumes its data from the stack, pops the next address off the stack and “returns” to the next gadget in the sequence.
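The mechanism can be sketched as a toy model. This is written in Python rather than AArch64 assembly so that it runs anywhere; the gadget names, addresses and two-register file are illustrative assumptions, not part of any real exploit or tool.

```python
# Toy model of a ROP chain: each "gadget" performs one small operation and then
# "returns" by letting the loop pop the next address off the attacker's stack.
# Gadget names, addresses and the register file are illustrative assumptions.

def run_rop_chain(stack, gadgets):
    """Execute gadgets in the order their addresses appear on the stack."""
    regs = {"x0": 0, "x1": 0}
    trace = []
    stack = list(stack)            # copy so the caller's list is untouched
    while stack:
        address = stack.pop(0)     # the "return address" consumed by RET
        if address not in gadgets:
            break                  # real hardware would fault or run other code
        name, operation = gadgets[address]
        operation(regs, stack)     # a gadget may consume data words from the stack
        trace.append(name)
    return regs, trace


def pop_x0(regs, stack):
    regs["x0"] = stack.pop(0)      # models: load X0 from the stack, then RET


def pop_x1(regs, stack):
    regs["x1"] = stack.pop(0)      # models: load X1 from the stack, then RET


def add_x0_x1(regs, stack):
    regs["x0"] += regs["x1"]       # models: ADD X0, X0, X1, then RET


GADGETS = {
    0x1000: ("pop_x0", pop_x0),
    0x2000: ("pop_x1", pop_x1),
    0x3000: ("add_x0_x1", add_x0_x1),
}

# Attacker-written stack: a gadget address followed by the data it consumes.
payload = [0x1000, 40, 0x2000, 2, 0x3000]
regs, trace = run_rop_chain(payload, GADGETS)
print(regs["x0"], trace)
```

None of the three gadgets does anything interesting alone; the computation only emerges from the order in which the attacker arranged their addresses and data on the stack.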
The Armv8.3-A Pointer Authentication extension introduced new features to protect the integrity of data and instructions. We’re most interested in those that focus on protecting the return register. These instructions are:
PACIASP,PACIBSP
AUTIASP,AUTIBSP
RETAA,RETAB
Pointer signing works by creating a pointer authentication code (PAC) from the value in the input register, some secret key value and some context. In the case of PACIASP/PACIBSP these values are the return address in the link register, the value in the A/B key system registers and the value of the stack pointer. The PAC is then stored in the top bits of the link register.
When authenticating, another PAC is constructed from the same registers and if this matches the PAC in the register then the PAC is removed from the value in the register. Otherwise, the value is corrupted such that a translation fault will occur when the value is used in a branching instruction.
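The sign and authenticate steps can be modelled in a few lines of Python. This is only a sketch under stated assumptions: HMAC-SHA256 stands in for the hardware PAC algorithm, the address width is assumed to be 48 bits with the PAC filling the 16 bits above it, a failed authentication is modelled by returning `None` rather than by corrupting the pointer, and the key value is hypothetical.

```python
import hashlib
import hmac

VA_BITS = 48                       # assumed virtual address width
ADDR_MASK = (1 << VA_BITS) - 1     # low bits hold the real address


def compute_pac(address, key, context):
    """Stand-in for the hardware PAC function (assumption: truncated HMAC)."""
    message = address.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")    # keep 16 bits


def sign(pointer, key, context):
    """Model of PACIASP: place the PAC in the unused top bits."""
    address = pointer & ADDR_MASK
    return (compute_pac(address, key, context) << VA_BITS) | address


def authenticate(signed_pointer, key, context):
    """Model of AUTIASP: strip the PAC if it matches, otherwise fail."""
    address = signed_pointer & ADDR_MASK
    pac = signed_pointer >> VA_BITS
    if pac == compute_pac(address, key, context):
        return address
    return None                    # hardware would corrupt the pointer instead


key_a = b"hypothetical-A-key"      # illustrative stand-in for the A key
link_register = 0x0000_0000_0040_1234
stack_pointer = 0x0000_7FFF_FFFF_E000

signed_lr = sign(link_register, key_a, stack_pointer)
restored = authenticate(signed_lr, key_a, stack_pointer)
flipped = authenticate(signed_lr ^ (1 << VA_BITS), key_a, stack_pointer)
print(hex(restored), flipped)
```

Using the stack pointer as context means that a correctly signed return address copied from one stack frame is unlikely to authenticate in a frame at a different stack address.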
The architecture provides two keys, A and B, for signing return addresses. One could imagine this allowing one program to sign and authenticate its own addresses but not those of another. A typical use would be for userspace programs to use the A key whilst the kernel uses the B key, which avoids the need to clear and set registers on context switches.
The pac and aut instructions live in the encoding space that has been reserved for NOP space in previous architectures; they are backwards compatible with Armv8-A architectures. The fused authenticate and return (RETA) instruction is not part of the NOP space and can only be used on Armv8.3-A architecture and onward.
Pointer authentication, or return address signing, is best displayed with an example:
foo:
    SUB SP, SP, #32
    STP X29, X30, [SP, #16]
    ADD X29, SP, #16
    STR X0, [SP, #8]
    LDR X0, [SP, #8]
    BL bar
    LDP X29, X30, [SP, #16]
    ADD SP, SP, #32
    RET
The code pushes the return address to the stack, branches to bar, pops the return address off the stack into the return register and returns. Unfortunately, we have no way of knowing if the return address has been corrupted. So, trying again with pointer authentication:
foo:
    PACIASP
    SUB SP, SP, #32
    STP X29, X30, [SP, #16]
    ADD X29, SP, #16
    STR X0, [SP, #8]
    LDR X0, [SP, #8]
    BL bar
    LDP X29, X30, [SP, #16]
    ADD SP, SP, #32
    AUTIASP
    RET
Now the return address is signed before being saved to the stack. When it’s loaded back into the link register, the PAC is recomputed from the current value of the link register, the A key and the stack pointer. If the return address has been corrupted in some way, it’s unlikely that the PAC will still match, so the link register is forced to a value that ensures a translation fault. If all is fine, execution continues as expected.
ROP attacks have been dealt with, but unfortunately there is more than one kind of code reuse attack. Another, more sophisticated kind is the jump-oriented programming (JOP) attack.
With the link register protected by return address signing, an attacker looks for another means to control the flow of execution. There are two remaining branching instructions, BR and BLR. These instructions can branch using any register, so if the target register can be loaded with some attacker-controlled value then it's vulnerable to attack.
This brings us to looking at JOP attacks. Again, better explained with a diagram:
There are three components to this attack:
We can draw parallels between JOP and ROP attacks. In a ROP attack the return address controls the flow of execution but in a JOP attack the flow is controlled via some other register and updated by f(ptr). The dispatcher table that we see in a JOP attack exists on the stack in a ROP attack.
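The control flow of a JOP attack can be sketched in the same toy style as a ROP chain. In this Python model the dispatcher gadget advances a pointer with f(ptr), looks up the next address in the dispatcher table and branches to it, and each functional gadget hands control back to the dispatcher. The addresses, gadget operations and the choice of f(ptr) = ptr + 1 are illustrative assumptions.

```python
# Toy model of a JOP attack. Control never passes through a return address:
# the dispatcher gadget computes ptr = f(ptr), loads a target from the
# dispatcher table and branches to it; each functional gadget then branches
# back to the dispatcher. All names and addresses here are assumptions.

def run_jop(dispatcher_table, functional_gadgets, f=lambda ptr: ptr + 1):
    regs = {"x0": 0}
    trace = []
    ptr = -1                                   # so the first f(ptr) selects entry 0
    while True:
        ptr = f(ptr)                           # dispatcher gadget: ptr = f(ptr)
        if ptr >= len(dispatcher_table):
            break                              # table exhausted, the chain ends
        target = dispatcher_table[ptr]         # load the next gadget address
        name, operation = functional_gadgets[target]
        operation(regs)                        # models BR to the functional gadget
        trace.append(name)                     # which ends by branching back here
    return regs, trace


def set_x0(regs):
    regs["x0"] = 20


def double_x0(regs):
    regs["x0"] *= 2


def add_two(regs):
    regs["x0"] += 2


FUNCTIONAL_GADGETS = {
    0x4000: ("set_x0", set_x0),
    0x5000: ("double_x0", double_x0),
    0x6000: ("add_two", add_two),
}

# Attacker-controlled table of gadget addresses, in execution order.
dispatcher_table = [0x4000, 0x5000, 0x6000]
regs, trace = run_jop(dispatcher_table, FUNCTIONAL_GADGETS)
print(regs["x0"], trace)
```

Compared with the ROP model, the list of addresses lives in the dispatcher table instead of on the stack, and nothing in the loop ever reads a return address.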
Although there are similarities in both the attacks, the tactic to prevent ROP attacks won’t work to prevent JOP attacks: there is no address to preserve across a function call. Instead, observe that the JOP attack takes advantage of the freedom that allows any indirect branch to legally land anywhere in the program. What if we could ensure that indirect branches can only go to certain places?
Armv8.5-A introduces a new feature that ensures indirect branches land only on corresponding instructions. This is the branch target identification (BTI) mechanism, which pairs every indirect branch instruction with a corresponding legal landing instruction:
BTI <target> where target is one of:
‘c’ : Target of indirect calls (BLR Rn).
‘j’ : Target of indirect jumps (BR Rn).
‘jc’: Target of indirect jumps or calls.
Almost all other instructions[5] are invalid branch targets and branching to an incompatible instruction raises a branch target exception.
All the BTI instructions are in the NOP space which means binaries protected with BTI are backward compatible.
Forcing branches to land on certain instructions makes it difficult to find desirable gadgets. Most shorter gadgets no longer exist as they don’t start on a BTI instruction. The gadgets that do start on BTI instructions are often long and manipulate registers and memory in ways that make them incompatible with each other. To make matters worse for an attacker, the dispatcher gadget only has access to a subset of the remaining gadgets: the branching instruction in the dispatcher gadget must match the leading BTI instruction in each functional gadget.
BTI protected code may look as follows:
foo:
    BTI C                          // branch target
    SUB SP, SP, #32
    ADRP X8, .Ltmp2
    ADD X8, X8, :lo12:.Ltmp2
    STR W0, [SP, #28]
    STR X8, [SP, #16]
    LDR X8, [SP, #16]
    STR X8, [SP, #8]
    B .LBB0_2
.Ltmp2:
    BTI J                          // branch target
    LDR W0, [SP, #28]
    ADD SP, SP, #32
    RET
.LBB0_2:
    LDR X8, [SP, #8]
    BR X8
There are only two valid target instructions to land on in this function.
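The pairing rule can be checked mechanically. The Python sketch below applies the simplified pairing given above (BLR with `BTI c`, BR with `BTI j`, either with `BTI jc`) to the instructions of `foo`; it deliberately ignores the architectural special cases covered by "almost all other instructions", and the helper names are hypothetical.

```python
# Simplified BTI compatibility check, following the pairing described above.
# Assumption: architectural special cases are ignored in this sketch.

ALLOWED_BRANCHES = {
    "BTI C": {"BLR"},
    "BTI J": {"BR"},
    "BTI JC": {"BR", "BLR"},
}


def branch_allowed(branch, landing_instruction):
    """Return True if an indirect `branch` may land on `landing_instruction`."""
    allowed = ALLOWED_BRANCHES.get(landing_instruction.strip().upper(), set())
    return branch in allowed


def valid_targets(instructions, branch):
    """Indices of the instructions that `branch` may legally land on."""
    return [i for i, ins in enumerate(instructions) if branch_allowed(branch, ins)]


# The instructions of foo from the listing above, labels and comments removed.
FOO = [
    "BTI C",
    "SUB SP, SP, #32",
    "ADRP X8, .Ltmp2",
    "ADD X8, X8, :lo12:.Ltmp2",
    "STR W0, [SP, #28]",
    "STR X8, [SP, #16]",
    "LDR X8, [SP, #16]",
    "STR X8, [SP, #8]",
    "B .LBB0_2",
    "BTI J",
    "LDR W0, [SP, #28]",
    "ADD SP, SP, #32",
    "RET",
    "LDR X8, [SP, #8]",
    "BR X8",
]

blr_targets = valid_targets(FOO, "BLR")
br_targets = valid_targets(FOO, "BR")
print(blr_targets, br_targets)
```

Out of fifteen instructions, an indirect call has exactly one legal landing point and an indirect jump has exactly one, so every other instruction in the function is unusable as the start of a JOP gadget.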
Whilst the Arm architecture provides hardware features to prevent these attacks, a toolchain provides developers a means to effortlessly leverage the protective abilities of these features.
What kind of support exists for Armv8.3-A Pointer Authentication and Armv8.5-A Branch Target Identification in Arm Compiler 6, GNU and LLVM?
Both of these features can be easily used through a single command-line option:
-mbranch-protection=<protection>
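Where <protection> can be any combination of:
pac-ret{+leaf+b-key}: protect using Pointer Authentication return address signing. 'pac-ret' signs functions that save the return address to memory, '+leaf' extends signing to leaf functions, and '+b-key' signs with the B key rather than the A key.
bti: protect using Branch Target Identification.
standard: turn on all types of branch protection at their standard level. Currently this is equivalent to pac-ret+bti.
none: turn off all branch protection.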
This option is available for AArch64 for Armv8-A architecture onward.
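To make the option grammar concrete, the following Python helper assembles the flag value from individual choices. The function name and its parameters are hypothetical and exist only for illustration; the strings it produces are the option values listed above.

```python
# Hypothetical helper (not part of any toolchain) that assembles the value of
# -mbranch-protection= from individual choices, following the list above.

def branch_protection_flag(pac_ret=False, leaf=False, b_key=False, bti=False):
    if (leaf or b_key) and not pac_ret:
        raise ValueError("+leaf and +b-key only modify pac-ret")
    parts = []
    if pac_ret:
        # The modifiers attach directly to pac-ret.
        parts.append("pac-ret" + ("+leaf" if leaf else "") + ("+b-key" if b_key else ""))
    if bti:
        parts.append("bti")
    # With nothing selected, fall back to the explicit "none" value.
    return "-mbranch-protection=" + ("+".join(parts) if parts else "none")


both = branch_protection_flag(pac_ret=True, bti=True)
with_leaf = branch_protection_flag(pac_ret=True, leaf=True, bti=True)
off = branch_protection_flag()
print(both, with_leaf, off)
```

The resulting string is what would be passed on the compiler command line, for example when building a single source file with both protections enabled.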
When compiling for an architecture version of Armv8.3-A or later, the compiler can take advantage of the non-NOP space Pointer Authentication instructions such as RETAA and RETAB to optimize the branch protection code. This option is available in Arm Compiler 6.11[2], GCC-9.1[3] and LLVM 7[4]. GCC-9.1 is also introducing a configure option ‘--enable-standard-branch-protection’ to turn on both protections by default.
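Pointer Authentication has been available in GCC since GCC-7, without B-key support, via -msign-return-address=[none|non-leaf|all]. From GCC-9.1 onward this option is deprecated and is replaced by the new option. The options translate as follows:
-msign-return-address=none corresponds to -mbranch-protection=none
-msign-return-address=non-leaf corresponds to -mbranch-protection=pac-ret
-msign-return-address=all corresponds to -mbranch-protection=pac-ret+leaf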
The features discussed have been designed to reduce the number of gadgets available to an attacker. So we now explore just how many gadgets these features remove.
GLIBC is a big library that is ubiquitous throughout C/C++ applications. This makes it an ideal target for attackers looking for vulnerabilities. We start investigating the effectiveness of return address signing and BTI by identifying how many gadgets exist in GLIBC; before and after the mitigations are applied.
Finding gadgets in code is incredibly easy to automate for programs built for AArch64. A scanner only needs to find an indirect branch and then report all gadgets, up to some depth, that end on that instruction. One existing tool is ROPgadget.py[1]; running it on GLIBC gives an insight into the scale of the problem.
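The scanning idea fits in a few lines of Python. This is a minimal sketch over a list of already-disassembled instruction strings and is not how ROPgadget.py itself is implemented; the function name, the depth parameter and the choice to count the branch as part of the gadget length are assumptions.

```python
# Minimal gadget scanner over disassembled text. Assumption: a gadget is a run
# of 2..max_depth consecutive instructions ending in an indirect branch.

INDIRECT_BRANCHES = ("BR", "BLR", "RET")


def find_gadgets(instructions, max_depth=3):
    gadgets = []
    for end, instruction in enumerate(instructions):
        mnemonic = instruction.split()[0].upper()
        if mnemonic not in INDIRECT_BRANCHES:
            continue
        # Report every suffix of the preceding code that ends on this branch.
        for length in range(2, max_depth + 1):
            start = end - length + 1
            if start < 0:
                break
            gadgets.append(instructions[start:end + 1])
    return gadgets


# The first foo listing from earlier in the post.
FOO = [
    "SUB SP, SP, #16",
    "STR X0, [SP, #8]",
    "STR W1, [SP, #4]",
    "LDR W1, [SP, #4]",
    "LDR X0, [SP, #8]",
    "STR W1, [X0]",
    "LDR W0, [SP, #4]",
    "ADD SP, SP, #16",
    "RET",
]

gadgets = find_gadgets(FOO, max_depth=3)
for gadget in gadgets:
    print(" ; ".join(gadget))
```

A real scanner works on the binary rather than on text and applies further filtering, but the core loop is the same: locate each indirect branch and walk backwards.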
There are a huge number of gadgets to be found in an unprotected GLIBC, around 16,500. Yet a compiler using the mitigations we've described is able to reduce this by a whopping 97.65%, leaving only a small fraction of those we had in the beginning. This results in ~200 gadgets, and these are mostly just short functions that are indistinguishable from accidental gadgets. Reducing to this number allowed us to manually inspect the remaining gadgets in GLIBC (specifically the JOP dispatcher gadgets) and reverse engineer the binary to identify coding patterns that lead to a compiler emitting exploitable code.
This protection is great, but it comes at a cost. The most obvious cost is the increase in code size, which the following analysis examines:
The graph shows that the code size effect on GLIBC is modest. Turning on both mitigations leads to a 2.9% code size increase, and the increase is smaller still when compiling with -march=armv8.3-a. Compiling for Armv8.3-A allows the compiler to use the fused authenticate and return instructions, in which case the code size increase is only 1.6%.
Whilst the fight against attackers is a continuous effort, this blog clearly displays that we have made substantial effort in mitigating against a certain class of attack. Programmers normally have little to no control over the gadgets that appear in their final binary. With the compilers we have developed that use return address signing and BTI, a programmer can rely on the toolchain to reduce the attack surface of their applications.
Hi Luke,
It seems that "+b-key" option is not supported in GCC9.1, but only GCC10.
The following error is generated when compiled with '-mbranch-protection=pac-ret+b-key' with aarch64-none-elf-gcc (fsf-9.37) 9.1.1 20190704:
cc1: error: invalid arg 'b-key' for '-mbranch-protection='