We are excited to announce the Armv8.1-M Pointer Authentication and Branch Target Identification (PACBTI) Extension. For details, see the latest version of the Armv8-M Architecture Reference Manual (version B.o onward). This extension enhances the M-profile security model and offers new tools for software developers.
PACBTI is inspired by two concepts from the A-profile architecture: Pointer Authentication, introduced in Armv8.3-A, and Branch Target Identification, introduced in Armv8.5-A. These technologies are designed to mitigate Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP) attacks.
The attacks utilize existing and legitimate code fragments called gadgets. In a successful exploit the attacker gains control over the call stack, for example via stack smashing, and then the pointers stored on the stack are overwritten to point to selected gadgets. By branching from one gadget to another the attacker can escalate the operating privileges and take full control of the system.
Please refer to Learn the architecture: Providing protection for complex software, which extensively covers stack smashing, ROP and JOP.
Armv8-M already provides security and memory protection features through TrustZone for Armv8-M, Memory Protection Unit (MPU), and Privileged Execute-never (PXN). These features offer efficient mechanisms for isolating critical security firmware and private information, enforcing privilege rules, separating processes, and enforcing access rules. PACBTI builds on these existing features and offers new tools for detecting ROP and JOP exploitable software.
Like many hardware security features, PACBTI is designed to catch common exploitable software errors but it is not the ultimate solution for all ROP and JOP attacks. This extension relies on a robust software model and it can be a powerful tool when combined with good software development practices. A sensible compiler implementation for the PACBTI extension should ensure that all PAC and BTI functionality is correctly inserted into the compiled code.
The pointer authentication technique was initially designed for AArch64 Armv8-A, which is a 64-bit architecture. Pointers are often stored on the stack, and if the stack is compromised by an attacker, then the pointers can be forged. The 64-bit address range is currently not fully utilized, so there are some spare bits available in the most significant bits of the pointer, which are used to embed security information for validating the pointer. The pointer authentication technique used in Armv8.3-A cannot be directly ported to Armv8.1-M, since the M-profile architecture is based on a 32-bit physically mapped memory. A pointer to any address never exceeds 32 bits, and there are no vacant bits in the pointer. For this reason, in Armv8.1-M the pointer authentication information is stored in a General-Purpose Register (GPR) separate from the pointer, and as long as software correctly inserts the pointer authentication instructions, the behavior is comparable. Pointer authentication can be enabled for every security and privilege state, see table 1.
Table 1: Enabling pointer authentication
The process of creating a Pointer Authentication Code (PAC) can be referred to as signing the pointer. To create a PAC, the pointer, a modifier, and a secret key are fed into a cryptographic mechanism, which produces a 32-bit fixed length code. We will call this code PACenter, and the signing instruction stores it in a GPR. For example, the PAC instruction uses fixed registers, Link Register (LR) as the pointer, Stack Pointer (SP) as the modifier, and R12 for storing the generated PAC, whereas the PACG instruction offers a selection of input and output GPRs.
This operation is illustrated in figure 1 where:
Figure 1: Creating PAC
The modifier must be a value which will be the same upon creation and validation of the signed pointer. For example, when signing a return address on a function call, the SP can have a different value every time the function is called but it will have the same value at the start and at the end of every call. Using the SP as a modifier produces a PAC that is only valid for that instance of the function call, since the SP will probably be in a different location on future calls.
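The signing step can be modelled in a few lines of C. This is only a sketch: `mix32` and `toy_pac` are hypothetical names, and the mixer is a non-cryptographic placeholder (the MurmurHash3 finalizer), not the algorithm used by the architecture. It only illustrates that the 32-bit result depends on the pointer, the modifier, and the key together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bijective 32-bit mixer (the MurmurHash3 finalizer). It is NOT
   cryptographic and only stands in for the real PAC algorithm. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85EBCA6Bu;
    x ^= x >> 13;
    x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

/* Hypothetical model of signing: pointer + modifier + 128-bit key
   (four 32-bit words) produce a fixed-length 32-bit code. */
uint32_t toy_pac(uint32_t pointer, uint32_t modifier, const uint32_t key[4])
{
    assert(key != NULL);
    uint32_t h = mix32(pointer ^ key[0]);
    h = mix32(h ^ modifier ^ key[1]);
    h = mix32(h + key[2]);
    return h ^ key[3];
}
```

Calling `toy_pac` twice with the same return address and the same SP value gives the same code, while a call made with a different SP value gives a different code. This mirrors the point above that a PAC created with the SP as modifier is tied to one instance of the function call.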
To validate a pointer, the authentication instruction compares PACenter with a new PAC, called PACreturn, implicitly produced by the authentication instruction. The PACreturn value is not visible to software. If PACenter is identical to PACreturn, then the pointer, the modifier, the key, and the PAC have not been modified.
If PACenter does not match PACreturn then an INVSTATE UsageFault exception is generated by the authentication instruction, as illustrated in figure 2. The handler will terminate the thread since any authentication failure is a clear indication of tampering. Any speculatively executed instructions should be terminated, ensuring that there are no observable side-effects due to the corrupted pointer, modifier, key, or PAC.
Figure 2: Validating PAC
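The comparison described above can be sketched in C as follows. The names `toy_pac`, `toy_aut`, and `aut_result` are hypothetical, and the PAC function is a non-cryptographic placeholder rather than the architected algorithm. The sketch only shows the control flow: recompute the code from the current pointer, modifier, and key, then compare it with the code produced at function entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { AUT_OK, AUT_INVSTATE_USAGEFAULT } aut_result;

/* Non-cryptographic placeholder for the PAC algorithm (assumption). */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85EBCA6Bu;
    x ^= x >> 13;
    x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

static uint32_t toy_pac(uint32_t pointer, uint32_t modifier, const uint32_t key[4])
{
    uint32_t h = mix32(pointer ^ key[0]);
    h = mix32(h ^ modifier ^ key[1]);
    h = mix32(h + key[2]);
    return h ^ key[3];
}

/* Model of authentication: pac_enter is the code created at function
   entry; the recomputed code (PACreturn) never leaves this function. */
aut_result toy_aut(uint32_t pac_enter, uint32_t pointer, uint32_t modifier,
                   const uint32_t key[4])
{
    assert(key != NULL);
    uint32_t pac_return = toy_pac(pointer, modifier, key);
    return (pac_enter == pac_return) ? AUT_OK : AUT_INVSTATE_USAGEFAULT;
}
```

In this model, changing any one of the return address, the modifier, the stored code, or the key between signing and authentication makes `toy_aut` report the fault.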
The authentication mechanism can detect a change in any or all the input values with a high degree of confidence. The output of the cryptographic algorithm is a 32-bit PAC, so it is possible for different combinations of input arguments and cryptographic keys to produce an identical PAC. This type of cryptographic collision is an inherent limitation of all lossy cryptographic schemes and cannot be detected by the PAC authentication instructions. However, the likelihood of such a collision is very low.
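As a rough worked estimate of "very low", assume an idealized PAC whose 32-bit output is uniformly distributed and an attacker who does not know the key. These assumptions are ours and are not a statement about any particular implementation. Under them, the chance that one forged value is accepted, and the chance that n independent forged values are all accepted, are:

```latex
\[
P(\text{one forged PAC accepted}) = 2^{-32} \approx 2.3 \times 10^{-10},
\qquad
P(n \text{ forged PACs all accepted}) = \left(2^{-32}\right)^{n} = 2^{-32n}.
\]
```

For example, n = 3 gives 2^{-96}, which is roughly 1.3 × 10^{-29}, and each wrong guess raises an exception instead of allowing a silent retry.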
The architecture provides four 128-bit keys, one for each security and privilege state, see table 2. Each key is stored in four 32-bit System registers.
Table 2: PAC keys
Privileged software can access the keys via MSR and MRS instructions and manage operations like key update and maintenance. Unprivileged software cannot directly access the PAC keys. We also support the general rules of TrustZone for Armv8-M and the following accesses are also permitted:
A unique key is available in each security and privilege state, so software does not need to switch between the keys, the hardware does this automatically. However, most user software will operate in Non-secure unprivileged Thread mode, so we recommend that each thread is assigned its own unique PAC key. If an attacker attempts to form a chain of gadgets, then every PAC must be guessed correctly, otherwise an exception will be raised.
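The key arrangement can be pictured with a small C model. It is a sketch only: the type and function names are hypothetical, and in hardware the selection is made automatically by the processor rather than by a function call. `load_thread_key` models what a privileged scheduler might do on a context switch if each thread is given its own key, as recommended above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One 128-bit key, held as four 32-bit words. */
typedef struct {
    uint32_t word[4];
} pac_key;

/* Four keys, indexed as [secure][privileged]. */
typedef struct {
    pac_key key[2][2];
} pac_key_bank;

/* Models the automatic selection of the key for the current state. */
const pac_key *select_pac_key(const pac_key_bank *bank, bool secure, bool privileged)
{
    assert(bank != NULL);
    return &bank->key[secure ? 1 : 0][privileged ? 1 : 0];
}

/* Models privileged software installing a per-thread key into the
   Non-secure unprivileged slot when it switches threads. */
void load_thread_key(pac_key_bank *bank, const pac_key *thread_key)
{
    assert(bank != NULL && thread_key != NULL);
    bank->key[0][0] = *thread_key;
}
```

With per-thread keys, a PAC observed in one thread is of no use for forging a pointer in another thread.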
Note that PAC, PACBTI, PACG operations can interoperate with any AUT, BXAUT, AUTG instructions, provided that the same input arguments are used for creating and authenticating PAC.
Some of the new pointer authentication instructions are in the NOP space. Applications or libraries that protect themselves with these NOP-space instructions can run on older processors without pointer authentication support. Although the older processors will not benefit from the protections, this can be very useful in heterogeneous systems.
The following instructions are NOP space compatible:
The PACBTI Extension can support software and external debuggers.
Software vulnerability to ROP attacks can be demonstrated by looking at an example written in C. In this example we call a function and request an external input. Since there are no bounds checks, the user can enter a string of any length and overrun the allocated memory. This is a good example of very bad code, and it should be easy enough to catch. However, the compiler may not generate any warnings, so the programmer must be experienced enough to understand this weakness.
    #include <stdio.h>

    void callee(void){
        char username[12];       // Stored on the stack
        scanf("%s", username);   // If input is greater than the
                                 // allocated array size, then
                                 // the return pointer may be
                                 // overwritten
    }

    void caller(void){
        callee();
    }
When the C code is compiled for Cortex-M, depending on the optimization level, the assembly will look something like:
    scanf:
        ...                   ; Accept “external input”
    callee:
        PUSH {LR}             ; Stack return address
        SUB SP, SP, #12       ; Adjust the stack pointer
        MOV R1, SP            ; SP passed to “scanf” through R1
        LDR R0, .L3           ; .L3 holds pointer to “%s”
        BL scanf
        ADD SP, SP, #12       ; Readjust the SP before return
        POP {PC}              ; If the “external input” exceeds the
                              ; bounds, then the value loaded into
                              ; the PC can be used in a ROP attack
    caller:
        PUSH {R3, LR}
        BL callee
        POP {R3, PC}
So, what can we do to mitigate this vulnerability? The obvious answer would be to fix the software, but not all cases are easy to fix, and this is where additional hardware mechanisms such as PAC can be used. The compiler can be directed to use the PAC functionality, applied to the “callee” function, and the resultant assembly will look something like:
    callee:
        PAC R12, LR, SP       ; Sign the return address
        PUSH {R12, LR}        ; Stack PAC and return address
        SUB SP, SP, #16       ; Adjust the stack pointer
        MOV R1, SP            ; SP passed to “scanf” through R1
        LDR R0, .L3           ; .L3 holds pointer to “%s”
        BL scanf
        ADD SP, SP, #16       ; Readjust the SP before return
        POP {R12, LR}         ; Restore the PAC and return address
        BXAUT R12, LR, SP     ; Validate LR and return to “caller”
Adding the PAC instruction to the start of the function ensures that any tampering with LR or R12 will be detected when the BXAUT instruction is executed. R12 must be stacked because there is no guarantee that the “scanf” function will preserve it. SP does not need to be stacked in this example, but PAC authentication will fail if the SP has changed from its original value.
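For comparison, the software fix for this particular function is small. The sketch below bounds the read with a field width. `parse_username` and `USERNAME_SIZE` are hypothetical names, and the function parses from a string with `sscanf` instead of reading standard input with `scanf`, purely so that it is easy to exercise. The same `%11s` width applies to `scanf`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define USERNAME_SIZE 12

/* Returns 1 when a username was parsed; otherwise returns what sscanf
   reports (0 or EOF). */
int parse_username(const char *input, char username[USERNAME_SIZE])
{
    assert(input != NULL && username != NULL);
    /* "%11s" stores at most 11 characters plus the terminating NUL, so
       the width must stay one less than USERNAME_SIZE. */
    return sscanf(input, "%11s", username);
}
```

Input that is longer than the buffer is truncated instead of overrunning the stack. PAC remains useful for the cases where such a fix is missed or the code cannot be changed.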
While PAC is useful for catching pointer exploits, not all functions need to be protected. Functions are most vulnerable when loading a return pointer from the stack, and then branching to that address. In a typical function, one such example would be the return address held in the LR register. Function return will be accomplished via the “BX LR” instruction, and if the pointer has been tampered with, then the branch will not return to the intended caller function. However, in a leaf function where the LR is not stored and restored from the stack, such an attack is not feasible, and PAC protection is unnecessary.
The combined authenticate and branch instruction, BXAUT, prevents some compiler optimizations but also adds robustness. It is useful for two reasons, code density improvements and elimination of some gadgets, but we cannot eliminate all code density penalties as the PAC instruction is required and cannot be hidden away.
The BXAUT instruction can be substituted with AUT + BX instructions to produce backwards compatible code. Since the authentication operation and the return branch are separate instructions, a valid compiler optimization could rearrange the code to insert other instructions between the AUT and BX. This is perfectly acceptable if the validated LR is not stacked and restored. The same applies to the AUTG instruction. However, any gap between the authenticated pointer and the branch could expose a useful ROP or JOP gadget, if PACBTI protection is not applied to the full software stack.
PACBTI is a new feature, and we cannot expect all software libraries to be immediately recompiled with this support, so hybrid software is likely to be used. PACBTI protection is intended for securing your own code, reducing the risk of exploits. However, the security of the rest of the system cannot be guaranteed, and some libraries and user code could remain exposed to ROP attacks.
In the following example we demonstrate how typical code, which is vulnerable to ROP attacks, is protected through the PAC mechanism. Note that some code complexity has been hidden to aid readability.
Original
    main:
        BL memcpy
    memcpy:
        PUSH {R0, LR}
        WLS LR, R2, loopEnd
    loopStart:
        LDRSB R3, [R1], #1
        STRB R3, [R0], #1
        LE LR, loopStart
    loopEnd:
        POP {R0, LR}
        BX LR
Protected by PAC
    main:
        BL memcpy
    memcpy:
        PAC R12, LR, SP       ; Sign the pointer
        PUSH {R0, LR}
        WLS LR, R2, loopEnd
    loopStart:
        LDRSB R3, [R1], #1
        STRB R3, [R0], #1
        LE LR, loopStart
    loopEnd:
        POP {R0, LR}
        AUT R12, LR, SP       ; Authenticate the pointer
        BX LR
This example shows a simple “memcpy” function which utilizes three Armv8.1-M technologies, Helium, Low Overhead Branches (LOB), and PAC. The LOB operation uses LR for counting loop iterations, or vector elements in the case of Helium. Therefore, even a leaf function will need to stack LR if no scratch registers are available.
We can substitute the AUT and BX operations with a single instruction, BXAUT. This instruction is not in the NOP space, so any code compiled using this instruction will only function on CPUs that support PACBTI.
Backwards compatible solution
    loopEnd:
        POP {R0, LR}
        AUT R12, LR, SP
        BX LR
Compact solution
    loopEnd:
        POP {R0, LR}
        BXAUT R12, LR, SP
| A-profile | M-profile |
| --- | --- |
| The PAC Extension and BTI Extension can be implemented independently. | The PACBTI Extension offers both PAC and BTI features. However, individual controls are provided for both features. |
| This feature is mandatory in Armv8.3 implementations. | PACBTI is an optional feature for Armv8.1-M. |
| Computed PAC is stored in the upper bits of the 64-bit virtual address. | Computed PAC is saved into a 32-bit GPR. The PAC and physical address are stored in separate registers. |
| The size of PAC ranges from 11 to 31 bits when tagged addresses are disabled, and from 3 to 23 bits when tagged addresses are enabled. | The PAC length is not configurable; it is fixed to 32 bits. |
| Five 128-bit keys are provided: two for instruction addresses, two for data addresses, and one for generic authentication. | Four 128-bit keys are provided, with no distinction between instruction and data addresses. No generic authentication key is provided. |
| PAC keys are not banked by Exception level. | PAC keys are available for each combination of the Security and privilege level. |
| PAC algorithm: QARMA or implementation defined. | PAC algorithm: QARMA or implementation defined, same as in A-profile. |
| Pointer authentication is enabled via SCTLR. | Pointer authentication is enabled via the CONTROL register. PAC can be enabled for each combination of the Security and privilege level. |
| As a part of the authorization process the authentication instruction performs one of the following: (1) Replaces the PAC with the extension bits if the pointer is validated. (2) Replaces the PAC with the extension bits and sets two bits of the extension to a fixed unique number. If the pointer is used by a branch, then the execution branches to an address that generates a Translation fault. | The authentication instruction performs one of the following steps: (1) If the pointer is validated, then there will be no side-effects. (2) If the PAC, pointer, modifier, or key do not match the original values, the instruction will generate a synchronous INVSTATE UsageFault. |
| On an authorization failure some authentication instructions generate a synchronous exception, for example AUTIASP, while others may generate a Translation fault when the address is accessed, for example RETAA. | All authentication instructions generate a synchronous fault on authorization failure. |
| On authorization failure, a specific PAC exception is signaled. | On authorization failure, an INVSTATE UsageFault is generated. |
| The PAC is embedded in the pointer, so specific instructions like XPACI are used to strip the PAC from a pointer without authentication. | PAC is held in a separate GPR, so the register can be cleared independently. The pointer can be used without PAC authentication; it is a software choice. |
Branch Target Identification (BTI) can mitigate against some JOP attacks by creating an architectural dependency between certain indirect branch instructions and the instructions that they target. Indirect branches are vulnerable to JOP attacks as the pointers are frequently stored on the stack and if the stack is compromised then these pointers can be manipulated. By modifying the pointer an attacker can utilize existing indirect branches and jump to desired gadgets.
In AArch64, the CPU can be configured so that indirect branch instructions only target valid “landing pad” instructions within a select memory region, which is specified by the Guarded Page (GP) bit in the translation tables. The architecture can record the type of branch that targeted the landing pad, and both direct and indirect branches can be tracked. This is done through the BTYPE field in PSTATE, and three branch types can be identified, calls, jumps, and all branches.
Armv8.1-M only supports physical addressing and there are no spare bits remaining in the MPU registers, so we cannot mark memory regions with a GP bit or equivalent. However, BTI can still function effectively without MPU support. We have introduced the EPSR.B bit, which records indirect branches. Unlike AArch64, we have chosen a subset of the indirect branches, and direct branches cannot be recorded. Direct branches use PC-relative addressing, and typical targets like function calls can be protected by PAC, so on the M-profile only jumps should be tracked by BTI.
These jump instructions are called “BTI setting” instructions, and when executed, they set EPSR.B to one. A “BTI clearing”, or “landing pad”, instruction clears EPSR.B to zero. When implemented correctly, the BTI setting instruction must always target a BTI clearing instruction, otherwise an INVSTATE UsageFault exception will be raised. The general Armv8.1-M BTI behavioral model is described in figure 3. Note that the Branch Future (BF) instructions notify the PE of an upcoming branch, and they do not directly modify EPSR.B. Instead, the LO_BRANCH_INFO.BTI is updated to indicate an upcoming BTI setting branch.
Figure 3: BTI behavior
A BTI exception is synchronously generated by fetching a non-BTI clearing instruction when EPSR.B is set to one. When the exception is generated, EPSR is stacked as normal, so the state of EPSR.B is captured. Before entering the handler, EPSR.B is cleared to zero, since BTI may not be enabled in the handler. The handler will terminate the thread since any authorization failure is a clear indication of tampering.
We have added the BTI setting functionality to existing Armv8.1-M indirect branch instructions. If BTI is enabled for the target Security and privilege state, then the following instructions will be BTI setting:
The “BX LR” and “BFX LR” are not BTI setting instructions because these are frequently used for function returns, and the pointer can be protected by PAC authentication. The BTI setting instructions have been selected based on typical compiler generated code, so not all indirect branch instructions require the BTI setting functionality.
The following instructions are BTI clearing:
These valid landing pad instructions clear EPSR.B to zero whenever executed, which is important because common software constructs like functions or case statements can be called from any piece of code. This is particularly relevant for software libraries that are protected using BTI.
Except for a BKPT instruction, which is useful for debugging, attempting to execute all other non-landing pad instructions will generate a fault. The exception will be generated on instruction fetch, so any JOP attempt to execute malicious code will be thwarted without architecturally visible side-effects.
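The rules above can be summarized as a small state machine. The C below is a model for illustration only: the names are hypothetical, instructions are reduced to four kinds, and treating BKPT as leaving EPSR.B unchanged is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    INSTR_OTHER,          /* any instruction that is not listed below   */
    INSTR_BTI_SETTING,    /* one of the selected indirect branches      */
    INSTR_BTI_CLEARING,   /* a valid landing pad                        */
    INSTR_BKPT            /* permitted for debugging                    */
} instr_kind;

typedef struct {
    bool bti_enabled;     /* BTI enabled for the current state          */
    bool epsr_b;          /* models EPSR.B                              */
} bti_state;

/* Processes one instruction. Returns true when the model raises an
   INVSTATE UsageFault. */
bool bti_step(bti_state *s, instr_kind instr)
{
    assert(s != NULL);
    if (s->epsr_b) {
        if (instr == INSTR_BTI_CLEARING) {
            s->epsr_b = false;         /* landing pad reached            */
            return false;
        }
        if (instr == INSTR_BKPT) {
            return false;              /* assumption: EPSR.B unchanged   */
        }
        s->epsr_b = false;             /* cleared before handler entry   */
        return true;                   /* non-landing pad was fetched    */
    }
    if (instr == INSTR_BTI_SETTING && s->bti_enabled) {
        s->epsr_b = true;
    }
    return false;
}
```

A BTI setting instruction followed by a landing pad runs without a fault, while the same branch followed by any other instruction faults on that instruction. With BTI disabled the model never sets `epsr_b`, so no fault occurs.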
The new instructions BTI and PACBTI are in the NOP space. Applications or libraries that protect themselves with these NOP-space instructions can run on older processors without BTI support. Although the older processors will not benefit from the protection, it can be useful in heterogeneous systems.
BTI can support software and external debuggers.
The TrustZone technology for Armv8-M describes the transitions between Secure and Non-secure software. Read more about TrustZone technology for Armv8-M architecture.
The PACBTI Extension introduces separate controls for enabling BTI in every Security and privilege level, see table 3. For example, user code compiled without BTI could call into a Secure library which is protected by BTI.
Table 3: BTI controls
All the instructions used for Security state transitions also support BTI:
BTI setting
BTI clearing
When BTI is implemented and enabled, the behavior follows the Security state transition model shown in figure 4.
Figure 4: Security state transitions with BTI
In this example we illustrate how TrustZone works with BTI. Enabling BTI does not change the assembly because software does not directly control the BTI architectural state, and existing instructions implicitly support this behavior.
    non-secure:
        ...
        LDR R4, =non-secure-callable
        ...
        BLX R4        ; EPSR.B set to 1, BTI setting instruction
                      ; Indirect branch to SG
        ...           ; No BTI clearing instruction required
                      ; Return address of the Secure function
        ...
    non-secure-callable:
        SG            ; EPSR.B set to 0, BTI clearing instruction
        B secure      ; Not a BTI setting instruction
                      ; Direct branch to the Secure function
        ...
    secure:
        ...           ; No BTI clearing instruction required
                      ; Function body
        BXNS LR       ; Not a BTI setting instruction
                      ; Return to the non-secure function
This example shows the BTI behavior when it is enabled for the Non-secure state, but the code will remain the same even when BTI is disabled for the Non-secure state. The BTI settings for the Secure state are not relevant, because the access goes through an SG instruction, which is always BTI clearing. The return from Secure to Non-secure state does not trigger any BTI behavior since it is achieved through a “BXNS LR” instruction, which is not BTI setting.
In this example we demonstrate how BTI behavior can be added to Secure program calls to Non-secure functions.
When BTI is disabled for the Non-secure state, the Secure software must ensure that BTI is not set when calling the Non-secure function. Secure BTI can be enabled or disabled. Since Secure software can access the Non-secure bank of the CONTROL register, it can always query the settings of the Non-secure state. Non-secure software, like libraries, may not be compiled with PACBTI, so Secure software must ensure that typical accesses to the Non-secure state continue to function correctly.
Secure software can call other Secure functions using the BLXNS instruction, and in these cases the BLXNS instruction will query the current, Secure, bank of the CONTROL register and determine whether the BTI setting functionality must be applied. If Secure BTI is enabled then BLXNS will set EPSR.B to one, otherwise EPSR.B will not be modified.
    secure:
        ...
        LDR R0, =non-secure
        ...
        BLXNS R0      ; Instruction implicitly checks CONTROL_NS.UBTI_EN
                      ; EPSR.B unchanged, Non-secure BTI is disabled
        ...           ; Return address
    non-secure:
        ...           ; No BTI clearing instruction required
        BX LR         ; Not a BTI setting instruction
                      ; Return to the secure function
When BTI is enabled for the Non-secure state, the Secure software will require BTI setting instructions to set EPSR.B when calling Non-secure functions. Secure BTI can be enabled or disabled. The Non-secure function must begin with a BTI clearing instruction, PACBTI when PAC protection is required, or BTI when PAC protection is not required.
    secure:
        ...
        LDR R0, =non-secure
        ...
        BLXNS R0      ; Instruction implicitly checks CONTROL_NS.UBTI_EN
                      ; EPSR.B is set to 1, Non-secure BTI is enabled
        ...           ; Return address
    non-secure:
        BTI           ; EPSR.B is set to 0, BTI clearing instruction
                      ; PACBTI can be used but FNC_RETURN is loaded from
                      ; the Secure stack, PAC may be redundant
        ...           ; Function body
        BX LR         ; Not a BTI setting instruction
                      ; Return to the secure function
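The two BLXNS cases can be condensed into one decision. The C below is a simplified model with hypothetical names: it keeps a single BTI enable per Security state and leaves out the privilege level, which the real controls also take into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified view of the BTI enables, one per Security state. */
typedef struct {
    bool secure_bti_en;
    bool nonsecure_bti_en;
} bti_controls;

/* Returns the modelled value of EPSR.B after a BLXNS, given its value
   before the instruction and the Security state of the target. */
bool blxns_epsr_b(const bti_controls *ctl, bool target_is_nonsecure,
                  bool epsr_b_before)
{
    assert(ctl != NULL);
    bool target_bti_en = target_is_nonsecure ? ctl->nonsecure_bti_en
                                             : ctl->secure_bti_en;
    /* BTI setting only when BTI is enabled for the target state;
       otherwise EPSR.B is not modified. */
    return target_bti_en ? true : epsr_b_before;
}
```

In the first listing above the Non-secure enable is off, so the model leaves EPSR.B unchanged and the Non-secure function needs no landing pad. In the second listing the enable is on, so EPSR.B is set and the function must start with a BTI clearing instruction.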
When BTI is enabled only a few instructions are valid landing pads, like PACBTI. During compilation, if a BTI setting instruction is substituted with a non-BTI setting instruction, by a Link Time Optimizer (LTO), then a BTI clearing instruction may not be necessary and the landing pad can be safely removed. If the LTO can verify that all instructions targeting a landing pad are no longer BTI setting, then the substitution can occur. Removing the landing pad will improve security as there will be fewer gadget entry points. To cover this scenario, and other instances where a PACBTI may be undesirable, the PACBTI instruction can be substituted with a PAC instruction.
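The link-time decision described above amounts to a simple check over the branches that target a function. The following C is a sketch of that check with hypothetical names; it is not taken from any real linker or compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ENTRY_PACBTI, ENTRY_BTI, ENTRY_PAC, ENTRY_NONE } entry_instr;

/* incoming_is_bti_setting[i] is true when the i-th branch that targets
   the function is still a BTI setting instruction. Returns the entry
   instruction that the function needs after the check. */
entry_instr relax_landing_pad(entry_instr current,
                              const bool *incoming_is_bti_setting, size_t n)
{
    assert(incoming_is_bti_setting != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (incoming_is_bti_setting[i]) {
            return current;            /* landing pad is still required  */
        }
    }
    switch (current) {
    case ENTRY_PACBTI: return ENTRY_PAC;   /* keep signing, drop the pad */
    case ENTRY_BTI:    return ENTRY_NONE;  /* pad can be removed         */
    default:           return current;
    }
}
```

The check is conservative: one remaining BTI setting branch is enough to keep the landing pad.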
Adding a BTI at the start of the function ensures that even if an attacker can manipulate a pointer then jumping into the middle of “func” will fail, as the rules for BTI will be violated and the PE will generate an exception. In this example the ADD instruction could act as a useful gadget that an attacker might use.
    main:
        LDR R4, =func
        LDR PC, [R4]
    func:
        ADD R0, R1, R2
        BX LR
Protected by BTI
    main:
        LDR R4, =func
        LDR PC, [R4]      ; BTI setting
    func:
        BTI               ; BTI clearing
        ADD R0, R1, R2
        BX LR

This is an example of a function where the return address can be protected with PAC, and we can also protect the entry point into this function by using a PACBTI instruction, which will ensure that an attacker cannot jump into the middle of the function body.
    func:
        PUSH {R4-R6, LR}
        ...               ; Function body
        POP {R4-R6, LR}
        BX LR
Protected by PAC and BTI
    func:
        PACBTI R12, LR, SP
        PUSH {R4-R6, R12, LR}
        ...               ; Function body
        POP {R4-R6, R12, LR}
        BXAUT R12, LR, SP
The Branch Future family of instructions is designed to notify the processor of an upcoming branch. If BTI is enabled, the BFLX instruction will implicitly set LO_BRANCH_INFO.BTI to one. When the execution reaches the BF branch point, implicit branch, the processor will automatically set EPSR.B if the LO_BRANCH_INFO cache is valid. Since the LO_BRANCH_INFO cache might be cleared on an exception, the BFLX instruction does not directly update EPSR.B.
    main:
        LDR R4, =func
        BFLX call, R4
        ...
    call:                 ; Implicit call to func
                          ;
                          ; Fallback code
        BLX R4            ; Call func
        ...
    func:
        ...               ; Function body
        BX LR

Protected by PAC and BTI

    main:
        LDR R4, =func
        BFLX call, R4
        ...
    call:                 ; BTI setting
                          ; Implicit call to func
                          ;
                          ; Fallback code
        BLX R4            ; BTI setting
        ...
    func:
        PACBTI R12, LR, SP
        ...               ; Function body
        BXAUT R12, LR, SP
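The interaction between BFLX, the LO_BRANCH_INFO cache, and EPSR.B can be modelled in C as below. This is a sketch with hypothetical names that follows only the behavior described in this section: BFLX records the BTI requirement in the cache, and EPSR.B is updated at the branch point only if the cache is still valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models the part of LO_BRANCH_INFO that matters for BTI. */
typedef struct {
    bool valid;
    bool bti;
} lo_branch_info;

/* BFLX: records the upcoming branch; EPSR.B is not touched here. */
void model_bflx(lo_branch_info *info, bool bti_enabled)
{
    assert(info != NULL);
    info->valid = true;
    info->bti = bti_enabled;
}

/* An exception might clear the cache. */
void model_exception(lo_branch_info *info)
{
    assert(info != NULL);
    info->valid = false;
    info->bti = false;
}

/* Branch point: returns true when the implicit branch is taken. When
   it returns false, execution falls through to the fallback BLX,
   which is itself a BTI setting instruction. */
bool model_branch_point(const lo_branch_info *info, bool *epsr_b)
{
    assert(info != NULL && epsr_b != NULL);
    if (!info->valid) {
        return false;
    }
    if (info->bti) {
        *epsr_b = true;
    }
    return true;
}
```

Either way the call reaches `func` with EPSR.B set when BTI is enabled, which is why the protected listing starts `func` with a landing pad.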
| A-profile | M-profile |
| --- | --- |
| The BTI Extension and PAC Extension can be implemented independently. | The PACBTI Extension offers both PAC and BTI features. However, individual controls are provided for both features. |
| This feature is mandatory in Armv8.5 implementations. | PACBTI is an optional feature for Armv8.1-M. |
| On executing an indirect branch, the type of indirect branch is recorded in PSTATE.BTYPE. | Only specific indirect branch instructions, BTI setting, set the EPSR.B. |
| There is no direct way of reading or writing to the PSTATE.BTYPE field. | EPSR.B can be accessed through MSR and MRS instructions by privileged software and privileged debuggers. |
| The architecture distinguishes between branches used for function calls and non-function calls like case-statements. A generic “all” BTYPE is also permitted. | The architecture does not distinguish between BTI setting instructions. |
| Support for landing pads is enabled for each page, using the GP bit in the translation tables. | Memory is physically mapped and BTI is only controlled through EPSR.B. |
| The BTI instructions are NOPs in a non-guarded page. | BTI clearing instructions always clear EPSR.B, regardless of the BTI setting for the current Security and privilege state. |
| A BTI access violation to a guarded memory region will generate a Branch Target exception. | A BTI access violation occurs when a non-BTI clearing instruction is fetched. An INVSTATE UsageFault exception is generated. |
Download the latest version of the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest
So after Apple announced the x86->ARM transition I compared 64-bit ARM cores and one thing stood out. Most cores are on ARMv8.2-A, but Apple has updated their revision every year since 2016 and are now on 8.4 (so A14/A14X will probably be on 8.5). My question would be if there are any big advances with revisions after 8.2 and if it's possible to match a 8.2 A55 to, say, 8.5 A79?