Armv8.1-M Pointer Authentication and Branch Target Identification Extension

April 7, 2021

We are excited to announce the Armv8.1-M Pointer Authentication and Branch Target Identification (PACBTI) Extension, which is documented in the latest version of the Armv8-M Architecture Reference Manual (version B.o onward). This extension enhances the M-profile security model and offers new tools for software developers.

PACBTI is inspired by two concepts from the A-profile architecture: Pointer Authentication, introduced in Armv8.3-A, and Branch Target Identification, introduced in Armv8.5-A. These technologies are designed to mitigate Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP) attacks.

The attacks utilize existing and legitimate code fragments called gadgets. In a successful exploit the attacker gains control over the call stack, for example via stack smashing, and then the pointers stored on the stack are overwritten to point to selected gadgets. By branching from one gadget to another the attacker can escalate the operating privileges and take full control of the system.

Please refer to Learn the architecture: Providing protection for complex software, which extensively covers stack smashing, ROP and JOP.

Armv8-M already provides security and memory protection features through TrustZone for Armv8-M, Memory Protection Unit (MPU), and Privileged Execute-never (PXN). These features offer efficient mechanisms for isolating critical security firmware and private information, enforcing privilege rules, separating processes, and enforcing access rules. PACBTI builds on these existing features and offers new tools for detecting ROP and JOP exploitable software.

Like many hardware security features, PACBTI is designed to catch common exploitable software errors but it is not the ultimate solution for all ROP and JOP attacks. This extension relies on a robust software model and it can be a powerful tool when combined with good software development practices. A sensible compiler implementation for the PACBTI extension should ensure that all PAC and BTI functionality is correctly inserted into the compiled code.

Pointer authentication

The pointer authentication technique was initially designed for AArch64 Armv8-A, which is a 64-bit architecture. Pointers are often stored on the stack, and if the stack is compromised by an attacker, the pointers can be forged. The full 64-bit address range is not fully utilized, so there are spare bits available in the most significant bits (MSBs) of the pointer, which are used to embed security information for validating the pointer. The technique used in Armv8.3-A cannot be directly ported to Armv8.1-M, since the M-profile architecture is based on a 32-bit physically mapped memory: a pointer never exceeds 32 bits, and there are no vacant bits in it. For this reason, in Armv8.1-M the pointer authentication information is stored in a General-Purpose Register (GPR) separate from the pointer. As long as software correctly inserts the pointer authentication instructions, the behavior is comparable. Pointer authentication can be enabled for every security and privilege state, see table 1.

	Current	Non-secure	Secure
Privileged	CONTROL.PAC_EN	CONTROL_NS.PAC_EN	CONTROL_S.PAC_EN
Unprivileged	CONTROL.UPAC_EN	CONTROL_NS.UPAC_EN	CONTROL_S.UPAC_EN

Table 1: Enabling pointer authentication
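
As a rough illustration of table 1, the following C sketch models how the enable bit is selected from the banked CONTROL register. The `cpu_model_t` type and the function name are hypothetical, and the bit positions are an assumption based on CMSIS-Core style definitions, so check them against the Architecture Reference Manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions (CMSIS-Core style); verify against the manual. */
#define CONTROL_PAC_EN  (1u << 6)   /* PAC enable, privileged   */
#define CONTROL_UPAC_EN (1u << 7)   /* PAC enable, unprivileged */

typedef enum { NON_SECURE = 0, SECURE = 1 } security_t;

/* Hypothetical model of the banked register:
 * control[NON_SECURE] stands for CONTROL_NS, control[SECURE] for CONTROL_S. */
typedef struct {
    uint32_t control[2];
} cpu_model_t;

/* Returns true when PAC is enabled for the given Security and privilege state. */
bool pac_enabled(const cpu_model_t *cpu, security_t sec, bool privileged)
{
    assert(cpu != NULL);
    uint32_t mask = privileged ? CONTROL_PAC_EN : CONTROL_UPAC_EN;
    return (cpu->control[sec] & mask) != 0u;
}
```

In this model, each of the four combinations in table 1 is controlled by its own bit, so enabling PAC for unprivileged Non-secure code has no effect on the other three states.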

Creating a pointer authentication code

The process of creating a Pointer Authentication Code (PAC) can be referred to as signing the pointer. To create a PAC, the pointer, a modifier, and a secret key are fed into a cryptographic mechanism, which produces a 32-bit fixed length code. We will call this code PACenter, and the signing instruction stores it in a GPR. For example, the PAC instruction uses fixed registers, Link Register (LR) as the pointer, Stack Pointer (SP) as the modifier, and R12 for storing the generated PAC, whereas the PACG instruction offers a selection of input and output GPRs.

This operation is illustrated in figure 1 where:

PAC: Pointer authentication code, stored in the destination GPR.
MOD: Modifier, 32-bit value loaded from an input GPR.
PTR: Pointer, the 32-bit address to be protected loaded from an input GPR.
Key: A secret 128-bit key.
C: The cryptographic mechanism. For example, QARMA.

Figure 1: Creating PAC

The modifier must be a value which will be the same upon creation and validation of the signed pointer. For example, when signing a return address on a function call, the SP can have a different value every time the function is called but it will have the same value at the start and at the end of every call. Using the SP as a modifier produces a PAC that is only valid for that instance of the function call, since the SP will probably be in a different location on future calls.
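
To make the role of the modifier concrete, here is a minimal C sketch of the signing step. The `toy_pac` function is a toy stand-in for the cryptographic mechanism (it is not QARMA and offers no security), and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invertible 32-bit mixer; used only to build the toy PAC below. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

/* Toy stand-in for C(PTR, MOD, Key). NOT QARMA, NOT secure. */
uint32_t toy_pac(uint32_t ptr, uint32_t mod, const uint32_t key[4])
{
    uint32_t h = 0u;
    for (int i = 0; i < 4; i++) {
        h = mix32(h ^ key[i]);      /* fold in the 128-bit key */
    }
    h = mix32(h ^ ptr);             /* fold in the pointer */
    h = mix32(h ^ mod);             /* fold in the modifier */
    return h;
}

/* Models "PAC R12, LR, SP": returns the value that would be placed in R12. */
uint32_t sign_return_address(uint32_t lr, uint32_t sp, const uint32_t key[4])
{
    assert(key != NULL);
    return toy_pac(lr, sp, key);
}
```

Signing the same return address with two different SP values gives two different codes in this model, which is why a PAC captured from one call instance is of little use in another.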

Validating a pointer authentication code

To validate a pointer, the authentication instruction compares PACenter with a new PAC, called PACreturn, implicitly produced by the authentication instruction. The PACreturn value is not visible to software. If PACenter is identical to PACreturn then none of the following values have been modified:

The pointer.
The modifier.
The secret key.
The PACenter.

If PACenter does not match PACreturn then an INVSTATE UsageFault exception is generated by the authentication instruction, as illustrated in figure 2. The handler will terminate the thread since any authorization failure is a clear indication of tampering. Any speculatively executed instructions should be terminated, ensuring that there are no observable side-effects due to the corrupted pointer, modifier, key, or PAC.

Figure 2: Validating PAC

The authentication mechanism can detect a change in any or all of the input values with a high degree of confidence. The output of the cryptographic algorithm is a 32-bit PAC, so it is possible for different combinations of input arguments and cryptographic keys to produce an identical PAC. This type of collision is an inherent limitation of any scheme that compresses its inputs into a fixed-length code, and it cannot be detected by the PAC authentication instructions. However, the likelihood of such a collision is very low.
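
The validation step can be sketched in C as a recompute-and-compare. As before, `toy_pac` is only a toy stand-in for the cryptographic mechanism (not QARMA, not secure), and the type and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { AUTH_OK, AUTH_INVSTATE_USAGEFAULT } auth_result_t;

/* Invertible 32-bit mixer; used only to build the toy PAC below. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

/* Toy stand-in for C(PTR, MOD, Key). NOT QARMA, NOT secure. */
uint32_t toy_pac(uint32_t ptr, uint32_t mod, const uint32_t key[4])
{
    uint32_t h = 0u;
    for (int i = 0; i < 4; i++) {
        h = mix32(h ^ key[i]);
    }
    h = mix32(h ^ ptr);
    h = mix32(h ^ mod);
    return h;
}

/* Models an authentication instruction: PACreturn is recomputed internally
 * and compared with the PACenter value supplied in a register. */
auth_result_t authenticate(uint32_t ptr, uint32_t mod,
                           const uint32_t key[4], uint32_t pac_enter)
{
    assert(key != NULL);
    uint32_t pac_return = toy_pac(ptr, mod, key);   /* never exposed to software */
    return (pac_return == pac_enter) ? AUTH_OK : AUTH_INVSTATE_USAGEFAULT;
}
```

If the PAC values are assumed to be uniformly distributed, a single blind guess of a 32-bit code succeeds with probability 2^-32.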

Keys

The architecture provides four 128-bit keys, one for each security and privilege state, see table 2. Each key is stored in four 32-bit System registers.

	Current	Non-secure	Secure
Privileged	PAC_KEY_P	PAC_KEY_P_NS	PAC_KEY_P_S
Unprivileged	PAC_KEY_U	PAC_KEY_U_NS	PAC_KEY_U_S

Table 2: PAC keys

Privileged software can access the keys via MSR and MRS instructions and manage operations like key update and maintenance. Unprivileged software cannot directly access the PAC keys. We also support the general rules of TrustZone for Armv8-M and the following accesses are also permitted:

Secure software can access Secure and Non-secure keys.
Non-secure software can only access Non-secure keys.

A unique key is available in each security and privilege state, so software does not need to switch between the keys, the hardware does this automatically. However, most user software will operate in Non-secure unprivileged Thread mode, so we recommend that each thread is assigned its own unique PAC key. If an attacker attempts to form a chain of gadgets, then every PAC must be guessed correctly, otherwise an exception will be raised.
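
The per-thread key recommendation could be sketched as follows. This is a software model only: the key bank stands in for the banked key registers, and `thread_t` and `switch_to_thread` are hypothetical names for an RTOS thread control block and its context-switch hook.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { NON_SECURE = 0, SECURE = 1 } security_t;
typedef enum { UNPRIVILEGED = 0, PRIVILEGED = 1 } privilege_t;

typedef struct { uint32_t word[4]; } pac_key_t;    /* one 128-bit key */

/* Hypothetical model of the four banked keys:
 * key[NON_SECURE][UNPRIVILEGED] stands for PAC_KEY_U_NS, and so on. */
typedef struct { pac_key_t key[2][2]; } pac_key_bank_t;

/* Hypothetical RTOS thread control block holding a per-thread key. */
typedef struct { pac_key_t pac_key; } thread_t;

/* The key in use follows the current state; software never selects it. */
const pac_key_t *active_key(const pac_key_bank_t *bank,
                            security_t sec, privilege_t priv)
{
    assert(bank != NULL);
    return &bank->key[sec][priv];
}

/* Privileged context-switch code installs the incoming thread's key.
 * On real hardware this would be a sequence of MSR writes. */
void switch_to_thread(pac_key_bank_t *bank, const thread_t *next)
{
    assert(bank != NULL && next != NULL);
    bank->key[NON_SECURE][UNPRIVILEGED] = next->pac_key;
}
```

With a separate key per thread, a PAC observed in one thread does not authenticate in another, and the privileged key is untouched by the switch.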

New instructions

PAC: Sign LR using SP as the modifier and the PAC is stored in R12.
PACBTI: Sign LR using SP as the modifier and the PAC is stored in R12. When BTI is enabled this instruction is a valid “landing pad”, see the section on BTI for further details.
PACG: Sign a GPR Rn using Rm as the modifier and the PAC is stored in Rd.
AUT: Authenticate LR using SP as the modifier and PAC in R12, generates a synchronous INVSTATE UsageFault on authorization failure.
BXAUT: Indirect branch with pointer authentication. Branch to GPR Rn using Rm as the modifier and the PAC in Rd, generates a synchronous INVSTATE UsageFault on authorization failure.
AUTG: Authenticate a GPR Rn using Rm as the modifier and the PAC is in Rd, generates a synchronous INVSTATE UsageFault on authorization failure.

Note that PAC, PACBTI, PACG operations can interoperate with any AUT, BXAUT, AUTG instructions, provided that the same input arguments are used for creating and authenticating PAC.

Use of the NOP space

Some of the new pointer authentication instructions are in the NOP space. Applications or libraries that protect themselves with these NOP-space instructions can run on older processors without pointer authentication support. Although the older processors will not benefit from the protections, this can be very useful in heterogeneous systems.

The following instructions are NOP space compatible:

PAC.
PACBTI.
AUT.
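
The effect of the NOP-space encoding can be modeled in C by running the same two operations on a core with and without the extension. The `core_t` type, the function names, and the `placeholder_pac` function (a trivial stand-in, not QARMA and not secure) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     has_pacbti;    /* false models an older core */
    uint32_t key;           /* placeholder, far shorter than a real 128-bit key */
    uint32_t r12, lr, sp;
    bool     usage_fault;
} core_t;

/* Placeholder for the cryptographic mechanism. NOT QARMA, NOT secure. */
static uint32_t placeholder_pac(uint32_t ptr, uint32_t mod, uint32_t key)
{
    return ((ptr ^ key) * 2654435761u) ^ mod;
}

/* "PAC R12, LR, SP": a NOP on cores without the extension. */
void exec_pac(core_t *c)
{
    assert(c != NULL);
    if (c->has_pacbti) {
        c->r12 = placeholder_pac(c->lr, c->sp, c->key);
    }
}

/* "AUT R12, LR, SP": a NOP on cores without the extension. */
void exec_aut(core_t *c)
{
    assert(c != NULL);
    if (c->has_pacbti && c->r12 != placeholder_pac(c->lr, c->sp, c->key)) {
        c->usage_fault = true;
    }
}
```

On the older core the same code runs to completion but a tampered LR goes unnoticed, whereas the core with the extension raises the fault.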

Debugging

The PACBTI Extension can support software and external debuggers.

A privileged debugger can access the PAC keys using the debug register transfer mechanism.
Accesses to PAC keys from privileged debuggers that have been demoted by the Unprivileged Debug Extension (UDE) are prohibited.
A privileged debugger can enable and disable PAC by accessing CONTROL.PAC_EN and CONTROL.UPAC_EN.
Unprivileged debug access is permitted to the CONTROL.UPAC_EN which controls the PAC settings in unprivileged mode.

When and where do we use PAC?

Software vulnerability to ROP attacks can be demonstrated by looking at an example written in C. In this example we call a function and request an external input. Since there are no bounds checks, the user can enter a string of any length and overrun the allocated memory. This is a good example of very bad code, and it should be easy enough to catch. However, the compiler may not generate any warnings, so the programmer must be experienced enough to understand this weakness.

#include <stdio.h>

void callee(void){
    char username[12];      // Stored on the stack
    scanf("%s", username);  // If input is greater than the
                            // allocated array size, then
                            // the return pointer may be
                            // overwritten
}
void caller(void){
    callee();
}

When the C code is compiled for Cortex-M, depending on the optimization level, the assembly will look something like:

scanf: 
    ...                     ; Accept “external input” 
callee: 
    PUSH    {LR}            ; Stack return address 
    SUB     SP, SP, #12     ; Adjust the stack pointer 
    MOV     R1, SP          ; SP passed to “scanf” through R1 
    LDR     R0, .L3         ; .L3 holds pointer to “%s”	  
    BL      scanf 
    ADD     SP, SP, #12     ; Readjust the SP before return 
    POP     {PC}            ; If the “external input” exceeds the 
                            ; bounds, then the value loaded into 
                            ; the PC can be used in a ROP attack 
caller: 
    PUSH    {R3, LR} 
    BL      callee 
    POP     {R3, PC}

So, what can we do to mitigate this vulnerability? The obvious answer would be to fix the software, but not all cases are easy to fix, and this is where additional hardware mechanisms such as PAC can be used. The compiler can be directed to use the PAC functionality, applied to the “callee” function, and the resultant assembly will look something like:

callee: 
    PAC     R12, LR, SP     ; Sign the return address 
    PUSH    {R12, LR}       ; Stack PAC and return address 
    SUB     SP, SP, #16     ; Adjust the stack pointer 
    MOV     R1, SP          ; SP passed to “scanf” through R1 
    LDR     R0, .L3         ; .L3 holds pointer to “%s”	  
    BL      scanf 
    ADD     SP, SP, #16     ; Readjust the SP before return 
    POP     {R12, LR}       ; Restore the PAC and return address 
    BXAUT   R12, LR, SP     ; Validate LR and return to “caller”

Adding the PAC instruction to the start of the function ensures that any tampering with LR or R12 will be detected when the BXAUT instruction is executed. R12 must be stacked because there is no guarantee that the “scanf” function will preserve it. SP does not need to be stacked in this example, but PAC authentication will fail if the SP has changed from its original value.
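
The following C sketch simulates the protected "callee" frame in a byte array, so the overflow stays inside memory the model owns. The frame layout mirrors the assembly above (16 bytes of locals, then the stacked R12, then the stacked LR); `toy_pac` is a toy stand-in for the cryptographic mechanism (not QARMA, not secure), and the function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

/* Toy stand-in for C(PTR, MOD, Key). NOT QARMA, NOT secure. */
uint32_t toy_pac(uint32_t ptr, uint32_t mod, const uint32_t key[4])
{
    uint32_t h = 0u;
    for (int i = 0; i < 4; i++) {
        h = mix32(h ^ key[i]);
    }
    h = mix32(h ^ ptr);
    h = mix32(h ^ mod);
    return h;
}

/* Simulated frame: bytes 0..15 locals, 16..19 stacked R12, 20..23 stacked LR. */
enum { R12_OFFSET = 16, LR_OFFSET = 20, FRAME_BYTES = 24 };

/* Returns true if the final authentication passes, false if it would fault. */
bool callee_returns_safely(const uint8_t *input, size_t len,
                           uint32_t lr, uint32_t sp, const uint32_t key[4])
{
    assert(input != NULL && key != NULL);
    uint8_t frame[FRAME_BYTES] = { 0 };

    uint32_t r12 = toy_pac(lr, sp, key);            /* PAC   R12, LR, SP */
    memcpy(frame + R12_OFFSET, &r12, sizeof r12);   /* PUSH  {R12, LR}   */
    memcpy(frame + LR_OFFSET, &lr, sizeof lr);

    if (len > FRAME_BYTES) {
        len = FRAME_BYTES;      /* keep the simulated overflow inside the model */
    }
    memcpy(frame, input, len);  /* unchecked copy, like scanf("%s", ...) */

    memcpy(&r12, frame + R12_OFFSET, sizeof r12);   /* POP   {R12, LR}   */
    memcpy(&lr, frame + LR_OFFSET, sizeof lr);
    return toy_pac(lr, sp, key) == r12;             /* BXAUT R12, LR, SP */
}
```

Even an input that rewrites the stacked R12 with the correct PAC for the original return address fails authentication in this model once the stacked LR is redirected, because that PAC does not match the new pointer.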

While PAC is useful for catching pointer exploits, not all functions need to be protected. Functions are most vulnerable when loading a return pointer from the stack, and then branching to that address. In a typical function, one such example would be the return address held in the LR register. Function return will be accomplished via the “BX LR” instruction, and if the pointer has been tampered with, then the branch will not return to the intended caller function. However, in a leaf function where the LR is not stored and restored from the stack, such an attack is not feasible, and PAC protection is unnecessary.

The combined authenticate and branch instruction, BXAUT, prevents some compiler optimizations but also adds robustness. It is useful for two reasons: it improves code density and it eliminates some gadgets. However, we cannot eliminate all code density penalties, because the PAC instruction is still required and cannot be hidden away.

The BXAUT instruction can be substituted with AUT + BX instructions to produce backwards compatible code. Since the authentication operation and the return branch are separate instructions, a valid compiler optimization could rearrange the code to insert other instructions between the AUT and BX. This is perfectly acceptable if the validated LR is not stacked and restored, and the same applies to the AUTG instruction. However, any gap between authenticating the pointer and branching to it could expose a useful ROP or JOP gadget if PACBTI protection is not applied to the full software stack.

PACBTI is a new feature, and we cannot expect all software libraries to be immediately recompiled with this support, so hybrid software is likely to be used. PACBTI protection is intended for securing your own code, reducing the risk of exploits. However, the security of the rest of the system cannot be guaranteed, and some libraries and user code could remain exposed to ROP attacks.

Protecting memcpy

In the following example we demonstrate how typical code, which is vulnerable to ROP attacks, is protected through the PAC mechanism. Note that some code complexity has been hidden to aid readability.

Original

main:
    BL      memcpy
memcpy:
    PUSH    {R0, LR}
    WLS     LR, R2, loopEnd
loopStart:
    LDRSB   R3, [R1], #1
    STRB    R3, [R0], #1
    LE      LR, loopStart
loopEnd:
    POP     {R0, LR}
    BX      LR

Protected by PAC

main:
    BL      memcpy
memcpy:
    PAC     R12, LR, SP     ; Sign the pointer
    PUSH    {R0, LR}
    WLS     LR, R2, loopEnd
loopStart:
    LDRSB   R3, [R1], #1
    STRB    R3, [R0], #1
    LE      LR, loopStart
loopEnd:
    POP     {R0, LR}
    AUT     R12, LR, SP     ; Authenticate the pointer
    BX      LR

This example shows a simple “memcpy” function which utilizes three Armv8.1-M technologies, Helium, Low Overhead Branches (LOB), and PAC. The LOB operation uses LR for counting loop iterations, or vector elements in the case of Helium. Therefore, even a leaf function will need to stack LR if no scratch registers are available.

We can substitute the AUT and BX operations with a single instruction, BXAUT. This instruction is not in the NOP space, so any code compiled using this instruction will only function on CPUs that support PACBTI.

Backwards compatible solution

loopEnd: 
    POP     {R0, LR} 
    AUT     R12, LR, SP 
    BX      LR

Compact solution

loopEnd: 
    POP     {R0, LR} 
    BXAUT   R12, LR, SP

Comparing PAC: M-profile and A-profile

A-profile	M-profile
The PAC Extension and BTI Extension can be implemented independently.	The PACBTI Extension offers both PAC and BTI features. However, individual controls are provided for both features.
This feature is mandatory in Armv8.3 implementations.	PACBTI is an optional feature for Armv8.1-M.
Computed PAC is stored in the upper bits of the 64-bit virtual address.	Computed PAC is saved into a 32-bit GPR. The PAC and physical address are stored in separate registers.
The size of PAC ranges from 11 to 31 bits when tagged addresses are disabled, and from 3 to 23 bits when tagged addresses are enabled.	The PAC length is not configurable, it is fixed to 32 bits.
Five 128-bit keys are provided: two for instruction addresses, two for data addresses, and one for generic authentication.	Four 128-bit keys are provided, with no distinction between instruction and data addresses. No generic authentication key is provided.
PAC keys are not banked by Exception level.	PAC keys are available for each combination of the Security and privilege level.
PAC algorithm: QARMA or implementation defined.	PAC algorithm: QARMA or implementation defined, same as in A-profile.
Pointer authentication is enabled via SCTLR.	Pointer authentication is enabled via the CONTROL register. PAC can be enabled for each combination of the Security and privilege level.
As a part of the authorization process the authentication instruction performs one of the following: • Replaces the PAC with the extension bits if the pointer is validated. • Replaces the PAC with the extension bits and sets two bits of the extension to a fixed unique number. If the pointer is used by a branch, then the execution branches to an address that generates a Translation fault.	The authentication instruction performs one of the following steps: • If the pointer is validated, then there will be no side-effects. • If the PAC, pointer, modifier, or key do not match the original values, the instruction will generate a synchronous INVSTATE UsageFault.
On an authorization failure, some authentication instructions generate a synchronous exception, for example AUTIASP, while others may generate a Translation fault when the address is accessed, for example RETAA.	All authentication instructions generate a synchronous fault on authorization failure.
On authorization failure, a specific PAC exception is signaled.	On authorization failure, an INVSTATE UsageFault is generated.
The PAC is embedded in the pointer, so specific instructions like XPACI are used to strip the PAC from a pointer without authentication.	PAC is held in a separate GPR, so the register can be cleared independently. The pointer can be used without PAC authentication, it is a software choice.

Branch target identification

Branch Target Identification (BTI) can mitigate against some JOP attacks by creating an architectural dependency between certain indirect branch instructions and the instructions that they target. Indirect branches are vulnerable to JOP attacks as the pointers are frequently stored on the stack and if the stack is compromised then these pointers can be manipulated. By modifying the pointer an attacker can utilize existing indirect branches and jump to desired gadgets.

In AArch64, the CPU can be configured so that indirect branch instructions only target valid “landing pad” instructions within a select memory region, which is specified by the Guarded Page (GP) bit in the translation tables. The architecture can record the type of branch that targeted the landing pad, and both direct and indirect branches can be tracked. This is done through the BTYPE field in PSTATE, and three branch types can be identified: calls, jumps, and all branches.

Armv8.1-M only supports physical addressing, and there are no spare bits remaining in the MPU registers, so we cannot mark memory regions with a GP bit or equivalent. However, BTI can still function effectively without MPU support. We have introduced the EPSR.B bit, which records indirect branches. Unlike AArch64, we have chosen a subset of the indirect branches, and direct branches cannot be recorded. Direct branches use PC-relative addressing, and typical targets like function calls can be protected by PAC, so on the M-profile only jumps should be tracked by BTI.

These jump instructions are called “BTI setting” instructions, and when executed, they set EPSR.B to one. A “BTI clearing”, or “landing pad”, instruction clears EPSR.B to zero. When implemented correctly, the BTI setting instruction must always target a BTI clearing instruction, otherwise an INVSTATE UsageFault exception will be raised. The general Armv8.1-M BTI behavioral model is described in figure 3. Note that the Branch Future (BF) instructions notify the PE of an upcoming branch, and they do not directly modify EPSR.B. Instead, the LO_BRANCH_INFO.BTI is updated to indicate an upcoming BTI setting branch.

Figure 3: BTI behavior

A BTI exception is synchronously generated by fetching a non-BTI clearing instruction when EPSR.B is set to one. When the exception is generated, EPSR is stacked as normal, so the state of EPSR.B is captured. Before entering the handler, EPSR.B is cleared to zero, since BTI may not be enabled in the handler. The handler will terminate the thread since any authorization failure is a clear indication of tampering.
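
The exception entry behavior described above can be modeled in a few lines of C. The `cpu_t` and `exception_frame_t` types and the function name are hypothetical, and only the EPSR.B bit is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool epsr_b; } cpu_t;                      /* live EPSR.B */
typedef struct { bool stacked_epsr_b; } exception_frame_t;  /* copy on the stack */

/* Models exception entry: EPSR.B is captured in the stacked EPSR, then
 * cleared so that the handler's first instruction need not be a landing pad. */
exception_frame_t enter_exception(cpu_t *cpu)
{
    assert(cpu != NULL);
    exception_frame_t frame = { .stacked_epsr_b = cpu->epsr_b };
    cpu->epsr_b = false;
    return frame;
}
```

In this model the stacked copy preserves the value that EPSR.B held when the fault was taken, while the handler itself starts with EPSR.B clear.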

BTI setting instructions

We have added the BTI setting functionality to existing Armv8.1-M indirect branch instructions. If BTI is enabled for the target Security and privilege state, then the following instructions will be BTI setting:

BX, BXNS: Only when LR is not used.
BLX, BLXNS.
BFX: Only when LR is not used, updates LO_BRANCH_INFO.BTI.
BFLX: Updates LO_BRANCH_INFO.BTI.
LDR (register): Only when PC is updated by the instruction.
LDR (literal): Only when PC is updated by the instruction.
LDR (immediate): Only when PC is updated and the base address register is either not the SP or the SP and write-back of the SP does not occur.
LDM, LDMIA, LDMFD: Only when PC is updated and the base address register is either not the SP or the SP and write-back of the SP does not occur.
LDMDB, LDMEA: Only when PC is updated and the base address register is either not the SP or the SP and write-back of the SP does not occur.

The “BX LR” and “BFX LR” are not BTI setting instructions because these are frequently used for function returns, and the pointer can be protected by PAC authentication. The BTI setting instructions have been selected based on typical compiler generated code, so not all indirect branch instructions require the BTI setting functionality.
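
The rules in the list above can be written as a small C predicate. This is a sketch covering the branch and load forms only (BFX and BFLX update LO_BRANCH_INFO.BTI instead, so they are left out), and the `insn_t` encoding is a hypothetical simplification, not a real decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    OP_BX, OP_BXNS, OP_BLX, OP_BLXNS,
    OP_LDR_REGISTER, OP_LDR_LITERAL, OP_LDR_IMMEDIATE,
    OP_LDM, OP_LDMDB
} opcode_t;

enum { REG_SP = 13, REG_LR = 14 };

/* Hypothetical, simplified view of a decoded instruction. */
typedef struct {
    opcode_t op;
    int      reg;        /* BX/BLX: target register; loads: base register */
    bool     writes_pc;  /* loads only: the PC is in the destination set  */
    bool     writeback;  /* loads only: the base register is updated      */
} insn_t;

/* Returns true if the instruction sets EPSR.B when executed. */
bool is_bti_setting(const insn_t *insn, bool bti_enabled_for_target)
{
    assert(insn != NULL);
    if (!bti_enabled_for_target) {
        return false;
    }
    switch (insn->op) {
    case OP_BX:
    case OP_BXNS:
        return insn->reg != REG_LR;             /* "BX LR" is a return */
    case OP_BLX:
    case OP_BLXNS:
        return true;
    case OP_LDR_REGISTER:
    case OP_LDR_LITERAL:
        return insn->writes_pc;
    case OP_LDR_IMMEDIATE:
    case OP_LDM:
    case OP_LDMDB:
        /* Excludes stack pops such as "POP {PC}" (base SP with write-back). */
        return insn->writes_pc &&
               !(insn->reg == REG_SP && insn->writeback);
    }
    return false;
}
```

Under these rules "POP {PC}", which is a load multiple with SP as the base and write-back, is not BTI setting, while "LDR PC, [R4]" is.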

BTI clearing instructions

The following instructions are BTI clearing:

BTI.
SG.
PACBTI.

These valid landing pad instructions clear EPSR.B to zero whenever executed, which is important because common software constructs like functions or case statements can be called from any piece of code. This is particularly relevant for software libraries that are protected using BTI.

When EPSR.B is set, attempting to execute any instruction that is not a landing pad generates a fault. The only exception is the BKPT instruction, which is useful for debugging. The fault is generated on instruction fetch, so any JOP attempt to execute malicious code will be thwarted without architecturally visible side-effects.
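
The interaction between BTI setting and BTI clearing instructions can be sketched as a tiny state machine over an instruction trace. The `insn_kind_t` classification and function name are hypothetical, and the model assumes that BKPT leaves EPSR.B unchanged, which the text above does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    KIND_BTI_SETTING,    /* for example BLX Rm                        */
    KIND_BTI_CLEARING,   /* BTI, SG, or PACBTI                        */
    KIND_BKPT,           /* tolerated while EPSR.B is set             */
    KIND_OTHER           /* any other instruction                     */
} insn_kind_t;

/* Walks a trace in execution order and returns the index of the first
 * instruction whose fetch would fault, or -1 if the trace is clean. */
int first_bti_fault(const insn_kind_t *trace, size_t count)
{
    assert(trace != NULL || count == 0);
    bool epsr_b = false;
    for (size_t i = 0; i < count; i++) {
        insn_kind_t k = trace[i];
        if (epsr_b && k != KIND_BTI_CLEARING && k != KIND_BKPT) {
            return (int)i;                /* fault on fetch */
        }
        if (k == KIND_BTI_CLEARING) {
            epsr_b = false;
        } else if (k == KIND_BTI_SETTING) {
            epsr_b = true;
        }
        /* Assumption: BKPT and other instructions leave EPSR.B unchanged. */
    }
    return -1;
}
```

A trace in which an indirect branch lands directly on an ordinary instruction faults at that instruction, whereas one that lands on a landing pad continues normally.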

Use of the NOP space

The new instructions BTI and PACBTI are in the NOP space. Applications or libraries that protect themselves with these NOP-space instructions can run on older processors without BTI support. Although the older processors will not benefit from the protection, it can be useful in heterogeneous systems.

Debugging

BTI can support software and external debuggers.

A privileged debugger can enable and disable BTI by accessing CONTROL.BTI_EN and CONTROL.UBTI_EN.
Unprivileged debug access is permitted to the CONTROL.UBTI_EN, which controls the BTI settings in unprivileged mode.

Security state transitions

The TrustZone technology for Armv8-M describes the transitions between Secure and Non-secure software. Read more about TrustZone technology for Armv8-M architecture.

The PACBTI Extension introduces separate controls for enabling BTI in every Security and privilege level, see table 3. For example, user code compiled without BTI could call into a Secure library which is protected by BTI.

	Current	Non-secure	Secure
Privileged	CONTROL.BTI_EN	CONTROL_NS.BTI_EN	CONTROL_S.BTI_EN
Unprivileged	CONTROL.UBTI_EN	CONTROL_NS.UBTI_EN	CONTROL_S.UBTI_EN

Table 3: BTI controls

All the instructions used for Security state transitions also support BTI:

BTI setting

BXNS: Branch and Exchange Non-secure.
BLXNS: Branch with Link and Exchange Non-secure.

BTI clearing

SG: Secure Gateway.

When BTI is implemented and enabled, the behavior follows the Security state transition model shown in figure 4.

Figure 4: Security state transitions with BTI

Transition to Secure state

In this example we illustrate how TrustZone works with BTI. Enabling BTI does not change the assembly because software does not directly control the BTI architectural state, and existing instructions implicitly support this behavior.

non-secure:
    ...
    LDR     R4, =non-secure-callable
    ...
    BLX     R4      ; EPSR.B set to 1, BTI setting instruction
                    ; Indirect branch to SG
    ...             ; No BTI clearing instruction required
                    ; Return address of the Secure function
    ...
non-secure-callable:
    SG              ; EPSR.B set to 0, BTI clearing instruction
    B       secure  ; Not a BTI setting instruction
                    ; Direct branch to the Secure function
    ...
secure:
    ...             ; No BTI clearing instruction required
                    ; Function body
    BXNS    LR      ; Not a BTI setting instruction
                    ; Return to the non-secure function

This example shows the BTI behavior when it is enabled for the Non-secure state, but the code will remain the same even when BTI is disabled for the Non-secure state. The BTI settings for the Secure state are not relevant, because the access goes through an SG instruction, which is always BTI clearing. The return from Secure to Non-secure state does not trigger any BTI behavior since it is achieved through a “BXNS LR” instruction, which is not BTI setting.

Calling Non-secure software

In this example we demonstrate how BTI behavior can be added to Secure program calls to Non-secure functions.

BTI disabled for the Non-secure state

When BTI is disabled for the Non-secure state, the Secure software must ensure that BTI is not set when calling the Non-secure function. Secure BTI can be enabled or disabled. Since Secure software can access the Non-secure bank of the CONTROL register, it can always query the settings of the Non-secure state. Non-secure software, like libraries, may not be compiled with PACBTI, so Secure software must ensure that typical accesses to the Non-secure state continue to function correctly.

Secure software can call other Secure functions using the BLXNS instruction, and in these cases the BLXNS instruction will query the current, Secure, bank of the CONTROL register and determine whether the BTI setting functionality must be applied. If Secure BTI is enabled then BLXNS will set EPSR.B to one, otherwise EPSR.B will not be modified.

secure:
    ...
    LDR     R0, =non-secure
    ...
    BLXNS   R0      ; Instruction implicitly checks CONTROL_NS.UBTI_EN
                    ; EPSR.B unchanged, Non-secure BTI is disabled
    ...             ; Return address
non-secure:
    ...             ; No BTI clearing instruction required
    BX      LR      ; Not a BTI setting instruction
                    ; Return to the secure function

BTI enabled for the Non-secure state

When BTI is enabled for the Non-secure state, the Secure software will require BTI setting instructions to set EPSR.B when calling Non-secure functions. Secure BTI can be enabled or disabled. The Non-secure function must begin with a BTI clearing instruction, PACBTI when PAC protection is required, or BTI when PAC protection is not required.

secure:
    ...
    LDR     R0, =non-secure
    ...
    BLXNS   R0      ; Instruction implicitly checks CONTROL_NS.UBTI_EN
                    ; EPSR.B is set to 1, Non-secure BTI is enabled
    ...             ; Return address
non-secure:
    BTI             ; EPSR.B is set to 0, BTI clearing instruction
                    ; PACBTI can be used but FNC_RETURN is loaded from
                    ; the Secure stack, PAC may be redundant
    ...             ; Function body
    BX      LR      ; Not a BTI setting instruction
                    ; Return to the secure function
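
The two cases above can be summarized in a short C model: whether BLXNS sets EPSR.B depends on the BTI enable of the target state, and the callee only faults if EPSR.B is set and its first instruction is not a landing pad. The `bti_controls_t` type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { NON_SECURE = 0, SECURE = 1 } security_t;

/* Hypothetical model of the BTI enables: bti_en[security][privileged].
 * For example, bti_en[NON_SECURE][false] stands for CONTROL_NS.UBTI_EN. */
typedef struct { bool bti_en[2][2]; } bti_controls_t;

/* BLXNS consults the enable bit of the state it is branching into. */
bool blxns_sets_epsr_b(const bti_controls_t *ctl,
                       security_t target, bool target_privileged)
{
    assert(ctl != NULL);
    return ctl->bti_en[target][target_privileged ? 1 : 0];
}

/* The first fetch in the callee faults only if EPSR.B is set and the
 * callee does not begin with BTI, PACBTI, or SG. */
bool callee_entry_faults(bool epsr_b, bool starts_with_landing_pad)
{
    return epsr_b && !starts_with_landing_pad;
}
```

This is why a Non-secure library compiled without PACBTI keeps working while Non-secure BTI is disabled, and why it needs landing pads once Non-secure BTI is enabled.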

The PACBTI instruction

When BTI is enabled only a few instructions are valid landing pads, like PACBTI. During compilation, if a BTI setting instruction is substituted with a non-BTI setting instruction, by a Link Time Optimizer (LTO), then a BTI clearing instruction may not be necessary and the landing pad can be safely removed. If the LTO can verify that all instructions targeting a landing pad are no longer BTI setting, then the substitution can occur. Removing the landing pad will improve security as there will be fewer gadget entry points. To cover this scenario, and other instances where a PACBTI may be undesirable, the PACBTI instruction can be substituted with a PAC instruction.
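
The decision a link-time optimizer might make can be sketched as follows. This is a hypothetical model that assumes the optimizer can see every branch that targets the function; the types and function name are not taken from any real toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One branch that targets the function entry, as seen by the optimizer. */
typedef struct { bool bti_setting; } incoming_branch_t;

typedef enum { ENTRY_NONE, ENTRY_BTI, ENTRY_PAC, ENTRY_PACBTI } entry_insn_t;

/* Picks the entry instruction for a function. Assumes "callers" lists
 * every branch into the function (a closed-world view). */
entry_insn_t choose_entry(const incoming_branch_t *callers, size_t count,
                          bool needs_pac)
{
    assert(callers != NULL || count == 0);
    bool needs_landing_pad = false;
    for (size_t i = 0; i < count; i++) {
        if (callers[i].bti_setting) {
            needs_landing_pad = true;
        }
    }
    if (needs_landing_pad) {
        return needs_pac ? ENTRY_PACBTI : ENTRY_BTI;
    }
    /* No BTI setting branch reaches the entry: drop the landing pad. */
    return needs_pac ? ENTRY_PAC : ENTRY_NONE;
}
```

Once every caller has been turned into a direct branch, the function no longer needs to be a valid indirect branch target, so PACBTI can be demoted to PAC.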

Examples

Simple function

Adding a BTI at the start of the function ensures that even if an attacker can manipulate a pointer then jumping into the middle of “func” will fail, as the rules for BTI will be violated and the PE will generate an exception. In this example the ADD instruction could act as a useful gadget that an attacker might use.

Original

main:
    LDR     R4, =func
    LDR     PC, [R4]
func:
    ADD     R0, R1, R2
    BX      LR

Protected by BTI

main:
    LDR     R4, =func
    LDR     PC, [R4]    ; BTI setting
func:
    BTI                 ; BTI clearing
    ADD     R0, R1, R2
    BX      LR

Non-leaf function

This is an example of a function where the return address can be protected with PAC. We can also protect the entry point into this function by using a PACBTI instruction, which will ensure that an attacker cannot jump into the middle of the function body.

Original

func:
    PUSH    {R4-R6, LR}
    ...     ; Function body
    POP     {R4-R6, LR}
    BX      LR

Protected by PAC and BTI

func:
    PACBTI  R12, LR, SP
    PUSH    {R4-R6, R12, LR}
    ...     ; Function body
    POP     {R4-R6, R12, LR}
    BXAUT   R12, LR, SP

Branch future

The Branch Future family of instructions is designed to notify the processor of an upcoming branch. If BTI is enabled, the BFLX instruction will implicitly set LO_BRANCH_INFO.BTI to one. When execution reaches the BF branch point (the implicit branch), the processor will automatically set EPSR.B if the LO_BRANCH_INFO cache is valid. Since the LO_BRANCH_INFO cache might be cleared on an exception, the BFLX instruction does not directly update EPSR.B.
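
A small C model may help to show why EPSR.B ends up set on both paths. The `cpu_t` fields and function names are hypothetical, and the model simplifies by always invalidating the cache on an exception, whereas the architecture only says that it might be cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool bti_enabled;     /* BTI enabled for the target state           */
    bool epsr_b;
    bool lo_info_valid;   /* LO_BRANCH_INFO holds a pending branch      */
    bool lo_info_bti;     /* LO_BRANCH_INFO.BTI                         */
} cpu_t;

/* BFLX: records the upcoming branch, does not touch EPSR.B. */
void exec_bflx(cpu_t *cpu)
{
    assert(cpu != NULL);
    cpu->lo_info_valid = true;
    cpu->lo_info_bti = cpu->bti_enabled;
}

/* Simplification: every exception invalidates the cache. */
void take_exception(cpu_t *cpu)
{
    assert(cpu != NULL);
    cpu->lo_info_valid = false;
}

/* Reaching the branch point. Returns true if the implicit branch was taken,
 * false if the fallback "BLX R4" had to execute instead. */
bool reach_branch_point(cpu_t *cpu)
{
    assert(cpu != NULL);
    if (cpu->lo_info_valid) {
        if (cpu->lo_info_bti) {
            cpu->epsr_b = true;       /* implicit branch is BTI setting */
        }
        cpu->lo_info_valid = false;
        return true;
    }
    if (cpu->bti_enabled) {
        cpu->epsr_b = true;           /* fallback BLX is BTI setting */
    }
    return false;
}
```

Whether the implicit branch is taken or the fallback BLX runs, the callee is entered with EPSR.B set, so it must begin with a landing pad in both cases.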

Original

main: 
    LDR     R4, =func 
    BFLX    call, R4 
    ... 
call:
            ; Implicit call to func
            ;
            ; Fallback code
    BLX     R4 ; Call func 
    ... 
func:
    ...     ; Function body
    BX      LR

Protected by PAC and BTI

main: 
    LDR     R4, =func 
    BFLX    call, R4 
    ... 
call:       ; BTI setting 
            ; Implicit call to func 
            ; 
            ; Fallback code 
    BLX     R4 ; BTI setting 
    ... 
func: 
    PACBTI  R12, LR, SP  
    ...     ; Function body
    BXAUT   R12, LR, SP

Comparing BTI: M-profile and A-profile

A-profile	M-profile
The BTI Extension and PAC Extension can be implemented independently.	The PACBTI Extension offers both PAC and BTI features. However, individual controls are provided for both features.
This feature is mandatory in Armv8.5 implementations.	PACBTI is an optional feature for Armv8.1-M.
On executing an indirect branch, the type of indirect branch is recorded in PSTATE.BTYPE.	Only specific indirect branch instructions, BTI setting, set the EPSR.B.
There is no direct way of reading or writing to the PSTATE.BTYPE field.	EPSR.B can be accessed through MSR and MRS instructions by privileged software and privileged debuggers.
The architecture distinguishes between branches used for function calls and those used for non-function calls, like case statements. A generic “all” BTYPE is also permitted.	The architecture does not distinguish between BTI setting instructions.
Support for landing pads is enabled for each page, using the GP bit in the translation tables.	Memory is physically mapped and BTI is only controlled through EPSR.B.
The BTI instructions are NOPs in a non-guarded page.	BTI clearing instructions always clear EPSR.B, regardless of the BTI setting for the current Security and privilege state.
A BTI access violation to a guarded memory region will generate a Branch Target exception.	A BTI access violation occurs when a non-BTI clearing instruction is fetched. An INVSTATE UsageFault exception is generated.

Download the latest version of the Armv8-M Architecture Reference Manual.

