Certain versions of Arm 64-bit processors have features that can help provide control flow integrity and reduce gadget space, making software more robust in the face of attack. Pointer Authentication Codes (PAC) work by signing and verifying indirect branch targets, and Branch Target Identification (BTI) works by marking all valid indirect branch destinations. These technologies harden control flow by ensuring that modifications of control flow values are cryptographically verified and that control flow can only be transferred to valid locations. Details on how this works can be found in another Arm blog post on BTI and PAC.
This post skips the underlying implementation details and instead focuses on A-profile processors and the Linux ecosystem of C/C++ code, ELF, exception handling, and toolchains. The goal is to provide a pragmatic guide for enablement throughout that ecosystem. It is specifically aimed at C and C++ projects that may also contain intermixed assembly, since assembly code must be modified by hand to enable support. Other languages may or may not support these technologies at this time and are not discussed. All the examples were executed on a Linux machine with support for PAC and BTI. To test whether your machine supports PAC and BTI, run the following command:
```
$ cat /proc/cpuinfo | grep -E -o "bti|pac" | sort | uniq
bti
pac
```
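The same information is also available to a program at run time. The sketch below assumes AArch64 Linux with glibc's getauxval(): the kernel reports address authentication as HWCAP_PACA in AT_HWCAP and BTI as HWCAP2_BTI in AT_HWCAP2. The names decode_hwcaps, query_branch_protection_hw and struct branch_protection_hw are hypothetical, and the fallback bit values are copied from the arm64 <asm/hwcap.h> so that the decoding step compiles on any host.

```c
#include <stdbool.h>

#if defined(__linux__)
#include <sys/auxv.h>
#endif

/* Bit positions from the arm64 Linux <asm/hwcap.h>; defined here only as
 * a fallback when the system headers do not provide them. */
#ifndef HWCAP_PACA
#define HWCAP_PACA (1UL << 30)
#endif
#ifndef HWCAP2_BTI
#define HWCAP2_BTI (1UL << 17)
#endif

struct branch_protection_hw {
    bool pac; /* address authentication (what pac-ret relies on) */
    bool bti; /* branch target identification */
};

/* Pure decoding step: turn raw AT_HWCAP / AT_HWCAP2 words into flags. */
struct branch_protection_hw decode_hwcaps(unsigned long hwcap, unsigned long hwcap2)
{
    struct branch_protection_hw hw;
    hw.pac = (hwcap & HWCAP_PACA) != 0;
    hw.bti = (hwcap2 & HWCAP2_BTI) != 0;
    return hw;
}

/* Ask the running kernel; only meaningful on AArch64 Linux. */
struct branch_protection_hw query_branch_protection_hw(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return decode_hwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return decode_hwcaps(0, 0); /* not AArch64 Linux: report neither */
#endif
}
```

On a machine where the cpuinfo check prints both bti and pac, query_branch_protection_hw() should report both flags as true.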
Contemporary versions of both the gcc and clang compiler suites, their runtimes, and binutils support PAC and BTI. Enabling a C or C++ project is as simple as passing the compiler option -mbranch-protection=standard, which enables the standard set of PAC and BTI features. To help verify that the project is built with BTI, you can optionally pass the linker options -z force-bti and --fatal-warnings, written -Wl,-zforce-bti,--fatal-warnings when passed through the compiler driver.
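-z force-bti makes the linker warn about every input object file that does not declare BTI support, and --fatal-warnings turns those warnings into an error, so the build fails and names the offending object files.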
Additionally, you can check a produced ELF binary for support using readelf -n <binary>. For example, we can create an empty C file, compile it to an object file, and check the resulting object file for the special property flags:
```
$ touch empty.c
$ gcc -mbranch-protection=standard -c -o empty.o empty.c
$ readelf -n empty.o

Displaying notes found in: .note.gnu.property
  Owner                Data size        Description
  GNU                  0x00000010       NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: BTI, PAC
```
The "Properties" line indicates PAC and/or BTI support. The main obstacle to supporting PAC and BTI is standalone assembly: if a project uses it, the assembly must be instrumented by hand to provide this support.
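The flags readelf prints come from a GNU_PROPERTY_AARCH64_FEATURE_1_AND property (type 0xc0000000) inside an NT_GNU_PROPERTY_TYPE_0 note, where bit 0 means BTI and bit 1 means PAC. A minimal sketch of decoding that property, assuming the note's descriptor bytes have already been located (for example by walking the PT_NOTE segment) and are in host byte order; find_aarch64_features is a hypothetical helper, not a library API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc0000000u
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1u << 0)
#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC (1u << 1)

/* Walk the property array of an NT_GNU_PROPERTY_TYPE_0 descriptor.
 * Each property is: pr_type (u32), pr_datasz (u32), data, padding to 8. */
bool find_aarch64_features(const unsigned char *desc, size_t desc_size,
                           uint32_t *features)
{
    size_t off = 0;
    while (off + 8 <= desc_size) {
        uint32_t type, datasz;
        memcpy(&type, desc + off, 4);
        memcpy(&datasz, desc + off + 4, 4);
        off += 8;
        if (datasz > desc_size - off)
            return false; /* truncated or malformed descriptor */
        if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND && datasz == 4) {
            memcpy(features, desc + off, 4);
            return true;
        }
        /* On ELF64, property data is padded to an 8-byte boundary. */
        off += ((size_t)datasz + 7u) & ~(size_t)7u;
    }
    return false; /* no AArch64 feature property present */
}
```

A result with bit 0 set corresponds to "AArch64 feature: BTI", bit 1 to "PAC".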
The simplest way to enable assembly is to rewrite it using a combination of C/C++, intrinsics, and inline assembly if needed. Modern compilers are very capable of generating optimized routines that are often better than hand-coded assembly. However, certain use cases may dictate otherwise, and existing or new assembly then needs to be modified in three specific ways: adding BTI landing pads at indirect branch targets, adding PAC instructions that sign and verify the link register, and adding the GNU notes section that declares BTI and PAC support.
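As an illustration of the rewrite-in-C route, here is a sketch of the example program's two assembly routines expressed in C. The names call_function_c, my_jump_c and jump_count are hypothetical, and the counter exists only so the behavior can be observed. When this is compiled with -mbranch-protection=standard, the compiler is responsible for emitting the landing pads, the PAC instructions, and the GNU property note, so no manual instrumentation is needed.

```c
#include <stdio.h>

static int jump_count; /* lets a caller observe that the target ran */

/* C stand-in for my_jump: prints the same message as the assembly. */
void my_jump_c(void)
{
    puts("Hello From My Jump!");
    jump_count++;
}

/* C stand-in for call_function: an indirect call through func.
 * The compiler picks a branch form compatible with the target's
 * landing pad and handles return-address signing itself. */
void call_function_c(void (*func)(void))
{
    func();
}
```

Calling call_function_c(my_jump_c) prints "Hello From My Jump!" once per call.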
We will use the following example program containing both C and assembly sources. The C code calls an assembly routine named call_function, which takes a function pointer as its first argument and jumps to it. Create the files indicated below. The source code examples can also be found at https://gitlab.arm.com/pac-and-bti-blog/blog-example. The repository is annotated with tags, and each example below is labeled with its tag name via the "Tag" keyword.
Tag: Example-1
main.c:
call_function.S:
```asm
    .section .rodata
    .align 3
.Lstring:
    .string "Hello From My Jump!"

    .section .text
    .global my_jump
    .global call_function

my_jump:
    // puts can modify registers, so push the frame pointer and the
    // return address in x30 to the stack
    stp x29, x30, [sp, #-16]!
    // Print "Hello From My Jump!" using puts.
    adrp x0, .Lstring            // Get the page the string is within
    add x0, x0, :lo12:.Lstring   // Get the page offset (handles relocations ADD_ABS_LO12_NC)
    bl puts                      // puts prints the string in x0
    ldp x29, x30, [sp], #16
    ret

// Function prototype
// void call_function(void (*func)())
call_function:
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    mov x29, sp
    // x0 is the caller's first argument, so jump to the "function"
    // pointed to by x0, with the link register set to return_loc so
    // that the target returns there
    adr lr, return_loc
    br x0  // intentionally avoiding a branch and link, you'll see why later.
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16
    // Return from the function
    ret
```
Makefile:
To compile and run the example code, execute:
```
$ make
$ ./main
Hello From My Jump!
```
The first step in enabling BTI is to turn it on in the compiler. In this case, we use -mbranch-protection=bti so that we only get the BTI instructions and not PAC. We also add the linker flags that force an error if BTI is not enabled in an ELF object file. The Makefile compiles all the examples, with differing sets of CFLAGS and LDFLAGS.
Perform the following to compile the code with BTI support:
```
$ CFLAGS="-mbranch-protection=bti" LDFLAGS='-Wl,-zforce-bti,--fatal-warnings' make
cc -mbranch-protection=bti -c -o main.o main.c
cc -mbranch-protection=bti -c -o call_function.o call_function.S
cc -mbranch-protection=bti -Wl,-zforce-bti,--fatal-warnings -o main main.o call_function.o
/usr/bin/ld: call_function.o: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section.
collect2: error: ld returned 1 exit status
make: *** [Makefile:7: main] Error 1
```
As designed, the linker errored and reported that BTI is not enabled in call_function.o, the object file built from assembly. The next step is enabling BTI, and a convenient way of doing so is with the C preprocessor, which gives assembly files with the .S extension conditional compilation and #include. It lets us include BTI support only when it has been requested. BTI could also be included unconditionally: the linker discards the GNU note section flags when the object is combined with other object files that do not declare BTI in their GNU notes section, and the BTI instructions then execute as NOPs, although you still pay the cycle cost of each NOP. With that stated, let's create a header file and include it in our assembly so that the BTI instructions are enabled only when compiled, assembled, and linked explicitly with support. Documentation on the feature test macros to use can be found in Arm's developer documentation on Feature Test Macros.
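The same conditional-compilation test that the header below relies on also works in ordinary C code. A minimal sketch, in which EXAMPLE_BTI_ENABLED and the example_* functions are hypothetical names: it reports whether the translation unit was compiled with BTI and what the BTI_C and BTI_J macros would expand to under the same conditions.

```c
#include <stdbool.h>

/* Same test as the aarch64.h header: ACLE defines __ARM_FEATURE_BTI_DEFAULT
 * to 1 when the compiler was asked for BTI (e.g. -mbranch-protection=bti). */
#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
#define EXAMPLE_BTI_ENABLED 1
#else
#define EXAMPLE_BTI_ENABLED 0
#endif

/* Was this translation unit built with BTI? */
bool example_built_with_bti(void)
{
    return EXAMPLE_BTI_ENABLED == 1;
}

/* What BTI_C would expand to in assembly: "bti c" or nothing. */
const char *example_call_pad(void)
{
    return EXAMPLE_BTI_ENABLED ? "bti c" : "";
}

/* What BTI_J would expand to in assembly: "bti j" or nothing. */
const char *example_jump_pad(void)
{
    return EXAMPLE_BTI_ENABLED ? "bti j" : "";
}
```

Compiling this file with and without -mbranch-protection=bti flips the results, which is a quick way to confirm that the flags actually reach a given translation unit.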
Tag: Example-2
aarch64.h:

```c
#ifndef _AARCH_64_H_
#define _AARCH_64_H_

/*
 * References:
 *  - https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros
 *  - https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
 */

#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
#define BTI_J bti j                 /* for jumps, i.e. br instructions */
#define BTI_C bti c                 /* for calls, i.e. blr instructions */
#define GNU_PROPERTY_AARCH64_BTI 1  /* bit 0 of the GNU Notes feature word is BTI support */
#else
#define BTI_J
#define BTI_C
#define GNU_PROPERTY_AARCH64_BTI 0
#endif

/* Add the BTI support to the GNU Notes section */
#if GNU_PROPERTY_AARCH64_BTI != 0
.pushsection .note.gnu.property, "a"; /* Start a new allocatable section */
.balign 8;                            /* align it on an 8-byte boundary */
.long 4;                              /* size of "GNU\0" */
.long 0x10;                           /* size of descriptor */
.long 0x5;                            /* NT_GNU_PROPERTY_TYPE_0 */
.asciz "GNU";
.long 0xc0000000;                     /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
.long 4;                              /* Four bytes of data */
.long GNU_PROPERTY_AARCH64_BTI;       /* BTI is enabled */
.long 0;                              /* padding for 8 byte alignment */
.popsection;                          /* end the section */
#endif

#endif
```
Now that the header file is in place, let's augment the assembly file.
```asm
#include "aarch64.h"

    .section .rodata
    .align 3
.Lstring:
    .string "Hello From My Jump!"

    .section .text
    .global my_jump
    .global call_function

my_jump:
    // puts can modify registers, so push the frame pointer and the
    // return address in x30 to the stack
    stp x29, x30, [sp, #-16]!
    // Print "Hello From My Jump!" using puts.
    adrp x0, .Lstring            // Get the page the string is within
    add x0, x0, :lo12:.Lstring   // Get the page offset (handles relocations ADD_ABS_LO12_NC)
    bl puts                      // puts prints the string in x0
    ldp x29, x30, [sp], #16
    ret

// Function prototype
// void call_function(void (*func)())
call_function:
    BTI_C
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    mov x29, sp
    // x0 is the caller's first argument, so jump to the "function"
    // pointed to by x0, with the link register set to return_loc so
    // that the target returns there
    adr lr, return_loc
    br x0  // intentionally avoiding a branch and link, you'll see why later.
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16
    // Return from the function
    ret
```
As a reminder, main.c and the Makefile require no modifications, so they are not shown. However, we need to clean and rebuild:
```
$ make clean
$ LDFLAGS='-Wl,-zforce-bti,--fatal-warnings' CFLAGS="-mbranch-protection=bti" make
cc -mbranch-protection=bti -c -o main.o main.c
cc -mbranch-protection=bti -c -o call_function.o call_function.S
cc -mbranch-protection=bti -c -o my_jump.o my_jump.S
cc -mbranch-protection=bti -Wl,-zforce-bti,--fatal-warnings -o main main.o call_function.o my_jump.o
```
Notice that the linker no longer warns about inputs missing BTI in their NOTE section. We can now check that the BTI bit is set in the linked ELF executable:
```
$ readelf -n main

Displaying notes found in: .note.gnu.property
  Owner                Data size        Description
  GNU                  0x00000010       NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: BTI
```
We can also execute the program:
```
$ ./main
Illegal instruction (core dumped)
```
Wait, it did not work. Why? The example intentionally omitted the bti j landing pad at the target of the jump. Since the ELF GNU notes declared BTI support, the loader mapped the executable pages with PROT_BTI, so the br into my_jump, whose first instruction is not a valid landing pad, raised a branch target exception that Linux reports as an illegal instruction. This is BTI working as designed. Now, let's add the landing pad to my_jump.
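Roughly speaking, on Linux the kernel and, for dynamically linked programs, the dynamic linker apply PROT_BTI to the executable segments of binaries whose GNU property note declares BTI, provided the CPU supports it. The sketch below is a simplified model of that decision only, not the loader's actual code; exec_segment_prot and FEATURE_1_BTI are hypothetical names, and the fallback PROT_BTI value (0x10) is taken from the arm64 <asm/mman.h>.

```c
#include <stdint.h>
#include <sys/mman.h>

/* PROT_BTI is an arm64-specific flag; this fallback definition only lets
 * the sketch compile on other targets. */
#ifndef PROT_BTI
#define PROT_BTI 0x10
#endif

#define FEATURE_1_BTI (1u << 0) /* GNU_PROPERTY_AARCH64_FEATURE_1_BTI */

/* Model of the protection chosen for one executable segment: pages are
 * guarded only if the note declares BTI and the hardware supports it. */
int exec_segment_prot(uint32_t feature_1_and, int hw_has_bti)
{
    int prot = PROT_READ | PROT_EXEC;
    if (hw_has_bti && (feature_1_and & FEATURE_1_BTI))
        prot |= PROT_BTI; /* indirect branches into these pages are checked */
    return prot;
}
```

With the note present and BTI hardware, the segment gains PROT_BTI, which is exactly why the missing bti j in my_jump now faults.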
Tag: Example-3
```asm
#include "aarch64.h"

    .section .rodata
    .align 3
.Lstring:
    .string "Hello From My Jump!"

    .section .text
    .global my_jump
    .global call_function

my_jump:
    BTI_J
    // puts can modify registers, so push the frame pointer and the
    // return address in x30 to the stack
    stp x29, x30, [sp, #-16]!
    // Print "Hello From My Jump!" using puts.
    adrp x0, .Lstring            // Get the page the string is within
    add x0, x0, :lo12:.Lstring   // Get the page offset (handles relocations ADD_ABS_LO12_NC)
    bl puts                      // puts prints the string in x0
    ldp x29, x30, [sp], #16
    ret

// Function prototype
// void call_function(void (*func)())
call_function:
    BTI_C
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    mov x29, sp
    // x0 is the caller's first argument, so jump to the "function"
    // pointed to by x0, with the link register set to return_loc so
    // that the target returns there
    adr lr, return_loc
    br x0  // Later has arrived: this br is why my_jump needs bti j.
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16
    // Return from the function
    ret
```
Then we can re-build the code and run the executable as follows:
```
$ make clean
$ CFLAGS="-mbranch-protection=bti" make
$ ./main
Hello From My Jump!
```
One thing to note is when to use bti j versus bti c. Generally speaking, functions called from C/C++ are reached with a branch-and-link instruction and should use bti c; only the indirect form (blr) is actually checked, but any function whose address can escape may be reached that way. Assembly, by contrast, needs to be audited to understand how each entry point is reached. It is still useful to audit the C/C++ code as well, with something like objdump -d or by having gcc emit assembly. Let's generate the assembly for main.c and verify how it calls call_function:
```
$ gcc -mbranch-protection=bti -S -o main.S main.c
```
Then let's review the generated assembly:
```asm
    .arch armv8-a
    .file   "main.c"
    .text
    .align  2
    .global main
    .type   main, %function
main:
.LFB0:
    .cfi_startproc
    hint    34 // bti c
    stp     x29, x30, [sp, -32]!
    .cfi_def_cfa_offset 32
    .cfi_offset 29, -32
    .cfi_offset 30, -24
    mov     x29, sp
    adrp    x0, call_function
    add     x0, x0, :lo12:call_function
    str     x0, [sp, 24]
    ldr     x1, [sp, 24]
    adrp    x0, my_jump
    add     x0, x0, :lo12:my_jump
    blr     x1
    mov     w0, 0
    ldp     x29, x30, [sp], 32
    .cfi_restore 30
    .cfi_restore 29
    .cfi_def_cfa_offset 0
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (GNU) 14.2.1 20240801 (Red Hat 14.2.1-1)"
    .section .note.GNU-stack,"",@progbits
    .section .note.gnu.property,"a"
    .align  3
    .word   4
    .word   16
    .word   5
    .string "GNU"
    .word   3221225472
    .word   4
    .word   1
    .align  3
```
Notice that control transfers to call_function through blr, so that routine needs bti c. The transfer to my_jump, however, goes through br and thus needs bti j. To summarize: an indirect branch with link is classified as a call, and a plain indirect branch is classified as a jump. It is also important to note that ret does not need a BTI landing pad, even though it is conceptually an indirect branch through the link register; that is why return_loc, which my_jump reaches with ret, has none. If an indirect branch (br) were used to return to return_loc, it would need a bti j landing pad. However, that approach would also increase the number of bti landing pads, and therefore the number of entry points in the code that could be called or jumped to, enlarging the gadget space available to a potential attacker.
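The call/jump rule can be expressed directly on A64 instruction encodings. This sketch, with hypothetical names, classifies the register-indirect forms br, blr and ret and reports which landing pad the branch target needs. It also encodes one architectural detail: br through x16 or x17 is accepted by bti c as well as bti j, which is how PLT stubs and linker veneers can reach functions that carry only bti c. The pointer-authenticated branch forms (braa, blraa and so on) are deliberately ignored.

```c
#include <stdint.h>

/* Which landing pad the target of an indirect branch must carry. */
enum landing_pad {
    PAD_NONE,   /* not checked by BTI (ret, direct branches, non-branches) */
    PAD_C,      /* bti c or bti jc */
    PAD_J,      /* bti j or bti jc */
    PAD_C_OR_J  /* any of bti c, bti j, bti jc */
};

/* A64 register-indirect branches: the register Rn sits in bits [9:5]. */
#define A64_INDIRECT_MASK 0xfffffc1fu
#define A64_BR            0xd61f0000u /* br  xN */
#define A64_BLR           0xd63f0000u /* blr xN */
#define A64_RET           0xd65f0000u /* ret xN */

/* Classify one instruction word by the landing pad its target needs. */
enum landing_pad required_landing_pad(uint32_t insn)
{
    switch (insn & A64_INDIRECT_MASK) {
    case A64_BLR:
        return PAD_C; /* branch with link: an indirect call */
    case A64_BR: {
        uint32_t rn = (insn >> 5) & 0x1fu;
        /* br x16/x17 is also accepted by bti c (PLT stubs, veneers) */
        return (rn == 16 || rn == 17) ? PAD_C_OR_J : PAD_J;
    }
    case A64_RET: /* ret is not checked by BTI */
    default:
        return PAD_NONE;
    }
}
```

Applied to the listing above, the blr x1 in main (0xd63f0020) yields PAD_C for call_function, and the br x0 in call_function (0xd61f0000) yields PAD_J for my_jump.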
The bti jc instruction would be valid at both call_function and my_jump, but it is best to limit each landing pad to how the program actually reaches it, since that limits the attacker's options. If an entry point serves as both a jump and a call target, then it is appropriate to mark it with bti jc.
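To make the trade-off concrete, the sketch below models which landing pads accept which branches. BTI lives in the HINT instruction space, which is why GCC printed "hint 34 // bti c" earlier and why cores without BTI execute these instructions as NOPs. The names bti_accepts and enum branch_kind are hypothetical, and the model ignores PACIASP/PACIBSP, which can also act as landing pads in some configurations.

```c
#include <stdbool.h>
#include <stdint.h>

/* HINT #imm encodes as 0xd503201f | (imm << 5).
 * bti = hint 32, bti c = hint 34, bti j = hint 36, bti jc = hint 38. */
#define A64_HINT(imm) (0xd503201fu | ((uint32_t)(imm) << 5))
#define A64_BTI_C  A64_HINT(34)
#define A64_BTI_J  A64_HINT(36)
#define A64_BTI_JC A64_HINT(38)

/* How the branch arrived (a simplified view of the architectural BTYPE). */
enum branch_kind {
    BRANCH_BR_X16_X17, /* br x16 / br x17 */
    BRANCH_BLR,        /* blr xN: an indirect call */
    BRANCH_BR_OTHER    /* br xN with any other register: a jump */
};

/* Does the first instruction at a guarded target accept this branch? */
bool bti_accepts(uint32_t target_insn, enum branch_kind kind)
{
    switch (target_insn) {
    case A64_BTI_C:
        return kind == BRANCH_BLR || kind == BRANCH_BR_X16_X17;
    case A64_BTI_J:
        return kind == BRANCH_BR_OTHER || kind == BRANCH_BR_X16_X17;
    case A64_BTI_JC:
        return true; /* widest pad: accepts every indirect branch */
    default:
        return false; /* plain bti, or not a landing pad at all */
    }
}
```

bti jc accepts every case that bti c or bti j accepts on its own, which is precisely the extra reach an attacker gains when it is used where only one kind of branch is needed.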
Note that the GNU notes section is mandatory for BTI support, even if the BTI instructions are present. As shown above, when we linked in the assembly object files, the linker reported that they did not indicate the required BTI support, because they lacked the GNU notes section. Whether the toolchain is linking an executable or a shared object, every input object file must declare the support; otherwise the support is dropped from the GNU notes of the linker-produced binary.

It is important to understand the implications of this behavior. When the loader maps the binary into memory, it checks the GNU notes section for this support bit to decide which memory protections to apply. The protection is PROT_BTI, an mprotect/mmap flag that enables BTI enforcement for that memory mapping. If the GNU notes section lacks the BTI flag, BTI protection is not enabled for that memory region. Consider a program that loads multiple shared libraries: this scheme lets BTI-aware shared libraries coexist with non-BTI ones while still providing some protection. Namely, when a control flow change is directed into memory marked PROT_BTI, the landing-pad checks are enforced; when control is transferred into non-BTI memory, no check is made and any BTI instructions present execute as NOPs. With static linking, a single object file without the note disables BTI for the whole linked binary.
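The "one missing object disables it everywhere" behavior follows from the property's name: GNU_PROPERTY_AARCH64_FEATURE_1_AND is combined with a bitwise AND across all linker inputs. A minimal sketch of that merge, in which merge_feature_1_and is a hypothetical helper, an object file without the note is modeled as contributing 0, and overriding options such as -z force-bti are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define FEATURE_1_BTI (1u << 0) /* GNU_PROPERTY_AARCH64_FEATURE_1_BTI */
#define FEATURE_1_PAC (1u << 1) /* GNU_PROPERTY_AARCH64_FEATURE_1_PAC */

/* AND the feature words of every input; a missing note counts as 0. */
uint32_t merge_feature_1_and(const uint32_t *inputs, size_t n)
{
    if (n == 0)
        return 0;
    uint32_t merged = UINT32_MAX;
    for (size_t i = 0; i < n; i++)
        merged &= inputs[i]; /* any input lacking a bit clears it */
    return merged;
}
```

For example, merging {BTI | PAC, 0, BTI} gives 0: the single unmarked object removes BTI from the whole output.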
Enabling PAC follows the same logical steps as BTI. However, the GNU notes field is optional for PAC; it is useful for auditing purposes, though, and we recommend adding it. The flag can be omitted because, unlike BTI, PAC on Linux is currently purely a callee-side ABI that requires no changes to memory permissions: each function signs the link register on entry and verifies it before returning. Starting from the most recent assembly sources, let's add PAC support.
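The SIGN_LR/VERIFY_LR selection in the header below is driven by ACLE's __ARM_FEATURE_PAC_DEFAULT, where bit 0 selects the A key and bit 1 the B key (the B key can be requested with, for example, -mbranch-protection=pac-ret+b-key). A small C sketch of the same decision, so it can be checked outside the assembler; struct pac_insns, pac_insns_for and pac_insns_for_this_build are hypothetical names.

```c
#include <stddef.h>

/* The instruction pair a function uses to protect its return address. */
struct pac_insns {
    const char *sign;   /* executed on function entry */
    const char *verify; /* executed just before ret */
};

/* Mirror of the header's #if chain for a given macro value. */
struct pac_insns pac_insns_for(int pac_default)
{
    struct pac_insns r = { NULL, NULL };
    if (pac_default & 1) {        /* A key */
        r.sign = "paciasp";
        r.verify = "autiasp";
    } else if (pac_default & 2) { /* B key */
        r.sign = "pacibsp";
        r.verify = "autibsp";
    }
    return r; /* both NULL: PAC return-address signing not requested */
}

/* The choice for the flags this translation unit was compiled with. */
struct pac_insns pac_insns_for_this_build(void)
{
#if defined(__ARM_FEATURE_PAC_DEFAULT)
    return pac_insns_for(__ARM_FEATURE_PAC_DEFAULT);
#else
    return pac_insns_for(0);
#endif
}
```

Both instructions use SP as the modifier, so the stack pointer must be the same at the sign and verify points, as it is in call_function once its frame has been popped.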
Tag: Example-4
aarch64.h:
```c
#ifndef _AARCH_64_H_
#define _AARCH_64_H_

/*
 * References:
 *  - https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros
 *  - https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
 */

#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
#define BTI_J bti j                 /* for jumps, i.e. br instructions */
#define BTI_C bti c                 /* for calls, i.e. blr instructions */
#define GNU_PROPERTY_AARCH64_BTI 1  /* bit 0 of the GNU Notes feature word is BTI support */
#else
#define BTI_J
#define BTI_C
#define GNU_PROPERTY_AARCH64_BTI 0
#endif

#if defined(__ARM_FEATURE_PAC_DEFAULT)
#if __ARM_FEATURE_PAC_DEFAULT & 1
#define SIGN_LR paciasp             /* sign with the A key */
#define VERIFY_LR autiasp           /* verify with the A key */
#elif __ARM_FEATURE_PAC_DEFAULT & 2
#define SIGN_LR pacibsp             /* sign with the B key */
#define VERIFY_LR autibsp           /* verify with the B key */
#endif
#define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 /* bit 1 of the GNU Notes feature word is PAC support */
#else
#define SIGN_LR
#define VERIFY_LR
#define GNU_PROPERTY_AARCH64_POINTER_AUTH 0
#endif

/* Add the BTI and/or PAC support to the GNU Notes section */
#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0
.pushsection .note.gnu.property, "a"; /* Start a new allocatable section */
.balign 8;                            /* align it on an 8-byte boundary */
.long 4;                              /* size of "GNU\0" */
.long 0x10;                           /* size of descriptor */
.long 0x5;                            /* NT_GNU_PROPERTY_TYPE_0 */
.asciz "GNU";
.long 0xc0000000;                     /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
.long 4;                              /* Four bytes of data */
.long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); /* BTI or PAC is enabled */
.long 0;                              /* padding for 8 byte alignment */
.popsection;                          /* end the section */
#endif

#endif
```
```asm
#include "aarch64.h"

    .section .rodata
    .align 3
.Lstring:
    .string "Hello From My Jump!"

    .section .text
    .global my_jump
    .global call_function

my_jump:
    BTI_J
    // puts can modify registers, so push the frame pointer and the
    // return address in x30 to the stack
    stp x29, x30, [sp, #-16]!
    // Print "Hello From My Jump!" using puts.
    adrp x0, .Lstring            // Get the page the string is within
    add x0, x0, :lo12:.Lstring   // Get the page offset (handles relocations ADD_ABS_LO12_NC)
    bl puts                      // puts prints the string in x0
    ldp x29, x30, [sp], #16
    ret

// Function prototype
// void call_function(void (*func)())
call_function:
    BTI_C
    SIGN_LR
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    mov x29, sp
    // x0 is the caller's first argument, so jump to the "function"
    // pointed to by x0, with the link register set to return_loc so
    // that the target returns there
    adr lr, return_loc
    br x0  // Later has arrived: this br is why my_jump needs bti j.
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16
    // Return from the function
    VERIFY_LR
    ret
```
Then compile the code with -mbranch-protection=pac-ret, which enables only PAC return-address signing, without BTI:
```
$ make clean
$ CFLAGS="-mbranch-protection=pac-ret" make
$ ./main
Hello From My Jump!
```
Since the PAC support bit was added to the GNU Notes section, readelf should indicate PAC support.
```
$ readelf -n main

Displaying notes found in: .note.gnu.property
  Owner                Data size        Description
  GNU                  0x00000010       NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: PAC
```
In Part 1 of the series, we covered how to enable both PAC and BTI for generic C and assembly projects, as well as how to detect whether the features are enabled for a given executable. In Part 2 of the series, we will look at various optimizations we can make to the current example code.