In Part 2, we looked at enabling PAC and BTI together, optimizations, and the hint space. In Part 3, we look at C++ style exception handling, how DWARF interacts with runtimes to provide it, and the modifications needed to support PAC. We also look at using the other signing key available for PAC and adding support for it in the assembly code.
If you want to support exception handling across assembly routines, you must add the CFI directives to do so. CFI, or Call Frame Information, is expressed through a set of assembler directives that generate the DWARF data needed to unwind the call frames and stack when a C++ exception occurs. DWARF itself is a Turing complete stack-based virtual machine, and the CFI directives can be thought of as programming that virtual machine. The DWARF code is executed to generate the data required for handling exceptions. Let's modify our program to throw an exception and ensure it gets handled.
Tag: Example-7
Makefile:
call_function.S:
#include "aarch64.h"

.section .text
.global call_function

// Function prototype
// void call_function(void (*func)())
call_function:
    .cfi_startproc
    SIGN_LR
    CFI_WINDOW_SAVE
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    .cfi_def_cfa_offset 16
    .cfi_offset 29, -16
    .cfi_offset 30, -8
    mov x29, sp

    // x0 is the caller's first argument, so call
    // the "function" pointed to by x0; blr saves
    // the return address in the link register (x30)
    blr x0
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16
    .cfi_restore 30
    .cfi_restore 29
    .cfi_def_cfa_offset 0

    // Return from the function
    VERIFY_LR
    ret
    .cfi_endproc
main.cpp:
Now we need to compile and run the C++ example:
make clean
CXXFLAGS="-mbranch-protection=standard" make
./main
Throwing exception...
Caught exception: 42
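For readers following along without the repository, here is a minimal sketch of C++ code that would produce the output above. It is an assumption, not the article's main.cpp: the names throw_exception and run_example are hypothetical, the thrown type is assumed to be int, and call_function is given a plain C++ stand-in so the sketch builds on any platform. In the real example, call_function is provided by call_function.S.

```cpp
#include <cassert>
#include <iostream>

// Stand-in for the assembly routine (assumption: in the real example this
// symbol comes from call_function.S, not from C++).
extern "C" void call_function(void (*func)()) {
    assert(func != nullptr);  // guard for the stand-in only
    func();                   // the assembly version does this with blr x0
}

// Hypothetical routine that throws; the exception has to unwind through
// call_function before it reaches the handler below.
static void throw_exception() {
    std::cout << "Throwing exception..." << std::endl;
    throw 42;
}

// Runs the scenario and returns the caught value, or -1 if nothing was thrown.
int run_example() {
    try {
        call_function(throw_exception);
    } catch (int value) {
        std::cout << "Caught exception: " << value << std::endl;
        return value;
    }
    return -1;
}
```

Calling run_example() from main exercises the same path as the article's program: the exception crosses the call_function frame on its way to the catch block, which is exactly where the CFI data is needed once that frame is written in assembly.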
The major difference between this and our previous examples is that main.c has been replaced by main.cpp so that we can use C++ exceptions; main.c is no longer needed and can be removed. We also modified call_function to call the C++ routine that throws an exception by using blr rather than br, so my_jump is no longer needed. Additionally, the code was augmented with the required CFI directives. Note that clang and gcc will output the CFI directives when generating assembly from C/C++ code with the -S option. We can now examine how to propagate an exception through an assembly layer so various parts of the runtime can make use of it.

An important part of using CFI directives is understanding the meaning of "CFA". The CFA, or Canonical Frame Address, is what the DWARF system, and ultimately the unwinder, uses to unwind the call stack. Debuggers also make use of this additional DWARF data. In practice, each function gets its own FDE, or Frame Description Entry. Each FDE is related to a CIE, or Common Information Entry, which, as the name implies, holds information common to a set of FDEs.

By default, the CIE states that the sp is the CFA, so any time the sp is modified we need to let DWARF know through the CFI directives. That is what .cfi_def_cfa_offset does: it lets DWARF know that the CFA is the current sp plus an offset of 16 bytes. The next thing DWARF needs to know is where to find the lr and the fp relative to the CFA. This is what .cfi_offset does: it informs DWARF that the value of the fp (x29; they are the same register) can be found at the CFA at offset -16 bytes. The same is done for x30, the lr, at offset -8. The next CFI directive, .cfi_restore, restores the rule for the register to the state it had when .cfi_startproc was issued. After that, .cfi_def_cfa_offset 0 indicates that the CFA is again equal to sp, and finally .cfi_endproc ends the FDE.

All of this instruments the DWARF system, which in turn is used by debuggers, runtimes, and the unwinder. All of these systems need to know that the address in the pushed lr is signed, and they may need to verify the pointer and demangle the address before using it. The unwinder uses the autia1716 or autib1716 instructions to demangle the return address. Both are within the hint space, as hint 12 and hint 14 respectively. The pointer must be demangled because signing modifies it to include the PAC signature; removing the signature restores a valid pointer.
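To make the directive walk-through concrete, the following is a minimal sketch of how a consumer could replay the CFI rules of call_function to find the unwind state at a given pc. It is a simplified model and not a real DWARF parser: the types Op, Insn, SavedReg, and Row and the functions row_at and call_function_fde are hypothetical, and the location advances assume 4-byte instructions with SIGN_LR expanding to a single instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op { AdvanceLoc, DefCfaOffset, Offset, Restore, NegateRaState };

struct Insn { Op op; int reg; std::int64_t value; };

struct SavedReg { bool saved = false; std::int64_t offset = 0; };  // at CFA + offset

struct Row {
    std::int64_t cfa_offset = 0;  // CFA = sp + cfa_offset
    SavedReg x29, x30;
    bool ra_signed = false;       // toggled by the negate_ra_state opcode
};

// Replay the program and return the rules in effect at pc.
Row row_at(const std::vector<Insn>& program, std::uint64_t pc) {
    Row row;
    std::uint64_t loc = 0;
    for (const Insn& i : program) {
        if (i.op == Op::AdvanceLoc) {
            // Rules that follow an advance only apply from the new location on.
            std::uint64_t next = loc + static_cast<std::uint64_t>(i.value);
            if (next > pc) return row;
            loc = next;
        } else if (i.op == Op::DefCfaOffset) {
            row.cfa_offset = i.value;
        } else if (i.op == Op::NegateRaState) {
            row.ra_signed = !row.ra_signed;
        } else {
            assert(i.reg == 29 || i.reg == 30);  // only fp and lr are modelled
            SavedReg& r = (i.reg == 29) ? row.x29 : row.x30;
            if (i.op == Op::Offset) { r.saved = true; r.offset = i.value; }
            else                    { r = SavedReg{}; }  // Restore: back to initial rule
        }
    }
    return row;
}

// The directives of call_function, in source order (offsets are assumptions).
std::vector<Insn> call_function_fde() {
    return {
        {Op::AdvanceLoc, 0, 4},   {Op::NegateRaState, 0, 0},   // after SIGN_LR
        {Op::AdvanceLoc, 0, 4},   {Op::DefCfaOffset, 0, 16},   // after stp
        {Op::Offset, 29, -16},    {Op::Offset, 30, -8},
        {Op::AdvanceLoc, 0, 12},  {Op::Restore, 30, 0},        // after ldp
        {Op::Restore, 29, 0},     {Op::DefCfaOffset, 0, 0},
    };
}
```

Under these assumptions, row_at(call_function_fde(), 8) describes the state while the frame is live: the CFA is sp + 16, x29 is saved at CFA - 16, x30 at CFA - 8, and the saved return address is marked as signed.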
Our header files and discussions thus far have indicated that PAC supports two keys: the A and B keys. The key in use can be selected at build time through compiler options, by specifying -mbranch-protection=pac-ret+b-key. Let's modify our latest C++ example, namely call_function.S and aarch64.h, to support the B key within the required DWARF code:
Tag: Example-8
aarch64.h:
#ifndef _AARCH_64_H_
#define _AARCH_64_H_
/*
 * References:
 *  - https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros
 *  - https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
 */

#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
  #define BTI_J hint 36 /* bti j: for jumps, IE br instructions */
  #define BTI_C hint 34 /* bti c: for calls, IE blr instructions */
  #define GNU_PROPERTY_AARCH64_BTI 1 /* bit 0 GNU Notes is for BTI support */
#else
  #define BTI_J
  #define BTI_C
  #define GNU_PROPERTY_AARCH64_BTI 0
#endif

#if defined(__ARM_FEATURE_PAC_DEFAULT)
  #if __ARM_FEATURE_PAC_DEFAULT & 1
    #define SIGN_LR hint 25 /* paciasp: sign with the A key */
    #define VERIFY_LR hint 29 /* autiasp: verify with the A key */
    #define CFI_B_KEY_FRAME /* empty is no B key */
  #elif __ARM_FEATURE_PAC_DEFAULT & 2
    #define SIGN_LR hint 27 /* pacibsp: sign with the B key */
    #define VERIFY_LR hint 31 /* autibsp: verify with the B key */
    #define CFI_B_KEY_FRAME .cfi_b_key_frame
  #endif
  #define CFI_WINDOW_SAVE .cfi_window_save
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 2 /* bit 1 GNU Notes is for PAC support */
#else
  #define SIGN_LR BTI_C
  #define VERIFY_LR
  #define CFI_WINDOW_SAVE
  #define CFI_B_KEY_FRAME
  #define GNU_PROPERTY_AARCH64_POINTER_AUTH 0
#endif

/* Add the BTI and PAC support to the GNU Notes section */
#if GNU_PROPERTY_AARCH64_BTI != 0 || GNU_PROPERTY_AARCH64_POINTER_AUTH != 0
  .pushsection .note.gnu.property, "a"; /* Start a new allocatable section */
  .balign 8; /* align it on an 8 byte boundary */
  .long 4; /* size of "GNU\0" */
  .long 0x10; /* size of descriptor */
  .long 0x5; /* NT_GNU_PROPERTY_TYPE_0 */
  .asciz "GNU";
  .long 0xc0000000; /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
  .long 4; /* Four bytes of data */
  .long (GNU_PROPERTY_AARCH64_BTI|GNU_PROPERTY_AARCH64_POINTER_AUTH); /* BTI or PAC is enabled */
  .long 0; /* padding for 8 byte alignment */
  .popsection; /* end the section */
#endif

#endif
call_function.S:
#include "aarch64.h"

.section .text
.global call_function

// Function prototype
// void call_function(void (*func)())
call_function:
    .cfi_startproc
    SIGN_LR
    CFI_WINDOW_SAVE
    CFI_B_KEY_FRAME
    // Save link register and frame pointer, allocating enough space for
    // saving the return location.
    stp x29, x30, [sp, #-16]!
    .cfi_def_cfa_offset 16
    .cfi_offset 29, -16
    .cfi_offset 30, -8
    mov x29, sp

    // x0 is the caller's first argument, so call
    // the "function" pointed to by x0; blr saves
    // the return address in the link register (x30)
    blr x0
return_loc:
    // Restore link register and frame pointer
    ldp x29, x30, [sp], #16
    .cfi_restore 30
    .cfi_restore 29
    .cfi_def_cfa_offset 0

    // Return from the function
    VERIFY_LR
    ret
    .cfi_endproc
Compile and run the program:
make clean
CXXFLAGS="-mbranch-protection=pac-ret+b-key+bti" make
./main
Throwing exception...
Caught exception: 42
As previously mentioned, DWARF is byte code for a virtual machine. This DWARF information is embedded in different sections of the generated ELF files for the various consumers, like the unwinder and debuggers. It is possible to dump these DWARF instructions in disassembled form, which is rather nice for debugging. Note that we add -g to produce some debug info for the upcoming addr2line example.
make clean
CXXFLAGS="-mbranch-protection=pac-ret+b-key+bti -g" make
readelf --debug-dump=frames call_function.o
Contents of the .eh_frame section:

00000000 0000000000000010 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 4
  Data alignment factor: -8
  Return address column: 30
  Augmentation data:     1b
  DW_CFA_def_cfa: r31 (sp) ofs 0

00000014 0000000000000020 00000018 FDE cie=00000000 pc=0000000000000000..0000000000000014
  DW_CFA_advance_loc: 4 to 0000000000000004
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r29 (x29) at cfa-16
  DW_CFA_offset: r30 (x30) at cfa-8
  DW_CFA_advance_loc: 12 to 0000000000000010
  DW_CFA_restore: r30 (x30)
  DW_CFA_restore: r29 (x29)
  DW_CFA_def_cfa_offset: 0
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
The first element to look for is a "B" in the Augmentation string. The string lives in the CIE and is therefore inherited by every FDE that uses that CIE. A "B" indicates that the PAC B signing key is used; if it is absent, the A key is in use. Unwinders use this to choose the right instruction, either autib1716 or autia1716, when demangling PAC signed addresses. The other important item is DW_CFA_AARCH64_negate_ra_state, which is the output of the CFI directive .cfi_window_save. This opcode shares its encoding with DW_CFA_GNU_window_save, which is why that directive produces it and why some tool versions print that name instead. The opcode indicates that the lr is signed and that anything interpreting the lr needs to demangle it.
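The key selection described above can be sketched in a few lines. This is an illustrative model only, not the code of any particular unwinder: PacKey, key_from_augmentation, auth_instruction, and RaState are hypothetical names, and the augmentation string is assumed to have already been read from the CIE.

```cpp
#include <string>

enum class PacKey { A, B };

// A "B" anywhere in the CIE augmentation string selects the B key;
// otherwise the A key is assumed.
PacKey key_from_augmentation(const std::string& augmentation) {
    return augmentation.find('B') != std::string::npos ? PacKey::B : PacKey::A;
}

// Mnemonic of the hint-space instruction an unwinder would use to
// demangle a return address signed with the given key.
const char* auth_instruction(PacKey key) {
    return key == PacKey::B ? "autib1716" : "autia1716";
}

// Tracks whether the saved lr is currently signed. Each
// DW_CFA_AARCH64_negate_ra_state opcode in the FDE flips the state.
struct RaState {
    bool is_signed = false;
    void negate() { is_signed = !is_signed; }
};
```

With these helpers, an augmentation string of "zRB" maps to autib1716 and "zR" maps to autia1716, and the demangling step is only needed while RaState reports the return address as signed.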
It is possible to associate an FDE with a function using addr2line. Note that it needs -g in the compilation flags, or you will see ?? in the addr2line output:
addr2line -f -e call_function.o 0
call_function
/home/bill/workspace/blog-example/call_function.S:10
When an indirect transfer of control flow occurs, BTI-enabled hardware, together with a BTI-enabled software stack, ensures that the transfer lands on a landing pad. Direct control flow changes, by contrast, are not checked, because the target address is encoded in the instruction itself and is not supplied externally as a potentially attacker-controlled value. Consequently, instructions like br and blr and their related instructions are checked to ensure that they land on proper landing pads. Typically, branch instructions with a link, like blr, are used to call functions, and thus the control flow change needs to land on a bti c or bti jc instruction. Branches that do not modify the link register, like br, are used for a "jump" and thus must transfer control flow to a bti j or bti jc landing pad.

However, in certain scenarios where jump oriented programming models are used, a branch or jump may be used to transfer control flow to a function that is typically called. In some cases, the function that was "jumped to" using a branch instruction is compiled code from a C or C++ compiler, and thus the landing pad for that function will be a bti c instruction. BTI enforcement then raises an exception, because jumps or branches without a link expect the first instruction of the landing pad to be a bti j instruction. To work around this, the architecture specifies that if the target address is in register x16 or x17, BTI enforcement allows the jump to land on a bti c landing pad as well as on a bti j landing pad. This is further discussed in Jump Oriented Programming.
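The compatibility rules above can be summarised as a small decision function. This is a sketch of the rules as described in this section, under the assumption that the target lies in a BTI-guarded page. Branch, Pad, and bti_allows are hypothetical names, and PAC instructions acting as landing pads are deliberately left out to keep the table small.

```cpp
// How control arrived at the target.
enum class Branch { Direct, BlrAny, BrX16X17, BrOther };

// First instruction at the target.
enum class Pad { None, BtiC, BtiJ, BtiJC };

// Returns true when the transfer is allowed, false when BTI enforcement
// would raise an exception.
bool bti_allows(Branch branch, Pad pad) {
    switch (branch) {
    case Branch::Direct:    // b, bl: target is encoded in the instruction, not checked
        return true;
    case Branch::BlrAny:    // indirect call: needs a "call" landing pad
        return pad == Pad::BtiC || pad == Pad::BtiJC;
    case Branch::BrX16X17:  // indirect jump via x16/x17: may also land on bti c
        return pad == Pad::BtiC || pad == Pad::BtiJ || pad == Pad::BtiJC;
    case Branch::BrOther:   // indirect jump via any other register
        return pad == Pad::BtiJ || pad == Pad::BtiJC;
    }
    return false;
}
```

For example, bti_allows(Branch::BrOther, Pad::BtiC) is false, which is the failing case described above, while bti_allows(Branch::BrX16X17, Pad::BtiC) is true, which is the x16/x17 workaround.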
This multi-part tutorial shows how to enable PAC and BTI through assembly functions, how PAC instructions can also serve as BTI landing pads, and how to handle PAC A and B keys in source. It also highlights how exception handling needs to be augmented through the use of CFI directives, and how to dump the CFI generated DWARF data.