In my last couple of blogs we built an ELF image to expose some features of the Armv8-A architecture and toolchain for embedded software development. We got to a point where we could print "hello world" to a telnet console, and enable interrupts on the system. In this blog we will not add much functionality to the image, as I want to discuss the architectural features of Exception level and Security state in more detail. You may find it easier to download the code and follow it, as only snippets have been included in the post.
In the general case, privilege levels are operational states of a hardware component which determine what it can do at a given point in time. In the Armv8-A architecture we have a series of Exception levels which control system register accessibility and instruction availability.
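As a rough illustration of what "system register accessibility" means, a register whose name ends in _ELn is generally accessible from ELn and from any higher Exception level. The sketch below models only that rule of thumb. The descriptor type and function name are hypothetical, and real hardware also applies traps and enable bits that are ignored here.

```c
#include <assert.h>
#include <stdbool.h>

/* Exception levels, least privileged first. */
typedef enum { EL0 = 0, EL1 = 1, EL2 = 2, EL3 = 3 } exception_level;

/* Hypothetical descriptor: a register name and the lowest Exception level
 * that may access it (for a register named X_ELn this is usually ELn). */
typedef struct {
    const char     *name;
    exception_level min_el;
} sysreg_desc;

/* Simplified rule: the access is allowed when the current Exception level
 * is at least the register's minimum level. Traps are not modelled. */
static bool sysreg_accessible(sysreg_desc reg, exception_level current)
{
    assert(current >= EL0 && current <= EL3);
    return current >= reg.min_el;
}
```

With this model, a register described as { "CPTR_EL2", EL2 } is accessible at EL2 and EL3 but not at EL1 or EL0.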
Armv8-A Exception levels
EL3 is the most privileged level, with the others built upon it in the abstraction stack. As an example, the Architectural Feature Trap Register (EL3), CPTR_EL3, is only accessible at EL3. It controls a few things, such as floating point operations, for all Exception levels. There is a similar register at EL2, CPTR_EL2, which is accessible at EL2 and EL3. However, CPTR_EL2 only affects EL2 and lower Exception levels. The main purpose of this hierarchy is to grant only as much control as is needed by particular pieces of software; you wouldn't want a user application to have the same level of system control as an operating system. Additionally, there are the notions of Security state and Execution state. Security state controls access to certain registers and to memory marked as secure, and will be discussed later. The Execution state of the processor can be either 32-bit or 64-bit, but we will not discuss the differences in this post. The relationship between Security state and Exception level is more complex than it appears; however, we can at least say there is:
Switching between Exception levels is accomplished by returning from an exception, but we must also have something to return to. In "startup.s" we define a function:
      .global el1_entry_aarch64
      .type el1_entry_aarch64, "function"
    el1_entry_aarch64:

      // we can use the same vector table in this example, but in general
      // each combination of Exception level, Security state, and Execution state
      // will need a new vector table
      LDR      x0, =vectors
      MSR      VBAR_EL1, x0

      // we must ensure that floating point register accesses are not trapped
      // since the C library for AArch64 uses them
      MOV      x0, #(0x3 << 20)
      MSR      CPACR_EL1, x0

      // ISB ensures the system register writes above take effect
      // before the instructions that follow
      ISB

      // Branch to scatter loading and C library init code
      .global  __main
      B        __main
We now have a label that the ERET can branch to, so we are in a position to modify the start64 function.
    boot:
      ADRP     x0, Image$$STACK_EL3$$ZI$$Limit   // get stack address
      MOV      sp, x0

      // NB, CODE OMITTED

      // Configure SCR_EL3
      // -----------------
      MOV      w1, #0                 // Initial value of register is unknown
      ORR      w1, w1, #(1 << 11)     // set ST bit (disable trapping of timer control registers)
      ORR      w1, w1, #(1 << 10)     // set RW bit (next lower EL in aarch64)
      ORR      w1, w1, #(1 << 3)      // Set EA bit (SError routed to EL3)
      ORR      w1, w1, #(1 << 2)      // Set FIQ bit (FIQs routed to EL3)
      ORR      w1, w1, #(1 << 1)      // Set IRQ bit (IRQs routed to EL3)
      MSR      SCR_EL3, x1

      // NB, CODE OMITTED

      // Initialize SCTLR_EL1
      // --------------------
      // SCTLR_EL1 has an unknown reset value and must be configured
      // before we can enter EL1
      MSR      SCTLR_EL1, xzr

      LDR      x0, =el1_entry_aarch64
      LDR      x1, =AArch64_EL1_SP1
      MSR      ELR_EL3, x0            // where to branch to when exception completes
      MSR      SPSR_EL3, x1           // set the program state for this point to a known value

      BL       gicInit

      ERET
Let us go through these changes in order. First, we must define a new stack pointer for the current Exception level. Earlier we relied on the Arm C libraries to initialise the stack pointer, but since we have moved our branch to __main, this will only initialise a stack pointer for EL1. We also add the line STACK_EL3 +0 ALIGN 64 EMPTY 0x4000 {} in "scatter.txt" to define the stack in memory. Next, we disable trapping of timer register accesses, as the processor will be in EL1 when the timer interrupt is generated. We then set the next lower Exception level, Secure EL1, to the 64-bit Execution state. Finally, after making sure the System Control Register, SCTLR_EL1, is zero initialised, we set the Exception Link Register, ELR_EL3, and the Saved Program Status Register, SPSR_EL3, to the desired address and state at EL1. It is worth noting that SPSR_EL3 is responsible for controlling the Exception level that the processor enters after the ERET, while ELR_EL3 merely specifies the address to return to. We have also moved the branch to gicInit here. This function modifies registers accessible at EL3 only, so it can't be placed in the main() function, as that now runs at EL1. Building and running the image will print "hello world" and the interrupt message as before.
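To make the register values concrete, here is a minimal C sketch of the two values the boot code prepares before the ERET. The bit positions for SCR_EL3 are the ones used in the assembly above. The value of AArch64_EL1_SP1 is not shown in this post, so the sketch assumes it encodes "EL1 using SP_EL1" in SPSR_EL3.M[3:0]; real values usually also set the interrupt mask bits, which are left out here. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SCR_EL3 bit positions used by the boot code. */
#define SCR_EL3_IRQ  (UINT64_C(1) << 1)   /* route IRQs to EL3 */
#define SCR_EL3_FIQ  (UINT64_C(1) << 2)   /* route FIQs to EL3 */
#define SCR_EL3_EA   (UINT64_C(1) << 3)   /* route SErrors to EL3 */
#define SCR_EL3_RW   (UINT64_C(1) << 10)  /* next lower EL is AArch64 */
#define SCR_EL3_ST   (UINT64_C(1) << 11)  /* do not trap Secure timer registers */

/* The value the boot code builds up in w1 with its ORR instructions. */
static uint64_t scr_el3_boot_value(void)
{
    return SCR_EL3_ST | SCR_EL3_RW | SCR_EL3_EA | SCR_EL3_FIQ | SCR_EL3_IRQ;
}

/* SPSR_ELx.M[3:0] for a return to AArch64: bits [3:2] hold the target
 * Exception level, bit [0] selects SP_ELx (1) or SP_EL0 (0). */
static uint64_t spsr_mode(unsigned target_el, unsigned use_sp_elx)
{
    assert(target_el <= 3 && use_sp_elx <= 1);
    /* EL0 always uses SP_EL0, so there is no "EL0 with SP_ELx" encoding. */
    assert(!(target_el == 0 && use_sp_elx == 1));
    return ((uint64_t)target_el << 2) | use_sp_elx;
}

/* The Exception level an ERET would enter for a given SPSR value. */
static unsigned spsr_target_el(uint64_t spsr)
{
    return (unsigned)((spsr >> 2) & 0x3u);
}
```

Under these assumptions, spsr_mode(1, 1) gives 0x5, and spsr_target_el() recovers EL1 from it, which is why the ERET lands in EL1 regardless of the address held in ELR_EL3.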
Arm's Architecture Reference Manual defines a Secure and a Non-secure state for the processor. By definition, the Secure state can access both Secure and Non-secure physical addresses, while the Non-secure state can only access the Non-secure address space and cannot access certain secure system registers. Partitioning memory accesses in this way prevents, say, a user level application in Non-secure EL0 from accessing encryption keys held by a trusted operating system running in Secure EL1. It also plays a part in the implementation of Arm TrustZone technology.
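The access rule described above can be written down as a small truth function. This is only a sketch of the rule as stated in this post, with hypothetical type and function names; it does not model system registers or any memory controller.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SECURE, NON_SECURE } security_state;

/* The Secure state may access both physical address spaces; the
 * Non-secure state may access only the Non-secure address space. */
static bool access_permitted(security_state pe_state,
                             security_state address_space)
{
    assert(pe_state == SECURE || pe_state == NON_SECURE);
    if (pe_state == SECURE) {
        return true;
    }
    return address_space == NON_SECURE;
}
```

The only combination that is refused is a Non-secure access to a Secure address, which is the case that protects the trusted operating system's keys in the example above.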
Security state is controlled from EL3, which sets the Security state of the lower Exception levels. Specifically, setting the least significant bit (the NS bit, bit 0) of the Secure Configuration Register, SCR_EL3, puts the system into the Non-secure state once it returns to a lower Exception level. However, this is not the only change we have to make, since the Non-secure state introduces a number of complexities. We must ensure that any instructions executed in the Non-secure state are in Non-secure memory, so we will start by modifying "scatter.txt":
    SROM_LOAD 0x00000000
    {
      SROM_EXEC +0
      {
        startup.o(BOOT, +FIRST)
        gic.o
      }
      STACK_EL3 0x04000000 ALIGN 64 EMPTY 0x10000 {}
    }

    NSROM_LOAD 0x80000000
    {
      ROM_EXEC +0 0x10000
      {
        startup.o(NONSECURE)
        * (+RO)
      }
      RAM_EXEC +0 0x10000
      {
        * (+RW, +ZI)
      }
      ARM_LIB_STACKHEAP +0 EMPTY 0x10000 {}
      STACK_EL2 +0 ALIGN 64 EMPTY 0x10000 {}
    }
We have defined a new load region, NSROM_LOAD, starting at the Non-secure DRAM portion of the model's memory. In this region we have placed the NONSECURE section of our startup code, which we will define later. The wildcard selectors have also been placed in this region, so all code and data not explicitly placed elsewhere will end up in the relevant execution regions here. A stack for EL2 has also been defined, and we have moved the library stack-heap here as well. Our SROM_LOAD region is located in secure memory, and holds the BOOT section of the startup code along with the gic code. EL3's stack has been placed in secure SRAM.
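One detail worth checking when moving a stack is that the boot code loads its address with ADRP followed by MOV sp. ADRP produces the address of the 4KB page containing the symbol, so this pair only yields the exact stack top when that address is page aligned. The sketch below assumes that the Image$$STACK_EL3$$ZI$$Limit symbol resolves to the base of the EMPTY region plus its length; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an EMPTY execution region in the scatter file. */
typedef struct {
    uint64_t base;
    uint64_t length;   /* positive length: the region grows upwards from base */
} empty_region;

/* Assumed value of Image$$<region>$$ZI$$Limit: one byte past the region end. */
static uint64_t region_limit(empty_region r)
{
    assert(r.length > 0);
    return r.base + r.length;
}

/* ADRP clears the low 12 bits, so "ADRP x0, sym; MOV sp, x0" gives the
 * exact address of sym only when sym sits on a 4KB boundary. */
static bool adrp_is_exact(uint64_t address)
{
    return (address & UINT64_C(0xFFF)) == 0;
}
```

For STACK_EL3 at 0x04000000 with length 0x10000 the limit works out to 0x04010000, which is page aligned, so the ADRP and MOV pair sets the stack pointer to the intended address under these assumptions.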
Now that we have dealt with memory, we must consider changes to the code. Since we will be branching to __main in Non-secure EL1, we must change references to the Secure timer registers to the Non-secure timer registers. So in "timer.s" we have replaced accesses to CNTPS_TVAL_EL1 and CNTPS_CTL_EL1 with CNTP_TVAL_EL0 and CNTP_CTL_EL0. We then define the EL1 and EL2 entry functions in "startup.s" and wrap them in a section named NONSECURE, so that they are placed in Non-secure memory.
    // ------------------------------------------------------------
    // EL2 AArch64
    // ------------------------------------------------------------

      .section NONSECURE, "ax"
      .align 3

      .global el2_entry_aarch64
      .type el2_entry_aarch64, "function"
    el2_entry_aarch64:
      NOP
      ADRP     x0, Image$$STACK_EL2$$ZI$$Limit
      MOV      sp, x0

      // Configure HCR_EL2 - the hypervisor configuration register
      // ---------------------------------------------------------
      NOP
      MRS      x0, HCR_EL2
      MOV      x1, #(1 << 31)         // RW bit (EL1 is AArch64)
      ORR      x0, x0, x1
      MSR      HCR_EL2, x0

      // Configure CNTHCTL_EL2 - the Counter-timer Hypervisor Control register
      // ---------------------------------------------------------------------
      // we need to enable timer register access for lower EL levels
      MRS      x0, CNTHCTL_EL2
      ORR      x0, x0, #(1 << 1)
      ORR      x0, x0, #1
      MSR      CNTHCTL_EL2, x0

      // we can use the same vector table in this example, but in general
      // each combination of Exception level, Security state, and Execution state
      // will need a new vector table
      // ADD YOUR CODE HERE
      LDR      x0, =vectors
      MSR      VBAR_EL2, x0

      // Initialize SCTLR_EL1
      // --------------------
      // SCTLR_EL1 has an unknown reset value and must be configured
      // before we can enter EL1
      MSR      SCTLR_EL1, xzr

      // Enter EL1
      // ---------
      LDR      x0, =el1_entry_aarch64
      LDR      x1, =AArch64_EL1_SP1
      MSR      ELR_EL2, x0
      MSR      SPSR_EL2, x1
      ERET

    // ------------------------------------------------------------
    // EL1 AArch64
    // ------------------------------------------------------------

      .global el1_entry_aarch64
      .type el1_entry_aarch64, "function"
    el1_entry_aarch64:

      // we can use the same vector table in this example, but in general
      // each combination of Exception level, Security state, and Execution state
      // will need a new vector table
      LDR      x0, =vectors
      MSR      VBAR_EL1, x0

      // we must ensure that floating point register accesses are not trapped
      // since the C library for AArch64 uses them
      MOV      x0, #(0x3 << 20)
      MSR      CPACR_EL1, x0

      // ISB ensures the system register writes above take effect
      // before the instructions that follow
      ISB

      // Branch to scatter loading and C library init code
      .global  __main
      B        __main
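The two read-modify-write sequences at the start of el2_entry_aarch64 can be sketched in C as operations on a pair of modelled registers. The bit positions are taken from the assembly; the field names EL1PCTEN and EL1PCEN are the architectural names for bits 0 and 1 of CNTHCTL_EL2, while the struct and function are hypothetical. Note the 64-bit constant for bit 31: in C, shifting a plain int by 31 would overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HCR_EL2_RW            (UINT64_C(1) << 31)  /* EL1 executes in AArch64 */
#define CNTHCTL_EL2_EL1PCTEN  (UINT64_C(1) << 0)   /* EL1/EL0 may read the physical counter */
#define CNTHCTL_EL2_EL1PCEN   (UINT64_C(1) << 1)   /* EL1/EL0 may use the physical timer */

/* Hypothetical stand-in for the two EL2 registers touched by the entry code. */
typedef struct {
    uint64_t hcr_el2;
    uint64_t cnthctl_el2;
} el2_regs;

/* Sets the same bits as the MRS/ORR/MSR sequences, preserving all others. */
static void el2_prepare_el1(el2_regs *regs)
{
    assert(regs != NULL);
    regs->hcr_el2     |= HCR_EL2_RW;
    regs->cnthctl_el2 |= CNTHCTL_EL2_EL1PCEN | CNTHCTL_EL2_EL1PCTEN;
}
```

Because the bits are ORed in, any other configuration already present in the two registers is left untouched, just as in the assembly version.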
The comments in the code explain the modifications to system registers. It is also worth noting that it is not necessary to change Exception level incrementally. Configuration of the registers in el2_entry_aarch64 could have been done at EL3, and we could have returned from EL3 directly to EL1. Instead we have taken the scenic route, and configured the state of EL1 at EL2. Now that we have defined our entry points, we turn our attention to the interrupt controller. In "gic.s":
      MOV      x0, #ICC_SRE_ELn.Enable
      ORR      x0, x0, #ICC_SRE_ELn.SRE
      MSR      ICC_SRE_EL3, x0
      ISB
      MSR      ICC_SRE_EL2, x0
      ISB
      MSR      ICC_SRE_EL1, x0
      ISB

      // Set the Secure version of ICC_SRE_EL1
      MRS      x1, SCR_EL3
      BIC      w1, w1, #1             // Clear NS bit (lower EL in Secure state)
      MSR      SCR_EL3, x1
      ISB
      MSR      ICC_SRE_EL1, x0

      MRS      x1, SCR_EL3
      ORR      w1, w1, #1             // Set NS bit (lower EL in Non-secure state)
      MSR      SCR_EL3, x1
      ISB

      MOV      x0, #0xFF
      MSR      ICC_PMR_EL1, x0        // Set PMR to lowest priority
      ISB

      MOV      x0, #3
      MSR      ICC_IGRPEN1_EL3, x0
      ISB

      MOV      x0, #1
      MSR      ICC_IGRPEN1_EL1, x0
      MSR      ICC_IGRPEN0_EL1, x0
      ISB
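The write to ICC_SRE_EL1 appears on both sides of a change to SCR_EL3.NS in this sequence. A toy C model of a register that has one copy per Security state may help when reading the explanation that follows. The struct and function names are hypothetical, and the model assumes NS is set when the sequence begins, which this post does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical EL3 view of the CPU: the NS bit and both register copies. */
typedef struct {
    bool     scr_ns;          /* models SCR_EL3.NS */
    uint64_t icc_sre_el1[2];  /* [0] = Secure copy, [1] = Non-secure copy */
} el3_view;

/* A write at EL3 reaches whichever copy the NS bit currently selects. */
static void write_icc_sre_el1(el3_view *cpu, uint64_t value)
{
    assert(cpu != NULL);
    cpu->icc_sre_el1[cpu->scr_ns ? 1 : 0] = value;
}

/* Mirrors the order of operations in the gic.s sequence. */
static void enable_sre_both_copies(el3_view *cpu, uint64_t value)
{
    assert(cpu != NULL);
    write_icc_sre_el1(cpu, value);  /* copy selected by the current NS value */
    cpu->scr_ns = false;            /* BIC w1, w1, #1 */
    write_icc_sre_el1(cpu, value);  /* Secure copy */
    cpu->scr_ns = true;             /* ORR w1, w1, #1 */
}
```

In this model a single write leaves one of the two copies untouched, which is why the sequence toggles NS between the writes and restores it afterwards.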
As you can see in the code, we seem to modify the Interrupt Controller System Register Enable register, ICC_SRE_EL1, twice. The reason for this is that there are two versions of ICC_SRE_EL1, a Secure and a Non-secure version, which share the same register name! Which version you are accessing depends on the current value of the NS bit, hence the momentary modification to SCR_EL3. Since we have defined our interrupts as Secure Group 0, setting the Secure version of the register is essential. The final change we make is in "hello_world.c", where we have modified the fiqHandler() procedure.
    void fiqHandler(void)
    {
      uint32_t intid;
      intid = readIAR0();

      // interrupt id for non secure timer is different to the secure timer!
      if (intid == 30) {
        flag = 1;
        disableTimer();
      } else {
        printf("Should never reach here!\n");
      }

      writeEOIR0(intid);
      return;
    }
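The magic number in the handler can be given a name. The sketch below maps each physical timer to its interrupt ID: 30 is the value used in fiqHandler() above, while 29 for the Secure physical timer is an assumption based on the usual Arm private peripheral interrupt assignment and is not stated in this post. The enum and function names are hypothetical.

```c
#include <assert.h>

typedef enum {
    TIMER_SECURE_PHYSICAL,
    TIMER_NON_SECURE_PHYSICAL
} timer_kind;

/* Interrupt ID raised by each timer. 30 is taken from fiqHandler();
 * 29 is an assumed value for the Secure physical timer. */
static int timer_intid(timer_kind kind)
{
    assert(kind == TIMER_SECURE_PHYSICAL || kind == TIMER_NON_SECURE_PHYSICAL);
    return (kind == TIMER_SECURE_PHYSICAL) ? 29 : 30;
}
```

A handler could then compare intid against timer_intid(TIMER_NON_SECURE_PHYSICAL) instead of a literal, which makes the dependency on the choice of timer explicit.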
In fiqHandler() we have changed the expected value of intid to 30. The Non-secure timer is a separate interrupt source from the Secure timer, as they are two different pieces of hardware. Building the image, then running the model:
$ FVP_Base_AEMv8A -C bp.secure_memory=false -C bp.refcounter.non_arch_start_at_default=1 -a __image.axf
This should generate the same telnet messages as before. Note that we have used the option bp.secure_memory=false, which disables the TrustZone controller. This controller is a peripheral included in the model we are using. We won't go into much detail here, but the controller allows the programmer to dynamically define regions of memory with configurable access checks. However, its default behaviour is to abort all accesses to memory that can be programmed to the controller, which includes the model's Non-secure DRAM. If you run the image without this option and step through the execution with a debugger, you'll find the system aborts the instruction fetch just after entering EL2 in the Non-secure state. By using bp.secure_memory=false we sidestep configuring the controller, but this is a blunt workaround, as it also allows Non-secure accesses to secure memory.
One change we have not discussed is the redefinition of the interrupts. In this post we have left the timer interrupt configured as Secure Group 0; however, it would be more appropriate to configure it as Non-secure Group 1 in the second example. The code for this has been included in the download, but it is left as an exercise for the reader to note the differences between the source files.
The latest versions of DS-5 and the FVP now support Armv8.4-A. I have not tested the code with the newer version of the model, but I expect few or no changes will be needed for it to run.
It should be noted that Secure EL2 was added to the architecture in Armv8.4-A. However, at the time of writing, the FVP used here does not fully support Armv8.4-A, so there is no Secure EL2 in the code examples in this post.