We’d like to welcome all of you and describe a few interesting issues we encountered during our work with ARMv8 and FreeBSD. In this series, we plan to talk a little about various bugs found in the kernel and how they were resolved.
The new ARM architecture takes a different approach to exception levels. A typical figure showing all levels can be found in the ARM documentation.
The FreeBSD kernel makes use of only the two lowest exception levels. All userspace processes run in EL0, whereas all kernel code is executed in the more privileged EL1.
At first glance, the exception level model is drastically simplified compared with ARMv7 and now looks more like the x86_64 architecture than an old-fashioned RISC-like processor.
But how is a change in exception level handled in FreeBSD? Let’s take a syscall as an example. When a user process (via libc) wants to ask the kernel to do some work on its behalf, it must issue a system call. On ARMv8 this is done by generating a special type of exception, which is the only way to raise the execution privilege and start running kernel code.
Upon receiving an SVC call (an exception from userspace indicating a syscall request), the processor jumps to a predefined vector and executes the following code:
ENTRY(handle_el0_sync)
    save_registers 0
    mov x0, sp
    bl do_el0_sync
    do_ast
    restore_registers 0
    eret
END(handle_el0_sync)
Without going into too much detail, it stores all registers on the stack (creating a trapframe, whose address is passed as a parameter to do_el0_sync) and calls a C function to handle the event. In this case, the function parses the SVC parameters and executes the appropriate syscall handler (svc_handler).
void
do_el0_sync(struct trapframe *frame)
{
    struct thread *td;

    td = curthread;
    td->td_frame = frame;
    ...
    switch (exception) {
    ...
        break;
    case EXCP_SVC:
        svc_handler(frame);
        ...
    default:
        ...
    }
}
Once the syscall is done, the function returns; handle_el0_sync then restores all registers and goes back to the user process.
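The dispatch step can be sketched in plain C. This is a minimal illustration, not the FreeBSD implementation: it assumes the exception class is read from bits [31:26] of the syndrome register, as the ARMv8 architecture defines, and the names EC_SVC64, EC_DATA_ABORT_LOWER, enum action and the three functions are hypothetical, introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Exception class (EC) field of the ARMv8 syndrome register, bits [31:26]. */
#define EC_SHIFT            26
#define EC_MASK             0x3fu
#define EC_SVC64            0x15u  /* SVC executed in AArch64 state */
#define EC_DATA_ABORT_LOWER 0x24u  /* data abort from a lower exception level */

/* Hypothetical outcome codes, used only by this sketch. */
enum action { ACT_SYSCALL, ACT_DATA_ABORT, ACT_UNKNOWN };

/* Extract the exception class from a raw syndrome value. */
unsigned esr_to_ec(uint32_t esr)
{
    return (esr >> EC_SHIFT) & EC_MASK;
}

/* Helper for experiments: build a syndrome value carrying only an EC. */
uint32_t ec_to_esr(unsigned ec)
{
    assert(ec <= EC_MASK);
    return (uint32_t)ec << EC_SHIFT;
}

/* Model of the switch in do_el0_sync: choose a handler by exception class. */
enum action classify_el0_sync(uint32_t esr)
{
    switch (esr_to_ec(esr)) {
    case EC_SVC64:
        return ACT_SYSCALL;      /* would call svc_handler(frame) */
    case EC_DATA_ABORT_LOWER:
        return ACT_DATA_ABORT;   /* would go to the page fault path */
    default:
        return ACT_UNKNOWN;
    }
}
```

For instance, classify_el0_sync(ec_to_esr(EC_SVC64)) selects the syscall path, while a value with an unrecognised class falls through to the default branch.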
The most common exception handled on FreeBSD is a page fault. It is perfectly normal for one to occur when, for example, a user or kernel thread accesses a page that is being used for the first time.
The FreeBSD kernel uses advanced memory management features such as copy-on-write and lazy-alloc. A page fault then tells the kernel that one of these operations should be performed.
Of course, when a user process tries to do something it is not allowed to do, such as dereferencing a NULL pointer, the page fault indicates an invalid operation and the kernel sends a signal that kills the process. The kernel acts as a guard in this case, not allowing a process to go anywhere outside its predefined bounds.
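The three outcomes described above (allocate a page lazily, copy a shared page, or signal the process) can be modelled with a small decision function. This is a simplified sketch under stated assumptions: struct region, enum fault_result and resolve_fault are hypothetical names and do not correspond to FreeBSD's VM data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapped region; not a FreeBSD structure. */
struct region {
    uintptr_t start;   /* first valid address */
    uintptr_t end;     /* one past the last valid address */
    bool      backed;  /* a physical page is already assigned */
    bool      cow;     /* page stays shared until the first write */
};

enum fault_result { FAULT_ALLOC_PAGE, FAULT_COPY_PAGE, FAULT_SIGNAL };

/* Decide what a page fault at 'addr' means for the given address space. */
enum fault_result
resolve_fault(const struct region *map, size_t n, uintptr_t addr, bool is_write)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct region *r = &map[i];

        if (addr < r->start || addr >= r->end)
            continue;
        if (!r->backed)
            return FAULT_ALLOC_PAGE;  /* lazy-alloc: first touch of the page */
        if (is_write && r->cow)
            return FAULT_COPY_PAGE;   /* copy-on-write: first write to a shared page */
        return FAULT_SIGNAL;          /* page is present, so the access is not permitted */
    }
    return FAULT_SIGNAL;              /* address outside every region, e.g. NULL */
}
```

In this model a NULL dereference never matches a region, so it always ends in FAULT_SIGNAL, which mirrors the guard role of the kernel.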
The ARM core can use the stack in various modes. The most common one (and the one used by FreeBSD) is a descending stack. That means the stack grows towards lower memory addresses, as shown in the picture below.
To visualise the danger, let’s discuss a real-life example encountered at the beginning of porting FreeBSD to ARMv8.
When a kernel thread is created, it shares the memory space with the rest of the kernel. The only thing that needs to be private is its stack. As might be expected, the stack is allocated by the kernel version of malloc.
It looks fine at first glance: we create a thread, malloc a stack for it, and everything should just work. But it didn’t. The problem was the special feature mentioned before, lazy-alloc. Physical allocation of pages is a time-consuming process, so it is better to do it just in time, when the page is actually needed, i.e. when the first page fault happens on an address from the malloc’ed area.
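A descending stack can be illustrated with a toy model in C. This sketch assumes the usual full-descending convention, where the stack pointer refers to the last item pushed; struct stack, STACK_WORDS and the helper functions are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Toy full-descending stack: sp is the index of the last word pushed. */
struct stack {
    uint64_t mem[STACK_WORDS];
    size_t   sp;                 /* starts one past the highest word */
};

void stack_init(struct stack *s)
{
    s->sp = STACK_WORDS;
}

void stack_push(struct stack *s, uint64_t value)
{
    assert(s->sp > 0);           /* going below index 0 would be an overflow */
    s->sp--;                     /* move towards lower addresses first */
    s->mem[s->sp] = value;       /* then store the value */
}

uint64_t stack_pop(struct stack *s)
{
    assert(s->sp < STACK_WORDS); /* popping an empty stack is an error */
    return s->mem[s->sp++];      /* load, then move back up */
}
```

Every push lowers sp, so a deep call chain keeps moving towards the lowest address of the area reserved for the stack.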
It might not be obvious yet, but this can hang the whole system! Let’s see what happens. Assume that the stack starts at 0x10008000 and grows down. At the beginning, only one page (0x10007000 - 0x10007fff) is physically allocated, because the top of the stack is always accessed during creation of the thread and filled with thread-specific data, descriptors, etc. Once the call stack is deep enough, it eventually grows past the allocated space and falls into the page below (0x10006000 - 0x10006fff). The first access causes a page fault, which is supposed to be handled by allocating the required page, but not this time.
Take a closer look at the assembly:
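The address arithmetic of this example can be checked with a few lines of C. The sketch assumes 4 KiB pages, which is what the page range quoted above implies; PAGE_SIZE, STACK_TOP, page_base and stack_addr_is_backed are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE UINT64_C(0x1000)      /* assumed 4 KiB pages */
#define STACK_TOP UINT64_C(0x10008000)  /* initial stack pointer from the example */

/* Base address of the page containing 'addr'. */
uint64_t page_base(uint64_t addr)
{
    return addr & ~(PAGE_SIZE - 1);
}

/* True if 'addr' lies within the top 'backed_pages' pages below STACK_TOP. */
bool stack_addr_is_backed(uint64_t addr, unsigned backed_pages)
{
    assert(addr < STACK_TOP);
    return addr >= STACK_TOP - backed_pages * PAGE_SIZE;
}
```

With one backed page, an access at 0x10007ff0 is fine, while an access at 0x10006ff8 lands in the page based at 0x10006000 and faults.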
.macro save_registers el
.if \el == 1
    mov x18, sp
    sub sp, sp, #128
.endif
    sub sp, sp, #(TF_SIZE + 16)
    stp x29, x30, [sp, #(TF_SIZE)]
    stp x28, x29, [sp, #(TF_X + 28 * 8)]
    stp x26, x27, [sp, #(TF_X + 26 * 8)]
    stp x24, x25, [sp, #(TF_X + 24 * 8)]

ENTRY(handle_el1h_sync)
    save_registers 1
    bl do_el1h_sync
    restore_registers 1
END(handle_el1h_sync)
The first thing done in the EL1_sync handler is, yep, storing all registers onto the stack. But we have just run out of allocated space, so there is no accessible stack memory here, and the EL1_sync exception is raised again as soon as the first “stp” instruction is executed. What’s more, it keeps repeating from then on, and the only way to recover is to hard reset the board.
Unfortunately, the ARMv8 architecture is susceptible to this scenario. There is always a chance that a kernel thread exceeds its allocated stack range and ends up in the described state. What we can do is minimize the chance of that happening.
The following things can be done on FreeBSD:
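The endless loop can be reproduced in a small simulation. This is a deliberately simplified model, not kernel code: it assumes 4 KiB pages, a made-up FRAME_SIZE standing in for the trapframe, and it checks only the lowest address of the frame. All names are hypothetical, and MAX_NESTING exists only so that the simulation terminates, which real hardware would not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE   UINT64_C(0x1000)      /* assumed 4 KiB pages */
#define STACK_TOP   UINT64_C(0x10008000)  /* stack top from the example */
#define FRAME_SIZE  UINT64_C(0x120)       /* made-up trapframe size */
#define MAX_NESTING 16u                   /* simulation cut-off only */

/* Only the top page of the stack is backed, as in the example. */
bool is_backed(uint64_t addr)
{
    return addr >= STACK_TOP - PAGE_SIZE && addr < STACK_TOP;
}

/*
 * Model exception entry on the interrupted thread's own stack: the handler
 * reserves a frame below sp and writes to it. Returns how many nested faults
 * were taken before a frame could be stored, or MAX_NESTING if none could.
 */
unsigned take_exception(uint64_t sp)
{
    unsigned nested = 0;

    assert(sp <= STACK_TOP && sp >= MAX_NESTING * FRAME_SIZE);
    while (nested < MAX_NESTING) {
        sp -= FRAME_SIZE;        /* like: sub sp, sp, #(TF_SIZE + 16) */
        if (is_backed(sp))
            return nested;       /* the store succeeds, the handler can run */
        nested++;                /* the store faults, the handler is re-entered */
    }
    return MAX_NESTING;          /* never recovers: sp only moves further down */
}
```

Calling take_exception(0x10007f00) returns 0 because the frame still fits in the backed page, whereas take_exception(0x10007000) hits the cut-off: each re-entry moves sp further into unbacked memory, so the store can never succeed.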
On the 32-bit ARM architecture, the situation was a lot easier. Previously, almost every exception mode had its own stack pointer, and it was almost impossible for the system stack to overflow. The lazy-alloc functionality was also easier to implement and use.
The ARMv8 architecture is superior in most respects, but the programmer must be aware of some dangers hiding inside. I hope this short article was interesting and helps to visualise the issues we faced while porting FreeBSD to ARMv8.
Semihalf creates software for advanced solutions in the areas of platform infrastructure (operating systems, bootloaders), virtualization, networking and storage. We make software tightly coupled with the underlying hardware to achieve maximum system capacity.
Technologies developed by Semihalf power a wide range of products, from consumer electronics to cloud data center elements and carrier-grade networking gear.
Zbigniew Bodek <zbb@semihalf.com>
Dominik Ermel <der@semihalf.com>
Wojciech Macek <wma@semihalf.com>
Michał Stanek <mst@semihalf.com>