Semihalf ARM blog #2: Dead board and stack growth

March 11, 2016


We’d like to welcome all of you and describe a few interesting issues we have encountered during our work with ARMv8 and FreeBSD. In this series we plan to talk a little about various bugs found in the kernel and how they were resolved.

Exception model in ARMv8

The new ARM architecture takes a different approach to exception levels. A typical figure showing all levels can be found in the ARM documentation.

How does it look inside the FreeBSD kernel

The FreeBSD kernel makes use of only the two lowest exception levels. All userspace processes run in EL0, while all kernel code is executed in the more privileged EL1.

At first glance, the exception level model is drastically simplified compared with ARMv7 and now looks more like the x86_64 architecture than an old-fashioned RISC-like processor.

But how is a change of exception level handled in FreeBSD? Let’s take a syscall as an example. When a user process (via libc) wants to ask the kernel to do some work on its behalf, it must make a system call. On ARMv8 this is done by generating a special exception type, which is the only way to raise the execution privilege and start running kernel code.

Upon receiving an SVC (the exception from userspace that signals a syscall), the processor jumps to a predefined vector and executes the following code:

    ENTRY(handle_el0_sync)
        save_registers 0
        mov x0, sp
        bl do_el0_sync
        do_ast
        restore_registers 0
        eret
    END(handle_el0_sync)

Without going into too much detail, it stores all registers on the stack (creating a trapframe, which is passed as a parameter to do_el0_sync) and calls a C function to handle the event. In this case, the function parses the SVC parameters and executes the appropriate syscall handler (svc_handler).

    void
    do_el0_sync(struct trapframe *frame)
    {
        struct thread *td;

        td = curthread;
        td->td_frame = frame;
        ...
        switch (exception) {
        ...
            break;
        case EXCP_SVC:
            svc_handler(frame);
            break;
        ...
        default:
            ...
        }
    }

Once the syscall is done, the function returns, and handle_el0_sync restores all registers and returns to the user process.
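The dispatch step above can be sketched in plain C. The excerpt does not show how `exception` is computed, so this is only an illustration under stated assumptions: it takes the exception class from bits 31:26 of a syndrome value, as laid out for the ARMv8 ESR_ELx registers (class 0x15 for an SVC from AArch64, with the SVC immediate in the low 16 bits). All names here (`esr_exception_class`, `esr_svc_imm`, `classify_exception`, the `ACT_*` values) are hypothetical and are not taken from the FreeBSD sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the numeric values follow the ARMv8 ESR_ELx layout. */
#define ESR_EC_SHIFT   26
#define ESR_EC_MASK    0x3fu
#define EC_SVC_A64     0x15u   /* SVC executed in AArch64 state */
#define EC_DABORT_LOW  0x24u   /* data abort taken from a lower exception level */

enum action { ACT_SYSCALL, ACT_PAGE_FAULT, ACT_UNKNOWN };

/* Extract the exception class field (bits 31:26) from a syndrome value. */
uint32_t esr_exception_class(uint32_t esr)
{
    return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* Extract the 16-bit immediate of an SVC instruction. */
uint32_t esr_svc_imm(uint32_t esr)
{
    assert(esr_exception_class(esr) == EC_SVC_A64);
    return esr & 0xffffu;
}

/* Decide what kind of work the synchronous exception asks for. */
enum action classify_exception(uint32_t esr)
{
    switch (esr_exception_class(esr)) {
    case EC_SVC_A64:
        return ACT_SYSCALL;
    case EC_DABORT_LOW:
        return ACT_PAGE_FAULT;
    default:
        return ACT_UNKNOWN;
    }
}
```

A real handler would call into the kernel instead of returning an enum; the point of the sketch is only that one entry point serves several exception causes, and the syndrome value selects among them.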

Exceptions are not that bad - page faults

The most common exception on FreeBSD is a page fault. It is absolutely normal for it to happen when, for example, a user or kernel thread accesses a page that is being used for the first time.

The FreeBSD kernel uses advanced memory management features, such as copy-on-write and lazy-alloc. A page fault then tells the kernel that one of these operations needs to be performed.

Of course, when a user process tries to do something it is not allowed to do, such as dereferencing a NULL pointer, the page fault indicates an invalid operation and the kernel sends a fatal signal to the process. In this case the kernel acts as a guard, not allowing the process to go outside its predefined bounds.
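The two outcomes described above (allocate the page lazily, or signal the process) can be modelled with a toy fault handler. This is a simplified user-space sketch, not kernel code: the `region` structure, the 4 KiB page size, the eight-page region and the `handle_fault` function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES   4096u   /* assumed page size */
#define REGION_PAGES 8u      /* assumed size of the toy address region */

enum fault_result { FAULT_RESOLVED, FAULT_SIGSEGV };

/* A range of addresses the process is allowed to use. */
struct region {
    uintptr_t start;              /* first valid address, page aligned */
    bool backed[REGION_PAGES];    /* true once a physical page is assigned */
};

/* Toy page-fault handler: back the page on first touch, or reject. */
enum fault_result handle_fault(struct region *r, uintptr_t addr)
{
    assert(r != NULL);
    uintptr_t end = r->start + (uintptr_t)REGION_PAGES * PAGE_BYTES;

    if (addr < r->start || addr >= end)
        return FAULT_SIGSEGV;     /* outside the bounds, e.g. a NULL dereference */

    r->backed[(addr - r->start) / PAGE_BYTES] = true;   /* lazy allocation */
    return FAULT_RESOLVED;
}
```

In this model a fault at address 0 is rejected, while the first touch of any page inside the region simply marks that page as backed and lets the access continue.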

Stack on ARM

The ARM core can use the stack in various modes. The most common (and the one used by FreeBSD) is a descending stack. That means the stack grows towards lower memory addresses, as shown in the picture below.

Dynamic stack growth in kernel threads, and how we can end up with a dead system

To illustrate the danger, let’s discuss a real-life example that we encountered at the beginning of porting FreeBSD to ARMv8.
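A descending stack can be modelled in a few lines of C. This sketch uses an array index in place of the hardware stack pointer; the `stack` structure and its functions are hypothetical names chosen for the example. The push decrements the pointer first and then stores, which corresponds to what ARM calls a full descending stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

struct stack {
    uint64_t mem[STACK_WORDS];
    size_t sp;   /* index of the last pushed word; STACK_WORDS when empty */
};

/* An empty descending stack starts with sp at the highest position. */
void stack_init(struct stack *s)
{
    s->sp = STACK_WORDS;
}

/* Push: move sp down, then store. */
void stack_push(struct stack *s, uint64_t v)
{
    assert(s->sp > 0);            /* no room left below */
    s->mem[--s->sp] = v;
}

/* Pop: load, then move sp back up. */
uint64_t stack_pop(struct stack *s)
{
    assert(s->sp < STACK_WORDS);  /* nothing to pop */
    return s->mem[s->sp++];
}
```

Every push moves `sp` towards index 0, so a deep enough call chain eventually reaches the lowest address that was set aside for the stack.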

When a kernel thread is created, it shares its memory space with the rest of the kernel. The only thing that needs to be private is its stack. As might be expected, the stack is allocated by a kernel version of malloc.

It looks fine at first glance: we create a thread, malloc a stack for it, and everything should just work. But it didn’t. The problem was the special feature mentioned before, lazy-alloc. Physical allocation of pages is a time-consuming process, so it is better to do it just in time, when the page is actually needed, i.e. when the first page fault happens on an address in the malloc’ed area.

It might not be obvious yet, but this can make the whole system hang! Let’s see what happens. Assume that the stack starts at 0x10008000 and grows down. At the beginning, only one page (0x10007000 - 0x10007fff) is physically allocated, because the top of the stack is always accessed during creation of the thread and filled with thread-specific data such as descriptors. Once the call stack becomes deep enough, it grows past the allocated space and falls into the page below (0x10006000 - 0x10006fff). The first access causes a page fault, which is supposed to be handled by allocating the required page, but not this time.

Take a closer look at the assembly:
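The address arithmetic in this example can be checked with a short C sketch. It assumes 4 KiB pages, which is what the quoted range 0x10006000 - 0x10006fff implies; `page_base` and `stack_page_base` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 0x1000u   /* assumed 4 KiB pages */

/* Base address of the page that contains addr. */
uint64_t page_base(uint64_t addr)
{
    return addr & ~(uint64_t)(PAGE_BYTES - 1);
}

/*
 * Base address of the n-th stack page counted downwards from the top
 * (n = 0 is the page just below stack_top).
 */
uint64_t stack_page_base(uint64_t stack_top, unsigned n)
{
    assert((stack_top & (PAGE_BYTES - 1)) == 0);   /* top assumed page aligned */
    return page_base(stack_top - 1) - (uint64_t)n * PAGE_BYTES;
}
```

With a stack top of 0x10008000, `stack_page_base(top, 0)` is 0x10007000 (the page touched at creation time) and `stack_page_base(top, 1)` is 0x10006000, the page whose first access triggers the fault discussed here.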

    .macro save_registers el
    .if \el == 1
        mov x18, sp
        sub sp, sp, #128
    .endif
        sub sp, sp, #(TF_SIZE + 16)
        stp x29, x30, [sp, #(TF_SIZE)]
        stp x28, x29, [sp, #(TF_X + 28 * 8)]
        stp x26, x27, [sp, #(TF_X + 26 * 8)]
        stp x24, x25, [sp, #(TF_X + 24 * 8)]
        ...

    ENTRY(handle_el1h_sync)
        save_registers 1
        mov x0, sp
        bl do_el1h_sync
        restore_registers 1
        eret
    END(handle_el1h_sync)

The first thing the EL1 synchronous exception handler does is, yep, store all registers onto the stack. But we have just run out of allocated space, so there is no accessible stack here, and the EL1 synchronous exception is raised again as soon as the first “stp” instruction executes. What’s more, it keeps repeating from then on, and the only way to recover is a hard reset of the board.

Ways to work around it

Unfortunately, the ARMv8 architecture is susceptible to this scenario. There is always a chance that a kernel thread exceeds its allocated stack range and ends up in the state described above. What we can do is minimize the chance of that happening.
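The repeating exception can be imitated with a small simulation. This is a deliberately simplified model, not a description of the hardware: the `cpu` structure is hypothetical, `FRAME_BYTES` is a placeholder for the `128 + TF_SIZE + 16` bytes reserved by the macro (the real value of TF_SIZE is not given here), and a frame counts as usable only if its lowest address is mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAME_BYTES 0x200u   /* placeholder for 128 + TF_SIZE + 16 */

struct cpu {
    uint64_t sp;          /* current stack pointer */
    uint64_t mapped_lo;   /* lowest mapped stack address (inclusive) */
    uint64_t mapped_hi;   /* end of the mapped stack (exclusive) */
};

bool is_mapped(const struct cpu *c, uint64_t addr)
{
    return addr >= c->mapped_lo && addr < c->mapped_hi;
}

/*
 * Model exception entry: reserve a frame, then store into it.
 * Returns how many times the handler was entered before a store
 * succeeded, or -1 if it never succeeded within max_nested entries.
 */
int take_exception(struct cpu *c, int max_nested)
{
    assert(max_nested > 0);
    for (int depth = 1; depth <= max_nested; depth++) {
        c->sp -= FRAME_BYTES;        /* sub sp, sp, #... */
        if (is_mapped(c, c->sp))     /* stp ..., [sp, ...] */
            return depth;
        /* The store faulted, so the same handler is entered again. */
    }
    return -1;
}
```

When the frame fits into the mapped page the function returns 1. When it does not, the stack pointer only moves further away from mapped memory on every entry, so no later attempt can succeed; the `max_nested` bound exists only so that the simulation terminates, which the real board does not.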

The following things can be done on FreeBSD:

When allocating the stack, ensure that all pages are allocated (wired) so that no page fault occurs anywhere in the stack range. This is the solution currently implemented in FreeBSD, and it has worked well for over a year of testing.
Allocate more stack than requested and, in the exception handler, check whether stack usage has exceeded a predefined size. Some of the stack is then still left, so the system can, for example, enter the debugger or do a sysdump. (This was not implemented.)
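The second idea, which the article says was not implemented, could look roughly like the following check. This is a sketch under assumptions: the `kstack` structure, its field names and the way the reserve is placed below the promised area are all invented for illustration, and the whole allocation is assumed to be wired so that the check itself cannot fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one kernel thread stack. */
struct kstack {
    uint64_t base;      /* lowest address of the whole allocation */
    uint64_t size;      /* bytes the thread was promised */
    uint64_t reserve;   /* extra bytes allocated below the promised area */
};

/* Lowest address the thread may legitimately use. */
uint64_t kstack_limit(const struct kstack *ks)
{
    assert(ks != NULL);
    return ks->base + ks->reserve;
}

/* True when sp has dipped into the reserve below the promised area. */
bool kstack_overflowed(const struct kstack *ks, uint64_t sp)
{
    return sp < kstack_limit(ks);
}
```

An exception handler that finds `kstack_overflowed` to be true still has up to `reserve` bytes of valid stack to work with, which is what would allow it to enter a debugger or write a dump instead of faulting again.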

How can we avoid this, and why was ARMv7 different

On the 32-bit ARM architecture, the situation was a lot easier. There, almost every exception mode had its own stack pointer, so it was almost impossible for the system stack to overflow. The lazy-alloc functionality was also easier to implement and use.

Conclusion

The ARMv8 architecture is superior in most respects, but the programmer must be aware of some dangers hiding inside. I hope this short article was interesting and helps to illustrate the issues we faced while porting FreeBSD to ARMv8.

About Semihalf

Semihalf creates software for advanced solutions in the areas of platform infrastructure (operating systems, bootloaders), virtualization, networking and storage. We make software tightly coupled with the underlying hardware to achieve maximum system capacity.

Technologies developed by Semihalf power a wide range of products, from consumer electronics to cloud data center elements and carrier-grade networking gear.

The team

Zbigniew Bodek <zbb@semihalf.com>

Dominik Ermel <der@semihalf.com>

Wojciech Macek <wma@semihalf.com>

Michał Stanek <mst@semihalf.com>
