Using the Stack in AArch32 and AArch64

November 19, 2015


When reading assembly-level code for any of the AArch32 or AArch64 instruction sets, you may have noticed that the stack pointer has various alignment and usage restrictions. These restrictions are part of the procedure-call standard – the set of common rules that allow functions to call one another. However, some of the rules apply even if you aren't actually handling function calls. The stack is shared between the parts of an application, any libraries that it uses, and signal handlers, so it is important that these components agree on how the stack should behave.

If you're just writing C code, the compiler will sort this all out for you, but you'll need to understand the rules if you're dealing with any assembly code that needs to interact with the stack.

This article assumes that your platform uses ARM's AAPCS (for AArch32) or AAPCS64 (for AArch64). This is the case on Linux and Android, but other systems may define their own standards.

Shared Stack-Usage Rules

For both AArch32 and AArch64:

- The stack is full-descending, meaning that sp – the stack pointer – points to the most recently pushed object on the stack, and the stack grows downwards, towards lower addresses.
- sp must point to a valid address in the memory allocated for the stack.
  - Formally, sp must lie in the range stack_limit < sp <= stack_base, though the values of stack_limit and stack_base are often inaccessible.
- The memory below sp (but above stack_limit) must not be accessed by your code.
  - In practice, signal handlers use this memory, so it can be corrupted unexpectedly and without warning.
- At public interfaces, sp must be aligned to two times the pointer size.
  - For AArch32 that's 8 bytes¹, and for AArch64 it's 16 bytes.
  - A "public interface" is typically a function that is visible to some other, separately-compiled code. The exact definition depends upon the language and the toolchain, and is outside the scope of this article. It's reasonable to assume that any C or C++ functions that you interact with from assembly are treated as public interfaces.
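
The public-interface alignment rule reduces to a simple check on the value of sp. The following C sketch models that check on a plain integer. The function name and the choice to pass the pointer size as a parameter are illustrative assumptions, not anything defined by AAPCS or AAPCS64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: returns true if `sp` satisfies the public-interface
// alignment rule for a target with the given pointer size in bytes
// (4 for AArch32, 8 for AArch64).
bool sp_ok_at_public_interface(uint64_t sp, size_t pointer_size) {
    assert(pointer_size == 4 || pointer_size == 8);
    uint64_t required = 2 * (uint64_t)pointer_size;  // 8 or 16 bytes.
    return (sp % required) == 0;
}
```

With this model, a value such as 0x7ffffff8 passes for AArch32 (8-byte aligned) but fails for AArch64, which needs 16-byte alignment.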

Rules Specific to AArch32

For AArch32 (ARM or Thumb), sp must be at least 4-byte aligned at all times. As long as you only push and pop whole registers, this restriction will never be broken.

Rules Specific to AArch64

For AArch64, sp must be 16-byte aligned whenever it is used to access memory. This is enforced by AArch64 hardware.

This means that it is difficult to implement a generic push or pop operation for AArch64. There are no push or pop aliases like there are for ARM and Thumb.
The hardware checks can be disabled by privileged code, but they're enabled in at least Linux and Android.

C compilers will typically reserve stack space at the start of the function, then leave sp alone until the end, so the restriction is not as awkward as it first seems. However, you must be aware of it when handling assembly code, and it can be tricky for simple compilers (such as stack-based JIT compilers).

Note that unlike AArch32, arbitrarily-aligned values can be stored in sp, as long as the previously-described rules are followed for memory accesses and public interfaces. This is useful for allocating variable-length arrays of small values, for example:

// Allocate a variable-length array of bytes on the stack.
  sub sp, sp, x0                    // x0 holds the length.
  mov x9, sp                        // `and` cannot read sp as a source; x9 is an arbitrary scratch register.
  and sp, x9, #0xfffffffffffffff0   // Align sp.
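
The arithmetic behind this subtract-then-align pattern can be checked in C. This is a minimal sketch that models sp as a plain integer; `alloc_bytes_on_stack` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Models subtracting `length` from sp and then aligning sp down to a
// 16-byte boundary. Rounding down (towards lower addresses) can only make
// the allocation larger, so at least `length` bytes are always reserved.
uint64_t alloc_bytes_on_stack(uint64_t sp, uint64_t length) {
    assert(length <= sp);               // Avoid wrapping below address zero.
    uint64_t new_sp = sp - length;      // May have any alignment at this point.
    new_sp &= ~(uint64_t)0xf;           // Same mask as 0xfffffffffffffff0.
    return new_sp;
}
```

For example, starting from sp = 0x1000, a request for 5 bytes leaves sp at 0xff0, and a request for 17 bytes leaves it at 0xfe0.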

Push and Pop on AArch64

Because the alignment of sp is checked on every memory access that uses it, AArch64 cannot have general-purpose push- or pop-like operations.

For example:

// Broken AArch64 implementation of `push {x1}; push {x0};`.
  str   x1, [sp, #-8]!  // This works, but leaves `sp` with 8-byte alignment ...
  str   x0, [sp, #-8]!  // ... so the second `str` will fail.

In this particular case, the stores could be combined:

// AArch64 implementation of `push {x0, x1}`.
  stp   x0, x1, [sp, #-16]!

However, in a simple compiler, it is not always easy to combine instructions in that way.
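
To make the failure mode concrete, here is a toy C model of the alignment check. It tracks only the value of sp and a fault flag, and stores no data. The type and function names are hypothetical, and the model assumes that the check is applied to sp as the base register, before the offset is added, which matches the behaviour described for the two `str` instructions above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Toy model of the AArch64 sp alignment check. Only sp is modelled.
typedef struct {
    uint64_t sp;
    bool faulted;   // Set when a misaligned sp is used as a base address.
} ToyCpu;

// Models a pre-indexed store through sp, such as `str x1, [sp, #-8]!` or
// `stp x0, x1, [sp, #-16]!`. If sp is misaligned, the access faults and sp
// is left unchanged; otherwise the offset is written back to sp.
void toy_store_preindex(ToyCpu *cpu, int64_t offset) {
    if (cpu->sp % 16 != 0) {
        cpu->faulted = true;        // Real hardware raises an exception here.
        return;
    }
    cpu->sp += (uint64_t)offset;    // Writeback (unsigned wraparound handles negative offsets).
}
```

Starting from a 16-byte-aligned sp, one call with an offset of -8 succeeds and leaves sp 8-byte aligned, so a second call with -8 sets the fault flag. A single call with -16 (the `stp` case) succeeds and keeps sp aligned.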

If you're handling w registers, the problem will be even more apparent: these have to be pushed in sets of four to maintain stack pointer alignment, and since this isn't possible in a single instruction, the code can become difficult to follow. This is what VIXL generates, for example:

// AArch64 implementation of `push {w0, w1, w2, w3}`.
  stp   w0, w1, [sp, #-16]!   // Allocate four words and store w0 and w1 at the lower addresses.
  stp   w2, w3, [sp, #8]      // Store w2 and w3 at the upper addresses.
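
The resulting memory layout can be sketched in C, using a byte array to stand in for stack memory. This is only a model of where each word lands; `push_four_words` and its parameters are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Models `stp w0, w1, [sp, #-16]!` followed by `stp w2, w3, [sp, #8]`.
// `sp` is an index into `stack`; it must be 16-byte aligned and at least 16.
// Returns the new value of sp.
size_t push_four_words(uint8_t *stack, size_t sp, const uint32_t w[4]) {
    assert(sp % 16 == 0 && sp >= 16);
    sp -= 16;                              // Pre-index writeback.
    memcpy(&stack[sp + 0], &w[0], 4);      // w0 at [sp]
    memcpy(&stack[sp + 4], &w[1], 4);      // w1 at [sp, #4]
    memcpy(&stack[sp + 8], &w[2], 4);      // w2 at [sp, #8]
    memcpy(&stack[sp + 12], &w[3], 4);     // w3 at [sp, #12]
    return sp;
}
```

After the call, sp has moved down by exactly 16 bytes, so it keeps its 16-byte alignment, and the four words sit at consecutive 4-byte offsets above it.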

If you're dealing with hand-written AArch64 assembly code, you'll have to be aware of these patterns.

Many JIT compilers have a tricky situation, though: such compilers are built around a simple stack machine, and expect to be able to push and pop in an ad-hoc fashion. Managing this on AArch64 requires an inventive approach, and I'll describe a few possibilities in a follow-up article.

¹Some time ago I was told that the 8-byte alignment restriction exists to allow the use of instructions such as ldrexd and strexd, which require an 8-byte-aligned address. Without a guarantee that a function will be entered with proper alignment, these instructions would be awkward to use on stack variables. There may also be other reasons, but I don't know what they are, and AAPCS doesn't document them.
