On the AAPCS, with an application to efficient parameter passing

November 14, 2013

(With sincere apologies to Alan Turing)

Have you ever heard of the “Procedure Call Standard for the ARM Architecture”? We often refer to it as the AAPCS (and, yes, there is a historical reason why the acronym doesn’t match the name!).

You could be forgiven for not having heard of it. If you went looking for it, you would need to know that it forms part of a much larger suite of documents, the “Application Binary Interface (ABI) for the ARM Architecture”. It hides amongst other gems such as “DWARF for the ARM Architecture” and “Support for Debugging Overlaid Programs”. The ABI is a very useful set of documents which helps to ensure that different libraries and tools interwork successfully when targeting the ARM architecture.

Defining AAPCS

But, returning to the AAPCS… this simply defines, in its own words, “how subroutines can be separately written, separately compiled, and separately assembled to work together.” If you like, it describes a contract between a calling routine and a called routine which allows each to make a set of working assumptions about the other.

There’s a lot of useful information in the AAPCS. Today, I just want to pick out a couple of aspects of it as understanding them can have a significant effect on the performance of your software.

Registers and functions

The AAPCS contains a lot of rules about how registers are used within functions and at externally-visible function-call boundaries. These are perhaps the most important part of the specification. The table below is taken from the AAPCS document and shows the defined use for each of the sixteen registers.

Register	Purpose
r0	Argument, result, scratch register 1
r1	Argument, result, scratch register 2
r2	Argument, scratch register 3
r3	Argument, scratch register 4
r4	Variable register 1
r5	Variable register 2
r6	Variable register 3
r7	Variable register 4
r8	Variable register 5
r9	Platform register (usage defined by platform in use)
r10	Variable register 7
r11	Variable register 8
r12	Intra-procedure-call scratch register
r13	Stack pointer (SP)
r14	Link register (LR)
r15	Program counter (PC)

So, at a function call boundary, four registers (r0-r3) can be used for passing parameters and two of those (r0-r1) can be used for returning a result. Most of the remaining registers must be preserved by a called function: r4-r8, r10, r11 and the stack pointer always, and r9 on platforms that treat it as an ordinary variable register. The exception is r12, which a called function is free to corrupt. So, the calling function can assume that the preserved values won’t change, and the called function must take some action to preserve and restore them if it wants to use them (typically, they would be pushed onto the stack on entry and popped off just before exit).

So, what can we learn from this which helps us write better code? The first thing we can observe is that passing up to four word-sized parameters can be done very efficiently as they can all be placed in registers. If we pass more than four parameters, there are no extra registers to use so they are placed on the stack. That will take extra instructions, extra time and consume stack space so we should avoid that if at all possible.

Function code

Here is the code for a function which passes just four parameters.

int func1(int a, int b, int c, int d)
{
    return a+b+c+d;
}
int caller1(void)
{
    return func1(1,2,3,4);
} 

That might compile to something like this:

func1
     ADD r0, r0, r1
     ADD r0, r0, r2
     ADD r0, r0, r3
     BX lr
caller1
     MOV r3, #4
     MOV r2, #3
     MOV r1, #2
     MOV r0, #1
     B func1

And here is the code you might see from the compiler for a similar function which passes six parameters:

func2
     PUSH     {r4,r5,lr}
     ADD      r0,r0,r1
     ADD      r0,r0,r2
     LDRD     r4,r5,[sp,#0xc]
     ADD      r0,r0,r3
     ADD      r0,r0,r4
     ADD      r0,r0,r5
     POP      {r4,r5,pc}
caller2
     PUSH     {r2,r3,lr}
     MOVS     r1,#6
     MOVS     r0,#5
     MOVS     r3,#4
     STRD     r0,r1,[sp,#0]
     MOVS     r2,#3
     MOVS     r1,#2
     MOVS     r0,#1
     BL       func2
     POP      {r2,r3,pc}

You can see that the second example is considerably longer and involves a significant number of stack accesses.
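The C source behind the six-parameter listing isn’t shown, so here is a guess at what it might look like, assuming it simply extends the four-parameter example with two more arguments. The names func2 and caller2 are an assumption, suggested by the BL func2 in the listing.

```c
/* Hypothetical reconstruction: func2 and caller2 are assumed names. */
int func2(int a, int b, int c, int d, int e, int f)
{
    /* Under the AAPCS, a-d arrive in r0-r3; e and f are read from the stack. */
    return a + b + c + d + e + f;
}

int caller2(void)
{
    /* 1-4 travel in r0-r3; 5 and 6 must be stored to the stack before the call. */
    return func2(1, 2, 3, 4, 5, 6);
}
```

The two extra arguments account for the STRD in the caller and the LDRD in the called function.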

So, a good rule would be to restrict the number of parameters to a maximum of four wherever possible. If that isn’t possible, then you should try to list the most frequently accessed parameters first, so that they land in r0-r3 and stack accesses in the called function are minimised.
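One common way to stay within four arguments is to group related parameters into a structure and pass a pointer to it, since the pointer occupies a single register. This is a general C idiom rather than anything the AAPCS prescribes, and the names sum_args and sum6 below are hypothetical. It is only a sketch: the called function still has to load each field from memory, so it pays off mainly when the structure already exists or is handed down through several layers of calls.

```c
#include <stddef.h>

/* Hypothetical parameter block: six values that would otherwise be
   six separate arguments, two of them passed on the stack. */
struct sum_args {
    int a, b, c, d, e, f;
};

/* Takes one pointer argument, which fits in r0 under the AAPCS. */
int sum6(const struct sum_args *args)
{
    if (args == NULL) {
        return 0; /* defensive default chosen for this sketch */
    }
    return args->a + args->b + args->c + args->d + args->e + args->f;
}
```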

But there is another subtlety buried in the AAPCS. You may be aware that variables in memory should be aligned to their natural size for best performance. The AAPCS places a similar set of “alignment” restrictions on how parameters are allocated to registers. Specifically, a doubleword-sized parameter must be passed in an even-odd register pair. In other words, you can pass a doubleword in r0:r1 or r2:r3 but you can’t pass it in r1:r2.

Here’s a function which takes three parameters:

int fx(int a, long long b, int c);

If you count up the total size, it’s only four words, so you might expect that all three can be passed in registers. However, the allocation rule means that a goes in r0, the doubleword b is placed in r2:r3, and r1 is left unused. The third parameter, c, then gets placed on the stack.

I’m sure you can see that this is easy to fix simply by reordering the parameters. If you place the doubleword either first or last in the parameter list, then all the parameters will fit in r0-r3.
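To check an ordering by hand, the register allocation can be modelled in a few lines of C. This is a deliberately simplified sketch, not the full AAPCS algorithm: it assumes every argument is a 4-byte or 8-byte integer, ignores floating-point registers and structures, and the names allocate_args and ON_STACK are made up for illustration.

```c
#include <stddef.h>

enum { ON_STACK = -1 };

/* Simplified model of AAPCS core-register allocation.
   sizes[i] is the size of argument i in bytes (4 or 8 only).
   first_reg[i] receives the number of the first register used
   (0 for r0, and so on) or ON_STACK. */
void allocate_args(const size_t *sizes, size_t count, int *first_reg)
{
    int next = 0; /* next free core register, r0..r3 */

    for (size_t i = 0; i < count; i++) {
        int words = (int)(sizes[i] / 4);

        /* A doubleword must start in an even-numbered register. */
        if (words == 2 && (next % 2) != 0) {
            next++;
        }

        if (next + words <= 4) {
            first_reg[i] = next;
            next += words;
        } else {
            /* Once one argument goes to the stack, all later ones follow. */
            first_reg[i] = ON_STACK;
            next = 4;
        }
    }
}
```

Feeding it the sizes {4, 8, 4} for the original order gives r0, r2 and the stack, while {8, 4, 4} and {4, 4, 8} both keep everything in r0-r3.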

So, here are two simple rules from the AAPCS which will help your code perform better:

Restrict parameters to four or fewer
Remember the alignment rules when passing doublewords

Happy coding!
