(With sincere apologies to Alan Turing)
Have you ever heard of the “Procedure Call Standard for the ARM Architecture”? We often refer to it as the AAPCS (and, yes, there is a historical reason why the acronym doesn’t match the name!)
You could be forgiven for not having heard of it. If you went looking for it, you would need to know that it forms part of a much larger suite of documents, the “Application Binary Interface (ABI) for the ARM Architecture”. It hides amongst other gems such as “DWARF for the ARM Architecture” and “Support for Debugging Overlaid Programs” and so on. The ABI is a very useful set of documents which help to ensure that different libraries and tools interwork successfully when targeting the ARM architecture.
But, returning to the AAPCS… this simply defines, in its own words, “how subroutines can be separately written, separately compiled, and separately assembled to work together.” If you like, it describes a contract between a calling routine and called routine which allows each to make a set of working assumptions about the other.
There’s a lot of useful information in the AAPCS. Today, I just want to pick out a couple of aspects of it as understanding them can have a significant effect on the performance of your software.
The AAPCS contains a lot of rules about how registers are used within functions and at externally-visible function-call boundaries. These are perhaps the most important part of the specification. The table below is taken from the AAPCS document and shows the defined use for each of the sixteen registers.
So, at a function call boundary, four registers (r0-r3) can be used for passing parameters and two of those (r0-r1) can be used for returning a result. Most of the remaining registers (with the exception of r12) must be preserved by a called function. So, the calling function can assume that their values won’t change and the called function must take some action to preserve and restore them if it wants to use them (typically, they would be pushed onto the stack on entry and popped off just before exit).
So, what can we learn from this which helps us write better code? The first thing we can observe is that passing up to four word-sized parameters can be done very efficiently as they can all be placed in registers. If we pass more than four parameters, there are no extra registers to use so they are placed on the stack. That will take extra instructions, extra time and consume stack space so we should avoid that if at all possible.
Here is the code for a function which passes just four parameters.
int func1(int a, int b, int c, int d)
{
    return a+b+c+d;
}

int caller1(void)
{
    return func1(1,2,3,4);
}
That might compile to something like this:
func1
    ADD r0, r0, r1
    ADD r0, r0, r2
    ADD r0, r0, r3
    BX lr
caller1
    MOV r3, #4
    MOV r2, #3
    MOV r1, #2
    MOV r0, #1
    B func1
And here is the code you might see from the compiler for a similar function which passes six parameters:
func2
    PUSH {r4,r5,lr}
    ADD r0,r0,r1
    ADD r0,r0,r2
    LDRD r4,r5,[sp,#0xc]
    ADD r0,r0,r3
    ADD r0,r0,r4
    ADD r0,r0,r5
    POP {r4,r5,pc}
caller2
    PUSH {r2,r3,lr}
    MOVS r1,#6
    MOVS r0,#5
    MOVS r3,#4
    STRD r0,r1,[sp,#0]
    MOVS r2,#3
    MOVS r1,#2
    MOVS r0,#1
    BL func2
    POP {r2,r3,pc}
You can see that the second example is considerably longer and involves a significant number of stack accesses.
So, a good rule would be to restrict the number of parameters to a maximum of four wherever possible. If that isn’t possible, then you should try to place the most frequently accessed parameters first in the parameter list, so that they are allocated to r0-r3 and stack accesses in the called function are minimised.
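As a rough illustration of that second point, here is a sketch of the same routine written with two different parameter orders. The function and parameter names are hypothetical, invented purely for this example. In the first version the two rarely-used values take up registers and `gain` and `bias` arrive on the stack; in the second, the values used for every element come first. How much difference this makes in practice depends on your compiler and optimisation level, so treat it as something to check in the generated code rather than a guarantee.

```c
#include <stddef.h>

/*
 * Hypothetical example with six word-sized parameters. Under the AAPCS the
 * first four are passed in r0-r3 and the last two are passed on the stack.
 *
 * Version 1: empty_result and max_count (each read at most once) occupy
 * r0 and r1, while gain and bias (used for every element) are on the stack.
 */
int scale_sum_v1(int empty_result, size_t max_count,
                 const int *data, size_t count, int gain, int bias)
{
    int total = 0;

    if (count == 0)
        return empty_result;          /* rare path */
    if (count > max_count)
        count = max_count;            /* read once */

    for (size_t i = 0; i < count; i++)
        total += data[i] * gain + bias;
    return total;
}

/*
 * Version 2: same behaviour, but the parameters used in the loop come first,
 * so data, count, gain and bias are the ones allocated to r0-r3.
 */
int scale_sum_v2(const int *data, size_t count, int gain, int bias,
                 int empty_result, size_t max_count)
{
    int total = 0;

    if (count == 0)
        return empty_result;
    if (count > max_count)
        count = max_count;

    for (size_t i = 0; i < count; i++)
        total += data[i] * gain + bias;
    return total;
}
```

Both versions compute the same result; only the order in which the arguments are handed over changes.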
But there is another subtlety buried in the AAPCS. You may be aware that variables in memory should be aligned to their natural size for best performance. The AAPCS places a similar set of “alignment” restrictions on how parameters are allocated to registers. Specifically, a doubleword-sized parameter must be passed in an even-odd register pair. In other words, you can pass a doubleword in r0:r1 or r2:r3 but you can’t pass it in r1:r2.
Here’s a function call which passes three parameters. If you count up the total size, it’s only four words, so you might expect that all can be passed in registers. However, the allocation rule means that the doubleword is placed in r2:r3 and r1 is unused. The third parameter then gets placed on the stack.
fx(int a, long long b, int c);
I’m sure you can see that this is easy to fix simply by reordering the parameters. If you place the doubleword either first or last in the parameter list, then all the parameters will fit in r0-r3.
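If you want to experiment with different parameter orders, the allocation rule can be modelled in a few lines of C. This is only a simplified sketch of the behaviour described above, not the full algorithm from the AAPCS: it assumes every argument is either a single word or a doubleword integer, and the names `allocate_args` and `ON_STACK` are my own.

```c
#include <stddef.h>

#define ON_STACK (-1)   /* marker: this argument is passed on the stack */

/*
 * Simplified model of AAPCS core-register allocation for arguments that are
 * either one word (4 bytes) or one doubleword (8 bytes).
 *
 * first_reg[i] receives the number of the first register used by argument i
 * (0 for r0, 2 for r2, and so on), or ON_STACK.
 * Returns the number of arguments that ended up on the stack.
 */
int allocate_args(const int *size_bytes, size_t count, int *first_reg)
{
    int next_reg = 0;   /* next free core register: 0..3 means r0..r3 */
    int stacked = 0;

    for (size_t i = 0; i < count; i++) {
        int words = size_bytes[i] / 4;

        /* A doubleword must start in an even-numbered register. */
        if (words == 2 && (next_reg % 2) != 0)
            next_reg++;

        if (next_reg + words <= 4) {
            first_reg[i] = next_reg;
            next_reg += words;
        } else {
            /* Once one argument goes to the stack, all later ones follow. */
            first_reg[i] = ON_STACK;
            next_reg = 4;
            stacked++;
        }
    }
    return stacked;
}
```

Feeding it the sizes {4, 8, 4} for the original ordering gives r0, r2:r3 and one argument on the stack, while {8, 4, 4} or {4, 4, 8} keeps everything in r0-r3.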
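So, two simple rules from the AAPCS which will help your code perform better: keep to a maximum of four parameters wherever possible, and order your parameters so that any doubleword lands in an even-odd register pair.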
Happy coding!