What does it take to increment a number? One instruction? Two? It is not as simple a question as you might imagine. And, believe me, this is one case where size matters.
On an ARM processor, any ARM processor, you can increment a 32-bit number in a single instruction, usually taking one cycle. It really is that simple. This is because ARM processors are 32-bit devices, with 32-bit registers, a 32-bit ALU and 32-bit internal data paths. They handle 32-bit operations efficiently because that is what they are designed to do.
Look at the C routine below and its corresponding assembly code. You can see that the increment operation translates to a single instruction.
C Code

int increment(int a)
{
    a = a + 1;
    return a;
}

ARM Assembly Code

increment
    ADD  r0, r0, #1
    BX   lr
But not every processor is an ARM processor. An 8051, for instance, has a natural data size of 8 bits. It has 8-bit registers and its ALU carries out 8-bit operations. So, what might it take to increment a 32-bit variable on an 8051? Try this:
C Code

long increment(long a)
{
    a = a + 1;
    return a;
}

8051 Assembly Code

; a assigned to R4:R5:R6:R7
MOV  A, R7
ADD  A, #01h
MOV  R7, A
CLR  A
ADDC A, R6
MOV  R6, A
CLR  A
ADDC A, R5
MOV  R5, A
CLR  A
ADDC A, R4
MOV  R4, A
RET
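To make the byte-by-byte carry propagation explicit, here is a rough C sketch of the same operation. It is purely illustrative (it is not what the 8051 compiler actually emits, and the function name is invented), but it mirrors the assembly above:

unsigned long increment32(unsigned long a)
{
    /* Split the 32-bit value into four bytes, as an 8-bit ALU sees it. */
    unsigned char b0 = (unsigned char)(a);         /* least significant byte */
    unsigned char b1 = (unsigned char)(a >> 8);
    unsigned char b2 = (unsigned char)(a >> 16);
    unsigned char b3 = (unsigned char)(a >> 24);   /* most significant byte */
    unsigned char carry;

    b0 = (unsigned char)(b0 + 1);
    carry = (unsigned char)(b0 == 0);              /* carry out of the low byte */
    b1 = (unsigned char)(b1 + carry);
    carry = (unsigned char)(carry && b1 == 0);     /* propagate the carry...    */
    b2 = (unsigned char)(b2 + carry);
    carry = (unsigned char)(carry && b2 == 0);     /* ...one byte at a time     */
    b3 = (unsigned char)(b3 + carry);

    /* Reassemble the four bytes into the 32-bit result. */
    return ((unsigned long)b3 << 24) | ((unsigned long)b2 << 16) |
           ((unsigned long)b1 << 8)  | (unsigned long)b0;
}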
It is clear that 32-bit addition on an 8051 is much harder and much more time-consuming than a simple 8-bit addition. Since the ALU can only handle 8 bits at a time, four separate additions are required to propagate any carry across the four parts of the result. On an ARM processor, the reverse is true: it is the smaller-than-word operations that cost extra. Here is an example of incrementing an 8-bit variable.
int increment(unsigned char a)
{
    a = a + 1;
    return a;
}

increment
    ADD  r0, r0, #1
    AND  r0, r0, #0xFF
    BX   lr
Although it may not seem much of an overhead, the compiler has to insert extra instructions to discard the unwanted upper bits so that the 32-bit result fits the declared 8-bit variable. The same would be true when using a 16-bit variable.
So, when moving from other, “smaller” architectures to ARM, a change in mindset is necessary. It is no longer automatically the right decision to choose the smallest possible container for a variable. Instead, 32-bit variables should be the default, as they are arithmetically the most efficient.
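As a rough illustration (the function names are invented, and the exact code generated will vary with the compiler and optimisation level), consider two versions of the same loop. With an 8-bit counter the compiler may have to insert extra instructions to keep the counter wrapped to 8 bits; with a plain int it does not:

void clear_buffer_char(unsigned char *buf)
{
    unsigned char i;          /* 8-bit counter: may need extra truncation */
    for (i = 0; i < 200; i++)
        buf[i] = 0;
}

void clear_buffer_int(unsigned char *buf)
{
    int i;                    /* word-sized counter: matches the ALU width */
    for (i = 0; i < 200; i++)
        buf[i] = 0;
}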
But, in many applications, storage space is at a premium. That means you may still want to choose the smallest viable size for a particular variable so that it takes up the least possible space in memory. That can still be an efficient choice on ARM, but you should process items at the natural size of the core, i.e. 32-bit words. ARM processors have byte- and halfword-sized load and store instructions, which make it very easy to do the conversion at the time you transfer values into and out of registers. Here is an example of incrementing an 8-bit variable held in memory.
unsigned char a;

void increment_a(void)
{
    a = a + 1;
}

increment_a
    LDR  r0, =&a
    LDRB r1, [r0]
    ADD  r1, r1, #1
    STRB r1, [r0]
    BX   lr
(Yes, I know that the first statement isn’t legal assembler but you can see what it means!)
Here, the LDRB and STRB instructions automatically zero-extend the 8-bit value when loading it and truncate it when storing it. This takes care of the size adjustment and it is almost free – there may be an additional cycle of latency on the load instruction on some cores. Of course, if you want to do some more complex processing on an 8-bit variable, then it might be necessary to copy it into a word-sized local variable after loading it. It can then be processed at the natural word size and then only truncated again when finally written back to memory.
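Here is a sketch of that pattern (the variable and function names are made up for illustration):

extern unsigned char level;      /* kept as a single byte in memory */

void update_level(void)
{
    unsigned int temp = level;   /* LDRB zero-extends into a full register */

    /* Do the more complex work at the natural 32-bit word size. */
    temp = temp * 3 + 1;

    level = (unsigned char)temp; /* STRB stores only the low 8 bits */
}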
So remember, small isn’t always beautiful!