What does it take to increment a number? One instruction? Two? It is not as simple a question as you might imagine. And, believe me, this is one case where size matters.
On an ARM processor, any ARM processor, you can increment a 32-bit number in a single instruction, usually taking one cycle. It really is that simple. This is because ARM processors are 32-bit devices, with 32-bit registers, a 32-bit ALU and 32-bit internal data paths. They handle 32-bit operations efficiently because that is what they are designed to do.
Look at the C routine below and its corresponding assembly code. You can see that the increment operation translates to a single instruction.
C Code

int increment(int a)
{
    a = a + 1;
    return a;
}

ARM Assembly Code

increment
    ADD  r0, r0, #1
    BX   lr
But not every processor is an ARM processor. An 8051, for instance, has a natural data size of 8 bits. It has 8-bit registers and its ALU carries out 8-bit operations. So, what might it take to increment a 32-bit variable on an 8051? Try this:
C Code

long increment(long a)
{
    a = a + 1;
    return a;
}

8051 Assembly Code

; a assigned to R4:R5:R6:R7
MOV  A, R7
ADD  A, #01h
MOV  R7, A
CLR  A
ADDC A, R6
MOV  R6, A
CLR  A
ADDC A, R5
MOV  R5, A
CLR  A
ADDC A, R4
MOV  R4, A
RET
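To make the byte-by-byte carry propagation explicit, here is a rough C sketch of the same operation. It is purely illustrative (it is not what the 8051 compiler actually emits, and the function name is invented), but it mirrors the assembly above:

unsigned long increment32(unsigned long a)
{
    /* Split the 32-bit value into four bytes, as an 8-bit ALU sees it. */
    unsigned char b0 = (unsigned char)(a);         /* least significant byte */
    unsigned char b1 = (unsigned char)(a >> 8);
    unsigned char b2 = (unsigned char)(a >> 16);
    unsigned char b3 = (unsigned char)(a >> 24);   /* most significant byte */
    unsigned char carry;

    b0 = (unsigned char)(b0 + 1);
    carry = (unsigned char)(b0 == 0);              /* carry out of the low byte */
    b1 = (unsigned char)(b1 + carry);
    carry = (unsigned char)(carry && b1 == 0);     /* propagate the carry...    */
    b2 = (unsigned char)(b2 + carry);
    carry = (unsigned char)(carry && b2 == 0);     /* ...one byte at a time     */
    b3 = (unsigned char)(b3 + carry);

    /* Reassemble the four bytes into the 32-bit result. */
    return ((unsigned long)b3 << 24) | ((unsigned long)b2 << 16) |
           ((unsigned long)b1 << 8)  | (unsigned long)b0;
}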
It is clear that 32-bit addition on an 8051 is much harder and much more time-consuming than a simple 8-bit addition. Since the ALU can only handle 8 bits at a time, four separate additions are required to propagate any carry across the four parts of the result. On an ARM processor, the reverse is true: it is the smaller-than-word operations that cost extra. Here is an example of incrementing an 8-bit variable.
int increment(unsigned char a)
{
    a = a + 1;
    return a;
}

increment
    ADD  r0, r0, #1
    AND  r0, r0, #0xFF
    BX   lr
Although it may not seem much of an overhead, the compiler has to insert extra instructions to discard the unwanted upper bits so that the 32-bit result fits the declared 8-bit variable. The same would be true when using a 16-bit variable.
So, when moving from other, “smaller” architectures to ARM, a change in mindset is necessary. It is no longer automatically the right decision to choose the smallest possible container for a variable. Instead, 32-bit variables should be the default, as they are arithmetically the most efficient.
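As a rough illustration (the function names are invented, and the exact code generated will vary with the compiler and optimisation level), consider two versions of the same loop. With an 8-bit counter the compiler may have to insert extra instructions to keep the counter wrapped to 8 bits; with a plain int it does not:

void clear_buffer_char(unsigned char *buf)
{
    unsigned char i;          /* 8-bit counter: may need extra truncation */
    for (i = 0; i < 200; i++)
        buf[i] = 0;
}

void clear_buffer_int(unsigned char *buf)
{
    int i;                    /* word-sized counter: matches the ALU width */
    for (i = 0; i < 200; i++)
        buf[i] = 0;
}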
But, in many applications, storage space is at a premium. That means you may still want to choose the smallest viable size for a particular variable so that it takes up the least possible space in memory. That can still be an efficient choice on ARM, but you should process items at the natural size of the core, i.e. 32-bit words. ARM processors have byte- and halfword-sized load and store instructions, which make it very easy to do the conversion at the time you transfer values into and out of registers. Here is an example of incrementing an 8-bit variable held in memory.
unsigned char a;

void increment_a(void)
{
    a = a + 1;
}

increment_a
    LDR  r0, =&a
    LDRB r1, [r0]
    ADD  r1, r1, #1
    STRB r1, [r0]
    BX   lr
(Yes, I know that the first statement isn’t legal assembler but you can see what it means!)
Here, the LDRB and STRB instructions automatically zero-extend the 8-bit value when loading it and truncate it when storing it. This takes care of the size adjustment and it is almost free – there may be an additional cycle of latency on the load instruction on some cores. Of course, if you want to do some more complex processing on an 8-bit variable, then it might be necessary to copy it into a word-sized local variable after loading it. It can then be processed at the natural word size and then only truncated again when finally written back to memory.
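Here is a sketch of that pattern (the variable and function names are made up for illustration):

extern unsigned char level;      /* kept as a single byte in memory */

void update_level(void)
{
    unsigned int temp = level;   /* LDRB zero-extends into a full register */

    /* Do the more complex work at the natural 32-bit word size. */
    temp = temp * 3 + 1;

    level = (unsigned char)temp; /* STRB stores only the low 8 bits */
}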
So remember, small isn’t always beautiful!