Size matters...

Chris Shore
November 27, 2013

How many instructions does it take to increment a number? One? Two? It is not as simple a question as you might imagine. And, believe me, this is one case where size matters.

On an ARM processor, any ARM processor, you can increment a 32-bit number in a single instruction, usually taking one cycle. It really is that simple. This is because ARM processors are 32-bit devices, with 32-bit registers, a 32-bit ALU and 32-bit internal data paths. They are good at doing 32-bit operations efficiently - because that is what they are designed to do.

Look at the C routine below and its corresponding assembly code. You can see that the increment operation translates to a single instruction.

C Code

    int increment(int a)
    {
        a = a + 1;
        return a;
    }

ARM Assembly Code

    increment
        ADD r0, r0, #1
        BX  lr

But not every processor is an ARM processor. An 8051, for instance, has a natural data size of 8 bits. It has 8-bit registers and its ALU carries out 8-bit operations. So, what might it take to increment a 32-bit variable on an 8051? Try this:

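C Code

    long increment(long a)
    {
        a = a + 1;
        return a;
    }

8051 Assembly Code

    ; a assigned to R4:R5:R6:R7 (R7 is the least significant byte)

    MOV  A, R7
    ADD  A, #01h    ; add 1 to the low byte, setting the carry flag on overflow
    MOV  R7, A
    CLR  A
    ADDC A, R6      ; add the carry into the next byte
    MOV  R6, A
    CLR  A
    ADDC A, R5
    MOV  R5, A
    CLR  A
    ADDC A, R4      ; add the carry into the most significant byte
    MOV  R4, A
    RET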
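To make the carry chain concrete, here is a minimal C sketch that models what the 8051 listing does: four 8-bit additions, each passing its carry on to the next byte. The function names `increment_bytewise` and `bytes_to_word`, and the choice to store the least significant byte at index 0, are assumptions made for illustration and do not come from the original listing.

```c
#include <stdint.h>

/* Increment a 32-bit value held as four separate bytes (index 0 is the
   least significant byte), one byte at a time, as an 8-bit ALU must. */
void increment_bytewise(uint8_t bytes[4])
{
    unsigned int carry = 1u;                 /* the "+1" enters as an initial carry */
    for (int i = 0; i < 4; i++) {
        unsigned int sum = bytes[i] + carry; /* 8-bit add; the result may need 9 bits */
        bytes[i] = (uint8_t)(sum & 0xFFu);   /* keep the low 8 bits in this byte */
        carry = sum >> 8;                    /* pass any overflow to the next byte */
    }
}

/* Illustrative helper: reassemble the four bytes into one 32-bit word. */
uint32_t bytes_to_word(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Starting from the bytes FF FF 00 00 (the value 0x0000FFFF), the carry ripples through the first two bytes and the result reassembles to 0x00010000.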

It is clear that 32-bit addition on an 8051 is much harder and much more time-consuming than a simple 8-bit addition. Since the ALU can only handle 8 bits at a time, four separate additions are required to propagate any carry across four separate parts of the result. On an ARM processor, the reverse is true. Here is an example of incrementing an 8-bit variable.

C Code

    int increment(unsigned char a)
    {
        a = a + 1;
        return a;
    }

ARM Assembly Code

    increment
        ADD r0, r0, #1
        AND r0, r0, #0xFF
        BX  lr

Although it may not seem much of an overhead, the compiler has to insert an extra instruction to mask off any overflow and restrict the 32-bit result to the range of the declared 8-bit variable. The same would be true when using a 16-bit variable.
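The narrowing step can also be seen at the C level. In the sketch below, which is illustrative only and uses made-up function names, `a + 1` is evaluated at `int` width because of C's integer promotions, and the assignment back to an `unsigned char` is what forces the result into 8 bits. Writing the mask out by hand gives the same answer, while the word-sized version needs no narrowing at all.

```c
/* 8-bit parameter: the sum is computed as an int, then narrowed to 8 bits. */
int increment_u8(unsigned char a)
{
    a = (unsigned char)(a + 1);   /* the narrowing the compiler must implement */
    return a;
}

/* The same narrowing written as an explicit mask. */
int increment_u8_masked(unsigned char a)
{
    return (a + 1) & 0xFF;
}

/* Word-sized parameter: no narrowing step is needed. */
unsigned int increment_word(unsigned int a)
{
    return a + 1u;
}
```

Passing 255 to either 8-bit version returns 0, whereas `increment_word(255)` returns 256.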

So, when moving from other, “smaller” architectures to ARM, a change in mindset is necessary. Choosing the smallest possible container for a variable is no longer automatically the right decision. Instead, 32-bit variables should be the default, as they are the most efficient for arithmetic.
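As a rough illustration of that default, the sketch below declares its loop counter and accumulator at word size even though each data item is only a byte. The function name `sum_bytes` is hypothetical, and the code assumes a target where `unsigned int` is at least 32 bits wide, as it is on ARM.

```c
/* Sum an array of bytes using a word-sized counter and accumulator. */
unsigned int sum_bytes(const unsigned char *data, unsigned int count)
{
    unsigned int sum = 0;                   /* word-sized accumulator */
    for (unsigned int i = 0; i < count; i++) {
        sum += data[i];                     /* each byte is widened to a word */
    }
    return sum;
}
```

Had `i` or `sum` been declared as `unsigned char`, the compiler would have to keep each of them within 8 bits on every iteration, and the sum would wrap at 256.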

Store small, process large

But in many applications, storage space is at a premium. That means you may still want to choose the smallest viable size for a particular variable so that it takes up the least possible space in memory. That can be an efficient choice on ARM too, provided you still process items at the natural size of the core, i.e. as 32-bit words. ARM processors have byte-sized and halfword-sized load and store instructions, which make it very easy to do the conversion as you transfer values into and out of registers. Here is an example of incrementing an 8-bit variable held in memory.

C Code

    unsigned char a;
    void increment_a(void)
    {
        a = a + 1;
    }

ARM Assembly Code

    increment_a
        LDR  r0, =a
        LDRB r1, [r0]
        ADD  r1, r1, #1
        STRB r1, [r0]
        BX   lr

(The first instruction uses the assembler's LDR r0, =a pseudo-instruction, which loads the address of a into r0.)

Here, the LDRB instruction automatically zero-extends the 8-bit value as it loads it, and the STRB instruction truncates it as it stores it. This takes care of the size adjustment and it is almost free: on some cores there may be an additional cycle of latency on the load instruction. Of course, if you want to do more complex processing on an 8-bit variable, it might be necessary to copy it into a word-sized local variable after loading it. It can then be processed at the natural word size and truncated only when it is finally written back to memory.
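A sketch of that pattern in C might look like the following. The variable `level`, the function `adjust_level` and the percentage-gain calculation are all invented for illustration. The point is only the shape of the code: one byte load into a word-sized local, arithmetic at word size, and one byte store at the end.

```c
#include <stdint.h>

uint8_t level;   /* stored small: one byte in memory */

/* Scale 'level' by a percentage, clamping the result to the 8-bit range.
   Assumes gain_percent is small enough that 255 * gain_percent fits in 32 bits. */
void adjust_level(uint32_t gain_percent)
{
    uint32_t v = level;               /* byte load, zero-extended to a word */
    v = (v * gain_percent) / 100u;    /* all arithmetic done at word size */
    if (v > 255u) {
        v = 255u;                     /* clamp instead of silently wrapping */
    }
    level = (uint8_t)v;               /* truncated once, at the final byte store */
}
```

Because the intermediate value lives in a 32-bit local, the product can exceed 255 without being corrupted before the clamp is applied.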

So remember, small isn’t always beautiful!

  • Sean Ellis
    Sean Ellis over 11 years ago

    Another thing to note is that by using an ADDS instruction, the ARM code sequence sets the flags correctly for the 32-bit value as a whole. The longer 8051 sequence will also do so (assuming I'm remembering my 8051 assembly correctly), but the shorter sequence using INC instructions does not, as the INC instruction does not affect the carry or overflow flags.

  • Chris Shore
    Chris Shore over 11 years ago

    Again, thanks for the comment. I think the 8051 is often the obvious comparator as it is very widely used and many engineers have used it a lot and understand it well.

    You are right to point out the much longer code for incrementing a 32-bit value. This exposes one of the major advantages of the ARM microcontroller cores: they are designed as native 32-bit machines which process 32-bit values very well. Processors such as the 8051, with a smaller natural word size, are much less efficient by comparison.

  • Chris Shore
    Chris Shore over 11 years ago

    Thanks for the comment. I think I am in the clear as I am writing from the point of view of the ARM compiler which assumes char to be unsigned by default. Still, to be absolutely clear, I have edited the document and made the declaration explicitly unsigned.

  • 42Bastian
    42Bastian over 11 years ago

    Your assembler code for int increment(char a); is wrong if you assume char to be signed (as is the following example).

    So instead of the AND you need to sign-extend r0.

  • 42Bastian
    42Bastian over 11 years ago

    I always wonder why the 8051 is always the one used for comparison against ARM. Anyway, the 8051 code for the increment function is really poor.

    Check this:

    increment:
            inc r7
            jnz exit
            inc r6
            jnz exit
            inc r5
            jnz exit
            inc r4
    exit:
            ret