Size matters...

Chris Shore
November 27, 2013


How many instructions does it take to increment a number? One? Two? It is not as simple a question as you might imagine. And, believe me, this is one case where size matters.

On an ARM processor, any ARM processor, you can increment a 32-bit number in a single instruction, usually taking one cycle. It really is that simple. This is because ARM processors are 32-bit devices, with 32-bit registers, a 32-bit ALU and 32-bit internal data paths. They are good at doing 32-bit operations efficiently - because that is what they are designed to do.

Look at the C routine below and its corresponding assembly code. You can see that the increment operation translates to a single instruction.

C Code

    int increment(int a)
    {
        a = a + 1;
        return a;
    }

ARM Assembly Code

    increment
        ADD r0, r0, #1
        BX  lr

But not every processor is an ARM processor. An 8051, for instance, has a natural data size of 8 bits. It has 8-bit registers and its ALU carries out 8-bit operations. So, what might it take to increment a 32-bit variable on an 8051? Try this:

C Code

    long increment(long a)
    {
        a = a + 1;
        return a;
    }

8051 Assembly Code

    ; a assigned to R4:R5:R6:R7

        MOV  A, R7
        ADD  A, #01h
        MOV  R7, A
        CLR  A
        ADDC A, R6
        MOV  R6, A
        CLR  A
        ADDC A, R5
        MOV  R5, A
        CLR  A
        ADDC A, R4
        MOV  R4, A
        RET
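The carry chain in this listing can be modelled in C. The sketch below is only an illustration, not compiler output: `increment_bytewise` is a hypothetical helper that splits a 32-bit value into four bytes and adds them one at a time, passing the carry along as the ADD/ADDC sequence does. It assumes the "+1" can be treated as a carry into the lowest byte, which gives the same result as the `ADD A, #01h` followed by three `ADDC` instructions.

```c
#include <stdint.h>

/* Hypothetical helper: increments a 32-bit value one byte at a time,
 * mimicking the work an 8-bit ALU has to do. */
uint32_t increment_bytewise(uint32_t a)
{
    uint8_t  bytes[4];
    unsigned carry  = 1;   /* the "+1" enters as a carry into the lowest byte */
    uint32_t result = 0;
    int      i;

    /* Split the value into bytes; bytes[0] is the least significant. */
    for (i = 0; i < 4; i++) {
        bytes[i] = (uint8_t)(a >> (8 * i));
    }

    /* Four separate 8-bit additions, each consuming the previous carry. */
    for (i = 0; i < 4; i++) {
        unsigned sum = (unsigned)bytes[i] + carry;
        bytes[i] = (uint8_t)sum;   /* keep the low 8 bits */
        carry    = sum >> 8;       /* carry out: 0 or 1 */
    }

    /* Reassemble the 32-bit result; a final carry out is discarded. */
    for (i = 0; i < 4; i++) {
        result |= (uint32_t)bytes[i] << (8 * i);
    }
    return result;
}
```

Calling `increment_bytewise(0x00FFFFFF)` makes the carry ripple through the three low bytes before the top byte finally changes, which is exactly the case the ADDC instructions exist to handle.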

It is clear that 32-bit addition on an 8051 is much harder and much more time-consuming than a simple 8-bit addition. Since the ALU can only handle 8 bits at a time, four separate additions are required to propagate any carry across four separate parts of the result. On an ARM processor, the reverse is true. Here is an example of incrementing an 8-bit variable.

C Code

    int increment(unsigned char a)
    {
        a = a + 1;
        return a;
    }

ARM Assembly Code

    increment
        ADD r0, r0, #1
        AND r0, r0, #0xFF
        BX  lr

Although it may not seem much of an overhead, the compiler has to insert an extra instruction (the AND) to discard any overflow and restrict the 32-bit result to the range of the declared 8-bit variable. The same would be true when using a 16-bit variable.
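The same effect can be written out explicitly in C. This is a minimal sketch with hypothetical function names (`increment_u8`, `increment_u32`); it spells out the widening and masking that the C language implies for the 8-bit case, and is not a statement about what any particular compiler emits.

```c
#include <stdint.h>

/* 8-bit increment: the addition happens at word width, then the result
 * must be cut back down to 8 bits. */
uint8_t increment_u8(uint8_t a)
{
    uint32_t wide = (uint32_t)a + 1u;   /* corresponds to the 32-bit ADD */
    return (uint8_t)(wide & 0xFFu);     /* corresponds to the AND with 0xFF */
}

/* 32-bit increment: the addition is the whole job, no masking needed. */
uint32_t increment_u32(uint32_t a)
{
    return a + 1u;
}
```

With an input of 255, `increment_u8` wraps to 0 because of the mask, while `increment_u32` simply returns 256.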

So, when moving from other, “smaller” architectures to ARM, a change in mindset is necessary. Choosing the smallest possible container for a variable is no longer the right decision. Instead, 32-bit variables should be the default, as they are the most efficient for arithmetic.

Store small, process large

But, in many applications, storage space is at a premium. That means you may still want to choose the smallest viable size for a particular variable so that it takes up the least possible space in memory. That can be an efficient choice on ARM too. But you should still process items at the natural size of the core, i.e. 32-bit words. ARM processors have byte- and halfword-sized load and store instructions, which make it very easy to do the conversion as you transfer values into and out of registers. Here is an example of incrementing an 8-bit variable held in memory.

C Code

    unsigned char a;
    void increment_a(void)
    {
        a = a + 1;
    }

ARM Assembly Code

    increment_a
        LDR  r0, =&a
        LDRB r1, [r0]
        ADD  r1, r1, #1
        STRB r1, [r0]
        BX   lr

(Yes, I know that the first statement isn’t legal assembler but you can see what it means!)

Here, the LDRB instruction zero-extends the 8-bit value as it loads it into a 32-bit register, and the STRB instruction stores only the low 8 bits, truncating the result. This takes care of the size adjustment and it is almost free: there may be an additional cycle of latency on the load instruction on some cores. Of course, if you want to do some more complex processing on an 8-bit variable, then it might be necessary to copy it into a word-sized local variable after loading it. It can then be processed at the natural word size and only truncated again when it is finally written back to memory.
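As a rough sketch of that pattern, the C below keeps an 8-bit total in memory but does all the arithmetic in a 32-bit local. The names `stored_total` and `add_samples` are hypothetical, chosen only for illustration. Because unsigned arithmetic wraps, truncating once at the end gives the same 8-bit result as truncating after every addition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-bit variable held in memory ("store small"). */
uint8_t stored_total;

/* Adds a run of byte samples to stored_total ("process large"). */
void add_samples(const uint8_t *samples, size_t count)
{
    uint32_t total = stored_total;   /* one byte load, widened to a word */
    size_t   i;

    for (i = 0; i < count; i++) {
        total += samples[i];         /* word-sized arithmetic, no masking */
    }

    stored_total = (uint8_t)total;   /* one byte store, truncated to 8 bits */
}
```

The working copy lives in a word-sized variable for the whole loop, so the only size conversions happen at the single load and the single store.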

So remember, small isn’t always beautiful!


Comment
  • 42Bastian (over 9 years ago)

    I always wonder why always 8051 is used to compare against ARM. Anyway the 8051 code for the increment functions is really poor.

    Check this:

    increment:
            inc r7
            jnz exit
            inc r6
            jnz exit
            inc r5
            jnz exit
            inc r4
    exit:
            ret