How do I do <x> in assembler?

December 19, 2013


One of the most common classes of question I answer in the ARM community is "How do I do <x> in ARM assembler?". While I am always happy to answer these questions, I thought it would be useful to write a blog post containing a few handy self-help tips.

The crux of the technique I use is not to write the assembler at all the first time. It is very easy to get wrong, the syntax is fiddly, and not even engineers employed by ARM memorize the whole Architecture Reference Manual. Instead, cheat: use C, a compiler, and a disassembler as your teacher. It is easier to demonstrate the technique with an example, so I'll pick on a question which was asked recently in the ARM Processors place.

The prerequisites for following the example yourself are simply an ARM build of a GCC cross-compiler toolchain (including the binutils objdump disassembler) and a text editor.

Question: How do I perform 64-bit value comparison in ARM instructions?

The steps we will follow are really quite simple:

1. Write a C example in a file called main.c.
2. Compile it with the command line below*/**:
   - arm-eabi-gcc -c -O1 main.c -o main.o
3. Disassemble it to investigate what the compiler did, with the command line below*/**:
   - arm-eabi-objdump -d main.o

However, there are a few tips for writing the example C file which will make your life a lot easier.

* Your GCC build may have a different prefix than "arm-eabi-", so adjust the commands accordingly.

** All other ARM toolchains I have used have an equivalent set of command lines; for ARM DS-5 you can use armcc and fromelf instead of gcc and objdump.

Worked Example

The first step is to write a C example of the behaviour you want to understand. I quite like making my code snippets executable, as it means I can run them with some test code to make sure they are doing what I expect. One common mistake at this point is to put the code under test inside the main function, for example:

    int main( void ) {
        long long a = 1LL;
        long long b = 2LL;
        if( a < b )  {
            return 0;
        }
        return 1;
    }

If you compile this with -O0, the disassembly tends to contain a lot of additional complexity. Some of it comes from the main function itself, and because the compiler is not optimizing, the output also keeps many unneeded instructions in the code sequence (here, each value is stored to the stack and immediately reloaded):

    00000000 <main>:
      0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
      4: e28db000 add fp, sp, #0
      8: e24dd014 sub sp, sp, #20
      c: e3a02001 mov r2, #1
     10: e3a03000 mov r3, #0
     14: e14b20fc strd r2, [fp, #-12]
     18: e3a02002 mov r2, #2
     1c: e3a03000 mov r3, #0
     20: e14b21f4 strd r2, [fp, #-20] ; 0xffffffec
     24: e14b00dc ldrd r0, [fp, #-12]
     28: e14b21d4 ldrd r2, [fp, #-20] ; 0xffffffec
     2c: e1500002 cmp r0, r2
     30: e0d1c003 sbcs ip, r1, r3
     34: aa000001 bge 40 <main+0x40>
     38: e3a03000 mov r3, #0
     3c: ea000000 b 44 <main+0x44>
     40: e3a03001 mov r3, #1
     44: e1a00003 mov r0, r3
     48: e28bd000 add sp, fp, #0
     4c: e8bd0800 ldmfd sp!, {fp}
     50: e12fff1e bx lr

If you are just starting out on the path of learning the ARM ISA, all of this extra code makes even finding the instructions for the behaviour you wanted to learn about more difficult than it needs to be. To remove the additional instructions, you might then try to be clever and compile with -O1, which gives:

    00000000 <main>:
       0: e3a00000 mov r0, #0
       4: e12fff1e bx lr

The compiler has determined that the inputs to our test sequence are compile-time constants, evaluated the comparison itself, and optimized out the whole thing, which is also not particularly useful for our current purpose!

The solution is to create a small, non-static, stand-alone function which encapsulates the functionality you want to investigate. Separating it out from main ensures that the function appears by itself in the disassembly listing, which generally makes it much easier to understand because it is not polluted by the pre/post-amble of main. Making it non-static stops the compiler from deciding that its only use is trivial and optimizing the code away, because another object file may import the symbol and use the function in a less trivial way. This is also why we specify -c: it tells the compiler to compile only, rather than treating this single file as a whole program, which would allow further optimizations.
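For contrast, here is a sketch of what the non-static rule guards against: the same comparison, but with internal linkage. The names compare_static and caller are hypothetical, chosen only for this illustration. Because no other object file can call compare_static, GCC is free to inline it into its only caller (the GCC manual lists -finline-functions-called-once as enabled from -O1) and then fold the constant comparison. If you compile this file with the command line above, don't be surprised if the listing shows no compare_static at all and caller reduced to something like the two-instruction main seen earlier; the exact result depends on your GCC version.

```c
/* Hypothetical counterexample: internal linkage means no other object
 * file can call this, so the compiler may inline it into its single
 * caller and evaluate the comparison at compile time. */
static int compare_static(long long a, long long b)
{
    if (a < b) {
        return 0;
    }
    return 1;
}

/* Hypothetical non-static caller, so that something survives in main.o.
 * With constant inputs, the whole body can fold to "return 0". */
int caller(void)
{
    return compare_static(1LL, 2LL);
}
```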

Applying these rules, the source for the example becomes:

    int compare( long long a, long long b ) {
        if( a < b )  {
            return 0;
        }
        return 1;
    }


    int main( void ) {
        long long a = 1LL;
        long long b = 2LL;
        return compare( a, b );
    }

... and when compiled with -O1 and disassembled, you should get something like the following (the exact instructions can vary between GCC versions):

    00000000 <compare>:
       0: e1500002 cmp r0, r2
       4: e0d1c003 sbcs ip, r1, r3
       8: b3a00000 movlt r0, #0
       c: a3a00001 movge r0, #1
      10: e12fff1e bx lr

    00000014 <main>:
      14: e92d4008 push {r3, lr}
      18: e3a00001 mov r0, #1
      1c: e3a01000 mov r1, #0
      20: e3a02002 mov r2, #2
      24: e3a03000 mov r3, #0
      28: ebfffffe bl 0 <compare>
      2c: e8bd8008 pop {r3, pc}

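Suddenly (I hope) it becomes very clear which part of the assembler is the small test function, and the simplified view makes this a really useful diagnostic tool when first learning the ISA. It also answers the original question: a signed 64-bit comparison is a cmp of the low words (r0 against r2) followed by an sbcs of the high words (r1 against r3). After that pair, the N and V flags reflect the signed result of the full 64-bit subtraction, which is exactly what the lt and ge conditions on movlt and movge test.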
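If the flag logic in that listing is not obvious, it can help to model it in C. The sketch below is a hypothetical illustration, not compiler output: model_lt and compare_model are made-up names. It assumes the register assignment visible in the listing for main (mov r0, #1 / mov r1, #0 for a = 1), meaning a arrives with its low word in r0 and its high word in r1, and b likewise in r2 and r3. It follows the ARM convention that after a subtraction the C flag means "no borrow", and it computes only N and V, because the Z flag left by sbcs describes the high words alone.

```c
#include <stdint.h>

/* Hypothetical model of the two-instruction sequence:
 *   cmp  r0, r2      @ low words:  r0 - r2, C = 1 if no borrow
 *   sbcs ip, r1, r3  @ high words: r1 - r3 - (1 - C), sets N and V
 * Returns 1 if the LT condition (N != V) holds afterwards. */
int model_lt(int64_t a, int64_t b)
{
    uint32_t a_lo = (uint32_t)(uint64_t)a;          /* r0 */
    uint32_t a_hi = (uint32_t)((uint64_t)a >> 32);  /* r1 */
    uint32_t b_lo = (uint32_t)(uint64_t)b;          /* r2 */
    uint32_t b_hi = (uint32_t)((uint64_t)b >> 32);  /* r3 */

    /* cmp r0, r2: ARM sets C when the unsigned subtraction does not borrow. */
    uint32_t c = (a_lo >= b_lo) ? 1u : 0u;

    /* sbcs ip, r1, r3: the value written to ip is never used, only the flags. */
    uint32_t ip = a_hi - b_hi - (1u - c);

    uint32_t n = ip >> 31;                             /* N: sign bit of result */
    uint32_t v = ((a_hi ^ b_hi) & (a_hi ^ ip)) >> 31;  /* V: signed overflow    */

    return n != v;                                     /* LT means N != V       */
}

/* Mirrors the movlt r0, #0 / movge r0, #1 pair, i.e. what compare() returns. */
int compare_model(int64_t a, int64_t b)
{
    return model_lt(a, b) ? 0 : 1;
}
```

Checking compare_model against the C expression (a < b) ? 0 : 1 for a handful of values, including pairs whose low words borrow (such as 0xFFFFFFFF against 0x100000000), is a quick way to convince yourself that the two-instruction sequence really is a full 64-bit signed comparison.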

One additional "handy hint" ...

The GNU tools will by default use their own "special names" for registers, which can be an unnecessary source of confusion when first learning the ISA, as some of these names do not exist in the ARM documentation. In the example above we see three special names: ip, lr and pc. For most developers lr and pc are well understood (the link register and the program counter are fairly generic concepts which exist in the ARM documentation too), but ip is not mentioned in the ARM architecture documentation: it is r12, which the ABI uses as a scratch register.

If you want to hide the GCC-specific special names, and stick to the ones in the ARM ISA documentation, you can add an additional flag to the objdump command line.

    arm-eabi-objdump -d -Mreg-names-std main.o
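To see which names that flag actually changes, the sketch below spells out the mapping as a small C lookup function. std_reg_name is a hypothetical helper, and the table reflects my understanding of the objdump ARM naming sets rather than a quote from the binutils sources: the default set prints r10 to r12 as sl, fp and ip, while the standard set prints them as r10 to r12. Both sets print r13 to r15 as sp, lr and pc, so those names, like r0 to r9, are passed through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a register name from objdump's default ARM
 * naming to the name this sketch assumes -Mreg-names-std would print.
 * Names common to both sets (r0-r9, sp, lr, pc) are returned unchanged. */
const char *std_reg_name(const char *gcc_name)
{
    static const struct {
        const char *gcc;
        const char *std;
    } map[] = {
        { "sl", "r10" },  /* "stack limit" in the older APCS */
        { "fp", "r11" },  /* frame pointer */
        { "ip", "r12" },  /* scratch register in the ABI */
    };

    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        if (strcmp(gcc_name, map[i].gcc) == 0) {
            return map[i].std;
        }
    }
    return gcc_name;
}
```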

I hope someone finds this useful - it's still a trick I use today even after programming on ARM for over 12 years ...

Pete

frequencydrive over 6 years ago

Oh man, that is nice. I have PDF'ed this and saved it. I'm taking an embedded systems course in engineering school. I have found this technique useful myself, but I don't use GNU GCC; I'm using Keil and look at the disassembly of my C code all the time. I like the advice about moving the code you are interested in out of main, because the compiled main function can obfuscate the code you are interested in. That is good advice that I am going to use from now on.
