Porting to ARM 64-bit

This white paper is an introduction to porting existing code to the A64 instruction set supported by ARMv8-A processors such as the Cortex-A53 and Cortex-A57 from ARM. It will also be useful for those writing new code for these platforms.

Why 64-bit? It seems that this is a question with many answers! For some, it will be the need to address more than 4GB of memory, for others the need for wider registers and greater accuracy of 64-bit data processing, for still others the attraction of a larger register set.

Whatever your reason for looking to move to 64-bit, it is likely that you will have a body of legacy software which will need porting as well as new code which needs writing. This paper is designed to help with both processes.

We’ll start with a quick look at the evolution of the ARM architecture which has brought 64-bit to reality.

Porting to ARM 64-bit.pdf
  • I have a doubt about the "Structure padding" section of this document.

    "In LP64, it has size 20. The 'long' has increased from 4 bytes to 8 bytes and must now be double-word aligned. This introduces four bytes of padding between the end of the first 'int' and the 'long'."

    If this structure needs to be double-word aligned, then shouldn't its size be 24, not 20? In my opinion, padding will be required after both ints, unless we reorder the structure definition:

    struct foo {
        int a;
        int x;
        long l;
    };

    In the above case, will the size be 16 bytes?

    Could you please comment?

  • Hi,

    Could someone give me an introduction to "C1x/C++11" in the figure on page 2?

  • Hi,

    As you and others have kindly pointed out, there is a minor error here. The structure will also have 4 bytes of padding added at the end, bringing its size to 24, so that every element is correctly aligned when it is declared as an array.

    I will upload a minor update to the document shortly.

    Many thanks for pointing this out. I trust that you found the rest of the document useful.

    Chris
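The padding discussed above can be checked directly on a given toolchain. Below is a minimal sketch, assuming the structure in the paper is declared in the order int, long, int (inferred from the quoted text, not confirmed by the paper). The names `padded`, `reordered` and `print_layout` are made up for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdio.h>

/* Assumed layout of the paper's example: int, long, int. */
struct padded {
    int  a;
    long l;   /* under LP64, preceded by 4 bytes of padding */
    int  x;   /* under LP64, followed by 4 bytes of tail padding */
};

/* Reordered as suggested in the comment: both ints first. */
struct reordered {
    int  a;
    int  x;
    long l;
};

/* The size is always a multiple of the strictest member alignment,
   which is what forces the tail padding. */
static_assert(sizeof(struct padded) % _Alignof(long) == 0,
              "struct size must be a multiple of long's alignment");
static_assert(sizeof(struct reordered) <= sizeof(struct padded),
              "reordering should never make the struct larger");

/* Prints where 'l' lands and how big each struct is on this target. */
void print_layout(void)
{
    printf("sizeof(long) = %zu\n", sizeof(long));
    printf("padded:    offsetof(l) = %zu, sizeof = %zu\n",
           offsetof(struct padded, l), sizeof(struct padded));
    printf("reordered: offsetof(l) = %zu, sizeof = %zu\n",
           offsetof(struct reordered, l), sizeof(struct reordered));
}
```

On an LP64 target with an 8-byte aligned long, `print_layout()` should report sizes of 24 and 16 respectively. On an ILP32 target, both structures should come out at 12 bytes.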

  • Hi,

    Thank you for your comment on the document.

    C1x and C++11 are shorthand names for the latest ISO/IEC standards for C (C11, ISO/IEC 9899:2011) and C++ (ISO/IEC 14882:2011). Among other things, these standards introduce standard capabilities for multi-threaded programming. This includes the requirement for standard implementations of mutexes and other forms of "uninterruptible object access". The Load-Acquire and Store-Release instructions introduced in A64 are intended to comply with this.

    I hope this helps.

    Chris
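To make the link between the C11 memory model and Load-Acquire/Store-Release more concrete, here is a minimal sketch using `<stdatomic.h>`. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical. When building for A64, compilers typically implement the release store and acquire load below with instructions from the STLR/LDAR family, though the exact code generated depends on the compiler and its options.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_bool ready;  /* zero-initialized, so it starts out false */

/* Writer side: write the data, then publish it with release ordering.
   Everything written before the store is visible to a reader that
   observes ready == true with acquire ordering. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: returns true and fills *out once the data is published. */
bool try_consume(int *out)
{
    assert(out);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;      /* nothing published yet */
    *out = payload;        /* safe: ordered after the acquire load */
    return true;
}
```

In real use, `publish` and `try_consume` would run on different threads. The acquire/release pair is what makes reading `payload` without a lock well defined.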

  • Hi Chris,

    I have a doubt about page 10. In the table that shows alternative instructions for PUSH and POP, pushing and popping x0 and x1 in A64 requires 16 bytes of stack, but the example shows 8. Could you please comment on this? Thanks!

  • Thank you for pointing this out. I will work on a minor update and post it shortly.

    Chris

  • Hi Chris,

    Do you have a plan to put this document in the Infocenter? It is useful.

    Thanks,

    Juan

  • This is a very good document. Thank you, Chris for this.

    (I'm using it as a guideline, so I can prepare my coding style for 64-bit, even though I haven't moved to Cortex-A yet. This means I'll perhaps be able to minimize the code rewriting cycle.)

  • I wonder if the section about the various options like ILP32 and LLP64 shouldn't be expanded a bit to be clearer about people's options depending on where they are coming from and what they'll be running on. Plus it could mention big-endian options.

    For instance, ILP32 could be useful for straight porting for speed, but it needs the appropriate libraries. The system structures in include files may not be identical to the ones for A32. Will user structures be exactly the same?

    ILP64 will not be supported at all.

    Some machines may operate in big-endian mode, e.g. when porting from MIPS or PowerPC. Support for one endian mode is very unlikely to be provided on the other except through virtualization.

    Apple do their own thing and reserve an extra register, but otherwise support only LP64 in 64-bit mode.
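The data models mentioned above differ only in the sizes of int, long and pointers, so a build can report which one it uses. This is a minimal sketch. The enum and function names are made up for illustration, and it assumes 8-bit bytes and a 4-byte or 8-byte int.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

static_assert(CHAR_BIT == 8, "the sizes below assume 8-bit bytes");

enum data_model { DM_ILP32, DM_LP64, DM_LLP64, DM_ILP64, DM_UNKNOWN };

/* Classifies a data model from the sizes, in bytes, of int, long and pointers. */
enum data_model data_model(size_t int_sz, size_t long_sz, size_t ptr_sz)
{
    if (int_sz == 4 && long_sz == 4 && ptr_sz == 4) return DM_ILP32;
    if (int_sz == 4 && long_sz == 8 && ptr_sz == 8) return DM_LP64;
    if (int_sz == 4 && long_sz == 4 && ptr_sz == 8) return DM_LLP64;
    if (int_sz == 8 && long_sz == 8 && ptr_sz == 8) return DM_ILP64;
    return DM_UNKNOWN;
}

/* Data model of the build this file is compiled in. */
enum data_model this_build_model(void)
{
    return data_model(sizeof(int), sizeof(long), sizeof(void *));
}

/* Human-readable name, e.g. for a startup log line. */
const char *data_model_name(enum data_model m)
{
    switch (m) {
    case DM_ILP32: return "ILP32";
    case DM_LP64:  return "LP64";
    case DM_LLP64: return "LLP64";
    case DM_ILP64: return "ILP64";
    default:       return "unknown";
    }
}
```

Code that behaves the same whatever `this_build_model()` returns is usually the easiest to port. Using the fixed-width types from `<stdint.h>` in structures shared between builds is one common way to get there.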

  • If writing network code, I would certainly switch to big-endian mode too; then I would need no byte swap in software.
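For code that has to run in either endian mode, one option is to read and write network-order fields byte by byte, so the source does not depend on the host byte order at all. This is a minimal sketch with made-up helper names (`load_be32`, `store_be32`, `host_is_big_endian`). Compilers can often reduce these helpers to a plain load or store, plus a byte reverse on little-endian targets, but that is an optimization and not something the code relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Reads a 32-bit big-endian (network order) value from a byte buffer. */
uint32_t load_be32(const unsigned char *p)
{
    assert(p);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Writes a 32-bit value to a byte buffer in big-endian (network) order. */
void store_be32(unsigned char *p, uint32_t v)
{
    assert(p);
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Returns 1 when the host stores the most significant byte first. */
int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const unsigned char *)&probe == 0x01;
}
```

The same source gives the same results in both modes. Only the amount of work the compiler has to emit changes.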