The ARMv8-A architecture and its ongoing development

ARMv8-A, the ARMv8 A-profile version of the ARM architecture, was first publicly previewed in October 2011. Over the past two years, there has been a growing number of ARMv8-A announcements from ARM, such as its Cortex-A53 and Cortex-A57 products, plus additional cores and end-user devices from licensees and OEMs. Many of these products are in, or entering, volume production today. As reported in the Q3-2014 financial results, ARM has signed 57 ARMv8-A processor and architecture licenses, meaning there are many more ARMv8-A based processors and products under development that will appear over the next 1-2 years.

Architecture evolves with constant requests for additions and refinements. To allow the ARM ecosystem to manage the next stage of its evolution, ARM is introducing a set of small-scale enhancements that are fully backwards compatible with the initial v8.0 architecture, and will be collectively known as ARMv8.1-A. These have been developed in conjunction with the ARM partnership and will start to appear in public specifications, software development tools, models and software support throughout 2015, with early adopter silicon expected in the latter part of 2015. More details will emerge from ARM and its partners as products are introduced. It is important to recognize that introduction of these enhancements into new cores will take several years, and other design choices can have a much greater impact on system performance. Some markets and use cases, such as mobile, are expected to see little benefit from these changes. This means that v8.0 will continue to be the architecture of choice for many new designs and most software development over the medium term, and that v8.1 will have a gradual effect across different market segments, starting with very large systems. Many of the changes will be transparent to the user, with operating systems such as Linux using runtime library selection or kernel patches to adapt where necessary.
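As an illustration of the runtime selection mentioned above, the following is a minimal sketch of how user-space code on AArch64 Linux might check for the ARMv8.1 atomic instructions before choosing an implementation. It assumes the kernel reports the feature through the `HWCAP_ATOMICS` bit of `AT_HWCAP`; the names `cpu_has_v81_atomics`, `select_atomic_impl` and the `atomic_impl` enum are hypothetical and exist only for this example. On any other platform the sketch simply reports the feature as absent.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_ATOMICS */
#endif

/* Returns true when the kernel reports the ARMv8.1 atomic instructions. */
bool cpu_has_v81_atomics(void) {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_ATOMICS)
    return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    /* Not AArch64 Linux, or headers too old to know the bit: assume absent. */
    return false;
#endif
}

/* Hypothetical labels for two builds of the same library routine. */
typedef enum { IMPL_EXCLUSIVES, IMPL_V81_ATOMICS } atomic_impl;

/* Pick the v8.1 build only when the CPU and kernel both support it. */
atomic_impl select_atomic_impl(void) {
    return cpu_has_v81_atomics() ? IMPL_V81_ATOMICS : IMPL_EXCLUSIVES;
}
```

A library would typically call `select_atomic_impl()` once at start-up and cache the result, so that v8.0 cores keep using the load-exclusive/store-exclusive path.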

For a summary of the ARMv8-A architecture, see the section on ARMv8 architectural concepts in Chapter A1 of the ARMv8-A Architecture Reference Manual. This document, ARM DDI 0487, can be downloaded by following the links from the top level => ARM architecture => reference manuals section.

ARMv8.1 overview

The enhancements introduced with ARMv8.1 fall into two categories:

  • Changes to the instruction set.
  • Changes to the exception model and memory translation.

Instruction set enhancements

ARMv8.1 includes the following additions to the A64 instruction set:

  • A set of AArch64 atomic read-write instructions
  • Additions to the Advanced SIMD instruction set for both AArch32 and AArch64 to enable opportunities for some library optimizations:
    • Signed Saturating Rounding Doubling Multiply Accumulate, Returning High Half
    • Signed Saturating Rounding Doubling Multiply Subtract, Returning High Half
    • The instructions are added in vector and scalar forms.
  • A set of AArch64 load and store instructions that can provide memory access order that is limited to configurable address regions.

As well as the additions, the optional CRC instructions in v8.0 become a requirement in ARMv8.1.
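To make the two Advanced SIMD additions more concrete, here is a scalar C model of what one 16-bit lane might compute. It is a sketch based on a plain reading of the instruction names (double the product, add or subtract it from the accumulator, round, keep the high half, saturate) and is not taken from the architectural pseudocode, so the reference manual remains the authority. The function names are hypothetical.

```c
#include <stdint.h>

/* Clamp a wide intermediate to the int16_t range (the "saturating" part). */
static int16_t sat16(int64_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Floor division by 2^16, written so that negative values are never shifted. */
static int64_t floor_shift16(int64_t v) {
    return v >= 0 ? v >> 16 : -((-v + 65535) >> 16);
}

/* Model of one lane of a rounding doubling multiply accumulate, high half. */
int16_t sqrdmlah16_model(int16_t acc, int16_t a, int16_t b) {
    /* Accumulator moved to the high half, plus doubled product, plus rounding. */
    int64_t wide = (int64_t)acc * 65536 + 2 * (int64_t)a * b + 32768;
    return sat16(floor_shift16(wide));
}

/* Model of one lane of a rounding doubling multiply subtract, high half. */
int16_t sqrdmlsh16_model(int16_t acc, int16_t a, int16_t b) {
    int64_t wide = (int64_t)acc * 65536 - 2 * (int64_t)a * b + 32768;
    return sat16(floor_shift16(wide));
}
```

In Q15 fixed-point terms, `sqrdmlah16_model(0, 16384, 16384)` multiplies 0.5 by 0.5 and yields 8192 (0.25), which is the kind of fused step a filter or dot-product library routine could use in place of a separate multiply and add.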

The atomic instructions can be used as an alternative to Load-exclusive/Store-exclusive instructions, for example to ease the implementation of atomic memory updates in very large systems. This could be in a closely coupled cache, sometimes referred to as near atomics, or further out in the memory system as far atomics. The instructions provide atomic update of register content with memory for a range of conditions:

  • Compare and swap of 8-, 16-, 32-, 64- or a pair of 32- or 64-bit registers as a conditional update of a value in memory.
  • ADD, BitClear, ExclusiveOR, BitSet, and signed and unsigned MAXimum or MINimum value data processing operations on 8-, 16-, 32- or 64-bit values in memory. These can occur with or without copying the original value in memory to a register.
  • Swap of an 8-, 16-, 32- or 64-bit value between a register and value in memory.
  • The instructions also include controls associated with influencing the order properties, based on acquire and release semantics.
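The operations listed above correspond closely to the portable C11 `<stdatomic.h>` interface, which is how most application code would reach them. The sketch below is illustrative only: the variables and function names are hypothetical, and whether a compiler emits the new single-instruction atomics or a load-exclusive/store-exclusive loop depends on the target options (for example, GCC and Clang can generally use the v8.1 forms when told to target ARMv8.1-A). C11 has no atomic maximum or minimum, so those operations are not shown.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state, for illustration only. */
static _Atomic uint32_t counter;
static _Atomic uint32_t flags;
static _Atomic uint32_t lock_word;

/* Atomic ADD that also copies the original memory value to a register. */
uint32_t counter_add(uint32_t n) {
    return atomic_fetch_add_explicit(&counter, n, memory_order_relaxed);
}

/* BitSet and BitClear without needing the old value. */
void set_flags(uint32_t mask) {
    atomic_fetch_or_explicit(&flags, mask, memory_order_relaxed);
}
void clear_flags(uint32_t mask) {
    atomic_fetch_and_explicit(&flags, ~mask, memory_order_relaxed);
}
uint32_t read_flags(void) {
    return atomic_load_explicit(&flags, memory_order_relaxed);
}

/* Compare and swap with acquire ordering: take the lock if it is free (0). */
bool try_lock(void) {
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock_word, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Swap with release ordering: free the lock and return the old lock value. */
uint32_t unlock(void) {
    return atomic_exchange_explicit(&lock_word, 0, memory_order_release);
}
```

The `memory_order_acquire` and `memory_order_release` arguments play the role of the acquire and release controls mentioned in the last bullet; the relaxed forms request atomicity without any extra ordering.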

The limited order (LO) support is in two parts:

  • System registers configure one or more memory LORegions with a minimum resolution of 64Kbytes.
  • LoadLOAcquire and StoreLORelease instructions for 8-, 16-, 32- and 64-bit values are added, and can be used instead of the global ARMv8 LoadAcquire and StoreRelease instructions.

Exception Model and Translation System enhancements

Additions associated with the exception and memory model are:

  • A new Privileged Access Never (PAN) state bit. This bit provides control that prevents privileged access to user data unless explicitly enabled; an additional security mechanism against possible software attacks.
  • An increased VMID range for virtualization; supports a larger number of virtual machines.
  • Optional support for hardware update of the page table access flag, and the standardization of an optional, hardware updated, dirty bit mechanism.
  • The Virtualization Host Extensions (VHE). These enhancements improve the performance of Type 2 hypervisors by reducing the software overhead associated with transitioning between the Host and Guest operating systems. The extensions allow the Host OS to execute at EL2, as opposed to EL1, without substantial modification.
  • A mechanism to free up some translation table bits for operating system use, where the hardware support is not needed by the OS.

Finally, some new events are added to the Performance Monitor Unit (PMU) to better support profiling in operating systems such as the perf utility in Linux.


The ARM architecture, in line with other processor architectures, is evolving with time. ARMv8.1 is the first set of changes that ARM is introducing to the latest version of its ARMv8 A-profile architecture, grouped to help the ecosystem manage tools and software support alongside the large numbers of ARMv8-A based processors and products in development or production today. These changes provide incremental benefits over v8.0, and as such, will appear as a gradual migration in cores and related products over several years. It should be noted that other design choices by silicon partners can have a much greater impact than the choice between v8.0 and v8.1, and consequently we expect both to co-exist in the market for many years to come. Public specifications will be supplied to support initial product introductions mid-2015, with some early visibility through tools and software starting now. Partners can currently obtain more details under a confidentiality agreement through their sales and support channels.

David Brash is Architecture Program Director in the Architecture and Technology Group, one of several groups within ARM’s engineering community.

  • Addition of atomic operations in v8.1 is very surprising, considering the strong push to deprecate the "swap" instruction, which has been the only atomic instruction up to now.

    I thought the reasoning behind the load/store exclusive operations is that atomic operations clog the bus (interconnect)?

  • The number of different memory type operations is already getting a bit overwhelming, a really good document at a couple of different levels and for different targets would be good I think.

    As to clogging up the bus or having rather long latency that certainly is a worry with the current memory interface. However if they had a way of sending the operation down the bus so it was done locally that could make the timing much more predictable and cut down on overheads. So my guess is that's what they're thinking of doing. That would help with either real time or very large systems

  • Hi,

    Does DS-5 Ultimate Edition (evaluation) support ARMv8 NEON (Cortex-A57)?

    If it does, how do I select that CPU option in armclang?

  • Joe, would you please be able to comment on DS-5 ARMv8 NEON support?

  • Yes, it supports ARMv8 NEON. I wrote a function using the ARMv8 NEON instructions and it compiles and works.

  • Yes, as peterharris mentioned in this thread on the Software Development Tools group, ARMv8 NEON doesn't need separate compiler options and is supported in DS-5 Development Studio Ultimate Edition. zhangzheng posted some resources for how to specify CPU selection on that thread too.

  • Hi, I need to convert 32-bit code to 64-bit. While doing that I found that many of the instructions supported in ARMv7 are not supported in ARMv8, for example RSB, SASX, SSAX, SADD16 and SADD. What are the alternative instructions in ARMv8?

  • I see this question has been raised separately at "why SASX,SSAX,SADD16,SADD are not supported in ARMv8?"

    That's probably a good thing, I'll move my answer to there.

  • I see ARM have now released the ARMv8.1 Reference Manual Supplement, giving details about these extensions.

    I note the ARMv8 reference manual has also gone to issue A.j and now includes a note about a future change to the memory model which won't affect ARMv8 but will be put into the manual - so presumably will be to fix up how ARMv8.1 works. A lot of ARMv8.1 is about accessing memory in different ways - the statement "The ST<OP> instructions are not regarded as doing a read for the purpose of a DMB LD barrier." in the section about atomic instructions for instance is interesting.