Armv8-A architecture evolution

Armv8-A adoption continues to grow as the demand for 64-bit computing gathers momentum. As reported in the Q3-2015 financial results, Arm has now signed a cumulative total of 81 Armv8-A processor and architecture licenses, an increase of 24 licenses in the last year. Alongside this licensing ramp is a steady stream of Armv8-A based processors and other Arm products in, or entering, production, with many more under development that will appear over the coming years.

Armv8-A brings benefits beyond 64-bit support and the introduction of AArch64. For example, it improves the memory model to match the requirements of data-race-free programming, as supported by the latest releases of languages such as C11 and C++11. These improvements apply to both AArch64 and AArch32 execution.
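As a minimal sketch of the data-race-free style this supports, the C11 message-passing pattern below publishes data with a release store and consumes it with an acquire load. On AArch64, compilers such as GCC and Clang typically map these operations to the Armv8-A store-release and load-acquire instructions (STLR and LDAR) rather than to separate barriers, although the exact instruction selection is compiler-dependent. The sketch assumes a toolchain with `<stdatomic.h>` and POSIX threads; the names `producer`, `consumer`, `payload`, `ready` and `run_message_passing` are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* plain, non-atomic data */
static atomic_int ready;   /* flag that publishes payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                            /* 1: write the data */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* 2: publish it */
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    /* Spin until the flag is observed with acquire semantics. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    *out = payload;  /* ordered after the acquire load: reads 42, no data race */
    return NULL;
}

/* Runs one producer and one consumer; returns the value the consumer read. */
int run_message_passing(void)
{
    pthread_t p, c;
    int seen = 0;

    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

The release/acquire pair is what makes the plain write to `payload` visible to the consumer without any explicit fence in the source.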

The Armv8-A architecture continues to evolve. Last year, Arm introduced a set of small-scale enhancements collectively known as Armv8.1-A. Arm is now announcing a second set of enhancements under the architecture name Armv8.2-A. Developed together with Arm's partners, these will start appearing in tools and models immediately, with product introductions to follow. Armv8.2-A builds incrementally on Armv8.1-A, and both are backwards compatible with current and emerging Armv8.0-A products.

For any changes associated with Armv8.x-A, it is important to recognize that these features will take several years to appear in new cores, and that other design choices can have a much greater impact on system performance. Some features are market specific, reflecting the growing importance of the Arm architecture in areas such as infrastructure markets. Arm expects the v8.x-A variants to co-exist in the market, and many new products will continue to be developed using v8.0-A for some years to come. Many changes are transparent at user level, with operating systems such as Linux using runtime library selection or kernel patches to adapt where necessary. All changes have been developed in a backwards compatible manner, so all existing software continues to be supported.
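Runtime library selection of this kind can be sketched in C for Linux on AArch64: read the hardware capability bits from the auxiliary vector with `getauxval(AT_HWCAP)` and pick an implementation. The bit positions below are assumed to mirror the Linux arm64 `<asm/hwcap.h>` values (`HWCAP_ATOMICS` for the Armv8.1-A atomics, `HWCAP_FPHP` and `HWCAP_ASIMDHP` for Armv8.2-A half-precision); check them against your kernel headers. They are redefined locally with a `MY_` prefix so the sketch compiles anywhere, and the `impl_kind` values are hypothetical. On other platforms the sketch reports no optional features, so the Armv8.0-A baseline is chosen.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#endif

/* Assumed to match Linux arm64 <asm/hwcap.h>. */
#define MY_HWCAP_ATOMICS  (1UL << 8)   /* Armv8.1-A LSE atomics */
#define MY_HWCAP_FPHP     (1UL << 9)   /* Armv8.2-A scalar half-precision */
#define MY_HWCAP_ASIMDHP  (1UL << 10)  /* Armv8.2-A Advanced SIMD half-precision */

enum impl_kind { IMPL_V80_BASELINE, IMPL_V81_ATOMICS, IMPL_V82_FP16 };

/* Pure decision function: choose the most capable implementation
   that the capability mask allows. */
enum impl_kind select_impl(unsigned long hwcap)
{
    if ((hwcap & MY_HWCAP_FPHP) && (hwcap & MY_HWCAP_ASIMDHP))
        return IMPL_V82_FP16;
    if (hwcap & MY_HWCAP_ATOMICS)
        return IMPL_V81_ATOMICS;
    return IMPL_V80_BASELINE;
}

/* Query the running system; 0 means "no optional features known". */
unsigned long read_hwcap(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}
```

A library would typically call `select_impl(read_hwcap())` once at start-up and store a function pointer, so the same binary runs on v8.0-A, v8.1-A and v8.2-A parts.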

Armv8-A application profile

Armv8.2 overview

The enhancements introduced with Armv8.2 fall into four categories:

  • Half-precision floating point data processing.
  • Memory model enhancements.
  • Introduction of RAS support.
  • Introduction of statistical profiling.

Half-precision floating point data processing

Half-precision floating point data processing, using the IEEE 754-2008 format, is added to Armv8.2-A. It is optional, and complements the existing half-precision storage format supported by all current floating point implementations. There is increasing interest in half-precision data processing for graphics and pixel manipulation, because the format offers a large dynamic range for its 16-bit size. Machine learning research is another area where it is receiving attention.

Half-precision data processing instructions are added for both the AArch64 and AArch32 execution states, covering both scalar and Advanced SIMD floating point. The instructions provide the same set of data processing operations as currently exists for single precision.
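The IEEE 754-2008 half-precision (binary16) format packs 1 sign bit, 5 exponent bits with a bias of 15, and 10 fraction bits. The portable C sketch below decodes a binary16 bit pattern to `float` in software to make that layout concrete; it illustrates the format only, not how the Armv8.2-A instructions are implemented, and the names `half_to_float` and `pow2i` are hypothetical.

```c
#include <math.h>
#include <stdint.h>

/* Exact 2^e for the small exponents used here (no libm needed). */
static float pow2i(int e)
{
    float r = 1.0f;
    while (e > 0) { r *= 2.0f; e--; }
    while (e < 0) { r *= 0.5f; e++; }
    return r;
}

/* Decode an IEEE 754-2008 binary16 bit pattern to float.
   Layout: bit 15 sign, bits 14..10 exponent (bias 15), bits 9..0 fraction. */
float half_to_float(uint16_t h)
{
    int sign = (h >> 15) & 0x1;
    int exp  = (h >> 10) & 0x1F;
    int frac = h & 0x3FF;
    float value;

    if (exp == 0) {
        /* Zero or subnormal: frac * 2^-24 */
        value = (float)frac * pow2i(-24);
    } else if (exp == 0x1F) {
        /* All-ones exponent: infinity if frac == 0, otherwise NaN */
        value = frac ? NAN : INFINITY;
    } else {
        /* Normal: (1024 + frac) * 2^(exp - 15 - 10) */
        value = (float)(0x400 + frac) * pow2i(exp - 25);
    }
    return sign ? -value : value;
}
```

For example, `0x3C00` decodes to 1.0 and `0x7BFF` to 65504, the largest finite half-precision value.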

Memory model enhancements

Enhancements to the memory model can be summarized as follows. The enhancements apply to AArch64 and AArch32 unless otherwise stated.

For AArch64 execution with the 64KB translation granule, Armv8.2-A optionally increases the address space from 48 to 52 bits. This applies to all forms of address: virtual (VA), intermediate physical (IPA) and physical (PA). A level 1 block is 4TB in size, and such blocks can only be used individually; contiguous 4TB blocks are not supported.
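The 4TB figure follows from the 64KB granule geometry. A 64KB page gives a 16-bit page offset, and a 64KB table holds 8192 eight-byte descriptors, so each lookup level resolves 13 address bits. A minimal C sketch of that arithmetic, assuming 8-byte descriptors (the helper names are illustrative):

```c
#include <stdint.h>

#define GRANULE_SHIFT   16  /* 64KB granule: 16-bit page offset */
#define BITS_PER_LEVEL  13  /* 64KB / 8-byte descriptors = 8192 = 2^13 entries */

/* Low address bits covered by one entry at a lookup level
   (level 3 maps pages; levels 2 and 1 can map blocks). */
unsigned level_shift(unsigned level)
{
    return GRANULE_SHIFT + BITS_PER_LEVEL * (3 - level);
}

/* Bytes mapped by one entry at the given level. */
uint64_t entry_size(unsigned level)
{
    return UINT64_C(1) << level_shift(level);
}

/* Index bits resolved at level 1 for a given VA width. */
unsigned level1_index_bits(unsigned va_bits)
{
    return va_bits - level_shift(1);
}
```

This gives 64KB pages at level 3, 512MB blocks at level 2 and 4TB (2^42 byte) blocks at level 1. Widening the VA from 48 to 52 bits grows the level 1 index from 6 to 10 bits, that is, from 64 to 1024 entries.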

Prior to Armv8.2-A, all cache maintenance operations in a given implementation were defined relative to either the point of unification or the point of coherency in the memory hierarchy. To support advances in non-volatile memory and their effect on memory hierarchy design, Armv8.2-A introduces a new cache clean operation to the point of persistency:

                DC CVAP, Xt      // clean virtual address to the point of persistency, AArch64 only
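As a minimal sketch, assuming AArch64 code running where DC CVAP is permitted (for example Linux user space when the kernel reports `HWCAP_DCPOP`), a buffer could be cleaned to the point of persistency one cache line at a time, followed by a DSB so that the cleans complete before execution continues. The line size comes from CTR_EL0.DminLine, and the instruction is written as its `SYS 3, C7, C12, 1` alias so that assemblers without Armv8.2-A support accept it. The helper names (`lines_in_range`, `dcache_line_size`, `persist_range`) are hypothetical, and on non-AArch64 targets `persist_range` compiles to a no-op.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines touched by [addr, addr + len); line is a power of two. */
size_t lines_in_range(uintptr_t addr, size_t len, size_t line)
{
    if (len == 0)
        return 0;
    uintptr_t start = addr & ~(uintptr_t)(line - 1);
    uintptr_t end = addr + len;
    return (size_t)((end - start + line - 1) / line);
}

#if defined(__aarch64__)
/* Smallest data cache line in bytes: CTR_EL0.DminLine (bits [19:16])
   is log2 of the line size in 4-byte words. */
static size_t dcache_line_size(void)
{
    uint64_t ctr;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    return (size_t)4 << ((ctr >> 16) & 0xF);
}
#endif

/* Clean each line of the buffer to the point of persistency, then DSB.
   Callers should confirm support first (e.g. HWCAP_DCPOP on Linux). */
void persist_range(const void *buf, size_t len)
{
#if defined(__aarch64__)
    if (len == 0)
        return;
    size_t line = dcache_line_size();
    uintptr_t p = (uintptr_t)buf & ~(uintptr_t)(line - 1);
    uintptr_t end = (uintptr_t)buf + len;
    for (; p < end; p += line)
        /* DC CVAP, Xt written via its SYS alias */
        __asm__ volatile("sys 3, c7, c12, 1, %0" : : "r"(p) : "memory");
    __asm__ volatile("dsb sy" : : : "memory");
#else
    (void)buf;
    (void)len;
#endif
}
```

Aligning the start address down to a line boundary matters: an unaligned 64-byte buffer straddles two lines, and both must be cleaned.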

Armv8.1-A introduced the Privileged Access Never (PAN) state bit, PSTATE.PAN. Armv8.2-A adds two address translation ‘P’ operations that take PSTATE.PAN into account. They complement the existing operations as follows:

  • AT S1E1R, Xt    // Stage 1 address translation, memory is readable at EL1, PAN bit ignored
  • AT S1E1W, Xt    // Stage 1 address translation, memory is writeable at EL1, PAN bit ignored
  • AT S1E1RP, Xt   // Stage 1 address translation, memory is readable at EL1, PAN bit applied
  • AT S1E1WP, Xt   // Stage 1 address translation, memory is writeable at EL1, PAN bit applied

Taking PSTATE.PAN into account allows a privileged device driver to identify attempted violations: when PSTATE.PAN is 1 and the address is accessible from EL0, AT S1E1RP and AT S1E1WP report a permission fault in PAR_EL1.
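An AT instruction does not take an exception; it writes its result to PAR_EL1, where bit 0 (F) indicates that the translation failed and bits [6:1] (FST) hold a fault status code using the same encoding as data abort status codes, with permission faults encoded as 0b0011LL (LL being the lookup level). A minimal C sketch of the check a driver might apply to the PAR_EL1 value after AT S1E1RP or AT S1E1WP; the helper name is hypothetical, and the AT and MRS sequence itself, which must run at EL1, is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAR_F          (UINT64_C(1) << 0)  /* 1 = translation aborted */
#define PAR_FST_SHIFT  1
#define PAR_FST_MASK   UINT64_C(0x3F)      /* PAR_EL1[6:1] when F == 1 */

/* True if a PAR_EL1 value reports a permission fault at any lookup level. */
bool par_is_permission_fault(uint64_t par)
{
    if (!(par & PAR_F))
        return false;  /* translation succeeded: PAR holds an address */
    unsigned fst = (unsigned)((par >> PAR_FST_SHIFT) & PAR_FST_MASK);
    return (fst & 0x3C) == 0x0C;  /* 0b0011LL */
}
```

Comparing this result for the PAN-aware and PAN-ignoring forms of the same access tells the driver whether a fault is caused specifically by PAN.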

Armv8.2-A adds a User Access Override state bit, PSTATE.UAO. When it is set, the user-access instructions LDTR* and STTR* behave as LDR and STR, so their accesses are checked against the privileged permissions rather than the EL0 permissions. This allows a kernel to use the same library routines built on these instructions whether or not a user-access restriction should apply.

Optional configuration control bits are added to the TCR_ELx registers that make page table entry bits PTE[62:59] IMPLEMENTATION DEFINED.
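Software that walks or updates page tables on such a system may need to preserve or inspect those four bits. A minimal C sketch of the field access, with hypothetical helper names; what the bits mean is IMPLEMENTATION DEFINED, so the sketch only moves them around.

```c
#include <stdint.h>

#define PTE_IMPDEF_SHIFT 59
#define PTE_IMPDEF_MASK  (UINT64_C(0xF) << PTE_IMPDEF_SHIFT)  /* PTE[62:59] */

/* Extract the 4-bit IMPLEMENTATION DEFINED field PTE[62:59]. */
unsigned pte_impdef_get(uint64_t pte)
{
    return (unsigned)((pte & PTE_IMPDEF_MASK) >> PTE_IMPDEF_SHIFT);
}

/* Return pte with PTE[62:59] replaced by the low 4 bits of val,
   leaving every other descriptor bit unchanged. */
uint64_t pte_impdef_set(uint64_t pte, unsigned val)
{
    return (pte & ~PTE_IMPDEF_MASK) |
           (((uint64_t)val & 0xF) << PTE_IMPDEF_SHIFT);
}
```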

The stage 2 eXecute-Never (XN) bit is extended to a 2-bit field, supporting all combinations of EL1 and EL0 execution permission:

  • Both EL1 and EL0 execution are permitted
  • EL1 execution is permitted, EL0 execution is not permitted
  • EL0 execution is permitted, EL1 execution is not permitted
  • Neither EL1 nor EL0 execution is permitted
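One way to picture the 2-bit field is as a decode from stage 2 descriptor bits [54:53] to the permitted Exception levels. The encoding used below (0b00 both, 0b01 EL0 only, 0b10 neither, 0b11 EL1 only) is our reading of the Arm ARM, in which 0b00 and 0b10 keep the meaning of the original single XN bit at bit 54; treat it as an assumption and check it against the reference manual. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct s2_exec { bool el1; bool el0; };

/* Decode stage 2 XN[1:0] (descriptor bits [54:53]) into execute permissions.
   Assumed encoding: 0b00 EL1+EL0, 0b01 EL0 only, 0b10 neither, 0b11 EL1 only. */
struct s2_exec s2_xn_decode(uint64_t desc)
{
    unsigned xn = (unsigned)((desc >> 53) & 0x3);
    struct s2_exec r;

    switch (xn) {
    case 0x0: r.el1 = true;  r.el0 = true;  break;
    case 0x1: r.el1 = false; r.el0 = true;  break;
    case 0x2: r.el1 = false; r.el0 = false; break;
    default:  r.el1 = true;  r.el0 = false; break;  /* 0x3 */
    }
    return r;
}
```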

A Common not Private (CnP) bit is added to the translation table base registers, permitting TLB entries to be shared between threads in a multithreaded implementation. It is supported in AArch64, and in AArch32 only with the long-descriptor translation table format.
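In AArch64, CnP is bit 0 of registers such as TTBR0_EL1, alongside the ASID in bits [63:48] and the table base address in bits [47:1]. A minimal sketch of composing such a value, assuming a 48-bit, suitably aligned table address and 16-bit ASIDs; the helper name is hypothetical, and actually writing the register requires an MSR at EL1.

```c
#include <stdbool.h>
#include <stdint.h>

#define TTBR_CNP         (UINT64_C(1) << 0)             /* Common not Private */
#define TTBR_ASID_SHIFT  48
#define TTBR_BADDR_MASK  UINT64_C(0x0000FFFFFFFFFFFE)  /* bits [47:1] */

/* Compose a TTBR0_EL1-style value from a table base address and ASID,
   optionally marking the translation tables as common (CnP). */
uint64_t make_ttbr(uint64_t table_pa, uint16_t asid, bool cnp)
{
    uint64_t v = (table_pa & TTBR_BADDR_MASK) |
                 ((uint64_t)asid << TTBR_ASID_SHIFT);
    return cnp ? (v | TTBR_CNP) : v;
}
```

Threads that install identical TTBR values with CnP set are telling the hardware that the tables really are the same, which is what makes sharing the TLB entries safe.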

Reliability, Availability, Serviceability (RAS) Extension

RAS support is an essential feature in many enterprise computing environments. To enable kernel support, Armv8.2-A requires a minimum level of RAS capability in all implementations. This minimal support enables:

  • Standard adoption of an Error Synchronization Barrier (ESB) instruction within the Linux kernel.
  • A basic ID mechanism for the level of support implemented.
  • A default implementation of the system status registers.
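The basic ID mechanism is visible to software as the RAS field of ID_AA64PFR0_EL1, bits [31:28], where 0 means not implemented and 1 means the RAS Extension is implemented. ESB itself is allocated in the hint space (HINT #16), so it executes as a NOP on cores without RAS, which keeps kernels that use it backwards compatible. A minimal C sketch, assuming the ID register value has already been read; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64PFR0_EL1.RAS, bits [31:28]: 0 = not implemented, 1 = RAS Extension. */
unsigned ras_field(uint64_t id_aa64pfr0)
{
    return (unsigned)((id_aa64pfr0 >> 28) & 0xF);
}

bool ras_implemented(uint64_t id_aa64pfr0)
{
    return ras_field(id_aa64pfr0) >= 1;
}

/* Error Synchronization Barrier, encoded as HINT #16: a NOP on cores
   without the RAS Extension. No-op on non-AArch64 hosts. */
void esb(void)
{
#if defined(__aarch64__)
    __asm__ volatile("hint #16" : : : "memory");
#endif
}
```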

Statistical Profiling Extension

The Statistical Profiling Extension is optional in Armv8.2-A and is supported only in the AArch64 execution state. A sample criterion is set on an instruction or micro-op basis, and operations are then selected at a regular interval. For each selected operation, the context associated with that sample is gathered into a profiling record, and only one record is compiled at any given time. Sampling continuously on systems running large workloads over extended periods produces large sets of samples, and analysing them can provide considerable insight into software execution and its performance.
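The interval-sampling scheme can be modelled in a few lines of C. This is a simplified software model under stated assumptions, not the hardware programming interface: an interval counter is reloaded after each selection, an operation is selected when the counter expires, and because only one record is compiled at a time, a selection that arrives while the previous record is still being compiled is counted as a collision instead of producing a sample. The fixed `record_latency` parameter and all names are hypothetical.

```c
#include <stddef.h>

struct spe_stats { size_t samples; size_t collisions; };

/* Model interval-based sampling over n_ops operations.
   interval: operations between selections (must be >= 1).
   record_latency: operations a record takes to complete. */
struct spe_stats spe_model(size_t n_ops, size_t interval, size_t record_latency)
{
    struct spe_stats st = {0, 0};
    size_t counter = interval;  /* counts down to the next selection */
    size_t busy = 0;            /* operations until the current record completes */

    for (size_t op = 0; op < n_ops; op++) {
        if (busy > 0)
            busy--;
        if (--counter == 0) {
            counter = interval;         /* reload for the next selection */
            if (busy > 0) {
                st.collisions++;        /* a record is already in progress */
            } else {
                st.samples++;
                busy = record_latency;  /* start compiling a new record */
            }
        }
    }
    return st;
}
```

With an interval of 100 over 1000 operations, a 10-operation record latency yields 10 samples and no collisions, while a 150-operation latency yields 5 samples and 5 collisions, which is why the interval is normally chosen to be long relative to record compilation.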


The Arm architecture continues to evolve, and Armv8.2-A is the second set of changes being introduced to the Armv8-A architecture profile. Grouping enhancements in this manner helps the ecosystem manage tools and software support alongside the large numbers of Armv8-A based processors and products in development or production today. These changes will be adopted gradually in cores and related products over several years. Partners can currently obtain more details under a confidentiality agreement through their sales and support channels.

For a summary of the Armv8-A architecture, see the section on Armv8 architectural concepts in Chapter A1 of the Armv8-A Architecture Reference Manual. 

Summary of Armv8-A architecture

Armv8.1-A publication is now scheduled for 1Q, 2016 with Armv8.2-A due for publication in 2H, 2016.