Armv8-A architecture: 2016 additions

October 26, 2016

The Armv8-A architecture continues to evolve, with the additions developed through 2016 collectively known as Armv8.3-A. Grouping enhancements in this manner helps the ecosystem manage tools and software support alongside the large numbers of Armv8-A based processors and products in development or production today. These changes add to the gradual migration in cores and related products over several years.

Developed in collaboration with our architecture licensees and other key partners, Armv8.3-A adds:

- A mechanism for enhanced security associated with pointer authentication
- Additional controls and adjustment to the exception model for nested virtualization
- A range of small-scale enhancements to the instruction set and System register support in a variety of areas

All these changes are incremental to previous sets of enhancements, with the Armv8-A System register ID mechanism used to identify features in any given implementation.
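As a rough illustration of how software might use the ID mechanism, the sketch below decodes the 4-bit feature fields of a raw ID_AA64ISAR1_EL1 value. The field positions (APA [7:4], API [11:8], JSCVT [15:12], FCMA [19:16], LRCPC [23:20]) come from the published register description rather than from this post, and the struct and function names are hypothetical. Reading the register itself needs an MRS at EL1 or an OS-provided interface, which is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of Armv8.3-A features reported by ID_AA64ISAR1_EL1. */
struct v83_features {
    bool pointer_auth;  /* APA or API field is non-zero */
    bool jscvt;         /* JavaScript conversion instruction */
    bool fcma;          /* complex number instructions */
    bool lrcpc;         /* RCpc load-acquire instructions */
};

/* Extract the 4-bit ID field whose least significant bit is at `shift`. */
static unsigned id_field(uint64_t reg, unsigned shift)
{
    return (unsigned)((reg >> shift) & 0xFu);
}

/* Decode a raw register value; a non-zero field means "implemented". */
static struct v83_features decode_isar1(uint64_t isar1)
{
    struct v83_features f;
    f.pointer_auth = id_field(isar1, 4) != 0 || id_field(isar1, 8) != 0;
    f.jscvt = id_field(isar1, 12) != 0;
    f.fcma = id_field(isar1, 16) != 0;
    f.lrcpc = id_field(isar1, 20) != 0;
    return f;
}
```

Treating any non-zero field as "present" is a simplification: ID fields can take several values to describe levels of support.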

Please note: Arm recently announced support for a new vector processing architecture, the Scalable Vector Extension (SVE). This extension is independent of the changes introduced with Armv8.3-A. See Technology Update: The Scalable Vector Extension (SVE) for the Armv8-A architecture for more details.

Armv8.3-A overview

The enhancements introduced with Armv8.3 fall into the following categories:

- Pointer authentication (AArch64 only)
- Nested virtualization (AArch64 only)
- Advanced SIMD complex number support (AArch64 and AArch32)
- Improved JavaScript data type conversion support (AArch64 and AArch32)
- A change to the memory consistency model (AArch64 only)
- ID mechanism support for larger system-visible caches (AArch64 and AArch32)

Note: AArch64 indicates the 64-bit Execution state and AArch32 the 32-bit Execution state in the Arm architecture.

Pointer authentication

Computer attacks are becoming more sophisticated, for example through exploit mechanisms that chain gadgets in Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP). To mitigate such exploits, Armv8.3-A introduces a feature that authenticates the contents of a register before it is used as the address for an indirect branch or data reference. For address authentication, the functionality uses the upper bits in a 64-bit address value that are normally associated with sign extension of the address space. This allows the introduction of a Pointer Authentication Code (PAC) as a new field within the upper bits of the value.

The functionality is summarized as follows:

- Instructions are added for:
  - PAC value creation, which writes the PAC to the uppermost bits of a destination register alongside an address pointer value
  - Authentication, which validates a PAC and updates the destination register with a correct or corrupt address pointer. If the authentication fails, an indirect branch or load that uses the authenticated (and now corrupt) address causes an exception.
  - Removing a PAC value from the specified register
- An implementation can create a PAC using a standard and/or proprietary algorithm
- The standardized form uses a recently published block cipher known as QARMA
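The sign, authenticate, and strip operations summarized above can be modelled in portable C. This is a toy sketch only: it assumes a 48-bit address with the PAC held in the top 16 bits of a lower-range pointer, it substitutes a simple integer mixer for QARMA, and it marks a failed authentication by setting bit 62, which is a simplification of what hardware does. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VA_BITS   48u  /* assumed virtual address width */
#define ADDR_MASK ((UINT64_C(1) << VA_BITS) - 1)

/* Toy 64-bit mixer standing in for the PAC algorithm. NOT QARMA, not secure. */
static uint64_t toy_mix(uint64_t x)
{
    x ^= x >> 30; x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27; x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

/* 16-bit code derived from the address, a context modifier, and a key. */
static uint64_t toy_pac(uint64_t addr, uint64_t modifier, uint64_t key)
{
    return toy_mix(toy_mix(addr ^ key) ^ modifier) >> VA_BITS;
}

/* Create: place the PAC in the upper bits alongside the address. */
static uint64_t toy_sign(uint64_t ptr, uint64_t modifier, uint64_t key)
{
    uint64_t addr = ptr & ADDR_MASK;
    return addr | (toy_pac(addr, modifier, key) << VA_BITS);
}

/* Authenticate: return the plain address on success, a corrupt one on failure. */
static uint64_t toy_auth(uint64_t ptr, uint64_t modifier, uint64_t key)
{
    uint64_t addr = ptr & ADDR_MASK;
    if ((ptr >> VA_BITS) == toy_pac(addr, modifier, key))
        return addr;
    return addr | (UINT64_C(1) << 62);  /* non-canonical: a later use would fault */
}

/* Strip: remove the PAC without checking it. */
static uint64_t toy_strip(uint64_t ptr)
{
    return ptr & ADDR_MASK;
}

/* A lower-range pointer is usable only if its upper bits are all zero. */
static bool toy_is_canonical(uint64_t ptr)
{
    return (ptr >> VA_BITS) == 0;
}
```

In this model the failure is not reported at authentication time; it only shows up when the corrupt pointer is later used, which mirrors the behaviour described in the list above.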

Nested virtualization

There is growing interest in cloud computing, and, in particular, in an increasingly common use case where a user rents a virtual machine from an Infrastructure as a Service (IaaS) provider. Nested virtualization is an attractive proposition where the workload to run on this virtual machine includes the use of a hypervisor. In this blog, the hypervisor that is run natively on the hardware is described as the host hypervisor, while the nested hypervisor that is run under the control of the host hypervisor is described as the guest hypervisor.

The Armv8.3-A nested virtualization support enables a guest hypervisor to run transparently at Non-secure EL1, unaware that it is not executing at EL2. Running a guest hypervisor at EL1 removes the exception trap overhead, and the performance and latency costs, of running this software as a Non-secure user-level process. This feature is only supported in AArch64, and requires implementation of EL2.

Advanced SIMD floating-point complex number support

New instructions are added to AArch32 and AArch64 to aid floating-point multiplication and addition of complex numbers, where the complex numbers are packed in a vector register as a pair of elements. The Imaginary part of the number is placed in the more significant element, and the Real part of the number is placed in the less significant element.

The instructions include:

- An optional rotation (when considered in polar representation) of one of the arguments by 0, 90, 180, or 270 degrees
- Single-precision and double-precision data types, the latter only with AArch64 execution
- Half-precision data type support that is only implemented if the half-precision floating-point instructions defined in Armv8.2-A are implemented; otherwise, the half-precision encodings are UNDEFINED

The floating-point functionality supported is:

- Complex number signed multiply and accumulate
- Complex number signed addition

Improved JavaScript data type conversion

JavaScript uses the double-precision floating-point format for all numbers. However, it needs to convert this common number format to 32-bit integers in order to perform bit-wise operations. Conversions from double-precision float to integer, as well as the need to check if the number converted really was an integer, are therefore relatively common occurrences.

Armv8.3-A adds instructions that convert a double-precision floating-point number to a signed 32-bit integer with round towards zero. Where the integer result is outside the range of a signed 32-bit integer (DP float supports integer precision up to 53 bits), the value stored as the result is the integer conversion modulo 2³², taking the same sign as the input float.

The Z-flag is used to determine if the original number was an integer; the other flags (N, C, and V) are always cleared. The Z-flag is set to one to indicate an integer within range, meaning it is cleared when the input number is:

- An infinity
- A NaN
- Too large for a 32-bit signed integer
- Negative zero (-0)
- Not an integer value, and rounded accordingly

This approach allows a B.NE conditional branch to be used immediately after this instruction to test if the input double-precision number is a true representation of a 32-bit signed integer.
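The conversion and flag behaviour described above can be approximated in portable C. The sketch assumes the out-of-range result wraps as a two's complement value modulo 2³² (as the ECMAScript ToInt32 operation does) and that NaN and infinities convert to 0. `jcvt_result` and `jcvt_model` are hypothetical names, and the `z` member stands in for the Z flag.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int32_t value;  /* converted integer */
    bool z;         /* models Z: set only for an exact, in-range integer */
} jcvt_result;

/* True only for the bit pattern of -0.0 (sign bit set, all else clear). */
static bool is_negative_zero(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits == UINT64_C(0x8000000000000000);
}

static jcvt_result jcvt_model(double x)
{
    jcvt_result out = { 0, false };
    const double two32 = 0x1p32;

    /* NaN, infinities, and magnitudes >= 2^84 (low 32 integer bits all zero). */
    if (x != x || x >= 0x1p84 || x <= -0x1p84)
        return out;

    int64_t q = (int64_t)(x / two32);    /* multiples of 2^32, toward zero */
    double r = x - (double)q * two32;    /* exact remainder, |r| < 2^32 */
    int64_t t = (int64_t)r;              /* round toward zero */
    uint32_t low = (uint32_t)(uint64_t)t; /* low 32 bits, two's complement */

    /* Map the 32-bit pattern to int32_t without implementation-defined casts. */
    out.value = (low > (uint32_t)INT32_MAX)
        ? (int32_t)(low - UINT32_C(0x80000000)) + INT32_MIN
        : (int32_t)low;
    out.z = q == 0 && (double)t == x &&
            t >= INT32_MIN && t <= INT32_MAX && !is_negative_zero(x);
    return out;
}
```

A caller would take the fast integer path when `z` is true and fall back to a slower path otherwise, which is the role the B.NE branch plays after the real instruction.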

Memory consistency model

The Armv8.0 support for release consistency is based around the “RCsc” (Release Consistency sequentially consistent) model described by Adve & Gharachorloo in [1], where the Acquire/Release instructions follow a sequentially consistent order with respect to each other. This is well aligned to the requirements of the C++11/C11 memory_order_seq_cst, which is the default ordering of atomics in C++11/C11.

Instructions are added as part of Armv8.3-A to support the weaker RCpc (Release Consistent processor consistent) model, where it is permissible for a Store-Release followed by a Load-Acquire to a different address to be re-ordered. This model is supported by the use of memory_order_release, memory_order_acquire, and memory_order_acq_rel in C++11/C11.
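At source level the distinction shows up as the memory order passed to an atomic operation. The following C11 sketch publishes a value with a release store and reads it with an acquire load; the variable and function names are hypothetical. Whether a compiler maps the acquire load onto the new RCpc instructions depends on the toolchain and target options, so that part should be treated as an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary data guarded by the flag */
static atomic_bool ready;  /* zero-initialized, so it starts as false */

/* Writer: the release store orders the payload write before the flag. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: the acquire load orders the flag read before the payload read. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Replacing both orders with memory_order_seq_cst would additionally place these accesses in a single total order with every other seq_cst operation, which is the stronger guarantee the RCsc instructions provide.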

[1] Adve & Gharachorloo, Shared Memory Consistency Models: A Tutorial

Support for larger architected caches

The Current Cache Size ID Register (CCSIDR) defines the number of sets of a cache level by using a 15-bit field, and the associativity (number of ways) in a 10-bit field. To avoid one or both of these becoming limiting factors in an implementation, a second 32-bit register, CCSIDR2, is added and a new format adopted across the 64 bits provided by the existing and new registers.
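As a sketch of what the two formats mean for software, the C code below decodes cache geometry from raw register values. The bit positions are my reading of the published register descriptions, not details given in this post: LineSize [2:0], Associativity [12:3], and NumSets [27:13] in the original format, and Associativity [23:3] in CCSIDR with NumSets [23:0] in CCSIDR2 in the extended one. Each count is stored as the value minus one, and the struct and function names are hypothetical.

```c
#include <stdint.h>

struct cache_geometry {
    uint32_t line_bytes;
    uint32_t ways;
    uint32_t sets;
};

/* Original 32-bit format: 10-bit associativity, 15-bit number of sets. */
static struct cache_geometry decode_ccsidr(uint32_t ccsidr)
{
    struct cache_geometry g;
    g.line_bytes = UINT32_C(1) << ((ccsidr & 0x7u) + 4);  /* log2(bytes) - 4 */
    g.ways = ((ccsidr >> 3) & 0x3FFu) + 1;
    g.sets = ((ccsidr >> 13) & 0x7FFFu) + 1;
    return g;
}

/* Assumed extended format: 21-bit associativity, 24-bit number of sets. */
static struct cache_geometry decode_ccsidr_ext(uint32_t ccsidr, uint32_t ccsidr2)
{
    struct cache_geometry g;
    g.line_bytes = UINT32_C(1) << ((ccsidr & 0x7u) + 4);
    g.ways = ((ccsidr >> 3) & 0x1FFFFFu) + 1;
    g.sets = (ccsidr2 & 0xFFFFFFu) + 1;
    return g;
}

/* Total capacity of the cache level in bytes. */
static uint64_t cache_bytes(struct cache_geometry g)
{
    return (uint64_t)g.line_bytes * g.ways * g.sets;
}
```

For example, a 32KB, 4-way cache with 64-byte lines has 128 sets, which the original format would encode as 0x000FE01A.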

To conclude

For a summary of the Armv8-A architecture, see the section on Armv8 architectural concepts in Chapter A1 of the Armv8-A Architecture Reference Manual.

Armv8.1-A details are currently available as a supplement. Their consolidation alongside the Armv8.2-A details will be published in early 2017.

It is expected that the Armv8.3-A details will be consolidated into the Armv8-A Architecture Reference Manual and published in mid-2017.

Top Comments

Jens Bauer over 8 years ago

128-bit integers are very easy.

-If just adding and subtracting numbers, they can be quite fast by using two 64-bit integers.

But the 128-bit float will (as you know) extend the "range" of values, which is important when calculating "real-world" distances.

(I'm speaking about billions of 128-bit float calculations per second).

So far, the PPC is still the best CPU at doing this; I'd prefer handing the job over to Arm-based designs, because it's easier to obtain an Arm Cortex-A these days (and the cost is far lower), plus the performance of the Cortex-A is generally better and there are a lot of implementations to choose from.

I can do 1024-bit integer calculations, but each time I extend by an integer register, multiplying is dragged down terribly much, so parallel calculations suddenly need a whole bunch more CPUs.