Armv8-A architecture: 2016 additions

David Brash
October 26, 2016

The Armv8-A architecture continues to evolve, with the additions developed through 2016 collectively known as Armv8.3-A. Grouping enhancements in this manner helps the ecosystem manage tools and software support alongside the large numbers of Armv8-A based processors and products in development or production today. These changes add to the gradual migration in cores and related products over several years.

Developed in collaboration with our architecture licensees and other key partners, Armv8.3-A adds:

  • A mechanism for enhanced security associated with pointer authentication
  • Additional controls and adjustment to the exception model for nested virtualization
  • A range of small-scale enhancements to the instruction set and System register support in a variety of areas

All these changes are incremental to previous sets of enhancements, with the Armv8-A System register ID mechanism used to identify features in any given implementation.

Please note: Arm recently announced support for a new vector processing architecture, the Scalable Vector Extension (SVE). This extension is independent of the changes introduced with Armv8.3-A. See Technology Update: The Scalable Vector Extension (SVE) for the Armv8-A architecture for more details.

Armv8.3-A overview

The enhancements introduced with Armv8.3 fall into the following categories:

  • Pointer authentication
(AArch64 only)
  • Nested virtualization
(AArch64 only)
  • Advanced SIMD complex number support
(AArch64 and AArch32)
  • Improved JavaScript data type conversion support
(AArch64 and AArch32)
  • A change to the memory consistency model
(AArch64 only)
  • ID mechanism support for larger system-visible caches
(AArch64 and AArch32)

Note: AArch64 indicates the 64-bit Execution state and AArch32 the 32-bit Execution state in the Arm architecture.

Pointer authentication

Computer attacks are becoming more sophisticated. Examples include exploit mechanisms such as the use of gadgets in Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP). To mitigate such exploits, Armv8.3-A introduces a feature that authenticates the contents of a register before it is used as the address for an indirect branch or data reference. For address authentication, the functionality uses the upper bits in a 64-bit address value that are normally associated with sign extension of the address space. This allows the introduction of a Pointer Authentication Code (PAC) as a new field within the upper bits of the value.

The functionality is summarized as follows:

  • Instructions are added for:
    • PAC value creation, which writes the value to the uppermost bits in a destination register alongside an address pointer value
    • Authentication, which validates a PAC and updates the destination register with a correct or corrupt address pointer. If the authentication fails, an indirect branch or load that uses the authenticated, and corrupt, address will cause an exception.
    • Removing a PAC value from the specified register
  • An implementation can create a PAC using a standard and/or proprietary algorithm
  • The standardized form uses a recently published block cipher known as QARMA.
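
The create, authenticate, and strip operations listed above can be sketched in software as a toy model. This is only an illustration under stated assumptions: a 48-bit address size, a PAC that fills all 16 upper bits, a caller-supplied modifier value, and HMAC-SHA256 standing in for the QARMA block cipher. The function names are hypothetical and do not correspond to instruction mnemonics.

```python
import hashlib
import hmac

VA_BITS = 48                           # assumption: 48-bit virtual addresses
ADDRESS_MASK = (1 << VA_BITS) - 1
PAC_MASK = (1 << (64 - VA_BITS)) - 1   # toy layout: every upper bit holds PAC


def compute_pac(address, modifier, key):
    # Stand-in for the architected algorithm: HMAC-SHA256, truncated.
    message = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") & PAC_MASK


def sign(pointer, modifier, key):
    # Write the PAC into the upper bits alongside the address.
    address = pointer & ADDRESS_MASK
    return (compute_pac(address, modifier, key) << VA_BITS) | address


def authenticate(pointer, modifier, key):
    # Return the plain address on success, a corrupt address on failure.
    address = pointer & ADDRESS_MASK
    if pointer >> VA_BITS == compute_pac(address, modifier, key):
        return address
    return address | (1 << 62)         # non-canonical, so any use would fault


def strip(pointer):
    # Remove the PAC without checking it.
    return pointer & ADDRESS_MASK


def is_canonical(pointer):
    # In this toy model only addresses with all-zero upper bits are usable.
    return pointer >> VA_BITS == 0


key = b"hypothetical-key"
return_address = 0x00007FFF12345678
modifier = 0x00007FFFFFFFE000          # for example, a stack pointer value
signed_pointer = sign(return_address, modifier, key)
tampered_pointer = signed_pointer ^ (1 << 50)   # flip one PAC bit
```

In this sketch, authenticate(signed_pointer, modifier, key) gives back return_address, while authenticating tampered_pointer yields a value that is_canonical rejects, which models the exception taken when the corrupt address is later used.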

Nested virtualization

There is growing interest in cloud computing, and, in particular, in an increasingly common use case where a user rents a virtual machine from an Infrastructure as a Service (IaaS) provider. Nested virtualization is an attractive proposition where the workload to run on this virtual machine includes the use of a hypervisor. In this blog, the hypervisor that is run natively on the hardware is described as the host hypervisor, while the nested hypervisor that is run under the control of the host hypervisor is described as the guest hypervisor.

The Armv8.3-A nested virtualization support enables a guest hypervisor to run transparently at non-secure EL1, unaware that it is not executing at EL2. Running a guest hypervisor at EL1 avoids the exception-trap overhead and the performance and latency costs of running this software as a non-secure user-level process. This feature is only supported in AArch64, and requires implementation of EL2.

Advanced SIMD floating-point complex number support

New instructions are added to AArch32 and AArch64 to aid floating-point multiplication and addition of complex numbers, where the complex numbers are packed in a vector register as a pair of elements. The Imaginary part of the number is placed in the more significant element, and the Real part of the number is placed in the less significant element.

The instructions include:

  • An optional rotation (when considered in polar representation) of one of the arguments by 0, 90, 180, or 270 degrees
  • Single-precision and double-precision data types, the latter only with AArch64 execution
  • Half-precision data type support that is only implemented if the half-precision floating-point instructions defined in Armv8.2-A are implemented; otherwise, the half-precision encodings are UNDEFINED

The floating-point functionality supported is:

  • Complex number signed multiply and accumulate
  • Complex number signed addition
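
As a rough illustration of how a rotated multiply-accumulate can be used, the sketch below models a vector as a flat Python list with the real part in the even (less significant) element and the imaginary part in the odd element. The split of the work into per-rotation partial products is an assumption about the instruction behavior that this post does not spell out, and the names fcmla_model and complex_multiply are hypothetical.

```python
def fcmla_model(accumulator, n, m, rotation):
    """Toy model of a complex multiply-accumulate step with rotation.

    Vectors are flat lists laid out as [re0, im0, re1, im1, ...].
    Each call adds only half of a full complex product (assumed split).
    """
    result = list(accumulator)
    for e in range(0, len(n), 2):
        n_re, n_im = n[e], n[e + 1]
        m_re, m_im = m[e], m[e + 1]
        if rotation == 0:
            re, im = n_re * m_re, n_re * m_im
        elif rotation == 90:
            re, im = -n_im * m_im, n_im * m_re
        elif rotation == 180:
            re, im = -n_re * m_re, -n_re * m_im
        elif rotation == 270:
            re, im = n_im * m_im, -n_im * m_re
        else:
            raise ValueError("rotation must be 0, 90, 180 or 270")
        result[e] += re
        result[e + 1] += im
    return result


def complex_multiply(n, m):
    # Two steps, with rotations 0 and 90, add up to a full complex product.
    zero = [0.0] * len(n)
    return fcmla_model(fcmla_model(zero, n, m, 0), n, m, 90)
```

Under these assumptions, complex_multiply([1.0, 2.0], [3.0, 4.0]) returns [-5.0, 10.0], matching (1 + 2i)(3 + 4i) = -5 + 10i.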

Improved JavaScript data type conversion

JavaScript uses the double-precision floating-point format for all numbers. However, it needs to convert this common number format to 32-bit integers in order to perform bit-wise operations. Conversions from double-precision float to integer, as well as the need to check if the number converted really was an integer, are therefore relatively common occurrences.

Armv8.3-A adds instructions that convert a double-precision floating-point number to a signed 32-bit integer with round towards zero. Where the integer result is outside the range of a signed 32-bit integer (DP float supports integer precision up to 53 bits), the value stored as the result is the integer conversion modulo 2^32, taking the same sign as the input float.

The Z-flag is used to determine if the original number was an integer; the other flags (N, C, and V) are always cleared. The Z-flag is set to one to indicate an integer within range, meaning it is cleared when the input number is:

  • An infinity
  • A NaN
  • Too large for a 32-bit signed integer
  • -0
  • not an integer value, and rounded accordingly

This approach allows a B.NE conditional branch to be used immediately after this instruction to test if the input double-precision number is a true representation of a 32-bit signed integer.
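
The conversion and flag rules described above can be modeled in a few lines. This is a sketch, not the architected pseudocode: it assumes the out-of-range result wraps to the low 32 bits read as a two's complement value (as in the ECMAScript ToInt32 operation) and that infinities and NaNs produce a result of 0. The name js_convert is hypothetical.

```python
import math

INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def js_convert(value):
    """Return (result, z_flag) for a double-precision input (toy model)."""
    if math.isnan(value) or math.isinf(value):
        return 0, 0                       # assumed result for NaN and infinity
    truncated = math.trunc(value)         # round towards zero
    wrapped = truncated & 0xFFFFFFFF      # keep the low 32 bits
    if wrapped > INT32_MAX:
        wrapped -= 1 << 32                # reinterpret as signed 32-bit
    negative_zero = value == 0.0 and math.copysign(1.0, value) < 0
    exact = (
        truncated == value
        and INT32_MIN <= truncated <= INT32_MAX
        and not negative_zero
    )
    return wrapped, int(exact)
```

With this model, js_convert(42.0) gives (42, 1), so a branch on "not equal" would fall through, while js_convert(3.7) gives (3, 0) and js_convert(-0.0) gives (0, 0), both of which would take the branch to a slower path.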

Memory consistency model

The Armv8.0 support for release consistency is based around the “RCsc” (Release Consistency sequentially consistent) model described by Adve & Gharachorloo in [1], where the Acquire/Release instructions follow a sequentially consistent order with respect to each other. This is well aligned to the requirements of the C++11/C11 memory_order_seq_cst, which is the default ordering of atomics in C++11/C11.

Instructions are added as part of Armv8.3-A to support the weaker RCpc (Release Consistent processor consistent) model, where a Store-Release followed by a Load-Acquire to a different address is permitted to be re-ordered. This model is supported by the use of memory_order_release, memory_order_acquire, and memory_order_acq_rel in C++11/C11.

[1] Adve & Gharachorloo, Shared Memory Consistency Models: A Tutorial
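
The difference between the two models can be seen with a small litmus test. The sketch below is a deliberately simplified operational model, not a description of real hardware: each thread performs a Store-Release followed by a Load-Acquire to a different address, the "RCsc" setting keeps program order, and the "RCpc" setting additionally lets a Load-Acquire move ahead of an earlier Store-Release to a different address. All names are hypothetical.

```python
from itertools import permutations


def allowed_orders(thread, model):
    """List the execution orders of one thread that the toy model permits."""
    orders = []
    for perm in permutations(range(len(thread))):
        permitted = True
        for position, later_index in enumerate(perm):
            for earlier_index in perm[position + 1:]:
                if later_index < earlier_index:
                    continue              # this pair keeps program order
                earlier, later = thread[earlier_index], thread[later_index]
                relaxed = (
                    model == "RCpc"
                    and earlier[0] == "store"
                    and later[0] == "load"
                    and earlier[1] != later[1]
                )
                if not relaxed:
                    permitted = False
        if permitted:
            orders.append([thread[i] for i in perm])
    return orders


def interleavings(a, b):
    """Yield every merge of two sequences that keeps each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(threads, model):
    """Collect every reachable (r0, r1) result for a two-thread test."""
    results = set()
    for order0 in allowed_orders(threads[0], model):
        for order1 in allowed_orders(threads[1], model):
            for schedule in interleavings(order0, order1):
                memory = {"x": 0, "y": 0}
                registers = {}
                for kind, variable, argument in schedule:
                    if kind == "store":
                        memory[variable] = argument
                    else:
                        registers[argument] = memory[variable]
                results.add((registers["r0"], registers["r1"]))
    return results


# Thread 0: Store-Release x = 1, then Load-Acquire y into r0.
# Thread 1: Store-Release y = 1, then Load-Acquire x into r1.
STORE_BUFFERING = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]
```

In this model, outcomes(STORE_BUFFERING, "RCsc") never contains (0, 0), whereas outcomes(STORE_BUFFERING, "RCpc") does, which is the extra behavior that the weaker ordering permits.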

Support for larger architected caches

The Current Cache Size ID Register (CCSIDR) defines the number of sets of a cache level by using a 15-bit field, and the associativity (number of ways) in a 10-bit field. To avoid one or both of these becoming limiting factors in an implementation, a second 32-bit register, CCSIDR2, is added and a new format adopted across the 64 bits provided by the existing and new registers.
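
To make the limits of the existing format concrete, here is a minimal sketch that decodes a 32-bit CCSIDR value. The bit positions used (line size in bits [2:0], associativity in bits [12:3], number of sets in bits [27:13], each stored as the value minus one, with the line size as log2 of the byte count minus 4) reflect my understanding of the existing layout and are not stated in this post. The new 64-bit format is not modeled, and the function names are hypothetical.

```python
MAX_WAYS = 1 << 10    # limit of the 10-bit associativity field
MAX_SETS = 1 << 15    # limit of the 15-bit number-of-sets field


def decode_ccsidr(value):
    """Decode the existing 32-bit format (assumed field positions)."""
    line_bytes = 1 << ((value & 0x7) + 4)
    ways = ((value >> 3) & 0x3FF) + 1
    sets = ((value >> 13) & 0x7FFF) + 1
    return {
        "line_bytes": line_bytes,
        "ways": ways,
        "sets": sets,
        "size_bytes": line_bytes * ways * sets,
    }


def encode_ccsidr(line_bytes, ways, sets):
    """Inverse of decode_ccsidr, for building test values."""
    if ways > MAX_WAYS or sets > MAX_SETS:
        raise ValueError("cache geometry does not fit the 32-bit format")
    line_size_field = line_bytes.bit_length() - 1 - 4
    return ((sets - 1) << 13) | ((ways - 1) << 3) | line_size_field
```

For example, a 32KB, 4-way cache with 64-byte lines has 128 sets, and encode_ccsidr(64, 4, 128) produces 0xFE01A under these assumptions. A geometry with more than 1024 ways or 32768 sets cannot be expressed, which is the limitation the new format addresses.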

To conclude

For a summary of the Armv8-A architecture, see the section on Armv8 architectural concepts in Chapter A1 of the Armv8-A Architecture Reference Manual.


Armv8.1-A details are currently available as a supplement. Their consolidation alongside the Armv8.2-A details will be published in early 2017.

It is expected that the Armv8.3-A details will be consolidated into the Armv8-A Architecture Reference Manual and published in mid-2017.


Top Comments

  • Jens Bauer
    Jens Bauer over 6 years ago +1
    128-bit integers are very easy. -If just adding and subtracting numbers, they can be quite fast by using two 64-bit integers. But the 128-bit float will (as you know) extend the "range" of values...
  • daith
    daith over 6 years ago +1
    Well you can get it mattering if you're measuring to the nearest centimeter over the distance to Pluto but that's not everyday real-world. The real use is to ameliorate the rather nasty gotchas that can...
  • Krister Walfridsson
    Krister Walfridsson over 6 years ago +1
    The complex number support piques my curiosity… What use case motivates adding this to the ISA?
  • Saul Luizaga
    Saul Luizaga over 2 years ago

    Where can I download these ISA extensions in .pdf or .epub format?

  • daith
    daith over 6 years ago

    Complex numbers occur in all sorts of situations, so C, C++, Python, Java, FORTRAN, etc. all support them. From what I can see, they are not talking about direct support for complex multiplication but about simple operations to arrange the operands easily; otherwise, just arranging things would take up a large part of the time.

    I'm not sure why they mention addition as no extra support is needed, and if they are doing those rearrangement operations I'd have thought they would have supported getting the complex conjugate as well as those rotations.

    Doing anything more than what they say would require quite a bit of work and be unlikely to be worth it except maybe for an HPC machine. Even something as simple as cabs(x+iy) = sqrt(x^2+y^2) has to do various checks and tricks to avoid getting an unnecessary overflow or underflow if x or y are large or small. And getting it accurate is even more difficult: who would bet that if a and b are approximately equal then (a-b)^2+2ab can be a more accurate approximation of a^2+b^2? Not that most users would be bothered, but that's the problem with libraries: they have to cater for those who do care.

  • Jens Bauer
    Jens Bauer over 6 years ago

    This is absolutely correct and very important (great document too).

    It's also one of the reasons that the GNU multiprecision libraries were written (probably the most important reason, I believe).

    I can think of several reasons for calculating distances:

    1. Distances between planets and other objects in the universe.
    2. Particle distance on earth (in a bowl in the lab - when baking a cake of course).
    3. Particle distances in space.
    4. Distances between places on Earth [GPS coordinates need corrections all the time].

    #2 also covers the people working at CERN (I know they use ARM).
