Arm A-Profile Architecture Developments 2021

Martin Weidmann
September 8, 2021

Working with its architecture licensees and ecosystem partners, Arm continues to evolve its architecture, developing new functionality to meet the needs of both new and existing markets.

This blog introduces some of the key additions to the A-Profile architecture in 2021.

Full instruction set and System register information is available from the end of September on our developer webpages. The complete Arm Architecture Reference Manual, documenting the 2021 extensions and earlier functionality, is due for release in early 2022. Updates to the Learn the Architecture pages will appear during 2021.

Details of previous updates to the A-Profile architecture are available here: 2014, 2015, 2016, 2017, 2018, 2019 and 2020.

Optimizing for the memcpy() family of functions

The memcpy()/memset() family of library functions is widely used in software. Efficient implementations of these functions are an important part of a system's performance.

The traditional RISC approach is to build operations such as memcpy() out of standard instructions, such as loads and stores. One issue with this approach is that the optimal instruction sequence can vary depending on factors such as the micro-architecture, the starting alignment, and the size of the operation. As a result, it is common to find preamble code in libraries that selects between a wide range of implementations, which adds overhead and increases the long-term maintenance cost for software.

To address these concerns, the 2021 extensions introduce new instructions specifically targeting the memcpy() and memset() family of functions.
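To make the cost of that preamble concrete, here is a minimal C sketch of a size-and-alignment dispatcher. It is an illustration only: the function names (dispatch_copy, copy_bytewise, copy_words) and the thresholds (16 bytes, 8-byte alignment) are hypothetical, and real libraries typically add a further selection on the detected micro-architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant for short copies: one byte at a time. */
static void *copy_bytewise(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Hypothetical variant for copies where both pointers are 8-byte aligned. */
static void *copy_words(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= 8) {
        uint64_t word;
        memcpy(&word, s, 8);   /* fixed-size copies compile to one load/store */
        memcpy(d, &word, 8);
        d += 8;
        s += 8;
        n -= 8;
    }
    while (n--) {              /* remaining tail bytes */
        *d++ = *s++;
    }
    return dst;
}

/* The preamble: every call pays for these checks before any byte moves. */
void *dispatch_copy(void *dst, const void *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    if (n < 16) {                                          /* arbitrary threshold */
        return copy_bytewise(dst, src, n);
    }
    if ((((uintptr_t)dst | (uintptr_t)src) & 7u) == 0) {   /* both aligned? */
        return copy_words(dst, src, n);
    }
    return memcpy(dst, src, n);   /* stand-in for an unaligned bulk variant */
}
```

Each extra variant adds another branch to the preamble and another routine to maintain, which is the overhead described above.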

memcpy()/memmove():

CPY[F]Px [dst]!,[src]!,num_bytes!
CPY[F]Mx [dst]!,[src]!,num_bytes!
CPY[F]Ex [dst]!,[src]!,num_bytes!

memset():

SETPx  [dst]!,num_bytes!,data
SETMx  [dst]!,num_bytes!,data
SETEx  [dst]!,num_bytes!,data

For software developers, these instructions give the ability to write a single optimized sequence that is portable across micro-architectures, alignments, and sizes. For hardware designers, the new instructions make it easier to detect memcpy()/memset() operations and therefore optimize for them.
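As a rough sketch of what such a portable sequence could look like from C, the function below issues the forward-copy instructions (the P, M, and E forms, in that order, on the same three registers) through inline assembly. Several things here are assumptions rather than anything stated in this blog: the function name copy_forward is hypothetical, the code relies on the toolchain defining the ACLE macro __ARM_FEATURE_MOPS when the instructions are enabled, and it falls back to the library memcpy() everywhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies n bytes from src to dst; the buffers must not overlap. */
void *copy_forward(void *dst, const void *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__aarch64__) && defined(__ARM_FEATURE_MOPS)
    void *d = dst;
    const void *s = src;
    size_t remaining = n;
    /* The same sequence is used for every size and alignment. The
     * writeback ("!") updates all three registers as the copy advances,
     * so they are read-write operands. The flags are listed as clobbered
     * because the sequence uses them to pass state between instructions. */
    __asm__ volatile(
        "cpyfp [%0]!, [%1]!, %2!\n\t"
        "cpyfm [%0]!, [%1]!, %2!\n\t"
        "cpyfe [%0]!, [%1]!, %2!"
        : "+r"(d), "+r"(s), "+r"(remaining)
        :
        : "memory", "cc");
    return dst;
#else
    /* Assumed fallback for targets without the new instructions. */
    return memcpy(dst, src, n);
#endif
}
```

In practice most software would not write this by hand. The expectation is that C libraries and compilers emit the sequence, so application code keeps calling memcpy() as before.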

Non-maskable interrupts

In the past some Arm processors, such as the Cortex-R4, have supported non-maskable interrupts (NMI), but they were not a standard architectural feature. This is changing with the 2021 extensions, with new support added in both the CPU and Generic Interrupt Controller (GIC) architectures.

GICv3.3 adds an NMI attribute that software can assign to interrupts. Interrupts with the NMI attribute are treated as the highest priority for the owning Security state, with different masking and pre-emption rules:

Figure 1: Handling of non-maskable interrupts in the GIC and CPU

Within the CPU, NMIs are not subject to the existing PSTATE.I and PSTATE.F masks, which allows NMIs to be taken as exceptions even when most interrupts are masked. Some masking of NMIs is still necessary, for example on interrupt entry and exit to prevent corruption of the return state. A new mask, PSTATE.AllInt, is added that masks all interrupts including NMIs. Software can also use the selected stack pointer, PSTATE.SP, as an implicit mask.
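The masking rules in the paragraph above can be summarized as a small decision function. This is a simplified model, not the architectural pseudocode: it only considers an IRQ targeting the current Exception level, it ignores routing, priorities, and PSTATE.F, and the struct, its field names, and the sp_masks_nmi opt-in flag are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the mask state described in the text. */
struct cpu_masks {
    bool pstate_i;       /* existing IRQ mask */
    bool pstate_allint;  /* new mask: blocks all interrupts, including NMIs */
    bool pstate_sp;      /* selected stack pointer */
    bool sp_masks_nmi;   /* assumed opt-in: treat PSTATE.SP as an NMI mask */
};

/* Returns true if a pending IRQ could be taken under this simplified model. */
bool irq_can_be_taken(const struct cpu_masks *m, bool has_nmi_attribute) {
    assert(m != NULL);
    if (m->pstate_allint) {
        return false;              /* masks everything, NMIs included */
    }
    if (has_nmi_attribute) {
        if (m->sp_masks_nmi && m->pstate_sp) {
            return false;          /* implicit mask through the stack pointer */
        }
        return true;               /* PSTATE.I does not apply to NMIs */
    }
    return !m->pstate_i;           /* ordinary IRQs still obey PSTATE.I */
}
```

For example, with only pstate_i set, an ordinary IRQ is held pending while an interrupt carrying the NMI attribute is still taken.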

Performance Monitoring Unit (PMU) updates

The Performance Monitoring Unit (PMU) is an important tool for helping developers to understand how efficiently their code runs on Arm processors. 

The 2021 extensions add new PMU events for cache line state tracking. These events can be used to profile the accuracy of cache prefetching. Another set of PMU events is added for reporting where data is coming from on cache hits, giving information on the type and level of the cache.

Some PMU events can increment by more than 1 per cycle, for example the number of FP operations per cycle. The 2021 extensions introduce a new threshold control that lets software examine the distribution of these per-cycle values by building a histogram profile.
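The idea of turning threshold counts into a histogram can be sketched in plain C. The sketch simulates the hardware rather than programming it: per_cycle is a hypothetical trace of how many events occurred in each cycle, the comparison is assumed to be "greater than or equal to the threshold", and each call to count_cycles_at_or_above() stands in for one profiling run with a given threshold setting.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Models one counter run: counts cycles whose event value is >= threshold. */
uint64_t count_cycles_at_or_above(const unsigned *per_cycle, size_t cycles,
                                  unsigned threshold) {
    assert(per_cycle != NULL || cycles == 0);
    uint64_t hits = 0;
    for (size_t i = 0; i < cycles; i++) {
        if (per_cycle[i] >= threshold) {
            hits++;
        }
    }
    return hits;
}

/* Fills hist[0..max_value], where hist[k] is the number of cycles with
 * exactly k events. max_value is assumed to bound the per-cycle value. */
void build_histogram(const unsigned *per_cycle, size_t cycles,
                     unsigned max_value, uint64_t *hist) {
    assert(hist != NULL);
    assert(max_value < UINT_MAX);
    for (unsigned k = 0; k <= max_value; k++) {
        /* Cycles with exactly k events = (cycles >= k) - (cycles >= k + 1). */
        uint64_t at_least_k = count_cycles_at_or_above(per_cycle, cycles, k);
        uint64_t at_least_next = count_cycles_at_or_above(per_cycle, cycles, k + 1);
        hist[k] = at_least_k - at_least_next;
    }
}
```

Sweeping the threshold and subtracting adjacent counts is what recovers the distribution. A single counter without a threshold would only report the total.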

Other functionality

Other features included in the 2021 extensions:

  • Hinted conditional branches.
  • QARMA3 algorithm for Pointer Authentication.
  • EL1 and EL2 traps on use of IMPDEF functionality at EL0.
  • Controls for EL0 cache maintenance operations.
  • BRBE extended to support EL3.

Summary

This blog provides a brief introduction to the latest features included in the Arm architecture as Armv8.8-A and Armv9.3-A. More detailed information can be found on our Developer website.

The next step will be working with our ecosystem partners, including Linaro, to ensure that open-source software is ready to make use of this functionality when the hardware becomes available.

Join me at Virtual Linaro Connect in September to learn more about the 2021 extensions and take part in the discussions.

    42Bastian Schick over 3 years ago

    Ouch, a maskable "NMI" :-) It should not be maskable in the core, but rather the GIC.
