Arm A-Profile Architecture Developments 2020

Martin Weidmann
September 21, 2020

Working with its architecture licensees and ecosystem partners, Arm continues to evolve its architecture, developing new functionality to meet the needs of both new and existing markets.

This blog discusses some of the key additions to the A-profile architecture in 2020.

This blog also introduces two new additions to the Future Architecture Technologies program, which provides advanced information on unreleased versions of the architecture.

Full Instruction Set and System Register information will be available via our technical webpages. The complete Armv8-A Architecture Reference Manual (ArmARM), documenting the 2020 extensions and earlier functionality, is due for release in early 2021. XML releases will be available soon and we will link to those when available.

Details of previous updates to the A-profile architecture are available here: 2014, 2015, 2016, 2017, 2018 and 2019.

Enhanced support for device hot-unplug

As part of the 2020 extensions, Arm is adding a new attribute, XS, which identifies mappings for devices whose accesses can be subject to long delays. TLB invalidate (TLBI) operations and barriers can also be annotated with this attribute.

Technologies such as PCIe allow devices to be hot-unplugged. This can occur even when there are outstanding requests to the device. When a device is removed, the PCIe root complex returns a default response after a timeout period, which is typically on the order of 50 ms.

Some impact on the software directly interacting with the removed device is expected. However, we want to minimize the impact on other, unrelated tasks. Consider the following example:

Figure 1 - Hot-unplug causing delayed TLBI response

Core 1 was interacting with the removed device and is now waiting for a response.

Core 2 broadcasts an unrelated TLBI and waits for the acknowledgment from core 1. Ideally core 1 would respond quickly, as it has no outstanding transactions for the location covered by the TLBI. However, some micro-architectures do not track the translation used for issued transactions. To meet the architectural requirements, core 1 would have to wait for all transactions to complete before replying to the TLBI, making core 2 also subject to the PCIe timeout.

The XS attribute gives an efficient mechanism for avoiding this. The mappings for the PCIe devices have XS=1, indicating that long delays are possible. Other regions, such as RAM, are marked as XS=0. A core can track whether outstanding transactions are XS=0 or XS=1 without needing to record the full original translation. In our example scenario, core 1 knows that only XS=1 accesses are outstanding, which allows it to respond quickly to core 2’s TLBI if that TLBI is marked as applying to XS=0 mappings.

Figure 2 - XS attribute used to avoid TLBI response delay
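
The tracking idea can be sketched as a toy software model. This is only an illustration of the reasoning above, not a description of any real micro-architecture: the class name CoreModel, its two counters, and the xs0_only flag are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreModel:
    """Toy model: a core that counts outstanding accesses per XS class.

    Hypothetical structure for illustration only. It records how many
    accesses are in flight for XS=0 and XS=1 mappings, not which
    translation each access used.
    """

    outstanding_xs0: int = 0  # accesses to XS=0 mappings, such as RAM
    outstanding_xs1: int = 0  # accesses to XS=1 mappings, such as PCIe devices

    def issue(self, xs: int) -> None:
        """Record a newly issued access with the given XS value."""
        if xs == 1:
            self.outstanding_xs1 += 1
        else:
            self.outstanding_xs0 += 1

    def complete(self, xs: int) -> None:
        """Record that an access with the given XS value has completed."""
        if xs == 1:
            self.outstanding_xs1 -= 1
        else:
            self.outstanding_xs0 -= 1

    def can_ack_tlbi(self, xs0_only: bool) -> bool:
        """Return True if a broadcast TLBI can be acknowledged now.

        A TLBI marked as applying only to XS=0 mappings waits for XS=0
        accesses alone. Any other TLBI waits for every outstanding access.
        """
        if self.outstanding_xs0 > 0:
            return False
        if xs0_only:
            return True
        return self.outstanding_xs1 == 0
```

In this model, a core whose only outstanding access targets the removed device (XS=1) can acknowledge an XS=0 TLBI straight away, while a TLBI without that marking still has to wait for the PCIe timeout.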

Atomic 64-byte loads and stores

A growing trend in enterprise systems is the introduction of accelerators that can be accessed using 64-byte atomic loads or stores. These are used to add items to queues and can, in some cases, signal success or failure of the enqueue operation.

To support this new breed of accelerators, a 64-byte atomic load instruction (LD64B) and three 64-byte atomic store instructions (ST64Bx) are added to the architecture.

Figure 3 - Adding a work item to a work queue
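
The enqueue behaviour in Figure 3 can be modelled in a few lines. This is a sketch under stated assumptions: WorkQueueModel and store64 are hypothetical names, and the True/False status is a simplification of the success or failure signal mentioned above rather than the encoding a real accelerator would return.

```python
from collections import deque

ITEM_SIZE = 64  # bytes transferred by one atomic access


class WorkQueueModel:
    """Toy accelerator work queue (hypothetical, for illustration only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items = deque()

    def store64(self, item: bytes) -> bool:
        """Model a 64-byte atomic store that reports a status.

        The whole item is accepted as one unit or not at all, so the
        queue never observes a partially written work item.
        Returns True if the item was enqueued, False if the queue was full.
        """
        if len(item) != ITEM_SIZE:
            raise ValueError("a work item must be exactly 64 bytes")
        if len(self.items) >= self.capacity:
            return False
        self.items.append(bytes(item))
        return True
```

Because the status comes back from the store itself, the producer in this model does not need a separate read of the queue state to find out whether the enqueue succeeded.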

WFE and WFI with timeouts

The WFE and WFI instructions allow the core to be put into standby, for example, while waiting for a resource to become available. One limitation of these instructions is that there is no bound on how long the core can stay in standby if no event or interrupt is received.

To address this limitation, new variants of the WFI and WFE instructions are introduced which take a register operand containing a counter value. The core resumes from standby when the CNTVCT_EL0 virtual counter reaches or exceeds the specified value. This allows software to specify a maximum time to remain in standby.
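
The wake-up rule can be expressed as a small model. This is a sketch, not the architectural pseudocode: wake_time and timed_out are hypothetical helper names, counter values are plain integers standing in for CNTVCT_EL0, and the check in timed_out is just one way software might tell a timeout from an event after waking.

```python
from typing import Optional


def wake_time(now: int, deadline: int, event_at: Optional[int]) -> int:
    """Return the counter value at which the modelled core leaves standby.

    now      -- counter value when the wait instruction executes
    deadline -- value held in the register operand
    event_at -- counter value at which an event or interrupt arrives,
                or None if none ever arrives
    """
    # The timeout fires once the counter reaches or exceeds the deadline.
    # If the deadline has already passed, the core resumes immediately.
    candidates = [max(now, deadline)]
    if event_at is not None:
        candidates.append(max(now, event_at))
    return min(candidates)


def timed_out(counter_after_wake: int, deadline: int) -> bool:
    """Assumed software check: did the wait end because of the timeout?"""
    return counter_after_wake >= deadline
```

For example, with now=100 and deadline=150, an event at 120 wakes the modelled core at 120, while the absence of any event wakes it at 150. An event that arrives exactly at the deadline is indistinguishable from a timeout in this simple check.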

Other functionality

The 2020 extensions also include other small features:

  • Support for 52-bit virtual and physical addressing with 4KB and 16KB translation granules
  • Enhancements to Privileged Access Never (PAN)
  • Support for asymmetric fault handling in MTE
  • Enhancements to PMU and SPE
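
To put the first item in perspective, the short calculation below compares a 52-bit address space with a 48-bit one, which was the previous limit for the 4KB and 16KB granules. The helper names are hypothetical and the code is plain arithmetic.

```python
def address_space_bytes(bits: int) -> int:
    """Number of bytes addressable with the given number of address bits."""
    return 1 << bits


def human(size: int) -> str:
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size //= 1024
        index += 1
    return f"{size} {units[index]}"


print(human(address_space_bytes(48)))  # 256 TiB
print(human(address_space_bytes(52)))  # 4 PiB
```

The four extra address bits make the addressable range 16 times larger.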

Future Architecture Technologies

Alongside the 2020 enhancements, Arm is introducing two new extensions under the Future Architecture Technologies program. Future Architecture Technologies are not released architectures. They are technologies for which we want to share advance information so that the ecosystem can prepare.

The Call-Stack Recorder Extension (CSRE) and Branch-Record Buffer Extension (BRBE) aim to improve the experience of developing software for Arm by providing enhanced visibility of how code is executing. This information can be used for debugging, profiling, identifying hot-spots, Feedback Driven Optimization (FDO), and many other uses.

CSRE provides a low impact mechanism to record and unwind the stack. A live view of the current call stack is recorded in memory, where it can be efficiently captured for performance analysis or interpreted for debug.

BRBE captures a recent sequence of branches in an easily consumable format. This information can be used for debugging or fed into profiling tools for hot-spot analysis and AutoFDO. 
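
As a rough illustration of how a profiling tool might consume such records for hot-spot analysis, the sketch below counts branch targets. The record layout, a list of (source address, target address) pairs, is an assumption made for this example and is not the BRBE format. The function name hottest_targets is also hypothetical.

```python
from collections import Counter


def hottest_targets(records, top_n=3):
    """Return the most frequently taken branch targets.

    records -- iterable of (source_address, target_address) pairs
               (hypothetical layout, not the BRBE record format)
    top_n   -- number of targets to report
    """
    counts = Counter(target for _source, target in records)
    return counts.most_common(top_n)


sample = [
    (0x1000, 0x2000),
    (0x1010, 0x2000),
    (0x1020, 0x3000),
]
for target, hits in hottest_targets(sample, top_n=2):
    print(hex(target), hits)
```

Targets that appear most often in the captured sequence are candidates for closer inspection or for feedback-driven optimization.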

Summary

This blog provides a brief introduction to the latest features included in the Armv8-A architecture as Armv8.7-A, and some information on Future Architecture Technologies. More detailed information will soon be available on our Developer website.

The next step will be working with our ecosystem partners, including Linaro, to ensure that open source software is enabled to make use of this functionality as soon as the hardware becomes available. Join us at Linaro Connect to learn more about the 2020 extensions and take part in the discussions.

  • 42Bastian Schick, 2 months ago

    As for "WFE and WFI with timeouts", it would be cool if the instruction would return 0 or 1 in the register to detect the timeout w/o additional overhead.
