New Technologies for the Arm A-Profile Architecture

Berenice Mann
April 18, 2019

Nigel Stephens, Lead ISA Architect and Fellow, Architecture and Technology Group, Arm

This month, Arm is making available early technical details of two significant new technologies for its A-Profile architecture, both designed to enhance the performance and scalability of parallel software: the Scalable Vector Extension version 2 (SVE2) and the Transactional Memory Extension (TME).

The purpose of this early disclosure is to inform and enable the OS and tools developer ecosystems, so that software support is widely available by the time CPUs that implement these new technologies become available.

SVE2 allows a wider range of software to benefit from the advanced, scalable SIMD vector technology of the original SVE architecture, announced in 2017. TME allows certain classes of multi-threaded software to be scaled more easily, from running on a few CPU cores to running on many hundreds of cores.

Scalable Vector Extension 2 (SVE2)

The first version of the Scalable Vector Extension (SVE) was not a variant of the Arm Neon instruction set. It was a separate extension targeted at the high-performance computing (HPC) space, bringing many advanced vectorization technologies to Arm-based processors, which can choose to implement a vector length from 128 up to 2048 bits. Its novel vector-length-agnostic programming model allows vector code to be compiled or written once and then scale automatically to exploit whatever vector length the hardware implements, reducing software development and deployment costs.
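To make the vector-length-agnostic model concrete, here is a minimal sketch of an element-wise float addition written with the ACLE intrinsics for SVE declared in `arm_sve.h`. Because SVE2 is a superset of SVE, the same source also applies to SVE2 targets. The function name `add_f32` is hypothetical, and the scalar `#else` branch exists only so the sketch also builds on targets without SVE; it is not part of the SVE programming model.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for 0 <= i < n.
   The SVE path is one binary for every hardware vector length. */
static void add_f32(float *dst, const float *a, const float *b, int64_t n) {
    assert(n >= 0);
#if defined(__ARM_FEATURE_SVE)
    /* svcntw(): number of 32-bit lanes per vector, read at run time. */
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        /* Lane k is active only while i + k < n, so the tail needs no special case. */
        svbool_t pg = svwhilelt_b32(i, n);
        svfloat32_t va = svld1(pg, &a[i]);        /* inactive lanes are not loaded */
        svfloat32_t vb = svld1(pg, &b[i]);
        svst1(pg, &dst[i], svadd_x(pg, va, vb));  /* inactive lanes are not stored */
    }
#else
    /* Portable scalar fallback so the sketch also builds without SVE. */
    for (int64_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```

The loop never names a vector length: `svcntw()` supplies the lane count when the code runs, and the predicate from `svwhilelt_b32` switches off the lanes past `n`, so no separate scalar tail loop is needed.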

SVE2 builds on the foundations of SVE to bring the benefits of scalable SIMD vector performance and advanced auto-vectorization capabilities to a wider range of software, including DSP and multimedia SIMD codes that currently use Neon. It also adds many new features to further expand the use of SIMD vector hardware and increase the amount of fine-grain, data-level parallelism in programs.  

The Neon instruction set remains fully supported for backward compatibility. On future CPUs that implement SVE2, however, scalable SIMD code written for SVE2 can match Neon performance when both run at the same 128-bit vector length.

Other benefits of SVE2 include:

  • Performance scales as the hardware vector length increases, without rewriting or recompiling code. This can allow large-scale data processing workloads to run on a general-purpose CPU, with less need for specialized hardware accelerators, as shown in the figure below.
  • The advanced auto-vectorization techniques enabled by SVE2 allow compilers to vectorize more loops, increasing the amount of fine-grain, data-level parallelism while reducing the need for hand-coding by specialist SIMD programmers.

[Figure: performance of Arm SVE2. Parity and beyond with traditional Neon DSP/Media workloads]
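The claim that performance can scale with vector length without recompiling can be illustrated with a scalar model of a predicated, vector-length-agnostic loop. This is a hypothetical simulation, not SVE code: `vl_bits` stands in for the hardware vector length, `saxpy_vla_model` and `MAX_LANES` are illustrative names, and the function counts how many vector iterations a SAXPY-style update `z[i] = a * x[i] + y[i]` needs. The iteration count is only a rough proxy for work done, not a performance measurement, and the model accepts only multiples of 128 bits between 128 and 2048.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LANES (2048 / 32)  /* 32-bit lanes in the largest (2048-bit) vector */

/* Scalar model of a predicated, vector-length-agnostic loop.
   vl_bits plays the role of the hardware vector length; the loop body never
   changes. Returns the number of vector iterations executed. */
static size_t saxpy_vla_model(size_t vl_bits, float a, const float *x,
                              const float *y, float *z, size_t n) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    size_t lanes = vl_bits / 32;            /* analogous to svcntw() */
    size_t iterations = 0;
    for (size_t i = 0; i < n; i += lanes) {
        bool pred[MAX_LANES];
        for (size_t k = 0; k < lanes; k++)  /* analogous to svwhilelt_b32(i, n) */
            pred[k] = i + k < n;
        for (size_t k = 0; k < lanes; k++)  /* one predicated "vector" operation */
            if (pred[k])
                z[i + k] = a * x[i + k] + y[i + k];
        iterations++;
    }
    return iterations;
}
```

For `n = 1000` floats, the model needs 250 iterations at 128 bits, 63 at 512 bits and 16 at 2048 bits, with the same loop body in every case.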

Transactional Memory Extension (TME)

The Transactional Memory Extension brings Hardware Transactional Memory (HTM) support to the Arm architecture. Transactional Memory addresses the difficulty of writing highly concurrent, multi-threaded programs whose coarse-grain, thread-level parallelism scales well with the number of CPUs, by reducing the serialization caused by lock contention.

Although high performance can be achieved using lock-free programming techniques, such code can take many years to develop because it is very hard to reason about, test, and debug. Transactional Memory reduces the difficulty of developing such software, while allowing the performance of concurrent accesses to large, shared data structures in memory to scale easily on the new breed of processors that contain many parallel CPU cores.

One of the most promising uses of Transactional Memory is known as Transactional Lock Elision (TLE), which allows existing lock-protected regions of code to execute concurrently, each within a transaction. The multi-threaded program itself needs no modification, and execution falls back to the less efficient lock-taking path only if the hardware detects a conflict within the transaction.
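A minimal sketch of the Transactional Lock Elision pattern follows, assuming the ACLE TME intrinsics `__tstart`, `__tcommit`, `__tcancel` and `__ttest` and the `_TMFAILURE_RTRY` flag from `arm_acle.h`. The `tle_lock` type, the `tle_*` functions and the retry limit `TLE_MAX_RETRIES` are hypothetical. Inside the transaction, the elided path reads the lock word so that any thread taking the lock for real writes to it, causes a conflict, and aborts the transaction. When `__ARM_FEATURE_TME` is not defined, only the spin-lock fallback is compiled, so the transactional branch should be treated as an untested illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_TME)
#include <arm_acle.h>
#endif

#define TLE_MAX_RETRIES 3  /* hypothetical tuning parameter */

typedef struct { atomic_int held; } tle_lock;  /* 0 = free, 1 = held */

static void tle_init(tle_lock *l) { atomic_init(&l->held, 0); }

/* Ordinary test-and-set spin lock: the fallback lock-taking path. */
static void spin_acquire(tle_lock *l) {
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&l->held, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;
}

static void tle_acquire(tle_lock *l) {
#if defined(__ARM_FEATURE_TME)
    for (int attempt = 0; attempt < TLE_MAX_RETRIES; attempt++) {
        uint64_t status = __tstart();  /* 0 means the transaction started */
        if (status == 0) {
            /* Reading the lock word puts it in the transaction's read set. */
            if (atomic_load_explicit(&l->held, memory_order_relaxed) == 0)
                return;                /* run the critical section speculatively */
            __tcancel(0);              /* lock is really held: abort, no retry hint */
        }
        if (!(status & _TMFAILURE_RTRY))
            break;                     /* hardware suggests retrying will not help */
    }
#endif
    spin_acquire(l);
}

static void tle_release(tle_lock *l) {
#if defined(__ARM_FEATURE_TME)
    if (__ttest() != 0) {              /* non-zero nesting depth: lock was elided */
        __tcommit();
        return;
    }
#endif
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

Callers use `tle_acquire` and `tle_release` exactly like an ordinary lock, which is why the code inside the critical section does not need to change.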

Developing software for SVE2 and TME

Hand-in-hand with the development of these new architecture technologies, we have been preparing simulation models, software development tools, optimized libraries, and programming guides to enable early software exploration and porting. An early-access software development environment, including a compiler, a debugger, and models for virtual prototyping, is available now to lead architecture partners.

We will also soon begin contributing support for SVE2 and TME to key open source initiatives, such as the LLVM and GNU toolchains, so that the software ecosystem is ready when the first devices become available.
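When experimenting with toolchain support, it can help to check at compile time which extensions the compiler has been told to target. This sketch relies on the ACLE feature-test macros `__ARM_FEATURE_SVE`, `__ARM_FEATURE_SVE2` and `__ARM_FEATURE_TME`; the `FEAT_*` flags and `compiled_features` are hypothetical names. With GCC or Clang releases that support these extensions, a flag such as `-march=armv8-a+sve2+tme` would be expected to define the corresponding macros, but the exact spelling should be checked against your toolchain's documentation. The macros describe only the compilation target, not the CPU the program eventually runs on.

```c
#include <assert.h>

/* Hypothetical bit flags, one per extension of interest. */
enum { FEAT_SVE = 1u << 0, FEAT_SVE2 = 1u << 1, FEAT_TME = 1u << 2 };

/* Reports which extensions this translation unit was compiled for,
   based purely on the ACLE feature-test macros. */
static unsigned compiled_features(void) {
    unsigned f = 0;
#if defined(__ARM_FEATURE_SVE)
    f |= FEAT_SVE;
#endif
#if defined(__ARM_FEATURE_SVE2)
    f |= FEAT_SVE2;
#endif
#if defined(__ARM_FEATURE_TME)
    f |= FEAT_TME;
#endif
    /* SVE2 is a superset of SVE, so targeting SVE2 should also imply SVE. */
    assert(!(f & FEAT_SVE2) || (f & FEAT_SVE));
    return f;
}
```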

Additional resources are available

Arm is continually working on improvements to its architecture. These new architecture technologies, SVE2 and TME, have been in development for several years, along with the associated tools and models, and will provide improved, scalable performance across a range of future A-Profile Arm-based devices.

We gave a more detailed presentation of SVE2 and TME at Linaro Connect Bangkok in April 2019. A PDF of the presentation is available to download.

Download the XML
