What is new in LLVM 12 for Arm

April 28, 2021


LLVM 12 is now available with improved support for Arm. It provides a native Windows on Arm toolchain and adds support for new Arm cores, debugging support for SVE/2 applications, and performance improvements on a key CPU benchmark.

Windows-on-Arm native 64-bit Arm toolchain

The LLVM 12 release provides native 64-bit Arm LLVM binaries for Windows on Arm, thanks to Linaro's efforts (see this blog for more detail). LLVM now also has 64-bit Windows on Arm buildbots, hosted by Linaro. Further improvements, including better debug support, are planned for upcoming LLVM releases.

New processor support

LLVM 12 adds -mcpu support for the Arm Neoverse V1 and Neoverse N2 cores, Arm Cortex-A78C, Arm Cortex-R82 and Fujitsu A64FX. Given the relevant -mcpu flag, the compiler automatically enables the right architecture features and tunes code generation for the target processor.

The following table describes the flag for each target processor:

Processor	Flag
Arm Neoverse V1	-mcpu=neoverse-v1
Arm Neoverse N2	-mcpu=neoverse-n2
Arm Cortex-A78C	-mcpu=cortex-a78c
Arm Cortex-R82	-mcpu=cortex-r82
Fujitsu A64FX	-mcpu=a64fx
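One way to check which architecture features a given -mcpu value turns on is to compile a file that inspects the ACLE feature macros. The sketch below is hypothetical (struct arm_features and compile_time_features are illustrative names, not part of any library), and which macros each -mcpu value defines depends on the compiler version, so treat the result as something to verify rather than a specification.

```c
#include <assert.h>
#include <stdbool.h>

// Features the compiler enabled for this translation unit, as reported
// by standard ACLE predefined macros.
struct arm_features {
    bool neon;  // Advanced SIMD
    bool sve;   // Scalable Vector Extension
    bool sve2;  // SVE2
    bool lse;   // Large System Extensions (Armv8.1-A atomics)
};

struct arm_features compile_time_features(void) {
    struct arm_features f = {false, false, false, false};
#if defined(__ARM_NEON)
    f.neon = true;
#endif
#if defined(__ARM_FEATURE_SVE)
    f.sve = true;
#endif
#if defined(__ARM_FEATURE_SVE2)
    f.sve2 = true;
#endif
#if defined(__ARM_FEATURE_ATOMICS)
    f.lse = true;
#endif
    // SVE2 is a superset of SVE, so the two macros should agree.
    assert(!f.sve2 || f.sve);
    return f;
}
```

For example, building this with clang --target=aarch64-linux-gnu -mcpu=a64fx should report SVE, since the A64FX implements SVE. To list every predefined macro directly, clang --target=aarch64-linux-gnu -mcpu=a64fx -dM -E -x c /dev/null prints them without compiling any code.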

Out-of-line Atomics for LSE deployment

AArch64 Large System Extensions (LSE) were introduced in Armv8.1-A. These provide more efficient atomic instructions for large multi-core systems.

LLVM 12 adds support for a new flag, -moutline-atomics, which detects at runtime whether the processor supports LSE. The generated code then uses the LSE atomic instructions when they are available and falls back to Armv8.0-A load-exclusive/store-exclusive (LL/SC) loops on processors without LSE support. This option mirrors equivalent support in the GNU toolchain (GCC). We are working towards enabling this option by default in the upcoming LLVM 13 release.
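Conceptually, code built with -moutline-atomics calls a small runtime helper for each atomic operation, and that helper chooses between two strategies. The sketch below makes both strategies visible by hand. It is an illustration, not how the flag is used: with the flag you simply write atomic_fetch_add and the compiler emits the helper call. It assumes Linux on AArch64 for the runtime check, where getauxval(AT_HWCAP) reports LSE through the HWCAP_ATOMICS bit; on any other platform cpu_has_lse() returns false. All function names are hypothetical, and the compare-and-swap loop models the shape of an LL/SC loop rather than emitting LDXR/STXR directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP (and HWCAP_ATOMICS on glibc)
#endif

// Returns true if the kernel reports LSE atomics (Linux on AArch64 only).
bool cpu_has_lse(void) {
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_ATOMICS)
    return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    return false;  // unknown platform: assume no LSE
#endif
}

// Fallback strategy: load, compute, try to store, retry if another core
// changed the value in between (the shape of an Armv8.0-A LL/SC loop).
int add_with_retry_loop(atomic_int *counter, int delta) {
    int old = atomic_load_explicit(counter, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               counter, &old, old + delta,
               memory_order_acq_rel, memory_order_relaxed)) {
        // On failure 'old' has been refreshed with the current value.
    }
    return old;
}

// LSE strategy: one read-modify-write operation. When compiled with LSE
// enabled this can become a single LDADDAL instruction.
int add_single_op(atomic_int *counter, int delta) {
    return atomic_fetch_add_explicit(counter, delta, memory_order_acq_rel);
}

// Returns the value held before the addition, like atomic_fetch_add.
int atomic_add_dispatch(atomic_int *counter, int delta) {
    assert(counter != NULL);
    return cpu_has_lse() ? add_single_op(counter, delta)
                         : add_with_retry_loop(counter, delta);
}
```

A real implementation would cache the result of the hardware check, since the answer does not change while the program runs, rather than querying it on every operation.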

Better SVE/SVE2 intrinsics support

LLVM 11 was the first LLVM release to support the vector-length-agnostic SVE intrinsics defined by the Arm C Language Extensions (ACLE). LLVM 12 adds vector-length-specific ACLE support and improves the vector-length-agnostic support.
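As a minimal sketch of vector-length-agnostic ACLE code, the function below computes a[i] = b[i] + 1.0. It assumes a compiler that provides <arm_sve.h> and an SVE-enabled target (for example -march=armv8-a+sve); add_one_f64 is a hypothetical name, and on targets without SVE the #else branch compiles a plain scalar loop instead.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// a[i] = b[i] + 1.0 for every i in [0, n).
void add_one_f64(double *restrict a, const double *restrict b, int64_t n) {
    assert(n >= 0);
#if defined(__ARM_FEATURE_SVE)
    // svcntd() is the number of 64-bit lanes per SVE vector on the CPU
    // running the code, so the step is only known at run time.
    for (int64_t i = 0; i < n; i += (int64_t)svcntd()) {
        // Lane j is active only while i + j < n; this covers the tail.
        svbool_t pg = svwhilelt_b64_s64(i, n);
        svfloat64_t vb = svld1_f64(pg, b + i);
        svst1_f64(pg, a + i, svadd_n_f64_x(pg, vb, 1.0));
    }
#else
    // Scalar fallback for targets without SVE.
    for (int64_t i = 0; i < n; i++)
        a[i] = b[i] + 1.0;
#endif
}
```

The same binary runs unchanged on 128-bit and 2048-bit SVE hardware because nothing in the loop hard-codes the vector width. By contrast, vector-length-specific code is compiled for one fixed width, which clang selects with the -msve-vector-bits option.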

SVE code-generation infrastructure

Enabling SVE auto-vectorization in LLVM has been a long journey, and we have made significant progress in LLVM 12. For the first time, LLVM can vectorize certain loops using width-agnostic (scalable) SVE auto-vectorization. For example, consider the following loop, which is adapted from the TSVC_2 set of loops:


void s000(double * __restrict a, double * __restrict b) {
  unsigned LEN_1D = 1024;
#pragma clang loop vectorize_width(2, scalable) interleave(disable) unroll(disable)
  for (int i = 0; i < LEN_1D; i++) {
    a[i] = b[i] + 1;
  }
}

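The key to enabling SVE auto-vectorization here is #pragma clang loop vectorize_width(2, scalable). This asks LLVM to vectorize the loop with a scalable vectorization factor of 2, meaning 2 × vscale lanes, where vscale is the number of 128-bit blocks in the hardware's SVE vector length (2 doubles per vector on a 128-bit implementation, 8 on a 512-bit one). With this support, we can now work towards the eventual goal of enabling SVE auto-vectorization by default in LLVM.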
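To see what a scalable vectorization factor of 2 means in practice, the following plain C model (not real SVE code; s000_scalable_model and its vscale parameter are hypothetical) runs the s000 computation in chunks of 2 × vscale elements and reports how many vector iterations were needed. For 1024 elements, the same source gives 512 vector iterations on 128-bit SVE (vscale = 1) and 128 on 512-bit SVE (vscale = 4). The model handles leftover elements with a scalar loop; SVE code can instead cover the tail with a predicate.

```c
#include <assert.h>
#include <stddef.h>

// Scalar model of s000 vectorized with vectorize_width(2, scalable).
// vscale stands in for (hardware vector length / 128 bits).
// Returns the number of full vector iterations executed.
size_t s000_scalable_model(double *restrict a, const double *restrict b,
                           size_t n, size_t vscale) {
    // SVE vector lengths range from 128 to 2048 bits.
    assert(vscale >= 1 && vscale <= 16);
    const size_t vf = 2 * vscale;  // lanes per vector iteration
    size_t vector_iters = 0;
    size_t i = 0;
    // Vector body: each outer iteration models one vector operation.
    for (; i + vf <= n; i += vf, vector_iters++)
        for (size_t lane = 0; lane < vf; lane++)
            a[i + lane] = b[i + lane] + 1.0;
    // Scalar remainder for the elements that do not fill a whole vector.
    for (; i < n; i++)
        a[i] = b[i] + 1.0;
    return vector_iters;
}
```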

Further information on this functionality can be found in our recent Linaro Virtual Connect talk on SVE in LLVM.

SPEC CPU 2017 benchmark improvements

LLVM 12 adds new generic vectorization-related optimizations that improve the SPEC CPU 2017 integer benchmark 525.x264_r. On Arm Neoverse N1 hardware, we see a 25 percent uplift for this individual benchmark and a 2 percent improvement in the overall SPEC CPU 2017 INT score. LLVM now identifies the sum-of-absolute-differences (SAD) pattern and combines UADDV (add across vector) operations into vector adds. LLVM also supports epilogue vectorization, which vectorizes the remainder iterations left over after the main vector loop using a smaller vector width. The chart below shows the various benchmark improvements over LLVM 11.
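x264 uses SAD kernels heavily in motion estimation. The hypothetical sad_block function below is not x264's source; it only shows the scalar shape of the SAD pattern: widen 8-bit pixels, take the absolute difference, and accumulate across a block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Sum of absolute differences between two width x height blocks of
// 8-bit pixels; each stride is the distance between rows in bytes.
int sad_block(const uint8_t *pix1, ptrdiff_t stride1,
              const uint8_t *pix2, ptrdiff_t stride2,
              int width, int height) {
    assert(width >= 0 && height >= 0);
    int sum = 0;
    for (int y = 0; y < height; y++) {
        // Inner loop: the reduction a vectorizer can turn into
        // absolute-difference operations plus an across-vector add.
        for (int x = 0; x < width; x++)
            sum += abs(pix1[x] - pix2[x]);  // uint8_t promotes to int
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

Compiling a kernel like this with clang -O3 -mcpu=neoverse-n1 -S and reading the assembly is a simple way to see how a given LLVM version vectorizes it, for example whether it uses Neon instructions from the UABD family; the exact output varies between releases.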

Improvements to MVE

MVE (the M-profile Vector Extension, also known as Arm Helium) is an extension of the Armv8.1-M architecture. It is designed to give a significant performance uplift for machine learning and digital signal processing workloads on CPUs for embedded devices, such as Arm Cortex-M55.

LLVM 12 brings improvements to vectorization and code quality for MVE, leading to significant improvements in both performance and code size for a range of workloads. A particular highlight is that LLVM can now fully utilise MVE's tail predication when vectorizing reduction loops: the final, partial vector is processed by the vector loop itself with the unused lanes disabled, instead of by a separate scalar loop.
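The sketch below is a scalar model of a tail-predicated reduction, using an int16 dot product as a typical DSP example. It assumes 128-bit MVE vectors holding 8 lanes of 16 bits; it is plain C rather than MVE intrinsics, and dot_i16_tail_predicated_model is a hypothetical name. In real code you would write the ordinary loop acc += a[i] * b[i] and let the compiler apply tail predication.

```c
#include <assert.h>
#include <stdint.h>

// Scalar model of a tail-predicated MVE reduction (not real MVE code).
int32_t dot_i16_tail_predicated_model(const int16_t *a, const int16_t *b,
                                      int n) {
    enum { LANES = 8 };  // 128-bit vector / 16-bit elements
    assert(n >= 0);
    int32_t acc = 0;
    for (int i = 0; i < n; i += LANES) {
        // Active lanes this iteration: min(n - i, 8), similar to the
        // predicate that VCTP.16 creates.
        int active = (n - i < LANES) ? n - i : LANES;
        for (int lane = 0; lane < LANES; lane++)
            if (lane < active)  // disabled lanes neither load nor add
                acc += (int32_t)a[i + lane] * b[i + lane];
    }
    return acc;
}
```

Because the last iteration simply runs with fewer active lanes, there is no scalar remainder loop, which is one reason tail predication helps code size on small embedded cores.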

SVE/2 support in LLDB

LLDB now has full support for debugging SVE/2 applications, including dynamically updating the size of the SVE registers when the vector length changes. Work is in progress on LLDB support for the Pointer Authentication (PAuth), Memory Tagging Extension (MTE) and Top Byte Ignore (TBI) architecture features, which is expected to be part of the LLVM 13 release.

Looking forward

LLVM 12 brings major improvements for Arm-based hardware, including support for newer CPUs, native Windows on Arm binaries and more. Arm and our partners are working on more features for the forthcoming LLVM 13 release (expected in September/October 2021), which will bring exciting new features and performance improvements.

Read about the native LLVM toolchain

Learn about Windows on Arm
