LLVM 16 was announced on March 17, 2023. As usual, Arm added support for new architectures and CPUs, along with significant performance improvements. This time around, we also brought exciting new functionality such as function multi-versioning and full support for strict floating-point, and several existing features have been improved. llvm-objdump is now a better substitute for GNU objdump. We fixed support for the older Armv4 architecture, and improvements to the Fortran front-end mean that we can now build SPEC2017.
Many thanks to all the people who contributed content to this blog post. Most notably:
If you want to know more about the previous release, you can read the blog about what is new in LLVM 15.
LLVM now supports the Armv8.9-A and Armv9.4-A extensions. You can learn more about the new extensions in the announcement blog.
In addition to the standard support for this year's architecture updates, we completed assembly support for the Scalable Matrix Extension (SME and SME2). On the CPU side, this release extends the line-up of Armv9-A cores with support for our Cortex-A715 and Cortex-X3 CPUs.
Assembly and disassembly are now available for all extensions except Guarded Call Stacks (GCS), which will be supported in the next LLVM release. The Arm C Language Extensions (ACLE) have also been extended with two new intrinsics, __rsr128 and __wsr128. These intrinsics make the new 128-bit System registers easier to access and are now supported in LLVM.
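As a minimal sketch of how this looks in source code (assuming the ACLE spellings __arm_rsr128 and __arm_wsr128 from <arm_acle.h>; the register name below is purely illustrative):

#include <arm_acle.h>

// Read and write a 128-bit System register through the ACLE intrinsics.
// "ttbr0_el1" is only an illustrative register name here.
__uint128_t read_example(void) {
    return __arm_rsr128("ttbr0_el1");
}

void write_example(__uint128_t value) {
    __arm_wsr128("ttbr0_el1", value);
}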
The Translation Hardening Extension (THE) is one of the main security improvements coming with Armv9.4-A and it is part of the Virtual Memory System Architecture (VMSA). Its purpose is to prevent arbitrary changes to the virtual memory's translation tables in situations where an attacker has gained kernel privileges. The new Read-Check-Write (RCW) instructions have been added to the architecture to allow controlled modification of such tables while disabling ordinary writes.
Even though these are intended for kernel rather than user-space developers, the RCW instructions map nicely to various atomic operations on 128-bit datatypes in C++. More specifically, fetch_and, fetch_or, and exchange can be implemented directly with these instructions. This functionality is useful for anyone using atomics, so we added code generation support in LLVM 16. On targets where the LRCPC3 and LSE128 extensions are also available, these specialized instructions are generated directly from C++ code without the need for assembly or intrinsics. The following code is an example for std::atomic::fetch_and:
#include <atomic>

std::atomic<__uint128_t> global;
void sink(__uint128_t);

void ldclrpal_example(__uint128_t x) {
    __uint128_t res = global.fetch_and(x);
    sink(res);
}

void ldclrp_example(__uint128_t x) {
    __uint128_t res = global.fetch_and(x, std::memory_order_relaxed);
    sink(res);
}
Compiling with -march=armv9.4a+lse128+rcpc3 -O3, the resulting assembly shows the new instructions being generated:
ldclrpal_example(unsigned __int128):
        mvn     x1, x1
        mvn     x0, x0
        adrp    x8, global
        add     x8, x8, :lo12:global
        ldclrpal x0, x1, [x8]
        b       sink(unsigned __int128)
ldclrp_example(unsigned __int128):
        mvn     x1, x1
        mvn     x0, x0
        adrp    x8, global
        add     x8, x8, :lo12:global
        ldclrp  x0, x1, [x8]
        b       sink(unsigned __int128)
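The other 128-bit read-modify-write operations can be exercised in the same way. The following sketch is ours rather than verified compiler output, but with the same options we would expect fetch_or and exchange to lower to the corresponding LSE128 instructions instead of a compare-and-swap loop:

#include <atomic>

std::atomic<__uint128_t> flags;

// Expected to use the 128-bit atomic OR instruction.
__uint128_t set_bits(__uint128_t mask) {
    return flags.fetch_or(mask);
}

// Expected to use the 128-bit atomic swap instruction.
__uint128_t swap_all(__uint128_t value) {
    return flags.exchange(value);
}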
Nowadays, many platforms have a single-binary deployment model: each application is distributed through exactly one binary. This model makes it hard for developers to target multiple architectural features. To solve this problem, LLVM 16 provides a convenient way to target specific architectural features without the need to deal with feature detection and other details. This new feature is called function multi-versioning. A new macro __HAVE_FUNCTION_MULTI_VERSIONING is provided to detect the availability of the feature. If present, we can ask the compiler to generate multiple versions of the given function by marking it with __attribute__((target_clones(...))). The most appropriate version of the function is called at runtime.
In the example below, a function has been marked to be built for both Advanced SIMD (Neon) and SVE. The SVE version is used if SVE is available on the target.
#ifdef __HAVE_FUNCTION_MULTI_VERSIONING
__attribute__((target_clones("sve", "simd")))
#endif
float foo(float *a, float *b) {
    // ...
}
In some cases, developers want to provide different code for each feature. This is also possible by using __attribute__((target_version())). In the following example, we provide two versions of the same function. Again, the SVE version will be called if SVE is available. The __HAVE_FUNCTION_MULTI_VERSIONING macro allows writing code that is compatible with compilers both with and without function multi-versioning.
#include <stdio.h>

#ifdef __HAVE_FUNCTION_MULTI_VERSIONING
__attribute__((target_version("sve")))
static void foo(void) { printf("FMV uses SVE\n"); }
#endif

// this attribute is optional
// __attribute__((target_version("default")))
static void foo(void) { printf("FMV default\n"); }
This feature depends on compiler-rt (-rtlib=compiler-rt) and is enabled by default, but it can be disabled with the -mno-fmv flag. Be aware that function multi-versioning is still in beta. Feedback is very welcome on the ACLE specification, either by opening a new issue or by creating a pull request.
LLVM 16 includes support for the autovectorization of common operations on complex numbers. These use instructions available in the Advanced SIMD (Neon) and MVE instruction sets for the Armv8-A and Armv8-M architectures, respectively. For example, the code:
#include <complex.h>

#define N 512

void fma(_Complex float a[restrict N],
         _Complex float b[restrict N],
         _Complex float c[restrict N]) {
    for (int i = 0; i < N; i++)
        c[i] = a[i] * b[i];
}
results in the following assembly:
fma:                                    // @fma
        mov     x8, xzr
.LBB0_1:                                // =>This Inner Loop Header: Depth=1
        add     x9, x0, x8
        add     x10, x1, x8
        movi    v2.2d, #0000000000000000
        movi    v3.2d, #0000000000000000
        ldp     q1, q0, [x9]
        add     x9, x2, x8
        add     x8, x8, #32
        cmp     x8, #1, lsl #12         // =4096
        ldp     q5, q4, [x10]
        fcmla   v3.4s, v1.4s, v5.4s, #0
        fcmla   v2.4s, v0.4s, v4.4s, #0
        fcmla   v3.4s, v1.4s, v5.4s, #90
        fcmla   v2.4s, v0.4s, v4.4s, #90
        stp     q3, q2, [x9]
        b.ne    .LBB0_1
        ret
Note the use of the FCMLA instruction, which performs a fused-multiply-add vector operation with an optional complex rotation on vectors of complex numbers.
Specialization of functions has been enabled by default at all optimization levels when optimizing for speed. The optimization heuristics and compile-time cost of the pass have been improved, and it is now deemed generally beneficial enough to be enabled by default. This optimization particularly improves the 505.mcf_r benchmark in SPEC2017 intrate by about 10% on various AArch64 platforms, and it contributes to an estimated 3% geomean improvement of the SPEC2017 intrate C and C++ benchmarks on AArch64. Note that the SPEC2017 performance uplift is also aided by the tuning and default enablement of the SelectOpt pass and other advanced pattern recognition.
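As a rough illustration of what the pass does (our own example, not code from the benchmark), a function that is only ever called with particular constant arguments, such as function pointers, can be cloned and each clone optimized for its constant:

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

// `combine` is only ever called with `add` or `mul`. Function specialization
// can create one copy of `combine` per constant callee, turning the indirect
// call into a direct (and inlinable) one in each copy.
static int combine(int (*op)(int, int), const int *v, int n) {
    int acc = v[0];
    for (int i = 1; i < n; i++)
        acc = op(acc, v[i]);
    return acc;
}

int sum(const int *v, int n)     { return combine(add, v, n); }
int product(const int *v, int n) { return combine(mul, v, n); }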
Autovectorization with SVE has been a very active area of development. For example, up until now, vectorization of pointers accessed in different branches of a conditional was very basic. Most of the time, the cost would be computed as too high. Now, the cost model of the vectorizer includes basic arithmetic on the pointer. This means the following code is now vectorized when it is profitable to do so:
void foo(float *dst, float *src, int *cond, long disp) {
    for (long i = 0; i < 1024; i++) {
        if (cond[i] != 0) {
            dst[i] = src[i];
        } else {
            dst[i] = src[i + disp];
        }
    }
}
That said, hitting the right circumstances to make vectorization profitable is tricky on a synthetic example, and the generated code is very long. If you want to see what the vectorized code looks like, you can tweak the cost model by compiling the previous example with -march=armv9-a -O3 -Rpass=loop-vectorize -mllvm -force-target-instruction-cost=1.
Vectorization of tail-folded loops has also been improved by reducing the need for explicit merging operations. For example, the following code:
float foo(float *a, float *b) {
    float sum = 0.0;
    for (int i = 0; i < 1024; ++i)
        sum += a[i] * b[i];
    return sum;
}
compiled with -march=armv9-a -Ofast -mllvm -sve-tail-folding=all shows that a predicated FMLA is now emitted:
.LLVM_15_LOOP:
        ld1w    { z2.s }, p1/z, [x0, x8, lsl #2]
        ld1w    { z3.s }, p1/z, [x1, x8, lsl #2]
        add     x8, x8, x10
        fmul    z2.s, z3.s, z2.s
        sel     z2.s, p1, z2.s, z0.s
        whilelo p1.s, x8, x9
        fadd    z1.s, z1.s, z2.s
        b.mi    .LLVM_15_LOOP

.LLVM_16_LOOP:
        ld1w    { z1.s }, p1/z, [x0, x8, lsl #2]
        ld1w    { z2.s }, p1/z, [x1, x8, lsl #2]
        add     x8, x8, x10
        fmla    z0.s, p1/m, z2.s, z1.s
        whilelo p1.s, x8, x9
        b.mi    .LLVM_16_LOOP
Also, vectorization of loops with reverse iteration counts is improved by reducing the need for explicit reverse operations. Take this loop as an example:
void foo(int *a, int *b, int *c) {
    for (int i = 1024; i >= 0; --i) {
        if (c[i] > 10)
            a[i] = b[i] + 5;
    }
}
Compiled with -march=armv9-a -O3, the LLVM 16 output no longer reverses the loaded data nor the predicate used for the conditional:
.LLVM_15_LOOP:
        ld1w    { z0.s }, p0/z, [x16, x9, lsl #2]
        ld1w    { z1.s }, p0/z, [x17, x9, lsl #2]
        rev     z0.s, z0.s
        rev     z1.s, z1.s
        cmpgt   p1.s, p0/z, z0.s, #10
        cmpgt   p2.s, p0/z, z1.s, #10
        rev     p1.s, p1.s
        rev     p2.s, p2.s
        ld1w    { z0.s }, p1/z, [x14, x9, lsl #2]
        ld1w    { z1.s }, p2/z, [x15, x9, lsl #2]
        add     z0.s, z0.s, #5          // =0x5
        add     z1.s, z1.s, #5          // =0x5
        st1w    { z0.s }, p1, [x12, x9, lsl #2]
        st1w    { z1.s }, p2, [x13, x9, lsl #2]
        sub     x9, x9, x10
        cmp     x18, x9
        b.ne    .LLVM_15_LOOP

.LLVM_16_LOOP:
        ld1w    { z0.s }, p0/z, [x13, x9, lsl #2]
        ld1w    { z1.s }, p0/z, [x14, x9, lsl #2]
        cmpgt   p1.s, p0/z, z0.s, #10
        cmpgt   p2.s, p0/z, z1.s, #10
        ld1w    { z0.s }, p1/z, [x15, x9, lsl #2]
        ld1w    { z1.s }, p2/z, [x16, x9, lsl #2]
        add     z0.s, z0.s, #5          // =0x5
        add     z1.s, z1.s, #5          // =0x5
        st1w    { z0.s }, p1, [x17, x9, lsl #2]
        st1w    { z1.s }, p2, [x18, x9, lsl #2]
        sub     x9, x9, x10
        cmp     x12, x9
        b.ne    .LLVM_16_LOOP
Other performance improvements to SVE on LLVM 16 include:
Last December, we met the milestone of all SPEC2017 Fortran rate benchmarks working at O3 with LLVM and Flang. The main focus has been on enabling the four benchmarks (521.wrf_r, 527.cam4_r, 549.fotonik3d_r, 554.roms_r) that were failing. One of the main improvements was removing the dependency on external complex math libraries by using the complex dialect.
Also, some performance has been gained by improving information sharing between the front-end and LLVM, and by improving support for fast math.
You can build Flang by passing -DLLVM_ENABLE_PROJECTS="flang;clang;mlir" to CMake. The Flang executable is called flang-new; make sure to pass the -flang-experimental-exec option to generate executables.
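As a rough sketch of what this looks like in practice (the paths, generator choice, and test source name are illustrative and will depend on your setup):

# Configure and build from an llvm-project checkout.
cmake -G Ninja -S llvm -B build \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_ENABLE_PROJECTS="flang;clang;mlir"
ninja -C build flang-new

# Compile a Fortran program into an executable.
build/bin/flang-new -flang-experimental-exec -O3 hello.f90 -o hello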
Initially sparked by the Highway library, the target("<string>") attributes have seen some improvements in the latest Clang, aimed at bringing them in line with GCC's implementation.
The supported formats are now:
- arch=<arch>, which behaves like -march=arch+feature on the command line
- cpu=<cpu>, which behaves like -mcpu=cpu+feature
- tune=<cpu>, which behaves like -mtune
- +<feature>, +no<feature>, <feature>, and no-<feature>, which enable or disable individual features
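For illustration (a sketch of ours; the architecture, CPU, and feature names below are placeholders rather than recommendations), these formats can be used as follows:

// arch=<arch>: like -march on the command line
__attribute__((target("arch=armv8.2-a+sve"))) void f_arch(void) {}

// cpu=<cpu>: like -mcpu
__attribute__((target("cpu=cortex-a715"))) void f_cpu(void) {}

// tune=<cpu>: like -mtune
__attribute__((target("tune=cortex-a710"))) void f_tune(void) {}

// individual features, enabled or disabled
__attribute__((target("sve2"))) void f_feature(void) {}
__attribute__((target("no-sve2"))) void f_no_feature(void) {}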
Along with the changes above, the implementation of ACLE intrinsics has been modified so that they are no longer based on preprocessor macros. Instead, they are enabled based on the current target. This allows making intrinsics available in individual functions without requiring the entire file to be compiled for the same target. The following example illustrates the use of the attributes on a function sve2_log:
#include <math.h>
#include <arm_sve.h>

void base_log(float *src, int *dst, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = log2f(src[i]);
}

void __attribute__((target("sve2")))
sve2_log(float *src, int *dst, int n) {
    int i = 0;
    svbool_t p = svwhilelt_b32(i, n);
    while (svptest_any(svptrue_b32(), p)) {
        svfloat32_t d = svld1_f32(p, src + i);
        svint32_t l = svlogb_f32_z(p, d);
        svst1_s32(p, dst + i, l);
        i += svcntw();   // advance by the number of 32-bit lanes
        p = svwhilelt_b32(i, n);
    }
}
In LLVM 16, the output of llvm-objdump for Arm targets has been improved for readability and correctness, making it a more suitable replacement for GNU objdump in LLVM-based toolchains.
Disassembly of big-endian object files now works correctly. Previously, each instruction word was accidentally byte-swapped and disassembled as something entirely different.
Also, unrecognized instructions encountered during disassembly are handled in a more useful manner. Previously, the disassembler would advance by just one byte and try again from an odd-numbered address. This policy makes sense on architectures with variable-length instructions, but never on Arm. The new behavior is to advance by a whole instruction, so that the rest of the file is likely to be disassembled correctly.
LLVM 16 includes other quality improvements on Arm architectures, including bug fixes around Thumb vs. Arm disassembly and .byte directives now emitting the correct byte. Instruction encodings are also printed more readably, making Arm and 32-bit Thumb easier to tell apart: Arm instructions are shown as a single 8-digit number, while 32-bit Thumb instructions are shown as two 4-digit numbers separated by a space.
Strict floating-point semantics have been implemented for AArch64. The clang command-line option -ffp-model=strict is now accepted on AArch64 targets instead of being ignored with a warning. Take this example where an FP division is executed only if it is safe to do so:
float fn(int n, float x, float y) {
    if (n == 0) {
        x += 1;
    } else {
        x += y / n;
    }
    return x;
}
On LLVM 15, compiling with -O2 resulted in the following generated code:
fn(int, float, float):                  // @fn(int, float, float)
        scvtf   s3, w0
        fmov    s2, #1.00000000
        cmp     w0, #0
        fdiv    s1, s1, s3
        fadd    s1, s1, s0
        fadd    s0, s0, s2
        fcsel   s0, s1, s0, ne
        ret
which executes both branches, including the divide, and selects the right result afterwards with the fcsel. Although the functionality of the code is preserved, this results in a spurious FE_DIVBYZERO floating-point exception when n==0. On LLVM 16, compiling with -O2 -ffp-model=strict results in the following code:
fn(int, float, float):                  // @fn(int, float, float)
        cbz     w0, .LBB0_2
        scvtf   s2, w0
        fdiv    s1, s1, s2
        fadd    s0, s0, s1
        ret
.LBB0_2:
        mov     w8, #1
        scvtf   s1, w8
        fadd    s0, s0, s1
        ret
where the two different branches of execution are kept separate, preventing the FP exception from happening.
As a result of supporting strict FP, the options -ftrapping-math and -frounding-math are now also accepted. -ftrapping-math ensures that the code does not introduce or remove side effects that could be caused by any kind of FP exception, including exceptions that software can detect asynchronously by inspecting the FPSR. Similarly, -frounding-math avoids applying optimizations that assume a specific FP rounding behavior.
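As an illustrative sketch of why this matters (our own example, not taken from LLVM's tests), consider code that changes the rounding mode at run time; with -frounding-math the compiler must not merge or reorder the additions across the fesetround calls:

#include <fenv.h>

#pragma STDC FENV_ACCESS ON

// The two additions observe different rounding modes, so they must not be
// combined into a single addition or moved past the fesetround calls.
double rounding_gap(double x) {
    fesetround(FE_DOWNWARD);
    double lo = x + 0.1;
    fesetround(FE_UPWARD);
    double hi = x + 0.1;
    fesetround(FE_TONEAREST);
    return hi - lo;
}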
LLD can now be used as a linker for Armv4 and Armv4T: it emits thunks compatible with these architectures instead of BX instructions, which are not available on Armv4, or BLX instructions, which are not available on either.
On a related note, support for compiler-rt built-ins was added for Armv4T, Armv5TE, and Armv6, unlocking runtime support for these architectures.
Thanks to this enabling work, it is now possible to have a full LLVM-based toolchain for these 32-bit Arm architectures. As a result, the Linux kernel has added support for building with Clang and LLD for these architectures, and Rust programs no longer need to depend on the GNU linker.