Arm Compiler for Linux: what is new in the 22.0 release?

May 27, 2022


Arm Compiler for Linux 22.0 is now available with improved compilers and libraries. Arm Compiler for Linux (ACfL) is a combination of Arm C/C++ Compiler (armclang), Arm Fortran Compiler (armflang), and Arm Performance Libraries (ArmPL). In this blog, we explore what is new in this release.

Arm Compilers now based on LLVM 13

Arm Compilers are now based on LLVM 13, and this has resulted in performance improvements.

SPEC 2017 improvement with 22.0 Compilers

Many sub-benchmarks of SPEC CPU 2017 improve, with an overall geomean improvement of 2.2% over the previous release, 21.1. The benchmark was run on an AWS c6g.metal instance (Arm Neoverse-N1 cores).

Better tuned for Neoverse-V1 (core in AWS Graviton 3)

Arm Compilers in 22.0 feature a tuned cost model for Neoverse-V1 and many improvements to SVE code generation. These include (1) better use of the SVE gather/scatter feature, (2) aligning loops with padding to make better use of the instruction cache, and (3) better use of the SVE splice operation when inserting one element of a vector into another.
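To make the gather/scatter point concrete, here is a minimal sketch of the kind of indexed loop that SVE gather and scatter instructions target. The function names are hypothetical and not taken from the release. Whether the compiler actually emits gather or scatter instructions depends on the compiler version, the optimization flags (for example, `-O3 -mcpu=neoverse-v1`), and the cost model.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: y[i] = x[idx[i]]. The indexed load x[idx[i]] is the access
   pattern that an SVE gather load can vectorize. */
void gather_f64(size_t n, const double *restrict x,
                const int *restrict idx, double *restrict y)
{
    assert(n == 0 || (x && idx && y));
    for (size_t i = 0; i < n; ++i)
        y[i] = x[idx[i]];
}

/* Scatter: y[idx[i]] = x[i]. This sketch assumes the indices are
   distinct, so no two iterations write the same element. */
void scatter_f64(size_t n, const double *restrict x,
                 const int *restrict idx, double *restrict y)
{
    assert(n == 0 || (x && idx && y));
    for (size_t i = 0; i < n; ++i)
        y[idx[i]] = x[i];
}
```

Inspecting the generated assembly for loops like these is one way to check whether a given compiler version and flag set uses SVE gather/scatter on your target.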

Performance using SVE vs NEON

The cumulative effect of these optimizations can be seen in the previous graph. Here we compare SVE code tuned for Neoverse-V1 against NEON code tuned for Neoverse-V1. Our benchmark is a set of representative micro-benchmarks used when developing the SVE architecture extension. You can see that the compilers in 22.0 (orange bar) outperform version 21.1 (blue bar). With these improvements, the 22.0 release is ready for the development of HPC applications on AWS Graviton 3.

GCC 11 update

The package now ships the GCC 11 series of compilers, with many performance improvements.

Single ArmPL with runtime detection of CPU

Arm Performance Libraries are no longer packaged as separate libraries for SVE and non-SVE cores. We now provide a single library, which contains optimized versions for all supported cores, including SVE. At run time, the library detects the type of core and chooses the best-suited routines and configuration. As a user, you automatically benefit from the fastest tunings within the library, without the need to re-link to a core-specific library.
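The general idea of runtime dispatch can be sketched as follows. This is an illustration only, not how ArmPL is implemented: the names `cpu_has_sve`, `dot_generic`, `dot_sve_tuned`, and `dot` are hypothetical, and the "SVE" variant is a placeholder with the same arithmetic as the generic one. The sketch assumes Linux on AArch64 for the capability check and falls back to the generic path elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)   /* AArch64 Linux hwcap bit for SVE */
#endif
/* Ask the kernel whether the running core advertises SVE. */
static int cpu_has_sve(void)
{
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
}
#else
/* Other platforms: always take the generic path. */
static int cpu_has_sve(void) { return 0; }
#endif

typedef double (*dot_fn)(size_t, const double *, const double *);

/* Portable fallback kernel. */
static double dot_generic(size_t n, const double *x, const double *y)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

/* Placeholder for a core-specific kernel. A real library would put
   SVE-tuned code here; this sketch reuses the same scalar loop. */
static double dot_sve_tuned(size_t n, const double *x, const double *y)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

static dot_fn resolved;  /* chosen on first call; not thread-safe in this sketch */

/* Single public entry point: callers never pick a variant themselves. */
double dot(size_t n, const double *x, const double *y)
{
    assert(n == 0 || (x && y));
    if (!resolved)
        resolved = cpu_has_sve() ? dot_sve_tuned : dot_generic;
    return resolved(n, x, y);
}
```

The caller links against one entry point, and the variant is selected once on the machine where the program actually runs. This is the property that removes the need to re-link against a core-specific library.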

Faster BLAS, LAPACK, and FFT

ArmPL 22.0 comes with further improvements in BLAS and LAPACK routines.

API	Improvements
BLAS Level 1	SVE optimizations for ?COPY, ?SCAL, ?AXPY
BLAS Level 2	Packed and banded functionality; ?TRMV and ?TRSV for large problems
BLAS Level 3	?TRMM and ?TRSM for large problems
LAPACK	?EEVD (eigenvalue decomposition) for small problems; ?POTRF for multithreaded cases

BLAS and LAPACK improvements in 22.0 ArmPL

The previous graph shows improvements in 22.0 over 21.0 (released in early 2021). The data comes from over 5000 individual benchmark cases, covering a wide set of BLAS routines and a selection of important LAPACK routines, at small O(10), medium O(100), and large O(1000) problem sizes, in both serial (1 thread) and parallel (8 threads) execution.
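For readers less familiar with the BLAS Level 1 names in the table above, the following sketch shows what the double-precision variants of ?COPY, ?SCAL, and ?AXPY compute. These are plain reference loops with hypothetical `ref_` names, assuming unit strides. The real BLAS interfaces also take stride arguments, and ArmPL's optimized implementations are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* COPY: y <- x */
void ref_dcopy(size_t n, const double *x, double *y)
{
    assert(n == 0 || (x && y));
    for (size_t i = 0; i < n; ++i)
        y[i] = x[i];
}

/* SCAL: x <- alpha * x */
void ref_dscal(size_t n, double alpha, double *x)
{
    assert(n == 0 || x);
    for (size_t i = 0; i < n; ++i)
        x[i] *= alpha;
}

/* AXPY: y <- alpha * x + y */
void ref_daxpy(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x && y));
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

All three are simple streaming loops over contiguous data, which makes them natural candidates for vector-length-agnostic SVE code.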

Improvements in math functions

In 22.0, we have improved the performance of many math functions, including scalar functions (atan, atan2, atan2f, cos, exp, sin and erf) and vector functions (atanf, atan2f, cosf, erfcf, expf, logf, pow, sinf and tanf). The following graph shows the impact when the Elefunt benchmark is run on an AWS Graviton 2 (Neoverse N1) system.

Math routines improvement in 22.0 ArmPL

Module name changes

The package provides module files that make it easy to load the required compiler or libraries. With the 22.0 release, use the following module commands.

Environment	module load command
Arm C/C++/Fortran Compilers	module load acfl/22.0
Arm Performance Libraries	module load armpl/22.0
GNU compilers	module load gnu/11.2.0

Conclusion

Arm Compiler for Linux 22.0 brings many improvements and changes over the previous 21.x series. We continue to make further improvements and plan to provide the next release, 22.1, in September or October 2022.

Download the latest package now

Jackzhu over 2 years ago

Will using a different compiler influence the cache coherence of multiple cores?
Rama Malladi over 3 years ago

What are the C++ and Fortran compiler options used for getting performance on N1 and V1 architectures? Thanks
