Arm Compiler for Linux: what is new in the 22.0 release?

May 27, 2022

Arm Compiler for Linux 22.0 is now available with improved compilers and libraries. Arm Compiler for Linux (ACfL) combines the Arm C/C++ Compiler (armclang), the Arm Fortran Compiler (armflang), and Arm Performance Libraries (ArmPL). In this blog post, we explore what is new in this release.

Arm Compilers now based on LLVM 13

Arm Compilers are now based on LLVM 13, which brings performance improvements.

SPEC 2017 improvement with 22.0 Compilers

Many SPEC CPU 2017 sub-benchmarks improve, and the overall geomean score is 2.2% higher than with the previous release, 21.1. The benchmark was run on an AWS c6g.metal instance (Arm Neoverse-N1 cores).
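A SPEC CPU 2017 overall score is the geometric mean of the per-benchmark ratios r_i over the n sub-benchmarks. Assuming the 2.2% figure is the ratio of the two overall scores, it is equivalent to the geometric mean of the per-benchmark speedups of 22.0 over 21.1, which is why some sub-benchmarks can regress while the overall score still rises:

```latex
\frac{G^{(22.0)}}{G^{(21.1)}}
  = \frac{\left(\prod_{i=1}^{n} r_i^{(22.0)}\right)^{1/n}}
         {\left(\prod_{i=1}^{n} r_i^{(21.1)}\right)^{1/n}}
  = \left(\prod_{i=1}^{n} \frac{r_i^{(22.0)}}{r_i^{(21.1)}}\right)^{1/n}
  \approx 1.022
```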

Better tuned for Neoverse-V1 (core in AWS Graviton 3)

Arm Compilers in 22.0 feature a cost model tuned for Neoverse-V1 and many SVE code-generation improvements. These include: (1) optimal use of the SVE gather/scatter feature, (2) aligning loops with padding to make better use of the instruction cache, and (3) optimal use of the SVE splice operation when inserting one element of a vector into another.
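Gather and scatter matter for indexed (indirect) memory accesses. The following is a minimal sketch of the two loop shapes involved; the function names are hypothetical and not part of any Arm API. Whether a given compiler actually vectorizes these loops with SVE gather loads or scatter stores depends on its cost model. Compiling with something like `armclang -O3 -mcpu=neoverse-v1 -Rpass=loop-vectorize -c gather.c` should report which loops were vectorized, assuming armclang accepts Clang's usual optimization-remark flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Gather: dst[i] = src[idx[i]]. Each load address comes from an index
   array, so vectorizing this loop needs gather loads. */
void gather_f32(float *restrict dst, const float *restrict src,
                const int32_t *restrict idx, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i]];
}

/* Scatter: dst[idx[i]] = src[i], the store counterpart. If idx holds
   duplicates, the value written by the highest i is the one that remains. */
void scatter_f32(float *restrict dst, const float *restrict src,
                 const int32_t *restrict idx, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];
}
```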

Performance using SVE vs NEON

The cumulative effect of these optimizations can be seen in the previous graph, which compares SVE code tuned for Neoverse-V1 against Neon code tuned for Neoverse-V1. The benchmark is a set of representative micro-benchmarks used when developing the SVE architecture extension. The compilers in 22.0 (orange bar) outperform version 21.1 (blue bar). With these improvements, the 22.0 release is ready for developing HPC applications on AWS Graviton 3.
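One practical difference between the two targets is that Neon uses fixed 128-bit vectors, while SVE code is vector-length agnostic. The sketch below is not one of the micro-benchmarks from the graph; it is a hypothetical array sum (`sum_f32` is our name) written with the ACLE SVE intrinsics from `arm_sve.h`. When built with SVE enabled (for example `-march=armv8-a+sve`), each iteration handles `svcntw()` floats, which is 8 on the 256-bit SVE implementation of Neoverse-V1 and 4 on a 128-bit one, with a predicate covering the loop tail. Without SVE it falls back to a scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Sum n floats. The SVE path works for any hardware vector length. */
float sum_f32(const float *x, int64_t n) {
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t acc = svdup_n_f32(0.0f);
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        svbool_t pg = svwhilelt_b32_s64(i, n);  /* lanes for i .. n-1 only */
        svfloat32_t v = svld1_f32(pg, &x[i]);   /* inactive lanes are not loaded */
        acc = svadd_f32_m(pg, acc, v);          /* inactive lanes keep acc */
    }
    return svaddv_f32(svptrue_b32(), acc);      /* horizontal reduction */
#else
    float s = 0.0f;                             /* scalar fallback */
    for (int64_t i = 0; i < n; ++i)
        s += x[i];
    return s;
#endif
}
```

The SVE path adds the elements in a different order from the scalar loop, so for general floating-point inputs the two results can differ in the last bits.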

GCC 11 update

The package now ships the GCC 11 series of compilers, which bring many performance improvements.

Single ArmPL with runtime detection of CPU

Arm Performance Libraries are no longer packaged as separate libraries for SVE and non-SVE cores. We now provide a single library that contains optimized versions for all supported cores, including SVE cores. At run time, the library detects the type of core and chooses the best-performing routines and configuration. As a user, you automatically benefit from the fastest tunings in the library, without re-linking against a core-specific library.
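The post does not describe how ArmPL implements this detection internally. As a minimal sketch of the general technique (runtime CPU feature detection plus dispatch through a function pointer), assuming Linux on AArch64, where the kernel reports SVE support through the `HWCAP_SVE` bit of `getauxval(AT_HWCAP)`: all function names here are hypothetical, and the "SVE variant" is a placeholder that runs the same scalar loop so the sketch stays portable.

```c
#include <stddef.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>            /* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)    /* Linux arm64 hwcap bit for SVE */
#endif
#endif

typedef double (*sum_fn)(const double *x, size_t n);

/* Baseline kernel that runs on any core. */
static double sum_baseline(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Hypothetical placeholder for an SVE-tuned kernel. A real library would
   build this variant with SVE enabled; here it reuses the scalar loop. */
static double sum_sve_variant(const double *x, size_t n) {
    return sum_baseline(x, n);
}

/* Ask the kernel whether this core supports SVE (Linux on AArch64 only). */
static int cpu_has_sve(void) {
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;
#endif
}

/* Choose a kernel once at run time; callers then use the returned pointer. */
sum_fn select_sum(void) {
    return cpu_has_sve() ? sum_sve_variant : sum_baseline;
}
```

Because the choice is made when the program runs, the same binary picks the SVE path on a Graviton 3 and the baseline path on a Graviton 2, which mirrors the benefit described above of not re-linking per core.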

Faster BLAS, LAPACK, and FFT

ArmPL 22.0 comes with further improvements in BLAS and LAPACK routines.

API	Improvements
BLAS Level 1	SVE optimizations for ?COPY, ?SCAL, ?AXPY
BLAS Level 2	Packed and banded functionality; ?TRMV and ?TRSV for large problems
BLAS Level 3	?TRMM and ?TRSM for large problems
LAPACK	?EEVD (eigenvalue decomposition) for small problems; ?POTRF for multithreaded cases
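In BLAS naming, the `?` stands for the precision prefix: S (single), D (double), C (single complex), or Z (double complex). To make the Level 1 entries concrete, here is a minimal reference sketch of what DCOPY, DSCAL, and DAXPY compute. The `ref_` functions are hypothetical, use simplified signatures, and support only positive strides; an application would call the library's standard BLAS or CBLAS entry points (for example `cblas_daxpy`) rather than loops like these.

```c
#include <stddef.h>

/* DCOPY semantics: y := x */
void ref_dcopy(size_t n, const double *x, size_t incx, double *y, size_t incy) {
    for (size_t i = 0; i < n; ++i)
        y[i * incy] = x[i * incx];
}

/* DSCAL semantics: x := alpha * x */
void ref_dscal(size_t n, double alpha, double *x, size_t incx) {
    for (size_t i = 0; i < n; ++i)
        x[i * incx] *= alpha;
}

/* DAXPY semantics: y := alpha * x + y */
void ref_daxpy(size_t n, double alpha, const double *x, size_t incx,
               double *y, size_t incy) {
    for (size_t i = 0; i < n; ++i)
        y[i * incy] += alpha * x[i * incx];
}
```

Each routine does O(n) work on O(n) data, so Level 1 performance is dominated by memory bandwidth and by how well the loads and stores are vectorized, which is where SVE optimizations help.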

BLAS and LAPACK improvements in 22.0 ArmPL

The previous graph shows the improvements in 22.0 over 21.0 (released in early 2021). The data comes from over 5000 individual benchmark cases covering a wide set of BLAS routines and a selection of important LAPACK routines, at small O(10), medium O(100), and large O(1000) problem sizes, in both serial (1 thread) and parallel (8 threads) execution.

Improvements in math functions

In 22.0, we have improved the performance of many math functions, including scalar functions (atan, atan2, atan2f, cos, exp, sin, and erf) and vector functions (atanf, atan2f, cosf, erfcf, expo, logf, pow, sinf, and tanf). The following graph shows the impact when the Elefunt benchmark is run on an AWS Graviton 2 (Neoverse-N1) system.

Math routines improvement in 22.0 ArmPL

Module name changes

The package provides environment module files for loading the required compilers or libraries. From the 22.0 release, use the following module commands:

Environment	module load command
Arm C/C++/Fortran Compilers	module load acfl/22.0
Arm Performance Libraries	module load armpl/22.0
GNU compilers	module load gnu/11.2.0

Conclusion

Arm Compiler for Linux 22.0 brings many improvements and changes over the previous 21.x series. We are continuing to make improvements and plan to release 22.1 in September/October 2022.

Download the latest package now


Jackzhu over 2 years ago

Does the different compiler will influence the cache coherence of multi-cores?
