Arm Compiler for Linux 22.0 is now available with improved compilers and libraries. Arm Compiler for Linux (ACfL) is a combination of Arm C/C++ Compiler (armclang), Arm Fortran Compiler (armflang), and Arm Performance Libraries (ArmPL). In this blog, we explore what is new in this release.
Arm Compilers are now based on LLVM 13, and this has resulted in performance improvements.
We see many sub-benchmarks of SPEC CPU 2017 improve, with an overall geomean score of 2.2% over the previous release of 21.1. The benchmark was run on an AWS c6g.metal instance (with Arm Neoverse-N1 core).
Arm Compilers in 22.0 feature a tuned cost model for Neoverse-V1 and many SVE code-gen related improvements. This includes (1) optimal usage of the Gather/Scatter feature of SVE (2) aligning loops with padding to make better use of instruction cache (3) using SVE splice operation optimally when inserting one element of a vector into another.
The cumulative effect of these optimizations can be seen in the previous graph. We are comparing here the SVE code tuned for Neoverse-V1 to the Neon code tuned for Neoverse-V1. Our benchmark is a set of representative micro-benchmarks used when developing the SVE architecture extension. You can see that the compilers in 22.0 (orange bar) outperform version 21.1 (blue bar). With these improvements, the 22.0 release is ready for the development of HPC applications on AWS Graviton 3.
The package now ships GCC 11 series of compilers, with many performance improvements.
Arm Performance Libraries are no longer packaged with separate libraries for SVE and non-SVE cores. We now provide a single library, which contains optimized versions for all supported cores, including SVE. At run time, the library detects the type of core and chooses the most optimal routines and configuration. As a user, you can automatically benefit from the fastest tunings within the library, without the need to re-link to a core-specific library.
ArmPL 22.0 comes with further improvements in BLAS and LAPACK routines.
The previous graph shows improvements in 22.0 over 21.0 (released in early 2021). The data is from benchmarks of over 5000 individual cases, covering: benchmarks across the wide set of BLAS routines, a selection of important LAPACK routines, for small O(10), medium O(100) and large O(1000) problem sizes, in both serial (1 thread) and parallel (8 threads) execution.
In 22.0, we have improved the performance of many math functions. These include improvements in scalar functions (atan, atan2, atan2f, cos, exp, sin and erf) and vector functions (atanf, atan2f, cosf, erfcf, expo, logf, pow, sinf and tanf). In the following graph, you can see the impact when Elefunt benchmark is run on an AWS Graviton 2 (Neoverse N1) system.
The package provides module files to easily load the required compiler or libraries. With the 22.0 release, please use the following module commands.
Arm Compiler for Linux 22.0 brings many improvements and changes over the previous 21.x series. We continue to make further improvements and plan to provide the next release 22.1 in Sep/Oct 2022.
Download the latest package now
Does the different compiler will influence the cache coherence of multi-cores?
What are the C++ and Fortran compiler options used for getting performance on N1 and V1 architectures? Thanks