Arm releases a new version of Arm Compiler for Linux twice a year. It provides a complete user-space compilation toolchain for Linux-based environments, for software written in C, C++ and Fortran. The package also includes Arm Performance Libraries, the vendor library containing optimized implementations of sparse and dense linear algebra functions, FFTs and math.h functions.
In April 2024, we released version 24.04 of both the compiler and the libraries. It is available for free and can be downloaded here. In this blog post, we outline some of the biggest changes in this release. The next release, 24.10, is expected in October 2024.
Arm Performance Libraries is also released as a standalone download for Linux (compatible with GCC and NVHPC), macOS (compatible with Clang) and Windows (compatible with Clang and MSVC).
A summary of the key features in this release:
Compiler:
-fno-math-errno
-O1
-fmath-errno
-fveclib=none
-fmath-errno.
fmod)
Libraries:
Package:
Full release notes can be found online.
Arm Compiler for Linux (ACfL) 24.04 brings performance improvements across a number of workloads. The graph below shows SPEC2k17 performance on a Neoverse V1 system, compared with the ACfL 23.04 release.
The second graph shows ACfL performance on a number of industry-standard applications running on 64 cores of an AWS c7g.16xlarge instance. ACfL 24.04 delivers noticeable improvements over ACfL 23.04 on these applications.
Arm Performance Libraries (Arm PL) provides optimized standard core math libraries for numerical applications on 64-bit Arm (AArch64) processors. These are built with OpenMP parallelism for BLAS, LAPACK, FFT, and sparse routines to maximize performance in multi-processor environments. The libraries are available for Linux, macOS and Windows.
In addition to the usual performance improvements, version 24.04 adds new functions for generating random numbers, tuned support for the latest AArch64 systems, and improved compatibility with GCC for standalone Arm PL releases.
Arm PL includes libamath, a library of optimized scalar and vector math.h functions. From 24.04, libamath is available on Windows for the first time.
Arm Performance Libraries 24.04 includes the interface to the random number generation part of the VSL library developed by Intel® and shipped for x86 processors as part of oneMKL. We are grateful to Intel® for having released this interface, along with their documentation, to us under a Creative Commons 4.0 license, allowing us to develop our own implementation of this functionality for users of Arm-based systems, enabling software portability between architectures.
By linking to Arm PL, users can now generate random values by first selecting a basic random number generator (pseudorandom, quasirandom and non-deterministic generators are supported) and then generating a stream of random values from a chosen distribution. Both continuous distributions (such as Gaussian and uniform) and discrete distributions (such as Bernoulli and binomial) are supported. We provide complete documentation for our implementation, including an overview of which features are, and are not, supported in this first release.
We have endeavored to use the same generators and initializations as described in the oneMKL documentation. This means that functions which return bit sequences are bitwise reproducible between Arm and x86 systems. If an integer or floating-point result is requested, answers may differ, because the precision of various operations differs between the two libraries.
Note that not all of the random number functions from VSL are included in this release; those that are missing are listed in the documentation as not currently implemented. We intend to extend this coverage in future releases, and we are very keen to hear from users about any missing functionality they would like us to prioritize.
The following chart demonstrates the benefit of having the VSL RNGs available on Arm for machine learning with PyTorch. PyTorch can be configured to use the VSL RNGs from Arm Performance Libraries in its dropout layer, which draws random values from a Bernoulli distribution. For example, with a batch size of 16 and an input tensor of shape [16, 128, 3072], we see a 4x performance improvement when running sequentially, compared with the default RNGs in PyTorch. In addition, when the VSL interface is enabled, PyTorch calls the skip-ahead function vslSkipAheadStream, which allows random values to be generated in parallel; without VSL, the random values are always generated sequentially. With this parallelism unlocked, using 16 threads on the same [16, 128, 3072] input tensor, the dropout layer runs around 44 times faster with Arm PL than with the default.
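Skip-ahead is what makes parallel generation possible: each thread jumps directly to its own offset in one shared stream, so the combined output is identical to the sequential stream. The self-contained sketch below shows the idea for a multiplicative congruential generator, where jumping k steps is a single multiplication by a^k mod m, computed in O(log k) time. It uses the MCG31m1 parameters listed in the oneMKL documentation, and all names (powmod, skip_ahead, bernoulli_chunk, dropout_mask) are hypothetical: this is an illustration of the technique, not how vslSkipAheadStream or PyTorch is implemented.

```c
#include <stdint.h>

#define MCG_A 1132489760ULL
#define MCG_M 2147483647ULL /* 2^31 - 1 */

/* (base^e) mod m by square-and-multiply; operands < 2^31, products < 2^62. */
static uint64_t powmod(uint64_t base, uint64_t e)
{
    uint64_t r = 1;
    base %= MCG_M;
    while (e) {
        if (e & 1)
            r = (r * base) % MCG_M;
        base = (base * base) % MCG_M;
        e >>= 1;
    }
    return r;
}

/* Jump state x forward k steps: x_{n+k} = a^k * x_n mod m. */
static uint64_t skip_ahead(uint64_t x, uint64_t k)
{
    return (powmod(MCG_A, k) * x) % MCG_M;
}

/* Fill mask[0..n) with Bernoulli(keep) draws continuing from state x. */
static void bernoulli_chunk(uint64_t x, double keep, int *mask, int n)
{
    for (int i = 0; i < n; ++i) {
        x = (MCG_A * x) % MCG_M;
        mask[i] = ((double)x / (double)MCG_M) < keep;
    }
}

/* Dropout mask of length n, split into nchunks pieces. Each chunk starts
 * from the initial state (assumed to lie in [1, m-1]) skipped ahead by the
 * chunk's offset, so the chunks are independent: each loop iteration could
 * run on its own thread, and the result matches the sequential stream
 * exactly. */
void dropout_mask(uint64_t state, double keep, int *mask, int n, int nchunks)
{
    int chunk = (n + nchunks - 1) / nchunks;
    for (int c = 0; c < nchunks; ++c) {
        int start = c * chunk;
        if (start >= n)
            break;
        int len = (n - start < chunk) ? n - start : chunk;
        bernoulli_chunk(skip_ahead(state, (uint64_t)start), keep, mask + start, len);
    }
}
```

Without skip-ahead, a thread could only reach its starting position by generating and discarding every earlier value, which is why generation stays sequential in that case.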
From 24.04 the libraries have been tuned to run efficiently on the following new Arm systems:
This is in addition to the systems that were already tuned for:
Previously, the standalone version of Arm Performance Libraries for Linux was available as separate downloads for each supported Linux distribution and each supported major version of GCC. This approach does not scale well as we add support for newer distributions and newer versions of GCC. With support for GCC 13 added in this release, we have chosen to simplify the download options for Linux users without dropping support for any compiler version or OS. There are now just two downloadable packages: one for RPM-based distributions and one for .deb-based distributions.
Users should download the RPM package if they are using one of the following supported distributions:
Users should download the .deb package if they are using one of the supported Ubuntu distributions:
The Arm PL build is exactly the same in each GCC-compatible package, and it supports GCC versions 7 through 13.
Note: we also support the NVIDIA HPC compiler (NVHPC) in a similar way, providing Arm PL RPM and .deb packages which are compatible with NVHPC 24.1. Go to the Arm Performance Libraries downloads page to access the standalone versions of Arm PL for Linux, as well as Windows and macOS.
With Arm PL now available on multiple platforms, we provide a separate "Getting Started" guide for each one to explain the basics. These short guides are available on developer.arm.com as web pages or PDFs. We recommend downloading the PDF versions for reference:
For complete documentation of all the functions provided in the libraries, see the Arm Performance Libraries Reference Guide.