We are happy to announce that Arm Compiler for Linux (ACfL) and Arm Performance Libraries (Arm PL) are now available as installable packages in Spack, a widely used package manager in the HPC community. Spack makes installing ACfL and Arm PL simple, and multiple versions of each can coexist and be managed easily on the same machine. Building applications with ACfL and linking them with Arm PL to use optimized BLAS, LAPACK, FFT, and math functions on Arm architecture now takes a single command. Users can also conveniently switch between compiler and library versions to compare performance.
ACfL and Arm PL are available in Spack as two packages, acfl and armpl-gcc. The acfl package includes the Arm C/C++/Fortran compilers and Arm PL, while the armpl-gcc package corresponds to the standalone version of Arm PL compiled with GCC compilers. To install the acfl or armpl-gcc package, use the Spack command:
spack install <acfl/armpl-gcc>
This command installs the latest version (currently version 23.04) of ACfL and Arm PL by default. To install an earlier version, append @<version> to the package name:
spack install <acfl/armpl-gcc>@<version>
To list the versions of the acfl and armpl-gcc packages available in Spack, use:
spack info <acfl/armpl-gcc>
The command also shows the variants of Arm PL that users can choose to install, for example, the multi-threaded or single-threaded version and the 32-bit or 64-bit integer version of the libraries. The default option installs the single-threaded 32-bit integer version of Arm PL as shared libraries. To install, for example, the multi-threaded 32-bit integer shared libraries, append threads=openmp to the Spack install command:
spack install <acfl/armpl-gcc> threads=openmp
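The version and variant syntax above can be combined in one spec. The following is a minimal sketch rather than an official tool: armpl_spec is a hypothetical helper name, and the script only prints the resulting install commands so that they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spack): compose a spec string from a
# package name, an optional version, and an optional threads variant.
armpl_spec() {
    as_spec=$1
    if [ -n "$2" ]; then as_spec="$as_spec@$2"; fi          # append @<version>
    if [ -n "$3" ]; then as_spec="$as_spec threads=$3"; fi  # append variant
    printf '%s\n' "$as_spec"
}

# Print the commands instead of running them.
echo "spack install $(armpl_spec acfl)"
echo "spack install $(armpl_spec armpl-gcc 23.04)"
echo "spack install $(armpl_spec armpl-gcc 23.04 openmp)"
```

The last line prints the command for the multi-threaded 32-bit integer build of the standalone libraries at version 23.04.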
For acfl, as with other compilers, a post-installation step is required before the compilers can be used in Spack. Load the Arm compilers and add them to Spack's compiler list by running spack load acfl followed by spack compiler find:
spack load acfl
spack compiler find
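As a sketch, the post-installation step can be wrapped in a small script that also lists the registered compilers afterwards with spack compilers. The names run, register_acfl, and DRY_RUN are hypothetical and introduced only for illustration. With DRY_RUN=1 the commands are printed instead of executed. Running them for real assumes that Spack's shell integration is enabled in the current shell, which spack load needs.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Load acfl, register its compilers with Spack, then list known compilers.
register_acfl() {
    run spack load acfl &&
    run spack compiler find &&
    run spack compilers
}

DRY_RUN=1   # set to 0 (or unset) to actually execute the commands
register_acfl
```

If the step succeeds, the Arm compilers should appear in the list printed by the final command.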
If you prefer to manage your shell environment with an environment module system, Spack also generates module files for use with both Environment Modules and Lmod. Please consult Spack's module file tutorial for details on how to set up environment modules in Spack.
There are currently over 7000 software packages available in Spack. To build a software package using Arm compilers, use the command:
spack install <package name> %arm
This command builds the package and all of its dependencies using the latest version of the Arm compilers installed in Spack. As when installing acfl, users can instruct Spack to use a specific version of the compilers by appending @<version> after the compiler name:
spack install <package name> %arm@<version>
Many software packages in Spack depend on widely used libraries such as BLAS, LAPACK, or FFT. Because these basic mathematical functions have many providers, Spack treats them as virtual packages that several concrete Spack packages can provide. Users can then specify which library provider they would like to use when building packages with the Spack install command. Both acfl and armpl-gcc provide the blas, lapack, and fftw-api@3 virtual packages in Spack. To build a package with acfl or armpl-gcc as a dependency, use the command:
spack install <package name> ^<acfl/armpl-gcc>
As before, users can be even more specific about which variant of acfl or armpl-gcc to use when building. For example,
spack install <package name> ^<acfl/armpl-gcc> threads=openmp
builds the package and links it with the multi-threaded 32-bit integer version of Arm PL.
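The compiler and dependency syntax can be combined in a single spec. The sketch below assumes that the application is packaged in Spack under the name gromacs. build_spec is a hypothetical helper, and the commands are printed rather than executed. The spack spec command is used here to preview how Spack would resolve the spec before committing to a full build.

```shell
#!/bin/sh
# Hypothetical helper: compose "<package> %arm[@<version>] ^<provider> [threads=...]".
build_spec() {
    bs_spec="$1 %arm"                                       # build with the Arm compilers
    if [ -n "$4" ]; then bs_spec="$bs_spec@$4"; fi          # optional compiler version
    bs_spec="$bs_spec ^$2"                                  # acfl or armpl-gcc as provider
    if [ -n "$3" ]; then bs_spec="$bs_spec threads=$3"; fi  # optional Arm PL variant
    printf '%s\n' "$bs_spec"
}

spec=$(build_spec gromacs armpl-gcc openmp)
echo "spack spec $spec"      # preview the resolved spec and its dependencies
echo "spack install $spec"   # build once the preview looks right
```

Passing a fourth argument, for example 23.04, pins the compiler version in the same way as %arm@<version>.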
To demonstrate the benefits that Arm PL can provide, we benchmarked software packages widely used in HPC and data science, namely Gromacs, Quantum Espresso, CP2K, numpy, and scipy, on an AWS Graviton3 c7g.2xlarge instance. This instance type has 8 vCPUs and 16 GB of main memory. Figure 1 shows the measured speedups of different build configurations relative to a build using GCC 12.2.0, FFTW 3.3.10, and OpenBLAS 0.3.21.
Figure 1: Speedup over GCC + FFTW + OpenBLAS build configuration running with 8 threads.
We observe speedups of roughly 1.25x to 1.3x in most of these applications when they are linked with Arm PL, either through the acfl package or directly with the armpl-gcc package. Profiling shows that these benchmarks spend most of their computing time in matrix-matrix multiplication (DGEMM). Gromacs, on the other hand, shows an overall speedup of only a few percent when using the FFT functions in Arm PL instead of the FFTW library. This is because the 3D FFT calculations used to solve for the electrostatic potential with the Particle-Mesh Ewald (PME) method account for only 8 percent of the total computing time. If we compare only the cost of the 3D FFT computation in this benchmark, Arm PL delivers ~1.3x greater performance than FFTW.
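The small overall gain for Gromacs is consistent with a back-of-the-envelope estimate based on Amdahl's law. This framing is ours, not a measurement from the benchmark. If a fraction f of the run time is accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The sketch below plugs in the figures quoted above, f = 0.08 and s = 1.3. overall_speedup is a hypothetical helper name.

```shell
#!/bin/sh
# Amdahl's law: overall speedup when a fraction f of the run time
# is accelerated by a factor s. Usage: overall_speedup <f> <s>
overall_speedup() {
    LC_ALL=C awk -v f="$1" -v s="$2" \
        'BEGIN { printf "%.3f\n", 1 / ((1 - f) + f / s) }'
}

overall_speedup 0.08 1.3   # prints 1.019
```

An overall speedup of about 1.02x, or roughly 2 percent, is in line with the few percent observed.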
In this blog, we have demonstrated how users can install Arm compilers and Arm PL through Spack and how easy it has become to build applications with them. We have also shown that, depending on where the performance bottleneck lies, many applications can readily benefit from the optimized BLAS, LAPACK, FFT, and basic mathematical functions available in Arm PL when running on Arm architecture. Currently, not all packages build successfully with Arm compilers, and some packages may not link with Arm PL out of the box, but we are working on enabling more packages in Spack. If you have a specific request or encounter a problem, you can report it on the Spack GitHub page or contact us directly, and we will do our best to help solve it.