SC, the annual ACM/IEEE supercomputing conference, is a milestone on every calendar in the HPC community, drawing luminaries and novices alike from every corner of the HPC landscape. Bright-eyed academics exchange knowledge, competing vendors plug their products, and senior executives swap stories over a drink or two. But SC21 was completely unlike every prior SC in two important ways.
First, Arm was prominent like never before. Over the last few years, Arm-based HPC has grown from cute experiments in low-power clusters to enabling the #1 seat on the Top500 – four times in a row. Arm IP also appeared in one of China’s stealthy exascale systems as the host for a powerful home-grown matrix accelerator. As Arm explosively leveled up from Raspberry Pi to RIKEN’s Fugaku, the Arm HPC community came together and formed the Arm HPC Users Group, or AHUG for short. AHUG is a user-led, not-for-profit organization intent on promoting the latest Arm-based silicon, systems, and platforms for the HPC community.
Second, SC21 marked the first hybrid in-person/virtual instantiation of the event. This new format made recording and sharing much easier and lowered barriers for participants who could not travel to the USA. Arm, and many key Arm partners, opted for purely virtual participation, which means many of the best moments of the conference are now free to watch (see links below). AHUG organized three open events where the HPC community could come together to exchange ideas and share knowledge about using Arm devices and tools for HPC: a symposium, a hackathon, and a birds-of-a-feather (BoF) session.
The SC21 BoF “The Arm HPC Users Group: Experiences and Predictions for Extreme-Scale Arm Systems” featured experiences and lessons learned from Arm-based HPC systems currently in production. One of the highlights of the event came from Los Alamos National Laboratory (LANL), which revealed the name of its 2023 system based on Arm CPUs and NVIDIA’s yet-to-be-announced next-generation GPU. Its name is Venado, a peak in the Taos mountains of New Mexico, the state where LANL is located. The BoF also featured some lively discussion around the state of Arm’s compiler ecosystem for HPC and end-user demand for SVE (Arm’s Scalable Vector Extension). SVE is a key design feature in Fugaku and is expected in many Arm-based CPUs on the near horizon.
The AHUG SC21 Hackathon was a high-energy event that mustered individuals and teams from academia and industry alike to rapidly identify and resolve performance problems on four different Arm-based HPC systems. AHUG members and partners like NVIDIA, Oracle, AWS, and Fujitsu joined in to mentor our hackers and help rapidly resolve or triage vendor-specific issues.
Because the AHUG SC21 Hackathon was virtual, we chased the sun and kicked off three times in three different locales: Asia, Europe, and the Americas. The hackathon ran for approximately two days and almost sixty people registered. Our hackers had their choice of four (actually five) different systems:
We took breaks from hacking to hear short presentations from experts and take “guided tours” of well-known HPC applications like SPECFEM3D, OpenFOAM, GROMACS, and NWChem. All these presentations are available on YouTube, and you can download the hands-on materials and find application build/run/profile instructions on the event website: https://arm-hpc-user-group.github.io/SC21-Hackathon/. Our hackers ran machine learning, seismic modeling, and earth systems modeling applications on all four systems and compared performance across architectures. PyTorch, AlphaZero.jl, and COAWST from the US Geological Survey saw good performance throughout the event. “Oracle has swallowed the elephant!” said one hacker after successfully rendering Disney's Moana Island Scene via PBRT, a benchmark that strains both core performance and memory capacity. In the end, many hackers stayed to the final minute, and a few even showed up in the Slack feed the next day hoping to keep going! Fortunately, accounts for all systems are available and many of the hackers plan to keep going on their own time. We are even hearing that a report from USGS will include some of the results from this event.
The main AHUG event was a symposium of AHUG members worldwide. Scientists, researchers, engineers, and Arm partners presented the latest developments in Arm-based HPC for science and discovery. For example, Eric Lequiniou, VP Radioss Development and Altair Solver HPC, demonstrated how Altair Radioss, a leading crash simulation package, is now supported on Arm. Radioss delivers excellent performance on Arm when running on Ampere Altra CPUs with 80 Neoverse N1 cores. Hatem Ltaief of KAUST showed how leveraging low-rank matrix approximations improves the performance of seismic codes on Fujitsu A64FX. Several presenters highlighted how easy it is for scientists and engineers to use Arm-based HPC systems: “it’s boringly normal!” said a user from the Edinburgh Parallel Computing Centre. There can hardly be a higher compliment in scientific computing. You can watch the entire symposium on YouTube.
The hybrid in-person/virtual SC had some advantages over the traditional format, and I hope the SC organizers retain some of the new infrastructure for recording and broadcasting conference sessions. I am confident AHUG will. AHUG runs events throughout the year, so keep an eye on a-hug.org. But I also look forward to a return to in-person conferences. Virtual conferences are simply no replacement for the high-bandwidth idea exchange of an in-person event. The next major HPC conference is ISC 2022 in Hamburg, Germany, and SC22 will be held in Dallas, Texas. See you there!