Isambard update: Arm rolling forward at scale for High Performance Computing

Darren Cepulis
May 13, 2019
4 minute read time.

Co-authored by Darren Cepulis, HPC Segment Manager, Arm, and Simon McIntosh-Smith, Professor of HPC at the University of Bristol.

At the CUG’19 event this week in Montreal, Canada, Simon McIntosh-Smith provided an at-scale performance update on the Isambard supercomputer deployed by the GW4 Alliance and the Met Office.

What is Isambard?

Isambard is a 168-node HPC cluster based on the Cray XC50 system design, using Marvell ThunderX2 CPUs and Cray's Aries high-speed interconnect.

The CUG presentation and associated paper provide an early in-depth view of performance data for at-scale applications running on the Arm-based Marvell ThunderX2. Here we provide a recap of some of the data and salient points presented.

Isambard: key findings

1. Marvell continues to advance its existing ThunderX2 SoC with the rollout of the B2 stepping

In a nutshell, the good news is that ThunderX2 B2 silicon is scaling similarly to Skylake when both are using Cray’s Aries interconnect. This is what we expected, but the Isambard project has now provided the evidence to confirm the hypothesis.

2. Higher Arm-based core counts are adhering to power budgets without downclocking

In Isambard, the new B2-silicon ThunderX2s run at their turbo clock speed of 2.5 GHz all the time, even when running HPL. This contrasts with the variable clock speeds seen on most modern x86 CPUs, which tend to downclock when running intensive codes. With Isambard's CPUs running at their turbo speeds all the time, we know that they're drawing less than 175 Watts and staying below 94°C, no matter what we've run on them so far.

The base clock speed of Isambard's CPUs is 2.1 GHz, so for compute-bound codes, the sustained turbo speed gains around 10-15% performance over what we had before.
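
As a quick sanity check on those figures, the turbo-to-base clock ratio puts a ceiling on the gain a purely compute-bound code could see. A minimal sketch, using only the clock speeds quoted above:

```python
# Ceiling on compute-bound speedup from sustained turbo clocks,
# using the clock figures quoted in this post.
base_ghz = 2.1   # ThunderX2 base clock
turbo_ghz = 2.5  # ThunderX2 turbo clock, sustained on Isambard's B2 parts

ceiling = turbo_ghz / base_ghz - 1.0
print(f"theoretical ceiling: {ceiling:.0%}")  # ~19%

# The observed 10-15% gain sits below this ceiling, as expected:
# phases that wait on memory do not speed up with the core clock.
```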

3. As we strong-scale, codes become more network bound and CPU performance matters less

At scale, under high node-count and high core-count conditions, most codes become more network bound. As a result, many of our results level out: Skylake catches up with ThunderX2 on the bandwidth-bound codes, and ThunderX2 catches up with Skylake on the compute-bound codes. GROMACS is striking in this regard: it was the most extreme result on a single node, with a dual-socket 28-core Skylake node twice as fast as a dual-socket 32-core ThunderX2 node.

However, at realistic scale, ThunderX2 and Skylake deliver almost identical performance. It's worth bearing in mind that ThunderX2 CPUs are generally available at a fraction of the price of comparable top-bin Skylake CPUs, giving Arm-based ThunderX2 CPUs a significant performance-per-dollar advantage, even for compute-bound codes such as GROMACS.
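
One way to see why the single-node gap closes at scale is a toy strong-scaling model, in which per-node compute shrinks as nodes are added while communication cost grows. The sketch below uses purely illustrative constants, not Isambard measurements:

```python
# Toy strong-scaling model (illustrative constants, not measurements):
# runtime = compute / nodes + communication(nodes).
def runtime(t_compute_one_node, nodes, comm_coeff=0.5):
    compute = t_compute_one_node / nodes      # work divides across nodes
    comm = comm_coeff * nodes ** 0.5          # network cost grows with scale
    return compute + comm

# Pretend CPU "A" is twice as fast as CPU "B" on one node,
# like the single-node GROMACS gap described above.
for nodes in (1, 4, 16, 64, 256):
    ratio = runtime(100.0, nodes) / runtime(200.0, nodes)
    print(f"{nodes:4d} nodes: A/B runtime ratio = {ratio:.2f}")
# The ratio climbs from 0.50 toward 1.0 as the network dominates both.
```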

[Figure: GROMACS scaling, relative performance at increasing node counts]

4. The Arm at-scale ecosystem also continues to advance

Bristol found a couple of minor scaling issues when testing MPI performance over the Aries interconnect for ThunderX2, which appear to be mostly related to collective operations. This wasn’t a surprise, given that Isambard is one of the first Arm-based Cray systems to be deployed at scale.

Cray is working with the Isambard team to identify and fix these MPI performance issues, and we anticipate that the few examples which don't scale quite as we'd expect on ThunderX2 will be resolved soon.
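
For readers who want to probe collective performance on their own systems, a minimal Allreduce timing loop is straightforward to write. The sketch below uses mpi4py and NumPy; the payload size and iteration count are arbitrary choices of ours, not those used by the Isambard team:

```python
# Minimal MPI_Allreduce timing loop using mpi4py and NumPy.
# Run with, for example: mpirun -n 64 python allreduce_bench.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
buf = np.ones(1 << 20, dtype=np.float64)  # 8 MiB payload (arbitrary size)
out = np.empty_like(buf)

comm.Barrier()            # align all ranks before timing
t0 = MPI.Wtime()
iters = 100
for _ in range(iters):
    comm.Allreduce(buf, out, op=MPI.SUM)
elapsed = MPI.Wtime() - t0

if comm.Get_rank() == 0:
    print(f"Allreduce mean over {iters} iters: {elapsed / iters * 1e3:.3f} ms")
```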

5. The Catalyst UK project is underway and advancing the Arm HPC ecosystem

While the Isambard system has focused on key HPC applications for the UK and EU as well as on Cray connectivity, further work is ongoing at Bristol with a 64-node HPE Apollo 70 cluster. The Catalyst UK project also draws in teams from EPCC in Edinburgh and the University of Leicester, each with its own similarly configured cluster. Working with Arm and partners such as HPE, SUSE, Marvell, and Mellanox, the three university sites each focus on the scientific applications of their chosen fields, with the work driven by their scientists.

Besides further investigation into MPI connectivity performance, many more applications are being ported and analyzed in terms of how well they run on these Arm-based ThunderX2 platforms.

6. Early Catalyst observations show further potential for performance improvement

Comparing Isambard results with those in the Catalyst UK whitepaper that the University of Edinburgh recently submitted to the PASC19 conference reveals some bridgeable gaps in performance. In similar experiments at similar scale, the Isambard results outshine Catalyst, with the likely causes being variations in the connectivity stacks, adapters, and CPU stepping. Tuning performance on the Catalyst systems is now moving to the forefront, and we expect the playing field to level further as work proceeds.

The Arm HPC User Group (AHUG) is benefiting greatly from all the work being done by Arm-based supercomputer users and partners worldwide.

Arm will be hosting an AHUG workshop next month at ISC19. We hope to see you there.

Learn more about Arm's HPC Ecosystem

Please see our dedicated event page for further information on Arm's ISC19 presence.

To learn about the Arm HPC Ecosystem, please visit our Developer page.
