Is the Cloud Ready for HPC Workloads?

June 4, 2020


Moving High Performance Computing (HPC) workloads to the cloud has been a trend for some time, but progress has been slow, held back by high costs and by performance concerns stemming from the lack of parallel file systems and low-latency networking. I co-authored one of the first papers to investigate the overhead of virtualization on HPC applications, and we concluded that virtualized environments imposed unacceptable overhead for performance-critical applications and systems.

But the cloud landscape and the capabilities it offers have changed significantly. The allure of “virtually unlimited” resources and zero-wait batch queues is clear, and despite some shortcomings, HPC workloads are beginning to make their way into the cloud. Industry analysts disagree on the size and momentum of this new business model. How fast will cloud adoption in HPC happen, and will this transition cannibalize traditional HPC data centers? During the recent sale of Cray to HPE, the CEO of Cray stated that the impact of cloud had put future business in doubt.

Cloud providers have steadily improved their HPC offerings, which are beginning to experience rapid growth. Microsoft Azure stood up dedicated Cray systems with tightly coupled, low-latency interconnects. AWS's position was that better networking would be deployed across its data centers, keeping the "sea of compute" homogeneous, which avoids the complexity of workload placement, resource fragmentation, and isolation. AWS acquired Annapurna Labs and quickly made progress on network offload and acceleration with the AWS Nitro System SmartNIC implementation, which frees up expensive compute resources and improves workload performance.

On the parallel storage front, AWS began deploying Lustre images from Whamcloud in the mid-2010s and has done quite well with that offering. Recently, AWS unveiled Amazon FSx for Lustre, a fully managed Lustre offering that uses S3 to store data at rest.

These technology advancements and innovations are setting the stage to welcome HPC applications to the AWS cloud. Every step up in compute, networking, and I/O performance extends the platform’s applicability to a broader set of HPC workloads, attracting more business.

The remaining obstacle to broad adoption is cost, with the prevailing opinion that a fully utilized HPC data center must be more cost-effective than outsourcing to the cloud. The breakthrough here is the arrival of the Arm Neoverse-based AWS Graviton2 processor and the Amazon EC2 M6g/C6g/R6g instance family. With the promise of up to 40% better price-performance than comparable x86-based instances, AWS is tackling the HPC cost-of-cloud concern head-on. And independent experiments are validating those claims on benchmarks as well as on real-world workloads. The numbers do not lie: in these tests, Arm-based technology is both faster and less expensive than competing x86 systems.
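For readers who want to check a price-performance claim against their own workloads, the arithmetic can be sketched as below. This is a minimal illustration, not a benchmark: the throughput and hourly price figures are hypothetical numbers chosen only to show how a "40% better" figure would be computed, and the function names are my own.

```python
def price_performance(throughput: float, hourly_price: float) -> float:
    """Work completed per dollar: throughput (jobs/hour) divided by price ($/hour)."""
    if hourly_price <= 0:
        raise ValueError("hourly_price must be positive")
    return throughput / hourly_price


def relative_gain(candidate: float, baseline: float) -> float:
    """Fractional improvement of candidate over baseline (0.40 means 40% better)."""
    return candidate / baseline - 1.0


# Hypothetical inputs, NOT measured data: an x86 instance and an Arm instance
# running the same job mix, with made-up throughput and on-demand prices.
x86_jobs_per_dollar = price_performance(throughput=100.0, hourly_price=1.00)
arm_jobs_per_dollar = price_performance(throughput=112.0, hourly_price=0.80)

gain = relative_gain(arm_jobs_per_dollar, x86_jobs_per_dollar)
print(f"Price-performance gain: {gain:.0%}")
```

Note that a gain of this kind can come from higher throughput, a lower price, or both, so it is worth measuring throughput with your own application rather than relying on a published figure.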

The combination of advances in networking, storage, and compute via Graviton2 makes AWS a desirable platform for HPC applications. At Arm, our HPC team is working with open source and ISV application vendors to study the reality of running HPC in the cloud. We are looking closely and cannot wait to report back on our findings.

See Arm Infrastructure solutions for HPC
