Is the Cloud Ready for HPC Workloads?

Brent Gorda
June 4, 2020

Moving High Performance Computing (HPC) workloads to the cloud has been a trend for some time, but progress has been slow, held back by high costs and by performance concerns stemming from the lack of parallel file systems and low-latency networking. I co-authored one of the first papers to investigate the overhead of virtualization on HPC applications, and we concluded that virtualized environments imposed unacceptable overhead for performance-critical applications and systems.

But the cloud landscape and the capabilities it offers have changed significantly. The allure of “virtually unlimited” resources and zero-wait batch queues is clear, and despite some shortcomings, HPC workloads are beginning to make their way into the cloud. Industry analysts disagree on the size and momentum of this new business model. How fast will cloud adoption in HPC happen, and will this transition cannibalize traditional HPC data centers? In the recent sale of Cray to HPE, the CEO of Cray stated that the impact of cloud had put future business in doubt.

Cloud providers have made steady improvements to their HPC offerings, which are beginning to experience rapid growth. Microsoft Azure stood up dedicated Cray systems with tightly coupled, low-latency interconnects. AWS's position was that better networking would be deployed across their data centers, keeping the "sea of compute" homogeneous, which avoids the complexity of workload placement, resource fragmentation, and isolation. AWS acquired Annapurna Labs and quickly made progress on network offload and acceleration with the AWS Nitro System SmartNIC implementation, which frees up expensive compute resources and improves workload performance.

On the parallel storage front, AWS began deploying Lustre images from Whamcloud in the mid-2010s and has done quite well with that offering. Recently, AWS unveiled Amazon FSx for Lustre, a fully managed Lustre offering that uses S3 to store data at rest.

These technology advancements and innovations are setting the stage to welcome HPC applications to the AWS cloud. Every step up in compute, networking, and I/O performance raises the platform’s applicability to a broader set of HPC workloads, attracting more business.

The remaining obstacle to broad adoption is cost, with the prevailing opinion that a fully utilized HPC data center must be more cost-effective than outsourcing to the cloud. The breakthrough here is the arrival of the Arm Neoverse-based AWS Graviton2 processor and the Amazon EC2 M6g/C6g/R6g instance families. With the promise of up to 40% better price-performance than comparable x86-based instances, AWS is tackling the HPC cost-of-cloud concern head-on. Independent experiments are validating those claims on benchmarks as well as on real-world workloads. The numbers do not lie: Arm-based technology is both faster and less expensive than competing x86 systems.
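
To make the "up to 40%" figure concrete, here is a minimal sketch of how a price-performance comparison is typically computed: work completed per dollar spent. The throughput and hourly prices below are hypothetical placeholders chosen for illustration, not measured results or actual AWS pricing, and the function names are our own.

```python
def price_performance(throughput: float, hourly_price: float) -> float:
    """Work completed per dollar: throughput (jobs/hour) divided by price ($/hour)."""
    if hourly_price <= 0:
        raise ValueError("hourly_price must be positive")
    return throughput / hourly_price


def relative_gain(candidate: float, baseline: float) -> float:
    """Fractional improvement of candidate over baseline (0.4 means 40% better)."""
    return candidate / baseline - 1.0


# Hypothetical inputs, for illustration only (not benchmark data or real prices).
baseline_ppd = price_performance(throughput=100.0, hourly_price=1.00)
candidate_ppd = price_performance(throughput=112.0, hourly_price=0.80)

gain = relative_gain(candidate_ppd, baseline_ppd)
print(f"Price-performance gain: {gain:.0%}")
```

In this made-up case the candidate instance is 12% faster and 20% cheaper per hour, which combine to roughly 40% more work per dollar. A real comparison would substitute measured application throughput and current list prices.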

The combination of advances in networking, storage, and compute via Graviton2 makes AWS a desirable platform for HPC applications. At Arm, our HPC team is working with open source and ISV application vendors to study the reality of running HPC in the cloud. We are looking closely and cannot wait to report back on our findings.

See Arm Infrastructure solutions for HPC
