How Arm IT saved cost, space, and power with Ampere Altra-based HPE ProLiant RL300 Gen11 servers

September 23, 2024

5 minute read time.

The Arm ecosystem is growing by leaps and bounds across all our lines of business. Whether that is Microsoft Copilot+ PC in Client, continued growth of cloud (AWS, Google, Microsoft) in Infrastructure, or success with IVI or ADAS in Automotive, Arm’s customers demand a steady drumbeat of new CPU innovation. To support this growth, Arm IT and Productivity Engineering teams must decide where and when to add new capacity, and which vendors or technology to base it on.

We have written many times about Arm’s use of EDA in the cloud and recently shared how improvements in Arm Neoverse per-core performance are letting us move more EDA workloads to the Arm architecture. This blog post focuses on our on-premises clusters. These clusters have historically been x86-based, but we continue to move EDA workloads to the Arm architecture as new tools and systems become available. We recently installed several hundred HPE ProLiant RL300 Gen11 servers, using the Neoverse N1-based Ampere® Altra® Max CPU, in our Austin datacenter and want to share our early observations. Arm-based servers, like the HPE RL300, offer a unique combination of performance and power efficiency, which has significant real-world benefits from a cost, space, and power perspective.

HPE announced the ProLiant RL300 Gen11 server at their Discover 2023 event, and Arm became an early customer. While the server is targeted at cloud-native scale-out workloads, we have found the 128-core Ampere Altra Max CPU to be well suited to a range of Arm’s EDA workloads. Importantly, it also draws less power than our legacy Intel Xeon-based servers and our AMD EPYC-based servers.

Measuring system power

Here are power measurements taken from our HPE RL300 server with 128 Arm Neoverse N1 cores, compared to running the same workload on our servers equipped with 128 cores of 3rd Gen and 4th Gen AMD EPYC x86 CPUs.

Over a 1-week period, the HPE RL300 server reaches a peak consumption of 455 watts. This RL300 system includes a single-socket Ampere Altra Max M128-30 CPU, 16x 64GB DDR4 DIMMs, and a single 1.6TB NVMe drive.

Figure 1: Power histogram of the HPE RL300 with an Arm CPU running an EDA batch workload.

The server equipped with 3rd Gen EPYC x86 CPUs peaks at over 1150 watts. This server has two sockets (128 cores total) of 3rd Gen AMD EPYC, 32x 64GB DDR4 DIMMs, and a single 1.6TB NVMe drive.

Figure 2: Power histogram of the HPE DL365 with x86 CPUs running an EDA batch workload

Finally, the server equipped with the latest 4th Gen EPYC x86 CPUs peaks at almost 1400 watts! This system has two sockets (128 cores total) of 4th Gen AMD EPYC, 24x 64GB DDR5 DIMMs, and a single 1.6TB NVMe drive.

Figure 3: Power histogram of the Dell R7626 with x86 CPUs running an EDA batch workload

The lower system power we get from deploying Arm translates directly into a key productivity benefit: higher core density per rack for our engineers. For many EDA workloads, the unit we care about is the “slot”, that is, a single CPU core. More slots mean higher throughput and more productivity.
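As a rough illustration of what these measurements mean per slot, the minimal Python sketch below divides each quoted peak power figure by the 128 cores in each system. The x86 values are only quoted as "over 1150" and "almost 1400" watts, so they are approximations, and the function name is ours for illustration only. Peak power is also not the same as average power, so this is a worst-case view.

```python
# Peak system power (watts) as quoted in the text. The x86 figures are
# rounded ("over 1150", "almost 1400"), so treat them as approximations.
PEAK_WATTS = {
    "RL300, Ampere Altra Max (single socket)": 455,
    "3rd Gen AMD EPYC (dual socket)": 1150,
    "4th Gen AMD EPYC (dual socket)": 1400,
}
CORES_PER_SERVER = 128  # every system in the comparison has 128 cores


def peak_watts_per_slot(peak_watts: float, cores: int = CORES_PER_SERVER) -> float:
    """Spread peak system power evenly across cores (one core = one slot)."""
    return peak_watts / cores


if __name__ == "__main__":
    for name, watts in PEAK_WATTS.items():
        print(f"{name}: {peak_watts_per_slot(watts):.2f} W per slot at peak")
```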

Core density and cost advantages

Arm’s on-premises datacenters support racks with power densities ranging from 20kW up to 30kW per rack. Regardless of rack power, using the Ampere Altra Max CPU in the HPE RL300, we can support double the core density compared to using x86.

Figure 4: Comparison of core density per rack using Arm or x86-based CPUs
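To show how server power feeds into rack density, here is a minimal sketch that estimates a power-only upper bound on cores per rack from the peak figures measured earlier. It is not how Figure 4 was produced: the function name and the optional `max_servers` cap (a stand-in for rack space or switch-port limits) are hypothetical, and real deployments also reserve headroom for networking and power distribution, so actual numbers will differ.

```python
from typing import Optional


def cores_per_rack(
    rack_watts: int,
    server_peak_watts: int,
    cores_per_server: int = 128,
    max_servers: Optional[int] = None,  # hypothetical cap for rack space or ports
) -> int:
    """Power-only upper bound on the number of cores that fit in one rack."""
    servers = rack_watts // server_peak_watts  # whole servers only
    if max_servers is not None:
        servers = min(servers, max_servers)
    return servers * cores_per_server


if __name__ == "__main__":
    # Peak watts as quoted in the text; the x86 values are approximate.
    systems = {"Altra Max": 455, "3rd Gen EPYC": 1150, "4th Gen EPYC": 1400}
    for rack_watts in (20_000, 30_000):
        for name, peak in systems.items():
            print(f"{rack_watts // 1000} kW rack, {name}: "
                  f"{cores_per_rack(rack_watts, peak)} cores")
```

Even with a space cap applied to the Arm servers, the power-only bound stays at or above twice the x86 core count under these assumptions, which is consistent with the doubling described above.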

This superior Arm core density offers deployment flexibility to our IT team. They can replace aging x86-based systems with Arm and double throughput capacity. They can replace x86 servers with Arm one-for-one and free up space and power to make room for future workloads, like generative AI. And being able to fill a rack with servers means not having to cross-cable between racks to achieve better network switch port utilization. This core-density advantage also edges us closer to a goal we’ve set for 2024, which is to be running at least 50% of our on-prem EDA cluster infrastructure on Arm.

From a price-per-core standpoint, Ampere Altra Max is also a bargain compared to our x86 CPUs. Comparing 128-core systems, AMD EPYC CPUs cost approximately 4-5 times more per core than Ampere Altra Max. Like the benefit we see in watts per server, this cost-per-core difference allows us to deploy more Arm cores for the same budget.

Figure 5: Normalized cost-per-core comparison of Arm and x86-based CPUs

Note: Arm uses three different system configurations from HPE.

RL300 Gen11 with a single 128-core Ampere® Altra® Max M128-30 CPU
DL365 Gen10 with dual 64-core AMD EPYC 7773X (‘Milan-X’) CPUs
DL385 Gen11 with dual 64-core AMD EPYC 9384X (‘Genoa-X’) CPUs

Each system comes with a slightly different memory configuration (750GB to 1TB per CPU) and different support contract options. For this comparison we have backed out all costs except for base system configuration and CPU. As pricing varies per customer, please view this as an approximation (your mileage may vary).
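To make the budget effect concrete, the sketch below works in normalized units, taking the Ampere Altra Max cost per core as 1.0 and the x86 cost per core as 4.0 to 5.0, per the approximate ratio above. The budget value and function name are hypothetical, and no real prices are implied.

```python
def cores_for_budget(budget: float, cost_per_core: float) -> int:
    """Number of whole cores a budget buys at a given normalized cost per core."""
    return int(budget // cost_per_core)


if __name__ == "__main__":
    budget = 1280.0  # hypothetical budget, in units of one Altra Max core
    # Normalized cost per core: Arm = 1.0, x86 = 4.0 to 5.0 (approximate).
    for label, cost in [("Altra Max", 1.0), ("x86, low end", 4.0), ("x86, high end", 5.0)]:
        print(f"{label}: {cores_for_budget(budget, cost)} cores")
```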

A significant number of software packages that Arm engineers need are available on Arm today, and we expect that the full range of tools will become available on Arm in the near future both for our own use as well as the wider ecosystem.

HPE RL300 Gen11 Smart Buy

If you have been thinking about trying out the HPE RL300 for yourself, right now is the perfect time. HPE just announced a new RL300 Gen11 Smart Buy BTO Offer, available through its global fulfilment partners. The server ships as bare metal, but a wide range of operating systems, container environments, and applications are supported on Arm. If you have questions about whether the application you care about is supported, or what version is needed, our Software Ecosystem Dashboard for Arm tracks this for you. You can also find many workload-based Learning Paths on the Arm Developer Hub or access tutorials directly from Ampere.

We are excited about the core-density and power advantages the HPE RL300 Gen11 and Ampere Altra Max CPU bring to our on-prem EDA environment, which is why we are currently completing our largest ever installation in our Austin datacenter. We hope to share more findings on our experience with these servers at a later date.

Learn more about HPE RL300 Gen 11 Smart Buy BTO here:

Learn more
