The Arm ecosystem is growing by leaps and bounds across all our lines of business. Whether that’s Microsoft Copilot+ PC in Client, continued growth of cloud (AWS, Google, Microsoft) in Infrastructure, or success with IVI or ADAS in Automotive, Arm’s customers demand a steady drumbeat of new CPU innovation. To support this growth, Arm IT and Productivity Engineering teams must decide where and when to add new capacity, and which vendors or technologies to base it on.
We’ve written many times about Arm’s use of EDA in the cloud and recently shared how improvements in Arm Neoverse per-core performance are letting us move more EDA workloads to the Arm architecture. This blog focuses on our on-premises clusters. These clusters have historically been x86-based, but we continue to move EDA workloads to the Arm architecture as new tools and systems become available. We recently installed several hundred HPE ProLiant RL300 Gen11 servers - using the Neoverse N1-based Ampere® Altra® Max CPU - into our Austin datacenter and wanted to share our early observations. Arm-based servers, like the HPE RL300, offer a unique combination of performance and power-efficiency, and that has significant real-world benefits from a cost, space, and power perspective.
HPE announced the ProLiant RL300 Gen11 server at their Discover 2023 event, and Arm became an early customer. While the server is targeted at cloud-native scale-out workloads, we’ve found the 128-core Ampere Altra Max CPU to be very well suited to a range of Arm’s EDA workloads. Importantly, it also draws less power than our legacy Intel Xeon-based servers and our AMD EPYC-based servers.
Here are power measurements taken from our HPE RL300 server with 128 cores of Arm Neoverse N1, compared to running the same workload on our servers equipped with 128 cores of 3rd Gen and 4th Gen AMD EPYC x86 CPUs.
Over a 1-week period, the HPE RL300 server peaks at 455 watts. This RL300 system includes a single-socket Ampere Altra Max M128-30 CPU, 16x 64GB DDR4 DIMMs, and a single 1.6TB NVMe drive.
Figure 1. Power histogram of the HPE RL300 with an Arm CPU running an EDA batch workload
The server equipped with 3rd Gen EPYC x86 CPUs peaks at over 1150 watts. This server has two sockets (128 cores total) of 3rd Gen AMD EPYC, 32x 64GB DDR4 DIMMs, and a single 1.6TB NVMe drive.
Figure 2. Power histogram of the HPE DL365 with x86 CPUs running an EDA batch workload
Finally, the server equipped with the latest 4th Gen EPYC x86 CPUs peaks at almost 1400 watts! This system has two sockets (128 cores total) of 4th Gen AMD EPYC, 24x 64GB DDR5 DIMMs, and a single 1.6TB NVMe drive.
Figure 3. Power histogram of the Dell R7626 with x86 CPUs running an EDA batch workload
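To put the three peak readings on a common footing, they can be divided by the 128 cores in each server. The sketch below is only back-of-the-envelope arithmetic on the figures quoted above: the x86 values are described as "over 1150" and "almost 1400" watts, so they are approximations, and the result is whole-system power (including the differing memory configurations), not CPU power. The function and dictionary names are our own for illustration.

```python
# Peak whole-server power as quoted in this post. The two x86 values are
# approximate ("over 1150" and "almost 1400" watts).
PEAK_WATTS = {
    "HPE RL300 (Ampere Altra Max)": 455,
    "3rd Gen EPYC, dual socket": 1150,
    "4th Gen EPYC, dual socket": 1400,
}
CORES_PER_SERVER = 128  # every system compared here has 128 cores in total


def peak_watts_per_core(peak_watts: float, cores: int = CORES_PER_SERVER) -> float:
    """Return peak whole-server power divided evenly across its cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return peak_watts / cores


if __name__ == "__main__":
    for name, watts in PEAK_WATTS.items():
        print(f"{name}: {peak_watts_per_core(watts):.2f} W per core at peak")
```

Because the servers differ in DIMM count and memory generation, treat the per-core numbers as a system-level comparison and not as a measurement of the CPUs alone.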
The lower system power we get from deploying Arm translates directly into a key productivity benefit: higher core density per rack for our engineers. For many EDA workloads, the unit we care about is the “slot”, that is, a single CPU core. More slots mean higher throughput and more productivity.
Arm’s on-premises datacenters support racks with power densities ranging from 20kW up to 30kW per rack. Regardless of rack power, using the Ampere Altra Max CPU in the HPE RL300, we can support double the core density compared to using x86.
Figure 4. Comparison of core density per rack using Arm or x86-based CPUs
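As a rough illustration of how peak server power bounds core density, the sketch below divides a rack power budget by a server's peak draw and multiplies by the cores per server. It is a power-only estimate under our own simplifying assumptions: it ignores rack units, network switches, and any power headroom, so it will not reproduce the exact figures behind Figure 4. The function names are hypothetical.

```python
def servers_per_rack(rack_watts: float, peak_server_watts: float) -> int:
    """Number of servers whose combined peak power fits in the rack budget.

    Power-only bound: rack space, switches, and headroom are not modeled.
    """
    if peak_server_watts <= 0:
        raise ValueError("peak_server_watts must be positive")
    return int(rack_watts // peak_server_watts)


def cores_per_rack(rack_watts: float, peak_server_watts: float,
                   cores_per_server: int = 128) -> int:
    """Total cores ("slots") per rack under the same power-only bound."""
    return servers_per_rack(rack_watts, peak_server_watts) * cores_per_server


if __name__ == "__main__":
    # Approximate peak readings quoted in this post, in watts.
    peaks = {"Arm (RL300)": 455, "3rd Gen EPYC": 1150, "4th Gen EPYC": 1400}
    for rack_watts in (20_000, 30_000):  # the 20kW and 30kW rack budgets
        for name, watts in peaks.items():
            cores = cores_per_rack(rack_watts, watts)
            print(f"{rack_watts // 1000}kW rack, {name}: {cores} cores")
```

With these inputs, the power-only estimate gives the Arm servers more than twice the cores per rack of either x86 configuration at both 20kW and 30kW. Physical rack space and power headroom, which this sketch leaves out, also limit real deployments.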
This superior Arm core density offers deployment flexibility to our IT team. They can replace aging x86-based systems with Arm and double throughput capacity. Alternatively, they can replace x86 servers with Arm one-for-one and free up space and power to make room for future workloads, like generative AI. And being able to fill a rack with servers means not having to cross-cable between racks to achieve better network switch port utilization. This core-density advantage also edges us closer to a goal we’ve set for 2024, which is to be running at least 50% of our on-prem EDA cluster infrastructure on Arm.
From a price-per-core standpoint, Ampere Altra Max is also a bargain compared to our x86 CPUs. Comparing 128-core systems, AMD EPYC CPUs cost approximately 4-5 times more per core than Ampere Altra Max. As with the benefit we see in watts per server, this cost-per-core difference allows us to deploy more Arm cores for the same budget.
Figure 5. Normalized cost-per-core comparison of Arm and x86-based CPUs
Note: Arm uses three different system configurations from HPE.
Each system comes with a slightly different memory configuration (750GB to 1TB per CPU), and support contract options. For this comparison we have backed out all costs except for base system configuration and CPU. As pricing varies per customer, please view this as an approximation (your mileage may vary).
A significant number of the software packages that Arm engineers need are available on Arm today, and we expect the full range of tools to become available on Arm in the near future, both for our own use and for the wider ecosystem.
If you’ve been thinking about trying out the HPE RL300 for yourself, right now is the perfect time. HPE just announced a new RL300 Gen11 Smart Buy BTO Offer, available through its global fulfilment partners. The server ships as bare metal, but a wide range of operating systems, container environments, and applications are supported on Arm. If you have questions about whether the application you care about is supported, or what version is needed, our Software Ecosystem Dashboard for Arm tracks this for you. You can also find many workload-based Learning Paths on the Arm Developer Hub or access tutorials directly from Ampere.
We are excited about the core-density and power advantages the HPE RL300 Gen11 and Ampere Altra Max CPU bring to our on-prem EDA environment, which is why we are currently completing our largest ever installation in our Austin datacenter. We hope to share more findings on our experience with these servers at a later date.
Learn more about the new HPE RL300 Gen11 Smart Buy BTO