Understanding the behavior of the Earth’s oceans has been critical throughout history. As climate change accelerates, modeling the ocean’s response is essential to financial, military, and environmental decisions. The HYbrid Coordinate Ocean Model (HYCOM) is a general circulation model used by a wide variety of agencies for both research and operational decisions.
HYCOM can run on systems ranging in size from a desktop to the largest HPC clusters. This variety of problem sizes makes HYCOM a natural fit for the cloud: flexible cloud HPC resources, such as AWS c6g instances, are designed for simulations with varying resource needs like HYCOM. In this blog, we present how HYCOM performs on AWS cloud resources and, in particular, why the AWS Graviton2 processor, based on Arm’s Neoverse N1 core, is the best choice for running HYCOM.
HYCOM is a general-circulation ocean modeling code developed by a consortium of academic, US government and commercial partners. The theoretical background dates to the 1980s when the hybrid coordinate model was first set forth. The model is isopycnal in the open, stratified ocean, but smoothly reverts to a terrain-following coordinate in shallow coastal regions, and to z-level coordinates in the mixed layer and unstratified seas. The hybrid coordinate extends the geographic range of applicability of traditional isopycnic coordinate circulation models (the basis of the present hybrid code), such as the Miami Isopycnic Coordinate Ocean Model (MICOM) and the Navy Layered Ocean Model (NLOM), toward shallow coastal seas and unstratified parts of the world ocean.
Instructions for building HYCOM can be found on the Arm community GitLab page and the HYCOM GitHub. HYCOM is a Fortran code with both MPI and OpenMP parallelism. In general, single-threaded MPI is recommended for large problem sizes, so no OpenMP threading was used in these tests. The default GCC 7.3 and OpenMPI 4.0.2 supplied on the AWS instances were used. Spack was used to build NetCDF (both the C and Fortran libraries), which the HYCOM-tools need to pre- and post-process data.
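As a rough sketch of the NetCDF step, assuming a working Spack installation (the package names `netcdf-c` and `netcdf-fortran` are the standard Spack package names; everything else about your environment may differ):

```shell
# Install the NetCDF C and Fortran libraries needed by the HYCOM-tools.
# Guarded so the sketch is a no-op on machines without Spack.
if command -v spack >/dev/null 2>&1; then
  status="spack-found"
  spack install netcdf-c netcdf-fortran   # build both libraries
  spack load netcdf-c netcdf-fortran     # put them on the paths for the build
else
  status="spack-missing"
fi
echo "$status"
```

With the libraries loaded, the HYCOM-tools build picks up the NetCDF include and library paths from the environment Spack exports.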
When building HYCOM or the HYCOM-tools, a configuration file must be selected based on the architecture and compiler. The configuration files provided by HYCOM serve as a general guideline for creating your own if necessary. The “intelGF-impi-sm-relo_mpi” configuration file was used as the basis for these tests. The only modification necessary for the Graviton2 is removing the “-m64” flag, since that flag is not defined for AArch64.
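The edit itself is a one-liner. The snippet below demonstrates it on a two-line stand-in file (the variable names and file name are illustrative; in practice you would run the `sed` command on your copy of the configuration file):

```shell
# Stand-in for a HYCOM configuration file containing the x86-only flag.
cat > config.demo <<'EOF'
FCFLAGS = -O2 -m64 -fPIC
CCFLAGS = -O2 -m64
EOF

# Strip every occurrence of the -m64 flag, which GCC on AArch64 rejects.
sed -i 's/ -m64//g' config.demo
cat config.demo
```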
The HYCOM GitHub contains sample datasets for testing. Two benchmarks were run for this study, both global ocean simulations that differ only in resolution. The smaller simulation, GLBt0.72, has a 500x382 grid and is well suited for quick single-node runs to check that HYCOM was built successfully. The larger simulation, GLBb0.08, is 4500x3298 and requires significant computational resources.
The first series of tests measured the single-node performance of instances using the smaller GLBt0.72 simulation. The HYCOM repository has detailed instructions and pre-made scripts for running this simulation. It is important to note that these scripts generate mesh partitions prior to running HYCOM, which produces processor counts that may look unusual at first but are based on dividing the parent mesh in the most stable fashion.
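To see why the counts come out irregular, consider the arithmetic the partitioner performs: it tiles the 2-D grid into an m-by-n block decomposition and, in typical setups, discards tiles that contain no ocean points, so the usable MPI task count is rarely a round number. The figures below are made up purely for illustration:

```shell
# Hypothetical numbers: a 16x12 tiling of the grid, 33 of whose tiles
# turn out to be entirely land and are dropped by the partitioner.
tiles_x=16
tiles_y=12
land_only=33
echo $(( tiles_x * tiles_y - land_only ))   # usable MPI tasks: 159
```

A count like 159 cannot be packed evenly onto 64-core instances, which is why the benchmark runs below use whatever task counts the partitioning script deems acceptable.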
A comparison was made between AWS c5a.16xlarge (AMD EPYC 7571), c5.18xlarge (Intel® Xeon® Platinum 8175M), c5n.18xlarge (Intel® Xeon® Platinum 8259CL), and c6g.16xlarge (AWS Graviton2) instances for a one-year simulation on the GLBt0.72 bathymetry. Figure 1 clearly shows that the AWS Graviton2 outperforms the other instances regardless of the number of MPI tasks.
Figure 1: Instance scaling for 1-year HYCOM GLBt0.72 simulation
A 3-day GLBb0.08 simulation was also tested across various processor counts. The HYCOM repository does not contain scripts for pre-processing the data for this experiment; detailed information on how this simulation was prepared can be found on the Arm community GitLab page. Once again, the processor counts were determined by the mesh partitioning script. They do not evenly match the number of cores per instance, but the counts selected match as closely as possible; the script did not produce any acceptable meshes in the 300-processor range.
Once again, Figure 2 shows that the Arm Neoverse-based AWS Graviton2 c6g.16xlarge instances outperform the other instance types.
Figure 2: Instance scaling comparison for 3-day HYCOM GLBb0.08 simulation
HYCOM has been demonstrated to run easily and perform well on AWS cloud instances. The AWS Graviton2-based c6g instances deliver the fastest time to solution of the instance types tested, and they are also offered at the lowest price. HYCOM users looking to leverage cloud resources for HPC applications would therefore see the lowest cost per simulation on the c6g instances.
Explore HPC on Arm