Seismic wave propagation simulation is key to several HPC workloads. On the one hand, it is at the heart of algorithms such as Reverse Time Migration (RTM) and Full Waveform Inversion (FWI), used by the Oil & Gas community to image the subsurface. On the other hand, it is a key tool to better understand the effects of earthquakes, evaluate seismic hazards, and estimate potential damage to buildings. The discretization of the wave equation has therefore been a hot research topic over the past decades, leading to several families of numerical methods for simulating wave propagation: Spectral Element Methods (SEM), Finite Element Methods (FEM), Discontinuous Galerkin Methods (DGM), and Finite-Difference Methods (FDM). In a previous blog we showed the readiness of AWS Graviton2 for seismic wave modeling with SPECFEM3D, which is based on a spectral element method. In the present blog, we study the readiness of AWS Graviton2 for a finite-difference based application: SW4Lite.
SW4 [1] was developed at Lawrence Livermore National Laboratory (LLNL) with financial support from the US Department of Energy. SW4 stands for Seismic Waves, 4th order. SW4 simulates the propagation of seismic waves in three-dimensional heterogeneous material models. It uses a fourth-order (in space and time) finite-difference discretization of the elastic wave equations in displacement formulation [2],[3]. The dissipative nature of realistic materials can also be modeled with a viscoelastic propagation behavior [4]. SW4 is written in C++ and Fortran. Both shared- and distributed-memory parallelization of the code are available through a hybrid MPI and OpenMP implementation; GPU and RAJA versions also exist. The SW4 source code is available for download here. SW4Lite is a proxy application implementing some of the most compute-intensive kernels of SW4 together with its communication scheme. SW4Lite belongs to the ECP Proxy Application Suite; it is used to evaluate and compare different hardware solutions, and in co-design activities aimed at overcoming exascale challenges. The SW4Lite source code can be downloaded here.
Building SW4Lite requires C++ and Fortran compilers. In our case, we used version 9.3 of the GNU compilers, with Open MPI 4.0.3 as the MPI implementation. A BLAS implementation is also required; we used the Arm Performance Libraries (ArmPL). First, retrieve the source code from the Git repository.
git clone https://github.com/geodynamics/sw4lite.git
Insert in the Makefile a subsection as follows:
else ifeq ($(findstring C6g.16xlarge,$(HOSTNAME)),C6g.16xlarge)
  FC = mpif90
  CXX = mpic++
  OMPOPT = -fopenmp
  OPT = -Ofast -mcpu=native -I${ARMPL_INCLUDES}
  EXTRA_LINK_FLAGS = -L${ARMPL_LIBRARIES} -mcpu=native -larmpl_lp64_mp -lgfortran -lm
  openmp = yes
  debug = no
  computername := C6g.16xlarge
Make sure to replace C6g.16xlarge with your instance's actual hostname in the else ifeq statement. Then simply use the Makefile to compile:
make
This will compile the Fortran version of SW4Lite's kernels. If the compilation is successful, one should see SW4 Lives! in the terminal:
SW4 Lives!
Alternatively, the C version of the kernels can be compiled with:
make ckernel=yes
In this study we focused on the Fortran version.
SW4Lite comes with a small test suite to validate the results. For the Fortran version, one needs to change the parameter corder to 0 in the following files:
sw4lite/pytest/reference/pointsource/pointsource.in
sw4lite/pytest/reference/topo/curvilinear.in
The test script can then be launched as shown below. The "-l" parameter sets the test level (0, 1 or 2), "-m" sets the number of MPI ranks, "-t" the number of OpenMP threads, and most importantly "-d" gives the path to the SW4Lite binary being tested.
python3 ./test_sw4lite.py -v -l 2 -m 8 -t 1 -d optimize_mp_C6g.16xlarge
[…]
Test # 1 Input file: pointsource.in PASSED
[…]
Test # 2 Input file: curvilinear.in PASSED
[…]
Out of 2 tests, 0 failed and 2 passed
Several test cases and input files are provided with the SW4Lite repository. As an example, one can launch the LOH.1-h100 problem available in the directory sw4lite/tests/loh1/ as detailed below.
export OMP_NUM_THREADS=1
mpirun -n 64 --bind-to core sw4lite LOH.1-h100.in
We propose here to compare AWS Graviton2 (C6g.16xlarge) with the latest x86-based instances available in AWS EC2: Intel (C5n.18xlarge) and AMD (C5a.16xlarge). To compare the instances, we ran SW4Lite on different test cases provided in the SW4Lite repository and normalized the execution time of each test case, taking the time on the C6g.16xlarge instance as the reference. Since each AWS EC2 instance has a different cost, we focus on the normalized cost of a simulation rather than the normalized execution time. At the time of writing, for the N. Virginia region, the cost of the instances was:
Taking into account the execution time for each simulation in Figure 1, the AWS Graviton2 (C6g.16xlarge) shows on average a 25% cost reduction compared to other instances.
Figure 1 - Simulation price comparison across different AWS EC2 instances
Realistic simulations require more computational power than a single node can provide; it is common for such computations to use hundreds or even thousands of cores. It is important to make sure that using more resources brings a benefit in terms of execution time, and in particular that the communication network between the nodes is not a limiting factor. If the execution time is reduced proportionally to the increase in compute resources, we say that the application scales well on the system.
We tested the strong scaling behavior of SW4Lite. For the fixed-size problem LOH1-h50.in provided in the sw4lite/tests/loh1 directory, we increased the computational resources from one node to eight nodes and looked at the elapsed time.
Figure 2 shows that SW4Lite scales well across several AWS Graviton2 instances (C6g.16xlarge). As one would expect on an on-premises HPC cluster, the application first scales linearly with the increase in compute power, and then starts to flatten for higher numbers of resources, when communication begins to dominate computation.
Figure 2 - SW4Lite strong scaling study over several AWS EC2 C6g.16xlarge instances
We ported SW4Lite to AWS Graviton2, which is based on Arm Neoverse N1 cores. Not only was the compilation straightforward, with no modification of the source code required, but excellent performance was also easy to achieve. The readiness of AWS EC2 C6g instances was demonstrated by the results on different test cases, and good scalability was shown across several C6g instances.
At the time of writing, the cost per simulation for SW4Lite on C6g was on average 25% lower than on the C5a and C5n instances.
[CTAToken URL = "https://www.arm.com/solutions/infrastructure/high-performance-computing" target="_blank" text="Explore HPC on Arm" class ="green"]
[1] Petersson, N.A. and B. Sjogreen (2017). SW4 v2.0. Computational Infrastructure for Geodynamics, Davis, CA. DOI: 10.5281/zenodo.1045297.
[2] Petersson, N.A. and B. Sjogreen (2015). Wave propagation in anisotropic elastic materials and curvilinear coordinates using a summation-by-parts finite difference method, Journal of Computational Physics, 299, 820-841. DOI: 10.1016/j.jcp.2015.07.023.
[3] Sjogreen, B. and N.A. Petersson (2012). A Fourth Order Accurate Finite Difference Scheme for the Elastic Wave Equation in Second Order Formulation, Journal of Scientific Computing, 52 (1), 17-48. DOI: 10.1007/s10915-011-9531-1.
[4] Petersson, N.A. and B. Sjogreen (2012). Stable and efficient modeling of anelastic attenuation in seismic wave propagation, Communications in Computational Physics, 12 (01), 193-225.