Assessing Seismic Wave Modelling on AWS Graviton2 with SW4Lite

September 9, 2020

Seismic wave propagation simulation is at the core of several HPC workloads. On the one hand, it is at the heart of algorithms such as Reverse Time Migration (RTM) and Full Waveform Inversion (FWI), which the Oil & Gas industry uses to image the subsurface. On the other hand, it is a key tool for better understanding the effects of earthquakes, evaluating seismic hazards and estimating potential damage to buildings.

The discretization of the wave equation has therefore been an active research topic over the past decades, which has led to several numerical methods for simulating wave propagation: Spectral Element Methods (SEM), Finite Element Methods (FEM), Discontinuous Galerkin Methods (DGM) and Finite-Difference Methods (FDM).

In a previous blog post, we showed the readiness of AWS Graviton2 for seismic wave modeling with SPECFEM3D, which is based on a spectral element method. In this post, we study the readiness of AWS Graviton2 for a finite-difference based application: SW4Lite.

Background to SW4 and SW4Lite

SW4 [1] was developed at Lawrence Livermore National Laboratory (LLNL) with financial support from the US Department of Energy.

SW4 stands for Seismic Waves, fourth order. SW4 simulates the propagation of seismic waves in three-dimensional heterogeneous material models. It uses a finite-difference discretization of the elastic wave equations in displacement formulation that is fourth-order accurate in both space and time [2],[3]. The dissipative behavior of realistic materials can also be modelled through viscoelastic wave propagation [4].
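To illustrate what fourth-order accuracy in space means, here is a minimal Python sketch of the standard five-point, fourth-order central difference for a second derivative. It is not SW4's implementation: SW4 uses summation-by-parts operators with special boundary closures and curvilinear grids, and the function names below are hypothetical. Halving the grid spacing h should reduce the error by roughly 2^4 = 16, so the observed order should come out close to 4.

```python
import math


def d2_fourth_order(f, x, h):
    """Fourth-order central difference approximation of f''(x) (5-point stencil).

    Interior stencil only; no boundary closure is handled here.
    """
    return (-f(x - 2 * h) + 16 * f(x - h) - 30 * f(x)
            + 16 * f(x + h) - f(x + 2 * h)) / (12 * h * h)


def observed_order(f, d2f_exact, x, h):
    """Estimate the order of accuracy from errors at h and h/2: log2(err(h) / err(h/2))."""
    err_h = abs(d2_fourth_order(f, x, h) - d2f_exact(x))
    err_h2 = abs(d2_fourth_order(f, x, h / 2) - d2f_exact(x))
    return math.log2(err_h / err_h2)


if __name__ == "__main__":
    # f(x) = sin(x) has the exact second derivative -sin(x).
    order = observed_order(math.sin, lambda x: -math.sin(x), x=1.0, h=0.1)
    print(f"observed order of accuracy: {order:.2f}")
```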

SW4 is written in C++ and Fortran. It supports both shared- and distributed-memory parallelism through a hybrid MPI and OpenMP implementation. GPU and RAJA versions are also available.

SW4 source code is distributed by the Computational Infrastructure for Geodynamics (CIG) [1].

SW4Lite is a proxy application that implements some of the most compute-intensive kernels of SW4, together with its communication scheme. SW4Lite belongs to the Exascale Computing Project (ECP) Proxy Application Suite. It is used to evaluate and compare hardware solutions, and in co-design activities aimed at overcoming exascale challenges.

SW4Lite source code can be downloaded from https://github.com/geodynamics/sw4lite.

Building SW4Lite on AWS Graviton2

Building SW4Lite requires C++ and Fortran compilers. We used version 9.3 of the GNU compilers and Open MPI 4.0.3 as the MPI implementation. An implementation of BLAS is also required; we used the Arm Performance Libraries (ArmPL).

First, clone the source code from the Git repository:

git clone https://github.com/geodynamics/sw4lite.git

Then add the following subsection to the chain of host-specific settings in the Makefile:

else ifeq ($(findstring C6g.16xlarge,$(HOSTNAME)),C6g.16xlarge)
FC = mpif90
CXX = mpic++
OMPOPT = -fopenmp
OPT = -Ofast -mcpu=native -I${ARMPL_INCLUDES}
EXTRA_LINK_FLAGS = -L${ARMPL_LIBRARIES} -mcpu=native -larmpl_lp64_mp -lgfortran -lm
openmp = yes
debug=no
computername := C6g.16xlarge

Replace both occurrences of C6g.16xlarge in the else ifeq condition with the hostname of your C6g.16xlarge instance, then run make to compile:

make

This compiles the Fortran version of the SW4Lite kernels. If the compilation succeeds, the following "SW4 Lives!" banner is printed in the terminal:

``'-.,_,.-'``'-.,_,.='``'-.,_,.-'``'-.,_,.='````'-.,_,.-'``'-.,_,.='``


  _________    ____      __      ____    _    __
 /   ____  \   \   \    /  \    /   /   / |  |  |
 |  |    \./    \   \  /    \  /   /   /  |  |  |
 |  |______      \   \/      \/   /   /   '--'  |
 \______   \      \              /    |______   |
        |  |       \     /\     /            |  |
 /`\____|  |        \   /  \   /             |  |
 \_________/         \_/    \_/              |__|

   __       __  ____    ____  _______    ______    __
  |  |     |  | \   \  /   / |   ____|  /    __|  |  |
  |  |     |  |  \   \/   /  |  |__     |   (__   |  |
  |  |     |  |   \      /   |   __|    \__    |  |  |
  |  `----.|  |    \    /    |  |____    __)   |  |__|
  |_______||__|     \__/     |_______|  (_____/   (__)


``'-.,_,.-'``'-.,_,.='``'-.,_,.-'``'-.,_,.='````'-.,_,.-'``'-.,_,.='``

For the C version, use the following command:

make ckernel=yes

In this study we focused on the Fortran version.

Validation

SW4Lite comes with a small test suite to validate the results.

For the Fortran version, set the parameter corder to 0 in the following files:

sw4lite/pytest/reference/pointsource/pointsource.in
sw4lite/pytest/reference/topo/curvilinear.in
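Editing the two input files by hand works fine; for repeated setups, a small Python script can make the change. This is a minimal sketch that assumes corder appears in each file as a key=value token such as corder=1 (SW4-style input files use key=value syntax). The script name set_corder.py is hypothetical, and files without such a token are reported so they can be edited by hand.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Assumption: the parameter appears as a "corder=<integer>" token on an input line.
CORDER_TOKEN = re.compile(r"\bcorder=\d+")


def set_corder(text: str, value: int = 0) -> tuple[str, int]:
    """Return the text with every corder=<n> token set to value, plus the replacement count."""
    return CORDER_TOKEN.subn(f"corder={value}", text)


def patch_file(path: Path, value: int = 0) -> int:
    """Rewrite the file in place only if at least one corder token was found."""
    new_text, count = set_corder(path.read_text(), value)
    if count:
        path.write_text(new_text)
    return count


if __name__ == "__main__":
    for name in sys.argv[1:]:
        n = patch_file(Path(name))
        if n:
            print(f"{name}: {n} corder token(s) set to 0")
        else:
            print(f"{name}: no corder=<n> token found, edit by hand")
```

Usage: python3 set_corder.py sw4lite/pytest/reference/pointsource/pointsource.in sw4lite/pytest/reference/topo/curvilinear.in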

The "-l" parameter sets the test level (0, 1 or 2), "-m" sets the number of MPI ranks, "-t" the number of OpenMP threads and, most importantly, "-d" gives the location of the SW4Lite binary being tested:

  python3 ./test_sw4lite.py -v -l 2 -m 8 -t 1 -d optimize_mp_C6g.16xlarge

Both tests passed in our case:

[…]
Test # 1 Input file: pointsource.in PASSED
[…]
Test # 2 Input file: curvilinear.in PASSED
[…]
Out of 2 tests, 0 failed and  2 passed
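When the validation is scripted (for example in a CI job), it can help to check the summary line automatically. The following minimal Python sketch assumes the summary keeps the "Out of N tests, F failed and P passed" format shown above; the script name check_summary.py is hypothetical.

```python
import re
import sys

# Assumed summary format, as printed by test_sw4lite.py above (note the variable spacing).
SUMMARY = re.compile(r"Out of (\d+) tests?, (\d+) failed and\s+(\d+) passed")


def parse_summary(output):
    """Return (total, failed, passed) from the last summary line, or None if absent."""
    matches = SUMMARY.findall(output)
    if not matches:
        return None
    total, failed, passed = (int(v) for v in matches[-1])
    return total, failed, passed


def suite_passed(output):
    """True only if a summary was found, nothing failed and every test passed."""
    summary = parse_summary(output)
    return summary is not None and summary[1] == 0 and summary[2] == summary[0]


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as log:
            ok = suite_passed(log.read())
        print("all tests passed" if ok else "failures or no summary found")
        sys.exit(0 if ok else 1)
    print("usage: python3 check_summary.py <test-log>")
```

For example, pipe the test run through tee test.log and then run python3 check_summary.py test.log; the non-zero exit status on failure lets a CI job stop early.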

Running SW4Lite on AWS Graviton2

Several test cases and input files are provided in the SW4Lite repository. For example, the LOH.1-h100 problem in the sw4lite/tests/loh1/ directory can be run with one MPI rank per core on the 64 cores of a C6g.16xlarge instance:

export OMP_NUM_THREADS=1
mpirun -n 64 --bind-to core sw4lite LOH.1-h100.in
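The command above uses pure MPI (64 ranks, 1 thread each). Because SW4Lite also supports hybrid MPI and OpenMP, other layouts whose ranks × threads product equals 64 can be tried. The Python sketch below enumerates them and builds Open MPI launch lines. The --map-by slot:PE=<threads> option, which reserves several cores per rank, is an assumption about your Open MPI version's mapping syntax; check your mpirun man page before relying on it.

```python
from __future__ import annotations


def hybrid_layouts(cores: int) -> list[tuple[int, int]]:
    """All (MPI ranks, OpenMP threads per rank) pairs whose product equals cores."""
    return [(cores // t, t) for t in range(1, cores + 1) if cores % t == 0]


def launch_command(ranks: int, threads: int, input_file: str) -> str:
    """Build an Open MPI launch line for one layout.

    With more than one thread, PE=<threads> (assumed syntax) gives each rank
    that many cores, so its threads do not share a single core.
    """
    if threads == 1:
        mapping = "--bind-to core"
    else:
        mapping = f"--map-by slot:PE={threads} --bind-to core"
    # -x exports OMP_NUM_THREADS to ranks launched on other nodes as well.
    return (f"OMP_NUM_THREADS={threads} mpirun -n {ranks} -x OMP_NUM_THREADS "
            f"{mapping} sw4lite {input_file}")


if __name__ == "__main__":
    # A C6g.16xlarge exposes 64 vCPUs, each a physical Arm Neoverse N1 core.
    for ranks, threads in hybrid_layouts(64):
        print(launch_command(ranks, threads, "LOH.1-h100.in"))
```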

Single node performance

We compare AWS Graviton2 (C6g.16xlarge) with the latest x86-based instances available on AWS EC2 at the time of writing: Intel (C5n.18xlarge) and AMD (C5a.16xlarge). We ran several test cases provided in the SW4Lite repository on each instance and normalized the execution time of each test case, taking the time on the C6g.16xlarge instance as the reference. Because each AWS EC2 instance has a different hourly price, we focus on the normalized cost of a simulation (execution time multiplied by hourly price) rather than on normalized execution time. At the time of writing, the prices of the instances in the N. Virginia region were:

Instance	Price
C6g.16xlarge	2.176 USD/h
C5a.16xlarge	2.464 USD/h
C5n.18xlarge	3.888 USD/h

Taking into account the execution time of each simulation (Figure 1), AWS Graviton2 (C6g.16xlarge) shows an average cost reduction of 25% compared with the other instances.

Figure 1 - Simulation price comparison across different AWS EC2 instances
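The normalized cost used above can be reproduced with a few lines of Python. This minimal sketch uses the hourly prices from the table; the normalized execution times must come from your own measurements, since the data behind Figure 1 is not reproduced here. It also computes a break-even speedup: C5n.18xlarge costs about 1.79 times as much per hour as C6g.16xlarge, so it would need to finish a simulation about 1.79 times faster just to match C6g's cost per simulation.

```python
# Hourly prices (USD/h) in N. Virginia at the time of writing, from the table above.
PRICES_USD_PER_HOUR = {
    "C6g.16xlarge": 2.176,
    "C5a.16xlarge": 2.464,
    "C5n.18xlarge": 3.888,
}
REFERENCE = "C6g.16xlarge"


def normalized_cost(instance, normalized_time):
    """Cost of one simulation relative to the reference instance.

    normalized_time is the run time on `instance` divided by the run time
    of the same test case on the reference instance.
    """
    return normalized_time * PRICES_USD_PER_HOUR[instance] / PRICES_USD_PER_HOUR[REFERENCE]


def break_even_speedup(instance):
    """How many times faster than the reference an instance must run to cost the same."""
    return PRICES_USD_PER_HOUR[instance] / PRICES_USD_PER_HOUR[REFERENCE]


if __name__ == "__main__":
    for name in PRICES_USD_PER_HOUR:
        print(f"{name}: break-even speedup vs {REFERENCE} = {break_even_speedup(name):.2f}x")
```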

Multi-node performance

Realistic simulations require more computational power than a single node can provide, and such computations commonly use hundreds to thousands of cores. Adding resources is only worthwhile if it reduces the execution time, which in particular requires that the communication network between the nodes does not become a limiting factor. If the execution time decreases in proportion to the increase in compute resources, we say that the application scales well on the system.

We tested the strong scaling behavior of SW4Lite: for the fixed-size problem LOH1-h50.in provided in the sw4lite/tests/loh1 directory, we increased the computational resources from one to eight nodes and measured the elapsed time.

Figure 2 shows that SW4Lite scales well across several AWS Graviton2 (C6g.16xlarge) instances. As one would expect on an on-premises HPC cluster, the elapsed time first decreases linearly as compute power is added, and the curve then starts to flatten at higher node counts, when the time spent in communication becomes large relative to the time spent in computation.

Figure 2 - SW4Lite strong scaling study over several AWS EC2 C6g.16xlarge instances
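A strong scaling curve such as Figure 2 is usually summarized by the speedup S(N) = T(N0) / T(N) and the parallel efficiency E(N) = S(N) / (N / N0), where T(N) is the elapsed time on N nodes and N0 is the smallest node count tested; ideal scaling gives E(N) = 1. The following minimal Python sketch computes both from measured elapsed times. The command-line format of nodes:seconds pairs is an illustrative choice, and no measured values from this study are included.

```python
from __future__ import annotations

import sys


def strong_scaling(times: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map node count -> (speedup, parallel efficiency), relative to the smallest node count."""
    base_nodes = min(times)
    base_time = times[base_nodes]
    result = {}
    for nodes in sorted(times):
        speedup = base_time / times[nodes]
        # Ideal strong scaling: speedup equals the increase in node count (efficiency 1.0).
        efficiency = speedup / (nodes / base_nodes)
        result[nodes] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    # Pass your measured elapsed times as nodes:seconds pairs, e.g. 1:<t1> 2:<t2> 4:<t4> 8:<t8>.
    measured = {int(n): float(t) for n, t in (arg.split(":") for arg in sys.argv[1:])}
    if measured:
        for nodes, (speedup, eff) in strong_scaling(measured).items():
            print(f"{nodes} node(s): speedup {speedup:.2f}, efficiency {eff:.0%}")
```

Efficiency staying close to 100% corresponds to the linear part of the curve; a falling efficiency marks the point where communication starts to dominate.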

Summary

We ported SW4Lite to AWS Graviton2, which is based on Arm Neoverse N1 cores. Not only was compilation straightforward, with no modification of the source code required, but excellent performance was also easy to achieve.

The readiness of AWS EC2 C6g instances was demonstrated on several test cases, and SW4Lite showed good scalability across multiple C6g instances.

At the time of writing, the cost per simulation for SW4Lite on C6g was on average 25% lower than on the C5a and C5n instances.

References

[1] Petersson, N.A. and B. Sjogreen (2017). SW4 v2.0. Computational Infrastructure for Geodynamics, Davis, CA. DOI: 10.5281/zenodo.1045297.

[2] Petersson, N.A. and B. Sjogreen (2015). Wave propagation in anisotropic elastic materials and curvilinear coordinates using a summation-by-parts finite-difference method. Journal of Computational Physics, 299, 820-841. DOI: 10.1016/j.jcp.2015.07.023, URL: http://linkinghub.elsevier.com/retrieve/pii/S0021999115004684.

[3] Sjogreen, B. and N.A. Petersson (2012). A Fourth Order Accurate Finite Difference Scheme for the Elastic Wave Equation in Second Order Formulation. Journal of Scientific Computing, 52(1), 17-48. DOI: 10.1007/s10915-011-9531-1, URL: http://link.springer.com/10.1007/s10915-011-9531-1.

[4] Petersson, N.A. and B. Sjogreen (2012). Stable and efficient modeling of anelastic attenuation in seismic wave propagation. Communications in Computational Physics, 12(1), 193-225.
