Physics-based three-dimensional numerical simulations are becoming more predictive and are an essential tool for improving the understanding of natural phenomena such as seismic wave propagation. These simulations are at the heart of popular Oil & Gas workflows such as Reverse Time Migration (RTM) and Full Waveform Inversion (FWI). Seismic wave modeling solvers need to take advantage of both post-petascale computing facilities and the latest features at the chip level. In practice, hundreds of thousands of high-end computing cores are routinely used for seismic inversion in the largest computing facilities. These applications rely on robust software stacks and ecosystems, including optimized compiler toolchains and math libraries.
SPECFEM3D is a leading software package dedicated to seismic wave modeling [1]. Its main author, D. Komatitsch (1970-2019), received numerous awards, including the Gordon Bell Prize at the Supercomputing conference in 2003, for breakthroughs in large-scale computational geophysics [3].
The code is implemented in Fortran 95 and relies heavily on the MPI library, though OpenMP multithreading is also available as an option. Most of the MPI communication occurs between neighboring subdomains to exchange boundary information. Communication and computation phases are overlapped in the explicit time-marching scheme. The spectral-element method is computationally intensive, and the computation of the internal forces can account for as much as 85% of the application runtime. The code simulates acoustic (fluid), elastic (solid), coupled acoustic-elastic, or poroelastic seismic wave propagation on any type of conforming mesh of hexahedra (structured or unstructured). It can, for instance, model seismic waves propagating in sedimentary basins or any other regional geological model. It can also be used for non-destructive testing or for ocean acoustics. The code is publicly available for download.
For this study, we consider the most recent version of SPECFEM3D Cartesian (2.0), available via the Git repository. We use the Arm Compiler for Linux (ACfL) with the Arm Performance Libraries (ArmPL), and Open MPI 4.0.3. The compilation of this application is straightforward, with no modification of the source code. Further details, as well as instructions for compiling the SPECFEM3D_Globe version of the code, are available on the Arm HPC Users Group Packages Wiki.
# git clone --recursive https://github.com/geodynamics/specfem3d.git
# export FCFLAGS="-O3 -mcpu=native"
# ./configure FC=armflang CC=armclang MPIFC=mpif90 --with-mpi
# make
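Before running configure, it can be worth confirming which compilers and MPI wrappers are picked up on the PATH. The commands below are only a suggested sanity check; their output depends on the installed ACfL and Open MPI releases.

# armflang --version
# mpif90 --version
# mpirun --version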
We consider a standard three-dimensional benchmark available in the source code in the EXAMPLES/meshfem3D_examples/simple_model/ folder. This example describes a three-dimensional model with topography, an internal interface, and a mesh doubling layer. The 3D mesh is shown in Figure 1. Since the in-house mesher is used, the Cubit software package is not needed. This example is representative of typical SPECFEM3D usage.
Figure 1: View of the three-dimensional mesh (hexahedra).
We adjust the size of this benchmark for both single-node and multi-node runs. The first file to modify is DATA/Par_file, where the number of cores involved in the computation is specified by setting the NPROC variable. For benchmarking purposes, we also set NSTEP to control the number of time steps, since the simulation wall-clock time is directly proportional to the number of time steps:
# number of MPI processors
NPROC = 4
# time step parameters
NSTEP = 2000
DT = 0.03
Similarly, we adjust DATA/meshfem3D_files/Mesh_Par_file to specify the number of spectral elements along each horizontal direction (NEX_XI and NEX_ETA) and the MPI decomposition (NPROC_XI and NPROC_ETA). As the comments in the file indicate, the number of elements in each direction must be a multiple of the number of MPI processes along that direction (a multiple of 8 × NPROC when the mesh contains doubling layers, as is the case here).
# number of elements at the surface along edges of the mesh at the surface
# (must be 8 * multiple of NPROC below if mesh is not regular and contains mesh doublings)
# (must be multiple of NPROC below if mesh is regular)
NEX_XI = 64
NEX_ETA = 64
# number of MPI processors along xi and eta (can be different)
NPROC_XI = 2
NPROC_ETA = 2
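As an illustration of this constraint (not part of the SPECFEM3D tooling), a one-line shell check with the values above:

# NEX_XI must be a multiple of 8 * NPROC_XI because this mesh contains doubling layers (64 = 8 * 4 * 2)
NEX_XI=64; NPROC_XI=2
[ $(( NEX_XI % (8 * NPROC_XI) )) -eq 0 ] && echo "NEX_XI=$NEX_XI is compatible with NPROC_XI=$NPROC_XI"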
Also in DATA/meshfem3D_files/Mesh_Par_file, we define the different regions of the model:
# number of regions
NREGIONS = 4
# define the different regions of the model as :
#NEX_XI_BEGIN  #NEX_XI_END  #NEX_ETA_BEGIN  #NEX_ETA_END  #NZ_BEGIN  #NZ_END  #material_id
1              64           1               64            1          4        1
1              64           1               64            5          5        2
1              64           1               64            6          15       3
14             25           7               19            7          10       4
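With these parameters in place, the simulation follows the usual SPECFEM3D Cartesian sequence: run the in-house mesher, generate the databases, then launch the solver. The commands below are a minimal sketch, assuming the executables built by make are available under ./bin and everything is launched from the example directory with NPROC = 4; the run scripts shipped with the example may differ slightly.

# mpirun -np 4 ./bin/xmeshfem3D
# mpirun -np 4 ./bin/xgenerate_databases
# mpirun -np 4 ./bin/xspecfem3D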
The AWS M6g instances use AWS's own Graviton2 SoC, built on the Arm Neoverse N1 core. For this study, we use an m6g.16xlarge instance, which provides 64 Neoverse N1 cores, 25 Gbps of network bandwidth, and up to 19 Gbps of Elastic Block Store (EBS) bandwidth.
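Once an instance is running, the processor can be checked from the shell; on Graviton2, a recent lscpu reports the Neoverse-N1 model (the exact fields depend on the installed util-linux version):

# lscpu | grep -E 'Architecture|Model name|^CPU\(s\)'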
Figure 2 shows the seismograms obtained from the execution of the standard benchmark located in EXAMPLES/meshfem3D_examples/simple_model/. Reference results are available from the Git repository to facilitate the numerical validation of simulations. For three different receivers (virtual sensors located at the free surface that record the propagating wave), the velocity seismograms from the runs on AWS Graviton2 match the reference plots. A detailed analysis of the output files confirms this visual comparison.
Figure 2: Comparison of the seismograms obtained on AWS Graviton2 (m6g.16xlarge) instances and the reference results.
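Beyond the visual check, the ASCII traces can also be compared programmatically. The sketch below assumes two-column .semd seismograms in OUTPUT_FILES/ and reference traces in a REF_SEIS/ folder, as shipped with other SPECFEM3D examples; the exact reference location for this example may differ. Small differences are expected, since bitwise reproducibility across compilers and platforms is not guaranteed.

# report the maximum amplitude difference between each computed trace and its reference
for f in OUTPUT_FILES/*.semd; do
  ref="REF_SEIS/$(basename "$f")"
  [ -f "$ref" ] || continue
  paste "$f" "$ref" | awk -v name="$(basename "$f")" \
    '{d = $2 - $4; if (d < 0) d = -d; if (d > m) m = d} END {printf "%s: max amplitude difference = %g\n", name, m}'
done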
Figure 3 summarizes the performance of SPECFEM3D on up to six AWS m6g.16xlarge instances. We increased the size of the benchmark to NEX_XI = 256 and NEX_ETA = 192 and ran 5000 time steps to compare run times from 16 to 384 cores. Table 1 summarizes the results obtained on a single node with up to 64 computing cores. For multi-node simulations, the maximum measured speedup is 6.38 on six M6g instances (384 cores in total). In this case, the communication costs from exchanging subdomain boundaries appear to be perfectly overlapped with the computation of the inner elements on each partition. In addition, we benefit slightly from a well-documented cache effect: reducing the number of elements computed by each MPI process allows more of the working set to fit in cache.
MPI tasks       16        32        64
Elapsed time    1920 s    1030 s    610 s
Table 1: Run time on a single node (from 16 to 64 MPI tasks).
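For reference, the elapsed times reported in Table 1 can be read from the solver log at the end of each run; we assume here the standard summary written to OUTPUT_FILES/output_solver.txt:

# grep -i "elapsed time" OUTPUT_FILES/output_solver.txt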
Figure 3: SPECFEM3D on up to six AWS m6g.16xlarge instances (at most 384 cores).
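For completeness, one Mesh_Par_file decomposition compatible with the 384-core runs is sketched below. The NPROC_XI/NPROC_ETA split is our assumption for illustration; any split that satisfies the NEX/NPROC constraint and multiplies to the total number of MPI tasks would work, with NPROC set accordingly in DATA/Par_file.

# illustrative decomposition for 6 x 64 = 384 MPI tasks (NEX must remain a multiple of 8 * NPROC)
NEX_XI = 256
NEX_ETA = 192
NPROC_XI = 16
NPROC_ETA = 24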
We ported one of the leading seismic wave modeling applications, SPECFEM3D, to the Arm Neoverse N1. SPECFEM3D on AWS Graviton2 shows good scalability, both on a single node and across multiple M6g instances. The compilation of this application is straightforward, with no modification of the source code. This also highlights the maturity of the Arm Compiler for Linux on the target platform, with major enhancements for Neoverse N1 cores. Based on a representative three-dimensional benchmark, single-node and multi-node results demonstrate the readiness of AWS EC2 M6g instances for seismic simulations.
[CTAToken URL = "https://www.arm.com/solutions/infrastructure?_ga=2.113822528.654899687.1591039071-225901144.1543880409" target="_blank" text="See Arm Infrastructure solutions for HPC" class ="green"]