
Using Arm servers to reduce the time and cost of Genomics

David Lecomber
October 5, 2022
3 minute read time.

Genomics has been transformational for public health and continues to deliver benefits for us all. Achieving these results involves a significant and growing amount of computing in cloud and on-premises data centers, at research centers, hospitals, and across the wider life sciences industry.

Reference-guided assembly is an essential stage in many workflows in this field. For a typical patient, a swab yields a sample that is run through a sequencing machine, and the machine's output is gigabytes of reads (substrings of the A, C, G, and T DNA bases). These reads are “aligned” against a complete human genome from a standard reference individual to establish where each read fits, assembling large sections of the patient's genome.

The three best-known applications that accomplish reference-guided assembly are BWA, bwa-mem2, and minimap2. With such widespread use, the price and performance of these applications are critical to the industry.

In a previous blog (Optimizing the BWA aligner for Arm servers) we showed how to run BWA and compared its performance on AWS Graviton2 against the prevailing x86_64 servers of early 2021.

In this blog, we show the performance of all three major aligners on AWS Graviton3. AWS Graviton3 is the most recent Arm-based server in the AWS fleet and the successor to AWS Graviton2.

We demonstrate that AWS Graviton3 increases performance by between 12% and 31% over AWS Graviton2, and by between 10% and 23% over the best available x86_64 systems today. This translates into a cost saving of 20-30% over the comparable x86_64 systems.

Applications and test case

We use the human_g1k_v37 reference from the 1000 Genomes project and the NA12878 sample from the NIST archive. Both test cases are mirrored on AWS S3 and can be fetched using:

aws s3 cp --no-sign-request s3://1000genomes/technical/reference/human_g1k_v37.fasta.gz .

aws s3 cp --no-sign-request s3://giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/NIST7035_TAAGGCGA_L001_R1_001.fastq.gz .

In each case, we have used the gcc-10 compiler for the platform comparison. Each of these applications builds easily on Arm through ordinary build scripts, either in the main repository or in public forks awaiting merge.

https://github.com/lh3/bwa

https://github.com/dslarm/bwa-mem2

https://github.com/dslarm/minimap2 
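As a sketch, building all three aligners on an Arm host might look like the following (the make variables are typical rather than the repositories' exact build scripts, and bwa-mem2 needs a recursive clone for its bundled submodules):

```shell
# Clone and build each aligner with gcc-10 (flags are illustrative;
# the repositories' own build scripts are the authoritative source).
git clone https://github.com/lh3/bwa && make -C bwa CC=gcc-10
git clone --recursive https://github.com/dslarm/bwa-mem2 && make -C bwa-mem2 CC=gcc-10 CXX=g++-10
git clone https://github.com/dslarm/minimap2 && make -C minimap2 CC=gcc-10
```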

The applications are all multithreaded with a configurable number of worker threads. We run the benchmark on 8xlarge instances - which have 32 vCPUs - using 32 worker threads.

At run time, we use the Cloudflare zlib package to replace the system zlib, which helps the aligners decompress the input data files faster. As a further optimization, for bwa we preload the jemalloc library, which can be more efficient than the standard memory allocation functions in multithreaded code.
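A sketch of the resulting bwa invocation, assuming the Cloudflare zlib and jemalloc shared libraries have been installed at the paths shown (the library paths and output file name are illustrative):

```shell
# Preload Cloudflare zlib (faster decompression of gzipped inputs) and
# jemalloc (a more scalable allocator for multithreaded code) ahead of
# the system libraries. Adjust the paths to your installation.
export LD_PRELOAD=/usr/local/lib/libz.so:/usr/lib/aarch64-linux-gnu/libjemalloc.so.2

# Index the reference once, then align the reads with 32 worker threads.
bwa index human_g1k_v37.fasta
bwa mem -t 32 human_g1k_v37.fasta NIST7035_TAAGGCGA_L001_R1_001.fastq.gz > aln.sam
```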

The build scripts for each application, and scripts to fetch the data sets, are available on our GitHub at https://github.com/arm-hpc/genomics-blog.

AWS Graviton2 to AWS Graviton3 – A leap in capability 

AWS Graviton3 uses the Arm Neoverse V1 core, whereas AWS Graviton2 uses the Arm Neoverse N1 core.

Neoverse N1 Pipeline

Neoverse V1 Pipeline

The Neoverse V1 brings significant changes: in particular, it is a much wider core, able to execute more instructions per cycle and extract more instruction-level parallelism than its predecessor.

Using perf stat we can extract the achieved instructions per cycle (IPC) rate for both platforms.
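For example (the perf events below are the generic Linux hardware counters, and the counts are placeholders rather than our measurements; IPC is simply instructions divided by cycles):

```shell
# Count retired instructions and CPU cycles over a full alignment run:
# perf stat -e instructions,cycles bwa mem -t 32 ref.fasta reads.fastq.gz > aln.sam

# IPC = instructions / cycles. With placeholder counts:
instructions=41000000000
cycles=20500000000
awk -v i="$instructions" -v c="$cycles" 'BEGIN { printf "IPC = %.2f\n", i / c }'
```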

Comparative Instructions per Cycle

As can be seen, the improvement in IPC varies across the applications, with minimap2 seeing the most benefit at 26% more instructions per cycle. This result is for a full workload and includes time spent in I/O.

AWS Graviton3 is also the first DDR5 system in the AWS fleet, with 50% more DDR bandwidth than its predecessor.

AWS Graviton3 also runs at 2.6 GHz, compared to 2.5 GHz for AWS Graviton2.

The combined impact of higher IPC and frequency translates directly into runtime, which we look at next.

Which architecture provides the most performance and the least cost?

Runtime per alignment with 32 worker threads

Compared to the previous generation AWS Graviton2 (c6g.8xlarge) the performance is between 12% and 31% higher.

AWS Graviton3 also demonstrates between 10% and 23% more performance than Intel Ice Lake (c6i.8xlarge) and between 11% and 21% more than AMD Milan (c6a.8xlarge).

If we turn to cost per alignment:

Relative cost of alignment
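Cost per alignment is simply runtime multiplied by the instance's hourly on-demand price. A minimal sketch of the calculation (the prices and runtimes below are illustrative placeholders, not the measured figures behind the chart):

```shell
# cost = runtime_hours * on_demand_price_usd_per_hour, per instance type.
awk 'BEGIN {
  # type           price   runtime (placeholder values)
  print_cost("c7g.8xlarge", 1.16, 0.50)
  print_cost("c6i.8xlarge", 1.36, 0.55)
  print_cost("c6a.8xlarge", 1.22, 0.57)
}
function print_cost(name, price, hours) {
  printf "%-12s $%.3f per alignment\n", name, price * hours
}'
```

Because the Graviton3 instances here combine a lower hourly price with shorter runtimes, both factors compound in the cost comparison.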

Summary

AWS Graviton3 offers the best price-performance of any platform tested for all three genomics applications. Running the same alignment costs up to 27% more per sample set on AMD Milan and up to 45% more on Intel Ice Lake. Put another way, AWS Graviton3 saves up to 20% over AMD Milan and up to 30% compared to Intel Ice Lake.
