Co-Authors: Manoj Iyer and Yichen Jia
With the vast amounts of data being managed in the cloud, there is a need to compress data before storing it to make efficient use of storage media. Various algorithms have been developed to compress and decompress different data types in flight. In this blog, we take two widely used algorithms, Zstandard and Snappy, and compare their performance on Arm servers.
There are several classes of data compression algorithms. Some are tailored to a specific data type, for example video, audio, or images/graphics. Most other data, however, calls for a generic lossless compression algorithm that can deliver good compression ratios across different datasets. These generic algorithms are used across a wide range of applications.
A key trade-off for compression algorithms is whether they are optimized for a higher compression ratio or for higher compression/decompression speed. The former saves storage space, while the latter saves compute cycles and lowers the latency of operations. Some algorithms, such as Zstandard [1] and zlib [2], offer multiple presets that let the user or application select their own trade-off depending on usage, while others, such as Snappy [3], are designed purely for speed.
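To make the trade-off concrete, the sketch below times Zstandard at a few compression levels. It assumes the zstandard Python bindings and a placeholder input file; it is only an illustration of the level/speed trade-off, not the benchmark used in this blog.

```python
# Minimal sketch: zstd ratio vs. speed at different levels.
# Assumes the "zstandard" package; "dataset.bin" is a placeholder input.
import time
import zstandard as zstd

data = open("dataset.bin", "rb").read()

for level in (1, 8, 19):
    cctx = zstd.ZstdCompressor(level=level)
    start = time.perf_counter()
    compressed = cctx.compress(data)
    elapsed = time.perf_counter() - start
    print(f"level {level}: ratio {len(data) / len(compressed):.2f}, "
          f"{len(data) / elapsed / 1e6:.1f} MB/s")
```

Higher levels spend more cycles searching for matches, so the ratio improves while throughput drops; Snappy has no such knob and always favors speed.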
Zstandard was developed as an open-source algorithm by Facebook to provide compression ratios comparable to the DEFLATE algorithm at much higher speed, especially for decompression. Since its launch in 2016, it has become very popular across many applications and has been adopted as a compression algorithm in the Linux kernel.
Snappy was developed as an open-source algorithm by Google and aims to maximize compression speed while providing reasonable compression ratios. It is very popular in database and analytics applications.
The Arm software team has optimized both algorithms for high performance on Arm server platforms based on Arm Neoverse cores. These optimizations use the Neon vector engine to accelerate performance-critical parts of each algorithm.
We took the latest optimized versions of the Zstandard and Snappy libraries and benchmarked them on comparable cloud instances on AWS (Amazon Web Services).
Both algorithms were benchmarked in two scenarios: a raw throughput microbenchmark and a real-world MongoDB workload.
The first scenario measures the raw aggregated compression/decompression throughput of 16 parallel processes across different datasets. For Zstandard at level 8, we observed an overall performance uplift of 30-67% on C6g instances over C5 for compression, and 11-35% for decompression.
Factoring in the 20% lower price of the C6g instances, this translates into savings of up to 52% per MB of compressed data.
Figure 1: Zstd8 compression throughput comparison - C5 vs. C6g
Figure 2: Zstd8 decompression throughput comparison - C5 vs. C6g
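For reference, the sketch below shows how such an aggregated-throughput test can be structured. It assumes the zstandard Python bindings and a placeholder dataset; the published numbers were gathered with the optimized native libraries, so treat this only as an outline of the method.

```python
# Sketch: aggregate compression throughput across 16 parallel processes.
# Assumes the "zstandard" package; "dataset.bin" and REPS are placeholders.
import time
from multiprocessing import Pool

import zstandard as zstd

DATA = open("dataset.bin", "rb").read()
PROCS = 16
REPS = 50

def worker(_):
    cctx = zstd.ZstdCompressor(level=8)  # zstd level 8, as in Figures 1-2
    start = time.perf_counter()
    for _ in range(REPS):
        cctx.compress(DATA)
    return len(DATA) * REPS / (time.perf_counter() - start)

if __name__ == "__main__":
    with Pool(PROCS) as pool:
        per_proc = pool.map(worker, range(PROCS))
    print(f"aggregate: {sum(per_proc) / 1e6:.1f} MB/s across {PROCS} processes")
```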
With Snappy as the compression algorithm, we observed much higher compression speed and broadly similar decompression speed compared to Zstandard, which is expected. Overall, Snappy performed 40-90% better across the various datasets on C6g instances compared to C5.
Factoring in the 20% lower price of the C6g instances, the result is savings of up to 58% per MB of compressed data.
Figure 3: Snappy compression - C5 vs. C6g
Figure 4: Snappy decompression - C5 vs. C6g
We also compared the compression ratios for the various datasets between the two algorithms on both C6g and C5 instances. In both cases, the same compression ratios were achieved, which shows that the algorithms are operating as intended on both architectures.
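Checking the ratio is straightforward: compress the same input with both algorithms and compare output sizes. The sketch below assumes the zstandard and python-snappy packages and a placeholder input file.

```python
# Sketch: compare compression ratios of zstd (level 8) and Snappy.
# Assumes "zstandard" and "python-snappy"; "dataset.bin" is a placeholder.
import snappy
import zstandard as zstd

data = open("dataset.bin", "rb").read()

zstd_out = zstd.ZstdCompressor(level=8).compress(data)
snappy_out = snappy.compress(data)

print(f"zstd-8 ratio: {len(data) / len(zstd_out):.2f}")
print(f"snappy ratio: {len(data) / len(snappy_out):.2f}")
```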
MongoDB's WiredTiger storage engine supports several block compression modes: snappy, zstd, zlib, and none. Here we test the snappy, zstd, and none modes. We used a dataset consisting of 10,000 English-text sentences that were randomly generated using Python Faker.
Separate AWS instances were used as the test victim and the test host. Documents were inserted into a MongoDB database accounting for approximately 5 GB of data. The test victim instances were Arm-based (c6g.2xlarge) and Intel-based (c5.2xlarge). After the MongoDB database was populated with 5 GB of data, we used the dbStats command to get the storage size.
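The sketch below outlines this setup: generating sentences with Faker, loading them into a collection with a chosen block compressor, and reading the storage size back via dbStats. It assumes the faker and pymongo packages; the host, database, and collection names are hypothetical.

```python
# Sketch: populate MongoDB with Faker sentences and check on-disk size.
# Assumes "faker" and "pymongo"; the host and names below are hypothetical.
from faker import Faker
from pymongo import MongoClient

fake = Faker()
client = MongoClient("mongodb://test-victim:27017")
db = client["compression_test"]

# Per-collection WiredTiger block compressor: "snappy", "zstd", or "none".
coll = db.create_collection(
    "sentences",
    storageEngine={"wiredTiger": {"configString": "block_compressor=zstd"}},
)

coll.insert_many([{"text": fake.sentence()} for _ in range(10_000)])

stats = db.command("dbstats")
# storageSize reflects the compressed, on-disk size of the data.
print(stats["dataSize"], stats["storageSize"])
```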
Between Snappy and Zstandard, we observed that Zstandard achieved the smaller overall database size, as expected.
Figure 5: MongoDB: Database Compression Ratio
Snappy offered better throughput in the insert operation, which is write (compression) intensive. However, the read/modify/write workload, which involves a mix of compression and decompression, showed little difference between the two algorithms.
Figure 6: MongoDB: Insert Throughput - Snappy vs. Zstd
Figure 7: MongoDB: Read/Modify/Write Throughput - Snappy vs. Zstd
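As a rough illustration of the two access patterns compared above, the pymongo sketch below shows why inserts stress compression while read/modify/write exercises both directions. It is not the actual benchmark harness, and the names are carried over from the hypothetical setup sketch.

```python
# Sketch: the two access patterns, reusing the hypothetical setup above.
from pymongo import MongoClient

coll = MongoClient("mongodb://test-victim:27017")["compression_test"]["sentences"]

# Insert: write-heavy, exercises the block compressor on the write path.
coll.insert_one({"text": "a new sentence"})

# Read/modify/write: the block is decompressed on read and recompressed
# when the updated document is written back.
doc = coll.find_one({})
coll.update_one({"_id": doc["_id"]},
                {"$set": {"text": doc["text"].upper()}})
```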
Generic compression algorithms like Zstandard and Snappy are used across a variety of applications and are very versatile at compressing different types of generic datasets. With Zstandard and Snappy both optimized for Arm Neoverse and AWS Graviton2, we observe two key results versus Intel-based instances. One, Graviton2-based instances achieve 11-90% better compression and decompression performance than comparable Intel-based instance types. Two, Graviton2-based instances can cut the cost of data compression roughly in half. With real-world applications like MongoDB, these compression algorithms add little overhead to typical operations while achieving a significant reduction in database size.
[CTAToken URL = "https://www.arm.com/why-arm/partner-ecosystem/aws" target="_blank" text="More workloads on AWS Graviton2" class ="green"]
References:
[1] Zstandard, http://facebook.github.io/zstd/
[2] Zlib, https://zlib.net/
[3] Snappy, https://github.com/google/snappy