Comparing data compression algorithm performance on AWS Graviton2

Ravi Malhotra
February 8, 2022
4 minute read time.

Co-Authors: Manoj Iyer and Yichen Jia


With the vast amounts of data being managed in the cloud, data needs to be compressed before it is stored to make efficient use of storage media. Various algorithms have been developed to compress and decompress different data types in flight. In this blog, we take two widely recognized algorithms, Zstandard and Snappy, and compare their performance on Arm servers.

Background

There are various classes of data compression algorithms. Some are tailored to a specific data type, for example video, audio, or images/graphics. Most other data, however, calls for a generic lossless compression algorithm that can provide good compression ratios across different data sets. These compression algorithms are used across multiple applications:

  • File or object-storage systems like Ceph, OpenZFS, and SquashFS
  • Database or analytics applications like MongoDB, Kafka, Hadoop, and Redis
  • Web or HTTP software – NGINX, curl, Django, and so on
  • Archival software – tar, WinZip, and so on
  • Several other use cases, such as Linux kernel compression

Compression vs. speed

A key trade-off for compression algorithms is whether they are optimized for achieving higher compression ratios or for compressing and decompressing at higher speed. The former saves storage space, while the latter saves compute cycles and lowers the latency of operations. Some algorithms, such as Zstandard[1] and zlib[2], offer multiple presets that let the user or application select their own trade-off depending on usage, whereas others, such as Snappy[3], are designed primarily for speed.
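
To make this trade-off concrete, here is a minimal sketch that compresses the same buffer at a few Zstandard levels and reports ratio and speed. It assumes the third-party zstandard Python bindings and a locally downloaded sample file; the file path is hypothetical and this is not part of the benchmark setup described later.

```python
# Illustrative sketch only: compare Zstandard preset levels on one buffer.
# Assumes the third-party 'zstandard' package (pip install zstandard).
import time
import zstandard

data = open("silesia/dickens", "rb").read()  # hypothetical sample file

for level in (1, 8, 19):
    cctx = zstandard.ZstdCompressor(level=level)
    start = time.perf_counter()
    compressed = cctx.compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    speed = len(data) / elapsed / 1e6  # MB/s, single thread
    print(f"level {level:2d}: ratio {ratio:.2f}, {speed:.0f} MB/s")
```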

Zstandard was developed as an open-source algorithm by Facebook to provide compression ratios comparable to the DEFLATE algorithm while being optimized for much higher speed, especially for decompression. Since its launch in 2016, it has become very popular across multiple classes of applications and has become the default compression algorithm for the Linux kernel.

Snappy was developed as an open-source algorithm by Google and aims to optimize compression speed while achieving reasonable compression ratios. It is very popular in database and analytics applications.

The Arm software team has optimized both algorithms for high performance on Arm server platforms based on Arm Neoverse cores. These optimizations use the Neon vector engine to accelerate certain parts of each algorithm.
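
To confirm that a host actually exposes the Neon vector extensions (reported as ASIMD on AArch64 Linux) that these optimized code paths target, a quick look at /proc/cpuinfo is enough. This is a hedged convenience check, not part of either library.

```python
# Sketch: check that an AArch64 Linux host advertises Neon (listed as 'asimd'
# in /proc/cpuinfo). The optimized libraries select their code paths themselves.
import platform

def has_neon() -> bool:
    if platform.machine() != "aarch64":
        return False
    with open("/proc/cpuinfo") as cpuinfo:
        return any("asimd" in line for line in cpuinfo if line.startswith("Features"))

print("Neon/ASIMD available:", has_neon())
```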

Performance comparisons

We took the latest optimized versions of the Zstandard and Snappy algorithms and benchmarked them on comparable cloud instances on AWS (Amazon Web Services):

  • C6g.2xlarge instances – AWS Graviton2, based on Arm Neoverse N1 cores
  • C5.2xlarge instances – Intel Cascade Lake

Both algorithms were benchmarked in two different scenarios:

  • Raw algorithm performance – we tested using the lzbench tool against the Silesia corpus, which includes several industry-standard data types (a minimal Python stand-in for this measurement is sketched after this list).
  • Application-level performance – using a popular NoSQL database, MongoDB, we tested the impact of these compression algorithms on database operation throughput and latency with the YCSB tool, and measured the overall compression of the database.
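
For reference, the following is a rough single-process stand-in for the raw lzbench-style measurement. It assumes the zstandard and python-snappy packages and one locally downloaded Silesia corpus file (path hypothetical); the published numbers aggregate 16 parallel processes, so absolute figures will differ.

```python
# Rough single-process stand-in for the lzbench-style measurement (sketch only).
# Assumes the 'zstandard' and 'python-snappy' packages and a local Silesia file.
import time
import snappy
import zstandard

def measure(name, compress, decompress, data):
    t0 = time.perf_counter()
    blob = compress(data)
    t1 = time.perf_counter()
    out = decompress(blob)
    t2 = time.perf_counter()
    assert out == data  # sanity check: lossless round trip
    mb = len(data) / 1e6
    print(f"{name}: ratio {len(data) / len(blob):.2f}, "
          f"compress {mb / (t1 - t0):.0f} MB/s, decompress {mb / (t2 - t1):.0f} MB/s")

data = open("silesia/xml", "rb").read()  # hypothetical corpus member
measure("zstd-8", zstandard.ZstdCompressor(level=8).compress,
        zstandard.ZstdDecompressor().decompress, data)
measure("snappy", snappy.compress, snappy.uncompress, data)
```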

Raw algorithm performance

Bandwidth (speed) comparison

This test measures the raw aggregated compression and decompression throughput of 16 parallel processes across different datasets. For Zstandard, we observed an overall performance uplift of 30-67% on C6g instances for compression, and 11-35% for decompression.

With the 20% lower price of the C6g instances factored in, the result is up to 52% savings per MB of compressed data.
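
The cost figure follows from simple arithmetic: cost per compressed MB scales with the relative instance price divided by the relative throughput. The worked example below uses the rounded uplift figures quoted in this blog (the Snappy case from the next section is included for comparison); it is not live AWS pricing.

```python
# Cost per MB of compressed data scales with (relative price) / (relative speed).
# The inputs are the rounded figures from this blog, not live AWS pricing.
def savings(price_ratio: float, speedup: float) -> float:
    """Fractional cost-per-MB savings of C6g relative to C5."""
    return 1.0 - price_ratio / speedup

print(f"Zstandard, best case: {savings(0.80, 1.67):.0%}")  # about 52%
print(f"Snappy, best case:    {savings(0.80, 1.90):.0%}")  # about 58%
```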

Figure 1: Zstd8 compression throughput comparison - C5 vs. C6g

Figure 2: Zstd8 de-compression throughput comparison - C5 vs. C6g

With Snappy as the compression algorithm, we observed much higher compression speeds and broadly similar decompression speeds compared to Zstandard, which is expected. Overall, Snappy performed 40-90% better across the various datasets on C6g instances compared to C5.

With the 20% lower price of the C6g instances factored in, the result is up to 58% savings per MB of compressed data.

Figure 3: Snappy compression - C5 vs. C6g

Figure 4: Snappy de-compression - C5 vs. C6g

Compression ratio

We also compared the compression ratios achieved by the two algorithms for the various datasets on both C6g and C5 instances. In each case, the same compression ratio was achieved on both instance types, which shows that each algorithm operates as intended regardless of the underlying architecture.
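
This check is easy to reproduce: compress the same input at the same settings on each instance type and compare the ratios. A minimal sketch, under the same package assumptions as above:

```python
# Run the same script on a C5 and a C6g instance: the printed ratios should
# match, because the compressed output depends on the algorithm and level,
# not on the CPU architecture it runs on.
import snappy
import zstandard

data = open("silesia/mozilla", "rb").read()  # hypothetical corpus member
zstd_size = len(zstandard.ZstdCompressor(level=8).compress(data))
snappy_size = len(snappy.compress(data))
print(f"zstd-8 ratio: {len(data) / zstd_size:.3f}")
print(f"snappy ratio: {len(data) / snappy_size:.3f}")
```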

Application-level performance

The MongoDB WiredTiger storage engine supports several compression modes: snappy, zstd, zlib, and none. Here we test the snappy, zstd, and none modes. We used a dataset consisting of 10,000 sentences of English text that were randomly generated with the Python faker library.
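
The exact data-generation script is not reproduced here; the sketch below shows one plausible way to build a comparable dataset and select a per-collection WiredTiger block compressor, assuming the faker and pymongo packages and a locally running mongod. The database and collection names are hypothetical, and the compressor can also be set server-wide through storage.wiredTiger.collectionConfig.blockCompressor.

```python
# Sketch: generate random English sentences with faker and store them in a
# MongoDB collection whose WiredTiger block compressor is chosen explicitly.
# Assumes the 'faker' and 'pymongo' packages and a mongod on localhost.
from faker import Faker
from pymongo import MongoClient

fake = Faker()
db = MongoClient("mongodb://localhost:27017")["compressiontest"]

# Valid block_compressor values include "snappy", "zstd", "zlib", and "none".
coll = db.create_collection(
    "sentences",
    storageEngine={"wiredTiger": {"configString": "block_compressor=zstd"}},
)

# 10,000 generated sentences; the actual test repeated inserts until the
# database held roughly 5GB of data.
coll.insert_many([{"_id": i, "text": fake.sentence(nb_words=12)} for i in range(10_000)])
```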

Separate AWS instances were used as the database server under test and as the test (load-generation) host. Documents were inserted into a MongoDB database until it held approximately 5GB of data. The servers under test were Arm (c6g.2xlarge) and Intel (c5.2xlarge) instances. After the database was populated, we used MongoDB's dbStats command to get the storage size.
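
MongoDB reports both the logical data size and the on-disk storage size through dbStats, so the compression ratio plotted in Figure 5 can be derived with a few lines (same assumptions as the previous sketch):

```python
# Sketch: read dbStats after loading the data and derive the compression ratio.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["compressiontest"]
stats = db.command("dbStats")

data_mb = stats["dataSize"] / 1e6        # logical (uncompressed) size
storage_mb = stats["storageSize"] / 1e6  # size on disk after compression
print(f"dataSize {data_mb:.0f} MB, storageSize {storage_mb:.0f} MB, "
      f"ratio {data_mb / storage_mb:.2f}")
```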

Snappy vs Zstandard – speed vs. compression

Between Snappy and Zstandard, we observed that Zstandard compressed the overall database to a smaller size, as expected.

Figure 5: MongoDB: Database Compression Ratio

Snappy offered better throughput for the Insert operation, which is write (compression) intensive. However, the read/modify/write workload, which involves a mix of compression and decompression, showed little difference between the two algorithms.

Figure 6: MongoDB: Insert Throughput - Snappy vs. Zstd

Figure 7: MongoDB: Read/Modify/Write Throughput - Snappy vs. Zstd

Conclusion

Generic compression algorithms like Zstandard and Snappy are used across a wide variety of applications and handle many different types of datasets well. With both Zstandard and Snappy optimized for Arm Neoverse and AWS Graviton2, we observe two key results versus Intel-based instances. One, Graviton2-based instances achieve 11-90% better compression and decompression performance than comparable Intel-based instance types. Two, Graviton2-based instances can roughly halve the cost of data compression. With real-world applications like MongoDB, these compression algorithms add little overhead to typical operations while achieving a significant reduction in database size.

More workloads on AWS Graviton2

References:

[1] Zstandard, http://facebook.github.io/zstd/
[2] Zlib, https://zlib.net/
[3] Snappy, https://github.com/google/snappy
