Co-Authors: Manoj Iyer and Yichen Jia
With the vast amounts of data being managed in the cloud, there is a need to compress data before storing it to make efficient use of storage media. Various algorithms have been developed to compress and decompress different data types in flight. In this blog, we take two widely used algorithms, Zstandard and Snappy, and compare their performance on Arm servers.
There are several classes of data compression algorithms. Some are tailored to a specific data type, for example video, audio, or images/graphics. Most other data, however, calls for a generic lossless compression algorithm that can deliver good compression ratios across different datasets. These generic algorithms are used across a wide range of applications.
A key trade-off for compression algorithms is whether they are optimized for a higher compression ratio or for higher compression/decompression speed. The former saves storage space, while the latter saves compute cycles and lowers the latency of operations. Some algorithms, such as Zstandard [1] and zlib [2], offer multiple presets that let the user or application select their own trade-off depending on usage, while others, such as Snappy [3], are designed purely for speed.
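To make the trade-off concrete, the sketch below times Zstandard at a few compression levels. It assumes the zstandard Python bindings and a placeholder input file; it is only an illustration of the level/speed trade-off, not the benchmark used in this blog.

```python
# Minimal sketch: zstd ratio vs. speed at different levels.
# Assumes the "zstandard" package; "dataset.bin" is a placeholder input.
import time
import zstandard as zstd

data = open("dataset.bin", "rb").read()

for level in (1, 8, 19):
    cctx = zstd.ZstdCompressor(level=level)
    start = time.perf_counter()
    compressed = cctx.compress(data)
    elapsed = time.perf_counter() - start
    print(f"level {level}: ratio {len(data) / len(compressed):.2f}, "
          f"{len(data) / elapsed / 1e6:.1f} MB/s")
```

Higher levels spend more cycles searching for matches, so the ratio improves while throughput drops; Snappy has no such knob and always favors speed.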
Zstandard was developed as an open-source algorithm by Facebook to provide compression ratios comparable to the DEFLATE algorithm at much higher speed, especially for decompression. Since its launch in 2016, it has become very popular across many applications and has been adopted as a compression algorithm in the Linux kernel.
Snappy was developed as an open-source algorithm by Google and aims to maximize compression speed while providing reasonable compression ratios. It is very popular in database and analytics applications.
The Arm software team has optimized both algorithms for high performance on Arm server platforms based on Arm Neoverse cores. These optimizations use the Neon vector engine to accelerate performance-critical parts of each algorithm.
We took the latest optimized versions of the Zstandard and Snappy libraries and benchmarked them on comparable cloud instances on AWS (Amazon Web Services).
Both algorithms were benchmarked in two scenarios: a raw throughput microbenchmark and a real-world MongoDB workload.
The first scenario measures the raw aggregated compression/decompression throughput of 16 parallel processes across different datasets. For Zstandard at level 8, we observed an overall performance uplift of 30-67% on C6g instances over C5 for compression, and 11-35% for decompression.
Factoring in the 20% lower price of the C6g instances, this translates into savings of up to 52% per MB of compressed data.
Figure 1: Zstd8 compression throughput comparison - C5 vs. C6g
Figure 2: Zstd8 decompression throughput comparison - C5 vs. C6g
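For reference, the sketch below shows how such an aggregated-throughput test can be structured. It assumes the zstandard Python bindings and a placeholder dataset; the published numbers were gathered with the optimized native libraries, so treat this only as an outline of the method.

```python
# Sketch: aggregate compression throughput across 16 parallel processes.
# Assumes the "zstandard" package; "dataset.bin" and REPS are placeholders.
import time
from multiprocessing import Pool

import zstandard as zstd

DATA = open("dataset.bin", "rb").read()
PROCS = 16
REPS = 50

def worker(_):
    cctx = zstd.ZstdCompressor(level=8)  # zstd level 8, as in Figures 1-2
    start = time.perf_counter()
    for _ in range(REPS):
        cctx.compress(DATA)
    return len(DATA) * REPS / (time.perf_counter() - start)

if __name__ == "__main__":
    with Pool(PROCS) as pool:
        per_proc = pool.map(worker, range(PROCS))
    print(f"aggregate: {sum(per_proc) / 1e6:.1f} MB/s across {PROCS} processes")
```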
With Snappy as the compression algorithm, we observed much higher compression speed and broadly similar decompression speed compared to Zstandard, which is expected. Overall, Snappy performed 40-90% better across the various datasets on C6g instances compared to C5.
Factoring in the 20% lower price of the C6g instances, the result is savings of up to 58% per MB of compressed data.
Figure 3: Snappy compression - C5 vs. C6g
Figure 4: Snappy decompression - C5 vs. C6g
We also compared the compression ratios for the various datasets between the two algorithms on both C6g and C5 instances. In both cases, the same compression ratios were achieved, which shows that the algorithms are operating as intended on both architectures.
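Checking the ratio is straightforward: compress the same input with both algorithms and compare output sizes. The sketch below assumes the zstandard and python-snappy packages and a placeholder input file.

```python
# Sketch: compare compression ratios of zstd (level 8) and Snappy.
# Assumes "zstandard" and "python-snappy"; "dataset.bin" is a placeholder.
import snappy
import zstandard as zstd

data = open("dataset.bin", "rb").read()

zstd_out = zstd.ZstdCompressor(level=8).compress(data)
snappy_out = snappy.compress(data)

print(f"zstd-8 ratio: {len(data) / len(zstd_out):.2f}")
print(f"snappy ratio: {len(data) / len(snappy_out):.2f}")
```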
MongoDB's WiredTiger storage engine supports several block compression modes: snappy, zstd, zlib, and none. Here we test the snappy, zstd, and none modes. We used a dataset consisting of 10,000 English-text sentences that were randomly generated using Python Faker.
Separate AWS instances were used as the test victim and the test host. Documents were inserted into a MongoDB database accounting for approximately 5 GB of data. The test victim instances were Arm-based (c6g.2xlarge) and Intel-based (c5.2xlarge). After the MongoDB database was populated with 5 GB of data, we used the dbStats command to get the storage size.
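The sketch below outlines this setup: generating sentences with Faker, loading them into a collection with a chosen block compressor, and reading the storage size back via dbStats. It assumes the faker and pymongo packages; the host, database, and collection names are hypothetical.

```python
# Sketch: populate MongoDB with Faker sentences and check on-disk size.
# Assumes "faker" and "pymongo"; the host and names below are hypothetical.
from faker import Faker
from pymongo import MongoClient

fake = Faker()
client = MongoClient("mongodb://test-victim:27017")
db = client["compression_test"]

# Per-collection WiredTiger block compressor: "snappy", "zstd", or "none".
coll = db.create_collection(
    "sentences",
    storageEngine={"wiredTiger": {"configString": "block_compressor=zstd"}},
)

coll.insert_many([{"text": fake.sentence()} for _ in range(10_000)])

stats = db.command("dbstats")
# storageSize reflects the compressed, on-disk size of the data.
print(stats["dataSize"], stats["storageSize"])
```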
Between Snappy and Zstandard, we observed that Zstandard achieved the smaller overall database size, as expected.
Figure 5: MongoDB: Database Compression Ratio
Snappy offered better throughput in the insert operation, which is write (compression) intensive. However, the read/modify/write workload, which involves a mix of compression and decompression, showed little difference between the two algorithms.
Figure 6: MongoDB: Insert Throughput - Snappy vs. Zstd
Figure 7: MongoDB: Read/Modify/Write Throughput - Snappy vs. Zstd
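As a rough illustration of the two access patterns compared above, the pymongo sketch below shows why inserts stress compression while read/modify/write exercises both directions. It is not the actual benchmark harness, and the names are carried over from the hypothetical setup sketch.

```python
# Sketch: the two access patterns, reusing the hypothetical setup above.
from pymongo import MongoClient

coll = MongoClient("mongodb://test-victim:27017")["compression_test"]["sentences"]

# Insert: write-heavy, exercises the block compressor on the write path.
coll.insert_one({"text": "a new sentence"})

# Read/modify/write: the block is decompressed on read and recompressed
# when the updated document is written back.
doc = coll.find_one({})
coll.update_one({"_id": doc["_id"]},
                {"$set": {"text": doc["text"].upper()}})
```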
Generic compression algorithms like Zstandard and Snappy are used across a variety of applications and are very versatile at compressing different types of generic datasets. With Zstandard and Snappy both optimized for Arm Neoverse and AWS Graviton2, we observe two key results versus Intel-based instances. One, Graviton2-based instances achieve 11-90% better compression and decompression performance than comparable Intel-based instance types. Two, Graviton2-based instances can cut the cost of data compression roughly in half. With real-world applications like MongoDB, these compression algorithms add little overhead to typical operations while achieving a significant reduction in database size.
[CTAToken URL = "https://www.arm.com/why-arm/partner-ecosystem/aws" target="_blank" text="More workloads on AWS Graviton2" class ="green"]
References:
[1] Zstandard, http://facebook.github.io/zstd/
[2] Zlib, https://zlib.net/
[3] Snappy, https://github.com/google/snappy