Increase performance by up to 30% by deploying Apache Cassandra on AWS Graviton2

August 18, 2021


Co Authors: Masoud Koleini and Paul Yang

Introduction

Apache Cassandra is an open source distributed NoSQL database for mission-critical applications. Cassandra is known for its hybrid solutions, security, high scalability, and speed. It has a SQL-like query language called CQL (Cassandra Query Language), which makes it easier to use for people familiar with traditional SQL. Cassandra is specifically optimized for writes, so it offers high scalability and low latency for write operations.

In this blog, we compare the performance of Cassandra running in the AWS cloud using compute-optimized instances C6gd (AWS Graviton2), C5d (Intel Skylake-SP or Cascade Lake), and C5ad (AMD EPYC 7002 series). All the instance types provide high network bandwidth and feature local high speed NVMe-based SSD storage for faster disk IO operations.

Arm Neoverse-based AWS Graviton2 Processors

AWS Graviton2 processors are custom built by Amazon Web Services using Arm Neoverse N1 cores to deliver the best price performance for your cloud workloads running in Amazon EC2. Compared to similar x86-based instances, Graviton2 provides better price performance and more NVMe storage, thus allowing users to run their computations at a lower price. Arm64 is supported by major Linux distributions, so it is very convenient for users to migrate their applications to Graviton2 instances to reduce operational costs while enhancing performance.

Today, Graviton2 instances are available as general purpose (M6g/M6gd/T4g), compute-optimized (C6g/C6gd/C6gn), and memory-optimized (R6g/R6gd/X2gd) types. Users can choose an instance type based on the applications they plan to run, and these instances deliver up to 40% better price-performance than comparable 5th generation x86 instances.

Test environment

Benchmarks are run against a single instance of Cassandra. We use YCSB (Yahoo! Cloud Serving Benchmark) as the benchmarking tool to report metrics on INSERT, UPDATE, and RMW (read-modify-write) operations. YCSB runs on a separate instance in AWS, but within the same cluster and placement group as Cassandra in order to keep latency to a minimum.

AWS instances setup

Both the Cassandra instance and the YCSB load generator run on 2xlarge C-series (compute-optimized) instances. The C6gd, C5d, and C5ad instances each come with 8 vCPUs, 16 GiB of memory, NVMe SSD storage, and network bandwidth of up to 10 Gbps. All the instances run Ubuntu 20.04 as the operating system.

Load generator setup

YCSB workloads execute in two phases: loading and transactions. The first phase defines the data to be inserted into the database, and the second defines the operations to run against it. The parameters of a workload can be passed as input or defined in a workload file. We used Workload F for the tests, which benchmarks the database against a 50:50 ratio of read and read-modify-write operations. We modified the workload by changing the 'recordcount' and 'operationcount' parameters to 200K. For each instance type tested, we ran the load generator on that same instance type (for example, to test c6g.2xlarge, we ran the load generator on a c6g.2xlarge instance).
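As a rough illustration of the setup described above, the following Python sketch renders a Workload F property file with the two overridden parameters and builds the YCSB command lines for the load and run phases. The property names follow the stock `workloadf` file shipped with YCSB, and `cassandra-cql` is the name of YCSB's Cassandra binding. The workload class package (`site.ycsb` here) differs in older YCSB releases, and the host address and workload file name are hypothetical placeholders, not values from our test setup.

```python
# Thread counts used throughout the benchmarks in this blog.
THREAD_COUNTS = [1, 8, 16, 32, 64, 96, 120]


def workload_f_properties(record_count=200_000, operation_count=200_000):
    """Render a Workload F property file with the two overrides from the text."""
    props = {
        "recordcount": record_count,
        "operationcount": operation_count,
        # Package name is an assumption; older YCSB releases use com.yahoo.ycsb.
        "workload": "site.ycsb.workloads.CoreWorkload",
        "readallfields": "true",
        "readproportion": 0.5,
        "updateproportion": 0,
        "scanproportion": 0,
        "insertproportion": 0,
        "readmodifywriteproportion": 0.5,
        "requestdistribution": "zipfian",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


def ycsb_command(phase, threads, host, workload_file="workloads/workloadf_200k"):
    """Build the argument list for one YCSB phase ('load' or 'run').

    The workload file name and host are hypothetical placeholders.
    """
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    return [
        "bin/ycsb", phase, "cassandra-cql",
        "-P", workload_file,
        "-p", f"hosts={host}",
        "-threads", str(threads),
    ]


if __name__ == "__main__":
    print(workload_f_properties())
    for threads in THREAD_COUNTS:
        print(" ".join(ycsb_command("load", threads, "10.0.0.10")))
        print(" ".join(ycsb_command("run", threads, "10.0.0.10")))
```

The sketch only prints the commands; running them requires a YCSB installation and a reachable Cassandra node.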

Apache Cassandra setup

For this benchmark, we used Cassandra 4.0-RC1 (the latest at the time of publication) and ran it on OpenJDK 11. In the Cassandra configuration file, the parameters `data_file_directories` and `commitlog_directory` were changed to point to the NVMe directory mounted on the instance. In addition, the required `usertable` was created in Cassandra before running the load generator.
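The Cassandra-side preparation can be sketched as follows. The keyspace name `ycsb`, the `y_id` key column, and the `field0` to `field9` columns follow the schema suggested in the YCSB Cassandra binding's documentation. The replication factor of 1 (for a single node) and the `/mnt/nvme` mount path are assumptions for illustration, not values taken from our configuration.

```python
# Hypothetical overrides for cassandra.yaml; the mount path is a placeholder.
CASSANDRA_YAML_OVERRIDES = {
    "data_file_directories": ["/mnt/nvme/cassandra/data"],
    "commitlog_directory": "/mnt/nvme/cassandra/commitlog",
}


def usertable_cql(keyspace="ycsb", field_count=10, replication_factor=1):
    """Return the CQL statements that create the keyspace and YCSB usertable."""
    fields = ", ".join(f"field{i} varchar" for i in range(field_count))
    create_keyspace = (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH REPLICATION = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )
    create_table = (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.usertable "
        f"(y_id varchar PRIMARY KEY, {fields});"
    )
    return [create_keyspace, create_table]


if __name__ == "__main__":
    for statement in usertable_cql():
        print(statement)
```

The printed statements can be pasted into `cqlsh` before starting the load phase.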

Price-performance comparison

We'll begin with a price-performance summary of our findings followed by performance and latency results below.

Our price-performance calculations use the published on-demand pricing at the time of testing. Of all the instances tested, the AWS Graviton2 instance has the lowest hourly cost.

The following charts show the cost effectiveness of Graviton2 compared with the x86 instance types at thread counts of 1, 8, 16, 32, 64, 96 and 120, based on pricing at the time of publication. The left axis is the price-performance for INSERT and RMW operations, and the right axis is the percentage improvement of Graviton2 over the x86 instances.
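One way to read the price-performance metric is as operations completed per dollar spent. The sketch below is a minimal illustration of that interpretation, not the exact calculation behind the charts. The hourly prices are assumed on-demand rates that are not stated in this blog, so substitute the published rates for your region. The throughput figures are the RMW values from Table 6.

```python
# ASSUMED on-demand hourly prices in USD; not taken from this blog.
HOURLY_PRICE_USD = {
    "c6gd.2xlarge": 0.3072,
    "c5d.2xlarge": 0.384,
    "c5ad.2xlarge": 0.344,
}


def ops_per_dollar(throughput_ops_per_sec, hourly_price_usd):
    """Operations completed for each dollar of instance time."""
    return throughput_ops_per_sec * 3600 / hourly_price_usd


def price_performance_improvement(graviton, x86):
    """Percentage gain in ops per dollar; each argument is (throughput, price)."""
    return (ops_per_dollar(*graviton) / ops_per_dollar(*x86) - 1) * 100


if __name__ == "__main__":
    # RMW throughput at 120 threads (Table 6), C6gd vs C5ad.
    gain = price_performance_improvement(
        (45945.3, HOURLY_PRICE_USD["c6gd.2xlarge"]),
        (35203.8, HOURLY_PRICE_USD["c5ad.2xlarge"]),
    )
    print(f"C6gd vs C5ad, RMW, 120 threads: {gain:.1f}% more operations per dollar")
```

Because the metric is a ratio of throughput to price, an instance can come out ahead on price-performance even at thread counts where its raw throughput is slightly lower.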

Up to 45% better Cassandra INSERT price-performance

Figure 1. Cassandra INSERT Price-performance improvements of Graviton2 over Intel and AMD based instances

Up to 46% better Cassandra RMW price-performance

Figure 2. Cassandra RMW Price-performance improvements of Graviton2 over Intel and AMD based instances

Figures 1 & 2 show that users can improve their performance per dollar by 15% to 46% on INSERT and RMW operations when they migrate their Cassandra database to AWS Graviton2-based instances.

Benchmark performance results

YCSB reports metrics such as latency and throughput for READ, UPDATE and other operation types. For this blog, we provide metrics on INSERT and RMW workloads. YCSB runs in two phases:

Load data into the database (INSERT latency/throughput collected)
Run the workload (READ/UPDATE latency and RMW throughput collected)

For each phase, we present 99th-percentile latency and throughput plots. Cassandra runs on OpenJDK 11, and each plot compares the measurements on AWS Graviton2 and x86 instances at thread counts of 1, 8, 16, 32, 64, 96 and 120.

INSERT results

The following plots show INSERT latency (99th percentile) and throughput on the three instance types for different thread counts. These metrics are collected by YCSB when loading the data into Cassandra.

Cassandra INSERT p99 latency

Figure 3. Cassandra p99 INSERT latency for different numbers of threads

Instance type/Threads	1	8	16	32	64	96	120
c6gd.2xlarge	0.297	0.431	0.637	1.153	1.985	3.309	3.967
c5d.2xlarge	0.328	0.461	0.663	1.125	2.103	3.829	4.807
c5ad.2xlarge	0.471	0.55	0.731	1.144	1.954	3.071	4.147

Table 1. Cassandra INSERT latency (ms) for different numbers of threads

Cassandra INSERT Throughput

Figure 4. Cassandra INSERT throughput for different numbers of threads

Instance type/Threads	1	8	16	32	64	96	120
c6gd.2xlarge	4589.0	29480.3	43407.5	53474.5	61396.8	60816.2	63991.8
c5d.2xlarge	4933.1	29768.5	42349.6	53898.2	56003.6	58154.7	59989.8
c5ad.2xlarge	3530.0	24667.9	38392.9	45330.9	49447.6	48232.5	50155.4

Table 2. Cassandra INSERT throughput (INSERT/sec) for different numbers of threads

The following charts show the percentage of Graviton2 improvements for INSERT operations over x86 instances.

Cassandra INSERT relative p99 latency

Figure 5. Cassandra INSERT latency improvement of AWS Graviton2 (C6gd)

Instance comp/Threads	1	8	16	32	64	96	120
C6gd vs C5d (2xlarge)	9.45%	6.50%	3.92%	-2.48%	5.61%	13.58%	17.47%
C6gd vs C5ad (2xlarge)	36.94%	21.63%	12.85%	-0.79%	-1.59%	-7.75%	4.34%

Table 3. Cassandra INSERT latency percentage improvement

Cassandra INSERT relative throughput

Figure 6. Cassandra INSERT throughput improvement of AWS Graviton2 (c6gd)

Instance comp/Threads	1	8	16	32	64	96	120
C6gd vs C5d (2xlarge)	-6.97%	-0.96%	2.49%	-0.78%	9.63%	4.57%	6.67%
C6gd vs C5ad (2xlarge)	30.00%	19.50%	13.06%	17.96%	24.16%	26.09%	27.58%

Table 4. Cassandra INSERT throughput percentage improvement of AWS Graviton2 (c6gd)
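The percentages in Tables 3 and 4 can be reproduced from the raw values in Tables 1 and 2. The formulas below are inferred from the table values rather than stated elsewhere in this blog: latency improvement is measured relative to the x86 latency, and throughput improvement relative to the x86 throughput. The function names are illustrative.

```python
def latency_improvement(x86_ms, graviton_ms):
    """Percentage by which Graviton2 latency is lower than x86 (positive is better)."""
    return (x86_ms - graviton_ms) / x86_ms * 100


def throughput_improvement(x86_ops, graviton_ops):
    """Percentage by which Graviton2 throughput is higher than x86 (positive is better)."""
    return (graviton_ops - x86_ops) / x86_ops * 100


if __name__ == "__main__":
    # INSERT at 120 threads, C6gd vs C5d (values from Tables 1 and 2).
    print(f"latency:    {latency_improvement(4.807, 3.967):.2f}%")         # 17.47%
    print(f"throughput: {throughput_improvement(59989.8, 63991.8):.2f}%")  # 6.67%
```

A negative result means the x86 instance did better at that thread count, which is how the negative entries in the tables should be read.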

As shown in Figures 5 & 6, Graviton2 offers up to 17% lower INSERT latency and up to 9.6% higher throughput than C5d at most thread counts. Graviton2 does even better compared to C5ad instances, with up to 37% lower INSERT latency and 30% higher throughput. For the x86 instances, throughput saturation occurs at around 32 threads, whereas Graviton2 continues to scale up to 64 threads.
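Saturation is not given a formal definition here. One possible reading is the first thread count after which the next step up adds less than some minimum relative gain in throughput. The sketch below applies that reading to the Table 2 data. The 10% threshold and the function name are assumptions chosen for illustration.

```python
THREADS = [1, 8, 16, 32, 64, 96, 120]

# INSERT throughput (INSERT/sec) from Table 2.
INSERT_THROUGHPUT = {
    "c6gd.2xlarge": [4589.0, 29480.3, 43407.5, 53474.5, 61396.8, 60816.2, 63991.8],
    "c5d.2xlarge": [4933.1, 29768.5, 42349.6, 53898.2, 56003.6, 58154.7, 59989.8],
    "c5ad.2xlarge": [3530.0, 24667.9, 38392.9, 45330.9, 49447.6, 48232.5, 50155.4],
}


def saturation_point(threads, throughput, min_gain=0.10):
    """Return the first thread count after which the next step gains < min_gain.

    Returns None if throughput keeps scaling across every measured step.
    The 10% default is an arbitrary, illustrative threshold.
    """
    for i in range(len(threads) - 1):
        gain = throughput[i + 1] / throughput[i] - 1
        if gain < min_gain:
            return threads[i]
    return None


if __name__ == "__main__":
    for instance, series in INSERT_THROUGHPUT.items():
        print(instance, "saturates at about", saturation_point(THREADS, series), "threads")
```

With this threshold, the two x86 instances level off at 32 threads and the Graviton2 instance at 64, in line with the observation above. A different threshold would shift these points, so the result is only indicative.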

RMW results

The following plots show RMW results for READ/UPDATE latency and RMW throughput for different thread counts. The RMW workload simulates the operations where a user reads data from Cassandra and writes the modified values back to the database.

Cassandra RMW p99 Latency

Figure 7. Cassandra RMW latency for different numbers of threads

Instance type/Threads	1	8	16	32	64	96	120
c6gd.2xlarge	0.347	0.563	0.914	1.305	2.335	3.729	7.523
c5d.2xlarge	0.344	0.661	0.853	1.35	3.805	9.391	13.359
c5ad.2xlarge	0.52	0.709	0.851	1.354	5.511	6.903	18.191

Table 5. Cassandra RMW-READ latency (ms) for different numbers of threads

Cassandra RMW Throughput

Figure 8. Cassandra RMW throughput for different numbers of threads

Instance type/Threads	1	8	16	32	64	96	120
c6gd.2xlarge	2914.4	18368.0	26074.6	32562.7	41060.2	42905.6	45945.3
c5d.2xlarge	3164.5	19324.6	27360.2	35009.1	41018.9	42852.2	47779.5
c5ad.2xlarge	2323.6	16266.0	25126.9	30284.7	34978.5	38055.4	35203.8

Table 6. Cassandra RMW throughput (operations/sec) for different numbers of threads

The following charts plot Graviton2 improvements for READ-MODIFY-WRITE/READ operations over x86 instances:

Cassandra RMW relative p99 latency

Figure 9. Cassandra RMW-READ latency improvement of Graviton2 (c6gd)

Instance comp/Threads	1	8	16	32	64	96	120
C6gd vs C5d (2xlarge)	-0.87%	14.82%	-7.15%	3.33%	38.63%	60.29%	43.69%
C6gd vs C5ad (2xlarge)	33.27%	20.59%	-7.40%	3.62%	57.63%	45.98%	58.64%

Table 7. Cassandra RMW-READ latency percentage improvement of Graviton2 (C6gd)

Cassandra RMW relative throughput

Figure 10. Cassandra RMW throughput improvement of Graviton2 (C6gd)

Instance comp/Threads	1	8	16	32	64	96	120
C6gd vs C5d (2xlarge)	-7.90%	-4.95%	-4.70%	-6.99%	0.10%	0.12%	-3.84%
C6gd vs C5ad (2xlarge)	25.43%	12.92%	3.77%	7.52%	17.39%	12.74%	30.51%

Table 8. Cassandra RMW throughput percentage improvement of Graviton2 (C6gd)

As with INSERT, AWS Graviton2 instances perform similarly to or better than x86 instances, with up to 60% lower READ latency. While C5d throughput is better by up to 7.9% at some thread counts, Graviton2 outperforms C5ad by up to 30%. For RMW, all tested instances experience throughput saturation at a thread count of around 64.

The following charts plot Graviton2 improvements for READ-MODIFY-WRITE/UPDATE latency over x86 instances:

Cassandra RMW-UPDATE p99 latency

Figure 11. Cassandra RMW-UPDATE latency for different numbers of threads

Instance type/Threads	1	8	16	32	64	96	120
c6gd.2xlarge	0.26	0.417	0.67	1.294	2.004	4.135	3.987
c5d.2xlarge	0.259	0.388	0.61	1.236	2.853	4.043	3.807
c5ad.2xlarge	0.449	0.491	0.677	1.232	2.247	3.565	5.923

Table 9. Cassandra RMW-UPDATE latency (ms) for different numbers of threads

Cassandra RMW-UPDATE p99 relative latency

Figure 12. Cassandra RMW-UPDATE latency percentage improvement of Graviton2 (c6gd)

Instance comp/Threads	1	8	16	32	64	96	120
C6gd vs C5d (2xlarge)	-0.39%	-7.47%	-9.84%	-4.69%	29.76%	-2.28%	-4.73%
C6gd vs C5ad (2xlarge)	42.09%	15.07%	1.03%	-5.03%	10.81%	-15.99%	32.69%

Table 10. Cassandra RMW-UPDATE latency percentage improvement of Graviton2 (C6gd)

Here we see UPDATE latency benefits of up to 42% for the Graviton2 instance compared to the x86-based instances. The RMW-UPDATE results show more parity between Graviton2 and x86, with the x86-based instances showing lower latency at some thread counts.

Conclusion

Our benchmarks show that INSERT latency on Cassandra is lower on C6gd (AWS Graviton2) for most thread counts, by up to 17% and 37% compared to C5d (Intel) and C5ad (AMD) respectively. The benchmarks also show READ latency improvements of up to 60% on Graviton2 over x86 for thread counts of 32 and higher. UPDATE latency improvements on Graviton2 are up to 42%.

INSERT throughput of Cassandra running on Graviton2 is consistently higher than on both x86 instances at thread counts of 64 and above, after x86 throughput begins to saturate at around 32 threads. For RMW, the throughput of the C6gd and C5d instances is similar. Compared to C5ad instances, however, the RMW throughput of Cassandra on C6gd is up to 30% higher.

In addition to improved performance, users who migrate their Cassandra databases to Graviton2 pay lower hourly rates than for the equivalent x86 instances. The price-performance benefits of running Cassandra on AWS Graviton2 range from 15% to 46% for INSERT and RMW operations across the thread counts tested.

AWS Graviton2 is showing significant performance and price-performance benefits across a wide range of workloads. These include H.264 video encoding, memcached, Elasticsearch and many others. Many organizations are finding they can achieve meaningful performance and cost benefits in just a few days by migrating workloads to Graviton2. AWS is currently sponsoring the Graviton Challenge, with discounts and prizes available for teams who show benefits by migrating their applications to Graviton2. The contest ends August 31, but we encourage any readers to check it out.

Check out AWS Graviton2
