Apache Cassandra is an open source distributed NoSQL database for mission-critical applications. Cassandra is known for its hybrid solutions, security, high scalability, and speed. It has a SQL-like query language called CQL (Cassandra Query Language), which makes it easier to use for people familiar with traditional SQL. Cassandra is specifically optimized for writes, so it offers high scalability and low latency for write operations.
In this blog, we compare the performance of Cassandra running in the AWS cloud using compute-optimized instances C6gd (AWS Graviton2), C5d (Intel Skylake-SP or Cascade Lake), and C5ad (AMD EPYC 7002 series). All the instance types provide high network bandwidth and feature local high speed NVMe-based SSD storage for faster disk IO operations.
AWS Graviton2 processors are custom built by Amazon Web Services using Arm Neoverse N1 cores to deliver the best price performance for your cloud workloads running in Amazon EC2. Compared to similar x86-based instances, Graviton2 provides better price performance and more NVMe storage, thus allowing users to run their computations at a lower price. Arm64 is supported by major Linux distributions, so it is very convenient for users to migrate their applications to Graviton2 instances to reduce operational costs while enhancing performance.
Today, Graviton2 instances are available as general purpose (M6g/M6gd/T4g), compute-optimized (C6g/C6gd/C6gn), and memory-optimized (R6g/R6gd/X2gd) types. Users can choose an instance type based on the applications they plan to run. These instances deliver up to 40% better price-performance compared to similar 5th generation x86 instances.
Benchmarks are run against a single instance of Cassandra. We use YCSB (Yahoo! Cloud Serving Benchmark) as the benchmarking tool to report metrics on INSERT, UPDATE, and RMW (read-modify-write) operations. YCSB runs on a separate instance in AWS, but within the same cluster and placement group as Cassandra in order to keep latency to a minimum.
Both the Cassandra instance and the YCSB load generator run on 2xlarge instances. The C6gd, C5d, and C5ad 2xlarge instances come with 8 vCPUs, 16 GiB of memory, NVMe SSD storage, and network bandwidth of up to 10 Gbps. All the instances run Ubuntu 20.04 as the operating system.
YCSB workloads execute in two phases: loading and transactions. The first phase defines the data to be inserted into the database, and the second defines the operations to run against it. The parameters of a workload can be passed as input or defined in a workload file. We used Workload F for the tests, which benchmarks the database against a 50:50 ratio of read and read-modify-write operations. We modified the workload by changing the 'recordcount' and 'operationcount' parameters to 200K. For each instance type tested, we ran the load generator on that same instance type (for example, to test c6gd.2xlarge, we ran the load generator on a c6gd.2xlarge instance).
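To make the two phases concrete, here is a minimal sketch of how the YCSB command lines for Workload F could be assembled for each thread count. It assumes the `cassandra-cql` binding and the stock `workloads/workloadf` file that ship with YCSB. The host address `10.0.0.10` is a hypothetical placeholder, and the exact invocation used for this blog is not shown in the text, so treat this as an illustration rather than the commands that were run.

```python
from typing import List

# Thread counts reported in this blog.
THREAD_COUNTS = [1, 8, 16, 32, 64, 96, 120]


def ycsb_command(phase: str, host: str, threads: int,
                 records: int = 200_000,
                 operations: int = 200_000) -> List[str]:
    """Build a YCSB command line for Workload F.

    phase is "load" (inserts the records) or "run" (executes the
    50:50 read / read-modify-write transaction mix).
    """
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    return [
        "bin/ycsb", phase, "cassandra-cql",       # assumed binding
        "-P", "workloads/workloadf",              # stock Workload F file
        "-p", f"hosts={host}",                    # Cassandra node address
        "-p", f"recordcount={records}",           # overridden to 200K
        "-p", f"operationcount={operations}",     # overridden to 200K
        "-threads", str(threads),
        "-s",                                     # periodic status output
    ]


if __name__ == "__main__":
    # 10.0.0.10 is a hypothetical private address for the Cassandra node.
    for threads in THREAD_COUNTS:
        for phase in ("load", "run"):
            print(" ".join(ycsb_command(phase, "10.0.0.10", threads)))
```

Passing the overrides with `-p` keeps the stock workload file untouched. Editing `recordcount` and `operationcount` in a copy of the file would work equally well.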
For this benchmark, we used Cassandra 4.0-RC1 (the latest at the time of publication) and ran it on OpenJDK 11. In the Cassandra configuration file, the parameters `data_file_directories` and `commitlog_directory` were changed to point to the NVMe directory mounted on the instance. In addition, the required `usertable` was created in Cassandra before running the load generator.
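As an illustration of the table setup step, the sketch below generates CQL for the `usertable` that YCSB expects, following the schema suggested in the YCSB Cassandra binding documentation (a `y_id` key plus `field0` to `field9`). The keyspace name `ycsb`, the `SimpleStrategy` replication class, and a replication factor of 1 for a single node are assumptions, since the blog does not list the statements that were used.

```python
def usertable_cql(keyspace: str = "ycsb", field_count: int = 10,
                  replication_factor: int = 1) -> str:
    """Return CQL that creates the keyspace and usertable for YCSB.

    The defaults are assumptions: YCSB's default keyspace name, its
    default of ten value fields, and one replica for a single node.
    """
    fields = ",\n".join(f"    field{i} varchar" for i in range(field_count))
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace}\n"
        f"  WITH REPLICATION = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};\n"
        f"CREATE TABLE IF NOT EXISTS {keyspace}.usertable (\n"
        f"    y_id varchar PRIMARY KEY,\n"
        f"{fields}\n"
        f");"
    )


if __name__ == "__main__":
    # The printed statements can be pasted into cqlsh.
    print(usertable_cql())
```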
We'll begin with a price-performance summary of our findings, followed by the latency and throughput results.
Our price-performance calculations use the published on-demand pricing at the time of testing. AWS Graviton2 has the lowest hourly cost of all the instances tested.
The following charts show the cost-effectiveness, at the time of publication, of Graviton2 versus the x86 instance types at thread counts of 1, 8, 16, 32, 64, 96, and 120. The left axis is the price-performance for INSERT and RMW operations, and the right axis is the percentage improvement of Graviton2 over the x86 instances.
Figure 1. Cassandra INSERT Price-performance improvements of Graviton2 over Intel and AMD based instances
Figure 2. Cassandra RMW Price-performance improvements of Graviton2 over Intel and AMD based instances
Figures 1 and 2 show that users can improve their performance per dollar by 15% to 46% on INSERT and RMW operations when they migrate their Cassandra database to AWS Graviton2-based instances.
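The blog does not spell out the formula behind these charts, so the following is a sketch under one common definition: price-performance as operations completed per dollar (throughput divided by the hourly on-demand price), and improvement as the relative difference between two such values. The throughput and price figures in the example are placeholders chosen for easy arithmetic, not measurements or real AWS prices.

```python
def price_performance(throughput_ops_per_sec: float,
                      price_per_hour: float) -> float:
    """Operations per dollar: ops/sec * 3600 sec/hour / (dollars/hour)."""
    return throughput_ops_per_sec * 3600.0 / price_per_hour


def improvement_pct(candidate: float, baseline: float) -> float:
    """Relative improvement of candidate over baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder inputs: identical throughput, different hourly prices.
    graviton = price_performance(10_000.0, 1.00)
    x86 = price_performance(10_000.0, 1.25)
    print(f"{improvement_pct(graviton, x86):.1f}%")  # prints 25.0%
```

With equal throughput, the whole gain comes from the lower hourly price. Any throughput advantage multiplies on top of it.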
YCSB reports metrics such as latency and throughput for READ, UPDATE, and other operation types. For this blog, we provide metrics on the INSERT and RMW workloads. YCSB runs in two phases: a load phase, which inserts the records into the database and yields the INSERT metrics, and a transaction phase, which executes the workload operations and yields the RMW metrics.
For each phase we present 99th-percentile latency and throughput plots. Cassandra runs on OpenJDK 11, and each plot compares the measurements on AWS Graviton2 and x86 instances at thread counts of 1, 8, 16, 32, 64, 96, and 120.
The following plots show INSERT latency (99th percentile) and throughput on the three types of instances for different thread counts. These metrics are collected by YCSB when loading the data into Cassandra.
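YCSB prints its summary as comma-separated lines of the form `[SECTION], Metric, value`. The sketch below shows one way to pull the 99th-percentile latency and overall throughput out of that text. The sample lines are invented for illustration and are not results from this benchmark.

```python
from typing import Dict, Tuple


def parse_ycsb_summary(text: str) -> Dict[Tuple[str, str], float]:
    """Map (section, metric) to its numeric value.

    Lines that are not of the form "[SECTION], Metric, number" are skipped.
    """
    metrics: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue
        section, metric, raw = parts
        if not (section.startswith("[") and section.endswith("]")):
            continue
        try:
            value = float(raw)
        except ValueError:
            continue
        metrics[(section[1:-1], metric)] = value
    return metrics


# Invented sample output, for illustration only.
SAMPLE = """\
[OVERALL], RunTime(ms), 20000
[OVERALL], Throughput(ops/sec), 10000.0
[INSERT], Operations, 200000
[INSERT], 99thPercentileLatency(us), 1500
"""

if __name__ == "__main__":
    summary = parse_ycsb_summary(SAMPLE)
    # YCSB reports latency in microseconds; the tables here use milliseconds.
    p99_ms = summary[("INSERT", "99thPercentileLatency(us)")] / 1000.0
    print(p99_ms, summary[("OVERALL", "Throughput(ops/sec)")])
```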
Figure 3. Cassandra p99 INSERT latency for different numbers of threads
Table 1. Cassandra INSERT latency (ms) for different numbers of threads
Figure 4. Cassandra INSERT throughput for different numbers of threads
Table 2. Cassandra INSERT throughput (INSERT/sec) for different numbers of threads
The following charts show the percentage of Graviton2 improvements for INSERT operations over x86 instances.
Figure 5. Cassandra INSERT latency improvement of AWS Graviton2 (C6gd)
C6gd vs C5d (2xlarge)
C6gd vs C5ad (2xlarge)
Table 3. Cassandra INSERT latency percentage improvement
Figure 6. Cassandra INSERT throughput improvement of AWS Graviton2 (C6gd)
Table 4. Cassandra INSERT throughput percentage improvement of AWS Graviton2 (C6gd)
As shown in Figures 5 and 6, Graviton2 offers up to 17% lower INSERT latency and up to 9.6% higher throughput compared to C5d for most of the thread counts. Graviton2 does even better compared to C5ad instances, with up to 37% lower INSERT latency and up to 30% higher throughput. On the x86 instances, throughput saturates at 32 threads, whereas Graviton2 continues to scale up to 64 threads.
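Saturation can be read off the throughput plots by eye, but it can also be made explicit. The sketch below uses one possible definition, which is an assumption and not the method used in this blog: the saturation point is the first thread count after which moving to the next thread count raises throughput by less than 5%. The throughput numbers are made up to mimic the shapes described above.

```python
from typing import Dict, Optional


def saturation_point(throughput_by_threads: Dict[int, float],
                     min_gain: float = 0.05) -> Optional[int]:
    """Return the first thread count after which the next step up in
    threads improves throughput by less than min_gain (a fraction).

    Returns None if throughput keeps scaling across all measured points.
    """
    counts = sorted(throughput_by_threads)
    for current, following in zip(counts, counts[1:]):
        gain = (throughput_by_threads[following]
                / throughput_by_threads[current]) - 1.0
        if gain < min_gain:
            return current
    return None


if __name__ == "__main__":
    # Made-up curves: one flattens after 32 threads, one after 64.
    flat_at_32 = {1: 1000, 8: 6000, 16: 10000, 32: 14000,
                  64: 14200, 96: 14100, 120: 14000}
    flat_at_64 = {1: 1000, 8: 6000, 16: 10000, 32: 14000,
                  64: 17000, 96: 17100, 120: 17000}
    print(saturation_point(flat_at_32), saturation_point(flat_at_64))
```

The 5% threshold is arbitrary. A stricter or looser value shifts the reported point, so it should be stated alongside any result.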
The following plots show RMW results for READ/UPDATE latency and RMW throughput for different thread counts. The RMW workload simulates the operations where a user reads data from Cassandra and writes the modified values back to the database.
Figure 7. Cassandra RMW latency for different numbers of threads
Table 5. Cassandra RMW latency (ms) for different numbers of threads
Figure 8. Cassandra RMW throughput for different numbers of threads
Table 6. Cassandra RMW throughput for different numbers of threads
The following charts plot Graviton2 improvements for READ-MODIFY-WRITE/READ operations over x86 instances:
Figure 9. Cassandra RMW-READ latency improvement of Graviton2 (C6gd)
Table 7. Cassandra RMW-READ latency percentage improvement of Graviton2 (C6gd)
Figure 10. Cassandra RMW throughput improvement of Graviton2 (C6gd)
Table 8. Cassandra RMW throughput percentage improvement of Graviton2 (C6gd)
As with INSERT, AWS Graviton2 instances perform similarly to or better than x86 instances, with up to 60% lower READ latency. While C5d throughput is better by up to 7.9% at some thread counts, Graviton2 outperforms C5ad by up to 30%. For RMW, all tested instances reach throughput saturation at a thread count of around 64.
The following charts plot Graviton2 improvements for READ-MODIFY-WRITE/UPDATE latency over x86 instances:
Figure 11. Cassandra RMW-UPDATE latency for different numbers of threads
Table 9. Cassandra RMW-UPDATE latency for different numbers of threads
Figure 12. Cassandra RMW-UPDATE latency percentage improvement of Graviton2 (C6gd)
Table 10. Cassandra RMW-UPDATE latency percentage improvement of Graviton2 (C6gd)
Here we see UPDATE latency benefits of up to 42% for the Graviton2 instance compared to the x86-based instances. The RMW-UPDATE results do show more parity between Graviton2 and x86, with the x86-based instances showing lower latency at some thread counts.
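Because lower latency is better, the sign convention for a latency improvement differs from the one for throughput. The sketch below assumes the improvement is computed as the relative reduction in latency against the x86 baseline, which the blog does not state explicitly. A negative value means the x86 instance had the lower latency at that thread count. The latency values are placeholders, not measurements.

```python
from typing import Dict


def latency_improvement_pct(graviton_ms: float, x86_ms: float) -> float:
    """Percent reduction in latency relative to the x86 baseline.

    Positive: Graviton2 is faster. Negative: x86 is faster.
    """
    return (x86_ms - graviton_ms) / x86_ms * 100.0


def improvements_by_threads(graviton: Dict[int, float],
                            x86: Dict[int, float]) -> Dict[int, float]:
    """Apply the improvement formula at every thread count present in both."""
    return {
        threads: latency_improvement_pct(graviton[threads], x86[threads])
        for threads in sorted(graviton.keys() & x86.keys())
    }


if __name__ == "__main__":
    # Placeholder p99 latencies in milliseconds, keyed by thread count.
    graviton = {32: 2.5, 64: 3.0}
    x86 = {32: 2.0, 64: 4.0}
    print(improvements_by_threads(graviton, x86))  # {32: -25.0, 64: 25.0}
```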
Our benchmarks show that INSERT latency on Cassandra is lower on C6gd (AWS Graviton2) for most thread counts, by up to 17% compared to C5d (Intel) and up to 37% compared to C5ad (AMD). The benchmarks also show READ latency improvements of up to 60% on Graviton2 over x86 for thread counts of 32 and higher. UPDATE latency improvements on Graviton2 are up to 42%.
INSERT throughput of Cassandra running on Graviton2 is consistently higher than on x86 at thread counts of 32 and above (where x86 saturates). For RMW, the throughput of the C6gd and C5d instances is similar. Compared to C5ad instances, however, the RMW throughput of Cassandra on C6gd is up to 30% higher.
Users who migrate their Cassandra databases to Graviton2 get improved performance and also pay lower hourly rates than for the equivalent x86 instances. The price-performance benefits of running Cassandra on AWS Graviton2 range from 15% to 46% for INSERT and RMW operations across the tested thread counts.
AWS Graviton2 is showing significant performance and price-performance benefits across a wide range of workloads, including H.264 video encoding, memcached, Elasticsearch, and many others. Many organizations are finding they can achieve meaningful performance and cost benefits in just a few days by migrating workloads to Graviton2. AWS is currently sponsoring the Graviton Challenge, with discounts and prizes available for teams who show benefits by migrating their applications to Graviton2. The contest ends August 31, but we encourage any readers to check it out.