Gain up to 35% performance benefits for deploying Redis on AWS Graviton2

July 20, 2021


Co-authors: Pranay Bakre and Masoud Koleini

It has been more than a year since the Arm Neoverse-powered AWS Graviton2 processors became generally available, and customers are deploying a wide range of applications and workloads on them to gain price and performance benefits. The workloads range from load balancers, reverse proxies, and API gateways (NGINX), to search engines (Elasticsearch), to in-memory databases (Memcached). We recommend reading our series of performance blogs on these different categories of workloads.

Databases like Memcached and Redis are referred to as in-memory databases. Unlike traditional databases that store data on disks or SSDs, these databases are purpose-built to store data in memory. This typically results in faster response times and higher IOPS. Redis is an open-source, in-memory datastore that is often used as a database, caching system, and message broker. It is widely used for real-time applications in industries such as healthcare, IoT, and financial services. Redis is highly scalable and is used for real-time analytics, caching, pub/sub applications, and session management.

In this blog, we compare the throughput and latency of Redis on AWS Graviton2-based R6g instances to Intel Xeon-based R5 instances across a range of instance sizes to see which offers better Redis performance.

Performance benchmarking setup and results

For the benchmarking setup, we used GNU Compiler Collection (GCC) version 10.2.0. Arm, in collaboration with its partners and the GCC community, has worked to significantly improve performance in the GCC 10 release. We compiled the Redis server from its source repository with GCC 10.2.0 before running the benchmarking tests.

Using these tests, we observed up to 35% performance benefit from running an open-source Redis database on AWS Graviton2-based instances compared to equivalent x86-based instances. We also observed more than twice the number of operations per second from Redis deployed on Arm-based Amazon EC2 R6g instances compared to x86-based Amazon EC2 R5 instances. Additionally, we observed significantly lower latency values for similar operations.

We used Memtier as the load generator and performance benchmarking tool. It is an open-source, high-throughput benchmarking tool for Redis built by Redis Labs. Memtier was deployed on separate EC2 instances in the same VPC as the Redis instances.

Component name	Version
Redis	6.0.9
GCC version	10.2.0
Memtier benchmarking tool	1.3.0
Operating System	Ubuntu 20.04

Input parameter	Value
Number of threads	5
Number of clients per thread	50
Number of requests per client	10,000
Number of consecutive test runs	10
Data size (bytes)	128
Protocol	Redis
Key pattern	Sequential
Pipeline	1

Each test run used 5 threads with 50 clients per thread, giving 250 simultaneous connections (sessions). At 10,000 requests per client, Memtier sent 2.5 million requests on each run. Each test run used the default pipeline value of 1. Pipelining increases application throughput by sending multiple requests before waiting for the replies; for bulk data transfers or higher throughput, pipeline values greater than 1 can be considered. This GitHub repo contains all the scripts required to create the test infrastructure and the steps to execute the benchmarks.
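As a rough illustration of how the input parameters above fit together, the following Python sketch computes the connection and request totals and assembles a possible `memtier_benchmark` command line. The mapping from the table to these flags is an assumption, as is the host address `10.0.0.10` and the default Redis port 6379; the exact command used for the published results is in the scripts in the repository mentioned above.

```python
import shlex

# Values taken from the input-parameter table.
THREADS = 5
CLIENTS_PER_THREAD = 50
REQUESTS_PER_CLIENT = 10_000
DATA_SIZE_BYTES = 128
PIPELINE = 1
RUN_COUNT = 10


def memtier_command(host: str, port: int = 6379) -> list[str]:
    """Build a memtier_benchmark argument list (assumed flag mapping)."""
    return [
        "memtier_benchmark",
        f"--server={host}",
        f"--port={port}",
        "--protocol=redis",
        f"--threads={THREADS}",
        f"--clients={CLIENTS_PER_THREAD}",
        f"--requests={REQUESTS_PER_CLIENT}",
        f"--data-size={DATA_SIZE_BYTES}",
        "--key-pattern=S:S",  # sequential keys for both SET and GET
        f"--pipeline={PIPELINE}",
        f"--run-count={RUN_COUNT}",
    ]


# Each client holds one connection, so connections = threads x clients.
connections = THREADS * CLIENTS_PER_THREAD
# Every client sends the same number of requests in a run.
total_requests = connections * REQUESTS_PER_CLIENT

if __name__ == "__main__":
    print(f"connections per run: {connections}")
    print(f"requests per run:    {total_requests}")
    # 10.0.0.10 is a hypothetical private address of the Redis instance.
    print(shlex.join(memtier_command("10.0.0.10")))
```

Building the command as a list rather than a single string keeps each parameter visible next to its flag and avoids shell-quoting mistakes if the list is later passed to `subprocess.run`.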

The results shown in the following tables are aggregated over 30 consecutive test runs.

Let us look at the performance numbers of self-hosted Redis on R6g and R5 instances. We compared the throughput (operations/sec, higher is better) and latency (ms, lower is better) values after multiple test runs.

Instance size	R5 (operations/sec)	R6g (operations/sec)	Performance gain (%)
Large	142653.43	192730.22	35
XLarge	145666.72	193117.02	32
2XLarge	167997.10	199732.16	18

Table 1: Redis throughput performance results on R5 vs R6g
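The gain column in Table 1 can be reproduced from the two throughput columns. The sketch below assumes the gain is defined as the relative increase of R6g over R5; with that definition, the published percentages appear to be truncated rather than rounded to whole numbers.

```python
# Throughput in operations/sec from Table 1: size -> (R5, R6g).
THROUGHPUT = {
    "Large": (142653.43, 192730.22),
    "XLarge": (145666.72, 193117.02),
    "2XLarge": (167997.10, 199732.16),
}


def throughput_gain_pct(r5_ops: float, r6g_ops: float) -> float:
    """Relative throughput increase of R6g over R5, in percent."""
    return (r6g_ops / r5_ops - 1.0) * 100.0


gains = {size: throughput_gain_pct(r5, r6g)
         for size, (r5, r6g) in THROUGHPUT.items()}

if __name__ == "__main__":
    for size, gain in gains.items():
        # int() truncates toward zero, matching the table's whole numbers.
        print(f"{size:8s} {gain:6.2f}%  (table: {int(gain)}%)")
```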

Instance size	R5 latency (ms)	R6g latency (ms)	Latency reduction (%)
Large	1.75	1.32	24
XLarge	1.71	1.29	24
2XLarge	1.49	1.25	16

Table 2: Redis average latency performance results on R5 vs R6g
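The percentages in Table 2 can be derived the same way, assuming they are the relative decrease in average latency from R5 to R6g. The sketch also includes a consistency check that is an addition here and not part of the original analysis: with a pipeline of 1, each of the 250 connections has at most one request in flight, so Little's law suggests that average latency should be roughly the number of connections divided by throughput. The reported latencies sit close to that estimate, which indicates that latency and throughput in this test are two views of the same measurement under full load.

```python
CONNECTIONS = 250  # 5 threads x 50 clients per thread

# size -> (R5 ops/sec, R6g ops/sec, R5 latency ms, R6g latency ms)
RESULTS = {
    "Large": (142653.43, 192730.22, 1.75, 1.32),
    "XLarge": (145666.72, 193117.02, 1.71, 1.29),
    "2XLarge": (167997.10, 199732.16, 1.49, 1.25),
}


def latency_reduction_pct(r5_ms: float, r6g_ms: float) -> float:
    """Relative latency decrease of R6g versus R5, in percent."""
    return (1.0 - r6g_ms / r5_ms) * 100.0


def littles_law_latency_ms(ops_per_sec: float,
                           connections: int = CONNECTIONS) -> float:
    """Estimated average latency when every connection keeps
    exactly one request outstanding (closed loop, pipeline = 1)."""
    return connections / ops_per_sec * 1000.0


reductions = {size: latency_reduction_pct(r5_ms, r6g_ms)
              for size, (_, _, r5_ms, r6g_ms) in RESULTS.items()}

if __name__ == "__main__":
    for size, (r5_ops, r6g_ops, r5_ms, r6g_ms) in RESULTS.items():
        print(f"{size}: reduction {reductions[size]:.2f}%")
        print(f"  R5  reported {r5_ms:.2f} ms, "
              f"estimated {littles_law_latency_ms(r5_ops):.2f} ms")
        print(f"  R6g reported {r6g_ms:.2f} ms, "
              f"estimated {littles_law_latency_ms(r6g_ops):.2f} ms")
```

One consequence of this relationship is that a deployment running well below saturation may see a different latency picture from the one in Table 2, because the request rate is then set by the application rather than by the server.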

The throughput and latency performance comparison graphs for R5 and R6g instances are shown in the following figures.

Up to 35% better Redis throughput on AWS Graviton2

Figure 1: Performance gain for R6g vs R5 instances for self-hosted Redis deployment.

Up to 24% reduced Redis latency on AWS Graviton2

Figure 2: Lower latency for R6g vs R5 instances for self-hosted Redis deployment.

Summary

To conclude, Redis deployed on AWS Graviton2 provides up to 35% more throughput, up to 24% lower latency, and a 20% cost benefit compared to the equivalent x86-based EC2 instances. Deploying applications on these instances is straightforward and does not require major changes. For details on how to migrate existing applications to AWS Graviton2, please check this GitHub page.

Visit the AWS Graviton page for customer stories on adoption of Arm-based processors. For any queries related to your software workloads running on Arm Neoverse platforms, feel free to reach out to us at sw-ecosystem@arm.com.

Noobie over 2 years ago

Thanks for the informative blog entry!

My team just migrated our Elasticache Redis nodes to Graviton 2 (r5.xl -> r6g.xl). We're seeing a drop in CPU which I guess can be a proxy for throughput, but our latency didn't decrease.

I wanted to ask if your drop in latency may have been caused by CPU throttling when maxing out throughput. Our CPU Util is < 10% so we're not close to maxing. If you have any ideas why my results may have been different than yours I'd really love to hear it.