Improve Memcached performance up to 41% with Alibaba Cloud Yitian 710 instances

Ker Liu
March 14, 2023
2 minute read time.

Memcached is an open source, high-performance, distributed memory object caching system. It is a popular choice for powering real-time applications in web, mobile apps, gaming, ad-tech, and e-Commerce. Memcached is an in-memory key-value store that offers higher application performance by removing the need to access disks or SSDs. By keeping its data in memory, it avoids delays and can access data much faster than traditional disk-based databases.

In this blog, we compare the throughput of Memcached on two types of Alibaba Cloud ECS instances, to show the performance advantage of Arm. G8y instances, powered by the Alibaba Yitian 710 processor based on Armv9, represent Arm. G7 instances, powered by 3rd Generation Intel Xeon Scalable processors, represent x86.

Benchmark setup and results

We used Memtier as the load generator and performance benchmarking tool. It is an open-source, high-throughput benchmarking tool for Memcached. Memtier was deployed on a separate ECS instance.

For the Memcached server, we deployed multiple Memcached processes on each core.
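A server layout of this kind could be scripted roughly as follows. This is only a sketch and not the setup used in this blog: the port range starting at 11211, the use of `taskset` for core pinning, the single worker thread per process, and the 1024 MB memory limit are all assumptions made for illustration. The function only builds the command lines; it does not start anything.

```python
# Sketch: build launch commands for several Memcached processes, one per core.
# Assumptions (not from the blog): ports start at 11211, each process is
# pinned with `taskset`, runs one worker thread, and gets 1024 MB of cache.

def memcached_commands(num_procs=8, base_port=11211, memory_mb=1024):
    """Return one argv list per Memcached process."""
    commands = []
    for core in range(num_procs):
        commands.append([
            "taskset", "-c", str(core),    # pin the process to one CPU core
            "memcached",
            "-d",                          # run as a daemon
            "-p", str(base_port + core),   # TCP port for this process
            "-t", "1",                     # one worker thread
            "-m", str(memory_mb),          # cache memory limit in MB
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in memcached_commands():
        print(" ".join(cmd))
```

Giving each process its own port lets every load-generator client target exactly one server process.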

Figure 1. Memcached benchmarking topology

We tested two ECS instances as the server under test, with the following configurations. The benchmark client used a single G8y.8xlarge instance.

| Processor | ECS type |
| --- | --- |
| Yitian 710 | G8y.2xlarge |
| 3rd Generation Intel Xeon Scalable | G7.2xlarge |

Table 1. Test server configurations

The benchmark tests were performed with the following software versions and test parameters.

| Component name | Version |
| --- | --- |
| Memcached | 1.5.22 |
| GCC | 10.2.1 20200825 (Alibaba 10.2.1-3 2.32) |
| Memtier benchmarking tool | 1.4.0 |
| Operating system | Alibaba Cloud Linux 3.2104 LTS |

| Test config parameter | Value |
| --- | --- |
| Number of Memtier clients | 8 |
| Number of threads | 8 |
| Number of clients per thread | 10 |
| Number of consecutive test runs | 3 |
| Data size (bytes) | 128 |
| Memcached protocol | text |
| Key pattern | random |
| Pipeline | 1, 50, 100 |

We used 8 Memtier clients to generate requests for 8 Memcached processes simultaneously. Each Memtier client created 8 threads with 10 clients per thread, which gave 80 simultaneous connections (sessions) per Memtier client. Pipeline values of 1, 50, and 100 were used in this test. Pipeline values greater than 1 can be used for bulk data transfers to increase the throughput of the application.
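The load described above can be expressed as `memtier_benchmark` command lines. The following is a minimal sketch, not the exact invocation used for these results: the server address `192.0.2.10` and the port range starting at 11211 are hypothetical, and `R:R` is assumed as the way to request random keys for both SET and GET operations. The thread, client, data size, pipeline, and run-count values come from the parameter table.

```python
# Sketch: build memtier_benchmark commands matching the test parameters.
# Hypothetical values (not from the blog): server address and port numbers.

THREADS = 8              # threads per Memtier client
CLIENTS_PER_THREAD = 10  # connections opened by each thread
MEMTIER_CLIENTS = 8      # one Memtier client per Memcached process
PIPELINES = (1, 50, 100)


def connections_per_memtier_client():
    """Simultaneous connections opened by one Memtier client."""
    return THREADS * CLIENTS_PER_THREAD


def memtier_command(host, port, pipeline):
    """Return the argv list for one Memtier client."""
    return [
        "memtier_benchmark",
        f"--server={host}",
        f"--port={port}",
        "--protocol=memcache_text",        # Memcached text protocol
        f"--threads={THREADS}",
        f"--clients={CLIENTS_PER_THREAD}",
        "--data-size=128",                 # object size in bytes
        "--key-pattern=R:R",               # random keys (assumed setting)
        f"--pipeline={pipeline}",
        "--run-count=3",                   # consecutive runs per test
    ]


if __name__ == "__main__":
    print("connections per Memtier client:", connections_per_memtier_client())
    print("connections in total:",
          connections_per_memtier_client() * MEMTIER_CLIENTS)
    for pipeline in PIPELINES:
        for i in range(MEMTIER_CLIENTS):
            # 192.0.2.10 and ports 11211+ are placeholders.
            print(" ".join(memtier_command("192.0.2.10", 11211 + i, pipeline)))
```

With 8 Memtier clients running at once, the server under test sees 8 × 80 = 640 connections in total.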

Enabling XPS (Transmit Packet Steering), RPS (Receive Packet Steering), and RFS (Receive Flow Steering) improved performance on both instances. We observed up to a 41% performance benefit from running Memcached on Yitian 710 based instances compared to equivalent x86-based instances. The results shown in the following table are aggregated over 30 consecutive test runs.
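On Linux, these three features are configured through files under `/sys/class/net/<device>/queues/` and `/proc/sys/net/core/`. The sketch below only lists the values that would be written and changes nothing on the system. The device name `eth0`, the queue count, the CPU masks, and the flow table size are assumptions for illustration; the blog does not state the values that were used.

```python
# Sketch: list the sysfs/procfs writes that enable RPS, RFS, and XPS.
# Assumptions (not from the blog): device eth0, 8 queues, 8 CPUs,
# 32768 flow entries, RPS spread over all CPUs, one CPU per TX queue.

def steering_settings(dev="eth0", num_queues=8, num_cpus=8,
                      flow_entries=32768):
    """Return (path, value) pairs; nothing is written to the system."""
    all_cpus = format((1 << num_cpus) - 1, "x")   # hex bitmask of all CPUs
    base = f"/sys/class/net/{dev}/queues"
    settings = [
        # RFS: size of the global socket flow table
        ("/proc/sys/net/core/rps_sock_flow_entries", str(flow_entries)),
    ]
    for q in range(num_queues):
        one_cpu = format(1 << (q % num_cpus), "x")
        settings += [
            # RPS: CPUs allowed to process packets from this RX queue
            (f"{base}/rx-{q}/rps_cpus", all_cpus),
            # RFS: per-queue share of the flow table
            (f"{base}/rx-{q}/rps_flow_cnt", str(flow_entries // num_queues)),
            # XPS: CPU that transmits on this TX queue
            (f"{base}/tx-{q}/xps_cpus", one_cpu),
        ]
    return settings


if __name__ == "__main__":
    for path, value in steering_settings():
        print(f"echo {value} > {path}")
```

Applying the printed commands requires root privileges, and suitable masks depend on the instance's vCPU and NIC queue layout.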

Let us look at the performance numbers of Memcached on G8y and G7 instances. We compared the throughput (Operations/Sec) values after multiple test runs. 

| Pipeline parameter | G7.2xlarge (Operations/Sec) | G8y.2xlarge (Operations/Sec) | Performance gain (%) |
| --- | --- | --- | --- |
| Pipeline=1 | 1256257.41 | 1482112.07 | 18% |
| Pipeline=50 | 4870840.43 | 6484505.32 | 33% |
| Pipeline=100 | 5241900.43 | 7379739.17 | 41% |

Table 2. Memcached throughput performance results on G8y vs. G7
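The performance gain column can be reproduced from the two throughput columns. The short sketch below assumes the gain is defined as the relative increase of G8y throughput over G7 throughput, which matches the percentages in the table when rounded to whole numbers.

```python
# Sketch: recompute the performance gain from the throughput values in Table 2.
# Assumption: gain = (G8y throughput / G7 throughput - 1) * 100.

RESULTS = {
    # pipeline: (G7 operations/sec, G8y operations/sec)
    1: (1256257.41, 1482112.07),
    50: (4870840.43, 6484505.32),
    100: (5241900.43, 7379739.17),
}


def gain_percent(g7_ops, g8y_ops):
    """Relative throughput gain of G8y over G7, in percent."""
    return (g8y_ops / g7_ops - 1.0) * 100.0


if __name__ == "__main__":
    for pipeline, (g7_ops, g8y_ops) in RESULTS.items():
        print(f"Pipeline={pipeline}: {gain_percent(g7_ops, g8y_ops):.0f}%")
```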

Figure 2. Performance gains for G8y vs. G7 instances

Conclusion

To conclude, Memcached deployed on Yitian 710 based ECS provides up to 41% more throughput compared to equivalent x86-based ECS instances. In addition, G8y instances are priced 20% less than comparable G7 instances. 
