Improve Apache httpd performance up to 40% by deploying on Alibaba Cloud Yitian 710 instances
In this blog, we look at the advantages of running Apache httpd on Arm-based Alibaba Cloud instances powered by the Yitian 710 CPU, compared to x86-based instances.
By Martin Ma

Introduction
Apache httpd is one of the most popular web servers. A web server is a software program that usually runs in the background as a process and plays the role of the server in a client-server model, using the HTTP or HTTPS network protocols.
In this blog, we compare the Apache httpd throughput on two Alibaba Cloud Elastic Compute Service (ECS) instances. These instances are ECS g8y (powered by Yitian 710 processors based on Armv9 architecture) and g7 (powered by 3rd Generation Intel Xeon Scalable processors). Our findings demonstrate that httpd deployments on g8y instances can achieve a throughput advantage of up to 40% over g7 instances. The following sections cover the details of our testing methodology and results.
Performance benchmark setup and results
The benchmark setup consists of one instance acting as the load generator and one instance under test. We use wrk as the benchmark tool to generate the load and collect throughput figures, which we use to compare the performance of g8y and g7 instances.
The following table shows the configuration of the tested instances:
| Instance type | Instance size (vCPU) | Memory (GiB) | Storage |
| g8y | 2xlarge (8) | 32 | 40GB (ESSD PL0 2280 IOPS) |
| g7 | 2xlarge (8) | 32 | 40GB (ESSD PL0 2280 IOPS) |
The software versions and test parameters are as follows:
| Software | Version |
| Apache httpd | 2.4.37 |
| Operating system | Alibaba Cloud Linux 3.2104 LTS |
| Kernel | 5.10.134-12.al8.aarch64 (g8y), 5.10.134-12.al8.x86 (g7) |
The default httpd Multi-Processing Module (MPM) is event. It is designed to serve more requests simultaneously by passing some processing work to the listener threads, which frees the worker threads to serve new requests.
The following table shows the httpd configuration that was tested:
| Category | Parameter | Value |
| MPM event parameters | StartServers | 8 |
| | ServerLimit | 100 |
| | ThreadsPerChild | 125 |
| | MaxRequestWorkers | 2000 |
| | ThreadLimit | 2000 |
| | MaxSpareThreads | 1000 |
| Persistent connection parameters | KeepAlive | On |
| | MaxKeepAliveRequests | 0 |
| | KeepAliveTimeout | 50 |
| Disabled submodules | brotli, lua, http2, http2-proxy | |
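The MPM event values in this table constrain one another. According to the httpd documentation, ThreadLimit caps ThreadsPerChild, and MaxRequestWorkers cannot exceed ServerLimit multiplied by ThreadsPerChild. The following Python sketch checks those two relationships for the tested values and renders them as configuration lines. The function names and the layout of the rendered fragment are illustrative assumptions, not the authors' actual configuration file.

```python
# Tested values copied from the configuration table above.
MPM_EVENT = {
    "StartServers": 8,
    "ServerLimit": 100,
    "ThreadsPerChild": 125,
    "MaxRequestWorkers": 2000,
    "ThreadLimit": 2000,
    "MaxSpareThreads": 1000,
}
KEEPALIVE = {
    "KeepAlive": "On",
    "MaxKeepAliveRequests": 0,
    "KeepAliveTimeout": 50,
}


def check_mpm_event(cfg):
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    if cfg["ThreadsPerChild"] > cfg["ThreadLimit"]:
        problems.append("ThreadsPerChild exceeds ThreadLimit")
    if cfg["MaxRequestWorkers"] > cfg["ServerLimit"] * cfg["ThreadsPerChild"]:
        problems.append("MaxRequestWorkers exceeds ServerLimit * ThreadsPerChild")
    return problems


def max_child_processes(cfg):
    """Child processes needed to provide MaxRequestWorkers worker threads."""
    return cfg["MaxRequestWorkers"] // cfg["ThreadsPerChild"]


def render_fragment(*sections):
    """Render directive/value pairs as one 'Directive value' line each."""
    lines = []
    for section in sections:
        for name, value in section.items():
            lines.append(f"{name} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print("problems:", check_mpm_event(MPM_EVENT))
    print("child processes at full load:", max_child_processes(MPM_EVENT))
    print(render_fragment(MPM_EVENT, KEEPALIVE))
```

With the tested values, 2000 worker threads at 125 threads per child correspond to 16 child processes, well below the ServerLimit of 100.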
To achieve better performance, we set CPU affinity for httpd processes and threads as in the following diagram.
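The diagram is not reproduced in this text, so the exact layout the authors used is not known. Purely as an illustration of the mechanism, the following Python sketch pins a list of process IDs to CPUs in round-robin order using `os.sched_setaffinity`, which is available on Linux. The round-robin layout and the function names are assumptions, not the authors' configuration.

```python
import os


def round_robin_plan(pids, cpus):
    """Map each PID to one CPU, cycling through the CPU list (assumed layout)."""
    cpus = sorted(cpus)
    return {pid: cpus[i % len(cpus)] for i, pid in enumerate(pids)}


def apply_plan(plan):
    """Pin each process to its CPU. Linux only; needs permission over the PIDs."""
    for pid, cpu in plan.items():
        os.sched_setaffinity(pid, {cpu})


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    # Demonstrate on the current process only, using CPUs it may already run on.
    allowed = os.sched_getaffinity(0)
    plan = round_robin_plan([os.getpid()], allowed)
    apply_plan(plan)
    print(os.sched_getaffinity(0))
    os.sched_setaffinity(0, allowed)  # restore the original mask
```

On Linux the call applies to the thread whose ID is passed, so worker threads that are already running would need to be pinned individually by thread ID (for example, the IDs listed under /proc/<pid>/task).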

The benchmark tool (wrk) runs on a single g8y.4xlarge instance. Each test uses 32 threads that send requests over 1000 keep-alive HTTP or HTTPS connections for 30 seconds. The following tables show the wrk version, the test parameters, and the test cases:
| Parameter | Value |
| wrk version | 4.0.2 |
| Threads | 32 |
| Connections | 1000 |
| Duration | 30 seconds |
| Test Case | Command |
| HTTP persistent connection | wrk -t 32 -c 1000 -d 30 --latency http://$serverIP |
| HTTPS persistent connection | wrk -t 32 -c 1000 -d 30 --latency https://$serverIP:443 |
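A run like the ones in the table can be scripted so that the throughput is collected automatically. The following Python sketch invokes wrk with the parameters listed above, reads the `Requests/sec` line from its summary output, and averages several runs after discarding warmup runs. The defaults of one warmup and ten measured runs match how the results in this blog are averaged. The function names and the `run_once` hook are assumptions made for illustration; this is not the authors' harness.

```python
import re
import statistics
import subprocess

# wrk prints a summary line of the form "Requests/sec: <number>".
REQUESTS_RE = re.compile(r"^Requests/sec:\s+([0-9.]+)", re.MULTILINE)


def parse_requests_per_sec(wrk_output):
    """Extract the 'Requests/sec' figure from wrk's text summary."""
    match = REQUESTS_RE.search(wrk_output)
    if match is None:
        raise ValueError("no 'Requests/sec' line found in wrk output")
    return float(match.group(1))


def run_wrk(url, threads=32, connections=1000, duration=30):
    """Run one wrk test and return its requests/sec. Requires wrk on PATH."""
    cmd = ["wrk", "-t", str(threads), "-c", str(connections),
           "-d", str(duration), "--latency", url]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_requests_per_sec(result.stdout)


def benchmark(url, runs=10, warmups=1, run_once=run_wrk):
    """Discard the warmup runs, then return the mean of the measured runs."""
    for _ in range(warmups):
        run_once(url)
    return statistics.mean(run_once(url) for _ in range(runs))
```

Calling `benchmark("http://" + server_ip)` would then return one averaged figure per test case, where `server_ip` is a placeholder for the address of the instance under test. Passing a different `run_once` function makes it possible to exercise the averaging logic without a live server.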
Test results
The throughput results are the average of 10 consecutive tests after one warmup test. With logging disabled, httpd on g8y.2xlarge instances delivers 39.6% higher throughput than on g7.2xlarge instances for HTTP persistent connections and 26.7% higher throughput for HTTPS persistent connections.
The following table shows the throughput comparison (logging disabled) between g8y.2xlarge and g7.2xlarge.
| Test Case | g8y.2xlarge (Requests/Sec) | g7.2xlarge (Requests/Sec) | Performance gain |
| HTTP persistent connection | 243138.93 | 174186.11 | 39.6% |
| HTTPS persistent connection | 172087.59 | 135807.16 | 26.7% |
Table 1: Throughput results (logging disabled) on g8y and g7

Figure 1. Throughput (logging disabled) performance gains for g8y vs. g7
To help manage a web server effectively, httpd provides logging capabilities that give feedback about the activity and performance of the server and about any problems that may be occurring. To achieve better performance when logging is enabled, we set the parameter “BufferedLogs On”, which buffers log entries in memory before writing them to disk. With logging enabled, httpd on g8y.2xlarge instances delivers 40.0% higher throughput than on g7.2xlarge instances for HTTP persistent connections and 27.1% higher throughput for HTTPS persistent connections.
The following table shows the throughput comparison (logging enabled) between g8y.2xlarge and g7.2xlarge.
| Test Case | g8y.2xlarge (Requests/Sec) | g7.2xlarge (Requests/Sec) | Performance gain |
| HTTP persistent connection | 234099.50 | 167237.14 | 40.0% |
| HTTPS persistent connection | 163650.42 | 128793.82 | 27.1% |
Table 2: Throughput results (logging enabled) on g8y and g7

Figure 2. Throughput (logging enabled) performance gains for g8y vs. g7
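The performance gain columns in Tables 1 and 2 follow directly from the raw throughput figures: each gain is the g8y result divided by the g7 result, minus one. The following Python sketch recomputes them from the published numbers. The function and variable names are illustrative.

```python
def gain_percent(new, baseline):
    """Relative throughput gain of `new` over `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0


# (g8y.2xlarge, g7.2xlarge) requests/sec, copied from Tables 1 and 2.
RESULTS = {
    ("logging disabled", "HTTP"): (243138.93, 174186.11),
    ("logging disabled", "HTTPS"): (172087.59, 135807.16),
    ("logging enabled", "HTTP"): (234099.50, 167237.14),
    ("logging enabled", "HTTPS"): (163650.42, 128793.82),
}

if __name__ == "__main__":
    for (logging, case), (g8y, g7) in RESULTS.items():
        print(f"{logging:17s} {case:5s} {gain_percent(g8y, g7):.1f}%")
```

Rounded to one decimal place, the results match the 39.6%, 26.7%, 40.0%, and 27.1% gains reported in the tables.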
Conclusion
Compared with deploying on 3rd Generation Intel Xeon Scalable processor-based instances, deploying Apache httpd on Yitian 710-based instances offers several benefits:
- A 40% throughput performance advantage for HTTP persistent connections
- A 27% throughput performance advantage for HTTPS persistent connections
- A 20% cost benefit
Please visit this page for details on how to migrate existing applications to Yitian 710-based instances. For any queries related to your software workloads running on Arm platforms, feel free to reach out to us at sw-ecosystem@arm.com.
Re-use is permitted for informational and non-commercial or personal use only.
