In this blog we explore the performance of an Nginx Reverse Proxy (RP) and API Gateway (APIGW) on AWS Graviton3-based instances. We refer to these collectively as RP/APIGW. We compared AWS Graviton3-based instances to Intel Xeon 'Ice Lake'-based instances and AWS Graviton2-based instances to demonstrate the leadership performance available with AWS Graviton3.
Compared to AWS Graviton2-based instances, AWS Graviton3-based instances deliver 52% to 71% higher performance, with 5% to 18% higher performance-per-dollar, depending on instance size.
Compared to Intel Xeon 'Ice Lake'-based instances, AWS Graviton3-based instances deliver 41% to 91% higher performance, with 39% to 71% higher performance-per-dollar, depending on instance size.
AWS Graviton2-based instances also offer similar performance with up to 59% higher performance-per-dollar compared to Intel Xeon 'Ice Lake'-based instances.
In the past, we explored Nginx performance on the AWS Graviton and AWS Graviton2-based instances. This blog follows much of the same tuning and testing methodology as those previous blogs. That said, the latest Nginx tuning methodology can be found in the Learn how to Tune Nginx learning path at learn.arm.com. This learning path will be kept up to date and can serve as a starting point for tuning your Nginx deployment.
Below is the RP/APIGW test setup.
On the left is the load generator instance, which simulates client connections using wrk2. The load generator makes as many HTTPS requests as possible for a 1kb file to measure maximum throughput. We chose a 1kb request size because small requests are common and because they make the test CPU-bound. Testing with larger requests is valid, but these tend to make the test network-bound. Here, we are interested in testing CPU performance, not cloud networking performance. The load generator instance was selected to be large enough that it isn't a bottleneck. A safe choice is an instance with at least 64 vCPUs and network bandwidth of 30Gbps or more.
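To illustrate why payload size decides whether a test is CPU-bound or network-bound, the rough sketch below computes how many requests per second it would take for payload bytes alone to fill a link. It uses the 30Gbps figure mentioned above as an illustrative link speed, reads "1kb" as 1024 bytes, ignores HTTP headers and TLS framing, and uses a 1 MiB payload purely as a hypothetical contrast. The function name is ours and not part of any tool used in this blog.

```python
def requests_per_second_to_fill_link(link_gbps: float, payload_bytes: int) -> float:
    """Requests per second at which payload bytes alone would saturate the link."""
    link_bits_per_second = link_gbps * 1e9
    return link_bits_per_second / (payload_bytes * 8)


# Assumption: "1kb" is read as 1024 bytes; headers and TLS overhead are ignored.
small = requests_per_second_to_fill_link(30, 1024)
# Hypothetical larger payload (1 MiB), included only for contrast.
large = requests_per_second_to_fill_link(30, 1024 * 1024)

print(f"1 KiB payloads: about {small:,.0f} requests/s to fill 30 Gbps")
print(f"1 MiB payloads: about {large:,.0f} requests/s to fill 30 Gbps")
```

With small payloads the link only fills at millions of requests per second, so the CPU is likely to be the limit first. With large payloads a few thousand requests per second are enough to fill the same link.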
The instance in the center is the RP/APIGW, which is the instance under test. Its configuration is documented in the Learn how to Tune Nginx learning path. The instance types tested were the C7gn (based on AWS Graviton3), C6gn (based on AWS Graviton2), and C6in (based on Intel Xeon 'Ice Lake'). The sizes tested were L (2vCPU), XL (4vCPU), 2XL (8vCPU), and 4XL (16vCPU). Since this instance is an RP/APIGW, it does not host the 1kb file that the load generator requests. That file is located on the upstream servers, and the RP/APIGW simply forwards the requests to them.
On the right are the upstream file servers that contain the 1kb file. These servers are configured as documented in the Learn how to Tune Nginx learning path. There need to be multiple upstream file servers so that this part of the test setup is not a bottleneck. We found that 4-6 upstream servers tend to be sufficient. However, with every new generation of cloud instance, the appropriate number and size of upstream instances should be reevaluated.
Nginx-Plus supports the use of JSON Web Tokens (JWT). Two use cases for JWT are user authentication and client-side data storage. We tested the authentication use case with the RS256 JWT algorithm. We leave it up to the reader to learn about JWT use cases and algorithms. Something to note is that JWT authentication on Nginx relies on OpenSSL.
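For readers who want a feel for what verifying an RS256 token involves, here is a minimal pure-Python sketch of the steps: hash the signing input with SHA-256, run an RSA public-key modular exponentiation on the signature, and compare the result with the expected PKCS#1 v1.5 padding. This is not how Nginx or OpenSSL implement it. The helper names (`toy_keypair`, `sign_rs256`, `verify_rs256`) are hypothetical, and the small key with a fixed seed exists only to keep the demo fast and reproducible.

```python
import base64
import hashlib
import hmac
import json
import random

# DER prefix of the SHA-256 DigestInfo structure (RFC 8017, section 9.2).
SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def emsa_pkcs1_v15(message: bytes, em_len: int) -> bytes:
    """Encode SHA-256(message) with PKCS#1 v1.5 signature padding."""
    t = SHA256_DIGEST_INFO + hashlib.sha256(message).digest()
    return b"\x00\x01" + b"\xff" * (em_len - len(t) - 3) + b"\x00" + t


def is_probable_prime(n: int, rng: random.Random, rounds: int = 20) -> bool:
    """Miller-Rabin primality test preceded by small-prime trial division."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def random_prime(bits: int, rng: random.Random) -> int:
    while True:
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate, rng):
            return candidate


def toy_keypair(rng: random.Random, bits: int = 1024, e: int = 65537):
    """Return (n, e, d). Demo only: too small and too predictable for real use."""
    while True:
        p, q = random_prime(bits // 2, rng), random_prime(bits // 2, rng)
        phi = (p - 1) * (q - 1)
        if p != q and phi % e != 0:
            return p * q, e, pow(e, -1, phi)


def sign_rs256(header: dict, claims: dict, n: int, d: int) -> str:
    signing_input = ".".join(
        b64url_encode(json.dumps(part).encode()) for part in (header, claims)
    )
    k = (n.bit_length() + 7) // 8
    em = int.from_bytes(emsa_pkcs1_v15(signing_input.encode("ascii"), k), "big")
    return signing_input + "." + b64url_encode(pow(em, d, n).to_bytes(k, "big"))


def verify_rs256(token: str, n: int, e: int) -> bool:
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signature = b64url_decode(signature_b64)
    except ValueError:
        return False
    k = (n.bit_length() + 7) // 8
    s = int.from_bytes(signature, "big")
    if len(signature) != k or s >= n:
        return False
    recovered = pow(s, e, n).to_bytes(k, "big")  # the per-request RSA operation
    expected = emsa_pkcs1_v15(f"{header_b64}.{payload_b64}".encode("ascii"), k)
    return hmac.compare_digest(recovered, expected)


rng = random.Random(2023)  # fixed seed for a reproducible toy key; never do this for real keys
n, e, d = toy_keypair(rng)
token = sign_rs256({"alg": "RS256", "typ": "JWT"}, {"sub": "demo-client"}, n, d)
head, _, sig = token.split(".")
forged = ".".join([head, b64url_encode(b'{"sub": "someone-else"}'), sig])
print(verify_rs256(token, n, e))   # True
print(verify_rs256(forged, n, e))  # False
```

The point of the sketch is that every request carrying a token triggers a hash plus a big-integer exponentiation before the request can be forwarded, which is work the proxy does not do when JWT is disabled.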
The C6gn, C7gn, and C6in instances used in this blog are all network optimized instances.
Amazon EC2 C6gn instances are powered by AWS Graviton2 processors featuring up to 100 Gbps network bandwidth.
Amazon EC2 C6in instances are powered by 3rd Generation Intel Xeon Scalable processors with an all-core turbo frequency of up to 3.5 GHz. They are the first x86-based Amazon EC2 compute-optimized instances offering up to 200 Gbps network bandwidth. C6in instances deliver up to 2x more network bandwidth and up to 2x higher packet performance than comparable C5n instances.
Amazon EC2 C7gn instances, launched in June 2023, are powered by the latest AWS Graviton3E processors and feature the new 5th generation AWS Nitro Card. C7gn instances offer up to 200Gbps network bandwidth and up to 3x higher packet-processing performance per vCPU versus comparable current generation x86-based network optimized instances.
First, we show the maximum requests per second (RPS) achieved by each instance type and size without JWT verification. We include results for AWS Graviton2 (C6gn) below to give a sense of the generational improvement from AWS Graviton2 to AWS Graviton3 on a basic Nginx setup (i.e., no JWT).
Above, we see that the C7gn (Graviton3) achieves the highest throughput across all the sizes tested. When compared to the C6gn (Graviton2), the C7gn achieves about 52% to 68% higher throughput. When compared to the C6in (Intel Xeon 'Ice Lake'), the C7gn achieves about 54% to 88% higher throughput. Next, we factor the cost of each instance type by dividing the maximum RPS measured by the cost of each instance.
In the graph above, the C7gn (Graviton3) achieves the highest RPS/$ of all the instances tested. When compared to the C6gn (Graviton2), the C7gn has about a 6% to 17% higher RPS/$ ratio. When compared to the C6in (Intel Xeon 'Ice Lake'), the C7gn has about a 39% to 71% higher RPS/$ ratio. It's also worth noting that the C6gn has a cost advantage of about 24% over the C6in. This gives the C6gn about a 20% to 58% RPS/$ ratio advantage over the C6in.
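The RPS/$ comparisons above come down to two small calculations: divide throughput by price, then express one ratio relative to another. The sketch below shows that arithmetic. The throughput and price values are placeholders, not our measurements or real AWS prices, and the function names are ours. The example case models an instance with the same throughput as the baseline at a 24% lower price.

```python
def rps_per_dollar(max_rps: float, hourly_price_usd: float) -> float:
    """Throughput normalized by instance cost."""
    return max_rps / hourly_price_usd


def percent_higher(candidate: float, baseline: float) -> float:
    """How much higher the candidate is than the baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


# Placeholder inputs, NOT measured results or real prices.
baseline_rps, baseline_price = 100_000.0, 1.00
candidate_rps, candidate_price = 100_000.0, 0.76  # same throughput, 24% cheaper

uplift = percent_higher(
    rps_per_dollar(candidate_rps, candidate_price),
    rps_per_dollar(baseline_rps, baseline_price),
)
print(f"{uplift:.1f}% higher RPS/$")  # 31.6% higher RPS/$
```

Note that a 24% lower price at equal throughput yields roughly a 32% RPS/$ advantage rather than 24%, because the price sits in the denominator. Any throughput difference then scales that figure up or down.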
Below are the maximum RPS graphs when JWT RS256 authentication is enabled.
Comparing these results to the case where JWT RS256 is not used, we see that enabling JWT RS256 results in roughly a 90% decrease in maximum RPS across all the instance types and sizes tested. This is because the JWT token must be attached to every HTTPS request made to the RP/APIGW. When the token is received, the RP/APIGW verifies the token signature, which consumes a large number of CPU cycles on each request and therefore reduces maximum throughput. That said, the C7gn (Graviton3) is the top performer across all instances and sizes tested. When compared to the C6in (Intel Xeon 'Ice Lake'), the C7gn achieves about 47% to 52% higher throughput. Below is the RPS/$ graph based on the RS256 results shown above.
Similar to when JWT RS256 wasn't enabled, the C7gn (Graviton3) has the best RPS/$ ratio. When compared to the C6in (Intel Xeon 'Ice Lake'), the C7gn has about a 34% to 39% higher RPS/$ ratio.
The difference between the API Gateway case and the Reverse Proxy case is that the API Gateway has an extra step: the URI requested by a client is matched against a RegEx and then rewritten before the request is forwarded to the upstream servers. The particular RegEx used for this test is documented in the Learn how to Tune Nginx learning path. We leave understanding the use cases of an API Gateway to the reader.
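As a rough illustration of the match-and-rewrite step, the sketch below applies a RegEx to an incoming URI and rewrites it before forwarding. The pattern, the replacement, and the paths are hypothetical and are not the rule from the learning path. In Nginx itself this kind of logic would normally be expressed in configuration with the `rewrite` directive rather than in Python.

```python
import re

# Hypothetical rule for illustration only; not the RegEx used in our tests.
API_PATTERN = re.compile(r"^/api_old/(.*)$")
API_REPLACEMENT = r"/api_new/\1"


def rewrite_uri(uri: str) -> str:
    """Return the URI to forward upstream; URIs that do not match pass through unchanged."""
    return API_PATTERN.sub(API_REPLACEMENT, uri, count=1)


print(rewrite_uri("/api_old/1kb"))  # /api_new/1kb
print(rewrite_uri("/static/1kb"))   # /static/1kb
```

A single anchored pattern with one capture group is cheap to evaluate. More elaborate rule sets, with many patterns tried in sequence, would add more work per request.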
Below is a graph showing the maximum RPS measured on the API Gateway without JWT RS256 enabled. We include the Graviton2 (C6gn) again for reference.
Comparing these results to the Reverse Proxy case, we see that the impact of the API Gateway RegEx match and URI rewrite is around 4% to 5% across all instances tested. This makes sense because the RegEx is simple. However, keep in mind that it is possible for an API Gateway configuration to be more complex than what was tested here. Thus, it is possible for an API Gateway to have a more significant impact on maximum RPS than what is shown in this data.
Aside from the results being a few percentage points lower, the results look similar to the Reverse Proxy case. Like in the Reverse Proxy case, we see that the C7gn outperforms all other instances across all sizes. When compared to the C6gn (Graviton2), the C7gn achieves about 52% to 71% higher throughput. When compared to the C6in (Intel Xeon 'Ice Lake'), the C7gn achieves about 55% to 91% higher throughput. Below is the API Gateway graph when we factor in the cost of each instance.
Above, the C7gn (Graviton3) achieves the highest RPS/$ of all the instances tested. When compared to the C6gn (Graviton2), the C7gn has about a 5% to 18% higher RPS/$ ratio. When compared to the C6in (Intel Xeon 'Ice Lake'), the C7gn has about a 41% to 69% higher RPS/$ ratio. As with the Reverse Proxy, the cost advantage of the C6gn gives it about a 19% to 59% RPS/$ ratio advantage over the C6in.
Below are the maximum RPS graphs for the API Gateway when JWT RS256 is enabled.
Again, the C7gn outperforms all other instances across all sizes. When compared to the C6in, the C7gn achieves about 48% to 52% higher throughput. Below is the API Gateway graph for RS256 when we factor in the cost of each instance.
As expected, the C7gn is the top performer. When compared to the C6in, the C7gn achieves about a 34% to 38% higher RPS/$ ratio.
In terms of raw performance, AWS Graviton3-based instances outperform all other instances we tested. We also saw that enabling JWT authentication significantly reduces the maximum RPS of all instances. However, the relative performance of the instances is similar to when JWT authentication is not used. Last, the API Gateway scenario we tested doesn't significantly impact the maximum RPS, although we should keep in mind that the API Gateway configuration tested here was rather simple.
If you want to understand how to tune an Nginx deployment, we suggest visiting the Nginx tuning learning path linked below. Other tuning learning paths are also available, such as those for MySQL and PostgreSQL.
Review Nginx Tuning Learning Path