
Optimize your Elasticsearch deployment with Arm-based Amazon EC2 M6g instances

Pranay Bakre
September 25, 2020
10 minute read time.

Every day, approximately 2.5 quintillion bytes of data are created globally. A quintillion is a 1 with 18 zeroes after it. Let that sink in for a moment. While a lot of this data may be cat videos on the Internet, a considerable amount is still produced as text and other traditional readable content. Popular search engines like Google have simplified parsing this public data deluge, but organizations both small and large still rely on search-based tools to extract insights from the day-to-day content generated inside the company firewall, or from the custom text and datasets that serve their business needs.

Elasticsearch - the most popular enterprise search engine

Elasticsearch is a highly scalable open-source text-search and analytics engine based on the Apache Lucene library. Its primary use cases include full-text search (most notably in e-commerce applications), document storage with cataloging, and time-series event and metrics data.

Recently, the team at elastic.co added Arm64 binaries for Elasticsearch, allowing users to deploy Elasticsearch on Arm Neoverse-powered AWS Graviton2 instances. In this blog, we show an Elasticsearch analytics use case, Twitter data analysis, on a cluster of AWS Graviton2-based Amazon EC2 M6g instances. In addition, we conducted performance benchmarking with Rally, comparing Arm-based Amazon EC2 M6g instances to x86-based M5 instances to showcase the benefit of using these instances for an Elasticsearch deployment.

For analytics workloads, M6g instances provide up to 25% better throughput and lower latency than x86-based M5 instances across varying types of data analytics, along with a 20% cost benefit. These are significant cost and performance gains for customers, since deploying Elasticsearch on Arm is seamless and requires no additional investment in time.

Use Case: Analyze Twitter data

In this use case, we gather tweets and related data based on keywords and insert that data into Elasticsearch, creating indexes and shards as the data is inserted. A Python script interacts with the Twitter streaming API and fetches live tweets matching the keywords we specify. This data is then inserted into an Elasticsearch cluster running on AWS Graviton2-based Amazon EC2 M6g instances. Once the script completes, search queries are executed and the data is analyzed.

Figure 1. Architecture of use case

The entire flow of the use case is captured in the following video:

For more details on how to set up this use case, refer to the Configurations section towards the end of this blog.

Performance benchmarking of Elasticsearch on Amazon EC2 M6g and Amazon EC2 M5 instances 

Now, let us look at Elasticsearch performance metrics comparing AWS Graviton2-based M6g instances with x86-based M5 instances. For benchmarking Elasticsearch, we used Rally, Elastic's benchmarking tool. We executed the benchmarking tests on two instances with the following specifications:

Instance Type | Instance Size | Disk Size
M6g (arm64)   | m6g.xlarge    | 50 GB
M5 (x86_64)   | m5.xlarge     | 50 GB

Table 1. EC2 instance types and sizes

The following tracks were tested using Rally to measure Elasticsearch performance across a variety of datasets:

  • http_logs: This track is based on the web server logs from the 1998 Football World Cup. It was chosen to evaluate performance on web-server log data
  • nested: This track is based on a data dump of StackOverflow posts. It was chosen to evaluate the performance of nested documents
  • pmc: This track contains a filtered set of data retrieved from PMC (a full-text archive of medical journals). It was chosen to evaluate full-text search performance
  • geonames: This track is based on the GeoNames dataset (a geographical database that covers all countries). It was chosen to evaluate performance on structured data
  • geopoint: This track is based on a subset of data from Planet.osm (OpenStreetMap data). It was chosen to evaluate the performance of geo queries

For each of the tracks described above, the following metrics are taken from Rally's benchmark report:

  • Throughput - the number of operations Elasticsearch can perform within a certain time period, usually per second
  • Latency - the time between submission of a request and receipt of the complete response, including the time the request spends waiting until Elasticsearch is ready to service it
  • Service time - the time between the start of request processing and receipt of the complete response; it is easily confused with latency but excludes waiting time (see the worked example after this list)
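To make the distinction concrete: latency is, approximately, waiting time plus service time. For example, in the interactive results in Table 4 below, the http_logs scroll task on m5.xlarge shows a latency of 491.124 ms against a service time of 489.79 ms, implying only about 1.3 ms spent waiting before Elasticsearch started processing each request.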

In Rally, each track consists of different tasks that are performed while benchmarking Elasticsearch. We observed two major types of operations (an example Rally invocation follows this list):

  • Batch-style operations (bulk size): a task such as 'index-append' is a batch-style operation, where the goal is to achieve maximum throughput and finish the operation as early as possible
  • Interactive operations: operations like 'scroll' (full-text search, document search, and so on) are search-based operations where a target throughput is defined before running the benchmarks; the instances should sustain that target throughput to produce realistic results
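For reference, a single track can be run against an existing cluster with a command along the following lines. This is a sketch rather than the exact invocation we used: the flags shown exist in Rally, but the precise syntax can vary by Rally version, and <es-node-IP> is a placeholder for one of the cluster nodes:

esrally --track=http_logs --pipeline=benchmark-only --target-hosts=<es-node-IP>:9200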

The table below shows a comparative analysis of the metrics observed during the 'index-append' task. The three tracks below use larger datasets (20-30 GB), comprising many log files and documents. In this table we see 15%-25% better performance from M6g instances over the equivalent M5 instances.

Track Name | Task Name    | Metric                        | m5.xlarge      | m6g.xlarge     | Performance Improvement
http_logs  | index_append | Throughput (higher is better) | 76596.6 docs/s | 86088.8 docs/s | 12.4%
http_logs  | index_append | Latency (lower is better)     | 17338.5 ms     | 12937 ms       | 25.3%
http_logs  | index_append | Service time                  | 17338.5 ms     | 12937 ms       | 25.3%
pmc        | index_append | Throughput                    | 637.61 docs/s  | 733.94 docs/s  | 15%
pmc        | index_append | Latency                       | 29331.3 ms     | 21706.9 ms     | 26%
pmc        | index_append | Service time                  | 29331.3 ms     | 21706.9 ms     | 26%
nested     | index_append | Throughput                    | 26568.2 docs/s | 33555.6 docs/s | 26.2%
nested     | index_append | Latency                       | 11311 ms       | 7794.68 ms     | 31%
nested     | index_append | Service time                  | 11311 ms       | 7794.68 ms     | 31%

Table 2. Single node Elasticsearch instance – Performance metrics for batch-style operations 


The two tracks covered below exercise geo queries and structured data, with smaller datasets (2-3 GB). In this case, we observe 6%-15% better performance from M6g instances compared to M5 instances.

Track Name | Task Name    | Metric                        | m5.xlarge      | m6g.xlarge     | Performance Improvement
geonames   | index_append | Throughput (higher is better) | 33328.5 docs/s | 35887.6 docs/s | 7.6%
geonames   | index_append | Latency (lower is better)     | 12070.2 ms     | 11222.1 ms     | 7%
geonames   | index_append | Service time                  | 12070.2 ms     | 11222.1 ms     | 7%
geopoint   | index_append | Throughput                    | 128434 docs/s  | 136216 docs/s  | 6%
geopoint   | index_append | Latency                       | 11484.3 ms     | 9707.04 ms     | 15%
geopoint   | index_append | Service time                  | 11484.3 ms     | 9707.04 ms     | 15%

Table 3. Single node Elasticsearch instance - Performance metrics for batch-style operations 

The following table shows a comparative analysis of metrics during interactive tasks such as scroll. As explained above, in such cases a target throughput is defined, and lower latency and service time indicate more stable performance from the instance.

Track Name | Task Name                  | Metric       | m5.xlarge     | m6g.xlarge    | Performance Improvement
http_logs  | scroll                     | Throughput   | 25.14 pages/s | 25.16 pages/s | NA
http_logs  | scroll                     | Latency      | 491.124 ms    | 381.334 ms    | 22.3%
http_logs  | scroll                     | Service time | 489.79 ms     | 380.286 ms    | 22.3%
nested     | randomized-nested-queries  | Throughput   | 20.01 ops/s   | 20.01 ops/s   | NA
nested     | randomized-nested-queries  | Latency      | 125.935 ms    | 118.873 ms    | 5.6%
nested     | randomized-nested-queries  | Service time | 124.342 ms    | 118.069 ms    | 5%
pmc        | scroll                     | Throughput   | 12.7 pages/s  | 12.7 pages/s  | NA
pmc        | scroll                     | Latency      | 442.297 ms    | 328.392 ms    | 25.7%
pmc        | scroll                     | Service time | 440.01 ms     | 325.641 ms    | 25.9%
geonames   | scroll                     | Throughput   | 20.06 pages/s | 20.07 pages/s | NA
geonames   | scroll                     | Latency      | 546.091 ms    | 352.821 ms    | 35.3%
geonames   | scroll                     | Service time | 544.829 ms    | 350.713 ms    | 35.6%
geopoint   | polygon                    | Throughput   | 2.01 ops/s    | 2.01 ops/s    | NA
geopoint   | polygon                    | Latency      | 112.135 ms    | 97.273 ms     | 13.2%
geopoint   | polygon                    | Service time | 111.216 ms    | 95.206 ms     | 14.3%

Table 4. Single Node Elasticsearch instance - Performance metrics for interactive operations 

Additionally, the following table shows the 'index-append' performance metrics for a three-node Elasticsearch cluster of EC2 instances.

Track Name | Task Name    | Metric                        | m5.xlarge     | m6g.xlarge    | Performance Improvement
http_logs  | index_append | Throughput (higher is better) | 159995 docs/s | 186635 docs/s | 16.6%
http_logs  | index_append | Latency (lower is better)     | 13529 ms      | 8941.91 ms    | 33.9%
http_logs  | index_append | Service time                  | 13529 ms      | 8941.91 ms    | 33.9%
nested     | index_append | Throughput                    | 32085 docs/s  | 39661 docs/s  | 23.6%
nested     | index_append | Latency                       | 9532.52 ms    | 7795.97 ms    | 18.2%
nested     | index_append | Service time                  | 9532.52 ms    | 7795.97 ms    | 18.2%

Table 5. Three (3) Node Elasticsearch Cluster - Performance metrics for batch-style operations 

The following table shows a comparative analysis of interactive operations for a three-node Elasticsearch cluster:

Track Name | Task Name                  | Metric       | m5.xlarge     | m6g.xlarge    | Performance Improvement
http_logs  | scroll                     | Throughput   | 25.16 pages/s | 25.18 pages/s | NA
http_logs  | scroll                     | Latency      | 415.282 ms    | 311.113 ms    | 25.08%
http_logs  | scroll                     | Service time | 413.767 ms    | 310.043 ms    | 25.06%
nested     | randomized-nested-queries  | Throughput   | 20.01 ops/s   | 20.01 ops/s   | NA
nested     | randomized-nested-queries  | Latency      | 98.58 ms      | 93.46 ms      | 5.1%
nested     | randomized-nested-queries  | Service time | 97.45 ms      | 93.46 ms      | 4%

Table 6. Three (3) Node Elasticsearch Cluster - Performance metrics for interactive operations 

Summary

To conclude, Elasticsearch serves a wide variety of use cases, and AWS Graviton2 provides better performance along with cost benefits. Arm-based M6g instances deliver up to 25% better throughput and latency compared to x86-based M5 instances across varying types of data analytics, and they also provide a 20% cost benefit.

For more information on the software ecosystem on AWS Graviton2, please visit the AWS sessions at Arm DevSummit, and for questions, reach us here.

Register for Arm DevSummit

Configurations:

These are the performance-related settings we updated to achieve the results described previously:

1. Change the default JVM heap size to 50% of the memory of each instance.

sudo vi /etc/elasticsearch/jvm.options 

-Xms8g (8 GB is roughly 50% of the 16 GiB on an xlarge instance)

-Xmx8g

2. Turn off memory swap and make sure Elasticsearch is the only service running on the instance.

sudo swapoff -a (run on each instance; note that this disables swap only until the next reboot)

3. If turning off memory swap is not possible for some reason, edit the elasticsearch.yml file and lock the process memory instead. (In Elasticsearch 7.x the setting is bootstrap.memory_lock; bootstrap.mlockall is the older name of this setting.)

bootstrap.memory_lock: true

sudo vi /etc/default/elasticsearch

MAX_LOCKED_MEMORY=unlimited
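On Ubuntu 20.04 the Elasticsearch package runs under systemd, which does not always honor the limit set in /etc/default/elasticsearch. If memory locking still fails to take effect, a systemd override along the following lines (a general suggestion, not a step from our original configuration) raises the lock limit:

sudo systemctl edit elasticsearch

[Service]
LimitMEMLOCK=infinity

sudo systemctl daemon-reload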

4. Ensure that Elasticsearch is configured and running on a machine with a 10GbE networking interface. 

5. Run the Rally tool on a separate instance and make sure there are no network disconnects or latency issues while connecting to Elasticsearch.

The following are the prerequisites for installing Elasticsearch:

Amazon Machine Image (AMI) - Ubuntu 20.04 (arm64)
OpenJDK 14.0.1+7 installed on each EC2 M6g instance (only required if you are using the no-JDK binary of Elasticsearch)

You can use either the default distribution of Elasticsearch or the open-source (OSS) version, as described below.

Download Elasticsearch using the following command: 

OSS version- 

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.8.0-aarch64.deb 

Default version- 

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.8.0-arm64.deb 

Additionally, download the sha512 checksum file from the same location:

OSS version- 

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.8.0-aarch64.deb.sha512 

Default version- 

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.8.0-arm64.deb.sha512 

Verify the sha for the binaries that you have downloaded: 

shasum -a 512 -c elasticsearch-oss-7.8.0-aarch64.deb.sha512 

You should see an OK message printed out on the screen. 

Now, you are ready to install Elasticsearch. 

sudo dpkg -i elasticsearch-oss-7.8.0-aarch64.deb 

Repeat these steps on each M6g instance in AWS.  

After the installation is complete, edit the following configuration file on each instance.

sudo vi /etc/elasticsearch/elasticsearch.yml 

Set the name of the Elasticsearch cluster by locating the cluster.name field and replacing its value with your own.

Now, set the name of the Elasticsearch node by editing the following field:

node.name: <hostname of the node> 
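For a multi-node cluster, each node also needs discovery settings so the three instances can form a single cluster. These lines are not spelled out in the steps above; the following is a minimal sketch for Elasticsearch 7.x, assuming hypothetical private IPs 10.0.0.1-10.0.0.3 for the three M6g instances and the node names set above:

network.host: 0.0.0.0
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
cluster.initial_master_nodes: ["<hostname of node 1>", "<hostname of node 2>", "<hostname of node 3>"]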

Start the Elasticsearch service with the following command.

sudo systemctl start elasticsearch.service 

Check the status of the service. 

sudo systemctl status elasticsearch.service 

Execute a simple curl command to check whether you are able to query Elasticsearch. 

curl localhost:9200 

Figure 2. Command output to show successful install of Elasticsearch on a single node

After the installation is complete on all three nodes, check the state of the Elasticsearch cluster by executing the following command on each node:

curl -XGET 'http://<clusterIP>:9200/_cluster/state?pretty'

Check the health of the Elasticsearch cluster by executing the following command. It should display the status as green. 

curl -XGET 'http://<clusterIP>:9200/_cat/health?v'

Figure 3. Command output to show health of the Elasticsearch cluster

Once the Elasticsearch cluster is running, we install a Python library called Tweepy on our client machine. It is a simple Python library used to interact with the Twitter streaming API.

pip3 install tweepy 

We have a sample Python application that uses the Twitter streaming API to fetch live tweets based on the keywords we specify. To use this application, you need to sign up for a Twitter developer account; the process is straightforward and the steps are listed here.

Once you have created the account and registered your application, you should have an access token, an API key, and an API secret key. These need to be provided in the Python script. The sample script can be downloaded from the GitHub repo here.
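For orientation, the core of such a script might look roughly like the sketch below. This is not the exact script from the repo; it assumes the Tweepy 3.x API (StreamListener was removed in Tweepy 4.0), the official elasticsearch Python client, and placeholder credentials and cluster address.

import tweepy
from elasticsearch import Elasticsearch

# Placeholder Twitter developer credentials - replace with your own.
API_KEY = "<api-key>"
API_SECRET = "<api-secret>"
ACCESS_TOKEN = "<access-token>"
ACCESS_TOKEN_SECRET = "<access-token-secret>"

# Placeholder address of one of the M6g cluster nodes.
es = Elasticsearch(["http://<esclusterIP>:9200"])

class TweetListener(tweepy.StreamListener):
    """Indexes each live tweet into the 'sentiment' index that is queried later in this blog."""

    def on_status(self, status):
        doc = {
            "created_at": str(status.created_at),
            "user": status.user.screen_name,
            "text": status.text,
        }
        es.index(index="sentiment", body=doc)
        return True

    def on_error(self, status_code):
        # Returning False on HTTP 420 stops the stream when Twitter rate-limits the connection.
        return status_code != 420

auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

stream = tweepy.Stream(auth=auth, listener=TweetListener())
stream.filter(track=["aws", "graviton2", "arm"])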

As shown in the following image, we run a Python script that searches for live tweets based on specific keywords. The script connects to a three-node Elasticsearch cluster running on AWS Graviton2-based Amazon EC2 M6g instances. The live tweets collected by the script are formatted and sent to the Elasticsearch cluster. Once the script completes, search queries are executed and the data is analyzed.

Figure 4. Script execution result 

In this script we look for keywords such as "aws", "graviton2", or "arm". When executed, the script looks for live tweets that reference any of these keywords and inserts the tweet data into the Elasticsearch database. We ran the script for two hours to collect a considerable amount of data. Now, it is time to search for our keywords and analyze the tweets.

Execute the following command to search for the keyword 'graviton2':

curl -XGET 'http://<esclusterIP>:9200/sentiment/_search?q=graviton2'

Figure 5. Search and analysis result for keyword 

It shows data for a tweet from a few minutes earlier that referenced 'graviton2'. These results can also be visualized with tools such as Kibana or Grafana.
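Beyond the simple query-string search shown above, the same index can be queried with a JSON request body. The example below is illustrative and assumes the tweet text was indexed in a field named text (as in the sketch earlier); adjust the field name to match your actual document schema:

curl -XGET 'http://<esclusterIP>:9200/sentiment/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match": {"text": "graviton2"}}}'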

Rally configuration 

Before installing Rally, you need Python 3 and pip configured on the machine.

sudo apt install python3-pip 

To install Rally, execute the following command:

pip3 install esrally  
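If pip installs Rally into your per-user site (the usual default for a non-root install on Ubuntu), the esrally executable typically lands in ~/.local/bin, so you may need something like:

export PATH=$HOME/.local/bin:$PATH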

With the tool installed and available on your PATH, execute the configuration command:

esrally configure 

On a successful configuration, Rally prints a confirmation message and you are ready to run the benchmarks.
