
Optimize your Elasticsearch deployment with Arm-based Amazon EC2 M6g instances

Pranay Bakre
September 25, 2020
10 minute read time.

Every day, approximately 2.5 quintillion bytes of data are created globally. A quintillion is a 1 with 18 zeroes after it. Let that sink in for a moment. While a lot of this data may be cat videos on the Internet, a considerable amount is still produced as text and other traditional readable content. Popular search engines like Google have simplified parsing this public data deluge, but organizations both small and large still rely on search-based tools to extract insights from the day-to-day content generated inside the company firewall, or from the custom text and datasets that serve their business needs.

Elasticsearch - the most popular enterprise search engine

Elasticsearch is a highly scalable open-source text-search and analytics engine based on the Apache Lucene library. Its primary use cases include full-text search (most notably in e-commerce applications), document storage with cataloging, and time-series event and metrics data.

Recently, the team at elastic.co added Arm64 binaries for Elasticsearch, allowing users to deploy Elasticsearch on Arm Neoverse-powered AWS Graviton2 instances. In this blog, we show an Elasticsearch analytics use case, Twitter data analysis, on a cluster of AWS Graviton2-based Amazon EC2 M6g instances. In addition, we conducted performance benchmarking with Rally, comparing Arm-based Amazon EC2 M6g instances to x86-based M5 instances to showcase the benefit of using these instances for an Elasticsearch deployment.

For analytics workloads, M6g instances provide up to 25% better throughput and lower latency than x86-based M5 instances across varying types of data analytics, along with a 20% cost benefit. These are significant cost and performance gains for customers, since deploying Elasticsearch on Arm is seamless and requires no additional investment in time.

Use Case: Analyze Twitter data

In this use case, we gather tweets and related data based on keywords and insert that data into Elasticsearch, creating indexes and shards as the data is inserted. A Python script interacts with the Twitter streaming API and fetches live tweets matching the keywords we specify. This data is then inserted into an Elasticsearch cluster running on AWS Graviton2-based Amazon EC2 M6g instances. Once the script completes, search queries are executed and the data is analyzed.

Figure 1. Architecture of use case

The entire flow of the use case is captured in the following video:

For more details on how to set up this use case, refer to the Configurations section towards the end of this blog.

Performance benchmarking of Elasticsearch on Amazon EC2 M6g and Amazon EC2 M5 instances 

Now, let us look at Elasticsearch performance metrics comparing AWS Graviton2-based M6g instances with x86-based M5 instances. For benchmarking Elasticsearch, we used Rally, Elastic's benchmarking tool. We executed the benchmarking tests on two instances with the following specifications:

Instance Type | Instance Size | Disk Size
M6g (arm64)   | m6g.xlarge    | 50 GB
M5 (x86_64)   | m5.xlarge     | 50 GB

Table 1. EC2 instance types and sizes

The following tracks were tested using Rally to measure Elasticsearch performance across a variety of datasets:

  • http_logs: This track is based on the web server logs from the 1998 Football World Cup. It was chosen to evaluate performance on web-server log data
  • nested: This track is based on a data dump of StackOverflow posts. It was chosen to evaluate the performance of nested documents
  • pmc: This track contains a filtered set of data retrieved from PMC (a full-text archive of medical journals). It was chosen to evaluate full-text search performance
  • geonames: This track is based on the GeoNames dataset (a geographical database that covers all countries). It was chosen to evaluate performance on structured data
  • geopoint: This track is based on a subset of data from Planet.osm (OpenStreetMap data). It was chosen to evaluate the performance of geo queries

For each of the tracks described above, the following metrics are taken from Rally's benchmark report:

  • Throughput - the number of operations Elasticsearch can perform within a certain time period, usually per second
  • Latency - the time between submission of a request and receipt of the complete response, including the time the request spends waiting until Elasticsearch is ready to service it
  • Service time - the time between the start of request processing and receipt of the complete response; it is easily confused with latency but excludes waiting time (see the worked example after this list)
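To make the distinction concrete: latency is, approximately, waiting time plus service time. For example, in the interactive results in Table 4 below, the http_logs scroll task on m5.xlarge shows a latency of 491.124 ms against a service time of 489.79 ms, implying only about 1.3 ms spent waiting before Elasticsearch started processing each request.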

In Rally, each track consists of different tasks that are performed while benchmarking Elasticsearch. We observed two major types of operations (an example Rally invocation follows this list):

  • Batch-style operations (bulk size): a task such as 'index-append' is a batch-style operation, where the goal is to achieve maximum throughput and finish the operation as early as possible
  • Interactive operations: operations like 'scroll' (full-text search, document search, and so on) are search-based operations where a target throughput is defined before running the benchmarks; the instances should sustain that target throughput to produce realistic results
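For reference, a single track can be run against an existing cluster with a command along the following lines. This is a sketch rather than the exact invocation we used: the flags shown exist in Rally, but the precise syntax can vary by Rally version, and <es-node-IP> is a placeholder for one of the cluster nodes:

esrally --track=http_logs --pipeline=benchmark-only --target-hosts=<es-node-IP>:9200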

The table below shows a comparative analysis of the metrics observed during the 'index-append' task. The three tracks below use larger datasets (20-30 GB), comprising many log files and documents. In this table we see 15%-25% better performance from M6g instances over the equivalent M5 instances.

Track Name | Task Name    | Metric                        | m5.xlarge      | m6g.xlarge     | Performance Improvement
http_logs  | index_append | Throughput (higher is better) | 76596.6 docs/s | 86088.8 docs/s | 12.4%
http_logs  | index_append | Latency (lower is better)     | 17338.5 ms     | 12937 ms       | 25.3%
http_logs  | index_append | Service time                  | 17338.5 ms     | 12937 ms       | 25.3%
pmc        | index_append | Throughput                    | 637.61 docs/s  | 733.94 docs/s  | 15%
pmc        | index_append | Latency                       | 29331.3 ms     | 21706.9 ms     | 26%
pmc        | index_append | Service time                  | 29331.3 ms     | 21706.9 ms     | 26%
nested     | index_append | Throughput                    | 26568.2 docs/s | 33555.6 docs/s | 26.2%
nested     | index_append | Latency                       | 11311 ms       | 7794.68 ms     | 31%
nested     | index_append | Service time                  | 11311 ms       | 7794.68 ms     | 31%

Table 2. Single node Elasticsearch instance – Performance metrics for batch-style operations 


The two tracks covered below exercise geo queries and structured data, with smaller datasets (2-3 GB). In this case, we observe 6%-15% better performance from M6g instances compared to M5 instances.

Track Name | Task Name    | Metric                        | m5.xlarge      | m6g.xlarge     | Performance Improvement
geonames   | index_append | Throughput (higher is better) | 33328.5 docs/s | 35887.6 docs/s | 7.6%
geonames   | index_append | Latency (lower is better)     | 12070.2 ms     | 11222.1 ms     | 7%
geonames   | index_append | Service time                  | 12070.2 ms     | 11222.1 ms     | 7%
geopoint   | index_append | Throughput                    | 128434 docs/s  | 136216 docs/s  | 6%
geopoint   | index_append | Latency                       | 11484.3 ms     | 9707.04 ms     | 15%
geopoint   | index_append | Service time                  | 11484.3 ms     | 9707.04 ms     | 15%

Table 3. Single node Elasticsearch instance - Performance metrics for batch-style operations 

The following table shows a comparative analysis of metrics during interactive tasks such as scroll. As explained above, in such cases a target throughput is defined, and lower latency and service time indicate more stable performance from the instance.

Track Name | Task Name                  | Metric       | m5.xlarge     | m6g.xlarge    | Performance Improvement
http_logs  | scroll                     | Throughput   | 25.14 pages/s | 25.16 pages/s | NA
http_logs  | scroll                     | Latency      | 491.124 ms    | 381.334 ms    | 22.3%
http_logs  | scroll                     | Service time | 489.79 ms     | 380.286 ms    | 22.3%
nested     | randomized-nested-queries  | Throughput   | 20.01 ops/s   | 20.01 ops/s   | NA
nested     | randomized-nested-queries  | Latency      | 125.935 ms    | 118.873 ms    | 5.6%
nested     | randomized-nested-queries  | Service time | 124.342 ms    | 118.069 ms    | 5%
pmc        | scroll                     | Throughput   | 12.7 pages/s  | 12.7 pages/s  | NA
pmc        | scroll                     | Latency      | 442.297 ms    | 328.392 ms    | 25.7%
pmc        | scroll                     | Service time | 440.01 ms     | 325.641 ms    | 25.9%
geonames   | scroll                     | Throughput   | 20.06 pages/s | 20.07 pages/s | NA
geonames   | scroll                     | Latency      | 546.091 ms    | 352.821 ms    | 35.3%
geonames   | scroll                     | Service time | 544.829 ms    | 350.713 ms    | 35.6%
geopoint   | polygon                    | Throughput   | 2.01 ops/s    | 2.01 ops/s    | NA
geopoint   | polygon                    | Latency      | 112.135 ms    | 97.273 ms     | 13.2%
geopoint   | polygon                    | Service time | 111.216 ms    | 95.206 ms     | 14.3%

Table 4. Single Node Elasticsearch instance - Performance metrics for interactive operations 

Additionally, the following table shows the 'index-append' performance metrics for a three-node Elasticsearch cluster of EC2 instances.

Track Name | Task Name    | Metric                        | m5.xlarge     | m6g.xlarge    | Performance Improvement
http_logs  | index_append | Throughput (higher is better) | 159995 docs/s | 186635 docs/s | 16.6%
http_logs  | index_append | Latency (lower is better)     | 13529 ms      | 8941.91 ms    | 33.9%
http_logs  | index_append | Service time                  | 13529 ms      | 8941.91 ms    | 33.9%
nested     | index_append | Throughput                    | 32085 docs/s  | 39661 docs/s  | 23.6%
nested     | index_append | Latency                       | 9532.52 ms    | 7795.97 ms    | 18.2%
nested     | index_append | Service time                  | 9532.52 ms    | 7795.97 ms    | 18.2%

Table 5. Three (3) Node Elasticsearch Cluster - Performance metrics for batch-style operations 

The following table shows a comparative analysis of interactive operations for a three-node Elasticsearch cluster:

Track Name | Task Name                  | Metric       | m5.xlarge     | m6g.xlarge    | Performance Improvement
http_logs  | scroll                     | Throughput   | 25.16 pages/s | 25.18 pages/s | NA
http_logs  | scroll                     | Latency      | 415.282 ms    | 311.113 ms    | 25.08%
http_logs  | scroll                     | Service time | 413.767 ms    | 310.043 ms    | 25.06%
nested     | randomized-nested-queries  | Throughput   | 20.01 ops/s   | 20.01 ops/s   | NA
nested     | randomized-nested-queries  | Latency      | 98.58 ms      | 93.46 ms      | 5.1%
nested     | randomized-nested-queries  | Service time | 97.45 ms      | 93.46 ms      | 4%

Table 6. Three (3) Node Elasticsearch Cluster - Performance metrics for interactive operations 

Summary

To conclude, Elasticsearch serves a wide variety of use cases, and AWS Graviton2 provides better performance along with cost benefits. Arm-based M6g instances deliver up to 25% better throughput and latency compared to x86-based M5 instances across varying types of data analytics, and they also provide a 20% cost benefit.

For more information on the software ecosystem on AWS Graviton2, please visit the AWS sessions at Arm DevSummit, and for questions, reach us here.

Register for Arm DevSummit

Configurations:

These are the performance-related settings we updated to achieve the results described previously:

1. Change the default JVM heap size to 50% of the memory of each instance.

sudo vi /etc/elasticsearch/jvm.options 

-Xms8g (8 GB is roughly 50% of the 16 GiB on an xlarge instance)

-Xmx8g

2. Turn off memory swap and make sure Elasticsearch is the only service running on the instance.

sudo swapoff -a (run on each instance; note that this disables swap only until the next reboot)

3. If turning off memory swap is not possible for some reason, edit the elasticsearch.yml file and lock the process memory instead. (In Elasticsearch 7.x the setting is bootstrap.memory_lock; bootstrap.mlockall is the older name of this setting.)

bootstrap.memory_lock: true

sudo vi /etc/default/elasticsearch

MAX_LOCKED_MEMORY=unlimited
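On Ubuntu 20.04 the Elasticsearch package runs under systemd, which does not always honor the limit set in /etc/default/elasticsearch. If memory locking still fails to take effect, a systemd override along the following lines (a general suggestion, not a step from our original configuration) raises the lock limit:

sudo systemctl edit elasticsearch

[Service]
LimitMEMLOCK=infinity

sudo systemctl daemon-reload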

4. Ensure that Elasticsearch is configured and running on a machine with a 10GbE networking interface. 

5. Run the Rally tool on a separate instance and make sure there are no network disconnects or latency issues while connecting to Elasticsearch.

The following are the prerequisites for installing Elasticsearch:

Amazon Machine Image (AMI) - Ubuntu 20.04 (arm64)
OpenJDK 14.0.1+7 installed on each EC2 M6g instance (only required if you are using the no-JDK binary of Elasticsearch)

You can use either the default distribution of Elasticsearch or the open-source (OSS) version, as described below.

Download Elasticsearch using the following command: 

OSS version- 

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.8.0-aarch64.deb 

Default version- 

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.8.0-arm64.deb 

Additionally, download the sha512 checksum file from the same location:

OSS version- 

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.8.0-aarch64.deb.sha512 

Default version- 

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.8.0-arm64.deb.sha512 

Verify the sha for the binaries that you have downloaded: 

shasum -a 512 -c elasticsearch-oss-7.8.0-aarch64.deb.sha512 

You should see an OK message printed out on the screen. 

Now, you are ready to install Elasticsearch. 

sudo dpkg -i elasticsearch-oss-7.8.0-aarch64.deb 

Repeat these steps on each M6g instance in AWS.  

After the installation is complete, edit the following configuration file on each instance.

sudo vi /etc/elasticsearch/elasticsearch.yml 

Set the name of the Elasticsearch cluster by locating the cluster.name field and replacing its value with your own.

Now, set the name of the Elasticsearch node by editing the following field:

node.name: <hostname of the node> 
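For a multi-node cluster, each node also needs discovery settings so the three instances can form a single cluster. These lines are not spelled out in the steps above; the following is a minimal sketch for Elasticsearch 7.x, assuming hypothetical private IPs 10.0.0.1-10.0.0.3 for the three M6g instances and the node names set above:

network.host: 0.0.0.0
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
cluster.initial_master_nodes: ["<hostname of node 1>", "<hostname of node 2>", "<hostname of node 3>"]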

Start the Elasticsearch service with the following command.

sudo systemctl start elasticsearch.service 

Check the status of the service. 

sudo systemctl status elasticsearch.service 

Execute a simple curl command to check whether you are able to query Elasticsearch. 

curl localhost:9200 

Figure 2. Command output to show successful install of Elasticsearch on a single node

After the installation is complete on all three nodes, check the state of the Elasticsearch cluster by executing the following command on each node:

curl -XGET 'http://<clusterIP>:9200/_cluster/state?pretty'

Check the health of the Elasticsearch cluster by executing the following command. It should display the status as green. 

curl -XGET 'http://<clusterIP>:9200/_cat/health?v'

Figure 3. Command output to show health of the Elasticsearch cluster

Once the Elasticsearch cluster is running, we install a Python library called Tweepy on our client machine. It is a simple Python library used to interact with the Twitter streaming API.

pip3 install tweepy 

We have a sample Python application that uses the Twitter streaming API to fetch live tweets based on the keywords we specify. To use this application, you need to sign up for a Twitter developer account; the process is straightforward and the steps are listed here.

Once you have created the account and registered your application, you should have an access token, an API key, and an API secret key. These need to be provided in the Python script. The sample script can be downloaded from the GitHub repo here.
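For orientation, the core of such a script might look roughly like the sketch below. This is not the exact script from the repo; it assumes the Tweepy 3.x API (StreamListener was removed in Tweepy 4.0), the official elasticsearch Python client, and placeholder credentials and cluster address.

import tweepy
from elasticsearch import Elasticsearch

# Placeholder Twitter developer credentials - replace with your own.
API_KEY = "<api-key>"
API_SECRET = "<api-secret>"
ACCESS_TOKEN = "<access-token>"
ACCESS_TOKEN_SECRET = "<access-token-secret>"

# Placeholder address of one of the M6g cluster nodes.
es = Elasticsearch(["http://<esclusterIP>:9200"])

class TweetListener(tweepy.StreamListener):
    """Indexes each live tweet into the 'sentiment' index that is queried later in this blog."""

    def on_status(self, status):
        doc = {
            "created_at": str(status.created_at),
            "user": status.user.screen_name,
            "text": status.text,
        }
        es.index(index="sentiment", body=doc)
        return True

    def on_error(self, status_code):
        # Returning False on HTTP 420 stops the stream when Twitter rate-limits the connection.
        return status_code != 420

auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

stream = tweepy.Stream(auth=auth, listener=TweetListener())
stream.filter(track=["aws", "graviton2", "arm"])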

As shown in the following image, we run a Python script that searches for live tweets based on specific keywords. The script connects to a three-node Elasticsearch cluster running on AWS Graviton2-based Amazon EC2 M6g instances. The live tweets collected by the script are formatted and sent to the Elasticsearch cluster. Once the script completes, search queries are executed and the data is analyzed.

Figure 4. Script execution result 

In this script we look for keywords such as "aws", "graviton2", or "arm". When executed, the script looks for live tweets that reference any of these keywords and inserts the tweet data into the Elasticsearch database. We ran the script for two hours to collect a considerable amount of data. Now, it is time to search for our keywords and analyze the tweets.

Execute the following command to search for the keyword 'graviton2':

curl -XGET 'http://<esclusterIP>:9200/sentiment/_search?q=graviton2'

Figure 5. Search and analysis result for keyword 

It shows data for a tweet from a few minutes earlier that referenced 'graviton2'. These results can also be visualized with tools such as Kibana or Grafana.
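Beyond the simple query-string search shown above, the same index can be queried with a JSON request body. The example below is illustrative and assumes the tweet text was indexed in a field named text (as in the sketch earlier); adjust the field name to match your actual document schema:

curl -XGET 'http://<esclusterIP>:9200/sentiment/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match": {"text": "graviton2"}}}'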

Rally configuration 

Before installing Rally, you need Python 3 and pip configured on the machine.

sudo apt install python3-pip 

To install Rally, execute the following command:

pip3 install esrally  
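If pip installs Rally into your per-user site (the usual default for a non-root install on Ubuntu), the esrally executable typically lands in ~/.local/bin, so you may need something like:

export PATH=$HOME/.local/bin:$PATH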

With the tool installed and available on your PATH, execute the configuration command:

esrally configure 

On a successful configuration, Rally prints a confirmation message and you are ready to run the benchmarks.
