Social media's influence today is massive, spanning personal, social, political, economic, and cultural realms. Monitoring user sentiment helps organizations quickly understand public reaction to events, trends, and products, and this data-driven insight is crucial for reputation management, market research, and decision making. Twitter, now known as "X," plays an influential role among social media platforms by providing a channel for real-time communication and information sharing. With approximately 530 million active users as of 2024 posting an estimated 500 million tweets daily, it is a powerful real-time channel for gauging public sentiment. Tracking these changes as they happen enables organizations to understand sentiment patterns and take timely, appropriate actions. However, real-time sentiment monitoring is compute-intensive and can quickly drive up resource usage (both compute and cost) if not managed effectively.
In this blog, we will demonstrate how to build a distributed Kubernetes cluster on Arm Neoverse-based CPUs to monitor sentiment changes in real time based on tweets, so you can fully utilize the Arm Neoverse computing foundation for its performance, efficiency, and flexibility.
Amazon Web Services (AWS) offers EC2 instances powered by AWS Graviton processors, which are based on the Arm Neoverse architecture. These instances, built on Graviton2, Graviton3, and Graviton4, provide strong performance with significant cost efficiency [1-5]. To leverage these benefits, we developed our use case on AWS Graviton instances using Amazon Kinesis, Apache Spark, Amazon EKS (Graviton3 instances), Amazon EC2 (a Graviton4 instance), Elasticsearch with a Kibana dashboard, Prometheus, and Grafana (see Graph 1). This setup enables the fast creation and execution of massively parallel machine-learning jobs across different nodes to derive real-time insights, which organizations can use to stay adaptable, responsive, and resilient in a rapidly changing world.
Graph 1: Logical architecture diagram, using AWS as an example.
Please also keep in mind that Arm Neoverse-powered instances are available in Google Cloud and Microsoft Azure as well, so this type of logical architecture should also allow you to set up a similar solution using their services. We will now walk through each component in the diagram, explaining its purpose and how it is constructed, to give you a full understanding of the entire system, using AWS as the example. Later, we'll release a learning path with code examples so you can replicate and build your own solution.
To retrieve new tweets as soon as they are published, we will use the Twitter Developer API, a set of programming tools and protocols provided by Twitter that allows developers to access and interact with Twitter data programmatically. It lets us gather, filter, and analyze information from Twitter's vast database of tweets, user information, and other social media content.
To set it up, you will first need to create a Twitter developer account, then create a project and an App in the developer portal. Next, generate an API Key, API Secret, Access Token, and Access Token Secret to authenticate your application and read tweets. Note that, to provide reliable service, Twitter applies rate limits and constraints on the number of tweets you can retrieve, depending on your subscription tier.
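As a minimal sketch of that authentication step, the stdlib-only Python snippet below builds an authenticated request against Twitter's v2 filtered-stream endpoint using a bearer token from your developer App. The environment variable name and the printing logic are our own assumptions for illustration.

```python
import json
import os
import urllib.request

STREAM_URL = "https://api.twitter.com/2/tweets/search/stream"

def build_stream_request(bearer_token: str) -> urllib.request.Request:
    """Build an authenticated request for Twitter's v2 filtered-stream endpoint."""
    return urllib.request.Request(
        STREAM_URL,
        headers={"Authorization": f"Bearer {bearer_token}"},
    )

if __name__ == "__main__":
    # Requires a real bearer token from your Twitter developer App.
    req = build_stream_request(os.environ["TWITTER_BEARER_TOKEN"])
    with urllib.request.urlopen(req) as stream:
        for line in stream:  # one JSON object per matching tweet
            if line.strip():
                tweet = json.loads(line)
                print(tweet["data"]["text"])
```

In a production client you would also handle reconnects and HTTP 429 (rate-limit) responses, which the official libraries do for you.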
Amazon Kinesis is a fully managed data streaming service built to handle and process large volumes of real-time data. In our setup, we will use Kinesis to capture live data from the Twitter API, ensuring that every tweet matching our filters (such as hashtags, keywords, accounts, language, or timeframe) flows into Kinesis as soon as it's posted. To configure this, follow the step-by-step guide provided in this document. The Twitter API script sends each tweet as a JSON object into a Kinesis stream, making the data readily available for subscribers to consume.
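The snippet below sketches that handoff into Kinesis with boto3. The stream name, region, and sample tweet are placeholders for illustration, and the partition-key choice is one common convention rather than a requirement.

```python
import json

STREAM_NAME = "tweets-stream"  # assumed Kinesis stream name

def tweet_to_record(tweet: dict) -> dict:
    """Wrap one tweet JSON object as a Kinesis record, partitioned by
    tweet id so records are spread evenly across shards."""
    return {
        "Data": json.dumps(tweet).encode("utf-8"),
        "PartitionKey": str(tweet["id"]),
    }

if __name__ == "__main__":
    import boto3  # AWS SDK; needs credentials configured for your account

    kinesis = boto3.client("kinesis", region_name="us-east-1")
    tweet = {"id": 1, "text": "Arm Neoverse on AWS Graviton is fast!"}
    kinesis.put_record(StreamName=STREAM_NAME, **tweet_to_record(tweet))
```

For higher throughput, `put_records` (plural) batches up to 500 records per call.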
The sentiment analyzer is a text classification model that detects the emotional tone of tweets, categorizing them into three or more groups based on the words used. This allows application users to quickly understand real-time opinions on a specific topic without having to read each tweet manually. The results provide valuable sentiment insights, enabling users to make data-driven decisions. There are several ways to calculate sentiment: you can train your own text classification model, which requires labeled data and can be time-consuming, or, as in our approach, you can use a pretrained sentiment classification model.
We process the sentiment of tweets using Spark Streaming, an API within Spark for reliable, high-throughput processing of data streams from sources such as Kafka, AWS Kinesis, HDFS/S3, and Flume. It splits the input stream into mini-batches and runs them through the Spark engine, producing a stream of batches of processed data. Spark also provides a streaming API on top of Spark SQL called Structured Streaming, which presents data as Datasets/DataFrames (APIs built on top of RDDs) and lets the optimized Spark SQL engine process the streaming data.
The Spark Streaming API reads the stream of tweets from the Kinesis stream. The Spark engine runs jobs on the received DataFrames, processing them with a pretrained sentiment classification model from the Stanford CoreNLP library and assigning each tweet one of the following labels: [VERY_NEGATIVE, NEGATIVE, NEUTRAL, POSITIVE, VERY_POSITIVE]. The results are then sent to Elasticsearch.
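To make the labeling step concrete, here is a PySpark-shaped sketch. The `toy_classify` function is a deliberately simplistic stand-in for the CoreNLP sentiment annotator (which runs on the JVM), and the Kinesis source options are assumptions that depend on which connector package you use.

```python
LABELS = ["VERY_NEGATIVE", "NEGATIVE", "NEUTRAL", "POSITIVE", "VERY_POSITIVE"]

def label_sentiment(score: int) -> str:
    """Map a sentiment class index (0-4, as CoreNLP emits) to its label."""
    return LABELS[score]

def toy_classify(text: str) -> int:
    """Toy stand-in for the model: counts a few cue words.
    The real pipeline calls Stanford CoreNLP's sentiment annotator."""
    lowered = text.lower()
    good = sum(word in lowered for word in ("great", "love", "fast"))
    bad = sum(word in lowered for word in ("slow", "hate", "bad"))
    return max(0, min(4, 2 + good - bad))

if __name__ == "__main__":
    # Sketch only: requires pyspark plus a Kinesis connector package;
    # the source format and option names below are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("tweet-sentiment").getOrCreate()
    label_udf = udf(lambda t: label_sentiment(toy_classify(t)), StringType())
    tweets = (spark.readStream.format("kinesis")
              .option("streamName", "tweets-stream")
              .option("region", "us-east-1")
              .load())
    labeled = tweets.withColumn("sentiment",
                                label_udf(col("data").cast("string")))
    labeled.writeStream.format("console").start().awaitTermination()
```

In the real deployment the console sink is replaced by an Elasticsearch sink, and the UDF delegates to CoreNLP instead of the toy classifier.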
Elasticsearch is a robust, open-source search and analytics engine designed to efficiently store, search, and analyze large volumes of data in near real-time. It allows for fast data ingestion and nearly instant searchability. Its real-time indexing capability is crucial for handling high-velocity streams, such as tweets, that continuously flow in from APIs or event streams. To set up Elasticsearch on an AWS EC2 instance, you can follow these instructions.
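As an illustration of the ingestion side, the stdlib-only sketch below indexes one analyzed tweet through Elasticsearch's document API. The endpoint (a local instance), index name, and document fields are assumptions for this example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed Elasticsearch endpoint

def build_index_request(index: str, doc: dict) -> urllib.request.Request:
    """Build a POST request that indexes one document into Elasticsearch."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    doc = {
        "tweet_id": 1,
        "text": "Arm Neoverse on AWS Graviton is fast!",
        "sentiment": "POSITIVE",
        "created_at": "2024-11-01T12:00:00Z",
    }
    with urllib.request.urlopen(build_index_request("tweets", doc)) as resp:
        print(resp.status)  # 201 when the document is created
```

At higher tweet rates you would switch to the `_bulk` endpoint to index many documents per request.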
Kibana is an open-source visualization tool that works seamlessly with Elasticsearch, providing an interface for exploring, visualizing, and interacting with data. With Elasticsearch and Kibana, users can interact with the data, apply filters, and receive alerts if sentiment drops sharply, all in real time. If your Elasticsearch deployment did not initially include a Kibana instance, you can follow these instructions to enable Kibana first; for new Elasticsearch clusters, a Kibana instance is created automatically and you can access it directly. Once Kibana is enabled, you can follow this document to set up your desired visualizations to display the data from Elasticsearch.
Prometheus is a monitoring and alerting toolkit. It’s widely used for collecting and querying real-time metrics in cloud-native environments like Kubernetes. Prometheus collects essential metrics (e.g., CPU, memory usage, pod counts, request latency) that help in monitoring the health and performance of Kubernetes clusters.
Grafana is a visualization and analytics tool that integrates with data sources such as Prometheus to create interactive dashboards for monitoring and analyzing Kubernetes metrics over time. We deployed Prometheus and Grafana on Kubernetes using Helm; this blog provides a good tutorial.
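For a sense of how these metrics can also be pulled programmatically, the sketch below issues an instant query against Prometheus's HTTP API. The endpoint assumes a local port-forward to the Prometheus service, and the PromQL expression is just one example metric.

```python
import json
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090"  # assumed port-forwarded Prometheus

def build_query_url(promql: str) -> str:
    """Build a Prometheus HTTP-API instant-query URL for a PromQL expression."""
    return f"{PROM_URL}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

if __name__ == "__main__":
    # Cluster-wide CPU usage, as scraped from cAdvisor by Prometheus.
    url = build_query_url("sum(rate(container_cpu_usage_seconds_total[5m]))")
    with urllib.request.urlopen(url) as resp:
        result = json.load(resp)
    print(result["data"]["result"])
```

Grafana dashboards issue essentially the same PromQL queries under the hood.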
Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service from AWS that allows you to deploy, manage, and scale applications. Since tweet volume can fluctuate greatly depending on trending topics or events, EKS's auto-scaling of Kubernetes pods and nodes ensures that the sentiment analysis application has the resources to handle peak loads and automatically scales down when traffic subsides, optimizing cost efficiency.
HashiCorp provides documentation on how to provision an EKS cluster on AWS, and Terraform scripts are available to set it up automatically. To run it on Graviton3-based instances, a few changes to the node-group configuration are required:
```hcl
Name           = "eks-nodes-aarch64"
ami_type       = "AL2023_ARM_64_STANDARD"
instance_types = ["r7g.4xlarge"]
```
The inference time for each tweet depends on its length and the model used. For more accurate sentiment predictions, you can select a larger model [7], which increases latency compared to a smaller model but yields higher sentiment accuracy. Since tweet lengths vary, inference times fluctuate accordingly, averaging a couple of hundred milliseconds per tweet. This means our use case can process approximately 5-10 tweets per second with the large model. A smaller model is usually faster, roughly halving the latency and processing 20-30 tweets per second [8].
Globally, about 6,000 tweets are posted on Twitter every second, with roughly 4,000 unique hashtags identified [6]. This translates to 1-2 tweets per hashtag per second.*
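As a quick back-of-the-envelope check of these numbers (the helper function is ours, for illustration only):

```python
def tweets_per_second(latency_ms: float, workers: int = 1) -> float:
    """Sustained throughput when each worker processes one tweet at a time."""
    return workers * 1000.0 / latency_ms

# At ~200 ms per tweet with the large model, a single worker sustains
# about 5 tweets/s, comfortably above the ~1-2 tweets/s of one hashtag.
```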
We built our use case on Twitter, but you can take the main principles of this use case and deploy a similar solution on other social media platforms, taking full advantage of Arm Neoverse-based cloud instances across multiple major cloud providers, including AWS, Google Cloud, and Microsoft Azure. If you would like to learn more about this use case or explore the significant performance and efficiency benefits of Arm Neoverse-based instances in the cloud, please visit our booth (N12) at KubeCon 24.
*This use case currently operates with two worker nodes, which might not handle all global tweets but can effectively manage those related to specific hashtags.
Scaling up the number of worker nodes would enable processing of a higher tweet volume if needed.