Migrating our GenAI pipeline to AWS Graviton powered by Arm Neoverse: A 40% cost reduction story

August 28, 2025

As an Arm Ambassador and a developer focused on optimizing cloud infrastructure, I have seen how quickly costs can escalate when scaling AI.

This blog post explains how Esankethik.com, an IT and AI solutions company, successfully migrated its internal GenAI pipeline to AWS Graviton Arm64. The migration delivered 40% cost savings and 25% performance gains.

The problem: When AI ambition meets exploding cloud bills

We built a scalable GenAI pipeline to automate customer support. It handled over 1,000 daily AI inference requests on AWS x86-based infrastructure. It worked. However, costs quickly escalated, with GenAI consuming 60% of our AI budget.

Scaling further meant either compromising or rethinking our infrastructure. We needed a cost-effective, high-performance alternative.

Why it matters: The imperative for sustainable AI at scale

Demand for sophisticated AI solutions is surging across industries. Yet the high operational cost of AI inference on traditional infrastructure slows adoption and limits innovation, especially for startups and growth-stage companies.

Industry Impact: High infrastructure costs affect product viability and competitiveness.
Who is Affected: Developers, architects, and decision-makers all feel the pressure to scale smartly.
Why Now: Arm Neoverse-powered AWS Graviton instances offer a proven path to better price-performance and greener compute.

The solution: Building a Graviton-optimized GenAI pipeline

We rebuilt and deployed our GenAI pipeline on AWS to reduce cost and improve performance. We moved deliberately to the Arm64-based Graviton architecture, powered by Arm Neoverse. This shift cut infrastructure costs by 40% and improved AI inference performance by 25%.

Here is how we achieved it and why Arm (AWS Graviton) was the cornerstone of our strategy.

Our solution involved leveraging Arm64-based AWS Graviton technology across our entire GenAI stack. This ensured all orchestration and AI inference processes were fully optimized:

Migrated AWS Lambda functions from x86 to Arm64 architecture (Python 3.11 runtime). These Lambdas handle core compute tasks, including requests to Bedrock for AI inference.
Rebuilt Docker containers for multi-architecture support, specifically linux/arm64. These containers are used for batch processing and other high-volume workloads, orchestrating calls to the Bedrock models.
Optimized Python dependencies to ensure full Arm64 compatibility. Built from source when pre-compiled wheels were unavailable.
Used Infrastructure as Code (IaC) with CloudFormation and Terraform for consistent, repeatable Arm64 deployments, including Arm64-specific AMI selection for AWS EC2 instances.
Implemented CI/CD with Arm64-specific build pipelines using Docker Buildx for cross-platform compilation.
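
One quick way to confirm that a migrated function really runs on Arm64 is to have it report its own CPU architecture. The handler below is a minimal sketch, not the pipeline's actual code: the handler name and response shape are assumptions chosen for illustration. It relies only on the Python standard library, where platform.machine() returns "aarch64" on Arm64 Linux and "x86_64" on x86 Linux.

```python
import json
import platform


def lambda_handler(event, context):
    """Report the CPU architecture this function is running on."""
    # 'aarch64' on Arm64 Linux (for example a Graviton-backed Lambda),
    # 'x86_64' on x86 Linux, 'arm64' on Apple Silicon macOS.
    machine = platform.machine()
    return {
        "statusCode": 200,
        "body": json.dumps({
            "machine": machine,
            "is_arm64": machine in ("aarch64", "arm64"),
        }),
    }


if __name__ == "__main__":
    # Local smoke test; Lambda would pass a real event and context.
    print(lambda_handler({}, None))
```

Invoking the same handler from both the x86 and Arm64 deployments gives a simple sanity check before any traffic is shifted.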

Steps we followed for our migration

Initial audit & local testing: We reviewed all Python dependencies for Arm64 compatibility. We started local development using Docker with --platform linux/arm64.
Parallel deployment: We deployed Arm64-based AWS instances and Lambda functions alongside our x86 infrastructure behind an AWS Application Load Balancer.
Gradual traffic shift & validation: We gradually shifted live traffic to the Arm64-based deployments. We validated compatibility, performance, and stability without downtime.
Consolidation & optimization: Once validated, we moved all workloads to Arm64. We consolidated where possible into simpler, more efficient architectures and fine-tuned configurations.
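
The gradual traffic shift can be pictured as a weighted forward action on the load balancer, with the Arm64 weight raised step by step. The sketch below only builds the action dictionary in the shape accepted by Application Load Balancer listeners. The target group ARNs and the rollout percentages are hypothetical placeholders; the post does not state the values that were actually used.

```python
def forward_action(x86_arn, arm64_arn, arm64_percent):
    """Build a weighted 'forward' action splitting traffic between two target groups."""
    if not 0 <= arm64_percent <= 100:
        raise ValueError("arm64_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": x86_arn, "Weight": 100 - arm64_percent},
                {"TargetGroupArn": arm64_arn, "Weight": arm64_percent},
            ]
        },
    }


# Hypothetical rollout schedule (percent of traffic sent to Arm64).
ROLLOUT_STEPS = [10, 25, 50, 100]

if __name__ == "__main__":
    for pct in ROLLOUT_STEPS:
        action = forward_action("arn:x86-placeholder", "arn:arm64-placeholder", pct)
        print(pct, action["ForwardConfig"]["TargetGroups"])
```

With boto3, a dictionary like this could be passed as DefaultActions=[action] to the elbv2 client's modify_listener call. Validating metrics between steps is what keeps the shift reversible.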

We chose AWS Graviton processors, built on Arm Neoverse cores, for their balance of cost savings, raw performance, and energy efficiency. They are ideal for our compute-intensive AI inference workloads and the surrounding pipeline orchestration.

Technical details

Here is a deeper look into the stack, tools, and configurations that powered our migration.

Cloud infrastructure

AWS Lambda: Arm64 runtime with Python 3.11 for event-driven, serverless inference orchestration.
Amazon Bedrock: Claude 3 Haiku model for core AI inference capabilities. Our Arm64-optimized Lambda functions and EC2 instances facilitate highly efficient interactions with this model.
AWS EC2: Graviton instances (e.g., t4g.medium, c7g.large), powered by Arm Neoverse. Used for high-volume, dedicated workloads requiring more sustained compute, including those that call Amazon Bedrock for inference.
Container Platform: Docker with linux/arm64 builds to ensure native Arm64 execution for all application components.

Development stack

Language: Python 3.11, ensuring access to the latest Arm64-optimized libraries.
Key Dependencies: boto3 (for AWS API interaction), FastAPI, uvicorn (for API endpoints) – all verified for Arm64 compatibility.
Build Tools: Docker Buildx for seamless multi-architecture container compilation.
IaC: CloudFormation and Terraform, configured to provision Arm64 resources.

Key configurations

[Image: YAML configuration]

[Image: Dockerfile]
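
The original configuration files are not reproduced in this text. As a rough illustration of the central setting, the sketch below builds a minimal CloudFormation template for a Lambda function with the Arm64 architecture and the Python 3.11 runtime, and prints it as JSON (CloudFormation accepts JSON as well as YAML). The resource name, handler path, role ARN, and S3 location are all hypothetical placeholders, not values from our deployment.

```python
import json


def arm64_lambda_template(function_name, role_arn, bucket, key):
    """Return a minimal CloudFormation template (as a dict) for an Arm64 Lambda."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "InferenceFunction": {  # hypothetical logical ID
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "FunctionName": function_name,
                    "Runtime": "python3.11",
                    # The setting that selects Graviton: x86_64 is the default.
                    "Architectures": ["arm64"],
                    "Handler": "app.lambda_handler",  # hypothetical module path
                    "Role": role_arn,
                    "Code": {"S3Bucket": bucket, "S3Key": key},
                },
            }
        },
    }


if __name__ == "__main__":
    template = arm64_lambda_template(
        "genai-inference",
        "arn:aws:iam::123456789012:role/example-role",
        "example-bucket",
        "app.zip",
    )
    print(json.dumps(template, indent=2))
```

For containers, the equivalent choice is made at build time by targeting the linux/arm64 platform.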

Challenges and solutions

Our migration was not without its hurdles, but each challenge presented an opportunity to refine our approach and deepen our understanding of Arm64 optimization.

1. Missing Arm64 wheels

Fixed by using --platform linux_aarch64 and building from source.
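
Before deciding to build from source, it helps to know which packages already ship an Arm64 wheel. The helper below is a small sketch that inspects a wheel filename and checks its platform tag, following the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag). The function names and the sample filenames are illustrative, not part of our tooling.

```python
def wheel_platform_tags(filename):
    """Return the platform tags encoded in a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    # name-version[-build]-python-abi-platform.whl: the platform tag is last.
    platform_part = filename[: -len(".whl")].split("-")[-1]
    # Several compressed tags can be joined with dots.
    return platform_part.split(".")


def supports_arm64_linux(filename):
    """True if the wheel is pure Python or targets glibc-based Linux on aarch64."""
    for tag in wheel_platform_tags(filename):
        if tag == "any":
            return True
        if tag.startswith(("manylinux", "linux_")) and tag.endswith("_aarch64"):
            return True
    return False


if __name__ == "__main__":
    samples = [
        "numpy-1.26.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        "numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "boto3-1.34.0-py3-none-any.whl",
    ]
    for name in samples:
        print(supports_arm64_linux(name), name)
```

Any package with no matching wheel is a candidate for a source build, which requires a compiler toolchain in the Arm64 build image.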

2. Cross-platform builds

Solved with Docker Buildx to simplify multi-arch compilation.
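
A multi-architecture build with Buildx comes down to one command with a comma-separated platform list. The sketch below assembles that command from Python, as a CI script might. The image name is a hypothetical placeholder, and the helper is an illustration rather than our actual pipeline code.

```python
import shlex


def buildx_command(image, platforms=("linux/amd64", "linux/arm64"), context=".", push=True):
    """Assemble a 'docker buildx build' command for a multi-arch image."""
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),
        "--tag", image,
    ]
    if push:
        # Multi-platform images are normally pushed to a registry directly.
        cmd.append("--push")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # Hypothetical image name, printed as a shell-safe command line.
    print(shlex.join(buildx_command("registry.example.com/genai-pipeline:latest")))
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where Docker Buildx is installed.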

3. IaC AMI selection

Used dynamic AMI lookup for Arm64 in Terraform.
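
The Terraform lookup itself is not reproduced here. The same idea (filter images by architecture, then take the most recent) can be sketched in Python against the shape of an EC2 DescribeImages response. The filter list uses real DescribeImages filter names; the image IDs and dates in the example are made up for illustration.

```python
# Filters in the shape accepted by EC2 DescribeImages.
ARM64_AMI_FILTERS = [
    {"Name": "architecture", "Values": ["arm64"]},
    {"Name": "state", "Values": ["available"]},
]


def latest_arm64_image_id(images):
    """Pick the newest Arm64 image from a list of DescribeImages-style dicts."""
    arm64 = [img for img in images if img.get("Architecture") == "arm64"]
    if not arm64:
        raise LookupError("no Arm64 image in the result set")
    # CreationDate is an ISO 8601 string, so string order matches time order.
    return max(arm64, key=lambda img: img["CreationDate"])["ImageId"]


if __name__ == "__main__":
    # Made-up sample data standing in for an API response.
    sample = [
        {"ImageId": "ami-aaa", "Architecture": "arm64", "CreationDate": "2025-01-10T00:00:00.000Z"},
        {"ImageId": "ami-bbb", "Architecture": "arm64", "CreationDate": "2025-06-02T00:00:00.000Z"},
        {"ImageId": "ami-ccc", "Architecture": "x86_64", "CreationDate": "2025-07-01T00:00:00.000Z"},
    ]
    print(latest_arm64_image_id(sample))
```

Resolving the AMI at deploy time avoids hard-coding an image ID that is only valid for one architecture and one region.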

4. Performance testing

Ran custom benchmarks with ab (ApacheBench) and Python scripts, and tracked CloudWatch metrics.
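
A Python benchmark script for this kind of comparison can be very small. The sketch below times repeated calls to a function and reports the mean and 95th-percentile latency. It is an illustrative stand-in, not the script we ran: the workload in the example is a placeholder where a real test would call the inference endpoint.

```python
import math
import statistics
import time


def benchmark(fn, runs=50):
    """Time repeated calls to fn and summarize latency in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "runs": runs,
        "mean_s": statistics.mean(samples),
        "p95_s": samples[p95_index],
        "max_s": samples[-1],
    }


if __name__ == "__main__":
    # Placeholder workload; replace with a call to the service under test.
    print(benchmark(lambda: sum(range(100_000))))
```

Running the same script against the x86 and Arm64 deployments, with identical payloads, gives directly comparable numbers.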

Impact: Unlocking AI potential with unprecedented efficiency

Migrating to the AWS Graviton architecture, powered by Arm Neoverse, was a game-changer for our GenAI pipeline. The results were immediate and measurable, delivering significant cost savings and performance gains across the board. This was more than a performance upgrade. It reshaped our Total Cost of Ownership (TCO) strategy by proving how Arm64-based architectures can scale savings in a tangible, repeatable way.

Cost savings

AWS Lambda costs: Reduced from $200 to $120 per million requests (40% savings)
AWS EC2 costs: t4g.medium at $24/month vs t3.medium at $40/month (40% savings)
Annual savings: $960+ per workload, $4,800 across 5 production workloads

Performance improvements

Inference latency: Improved from 1.2s to 0.9s (25% faster)
Cold start time: Reduced from 800ms to 600ms (25% faster)
Throughput: Increased from 100 to 125 requests/second (25% higher)
Memory efficiency: 15% better memory utilization

Operational benefits

Deployment time: Reduced by 30% with optimized Arm64 builds
Carbon footprint: 40% reduction in power consumption
Scalability: Better auto-scaling performance under load

Cost savings at scale

As illustrated below, our cost savings were substantial across various request volumes, reaching 40% across the board.

Figure 1: Comparison of Monthly Costs for x86 vs. Arm64-based AWS Graviton3 Instances, showing 40% savings across the board

Long-term financial and performance gains

Our projections indicate that these savings accumulate over time, leading to substantial cumulative benefits.

Figure 2: 12-Month Cost Projection and Cumulative Savings, highlighting the long-term financial benefits of migrating to Arm64-based Graviton3.

Beyond financial metrics, Arm-based AWS Graviton instances also delivered significant performance and efficiency improvements:

Metric	x86 (Before Migration)	Arm64 (After Migration)	Improvement
Lambda Cost (per 1M requests)	$200	$120	40% Savings
EC2 t*.medium Instance Cost	~$40/month (t3.medium)	~$24/month (t4g.medium)	40% Savings
Inference Latency	1.2 seconds	0.9 seconds	25% Faster
Cold Start Time	800 milliseconds	600 milliseconds	25% Faster
Throughput	100 requests/second	125 requests/second	25% Higher
Memory Efficiency	1.0x	1.15x	15% Better
Annual Savings (per workload)	N/A	$960+	Significant
Deployment Time	1.0x	0.7x	30% Faster
Carbon Footprint	1.0x	0.6x	40% Reduction
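
The percentages in the table follow directly from the before and after values. The short check below recomputes them from the figures quoted in this post; only the helper name is introduced for illustration.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent (negative means a reduction)."""
    return round((after - before) / before * 100, 1)


# (before, after) pairs taken from the table above.
metrics = {
    "lambda_cost_usd_per_1m_requests": (200, 120),
    "ec2_cost_usd_per_month": (40, 24),
    "inference_latency_s": (1.2, 0.9),
    "cold_start_ms": (800, 600),
    "throughput_rps": (100, 125),
}

if __name__ == "__main__":
    for name, (before, after) in metrics.items():
        print(f"{name}: {percent_change(before, after):+.1f}%")
    # Annual savings across five workloads at $960 each.
    print("annual_savings_usd:", 960 * 5)
```

Costs and latencies drop by 40% and 25% respectively, throughput rises by 25%, and five workloads at $960 each give the $4,800 annual figure.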

This radar chart visually summarizes our performance and efficiency gains:

Figure 3: Radar Chart illustrating key performance improvements of Arm64-based AWS Graviton3 over x86 infrastructure.

Beyond the numbers, we experienced tangible operational benefits:

Significant annual savings: Our first workload saved over $960 per year. With five production workloads, we saved more than $4,800 annually. These savings directly improved our TCO and freed up budget for further AI innovation and scaling.
Enhanced scalability: Our pipeline now handles higher volumes with better auto-scaling. Performance stays consistent even during peak times.

Environmental impact

Finally, adopting Arm-based AWS Graviton instances significantly reduced our environmental footprint.

Figure 4: Comparison of Monthly Energy Consumption and CO2 Emissions for x86 vs. Arm64-based AWS Graviton3 Infrastructure, showing a 45% reduction.

The 40% reduction in power consumption aligns with our sustainability goals, making our AI solution not just performant but also responsible.

What is next: Be part of the Graviton revolution

Our cost-optimized GenAI pipeline with AWS Graviton and Arm Neoverse proved that modern AI can be both powerful and efficient.

Ready to explore it yourself?

Check the code: Visit my GitHub repo to get started.
Try it out: Use our deployment guides to launch your own pipeline.
Watch the demo: See the performance first-hand.
Join the community: Share feedback, contribute, or collaborate.
Tell your story: Migrating to Arm? Let us know, we would love to hear it!

Code, repos, and further reading

GitHub Repository:

Documentation:

AWS Resources:

Benchmarking Tools:

Key takeaways and tips for other developers

Before migration

Check all Python packages for Arm64 support early.
Test locally with Docker (--platform linux/arm64).
Roll out gradually—start in parallel with x86.

During migration

Use IaC (CloudFormation/Terraform) for consistent deployment.
Rely on AWS services like Lambda & Bedrock—they handle Arm64 well.
Monitor performance (CloudWatch: x86 vs Arm64).

Best practices

Build multi-arch containers using docker buildx.
Add Arm64 to CI/CD testing.
Track cost savings via AWS Cost Explorer.

Watch out for

Not all libraries are Arm64-ready; test everything.
Avoid migrating everything at once; start small.
Don’t skip performance testing; results vary.

Learn more about migrating to Arm Cloud

About the author

Hrudu Shibu is an Arm Ambassador with a strong passion for scalable cloud solutions and system automation. With hands-on expertise in enterprise IT systems, compliance frameworks, and developer tooling, he actively contributes to the developer community through open-source projects and technical writing.
