Arm Community AI blog
Optimizing AI workloads with Ethos NPUs

Dylan Zika
October 23, 2019
3 minute read time.

Today’s Artificial Intelligence (AI) use cases are moving beyond the hype, delivering real value to consumers through vision- and voice-based applications. Image classification requirements are exploding upward with real-time video processing at increased resolutions. Realistic personal assistants are becoming part of everyday life, bringing natural language understanding and lifelike text-to-speech. Cameras are getting more advanced with super night mode, advanced portrait mode, and image enhancements based on the items found in the scene.

These improvements are increasing consumer expectations, while driving performance requirements steadily upward, demanding more operations in highly efficient packages.

One Size Does Not Fit All

Premium users of Machine Learning (ML) inference require the highest throughput and performance efficiency, delivering the best user experience without compromising battery life. But as consumer appetite for AI increases, demand for ML in mainstream devices has grown significantly, requiring a careful balance of power, performance, and area. Even the most cost-sensitive devices, with strict limitations on available DRAM bandwidth and area, are looking to capitalize on the power of ML.

But how do you meet these demands across the spectrum? How do you rationalize varying power, performance and area requirements while still achieving optimal performance?

A ‘family’ of processors, such as the new Ethos NPU series, designed from the ground up for ML, allows designers to meet market needs without breaking the bank. Integrating Ethos NPUs into multiple devices offers a significant time-to-market advantage; selecting the appropriate NPU can save months of design time. Ethos-N NPUs are tuned for high utilization, allowing designers to achieve outstanding ML performance while minimizing the area required, lowering the cost of intelligent devices.

[Figure: ML use-case diagram]

Here is how the numbers stack up:

| Product   | Throughput   | MACs/Cycle | Internal Memory | Target                                                 |
| --------- | ------------ | ---------- | --------------- | ------------------------------------------------------ |
| Ethos-N77 | Up to 4 TOP/s | 2048 (8x8) | 1-4 MB          | Computational photography, premium smartphones, AR/VR  |
| Ethos-N57 | Up to 2 TOP/s | 1024 (8x8) | 512 KB          | Mainstream smartphones, smart home hubs                |
| Ethos-N37 | Up to 1 TOP/s | 512 (8x8)  | 512 KB          | Smart cameras, entry smartphones, DTV                  |
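The throughput and MACs/cycle columns are directly related: each multiply-accumulate counts as two operations, so peak TOP/s is MACs per cycle times two times the clock rate. A minimal sketch, assuming a roughly 1 GHz clock (the clock frequency is my assumption, not a figure from this post; actual silicon varies):

```python
# Illustrative: relating MACs/cycle to peak TOP/s.
# The ~1 GHz clock is an assumed value, not an Arm-published figure.
def peak_tops(macs_per_cycle: int, clock_ghz: float = 1.0) -> float:
    """Each MAC is 2 ops (multiply + add); TOP/s = MACs * 2 * cycles/s / 1e12."""
    return macs_per_cycle * 2 * clock_ghz * 1e9 / 1e12

print(peak_tops(2048))  # ~4.1 TOP/s for a 2048-MAC array at 1 GHz
print(peak_tops(1024))  # ~2.0 TOP/s
print(peak_tops(512))   # ~1.0 TOP/s
```

This matches the "up to" figures in the table above: peak numbers assume every MAC is busy every cycle, which is why the high utilization tuning mentioned earlier matters in practice.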

Ethos-N77, formerly known as the Arm ML processor, remains the NPU for premium applications, with an internal memory footprint configurable from 1 MB to 4 MB. It allows the most demanding ML applications with large input sizes to run without draining battery life, by keeping more weights and activations on-chip and eliminating power-hungry DRAM traffic.
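A back-of-envelope sketch of why on-chip storage saves power. The energy-per-byte figures below are rough assumed values for illustration only (external DRAM access typically costs one to two orders of magnitude more energy than on-chip SRAM); they are not Arm-published numbers:

```python
# Illustrative energy comparison: on-chip SRAM vs. external DRAM traffic.
# Both pJ/byte figures are assumed, order-of-magnitude values.
SRAM_PJ_PER_BYTE = 5.0     # assumed on-chip SRAM access energy
DRAM_PJ_PER_BYTE = 300.0   # assumed external DRAM access energy

def traffic_energy_mj(bytes_moved: int, pj_per_byte: float) -> float:
    """Energy in millijoules to move bytes_moved at pj_per_byte."""
    return bytes_moved * pj_per_byte * 1e-9  # pJ -> mJ

feature_map = 4 * 1024 * 1024  # a 4 MB activation tensor fits N77's max SRAM
print(traffic_energy_mj(feature_map, SRAM_PJ_PER_BYTE))  # cost if kept on-chip
print(traffic_energy_mj(feature_map, DRAM_PJ_PER_BYTE))  # cost if spilled to DRAM
```

Under these assumptions, every tensor that stays on-chip instead of round-tripping through DRAM costs roughly 60x less energy to move, which is the mechanism behind the battery-life claim above.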

Smaller NPUs, Ethos-N57 and Ethos-N37, deliver powerful, cost-effective ML for mainstream SoCs with much tighter area budgets. All of these processors include end-to-end compression technology that lowers DRAM requirements, reducing system bandwidth by 1.5-3x through lossless compression of weights and activations using clustering, sparsity, and workload tiling. This allows easy integration into existing designs without major modification to the memory structure. Ethos NPUs also provide hardware support for Winograd convolution, and power-gating optimizations for sparsity, allowing demanding ML workloads to run at the endpoint.
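To make the clustering idea concrete, here is a minimal sketch of weight clustering, one of the compression techniques mentioned above: weights are snapped to a small shared codebook, so each weight is stored as a short index plus a lookup table. The `cluster_weights` helper is hypothetical and uses simple uniform centroids; it illustrates the concept, not the actual Ethos compression format:

```python
# Minimal sketch of weight clustering (hypothetical helper, not the
# Ethos on-the-wire format): replace each 32-bit weight with a 4-bit
# index into a 16-entry codebook of centroids.
def cluster_weights(weights, n_clusters=16):
    """Map each weight to the nearest of n_clusters evenly spaced centroids."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (n_clusters - 1)
    centroids = [lo + i * step for i in range(n_clusters)]
    indices = [round((w - lo) / step) for w in weights]
    return centroids, indices

weights = [0.12, -0.40, 0.11, 0.35, -0.39, 0.12]
codebook, idx = cluster_weights(weights)
# Storing 4-bit indices instead of 32-bit floats is an 8x reduction,
# before any further entropy coding or sparsity exploitation.
```

Sparsity compounds this: runs of zero weights compress to almost nothing, and the power-gating mentioned above skips the corresponding multiplies entirely.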

Software Bridges the Gap

The versatility of the Ethos processor family is demonstrated when used with Arm NN – an inference engine that bridges the gap between existing NN frameworks and the underlying CPU, GPU, and NPU IP. Arm NN allows developers to write applications just once, yet still target a wide range of endpoints. This is because Arm NN provides an abstraction layer, eliminating the challenges of programming multiple, heterogeneous processors and allowing workloads to be run across devices like phones, TVs, and throughout the smart home, with minimal effort.
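The abstraction-layer idea can be sketched as a backend preference list with fallback: the application names the workload once, and the runtime picks the best backend available on the device. This is an illustration of the pattern only, not the actual Arm NN API (which is a C++ library with Python bindings); the function and backend set here are hypothetical, though the backend names mirror Arm NN's conventional identifiers:

```python
# Conceptual sketch of backend selection with fallback, the pattern an
# inference engine like Arm NN implements. Illustrative only; not the
# real Arm NN API. AVAILABLE simulates what this device supports.
AVAILABLE = {"CpuAcc"}  # pretend this device has no NPU or GPU backend

def run_inference(model: str, preferred=("EthosNAcc", "GpuAcc", "CpuAcc")):
    """Try backends in preference order; fall back until one is available."""
    for backend in preferred:
        if backend in AVAILABLE:
            return f"{model} scheduled on {backend}"
    raise RuntimeError("no usable backend for " + model)

print(run_inference("mobilenet_v2"))  # -> "mobilenet_v2 scheduled on CpuAcc"
```

Because the application only states a preference order, the same binary runs on a premium phone with an NPU and on a DTV with only a CPU, which is the "write once, target many endpoints" claim above.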

The Power of the Ecosystem

As with all Arm processors, the Ethos series gives you the ability to draw on the strength of the world’s largest AI ecosystem. Arm’s alliances with leading algorithm partners also ensure that developers are able to rapidly create valuable applications before hardware is released. This includes enabling popular mobile use cases such as computational photography, beautification and avatar generation. Recent partnerships bring forward new DTV use cases, such as 2K/4K super resolution, delivering crystal clear content.

Underpinned by high-performance, truly open-source software and a robust and diverse ecosystem, Ethos-series processors give designers the flexibility to exceed consumer expectations, addressing the most demanding use cases within a tight power envelope.

If you would like to know more, you can read the Arm newsroom blog or visit Arm Developer.

Find out more at developer.arm.com/ethos
