Arm Community AI blog
Optimizing AI workloads with Ethos NPUs

Dylan Zika
October 23, 2019
3 minute read time.

Today’s Artificial Intelligence (AI) use cases are moving beyond the hype, delivering real value to consumers through vision- and voice-based applications. Image classification requirements are exploding upward with real-time video processing at increased resolutions. Realistic personal assistants are becoming part of everyday life, bringing natural language understanding and lifelike text-to-speech. Cameras are getting more advanced with super night mode, advanced portrait mode, and image enhancements based on the items found in the scene.

These improvements are increasing consumer expectations, while driving performance requirements steadily upward, demanding more operations in highly efficient packages.

One Size Does Not Fit All

Premium users of Machine Learning (ML) inference require the highest throughput and performance efficiency, delivering the best user experience without compromising battery life. But as consumer appetite for AI increases, demand for ML in mainstream devices has grown significantly, requiring a careful balance of power, performance, and area. Even the most cost-sensitive devices, with strict limitations on available DRAM bandwidth and area, are looking to capitalize on the power of ML.

But how do you meet these demands across the spectrum? How do you rationalize varying power, performance and area requirements while still achieving optimal performance?

A ‘family’ of processors, such as the new Ethos NPU series, designed from the ground up for ML, allows designers to meet market needs without breaking the bank. Integrating Ethos NPUs into multiple devices offers a significant time-to-market advantage; selecting the appropriate NPU can save months of design time. Ethos-N NPUs are tuned for high utilization, allowing designers to achieve outstanding ML performance while minimizing the area required, lowering the cost of intelligent devices.

[Figure: ML use-case diagram]

Here is how the numbers stack up:

| Product   | Throughput   | MACs/Cycle | Internal Memory | Target                                                 |
| --------- | ------------ | ---------- | --------------- | ------------------------------------------------------ |
| Ethos-N77 | Up to 4 TOP/s | 2048 (8x8) | 1-4 MB          | Computational photography, premium smartphones, AR/VR  |
| Ethos-N57 | Up to 2 TOP/s | 1024 (8x8) | 512 KB          | Mainstream smartphones, smart home hubs                |
| Ethos-N37 | Up to 1 TOP/s | 512 (8x8)  | 512 KB          | Smart cameras, entry smartphones, DTV                  |
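The throughput and MACs/cycle columns are directly related: each multiply-accumulate counts as two operations, so peak TOP/s is MACs per cycle times two times the clock rate. A minimal sketch, assuming a roughly 1 GHz clock (the clock frequency is my assumption, not a figure from this post; actual silicon varies):

```python
# Illustrative: relating MACs/cycle to peak TOP/s.
# The ~1 GHz clock is an assumed value, not an Arm-published figure.
def peak_tops(macs_per_cycle: int, clock_ghz: float = 1.0) -> float:
    """Each MAC is 2 ops (multiply + add); TOP/s = MACs * 2 * cycles/s / 1e12."""
    return macs_per_cycle * 2 * clock_ghz * 1e9 / 1e12

print(peak_tops(2048))  # ~4.1 TOP/s for a 2048-MAC array at 1 GHz
print(peak_tops(1024))  # ~2.0 TOP/s
print(peak_tops(512))   # ~1.0 TOP/s
```

This matches the "up to" figures in the table above: peak numbers assume every MAC is busy every cycle, which is why the high utilization tuning mentioned earlier matters in practice.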

Ethos-N77, formerly known as the Arm ML processor, remains the NPU for premium applications, with an internal memory footprint configurable from 1 MB to 4 MB. It allows the most demanding ML applications with large input sizes to run without draining battery life, by keeping more weights and activations on-chip and eliminating power-hungry DRAM traffic.
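A back-of-envelope sketch of why on-chip storage saves power. The energy-per-byte figures below are rough assumed values for illustration only (external DRAM access typically costs one to two orders of magnitude more energy than on-chip SRAM); they are not Arm-published numbers:

```python
# Illustrative energy comparison: on-chip SRAM vs. external DRAM traffic.
# Both pJ/byte figures are assumed, order-of-magnitude values.
SRAM_PJ_PER_BYTE = 5.0     # assumed on-chip SRAM access energy
DRAM_PJ_PER_BYTE = 300.0   # assumed external DRAM access energy

def traffic_energy_mj(bytes_moved: int, pj_per_byte: float) -> float:
    """Energy in millijoules to move bytes_moved at pj_per_byte."""
    return bytes_moved * pj_per_byte * 1e-9  # pJ -> mJ

feature_map = 4 * 1024 * 1024  # a 4 MB activation tensor fits N77's max SRAM
print(traffic_energy_mj(feature_map, SRAM_PJ_PER_BYTE))  # cost if kept on-chip
print(traffic_energy_mj(feature_map, DRAM_PJ_PER_BYTE))  # cost if spilled to DRAM
```

Under these assumptions, every tensor that stays on-chip instead of round-tripping through DRAM costs roughly 60x less energy to move, which is the mechanism behind the battery-life claim above.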

Smaller NPUs, Ethos-N57 and Ethos-N37, deliver powerful, cost-effective ML for mainstream SoCs with much tighter area budgets. All of these processors include end-to-end compression technology that lowers DRAM requirements, reducing system bandwidth by 1.5-3x through lossless compression of weights and activations using clustering, sparsity, and workload tiling. This allows easy integration into existing designs without major modification to the memory structure. Ethos NPUs also provide hardware support for Winograd convolution, and power-gating optimizations for sparsity, allowing demanding ML workloads to run at the endpoint.
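To make the clustering idea concrete, here is a minimal sketch of weight clustering, one of the compression techniques mentioned above: weights are snapped to a small shared codebook, so each weight is stored as a short index plus a lookup table. The `cluster_weights` helper is hypothetical and uses simple uniform centroids; it illustrates the concept, not the actual Ethos compression format:

```python
# Minimal sketch of weight clustering (hypothetical helper, not the
# Ethos on-the-wire format): replace each 32-bit weight with a 4-bit
# index into a 16-entry codebook of centroids.
def cluster_weights(weights, n_clusters=16):
    """Map each weight to the nearest of n_clusters evenly spaced centroids."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (n_clusters - 1)
    centroids = [lo + i * step for i in range(n_clusters)]
    indices = [round((w - lo) / step) for w in weights]
    return centroids, indices

weights = [0.12, -0.40, 0.11, 0.35, -0.39, 0.12]
codebook, idx = cluster_weights(weights)
# Storing 4-bit indices instead of 32-bit floats is an 8x reduction,
# before any further entropy coding or sparsity exploitation.
```

Sparsity compounds this: runs of zero weights compress to almost nothing, and the power-gating mentioned above skips the corresponding multiplies entirely.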

Software Bridges the Gap

The versatility of the Ethos processor family is demonstrated when used with Arm NN – an inference engine that bridges the gap between existing NN frameworks and the underlying CPU, GPU, and NPU IP. Arm NN allows developers to write applications just once, yet still target a wide range of endpoints. This is because Arm NN provides an abstraction layer, eliminating the challenges of programming multiple, heterogeneous processors and allowing workloads to be run across devices like phones, TVs, and throughout the smart home, with minimal effort.
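The abstraction-layer idea can be sketched as a backend preference list with fallback: the application names the workload once, and the runtime picks the best backend available on the device. This is an illustration of the pattern only, not the actual Arm NN API (which is a C++ library with Python bindings); the function and backend set here are hypothetical, though the backend names mirror Arm NN's conventional identifiers:

```python
# Conceptual sketch of backend selection with fallback, the pattern an
# inference engine like Arm NN implements. Illustrative only; not the
# real Arm NN API. AVAILABLE simulates what this device supports.
AVAILABLE = {"CpuAcc"}  # pretend this device has no NPU or GPU backend

def run_inference(model: str, preferred=("EthosNAcc", "GpuAcc", "CpuAcc")):
    """Try backends in preference order; fall back until one is available."""
    for backend in preferred:
        if backend in AVAILABLE:
            return f"{model} scheduled on {backend}"
    raise RuntimeError("no usable backend for " + model)

print(run_inference("mobilenet_v2"))  # -> "mobilenet_v2 scheduled on CpuAcc"
```

Because the application only states a preference order, the same binary runs on a premium phone with an NPU and on a DTV with only a CPU, which is the "write once, target many endpoints" claim above.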

The Power of the Ecosystem

As with all Arm processors, the Ethos series gives you the ability to draw on the strength of the world’s largest AI ecosystem. Arm’s alliances with leading algorithm partners also ensure that developers are able to rapidly create valuable applications before hardware is released. This includes enabling popular mobile use cases such as computational photography, beautification and avatar generation. Recent partnerships bring forward new DTV use cases, such as 2K/4K super resolution, delivering crystal clear content.

Underpinned by high-performance, truly open-source software and a robust and diverse ecosystem, Ethos-series processors give designers the flexibility to exceed consumer expectations, addressing the most demanding use cases within a tight power envelope.

If you would like to know more, you can read the Arm newsroom blog or visit Arm Developer.

Find out more at developer.arm.com/ethos
