Arm Ethos-N ML Inference Processors: Powering Exciting User Experiences on Edge Devices

Ian Forsyth
May 27, 2019
5 minute read time.

OK. Quick survey: How many connected devices do you own?

Whether you’re a gadget addict or just an average Josephine, I’m not sticking my neck out too far if I guess that you own more today than you did five years ago. From smartphones and tablets to personal fitness trackers, smart asthma inhalers and smart doorbells, we’re all busily increasing our connectivity year on year – along with our own personal data explosion. According to a recent report, in the last ten years, the global average of connected devices per capita has leapt from less than two to a projected 6.58 by 2020. That’s an awful lot of devices creating an awful lot of data.

Until recently, that data was routinely shipped to the cloud for processing. But as data volumes and device numbers grow exponentially, it’s just not practical – not to mention secure or cost-effective – to keep shifting all that data back and forth.

Fortunately, recent advances in machine learning (ML) mean that more processing, and pre-processing, can now be done on-device than ever before. This brings a range of benefits, from increased safety and security, thanks to the reduced risk of data exposure, to cost and power savings. Infrastructure to transmit data to the cloud and back doesn’t come cheap, so the more processing that can be done on-device, the better.

Power and Efficiency Across the Performance Curve

On-device ML starts with the CPU, which acts as an adept ‘traffic controller’, either single-handedly managing entire ML workloads or distributing selected tasks to a specialized processor such as an Ethos-N NPU.

Arm CPUs – and GPUs – are already powering thousands of ML use cases across the performance curve, not least for mobile, where edge ML is already driving features that consumers have come to expect as standard. (Bunny ear selfie, anyone?)

As these processors get ever-more powerful and efficient, they drive even higher performance, which enables more on-device compute power for secure ML at the edge. (See the launch of the third-generation DynamIQ ‘big’ core Arm Cortex-A77 CPU, for example, which can manage compute-intensive tasks without impacting battery life, and the Arm Mali-G77 GPU, which delivers a 60 percent performance improvement for ML.)

But while CPUs and GPUs are ML powerhouses in their own right, the most intensive workloads can outstrip what they can deliver efficiently. For these tasks, the might of a dedicated neural processing unit (NPU), such as the Arm Ethos-N77, comes into its own, delivering the highest throughput and most efficient processing for ML inference at the edge.

NPU Drives New, Exciting User Experiences

So, what makes the Ethos-N77 so special? Well, it’s based on a brand-new architecture, targeting connected devices such as smartphones, smart cameras, augmented and virtual reality (AR/VR) devices and drones, as well as medical and consumer electronics. If you’re interested in how it stacks up numbers-wise, you can’t fail to be impressed by its outstanding performance of up to 4 TOP/s, enabling new use cases that were previously impossible due to limited battery life or thermal constraints. This enables developers to create new user experiences such as 3D face unlock or advanced portrait modes featuring depth control or portrait lighting.

Of course, superb performance is great – but not if it requires you to charge your device every couple of hours or drag a power bank with you wherever you go. To set users free from the tyranny of the charging cable, the ML processor boasts an industry-leading power efficiency of 5 TOPs/W – achieved through state-of-the-art optimizations, such as weight and activation compression, as well as Winograd convolution.

Winograd enables 225% greater performance on key convolution filters compared to other NPUs, in a smaller footprint, driving efficient performance while reducing the number of components required in any given design. This in turn lowers cost and power requirements without compromising on user experience.
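That 225% figure reflects the arithmetic of Winograd’s minimal filtering algorithms: for a 3x3 filter producing a 2x2 output tile, F(2x2, 3x3) needs 16 multiplications where direct convolution needs 36, a 2.25x reduction. The 1D building block F(2,3) below illustrates the idea; this is a sketch of the standard textbook algorithm, not Ethos-N internals.

# Winograd minimal filtering, F(2,3): two outputs of a 3-tap
# convolution from 4 multiplications instead of 6. Nested in 2D,
# F(2x2, 3x3) uses 16 multiplies instead of 36 per tile: 2.25x.

def conv_direct(d, g):
    # Direct 1D valid convolution: 6 multiplies for 2 outputs.
    return [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
            d[1]*g[0] + d[2]*g[1] + d[3]*g[2]]

def conv_winograd_f23(d, g):
    # The three filter-side factors depend only on g, so a compiler
    # can precompute them once per network.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
assert conv_direct(d, g) == conv_winograd_f23(d, g)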

The architecture consists of fixed-function engines, for the efficient execution of convolution layers, and programmable layer engines, for executing non-convolution layers and implementing selected primitives and operators. These natively supported functionalities are closely aligned with common neural frameworks to reduce network deployment costs, allowing for a faster time to market.

The Arm Ethos-N77 premium ML inference processor contains 16 compute engines:
  • Efficiency: Provides a massive uplift over CPUs, GPUs, DSPs and accelerators, with up to 5 TOPs/W
  • Network support: Processes a variety of popular neural networks, including convolutional (CNNs) and recurrent (RNNs), for classification, object detection, image enhancement, speech recognition and natural language understanding
  • Security: Executes with a minimal attack surface, built on the foundation of the Arm TrustZone architecture
  • Scalability: Scales, via multicore, up to eight NPUs and 32 TOP/s in a single cluster, or 64 NPUs in a mesh configuration
  • Neural framework support: Integrates closely with existing frameworks: TensorFlow, TensorFlow Lite, Caffe, Caffe2 and others via ONNX
  • Winograd convolution: Accelerates common filters by 225% compared to other NPUs, allowing more performance in less area
  • Memory compression: Minimizes system memory bandwidth through a variety of compression technologies
  • Heterogeneous ML compute: Optimized for use with Arm Cortex-A CPUs and Arm Mali GPUs
  • Enabled by open-source software: Supported by Arm NN to reduce cost and avoid lock-in (a usage sketch follows this list)
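To make that framework integration concrete, here is a minimal sketch of running a TensorFlow Lite model through the Arm NN delegate in Python. The model file, delegate library path and backend list ("EthosNAcc" for an Ethos-N NPU, "CpuAcc" for the CPU) are assumptions to adapt for your platform: operators the delegate supports run on the listed backends in priority order, and anything else falls back to the stock TFLite CPU kernels.

# Assumed paths and backend names; adjust for your platform.
import numpy as np
import tflite_runtime.interpreter as tflite

# Ops supported by the delegate run on the listed backends in
# priority order (NPU first, then CPU); unsupported ops fall back
# to the default TFLite CPU kernels automatically.
armnn_delegate = tflite.load_delegate(
    library="libarmnnDelegate.so",
    options={"backends": "EthosNAcc,CpuAcc", "logging-severity": "info"})

interpreter = tflite.Interpreter(
    model_path="mobilenet_v2.tflite",            # hypothetical model file
    experimental_delegates=[armnn_delegate])
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"],
                       np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])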

Futureproof and Versatile

To make life easy for developers, the Ethos-N77 has an integrated network control unit and a DMA engine, which manage the overall execution and traversal of the network and move data in and out of main memory in the background.
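The value of that background DMA is classic double buffering: while the compute engines work on one tile, the next is already in flight, so memory latency hides behind useful work. A conceptual Python sketch (illustration only, not Ethos-N firmware; the tile functions are made up):

from concurrent.futures import ThreadPoolExecutor

def dma_fetch(tile_id):              # stand-in for a background DMA transfer
    return f"tile-{tile_id}"

def compute(tile):                   # stand-in for the compute engines
    return f"processed {tile}"

with ThreadPoolExecutor(max_workers=1) as dma:
    pending = dma.submit(dma_fetch, 0)        # prefetch the first tile
    for i in range(1, 4):
        tile = pending.result()               # wait only if the DMA is behind
        pending = dma.submit(dma_fetch, i)    # start the next transfer...
        print(compute(tile))                  # ...while computing this one
    print(compute(pending.result()))          # drain the last tile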

Onboard memory allows central storage for weights and feature maps, reducing the traffic to external memory and so increasing battery life – another nod to the superlative user experience that consumers have come to expect as standard.

Crucially, the Ethos-N77 is flexible enough to support use cases with higher requirements, running more – and larger – concurrent features: up to eight cores can be configured in a single cluster, achieving 32 TOP/s of performance (eight cores at 4 TOP/s each), or up to 64 NPUs in a mesh configuration.

Ultimately, the Ethos-N77 boosts performance, drives efficiency, reduces network deployment costs and – through tight coupling of fixed-function and programmable engines – futureproofs the design, allowing firmware to be updated as new features are developed.

Through this combination of power, efficiency and flexibility, the Ethos-N77 is defining the future of ML inference at the edge, empowering developers to meet the requirements of tomorrow’s use cases whilst creating today’s optimal user experience.

Learn more about Ethos-N processors
