Arm NN: the Easy Way to Deploy Edge ML

January 4, 2019

Machine learning (ML) is no longer the new kid on the block. Almost all of us are familiar with personal assistants, connected homes and a seemingly limitless torrent of gadgets that can improve our lives, as long as we have a data connection. But in the quest for a more reliable, responsive, secure and cost-effective user experience, the technology that was born in the cloud is now moving to the edge, directly into the smart devices that play an ever more important role in our daily lives.

It’s a logical step that brings wide-reaching benefits. Shifting large amounts of data to the cloud for processing can, for example, produce a noticeable lag that harms time-critical applications. On-device processing avoids this delay and removes reliance on a data connection. Furthermore, shifting and storing data is hugely expensive, consumes an enormous amount of power and jeopardizes the security of the data, and building more and more data centers isn’t a viable answer, not least because of the impact on carbon emissions. Keeping as much processing as possible on-device mitigates both cost and risk.

Of course, cloud-based ML still has an important role: simply because of the hefty power and bandwidth requirements, a significant amount of neural network (NN) training will, most likely, continue to happen in the cloud. However, as Amazon said in a recent press release, “While training rightfully receives a lot of attention, inference actually accounts for the majority of the cost and complexity for running machine learning in production (for every dollar spent on training, nine are spent on inference).”

Given the size and importance of the inference market, it’s prudent to leverage the benefits of edge ML to get your full nine dollars’ worth.

Cloud vs Edge: What’s the Difference?

Naturally, running efficient ML on edge devices introduces new challenges, primarily because the parameters are so different to those of cloud compute.

ML in the cloud

Typically applied to a limited number of focused, vertical applications
Targets a small range of processors
Plenty of available power and bandwidth
Large equipment budget

ML on edge devices

Potentially applied to a wide and diverse range of applications
Many possible processor targets, from CPUs and GPUs to NPUs, DSPs and other forms of dedicated accelerator
Numerous – often proprietary – application programming interfaces (APIs)
Devices are relatively low-cost, and operate in thermally and power-constrained environments

Multiple ML Use Cases on Edge Devices

Of course, these differences have a considerable impact on software requirements. While the scale of NNs varies from the cloud to the edge, developers in either scenario have a common goal: to run NNs developed in high-level frameworks, particularly the most popular ones, such as Google’s TensorFlow and Berkeley’s Caffe¹.

Developers targeting high-power CPUs and GPUs in the cloud will use hardware-specific software libraries to translate and run these high-level frameworks. But the numerous APIs edge developers are faced with make it difficult to create performance-portable, platform-agnostic software. What’s really needed is an easy way to target a wide range of processor types.

Enter Arm NN.

Arm NN: Easily Target Multiple Processors

Arm NN is an open-source, common software framework that bridges the gap between the NN frameworks edge developers want to use and the underlying processors on their platform. As a common interface for all hardware types, it allows developers to move NN workloads around an SoC easily and efficiently, reducing the need for processor-specific optimization and facilitating software portability.
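To make the idea of a common interface over several processor types concrete, here is a minimal sketch in plain Python. It is not the Arm NN API: the backend names, the `SUPPORTED_OPS` table and the `assign_backends` helper are all hypothetical, and the capability sets are invented for illustration. It assumes a simple policy in which each layer is placed on the first backend in a preference list that supports its operator.

```python
from dataclasses import dataclass

# Hypothetical capability table (an assumption for illustration only);
# real operator support depends on the device and the library version.
SUPPORTED_OPS = {
    "npu": {"conv2d", "relu"},
    "gpu": {"conv2d", "relu", "pooling"},
    "cpu": {"conv2d", "relu", "pooling", "softmax"},
}


@dataclass(frozen=True)
class Layer:
    name: str
    op: str


def assign_backends(layers, preferences):
    """Map each layer to the first preferred backend that supports its operator."""
    placement = {}
    for layer in layers:
        for backend in preferences:
            if layer.op in SUPPORTED_OPS.get(backend, set()):
                placement[layer.name] = backend
                break
        else:
            # No backend in the preference list can run this operator.
            raise ValueError(f"no backend in {preferences} supports {layer.op!r}")
    return placement


network = [
    Layer("conv1", "conv2d"),
    Layer("act1", "relu"),
    Layer("pool1", "pooling"),
    Layer("prob", "softmax"),
]

placement = assign_backends(network, ["npu", "gpu", "cpu"])
print(placement)
# {'conv1': 'npu', 'act1': 'npu', 'pool1': 'gpu', 'prob': 'cpu'}
```

Changing only the preference list, for example to `["cpu"]`, moves the whole network to a different processor without touching the network description, which is the portability benefit described above.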

Crucially, Arm NN does not require developers to move to different high-level frameworks or tools. They can, for example, continue to use TensorFlow, while Arm NN provides the tools to translate the graphs into a common, centralized format.
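The translation step can be pictured with a toy example. The sketch below is an illustration under stated assumptions, not Arm NN's parser code: the `TO_COMMON` table, the common operator names and the `to_common_graph` helper are hypothetical, and a graph is reduced to a flat list of operator names. It shows two framework-specific descriptions of the same small network mapping to one shared representation.

```python
# Hypothetical mapping from framework-specific operator names to a shared
# vocabulary. The common names on the right are invented for this sketch.
TO_COMMON = {
    "tensorflow": {
        "Conv2D": "convolution",
        "Relu": "activation",
        "MaxPool": "pooling",
        "Softmax": "softmax",
    },
    "caffe": {
        "Convolution": "convolution",
        "ReLU": "activation",
        "Pooling": "pooling",
        "Softmax": "softmax",
    },
}


def to_common_graph(framework, ops):
    """Translate a list of framework operator names into the common format."""
    table = TO_COMMON[framework]
    unknown = [op for op in ops if op not in table]
    if unknown:
        raise ValueError(f"unsupported {framework} operators: {unknown}")
    return [table[op] for op in ops]


tf_graph = ["Conv2D", "Relu", "MaxPool", "Softmax"]
caffe_graph = ["Convolution", "ReLU", "Pooling", "Softmax"]

common_from_tf = to_common_graph("tensorflow", tf_graph)
common_from_caffe = to_common_graph("caffe", caffe_graph)

# Both framework descriptions end up as the same common graph, so everything
# downstream of the translation step only has to understand one format.
print(common_from_tf == common_from_caffe)  # True
```

Once every framework is reduced to one format, optimization and hardware-specific work only needs to be done once, rather than once per framework.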

Up to 9.2x Faster

Along with a quarterly release schedule that continually adds features and improvements, Arm NN is consistently subjected to a huge amount of performance analysis, across a wide variety of networks running on a range of IP. This helps to prioritize optimization work and, as the chart below shows, brings about impressive improvements.

The example networks – just a few of the many that Arm measures – are shown with figures illustrating performance uplift over a period of just six months for an Arm big Cortex-A CPU, an Arm LITTLE Cortex-A CPU and an Arm Mali GPU. These improvements – up to 9.2x faster – come from software improvements only. Further improvements are expected as work continues: Arm is dedicated to enabling best-in-class performance across a range of existing IP, including CPUs, GPUs and NPUs, as well as new architectures as they become available.

NN performance improvements

These impressive figures have not gone unnoticed by the ecosystem: key players such as Google and Facebook have recognized the benefits of Arm ML software, from Arm NN to the Compute Library and CMSIS-NN, and have chosen to integrate these libraries into their own ML solutions.

"We're delighted to work with Arm on enabling high-performance ML across the breadth of Android devices. The Arm Compute Library provides excellent performance. We're looking forward to using it."

The Android Neural Networks team

"The TensorFlow team is excited to work with Arm and Linaro to expand support for edge devices, and we're looking forward to integrating with the Arm NN library."

Pete Warden, Google

Progress Through Collaboration

Earlier this year, Arm donated Arm NN to Linaro’s Machine Intelligence initiative to enable the wider industry to benefit from an open and optimal framework for ML at the edge, and allow third-party IP developers to add their own support to the Arm NN framework.

As more and more ML moves to the edge, this kind of collaboration, together with a standardized, open-source software approach, will become increasingly important. With the support of key players in the ecosystem, Arm will continue to invest significantly in Arm NN and its supporting libraries. To date, 100 man-years of effort have produced over 340,000 lines of code, and it’s estimated that Arm NN is already shipping in over 200 million Android devices. As the move to the edge extends to the very smallest microcontroller CPUs, the reach of Arm NN will grow to billions (and, eventually, trillions) of secure, connected devices.

By removing the need for custom code targeting specific accelerators, Arm NN allows developers to focus their efforts on the key differentiators that make their product unique. Both companies and individuals can contribute code, so if you have expertise to share, why not become part of this industry-changing movement?

Download Arm NN

¹ Deep Learning Framework Power Scores 2018, Jeff Hale, towardsdatascience.com, 20 September 2018
