Arm support for Android NNAPI gives >4x performance boost

Robert Elliott
January 29, 2018

The launch of Arm support for the Android Neural Networks API (NNAPI) sees the release of open-source, optimized neural network operators that deliver significant performance uplift across CPUs and GPUs.

Back in May at Google I/O, we heard the first public news about TensorFlow Lite for Android. This was the first exciting hint of a major new API that would affect the deployment of neural networks on Arm-based platforms supporting Android.

Inference engines are nothing new, but the big change with the announcement of NNAPI is standardized support within Android and the ability to target the wide array of accelerators available from the Arm ecosystem, such as the Arm Mali GPU.

At Arm, we fully support this development and will be releasing support for our Arm Cortex-A CPUs and Mali GPUs from day one. This follows on from our other efforts to improve the performance of machine learning applications on Arm platforms: the release of the Compute Library at the beginning of the year, and our ongoing engagement with the community of companies and developers that is standardizing approaches and sharing developments in the open.

A tricky problem

The way neural network inference is supported at the high level is deceptively simple. First, a model representing a neural network and its associated weights is provided by the application or ML framework (such as TensorFlow Lite). Then, the Android NN Runtime performs scheduling to determine how the graph should be run – on CPU or any devices that have been registered to support neural network computation. After this, the selected device – often the CPU or GPU, and sometimes another accelerator – will be given the model to run. Finally, the device will break the workload down into key operations, and run the inference process on the model, producing the result that will be used by the application. 
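To make that flow concrete, here is a minimal sketch of the application side using the NNAPI C API from the Android NDK (API level 27 and later). The single ADD operation, the tensor shapes, and the values are illustrative choices for this post, not a real workload, and error checking is omitted for brevity.

```cpp
// Build, compile, and run a one-operation NNAPI model (element-wise ADD).
// Shapes, values, and the choice of operation are illustrative only.
#include <android/NeuralNetworks.h>

bool RunAddExample() {
    ANeuralNetworksModel* model = nullptr;
    ANeuralNetworksModel_create(&model);

    // Two fp32 input tensors and one output tensor, each of shape [2, 2].
    uint32_t dims[2] = {2, 2};
    ANeuralNetworksOperandType tensor{};
    tensor.type = ANEURALNETWORKS_TENSOR_FLOAT32;
    tensor.dimensionCount = 2;
    tensor.dimensions = dims;

    // Scalar operand selecting the fused activation (none, in this case).
    ANeuralNetworksOperandType activation{};
    activation.type = ANEURALNETWORKS_INT32;

    ANeuralNetworksModel_addOperand(model, &tensor);      // 0: input A
    ANeuralNetworksModel_addOperand(model, &tensor);      // 1: input B
    ANeuralNetworksModel_addOperand(model, &activation);  // 2: activation
    ANeuralNetworksModel_addOperand(model, &tensor);      // 3: output

    int32_t fuse = ANEURALNETWORKS_FUSED_NONE;
    ANeuralNetworksModel_setOperandValue(model, 2, &fuse, sizeof(fuse));

    uint32_t opInputs[3] = {0, 1, 2};
    uint32_t opOutputs[1] = {3};
    ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_ADD,
                                      3, opInputs, 1, opOutputs);

    uint32_t modelInputs[2] = {0, 1};
    uint32_t modelOutputs[1] = {3};
    ANeuralNetworksModel_identifyInputsAndOutputs(model, 2, modelInputs,
                                                  1, modelOutputs);
    ANeuralNetworksModel_finish(model);

    // The NN runtime decides where this runs: the CPU, a Mali GPU, or
    // another registered accelerator.
    ANeuralNetworksCompilation* compilation = nullptr;
    ANeuralNetworksCompilation_create(model, &compilation);
    ANeuralNetworksCompilation_finish(compilation);

    float a[4] = {1, 2, 3, 4};
    float b[4] = {5, 6, 7, 8};
    float out[4] = {};
    ANeuralNetworksExecution* execution = nullptr;
    ANeuralNetworksExecution_create(compilation, &execution);
    ANeuralNetworksExecution_setInput(execution, 0, nullptr, a, sizeof(a));
    ANeuralNetworksExecution_setInput(execution, 1, nullptr, b, sizeof(b));
    ANeuralNetworksExecution_setOutput(execution, 0, nullptr, out, sizeof(out));

    ANeuralNetworksEvent* event = nullptr;
    ANeuralNetworksExecution_startCompute(execution, &event);
    ANeuralNetworksEvent_wait(event);  // out now holds {6, 8, 10, 12}

    ANeuralNetworksEvent_free(event);
    ANeuralNetworksExecution_free(execution);
    ANeuralNetworksCompilation_free(compilation);
    ANeuralNetworksModel_free(model);
    return true;
}
```

Note that the application never names a device: the scheduling step between ANeuralNetworksModel_finish and ANeuralNetworksCompilation_finish is where the runtime picks the backend.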

[Figure: An overview of Arm's support for Google's NN API]

This may appear simple, but our software teams have put in a lot of work to make each stage run well – particularly the HAL and driver support for Mali GPUs, and the heavily optimized operators that run on both the CPU and GPU. These have been carefully tuned by Arm, and sit at the heart of Google's CPU backend for Android NN, as well as the Mali GPU routines provided through our GPU implementation of the Android NN HAL.

The key operators needed for convolutional neural networks are supported, ready to speed up existing applications and open up the possibility for new ones to be deployed. Fortunately, we’ve been building these software components for a long time, so when this new API became available, we were ready.

Heavily optimized

Since the announcement, lots of hard work has been happening at both Arm and Google to make sure that high-performance neural network inference is easy to achieve on Arm platforms. This has culminated in the release of optimized CPU operators for Cortex-A, integrated into Google's framework, and for Arm Mali GPUs, along with an inference engine to run them. What's more, these operators are released as open source and available as part of the Compute Library.
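To give a flavour of the Compute Library API these operators ship in, here is a minimal sketch that configures and runs a NEON convolution layer on the CPU. The tensor shapes are invented for the example, data filling and error handling are omitted, and the exact API may vary between library versions.

```cpp
// Configure and run a Compute Library NEON convolution.
// Shapes are illustrative; real code would load actual data and weights.
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/core/Types.h"

using namespace arm_compute;

int main() {
    Tensor src, weights, biases, dst;

    // Illustrative shapes: a 224x224 RGB input, 16 3x3 kernels, fp32.
    src.allocator()->init(TensorInfo(TensorShape(224U, 224U, 3U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(3U, 3U, 3U, 16U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(16U), 1, DataType::F32));
    dst.allocator()->init(TensorInfo(TensorShape(224U, 224U, 16U), 1, DataType::F32));

    // Stride 1 in both dimensions, 1 pixel of padding ("same" output size).
    NEConvolutionLayer conv;
    conv.configure(&src, &weights, &biases, &dst, PadStrideInfo(1, 1, 1, 1));

    // Allocate backing memory once the configuration is known.
    src.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    dst.allocator()->allocate();

    // ... fill src, weights, and biases with real data here ...

    conv.run();  // executes the optimized CPU kernels
    return 0;
}
```

The same configure/allocate/run pattern applies to the CL-prefixed functions that target Mali GPUs through OpenCL.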

Arm already provides support for 32-bit floating point, and this NNAPI release improves that support to speed up neural network computation by three times. We're also working to support 8-bit integer operations, which will deliver around four times the fp32 performance on the Mali GPUs already deployed in most mobile devices.

Additionally, there's ongoing work to add support for further Arm CPUs and GPUs as they are released. For example, Cortex-A55 and Cortex-A75 are beginning to appear in products, and we'll unlock the power of the new Armv8.2-A architecture to give a 4x performance boost to 8-bit convolution and matrix multiplication.
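The Armv8.2-A feature behind that 4x figure is, we assume here, the optional dot-product extension (the SDOT/UDOT instructions), which folds four 8-bit multiply-accumulates into each 32-bit lane. The sketch below uses the corresponding NEON intrinsic and is purely illustrative; it needs a compiler and core with the dot-product extension.

```cpp
// Compile with, e.g., -march=armv8.2-a+dotprod on an AArch64 toolchain.
#include <arm_neon.h>
#include <cstdint>

// Dot product of two uint8 vectors of length n (a multiple of 16).
// Each UDOT performs 16 8-bit multiply-accumulates in one instruction,
// versus 4 fp32 lanes for a 128-bit floating-point FMLA: the 4x boost.
uint32_t dot_u8(const uint8_t* a, const uint8_t* b, int n) {
    uint32x4_t acc = vdupq_n_u32(0);
    for (int i = 0; i < n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        acc = vdotq_u32(acc, va, vb);  // 4 u8*u8 products per u32 lane
    }
    return vaddvq_u32(acc);  // horizontal sum of the four accumulators
}
```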

All this is great news for anyone wanting to deploy convolutional neural networks on Arm, as these invariably quantize down to 8-bit with nearly the same accuracy as running in 32-bit, but at notably higher performance.
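As background on why 8-bit works so well, NNAPI and TensorFlow Lite use an asymmetric quantization scheme in which a real value is represented as real = scale * (q - zero_point). The helpers below are a minimal sketch of that mapping, not code from our release, with the range fixed to an unsigned 8-bit type.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Asymmetric 8-bit quantization, as used by NNAPI's TENSOR_QUANT8_ASYMM:
//   real_value = scale * (quantized_value - zero_point)
uint8_t quantize(float real, float scale, int32_t zero_point) {
    int32_t q = static_cast<int32_t>(std::round(real / scale)) + zero_point;
    return static_cast<uint8_t>(std::min(255, std::max(0, q)));  // clamp to u8
}

float dequantize(uint8_t q, float scale, int32_t zero_point) {
    return scale * (static_cast<int32_t>(q) - zero_point);
}
```

Moving from 4-byte floats to 1-byte integers also quarters the memory traffic per value, which is where the bandwidth benefit below comes from.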

On top of this, quantization's reduced bandwidth demands and friendlier memory-subsystem behavior deliver even better performance, whichever Arm platform you choose.

Where next?

With the Android 8.1 release, smartphones immediately gain from the performance improvements made to the CPU routines and, on platforms with a Mali GPU, work is automatically offloaded for even higher performance. This is an area we continue to optimize, so expect even better performance in future.

We've been working closely with a number of our partners to make this available on their platforms, so you'll soon see devices from Huawei and MediaTek supporting NNAPI with Arm CPU and GPU acceleration.

To keep track of developments on machine learning and how to run it on Arm-powered devices, keep an eye on this blog. Expect more on Arm’s machine learning platform and our support for Android soon!


Comments

Fleet Lee, over 3 years ago:
It's very useful for me, thanks very much!

Anton Lokhmotov, over 4 years ago:
Glad to hear that our Collective Knowledge-enabled optimisation has been useful! Let's keep the collaboration going.