Arm ML Processor: Powering Machine Learning at the Edge

Ian Forsyth
February 13, 2018
3 minute read time.

It would be amazing to have a personal assistant in my hands that is actually smart: one that truly understands my words and responds intelligently to handle day-to-day tasks. Recent advances in Machine Learning (ML) make me optimistic that such a day is not far away. ML has quickly moved from identifying cat pictures to solving real-world problems well beyond the mobile market, in areas such as healthcare, retail, automotive and servers.

Now, the major challenge is to move this power to the edge and solve the privacy, security, bandwidth and latency issues that exist today. The Arm ML processor is a huge step in this direction.

Mobile Performance

The ML processor is a brand-new design for mobile and adjacent markets – such as smart cameras, AR/VR, drones, medical and consumer electronics – offering 4.6 TOP/s of performance at an efficiency of 3 TOP/s per watt. Further compute and memory optimizations deliver significant additional performance gains across different networks.
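As a quick sanity check on those headline numbers, the back-of-the-envelope calculation below divides the quoted peak throughput by the quoted efficiency to get the implied power at peak. It uses only the figures stated above; it is not a measured value.

```python
# Back-of-the-envelope power budget implied by the quoted peak figures.
# These are the headline numbers from the post, not measurements.
peak_throughput_tops = 4.6   # TOP/s (trillions of operations per second)
efficiency_tops_per_w = 3.0  # TOP/s per watt

implied_power_w = peak_throughput_tops / efficiency_tops_per_w
print(f"Implied power at peak: ~{implied_power_w:.1f} W")  # ~1.5 W
```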

The architecture consists of fixed-function engines, which execute convolution layers, and programmable layer engines, which execute non-convolution layers and implement selected primitives and operators. A network control unit manages the overall execution and traversal of the network, while a DMA engine moves data in and out of main memory. On-board memory provides central storage for weights and feature maps, reducing traffic to external memory and, therefore, power. A rough sketch of this split between the two engine classes is shown below.
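Purely as an illustration of that division of work (and not the processor's actual scheduling logic, which lives in the network control unit and driver stack), the toy partitioner below sorts layer types into the two engine classes described above; the layer names and the partitioning rule are assumptions for the sake of the example.

```python
# Illustrative only: a toy partitioner mirroring the split the post describes.
# The real scheduling is handled by the network control unit and driver stack;
# the layer names and rule below are assumptions, not product behavior.

FIXED_FUNCTION_LAYERS = {"conv2d", "depthwise_conv2d"}  # convolution layers

def partition_layers(layers):
    """Split a list of layer-type names between the fixed-function
    convolution engines and the programmable layer engines."""
    plan = {"fixed_function": [], "programmable": []}
    for layer in layers:
        target = "fixed_function" if layer in FIXED_FUNCTION_LAYERS else "programmable"
        plan[target].append(layer)
    return plan

network = ["conv2d", "relu", "depthwise_conv2d", "pooling", "softmax"]
print(partition_layers(network))
# {'fixed_function': ['conv2d', 'depthwise_conv2d'],
#  'programmable': ['relu', 'pooling', 'softmax']}
```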

[Figure: Arm ML processor]

Thanks to the combination of fixed-function and programmable engines, the ML processor pairs raw performance and efficiency with the flexibility to execute a wide range of neural networks effectively, now and as your workloads evolve.

Key Features

  • Massive efficiency uplift over CPUs, GPUs, DSPs and existing accelerators
  • Enabled by open-source software, so there’s no lock-in
  • Closely integrated with existing software frameworks: TensorFlow, TensorFlow Lite, Caffe and Caffe 2
  • Optimized for use with Arm Cortex CPUs and Arm Mali GPUs

Arm ML architecture: flexible, scalable, future-proof

[Figure: Trillium flexible, scalable architecture]

To tackle the challenges of multiple markets with a wide range of performance requirements – from a few GOP/s for IoT to tens of TOP/s for servers – the ML processor is based on a new, scalable architecture.

The architecture can scale down to approximately 2 GOP/s for IoT or embedded applications, or scale up to 150 TOP/s for ADAS, 5G or server-class applications. These configurations can achieve many times the efficiency of existing solutions.

The architecture is compatible with existing Arm CPUs, GPUs and other IP, enabling a complete, heterogeneous system, and it will also be accessible through popular ML frameworks such as TensorFlow, TensorFlow Lite, Caffe and Caffe 2; a sketch of what that framework-level access might look like follows below.
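As a rough illustration of framework-level access, the sketch below runs a TensorFlow Lite model and, if a vendor delegate library is present, offloads supported layers to it, falling back to the CPU otherwise. The model path and delegate library name are placeholders, not documented Arm artifacts.

```python
import numpy as np
import tensorflow as tf

# Placeholder names: neither file is a documented Arm artifact.
MODEL_PATH = "mobilenet_v2.tflite"
DELEGATE_LIB = "libaccelerator_delegate.so"

try:
    # If a vendor NPU delegate is available, TFLite can offload supported
    # layers to it; unsupported layers fall back to the CPU.
    delegate = tf.lite.experimental.load_delegate(DELEGATE_LIB)
    interpreter = tf.lite.Interpreter(model_path=MODEL_PATH,
                                      experimental_delegates=[delegate])
except (ValueError, OSError):
    # No delegate found: run entirely on the CPU.
    interpreter = tf.lite.Interpreter(model_path=MODEL_PATH)

interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed a dummy input of the right shape and dtype, then run inference.
interpreter.set_tensor(inp["index"],
                       np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```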

As more and more workloads move to ML, compute requirements will take a wide variety of forms. Many ML use cases already run on Arm, with our enhanced CPUs and GPUs providing a range of performance and efficiency levels. With the introduction of the Arm Machine Learning platform, we aim to extend that range, providing a heterogeneous environment with the choice and flexibility to meet every use case, enabling intelligent systems at the edge… and perhaps even the personal assistant I dream of.

Useful links

  • Arm NN SDK
  • Machine Learning on Arm - Frameworks Supporting Arm IP
  • Arm ML Processor
  • Compute Library
  • Cortex Microcontroller Software Interface Standard (CMSIS)
