Arm ML Processor: Powering Machine Learning at the Edge

Ian Forsyth
February 13, 2018
3 minute read time.

It would be really amazing to have a personal assistant in my hands that is actually smart, truly understands my words and responds intelligently to resolve day-to-day tasks. The recent advancements in Machine Learning (ML) make me optimistic that such a day is not too far away. ML has quickly moved from identifying cat pictures to solving real-world problems well beyond the mobile market, in areas such as healthcare, retail, automotive and servers.

Now, the major challenge is to move this power to the edge and solve the privacy, security, bandwidth and latency issues that exist today. The Arm ML processor is a huge step in this direction.

Mobile Performance

The ML processor is a brand-new design for the mobile and adjacent markets – such as smart cameras, AR/VR, drones, medical and consumer electronics – offering 4.6 TOP/s of performance at an efficiency of 3 TOP/s per watt. Further compute and memory optimizations deliver additional performance gains across a range of networks.
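
As a rough back-of-the-envelope check (my own arithmetic, not a figure from the announcement), those headline numbers imply a peak power budget for the NPU itself on the order of

\[
P \approx \frac{4.6\ \text{TOP/s}}{3\ \text{TOP/s per W}} \approx 1.5\ \text{W}
\]

which is the kind of envelope a mobile device can sustain.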

The architecture consists of fixed-function engines, which execute convolution layers, and programmable layer engines, which execute non-convolution layers and implement selected primitives and operators. A network control unit manages the overall execution and traversal of the network, while a DMA engine moves data in and out of main memory. On-board memory provides central storage for weights and feature maps, reducing traffic to external memory and, therefore, power.

Arm ML processor
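
To make that division of labour concrete, the minimal Python sketch below shows how a compiler or driver might partition a network between the two engine types. It is purely illustrative: every name in it is hypothetical and does not correspond to the Arm ML processor's actual driver or compiler interface.

```python
# Illustrative only: a toy model of how layers might be dispatched between
# the fixed-function convolution engines and the programmable layer engines.
# All names are hypothetical; this is not an Arm driver or compiler API.

CONV_OPS = {"conv2d", "depthwise_conv2d"}             # fixed-function engines
PROGRAMMABLE_OPS = {"pooling", "softmax", "reshape"}  # programmable layer engines

def partition_network(layers):
    """Assign each layer of a network to an engine type.

    `layers` is a list of dicts such as {"name": "conv1", "op": "conv2d"}.
    Returns a schedule of (engine, layer_name) pairs in execution order.
    """
    schedule = []
    for layer in layers:
        if layer["op"] in CONV_OPS:
            schedule.append(("fixed_function_engine", layer["name"]))
        elif layer["op"] in PROGRAMMABLE_OPS:
            schedule.append(("programmable_layer_engine", layer["name"]))
        else:
            # Operators the NPU cannot express would fall back to the CPU or GPU.
            schedule.append(("cpu_fallback", layer["name"]))
    return schedule

if __name__ == "__main__":
    toy_net = [
        {"name": "conv1", "op": "conv2d"},
        {"name": "pool1", "op": "pooling"},
        {"name": "conv2", "op": "conv2d"},
        {"name": "classifier", "op": "softmax"},
    ]
    for engine, name in partition_network(toy_net):
        print(f"{name:12s} -> {engine}")
```

In the real design, the network control unit sequences this kind of schedule, while the DMA engine stages the weights and feature maps through the on-board memory described above.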

Thanks to the combination of fixed-function and programmable engines, the ML processor is powerful, efficient and flexible enough to adapt to future challenges, providing raw performance along with the versatility to execute different neural networks effectively.

Key Features

  • Massive efficiency uplift over CPUs, GPUs, DSPs and accelerators
  • Enabled by open-source software, so there’s no lock-in
  • Closely integrated with existing software frameworks: TensorFlow, TensorFlow Lite, Caffe, Caffe 2
  • Optimized for use with Arm Cortex CPUs and Arm Mali GPUs

Arm ML architecture: flexible, scalable, future-proof

Trillium flexible scalable architecture

To tackle the challenges of multiple markets, with a wide range of performance requirements – from a few GOP/s for IoT to tens of TOP/s for servers – the ML processor is based on a new, scalable architecture.

The architecture can be scaled down to approximately 2 GOP/s for IoT or embedded applications, or scaled up to 150 TOP/s for ADAS, 5G or server-class applications. These configurations can achieve many times the efficiency of existing solutions.

The architecture is compatible with existing Arm CPUs, GPUs and other IP, enabling a complete, heterogeneous system, and it will also be accessible through popular ML frameworks such as TensorFlow, TensorFlow Lite, Caffe and Caffe 2.
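
As a concrete taste of that framework-level access, the hedged sketch below loads a TensorFlow Lite model and asks the runtime to offload supported operators to an external accelerator delegate. The delegate library name and its options are assumptions for illustration (modelled on the Arm NN TensorFlow Lite delegate); the exact library, option names and supported backends depend on the Arm NN SDK release for your platform.

```python
# Hedged sketch: running a TensorFlow Lite model through an accelerator delegate.
# The delegate library path and option names below are assumptions for
# illustration; consult the Arm NN SDK documentation for your platform.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

DELEGATE_LIB = "libarmnnDelegate.so"           # hypothetical library name
DELEGATE_OPTS = {"backends": "GpuAcc,CpuAcc"}  # hypothetical backend preference

delegate = load_delegate(DELEGATE_LIB, DELEGATE_OPTS)
interpreter = Interpreter(model_path="model.tflite",   # your converted model
                          experimental_delegates=[delegate])
interpreter.allocate_tensors()

# Feed a dummy input that matches the model's expected shape and dtype.
input_info = interpreter.get_input_details()[0]
dummy = np.zeros(input_info["shape"], dtype=input_info["dtype"])
interpreter.set_tensor(input_info["index"], dummy)

interpreter.invoke()

output_info = interpreter.get_output_details()[0]
print(interpreter.get_tensor(output_info["index"]).shape)
```

Operators the delegate cannot handle simply stay on the CPU path, which is what makes this kind of heterogeneous deployment practical.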

As more and more workloads move to ML, compute requirements will take on a wide variety of forms. Many ML use cases already run on Arm, with our enhanced CPUs and GPUs providing a range of performance and efficiency levels. With the introduction of the Arm Machine Learning platform, we aim to extend that range, providing a heterogeneous environment with the choice and flexibility required to meet every use case, enabling intelligent systems at the edge… and perhaps even the personal assistant I dream of.

Useful links

  • Arm NN SDK
  • Machine Learning on Arm - Frameworks Supporting Arm IP
  • Arm ML Processor
  • Compute Library
  • Cortex Microcontroller Software Interface Standard (CMSIS)
