Arm ML Processor: Powering Machine Learning at the Edge

Ian Forsyth
February 13, 2018
3 minute read time.

It would be really amazing to have a personal assistant in my hands that is actually smart, truly understands my words and responds intelligently to resolve day-to-day tasks. The recent advancements in Machine Learning (ML) make me optimistic that such a day is not too far away. ML has quickly moved from identifying cat pictures to solving real-world problems well beyond the mobile market, in areas such as healthcare, retail, automotive and servers.

Now, the major challenge is to move this power to the edge and solve the privacy, security, bandwidth and latency issues that exist today. The Arm ML processor is a huge step in this direction.

Mobile Performance

The ML processor is a brand-new design for mobile and adjacent markets – such as smart cameras, AR/VR, drones, medical and consumer electronics – delivering 4.6 TOPs of performance at an efficiency of 3 TOPs/W. Further compute and memory optimizations lead to significant additional performance gains across different networks.
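
As a quick back-of-the-envelope check on those headline figures, 4.6 TOPs at 3 TOPs/W implies a sustained power budget of roughly 1.5 W, the kind of number a mobile SoC can accommodate. The tiny Python snippet below just makes that arithmetic explicit.

    # Implied power budget from the headline figures quoted above:
    # sustaining 4.6 TOPs at an efficiency of 3 TOPs/W needs ~1.5 W.
    peak_tops = 4.6
    efficiency_tops_per_watt = 3.0
    print(f"~{peak_tops / efficiency_tops_per_watt:.2f} W")  # ~1.53 W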

The architecture consists of fixed-function engines, which execute convolution layers, and programmable layer engines, which execute non-convolution layers and implement selected primitives and operators. The network control unit manages the overall execution and traversal of the network, while the DMA engine moves data in and out of main memory. On-board memory provides central storage for weights and feature maps, reducing traffic to external memory and, therefore, power.

Figure: Arm ML processor
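
To make the division of labour between the two engine types more concrete, here is a purely illustrative Python sketch of how a network's layers might be partitioned; it is not Arm's scheduler or any real API, and every name in it is hypothetical. Convolution layers are routed to the fixed-function engines, and everything else to the programmable layer engines.

    # Illustrative sketch only (not Arm's scheduler or API): partition a
    # network's layers between fixed-function convolution engines and
    # programmable layer engines, as described above.
    CONV_OPS = {"conv2d", "depthwise_conv2d"}

    def partition_layers(layers):
        """Split an ordered list of layer descriptors into two execution queues."""
        fixed_function, programmable = [], []
        for layer in layers:
            if layer["op"] in CONV_OPS:
                fixed_function.append(layer)   # convolutions -> fixed-function engines
            else:
                programmable.append(layer)     # activations, pooling, etc. -> programmable engines
        return fixed_function, programmable

    # Example: a tiny MobileNet-style fragment
    network = [
        {"op": "conv2d", "name": "conv1"},
        {"op": "relu", "name": "relu1"},
        {"op": "depthwise_conv2d", "name": "dwconv1"},
        {"op": "avg_pool", "name": "pool1"},
    ]
    ff, pl = partition_layers(network)
    print([l["name"] for l in ff])  # ['conv1', 'dwconv1']
    print([l["name"] for l in pl])  # ['relu1', 'pool1']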

Thanks to the combination of fixed-function and programmable engines, the ML processor is extremely powerful, incredibly efficient and flexible enough to adapt to your future challenges, providing raw performance along with the versatility to execute different neural networks effectively.

Key Features

  • Massive efficiency uplift over CPUs, GPUs, DSPs and accelerators
  • Enabled by open-source software, so there’s no lock-in
  • Closely integrated with existing software frameworks: TensorFlow, TensorFlow Lite, Caffe, Caffe 2
  • Optimized for use with Arm Cortex CPUs and Arm Mali GPUs

Arm ML architecture: flexible, scalable, future-proof

Figure: Trillium flexible, scalable architecture

To tackle the challenges of multiple markets, with a wide range of performance requirements – from a few GOPs for IoT to tens of TOPs for servers – the ML processor is based on a new, scalable architecture.

The architecture can be scaled down to approximately 2 GOPs of performance for IoT or embedded-level applications, or scaled up to 150 TOPs for ADAS, 5G, or server-type applications. These configurations can achieve many times the efficiency of existing solutions.

The architecture is compatible with existing Arm CPUs, GPUs and other IP, providing a complete, heterogeneous system, and will also be accessible through popular ML frameworks such as TensorFlow, TensorFlow Lite, Caffe and Caffe 2.
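
To give a sense of what that framework-level access looks like from a developer's point of view, the snippet below runs a converted model with the standard TensorFlow Lite Python interpreter. This is ordinary TensorFlow Lite code rather than anything Arm-specific, and "model.tflite" is a placeholder; on Arm platforms, software such as Arm NN and the Compute Library sits underneath frameworks like this.

    # Minimal TensorFlow Lite inference loop; "model.tflite" is a placeholder
    # for any network converted to the TensorFlow Lite format.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Feed a dummy input with the shape and dtype the model expects.
    dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], dummy)

    interpreter.invoke()
    result = interpreter.get_tensor(output_details[0]["index"])
    print(result.shape)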

As more and more workloads move to ML, compute requirements will take on a wide variety of forms. Many ML use cases already run on Arm, with our enhanced CPUs and GPUs providing a range of performance and efficiency levels. With the introduction of the Arm Machine Learning platform, we aim to extend that range, providing a heterogeneous environment with the choice and flexibility required to meet each and every use case, enabling intelligent systems at the edge… and perhaps even the personal assistant I dream of.

Useful links

  • Arm NN SDK
  • Machine Learning on Arm - Frameworks Supporting Arm IP
  • Arm ML Processor
  • Compute Library
  • Cortex Microcontroller Software Interface Standard (CMSIS)
