Machine Learning Silicon Isn’t One Size Fits All

March 1, 2018


These days, just about everyone in the technology industry is talking about Artificial Intelligence (AI) and Machine Learning (ML). There’s a huge amount of excitement and a rush to be the first to get it right. What you might have noticed in this dialogue is that almost everyone is talking about big, powerful neural network accelerators as an essential part of bringing ML to life on your device – and whilst it’s true that they have a significant role to play, they’re just one part of the story.

Early ML was performed in the cloud with very large data sets, making significant processing power absolutely essential, but today – particularly in the mobile and smart device sectors – the focus is shifting to what can be achieved at the edge.

There are a number of reasons for this shift, not least latency, reliability and responsiveness – factors that are of considerable importance to the consumer. Edge compute can provide the kind of always-on, always-available usability that we’ve come to expect from our devices, while removing the need to go back and forth to the cloud significantly reduces both latency and bandwidth use. Security – a high-profile topic in the industry at the moment – is another excellent justification for performing ML in the palm of your hand, rather than sending your data back and forth across the ether, with all the increased potential for security breaches that implies.

Achieving ML at the Edge

So, if ML at the edge is your goal, how can you make it happen? A System on Chip (SoC) contains multiple processors, each of which is suited to a range of activities. People often ask which is best for running ML, but the simple answer is… it depends on what you’re trying to achieve. There’s a spectrum of compute, with varying power and area costs, and different combinations of IP can achieve the same results, so the processor you choose to perform these tasks comes down to the trade-offs you’re prepared to make.

The current trend towards ever-smaller workloads, for example, makes highly area-efficient Cortex-M processors ideal for simple tasks like voice activation. Speech processing requires more processing power, so you might choose a slightly larger CPU to handle it, and image processing requires more still, for which the wide execution engines of a GPU might be most appropriate.
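As a rough illustration of the workload-to-processor pairing described above, the sketch below encodes the three examples as a simple lookup. The dictionary, the workload keys and the function name are all hypothetical, and the pairings are only the starting points the text suggests, not fixed rules: a real choice would also weigh power, area and the other trade-offs mentioned earlier.

```python
# Illustrative only: pairings taken from the paragraph above.
# Keys and labels are hypothetical names chosen for this sketch.
SUGGESTED_PROCESSOR = {
    "voice_activation": "Cortex-M CPU",   # simple, always-on task
    "speech_processing": "larger CPU",    # needs more processing power
    "image_processing": "GPU",            # benefits from wide execution engines
}


def suggest_processor(workload: str) -> str:
    """Return the processor class the text suggests as a starting point."""
    try:
        return SUGGESTED_PROCESSOR[workload]
    except KeyError:
        raise ValueError(f"no suggestion for workload: {workload!r}") from None


if __name__ == "__main__":
    for name in SUGGESTED_PROCESSOR:
        print(f"{name}: {suggest_processor(name)}")
```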

However, the launch of Project Trillium, Arm’s ML platform, brings with it a further, exciting proposition that enables a new era of ultra-efficient inference at the edge. The platform – which includes the Arm ML processor, along with Arm NN open-source software – provides a new class of highly scalable processors that have been specifically designed for machine learning and neural network workloads.

[Figure: Arm ML system story]

The ML processor is optimized for the mobile and smart camera markets. Designed from the ground up, it offers the highest performance per mm² available today, typically over 4 tera operations per second (TOPs), with additional optimization providing a further uplift of 2x to 4x in real-world use cases. It’s also extremely energy efficient, providing 5 TOPs per watt – a factor that’s hugely important for mobile and its thermal- and cost-constrained environments.
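To get a feel for what those two headline figures mean together, the back-of-the-envelope sketch below divides throughput by efficiency to obtain an implied power draw. It assumes, purely for illustration, that the 4 TOPs and 5 TOPs per watt figures apply at the same operating point, which the text does not state; the function name is hypothetical.

```python
def implied_power_watts(throughput_tops: float, efficiency_tops_per_watt: float) -> float:
    """Power in watts implied by a throughput (TOPs) and an efficiency (TOPs/W)."""
    if efficiency_tops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return throughput_tops / efficiency_tops_per_watt


if __name__ == "__main__":
    # Headline figures quoted above; pairing them is an assumption of this sketch.
    power = implied_power_watts(4.0, 5.0)
    print(f"{power:.2f} W")  # prints "0.80 W"
```

Under that assumption the result is under one watt, which is consistent with the emphasis on thermally constrained mobile devices.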

Flexible, scalable, futureproof

To tackle the challenges of multiple markets with a range of performance requirements, this processor is based on a new, highly scalable architecture. Future derivatives of this architecture will meet an enormous range of performance requirements, scaling as low as 2 GOPs for IoT and always-on devices to over 150 TOPs for server-type applications.

In fact, the Project Trillium architecture is the only complete, heterogeneous compute platform for ML. And the beauty of it is that it’s compatible with existing Arm IP, so you can now select a comprehensive Arm ML solution that’s tailored to your requirements, from Arm Cortex-M processors for smart, connected embedded applications to Arm Mali-G72 for demanding on-device use cases, or the ML processor itself. This flexibility to address all use cases is unique to Arm.

[Figure: Project Trillium Arm ML diagram]

AI, powered by ML, is well on its way to becoming the biggest disruptor the tech industry – and, indeed, the world – has seen for decades, and has been impacting the way we design all of our products for some time. Cortex-A processors have been gaining support for ML workloads over the last few iterations, notably with last year’s launch of the DynamIQ flexible architecture. Mali, too, is seeing great improvements in ML capability across the tiers, from mainstream to premium GPUs, with the ML-optimized Mali-G72 GPU winning Linley’s award for Processor of the Year 2017.

So, whether your focus is end usability, silicon cost, or integration effort, there is an Arm processor, or a combination of them, for any ML workload.

Learn more about Project Trillium
