Powering the Edge: How Will YOU Do ML?

Dylan Zika
July 30, 2019
4 minute read time.

The Arm ML processor, designed to deliver the highest throughput and most efficient processing for on-device inference, is based on a brand-new architecture. Arm's Dylan Zika explains how the development team set about defining requirements and building an ML powerhouse from the ground up.

You’ve got a great little edge device, and you’re keen to add machine learning capabilities to assist local decision-making. So, what do you do next?

Possibly the simplest course of action is to repurpose a CPU, GPU or DSP. A continual drive to improve performance and efficiency has seen the CPU evolve into a kind of mission control for ML, either single-handedly managing entire ML workloads or distributing selected tasks to specialized ML processors. GPUs offer significant performance but less flexibility, while DSPs are often cited as having an immature programming environment¹.

But where you need a high level of responsiveness or power efficiency, these processors may struggle to meet requirements, and a dedicated neural processing unit (NPU) – such as the Arm ML processor – may be the most appropriate IP to integrate into your heterogeneous solution.
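
To make this pattern concrete, here's a minimal sketch of the delegation idea: the same TensorFlow Lite model runs on the CPU by default, and an accelerator delegate routes supported operations to dedicated hardware. The model path and the delegate library name are illustrative assumptions, not details from this post; check your platform's documentation for the actual delegate to load.

```python
import numpy as np
import tensorflow as tf

# Default path: run inference on the CPU ("repurpose a CPU").
# "model.tflite" is a placeholder path.
interpreter = tf.lite.Interpreter(model_path="model.tflite")

# Accelerated path: hand supported ops to an NPU via an external delegate.
# The library name below is an assumption (Arm NN ships a TFLite delegate,
# but the exact filename varies by build and platform):
# npu = tf.lite.experimental.load_delegate("libarmnnDelegate.so")
# interpreter = tf.lite.Interpreter(model_path="model.tflite",
#                                   experimental_delegates=[npu])

interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
result = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
```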

Chart: How much juice do you need to do ML?

The Future is Heterogeneous

Before we began to spec out the ML processor, we did A LOT of research, one element of which was a survey among chip and AI product designers in the global Arm ecosystem. Respondents were drawn from a range of sectors using AI-enabled technologies, including IoT (54 percent), industrial (27 percent), automotive (25 percent) and mobile computing (16 percent).

In one question, respondents were asked, “Thinking about future products or design projects, where do you think AI/ML functionality will be best computed for your device or app?” As the chart below shows, the majority of responses were split across CPU, GPU and a dedicated ML processor, with a slight overall preference for the latter.

Chart: Where will your AI be computed?

This corroborated our other research, both validating the holistic approach we’d taken with Project Trillium, Arm’s ML platform – examining how flexible solutions can address use cases across a variety of IP – and underlining the need for a dedicated ML processor for the most challenging applications.

Building a Powerhouse from the Ground Up

Our next step was to deepen our discussion with the ecosystem. We took time to understand exactly what developers were looking for from an NPU, and we found that the majority of use cases fell into three broad groups: vision, voice and vibration.

In many cases, the aim was to drive an exhilarating user experience: How can we help users capture breathtaking memories with real-time photo bokeh, or provide more accurate and responsive face unlock? How can we untether personal assistants from the cloud and deliver a truly personalized experience?

Other goals were more industrial in nature, ranging from automatic detection of poor operating behavior using anomalies in sensor data to the development of IP suitable for multiple market segments.

Working closely with our partners, we distilled these high-level use cases into requirements, supporting the neural frameworks of choice through open-source software, and identifying key architectures and operators for the processor’s feature set. We developed semi-fixed function hardware to accelerate these operators and included programmable hardware to “futureproof” the design, allowing the firmware to be updated as new features are developed.
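
As a rough illustration of that operator-centric design, here's a hedged sketch of how a runtime might partition a network: operators on the NPU's accelerated list run there, and everything else falls back to the host CPU. The operator names and the supported set are invented for illustration; they are not the ML processor's actual feature set.

```python
# Hypothetical list of ops the NPU's semi-fixed-function hardware accelerates.
NPU_SUPPORTED = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED", "MAX_POOL_2D"}

def partition(graph_ops):
    """Split an ordered op list into contiguous NPU/CPU segments."""
    segments = []
    for op in graph_ops:
        target = "NPU" if op in NPU_SUPPORTED else "CPU"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)      # extend the current segment
        else:
            segments.append((target, [op])) # start a new segment
    return segments

print(partition(["CONV_2D", "MAX_POOL_2D", "SOFTMAX", "FULLY_CONNECTED"]))
# [('NPU', ['CONV_2D', 'MAX_POOL_2D']), ('CPU', ['SOFTMAX']), ('NPU', ['FULLY_CONNECTED'])]
```

Programmable hardware plus updatable firmware means this supported set can grow over the processor's lifetime, which is the "futureproofing" referred to above.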

Naturally, security is also an essential part of system design. We designed the ML processor to allow several implementation choices to address multiple risk profiles. We also used industry-proven Arm microcontroller technology with standard privilege levels and firmware that clears the SRAMs, making it easier to audit. No other solutions have these security features built in from the start. 

Targeting Performance AND Efficiency

Our deep-dive analysis led us to the sweet spot of performance vs power vs area: a processor that achieves a baseline 4 TOP/s in a single instantiation. For more demanding use cases running a number of features concurrently, performance can be scaled up through multi-processing: up to eight cores can be configured in a single cluster, achieving 32 TOP/s, or up to 64 cores in a mesh configuration, reaching over 250 TOP/s.

Of course, high performance is great… but not if it’s draining your device’s battery every time you venture away from the wireless charging mat. For performance to truly be a benefit, it needs to be coupled with efficiency. That’s why the ML processor provides an industry-leading power efficiency of 5 TOPs/W, achieved through state-of-the-art optimizations such as neural compilation, efficient convolutions and bandwidth-reduction mechanisms. This helps to lower cost and power requirements without compromising on user experience.
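
As a quick sanity check, the arithmetic behind those numbers works out as follows, taking the quoted 4 TOP/s per-core baseline and 5 TOPs/W efficiency at face value:

```python
# Back-of-the-envelope check of the figures quoted above.
base_tops = 4.0                   # single instantiation, TOP/s
cluster_tops = 8 * base_tops      # 8-core cluster -> 32 TOP/s
mesh_tops = 64 * base_tops        # 64-core mesh   -> 256 TOP/s ("over 250")
core_power_w = base_tops / 5.0    # at 5 TOPs/W, one core draws roughly 0.8 W
print(cluster_tops, mesh_tops, core_power_w)  # 32.0 256.0 0.8
```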

Taking a Deeper Dive

We’re immensely proud of the ML processor. Its optimized design delivers a massive uplift in efficiency compared to CPUs, GPUs and DSPs, and its scalable architecture delivers the computational determinism required for real-time responses – without compromising on flexibility.

If you’d like to dig deeper into the techniques we’ve used to bring this state-of-the-art processor to life, just click on the link below to download our white paper, Powering the Edge: Driving Optimal Performance with the Arm ML Processor.

Download Whitepaper

¹ Machine Learning at Facebook: Understanding Inference at the Edge
