Let’s talk about how Arm enables developers and the IoT software ecosystem to deliver smart, energy-efficient ML edge devices. The IoT is steadily growing, and new innovations in AI technology are mind-blowing. In the IoT line of business, we are working to scale AI innovations to tiny, constrained ML edge devices that are frequently powered by microcontrollers.
Generative AI and large language models have captured the public's attention. We all have fun experimenting with an AI chatbot, but it's both brilliant and equally flawed. One thing is obvious: an AI chatbot that lives only within its chat window is limited. It's cut off from the real world.
Things get interesting when AI is connected to the real world. Merging the technology with IoT endpoints creates synergy and new possibilities. It's clear to me that AI at the edge and IoT are going to come together to create the platforms of the future.
The IoT can provide real-time data, and it can connect to the real world. It's the eyes, the ears, the touch and all the senses, and it's the muscles. But ML edge devices do more than just sense and actuate. They take localized control and regulate critical functions. In human biology, this is the role of the autonomic nervous system, whose autonomous control keeps things alive even when cloud AI is taking a nap. The combination of cloud AI and ML edge devices will revolutionize many services and industries. And to cope with real-time, power, and data constraints, many of the smart tasks will be handled locally on ML edge devices.
Arm offers a broad range of optimized processors targeting ML applications on edge devices. Even the smallest Arm processor, the Cortex-M0/M0+, can execute simple ML algorithms. Starting with the Cortex-M4, the processors add hardware floating-point arithmetic and SIMD instructions to accelerate DSP and ML algorithms.
The Helium vector extension on Cortex-M52, M55, or M85 boosts these algorithms further and enables applications such as speech keyword spotting or object and anomaly detection. And the Ethos series of Neural Processing Units (NPUs) acts as a turbo co-processor for even more demanding applications such as smart cameras with real-time object classification.
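To make this concrete, the CMSIS-DSP library wraps these instructions in portable APIs: the same source code compiles to the DSP-extension SIMD instructions (such as SMLAD) on Cortex-M4 and newer cores, and is vectorized with Helium when built for Cortex-M52/M55/M85. The minimal sketch below is illustrative, not taken from an Arm example project:

```cpp
// Minimal CMSIS-DSP sketch: fixed-point dot product of two Q15 vectors.
// arm_dot_prod_q15() uses the DSP-extension SIMD instructions on Cortex-M4
// and newer cores and Helium (MVE) vector code on Cortex-M52/M55/M85.
#include "arm_math.h"
#include <cstdio>

#define BLOCK_SIZE 32

int main(void)
{
    q15_t a[BLOCK_SIZE], b[BLOCK_SIZE];
    q63_t result;

    // Fill the inputs with a simple test pattern.
    for (uint32_t i = 0; i < BLOCK_SIZE; i++) {
        a[i] = (q15_t)(i * 100);
        b[i] = (q15_t)(i * 50);
    }

    // Accumulate sum(a[i] * b[i]) in a 64-bit Q34.30 accumulator.
    arm_dot_prod_q15(a, b, BLOCK_SIZE, &result);

    printf("dot product (Q34.30 accumulator): %lld\n", (long long)result);
    return 0;
}
```

The same portability applies to the CMSIS-NN kernels that ML frameworks call into, which is why an ML workload can usually move across the processor range without rewriting application code.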
The diagram above shows typical use cases, but selecting the right processor for an application can be a challenge. Fortunately, most ML models can be deployed to a variety of Arm processors, and system architects may therefore initially focus on the software workloads. Still, it is important to understand the memory and compute requirements on different platforms. Here the EEMBC AudioMark benchmark is helpful: it implements a typical audio pipeline with keyword spotting, as found in smart speakers. While individual ML algorithms run significantly faster with Helium or Ethos-U, AudioMark lets you compare performance at the level of a complete application.
Developing a complete ML edge device is a multi-year endeavor. To accelerate this process, code reuse and early validation are key. Corstone IoT subsystems help during SoC design with architecture choice, integration, and verification. The various Corstone systems are available as FPGA images and as Arm Virtual Hardware simulation models, and they support both hardware architects and software developers throughout the design phase of an ML edge device.
The Corstone-315 (shown below) integrates Cortex-M85 along with an optional Ethos-U65 NPU and Arm Mali-C55 image signal processor (ISP) to build low-power, low-cost, high-performance secure endpoint AI devices that support convolutional neural networks (CNNs). CNNs are powerful artificial neural networks that are well-suited for image recognition, for example in smart cameras.
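To give a flavour of the software running on such a system, the sketch below shows a minimal TensorFlow Lite for Microcontrollers inference function for a CNN classifier. It is a hedged illustration rather than a reference implementation: model_data stands for a flatbuffer produced by the Vela compiler, the tensor-arena size and operator list are placeholders to adapt to your model, and AddEthosU() is assumed to register the custom operator through which Vela-compiled layers are offloaded to the Ethos-U NPU.

```cpp
// Hypothetical CNN inference step with TensorFlow Lite for Microcontrollers.
// model_data (a Vela-compiled .tflite flatbuffer), the arena size, and the
// operator list are placeholders - adapt them to your model and memory budget.
#include <cstring>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char model_data[];      // generated from the .tflite file

namespace {
constexpr int kArenaSize = 200 * 1024;        // scratch memory for tensors
alignas(16) uint8_t tensor_arena[kArenaSize];
}

int RunClassifier(const int8_t* image, int image_bytes, int8_t* scores, int num_classes)
{
    const tflite::Model* model = tflite::GetModel(model_data);

    // Register only the operators the model needs. AddEthosU() covers the
    // custom operator Vela inserts for layers that run on the NPU; the rest
    // fall back to the CMSIS-NN optimized CPU kernels.
    static tflite::MicroMutableOpResolver<4> resolver;
    resolver.AddEthosU();
    resolver.AddConv2D();
    resolver.AddFullyConnected();
    resolver.AddSoftmax();

    // Interpreter and tensor plan are set up once, on the first call.
    static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena, kArenaSize);
    static const TfLiteStatus alloc_status = interpreter.AllocateTensors();
    if (alloc_status != kTfLiteOk) return -1;

    // Copy the pre-processed camera frame into the quantized input tensor.
    memcpy(interpreter.input(0)->data.int8, image, image_bytes);

    if (interpreter.Invoke() != kTfLiteOk) return -1;

    // Return the per-class scores to the caller.
    memcpy(scores, interpreter.output(0)->data.int8, num_classes);
    return 0;
}
```

The same function can be exercised on an AVH simulation model of the Corstone system, so software bring-up does not have to wait for silicon or boards.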
To provide you with the best experience for developing ML applications, Arm offers solutions that cover hardware components, tools, and software to make product development easy and productive. The ML Developers Guide for Cortex-M Processors and Ethos-U NPU gives you an overview of the ML development process. It introduces the Arm technology and products that support ML development workflows, from ML model training all the way to debugging on hardware.
Microcontrollers that are based on Cortex-M55, Cortex-M85, and Ethos-U are now hitting the mass market, and low-cost evaluation boards let you explore this modern technology. A few examples of such evaluation boards are:
The embedded market is fragmented as it addresses a diverse range of application-specific use cases. However, many embedded systems contain similar building blocks, and standardizing these commonalities enables code reuse across many systems and simplifies product lifecycle management. Arm therefore invests in software standardization, development tools, and ecosystem partnerships.
Development tools:
Software building blocks:
Tool suites:
Many ecosystem partners such as ST, NXP, and IAR utilize CMSIS, PSA, and other components in their development tools. ML frameworks such as TensorFlow are validated for the Arm processor portfolio using Arm Compiler and AVH. And MLOps partners are now integrating the Arm foundation components into their MLOps systems.
The software and system design of an ML edge device can be separated into two parts:
Sensor, audio, or video inputs are typically converted into serial data streams for processing. Most ML applications process these data streams in several steps (see picture below). Signal conditioning and feature extraction are implemented in a resource-optimized DSP front-end, and this optimized data stream is the input to the ML model. Such an ML processing pipeline is device agnostic and does not require the exact physical target. Software development therefore frequently uses simulation models or superset boards that offer more resources during test and validation. Once the implementation is tested on the Arm processor, it is relatively simple to adapt such a validated ML processing pipeline to different target hardware.
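As a rough sketch of one such front-end step (the function names, frame length, and feature choice here are illustrative assumptions, not taken from a specific Arm example), the fragment below uses CMSIS-DSP to turn a frame of audio samples into a spectral feature vector that could feed the ML model's input tensor. Because it relies only on the portable CMSIS-DSP API, the same code runs on an AVH simulation model, a superset board, or the final target.

```cpp
// Device-agnostic DSP front-end step: convert one frame of time-domain
// samples into a magnitude-spectrum feature vector for the ML model.
// Function names and sizes are illustrative.
#include "arm_math.h"

#define FRAME_LEN   512                   // input samples per processing step
#define FEATURE_LEN (FRAME_LEN / 2)       // magnitude bins passed to the model

static arm_rfft_fast_instance_f32 rfft;

void FrontEndInit(void)
{
    arm_rfft_fast_init_f32(&rfft, FRAME_LEN);
}

// in:  FRAME_LEN samples, e.g. from a microphone DMA buffer (modified in place)
// out: FEATURE_LEN spectral magnitude features for the ML model input
void ExtractFeatures(float32_t* in, float32_t* out)
{
    float32_t spectrum[FRAME_LEN];        // interleaved real/imag FFT output

    arm_rfft_fast_f32(&rfft, in, spectrum, 0);      // real FFT, forward
    arm_cmplx_mag_f32(spectrum, out, FEATURE_LEN);  // per-bin magnitude
}
```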
During development, both DSP algorithm design and ML training require real-world data collected with the sensors of the edge device. This data collection is supported by the Synchronous Data Stream (SDS) Framework, which allows recording of real-world data and playing it back to AVH FVPs during validation. CMSIS-Stream helps developers design and optimize processing pipelines with multiple DSP algorithms. Combined, SDS and CMSIS-Stream are effective tools that support you throughout the development cycle. With AVH FVPs you can analyze the correctness and performance of each step in an ML processing pipeline.
The development of the ML classifier or ML model itself is a complex task that is typically performed by domain experts and data analysts. Fortunately, the market offers many powerful ML models that target Arm ML edge devices, and Arm works with several AI ecosystem partners to optimize ML models for a variety of typical applications. Below are a few examples:
Get the ML Developers Guide for Cortex-M Processors and Ethos-U NPU:
Get the guide