Microcontrollers (MCUs) are truly the ubiquitous computer of our time. They are tiny, cheap, and low power; they can often be powered indefinitely by a solar cell. They are in your watch and your fridge, and your car contains about 30 of them. The average household contains about three dozen. In short, MCUs are the most abundant computing platform on the planet.
Commercial interest around the Internet of Things (IoT) often centers on smart sensor applications, where an MCU is attached to various types of sensors. The additional compute available in smart sensors allows them to curate and process measurements of the physical world before acting on them or transmitting data for further analysis. This local compute capability is key: it enables on-device data analysis and reduces the amount of communication with a central processor. Without local intelligence, the volume of data generated would be far greater than could reasonably be processed in a centralized way. The oft-cited phrase “swimming in sensors, drowning in data” accurately captures the problem we hope smart sensors will address.
At the Arm ML Research Lab, we see machine learning (ML) as a key enabling technology for a host of exciting smart sensor applications, as ML excels at challenging perception tasks such as image classification and speech recognition. Adding ML capabilities to smart sensors allows the sensor itself to start interpreting the physical world, a step change compared to the traditional paradigm of merely transmitting data to some central location. In recent years, MCUs have been used to inject intelligence and connectivity into everything from industrial monitoring sensors to consumer devices.
However, there is a problem. ML, and deep neural networks specifically, tends to be highly demanding in terms of compute and memory resources. Although MCUs are cheap and accessible, they are also limited in compute resources compared to, say, phones or laptops, which makes running neural networks on MCUs challenging. In fact, even the simplest convolutional neural networks (CNNs) for vision applications, such as the well-known LeNet architecture, do not fit in the limited memory of common MCU IoT platforms. The research interest around running ML on highly constrained hardware like MCUs has become known as ‘TinyML’.
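To make the memory gap concrete, here is a back-of-the-envelope parameter count for the classic LeNet-5 (our illustration, with layer sizes following the original description; the exact baselines in the paper may differ). Even this small network needs roughly 240 KB just for fp32 weights, orders of magnitude beyond the few KB of memory on the smallest MCUs:

```python
# Back-of-the-envelope parameter count for the classic LeNet-5
# (illustrative; layer sizes follow the original architecture description).

def conv_params(filters, kernel_h, kernel_w, in_channels):
    """Weights plus one bias per filter."""
    return filters * (kernel_h * kernel_w * in_channels + 1)

def fc_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return out_features * (in_features + 1)

lenet5 = (
    conv_params(6, 5, 5, 1)       # conv1: 156
    + conv_params(16, 5, 5, 6)    # conv2: 2,416
    + fc_params(16 * 5 * 5, 120)  # fc1: 48,120
    + fc_params(120, 84)          # fc2: 10,164
    + fc_params(84, 10)           # fc3: 850
)

print(f"LeNet-5 parameters: {lenet5:,}")            # 61,706
print(f"fp32 weights: {lenet5 * 4 / 1024:.0f} KB")  # ~241 KB
```

And weights are only part of the story: the working memory needed to hold intermediate activations at inference time must also fit in the MCU's SRAM.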
We have been working on this problem for a few years as one focus for our AutoML projects. AutoML is an approach to designing and optimizing ML automatically, with minimal human input. It is a powerful concept that can really help democratize ML, but at the same time, it also presents several challenges. We view AutoML as a tool for TinyML, enabling us to discover neural networks that are both accurate and compatible with constrained hardware.
We have recently published our latest effort on AutoML for MCUs, which I will be presenting at the NeurIPS conference in Vancouver this December. The paper is entitled ‘SpArSe’, an acronym for Sparse Architecture Search. This work is from Arm’s ML Research Lab, and is the result of a collaboration between myself, Paul Whatmough, Matthew Mattina, and Professor Ryan Adams at Princeton University. NeurIPS is the premier conference for machine learning, and we are excited to be showing work from the Arm ML Research Lab.
SpArSe challenges the idea that CNNs are too big for deployment on MCUs. Instead, we demonstrate that, using AutoML optimization technology, it is possible to design CNNs which generalize well, while also being small enough to fit onto memory-limited MCUs.
Our Sparse Architecture Search method combines neural architecture search with pruning in a single, unified approach, which learns superior models on four popular IoT datasets. The CNNs we find are more accurate and up to 7.4× smaller than previous approaches, while meeting the strict MCU working memory constraint, which can be as low as 2KB.
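To give a flavor of one ingredient in that unified approach, here is a minimal sketch of magnitude-based weight pruning, a standard compression technique in which the smallest-magnitude weights are zeroed out. This is an illustration only; the pruning procedure actually used in SpArSe may differ:

```python
# Minimal sketch of magnitude-based weight pruning: zero out the
# smallest-magnitude weights until a target fraction of entries is zero.
# (Illustrative only; not the exact procedure used in the SpArSe paper.)
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude
    `sparsity` fraction of entries set to zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to zero out
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.normal(size=(120, 84)).astype(np.float32)  # a small FC layer
pruned = magnitude_prune(w, sparsity=0.9)
print(f"Nonzero weights: {np.count_nonzero(pruned)} / {w.size}")
```

Pruning alone is not enough, though: which layers to prune, and by how much, interacts with the choice of architecture itself, which is why SpArSe searches over both jointly rather than pruning a fixed network after the fact.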
Taking these findings beyond the concept phase, we have deployed these models on real MCU platforms using the uTensor framework, and have also been evaluating TensorFlow Lite Micro.
In summary, we had a lot of fun designing CNNs to fit in 2KB and running them on MCUs. If you’re interested to know more, please get in touch. And, if you’re planning on attending NeurIPS this December, please do drop by the poster session to say hello and find out more about Arm’s Machine Learning Research Lab and our other projects.
Read the Paper

Contact Matthew Mattina