Train a TinyML model to recognize sounds that uses 23 kB of RAM

Mary Bennion
April 6, 2020
6 minute read time.

**All content in this blog provided by Daniel Situnayake (@dansitu), Founding TinyML Engineer at Edge Impulse**

Over the past few months, you may have heard talk about TinyML: the idea of running machine learning models on Cortex-M chips at under 1 mW of power. In this article, I introduce TinyML, applications for audio recognition, and how to get started on a Cortex-M4 development board.

TinyML is exciting because it helps tiny devices make decisions based on huge amounts of data—without wasting time and energy transmitting it elsewhere. For example, imagine you are tracking animal behavior in the African Savanna. You want to know how often lions roar at different times of day.


[Image: a lion roaring]

You have a few choices of how to collect this data:

  • Hide out in the long grass with a notepad and pencil, making a note every time you hear a roar.
  • Set up an audio recorder with a battery, and pick up the memory card every few weeks.
  • Transmit audio over a data connection, perhaps a cellular network if available.

All these work, but there are some major drawbacks:

  • Keeping a human on-site is expensive, and there may be safety issues to think about.
  • Driving out to collect a memory card takes time and money, and you only get new data every few weeks.
  • Transmitting data uses lots of energy and money, and bandwidth is probably limited in lion territory. You might get the data faster, but you will still have to drive out and change the battery.

In addition to these points, counting lion roars in a week’s worth of audio recordings is really boring and costs precious funds. To relieve the tedium, you could train a machine learning model to recognize lion roars in the recordings and count them automatically. To do this, you’d collect a set of labeled data, feed it into an algorithm, and create a model that can spot roars in audio.


[Diagram: training a model from labeled data]

This would solve the problem of listening to hours of Savanna audio (which, in retrospect, could be quite relaxing). But it still leaves the drawbacks described above.

But there is some hope. In the past, machine learning models have had to live on big, powerful hardware, so they could only be run on a server in the lab. However, in recent years, machine learning algorithms and low-power hardware have evolved to the point that it is possible to run sophisticated models on embedded devices.

What if we took our lion roar counting model and deployed it to an embedded device, out in the field? Here are some of the benefits:

  • Instead of streaming all the audio via an expensive high-bandwidth connection, our device could count how many roars it hears and send an hourly total via low-power, long-range radio, like LoRa (sketched after this list).
  • There would be no need to store the audio or collect a memory card, since the number of roars is all we need.
  • The device could be cheap and extremely low power, running for years from a single battery.
  • Nobody would have to listen to a 100-hour wildlife mix tape.
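To make that architecture concrete, here is a minimal C++ sketch of the count-and-transmit loop. Everything in it is hypothetical: `detect_roar()` stands in for the trained model and `lora_send_counter()` for a LoRa radio driver; neither is a real API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins: detect_roar() wraps the trained model,
// lora_send_counter() wraps a LoRa radio driver.
bool detect_roar(const int16_t *samples, size_t count);
void lora_send_counter(uint32_t count);

static uint32_t roar_count = 0;

// Called whenever the microphone driver has a full window of samples.
void on_audio_window(const int16_t *samples, size_t count) {
    if (detect_roar(samples, count)) {
        roar_count++;  // keep only the count; the audio is never stored
    }
}

// Called once an hour by a timer: send a few bytes instead of a stream.
void on_hourly_timer() {
    lora_send_counter(roar_count);
    roar_count = 0;
}
```

The point of the design is visible in the code: nothing leaves the device except a counter, which is why the radio can be slow and the battery can be small.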

This sounds like a great approach: we solve some real problems and end up with a cheaper, more reliable solution than we had before.

But machine learning is an intimidating subject. It is highly technical, involves a lot of new concepts, and there are a bunch of pitfalls that make it easy to train a model that seems useful, but does not do the job.

Even more, writing machine learning code that runs on embedded devices is hard. In addition to needing knowledge of machine learning and signal processing algorithms, you will often be running at the limits of the hardware, and you will need to use every trick in the book to squeeze out all of the performance you can for a given type of chip.

When we were writing the TinyML book, I realized that while it is easy for anyone to get started with machine learning on embedded devices, it is a lot harder to build something ready for production. For the average engineer, focused on solving real-world problems, there just are not enough hours in the day to spend studying machine learning, let alone optimizing low-level ML code for specific microcontroller architectures. Machine learning sounds like a great solution, but it requires a huge investment to learn and use.

This is why I am so excited about Edge Impulse (in fact, so much so that I joined the team). It is a set of tools that takes care of the hairy parts of machine learning, letting developers focus on the problem they are trying to solve. Edge Impulse makes it easy to collect a dataset, choose the right machine learning algorithm, train a production-grade model, and run tests to prove that it works. It then exports the whole thing as an efficient, highly optimized C++ library designed to drop easily into your project.

Using Edge Impulse, the steps for creating our roar-counting model are simple: 

  1. Collect a small amount of audio data, labeled with “roar” or “not roar”. Even just a few minutes are enough to get started.
  2. Upload the data to Edge Impulse using the Edge Impulse CLI.
  3. Follow the instructions to train a simple model.
  4. Add more data and tweak the model’s settings until you get the level of accuracy you need.
  5. Export the model as a C++ library and add it to your embedded project (see the sketch below).
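Step 5 is where the model meets your firmware. Below is a minimal sketch of calling an exported library, using names from the Edge Impulse C++ SDK (`run_classifier()`, `signal_t`); the label index and the 0.8 threshold are assumptions for a simple two-class model, so check the generated headers for the exact names your export uses.

```cpp
#include <cstring>
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

// One window of audio samples, filled by your microphone driver.
static float audio_window[EI_CLASSIFIER_RAW_SAMPLE_COUNT];

// The SDK pulls samples through a callback rather than a flat buffer.
static int get_audio_data(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, audio_window + offset, length * sizeof(float));
    return 0;
}

bool heard_target_sound() {
    signal_t signal;
    signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
    signal.get_data = &get_audio_data;

    ei_impulse_result_t result;
    if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
        return false;  // inference failed; treat as no detection
    }

    // One score per label, in training order; index 0 and the 0.8
    // threshold are illustrative assumptions.
    return result.classification[0].value > 0.8f;
}
```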

The whole process is quick enough to run through in a few minutes, and you do not have to visit the African Savanna. Instead, you can step through this tutorial, which is also available in video form:

Recognize sounds from audio

Since you may lack any lions, the tutorial has you train a model that can recognize household sounds: namely, the sound of running water from a faucet. The model you will train is around 18 kB in size, which is mind-blowingly small for something so sophisticated, and leaves plenty of space for your application code.

If you have an STM32 IoT Node Discovery Kit board, based on an Arm Cortex-M4, you can capture your own dataset over WiFi or serial. If you do not, or while you are waiting for one to arrive, you can download a pre-built dataset collected from my Sunnyvale apartment. Edge Impulse builds a compact, stand-alone C++ library that can be built into any Cortex-M or Cortex-A device. We automatically make use of FPU, vector extensions, CMSIS-DSP, and CMSIS-NN to optimize performance and minimize RAM and Flash usage.

Beyond lions and faucets, there is a huge range of applications for TinyML. Imagine tiny devices that can recognize speech commands (there is a dataset for that one, too), hear when machines are malfunctioning, or understand the activities happening in a home based on the ambient sounds that are present. The best part is that with inference on-device, user privacy is protected: no audio ever needs to be sent to the cloud.

By making it easy for any developer to build machine learning applications, Edge Impulse is opening the field for everyone to turn their amazing ideas into hardware. And since we are continually improving our platforms as the technology evolves, everyone benefits from the latest production-ready algorithms and techniques.

It is an exciting time to be an embedded engineer. We would love to hear what you are planning to build. Try our audio classification tutorial, and let us know what you think in the comments, on our forum, and on the @edgeimpulse Twitter.

For more, please check out this awesome webinar on Getting started with TinyML.

Watch the TinyML Hackster Webinar event

Daniel Situnayake (@dansitu)

Founding TinyML Engineer at Edge Impulse
