The Arm Cortex-M55 processor is the first processor built on the Armv8.1-M architecture, which includes Arm Helium technology, also known as the M-profile Vector Extension (MVE). Helium technology enables higher levels of Machine Learning and Signal Processing performance on the next generation of embedded devices.
MCU developers are already porting Machine Learning applications to Arm Cortex-M devices today using open-source libraries such as CMSIS-DSP and CMSIS-NN, and ML frameworks such as TensorFlow Lite for Microcontrollers. TensorFlow Lite for Microcontrollers is designed to run machine learning models on processors with a very small memory footprint. Support for Cortex-M55 has been added to the CMSIS-DSP and CMSIS-NN libraries. This makes porting ML applications to Cortex-M55 much easier, because developers can use the same libraries and neural network frameworks they are already familiar with.
Learn how to port CMSIS-DSP libraries to Cortex-M55 here: How to use the Arm Cortex-M55 Processor with the open-source CMSIS library.
In this article, we detail the process of building and porting TensorFlow Lite for Microcontrollers applications to Arm Cortex-M55 Fast Model and Cycle Model systems. Arm Fast Models are functionally accurate programmer’s view models of Arm IP. We use an Arm Cortex-M55 Fast Model system to debug, run, and validate the ML application we have built. We then run the validated ML application on an Arm Cortex-M55 Cycle Model system to measure system performance accurately. Arm Cycle Models are built directly from Arm RTL and provide complete cycle accuracy.
To learn more about Arm Fast Models and Cycle Models, refer to the links below:
To get started, clone the project repository from GitHub:
$ git clone https://github.com/ARM-software/Tool-Solutions.git
The tflite-micro-models project contains everything you need to build TensorFlow Lite for Microcontrollers examples that run on M55 Fast Model and Cycle Model virtual platforms. Here is a list of what is included in the project:
We are using a Docker-based development environment for our project. It makes it easier to create a known good development environment with all the dependencies in one package.
First, download Arm Compiler 6.14 for Linux. You can download it directly from Arm Developer here or by running the get-ac6.sh file:
Then, build the Docker image for this project by running the build.sh file or with the following command:
$ docker build -t tflite-micro-models -f Dockerfile .
Next, start the Docker container by running the run.sh file or with the following command:
$ docker run --network host -it tflite-micro-models /bin/bash
We are now in the Docker container and can start building the TensorFlow Lite Micro example applications.
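The three setup steps above can be chained in one small script. The following is only a sketch under stated assumptions: it assumes it is run from the project directory that contains get-ac6.sh and the Dockerfile, and that get-ac6.sh needs no arguments. The `run_step` and `setup_container` helpers and the `DRY_RUN` variable are hypothetical names introduced here and are not part of the repository.

```shell
#!/bin/sh
# Sketch: chain the setup steps described above.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: compiler download, image build, then container start.
setup_container() {
    # Assumption: get-ac6.sh takes no arguments.
    run_step ./get-ac6.sh || return 1
    run_step docker build -t tflite-micro-models -f Dockerfile . || return 1
    run_step docker run --network host -it tflite-micro-models /bin/bash
}

setup_container
```

Printing the commands first makes it easy to confirm what will run before committing to a long compiler download and image build.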
TensorFlow Lite for Microcontrollers comes with several examples. With TensorFlow Lite for Microcontrollers you can generate standalone projects for Keil, Make, and Mbed development environments. In this example, we generate the project with Make.
The applications can be built with either the reference kernels in TFLite Micro (tensorflow/lite/micro/kernels) or with optimized kernels that make use of the CMSIS-NN library (tensorflow/lite/micro/kernels/cmsis-nn). The reference kernels do not include any platform-specific optimization. We build the applications with the optimized CMSIS-NN kernels.
CMSIS-NN is a collection of efficient neural network kernels that maximize the performance and minimize the memory footprint of neural networks on Cortex-M processor cores. Kernel support for Arm Helium Technology (M-Profile Vector Extension) has been added to the CMSIS-NN library.
To build an example application for a Cortex-M55 system, use the build_tflite_micro_test.sh script and pass the example you want to build as an argument. The available options are:
If you are using the TensorFlow Lite for Microcontrollers framework for the first time, the hello world example is a good starting point for understanding the basics and learning the full end-to-end workflow.
In this article, we are going to port the micro speech example to Cortex-M55 Fast Model and Cycle Model systems. This example runs a 22kB TensorFlow Lite model and uses very little memory (around 10kB of RAM). The model can recognize two keywords, "yes" and "no", in sample speech data.
To build the micro_speech_test executable in the container, use:
$ ./build_tflite_micro_test.sh -t micro_speech
The resulting executable “micro_speech_test” is copied into the Cortex-M55/software/exe directory. To understand what this test does, you can inspect the source code here. It essentially creates an interpreter, gets a handle to the TensorFlow Lite model and then runs the interpreter with the model and some sample inputs.
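Before moving on to simulation, it can help to confirm that the build actually produced the executable. The sketch below assumes the directory and file name given above (Cortex-M55/software/exe and micro_speech_test) relative to the current directory. `check_executable` is a hypothetical helper, not a script from the repository.

```shell
#!/bin/sh
# Sketch: confirm the build output exists before running it on a model.
# Arguments (both optional): directory and executable name.
check_executable() {
    exe_dir="${1:-Cortex-M55/software/exe}"
    exe_name="${2:-micro_speech_test}"
    if [ -f "$exe_dir/$exe_name" ]; then
        echo "found: $exe_dir/$exe_name"
        return 0
    fi
    echo "missing: $exe_dir/$exe_name" >&2
    return 1
}

# Report the result without aborting the shell if the file is absent.
check_executable || echo "Run ./build_tflite_micro_test.sh -t micro_speech first."
```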
We are now ready to run the micro_speech_test executable on the Cortex-M55 Fast and Cycle Model systems. To get access to these systems please contact us here.
Arm Fast Models are fast, functionally accurate programmer’s view models of Arm CPU and System IP. You can develop software targeting Arm IP using Fast Models well before any hardware for it is available, which makes them ideal for developing on the Cortex-M55.
The TensorFlow Lite for Microcontrollers executable “micro_speech_test” runs on a simple M55 Fast Model system, described below.
// This file was generated by System Generator Canvas
ramdevice : RAMDevice("size"=0x100000000);
pvbus2ambapv : PVBus2AMBAPV();
BusDecoder : PVBusDecoder();
armm55ct : ARMCortexM55CT("MVE"=2, "CFGDTCMSZ"=0xf, "CFGITCMSZ"=0xf);
Clock1Hz : MasterClock();
Clock100MHz : ClockDivider("mul"=100000000);
Clock100MHz.clk_out => armm55ct.clk_in;
Clock1Hz.clk_out => Clock100MHz.clk_in;
armm55ct.pvbus_m => BusDecoder.pvbus_s;
pvbus2ambapv.amba_pv_m => self.amba_pv_m;
BusDecoder.pvbus_m_range[0x0..0x9fffffff] => ramdevice.pvbus;
BusDecoder.pvbus_m_range[0xa8000000..0xa8001000] => pvbus2ambapv.pvbus_s;
master port<AMBAPV> amba_pv_m;
Here is the output from running micro_speech_test on this M55 Fast Model system:
Fast Models [11.10.22 (Mar 11 2020)]
Copyright 2000-2020 ARM Limited.
All Rights Reserved.
1/1 tests passed
~~~ALL TESTS PASSED~~~
simulation is complete
Info: /OSCI/SystemC: Simulation stopped by user.
The output means that the trained micro speech TensorFlow model was loaded successfully on the Cortex-M55 target, some example inputs were run through it, and it got the expected outputs.
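When the simulation runs in a script or a CI job, the same pass/fail judgment can be made automatically from captured output. The sketch below assumes the simulator output has been saved or piped as text. It keys on the `~~~ALL TESTS PASSED~~~` banner and the `1/1 tests passed` line shown above. `tests_passed` and `test_counts` are hypothetical helper names.

```shell
#!/bin/sh
# Succeeds only if the pass banner printed by the test appears on stdin.
tests_passed() {
    grep -F -q -e '~~~ALL TESTS PASSED~~~'
}

# Prints "<passed> <total>" taken from a line such as "1/1 tests passed".
test_counts() {
    sed -n 's|^\([0-9][0-9]*\)/\([0-9][0-9]*\) tests passed$|\1 \2|p'
}

# Demo on lines copied from the output shown above.
log='1/1 tests passed
~~~ALL TESTS PASSED~~~
simulation is complete'

if printf '%s\n' "$log" | tests_passed; then
    echo "micro_speech_test: PASS ($(printf '%s\n' "$log" | test_counts))"
else
    echo "micro_speech_test: FAIL"
fi
```

Checking for the banner rather than the simulator's exit status is a deliberate choice in this sketch, since the banner is the one success signal the output above is known to contain.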
To learn more about getting started with Arm Fast Models refer to the Quick Start here.
You can also port these examples to the Arm Cortex-M55 FVP (Fixed Virtual Platform) by making minor modifications to the system-specific code in TFLite_micro_IPSS_Support.
This type of workflow, leveraging Arm Fast Models, is useful for verifying neural network behavior on a platform before hardware is available. To obtain and analyze cycle-accurate performance metrics from the simulation, such as how long the network took to execute, we use the Arm Cortex-M55 Cycle Model system.
Arm Cycle Models are 100% functional and cycle-accurate models of Arm IP, compiled directly from RTL.
The Cortex-M55 Cycle Model is available on Arm IP Exchange. We built up a simple Cortex-M55 Cycle Model system using this CPU model in addition to Cycle Models for the system interconnect and memory. By matching the memory map of the M55 Cycle Model system to that of the Fast Model system, we can run the same micro_speech_test executable on the Cycle Model system.
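To make the shared memory map concrete, the sketch below decodes an address the way the BusDecoder in the Fast Model description above routes it: 0x0..0x9fffffff goes to ramdevice and 0xa8000000..0xa8001000 goes to pvbus2ambapv. It assumes both ends of each range are inclusive. `decode_addr` is a hypothetical helper, and the sample addresses are arbitrary.

```shell
#!/bin/sh
# Sketch: map an address to the component that the BusDecoder ranges select.
# Assumption: range bounds are inclusive at both ends.
decode_addr() {
    addr=$(( $1 ))
    if [ "$addr" -ge $((0x0)) ] && [ "$addr" -le $((0x9fffffff)) ]; then
        echo ramdevice
    elif [ "$addr" -ge $((0xa8000000)) ] && [ "$addr" -le $((0xa8001000)) ]; then
        echo pvbus2ambapv
    else
        echo unmapped
    fi
}

# Arbitrary sample addresses, chosen to hit each branch.
for a in 0x00000000 0x20000000 0xa0000000 0xa8000004; do
    printf '%s -> %s\n' "$a" "$(decode_addr "$a")"
done
```

A check like this is one way to confirm that every address an executable touches lands in a mapped region on both systems before running it on the slower Cycle Model.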
Running the micro_speech_test executable on the M55 Cycle Model system gives us an accurate cycle count for the entire application. We use the cycle-accurate simulation results to measure system-critical performance metrics such as memory bandwidth utilization. We can also insert software markers in our TensorFlow Lite application to measure the cycle count for running just the inference on the TensorFlow Lite model.
Support for Cortex-M55 in the Arm Compiler and the tight integration of CMSIS-NN libraries into TensorFlow Lite for Microcontrollers have made the process of porting ML workloads to new Cortex-M devices quick and easy. Furthermore, the availability of Arm Fast Models and Cycle Models for Cortex-M55 enables early software bring-up, validation, and performance analysis with ML workloads before any hardware devices are available.
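Once two marker readings are available, the inference cost is simply their difference, and it converts to time for a given clock. The sketch below shows only that arithmetic. How the markers are exposed by the Cycle Model system is not covered here, and the readings are placeholder values, not measurements. The 100 MHz clock is an assumption borrowed from the Fast Model clock divider and may not match the Cycle Model system. `cycles_between` and `cycles_to_us` are hypothetical helpers.

```shell
#!/bin/sh
# Cycles spent between two marker readings (end minus start).
cycles_between() {
    echo $(( $2 - $1 ))
}

# Convert a cycle count to whole microseconds for a given clock in Hz.
cycles_to_us() {
    echo $(( $1 * 1000000 / $2 ))
}

# Placeholder readings for illustration only; these are not measured results.
start=1000
end=251000
clock_hz=100000000   # Assumption: 100 MHz, as in the Fast Model clock divider

cycles=$(cycles_between "$start" "$end")
echo "inference cycles: $cycles"
echo "time at ${clock_hz} Hz: $(cycles_to_us "$cycles" "$clock_hz") us"
```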
Learn more about TensorFlow Lite for Microcontrollers
Learn more about Arm tools for Cortex-M55