This blog post describes the steps to develop tiny machine learning (ML) software by using TensorFlow Lite for Microcontrollers (TFLM) on the Corstone-3xx platform. It assumes that you have a basic knowledge of ML software and embedded Cortex-M software programming.
This blog post assumes that you have the following tools available on your machine:
The Corstone-3xx portfolio enables SoC designers to quickly develop IoT and endpoint AI devices based on Armv8-M processors and the Arm Ethos-U NPU.
The following figure shows the Corstone-3xx portfolio, which contains a configurable example subsystem, Arm System IP, software, and tools:
This section describes how to quickly develop and evaluate TFLM inference software on the Corstone-310 platform. The same method extends easily to other Corstone-3xx platforms.
The general TFLM-based ML software development process on the Corstone-3xx platform is as follows:
To start developing machine learning software on the Corstone-310 platform, perform the following steps:
The Arm® Keil® Studio Pack is a collection of Visual Studio Code extensions. The pack provides a development environment for embedded systems and IoT software on Arm-based microcontroller (MCU) devices.
Follow the instructions described in Create a solution to create a new solution by using the CMSIS extension in Visual Studio Code.
For the Corstone-310 project, select SSE-310 as the target, as shown in the following figure. Then select an appropriate compiler toolchain and create the project.
The following figure shows the structure of the created project:
The TFLM software packs are available from the CMSIS-Pack web page. Keil Studio provides the Manage Software Components function to add the software packs to the project.
The required TFLM software packs are shown in the following figures:
The optimized model includes TensorFlow Lite custom operators for the parts of the model that can be accelerated by the Ethos-U NPU.
Add the Ethos-U driver software pack into the project as shown in the following figure. The Corstone-310 platform uses the Ethos-U55.
CMSIS-NN is a software library of neural network kernels that are optimized for various Arm Cortex-M processors. The model parts that cannot be accelerated by the NPU are left unchanged and run on the Cortex-M CPU by using CMSIS-NN kernels.
Add the CMSIS-NN and DSP software packs as shown in the following figure.
This blog post uses a pre-trained model built with the TensorFlow framework and trained on the CIFAR-10 dataset. The input to the model is a 32x32 pixel image with 3 color channels (RGB), which is classified into one of 10 output classes.
To deploy your Neural Network (NN) model on Ethos-U, perform the following steps:
1. Optimize the model by using Vela.
Vela is an open-source Python tool that optimizes a neural network model to run on an embedded system containing an Ethos-U NPU.
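For example, a possible Vela invocation targeting the 256-MAC Ethos-U55 configuration used by the Corstone-310 (the input file name is an assumption):
vela cifar10_Int8.tflite --accelerator-config ethos-u55-256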
For details about Vela, see Vela Compiler: The first step to deploy your NN model on the Arm Ethos-U microNPU.
2. Convert the resulting model into the firmware image, which must be represented as an array in C/C++ syntax.
You can use a hexdump utility, for example xxd, to convert the binary .tflite file into a C/C++ source file that you can add to your project (a sketch of the generated file follows these steps). For example:
xxd -i cifar10_Int8_vela_H256.tflite > cifar10_Int8_vela_H256.cc
3. Add the .cc file to the project.
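The generated source file defines a byte array and a length variable. The following is a sketch of what xxd emits, assuming the file name above; only the first few bytes are shown, and the const and alignas(16) qualifiers are common manual additions for TFLM rather than xxd output:
alignas(16) const unsigned char cifar10_Int8_vela_H256_tflite[] = {
  0x20, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, /* ... remaining model bytes ... */
};
const unsigned int cifar10_Int8_vela_H256_tflite_len = sizeof(cifar10_Int8_vela_H256_tflite);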
The following steps show how the application code calls the TFLM inference APIs to run inference on the model.
1. Load the model
Application code calls tflite::GetModel(cifar10_Int8_tflite_vela) to load the model.
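A minimal sketch of this step, with a schema version check that is a common TFLM pattern rather than part of the original snippet:
const tflite::Model* model = tflite::GetModel(cifar10_Int8_tflite_vela);
if (model->version() != TFLITE_SCHEMA_VERSION) {
  // The model was generated with an unsupported schema version.
  MicroPrintf("Model schema version %d is not equal to supported version %d.",
              (int)model->version(), TFLITE_SCHEMA_VERSION);
}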
2. Set up the operation resolver
For the pre-trained CIFAR-10 model, the following snippet shows you how to set up an operation resolver for this model.
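The snippet assumes a resolver that has already been declared. In TFLM this is typically a MicroMutableOpResolver, whose template parameter gives the number of registered operators:
static tflite::MicroMutableOpResolver<7> op_resolver;  // 7 operators are added below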
op_resolver.AddConv2D();
op_resolver.AddMaxPool2D();
op_resolver.AddReshape();
op_resolver.AddFullyConnected();
op_resolver.AddSoftmax();
op_resolver.AddRelu();
op_resolver.AddEthosU();
The last call, AddEthosU, registers the TensorFlow Lite custom operator for the model parts that the Ethos-U NPU accelerates.
3. Create and initialize the MicroInterpreter
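The interpreter needs a tensor arena, a memory block that holds the input, output, and intermediate tensors. A minimal sketch of the declaration; the arena size is an assumption and must be tuned for your model:
constexpr size_t kTensorArenaSize = 128 * 1024;  // assumption: enlarge if AllocateTensors() fails
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];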
tflite::MicroInterpreter interpreter(model, op_resolver, tensor_arena, kTensorArenaSize);
interpreter.AllocateTensors();
4. Set input data
Application code calls the interpreter.input(0) function to get the input tensor and sets up the proper input data for the model. The pre-trained model expects a 32x32 pixel image with 3 bytes per pixel (RGB).
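A minimal sketch, assuming an int8 quantized input tensor and a hypothetical buffer image_rgb that already holds the quantized 32x32x3 image:
TfLiteTensor* input = interpreter.input(0);
memcpy(input->data.int8, image_rgb, 32 * 32 * 3);  // requires <cstring>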
5. Invoke the model
Application code calls interpreter.Invoke() to run inference on the model.
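A minimal sketch with a status check; the error handling is an assumption:
TfLiteStatus invoke_status = interpreter.Invoke();
if (invoke_status != kTfLiteOk) {
  MicroPrintf("Invoke() failed");  // inference did not complete successfully
}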
6. Get the output
Application code calls interpreter.output(0) to get the inference output.
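A minimal sketch that reads the output tensor and picks the highest-scoring class; the variable names are assumptions:
TfLiteTensor* output = interpreter.output(0);
int best_class = 0;
for (int i = 1; i < 10; ++i) {
  if (output->data.int8[i] > output->data.int8[best_class]) {
    best_class = i;  // index of the most likely class
  }
}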
After the project builds successfully, perform the following steps:
1. Start a debug session in Keil Studio
Follow the instructions described in Start a debug session to set up a debug session for the project.
The following figure shows the debug session settings for the Corstone-310 FVP.
2. Run the image in FVP
The pre-trained CIFAR-10 model is used for image classification. It classifies images into 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
The output from this model is a 1D tensor of size 10. Each element corresponds to the predicted probability that the input image belongs to one of the 10 classes.
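Because the model is int8 quantized, the raw output values must be dequantized to recover the probabilities. A minimal sketch, using the quantization parameters stored in the output tensor obtained earlier from interpreter.output(0):
float probability = output->params.scale *
                    (output->data.int8[i] - output->params.zero_point);  // for class index i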
The following figure shows the output text from inference on the CIFAR-10 model in the Corstone-310 FVP. The model correctly classifies the input image as a truck.
The Corstone-3xx series and its related tools and ecosystem help you rapidly develop endpoint AI devices and deploy ML software on them.
This section lists relevant Arm publications for your reference: