The Cortex-M55 processor is Arm’s most AI-capable Cortex-M processor and the first CPU based on Arm’s Helium technology. While the Cortex-M55 alone is fast enough to run ML models on tiny microcontrollers, pairing it with the Arm Ethos-U55 microNPU can accelerate ML inference in embedded systems by up to 480x.
The Ethos-U55 is a machine learning processor optimized to execute common mathematical ML operations such as convolutions and activation functions. The Ethos-U processor supports popular neural network models such as CNNs and RNNs for audio processing, speech recognition, image classification, and object detection.
To run your inference on the Ethos-U NPU, it is essential to quantize the network operators to either 8-bit (unsigned or signed) or 16-bit (signed), because Ethos-U supports only 8-bit weights and 8-bit or 16-bit activations. The TensorFlow Model Optimization Toolkit enables developers to optimize ML models for deployment on devices with tight memory, power, and storage constraints. It provides several optimization techniques, including quantization, pruning, and clustering, all of which are compatible with TensorFlow Lite. For example, you can apply post-training integer quantization to convert the weights and activations from floating-point to integer numbers after loading the converted model with TFLiteConverter. Note that once a model is pruned or clustered, a short fine-tuning pass is usually performed afterwards to recover the lost accuracy (a minimal pruning sketch is shown below), so you need to make a trade-off between model complexity, size, and accuracy.
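For illustration, a minimal pruning sketch using the TensorFlow Model Optimization Toolkit could look like the following. The Keras model (model) and training data (train_images, train_labels) are placeholders, and the short fit() call is the fine-tuning pass mentioned above:

import tensorflow_model_optimization as tfmot

# Wrap the model so that low-magnitude weights are gradually zeroed out
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5,
        begin_step=0, end_step=1000)
}
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)

# Short fine-tuning pass to recover the accuracy lost to pruning
model_for_pruning.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])
model_for_pruning.fit(train_images, train_labels, epochs=2,
                      callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before converting to TensorFlow Lite
model = tfmot.sparsity.keras.strip_pruning(model_for_pruning)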
Optimize your model using post-training quantization:
import numpy as np
import tensorflow as tf

def representative_dataset():
    for _ in range(100):
        # Using some random data for testing purposes
        data = np.random.rand(1, 244, 244, 3)
        yield [data.astype(np.float32)]

# Load the saved model with the TFLite converter
converter = tf.lite.TFLiteConverter.from_saved_model("model_tf")

# Set options for full integer post-training quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Set the input and output tensors to int8
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert to TFLite
tflite_model_quant = converter.convert()
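The quantized model can then be written to a .tflite file so it can be passed to the Vela compiler in the next step (the file name model_quant.tflite is just an example):

# Save the quantized model to disk for the Vela compiler
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model_quant)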
To deploy your NN model on Ethos-U, you need to compile the trained, quantized model with Vela to generate an optimized NN model for Ethos-U. Vela is an open-source Python tool that compiles a TFLite NN model into an optimized version that can run on an embedded system containing an Arm Ethos-U NPU. You can install Vela with the command $ pip install ethos-u-vela and then compile the network for a particular Ethos-U NPU configuration, such as ethos-u55-128, by running the following command line. Read more about the different Vela command-line options at https://pypi.org/project/ethos-u-vela/.
$ vela /path/to/model.tflite \
    --accelerator-config=ethos-u55-128 \
    --optimise Performance \
    --config vela.ini \
    --system-config=Ethos-U55_High_End_Embedded \
    --output-dir ./results_dir
The --accelerator-config option specifies which microNPU configuration to use, for example ethos-u55-32, ethos-u55-64, ethos-u55-128, ethos-u55-256, ethos-u65-256, or ethos-u65-512.
Figure 1: Vela Workflow
The output of Vela is an optimized TensorFlow Lite file that is ready to deploy on a system containing an Ethos-U NPU, in this case Arm Virtual Hardware configured with the Corstone-300 FVP.
Normally, you can load the TFLite model from disk using the TensorFlow Lite Interpreter Python API for deployment:
# Load the TFLite model in the TFLite Interpreter
interpreter = tf.lite.Interpreter(model_content=tflite_model)
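As a brief sketch of how an inference would then run with this interpreter (continuing from the code above and assuming numpy is imported as np; the random int8 test input is only a placeholder):

interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed an int8 input matching the quantized model's input shape
test_input = np.random.randint(-128, 128,
                               size=input_details[0]["shape"],
                               dtype=np.int8)
interpreter.set_tensor(input_details[0]["index"], test_input)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])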
However, most microcontrollers do not have a filesystem, so extra code and space would be required to load a model from disk. An effective alternative is to provide the model in a C source file that can be compiled into the binary and loaded directly into memory (a short sketch of this conversion follows below). To do so, you can use the TensorFlow Lite for Microcontrollers C++ library to load the model and make predictions. Another fast and easy way is to use the open-source Arm ML Embedded Evaluation Kit. It enables developers to quickly execute neural network models using the TensorFlow Lite for Microcontrollers inference engine, targeting the Arm Cortex-M55 CPU and Ethos-U NPU.
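As a rough sketch of the first approach, the optimized .tflite file can be embedded as a C array, similar to what the xxd -i utility produces; the file names and array name below are only placeholders:

# Embed a .tflite file as a C array so it can be compiled into the firmware image
def tflite_to_c_source(tflite_path, out_path, array_name="g_model"):
    data = open(tflite_path, "rb").read()
    with open(out_path, "w") as f:
        f.write(f"const unsigned char {array_name}[] = {{\n")
        for i in range(0, len(data), 12):
            f.write("  " + ", ".join(f"0x{b:02x}" for b in data[i:i + 12]) + ",\n")
        f.write("};\n")
        f.write(f"const unsigned int {array_name}_len = {len(data)};\n")

tflite_to_c_source("model_quant_vela.tflite", "model_data.cc")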
The Arm ML Evaluation Kit allows developers to quickly build and deploy embedded machine learning applications for the Arm Cortex-M55 processor and Arm Ethos-U55 NPU. It contains ready-made ML software applications for Ethos-U55 systems, including use cases such as keyword spotting and image classification.
These ready-to-use ML examples let you quickly evaluate the performance of networks running on the Cortex-M CPU and Ethos-U NPU. You can also easily create custom ML software applications for Ethos-U with the generic inference runner available in the evaluation kit. The generic inference runner allows you to feed in any model and obtain performance metrics such as the number of NPU cycles and the amount of memory transactions across the different buses.
The evaluation kit’s software stack contains different layers, with the application on top and its dependencies at the bottom. After the build system is configured for the Ethos-U NPU, the integrated TensorFlow Lite for Microcontrollers runtime, together with the Ethos-U NPU driver, executes the operators that can be accelerated by the Ethos-U NPU. Neural network operators that are not supported on the NPU run on the CPU instead, using either CMSIS-NN, which optimizes CPU workload execution, or the reference kernels provided by the inference engine. The Hardware Abstraction Layer (HAL) sources provide a platform-agnostic API to access hardware platform-specific functions.
Figure 2: ML Evaluation Kit Software Stack
The ML Eval Kit is based on the Arm Corstone-300 reference package, which helps SoC designers build secure systems faster. It maximizes the performance of IoT and embedded devices by taking advantage of the Arm Cortex-M55 processor. The Corstone-300 easily integrates the Ethos-U55, and the platform is available as an Ecosystem FPGA (MPS3) image and as a Fixed Virtual Platform (FVP) to allow development ahead of hardware availability (silicon hardware will be released soon).
Figure 3: ML Evaluation Kit Hardware Stack
FVPs are a digital twin of the MPS3 FPGA image. They enable developers to quickly build and evaluate real-world embedded ML applications on virtual platforms using the Arm Cortex-M55 and Arm Ethos-U55 design.
The Corstone-300 FVP with Ethos-U55 and Ethos-U65 is available as part of Arm Virtual Hardware. Arm Virtual Hardware provides functionally accurate models of Arm-based SoCs for application developers to build and test software before and after silicon and hardware availability, helping to accelerate the development of IoT and endpoint AI applications. It runs as a simple application in the cloud for simulating memory and peripherals, removing the complexity of building and configuring board farms.
A common workflow to build and run the ready-to-use ML Eval Kit examples, such as keyword spotting, on Cortex-M and Ethos-U is as follows:
1. Ensure the following prerequisites are installed and available on the path:
- GNU Arm Embedded Toolchain version 10.2.1 or higher, or Arm Compiler version 6.15 or higher
- CMake version 3.15 or above
- Python 3.6 or above
- Python virtual environment module
- Make
- An Arm Corstone-300 based FVP
2. Clone the Ethos-U evaluation kit repository:
$ git clone https://review.mlplatform.org/ml/ethos-u/ml-embedded-evaluation-kit
$ cd ml-embedded-evaluation-kit
3. Pull all the external dependencies:
$ git submodule update --init
4. Execute build_default.py to configure the build system with default settings, such as the MPS3 FVP target and the Ethos-U55 timing adapter.
a. If using the Arm GNU Embedded Toolchain:
$ python build_default.py
b. If using Arm Compiler:
$ python build_default.py --toolchain arm
5. Compile the project with the make command.
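For example, from inside the generated build directory (the exact directory name depends on the chosen configuration):

$ make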
6. Results of the build are placed under the build/bin directory, for example:
bin
├── ethos-u-<use_case_name>.axf
├── ethos-u-<use_case_name>.htm
├── ethos-u-<use_case_name>.map
└── sectors
    ├── audio.txt
    └── <use_case>
        ├── ddr.bin
        └── itcm.bin
7. Launch the desired application on the FVP, for example using Arm Virtual Hardware. The keyword spotting use case can be started on the Ethos-U55 with the following command:
$ FVP_Corstone_SSE-300_Ethos-U55 -a ./build/bin/ethos-u-kws.axf
Learn more about the different command-line parameters that the FVP supports here.
The ML Eval Kit is also easy to use with a custom workflow and NN model. For example, you can pass a new model instead of MobileNet for image classification, along with its input size. However, to run your specific ML model on the Ethos-U NPU, ensure that your custom model has first been run through the Vela compiler successfully to generate an optimized NN model. Then configure the build system with CMake by creating a build directory and setting the path of the TFLite file generated by Vela. Finally, compile the project with make.
You can use the generic inference runner build option of the ML Eval Kit to profile inference speed for your specific ML applications on Cortex-M55 and Ethos-U55. This can be done by running the following commands:
$ mkdir build && cd build
$ cmake .. \
    -Dinference_runner_MODEL_TFLITE_PATH=TFLITE_PATH \
    -DUSE_CASE_BUILD=inference_runner
See building default configuration for more information on the different parameter options that you can use with cmake.
$ make
Then run the application binary on the FVP with your choice of Ethos-U55 configuration using Arm Virtual Hardware.
Note: the number of MACs configured on the Arm Virtual Hardware FVP must match the --accelerator-config setting used with the Vela compiler.
$ FVP_Corstone_SSE-300_Ethos-U55 -C ethosu.num_macs=128 -a ./build/bin/ethos-u-inference_runner.axf
You can get started with ML software development for the Arm Ethos-U NPU today with the ML Evaluation Kit, the Corstone-300 FVP available in Arm Virtual Hardware, the Arm Vela compiler, and the available ML examples.
Try Arm Virtual Hardware now!
Access credits: https://aws.amazon.com/marketplace/pp/prodview-urbpq7yo5va7g