The Cortex-M55 processor is Arm’s most AI-capable Cortex-M processor and the first CPU based on Arm’s Helium technology. While the Cortex-M55 alone is fast enough to run ML models on tiny microcontrollers, pairing it with the Arm Ethos-U55 microNPU can accelerate ML inference in embedded systems by up to 480x.
The Ethos-U55 is a machine learning processor optimized to execute common mathematical ML operations such as convolutions and activation functions. The Ethos-U processor supports popular neural network models such as CNNs and RNNs for audio processing, speech recognition, image classification, and object detection.
To run your inference on the Ethos-U NPU, it is essential to quantize the network operators to either 8-bit (unsigned or signed) or 16-bit (signed), because Ethos-U supports only 8-bit weights and 8-bit or 16-bit activations. The TensorFlow Model Optimization Toolkit enables developers to optimize ML models for deployment on devices with tight memory, power, and storage constraints. It provides several optimization techniques, including quantization, pruning, and clustering, all of which are compatible with TensorFlow Lite. For example, you can apply post-training integer quantization to convert the weights and activations from floating-point to integer numbers after loading the converted model with TFLiteConverter. Note that once a model is pruned or clustered, a short fine-tuning pass is usually performed afterwards to recover the lost accuracy (a minimal pruning sketch is shown below), so you need to make a trade-off between model complexity, size, and accuracy.
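For illustration, a minimal pruning sketch using the TensorFlow Model Optimization Toolkit could look like the following. The Keras model (model) and training data (train_images, train_labels) are placeholders, and the short fit() call is the fine-tuning pass mentioned above:

import tensorflow_model_optimization as tfmot

# Wrap the model so that low-magnitude weights are gradually zeroed out
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5,
        begin_step=0, end_step=1000)
}
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)

# Short fine-tuning pass to recover the accuracy lost to pruning
model_for_pruning.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])
model_for_pruning.fit(train_images, train_labels, epochs=2,
                      callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before converting to TensorFlow Lite
model = tfmot.sparsity.keras.strip_pruning(model_for_pruning)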
Optimize your model using post-training quantization:
import numpy as np
import tensorflow as tf

def representative_dataset():
    for _ in range(100):
        # Using some random data for testing purposes
        data = np.random.rand(1, 244, 244, 3)
        yield [data.astype(np.float32)]

# Load the saved model with the TFLite converter
converter = tf.lite.TFLiteConverter.from_saved_model("model_tf")

# Set options for full integer post-training quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Set the input and output tensors to int8
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert to TFLite
tflite_model_quant = converter.convert()
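The quantized model can then be written to a .tflite file so it can be passed to the Vela compiler in the next step (the file name model_quant.tflite is just an example):

# Save the quantized model to disk for the Vela compiler
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model_quant)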
To deploy your NN model on Ethos-U, you need to compile the trained, quantized model with Vela to generate an optimized NN model for Ethos-U. Vela is an open-source Python tool that compiles a TFLite NN model into an optimized version that can run on an embedded system containing an Arm Ethos-U NPU. You can install Vela with the command $ pip install ethos-u-vela and then compile the network for a particular Ethos-U NPU configuration, such as ethos-u55-128, by running the following command line. Read more about the different Vela command-line options at https://pypi.org/project/ethos-u-vela/.
$ vela /path/to/model.tflite \
    --accelerator-config=ethos-u55-128 \
    --optimise Performance \
    --config vela.ini \
    --system-config=Ethos-U55_High_End_Embedded \
    --output-dir ./results_dir
The --accelerator-config option specifies which microNPU configuration to use, for example ethos-u55-32, ethos-u55-64, ethos-u55-128, ethos-u55-256, ethos-u65-256, or ethos-u65-512.
Figure 1: Vela Workflow
The output of Vela is an optimized TensorFlow Lite file that is ready to deploy on a system containing an Ethos-U NPU, in this case Arm Virtual Hardware configured with the Corstone-300 FVP.
Normally, you can load the TFLite model from disk using the TensorFlow Lite Interpreter Python API for deployment:
# Load the TFLite model in the TFLite Interpreter
interpreter = tf.lite.Interpreter(model_content=tflite_model)
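As a brief sketch of how an inference would then run with this interpreter (continuing from the code above and assuming numpy is imported as np; the random int8 test input is only a placeholder):

interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed an int8 input matching the quantized model's input shape
test_input = np.random.randint(-128, 128,
                               size=input_details[0]["shape"],
                               dtype=np.int8)
interpreter.set_tensor(input_details[0]["index"], test_input)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])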
However, most microcontrollers do not have a filesystem, so extra code and space would be required to load a model from disk. An effective alternative is to provide the model in a C source file that can be compiled into the binary and loaded directly into memory (a short sketch of this conversion follows below). To do so, you can use the TensorFlow Lite for Microcontrollers C++ library to load the model and make predictions. Another fast and easy way is to use the open-source Arm ML Embedded Evaluation Kit. It enables developers to quickly execute neural network models using the TensorFlow Lite for Microcontrollers inference engine, targeting the Arm Cortex-M55 CPU and Ethos-U NPU.
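As a rough sketch of the first approach, the optimized .tflite file can be embedded as a C array, similar to what the xxd -i utility produces; the file names and array name below are only placeholders:

# Embed a .tflite file as a C array so it can be compiled into the firmware image
def tflite_to_c_source(tflite_path, out_path, array_name="g_model"):
    data = open(tflite_path, "rb").read()
    with open(out_path, "w") as f:
        f.write(f"const unsigned char {array_name}[] = {{\n")
        for i in range(0, len(data), 12):
            f.write("  " + ", ".join(f"0x{b:02x}" for b in data[i:i + 12]) + ",\n")
        f.write("};\n")
        f.write(f"const unsigned int {array_name}_len = {len(data)};\n")

tflite_to_c_source("model_quant_vela.tflite", "model_data.cc")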
The Arm ML Evaluation Kit allows developers to quickly build and deploy embedded machine learning applications for the Arm Cortex-M55 processor and Arm Ethos-U55 NPU. It contains ready-made ML software applications for Ethos-U55 systems, including use cases such as keyword spotting and image classification.
These ready-to-use ML examples let you quickly evaluate the performance of networks running on the Cortex-M CPU and Ethos-U NPU. You can also easily create custom ML software applications for Ethos-U with the generic inference runner available in the evaluation kit. The generic inference runner allows you to feed in any model and obtain performance metrics such as the number of NPU cycles and the amount of memory transactions across the different buses.
The evaluation kit’s software stack contains different layers, with the application on top and its dependencies at the bottom. After the build system is configured for the Ethos-U NPU, the integrated TensorFlow Lite for Microcontrollers runtime, together with the Ethos-U NPU driver, executes the operators that can be accelerated by the Ethos-U NPU. Neural network operators that are not supported on the NPU run on the CPU instead, using either CMSIS-NN, which optimizes CPU workload execution, or the reference kernels provided by the inference engine. The Hardware Abstraction Layer (HAL) sources provide a platform-agnostic API to access hardware platform-specific functions.
Figure 2: ML Evaluation Kit Software Stack
The ML Eval Kit is based on the Arm Corstone-300 reference package, which helps SoC designers build secure systems faster. It maximizes the performance of IoT and embedded devices by taking advantage of the Arm Cortex-M55 processor. The Corstone-300 easily integrates the Ethos-U55, and the platform is available as an Ecosystem FPGA (MPS3) image and as a Fixed Virtual Platform (FVP) to allow development ahead of hardware availability (silicon hardware will be released soon).
Figure 3: ML Evaluation Kit Hardware Stack
FVPs are a digital twin of the MPS3 FPGA image. They enable developers to quickly build and evaluate real-world embedded ML applications on virtual platforms using the Arm Cortex-M55 and Arm Ethos-U55 design.
The Corstone-300 FVP with Ethos-U55 and Ethos-U65 is available as part of Arm Virtual Hardware. Arm Virtual Hardware provides functionally accurate models of Arm-based SoCs for application developers to build and test software before and after silicon and hardware availability, helping to accelerate the development of IoT and endpoint AI applications. It runs as a simple application in the cloud for simulating memory and peripherals, removing the complexity of building and configuring board farms.
A common workflow to build and run the ready-to-use ML Eval Kit examples, such as keyword spotting, on Cortex-M and Ethos-U is as follows:
1. Ensure the following prerequisites are installed and available on the path:
- GNU Arm Embedded Toolchain version 10.2.1 or higher, or Arm Compiler version 6.15 or higher
- CMake version 3.15 or above
- Python 3.6 or above
- Python virtual environment module
- Make
- An Arm Corstone-300 based FVP
2. Clone the Ethos-U evaluation kit repository:
$ git clone https://review.mlplatform.org/ml/ethos-u/ml-embedded-evaluation-kit
$ cd ml-embedded-evaluation-kit
3. Pull all the external dependencies:
$ git submodule update --init
4. Execute build_default.py to configure the build system with default settings, such as the MPS3 FVP target and the Ethos-U55 timing adapter.
a. If using the Arm GNU Embedded Toolchain:
$ python build_default.py
b. If using Arm Compiler:
$ python build_default.py --toolchain arm
5. Compile the project with the make command.
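For example, from inside the generated build directory (the exact directory name depends on the chosen configuration):

$ make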
6. Results of the build are placed under the build/bin directory, for example:
bin
├── ethos-u-<use_case_name>.axf
├── ethos-u-<use_case_name>.htm
├── ethos-u-<use_case_name>.map
└── sectors
    ├── audio.txt
    └── <use_case>
        ├── ddr.bin
        └── itcm.bin
7. Launch the desired application on the FVP, for example using Arm Virtual Hardware. The keyword spotting use case can be started on the Ethos-U55 with the following command:
$ FVP_Corstone_SSE-300_Ethos-U55 -a ./build/bin/ethos-u-kws.axf
Learn more about the different command-line parameters that the FVP supports here.
The ML Eval Kit is also easy to use with a custom workflow and NN model. For example, you can pass a new model instead of MobileNet for image classification, along with its input size. However, to run your specific ML model on the Ethos-U NPU, ensure that your custom model has first been run through the Vela compiler successfully to generate an optimized NN model. Then configure the build system with CMake by creating a build directory and setting the path of the TFLite file generated by Vela. Finally, compile the project with make.
You can use the generic inference runner build option of the ML Eval Kit to profile inference speed for your specific ML applications on Cortex-M55 and Ethos-U55. This can be done by running the following commands:
$ mkdir build && cd build
$ cmake .. \
    -Dinference_runner_MODEL_TFLITE_PATH=TFLITE_PATH \
    -DUSE_CASE_BUILD=inference_runner
See building default configuration for more information on the different parameter options that you can use with cmake.
$ make
Then run the application binary on the FVP with your choice of Ethos-U55 configuration using Arm Virtual Hardware.
Note: the number of MACs configured on the Arm Virtual Hardware FVP must match the --accelerator-config setting used with the Vela compiler.
$ FVP_Corstone_SSE-300_Ethos-U55 -C ethosu.num_macs=128 -a ./build/bin/ethos-u-inference_runner.axf
You can get started with ML software development for the Arm Ethos-U NPU today with the ML Evaluation Kit, the Corstone-300 FVP available in Arm Virtual Hardware, the Arm Vela compiler, and the available ML examples.
Try Arm Virtual Hardware now!
Access credits: https://aws.amazon.com/marketplace/pp/prodview-urbpq7yo5va7g