There is an explosion of edge and endpoint artificial intelligence (AI) in the world today. To address this wave of edge and endpoint AI devices, Arm has designed the microNPU, a new class of machine learning (ML) processor built specifically to accelerate ML inference in area-constrained embedded and IoT devices. With Arm’s range of Ethos-U microNPUs, you can easily build low-cost, highly efficient AI solutions into a wide range of embedded devices and systems. Ethos-U provides a scalable range of performance and memory interfaces, and integrates with low-power Cortex-M SoCs as well as SoCs based on high-performance Arm Cortex-A, Cortex-R, and Arm Neoverse.
Figure 1: Chip diagram of the Arm Ethos-U microNPU
To deploy your neural network (NN) model on Ethos-U, the first step is to compile your prepared model with Vela. Vela is an open-source Python tool that optimizes a neural network model into a version that can run on an embedded system containing an Ethos-U NPU.
After compilation, the optimized model will contain TensorFlow Lite custom operators for those parts of the model that can be accelerated by the Ethos-U microNPU. Parts of the model that cannot be accelerated are left unchanged and will run on the CPU using an appropriate kernel.
Figure 2: Operators workflow
We are constantly working to expand the set of operators that Vela supports. To check the list of operators that Vela currently supports, run the following command in your terminal (such as Windows cmd or a Linux shell) after you have installed Vela. The report is generated in your current working directory and is named “SUPPORTED_OPS.md”.
vela --supported-ops-report
However, you may notice that some operators in the report have constraints. If the constraints are not met, that operator is scheduled on the CPU. This does not mean the whole model cannot run: only the unsupported parts of your network fall back to the CPU, while the rest runs on the Ethos-U. Using the following command, you can check which operators of your model fall back to the CPU.
vela network.tflite --show-cpu-operations
Running Vela is the first and essential step in deploying your NN model on Arm Ethos-U microNPUs. In this blog, we show you the generic workflow of using Vela to compile your model.
Vela runs on Linux, macOS, and Microsoft Windows 10. You can easily install it from PyPI with the following command; you can also obtain the source code and find more advanced installation methods on the Arm ML Platform.
pip3 install ethos-u-vela
Please note that your computer should meet the prerequisites listed in the “Prerequisites” section before you kick off the installation process. To check the installed version, run the following command.
vela --version
The generic workflow is shown in the following diagram.
Figure 3: Generic workflow
To be accelerated by the Ethos-U microNPU, your network operators must be quantized to either 8-bit or 16-bit (signed). Vela is run with an input .tflite file, passed on the command line, which contains your prepared neural network.
You can prepare the initial .tflite model in either of the following two ways (a short quantization sketch follows this list):
1. Train your model from the start with quantization-aware training, then convert it to TensorFlow Lite.
2. Take an existing trained floating-point model and apply post-training quantization when converting it to TensorFlow Lite.
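For the second way, the following minimal sketch shows full-integer post-training quantization with the TensorFlow Lite converter. It assumes TensorFlow 2.x; the model path and the random calibration data are placeholders for your own trained model and representative inputs:

import numpy as np
import tensorflow as tf

# Placeholders: load your own trained model and use real calibration samples.
trained_model = tf.keras.models.load_model("my_model.h5")  # hypothetical path
representative_images = np.random.rand(100, 224, 224, 3).astype("float32")

def representative_dataset():
    # Calibration samples let the converter choose the quantization ranges.
    for image in representative_images:
        yield [image[None, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(trained_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full 8-bit integer quantization, as required for Ethos-U acceleration.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("network.tflite", "wb") as f:
    f.write(converter.convert())

The resulting network.tflite is the input you pass to Vela in the next step.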
Vela is a highly customizable offline compilation tool for the Ethos-U series. You can easily customize various properties of the Ethos-U embedded system, like memory latencies and bandwidths, by editing the Vela configuration file. We strongly recommend making the configuration match, as closely as possible, the real hardware system on which you plan to deploy your NN model.
The Vela configuration file uses the Python ConfigParser .ini file format. The file mainly consists of two kinds of sections, System Configuration and Memory Mode, which identify a configuration, together with key/value pair options that specify the properties. Note that all sections and key/value pairs are case-sensitive.
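For illustration, here is a minimal sketch of such a file. The section layout and key names follow the format of the default vela.ini, but the configuration names and values are placeholders rather than measurements of real hardware:

[System_Config.My_Embedded_System]
core_clock=500e6
axi0_port=Sram
axi1_port=OffChipFlash
Sram_clock_scale=1.0
OffChipFlash_clock_scale=0.25

[Memory_Mode.My_Shared_Sram]
const_mem_area=Axi1
arena_mem_area=Axi0
cache_mem_area=Axi0

Here the System Configuration section describes the clocks and the memories attached to the two AXI ports, while the Memory Mode section describes how the model’s constant data and working buffers are placed in those memories.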
Similarly, you can prepare your Vela configuration file in either of the following two ways.
1. Use the default Vela configuration file. We offer a default "vela.ini" file that describes some generic classes of embedded devices and systems. You can use it directly as your Vela configuration file; the generic choices it currently offers, and the detailed properties of each choice, can be found in the vela.ini file itself.
2. Write a custom Vela configuration file. If the existing generic configuration choices in the current Vela version do not meet your hardware system requirements, you can write your own Vela configuration file and use the following command to specify the path to it. Refer to the detailed writing instructions in the "Configuration File" section to complete your custom Vela configuration file. The settings should be aligned with the driver programming the region configuration registers, which control which AXI port is used for model data access (see the Ethos-U programmer’s model for more details).
vela network.tflite --config your_vela_configuration_file.ini
Vela provides a rich command-line interface (CLI) to configure each compilation. A detailed description of every option can be found in the "Command Line Interface" section.
Besides the required network argument, it is essential to set the following key options correctly so that they reflect the real hardware platform: --accelerator-config, --optimise, --config, --system-config, and --memory-mode. If you do not specify these options, Vela runs with internal default values, which are version-specific. Refer to each version’s “Vela Options” documentation to find the default value of each parameter.
Currently, the hardware accelerator configuration (--accelerator-config) can be one of ethos-u55-32, ethos-u55-64, ethos-u55-128, ethos-u55-256, ethos-u65-256, or ethos-u65-512, and the optimization strategy (--optimise) can be either Size or Performance.
A complete configuration and invocation example looks as follows:
vela network.tflite \
  --output-dir ./output \
  --accelerator-config ethos-u55-256 \
  --optimise Performance \
  --config vela.ini \
  --system-config Ethos_U55_High_End_Embedded \
  --memory-mode Shared_Sram
After running the command above, you will obtain the optimized output model in your specified directory "./output". The output file name ends with _vela.tflite (here, network_vela.tflite). Meanwhile, your computer’s console presents a log of the Vela compilation process.
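If you want to verify what the compiler produced, you can list the operators in the output model. The following is a minimal sketch, assuming TensorFlow 2.9 or later and the example output path used above; parts of the model that were mapped to the microNPU show up as the "ethos-u" custom operator:

import tensorflow as tf

# Print every operator in the compiled model; accelerated subgraphs appear
# as the "ethos-u" custom operator, everything else runs on the CPU.
tf.lite.experimental.Analyzer.analyze(model_path="./output/network_vela.tflite")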
Sometimes warnings appear in the log. Take a careful look at them: they indicate the decisions that the compiler made to create the optimized network, such as which operators fall back to the CPU.
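Because Vela is a normal command-line tool, it also fits naturally into a scripted build or CI pipeline. The following minimal Python sketch, reusing the example options from above, simply wraps the CLI call and fails the build if compilation does not succeed:

import subprocess

# Run the Vela CLI with the same options as the example invocation above;
# check=True raises CalledProcessError if Vela exits with a non-zero status.
subprocess.run(
    [
        "vela", "network.tflite",
        "--output-dir", "./output",
        "--accelerator-config", "ethos-u55-256",
        "--optimise", "Performance",
        "--config", "vela.ini",
        "--system-config", "Ethos_U55_High_End_Embedded",
        "--memory-mode", "Shared_Sram",
    ],
    check=True,
)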
As the first step in deploying your NN model on Ethos-U, the Vela compiler is open source and easy to use. Try it out today to experience the machine-learning performance that Arm’s Ethos-U brings to an embedded system.
[CTAToken URL = "https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela" target="_blank" text="Access Ethos-U Vela" class ="green"]