Arm Community blogs: AI blog
Vela Compiler: The first step to deploy your NN model on the Arm Ethos-U microNPU

Liliya Wu
January 17, 2022
7 minute read time.

Arm Ethos-U microNPU 

There is an explosion of edge and endpoint artificial intelligence (AI) in the world today. To address this wave of edge and endpoint AI devices, Arm has designed the microNPU, a new class of machine learning (ML) processor built specifically to accelerate ML inference in area-constrained embedded and IoT devices. With Arm's range of Ethos-U microNPUs, you can easily build low-cost, highly efficient AI solutions into a wide range of embedded devices and systems based on Arm Cortex and Arm Neoverse. Ethos-U provides a scalable range of performance and memory interfaces, and integrates with low-power Cortex-M SoCs as well as SoCs based on high-performance Arm Cortex-A, Cortex-R, and Arm Neoverse.

Figure 1: Chip diagrams of the Arm Ethos-U microNPU: Ethos-U55, Ethos-U65 in a Cortex-M based system, and Ethos-U65 in a Cortex-A/Neoverse based system

Vela overview 

To deploy your neural network (NN) model on Ethos-U, the first step is to compile your prepared model with Vela. Vela is an open-source Python tool that optimizes a neural network model into a version that can run on an embedded system containing an Ethos-U NPU.

After compilation, the optimized model will contain TensorFlow Lite custom operators for those parts of the model that can be accelerated by the Ethos-U microNPU. Parts of the model that cannot be accelerated are left unchanged and will run on the CPU using an appropriate kernel.  

Figure 2: Operator workflow
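The split illustrated in Figure 2 can be sketched conceptually. This is not Vela's implementation, just a minimal illustration of the idea that operators are partitioned into an NPU-accelerated set and a CPU-fallback set; the operator names and the supported set below are illustrative examples only.

```python
# Conceptual sketch (not the Vela implementation): operators that the
# Ethos-U can accelerate stay on the NPU, everything else falls back to
# the CPU. The supported set here is a made-up, non-exhaustive example.
NPU_SUPPORTED = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED", "ADD", "RELU"}

def partition_operators(ops):
    """Split operator names into (NPU-accelerated, CPU-fallback) lists."""
    npu = [op for op in ops if op in NPU_SUPPORTED]
    cpu = [op for op in ops if op not in NPU_SUPPORTED]
    return npu, cpu

npu_ops, cpu_ops = partition_operators(["CONV_2D", "RELU", "TOPK_V2"])
print(npu_ops)  # ['CONV_2D', 'RELU']
print(cpu_ops)  # ['TOPK_V2']
```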

We are constantly adding to the set of operators Vela supports. To check the operator list that your installed version of Vela supports, type the following command into your computer's CLI window (such as Windows cmd or a Linux terminal). The report is generated in your current working directory and named "SUPPORTED_OPS.md".

vela --supported-ops-report

However, you may notice that some operators in the report have constraints. If the constraints are not met, that operator is scheduled on the CPU. This does not mean the whole model cannot run: only the unsupported parts of your network fall back to the CPU, while the rest runs on Ethos-U. Using the following command, you can check which operators of your model fall back to the CPU.

vela network.tflite --show-cpu-operations

Generic workflow

Vela is the first and essential step to deploy your NN model on Arm Ethos-U microNPUs. In this blog, we show you the generic workflow for compiling your model with Vela.

Vela runs on Linux, macOS, and Microsoft Windows 10. You can easily install it from PyPI with the following command; you can also obtain the source code and more advanced installation methods from the Arm ML platform.

pip3 install ethos-u-vela

Please note that your computer should meet the requirements listed in the "Prerequisites" section before you kick off the installation. You can check the installed version with the following command.

vela --version

The generic workflow is in the following diagram.  

Figure 3: Generic workflow

1. Prepare your NN model

To be accelerated by the Ethos-U microNPU, your network operators must be quantized to either 8-bit or 16-bit (signed). Vela takes as its command-line input a .tflite file containing your quantized neural network.

You can prepare the initial .tflite model in either of two ways:

  • If you already have your own pre-trained models on hand, many existing tools, such as the TensorFlow Model Optimization Toolkit, can help you obtain a properly quantized model. You can also follow our optimization blog to optimize your custom NN model.
  • If you do not have a model on hand, then depending on your specific ML application, the Arm ML Zoo and TensorFlow Hub offer a wide variety of ML models in .tflite format. You can download and use them directly as your own model.
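Whichever route you take, the quantization tools handle the numeric mapping for you, but it helps to understand the underlying int8 affine scheme that TensorFlow Lite uses: a real value x maps to an integer q via q = round(x / scale) + zero_point, clamped to the int8 range. The sketch below uses made-up example values for scale and zero point.

```python
# Illustrative sketch of int8 affine quantization as used by TensorFlow
# Lite. The scale and zero_point values are made-up examples; in a real
# model they are chosen per tensor (or per channel) during quantization.

def quantize_int8(x, scale, zero_point):
    """Map a float x to a signed 8-bit integer, clamped to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_int8(q, scale, zero_point):
    """Recover the approximate float value from its int8 representation."""
    return (q - zero_point) * scale

scale, zero_point = 0.05, 10
q = quantize_int8(1.5, scale, zero_point)   # 40
x = dequantize_int8(q, scale, zero_point)   # 1.5
```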

2. Prepare your Vela configuration file

Vela is a highly customizable offline compilation tool for the Ethos-U series. You can customize various properties of the Ethos-U embedded system, such as memory latencies and bandwidths, by editing the Vela configuration file. We strongly recommend matching it as closely as possible to the real hardware system on which you plan to deploy your NN model.

The Vela configuration file uses the Python ConfigParser .ini file format. It consists mainly of two kinds of sections, System Configuration and Memory Mode, which identify a configuration, together with key/value pair options that specify its properties. Note that all sections and key/value pairs are case-sensitive.

As with the model, there are two ways to prepare your Vela configuration file.

  • Use the default Vela configuration file
    We offer a default "vela.ini" file that describes some generic classes of embedded devices and systems. You can use it directly as your Vela configuration file. The choices offered in the default "vela.ini" file are listed in the following table; for the detailed properties of each choice, check the vela.ini file itself.

    System Configuration and Memory Mode options
    System Configuration:
      Ethos_U55_Deep_Embedded: SRAM (1.6 GB/s) and Flash (0.1 GB/s)
      Ethos_U55_High_End_Embedded: SRAM (4 GB/s) and Flash (0.5 GB/s)
      Ethos_U65_Embedded: SRAM (8 GB/s) and Flash (0.5 GB/s)
      Ethos_U65_Mid_End: SRAM (8 GB/s) and DRAM (3.75 GB/s)
      Ethos_U65_High_End: SRAM (16 GB/s) and DRAM (3.75 GB/s)
      Ethos_U65_Client_Server: SRAM (16 GB/s) and DRAM (12 GB/s)
    Memory Mode:
      Sram_Only: Model static data and the tensor arena are placed in SRAM. The SRAM is shared between the Ethos-U and the Cortex-M.
      Shared_Sram: Model static data is read from flash or DRAM; the tensor arena is placed in SRAM. The SRAM is shared between the Ethos-U and the Cortex-M.
      Dedicated_Sram: The SRAM (384 KB) is for use by the Ethos-U only, as a dedicated carve-out for caching; all model data is placed in DRAM. This memory mode can be used on Ethos-U65 when the SRAM is too small to store model data.
      Dedicated_Sram_512KB: As Dedicated_Sram, but with 512 KB of SRAM dedicated to the Ethos-U.
    Note: The choices shown are those offered in Vela 3.2.0 and are version-specific.

  • Use custom Vela configuration file

    If the generic configuration choices in the current Vela version do not meet your hardware system requirements, you can write a custom Vela configuration file and pass its path with the following command. Refer to the detailed instructions in the "Configuration File" section to complete your custom Vela configuration file. The settings should be aligned with a driver programming the region configuration registers, which control which AXI port is used for model data access (see the Ethos-U programmers model for more details).

    vela network.tflite --config your_vela_configuration_file.ini
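Since the file is standard ConfigParser syntax, you can also generate it programmatically. The sketch below writes a custom configuration with Python's configparser; the section naming ("System_Config.<name>", "Memory_Mode.<name>") and the key names follow the pattern used in the default vela.ini shipped with Vela 3.x, but treat the exact keys and values as placeholders and check your version's default vela.ini for the authoritative set.

```python
# Sketch: generate a custom Vela configuration file with configparser.
# Section and key names follow the pattern of the default vela.ini in
# Vela 3.x; the specific values are placeholders, not real hardware data.
import configparser

config = configparser.ConfigParser()
config.optionxform = str  # Vela's sections and keys are case-sensitive

config["System_Config.My_Embedded_System"] = {
    "core_clock": "400e6",        # NPU core clock in Hz (example value)
    "axi0_port": "Sram",          # memory attached to AXI port 0
    "axi1_port": "OffChipFlash",  # memory attached to AXI port 1
    "Sram_clock_scale": "1.0",
    "OffChipFlash_clock_scale": "0.125",
}
config["Memory_Mode.My_Shared_Sram"] = {
    "const_mem_area": "Axi1",  # read-only model data
    "arena_mem_area": "Axi0",  # scratch/tensor arena
    "cache_mem_area": "Axi0",
}

with open("my_vela.ini", "w") as f:
    config.write(f)
```

You would then pass the generated file with `vela network.tflite --config my_vela.ini`, selecting the new sections via `--system-config` and `--memory-mode`.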

3. Configure and run

Vela provides many command-line interface (CLI) options to configure each specific compilation. A verbose and detailed description can be found in the "Command Line Interface" section.

Besides the required "Network" parameter, it is essential to set the following key parameter options correctly to reflect your real hardware platform. If you do not specify them, Vela uses internal default values, which are version-specific; refer to each version's "Vela Options" documentation to find the default value of each parameter.

Key configuration parameters
Network (required): Filename of the network model to compile. The file must be a .tflite file.
Output Directory: The output directory for the optimized network model.
Config: The path to the Vela configuration file.
Accelerator Configuration: The hardware accelerator configuration to compile for. The format is the accelerator name, followed by a hyphen, followed by the number of MACs.
System Config: The system configuration to use, as specified in the Vela configuration file.
Memory Mode: The memory mode to use, as specified in the Vela configuration file.
Optimise: The optimization strategy.

Currently, we offer the following choices for the hardware accelerator configuration and the optimization strategy.

Accelerator Configuration and Optimisation strategy choices
Accelerator Configuration:
  ethos-u55-32: Ethos-U55 with 32 MACs.
  ethos-u55-64: Ethos-U55 with 64 MACs.
  ethos-u55-128: Ethos-U55 with 128 MACs.
  ethos-u55-256: Ethos-U55 with 256 MACs.
  ethos-u65-256: Ethos-U65 with 256 MACs.
  ethos-u65-512: Ethos-U65 with 512 MACs.
Optimise:
  Size: Minimizes SRAM usage (does not use the arena cache memory area size).
  Performance: Maximizes performance (uses the arena cache memory area size if specified, either with the CLI option or in the Vela configuration file).

An example configuration and invocation follows:

vela network.tflite \
--output-dir ./output \
--accelerator-config ethos-u55-256 \
--optimise Performance \
--config vela.ini \
--system-config Ethos_U55_High_End_Embedded \
--memory-mode Shared_Sram

After running the above command, you will find the optimized model in your specified directory "./output"; the output file name carries the "_vela.tflite" suffix. Meanwhile, your computer's console window presents a log of the Vela compilation process.

Sometimes warnings appear in the log. Read them carefully: they indicate the decisions the compiler has made to create the optimized network, such as which operators fall back to the CPU.
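If you capture the log to a file, these warnings can be scanned automatically. The sketch below is a minimal example of that idea; the log lines are invented for illustration, and the exact wording of Vela's warnings varies between versions, so adapt the pattern to the log your version produces.

```python
# Sketch: scan a Vela compilation log for operators that fell back to
# the CPU. The sample log text below is invented for illustration; the
# real warning wording is version-specific, so adjust the regex to match.
import re

log = """\
Warning: Unsupported TensorFlow Lite semantics for TOPK_V2 'scores'. Placing on CPU instead
Warning: Unsupported TensorFlow Lite semantics for RESIZE_NEAREST_NEIGHBOR 'up'. Placing on CPU instead
Network summary for network
"""

cpu_ops = re.findall(r"Unsupported TensorFlow Lite semantics for (\w+)", log)
print(sorted(set(cpu_ops)))  # ['RESIZE_NEAREST_NEIGHBOR', 'TOPK_V2']
```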

Try it out today

As your first step in deploying your NN model on Ethos-U, the Vela compiler is open source and easy to use. Try it out today to experience the machine learning capability that Arm's Ethos-U brings to an embedded system.

Access Ethos-U Vela
