I have a question about how to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I know ethos-u-linux-driver-stack is the Ethos-U kernel driver. Should it be integrated into the Linux OS running on Cortex-A, or into the Linux OS running on Cortex-M? I am not clear about which core it needs to run on.
2. For ethos-u-core-software, how do I run it? I didn't find detailed steps for running it. Does it run on the NPU or on one of the cores?
3. Apart from the above two repos, is any other repo necessary to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor?
Thanks in advance for your suggestions.
Hi, Kristofer, I have two more questions. Please give me some guidance.
1. According to my understanding, ethos-u/core_software/core_driver/ is the Ethos-U hardware driver and handles the register configuration. I want to know whether it is sufficient to initialize and operate the Ethos-U. For example, is there any firmware that needs to run on the Ethos-U itself?
2. I am using ethos-u/ tag 20.08 as a base. For core_software, my build commands are as below.
cmake .. -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchain/arm-none-eabi-gcc.cmake -DCMAKE_SYSTEM_PROCESSOR=cortex-m33
make
I get all the static libraries. I want to know whether, with the above commands, the generated libtensorflow-microlite.a uses the Ethos-U instead of the Cortex-M, and whether libethosu_core_driver.a is called.
Thanks.
Hi
1. There is no other firmware needed to drive the Arm Ethos-U NPU. The entry point for running a subgraph is ethosu_invoke_v2(), which takes a pointer to driver actions and an array of base addresses.
The Arm Ethos-U supports power save mode, which may result in loss of power between inferences. Because of this the driver will always do a complete setup of all necessary registers before every inference.
Running an inference requires the TFLu framework. The inference process shows how to set up the model and the arena before invoking the framework. The connection between TFLu and the driver is implemented in tensorflow/lite/micro/kernels/ethos-u/ethosu.cc.
inference process -> TFLu framework -> ethosu.cc -> ethosu_driver.c
TFLu operators that are not supported by the Arm Ethos-U NPU will be executed on the Arm Cortex-M. For an inference this might mean that execution ping-pongs between running operators on the NPU and on the CPU. In the ideal case the complete inference would run on the NPU.
2. We had some issues with patches not making it into TFLu in time for the release, so I suspect that there might be minor compilation issues with the 20.08 tag. I did, however, just try the following and it worked.
$ git clone "https://review.mlplatform.org/ml/ethos-u/ethos-u"
$ cd ethos-u
$ ./fetch_externals.py fetch
$ mkdir core_software/build
$ cd core_software/build
$ cmake .. -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchain/arm-none-eabi-gcc.cmake -DCMAKE_SYSTEM_PROCESSOR=cortex-m33 && make -j8
The commands for building TFLu can be found in core_software/tensorflow.cmake. With the current setup libethosu_core_driver.a is built separately and passed as ETHOSU_DRIVER_LIBS to the TFLu build system. The driver will consequently not be linked into libtensorflow-microlite.a, but is provided as a separate library. In other words you need to pass both libtensorflow-microlite.a and libethosu_core_driver.a to the linker when you build your firmware binary.
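As a side note to item 1, the ping-pong between the NPU and the CPU can be pictured with a small, self-contained C++ sketch. This is not TFLu code: the Op struct, the operator names and the npu_supported flag are all made up for illustration, and the sketch only counts how often execution would change sides for a given operator sequence.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Where an operator executes. Purely illustrative; not a TFLu type.
enum class Target { Npu, Cpu };

// Hypothetical operator description: a name plus a flag saying whether
// the NPU can execute it. Real support depends on the operator and its
// parameters, and is not decided by a flag like this.
struct Op {
    std::string name;
    bool npu_supported;
};

// Supported operators go to the NPU, everything else falls back to the CPU.
Target target_for(const Op &op) {
    return op.npu_supported ? Target::Npu : Target::Cpu;
}

// Count how many times execution hands over between NPU and CPU while
// walking the operators in order. Zero means the whole sequence runs on
// one side only.
int count_handovers(const std::vector<Op> &graph) {
    int handovers = 0;
    for (std::size_t i = 1; i < graph.size(); ++i) {
        if (target_for(graph[i]) != target_for(graph[i - 1])) {
            ++handovers;
        }
    }
    return handovers;
}
```

For a sequence like {op_a supported, op_b unsupported, op_c supported, op_d unsupported}, count_handovers() returns 3, while a sequence of only supported operators returns 0, which corresponds to the ideal case described above.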
Hi, Kristofer,
Thanks for your reply.
For item 2, yes, I ran into some compilation issues with the 20.08 tag too. I made some changes and got it to compile successfully. If it is not necessary to update to the latest version, I will keep working on the 20.08 tag, ok?
For libtensorflow-microlite.a and libethosu_core_driver.a, I have integrated both of them into FreeRTOS, and the linker links both of them. But I am not sure whether I need to do any other configuration, such as adding the "ETHOSU_DRIVER_LIBS=" you mentioned, to make the process execute as TFLu framework -> ethosu.cc -> ethosu_driver.c. Do I need any other configuration? If so, please tell me how to do it in the linker.
The commands we use to build libtensorflow-microlite.a are defined in core_software/tensorflow.cmake. You can find them in the build log if you call 'make VERBOSE=1' when building core software. I guess all arguments are important, but the argument triggering Arm Ethos-U to be enabled is TAGS="ethos-u cmsis-nn".
The TensorFlow Lite Micro build system is spread out across a number of Makefiles. The most important ones for us are tensorflow/lite/micro/tools/make/Makefile and tensorflow/lite/micro/tools/make/ext_libs/ethosu.inc.
If you take a look at ethosu.inc you will see that ETHOSU_DRIVER_LIBS controls whether the driver sources are built directly into libtensorflow-microlite.a, or whether they are excluded and you need to build libethosu_core_driver.a yourself and provide the library to the linker of your binary.
Running a tflite model on the Arm Ethos-U of course requires the tflite file to be optimized by Vela. Otherwise it will run on the Cortex-M only.
Hi, Kristofer, thanks a lot for your guidance.
1. I checked the build log for generating libtensorflow-microlite.a and libethosu_core_driver.a. Part of the log is shown below.
CMSIS_PATH=xxx/ethos-u_20.08_build/core_software/cmsis ETHOSU_DRIVER_PATH=xxx/ethos-u_20.08_build/core_software/core_driver ETHOSU_DRIVER_LIBS=xxx/ethos-u_20.08_build/core_software/build/core_driver/libethosu_core_driver.a ETHOSU_FAST_MEMORY_SIZE=0 TAGS="cmsis-nn ethos-u" BUILD_TYPE=release
According to the log, Arm Ethos-U is enabled by the TAGS="ethos-u cmsis-nn" argument. The driver source is built as libethosu_core_driver.a. Anyway, I have linked both libtensorflow-microlite.a and libethosu_core_driver.a into the binary. I guess the process is TFLu framework -> ethosu.cc -> ethosu_driver.c.
2. I have another question. I saw that the entry point ethosu_invoke is called in ethosu.cc. In ethosu_driver.c there are other necessary driver functions, such as ethosu_init and ethosu_irq_handler, but I didn't find where these driver functions are called. Shouldn't they be executed?
3. I think the Ethos-U register base address and IRQ number need to be used in the driver. For example, the register base address is needed for ethosu_init and ethosu_invoke. What is the preferred way to pass these values to these functions?
Actually, I don't know how much work I need to do for this core_driver. Please give me some guidance.
Up until recently there were no Arm Ethos-U compatible platforms available in the public domain. Because of this we have only published platform generic software components like applications, frameworks and drivers.
Now that the Corstone-300 + Ethos-U has been published we will be able to upstream target-specific code as well. This code will demonstrate how to set up the interrupt vector, how to initialize drivers and how to link the software into a binary. We have created ethos-u-core-platform, which will be populated with examples in the next month or two.
https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps
2. Code for setting up the platform is not present in core software. Examples of this will later on be published in ethos-u-core-platform. This code will for example show how to set up the interrupt vector and how to call ethosu_init_...().
3. There are many options for how the base address could be set. The address could be hard coded in the code; a build system could set a define; a build system could generate a header file with a variable or a define; etc. I can't say which would be the preferred way, only that there are several options that would solve the problem.
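To make item 3 a bit more concrete, here is a minimal C++ sketch of the "build system sets a define" option with a hard-coded fallback. Everything in it is an assumption for illustration: the macro names ETHOSU_BASE_ADDRESS and ETHOSU_IRQ, the NpuPlatformConfig struct and the placeholder values are hypothetical and are not taken from the driver or from any real platform.

```cpp
#include <cstdint>

// A build system could pass the real values on the compiler command line,
// for example -DETHOSU_BASE_ADDRESS=<address> -DETHOSU_IRQ=<number>.
// The fallbacks below are placeholders only, NOT addresses or interrupt
// numbers of any real device.
#ifndef ETHOSU_BASE_ADDRESS
#define ETHOSU_BASE_ADDRESS 0x40000000u
#endif

#ifndef ETHOSU_IRQ
#define ETHOSU_IRQ 16
#endif

// Hypothetical container for the platform-specific values, so that the
// rest of the firmware does not depend on the macros directly.
struct NpuPlatformConfig {
    std::uintptr_t base_address;  // start of the NPU register block
    int irq;                      // interrupt line used by the NPU
};

// Single place where the build-time configuration is turned into data.
constexpr NpuPlatformConfig npu_platform_config() {
    return NpuPlatformConfig{
        static_cast<std::uintptr_t>(ETHOSU_BASE_ADDRESS), ETHOSU_IRQ};
}
```

Platform start-up code could then read npu_platform_config() once and hand the values to whatever driver initialization and interrupt set-up the target needs. Keeping the macros in one translation unit means a switch to a generated header later would only touch this file.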
Hi, Kristofer, I have two questions about running the core software on Cortex-M.
1. Has the core software (wrapper application + tflite micro + ethos-u driver) been verified on any Cortex-M core? Do you use the models on https://www.tensorflow.org/lite/guide/hosted_models#automl_mobile_models for verification?
2. I have tried the core software on Cortex-M7. When calling interpreter.AllocateTensors() in applications/inference_process/src/inference_process.cc, it returns kTfLiteError. What's your suggestion about it?
In case you are interested we have just uploaded code to Core Platform. It demonstrates how the Arm Ethos-U driver stack including FreeRTOS can be built for Corstone-300.
Please use fetch_externals.py to download all repositories and follow the instructions in core_software/README.md on how to build with either ArmClang or GCC.
https://git.mlplatform.org/ml/ethos-u/ethos-u.git
1. You should in theory be able to build Core Software for any Arm Cortex-M. However, not all variants are built and tested, because some are expected to be too weak for running ML workloads, so I assume that the smaller cores would need some minor adjustments to build. The driver stack is tested with a wide range of network models. I don't know for sure where they originate from.
2. Hard to tell for sure, but my guess is that the Tensor Arena might be too small. The required Tensor Arena size varies a lot from network to network.
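To illustrate why a too small arena makes allocation fail, here is a deliberately simplified C++ sketch. It is not how TFLu plans memory: the real planner can reuse memory between tensors and also places its own bookkeeping data in the arena, so real requirements differ. The 16-byte alignment and the function names are assumptions made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Alignment applied to every tensor buffer. 16 bytes is an assumption
// for this sketch, not a value taken from TFLu.
constexpr std::size_t kAlignment = 16;

// Round a size up to the next multiple of kAlignment.
constexpr std::size_t align_up(std::size_t bytes) {
    return (bytes + kAlignment - 1) / kAlignment * kAlignment;
}

// Bytes needed if every tensor gets its own aligned slot (no reuse).
std::size_t required_bytes(const std::vector<std::size_t> &tensor_bytes) {
    std::size_t total = 0;
    for (std::size_t bytes : tensor_bytes) {
        total += align_up(bytes);
    }
    return total;
}

// Simplified stand-in for the allocation step: succeeds only if all
// tensors fit into the arena, otherwise reports failure. A failure here
// plays the role of AllocateTensors() returning an error.
bool fits_in_arena(const std::vector<std::size_t> &tensor_bytes,
                   std::size_t arena_bytes) {
    return required_bytes(tensor_bytes) <= arena_bytes;
}
```

With tensors of 100, 200 and 50 bytes, required_bytes() gives 112 + 208 + 64 = 384, so fits_in_arena() fails for a 256-byte arena and succeeds for a 384-byte one. On a real target, a practical approach is to increase the arena size until allocation succeeds and then trim it back.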
Hi, Kristofer
Thanks for your reply. I have fixed the issues. Now the flow inference process -> TFLu framework -> ethosu.cc -> ethosu_driver.c works.
As there is no real hardware available yet to verify the Ethos-U driver, I want to run the original model (not optimized by Vela) on the M core.
1. The original model will go through CMSIS-NN, not Ethos-U, right?
2. I remember you said there is one small patch that has not yet reached upstream, which adjusts the build flags and a few paths to CMSIS-NN. How can I get it?
1. Correct.
2. That patch has reached upstream. The revisions referenced in the 20.11 release should be possible to build. Please see the link below for how to download the repositories from the 20.11 release.
https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/
Hi, Kristofer, Thanks for your reply.
I tried to run the same networkModelData and inputData from https://git.mlplatform.org/ml/ethos-u/ethos-u-core-platform.git/tree/targets/corstone-300/main.cpp on the i.MX8MP's Cortex-M7 core, but the outputData is not the same as the expectedData. Any suggestions?
The network model checked in to main.cpp has been optimized by Vela and will only run on a platform with an Arm Ethos-U NPU. It has been provided as an example of how to run an inference on the NPU.
Hi, Kristofer, I am curious about the statement that the network model in main.cpp has been optimized by Vela. According to my test on the i.MX8MP's M7 core, it did not execute TFLu framework -> ethosu.cc -> ethosu_invoke in ethosu_driver.c, but instead executed TFLu framework -> cmsis-nn MAX_POOL_2D invoke. If it were optimized by Vela, it should execute TFLu framework -> ethosu.cc -> ethosu_invoke as in my previous tests. In those tests I used an xxx_vela.tflite model, which is optimized by Vela, and it really did execute TFLu framework -> ethosu.cc -> ethosu_invoke in ethosu_driver.c.
That was an unfortunate example for us to upload. We will update the example model with something that actually runs on the NPU.
Hi, Kristofer, I have a question about ethosu_invoke() in the core driver. This function checks data_ptr->driver_action_command. I expected it to be COMMAND_STREAM, so that handle_command_stream() would be called, but it is actually OPTIMIZER_CONFIG when running on the board. I don't understand this result. Could you give me some guidance? Thanks.