I have a question about how to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I know ethos-u-linux-driver-stack is the Ethos-U kernel driver. Should it be integrated into the Linux OS running on the Cortex-A, or into the OS running on the Cortex-M? I am not clear about which core it needs to run on.
2. For ethos-u-core-software, how do I run it? I didn't find detailed steps to run it. Does it run on the NPU or on one of the CPU cores?
3. Apart from the above two repos, are there any other repos necessary to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor?
Thanks in advance for your suggestions.
I don't have a clear understanding of how to handle the operators that can be executed on the NPU versus those that need to be executed on the CPU. Are there many interactions when executing operators on the NPU and the CPU? For example, the first operator executes on the NPU, then the second operator executes on the CPU, and so on. If so, there would be too many interactions between the NPU and the CPU, which could be a significant overhead.
Actually, my idea is to run TFLite on the Cortex-A. When something needs to run on the NPU, the Cortex-A will send a request to the Cortex-M, and the Cortex-M will execute ethosu_invoke() and handle the IRQ. In other words, the core_driver still executes on the Cortex-M.
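To make the idea concrete, the request the Cortex-A side sends could look roughly like the sketch below. Everything here is hypothetical; the field names just mirror the kind of parameters ethosu_invoke() needs and do not come from the Ethos-U repos.

// Hypothetical illustration only: a minimal request the Cortex-A (Linux)
// side could send, e.g. over a mailbox/RPMsg channel, asking the Cortex-M
// side to run ethosu_invoke() on its behalf. All names are invented.
#include <cstdint>

struct NpuInvokeRequest {
    uint32_t msg_id;            // correlates the reply with this request
    uint64_t custom_data_pa;    // physical address of the Vela command stream blob
    uint32_t custom_data_size;  // size of that blob in bytes
    uint64_t base_addrs_pa[8];  // physical addresses of the tensor buffers
    uint32_t num_base_addrs;
};

struct NpuInvokeReply {
    uint32_t msg_id;
    int32_t  status;  // result of ethosu_invoke() on the Cortex-M side
};

The reply would carry the status back to the Linux side once the Cortex-M has handled the NPU IRQ.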
Hi
1. The next release of Arm Ethos-U is planned for the end of February, and I am sure it will include the detection postprocess operator.
Until then you can use the fetch_externals.py script to download the latest versions of all repos.
https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/
$ ./fetch_externals.py fetch
I don't know whether detection postprocess support is in Vela's plans or not.
2. TensorFlow Lite for Microcontrollers (TFLu) has been designed to run on Cortex-M. I suppose it would be possible to compile a bare-metal app for Cortex-A, but I guess what you are wondering is whether the Arm Ethos-U can be driven directly from Linux?
In theory, yes, it would be possible to drive the Arm Ethos-U directly from Linux. However, operators that cannot be mapped to the NPU would need to be executed on the CPU, either in kernel space or via some kind of user-space fallback. Implementing this fallback mechanism is doable, but I don't think it is a trivial task. The IRQ handling on Linux might also introduce latency, degrading performance.
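To illustrate what that fallback amounts to, a toy dispatch loop could look like this. It is pure illustration with hypothetical names, not code from any of the repos:

// Pure illustration of the per-operator CPU fallback described above.
// All names here are invented for the sketch.
#include <vector>

enum class Backend { Npu, Cpu };

struct Op {
    Backend backend;  // decided offline, e.g. by Vela's operator support
};

void run_on_npu(const Op&) { /* would end up in an ethosu_invoke() call */ }
void run_on_cpu(const Op&) { /* kernel-space or user-space CPU kernel */ }

void run_network(const std::vector<Op>& ops)
{
    for (const Op& op : ops) {
        // Every NPU<->CPU transition is a synchronization point; frequent
        // back-and-forth is the interaction cost discussed earlier in the
        // thread.
        if (op.backend == Backend::Npu) {
            run_on_npu(op);
        } else {
            run_on_cpu(op);
        }
    }
}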
Hi Kristofer, I recently ran into a problem and have some ideas, described below. I would really appreciate your suggestions.
1. I want to try an object detection demo on Cortex-M using TFLite Micro + an SSD model. I found that the operator DETECTION_POSTPROCESS is not supported in TFLite Micro as shipped with the ethos-u 20.08 and 20.11 releases. I also checked the latest TensorFlow source code, where this operator is supported in TFLite Micro. So my question is: when, or in which ethos-u release, will the TensorFlow source code that supports this operator be integrated? I suppose this operator will also be supported in the Vela tool, right?
2. Since I ran into these problems with TFLite Micro, I want to ask a question. Why don't we use TFLite on the Cortex-A directly instead of TFLite Micro on the Cortex-M? Is there any necessary binding between the Ethos-U65 and TFLite Micro?
That is correct.
Happy New Year!
Hi Kristofer, it doesn't matter. Happy New Year! :) Thank you very much for your detailed explanation. According to the explanation, the loop [1] will first hit case OPTIMIZER_CONFIG and verify that the command stream has been generated for the correct NPU; then data_ptr will point to the actual command stream, and the loop [1] will hit case COMMAND_STREAM and call handle_command_stream(). Is that right?
[1] https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c#n351
Hi Alison
Sorry for the late reply. I have been on Christmas holiday for the last two weeks and this is my first day after the holiday.
The function ethosu_invoke() takes a pointer to struct custom_data_s [1], which is a dynamic array of driver actions. Vela will typically place an optimizer config followed by a command stream.
[1] https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c#n71
[2] https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c#n363
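To illustrate, the loop at [2] walks that array and dispatches on each action's command field, roughly like this simplified paraphrase (the types and field layout here are illustrative, not the exact upstream code):

// Simplified paraphrase of the driver-action loop in ethosu_driver.c [2].
// The real code uses bitfield structs and has more cases; this only shows
// the dispatch shape: an OPTIMIZER_CONFIG action first, then COMMAND_STREAM.
#include <stdint.h>

enum driver_action { OPTIMIZER_CONFIG = 1, COMMAND_STREAM = 2 };

int invoke_actions(const uint32_t *data, const uint32_t *data_end)
{
    while (data < data_end) {
        uint32_t command = *data & 0xff;           // illustrative field layout
        uint32_t length  = (*data >> 16) & 0xffff; // payload words after the header

        switch (command) {
        case OPTIMIZER_CONFIG:
            // Check that the blob was generated by Vela for this NPU
            // configuration (handle_optimizer_config in the real driver),
            // then advance past the config payload.
            data += 1 + length;
            break;
        case COMMAND_STREAM:
            // The words after the header are the actual NPU command stream;
            // handle_command_stream kicks off the hardware in the real driver.
            data += 1 + length;
            break;
        default:
            return -1;  // unknown driver action
        }
    }
    return 0;
}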
Best regards,
Kristofer
Hi Kristofer, could you help reply to my question?
Hi Kristofer, I have a question about ethosu_invoke() in the core driver. In this function, data_ptr->driver_action_command is checked. I expected it to be COMMAND_STREAM, so that handle_command_stream() would be called, but it is actually OPTIMIZER_CONFIG when running on the board. I don't understand this result. Could you give me some guidance? Thanks.
That was an unlucky example we have uploaded. We will update the example model with something that actually runs on the NPU.
Hi Kristofer, I am curious whether the network model in main.cpp has been optimized by Vela. According to my test result on the i.MX8MP's M7 core, it did not execute TFLu framework -> ethosu.cc -> ethosu_invoke in ethosu_driver.c, but instead executed TFLu framework -> CMSIS-NN MAX_POOL_2D invoke. If it were optimized by Vela, it should execute TFLu framework -> ethosu.cc -> ethosu_invoke, as in my previous tests. In those tests, I used the xxx_vela.tflite model, which is optimized by Vela, and it really did execute TFLu framework -> ethosu.cc -> ethosu_invoke in ethosu_driver.c.
The network model checked into main.cpp has been optimized by Vela and will only run on a platform with an Arm Ethos-U NPU. It has been provided as an example of how to run an inference on the NPU.
Hi Kristofer, thanks for your reply.
I tried to use the same networkModelData and inputData from https://git.mlplatform.org/ml/ethos-u/ethos-u-core-platform.git/tree/targets/corstone-300/main.cpp to run on the i.MX8MP's Cortex-M7 core, but the outputData is not the same as the expectedData. Any suggestions?
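For reference, the check I am doing is essentially a byte comparison of the output against the expected buffer, along these lines (a minimal sketch assuming the main.cpp naming; the declarations are placeholders, not code copied from the repo):

// Minimal sketch of the output check, assuming the Corstone-300 main.cpp
// naming (outputData/expectedData); the surrounding declarations are
// placeholders.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

extern uint8_t outputData[];    // filled in by the inference
extern uint8_t expectedData[];  // reference output shipped with the example
extern size_t  outputSize;

bool check_output()
{
    if (std::memcmp(outputData, expectedData, outputSize) != 0) {
        std::printf("Output mismatch\n");
        return false;
    }
    return true;
}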
1. Correct.
2. That patch has reached upstream. The revisions referenced in the 20.11 release should be possible to build. Please see the link below for how to download the repositories from the 20.11 release.
Hi, Kristofer
Thanks for your reply. I have fixed the issues. Now the inference process -> TFLu framework -> ethosu.cc -> ethosu_driver.c flow works.
As there is currently no real hardware to verify the ethos-u driver, I want to run the original model (not optimized by Vela) on the M core.
1. For the original model, it will go through CMSIS-NN, not ethos-u, right?
2. I remember you said there is one small patch that has not yet reached upstream, which adjusts the build flags and a few paths to CMSIS-NN. How can I get it?
In case you are interested, we have just uploaded code to Core Platform. It demonstrates how the Arm Ethos-U driver stack, including FreeRTOS, can be built for Corstone-300.
Please use fetch_externals.py to download all repositories and follow the instructions in core_software/README.md for how to build with either ArmClang or GCC.
https://git.mlplatform.org/ml/ethos-u/ethos-u.git
1. You should in theory be able to build Core Software for any Arm Cortex-M. However, not all variants are built and tested, because some are expected to be too weak to run ML workloads, so I assume that the smaller cores would need some minor adjustments to build. The driver stack is tested with a wide range of network models. I don't know for sure where they originate from.
2. Hard to tell for sure, but my guess is that the Tensor Arena might be too small. The required Tensor Arena size varies a lot from network to network.
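One way to confirm this is to watch AllocateTensors() in TFLu: with too small an arena it fails, and after a successful call arena_used_bytes() reports the actual requirement. A minimal sketch, assuming a placeholder arena size and model symbol (the API names follow TFLu of the 20.11 era):

// Minimal TFLu sketch for sizing the Tensor Arena. kTensorArenaSize and
// networkModelData are placeholders; pick a generous size first, then trim
// it using arena_used_bytes().
#include <cstddef>
#include <cstdint>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

constexpr size_t kTensorArenaSize = 512 * 1024;  // placeholder guess
__attribute__((aligned(16))) static uint8_t tensor_arena[kTensorArenaSize];

extern const uint8_t networkModelData[];  // the (Vela-optimized) model

bool setup_interpreter()
{
    static tflite::MicroErrorReporter error_reporter;
    static tflite::AllOpsResolver resolver;
    static tflite::MicroInterpreter interpreter(
        tflite::GetModel(networkModelData), resolver,
        tensor_arena, kTensorArenaSize, &error_reporter);

    if (interpreter.AllocateTensors() != kTfLiteOk) {
        // Arena too small (or model unsupported): allocation fails here.
        return false;
    }
    // Actual requirement for this network; use it to trim kTensorArenaSize.
    size_t used = interpreter.arena_used_bytes();
    (void)used;
    return true;
}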