I have a question about how to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I know ethos-u-linux-driver-stack is the Ethos-U kernel driver. Should it be integrated into the Linux OS running on the Cortex-A, or into the OS running on the Cortex-M? I am not clear about which core it needs to run on.
2. For ethos-u-core-software, how do I run it? I didn't find detailed steps to run it. Does it run on the NPU or on one of the CPU cores?
3. Apart from the above two repos, are there any other repos necessary to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor?
Thanks in advance for your suggestions.
I don't have a clear understanding of how to handle the operators that can be executed on the NPU versus those that need to be executed on the CPU. Are there many interactions when executing operators on the NPU and the CPU? For example, the first operator executes on the NPU, then the second operator executes on the CPU, and so on. If so, there would be too many interactions between the NPU and the CPU, which could be a significant overhead.
Actually, my idea is to run TFLite on the Cortex-A. When something needs to run on the NPU, the Cortex-A will send a request to the Cortex-M, and the Cortex-M will execute ethosu_invoke() and handle the IRQ. In other words, the core_driver still executes on the Cortex-M.
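To make the idea concrete, the request the Cortex-A side sends could look roughly like the sketch below. Everything here is hypothetical; the field names just mirror the kind of parameters ethosu_invoke() needs and do not come from the Ethos-U repos.

// Hypothetical illustration only: a minimal request the Cortex-A (Linux)
// side could send, e.g. over a mailbox/RPMsg channel, asking the Cortex-M
// side to run ethosu_invoke() on its behalf. All names are invented.
#include <cstdint>

struct NpuInvokeRequest {
    uint32_t msg_id;            // correlates the reply with this request
    uint64_t custom_data_pa;    // physical address of the Vela command stream blob
    uint32_t custom_data_size;  // size of that blob in bytes
    uint64_t base_addrs_pa[8];  // physical addresses of the tensor buffers
    uint32_t num_base_addrs;
};

struct NpuInvokeReply {
    uint32_t msg_id;
    int32_t  status;  // result of ethosu_invoke() on the Cortex-M side
};

The reply would carry the status back to the Linux side once the Cortex-M has handled the NPU IRQ.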
Hi
1. The next release of Arm Ethos-U is planned for the end of February, and I am sure it will include the detection postprocess operator.
Until then you can use the fetch_externals.py script to download the latest versions of all repos.
https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/
$ ./fetch_externals.py fetch
I don't know whether detection postprocess support is in Vela's plans or not.
2. TensorFlow Lite for Microcontrollers (TFLu) has been designed to run on Cortex-M. I suppose it would be possible to compile a bare-metal app for Cortex-A, but I guess what you are wondering is whether the Arm Ethos-U can be driven directly from Linux?
In theory, yes, it would be possible to drive the Arm Ethos-U directly from Linux. However, operators that cannot be mapped to the NPU would need to be executed on the CPU, either in kernel space or via some kind of user-space fallback. Implementing this fallback mechanism is doable, but I don't think it is a trivial task. The IRQ handling on Linux might also introduce latency, degrading performance.
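To illustrate what that fallback amounts to, a toy dispatch loop could look like this. It is pure illustration with hypothetical names, not code from any of the repos:

// Pure illustration of the per-operator CPU fallback described above.
// All names here are invented for the sketch.
#include <vector>

enum class Backend { Npu, Cpu };

struct Op {
    Backend backend;  // decided offline, e.g. by Vela's operator support
};

void run_on_npu(const Op&) { /* would end up in an ethosu_invoke() call */ }
void run_on_cpu(const Op&) { /* kernel-space or user-space CPU kernel */ }

void run_network(const std::vector<Op>& ops)
{
    for (const Op& op : ops) {
        // Every NPU<->CPU transition is a synchronization point; frequent
        // back-and-forth is the interaction cost discussed earlier in the
        // thread.
        if (op.backend == Backend::Npu) {
            run_on_npu(op);
        } else {
            run_on_cpu(op);
        }
    }
}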
Hi Kristofer, I recently ran into a problem and have some ideas, described below. I would really appreciate your suggestions.
1. I want to try an object detection demo on Cortex-M using TFLite Micro + an SSD model. I found that the operator DETECTION_POSTPROCESS is not supported in TFLite Micro as shipped with the ethos-u 20.08 and 20.11 releases. I also checked the latest TensorFlow source code, where this operator is supported in TFLite Micro. So my question is: when, or in which ethos-u release, will the TensorFlow source code that supports this operator be integrated? I suppose this operator will also be supported in the Vela tool, right?
2. Since I ran into these problems with TFLite Micro, I want to ask a question. Why don't we use TFLite on the Cortex-A directly instead of TFLite Micro on the Cortex-M? Is there any necessary binding between the Ethos-U65 and TFLite Micro?
That is correct.
Happy New Year!
Hi Kristofer, it doesn't matter. Happy New Year! :) Thank you very much for your detailed explanation. According to the explanation, the loop [1] will first hit case OPTIMIZER_CONFIG and verify that the command stream has been generated for the correct NPU; then data_ptr will point to the actual command stream, and the loop [1] will hit case COMMAND_STREAM and call handle_command_stream(). Is that right?
[1] https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c#n351
Hi Alison
Sorry for the late reply. I have been on Christmas holiday for the last two weeks and this is my first day after the holiday.
The function ethosu_invoke() takes a pointer to struct custom_data_s [1], which is a dynamic array of driver actions. Vela will typically place an optimizer config followed by a command stream.
[1] https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c#n71
[2] https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c#n363
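To illustrate, the loop at [2] walks that array and dispatches on each action's command field, roughly like this simplified paraphrase (the types and field layout here are illustrative, not the exact upstream code):

// Simplified paraphrase of the driver-action loop in ethosu_driver.c [2].
// The real code uses bitfield structs and has more cases; this only shows
// the dispatch shape: an OPTIMIZER_CONFIG action first, then COMMAND_STREAM.
#include <stdint.h>

enum driver_action { OPTIMIZER_CONFIG = 1, COMMAND_STREAM = 2 };

int invoke_actions(const uint32_t *data, const uint32_t *data_end)
{
    while (data < data_end) {
        uint32_t command = *data & 0xff;           // illustrative field layout
        uint32_t length  = (*data >> 16) & 0xffff; // payload words after the header

        switch (command) {
        case OPTIMIZER_CONFIG:
            // Check that the blob was generated by Vela for this NPU
            // configuration (handle_optimizer_config in the real driver),
            // then advance past the config payload.
            data += 1 + length;
            break;
        case COMMAND_STREAM:
            // The words after the header are the actual NPU command stream;
            // handle_command_stream kicks off the hardware in the real driver.
            data += 1 + length;
            break;
        default:
            return -1;  // unknown driver action
        }
    }
    return 0;
}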
Best regards,
Kristofer
Hi Kristofer, could you help reply to my question?
Hi Kristofer, I have a question about ethosu_invoke() in the core driver. In this function, data_ptr->driver_action_command is checked. I expected it to be COMMAND_STREAM, so that handle_command_stream() would be called, but it is actually OPTIMIZER_CONFIG when running on the board. I don't understand this result. Could you give me some guidance? Thanks.
That was an unlucky example we have uploaded. We will update the example model with something that actually runs on the NPU.
Hi Kristofer, I am curious whether the network model in main.cpp has been optimized by Vela. According to my test result on the i.MX8MP's M7 core, it did not execute TFLu framework -> ethosu.cc -> ethosu_invoke in ethosu_driver.c, but instead executed TFLu framework -> CMSIS-NN MAX_POOL_2D invoke. If it were optimized by Vela, it should execute TFLu framework -> ethosu.cc -> ethosu_invoke, as in my previous tests. In those tests, I used the xxx_vela.tflite model, which is optimized by Vela, and it really did execute TFLu framework -> ethosu.cc -> ethosu_invoke in ethosu_driver.c.
The network model checked into main.cpp has been optimized by Vela and will only run on a platform with an Arm Ethos-U NPU. It has been provided as an example of how to run an inference on the NPU.
Hi Kristofer, thanks for your reply.
I tried to use the same networkModelData and inputData from https://git.mlplatform.org/ml/ethos-u/ethos-u-core-platform.git/tree/targets/corstone-300/main.cpp to run on the i.MX8MP's Cortex-M7 core, but the outputData is not the same as the expectedData. Any suggestions?
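For reference, the check I am doing is essentially a byte comparison of the output against the expected buffer, along these lines (a minimal sketch assuming the main.cpp naming; the declarations are placeholders, not code copied from the repo):

// Minimal sketch of the output check, assuming the Corstone-300 main.cpp
// naming (outputData/expectedData); the surrounding declarations are
// placeholders.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

extern uint8_t outputData[];    // filled in by the inference
extern uint8_t expectedData[];  // reference output shipped with the example
extern size_t  outputSize;

bool check_output()
{
    if (std::memcmp(outputData, expectedData, outputSize) != 0) {
        std::printf("Output mismatch\n");
        return false;
    }
    return true;
}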
1. Correct.
2. That patch has reached upstream. The revisions referenced in the 20.11 release should be possible to build. Please see the link below for how to download the repositories from the 20.11 release.
Hi, Kristofer
Thanks for your reply. I have fixed the issues. Now the inference process -> TFLu framework -> ethosu.cc -> ethosu_driver.c flow works.
As there is currently no real hardware to verify the ethos-u driver, I want to run the original model (not optimized by Vela) on the M core.
1. For the original model, it will go through CMSIS-NN, not ethos-u, right?
2. I remember you said there is one small patch that has not yet reached upstream, which adjusts the build flags and a few paths to CMSIS-NN. How can I get it?
In case you are interested, we have just uploaded code to Core Platform. It demonstrates how the Arm Ethos-U driver stack, including FreeRTOS, can be built for Corstone-300.
Please use fetch_externals.py to download all repositories and follow the instructions in core_software/README.md for how to build with either ArmClang or GCC.
https://git.mlplatform.org/ml/ethos-u/ethos-u.git
1. You should in theory be able to build Core Software for any Arm Cortex-M. However, not all variants are built and tested, because some are expected to be too weak to run ML workloads, so I assume that the smaller cores would need some minor adjustments to build. The driver stack is tested with a wide range of network models. I don't know for sure where they originate from.
2. Hard to tell for sure, but my guess is that the Tensor Arena might be too small. The required Tensor Arena size varies a lot from network to network.
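One way to confirm this is to watch AllocateTensors() in TFLu: with too small an arena it fails, and after a successful call arena_used_bytes() reports the actual requirement. A minimal sketch, assuming a placeholder arena size and model symbol (the API names follow TFLu of the 20.11 era):

// Minimal TFLu sketch for sizing the Tensor Arena. kTensorArenaSize and
// networkModelData are placeholders; pick a generous size first, then trim
// it using arena_used_bytes().
#include <cstddef>
#include <cstdint>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

constexpr size_t kTensorArenaSize = 512 * 1024;  // placeholder guess
__attribute__((aligned(16))) static uint8_t tensor_arena[kTensorArenaSize];

extern const uint8_t networkModelData[];  // the (Vela-optimized) model

bool setup_interpreter()
{
    static tflite::MicroErrorReporter error_reporter;
    static tflite::AllOpsResolver resolver;
    static tflite::MicroInterpreter interpreter(
        tflite::GetModel(networkModelData), resolver,
        tensor_arena, kTensorArenaSize, &error_reporter);

    if (interpreter.AllocateTensors() != kTfLiteOk) {
        // Arena too small (or model unsupported): allocation fails here.
        return false;
    }
    // Actual requirement for this network; use it to trim kTensorArenaSize.
    size_t used = interpreter.arena_used_bytes();
    (void)used;
    return true;
}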