I have a question about how to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I know ethos-u-linux-driver-stack contains the Ethos-U kernel driver. Should it be integrated into the Linux OS running on Cortex-A, or into the OS running on Cortex-M? I am not clear about which core it needs to run on.
2. For ethos-u-core-software, how do I run it? I didn't find detailed steps for running it. Does it run on the NPU or on one of the cores?
3. Besides the above two repos, is there any other repo necessary to make the Ethos-U NPU work on an Arm Cortex-A + Cortex-M processor?
Thanks in advance for your suggestions.
Hi Kristofer, I recently ran into a problem and have some ideas, described below. I would really appreciate your suggestions.
1. I want to try an object detection demo on Cortex-M using TFLite Micro with an SSD model. I found that the DETECTION_POSTPROCESS operator is not supported in the TFLite Micro included in the Ethos-U 20.08 and 20.11 releases. I also checked the latest TensorFlow source code, and there this operator is supported in TFLite Micro. So my question is: which Ethos-U release will pick up a TensorFlow source tree in which this operator is supported in TFLite Micro, and when? I suppose this operator will also be supported in the Vela tool, right?
2. Since I ran into these problems with TFLite Micro, I want to ask: why not use TFLite on Cortex-A directly instead of TFLite Micro on Cortex-M? Is there any necessary binding between Ethos-U65 and TFLite Micro?
Hi
1. The next release of Arm Ethos-U is planned for the end of February, and I am sure it will include the detection postprocess operator.
Until then you can use the fetch_externals.py script to download the latest versions of all repos.
https://git.mlplatform.org/ml/ethos-u/ethos-u.git/about/
$ ./fetch_externals.py fetch
I don't know whether detection postprocess support is in the Vela team's plans or not.
2. TensorFlow Lite for Microcontrollers (TFLu) has been designed to run on Cortex-M. I suppose it would be possible to compile a bare metal app for Cortex-A, but I guess what you are really wondering is whether the Arm Ethos-U can be driven directly from Linux?
In theory, yes, it would be possible to drive the Arm Ethos-U directly from Linux. However, operators that cannot be mapped to the NPU would need to be executed on the CPU, either in kernel space or through some kind of user space fallback. Implementing this fallback mechanism is doable, but I don't think it is a trivial task. The IRQ handling on Linux might also introduce latency, degrading performance.
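To give a concrete picture of how that fallback works on the Cortex-M side today, here is a rough sketch of a TFLu application. This is not code taken from the Ethos-U repos, and the resolver/interpreter details differ between TFLu versions (newer ones drop the error reporter argument, for example); the point is only that Vela replaces the NPU-mapped parts of the graph with an "ethos-u" custom operator, while the remaining operators run as ordinary CPU kernels registered in the same resolver.

// Sketch only: header paths and method names follow upstream TFLite Micro
// conventions and may differ in the version bundled with an Ethos-U release.
#include <cstdint>
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

namespace {
constexpr int kArenaSize = 200 * 1024;   // working memory for activations
uint8_t tensor_arena[kArenaSize];
}

// model_data points to the .tflite flatbuffer produced by Vela.
int run_inference(const uint8_t* model_data) {
  const tflite::Model* model = tflite::GetModel(model_data);
  static tflite::MicroErrorReporter error_reporter;

  // The "ethos-u" custom op executes the Vela command streams on the NPU;
  // everything Vela left untouched needs a normal CPU kernel registered here.
  static tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddEthosU();                 // NPU-mapped subgraphs
  resolver.AddDetectionPostprocess();   // CPU fallback (once available in TFLu)
  resolver.AddQuantize();
  resolver.AddDequantize();

  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kArenaSize, &error_reporter);
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    return -1;
  }
  // ... fill interpreter.input(0) with the image, then read interpreter.output(0) ...
  return interpreter.Invoke() == kTfLiteOk ? 0 : -1;
}

From the application's point of view the NPU and CPU portions are both just TFLu operators; the interpreter walks the graph in order and invokes whichever kernel owns each node.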
I don't have a clear understanding of how the operators that can be executed on the NPU and the ones that need to be executed on the CPU are handled. Are there many interactions when the operators are split between the NPU and the CPU? For example, the first operator executes on the NPU, then the second operator executes on the CPU... If so, there would be a lot of back and forth between the NPU and the CPU, which could be a big expense.
Actually, my idea is to run TFLite on Cortex-A. When something needs to run on the NPU, Cortex-A would send the request to Cortex-M; Cortex-M would execute ethosu_invoke() and handle the IRQ. I mean the core_driver would still run on Cortex-M.
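To make the idea a bit more concrete, this is roughly what I imagine the Cortex-M side could look like. The message layout and the mailbox_*/npu_run_network helpers below are placeholders I made up for illustration; in a real implementation the IPC would be whatever the platform provides (RPMsg, a shared-memory mailbox, ...) and npu_run_network() would end up calling ethosu_invoke() in ethos-u-core-driver.

// Illustrative Cortex-M firmware loop, not code from ethos-u-core-software.
#include <cstdint>

struct InferenceRequest {      // hypothetical wire format shared with Cortex-A
  uint32_t network_addr;       // Vela-compiled network in shared memory
  uint32_t network_size;
  uint32_t ifm_addr;           // input feature map
  uint32_t ofm_addr;           // output feature map
};

struct InferenceResponse {     // hypothetical wire format
  uint32_t status;             // 0 = success
};

// Hypothetical IPC helpers, to be provided by the platform (e.g. RPMsg/mailbox).
bool mailbox_receive(InferenceRequest* req);
void mailbox_send(const InferenceResponse* rsp);

// Hypothetical wrapper around the core driver; this is where ethosu_invoke()
// would be called and where the NPU IRQ would be handled, both on Cortex-M.
int npu_run_network(const InferenceRequest& req);

int main() {
  for (;;) {
    InferenceRequest req;
    if (!mailbox_receive(&req)) {
      continue;                          // wait for Cortex-A to send work
    }
    InferenceResponse rsp;
    rsp.status = static_cast<uint32_t>(npu_run_network(req));
    mailbox_send(&rsp);                  // tell Cortex-A the OFM is ready
  }
}

On the Cortex-A side, TFLite would only have to fill the shared buffers, send the request, and block until the response comes back.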
Arm has analyzed the most common AI networks in the embedded space and tried to map their operators to the Arm Ethos-U. How well this maps for you depends on what networks you want to run.
The software stack for Arm Ethos-U has been designed to fall back to Cortex-M for operators that are not supported by the NPU. Running TFLu on Cortex-A and dispatching custom operators to Cortex-M and the NPU could be possible, but it is nothing we have planned to implement. In the Linux Driver Stack for Ethos-U we have provided an example of how a Linux user space process can dispatch inferences to an Arm Ethos-U subsystem.
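For reference, the user space side of that example boils down to something like the sketch below. The ioctl request codes and argument structs live in the kernel driver's uapi header and are not reproduced here; the inference_runner utility and the driver_library in ethos-u-linux-driver-stack are the complete, authoritative versions.

// Minimal outline of a Linux user space dispatch, assuming the Ethos-U kernel
// driver from ethos-u-linux-driver-stack is loaded. The actual commands are
// issued as ioctls defined in the driver's uapi header (only summarized in the
// comments below); see inference_runner and driver_library for working code.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
  // Character device created by the kernel driver (typically /dev/ethosu0).
  int fd = open("/dev/ethosu0", O_RDWR);
  if (fd < 0) {
    perror("open /dev/ethosu0");
    return 1;
  }

  // With the device open, the flow is:
  //  1. create a network object from the Vela-compiled .tflite buffer,
  //  2. create buffers for the input and output feature maps,
  //  3. create an inference and wait for completion.
  // Each step is an ioctl()/poll() on this fd (or on fds it returns); the kernel
  // driver forwards the request over the mailbox to the Cortex-M firmware, which
  // programs the NPU, handles the IRQ and signals completion back to user space.

  close(fd);
  return 0;
}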