I have a question about how to make Ethos-U NPU work on a ARM Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I know ethos-u-linux-driver-stack is Ethos-U kernel driver. Should it be integrated into the Linux OS running on Cortex-A or be integrated into the Linux OS running on Cortex-M? I am nor clear about which core it need to perform on.
2. For ethos-u-core-software, how to run it? I did't find the detail steps to run it. Does it run on NPU or any core?
3. Except the above two repos, is there any other repo necessory to make Ethos-U NPU work on an ARM Cortex-A + Cortex-M processor?
Thanks for your suggestion in advance.
I would like to divide this answer into two parts.
The Arm Ethos-U NPU driver is OS agnostic and does not use any OS specific primitives (like mutex, queues, etc). It consequently can be paired up with any RTOS. You spawn a thread that drive the TFLu runtime from within the thread. I would even go one step further and say that it is recommended to use a RTOS. So yes, you can use FreeRTOS or any other RTOS you prefer.
What is not (yet) supported are user facing OS APIs. The APIs would be used by applications to schedule inferneces, and implemented by drivers to run inferences on the physical device. The APIs would provide hardware abstraction (you do not know which hardware that accelerates your network) and scheduling (multiple applications can share multiple NPUs).
Kristofer, according to my understanding about the current code, Tensorflow Lite APIs (for example, interpreter.Invoke) are used by the application and call the ethosu driver in micro/. Are Tensorflow Lite APIs one kind of the user facing OS APIs you mentioned? I am not clear about the user facing OS APIs which is not (yet) supported. Please correct and guide me.
User facing APIs would be part of the OS and should be generic enough to support multiple frameworks (TFLu, TVM, etc). They should allow multiple applications to share NPU resources and ideally provide hardware abstraction (the application is unaware of which hardware that accelerates the network). An application would do an OS call to run an inference, instead of directly calling interpreter.Invoke().
These APIs don't exist today and we do not yet have a clear picture of what they would look like, or if this even is the right way to go. Hardware abstraction might also be difficult to achieve, because networks might have been optimized for a specific hardware.