I have a question about how to make Ethos-U NPU work on a ARM Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I know ethos-u-linux-driver-stack is Ethos-U kernel driver. Should it be integrated into the Linux OS running on Cortex-A or be integrated into the Linux OS running on Cortex-M? I am nor clear about which core it need to perform on.
2. For ethos-u-core-software, how to run it? I did't find the detail steps to run it. Does it run on NPU or any core?
3. Except the above two repos, is there any other repo necessory to make Ethos-U NPU work on an ARM Cortex-A + Cortex-M processor?
Thanks for your suggestion in advance.
The Linux driver stack for Arm Ethos-U is provided as an example of how an Arm Cortex-A running Linux can dispatch inferences to an Arm Ethos-U subsystem (Arm Cortex-M, Arm Ethos-U, SRAM, …). This is a so-called Asymmetric MultiProcessing (AMP) system, which requires a small amount of shared memory and an external hardware block (e.g. the Arm MHU) to trigger IRQs on the remote CPU.
The Linux driver stack currently contains a user space application, a user space driver library, and a kernel driver. Important to notice is that the kernel driver will not drive the Arm Ethos-U NPU directly, but instead sends a message to an Arm Cortex-M in the Arm Ethos-U subsystem that drives the NPU.
The setup of the AMP communication is platform dependent and is done in the DTB file.
All software running on the Arm Cortex-M is referred to as core. The code that runs on the NPU is referred to as command stream.
The Arm Ethos-U (sub)system requires an Arm Ethos-U, some SRAM and an Arm Cortex-M to drive the NPU. The (sub)system is highly customizable. A customer may choose which Arm Cortex-M to use, the amount of SRAM, which peripheral to attach, which software to run etc.
Because of the high degrees of flexibility Arm can’t provide a ready packaged software to boot on the Arm Cortex-M. Core software only contains the necessary software components needed to run an inference using the TensorflowLite microcontroller framework.
You would need to write your own main() function to initialize the platform. You will also need to define a scatter file (Arm Clang) or a linker script (GCC) that describes the memory layout.
All software publicly available for Arm Ethos-U can be downloaded with fetch_externals.py from the Arm Ethos-U repository.
Other repos worth mentioning is Vela, which takes a tflite file as input and produces another optimized tflite as output. The optimized tflite file contains custom operators that are executed on the Arm Ethos-U.
The Arm Ethos-U IP has just recently been released, so it will still take some time before you can buy an Arm Ethos-U capable platform to test on. There are currently no virtual platforms available for download on the Arm website that include the Arm Ethos-U, but hopefully that will change in a not too distant future.
Kristofer, thank you very much for your detailed reply. It's very useful for me. I have two more questions.
1. The TensorflowLite microcontroller framework you mentioned is the common source https://github.com/tensorflow/tensorflow, right? I checked the tensorflow directory in ethos-u/core_software, it seems the common one, no other external patches.
2. Is OpenAMP used for IPC between Cortex-A and Cortex-M? According to the AMP communication you mentioned, it seems the current codes are sufficient.
1. Yes it is the Tensorflow framework from GitHub. The build system under tensorflow/lite/micro/tools/make/ is used to produce a static library, including CMSIS-NN and the Ethos-U driver. There is to my knowledge one small patch that has not yet reach upstream, that adjusts the build flags and a few paths to CMSIS-NN.
2. OpenAMP is in the plan, but for the moment the communication is defined in linux_driver_stack/kernel/ethosu_core_interface.h and does not make use of virtio and rpmsg. It does however use the Linux kernel maibox APIs, to abstract which hardware block that is used to trigger IRQs on the remote CPU.
Thanks a lot for your reply.