I have a question about how to make the Ethos-U NPU work on an ARM Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I understand that ethos-u-linux-driver-stack is the Ethos-U kernel driver. Should it be integrated into the Linux OS running on Cortex-A, or into the Linux OS running on Cortex-M? I am not clear about which core it needs to run on.
2. For ethos-u-core-software, how do I run it? I didn't find detailed steps for running it. Does it run on the NPU or on one of the cores?
3. Besides the two repos above, is any other repo necessary to make the Ethos-U NPU work on an ARM Cortex-A + Cortex-M processor?
Thanks in advance for your suggestions.
As mentioned in previous comments, there are two TFLu buffers, the model and the arena, that need to be placed in memory. For a system with SRAM and DRAM we have three combinations that make sense.
Vela allocates a buffer inside the arena. This buffer contains temporary data that the NPU accesses frequently, so it should be placed in SRAM for optimal performance.
However, for alternative 3 the arena will be placed in DRAM. For this option Vela can be configured to split the temporary data into an "arena buffer" and a "fast memory buffer". The Ethos-U will redirect the "fast memory buffer" to a memory area in SRAM.
The fast memory feature is a bit complicated and requires synchronized changes in several places:
Vela takes a tflite file as input and produces another, optimized tflite file as output. During the optimization phase Vela controls which input tensor each piece of data is placed in, like this:
The Ethos-U NPU driver writes the address of the command stream to the QBASE register. The addresses of input tensors 2-4 are written to the BASEP<nr> registers. If spilling has been enabled, then the driver will override the 'fast' tensor address before the BASEP<nr> register is written.
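To make the override step concrete, here is a minimal sketch in Python that models what the driver does with the base pointers. It is only an illustration of the logic described above, not the real driver code: the function name, the parameter names, the addresses and the choice of index 2 for the 'fast' tensor are all made up for the example.

```python
# Hypothetical model of the base-pointer setup; the names and addresses are
# illustrative only and are not taken from the Ethos-U driver sources.

def resolve_base_pointers(base_addrs, fast_index=None, fast_addr=None):
    """Return the addresses that would be written to the BASEP<nr> registers.

    base_addrs: tensor addresses taken from the Vela-optimized model.
    fast_index: position of the 'fast' tensor, or None if spilling is disabled.
    fast_addr:  address of the SRAM area that should replace the 'fast' tensor.
    """
    resolved = list(base_addrs)  # copy, so the caller's list is not modified
    if fast_index is not None:
        if fast_addr is None:
            raise ValueError("spilling enabled but no fast memory address given")
        # Spilling enabled: redirect the 'fast' tensor to the SRAM area.
        resolved[fast_index] = fast_addr
    return resolved


# Example with made-up addresses: arena in DRAM, fast memory buffer in SRAM.
model_addrs = [0x60000000, 0x60100000, 0x60200000]
with_spilling = resolve_base_pointers(model_addrs, fast_index=2, fast_addr=0x20000000)
without_spilling = resolve_base_pointers(model_addrs)
```

With spilling disabled the addresses from the model are passed through unchanged. With spilling enabled only the 'fast' entry is replaced, which is why the Vela configuration and the driver configuration have to agree on which tensor is the 'fast' one.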
The Ethos-U NPU has two AXI interfaces, M0 and M1. The REGIONCFG register controls which AXI interface each base pointer is routed to.
For example, with the current Vela implementation, weights and biases are accessed over base pointer 0. In the region config you can control whether base pointer 0 uses M0 or M1.
The default region config is defined here. Please note that AXI0 and AXI1 are routed to M0, and AXI2 and AXI3 to M1.
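As a rough illustration of how such a region config can be composed and read back, here is a small Python sketch. It assumes that REGIONCFG holds one 2-bit field per base pointer, eight fields in total, and that a field value of n selects AXIn; please check these assumptions against the register description for your NPU. The helper names and the example values are made up.

```python
NUM_REGIONS = 8      # assumption: one field per base pointer, eight in total
BITS_PER_REGION = 2  # assumption: each field is 2 bits wide


def pack_regioncfg(region_values):
    """Pack one 2-bit value per base pointer into a single register word."""
    if len(region_values) != NUM_REGIONS:
        raise ValueError("expected %d region values" % NUM_REGIONS)
    word = 0
    for region, value in enumerate(region_values):
        if not 0 <= value <= 3:
            raise ValueError("region value must fit in 2 bits")
        word |= value << (region * BITS_PER_REGION)
    return word


def master_for(word, region):
    """Return the AXI interface that the given base pointer is routed to."""
    value = (word >> (region * BITS_PER_REGION)) & 0x3
    # AXI0 and AXI1 are routed to M0, AXI2 and AXI3 to M1.
    return "M0" if value in (0, 1) else "M1"


# Made-up example: base pointer 0 over M0, base pointer 1 over M1.
cfg = pack_regioncfg([0, 2, 1, 1, 0, 0, 0, 0])
routing = [master_for(cfg, region) for region in range(NUM_REGIONS)]
```

Changing the value for base pointer 0 from 0 to 2 in this example would move the weights and biases from M0 to M1 without touching the other base pointers.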