I have a question about how to make the Ethos-U NPU work on an ARM Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I know ethos-u-linux-driver-stack is the Ethos-U kernel driver. Should it be integrated into the Linux OS running on the Cortex-A or into the Linux OS running on the Cortex-M? I am not clear about which core it needs to run on.
2. How do I run ethos-u-core-software? I didn't find detailed steps for running it. Does it run on the NPU or on one of the cores?
3. Apart from the above two repos, is any other repo necessary to make the Ethos-U NPU work on an ARM Cortex-A + Cortex-M processor?
Thanks in advance for your suggestions.
Recently I have had some hardware questions about the Ethos-U65. Could you give me some guidance?
For the shram_size field in the CONFIG register, I see there are two values, SHRAM_48kB and SHRAM_96kB.
For the DMA controller, I see there are several channels.
For the Arm AMBA 5 AXI interfaces, there are two read/write masters, M0 and M1.
I thought I had responded to this question, but I can't find my answer in the thread.
The SHRAM is built into the NPU, and its size can't be changed by software. The memory is typically used for storage of weights, biases and temporary data. It is a small memory that will not fit a network like SSD MobileNet. Instead, small portions of the weights and biases are copied into the SHRAM as they are needed.
There are 8 "logical memory channels". Which logical channel is used for which data (IFM, OFM, weights, biases, etc.) is encoded in the command stream by Vela. The driver maps the logical channels to the physical DMA interfaces (M0 and M1) through the region config registers.
Consequently, which DMA interface is used depends on both Vela and the driver. Note that this is subject to change, but as of today, with the default settings in Vela and the driver, the command stream, weights and biases go over M1, and all other data goes over M0. This allows the TFLu model (command stream, weights and biases) to be moved from fast memory (SRAM) to slower memory (flash or DRAM) without congesting the M0 interface.
Kristofer, could you please confirm my comments below?
1. The fast memory (SRAM) you mentioned in "This allows the TFLu model (command stream, weights and biases) to be moved from fast memory (SRAM) to slower memory (flash or DRAM) without congesting the M0 interface." is the SHRAM built into the NPU, right?
2. Now, the current process is SHRAM <-> DMA & M1 or M0 <-> slower memory (flash or DRAM), right?
Maybe I didn't understand your last sentence clearly.
TFLu has two buffers: the model and the arena. For optimal performance, both the model and the arena should be placed in SRAM (or a similar memory technology). However, SRAM is expensive, so at some cost in performance the model can be moved to DRAM or flash.
Kristofer, I want to set specific SRAM and DRAM addresses for the U65 on our processor. How should I do that?
I am not sure what you mean by setting the SRAM and DRAM address for the U65. Could you please elaborate a bit more on the problem you are trying to solve?
If, for example, you are wondering how to place the model and arena buffers in memory, then this information might help.
Running an inference on the TFLu framework requires three memory regions.
For the tests we have upstreamed to MLPlatform we have defined two additional buffers.
Please have a look at the baremetal example application. Each buffer is named with a section attribute.
The section attributes are placed in memory by the scatter file (ArmClang) or linker script (GCC). To change where the buffers are placed in memory you need to edit the scatter file or linker script.