I have a question about how to make the Ethos-U NPU work on an ARM Cortex-A + Cortex-M processor. First, I found ethos-u-linux-driver-stack and ethos-u-core-software on https://git.mlplatform.org/.
1. I know ethos-u-linux-driver-stack is the Ethos-U kernel driver. Should it be integrated into the Linux OS running on the Cortex-A, or into the Linux OS running on the Cortex-M? I am not clear about which core it needs to run on.
2. How do I run ethos-u-core-software? I didn't find detailed steps for running it. Does it run on the NPU or on one of the cores?
3. Apart from the two repos above, are any other repos necessary to make the Ethos-U NPU work on an ARM Cortex-A + Cortex-M processor?
Thanks in advance for your suggestions.
We have written some basic instructions here about the different memory configurations.
For optimal performance, both the model and the arena should be placed in SRAM. If that doesn't fit, our recommendation is to move the model to DRAM and leave the arena in SRAM.
If that still doesn't fit, then there are two options depending on your NPU.
For Ethos-U55, the only option is to accept a performance penalty and place both the model and the arena in DRAM.
For Ethos-U65, there is the option to enable spilling. Spilling means that both the model and the arena are placed in DRAM, and you reserve a smaller memory area in SRAM. Vela will use this spilling buffer as a cache and will generate extra instructions to copy frequently accessed data between the arena and the spilling buffer. There will still be a performance impact, but it will be smaller than without spilling.
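The placement order above can be summarized as a small decision function. This is only a planning sketch: the type and function names are hypothetical, and the actual placement is configured in your linker script and Vela settings, not by code like this.

```c
#include <stddef.h>

/* Which Ethos-U variant is in use; only the last fallback depends on it. */
typedef enum { NPU_ETHOS_U55, NPU_ETHOS_U65 } npu_type;

typedef enum {
    PLACE_ALL_SRAM,        /* model and arena in SRAM (best performance) */
    PLACE_MODEL_DRAM,      /* model in DRAM, arena in SRAM */
    PLACE_ALL_DRAM,        /* model and arena in DRAM (Ethos-U55 fallback) */
    PLACE_DRAM_WITH_SPILL  /* model and arena in DRAM, small SRAM spill buffer (Ethos-U65) */
} placement;

/* Hypothetical helper. All sizes are in bytes; sram_bytes is the SRAM
 * available to the NPU. */
placement choose_placement(npu_type npu, size_t model_bytes,
                           size_t arena_bytes, size_t sram_bytes)
{
    /* Overflow-safe form of: model_bytes + arena_bytes <= sram_bytes */
    if (arena_bytes <= sram_bytes && model_bytes <= sram_bytes - arena_bytes)
        return PLACE_ALL_SRAM;
    /* Model does not fit alongside the arena: move the model to DRAM. */
    if (arena_bytes <= sram_bytes)
        return PLACE_MODEL_DRAM;
    /* Arena alone does not fit: U65 can spill, U55 must use DRAM only. */
    return npu == NPU_ETHOS_U65 ? PLACE_DRAM_WITH_SPILL : PLACE_ALL_DRAM;
}
```

For example, with 400 bytes of SRAM, a 300-byte model and a 200-byte arena give PLACE_MODEL_DRAM on either NPU, while a 500-byte arena gives PLACE_ALL_DRAM on Ethos-U55 and PLACE_DRAM_WITH_SPILL on Ethos-U65.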
Hi Kristofer, thank you very much for the clarification. It's very useful.
I have some other questions. While verifying the Ethos-U65, I ran into the following two issues.
1. For MobileNet-SSD models, the inference never completes and no interrupt is generated. Only part of the command stream is executed; the rest is not. I read the QREAD register during inference, and it stopped increasing at a certain offset.
2. In some test cases, an interrupt is raised during inference, but the command stream has not been fully executed. The STATUS register then reads 0x2. Normally, the interrupt should be raised only after the whole command stream has executed, and the STATUS register should read 0xFFFF0022.
I don't have a solution for either issue. One idea is to find out exactly which command in the command stream is executing when the issues occur.
I tried adding "--verbose-register-command-stream" when converting the model with Vela, but I can't understand the following log. How can I find the exact command in the command stream?
```
Code:   Command:                     Param:  Payload:
0x0123  cmd0.NPU_SET_PARALLEL_MODE   0       -
0x010f  cmd0.NPU_SET_IFM_REGION      1       -
0x4000  cmd1.NPU_SET_IFM_BASE0       0       0x00008000 (32768)
0x4001  cmd1.NPU_SET_IFM_BASE1       0       0x00000000 (0)
0x4002  cmd1.NPU_SET_IFM_BASE2       0       0x00000000 (0)
0x4003  cmd1.NPU_SET_IFM_BASE3       0       0x00000000 (0)
0x010b  cmd0.NPU_SET_IFM_HEIGHT0_M1  31      -
0x010c  cmd0.NPU_SET_IFM_HEIGHT1_M1  31      -
0x010a  cmd0.NPU_SET_IFM_WIDTH0_M1   31      -
0x0104  cmd0.NPU_SET_IFM_DEPTH_M1    2       -
0x4006  cmd1.NPU_SET_IFM_STRIDE_C    0       0x00000001 (1)
....
```
Could you give me some insights? Thank you very much.
Reading the QREAD register is a good start. QREAD is the offset in bytes from the start of the command stream. Each cmd0 command occupies 4 bytes and each cmd1 command occupies 8 bytes, so summing the command sizes from the start of the stream should make it possible to determine which command hangs.
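The counting described above can be automated from the "Code:" column of the Vela log. In the log, cmd0 entries have codes like 0x01xx and cmd1 entries have codes like 0x40xx; the sketch below assumes that bits 14-15 of the code select the format (0b01 meaning cmd1 with a 32-bit payload), which is how I understand Vela encodes commands, but treat that as an assumption. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: bits 14-15 of the 16-bit code give the command format.
 * 0b01 = cmd1 (command word + 32-bit payload = 8 bytes), otherwise cmd0 (4 bytes). */
#define CMD_MODE_MASK    0xC000u
#define CMD_MODE_PAYLOAD 0x4000u

static uint32_t cmd_size_bytes(uint16_t code)
{
    return (code & CMD_MODE_MASK) == CMD_MODE_PAYLOAD ? 8u : 4u;
}

/* Hypothetical helper: given the command codes in stream order and a QREAD
 * value, return the index of the command whose bytes contain that offset,
 * or -1 if the offset lies past the end of the listed commands. */
long command_index_at_offset(const uint16_t *codes, size_t count, uint32_t qread)
{
    uint32_t offset = 0; /* byte offset of codes[i] from the stream start */
    for (size_t i = 0; i < count; i++) {
        uint32_t size = cmd_size_bytes(codes[i]);
        if (qread < offset + size)
            return (long)i;
        offset += size;
    }
    return -1;
}
```

With the eleven commands shown in the log, a QREAD of 40 maps to index 6 (cmd0.NPU_SET_IFM_HEIGHT0_M1), because the six preceding commands take 4 + 4 + 8 + 8 + 8 + 8 = 40 bytes. Depending on how far the NPU prefetches, the command that actually stalled may sit just before the reported offset, so it is worth inspecting its neighbours as well.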
If the STATUS register contains 0x00000002, the NPU is stopped and an IRQ has been raised, but the command stream has not reached its end, nor has an error interrupt been raised. It is difficult to say what is causing the hang; possible causes include a corrupted weight stream, or a DMA job reading from or writing to an illegal address.
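To compare the two STATUS values from this thread field by field, a small decoder can help. The bit positions are an assumption based on my reading of the register definitions in the Ethos-U core driver (state in bit 0, irq_raised in bit 1, cmd_end_reached in bit 5, IRQ history mask in bits 16-31); please verify them against the Ethos-U65 TRM. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of STATUS fields; bit positions are assumed, check the TRM. */
typedef struct {
    bool running;         /* bit 0: state, 0 = stopped, 1 = running */
    bool irq_raised;      /* bit 1: raw IRQ status */
    bool bus_error;       /* bit 2: bus abort detected */
    bool reset_ongoing;   /* bit 3: reset in progress */
    bool cmd_parse_error; /* bit 4: command stream parsing error */
    bool cmd_end_reached; /* bit 5: end of command stream reached */
    bool pmu_irq_raised;  /* bit 6: PMU interrupt raised */
    bool wd_fault;        /* bit 7: weight decoder fault */
    uint16_t irq_history; /* bits 16-31: IRQ history mask */
} npu_status;

static bool status_bit(uint32_t raw, unsigned n) { return (raw >> n) & 1u; }

npu_status decode_status(uint32_t raw)
{
    npu_status s = {
        .running = status_bit(raw, 0),
        .irq_raised = status_bit(raw, 1),
        .bus_error = status_bit(raw, 2),
        .reset_ongoing = status_bit(raw, 3),
        .cmd_parse_error = status_bit(raw, 4),
        .cmd_end_reached = status_bit(raw, 5),
        .pmu_irq_raised = status_bit(raw, 6),
        .wd_fault = status_bit(raw, 7),
        .irq_history = (uint16_t)(raw >> 16),
    };
    return s;
}

/* The case described above: stopped with an IRQ raised, but neither the end
 * of the command stream nor any of the decoded error flags is set. */
bool stopped_without_end_or_error(npu_status s)
{
    return !s.running && s.irq_raised && !s.cmd_end_reached &&
           !s.bus_error && !s.cmd_parse_error && !s.wd_fault;
}
```

Under these assumptions, 0x00000002 decodes to "stopped, IRQ raised, end not reached", while 0xFFFF0022 adds cmd_end_reached and a full IRQ history mask.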
These kinds of errors are usually easier to debug on a model. I don't know if you have built an FVP (Fixed Virtual Platform) model of your hardware, but could you try running the same network on the Corstone-300 FVP with Ethos-U65?