Is it possible for the Ethos-U55's MAC Engine and Elementwise Engine to run concurrently?

Hello ARM Community,


First, from the Ethos-U55 manual, I saw that the NPU_OP_<KERNEL> commands are non-blocking.

Second, I understand that the shared buffer stores the data that the NPU is currently processing. Since the MAC engine's main operation and the elementwise operation use the same shared buffer, the two engines might not be able to produce correct results if they ran concurrently.

But I tried the experiment anyway, without worrying about the correctness of the result:
```
set register, ifm, ofm...
NPU_OP_CONV
NPU_OP_ELEMENTWISE (perform add)
other command...
```
The total cycle count reported by the FVP indicates that the two engines do not seem to run concurrently (total cycles = conv-only cycles + add-only cycles).
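To make the comparison explicit, the check I applied can be sketched roughly as below. This is only an illustrative sketch: the function name `classify_overlap`, the 5% tolerance, and the cycle counts in the usage note are hypothetical placeholders, not values taken from the FVP.

```python
def classify_overlap(conv_cycles, add_cycles, total_cycles, tolerance=0.05):
    """Guess whether two ops ran back to back or overlapped.

    Assumption: fully serial execution costs about conv + add cycles,
    while fully concurrent execution costs about max(conv, add) cycles.
    """
    serial = conv_cycles + add_cycles
    overlapped = max(conv_cycles, add_cycles)

    # Total close to the sum: the ops appear to run one after the other.
    if abs(total_cycles - serial) <= tolerance * serial:
        return "serial"
    # Total close to the longer op: the shorter op appears to be hidden.
    if abs(total_cycles - overlapped) <= tolerance * overlapped:
        return "overlapped"
    # Anything else suggests only partial overlap (or extra overhead).
    return "partial"
```

For example, with placeholder counts of 1000 cycles for the conv alone and 400 for the add alone, a combined run of about 1400 cycles would be classified as serial, which matches what I observed, whereas about 1000 cycles would suggest the engines overlapped.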

The question is as stated in the title. Did I misunderstand something?