I want to try out the Ethos-U55 and compare its performance against a CPU-based CMSIS-NN baseline.
To do so, I am wondering whether I can simply compile some toy code containing just, e.g., a matrix multiplication, and run it on the U-55 vs. CPU-based CMSIS-NN using AWS VHT.
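To make the toy workload concrete, here is a minimal sketch of the kind of kernel I have in mind. It is plain C with int8 inputs and int32 accumulators; the function name `toy_matmul_s8` is hypothetical and is not part of CMSIS-NN or the Ethos-U driver. It is only meant to show the size of the example I would like to run on both targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy kernel (not a CMSIS-NN or Ethos-U API):
 * computes C = A * B for row-major matrices.
 *   a: rows  x inner, int8
 *   b: inner x cols,  int8
 *   c: rows  x cols,  int32 accumulators (no requantization)
 */
void toy_matmul_s8(const int8_t *a, const int8_t *b, int32_t *c,
                   size_t rows, size_t inner, size_t cols)
{
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            int32_t acc = 0;
            for (size_t k = 0; k < inner; ++k) {
                /* Widen to int32 before multiplying to avoid int8 overflow. */
                acc += (int32_t)a[i * inner + k] * (int32_t)b[k * cols + j];
            }
            c[i * cols + j] = acc;
        }
    }
}
```

On the CPU side I would swap this loop for the equivalent CMSIS-NN call; what I am missing is the corresponding entry point on the U-55 side.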
The GitHub repo has a really great set of examples using TFLMicro, but TFLMicro hides a lot under the hood.
I would rather look at some stripped-down toy examples. I think this should be doable if I can just locate the function call that invokes the U-55's computation, but I am not sure where that is, or whether it is possible to leave TFLMicro out of the AWS VHT setup to begin with.
Thank you for your reply. I am familiar with writing an ML application as a series of CMSIS-NN library calls or as a custom kernel (e.g., one produced by TVM). From your response, it sounds like it is simply not possible to call a single kernel for a given operation (e.g., convolution or matmul) on the Ethos-U, and that involving TFLu is always necessary? In Figure 2.1 of https://developer.arm.com/documentation/101888/0500/NPU-software-overview/NPU-software-components?lang=en, TFLu appears to interface with an NPU driver anyway, just as it interfaces with the CMSIS-NN library. I would like to know whether I, as a programmer, can interface with that driver manually, in the same way that I can call the CMSIS-NN library manually.