Hi, experts.
I want to try out U-55 and compare it with baseline CPU-based CMSIS-NN performance.
To do so, I am wondering if I can simply compile a toy program with just, e.g., a matrix multiplication, and run it on the U-55 vs. CPU-based CMSIS-NN using AWS VHT.
The GitHub repo has a really great set of examples using TFLMicro, but TFLMicro hides a lot of things under its hood.
I want to look at some stripped-down toy examples instead. I think this should be doable if I can just locate the function call that invokes the U-55's computation, but I'm not sure where that is, or whether it's even possible to leave TFLMicro out of the AWS VHT setup to begin with.
Thank you!
That is not how the Ethos-U55 is used; you would run a TFLu model optimized for Ethos-U by the Vela compiler: https://developer.arm.com/documentation/101888/0500/NPU-software-overview/NPU-software-tooling/The-Vela-compiler
Thank you for your reply. I am personally familiar with writing an ML application as a series of CMSIS-NN library calls or as a custom kernel (e.g., one produced by TVM). From your response, it seems that it is simply not possible to invoke a single kernel for some operation (e.g., a convolution or matrix multiply) on the Ethos-U, and that involving TFLu is always necessary?

In Figure 2.1 of https://developer.arm.com/documentation/101888/0500/NPU-software-overview/NPU-software-components?lang=en, TFLu appears to interface with an NPU driver anyway, just as it interfaces with the CMSIS-NN library. I want to know whether I can manually interface with that driver as a programmer, just as I can manually call the CMSIS-NN library.
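To make the question concrete, what I am imagining is something along these lines. This is only a rough sketch based on my reading of the ethos-u-core-driver sources; the exact function names and signatures (e.g., ethosu_invoke) may differ between driver versions, the buffer setup is elided, and the command stream would presumably still have to come from Vela:

```c
/* Rough sketch only -- not buildable as-is; exact ethos-u-core-driver
 * API names and signatures vary between driver versions. */
#include "ethosu_driver.h"

/* Assumed inputs: a Vela-generated command stream for a single op,
 * plus the tensor/scratch buffers that the command stream references. */
extern const uint8_t cmd_stream[];
extern const size_t  cmd_stream_size;
extern uint64_t      base_addrs[];
extern size_t        base_addr_sizes[];

void run_single_op_on_npu(void)
{
    /* Hand the command stream directly to the driver, bypassing TFLu.
     * I believe this is roughly what TFLu's Ethos-U custom operator ends
     * up calling internally, but I may be wrong about the signature. */
    ethosu_invoke(cmd_stream, cmd_stream_size,
                  base_addrs, base_addr_sizes, /* num_base_addr = */ 2);
}
```

Is this kind of direct driver usage supported, or does the driver assume it is always driven through the TFLu Ethos-U custom operator?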