I want to try out the U-55 and compare its performance against a baseline of CMSIS-NN running on the CPU.
To do so, I am wondering if I can simply compile a toy program containing just, e.g., a matrix multiplication, and run it once on the U-55 and once on the CPU with CMSIS-NN, using AWS VHT.
The GitHub repo has a really great set of examples using TFLMicro, but TFLMicro hides a lot under the hood.
I would rather look at some stripped-down toy examples. I think this should be doable if I can just locate the function call that invokes the U-55's computation, but I am not sure where that is, or whether it is possible to leave TFLMicro out of the AWS VHT setup to begin with.
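To make the toy workload concrete, here is a rough sketch of the kind of kernel I have in mind, written in plain C. The function name `toy_matmul_s8` and the row-major int8 layout are my own assumptions for illustration. It calls neither CMSIS-NN nor the U-55 driver; it is only the reference computation I would like to map onto each of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference kernel (not part of CMSIS-NN or the U-55 driver).
 * Computes C = A * B with row-major storage:
 *   A is m x k (int8), B is k x n (int8), C is m x n (int32 accumulators). */
void toy_matmul_s8(const int8_t *a, const int8_t *b, int32_t *c,
                   size_t m, size_t k, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            /* Dot product of row i of A with column j of B. */
            for (size_t p = 0; p < k; ++p) {
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

The idea would be to time this same computation in two builds, one where it is replaced by the CMSIS-NN equivalent on the CPU and one where it is offloaded to the U-55, with everything else kept identical.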