Hello to all:
I'm using an i.MX6 with a Cortex-A9. When I run neural networks (MLPs) that I wrote myself in plain C++, the board is about 10 times slower than my i7, which seems reasonable. My problem appears when I run a CNN (LFFD in my case) with TVM: then the board is about 40 times slower than my computer, that is, 4 times worse than the gap I see with my own code. This is what I can't understand: why do I get this extra 4x when using TVM? (I have also tried TFLite, with a similar outcome.)
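To make the arithmetic explicit, here is a minimal sketch of how the ratios above can be computed. The helper names (`median_latency`, `slowdown`, `extra_factor`) are hypothetical and only illustrate the calculation; the 10x and 40x figures are the ones quoted above, not new measurements.

```python
import statistics
import time


def median_latency(fn, warmup=3, repeats=20):
    """Return the median wall-clock time of fn() in seconds.

    A few warm-up calls are discarded so that one-off costs
    (cache fills, lazy initialisation) do not skew the result.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(board_seconds, desktop_seconds):
    """How many times slower the board is than the desktop."""
    return board_seconds / desktop_seconds


def extra_factor(model_slowdown, baseline_slowdown):
    """Slowdown of one workload relative to a baseline workload."""
    return model_slowdown / baseline_slowdown


# Figures quoted in the post: 40x for the CNN under TVM, 10x for the plain C++ MLP.
print(extra_factor(40.0, 10.0))  # 4.0
```

On each machine, `median_latency` would wrap a single inference call; `slowdown` then gives the per-workload ratio between the two machines.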
The original model file is in .onnx format, and the conversion to TVM is done with optimizations that involve no performance loss.
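For context, a conversion of this kind usually looks roughly like the sketch below. This is not my exact script: the target triple, the NEON flag, the input name and shape, and the output file name are all assumptions, and it presumes a TVM release that still ships the Relay ONNX frontend.

```python
def build_target_string(mcpu="cortex-a9",
                        triple="armv7l-linux-gnueabihf",
                        neon=True):
    """Assemble an LLVM target string for a 32-bit ARM board.

    The triple and the NEON attribute are assumptions about the board's
    toolchain and CPU features; adjust them to the actual system.
    """
    parts = ["llvm", "-device=arm_cpu", f"-mtriple={triple}", f"-mcpu={mcpu}"]
    if neon:
        parts.append("-mattr=+neon")
    return " ".join(parts)


def compile_onnx(onnx_path, input_name, input_shape, out_path="model.tar"):
    """Compile an ONNX model with TVM Relay for the target above.

    tvm and onnx are imported lazily so the helper above can be used
    without either package installed.
    """
    import onnx
    import tvm
    from tvm import relay

    model = onnx.load(onnx_path)
    # Relay needs the input shapes up front; name and shape are caller-supplied.
    mod, params = relay.frontend.from_onnx(model, shape={input_name: input_shape})
    target = tvm.target.Target(build_target_string())
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    # A .tar archive packs the object files without needing a cross-linker here.
    lib.export_library(out_path)
    return out_path


print(build_target_string())
```

If the target string does not describe the board's CPU (for example, a generic `llvm` target with no `-mcpu` or NEON attribute), the generated code may not use the vector unit, which is one thing worth checking when comparing against hand-written C++.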
Does anyone know how to overcome this issue?
Thanks in advance.