Hello to all:
I'm using an i.MX6 with a Cortex-A9. When I run neural networks (MLPs) that I wrote myself in plain C++, I get about a 10x slowdown compared to my i7, which seems reasonable. My problem appears when I run a CNN (LFFD in my case) with TVM: there the gap between my computer and the i.MX6 is about 40x, i.e. 4 times slower than my own hand-written code. This is what I can't understand: why do I get this extra 4x when using TVM? (I have also tried TFLite with a similar outcome.)
The original file is in .onnx format, and the conversion to TVM is done with optimization enabled and without performance loss.
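For reference, the compilation I'm doing looks roughly like this (simplified sketch; the file names, input name/shape and the exact target string here are placeholders, not necessarily my real values):

```python
import onnx
import tvm
from tvm import relay

# Load the exported ONNX model (file name and input shape are placeholders).
onnx_model = onnx.load("lffd.onnx")
shape_dict = {"data": (1, 3, 480, 640)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# i.MX6 = Cortex-A9, ARMv7 with NEON; a plain "llvm" target would generate
# generic code, so the triple and CPU attributes are set explicitly.
target = "llvm -device=arm_cpu -mtriple=armv7l-linux-gnueabihf -mattr=+neon"

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Cross-compile the shared library for the board.
lib.export_library("lffd_arm.so", cc="arm-linux-gnueabihf-g++")
```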
Does anyone know how to overcome this issue?
Thanks in advance.
I didn't mention it before, but the network that runs inefficiently is generated with MXNet and exported to ONNX (it works fine on Windows under OpenCV), and then it is optimized with TVM to run on the Cortex-A9.
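For completeness, the MXNet-to-ONNX export step is roughly like this (sketch; the symbol/params file names and the input shape are placeholders):

```python
import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

# Trained MXNet symbol/params files (placeholder names).
sym = "lffd-symbol.json"
params = "lffd-0000.params"

# Export to ONNX with the input shape the network expects.
onnx_mxnet.export_model(sym, params, [(1, 3, 480, 640)],
                        np.float32, "lffd.onnx")
```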