Subsequent inference iterations run slower than the very first one

Hello Ben Clark and the ARM Community:

I've run into an interesting performance issue with ARMNN running on a Raspberry Pi 4.  I am using a mobilenet_v1 (alpha depth 0.75) 128 x 128 image classification model with the latest ARMNN library, 21.05, cross-compiled for Raspberry Pi, and backends { "CpuAcc", "CpuRef" }.

I am running a loop of inferences on a canned image file.  In my application's initialization routine, I initialize the ARMNN framework, allocate the output tensors, and save a pointer to the ARMNN runtime (armnn::IRuntime* runtime).  Then, in the inference function, which is called in a loop, I retrieve the saved runtime pointer and run the inference.

It works fine, but to my surprise, the very first inference runs significantly faster than the subsequent ones!  With my model, the first inference takes ~77 ms while the subsequent inferences take ~125 ms, about 1.6 times as long!
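
A minimal sketch of how per-iteration timings like these could be collected is below. It is not the application's actual code: `time_iterations` and `mean_ms` are hypothetical helper names, and `run_once` stands in for whatever performs a single inference (in an ARMNN application, presumably a wrapper around the runtime's `EnqueueWorkload` call). Timing each iteration separately with a monotonic clock makes it possible to compare the first inference against the average of the later ones.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: calls `run_once` `iterations` times and returns the
// duration of each call in milliseconds, in call order.
std::vector<double> time_iterations(const std::function<void()>& run_once,
                                    std::size_t iterations)
{
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes
    std::vector<double> ms;
    ms.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        run_once();  // assumption: one complete inference per call
        const auto stop = clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}

// Hypothetical helper: mean of the timings in the half-open range
// [first, last). Returns 0.0 for an empty or out-of-bounds range.
double mean_ms(const std::vector<double>& ms, std::size_t first, std::size_t last)
{
    if (first >= last || last > ms.size()) {
        return 0.0;
    }
    double sum = 0.0;
    for (std::size_t i = first; i < last; ++i) {
        sum += ms[i];
    }
    return sum / static_cast<double>(last - first);
}
```

With the timings in a vector `ms`, `ms[0]` is the first inference and `mean_ms(ms, 1, ms.size())` is the average of the rest. Logging the whole vector also shows whether the slowdown appears immediately at the second iteration or builds up gradually.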

Any idea why?

A second interesting issue with ARMNN performance: it is significantly slower than inference with the TensorFlow Lite inference library.  Even the very first inference with ARMNN is almost 2 times slower than with TensorFlow Lite.  I didn't expect that...
