
Subsequent iterated inference cycles work slower than the very first one

Hello Ben Clark and the ARM Community:

I've run into an interesting performance issue with ARMNN running on a Raspberry Pi 4.  I am using the mobilenet_v1 image classification model (depth multiplier alpha 0.75, 128 x 128 input).  The ARMNN library is the latest release, 21.05, cross-compiled for Raspberry Pi, and the backends are { "CpuAcc", "CpuRef" }.

I am running a loop of inferences on a canned image file.  In the initialization routine of my application, I initialize the ARMNN framework, allocate the output tensors and save a pointer to the ARMNN runtime (armnn::IRuntime* runtime).  Then, in the inference function, which is called in a loop, I retrieve the saved pointer to the runtime and run the inference.

It works fine, but...  To my surprise, the very first inference runs significantly faster than the subsequent ones!  With my model, the very first inference takes ~ 77 ms, while each subsequent inference takes ~ 125 ms, more than 1.6 times as long!
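
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a per-iteration timing harness. It is not the code from my application: `TimeIterations`, `MeanExcludingFirst` and the `runInference` callable are hypothetical names used only for illustration, and the callable stands in for whatever runs one inference on the saved runtime pointer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `runInference` (a hypothetical stand-in for one inference call)
// `iterations` times and returns the wall-clock latency of each run in ms.
std::vector<double> TimeIterations(const std::function<void()>& runInference,
                                   std::size_t iterations)
{
    assert(iterations > 0);
    std::vector<double> latenciesMs;
    latenciesMs.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i)
    {
        // steady_clock is monotonic, so it is not affected by clock adjustments.
        const auto start = std::chrono::steady_clock::now();
        runInference();
        const auto end = std::chrono::steady_clock::now();
        latenciesMs.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return latenciesMs;
}

// Mean latency of every iteration except the first, so that the first run
// can be compared against the steady-state runs. Returns 0.0 when there are
// fewer than two samples.
double MeanExcludingFirst(const std::vector<double>& latenciesMs)
{
    if (latenciesMs.size() < 2)
    {
        return 0.0;
    }
    double sum = 0.0;
    for (std::size_t i = 1; i < latenciesMs.size(); ++i)
    {
        sum += latenciesMs[i];
    }
    return sum / static_cast<double>(latenciesMs.size() - 1);
}
```

In a real application the callable would wrap the inference call made through the saved `armnn::IRuntime*` (I am assuming something like a lambda around `runtime->EnqueueWorkload(...)`), and element 0 of the returned vector would be compared against `MeanExcludingFirst`.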

Any idea why?

And a second interesting issue with ARMNN performance:  it is significantly slower than inference with the TensorFlow Lite inference library.  Even the very first inference with ARMNN is almost 2 times slower than the same inference with TensorFlow Lite.  I didn't expect that...

Parents
  • Hi Ben,

    Thanks once again for your quick response.

    I am very well aware of the OS that runs on the Raspberry Pi: yes, it is armv7, as I mentioned to you in my previous messages.  And no, at this point I am not interested in experimenting with an aarch64 OS on the Raspberry Pi.

    My current goal is to evaluate the performance of the ARM Compute Library, and by extension the ARM NN inference engine, on 32-bit hardware platforms, both Linux-based and, even more importantly for me, bare-metal.  I've been quite disappointed so far, as you can tell.  I am surprised that the ARM NN team seems to have overlooked the importance of performance optimization for 32-bit architectures.
