Inference time varies widely when running the same neural network repeatedly on the GPU of a HiKey970


I am working on a HiKey970 board with an ARM Mali-G72 MP12 GPU. I flashed the board with Android and installed the benchmark app with the command:

adb install -r -d -g android_aarch64_benchmark_model.apk

following the instructions on the TensorFlow website (the link is attached below):

I converted the Python code of the neural network into .tflite format to run it with the
benchmarking tool. I then executed SqueezeNet with the command:

$ adb shell am start -S \
  -n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
  --es args '"--graph=/data/local/tmp/squeezenet.tflite \

The inference times that I obtained for SqueezeNet on the GPU had a standard deviation of 38.39 ms.
On the second run of the network, the inference time was 7 times greater than on the first run.
On the third run, the inference time was similar to that of the first run. Could you please help
me understand why the inference time on the GPU varies so much from run to run?
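
For reference, here is a minimal sketch of how the per-run numbers can be summarized so that a single slow run stands out from the rest. The function name `summarize_latencies`, the `warmup_runs` parameter, and the latency values are all hypothetical and only for illustration; they are not my measured results and not part of the benchmark tool.

```python
import statistics


def summarize_latencies(latencies_ms, warmup_runs=1):
    """Summarize per-run latencies after dropping the first warm-up runs.

    Hypothetical helper for illustration; not part of the TFLite benchmark tool.
    """
    steady = latencies_ms[warmup_runs:]
    if len(steady) < 2:
        raise ValueError("need at least two runs after warm-up")
    return {
        "mean_ms": statistics.mean(steady),
        # Sample standard deviation (n - 1 in the denominator).
        "stdev_ms": statistics.stdev(steady),
        # Ratio of slowest to fastest run; a large value flags an outlier run.
        "max_over_min": max(steady) / min(steady),
    }


# Hypothetical per-run latencies in milliseconds (assumed values, not measurements).
example_runs_ms = [50.0, 20.0, 140.0, 20.0, 22.0]
summary = summarize_latencies(example_runs_ms, warmup_runs=1)
print(summary)
```

Reporting the slowest-to-fastest ratio next to the standard deviation makes it easier to tell whether the spread comes from one outlier run or from every run fluctuating.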