Hello everyone, I'm Emmanuel Katto. I'm working on a mobile app that uses AI for image recognition, and I'm struggling to optimize the model deployment for low power consumption. The app draws a lot of power, especially while the AI model is running, which hurts the user experience.
Can anyone share some best practices for optimizing mobile AI model deployment for low-power consumption?
I'm currently using TensorFlow Lite and OpenVINO on Android, but I'm open to exploring other options.
Are there any Arm-specific features or tools that can help with AI model optimization and deployment?
Please let me know.
Thanks!
Emmanuel Katto
Hi Emmanuel,
There are a number of things you can try, especially if you can adjust the model yourself.
Quantization is a big opportunity (see: Neural Network Model quantization on mobile). Moving from float32 to e.g. int8 often cuts power, compute, and bandwidth by around 4x, and depending on the network the quality drop can be small. Some layers/operators are especially sensitive to quantization, so if quantizing the whole model costs too much quality, you may well still be able to quantize most of it with partial quantization; there's a minimal sketch below.
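Since you're already on TensorFlow Lite, here's a minimal sketch of full-int8 post-training quantization with the TFLite converter. The SavedModel path and the 224x224x3 input shape are placeholders, and calibration should use a few hundred real preprocessed samples rather than the random data shown here:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Calibration data: in practice, yield a few hundred *real* inputs here.
    for _ in range(200):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

# Force int8 end to end. Remove the next three lines to allow float fallback
# for quantization-sensitive ops -- that's the partial quantization option.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```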
If you're willing to get deeper into the model, pruning and weight clustering can reduce the size and compute further still. This blog talks about Ethos-U, but the same principles apply on mobile: Benefit of pruning and clustering a neural network before deploying on Arm Ethos-U NPU. There's a rough sketch of both after this paragraph.
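A sketch of both techniques using the TensorFlow Model Optimization Toolkit (tfmot). The model path, 50% sparsity target, step counts, and 16-cluster count are all assumptions to tune for your own network, and both steps need a short fine-tuning run to recover accuracy:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.models.load_model("baseline_model.keras")  # placeholder path

# 1) Prune: ramp magnitude-based sparsity up to 50% over 1000 fine-tuning steps.
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000))
pruned.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])
# pruned.fit(train_ds, epochs=2,
#            callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
pruned = tfmot.sparsity.keras.strip_pruning(pruned)

# 2) Cluster: snap the surviving weights to 16 shared centroids per layer,
#    which makes the model far more compressible.
clustered = tfmot.clustering.keras.cluster_weights(
    pruned,
    number_of_clusters=16,
    cluster_centroids_init=tfmot.clustering.keras.CentroidInitialization.LINEAR)
# clustered also needs compiling + a brief fit() before stripping and converting.
```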
Inference Advisor may be able to give you some additional tips: Take your neural networks to the next level with Arm's Machine Learning Inference Advisor
And you can try different inference engines too. ArmNN runs TFLite models (and can act as a delegate for the standard TFLite runtime) and is optimised for Arm: https://github.com/ARM-software/armnn/releases . If your original model is PyTorch, ExecuTorch can be worth considering too: ExecuTorch and TOSA enabling PyTorch on Arm platforms. There's a short sketch of loading the ArmNN delegate below.
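A minimal Python sketch of loading ArmNN as an external TFLite delegate (the same pattern applies from the Java/Kotlin API on Android). The delegate library path, backend list, and model name are assumptions; the .so ships in the ArmNN release bundles linked above:

```python
import tflite_runtime.interpreter as tflite

# Path to the ArmNN external delegate library from the release bundle (placeholder).
armnn_delegate = tflite.load_delegate(
    library="libarmnnDelegate.so",
    options={
        "backends": "CpuAcc,GpuAcc,CpuRef",  # try accelerated backends first, reference last
        "logging-severity": "info",
    })

interpreter = tflite.Interpreter(
    model_path="model_int8.tflite",          # placeholder model
    experimental_delegates=[armnn_delegate])
interpreter.allocate_tensors()
```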
Hopefully some of those ideas help. Things can get a lot more specific if you're working layer by layer (convolutions are good on mobile, especially depthwise ones; see the quick comparison below), or depending on what level you're working at, you could just pick a model that has already been tuned for mobile.
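To make the depthwise point concrete, here's a quick parameter-count comparison in Keras; the shapes are arbitrary, and the compute saving is in the same ballpark as the parameter saving:

```python
import tensorflow as tf

inp = tf.keras.Input(shape=(112, 112, 64))

standard = tf.keras.Model(inp, tf.keras.layers.Conv2D(128, 3, padding="same")(inp))
separable = tf.keras.Model(inp, tf.keras.layers.SeparableConv2D(128, 3, padding="same")(inp))

print(standard.count_params())   # 73,856: 3*3*64*128 weights + 128 biases
print(separable.count_params())  # 8,896: 3*3*64 depthwise + 64*128 pointwise + 128 biases
```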
Cheers,
Ben
Thanks Ben!
Hi Emmanuel,
You may find this recent announcement and the associated learning paths very useful and informative:
https://newsroom.arm.com/blog/kleidiai-integration-mediapipe
Thanks,
Ronan Synnott