Improving Inference Performance of a Quantized CNN on Cortex-A78

Hello

I’m deploying a convolutional neural network on an ARM Cortex-A78-based mobile platform and using 8-bit post-training quantization to reduce model size. Inference works, but performance is lower than expected — latency per image is around 120 ms, and CPU utilization is high. I’ve tried using ARM Compute Library and Neon intrinsics, but I’m unsure if I’m fully leveraging the CPU’s vectorization capabilities.
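One way to check whether the int8 path is really vectorized is to compare it against a kernel that uses the A78's dot-product instructions directly. Below is a minimal sketch of a hypothetical helper, `dot_s8` (not an ACL function). It uses the Neon `vdotq_s32` intrinsic (SDOT) when the compiler targets AArch64 with the dot-product extension, for example with `-O3 -mcpu=cortex-a78` or `-march=armv8.2-a+dotprod`, and otherwise falls back to plain C.

```c
#include <stdint.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Hypothetical helper: int8 x int8 dot product with int32 accumulation,
   the inner operation of a quantized convolution or fully connected layer. */
int32_t dot_s8(const int8_t *a, const int8_t *b, size_t n)
{
    size_t i = 0;
    int32_t sum = 0;
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 16 <= n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);
        int8x16_t vb = vld1q_s8(b + i);
        /* SDOT: each 32-bit lane accumulates four int8 products */
        acc = vdotq_s32(acc, va, vb);
    }
    sum = vaddvq_s32(acc); /* horizontal add of the four lanes */
#endif
    /* Scalar tail (or the whole loop when SDOT is not available) */
    for (; i < n; ++i)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
```

To see whether SDOT is available to the compiler at all for a given set of flags, check whether `__ARM_FEATURE_DOTPROD` is predefined, for example with `gcc -mcpu=cortex-a78 -dM -E - < /dev/null | grep DOTPROD`.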

Has anyone successfully optimized quantized CNNs on Cortex-A78? Are there recommended compiler flags, threading strategies, or memory layout adjustments that significantly reduce latency? Any practical examples or benchmarks would be extremely helpful.
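On memory layout, ACL supports both NCHW and NHWC tensors. With NHWC the channel dimension is contiguous in memory, so the per-pixel reduction over input channels becomes a contiguous int8 dot product, which is the access pattern Neon handles best. Treat "NHWC is faster" as an assumption to verify with your own benchmarks. If your exported model is NCHW, a minimal repacking sketch follows. `nchw_to_nhwc_s8` is a hypothetical helper, not a library call.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: repack an int8 tensor from NCHW to NHWC.
   Element (n, c, h, w) moves from src[((n*C + c)*H + h)*W + w]
   to dst[((n*H + h)*W + w)*C + c], making channels contiguous. */
void nchw_to_nhwc_s8(const int8_t *src, int8_t *dst,
                     size_t N, size_t C, size_t H, size_t W)
{
    for (size_t n = 0; n < N; ++n)
        for (size_t h = 0; h < H; ++h)
            for (size_t w = 0; w < W; ++w)
                for (size_t c = 0; c < C; ++c)
                    dst[((n * H + h) * W + w) * C + c] =
                        src[((n * C + c) * H + h) * W + w];
}
```

Doing this once at load time (for weights) or once at the network input is cheap compared with repacking in front of every layer.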

Thank you

Best regards,
Mikkel Jensen
Denmark

  • Hi Mikkel,

    We have optimized and quantized models for Cortex-A cores, including the A78. We probably need a bit more information to know what the issue is and what can be done. Is it dynamic or static quantization? Do you have per-layer timings: is it the convolution layers, or another layer that isn't getting sped up? What was the latency before quantization? Any pruning? What image sizes? And so on. I'll also point a couple of ACL people at this thread; they'll have better specifics than me once you have more detail. If there are parts you cannot share on a public forum, we can provide an email address. You can also ask on the Arm Developer Discord if you join the Arm Developer Program.
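    If you don't have layer timings yet, a crude but useful approach is to time each layer call on its own. Below is a minimal sketch using POSIX `clock_gettime(CLOCK_MONOTONIC)`. It runs a few untimed warm-up iterations, then reports the median of the timed runs. `time_layer_ms` and `layer_fn` are hypothetical names. It assumes you can wrap each layer in a function that takes a context pointer.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature: one layer invocation with its own state in ctx */
typedef void (*layer_fn)(void *ctx);

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Run fn `warmup` times untimed, then `runs` times timed.
   Returns the median time in ms (upper median for even counts),
   or -1.0 if runs <= 0 or allocation fails. */
double time_layer_ms(layer_fn fn, void *ctx, int warmup, int runs)
{
    if (runs <= 0)
        return -1.0;
    double *t = malloc((size_t)runs * sizeof *t);
    if (!t)
        return -1.0;
    for (int i = 0; i < warmup; ++i)
        fn(ctx);
    for (int i = 0; i < runs; ++i) {
        double t0 = now_ms();
        fn(ctx);
        t[i] = now_ms() - t0;
    }
    qsort(t, (size_t)runs, sizeof *t, cmp_double);
    double median = t[runs / 2];
    free(t);
    return median;
}
```

    The median is used instead of the mean so that occasional outliers, such as scheduler preemption, do not dominate the result. Summing the per-layer medians and comparing against the ~120 ms end-to-end figure shows whether the time is concentrated in the convolutions or elsewhere.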
