Hi guys:
I'm developing an OpenCL application on the MTK P60 (Mali G72 MP3), but I have run into some problems.
The application runs successfully on the Snapdragon 660 (Adreno 512 GPU), where it takes about 10 ms. But when I run it on the Mali G72 MP3, it takes about 60 ms! When I check gpu_utilization, it's almost 100 percent.
First, I couldn't find any specification of the FLOPS performance of the Mali G72. (For comparison, the Adreno 512 GPU is rated at 255 GFLOPS.)
Second, according to benchmarks, the G72 MP3 should perform close to the Adreno 512. I can't figure out why it performs so badly on the G72 MP3.
Any thoughts on this are welcome. :)
Hi Harris:
I have done some tests and found where the bottleneck is.
My original kernel code was written for the Adreno GPU in the Snapdragon 660, and it uses many vector local variables. That Adreno has 2 compute units and supports a maximum work-group size of 1024 per CU, but the G72 only supports a maximum work-group size of 384 per CU. I suspect that the Mali GPU has far fewer hardware resources per CU than the Adreno, which results in few work items running concurrently.
So I tuned the kernel code, mainly by reducing the number of vector variables and moving the work into loops. The performance improved! The time dropped from 66 ms to 36 ms.
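For anyone following along, here is a rough sketch in plain C of the arithmetic behind that suspicion. It uses the figures quoted above (2 CUs x 1024 and 384 per core) and takes "MP3" to mean three shader cores. The function names are hypothetical, and the rule that a Mali core holds half as many threads once a kernel needs more than 32 registers per thread is an assumption taken from Arm's general Bifrost guidance, not something measured on this device.

```c
#include <assert.h>

/* Upper bound on work items resident on the whole GPU at once:
 * number of cores (or CUs) times the per-core limit. */
static int peak_work_items(int cores, int max_per_core)
{
    assert(cores > 0 && max_per_core > 0);
    return cores * max_per_core;
}

/* ASSUMPTION: a Mali Bifrost core keeps its full thread count only while a
 * kernel needs at most `full_occupancy_regs` registers per thread (32 is the
 * figure usually quoted); above that, the thread count is halved. */
static int mali_threads_per_core(int max_per_core, int regs_per_thread,
                                 int full_occupancy_regs)
{
    assert(max_per_core > 0 && regs_per_thread >= 0 && full_occupancy_regs > 0);
    if (regs_per_thread > full_occupancy_regs)
        return max_per_core / 2;
    return max_per_core;
}
```

With the numbers from this thread, peak_work_items(2, 1024) gives 2048 for the Adreno and peak_work_items(3, 384) gives 1152 for the G72 MP3. If a kernel with many live vector variables crosses the assumed register threshold, mali_threads_per_core(384, 48, 32) drops to 192 per core, which would be consistent with fewer vector variables helping.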
But now I have another problem. I run the program from a command-line window. When I run only the command-line program, the time is 66 ms. But when I run it while the system camera is open, the time becomes 36 ms. Why would the system camera improve the performance? It seems as if the camera warms up the GPU.
It is possibly related to DVFS (dynamic voltage and frequency scaling). When idle the device will run at a low power state; it may take some time for the CPU, GPU, and memory system to select and stabilize on a frequency when new workloads start running and increase demand. With more things running - such as the camera - it may more rapidly select a higher frequency for heavily loaded components.
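One way to check whether frequency ramp-up is skewing a measurement is to run a few untimed warm-up iterations before timing. Below is a minimal sketch in plain C, assuming a POSIX clock_gettime is available. The names mean_time_after_warmup, workload_fn, and count_call are hypothetical; in a real test the callback would enqueue the kernel and wait for it to finish (for example with clFinish).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict C modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one complete run of the workload. */
typedef void (*workload_fn)(void *ctx);

/* Current monotonic time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Runs `warmup` untimed iterations so the clocks have a chance to ramp up,
 * then returns the mean time in ms over `runs` timed iterations. */
static double mean_time_after_warmup(workload_fn fn, void *ctx,
                                     int warmup, int runs)
{
    assert(fn != NULL && warmup >= 0 && runs > 0);
    for (int i = 0; i < warmup; ++i)
        fn(ctx);
    double start = now_ms();
    for (int i = 0; i < runs; ++i)
        fn(ctx);
    return (now_ms() - start) / runs;
}

/* Stand-in workload for illustration only: counts how often it is called. */
static void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

If the mean after warm-up is much lower than a single cold run, that points toward frequency scaling rather than the kernel itself. The warm-up count needed is device-specific and has to be found by experiment.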
Can I change the DVFS mode directly? I remember that I could set DVFS to "power save" or "performance" to control the Adreno GPU. Is there a similar way to control the Mali GPU?
The DVFS implementation isn't provided by Arm - it's implemented by the chipset manufacturer, so you'd have to check with them, sorry.
Kind regards, Pete