Mali G72 MP3 FLOPS performance

Hi guys:

  I'm developing an OpenCL application on the MTK P60 (Mali G72 MP3), but I have run into some problems.

  The application runs successfully on the Snapdragon 660 (Adreno 512 GPU), where it takes about 10 ms. But when I run it on the Mali G72 MP3, it takes about 60 ms! When I check the GPU utilization, it is almost 100 percent.

  First, I couldn't find any specification of the FLOPS performance of the Mali G72. (For comparison, the Adreno 512 is rated at 255 GFLOPS.)

  Second, according to benchmarks, the G72 MP3 should perform close to the Adreno 512. I can't figure out why it performs so badly on the G72 MP3.

  Any thoughts are welcome. :)
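
  On the peak-FLOPS question, a rough upper bound can be worked out as cores x FP32 FMA lanes per core x 2 x clock. The sketch below is only an estimate under assumptions that do not come from this thread: 12 FP32 FMA lanes per core for the G72 and an 800 MHz GPU clock on the P60. Both figures should be checked against Arm and MediaTek documentation. The function name `peak_gflops` is made up for illustration.

```python
def peak_gflops(cores, fma_lanes_per_core, clock_ghz):
    """Theoretical FP32 peak in GFLOPS.

    Each FMA lane is counted as 2 FLOPs (one multiply, one add) per cycle.
    """
    return cores * fma_lanes_per_core * 2 * clock_ghz


# "MP3" in the device name means 3 shader cores.
G72_MP3_CORES = 3
# ASSUMPTION: 12 FP32 FMA lanes per G72 core (verify against Arm documentation).
ASSUMED_FMA_LANES_PER_CORE = 12
# ASSUMPTION: 0.8 GHz GPU clock on the P60 (verify on the actual device).
ASSUMED_CLOCK_GHZ = 0.8

mali_estimate = peak_gflops(G72_MP3_CORES, ASSUMED_FMA_LANES_PER_CORE, ASSUMED_CLOCK_GHZ)
adreno_quoted = 255.0  # GFLOPS figure quoted in the post above

print(f"Estimated Mali G72 MP3 peak: {mali_estimate:.1f} GFLOPS")
print(f"Quoted Adreno 512 peak / estimate: {adreno_quoted / mali_estimate:.1f}x")
```

  If those assumed figures hold, the estimate comes out at roughly 58 GFLOPS, well below the 255 GFLOPS quoted for the Adreno 512, so a several-fold slowdown on an ALU-bound kernel would not be surprising.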

 

  • Hi Harris:

    I have done some tests and found where the bottleneck is.

    My original kernel code was written for the Snapdragon 660 (Adreno 512). I use many vector local variables in the code. That GPU has 2 compute units and supports a maximum work-group size of 1024 per CU, but the G72 only supports a work-group size of 384 per CU. I suspect that the Mali GPU has far fewer hardware resources per CU than the Adreno, which results in fewer work items running concurrently.

    So I tuned the kernel code, mainly by reducing the number of vector variables and moving the work into loops. The performance improved: the time dropped from 66 ms to 36 ms.

    But I ran into another problem. I run the program from a command-line window. When only the command-line program is running, the time is 66 ms. But when I run it while the system camera is open, the time becomes 36 ms. Why would the system camera improve the performance? It seems as if the camera warms up the GPU device.
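
    One possible explanation for the camera effect, offered as an assumption and not confirmed anywhere in this thread, is GPU frequency scaling: a short command-line run may finish before the clock ramps up, while the camera keeps the GPU busy and clocked high. A way to check is to discard some warm-up runs and look at per-run timings. The sketch below shows only the measurement pattern; `benchmark` and `fake_kernel` are hypothetical names, and `fake_kernel` stands in for "enqueue the kernel and wait for it to finish".

```python
import statistics
import time


def benchmark(run_once, warmup=10, runs=50):
    """Time run_once after discarding warm-up runs.

    Returns (median_ms, samples_ms), where samples_ms holds one timing
    per measured run in execution order.
    """
    for _ in range(warmup):
        run_once()  # not timed; gives the device time to reach a steady state

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples_ms), samples_ms


def fake_kernel():
    # HYPOTHETICAL stand-in for the real work (kernel enqueue + blocking wait).
    sum(i * i for i in range(10_000))


median_ms, samples_ms = benchmark(fake_kernel, warmup=5, runs=20)
print(f"median: {median_ms:.3f} ms")
print(f"first run: {samples_ms[0]:.3f} ms, last run: {samples_ms[-1]:.3f} ms")
```

    If the per-run times fall steadily toward the faster figure as the runs continue, clock ramp-up is a likely cause; if they stay flat at the slower figure, something else is going on.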
