This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Mali G72 mp3 flops performance

Hi guys:

  I'm an developing an opencl application on MTK P60(Mali G72 mp3). But i have met some problems.  

 The application has been run successfully on snapdragon 660(GPU Adreno 512), the performance was about 10ms. But when I run it on Mali G72 mp3, it should cost 60ms! When I check the gpu_utilization, it's almost 100 percent.

  Firstly, I couldn't find any specification about the flops performance with the Mali G72.(Adreno 512 GPU flops performance: 255 Gflops)

  Secondly, according to benchmarks, performance of G72 mp3 should close to the Adreno 512. I can't find out why it should perform so bad on G72 mp3.

  Welcome to talk about this. :)

 

Parents
  • That's right. I enqueue more than 1 hundred kernels to the queue as one pass and cycle it. But 80% of them are very small kernels .(like relu and sum operation in CNN) And several convolution kernels costs 80% of the time. 

    very small kernels which are not able to parallelize and fully load the GPU because they are so small with a low thread count

    I am not quiet understand those words mean. When kernels are small and GPU cycles counter is high, will it affect the GPU load? I have tuned their work group size, and each small kernel can dispatch hundreds of threads. How could the GPU core is not fully loaded?

Reply
  • That's right. I enqueue more than 1 hundred kernels to the queue as one pass and cycle it. But 80% of them are very small kernels .(like relu and sum operation in CNN) And several convolution kernels costs 80% of the time. 

    very small kernels which are not able to parallelize and fully load the GPU because they are so small with a low thread count

    I am not quiet understand those words mean. When kernels are small and GPU cycles counter is high, will it affect the GPU load? I have tuned their work group size, and each small kernel can dispatch hundreds of threads. How could the GPU core is not fully loaded?

Children
No data