
Mali G72 mp3 flops performance

Hi guys:

  I'm developing an OpenCL application on the MTK P60 (Mali G72 MP3), but I have run into some problems.

 The application runs successfully on a Snapdragon 660 (Adreno 512 GPU), where it takes about 10 ms. But when I run it on the Mali G72 MP3, it takes 60 ms! When I check the GPU utilization, it's almost 100 percent.

  First, I couldn't find any specification of the FLOPS performance of the Mali G72. (Adreno 512 GPU FLOPS performance: 255 GFLOPS.)

  Second, according to benchmarks, the performance of the G72 MP3 should be close to that of the Adreno 512. I can't figure out why it performs so badly on the G72 MP3.

  Any thoughts are welcome. :)

 

  • Not all FLOPS are equal: there is no point counting adders if you need multipliers, for example, so FLOPS numbers are generally not that useful.

    Assuming a GPU clocked at 650 MHz, you get 12 FP32 FMAs per core per clock. If you count an FMA as 2 FLOPS, then you get 12 * 2 * 3 * 650M = 46.8 GFLOPS of FP32 FMAs. If you write well-vectorized FP16, then you get double that.

    If you can share your CL kernel we can probably provide more targeted advice.

    Cheers, 
    Pete
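
The peak-throughput arithmetic in the reply above can be reproduced with a few lines of Python. This is only a sketch of that calculation: the 650 MHz clock is the assumption stated in the reply, the 255 GFLOPS Adreno 512 figure is the one quoted in the question (not verified here), and the function name `peak_gflops` is made up for illustration.

```python
def peak_gflops(fmas_per_core_per_clock, cores, clock_hz, flops_per_fma=2):
    """Theoretical peak throughput in GFLOPS (an upper bound, not a measurement)."""
    return fmas_per_core_per_clock * flops_per_fma * cores * clock_hz / 1e9


# Figures from the reply: 12 FP32 FMAs per core per clock, 3 cores (MP3),
# and an ASSUMED clock of 650 MHz. The real clock on a given device may differ.
fp32_peak = peak_gflops(12, cores=3, clock_hz=650e6)

# The reply states that well-vectorized FP16 gives double the FP32 rate.
fp16_peak = 2 * fp32_peak

# Adreno 512 figure as quoted in the question (unverified).
adreno_512_gflops = 255.0

print(f"Mali G72 MP3 FP32 peak: {fp32_peak:.1f} GFLOPS")
print(f"Mali G72 MP3 FP16 peak: {fp16_peak:.1f} GFLOPS")
print(f"Quoted Adreno 512 / G72 MP3 FP32 ratio: {adreno_512_gflops / fp32_peak:.1f}x")
```

If both numbers are taken at face value, the gap in theoretical FP32 peak is of the same order as the reported 10 ms versus 60 ms runtimes. That is a rough sanity check only, since, as the reply notes, raw FLOPS figures are often not comparable across GPUs.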
