
Mali G72 mp3 flops performance

Jeljeli over 6 years ago

Hi guys:

I'm developing an OpenCL application on an MTK P60 (Mali-G72 MP3), but I have run into some problems.

The application runs successfully on a Snapdragon 660 (Adreno 512 GPU), taking about 10 ms. When I run it on the Mali-G72 MP3, however, it takes 60 ms! When I check the GPU utilization, it is almost 100 percent.

First, I couldn't find any specification of the Mali-G72's FLOPS performance (the Adreno 512 is rated at 255 GFLOPS).
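Without an official figure, a rough theoretical peak can be estimated from the shader-core configuration. The sketch below assumes 3 shader cores (the "MP3"), 3 execution engines per core each 4 lanes wide, one fused multiply-add (2 FLOPs) per lane per clock, and an 800 MHz GPU clock. All of these are assumptions to verify against Arm's and MediaTek's documentation, not published specifications.

```python
# Rough theoretical peak FP32 throughput:
#   peak_gflops = cores * lanes_per_core * flops_per_lane_per_clock * clock_ghz
# ASSUMED Mali-G72 MP3 configuration (verify against vendor documentation):
#   3 shader cores, 3 execution engines x 4 lanes per core,
#   2 FLOPs per lane per clock (one fused multiply-add), 0.8 GHz clock.


def peak_gflops(cores, lanes_per_core, flops_per_lane_per_clock, clock_ghz):
    """Theoretical peak in GFLOPS; real kernels usually reach only a fraction."""
    return cores * lanes_per_core * flops_per_lane_per_clock * clock_ghz


ADRENO_512_GFLOPS = 255.0  # figure quoted in the original post

g72_mp3 = peak_gflops(cores=3, lanes_per_core=3 * 4,
                      flops_per_lane_per_clock=2, clock_ghz=0.8)  # assumed values

print(f"Mali-G72 MP3 (assumed): {g72_mp3:.1f} GFLOPS")
print(f"Adreno 512 / Mali-G72 MP3 peak ratio: {ADRENO_512_GFLOPS / g72_mp3:.1f}x")
```

Under these assumptions the peak comes out around 58 GFLOPS, roughly 4x below the quoted Adreno 512 figure. If the workload is arithmetic-bound, a raw throughput gap of that size could account for much of the observed slowdown.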

Second, according to benchmarks, the G72 MP3 should perform close to the Adreno 512, so I can't work out why it performs so badly on the G72 MP3.

Any thoughts are welcome. :)


Jeljeli over 6 years ago in reply to Peter Harris

That's right. I enqueue more than a hundred kernels into the queue as one pass and run that pass in a loop. About 80% of them are very small kernels (like the ReLU and sum operations in a CNN), while several convolution kernels take 80% of the time.
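With a few convolution kernels taking about 80% of the time, Amdahl's law gives a quick bound on what tuning each group of kernels can achieve. The 60 ms baseline and the 80% share come from the thread; the speedup factors are hypothetical scenarios, and the small kernels' share (including any per-launch overhead) is simply taken as the remaining 20%.

```python
# Amdahl's law: overall speedup when a fraction p of the runtime
# is accelerated by a factor s (the rest is unchanged).
def amdahl_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)


BASELINE_MS = 60.0    # Mali-G72 MP3 runtime reported in the thread
CONV_FRACTION = 0.8   # share of time spent in the convolution kernels
SMALL_FRACTION = 1.0 - CONV_FRACTION

# Hypothetical scenarios
scenarios = {
    "small kernels 2x faster": amdahl_speedup(SMALL_FRACTION, 2.0),
    "small kernels free (s -> inf)": amdahl_speedup(SMALL_FRACTION, float("inf")),
    "convolutions 2x faster": amdahl_speedup(CONV_FRACTION, 2.0),
}

for name, speedup in scenarios.items():
    print(f"{name}: {speedup:.2f}x -> {BASELINE_MS / speedup:.1f} ms")
```

Even making the small kernels free would only bring 60 ms down to about 48 ms in this model, so most of any recovered time would have to come from the convolution kernels.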

Peter Harris said:
very small kernels which are not able to parallelize and fully load the GPU because they are so small with a low thread count

I don't quite understand what those words mean. When kernels are small and the GPU cycle counter is high, does that affect the GPU load? I have tuned their work-group sizes, and each small kernel dispatches hundreds of threads. How can the GPU cores not be fully loaded?
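Why "hundreds of threads" may still leave the GPU under-used can be illustrated with a simplified model covering two effects. If a kernel has fewer work-groups than shader cores, some cores get no work; and even busy cores may hold far fewer threads than they can keep resident, leaving little parallel work to hide memory latency. The model assumes each work-group runs entirely on one core and ignores how the driver actually schedules work; `MAX_THREADS_PER_CORE = 384` and `dispatch_fill` are hypothetical, not confirmed Mali-G72 figures or an Arm API. A busy-cycles counter near 100% can coexist with either effect, since such counters typically record that the GPU has work in flight rather than how many lanes are doing useful arithmetic.

```python
import math

NUM_CORES = 3                # the "MP3" in Mali-G72 MP3
MAX_THREADS_PER_CORE = 384   # HYPOTHETICAL placeholder; check Arm's documentation


def dispatch_fill(global_size, local_size, num_cores=NUM_CORES,
                  max_threads_per_core=MAX_THREADS_PER_CORE):
    """Simplified (hypothetical) model of a 1-D NDRange dispatch.

    Assumes each work-group runs entirely on one shader core.
    Returns (work_groups, cores_used, thread_fill), where thread_fill is
    dispatched threads / total resident-thread capacity, capped at 1.0.
    """
    work_groups = math.ceil(global_size / local_size)
    cores_used = min(work_groups, num_cores)
    capacity = num_cores * max_threads_per_core
    thread_fill = min(global_size / capacity, 1.0)
    return work_groups, cores_used, thread_fill


# Hypothetical small element-wise kernel (e.g. a ReLU) with 256 work-items
wg, used, fill = dispatch_fill(global_size=256, local_size=128)
print(f"{wg} work-groups, {used}/{NUM_CORES} cores busy, {fill:.0%} of thread capacity")
```

In this model a 256-item kernel with 128-item work-groups leaves one of the three cores idle and fills under a quarter of the assumed thread capacity. Repeated across around a hundred small dispatches per pass, any fixed per-launch cost is paid each time as well.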