Mali-G52 OpenCL performance

Hello, I am working with a Rockchip platform (RK3568) with a Mali-G52 GPU and I am trying to understand the processing time of my OpenCL code. I simplified my kernel as much as possible so that it only copies a buffer, and I am measuring a very high processing time compared to the theoretical values I computed.

Here are my calculations:

Size of copied buffer: 14 587 776 bytes

Advertised memory bus frequency × width (LPDDR4-1600): 1600 MHz × 2 × 32 bits = 102.4 Gbit/s = 12.8 GB/s

So my theoretical value is: Image_size / (bus frequency × width) = 1.14 ms to read the full buffer.

I doubled that value since I want to read and write back, so I get 2.28 ms. I have read that the efficiency of this kind of DDR should be around 65-70%.
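For reference, the estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post: the constant and function names are my own, and I am assuming the "× 2" factor is the double-data-rate multiplier. It also applies the 65-70% efficiency figure to the 2.28 ms value.

```python
# Worked version of the estimate in this post. All inputs are the figures
# quoted above; the names are illustrative only.
BUFFER_BYTES = 14_587_776   # size of the copied buffer
BUS_MHZ = 1600              # advertised LPDDR4 bus frequency
DATA_RATE_FACTOR = 2        # the "x 2" factor (assumed: double data rate)
BUS_WIDTH_BITS = 32


def peak_bandwidth_bytes_per_s():
    """Theoretical peak bandwidth of the memory bus in bytes per second."""
    return BUS_MHZ * 1e6 * DATA_RATE_FACTOR * BUS_WIDTH_BITS / 8


def copy_time_ms(efficiency=1.0):
    """Time to read and write the whole buffer once, in milliseconds."""
    total_bytes = 2 * BUFFER_BYTES  # one read plus one write
    return total_bytes / (peak_bandwidth_bytes_per_s() * efficiency) * 1e3


if __name__ == "__main__":
    print(f"peak bandwidth: {peak_bandwidth_bytes_per_s() / 1e9:.1f} GB/s")
    print(f"copy at 100% efficiency: {copy_time_ms():.2f} ms")
    for eff in (0.65, 0.70):
        print(f"copy at {eff:.0%} efficiency: {copy_time_ms(eff):.2f} ms")
```

With 65-70% efficiency the expected copy time comes out at roughly 3.3 to 3.5 ms rather than 2.28 ms.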

Now, when I use the OpenCL built-in function 'clEnqueueCopyBuffer', I get a processing time of 6.5 ms, which is already more than double the theoretical time. When I write a kernel myself that takes as input two buffers of that size, allocated by the host (CPU), I get a best case of 12.2 ms, using SVM for both buffers.
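To put these timings in bandwidth terms, here is a small sketch that converts a measured copy time into effective throughput and compares it with the 12.8 GB/s peak that follows from the LPDDR4 figures in this post. The function names are illustrative, and it assumes each copy moves the buffer once in each direction (one read, one write).

```python
# Convert measured copy times into effective memory bandwidth.
# Inputs are the figures from this post; names are illustrative only.
BUFFER_BYTES = 14_587_776
PEAK_BYTES_PER_S = 12.8e9   # 1600 MHz x 2 x 32 bits, expressed in bytes/s


def effective_bandwidth_gb_s(time_ms):
    """Bytes read plus bytes written, divided by the measured time (GB/s)."""
    return 2 * BUFFER_BYTES / (time_ms * 1e-3) / 1e9


def bus_utilisation(time_ms):
    """Fraction of the theoretical peak bandwidth actually achieved."""
    return effective_bandwidth_gb_s(time_ms) * 1e9 / PEAK_BYTES_PER_S


if __name__ == "__main__":
    for label, t_ms in (("clEnqueueCopyBuffer", 6.5), ("custom kernel", 12.2)):
        print(f"{label}: {effective_bandwidth_gb_s(t_ms):.2f} GB/s, "
              f"{bus_utilisation(t_ms):.0%} of peak")
```

That works out to about 4.49 GB/s (35% of peak) for 'clEnqueueCopyBuffer' and about 2.39 GB/s (19% of peak) for my own kernel, both well below the 65-70% I expected.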

Here are my kernel's parameters: global_work_size = Img_size/16; each work-item reads and writes 16 bytes at a time using the vload16/vstore16 functions.

Additionally, I used the following commands to watch the GPU/DDR clocks while I was running my tests (1000 copies in a row):

root@rock-3a:/sys/kernel/debug/clk# cat clk_summary | grep gpu

root@rock-3a:/sys/kernel/debug/clk# cat clk_summary | grep ddr

And I noticed that the reported frequencies were never as high as the advertised frequencies of 800 MHz for the GPU and 1600 MHz for the DDR.
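Since a single 'cat' only gives one snapshot, a small polling script may make it easier to see whether the clocks ramp up during the 1000 copies. The sketch below simply re-reads the same clk_summary file and keeps the lines mentioning "gpu" or "ddr", like the grep commands above. The function names, sample count and interval are arbitrary choices of mine, and it assumes debugfs is mounted and the script runs as root.

```python
import time

# Same file as in the commands above; needs root and a mounted debugfs.
CLK_SUMMARY = "/sys/kernel/debug/clk/clk_summary"


def filter_clock_lines(text, keywords=("gpu", "ddr")):
    """Return the lines of a clk_summary dump that mention any keyword."""
    return [line.strip() for line in text.splitlines()
            if any(k in line.lower() for k in keywords)]


def sample_clocks(path=CLK_SUMMARY, samples=10, interval_s=0.1):
    """Re-read the clock summary several times, yielding the matching lines."""
    for _ in range(samples):
        with open(path) as f:
            yield filter_clock_lines(f.read())
        time.sleep(interval_s)


if __name__ == "__main__":
    # Run this in a second terminal while the copy benchmark is active.
    for i, lines in enumerate(sample_clocks()):
        print(f"--- sample {i} ---")
        print("\n".join(lines))
```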

So I am wondering: am I missing something while profiling? Is there a setting to "force" the GPU/DDR frequencies to go as high as possible? Or is there already something wrong in my theoretical calculations?

I also opened a support case, but I figured asking the community might prove useful.

Thanks for your time and consideration, Virgile