Hello, I am working with a Rockchip platform (RK3568) with a Mali-G52 GPU, and I am trying to understand the processing time of my OpenCL code. I simplified my kernel as much as possible so that it only copies a buffer, and I am measuring a very high processing time compared to the theoretical values I computed.
Here are the computations I made:
Size of copied buffer: 14,587,776 bytes
Announced memory bus frequency * width (LPDDR4-1600): 1600 MHz * 2 * 32 bits, i.e. 12.8 GB/s
So I get my theoretical value: Image_size / (bus freq * width) = 1.14 ms to read the full buffer.
I doubled that value since I want to read and write back, so I get 2.28 ms. I read that the efficiency of such DDR should be around 65-70%.
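For anyone who wants to check the arithmetic, here is a small sketch that reproduces it. The numbers come from the figures above; the function and variable names are only illustrative, and the last two lines simply apply the 65-70% efficiency figure as an assumption.

```python
# Figures taken from the post; names are illustrative only.
BUFFER_BYTES = 14_587_776
DDR_CLOCK_MHZ = 1600        # announced LPDDR4 clock
TRANSFERS_PER_CLOCK = 2     # double data rate
BUS_WIDTH_BITS = 32


def peak_bandwidth_bytes_per_s(clock_mhz, transfers_per_clock, width_bits):
    """Theoretical peak bandwidth of the memory bus in bytes per second."""
    return clock_mhz * 1e6 * transfers_per_clock * width_bits / 8


peak = peak_bandwidth_bytes_per_s(DDR_CLOCK_MHZ, TRANSFERS_PER_CLOCK, BUS_WIDTH_BITS)
read_ms = BUFFER_BYTES / peak * 1e3   # time to read the buffer once
copy_ms = 2 * read_ms                 # read + write back

# Assumption: 65-70% DDR efficiency, applied to the ideal copy time.
copy_ms_at_70 = copy_ms / 0.70
copy_ms_at_65 = copy_ms / 0.65

print(f"peak bandwidth : {peak / 1e9:.1f} GB/s")
print(f"read only      : {read_ms:.2f} ms")
print(f"read + write   : {copy_ms:.2f} ms")
print(f"at 65-70% eff. : {copy_ms_at_70:.2f} to {copy_ms_at_65:.2f} ms")
```

Changing DDR_CLOCK_MHZ to whatever clk_summary actually reports gives the ideal time at that clock instead of the announced one.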
Now, when I use the OpenCL built-in function 'clEnqueueCopyBuffer', I get a processing time of 6.5 ms, which is already more than double the theoretical time. When I write a kernel myself that takes as input 2 buffers of that size, allocated by the host (CPU), I get a best case of 12.2 ms, using SVM for both buffers.
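To put those timings in perspective, this sketch converts them into effective bandwidth. It assumes each copy moves the buffer twice over the bus (one read plus one write) and uses the announced peak of 1600 MHz * 2 * 32 bits; the names are illustrative.

```python
# Figures taken from the post; names are illustrative only.
BUFFER_BYTES = 14_587_776
PEAK_BYTES_PER_S = 1600e6 * 2 * 32 / 8   # announced LPDDR4 peak, 12.8 GB/s

# Measured copy times in milliseconds.
measured_ms = {
    "clEnqueueCopyBuffer": 6.5,
    "custom kernel (SVM)": 12.2,
}


def effective_bandwidth(bytes_copied, elapsed_ms):
    """Bytes per second over the bus, counting one read and one write per byte."""
    return 2 * bytes_copied / (elapsed_ms * 1e-3)


results = {}
for name, ms in measured_ms.items():
    bw = effective_bandwidth(BUFFER_BYTES, ms)
    results[name] = (bw, bw / PEAK_BYTES_PER_S)
    print(f"{name}: {bw / 1e9:.2f} GB/s, {100 * bw / PEAK_BYTES_PER_S:.0f}% of announced peak")
```

If the DDR is actually running below the announced clock, the percentages relative to the real peak would be higher than what this prints.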
Here are my kernel's parameters: global_work_size = Img_size/16, with each work-item reading/writing 16 bytes at a time using the vload16/vstore16 functions.
Additionally, I used the following commands to watch the GPU/DDR clocks while I was running my tests (1000 copies in a row):
root@rock-3a:/sys/kernel/debug/clk# cat clk_summary | grep gpu
root@rock-3a:/sys/kernel/debug/clk# cat clk_summary | grep ddr
And I noticed the reported frequencies were never as high as the announced frequencies of 800 MHz for the GPU and 1600 MHz for the DDR.
So I am wondering if I am missing anything while profiling? Or maybe there is a setting to "force" the GPU/DDR frequencies to go as high as possible? Is there already something wrong in my theoretical computations?
I also posted a support case, but I figured asking the community might prove useful.
Thanks for your time and consideration, Virgile