ARM graphics experts:
I have a cl_mem buffer that is reused for every camera frame, so it needs to be reset to zero before each use.
With OpenCL 1.2 we can use clEnqueueFillBuffer. For the Mali driver, what would be the most efficient way to fill the buffer?
I have written a kernel that fills the buffer on the GPU, but it turned out to be very slow. The kernel:
__kernel void set(__global int* dstptr)
{
    int y = get_global_id(0);
    __global int* dst = dstptr + y*ROW_WIDTH;
    for (int i=0; i<(ROW_WIDTH/4); i++) {
        *dst = 0;
        dst++;
    }
}
ROW_WIDTH is a compile-time macro, set to a factor of the buffer size; for example, for a buffer size of 1280*960 it is 75.
It turns out to be very slow. Why, and what can be done?
Hi Songmao,
Many thanks for your question.
As you correctly said, clEnqueueFillBuffer is available from OpenCL 1.2.
Starting from your OpenCL kernel for resetting a CL buffer to 0, I listed a couple of main strategies for improving its performance.
Using these 2 strategies, a possible OpenCL kernel for resetting a CL buffer to 0 could be:
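__kernel void set(__global int* dstptr)
{
    /* Each work item stores 4 int values */
    int x = get_global_id(0) * 4;
    int y = get_global_id(1);
    /* ROW_WIDTH is used here as the row length in int elements */
    int offset = x + y * ROW_WIDTH;
    vstore4((int4)0, 0, dstptr + offset);
}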
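As a sanity check on the indexing, here is a minimal host-side sketch in plain C (not OpenCL) that emulates the kernel above on the CPU. It assumes that ROW_WIDTH is a multiple of 4 and that the kernel is launched with a 2D global size of (ROW_WIDTH / 4, number of rows); neither is stated explicitly in this thread. The helper names emulate_fill and all_zero are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* CPU emulation of the vectorized fill kernel: each (gx, gy) pair plays the
 * role of one work item and clears 4 consecutive ints, as vstore4 would.
 * Assumption: row_width (in ints) is a multiple of 4.
 * Returns the number of emulated work items, (row_width / 4) * rows. */
size_t emulate_fill(int *dst, size_t row_width, size_t rows)
{
    assert(dst != NULL);
    assert(row_width % 4 == 0);

    size_t global_x = row_width / 4; /* assumed global size, dimension 0 */
    size_t global_y = rows;          /* assumed global size, dimension 1 */

    for (size_t gy = 0; gy < global_y; ++gy) {
        for (size_t gx = 0; gx < global_x; ++gx) {
            /* Same offset computation as in the kernel. */
            size_t offset = gx * 4 + gy * row_width;
            for (size_t lane = 0; lane < 4; ++lane)
                dst[offset + lane] = 0;
        }
    }
    return global_x * global_y;
}

/* Returns 1 if every element of buf is zero, 0 otherwise. */
int all_zero(const int *buf, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

Under these assumptions, a buffer of 1280 x 960 ints with ROW_WIDTH equal to 1280 would need a global size of (320, 960). If ROW_WIDTH is not a multiple of 4, the vector stores of one row would run into the next row, so the row length would have to be chosen accordingly.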
In order to reproduce the result, could you tell us which ARM GPU you use and whether you have disabled DVFS for your performance evaluation?
I hope this helps,
Thanks,
Gian Marco
Thanks for your reply!
I am running the code on a Mali-T760 MP2 on the Android platform.
I haven't disabled DVFS. How do I disable DVFS? I have the kernel source.
And I will try your suggestion later.
songmao
Hi songmao,
Thanks for your information.
Could you please also tell us which platform you use and which Mali driver you have?
Many thanks,
It's an MTK MT6752. The driver seems to be r5, but I am not sure.
Thank you very much for providing the information about your platform and Mali driver.
To check whether the GPU power management affects the performance of your OpenCL kernel, you could try executing it in a loop and timing each iteration,
i.e.
for(i = 0; i < N; ++i)
    start timer
    exec opencl kernel
    stop timer (show timer)
If the execution time decreases with every iteration, it probably means that the power management for your GPU is enabled.
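The timing loop can be sketched in plain C as follows. This is only a minimal sketch: the names run_once_fn, now_ms, time_iterations, fill_job and cpu_fill are hypothetical, and cpu_fill is a CPU stand-in so that the code runs without a GPU. In a real measurement the callback would enqueue the OpenCL kernel and then wait for it to complete (for example with clFinish), because enqueueing alone returns before the GPU has finished. The sketch assumes a POSIX system for clock_gettime.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback type: runs the workload once and blocks until it is
 * done. With OpenCL it would enqueue the kernel and then call clFinish(). */
typedef void (*run_once_fn)(void *user);

/* Monotonic wall-clock time in milliseconds. */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Runs the workload n times, prints and stores each iteration's time in ms. */
void time_iterations(run_once_fn run, void *user, int n, double *out_ms)
{
    assert(run != NULL && n >= 0 && out_ms != NULL);
    for (int i = 0; i < n; ++i) {
        double start = now_ms();          /* start timer */
        run(user);                        /* exec workload, blocking */
        out_ms[i] = now_ms() - start;     /* stop timer */
        printf("iteration %d: %.3f ms\n", i, out_ms[i]);
    }
}

/* CPU stand-in for the GPU fill (assumption, for illustration only). */
struct fill_job {
    int *buf;
    size_t count;
};

void cpu_fill(void *user)
{
    struct fill_job *job = user;
    memset(job->buf, 0, job->count * sizeof(int));
}
```

Calling time_iterations(cpu_fill, &job, N, times) prints one line per iteration, so a downward trend over the first iterations is easy to spot.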
Hope this can help your investigation,
Once again many thanks,