Graphics, Gaming, and VR forum What's the best way to fill a buffer?


What's the best way to fill a buffer?

songmao over 8 years ago

ARM graphics experts:

I have a cl_mem buffer that is reused every camera frame, so it needs to be reset to zero before each use.

With OpenCL 1.2 we can use clEnqueueFillBuffer. For the Mali driver, what would be the most efficient way to fill the buffer?

I have written a kernel to fill the buffer on the GPU, but it turns out to be very slow. The kernel:

__kernel void set(__global int* dstptr)
{
    int y = get_global_id(0);
    __global int* dst = dstptr + y*ROW_WIDTH;
    for (int i=0; i<(ROW_WIDTH/4); i++) {
        *dst = 0;
        dst++;
    }
}

ROW_WIDTH is a compile-time macro, set to a factor of the buffer size; for example, for a buffer size of 1280*960 it is 75.

It turns out to be very slow. Why? And what can be done?
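
Editor's note: the posted kernel can be modelled on the CPU to see how much of the buffer it actually touches. The sketch below is only an illustration under stated assumptions: ROW_WIDTH is taken to count int elements per row, one work-item is assumed per row, and the sizes (8 ints per row, 4 rows) and the helper names `set_model` and `count_zeroed` are hypothetical, not from the post. Under those assumptions the loop bound `ROW_WIDTH/4` combined with scalar int stores clears only a quarter of each row.

```c
/* Hypothetical sizes for illustration only; not taken from the post. */
#define ROW_WIDTH 8
#define ROWS 4

/* CPU model of one work-item of the posted kernel, with y standing in
   for get_global_id(0). Assumes ROW_WIDTH counts ints per row. */
static void set_model(int *dstptr, int y)
{
    int *dst = dstptr + y * ROW_WIDTH;
    for (int i = 0; i < (ROW_WIDTH / 4); i++) {
        *dst = 0;   /* one scalar int store per iteration */
        dst++;
    }
}

/* Runs the model once per row and returns how many ints ended up zero. */
static int count_zeroed(void)
{
    int buf[ROW_WIDTH * ROWS];
    int zeroed = 0;

    for (int i = 0; i < ROW_WIDTH * ROWS; i++)
        buf[i] = 1;                 /* mark every element as "dirty" */
    for (int y = 0; y < ROWS; y++)
        set_model(buf, y);          /* one modelled work-item per row */
    for (int i = 0; i < ROW_WIDTH * ROWS; i++)
        zeroed += (buf[i] == 0);
    return zeroed;
}
```

With these sizes, `count_zeroed()` returns 8 out of 32 elements. If ROW_WIDTH is meant in some other unit, the coverage differs, so the launch parameters are worth checking alongside the timing.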


+1 Gian Marco Iodice over 8 years ago
Hi Songmao,
many thanks for your question.
As you correctly said, clEnqueueFillBuffer is available from OpenCL 1.2.
Starting from your OpenCL kernel for resetting a CL buffer to 0, here are two main strategies for improving its performance.
First, avoid the for loop, as this may reduce the number of registers the kernel needs, allowing more work-items to run on the GPU at the same time. The for loop can be replaced by using the second dimension of the global work size (GWS).
Second, exploit the vector capabilities of ARM GPUs. Mali-T600 and T700 series GPUs natively support all CL data types, as they have SIMD capabilities. This means it is possible to apply the same operation to multiple data elements at once, so fewer work-items need to be dispatched.
Using these two strategies, a possible OpenCL kernel for resetting a CL buffer to 0 could be:
__kernel void set(__global int* dstptr)
{
    /* Each work item stores 4 int values */
    int x = get_global_id(0) * 4;
    int y = get_global_id(1);
    /* offset is added to an int pointer, so ROW_WIDTH counts int elements here */
    int offset = x + y * ROW_WIDTH;
    vstore4((int4)0, 0, dstptr + offset);
}
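
Editor's note: the index mapping of this kernel can be checked with a small CPU model. The sketch below assumes a two-dimensional global work size of (ROW_WIDTH / 4) by the number of rows, which the reply implies but does not state, and assumes ROW_WIDTH counts ints and is a multiple of 4. The sizes and the helper names `set_model`, `count_nonzero_after_fill`, and `work_items_vec4` are hypothetical. A ROW_WIDTH of 75, as in the question, is not a multiple of 4 and would need adjusting before this mapping covers each row exactly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes for illustration only; ROW_WIDTH is assumed to
   count ints per row and to be a multiple of 4. */
#define ROW_WIDTH 16
#define ROWS 3

/* CPU model of one work-item: gx and gy stand in for get_global_id(0)
   and get_global_id(1). The memset of 4 ints stands in for vstore4. */
static void set_model(int *dstptr, size_t gx, size_t gy)
{
    size_t x = gx * 4;
    size_t offset = x + gy * ROW_WIDTH;
    memset(dstptr + offset, 0, 4 * sizeof(int));
}

/* Number of work-items when each one stores 4 ints. */
static size_t work_items_vec4(size_t row_width, size_t rows)
{
    return (row_width / 4) * rows;
}

/* Models a (ROW_WIDTH / 4) x ROWS launch and returns how many ints
   were left non-zero afterwards. */
static int count_nonzero_after_fill(void)
{
    int buf[ROW_WIDTH * ROWS];
    int nonzero = 0;

    for (size_t i = 0; i < ROW_WIDTH * ROWS; i++)
        buf[i] = 1;                         /* mark every element as "dirty" */
    for (size_t gy = 0; gy < ROWS; gy++)
        for (size_t gx = 0; gx < ROW_WIDTH / 4; gx++)
            set_model(buf, gx, gy);
    for (size_t i = 0; i < ROW_WIDTH * ROWS; i++)
        nonzero += (buf[i] != 0);
    return nonzero;
}
```

With these sizes, `count_nonzero_after_fill()` returns 0 and `work_items_vec4(16, 3)` is 12, a quarter of the 48 work-items that one scalar store per int would need.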
In order to reproduce the result, could you tell us which ARM GPU you use and whether you disabled DVFS for your performance evaluation?
I hope this can help you,
Thanks,
Gian Marco

0 songmao over 8 years ago in reply to Gian Marco Iodice

Thanks for your reply!
I am running the code on a Mali-T760 MP2 on an Android platform.
I haven't disabled DVFS. How do I disable it? I have the kernel source.
I will try your suggestion later.
Thanks,
songmao
0 Gian Marco Iodice over 8 years ago in reply to songmao

Hi songmao,
thanks for your information.
Could you also tell us which platform you use and which Mali driver version you have?
many thanks,
Gian Marco
0 songmao over 8 years ago in reply to Gian Marco Iodice

It's an MTK MT6752. The driver seems to be r5, but I am not sure.