Hi,
I am working on a video processing application in which I have to pass a source image to the GPU, run a computation on it, and write the result to a destination. I read that creating buffers inside the loop on every iteration adds GPU overhead, so I implemented the following. However, I am still facing the same performance issue. Can someone help me?
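// Create the source and destination buffers once, outside the loop
cl_mem buffer_ptr = clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, mem_size, NULL, &error_code);
cl_mem buffer_dst_ptr = clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, mem_size, NULL, &error_code);
while (recording) {
    // Blocking upload of the source frame
    clEnqueueWriteBuffer(queue_, buffer_ptr, CL_TRUE, 0, mem_size, src_ptr, 0, NULL, NULL);
    size_t global_size[2] = {dst_w, dst_h};
    clEnqueueNDRangeKernel(queue_, kernel, 2, NULL, global_size, NULL, 0, NULL, &event_kernel);
    // Blocking download of the result
    clEnqueueReadBuffer(queue_, buffer_dst_ptr, CL_TRUE, 0, mem_size, dst_y, 0, NULL, NULL);
}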
My kernel is very simple and does minimal computation; you can think of it as just copying the image from source to destination. Because of power constraints, I have to do this on the GPU only. I know there is memory transfer overhead, but I have been unable to find out how to reduce it.
Hi Kevin,
Thanks for replying.
1. The total execution time of one loop iteration is 21 ms, and the kernel execution time is 10 ms.
2. The buffer size is around 2800x1600, and I have to create one buffer for the source and one for the destination.
3. I can use clEnqueueMapBuffer, but I did not find any difference. In that case I have to memcpy the image from the source pointer into the mapped buffer region, right? So the copy will always be there.
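For reference, the numbers in points 1 and 2 can be turned into a rough estimate of what the transfers cost. The sketch below is plain C and rests on assumptions not stated in the thread: 1 byte per pixel (suggested only by the dst_y name), the 21 ms being per loop iteration, and all non-kernel time going to the two transfers. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one image plane. bytes_per_pixel = 1 is an ASSUMPTION
   (8-bit luma); adjust it for the real pixel format. */
static size_t plane_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Time per loop iteration that is not spent inside the kernel, in ms. */
static double non_kernel_ms(double total_ms, double kernel_ms)
{
    assert(total_ms >= kernel_ms);
    return total_ms - kernel_ms;
}

/* Effective rate in MB/s (1 MB = 1e6 bytes) for moving `bytes` in `ms`. */
static double effective_mb_per_s(size_t bytes, double ms)
{
    assert(ms > 0.0);
    return ((double)bytes / 1e6) / (ms / 1e3);
}
```

Calling plane_bytes(2800, 1600, 1) gives 4,480,000 bytes per buffer, so an upload plus a download moves 8,960,000 bytes per frame. With non_kernel_ms(21, 10) = 11 ms, effective_mb_per_s comes out at roughly 815 MB/s under these assumptions, which gives a baseline to compare against after any change to the transfer path.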