I am trying to implement a frame-copy kernel. I have a pointer to an image that I have to copy to the location given by a destination pointer. I could implement this on the CPU, which would give me the best performance, but because of power requirements I am doing it on the GPU.
CPU time: 2 ms, GPU time: 24 ms
Please review this GPU code and help me optimize it.
// Create buffer
// Repeat the code below to create buffer variables for source and destination
mem_flag |= CL_MEM_USE_HOST_PTR;
buffers = clCreateBuffer(context_, mem_flag, mem_size, host_ptr, &error_code);

size_t global_size[2] = { (size_t) dst_w / 8, (size_t) dst_h };
int ret = clEnqueueNDRangeKernel(queue_, kernel, 2, NULL, global_size, NULL, 0, NULL, &event_kernel);
clFinish(queue_);

// Kernel code
// buf_src_y:  buffer pointer to source image
// buf_dst_y:  buffer pointer to destination image
// buf_src_uv: buf_src + src_uv_offset
// buf_dst_uv: buf_dst + dst_uv_offset
int x = get_global_id(0) * 8;
int y = get_global_id(1);
int src_pos = mad24(y, src_stride, x);
int dst_pos = mad24(y, dst_stride, x);
vstore8(vload8(0, buf_src_y + src_pos), 0, buf_dst_y + dst_pos);
if (y < dst_uv_h) {
    vstore8(vload8(0, buf_src_uv + src_pos), 0, buf_dst_uv + dst_pos);
}
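For comparison, the copy that the kernel performs can be sketched on the CPU as a plain row-by-row memcpy. This is only a minimal reference sketch, not the actual CPU implementation behind the 2 ms figure. It assumes 8-bit samples and a semi-planar layout in which the UV plane uses the same stride as the Y plane, as the kernel's reuse of src_pos and dst_pos suggests. The helper names copy_plane and copy_frame_ref are hypothetical, and the parameters mirror the names used in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one plane row by row, honouring separate source and destination strides.
 * Only row_bytes bytes per row are written; stride padding is left untouched. */
static void copy_plane(const uint8_t *src, size_t src_stride,
                       uint8_t *dst, size_t dst_stride,
                       size_t row_bytes, size_t rows)
{
    assert(row_bytes <= src_stride && row_bytes <= dst_stride);
    for (size_t y = 0; y < rows; ++y) {
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
    }
}

/* Hypothetical reference: copy the Y plane, then the UV plane.
 * Assumption: the UV plane shares the Y plane's stride and starts at the
 * given byte offsets (src_uv_offset, dst_uv_offset). */
static void copy_frame_ref(const uint8_t *src, size_t src_stride, size_t src_uv_offset,
                           uint8_t *dst, size_t dst_stride, size_t dst_uv_offset,
                           size_t width, size_t height, size_t uv_height)
{
    copy_plane(src, src_stride, dst, dst_stride, width, height);
    copy_plane(src + src_uv_offset, src_stride,
               dst + dst_uv_offset, dst_stride, width, uv_height);
}
```

A reference like this can also be used to check the GPU output byte for byte. Note that with a global size of dst_w / 8 the kernel skips the last columns whenever dst_w is not a multiple of 8, while the sketch above copies every row in full.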
Hello. Your code is incomplete, so we cannot help you yet. Once you post the complete code, you will receive more comments.