Copy frame taking more time on Mali GPU

I am trying to implement a frame-copy kernel. I have a pointer to a source image that must be copied to the location given by a destination pointer. Doing this on the CPU gives the best performance, but because of power requirements I am doing it on the GPU.

CPU time: 2 ms
GPU time: 24 ms

Please review this GPU code and help me optimize it.

// Create buffers

// Repeat the code below to create the buffer variables for the source and the destination.

mem_flag |= CL_MEM_USE_HOST_PTR;
buffers = clCreateBuffer(context_, mem_flag, mem_size, host_ptr, &error_code);

// Each work-item copies 8 elements in x, so the x dimension is dst_w / 8.
size_t global_size[2] = { (size_t) dst_w / 8, (size_t) dst_h };

int ret = clEnqueueNDRangeKernel(queue_, kernel, 2, NULL, global_size, NULL, 0, NULL, &event_kernel);
clFinish(queue_);
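One detail of the launch above is worth checking on the host: `dst_w / 8` is an integer division, so when the width is not a multiple of 8 the rightmost columns are never copied. The sketch below is plain C with no OpenCL calls. The struct `x_coverage` and the function `coverage_for_width` are hypothetical names introduced only for illustration. It reports how many work-items the x dimension gets and how many columns stay uncovered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: how the x dimension is covered when each
   work-item copies 8 elements, as in the question. */
typedef struct {
    size_t items_x;    /* work-items launched in x (dst_w / 8) */
    size_t uncovered;  /* columns at the right edge no work-item touches */
} x_coverage;

static x_coverage coverage_for_width(size_t dst_w)
{
    x_coverage c = { dst_w / 8, dst_w % 8 };
    /* Sanity check: covered plus uncovered columns equal the width. */
    assert(c.items_x * 8 + c.uncovered == dst_w);
    return c;
}
```

For a width such as 1920 nothing is left over. For a width that is not a multiple of 8, the remainder would need separate handling, for example a scalar tail copy.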

// Kernel code

// buf_src_y: pointer to the source image buffer; buf_dst_y: pointer to the destination image buffer
// buf_src_uv: buf_src + src_uv_offset; buf_dst_uv: buf_dst + dst_uv_offset

// Each work-item handles 8 consecutive elements of one row.
int x = get_global_id(0) * 8;
int y = get_global_id(1);

// Element offset of (x, y) in each buffer: y * stride + x.
int src_pos = mad24(y, src_stride, x);
int dst_pos = mad24(y, dst_stride, x);
vstore8(vload8(0, buf_src_y + src_pos), 0, buf_dst_y + dst_pos);

// The UV plane has dst_uv_h rows, so only those rows copy chroma as well.
if (y < dst_uv_h) {
    vstore8(vload8(0, buf_src_uv + src_pos), 0, buf_dst_uv + dst_pos);
}
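For comparison, here is a minimal CPU sketch of the same strided copy, which can serve as a reference when checking the kernel's output. It assumes 8-bit samples and a UV plane that is `dst_w` elements wide, matching the x range the kernel uses for both planes. The element type is not shown in the post, so that is an assumption. The function names `copy_plane` and `copy_frame_ref` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one plane row by row, honouring separate source and destination
   strides. width_bytes is the number of payload bytes per row. */
static void copy_plane(const uint8_t *src, size_t src_stride,
                       uint8_t *dst, size_t dst_stride,
                       size_t width_bytes, size_t rows)
{
    assert(width_bytes <= src_stride && width_bytes <= dst_stride);
    for (size_t y = 0; y < rows; ++y) {
        memcpy(dst + y * dst_stride, src + y * src_stride, width_bytes);
    }
}

/* Hypothetical reference for the kernel: the Y plane has dst_h rows and
   the UV plane has dst_uv_h rows; both are dst_w bytes wide (assumed). */
static void copy_frame_ref(const uint8_t *src_y, const uint8_t *src_uv,
                           size_t src_stride,
                           uint8_t *dst_y, uint8_t *dst_uv,
                           size_t dst_stride,
                           size_t dst_w, size_t dst_h, size_t dst_uv_h)
{
    copy_plane(src_y, src_stride, dst_y, dst_stride, dst_w, dst_h);
    copy_plane(src_uv, src_stride, dst_uv, dst_stride, dst_w, dst_uv_h);
}
```

Bytes between `dst_w` and the stride are left untouched, which is the same behaviour as the kernel when `dst_w` is a multiple of 8.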

Top replies

Kévin Petit over 5 years ago +1 verified

Hi,

You're more likely to get useful help if you post the complete code. What makes you think the GPU is more power efficient when it comes to memory copies?

Regards,
Kévin

0 doithuong over 5 years ago

Hello.
Your code is not complete, so we cannot help you. Once you post the complete code, you will receive more comments.