Hi,
I am cropping an image and writing the result to a destination buffer, using a vector load and store of 8 uchars per work-item. Can someone help me optimize this kernel? Are any Mali-G72 GPU-specific changes required?
uchar* src_y : source pointer to Y data of the image
uchar* dst_y : destination pointer to Y data of the image
uchar* src_uv : source pointer to UV data of the image
uchar* dst_uv : destination pointer to UV data of the image
dst_uv_h = image height / 2 // number of UV rows, copied in the same pass as Y
global_size {dst_w,dst_h}; //destination width , destination height
int x = get_global_id(0) * 8;
int y = get_global_id(1);
int src_pos = mad24(y, src_stride, x); // y * src_stride + x: source offset
int dst_pos = mad24(y, dst_stride, x); // y * dst_stride + x: destination offset
vstore8(vload8(0, src_y + src_pos), 0, dst_y + dst_pos); // copy 8 bytes of Y
if (y < dst_uv_h) {
    vstore8(vload8(0, src_uv + src_pos), 0, dst_uv + dst_pos); // copy UV part of image
}
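For anyone who wants to check the output of an optimized version, here is a minimal CPU sketch in plain C of what each work-item of the kernel above appears to do. It is only a reference for comparison, not a tuned implementation. The function names crop_work_item and crop_reference are made up for illustration. It assumes that src_y and src_uv already point at the crop origin, that dst_w is a multiple of 8, and that dimension 0 of the global size is dst_w / 8 (since each work-item handles 8 pixels).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emulates one work-item: copies 8 bytes of Y, and 8 bytes of UV
   when the row index is inside the UV plane (gid_y < dst_uv_h). */
static void crop_work_item(const uint8_t *src_y, uint8_t *dst_y,
                           const uint8_t *src_uv, uint8_t *dst_uv,
                           int src_stride, int dst_stride, int dst_uv_h,
                           int gid_x, int gid_y)
{
    int x = gid_x * 8;
    int src_pos = gid_y * src_stride + x; /* same as mad24(y, src_stride, x) */
    int dst_pos = gid_y * dst_stride + x; /* same as mad24(y, dst_stride, x) */

    memcpy(dst_y + dst_pos, src_y + src_pos, 8);
    if (gid_y < dst_uv_h) {
        memcpy(dst_uv + dst_pos, src_uv + src_pos, 8);
    }
}

/* Runs every work-item sequentially.
   ASSUMPTION: the launch grid is (dst_w / 8) x dst_h. */
static void crop_reference(const uint8_t *src_y, uint8_t *dst_y,
                           const uint8_t *src_uv, uint8_t *dst_uv,
                           int src_stride, int dst_stride,
                           int dst_w, int dst_h, int dst_uv_h)
{
    for (int gy = 0; gy < dst_h; ++gy) {
        for (int gx = 0; gx < dst_w / 8; ++gx) {
            crop_work_item(src_y, dst_y, src_uv, dst_uv,
                           src_stride, dst_stride, dst_uv_h, gx, gy);
        }
    }
}
```

Running crop_reference on the same input buffers and comparing its dst_y and dst_uv with the buffers read back from the GPU (for example with memcmp) is one way to confirm that an optimization did not change the result.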