Efficient way to load and store memory on Mali-G72

Hi,

I am cropping an image and writing the result to a destination buffer, using vector loads and stores of 8 uchars (vload8/vstore8). Can someone help me optimize this kernel? Are any Mali-G72-specific changes required?

uchar* src_y : source pointer to Y data of the image

uchar* dst_y : destination pointer to Y data of the image

uchar* src_uv : source pointer to UV data of the image

uchar* dst_uv : destination pointer to UV data of the image

dst_uv_h = dst_h / 2 // height of the UV plane (4:2:0 has half as many chroma rows); UV is copied by the same work-items as Y

global_size = {dst_w / 8, dst_h}; // each work-item copies 8 bytes (x = get_global_id(0) * 8), so dimension 0 is dst_w / 8; assumes dst_w is a multiple of 8

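// kernel name and argument order are illustrative
__kernel void crop_yuv(__global const uchar *src_y, __global uchar *dst_y,
                       __global const uchar *src_uv, __global uchar *dst_uv,
                       int src_stride, int dst_stride, int dst_uv_h)
{
    int x = get_global_id(0) * 8; // each work-item copies 8 bytes
    int y = get_global_id(1);

    int src_pos = mad24(y, src_stride, x); // y * src_stride + x: source offset
    int dst_pos = mad24(y, dst_stride, x); // y * dst_stride + x: destination offset
    vstore8(vload8(0, src_y + src_pos), 0, dst_y + dst_pos); // copy 8 Y bytes

    if (y < dst_uv_h) {
        vstore8(vload8(0, src_uv + src_pos), 0, dst_uv + dst_pos); // copy UV part of image
    }
}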
