for (y = 0; y < height; y++) {
    // a0 to a10 are 16x4 vectors; width is a 32-bit integer
    a0  = vld1_u16(&p[width*0]);
    a1  = vld1_u16(&p[width*1]);
    a2  = vld1_u16(&p[width*2]);
    a3  = vld1_u16(&p[width*3]);
    a4  = vld1_u16(&p[width*4]);
    a5  = vld1_u16(&p[width*5]);
    a6  = vld1_u16(&p[width*6]);
    a7  = vld1_u16(&p[width*7]);
    a8  = vld1_u16(&p[width*8]);
    a9  = vld1_u16(&p[width*9]);
    a10 = vld1_u16(&p[width*10]);
    for (x = 0; x < width1; x++) {
        p = p + 4;
    }
}
All the load instructions from a0 to a9 load correctly inside the y loop. The load of a10 is the problem: it loads zeros into the a10 vector. The load of a10 works correctly if it is placed inside the x for loop.
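One way to narrow this down is to compare each vector load against plain scalar reads of the same memory. The following is only a minimal sketch, not a diagnosis: the names first_bad_row, fill_pattern, ROWS and LANES are hypothetical and not part of the original code, and it assumes the buffer really holds at least 11 rows of width elements starting at p. On targets without NEON it falls back to memcpy as a stand-in, so it only exercises the real vld1_u16 path when built for an Arm target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

enum { ROWS = 11, LANES = 4 }; /* hypothetical names: rows a0..a10, 4 lanes each */

/* Fill a width x ROWS buffer with non-zero values so an all-zero load stands out. */
void fill_pattern(uint16_t *buf, int32_t width)
{
    for (int32_t i = 0; i < width * ROWS; i++)
        buf[i] = (uint16_t)(i + 1);
}

/* Load rows 0..10 at column offset col and compare every lane with scalar reads.
   Returns the index of the first row whose load differs, or -1 if all rows match. */
int first_bad_row(const uint16_t *p, int32_t width, int32_t col)
{
    assert(col >= 0 && width >= col + LANES); /* the 4-lane load must stay inside the row */

    for (int r = 0; r < ROWS; r++) {
        const uint16_t *src = &p[(size_t)width * (size_t)r + (size_t)col];
        uint16_t lanes[LANES];
#if HAVE_NEON
        uint16x4_t v = vld1_u16(src); /* same intrinsic as in the loop above */
        vst1_u16(lanes, v);
#else
        memcpy(lanes, src, sizeof lanes); /* portable stand-in, assumption for non-Arm hosts */
#endif
        if (memcmp(lanes, src, sizeof lanes) != 0)
            return r;
    }
    return -1;
}
```

Calling fill_pattern on a buffer of width * 11 elements and then first_bad_row(buf, width, 0) should return -1 if all eleven loads behave. If it reports row 10 only at a particular optimization level, that detail is worth including together with the toolchain version.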
Hi Lohith,
Can you provide more details on which toolchain you are using? Which target hardware are you using?
Regards,
Ashok