    for(y=0;y<height;y++)
    {
        // a0 - a10 = 16x4 vector and width is 32 bit integer
        a0 = vld1_u16 (&p[width*0]);
        a1 = vld1_u16 (&p[width*1]);
        a2 = vld1_u16 (&p[width*2]);
        a3 = vld1_u16 (&p[width*3]);
        a4 = vld1_u16 (&p[width*4]);
        a5 = vld1_u16 (&p[width*5]);
        a6 = vld1_u16 (&p[width*6]);
        a7 = vld1_u16 (&p[width*7]);
        a8 = vld1_u16 (&p[width*8]);
        a9 = vld1_u16 (&p[width*9]);
        a10 = vld1_u16 (&p[width*10]);
        for(x=0;x<width1;x++)
        {
            p=p+4;
        }
    }
All the load instructions from a0 to a9 load correctly inside the y loop. The load of a10, however, causes an issue: it loads zeros into the a10 vector. The a10 load does work if it is placed inside the x for loop.
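One thing that may be worth ruling out is whether the eleventh row load still falls inside the image buffer. Below is a minimal portable sketch of such a check. It is not the original code: `load4_u16` is a hypothetical stand-in for `vld1_u16` so the sketch builds without NEON, and `window_in_bounds`, `load_window`, `pos` and `total` are names invented for illustration. It assumes the buffer holds `total` contiguous 16-bit elements and that `pos` is the element offset of `p` from the start of the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_ROWS 11 /* rows a0..a10 in the snippet above */
#define LANES 4        /* vld1_u16 loads four 16-bit lanes */

/* Hypothetical portable stand-in for vld1_u16: copies four uint16_t values. */
static void load4_u16(uint16_t dst[LANES], const uint16_t *src)
{
    memcpy(dst, src, LANES * sizeof(uint16_t));
}

/* Returns 1 if all 11 row loads starting at element offset pos stay inside
 * a buffer of total elements, otherwise 0. */
static int window_in_bounds(size_t pos, size_t width, size_t total)
{
    /* One past the last element read by the a10 load. */
    size_t end = pos + width * (WINDOW_ROWS - 1) + LANES;
    return end <= total;
}

/* Loads the 11-row window at offset pos into rows.
 * Returns 0 on success, -1 if any load would run past the buffer. */
static int load_window(uint16_t rows[WINDOW_ROWS][LANES], const uint16_t *buf,
                       size_t pos, size_t width, size_t total)
{
    if (!window_in_bounds(pos, width, total))
        return -1;
    for (size_t r = 0; r < WINDOW_ROWS; r++)
        load4_u16(rows[r], buf + pos + width * r);
    return 0;
}
```

If `load_window` returns -1 for some position the loop reaches, the a10 load is reading outside the buffer there, and the loop bounds (or the buffer size) would need adjusting. If it never does, the cause lies elsewhere.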