Facing an issue while prefetching the data

for (y = 0; y < height; y++)
{
    /* a0..a10 are uint16x4_t vectors; width is a 32-bit integer */
    a0  = vld1_u16(&p[width * 0]);
    a1  = vld1_u16(&p[width * 1]);
    a2  = vld1_u16(&p[width * 2]);
    a3  = vld1_u16(&p[width * 3]);
    a4  = vld1_u16(&p[width * 4]);
    a5  = vld1_u16(&p[width * 5]);
    a6  = vld1_u16(&p[width * 6]);
    a7  = vld1_u16(&p[width * 7]);
    a8  = vld1_u16(&p[width * 8]);
    a9  = vld1_u16(&p[width * 9]);
    a10 = vld1_u16(&p[width * 10]);
    for (x = 0; x < width1; x++)
    {
        p = p + 4;   /* advance by one 4-lane vector */
    }
}

All the loads from a0 to a9 work correctly inside the y loop, but the load of a10 causes an issue: it loads zeros into the a10 vector.
The a10 load works correctly if it is placed inside the x for loop. The cross compiler used is Linaro GCC. Please help me resolve this issue.
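
A minimal sketch for checking the row-10 load in isolation might look like the following. It is only an illustration under stated assumptions: the helper names load_in_bounds and load_row_u16x4 are hypothetical, the buffer is assumed to be one contiguous array of uint16_t, and a plain memcpy stands in for vld1_u16 when the code is built without NEON so the same check can also run on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns 1 if the 4 lanes starting at element
   `offset` lie fully inside a buffer of `total_elems` uint16_t values. */
static int load_in_bounds(size_t total_elems, size_t offset)
{
    return total_elems >= 4 && offset <= total_elems - 4;
}

/* Hypothetical helper: copy the 4 lanes at p[width * row] into out[4].
   Uses vld1_u16/vst1_u16 on NEON targets, memcpy otherwise. */
static void load_row_u16x4(const uint16_t *p, int32_t width, int row,
                           uint16_t out[4])
{
    const uint16_t *src;

    assert(p != NULL && out != NULL);
    src = p + (size_t)width * (size_t)row;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1_u16(out, vld1_u16(src));
#else
    memcpy(out, src, 4 * sizeof(uint16_t));
#endif
}
```

Calling load_in_bounds with the total element count and the offset of &p[width*10] from the start of the buffer shows whether the 11th row is still inside the allocation at a given y, and load_row_u16x4 lets the four lanes be printed or compared without relying on the debugger's view of a10.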

Thank you 
