for(y=0;y<height;y++)
{
    a0 = vld1_u16(&p[width*0]);   // a0 - a10 are 16x4 vectors and width is a 32-bit integer
    a1 = vld1_u16(&p[width*1]);
    a2 = vld1_u16(&p[width*2]);
    a3 = vld1_u16(&p[width*3]);
    a4 = vld1_u16(&p[width*4]);
    a5 = vld1_u16(&p[width*5]);
    a6 = vld1_u16(&p[width*6]);
    a7 = vld1_u16(&p[width*7]);
    a8 = vld1_u16(&p[width*8]);
    a9 = vld1_u16(&p[width*9]);
    a10 = vld1_u16(&p[width*10]);
    for(x=0;x<width1;x++)
    {
        p=p+4;
    }
}
All the load instructions from a0 to a9 load properly inside the y loop, but the load of a10 causes an issue: it loads zeros into the a10 vector. The load of a10 works correctly if it is placed inside the x for loop instead. The cross compiler used is Linaro GCC. Please help me resolve this issue.
Thank you
Hi Lohith,

Can you share which Linaro cross-toolchain you are using? If you are able to use a later toolchain, I would strongly recommend using the Arm-released GNU 8 based toolchain. You can download it from https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-a/downloads

Regards,
Ashok
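
While comparing toolchains, it may also help to reduce the problem to a small standalone check that can be built with both the old and the new compiler. The following is only a sketch under stated assumptions: the names `load_row`, `store_row`, `fill_test_image`, `check_row_loads`, `ROWS` and `WIDTH` are hypothetical and not taken from the original code, and the buffer is assumed to hold at least 11 full rows starting at `p`. It loads four lanes from each of the 11 rows with `vld1_u16`, stores them back with `vst1_u16`, and counts lanes that differ from memory. On a non-Arm host a plain `memcpy` stand-in is used so that the same file still builds.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
typedef uint16x4_t row_vec;
/* Real NEON path: 64-bit load and store of four 16-bit lanes. */
static row_vec load_row(const uint16_t *src) { return vld1_u16(src); }
static void store_row(uint16_t *dst, row_vec v) { vst1_u16(dst, v); }
#else
/* Portable stand-in (assumption: only used when NEON is unavailable). */
typedef struct { uint16_t lane[4]; } row_vec;
static row_vec load_row(const uint16_t *src)
{
    row_vec v;
    memcpy(v.lane, src, sizeof v.lane);
    return v;
}
static void store_row(uint16_t *dst, row_vec v)
{
    memcpy(dst, v.lane, sizeof v.lane);
}
#endif

/* Hypothetical test dimensions: 11 rows (a0..a10), 8 pixels per row. */
enum { ROWS = 11, WIDTH = 8 };

/* Fill the buffer with non-zero values so an all-zero vector is easy to spot. */
static void fill_test_image(uint16_t *image, int count)
{
    for (int i = 0; i < count; i++) {
        image[i] = (uint16_t)(i + 1);
    }
}

/* Load 4 lanes from each of ROWS rows and return the number of lanes
 * that do not match the values stored in memory. */
static int check_row_loads(const uint16_t *p, int width)
{
    int mismatches = 0;
    for (int r = 0; r < ROWS; r++) {
        uint16_t lanes[4];
        store_row(lanes, load_row(&p[width * r]));
        for (int i = 0; i < 4; i++) {
            if (lanes[i] != p[width * r + i]) {
                printf("row %d lane %d: got %u, expected %u\n",
                       r, i, (unsigned)lanes[i], (unsigned)p[width * r + i]);
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Calling `fill_test_image(image, ROWS * WIDTH)` on a `uint16_t image[ROWS * WIDTH]` buffer and then `check_row_loads(image, WIDTH)` should return 0 when every row, including row 10, is loaded correctly. If the check passes but the original loop still shows zeros in a10, it may be worth confirming that `p + width*10` still lies inside the image buffer for every value of y.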