for (y = 0; y < height; y++) {
    // a0 - a10 = 16x4 vector and width is 32 bit integer
    a0  = vld1_u16(&p[width*0]);
    a1  = vld1_u16(&p[width*1]);
    a2  = vld1_u16(&p[width*2]);
    a3  = vld1_u16(&p[width*3]);
    a4  = vld1_u16(&p[width*4]);
    a5  = vld1_u16(&p[width*5]);
    a6  = vld1_u16(&p[width*6]);
    a7  = vld1_u16(&p[width*7]);
    a8  = vld1_u16(&p[width*8]);
    a9  = vld1_u16(&p[width*9]);
    a10 = vld1_u16(&p[width*10]);
    for (x = 0; x < width1; x++) {
        p = p + 4;
    }
}
All the load instructions from a0 to a9 load correctly inside the y loop, but the load of a10 causes an issue: it loads zeros into the a10 vector. The load of a10 works if it is placed inside the x for loop instead. The cross compiler used is Linaro GCC. Please help me to resolve this issue.
Thank you
Hi Lohith,
Can you share which Linaro cross-toolchain you are using? If you are able to use a later toolchain, I would strongly recommend using the Arm-released GNU 8 based toolchain. You can download it from:
https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-a/downloads
Regards,
Ashok
Hi Lohith,
If the newer toolchain that Ashok suggested doesn't fix it, could you post a full reproducer, i.e. one that we can compile and that shows the problem you're describing? There's currently not enough information to be able to give any useful help.
Please also include the flags used to compile.
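As a rough idea of the shape such a reproducer could take, here is a minimal sketch. The buffer sizes, the fill pattern and the function name count_bad_lanes are all made up for illustration and are not taken from your code. The non-NEON branch is only a stand-in so the file also builds on a host machine. It fills a buffer with non-zero values, does the same kind of vld1_u16 load from row 10, and counts the lanes that come back wrong.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/* Host-only stand-ins (an assumption of this sketch) so it builds off-target. */
typedef struct { uint16_t lane[4]; } uint16x4_t;
static uint16x4_t vld1_u16(const uint16_t *ptr) {
    uint16x4_t r;
    memcpy(r.lane, ptr, sizeof r.lane);
    return r;
}
static void vst1_u16(uint16_t *ptr, uint16x4_t val) {
    memcpy(ptr, val.lane, sizeof val.lane);
}
#endif

/* Hypothetical sizes: 11 rows so that row 10 exists, 8 elements per row. */
enum { ROWS = 11, WIDTH = 8, LANES = 4 };

/* Loads 4 lanes from the start of row `row` and returns how many lanes
   differ from the known fill pattern (0 means the load was correct). */
int count_bad_lanes(int row) {
    static uint16_t buf[ROWS * WIDTH];
    uint16_t out[LANES];
    int width = WIDTH;   /* 32-bit integer, as in the original post */
    int bad = 0;

    assert(row >= 0 && row < ROWS && width >= LANES);

    /* Fill with non-zero values so an all-zero result is easy to spot. */
    for (int i = 0; i < ROWS * WIDTH; i++)
        buf[i] = (uint16_t)(i + 1);

    const uint16_t *p = buf;
    uint16x4_t a = vld1_u16(&p[width * row]);
    vst1_u16(out, a);

    for (int i = 0; i < LANES; i++)
        if (out[i] != (uint16_t)(width * row + i + 1))
            bad++;
    return bad;
}
```

Calling count_bad_lanes(10) from a small main and printing the result, built with exactly the same compiler and flags as the failing code, would show whether the load itself is at fault. If it is not, the next things to check are how p, width and the buffer size are set up in the real code.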
Thanks,
Tamar