There are two development articles that mention GCC can do this:
Introducing NEON
NEON Support in Compilation Tools
But when I tested the code snippets from these documents with the GCC compiling options they describe, the generated assembly code doesn't use any NEON instructions.
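One quick way to check whether a given set of compile options actually turns NEON on is to test the compiler's predefined macros. The sketch below assumes GCC targeting ARMv7-A, where __ARM_NEON__ is defined only when NEON code generation is enabled (newer compilers also define the ACLE macro __ARM_NEON). The function name neon_enabled and the example command line are illustrative assumptions, not taken from the articles.

```c
/* Hypothetical helper: reports whether this translation unit was compiled
 * with NEON enabled. On ARMv7-A GCC this typically needs something like
 *   gcc -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp -S file.c
 * (exact flags are an assumption; check your toolchain). -O3 also turns on
 * -ftree-vectorize, which the auto-vectorizer needs. */
int neon_enabled(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;   /* NEON is enabled: the compiler may emit NEON instructions */
#else
    return 0;   /* NEON is not enabled: the compiler cannot emit NEON code */
#endif
}
```

If this returns 0 for your build, no amount of loop tuning will produce NEON instructions; fix the target flags first.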
Glad you got it working =)
A loop-count hint like (len & ~3) is not required,
In library code, where "len" cannot be determined at compile time, this helps the compiler avoid unneeded code. Normally it will still vectorize, but it will need to include additional code to handle any left-over elements. For example, if you pass in a list of length 17, it will run the vectorized part 4 times to handle 4x4 elements, and then need some scalar code to handle the one element left over. The mask is a means by which you can "promise" the compiler that the application guarantees the list length is a multiple of 4, so the vectorizer knows that it doesn't need to generate code for the left-over parts that are not vectorizable (because there can never be any). If the length is not in fact a multiple of 4, the masked loop simply skips the last len % 4 elements.
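The loop structure described above can be sketched in plain C: a 4-wide main body followed by a scalar epilogue. This is only an illustration of the shape of the generated code, not GCC's actual output; the lanes array stands in for one 128-bit NEON register, and the names accum_result and accumulate_split are hypothetical.

```c
/* Result of the split loop, with counters so the two parts are visible. */
typedef struct {
    int sum;
    int vector_iters;  /* passes through the 4-wide main body */
    int scalar_iters;  /* left-over elements handled one at a time */
} accum_result;

/* Sketch of "vector body + scalar tail" for an unknown len (len >= 0). */
accum_result accumulate_split(const int *c, int len)
{
    accum_result r = {0, 0, 0};
    int lanes[4] = {0, 0, 0, 0};   /* stands in for one 4 x int32 NEON register */
    int main_len = len & ~3;       /* largest multiple of 4 that is <= len */
    int i;

    /* Main body: four elements per iteration, like one vector add. */
    for (i = 0; i < main_len; i += 4) {
        lanes[0] += c[i];
        lanes[1] += c[i + 1];
        lanes[2] += c[i + 2];
        lanes[3] += c[i + 3];
        r.vector_iters++;
    }
    /* Horizontal reduction of the four lanes into one scalar. */
    r.sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar epilogue: the code the (len & ~3) mask lets the compiler drop. */
    for (; i < len; i++) {
        r.sum += c[i];
        r.scalar_iters++;
    }
    return r;
}
```

With a 17-element array this makes 4 passes through the 4-wide body and 1 scalar iteration, matching the example of length 17.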
HTH, Pete
That should be the case, as you said, but in practice the compiled output is almost the same either way.
1) Without the 'len & ~3' hint
int accumulate(int * __attribute__ ((aligned (16))) c, int len)
{
    int i, retval;
    for (i = 0, retval = 0; i < len; i++) {
        retval += c[i];
    }
    return retval;
}
Compiling output:
accumulate:
[assembly instructions not preserved; only the local labels .L3, .L5, .L29, .L13, .L7, .L2, .L28 and .L14 survive]
2) With the 'len & ~3' hint
for(i=0, retval = 0; i < (len & ~3) ; i++) {
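Only the changed loop header is shown above. Assuming the rest of the function is identical to version 1), the complete hinted variant would look roughly like this. The name accumulate_hint is hypothetical, and the aligned attribute is left out so the sketch builds with compilers other than GCC.

```c
/* Version 2), completed on the assumption that only the loop condition
 * differs from version 1). */
int accumulate_hint(const int *c, int len)
{
    int i, retval;
    /* len & ~3 rounds len down to a multiple of 4, so the vectorizer needs
     * no scalar tail; up to 3 trailing elements are simply ignored. */
    for (i = 0, retval = 0; i < (len & ~3); i++) {
        retval += c[i];
    }
    return retval;
}
```

Note that the two versions only agree when len is a multiple of 4; for len == 17 the hinted version sums 16 elements.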
Compiling result:
The only difference is that 'len & ~3' compiled into a single instruction, 'bic r1, r1, #3' (bit clear: r1 = r1 AND NOT 3); nothing else changed.
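For reference, the BIC ("bit clear") instruction computes its first operand with every bit set in the second operand cleared, which is exactly the C expression len & ~3; for non-negative values this equals len - len % 4. A minimal C model of the instruction follows; the function name bic is our own, hypothetical choice.

```c
/* Hypothetical C model of ARM BIC: clear in op1 every bit that is set in op2.
 * bic(len, 3) matches 'bic r1, r1, #3' with len held in r1. */
unsigned bic(unsigned op1, unsigned op2)
{
    return op1 & ~op2;
}
```

For example, bic(17, 3) clears the two low bits of 0b10001 and yields 16.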