...worked after using volatile before all variables...
this "clipping" takes up almost half of the loop's execution time
// Move constants of zero and maxVal into Neon registers
VMOV.I16 d0,#0
VMOV.I16 d1,#maxVal
...
// Perform clipping
VMAX.S16 d4,d4,d0   // Choose largest of zero and value
VMIN.S16 d4,d4,d1   // Choose smallest of new value and maxVal
...
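For comparison, the same clamp can be sketched in C with NEON intrinsics. This is only a rough illustration, not the code from the thread: the function name `clip_s16` and the parameter `max_val` are made up here, the data is assumed to be signed 16-bit, and a scalar fallback is included so the file still builds when `__ARM_NEON` is not defined. It works on four lanes at a time to mirror the D-register form above.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: clamp every element of buf to the range [0, max_val]. */
void clip_s16(int16_t *buf, size_t n, int16_t max_val)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Broadcast the two constants once, outside the loop (like VMOV.I16). */
    const int16x4_t zero  = vdup_n_s16(0);
    const int16x4_t limit = vdup_n_s16(max_val);
    for (; i + 4 <= n; i += 4) {
        int16x4_t v = vld1_s16(buf + i);
        v = vmax_s16(v, zero);   /* larger of zero and value (VMAX.S16)    */
        v = vmin_s16(v, limit);  /* smaller of result and max_val (VMIN.S16) */
        vst1_s16(buf + i, v);
    }
#endif
    /* Scalar path: handles the tail, or everything when NEON is unavailable. */
    for (; i < n; ++i) {
        int16_t v = buf[i];
        if (v < 0) v = 0;
        if (v > max_val) v = max_val;
        buf[i] = v;
    }
}
```

With intrinsics the compiler handles register allocation and scheduling, so whether this ends up faster than hand-written asm would need to be measured on the actual target.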
Would writing the program in intrinsics rather than asm be better (or at least allow the whole program to work)?
Okay and the second half of D0[0] can be utilized by doing {snip}. Right?