...worked after using volatile before all variables...
This "clipping" takes up almost half of the loop's execution time.
// Move constants of zero and maxVal into Neon registers
VMOV.I16 d0,#0
VMOV.I16 d1,#maxVal
...
// Perform clipping
VMAX.S16 d4,d4,d0 // Choose largest of zero and value
VMIN.S16 d4,d4,d1 // Choose smallest of new value and maxVal
...
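
For anyone who would rather stay in C than write the assembly by hand, the same max-then-min clamp can probably be expressed with NEON intrinsics. This is only a sketch under some assumptions: the samples are signed 16-bit, the clamp range is [0, max_val], and `clip_s16` is a hypothetical helper name, not something from the original loop. It uses the 128-bit q registers (8 lanes at a time) rather than the d registers shown above, and falls back to a plain scalar loop when NEON is not available or for the leftover tail.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: clamp each int16 sample to [0, max_val] in place. */
static void clip_s16(int16_t *data, size_t n, int16_t max_val)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Broadcast the two bounds once, outside the loop. */
    const int16x8_t vzero = vdupq_n_s16(0);
    const int16x8_t vmax  = vdupq_n_s16(max_val);

    /* Process 8 samples per iteration. */
    for (; i + 8 <= n; i += 8) {
        int16x8_t v = vld1q_s16(data + i);
        v = vmaxq_s16(v, vzero);  /* largest of zero and value (VMAX.S16) */
        v = vminq_s16(v, vmax);   /* smallest of that and max_val (VMIN.S16) */
        vst1q_s16(data + i, v);
    }
#endif

    /* Scalar tail; this is also the whole loop on non-NEON builds. */
    for (; i < n; ++i) {
        int16_t v = data[i];
        if (v < 0) v = 0;
        if (v > max_val) v = max_val;
        data[i] = v;
    }
}
```

Calling `clip_s16(buf, count, 255)` clamps every element of `buf` to 0..255. Whether this actually beats the existing loop depends on the compiler and the surrounding code, so it is worth timing both.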