Why do you believe this to be less efficient?
Isn't that exactly what the VBIT instruction does, except that it does so in a manner which (1) is generic and fairly flexible, so you can use it for other things, and (2) doesn't need a load of extra special logic just for this specific use? The only downside of the current NEON approach is that you need one extra register to store the condition pattern, but this is rarely an issue in most algorithms.
Is my sample too simple, or is it always possible to evaluate a conditional expression without any branch (using only conditional instructions)?
; Q0 = Q0 + Q1
VADD.U16 q0, q0, q1
; if (Q0 >= Q2) Q0 = Q3
VCGE.U16 q4, q0, q2
VBIT q0, q3, q4
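For readers who want to see what the compare-and-insert sequence computes per lane, here is a rough scalar model in portable C. It is only a sketch: it does not use NEON intrinsics, and the names `vcge_u16_lane`, `vbit_lane`, `add_then_select` and `example_usage` are made up for illustration. It assumes eight 16-bit lanes per Q register and that the addition wraps modulo 2^16.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* assumption: a Q register viewed as eight 16-bit lanes */

/* Model of one VCGE.U16 lane: all ones if a >= b, all zeros otherwise. */
static uint16_t vcge_u16_lane(uint16_t a, uint16_t b) {
    return (uint16_t)(0u - (unsigned)(a >= b));
}

/* Model of one VBIT lane: insert bits of src into dst wherever mask is 1. */
static uint16_t vbit_lane(uint16_t dst, uint16_t src, uint16_t mask) {
    return (uint16_t)((dst & (uint16_t)~mask) | (src & mask));
}

/* q0 = q0 + q1; if (q0 >= q2) q0 = q3, applied to every lane.
 * The condition becomes a mask, so no branch depends on the data. */
static void add_then_select(uint16_t q0[LANES], const uint16_t q1[LANES],
                            const uint16_t q2[LANES], const uint16_t q3[LANES]) {
    for (int i = 0; i < LANES; ++i) {
        uint16_t sum  = (uint16_t)(q0[i] + q1[i]);   /* VADD, wraps mod 2^16 */
        uint16_t mask = vcge_u16_lane(sum, q2[i]);   /* VCGE.U16 into "q4"   */
        q0[i] = vbit_lane(sum, q3[i], mask);         /* VBIT q0, q3, q4      */
    }
}

/* Hypothetical usage: lanes whose sum reaches the limit 15 are reset to 0. */
static void example_usage(void) {
    uint16_t q0[LANES] = {1, 2, 3, 4, 5, 6, 7, 8};
    const uint16_t q1[LANES] = {10, 10, 10, 10, 10, 10, 10, 10};
    const uint16_t q2[LANES] = {15, 15, 15, 15, 15, 15, 15, 15};
    const uint16_t q3[LANES] = {0, 0, 0, 0, 0, 0, 0, 0};
    add_then_select(q0, q1, q2, q3);
    assert(q0[3] == 14);  /* 14 < 15: sum kept      */
    assert(q0[4] == 0);   /* 15 >= 15: replaced     */
}
```

The mask lives in its own variable here, which mirrors the extra register (q4) the NEON version needs to hold the condition pattern.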
It seems less efficient than true conditional execution would be, but I don't see any other way.