Isn't that exactly what the VBIT instruction does? The difference is that VBIT does it in a way that (1) is generic and flexible enough to be used for other things, and (2) doesn't need a load of extra special-purpose logic just for this one use. The only downside of the current NEON approach is that you need one extra register to hold the condition pattern, but for most algorithms that isn't an issue.
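
For anyone not familiar with VBIT (bitwise insert if true), here is a rough scalar model in plain C of what it does to one 32-bit lane, plus a loop showing how the condition pattern comes into it. This is only a sketch: `vbit_model` and `select_greater` are names I made up for illustration, and the greater-than test is just an assumed example condition standing in for a NEON compare that leaves all-ones or all-zeros in each lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of VBIT on a single 32-bit lane (hypothetical helper):
 * each bit of src is inserted into dest where the matching mask bit is 1;
 * dest bits are left unchanged where the mask bit is 0. */
static inline uint32_t vbit_model(uint32_t dest, uint32_t src, uint32_t mask)
{
    return (dest & ~mask) | (src & mask);
}

/* Per-lane conditional update: out[i] = a[i] if a[i] > b[i], else unchanged.
 * The mask plays the role of the extra register holding the condition
 * pattern; the comparison is an assumed example condition. */
void select_greater(uint32_t *out, const uint32_t *a, const uint32_t *b,
                    size_t n)
{
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        /* All-ones when the condition holds, all-zeros otherwise. */
        uint32_t mask = (a[i] > b[i]) ? UINT32_MAX : 0u;
        out[i] = vbit_model(out[i], a[i], mask);
    }
}
```

Because the mask is an ordinary bit pattern rather than a dedicated flag, the same operation also works for partial masks, e.g. merging bit fields from two values, which is the "generic" part.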