NEON conditional execution

Note: This was originally posted on 17th February 2012 at http://forums.arm.com

Hi, I would like to perform the following operation simultaneously on all eight 16-bit lanes of the Q registers:

Q0 = Q0 + Q1
if (Q0 >= Q2) Q0 = Q3

I am not sure whether this is possible.
In normal ARM code I know it is, since MOV can be conditional, but I don't know about SIMD.
  • Note: This was originally posted on 21st February 2012 at http://forums.arm.com


    Isn't that exactly what the VBIT instruction does, except that (1) it is generic and flexible enough to be used for other things, and (2) it doesn't need a load of extra special-purpose logic just for this one use?

    The only downside of the current NEON approach is that you need one extra register to store the condition pattern, but this is rarely an issue in practice.
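As a rough sketch of the compare-then-select approach described above, the original operation can be written with NEON intrinsics as below. This assumes unsigned 16-bit lanes (the question does not say whether the lanes are signed), and the function name `add_then_select_u16x8` is made up for illustration. The intrinsic is `vbslq_u16`; the compiler is free to emit VBSL, VBIT or VBIF for it. A scalar fallback that mimics the mask-and-merge behaviour is included so the sketch also builds on non-ARM hosts.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/*
 * Per lane: q0 = q0 + q1; if (q0 >= q2) q0 = q3.
 * Assumption: lanes are unsigned 16-bit, addition wraps modulo 2^16.
 *
 * Roughly equivalent hand-written assembly (q4 holds the condition mask):
 *   VADD.I16 q0, q0, q1
 *   VCGE.U16 q4, q0, q2
 *   VBIT     q0, q3, q4
 */
void add_then_select_u16x8(uint16_t q0[8], const uint16_t q1[8],
                           const uint16_t q2[8], const uint16_t q3[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint16x8_t sum  = vaddq_u16(vld1q_u16(q0), vld1q_u16(q1));
    /* Each lane of mask is 0xFFFF where sum >= q2, else 0x0000. */
    uint16x8_t mask = vcgeq_u16(sum, vld1q_u16(q2));
    /* Take q3 where the mask is set, keep sum elsewhere. */
    vst1q_u16(q0, vbslq_u16(mask, vld1q_u16(q3), sum));
#else
    for (int i = 0; i < 8; i++) {
        uint16_t sum  = (uint16_t)(q0[i] + q1[i]);
        uint16_t mask = (sum >= q2[i]) ? 0xFFFFu : 0x0000u;
        /* Bitwise merge, the same thing VBIT/VBSL do per bit. */
        q0[i] = (uint16_t)((q3[i] & mask) | (sum & ~mask));
    }
#endif
}
```

The mask is the "one extra register" mentioned above: the compare writes it, and the bitwise insert consumes it.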


    In most cases, using VBIT and VBIF requires at least two extra registers: one to hold the condition pattern, and another to hold the result of the operation that you want to conditionally assign to the destination register. Of course, this gets worse when several conditional instructions depend on the same condition result.
    VBIT/VBIF do the job, but they are not the most flexible solution.
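To illustrate the register cost, here is a minimal sketch of a made-up example with a then-branch and an else-branch sharing one condition. The function name `cond_add_else_sub_u16x8` and the operation itself are hypothetical, and unsigned 16-bit lanes are assumed. Both results have to be computed unconditionally into temporaries before they can be merged, so the mask plus two temporaries are live at the same time.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/*
 * Hypothetical example, per lane:
 *   if (a >= limit) a = a + b;   (merged VBIT-style: insert where mask is set)
 *   else            c = c - b;   (merged VBIF-style: insert where mask is clear)
 */
void cond_add_else_sub_u16x8(uint16_t a[8], uint16_t c[8],
                             const uint16_t b[8], const uint16_t limit[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint16x8_t va = vld1q_u16(a), vb = vld1q_u16(b), vc = vld1q_u16(c);
    uint16x8_t mask = vcgeq_u16(va, vld1q_u16(limit)); /* extra register 1 */
    uint16x8_t sum  = vaddq_u16(va, vb);               /* extra register 2 */
    uint16x8_t diff = vsubq_u16(vc, vb);               /* extra register 3 */
    vst1q_u16(a, vbslq_u16(mask, sum, va));  /* a = mask ? sum : a  */
    vst1q_u16(c, vbslq_u16(mask, vc, diff)); /* c = mask ? c : diff */
#else
    for (int i = 0; i < 8; i++) {
        uint16_t mask = (a[i] >= limit[i]) ? 0xFFFFu : 0x0000u;
        uint16_t sum  = (uint16_t)(a[i] + b[i]);
        uint16_t diff = (uint16_t)(c[i] - b[i]);
        a[i] = (uint16_t)((sum & mask) | (a[i] & ~mask));
        c[i] = (uint16_t)((c[i] & mask) | (diff & ~mask));
    }
#endif
}
```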

    I am not saying NEON instructions need to be conditional on a 4-bit condition field (as many simple ARM instructions are).
    I really like the approach of the VCxx instructions, which produce what is effectively a one-bit flag per lane (all ones or all zeros); this has made it possible to add new compare instructions such as VACxx. Very nice.
    However, it would be better if they did not need a whole temporary register to store that one-bit flag, and instead stored it next to the corresponding lane.
    After that, yes, each NEON instruction would need a 2-bit predicate field: 00 = execute if flag = 0, 01 = execute if flag = 1, 1X = always execute.

    This approach would be more efficient, but you are right that it requires extra logic in the chip, and I can understand why it has not been done (yet).
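To make the proposal concrete, here is a small scalar model of the per-lane flag scheme described above. None of this exists in NEON hardware: the `pvec` type, the `pvec_*` functions and the `pred` encoding names are all invented for illustration, and unsigned 16-bit lanes are assumed. It only models the semantics of the 2-bit predicate (00, 01, 1X); it says nothing about how it would be implemented in silicon.

```c
#include <stdint.h>

#define LANES 8

/* Hypothetical vector register: each lane carries its own 1-bit flag. */
typedef struct {
    uint16_t lane[LANES];
    uint8_t  flag[LANES];
} pvec;

/* Hypothetical 2-bit predicate field: 00, 01, and 1X (modelled as 10). */
enum pred { PRED_IF_CLEAR = 0, PRED_IF_SET = 1, PRED_ALWAYS = 2 };

static int lane_enabled(enum pred p, uint8_t flag)
{
    if ((unsigned)p & 2u)
        return 1;                              /* 1X: always execute */
    return ((unsigned)p & 1u) == (flag & 1u);  /* 00 / 01: match the flag */
}

/* d = d + n in every enabled lane; disabled lanes are left unchanged. */
void pvec_add(pvec *d, const pvec *n, enum pred p)
{
    for (int i = 0; i < LANES; i++)
        if (lane_enabled(p, d->flag[i]))
            d->lane[i] = (uint16_t)(d->lane[i] + n->lane[i]);
}

/* d = n in every enabled lane. */
void pvec_mov(pvec *d, const pvec *n, enum pred p)
{
    for (int i = 0; i < LANES; i++)
        if (lane_enabled(p, d->flag[i]))
            d->lane[i] = n->lane[i];
}

/* Compare writes the per-lane flag instead of a separate mask register. */
void pvec_cmp_ge(pvec *d, const pvec *m)
{
    for (int i = 0; i < LANES; i++)
        d->flag[i] = (uint8_t)(d->lane[i] >= m->lane[i]);
}

/* The operation from the original question, with no temporary register. */
void add_then_select(pvec *q0, const pvec *q1, const pvec *q2, const pvec *q3)
{
    pvec_add(q0, q1, PRED_ALWAYS);
    pvec_cmp_ge(q0, q2);
    pvec_mov(q0, q3, PRED_IF_SET);
}
```

In this model the conditional move reads the flag stored with the destination lane, so neither a mask register nor a temporary for the result is needed.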