NEON conditional execution

Note: This was originally posted on 17th February 2012 at http://forums.arm.com

Hi, I would like to perform the following operation simultaneously on registers holding 8 x 16-bit values:

Q0 = Q0 + Q1
if (Q0 >= Q2) Q0 = Q3

I am not clear whether this is possible.
In normal (non-SIMD) code I know it is, since the MOV can be conditional, but in SIMD I don't know.
  • Note: This was originally posted on 17th February 2012 at http://forums.arm.com

    Yes, I understand what you say regarding continuous instruction flow.



    Why do you believe this to be less efficient?


    My example
    Q0 = Q0 + Q1
    if (Q0 >= Q2) Q0 = Q0 - Q3

    could be implemented without VBIT and without two extra registers: one to hold the temporary result of Q0 - Q3, and one to hold the result of VCGE (which really carries just one bit of information per lane, by the way) for use with VBIT or VBIF.

    So, really like in scalar (SISD) mode:

    add   r0, r0, r1
    cmp   r0, r2
    subge r0, r0, r3

    But for that, each lane would have to maintain its own set of flags (the result of a generic CMP instruction) and then conditionally execute the instruction depending on that lane's flags. Some DSPs do this in SIMD, like the Analog Devices ADSP-213xx family. That is the only SIMD DSP I have used, so I first assumed NEON did the same.
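To make the per-lane flag idea concrete, here is a minimal sketch in portable C that models it in software. This is not something NEON provides: the `vreg` and `lane_flags` types and the `lane_add`, `lane_cmp` and `lane_sub_if_ge` helpers are hypothetical names made up for illustration, and unsigned 16-bit lanes are an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8 /* eight 16-bit lanes, as in one Q register */

/* Hypothetical machine model: NEON itself has no per-lane flags. */
typedef struct { uint16_t lane[LANES]; } vreg;
typedef struct { bool ge[LANES]; } lane_flags;

/* d = a + b in every lane (wraps modulo 2^16). */
static void lane_add(vreg *d, const vreg *a, const vreg *b)
{
    for (int i = 0; i < LANES; ++i)
        d->lane[i] = (uint16_t)(a->lane[i] + b->lane[i]);
}

/* Per-lane compare: record a >= b in that lane's own flag. */
static void lane_cmp(lane_flags *f, const vreg *a, const vreg *b)
{
    for (int i = 0; i < LANES; ++i)
        f->ge[i] = a->lane[i] >= b->lane[i];
}

/* Predicated subtract: lanes whose flag is clear are left untouched. */
static void lane_sub_if_ge(vreg *d, const vreg *a, const vreg *b,
                           const lane_flags *f)
{
    for (int i = 0; i < LANES; ++i)
        if (f->ge[i])
            d->lane[i] = (uint16_t)(a->lane[i] - b->lane[i]);
}
```

Calling `lane_add`, `lane_cmp` and `lane_sub_if_ge` in that order mirrors the scalar add / cmp / subge sequence, with no temporary register for the difference or for a mask.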

    I wonder what the technical obstacle is to doing this in the chip, or at least to maintaining a single flag bit per lane (the result of a VCxx instruction) and preventing the instruction from executing (or affecting any registers) in a lane whose flag is 0.
    Looking at the NEON instruction encoding, the 4-bit condition field is not used yet; maybe a future feature?

    Anyway, I can see the possibilities with VBIT, with VBIF doing the "else" counterpart. That is already a good thing; it makes things possible.
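For reference, the compare-and-select approach with VCGE and VBIT can be sketched as follows. This is portable C that emulates the eight lanes one at a time, not real NEON code; the function name `add_then_wrap` is hypothetical, unsigned lanes (VCGE.U16) are an assumption, and the scratch registers q4 and q5 in the comment are arbitrary choices.

```c
#include <stdint.h>

#define LANES 8 /* eight 16-bit lanes, as in one Q register */

/* Hypothetical helper that models, lane by lane:
 *   VADD.I16 q0, q0, q1
 *   VCGE.U16 q4, q0, q2   ; lane = 0xFFFF where q0 >= q2, else 0x0000
 *   VSUB.I16 q5, q0, q3   ; computed for every lane, needed or not
 *   VBIT     q0, q5, q4   ; take q5 bits where the mask bit is 1
 */
static void add_then_wrap(uint16_t q0[LANES], const uint16_t q1[LANES],
                          const uint16_t q2[LANES], const uint16_t q3[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        uint16_t sum  = (uint16_t)(q0[i] + q1[i]);           /* VADD */
        uint16_t mask = (sum >= q2[i]) ? 0xFFFFu : 0x0000u;  /* VCGE */
        uint16_t diff = (uint16_t)(sum - q3[i]);             /* VSUB */
        q0[i] = (uint16_t)((sum & ~mask) | (diff & mask));   /* VBIT */
    }
}
```

The `mask` and `diff` temporaries correspond to the two extra registers mentioned above. The subtraction runs in every lane and the mask only decides which result is kept, so there is no branch and no per-lane flag.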