Implementation in NEON of non uniform address jumps

  • Note: This was originally posted on 17th July 2012 at http://forums.arm.com

    ...worked after using volatile before all variables...

    This probably isn't the correct solution. I suspect you just needed to tell GCC that your assembly code modifies variables in memory, which is what the "memory" clobber on an inline asm statement is for.
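As a rough illustration of that point, here is a minimal sketch of an inline asm block that declares its memory side effects instead of relying on volatile variables. The helper name clip4_inplace and the choice of registers d0, d1 and d4 are assumptions made for this example and are not taken from the original poster's code. On targets other than 32-bit ARM with Neon, a scalar fallback with the same behaviour is compiled instead.

```c
#include <stdint.h>

/* Hypothetical helper: clamps four int16_t values in place to [0, max_val]. */
static void clip4_inplace(int16_t *buf, int16_t max_val)
{
#if defined(__arm__) && defined(__ARM_NEON)
    /* The "memory" clobber tells GCC that the asm reads and writes memory it
     * cannot see through the operands, so values cached in registers are
     * reloaded afterwards. The d-register clobbers list what the asm
     * overwrites. */
    __asm__ volatile(
        "vmov.i16  d0, #0        \n\t"
        "vdup.16   d1, %[maxv]   \n\t"
        "vld1.16   {d4}, [%[p]]  \n\t"
        "vmax.s16  d4, d4, d0    \n\t"
        "vmin.s16  d4, d4, d1    \n\t"
        "vst1.16   {d4}, [%[p]]  \n\t"
        :
        : [p] "r"(buf), [maxv] "r"(max_val)
        : "d0", "d1", "d4", "memory");
#else
    /* Portable fallback so the sketch also builds on non-Neon targets. */
    for (int i = 0; i < 4; ++i) {
        int16_t v = buf[i];
        if (v < 0) v = 0;
        if (v > max_val) v = max_val;
        buf[i] = v;
    }
#endif
}
```

With the clobber list in place, the C variables around the asm statement should not need to be declared volatile.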

    this "clipping" takes up almost half the time of execution of the loop

    The likely problem here is that you are moving values back and forth between the Neon and main register files. On most implementations this carries a significant performance penalty, so it should be avoided wherever possible.

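    In this particular case you can perform the clipping using Neon instructions. If you are lucky with your choice of maxVal, you may be able to fold the preceding shift and the clipping into a single saturating narrowing shift such as VQSHRN (or VQSHRUN, which saturates a signed input to an unsigned range and so also clamps at zero). For example: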

      // Move constants of zero and maxVal into Neon registers
      VMOV.I16 d0,#0
      VMOV.I16 d1,#maxVal  // maxVal must be an encodable immediate; otherwise use VDUP.16 from a core register
      ...
      // Perform clipping
      VMAX.S16 d4,d4,d0  // Choose largest of zero and value
      VMIN.S16 d4,d4,d1  // Choose smallest of new value and maxVal
      ...
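The same idea can be sketched in C with Neon intrinsics, which keeps the values in the Neon register file for the whole loop. This is only an illustration under assumptions: the helper names clip_s16 and shift_clip_u8, the SHIFT value of 4, and the choice of maxVal = 255 for the combined case are made up for the example and do not come from the original code. A scalar path handles leftover elements and also lets the sketch build on machines without Neon.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Assumed shift amount for the example; the intrinsic needs a constant. */
#define SHIFT 4

/* Hypothetical helper: clamps n samples to [0, max_val] in place
 * (the VMAX.S16 / VMIN.S16 pair, eight lanes at a time). */
static void clip_s16(int16_t *data, size_t n, int16_t max_val)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const int16x8_t zero = vdupq_n_s16(0);
    const int16x8_t vmaxv = vdupq_n_s16(max_val);
    for (; i + 8 <= n; i += 8) {
        int16x8_t v = vld1q_s16(data + i);
        v = vmaxq_s16(v, zero);   /* larger of zero and value */
        v = vminq_s16(v, vmaxv);  /* smaller of that and max_val */
        vst1q_s16(data + i, v);
    }
#endif
    for (; i < n; ++i) {          /* scalar tail or non-Neon build */
        int16_t v = data[i];
        if (v < 0) v = 0;
        if (v > max_val) v = max_val;
        data[i] = v;
    }
}

/* Hypothetical helper for the "lucky" case maxVal == 255: shift right by
 * SHIFT and clamp to [0, 255] in one step (VQSHRUN.S16). */
static void shift_clip_u8(const int16_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 8 <= n; i += 8)
        vst1_u8(dst + i, vqshrun_n_s16(vld1q_s16(src + i), SHIFT));
#endif
    for (; i < n; ++i) {
        /* Negative inputs always saturate to zero, so skip the shift. */
        int v = src[i] < 0 ? 0 : (src[i] >> SHIFT);
        dst[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}
```

The second function only applies when the clipping range happens to match the range of the narrower unsigned type; for any other maxVal the explicit maximum and minimum in clip_s16 are still needed.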


    hth
    s.