This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

How to get absolute value of a 32-bit signed integer as fast as possible?

Hi.

I wonder how to calculate absolute value of a 32-bit signed integer in C as fast as possible. I saw that there is a FPU instruction VABS.F32, which do that in one cycle (above the floats). I thought, if it is possible to use it also with integers (sign bit is in both cases MSB bit). Maybe in some way with inline or embedded assembly?

Or do you have any other advice how to get absolute value of an integer in C in the fastest way.

Not to forget - I am speaking regarding Cortex-M4 processor.

Thanks

Parents
  • And if you just want to be different here's some interesting code which loses a register and doesn't save any cycles on the M4 ;-)

         asrs    r1, r0, #31

         eors    r0, r1

         sub     r0, r1

    There's a good reason, not just to be different, for using this method. I frequently use this sequence of instructions (Arithmetic Shift Right, Exclusive OR, Subtract) for determining the absolute value of integral data type in various processor architectures that I use. My main reason is not necessarily speed based on the total or statistical average of instruction cycle count but the property that the code does not branch so we get the same number of cycles for calculating the absolute value regardless of the sign of the input. The code just make unnecessary operations when the input is positive.

         eor     r1, r0, r0, asr #31

         sub     r0, r1, r0, asr #31

    Comparing this two-instruction and the previous three-instruction method, the increase in the size of code from 6 to 8 bytes might have a negative impact on the overall performance so further analysis may be needed before making a choice between the two. In ARM state (matic specified Cortex-M4, just for comparison) this is irrelevant.

         int src, tmp, dst;

    For all the codes that were posted, when the input is 0x80000000 (-2147483648) the output will be the same (0x80000000). The simplest way to handle this special case is to treat the output (dst in your C code) as unsigned. This will allow 0x80000000 to be treated as (+)2147483648 without requiring additional bit(s) of storage. Also, or consequently, we can just proceed with the arithmetic instruction, simply ignoring the flags. With (signed) integer data type, the output 0x80000000 is still -2147483648 after the supposed absolute value operation. Although we can still make the (signed) integer data type work, especially since C is loosely typed, the unsigned type will help make the code easier to comprehend and work with.

Reply
  • And if you just want to be different here's some interesting code which loses a register and doesn't save any cycles on the M4 ;-)

         asrs    r1, r0, #31

         eors    r0, r1

         sub     r0, r1

    There's a good reason, not just to be different, for using this method. I frequently use this sequence of instructions (Arithmetic Shift Right, Exclusive OR, Subtract) for determining the absolute value of integral data type in various processor architectures that I use. My main reason is not necessarily speed based on the total or statistical average of instruction cycle count but the property that the code does not branch so we get the same number of cycles for calculating the absolute value regardless of the sign of the input. The code just make unnecessary operations when the input is positive.

         eor     r1, r0, r0, asr #31

         sub     r0, r1, r0, asr #31

    Comparing this two-instruction and the previous three-instruction method, the increase in the size of code from 6 to 8 bytes might have a negative impact on the overall performance so further analysis may be needed before making a choice between the two. In ARM state (matic specified Cortex-M4, just for comparison) this is irrelevant.

         int src, tmp, dst;

    For all the codes that were posted, when the input is 0x80000000 (-2147483648) the output will be the same (0x80000000). The simplest way to handle this special case is to treat the output (dst in your C code) as unsigned. This will allow 0x80000000 to be treated as (+)2147483648 without requiring additional bit(s) of storage. Also, or consequently, we can just proceed with the arithmetic instruction, simply ignoring the flags. With (signed) integer data type, the output 0x80000000 is still -2147483648 after the supposed absolute value operation. Although we can still make the (signed) integer data type work, especially since C is loosely typed, the unsigned type will help make the code easier to comprehend and work with.

Children
  • If you don't want (or cannot afford) to treat absolute output as unsigned as @goodwin suggested, here is what I would do to perform Absolute function on signed integer keeping output as signed (i.e. saturating abs(-2^31) to -1+2^31)

    if (val < 0) {
        val = __qsub(0, val);
    }
    

    This would be translated to

    cmp      r0,#0
    itt      MI 
    movMI    r1,#0
    qsubMI   r0,r1,r0
    

    This will take 4 cycles regardless input value (maybe 3 if you have some 0 constant already in your registers !) on a Cortex-M4 but requires that you use intrinsics (or inline assembly) from your compiler !

  • The trick with eor can be used to do the saturating abs in 3 cycles

         asrs    r1, r0, #31

         eors    r0, r1

         qsub    r0, r1

         tmp = (int)src >> 31;

         dst = __qsub(val^tmp, tmp);

  • Thanks. I like that solution most.

    I write it as __QSUB(src^src >> 31, src >> 31)  and it works well also with 0x80000000 input. Result (if signed) become 2147483647 (0x7FFFFFFF). So, perhaps it isn't necesary to compute intermediate "tmp" value as you did, is it? Therefor you avoid additional STR operation.

    Also, is casting with (int) there necessary? Because, the dissasembly code and result are the same witout it. As in my example.

  • Looks that this solution is the most efficient, I would put some parenthesis though ....

    Regarding the STR instruction, as long as tmp variable is local to your function, there is no reason a compiler with decent optimization would store this intermediate result !! Did you compile without optimization ? Or maybe, your tmp variable is declared as global ?

    For the cast to int, it may not be necessary (if you declared src as an int), but this way you can make sure that compiler will use ASR (arithmetic shift) instead of LSR (logical shift). Also anyone that reads this C code will understand that your intention was to sign-fill your variable !