How to get absolute value of a 32-bit signed integer as fast as possible?

Hi.

I wonder how to calculate the absolute value of a 32-bit signed integer in C as fast as possible. I saw that there is an FPU instruction, VABS.F32, which does that in one cycle (for floats). I wondered whether it is possible to use it with integers too (the sign bit is the MSB in both cases). Maybe somehow with inline or embedded assembly?

Or do you have any other advice on the fastest way to get the absolute value of an integer in C?

Not to forget: I am talking about the Cortex-M4 processor.

Thanks

  • Hi everybody,

    I just worked on this absolute value topic again and found another way to achieve both good efficiency and portability, including saturation!

    The trick is to start by negating the input value, so that its sign is tested in the process.

    Here is the 32-bit version:

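    int32_t myAbs32(int32_t parVal)
    {
        int32_t wResult;

        /* Negate the value so that its sign can be tested in the process.
           The negation is done in unsigned arithmetic because -parVal overflows
           (undefined behaviour in C) when parVal is 0x80000000. The conversion
           back to int32_t is implementation-defined for that value. */
        wResult = (int32_t)(0u - (uint32_t)parVal);

        if (wResult < 0)
        {
            /* Result is negative, which means either:
            - parVal is positive: the following operation subtracts 0 from parVal
            - parVal is 0x80000000: the following operation subtracts 1 from parVal => 0x7FFFFFFF */
            wResult = (int32_t)((uint32_t)parVal - (((uint32_t)parVal) >> 31));
        }
        /* else the result is positive or zero, which means parVal is negative or zero: nothing else to do */
        return wResult;
    }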
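
    The edge cases named in the comments (a positive input, and 0x80000000) can be checked against a widened reference. This is only a minimal sketch: the names absSat32, refAbs32Sat and checkAbs32 are hypothetical, and absSat32 restates the negate-then-test idea entirely in unsigned arithmetic so that negating 0x80000000 wraps instead of overflowing a signed type.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical reference: widen to 64 bits so the negation cannot overflow,
       then clamp to INT32_MAX. */
    static int32_t refAbs32Sat(int32_t v)
    {
        int64_t wide = (v < 0) ? -(int64_t)v : (int64_t)v;
        return (wide > INT32_MAX) ? INT32_MAX : (int32_t)wide;
    }

    /* Hypothetical restatement of the negate-then-test trick in unsigned arithmetic. */
    static int32_t absSat32(int32_t v)
    {
        uint32_t u = (uint32_t)v;
        uint32_t neg = 0u - u;          /* two's complement negation, wraps by definition */

        if (neg & 0x80000000u)
        {
            /* The negated value has its sign bit set: v is positive (subtract 0)
               or v is 0x80000000 (subtract 1, giving 0x7FFFFFFF). */
            neg = u - (u >> 31);
        }
        /* At this point neg is at most 0x7FFFFFFF, so the conversion is always in range. */
        return (int32_t)neg;
    }

    /* Compare both versions on a few boundary values; assert aborts on a mismatch. */
    static void checkAbs32(void)
    {
        static const int32_t samples[] = {
            0, 1, -1, 1234, -1234, INT32_MAX, -INT32_MAX, INT32_MIN
        };

        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        {
            assert(absSat32(samples[i]) == refAbs32Sat(samples[i]));
        }
    }
    ```

    Calling checkAbs32() once at start-up in a debug build (NDEBUG not defined) is enough to catch a mismatch on these values.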
    

    When inlined in calling code on Cortex-M3/M4/M7 (Thumb-2), this takes 2 or 3 cycles (most probably 2, because of the alignment of the 16-bit RSBS and IT instructions):

    RSBS      R1, R0, #0
    IT        MI
    SUBMI     R1, R0, R0, LSR #31
    

    In ARM mode, this takes exactly two instructions!

    A very similar approach can be applied to all smaller data sizes. Here is the 16-bit version:

    int32_t myAbs16(int16_t parVal)
    {
        int32_t wResult;
       
        /* Take opposite of value and test value in the process */
        wResult = -parVal;
        
        if (wResult >= 0)
        {
            /* Result is positive or zero, which means one of:
            - parVal is 0xFFFF8000, result is 0x00008000: the following operation subtracts 1 => 0x00007FFF
            - parVal is negative and >= 0xFFFF8001: the following operation subtracts 0
            - parVal is 0: the following operation subtracts 0 */
            wResult -= (wResult >> 15);
        }
        else
        {
            /* result is negative, that means that parVal is positive, restore it */
            wResult = parVal;
        }
        return wResult;
    }
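
    Because a 16-bit input has only 65536 possible values, the three cases listed in the comment can be verified exhaustively. This is a minimal sketch with hypothetical names (absSat16, checkAbs16); absSat16 repeats the logic of the 16-bit version so that the block compiles on its own.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical standalone copy of the 16-bit negate-then-test logic. */
    static int32_t absSat16(int16_t v)
    {
        int32_t r = -(int32_t)v;    /* cannot overflow: |v| is at most 32768 */

        if (r >= 0)
        {
            r -= (r >> 15);         /* r >> 15 is 1 only when r == 32768 */
        }
        else
        {
            r = v;                  /* v was positive, keep it */
        }
        return r;
    }

    /* Walk every int16_t value and compare with a plain clamped absolute value. */
    static void checkAbs16(void)
    {
        for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v)
        {
            int32_t expected = (v < 0) ? -v : v;

            if (expected > INT16_MAX)
            {
                expected = INT16_MAX;
            }
            assert(absSat16((int16_t)v) == expected);
        }
    }
    ```

    The loop counter is a 32-bit integer on purpose: an int16_t counter could never exceed INT16_MAX, so the loop would not terminate.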
    

    The same remark applies: this should take 2 cycles:

    RSBS      R1, R0, #0
    IT        PL
    SUBPL     R0, R1, R1, ASR #15
    

    It even works well on Cortex-M0(+). Here is the assembly code for the 32-bit absolute value (3 or 4 cycles on M0+, 4 cycles on M0):

        RSBS      R1, R0, #0
        BPL       next
        LSRS      R1, R0, #31
        SUBS      R1, R0, R1
    next:
    

    I thought this made the topic worth digging up again!
