This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Code for integer division on Cortex-A8?

Hi all,

when I wrote a C code with division operation the compiler is generating some library calls.....when I tried to see the equivalent code for those function calls...I'm unable to reach there (may be because they arm library SW routines)...I cant use those SW routine calls...I can embed the equivalent code in my program( in assembly code). Form where I can get the Equivalent codes for division SW routines....I need codes for 32-bit/32-bit ,32-bit/16-bit & 16-bit/16-bit divisions....

Thanks in advance....

Parents
  • daith wrote:

    It's worth remembering too that division by a constant can be done much more efficiently.

    Although not often applicable, it is also interesting to note that exact division (i.e. where it is known in advance that the result is exactly an integer, no rounding is required) is also more efficient, both in general and when dividing by a constant.

    The obvious reason is that you can use division routines with a different rounding mode than the usual one if they happen to be more efficient, but another less well-known option is that in this case you can replace division by an odd number by multiplication with the modular inverse.  This means you can use MUL instead of SMULL/UMULL/SMMUL for 32-bit operands, and 64-bit operands take only a few more instructions.  For even numbers, combine with a shift.

    To calculate the modular inverse using Newton-Raphson iteration:

    // modular inverse of odd 32-bit value
    u32 modinv32( u32 x ) {
            u32 y = x;      // 3 bits correct
            y *= 2 - x * y; // 6
            y *= 2 - x * y; // 12
            y *= 2 - x * y; // 24
            y *= 2 - x * y; // 32
            return y;
    }

    64-bit requires only one extra iteration, and doesn't need 32-bit math until then:

    // modular inverse of odd 64-bit value
    u64 modinv64( u64 x ) {
            u64 y = modinv32( (u32) x );
            y *= 2 - x * y;
            return y;
    }

    Demo (compile with -fwrapv):

    int x = -1234567890;
    x *= 123;
    x *= (int) modinv32( 123 );
    printf( "%d\n", x );

    Note that unintuitively it works regardless of whether the numerator is signed or unsigned, and it doesn't even matter that the original multiplication overflowed.  Without -fwrapv signed integer overflow yields undefined behaviour so you'd need to cast to unsigned and back.

    Turning this into a signed division routine (doesn't need -fwrapv):

    // 32-bit exact division
    int div32_exact( int n, int d ) {
            int shift = __builtin_ctz( d );
            n >>= shift;
            d >>= shift;
            return (int)( (u32) n * modinv32( (u32) d ) );
    }

    With modinv inlined, GCC compiles this to a tiny function of 20 instructions (or 19 using -Os).  Unsigned version is essentially the same: only the shifts depend on signedness.

Reply
  • daith wrote:

    It's worth remembering too that division by a constant can be done much more efficiently.

    Although not often applicable, it is also interesting to note that exact division (i.e. where it is known in advance that the result is exactly an integer, no rounding is required) is also more efficient, both in general and when dividing by a constant.

    The obvious reason is that you can use division routines with a different rounding mode than the usual one if they happen to be more efficient, but another less well-known option is that in this case you can replace division by an odd number by multiplication with the modular inverse.  This means you can use MUL instead of SMULL/UMULL/SMMUL for 32-bit operands, and 64-bit operands take only a few more instructions.  For even numbers, combine with a shift.

    To calculate the modular inverse using Newton-Raphson iteration:

    // modular inverse of odd 32-bit value
    u32 modinv32( u32 x ) {
            u32 y = x;      // 3 bits correct
            y *= 2 - x * y; // 6
            y *= 2 - x * y; // 12
            y *= 2 - x * y; // 24
            y *= 2 - x * y; // 32
            return y;
    }

    64-bit requires only one extra iteration, and doesn't need 32-bit math until then:

    // modular inverse of odd 64-bit value
    u64 modinv64( u64 x ) {
            u64 y = modinv32( (u32) x );
            y *= 2 - x * y;
            return y;
    }

    Demo (compile with -fwrapv):

    int x = -1234567890;
    x *= 123;
    x *= (int) modinv32( 123 );
    printf( "%d\n", x );

    Note that unintuitively it works regardless of whether the numerator is signed or unsigned, and it doesn't even matter that the original multiplication overflowed.  Without -fwrapv signed integer overflow yields undefined behaviour so you'd need to cast to unsigned and back.

    Turning this into a signed division routine (doesn't need -fwrapv):

    // 32-bit exact division
    int div32_exact( int n, int d ) {
            int shift = __builtin_ctz( d );
            n >>= shift;
            d >>= shift;
            return (int)( (u32) n * modinv32( (u32) d ) );
    }

    With modinv inlined, GCC compiles this to a tiny function of 20 instructions (or 19 using -Os).  Unsigned version is essentially the same: only the shifts depend on signedness.

Children