This discussion has been locked.

You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Code for integer division on Cortex-A8?

Hi all,

when I wrote a C code with division operation the compiler is generating some library calls.....when I tried to see the equivalent code for those function calls...I'm unable to reach there (may be because they arm library SW routines)...I cant use those SW routine calls...I can embed the equivalent code in my program( in assembly code). Form where I can get the Equivalent codes for division SW routines....I need codes for 32-bit/32-bit ,32-bit/16-bit & 16-bit/16-bit divisions....

Thanks in advance....

Parents

0 Matthijs van Duin over 11 years ago in reply to daith

daith wrote:

It's worth remembering too that division by a constant can be done much more efficiently.

Although not often applicable, it is also interesting to note that exact division (i.e. where it is known in advance that the result is exactly an integer, no rounding is required) is also more efficient, both in general and when dividing by a constant.
The obvious reason is that you can use division routines with a different rounding mode than the usual one if they happen to be more efficient, but another less well-known option is that in this case you can replace division by an odd number by multiplication with the modular inverse. This means you can use MUL instead of SMULL/UMULL/SMMUL for 32-bit operands, and 64-bit operands take only a few more instructions. For even numbers, combine with a shift.
To calculate the modular inverse using Newton-Raphson iteration:
// modular inverse of odd 32-bit value u32 modinv32( u32 x ) {         u32 y = x;      // 3 bits correct         y *= 2 - x * y; // 6         y *= 2 - x * y; // 12         y *= 2 - x * y; // 24         y *= 2 - x * y; // 32         return y; }
64-bit requires only one extra iteration, and doesn't need 32-bit math until then:
// modular inverse of odd 64-bit value u64 modinv64( u64 x ) {         u64 y = modinv32( (u32) x );         y *= 2 - x * y;         return y; }
Demo (compile with -fwrapv):
int x = -1234567890; x *= 123; x *= (int) modinv32( 123 ); printf( "%d\n", x );
Note that unintuitively it works regardless of whether the numerator is signed or unsigned, and it doesn't even matter that the original multiplication overflowed. Without -fwrapv signed integer overflow yields undefined behaviour so you'd need to cast to unsigned and back.
Turning this into a signed division routine (doesn't need -fwrapv):
// 32-bit exact division int div32_exact( int n, int d ) {         int shift = __builtin_ctz( d );         n >>= shift;         d >>= shift;         return (int)( (u32) n * modinv32( (u32) d ) ); }
With modinv inlined, GCC compiles this to a tiny function of 20 instructions (or 19 using -Os). Unsigned version is essentially the same: only the shifts depend on signedness.
Cancel
Vote up 0 Vote down

Cancel

Reply

0 Matthijs van Duin over 11 years ago in reply to daith

daith wrote:

It's worth remembering too that division by a constant can be done much more efficiently.

Although not often applicable, it is also interesting to note that exact division (i.e. where it is known in advance that the result is exactly an integer, no rounding is required) is also more efficient, both in general and when dividing by a constant.
The obvious reason is that you can use division routines with a different rounding mode than the usual one if they happen to be more efficient, but another less well-known option is that in this case you can replace division by an odd number by multiplication with the modular inverse. This means you can use MUL instead of SMULL/UMULL/SMMUL for 32-bit operands, and 64-bit operands take only a few more instructions. For even numbers, combine with a shift.
To calculate the modular inverse using Newton-Raphson iteration:
// modular inverse of odd 32-bit value u32 modinv32( u32 x ) {         u32 y = x;      // 3 bits correct         y *= 2 - x * y; // 6         y *= 2 - x * y; // 12         y *= 2 - x * y; // 24         y *= 2 - x * y; // 32         return y; }
64-bit requires only one extra iteration, and doesn't need 32-bit math until then:
// modular inverse of odd 64-bit value u64 modinv64( u64 x ) {         u64 y = modinv32( (u32) x );         y *= 2 - x * y;         return y; }
Demo (compile with -fwrapv):
int x = -1234567890; x *= 123; x *= (int) modinv32( 123 ); printf( "%d\n", x );
Note that unintuitively it works regardless of whether the numerator is signed or unsigned, and it doesn't even matter that the original multiplication overflowed. Without -fwrapv signed integer overflow yields undefined behaviour so you'd need to cast to unsigned and back.
Turning this into a signed division routine (doesn't need -fwrapv):
// 32-bit exact division int div32_exact( int n, int d ) {         int shift = __builtin_ctz( d );         n >>= shift;         d >>= shift;         return (int)( (u32) n * modinv32( (u32) d ) ); }
With modinv inlined, GCC compiles this to a tiny function of 20 instructions (or 19 using -Os). Unsigned version is essentially the same: only the shifts depend on signedness.
Cancel
Vote up 0 Vote down

Cancel

Children

0 daith over 11 years ago in reply to Matthijs van Duin

The original and best paper on all this is
Division by Invariant Integers using Multiplication
by Torbjorn Granlund and Peter L. Montgomery
see also
Improved division by invariant integers
Cancel
Vote up 0 Vote down

Cancel
0 Matthijs van Duin over 11 years ago in reply to daith

Nice references, thanks!
Cancel
Vote up 0 Vote down

Cancel
0 daith over 11 years ago in reply to Matthijs van Duin

If you're really desperate here's a routine which does unsigned division by a constant.d not equal to zero
Initialise dr and sh such that
Shift d left while zero to give dbig with top bit set and shwzcnt the shift count
recip = (2^64 - 1) / dbig, a long unsigned division yielding an unsigned integer
dr = recip + 1
sh = 31 - shwzcnt
n = r0 the number to be divided + the eventual result
dr = r1
sh = r2
tmp = r3
     movs     dr, dr
     umullne tmp, dr, n, dr
     subne    n, n, dr
     addne    n, dr, n, lsr #1
     mov       n, n, lsr sh
     mov       pc, lr
This can be used with a run time divisor when it is going to be used a number of times. But I'd just write a C routine and copy the generated code for divides by constants..
Cancel
Vote up 0 Vote down

Cancel
0 daith over 11 years ago in reply to daith

In fact I've found a surprising improvement for the more troublesome unsigned integer divisions by constant compared to the usual method see
Labor of Division (Episode III): Faster Unsigned Division by Constants
Cancel
Vote up 0 Vote down

Cancel