This discussion has been locked.

You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Code for integer division on Cortex-A8?

Hi all,

when I wrote a C code with division operation the compiler is generating some library calls.....when I tried to see the equivalent code for those function calls...I'm unable to reach there (may be because they arm library SW routines)...I cant use those SW routine calls...I can embed the equivalent code in my program( in assembly code). Form where I can get the Equivalent codes for division SW routines....I need codes for 32-bit/32-bit ,32-bit/16-bit & 16-bit/16-bit divisions....

Thanks in advance....

0 Chris Shore over 11 years ago

Hi Joseph,
Yes, you're right that on most ARM cores, division will be carried out using library calls. If you're unable to link to the libraries for some reason, then there are several ways to get around that. You could disassemble the libraries and extract the assembly code for the division routines but this would be very hard work!
Alternatively, you can write your own. There are quite a few examples available in the web if you care to look. Here are a couple which I have found.
This one is from the Keil documentation and is provided by way of an example of how to write a macro function in the assembler:
http://www.keil.com/support/man/docs/armasm/armasm_cegecdgd.htm
This one is a much more complex full-freatures fixed-point division function:
A Fast Hi Precision Fixed Point Divide
Beware that implementing correct high-speed division functions is not an easy task so take care!
Hope this helps
Chris
Cancel
Vote up 0 Vote down

Cancel
0 semp over 11 years ago

GCC would do the same thing on hardware w/o division support. The routines it uses are available in libgcc. More specifically, the routine 'divsi3' could be of interest as a reference point.
Some info on this can be found here: Libgcc - GNU Compiler Collection (GCC) Internals
Cancel
Vote up 0 Vote down

Cancel
0 daith over 11 years ago

It's worth remembering too that division by a constant can be done much more efficiently. If you run gcc with a division by a constant you'll see it generating inline code rather than calling a subroutine.
Cancel
Vote up 0 Vote down

Cancel
0 Chris Shore over 11 years ago in reply to daith

Hi Daith,
Thanks for the contribution. You're right about division by constants and the ARM compiler will do this too. Where appropriate it will use shift instructions for division (or multiplication) by power of two. For other compiler-time constants it will use a standard multiply sequence. Here is an example of division by 10:
LDR r0, =0xCCCCCCCD
UMULL r2, r1, r0, r1
MOV r1, r1, LSR #3
The constant used is a fixed-point binary representation of 1/10 and the final shift removes the fractional bits from the result to leave an integer.
Chris
Cancel
Vote up 0 Vote down

Cancel
0 daith over 11 years ago in reply to Chris Shore

It isn't always quite as simple as that. Sometimes there's an extra instruction as depending on the constant the last bit may need extra twiddling. It's not something one can or should do by hand, plagiarizing the output of the compiler is best. In fact I just think of C as my assembler language and would only touch actual assembler in extreme circumstances.
Cancel
Vote up 0 Vote down

Cancel
0 Chris Shore over 11 years ago in reply to daith

Hi Daith,
Thanks for your reply. I agree that it isn't always that simple. Many don't realise, for instance, that ASR doesn't exactly replace integer division because it doesn't round properly. But, as you say, this isn't something which you should undertake unless you really know what you're doing.
We might debate your comment about using the C compiler as an "assembler" another time. I have quite strong views on that! C is relatively "low-level" when compared with other high-level languages but I think it's wrong to thing of it as an assembly replacement. If you're trying to write assembly code, then my view is that you should use an assembler!
Chris
Cancel
Vote up 0 Vote down

Cancel
0 Matthijs van Duin over 11 years ago in reply to Chris Shore

Chris Shore wrote:

Thanks for your reply. I agree that it isn't always that simple. Many don't realise, for instance, that ASR doesn't exactly replace integer division because it doesn't round properly.

Though to be honest I think floored division is superior to truncated division, especially with regard to the associated mod operator. But yes, the difference is something to be aware of.
Cancel
Vote up 0 Vote down

Cancel
0 Matthijs van Duin over 11 years ago in reply to daith

daith wrote:

It's worth remembering too that division by a constant can be done much more efficiently.

Although not often applicable, it is also interesting to note that exact division (i.e. where it is known in advance that the result is exactly an integer, no rounding is required) is also more efficient, both in general and when dividing by a constant.
The obvious reason is that you can use division routines with a different rounding mode than the usual one if they happen to be more efficient, but another less well-known option is that in this case you can replace division by an odd number by multiplication with the modular inverse. This means you can use MUL instead of SMULL/UMULL/SMMUL for 32-bit operands, and 64-bit operands take only a few more instructions. For even numbers, combine with a shift.
To calculate the modular inverse using Newton-Raphson iteration:
// modular inverse of odd 32-bit value u32 modinv32( u32 x ) {         u32 y = x;      // 3 bits correct         y *= 2 - x * y; // 6         y *= 2 - x * y; // 12         y *= 2 - x * y; // 24         y *= 2 - x * y; // 32         return y; }
64-bit requires only one extra iteration, and doesn't need 32-bit math until then:
// modular inverse of odd 64-bit value u64 modinv64( u64 x ) {         u64 y = modinv32( (u32) x );         y *= 2 - x * y;         return y; }
Demo (compile with -fwrapv):
int x = -1234567890; x *= 123; x *= (int) modinv32( 123 ); printf( "%d\n", x );
Note that unintuitively it works regardless of whether the numerator is signed or unsigned, and it doesn't even matter that the original multiplication overflowed. Without -fwrapv signed integer overflow yields undefined behaviour so you'd need to cast to unsigned and back.
Turning this into a signed division routine (doesn't need -fwrapv):
// 32-bit exact division int div32_exact( int n, int d ) {         int shift = __builtin_ctz( d );         n >>= shift;         d >>= shift;         return (int)( (u32) n * modinv32( (u32) d ) ); }
With modinv inlined, GCC compiles this to a tiny function of 20 instructions (or 19 using -Os). Unsigned version is essentially the same: only the shifts depend on signedness.
Cancel
Vote up 0 Vote down

Cancel
0 daith over 11 years ago in reply to Matthijs van Duin

The original and best paper on all this is
Division by Invariant Integers using Multiplication
by Torbjorn Granlund and Peter L. Montgomery
see also
Improved division by invariant integers
Cancel
Vote up 0 Vote down

Cancel
0 Matthijs van Duin over 11 years ago in reply to daith

Nice references, thanks!
Cancel
Vote up 0 Vote down

Cancel
0 daith over 11 years ago in reply to Matthijs van Duin

If you're really desperate here's a routine which does unsigned division by a constant.d not equal to zero
Initialise dr and sh such that
Shift d left while zero to give dbig with top bit set and shwzcnt the shift count
recip = (2^64 - 1) / dbig, a long unsigned division yielding an unsigned integer
dr = recip + 1
sh = 31 - shwzcnt
n = r0 the number to be divided + the eventual result
dr = r1
sh = r2
tmp = r3
     movs     dr, dr
     umullne tmp, dr, n, dr
     subne    n, n, dr
     addne    n, dr, n, lsr #1
     mov       n, n, lsr sh
     mov       pc, lr
This can be used with a run time divisor when it is going to be used a number of times. But I'd just write a C routine and copy the generated code for divides by constants..
Cancel
Vote up 0 Vote down

Cancel
0 daith over 11 years ago in reply to daith

In fact I've found a surprising improvement for the more troublesome unsigned integer divisions by constant compared to the usual method see
Labor of Division (Episode III): Faster Unsigned Division by Constants
Cancel
Vote up 0 Vote down

Cancel