We are running a survey to help us improve the experience for all of our members. If you see the survey appear, please take the time to tell us about your experience if you can.
Hi all,
when I wrote a C code with division operation the compiler is generating some library calls.....when I tried to see the equivalent code for those function calls...I'm unable to reach there (may be because they arm library SW routines)...I cant use those SW routine calls...I can embed the equivalent code in my program( in assembly code). Form where I can get the Equivalent codes for division SW routines....I need codes for 32-bit/32-bit ,32-bit/16-bit & 16-bit/16-bit divisions....
Thanks in advance....
Hi Joseph,
Yes, you're right that on most ARM cores, division will be carried out using library calls. If you're unable to link to the libraries for some reason, then there are several ways to get around that. You could disassemble the libraries and extract the assembly code for the division routines but this would be very hard work!
Alternatively, you can write your own. There are quite a few examples available in the web if you care to look. Here are a couple which I have found.
This one is from the Keil documentation and is provided by way of an example of how to write a macro function in the assembler:
http://www.keil.com/support/man/docs/armasm/armasm_cegecdgd.htm
This one is a much more complex full-freatures fixed-point division function:
A Fast Hi Precision Fixed Point Divide
Beware that implementing correct high-speed division functions is not an easy task so take care!
Hope this helps
Chris
GCC would do the same thing on hardware w/o division support. The routines it uses are available in libgcc. More specifically, the routine 'divsi3' could be of interest as a reference point.
Some info on this can be found here: Libgcc - GNU Compiler Collection (GCC) Internals
It's worth remembering too that division by a constant can be done much more efficiently. If you run gcc with a division by a constant you'll see it generating inline code rather than calling a subroutine.
Hi Daith,
Thanks for the contribution. You're right about division by constants and the ARM compiler will do this too. Where appropriate it will use shift instructions for division (or multiplication) by power of two. For other compiler-time constants it will use a standard multiply sequence. Here is an example of division by 10:
LDR r0, =0xCCCCCCCD
UMULL r2, r1, r0, r1
MOV r1, r1, LSR #3
The constant used is a fixed-point binary representation of 1/10 and the final shift removes the fractional bits from the result to leave an integer.
It isn't always quite as simple as that. Sometimes there's an extra instruction as depending on the constant the last bit may need extra twiddling. It's not something one can or should do by hand, plagiarizing the output of the compiler is best. In fact I just think of C as my assembler language and would only touch actual assembler in extreme circumstances.
Thanks for your reply. I agree that it isn't always that simple. Many don't realise, for instance, that ASR doesn't exactly replace integer division because it doesn't round properly. But, as you say, this isn't something which you should undertake unless you really know what you're doing.
We might debate your comment about using the C compiler as an "assembler" another time. I have quite strong views on that! C is relatively "low-level" when compared with other high-level languages but I think it's wrong to thing of it as an assembly replacement. If you're trying to write assembly code, then my view is that you should use an assembler!
Chris Shore wrote: Thanks for your reply. I agree that it isn't always that simple. Many don't realise, for instance, that ASR doesn't exactly replace integer division because it doesn't round properly.
Chris Shore wrote:
Thanks for your reply. I agree that it isn't always that simple. Many don't realise, for instance, that ASR doesn't exactly replace integer division because it doesn't round properly.
Though to be honest I think floored division is superior to truncated division, especially with regard to the associated mod operator. But yes, the difference is something to be aware of.
daith wrote: It's worth remembering too that division by a constant can be done much more efficiently.
daith wrote:
It's worth remembering too that division by a constant can be done much more efficiently.
Although not often applicable, it is also interesting to note that exact division (i.e. where it is known in advance that the result is exactly an integer, no rounding is required) is also more efficient, both in general and when dividing by a constant.
The obvious reason is that you can use division routines with a different rounding mode than the usual one if they happen to be more efficient, but another less well-known option is that in this case you can replace division by an odd number by multiplication with the modular inverse. This means you can use MUL instead of SMULL/UMULL/SMMUL for 32-bit operands, and 64-bit operands take only a few more instructions. For even numbers, combine with a shift.
To calculate the modular inverse using Newton-Raphson iteration:
// modular inverse of odd 32-bit valueu32 modinv32( u32 x ) { u32 y = x; // 3 bits correct y *= 2 - x * y; // 6 y *= 2 - x * y; // 12 y *= 2 - x * y; // 24 y *= 2 - x * y; // 32 return y;}
64-bit requires only one extra iteration, and doesn't need 32-bit math until then:
// modular inverse of odd 64-bit valueu64 modinv64( u64 x ) { u64 y = modinv32( (u32) x ); y *= 2 - x * y; return y;}
Demo (compile with -fwrapv):
-fwrapv
int x = -1234567890;x *= 123;x *= (int) modinv32( 123 );printf( "%d\n", x );
Note that unintuitively it works regardless of whether the numerator is signed or unsigned, and it doesn't even matter that the original multiplication overflowed. Without -fwrapv signed integer overflow yields undefined behaviour so you'd need to cast to unsigned and back.
Turning this into a signed division routine (doesn't need -fwrapv):
// 32-bit exact divisionint div32_exact( int n, int d ) { int shift = __builtin_ctz( d ); n >>= shift; d >>= shift; return (int)( (u32) n * modinv32( (u32) d ) );}
With modinv inlined, GCC compiles this to a tiny function of 20 instructions (or 19 using -Os). Unsigned version is essentially the same: only the shifts depend on signedness.
The original and best paper on all this is
Division by Invariant Integers using Multiplication
by Torbjorn Granlund and Peter L. Montgomery
see also
Improved division by invariant integers
Nice references, thanks!
If you're really desperate here's a routine which does unsigned division by a constant.d not equal to zero
Initialise dr and sh such that
Shift d left while zero to give dbig with top bit set and shwzcnt the shift count
recip = (2^64 - 1) / dbig, a long unsigned division yielding an unsigned integer
dr = recip + 1
sh = 31 - shwzcnt
n = r0 the number to be divided + the eventual result
dr = r1
sh = r2
tmp = r3
movs dr, dr
umullne tmp, dr, n, dr
subne n, n, dr
addne n, dr, n, lsr #1
mov n, n, lsr sh
mov pc, lr
This can be used with a run time divisor when it is going to be used a number of times. But I'd just write a C routine and copy the generated code for divides by constants..
In fact I've found a surprising improvement for the more troublesome unsigned integer divisions by constant compared to the usual method see
Labor of Division (Episode III): Faster Unsigned Division by Constants