I discussed in a previous blog post that it is possible to set some condition flags based on the result of an arithmetic operation. Consider the following code:
    adds    r0, r0, r1
    bvs     <some_address>
The above code adds r1 to r0, then branches somewhere if a (signed) overflow was detected. This technique is used frequently in JIT compilers for dynamic languages. In such contexts, the type and size of a variable are often not known when the code is compiled, so the JIT compiler will test for overflow, and then fall back to a slower implementation in the case where a signed 32-bit integer cannot represent the result of the required operation. This is the approach taken by Mozilla's TraceMonkey JavaScript engine, for example.
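For readers who think more easily in C than in assembly, the fast-path and fallback pattern described above can be sketched roughly as follows. This is only an illustration: the names `add_i32_overflows` and `add_values` are hypothetical, and the widening to 64 bits stands in for the `adds`/`bvs` pair rather than reproducing what any particular JIT emits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a + b does not fit in an int32_t.
 * On success (no overflow) the sum is written to *sum. */
bool add_i32_overflows(int32_t a, int32_t b, int32_t *sum)
{
    /* Widen first so that the addition itself cannot overflow. */
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide < INT32_MIN || wide > INT32_MAX) {
        return true;            /* Corresponds to the V flag being set. */
    }
    *sum = (int32_t)wide;
    return false;
}

/* Hypothetical JIT-style add: try the integer fast path, then fall back
 * to a double-precision calculation if the integer result overflows. */
double add_values(int32_t a, int32_t b)
{
    int32_t sum;
    if (!add_i32_overflows(a, b, &sum)) {
        return (double)sum;     /* Fast path. */
    }
    return (double)a + (double)b;   /* Slow path. */
}
```

A real engine would keep the integer result in an integer representation on the fast path; returning a `double` from both paths here just keeps the sketch short.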
Those familiar with ARM's mul instruction may realize that although it can take the s suffix to update the flags, it only updates the n and z flags. Consider the following:
    ldr     r0, =100000
    muls    r0, r0, r0
It is, of course, not possible to represent 100000 × 100000 in a 32-bit signed integer, but muls does not report the overflow. A likely explanation for this limitation is that addition and subtraction overflows (and carries) can be used to implement 64-bit arithmetic, but the same is not true for mul as an operation may overflow by more than one bit. Indeed, 100000 × 100000 should result in 10000000000, or 0x2540be400, which cannot be reconstructed as a 64-bit quantity from the 32-bit result and the APSR flags alone.
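The arithmetic can be checked with a small C model of what `muls` leaves behind. This is a sketch under stated assumptions: `muls_result` and `muls_model` are made-up names, and the model only covers the 32-bit result plus the N and Z flags mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the observable outputs of "muls rd, rn, rm". */
typedef struct {
    uint32_t value;   /* Low 32 bits of the product. */
    bool n;           /* N flag: bit 31 of the result. */
    bool z;           /* Z flag: result is zero. */
} muls_result;

muls_result muls_model(uint32_t a, uint32_t b)
{
    muls_result r;
    /* Only the low word of the full product is kept. */
    r.value = (uint32_t)((uint64_t)a * (uint64_t)b);
    r.n = (r.value >> 31) != 0;
    r.z = (r.value == 0);
    return r;
}
```

For 100000 × 100000 the full product is 0x2540be400, so the model returns 0x540be400 with both N and Z clear. The lost high word (0x2) leaves no trace in the result or the flags.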
The ARM architecture does, however, provide a facility for multiplying two 32-bit quantities to produce a 64-bit result. This operation cannot overflow:
    ldr     r0, =100000
    smull   r2, r3, r0, r0      @ Signed:   [r3:r2] = r0 * r0
    umull   r2, r3, r0, r0      @ Unsigned: [r3:r2] = r0 * r0
The smull and umull instructions can also set the flags if you add the s suffix, but as before, this isn't terribly useful for detecting 32-bit overflow.
Detecting unsigned overflow using umull is actually fairly trivial. Only if the multiplication overflows will the high word of the result be non-zero, so we can use the umull instruction to get a 64-bit result and test the top word in a scratch register. Here, we use ip as a scratch register [1].
    ldr     r0, =100000
    umull   r0, ip, r0, r0      @ [ip:r0] = r0 * r0
    cmp     ip, #0              @ ----
    bne     <somewhere>         @ Branch if we overflowed.
    beq     <elsewhere>         @ Branch if we did not overflow.
The snippet above will multiply 100000 by itself, store the least-significant 32 bits of the result back into r0, and clear the Z flag if the calculation overflowed into ip. Subsequent code can use the ne and eq condition codes to act on this (as shown in the example).
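The same check can be written in C as a rough equivalent of the `umull`/`cmp` pair. The function name `umull_overflows` is an assumption made for this sketch, and `hi` plays the role of ip.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C equivalent of:
 *     umull  lo, hi, a, b
 *     cmp    hi, #0
 * Stores the low 32 bits of the product in *lo and returns true if the
 * multiplication overflowed 32 bits (that is, if the high word is non-zero). */
bool umull_overflows(uint32_t a, uint32_t b, uint32_t *lo)
{
    uint64_t wide = (uint64_t)a * (uint64_t)b;   /* Cannot overflow. */
    uint32_t hi = (uint32_t)(wide >> 32);        /* Plays the role of ip. */
    *lo = (uint32_t)wide;
    return hi != 0;
}
```

As in the assembly version, the low word is written whether or not the multiplication overflowed, so the caller decides what to do with it.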
Detecting unsigned overflow is fairly trivial because the top word can easily be checked for equality to zero. With a signed quantity, however, the top word may contain a non-zero value even if no overflow occurred. Specifically, the top 33 bits of the result will contain either all zeroes or all ones if no overflow occurred.
There is obviously no machine instruction to check that 33 consecutive bits all contain the same value. However, we can use an Operand 2 signed shift modifier to perform this check:
    ldr     r0, =100000
    smull   r0, ip, r0, r0
    cmp     ip, r0, ASR #31
The ASR operation shifts r0 31 bits to the right, but it does an arithmetic shift, meaning that it extends the sign bit as it goes. With a 0 in bit 31, it merely shifts it all the way to bit 0, and fills the rest with zeroes. With a 1 in bit 31, it shifts it all the way to bit 0, but considers the value negative, so it fills the rest of the bits with ones. The cmp instruction therefore compares ip with either 0x00000000 or 0xffffffff, so the comparison is equal only if the top 33 bits of the 64-bit result are identical. As with the unsigned version, the Z flag is cleared if the calculation overflowed, or set if it did not.
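A C sketch of the same test may make the 33-bit argument easier to follow. The name `smull_overflows` is hypothetical. Because right-shifting a negative signed value is implementation-defined in C, this version builds the sign-fill word explicitly instead of relying on `>>` behaving like ASR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C equivalent of:
 *     smull  lo, hi, a, b
 *     cmp    hi, lo, ASR #31
 * Returns true if a * b does not fit in an int32_t. The product is
 * written to *product only when there is no overflow. */
bool smull_overflows(int32_t a, int32_t b, int32_t *product)
{
    int64_t wide = (int64_t)a * (int64_t)b;      /* Cannot overflow. */
    uint64_t bits = (uint64_t)wide;
    uint32_t lo = (uint32_t)bits;
    uint32_t hi = (uint32_t)(bits >> 32);        /* Plays the role of ip. */

    /* What "lo, ASR #31" produces: bit 31 of lo copied into every bit. */
    uint32_t sign_fill = (lo & 0x80000000u) ? 0xffffffffu : 0x00000000u;

    if (hi != sign_fill) {
        return true;             /* The top 33 bits are not all identical. */
    }
    *product = (int32_t)wide;    /* In range, so the conversion is exact. */
    return false;
}
```

Note the case 65536 × 32768: the high word is zero, but bit 31 of the low word is set, so the top 33 bits differ and the overflow is still caught.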
This technique is used by at least Mozilla's TraceMonkey and Google's V8 JavaScript JIT compilers, as both try to perform integer calculations but fall back to slower calculations (perhaps using a double-precision floating-point representation) if a signed integer cannot represent the required values.
[1] Note that ip is a synonym for r12, the intra-procedure-call scratch register. The ARM Procedure Call Standard allows ip to be corrupted by function calls, and it is commonly used as a general-purpose scratch register.
Thanks for this.
While this is super useful to understand what's going on, it might be worth noting that if you're using GCC (v7+) and need overflow checks in C code, it's fairly simple to use "__builtin_*_overflow( )". Refer to https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
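As a minimal sketch of that suggestion, a checked 32-bit multiply using the GCC builtin might look like the following. The wrapper name `checked_mul_i32` is made up for illustration, and the code assumes a compiler that provides `__builtin_mul_overflow` (GCC and Clang do; other compilers may not).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper around the GCC/Clang builtin. Returns true if
 * a * b fits in an int32_t, with the product stored in *product. */
bool checked_mul_i32(int32_t a, int32_t b, int32_t *product)
{
    /* The builtin returns true when the operation overflowed. */
    return !__builtin_mul_overflow(a, b, product);
}
```

What the compiler generates for this depends on the target and compiler version, so it is worth inspecting the output if the exact instruction sequence matters.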