Detecting Overflow from MUL

September 11, 2013

Detecting Overflow from Arithmetic Operations

As I discussed in a previous blog post, it is possible to set the condition flags based on the result of an arithmetic operation. Consider the following code:

adds    r0, r0, r1
bvs     <some_address>

The above code adds r1 to r0, then branches somewhere if a signed overflow was detected. This technique is used frequently in JIT compilers for dynamic languages. In such contexts, the type and size of a variable are often not known when the code is compiled. The JIT compiler therefore tests for overflow and falls back to a slower implementation whenever a signed 32-bit integer cannot represent the result of the operation. Mozilla's TraceMonkey JavaScript engine takes this approach, for example.
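
The following minimal C sketch shows what `adds` reports in the V flag. It is an illustration, not text from the ARM architecture manual. Signed overflow occurs exactly when both operands have the same sign and the 32-bit result has the opposite sign. The function name `add_sets_v_flag` is hypothetical. The sketch assumes a compiler such as GCC or Clang, which converts an out-of-range unsigned value to `int32_t` by two's-complement wrap-around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the V flag set by "adds rd, rn, rm". Signed overflow happens
 * exactly when both operands have the same sign and the 32-bit result
 * has the opposite sign. Unsigned arithmetic keeps the wrap-around
 * well defined in C. */
static bool add_sets_v_flag(int32_t a, int32_t b, int32_t *result)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t ur = ua + ub;    /* wraps modulo 2^32, like the 32-bit adder */

    /* Bit 31 of (ua ^ ur) & (ub ^ ur) is set only if the result's sign
     * differs from the sign of both operands. */
    bool overflow = (((ua ^ ur) & (ub ^ ur)) >> 31) != 0;

    /* Assumes two's-complement conversion (as GCC and Clang do). */
    *result = (int32_t)ur;
    return overflow;
}
```

For example, INT32_MAX + 1 wraps to INT32_MIN and reports overflow. By contrast, -1 + 1 produces 0 without overflow, even though the unsigned addition carries out of bit 31.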

Setting the Flags with `mul`

Those familiar with ARM's mul instruction may know that, although it accepts the s suffix to update the flags, it updates only the N and Z flags. Consider the following:

ldr     r0, =100000
muls    r0, r0, r0

It is, of course, not possible to represent 100000 × 100000 in a 32-bit signed integer, but muls does not report the overflow. The low word of the product, 0x540be400, is positive and non-zero, so N and Z look exactly as they would for a product that fit. A likely explanation for this limitation is that the overflows and carries from addition and subtraction can be used to implement 64-bit arithmetic. The same is not true for mul, because a multiplication may overflow by more than one bit. Indeed, 100000 × 100000 = 10000000000, or 0x2540be400, which cannot be reconstructed as a 64-bit quantity from the 32-bit result and the APSR flags alone.
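
The arithmetic can be checked with a small C sketch. It models what `muls` keeps (the low 32 bits of the product) and what it discards. The helper names are hypothetical. The two flag helpers mirror only N and Z, because those are the only flags `muls` updates.

```c
#include <stdint.h>

/* The 32-bit result that "muls rd, rm, rs" writes: the low word of the
 * full product. The low word is the same for signed and unsigned operands. */
static uint32_t mul_low32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Bits 63..32 of the full product, which muls discards. */
static uint32_t mul_discarded_high32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* The only flags muls updates, both derived from the 32-bit result. */
static int muls_n_flag(uint32_t low) { return (int)(low >> 31); }
static int muls_z_flag(uint32_t low) { return low == 0; }
```

For 100000 × 100000, the low word is 0x540be400 and the discarded high word is 2 (binary 10, so two bits are lost). N and Z are both 0. For 65536 × 65536, the low word is 0, so Z is set even though the true product is 2^32.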

The ARM architecture does, however, provide a facility for multiplying two 32-bit quantities to produce a 64-bit result. This operation cannot overflow:

ldr     r0, =100000
smull   r2, r3, r0, r0      @   Signed: [r3:r2] = r0 * r0
umull   r2, r3, r0, r0      @ Unsigned: [r3:r2] = r0 * r0

The smull and umull instructions can also set the flags if you add the s suffix. However, they still update only N and Z, and those flags describe the full 64-bit result. As before, this isn't terribly useful for detecting 32-bit overflow.
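
As a sketch of the register pair these instructions produce, the following C models split a 64-bit product into RdLo and RdHi. The names `umull`, `smull`, `reg_pair`, and the flag helpers are hypothetical stand-ins for the instructions, not a real library API. The N and Z helpers follow the s-suffixed forms, which take N from bit 63 and Z from the whole 64-bit result.

```c
#include <stdint.h>

/* Register pair written by umull/smull: RdLo receives bits 31..0 and
 * RdHi receives bits 63..32 of the 64-bit product. */
typedef struct {
    uint32_t lo;   /* RdLo */
    uint32_t hi;   /* RdHi */
} reg_pair;

static reg_pair umull(uint32_t rm, uint32_t rs)
{
    uint64_t p = (uint64_t)rm * rs;   /* at most 64 bits: cannot overflow */
    reg_pair r = { (uint32_t)p, (uint32_t)(p >> 32) };
    return r;
}

static reg_pair smull(int32_t rm, int32_t rs)
{
    int64_t p = (int64_t)rm * rs;     /* fits in int64_t: cannot overflow */
    uint64_t bits = (uint64_t)p;      /* two's-complement bit pattern */
    reg_pair r = { (uint32_t)bits, (uint32_t)(bits >> 32) };
    return r;
}

/* umulls/smulls set N from bit 63 and Z from the whole 64-bit result. */
static int mull_n_flag(reg_pair r) { return (int)(r.hi >> 31); }
static int mull_z_flag(reg_pair r) { return r.lo == 0 && r.hi == 0; }
```

For example, umull(100000, 100000) yields hi = 2 and lo = 0x540be400, and both flag helpers return 0. Those flag values are indistinguishable from the ones a small product would give.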

Detecting Unsigned Overflow

Detecting unsigned overflow using umull is actually fairly trivial. The high word of the result is non-zero only if the multiplication overflowed. We can therefore use umull to get a 64-bit result and test the top word in a scratch register. Here, we use ip as the scratch register¹.

ldr     r0, =100000
umull   r0, ip, r0, r0      @   [ip:r0] = r0 * r0
cmp     ip, #0
@ ----
bne     <somewhere>   @ Branch if we overflowed.
beq     <elsewhere>   @ Branch if we did not overflow.

The snippet above will multiply 100000 by itself, store the least-significant 32 bits of the result back into r0, and clear the Z flag if the calculation overflowed into ip. Subsequent code can use the ne and eq condition codes to act on this (as shown in the example).
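
The same check can be modelled in C as a minimal sketch, assuming 64-bit integer support. The name `umul32_overflows` is hypothetical. Its return value corresponds to the ne condition after `cmp ip, #0`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the umull/cmp sequence above, generalised to two operands:
 *     umull   r0, ip, r0, r0
 *     cmp     ip, #0
 * *lo receives the low word (r0). The return value is true exactly when
 * the ne condition would hold, i.e. the product needs more than 32 bits. */
static bool umul32_overflows(uint32_t a, uint32_t b, uint32_t *lo)
{
    uint64_t product = (uint64_t)a * b;
    uint32_t hi = (uint32_t)(product >> 32);   /* ip */
    *lo = (uint32_t)product;                   /* r0 */
    return hi != 0;                            /* Z clear => overflow */
}
```

Note that 65535 × 65537 = 0xffffffff still fits. By contrast, 65536 × 65536 overflows even though its low word is zero.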

Detecting Signed Overflow

Detecting unsigned overflow is fairly trivial because the top word can simply be compared with zero. With a signed quantity, however, the top word may be non-zero even if no overflow occurred. For example, -1 × 1 produces a top word of 0xffffffff. Specifically, if no overflow occurred, the top 33 bits of the result (bits 63 to 31) are either all zeroes or all ones.

There is no single machine instruction that checks whether 33 consecutive bits all hold the same value. However, we can perform this check by applying an arithmetic shift (ASR) to the flexible second operand (Operand2) of cmp:

ldr     r0, =100000
smull   r0, ip, r0, r0
cmp     ip, r0, ASR #31

The ASR operation shifts r0 right by 31 bits. Because it is an arithmetic shift, it replicates the sign bit as it goes. If bit 31 is 0, that bit ends up in bit 0 and the rest of the register is filled with zeroes. If bit 31 is 1, it also ends up in bit 0, but the value is treated as negative, so the rest of the register is filled with ones. The cmp instruction therefore compares ip with either 0x00000000 or 0xffffffff. The two compare equal only if the top 33 bits of the 64-bit result are identical. As with the unsigned version, the Z flag is cleared if the calculation overflowed and set if it did not.
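
Here is a C model of the smull/ASR comparison, offered as a minimal sketch. The name `smul32_overflows` is hypothetical. The ASR #31 step is written as `0u - (lo >> 31)` because right-shifting a negative signed value is implementation-defined in C. This unsigned form always yields 0x00000000 or 0xffffffff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the smull/cmp sequence above, generalised to two operands:
 *     smull   r0, ip, r0, r0
 *     cmp     ip, r0, ASR #31
 * Returns true when the ne condition would hold, i.e. the signed
 * product does not fit in 32 bits. */
static bool smul32_overflows(int32_t a, int32_t b, uint32_t *lo_out)
{
    uint64_t bits = (uint64_t)((int64_t)a * b);  /* 64-bit two's-complement product */
    uint32_t lo = (uint32_t)bits;                /* r0 */
    uint32_t hi = (uint32_t)(bits >> 32);        /* ip */

    /* r0, ASR #31: copy bit 31 of the low word into all 32 bits. */
    uint32_t sign_fill = 0u - (lo >> 31);

    *lo_out = lo;
    return hi != sign_fill;   /* equal only if bits 63..31 all match */
}
```

The case 65536 × 32768 = 0x80000000 shows why comparing the high word with zero is not enough for signed values. The high word is 0, but bit 31 of the low word is 1, so the top 33 bits disagree and the product does not fit. By contrast, -65536 × 32768 = INT32_MIN fits, with a high word of 0xffffffff.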

This technique is used by at least Mozilla's TraceMonkey and Google's V8 JavaScript JIT compilers. Both try to perform integer calculations but fall back to slower calculations (perhaps using a double-precision floating-point representation) if a signed 32-bit integer cannot represent the required values.
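
The fallback described above might look like the following C sketch. It is not taken from TraceMonkey or V8. The `value` type and `jit_multiply` are hypothetical, and the range comparison is equivalent to the `cmp ip, r0, ASR #31` test. A real JavaScript engine must also handle cases this sketch ignores, such as a zero product with a negative operand, which is -0 in JavaScript.

```c
#include <stdint.h>

/* Hypothetical tagged value: small integers stay as int32_t, and a
 * double is used only when a result no longer fits. */
typedef struct {
    enum { VALUE_INT32, VALUE_DOUBLE } kind;
    union {
        int32_t i;
        double d;
    } as;
} value;

static value jit_multiply(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * b;   /* what smull computes */
    value v;

    /* Same test as cmp ip, r0, ASR #31: does the product fit in int32_t? */
    if (wide >= INT32_MIN && wide <= INT32_MAX) {
        v.kind = VALUE_INT32;        /* fast path */
        v.as.i = (int32_t)wide;
    } else {
        v.kind = VALUE_DOUBLE;       /* slow path */
        v.as.d = (double)a * (double)b;
    }
    return v;
}
```

With this sketch, jit_multiply(100000, 100000) takes the slow path and holds 1e10 as a double. In contrast, jit_multiply(-65536, 32768) stays on the integer path with the value INT32_MIN.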

Performance Considerations

Both the smull and the umull solutions need an additional scratch register. This often isn't a problem. However, if no register is free, another value must be saved to the stack, which adds complexity and a performance penalty.

The smull and umull instructions are more complicated than mul and are therefore likely to take more cycles to execute. In contrast, detecting overflow from an instruction such as add or sub costs almost nothing and is usually free in practice. Performance may be improved by issuing the smull or umull early, so that the cmp doesn't stall waiting for its result.

¹Note that ip is a synonym for r12, the intra-procedure-call scratch register. The ARM Procedure Call Standard allows ip to be corrupted by function calls, and it is commonly used as a general-purpose scratch register.

ahogen over 5 years ago

Thanks for this.

While this is super useful for understanding what's going on, it might be worth noting that if you're using GCC (v7+) and need overflow checks in C code, it's fairly simple to use "__builtin_*_overflow( )". Refer to https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html