Detecting Overflow from MUL

September 11, 2013


Detecting Overflow from Arithmetic Operations

In a previous blog post, I discussed how an arithmetic operation can set condition flags based on its result. Consider the following code:

adds    r0, r0, r1
bvs     <some_address>

The above code adds r1 to r0, then branches somewhere if a (signed) overflow was detected. This technique is used frequently in JIT compilers for dynamic languages. In such contexts, the type and size of a variable are often not known when the code is compiled, so the JIT compiler tests for overflow and falls back to a slower implementation when a signed 32-bit integer cannot represent the result of the required operation. This is the approach taken by Mozilla's TraceMonkey JavaScript engine, for example.
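The fast-path/slow-path pattern described above can be modelled in portable C. This is only a sketch of the idea, not what any particular JIT emits: the helper name `add_overflows_s32` is hypothetical, and the overflow test is done by widening to 64 bits rather than by reading the V flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a + b does not fit in int32_t, which
 * is the condition `bvs` tests after `adds`. On success the sum is stored in
 * *out; on overflow *out is left untouched so the caller can take a slow path. */
static bool add_overflows_s32(int32_t a, int32_t b, int32_t *out)
{
    int64_t wide = (int64_t)a + (int64_t)b;   /* exact sum, cannot overflow */

    if (wide < INT32_MIN || wide > INT32_MAX) {
        return true;                          /* caller falls back */
    }
    *out = (int32_t)wide;
    return false;
}
```

A caller would try this first and switch to a wider representation (a double, say) only when it returns true.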

Setting the Flags with `mul`

Those familiar with ARM's mul instruction may realize that although it can take the s suffix to update the flags, it only updates the N and Z flags. Consider the following:

ldr     r0, =100000
muls    r0, r0, r0

It is, of course, not possible to represent 100000 × 100000 in a 32-bit signed integer, but muls does not report the overflow. A likely explanation for this limitation is that the carry and overflow flags from addition and subtraction can be used to implement 64-bit arithmetic, but the same is not true for mul, because a multiplication may overflow by more than one bit. Indeed, 100000 × 100000 is 10000000000, or 0x2540be400, which cannot be reconstructed as a 64-bit quantity from the 32-bit result and the APSR flags alone.
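To make the lost information concrete, here is a small C model of what muls leaves behind. The struct and function names are hypothetical, and only the N and Z flags are modelled, as described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state left by `muls`: the low 32 bits of the
 * product, plus N and Z derived from those bits only. */
struct muls_result {
    uint32_t value;  /* low 32 bits of a * b */
    bool n;          /* bit 31 of value */
    bool z;          /* value == 0 */
};

static struct muls_result model_muls(uint32_t a, uint32_t b)
{
    struct muls_result r;

    r.value = (uint32_t)((uint64_t)a * (uint64_t)b);  /* keep low word only */
    r.n = (r.value >> 31) != 0;
    r.z = (r.value == 0);
    return r;
}
```

For 100000 × 100000 this model yields the value 0x540be400 with N and Z both clear, which are exactly the flags that a harmless product such as 3 × 4 produces. The missing upper bits (0x2) are nowhere to be found.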

The ARM architecture does, however, provide a facility for multiplying two 32-bit quantities to produce a 64-bit result. This operation cannot overflow:

ldr     r0, =100000
smull   r2, r3, r0, r0      @   Signed: [r3:r2] = r0 * r0
umull   r2, r3, r0, r0      @ Unsigned: [r3:r2] = r0 * r0

The smull and umull instructions can also set the flags if you add the s suffix, but as before, this isn't terribly useful for detecting 32-bit overflow.
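The difference between the two long multiplies shows up in the high word. The following C sketch models each instruction's [hi:lo] register pair; `reg_pair`, `model_umull` and `model_smull` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a [hi:lo] register pair such as [r3:r2]. */
struct reg_pair {
    uint32_t lo;
    uint32_t hi;
};

/* Unsigned: operands are zero-extended to 64 bits before multiplying. */
static struct reg_pair model_umull(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * (uint64_t)b;
    struct reg_pair r = { (uint32_t)p, (uint32_t)(p >> 32) };
    return r;
}

/* Signed: operands are sign-extended, so the product fits in int64_t. */
static struct reg_pair model_smull(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);  /* bit pattern */
    struct reg_pair r = { (uint32_t)p, (uint32_t)(p >> 32) };
    return r;
}
```

Both models give hi = 0x2 and lo = 0x540be400 for 100000 × 100000. They differ once the sign matters: the bit pattern 0xffffffff times 2 gives hi = 0x1 when treated as unsigned, but hi = 0xffffffff when treated as -1 × 2.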

Detecting Unsigned Overflow

Detecting unsigned overflow using umull is straightforward. The high word of the result is non-zero only if the multiplication overflows, so we can use the umull instruction to get a 64-bit result and test the top word in a scratch register. Here, we use ip as a scratch register ¹.

ldr     r0, =100000
umull   r0, ip, r0, r0      @   [ip:r0] = r0 * r0
cmp     ip, #0
@ ----
bne     <somewhere>   @ Branch if we overflowed.
beq     <elsewhere>   @ Branch if we did not overflow.

The snippet above will multiply 100000 by itself, store the least-significant 32 bits of the result back into r0, and clear the Z flag if the calculation overflowed into ip. Subsequent code can use the ne and eq condition codes to act on this (as shown in the example).
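The same check can be written in portable C as a sketch. The helper name `mul_overflows_u32` is hypothetical; the comparison of the high word against zero plays the role of `cmp ip, #0`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores the low 32 bits of a * b in *lo (as r0 would
 * hold them) and returns true if the product overflowed 32 bits. */
static bool mul_overflows_u32(uint32_t a, uint32_t b, uint32_t *lo)
{
    uint64_t p = (uint64_t)a * (uint64_t)b;   /* models umull */
    uint32_t hi = (uint32_t)(p >> 32);        /* models ip */

    *lo = (uint32_t)p;
    return hi != 0;                           /* models cmp ip, #0 / bne */
}
```

For example, 65535 × 65537 is exactly 0xffffffff and does not overflow, whereas 65536 × 65536 does.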

Detecting Signed Overflow

Detecting unsigned overflow is fairly trivial because the top word can easily be checked for equality to zero. With a signed quantity, however, the top word may contain a non-zero value even if no overflow occurred. Specifically, the top 33 bits of the result will contain either all zeroes or all ones if no overflow occurred.

There is no single machine instruction to check that 33 consecutive bits all contain the same value. However, we can use an arithmetic shift on Operand 2 to perform this check:

ldr     r0, =100000
smull   r0, ip, r0, r0
cmp     ip, r0, ASR #31

The ASR operation shifts r0 31 bits to the right, but it does an arithmetic shift, meaning that it extends the sign bit as it goes. With a 0 in bit 31, it merely shifts it all the way to bit 0, and fills the rest with zeroes. With a 1 in bit 31, it shifts it all the way to bit 0, but considers the value negative, so it fills the rest of the bits with ones. The cmp instruction therefore compares ip with either 0x00000000 or 0xffffffff, and the two compare equal only if the top 33 bits are identical. As with the unsigned version, the Z flag is cleared if the calculation overflowed, or set if it did not.
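Here is a portable C sketch of the same test. The helper name `mul_overflows_s32` is hypothetical. Because right-shifting a negative signed value is implementation-defined in C, the sketch builds the `ASR #31` result explicitly from bit 31 instead of relying on `>>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores the low 32 bits of a * b in *lo and returns
 * true if the exact product does not fit in a signed 32-bit integer. */
static bool mul_overflows_s32(int32_t a, int32_t b, uint32_t *lo)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);  /* models smull */
    uint32_t low = (uint32_t)p;                        /* models r0 */
    uint32_t high = (uint32_t)(p >> 32);               /* models ip */

    /* Equivalent of `r0, ASR #31`: all ones if bit 31 is set, else zero. */
    uint32_t sign_fill = (low >> 31) ? 0xFFFFFFFFu : 0u;

    *lo = low;
    return high != sign_fill;                          /* models cmp / bne */
}
```

The boundary cases behave as expected: 46340 × 46340 fits, 46341 × 46341 does not, and INT32_MIN × -1 is reported as an overflow because the high word is zero while bit 31 of the low word is set.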

This technique is used by at least Mozilla's TraceMonkey and Google's V8 JavaScript JIT compilers, as both try to perform integer calculations but fall back to slower calculations (perhaps using a double-precision floating-point representation) if a signed integer cannot represent the required values.

Performance Considerations

Both the smull and the umull solutions use an additional scratch register. This often isn't a problem, but it might add complexity (and a performance penalty) if another value needs to be preserved on the stack.

The smull and umull instructions are more complicated than the mul instruction, and are therefore likely to take more cycles to execute. In contrast, the processing overhead of detecting overflow from an instruction such as add or sub is trivial, and usually free in practice. Performance may be improved if the smull or umull is issued early, so that the cmp doesn't stall waiting for the result.

¹Note that ip is a synonym for r12, the intra-procedure-call scratch register. The ARM Procedure Call Standard allows ip to be corrupted by function calls, and it is commonly used as a general-purpose scratch register.
