Arm Community
Architectures and Processors blog

Detecting Overflow from MUL

Jacob Bramley
September 11, 2013

Detecting Overflow from Arithmetic Operations

In a previous blog post, I explained that arithmetic instructions can set the condition flags based on their result. Consider the following code:

adds    r0, r0, r1
bvs     <some_address>

The above code adds r1 to r0, then branches somewhere if a (signed) overflow was detected. This technique is used frequently in JIT compilers for dynamic languages. In such contexts, the type and size of a variable are often not known when the code is compiled, so the JIT compiler tests for overflow and falls back to a slower implementation whenever a signed 32-bit integer cannot represent the result of the required operation. This is the approach taken by Mozilla's TraceMonkey JavaScript engine, for example.

Setting the Flags with mul

Those familiar with ARM's mul instruction may realize that although it can take the s suffix to update the flags, it only updates the n and z flags. Consider the following:

ldr     r0, =100000
muls    r0, r0, r0

It is, of course, not possible to represent 100000 × 100000 in a 32-bit signed integer, but muls does not report the overflow. A likely explanation for this limitation is that addition and subtraction can overflow by at most one bit, so their overflow and carry flags are enough to implement 64-bit arithmetic. The same is not true for mul, because a multiplication may overflow by many bits. Indeed, 100000 × 100000 should result in 10000000000, or 0x2540be400, which cannot be reconstructed as a 64-bit quantity from the 32-bit result and the APSR flags alone.

The ARM architecture does, however, provide a facility for multiplying two 32-bit quantities to produce a 64-bit result. This operation cannot overflow:

ldr     r0, =100000
smull   r2, r3, r0, r0      @   Signed: [r3:r2] = r0 * r0
umull   r2, r3, r0, r0      @ Unsigned: [r3:r2] = r0 * r0

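The smull and umull instructions can also set the flags if you add the s suffix, but as before, this isn't terribly useful for detecting 32-bit overflow: the s variants update only the N and Z flags, and those describe the full 64-bit result rather than whether it fits in 32 bits.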

Detecting Unsigned Overflow

Detecting unsigned overflow using umull is actually fairly trivial. The high word of the 64-bit result is non-zero if and only if the multiplication overflowed, so we can use umull to get a 64-bit result and test the top word in a scratch register. Here, we use ip as a scratch register.[1]

ldr     r0, =100000
umull   r0, ip, r0, r0      @   [ip:r0] = r0 * r0
cmp     ip, #0
@ ----
bne     <somewhere>   @ Branch if we overflowed.
beq     <elsewhere>   @ Branch if we did not overflow.

The snippet above will multiply 100000 by itself, store the least-significant 32 bits of the result back into r0, and clear the Z flag if the calculation overflowed into ip. Subsequent code can use the ne and eq condition codes to act on this (as shown in the example).

Detecting Signed Overflow

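Detecting unsigned overflow is fairly trivial because the top word can simply be compared with zero. With a signed quantity, however, the top word may contain a non-zero value even if no overflow occurred; any small negative product, for example, has 0xffffffff in its top word. Specifically, no overflow occurred if and only if the top 33 bits of the 64-bit result are either all zeroes or all ones. The 33rd bit is bit 31 of the low word: it is the sign bit of the truncated 32-bit result, so it must agree with every bit above it.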

There is no single machine instruction that checks whether 33 consecutive bits all contain the same value. However, we can perform this check by applying an arithmetic shift to cmp's flexible second operand (Operand2):

ldr     r0, =100000
smull   r0, ip, r0, r0      @ Signed: [ip:r0] = r0 * r0
cmp     ip, r0, ASR #31     @ Z set if ip equals bit 31 of r0 copied into all 32 bits

The ASR #31 operation shifts r0 right by 31 bits. Because it is an arithmetic shift, it copies the sign bit (bit 31) into every vacated position. With a 0 in bit 31, the result is 0x00000000; with a 1 in bit 31, the result is 0xffffffff. The cmp instruction therefore compares ip with either 0x00000000 or 0xffffffff, and compares equal exactly when the top 33 bits of the 64-bit result are identical. As with the unsigned version, the Z flag is cleared if the calculation overflowed, or set if it did not.

This technique is used by at least Mozilla's TraceMonkey and Google's V8 JavaScript JIT compilers, as both try to perform integer calculations but fall back to slower calculations (perhaps using a double-precision floating-point representation) if a signed 32-bit integer cannot represent the required values.

Performance Considerations

  • Both the smull and the umull solutions use an additional scratch register. This often isn't a problem, but it might add complexity (and a performance penalty) if the extra register forces another value to be saved on the stack.
  • The smull and umull instructions do more work than the mul instruction, and are therefore likely to take more cycles to execute. In contrast, detecting overflow from an instruction such as adds or subs is trivial, and usually free in practice, because the instruction sets the V flag itself. Performance may be improved if the smull or umull is issued early, so that the cmp doesn't stall waiting for its result.

[1] Note that ip is a synonym for r12, the intra-procedure-call scratch register. The ARM Procedure Call Standard allows ip to be corrupted by function calls, and it is commonly used as a general-purpose scratch register.

ahogen, over 5 years ago

    Thanks for this.

    While this is super useful to understand what's going on, it might be worth noting that if you're using GCC (v7+) and need overflow checks in C code, it's fairly simple to use "__builtin_*_overflow( )". Refer to https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
