Arm Community
Architectures and Processors blog
Divide and Conquer

Chris Shore
January 10, 2014
4 minute read time.

Division on ARM Cores

“At the end of the day, we must go forward with hope and not backward by fear and division.” – Jesse Jackson.

It often surprises me how many people believe that “ARM doesn’t do division” or “ARM cores don’t have divide hardware”. Of course, that used to be the case – up until the launch of the Cortex brand in 2004, ARM cores simply didn’t have hardware support for division operations.

Which cores support hardware divide?

Well, here’s the official answer...

Support for the SDIV/UDIV instructions is mandatory in ARMv7-M and for the Thumb instruction set in ARMv7-R. It is optional for the ARM instruction set in ARMv7-R. It is optional in ARMv7-A and, if supported, may be in the Thumb instruction set only or in both Thumb and ARM. In ARMv7-A with the Virtualization Extensions, it is mandatory in Thumb and ARM.

Confused? Well, if you want to know whether your particular core supports these instructions, there is a handy register to check. On ARMv7-A and ARMv7-R cores, Instruction Set Attribute Register 0 (ID_ISAR0) contains a field called “Divide_instrs” (bits [27:24]) which takes the following values:

     0000 – Not implemented
     0001 – SDIV/UDIV in Thumb instruction set
     0010 – SDIV/UDIV in both ARM and Thumb instruction sets

You can find more information in the Architecture Reference Manuals.
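As a sketch of how that field might be decoded, the C helpers below extract Divide_instrs from an ID_ISAR0 value that has already been read. The function and enum names are hypothetical. On ARMv7-A/R the register is read from privileged code with `MRC p15, 0, <Rt>, c0, c2, 0`; that step is left out here so the decoding logic can run on any host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values of the ID_ISAR0.Divide_instrs field, bits [27:24]. */
enum divide_instrs {
    DIVIDE_NONE = 0x0,          /* 0000: not implemented */
    DIVIDE_THUMB_ONLY = 0x1,    /* 0001: SDIV/UDIV in Thumb only */
    DIVIDE_ARM_AND_THUMB = 0x2  /* 0010: SDIV/UDIV in both ARM and Thumb */
};

/* Extract the Divide_instrs field from a raw ID_ISAR0 value. */
static enum divide_instrs id_isar0_divide_instrs(uint32_t id_isar0)
{
    return (enum divide_instrs)((id_isar0 >> 24) & 0xFu);
}

/* True if SDIV/UDIV may be used in the requested instruction set.
 * Reserved field values are treated as "not supported". */
static bool has_hw_divide(uint32_t id_isar0, bool arm_state)
{
    enum divide_instrs d = id_isar0_divide_instrs(id_isar0);
    if (arm_state)
        return d == DIVIDE_ARM_AND_THUMB;
    return d == DIVIDE_THUMB_ONLY || d == DIVIDE_ARM_AND_THUMB;
}
```

Treating unknown values as "no divide" is a conservative choice: code that falls back to a library call is slower but never executes an unsupported instruction.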

The following table shows the status of hardware divide support for all current ARM cores.

Processor     Thumb DIV   ARM DIV
Cortex-A57    Y           Y
Cortex-A53    Y           Y
Cortex-A15    Y           Y
Cortex-A9     N           N
Cortex-A8     N           N
Cortex-A7     Y           Y
Cortex-A5     N           N
Cortex-R7     Y           Y
Cortex-R5     Y           Y (r1 only)
Cortex-R4     Y           N
Cortex-M4     Y           N/A
Cortex-M3     Y           N/A
Cortex-M1     N           N/A
Cortex-M0     N           N/A
Cortex-M0+    N           N/A

(N/A: Cortex-M cores execute only Thumb instructions.)

How do those instructions work?

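The syntax of the instructions is simple enough:

SDIV Rd, Rn, Rm ; Rd = Rn / Rm (signed)
UDIV Rd, Rn, Rm ; Rd = Rn / Rm (unsigned)

Both round the quotient towards zero, and neither returns a remainder.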
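C’s `/` on 32-bit integers also truncates towards zero, so it models SDIV and UDIV directly. Because the instructions return only a quotient, a remainder is typically obtained with a follow-up multiply-subtract (MLS on cores that have it). A minimal sketch of that decomposition; the function name is hypothetical, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

/* Quotient and remainder from a single signed division.
 * Like SDIV, C's '/' truncates toward zero, so the remainder
 * takes the sign of the dividend.
 * Assumes d != 0 and not (n == INT32_MIN && d == -1),
 * both of which are undefined behaviour in C. */
static int32_t sdiv_with_rem(int32_t n, int32_t d, int32_t *rem)
{
    int32_t q = n / d;   /* what SDIV computes */
    *rem = n - q * d;    /* what a multiply-subtract (MLS) computes */
    return q;
}
```

For example, -7 / 2 gives a quotient of -3 and a remainder of -1, not -4 and 1.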

The only real wrinkle you need to be aware of is the handling of division by zero. Again, the behavior varies by architecture.

ARMv7-A - divide by zero always returns a zero result.

ARMv7-R - the SCTLR.DZ bit controls whether you get a zero result or an Undefined Instruction exception when you attempt to divide by zero (the default is to return zero).

ARMv7-M - the CCR.DIV_0_TRP bit controls whether an exception is generated. If the trap is enabled, a divide by zero causes a UsageFault and the UFSR.DIVBYZERO bit indicates the reason for the fault; otherwise the result is zero.
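In C, dividing by zero (or dividing INT32_MIN by -1) is undefined behaviour, so portable code that wants the “return zero” result has to model it explicitly. The sketch below assumes the zero-result configuration (no trap enabled); `sdiv_arm` and `udiv_arm` are hypothetical names. The INT32_MIN / -1 case follows the architectural definition of SDIV, which keeps only the low 32 bits of the true quotient.

```c
#include <stdint.h>

/* Software model of SDIV with divide-by-zero trapping disabled:
 *  - a zero divisor gives a zero result;
 *  - INT32_MIN / -1 gives INT32_MIN (2^31 truncated to 32 bits).
 * Both cases are undefined for C's '/', so they are handled first. */
static int32_t sdiv_arm(int32_t n, int32_t m)
{
    if (m == 0)
        return 0;
    if (n == INT32_MIN && m == -1)
        return INT32_MIN;
    return n / m;
}

/* UDIV model: only the zero-divisor case needs special handling. */
static uint32_t udiv_arm(uint32_t n, uint32_t m)
{
    return m == 0 ? 0u : n / m;
}
```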

Note that none of the divide instructions, in any of the architectures, affect the condition code flags. All can be made conditional: in ARM state via the condition code field, and in Thumb state via the IT instruction.

What to do when your core doesn’t have divide hardware?

Many ARM cores do support hardware divide these days, but some still don’t, so it is worth considering what happens on those.

In the general case, the compiler calls a run-time library routine for division, and you should regard this as “slow”. The ARM tools provide two versions of this routine. One of them is labelled “real-time” and is guaranteed to return in fewer than 45 cycles every time. It is faster for large quotients but slower for typical quotients, so it should be used in applications that need more deterministic division performance.
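To see why a software divide is slow, here is a minimal restoring (shift-and-subtract) division in C. It is a generic textbook algorithm, not the ARM library’s actual implementation, and `udiv32_soft` is a hypothetical name. It produces one quotient bit per iteration and always runs all 32 iterations, which illustrates the trade-off described above: a fixed-length loop has predictable timing regardless of the operands.

```c
#include <stdint.h>

/* Unsigned 32-bit restoring division: one quotient bit per iteration.
 * Returns n / d and stores n % d in *rem. The caller must ensure d != 0. */
static uint32_t udiv32_soft(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint64_t r = 0;  /* partial remainder; 64 bits so the shift cannot overflow */
    uint32_t q = 0;

    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next dividend bit */
        if (r >= d) {                    /* divisor fits: subtract, set quotient bit */
            r -= d;
            q |= 1u << i;
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```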

The compiler will, however, do its best to avoid the library call. When dividing by a compile-time constant, it uses shifts where possible, for instance to divide by a power of two. For other constants, it uses an inline long-multiplication sequence to calculate the integer result. For example, here is the sequence it uses for unsigned division by ten (dividend in r1, quotient returned in r1):

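LDR     r0, =0xCCCCCCCD    ; r0 = ceil(2^35 / 10), i.e. 1/10 scaled by 2^35
UMULL   r2, r1, r0, r1     ; r1:r2 = r0 * r1 (64-bit product), high word in r1
MOV     r1, r1, LSR #3     ; r1 = product >> 35 = dividend / 10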

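The constant is a fixed-point binary representation of 1/10, scaled by 2^35 and rounded up. Keeping only the high word of the 64-bit product discards 32 fractional bits, and the final shift discards the remaining three, leaving the integer quotient.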
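The same reciprocal trick can be written in portable C, which makes it easy to check against ordinary division. A sketch; `udiv10` is a hypothetical name, and 0xCCCCCCCD is the constant from the sequence above. Multiplying by ceil(2^35 / 10) and shifting the 64-bit product right by 35 (32 bits for taking the high word, plus the final LSR #3) gives the exact quotient for every 32-bit unsigned dividend.

```c
#include <stdint.h>

/* Unsigned division by 10 without a divide instruction.
 * 0xCCCCCCCD == ceil(2^35 / 10), a fixed-point approximation of 1/10. */
static uint32_t udiv10(uint32_t n)
{
    uint64_t product = (uint64_t)n * 0xCCCCCCCDu;  /* like UMULL: 32x32 -> 64 */
    return (uint32_t)(product >> 35);              /* high word (>> 32), then LSR #3 */
}
```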

And don’t forget that the modulo (%) operator needs a division to work out the remainder, so it is really a “divide in disguise”. A common requirement is to increment a counter and wrap it at a limit value, and programmers often use modulo to do this. As long as the counter always starts within range (0 to 59 here), a simple test-and-reset construct gives the same result and is much more efficient, as the code below shows.

C code                          ARM assembly code

count = (count + 1) % 60;       ADD      r0, r0, #1
                                MOV      r1, #0x3c
                                BL       __aeabi_uidivmod
                                MOV      r0, r1

if (++count >= 60)              ADD      r0, r0, #1
    count = 0;                  CMP      r0, #0x3c
                                MOVCS    r0, #0
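A quick way to convince yourself that the two versions agree, and to see where they stop agreeing, is to compare them in C. A sketch; the function names are hypothetical and `count` is assumed to be unsigned. The results match for every starting value from 0 to 59, but for a value outside that range, such as 124, the modulo version returns 5 while the test-and-reset version returns 0, so the replacement is only valid when the counter is always kept in range.

```c
#include <stdint.h>

/* Wrap using the modulo operator: needs a division
 * (a library call on cores without divide hardware). */
static uint32_t wrap_modulo(uint32_t count)
{
    return (count + 1u) % 60u;
}

/* Test-and-reset: one add, one compare and one conditional move.
 * Equivalent to wrap_modulo() only while count stays in 0..59. */
static uint32_t wrap_test_reset(uint32_t count)
{
    if (++count >= 60u)
        count = 0u;
    return count;
}
```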

What about floating point?

So far, we have only discussed integer arithmetic. Many applications running on ARM platforms require floating point support.

Many ARM cores support floating-point hardware, across the range from the Cortex-M4 microcontroller to the Cortex-R and Cortex-A cores. In many cases the support is optional, so check the documentation to find out what your device supports.

Check carefully, though. The VFP architecture supports single- and double-precision floating point, including divide operations. The ARMv7 NEON architecture (the two are often implemented together and share a register bank) supports only single-precision floating point and has no divide instruction. Where the hardware lacks support for an operation, a run-time library is used instead.


  • Massimo Manca, over 8 years ago

    Peter:
    1. It is ALWAYS wrong because there is no precondition on the initial value of count. In the example, if count is 358, after one increment the result is 0, as if the value had been 300; but in the first case you should need just one more step (after the first pre-increment) for the boolean test to become true, and because the logic behind it is wrong you will need 60 more increments.

    And I would not say which programmers use modulo to limit a value, because at least in C and C++ no decent software engineer uses modulo for that. Maybe we have different ideas about what a decent, expert engineer is.

    2. How many engineers don’t know that modulo is the operator returning the remainder of a division? I hope very few. It seems to me that to compute a modulo (without a hardware divider) we need a division, and not only a division.
    I am not sure whether in English mathematics the operator is called modulo or remainder, but I think any high school student has the notion of modulo.

    3. The article would have been more concrete and helpful just pointing out the complexity of the division algorithm in ARM assembly (traditional ARM, Thumb and Thumb-2) and recalling something about modulo and its mathematical expression, which requires 1 division, 1 multiplication and 1 subtraction. There is nothing more to say.

    Anyway, back to my initial problem, which is how I found this article while googling: correcting the result of a division-by-zero operation without resetting the CPU.

  • Peter Harris, over 8 years ago
    “but in both cases the other code only returns 0 so it is totally wrong”

    As per the sentence above the table, this isn’t designed to be a general-purpose replacement for every use of modulo.

    "A common requirement is to increment a counter and wrap it a limit value. Programmers often use a modulo to do this. The code below shows that a simple test-and-reset construct is much more efficient."

    Did you actually try to run it? Note the pre-increment on "count" in the "if" test - the counter will increment on every iteration.

    if (++count >= 60)

        count = 0;

  • Massimo Manca, over 8 years ago

    The code using the modulo operator and the code using greater-or-equal are not equivalent. % returns the remainder of a division, so 125 % 60 = 5, and so does 305 % 60, but in both cases the other code only returns 0 so it is totally wrong.

    Mathematically:

        C = A % B is equivalent to C = A - B * (A / B)

    and that is the only general way to compute the modulo.

    Only when B is a power of 2 can you simplify the equivalence; in that case, and only in that case:

        C = A % B is equivalent to C = A & (B - 1)

  • Chris Shore, over 11 years ago

    Thanks for the comment, Pete. Good point, which extends into an area I didn’t want to get into on a first pass. It would be good to see an example of how to use those instructions to implement an iterative division operation.
