Cortex-M3: what determines the cycle count for a variable-cycle-count instruction?

I have looked at the cycle counts for the Cortex M3 instructions at http://infocenter.arm.com/help/topic/com.arm.doc.100165_0201_00_en/ric1414056333562.html. Some instructions are listed as taking a range of cycles to complete. I want to understand what conditions determine the actual cycle counts.

I am particularly interested in the SMULL/UMULL and SMLAL/UMLAL instructions, which take 3-5 and 4-7 cycles respectively. The linked reference states that these instructions terminate early depending on the size of the source values. What exactly does this mean?

I am also interested in the SDIV and UDIV instructions which take 2-12 cycles. Is there a way I can determine how many cycles the instruction will actually take?

  • I would guess that a long multiply can involve up to three 32x32 multiplies plus an addition and an overhead cycle, and that the variants with accumulate can involve another two additions. A cycle can probably be left out if an operand is 0 (and perhaps if it is -1, but I wouldn't bet on that).

    For division, the timings look as though the hardware can do up to three steps of a very simple single-bit shift algorithm per cycle, with an extra cycle to make the operands positive for a signed divide, and can quickly shift through leading zeroes in the numerator.
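To make that guess concrete, here is a small software model of a one-bit-per-step shift-and-subtract divider that skips the leading zeroes of the numerator. It is only a sketch of the general technique: the names `shift_div_u32` and `shift_div_result` are made up for illustration, the step count is not a cycle count, and nothing here is claimed to match the actual Cortex-M3 divider.

```c
#include <stdint.h>

/* Hypothetical result type: quotient, remainder, and the number of
 * one-bit steps the model needed. */
typedef struct {
    uint32_t quotient;
    uint32_t remainder;
    unsigned steps;
} shift_div_result;

/* Model of an unsigned shift-and-subtract divide with early termination.
 * Assumption: d == 0 simply returns all zeroes in this model. */
static shift_div_result shift_div_u32(uint32_t n, uint32_t d)
{
    shift_div_result r = {0u, 0u, 0u};
    if (d == 0u) {
        return r;
    }

    /* Early termination: skip the leading zeroes of the numerator. */
    int bit = 31;
    while (bit >= 0 && ((n >> bit) & 1u) == 0u) {
        bit--;
    }

    /* One quotient bit per step for the remaining numerator bits.
     * The remainder is always below 2^31 before the shift, so the
     * shift cannot overflow. */
    for (; bit >= 0; bit--) {
        r.remainder = (r.remainder << 1) | ((n >> bit) & 1u);
        r.quotient <<= 1;
        if (r.remainder >= d) {
            r.remainder -= d;
            r.quotient |= 1u;
        }
        r.steps++;
    }
    return r;
}
```

In this model the number of steps depends only on the position of the highest set bit in the numerator, so small numerators finish sooner. That is the kind of data-dependent timing the question is about, whatever the real hardware does internally.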

    I wouldn't bet on any of that. If you need the timings to be constant, you can expand the operation into individual instructions that each take a constant time. Dividing by a constant can be done with a multiply and a little messing around, which can actually be faster than a hardware divide in some cases.
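As an illustration of dividing by a constant, here is a minimal sketch for unsigned division by 10, written two ways. The function names `div10_mul` and `div10_shift` are made up for this example. The first uses a reciprocal multiply with a 64-bit product, which a compiler for the Cortex-M3 would probably implement with UMULL, so it may still have a variable cycle count. The second is the well-known shift-and-add `divu10` routine from Hacker's Delight and uses only 32-bit operations with no branches. Whether either is actually faster than UDIV, or constant-time, on a particular core would need to be measured.

```c
#include <stdint.h>

/* Reciprocal multiply: 0xCCCCCCCD is ceil(2^35 / 10), so the top bits
 * of the 64-bit product give n / 10 for every 32-bit n. */
static uint32_t div10_mul(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Shift-and-add variant (divu10 from Hacker's Delight). */
static uint32_t div10_shift(uint32_t n)
{
    /* Approximate n * 0.8 with a short series of shifts and adds. */
    uint32_t q = (n >> 1) + (n >> 2);
    q += q >> 4;
    q += q >> 8;
    q += q >> 16;
    q >>= 3;                      /* q is now n / 10 or one less */

    /* Correct the estimate using the remainder, without branching. */
    uint32_t r = n - q * 10u;
    return q + ((r + 6u) >> 4);
}
```

Both functions return the same result as `n / 10u` for any 32-bit unsigned `n`. Other constant divisors need a different magic constant and shift, or a different shift series.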
