I have looked at the cycle counts for the Cortex-M3 instructions at http://infocenter.arm.com/help/topic/com.arm.doc.100165_0201_00_en/ric1414056333562.html. Some instructions are listed as taking a range of cycles to complete. I want to understand what conditions determine the actual cycle count.
I am particularly interested in the SMULL/UMULL and SMLAL/UMLAL instructions, which take 3-5 and 4-7 cycles respectively. The linked reference states that these instructions terminate early depending on the size of the source values. What does this mean exactly?
I am also interested in the SDIV and UDIV instructions which take 2-12 cycles. Is there a way I can determine how many cycles the instruction will actually take?
How do you get three 32x32 multiplies? I'm not seeing that.
Is it possible to get the number of cycles for input sizes confirmed by ARM?
For division, I presumed it would be a fast-forwarded conditional subtraction and shift operation. I just wanted it confirmed.
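To make that presumption concrete, here is a minimal software sketch of a shift-and-subtract divider that skips the leading zero bits of the dividend before it starts. This is only a model of the idea described above, not ARM's confirmed implementation; the names `udiv_model` and `udiv_model_result`, and the choice to return zeros on division by zero, are my own assumptions. The `steps` field shows how the amount of work in such a scheme would depend on the operands.

```c
#include <stdint.h>

/* Hypothetical model of an early-terminating divider.
   It is NOT taken from ARM documentation. */
typedef struct {
    uint32_t quotient;
    uint32_t remainder;
    unsigned steps;      /* shift/subtract iterations actually performed */
} udiv_model_result;

udiv_model_result udiv_model(uint32_t n, uint32_t d)
{
    udiv_model_result r = {0u, 0u, 0u};
    if (d == 0u) {
        return r;        /* model choice: all zeros on divide by zero */
    }

    /* "Fast-forward": locate the highest set bit of the dividend so the
       leading zero bits cost no iterations. */
    int top = 31;
    while (top >= 0 && ((n >> top) & 1u) == 0u) {
        top--;
    }

    /* Restoring division over the remaining significant bits.
       A 64-bit remainder avoids overflow when d is above 2^31. */
    uint64_t rem = 0;
    for (int i = top; i >= 0; i--) {
        rem = (rem << 1) | ((n >> i) & 1u);   /* shift in the next bit */
        if (rem >= d) {                       /* conditional subtraction */
            rem -= d;
            r.quotient |= (1u << i);
        }
        r.steps++;
    }
    r.remainder = (uint32_t)rem;
    return r;
}
```

In this model, 100 / 7 needs 7 iterations while 0xFFFFFFFF / 1 needs all 32, which is the kind of input dependence I am asking about. Real hardware may process several bits per cycle or also take the divisor into account, so the step counts here should not be read as cycle counts.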
My motivation for these questions is that I want a better understanding of the conditions that would cause an operation to take much longer than expected due to inputs.
Using an extra register would involve an extra cycle which would go in somewhere, but I think the best thing to do is to write a program and see. I would guess the times depend on whether the top 16 bits are zero or not, but you could also try 15-bit and negative numbers: just time a loop and see what the difference is. Try some very small numbers too, both to get a base point and in case the cycle count starts going up at a smaller operand size.
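A rough sketch of such a timing program follows. It makes several assumptions: the part implements the DWT cycle counter (the registers at 0xE000EDFC, 0xE0001000 and 0xE0001004), the compiler defines `__ARM_ARCH_7M__` when targeting Cortex-M3 (GCC and Clang do), and the 32x32 to 64-bit multiply in `mul64` is compiled to UMULL, which you should confirm in the disassembly. All function and variable names are made up for illustration. On any other target the code falls back to `clock()` so that it still builds, but the numbers are then meaningless.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Operand patterns following the suggestions above. */
static const uint32_t operands[] = {
    0x00000001u,  /* very small, as a base point */
    0x000000FFu,  /* 8 significant bits */
    0x00007FFFu,  /* 15 significant bits */
    0x0000FFFFu,  /* 16 significant bits, top half zero */
    0x00010000u,  /* smallest value with a bit in the top half */
    0x7FFFFFFFu,  /* largest positive value when read as signed */
    0xFFFF8000u,  /* -32768 when read as signed */
    0xFFFFFFFFu   /* -1 when read as signed */
};
static const size_t operand_count = sizeof operands / sizeof operands[0];

/* Turn on the DWT cycle counter (no-op on non-Cortex-M builds). */
void enable_cycle_counter(void)
{
#if defined(__ARM_ARCH_7M__)
    *(volatile uint32_t *)0xE000EDFCu |= 1u << 24;  /* DEMCR.TRCENA */
    *(volatile uint32_t *)0xE0001000u |= 1u;        /* DWT_CTRL.CYCCNTENA */
#endif
}

uint32_t read_cycles(void)
{
#if defined(__ARM_ARCH_7M__)
    return *(volatile uint32_t *)0xE0001004u;       /* DWT_CYCCNT */
#else
    return (uint32_t)clock();                       /* host fallback only */
#endif
}

/* Operation under test; expected (not guaranteed) to become UMULL. */
uint64_t mul64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Time `reps` calls with fixed operands. Loop and call overhead is included,
   so only differences between operand patterns are meaningful. */
uint32_t time_loop(uint64_t (*op)(uint32_t, uint32_t),
                   uint32_t a, uint32_t b, unsigned reps)
{
    volatile uint64_t sink = 0;   /* keeps the calls from being optimized away */
    uint32_t start = read_cycles();
    for (unsigned i = 0; i < reps; i++) {
        sink = op(a, b);
    }
    uint32_t end = read_cycles();
    (void)sink;
    return end - start;           /* unsigned subtraction tolerates one wrap */
}

/* Time every operand pattern against one fixed second operand. */
void sweep(uint32_t fixed, uint32_t *cycles_out)
{
    enable_cycle_counter();
    for (size_t i = 0; i < operand_count; i++) {
        cycles_out[i] = time_loop(mul64, operands[i], fixed, 1000u);
    }
}
```

Call `sweep` with a few different fixed operands and compare each entry against the one for 0x00000001. Dividing the difference by the repetition count gives the extra cycles per operation for that pattern. Swapping `mul64` for a function that performs a division would cover SDIV/UDIV the same way.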