More instruction details are giving me a headache.
This time it's the MUL instruction.
What the heck does this mean:
Multiply multiplies two register values. The least significant 32 bits of the result are written to the destination register. These 32 bits do not depend on whether the source register values are considered to be signed values or unsigned values.
(The bolded part)
Well, they don't, and the result of an addition doesn't depend on that either. There is a related point, though: the multiply instructions don't set the C and V flags to anything useful, whereas after an add those flags can be used to test for overflow or carry.
Hello,
the MUL instruction is dangerous because the ARM architecture does not distinguish between signed and unsigned operands for MUL.
The explanation says that only the lower 32 bits of the product are stored in the destination register.
Therefore the result is the same for signed and unsigned multiplication.
In other words, the result is not valid if the product overflows the 32-bit range.
For example, 0x12345678 * 0x8 produces 0x91A2B3C0, which is a negative number when read as a signed value.
I think we should use SMULL or UMULL instead of MUL if we want the correct result.
Best regards,
Yasuhiko Koumoto.
I understand an operation is either unsigned (same as 'raw') or signed, and the result differs in bits depending on the way it's handled, but the explanation doesn't tell which way it works.
BTW, what 'add' are you referring to?
The result in those 32 bits is not different. This is a feature of two's complement arithmetic. Multiply is a version of lots of shifted adds, and add with two's complement arithmetic makes no distinction in the result bits between signed and unsigned numbers. See the Wikipedia article "Two's complement"; it has a section about multiplication. At the start of the section about addition it says "Adding two's-complement numbers requires no special processing if the operands have opposite signs: the sign of the result is determined automatically"
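To make that concrete, here is a small C sketch (not ARM code; the function names are invented for illustration) that computes the low 32 bits three ways: directly, from a full unsigned 64-bit product, and from a full signed 64-bit product. It assumes the usual two's complement conversion when a uint32_t is cast to int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* What MUL keeps: the product modulo 2^32. */
static uint32_t mul_low32(uint32_t a, uint32_t b)
{
    return a * b;   /* unsigned arithmetic in C wraps modulo 2^32 */
}

/* Low 32 bits of the full product, operands read as unsigned. */
static uint32_t low32_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * (uint64_t)b);
}

/* Low 32 bits of the full product, operands read as signed.
 * The uint32_t -> int32_t casts assume two's complement wraparound
 * (implementation-defined before C23, but universal in practice). */
static uint32_t low32_signed(uint32_t a, uint32_t b)
{
    int64_t wide = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)wide;   /* conversion to unsigned is modulo 2^32 */
}
```

For the operands 0x12345678 and 0xFFFFFFF8 (which is -8 when read as signed), all three functions return 0x6E5D4C40.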
So it's basically unsigned?
It's still a mystery to me, though, what this:
if x<N-1> == '1' then result = result - 2^N;
in the pseudocode of SInt(bits(N) x) is supposed to mean.
'2^N - result' would be 2's complement, but 'result - 2^N' should be negative of that.
I.e. result (or - -result)
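One way to read that line, sketched in C under the assumption that 'result' at that point already holds the unsigned value of the bit string (that is, UInt(x)) as an ordinary mathematical integer rather than as a bit pattern. The function names are invented here and N is fixed at 32.

```c
#include <assert.h>
#include <stdint.h>

/* UInt(x) for N = 32: the bit pattern read as a non-negative integer. */
static int64_t uint_of(uint32_t x)
{
    return (int64_t)x;
}

/* SInt(x) for N = 32, following the quoted pseudocode line literally.
 * A 64-bit integer stands in for the unbounded integer of the pseudocode. */
static int64_t sint_of(uint32_t x)
{
    int64_t result = uint_of(x);                /* start from the unsigned value */
    if (x & 0x80000000u)                        /* x<N-1> == '1' */
        result = result - ((int64_t)1 << 32);   /* result = result - 2^N */
    return result;
}
```

With this reading, 0xFFFFFFF8 has the unsigned value 4294967288, and subtracting 2^32 gives -8. So 'result - 2^N' does not negate anything; it shifts the upper half of the unsigned range down onto the negative numbers.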
I played with Yasuhiko Koumoto's example using a calculator, and I saw what you mean.
It's kind of funny though that only the lower part (the "common bits") is the same.
I mean as many bits as the shorter one has.
-0x8 = 0xFFFFFFFFFFFFFFF8
0x12345678 * 0x00000000FFFFFFF8 = 0x123456776E5D4C40
0x12345678 * 0xFFFFFFFFFFFFFFF8 = 0xFFFFFFFF6E5D4C40
0x12345678 * 0x000000000000FFF8 = 0x1233C4D54C40
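The same experiment can be repeated in C instead of on a calculator. This is only a sketch with made-up helper names; it uses 64-bit unsigned arithmetic, which wraps modulo 2^64, and compares the low k bits of two products.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low k bits of v (k from 1 to 64). */
static uint64_t low_bits(uint64_t v, unsigned k)
{
    return (k >= 64) ? v : (v & (((uint64_t)1 << k) - 1));
}

/* Nonzero if x and y agree in their low k bits. */
static int same_low_bits(uint64_t x, uint64_t y, unsigned k)
{
    return low_bits(x, k) == low_bits(y, k);
}
```

With p64 = 0x12345678 * 0xFFFFFFFFFFFFFFF8, p32 = 0x12345678 * 0xFFFFFFF8 and p16 = 0x12345678 * 0xFFF8, same_low_bits(p32, p64, 32) and same_low_bits(p16, p64, 16) both hold, while same_low_bits(p16, p64, 32) does not. The products agree in exactly as many low bits as the operands do.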
Oh darn. I forgot to mark this as a question again.
Anyway, thanks Yasuhiko Koumoto and daith. Now I got it.
Anyway, there is no way to check whether a MUL result overflowed or not.
It is also a mystery.
Yes it would be better if one could test the C and V flags for signed or unsigned overflow like for addition if they are setting the N and Z flags anyway. I guess they never thought the extra hardware would pay for itself but it is a difficult thing to do otherwise without doing a full multiply and a funny looking test.
I think MUL can be used if the numbers are known to be small enough, or with pre-detection of overflow and such.
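A rough C sketch of the "full multiply and a funny looking test" idea, assuming a 64-bit product is available (the kind of result SMULL or UMULL would give). The function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned: the 32-bit result is exact only if the high word is zero. */
static bool umul_overflows(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a * b;            /* UMULL-style product */
    return (uint32_t)(wide >> 32) != 0;
}

/* Signed: the 32-bit result is exact only if the high word equals the
 * sign extension of the low word (all ones or all zeros). */
static bool smul_overflows(int32_t a, int32_t b)
{
    int64_t  wide = (int64_t)a * b;             /* SMULL-style product */
    uint32_t lo   = (uint32_t)wide;
    uint32_t hi   = (uint32_t)((uint64_t)wide >> 32);
    uint32_t sign = (lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return hi != sign;
}
```

For the earlier example, 0x12345678 * 8 does not overflow as an unsigned multiply (0x91A2B3C0 fits in 32 bits) but does overflow as a signed one.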
I haven't checked, but maybe MUL is just a useful by-product of other instructions...
Couldn't tell...
Here is the MUL instruction decoding with its "closest relatives". (Aren't spreadsheets wonderful for analyzing instruction sets?)
c c c c 0 0 0 0 0 0 0 S d d d d 0 0 0 0 m m m m 1 0 0 1 n n n n MUL{S}<c> <Rd>, <Rn>, <Rm>
c c c c 0 0 0 0 0 0 1 S d d d d a a a a m m m m 1 0 0 1 n n n n MLA{S}<c> <Rd>, <Rn>, <Rm>, <Ra>
c c c c 0 0 0 0 0 1 0 0 h h h h l l l l m m m m 1 0 0 1 n n n n UMAAL<c> <RdLo>, <RdHi>, <Rn>, <Rm>
c c c c 0 0 0 0 0 1 1 0 d d d d a a a a m m m m 1 0 0 1 n n n n MLS<c> <Rd>, <Rn>, <Rm>, <Ra>
c c c c 0 0 0 0 1 0 0 S h h h h l l l l m m m m 1 0 0 1 n n n n UMULL{S}<c> <RdLo>, <RdHi>, <Rn>, <Rm>
c c c c 0 0 0 0 1 0 1 S h h h h l l l l m m m m 1 0 0 1 n n n n UMLAL{S}<c> <RdLo>, <RdHi>, <Rn>, <Rm>
c c c c 0 0 0 0 1 1 0 S h h h h l l l l m m m m 1 0 0 1 n n n n SMULL{S}<c> <RdLo>, <RdHi>, <Rn>, <Rm>
c c c c 0 0 0 0 1 1 1 S h h h h l l l l m m m m 1 0 0 1 n n n n SMLAL{S}<c> <RdLo>, <RdHi>, <Rn>, <Rm>
I almost see something, but not quite...
It looks like these instructions come in pairs, and MUL and MLA are a pair like UMULL and UMLAL, but
also like UMAAL and MLS. The four (MUL, MLA, UMAAL and MLS) form a weird subset, though.
But I still fail to see the gist.
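For what it's worth, the table above can be turned into a small classifier. This is a sketch in C with invented names, based only on the bit patterns listed in the table: bits 27..24 are 0000, bits 7..4 are 1001, and bits 23..21 select the instruction, with bit 20 being either S or a fixed 0.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    MULT_NONE, MULT_MUL, MULT_MLA, MULT_UMAAL, MULT_MLS,
    MULT_UMULL, MULT_UMLAL, MULT_SMULL, MULT_SMLAL
} mult_kind;

/* Classify a 32-bit ARM instruction word against the table above.
 * The should-be-zero field of MUL (bits 15..12) is not checked here. */
static mult_kind classify_multiply(uint32_t insn)
{
    if ((insn & 0x0F0000F0u) != 0x00000090u)
        return MULT_NONE;                 /* not the 0000 ... 1001 pattern */

    uint32_t op = (insn >> 21) & 0x7u;    /* bits 23..21 */
    uint32_t s  = (insn >> 20) & 0x1u;    /* bit 20: S, or fixed 0 */

    switch (op) {
    case 0:  return MULT_MUL;
    case 1:  return MULT_MLA;
    case 2:  return s ? MULT_NONE : MULT_UMAAL;   /* bit 20 must be 0 */
    case 3:  return s ? MULT_NONE : MULT_MLS;     /* bit 20 must be 0 */
    case 4:  return MULT_UMULL;
    case 5:  return MULT_UMLAL;
    case 6:  return MULT_SMULL;
    default: return MULT_SMLAL;
    }
}
```

Seen this way, bit 23 separates the 64-bit-result group from the rest, and within the low group UMAAL and MLS take the slots where the S bit is fixed at 0.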
An update (now that I've done them in my SW):
MUL, MLA and MLS are a triplet like SMMUL, SMMLA and SMMLS.
MUL only multiplies, MLA adds the product to accumulate-value and MLS subtracts the product from accumulate-value.
I handled the triplets together - they were so similar in encoding and in function.
c c c c 0 0 0 0 0 0 0 S d d d d 0 0 0 0 m m m m 1 0 0 1 n n n n arm_cmac_mul arm_core_data_mac MUL{S}<c> <Rd>, <Rn>, <Rm> A1 A8.8.114
c c c c 0 0 0 0 0 0 1 S d d d d a a a a m m m m 1 0 0 1 n n n n arm_cmac_mla arm_core_data_mac MLA{S}<c> <Rd>, <Rn>, <Rm>, <Ra> A1 A8.8.100
c c c c 0 0 0 0 0 1 1 0 d d d d a a a a m m m m 1 0 0 1 n n n n arm_cmac_mls arm_core_data_mac MLS<c> <Rd>, <Rn>, <Rm>, <Ra> A1 A8.8.101
The "signedness immune" multiplication taking only the low-bits is common to all three.
IF MLA THEN
Rd = Ra + Rd
ELSE IF MLS THEN
Rd = Ra - Rd
ENDIF
The differences between SMMUL, SMMLA and SMMLS are the same.
So MUL seems not to be a by-product.
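As an illustration of handling the triplet together, here is a minimal C sketch. It is not the code from my software; the enum and function names are hypothetical. It follows the description above: MUL only multiplies, MLA adds the product to the accumulate value, and MLS subtracts the product from it, all modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_MUL, OP_MLA, OP_MLS } mac_op;

/* Compute the value written to Rd for MUL, MLA or MLS.
 * ra is ignored for OP_MUL. Flags (the S variants) are not modelled. */
static uint32_t exec_mac(mac_op op, uint32_t rn, uint32_t rm, uint32_t ra)
{
    uint32_t product = rn * rm;      /* low 32 bits, same for signed/unsigned */

    switch (op) {
    case OP_MLA: return ra + product;    /* accumulate */
    case OP_MLS: return ra - product;    /* subtract from accumulator */
    default:     return product;         /* plain MUL */
    }
}
```

For example, exec_mac(OP_MLA, 3, 4, 10) gives 22 and exec_mac(OP_MLS, 3, 4, 10) gives 0xFFFFFFFE, which is -2 when read as signed.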
I am developing on a SAMD21G18 SoC, so MUL is the only multiply instruction I have. Recently a developer in the ARM community posted a 32-bit x 32-bit ---> 64-bit multiply in 17 cycles, and it is certainly an interesting routine. My problem is that I am converting a 64kb/s mono MP3 decoder and there are tens of thousands of MULSHIFT32 macros all over the code. As the name suggests, it performs a 32-bit x 32-bit ---> 64-bit product, but only bits 32-63 are required.
I have just about exhausted the various methodologies within the programming fraternity as well as the pure maths branch of science. One GOOD result is that for people using an SoC with a 32-cycle multiply, Karatsuba multiplication is some 22 cycles faster. Something of use to people writing for the very smallest ARM cores. For myself, I have exhausted tricks like finding the least significant bits within a register so that just 2 multiplies will produce the correct results... but boy is it slow. If anyone has just a name for me to search, I would really appreciate it.
The system works at 48MHz, but the slower I can run the SoC, the less power it uses, and with the product aiming to use a single (rechargeable) AA battery, improving battery life is vital.
There are quite a few interesting & novel aspects to the ARM processors. It is always interesting, although the C bit treated as a 'borrow' rather than a carry spoils some looping and makes some maths... interesting.
I would like to take this chance to thank Yasuhiko Koumoto, who has always been a superb source of information, explaining 'branch shadows', which I think is unique to ARM. I coded many of the RISC chips developed in the 80s (used in consoles in the 90s), so I came from 'branch delay-slot' instructions.
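For reference, the textbook way to get only the upper word from 32-bit multiplies is usually discussed under the name "multiply high" (often written mulhu for unsigned and mulhs for signed). Below is a C sketch of that approach, not a tuned routine: it splits each operand into 16-bit halves and uses four 32 x 32 ---> 32 multiplies, then corrects the unsigned result to get the signed one. The function names are invented, and no cycle counts are claimed.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of the unsigned 64-bit product, using only
 * multiplies whose result fits in 32 bits (16 x 16 bits each). */
static uint32_t umulhi32(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;

    /* Middle column: at most 3 * 0xFFFF, so it cannot overflow. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

/* Upper 32 bits of the signed 64-bit product (bits 32-63).
 * A negative operand read as unsigned is 2^32 too large, so the other
 * operand is subtracted once from the high word for each negative input.
 * The final cast assumes two's complement conversion. */
static int32_t mulshift32_sketch(int32_t x, int32_t y)
{
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
    uint32_t hi = umulhi32(ux, uy);

    if (x < 0) hi -= uy;
    if (y < 0) hi -= ux;
    return (int32_t)hi;
}
```

As a check against the numbers earlier in the thread, umulhi32(0x12345678, 0xFFFFFFF8) gives 0x12345677 and mulshift32_sketch(0x12345678, -8) gives -1, matching the upper halves of 0x123456776E5D4C40 and 0xFFFFFFFF6E5D4C40.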