Why does the Cortex-M4 instruction SMMUL (32 = 32 x 32 bits) preserve a redundant sign bit and discard one useful bit of information? What could possibly justify such blatant disregard of the ISO/IEC TR 18037 standard Fract format?
What I wanted is what the NEON instructions VQDMULH or VQRDMULH do, so ARM clearly thought the operation was worth implementing when they designed NEON.
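To make the difference concrete, here is a minimal plain-C model of the three operations, assuming Q31 (signed 1.31 fixed-point) operands. The function names are hypothetical, not Arm intrinsics, and the code assumes that >> on a negative int64_t is an arithmetic shift (true of GCC, Clang and Arm Compiler).

```c
#include <stdint.h>

/* Hypothetical models, not intrinsics. Assumes arithmetic >> on negative int64_t. */

/* SMMUL: upper 32 bits of the signed 64-bit product, i.e. (a * b) >> 32.
 * For Q31 inputs the result is Q30: bit 31 repeats the sign bit and
 * product bit 31 is discarded. */
static int32_t smmul_model(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    return (int32_t)(p >> 32);
}

/* VQDMULH (32-bit lanes): saturate((2 * a * b) >> 32), which equals (a * b) >> 31.
 * The result is Q31; only INT32_MIN * INT32_MIN overflows, so it saturates. */
static int32_t qdmulh_model(int32_t a, int32_t b)
{
    if (a == INT32_MIN && b == INT32_MIN)
        return INT32_MAX;
    int64_t p = (int64_t)a * b;
    return (int32_t)(p >> 31);
}

/* VQRDMULH: like VQDMULH, but adds 2^31 to the doubled product before
 * taking the high half, so it rounds to nearest instead of truncating. */
static int32_t qrdmulh_model(int32_t a, int32_t b)
{
    if (a == INT32_MIN && b == INT32_MIN)
        return INT32_MAX;
    int64_t p = (int64_t)a * b;
    return (int32_t)((p + ((int64_t)1 << 30)) >> 31);
}
```

With a = b = 0x40000000 (0.5 in Q31), smmul_model returns 0x10000000, which is 0.25 only when read as Q30, while qdmulh_model returns 0x20000000, which is 0.25 in Q31.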
The examples in the Cortex-M4 Devices Generic User Guide, section 3.6.8 (SMMUL), use SMULL instead of SMMUL.
This is not an answer to Petr's question; I just found myself comparing the way some RISC processors first added hardware support for multiplication.
The MUL instruction was added in ARMv2, SMULL in ARMv3M.
The i960 has multiply instructions that produce the least significant 32 bits of the result, and an extended multiply instruction that produces a 64-bit result stored in two 32-bit registers.
When I studied the PowerPC (older generations), I learned that a 32-bit x 32-bit = 64-bit multiplication takes two instructions: one for the high-order 32 bits of the result (mulhw) and one for the low-order 32 bits (mullw).
When multiplying, MIPS32 uses special registers for storing the high- and low-order words of the result.
Hi Petr,
I am not sure I understand your problem, but the instruction you mention gives you the upper 32 bits of a signed 64-bit multiplication result.
So for 0x7fff.ffff * 0x8000.0000 I'd expect Rd to contain 0xc000.0000, which it does, and that seems to be the correct value.
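The arithmetic behind that expected value, treating both operands as signed 32-bit integers:

```latex
\begin{aligned}
\texttt{0x7FFFFFFF} \times \texttt{0x80000000}
  &= (2^{31} - 1)\,(-2^{31}) = -2^{62} + 2^{31} \\
  &\equiv 2^{64} - 2^{62} + 2^{31} \pmod{2^{64}} = \texttt{0xC000000080000000} \\
\text{SMMUL } R_d &= \texttt{0xC0000000}, \qquad \text{discarded low word} = \texttt{0x80000000}
\end{aligned}
```

Read as Q31 fractions, the operands are 1 - 2^-31 and -1, so the exact product is -1 + 2^-31. The returned 0xC0000000 is -0.5 when read as Q31 and -1 only when read as Q30, which is the one-bit offset the original question is about.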
If you need the full 64-bit result, you can use SMULL<c> <RdLo>,<RdHi>,<Rn>,<Rm>.
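If the goal is a full-precision Q31 product, SMULL's two output registers still hold the bit that SMMUL drops. A minimal sketch, assuming Q31 operands; smull_model and q31_from_smull are hypothetical names that model the register contents in portable C, not Arm intrinsics.

```c
#include <stdint.h>

/* Hypothetical model of SMULL RdLo, RdHi, Rn, Rm: the signed 64-bit
 * product split across two 32-bit registers. */
typedef struct {
    uint32_t lo;  /* RdLo: product bits 31..0  */
    uint32_t hi;  /* RdHi: product bits 63..32 */
} smull_result;

static smull_result smull_model(int32_t rn, int32_t rm)
{
    /* Conversion of a negative int64_t to uint64_t is well defined (mod 2^64). */
    uint64_t p = (uint64_t)((int64_t)rn * rm);
    smull_result r = { (uint32_t)p, (uint32_t)(p >> 32) };
    return r;
}

/* Q31 product = bits 62..31 of the 64-bit result: shift RdHi left by one
 * and bring in the top bit of RdLo (the bit SMMUL discards).
 * Returned as a raw bit pattern. No saturation: INT32_MIN * INT32_MIN
 * wraps to 0x80000000 instead of clamping like VQDMULH. */
static uint32_t q31_from_smull(smull_result r)
{
    return (uint32_t)(r.hi << 1u) | (r.lo >> 31);
}
```

For the example above, RdHi:RdLo is 0xC0000000:0x80000000 and the recombined Q31 value is 0x80000001, i.e. -1 + 2^-31. In C code, writing (int64_t)a * b normally lets the compiler emit SMULL directly.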