Why does the Cortex-M4 instruction SMMUL (32 = 32 x 32 bits) preserve a redundant sign bit and discard one useful bit of information? What could possibly justify such blatant disregard for the ISO/IEC TR 18037 standard Fract format?
SMULL gives both the high and low parts; one can do everything with that, and it is implemented in the Cortex-M3.
SMMUL gives just the high part and is part of the DSP extension; you need a Cortex-M4 for SMMUL.
This is just to avoid misinterpretation.
SMULL is still present in Cortex-M4.
Misinterpretation. Yes, I could easily get paranoid about people misinterpreting what I've said! It just seems to happen so easily despite one's best efforts.
By the way, some Cortex-M4 processors have a single-precision floating-point unit, and that is quite quick.
I just had a look at what gcc does for this, and it doesn't saturate. I think it could do the same work, with saturation, in the same time using:
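smull lo, hi, x, y      @ SMULL RdLo, RdHi, Rn, Rm: lo = bits 31..0, hi = bits 63..32
lsr   lo, lo, #31       @ keep only bit 31 of the low word (0 or 1)
qdadd result, lo, hi    @ result = SAT(lo + SAT(2 * hi)), i.e. bits 62..31, saturated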
Yes, your code example corresponds to the saturated multiplication that I have been using. It takes 3 cycles to complete. The single precision floating point is faster (VMUL.F32 takes 1 cycle) but the 24-bit mantissa has lower resolution than the 32-bit Fract so it can't be considered a direct replacement.
daith, sorry for posting that. I didn't have much time and was about to log out, but a young engineer (whom I recently convinced to also study ARM instead of being too dedicated to AVR and PICmicro) read your reply about SMULL/SMMUL and wondered whether SMULL was excluded from the Cortex-M4. I then decided to post that response, hoping it would keep other readers, especially new Cortex-M users, from misinterpreting the info as well.
I was agreeing with you. Misinterpreting happens all the time and is very hard to guard against.
This is not an answer to petr's question; I just found myself comparing the way some RISC processors introduced hardware support for multiplication.
The MUL instruction was added in ARMv2, SMULL in ARMv3M.
The i960 has multiply instructions that generate the least significant 32 bits of the product, and an extended multiply instruction that generates the full 64 bits in two 32-bit registers.
When I was studying the PowerPC (older generations), I had to learn that a 32-bit x 32-bit = 64-bit multiplication takes two instructions: one to get the high-order 32 bits and one to get the low-order 32 bits of the result.
When multiplying, MIPS32 uses special registers for storing the high- and low-order words of the result.