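Why does the Cortex-M4 instruction SMMUL (32 = 32 x 32 bits) preserve a redundant sign bit and discard one useful bit of information? For two Q31 operands the 64-bit product effectively has two sign bits, so its upper 32 bits form a Q30 value rather than a Q31 one. What could possibly be the justification for such blatant disregard of the ISO/IEC TR 18037 standard Fract format?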
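A minimal C sketch of the difference being asked about, assuming two Q31 operands. SMMUL keeps bits 32..63 of the 64-bit product (a Q30 result), while a Q31 fractional multiply in the spirit of the TR 18037 fract format keeps bits 31..62. The function names are hypothetical, truncation (not rounding) is chosen for simplicity, saturating the single overflow case (-1.0 x -1.0) is an assumption, and `>>` on negative values is assumed to be an arithmetic shift, as it is with GCC and Clang.

```c
#include <stdint.h>

/* Models SMMUL: the most significant 32 bits of the 64-bit signed product.
   Assumes >> on a negative int64_t is an arithmetic shift (GCC, Clang). */
int32_t smmul_emul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Hypothetical Q31 x Q31 -> Q31 multiply: drop the redundant sign bit by
   shifting right by 31 instead of 32 (truncating), then saturate. */
int32_t q31_mul_sat(int32_t a, int32_t b)
{
    int64_t p = ((int64_t)a * b) >> 31;
    if (p > INT32_MAX)          /* only reached for INT32_MIN * INT32_MIN */
        return INT32_MAX;
    return (int32_t)p;
}
```

For example, with a = b = 0x40000000 (0.5 in Q31), `smmul_emul` returns 0x10000000, which reads as 0.125 if treated as Q31, while `q31_mul_sat` returns 0x20000000, i.e. 0.25.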
I believe this is intended to be used for getting the highword result of a topword x topword multiplication.
(Note: by topword I mean the most significant word of each factor. The factors could be, for instance, two 32-bit values, or the highwords of two 64-bit values; in the latter case the result would be the highest word of a 128-bit product.)
Without it, I think it would be cumbersome to multiply two more-than-32-bit signed values.
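A sketch of that topword use case in C, assuming GCC or Clang: `__int128` is a compiler extension used here only as an exact reference, and `>>` on negative values is assumed to be arithmetic. All function names are hypothetical. Taking SMMUL of the two topwords ignores the cross products and the low-word product, which together contribute less than 2^96 in magnitude, so the estimate differs from the exact top word of the 128-bit product by at most 1.

```c
#include <stdint.h>

/* SMMUL-style high word of a signed 32 x 32 product (arithmetic >> assumed). */
int32_t smmul_emul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Exact most significant word (bits 96..127) of the signed 64 x 64 -> 128-bit
   product, via the GCC/Clang __int128 extension, for reference only. */
int32_t top_word_exact(int64_t x, int64_t y)
{
    __int128 p = (__int128)x * y;
    return (int32_t)(p >> 96);
}

/* Estimate of the same word from the topwords alone: one SMMUL instead of a
   full multi-word multiply. Carries from the discarded terms are lost. */
int32_t top_word_approx(int64_t x, int64_t y)
{
    int32_t xh = (int32_t)(x >> 32);   /* topword of x */
    int32_t yh = (int32_t)(y >> 32);   /* topword of y */
    return smmul_emul(xh, yh);
}
```

For example, with x = 0x00010000FFFFFFFF and y = 0x0000FFFFFFFFFFFF the exact top word is 1 while the SMMUL-only estimate is 0; the carry comes entirely from the discarded terms.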
Which operations do you need to perform, in more detail?
Jens,
I need to perform these operations:
All of the above operations require the standard fractional multiplication for optimal accuracy.
Could you please give me an example of a DSP application on a low-power 32-bit MCU where you would need to multiply two more-than-32-bit signed values?
SMULL gives both the high and low parts; one can do everything with that, and it is implemented in the Cortex-M3.
SMMUL gives just the high part and is part of the DSP extension; you need a Cortex-M4 for SMMUL.
This is just to avoid misinterpretation.
SMULL is still present in Cortex-M4.
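To illustrate the SMULL/SMMUL distinction, a small C model (names hypothetical; `>>` on negative values assumed arithmetic, as in GCC and Clang). SMULL yields the full 64-bit product as two words, SMMUL yields only the high word, and the related SMMULR instruction from the same DSP extension rounds by adding 0x80000000 before taking the high word.

```c
#include <stdint.h>

/* Result pair written by SMULL RdLo, RdHi, Rn, Rm. */
typedef struct {
    uint32_t lo;   /* RdLo: bits 0..31 of the product  */
    int32_t  hi;   /* RdHi: bits 32..63 of the product */
} smull_result;

/* Models SMULL: the full 64-bit signed product split into two words. */
smull_result smull_emul(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    smull_result r;
    r.lo = (uint32_t)p;            /* low word, modulo 2^32 */
    r.hi = (int32_t)(p >> 32);     /* high word, arithmetic shift assumed */
    return r;
}

/* Models SMMUL: only the high word, the same value as smull_emul().hi. */
int32_t smmul_emul(int32_t a, int32_t b)
{
    return smull_emul(a, b).hi;
}

/* Models SMMULR: add 0x80000000 before taking the high word (rounding). */
int32_t smmulr_emul(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b + 0x80000000LL;
    return (int32_t)(p >> 32);
}
```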
Misinterpretation. Yes, I could easily get paranoid about people misinterpreting what I've said! It just seems to happen so easily despite one's best efforts.
By the way, some Cortex-M4 processors have a single-precision floating-point unit, and it is quite quick.
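I just had a look at what gcc generates for this, and it doesn't saturate. I think it could do the work in the same time, with saturation, using:
smull lo, hi, x, y     @ operand order is RdLo, RdHi: hi:lo = x * y (64-bit signed)
lsr   lo, lo, #31      @ keep only bit 31 of the low word
qdadd result, lo, hi   @ result = SAT(lo + SAT(2 * hi)), a saturated Q31 product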
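A C model of that three-instruction sequence, handy for checking it on a host (function names hypothetical; `>>` on negative values assumed arithmetic). QDADD Rd, Rm, Rn computes SAT(Rm + SAT(2 * Rn)), so doubling the high word and adding bit 31 of the low word gives the 64-bit product shifted right by 31, i.e. a saturated Q31 result.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the int32_t range, as the Q instructions do. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Models: smull lo, hi, x, y ; lsr lo, lo, #31 ; qdadd result, lo, hi */
int32_t q31_mul_smull_qdadd(int32_t x, int32_t y)
{
    int64_t  p   = (int64_t)x * y;
    int32_t  hi  = (int32_t)(p >> 32);   /* RdHi (arithmetic shift assumed) */
    uint32_t lo  = (uint32_t)p;          /* RdLo */
    int32_t  bit = (int32_t)(lo >> 31);  /* lsr lo, lo, #31 -> 0 or 1 */
    /* qdadd: SAT(Rm + SAT(2 * Rn)) with Rm = bit, Rn = hi */
    return sat32((int64_t)bit + sat32(2 * (int64_t)hi));
}
```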
Yes, your code example corresponds to the saturated multiplication that I have been using. It takes 3 cycles to complete. The single-precision floating point is faster (VMUL.F32 takes 1 cycle), but the 24-bit mantissa has lower resolution than the 32-bit Fract, so it can't be considered a direct replacement.
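A quick C check of the resolution point (helper names hypothetical). For magnitudes in [0.5, 1) a float's spacing is 2^-24, versus 2^-31 for Q31, so converting a full-scale Q31 value to float rounds away its low 7 bits. The 2^-31 scale factor is itself exact, so any loss comes only from the 24-bit significand.

```c
#include <stdint.h>

/* Q31 -> float. (float)q rounds to a 24-bit significand; the 2^-31 scale
   is a power of two and therefore exact. */
float q31_to_float(int32_t q)
{
    return (float)q * (1.0f / 2147483648.0f);
}

/* float -> Q31, assuming |f| < 1.0 so the result fits in int32_t. */
int32_t float_to_q31(float f)
{
    return (int32_t)(f * 2147483648.0f);
}
```

For example, 0x40000001 comes back as 0x40000000 after a round trip, while 0x40000080 (a multiple of 2^7) survives unchanged.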
daith, sorry about that post. I didn't have much time and was about to log out, but a young engineer (whom I recently convinced to also study ARM instead of being too dedicated to AVR and PICmicro) read your reply about SMULL/SMMUL and wondered whether SMULL was excluded from the Cortex-M4. So I posted that response, hoping it would keep other readers, especially new Cortex-M users, from misinterpreting the info as well.
Hi petr,
I'm not sure whether Jens' answer explains the main reason for the SMMUL instruction. Note, however, that the Cortex-M4 is, strictly speaking, not a DSP but an MCU with a DSP extension, so multiplying more-than-32-bit signed values may have applications beyond DSP.
I hope you can visit here more often. You can share your knowledge of DSP by participating in discussions, posting blogs, etc. My impression is that you already have extensive experience in DSP, especially with DSPs/DSCs rather than MCUs.
Regards,
Goodwin
I was agreeing with you. Misinterpreting happens all the time and is very hard to guard against.
This is not an answer to petr's question; I just found myself comparing the way some RISC processors introduced hardware support for multiplication.
The MUL instruction was added in ARMv2, SMULL in ARMv3M.
The i960 has multiply instructions that produce the least significant 32 bits, and an extended multiply instruction that produces 64 bits stored in two 32-bit registers.
When I was studying the PowerPC (older generations), I had to learn that a 32-bit x 32-bit = 64-bit multiplication takes two instructions: one to get the high-order 32 bits (mulhw) and one to get the low-order 32 bits (mullw) of the result.
When multiplying, MIPS32 uses special registers for storing the high- and low-order words of the result.