Why does the Cortex-M4 instruction SMMUL (32-bit result from a 32 x 32-bit multiply) preserve a redundant sign bit and discard one useful bit of information? What could possibly justify such blatant disregard of the Fract format from ISO/IEC TR 18037?
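To make the complaint concrete, here is a minimal C sketch of what I mean. The function names `smmul_model` and `q31_mul_nosat` are made up for illustration, and the code assumes that `>>` on a negative signed value is an arithmetic shift (true for gcc, but implementation-defined in C). SMMUL returns the top 32 bits of the 64-bit product, so for two Q31 operands the result comes back in Q30: the top two bits are copies of the sign (except for -1 x -1) and the bit just below the kept word is lost.

```c
#include <stdint.h>

/* Model of SMMUL: the most significant 32 bits of the signed 64-bit
 * product, truncated. Assumes arithmetic right shift of negative values. */
static int32_t smmul_model(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(p >> 32);
}

/* What a Q31 x Q31 -> Q31 multiply needs instead: a shift by 31.
 * No saturation here, so -1.0 x -1.0 (INT32_MIN x INT32_MIN) does not
 * fit and the final conversion is implementation-defined for that case. */
static int32_t q31_mul_nosat(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(p >> 31);
}
```

With 0.5 x 0.5 (both operands 0x40000000), `smmul_model` gives 0x10000000, which reads as 0.125 in Q31, while `q31_mul_nosat` gives 0x20000000, which is 0.25. Doubling the SMMUL result fixes the scale but cannot bring back the discarded low bit.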
By the way, some Cortex-M4 processors have single-precision floating point, and that is quite quick.
I just had a look at what gcc does for this, and it doesn't saturate. I think it could do the multiply with saturation in the same time using:
smull hi, lo, x, y
lsr lo,31
qdadd result,lo,hi
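For anyone who wants to check the arithmetic off-target, this is a rough C model of those three instructions. The helper names (`sat32`, `qdadd_model`, `q31_mul_sat`) are my own, and it assumes gcc-style arithmetic right shift on negative values. QDADD is modelled as saturate(Rm + saturate(2 x Rn)), so the doubling of `hi` turns the Q30 high word back into Q31 and the single bit left in `lo` supplies the bit that SMMUL would have dropped.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the int32_t range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Model of QDADD Rd, Rm, Rn: Rd = SAT(Rm + SAT(2 * Rn)). */
static int32_t qdadd_model(int32_t rm, int32_t rn)
{
    int32_t doubled = sat32(2 * (int64_t)rn);
    return sat32((int64_t)rm + doubled);
}

/* Saturating Q31 x Q31 -> Q31 multiply, following the asm step by step. */
static int32_t q31_mul_sat(int32_t x, int32_t y)
{
    int64_t p = (int64_t)x * (int64_t)y;   /* smull hi, lo, x, y  */
    uint32_t lo = (uint32_t)p;             /* low word of product */
    int32_t hi = (int32_t)(p >> 32);       /* high word (Q30)     */
    lo >>= 31;                             /* lsr lo, 31          */
    return qdadd_model((int32_t)lo, hi);   /* qdadd result, lo, hi */
}
```

The only input pair that actually saturates is INT32_MIN x INT32_MIN (-1.0 x -1.0), where the doubling step clamps to INT32_MAX instead of wrapping to INT32_MIN.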
Yes, your code example matches the saturated multiplication that I have been using. It takes 3 cycles to complete. Single-precision floating point is faster (VMUL.F32 takes 1 cycle), but its 24-bit mantissa has lower resolution than the 32-bit Fract, so it can't be considered a direct replacement.
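A small sketch of the resolution point, assuming IEEE 754 single precision with round-to-nearest-even (the function name `q31_via_float` is just for illustration): pushing a raw Q31 value through a `float` and back keeps only 24 significant bits.

```c
#include <stdint.h>

/* Round-trip a raw Q31 value through single precision.
 * The volatile store forces rounding to a real 32-bit float.
 * Inputs very close to INT32_MAX round up to 2^31 and must be avoided,
 * because converting that back to int32_t is undefined. */
static int32_t q31_via_float(int32_t q)
{
    volatile float f = (float)q;   /* keeps 24 significant bits */
    return (int32_t)f;
}
```

Values that fit in 24 bits, such as 0x00FFFFFF, survive unchanged, but 0x40000001 comes back as 0x40000000: the low bits of a full-scale Fract are simply gone.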