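Why does the Cortex-M4 instruction SMMUL (32-bit result = 32 × 32-bit operands) preserve a redundant sign bit and discard one useful bit of information? Multiplying two Q31 operands gives a Q62 product with two sign bits, so the upper word that SMMUL returns is a Q30 value rather than Q31. What could possibly justify such blatant disregard of the Fract format of the ISO/IEC TR 18037 standard?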
Yes, your code example corresponds to the saturated multiplication I have been using; it takes 3 cycles to complete. Single-precision floating point is faster (VMUL.F32 takes 1 cycle), but its 24-bit mantissa has lower resolution than the 32-bit Fract, so it can't be considered a direct replacement.