Why does the Cortex-M4 instruction SMMUL (32 = 32 x 32b) preserve a redundant sign bit and discard one useful bit of information? What could possibly be the justification for such blatant disregard of the ISO/IEC TR 18037 standard Fract format?
A saturating multiplication would be the obvious choice (you can use SMULL when you need a guarded result). The ARM format does not even allow multiplications to be chained without a progressive loss of accuracy, ugh.
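To make the complaint concrete, here is a minimal C sketch comparing the two behaviours. The helper names `smmul`, `q31_mul_sat` and `fract_demo` are made up for illustration: `smmul` models the instruction as "top 32 bits of the signed 64-bit product", and `q31_mul_sat` is one possible saturating Q31 multiply, not a quote from the ISO text or from any ARM library. It assumes `>>` on a negative signed value is an arithmetic shift, which holds for GCC and Clang but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SMMUL: keep bits [63:32] of the signed 64-bit product.
 * For Q31 inputs the result is effectively Q30: bit 31 and bit 30
 * are both sign bits unless both inputs are -1.0. */
int32_t smmul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Hypothetical saturating Q31 multiply: keep bits [62:31] instead,
 * and clamp the single overflow case (-1.0 * -1.0). */
int32_t q31_mul_sat(int32_t a, int32_t b)
{
    int64_t p = ((int64_t)a * b) >> 31;
    if (p > INT32_MAX) {
        p = INT32_MAX;
    }
    return (int32_t)p;
}

/* Small self-check of the values discussed above. */
void fract_demo(void)
{
    /* 0.5 * 0.5 in Q31: the Q31 answer is 0.25 (0x20000000),
     * SMMUL returns half of that bit pattern. */
    assert(q31_mul_sat(0x40000000, 0x40000000) == 0x20000000);
    assert(smmul(0x40000000, 0x40000000) == 0x10000000);

    /* Doubling the SMMUL result restores the scale but not the
     * discarded low bit: 3 versus 2 here. */
    assert(q31_mul_sat(0x7FFFFFFF, 4) == 3);
    assert(smmul(0x7FFFFFFF, 4) * 2 == 2);

    /* -1.0 * -1.0 is the only case that needs the extra headroom. */
    assert(q31_mul_sat(INT32_MIN, INT32_MIN) == INT32_MAX);
    assert(smmul(INT32_MIN, INT32_MIN) == 0x40000000);
}
```

Chaining `smmul` calls therefore halves the scale at every step unless you shift left after each one, and every such shift leaves a zero in the least significant bit.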
The ARM DSP extension was defined in 2009 - three years after that ISO standard. The M4 core was introduced in 2010, so no excuse there. The fractional format itself dates back to the 1980s, with chips like the Motorola DSP56000.
The ARM DSP extension was introduced into the ARM architecture in 2000 with ARMv5TE; the SMMUL instruction itself arrived in 2004 with ARMv6.
I see it was added after the other DSP instructions, a bit later than I thought but still before the standard. And they refer to it as an extended multiply instruction rather than a DSP one; it might have helped if it had been thought of as DSP.
Looking a bit deeper, as that struck me as a bit wrong - the ARM1136 Technical Reference Manual r0p1 from February 2003 has SMMUL in it, even though that is before ARMv6 was defined. ARM wasn't so rigorous about versions and features then. The ARM1136 was upgraded a bit when the ARMv6 definition came out, but it already had this instruction.
It's interesting that even the 8-bit 68HC11 has versions with support, albeit minimal, for the fractional format. The E variants are perhaps the earliest to provide such support. Nonetheless, the fractional data format may well have been used even in early computers.