Why does the Cortex-M4 instruction SMMUL (32 = 32 x 32b) preserve a redundant sign bit and discard one useful bit of information? What could possibly be the justification for such blatant disregard of the ISO/IEC TR 18037 standard Fract format?
I think they just missed a trick there when defining the DSP instructions. You can get the result you want by using a long multiply and then doubling, but it takes a few extra cycles. Or one can get by with one less bit of precision and shift one of the operands left before the multiply, but that isn't any sort of standard. Or if they had set the carry flag from the top bit of the low half, doubling would have taken just one extra cycle.
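To make the "long multiply and double" option concrete, here is a minimal portable C sketch rather than actual Cortex-M4 code. The function names smmul_model and q31_mul_long are made up for illustration, and the code assumes the compiler implements right shift of negative signed values as an arithmetic shift (true for GCC, Clang and Arm Compiler, but implementation-defined in ISO C).

```c
#include <stdint.h>

/* Hypothetical model of SMMUL: keep the top 32 bits of the signed
 * 64-bit product. For Q31 inputs the result is in Q30, so bit 31
 * normally duplicates bit 30 (the "redundant sign bit"). */
static int32_t smmul_model(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Hypothetical Q31 x Q31 -> Q31 multiply: full 64-bit product, then
 * shift right by 31 instead of 32 (equivalent to doubling the product
 * before taking the top half). No saturation is done here, so the
 * single case INT32_MIN * INT32_MIN is not handled. */
static int32_t q31_mul_long(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 31);
}
```

For example, with a = 0x00010000 and b = 0x00018000 the full product is 3 * 2^31, so q31_mul_long returns 3 while smmul_model returns 1. Doubling the SMMUL result afterwards gives 2, which shows the one bit that cannot be recovered from the top half alone.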
By the way, that 'redundant sign bit' isn't completely redundant, because 0x80000000 multiplied by 0x80000000 gives 0x40000000 for the top half. An instruction designed for DSP would have to saturate to 0x7fffffff, or else ignore that case and return 0 or 0x80000000.
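The corner case can be sketched in portable C as follows. This is only an illustration, not what any ARM instruction does: the names product_top_half and q31_mul_sat are hypothetical, and the code again assumes an arithmetic right shift for negative signed values.

```c
#include <stdint.h>

/* Top half of the 64-bit product, as SMMUL would return it. */
static int32_t product_top_half(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Hypothetical saturating Q31 multiply. The only input pair whose
 * Q31 result does not fit is (-1.0) * (-1.0) = +1.0, which is
 * clamped to 0x7fffffff. */
static int32_t q31_mul_sat(int32_t a, int32_t b)
{
    int64_t p = ((int64_t)a * b) >> 31;   /* Q62 product scaled to Q31 */
    if (p > INT32_MAX) {
        p = INT32_MAX;                    /* saturate +1.0 */
    }
    return (int32_t)p;
}
```

Here product_top_half(INT32_MIN, INT32_MIN) is 0x40000000, the one result where bits 31 and 30 differ, and q31_mul_sat(INT32_MIN, INT32_MIN) returns 0x7fffffff.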
As to blatant disregard for Fract format - the instructions were defined and implemented long before that standard was thought of as far as I'm aware.
Saturated multiplication would be the obvious choice (you can use SMULL when you need a guarded result). The ARM format does not even allow chaining of multiplications without a progressive loss of accuracy, ugh.
The ARM DSP extension was defined in 2009 - three years after that ISO standard. The M4 core was introduced in 2010, so no excuse there. The fractional format itself dates back to the 1980s, with chips like the Motorola DSP56000.
The ARM DSP extension was introduced into the ARM architecture in 2000 in ARMv5TE; the SMMUL instruction was added in 2004 in ARMv6.
I see it was added after the other DSP instructions, a bit later than I thought but still before the standard. And they refer to it as an extended multiply instruction rather than DSP; it might have helped if it had been thought of as DSP.
Looking a bit deeper, as that struck me as a bit wrong - the ARM1136 Technical Manual r0p1 from February 2003 has SMMUL in it, even though that is before ARMv6 was defined. ARM wasn't so rigorous about versions and features then. ARM1136 was upgraded a bit when the ARMv6 definition came out, but it already had this instruction.
It's interesting that even the 8-bit 68HC11 has versions with support, albeit minimal, for the fractional format. The E variants are perhaps the earliest to provide such support. Nonetheless, the fractional data format might already have been used even in early computers.