Strange behavior of the "smmls" instruction

I am trying to optimize my code by replacing a pair of smull and sub instructions with a single smmls instruction, but smmls gives the wrong result. This is the test case:

mov32   r1, #0x8000d286
mov32   r2, #0xfcdbc095
mov32   r3, #0x01921d20
smull   r0, ip, r1, r2
sub     r0, r3, ip
smmls   r0, r1, r2, r3

The combination of smull and sub gives zero, as expected, but smmls gives 0xffffffff. I ran this code in a simulator (Cortex-M4) and on real hardware (STM32F407VG), and the result is always 0xffffffff. What am I doing wrong?
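
To make the comparison concrete, here is a minimal C sketch of the two sequences. It assumes smmls behaves as the architecture manual describes it: the destination receives the top 32 bits of (Ra << 32) - Rn * Rm, so the low 32 bits of the product take part in the subtraction and can cause a borrow. The function names are made up for illustration and are not part of any library.

```c
#include <stdint.h>

/* Model of: smull lo, hi, rn, rm  followed by  sub rd, ra, hi.
 * Only the high word of the signed 64-bit product is subtracted.
 * The uint32_t -> int32_t casts assume the usual two's-complement
 * conversion (implementation-defined in C, but universal in practice). */
uint32_t smull_then_sub(uint32_t rn, uint32_t rm, uint32_t ra)
{
    int64_t product = (int64_t)(int32_t)rn * (int32_t)rm;
    uint32_t hi = (uint32_t)((uint64_t)product >> 32);
    return ra - hi;
}

/* Model of: smmls rd, rn, rm, ra (non-rounding form), assuming the
 * documented behavior rd = ((ra << 32) - rn * rm)[63:32].
 * The subtraction is done in unsigned 64-bit arithmetic so that
 * wraparound is well defined. */
uint32_t smmls_model(uint32_t rn, uint32_t rm, uint32_t ra)
{
    uint64_t product = (uint64_t)((int64_t)(int32_t)rn * (int32_t)rm);
    uint64_t acc = ((uint64_t)ra << 32) - product;
    return (uint32_t)(acc >> 32);
}
```

With the values from the test case, the 64-bit product works out to 0x01921d20_1f0107fe. Its high word equals r3, so the smull/sub model returns 0. Its low word is nonzero, so the 64-bit subtraction in the smmls model borrows from the high word and returns 0xffffffff, which matches what the simulator and the hardware report.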
