I am trying to optimize my code by replacing a pair of smull and sub instructions with a single smmls instruction, but smmls gives the wrong result. This is the test case:
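    mov32 r1, #0x8000d286     ; Rn
    mov32 r2, #0xfcdbc095     ; Rm
    mov32 r3, #0x01921d20     ; Ra (equal to the high word of r1 * r2)
    smull r0, ip, r1, r2      ; ip:r0 = r1 * r2 (signed 64-bit product)
    sub   r0, r3, ip          ; r0 = r3 - ip, gives 0
    smmls r0, r1, r2, r3      ; expected 0, but gives 0xffffffff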
The smull/sub pair gives zero, as expected, but smmls gives 0xffffffff. I ran this code both in a simulator (Cortex-M4) and on real hardware (STM32F407VG), and the result is always 0xffffffff. What am I doing wrong?
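I figured out what the problem is. The smmls description in the "ARM Cortex-M4 Devices User Guide" is wrong. It says that the subtraction happens after extracting the most significant 32 bits of the product. The "ARMv7-M Architecture Reference Manual", however, states that the subtraction happens before the extraction: the full 64-bit value (Ra << 32) - Rn * Rm is computed first and its bits [63:32] are returned. That is what the hardware really does. In this test case the low word of the product is nonzero, so subtracting it borrows 1 from the high word, and the result is r3 - ip - 1 = 0xffffffff instead of 0.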