I am trying to optimize my code by replacing a pair of smull and sub instructions with a single smmls instruction, but smmls gives the wrong result. Here is the test case:
mov32 r1, #0x8000d286
mov32 r2, #0xfcdbc095
mov32 r3, #0x01921d20
smull r0, ip, r1, r2
sub r0, r3, ip
smmls r0, r1, r2, r3
The combination of smull and sub gives zero, as expected, but smmls gives 0xffffffff. I ran this code in a simulator (Cortex-M4) and on real hardware (STM32F407VG), and the result is always the same: 0xffffffff. What am I doing wrong?
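I figured out what the problem is. The smmls description in the "ARM Cortex-M4 Devices User Guide" is wrong. It says that the subtraction occurs after extracting the most significant 32 bits of the product. But the "ARMv7-M Architecture Reference Manual" states that the subtraction occurs before the extraction, on the full 64-bit value (r3 << 32) - r1 * r2, which is what really happens. In my test case the high word of the product equals r3, but its low word is nonzero, so the 64-bit subtraction borrows from the upper half and the extracted high word is 0xffffffff instead of 0.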
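To make the difference concrete, here is a minimal C sketch that models both instruction sequences on a host machine. It is only an illustration of the behaviour described above, not code taken from either manual. The function names smull_then_sub and smmls_model are hypothetical, and the sketch assumes the usual two's-complement conversion when casting uint32_t to int32_t.

```c
#include <stdint.h>

/* Models "smull r0, ip, r1, r2" followed by "sub r0, r3, ip":
 * take the high word of the signed 64-bit product, then subtract
 * it from the accumulator in 32 bits. */
uint32_t smull_then_sub(uint32_t rn, uint32_t rm, uint32_t ra)
{
    int64_t product = (int64_t)(int32_t)rn * (int64_t)(int32_t)rm;
    uint32_t high = (uint32_t)((uint64_t)product >> 32);
    return ra - high;
}

/* Models "smmls r0, r1, r2, r3" as described in the ARMv7-M
 * Architecture Reference Manual: subtract the full 64-bit product
 * from (ra << 32), then take the high word of the difference. */
uint32_t smmls_model(uint32_t rn, uint32_t rm, uint32_t ra)
{
    uint64_t product = (uint64_t)((int64_t)(int32_t)rn * (int64_t)(int32_t)rm);
    uint64_t acc = (uint64_t)ra << 32;
    /* Unsigned arithmetic wraps modulo 2^64, matching the hardware. */
    return (uint32_t)((acc - product) >> 32);
}
```

With the register values from the test case, smull_then_sub(0x8000d286, 0xfcdbc095, 0x01921d20) returns 0 and smmls_model with the same arguments returns 0xffffffff, matching what the simulator and the hardware show. The two sequences agree only when the low word of the product is zero, so smmls is not a drop-in replacement for smull plus sub when an exact match is required.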