ARM7TDMI: SUBS vs SUB + CMP

Note: This was originally posted on 2nd April 2009 at http://forums.arm.com

Hi,

I have a question regarding the SUBS instruction and how it compares to SUB and CMP (due to unexpected behavior in a C-program).

The original C-code goes like this (all variables are 32 bit signed integers):

    t = a*b - c*c;
    if (t > 0) d = t;

In a particular case (a = 0x80, b = 0x106DCD9, c = 0x4E352501) I get an overflow in a*b as well as in the subtraction. What puzzles me is that the following two assembler versions behave differently in a certain simulated environment, and I'm trying to figure out whether this is expected or a bug in the simulator:

    MUL temp1, c, c
    MUL temp2, a, b
    SUBS t, temp2, temp1
    MOVGT d, t

and

    MUL temp1, c, c
    MUL temp2, a, b
    SUB t, temp2, temp1
    CMP t, #0
    MOVGT d, t

(Which of the two is generated depends on the compiler settings.)

Any help is greatly appreciated!

Greger
  • Note: This was originally posted on 3rd April 2009 at http://forums.arm.com

    Thank you for your answers!

    I realize I gave the wrong numbers. The calculation in question (indeed, the root cause is poor algorithm design) is

    t = 0x80 * 0x106DCD9 - 0x8D7F * 0x8D7F

    => t = 0x836E6C80 - 0x4E352501 (=0x3539477F)

    (a = 0x80, b = 0x106DCD9, c = 0x8D7F, a*b = 0x836E6C80, c*c = 0x4E352501)
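
The corrected numbers can be checked with a few lines of C. This is only a sketch for verifying the arithmetic; the helper names (`mul_exact`, `fits_int32`, `mul_low32`) are made up for illustration and are not from the original program. Unsigned 32-bit arithmetic is used for the wrapped values because it is well defined in C, and 64-bit arithmetic for the exact ones.

```c
#include <stdint.h>

/* Exact product of two 32-bit signed values, computed in 64 bits so that
   it cannot overflow. */
int64_t mul_exact(int32_t x, int32_t y)
{
    return (int64_t)x * (int64_t)y;
}

/* Nonzero when v can be represented as a 32-bit signed integer. */
int fits_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Low 32 bits of the product, which is what a 32-bit MUL leaves in its
   destination register. Unsigned arithmetic wraps modulo 2^32. */
uint32_t mul_low32(uint32_t x, uint32_t y)
{
    return x * y;
}
```

With a = 0x80, b = 0x106DCD9 and c = 0x8D7F, `mul_low32` reproduces 0x836E6C80 and 0x4E352501, and `fits_int32(mul_exact(a, b))` is 0 because the exact product is larger than 0x7FFFFFFF. The exact difference a*b - c*c is 0x3539477F, which does fit, so only the intermediate a*b overflows as a signed value.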

    So, if I understand you right we should have

    MUL temp1, c, c => no overflow flag, no negative flag (this is in bounds)
    MUL temp2, a, b => negative flag, no overflow flag (but shouldn't overflow be set since it's a signed multiply?)
    SUBS t, temp2, temp1 => no negative flag, overflow flag (underflow)
    MOVGT d, t => MOV is not executed

    and

    MUL temp1, c, c => no overflow flag, no negative flag (this is in bounds)
    MUL temp2, a, b => negative flag, no overflow flag (but shouldn't overflow be set since it's a signed multiply?)
    SUB t, temp2, temp1
    CMP t, #0 => 0x3539477F - 0 => no negative flag, no overflow flag
    MOVGT d, t => MOV is executed
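
The two traces above can be reproduced with a small C model of the flags. This is a sketch, assuming the usual ARM rules for a flag-setting subtract (N is bit 31 of the result, Z is set on a zero result, C is set when there is no borrow, V is set on signed overflow) and the GT definition quoted in this thread. The type and function names are hypothetical.

```c
#include <stdint.h>

/* N, Z, C, V as a flag-setting subtract (SUBS or CMP) would leave them. */
struct flags { int n, z, c, v; };

/* Flags for the 32-bit subtraction x - y. */
struct flags sub_flags(uint32_t x, uint32_t y)
{
    uint32_t r = x - y;                 /* result modulo 2^32 */
    struct flags f;
    f.n = (int)(r >> 31);               /* N: bit 31 of the result */
    f.z = (r == 0);                     /* Z: result is zero */
    f.c = (x >= y);                     /* C: set when there is no borrow */
    /* V: the operands differ in sign and the result's sign differs from x */
    f.v = (int)(((x ^ y) & (x ^ r)) >> 31);
    return f;
}

/* GT condition: Z clear and N equal to V. */
int cond_gt(struct flags f)
{
    return !f.z && (f.n == f.v);
}

/* Version 1: SUBS t, temp2, temp1 ; MOVGT d, t */
int movgt_after_subs(uint32_t temp2, uint32_t temp1)
{
    return cond_gt(sub_flags(temp2, temp1));
}

/* Version 2: SUB t, temp2, temp1 ; CMP t, #0 ; MOVGT d, t */
int movgt_after_sub_cmp(uint32_t temp2, uint32_t temp1)
{
    uint32_t t = temp2 - temp1;         /* SUB sets no flags */
    return cond_gt(sub_flags(t, 0));    /* CMP t, #0 sets them */
}
```

For temp2 = 0x836E6C80 and temp1 = 0x4E352501, `movgt_after_subs` returns 0 (N clear, V set) and `movgt_after_sub_cmp` returns 1 (N clear, V clear), matching the two traces.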

    OK. Then it is really not a problem with the simulator; rather, the optimizer takes a legal shortcut that only shows up because of a poorly designed algorithm (which overflows). "Good."

    Just a follow-up question here, shouldn't the negative flag be set in the SUBS? I mean we do subtract a positive number from a negative number.

    Thanks again!

    Greger
  • Note: This was originally posted on 7th April 2009 at http://forums.arm.com

    The ARM we're working with is ARM7TDMI (Texas Instruments TMS470). Actually, we work with several different embedded controllers but the TI compiler was the only one taking this particular, unfortunately "legal", shortcut :-) But I'll definitely read up on the ARM family!
  • Note: This was originally posted on 6th April 2009 at http://forums.arm.com

    Thank you very much for all your answers! I also realize I need to find myself a good book on this... :-)

    Greger
  • Note: This was originally posted on 2nd April 2009 at http://forums.arm.com

    That looks like a bug in your simulator - although not because the behavior is different.

    The reason...

    GT is defined as executing if "Z == 0 and N == V" in the CPSR flag bits.

    In your example:
    temp1 = 0xF3C34A01 (overflowed)
    temp2 = 0x836E6C80
    t = temp2 - temp1 (underflowed)

    In the SUBS case the SUBS instruction will perform (0x836E6C80 - 0xF3C34A01 = 0x8FAB227F), which will set Nzcv in the CPSR (Negative, non-zero, no carry, no overflow). This means the final MOVGT won't execute. No carry for a subtract indicates an unsigned underflow (a borrow).

    In the CMP case the CMP instruction will perform (0x8FAB227F - 0), which will set NzCv in the CPSR (Negative, non-zero, Carry, no overflow). This should also mean that the final MOVGT won't execute. Carry for a subtract indicates that no underflow occurred.

    It is worth noting that the CPSR flags are different for these two cases, so some condition codes for the final MOV may validly generate different behavior for the two cases. Signed integer overflow is undefined behavior in C, so compilers are not required to behave consistently once it happens, and this difference is allowed by the C spec =)
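
As a rough illustration of the point about differing flags, the sketch below packs N, Z, C and V into four bits in the same order as CPSR bits 31 to 28 and evaluates two conditions on them. It uses the numbers from this reply; the function names are invented for the example, and CS is chosen only as one condition that depends on the carry flag.

```c
#include <stdint.h>

/* Flags of the subtraction x - y packed as N:Z:C:V (N is bit 3, V is
   bit 0), the same order as CPSR bits 31..28. */
unsigned nzcv_bits(uint32_t x, uint32_t y)
{
    uint32_t r = x - y;
    unsigned n = r >> 31;                        /* bit 31 of the result */
    unsigned z = (r == 0);                       /* zero result */
    unsigned c = (x >= y);                       /* no borrow */
    unsigned v = ((x ^ y) & (x ^ r)) >> 31;      /* signed overflow */
    return (n << 3) | (z << 2) | (c << 1) | v;
}

/* GT: Z clear and N == V. */
int cond_gt(unsigned nzcv)
{
    unsigned n = (nzcv >> 3) & 1u, z = (nzcv >> 2) & 1u, v = nzcv & 1u;
    return !z && (n == v);
}

/* CS: carry set. */
int cond_cs(unsigned nzcv)
{
    return (int)((nzcv >> 1) & 1u);
}
```

Here `nzcv_bits(0x836E6C80, 0xF3C34A01)` is 0x8 (Nzcv) and `nzcv_bits(0x8FAB227F, 0)` is 0xA (NzCv). `cond_gt` is false for both, while `cond_cs` is false for the first and true for the second, so a MOVCS would behave differently in the two sequences.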
  • Note: This was originally posted on 3rd April 2009 at http://forums.arm.com

    > Just a follow-up question here, shouldn't the negative flag be set in the SUBS? I mean we do subtract a positive number from a negative number.
    The ARM core generally has no notion of signed and unsigned instructions (for general data processing anyway; signed and unsigned multiply and saturating DSP instructions exist). It is up to the compiler to generate appropriate instructions which fit 2's complement maths. The 'N'egative bit is simply bit 31 of the result register.
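
A minimal C sketch of that statement, using the corrected numbers from the question: N is taken from the result alone, and the sign the question expected only shows up once V is taken into account. The function names are hypothetical, and the V formula assumes the usual signed-overflow rule for a subtract.

```c
#include <stdint.h>

/* N flag: a copy of bit 31 of the result, whatever the operands were. */
int n_flag(uint32_t result)
{
    return (int)(result >> 31);
}

/* V flag for x - y: set when the difference of x and y, read as signed
   values, does not fit in 32 bits. */
int v_flag_sub(uint32_t x, uint32_t y)
{
    uint32_t r = x - y;
    return (int)(((x ^ y) & (x ^ r)) >> 31);
}

/* Sign of the exact difference of x and y read as signed values: N xor V.
   This is what the LT condition (N != V) tests. */
int exact_difference_is_negative(uint32_t x, uint32_t y)
{
    return n_flag(x - y) ^ v_flag_sub(x, y);
}
```

For 0x836E6C80 - 0x4E352501 the result is 0x3539477F, so `n_flag` gives 0 even though the first operand has bit 31 set. `v_flag_sub` gives 1, and N xor V is 1: read as signed values, a negative number minus a positive number is negative, but that result does not fit in 32 bits.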

    > MUL temp2, a, b => negative flag, no overflow flag (but shouldn't overflow be set since it's a signed multiply?)
    Also - you mention flag setting in your multiplication instructions - on ARM the flags are only set for instructions with the 'S' suffix (there are notable exceptions such as CMP and CMN which implicitly set flags). So SUBS sets the flags, SUB doesn't. In your case the MUL will not set any flags at all because it isn't a MULS.
  • Note: This was originally posted on 2nd April 2009 at http://forums.arm.com

    To merge the comparison into the SUBS flag setting would require the ability to perform a condition check for N clear and Z clear, i.e. a positive non-zero value; however, no such condition check exists.
    The closest is GT (Z clear and N==V), which is the same check only when V is clear.

    Whether or not this is a compiler bug is a different question; the compiler choosing GT rather than NE indicates that "t" is a "signed int", and this may be why this apparently erroneous behaviour is legal -- the C standard has a number of caveats with respect to arithmetic operations exceeding the precision of "signed" integer types, and it leaves the result of such an overflow undefined.

    hth
    s.
  • Note: This was originally posted on 6th April 2009 at http://forums.arm.com

    I'd highly recommend the ARM System Developer's Guide (http://www.arm.com/documentation/books/4975.html) - it covers lots of useful stuff from algorithms to operating system techniques.

    It is slightly out of date now - it mostly covers ARMv5T(E) which is ARM, Thumb and ARM 'E' DSP instructions. This is all relevant to newer processors too - but if you expect to be working on ARM11 (ARMv6) or any of the Cortex (ARMv7) processors, expect minor differences and additions (such as the Thumb-2 instruction set).