
float subtraction error

Has anyone had any problems subtracting floats? I have the following code
(C166 V4.23 compiler):

float dummy, test1, test2;

test1 = 1.0e10;
test2 = 1.1e10;
dummy = test2 - test1;

For some reason, dummy comes out as
1.000001e9

I've tried declaring them as doubles and get the same result.


DS

  • Floating-point numbers are by no means EXACT representations of all real numbers. An explanation of how your numbers are represented will probably make more sense.

    Floating-point numbers are stored according to the IEEE-754 format with a sign bit, exponent, and mantissa. So:

    1.0e10 = 0x501502F9. The C166 stores it little-endian, so the bytes in memory (lowest address first) are F9 02 15 50:
             1111 1001 0000 0010 0001 0101 0101 0000
             MMMM MMMM MMMM MMMM EMMM MMMM SEEE EEEE
    The sign is 0 (positive).
    The exponent is 10100000 = 160, and 160 - 127 = 33.
    The mantissa is (1.)00101010000001011111001.

    When you multiply the mantissa by 2^33, you get 10000000000.

    1.1e10 = 0x5023E9AC, stored in memory (lowest address first) as the bytes AC E9 23 50:
             1010 1100 1110 1001 0010 0011 0101 0000
             MMMM MMMM MMMM MMMM EMMM MMMM SEEE EEEE
    The sign is 0 (positive).
    The exponent is 10100000 = 160, and 160 - 127 = 33.
    The mantissa is (1.)01000111110100110101100.

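    When you multiply the mantissa by 2^33, you get 11000000512. The true value 11000000000 lies exactly halfway between the two nearest single-precision values, 10999999488 and 11000000512, and round-to-nearest-even picks 11000000512, so that is as close to 1.1e10 as a float can get.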

    When you subtract these numbers, 11000000512 - 10000000000, you get 1000000512. If you round this to 7 significant digits (which is what printf and the debugger normally do), you get 1.000001e9, which is the result you see.

    As for double precision, make sure that the Double-precision Floating-point box is checked in the C166 compiler options. Otherwise, double is implemented as a float.

    Jon

  • OK, fair enough. I'll try to keep it to 6 digits to be safe. I just wonder how my ancient calculator does it, as I doubt it has more than a 16-bit processor.

    Thanks,
    DS

  • The data size of your calculator's processor is irrelevant; many early electronic calculators used 4-bit processors.

    Most calculators use an internal BCD notation (Binary-Coded Decimal) to avoid the rounding errors caused by converting from decimal to binary. The tradeoff is speed.

  • "Most calculators use an internal BCD notation (Binary-Coded Decimal) to avoid rounding errors"

    There will always be rounding errors, whatever representation you use!
    Some numbers just can't be written down exactly in a finite number of digits, e.g. 1/3 in decimal.

    On my Casio fx-451, if I take the reciprocal of 9.000000001 and then the reciprocal of the result, I get 9.000000002.
    You can have hours of fun playing with a calculator to find its little errors like this.
    Or maybe just get out a little more often... ;-)