float subtraction error

Has anyone had any problems subtracting floats? I have the following code:
(C166 V4.23 compiler)

float dummy, test1, test2;

test1 = 1.0e10;
test2 = 1.1e10;
dummy = test2 - test1;

for some reason dummy comes out as
1.000001e9

I've tried declaring them as doubles and get the same result.


DS

  • Floating-point numbers are by no means EXACT representations of all real numbers. An explanation of how your numbers are represented will probably make more sense.

    Floating-point numbers are stored according to the IEEE-754 format with a sign bit, exponent, and mantissa. So:

    1.0e10 = F9021550x = 1111 1001 0000 0010 0001 0101 0101 0000
                         MMMM MMMM MMMM MMMM EMMM MMMM SEEE EEEE
    (The bytes are listed low byte first; read as a 32-bit value this is 0x501502F9.)
    The sign is 0 (positive).
    The exponent is 10100000 = 160-127 = 33.
    The mantissa is (1.)00101010000001011111001.

    When you multiply the mantissa by 2^33, you get exactly 10000000000, so 1.0e10 is stored without error.

    1.1e10 = ACE92350x = 1010 1100 1110 1001 0010 0011 0101 0000
                         MMMM MMMM MMMM MMMM EMMM MMMM SEEE EEEE
    (The bytes are listed low byte first; read as a 32-bit value this is 0x5023E9AC.)
    The sign is 0 (positive).
    The exponent is 10100000 = 160-127 = 33.
    The mantissa is (1.)01000111110100110101100.

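    When you multiply the mantissa by 2^33, you get 11000000512, which is as close to 1.1e10 as a single-precision base-2 floating-point number can get. With a 24-bit mantissa and an exponent of 33, adjacent float values are 2^10 = 1024 apart, so 11000000000 itself cannot be stored.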
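
The field breakdown above can be checked with a short program. This is only a sketch: it assumes a hosted C99 compiler where float is a 32-bit IEEE-754 single (it is not written for the C166 toolchain), and the names float_fields, decode_float and print_fields are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields of an IEEE-754 single-precision value (illustrative type). */
typedef struct {
    unsigned sign;      /* 1 bit */
    int      exponent;  /* unbiased: stored exponent minus 127 */
    uint32_t fraction;  /* 23 stored bits, without the implicit leading 1 */
} float_fields;

/* Assumption: float is a 32-bit IEEE-754 single on this compiler. */
float_fields decode_float(float f)
{
    uint32_t bits;
    float_fields out;

    memcpy(&bits, &f, sizeof bits);            /* copy the raw bit pattern */
    out.sign     = (unsigned)(bits >> 31);
    out.exponent = (int)((bits >> 23) & 0xFFu) - 127;
    out.fraction = bits & 0x7FFFFFu;
    return out;
}

/* Prints the three fields of f on one line. */
void print_fields(float f)
{
    float_fields d = decode_float(f);
    printf("sign=%u exponent=%d fraction=0x%06lX\n",
           d.sign, d.exponent, (unsigned long)d.fraction);
}
```

Calling print_fields(1.0e10f) and print_fields(1.1e10f) should report exponent 33 for both, with fractions 0x1502F9 and 0x23E9AC, which are the 23-bit mantissas listed above written in hex.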

    When you subtract these numbers, 11000000512 - 10000000000, you get 1000000512. If you round this to 7 significant digits (this is what is normally done in printf and in the debugger), you get 1.000001e9, which is the result that you see.
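
The same arithmetic can be reproduced on a desktop compiler. The sketch below assumes 32-bit IEEE-754 floats and a standard hosted printf; float_difference and show_difference are hypothetical helper names, and the variable names follow the original question.

```c
#include <stdio.h>

/* Returns test2 - test1 computed from single-precision operands. */
float float_difference(void)
{
    /* volatile keeps the compiler from folding the values at higher precision */
    volatile float test1 = 1.0e10f;   /* stored exactly as 10000000000 */
    volatile float test2 = 1.1e10f;   /* nearest float is 11000000512 */
    return test2 - test1;
}

/* Prints the difference with all integer digits, then in %e form. */
void show_difference(void)
{
    float dummy = float_difference();
    printf("%.0f\n", dummy);   /* every integer digit of the stored value */
    printf("%e\n", dummy);     /* default %e: 7 significant digits */
}
```

On such a system the first line should show 1000000512 and the second 1.000001e+09. The subtraction itself is exact; the extra 512 comes from storing 1.1e10 as a float.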

    As for using double precision, make sure that the Double-precision Floating-point box is checked in the C166 compiler options. Otherwise, double is implemented as a float.
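
One way to confirm that double really is wider than float in a given build is to compare their sizes and repeat the subtraction. This is a sketch for a hosted compiler with 64-bit IEEE-754 doubles, not a statement about how C166 behaves internally; the function names are invented for the example.

```c
#include <stdio.h>

/* Nonzero when double occupies more storage than float in this build. */
int double_is_wider_than_float(void)
{
    return sizeof(double) > sizeof(float);
}

/* Same subtraction as in the question, with double operands. */
double double_difference(void)
{
    volatile double test1 = 1.0e10;
    volatile double test2 = 1.1e10;   /* exact when double has a 53-bit mantissa */
    return test2 - test1;
}

/* Prints both type sizes and the double-precision difference. */
void report_double(void)
{
    printf("sizeof(float)=%u sizeof(double)=%u\n",
           (unsigned)sizeof(float), (unsigned)sizeof(double));
    printf("%.0f\n", double_difference());
}
```

If the sizes come out equal, double is being treated as float and the result will still carry the extra 512. With a true 64-bit double both operands are exact and the difference is 1000000000.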

    Jon
