Floating Addition - Losing data

Hello,

I'm using floating-point math in my application and have run into a little problem with floating-point addition. It seems that the addition operation loses data at some point. I set up the following test program:

#include <intrins.h>

void main(void)
  {
  float t, v;
  unsigned int i;

  while(1)
    {
    // this block uses multiplication
    for(i=1; i<1000; i++)
      {
      t = 5964.1 * i;
      v = t/i;

      if(v != 5964.1)
        _nop_();
      }

    t = 0;

    // this block will continuously add  5964.1
    for(i=1; i<1000; i++)
      {
      t += 5964.1;
      v = t/i;

      if(v != 5964.1)
        _nop_();
      }
    }
  }

My project is set up using the P89C668 with all default options EXCEPT "bits to round for float compare", which is set to 1. I tried all levels of optimization and the result was the same in all cases.

I then set a breakpoint on each nop. When I simulate the program and hit run, a break occurs on the second nop. This would indicate that the addition operation lost some data.

What's happening here? Why would the addition operation not calculate the result properly while the multiplication does?

Thanks
Philip

  • However, why does this error accumulation exist?

    This is similar to the problem of buying 3 items for $1.00.

    If I go to the store and I buy 6 items, the price is $2.00. This is similar to your multiplication code.

    If I go to the store and I buy 1 item, the price is $0.33.

    If I go to the store later and buy 1 item, the price is $0.33.

    If I go to the store a third time and buy 1 item, the price is $0.33.

    Buying the items individually, I've only paid $0.99 instead of $1.00. This is equivalent to your addition code.

    The reason for the difference is that we can't represent 1/3 of a cent using the coins that we have.

    Now, with floating-point math, we have a 23-bit base-2 mantissa (24 bits counting the implied leading 1). That gives us about 7 decimal digits of precision. Don't forget that 5964.1 is not stored in base-10. It's stored in base-2, normalized, and scaled by an exponent (a power of 2), as Drew pointed out. Since 5964.1 has no exact binary representation, the value actually stored is the nearest float, roughly 5964.1001.

    The reason that iterative adding doesn't equate to a single multiplication is that every addition rounds its result to fit the storage format (IEEE-754). As the running total grows, digits from the end (binary digits, that is) are lost, and those small rounding errors accumulate over the loop, whereas the multiplication rounds only once. The same as with the 1/3 of a penny.
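To make the accumulation concrete, here is a minimal sketch in standard C. It is an illustration only, not code from this thread: it assumes a hosted compiler with IEEE-754 single-precision float (not the C51 library specifically), and the names sum_repeated and drift are made up for the example.

```c
/* Add `step` to a running total n times. The total is volatile so that
   every partial sum is stored, and therefore rounded, as a float. */
float sum_repeated(float step, unsigned int n)
{
    volatile float total = 0.0f;
    unsigned int i;

    for (i = 0; i < n; i++)
        total += step;              /* one rounding per addition */

    return total;
}

/* Absolute difference between n repeated additions and one multiply. */
float drift(float step, unsigned int n)
{
    volatile float product = step * (float)n;   /* rounded only once */
    float d = sum_repeated(step, n) - product;

    return d < 0.0f ? -d : d;
}
```

On a typical IEEE-754 implementation, drift(0.1f, 10) is nonzero: ten additions of 0.1f end up one bit above 1.0f, while the single multiply rounds to exactly 1.0f. A step that is exactly representable, such as 0.5f, shows no drift for small counts. Calling drift(5964.1f, 999) shows how far the two methods have separated on your own toolchain.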

    That's why, with floating-point math, you must use comparisons that account for TLAR ("that looks about right") instead of testing for exact equality.
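One common way to write such a comparison is a relative tolerance check. The sketch below is an illustration only, not something posted in this thread: the name nearly_equal and its rel_tol parameter are assumptions, and a suitable tolerance depends on how many operations have contributed rounding error.

```c
/* Return 1 when a and b differ by no more than rel_tol times the larger
   of their magnitudes, otherwise 0. */
int nearly_equal(float a, float b, float rel_tol)
{
    float diff  = a - b;
    float mag_a = a < 0.0f ? -a : a;
    float mag_b = b < 0.0f ? -b : b;
    float scale = mag_a > mag_b ? mag_a : mag_b;

    if (diff < 0.0f)
        diff = -diff;

    return diff <= rel_tol * scale;
}
```

In the test program, `if(v != 5964.1)` would then become something like `if(!nearly_equal(v, 5964.1f, tol))`, with tol picked to suit the application. This does in source code roughly what the "bits to round for float compare" project option mentioned in the question aims to do at the compiler level.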

    Jon
