
Floating Addition - Losing data

Hello,

I'm making use of floating-point math in my application and have run into a little problem with floating addition. It seems that the floating addition operation loses data at some point. I set up the following test program:

#include <intrins.h>

void main(void)
  {
  float t, v;
  unsigned int i;

  while(1)
    {
    // this block uses multiplication
    for(i=1; i<1000; i++)
      {
      t = 5964.1 * i;
      v = t/i;

      if(v != 5964.1)
        _nop_();
      }

    t = 0;

    // this block will continuously add  5964.1
    for(i=1; i<1000; i++)
      {
      t += 5964.1;
      v = t/i;

      if(v != 5964.1)
        _nop_();
      }
    }
  }

My project is set up for the P89C668 with all default options EXCEPT that "bits to round for float compare" is set to 1. I tried all levels of optimization and the result was the same in all cases.

I then set a breakpoint on each nop. When I simulate the program and hit run, a break occurs on the second nop. This would indicate that the addition operation lost some data.

What's happening here? Why would the addition operation not properly calculate the result while the multiplication does?

Thanks
Philip

Parents
  • Not all numbers can be represented precisely in floating-point format.

    The equals operator typically just does a bit-for-bit compare between two floats; if they're close, but not exactly the same, the comparison fails. The usual procedure is to code some tolerable range of error into the comparison, rather than using "==".

    if (fabs(v - i) < Tolerance)   /* fabs() is declared in <math.h> */
       { // v and i are "close enough" to equal
       }
    

    Since floating point math is all software on the 8051 anyway, the Keil compiler helps with this problem by giving you an option of how fuzzy you want equality to be. The "bits to round for compare" option (or FLOATFUZZY directive) controls how close floats must be to compare equal.

    However, that setting only affects comparisons. It doesn't help with accumulated error due to iteration. Adding something to itself 1000 times gives you a thousand opportunities to accumulate error. Multiplication gives you one. When I run the code on the simulator, v comes out to be 5964.101, not 5964.1(00).

    An IEEE 754 single-precision float has 23 bits of mantissa. 5964.1 is (1.)01110100110000011001101, which is really 5964.1001. 5964.101 is (1.)01110100110000011001111, which is really 5964.1011.

    5964.100 -> 01110100110000011001101
    5964.101 -> 01110100110000011001111
    

    Note that these values are different in the next-to-last bit position. If you only round off 1 bit, then the two values will not compare equal. You'd need to use even more "fuzz" if you want the two values to compare equal.
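
    If you want to see the stored bit patterns for yourself, here is a minimal sketch (my own addition, not part of the Keil tools; it assumes a hosted C compiler where float is the same 32-bit IEEE-754 single-precision format, and dump_float is just an illustrative helper):

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    /* Print the sign, biased exponent, and 23-bit mantissa of a float. */
    static void dump_float(float f)
      {
      uint32_t bits;
      memcpy(&bits, &f, sizeof bits);              /* reinterpret the 4 bytes */
      printf("%12.6f  sign=%u  exp=%3u  mantissa=%06lX\n",
             f,
             (unsigned)(bits >> 31),               /* 1 sign bit              */
             (unsigned)((bits >> 23) & 0xFFu),     /* 8 biased exponent bits  */
             (unsigned long)(bits & 0x7FFFFFu));   /* 23 mantissa bits        */
      }

    int main(void)
      {
      dump_float(5964.100f);   /* mantissa ends ...1101 (see listing above) */
      dump_float(5964.101f);   /* mantissa ends ...1111                     */
      return 0;
      }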

Children
  • Correct; in my example above, only 1 bit is rounded off for the comparison, and the comparison is only there to show how repeated addition accumulates error.

    However, why does this error accumulation exist? If I have two values that are within the limits of type float and the operation doesn't overflow those limits, why would error occur?

    To me, a software floating addition routine would not purposely shorten the operation such that precision is lost.

    Are you sure this is what one should expect?

    Phil

  • However, why does this error accumulation exist?

    This is similar to the problem of buying 3 items for $1.00.

    If I go to the store and I buy 6 items, the price is $2.00. This is similar to your multiplication code.

    If I go to the store and I buy 1 item, the price is $0.33.

    If I go to the store later and buy 1 item, the price is $0.33.

    If I go to the store a third time and buy 1 item, the price is $0.33.

    Buying the items individually, I've only paid $0.99 instead of $1.00. This is equivalent to your addition code.

    The reason for the difference is that we can't represent 1/3 of a cent using the coins that we have.

    Now, with floating-point math, we have a 23-bit base-2 mantissa. That gives us about 7 decimal digits of precision. Don't forget that 5964.1 is not stored in base-10. It's stored in base-2 and normalized and is expanded/contracted with an exponent (power of 2) as Drew pointed out.

    The reason that iterative adding doesn't equate to a single multiplication is that at some point in the additions, the result exceeds the precision of the storage format (IEEE-754) and binary digits are lost from the end. The same as with the 1/3 of a penny.
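
    As a quick illustration of the same effect in code (my own sketch, assuming a hosted C compiler; the exact digits printed may vary, but the idea holds):

    #include <stdio.h>

    int main(void)
      {
      float sum = 0.0f;
      int   i;

      /* Add 0.1 one hundred times; each addition can lose a little,
         because 0.1 has no exact base-2 representation.              */
      for (i = 0; i < 100; i++)
        sum += 0.1f;

      printf("sum     = %.7f\n", sum);            /* typically not exactly 10 */
      printf("product = %.7f\n", 100 * 0.1f);     /* a single multiplication  */
      printf("equal?  = %d\n", sum == 100 * 0.1f);
      return 0;
      }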

    That's why with floating-point math, you must use comparisons that account for TLAR (that looks about right).

    Jon

  • Ahh, Good Example.

    Thanks Jon and Drew.
    Phil

  • I only wish to thank Drew Davis and Jon Ward for the superb reply.

    I have known this rule of thumb for a long time:

    "Never compare floats without a precision."

    But I'd never seen such a good explanation and example!

    PS:
    Thank you Jon, now I know how to save $0.01 in the shop... :)

  • Thank you Jon, now I know how to save $0.01 in the shop... :)

    Ahhh. But be careful. Some cash register systems can be programmed to always round-up and in such a case you would pay $0.34 for each item ($1.02 for 3). :-)

    Jon

  • Yes, I've understood the last explanation. But in my program, if I execute a float operation, 4.6 - 4, the result will be 0.5999999. This is terrible for a user display. The user will say "that's wrong". The question is how to resolve it; can I patch it?

  • Round the results up before displaying them.

    Stefan

  • You have two options for patching this:

    1) change the program. Make it present results cut down to a reasonable number of digits. Avoid showing the noise that's sitting in (at least) the last digit of any floating point result.

    2) change the user. Educate them about floating point comparison. Teach them the old dogma: In computing, 10.0 times 0.1 is hardly ever 1.0

  • Avoid showing the noise that's sitting in (at least) the last digit of any floating point result.
    What about noise between 99999 and 100000?

    In computing, 10.0 times 0.1 is hardly ever 1.0
    BS, it ALWAYS is if you do not use the much overused floating point.

    Erik

  • "The question is how to resolve it"

    Do you really need to use floating point?
    As has already been discussed, this will always result in such problems.

    Could you use fixed-point instead?

  • what about noise between 99999 and 100000?

    If you write it in an appropriate way for floating point numbers, that'll be 9.9999e4 and 1.00000e5. And for a 4-byte float, that's excessive displayed precision; in other words, you're showing noise. Reduce the format to %.3g and you'll get 1e+05 in both cases.
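
    For what it's worth, here is a small sketch of the display side (my own addition, assuming a hosted C compiler and its printf; the C51 library's support for these format specifiers may differ):

    #include <stdio.h>

    int main(void)
      {
      float v = 4.6f - 4.0f;        /* stored result is close to, not exactly, 0.6 */

      printf("%.7f\n", v);          /* full single precision: the noise shows      */
      printf("%.3f\n", v);          /* three decimals: 0.600                       */
      printf("%.3g\n", v);          /* three significant digits: 0.6               */

      printf("%.3g\n", 99999.0f);   /* both of these large values ...              */
      printf("%.3g\n", 100000.0f);  /* ... print as the same rounded figure        */
      return 0;
      }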

    BS, it ALWAYS is if you do not use the much overused floating point

    But if you see a manifest constant written 0.1, that practically always is floating point. I don't think I've seen any programming language in which "10.0 * 0.1" would be evaluated in some fixed point number format. Have you?

    Anyway: it's called a dogma not because it's always actually true, but because by remembering it, you'll learn something useful.

  • But if you see a manifest constant written 0.1, that practically always is floating point. I don't think I've seen any programming language in which "10.0 * 0.1" would be evaluated in some fixed point number format. Have you?
    I have programmed for more years than most and never had to use floating point.

    Floating point is the lazy man's way to handle fractions.

    Erik

  • Could you use fixed-point instead?

    That's what I ALWAYS do in systems where accuracy is important. The following is probably obvious but I'll point it out anyway.

    An int can hold any of the following value ranges EASILY:

     -32768 to 32767
    -3276.8 to 3276.7  (*10)
    -327.68 to 327.67  (*100)
    -32.768 to 32.767  (*1000)
    -3.2768 to 3.2767  (*10000)

    This can be applied to char and long types as well.
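
    A minimal sketch of how this scaled-integer idea looks in code (my own illustration, keeping the value as a whole number of hundredths):

    #include <stdio.h>

    int main(void)
      {
      /* Keep money as an integer number of cents: no representation error. */
      int price_each = 33;                   /* $0.33                        */
      int total      = 3 * price_each;       /* exactly 99 cents             */

      /* Split into whole and fractional parts only when displaying. */
      printf("total = %d.%02d\n", total / 100, total % 100);     /* 0.99 */
      return 0;
      }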

    Jon

  • "Floating point is the lazy mans way to handle fractions"
    That what I want to learn!!!Thanks a lot!
    I'm really a lazy man.