64 bit float

I know Keil does not support a 64 bit float.

Does anybody know of a way to create a 64-bit float, maybe from a 32-bit float? The original data is in a ushort. We need to create and read a 64-bit float to interface to an external device. We have no choice.

  • Is it just a matter of shifting the float after it is in an 8 byte array?

    More or less. It will probably take a couple of shifts combined with some bitwise ORing and ANDing. It probably makes sense to define bit fields representing single- and double-precision floats (sign, exponent, mantissa). After that, converting between them would probably be as simple as this:

    f64.sign = f32.sign;
    f64.exponent = f32.exponent;
    f64.mantissa = f32.mantissa;
    


    Of course, I could be missing something.
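
    As a rough sketch of the field extraction described above, the following uses shifts and masks instead of C bit fields, since bit-field layout is compiler-dependent. The names `f32_fields` and `f32_unpack` are made up for illustration. The code assumes `float` is a 32-bit IEEE 754 single and that `stdint.h` is available (on a compiler without it, a 32-bit `unsigned long` could stand in for `uint32_t`).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of an IEEE 754 single. */
typedef struct {
    uint8_t  sign;      /* 1 bit: 0 = positive, 1 = negative      */
    uint8_t  exponent;  /* 8 bits, stored with a bias of 127      */
    uint32_t mantissa;  /* 23 bits, implicit leading 1 not stored */
} f32_fields;

/* Split a float into its raw fields using shifts and masks. */
static f32_fields f32_unpack(float value)
{
    uint32_t bits;
    f32_fields f;

    /* Copy the bit pattern; assumes sizeof(float) == 4. */
    memcpy(&bits, &value, sizeof bits);

    f.sign     = (uint8_t)(bits >> 31);
    f.exponent = (uint8_t)((bits >> 23) & 0xFFu);
    f.mantissa = bits & 0x007FFFFFul;
    return f;
}
```

    For `1.0f` this gives sign 0, exponent 127 and mantissa 0. The exponent is the stored (biased) value, not the true power of two.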

  • Note that the exponent is biased, not two's complement, and the much larger range of the double-precision exponent means that it uses a different bias (1023 instead of 127).

    So you would have to extract the 8-bit exponent from the 32-bit float, widen it to 16 bits so the addition cannot overflow, add 896 (1023 - 127) to change the bias, then shift the resulting 11-bit value to its position in the double and store it.

    For the mantissa, it should be enough to just move the bits. The 23 mantissa bits of the float become the most significant bits of the double's 52-bit mantissa, and the 29 extra bits at the end should be left as zero.

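    Starting from a 16-bit integer, there will not be any denormalized values, NaN, +Inf or -Inf. Zero is the one special case left: its exponent field is all zeros and must stay zero rather than being rebiased.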
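
    The steps above could be sketched roughly as follows, using only 32-bit arithmetic since no 64-bit type is assumed to exist. The function names are hypothetical, `stdint.h` is assumed to be available, and the output byte order (most significant byte first) is a guess that has to be checked against what the external device expects. Denormals are simply flushed to zero here, which does not matter for values that start out as a 16-bit integer.

```c
#include <stdint.h>
#include <string.h>

/* Convert the bit pattern of an IEEE 754 single into the 8 bytes of the
 * equivalent double, most significant byte first (assumed byte order). */
static void f32_bits_to_f64_bytes(uint32_t f32, uint8_t out[8])
{
    uint32_t sign = f32 >> 31;
    uint16_t exp  = (uint16_t)((f32 >> 23) & 0xFFu);  /* widened exponent */
    uint32_t mant = f32 & 0x007FFFFFul;               /* 23-bit mantissa  */
    uint32_t hi, lo;
    int i;

    if (exp == 0) {
        mant = 0;        /* zero (or denormal, flushed): exponent stays 0 */
    } else if (exp == 0xFF) {
        exp = 0x7FF;     /* Inf/NaN: all-ones exponent in the double too  */
    } else {
        exp = (uint16_t)(exp + (1023 - 127));  /* change bias 127 -> 1023 */
    }

    /* High word: sign, 11-bit exponent, top 20 mantissa bits. */
    hi = (sign << 31) | ((uint32_t)exp << 20) | (mant >> 3);
    /* Low word: remaining 3 mantissa bits, then 29 zero bits. */
    lo = (mant & 0x7ul) << 29;

    for (i = 0; i < 4; i++) {
        out[i]     = (uint8_t)(hi >> (24 - 8 * i));
        out[i + 4] = (uint8_t)(lo >> (24 - 8 * i));
    }
}

/* Every 16-bit unsigned value is exactly representable as a float,
 * so going through float loses nothing. */
static void u16_to_f64_bytes(uint16_t value, uint8_t out[8])
{
    float f = (float)value;
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);  /* assumes sizeof(float) == 4 */
    f32_bits_to_f64_bytes(bits, out);
}
```

    For example, `u16_to_f64_bytes(65535, out)` should leave the bytes `40 EF FF E0 00 00 00 00` in `out`, which is the double-precision encoding of 65535.0. Reading a double back from the device would be the same steps in reverse, with the caveat that a double can hold values a float cannot.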

  • I was too fast posting. I had intended to add this link:
    en.wikipedia.org/.../IEEE_754-1985

    It has quite good examples of how the numbers are actually stored.