# At Home on the Range - Why Floating Point Formats Matter in Graphics

Posted by Sean Ellis in ARM Mali Graphics on Sep 6, 2011 4:44:15 PM

As graphics processor hardware now supports a number of different floating point data formats, it is important to understand how to select an appropriate format for your calculations, and why the choice is important. Even the simplest of operations benefit greatly from a little thought. Here is an instructive example that we recently came across.

**Animating Shaders**

If you are creating an animated shader in OpenGL ES, you will need to tell it the time. The obvious way to do this is to create a *uniform* variable called, let's say, *animation_time*, and use it to modify some aspect of the object we are drawing, such as its color, position, or texture. Let us look at some complications that can occur.

Here is a very simple animated pixel shader that uses the time to pulse a red light repeatedly:

    uniform mediump float animation_time;
    void main() {
        gl_FragColor = vec4(fract(animation_time), 0.0, 0.0, 1.0);
    }

The *fract* function returns the fractional part of the *animation_time* value, which gives us a sawtooth-like ramp over time. This is placed in the red component of the fragment color, so that the object appears to be repeatedly pulsing red.

We pass the time to the shader using C code something like this. We will assume that we are running at a reasonable frame rate, say 30 frames per second (0.033 seconds per frame), and advance the time appropriately on each frame:

    GLint location = glGetUniformLocation(myProgramObject, "animation_time");
    float animation_time = 0.0f;

    while (game_not_finished_yet)
    {
        /* glUniform1f takes only the location and the value */
        glUniform1f(location, animation_time);

        /* ... do some rendering here ... */

        /* Now advance the time (assume 30 frames/sec) */
        animation_time += 0.033f; /* 0.033s = 33ms per frame */
    }

**Lumpy Animation**

However, there's a subtle problem here that may cause your animation to go off the rails pretty quickly on most (but annoyingly not all) implementations of OpenGL ES. On a typical implementation, the animation will get jerky, then lumpy, becoming worse and worse until it finally stops altogether after about a minute and a half.

The reason is pretty hard to spot. If we put a *printf* in the loop, we see nothing amiss. The time is smoothly increasing as we go along, and yet we still see the bad behaviour. Changing the C program to use a double instead of a float has no effect, so it's not a precision issue.

Or is it?

**The Range/Precision Problem**

The key is the precision specifier on the *uniform* declaration in the first line of the shader program. The OpenGL ES shading language specifies that "mediump" precision variables need only have 10 bits of precision, which corresponds to representing numbers with about 3 significant decimal digits.

If your implementation uses this minimum (as most do - it allows the entire floating point value to fit neatly into 16 bits), the implicit conversion from C's float value to the internal value rounds off to those 3 decimal digits of precision. Since these are significant digits, the range of the value affects the absolute precision we have to work with. I will use decimal precision to illustrate what is happening, because it's easier than fiddling with binary, and it shows the problem just as well.

Initially, we see no problem. The values are small enough that 3 digits are enough to represent them exactly.

C float | mediump | color |
---|---|---|
0.0000 | 0.000 | 0.000 |
0.0330 | 0.033 | 0.033 |
0.0660 | 0.066 | 0.066 |
0.0990 | 0.099 | 0.099 |
... | ... | ... |
0.9900 | 0.990 | 0.990 |

As we pass the 1-second mark, we can no longer represent the values in three significant digits and we start to lose precision. The progress of the animation is no longer as smooth. On some frames we progress by 0.03 units, on some by 0.04.

C float | mediump | color |
---|---|---|
1.0230 | 1.02 | 0.02 |
1.0560 | 1.06 | 0.06 |
1.0890 | 1.09 | 0.09 |
1.1220 | 1.12 | 0.12 |
... | ... | ... |
9.9330 | 9.93 | 0.93 |
9.9660 | 9.97 | 0.97 |
9.9990 | 10.0 | 0.0 |

And as we reach 10 seconds, the problem gets a lot worse, as we see no change in color between multiple frames.

C float | mediump | color |
---|---|---|
10.0320 | 10.0 | 0.0 |
10.0650 | 10.1 | 0.1 |
10.0980 | 10.1 | 0.1 |
10.1310 | 10.1 | 0.1 |
10.1640 | 10.2 | 0.2 |
... | ... | ... |
99.9240 | 99.9 | 0.9 |
99.9570 | 100 | 0 |
99.9900 | 100 | 0 |
100.0230 | 100 | 0 |
... | ... | ... |

After 100 seconds, our 3 digits of precision give us no fractional part at all and the animation effectively halts forever.

Of course, in real life, the floating-point values are represented in binary, so the degradation happens in smaller steps at powers of two, but the principle is the same. With a typical 16-bit implementation of *mediump*, we start seeing degradation after a few seconds, and complete failure in less than two minutes.

Ouch.
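To get a feel for the binary case, the sketch below computes the gap between adjacent representable values at a given magnitude. It assumes a format with 10 explicit fraction bits, as in a typical 16-bit float, and it ignores denormals and the limited exponent range. The helper name `spacing_10bit` is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: gap between neighbouring values near x,
 * assuming 10 explicit fraction bits (an assumption, not a spec quote). */
static float spacing_10bit(float x)
{
    assert(x > 0.0f);
    int exponent;
    /* frexpf yields a mantissa in [0.5, 1), so x = m * 2^exponent. */
    (void)frexpf(x, &exponent);
    /* The leading bit has weight 2^(exponent - 1); 10 bits lie below it. */
    return ldexpf(1.0f, exponent - 1 - 10);
}
```

Under these assumptions the gap is about 0.001 near 1 second, about 0.008 near 10 seconds, and 0.0625 near 100 seconds. That last figure is already larger than the 0.033 we add per frame, so consecutive frames can land on the same value.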

**Higher Precision?**

Upping the precision of the uniform *animation_time* to *highp* may help on some systems, but by no means all. The OpenGL ES spec sets minimum limits on *highp* which allow hardware to implement it with the same precision as *mediump* in fragment shaders. There is an expectation that fragment shaders will be dealing primarily with colors, where a restricted range is not a big issue, so this is exactly what many implementations in fact do.

**Match Range and Precision**

Luckily, there's a simple solution that works universally.

In many animations, we want to repeat our animation over a finite time period. In our example here, we just used *fract* so the period was one second. Since we are discarding the integral part of the time, we could simply not pass it in at all and the result would be exactly the same.

So, we remove the call to *fract* in the shader, and instead put it in the C code. That means that we do not need to send in a time that increases without bound, and we can guarantee that we stay within the range where we get good precision.

Different animations may use different periods. If you are using *sin* or *cos*, it makes sense to either wrap the input time into the range 0 to 2π, or keep it to 0 to 1 and scale by 2π inside the shader.

The mention of 2π brings up one last point. A symmetric range -*n* to +*n* is twice as precise as the equivalent range 0 to 2*n* since the sign bit is always present and does not count against our allocation of precision bits. Only the absolute magnitude of *n* counts. For *sin* and *cos*, using the range -π to π is therefore a better choice than 0 to 2π, and is often more convenient as well.

**Conclusion**

In general, please consider carefully both the precision and the range of the numbers you are passing into your shaders, especially fragment shaders. Using too big a range wastes precision, and when precision is limited, the effects aren't pretty.

*Have you had interesting experiences with floating point precision? Why not share them in the comments?*

## Comments