This blog was originally published on 29th August 2013 on blogs.arm.com

Welcome to Part III of my blog series on GPU floating-point quality! This series was inspired by some interesting work by Stuart Russell over at Youi Labs™, exploring differences in behaviour of various mobile GPUs. In Part I, I described floating-point formats, and analysed Stuart’s test shader. We used that to tell us how many bits of floating-point precision a GPU’s fragment shader has. In Part II, we explored methods of floating-point rounding, and used the results to find out which GPUs were taking the easy (round-to-zero) way out, and which were using the more accurate round-to-nearest-even. In this last blog, we’ll look at another oddity of floating-point – the hole around zero – and see which mobile GPUs are designed to avoid falling into it.

A key premise of this series is that using floating-point arithmetic is tricky. Floating-point numbers behave a lot, but not exactly, like the numbers you learned about in school. If you ignore the differences, your code will work most of the time; but every so often, it will bite you. If you’re going to use floating-point for anything serious, you’re going to have to learn how it works in more detail than most – how shall I put this? – most normal people would want to. So, we’ll start this journey by talking about what the zero hole is, and how the IEEE-754 specification deals with it. Then, we’ll write a fragment shader that lets us visualize it.

## More detail than you really wanted, part 3

Throughout this series, we’ve been using a generic floating-point format similar to IEEE-754’s binary32 (which is what most people mean when they say “floating-point”). The format describes a number using a sign bit n, an eight-bit exponent E, and a twenty-four-bit significand (1.sss...). The value of such a number is

$$(-1)^{n} \times 2^{E} \times 1.sssssssssssssssssssssss$$

where E ranges (logically) from -126 to +127. The significand is a fixed-point binary number with 23 bits of fractional precision, and an implicit ‘one’ bit in the one’s place. In the first two blogs, we looked at what happens when you add numbers using this format, especially when those numbers differ in size. Today we’ll look at something much more basic: What is the set of values this format can represent?
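As a rough illustration of the formula above, here is a small Python sketch that pulls a binary32 value apart into n, E and the fraction bits, then puts it back together. The function names are made up for this example, and the stored-exponent bias of 127 comes from the IEEE-754 binary32 encoding rather than from anything stated in this blog.

```python
import struct

def decode_binary32(x):
    """Round x to binary32 and return (sign bit n, logical exponent E, 23 fraction bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    n = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF   # eight-bit field, biased by 127 (IEEE-754)
    fraction = bits & 0x7FFFFF              # the 23 "s" bits after the implicit one
    return n, stored_exponent - 127, fraction

def value_of(n, E, fraction):
    """Evaluate (-1)^n * 2^E * 1.sss... for a normal number."""
    return (-1) ** n * 2.0 ** E * (1 + fraction / 2 ** 23)

n, E, fraction = decode_binary32(-6.5)      # -6.5 = -1.625 * 2^2
print(n, E, format(fraction, "023b"))
print(value_of(n, E, fraction))
```

The sketch only handles normal numbers; the two reserved exponent values need separate treatment.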

To start with the obvious: For each value of E, there is a set of $2^{23}$ distinct positive values the number can take on. The numbers in each set are uniformly spaced along the real number line, at a distance of $2^{E-23}$ apart. So as E gets smaller and smaller, the numbers get closer and closer together, until E reaches its minimum value of -126. (That, by the way, is almost the defining property of floating-point numbers: the quantization error they introduce is roughly proportional to the size of the number. That means that the error is roughly constant in percentage terms, which is totally awesome. Well, to me, anyway.)
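The spacing claim is easy to check on real binary32 values. The following Python sketch is only an illustration (the helper name `next_up` is hypothetical); it relies on the standard binary32 bit layout, in which adding one to the bit pattern of a positive finite number gives the next representable value.

```python
import struct

def next_up(x):
    """Next binary32 value above a positive, finite binary32 value x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

for E in (0, -10, -126):
    x = 2.0 ** E
    gap = next_up(x) - x
    # The gap is 2^(E-23), so the gap relative to x is always 2^-23.
    print(E, gap == 2.0 ** (E - 23), gap / x == 2.0 ** -23)
```

The constant ratio in the last column is the "roughly constant in percentage terms" property.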

However, if you’ve been paying attention (and weren’t in on the joke already), you may have noticed something strange about the format as I’ve explained it so far: it has no way to represent the number zero! (The $2^{E}$ term is always greater than zero, and the significand is always between one and two, so the product can never be zero.) We’ve known at least since the time of al-Khwarizmi that zero is a very, very important number; so this just won’t do.

The solution to this little problem also explains another odd feature of our number format. The exponent range is from -126 to +127, which is only 254 values, but our eight-bit exponent can represent 256 values. What are the other two doing? The answer is, they’re serving as flags to indicate values that can’t be represented in the usual way, such as zero, infinity, or the ever-popular NaN (Not a Number). So for example, we can say that when E has the logical value -127, the value of the number is zero.
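To make the two "flag" exponent values concrete, here is a hedged Python sketch that looks at the raw eight-bit exponent field of a few binary32 values. The function names are illustrative, and the bias of 127 is taken from the IEEE-754 encoding: logical -127 is stored as 0, and the other reserved value is stored as 255.

```python
import struct

def exponent_field(x):
    """Raw eight-bit exponent field of x after rounding to binary32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF

def classify(x):
    """Classify x under the scheme described so far (no subnormals yet)."""
    field = exponent_field(x)
    if field == 0:
        return "flag: zero"                 # logical exponent -127
    if field == 255:
        return "flag: infinity or NaN"
    return "normal, E = %d" % (field - 127)

for x in (0.0, 2.0 ** -126, 1.0, float("inf"), float("nan")):
    print(x, "->", classify(x))
```

That leaves stored values 1 through 254 for the logical exponents -126 through +127.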

So far, so good, but look where we’ve ended up. The space between representable numbers gets steadily smaller as E decreases, until we reach the very smallest number we can represent: $2^{-126}$, which is

$$(-1)^{0} \times 2^{-126} \times 1.00000000000000000000000$$

or roughly $1.175 \times 10^{-38}$. This is a very small number, but it isn’t zero; in fact, there are infinitely many real numbers between this number and zero. The next larger number we can represent is

$$(-1)^{0} \times 2^{-126} \times 1.00000000000000000000001$$

Notice that the distance between these two numbers is $2^{-149}$, which is way smaller than $2^{-126}$. Let me put that another way: the distance between zero and the smallest positive number we can represent is eight million times bigger than the distance between that number and the next smallest number. To help visualize what this looks like, imagine a really primitive floating-point format with only four bits of fractional precision in the significand, and a minimum exponent of -4. If we plot the spacing between representable values, it looks like this:

See the huge gap between zero and the smallest representable positive number, compared to the spacing of the numbers above it? That’s the zero hole. With a 24-bit significand, it’s much worse.
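The size of the hole relative to the spacing just above it can be worked out exactly. Here is a small Python sketch, using exact fractions, for both the primitive four-bit format and binary32; the function name `hole_versus_step` is just a label for this example.

```python
from fractions import Fraction

def hole_versus_step(frac_bits, min_exp):
    """Return (smallest positive normal, spacing just above it, their ratio)."""
    smallest = Fraction(2) ** min_exp
    next_one = smallest * (1 + Fraction(1, 2 ** frac_bits))
    step = next_one - smallest
    return smallest, step, smallest / step

toy_smallest, toy_step, toy_ratio = hole_versus_step(4, -4)
print(toy_smallest, toy_step, toy_ratio)        # 1/16 1/256 16
print(hole_versus_step(23, -126)[2])            # 8388608
```

The ratio is simply 2 raised to the number of fraction bits: 16 for the toy format, and 8,388,608 (the "eight million") for binary32.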

## Who cares?

OK, so there’s a hole around zero. Does it matter? Well, it depends what you’re using floating-point for; but this blog (like Stuart’s) is all about what happens when you’re doing fairly extreme stuff. And it turns out that if you don’t do something about it, things can get weird. We tend to take it on faith that numbers in a computer behave pretty much like numbers as we learned them in school, so when they don’t, it’s upsetting. As an awkward teenager, I took great comfort from the fact that there were certain truths I could rely on; for example, given two numbers A and B, if A minus B is equal to zero, then A is equal to B. Sadly, with the format we’ve been discussing, that isn’t even close to true.
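Here is one way to see that failure, as a Python sketch rather than a model of any particular GPU. It assumes a hypothetical format that flushes every result smaller than $2^{-126}$ to zero, and it picks two inputs whose difference can be computed exactly.

```python
MIN_NORMAL = 2.0 ** -126   # smallest positive normal binary32 value

def flush_to_zero(x):
    """Model of a format with a zero hole: anything below MIN_NORMAL becomes 0."""
    return 0.0 if abs(x) < MIN_NORMAL else x

def subtract_no_subnormals(a, b):
    """a - b as a (hypothetical) unit without subnormal support would report it."""
    return flush_to_zero(a - b)

a = 1.5 * MIN_NORMAL       # both inputs are ordinary normal numbers
b = 1.0 * MIN_NORMAL
print(a == b)                                   # False
print(subtract_no_subnormals(a, b) == 0.0)      # True
```

The true difference, $2^{-127}$, lies inside the hole, so the subtraction reports zero even though the two inputs differ.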

It was this sort of consideration that led the IEEE-754 committee, after long debate, to require a solution to the problem: denormalized or subnormal numbers. The idea is beautifully simple. We already have a special exponent value to represent zero. Suppose that when the exponent has that value, instead of the usual formula

$$\text{value} = (-1)^{n} \times 2^{E} \times 1.sssssssssssssssssssssss$$

we use

$$\text{value} = (-1)^{n} \times 2^{-126} \times 0.sssssssssssssssssssssss$$

For the primitive four-bit format we looked at before, the set of representable values now looks like this:

The zero hole is gone! The space between representable values never increases as we approach zero, A minus B is zero if and only if A equals B, and all’s right with the world!
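For the primitive four-bit format, the effect can be checked by listing every representable value and measuring the gaps between neighbours. This is a minimal Python sketch using exact fractions; the helper names, and the choice to stop at exponent -3, are only for illustration.

```python
from fractions import Fraction

FRAC_BITS = 4    # fraction bits in the toy significand
MIN_EXP = -4     # minimum exponent of the toy format

def normals(max_exp):
    """All positive normal values 2^e * 1.ssss for e in MIN_EXP..max_exp."""
    return [Fraction(2) ** e * (1 + Fraction(s, 2 ** FRAC_BITS))
            for e in range(MIN_EXP, max_exp + 1)
            for s in range(2 ** FRAC_BITS)]

def subnormals():
    """Zero plus the subnormal values 2^MIN_EXP * 0.ssss."""
    return [Fraction(2) ** MIN_EXP * Fraction(s, 2 ** FRAC_BITS)
            for s in range(2 ** FRAC_BITS)]

def gaps(values):
    """Distances between neighbouring values, starting from zero."""
    ordered = sorted(values)
    return [b - a for a, b in zip(ordered, ordered[1:])]

without = gaps([Fraction(0)] + normals(-3))
with_sub = gaps(subnormals() + normals(-3))
print(without[0], without[1])        # 1/16 1/256
print(with_sub[0], with_sub[1])      # 1/256 1/256
```

Without subnormals the first gap is sixteen times the second; with them, the gaps never grow as we move toward zero.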

## Hunting for Holes

Of course, filling the zero hole isn’t free; that’s why the IEEE-754 committee argued about it. And for some applications, you can engineer your way around the zero hole. So it’s no surprise that many special-purpose processors (such as DSPs) don’t implement denormal arithmetic. What about GPUs? Can we tell which GPUs follow the IEEE standard fully, and which take the short cut? More importantly, is there a fun way to do it?

Here’s a fragment shader that does something like what we want:

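```glsl
precision highp float;
uniform vec2 resolution;
uniform float minexp;
uniform float maxexp;
void main( void ) {
    float y = (gl_FragCoord.y / resolution.y) * (maxexp - minexp);  // which bar, plus position within it
    float x = (1.0 - (gl_FragCoord.x / resolution.x));              // grey ramp across the screen
    float row = floor(y) + minexp;                                  // number of halvings for this bar
    for (float c = 0.0; c < row; c = c + 1.0) x = x / 2.0;          // halve the value row times
    for (float c = 0.0; c < row; c = c + 1.0) x = x * 2.0;          // then double it row times
    gl_FragColor = vec4(vec3(x), 1.0);                              // grey if the value survived
    if (x == 0.0) gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);          // red if it underflowed to zero
    if (fract(y) > 0.9) gl_FragColor = vec4(0.0, 0.0, 0.0, 1.0);    // black separator between bars
}
```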

What does this do? If you’ve looked at Stuart Russell’s shader, used in the two previous blogs, this will look familiar. The shader is executed at every pixel on the screen. The first line (variable y) divides the image into horizontal bars, each corresponding to an exponent value in the range from minexp to maxexp. The second line (variable x) computes an intensity value that varies linearly from nearly 1.0 (white) at the left edge of the image, to nearly 0.0 (black) at the right edge. Line 3 determines the exponent corresponding to the bar our pixel is in.

The fun happens in lines 4 and 5. Line 4 divides the grey value by 2 E times, where E is the exponent computed in line 3. Line 5 then multiplies it by 2 E times. In the world of mathematics, this would bring it back to its original value. But floating-point doesn’t work that way. If E is large enough, line 4 will cause the grey value to underflow (go to zero), after which, line 5 won’t be able to restore it.

The last three lines translate the value into a color we can see. Normally, we return the grey value; but if the value underflows to zero, we return red to show that something bad happened. Finally, we draw a thin black line between each bar, to make it easy to count and see which bar we’re in.
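The halve-then-double experiment can also be tried on a CPU. The Python sketch below is a rough model, not a description of any real GPU: it rounds every intermediate result to binary32 with round-to-nearest, and an assumed `subnormals` switch chooses between gradual underflow and flushing everything below $2^{-126}$ to zero. All names are hypothetical.

```python
import struct

MIN_NORMAL = 2.0 ** -126

def to_f32(x, subnormals):
    """Round x to binary32; optionally flush subnormal results to zero."""
    y = struct.unpack("<f", struct.pack("<f", x))[0]
    if not subnormals and abs(y) < MIN_NORMAL:
        return 0.0
    return y

def round_trip(x, row, subnormals):
    """Halve x row times, then double it row times, as the shader does."""
    x = to_f32(x, subnormals)
    for _ in range(row):
        x = to_f32(x / 2.0, subnormals)
    for _ in range(row):
        x = to_f32(x * 2.0, subnormals)
    return x

for row in (125, 126, 140):
    print(row, round_trip(0.75, row, False), round_trip(0.75, row, True))
```

Under these assumptions, a grey value of 0.75 comes back intact after 125 halvings in both models. At 126 halvings the flush-to-zero model returns 0.0 (a red pixel in the shader), while the subnormal model still returns 0.75.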

## The New (de)Normal

So what do we see, when we run this shader on a mobile GPU? Figure 1 below shows the output on an Ascend D1 smartphone using a Vivante GC4000 GPU, and on an iPad 4 using the Imagination SGX 554. Here I’ve set minexp and maxexp so that the exponent runs from -120 to -152, spanning the minimum FP32 exponent of -126. Remember, red regions correspond to numbers that the GPU cannot distinguish from zero. On these GPUs, what we see is that the grey value varies smoothly as long as the floating-point exponent is -126 or greater. When it reaches -127, the value suddenly underflows – it has fallen into the zero hole. These GPUs do not support subnormal values. Any value less than $2^{-126}$ is zero, as far as they are concerned.

Figure 2 shows the result on a Nexus 10 tablet, using ARM’s Mali™-T604. Here, instead of maintaining full precision down to $2^{-126}$ and then falling suddenly to zero, the grey value maintains full precision down to $2^{-126}$ and then underflows gradually, giving up precision a bit at a time down to a value of $2^{-149}$. The Mali-T604 supports subnormals. It can represent an additional eight million non-zero values between zero and $2^{-126}$.
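The "eight million" figure and the $2^{-149}$ floor can be read straight off the binary32 bit patterns. In this short Python sketch (the helper name `from_bits` is illustrative), the positive subnormals are the patterns with a zero exponent field and a non-zero fraction, that is, 0x00000001 through 0x007FFFFF.

```python
import struct

def from_bits(bits):
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

smallest_subnormal = from_bits(0x00000001)
largest_subnormal = from_bits(0x007FFFFF)
smallest_normal = from_bits(0x00800000)
subnormal_count = 0x007FFFFF              # one value per non-zero fraction pattern

print(smallest_subnormal == 2.0 ** -149)                     # True
print(smallest_normal - largest_subnormal == 2.0 ** -149)    # True
print(subnormal_count)                                       # 8388607
```

So the subnormals step evenly from $2^{-149}$ up to just below $2^{-126}$, with the same spacing as the normal numbers immediately above them.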

We’ve run this shader on a lot of GPUs, and found that support for denormals is rare. In fact, as far as we know, the Mali Midgard™ series is the only mobile GPU family that offers it. But as GPU computing becomes more and more important, and we move into the world of heterogeneous computing, it will be essential that computations on the GPU give the same results as on the CPU. When that day comes, we’ll be ready – and that day is right around the corner. We’re proud that the Mali-T604 and its successors are taking the lead in offering the highest quality floating-point available in modern GPUs.

## What next?

We could have tons more fun investigating floating-point behaviour in GPUs. Do they follow IEEE 754 precision requirements for operations? How do they handle NaNs and infinities? (For example, what is the result of a divide by zero?) And how good are they, really, at evaluating transcendental functions? We’re confident that the Mali Midgard GPUs would do very well in that sort of competition; after all, they are the only mobile GPUs that pass the vicious precision requirements of full profile OpenCL. (How vicious? Hint: your desktop CPU with standard C math libraries would fail miserably.)

But these questions will have to wait; three blogs in a row on precision is about all I can stand. There’s lots of other fun stuff going on in the wake of SIGGRAPH - notably the release of Samsung’s new Exynos 5 Octa, featuring a screaming fast Mali-T628 MP6. And there’s some technology news in the works that we’re pretty excited about, starting with the Forward Pixel Kill technique described in Sean Ellis’s recent blog. So we’ll come back to precision one day, but for now, so long, and thanks for all the bits…

Previous blogs in this series: