Killing Pixels - A New Optimization for Shading on ARM Mali GPUs

September 11, 2013

Invisible pixels are expensive

Shading pixels is expensive, so you want to make sure that you don't spend time and energy shading pixels that will not actually make it to the screen. To address this, ARM® Mali™ GPUs are pioneering a novel optimization.

But before we jump to the solution, why do we have invisible pixels in the first place? For an exploration of when pixels are and are not, you might also enjoy Ed Plowman's "Of Philosophy and When is a Pixel Not a Pixel?"

The colour of every pixel on the screen is determined by a shader program. Each object typically has a different program associated with it, and one thread of execution is spawned for every pixel in the object. Once launched, these threads are committed to complete (unless they execute a "discard" instruction to terminate themselves), and they then pass the calculated colour to the blending unit where it is combined with the existing pixel value in the output image.

The key problem here is overdraw - nearby objects will be drawn over more distant objects, hiding them. There's no point drawing the Emerald City on the horizon in huge detail if there's a hill in the foreground occluding (hiding) it. If you have already spent the time and effort rendering the emerald pixels before discovering that they will be overdrawn, then this is a waste of performance, time, battery life, and possibly karma.

Reducing the load

There are several existing approaches that aim to reduce the cost of overdrawn pixels. The first is for the application to use its knowledge of the scene to avoid sending hidden geometry to the graphics driver at all. This works well in closed, room-based games but requires additional logic in the game engine. For common classes of scene, it is also quite difficult to work out which objects are occluding others.

Even if you do eliminate some of the more distant geometry, there will still be cases where the geometry you do draw is hidden. Perhaps there's an enemy player in the same room as you - you can see their helmet, but the rest of them is behind a crate. You don't want to shade the pixels for the whole character when only the top of their helmet is visible.

Using a simple depth-buffer, together with "early" depth testing, it is possible to determine that the pixels from a more distant object are hidden by the pixels from the nearer one before we start shading the pixels.

By sorting the objects in order of increasing distance, and drawing the nearest objects first, it is possible to help the process along and eliminate most of the hidden pixels in an overdrawn image. Of course, it's not possible to do the reverse, as the pipeline is not psychic and cannot know what is going to be drawn afterwards... but hold that thought.
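The effect of draw order on early depth testing can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the scene, the `count_shaded` and `fragments_for` helpers, and the four-pixel "screen" are hypothetical, and a smaller depth value is assumed to mean nearer to the camera.

```python
def count_shaded(fragments):
    """Count fragments that survive an early depth test, in submission order.

    Each fragment is (pixel, depth); a smaller depth is nearer the camera.
    """
    depth_buffer = {}  # pixel -> nearest depth seen so far
    shaded = 0
    for pixel, depth in fragments:
        if depth < depth_buffer.get(pixel, float("inf")):
            depth_buffer[pixel] = depth  # passes the test: record depth and shade
            shaded += 1
        # otherwise the fragment is rejected before any shading work is done
    return shaded


# Hypothetical scene: three opaque layers that each cover the same four pixels.
layers = [("city", 30.0), ("hill", 10.0), ("crate", 2.0)]


def fragments_for(order):
    """Expand each layer into one fragment per pixel, in the given draw order."""
    return [(pixel, depth) for _, depth in order for pixel in range(4)]


back_to_front = sorted(layers, key=lambda layer: layer[1], reverse=True)
front_to_back = sorted(layers, key=lambda layer: layer[1])

print(count_shaded(fragments_for(back_to_front)))  # 12: every fragment is shaded
print(count_shaded(fragments_for(front_to_back)))  # 4: only the nearest layer is shaded
```

In this model the depth test can only reject a fragment if something nearer has already been drawn, which is why the back-to-front order gains nothing from it.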

But front-to-back sorting has some other problems.

For semi-transparent objects, front-to-back is exactly the wrong order to draw them in, as they need to be blended with the objects behind them. And just sorting the objects in the first place takes time. Even worse, the structure of modern graphics APIs (OpenGL ES® and Direct3D®) doesn't really include the concept of "object in a scene" at all, so you have to keep track of this yourself and draw in an acceptable order.
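To see why front-to-back is the wrong order for semi-transparent content, here is a minimal sketch using the standard "over" blend (source colour weighted by alpha, plus destination weighted by one minus alpha). The `render` helper and the wall and glass values are hypothetical, and for simplicity every draw is assumed to write depth, which real applications often avoid for transparent surfaces.

```python
def blend_over(src, alpha, dst):
    """Standard 'over' blend: src * alpha + dst * (1 - alpha), per channel."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))


BACKGROUND = (0.0, 0.0, 0.0)

# Hypothetical draws for a single pixel: (RGB colour, alpha, depth).
wall = ((0.0, 0.0, 1.0), 1.0, 10.0)   # opaque blue wall, far away
glass = ((1.0, 0.0, 0.0), 0.5, 2.0)   # half-transparent red glass, nearby


def render(draws):
    """Draw in submission order with a depth test, assuming every draw writes depth."""
    colour, nearest = BACKGROUND, float("inf")
    for src, alpha, depth in draws:
        if depth < nearest:
            colour = blend_over(src, alpha, colour)
            nearest = depth
    return colour


print(render([wall, glass]))  # (0.5, 0.0, 0.5): the wall shows through the glass
print(render([glass, wall]))  # (0.5, 0.0, 0.0): the wall is rejected and never blended
```

Drawn back-to-front, the glass is blended over the wall as intended. Drawn front-to-back, the wall fails the depth test, so the glass ends up blended with the empty background instead.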

Another way to avoid work is to defer as much shading as possible, by first running a quick pass that just calculates the depths and stores the data about which object is in front at each pixel, and only after all the pixels have been calculated, running the full lighting calculation.
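The two-pass idea can be sketched as follows. This is an illustrative model rather than any particular engine's implementation: `deferred_render`, `expensive_shade`, and the scene are hypothetical, and all fragments are assumed to be opaque with fixed depths.

```python
def deferred_render(fragments, shade):
    """Two-pass sketch: resolve visibility first, then shade each pixel once."""
    visible = {}  # pixel -> (depth, object name) of the nearest fragment so far
    # Pass 1: cheap depth-only pass that records which object is in front.
    for pixel, depth, obj in fragments:
        if pixel not in visible or depth < visible[pixel][0]:
            visible[pixel] = (depth, obj)
    # Pass 2: run the expensive shading exactly once per covered pixel.
    return {pixel: shade(obj) for pixel, (_, obj) in visible.items()}


shade_calls = []  # records every expensive shading invocation


def expensive_shade(obj):
    """Stand-in for a full lighting calculation."""
    shade_calls.append(obj)
    return obj.upper()


# Hypothetical scene drawn back-to-front: the crate covers only pixels 0 and 1.
fragments = (
    [(p, 30.0, "city") for p in range(4)]
    + [(p, 10.0, "hill") for p in range(4)]
    + [(p, 2.0, "crate") for p in range(2)]
)

image = deferred_render(fragments, expensive_shade)
print(len(fragments), len(shade_calls))  # 10 4
print(image)  # {0: 'CRATE', 1: 'CRATE', 2: 'HILL', 3: 'HILL'}
```

Ten fragments are submitted, but only four shading calls are made, regardless of draw order. The sketch also hints at the weakness: it relies on a single opaque winner per pixel, so a fragment that needs blending or writes its own depth does not fit the model.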

This works extremely well. At least, it works well until you come across something that breaks the rules. Perhaps it's a pixel which writes its own depth. Perhaps it's a semi-transparent object. As soon as that happens, you have to fall back into a more "brute force" mode of operation in order to keep track of the additional data. The fail-over isn't soft, either: performance decreases markedly as soon as any special cases are detected, and these are becoming common as game engines strive for more and more realism.

And so, with the inevitability of a rhetorical question at the end of the introduction to a technology article, what can we do about it?

Forward Pixel Kill

Our answer is a patented technology known as Forward Pixel Kill (FPK), which is included in ARM Mali GPUs from Mali-T62X and T678 onwards (such as the Mali-T628 MP6 in the recently announced Samsung Exynos 5420).

In an FPK-enabled GPU, the threads that colour the pixels are not irrevocably committed to complete once they are launched. Calculations already in flight can be terminated at any time if we spot that a later thread will write opaque data to the same pixel location. Since each thread takes a finite time to complete, we have a window in time which we can exploit to kill pixels already in the pipeline. In effect, we exploit the depth of the pipeline to emulate the "psychic" seeing-into-the-future effect that I alluded to earlier.

In fact, it's possible to do even better than this. By adding a simple FIFO buffer to the start of the pipeline, we can extend the forward pixel kill zone, making it more likely to spot overdraw, and at the same time giving the pipeline the chance to kill threads before they are even started.
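The kill-zone idea can be modelled in a few lines. To be clear, this is a toy simulation and not a description of the actual Mali hardware: the `shaded_with_kill_window` function, the `window` parameter (standing in for the combined depth of the FIFO and pipeline), and the scene are all assumptions made for illustration.

```python
from collections import deque


def shaded_with_kill_window(fragments, window):
    """Count fragments that finish shading when in-flight work can be killed.

    fragments: (pixel, depth, opaque) tuples in draw order; smaller depth is nearer.
    window: how many fragments can be in flight (and still killable) at once.
    """
    depth_buffer = {}
    in_flight = deque()  # entries are [pixel, alive] lists, oldest first
    shaded = 0

    def retire():
        """The oldest fragment leaves the kill zone; it counts only if still alive."""
        nonlocal shaded
        _, alive = in_flight.popleft()
        if alive:
            shaded += 1

    for pixel, depth, opaque in fragments:
        if depth >= depth_buffer.get(pixel, float("inf")):
            continue  # early depth test: hidden by something already drawn
        if opaque:
            depth_buffer[pixel] = depth
            # A later opaque fragment kills earlier work on the same pixel.
            for entry in in_flight:
                if entry[0] == pixel:
                    entry[1] = False
        in_flight.append([pixel, True])
        if len(in_flight) > window:
            retire()
    while in_flight:
        retire()
    return shaded


# Hypothetical worst case: three opaque layers over four pixels, drawn back-to-front.
scene = [(p, depth, True) for depth in (30.0, 10.0, 2.0) for p in range(4)]

print(shaded_with_kill_window(scene, window=0))  # 12: no kill zone, everything is shaded
print(shaded_with_kill_window(scene, window=4))  # 4: matches front-to-back drawing
```

In this model the kill only succeeds when the overlapping fragments are in flight at the same time, so the benefit depends on how close together in the stream the overdrawn fragments arrive.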

This all works particularly well with tile-based renderers such as ARM Mali GPUs. With even a modest kill zone, this can produce results that are as good as front-to-back drawing order, but without the requirement to sort the scene (with its consequent overhead in silicon area, power and memory bandwidth). So there is no need to modify your application to add a sorting algorithm. Also, since drawing proceeds in the same natural order, semi-transparent content works properly without expensive workarounds that degrade performance.

And the best thing is that the transition between operating regimes is soft - more like a steady speed adjustment than a gear change. Inconsistent frame rates (sometimes known as "jank") are extremely annoying to users, so any technique which significantly reduces the uncertainty in scene rendering time will be popular with users and developers alike.

Comments

Maxim Mogilnitsky over 11 years ago

Very impressive. Just to be sure: as far as I know, a competing solution, PowerVR from Imagination, has similar technologies. To my knowledge, these technologies are patented from "top to bottom" of the GPU processing pipeline. They are, of course, very closely guarded as one of Imagination's major assets. Furthermore, I have heard that Imagination continues to enlarge this asset by adding more and more patents in this area. On the other hand, as far as I understand it, deferred GPU technology is an absolute must to achieve decent performance. So how?