Killing Pixels - A New Optimization for Shading on ARM Mali GPUs

Sean Ellis
September 11, 2013

Invisible pixels are expensive



Shading pixels is expensive, so you want to make sure that you don't spend time and energy shading pixels that will not actually make it to the screen. To address this, ARM® Mali™ GPUs are pioneering a novel optimization.

But before we jump to the solution, why do we have invisible pixels in the first place? For an exploration of when a pixel is and is not a pixel, you might also enjoy Ed Plowman's "Of Philosophy and When is a Pixel Not a Pixel?"

The colour of every pixel on the screen is determined by a shader program. Each object typically has a different program associated with it, and one thread of execution is spawned for every pixel in the object. Once launched, these threads are committed to complete (unless they execute a "discard" instruction to terminate themselves), and they then pass the calculated colour to the blending unit where it is combined with the existing pixel value in the output image.

The key problem here is overdraw - nearby objects will be drawn over more distant objects, hiding them. There's no point drawing the Emerald City on the horizon in huge detail if there's a hill in the foreground occluding (hiding) it. If you have already spent the time and effort rendering the emerald pixels before discovering that they will be overdrawn, then this is a waste of performance, time, battery life, and possibly karma.

 
Reducing the load


There are several existing approaches that aim to reduce the cost of overdrawn pixels. The first is for the application to use its knowledge of the scene to avoid sending hidden geometry to the graphics driver at all. This works well in closed, room-based games but requires additional logic in the game engine. For common classes of scene, it is also quite difficult to work out which objects are occluding others.

Even if you do eliminate some of the more distant geometry, there will still be cases where the geometry you do draw is hidden. Perhaps there's an enemy player in the same room as you - you can see their helmet, but the rest of them is behind a crate. You don't want to shade the pixels for the whole character when only the top of their helmet is visible.

Using a simple depth buffer, together with "early" depth testing, it is possible to determine that the pixels from a more distant object are hidden by the pixels from the nearer one before we start shading the pixels.

By sorting the objects in order of increasing distance, and drawing the nearest objects first, it is possible to help the process along and eliminate most of the hidden pixels in an overdrawn image. Of course, it's not possible to do the reverse, as the pipeline is not psychic and cannot know what is going to be drawn afterwards... but hold that thought.
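To make the effect of draw order concrete, here is a minimal sketch of an early depth test at a single pixel. It is a toy model, not how any particular GPU implements the test: the `count_shaded` function, the `Fragment` tuple and the scene contents are all invented for illustration, and every object is assumed to be opaque.

```python
from typing import List, Tuple

# Each fragment is (object_name, depth); smaller depth means nearer the viewer.
# These names and values are hypothetical, chosen only for illustration.
Fragment = Tuple[str, float]


def count_shaded(fragments: List[Fragment]) -> int:
    """Count fragments that pass an early depth test at a single pixel."""
    nearest = float("inf")   # depth buffer cleared to "far"
    shaded = 0
    for _name, depth in fragments:
        if depth < nearest:  # early depth test: runs before any shading
            nearest = depth
            shaded += 1      # only now would a shader thread be launched
    return shaded


# Three opaque objects covering the same pixel.
scene = [("emerald_city", 100.0), ("hill", 10.0), ("crate", 2.0)]

back_to_front = sorted(scene, key=lambda f: f[1], reverse=True)
front_to_back = sorted(scene, key=lambda f: f[1])

print(count_shaded(back_to_front))  # 3: every fragment is shaded, two are wasted
print(count_shaded(front_to_back))  # 1: the nearest fragment rejects the rest
```

In this model the depth test only ever rejects fragments that arrive after something nearer, which is why back-to-front submission gets no benefit from it.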

But front-to-back sorting has some other problems.

For semi-transparent objects, front-to-back is exactly the wrong order to draw them in, as they need to be blended with the objects behind them. And just sorting the objects in the first place takes time. Even worse, modern graphics APIs (OpenGL ES® and Direct3D®) don't really include the concept of an "object in a scene" at all, so you have to keep track of this yourself and draw in an acceptable order.
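The order dependence of blending can be checked with a little arithmetic. The sketch below assumes the common "source over" blend (result = alpha * source + (1 - alpha) * destination); the `blend_over` helper and the layer colours are hypothetical and are not taken from any graphics API.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """"Source over" blend of one layer onto an opaque destination colour."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


background = (0.0, 0.0, 0.0)          # opaque black
far_red = ((1.0, 0.0, 0.0), 0.5)      # half-transparent red, further away
near_blue = ((0.0, 0.0, 1.0), 0.5)    # half-transparent blue, nearer

# Back-to-front: far layer first, then the near layer on top of it.
correct = blend_over(*near_blue, blend_over(*far_red, background))
# Front-to-back: near layer first, then the far layer wrongly drawn over it.
wrong = blend_over(*far_red, blend_over(*near_blue, background))

print(correct)  # (0.25, 0.0, 0.5)
print(wrong)    # (0.5, 0.0, 0.25)
```

The two results differ, so a sort order chosen to help the depth test produces the wrong image as soon as these layers are involved.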

Another way to avoid work is to defer as much shading as possible. A quick first pass just calculates the depths and stores the data about which object is in front at each pixel. Only after all the pixels have been processed in this way does the full lighting calculation run.
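As a rough illustration of the two-pass idea, the sketch below records the nearest fragment per pixel and then shades each pixel once. It assumes every fragment is opaque and does not write its own depth, and all names (`depth_prepass`, `shade_visible`, the scene data) are made up for this example.

```python
from typing import Dict, List, Tuple

# A fragment is (pixel, depth, object_name); smaller depth is nearer.
Fragment = Tuple[int, float, str]


def depth_prepass(fragments: List[Fragment]) -> Dict[int, Fragment]:
    """Pass 1: cheap pass that records only the nearest fragment per pixel."""
    nearest: Dict[int, Fragment] = {}
    for frag in fragments:
        pixel, depth, _name = frag
        if pixel not in nearest or depth < nearest[pixel][1]:
            nearest[pixel] = frag
    return nearest


def shade_visible(nearest: Dict[int, Fragment]) -> Dict[int, str]:
    """Pass 2: run the expensive shading once per pixel, for the winner only."""
    return {pixel: f"shaded({name})"
            for pixel, (_p, _d, name) in nearest.items()}


fragments = [
    (0, 100.0, "emerald_city"), (1, 100.0, "emerald_city"),
    (0, 10.0, "hill"),
    (1, 50.0, "tree"), (0, 2.0, "crate"),
]
visible = depth_prepass(fragments)
image = shade_visible(visible)
print(image)  # {0: 'shaded(crate)', 1: 'shaded(tree)'}
```

Five fragments go in and two are shaded, regardless of submission order. The sketch has no way to represent a semi-transparent layer, because only one fragment per pixel survives the first pass.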

This works extremely well. At least, it works well until you come across something that breaks the rules. Perhaps it's a pixel which writes its own depth. Perhaps it's a semi-transparent object. As soon as that happens, you have to fall back to a more "brute force" mode of operation in order to keep track of the additional data. The fail-over isn't soft, either: performance decreases markedly as soon as any special cases are detected, and these are becoming common as game engines strive for more and more realism.

And so, with the inevitability of a rhetorical question at the end of the introduction to a technology article, what can we do about it?

Forward Pixel Kill


Our answer is a patented technology known as Forward Pixel Kill (FPK), which is included in ARM Mali GPUs from Mali-T62X and T678 onwards (such as the Mali-T628 MP6 in the recently announced Samsung Exynos5420).

In an FPK-enabled GPU, the threads that colour the pixels are not irrevocably committed to complete once they are launched. Calculations already in flight can be terminated at any time if we spot that a later thread will write opaque data to the same pixel location. Since each thread takes a finite time to complete, we have a window in time which we can exploit to kill pixels already in the pipeline. In effect, we exploit the depth of the pipeline to emulate the "psychic" seeing-into-the-future effect that I alluded to earlier.

In fact, it's possible to do even better than this. By adding a simple FIFO buffer to the start of the pipeline, we can extend the forward pixel kill zone, making it more likely to spot overdraw, and at the same time giving the pipeline the chance to kill threads before they are even started.
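The idea can be sketched with a toy simulation. This is not a description of the Mali hardware, whose internals are not covered here: `simulate_fpk`, the `window` parameter and the fragment stream are hypothetical, the whole pipeline is collapsed into a single queue, and fragments are assumed to have already passed the early depth test, so that a later fragment at the same pixel is in front of every earlier one.

```python
from collections import deque
from typing import Deque, List, Tuple

# A fragment is (pixel, object_name, opaque). Toy model, all names hypothetical.
Fragment = Tuple[int, str, bool]


def simulate_fpk(fragments: List[Fragment], window: int) -> List[Fragment]:
    """Return the fragments that end up shaded when up to `window` fragments
    can wait in the kill zone (window=0 disables killing entirely)."""
    fifo: Deque[Fragment] = deque()
    shaded: List[Fragment] = []
    for frag in fragments:
        pixel, _name, opaque = frag
        if opaque:
            # A later opaque write makes queued work at this pixel pointless.
            fifo = deque(f for f in fifo if f[0] != pixel)
        fifo.append(frag)
        while len(fifo) > window:
            shaded.append(fifo.popleft())  # left the kill zone, so it is shaded
    shaded.extend(fifo)  # drain whatever is still queued at the end
    return shaded


# Drawn back to front, so the early depth test alone rejects nothing.
stream = [
    (0, "emerald_city", True),
    (1, "emerald_city", True),
    (0, "hill", True),
    (1, "window_glass", False),  # semi-transparent: must not kill what is behind
    (0, "crate", True),
]

print(len(simulate_fpk(stream, 0)))              # 5
print(len(simulate_fpk(stream, 1)))              # 5
print([f[1] for f in simulate_fpk(stream, 2)])   # ['emerald_city', 'window_glass', 'crate']
```

With no window, or a window too short to overlap two writes to the same pixel, all five fragments are shaded. With a window of two, the hill kills the city at pixel 0 and the crate then kills the hill, while the semi-transparent glass leaves the city behind it alive at pixel 1 so the two can still be blended.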

This all works particularly well with a tile-based renderer like the ARM Mali GPUs. Even with a modest kill zone, this can produce results that are as good as front-to-back drawing order, but without the requirement to sort the scene (with the consequent overhead in silicon area, power and memory bandwidth). So there is no need to modify your application to add a sorting algorithm. Also, since drawing proceeds in the same natural order, semi-transparent content works properly without expensive workarounds that degrade performance.

And the best thing is that the transition between operating regimes is soft - more like a steady speed adjustment than a gear change. Inconsistent frame rates (sometimes known as "jank") are extremely annoying to users, so any technique which significantly reduces the uncertainty in scene rendering time will be popular with users and developers alike.

  • Sean Ellis, over 11 years ago

    Maxim,

    I don't want to comment on competing technologies directly. However, I think our approach has two very important features. The first is its relative simplicity, which means low area overhead, and the second is the fact that it copes gracefully when it encounters primitives for which Forward Pixel Kill is not appropriate.

    Sean.
