Mali Performance 3: Is EGL_BUFFER_PRESERVED a good thing?

Previous blog in the series: Mali Performance 2: How to Correctly Handle Framebuffers

This week I'm finishing off my slight diversion into the land of application framebuffer management with an analysis of EGL_BUFFER_PRESERVED, and how you determine whether it is a good technique to use. This is a question which comes up regularly when talking to our customers about user-interface development and, like many things in graphics, its efficiency depends heavily on what you are doing, so I hope this blog makes it all crystal clear (or at least slightly less murky)!

What is EGL_BUFFER_PRESERVED?

As described in my previous blog, Mali Performance 2: How to Correctly Handle Framebuffers, in normal circumstances the contents of window surfaces are not preserved from one frame to the next. The Mali driver can assume that the contents of the framebuffer are discarded, and therefore it does not need to maintain any state for the color, depth, or stencil buffers. In EGL specification terms the default EGL_SWAP_BEHAVIOR is EGL_BUFFER_DESTROYED.

When creating a window surface via EGL it can alternatively be created with EGL_SWAP_BEHAVIOR configured as EGL_BUFFER_PRESERVED. This means that the color data in the framebuffer at the end of rendering of frame N is used as the starting color in the color buffer for the rendering of frame N+1. Note that the preservation only applies to the color buffer; the depth and stencil buffers are not preserved and their value is lost at the end of every frame.

Great, I can render only what changed!

The usual mistake most people make is that they believe this technique allows them to patch a small amount of rendering over the existing framebuffer. If the only thing which has changed on screen since the previous frame is the clock incrementing one second, then I just have to modify the clock in the taskbar, right? Wrong!

Remember that most real systems are running an N-buffered rendering scheme, sometimes double-buffered, but increasingly commonly triple-buffered. The memory buffer you are rendering on top of when drawing frame N+1 is not the color buffer for frame N, but probably that for frame N-2. Far from being a simple patch operation, EGL_BUFFER_PRESERVED forces the driver to render a textured rectangle containing the color buffer from frame N into the working tile memory for frame N+1.
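To make the buffer rotation concrete, here is a minimal sketch in Python. It assumes a strict round-robin swap chain (frame k always renders into buffer k modulo the buffer count), which real window systems do not guarantee; the function name is purely illustrative and not part of any driver or EGL API.

```python
def previous_frame_in_buffer(frame, buffer_count):
    """Return which earlier frame last used the buffer that `frame` renders into.

    Assumption: a strict round-robin swap chain, so frame k renders into
    buffer k % buffer_count. Returns None if the buffer has not been used yet.
    """
    previous = frame - buffer_count
    return previous if previous >= 0 else None


# Rendering frame N+1 with N = 10:
n = 10
print(previous_frame_in_buffer(n + 1, 2))  # double-buffered: holds frame N-1
print(previous_frame_in_buffer(n + 1, 3))  # triple-buffered: holds frame N-2
```

In both cases the memory being rendered into does not contain frame N, which is why the driver has to copy frame N in before any "patch" rendering can start.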

As mentioned in one of my previous blogs, and covered by seanellis's blog on Forward Pixel Kill (FPK), some of the more recent[1] members of the Mali GPU family have support for removal of overdrawn fragments before they become a significant cost to the GPU. In cases where overdraw on top of the previous frame is opaque (no blending, and fragment shader does not call "discard"), the overdrawn parts of the readback can be suppressed and consequently do not have a performance or bandwidth impact. In addition, if you have EGL_BUFFER_PRESERVED enabled but find you want to overdraw everything, then you can always just insert a normal glClear() call at the start of the frame's rendering to prevent the readback happening at all.

Is EGL_BUFFER_PRESERVED worth using?

So, accepting the need for this full-screen readback, which is relatively straightforward once you start thinking in terms of multi-frame rendering pipelines, the next question that I get asked is "should I use EGL_BUFFER_PRESERVED for my user interface application or not?"

Like many worthwhile engineering questions, the answer is not a simple "yes" or "no", but the more subtle "it depends".

The cost of EGL_BUFFER_PRESERVED is the full-frame load of the previous frame's data (excepting that killed by FPK) to populate the frame with the correct starting color. The alternative is re-rendering the frame from scratch, starting from the clear color. Whether using EGL_BUFFER_PRESERVED is the right thing to do therefore depends on the relative cost of these two things.

  • If your UI application is compositing multiple uncompressed layers which make heavy use of transparencies, then using EGL_BUFFER_PRESERVED is probably a sensible thing to do. The cost of one single layer of readback of the previous color data will be less expensive than recreating the color from scratch via the multi-layer + blending route.
  • If you have a simple UI or 2D game which is predominantly single layer, reading from compressed textures, then EGL_BUFFER_PRESERVED is very likely to be the wrong thing to do. The bandwidth overheads of the readback of the previous frame's color will be more expensive than recreating the frame from scratch.
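The two cases above can be compared with a rough back-of-the-envelope model. This is only a sketch under stated assumptions: every layer covers the whole screen and is sampled once per pixel, the framebuffer is 4 bytes per pixel, and effects such as FPK, caching and framebuffer compression are ignored. All function names and the example figures are illustrative, not measurements.

```python
def readback_bytes(width, height, bytes_per_pixel=4):
    """Bytes read to reload the previous frame's color buffer once."""
    return width * height * bytes_per_pixel


def rerender_bytes(width, height, layers, bytes_per_texel):
    """Texture bytes read to rebuild the whole frame from the clear color.

    Assumption: each layer is full screen and sampled once per pixel.
    """
    return width * height * layers * bytes_per_texel


def prefer_preserved(width, height, layers, bytes_per_texel, changed_fraction):
    """True if the preserved path reads fewer bytes under this simple model.

    The preserved path pays the full-frame readback plus the layer reads for
    the changed fraction of the screen; the other path pays for every layer
    over the whole screen.
    """
    scratch = rerender_bytes(width, height, layers, bytes_per_texel)
    preserved = readback_bytes(width, height) + changed_fraction * scratch
    return preserved < scratch


# Four uncompressed 32-bit layers, 5% of the screen changing per frame.
print(prefer_preserved(1920, 1080, layers=4, bytes_per_texel=4.0,
                       changed_fraction=0.05))
# One layer read from a 4 bits-per-texel compressed texture.
print(prefer_preserved(1920, 1080, layers=1, bytes_per_texel=0.5,
                       changed_fraction=0.05))
```

Under these assumptions the multi-layer case favors preservation and the single compressed layer does not, matching the two bullets, but a model this crude is no substitute for measuring on the target device.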

It is obviously not always as clear cut as this, and there are shades of grey between these two extremes, so care is needed when performing any analysis. If in doubt, use the GPU performance counters to review the performance of your real application running in place on your production platform, with and without EGL_BUFFER_PRESERVED enabled. Nothing will give a better answer than measuring your real use case on a real device. Some of the other blogs in this series provide guidance on such application performance analysis, and I'll be continuing to add more material in this area over the coming months.

However, when performing such performance experiments, it is important to note that the best applications are designed explicitly to work with (or without) EGL_BUFFER_PRESERVED; it is not normally as simple as just flicking an EGL configuration switch if you want to get the most efficient solution out of either route.

It is also worth noting that in a system where both the display controller (such as Mali-DP500) and the GPU (such as Mali-T760) support ARM Frame Buffer Compression (AFBC), the bandwidth overheads of the EGL_BUFFER_PRESERVED readback can be significantly reduced: the readback bandwidth will be that of the compressed framebuffer, which is typically 25-50% smaller than the uncompressed original.
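As a worked example of what that saving means, the sketch below estimates the readback bandwidth for a 1080p, 32-bit framebuffer at 60 frames per second. The resolution, pixel format and frame rate are assumptions chosen for illustration; only the 25-50% range comes from the text above.

```python
def readback_bandwidth(width, height, fps, bytes_per_pixel=4, saving=0.0):
    """Bytes per second needed to reload the previous frame every frame.

    `saving` is the fraction by which compression shrinks the framebuffer
    (0.0 means uncompressed).
    """
    return width * height * bytes_per_pixel * fps * (1.0 - saving)


for saving in (0.0, 0.25, 0.50):
    megabytes = readback_bandwidth(1920, 1080, 60, saving=saving) / 1e6
    print(f"saving {saving:.0%}: {megabytes:.0f} MB/s")
```

With these assumed figures the readback costs roughly 498 MB/s uncompressed, falling to somewhere between about 373 MB/s and 249 MB/s with compression.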

A Better Future?

The behavior of EGL_BUFFER_PRESERVED is a nice idea, and in many cases still useful, but many of the theoretical advantages of it are lost in N-buffered systems due to the need to insert this full-frame readback of the previous frame's data.

We believe that applications, and user interfaces in particular, can be made significantly more efficient if both the application and the buffer preservation schemes available explicitly expose (and can therefore exploit) the N-buffered memory model on a particular platform. If the application knows that the system is double buffered, and it knows the delta between the current state and the state two frames ago, then it is possible to get close to the architectural ideal of only rendering and compositing the regions in memory which have changed. This has the potential to reduce the energy consumption and memory bandwidth radically for mostly steady-state user-interfaces.

EGL_KHR_partial_update

The EGL_KHR_partial_update extension is designed to allow applications to query the buffer age, which reflects the N-buffering level of the system, and then to use that information, together with knowledge of what has changed in the application logic since the buffer was last rendered, to specify the screen region "dirty rectangles" which must be rendered by the GPU.
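The bookkeeping this implies on the application side can be sketched as follows. This is a plain Python model, not EGL code: in a real application the age would come from querying EGL_BUFFER_AGE_KHR on the surface and the result would be passed to eglSetDamageRegionKHR. The helper names, the per-frame damage history and the choice of a single bounding rectangle are all assumptions made for illustration.

```python
def bounding_union(rects):
    """Smallest (x, y, w, h) rectangle covering every rectangle in `rects`."""
    x0 = min(x for x, y, w, h in rects)
    y0 = min(y for x, y, w, h in rects)
    x1 = max(x + w for x, y, w, h in rects)
    y1 = max(y + h for x, y, w, h in rects)
    return (x0, y0, x1 - x0, y1 - y0)


def buffer_damage(history, age, full_screen):
    """Region to repaint so a back buffer of the given age becomes current.

    `history[-1]` is the damage of the frame being drawn now, `history[-2]`
    that of the frame before it, and so on. An age of 0 means the buffer
    contents are undefined, so the whole surface must be redrawn; the same
    fallback is used if not enough history has been recorded.
    """
    if age == 0 or age > len(history):
        return full_screen
    return bounding_union(history[-age:])


full = (0, 0, 1920, 1080)
clock = (1800, 0, 120, 40)          # changes every frame
notification = (0, 0, 300, 40)      # changed in the previous frame only
history = [notification, clock]

print(buffer_damage(history, 1, full))  # buffer holds the previous frame
print(buffer_damage(history, 2, full))  # buffer is two frames old
print(buffer_damage(history, 0, full))  # undefined contents: full redraw
```

The older the buffer, the more past changes have to be folded into the region rendered this frame, which is the cost of exploiting N-buffering explicitly.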

The buffer aging functionality in this extension is very similar to that provided by the EGL_EXT_buffer_age extension, but critically for tile-based rendering it also provides the dirty rectangles which allow us to know ahead of time which tiles we can completely drop because they are guaranteed not to be modified. If you ever have a choice of these two extensions, use the EGL_KHR_partial_update functionality for the best performance; the Mali drivers do not expose EGL_EXT_buffer_age for this reason.
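To illustrate why knowing the dirty rectangles ahead of time matters for a tile-based renderer, the sketch below counts how many tiles a set of rectangles touches. It assumes 16x16 pixel tiles and non-empty rectangles; the functions are a hypothetical model of the idea, not a description of how the Mali driver implements it.

```python
def touched_tiles(rects, tile_size=16):
    """Set of (tile_x, tile_y) indices overlapped by any dirty rectangle.

    Rectangles are (x, y, w, h) in pixels with w > 0 and h > 0.
    """
    tiles = set()
    for x, y, w, h in rects:
        first_col, last_col = x // tile_size, (x + w - 1) // tile_size
        first_row, last_row = y // tile_size, (y + h - 1) // tile_size
        for ty in range(first_row, last_row + 1):
            for tx in range(first_col, last_col + 1):
                tiles.add((tx, ty))
    return tiles


def total_tiles(width, height, tile_size=16):
    """Number of tiles needed to cover the surface (rounding up)."""
    cols = -(-width // tile_size)
    rows = -(-height // tile_size)
    return cols * rows


clock = (1800, 0, 120, 40)
print(len(touched_tiles([clock])), "of", total_tiles(1920, 1080), "tiles")
```

For a small clock update on an assumed 1080p surface only a tiny fraction of the tiles needs any work; every other tile can be skipped entirely.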

EGL_KHR_swap_buffers_with_damage

The EGL_KHR_swap_buffers_with_damage extension provides a means for applications to provide dirty rectangle hints to the system composition process, enabling N-buffered compositors and display controllers to also benefit from the optimizations the client rendering gets from EGL_KHR_partial_update.

Do I need to use both extensions to get full benefit?

Yes. Using EGL_KHR_partial_update optimizes what the application renders using the GPU as the buffer producer; using EGL_KHR_swap_buffers_with_damage optimizes what the system compositor will have to refresh to send a valid output image to the display as the buffer consumer. The damage rectangles which the application must specify in each case are typically different, hence the need for two extensions.
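A small model shows why the two sets of rectangles differ. It assumes the application records the rectangles it changed in each frame and that the back buffer age is at least 1 with enough history available; the function and variable names are illustrative only. In real code the first list would be given to eglSetDamageRegionKHR before rendering and the second to eglSwapBuffersWithDamageKHR when presenting.

```python
def damage_for_frame(changes, frame, age):
    """Return (buffer_rects, surface_rects) for one frame.

    `changes[k]` lists the rectangles the application changed in frame k.
    Surface damage is what differs from the previous frame (for the
    compositor). Buffer damage is what differs from the old contents of the
    back buffer, i.e. the changes of the last `age` frames (for the GPU).
    Assumes 1 <= age <= frame + 1.
    """
    surface_rects = list(changes[frame])
    buffer_rects = []
    for k in range(frame - age + 1, frame + 1):
        for rect in changes[k]:
            if rect not in buffer_rects:  # drop exact duplicates
                buffer_rects.append(rect)
    return buffer_rects, surface_rects


clock = (1800, 0, 120, 40)
icon = (0, 0, 40, 40)
changes = [[(0, 0, 1920, 1080)], [clock], [clock, icon], [clock]]

# Frame 3 on a double-buffered system: the back buffer is two frames old.
buffer_rects, surface_rects = damage_for_frame(changes, frame=3, age=2)
print("GPU must redraw:       ", buffer_rects)
print("compositor must update:", surface_rects)
```

Here the GPU has to repaint both the clock and the icon, because the back buffer missed the icon change, while the compositor only needs to refresh the clock.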

Tune In Next Time

This brings me to the end of my short diversion on framebuffer management, so next time we'll be back looking at using Mali with ARM DS-5 Streamline to investigate application performance bottlenecks and optimization opportunities.

TTFN,
Pete

Footnotes

  1. FPK is supported from Mali-T620 onwards

Next blog in the series: Mali Performance 4: Principles of High Performance Rendering


Pete Harris is the lead performance engineer for the Mali OpenGL ES driver team at ARM. He enjoys spending his time working on a whiteboard and determining how to get the best out of combined hardware and software compute sub-systems. He spends his working days thinking about how to make the ARM Mali drivers even better.