Best practices to resolve typical multi-context rendering issues

November 24, 2022

This blog post explains how to resolve a rendering issue commonly seen in applications that use the camera.

What are the issues?

A common rendering issue occurs with multi-context rendering on Mali CSF (Command Stream Frontend)-based GPUs. It often shows up in camera or video related rendering.

After investigation, we found that most of these issues are caused by incorrect behavior in the application.

The following figure shows an example image with the rendering issue. The corrupted regions have sharp edges and are tile-aligned.

Common application render flow

Take the following common application logic as an example:

Create two shared EGL contexts (Context_A and Context_B).
Create a shared texture (Tex_1).
Create FBO_A1 in Context_A.
Bind Tex_1 to FBO_A1.
Render in Context_A to update the data for Tex_1. For example, get the image from the camera, apply a simple modification, and upload it to Tex_1.
Bind Tex_1 in Context_B.
Sample from Tex_1 and render in Context_B. For example, apply different beauty filters to the image and show it on screen.

The rest of this post shows how we analyzed the issue. We use pseudocode and figures to explain it step by step.

Issue investigation

Simplest case without any GL sync control

The following pseudocode shows the simplest case, where there are two contexts and each focuses on its own render task. However, there is no synchronization control on the GL side.

Figure 1 shows the actual execution sequence on the GL server side. Because GLES works asynchronously, the Thread-B GL commands may start executing in the GL server while the Thread-A GL commands are still held in the command queue. Therefore, Thread-B might sample outdated data, which leads to rendering errors.
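To make the ordering problem concrete, here is a minimal sketch in C of a toy model. It is not real GLES code and not the original pseudocode from this post: the `ContextQueue` and `GlServer` types and every function name are hypothetical, and the model assumes that the server simply executes whatever reaches it first.

```c
#include <assert.h>

/* Toy model, not driver code: each context has a client-side command queue,
 * and the "GL server" only executes commands once they are submitted to it.
 * All type and function names here are hypothetical. */

enum { MAX_CMDS = 8 };

typedef enum { CMD_RENDER_TEX1, CMD_SAMPLE_TEX1 } Command;

typedef struct {
    Command pending[MAX_CMDS];
    int count;
} ContextQueue;

typedef struct {
    int tex1_version;    /* bumped each time a render to Tex_1 executes */
    int sampled_version; /* version seen by the last sample, -1 if none */
} GlServer;

/* Queue a command on the client side; nothing executes yet. */
static void enqueue(ContextQueue *q, Command c) {
    assert(q->count < MAX_CMDS);
    q->pending[q->count++] = c;
}

/* Hand every queued command of one context to the server, in order. */
static void submit(ContextQueue *q, GlServer *s) {
    for (int i = 0; i < q->count; i++) {
        if (q->pending[i] == CMD_RENDER_TEX1) {
            s->tex1_version++;
        } else {
            s->sampled_version = s->tex1_version;
        }
    }
    q->count = 0;
}

/* Returns the Tex_1 version that Context_B samples.
 * A non-zero argument models the unlucky case where Context_B's commands
 * reach the server before Context_A's commands. */
int version_seen_by_b(int b_reaches_server_first) {
    GlServer server = { 0, -1 };
    ContextQueue ctx_a, ctx_b;
    ctx_a.count = 0;
    ctx_b.count = 0;

    enqueue(&ctx_a, CMD_RENDER_TEX1); /* Thread-A: update Tex_1 */
    enqueue(&ctx_b, CMD_SAMPLE_TEX1); /* Thread-B: sample Tex_1 */

    if (b_reaches_server_first) {
        submit(&ctx_b, &server);
        submit(&ctx_a, &server);
    } else {
        submit(&ctx_a, &server);
        submit(&ctx_b, &server);
    }
    return server.sampled_version;
}
```

In this model, `version_seen_by_b(1)` returns the stale version 0, while `version_seen_by_b(0)` returns the updated version 1. Without synchronization, nothing in the application decides which of the two orderings actually happens.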

Why does glFlush not help?

Many developers add glFlush after updating the texture data to force the Thread-A GL commands out to the GL server before waking up Thread-B.

On traditional Mali JM (Job Manager) based GPUs, the JM receives commands from all contexts, then dispatches and executes the jobs on the hardware. Therefore, the flush ensures that the commands are executed in the order shown in figure 2. This mechanism should resolve the issue on JM GPUs.

The issue, however, still occurs on Mali CSF-based GPUs, because they have multiple CSFHWIF (CSF HardWare InterFace) blocks. Each CSFHWIF block can hold one context’s command stream, and the blocks can all run in parallel. Therefore, the sampling of Tex_1 in Thread-B and the rendering to Tex_1 in Thread-A might occur at the same time. This can cause conflicts, as figure 3 shows.
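The difference between the two GPU families can be sketched with a small timing model in C. This is a deliberately simplified illustration under stated assumptions, not a description of the real hardware schedulers: the durations are invented, the JM is modeled as running flushed jobs strictly one after another, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timing model, not driver code. Durations are in arbitrary time units
 * and are assumptions chosen for illustration. */
enum {
    T_RENDER_TEX1 = 5, /* Context_A renders into Tex_1 */
    T_WAKE_B      = 1, /* Thread-B is woken right after Thread-A's glFlush */
    T_SAMPLE_TEX1 = 1  /* Context_B samples Tex_1 */
};

typedef struct { int start; int end; } Interval; /* half-open [start, end) */

static Interval interval(int start, int duration) {
    Interval i;
    assert(duration > 0);
    i.start = start;
    i.end = start + duration;
    return i;
}

static bool overlaps(Interval a, Interval b) {
    return a.start < b.end && b.start < a.end;
}

/* The render is flushed first, so it starts at t = 0 in both models. */
static Interval render_tex1(void) {
    return interval(0, T_RENDER_TEX1);
}

/* JM model: a single job manager runs the flushed jobs one after another
 * in submission order, so the sample cannot start before the render ends. */
Interval jm_sample(void) {
    int start = render_tex1().end;
    if (start < T_WAKE_B) start = T_WAKE_B;
    return interval(start, T_SAMPLE_TEX1);
}

/* CSF model: Context_B's stream runs on its own CSFHWIF block, so the
 * sample starts as soon as Thread-B has submitted it. */
Interval csf_sample(void) {
    return interval(T_WAKE_B, T_SAMPLE_TEX1);
}

/* True if the sample runs while Tex_1 is still being rendered. */
bool conflicts_with_render(Interval sample) {
    return overlaps(render_tex1(), sample);
}
```

With these assumed numbers, `conflicts_with_render(jm_sample())` is false because the sample runs in [5, 6), and `conflicts_with_render(csf_sample())` is true because the sample runs in [1, 2) while the render is still in progress.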

Solution

This section offers two methods to resolve this issue:

Method 1: add glFinish command
Method 2: use EGL Fence

Section 3.7.3.2 of the EGL Specification, Version 1.5, describes in detail the ordering of rendering operations between contexts. See the specification for the full text. The following is a screenshot from the spec:

Now let us check how each method works with our Mali CSF GPUs.

Method 1: Add glFinish command

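glFinish does not return until all previously issued GL commands in the context have completed. After changing glFlush to glFinish, the texture update and sample operations therefore execute in sequence even on a CSF GPU, as shown in figure 4.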

Method 2: Use EGL Fence

Although glFinish guarantees the execution order, the GL operations in Thread-B that come before the sampling of Tex_1 are also delayed. This can decrease performance.

A better solution is to use an EGL fence to synchronize only where needed. The following code example shows the use of the myfence object:
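As a hedged stand-in for that listing, the following C sketch is a toy timing model rather than real EGL code. It compares where Context_B's work lands on the timeline with glFinish and with a fence. The durations are assumed, `myfence` is represented only by the time at which it signals, and all type and function names are hypothetical.

```c
#include <assert.h>

/* Toy timing model, not driver code. Durations are in arbitrary time units
 * and are assumptions chosen for illustration. */
enum {
    T_RENDER_TEX1 = 5, /* Context_A renders into Tex_1 */
    T_OTHER_B     = 3, /* Context_B work that does not touch Tex_1 */
    T_SAMPLE_TEX1 = 1  /* Context_B samples Tex_1 */
};

typedef struct {
    int other_start;  /* when Context_B's unrelated work starts */
    int sample_start; /* when Context_B starts sampling Tex_1 */
    int frame_end;    /* when Context_B's frame is complete */
} TimelineB;

/* Build Context_B's timeline: the unrelated work starts at other_start,
 * and the sample starts once that work is done and Tex_1 is ready. */
static TimelineB schedule_b(int other_start, int tex1_ready) {
    TimelineB t;
    int other_end = other_start + T_OTHER_B;
    assert(other_start >= 0 && tex1_ready >= 0);
    t.other_start = other_start;
    t.sample_start = other_end > tex1_ready ? other_end : tex1_ready;
    t.frame_end = t.sample_start + T_SAMPLE_TEX1;
    return t;
}

/* glFinish in Thread-A: Thread-B is woken only after the render has
 * completed, so all of Context_B's work starts at T_RENDER_TEX1. */
TimelineB with_glfinish(void) {
    return schedule_b(T_RENDER_TEX1, T_RENDER_TEX1);
}

/* Fence: Thread-A inserts myfence after the render and wakes Thread-B
 * immediately. Only the sample waits for myfence to signal. */
TimelineB with_fence(void) {
    int myfence_signaled_at = T_RENDER_TEX1;
    return schedule_b(0, myfence_signaled_at);
}
```

With these assumed numbers, Context_B finishes at t = 9 with glFinish and at t = 6 with the fence, and in both cases the sample starts no earlier than t = 5, when the render to Tex_1 is complete. In a real application, one plausible mapping (an assumption, not taken from the original listing) is that Thread-A calls `eglCreateSync` with `EGL_SYNC_FENCE` after its draw calls and then `glFlush`, and Thread-B calls `eglWaitSync` on `myfence` just before the draw that samples Tex_1.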

Using myfence in this way allows the render operations in Thread-B that come before the sampling of Tex_1 to be pulled in earlier. As a result, the two contexts execute in parallel as much as possible. The final execution order on the CSF GPU might look like the following figure 5:

Summary

Apart from the earlier example of binding a texture to a framebuffer, the following scenarios can also cause issues:

Thread-1 uses a glTexSub* command to upload the texture data, and then Thread-2 samples the texture. The data sampled in Thread-2 might be wrong if there is no synchronization.
Thread-1 reads the texture, and then Thread-2 reuses the same texture and updates its data. The result read in Thread-1 might be corrupted if there is no synchronization control.

Modern GPUs work asynchronously, and GLES follows a client-server model. When you enable multithreaded rendering, you might encounter various issues. So we must be cautious and strictly follow the spec when designing and implementing the code.
