In the previous part of this series, we discussed the advantages of the multiview extension and how it can help you tackle problems encountered while building an application for mobile virtual reality. Chances are your application is now no longer CPU but GPU bound - more precisely, fragment bound: the time you spend shading each pixel is larger than your budget allows.
With the appearance of consumer-grade eye tracking devices, foveated rendering is becoming increasingly popular. So let us pause, sit down, and take a bit of time to understand what foveated rendering really is. Let us decouple foveated rendering from eye tracking, and look at why you might want to consider adding it to your pipeline.
As we have seen previously, multiview is the ideal extension for CPU-bound applications; but what if you are not, or no longer, CPU bound?
If you are not yet familiar with the figure below, it is an abstract representation of the process happening behind the scenes, between your CPU and GPU, when you render a frame. In light blue are the jobs related to the left eye, in violet the jobs related to the right eye, and in yellow the composition job. You may be intrigued by the lack of CPU jobs for the right eye: thanks to the multiview extension, the CPU-side jobs for the right and left eyes are combined. More about this in the blog: Optimizing Virtual Reality: Understanding Multiview.
Figure 1: Rendering timeline for a fragment bound multiview application. In light blue and violet are the jobs for the left and right eye. The composition jobs are in yellow.
As we can see on this multiview timeline, we are all good on the CPU side, as well as on the vertex processing part. Our problem is now the fragment part, which takes more time than it should. If lowering the quality of your environment is not an option, you will need to find another way.
The human eye is built in such a way that only a small region of it is truly sensitive to detail. This region, called the foveal system, is the only part where you can see sharp details, and it usually covers only about 5 degrees in the center of your vision. Moving outwards from this region, you gradually lose the ability to distinguish detail; first you can only make out contours, then, at the very edge of your Field Of View (FOV), only movement.
In virtual reality, where every drop of available performance is needed, exploiting such a characteristic of the eye's design is essential.
In current virtual reality systems, the whole screen is rendered at high resolution, no matter where you are looking. As we have seen previously, due to the design of the human eye, this is neither optimal nor efficient. Most of the high resolution information is wasted: deformation is applied to the surrounding areas, and users rarely look at the periphery.
What if we could lower the cost of rendering such areas?
Enter foveated rendering.
Foveated rendering addresses this issue by rendering the region around the fovea at a lower resolution than the fovea region itself, thus saving fragments. From now on, the fovea region will be called the inset, and everything around it the outset.
Figure 2: Foveated rendering
As you may have guessed by now, in order to use foveated rendering you will need to render your scene four times: two views per eye, one for the inset and one for the outset. This can have a major impact on both your CPU and GPU if you are not careful. That is where the multiview extension comes in handy to do the heavy lifting for you: by setting it to 4 views, we can render the whole scene four times in a single draw call. The majority of Mali powered devices allow 4-view multiview. Nevertheless, to ensure maximum compatibility across your whole user base, I would advise you to query the hardware capabilities in order to check for availability.
int maxViews = 0;
glGetIntegerv( GL_MAX_VIEWS_OVR, &maxViews ); // GL_MAX_VIEWS_OVR = 0x9631
// We need at least 4 views here; otherwise fall back to 2-view multiview.
One of the limitations of multiview is that it does not allow you to use different resolutions across your multiple textures, making it impossible to directly reduce the resolution of the outset. The trick to circumvent this problem is to zoom in on the middle part: we use four textures with the same resolution, only changing the field of view to counteract the lower effective resolution.
If we have a look at the next timeline, we can see that our vertex load has increased while the fragment load has decreased. Our application will now be able to render within the expected timeframe while keeping the same visual quality.
Figure 3: Rendering timeline for an application using foveated rendering. Compared to figure 1, note the increased geometry processing time and decreased fragment processing time.
The saving in pixels can be quite substantial, as shown in the table below (with a 1024x1024 final texture per eye):

Savings in pixels compared to 2-view multiview:
- 2-view multiview (2 x 1024x1024): 2,097,152 pixels rendered
- 4-view foveated rendering, 70% inset (4 x 716x716): 2,050,624 pixels rendered, 46,528 pixels saved
- 4-view foveated rendering, 50% inset (4 x 512x512): 1,048,576 pixels rendered, 1,048,576 pixels saved
Let us now have a look at what the application could look like if we ran it using foveated rendering with a 30% reduction (i.e. a 70% inset size).
Figure 4: View of the four final textures that will be written to during the main scene rendering process. Note that all the textures use the same resolution but different fields of view.
As you can see in figure 4, we would have four textures of 716x716: two for the left eye and two for the right eye. Each eye has one "Low" texture for the outset region and one "High" texture for the fovea region. All the textures are exactly the same size; the only thing differentiating the two kinds is the field of view.
Figure 5: Final texture layout that will be presented to the user
Figure 5 shows how the textures are assembled into the final one. The "Low" resolution textures are scaled up to fill the final texture, while the "High" ones are copied with a 1:1 mapping. Adopting such a strategy keeps native quality where the user is looking while still reducing the number of pixels that need to be shaded.
As we have seen in the previous part, the field of view is directly correlated to the inset size ratio in such a way that we can maintain a 1:1 mapping without feeling zoomed in or out. The mathematical relation is as follows, with FOV the field of view of the outset and ratio the expected inset size ratio (not the percentage of reduction):
I would like to bring your attention to the vertex shader. Let us bring back the vertex shader we wrote for the previous blog on multiview.
#version 300 es
#extension GL_OVR_multiview : enable
layout(num_views = 2) in;
in vec3 vertexPosition;
uniform mat4 MVP[2];
void main()
{
    gl_Position = MVP[gl_ViewID_OVR] * vec4(vertexPosition, 1.0f);
}
If you remember, we discussed possible driver optimizations, where the driver creates a loop so that only the gl_ViewID_OVR related lines are executed multiple times. We also discussed the benefits this optimization can bring in the vertex shader. It is good to know that nothing changes with foveated rendering; this statement still holds, but the savings on the vertex shader might not be as large as expected, relative to the two-view case.
If your application does heavy view-dependent computation in the vertex shader, you might need to rethink that part, as it will now be executed four times.
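A four-view version of the shader above could look like the following sketch. The per-view matrices are assumed to be supplied as a uniform array; the view-to-matrix assignment in the comment is an assumption, not something mandated by the extension:

```glsl
#version 300 es
#extension GL_OVR_multiview : enable
layout(num_views = 4) in;

in vec3 vertexPosition;
// One matrix per view, e.g. left inset, left outset, right inset, right
// outset (the ordering here is a hypothetical convention).
uniform mat4 MVP[4];

void main()
{
    // Runs once per view: heavy view-dependent work here now executes 4 times.
    gl_Position = MVP[gl_ViewID_OVR] * vec4(vertexPosition, 1.0f);
}
```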
To achieve true immersion we need a credible world, and needless to say, aliased lines do not add to the credibility of the environment. As we have seen in the previous blog, you can use multisampled framebuffers with multiview. If you have overlooked this option until now, I cannot stress enough how important it is when doing foveated rendering.
As you may remember from the section where we described the physical design of the human eye, the regions outside the fovea are mainly sensitive to movement. Since we are scaling up the outer regions, pixels will get "bigger", and as such aliased pixels will become more apparent.
If we look at our current solution, there is still some work we could do to optimize it further. One of the problems is that the fovea region, after being copied onto the final texture, will cover part of the outset region. Some work prior to rendering could be done in order not to shade these pixels.
The final mask would look something like this. From the inset region we only remove the corners, in order to ease the transition when we merge the two regions. On the outset, everything that will be covered by the inset is stenciled out.
Figure 6: Geometry of the mask that will be used
Let us have a look at the math for the savings, starting with the inset, with w being our texture width and ratio the expected inset size ratio (not the percentage of reduction):
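The inset equation image did not survive extraction; a reconstruction consistent with the pixel counts quoted below (the corners of the square inset trimmed down to the inscribed disc, counted for both eyes) is:

$$\text{saved}_{\text{inset}} = 2 \times w^2 \left(1 - \frac{\pi}{4}\right)$$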
And now for the outset part:
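The outset equation image is likewise missing; a reconstruction consistent with the totals quoted below (a disc of diameter ratio × w stenciled out of each outset texture, counted for both eyes) is:

$$\text{saved}_{\text{outset}} = 2 \times \pi \left(\frac{ratio \times w}{2}\right)^2$$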
Now, if we apply this for a 50% inset ratio (w = 512):
As we can see, applying this simple stencil saves about 215,456 pixels to shade, or 21%. With a 70% inset, this goes up to 615,994 pixels, or 30%.
Savings in pixels compared to 2-view multiview, with stenciling:
- 4-view foveated rendering, 70% inset: 657,937 pixels saved
- 4-view foveated rendering, 50% inset: 1,264,032 pixels saved
The foveated method we just discussed has several weaknesses. If you want to keep enough detail, in case the user looks at the periphery of their current field of view, you will need to increase the size of the textures used, and as such reduce the benefits. Eye tracking is slowly making its way into our headsets, and many experiments have bound it to game mechanics and interactions. Alternatively, we can also use it to boost performance.
In our case, we can use eye tracking to match the inset more closely to the user's gaze. We can then shrink the inset further, to around 30%, allowing even higher savings.
As we have demonstrated, foveated rendering may be able to solve some of your problems if you are fragment bound. But keep in mind the impact of rendering the view-related parts of your vertex shader four times. As such, before adding foveated rendering, you should carefully profile your application in order to determine where your bottleneck is. If you are CPU bound, foveated rendering will not help, as it only reduces the number of fragments to process; thanks to multiview, its impact on the CPU will be limited. On the other hand, if you are fragment bound, foveated rendering is the way to go. You will also need to weigh its impact on your vertex shaders, as the increase could, in some cases, make your application vertex bound. Further optimizations like stenciling might help you reduce the number of fragments to shade a bit more.
Without stenciling, the benefits of foveated rendering only start below roughly 70% inset size, as anything above that means you are rendering more pixels than 2-view multiview. From our experiments, a 50% inset size gave us the best results, quality versus performance wise.
Using a gaze tracking device, as we have seen, allows the inset size to be reduced even further, down to around 30%. In that case, careful attention needs to be paid to reducing the amount of artifacts in the outset. Multisampling is a good option for this, as it has a limited cost on ARM Mali devices compared to other post-processing or temporal antialiasing methods.