Welcome back to the third post in my Virtual Reality (VR) blog series. We’ve already discussed the reasoning behind our interest in VR as a major driving force for graphics development, but even in the time between posts VR uptake has grown tangibly and it’s become apparent that the industry is moving ahead fast. It’s only a matter of time before VR goes fully mass market. Yes, it will probably start with high-end gaming, but it is set to grow so quickly that the time to take notice is now.
In blog two we considered the difficulty of focus when developing for VR and some of the ways of managing it. This time around we’re looking at how to develop low latency VR. The latency in question is the time it takes to turn the actual motion of your head into the image you see on the screen of your VR headset. The two events need to be close enough together that you don’t notice any disconnect – just like in the real world. If latency is too high or too variable, the immersion feels unnatural and the disparity with your brain’s understanding of normal movement will start to cause nausea or dizziness – not a super fun experience. Industry research tells us that this “motion-to-photons” latency should be consistently below 20 milliseconds (ms) for a smooth and natural VR experience. That is a tough call at a standard 60Hz refresh rate, where a single refresh alone takes roughly 16.7ms, but it is attainable with the right approach.
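To put some rough numbers on that, here is a deliberately simplified back-of-envelope calculation (illustrative figures only, not measurements) showing why a conventionally buffered 60Hz pipeline tends to blow through the 20ms budget:

```c
/* Back-of-envelope illustration only: why whole-frame buffering at 60Hz
 * struggles to keep motion-to-photons latency under 20ms.                 */
#include <stdio.h>

int main(void)
{
    const double refresh_ms = 1000.0 / 60.0;        /* one refresh ~= 16.7ms */

    /* Sensors sampled at the start of frame N, the image rendered during
     * frame N, swapped at v-sync, then scanned out during frame N+1.       */
    double double_buffered_ms = 2.0 * refresh_ms;   /* ~33ms best case       */

    /* Triple buffering can queue one more frame before scan-out.           */
    double triple_buffered_ms = 3.0 * refresh_ms;   /* ~50ms worst case      */

    printf("double buffered: ~%.0f ms, triple buffered: ~%.0f ms\n",
           double_buffered_ms, triple_buffered_ms);
    return 0;
}
```

Real figures also depend on sensor latency, driver queueing and panel scan-out, but the shape of the problem is the same: waiting for whole-frame buffer swaps costs more than the entire budget, which is where the techniques below come in.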
There are a few elements which, when combined, contribute greatly to building a successful low latency VR system. The first aspect we will consider is front buffer rendering. Double or triple buffering is commonly used in graphics applications, including Android, to increase smoothness by allowing the GPU to draw pixels into a rotating set of off-screen buffers, swapping one with the on-screen buffer at the end of each display refresh. This process helps iron out variations between neighbouring frame times, but it also has the side effect of increasing latency, which is of course the opposite of what we are looking for in a VR application. In front buffer rendering, the GPU bypasses the off-screen buffers and renders directly to the buffer the display reads from in order to reduce latency. Rendering to the front buffer needs careful synchronisation with the display to ensure the GPU is always writing ahead of the display’s read position. The context_priority extension available on Mali GPUs enables prompt scheduling of tasks on the GPU, allowing front buffer rendering processes such as Timewarp to take priority over less immediately urgent work and improve the user experience.
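As a concrete illustration, requesting such a high-priority context through the EGL_IMG_context_priority extension might look something like the sketch below (it assumes the extension is exposed by the driver and that an OpenGL ES 3.x context is wanted):

```c
/* Minimal sketch: creating a high-priority EGL context so that warp and
 * front-buffer work can be scheduled ahead of the application's normal
 * eye-buffer rendering. Check the EGL extension string for
 * EGL_IMG_context_priority before relying on this.                        */
#include <EGL/egl.h>
#include <EGL/eglext.h>

EGLContext create_high_priority_context(EGLDisplay dpy, EGLConfig cfg,
                                         EGLContext share_ctx)
{
    const EGLint attribs[] = {
        EGL_CONTEXT_CLIENT_VERSION, 3,
        /* Hint that work submitted on this context should run ahead of
         * contexts created with the default (medium) priority.            */
        EGL_CONTEXT_PRIORITY_LEVEL_IMG, EGL_CONTEXT_PRIORITY_HIGH_IMG,
        EGL_NONE
    };
    return eglCreateContext(dpy, cfg, share_ctx, attribs);
}
```

The priority level is a hint to the driver’s scheduler rather than a hard guarantee, but on GPUs that support it the high-priority context is exactly where you would want the Timewarp work to live.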
Another important part of this puzzle is selecting the right type of display for your VR device. Organic Light Emitting Diode (OLED) displays are another element that can improve a VR experience, and they work differently from the familiar and well-established LCD. Every pixel in an OLED display provides its own light source, driven by the Thin Film Transistor array which sits behind it, as opposed to the white LED backlighting of an LCD. The brightness of an OLED display is set by the current driven through the organic film. Because colours are produced by individually varying the tiny red, green and blue sub-pixels, it is possible to get brighter and sharper hues with stronger saturation. Sections of the panel can be turned off entirely, so you can achieve a deeper, truer black than is typically possible on an LCD, which has to block out its backlight. This is the usual selling point for OLED panels, but critically for VR it also provides an easier route to low persistence through partial illumination. A full persistence display is lit continuously, which means the scene view is correct only briefly and then very quickly out of date. A low persistence approach keeps the image lit only while the view is correct and then goes dark, a process which is imperceptible at a high refresh rate, providing the illusion of a continuous image.
This is important for reducing blur for the user. The additional flexibility with which you can illuminate the pixels of an OLED panel means that the display can show multiple partial images during a single refresh and so react mid-frame to the changes fed to it by the sensors in the headset, allowing the displayed image to track head movement even whilst the scan is moving across the screen. This is not possible to achieve with an LCD panel without replacing its global backlight. The ability to drive an OLED panel by drawing to the front buffer in sections or slices through a Timewarp-like process is key for achieving a lower latency VR experience. This is because the image you see on screen can adapt to your head movements much more quickly than is otherwise possible.
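To make the idea more concrete, the sketch below shows roughly how a warp loop could chase the display’s read position down the panel one slice at a time. It is an illustration only: the sensor, timing and drawing helpers are hypothetical stand-ins rather than any particular SDK’s API.

```c
/* Illustrative sketch only: the extern helpers are hypothetical stand-ins
 * for real sensor, timing and drawing code.                               */
#include <GLES3/gl3.h>

typedef struct { float x, y, z, w; } Quaternion;

extern void       sleep_until_ns(long long when_ns);               /* hypothetical */
extern long long  slice_safe_time_ns(long long vsync_ns,
                                     long long refresh_ns,
                                     int slice, int num_slices);   /* hypothetical */
extern Quaternion read_head_orientation(void);                     /* hypothetical */
extern void       draw_warped_slice(int slice, int num_slices,
                                    Quaternion pose);              /* hypothetical */

/* Warp the most recent eye buffers into the front buffer one horizontal
 * slice at a time, always staying just behind the display's read position. */
void warp_frame_in_slices(long long vsync_ns, long long refresh_ns)
{
    const int num_slices = 4;

    for (int slice = 0; slice < num_slices; ++slice) {
        /* Wait until scan-out has safely passed the region we are about
         * to overwrite, so the display never reads a half-written slice.   */
        sleep_until_ns(slice_safe_time_ns(vsync_ns, refresh_ns,
                                          slice, num_slices));

        /* Sample the sensors as late as possible so this slice gets the
         * freshest head orientation available.                             */
        Quaternion pose = read_head_orientation();

        /* Re-project the eye buffers for just this slice, writing directly
         * to the front buffer, and submit the GPU work immediately.        */
        draw_warped_slice(slice, num_slices, pose);
        glFlush();
    }
}
```

In a real system the per-slice deadlines come from the display’s v-sync timing and the GPU work runs on the high-priority context described above, but the structure is the same: small, late, tightly scheduled updates instead of one big whole-frame swap.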
Now we consider one of the keystones of this combination: the Timewarp process. Because the scene changes comparatively gradually in an immersive VR application, the image changes between views by a small and therefore relatively predictable amount. Warping is basically shifting an image rendered at an older head orientation to match a newer one. This partially decouples the application frame rate from the refresh rate and allows the system to enforce a latency guarantee that some applications may not provide on their own. This shifting can account for changes in head rotation but not head position or scene animation. It’s therefore something of an approximation, but it provides an effective safety net and also enables an application running at 30 FPS to appear (at least in part) as if it’s tracking the user’s head at 60 FPS or above.
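The heart of the warp is a simple rotation-only correction. The sketch below illustrates the idea; the quaternion and matrix helpers are hypothetical, and the exact matrix order and the sense of the delta rotation depend on your projection and pose conventions.

```c
/* Sketch of the core idea: a rotation-only correction from the orientation
 * the eye buffer was rendered with to the latest sampled orientation. The
 * Quaternion/Matrix4 types and helper functions are hypothetical.          */
typedef struct { float x, y, z, w; } Quaternion;
typedef struct { float m[16];     } Matrix4;

extern Quaternion quat_inverse(Quaternion q);                  /* hypothetical */
extern Quaternion quat_multiply(Quaternion a, Quaternion b);   /* hypothetical */
extern Matrix4    quat_to_matrix(Quaternion q);                /* hypothetical */
extern Matrix4    mat4_multiply(Matrix4 a, Matrix4 b);         /* hypothetical */

Matrix4 compute_timewarp_matrix(Quaternion rendered_with,  /* pose at render time */
                                Quaternion latest,         /* pose just sampled   */
                                Matrix4    eye_projection)
{
    /* The "difference" rotation that takes the old view onto the new one.
     * Note it corrects rotation only: positional changes and in-scene
     * animation still arrive at the application's own frame rate.          */
    Quaternion delta = quat_multiply(latest, quat_inverse(rendered_with));

    /* Fold the correction into the matrix used when resampling the eye
     * buffer for scan-out.                                                  */
    return mat4_multiply(eye_projection, quat_to_matrix(delta));
}
```

The resulting matrix is applied when the eye buffer is resampled for the front buffer, so the correction costs a matrix multiply and a resampling pass rather than a full re-render of the scene.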
In this post we've discussed the tight integration that needs to exist between GPU and display, but this is only one part of the stack. The complexity doesn’t end there if we also want to play videos, perhaps DRM-protected ones, and integrate system notifications. High quality VR support requires that your multimedia components are well synchronised and communicate in bandwidth-efficient ways in order to provide not only the best experience to the end user but also the best power efficiency and performance. The ARM® Mali™ Multimedia Suite (MMS) of GPU, Video and Display processors is tightly integrated and supported by technologies such as ARM Frame Buffer Compression (AFBC) and ARM TrustZone®, making it a leader in VR development technology.
Join us at GDC to find out more!
Thanks Sam! Your answer was very thorough and informative.
But I'm now curious about how many buffers are being written to. You hint that there are at least two: the eyebuffer (which the application writes to and the timewarp reads from), and the displaybuffer (which the timewarp writes to and the display reads from). But am I wrong to assume that there must be a third buffer as well? If the timewarp must read an old buffer (to mitigate a missed update), I'm guessing the application must not be writing to that buffer, or there would be weird discontinuities in what the timewarp writes out to the displaybuffer. Does the application maintain at least two eyebuffers with rendered content (in addition to the display buffer) -- one that is being composed, and another that holds a previously completed frame?
I plan to get a GearVR and two GS7s (exynos and snapdragon variants) when I hit the development/experimentation phase, and I will take your advice. I'm sure I'll also have a bit of fun with the headset!
Cheers,
Sean
Hi Sean, hopefully I can help out to some degree:
I'm not sure I fully understand your first question but will give it a shot: I think it's easier to think of timewarp as an "always on" process. It'll take the latest output from the application and can tweak the head rotation while writing it to the frontbuffer. Exactly how and when it tweaks the rotation is something of an implementation detail, but conceptually it's always in a position to update the rotation, simply because it runs after the application rendering (eyebuffer) phase. There isn't really an 'if' statement in there (at least conceptually). Skipping to your third question, the timewarp is always sync'd to the display - Android exposes a timestamped v-sync event you could do this with, for example. It has to keep writing data ahead of the display read or the display will scan out old data. It can't depend on the app providing up-to-date eyebuffers, so if the application drops a few frames, the timewarp just has to correct the (old) eyebuffers it has more severely. This is cool but not magic. 15Hz VR apps still look pretty ****
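For what it's worth, one way to get at that timestamped v-sync event from native code is the NDK's AChoreographer API (available on newer Android releases). A rough sketch, with the actual warp scheduling left as a hypothetical helper:

```c
/* Rough sketch: receiving timestamped v-sync callbacks via the NDK's
 * AChoreographer (newer Android releases). schedule_timewarp_slices() is a
 * hypothetical helper that would derive per-slice deadlines from the stamp. */
#include <stddef.h>
#include <android/choreographer.h>
#include <android/looper.h>

extern void schedule_timewarp_slices(long frame_time_nanos);   /* hypothetical */

static void on_vsync(long frame_time_nanos, void *data)
{
    /* frame_time_nanos is the CLOCK_MONOTONIC time of the v-sync that began
     * this refresh; use it to stay just behind the display's read position. */
    schedule_timewarp_slices(frame_time_nanos);

    /* Re-register so we receive a callback for every refresh.               */
    AChoreographer_postFrameCallback(AChoreographer_getInstance(),
                                     on_vsync, data);
}

void start_vsync_listener(void)
{
    /* AChoreographer attaches to the calling thread's looper; that thread
     * must then poll the looper (e.g. ALooper_pollOnce) for callbacks to
     * be delivered.                                                          */
    ALooper_prepare(ALOOPER_PREPARE_ALLOW_NON_CALLBACKS);
    AChoreographer_postFrameCallback(AChoreographer_getInstance(),
                                     on_vsync, NULL);
}
```

On older releases you would have to pass the timestamp over from the Java Choreographer instead, but the principle is the same: everything the warp does is scheduled relative to that v-sync timestamp.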
To your second question: it's interesting to consider whether you could push something faster than the refresh rate through this process. I don't have data either way, but I suspect that while it's plausible in theory it might have subtle issues. Updating the eyebuffers the timewarp sees is a discontinuity; if you restrict that to happen on refresh boundaries, at least both eyes see the same data.
There are no substantial changes in external bandwidth from this kind of frontbuffer rendering - you are still reading/writing the same amount of data, you just do it with a smaller overall memory footprint (i.e. less buffering). However, VR needs decent resolution offscreen eyebuffers and the display running at native resolution, so bandwidth is a pressure point generally. Current hardware is surprisingly good at handling this though - AFBC, tiled rendering, decent texture compression - they all help keep bandwidth down enough to make this feasible.
It's definitely more involved than it first looks!
Fun tip: Get a GearVR device, build and install a debug-signed sample app so you can turn on 'developer mode' and see the low persistence display mode without it being in the headset, and then take a video of the screen through a high speed camera - the "slow mo" mode on an iPhone kind of works!
Sam
I'm confused about something. You mention this:
The ability to drive an OLED panel by drawing to the front buffer in sections or slices through a Timewarp-like process is key for achieving a lower latency VR experience. This is because the image you see on screen can adapt to your head movements much more quickly than is otherwise possible.
Is this implying that at vsync, the new frame begins rendering out to the display buffer slightly behind the display read position of the current low-persistence "chunk", and if there is no new buffer, "timewarp" is calculated (with the latest head position vector) and is pushed out instead?
What implications does this have with tearing? For example if a developer were to write out at faster than 16ms, would tearing be perceptible if the GPU write was synchronized to occur before the display read?
Is it possible to programmatically determine the size of the low-persistence chunks for custom "time-warp" operations, or does the GPU simply stall fragment operations that want to write ahead of the display read? Or is the time-warp substituted at vsync if the rendered frame is not completed? What types of instructions are available for synchronizing (and querying) display-related timings?
And what implications does this have with external bandwidth? Does this imply that display bandwidth is being consumed by a faster-than-16ms time-warp (as each low-persistence chunk is drawn), and if so, what type of bandwidth GPU utilization is customary when time-warp is enabled? Is time-warp consuming bandwidth (and ALU cycles) regardless of GPU-write finishing on time?
I'm sorry about the torrent of questions and forgive me if some of these are overly ignorant. I'm really interested in this as it seems more involved than I had originally intuited!