Vulkan Integration in Unity

Introduction

Last year at GDC 2016, Khronos launched the Vulkan 1.0 specification and Khronos members released the first Vulkan drivers and SDKs. Just a year later, at GDC 2017, Unity announced the Unity 5.6 release with a built-in Vulkan renderer. With this, Unity showed its support not only for Vulkan but also for the developers who expect the best from Unity.

One of ARM’s sponsored talks at GDC 2017 was devoted to the topic of this blog: “Get the Most from Vulkan in Unity with Practical Examples from Infinite Dreams”. The talk covered the main advantages that the new API brings to developers in general and the benefits we can expect when running Vulkan on ARM CPUs and GPUs.

In the talk, Marek Wyszyński, VP & Co-Founder of Infinite Dreams, shared his experience of developing the Sky Force Reloaded mobile game, with very interesting data comparing its execution in OpenGL ES versus Vulkan. This data attracted a lot of interest among the attendees and at GDC in general. At Unity’s GDC keynote, Lucas Meijer, the Technical Director at Unity, showed a video we prepared in collaboration with Infinite Dreams to demonstrate how Vulkan maximizes performance and reduces battery usage.

At the talk we also had the opportunity to hear first hand from Mikko Strandborg, the Vulkan lead at Unity, about their experience of implementing the Vulkan renderer in the Unity rendering pipeline. Mikko’s presentation was interesting not only for Unity developers but also for developers working on their own Vulkan implementation, as his tips can save them a lot of time and effort.

This blog covers the topics presented in the ARM Sponsored Talk at GDC 2017 related to Vulkan integration in Unity. The full talk video is also available in the GDC Vault.

The benefits of the new graphics API

It has become evident over the last few years that OpenGL and OpenGL ES have reached the end of their life cycle, and that it is impossible to evolve further on top of an API that carries so much inherited weight after serving the industry for over 25 years. Additionally, GPUs are becoming highly programmable and compute capable, mobile platforms are becoming more relevant, memory is becoming unified and processors are becoming multi-core. Therefore, Vulkan is central to the future of GPUs.

The talk covered two of the most relevant benefits of Vulkan for graphics: multithreading and multi-pass.

Multi-threading & multicore efficiency

Nowadays even mid-range phones come with four cores. Traditional graphics APIs were not designed for multi-threaded use and required a lot of synchronisation with the CPU to manage draw calls. The resulting high CPU overhead becomes a bottleneck, especially on mobile devices.

Vulkan was created from the ground up to be thread-friendly, and the specification goes into great detail about multi-threading and the consequences of function calls. Most of the functions don’t need to be synchronised externally.

In OpenGL ES for instance, the driver might have several background threads working while waiting for API calls from the application. In Vulkan, this responsibility has been moved up to the application level, so it's now up to developers to ensure correct and efficient multi-threading behaviour. This is a good thing since the application often has better visibility of what it wants to achieve.
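The shift of threading responsibility to the application can be pictured with a small Python sketch. It is only a structural illustration, not Vulkan code: the function names `record_commands` and `build_frame` are hypothetical, each plain list stands in for a per-thread command buffer, and Python threads will not show the real CPU gains. The point is the pattern: each worker records into storage it owns, so recording needs no locks, and the application decides the final submission order.

```python
from concurrent.futures import ThreadPoolExecutor

def record_commands(chunk):
    """Record draw commands for one slice of the scene into a private list.

    Each worker owns its list (a stand-in for a per-thread command buffer),
    so no locking is needed while recording.
    """
    return [("draw", obj) for obj in chunk]

def build_frame(objects, workers=4):
    """Split the objects across workers and return the lists in submit order."""
    size = max(1, -(-len(objects) // workers))  # ceiling division
    chunks = [objects[i:i + size] for i in range(0, len(objects), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the submission order is deterministic.
        return list(pool.map(record_commands, chunks))

frame = build_frame([f"mesh_{i}" for i in range(10)])
# A single "submit" step flattens the per-worker lists in order.
submitted = [cmd for buf in frame for cmd in buf]
```

With ten objects and four workers the scene is split into four chunks, and the flattened list keeps the original object order.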

In Vulkan, the lower CPU load due to driver simplification helps to reduce energy consumption and makes the battery last longer. Here multi-threading plays a key role: by spreading the work across multiple cores, it allows the CPU to get to sleep faster.

Additionally, ARM processors can schedule and migrate tasks between big and LITTLE cores according to the load, in this way achieving an optimal energy balance. ARM processors are able to choose between big cores to achieve high performance for high load tasks and LITTLE cores to achieve higher energy efficiency for low and medium load tasks.

Multi-pass rendering

The multi-pass rendering feature in Vulkan is conceptually very similar to the Pixel Local Storage introduced by ARM in 2014. In Vulkan you could think of the begin/end render pass functions almost like a scope. While you're inside the scope of that pass, everything you execute/draw is part of that render pass.

Render passes determine first of all whether you're executing graphics commands or compute commands in each sub-pass. Then you specify the attachments that your shaders will output to. You specify your binding points explicitly, and the shader outputs must match what is specified in your pipeline.

Especially on tile-based GPUs such as ARM Mali GPUs, multi-pass allows the driver to perform additional optimizations when each pixel rendered in a sub-pass accesses the results of the previous sub-pass at the same pixel location. In this way all the data can remain in the fast on-chip memory. That means less data transfer to main memory and thus lower bandwidth use, which helps to save energy and battery.

The beauty of render passes is that you declare everything ahead of time, and you can have multiple sub-passes. This feature is beneficial for other architectures as well, as it helps scheduling. With multi-pass, the driver doesn't have to do any guesswork in the middle of your rendering; your state is already declared, so it knows what to expect. You can specify different outputs for each sub-pass, and you can chain them together.

For example, for a deferred rendering setup, you can have a sub-pass for pre-Z/depth, G-buffer, light accumulation, post-processing, etc.
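The idea of declaring sub-passes ahead of time and chaining them can be sketched as plain data. The following is a minimal Python model, not the Vulkan API: the `SubPass` class, the `validate` helper and the attachment names are all hypothetical, and "inputs" simply means attachments a sub-pass reads from earlier sub-passes. Because everything is declared up front, a simple check can confirm that every attachment is written before it is read.

```python
from dataclasses import dataclass, field

@dataclass
class SubPass:
    name: str
    outputs: list                                # attachments written here
    inputs: list = field(default_factory=list)   # attachments read from earlier sub-passes

def validate(subpasses):
    """Check that every input attachment was written by an earlier sub-pass."""
    written = set()
    for sp in subpasses:
        missing = [a for a in sp.inputs if a not in written]
        if missing:
            raise ValueError(f"{sp.name} reads {missing} before it is written")
        written.update(sp.outputs)
    return True

# Hypothetical deferred setup following the order given in the text.
deferred = [
    SubPass("depth_prepass", outputs=["depth"]),
    SubPass("gbuffer", outputs=["albedo", "normal"], inputs=["depth"]),
    SubPass("lighting", outputs=["hdr_color"], inputs=["albedo", "normal", "depth"]),
    SubPass("post_process", outputs=["final_color"], inputs=["hdr_color"]),
]

validate(deferred)
```

Reordering the list so that a sub-pass reads an attachment nobody has written yet makes `validate` raise, which mirrors the kind of guesswork the driver no longer has to do at render time.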

Figure 1. Screenshots from the Lofoten demo developed with Vulkan.

The picture above shows some screenshots from the Lofoten demo, which is based on a game engine developed in-house. We have used this engine to test the Vulkan implementation running on ARM CPUs and GPUs.

This demo shows that multi-pass enables efficient deferred shading on mobile. It also shows that multithreading enables efficient use of the multi-core architecture, to the point that the graphics API overhead can be reduced by up to 10 times.

Mali Graphics Debugger

Any new graphics API also requires tools that allow developers to optimise their games. At GDC 2017, we announced that Unity 5.6 is capable of building Android applications with support for Mali Graphics Debugger (MGD). The process to trace Vulkan applications with MGD is straightforward, as described in Fig. 2 below. A more detailed explanation can be found in another ARM Community blog.

All you need to do is place the library provided with the MGD installer in the project folder indicated in the picture, then tick the Development Build box to make Unity package the MGD validation layer into the APK. Install the APK on the device, connect MGD to the daemon over USB, enable the MGD Vulkan layer and you are ready to trace your Vulkan calls.

Thanks to the collaboration with Unity, the process for building OpenGL ES applications with MGD support has also been substantially simplified and now looks practically the same as for Vulkan.

Figure 2. Steps required to build Unity applications with MGD support for Vulkan and OpenGL ES APIs.

Sky Force Reloaded with Vulkan and Unity

Infinite Dreams is the game studio behind the popular mobile shoot ’em up Sky Force Reloaded. The game features intense action and rich graphics. It pushes the CPU and GPU to their limits, and it is built using Unity.

Figure 3. Sky Force Reloaded.

According to Marek Wyszyński, the CEO and Co-founder of Infinite Dreams, when profiling the game they found that they had quite a lot of draw calls: up to 1000 per frame. This number can slow down even the latest generation of mobile devices. As a result, the CPU was spending a lot of time in the driver, preparing data for the GPU.

The only way they had to optimize the game was to minimize the number of draw calls or modify them so that they could be batched by the game engine. Draw calls are very expensive, and the OpenGL ES driver is not optimal because it keeps the CPU busy for a long time. The very moment they heard about Vulkan, they couldn’t wait to give it a try and see how it could improve the performance of their game. In Unity, Vulkan is just another rendering API, and no knowledge of it is required from a game developer in order to use it: all the hard work is done inside Unity.

Figure 4. Sky Force Reloaded FPS in Vulkan and OpenGL ES.

Infinite Dreams created a synthetic benchmark that can be replayed the same way multiple times to measure OpenGL ES and Vulkan performance. Fig. 4 shows that while OpenGL ES struggles a lot, Vulkan is able to keep 60 fps most of the time, and the overall improvement in performance is 15% over OpenGL ES.

Then they started to increase the complexity of the scene by adding objects to the benchmark level, up to the point where even Vulkan was not able to achieve 60 fps. At this point the gap between OpenGL ES and Vulkan grew wider: on average, Vulkan was 32% faster than OpenGL ES, as shown in Fig. 5. This demonstrates that there was extra power that could be utilized.
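To put the two benchmark figures in terms of frame time, here is a short arithmetic sketch in Python. It assumes that "faster" means a higher average frame rate, which the talk summary does not spell out, and the helper names are hypothetical. Under that assumption, a 15% gain in fps corresponds to about 13.0% less time per frame, and a 32% gain to about 24.2%.

```python
def frame_time_ms(fps):
    """Time budget per frame in milliseconds at a given frame rate."""
    return 1000.0 / fps

def frame_time_saving(speedup_percent):
    """Fraction of frame time saved when average fps rises by speedup_percent."""
    ratio = 1.0 + speedup_percent / 100.0
    return 1.0 - 1.0 / ratio

for gain in (15, 32):
    saving = frame_time_saving(gain) * 100
    print(f"{gain}% more fps -> {saving:.1f}% less time per frame")
```

For reference, `frame_time_ms(60)` gives the roughly 16.7 ms budget per frame that a 60 fps target implies.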

Figure 5. Sky Force Reloaded FPS after increasing the complexity of the scene.

In the talk, Marek showed a very interesting video that demonstrates how the extra power that Vulkan brings to developers can be used to render more complex geometry. Fig. 6 shows a screenshot from the video where it is possible to see how Vulkan can render many more particles than OpenGL ES while keeping 60 FPS.

Figure 6. Vulkan vs OpenGL ES in terms of scene complexity.

Sky Force Reloaded is a very CPU and GPU intensive game. Some players have even complained that the game drained their battery too quickly, so Infinite Dreams, in collaboration with ARM, compared the energy consumption of the OpenGL ES and Vulkan Unity builds of Sky Force Reloaded when running on ARM-powered devices.

Marek showed the results of the energy consumption comparison between Vulkan and OpenGL ES when running Sky Force Reloaded. The tests demonstrated that Vulkan can reduce power consumption by up to 10-12% compared with OpenGL ES. That reduction translates directly into longer battery life, to the gamers’ delight.
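As a rough illustration of what that power saving could mean for play time, the sketch below converts a power reduction into a relative battery life gain. It rests on two simplifying assumptions that are not from the talk: the battery capacity is fixed, and the quoted reduction applies to the average power drawn by the whole device during play. The function name is hypothetical.

```python
def battery_life_gain(power_reduction_percent):
    """Relative increase in play time for a fixed battery capacity.

    Play time is capacity / power, so cutting power to a fraction r of its
    previous value stretches play time by a factor of 1 / r.
    """
    remaining = 1.0 - power_reduction_percent / 100.0
    return 1.0 / remaining - 1.0

low = battery_life_gain(10) * 100    # about 11.1
high = battery_life_gain(12) * 100   # about 13.6
```

Under those assumptions, a 10-12% power reduction corresponds to roughly 11-14% more play time on a single charge.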

Vulkan in Unity - Under the hood

The final part of the talk was devoted to the experience of the Unity Vulkan team while implementing Vulkan in the Unity rendering pipeline. It is not every day that we get the opportunity to hear first hand about the implementation of a Vulkan renderer.

I would highly recommend Mikko’s presentation both to Unity developers who want to understand what happens behind the scenes when we select Vulkan in Unity, and to developers who are implementing Vulkan in their own engines.

Some of Mikko’s recommendations are listed below.

  • In general GPUs don’t like switching between buffer bindings or having tons of small buffers. On the contrary, having a large buffer and just changing offsets within it is almost free.
  • Even if you do your own memory object management, avoid creating and deleting Vulkan objects on the fly. Instead, if you need to get a temporary buffer, try to keep a pool of those buffers and recycle them.
  • Descriptor set objects may get consumed at bind time. Instead of binding each resource separately, you bind a whole descriptor set at once with one call.
  • Mobile GPUs don’t do any magic on constant buffers! They’re just pointers to main RAM.
  • Whenever the GPU reads anything from the main RAM, the read is typically cached. The cache line size is usually the normal 64 bytes on mobiles and 32 bytes on discrete GPUs.
  • Cache all descriptor set objects! Build a global cache of all descriptor set objects and try to reuse them as much as possible.
  • Flush is a system call, so only do it once right before job submission.
  • In OpenGL ES, the driver can pin shader uniforms into GPU registers but Vulkan only has constant buffers so the driver cannot do that automatically. A push constant block behaves like a normal constant buffer block in the shader, but its contents are given as part of the command buffer stream. On Mali GPUs, push constant blocks are automatically pinned into GPU registers.
  • Don’t bother with reusing secondary command buffers. On most GPUs reusing a secondary command buffer requires so much patching to the buffer that there’s very little benefit over rebuilding the whole command buffer anyway.
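The first two tips, one large buffer addressed by offsets and recycling instead of creating and deleting objects, can be sketched with a simple linear allocator. This is a minimal Python model rather than Vulkan code: the class name is hypothetical, the integers stand in for byte offsets into a single large GPU buffer, and the alignment of 256 is an assumed value (real code would use the device-reported limit, such as `minUniformBufferOffsetAlignment` for uniform buffers).

```python
class LinearBufferAllocator:
    """Hands out aligned offsets inside one large buffer.

    The buffer is bound once; only the offset changes per draw.
    All sizes and offsets are in bytes.
    """

    def __init__(self, capacity, alignment=256):
        self.capacity = capacity
        self.alignment = alignment  # assumed value, see the note above
        self.head = 0

    def allocate(self, size):
        # Round the current head up to the next multiple of the alignment.
        offset = (self.head + self.alignment - 1) // self.alignment * self.alignment
        if offset + size > self.capacity:
            raise MemoryError("buffer exhausted")
        self.head = offset + size
        return offset

    def reset(self):
        """Recycle the whole buffer once the GPU has finished with the frame."""
        self.head = 0

alloc = LinearBufferAllocator(capacity=1024)
offsets = [alloc.allocate(size) for size in (64, 100, 300)]  # [0, 256, 512]
```

Resetting the allocator at the start of a frame reuses the same memory without creating or destroying any buffer objects, which is the recycling behaviour the second tip asks for.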

A more detailed description of the Vulkan implementation in Unity can be found in the GDC Vault presentation. Mikko’s tips can save developers a lot of time and effort when implementing their own Vulkan renderer.
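The advice from the list above to cache all descriptor set objects can also be illustrated with a small sketch. The Python below is an assumption-laden model, not Unity's implementation: `DescriptorSetCache` and its `create_fn` factory are hypothetical, and a dictionary of binding slots to resource names stands in for the real resources. The cache is keyed by the full set of bindings, so asking for the same combination again returns the existing object.

```python
class DescriptorSetCache:
    """Reuses descriptor set objects keyed by the resources they bind."""

    def __init__(self, create_fn):
        self._create = create_fn  # hypothetical factory for the real object
        self._sets = {}
        self.hits = 0
        self.misses = 0

    def get(self, bindings):
        # Sort so the same bindings given in a different order share one entry.
        key = tuple(sorted(bindings.items()))
        if key in self._sets:
            self.hits += 1
        else:
            self.misses += 1
            self._sets[key] = self._create(key)
        return self._sets[key]

cache = DescriptorSetCache(create_fn=lambda key: {"bindings": key})
a = cache.get({0: "albedo_tex", 1: "camera_ubo"})
b = cache.get({1: "camera_ubo", 0: "albedo_tex"})  # same set, reused
c = cache.get({0: "normal_tex", 1: "camera_ubo"})  # new combination
```

Here `a` and `b` are the same object, so only two sets are ever created for the three requests. A real cache would also need a policy for evicting sets whose resources have been destroyed, which this sketch leaves out.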

Conclusions

In Vulkan, several responsibilities that were previously in the driver’s hands have now been moved up to the application level. The advantages introduced by Vulkan with this shift are especially relevant for mobile devices: lower CPU load and more data kept in on-chip memory, which ultimately lead to lower energy consumption and longer battery life.

Figure 7. A wrap up of Vulkan benefits.

Unity developers can make use of these benefits just by selecting Vulkan as the renderer when building their game. For those developers who have decided to implement their own Vulkan renderer, Mikko’s tips can save a lot of time and effort. For them I would also strongly recommend reading our Application Best Practices for ARM Mali GPUs.

Additionally, when developing with Vulkan we can’t forget about mobile application optimization. In the Unity Engine Tutorials section of the ARM Developer portal, the reader will find a set of presentations about highly optimized rendering techniques and recommendations for developing efficient and performant mobile applications in Unity. The slides of the sponsored talk presented at GDC can be found here as well.

Finally, I would like to highlight the fact that building Unity applications with MGD support for Vulkan and OpenGL ES is now very simple and straightforward, so I would encourage developers to upgrade to Unity 5.6 and benefit from all the features that MGD offers when profiling Unity applications on ARM Mali GPUs.

Anonymous
Graphics & Multimedia blog