Vulkan Samples: High Fidelity Graphics for Android Mobile Game Development Using Vulkan

Ben Walshe
July 21, 2020

In this blog, we briefly look at two examples of how to use Vulkan to maximize the graphics performance in your game. We will walk you through a few key Vulkan performance samples that demonstrate common optimizations and best practices to follow in your mobile games, so you can start squeezing every last drop of performance out of the device and give your fans the game they absolutely need to play, through the power of Vulkan APIs. 

As an Android game developer, you have two choices for graphics APIs: OpenGL ES and Vulkan. In this article, we are going to look at Vulkan. Designed to push 3D graphics on mobile devices, Vulkan acts as a super-thin abstraction layer. This gives you much more control, lower CPU overhead, a smaller memory footprint, and greater stability.

Maximum Performance. Minimum Overhead.

How Vulkan enables high-performance, cross-platform graphics is simple: “With great power comes great responsibility.” To enable maximum graphics performance, Vulkan allows more control over the hardware resources than OpenGL ES, in exchange for requiring more explicit memory management and operations. And to achieve lower CPU overhead, the Vulkan API supports multithreading and takes advantage of the four to eight cores built into mainstream mobile devices.

For more detail, Vulkan Essentials is a great resource with an in-depth explanation of how Vulkan works under the hood.

Vulkan API Samples and Tutorials

There are tons of great resources and examples available to learn how to use the Vulkan API. The two examples we will look at are Render Passes and Wait Idle, which demonstrate some of the most useful optimizations you can take advantage of in your own mobile game. These performance samples show recommended best practices for enhancing performance with the Vulkan APIs, and provide real-time profiling information to help you identify and understand bottlenecks in your application. The full set of samples and tutorials, open-sourced by Arm and administered by the Khronos Group, can be found here.

This article assumes you are familiar with 3D render pipelines and Vulkan API basics. If you are new to Vulkan, this Vulkan Guide and introductory tutorial helps you get your first triangles rendered. For additional examples, refer to these API samples that cover topics such as HDR, instancing, texture loading, and tessellation.

Prerequisites

To work with the Vulkan samples, you need to have the right tools and dependencies. For Android, you can check out the Android section of the Build Guide.

The main prerequisites are:

  • CMake v3.10 or later
  • JDK 8 or later
  • Android NDK r18 or later
  • Android SDK
  • Gradle 5 or later
  • Sample 3D Models

Appropriate Use of Render Pass Attachments

Render pass attachments are how Vulkan keeps track of your input and output render targets. It might make sense to think of them as references to color or depth buffers. Configuring them optimally is a simple but effective way to gain precious milliseconds during the render pass.

Let us start by taking a look at this performance tutorial and sample code.

You will see an app rendering a 3D scene in a single pass with a GUI showing render stats and options to switch between load operations for the color attachment and store operations for the depth attachment.

Render passes on device

Knowing whether an attachment's contents need to be cleared, read, or written can greatly affect draw performance, because you can configure the attachment to minimize the number of read and write operations.

For example, when every pixel of the final color buffer is overwritten before it is drawn to the screen, you never need to read its previous contents. In Vulkan, you can then set the load operation in the attachment description to VK_ATTACHMENT_LOAD_OP_DONT_CARE and speed up your render pass.

You can test this by selecting Load as the color attachment load operation and watching the External Read Bytes value increase: the GPU now has to read the existing contents of the color buffer at the start of the pass rather than simply drawing the scene into it.

Changing the Depth attachment store operation has a similar effect on External Write Bytes because you are indicating whether you want to spend time saving the depth information to the buffer.
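To get a feel for the numbers involved, the sketch below estimates how many bytes one full-screen attachment moves to and from external memory per frame. It is a simplified model, not the sample's code: it assumes an uncompressed attachment on a tile-based GPU where a Load operation reads the whole attachment once and a Store operation writes it once. The LoadOp and StoreOp enums and the estimateTraffic function are hypothetical stand-ins for illustration, not the real Vulkan types.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the load/store operations discussed above.
// These are NOT the real VkAttachmentLoadOp / VkAttachmentStoreOp types.
enum class LoadOp { Load, Clear, DontCare };
enum class StoreOp { Store, DontCare };

struct AttachmentTraffic {
    std::uint64_t readBytes;   // bytes read from external memory at pass start
    std::uint64_t writeBytes;  // bytes written to external memory at pass end
};

// Estimate per-frame external memory traffic for one full-screen attachment.
// Assumption: uncompressed attachment, read or written exactly once per pass.
AttachmentTraffic estimateTraffic(std::uint32_t width, std::uint32_t height,
                                  std::uint32_t bytesPerPixel,
                                  LoadOp loadOp, StoreOp storeOp)
{
    assert(bytesPerPixel > 0);
    const std::uint64_t sizeBytes =
        static_cast<std::uint64_t>(width) * height * bytesPerPixel;

    AttachmentTraffic traffic{0, 0};
    // Only Load has to fetch the previous contents; Clear and DontCare do not.
    if (loadOp == LoadOp::Load) {
        traffic.readBytes = sizeBytes;
    }
    // Only Store has to write the results back out.
    if (storeOp == StoreOp::Store) {
        traffic.writeBytes = sizeBytes;
    }
    return traffic;
}
```

Under these assumptions, a 1920x1080 attachment at 4 bytes per pixel costs 8,294,400 bytes per frame for each Load or Store that could have been skipped. Real counter values also include geometry and texture traffic, so treat this only as a rough lower bound.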

Here is a typical setup for how you could optimally use render pass attachments when drawing a 3D scene in your own code:

  // Zero-initialize so that unused fields such as flags are well defined
  VkAttachmentDescription attachments[ 2 ] = {};

  // Color attachment: previous contents are not needed, results are presented
  attachments[ 0 ].format = colorFormat;
  attachments[ 0 ].samples = VK_SAMPLE_COUNT_1_BIT;
  attachments[ 0 ].loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
  attachments[ 0 ].storeOp = VK_ATTACHMENT_STORE_OP_STORE;
  attachments[ 0 ].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
  attachments[ 0 ].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
  attachments[ 0 ].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
  attachments[ 0 ].finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;

  // Depth attachment: cleared at the start, not needed after the pass
  attachments[ 1 ].format = depthFormat;
  attachments[ 1 ].samples = VK_SAMPLE_COUNT_1_BIT;
  attachments[ 1 ].loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
  attachments[ 1 ].storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
  attachments[ 1 ].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
  attachments[ 1 ].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
  attachments[ 1 ].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
  attachments[ 1 ].finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

  VkAttachmentReference colorReference = {};
  colorReference.attachment = 0;
  colorReference.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

  VkAttachmentReference depthReference = {};
  depthReference.attachment = 1;
  depthReference.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

  VkSubpassDescription subpass = {};
  subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
  subpass.colorAttachmentCount = 1;
  subpass.pColorAttachments = &colorReference;
  subpass.pDepthStencilAttachment = &depthReference;

  VkRenderPassCreateInfo renderPassInfo = {};
  renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
  renderPassInfo.attachmentCount = 2;
  renderPassInfo.pAttachments = attachments;
  renderPassInfo.subpassCount = 1;
  renderPassInfo.pSubpasses = &subpass;

  vkCreateRenderPass( g_device, &renderPassInfo, nullptr, &renderPass );

One final option demonstrated in this sample is the Use vkCmdClear checkbox, which will explicitly clear the color attachment, and demonstrates how doing so can negatively affect performance. Resetting the whole buffer by using the load operation is more efficient. Using this explicit clear function is better reserved for other scenarios, such as when you need to specify an inner rectangular region to be cleared.

For instance, if you want to keep a 10px border intact, you could record a clear like this in your command buffer, inside the render pass:

  VkClearAttachment clearAttachment = {};
  clearAttachment.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
  // VkClearColorValue is a union, so set the float32 member explicitly
  clearAttachment.clearValue.color = { { 0.0f, 0.0f, 0.0f, 0.0f } };
  clearAttachment.colorAttachment = 0;

  // Clear everything except a 10px border on each side
  VkClearRect clearRect = {};
  clearRect.baseArrayLayer = 0;
  clearRect.layerCount = 1;
  clearRect.rect.offset = { 10, 10 };
  clearRect.rect.extent = { width - 20, height - 20 };

  vkCmdClearAttachments( g_cmdBuffer, 1, &clearAttachment, 1, &clearRect );
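
One detail worth guarding against in the snippet above: if width and height are unsigned, width - 20 wraps around to a huge value whenever the surface is narrower than 20 pixels. The sketch below shows one way to compute the inner rectangle safely. The Offset2D, Extent2D, and Rect2D structs are hypothetical stand-ins that mirror the shape of the Vulkan rectangle types so the example compiles without the Vulkan headers, and insetRect is an illustrative helper, not part of the sample.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins shaped like VkOffset2D / VkExtent2D / VkRect2D.
struct Offset2D { std::int32_t x; std::int32_t y; };
struct Extent2D { std::uint32_t width; std::uint32_t height; };
struct Rect2D   { Offset2D offset; Extent2D extent; };

// Return the rectangle that leaves `border` pixels untouched on every side.
// If the surface is too small to contain such a rectangle, the extent is zero.
Rect2D insetRect(std::uint32_t width, std::uint32_t height, std::uint32_t border)
{
    // The offset is signed, so the border has to fit in an int32_t.
    assert(border <= static_cast<std::uint32_t>(
                         std::numeric_limits<std::int32_t>::max()));

    // Do the arithmetic in 64 bits so 2 * border cannot overflow.
    const std::uint64_t inset = 2ULL * border;

    Rect2D rect{};
    rect.offset.x = static_cast<std::int32_t>(border);
    rect.offset.y = static_cast<std::int32_t>(border);
    rect.extent.width  = width  > inset ? static_cast<std::uint32_t>(width  - inset) : 0;
    rect.extent.height = height > inset ? static_cast<std::uint32_t>(height - inset) : 0;
    return rect;
}
```

For a 1920x1080 surface and a 10px border this yields offset (10, 10) and extent 1900x1060. When either extent comes back as zero there is nothing to clear, so the caller would likely want to skip recording the clear altogether.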

Optimization Tip: Identifying specifically how you are using each render pass attachment will help make sure you get the best read/write throughput. Remember to use VK_ATTACHMENT_LOAD_OP_CLEAR when you need to clear a render target. And when you do not need to read an attachment’s contents, set VK_ATTACHMENT_LOAD_OP_DONT_CARE to avoid unnecessary operations.

Synchronizing the CPU and GPU

Some of Vulkan’s render pipeline work is done on the CPU, such as recording command buffers, and the rest is done on the GPU, such as executing shaders and writing render targets. Processing everything in the correct order means that the CPU and GPU must coordinate their timing.

An easy and reliable way to accomplish this with Vulkan APIs is to call vkQueueWaitIdle, which blocks the CPU until the queue has finished all submitted work before the CPU hands new commands to the GPU. However, one of the biggest gains in render throughput comes from making sure that neither the CPU nor the GPU sits idle waiting on the other, so that both can start work on the next frame right away.

You can see how this makes a difference in the Wait Idle performance tutorial and sample code.

Frame rate

Running this sample shows a scene with two options, Wait Idle and Fences, and text showing the frame times (the average time it took to render the frame). This sample demonstrates how efficiently queuing up the next frame (or the next command buffer for more complex passes) can improve performance.

When you run the sample, you will notice that the frame times are much higher with the Wait Idle option selected, and lower when the Fences option is selected.
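
Why the gap appears can be sketched with a small timing model. The function below is an illustrative simulation, not code from the sample: it assumes each frame needs a fixed cpuMs to record and a fixed gpuMs to execute, and that the CPU may run at most framesInFlight frames ahead of the GPU. The name averageFrameMs and all of its parameters are hypothetical. With framesInFlight set to 1 the CPU waits for the GPU before recording every frame, which approximates the Wait Idle behavior; with 2 or more, recording overlaps with GPU execution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simulate a simple CPU/GPU pipeline and return the average time per frame.
// Assumptions: constant per-frame CPU and GPU cost, one queue, no other stalls.
double averageFrameMs(int frameCount, double cpuMs, double gpuMs, int framesInFlight)
{
    assert(frameCount > 0 && framesInFlight > 0);

    std::vector<double> gpuDone(static_cast<std::size_t>(frameCount), 0.0);
    double cpuClock = 0.0;   // time at which the CPU is ready to record
    double gpuFreeAt = 0.0;  // time at which the GPU finishes its current frame

    for (int i = 0; i < frameCount; ++i) {
        // The CPU may not reuse a frame slot until the GPU is done with it.
        if (i >= framesInFlight) {
            cpuClock = std::max(cpuClock, gpuDone[i - framesInFlight]);
        }
        cpuClock += cpuMs;  // record the command buffer

        // The GPU starts once the commands exist and the previous frame is done.
        const double gpuStart = std::max(cpuClock, gpuFreeAt);
        gpuDone[i] = gpuStart + gpuMs;
        gpuFreeAt = gpuDone[i];
    }
    return gpuDone[frameCount - 1] / frameCount;
}
```

With 4 ms of CPU work and 10 ms of GPU work per frame, the model gives 14 ms per frame when the CPU waits every frame and just over 10 ms with two frames in flight. In other words, the frame time approaches the cost of the slower processor instead of the sum of both.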

Here is how you could set up your render loop to use fences:

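  void render()
  {
      // Wait for the previous submission to finish. The fence is assumed to be
      // created with VK_FENCE_CREATE_SIGNALED_BIT so the first frame does not block.
      vkWaitForFences( g_device, 1, &g_renderFence, VK_TRUE, UINT64_MAX );
      vkResetFences( g_device, 1, &g_renderFence );

      // Update frame with new commands
      setCmdBuffer( g_cmdBuffer );

      uint32_t imageIndex;
      vkAcquireNextImageKHR( g_device, g_swapchain, UINT64_MAX,
                             g_imageSemaphore, VK_NULL_HANDLE, &imageIndex );

      // Wait for the acquired image before writing color output, and signal
      // g_renderSemaphore once rendering is complete so presentation can wait on it
      VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;

      VkSubmitInfo submitInfo = { VK_STRUCTURE_TYPE_SUBMIT_INFO };
      submitInfo.waitSemaphoreCount = 1;
      submitInfo.pWaitSemaphores = &g_imageSemaphore;
      submitInfo.pWaitDstStageMask = &waitStage;
      submitInfo.commandBufferCount = 1;
      submitInfo.pCommandBuffers = &g_cmdBuffer;
      submitInfo.signalSemaphoreCount = 1;
      submitInfo.pSignalSemaphores = &g_renderSemaphore;

      // The fence is signaled when the GPU finishes this submission
      vkQueueSubmit( g_queue, 1, &submitInfo, g_renderFence );

      VkPresentInfoKHR presentInfo = { VK_STRUCTURE_TYPE_PRESENT_INFO_KHR };
      presentInfo.waitSemaphoreCount = 1;
      presentInfo.pWaitSemaphores = &g_renderSemaphore;
      presentInfo.swapchainCount = 1;
      presentInfo.pSwapchains = &g_swapchain;
      presentInfo.pImageIndices = &imageIndex;
      vkQueuePresentKHR( g_queue, &presentInfo );
  }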

Optimization Tip: Keep your render queue moving by avoiding vkQueueWaitIdle and vkDeviceWaitIdle and using VkFence objects and vkWaitForFences. You need to make sure that each fence works independently of the others without overlap (separate render frames, for example). Also, if you have multiple commands within a single frame on the GPU that don’t need to be synchronized with the CPU, you might consider using VkSemaphore objects instead.

To see a more detailed example on synchronizing the CPU and GPU, you can also take a look at this Vulkan tutorial for Frames in Flight.

Next Steps

We briefly looked at two examples of how to use Vulkan to maximize the graphics performance in your game. Vulkan provides some low-level optimizations that require you to manage processes in your app on a more granular level. But as you have seen, adopting these practices one at a time makes it easier to get started and can pay performance dividends immediately.

That is only the beginning. There are many more open source tutorials and samples available here to help you optimize the drawing of polygons and do more with your render passes in your mobile game.

Here are a few more performance samples we recommend if you are developing for Android devices with Vulkan:

  • Benefits of Subpasses Over Multiple Render Passes
  • Enabling AFBC (Arm Frame Buffer Compression)
  • Using Pipeline Barriers Efficiently

Get involved

We would encourage you to check out the project on the Vulkan Samples GitHub page and try the sample for yourself. The project has just been donated to The Khronos Group. You can tweak the number of command buffers and the allocation strategy directly on the screen, showing the performance impact through real-time hardware counter graphs. You are also warmly invited to contribute to the project by providing feedback and fixes and creating additional samples.

You may also read the other posts in this series:

  • Picking the Most Efficient Load/Store Operations
  • Appropriate Use of Surface Rotation
  • Descriptor and Buffer Management
  • Vulkan FAQs Part 1 and Part 2
  • Management of Command Buffers and Multi-threaded Recording
  • Multithreading in Vulkan

And here are some other useful resources:

  • PerfDoc - Vulkan tool that validates applications for best practices
  • Mali GPU Best Practices - Best practices guide for Arm Mali GPUs
  • Android NDK Vulkan Graphics API Guide

This article was originally posted on CodeProject as a sponsored article by Arm. It was written by Raphael Munn, and you can find the link to the CodeProject article here.
