In this tutorial, we will be covering a little-known feature in Khronos' validation layer which can help you detect potential performance issues for Arm Mali GPUs. In 2020, we integrated the functionality of our now-deprecated PerfDoc layer directly into the Khronos validation layer, which we now refer to as Arm best practice validation. The PerfDoc layer allowed a developer to check for common mistakes when tuning for Mali, or any other GPU with similar features, such as tile memory. With this functionality now in the Vulkan SDK, all Vulkan developers can rapidly enable these checks and benefit from them.
Here we will go over how to use Arm best practice validation, and how to act on some of the performance issues it may detect.
Khronos' validation layer contains various sub-components, each aimed at checking a separate type of validity in API usage. These sub-components are sometimes referred to as "layer objects". The best practices layer object is what we're interested in here: it contains checks for API usage which is technically correct, but potentially inadvisable in terms of resulting performance. Because it only flags API usage which might hinder performance, rather than invalid usage, the layer object is opt-in. It needs to be enabled via one of several interfaces.
In version 126.96.36.199 of the Vulkan SDK, the Khronos validation layer introduced Arm-specific best practice checks, adding to the existing set of vendor-agnostic best practice checks. As a result, we can now also opt in to receive Arm best practice checks. These automatically detect API usage which could negatively impact performance on Arm Mali GPUs specifically.
One of the great benefits of having a tool like this is that one can validate against Mali-friendliness on desktop. Say a studio has made a game using Vulkan, and it was targeted for the PC platform. The studio then decides it wants to investigate porting to mobile. With best practice validation, the studio can quickly check if their graphics implementation has fallen into any common pitfalls for Mali.
The reasoning behind the checks themselves follows from our Mali GPU Best Practices Developer Guide. Take a look at the guide for more information about how to get the most out of Mali.
One of the most interesting features of Vulkan is its layer architecture. To understand best practices checks, it helps to provide some context as to how the checks are being run, and where their functionality comes from. Here's a short overview of Vulkan's layer architecture, and what Khronos' validation layer does.
(Instance Call Chain Example - Architecture of the Vulkan Loader Interfaces, Khronos, CC BY-ND 4.0)
When a Vulkan function is called, the trampoline loader passes parameters to the function hook for the first layer. For instance functions and device functions (which comprise most Vulkan functions), Vulkan layer implementations are expected to, and are completely responsible for, calling the next layer in the chain themselves.
Please refer to Khronos' official documentation for more in-depth details about the Vulkan layer loader interface.
Vulkan layers can be enabled implicitly or explicitly. Implicit layers are specified via the `VK_INSTANCE_LAYERS` environment variable on Windows, Mac, and Linux; or via the `debug.vulkan.layers` property on Android.
Each implicit layer must also implement a disable environment variable, specific to it, and may optionally implement an enable environment variable. If an enable variable is implemented, it is required to be set in the environment before the layer is active. Then, if the disable environment variable is set, this overrides everything and disables the layer in all cases.
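The enable and disable rules above can be sketched as a small predicate. This is only an illustration, not the loader's implementation: `ImplicitLayerSketch`, `implicit_layer_active`, and the variable names used with them are hypothetical, the environment is modelled as a map rather than read from the process, and the sketch only checks whether a variable is set.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical description of an implicit layer's environment variables.
struct ImplicitLayerSketch {
    std::string enable_var;   // empty means the layer implements no enable variable
    std::string disable_var;  // every implicit layer must implement one
};

// Returns true if the layer would be active for the given environment.
bool implicit_layer_active(const ImplicitLayerSketch& layer,
                           const std::map<std::string, std::string>& env) {
    assert(!layer.disable_var.empty());
    // The disable variable overrides everything else.
    if (env.count(layer.disable_var) != 0) {
        return false;
    }
    // If an enable variable is implemented, it must be set.
    if (!layer.enable_var.empty() && env.count(layer.enable_var) == 0) {
        return false;
    }
    return true;
}
```

With both variables implemented, the layer is inactive in an empty environment, active once the enable variable is set, and inactive again as soon as the disable variable is also set.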
Explicit layers are specified to be loaded by the application itself. The layers are specified via VkInstanceCreateInfo, used in vkCreateInstance.
In either case, layer implementations are searched for in a handful of pre-defined system locations, as well as all paths specified in VK_LAYER_PATH.
Khronos Validation Layer Details
The Khronos validation layer has an internal interface for pre-call hooks and post-call hooks. So when a Vulkan API call occurs, the sequence of events looks something like this:
The best practice layer object tracks data from the parameters and return values of all Vulkan calls, using the pre-call and post-call hooks. Therefore, it can identify patterns of API usage at runtime.
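The pre-call, down-chain, post-call ordering described above can be sketched roughly as follows. This is a simplified model, not the validation layer's internal interface: `Result`, `BestPracticesSketch`, and `intercept_create_image` are hypothetical names, and the "next in chain" function stands in for the next layer or the driver.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a Vulkan result code; not the real VkResult.
enum class Result { Success, Error };

// Hypothetical layer object that records what it sees in its hooks.
struct BestPracticesSketch {
    std::vector<std::string> log;

    // Pre-call hook: inspects parameters before the call goes down the chain.
    void pre_call_create_image(int sample_count) {
        if (sample_count > 4) {
            log.push_back("warn: image created with more than 4 samples");
        }
    }

    // Post-call hook: sees the parameters and the result returned by the chain.
    void post_call_create_image(int sample_count, Result result) {
        if (result == Result::Success) {
            log.push_back("tracked: image with " + std::to_string(sample_count) + " samples");
        }
    }
};

// Intercepted entry point: pre-call hook, then the next layer or driver,
// then the post-call hook. The result is passed back up unchanged.
Result intercept_create_image(BestPracticesSketch& layer, int sample_count,
                              const std::function<Result(int)>& next_in_chain) {
    assert(static_cast<bool>(next_in_chain));
    layer.pre_call_create_image(sample_count);
    const Result result = next_in_chain(sample_count);
    layer.post_call_create_image(sample_count, result);
    return result;
}
```

Because state is recorded in both hooks, a layer object structured like this can relate a later call to resources created earlier, which is what makes runtime pattern detection possible.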
In order to receive log output from the Khronos validation layer, we need to register the debug callback for our application. Take a look at this tutorial for more information on enabling the debug callback.
The debug callback needs to register that it will accept performance warnings. Best practice performance warnings require that the debug callback enables VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT in the message severity flags, and VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT in the message type flags.
There are effectively four ways to enable best practice validation, each with its own use cases.
Using VK_LAYER_ENABLES is the simplest way to test out best practice validation when running an application on the command line or through a script. VK_LAYER_ENABLES is a list of symbols, separated by colons (or by semicolons on Windows), representing simple boolean settings in a layer. To enable best practice validation, VK_LAYER_ENABLES needs to contain VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT. To enable Arm-specific best practice validation, the variable also needs to include VALIDATION_CHECK_ENABLE_VENDOR_SPECIFIC_ARM.
In short, we should use VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT:VALIDATION_CHECK_ENABLE_VENDOR_SPECIFIC_ARM, which, while hard to read, is quite easy to set within a shell environment.
Using `vk_layer_settings.txt` is sometimes required if layers allow for more complex settings. The most basic `vk_layer_settings.txt` file which we’d need to specify Arm best practice validation looks like this:
# Basic configuration file for enabling best practices validation warnings
khronos_validation.debug_action=VK_DBG_LAYER_ACTION_LOG_MSG
khronos_validation.report_flags=info,warn,perf,error
khronos_validation.log_filename=stdout
khronos_validation.enables=VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT,VALIDATION_CHECK_ENABLE_VENDOR_SPECIFIC_ARM
When running the application, we then need the environment variable VK_LAYER_SETTINGS_PATH to contain the path to this file.
One can also enable best practice warnings at `vkCreateInstance` using the programmatic interface. However, this is limited only to basic best practice enablement – Arm best practice validation cannot be enabled this way.
The Vulkan Configurator (vkconfig) is a graphic user interface (GUI) for helping with quick layer settings and overrides. It is also available in the Vulkan SDK. Vulkan Configurator now includes a checkbox for best practice validation, as well as vendor-specific best practice validation. One can either launch an application directly from Vulkan Configurator or have it override settings for all Vulkan applications, as long as it is open.
Please refer to the official Vulkan Configurator documentation for more information about its features.
Please also refer to Khronos' official documentation regarding best practices validation for more details.
Now that everything is set up, let’s look at an example with Vulkan-Samples. Once Vulkan-Samples is built (with validation layers enabled), and we have Arm best practice validation enabled via vk_layer_settings.txt, we should see the warnings in the terminal.
Let’s open the MSAA sample like so. On Linux:
VK_LAYER_SETTINGS_PATH=<path-to>/vk_layer_settings.txt build/app/bin/Debug/x86_64/vulkan_samples msaa
If we then switch the MSAA sample count to 8, we get warnings about performance on Mali.
[ UNASSIGNED-BestPractices-vkCreateImage-too-large-sample-count ] Object 0: handle = 0x559eb1033138, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xa4282245 | [Arm] vkCreateImage():
Trying to create an image with 8 samples. The hardware revision may not have full throughput for framebuffers with more than 4 samples.
As our MSAA tutorial explains, we should avoid using more than 4x MSAA without checking what the performance impact is, especially on older Mali-based devices. The warning in this case simply asks us to pay attention to this fact and may not represent a problem in practice.
If we set the “Resolve color” setting in the MSAA sample to “separate” we see a warning about `vkCmdResolveImage`.
[ UNASSIGNED-BestPractices-vkCmdResolveImage-resolving-image ] Object 0: handle = 0x5589dc2efce8, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x3899059a | [Arm] Attempting to use vkCmdResolveImage to resolve a multisampled image.
This is a very slow and extremely bandwidth intensive path. You should always resolve multisampled images on-tile with pResolveAttachments in VkRenderPass.
As the warning mentions, using `vkCmdResolveImage` can negatively affect performance on Mali. We should instead set up the render pass to resolve the image inline. We have documentation about how to get the best multisampling performance in our best practices guide. In this case, the MSAA Vulkan sample implements proper inline multisampling resolution for Mali in its default options, so if we switch the “Resolve color” option back to “on writeback”, the warning will go away.
It is also possible to get best practice warnings within RenderDoc. This is useful because when a debug message is emitted from a Vulkan layer, the references to Vulkan objects in the warning are hyperlinked to the equivalent resource in RenderDoc. This means that you can be linked directly to the API call or resource which caused a best practice warning. Let's look at an example.
First, to allow RenderDoc to see best practice warnings (or any other type of opt-in warning, such as synchronization validation), we need to launch RenderDoc itself with the relevant environment variables set. In other words, you should open RenderDoc like this:
VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT:VALIDATION_CHECK_ENABLE_VENDOR_SPECIFIC_ARM <path-to-renderdoc>/qrenderdoc
Or on Windows:
When we capture a frame in RenderDoc and go to the “Errors and Warnings” tab, we will see any best practice warnings which occur during the frame. For example, let’s use Vulkan-Samples' “render_passes” sample, capture a frame, and then load the capture.
If we go to the errors and warnings tab, we find the following:
There are two warnings for “UNASSIGNED-BestPractices-vkCmdDrawIndexed-post-transform-cache-thrashing”. Let’s look at the description. The first one says:
[Arm] The indices which were specified for the draw call are estimated to cause thrashing of the post-transform vertex cache, with a hit-rate of 46.12%. I.e., the ordering of the index buffer may not make optimal use of indices associated with recently shaded vertices.
...and the second says the same thing, but with a hit rate of 42.69% instead.
What does this mean? First, some background information on what the post-transform cache is, and why this is relevant for performance on Mali.
Since position data is highly likely to be re-used in a mesh, e.g., when multiple triangles share a vertex, it is more efficient to process position data only once and allow the result to be accessed from a cache. This is called the post-transform cache. Because the cache has a fixed size, it is in our best interests to keep usages of a particular index close to each other in the index buffer. If the index buffer contains sequences of indices which vary wildly, with less spatial locality, this can increase the risk of needing to re-shade some vertices. This is mentioned in our best practices guide, alongside other tips to bear in mind when using indexed draw calls.
The validation layer implements a small cache model which, while technically different from the underlying hardware, can still detect meshes which are likely to be sub-optimal. With this in mind, the warning likely means that either a mesh being drawn in the frame is highly discontinuous, or the indices are specified in a sub-optimal order for a mesh.
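To make the idea of a hit rate concrete, here is a small sketch of how such an estimate could be computed for an index buffer. It is not the validation layer's actual model: the first-in, first-out replacement policy, the `cache_size` parameter, and the function name `estimate_hit_rate` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Estimates the fraction of indices which would be found in a fixed-size
// FIFO cache of recently shaded vertices.
double estimate_hit_rate(const std::vector<std::uint32_t>& indices, std::size_t cache_size) {
    assert(cache_size > 0);
    if (indices.empty()) {
        return 0.0;
    }
    std::deque<std::uint32_t> cache;
    std::size_t hits = 0;
    for (const std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end()) {
            ++hits;  // vertex was shaded recently, so its result can be re-used
        } else {
            if (cache.size() == cache_size) {
                cache.pop_front();  // evict the oldest entry
            }
            cache.push_back(index);
        }
    }
    return static_cast<double>(hits) / static_cast<double>(indices.size());
}
```

For two triangles sharing an edge, `{0, 1, 2, 2, 1, 3}`, a cache of four entries hits on two of the six indices, while a cache of one entry hits only once. Index orderings that keep re-used vertices close together score higher in a model like this.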
As mentioned, the objects in the warning message are linked to resources within RenderDoc. So, if we double-click on the first message, we are automatically taken to the offending Vulkan API call in the event log.
Now, let’s look at the mesh using the “mesh viewer” tab.
We can infer from the mesh that the draw call in question is for the ornamental plate behind the lion head sculpture.
If we look at the mesh associated with the next warning, it turns out that the roof tiles are the culprit.
The estimated cache hit rate is quite low for these meshes, which means the GPU might re-do some position or “varying” shading, unnecessarily. The details of why a particular mesh results in this behavior can vary. For example, it could mean there are duplicated vertices (and therefore no index re-use). It could also mean that the indices specified for the draw call are in a suboptimal order. Finally, it could simply mean that a mesh has low connectivity with itself, so there may be no simple fix.
In the absence of any details about what is wrong with the meshes, we can try to automatically optimize them using a mesh optimiser. Arseny Kapoulkine’s meshoptimizer can help us here.
To run meshoptimizer, we can use `gltfpack`: a wrapper of the meshoptimizer libraries with a nice command line interface. We will download it from NPM (node package manager) in this case.
npm install -g gltfpack
This will install `gltfpack` on our PATH. Next, we can attempt to optimise the meshes like so:
gltfpack -i assets/scenes/sponza/Sponza01.gltf -o assets/scenes/sponza/Sponza01_optimised.gltf -si 1 -noq
The `-si 1` flag specifies that we want to simplify meshes in the scene, and that we want to use a simplification ratio of 1. A ratio of 1 means we want to optimise the meshes while still maintaining full quality. We could also use a ratio of less than 1 if we needed to; however, this would reduce the quality of the meshes. `-noq` is also required in this case, since Vulkan-Samples does not currently support the KHR_mesh_quantization glTF extension.
We then modify the render_passes sample to use `Sponza01_optimised.gltf`, by changing `load_scene("scenes/sponza/Sponza01.gltf")` to `load_scene("scenes/sponza/Sponza01_optimised.gltf")`, in `RenderPassesSample::prepare`, in `samples/performance/render_passes/render_passes.cpp`.
After re-compiling Vulkan-Samples and capturing the frame again, the cache thrashing warnings are gone because the hit rate has improved. As a bonus, Sponza’s file size is significantly reduced.
In this blog, we have explored Arm best practice validation, how to enable it, how to interpret the warnings that may be emitted, and gone through some examples of how to fix issues that are implied by the warnings.
We hope that this extra validation proves useful when writing Vulkan for Mali. We would love to hear any feedback about how best practice validation has helped.
If you encounter any problems with using the system or have any feature requests, please refer to the validation layers’ GitHub repository. Here you can raise an issue if the problem has not already been solved.