Arm Best Practice warnings in the Vulkan SDK

November 29, 2021

In this tutorial, we will be covering a little-known feature in Khronos' validation layer which can help you detect potential performance issues for Arm Mali GPUs. In 2020, we integrated the functionality of our now-deprecated PerfDoc layer directly into the Khronos validation layer; we now refer to this functionality as Arm best practice validation. The PerfDoc layer allowed a developer to check for common mistakes when tuning for Mali, or any other GPU with similar features, such as tile memory. With this functionality now in the Vulkan SDK, all Vulkan developers can rapidly enable these checks and benefit from them.

Here we will go over how to use Arm best practice validation, and how to act on some of the performance issues it may detect.

Khronos' validation layer contains various sub-components, each aimed at checking a separate type of validity in API usage; these sub-components are sometimes referred to as "layer objects". The best practices layer object is what we're interested in here: it contains checks for API usage which is technically correct, but potentially inadvisable in terms of resulting performance. Because it only checks for API usage that might hinder performance, rather than for correctness, the layer object is opt-in and must be enabled via one of several interfaces.

In version 1.2.148.0 of the Vulkan SDK, the Khronos validation layer introduced Arm-specific best practice checks, adding to the existing set of vendor-agnostic best practice checks. As a result, we can now also opt in to receive Arm best practice checks. These automatically detect API usage which could negatively impact performance on Arm Mali GPUs specifically.

One of the great benefits of having a tool like this is that one can validate against Mali-friendliness on desktop. Say a studio has made a game using Vulkan, and it was targeted for the PC platform. The studio then decides it wants to investigate porting to mobile. With best practice validation, the studio can quickly check if their graphics implementation has fallen into any common pitfalls for Mali.

The reasoning behind the checks themselves follows from our Mali GPU Best Practices Developer Guide. Take a look at the guide for more information about how to get the most out of Mali.

Vulkan Layers

Conceptual Overview

One of the most interesting features of Vulkan is its layer architecture. To understand best practices checks, it helps to have some context as to how the checks are being run, and where their functionality comes from. Here's a short overview of Vulkan's layer architecture, and what Khronos' validation layer does.

Vulkan layers

(Instance Call Chain Example - Architecture of the Vulkan Loader Interfaces, Khronos, CC BY-ND 4.0)

When a Vulkan function is called, the trampoline loader passes parameters to the function hook for the first layer. For instance functions and device functions (which comprise most Vulkan functions), Vulkan layer implementations are expected to, and are completely responsible for, calling the next layer in the chain themselves.

Please refer to Khronos' official documentation for more in-depth details about the Vulkan layer loader interface.

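Vulkan layers can be enabled implicitly or explicitly. Implicit layers are enabled automatically once they are installed in one of the system's implicit layer locations. Explicit layers must be requested, either by the application itself or from outside it via the VK_INSTANCE_LAYERS environment variable on Windows, Mac, and Linux, or via the debug.vulkan.layers property on Android.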

Each implicit layer must also implement a disable environment variable, specific to it, and may optionally implement an enable environment variable. If an enable variable is implemented, it is required to be set in the environment before the layer is active. Then, if the disable environment variable is set, this overrides everything and disables the layer in all cases.
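The enable and disable rules above can be sketched as a small decision function. This is only an illustration of the rules as described here, not the loader's implementation: the ImplicitLayerEnv struct, the implicit_layer_active function, and the variable names used with them are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical description of the environment variables one implicit layer declares.
struct ImplicitLayerEnv {
    std::string enable_var;   // empty when the layer declares no enable variable
    std::string disable_var;  // every implicit layer must declare one
};

// Decides whether the layer is active, given the names of the variables
// that are currently set in the environment.
inline bool implicit_layer_active(const ImplicitLayerEnv& layer,
                                  const std::set<std::string>& set_vars) {
    assert(!layer.disable_var.empty());
    // The disable variable overrides everything else.
    if (set_vars.count(layer.disable_var) != 0) return false;
    // If an enable variable exists, it must be set for the layer to load.
    if (!layer.enable_var.empty() && set_vars.count(layer.enable_var) == 0) return false;
    return true;
}
```

A layer with no enable variable is active unless its disable variable is set; a layer with an enable variable needs that variable set and the disable variable unset.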

Explicit layers are specified to be loaded by the application itself. The layers are specified via VkInstanceCreateInfo, used in vkCreateInstance.

In either case, layer implementations are searched for in a handful of pre-defined system locations, as well as all paths specified in VK_LAYER_PATH.

Khronos Validation Layer Details

The Khronos validation layer has an internal interface for pre-call hooks and post-call hooks. So when a Vulkan API call occurs, the sequence of events looks something like this:

1. API call to vkFoo.
2. The trampoline loader dispatches to the first layer, each layer calls the appropriate dispatch function for the next layer. Eventually we reach the Khronos validation layer.
3. For each enabled validation object in the Khronos layer (e.g., core validation, synchronization validation, best practice validation), do the following:
- Run the object's PreCallValidateVkFoo function
- Run the object's PreCallRecordVkFoo function
4. Run the next layer's vkFoo function – if this is the last layer in the chain, then the ICD (Installable Client Driver) behavior for vkFoo will run instead.
5. Once control is returned to the Khronos validation layer, after the next layer has finished, for each validation object, do:
- Run the object’s PostCallRecordVkFoo.
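The sequence above can be sketched with a toy model that records the order in which the hooks run. Everything here is a stand-in: ValidationObject, Trace, and validation_layer_vkFoo are hypothetical names, the hooks only log themselves, and the per-object ordering simply follows the list above, so the real layer may organise its loops differently.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Trace = std::vector<std::string>;

// Hypothetical stand-in for one validation object; each hook just logs itself.
struct ValidationObject {
    std::string name;
    void PreCallValidateVkFoo(Trace& t) const { t.push_back(name + ".PreCallValidateVkFoo"); }
    void PreCallRecordVkFoo(Trace& t) const { t.push_back(name + ".PreCallRecordVkFoo"); }
    void PostCallRecordVkFoo(Trace& t) const { t.push_back(name + ".PostCallRecordVkFoo"); }
};

// Toy version of the validation layer's hook for vkFoo.
// `next_vkFoo` stands in for the next layer's vkFoo, or for the ICD at the end of the chain.
inline Trace validation_layer_vkFoo(const std::vector<ValidationObject>& objects,
                                    const std::function<void(Trace&)>& next_vkFoo) {
    assert(static_cast<bool>(next_vkFoo));  // a layer is responsible for calling down the chain
    Trace trace;
    for (const auto& obj : objects) {       // pre-call hooks
        obj.PreCallValidateVkFoo(trace);
        obj.PreCallRecordVkFoo(trace);
    }
    next_vkFoo(trace);                      // hand over to the next layer or the driver
    for (const auto& obj : objects) {       // post-call hooks, once control returns
        obj.PostCallRecordVkFoo(trace);
    }
    return trace;
}
```

Calling this with two objects and a `next_vkFoo` that appends its own entry yields the pre-call entries for both objects, then the downstream call, then both post-call entries.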

The best practice layer object tracks data from the parameters and return values of all Vulkan calls, using the pre-call and post-call hooks. Therefore, it can identify patterns of API usage at runtime.

Best Practice Validation

Setup

In order to receive log output from the Khronos validation layer, we need to register the debug callback for our application. Take a look at this tutorial for more information on enabling the debug callback.

The debug callback needs to register that it will accept performance warnings. Best practice performance warnings require that the debug callback enables VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT in the message severity flags, and VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT in the message type flags.

Enabling Best Practice Validation

There are effectively four ways to enable best practice validation, each with its own use cases:

- The VK_LAYER_ENABLES environment variable.
- Using vk_layer_settings.txt and VK_LAYER_SETTINGS_PATH.
- Using the "programmatic interface" for layer features.
- Using the Vulkan Configurator.

Layer Enables Variable

Using VK_LAYER_ENABLES is the simplest way to test out best practice validation when running an application on the command line or through a script. VK_LAYER_ENABLES is a colon-separated list of symbols (or semicolons on Windows) representing simple boolean settings in a layer. To enable best practice validation, VK_LAYER_ENABLES needs to contain VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT. To enable Arm-specific best practice validation, the variable also needs to include VALIDATION_CHECK_ENABLE_VENDOR_SPECIFIC_ARM.

In short, we should use VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT:VALIDATION_CHECK_ENABLE_VENDOR_SPECIFIC_ARM, which, while hard to read, is quite easy to set within a shell environment.
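For launchers or test harnesses that assemble this value in code, the separator rule can be captured in a small helper. This is a minimal sketch: build_layer_enables and layer_enables_separator are hypothetical names, and the helper only builds the string, leaving it to the caller to place it in the environment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Separator for VK_LAYER_ENABLES as described above: ';' on Windows, ':' elsewhere.
inline char layer_enables_separator(bool is_windows) { return is_windows ? ';' : ':'; }

// Joins the enable symbols into a value suitable for VK_LAYER_ENABLES.
inline std::string build_layer_enables(const std::vector<std::string>& symbols, bool is_windows) {
    const char sep = layer_enables_separator(is_windows);
    std::string value;
    for (const auto& symbol : symbols) {
        // A symbol containing the separator would be split into two settings.
        assert(symbol.find(sep) == std::string::npos);
        if (!value.empty()) value += sep;
        value += symbol;
    }
    return value;
}
```

Passing the two symbols named above produces the colon-separated value on Linux and the semicolon-separated equivalent on Windows.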

Layer Settings File

Using vk_layer_settings.txt is sometimes required if layers allow for more complex settings. The most basic vk_layer_settings.txt file which we’d need to specify Arm best practice validation looks like this:

# Basic configuration file for enabling best practices validation warnings
khronos_validation.debug_action=VK_DBG_LAYER_ACTION_LOG_MSG
khronos_validation.report_flags=info,warn,perf,error
khronos_validation.log_filename=stdout
khronos_validation.enables=VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT,VALIDATION_CHECK_ENABLE_VENDOR_SPECIFIC_ARM

When running the application, we then need the environment variable VK_LAYER_SETTINGS_PATH to contain the path to this file.

Programmatic Interface

One can also enable best practice warnings at vkCreateInstance using the programmatic interface. However, this is limited only to basic best practice enablement – Arm best practice validation cannot be enabled this way.

Vulkan Configurator

The Vulkan Configurator (vkconfig) is a graphical user interface (GUI) for helping with quick layer settings and overrides. It is also available in the Vulkan SDK. Vulkan Configurator now includes a checkbox for best practice validation, as well as vendor-specific best practice validation. One can either launch an application directly from Vulkan Configurator or have it override settings for all Vulkan applications, as long as it is open.

Please refer to the official Vulkan Configurator documentation for more information about its features.

Vulkan Layers Management

Please also refer to Khronos' official documentation regarding best practices validation for more details.

Example: MSAA

Now that everything is set up, let’s look at an example with Vulkan-Samples. Once Vulkan-Samples is built (with validation layers enabled), and we have Arm best practice validation enabled via vk_layer_settings.txt, we should see the warnings in the terminal.

Let’s open the MSAA sample like so. On Linux:

VK_LAYER_SETTINGS_PATH=<path-to>/vk_layer_settings.txt build/app/bin/Debug/x86_64/vulkan_samples msaa

On Windows:

set VK_LAYER_SETTINGS_PATH=<path-to>\vk_layer_settings.txt
build\app\bin\Debug\AMD64\vulkan_samples.exe msaa

If we then switch the MSAA sample count to 8, we get warnings about performance on Mali.

[ UNASSIGNED-BestPractices-vkCreateImage-too-large-sample-count ] Object 0: handle = 0x559eb1033138, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xa4282245 | [Arm] vkCreateImage():

Trying to create an image with 8 samples. The hardware revision may not have full throughput for framebuffers with more than 4 samples.

As our MSAA tutorial explains, we should avoid using more than 4x MSAA without checking what the performance impact is, especially on older Mali-based devices. The warning in this case simply asks us to pay attention to this fact and may not represent a problem in practice.

If we set the “Resolve color” setting in the MSAA sample to “separate” we see a warning about vkCmdResolveImage.

[ UNASSIGNED-BestPractices-vkCmdResolveImage-resolving-image ] Object 0: handle = 0x5589dc2efce8, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x3899059a | [Arm] Attempting to use vkCmdResolveImage to resolve a multisampled image.

This is a very slow and extremely bandwidth intensive path. You should always resolve multisampled images on-tile with pResolveAttachments in VkRenderPass.

As the warning mentions, using vkCmdResolveImage can negatively affect performance on Mali, so we should instead set up the render pass to resolve the image inline. We have documentation about how to get the best multisampling performance in our best practices guide. In this case, the MSAA Vulkan sample implements proper inline multisampling resolution for Mali in its default options, so if we switch the “resolve color” option back to “on writeback”, the warning will go away.

RenderDoc

It is also possible to get best practice warnings within RenderDoc. This is useful because when a debug message is emitted from a Vulkan layer, the references to Vulkan objects in the warning are hyperlinked to the equivalent resource in RenderDoc. This means that you can be linked directly to the API call or resource which caused a best practice warning. Let's look at an example.

Firstly, to allow RenderDoc to see best practice warnings – or any other type of opt-in warning such as synchronization validation – we need to launch RenderDoc itself with the environment variables. In other words, you should open RenderDoc like this:

VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT:VALIDATION_CHECK_ENABLE_VENDOR_SPECIFIC_ARM <path-to-renderdoc>/qrenderdoc

Or on Windows:

set VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation
set VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT;VALIDATION_CHECK_ENABLE_VENDOR_SPECIFIC_ARM
<path-to-renderdoc>\qrenderdoc.exe

When we capture a frame in RenderDoc and go to the “Errors and Warnings” tab, we will see any best practice warnings which occur during the frame. For example, let’s use Vulkan-Samples' “render_passes” sample, capture a frame, and then load the capture.

Vulkan "render_passes" sample

If we go to the errors and warnings tab, we find the following:

Error and warnings tab

There are two warnings for “UNASSIGNED-BestPractices-vkCmdDrawIndexed-post-transform-cache-thrashing”. Let’s look at the description. The first one says:

[Arm] The indices which were specified for the draw call are estimated to cause thrashing of the post-transform vertex cache, with a hit-rate of 46.12%. I.e., the ordering of the index buffer may not make optimal use of indices associated with recently shaded vertices.

...and the second says the same thing, but with a hit rate of 42.69% instead.

What does this mean? First, some background information on what the post-transform cache is, and why this is relevant for performance on Mali.

Since position data is highly likely to be re-used in a mesh, e.g., when multiple triangles share a vertex, it is more efficient to process position data only once and allow the result to be accessed from a cache. This is called the post-transform cache. Because the cache has a fixed size, it is in our best interests to keep usages of a particular index close to each other in the index buffer. If the index buffer contains sequences of indices which vary wildly, with less spatial locality, this can increase the risk of needing to re-shade some vertices. This is mentioned in our best practices guide, alongside other tips to bear in mind when using indexed draw calls.

The validation layer implements a small cache model which, while technically different from the underlying hardware, can still detect meshes that are likely to be sub-optimal. With this in mind, the warning likely means that either a mesh being drawn in the frame is highly discontinuous, or the indices for a mesh are specified in a sub-optimal order.
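To make the idea of an estimated hit rate concrete, here is a minimal sketch of a cache model run over an index buffer. It is not the model the validation layer uses: the FIFO replacement policy, the cache_size parameter, and the estimate_hit_rate name are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Estimates the fraction of indices that would hit a FIFO cache holding
// `cache_size` recently shaded vertices. Illustrative only.
inline double estimate_hit_rate(const std::vector<std::uint32_t>& indices, std::size_t cache_size) {
    assert(cache_size > 0);
    if (indices.empty()) return 0.0;

    std::deque<std::uint32_t> cache;  // oldest entry at the front
    std::size_t hits = 0;
    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end()) {
            ++hits;  // vertex was shaded recently, so its result can be re-used
        } else {
            if (cache.size() == cache_size) cache.pop_front();  // evict the oldest entry
            cache.push_back(index);                             // shade and remember this vertex
        }
    }
    return static_cast<double>(hits) / static_cast<double>(indices.size());
}
```

With this model, two triangles sharing an edge (indices 0, 1, 2, 2, 1, 3) hit the cache on two of six indices. The same three indices repeated twice score zero hits when the cache holds only two entries, because each vertex is evicted just before it is needed again, which is the thrashing pattern the warning describes.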

As mentioned, the objects in the warning message are linked to resources within RenderDoc. So, if we double-click on the first message, we are automatically taken to the offending Vulkan API call in the event log.

Offending Vulkan API call in the event log

Now, let’s look at the mesh using the “mesh viewer” tab.

"Mesh viewer" tab

We can infer from the mesh that the draw call in question is for the ornamental plate behind the lion head sculpture.

If we look at the mesh associated with the next warning, it turns out that the roof tiles are the culprit.

"Mesh viewer" with warning

The estimated cache hit rate is quite low for these meshes, which means the GPU might re-do some position or “varying” shading, unnecessarily. The details of why a particular mesh results in this behavior can vary. For example, it could mean there are duplicated vertices (and therefore no index re-use). It could also mean that the indices specified for the draw call are in a suboptimal order. Finally, it could simply mean that a mesh has low connectivity with itself, so there may be no simple fix.
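One quick way to tell the duplicated-vertices case apart from the others is to measure how often each distinct index is referenced. The helper below is a hypothetical diagnostic, not part of the validation layer or RenderDoc: a result close to 1.0 means almost no index is re-used, which points at duplicated vertices rather than at index ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Average number of times each distinct vertex index is referenced by the index buffer.
inline double average_index_reuse(const std::vector<std::uint32_t>& indices) {
    assert(!indices.empty());
    const std::unordered_set<std::uint32_t> unique(indices.begin(), indices.end());
    return static_cast<double>(indices.size()) / static_cast<double>(unique.size());
}
```

For example, two triangles with no shared vertices (indices 0 to 5) give 1.0, while two triangles sharing an edge (0, 1, 2, 2, 1, 3) give 1.5.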

In the absence of any details about what is wrong with the meshes, we can try to automatically optimize them using a mesh optimiser. Arseny Kapoulkine’s meshoptimizer can help us here.

To run meshoptimizer, we can use gltfpack: a wrapper of the meshoptimizer libraries with a nice command line interface. We will download it from NPM (node package manager) in this case.

npm install -g gltfpack

This will install gltfpack on our PATH. Next, we can attempt to optimise the meshes like so:

cd <path-to-vulkan-samples>
gltfpack -i assets/scenes/sponza/Sponza01.gltf -o assets/scenes/sponza/Sponza01_optimised.gltf -si 1 -noq

The -si 1 flag specifies that we want to simplify meshes in the scene, and that we want to use a simplification ratio of 1. If the ratio is 1, it means we want to optimise the meshes, but still maintain full quality. We could also use a ratio less than 1 if we needed to; however, this would reduce the quality of the meshes. -noq is also required in this case, since Vulkan-Samples does not support the KHR_mesh_quantization glTF extension, at the moment.

We then modify the render_passes sample to use Sponza01_optimised.gltf, by changing load_scene("scenes/sponza/Sponza01.gltf") to load_scene("scenes/sponza/Sponza01_optimised.gltf"), in RenderPassesSample::prepare, in samples/performance/render_passes/render_passes.cpp.

After re-compiling Vulkan-Samples, and capturing the frame again, the cache thrashing warnings are gone because the hit rate is improved. Then, as a bonus, Sponza’s file size is significantly reduced.

Summary

In this blog, we have explored Arm best practice validation, how to enable it, how to interpret the warnings that may be emitted, and gone through some examples of how to fix issues that are implied by the warnings.

We hope that this extra validation proves useful when writing Vulkan for Mali. We would love to hear any feedback about how best practice validation has helped.

If you encounter any problems with using the system or have any feature requests, please refer to the validation layers’ GitHub repository. Here you can raise an issue if the problem has not already been solved.

Bibliography

https://developer.arm.com/documentation/101897/latest

https://vulkan.lunarg.com/doc/view/1.2.189.0/linux/loader_and_layer_interface.html

https://github.com/KhronosGroup/Vulkan-Samples/blob/master/samples/extensions/debug_utils/debug_utils_tutorial.md

https://github.com/LunarG/VulkanTools/tree/master/vkconfig

https://vulkan.lunarg.com/doc/view/1.2.189.0/linux/best_practices.html#user-content-enabling-and-specifying-options-with-the-programmatic-interface

https://vulkan.lunarg.com/doc/view/1.2.189.0/linux/best_practices.html

https://github.com/KhronosGroup/Vulkan-Samples

https://github.com/KhronosGroup/Vulkan-Samples/blob/master/samples/performance/msaa/msaa_tutorial.md

https://developer.arm.com/documentation/101897/v2-2/Fragment-Shading/Multisampling-for-Vulkan

https://www.khronos.org/opengl/wiki/Post_Transform_Cache

https://en.wikipedia.org/wiki/Locality_of_reference

https://developer.arm.com/documentation/101897/v2-2/Vertex-Shading/Index-draw-calls

https://github.com/zeux/meshoptimizer

https://github.com/KhronosGroup/glTF/blob/main/extensions/2.0/Khronos/KHR_mesh_quantization/README.md

https://github.com/KhronosGroup/Vulkan-ValidationLayers
