As I mentioned previously, Android is enabling a host of useful new Vulkan extensions for mobile. These new extensions are set to improve the state of graphics APIs for modern applications, enabling new use cases and changing how developers can design graphics renderers going forward. These extensions will be available across various Android smartphones, including the new Samsung Galaxy S21, which was recently launched on 14 January. Existing Samsung Galaxy S models, such as the Samsung Galaxy S20, also allow upgrades to Android R.
I have already discussed two of these extensions in previous blogs - Maintenance Extensions and Legacy Support Extensions. However, there are three further Vulkan extensions for Android that I believe are ‘game changers’. In the first of three blogs, I will explore these individual game changer extensions – what they do, why they can be useful and how to use them. The goal here is to not provide complete samples, but there should be enough to get you started. The first Vulkan extension is ‘Descriptor Indexing.’ Descriptor indexing can be available in handsets prior to Android R release. To check what Android devices are available with 'Descriptor Indexing' check here. You can also directly view the Khronos Group/ Vulkan samples that are relevant to this blog here.
In recent years, we have seen graphics APIs greatly evolve in their resource binding flexibility. All modern graphics APIs now have some answer to how we can access a large swathes of resources in a shader.
A common buzzword that is thrown around in modern rendering tech is “bindless”. The core philosophy is that resources like textures and buffers are accessed through simple indices or pointers, and not singular “resource bindings”. To pass down resources to our shaders, we do not really bind them like in the graphics APIs of old. Simply write a descriptor to some memory and a shader can come in and read it later. This means the API machinery to drive this is kept to a minimum.
This is a fundamental shift away from the older style where our rendering loop looked something like:
render_scene() { foreach(drawable) { command_buffer->update_descriptors(drawable); command_buffer->draw(); } }
Now it looks more like:
render_scene() { command_buffer->bind_large_descriptor_heap(); large_descriptor_heap->write_global_descriptors(scene, lighting, shadowmaps); foreach(drawable) { offset = large_descriptor_heap->allocate_and_write_descriptors(drawable); command_buffer->push_descriptor_heap_offsets(offset); command_buffer->draw(); } }
Since we have free-form access to resources now, it is much simpler to take advantage of features like multi-draw or other GPU driven approaches. We no longer require the CPU to rebind descriptor sets between draw calls like we used to.
Going forward when we look at ray-tracing, this style of design is going to be mandatory since shooting a ray means we can hit anything, so all descriptors are potentially used. It is useful to start thinking about designing for this pattern going forward.
The other side of the coin with this feature is that it is easier to shoot yourself in the foot. It is easy to access the wrong resource, but as I will get to later, there are tools available to help you along the way.
This extension is a large one and landed in Vulkan 1.2 as a core feature. To enable bindless algorithms, there are two major features exposed by this extension.
How resources are accessed has evolved quite a lot over the years. Hardware capabilities used to be quite limited, with a tiny bank of descriptors being visible to shaders at any one time. In more modern hardware however, shaders can access descriptors freely from memory and the limits are somewhat theoretical.
Arrays of resources have been with us for a long time, but mostly as syntactic sugar, where we can only index into arrays with a constant index. This is equivalent to not using arrays at all from a compiler point of view.
layout(set = 0, binding = 0) uniform sampler2D Textures[4]; const int CONSTANT_VALUE = 2; color = texture(Textures[CONSTANT_VALUE], UV);
HLSL in D3D11 has this restriction as well, but it has been more relaxed about it, since it only requires that the index is constant after optimization passes are run.
As an optional feature, dynamic indexing allows applications to perform dynamic indexing into arrays of resources. This allows for a very restricted form of bindless. Outside compute shaders however, using this feature correctly is quite awkward, due to the requirement of the resource index being dynamically uniform.
Dynamically uniform is a somewhat intricate subject, and the details are left to the accompanying sample in KhronosGroup/Vulkan-Samples.
Most hardware assumes that the resource index is dynamically uniform, as this has been the restriction in APIs for a long time. If you are not accessing resources with a dynamically uniform index, you must notify the compiler of your intent.
The rationale here is that hardware is optimized for dynamically uniform (or subgroup uniform) indices, so there is often an internal loop emitted by either compiler or hardware to handle every unique index that is used. This means performance tends to depend a bit on how divergent resource indices are.
#extension GL_EXT_nonuniform_qualifier : require layout(set = 0, binding = 0) uniform texture2D Tex[]; layout(set = 1, binding = 0) uniform sampler Sampler; color = texture(nonuniformEXT(sampler2D(Tex[index], Sampler)), UV);
In HLSL, there is a similar mechanism where you use NonUniformResourceIndex, for example.
Texture2D<float4> Textures[] : register(t0, space0); SamplerState Samp : register(s0, space0); float4 color = Textures[NonUniformResourceIndex(index)].Sample(Samp, UV);
All descriptor types can make use of this feature, not just textures, which is quite handy! The nonuniformEXT qualifier removes the requirement to use dynamically uniform indices. See the code sample for more detail.
A key component to make the bindless style work is that we do not have to … bind descriptor sets all the time. With the update-after-bind feature, we effectively block the driver from consuming descriptors at command recording time, which gives a lot of flexibility back to the application. The shader consumes descriptors as they are used and the application can freely update descriptors, even from multiple threads.
To enable, update-after-bind we modify the VkDescriptorSetLayout by adding new binding flags. The way to do this is somewhat verbose, but at least update-after-bind is something that is generally used for just one or two descriptor set layouts throughout most applications:
VkDescriptorSetLayoutCreateInfo info = { … }; info.flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT_EXT; const VkDescriptorBindingFlagsEXT flags = VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT_EXT VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT | VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT_EXT | VK_DESCRIPTOR_BINDING_UPDATE_UNUSED_WHILE_PENDING_BIT_EXT; VkDescriptorSetLayoutBindingFlagsCreateInfoEXT binding_flags = { … }; binding_flags.bindingCount = info.bindingCount; binding_flags.pBindingFlags = &flags; info.pNext = &binding_flags;
For each pBinding entry, we have a corresponding flags field where we can specify various flags. The descriptor_indexing extension has very fine-grained support, but UPDATE_AFTER_BIND_BIT and VARIABLE_DESCRIPTOR_COUNT_BIT are the most interesting ones to discuss.
VARIABLE_DESCRIPTOR_COUNT deserves special attention as it makes descriptor management far more flexible. Having to use a fixed array size can be somewhat awkward, since in a common usage pattern with a large descriptor heap, there is no natural upper limit to how many descriptors we want to use. We could settle for some arbitrarily high limit like 500k, but that means all descriptor sets we allocate have to be of that size and all pipelines have to be tied to that specific number. This is not necessarily what we want, and VARIABLE_DESCRIPTOR_COUNT allows us to allocate just the number of descriptors we need per descriptor set. This makes it far more practical to use multiple bindless descriptor sets.
When allocating a descriptor set, we pass down the actual number of descriptors to allocate:
VkDescriptorSetVariableDescriptorCountAllocateInfoEXT variable_info = { … }; variable_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_VARIABLE_DESCRIPTOR_COUNT_ALLOCATE_INFO_EXT; variable_info.descriptorSetCount = 1; allocate_info.pNext = &variable_info; variable_info.pDescriptorCounts = &NumDescriptorsStreaming; VK_CHECK(vkAllocateDescriptorSets(get_device().get_handle(), &allocate_info, &descriptors.descriptor_set_update_after_bind));
When we enter the world of descriptor indexing, there is a flipside where debugging and validation is much more difficult. The major benefit of the older binding models is that it is fairly easy for validation layers and debuggers to know what is going on. This is because the number of available resources to a shader is small and focused.
With UPDATE_AFTER_BIND in particular, we do not know anything at draw time, which makes this awkward.
It is possible to enable GPU assisted validation in the Khronos validation layers. This lets you catch issues like:
"UNASSIGNED-Descriptor uninitialized: Validation Error: [ UNASSIGNED-Descriptor uninitialized ] Object 0: handle = 0x55625acf5600, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x893513c7 | Descriptor index 67 is uninitialized. Command buffer (0x55625b184d60). Draw Index 0x4. Pipeline (0x520000000052). Shader Module (0x510000000051). Shader Instruction Index = 59. Stage = Fragment. Fragment coord (x,y) = (944.5, 0.5). Unable to find SPIR-V OpLine for source information. Build shader with debug info to get source information."
Or:
"UNASSIGNED-Descriptor uninitialized: Validation Error: [ UNASSIGNED-Descriptor uninitialized ] Object 0: handle = 0x55625acf5600, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x893513c7 | Descriptor index 131 is uninitialized. Command buffer (0x55625b1893c0). Draw Index 0x4. Pipeline (0x520000000052). Shader Module (0x510000000051). Shader Instruction Index = 59. Stage = Fragment. Fragment coord (x,y) = (944.5, 0.5). Unable to find SPIR-V OpLine for source information. Build shader with debug info to get source information."
RenderDoc supports debugging descriptor indexing through shader instrumentation, and this allows you to inspect which resources were accessed. When you have several thousand resources bound to a pipeline, this feature is critical to make any sense of the inputs.
If we are using the update-after-bind style, we can inspect the exact resources we used.
In a non-uniform indexing style, we can inspect all unique resources we used.
Descriptor indexing unlocks many design possibilities in your engine and is a real game changer for modern rendering techniques. Use with care, and make sure to take advantage of all debugging tools available to you. You need them.
This blog has explored the first Vulkan extension game changer, with two more parts in this game changer blog series still to come. The next part will focus on ‘Buffer Device Address’ and how developers can use this new feature to enhance their games.
[CTAToken URL = "https://github.com/KhronosGroup/Vulkan-Samples/pulls/hanskristian-work" target="_blank" text="Learn more about the new Vulkan extensions" class ="green"]