Android R is enabling a host of useful Vulkan extensions for mobile, with three being key 'game changers'. These are set to improve the state of graphics APIs for modern applications, enabling new use cases and changing how developers can design graphics renderers going forward. You can expect to see these features across a variety of Android smartphones, such as the new Samsung Galaxy S21, and existing Samsung Galaxy S models like the Samsung Galaxy S20. The first blog explored the first game changer extension for Vulkan – ‘Descriptor Indexing’. This blog explores the second game changer extension – ‘Buffer Device Address’.
VK_KHR_buffer_device_address is a monumental extension that adds a unique feature to Vulkan that none of the competing graphics APIs support.
Pointer support is something that has always been limited in graphics APIs, for good reason. Pointers complicate a lot of things, especially for shader compilers. It is also near impossible to deal with plain pointers in legacy graphics APIs, which rely on implicit synchronization.
There are two key aspects to buffer_device_address (BDA). First, it is possible to query a GPU virtual address from a VkBuffer. This is a plain uint64_t. This address can be written anywhere you like, in uniform buffers, push constants, or storage buffers, to name a few.
The key aspect which makes this extension unique is that a SPIR-V shader can load an address from a buffer and treat it as a pointer to storage buffer memory immediately. Pointer casting, pointer arithmetic and all sorts of clever trickery can be done inside the shader. There are many use cases for this feature. Some are performance-related, and some are new use cases that have not been possible before.
There are some hoops to jump through here. First, when allocating VkDeviceMemory, we must flag that the memory supports BDA:
VkMemoryAllocateInfo info = {…};
VkMemoryAllocateFlagsInfo flags = {…};
flags.flags = VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT_KHR;
// Chain the flags struct into the allocation info, otherwise it is ignored.
info.pNext = &flags;
vkAllocateMemory(device, &info, NULL, &memory);
Similarly, when creating a VkBuffer, we add the VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT_KHR usage flag. Once we have created a buffer, we can query the VA:
VkBufferDeviceAddressInfoKHR info = {…};
info.buffer = buffer;
// The query returns a VkDeviceAddress, which is a plain 64-bit integer.
VkDeviceAddress va = vkGetBufferDeviceAddressKHR(device, &info);
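From here, this 64-bit value can be placed in a buffer. You can of course offset this VA. Nothing checks alignment at this point: the shader specifies explicit alignment when it dereferences the pointer, so an offset address only has to satisfy whatever alignment the shader declares.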
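Since the VA is just a 64-bit integer, offsetting it and packing it for the shader is ordinary integer work on the CPU. The following is a minimal sketch in plain C that does not call Vulkan at all: the `Registers` struct and the helpers `offset_address`, `is_aligned` and `make_registers` are hypothetical names for illustration, and the `buffer_va` argument stands in for whatever value the address query returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical push-constant layout: two raw 64-bit GPU addresses. */
typedef struct Registers {
    uint64_t src;
    uint64_t dst;
} Registers;

/* Returns true if va is a multiple of alignment (must be a power of two). */
static bool is_aligned(uint64_t va, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (va & (alignment - 1)) == 0;
}

/* Offsetting a device address is plain integer arithmetic. */
static uint64_t offset_address(uint64_t base_va, uint64_t byte_offset)
{
    return base_va + byte_offset;
}

/* Points src and dst at two sub-ranges of the same buffer. */
static Registers make_registers(uint64_t buffer_va,
                                uint64_t src_offset,
                                uint64_t dst_offset)
{
    Registers regs;
    regs.src = offset_address(buffer_va, src_offset);
    regs.dst = offset_address(buffer_va, dst_offset);
    return regs;
}
```

An application could call `is_aligned(regs.src, 16)` before submitting work if the shader it pairs with declares a 16-byte alignment for that pointer type; that check is a debugging aid assumed here, not something the API requires.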
When using BDA, there are some extra features that drivers must support. When an application capture is replayed in a debug tool, the pointers recorded in the capture are only valid if the buffers end up at the same addresses, so the driver must be able to guarantee that the virtual addresses it returns remain stable across runs. To that end, debug tools supply the expected VA and the driver allocates that VA range. Applications do not care that much about this, but it is important to note that even if you can use BDA, you might not be able to debug with it.
typedef struct VkPhysicalDeviceBufferDeviceAddressFeatures {
    VkStructureType sType;
    void* pNext;
    VkBool32 bufferDeviceAddress;
    VkBool32 bufferDeviceAddressCaptureReplay;
    VkBool32 bufferDeviceAddressMultiDevice;
} VkPhysicalDeviceBufferDeviceAddressFeatures;
If bufferDeviceAddressCaptureReplay is supported, tools like RenderDoc can support BDA.
In Vulkan GLSL, there is the GL_EXT_buffer_reference extension which allows us to declare a pointer type. A pointer like this can be placed in a buffer, or we can convert to and from integers:
#version 450
#extension GL_EXT_buffer_reference : require
#extension GL_EXT_buffer_reference_uvec2 : require
layout(local_size_x = 64) in;

// These define pointer types.
layout(buffer_reference, std430, buffer_reference_align = 16) readonly buffer ReadVec4
{
    vec4 values[];
};

layout(buffer_reference, std430, buffer_reference_align = 16) writeonly buffer WriteVec4
{
    vec4 values[];
};

layout(buffer_reference, std430, buffer_reference_align = 4) readonly buffer UnalignedVec4
{
    vec4 value;
};

layout(push_constant, std430) uniform Registers
{
    // Placing raw pointers in push constants avoids all
    // indirection for getting to a buffer.
    // If the driver allows it,
    // the pointers can be placed directly in GPU registers
    // before the shader begins executing!
    ReadVec4 src;
    WriteVec4 dst;
} registers;

// Not all devices support 64-bit integers, but it's possible to cast uvec2 <-> pointer.
// Doing address computation like this is fine.
uvec2 uadd_64_32(uvec2 addr, uint offset)
{
    uint carry;
    addr.x = uaddCarry(addr.x, offset, carry);
    addr.y += carry;
    return addr;
}

void main()
{
    uint index = gl_GlobalInvocationID.x;
    registers.dst.values[index] = registers.src.values[index];

    uvec2 addr = uvec2(registers.src);
    addr = uadd_64_32(addr, 20 * index);

    // Cast a uvec2 to address and load a vec4 from it.
    // This address is aligned to 4 bytes.
    registers.dst.values[index + 1024] = UnalignedVec4(addr).value;
}
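The `uadd_64_32` helper in the shader is easy to get subtly wrong, so it can be worth mirroring on the CPU and checking against native 64-bit arithmetic. The sketch below is a plain C stand-in, not shader code: `UVec2`, `to_u64`, `from_u64` and `element_address` are hypothetical names, and it assumes the low 32 bits of the address live in `x` and the high 32 bits in `y`, which is how the shader's carry handling treats them.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for GLSL uvec2: x holds the low 32 bits, y the high 32 bits. */
typedef struct UVec2 {
    uint32_t x;
    uint32_t y;
} UVec2;

static UVec2 from_u64(uint64_t va)
{
    UVec2 v;
    v.x = (uint32_t)(va & 0xFFFFFFFFu);
    v.y = (uint32_t)(va >> 32);
    return v;
}

static uint64_t to_u64(UVec2 v)
{
    return ((uint64_t)v.y << 32) | (uint64_t)v.x;
}

/* Same logic as the shader: add to the low word, carry into the high word. */
static UVec2 uadd_64_32(UVec2 addr, uint32_t offset)
{
    uint32_t lo = addr.x + offset;            /* wraps modulo 2^32 */
    uint32_t carry = (lo < addr.x) ? 1u : 0u; /* wrapped => carry out */
    addr.x = lo;
    addr.y += carry;
    return addr;
}

/* Address of element `index` with a byte stride, as in `20 * index`. */
static UVec2 element_address(UVec2 base, uint32_t index, uint32_t stride)
{
    /* The product is computed in 32 bits in the shader, so it must fit. */
    assert((uint64_t)index * stride <= UINT32_MAX);
    return uadd_64_32(base, index * stride);
}
```

The assertion in `element_address` flags one limit of this approach: the byte offset itself is a 32-bit value, so a single step cannot move more than 4 GiB from the base.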
Using raw pointers is not always the best idea. A natural use case you could consider for pointers is that you have tree structures or list structures in GPU memory. With pointers, you can jump around as much as you want, and even write new pointers to buffers. However, a pointer is 64-bit and a typical performance consideration is to use 32-bit offsets (or even 16-bit offsets) if possible. Using offsets is the way to go if you can guarantee that all buffers live inside a single VkBuffer. On the other hand, the pointer approach can access any VkBuffer at any time without having to use descriptors. Therein lies the key strength of BDA.
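To make the size trade-off concrete, here is a minimal CPU-side sketch in C of a linked list stored in one contiguous allocation, which plays the role of a single VkBuffer. All names (`NodeWithPointer`, `NodeWithOffset`, `NULL_OFFSET`, `sum_list`) are hypothetical, and the node layouts are only one possible choice; the point is that a link stored as a 32-bit byte offset halves the node size compared with a 64-bit address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sentinel offset marking the end of the list. */
#define NULL_OFFSET 0xFFFFFFFFu

/* Link stored as a raw 64-bit address: 16 bytes per node. */
typedef struct NodeWithPointer {
    uint64_t next_va;
    uint32_t value;
    uint32_t pad;
} NodeWithPointer;

/* Link stored as a 32-bit byte offset from the buffer start: 8 bytes. */
typedef struct NodeWithOffset {
    uint32_t next_offset;
    uint32_t value;
} NodeWithOffset;

/* Walks an offset-linked list that lives entirely inside [base, base + size). */
static uint32_t sum_list(const uint8_t *base, size_t size, uint32_t head_offset)
{
    uint32_t sum = 0;
    uint32_t offset = head_offset;
    while (offset != NULL_OFFSET) {
        NodeWithOffset node;
        /* Every link must stay inside the single backing buffer. */
        assert((size_t)offset + sizeof(node) <= size);
        memcpy(&node, base + offset, sizeof(node));
        sum += node.value;
        offset = node.next_offset;
    }
    return sum;
}
```

The bounds assertion shows the cost of the compact form: offsets only make sense relative to one known base, whereas a 64-bit address can land in any buffer.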
A black magic hack is to place a BDA inside a specialization constant. This allows for accessing a pointer without using any descriptors, which is a life saver in certain situations where you are desperate to debug something and no descriptor set is available. Do note that this breaks all forms of pipeline caching and is only suitable for debug code. Do not ship this kind of code. Perform this dark sorcery at your own risk:
#version 450
#extension GL_EXT_buffer_reference : require
#extension GL_EXT_buffer_reference_uvec2 : require
layout(local_size_x = 64) in;

layout(constant_id = 0) const uint DEBUG_ADDR_LO = 0;
layout(constant_id = 1) const uint DEBUG_ADDR_HI = 0;

layout(buffer_reference, std430, buffer_reference_align = 4) buffer DebugCounter
{
    uint value;
};

void main()
{
    DebugCounter counter = DebugCounter(uvec2(DEBUG_ADDR_LO, DEBUG_ADDR_HI));
    atomicAdd(counter.value, 1u);
}
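On the CPU side, the two 32-bit specialization constants have to be derived from the 64-bit address. A minimal sketch of that split in plain C follows; `DebugAddrConstants`, `split_debug_address` and `join_debug_address` are hypothetical names, and feeding the two values into the pipeline's specialization data is deliberately left out. It assumes the low half goes to constant 0 and the high half to constant 1, matching the order in which the shader builds its `uvec2`.

```c
#include <assert.h>
#include <stdint.h>

/* Values for constant_id 0 (low half) and constant_id 1 (high half). */
typedef struct DebugAddrConstants {
    uint32_t lo;
    uint32_t hi;
} DebugAddrConstants;

/* Splits a 64-bit device address into two 32-bit constants. */
static DebugAddrConstants split_debug_address(uint64_t va)
{
    DebugAddrConstants c;
    /* The shader declares buffer_reference_align = 4 for the counter. */
    assert((va & 3u) == 0);
    c.lo = (uint32_t)(va & 0xFFFFFFFFu);
    c.hi = (uint32_t)(va >> 32);
    return c;
}

/* Inverse of the split, handy for checking what the shader will see. */
static uint64_t join_debug_address(DebugAddrConstants c)
{
    return ((uint64_t)c.hi << 32) | (uint64_t)c.lo;
}
```

Because the address is baked into the pipeline, every new address means a new pipeline, which is one concrete reason this trick defeats pipeline caching.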
In SPIR-V, there are some things to note. BDA is an especially useful feature for layering other APIs due to its extreme flexibility in how we access memory. Therefore, generating BDA code yourself is a reasonable use case to assume as well.
Enables BDA in shaders.
OpCapability PhysicalStorageBufferAddresses
OpExtension "SPV_KHR_physical_storage_buffer"
The addressing model is PhysicalStorageBuffer64 and not Logical anymore.
OpMemoryModel PhysicalStorageBuffer64 GLSL450
The buffer reference types are declared basically just like SSBOs.
OpDecorate %_runtimearr_v4float ArrayStride 16
OpMemberDecorate %ReadVec4 0 NonWritable
OpMemberDecorate %ReadVec4 0 Offset 0
OpDecorate %ReadVec4 Block
OpDecorate %_runtimearr_v4float_0 ArrayStride 16
OpMemberDecorate %WriteVec4 0 NonReadable
OpMemberDecorate %WriteVec4 0 Offset 0
OpDecorate %WriteVec4 Block
OpMemberDecorate %UnalignedVec4 0 NonWritable
OpMemberDecorate %UnalignedVec4 0 Offset 0
OpDecorate %UnalignedVec4 Block
Declare a pointer to the blocks. PhysicalStorageBuffer is the storage class to use.
OpTypeForwardPointer %_ptr_PhysicalStorageBuffer_WriteVec4 PhysicalStorageBuffer
%_ptr_PhysicalStorageBuffer_ReadVec4 = OpTypePointer PhysicalStorageBuffer %ReadVec4
%_ptr_PhysicalStorageBuffer_WriteVec4 = OpTypePointer PhysicalStorageBuffer %WriteVec4
%_ptr_PhysicalStorageBuffer_UnalignedVec4 = OpTypePointer PhysicalStorageBuffer %UnalignedVec4
Load a physical pointer from PushConstant.
%55 = OpAccessChain %_ptr_PushConstant__ptr_PhysicalStorageBuffer_WriteVec4 %registers %int_1
%56 = OpLoad %_ptr_PhysicalStorageBuffer_WriteVec4 %55
Access chain into it.
%66 = OpAccessChain %_ptr_PhysicalStorageBuffer_v4float %56 %int_0 %40
Aligned must be specified when dereferencing physical pointers. Pointers can have any arbitrary address and must be explicitly aligned, so the compiler knows what to do.
OpStore %66 %65 Aligned 16
For pointers, SPIR-V can bitcast between integers and pointers seamlessly, for example:
%61 = OpLoad %_ptr_PhysicalStorageBuffer_ReadVec4 %60
%70 = OpBitcast %v2uint %61
; Do math on %70
%86 = OpBitcast %_ptr_PhysicalStorageBuffer_UnalignedVec4 %some_address
We have already explored two key Vulkan extension game changers through this blog and the previous one. The third and final part of this game changer blog series will explore ‘Timeline Semaphores’ and how developers can use this new extension to improve the development experience and enhance their games.
Learn more about the new Vulkan extensions: https://github.com/KhronosGroup/Vulkan-Samples/pulls/hanskristian-work
Message from Hans-Kristian below:
You can cast between types at will. E.g.:
#version 450
#extension GL_EXT_buffer_reference : require
#extension GL_EXT_buffer_reference_uvec2 : require
layout(buffer_reference) readonly buffer AliasType1 { uint v; };
layout(buffer_reference) writeonly buffer AliasType2 { vec4 v; };
layout(push_constant) uniform Registers
{
uvec2 va;
};
void main()
{
// You can cast uvec2 or uint64_t to pointer.
AliasType1 type1 = AliasType1(va);
AliasType2 type2 = AliasType2(va);
// You can cast between pointer types.
AliasType1 type3 = AliasType1(type2);
}
You can even do pointer arithmetic on a uvec2/uint64_t VA and go from there.
Thank you for the answer!
Passing the address directly on a buffer seems to be the better solution anyway.
I'm a bit saddened that the array must be typed. I don't see an easy solution to implement Union Types. I guess I can cast the pointer of the array to the array with the correct type? Or maybe I can overlay all sets of types in a single struct, but then they will overlap (not sure if that's allowed). How would you approach this?
Unfortunately, this is not possible. If you intend to have an array of buffer addresses, you'll need to have an array of pointers stored somewhere, for example a plain SSBO:
layout(buffer_reference) buffer PointerToMyData { float data; };
layout(set = 0, binding = 0) buffer BindlessSSBOs {
PointerToMyData pointers[];
};
Thanks! You explain how to get the pointer of a VkBuffer on the CPU side and then pass it to the shader in a buffer (or push constant).
How do I get a pointer of a buffer that is bound to a descriptor? If I have an array of storage buffers bound via a descriptor, how do I index into one and get its "starting" pointer? So I can then dereference any data. Ideally I want to do bindless storage buffers.