After one year of development, the Vulkan best practices have seen massive change, going from an idea in the heads of our engineering team to an official donation to Khronos. The best practices receive roughly 4,000 visitors a week, and we get a lot of great feedback and questions. The following is a collection of the most frequently asked questions. For our series on Vulkan best practices, please see:
Changing a uniform while the GPU is using it is dangerous from a synchronization standpoint: you cannot know if the GPU will read the data before or after the update, so the behavior of your app would be inconsistent.
The spec says:
The descriptor set contents bound by a call to vkCmdBindDescriptorSets may be consumed during host execution of the command. This can also happen during shader execution of the resulting draws, or anytime in between. Thus, the contents must not be altered (overwritten by an update command, or freed) between when the command is recorded and when the command completes executing on the queue. The contents of pDynamicOffsets are consumed immediately during execution of vkCmdBindDescriptorSets. Once all pending uses have completed, it is legal to update and reuse a descriptor set.
If you want to change the uniform buffer data across frames without breaking synchronization, you will have to replicate that data in some way. One way to do so without major changes to your code is to create a larger uniform buffer (for example, 3x the size for 3 frames) and bind it as a dynamic uniform buffer. You then change the dynamic offset each frame, so that every frame in flight reads from its own region of the buffer.
Since you cannot update a part of a buffer that is in use, pipeline barriers will not help. If you have a single buffer, the update on the CPU side has to wait for the GPU to finish using the buffer, so you would end up serializing frames.
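The per-frame offset described above can be sketched as follows. This is a minimal illustration, not code from the samples: the UniformRing type and its member names are hypothetical, and slice_stride is assumed to be already padded to the device's dynamic offset alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: one large uniform buffer split into equally sized
// slices, one slice per frame in flight.
struct UniformRing {
    std::uint32_t slice_stride;      // bytes reserved per frame (assumed already aligned)
    std::uint32_t frames_in_flight;  // for example 3

    // Total size to request when creating the buffer.
    std::uint64_t total_size() const {
        return static_cast<std::uint64_t>(slice_stride) * frames_in_flight;
    }

    // Dynamic offset to use when binding the descriptor set for this frame.
    std::uint32_t dynamic_offset(std::uint64_t frame_number) const {
        assert(frames_in_flight > 0);
        return static_cast<std::uint32_t>(frame_number % frames_in_flight) * slice_stride;
    }
};
```

Each frame, the CPU would write its data at the returned offset inside the mapped buffer and pass the same value through pDynamicOffsets. The app still has to wait for the frame that last used a slice to finish before overwriting that slice.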
Allocating memory for each buffer via vkAllocateMemory can be really slow, and there is a cap on the total number of allocations. Mapping memory via vkMapMemory is also a costly operation. The intended usage is for an app to allocate a big chunk of memory, keep it mapped, and manage it itself.
If you want a drop-in replacement for memory management which follows these best practices, check out VMA. Its API is similar to Vulkan's so it probably will not require any major changes to your code.
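To illustrate what "allocate a big chunk and manage it" can look like, here is a minimal sketch of a linear sub-allocator that only hands out offsets. The class and the kNoSpace sentinel are hypothetical names, and this is far simpler than what VMA does; it assumes the chunk itself comes from a single vkAllocateMemory call made elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel returned when the chunk cannot fit the request (hypothetical name).
constexpr std::uint64_t kNoSpace = ~std::uint64_t{0};

// Hypothetical linear sub-allocator over one large device allocation.
// It tracks offsets only and never touches Vulkan itself.
class LinearSubAllocator {
public:
    explicit LinearSubAllocator(std::uint64_t capacity) : capacity_(capacity) {}

    // Returns the offset of a block of `size` bytes aligned to `alignment`
    // (a power of two), or kNoSpace if the chunk is full.
    std::uint64_t allocate(std::uint64_t size, std::uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uint64_t start = (head_ + alignment - 1) & ~(alignment - 1);
        if (start > capacity_ || size > capacity_ - start) {
            return kNoSpace;
        }
        head_ = start + size;
        return start;
    }

    // Releases every block at once, for example when a scene is unloaded.
    void reset() { head_ = 0; }

private:
    std::uint64_t capacity_;
    std::uint64_t head_ = 0;
};
```

The size and alignment arguments would typically come from the VkMemoryRequirements of each buffer or image, and the returned offset would be passed as the memory offset when binding the resource to the shared allocation.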
// Set of commands - A
vkCmdDraw(...)
...
vkCmdDraw(...)

// Barrier 1
vkCmdPipelineBarrier(...)

// Set of commands - B
vkCmdDraw(...)
...
vkQueueSubmit(...)
vkQueuePresentKHR(...)

// Barrier 2
vkCmdPipelineBarrier(...)

// Set of commands - C
vkCmdDraw(...)
...
vkQueueSubmit(...)
vkQueuePresentKHR(...)
A pipeline barrier always acts on two sets of commands, those which come before the barrier and those which come after.
Since you do not mention render passes, we assume that the calls to vkCmdPipelineBarrier are outside of a render pass instance. The spec says:
If vkCmdPipelineBarrier is called outside a render pass instance, then the first set of commands is all prior commands submitted to the queue and recorded in the command buffer. The second set of commands is all subsequent commands recorded in the command buffer and submitted to the queue.
The main difference between the two barriers is that the first one is in the middle of a command buffer. The second one is after the first commands are submitted and presented (so it is likely to be in another command buffer). This difference does not really matter according to the spec, because commands previously submitted and previously recorded in the current command buffer are treated the same way.
This is a breakdown of the 2 barriers:
Using one descriptor pool per frame is not strictly necessary, but it is still very good to have. If you create your descriptor pool without the VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT flag, you cannot free individual descriptor sets; you can only release them all at once by resetting the pool via vkResetDescriptorPool. If you use a single pool for all frames, you have to wait idle before resetting it. If you use several descriptor pools instead, you can reset the ones belonging to frames that are no longer in flight.
Avoiding the FREE_DESCRIPTOR_SET_BIT flag can let the driver use a simpler allocator, ultimately improving performance.
You can also check out our blog on descriptor management for more information. If you are performing multithreaded rendering, you may need to allocate more descriptor pools, as discussed in the tutorial on multithreading.
If you don't specify any synchronization, there is a concurrency risk. You have no guarantee that the transfer will be complete when the rendering begins. You could add a pipeline barrier between the transfer and the shader stage in which you are going to use the image. You need a pipeline barrier for the layout transition anyway.
If you are uploading many textures at once, for example when loading a new scene, it might be easier to submit all the transfers and wait idle.
If it is the fence you passed to vkQueueSubmit, then yes: once it is signaled, the commands in that submission have finished executing.
Actually, it means even more than that. If the fence is signaled, all commands from all previous submissions to that queue have finished executing as well:
When a fence is submitted to a queue as part of a queue submission command, it defines a memory dependency on the batches that were submitted as part of that command. This defines a fence signal operation which sets the fence to the signaled state.
The first synchronization scope includes every batch submitted in the same queue submission command. Fence signal operations that are defined by vkQueueSubmit additionally include in the first synchronization scope all commands that occur earlier in submission order.
Uniform buffer alignment is not straightforward due to structure packing rules: a struct in C++ will not match a struct in GLSL unless you lay them out carefully. You can find more information on the std140 packing here, which applies both to uniform buffers and push constants. Debugging it can be hard: if you are lucky, the validation layers will complain about offsets you were not expecting; otherwise you will just see odd values being passed to the shaders.
The golden rule is that struct and array elements must be aligned as multiples of 16 bytes (the size of a vec4). Thus:
Dynamic uniform buffers have an additional alignment requirement for the dynamic offset. You might need to further pad your uniform buffer data so that each offset is an exact multiple of that limit. You can query the limit as minUniformBufferOffsetAlignment in the limits member of VkPhysicalDeviceProperties, with common values ranging between 16 bytes and 256 bytes.
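As an illustration of both alignment rules, the sketch below mirrors a made-up std140 uniform block in C++ and pads its size for use with dynamic offsets. The Light block, the LightUniform and PaddedFloat structs, and the align_up helper are all hypothetical; the static_assert lines check the offsets at compile time so a mismatch is caught before it reaches the shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// C++ mirror of a hypothetical GLSL block declared with std140:
//   layout(std140, binding = 0) uniform Light {
//       vec3  position;    // offset  0 (a vec3 is aligned like a vec4)
//       float intensity;   // offset 12 (a scalar can fill the vec3's tail)
//       vec4  color;       // offset 16
//       float falloff[2];  // offset 32, each array element takes 16 bytes
//   };
struct PaddedFloat {
    alignas(16) float value;  // pads every array element to 16 bytes
};

struct LightUniform {
    alignas(16) float position[3];
    float intensity;
    alignas(16) float color[4];
    PaddedFloat falloff[2];
};

static_assert(offsetof(LightUniform, intensity) == 12, "intensity must follow the vec3");
static_assert(offsetof(LightUniform, color) == 16, "vec4 must start on a 16-byte boundary");
static_assert(offsetof(LightUniform, falloff) == 32, "array must start on a 16-byte boundary");
static_assert(sizeof(LightUniform) == 64, "array elements must have a 16-byte stride");

// Rounds `size` up to the next multiple of `alignment` (a power of two),
// for example the device's minUniformBufferOffsetAlignment.
inline std::uint64_t align_up(std::uint64_t size, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

With a limit of 256 bytes, align_up(sizeof(LightUniform), 256) gives the stride to use between consecutive copies of the block in a dynamic uniform buffer.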
Your app may be running out of memory. Look for a message like this in logcat:
07-13 17:10:37.788 19132 19132 V threaded_app: LowMemory: 0x7926307ec0
If you are running out of memory, debugging the app in Android Studio Profiler may help. It lets you track the memory usage of your app and may let you trace it down to individual allocations.
A first approach to shader variants is to use #ifdef directives in your shaders, like in this one. You can then compile different variants by running glslangValidator with the -D option, like this:
%VULKAN_SDK%\bin\glslangValidator.exe -V pbr.vert -o variants\pbr_vert_.spv
%VULKAN_SDK%\bin\glslangValidator.exe -V pbr.vert -o variants\pbr_vert_N.spv -DHAS_NORMALS
%VULKAN_SDK%\bin\glslangValidator.exe -V pbr.vert -o variants\pbr_vert_T.spv -DHAS_TANGENTS
%VULKAN_SDK%\bin\glslangValidator.exe -V pbr.vert -o variants\pbr_vert_NT.spv -DHAS_NORMALS -DHAS_TANGENTS
This can be done either offline, when you build your app, or at runtime, by building glslang along with your app.
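Since the number of variants doubles with every optional feature, it can help to generate the command lines instead of writing them by hand. The following is a minimal sketch under stated assumptions: the Feature struct and variant_commands function are hypothetical, glslangValidator is assumed to be on the PATH, and the file naming simply follows the pattern used above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one optional shader feature.
struct Feature {
    char suffix;         // letter appended to the output file name, e.g. 'N'
    std::string define;  // macro passed with -D, e.g. "HAS_NORMALS"
};

// Builds one glslangValidator command line per combination of features.
std::vector<std::string> variant_commands(const std::string& source,
                                          const std::string& out_prefix,
                                          const std::vector<Feature>& features) {
    std::vector<std::string> commands;
    const std::size_t count = std::size_t{1} << features.size();
    for (std::size_t mask = 0; mask < count; ++mask) {
        std::string name = out_prefix;
        std::string defines;
        // Bit i of the mask decides whether feature i is enabled.
        for (std::size_t i = 0; i < features.size(); ++i) {
            if (mask & (std::size_t{1} << i)) {
                name += features[i].suffix;
                defines += " -D" + features[i].define;
            }
        }
        commands.push_back("glslangValidator -V " + source + " -o " + name + ".spv" + defines);
    }
    return commands;
}
```

Calling it with the two features 'N' (HAS_NORMALS) and 'T' (HAS_TANGENTS) yields four command lines, one per combination, which a build script could then execute.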
A different approach to shader variants is to use specialization constants. They are efficient because they are still compile-time constants, specified at pipeline creation time, and you don't need to compile separate variants with glslangValidator or shaderc. Specialization constants do have some limitations, however. The main one is that you can't use if statements while defining your shader's interface, such as vertex attributes or texture samplers:
// valid GLSL
#ifdef HAS_BASECOLORMAP
layout(binding = 0) uniform texture2D baseColorT;
#endif

// invalid GLSL
if (specialization_constant) {
    layout(binding = 0) uniform texture2D baseColorT;
}
So the interface for your shaders will be fixed, but you can use if statements based on specialization constants in your main() function. These will then be evaluated at compile time just like #define. Even if you cannot modify the shader interface variables, the compiler may optimize out the ones you do not need, if you remove all references to them.
We encourage you to check out the project on the Vulkan Mobile Best Practice GitHub page and try the samples for yourself. The tutorials have just been donated to The Khronos Group. The sample code gives developers on-screen controls to demonstrate multiple ways of using each feature. It also shows the performance impact of the different approaches through real-time hardware counters on the display. You are also warmly invited to contribute to the project by providing feedback and fixes and by creating additional samples.
Vulkan Best Practices: https://github.com/KhronosGroup/Vulkan-Samples