I am researching modern Vulkan on mobile devices.
The test device is a XIAOMI Redmi Note 8 Pro: Android 10, MIUI 12.0.3 Global, Vulkan 1.1.108, Mali-G76 MC4.
I recently ran into a memory layout problem with a uniform buffer that is visible to the vertex shader of a VkPipeline. I have a GitHub repository with the full source that demonstrates the problem. Direct link to the commit is here.
Architecturally, the VkPipeline consists of two shader stages: vertex and fragment. A uniform buffer contains an array of structs with proper padding. The uniform buffer is bound to descriptor set 1 at binding 0.
[[vk::binding ( 0, 1 )]]
cbuffer InstanceData: register ( b0 )
{
    // sizeof ( ObjectData ) = 192 bytes
    // sizeof ( InstanceData ) = 8064 bytes
    ObjectData g_instanceData[ PBR_OPAQUE_MAX_INSTANCE_COUNT ];
};
gpgpu_limits.inc looks like this:
#define PBR_OPAQUE_MAX_INSTANCE_COUNT 42U
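To make the size comments above checkable on the C++ side, here is a minimal sketch of a CPU-side mirror of the buffer. The member order and member types of ObjectData are my assumption for illustration (two 4x4 float matrices followed by four 16-byte color slots); only the names _localView, _localViewProjection and _color0 to _color3, and the sizes 192 and 8064 bytes, come from the shader. The Float4 and Float4x4 helper types are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t PBR_OPAQUE_MAX_INSTANCE_COUNT = 42U;

// Hypothetical helper types: plain float storage, 16 and 64 bytes.
struct Float4
{
    float _data[ 4U ];
};

struct Float4x4
{
    float _data[ 16U ];
};

// Assumed layout: 64 + 64 + 4 * 16 = 192 bytes.
// The real struct may order or pad its members differently.
struct ObjectData
{
    Float4x4 _localView;
    Float4x4 _localViewProjection;
    Float4 _color0;
    Float4 _color1;
    Float4 _color2;
    Float4 _color3;
};

struct InstanceData
{
    ObjectData _instanceData[ PBR_OPAQUE_MAX_INSTANCE_COUNT ];
};

// Compile-time checks against the sizes quoted in the shader comments.
static_assert ( sizeof ( ObjectData ) == 192U, "ObjectData must be 192 bytes" );
static_assert ( sizeof ( InstanceData ) == 8064U, "InstanceData must be 8064 bytes" );
static_assert ( offsetof ( ObjectData, _color0 ) == 128U, "colors start after both matrices" );
```

With checks like these, any mismatch between the C++ struct and the declared shader layout fails at compile time, which helps rule out the host side as the source of the problem.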
I found that at runtime the vertex shader reads incorrect data from the uniform buffer when nothing in the vertex shader references the _colorX members of ObjectData (_color0 through _color3).
So I tried adding some otherwise useless code to the vertex shader module that references those members. After that I got the expected behavior.
OutputData VS ( in InputData inputData )
{
    OutputData result;
    const ObjectData objectData = g_instanceData[ inputData._instanceIndex ];
    result._vertexH = mul ( objectData._localViewProjection, float4 ( inputData._vertex, 1.0F ) );
    result._uv = (half2)inputData._uv;
    const float3x3 orientation = (float3x3)objectData._localView;
    // Mali-G76 optimizes out the _colorX members and breaks the memory layout if the shader has no references to them.
    // Original code, which triggers the problem:
    /*result._normalView = (half3)mul ( orientation, inputData._normal );
    result._tangentView = (half3)mul ( orientation, inputData._tangent );
    result._bitangentView = (half3)mul ( orientation, inputData._bitangent );*/
    // Workaround: reference every _colorX member so that none of them is considered unused.
    result._normalView = (half3)mul ( orientation, inputData._normal ) + (half3)objectData._color0.xyz;
    result._tangentView = (half3)mul ( orientation, inputData._tangent ) + (half3)objectData._color1.xyz;
    result._bitangentView = (half3)mul ( orientation, inputData._bitangent ) + (half3)objectData._color2.xyz + (half3)objectData._color3.xyz;
    result._instanceIndex = inputData._instanceIndex;
    return result;
}
With this in mind, I compared the SPIR-V disassembly of the two versions of the vertex shader:
SPIR-V disassembler (desirable)
SPIR-V disassembler (suboptimal)
A direct comparison shows that both programs declare the correct uniform buffer memory layout.
So my guess is that the Vulkan driver makes decisions about unused uniform buffer members at the linking stage and removes their declarations. As a result the memory layout changes and no longer matches what the C++ code writes.
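To illustrate why removing members would corrupt reads, here is a small sketch of the arithmetic behind this guess. It assumes, purely for illustration, that each _colorX member occupies 16 bytes and sits at the end of ObjectData, so that stripping all four would shrink the element stride from 192 to 128 bytes. The names FULL_STRIDE, STRIPPED_STRIDE and ElementOffset are hypothetical; this is not a description of what the driver actually does.

```cpp
#include <cstddef>

// Element stride that the C++ code uses when filling the buffer (from the shader comments).
constexpr std::size_t FULL_STRIDE = 192U;

// Hypothetical stride if the four color members (assumed 16 bytes each) were stripped.
constexpr std::size_t STRIPPED_STRIDE = FULL_STRIDE - 4U * 16U;

// Byte offset of the array element with the given index for a given stride.
constexpr std::size_t ElementOffset ( std::size_t instanceIndex, std::size_t stride )
{
    return instanceIndex * stride;
}

// The two layouts agree only for the first element.
static_assert ( ElementOffset ( 0U, FULL_STRIDE ) == ElementOffset ( 0U, STRIPPED_STRIDE ), "index 0 matches" );
static_assert ( ElementOffset ( 1U, FULL_STRIDE ) != ElementOffset ( 1U, STRIPPED_STRIDE ), "index 1 differs" );
```

Under these assumptions the C++ code writes instance 1 at byte 192 while the shader would fetch it from byte 128, which lands inside the color data of instance 0. That would be consistent with the first instance looking right and the others reading wrong matrices, but I have not confirmed that this is the actual mechanism.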
Note that the _colorX data is used in the fragment shader stage, so those members are definitely not unused in terms of the whole VkPipeline object.
I made a short video demonstrating the behavior of the program in both cases:
This could affect programming techniques such as an uber uniform buffer. For example, Unreal Engine 4 has an enormous uniform buffer (about 3 KB) whose members are used selectively by different programs. The buffer is filled once.
Would you kindly help with the issue?
Vulkan validation layers are ON.
GPU stats and limits: link.
SPIR-V compiler tools: DirectX Shader Compiler v1.5.2005.10152 [official, x64, Windows 10 Pro x64]
Compile flags: -spirv -WX -O3 -fvk-use-dx-layout -enable-16bit-types -T vs_6_6
Android NDK: 21.3.6528147
Direct link to the commit: android-vulkan [Github]