Hello.
I'm researching modern Vulkan on mobile devices.
The test device is a XIAOMI Redmi Note 8 Pro: Android 10, MIUI 12.0.3 Global, Vulkan 1.1.108, Mali-G76 MC4.
Recently I ran into a memory layout problem in a uniform buffer that is visible to the vertex shader of a VkPipeline. I have a GitHub repository with the full source that demonstrates the problem. A direct link to the commit is here.
From an architecture standpoint, the VkPipeline consists of two shader stages: vertex and fragment. There is a uniform buffer which contains an array of structs with proper padding. The uniform buffer is bound to descriptor set index one, binding point index zero.
#include "gpgpu_limits.inc"

struct ObjectData
{
    matrix _localView;
    matrix _localViewProjection;
    float4 _color0;
    float4 _color1;
    float4 _color2;
    float4 _color3;
};

[[vk::binding ( 0, 1 )]]
cbuffer InstanceData: register ( b0 )
{
    // sizeof ( ObjectData ) = 192 bytes
    // sizeof ( InstanceData ) = 8064 bytes
    ObjectData g_instanceData[ PBR_OPAQUE_MAX_INSTANCE_COUNT ];
}
gpgpu_limits.inc looks like this:
#define PBR_OPAQUE_MAX_INSTANCE_COUNT 42U
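The sizes quoted in the shader comments can be cross-checked on the CPU side. The following is a minimal sketch, not the code from the repository: the C++ type names (Float4, Matrix4x4) and the mirrored struct are assumptions made for illustration, with matrix taken as 16 floats and float4 as 4 floats.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side mirror of the HLSL declarations above.
// All type and member names here are illustrative assumptions.
constexpr std::size_t PBR_OPAQUE_MAX_INSTANCE_COUNT = 42U;

struct Float4
{
    float v[ 4U ];      // 16 bytes
};

struct Matrix4x4
{
    float m[ 16U ];     // 64 bytes
};

struct ObjectData
{
    Matrix4x4 localView;
    Matrix4x4 localViewProjection;
    Float4 color0;
    Float4 color1;
    Float4 color2;
    Float4 color3;
};

struct InstanceData
{
    ObjectData instanceData[ PBR_OPAQUE_MAX_INSTANCE_COUNT ];
};

// Compile-time checks against the numbers in the shader comments.
static_assert ( sizeof ( ObjectData ) == 192U, "ObjectData must be 192 bytes" );
static_assert ( offsetof ( ObjectData, color0 ) == 128U, "colors must start after the two matrices" );
static_assert ( sizeof ( InstanceData ) == 8064U, "InstanceData must be 8064 bytes" );
```

With checks like these, a mismatch between the C++ struct and the declared shader layout fails at compile time instead of showing up as wrong rendering.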
It turned out that at runtime the vertex shader stage reads incorrect data from the uniform buffer when the shader makes no use of the following members:
float4 _color0;
float4 _color1;
float4 _color2;
float4 _color3;
So I added some otherwise useless code to the vertex shader module that references those members. After that I got the expected behavior.
OutputData VS ( in InputData inputData )
{
    OutputData result;

    const ObjectData objectData = g_instanceData[ inputData._instanceIndex ];
    result._vertexH = mul ( objectData._localViewProjection, float4 ( inputData._vertex, 1.0F ) );
    result._uv = (half2)inputData._uv;

    const float3x3 orientation = (float3x3)objectData._localView;

    // MALI-G76 optimizes _colorX members and breaks memory layout if no any references to those members in the shader.
    // Investigating...
    /*result._normalView = (half3)mul ( orientation, inputData._normal );
    result._tangentView = (half3)mul ( orientation, inputData._tangent );
    result._bitangentView = (half3)mul ( orientation, inputData._bitangent );*/

    result._normalView = (half3)mul ( orientation, inputData._normal ) + (half3)objectData._color0.xyz;
    result._tangentView = (half3)mul ( orientation, inputData._tangent ) + (half3)objectData._color1.xyz;
    result._bitangentView = (half3)mul ( orientation, inputData._bitangent ) + (half3)objectData._color2.xyz + (half3)objectData._color3.xyz;

    result._instanceIndex = inputData._instanceIndex;

    return result;
}
With this in mind I compared the disassembled code of the two SPIR-V versions of the vertex shader:
SPIR-V disassembler (desirable)
SPIR-V disassembler (suboptimal)
A direct comparison shows that both programs contain the correct uniform buffer memory layout.
So my guess is that the Vulkan driver makes some decisions about unused uniform buffer members at the linking stage and removes those declarations. As a result the memory layout changes and no longer matches what the C++ code writes.
Note that the _colorX data is used in the fragment shader stage, so those members are definitely not unused in terms of the whole VkPipeline object.
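To make the suspected failure mode concrete, here is a small sketch of the arithmetic behind that guess. It only illustrates the hypothesis and is not confirmed driver behavior: the "stripped" stride assumes the four _colorX members are dropped entirely, and every constant and function name below is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t MATRIX_SIZE = 64U;    // one matrix: 16 floats
constexpr std::size_t FLOAT4_SIZE = 16U;    // one float4: 4 floats

// Stride the application writes with: two matrices plus four float4 colors.
constexpr std::size_t DeclaredStride ()
{
    return 2U * MATRIX_SIZE + 4U * FLOAT4_SIZE;
}

// ASSUMPTION: stride the shader would read with if the four colors were removed.
constexpr std::size_t StrippedStride ()
{
    return 2U * MATRIX_SIZE;
}

// Byte offset of the array element with the given index for a given stride.
constexpr std::size_t ElementOffset ( std::size_t stride, std::size_t index )
{
    return stride * index;
}

// Element 0 starts at the same place in both layouts, later elements do not.
static_assert ( ElementOffset ( DeclaredStride (), 0U ) == ElementOffset ( StrippedStride (), 0U ), "element 0 matches" );
static_assert ( ElementOffset ( DeclaredStride (), 1U ) != ElementOffset ( StrippedStride (), 1U ), "element 1 differs" );
```

Under this assumption, only the instance with index zero would read the matrices the application wrote. Whether the driver actually behaves this way is exactly what needs investigating.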
I made a short video demonstrating the behavior of the program in both cases:
Video demonstration
This could affect programming techniques such as the "uber" uniform buffer. For example, Unreal Engine 4 has an enormous uniform buffer (about 3 kBytes) whose members are used selectively by different programs. The rule there is to fill the buffer once.
Would you kindly help with the issue?
Best regards,
Goshido
Additional information:
Vulkan validation layers are ON.
GPU stats and limits: link.
SPIR-V compiler tools: DirectX Shader Compiler v1.5.2005.10152 [official, x64, Windows 10 Pro x64]
Compile flags: -spirv -WX -O3 -fvk-use-dx-layout -enable-16bit-types -T vs_6_6
Android NDK: 21.3.6528147
Direct link to the commit: android-vulkan [Github]
Hi Goshido -
Just to say I have been looking at this issue, with the best bug report ever :). Love the video (& the github & decompiled shaders). I'm unaware of any 'expected' bug here, and initial testing of the shader against the driver didn't show up anything (UBO size & indexing still seemed correct), so I'm making a reproducer debug apk to hand to the driver team to look at.
Can you confirm the driver version on the phone you're testing with? ("chrome://gpu/" in chrome on the device should show it). We think it is 20 for that device, but would be good to confirm.
Thanks,
Ben
Hello Ben Clark!
Thank you for the response.
"Can you confirm the driver version on the phone you're testing with?"
Sure. The direct link to the device's specs is here.
I look forward to hearing from you.
From investigations today it looks like it's fixed in the latest driver for the latest phones, if that's good news. From other tests it would appear to have been fixed fairly recently, though (it's been reproduced on r19 and r25 of the driver, as well as your r20). And as your driver won't be updated, I guess it won't help in your case.
Decompiling the SPIR-V and recompiling with a different compiler works, so it would appear to be something about the way DirectX Shader Compiler encodes the SPIR-V that causes the driver issues. So one possible workaround is trying a different compiler. And you've already come up with another workaround.
We'll keep investigating to find what the fix was and if there's a better workaround I'll get back to you with what can be done to avoid this.
Thank you very much for this report - both the fabulous quality of it, but also we do like to know about these issues to address them.
Understood.
Ben Clark, would you kindly share the link to the official Mali driver bug tracker? It would be great if I could check for similar problems before bothering you guys with my stupid questions ;)
Do you have such a resource?
Unfortunately a public driver bug tracker doesn't exist. It is a good suggestion that has been under discussion though.
Your questions definitely are not stupid!