Mali-G76 MC4 Vulkan driver bug

Hello.

I'm researching modern Vulkan on mobile devices.

My test device is a Xiaomi Redmi Note 8 Pro (Android 10, MIUI 12.0.3 Global, Vulkan 1.1.108, Mali-G76 MC4).

Recently I ran into a memory layout problem with a uniform buffer that is visible to the vertex shader of a VkPipeline. I have a GitHub repository with the full source that demonstrates the problem. The direct link to the commit is here.

From an architectural standpoint, the VkPipeline consists of two shader stages: vertex and fragment. A uniform buffer containing an array of structs with proper padding is bound to descriptor set 1, binding 0.

#include "gpgpu_limits.inc"


struct ObjectData
{
    matrix          _localView;
    matrix          _localViewProjection;
    float4          _color0;
    float4          _color1;
    float4          _color2;
    float4          _color3;
};

[[vk::binding ( 0, 1 )]]
cbuffer InstanceData:       register ( b0 )
{
    // sizeof ( ObjectData ) = 192 bytes
    // sizeof ( InstanceData ) = 8064 bytes
    ObjectData      g_instanceData[ PBR_OPAQUE_MAX_INSTANCE_COUNT ];
}

gpgpu_limits.inc looks like this:

#define PBR_OPAQUE_MAX_INSTANCE_COUNT       42U
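For reference, a possible CPU-side mirror of ObjectData is sketched below in C++. It is an assumption about how the C++ side might look, not the code from the repository: the struct definition, WriteInstance and the mapped pointer (standing in for host-visible memory, e.g. from vkMapMemory) are hypothetical. The static_asserts check the sizes quoted in the shader comments (192 bytes per element, 8064 bytes for 42 elements). Because every member is a multiple of 16 bytes, DX cbuffer packing adds no implicit padding between members here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical C++ mirror of the HLSL ObjectData struct.
// Every member size is a multiple of 16 bytes, so no implicit padding is added.
// The element order inside each matrix must match the shader's matrix packing.
struct ObjectData
{
    float _localView[ 16U ];            // matrix, byte offset 0
    float _localViewProjection[ 16U ];  // matrix, byte offset 64
    float _color0[ 4U ];                // float4, byte offset 128
    float _color1[ 4U ];                // float4, byte offset 144
    float _color2[ 4U ];                // float4, byte offset 160
    float _color3[ 4U ];                // float4, byte offset 176
};

constexpr std::uint32_t PBR_OPAQUE_MAX_INSTANCE_COUNT = 42U;

static_assert ( sizeof ( ObjectData ) == 192U, "sizeof ( ObjectData ) must be 192 bytes" );
static_assert ( offsetof ( ObjectData, _localViewProjection ) == 64U, "unexpected matrix offset" );
static_assert ( offsetof ( ObjectData, _color0 ) == 128U, "unexpected _color0 offset" );
static_assert ( offsetof ( ObjectData, _color3 ) == 176U, "unexpected _color3 offset" );
static_assert ( sizeof ( ObjectData ) * PBR_OPAQUE_MAX_INSTANCE_COUNT == 8064U, "sizeof ( InstanceData ) must be 8064 bytes" );

// Copies one instance into the mapped uniform buffer at a 192-byte stride.
// 'mapped' is a hypothetical pointer to host-visible memory.
void WriteInstance ( std::byte* mapped, std::uint32_t index, ObjectData const &data )
{
    assert ( index < PBR_OPAQUE_MAX_INSTANCE_COUNT );
    std::memcpy ( mapped + static_cast<std::size_t> ( index ) * sizeof ( ObjectData ), &data, sizeof ( ObjectData ) );
}
```

If any of these asserts fired, the C++ side and the shader would disagree about the layout before the driver is even involved, so this rules out the application side first.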

I found that at runtime the vertex shader stage reads incorrect values from the uniform buffer when it does not use any of the following members:

float4          _color0;
float4          _color1;
float4          _color2;
float4          _color3;

So I added some otherwise useless code that references those members in the vertex shader. After that I got the expected behavior:

OutputData VS ( in InputData inputData )
{
    OutputData result;

    const ObjectData objectData = g_instanceData[ inputData._instanceIndex ];
    result._vertexH = mul ( objectData._localViewProjection, float4 ( inputData._vertex, 1.0F ) );

    result._uv = (half2)inputData._uv;

    const float3x3 orientation = (float3x3)objectData._localView;

    // Mali-G76 seems to optimize away the _colorX members and break the memory layout if the shader has no references to them.
    // Investigating...
    /*result._normalView = (half3)mul ( orientation, inputData._normal );
    result._tangentView = (half3)mul ( orientation, inputData._tangent );
    result._bitangentView = (half3)mul ( orientation, inputData._bitangent );*/
    result._normalView = (half3)mul ( orientation, inputData._normal ) + (half3)objectData._color0.xyz;
    result._tangentView = (half3)mul ( orientation, inputData._tangent ) + (half3)objectData._color1.xyz;
    result._bitangentView = (half3)mul ( orientation, inputData._bitangent ) + (half3)objectData._color2.xyz + (half3)objectData._color3.xyz;
    result._instanceIndex = inputData._instanceIndex;

    return result;
}

With this in mind, I compared the disassembly of the two SPIR-V versions of the vertex shader:

SPIR-V disassembler (desirable)

SPIR-V disassembler (suboptimal)

A direct comparison shows that both programs declare the correct uniform buffer memory layout.

So my guess is that the Vulkan driver makes its own decisions about unused uniform buffer members at the linking stage and removes those declarations. As a result, the memory layout changes and no longer matches the layout the C++ code uses to fill the buffer.

Note that the _colorX data is used in the fragment shader stage, so those members are definitely not unused in terms of the whole VkPipeline object.
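To make the suspected failure mode concrete, the arithmetic below assumes, hypothetically, that the driver compacts ObjectData to the two matrices only (a 128-byte array stride) while the C++ side keeps writing at the declared 192-byte stride. This only models the hypothesis; nothing here confirms that a driver behaves this way. The constant names, Location, ShaderReadOffset and Locate are illustrative, not from the repository.

```cpp
#include <cstdint>

// Byte stride of ObjectData as declared in HLSL and written by the C++ code.
constexpr std::uint32_t DECLARED_STRIDE = 192U;

// HYPOTHETICAL stride if a driver dropped the four unused float4 _colorX members
// (two 64-byte matrices only). This models the suspicion, not confirmed behavior.
constexpr std::uint32_t COMPACTED_STRIDE = 128U;

// Byte offset of _localViewProjection inside ObjectData (same in both layouts).
constexpr std::uint32_t VIEW_PROJECTION_OFFSET = 64U;

// Where a shader read lands in the buffer the CPU filled with the declared layout.
struct Location
{
    std::uint32_t instance;      // index of the CPU-written ObjectData element
    std::uint32_t memberOffset;  // byte offset inside that element
};

// Byte offset read for g_instanceData[ instance ]._localViewProjection at a given stride.
constexpr std::uint32_t ShaderReadOffset ( std::uint32_t instance, std::uint32_t stride )
{
    return instance * stride + VIEW_PROJECTION_OFFSET;
}

// Maps a byte offset back onto the CPU-side (declared) layout.
constexpr Location Locate ( std::uint32_t byteOffset )
{
    return Location { byteOffset / DECLARED_STRIDE, byteOffset % DECLARED_STRIDE };
}

// With the declared stride every instance reads its own matrix.
static_assert ( Locate ( ShaderReadOffset ( 5U, DECLARED_STRIDE ) ).instance == 5U, "declared stride, instance 5" );

// With the compacted stride, instance 1 lands on instance 1's _localView (offset 0)
// and instance 2 lands on instance 1's _colorX block (offset 128).
static_assert ( Locate ( ShaderReadOffset ( 1U, COMPACTED_STRIDE ) ).memberOffset == 0U, "compacted stride, instance 1" );
static_assert ( Locate ( ShaderReadOffset ( 2U, COMPACTED_STRIDE ) ).instance == 1U, "compacted stride, instance 2" );
static_assert ( Locate ( ShaderReadOffset ( 2U, COMPACTED_STRIDE ) ).memberOffset == 128U, "compacted stride, instance 2" );
```

Under that assumption only instance 0 would read its own matrix (i * 128 + 64 equals i * 192 + 64 only for i = 0); every other instance would pick up bytes from another element or member.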

I made a short video demonstrating the program's behavior in both cases:

Video demonstration

This could affect programming techniques such as the uber uniform buffer. For example, Unreal Engine 4 has an enormous uniform buffer (about 3 KB) whose members are selectively used by different programs, following a fill-once rule.

Would you kindly help with the issue?

Best regards,

Goshido


Additional information:

Vulkan validation layers are ON.

GPU stats and limits: link.

SPIR-V compiler tools: DirectX Shader Compiler v1.5.2005.10152 [official, x64, Windows 10 Pro x64]

Compile flags: -spirv -WX -O3 -fvk-use-dx-layout -enable-16bit-types -T vs_6_6

Android NDK: 21.3.6528147

Direct link to the commit: android-vulkan [GitHub]

  • Hi Goshido -

    Just to say I have been looking at this issue, with the best bug report ever :). Love the video (& the GitHub & decompiled shaders). I'm unaware of any 'expected' bug here, and initial testing of the shader against the driver didn't show up anything (UBO size & indexing still seemed correct), so I'm making a reproducer debug APK to hand to the driver team to look at.

    Can you confirm the driver version on the phone you're testing with? ("chrome://gpu/" in Chrome on the device should show it). We think it is 20 for that device, but it would be good to confirm.

    Thanks,

    Ben
