
Draw call fails on Mali GPUs when the index buffer contains a large index value diff

We are currently developing a new rendering technique that needs to store non-trivial index values in the index buffer and later use gl_VertexID in the vertex shader to fetch vertex attributes manually (not using gl_VertexID directly, of course, but after some calculation).

On the devices I have (Kirin 960/970/980 SoCs, featuring Mali G71 and G76 GPUs), both vkDrawIndexedIndirect and vkDrawIndexed fail if there is a large diff between some indices in the index buffer. For example, an index buffer containing [0, 1, 2, 0xb8fffa, 0xb8fffb, 0xb8fffc] fails. I also tested the same APK on several Adreno devices and did not see any problems there.

More specifically, regarding the conditions needed to reproduce the problem, I found that:

(1) About the "index diff value": a smaller diff is enough to trigger the bug with vkDrawIndexed, while a larger one is required with vkDrawIndexedIndirect. For vkDrawIndexedIndirect, I found the threshold to be 0xb8fff8 on the Kirin 980 (G76).

(2) Only the diff matters: adding a large value to all indices does not trigger the bug. I therefore suspect that the hardware applies some compression technique to the index list, assuming the diff will never be very large because the maximum vertex buffer size is bounded, and overlooking that someone may be fetching vertices manually.

About the demo:

It draws two triangles covering the same area; the triangle positions are derived entirely from the vertex ID. On my Huawei phone with a Mali G76, it works with the index buffer [0, 1, 2, 0xb8fff9, 0xb8fffa, 0xb8fffb], but does not render the triangles when the index buffer is [0, 1, 2, 0xb8fffa, 0xb8fffb, 0xb8fffc].

OK on G76 with an index diff of 0xb8fff7

Render bug on G76 with an index diff of 0xb8fff8

Adreno devices are OK with an index diff of 0xb8fff8
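
To make the "index diff" in the results above concrete: the quoted values match the gap between the last index of the first triangle (2) and the first index of the second triangle. Below is a minimal Python sketch, assuming that "diff" means the largest gap between neighbouring entries in the index buffer. The helper name `max_adjacent_gap` is made up for illustration, and this is only a CPU-side check, not a description of what the hardware does.

```python
def max_adjacent_gap(indices):
    """Largest absolute difference between neighbouring index values."""
    return max((abs(b - a) for a, b in zip(indices, indices[1:])), default=0)

# The two index buffers from the demo description.
ok_buffer = [0, 1, 2, 0xB8FFF9, 0xB8FFFA, 0xB8FFFB]
bad_buffer = [0, 1, 2, 0xB8FFFA, 0xB8FFFB, 0xB8FFFC]

print(hex(max_adjacent_gap(ok_buffer)))   # 0xb8fff7
print(hex(max_adjacent_gap(bad_buffer)))  # 0xb8fff8
```

A check like this could be run on generated index buffers to flag ones that reach the observed threshold, although the threshold itself was only measured on the devices listed above.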

So my questions are:

(1) Is this a hardware issue or a software driver issue? More specifically, can a driver patch fix it?

(2) If this is a hardware issue, what is the first GPU generation that is free from this bug?

(3) How can I work around this? For example, if a large index diff is the cause, is there a safe index value range, so that I can avoid the bug by keeping all index values within it?

  • Since uploading the test APK failed several times, I will share it through an external site. Please tell me if the link is no longer available.

    link:pan.baidu.com/.../19nx8iF1CNLe6xb_WRYR7mA
    password:j3fr

  • Hi lxdeng, 

    Older Mali GPUs assume the index is a simple index, so encoding property bits in high index values is liable to cause problems.

    Mali-G77 onwards should have better behavior here as long as you are not using geometry/tessellation shaders, but you may have a significant vertex overshading rate if the additional property bit values vary at high frequency. This is because Mali shades vertices in groups of 4 contiguous indices, so if a group of 4 verts have different properties the vertex data will get shaded 4 times. 

    The best workaround on old GPUs is to keep the index buffer simple, converting any high-bit properties into a normal vertex attribute, e.g. packed into a GL_UNSIGNED_BYTE.

    Kind regards,
    Pete
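
The workaround described in the reply above could be sketched on the CPU side as follows. This is a minimal Python sketch, not Arm's code. It assumes, hypothetically, that the low 16 bits of each packed index hold the real vertex index and that the bits above hold a property value that fits in one byte. `VERTEX_BITS` and `split_packed_indices` are made-up names. Each distinct (vertex, property) pair becomes one entry in a new per-vertex stream, so the index buffer ends up holding only small, dense values.

```python
VERTEX_BITS = 16  # assumed split: low 16 bits = vertex index, rest = property
VERTEX_MASK = (1 << VERTEX_BITS) - 1

def split_packed_indices(packed):
    """Return (simple_indices, source_vertices, properties).

    source_vertices[i] is the original vertex to copy attributes from and
    properties[i] is the byte-sized property to store as a vertex attribute.
    """
    remap = {}
    simple_indices, source_vertices, properties = [], [], []
    for value in packed:
        key = (value & VERTEX_MASK, value >> VERTEX_BITS)
        if key not in remap:
            if key[1] > 0xFF:
                raise ValueError("property does not fit in one byte")
            remap[key] = len(source_vertices)
            source_vertices.append(key[0])
            properties.append(key[1])
        simple_indices.append(remap[key])
    return simple_indices, source_vertices, properties

# Two triangles over the same three vertices, the second tagged with property 3.
packed = [0, 1, 2, (3 << VERTEX_BITS) | 0, (3 << VERTEX_BITS) | 1, (3 << VERTEX_BITS) | 2]
indices, sources, props = split_packed_indices(packed)
print(indices)  # [0, 1, 2, 3, 4, 5]
print(sources)  # [0, 1, 2, 0, 1, 2]
print(props)    # [0, 0, 0, 3, 3, 3]
```

The trade-off in this sketch is that a vertex used with several property values is duplicated once per value, which costs vertex buffer memory in exchange for a simple index buffer.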

  • Thank you for such a quick and informative answer!

    Would you kindly enlighten me on some additional points?

    (1) Under what conditions does the GPU consider verts to have different properties (so that vertex overshading happens)? Will this overshading happen even if my VS contains no dynamic branches at all? After all, in most situations 4 adjacent indices are supposed to point to at least 3 verts with different properties; otherwise there is a degenerate triangle.

    (2) Is this overshading behavior caused solely by the index value? Can some pattern of varying vertex data in the vertex buffer also cause vertex overshading?

    (3) Will my intention to use the high bits of the index value to encode attributes affect the performance of the vertex post-transform cache? How does this vertex overshading behavior interact with the post-transform cache?

    (4) What we are actually doing is encoding an instance ID into the high bits of the index value; the index buffer is dynamically generated after cluster culling. We need a way to identify which instance each vertex belongs to, so that we can use the encoded instance ID to look up the world matrix. I thought about using gl_PrimitiveID instead, but it is known to cause more trouble, such as not being supported on older devices or invalidating the vertex post-transform cache. Any suggestions?
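
For readers following along, the encoding described in (4) can be modelled like this. It is a minimal Python sketch, assuming a hypothetical split of 20 low bits for the vertex index and the remaining high bits of a 32-bit index for the instance ID. The thread does not state the actual bit layout, and in the real technique the decode step would run in the vertex shader on gl_VertexID.

```python
INDEX_BITS = 20  # assumed; the thread does not give the real layout
INDEX_MASK = (1 << INDEX_BITS) - 1

def encode(instance_id, vertex_index):
    """Pack an instance ID into the high bits of a 32-bit index value."""
    if not 0 <= vertex_index <= INDEX_MASK:
        raise ValueError("vertex index out of range")
    if not 0 <= instance_id < (1 << (32 - INDEX_BITS)):
        raise ValueError("instance ID out of range")
    return (instance_id << INDEX_BITS) | vertex_index

def decode(packed):
    """Recover (instance_id, vertex_index), as the shader would from the vertex ID."""
    return packed >> INDEX_BITS, packed & INDEX_MASK

packed = encode(11, 5)
print(hex(packed))     # 0xb00005
print(decode(packed))  # (11, 5)
```

With a layout like this, any index buffer that mixes instance 0 with a higher instance ID contains large jumps between neighbouring index values, which is the situation the original post reports as failing on G71/G76.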

  • Hi lxdeng,

    (2) Is this overshading behavior caused solely by the index value? Can some pattern of varying vertex data in the vertex buffer also cause vertex overshading?

    Yes, it's just related to the index value, not the shader.

    The GPU "just" sees indices; it doesn't know about the additional encoding that you layer on top. Indices are always shaded in groups of 4 sequential index values, so if your index buffer contains (0, 10, 20), for example, 12 vertices would get shaded (0-3, 8-11, 20-23).

    If you are encoding the instance in the high bits, then you just need to try to keep each group of 4 "real index values" within the same instance, so that the shaded data in each group of 4 is actually useful. I think you will avoid most over-shading as long as you keep the clusters for each instance mostly contiguous in memory. Adjacent verts are highly likely to have the same instance ID, so you shouldn't get much over-shading.

    (3) Will my intention to use the high bits of the index value to encode attributes affect the performance of the vertex post-transform cache? How does this vertex overshading behavior interact with the post-transform cache?

    No, I don't think so. It should be no worse than normal indexing, which has unique per-instance entries in the post-transform cache already. 

    I thought about using gl_PrimitiveID instead,

    I've not tried this on recent Mali hardware, but I'm not aware of any issues. I think it will have some bandwidth cost (equivalent to writing out the primitive index to a vertex shader output attribute), but I'll need to check.

     HTH, 
    Pete
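
The grouping rule described in the reply above can be checked with a small CPU-side model. This is a minimal Python sketch, assuming (as the 0-3, 8-11, 20-23 example implies) that groups are aligned to multiples of 4. `shaded_groups` is a made-up helper, the 20-bit shift in the second example is a hypothetical instance ID position, and none of this describes the actual hardware logic.

```python
def shaded_groups(indices, group_size=4):
    """Return the sorted (first, last) index ranges that would be shaded."""
    starts = sorted({(i // group_size) * group_size for i in indices})
    return [(s, s + group_size - 1) for s in starts]

groups = shaded_groups([0, 10, 20])
print(groups)           # [(0, 3), (8, 11), (20, 23)]
print(4 * len(groups))  # 12

# Four useful vertices split across two instances (instance ID in bit 20 and up):
mixed = [0, 1, (1 << 20) | 2, (1 << 20) | 3]
print(4 * len(shaded_groups(mixed)))       # 8
print(4 * len(shaded_groups([0, 1, 2, 3])))  # 4
```

In this model, the second example shades 8 vertices for 4 useful ones because the two instances land in different groups, whereas keeping all four index values in one instance fills a single group.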