
Compute shader bug on Mali GPU

As shown in the figure above, the same compute shader runs correctly on a Redmi RMX2072 (Adreno) but incorrectly on a Huawei LIO-AN00 (Arm Mali-G76). The purpose of this shader is to calculate a globally unique, sequentially increasing index, namely v2.x, from gl_GlobalInvocationID.xy. After some tests, we have not found any Arm Mali device on which it runs correctly.
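For readers without the attachments, here is a minimal Python sketch of the kind of index described above. It assumes a row-major flattening of the 2D invocation ID; the actual shader may compute v2.x differently, and `linear_index` and `WIDTH` are hypothetical names used only for illustration.

```python
def linear_index(x: int, y: int, width: int) -> int:
    """Flatten a 2D invocation ID (x, y) into a row-major index.

    Assumption: this stands in for gl_GlobalInvocationID.xy; `width` is the
    total number of invocations along x.
    """
    return y * width + x


# Hypothetical dispatch that is 512 invocations wide.
WIDTH = 512

# The first two rows produce the indices 0..1023 with no gaps or repeats.
indices = [linear_index(x, y, WIDTH) for y in range(2) for x in range(WIDTH)]
```

With this assumed width, the first invocation of row 128 gets index 65,536, so any dispatch with more than 128 rows produces indices above 65,535.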

CompressedASTC_CS_Error and CompressedASTC_CS_Right are the results of running the compute shader; they can also be seen in the RenderDoc capture.

Note the lines after 65537: they are all zeros in CompressedASTC_CS_Error, which comes from the Arm Mali device.

All the needed files are in Mali_CS (including the RenderDoc capture and all the files mentioned above).

Is this a bug in the Mali compute shader implementation?


Please help!

  • Hi Togchen,

    I suspect the problem is that on pre-Valhall GPUs (including Mali-G76), and on Valhall devices with older drivers, GL_MAX_TEXTURE_BUFFER_SIZE is only 65,536 elements. Because the limit is 64K, our shader compiler will only use the low 16 bits of the passed index, so when u1 goes beyond this it effectively wraps. This explains why the write to u1 = 65,536 has ended up modifying the first entry (index 0). For use cases like this, where you want to modify large buffers from a single dispatch, we'd recommend using SSBOs instead, as they support much larger buffer bindings.

    Hope that helps. :) 

    Cheers,
    Christian
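The wrap described in the reply can be illustrated with a small Python model. This is only a sketch of the arithmetic, assuming the index is truncated to its low 16 bits before the write; it is not the actual compiler or driver behavior, and `effective_index` and `simulate_writes` are hypothetical names.

```python
TEXEL_LIMIT = 65_536  # 64K elements, the GL_MAX_TEXTURE_BUFFER_SIZE value quoted in the reply


def effective_index(index: int) -> int:
    """Keep only the low 16 bits of the index (assumed truncation)."""
    return index & 0xFFFF


def simulate_writes(count: int) -> list[int]:
    """Model `count` invocations, each writing (its index + 1) to a buffer.

    The buffer has `count` slots, but every write goes through the
    16-bit truncation, so indices >= TEXEL_LIMIT land on earlier slots.
    """
    buffer = [0] * count
    for i in range(count):
        buffer[effective_index(i)] = i + 1
    return buffer


result = simulate_writes(TEXEL_LIMIT + 2)
```

In this model the write for index 65,536 lands on slot 0 and the write for 65,537 lands on slot 1, while slots 65,536 and above are never written and stay zero, which matches the symptom reported in the question. A storage buffer (SSBO) indexed with the full 32-bit value would not go through this truncation.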
