Using suballocation for vertex buffers causes GPU crash on Mali GPU

Hello Forum,

I've encountered a strange GPU crash when using suballocation for vertex buffers.

The blend indices are stored as an attribute within the vertex buffer, and I use them to index into an SSBO for skinning calculations. This setup works perfectly without suballocation. However, after implementing suballocation, I encountered frequent GPU crashes. Through shader debugging, I’ve confirmed that these crashes are caused by out-of-range reads from the SSBO.

community.arm.com/.../about-mali-gpu-index-buffer-processing-group-of-4-indices

I came across an article mentioning that Mali GPUs (Mali-G715-Immortalis MC11...) process indices in groups of four. Based on this, I tried aligning each vertex buffer suballocation to 4 * stride and ensuring no 'dirty' data exists in the padding. This resolved the GPU crashes.
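A minimal CPU-side sketch of that padding workaround, assuming the suballocator appends into a staging pool that is later uploaded as one buffer. The names `SubAlloc`, `roundUp`, and `appendVertices` are hypothetical, and reading "align to 4 * stride" as "align the offset and pad the size" is an interpretation, not something the article states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record describing one suballocation inside the pool.
struct SubAlloc {
    std::size_t offset; // byte offset of the first vertex
    std::size_t size;   // byte size including zeroed padding
};

// Round value up to the next multiple of step (step must be > 0).
inline std::size_t roundUp(std::size_t value, std::size_t step)
{
    return (value + step - 1) / step * step;
}

// Append vertexCount vertices of the given stride to the pool.
// The offset is aligned to 4 * stride and the size is padded to a whole
// group of four vertices; all gap and padding bytes are zero, so any
// over-fetched vertex carries blend index 0 instead of stale data.
inline SubAlloc appendVertices(std::vector<std::uint8_t>& pool,
                               const void* src,
                               std::size_t vertexCount,
                               std::size_t stride)
{
    assert(stride > 0);
    SubAlloc a;
    a.offset = roundUp(pool.size(), 4 * stride);
    a.size = roundUp(vertexCount, 4) * stride;
    pool.resize(a.offset + a.size, 0); // newly added bytes are zero-filled
    if (vertexCount > 0 && src != nullptr) {
        std::memcpy(pool.data() + a.offset, src, vertexCount * stride);
    }
    return a;
}
```

With 5 vertices of stride 16, this reserves 8 vertices (128 bytes), so a fetch of vertices 5 to 7 stays inside the same suballocation and reads zeros.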

Does this mean that if I have, for example, only 5 vertices, the shader might still attempt to access the 6th or 7th vertex positions due to this group-based fetching logic? It seems that without suballocation, the driver might have been transparently handling these OOB checks. But with suballocation, the shader ends up reading into adjacent memory belonging to other resources, leading to the crash.

Additionally, are there any tools similar to NVIDIA Aftermath that can help me pinpoint the exact cause or location of GPU crashes on mobile platforms?

  • Does this mean that if I have, for example, only 5 vertices, the shader might still attempt to access the 6th or 7th vertex positions due to this group-based fetching logic

    Yes. For vertex shaders, this is fixed in the Immortalis-G925 hardware generation, which shades only the referenced index ranges, but it can still fail if using geometry shaders, tessellation shaders, or transform feedback.

    are there any tools similar to NVIDIA Aftermath

    Nothing off the shelf for Mali today. Assuming you are on Vulkan, the best approach is to write progress "breadcrumbs" to a buffer after each workload, using pipeline barriers to serialize the workloads so that it is clearer which one is failing.
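A rough sketch of the breadcrumb idea, assuming one 32-bit slot per workload in a zero-initialized, host-readable buffer created with `VK_BUFFER_USAGE_TRANSFER_DST_BIT`. The names `kCrumbDone`, `recordBreadcrumb`, and `firstUnfinishedWorkload` are hypothetical. It also assumes the buffer contents can still be read after the device is lost, which Vulkan does not guarantee. The GPU-side function is only compiled where the Vulkan headers are present, and it must be recorded outside a render pass because `vkCmdFillBuffer` is not allowed inside one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compile the GPU-side part only where the Vulkan headers are available.
#ifdef __has_include
#  if __has_include(<vulkan/vulkan.h>)
#    include <vulkan/vulkan.h>
#    define HAVE_VULKAN_HEADERS 1
#  endif
#endif

// Hypothetical marker value meaning "this workload finished".
const std::uint32_t kCrumbDone = 0xC0FFEEu;

#ifdef HAVE_VULKAN_HEADERS
// Record "workload `slot` finished" right after that workload.
inline void recordBreadcrumb(VkCommandBuffer cmd, VkBuffer crumbBuffer,
                             std::uint32_t slot)
{
    VkMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;

    // Wait for everything recorded so far before writing the marker.
    barrier.srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
                         VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
                         1, &barrier, 0, nullptr, 0, nullptr);

    vkCmdFillBuffer(cmd, crumbBuffer,
                    VkDeviceSize(slot) * sizeof(std::uint32_t),
                    sizeof(std::uint32_t), kCrumbDone);

    // Hold back later work until the marker is written, and make the
    // write available to host reads.
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_HOST_READ_BIT;
    vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_TRANSFER_BIT,
                         VK_PIPELINE_STAGE_ALL_COMMANDS_BIT |
                             VK_PIPELINE_STAGE_HOST_BIT,
                         0, 1, &barrier, 0, nullptr, 0, nullptr);
}
#endif

// Host side: index of the first workload whose marker is missing,
// or `count` if every workload wrote its marker.
inline std::size_t firstUnfinishedWorkload(const std::uint32_t* crumbs,
                                           std::size_t count)
{
    assert(crumbs != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (crumbs[i] != kCrumbDone) {
            return i;
        }
    }
    return count;
}
```

After a crash, mapping the buffer and calling `firstUnfinishedWorkload` points at the first workload that never reached its marker. The full barriers remove all overlap between workloads, so this is a debugging aid rather than something to leave enabled.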