Hello!
We have observed garbage vertex data being fed into vertex shaders, with the garbage located at the very end of the vertex buffers. This causes a 100% reproducible GPU crash. The vertex buffers are allocated in non-coherent memory.
This happens on a Pixel 6, which has a Mali-G78 MP20 GPU.
For now, our workaround is to round the VkBufferCreateInfo size field up to a multiple of nonCoherentAtomSize, which fixes the GPU crash.
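For anyone else hitting this, here is a minimal sketch of the size rounding we mean. It assumes VkDeviceSize is a 64-bit unsigned integer (as in the Vulkan headers), so plain uint64_t is used to keep it standalone; `align_up` and `padded_buffer_size` are hypothetical helper names, not part of the Vulkan API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round value up to the next multiple of alignment.
 * Uses division, so it works for any non-zero alignment. */
static uint64_t align_up(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

/* Hypothetical helper: the value we would put into VkBufferCreateInfo::size
 * for the workaround, i.e. the raw vertex data size rounded up to
 * VkPhysicalDeviceLimits::nonCoherentAtomSize. */
static uint64_t padded_buffer_size(uint64_t vertex_count,
                                   uint64_t vertex_stride,
                                   uint64_t non_coherent_atom_size)
{
    return align_up(vertex_count * vertex_stride, non_coherent_atom_size);
}
```

With 16-byte vertices and a 64-byte atom, 10 vertices (160 bytes) become a 192-byte buffer.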
Mapping the buffer and reading the data back on the CPU produces correct data, so it seems that only the GPU does not see the correct data at the end of the buffer.
We call vkFlushMappedMemoryRanges() after the memcpy() into the allocated and aligned buffer memory, and no Vulkan debug layer errors are reported while the app runs.
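For reference, this is roughly how a flush range can be made valid for non-coherent memory: VkMappedMemoryRange::offset must be a multiple of nonCoherentAtomSize, and size must either be a multiple of it or reach the end of the allocation (passing VK_WHOLE_SIZE is another option). The sketch below only computes the widened range; `FlushRange` and `flush_range_for` are hypothetical names, and `FlushRange` merely mirrors the offset/size fields of VkMappedMemoryRange.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct mirroring VkMappedMemoryRange::offset/size. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} FlushRange;

/* Hypothetical helper: widen [offset, offset + size) so that the start is
 * a multiple of the atom size and the end is either a multiple of it or
 * clamped to the end of the VkDeviceMemory allocation. */
static FlushRange flush_range_for(uint64_t offset, uint64_t size,
                                  uint64_t atom, uint64_t allocation_size)
{
    assert(atom != 0);
    assert(offset + size <= allocation_size);

    uint64_t begin = (offset / atom) * atom;                    /* round down */
    uint64_t end = ((offset + size + atom - 1) / atom) * atom;  /* round up */
    if (end > allocation_size)
        end = allocation_size; /* offset + size == allocation size is valid */

    FlushRange r = { begin, end - begin };
    return r;
}
```

Flushing a slightly wider range than was written is allowed; it only makes more host writes visible and does not change which bytes the GPU fetches.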
I would be curious to know whether this is a known bug on your side.
Thank you in advance for your help,
Milan
Hi Pete,
Thanks for the detailed analysis of the issue!
What you are describing is likely very close to the culprit. To clarify a bit further: we allocate 16 MB "pages" of VkDeviceMemory and then bind smaller VkBuffer objects to them. Each mesh has its own VkBuffer, and since the vertex size is 16 bytes and the coherency atom size is 64 bytes, the GPU is unlikely to be reading from the next mesh. Rather, if it is reading outside the specified VkBuffer, it is reading junk data in the 64-byte alignment padding before the next VkBuffer starts.
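To make the layout concrete, here is a simplified sketch of that kind of sub-allocation. It assumes a plain bump allocator over one 16 MB page and that the caller passes an alignment such as max(VkMemoryRequirements::alignment, nonCoherentAtomSize); `PageAllocator`, `page_alloc` and `MEMORY_PAGE_BYTES` are hypothetical names, not our actual allocator.

```c
#include <assert.h>
#include <stdint.h>

#define MEMORY_PAGE_BYTES (16ull * 1024ull * 1024ull) /* one 16 MB "page" */

/* Hypothetical bump allocator over a single VkDeviceMemory page. */
typedef struct {
    uint64_t used; /* bytes consumed so far, including alignment padding */
} PageAllocator;

/* Returns the offset to pass to vkBindBufferMemory, or UINT64_MAX if the
 * buffer does not fit. Bytes between the end of one buffer and the aligned
 * start of the next are padding that nothing ever writes. */
static uint64_t page_alloc(PageAllocator *page, uint64_t size, uint64_t alignment)
{
    assert(alignment != 0);
    uint64_t offset = ((page->used + alignment - 1) / alignment) * alignment;
    if (offset > MEMORY_PAGE_BYTES || size > MEMORY_PAGE_BYTES - offset)
        return UINT64_MAX;
    page->used = offset + size;
    return offset;
}
```

With two 10-vertex meshes (160 bytes each) and a 64-byte alignment, the first buffer lands at offset 0 and the second at offset 192, so bytes 160 to 191 are exactly the kind of unwritten padding an over-fetch past the first buffer would read.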
What puzzles me is that the GPU crashes if we create a VkBuffer whose size is not aligned to 4 vertices (and hence to the coherency atom size, since 4 × 16 bytes = 64 bytes), yet everything works fine if we allocate a buffer whose size is aligned to both. The memory "page" contents are exactly the same in both cases: the same junk is there, because our memcpy into the mapped memory copies a vertex count that is not divisible by 4, so the last few vertices in that buffer are never initialized or written. I would therefore expect a crash in both cases.
My guess is that in the first case the over-fetch kicks in and feeds junk data to the vertex shaders, whereas in the second case it does not; the uninitialized memory is still read, but by luck the GPU does not crash (perhaps that memory is zero-initialized somewhere, so the array accesses it drives do not crash the GPU).
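One experiment that might help confirm or rule out this guess (we have not verified that it addresses the root cause) is to zero-fill the padding vertices instead of leaving them uninitialized. A minimal sketch, assuming `mapped` points at host-visible memory large enough for the padded size; `upload_vertices_zero_padded` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical upload helper: copies vertex_count vertices into mapped
 * memory, then zero-fills up to the next multiple of `group` vertices so
 * that no vertex the GPU might fetch is left uninitialized.
 * Returns the number of bytes written (copied + zeroed). */
static size_t upload_vertices_zero_padded(void *mapped, const void *vertices,
                                          size_t vertex_count, size_t stride,
                                          size_t group)
{
    assert(group != 0);
    size_t padded_count = ((vertex_count + group - 1) / group) * group;
    unsigned char *dst = (unsigned char *)mapped;

    memcpy(dst, vertices, vertex_count * stride);
    memset(dst + vertex_count * stride, 0, (padded_count - vertex_count) * stride);
    return padded_count * stride;
}
```

With 16-byte vertices and `group` = 4, the written size is always a multiple of 64 bytes, matching the coherency atom size on this device. If the crash disappears even with the unaligned VkBuffer size, that would point towards the junk contents rather than the buffer size itself.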
Then again, I may be completely off with this guess. We are still waiting for the NDAs to be prepared and signed before we can send you our repro case.
Thanks,
Milan