End-of-buffer corruption for non-coherent memory type

Hello!

We have observed garbage vertex data being fed into vertex shaders; the garbage is located at the very end of the vertex buffers. This causes a 100% reproducible GPU crash. The vertex buffers are allocated in non-coherent memory.

This happens on a Pixel 6, which has a Mali-G78 MP20 GPU.

For now, our workaround is to round the VkBufferCreateInfo size field up to a multiple of nonCoherentAtomSize, which stops the GPU crash.
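A minimal sketch of that rounding, for anyone who wants to try the same workaround. It is plain C with no Vulkan headers: `atom` stands in for VkPhysicalDeviceLimits::nonCoherentAtomSize, and the helper names `align_up` and `padded_buffer_size` are hypothetical, not taken from our code base.

```c
#include <assert.h>
#include <stdint.h>

/* Round `size` up to the next multiple of `atom`.
   `atom` stands in for VkPhysicalDeviceLimits::nonCoherentAtomSize,
   which the Vulkan spec requires to be a power of two. */
static inline uint64_t align_up(uint64_t size, uint64_t atom)
{
    assert(atom != 0 && (atom & (atom - 1)) == 0);
    return (size + atom - 1) & ~(atom - 1);
}

/* Hypothetical helper: the value that would be written into
   VkBufferCreateInfo::size instead of the raw vertex data size. */
static inline uint64_t padded_buffer_size(uint64_t requested, uint64_t atom)
{
    return align_up(requested, atom);
}
```

For example, with an atom size of 64, a 100-byte vertex buffer would be created with a size of 128 bytes, while a 128-byte buffer stays at 128.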

Mapping the buffer and reading the data back on the CPU produces correct data, so it seems that only the GPU is not seeing the correct data at the end of the buffer.

We are doing vkFlushMappedMemoryRanges() after memcpy() to the aligned & allocated buffer memory, and there are no Vulkan debug layer errors displayed during the app execution.

I would be curious to know if this is perhaps a known bug on your side?

Thank you in advance for your help,
Milan

  • Hi Milan,

    Thanks for getting in touch. I'm not aware of any specific bug like this - can you share a reproducer? Feel free to email developer at arm dot com if you can only share privately. 

    A couple of diagnostics:

    * How are you mapping the buffer? Are you using VK_WHOLE_SIZE?

    * Does it stop reproducing if you enable robustBufferAccess?

    * Does it stop reproducing if you round up buffer size to be an exact multiple of 4 vertices?

    Also note this bit in the spec, which might apply to your usage: 

    vkMapMemory ... If the device memory was allocated without the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT set, these guarantees must be made for an extended range: the application must round down the start of the range to the nearest multiple of VkPhysicalDeviceLimits::nonCoherentAtomSize, and round the end of the range up to the nearest multiple of VkPhysicalDeviceLimits::nonCoherentAtomSize.

    FWIW, aligning on nonCoherentAtomSize (cache line alignment) is probably good for performance anyway.
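To illustrate the rounding the spec text describes, here is a rough sketch of computing an atom-aligned range for a flush. It is plain C without Vulkan headers; `FlushRange` and `atom_aligned_range` are hypothetical names, `atom` stands in for VkPhysicalDeviceLimits::nonCoherentAtomSize, and `alloc_size` is assumed to be the size of the VkDeviceMemory allocation. The resulting offset and size are what would go into a VkMappedMemoryRange, assuming the mapped range covers them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the offset/size pair of VkMappedMemoryRange. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} FlushRange;

/* Expand [offset, offset + size) to nonCoherentAtomSize boundaries.
   The start is rounded down and the end is rounded up, then the end is
   clamped to the allocation size, since a range may also end exactly at
   the end of the allocation. */
static inline FlushRange atom_aligned_range(uint64_t offset, uint64_t size,
                                            uint64_t atom, uint64_t alloc_size)
{
    assert(atom != 0 && (atom & (atom - 1)) == 0);   /* power of two */
    assert(offset <= alloc_size && size <= alloc_size - offset);

    uint64_t begin = offset & ~(atom - 1);
    uint64_t end = (offset + size + atom - 1) & ~(atom - 1);
    if (end > alloc_size) {
        end = alloc_size;
    }

    FlushRange r = { begin, end - begin };
    return r;
}
```

With an atom size of 64, writing 50 bytes at offset 100 of a 1024-byte allocation gives a flush range starting at offset 64 with size 128.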

    Kind regards, 
    Pete
