Numerous Mali devices (e.g. Galaxy S10e) using Mali-G76 (Bifrost 2nd gen) are producing a VK_DEVICE_LOST error when rendering 250K triangles or more. I read about the 180 MB driver limit on Mali systems, and how the driver simply hands a VK_DEVICE_LOST error back to the developer, leaving it up to them to split render passes. We don't have this issue with Adreno and other Android devices. iOS also has a parameter buffer, but flushes it behind the scenes, so we've never hit any issues there either.
community.arm.com/.../memory-limits-with-vulkan-on-mali-gpus
This device lost error happens when I turn on terrain, or turn off culling on the terrain. The spike in triangle count from 200K tris, which render fine, to 250K tris is when Vulkan returns VK_DEVICE_LOST, preceded by a message about "QueueSignalReleaseImageANDROID failed: -4". Looking this up in the Vulkan sources indicates it is tied to the framebuffer loss, so it may just be the first part of the device loss.
I don't have a lot to go on, and validation seems to crash the driver with an unknown symbol. I was able to fix a few validation errors using other non-Mali devices, but this code has mostly been working up until the high polycounts are hit. Our terrain rendering works like this:
1. Chunk up terrain into index chunks that represent spatially close triangles. These can be culled.
2. Copy out indices for each of the specific materials in new chunks (these are a subset of the indices in the original chunk). LODs work the same.
3. Draw each visible chunk that corresponds to a given material with vkCmdDrawIndexedIndirect. Disabling this optimization does not prevent the crash.
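To make the three steps concrete, here is a minimal sketch of how the per-material indirect commands could be gathered. It is not our actual code: MaterialChunk and buildIndirectCommands are hypothetical names, and the command struct just mirrors the field layout of Vulkan's VkDrawIndexedIndirectCommand so the snippet compiles without the Vulkan headers.

```cpp
#include <cstdint>
#include <vector>

// Same field layout as Vulkan's VkDrawIndexedIndirectCommand, redeclared here
// so the sketch compiles without the Vulkan headers.
struct DrawIndexedIndirectCommand {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};

// Hypothetical description of one per-material index chunk (step 2).
struct MaterialChunk {
    uint32_t materialId;
    uint32_t firstIndex;  // offset into the shared index buffer
    uint32_t indexCount;
    bool     visible;     // result of CPU culling (step 1)
};

// Collect one indirect command per visible chunk of the given material (step 3).
std::vector<DrawIndexedIndirectCommand>
buildIndirectCommands(const std::vector<MaterialChunk>& chunks, uint32_t materialId) {
    std::vector<DrawIndexedIndirectCommand> cmds;
    for (const MaterialChunk& c : chunks) {
        if (!c.visible || c.materialId != materialId || c.indexCount == 0) continue;
        // One instance, no vertex offset: every chunk indexes the same vertex buffer.
        cmds.push_back({c.indexCount, 1u, c.firstIndex, 0, 0u});
    }
    return cmds;
}
```

The resulting array would be uploaded to a buffer and consumed by a single vkCmdDrawIndexedIndirect call per material, with drawCount set to the number of commands.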
I read the Mali guide and there's not much to go on there about organizing vb or ib data. In general, iOS doesn't even recommend anything like repacking. Pete Harris had mentioned that Bifrost copies the entire min/maxIndex range of vertices, while Valhall copies only the vertices of visible (non-backfacing) triangles. So Valhall gets around 50% more out of the same parameter buffer if half the triangles are backfacing.
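If the min/maxIndex behaviour is as described, the cost of a draw depends on the span of its index range rather than on how many vertices it actually uses. A rough sketch for measuring that gap on the CPU follows; RangeStats and measureIndexRange are hypothetical names, and the function assumes a non-empty, in-bounds index range.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

struct RangeStats {
    uint32_t minIndex;
    uint32_t maxIndex;
    uint32_t span;    // maxIndex - minIndex + 1: vertices covered by the range
    uint32_t unique;  // distinct vertices the draw actually references
};

// Assumes indexCount > 0 and firstIndex + indexCount <= indices.size().
RangeStats measureIndexRange(const std::vector<uint32_t>& indices,
                             std::size_t firstIndex, std::size_t indexCount) {
    auto begin = indices.begin() + static_cast<std::ptrdiff_t>(firstIndex);
    auto end = begin + static_cast<std::ptrdiff_t>(indexCount);
    auto mm = std::minmax_element(begin, end);
    const uint32_t lo = *mm.first;
    const uint32_t hi = *mm.second;
    std::unordered_set<uint32_t> seen(begin, end);
    return RangeStats{lo, hi, hi - lo + 1, static_cast<uint32_t>(seen.size())};
}
```

A chunk whose span is much larger than its unique count is one where a tighter vertex layout should help most on a GPU that processes the whole range.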
With things moving towards mesh shaders and meshlets like in UE5, I was considering repacking/reordering/splitting up our vertex buffers so that the indices in each chunk form a mostly incrementing sequence and the range is as tight as possible. If the chunks are small enough, even 8-bit indices might suffice. But in step 3, we may pass, say, 100 of 200 index chunks to the driver that all reference a single vb. I understand that within one index range (indexStart, indexCount) all verts are transformed, but if those 100 index chunks reference half the buffer, will only that half be allocated to the parameter buffer? LODs could be packed smallest to largest by appending the unique vertices from the larger LODs to the end.
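The repack I have in mind could look roughly like this: rewrite each chunk's indices in first-use order so the chunk owns a contiguous block of vertices and its index range is exactly as wide as its unique vertex count. This is only a sketch under that assumption; RepackedChunk and repackChunk are hypothetical names, and actually drawing with 8-bit indices in Vulkan would additionally depend on the VK_EXT_index_type_uint8 extension being available on the device.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct RepackedChunk {
    std::vector<uint32_t> sourceVertices;  // original vertex index for each local slot
    std::vector<uint32_t> localIndices;    // rewritten indices, 0..sourceVertices.size()-1
    bool fitsIn8Bit = false;               // true when every local index is <= 255
};

// Remap a chunk's indices to a dense range in order of first use.
RepackedChunk repackChunk(const std::vector<uint32_t>& chunkIndices) {
    RepackedChunk out;
    std::unordered_map<uint32_t, uint32_t> remap;  // original index -> local index
    out.localIndices.reserve(chunkIndices.size());
    for (uint32_t original : chunkIndices) {
        auto it = remap.find(original);
        if (it == remap.end()) {
            const uint32_t local = static_cast<uint32_t>(out.sourceVertices.size());
            it = remap.emplace(original, local).first;
            out.sourceVertices.push_back(original);
        }
        out.localIndices.push_back(it->second);
    }
    out.fitsIn8Bit = out.sourceVertices.size() <= 256;
    return out;
}
```

The vertices listed in sourceVertices would be copied into the chunk's slice of the vertex buffer in that order, and the draw's vertexOffset would point at the start of the slice. Vertices shared between chunks get duplicated, which is the price of the tighter ranges.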
So I just did an interesting test here: draw 1 new chunk of terrain each frame until the Mali driver dies. This is independent of culling, and my viewpoint may or may not include the triangles. Disabling vertex/index repack also somehow magically avoids this problem, but doubles our vertex count. Is there a bug in vertex dedupe across index ranges? When I hit 500 / 1500 chunks, that's when the driver fails with a flurry of these same errors and then a VK_DEVICE_LOST. This could obviously be a race condition in the way we recycle our command buffers, but I wanted to share the errors in case anyone has any insight.
Even if I bump up our max command buffer count to 2x what it is now, this error still occurs at roughly the same chunk count. This last time it was at 458. I think what's happening is that we or the driver are somehow returning an uncompleted command buffer back to our pool when enough draw calls/commands are submitted. This only happens with our terrain, not other draws.
I/TerrainBarn: chunkCounter 465
D/mali.instrumentation.graph.work: key already added
D/mali.instrumentation.graph.work: key already added
I/TerrainBarn: chunkCounter 466
D/mali.instrumentation.graph.work: key already added
D/mali.instrumentation.graph.work: key already added
I/VALIDATION: VUID-vkBeginCommandBuffer-commandBuffer-00049(ERROR / SPEC): msgNum: 0 - Calling vkBeginCommandBuffer() on active VkCommandBuffer 0x7db4402d80[] before it has completed. You must check command buffer fence before this call. The Vulkan spec states: commandBuffer must not be in the recording or pending state. (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkBeginCommandBuffer-commandBuffer-00049) [0] 0x7db4411a20, type: 6, name: NULL
I/VALIDATION: VUID-vkFreeDescriptorSets-pDescriptorSets-00309(ERROR / SPEC): msgNum: 0 - vkUpdateDescriptorSets() failed write update validation for VkDescriptorSet 0x39a[] with error: Cannot call vkUpdateDescriptorSets() to perform write update on VkDescriptorSet VkDescriptorSet 0x39a[] allocated with VkDescriptorSetLayout VkDescriptorSetLayout 0x38d[] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to any element of pDescriptorSets must have completed execution (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkFreeDescriptorSets-pDescriptorSets-00309) [0] 0x4fc, type: 23, name: NULL
I/VALIDATION: VUID-vkResetFences-pFences-01123(ERROR / SPEC): msgNum: 0 - VkFence 0xc7[] is in use. The Vulkan spec states: Each element of pFences must not be currently associated with any queue command that has not yet completed execution on that queue (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkResetFences-pFences-01123) Objects: 1 [0] 0xc7, type: 7, name: NULL
E/vulkan: QueueSignalReleaseImageANDROID failed: -4
E/CRASH: ASSERT! VulkanRenderer.cpp (3003): Renderer Crash, Error: ERROR_DEVICE_LOST, exiting app.. Run 'make callstack' to see the symbolicated crash callstack
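The vkBeginCommandBuffer message above is what I'd expect to see if a command buffer goes back into the pool before its fence has signaled. For reference, this is a simplified sketch of the invariant the recycling should hold, not our renderer code. CommandBufferRecycler is a hypothetical class, handles are plain integers standing in for VkCommandBuffer, and it assumes submission serials only ever increase and that completedSerial comes from a fence check such as vkGetFenceStatus on the fence passed to vkQueueSubmit.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical recycler: handles are opaque ids standing in for VkCommandBuffer.
class CommandBufferRecycler {
public:
    explicit CommandBufferRecycler(uint32_t count) {
        for (uint32_t i = 0; i < count; ++i) free_.push_back(i);
    }

    // Returns false when nothing is safe to reuse; the caller should then wait
    // on the GPU or allocate more, never take a buffer that is still in flight.
    bool acquire(uint64_t completedSerial, uint32_t& handle) {
        // Retire in-flight buffers whose submission the GPU has finished.
        while (!inFlight_.empty() && inFlight_.front().serial <= completedSerial) {
            free_.push_back(inFlight_.front().handle);
            inFlight_.pop_front();
        }
        if (free_.empty()) return false;
        handle = free_.back();
        free_.pop_back();
        return true;
    }

    // Record that `handle` was submitted as part of submission `serial`.
    // Serials are assumed to be passed in increasing order.
    void submit(uint32_t handle, uint64_t serial) {
        inFlight_.push_back({handle, serial});
    }

private:
    struct InFlight { uint32_t handle; uint64_t serial; };
    std::vector<uint32_t> free_;
    std::deque<InFlight> inFlight_;
};
```

With this shape, raising the pool size only delays the point where acquire returns false. It can never hand out a pending buffer, which is why I find it suspicious that doubling our command buffer count did not move the failure point.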
I suspect we're not going to be able to understand this one from our side without more detail on the specific API sequence (and possibly a reproducer). I assume we can't do that on the forums, so please contact developer@arm.com and we'll see if we can help offline.
Kind regards, Pete
Already in contact with them. Unfortunately, the repack is already in the live build, so it may not be easy to share a capture. If I can get one of the few non-Mali devices that support AGI, then I might be able to use that to capture the render. I suspect bad logic in the driver's dedupe/sharing of vertices across various index ranges.