This article covers situations in which a Vulkan application might trigger an out-of-memory (OOM) condition on Mali GPUs. This condition results in a VK_ERROR_DEVICE_LOST error, even if the API usage is correct. The OOM condition that developers hit most often is caused by a high vertex load, and it can be relatively common when porting Vulkan applications from desktop to mobile.
Mali GPUs have a memory region that stores the intermediate geometry output from a render pass. This memory holds all the varying data that is generated by vertex, tessellation, and geometry shading prior to fragment shading. Exceeding the size of this region may result in a VK_ERROR_DEVICE_LOST error. The limit is fixed at 180MB on current Mali GPUs, but it may be increased or lifted altogether in future GPUs.
The reasoning behind having a varying limit is that tile-based renderers need to write out and then read back intermediate geometry output, so the vertex load is directly correlated to memory bandwidth. For a typical program using 64 bytes of varying data per vertex, the 180MB of intermediate storage can contain over 2 million vertices, which we expect to be enough for normal mobile application usage. We will now cover the reasons why such a vertex load is unlikely to be sustainable, and possible mitigations if your application is hitting it.
Let us consider a vertex-heavy application with a single render pass that reaches the 180MB limit. The GPU has to write the data out and read it back from memory. This results in 2 x 180 = 360 MB per render pass, which at 30 FPS brings memory bandwidth up to 30 x 360 MB = 10.8 GB/s. Memory bandwidth has a direct correlation with power consumption, which can be estimated as 100 mW per GB/s. This means that an application using 180MB of varying data consumes at least 1.08W, before counting any other contribution to memory bandwidth or general GPU power consumption. A mobile GPU cannot sustain this level of power usage without overheating, which in turn causes a reduction of GPU frequency and a performance drop.
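The arithmetic above can be reproduced with a few helper functions. This is a minimal sketch rather than an official Arm tool: the function and constant names are made up for illustration, and it assumes that 1 MB means 1,000,000 bytes and that each frame contains a single render pass, as in the worked example.

```cpp
// Figures quoted in this article.
constexpr double kLimitMB = 180.0;            // intermediate geometry storage limit
constexpr double kMilliwattsPerGBps = 100.0;  // estimated power cost of memory traffic

// Number of vertices that fit in the limit for a given varying footprint.
// Assumption: 1 MB = 1,000,000 bytes.
constexpr double maxVertices(double varyingBytesPerVertex) {
    return kLimitMB * 1.0e6 / varyingBytesPerVertex;
}

// Bandwidth in GB/s spent on intermediate geometry: the data is written out
// once and read back once per render pass, hence the factor of two.
// Assumption: one render pass per frame.
constexpr double geometryBandwidthGBps(double varyingMB, double framesPerSecond) {
    return 2.0 * varyingMB * framesPerSecond / 1000.0;
}

// Lower bound on power (in watts) implied by that bandwidth.
constexpr double geometryPowerWatts(double bandwidthGBps) {
    return bandwidthGBps * kMilliwattsPerGBps / 1000.0;
}
```

With 64 bytes of varying data per vertex, maxVertices(64.0) gives roughly 2.8 million vertices, and geometryPowerWatts(geometryBandwidthGBps(180.0, 30.0)) gives approximately the 1.08W figure derived above.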
The only real solution to the issue is to keep the application’s vertex count below approximately 2 million, the figure derived above for an average of 64 bytes of varying data per vertex. In scenarios where the memory storage is exceeded and reducing the vertex load is not feasible, it is recommended that the application splits the render pass into multiple render passes, each using a safe amount of intermediate storage. Later render passes can use a loadOp=LOAD to restore the content of the framebuffer and continue rendering on top of earlier rendering. This form of incremental rendering might impact performance, due to the write-out and subsequent read-back of the color image.
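One way to picture the split is as a greedy partition of the draw calls. The sketch below is illustrative only: LoadOp, PassPlan, and splitRenderPass are hypothetical names, not Vulkan types or functions, and the per-draw byte counts are assumed to be supplied by the application. The first pass clears the color attachment, and every later pass is marked to load it, which corresponds to VK_ATTACHMENT_LOAD_OP_LOAD in a real render pass description.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the attachment load operation; in a real
// application this maps to VK_ATTACHMENT_LOAD_OP_CLEAR / VK_ATTACHMENT_LOAD_OP_LOAD.
enum class LoadOp { Clear, Load };

// One planned render pass covering draws [firstDraw, firstDraw + drawCount).
struct PassPlan {
    LoadOp colorLoadOp;
    std::size_t firstDraw;
    std::size_t drawCount;
};

// Greedily groups consecutive draws so that each pass stays within budgetBytes.
// drawBytes[i] is the application's estimate for draw i (an assumption of this
// sketch). A single draw larger than the budget still gets a pass of its own,
// because a draw call cannot be split at this level.
inline std::vector<PassPlan> splitRenderPass(const std::vector<std::uint64_t>& drawBytes,
                                             std::uint64_t budgetBytes) {
    std::vector<PassPlan> passes;
    std::uint64_t used = 0;
    for (std::size_t i = 0; i < drawBytes.size(); ++i) {
        const bool needNewPass = passes.empty() || used + drawBytes[i] > budgetBytes;
        if (needNewPass) {
            // Only the first pass clears; later passes load the earlier result.
            PassPlan plan{passes.empty() ? LoadOp::Clear : LoadOp::Load, i, 0};
            passes.push_back(plan);
            used = 0;
        }
        passes.back().drawCount += 1;
        used += drawBytes[i];
    }
    return passes;
}
```

Keeping the draws in submission order preserves the rendering result, at the cost of the extra color write-out and read-back between passes mentioned above.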
If your vertex load is unpredictable and you are hitting DEVICE_LOST issues in the field, you can set up a scheme that estimates memory consumption for each draw call in a render pass and switches to incremental rendering if the limit is reached. Keep in mind that memory is allocated for all vertex indices between the min and max index referenced by a draw call, and for all vertices generated by tessellation and geometry shading, even if they are subsequently culled by the clipping and culling pass. Such an estimate is conservative, as the actual amount of memory that is allocated might be lower, so we do not recommend adding a further safety margin to the 180MB limit.
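A per-draw estimate along these lines could look like the sketch below. The struct and function names are hypothetical, and the formula is an assumption based on the description above: it charges every index in the min-to-max range plus every generated vertex at the same varying size, which may not match the exact allocation the driver performs.

```cpp
#include <cstdint>

// Hypothetical description of one draw call; not a Vulkan type.
struct DrawEstimateInput {
    std::uint32_t minIndex;               // lowest vertex index referenced by the draw
    std::uint32_t maxIndex;               // highest vertex index referenced (>= minIndex)
    std::uint32_t generatedVertices;      // vertices emitted by tessellation or geometry
                                          // shading, before culling (0 if unused)
    std::uint32_t varyingBytesPerVertex;  // size of the varying data per vertex
};

// Conservative estimate of intermediate geometry memory for one draw.
// Assumption: the whole index range is shaded, even where indices are unused,
// and generated vertices cost the same number of bytes as ordinary ones.
inline std::uint64_t estimateVaryingBytes(const DrawEstimateInput& draw) {
    const std::uint64_t shadedRange =
        static_cast<std::uint64_t>(draw.maxIndex) - draw.minIndex + 1;
    return (shadedRange + draw.generatedVertices) * draw.varyingBytesPerVertex;
}
```

Summing this value over the draws recorded so far and comparing the total against 180MB gives the point at which the application would start a new render pass.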
Khronos and its members, in collaboration with Arm and external contributors, created the Vulkan Unified Samples Project in response to user demand for more accessible resources and best practices for developing with Vulkan. The sample code gives developers on-screen control to demonstrate multiple ways of using the feature. It also shows the performance impact of the different approaches through real-time hardware counters on the display.
We encourage you to check out the project on the Vulkan Samples GitHub page and try some of the samples for yourself. You are invited to contribute to the project by providing feedback and fixes and by creating more samples.