Hi! We're currently working on implementing subpasses for Vulkan and encountered really strange behaviour on Mali GPUs, specifically G76 (Samsung S10), G77 (S20FE). Samsung S10 is running Android 12. In short, it looks like the driver is not merging subpasses.
The render pass in question consisted of two subpasses. We first output something similar to G-Buffer, including depth, then read the data using input attachments.
We first noticed that subpasses on Mali did not give us performance improvement, or in case of Note 8 Pro, noticeable performance degradation. When we looked at AGI captures, the AGI showed two different render passes with the same VkRenderPass handle, which suggested that driver did not merge subpasses.
Next, we tried to reproduce the issue using the following examples, and observed the same behaviour.
https://github.com/KhronosGroup/Vulkan-Samples
https://github.com/SaschaWillems/Vulkan
In case of Vulkan Samples repo, on Samsung S10, switching between Subpasses and Render Passes did not change Tile Count or system memory accesses. When we tried running Vulkan Samples on Huawei Nova 5T (A10, Mali-G76 MP10), switching from Render Passes to Subpasses yields 2x decrease in Tile Count and system memory reads/writes. As for G77, it also shows our new merged pass with two subpasses as two render passes.
In case of S10 it's especially surprising, as Vulkan Samples page on Subpasses (https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/performance/subpasses) mentions this exact phone and shows expected tile usage improvements.
As those samples exhibit the same issues as our client code, is there anything wrong or potentially wrong that may hint the driver to not merge the subpasses? And how should correctly merged subpasses look in AGI?
The other thing I'd like to mention is that we also tried binding depth simultaneously as IA and as pDepthStencilAttachment in DEPTH_STENCIL_READ_ONLY layout. This did not change anything, although ARM guides suggest that this is the correct way of using depth after it's rendered.