Hi! We're currently working on implementing subpasses for Vulkan and have encountered really strange behaviour on Mali GPUs, specifically the G76 (Samsung S10) and G77 (S20FE). The Samsung S10 is running Android 12. In short, it looks like the driver is not merging subpasses.
The render pass in question consists of two subpasses. We first output something similar to a G-buffer, including depth, then read the data using input attachments.
We first noticed that subpasses on Mali did not give us any performance improvement and, in the case of the Note 8 Pro, caused a noticeable performance degradation. When we looked at AGI captures, AGI showed two different render passes with the same VkRenderPass handle, which suggested that the driver did not merge the subpasses.
Next, we tried to reproduce the issue using the following examples, and observed the same behaviour.
https://github.com/KhronosGroup/Vulkan-Samples
https://github.com/SaschaWillems/Vulkan
In the case of the Vulkan Samples repo, on the Samsung S10, switching between Subpasses and Render Passes did not change Tile Count or system memory accesses. When we tried running Vulkan Samples on a Huawei Nova 5T (A10, Mali-G76 MP10), switching from Render Passes to Subpasses yielded a 2x decrease in Tile Count and system memory reads/writes. As for the G77, it also shows our new merged pass with two subpasses as two render passes.
In the case of the S10 it's especially surprising, as the Vulkan Samples page on Subpasses (https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/performance/subpasses) mentions this exact phone and shows the expected tile usage improvements.
As those samples exhibit the same issues as our client code, is there anything wrong, or potentially wrong, that might lead the driver not to merge the subpasses? And how should correctly merged subpasses look in AGI?
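For reference, the 2x difference in Tile Count between merged and unmerged passes is what a back-of-the-envelope estimate would predict. The sketch below is only an illustration, not something taken from the samples: it assumes 16x16 pixel tiles, a hypothetical 1920x1080 render target, and that every fragment pass that is not merged walks each tile of the framebuffer once. The function names are made up for this example.

```cpp
#include <cstdint>

// Tiles needed to cover a width x height render target, rounding partial
// tiles up. The 16-pixel default tile size is an assumption of this sketch.
constexpr std::uint64_t tilesPerPass(std::uint32_t width,
                                     std::uint32_t height,
                                     std::uint32_t tileSize = 16) {
    const std::uint64_t tilesX = (width + tileSize - 1) / tileSize;
    const std::uint64_t tilesY = (height + tileSize - 1) / tileSize;
    return tilesX * tilesY;
}

// Tiles processed per frame when the GPU runs `fragmentPasses` separate
// full-screen fragment passes (1 if the two subpasses are merged, 2 if not).
constexpr std::uint64_t tilesPerFrame(std::uint32_t width,
                                      std::uint32_t height,
                                      std::uint32_t fragmentPasses) {
    return tilesPerPass(width, height) * fragmentPasses;
}

// 1920 / 16 = 120 columns, 1080 / 16 = 67.5 -> 68 rows, so 8160 tiles.
static_assert(tilesPerPass(1920, 1080) == 8160, "tiles for one pass");
// Unmerged: G-buffer pass + lighting pass each touch every tile.
static_assert(tilesPerFrame(1920, 1080, 2) == 16320, "two separate passes");
// Merged: both subpasses run while the tile is resident on chip.
static_assert(tilesPerFrame(1920, 1080, 1) == 8160, "merged subpasses");
```

Under these assumptions, a device that merges the two subpasses should report roughly half the tiles per frame, which is the pattern we see on the Nova 5T but not on the S10.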
I tested those changes on Samsung S10. The GPU queue in AGI on the device looks like this:
Sometimes there are two fragment blocks, sometimes there are three (!!). There are also two vertex phases that belong to the same VkRenderPass handle.
As for your understanding, yes, you are correct. The Samsung S10 and S20FE do not show any indication of fusion (onscreen metrics, AGI), but the Huawei Nova 5T does (tile count and memory bandwidth decrease by 2x).
Unfortunately we cannot share an APK based on our game client, but you can use either the most recent https://github.com/KhronosGroup/Vulkan-Samples or even the multipass sample from https://github.com/ARM-software/vulkan-sdk
Below is the AGI capture of the ARM multipass sample running on the S10. Again, two vertex blocks, two fragment blocks. There are no local modifications apart from updating Gradle to 4.2.0 and the Android SDK to 28.