Hi! We're implementing subpasses for Vulkan and have run into really strange behaviour on Mali GPUs, specifically G76 (Samsung S10) and G77 (S20FE). The Samsung S10 is running Android 12. In short, it looks like the driver is not merging subpasses.
The render pass in question consists of two subpasses. The first outputs something similar to a G-Buffer, including depth; the second reads that data using input attachments.
We first noticed that subpasses on Mali gave us no performance improvement and, in the case of the Note 8 Pro, a noticeable performance degradation. When we looked at AGI captures, AGI showed two separate render passes with the same VkRenderPass handle, which suggests that the driver did not merge the subpasses.
Next, we tried to reproduce the issue using the following examples, and observed the same behaviour.
https://github.com/KhronosGroup/Vulkan-Samples
https://github.com/SaschaWillems/Vulkan
In the case of the Vulkan Samples repo, on the Samsung S10, switching between Subpasses and Render Passes did not change Tile Count or system memory accesses. When we ran Vulkan Samples on a Huawei Nova 5T (A10, Mali-G76 MP10), switching from Render Passes to Subpasses yielded a 2x decrease in Tile Count and system memory reads/writes. As for the G77, it also shows our new merged pass with two subpasses as two render passes.
The S10 result is especially surprising, as the Vulkan Samples page on Subpasses (https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/performance/subpasses) mentions this exact phone and shows the expected tile usage improvements.
Since those samples exhibit the same issue as our client code, is there anything wrong, or potentially wrong, that could cause the driver not to merge the subpasses? And how should correctly merged subpasses look in AGI?
vkCreateRenderPass2({ VkAttachmentDescription2[6], { { { 0, 1, 2 }, 3 }, { { 0 }, 3 } } })
  device  Device 50
  CreateInfo  VkRenderPassCreateInfo2()
    sType  VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO_2
    pNext  NULL
    flags  VkRenderPassCreateFlagBits(0)
    attachmentCount  6
    pAttachments  VkAttachmentDescription2[6]
      [0]  VkAttachmentDescription2()
        sType  VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2
        pNext  NULL
        flags  VkAttachmentDescriptionFlagBits(0)
        format  VK_FORMAT_B10G11R11_UFLOAT_PACK32
        samples  VK_SAMPLE_COUNT_4_BIT
        loadOp  VK_ATTACHMENT_LOAD_OP_CLEAR
        storeOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        stencilLoadOp  VK_ATTACHMENT_LOAD_OP_DONT_CARE
        stencilStoreOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        initialLayout  VK_IMAGE_LAYOUT_UNDEFINED
        finalLayout  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
      [1]  VkAttachmentDescription2()
        sType  VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2
        pNext  NULL
        flags  VkAttachmentDescriptionFlagBits(0)
        format  VK_FORMAT_R8G8_UNORM
        samples  VK_SAMPLE_COUNT_4_BIT
        loadOp  VK_ATTACHMENT_LOAD_OP_CLEAR
        storeOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        stencilLoadOp  VK_ATTACHMENT_LOAD_OP_DONT_CARE
        stencilStoreOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        initialLayout  VK_IMAGE_LAYOUT_UNDEFINED
        finalLayout  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
      [2]  VkAttachmentDescription2()
        sType  VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2
        pNext  NULL
        flags  VkAttachmentDescriptionFlagBits(0)
        format  VK_FORMAT_R8_UNORM
        samples  VK_SAMPLE_COUNT_4_BIT
        loadOp  VK_ATTACHMENT_LOAD_OP_CLEAR
        storeOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        stencilLoadOp  VK_ATTACHMENT_LOAD_OP_DONT_CARE
        stencilStoreOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        initialLayout  VK_IMAGE_LAYOUT_UNDEFINED
        finalLayout  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
      [3]  VkAttachmentDescription2()
        sType  VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2
        pNext  NULL
        flags  VkAttachmentDescriptionFlagBits(0)
        format  VK_FORMAT_D32_SFLOAT
        samples  VK_SAMPLE_COUNT_4_BIT
        loadOp  VK_ATTACHMENT_LOAD_OP_CLEAR
        storeOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        stencilLoadOp  VK_ATTACHMENT_LOAD_OP_DONT_CARE
        stencilStoreOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        initialLayout  VK_IMAGE_LAYOUT_UNDEFINED
        finalLayout  VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
      [4]  VkAttachmentDescription2()
        sType  VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2
        pNext  NULL
        flags  VkAttachmentDescriptionFlagBits(0)
        format  VK_FORMAT_B10G11R11_UFLOAT_PACK32
        samples  VK_SAMPLE_COUNT_1_BIT
        loadOp  VK_ATTACHMENT_LOAD_OP_DONT_CARE
        storeOp  VK_ATTACHMENT_STORE_OP_STORE
        stencilLoadOp  VK_ATTACHMENT_LOAD_OP_DONT_CARE
        stencilStoreOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        initialLayout  VK_IMAGE_LAYOUT_UNDEFINED
        finalLayout  VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
      [5]  VkAttachmentDescription2()
        sType  VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2
        pNext  NULL
        flags  VkAttachmentDescriptionFlagBits(0)
        format  VK_FORMAT_D32_SFLOAT
        samples  VK_SAMPLE_COUNT_1_BIT
        loadOp  VK_ATTACHMENT_LOAD_OP_DONT_CARE
        storeOp  VK_ATTACHMENT_STORE_OP_STORE
        stencilLoadOp  VK_ATTACHMENT_LOAD_OP_DONT_CARE
        stencilStoreOp  VK_ATTACHMENT_STORE_OP_DONT_CARE
        initialLayout  VK_IMAGE_LAYOUT_UNDEFINED
        finalLayout  VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL
    subpassCount  2
    pSubpasses  VkSubpassDescription2[2]
      [0]  VkSubpassDescription2()
        sType  VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_2
        pNext  NULL
        flags  VkSubpassDescriptionFlagBits(0)
        pipelineBindPoint  VK_PIPELINE_BIND_POINT_GRAPHICS
        viewMask  0
        inputAttachmentCount  0
        pInputAttachments  VkAttachmentReference2[0]
        colorAttachmentCount  3
        pColorAttachments  VkAttachmentReference2[3]
          [0]  VkAttachmentReference2()
            sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
            pNext  NULL
            attachment  0
            layout  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
            aspectMask  VK_IMAGE_ASPECT_COLOR_BIT
          [1]  VkAttachmentReference2()
            sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
            pNext  NULL
            attachment  1
            layout  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
            aspectMask  VK_IMAGE_ASPECT_COLOR_BIT
          [2]  VkAttachmentReference2()
            sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
            pNext  NULL
            attachment  2
            layout  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
            aspectMask  VK_IMAGE_ASPECT_COLOR_BIT
        pResolveAttachments  VkAttachmentReference2[0]
        pDepthStencilAttachment  VkAttachmentReference2()
          sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
          pNext  NULL
          attachment  3
          layout  VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
          aspectMask  VK_IMAGE_ASPECT_DEPTH_BIT
        preserveAttachmentCount  0
        pPreserveAttachments  uint32_t[0]
      [1]  VkSubpassDescription2()
        sType  VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_2
        pNext  VkSubpassDescriptionDepthStencilResolve()
          sType  VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_DEPTH_STENCIL_RESOLVE
          pNext  NULL
          depthResolveMode  VK_RESOLVE_MODE_SAMPLE_ZERO_BIT
          stencilResolveMode  VK_RESOLVE_MODE_SAMPLE_ZERO_BIT
          pDepthStencilResolveAttachment  VkAttachmentReference2()
            sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
            pNext  NULL
            attachment  5
            layout  VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
            aspectMask  VK_IMAGE_ASPECT_DEPTH_BIT
        flags  VkSubpassDescriptionFlagBits(0)
        pipelineBindPoint  VK_PIPELINE_BIND_POINT_GRAPHICS
        viewMask  0
        inputAttachmentCount  3
        pInputAttachments  VkAttachmentReference2[3]
          [0]  VkAttachmentReference2()
            sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
            pNext  NULL
            attachment  1
            layout  VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
            aspectMask  VK_IMAGE_ASPECT_COLOR_BIT
          [1]  VkAttachmentReference2()
            sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
            pNext  NULL
            attachment  2
            layout  VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
            aspectMask  VK_IMAGE_ASPECT_COLOR_BIT
          [2]  VkAttachmentReference2()
            sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
            pNext  NULL
            attachment  3
            layout  VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL
            aspectMask  VK_IMAGE_ASPECT_DEPTH_BIT
        colorAttachmentCount  1
        pColorAttachments  VkAttachmentReference2[1]
          [0]  VkAttachmentReference2()
            sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
            pNext  NULL
            attachment  0
            layout  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
            aspectMask  VK_IMAGE_ASPECT_COLOR_BIT
        pResolveAttachments  VkAttachmentReference2[1]
          [0]  VkAttachmentReference2()
            sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
            pNext  NULL
            attachment  4
            layout  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
            aspectMask  VK_IMAGE_ASPECT_COLOR_BIT
        pDepthStencilAttachment  VkAttachmentReference2()
          sType  VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2
          pNext  NULL
          attachment  3
          layout  VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL
          aspectMask  VK_IMAGE_ASPECT_DEPTH_BIT
        preserveAttachmentCount  0
        pPreserveAttachments  uint32_t[0]
    dependencyCount  3
    pDependencies  VkSubpassDependency2[3]
      [0]  VkSubpassDependency2()
        sType  VK_STRUCTURE_TYPE_SUBPASS_DEPENDENCY_2
        pNext  NULL
        srcSubpass  UINT32_MAX
        dstSubpass  0
        srcStageMask  VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
        dstStageMask  VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
        srcAccessMask  VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT
        dstAccessMask  VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT
        dependencyFlags  VK_DEPENDENCY_BY_REGION_BIT
        viewOffset  0
      [1]  VkSubpassDependency2()
        sType  VK_STRUCTURE_TYPE_SUBPASS_DEPENDENCY_2
        pNext  NULL
        srcSubpass  0
        dstSubpass  1
        srcStageMask  VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
        dstStageMask  VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
        srcAccessMask  VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT
        dstAccessMask  VK_ACCESS_INPUT_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT
        dependencyFlags  VK_DEPENDENCY_BY_REGION_BIT
        viewOffset  0
      [2]  VkSubpassDependency2()
        sType  VK_STRUCTURE_TYPE_SUBPASS_DEPENDENCY_2
        pNext  NULL
        srcSubpass  1
        dstSubpass  UINT32_MAX
        srcStageMask  VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
        dstStageMask  VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
        srcAccessMask  VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT
        dstAccessMask  VK_ACCESS_INPUT_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT
        dependencyFlags  VK_DEPENDENCY_BY_REGION_BIT
        viewOffset  0
    correlatedViewMaskCount  0
    pCorrelatedViewMasks  uint32_t[0]
  pAllocator  NULL
  RenderPass  Render Pass 1847
Hi Christian, this VkRenderPassCreateInfo2 is taken from a RenderDoc capture. This is the render pass where we try to apply multipass rendering. We don't use VK_ACCESS_SHADER_READ_BIT. Can you take a look and see if we may be doing something wrong?
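For reference, here is a minimal sketch of a sanity check on the subpass 0 to subpass 1 dependency from the capture. It tests whether the dependency is framebuffer-local in the Vulkan spec's sense: only framebuffer-space stages on both sides, plus VK_DEPENDENCY_BY_REGION_BIT. The `SubpassDependency` struct, the `k*` constants and `isFramebufferLocal` are hypothetical names for illustration. The numeric values mirror the corresponding VK_* bits so the sketch builds without the Vulkan SDK. Whether the Mali driver uses this exact criterion when deciding to merge subpasses is an assumption, so a passing check would only rule out one possible cause.

```cpp
#include <cstdint>

// Local mirrors of the Vulkan flag bits used in the capture. The values are
// copied from vulkan_core.h; real code should use the VK_* enums directly.
constexpr uint32_t kStageVertexShader          = 0x00000008; // VK_PIPELINE_STAGE_VERTEX_SHADER_BIT
constexpr uint32_t kStageFragmentShader        = 0x00000080; // VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
constexpr uint32_t kStageEarlyFragmentTests    = 0x00000100; // VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT
constexpr uint32_t kStageLateFragmentTests     = 0x00000200; // VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT
constexpr uint32_t kStageColorAttachmentOutput = 0x00000400; // VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
constexpr uint32_t kDependencyByRegion         = 0x00000001; // VK_DEPENDENCY_BY_REGION_BIT

// The four framebuffer-space pipeline stages listed by the Vulkan spec.
constexpr uint32_t kFramebufferSpaceStages =
    kStageFragmentShader | kStageEarlyFragmentTests |
    kStageLateFragmentTests | kStageColorAttachmentOutput;

// Hypothetical stand-in for the fields of VkSubpassDependency2 that matter here.
struct SubpassDependency {
    uint32_t srcStageMask;
    uint32_t dstStageMask;
    uint32_t dependencyFlags;
};

// True when both stage masks are non-empty, contain only framebuffer-space
// stages, and the by-region flag is set.
constexpr bool isFramebufferLocal(const SubpassDependency& d) {
    return d.srcStageMask != 0 && d.dstStageMask != 0 &&
           (d.srcStageMask & ~kFramebufferSpaceStages) == 0 &&
           (d.dstStageMask & ~kFramebufferSpaceStages) == 0 &&
           (d.dependencyFlags & kDependencyByRegion) != 0;
}

// pDependencies[1] (srcSubpass 0, dstSubpass 1) as it appears in the capture.
constexpr SubpassDependency kDep0To1 = {
    kStageLateFragmentTests | kStageColorAttachmentOutput,
    kStageFragmentShader | kStageEarlyFragmentTests |
        kStageLateFragmentTests | kStageColorAttachmentOutput,
    kDependencyByRegion
};

// Counter-example: adding a non-framebuffer-space stage makes it framebuffer-global.
constexpr SubpassDependency kDepWithVertexStage = {
    kStageColorAttachmentOutput,
    kStageFragmentShader | kStageVertexShader,
    kDependencyByRegion
};

static_assert(isFramebufferLocal(kDep0To1),
              "captured 0 -> 1 dependency should be framebuffer-local");
static_assert(!isFramebufferLocal(kDepWithVertexStage),
              "a vertex-shader stage should make the dependency framebuffer-global");
```

By this check the captured 0 to 1 dependency is framebuffer-local, so on its own it should not force the attachments out to memory between the two subpasses.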