Unexpected behaviour of Vulkan subpasses on Mali-G76 (Samsung S10) and Mali-G77 (S20 FE)

Hi! We're currently implementing subpasses for Vulkan and have run into really strange behaviour on Mali GPUs, specifically the G76 (Samsung S10) and G77 (S20 FE). The Samsung S10 is running Android 12. In short, it looks like the driver is not merging subpasses.

The render pass in question consists of two subpasses. The first writes something similar to a G-buffer, including depth; the second reads that data through input attachments.

We first noticed that subpasses on Mali gave us no performance improvement and, in the case of the Note 8 Pro, caused a noticeable performance degradation. When we looked at AGI captures, AGI showed two separate render passes with the same VkRenderPass handle, which suggests that the driver did not merge the subpasses.

Next, we tried to reproduce the issue using the following examples, and observed the same behaviour.

https://github.com/KhronosGroup/Vulkan-Samples

https://github.com/SaschaWillems/Vulkan

In the case of the Vulkan Samples repo, on the Samsung S10, switching between Subpasses and Render Passes did not change the Tile Count or the system memory accesses. When we ran Vulkan Samples on a Huawei Nova 5T (A10, Mali-G76 MP10), switching from Render Passes to Subpasses yielded a 2x decrease in Tile Count and in system memory reads/writes. As for the G77, it also shows our new merged pass with two subpasses as two render passes.
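To make the expected 2x difference in Tile Count concrete, here is a rough back-of-the-envelope sketch. The 16x16 tile size, the 1920x1080 resolution, and the function names are all assumptions chosen for illustration; they are not taken from the driver, from AGI, or from the samples. It only counts how many tiles get processed when the same frame is rendered as one merged pass versus two separate passes.

```cpp
#include <cstdint>

// Number of tiles needed to cover the framebuffer once.
// tileSize = 16 is an ASSUMPTION about the tile dimensions, not a value
// queried from any driver.
constexpr std::uint64_t tilesPerPass(std::uint32_t width,
                                     std::uint32_t height,
                                     std::uint32_t tileSize = 16)
{
    // Round up in both directions so partially covered edge tiles are counted.
    return static_cast<std::uint64_t>((width + tileSize - 1) / tileSize) *
           static_cast<std::uint64_t>((height + tileSize - 1) / tileSize);
}

// Tiles processed per frame when the work is executed as `hardwarePasses`
// separate passes over the framebuffer. Merged subpasses count as ONE pass.
constexpr std::uint64_t tilesPerFrame(std::uint32_t width,
                                      std::uint32_t height,
                                      std::uint32_t hardwarePasses,
                                      std::uint32_t tileSize = 16)
{
    return tilesPerPass(width, height, tileSize) * hardwarePasses;
}

// Hypothetical 1920x1080 target: 120 x 68 = 8160 tiles per pass.
static_assert(tilesPerPass(1920, 1080) == 8160, "120 x 68 tiles");
// Two unmerged passes touch every tile twice, so the count doubles.
static_assert(tilesPerFrame(1920, 1080, 2) == 2 * tilesPerFrame(1920, 1080, 1),
              "unmerged passes double the tile count");
```

Under these assumptions, a Tile Count that stays the same when switching between the Subpasses and Render Passes modes is what you would expect to see if the two subpasses were still being executed as two separate passes.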

In the case of the S10 this is especially surprising, as the Vulkan Samples page on subpasses (https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/performance/subpasses) mentions this exact phone and shows the expected tile usage improvements.

Since those samples exhibit the same issue as our client code, is there anything wrong, or potentially wrong, that might cause the driver not to merge the subpasses? And how should correctly merged subpasses look in AGI?

  • Also, here are my experiments with the inputattachments example from

    https://github.com/SaschaWillems/Vulkan

    The original render pass creation code goes like this:

    		/*
    			First subpass
    			Fill the color and depth attachments
    		*/
    		VkAttachmentReference colorReference = { 1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
    		VkAttachmentReference depthReference = { 2, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL };
    
    		subpassDescriptions[0].pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
    		subpassDescriptions[0].colorAttachmentCount = 1;
    		subpassDescriptions[0].pColorAttachments = &colorReference;
    		subpassDescriptions[0].pDepthStencilAttachment = &depthReference;
    
    		/*
    			Second subpass
    			Input attachment read and swap chain color attachment write
    		*/
    
    		// Color reference (target) for this sub pass is the swap chain color attachment
    		VkAttachmentReference colorReferenceSwapchain = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
    
    		subpassDescriptions[1].pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
    		subpassDescriptions[1].colorAttachmentCount = 1;
    		subpassDescriptions[1].pColorAttachments = &colorReferenceSwapchain;
    
    		// Color and depth attachment written to in first sub pass will be used as input attachments to be read in the fragment shader
    		VkAttachmentReference inputReferences[2];
    		inputReferences[0] = { 1, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };
    		inputReferences[1] = { 2, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };
    
    		// Use the attachments filled in the first pass as input attachments
    		subpassDescriptions[1].inputAttachmentCount = 2;
    		subpassDescriptions[1].pInputAttachments = inputReferences;
    
    		/*
    			Subpass dependencies for layout transitions
    		*/
    		std::array<VkSubpassDependency, 3> dependencies;
    
    		dependencies[0].srcSubpass = VK_SUBPASS_EXTERNAL;
    		dependencies[0].dstSubpass = 0;
    		dependencies[0].srcStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;
    		dependencies[0].dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
    		dependencies[0].srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    		dependencies[0].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    		dependencies[0].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
    
    		// This dependency transitions the input attachment from color attachment to shader read
    		dependencies[1].srcSubpass = 0;
    		dependencies[1].dstSubpass = 1;
    		dependencies[1].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    		dependencies[1].dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    		dependencies[1].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    		dependencies[1].dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
    		dependencies[1].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
    
    		dependencies[2].srcSubpass = 0;
    		dependencies[2].dstSubpass = VK_SUBPASS_EXTERNAL;
    		dependencies[2].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    		dependencies[2].dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;
    		dependencies[2].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    		dependencies[2].dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    		dependencies[2].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;

    After modifying it like this:

    inputReferences[1] = { 2, VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL };
        
    .....
        
    subpassDescriptions[1].pDepthStencilAttachment = inputReferences + 1;
    
    .....
    
    dependencies[1].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT;
    dependencies[1].dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT;
    dependencies[1].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    dependencies[1].dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT;
    dependencies[1].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;

    the FPS in the demo drops from ~37 to ~35.
