Why do merged Vulkan subpasses give a higher GPU load?

Hello, everyone

I'm using the Unity engine to develop an Android mobile game, and I'm focusing on the Vulkan API and multiple subpasses. In our case, we split rendering into two subpasses: the first is the opaque pass and the second is the transparent pass.

I'm using three attachments: 0 for depth, 1 for color0, and 2 for color1. In the first subpass there are no input attachments, the depth attachment is index 0, and the color attachments are 1 and 2. In the second subpass, the input attachments are 1 and 2 (needed to draw the transparent objects), the depth attachment is index 0, and the color attachments are 1 and 2, the same as in the first subpass.

I also have a comparable render feature that does not use multiple subpasses: it draws the opaque objects, stores the result (color and depth), copies the color and depth to other textures, and then draws the transparent objects, sampling the copied color and depth during shading.

The results from Arm Mobile Studio (Streamline) show that, compared with the second (traditional) method, the first method takes more $MaliGPUCyclesGPUActive but fewer $MaliGPUTasksFragmentTasks, gets higher OverDraw and PixelsThroughput values, and has worse overall performance.

I'm wondering whether this kind of multi-subpass setup is suitable for our case, because most introductions to multiple subpasses in Vulkan only cover deferred rendering.

The phone is a Kirin 820 (Mali-G57), and the counter names are taken from the Mali-G57 counter documentation.

Thanks all

Peter Harris over 2 years ago

Hi,

Depending on the algorithm, merged subpasses can be slower due to scheduling bubbles between layers (a later layer cannot progress until an earlier layer at the pixel location has written its result to tile memory). This can result in a clock-for-clock performance reduction, but still usually results in a system-wide energy efficiency improvement due to the lower memory bandwidth.

These issues should be much improved in newer Mali GPUs such as the Mali-G710.

Kind regards,
Pete

NOVA over 2 years ago in reply to Peter Harris

Thank you, Pete

For the Mali series, we chose the three phones below; we don't have a Mali-G710-based device to test on, sorry about that.

Kirin 820 (Mali-G57 MP6)

Kirin 990 (Mali-G76 MP16)

MediaTek Dimensity 1000+ (Mali-G77 MP9)

These phones are among the top-tier targets in our game design, and I need to make a final decision between the multi-subpass method and the old-fashioned method, because we care not only about FPS but also about battery and thermal behavior. The traditional method seems to give better thermal results, as the device does not get as hot as with the multi-subpass method.

PS:
In the multi-subpass method, the second subpass renders the transparent objects, and its input attachment indices are 0 and 1, exactly the same as output attachments 0 and 1. Could this be the reason for the extra GPU cycles, since the same attachments are used with both a read-only layout and a color attachment layout?
We also tested disabling all input attachments in the second subpass and saw better (lower) MaliGPUCyclesGPUActive and MaliCoreWarpsFragmentWarps, but the output is incorrect because the transparent shading can't access inputs 0 and 1.

Thank you.

Peter Harris over 2 years ago in reply to NOVA

Delays come when using subpassLoad() in the second subpass to load results from the first subpass. The first layer in the second subpass must wait for the earlier layer from the first subpass to complete.

HTH,
Pete