
Some questions about MSAA on Mali-G77

Hi, I'm a mobile game developer and I recently tried using 4x MSAA in my game. As far as I know, MSAA is almost "free" on Mali GPUs. I'm using UE 4.27 and built a demo to profile the performance.

The demo uses the forward rendering pipeline (Vulkan). The material on the scene objects uses the unlit shading model and simply outputs the world-space normal as the pixel color.

The profiling results show that GPU Active increases by about 20%! Fragment queue active also increases by about 20%.

I understand that 4x MSAA sends more primitives through the rasterizer and creates more quads, because each pixel has four sample points instead of one.
What confuses me is that the increase in Fragment Warp and Execution Core Active does not match the increase in GPU Active and Fragment queue active. Fragment Warp increases by about 3%, while Execution Core Active increases by about 30%.
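
To illustrate the first point, here is a minimal sketch of why multisampling creates some extra fragments but far fewer than 4x. It assumes plain MSAA (one shader invocation per pixel per primitive that covers at least one sample) and uses Vulkan's standard 4x sample positions. The triangle coordinates, the 16x16 grid, and all function names are made up for illustration. This is not a model of the Mali rasterizer.

```python
# Sample offsets inside a unit pixel.
SAMPLES_1X = [(0.5, 0.5)]
# Vulkan standard sample locations for 4 samples per pixel.
SAMPLES_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

# Two hypothetical triangles sharing the edge (0,0)-(16,11).
TRIS = [((0, 0), (16, 0), (16, 11)), ((0, 0), (16, 11), (0, 16))]


def edge(a, b, p):
    """Signed area test: which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(tri, p):
    """True if p is inside tri (either winding order, no tie-break rule)."""
    a, b, c = tri
    w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def shaded_fragments(tris, size, samples):
    """Count fragments: one per (primitive, pixel) with any covered sample."""
    count = 0
    for tri in tris:
        for y in range(size):
            for x in range(size):
                if any(covers(tri, (x + sx, y + sy)) for sx, sy in samples):
                    count += 1
    return count


if __name__ == "__main__":
    f1 = shaded_fragments(TRIS, 16, SAMPLES_1X)
    f4 = shaded_fragments(TRIS, 16, SAMPLES_4X)
    print(f"1x fragments: {f1}, 4x fragments: {f4}")
```

In this sketch the extra fragments come only from pixels that an edge passes through. Interior pixels are shaded once either way, so a small increase in fragment warps would be consistent with this picture.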

Since all objects in my demo scene use the same simple material, I expected that when the workload (e.g. fragment warps) increases by A%, GPU Active would also increase by around A%, or even less. But according to the profiling results, that is not the case.
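
One way to make the mismatch concrete is to compare relative changes and derive a per-warp cost. The sketch below does that arithmetic. The raw counter values are placeholders chosen only to reproduce the percentages quoted above (about 20%, 3%, and 30%). They are not values from the capture, and the dictionary keys are hypothetical names, not Streamline counter identifiers.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


# Placeholder counter totals (NOT measured data); only the ratios matter.
no_msaa = {"gpu_active": 1000, "frag_warps": 1000, "exec_core_active": 1000}
msaa_4x = {"gpu_active": 1200, "frag_warps": 1030, "exec_core_active": 1300}


def cycles_per_warp_change(before, after):
    """Percent change in execution-core-active cycles per fragment warp."""
    b = before["exec_core_active"] / before["frag_warps"]
    a = after["exec_core_active"] / after["frag_warps"]
    return pct_change(b, a)


if __name__ == "__main__":
    for key in no_msaa:
        print(f"{key}: {pct_change(no_msaa[key], msaa_4x[key]):+.1f}%")
    change = cycles_per_warp_change(no_msaa, msaa_4x)
    print(f"cycles per warp: {change:+.1f}%")
```

With these ratios, active cycles per warp rise by roughly 26%. That suggests the average cost of each warp changed, not only the number of warps, assuming Execution Core Active is read as a cycle count.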

Even stranger, after enabling 4x MSAA, the usage of the varying unit and the texture unit decreases. (More warps, but less varying and texturing work?)

So, my questions are:
1. Is MSAA not actually "free"? Is the increase in GPU Active (20% ~ 30%) expected?
2. Why are the growth rates of Fragment Warp, Execution Core Active, and GPU Active different?
3. What is happening in each unit when 4x MSAA is enabled?

Thanks!

  • Do you suspect that the performance degradation is caused by FPK?

    No. That metric gives some indication of how well the fixed-function front-end (rasterization, early-zs, etc) is keeping the core fed with fragment quads. If you saw a large drop then that would be indicative that you were seeing a front-end problem. The fact you don't points more towards late-zs or blending being the slow path. 

    If you are able to share the Streamline capture I'd be happy to take a look. If you can't share this publicly, feel free to get in touch via developer@arm.com.

    Kind regards, 
    Pete
