We are seeing unexpected frame drops after integrating PLS (Pixel Local Storage) into our UE4-based engine. For example, the following scene is rendered with both the multi-pass path and the PLS path. The fps shows that the PLS path suffers a non-trivial performance degradation.
After some analysis, we found that the major cause of this performance penalty is the lack of early-stencil culling in PLS. According to the Streamline captures:
Roughly speaking, our deferred shading pipeline is as follows. We have three primary shading models, each with its own fragment shader. During the GBuffer pass, the stencil buffer is tagged with the shading model ID of the geometry's material. The subsequent shading pass is then split into three full-screen quad draws, one per shading model, each responsible for shading only the pixels of its own model.
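To make the setup concrete, here is a minimal CPU-side sketch of the tagging scheme described above. It is only a model, not our engine code: the names `kModelA`, `kModelB`, `kModelC`, `TagStencil` and `PixelsShadedByPass` are hypothetical, the ID values are made up, and reserving 0 for untagged pixels is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shading model IDs; the real engine values are not given here.
// Assumption: 0 is reserved for pixels that no geometry has tagged.
constexpr std::uint8_t kModelA = 1;
constexpr std::uint8_t kModelB = 2;
constexpr std::uint8_t kModelC = 3;

// GBuffer pass: each pixel's stencil value is replaced by the shading model ID
// of the material covering it (stencil op REPLACE with ref = model ID).
std::vector<std::uint8_t> TagStencil(const std::vector<std::uint8_t>& modelIdPerPixel) {
    return modelIdPerPixel;
}

// One full-screen shading draw: the stencil test is EQUAL against this draw's
// model ID, so only matching pixels are supposed to reach its fragment shader.
std::size_t PixelsShadedByPass(const std::vector<std::uint8_t>& stencil,
                               std::uint8_t passModelId) {
    assert(passModelId != 0 && "0 is reserved for untagged pixels in this sketch");
    return static_cast<std::size_t>(
        std::count(stencil.begin(), stencil.end(), passModelId));
}
```

In this model the three draws partition the tagged pixels, so every tagged pixel should be shaded exactly once across the three full-screen quads.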
In the traditional multi-pass pipeline, we can rely on early-stencil culling to kill the fragments whose shading model ID does not match the stencil buffer before they enter the fragment shader. However, it turns out that this does not work in the PLS case.
In this situation, a huge number of fragments are fully shaded, only to be killed afterwards in the Late-ZS stage.
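A rough way to see how much work this wastes is the counting sketch below. It is a simplified CPU model under my own assumptions (one fragment per pixel per full-screen draw, no other culling), and `ShadingCost` and `EstimateShadingCost` are hypothetical names, not anything from the driver or the engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ShadingCost {
    std::size_t shaderInvocations = 0;  // fragments that ran a fragment shader
    std::size_t wastedInvocations = 0;  // fragments shaded, then killed by the late test
};

// Counts fragment shader invocations over all full-screen shading draws.
// earlyStencilCulls = true  : mismatching fragments are killed before shading.
// earlyStencilCulls = false : mismatching fragments are shaded first and only
//                             rejected afterwards (the behaviour we see with PLS).
ShadingCost EstimateShadingCost(const std::vector<std::uint8_t>& stencil,
                                const std::vector<std::uint8_t>& passModelIds,
                                bool earlyStencilCulls) {
    assert(!passModelIds.empty());
    ShadingCost cost;
    for (std::uint8_t passId : passModelIds) {
        for (std::uint8_t tag : stencil) {
            if (tag == passId) {
                ++cost.shaderInvocations;   // useful work in both cases
            } else if (!earlyStencilCulls) {
                ++cost.shaderInvocations;   // shaded anyway...
                ++cost.wastedInvocations;   // ...and thrown away later
            }
        }
    }
    return cost;
}
```

With three shading draws and every pixel tagged with one of the three IDs, this model gives one invocation per pixel when early culling works, and three per pixel (two of them wasted) when it does not.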
Note that, judging from the chart, the early-stencil test does seem to be running. For some reason, though, it is not culling any fragments.
I can confirm that we do not do anything unusual during the shading passes, such as clipping (discarding) pixels or modifying depth.
Unfortunately, I couldn't find any resources related to the Early-stencil test in PLS, from both
, or any other relevant forums.
So can anyone tell me whether this is a known limitation, or whether I have missed something that is needed to make it work correctly?
Thanks. Any replies would be greatly appreciated.