Hi, our game has a lot of shaders, and it's getting harder to add more shader permutations, so we tried a trick to reduce them. We have an uber shader which outputs depth to a separate render target. We can't use depth resolve here because we're using MSAA, and we can't store depth in the alpha channel because we have used all available bits to store HDR colors.
We don't want to add a new permutation to separate the shaders that need to write the extra depth from those that don't. So we tried to trick the GPU: we simply don't bind the separate depth render target on low-end devices, hoping that without the extra bandwidth the performance would be the same as with a dedicated permutation. I tested on a Mali-G76 MP16 GPU, and the result suggests the trick actually makes performance worse.
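To make the trick concrete, here is a minimal sketch of how "don't write the extra depth target on low-end devices" could be expressed at the GLES 3.0 level. This is an assumption for illustration only: the post does not say whether the engine leaves the attachment off the framebuffer or masks it this way. The names `DrawBufferList` and `selectDrawBuffers` are hypothetical, and the enum values are local copies so the snippet compiles without GL headers.

```cpp
#include <cstdint>

// Local stand-ins for GLES 3.0 enums (values as in GLES3/gl3.h),
// so this sketch builds without a GL context or headers.
using GLenumValue = std::uint32_t;
constexpr GLenumValue kGlNone             = 0;       // GL_NONE
constexpr GLenumValue kGlColorAttachment0 = 0x8CE0;  // GL_COLOR_ATTACHMENT0
constexpr GLenumValue kGlColorAttachment1 = 0x8CE1;  // GL_COLOR_ATTACHMENT1

// Hypothetical holder for the two-entry draw-buffer list.
struct DrawBufferList {
    GLenumValue bufs[2];
};

// Slot 0: HDR color target. Slot 1: the extra depth target (assumed layout).
// In GLES 3.0, entry i must be GL_NONE or GL_COLOR_ATTACHMENTi, so the
// extra output is disabled by putting GL_NONE in slot 1.
constexpr DrawBufferList selectDrawBuffers(bool writeExtraDepth) {
    return DrawBufferList{{kGlColorAttachment0,
                           writeExtraDepth ? kGlColorAttachment1 : kGlNone}};
}
```

With a real context, the two entries would be handed to `glDrawBuffers` with a count of 2 after binding the framebuffer. Whether this variant behaves any differently on Mali from leaving the target unbound is not something this sketch can tell you.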
The FPS drops from 40 to 20, and non-fragment cycles grow from 10M to 22M.
Memory bandwidth and load/store instruction counts went down as expected, but non-fragment cycles went up massively. Does this mean we shouldn't use this no-binding trick to save shader permutations?
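For scale, the figures quoted above can be converted into frame times and a growth factor. This is only arithmetic on the numbers in the post; the helper names are hypothetical.

```cpp
// Frame time in milliseconds for a given frame rate.
constexpr double frameTimeMs(double fps) { return 1000.0 / fps; }

// Growth factor between two readings of the same counter.
constexpr double growthFactor(double before, double after) { return after / before; }

// Figures quoted in the post: before and after applying the no-binding trick.
constexpr double kFrameMsBefore = frameTimeMs(40.0);           // 25 ms per frame
constexpr double kFrameMsAfter  = frameTimeMs(20.0);           // 50 ms per frame
constexpr double kNonFragGrowth = growthFactor(10e6, 22e6);    // roughly 2.2x
```

So the frame time doubles (25 ms to 50 ms) while non-fragment cycles grow by about 2.2x, which fits with the non-fragment work being what limits the frame.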
Hi Robert, I've sent our APC file to developer@arm.com.
But I just found out that a similar thing happens when we switch some rendering-related quality settings. When switching quality settings in game (on GLES; Vulkan is fine), every metric goes down except non-fragment cycles.
I guess it's related to Unreal Engine 4.26.2's Mali GLES implementation. The same action works fine on Adreno (both GLES and Vulkan). So it might be a bug in some high-level implementation. We'll try to figure it out ourselves. Thanks anyway.