I have a Unity shader that uses a multi_compile keyword. I am trying to replace it with uniform-based flow control to reduce the number of shader variants.
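Here is why the variant count matters. In Unity, each `#pragma multi_compile` line multiplies the number of compiled variants by the number of keywords on that line. For example, `#pragma multi_compile _ MY_FEATURE` contributes a factor of 2, where `MY_FEATURE` is a hypothetical keyword name. The C++ sketch below only does that arithmetic. It ignores variant stripping and per-pass details, and the keyword counts are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each entry is the number of keywords on one "#pragma multi_compile" line,
// counting the "_" (keyword off) placeholder as a keyword.
// Example: "#pragma multi_compile _ MY_FEATURE" -> 2.
std::size_t variant_count(const std::vector<std::size_t>& keywords_per_line) {
    std::size_t total = 1;  // no multi_compile lines -> a single variant
    for (std::size_t k : keywords_per_line) {
        total *= k;  // independent keyword sets combine multiplicatively
    }
    return total;
}

// Replacing one multi_compile line with a runtime uniform removes its factor.
std::size_t variants_after_removing_line(std::vector<std::size_t> keywords_per_line,
                                         std::size_t line_index) {
    assert(line_index < keywords_per_line.size());
    keywords_per_line.erase(keywords_per_line.begin() +
                            static_cast<std::ptrdiff_t>(line_index));
    return variant_count(keywords_per_line);
}
```

For example, three independent lines with 2, 3 and 2 keywords give 12 variants. Turning the first line into a uniform leaves 6.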
I have 4 questions.
Q1: I cannot make sense of the MALIOC (Mali Offline Compiler) output for Mali-G71.
Arithmetic cycles of the fragment shader (in all cases, Total Cycles == Shortest Path Cycles == Longest Path Cycles):
- Without the keyword: 7.50
- With the keyword: 7.65
- Uniform flow control: 7.50
It seems to me that, for the uniform flow-control version, MALIOC assumes a value for the uniform and therefore only counts the cycles of one path.
If the instructions of both paths were executed, the cycle count should be much higher.
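To reason about Q1, it helps to spell out how a static analyzer can get three different numbers from a single if/else. The C++ model below assumes the metric definitions as I understand them from the Mali Offline Compiler documentation:

- Total counts every instruction, ignoring control flow.
- Shortest Path and Longest Path follow the cheapest and the most expensive route through the branch.

The real tool counts cycles per pipeline and reports the bottleneck, so these are not malioc's actual formulas. The cycle values are hypothetical.

```cpp
#include <algorithm>

// Hypothetical per-block cycle costs for a shader shaped like:
//   common; if (uniformFlag) { thenBody } else { elseBody }
struct BranchCosts {
    double common;     // cycles outside the if/else
    double then_body;  // cycles inside the then-branch
    double else_body;  // cycles inside the else-branch
};

struct PathEstimate {
    double total;     // every instruction counted once, ignoring control flow
    double shortest;  // cheapest path through the branch
    double longest;   // most expensive path through the branch
};

PathEstimate estimate(const BranchCosts& c) {
    return PathEstimate{
        c.common + c.then_body + c.else_body,
        c.common + std::min(c.then_body, c.else_body),
        c.common + std::max(c.then_body, c.else_body),
    };
}
```

Under this simplified model, Total can only equal both Shortest and Longest when neither branch body adds any cycles of its own.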
Q2: Is uniform control flow really as bad as described here? https://developer.arm.com/documentation/101897/0200/shader-code/uniform-control-flow
Q3: Can we assume that the driver optimises the shader on the fly based on the uniform value, so that only one branch is executed? (I guess not.)
Q4: Which GPU counters should I check in Streamline to detect potential problems with uniform flow control? In my experiments, the "Diverged instructions" counter is close to zero in all cases.
I use UNITY_BRANCH to force a branch instruction to be generated. The condition is still a uniform, so this is still uniform-based flow control.
Unity shows that the generated code uses an if-statement instead of the ternary operator (the default). I have no way of knowing exactly how either form is translated at the lower level.
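The two forms a compiler can choose between are easiest to compare in a CPU-side C++ analogy. This is an illustration only, not what the GPU compiler emits.

- A flattened, select-style form evaluates both sides and then picks one result.
- A real branch evaluates only the side that is taken.

In HLSL these roughly correspond to the [flatten] and [branch] attributes. As far as I understand, UNITY_BRANCH expands to [branch] on platforms where Unity passes it through. The functions `expensive_a` and `expensive_b` are hypothetical stand-ins for the two shader paths.

```cpp
// Counts how many times each side is evaluated, to show the cost difference.
struct EvalCounter {
    int a_calls = 0;
    int b_calls = 0;
};

// Hypothetical stand-ins for the two shader paths.
float expensive_a(float x, EvalCounter& n) { ++n.a_calls; return x * 2.0f; }
float expensive_b(float x, EvalCounter& n) { ++n.b_calls; return x + 1.0f; }

// Flattened form: both sides are computed first, then one result is selected.
// (Both calls are made up front on purpose; a C++ ?: around the calls
// themselves would only evaluate one side.)
float flattened(bool flag, float x, EvalCounter& n) {
    float a = expensive_a(x, n);
    float b = expensive_b(x, n);
    return flag ? a : b;  // pure select, no control flow around the work
}

// Branched form: only the taken side is computed.
float branched(bool flag, float x, EvalCounter& n) {
    if (flag) {
        return expensive_a(x, n);
    }
    return expensive_b(x, n);
}
```

With `flag == true`, `flattened` calls both helpers once, while `branched` calls only `expensive_a`.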
According to my Streamline measurements, the fragment cycles and executed instructions are almost the same for both implementations. If I understand correctly, even when a branch instruction is generated, the shader core must execute both the then-path and the else-path anyway. Is this correct?
Is there no branch prediction in this case?
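The toy model below frames the last two questions and Q4. It is a generic simplification of SIMT-style execution with an active-lane mask, not a description of Mali-G71's actual hardware. The warp width and cost units are assumptions.

In this model, threads are grouped into warps that share one instruction stream:

- If every lane takes the same side, only that side is issued.
- If lanes disagree, both sides are issued, with some lanes masked off.

A uniform condition can never make a warp diverge, which would fit the near-zero "Diverged instructions" counter. What I am unsure about is whether Mali-G71 actually behaves like this.

```cpp
#include <cstddef>
#include <vector>

// Issue cost of one if/else for one warp, in hypothetical instruction units.
// lane_takes_then[i] is the branch condition seen by lane i of the warp.
// Simplified SIMT model: a side is issued if at least one lane needs it.
int warp_branch_cost(const std::vector<bool>& lane_takes_then,
                     int then_cost, int else_cost) {
    bool any_then = false;
    bool any_else = false;
    for (bool t : lane_takes_then) {
        if (t) any_then = true; else any_else = true;
    }
    int cost = 0;
    if (any_then) cost += then_cost;  // lanes taking the else side are masked off
    if (any_else) cost += else_cost;  // lanes taking the then side are masked off
    return cost;
}

// A warp is divergent when its lanes disagree on the condition.
bool is_divergent(const std::vector<bool>& lane_takes_then) {
    for (bool t : lane_takes_then) {
        if (t != lane_takes_then.front()) return true;
    }
    return false;
}

// With uniform-based flow control every lane reads the same uniform value.
std::vector<bool> uniform_condition(std::size_t warp_width, bool uniform_value) {
    return std::vector<bool>(warp_width, uniform_value);
}
```

The model deliberately ignores the cost of the branch instruction itself. It also ignores any optimisations the compiler may lose when the condition is not a compile-time constant.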