Uniform control-flow cycles reported by MALIOC

I have a Unity shader that uses a multi_compile keyword. I am trying to replace the keyword with uniform-based control flow in order to reduce the number of shader variants.
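
The variant-count motivation can be sketched as a product over keyword sets. This is a simplified model that assumes each multi_compile directive multiplies the variant count by the number of keywords it declares (with "_" standing for "keyword off"), and it ignores variant stripping and any other keyword sources. The directive lists and keyword names below are hypothetical.

```python
from math import prod


def variant_count(keyword_sets):
    """Number of variants produced by a list of multi_compile sets.

    Each inner list models one '#pragma multi_compile ...' line; "_" is
    the "no keyword" entry. Simplified model: no stripping is applied.
    """
    return prod(len(s) for s in keyword_sets)


# Hypothetical directives: a MY_FEATURE toggle plus a three-way quality set.
with_keyword = [["_", "MY_FEATURE"], ["LOW", "MED", "HIGH"]]
# The same shader after MY_FEATURE is replaced by a uniform-based branch.
with_uniform_branch = [["LOW", "MED", "HIGH"]]

print(variant_count(with_keyword))         # 6
print(variant_count(with_uniform_branch))  # 3
```

Under this model, every boolean keyword moved into a uniform halves the number of compiled variants.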

I have 4 questions.

Q1: I do not understand the output of MALIOC (Mali-G71).

Arithmetic cycles of the fragment shader (in all cases, Total Cycles == Shortest Path Cycles == Longest Path Cycles):

- Without the keyword: 7.50

- With the keyword: 7.65

- With uniform control flow: 7.50

It seems to me that MALIOC reports the cycles of the shader with uniform control flow by assuming a particular value of the uniform, and therefore only counts the cycles of one path.

If the instructions of both paths were executed, the cycle count should be much higher.
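
The "one path versus both paths" reasoning can be made concrete with a simplified SIMT cost model. This is an illustration only, not how the Mali compiler or hardware actually accounts cycles: it assumes a warp pays for a side of the branch only if at least one active lane takes it, so a condition that is the same for every lane (as with a uniform) costs one path, while a per-lane divergent condition costs both. The warp width and per-path cycle values are hypothetical.

```python
def warp_branch_cycles(conditions, then_cycles, else_cycles, branch_overhead=0.0):
    """Estimated cycles for one if/else executed by one warp.

    conditions: per-lane results of the branch condition.
    A side of the branch is executed (and paid for) by the whole warp
    if at least one lane needs it; the other lanes are masked off.
    """
    cycles = branch_overhead
    if any(conditions):
        cycles += then_cycles
    if not all(conditions):
        cycles += else_cycles
    return cycles


WARP_WIDTH = 8  # hypothetical width, chosen only for illustration

# Uniform condition: every lane reads the same uniform value.
uniform = [True] * WARP_WIDTH
# Divergent condition: lanes disagree (e.g. the condition depends on a varying).
divergent = [lane % 2 == 0 for lane in range(WARP_WIDTH)]

print(warp_branch_cycles(uniform, 2.0, 3.0))    # 2.0
print(warp_branch_cycles(divergent, 2.0, 3.0))  # 5.0
```

In this model only divergent conditions pay for both sides, which is why a uniform branch would show up as the cost of a single path plus whatever the branch itself costs.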

Q2: Is uniform control flow as bad as described here? https://developer.arm.com/documentation/101897/0200/shader-code/uniform-control-flow

Q3: Can we assume that the driver optimizes the shader on the fly based on the uniform value, so that only one branch is executed? (I guess not.)

Q4: Which GPU counters should I check in Streamline to detect potential problems with uniform control flow? In my experiments, the "Diverged instructions" counter is almost zero in all cases.

I use UNITY_BRANCH to force a branch instruction to be generated. It is still uniform-based control flow.

Unity shows that the generated code uses an if-statement instead of a ternary operator (the default). I cannot tell exactly how either form is translated at a lower level.

According to my measurements with Streamline, the fragment cycles and executed instructions are almost the same for both implementations. If I understand correctly, even if a branch instruction is generated, the shader core must execute both the then-path and the else-path anyway. Is this correct?

Isn't there any branch prediction in this case?