
Why does reordering uniforms affect arithmetic cycles?

Hello.


I've recently started using the Mali Offline Compiler to get insight into our shaders, and I'm getting confusing results that I can't really explain.

So I have one quite big shader.

It has a block of uniforms, quite a large one because it's an uber shader.
I noticed that if I reorder the uniforms, I get different results from the Mali compiler.

```glsl
#if HLSLCC_ENABLE_UNIFORM_BUFFERS
UNITY_BINDING(0) uniform UnityPerMaterial {
#endif
UNITY_UNIFORM vec4 _MainTex_ST;
UNITY_UNIFORM float _MainTexUVSet2;
UNITY_UNIFORM vec4 _SecondaryTex_ST;
UNITY_UNIFORM mediump vec4 _SecondaryColor;
UNITY_UNIFORM float _SecondaryTexUVSet2;
UNITY_UNIFORM vec4 _MaskTex_ST;
UNITY_UNIFORM float _MaskTexUVSet2;
UNITY_UNIFORM vec4 _DissolveTex_ST;
UNITY_UNIFORM float _DissolveTexUVSet2;
UNITY_UNIFORM mediump vec3 _MainColorBright;
UNITY_UNIFORM mediump vec3 _MainColorMid;
UNITY_UNIFORM mediump vec3 _MainColorDark;
UNITY_UNIFORM mediump vec4 _MainColor;
UNITY_UNIFORM vec2 _MainTexScrollSpeed;
UNITY_UNIFORM vec2 _SecondaryTexScrollSpeed;
UNITY_UNIFORM vec2 _DissolveTexScrollSpeed;
UNITY_UNIFORM mediump float _Intensity;
UNITY_UNIFORM mediump float _PSDriven;
// ... (remaining uniforms not shown in this excerpt)
#if HLSLCC_ENABLE_UNIFORM_BUFFERS
};
#endif
```
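To see how declaration order alone changes a block like this, here is a minimal sketch that computes byte offsets, assuming std140 layout rules (if `HLSLCC_ENABLE_UNIFORM_BUFFERS` is off, these become ordinary uniforms and the driver picks the layout, so the numbers would not apply). It only covers the 18 members visible in the excerpt, and `REORDERED` is a hypothetical ordering chosen for illustration, not one from this post. It shows padding and the number of 16-byte slots; it does not show how the Mali compiler turns that into cycles.

```python
import math

# std140 (base alignment, size) in bytes. Precision qualifiers such as
# mediump do not change std140 offsets, so they are ignored here.
STD140 = {"float": (4, 4), "vec2": (8, 8), "vec3": (16, 12), "vec4": (16, 16)}


def std140_layout(members):
    """Return (offsets, end, padding) for (name, type) pairs in declaration order."""
    offsets, offset, padding = {}, 0, 0
    for name, typ in members:
        align, size = STD140[typ]
        aligned = -(-offset // align) * align  # round up to the alignment
        padding += aligned - offset
        offsets[name] = aligned
        offset = aligned + size
    return offsets, offset, padding


# The members visible in the excerpt, in their declared order.
ORIGINAL = [
    ("_MainTex_ST", "vec4"), ("_MainTexUVSet2", "float"),
    ("_SecondaryTex_ST", "vec4"), ("_SecondaryColor", "vec4"),
    ("_SecondaryTexUVSet2", "float"), ("_MaskTex_ST", "vec4"),
    ("_MaskTexUVSet2", "float"), ("_DissolveTex_ST", "vec4"),
    ("_DissolveTexUVSet2", "float"), ("_MainColorBright", "vec3"),
    ("_MainColorMid", "vec3"), ("_MainColorDark", "vec3"),
    ("_MainColor", "vec4"), ("_MainTexScrollSpeed", "vec2"),
    ("_SecondaryTexScrollSpeed", "vec2"), ("_DissolveTexScrollSpeed", "vec2"),
    ("_Intensity", "float"), ("_PSDriven", "float"),
]

# Hypothetical ordering: vec4s first, then each vec3 followed by a float
# that fills its last 4 bytes, then the vec2s and the remaining floats.
REORDERED = [
    ("_MainTex_ST", "vec4"), ("_SecondaryTex_ST", "vec4"),
    ("_SecondaryColor", "vec4"), ("_MaskTex_ST", "vec4"),
    ("_DissolveTex_ST", "vec4"), ("_MainColor", "vec4"),
    ("_MainColorBright", "vec3"), ("_MainTexUVSet2", "float"),
    ("_MainColorMid", "vec3"), ("_SecondaryTexUVSet2", "float"),
    ("_MainColorDark", "vec3"), ("_MaskTexUVSet2", "float"),
    ("_MainTexScrollSpeed", "vec2"), ("_SecondaryTexScrollSpeed", "vec2"),
    ("_DissolveTexScrollSpeed", "vec2"), ("_DissolveTexUVSet2", "float"),
    ("_Intensity", "float"), ("_PSDriven", "float"),
]

for label, members in (("original", ORIGINAL), ("reordered", REORDERED)):
    _, end, padding = std140_layout(members)
    print(f"{label}: {end} bytes, {padding} bytes of padding, "
          f"{math.ceil(end / 16)} vec4 slots")
```

Under these assumptions the visible members take 240 bytes (15 vec4 slots, 60 bytes of padding) in the posted order and 180 bytes (12 slots, no padding) in the hypothetical one, mostly because each lone float in the original order forces the next vec4 or vec3 onto a fresh 16-byte boundary.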
 

So if I take, let's say, the _Curvature uniform and move it before any other half/int variable, the fragment shader results change.
Here are the fragment shader results before the move:

```
Mali Offline Compiler v7.4.0 (Build 330167)
Copyright 2007-2021 Arm Limited, all rights reserved

Configuration
=============

Hardware: Mali-T720 r1p1
Architecture: Midgard
Driver: r23p0-00rel0
Shader type: OpenGL ES Fragment

Main shader
===========

Work registers: 4
Uniform registers: 0
Stack spilling: false

                                 A      LS       T   Bound
Total instruction cycles:    16.00    9.00    4.00       A
Shortest path cycles:        10.00    9.00    3.00       A
...
```

And after the move, they become:

```
Mali Offline Compiler v7.4.0 (Build 330167)
Copyright 2007-2021 Arm Limited, all rights reserved

Configuration
=============

Hardware: Mali-T720 r1p1
Architecture: Midgard
Driver: r23p0-00rel0
Shader type: OpenGL ES Fragment

Main shader
===========

Work registers: 4
Uniform registers: 0
Stack spilling: false

                                 A      LS       T   Bound
Total instruction cycles:    16.00    9.00    4.00       A
Shortest path cycles:         9.50    9.00    3.00       A
...
```


This uniform is only used in the vertex shader, but somehow it also affects the fragment shader results.
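One plausible link, sketched under assumptions: the vertex and fragment shaders presumably declare the same UnityPerMaterial block, so where a vertex-only member sits still decides the offsets of every member declared after it. The sketch assumes std140 rules and treats `_Curvature` as a float, since its real type and original position are not shown in the post; the four-member list is illustrative only. It shows offsets moving, not why the compiler's arithmetic estimate changes as a result.

```python
# std140 (base alignment, size) in bytes; only float and vec4 are needed here.
ALIGN_SIZE = {"float": (4, 4), "vec4": (16, 16)}


def offsets(members):
    """Map each member name to its std140 byte offset, in declaration order."""
    result, offset = {}, 0
    for name, typ in members:
        align, size = ALIGN_SIZE[typ]
        offset = -(-offset // align) * align  # round up to the alignment
        result[name] = offset
        offset += size
    return result


# Hypothetical: _Curvature is assumed to be a float read only by the vertex shader.
curvature_last = [("_MainTex_ST", "vec4"), ("_MainTexUVSet2", "float"),
                  ("_SecondaryTex_ST", "vec4"), ("_Curvature", "float")]
curvature_first = [("_Curvature", "float"), ("_MainTex_ST", "vec4"),
                   ("_MainTexUVSet2", "float"), ("_SecondaryTex_ST", "vec4")]

before = offsets(curvature_last)
after = offsets(curvature_first)
for name in ("_MainTex_ST", "_MainTexUVSet2", "_SecondaryTex_ST"):
    print(f"{name}: offset {before[name]} -> {after[name]}")
```

Moving the single float to the front pushes every following member by 16 bytes here, because the vec4 after it must start on the next 16-byte boundary.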

Why are the arithmetic cycles now different?

Right now I have no idea what affects this, how best to optimize for it, or whether I should even bother.
But when a shader executes in, let's say, 10 cycles and reordering fields can make it execute in 9 or even 8 cycles, that is 10-20% of performance to be gained, so I would like to understand what's going on under the hood.
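The reports above let the expected gain be worked out. A minimal sketch, assuming the A, LS and T pipes run in parallel so the pipe with the highest cycle count bounds the per-thread cost (the pipe named in the Bound column), and using the shortest-path figures from the two reports. The 8-cycle case is hypothetical. These are static compiler estimates, not measured performance.

```python
def bound(cycles):
    """Return (pipe, cycles) for the pipe with the highest cycle count."""
    pipe = max(cycles, key=cycles.get)
    return pipe, cycles[pipe]


def gain_percent(old, new):
    """Percentage reduction in the bounding cycle count."""
    return (old - new) / old * 100.0


# Shortest-path cycles copied from the two compiler reports above.
before = {"A": 10.0, "LS": 9.0, "T": 3.0}
after = {"A": 9.5, "LS": 9.0, "T": 3.0}
# Hypothetical: a layout that cut arithmetic to 8 cycles.
hypothetical = {"A": 8.0, "LS": 9.0, "T": 3.0}

_, base = bound(before)
for label, cycles in (("after", after), ("hypothetical", hypothetical)):
    pipe, value = bound(cycles)
    print(f"{label}: bound by {pipe} at {value} cycles, "
          f"{gain_percent(base, value):.1f}% fewer than before")
```

Under this model the observed change is 5%, and an 8-cycle arithmetic path would give 10% rather than 20%, because the 9.00 LS cycles become the bound once A drops below them.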

Is there a way to get disassembly from the Mali compiler?
Right now it is a black box to me.

I am attaching both shaders and the output from the Mali compiler in case someone wants to take a look.

mali.zip
