Hello.

I've recently started using Mali Offline Compiler to get insight into our shaders, and I'm getting confusing results that I can't explain.

I have one quite big shader.

It has a block of uniforms, quite a large one because it's an uber shader.

I noticed that if I reorder the uniforms, I get different results from the Mali compiler.

#if HLSLCC_ENABLE_UNIFORM_BUFFERS
UNITY_BINDING(0) uniform UnityPerMaterial {
#endif
    UNITY_UNIFORM vec4 _MainTex_ST;
    UNITY_UNIFORM float _MainTexUVSet2;
    UNITY_UNIFORM vec4 _SecondaryTex_ST;
    UNITY_UNIFORM mediump vec4 _SecondaryColor;
    UNITY_UNIFORM float _SecondaryTexUVSet2;
    UNITY_UNIFORM vec4 _MaskTex_ST;
    UNITY_UNIFORM float _MaskTexUVSet2;
    UNITY_UNIFORM vec4 _DissolveTex_ST;
    UNITY_UNIFORM float _DissolveTexUVSet2;
    UNITY_UNIFORM mediump vec3 _MainColorBright;
    UNITY_UNIFORM mediump vec3 _MainColorMid;
    UNITY_UNIFORM mediump vec3 _MainColorDark;
    UNITY_UNIFORM mediump vec4 _MainColor;
    UNITY_UNIFORM vec2 _MainTexScrollSpeed;
    UNITY_UNIFORM vec2 _SecondaryTexScrollSpeed;
    UNITY_UNIFORM vec2 _DissolveTexScrollSpeed;
    UNITY_UNIFORM mediump float _Intensity;
    UNITY_UNIFORM mediump float _PSDriven;
    UNITY_UNIFORM mediump float _DissolveAmount;
    UNITY_UNIFORM mediump float _DissolveSoftness;
    UNITY_UNIFORM int _ScrollMainTex;
    UNITY_UNIFORM int _ScrollSecondaryTex;
    UNITY_UNIFORM int _ScrollDissolveTex;
    UNITY_UNIFORM int _MultiplyWithVertexColor;
    UNITY_UNIFORM int _MultiplyWithVertexAlpha;
    UNITY_UNIFORM int _UseGradientMap;
    UNITY_UNIFORM int _UseStepMasking;
    UNITY_UNIFORM float _Curvature;
    UNITY_UNIFORM mediump float _StepBorder;
    UNITY_UNIFORM mediump float _UseRForSecondaryTex;
    UNITY_UNIFORM mediump float _UseRForMask;
    UNITY_UNIFORM mediump float _MaskSecondTexWithFirst;
    UNITY_UNIFORM mediump float _UseRAsAlpha;
#if HLSLCC_ENABLE_UNIFORM_BUFFERS
};
#endif

For example, take the _Curvature uniform and move it so that it comes before all the other half/int variables.

Here are the fragment shader results before the reordering:

Mali Offline Compiler v7.4.0 (Build 330167)
Copyright 2007-2021 Arm Limited, all rights reserved

Configuration
=============

Hardware: Mali-T720 r1p1
Architecture: Midgard
Driver: r23p0-00rel0
Shader type: OpenGL ES Fragment

Main shader
===========

Work registers: 4
Uniform registers: 0
Stack spilling: false

                                A      LS       T    Bound
Total instruction cycles:   16.00    9.00    4.00        A
Shortest path cycles:       10.00    9.00    3.00        A
Longest path cycles:        10.25    9.00    3.00        A

A = Arithmetic, LS = Load/Store, T = Texture

And after the reordering they become:

Mali Offline Compiler v7.4.0 (Build 330167)
Copyright 2007-2021 Arm Limited, all rights reserved

Configuration
=============

Hardware: Mali-T720 r1p1
Architecture: Midgard
Driver: r23p0-00rel0
Shader type: OpenGL ES Fragment

Main shader
===========

Work registers: 4
Uniform registers: 0
Stack spilling: false

                                A      LS       T    Bound
Total instruction cycles:   16.00    9.00    4.00        A
Shortest path cycles:        9.50    9.00    3.00        A
Longest path cycles:         9.75    9.00    3.00        A

A = Arithmetic, LS = Load/Store, T = Texture
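To make the difference between the two reports easier to see, here is a rough Python sketch that pulls the cycle rows out of two reports and subtracts them. The helper names (parse_cycles, diff_cycles) and the regular expression are my own and not part of the Mali tooling, and the sketch assumes the table looks exactly like the one in the reports above.

```python
import re

# Matches one row of the cycle table, e.g.
# "Shortest path cycles:   10.00   9.00   3.00   A"
# Assumption: the report layout is the one shown in this post.
ROW = re.compile(
    r"(Total instruction|Shortest path|Longest path) cycles:\s+"
    r"([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([A-Z]+)"
)

def parse_cycles(report):
    """Return {row name: {"A", "LS", "T", "Bound"}} for one report."""
    rows = {}
    for name, a, ls, t, bound in ROW.findall(report):
        rows[name] = {"A": float(a), "LS": float(ls), "T": float(t), "Bound": bound}
    return rows

def diff_cycles(before, after):
    """Return after minus before, per row and per pipeline."""
    b, a = parse_cycles(before), parse_cycles(after)
    return {
        name: {unit: a[name][unit] - b[name][unit] for unit in ("A", "LS", "T")}
        for name in b
    }

# Cycle tables copied from the two reports in this post.
BEFORE = """
Total instruction cycles:   16.00    9.00    4.00        A
Shortest path cycles:       10.00    9.00    3.00        A
Longest path cycles:        10.25    9.00    3.00        A
"""
AFTER = """
Total instruction cycles:   16.00    9.00    4.00        A
Shortest path cycles:        9.50    9.00    3.00        A
Longest path cycles:         9.75    9.00    3.00        A
"""

if __name__ == "__main__":
    for name, delta in diff_cycles(BEFORE, AFTER).items():
        print(name, delta)
```

For these two reports the only values that change are the arithmetic cycles on the shortest and longest paths, by 0.5 cycles each. The total instruction cycles and the load/store and texture columns are identical.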

This uniform is only used in the vertex shader, but somehow it also affects the fragment shader results.

Why are the arithmetic cycles different now?

Right now I have no idea what affects it, how to optimize for it in the best possible way, or whether I should even bother.

But when a shader executes in, let's say, 10 cycles and reordering fields can make it execute in 9 or even 8 cycles, that is a 10-20% performance gain, so I would like to understand what's going on under the hood.
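To put numbers on that, here is a small Python sketch of the arithmetic. The function name relative_saving is just something I made up for illustration. It treats the reported cycle counts as a direct proxy for execution time, which is an assumption and may not hold on real hardware.

```python
def relative_saving(before_cycles, after_cycles):
    """Percentage of cycles saved when going from before to after.

    Assumption: cycle counts map linearly to execution time.
    """
    return 100.0 * (before_cycles - after_cycles) / before_cycles

# Arithmetic cycle counts from the two reports in this post.
measured_shortest = relative_saving(10.00, 9.50)
measured_longest = relative_saving(10.25, 9.75)

# The hypothetical 10 -> 9 and 10 -> 8 cases mentioned above.
hypothetical_9 = relative_saving(10.0, 9.0)
hypothetical_8 = relative_saving(10.0, 8.0)

if __name__ == "__main__":
    print(f"shortest path: {measured_shortest:.2f}%")
    print(f"longest path:  {measured_longest:.2f}%")
    print(f"10 -> 9:       {hypothetical_9:.2f}%")
    print(f"10 -> 8:       {hypothetical_8:.2f}%")
```

So the change I actually measured is roughly 5% of the arithmetic cycles on both paths, while the 10-20% figure corresponds to the hypothetical 9 and 8 cycle cases.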

Is there a way to get the disassembly from the Mali compiler?

Right now it is a black box to me.

I am attaching both shaders and the output from the Mali compiler in case someone can take a look.