I am trying to get a deeper understanding of shader optimization and have some basic questions about constant and varying buffers. I read this thread (https://community.arm.com/support-forums/f/graphics-gaming-and-vr-forum/53873/will-data-precision-affect-uniform-block-layout) and also parts of the Mali optimization guide about uniforms, but I still have some questions.

For context: we use Unity with SRP batching enabled (https://docs.unity3d.com/Manual/SRPBatcher.html), which means the uniforms of multiple draw calls get grouped into one big buffer that contains the uniforms for each draw call at some offset.

I am optimizing one shader which has a huge buffer of uniforms, i.e.
    CBUFFER_START(UnityPerMaterial)
        half4 _MainTex_ST;
        half4 _SecondaryTex_ST;
        half4 _MaskTex_ST;
        int _MainTexUVSet2;
        int _SecondaryTexUVSet2;
        int _MaskTexUVSet2;
        half4 _MainColor;
        half4 _SecondaryColor;
        half4 _MainColorBright;
        ... // rest
    CBUFFER_END
    half _Toggle1;
    int _Toggle2;

and later somewhere

    OUT.result = lerp(_Color1, _Color2, _Toggle1);

or

    if (_Toggle2)
    {
        OUT.result = _Color2;
    }
    else
    {
        OUT.result = _Color1;
    }
    struct Varyings
    {
        float4 position : SV_POSITION;
        float2 mainTexCoord : TEXCOORD0;
        float2 secondTexCoord : TEXCOORD1;
        half4 customData : TEXCOORD2; // x - intensity, y - dissolveAmount, z - step masking (ps)
    #if defined(_USEDISSOLVE_ON)
        float2 dissolveTexCoord : TEXCOORD6; // dissolve uvs
    #endif
    };
    varying vec4 v_color;
    varying vec4 v_color2;
3. Am I correct that a smaller varying size will help with two things: fewer LS (load/store) instructions in the fragment shader, and less data to write back to main memory after the vertex shader executes (because these are tile-based GPUs), which leaves more bandwidth for other work?
All GLSL data types are 4 bytes, ... type size ... and alignment
Is the statement above correct on all Mali hardware and graphics APIs...?
From what I read in the guide, uniforms are promoted to registers and are essentially "free", but you need to make sure the size stays under 128 bytes.
Does that mean the combined size of the used fields has to stay under 128 bytes [or the higher limit above]?
Does it matter if some fields are unused but the buffer is big?
Should I optimize the buffer size, i.e. use conditional compilation there as well?
What's the general strategy to optimize such buffers?
... Midgard worse, Bifrost better or vice versa
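To make the size question concrete, this is how I would count the fields shown in my buffer above. It is only a sketch under my own assumptions: that half is stored as 4 bytes per component in the constant buffer (as the quoted thread suggests), and that std140-style or HLSL cbuffer packing applies, so a 4-component vector has to start on a 16-byte boundary. The offsets in the comments are my own arithmetic, not values read back from the driver or the compiler.

```hlsl
// Assumes the CBUFFER_START / CBUFFER_END macros from Unity's SRP shader library.
// Offsets and sizes are in bytes and are assumptions, not measured values.
CBUFFER_START(UnityPerMaterial)
    half4 _MainTex_ST;          // offset   0, size 16
    half4 _SecondaryTex_ST;     // offset  16, size 16
    half4 _MaskTex_ST;          // offset  32, size 16
    int   _MainTexUVSet2;       // offset  48, size  4
    int   _SecondaryTexUVSet2;  // offset  52, size  4
    int   _MaskTexUVSet2;       // offset  56, size  4
                                // offset  60, 4 bytes of padding (assumed)
    half4 _MainColor;           // offset  64, size 16
    half4 _SecondaryColor;      // offset  80, size 16
    half4 _MainColorBright;     // offset  96, size 16
    // remaining fields omitted here
CBUFFER_END
```

If that counting is right, the nine fields shown already take 112 bytes including the padding, which leaves only 16 bytes before the 128-byte mark.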
Sometimes I use a uniform as a kind of toggle, which is later used in a lerp or an if branch to either do something or skip it (in places where I don't want to introduce a new #ifdef). I can declare this toggle either as float/half or as int.
I assume that in varyings, halfs are actually 2 bytes, not 4 as in constant buffers. Is that correct? And are float and int 4 bytes?
Could you please explain whether I should care about padding and packing in varyings?
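By packing I mean something like merging the two float2 texture coordinates from my Varyings struct into a single float4 interpolator. Below is a sketch of what I have in mind. The struct name PackedVaryings, the field name texCoords and the renumbered TEXCOORD semantics are hypothetical and not in my real shader, and whether this layout actually helps on Mali is exactly what I am asking.

```hlsl
// Hypothetical packed version of the Varyings struct, for illustration only.
struct PackedVaryings
{
    float4 position   : SV_POSITION;
    float4 texCoords  : TEXCOORD0; // xy = main UVs, zw = secondary UVs
    half4  customData : TEXCOORD1; // x - intensity, y - dissolveAmount, z - step masking (ps)
#if defined(_USEDISSOLVE_ON)
    float2 dissolveTexCoord : TEXCOORD2; // dissolve uvs
#endif
};
```

The vertex shader would then write texCoords.xy and texCoords.zw instead of the two separate fields, and the fragment shader would read them back the same way.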
I know I asked a lot of questions this time; hopefully they will be useful to other people visiting this forum as well.