I'd like to know what it means when a fragment shader is bound by the varying unit (V), as in our case.
According to: https://developer.arm.com/documentation/101863/7-4/Mali-GPU-pipelines/Mali-Bifrost-architecture
The varying pipeline is a dedicated pipeline which implements the varying interpolator.
Does it mean that the shader spends more cycles interpolating varyings than on ALU operations, and that reducing the number of varyings could potentially reduce the fragment shader cycles?
For example:
Mali Offline Compiler v7.4.0 (Build 330167)
Copyright 2007-2021 Arm Limited, all rights reserved
Configuration
=============
Hardware: Mali-G71 r0p1
Architecture: Bifrost
Driver: r32p0-00rel0
Shader type: OpenGL ES Fragment
Main shader
===========
Work registers: 24
Uniform registers: 12
Stack spilling: false
16-bit arithmetic: 60%
                             A     LS      V      T  Bound
Total instruction cycles: 1.42   0.00   3.50   2.00  V
Shortest path cycles:     1.42   0.00   3.50   2.00  V
Longest path cycles:      1.42   0.00   3.50   2.00  V
A = Arithmetic, LS = Load/Store, V = Varying, T = Texture
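As a rough reading aid, the Bound column appears to be simply the pipeline with the highest cycle count in each row. A minimal Python sketch using the figures from the report above (the `bound_unit` helper is hypothetical and not part of the Mali Offline Compiler):

```python
# Cycle counts per pipeline, copied from the "Total instruction cycles" row above.
cycles = {"A": 1.42, "LS": 0.00, "V": 3.50, "T": 2.00}


def bound_unit(cycles_by_unit):
    """Return the pipeline with the highest cycle count (hypothetical helper)."""
    return max(cycles_by_unit, key=cycles_by_unit.get)


bound = bound_unit(cycles)
print(bound, cycles[bound])
```

If the pipelines run in parallel, as this reading assumes, trimming varying cost should help only until V drops below the next-highest unit, which here is T at 2.00.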
Texture coordinates nearly always need more than mediump precision to get enough sub-texel accuracy for stable filtering, so the compiler will implicitly promote the precision of varyings used in texture lookups to highp. Highp interpolation is half the speed of mediump interpolation.
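To put a number on that, here is a back-of-the-envelope sketch in Python. It relies only on the statement above that highp interpolation runs at half the speed of mediump; the unit costs, the per-component granularity, and the `relative_cost` helper are illustrative assumptions, not hardware figures.

```python
# Relative cost per interpolated component. Only the 2:1 ratio comes from
# the statement above; the absolute values are arbitrary units (assumption).
COST = {"mediump": 1.0, "highp": 2.0}


def relative_cost(varyings):
    """Sum relative interpolation cost for (precision, components) pairs."""
    return sum(COST[precision] * components for precision, components in varyings)


# A vec2 texture coordinate promoted to highp vs. the same vec2 kept at mediump.
promoted = relative_cost([("highp", 2)])
unpromoted = relative_cost([("mediump", 2)])
print(promoted / unpromoted)
```

The same helper can be fed a full varying list to compare packing layouts, but treat the result as a relative estimate and confirm with the offline compiler.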
Pete, does that mean the varyings will also be stored at full precision, or does the promotion only happen on load and subsequent interpolation?
We tend to pack varyings with mixed semantics in an attempt to achieve "optimal" packing: say, 2 UVs and 2 lighting params in a single mediump vec4. Assuming we can't find "free" lanes in other mediump varyings, I assume we have to bite the bullet in cases like this and accept that the other values in the vector cost more than they otherwise would, right?