I've read this: https://community.arm.com/developer/tools-software/graphics/f/discussions/7975/programs-pipelines-performances-questions/30022#30022
Daniele di Donato said: "For example, it will remove a varying calculation in the vertex shader if the fragment shader doesn't declare to use it."
I'd like to know whether the device driver will also remove the computation of varyings that are declared in the varying struct but never end up contributing to the output of the fragment shader.
It's typical to write an uber-shader with several features and toggle each of them on and off with a preprocessor symbol, but this can become very messy and difficult to maintain. If we can safely rely on the device driver to optimise unused features out, shader development becomes much easier. For example, can we safely remove #ifdef FEATURE_1/#endif from our code and rely exclusively on the device driver to optimise the code out?
In a real situation there will be many #ifdef FEATURE_X blocks, and they can even be nested.
struct Varying
{
#ifdef FEATURE_1
    float2 uv2 : TEXCOORD1;
#endif
};

half4 frag(Varying input)
{
    ...
#ifdef FEATURE_1
    color.xy += input.uv2; // uv2 is a member of input; float2 only fills .xy of a half4
#endif
    ...
    return color;
}
Peter Harris said: you ALSO need to specialize your buffers to remove unused attributes to ensure you get the bandwidth savings on input vertex data.
Simply using an interleaved VBO should be enough to prevent unnecessary fetches of the unused attributes, right?
Peter Harris said: like any optimization you are somewhat at the mercy of the compiler, so the best option is to specialize the shader if you can, as that guarantees the behavior you want.
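One common way to specialize an uber-shader is to compile one variant per feature combination, prepending a `#define` line for each enabled feature to the shader source. The sketch below builds that preamble from a bitmask; the feature names and the `build_feature_defines` helper are hypothetical. With GLSL the preamble has to come after any `#version` line, for example as the second string passed to `glShaderSource`.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical feature names; substitute the uber-shader's own symbols. */
static const char *const kFeatureNames[] = {
    "FEATURE_1", "FEATURE_2", "FEATURE_3",
};
#define FEATURE_COUNT (sizeof(kFeatureNames) / sizeof(kFeatureNames[0]))

/* Writes one "#define NAME\n" line per bit set in `mask` into `out`.
 * Bit i enables kFeatureNames[i]. Returns the number of characters
 * written (excluding the terminator), or -1 if `out` is too small. */
int build_feature_defines(unsigned mask, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < FEATURE_COUNT; ++i) {
        if (!(mask & (1u << i))) continue; /* feature disabled in this variant */
        int n = snprintf(out + used, out_size - used, "#define %s\n",
                         kFeatureNames[i]);
        if (n < 0 || (size_t)n >= out_size - used) return -1; /* truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

Looping `mask` from 0 to `(1u << FEATURE_COUNT) - 1` enumerates every variant, which also makes the cost visible: N independent features give 2^N shader variants to compile and cache.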
I assume that I should not rely on the compiler or the device driver to "pack" the varyings, right?
Does the compiler make any attempt at all to pack the varyings, or does it leave packing entirely to the user?
This packing can become very complicated for an uber-shader with a lot of conditional compilation directives (#ifdef). For example, packing Varying1 by hand into Varying2:
struct Varying1
{
    float2 uv1 : TEXCOORD0;
    float2 uv2 : TEXCOORD1;
};

struct Varying2
{
    float4 uv1_uv2 : TEXCOORD0; // uv1 in .xy, uv2 in .zw
};
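To show why hand-packing gets messy once varyings are compiled in and out by #ifdef, here is a minimal sketch of the bookkeeping involved: a simple first-fit packer that assigns each enabled varying to a 4-component slot. The `pack_varyings` function and `VaryingLocation` type are hypothetical, and real compilers and drivers may use different packing rules (for example alignment constraints) that this sketch ignores.

```c
#include <stddef.h>

/* Where one varying lands after packing: slot index and first component
 * (0 = .x, 1 = .y, 2 = .z, 3 = .w). slot == -1 means compiled out. */
typedef struct {
    int slot;
    int component;
} VaryingLocation;

/* First-fit packing of varyings into 4-component slots.
 * sizes[i] is the component count (1..4) of varying i, or 0 if that
 * varying is compiled out (e.g. its FEATURE_X is not defined).
 * Returns the number of slots used, or -1 on invalid input or if more
 * than max_slots (at most 64) would be needed. */
int pack_varyings(const int *sizes, size_t count, VaryingLocation *out,
                  int max_slots)
{
    int used[64] = {0}; /* components already taken in each slot */
    int slots = 0;
    if (max_slots < 0 || max_slots > 64) return -1;
    for (size_t i = 0; i < count; ++i) {
        out[i].slot = -1;
        out[i].component = -1;
        if (sizes[i] == 0) continue; /* varying not present in this variant */
        if (sizes[i] < 0 || sizes[i] > 4) return -1;
        int s = 0;
        /* Find the first existing slot with enough free components. */
        while (s < slots && used[s] + sizes[i] > 4) ++s;
        if (s == slots) { /* none fits: open a new slot */
            if (slots == max_slots) return -1;
            ++slots;
        }
        out[i].slot = s;
        out[i].component = used[s];
        used[s] += sizes[i];
    }
    return slots;
}
```

For Varying1 (two float2s) this yields one slot with uv1 at .xy and uv2 at .zw, matching Varying2; with uv2 compiled out, uv1 still takes one slot alone. Every feature combination can produce a different layout, which is exactly the bookkeeping that is painful to maintain by hand.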
Simply using interleaved VBO should be enough to prevent unnecessary fetch of the unused attributes, right?
Main memory will be accessed in 64-byte bursts (i.e. whole cache lines). If you have fully interleaved vertex data (array-of-structs style), then any unused attributes that sit in the same 64 bytes as used attributes will be fetched.
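The burst arithmetic can be sketched as follows: count the distinct 64-byte lines touched when only one attribute is read from each vertex. The `lines_touched` helper is hypothetical, and it assumes the buffer starts on a 64-byte boundary, vertices are fetched in order, and the attribute fits inside the vertex (`offset + size <= stride`); it ignores any cache reuse beyond line granularity.

```c
#include <stddef.h>

#define BURST_BYTES 64u /* burst / cache-line size quoted above */

/* Counts the distinct 64-byte lines touched when reading `size` bytes at
 * `offset` inside each of `count` vertices laid out `stride` bytes apart.
 * Because offset + size <= stride, lines are visited in non-decreasing
 * order, so a line is new exactly when it differs from the previous one. */
size_t lines_touched(size_t stride, size_t offset, size_t size, size_t count)
{
    size_t lines = 0;
    size_t last = (size_t)-1; /* most recently counted line index */
    if (size == 0) return 0;
    for (size_t v = 0; v < count; ++v) {
        size_t start = v * stride + offset;
        size_t first = start / BURST_BYTES;
        size_t end = (start + size - 1) / BURST_BYTES;
        for (size_t line = first; line <= end; ++line) {
            if (line != last) {
                ++lines;
                last = line;
            }
        }
    }
    return lines;
}
```

For a hypothetical 32-byte interleaved vertex (12-byte position, 12-byte normal, 8-byte UV) where only the position is used, 1000 vertices touch 500 lines (32,000 bytes) versus 188 lines (12,032 bytes) when positions live in their own tightly packed buffer.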
Input vertex attributes generally can't be packed by the driver, because the user can call glMapBuffer() at any point and expect to get their original memory layout back.
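Since the driver has to preserve the application's layout, splitting interleaved data into per-attribute streams is the application's job. A minimal sketch, assuming a hypothetical `extract_attribute` helper: it copies one attribute out of an interleaved array into a tightly packed array, which could then be uploaded to its own buffer (e.g. with `glBufferData`) and bound with `glVertexAttribPointer` using a stride of 0, meaning tightly packed.

```c
#include <stddef.h>
#include <string.h>

/* Copies the attribute occupying `attr_size` bytes at `attr_offset` in each
 * `stride`-byte vertex of `src` into the tightly packed array `dst`.
 * `dst` must hold at least count * attr_size bytes. */
void extract_attribute(const void *src, size_t stride, size_t attr_offset,
                       size_t attr_size, size_t count, void *dst)
{
    const unsigned char *in = (const unsigned char *)src;
    unsigned char *out = (unsigned char *)dst;
    for (size_t v = 0; v < count; ++v) {
        /* Byte-wise copy avoids alignment and strict-aliasing issues. */
        memcpy(out + v * attr_size, in + v * stride + attr_offset, attr_size);
    }
}
```

A shader variant that only reads positions can then bind just the position stream, so none of the unused attributes share its 64-byte bursts.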
Hi Peter, my last question was actually about the varyings, not the attributes.
For varyings, that's all controlled by the driver: it will repack them as needed for the current pipeline based on what is actually used. The application gets no direct control over it in either case.
HTH, Pete