Unused varyings optimisation

I've read this: https://community.arm.com/developer/tools-software/graphics/f/discussions/7975/programs-pipelines-performances-questions/30022#30022

Daniele di Donato said: "For example, it will remove a varying calculation in the vertex shader if the fragment shader doesn't declare to use it."

I'd like to know whether the device driver will also remove the computation of varyings that ultimately don't contribute to the output of the fragment shader, even if they are declared in the varying struct.

I ask because it's typical to write an uber shader with several features and toggle each of them on and off with a preprocessor symbol, but this can become very messy and difficult to maintain. If we can safely rely on the device driver to optimise them out, it will make shader development easier. For example, can we safely remove #ifdef FEATURE_1/#endif from our code and rely exclusively on the device driver to optimise the code out?

In a real situation, there will be a lot of #ifdef FEATURE_X and they can even be nested.

struct Varying {
#ifdef FEATURE_1
    float2 uv2: TEXCOORD1;
#endif    
};

half4 frag(Varying input) {
    ...
#ifdef FEATURE_1
    color.xy += input.uv2;
#endif
    ...
    return color;
}

  • Simply using an interleaved VBO should be enough to prevent unnecessary fetches of the unused attributes, right?

    Main memory will be accessed in 64-byte bursts (i.e. whole cache lines). If you have fully interleaved vertex data (array-of-structs style), then any unused attributes that sit in the same 64 bytes as used attributes will be fetched as well.

    I assume that I should not rely on the compiler or the device driver to "pack" the varyings, right?

    Input vertex attributes generally can't be packed by the driver: the user can call glMapBuffer() at any point and expect to see the original memory layout.
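
    To put rough numbers on the 64-byte burst point above, here is a minimal sketch in C that counts how many cache lines are touched when only part of each vertex is read. It is a back-of-the-envelope model, not a description of how any particular GPU fetches data: the function name lines_touched is hypothetical, the buffer is assumed to start on a cache-line boundary, and the only figure taken from the reply is the 64-byte line size.

```c
#include <stddef.h>

/* Burst / cache-line size quoted in the reply above. */
#define CACHE_LINE_BYTES 64u

/*
 * Hypothetical helper: counts the distinct 64-byte lines touched when
 * reading `size` bytes starting at `offset` inside each of `count`
 * vertices that are laid out `stride` bytes apart.
 *
 * Assumptions (not from the original thread):
 *   - the buffer starts on a cache-line boundary;
 *   - offset + size <= stride, so vertices are visited in increasing
 *     address order and a line can only repeat as the most recent one.
 */
static size_t lines_touched(size_t offset, size_t size,
                            size_t stride, size_t count)
{
    size_t lines = 0;
    size_t last_line = (size_t)-1;  /* sentinel: no line seen yet */

    for (size_t i = 0; i < count; ++i) {
        size_t begin = i * stride + offset;
        size_t first = begin / CACHE_LINE_BYTES;
        size_t last  = (begin + size - 1) / CACHE_LINE_BYTES;

        for (size_t line = first; line <= last; ++line) {
            if (line != last_line) {  /* skip a line already counted */
                ++lines;
                last_line = line;
            }
        }
    }
    return lines;
}
```

    As an illustration, take a made-up 32-byte interleaved vertex (12-byte position, 12-byte normal, 8-byte uv2) where uv2 is unused. For 64 vertices, lines_touched(0, 24, 32, 64) gives 32 lines, i.e. the whole 2048-byte buffer is fetched even though only 1536 bytes are needed. If the used attributes live in their own tightly packed stream, lines_touched(0, 24, 24, 64) gives 24 lines.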
