Unused varyings optimisation

I've read this: https://community.arm.com/developer/tools-software/graphics/f/discussions/7975/programs-pipelines-performances-questions/30022#30022

Daniele di Donato said: "For example, it will remove a varying calculation in the vertex shader if the fragment shader doesn't declare to use it."

I'd like to know whether the device driver will also remove the computation of varyings that end up not contributing to the output of the fragment shader, even if they are declared in the varying struct?

It's typical to write an uber shader with several features and toggle each of them on and off with a preprocessor symbol, but this can become very messy and difficult to maintain. If we could safely rely on the device driver to optimise them out, shader development would be easier. For example, can we safely remove the #ifdef FEATURE_1/#endif pairs from our code and rely exclusively on the device driver to optimise the code out?

In a real situation there will be a lot of #ifdef FEATURE_X blocks, and they can even be nested.

struct Varying {
#ifdef FEATURE_1
    float2 uv2 : TEXCOORD1;
#endif
};

half4 frag(Varying input) {
    ...
#ifdef FEATURE_1
    // uv2 is a member of the input struct; add it to two channels of color
    color.xy += input.uv2;
#endif
    ...
    return color;
}

  • I'd like to know whether the device driver will remove the computation of some varyings which eventually don't contribute to the output of the fragment shader, even if they are declared in the varying struct?

    It should, as long as you are not using OpenGL ES separate shader objects (with those, the vertex stage is built without knowing which outputs the fragment stage consumes). However, as with any optimization, you are somewhat at the mercy of the compiler, so the best option is to specialize the shader if you can, as that guarantees the behavior you want.

    Note that you ALSO need to specialize your buffers to remove unused attributes to ensure you get the bandwidth savings on input vertex data, so shader compilation isn't the only thing you have to worry about.
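As an illustration of the shader specialization recommended above, one common approach is to keep the #ifdef blocks in a single source and prepend a set of #define lines per variant on the application side before compiling. The sketch below is only a minimal example of that idea: the `specialize` helper and the `SHADER` string are hypothetical names made up for illustration, and it assumes a source dialect where defines may appear on the first lines (GLSL would need them placed after the `#version` line).

```python
# Hypothetical helper: builds one specialized shader source per feature set.
def specialize(source, features):
    """Return the source with one '#define NAME 1' line per enabled feature prepended."""
    defines = "".join("#define {} 1\n".format(name) for name in sorted(set(features)))
    return defines + source


# Hypothetical uber-shader fragment, modelled on the snippet in the question.
SHADER = """struct Varying {
#ifdef FEATURE_1
    float2 uv2 : TEXCOORD1;
#endif
};
"""

with_feature = specialize(SHADER, ["FEATURE_1"])   # uv2 is declared in this variant
without_feature = specialize(SHADER, [])           # uv2 is compiled out in this variant
```

Each variant is then compiled as its own program, so whether an unused varying is removed no longer depends on the driver's optimizer.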

  • you ALSO need to specialize your buffers to remove unused attributes to ensure you get the bandwidth savings on input vertex data

    Simply using an interleaved VBO should be enough to prevent unnecessary fetches of the unused attributes, right?

    like any optimization you are somewhat at the mercy of the compiler so the best option is to specialize the shader if you can as that guarantees the behavior you want.

    I assume that I should not rely on the compiler or the device driver to "pack" the varyings, right?

    Does the compiler make any attempt at all to pack the varyings, or does it leave packing entirely to the user?

    Manual packing can become very complicated for an uber shader with a lot of conditional compilation directives (#ifdef). For example, Varying1 below could be hand-packed into Varying2:

    struct Varying1 {
        float2 uv1: TEXCOORD0;
        float2 uv2: TEXCOORD1;
    };
    
    struct Varying2 {
        float4 uv1_uv2: TEXCOORD0;
    };

  • Simply using an interleaved VBO should be enough to prevent unnecessary fetches of the unused attributes, right?

    Main memory is accessed in 64-byte bursts (i.e. whole cache lines). If you have fully interleaved vertex data (array-of-structs style), then any unused attributes that sit in the same 64 bytes as used attributes will be fetched as well.

    I assume that I should not rely upon the compiler or the device driver to "pack" the varyings, right ?

    Input vertex attributes generally can't be packed by the driver - the user can call glMapBuffer() at any point and expect to get their original memory layout.
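To make the 64-byte burst point above concrete, here is a rough back-of-the-envelope model that counts how many cache lines are touched when only some attributes of each vertex are read. It is only a sketch under stated assumptions: the vertex layout (12-byte position, 12-byte normal, two 8-byte UV sets, 40-byte stride) is hypothetical, the buffer is assumed to start on a cache-line boundary, and real hardware caching behaviour is ignored.

```python
CACHE_LINE = 64  # bytes per burst, as stated in the reply above


def lines_touched(vertex_count, stride, used_ranges, line=CACHE_LINE):
    """Count distinct cache lines covering the used byte ranges of every vertex.

    used_ranges is a list of (offset, size) pairs within one vertex.
    """
    touched = set()
    for v in range(vertex_count):
        base = v * stride
        for offset, size in used_ranges:
            first = (base + offset) // line
            last = (base + offset + size - 1) // line
            touched.update(range(first, last + 1))
    return len(touched)


# Hypothetical layout: position (12 B), normal (12 B), uv1 (8 B), uv2 (8 B).
# The shader variant only reads the position.
interleaved = lines_touched(1024, 40, [(0, 12)])   # position inside a 40-byte vertex
split = lines_touched(1024, 12, [(0, 12)])         # position in its own tight stream

print(interleaved * CACHE_LINE, "bytes fetched with interleaved data")
print(split * CACHE_LINE, "bytes fetched with a position-only stream")
```

In this model the interleaved layout touches every cache line of the buffer (40960 bytes for 1024 vertices), because the 28-byte gap between consecutive positions is smaller than a cache line, while the position-only stream touches 12288 bytes. That is the reason for splitting or specializing buffers when a variant drops attributes.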

  • Hi, Peter, my last question is actually about the varyings, not the attributes.

  • For varyings, that's all controlled by the driver, which will repack them as needed for the current pipeline based on what's actually used. The application gets no direct control over it in either case.

    HTH, 
    Pete