Program pipeline performance questions

While reading the slides from ARM's "Getting the most out of OpenGL ES 3.0", I noticed that one of the first features presented is Separate Shader Objects (SSOs), which rely on the new Program Pipeline feature of OpenGL ES 3.1. However, when grepping the Mali Android SDK code for "pipeline", the only results I got were some references in HTML documentation files and comments.

The question is:

Do program pipelines have a significant performance cost compared to simple program objects used with glUseProgram?

Apple's documentation seems to encourage the use of pipelines in its Best Practices for Shaders guide. However, it targets other GPUs, so I'm wondering whether SSOs and pipelines perform as well as monolithic programs on Mali GPUs specifically.

Daniele Di Donato over 7 years ago

Hi Myy,

Creating a program the classic way allows the driver to optimize the program as a whole. The driver first optimizes each shader separately and then, when linking, optimizes the whole program; this is where the compiler performs link-time optimizations. For example, it will remove a varying calculation from the vertex shader if the fragment shader doesn't read that varying.

When using SSOs, each shader is optimized on its own and stored in a form that is much faster to link at runtime, but that form doesn't allow whole-program optimizations (which take time to perform).

So, in short, some optimizations may not be applied when you use SSOs. Their impact depends entirely on your shader code, so it cannot be easily quantified. For example, if your vertex shader outputs 6 varyings but the fragment shader you pair it with reads only 1, the vertex shader will still compute the other 5 varyings for every vertex. If those calculations are complex, they can hurt performance when the application is vertex bound.

Regards,

Daniele

Myy over 7 years ago in reply to Daniele Di Donato

Ah! Thanks for this valuable information, Daniele Di Donato!

Interesting to see that drivers can perform whole-program link-time optimisations on standard shader programs.
I'll keep that in mind when analysing the performance of OpenGL programs.

Does the Offline Mali Shader Compiler give an overview of the potential link-time optimisations when combining two shaders into a program?

Anyway, I'll keep using standard programs when possible.

Daniele Di Donato over 7 years ago in reply to Myy

Hi Myy,

The Offline Mali Shader Compiler works only on individual shader files. That means it can perform optimizations at the single-shader level but not across the whole program, since it only ever sees one shader at a time.

Regards,

Daniele