Uniform blocks: Optimal qualifiers

Greetings,

I'm currently playing with OpenGL ES 3.0 and I'm wondering whether the Mali (T6xx and later) drivers can perform useful optimisations based on the layout qualifier applied to interface blocks (shared, packed, std140, row_major, column_major).

Since std140 means I don't have to query the OpenGL implementation about the data layout, I wonder whether there's any reason to choose another layout qualifier.
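For illustration, here is roughly what "not having to ask" means in practice: because std140 fixes member offsets by rule, the application can compute them itself instead of querying them with glGetActiveUniformsiv() and GL_UNIFORM_OFFSET. This is a minimal C sketch that only handles float, vec2, vec3, vec4 and column-major mat4 members (optionally as arrays); the Scene block, the glsl_type enum and the std140_layout() helper are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example block (GLSL ES 3.00):
 *
 *   layout(std140) uniform Scene {
 *       vec3  light_dir;    // offset 0
 *       float intensity;    // offset 12, fills the tail of the vec3 slot
 *       mat4  mvp;          // offset 16
 *       float weights[3];   // offset 80, array stride 16
 *       vec2  uv_scale;     // offset 128
 *   };
 */

/* Only the types this sketch handles; matrices are column-major. */
typedef enum { T_FLOAT, T_VEC2, T_VEC3, T_VEC4, T_MAT4 } glsl_type;

/* std140 base alignment of a single (non-array) member, in bytes. */
static size_t std140_align(glsl_type t) {
    switch (t) {
    case T_FLOAT: return 4;
    case T_VEC2:  return 8;
    case T_VEC3:             /* vec3 is aligned like a vec4 */
    case T_VEC4:  return 16;
    case T_MAT4:  return 16; /* treated as an array of 4 vec4 columns */
    }
    return 0;
}

/* Bytes actually occupied by a single (non-array) member. */
static size_t std140_size(glsl_type t) {
    switch (t) {
    case T_FLOAT: return 4;
    case T_VEC2:  return 8;
    case T_VEC3:  return 12;
    case T_VEC4:  return 16;
    case T_MAT4:  return 64;
    }
    return 0;
}

static size_t round_up(size_t x, size_t a) { return (x + a - 1) / a * a; }

/*
 * Fills offsets[i] with the std140 byte offset of member i.
 * array_len[i] == 0 means "not an array"; array elements are aligned to,
 * and strided by, a multiple of 16 bytes (the size of a vec4).
 * Returns the offset just past the last member.
 */
size_t std140_layout(const glsl_type *types, const size_t *array_len,
                     size_t count, size_t *offsets) {
    size_t off = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t align = std140_align(types[i]);
        size_t size = std140_size(types[i]);
        assert(align != 0 && "unsupported type");
        if (array_len[i] > 0) {
            align = round_up(align, 16);
            size = round_up(size, 16) * array_len[i];
        }
        off = round_up(off, align);
        offsets[i] = off;
        off += size;
    }
    return off;
}
```

Note how intensity slots into the unused 4 bytes after the vec3, while each element of the float array gets a full 16-byte slot; structs and other matrix shapes would need extra rules not shown here.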

  • If a uniform block is shared between multiple programs, is it useful to choose shared, even though the block will just get its data from a Uniform Buffer Object?
  • Does packed really provide useful memory optimisations for OpenGL programs?
  • In general we recommend only using buffer objects for uniforms which are created on the GPU; direct set uniforms (glUniform*() in OpenGL ES, or push constants in Vulkan) give us far more freedom to handle them in the fastest possible way for any given shader.

    That said, we accept it's a trade-off against application complexity, so there isn't always a "right" answer, and the differences are probably marginal in the overall frame cost.

    In general, "packed" isn't worth using. Keeping things naturally aligned ensures the best access efficiency, and if you have enough uniforms that you start getting cache pressure that would benefit from packing, then you're doing something wrong...

  • Unknown said:

    In general we recommend only using buffer objects for uniforms which are created on the GPU;

    Something like storing the result of a compute shader in a uniform buffer object?

     

    Unknown said:

    direct set uniforms (glUniform*() in OpenGL ES, or push constants in Vulkan) give us far more freedom to handle them in the fastest possible way for any given shader.

    Interesting! I would have thought that Uniform Buffer Objects would help restore data more quickly when switching between programs, or help the GPU pack shared data.

    Does it generate indirect accesses from the GPU? Something like:

    [GPU Program]←[UBO ID to Memory Pointer table]
    [GPU Program]→[Buffer]

    Unknown said:

    That said, we accept it's a trade-off against application complexity, so there isn't always a "right" answer, and the differences are probably marginal in the overall frame cost.

    In general, "packed" isn't worth using. Keeping things naturally aligned ensures the best access efficiency, and if you have enough uniforms that you start getting cache pressure that would benefit from packing, then you're doing something wrong...

    Is std140 also problematic in that regard, since it forces alignment constraints?

  • Something like storing the result of a compute shader in a uniform buffer object?

    Yes, exactly that.

    I would have thought that Uniform Buffer Objects would help restore data more quickly when switching between programs, or help the GPU pack shared data.

    It's a trade-off between CPU-side cost and the driver's freedom to pack things into the hardware in the most efficient way.

    • UBOs will have the lower CPU-side cost, but place some constraints on how we can feed the hardware. 
    • Direct-set uniforms will have higher CPU-side cost, but give us total freedom in how we drive the hardware.

    I can't really talk about the hardware's internal microarchitecture on a public forum, but the high-level answer is that we can treat direct-set uniforms more efficiently than a buffer load (which is really just a load from memory via a pointer, just like a load in a CPU-side C program).
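As a purely CPU-side analogy (it says nothing about how Mali hardware actually stores or fetches uniforms, which the reply deliberately doesn't go into), the difference is roughly that between a function receiving a value directly and one that must load it through a pointer. The shade_direct(), shade_from_buffer() and light_block names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* CPU-side analogy only; not a model of any GPU's uniform hardware. */

/* "Direct set": the value is handed over as-is, so the callee (and the
 * compiler) is free to keep it wherever is cheapest, e.g. in a register. */
float shade_direct(float intensity, float n_dot_l) {
    return intensity * n_dot_l;
}

/* "Buffer backed": the callee only receives a pointer and has to load
 * the value from memory through it, like a shader reading a UBO. */
typedef struct {
    float intensity;
} light_block;

float shade_from_buffer(const light_block *block, float n_dot_l) {
    assert(block != NULL);
    return block->intensity * n_dot_l; /* memory load via the pointer */
}
```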

    Is std140 also problematic in that regard, since it forces alignment constraints?

    The alignment requirements for std140 are deliberately defensive, so they already meet the alignment requirements of the hardware, and the layout rules insert the necessary padding to ensure that.
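To make that padding concrete, here is a sketch of a C struct mirroring a hypothetical std140 block, with the padding written out explicitly and the offsets checked at compile time. The Material block, material_std140 type and material_set_weights() helper are made-up examples, and the sketch assumes a typical ABI where float is 4 bytes with 4-byte alignment (the static_asserts would catch anything else).

```c
#include <assert.h>
#include <stddef.h>

/*
 * CPU-side mirror of a hypothetical std140 block:
 *
 *   layout(std140) uniform Material {
 *       vec3  albedo;      // offset  0
 *       float roughness;   // offset 12 (shares the vec3's 16-byte slot)
 *       vec2  uv_scale;    // offset 16
 *       float weights[2];  // offset 32, each element padded to 16 bytes
 *   };                     // occupies 64 bytes
 */
typedef struct {
    float value;
    float _pad[3];         /* std140 rounds array strides up to 16 bytes */
} std140_float_elem;

typedef struct {
    float albedo[3];
    float roughness;
    float uv_scale[2];
    float _pad0[2];        /* arrays must start on a 16-byte boundary */
    std140_float_elem weights[2];
} material_std140;

/* Compile-time checks that the C layout matches the std140 offsets. */
static_assert(offsetof(material_std140, roughness) == 12, "roughness");
static_assert(offsetof(material_std140, uv_scale) == 16, "uv_scale");
static_assert(offsetof(material_std140, weights) == 32, "weights");
static_assert(sizeof(material_std140) == 64, "block size");

/* Copies n weights into the padded array; n must not exceed 2. */
void material_set_weights(material_std140 *m, const float *w, size_t n) {
    assert(n <= 2);
    for (size_t i = 0; i < n; ++i)
        m->weights[i].value = w[i];
}
```

The two-element float array costs 32 bytes rather than 8, which is the price of the defensive alignment; in exchange the struct can be uploaded with glBufferSubData() byte-for-byte.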

    HTH, 
    Pete

  • Alright!

    I guess I'll only use UBOs when playing with OpenGL ES 3.1 and compute shaders for now.

    Thanks for all these clarifications!
  • Just one last question!

    When searching for OpenGL ES 3.1 on the web, I found slides from the ARM Mali team presenting the new ES 3.1 features. Among these features is:

    • Explicit uniform location (Desktop side, it's an OpenGL 3.3 feature)

    As I understand it, I can define a specific ID for uniforms and then just reference that static ID on the API (C/C++) side, like this:

     glUniform*(myy_program_uniform_id, ...)

     

    The question is: does it have the same impact as forcing layouts?

  • The explicit locations set in the shader define the location (bind point) used for each symbol; without them, locations are assigned arbitrarily by the driver, and different drivers will do different things with them. All the explicit location support means is that if you set the location in the shader, you no longer need to call glGetUniformLocation() at run time; you can just use the pre-defined location value from the shader source code.
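One way to take advantage of this is to define each location number once in a C header and splice it into the shader source, so the GLSL declarations and the glUniform*() calls can't drift apart. A minimal sketch, assuming GLSL ES 3.10; the LOC_* macros, the shader strings and the source_has_decl() helper are hypothetical names for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical shared header: every uniform location is defined once here. */
#define LOC_MVP  0
#define LOC_TINT 1

/* Two different uniforms in one program must not share a location. */
static_assert(LOC_MVP != LOC_TINT, "duplicate uniform location");

/* Two-step stringification so the macro's value, not its name, is pasted. */
#define STR_(x) #x
#define STR(x)  STR_(x)

static const char vertex_src[] =
    "#version 310 es\n"
    "layout(location = 0) in vec4 position;\n"
    "layout(location = " STR(LOC_MVP) ") uniform mat4 mvp;\n"
    "void main() { gl_Position = mvp * position; }\n";

static const char fragment_src[] =
    "#version 310 es\n"
    "precision mediump float;\n"
    "layout(location = " STR(LOC_TINT) ") uniform vec4 tint;\n"
    "out vec4 color;\n"
    "void main() { color = tint; }\n";

/* Sanity check: does a shader source contain a given declaration? */
static int source_has_decl(const char *src, const char *decl) {
    return strstr(src, decl) != NULL;
}

/*
 * With a current ES 3.1 context and the program in use, no
 * glGetUniformLocation() lookup is needed at run time:
 *
 *   glUniformMatrix4fv(LOC_MVP, 1, GL_FALSE, mvp_matrix);
 *   glUniform4f(LOC_TINT, 1.0f, 0.5f, 0.0f, 1.0f);
 */
```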

    HTH, 
    Pete

  • So it works the same way as setting attribute locations, then.

    However, on Mali devices, is the driver negatively affected in any way by the application choosing the IDs arbitrarily?
  • No, it shouldn't make any difference.
  • Nice!

    This will make shader management way easier!

    Thanks again for these clarifications.