Uniform blocks: Optimal qualifiers

Greetings,

I'm currently playing with OpenGL ES 3.0 and I'm wondering whether Mali (>= T6xx) drivers can perform interesting optimisations based on the layout qualifier applied to interface blocks (shared, packed, std140, row_major, column_major)?

Since std140 avoids having to query the OpenGL implementation about the data layout, I wonder whether there is any reason to choose another layout qualifier.
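To illustrate why std140 needs no layout queries: its member offsets follow fixed rules, so they can be computed on the CPU instead of being read back with glGetActiveUniformsiv(GL_UNIFORM_OFFSET), as shared and packed blocks would require. The following is only a minimal sketch under stated assumptions. The names Std140Type and std140_layout are hypothetical, and it covers only float, vec2, vec3, vec4 and column-major mat4 members (no arrays, structs or other matrix shapes).

```c
#include <stddef.h>

/* Hypothetical helper types: a small subset of GLSL member kinds. */
typedef enum {
    STD140_FLOAT,
    STD140_VEC2,
    STD140_VEC3,
    STD140_VEC4,
    STD140_MAT4
} Std140Type;

/* Base alignment in bytes under std140 for the supported kinds. */
static size_t std140_alignment(Std140Type t)
{
    switch (t) {
    case STD140_FLOAT: return 4;
    case STD140_VEC2:  return 8;
    default:           return 16; /* vec3, vec4 and mat4 columns */
    }
}

/* Size in bytes occupied by the supported kinds. */
static size_t std140_size(Std140Type t)
{
    switch (t) {
    case STD140_FLOAT: return 4;
    case STD140_VEC2:  return 8;
    case STD140_VEC3:  return 12;
    case STD140_VEC4:  return 16;
    default:           return 64; /* mat4: four vec4 columns */
    }
}

/* Writes each member's byte offset into offsets[] and returns the
 * offset just past the last member. No GL call is involved. */
size_t std140_layout(const Std140Type *members, size_t count, size_t *offsets)
{
    size_t cursor = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t align = std140_alignment(members[i]);
        cursor = (cursor + align - 1) / align * align; /* round up */
        offsets[i] = cursor;
        cursor += std140_size(members[i]);
    }
    return cursor;
}
```

For a block declared as vec3, float, vec2, vec4, mat4, this sketch yields the offsets 0, 12, 16, 32 and 48. When in doubt, the computed values can be cross-checked once against glGetActiveUniformsiv.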

  • If a uniform block is shared between multiple programs, is it useful to choose shared, even though the uniform block will just get its data from a Uniform Buffer Object?
  • Does packed really provide useful memory optimisations for OpenGL programs?
  • Something like storing the result of a compute shader in a uniform buffer object?

    Yes, exactly that.

    I would have thought that Uniform Buffer Objects would help restore data more quickly when switching between programs, or help the GPU pack shared data.

    It's a trade-off between CPU-side cost and the driver's freedom to pack things into the hardware in the most efficient way.

    • UBOs have the lower CPU-side cost, but place some constraints on how we can feed the hardware.
    • Direct-set uniforms have the higher CPU-side cost, but give us total freedom in how we drive the hardware.

    I can't really talk about the hardware's internal microarchitecture on a public forum, but the high-level answer is that we can treat direct-set uniforms more efficiently than a buffer load (which is really just a load from memory via a pointer, just like a load in a CPU-side C program).

    Is std140 also problematic in that regard, as it forces alignment constraints?

    The alignment requirements for std140 are deliberately defensive, so they already meet the alignment requirements of the hardware; the layout inserts whatever padding is needed to ensure that.
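As a rough illustration of the padding std140 inserts, here is a sketch of a CPU-side C struct written to mirror a std140 block. The Lighting block, its members and the LightingStd140 type are hypothetical and not taken from this thread; the sketch also assumes a 4-byte float and a C11 compiler for _Static_assert.

```c
#include <stddef.h>

/* Hypothetical GLSL block this struct is meant to mirror:
 *
 *   layout(std140) uniform Lighting {
 *       vec3  light_dir;   // offset  0 (base alignment 16, size 12)
 *       float intensity;   // offset 12 (fits in the tail of the vec3 slot)
 *       vec4  color;       // offset 16
 *       float weights[2];  // offset 32 (array stride rounded up to 16)
 *       mat4  view_proj;   // offset 64 (four vec4 columns)
 *   };
 */
typedef struct {
    float light_dir[3];
    float intensity;
    float color[4];
    /* Each array element is padded out to a 16-byte stride. */
    struct { float value; float pad[3]; } weights[2];
    float view_proj[16];
} LightingStd140;

/* Compile-time checks that the C layout matches the offsets above. */
_Static_assert(offsetof(LightingStd140, intensity) == 12, "intensity offset");
_Static_assert(offsetof(LightingStd140, color) == 16, "color offset");
_Static_assert(offsetof(LightingStd140, weights) == 32, "weights offset");
_Static_assert(offsetof(LightingStd140, view_proj) == 64, "view_proj offset");
_Static_assert(sizeof(LightingStd140) == 128, "block size");
```

A struct like this could be uploaded as-is with glBufferData or glBufferSubData on a GL_UNIFORM_BUFFER binding. The explicit pad fields are the CPU-side price of the defensive alignment: two floats of payload in weights occupy 32 bytes.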

    HTH, 
    Pete
