Greetings,
I'm currently playing with OpenGL ES 3.0 and I'm wondering whether the Mali (>= T6xx) drivers can perform interesting optimisations based on the layout qualifier applied to interface blocks (shared, packed, std140, row_major, column_major)?
Since std140 avoids having to query the OpenGL implementation about the data layout, I wonder whether there is any reason to choose another layout qualifier.
In general we recommend only using buffer objects for uniforms which are created on the GPU; direct set uniforms (glUniform*() in OpenGL ES, or push constants in Vulkan) give us far more freedom to handle them in the fastest possible way for any given shader. That said, we accept it's a trade off with application complexity so there isn't always a "right" answer, and the differences are probably marginal in the overall frame cost.
In general "packed" isn't worth using - keeping things naturally aligned ensures best access efficiency, and if you have enough uniforms that you start getting cache pressure that would benefit from packing then you're doing something wrong ...
In general we recommend only using buffer objects for uniforms which are created on the GPU;
Something like storing the result of a compute shader in a uniform buffer object?
push constants in Vulkan) give us far more freedom to handle them in the fastest possible way for any given shader.
Interesting! I would have thought that Uniform Buffer Objects would help restore data more quickly when switching between programs, or help the GPU pack shared data.
Does it generate indirect accesses from the GPU? Something like:
[GPU Program]←[UBO ID to Memory Pointer table]
[GPU Program]→[Buffer]
?
That said, we accept it's a trade off with application complexity so there isn't always a "right" answer, and the differences are probably marginal in the overall frame cost. In general "packed" isn't worth using - keeping things naturally aligned ensures best access efficiency, and if you have enough uniforms that you start getting cache pressure that would benefit from packing then you're doing something wrong ...
Is std140 also problematic in that regard, since it forces alignment constraints?
Yes, exactly that.
I would have thought that Uniform Buffer Objects would help restore data more quickly when switching between programs, or help the GPU pack shared data.
It's a trade off between CPU-side cost and the driver's freedom to pack things into the hardware in the most efficient way.
I can't really talk about the hardware internal microarchitecture on a public forum, but the high-level answer is that we can treat direct set uniforms more efficiently than a buffer load (which is really just a load from memory via a pointer, just like a load in a CPU-side C program).
The alignment requirements for std140 are deliberately defensive, so they already meet the alignment requirements of the hardware; the layout inserts whatever padding is necessary to ensure that.
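To make the padding point concrete, here is a minimal sketch in C of how std140 places a handful of scalar, vector and matrix members. It only covers float, vec2, vec3, vec4 and mat4 (no arrays or nested structs), and the names std140_type and std140_layout are made up for this illustration; they are not part of any OpenGL ES API.

```c
#include <stddef.h>

/* Hypothetical helper types for illustration only (not an OpenGL ES API). */
typedef enum {
    STD140_FLOAT, STD140_VEC2, STD140_VEC3, STD140_VEC4, STD140_MAT4
} std140_type;

/* Base alignment in bytes under std140 for the types above. */
static size_t std140_align(std140_type t) {
    switch (t) {
    case STD140_FLOAT: return 4;
    case STD140_VEC2:  return 8;
    default:           return 16; /* vec3, vec4, and each column of a mat4 */
    }
}

/* Bytes occupied by the type itself (trailing padding not included). */
static size_t std140_size(std140_type t) {
    switch (t) {
    case STD140_FLOAT: return 4;
    case STD140_VEC2:  return 8;
    case STD140_VEC3:  return 12;
    case STD140_VEC4:  return 16;
    default:           return 64; /* mat4: four vec4 columns */
    }
}

/* Round offset up to the next multiple of align (align is a power of two). */
static size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Fill offsets[] with the byte offset of each member, in declaration order,
 * and return the offset just past the last member. */
size_t std140_layout(const std140_type *members, size_t count, size_t *offsets) {
    size_t cursor = 0;
    for (size_t i = 0; i < count; i++) {
        cursor = align_up(cursor, std140_align(members[i]));
        offsets[i] = cursor;
        cursor += std140_size(members[i]);
    }
    return cursor;
}
```

For a block declared as float, vec3, float, vec2, mat4, this gives offsets 0, 16, 28, 32 and 48: the vec3 is pushed to a 16-byte boundary, the following float fits in the 4 bytes left after it, and the mat4 starts on the next 16-byte boundary.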
HTH, Pete
Just one last question!
When searching for OpenGL ES 3.1 on the web, I found slides from the ARM Mali team presenting the new ES 3.1 features. One of these features is explicit uniform locations.
As I understand it, I can define a specific ID for each uniform in the shader and then just reference that static ID on the API (C/C++) side, like this:
glUniform*(myy_program_uniform_id, ...)
The question is: does it have the same impact as forcing layouts?
The explicit locations set in the shader define the bind point name used for each symbol; without this explicit setting they can be set arbitrarily by the driver and different drivers will do different things with them. All the explicit location support means is that if you set the location in the shader you no longer need to call glGetUniformLocation at run-time, you can just use the pre-defined location value set in the shader source code.
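As a rough sketch of what that looks like in practice: the shader source below fixes two uniform locations with layout(location = N), and the C side mirrors them in an enum so no glGetUniformLocation call is needed. The uniform names, the enum and myy_set_uniforms are invented for this example, and the header check is only there so the sketch also compiles on machines without the GLES headers.

```c
#include <stddef.h>
#include <string.h>

/* Pull in the ES 3.1 header only when it is available on this machine. */
#if defined(__has_include)
#  if __has_include(<GLES3/gl31.h>)
#    include <GLES3/gl31.h>
#  endif
#endif

/* Hypothetical location constants; they must match the shader source below. */
enum {
    MYY_UNIFORM_SCALE = 0,
    MYY_UNIFORM_TINT  = 1
};

/* Hypothetical fragment shader using ES 3.1 explicit uniform locations. */
static const char *const myy_fragment_src =
    "#version 310 es\n"
    "precision mediump float;\n"
    "layout(location = 0) uniform float scale;\n"
    "layout(location = 1) uniform vec4 tint;\n"
    "out vec4 frag_color;\n"
    "void main() { frag_color = tint * scale; }\n";

#ifdef GL_ES_VERSION_3_1
/* Set both uniforms on an already linked program, using the locations
 * fixed in the shader instead of querying them at run-time. */
void myy_set_uniforms(GLuint program) {
    glUseProgram(program);
    glUniform1f(MYY_UNIFORM_SCALE, 2.0f);
    glUniform4f(MYY_UNIFORM_TINT, 1.0f, 0.5f, 0.25f, 1.0f);
}
#endif
```

Note that this only changes how the application finds the location; the uniforms are still plain direct-set uniforms, not members of a uniform block.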