NVIDIA is ratifying this extension to work with OpenGL ES 3.x, and they are exposing it in their latest video drivers (which are only available on the X1 development platform).
My application takes advantage of the desktop OpenGL variant (GL_ARB_buffer_storage) to dramatically reduce the CPU overhead of calling into the OpenGL API.
This also allows us to decode our data directly into GPU buffers while the GPU renders from them, keeping multiple frames of data in flight so that we never overwrite information still in use. For UMA systems like those running Mali GPUs this is a big win for us, especially given how little bandwidth is available on these devices compared to what we have on desktops.
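For illustration, here is a minimal sketch of that pattern, assuming an ES 3.x context that exposes GL_EXT_buffer_storage (glBufferStorageEXT typically has to be loaded through eglGetProcAddress); the region size, frame count, and function names are made up, and the per-region fencing is only indicated in a comment:

```c
#include <GLES3/gl3.h>
#include <GLES2/gl2ext.h>   /* GL_EXT_buffer_storage tokens */
#include <string.h>

#define NUM_FRAMES  3            /* regions kept in flight */
#define FRAME_BYTES (1 << 20)    /* illustrative region size */

static void *persistent_base;    /* mapped once, stays valid */

/* One-time setup, assuming the current context exposes
   GL_EXT_buffer_storage. */
void create_persistent_vbo(GLuint *vbo)
{
    const GLbitfield flags = GL_MAP_WRITE_BIT |
                             GL_MAP_PERSISTENT_BIT_EXT |
                             GL_MAP_COHERENT_BIT_EXT;
    glGenBuffers(1, vbo);
    glBindBuffer(GL_ARRAY_BUFFER, *vbo);
    glBufferStorageEXT(GL_ARRAY_BUFFER, NUM_FRAMES * FRAME_BYTES, NULL, flags);
    persistent_base = glMapBufferRange(GL_ARRAY_BUFFER, 0,
                                       NUM_FRAMES * FRAME_BYTES, flags);
}

/* Per frame: decode straight into the region the GPU is not reading.
   A glFenceSync/glClientWaitSync pair per region (omitted here)
   guards against overwriting data still in flight. */
void upload_frame(unsigned frame_index, const void *src, size_t bytes)
{
    char *dst = (char *)persistent_base +
                (frame_index % NUM_FRAMES) * FRAME_BYTES;
    memcpy(dst, src, bytes);
}
```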
Hopefully you'll think about implementing support for this extension.
Hello,
am I right in assuming that your biggest interest in this extension is the feature of allowing persistent mappings?
Depending on how much data you generate per frame, the overhead of mapping/unmapping either regions (with the right flags) or even of rebinding a different buffer for each frame might be acceptable on our current implementation, and it gives an advantage over glBufferSubData calls.
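To illustrate, something along these lines (a rough sketch only; the helper names and the two-buffer rotation are illustrative, not a recommendation for any specific title):

```c
#include <GLES3/gl3.h>
#include <string.h>

/* Option A: rotate between two pre-allocated buffers so each frame
   writes into one the GPU has already finished reading. */
GLuint pick_frame_buffer(const GLuint bufs[2], unsigned frame_index)
{
    return bufs[frame_index & 1];
}

/* Option B: map only the region being rewritten, with flags that let
   the driver skip preserving the old contents of that range. */
void update_region(GLuint vbo, GLintptr offset,
                   const void *src, GLsizeiptr bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    void *dst = glMapBufferRange(GL_ARRAY_BUFFER, offset, bytes,
                                 GL_MAP_WRITE_BIT |
                                 GL_MAP_INVALIDATE_RANGE_BIT);
    if (dst) {
        memcpy(dst, src, bytes);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}
```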
We cannot comment on further plans or timelines for features.
Cheers,
Jörg
Can this extension be raised with the product management team?
I'm also going with the assumption that persistent mapping is what the OP is interested in. Though it's a nice feature to have, developers were doing without persistent mapping before and after its introduction. The explanation given above works fine, as I have used those techniques myself; using a ring buffer or orphaning won't necessarily be slower than a persistent mapping (mileage may vary, and one would have to profile both methods to see). A chapter in OpenGL Insights covers experiments done with a few of the strategies listed above (for OpenGL rather than OpenGL ES), and you would be surprised. Persistent mapping is not a panacea either, as one still has to worry about synchronization, and that also adds overhead. Last but not least, I don't think designing an application around a single extension is good design practice, as it limits the number of devices the application can run on, unless the core of the design is to run on just system X.
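For reference, the orphaning idiom mentioned above looks roughly like this (a sketch only; the helper name and usage hint are illustrative):

```c
#include <GLES3/gl3.h>

/* Orphaning: re-specifying the data store lets the driver hand back
   fresh memory instead of stalling until the GPU is done reading the
   old contents. */
void upload_orphaned(GLuint vbo, const void *src, GLsizeiptr bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, bytes, NULL, GL_STREAM_DRAW); /* orphan */
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, src);            /* refill */
}
```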
Since we are a performance-oriented application, we support buffer updating in multiple ways, depending on how efficient each is on different platforms.
Currently we support six different ways of updating buffers.
The most efficient path for us is when the driver exposes GL_{ARB, OES, EXT}_draw_elements_base_vertex alongside GL_{ARB, EXT}_buffer_storage.
If the driver doesn't expose that path, we fall back to other ways of updating our buffers: typically glMapBufferRange, glBufferSubData, or glBufferData, in descending order from most efficient to least efficient.
Of course, each of these methods is used under varying circumstances; for example, if the driver doesn't expose base_vertex, then we can only update our buffers with glBuffer{Sub,}Data.
Then we take it further and determine whether we are on a deferred renderer; if we are, we fall back to glBufferData only, unless the driver supports base_vertex, in which case it becomes more efficient to update the buffers with glMapBufferRange with the unsynchronized flag set.
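In rough pseudocode, that selection logic amounts to something like the following (the function, enum, and capability flags here are simplified stand-ins for illustration, not our actual code):

```c
/* Hypothetical upload-path selection mirroring the fallback order
   described above. */
typedef enum {
    UPLOAD_PERSISTENT_MAP,  /* buffer_storage + base_vertex               */
    UPLOAD_MAP_UNSYNC,      /* glMapBufferRange, GL_MAP_UNSYNCHRONIZED_BIT */
    UPLOAD_BUFFERSUBDATA,   /* glBufferSubData                            */
    UPLOAD_BUFFERDATA       /* glBufferData only, last resort             */
} upload_path;

upload_path pick_upload_path(int has_base_vertex, int has_buffer_storage,
                             int is_deferred_renderer)
{
    if (has_base_vertex && has_buffer_storage)
        return UPLOAD_PERSISTENT_MAP;   /* fastest path */
    if (is_deferred_renderer)
        return has_base_vertex ? UPLOAD_MAP_UNSYNC : UPLOAD_BUFFERDATA;
    if (has_base_vertex)
        return UPLOAD_MAP_UNSYNC;
    return UPLOAD_BUFFERSUBDATA;
}
```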
Again, we support all these methods to make sure we get the most efficient buffer updating possible, and yes, persistent mapping gives us quite an advantage, since not having to call into the driver constantly lowers CPU overhead quite a bit. We do a very large number of buffer updates, which can definitely leave us bound by API overhead.
This definitely shows on the Nexus 9, where the drivers have been restricted to a GLES 3.1 subset without base_vertex and buffer_storage, but can be hacked around to still use those functions.
Done.
This doesn't imply any commitment, timeline, etc., but at least it should be on the radar.
Hi,
might I ask for an update on this feature request? Three years later, we are still waiting. I see that this isn't on the roadmap of any big game or benchmark, but it is a big performance boost for drawing dynamic content, so it would help a lot for performance-critical emulators.
To be honest, I can't confirm that the overhead of mapping is negligible. Mapping unsynchronized serializes the driver threads, and mapping with the invalidate-range flag effectively degenerates into a glBufferSubData call. Please keep in mind that we are talking about 1000+ updates per frame. Right now, the fastest way to upload data is to *always* call glBufferData. Unsynchronized mapping is slower by a factor of 5, with most of its time spent in gles_vertexp_bb_neon_transform_and_produce_clip_bits. Everything else stalls the GPU and is even slower.
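Concretely, the only upload variant that performs acceptably for us is a full re-specification per update, along these lines (helper name is illustrative):

```c
#include <GLES3/gl3.h>

/* The variant measured fastest on this driver: re-specify the store
   with the new data on each of the ~1000+ per-frame updates. */
void upload_respecify(GLuint vbo, const void *src, GLsizeiptr bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, bytes, src, GL_STREAM_DRAW);
}
```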
We could of course queue all rendering calls and buffer updates and submit them at once, but this doubles the memory bandwidth requirement and needs a lot of logic for serializing the state changes and draw calls. That is exactly what your GL frontend is supposed to do, isn't it?
Sad fact: Mali and OSX are the last vendors not supporting this extension; even PVR recently gained support for it. In the end, we just discourage everyone from buying any Mali product and suggest picking a Snapdragon instead. Let's hope for a benchmark that tests buffer upload overhead; I fear you won't ever think about implementing it otherwise...
Just my two cents, from a disappointed developer
Bye
Do you have an update on this?