NVIDIA is ratifying this extension for OpenGL ES 3.x and exposes it in their latest video drivers (which are currently only available for the X1 development platform).
My application uses the desktop OpenGL variant (GL_ARB_buffer_storage) to dramatically reduce the CPU overhead of calling into the OpenGL API.
It also lets us decode our data directly into GPU buffers while the GPU is rendering from them. Of course, we keep multiple frames of data so that we never overwrite anything still in flight. On UMA systems such as those with Mali GPUs this is a big win for us, especially given how little memory bandwidth these have compared to desktops.
Hopefully you'll consider implementing support for this extension.
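The "multiple frames of data" scheme can be sketched as a frame ring: one persistently mapped buffer split into per-frame regions, with each frame's small updates bump-allocated inside its own region. This is a minimal sketch under stated assumptions. FrameRing, ring_begin_frame, ring_push, FRAMES_IN_FLIGHT and REGION_SIZE are hypothetical names, and plain memory stands in for the mapped storage so the code runs without a GL context. In real code (assuming the ES extension in question is GL_EXT_buffer_storage), the storage would come from glBufferStorageEXT plus glMapBufferRange with GL_MAP_PERSISTENT_BIT_EXT, and each frame would wait on a fence (glFenceSync / glClientWaitSync) before reusing its region.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: three frames in flight, 4 KiB of per-frame data. */
#define FRAMES_IN_FLIGHT 3
#define REGION_SIZE 4096

typedef struct {
    unsigned char *base;  /* start of the (persistently mapped) storage */
    size_t region_size;   /* bytes reserved for each frame */
    size_t region_offset; /* offset of the current frame's region */
    size_t cursor;        /* bytes already used in the current region */
} FrameRing;

/* Selects the region for `frame`. In GL code, this is where you would wait on
 * the fence recorded FRAMES_IN_FLIGHT frames earlier, so the GPU has finished
 * reading the region before the CPU overwrites it. */
static void ring_begin_frame(FrameRing *r, unsigned frame) {
    r->region_offset = (size_t)(frame % FRAMES_IN_FLIGHT) * r->region_size;
    r->cursor = 0;
}

/* Copies `size` bytes into the current region at an `align`-byte boundary and
 * returns the absolute buffer offset to use in draw calls, or (size_t)-1 if
 * the region is full. No GL call is made per update. */
static size_t ring_push(FrameRing *r, const void *data, size_t size, size_t align) {
    assert(align > 0);
    size_t start = (r->cursor + align - 1) / align * align;
    if (start > r->region_size || size > r->region_size - start)
        return (size_t)-1;
    memcpy(r->base + r->region_offset + start, data, size);
    r->cursor = start + size;
    return r->region_offset + start;
}
```

With three regions of 4096 bytes, frame 0 writes at offsets starting from 0, frame 1 from 4096, frame 2 from 8192, and frame 3 reuses the first region once its fence has signalled. Each update is just a memcpy and a cursor bump.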
Hello,
am I right in assuming that your main interest in this extension is persistent mapping?
Depending on how much data you generate per frame, the overhead of mapping and unmapping either buffer regions (with the appropriate flags) or even of binding a different buffer for each frame might be acceptable on our current implementation, and either approach has an advantage over glBufferSubData calls.
We cannot comment on further plans or timelines for features.
Cheers,
Jörg
Can this extension be raised with the product management team?
Done.
This doesn't imply any commitment, timeline, etc., but at least it should be on the radar.
Hi,
might I ask for an update on this feature request? Three years later, we are still waiting. I realize this isn't on the roadmap of any major game or benchmark, but it gives a big performance boost when drawing dynamic content, so it would help performance-critical emulators a lot.
To be honest, I can't confirm that the mapping overhead is negligible. Unsynchronized mapping serializes the driver threads, and mapping with an invalidated range turns into a bufferSubData call. Please keep in mind that we are talking about 1000+ updates per frame. Right now, the fastest way to upload data is to *always* call glBufferData. Unsynchronized mapping is about 5 times slower, with most of its time spent in gles_vertexp_bb_neon_transform_and_produce_clip_bits. Every other approach stalls the GPU and is even slower.
We could of course queue all rendering calls and buffer updates and submit them at once. But this doubles the memory bandwidth requirement and needs a lot of logic for serializing the state changes and draw calls. Isn't that exactly what your GL frontend is supposed to do?
Sadly, Mali and OS X are the last platforms not supporting this extension. Even PowerVR recently gained support for it. In the end, we simply advise everyone against buying Mali products and suggest picking a Snapdragon instead. Let's hope for a benchmark that tests buffer upload overhead; I fear you will never consider implementing it otherwise...
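The queue-and-submit workaround can be sketched as follows, mainly to show where the doubled bandwidth comes from: every vertex is copied once into a CPU-side arena while recording, and again when the whole arena is uploaded at submit time. This is a hypothetical illustration, not a drop-in solution. FrameQueue, DrawCmd, queue_draw, queue_upload_all, queue_reset, MAX_DRAWS and ARENA_SIZE are made-up names, only vertex data is recorded (state changes are not), and a plain byte array stands in for the GL buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacities for one frame's worth of recorded draws. */
#define MAX_DRAWS 1024
#define ARENA_SIZE (1 << 16)

typedef struct {
    size_t offset;       /* byte offset of this draw's vertices in the arena */
    size_t vertex_count; /* number of vertices to draw */
} DrawCmd;

typedef struct {
    DrawCmd draws[MAX_DRAWS];
    size_t num_draws;
    unsigned char arena[ARENA_SIZE]; /* CPU-side copy of all vertex data */
    size_t arena_used;
} FrameQueue;

/* Records a draw and copies its vertices into the arena (first copy).
 * Returns 0 if the queue or the arena is full. */
static int queue_draw(FrameQueue *q, const void *vertices,
                      size_t vertex_size, size_t vertex_count) {
    assert(vertex_size > 0);
    size_t bytes = vertex_size * vertex_count;
    if (q->num_draws == MAX_DRAWS || bytes > ARENA_SIZE - q->arena_used)
        return 0;
    memcpy(q->arena + q->arena_used, vertices, bytes);
    q->draws[q->num_draws].offset = q->arena_used;
    q->draws[q->num_draws].vertex_count = vertex_count;
    q->num_draws++;
    q->arena_used += bytes;
    return 1;
}

/* Uploads the whole arena in one go (second copy) and returns the byte count.
 * `gpu_buffer` stands in for the GL buffer; real code would make a single
 * glBufferData call here and then issue one draw per entry in q->draws,
 * using each entry's offset. */
static size_t queue_upload_all(const FrameQueue *q, unsigned char *gpu_buffer) {
    memcpy(gpu_buffer, q->arena, q->arena_used);
    return q->arena_used;
}

/* Clears the queue for the next frame. */
static void queue_reset(FrameQueue *q) {
    q->num_draws = 0;
    q->arena_used = 0;
}
```

This trades 1000+ small uploads for one large upload per frame, but it still pays for the extra copy and leaves state-change serialization to the application.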
Just my two cents, from a disappointed developer
Bye
Do you have an update on this?