Hello all,
I'm currently using glMapBufferRange to update a triple-buffered UBO for instanced rendering, but I'm noticing that the glUnmapBuffer call takes ~0.5ms of CPU time, despite calling glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT set and using fences. Is it normal for the glUnmapBuffer call to take this long?
In addition, I found that setting the GL_MAP_INVALIDATE_RANGE_BIT spikes the glMapBufferRange call to 10-20ms on the CPU, which is very strange because I would have expected it to improve performance. I also verified in MGD that I wasn't remapping a previously-invalidated range. Is it also normal for this bit to cause such drastic slowdowns?
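For readers following along, here is a minimal sketch of the offset arithmetic behind a triple-buffered UBO like the one described above. It is not the poster's actual code: the helper names `align_up` and `region_offset` are hypothetical, and it assumes the buffer is split into equal regions whose size is rounded up to the value reported by GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT. The GL calls appear only in comments, since they need a live context.

```c
#include <stddef.h>

/* Round size up to the next multiple of alignment (alignment must be > 0).
 * In a real renderer the alignment would come from
 * glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...). */
static size_t align_up(size_t size, size_t alignment)
{
    return ((size + alignment - 1) / alignment) * alignment;
}

/* Byte offset of the region a given frame writes to, for a buffer split
 * into region_count equally sized, aligned regions (3 for triple buffering). */
static size_t region_offset(unsigned long frame_index, unsigned region_count,
                            size_t region_size, size_t alignment)
{
    return (size_t)(frame_index % region_count) * align_up(region_size, alignment);
}

/* Per-frame usage (sketch only, not executed here):
 *   1. glClientWaitSync on the fence stored for this region, if any.
 *   2. glMapBufferRange(GL_UNIFORM_BUFFER, offset, region_size,
 *                       GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
 *   3. Write the instance data, then glUnmapBuffer(GL_UNIFORM_BUFFER).
 *   4. Issue the draw, then store glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0).
 */
```

With 1000 bytes of per-frame data and a 256-byte alignment, each region occupies 1024 bytes, so frames cycle through offsets 0, 1024 and 2048.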
On an immediate mode renderer with separate graphics memory, applications might have expected "MAP_INVALIDATE_RANGE_BIT | MAP_WRITE_BIT" to create a new buffer chunk that is later patched into the real underlying buffer before rendering. Using UNSYNCHRONIZED_BIT to overwrite the contents of the buffer in place might then, from the application's point of view, corrupt rendering if it expects a patch to be created and applied later. The current drivers are defensive to ensure correct behavior in this scenario, and trigger the creation of a resource ghost (see https://community.arm.com/graphics/b/blog/posts/mali-performance-6-efficiently-updating-dynamic-resources). As a result, a partial buffer mapping causes a full copy of the underlying buffer to be taken (minus the invalidated region, of course).
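To give a feel for why this scales badly, the copy implied by a ghost can be estimated as the buffer size minus the invalidated range. The snippet below is only that arithmetic, with a hypothetical helper name `ghost_copy_bytes`; it does not model anything else the driver does.

```c
#include <stddef.h>

/* Bytes copied when a partial mapping triggers a ghost, following the
 * description above: the whole buffer minus the invalidated region.
 * Returns 0 if the invalidated range covers the whole buffer or more. */
static size_t ghost_copy_bytes(size_t buffer_size, size_t invalidated_size)
{
    if (invalidated_size >= buffer_size) {
        return 0;
    }
    return buffer_size - invalidated_size;
}
```

For a 3072-byte triple-buffered UBO where one 1024-byte region is invalidated, that is 2048 bytes copied on each such mapping, and the cost grows with the total size of the buffer rather than with the size of the mapped range.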
Khronos has now clarified that this is not expected behavior and applications relying on this would be out of spec, so we should be able to patch the buffer in place without creating a ghost. This change is planned, just not available yet.