
glMapBufferRange and glUnmapBuffer performance on the Mali-T880

Hello all,

I'm currently using glMapBufferRange to update a triple-buffered UBO for instanced rendering, but calling glUnmapBuffer takes ~0.5 ms of CPU time, even though I call glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT set and use fences. Is it normal for glUnmapBuffer to take this long?
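For reference, the slot arithmetic behind a triple-buffered UBO like the one described can be sketched as below. This is a minimal sketch, not the poster's code: the helper names (`align_up`, `slot_offset`, `ring_size`) are hypothetical, and the GL calls for each frame appear only as comments so the arithmetic can be compiled and checked without a GL context. It assumes each slot is padded to `GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT`, which glBindBufferRange requires for UBO offsets.

```c
#include <stddef.h>

/* Number of per-frame regions in the ring (triple buffering). */
#define RING_SLOTS 3

/*
 * Per frame, a fenced ring update would look roughly like this (real GLES3
 * calls, shown as comments only):
 *   1. glClientWaitSync(fence[slot], 0, timeout) so the GPU is done with the slot;
 *   2. p = glMapBufferRange(GL_UNIFORM_BUFFER, offset, slot_size,
 *                           GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
 *   3. write the uniforms through p, then glUnmapBuffer(GL_UNIFORM_BUFFER);
 *   4. glBindBufferRange(GL_UNIFORM_BUFFER, binding, ubo, offset, slot_size); draw;
 *   5. glDeleteSync(fence[slot]);
 *      fence[slot] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
 * `align` would come from glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...).
 */

/* Round `size` up to the next multiple of `align` (align must be > 0). */
static size_t align_up(size_t size, size_t align) {
    return ((size + align - 1) / align) * align;
}

/* Byte offset of the slot used for `frame`; slots repeat every RING_SLOTS frames. */
static size_t slot_offset(unsigned long frame, size_t slot_size, size_t align) {
    return (size_t)(frame % RING_SLOTS) * align_up(slot_size, align);
}

/* Total size to allocate for the whole ring (e.g. with glBufferData). */
static size_t ring_size(size_t slot_size, size_t align) {
    return RING_SLOTS * align_up(slot_size, align);
}
```

For example, with a 1000-byte slot and a 256-byte alignment, each slot is padded to 1024 bytes, the ring is 3072 bytes, and frame 3 reuses offset 0. Because the fence for a slot is waited on before it is mapped again, the map itself can safely skip driver-side synchronization.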

In addition, I found that setting GL_MAP_INVALIDATE_RANGE_BIT makes the glMapBufferRange call spike to 10-20 ms of CPU time, which is very strange because I would have expected it to improve performance. I also verified in MGD (Mali Graphics Debugger) that I wasn't remapping a previously invalidated range. Is it also normal for this bit to cause such a drastic slowdown?

  • Actually, out of curiosity, why is there ambiguity between the invalidate-range bit and the unsynchronized bit? Or rather, how does that ambiguity lead to such a long CPU synchronization? I think I may be misunderstanding the driver implications of the invalidate bit, so some clarification would be incredibly useful.

  • On an immediate-mode renderer with separate graphics memory, applications might have expected "MAP_INVALIDATE_RANGE_BIT | MAP_WRITE_BIT" to create a new buffer chunk that is later patched into the real underlying buffer before rendering. Using UNSYNCHRONIZED_BIT to overwrite the buffer contents might then, from the application's point of view, corrupt rendering if it expects a patch to be created and applied later. The current drivers are defensive to ensure correct behavior in this scenario and create a resource ghost (see https://community.arm.com/graphics/b/blog/posts/mali-performance-6-efficiently-updating-dynamic-resources), so mapping part of a buffer causes a full copy of the underlying buffer to be taken (minus the invalidated region, of course).

    Khronos has since clarified that this is not expected behavior and that applications relying on it would be out of spec, so we should be able to patch the buffer in place without creating a ghost. This change is planned but not yet available.
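To make the cost difference described in the reply concrete, here is a toy model of it. It is not Mali driver code, and `toy_buffer` and `toy_map_range` are hypothetical names. Following the description above, it assumes that combining GL_MAP_INVALIDATE_RANGE_BIT with GL_MAP_UNSYNCHRONIZED_BIT creates a ghost (fresh storage plus a copy of everything outside the mapped range), while any other flag combination is patched in place. The bit values match those in `GLES3/gl3.h`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Values as defined in GLES3/gl3.h; guarded in case the real header is included. */
#ifndef GL_MAP_WRITE_BIT
#define GL_MAP_WRITE_BIT            0x0002
#endif
#ifndef GL_MAP_INVALIDATE_RANGE_BIT
#define GL_MAP_INVALIDATE_RANGE_BIT 0x0004
#endif
#ifndef GL_MAP_UNSYNCHRONIZED_BIT
#define GL_MAP_UNSYNCHRONIZED_BIT   0x0020
#endif

/* Hypothetical stand-in for a buffer object's backing storage (heap-allocated). */
typedef struct {
    unsigned char *data;
    size_t size;
} toy_buffer;

/*
 * Toy model of mapping [offset, offset + length) on the drivers described above.
 * INVALIDATE_RANGE + UNSYNCHRONIZED: allocate a ghost and copy everything outside
 * the range into it. Any other combination: patch in place, no extra copy.
 * Returns the number of bytes copied, or (size_t)-1 on a bad range or failed
 * allocation.
 */
static size_t toy_map_range(toy_buffer *buf, size_t offset, size_t length,
                            unsigned int flags) {
    if (offset > buf->size || length > buf->size - offset)
        return (size_t)-1;
    if (!((flags & GL_MAP_INVALIDATE_RANGE_BIT) &&
          (flags & GL_MAP_UNSYNCHRONIZED_BIT)))
        return 0;                 /* patched in place */
    if (buf->size == 0)
        return 0;                 /* nothing to copy */

    unsigned char *ghost = malloc(buf->size);
    if (!ghost)
        return (size_t)-1;
    size_t tail = offset + length;
    memcpy(ghost, buf->data, offset);                          /* prefix */
    memcpy(ghost + tail, buf->data + tail, buf->size - tail);  /* suffix */

    /* A real driver keeps the old storage alive until the GPU is done with it;
       this toy simply frees it and swaps in the ghost. */
    free(buf->data);
    buf->data = ghost;
    return buf->size - length;
}
```

With a 1 KiB buffer and a 128-byte update, the ghost path copies 896 bytes while the in-place path copies none. Because the copy scales with the whole buffer rather than with the mapped range, this model suggests why adding GL_MAP_INVALIDATE_RANGE_BIT to a small, fenced, unsynchronized update could slow the map down instead of speeding it up on these drivers.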