
glMapBufferRange and glUnmapBuffer performance on the Mali-T880

Hello all,

I'm currently using glMapBufferRange to update a triple-buffered UBO for instanced rendering, but I'm noticing that glUnmapBuffer is taking ~0.5 ms of CPU time, despite mapping with GL_MAP_UNSYNCHRONIZED_BIT set and guarding reuse with fences. Is it normal for the glUnmapBuffer call to take this long?

In addition, I found that setting GL_MAP_INVALIDATE_RANGE_BIT spikes the glMapBufferRange call to 10-20 ms on the CPU, which is surprising because I would have expected it to improve performance. I also verified in MGD that I wasn't remapping a previously invalidated range. Is it also normal for this bit to cause such a drastic slowdown?
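For reference, here is a minimal sketch of the update pattern I'm describing, assuming an ES 3.x context; the names (kRegions, regionSize, update_ubo) and sizes are illustrative, not my actual code:

#include <GLES3/gl3.h>
#include <stdint.h>
#include <string.h>

#define kRegions 3                       /* triple-buffered regions in one UBO */

static GLuint ubo;
static GLsync fences[kRegions];
static GLsizeiptr regionSize = 64 * 1024; /* per-frame region, example size */
static int region;                        /* region used this frame */

static void update_ubo(const void *data, GLsizeiptr size)
{
    /* Wait on the fence guarding this region before reusing it. */
    if (fences[region]) {
        glClientWaitSync(fences[region], GL_SYNC_FLUSH_COMMANDS_BIT, UINT64_MAX);
        glDeleteSync(fences[region]);
        fences[region] = 0;
    }

    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    void *ptr = glMapBufferRange(GL_UNIFORM_BUFFER,
                                 region * regionSize, size,
                                 GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
    memcpy(ptr, data, size);
    glUnmapBuffer(GL_UNIFORM_BUFFER);     /* ~0.5 ms of CPU time observed here */

    /* ... issue the instanced draws that read this region ... */

    /* Fence the region so we know when the GPU has finished with it. */
    fences[region] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    region = (region + 1) % kRegions;
}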

  • cedega said:
    Is it normal for the glUnmapBuffer call to take this long?

    Yes, it can be quite slow depending on the platform and whether we actually have to release the virtual address range. If I remember correctly, 32-bit (ARMv7) applications will unmap quite aggressively, whereas 64-bit (ARMv8) applications have enough VA space that we can leave things mapped and therefore avoid the need for CPU-side cache maintenance and MMU updates.

    cedega said:
    I found that setting the GL_MAP_INVALIDATE_RANGE_BIT spikes the glMapBufferRange

    Yes, this is a known issue in our drivers. There was some ambiguity in the specification about which bit takes precedence - the invalidate or the unsynchronized - so we currently play it safe. The ambiguity has now been clarified in the standards group (unsynchronized should take precedence), but we've not yet released a driver with the fix implemented. In general on Mali you should be able to safely drop the INVALIDATE bit (see the sketch after this reply); we're a unified memory architecture, so there is, for example, no need to copy from the graphics card into CPU-visible memory.

    Cheers, 
    Pete
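
A small sketch of the flag change being suggested, assuming the UBO is already bound; the function name and parameters are illustrative, not from the thread:

#include <GLES3/gl3.h>

/* Map a region of the currently bound UBO for writing. On current Mali drivers,
 * omit GL_MAP_INVALIDATE_RANGE_BIT (combining it with UNSYNCHRONIZED hits the
 * slow path described above) and rely on fences / multi-buffering instead. */
static void *map_ubo_region(GLintptr offset, GLsizeiptr size)
{
    GLbitfield flags = GL_MAP_WRITE_BIT
                     | GL_MAP_UNSYNCHRONIZED_BIT;
                  /* | GL_MAP_INVALIDATE_RANGE_BIT  <- omit on current Mali drivers */
    return glMapBufferRange(GL_UNIFORM_BUFFER, offset, size, flags);
}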
