GL_EXT_draw_elements_base_vertex support?

This very nice desktop OpenGL extension was recently ratified as an OpenGL ES extension.

I would like to request support for this extension, since it would improve the performance of my application by a very large margin on Mali hardware.

It's in the Khronos registry here: https://www.khronos.org/registry/gles/extensions/EXT/EXT_draw_elements_base_vertex.txt

  • My application is the Dolphin GameCube/Wii emulator. On desktop we are able to use features up to OpenGL 4.4, which we strongly recommend due to the performance increases that buffer_storage gives us.

    We have supported OpenGL ES 3.0 for nearly two years now, ever since Intel gained support for the standard in early 2013.

    The main issue we run into is that we are emulating a console GPU made by ArtX, and games tend to abuse this GPU heavily. It is a fixed-function pipeline GPU that is exceedingly flexible. Games have full access to the hardware, which allows them to interface directly with the GPU's registers, so they can switch state immediately and with exceedingly low overhead.

    The main issue with OpenGL ES 3.x is that there aren't efficient ways to upload data to the GPU. Whenever the emulated GPU changes state we have to flush, pushing the new state to the GPU, since we are emulating the fixed-function GPU with shaders.

    In particular, a lot of games tend to draw a handful of vertices, switch state, and then draw some more. This makes the number of draw calls grow very large; we can end up calling glBuffer{Sub,}Data over 5000 times in a single frame.

    With base_vertex we can avoid a lot of the driver synchronization caused by those memory updates by using the element buffer as a ring buffer, which allows glMapBufferRange with the unsynchronized flag to be used. As long as we don't stomp on data that the GPU is still using we are fine, and we lose the overhead the driver incurs on every glBuffer{Sub,}Data call.

    If we have buffer_storage support we can take this even further by mapping all of our buffers (elements, uniforms, etc.) once at initialization, using them all as ring buffers, and storing multiple frames of data as we go along. This removes the map/unmap API-call overhead of glMapBufferRange, which has a larger performance impact than one would expect given how often we have to update our buffers. On desktop AMD hardware, for example, glMapBufferRange gives us 52 FPS where buffer_storage gives us 80 FPS.

    Even if the only thing base_vertex improves is CPU utilization, that is definitely a win: emulating a GPU on mobile ARM hardware is quite heavy, and anything that lowers CPU usage to improve battery life and speed is great to have.
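The unsynchronized ring-buffer scheme described above can be sketched in a few lines of bookkeeping. This is a simplified, hypothetical model (the struct and names are mine, not Dolphin's); the GL calls it would drive appear only in comments, since they need a live context:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for an element-buffer ring allocator.
// In the real renderer, the returned offset would be handed to
//   glMapBufferRange(GL_ELEMENT_ARRAY_BUFFER, offset, bytes,
//                    GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT)
// and the draw would use glDrawElementsBaseVertexEXT() with the
// matching base vertex. GL_MAP_UNSYNCHRONIZED_BIT means the driver
// performs no sync at all -- not stomping on in-flight data is on us.
struct RingBuffer {
    size_t size;  // total buffer size in bytes
    size_t head;  // next free byte

    // Reserve `bytes` from the ring, wrapping to the start when the
    // request would run off the end.
    size_t Reserve(size_t bytes) {
        assert(bytes <= size);
        if (head + bytes > size)
            head = 0;  // wrap; a real version fences before reuse
        size_t offset = head;
        head += bytes;
        return offset;
    }
};
```

For example, three 512-byte reservations from a 1024-byte ring come back at offsets 0, 512, and then 0 again after the wrap.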
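The persistent-mapping path with buffer_storage can be sketched the same way. Again a simplified, hypothetical model (a single fence position rather than real per-frame fence tracking); the EXT_buffer_storage setup it assumes is shown only in comments:

```cpp
#include <cstddef>

// Sketch of a persistently mapped ring buffer. Assumed one-time setup
// (GL_EXT_buffer_storage on ES):
//   glBufferStorageEXT(target, size, nullptr,
//       GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
//   void* ptr = glMapBufferRange(target, 0, size, /* same flags */ 0);
// The pointer then stays valid for the buffer's lifetime, so every
// upload is a plain memcpy into ptr + offset -- no map/unmap per draw.
struct PersistentRing {
    size_t size;          // total bytes, sized for several frames
    size_t head = 0;      // next free byte
    size_t gpu_tail = 0;  // start of the region the GPU may still be
                          // reading; advanced when glClientWaitSync()
                          // reports an old frame's glFenceSync() done

    // Returns a write offset, or `size` as a sentinel meaning "would
    // overwrite in-flight data, wait on the fence first". Simplified:
    // a real allocator tracks one fence per frame in flight.
    size_t Reserve(size_t bytes) {
        if (head + bytes > size)
            head = 0;  // wrap around
        if (head < gpu_tail && head + bytes > gpu_tail)
            return size;  // would stomp on data the GPU is using
        size_t offset = head;
        head += bytes;
        return offset;
    }
};
```

The key design point is that the synchronization cost moves from every upload (implicit in glBuffer{Sub,}Data) to a rare explicit fence wait when the writer catches up to the GPU.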
