EGL Pixbuffer is slow

Note: This was originally posted on 25th February 2013 at http://forums.arm.com

Hi All,

I have a 1k*1k RGB texture that is rendered using a shader, and I want to copy the pixels to a buffer so that I can use OpenCV on it. I tried glReadPixels and it is very slow. I also tried an EGL pixel buffer (Pbuffer), and it has the same performance: about 7 FPS. I'm using a Mali-400 on an Exynos 4412.

Here is the code
  Please view it on Pastebin; there is a problem with posting code here.

http://pastebin.com/TwrtF0EG
  • Note: This was originally posted on 4th March 2013 at http://forums.arm.com

    Hi Ahmed,

    Just checking I understand: it sounds like you are rendering something using the GPU, and then attempting to perform some CV on it. This sounds like a fairly unusual use case. Can you give us some more detail on what you're trying to achieve, so we can better advise?

    Thanks,
    Chris
  • Note: This was originally posted on 5th March 2013 at http://forums.arm.com

    Thanks for your reply.

    I'm trying to do some GPU processing, such as thresholding an image, then get the result from the GPU to the CPU and run some OpenCV operations on it.

    Transferring the data from the GPU to the CPU using glReadPixels or an EGL pixel buffer is really slow.
  • Note: This was originally posted on 5th March 2013 at http://forums.arm.com

    Hi Ahmed,

    The slowdowns you are experiencing are unfortunately to be expected with your usage, as you are "breaking" the pipeline model that Mali GPUs implement. As a deferred renderer, frames are ideally not submitted to the GPU until a call to eglSwapBuffers is made. Therefore, in a normal use case such as a game, the CPU will be working on frame N whilst the GPU is processing frame N-1. In your use case, you are attempting to synchronously read back pixels from frame N to the CPU side, which implies that the frame up to that point must be submitted to the GPU, be processed, and then be read back. The CPU therefore has to wait for the GPU, after which the GPU has a huge pipeline bubble whilst it waits for more work from the CPU. It's easy to see why this is sub-optimal on deferred renderers.

    ReadPixels is a synchronous call, as you want to read back the state of rendering for the frame you are currently working on (which normally wouldn't have been submitted yet). CopyTexImage isn't necessarily synchronous, but in your case it is, because you're copying to a Pbuffer that is accessible from the CPU side. I'm asking around for an asynchronous method that would work for your use case, but my advice for now is that synchronous calls of any kind are bad for deferred renderers and come with a slowdown. This is described in the "Mali GPU Application Optimization Guide", available from Mali Developer Guides, which is well worth a read.

    Hope this helps,
    Chris
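
To put rough numbers on the pipeline bubble described above, here is a minimal arithmetic sketch. It assumes a deliberately simplified model: in the synchronous case, CPU work, GPU work and readback run back to back, while in the pipelined case they overlap and the slowest stage sets the frame rate. The function names and the millisecond figures in the usage note are hypothetical, not measurements from the Mali-400.

```c
#include <assert.h>

/* Frame rate when CPU work, GPU work and readback run strictly one after
 * another, as happens when every frame ends in a synchronous readback. */
double serialized_fps(double cpu_ms, double gpu_ms, double readback_ms)
{
    double total = cpu_ms + gpu_ms + readback_ms;
    assert(total > 0.0);
    return 1000.0 / total;
}

/* Frame rate when the three stages overlap across frames: throughput is
 * then limited by the slowest stage only (simplified model). */
double pipelined_fps(double cpu_ms, double gpu_ms, double readback_ms)
{
    double slowest = cpu_ms;
    if (gpu_ms > slowest)
        slowest = gpu_ms;
    if (readback_ms > slowest)
        slowest = readback_ms;
    assert(slowest > 0.0);
    return 1000.0 / slowest;
}
```

With made-up stage times of 40 ms CPU, 50 ms GPU and 35 ms readback, `serialized_fps` gives 8 FPS and `pipelined_fps` gives 20 FPS. The real gain depends on how well the stages actually overlap on a given driver.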
  • Note: This was originally posted on 5th March 2013 at http://forums.arm.com

    Hi Ahmed,

    It may be possible to render to a pixmap in one thread and have another thread wait on an EGL fence for the render operation to complete, at which point it can grab the data and pass it along to the CV processing. By reading back from pixmaps you are not causing a flush, and by waiting on the fence in another thread you are not blocking your render thread. This removes the synchronous readback, so your throughput and FPS should increase. Details on the fence mechanism can be found here: http://www.khronos.org/registry/gles/extensions/OES/EGL_KHR_fence_sync.txt. Ideally, the implementation should use a ring buffer of pixmaps so that you do not have the continuous overhead of creating and destroying pixmaps every frame, which is a general producer-consumer optimization.

    Hope this helps,
    Chris
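
The ring buffer of pixmaps suggested above could be tracked along these lines. This is only a bookkeeping sketch under stated assumptions: it makes no EGL calls, the `fence_signaled` flag stands in for querying a real EGL fence, the ring depth of 3 is arbitrary, and all type and function names are hypothetical. It is also single-threaded as written, so a real render thread and CV thread would need a mutex or atomics around it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 3  /* hypothetical depth; each slot stands for one pre-created pixmap */

typedef enum { SLOT_FREE, SLOT_IN_FLIGHT, SLOT_READING } SlotState;

typedef struct {
    SlotState state;
    bool fence_signaled;  /* stand-in for the status of a real EGL fence */
    int frame_id;
} Slot;

typedef struct {
    Slot slots[RING_SLOTS];
    size_t head;  /* next slot the renderer will draw into */
    size_t tail;  /* oldest slot the consumer has not released yet */
} PixmapRing;

void ring_init(PixmapRing *r)
{
    for (size_t i = 0; i < RING_SLOTS; i++) {
        r->slots[i].state = SLOT_FREE;
        r->slots[i].fence_signaled = false;
        r->slots[i].frame_id = -1;
    }
    r->head = 0;
    r->tail = 0;
}

/* Renderer: claim the next slot, or get -1 if the consumer has fallen behind. */
int ring_begin_render(PixmapRing *r, int frame_id)
{
    Slot *s = &r->slots[r->head];
    if (s->state != SLOT_FREE)
        return -1;
    s->state = SLOT_IN_FLIGHT;
    s->fence_signaled = false;
    s->frame_id = frame_id;
    int idx = (int)r->head;
    r->head = (r->head + 1) % RING_SLOTS;
    return idx;
}

/* Stand-in for the GPU finishing the frame and the fence becoming signaled. */
void ring_signal_fence(PixmapRing *r, int idx)
{
    assert(idx >= 0 && idx < RING_SLOTS);
    assert(r->slots[idx].state == SLOT_IN_FLIGHT);
    r->slots[idx].fence_signaled = true;
}

/* Consumer: take the oldest frame if its fence has signaled, else -1. */
int ring_try_acquire_read(PixmapRing *r)
{
    Slot *s = &r->slots[r->tail];
    if (s->state != SLOT_IN_FLIGHT || !s->fence_signaled)
        return -1;
    s->state = SLOT_READING;
    return (int)r->tail;
}

/* Consumer: hand the slot back so the renderer can reuse it. */
void ring_release(PixmapRing *r, int idx)
{
    assert(idx == (int)r->tail);
    assert(r->slots[idx].state == SLOT_READING);
    r->slots[idx].state = SLOT_FREE;
    r->tail = (r->tail + 1) % RING_SLOTS;
}
```

The render loop would call `ring_begin_render` before drawing each frame and insert a fence afterwards, and the CV side would call `ring_try_acquire_read`, process the pixels, then `ring_release`. Because the slots are allocated once and reused, nothing is created or destroyed per frame.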
  • Note: This was originally posted on 5th March 2013 at http://forums.arm.com

    Hi Ahmed,

    I should have asked initially, are you using Linux or Android?

    Thanks,
    Chris
  • Note: This was originally posted on 5th March 2013 at http://forums.arm.com

    Thanks for your reply.
    I'm using Linux.
  • Note: This was originally posted on 5th March 2013 at http://forums.arm.com

    Hi Ahmed,

    In that case, what I described above should work. Android would have made things slightly more complicated.

    Thanks,
    Chris
  • Note: This was originally posted on 6th March 2013 at http://forums.arm.com

    Do you mean a pixel buffer object or a pixmap?
    Could you please show some pseudocode?
  • Note: This was originally posted on 12th July 2013 at http://forums.arm.com

    Hi Ahmed,

    On GLES3, this can be done with PBOs bound to the GL_PIXEL_PACK_BUFFER target, which causes glReadPixels to write to that PBO instead of returning pixel data to the application, avoiding the pipeline stall and flush. The fence is used to signal to the application when this operation has completed and the buffer can be mapped to retrieve the pixel data. There is some sample code in the works, but unfortunately I can't give a date for when it will be available. For GLES2, I think the only option for now is pixmaps, but the fence is still supported.

    Hope this helps,
    Chris
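
Until official sample code exists, the GLES3 approach described above might look roughly like the following. This is a sketch under stated assumptions, not ARM's sample: it assumes an OpenGL ES 3.0 context is already current on the calling thread, an RGBA8 readback of the currently bound read framebuffer, and a single PBO. The `Readback` struct and the `readback_*` names are hypothetical. It cannot run without a live GLES3 context.

```c
#include <GLES3/gl3.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper state: one PBO plus the fence that guards it. */
typedef struct {
    GLuint  pbo;
    GLsync  fence;          /* NULL when no readback is in flight */
    GLsizei width, height;
} Readback;

void readback_init(Readback *rb, GLsizei width, GLsizei height)
{
    rb->width = width;
    rb->height = height;
    rb->fence = NULL;
    glGenBuffers(1, &rb->pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, rb->pbo);
    /* 4 bytes per pixel for GL_RGBA / GL_UNSIGNED_BYTE */
    glBufferData(GL_PIXEL_PACK_BUFFER, (GLsizeiptr)width * height * 4,
                 NULL, GL_STREAM_READ);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Call after drawing: queues the copy into the PBO and returns immediately. */
void readback_begin(Readback *rb)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, rb->pbo);
    /* With a PBO bound, the last argument is a byte offset into the PBO. */
    glReadPixels(0, 0, rb->width, rb->height,
                 GL_RGBA, GL_UNSIGNED_BYTE, (void *)0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    rb->fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}

/* Poll without blocking. Returns 1 if pixels were copied into dst,
 * 0 if the GPU has not finished yet, -1 on error. */
int readback_try_finish(Readback *rb, void *dst)
{
    if (rb->fence == NULL)
        return 0;

    GLenum status = glClientWaitSync(rb->fence, GL_SYNC_FLUSH_COMMANDS_BIT, 0);
    if (status == GL_TIMEOUT_EXPIRED)
        return 0;

    glDeleteSync(rb->fence);
    rb->fence = NULL;
    if (status == GL_WAIT_FAILED)
        return -1;

    GLsizeiptr size = (GLsizeiptr)rb->width * rb->height * 4;
    int result = -1;
    glBindBuffer(GL_PIXEL_PACK_BUFFER, rb->pbo);
    void *src = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, size, GL_MAP_READ_BIT);
    if (src != NULL) {
        memcpy(dst, src, (size_t)size);
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        result = 1;
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    return result;
}
```

A render loop would call `readback_begin` after drawing frame N and `readback_try_finish` on a later frame, handing `dst` to OpenCV once it returns 1. Using two or more PBOs in rotation would let a new readback start while an earlier one is still pending.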