Mali-400 MP2 glReadPixels alternatives

Hello everyone,

I'm working on a ZynqMP platform with a Mali-400 MP2. I configured a
headless EGL backend to render offscreen. I use
eglCreatePbufferSurface to create the EGL surface, then I render to a
texture and read the data back with glReadPixels.

The problem is that glReadPixels makes things really slow. I only get
around 10 FPS when I run glgears (actually a headless version of es2gears).

What I tried so far:

1) I followed this advice to use different surfaces for read and write:
community.khronos.org/.../619
It does not give me any FPS boost.

2) I tried to use glMapBuffer, but with the Mali-400 I only have GLES2, and
GL_PIXEL_PACK_BUFFER does not seem to be supported in GLES2.

3) I looked into using eglCreatePixmapSurface as e.g. here:
github.com/.../test.c
But it seems to be Samsung specific. And on my platform
EGLNativePixmapType is typedefed to khronos_uintptr_t.
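
On item 2: whether the driver exposes an extension that adds pixel buffer objects to GLES2 can be checked at run time against the string returned by glGetString(GL_EXTENSIONS). GL_NV_pixel_buffer_object is one such extension; whether libMali r8p0 offers it is not established in this thread. A minimal sketch in C, where has_extension is a hypothetical helper name and the extension list is passed in as a plain string so the snippet builds without GLES headers:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
 * space-separated token in `ext_list`, 0 otherwise. */
int has_extension(const char *ext_list, const char *name)
{
    assert(name != NULL);
    size_t name_len = strlen(name);
    if (ext_list == NULL || name_len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is bounded by spaces or string ends,
         * so that "GL_FOO" does not match inside "GL_FOO_bar". */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        char end = p[name_len];
        int ends_token = (end == ' ') || (end == '\0');
        if (starts_token && ends_token)
            return 1;
        p += 1;
    }
    return 0;
}
```

On the target, with a context current, this would be called as has_extension((const char *)glGetString(GL_EXTENSIONS), "GL_NV_pixel_buffer_object"). Matching whole tokens avoids a false positive when one extension name is a prefix of another.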

Environment information:
libMali and kernel module version: r8p0-01rel0
Mali module flags:
    CONFIG_MALI400=y
    CONFIG_MALI400_PROFILING=n
    CONFIG_MALI400_DEBUG=y
    CONFIG_MALI_DT=y 
    CONFIG_MALI_SHARED_INTERRUPTS=y
    CONFIG_MALI_DVFS=n

Any help would be appreciated.

  • Hi ivan144, 

    Note that the "slowness" comes from two things:

    * Firstly, when you call glReadPixels() you have to block and wait for the GPU to render the queued work, so there is some latency of waiting for the GPU to render the surface before the copy can start. Usual display updates are asynchronous (eglSwapBuffers() just pushes frames into a queue, it doesn't actually block and wait to do the swap synchronously).

    * Secondly, the CPU then has to transfer the data into the application-owned buffer.

    You might be able to eliminate the second of these, but the first is unavoidable without the asynchronous PBO support, which as you note is missing in OpenGL ES 2.0.

    The usual trick here is not to create a copy at all - CPUs make terrible data-plane engines and framebuffer copies in software are always going to be relatively slow - so most graphics pipelines are designed to pass outputs directly from GPU to e.g. the display controller or video hardware. What are you using the copied surface for? Is it possible to avoid the copy completely?

    Cheers, 
    Pete
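
To see how the two costs described above split on a given board, one option is to time the GPU wait and the copy separately: call glFinish() first so the pipeline drains, then time glReadPixels() on its own. The sketch below is only a timing harness under that assumption. The names time_readback, readback_timing, wait_for_gpu and read_pixels are hypothetical, and the two callbacks are placeholders that count calls so the snippet builds without GLES headers. The split is approximate, since a driver may do extra work (such as format conversion) inside glReadPixels().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

typedef void (*step_fn)(void *user);

typedef struct {
    double wait_ms;  /* time spent waiting for the GPU to finish */
    double copy_ms;  /* time spent copying pixels into CPU memory */
} readback_timing;

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Runs the wait step, then the copy step, and reports how long each took. */
readback_timing time_readback(step_fn wait_gpu, step_fn copy_pixels, void *user)
{
    assert(wait_gpu != NULL && copy_pixels != NULL);
    readback_timing t;
    double t0 = now_ms();
    wait_gpu(user);      /* on the target: glFinish() */
    double t1 = now_ms();
    copy_pixels(user);   /* on the target: glReadPixels(...) */
    double t2 = now_ms();
    t.wait_ms = t1 - t0;
    t.copy_ms = t2 - t1;
    return t;
}

/* Placeholder callbacks: each one only increments the int that `user`
 * points to. On the target, wait_for_gpu would call glFinish() and
 * read_pixels would call glReadPixels() into an application buffer. */
void wait_for_gpu(void *user) { ++*(int *)user; }
void read_pixels(void *user)  { ++*(int *)user; }
```

If wait_ms dominates, the frame time is mostly GPU rendering plus the forced synchronization, and a faster copy path would not help much. If copy_ms dominates, the CPU-side transfer is the part worth avoiding.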
