How to share EGLImageKHR object between different devices

Hello,

I am trying to save a screenshot of a QML Quick Controls application on a platform running Qt on Wayland, using native OpenGL functions. I create an EGLImageKHR from an RGB color renderbuffer with eglCreateImageKHR and then send the EGLImageKHR void pointer to another device through Qt socket communication. The EGLImage is created successfully, that is, eglGetError reports no error. To test that the EGLImageKHR object is correct, I bind it to another framebuffer with glEGLImageTargetRenderbufferStorageOES in the same process, read the pixels with glReadPixels, and create a PNG file from the read buffer. The resulting PNG has the correct colors.

After that, I tried to send this EGLImageKHR void pointer to another device or process and create a PNG from the received EGLImageKHR object. The PNG does not have the correct colors; it contains only noise.

The following code sample creates the EGLImageKHR from the renderbuffer and then saves a TGA file (tga_file) from the EGLImageKHR.

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

// create render buffer and bind it to a framebuffer
glGenRenderbuffers( 1, &renderBuffer );
glBindRenderbuffer( GL_RENDERBUFFER, renderBuffer );
glRenderbufferStorage( GL_RENDERBUFFER, GL_RGB, mWinWidth, mWinHeight );
glBindRenderbuffer(GL_RENDERBUFFER, 0); //mwindow->openglContext()->defaultFramebufferObject());

if (glGetError()==GL_NO_ERROR)
{
//qDebug() << "Render buff storage is OK" << glGetError();
}
else
{
qDebug() << "Render buff storage error is " << glGetError();
}

glGenFramebuffers( 1, &frameBuffer );
glBindFramebuffer( GL_FRAMEBUFFER, frameBuffer);
glFramebufferRenderbuffer( GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, renderBuffer);

//printFramebufferInfo(frameBuffer);
if( glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
{
qDebug() << "Framebuffer error is " << glGetError();
}
else
{
//qDebug() << "Framebuffer is OK" << glGetError();
}

// create EGLImageKHR object
mWinWidth = mwindow->width();
mWinHeight = mwindow->height();

glGetIntegerv(GL_PACK_ALIGNMENT, &rowPack);
glPixelStorei(GL_PACK_ALIGNMENT, 1);
glBindFramebuffer(GL_READ_FRAMEBUFFER,mwindow->openglContext()->defaultFramebufferObject());
glBindFramebuffer(GL_DRAW_FRAMEBUFFER,frameBuffer);
glBlitFramebuffer(0, 0, mWinWidth, mWinHeight, 0, 0, mWinWidth, mWinHeight, GL_COLOR_BUFFER_BIT, GL_NEAREST);

m_display = reinterpret_cast<EGLDisplay>(reinterpret_cast<void*>(QGuiApplication::platformNativeInterface()->nativeResourceForIntegration("egldisplay")));
m_context = QGuiApplication::platformNativeInterface()->nativeResourceForContext("eglcontext", mwindow->openglContext());

mImage = CreateImageKHR(m_display,m_context, EGL_GL_RENDERBUFFER_KHR,reinterpret_cast<EGLClientBuffer>(renderBuffer), nullptr);

if (mImage == EGL_NO_IMAGE_KHR)
{
qDebug("failed to make image from target buffer: %s", get_egl_error());
return -1;
}

int size = mWinWidth * mWinHeight * 3;
sendEglImage(size);
glDeleteRenderbuffers(1,&renderBuffer);
renderBuffer = 0;
glDeleteFramebuffers(1,&frameBuffer);
frameBuffer = 0;

// send EGLImageKHR to client
sendEglImage(int size)
{
if (SenderSocket != NULL)
{
QByteArray data;
data.append(reinterpret_cast<const char*>(mImage),size);
//data.append(reinterpret_cast<QByteArray *>(mImage));
QDataStream out(&data, QIODevice::WriteOnly);
out.setDevice(SenderSocket);
out << data;
//qDebug() << "func " << __FUNCTION__ << "line" << __LINE__;
qDebug() << "func " << __FUNCTION__ << "line" << __LINE__ << "data size" << data.size();
}

QImage testImg((uchar *)mImage,640,480,QImage::Format_RGB888, nullptr, nullptr);
if(testImg.save("server.png"))
qDebug() << "Successfully saved image" << testImg;

DestroyImageKHR(m_display,mImage);
mImage = 0;
}

// Another approach to create a tga_file from EGLImageKHR is
FILE *out = fopen("tga_file", "w");
short TGAhead[] = {0, 2, 0, 0, 0, 0, 640, 480, 24};
fwrite(&TGAhead, sizeof(TGAhead), 1, out);
fwrite(mImage, mWinWidth * mWinHeight*3, 1, out);
fflush(out);
fclose(out);

// One more different trial

   int bufSize = mWinHeight * mWinWidth*3;

   unsigned char * trialBuff = new unsigned char[bufSize];

   memcpy(trialBuff,khrImage,bufSize);

   FILE *out = fopen("dada.txt", "w");

   fwrite(trialBuff, bufSize, 1, out);

   fflush(out);

   fsync(fileno(out));

   fclose(out);

   delete [] trialBuff;

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

So when I try to create a PNG with QImage, or write a file with fwrite, from the EGLImageKHR object, I do not get a valid PNG or TGA file.

Note that I do not want to use glReadPixels, since it causes high CPU load. Does anyone have an idea how I can create a PNG file from an EGLImageKHR, and how I can send it to another device?

Best Regards

  • I won't claim I know from firsthand experience what the best approach is, but I'll throw out some guesses.

    First, unless your source is running 16b, you probably want to look at reading 32bpp (RGBA or RGBX).  Often that will be the fastest format if you are in 24/32b display mode.  Some drivers will do REALLY slow per-pixel reads if they have to convert from 32->24 or 32->16.  You can do that conversion faster yourself.
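
    As a rough illustration of the "you can do it faster" point, here is a minimal sketch of converting on the CPU after reading RGBA, assuming a tightly packed RGBA8888 buffer and an RGB565 target. The names packRgb565 and rgbaToRgb565 are made up for this example, and whether this beats the driver's own conversion on a given platform would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack one 8-bit-per-channel pixel into RGB565
// (5 bits red, 6 bits green, 5 bits blue), keeping the top bits of each channel.
inline std::uint16_t packRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Convert a tightly packed RGBA8888 buffer (assumption: no row padding)
// to RGB565, dropping the alpha channel.
std::vector<std::uint16_t> rgbaToRgb565(const std::vector<std::uint8_t>& rgba)
{
    assert(rgba.size() % 4 == 0);  // 4 bytes per source pixel
    std::vector<std::uint16_t> out(rgba.size() / 4);
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::uint8_t* p = &rgba[i * 4];
        out[i] = packRgb565(p[0], p[1], p[2]);
    }
    return out;
}
```

    The output is half the size of the RGBA input, which also halves what has to be sent afterwards.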

    Second, if you are sending a pixel buffer over a network to another, lower-powered device, you might consider whether compressing to, say, PNG or JPG (or another format) is worth the extra CPU on the host.  The result is a local CPU cost to reduce network bandwidth, possibly with a CPU cost to decode on the other end.  Then again, if you have a 'dumb panel' on the other end, you may want just raw pixels; hard to say.
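
    To put rough numbers on the raw-pixel option, here is a sketch of the bandwidth arithmetic, assuming the 640x480 frame size from the code above and a hypothetical 60 frames per second (the frame rate is not stated in the thread). The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one uncompressed, tightly packed frame.
inline std::uint64_t frameBytes(std::uint32_t width, std::uint32_t height,
                                std::uint32_t bytesPerPixel)
{
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Payload bandwidth in megabits per second (1 Mbit = 1,000,000 bits),
// ignoring any protocol overhead.
inline double megabitsPerSecond(std::uint64_t bytesPerFrame, double framesPerSecond)
{
    assert(framesPerSecond >= 0.0);
    return static_cast<double>(bytesPerFrame) * 8.0 * framesPerSecond / 1e6;
}
```

    Under these assumptions, RGBA at 4 bytes per pixel is 1,228,800 bytes per frame, or about 590 Mbit/s of payload at 60 fps; a 16-bit format halves that to about 295 Mbit/s.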

    Third, have you timed simply doing glReadPixels instead of the overly complex mechanism above?  If you use PBOs within a SINGLE FRAME, with no multithreading and no delayed result, you take the immediate cost of synchronizing the GPU and pulling the data to the CPU (again, with possible per-pixel conversions).  At that point the PBO isn't useful: just use glReadPixels, take the synchronization hit, and move on. :)
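
    Whichever readback path is used, what goes over the socket has to be pixel bytes in CPU memory. An EGLImageKHR is an opaque handle that only has meaning to the EGL implementation that created it, so sending the handle value, or memory read through it as if it were a pointer to pixels, would not transfer the image. Below is a minimal sketch of framing a readback buffer for a socket, assuming a made-up 12-byte header (width, height and bytes per pixel as big-endian 32-bit integers). The header layout and the names appendU32, readU32 and packFrame are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wire format: three big-endian 32-bit fields
// (width, height, bytes per pixel) followed by the raw pixel rows.
constexpr std::size_t kHeaderSize = 12;

// Append a 32-bit value in big-endian byte order.
inline void appendU32(std::vector<std::uint8_t>& out, std::uint32_t v)
{
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read a big-endian 32-bit value starting at the given offset.
inline std::uint32_t readU32(const std::vector<std::uint8_t>& in, std::size_t offset)
{
    assert(offset + 4 <= in.size());
    return (static_cast<std::uint32_t>(in[offset]) << 24) |
           (static_cast<std::uint32_t>(in[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(in[offset + 2]) << 8) |
            static_cast<std::uint32_t>(in[offset + 3]);
}

// Build one message: header plus a copy of the pixels that were read back
// (for example by glReadPixels) into CPU memory.
std::vector<std::uint8_t> packFrame(std::uint32_t width, std::uint32_t height,
                                    std::uint32_t bytesPerPixel,
                                    const std::vector<std::uint8_t>& pixels)
{
    assert(pixels.size() == static_cast<std::size_t>(width) * height * bytesPerPixel);
    std::vector<std::uint8_t> msg;
    msg.reserve(kHeaderSize + pixels.size());
    appendU32(msg, width);
    appendU32(msg, height);
    appendU32(msg, bytesPerPixel);
    msg.insert(msg.end(), pixels.begin(), pixels.end());
    return msg;
}
```

    In a Qt application the resulting buffer could be handed to the socket with QIODevice::write(const char *, qint64). The receiver reads the 12-byte header first and then knows how many pixel bytes to expect.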

  • Hello,

    As you said, RGBA is the fastest and consumes less CPU. Yes, I will use gigabit Ethernet, and no compression is planned :)

    The PBO OFF values from above are for direct glReadPixels:

    "PBO OFF:

    Intel: 10-12% CPU usage without sending pixels to another device, Read Time: 1.978 ms, Process Time: 0.001 ms

    Nvidia: 24-25% CPU usage without sending pixels to another device, Read Time: 3.026 ms, Process Time: 0.001 ms"

    With those numbers, the Nvidia CPU usage is high, so I am using a PBO :)

    Regards

  • So, the quick ideas:

    - read RGBA 32bpp, not 16b, not 24b.

    - make sure you don't have a multisample framebuffer.

    - render to an FBO, blit that to the screen, then do glReadPixels or similar on the FBO surface.

    - might want glPixelStorei(GL_PACK_ALIGNMENT, 4); to force 32b (4-byte) alignment.  I'm not sure that forcing 8b (1-byte) alignment isn't kicking you off a fast path.

    I can see the timing differences being associated with any of the above.
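
    On the GL_PACK_ALIGNMENT point, here is a small sketch of how the alignment value changes the row stride that glReadPixels writes, assuming GL_PACK_ROW_LENGTH is left at its default of 0. The function name packedRowStride is made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Bytes from the start of one row to the start of the next in a glReadPixels
// destination buffer, for a given GL_PACK_ALIGNMENT (1, 2, 4 or 8).
// Each row is padded up to the next multiple of the alignment.
inline std::size_t packedRowStride(std::size_t width, std::size_t bytesPerPixel,
                                   std::size_t alignment)
{
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    const std::size_t raw = width * bytesPerPixel;
    return (raw + alignment - 1) / alignment * alignment;
}
```

    For a 640-pixel-wide frame the stride is the same for alignments 1 and 4 in RGB (1920 bytes), RGBA (2560 bytes) and 16-bit RGB (1280 bytes), so switching the alignment would not change the memory layout there; any timing difference would have to come from the driver picking a different path.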

  • Hello,

    When I read RGBA with a PBO, the CPU usage is already low:

    RGBA with RunPixelBo algorithm:

    PBO ON:

    Intel: 18-19% CPU usage without sending pixels to another device, Read Time: 0.085 ms, Process Time: 1.112 ms

    Nvidia: 17-18% CPU usage without sending pixels to another device, Read Time: 0.196 ms, Process Time: 0.732 ms

    RGBA with RunPixelBo algorithm:

    PBO OFF (glReadPixels):

    Intel: 16-17% CPU usage without sending pixels to another device, Read Time: 3.25 ms, Process Time: 0 ms

    Nvidia: 31-32% CPU usage without sending pixels to another device, Read Time: 4.064 ms, Process Time: 0.001 ms

    RGBA with doReadFastBack algorithm:

    PBO ON:

    Intel: 15-16.5% CPU usage without sending pixels to another device, Read Time: 0.065 ms, Process Time: 3.217 ms

    Nvidia: 14.5-15.5% CPU usage without sending pixels to another device, Read Time: 0.108 ms, Process Time: 5.833 ms

    For RGB, the numbers are also good on Nvidia but not on Intel. My requirement is 16-bit RGB, so 32b or 24b is not suitable for me. I will also check a 16-bit renderbuffer, as you mentioned above.

    Regards