How to gain performance through PBO (pixel buffer object) on Mali T-880

I'm working on a corner detection algorithm on an international version of the Samsung S7, which is powered by a Mali T-880. The basic pipeline is: 1) grab the Android camera capture into an OpenGL texture; 2) run it through several stages of image filters written as GLSL shaders; 3) read the processed result back to main memory and let the CPU finish the final detection. As you can imagine, the performance bottleneck is glReadPixels in step 3. The texture/render target size is 2560 * 1440, and glReadPixels() usually takes 180 ms after all the draw calls for these image filters (with no filters at all, just fetching the last render still takes 120 ms). Since issuing the draw commands for these filters is extremely fast (< 10 ms), I can still get a 2x performance boost.

Now I have tried to optimize glReadPixels further by using PBOs. The following is my code:

// initialize PBOs
const int pbo_count = 2;
glGenBuffers( pbo_count, gl_pbo_ids );
for (int i = 0; i < pbo_count; ++i)
{
     glBindBuffer( GL_PIXEL_PACK_BUFFER, gl_pbo_ids[i] );
     glBufferData( GL_PIXEL_PACK_BUFFER, pbo_buffer_size, 0, GL_DYNAMIC_READ );
}

// in the render thread: start an asynchronous read into one PBO, then process the data from the other PBO

static int r_idx = 0;
int p_idx = 0;
r_idx = (r_idx + 1) % pbo_count;
p_idx = (r_idx + 1 ) % pbo_count;

glBindBuffer(GL_PIXEL_PACK_BUFFER, gl_pbo_ids[r_idx]);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

glBindBuffer(GL_PIXEL_PACK_BUFFER, gl_pbo_ids[p_idx]);
pbo_ptr = (GLubyte*)glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, pbo_buffer_size, GL_MAP_READ_BIT);
memcpy(data_ptr, pbo_ptr, pbo_buffer_size);
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// process data_ptr with CPU ....
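
The snippet above does not show how pbo_buffer_size is computed. A minimal sketch, assuming a tightly packed GL_RGBA / GL_UNSIGNED_BYTE readback (4 bytes per pixel, so each row is already a multiple of the default pack alignment of 4); the helper name readback_buffer_size is hypothetical and not part of the original code:

```c
#include <stddef.h>

/* Hypothetical helper (not in the original code): number of bytes needed
 * to hold a tightly packed RGBA8 readback of a width x height target. */
static size_t readback_buffer_size(size_t width, size_t height)
{
    const size_t bytes_per_pixel = 4; /* assumes GL_RGBA + GL_UNSIGNED_BYTE */
    return width * height * bytes_per_pixel;
}
```

For the 2560 * 1440 target described above, this gives 14,745,600 bytes (about 14 MiB) per PBO.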

With this code, I get results identical to the non-PBO version, but the performance is even worse. glReadPixels() does return immediately, and so does glMapBufferRange(). The problem is that memcpy() takes around 450 ms, which is a total disaster. I wonder whether I missed some setup step or whether part of the code is problematic. I tested memcpy() between two CPU-allocated memory regions of the same size, and it takes no more than 5 ms. I also tried increasing the number of PBOs, which didn't help.
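
A minimal sketch of such a CPU-only baseline, not necessarily the exact test used here. It assumes a POSIX system with clock_gettime; the function names time_cpu_memcpy_ms and throughput_mb_per_s are hypothetical:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime under strict C modes */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copies `size` bytes between two malloc'ed buffers and returns the
 * elapsed time of the memcpy in milliseconds, or -1.0 on failure. */
static double time_cpu_memcpy_ms(size_t size)
{
    if (size == 0) return -1.0;
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (!src || !dst) { free(src); free(dst); return -1.0; }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xAB, size);
    memset(dst, 0, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3
              + (t1.tv_nsec - t0.tv_nsec) / 1e6;

    /* Read the result so the copy cannot be optimized away. */
    if (dst[size - 1] != 0xAB) ms = -1.0;

    free(src);
    free(dst);
    return ms;
}

/* Converts a byte count and a duration in ms to decimal MB per second. */
static double throughput_mb_per_s(size_t bytes, double ms)
{
    return (bytes / 1e6) / (ms / 1e3);
}
```

Plugging in the numbers from this post, with one frame being 2560 * 1440 * 4 = 14,745,600 bytes: 450 ms corresponds to roughly 33 MB/s when copying out of the mapped PBO, versus roughly 2,900 MB/s for the 5 ms CPU-to-CPU copy.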

I found some suggestions to use a separate asynchronous pixel-read thread, but I haven't tried that yet, because if memcpy() really takes this long, I don't think it would solve my issue.

Any opinions would be much appreciated.

Thanks.