glGenerateMipmap() very slow on Samsung Galaxy SII

Note: This was originally posted on 15th February 2012 at http://forums.arm.com

Hi,
I'm currently doing some GPGPU work on the Samsung Galaxy SII (Mali-400 MP). For that I need to generate a mipmap from a texture that has been rendered to via an FBO. Unfortunately glGenerateMipmap() appears to be very slow on this device: it takes about 90 milliseconds to generate a mipmap for a 512x512 RGBA8888 texture. I have tried the same code on other Android devices, where this function is much faster (about 2 milliseconds), so this slowdown really puzzles me. Am I doing something wrong or missing something here? Can anyone provide example code for this case that works well on a Mali device?

Here are the relevant parts of my code:

glGenTextures(1, &texId);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texId);

// Allocate graphics memory.
glTexImage2D(GL_TEXTURE_2D, 0, format, cols, rows, 0, format, type, NULL);
// Allocate memory for mipmap.
glGenerateMipmap(GL_TEXTURE_2D);

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

// Create off-screen framebuffer object and attach the texture to it.
glGenFramebuffers(1, &fboId);
glBindFramebuffer(GL_FRAMEBUFFER, fboId);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, texId, 0);

// Now render to that texture
...

// Generate MIP map.
glBindTexture(GL_TEXTURE_2D, texId);
glGenerateMipmap(GL_TEXTURE_2D);
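For reference, the glGenerateMipmap() call on the empty texture allocates every level down to 1x1. The sketch below is plain C with no GL calls. The helper names are hypothetical, and it assumes 4 bytes per texel (as for RGBA8888 with GL_UNSIGNED_BYTE) and power-of-two sizes, which core OpenGL ES 2.0 requires for mipmapped textures:

```c
#include <stddef.h>

/* Number of levels in a complete mip chain for a cols x rows texture:
 * each level halves both dimensions (clamped at 1) until it reaches 1x1. */
int mip_level_count(int cols, int rows)
{
    int largest = cols > rows ? cols : rows;
    int levels = 1;
    while (largest > 1) {
        largest >>= 1;
        levels++;
    }
    return levels;
}

/* Total bytes for levels 0..N-1 with bytes_per_texel bytes per texel
 * (4 for RGBA8888 / GL_UNSIGNED_BYTE). Hypothetical helper. */
size_t mip_chain_bytes(int cols, int rows, int bytes_per_texel)
{
    size_t total = 0;
    int levels = mip_level_count(cols, rows);
    for (int level = 0; level < levels; level++) {
        int w = cols >> level;
        int h = rows >> level;
        if (w < 1) w = 1; /* non-square textures: the short side stops at 1 */
        if (h < 1) h = 1;
        total += (size_t)w * (size_t)h * (size_t)bytes_per_texel;
    }
    return total;
}
```

For the 512x512 texture this gives 10 levels and 1,398,100 bytes, so the levels below the base add roughly a third to the base level's 1,048,576 bytes.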
  • Note: This was originally posted on 21st February 2012 at http://forums.arm.com

    Hi Bert,

    thanks for the report. I've done some quick calculations and I agree it looks like something's not right.

    If we assume each mipmap level from 256x256 down to 1x1 must be generated, that's:

    256^2 + 128^2 + ... + 1^2 = 87,381 ~= 90,000 texels to generate

    If we assume a CPU-based implementation that reads 4 texels from the previous level, splits out the colour channels, sums them, averages them, recombines them and writes out the result, that might take roughly 55 cycles per texel.
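    The per-texel work described above amounts to a 2x2 box filter. As a minimal sketch, assuming tightly packed RGBA8888 texels held in uint32_t values and power-of-two dimensions, one level could be produced on the CPU as follows. The function name is hypothetical, and this is not a claim about how any driver actually implements glGenerateMipmap():

    ```c
    #include <stdint.h>

    /* Downsample one mip level with a 2x2 box filter.
     * src: src_w x src_h packed RGBA8888 texels (one uint32_t each, row-major).
     * dst: must hold max(1, src_w/2) x max(1, src_h/2) texels. */
    void box_downsample_rgba8888(const uint32_t *src, int src_w, int src_h,
                                 uint32_t *dst)
    {
        int dst_w = src_w > 1 ? src_w / 2 : 1;
        int dst_h = src_h > 1 ? src_h / 2 : 1;

        for (int y = 0; y < dst_h; y++) {
            /* Clamp so 1-texel-high levels reuse the edge row. */
            int y0 = 2 * y;
            int y1 = (2 * y + 1 < src_h) ? 2 * y + 1 : src_h - 1;
            for (int x = 0; x < dst_w; x++) {
                int x0 = 2 * x;
                int x1 = (2 * x + 1 < src_w) ? 2 * x + 1 : src_w - 1;
                /* Read the four source texels. */
                uint32_t t[4] = {
                    src[y0 * src_w + x0], src[y0 * src_w + x1],
                    src[y1 * src_w + x0], src[y1 * src_w + x1]
                };
                uint32_t out = 0;
                for (int shift = 0; shift < 32; shift += 8) {
                    /* Split out one 8-bit channel from each texel and sum. */
                    uint32_t sum = 0;
                    for (int i = 0; i < 4; i++)
                        sum += (t[i] >> shift) & 0xFFu;
                    /* Average with rounding and recombine into the output. */
                    out |= ((sum + 2) / 4) << shift;
                }
                dst[y * dst_w + x] = out;
            }
        }
    }
    ```

    Because each channel is averaged independently, the result does not depend on the byte order in which the channels are packed. Calling it repeatedly, each time on the previous output, produces the chain down to 1x1.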

    Assuming perfect memory with no stalls (it won't be, but...) and a 1.2GHz CPU clock, that gives a very rough estimate of:

    90,000 * 55 / 1.2e9 ~= 4ms

    So I agree, 90ms sounds too high. At 1.2GHz that duration is about 108M cycles, more than 20 times the estimate above.
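    The same back-of-the-envelope arithmetic as a small C sketch, so the assumptions (square power-of-two texture, 55 cycles per texel, 1.2GHz clock, no memory stalls) are easy to vary. The function names are hypothetical:

    ```c
    /* Texels in levels 1..N-1 of a square power-of-two texture, i.e. every
     * level glGenerateMipmap() has to write (the base level is the input). */
    long mip_texels_below_base(int size)
    {
        long texels = 0;
        for (int s = size / 2; s >= 1; s /= 2)
            texels += (long)s * s;
        return texels;
    }

    /* Rough cost in milliseconds: texels * cycles per texel / clock rate.
     * Ignores memory stalls entirely, so it is a lower bound at best. */
    double estimate_ms(int size, double cycles_per_texel, double clock_hz)
    {
        return mip_texels_below_base(size) * cycles_per_texel / clock_hz * 1000.0;
    }
    ```

    With size 512, 55 cycles per texel and a 1.2e9 Hz clock it returns about 4.0ms, using the exact texel count rather than the rounded 90,000.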

    Out of interest, how are you timing the operation? Are you trying to do this operation every frame? Are you trying to render the newly mipmapped texture later in the same frame?

    Do you happen to know which value the other devices use for GL_GENERATE_MIPMAP_HINT when generating the mipmaps (GL_FASTEST, GL_NICEST or GL_DONT_CARE), and have you set this hint in your application?

    Cheers, Pete
  • Note: This was originally posted on 21st February 2012 at http://forums.arm.com

    Hey Pete,

    thanks for looking into this.

    On all devices GL_GENERATE_MIPMAP_HINT is 0x1100 (GL_DONT_CARE). I tried changing it to GL_NICEST and GL_FASTEST on the Mali device before generating the mipmap, but that did not change anything.
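    A small logging helper for comparing this across devices, sketched under the assumption that the numeric values are those from the standard GLES2 headers (GL_DONT_CARE 0x1100, GL_FASTEST 0x1101, GL_NICEST 0x1102). The SKETCH_ macro names and the function name are hypothetical stand-ins so the code compiles without <GLES2/gl2.h>:

    ```c
    /* Enum values as defined in the GLES2 headers, redefined here only so
     * this sketch compiles on its own. In a real app include <GLES2/gl2.h>. */
    #define SKETCH_GL_DONT_CARE 0x1100
    #define SKETCH_GL_FASTEST   0x1101
    #define SKETCH_GL_NICEST    0x1102

    /* Translate a value read back with
     *     GLint v;
     *     glGetIntegerv(GL_GENERATE_MIPMAP_HINT, &v);
     * into a readable name for logging. The hint itself is changed with
     *     glHint(GL_GENERATE_MIPMAP_HINT, GL_FASTEST);
     * and, being only a hint, an implementation is free to ignore it. */
    const char *mipmap_hint_name(int value)
    {
        switch (value) {
        case SKETCH_GL_DONT_CARE: return "GL_DONT_CARE";
        case SKETCH_GL_FASTEST:   return "GL_FASTEST";
        case SKETCH_GL_NICEST:    return "GL_NICEST";
        default:                  return "unknown";
        }
    }
    ```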

    Yes, I need to mipmap the input texture (which represents a frame of the camera stream) every frame. I am also sort of rendering it into another texture by extracting SURF features at interest-point positions; those are scale-space positions, hence the need for a mipmap.

    Timing the OpenGL ES calls is quite tedious. Currently I'm timing my whole pipeline over about 100-300 iterations and averaging the results. To measure the time of a particular step, I strip the pipeline down step by step and estimate each step's time by subtraction. I also measured the call to glGenerateMipmap() directly by wrapping the timing functions around it, but that yielded the same results.
    However, I'm not 100% sure that I'm really measuring what I think I'm measuring. That's a general problem with OpenGL ES: it's hard to measure anything more fine-grained than a whole rendering cycle.
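    One way to make per-call timings more trustworthy is to wait for the GPU before reading the clock, because GL calls normally return as soon as the commands are queued. A minimal sketch, assuming a POSIX clock_gettime() with CLOCK_MONOTONIC (available on Android); the helper names and the callback type are hypothetical:

    ```c
    #define _POSIX_C_SOURCE 199309L
    #include <time.h>

    /* Milliseconds between two CLOCK_MONOTONIC readings. */
    double timespec_diff_ms(const struct timespec *start, const struct timespec *end)
    {
        return (double)(end->tv_sec - start->tv_sec) * 1e3 +
               (double)(end->tv_nsec - start->tv_nsec) / 1e6;
    }

    /* Callback type for the work being timed and for the sync step. */
    typedef void (*work_fn)(void *ctx);

    /* Average wall-clock time in ms of `step` over `iterations` runs.
     * `sync` should block until queued GPU work is done (e.g. a wrapper
     * around glFinish()); pass a null pointer to time only the CPU side. */
    double average_ms(work_fn step, work_fn sync, void *ctx, int iterations)
    {
        struct timespec start, end;
        if (sync) sync(ctx);              /* drain previously queued work */
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < iterations; i++) {
            step(ctx);
            if (sync) sync(ctx);          /* wait for this iteration's work */
        }
        clock_gettime(CLOCK_MONOTONIC, &end);
        return timespec_diff_ms(&start, &end) / iterations;
    }
    ```

    For the mipmap step, `step` would wrap glBindTexture() and glGenerateMipmap() and `sync` would wrap glFinish(); those wrappers are hypothetical and not shown. Finishing after every iteration serialises the CPU and GPU, so the result approximates the latency of the call rather than its cost inside a pipelined frame.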

    Greet, Bert
  • Note: This was originally posted on 24th February 2012 at http://forums.arm.com

    Hi Bert,

    I am still looking into this, and have replicated the issue with a simple example here. I hope to have more soon.

    Cheers, Pete
  • Note: This was originally posted on 24th February 2012 at http://forums.arm.com

    Thank you very much!
  • Note: This was originally posted on 13th August 2012 at http://forums.arm.com

    Has there ever been a resolution to this problem? I'm very interested to find out why the generation time for your object seems so large.

    Have you made any progress with your experimentation? Have you considered using the DS-5 profiler to get more insight into what's happening?

    Updates please!
  • Note: This was originally posted on 14th August 2012 at http://forums.arm.com

    Hi Sean,

    it turned out the driver was failing to hit a fast path; this has since been fixed. It may be worth contacting your device vendor to ask whether any updates are available.

    HTH, Pete