
glCompileShader taking a long time

Hello.

I am working on a game, and one device (a Samsung Galaxy S3 (GT-I9300) with a Mali-400MP4 GPU, running Android 4.0.4) has problems compiling one of the shaders. The glCompileShader call takes an exceptionally long time for this shader (20+ seconds). It is not a particularly big or complicated shader, and I am attaching the source file here. I have tried experimenting with changes to the shaders, and the compile time does go down if I start taking out instructions, but even a simple shader that is acceptable for the game takes 5-10 seconds to compile, depending on the features. Unfortunately, I have hit a wall trying to figure out exactly which instruction is causing this issue. Since it doesn't technically crash, I get no information from glGetShaderInfoLog. Any help on this will be greatly appreciated.


PS - I am not seeing this issue on most other devices. I also tried using the offline compiler, but I ran into other issues: the compiled shader would not link, with the error "L0101 All attached shaders must be compiled prior to linking".




shaderGlsl.frag.zip
Parents

  • Thanks for your prompt reply. The shader gets compiled as HLSL through D3DXCompileShader (using max optimizations) and then we translate it to GLSL, which is why the code is the way it is. I will go through the translator code to see if we can improve it based on your suggestions.

    I suspect the main issue is the amount of working data you have hanging about in registers which has quite a long lifetime in the program. The register allocator is going to have to work quite hard to pack that into the register file as efficiently as possible to avoid spending a lot of time stacking and unstacking variables.

    By this do you mean local registers like the following?

      mediump vec4 r4_1;
      mediump vec4 r3_2;
      mediump vec4 r2_3;
      mediump vec4 r1_4;
      mediump vec4 r0_5;

    Eliminating them would be hard because these get generated by D3DXCompileShader. I am just surprised that this problem only occurs on Mali-400MP4 devices. I was hoping it had something to do with a specific instruction. I will also spend some more time digging into the offline compiler.

Children
  • The shader gets compiled as HLSL through D3DXCompileShader (using max optimizations) and then we translate it to GLSL, which is why the code is the way it is.

    Yep - I guessed it would be something like that - it definitely has that "written by a machine" look about it.

    I will go through the translator code to see if we can improve it based on your suggestions.

    Try hand-fixing one first. It may make no difference, so I'd hate for you to spend a load of time improving the translator only for that effort to be wasted.

    By this do you mean local registers like the following?

      mediump vec4 r4_1;
      mediump vec4 r3_2;
      mediump vec4 r2_3;
      mediump vec4 r1_4;
      mediump vec4 r0_5;

    Indirectly, yes. Any variable which exists in the program needs to have register storage (spilling to stack storage if we run out of space) from the point it is first assigned a value to the point it is last used. Things like uniforms and constants are handled differently, so they are not counted in this.

    This program has quite a lot of "working state". Most of the variables are assigned relatively early and stay "alive" for a long time because they are used in the final few instructions of the program, so the compiler has to work out how best to keep this data in registers, while also packing things efficiently for the vector ALUs.

    If you can change the algorithm to need fewer live variables it could help - but it would change the visual output of course.

    HTH,
    Pete
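
To make the point about live variables in the reply above concrete, here is a minimal Python sketch that counts how many temporaries are alive at once, given the instruction index where each is first assigned and last used. It is only an illustration of overlapping live ranges, not a model of how the Mali compiler allocates registers. The function name `peak_live` and both sets of instruction indices are hypothetical and are not taken from the attached shader.

```python
def peak_live(intervals):
    """Return the largest number of variables alive at the same time.

    intervals maps a variable name to (first_assignment, last_use),
    both inclusive instruction indices.
    """
    events = []
    for first, last in intervals.values():
        events.append((first, 1))      # variable becomes live here
        events.append((last + 1, -1))  # variable is dead after its last use
    live = peak = 0
    # Tuples sort by index, then by delta, so at the same index a release
    # (-1) is processed before a new assignment (+1).
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


# Hypothetical ranges: every temporary is assigned early and read at the end.
long_lived = {"r0": (0, 9), "r1": (1, 9), "r2": (2, 9), "r3": (3, 9), "r4": (4, 9)}
# The same five temporaries, each consumed right after it is written.
short_lived = {"r0": (0, 1), "r1": (2, 3), "r2": (4, 5), "r3": (6, 7), "r4": (8, 9)}

print(peak_live(long_lived))   # 5
print(peak_live(short_lived))  # 1
```

In the first case all five vec4 temporaries must be held at the same time. In the second the same number of declarations needs storage for only one value at a time, which is why the declarations themselves matter less than how long each value stays in use.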

  • P.S. I've just checked with the current offline compiler from http://malideveloper.arm.com/develop-for-mali/tools/analysis-debug/mali-gpu-offline-shader-compiler/ and this seems to perform OK on a desktop PC (4ms to compile), so I suspect you are running into an issue which is only present on older driver releases.


    chrisvarns, can you please raise a support ticket?


    Cheers,

    Pete

  • Will do on Tuesday; Monday is a UK bank holiday.

    Thanks,

    Chris