Malisc Offline Compiler Verbose/Analyse result documentation

Hi,

Nice tools you're offering. I really love the offline compiler, and it fits well into our pipeline for delivering shaders to Arm Mali mobile devices (unit tests and debugging).

Now we'd like to make progress on the performance side, but I'm having a hard time finding the relevant documentation.

So I'm running

#malisc -V -frag myshade.frag

hoping to get useful information about our shader's performance. I got a nice table of results, but I could use some documentation for each column. Here's what I get:

"

8 work registers used, 4 uniform registers used, spilling used.

                A     L/S   T     Total  Bound
cycles          133   25    17    175    A
shortest path   25    24    17    66     A
longest path    32    25    14    74     A

"

I searched the website but couldn't find anything about these figures. I can guess at some of them, but I'd prefer to be sure of the exact meaning of each.

I'd really like to be able to get as much information as possible from the command line

(this avoids the heavyweight 'studio' tool, which requires too much setup per shader when a simple compilation is enough).

Thanks !

  • So Textures: is that the texture fetch count, or the GPU cycles needed for texture operations?

    It is the number of instructions executed in the T pipe, so is a count of GPU cycles. The T pipe does all texture sampling/filtering.

    Spilling is an interesting one. The number of threads that can execute concurrently in the shader core is determined by the number of registers those threads use: 4 or fewer means we can run the maximum number of threads concurrently, while 5 to 8 means we can only run half. If a shader needs more than 8 registers (complicated shaders with lots of variables with long lifetimes), it is better to "spill" some of those registers to the cache for temporary storage, as this performs better in practice than expanding the register set any further. The trade-off is increased L/S pipe load, because the spilled variables have to be saved and reloaded.
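The register thresholds described above can be sketched as a small lookup. This is only an illustration of the rule as stated in this post: the function name `occupancy_for` and the returned fractions are assumptions made for the example, not something the offline compiler prints.

```python
def occupancy_for(work_registers: int) -> float:
    """Fraction of the maximum thread count that can run concurrently.

    Assumption (taken from the rule quoted in this thread): up to 4 work
    registers allows the maximum thread count, 5 to 8 allows half.
    Anything above 8 is not covered by that rule, because the compiler
    is described as spilling instead of using more registers.
    """
    if work_registers < 0:
        raise ValueError("register count cannot be negative")
    if work_registers <= 4:
        return 1.0
    if work_registers <= 8:
        return 0.5
    raise ValueError("more than 8 work registers is not covered by this rule")


print(occupancy_for(4))  # full thread count
print(occupancy_for(8))  # half thread count
```

The shader in the original post reports 8 work registers with spilling, so under this rule it would run at half the maximum thread count and additionally pay L/S cycles for the spilled values.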

    Hth,

    Chris

  • It is the number of instructions executed in the T pipe, so is a count of GPU cycles. The T pipe does all texture sampling/filtering.

    One slight clarification on this one.

    The counter counts the number of texture instructions. As Chris mentions, one texture instruction does one texture access, including filtering, decompression, etc. Single-sample or bilinear filtered texture instructions take a single cycle; trilinear or 3D textures take two cycles. The compiler doesn't know what data assets are used, so it always assumes a single cycle.

    Texture instructions can effectively take longer than a single cycle if you get bad cache behaviour (e.g. applying a very large texture to a very small screen area without mipmaps, so you thrash the texture cache). Any cycle count overheads due to cache misses are not shown by this counter (again, the compiler doesn't know).
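As a rough illustration of the per-instruction costs described above, the sketch below converts a texture instruction count into T-pipe cycles for different filtering modes. The table keys, the function name `texture_cycles`, and the count of 17 instructions are assumptions chosen for the example; cache-miss stalls are deliberately left out, just as they are in the compiler's figures.

```python
# Cycles per texture instruction, following the description in this post.
CYCLES_PER_INSTRUCTION = {
    "single-sample": 1,
    "bilinear": 1,
    "trilinear": 2,
    "3d": 2,
}


def texture_cycles(instruction_count: int, kind: str = "bilinear") -> int:
    """Estimate T-pipe cycles for a texture instruction count.

    Ignores cache misses. The default of "bilinear" mirrors the
    compiler's single-cycle assumption.
    """
    return instruction_count * CYCLES_PER_INSTRUCTION[kind]


assumed_by_compiler = texture_cycles(17)            # single-cycle assumption
if_all_trilinear = texture_cycles(17, "trilinear")  # two cycles each
print(assumed_by_compiler, if_all_trilinear)
```

If every sample in a shader used trilinear filtering, the real T-pipe cost could therefore be up to twice the figure reported by the offline compiler, before any cache effects.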


    HTH,

    Pete



  • Thanks Pete,

    The shader compiler states this in its output as well:

    Note: The cycles counts do not include possible stalls due to cache misses.

  • Thanks a lot for the explanation and support.

    Now, I'm a bit at a loss with the texture result, and I reproduced it with this simple fragment shader:

    precision lowp float;
    precision lowp int;

    uniform sampler2D myRenderTexture;

    #define SAMPLES 256

    void main() {
        float samples_f = float(SAMPLES);
        float idx_u = 1.0 / samples_f;
        vec2 uv = vec2(0.0, 0.0);

        for (int i = 0; i < SAMPLES; ++i) {
            uv.x += idx_u;
            uv.y -= idx_u;
            gl_FragColor.rgb += texture2D(myRenderTexture, gl_FragCoord.xy + uv).rgb;
        }

        gl_FragColor.rgb /= float(SAMPLES);
        gl_FragColor.a = 1.0;
    }

    My understanding was that I ought to get the value of SAMPLES in the T column.

    Whatever I change SAMPLES to, I get 1 T?

    By the way, what does it mean when I get a row of "-1" in "longest path"?

  • Whatever I change SAMPLES to, I get 1 T?

    This is a known issue in the stats reported by the offline compiler - the static analysis pass which generates the stats doesn't really understand loops, so assumes that the loop body is executed only once.
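Since the loop body is counted once, a manual correction is needed to approximate the real cost. The sketch below shows that arithmetic for a single pipe. The function name `scale_for_loop` and the split between cycles inside and outside the loop are assumptions for the example: the compiler does not report that split, so it has to be worked out by hand.

```python
def scale_for_loop(reported_body_cycles: int, trip_count: int,
                   cycles_outside_loop: int = 0) -> int:
    """Estimate cycles for one pipe when the loop body was counted once.

    reported_body_cycles: cycles attributed to a single pass of the loop body
    trip_count:           number of iterations the loop actually runs
    cycles_outside_loop:  cycles spent outside the loop (assumed known)
    """
    if trip_count < 0:
        raise ValueError("trip count cannot be negative")
    return cycles_outside_loop + reported_body_cycles * trip_count


# The shader above reports 1 in the T column; with SAMPLES = 256 the loop
# would issue 256 texture instructions.
print(scale_for_loop(1, 256))
```

This only rescales the static figures. It still excludes cache misses and any cost that depends on the filtering mode.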

    HTH,
    Pete

  • Sorry, the "-1" in the "longest path" row should have made me realize... Makes much more sense now.

    Now, let's unroll!

    And thanks again for the great support, it really helps.