Hi,
Nice tools you're offering. I really love the offline compiler, and it fits well in our pipeline for delivering shaders to ARM/Mali mobile devices (unit tests and debugging).
Now we'd like to make progress on the performance side, but I'm having a hard time finding my way around the documentation.
So I'm running
#malics -V -frag myshade.frag
hoping to get useful information on our shader performance. I get a nice table of results, but I could use some documentation on each column. Here's what I get:
"
8 work registers used, 4 uniform registers used, spilling used.
I searched the website but couldn't find anything on any of these values. I can guess some, but I'd really prefer to be sure of the exact meaning of each.
I'd really like to get as much information as possible from the command line
(thus avoiding the huge 'studio' tool, which requires too much setup per shader when a simple compilation is enough).
Thanks!
It is the number of instructions executed in the T pipe, so it is a count of GPU cycles. The T pipe does all texture sampling/filtering.
One slight clarification on this one.
The counter counts the number of texture instructions. As Chris mentions, one texture instruction does one texture access, including filtering, decompression, etc. Single-sample or bilinear filtered texture instructions take a single cycle; trilinear or 3D textures take two cycles. The compiler doesn't know what data assets are used, so it will only ever assume a single cycle.
Texture instructions can effectively take longer than a single cycle if you get bad cache behaviour (e.g. applying a very large texture to a very small screen area without mipmaps so you thrash the texture cache). Any cycle count overheads due to cache misses are not shown by this counter (again, the compiler doesn't know).
HTH,
Pete
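To make the cycle costs above concrete, here is a minimal Python sketch. The names (CYCLES_PER_FILTER, texture_cycles, compiler_estimate) are made up for illustration and are not part of any Mali tool; the costs are the ones stated in the post above, and cache-miss stalls are deliberately left out.

```python
# Cost per texture instruction, as described in the post above.
# All names here are illustrative assumptions, not a real tool API.
CYCLES_PER_FILTER = {
    "single": 1,     # single-sample
    "bilinear": 1,
    "trilinear": 2,
    "3d": 2,
}


def texture_cycles(filters):
    """Texture-pipe cycles for a list of per-instruction filter modes.

    Cache-miss stalls are not modelled.
    """
    return sum(CYCLES_PER_FILTER[f] for f in filters)


def compiler_estimate(filters):
    """Static estimate: one cycle per texture instruction,
    because the filter mode of the bound texture is unknown at compile time."""
    return len(filters)


shader = ["bilinear", "trilinear", "3d"]
print(compiler_estimate(shader))  # 3
print(texture_cycles(shader))     # 5
```

The gap between the two numbers is the part the offline compiler cannot see, since it depends on the textures bound at run time.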
Thanks Pete,
The shader compiler states this in its output as well:
Note: The cycles counts do not include possible stalls due to cache misses.
Thanks a lot for the explanation and support.
Now, I'm a bit at a loss with the texture result, which I reproduced with this simple fragment shader:
precision lowp float;
precision lowp int;
uniform sampler2D myRenderTexture;
#define SAMPLES 256
void main(){
    float samples_f = float(SAMPLES);
    float idx_u = 1.0/samples_f;
    vec2 uv = vec2(0.0, 0.0);
    for (int i = 0; i < SAMPLES; ++i){
        uv.x += idx_u;
        uv.y -= idx_u;
        gl_FragColor.rgb += texture2D(myRenderTexture, gl_FragCoord.xy + uv).rgb;
    }
    gl_FragColor.rgb /= float(SAMPLES);
    gl_FragColor.a = 1.0;
}
My understanding was that I ought to get the value of SAMPLES in the T column.
Whatever I change SAMPLES to, I get 1 T?
Btw, what does it mean when I get a row of "-1" in "longest path"?
This is a known issue in the stats reported by the offline compiler - the static analysis pass which generates the stats doesn't really understand loops, so assumes that the loop body is executed only once.
HTH, Pete
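The discrepancy described above can be sketched in a few lines of Python. This is only an illustration of the counting behaviour described in the reply (loop body counted once); the function names are hypothetical, and one cycle per texture instruction with no cache misses is assumed.

```python
def static_texture_count(texture_ops_in_loop_body):
    # The static pass counts the loop body once, whatever the trip count.
    return texture_ops_in_loop_body


def runtime_texture_count(texture_ops_in_loop_body, trip_count):
    # At run time the loop body executes trip_count times.
    return texture_ops_in_loop_body * trip_count


SAMPLES = 256
print(static_texture_count(1))            # 1
print(runtime_texture_count(1, SAMPLES))  # 256
```

This matches the observation in the thread that changing SAMPLES has no effect on the reported T value.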
Sorry, the "-1" in the "longest path" row should have made me realize... It makes much more sense now.
Now, let's unroll!
And thanks again for the great support, it really helps.
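One way to do the unrolling without writing every sample by hand is to generate the shader source. The Python sketch below is an assumption about how that could look, not something posted in the thread: unrolled_shader is a hypothetical helper, and it accumulates into a local acc variable (my change) instead of adding to gl_FragColor directly. The offsets follow the original loop, where uv is advanced before each sample.

```python
def unrolled_shader(samples):
    """Return GLSL ES source with the sampling loop unrolled `samples` times."""
    step = 1.0 / samples
    lines = [
        "precision lowp float;",
        "precision lowp int;",
        "uniform sampler2D myRenderTexture;",
        "void main(){",
        "    vec3 acc = vec3(0.0);  // local accumulator (not in the original shader)",
    ]
    # The original loop increments uv before sampling, so offsets start at 1 * step.
    for i in range(1, samples + 1):
        off = i * step
        lines.append(
            "    acc += texture2D(myRenderTexture, gl_FragCoord.xy + "
            f"vec2({off:.8f}, {-off:.8f})).rgb;"
        )
    lines += [
        f"    gl_FragColor.rgb = acc / {float(samples):.1f};",
        "    gl_FragColor.a = 1.0;",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(unrolled_shader(4))
```

The generated text can be saved to a .frag file and passed to the offline compiler with the same command used at the top of the thread. With no loop left for the static analysis to miss, the T column would be expected to reflect the number of texture instructions, though that result is not confirmed in this thread.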