Mali Offline compiler GLSL clamp performance on Mali-Gxx

dbe_dev over 7 years ago

Hi,

Been doing some analysis on some GLSL shader programs with the Mali Offline Compiler, great program btw.

It is, however, reporting that clamp(vec3, float, float) is slower on Mali-Gxx than a separate min/max.

vec3 clamp_minmax (vec3 val, float minimum, float maximum){
    return clamp(val, minimum, maximum);
}

vec3 clamp_minmax (vec3 val, float minimum, float maximum){
   vec3 rval = min(val, maximum);
   return max (rval, minimum);
}

Note that the minimum is a constant 0.0f.

Is this expected and correct for Mali-Gxx?
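
For reference, the GLSL ES specification defines clamp(x, minVal, maxVal) as min(max(x, minVal), maxVal), with undefined results when minVal > maxVal. The min-then-max ordering shown in the post gives the same result whenever minVal <= maxVal. A minimal sketch that puts the three forms side by side; the function names, the uniforms uVal and uMax, and the difference output are hypothetical and exist only for illustration:

```glsl
#version 300 es
precision highp float;

// Hypothetical inputs, not part of the original shader.
uniform vec3 uVal;
uniform float uMax;

out vec4 fragColor;

// Built-in clamp; results are undefined if minimum > maximum.
vec3 clamp_builtin(vec3 val, float minimum, float maximum) {
    return clamp(val, minimum, maximum);
}

// Expansion given in the specification: max first, then min.
vec3 clamp_max_then_min(vec3 val, float minimum, float maximum) {
    return min(max(val, minimum), maximum);
}

// Ordering used in the post: min first, then max.
// Matches the other two whenever minimum <= maximum.
vec3 clamp_min_then_max(vec3 val, float minimum, float maximum) {
    return max(min(val, maximum), minimum);
}

void main() {
    vec3 a = clamp_builtin(uVal, 0.0, uMax);
    vec3 b = clamp_max_then_min(uVal, 0.0, uMax);
    vec3 c = clamp_min_then_max(uVal, 0.0, uMax);
    // Writes the per-channel disagreement between the variants.
    fragColor = vec4(abs(a - b) + abs(a - c), 1.0);
}
```

Since the three forms are functionally interchangeable for a valid range, any cycle difference reported between them comes from how the compiler schedules the operations, not from a difference in results.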

dbe_dev over 7 years ago in reply to Peter Harris

Hi Pete,

Tried to minimize it to the case where I see it happen. It seems to be somehow dependent on the multiply that happens before.

On a G72, the following fragment shader reports 2.75 cycles for the _a (clamp) variant and 1.25 for the _b (max/min) variant:

#version 300 es

precision highp float;

in vec2 vTextureCoord;

uniform sampler2D sTexture;

uniform float myparam;
uniform float myparam2;

out vec4 fragColor;

vec3 clamp_minmax_b (vec3 val, float minimum, float maximum){
    vec3 rval = max(val, minimum);
    return min(rval, maximum);
}

vec3 clamp_minmax_a (vec3 val, float minimum, float maximum){
    return clamp(val, minimum, maximum);
}

void main() {
    vec4 raw = texture(sTexture, vTextureCoord);

    vec3 clamped_raw = vec3(raw.r, raw.g, raw.b);
    clamped_raw = clamped_raw * myparam2;
    clamped_raw = clamp_minmax_a(clamped_raw, 0.0f, myparam);

    fragColor = vec4(clamped_raw, 1.0f);
}

Regards,

Danny
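
One way to compare the two variants without editing main() each time is a preprocessor switch. The following is only a sketch under assumptions: the USE_CLAMP macro and the clamp_variant name are hypothetical and not part of the original shader, while the inputs and uniforms are the ones from the post.

```glsl
#version 300 es
precision highp float;

// Hypothetical switch: comment this line out to build the max/min variant.
#define USE_CLAMP

in vec2 vTextureCoord;

uniform sampler2D sTexture;
uniform float myparam;
uniform float myparam2;

out vec4 fragColor;

// Selects the built-in clamp or the explicit max/min pair at compile time.
vec3 clamp_variant(vec3 val, float minimum, float maximum) {
#ifdef USE_CLAMP
    return clamp(val, minimum, maximum);
#else
    return min(max(val, minimum), maximum);
#endif
}

void main() {
    // Same sequence as the original: sample, scale, then clamp.
    vec3 scaled = texture(sTexture, vTextureCoord).rgb * myparam2;
    fragColor = vec4(clamp_variant(scaled, 0.0, myparam), 1.0);
}
```

Keeping everything else identical means the only difference between the two builds is the clamp implementation, which makes the reported cycle counts easier to compare.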


Peter Harris over 7 years ago in reply to dbe_dev

dbe_dev said:
It seems to be somehow dependent on the multiply that happens before.

A single arithmetic instruction is a packed pair of operations, so the number of cycles can be sensitive to how well surrounding operations pack into those pairings.

dbe_dev over 7 years ago in reply to Peter Harris

Agreed that instruction packing can cause a difference in cycle count in general. However, in this particular case, adding the multiply reduces the cycle count for _b (from 2 to 1.25), whereas for _a it increases it (from 2 to 2.75), according to the tool. That makes me doubt the tool's output for these GPUs, as it does not seem logical that adding a multiply would reduce the overall cycle count.

As I am currently using the tool mostly for estimates, I would like to understand whether its output really reflects actual performance on these GPUs or whether I am just looking at a glitch in the tooling.

Peter Harris over 7 years ago in reply to dbe_dev

The absolute numbers for the G series are incorrect (at least for the arithmetic cost); we'll be fixing this in the next offline compiler release later in the year. The trend direction, that is, whether a change makes the shader faster or slower, should be accurate though, to the best of my knowledge.

dbe_dev over 7 years ago in reply to Peter Harris

OK, that's enough info for me! Looking forward to that next release. Thank you for the quick support!