
Strange offline shader compiler results for G76 target

I've optimized some math in my shader: I got rid of unnecessary uniform computations by offloading them to the CPU.

When testing the old vs new shader against G52 and G76 targets (Bifrost) I see that `Has uniform computation` changed from `true` to `false`, and the number of used uniform registers reduced from 12 to 16.

However, when the target is a Valhall GPU (either 1st or 2nd gen), the results are rather strange. For some reason, the compiler still detects uniform computations, and the outputs are identical for both the new and old shaders.

Old shader:

#extension GL_ARM_shader_framebuffer_fetch_depth_stencil : enable

precision highp float;

uniform vec2 uCameraRange;
uniform float uTransitionSize;

float calc_depth(in float z)
{
  return (2.0 * uCameraRange.x) / (uCameraRange.y + uCameraRange.x - z*(uCameraRange.y - uCameraRange.x));
}

varying vec2 vTextureCoord;
uniform sampler2D sTexture;
uniform vec4 color;

void main() {
   vec4 diffuse = texture2D(sTexture, vTextureCoord) * color;
   float geometryZ = calc_depth(gl_LastFragDepthARM);
   float sceneZ = calc_depth(gl_FragCoord.z);
   float a = clamp(geometryZ - sceneZ, 0.0, 1.0);
   float b = smoothstep(0.0, uTransitionSize, a);
   gl_FragColor = diffuse * b;
}

New shader:

#extension GL_ARM_shader_framebuffer_fetch_depth_stencil : enable

precision highp float;

uniform vec3 uCameraRange; // x = 2 * near; y = far + near; z = far - near
uniform float uTransitionSize;

float calc_depth(in float z)
{
  return uCameraRange.x / (uCameraRange.y - z * uCameraRange.z);
}

varying vec2 vTextureCoord;
uniform sampler2D sTexture;
uniform vec4 color;

void main() {
   vec4 diffuse = texture2D(sTexture, vTextureCoord) * color;
   float geometryZ = calc_depth(gl_LastFragDepthARM);
   float sceneZ = calc_depth(gl_FragCoord.z);
   float a = clamp(geometryZ - sceneZ, 0.0, 1.0);
   float b = smoothstep(0.0, uTransitionSize, a);
   gl_FragColor = diffuse * b;
}
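
For reference, the CPU-side half of this change can be sketched roughly as below. This is a minimal illustration in Python, not the application's actual code: the function names `pack_camera_range`, `calc_depth_old` and `calc_depth_new` are hypothetical, and it assumes the old `uCameraRange` held `(near, far)`, as the comment in the new shader implies. It precomputes the three terms and checks that both versions of `calc_depth` agree.

```python
import math


def pack_camera_range(near, far):
    """Precompute the terms the new shader reads from uCameraRange.

    Returns (2 * near, far + near, far - near), matching the comment
    on the vec3 uniform in the new shader.
    """
    return (2.0 * near, far + near, far - near)


def calc_depth_old(z, near, far):
    # Mirrors the old shader, assuming uCameraRange = (near, far).
    return (2.0 * near) / (far + near - z * (far - near))


def calc_depth_new(z, camera_range):
    # Mirrors the new shader: all uniform-only math is already folded in.
    x, y, w = camera_range
    return x / (y - z * w)


def formulas_agree(near, far, samples):
    """Return True if both formulas give the same depth for every sample z."""
    packed = pack_camera_range(near, far)
    return all(
        math.isclose(calc_depth_old(z, near, far), calc_depth_new(z, packed))
        for z in samples
    )
```

In a real application the packed tuple would be uploaded once per camera change with something like `glUniform3f`, so the per-fragment work is reduced to one multiply, one subtract and one divide.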

  • Hi Oleksandr, 

    Thanks - I can reproduce this one on the latest internal builds too.

    Valhall uses a different instruction set, and has different limitations on register scheduling, so some differences across generations are expected. The uniform optimization may also trigger due to some compiler preamble that is needed, so it's possible it's being triggered by that rather than by the user code here. I'll follow up with the compiler team just to double check that this is what is happening.

    Note that even in the Bifrost case the cycle cost of the main shader (ignoring the uniform computation, which is optimized out) is the same, so the only difference here is register allocation count plus the small cost of the uniform folding. Both shaders are well inside the uniform register file capacity, so I don't think there is a major concern here. 

    Cheers, 
    Pete
