Shadows with alpha test (discard) not working on Galaxy Note 4 SM-N910U, ARM Mali-T760 MP6

Is this the right place to report problems with the GPU Driver?

I'm an engine developer. I've just added GLES3 support to my engine, and I've noticed that on an Android Galaxy Note 4 (SM-N910U, ARM Mali-T760 MP6) the alpha-tested shadows aren't working correctly.
I suspect an OpenGL driver fault, as the same code works fine on iOS, Windows, Mac, Linux, etc.

Shadows that don't use alpha test (discard) work correctly, but those that use discard don't display at all.

Shader for alpha-tested materials is below:

Vertex Shader:
#version 300 es
#ifdef GL_ES
#define LP lowp
#define MP mediump
#define HP highp
precision HP float;
precision HP int;
#else
#define LP
#define MP
#define HP
#endif
#if __VERSION__>=300
#define attribute in
#define varying out
#endif
varying vec4 GL_Tex0;
varying vec4 GL_Tex1;
varying vec4 GL_Tex2;
struct VS_PS{
vec3 _pos9;
vec3 _nrm2;
vec2 _tex3;
};
VS_PS _O1;
vec3 _TMP225;
vec4 _m0228[3];
vec4 _TMP344;
attribute vec4 ATTR0;
attribute vec4 ATTR3;
vec3 _TMP348;
vec3 _TMP349;
vec3 _TMP350;
vec3 _TMP351;
vec4 _TMP352;
vec4 _TMP353;
vec4 _TMP354;
vec4 _TMP355;
uniform vec4 ProjMatrix[4];
uniform vec4 ViewMatrix[180];
void main()
{
vec4 _O_vtx;
_m0228[0]=ViewMatrix[(3*gl_InstanceID+0)];
_m0228[1]=ViewMatrix[(3*gl_InstanceID+1)];
_m0228[2]=ViewMatrix[(3*gl_InstanceID+2)];
_TMP348.x=_m0228[0].x;
_TMP348.y=_m0228[1].x;
_TMP348.z=_m0228[2].x;
_TMP349.x=_m0228[0].y;
_TMP349.y=_m0228[1].y;
_TMP349.z=_m0228[2].y;
_TMP350.x=_m0228[0].z;
_TMP350.y=_m0228[1].z;
_TMP350.z=_m0228[2].z;
_TMP351.x=_m0228[0].w;
_TMP351.y=_m0228[1].w;
_TMP351.z=_m0228[2].w;
_TMP225=ATTR0.x*_TMP348+ATTR0.y*_TMP349+ATTR0.z*_TMP350+_TMP351;
_O1._pos9=_TMP225;
_O1._tex3=ATTR3.xy;
_TMP352.x=ProjMatrix[0].x;
_TMP352.y=ProjMatrix[1].x;
_TMP352.z=ProjMatrix[2].x;
_TMP352.w=ProjMatrix[3].x;
_TMP353.x=ProjMatrix[0].y;
_TMP353.y=ProjMatrix[1].y;
_TMP353.z=ProjMatrix[2].y;
_TMP353.w=ProjMatrix[3].y;
_TMP354.x=ProjMatrix[0].z;
_TMP354.y=ProjMatrix[1].z;
_TMP354.z=ProjMatrix[2].z;
_TMP354.w=ProjMatrix[3].z;
_TMP355.x=ProjMatrix[0].w;
_TMP355.y=ProjMatrix[1].w;
_TMP355.z=ProjMatrix[2].w;
_TMP355.w=ProjMatrix[3].w;
_TMP344=_TMP225.x*_TMP352+_TMP225.y*_TMP353+_TMP225.z*_TMP354+_TMP355;
_O_vtx=_TMP344;
GL_Tex1.xyz=_O1._nrm2;
gl_Position=_TMP344;
GL_Tex2.xy=ATTR3.xy;
GL_Tex0.xyz=_TMP225;
}


Pixel Shader:
#version 300 es
#extension GL_EXT_shader_texture_lod:enable
#extension GL_EXT_shadow_samplers:enable
#ifdef GL_ES
#define LP lowp
#define MP mediump
#define HP highp
precision HP float;
precision HP int;
precision HP sampler2D;
#if __VERSION__<300
#define gl_InstanceID 0
#endif
#else
#define LP
#define MP
#define HP
#endif
#if __VERSION__>=300
#define texture2D texture
#define varying in
#else
#endif
varying vec4 GL_Tex2;
struct MaterialClass{
vec4 _color;
vec4 _ambient_specular;
vec4 _sss_glow_rough_bump;
vec4 _texscale_detscale_detpower_reflect;
};
float _c0079;
uniform MaterialClass Material;
uniform sampler2D Col;
void main()
{
_c0079=texture2D(Col,GL_Tex2.xy).w+(false?float(Material._color.w)*5.00000000E-001-1.00000000E+000:float((Material._color.w-1.00000000E+000)));
if(_c0079<0.00000000E+000){
discard;
}
}
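
The discard condition above can be checked by hand on the CPU. The sketch below is only a model of the arithmetic in the pixel shader, not engine code; the function name `alpha_test_discards` is made up for illustration. Since the `false ? ... : ...` branch always selects the second operand, the test reduces to `tex_alpha + (mat_alpha - 1.0) < 0.0`, that is, `tex_alpha < 1.0 - mat_alpha`.

```c
#include <stdbool.h>

/* CPU-side model of the shader's alpha test (hypothetical helper).
 * tex_alpha : alpha sampled from the Col texture
 * mat_alpha : Material._color.w
 * Returns true when the fragment would be discarded. */
static bool alpha_test_discards(float tex_alpha, float mat_alpha)
{
    /* Same expression as _c0079 in the shader. */
    float c = tex_alpha + (mat_alpha - 1.0f);
    return c < 0.0f;
}
```

For example, with `mat_alpha = 0.5` a texel with alpha 0.25 is discarded and one with alpha 0.75 is kept; with `mat_alpha = 1.0` nothing is discarded, because the sampled alpha is never negative.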

Expected result from Windows:

What I'm getting on Galaxy Note 4:

The tree model is composed of 2 materials: the trunk, which has no "discard", and the leaves, which use the "discard" shader. Both shaders are shadow shaders: they don't output any color, and their only purpose is to write to the depth buffer.

Here is the link to the APK, which you can test yourself:

www.dropbox.com/.../Application 3D.7z

  • The fragment (pixel) shader, simplified, would be something like this, if I'm not mistaken:

    #version 300 es
    
    precision highp float;
    precision highp int;
    precision highp sampler2D;
    
    in vec4 GL_Tex2;
    
    struct MaterialClass{
    	vec4 _color;
    	vec4 _ambient_specular;
    	vec4 _sss_glow_rough_bump;
    	vec4 _texscale_detscale_detpower_reflect;
    };
    
    float _c0079;
    
    uniform MaterialClass Material;
    uniform sampler2D Col;
    
    void main()
    {
    	float tex_alpha = texture(Col,GL_Tex2.xy).a;
    	float mat_alpha_minus_one = Material._color.a-1.0;
    	_c0079 = tex_alpha + mat_alpha_minus_one;
    	if(_c0079 < 0.0) {
    		discard;
    	}
    }

    I don't see where the fragment color is defined, though. IIRC, you need at least one vec4 declared as out and assigned during the execution of the main function to get a pixel lit in the output framebuffer when using OpenGL ES 3.x and later.

    I wonder if it wouldn't be simpler to do:

    #version 300 es
    
    precision highp float;
    precision highp int;
    precision highp sampler2D;
    
    in vec4 GL_Tex2;
    
    struct MaterialClass{
    	vec4 _color;
    	vec4 _ambient_specular;
    	vec4 _sss_glow_rough_bump;
    	vec4 _texscale_detscale_detpower_reflect;
    };
    
    float _c0079;
    
    uniform MaterialClass Material;
    uniform sampler2D Col;
    
    out vec4 myColor;
    
    void main()
    {
    	float tex_alpha = texture(Col,GL_Tex2.xy).a;
    	float mat_alpha_minus_one = Material._color.a-1.0;
    	_c0079 = tex_alpha + mat_alpha_minus_one;
    	myColor = vec4(0.0,0.0,0.0,_c0079);
    }

  • This is a shadow shader; there's no need to set any colors, it only needs to write depth.

    The same shader without discard works fine: you can see the tree trunk casting a shadow.

    As mentioned before, the same shader (with discard) works on iOS, which is also GLES3, and on Win/Mac/Linux GL. I'm getting correct results everywhere except Mali.

    From my understanding, the problem is not with the shader but with the Mali driver/GPU.

  • Hi Esenthel,

    Thanks for reporting the issue. I have been able to reproduce it on a Note 4 internally, and it seems this issue has already been fixed in newer versions of the driver. However, I found a strange issue with your application where straight black lines are visible across the screen. I ran your sample with the Mali Graphics Debugger and found several API errors during initialization and also at runtime. Also, in render pass 3 you are using a texture as the depth attachment while at the same time passing the same texture to the shader as a texture object. That is not allowed by the OpenGL ES spec.

    Also, to reduce the number of draw calls each frame, it would be better to use multiple FBOs initialized at the beginning and to call glClear directly at the beginning of each render pass (this is important since it avoids reading back the previous content).

    If you can create a smaller sample with only the cascaded shadow map generation, I can try to see if there is a possible workaround for the issue in the driver version you have.

    Regards,
    Daniele

  • One minor correction here:

    Daniele Di Donato said:
    ... using a texture as depth attachment but at the same time you are passing the same texture as a texture object to the shader. That is not allowed by the OpenGL ES spec.

    It's allowed (i.e. not an error), but the results are implementation defined if you actually modify the attachment while concurrently using it as a source texture.

    It's quite a common use case to attach depth from a previous render pass and use it read-only (e.g. for depth testing, but with all depth writes masked) while also reading it as a texture. 

    Cheers, 
    Pete

  • Thank you Daniele and Peter for your replies.

    I'm glad to know that the issue with the shader has already been addressed in a newer version of the driver.

    However, I'm surprised by the black lines that you're seeing, as I don't have that kind of problem on my Note 4.

    My driver version is v1.r7p0-03rel0.e941a8

    I've checked my app with Mali Graphics Debugger; however, all the errors are related to:

    - failure to create a certain texture format (such as BC7, BGRA), in which case I simply fall back to an RGBA texture

    - a problem when setting anisotropic filtering: as there's no GL_TEXTURE_MAX_ANISOTROPY defined in the GLES3 headers, I've made my own #define GL_TEXTURE_MAX_ANISOTROPY 0x84FE to match GLES2 and desktop GL. However, Mali Graphics Debugger doesn't recognize this enum. What happened to anisotropic filtering in GLES3? Did it disappear?

    Anyway, those problems shouldn't cause the black lines.

    And regarding the depth texture used as both render target and shader input: even if it's bound to some shader input, I'm not reading from it and writing to it at the same time.

    As for the FBOs and glClear, I chose to call glInvalidateFramebuffer instead of glClear, because glClear is not free on some platforms, and I'd like to have one code path for multiple platforms. If glInvalidateFramebuffer is called at the start of rendering to an FBO, instead of glClear, is that not enough? I did some performance checks, and the speed was similar when I used glInvalidateFramebuffer instead of glClear.

    Sometimes I don't need to clear the memory, because I will overwrite it with some shader at the start, and since glClear is not free on some other platforms, I assumed it's better to just call glInvalidateFramebuffer.

  • Hi Esenthel,

    Yes, using glInvalidateFramebuffer at the beginning of each render pass will have the same effect as glClear, which basically avoids loading back framebuffer contents that you are not going to use anyway. glClear would have made the code a bit cleaner and avoided the multiple API calls to set up the framebuffer for each render pass. Performance-wise, it's the same whether you use glClear or glInvalidateFramebuffer.

    For the depth texture read you are right, I hadn't realized you were not writing into it at the same time. The issue I see happens only on the transparent objects (the leaves in your example): both the main scene and the shadow show the lines.

    This happens on a Firefly board, which has a Mali-T760 MP4 similar to the one in the Note 4 but with newer drivers.

    I believe the issue is caused by out-of-spec behavior in your API calls. Specifically, the 6th render pass (the last one rendering to an off-screen buffer) binds a texture to read from it in a shader but also uses it as COLOR_ATTACHMENT0 of the framebuffer you are currently writing to (all color masks set to true). This is out-of-spec and can be the cause of the issue.
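
A debug-time check for this situation could look like the sketch below. It is only an illustration under stated assumptions: the function name `is_feedback_loop` and its parameters are hypothetical, and the engine is assumed to track the texture object names it has attached and bound. Reading an attachment that is not being written (writes masked) is not flagged, in line with the read-only depth use case mentioned earlier in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the texture attached to the current framebuffer is
 * also one of the textures the next draw call samples from, and writes
 * to that attachment are enabled. All names here are hypothetical.
 * attachment_tex : texture object name attached to the framebuffer (0 = none)
 * writes_enabled : whether the color/depth mask allows writing to it
 * sampled_tex    : texture object names bound for sampling
 * count          : number of entries in sampled_tex */
static bool is_feedback_loop(unsigned int attachment_tex,
                             bool writes_enabled,
                             const unsigned int *sampled_tex,
                             size_t count)
{
    size_t i;
    if (attachment_tex == 0 || !writes_enabled)
        return false;
    for (i = 0; i < count; i++) {
        if (sampled_tex[i] == attachment_tex)
            return true;
    }
    return false;
}
```

An engine could call something like this before each draw in debug builds and log a warning when it returns true.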

    Since you are doing deferred shading, I suggest having a look at the Pixel Local Storage extension for the devices which support it. That will allow you to implement your algorithm more efficiently. You can find various documents around about how to use it.

    Cheers,
    DDD

  • Thank you very much for this helpful information.

    I was reading and writing to the same pixel, so I assumed it wouldn't be a problem, as it worked fine on DX9/desktop GL and many mobile devices.

    However, I've now disabled reading and writing to the same color render target; I output the result to another temporary render target instead.

    Could you let me know if you still see the black lines over there?  https://www.dropbox.com/s/k2im25s16ku7m5l/Application%203.7z?dl=0

    Thank you,

    Greg

  • Esenthel said:
    What happened to anisotropic filtering in GLES3? Did it disappear?

    It's never been part of the OpenGL ES specification; it's only available via extensions on the platforms which support it.
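
Since anisotropic filtering is extension-only, the usual approach is to look for the extension name in the extension list before using its enums. Below is a minimal sketch of such a lookup in plain C; the helper name `has_gl_extension` is hypothetical, and the list is passed in as a string so the function does not depend on a GL context.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when `name` appears as a whole, space-separated token in
 * `extensions`. A plain strstr() is not enough, because one extension
 * name can be a prefix of another. Hypothetical helper. */
static bool has_gl_extension(const char *extensions, const char *name)
{
    size_t len;
    const char *p;

    if (extensions == NULL || name == NULL || *name == '\0')
        return false;

    len = strlen(name);
    p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p++; /* keep searching after this partial match */
    }
    return false;
}
```

In an application the list would typically come from glGetString(GL_EXTENSIONS), and the extension to look for is GL_EXT_texture_filter_anisotropic, which defines GL_TEXTURE_MAX_ANISOTROPY_EXT as 0x84FE.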