Hello,
My app works OK on:
-Windows Desktop GL
-Apple iOS iPad mini 2
-Samsung Galaxy Note 4 (ARM Mali-T760)
-Sony Xperia XZ2 Adreno 630
However when running on:
Huawei Mate 20 X (ARM Mali-G76) version OpenGL ES 3.2 v1.r16p0-01rel0.95d2435cbe2284d49b9bbcf5b1624fdd
Then I'm getting problems.
Expected results: [image]
ARM Mali results: [image]
After touching the screen to rotate the camera: [image]
I'm suspecting a driver bug.
This problem appears to be related to glInvalidateFramebuffer: if I replace all glInvalidateFramebuffer calls with glClear (or just remove them), the app works correctly.
Please check this link which includes APK files and images:
https://www.dropbox.com/sh/17lho4zzuwhuh4r/AACiAVIiSTxSv5_CeMDSqPSZa?dl=0
Thank you,
Greg
Hi Greg,
I'll take a look at this properly in the morning when I'm in the office, but just to check a few things based on the images.
It looks like you are setting a scissor box that is smaller than the whole image and redrawing only that area (the square part of the screen that is getting drawn)?
A call to glClear behaves like a draw call, so it is affected by the scissor box. Assuming you have, for example, EGL_BUFFER_PRESERVED set, the parts outside the scissor box will be reliably preserved from the previous frame.
Most importantly a call to glInvalidateFramebuffer is *not* restricted by the scissor box - it will always invalidate the entire framebuffer. If you are then only redrawing the part of the frame inside the scissor then the contents outside of that are undefined. There is no guarantee what is rendered unless you redraw an invalidated region.
I'm not sure that's what you are doing here, but it would fit the behavior shown in the images.
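To make the difference concrete, here is a toy CPU model of the two behaviors described above. It is not OpenGL ES code; `Surface`, `clearScissored`, `invalidateAll` and `undefinedCount` are hypothetical names invented for this sketch, and "undefined" is modelled as a simple flag per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: each pixel stores a value plus a flag saying whether
// its contents are defined. Real GPU memory has no such flag.
struct Pixel { int value; bool defined; };

struct Box { int x, y, w, h; };

struct Surface {
    int width;
    int height;
    std::vector<Pixel> pixels;

    Surface(int w, int h, int initial)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h),
                 Pixel{initial, true}) {
        assert(w > 0 && h > 0);
    }

    Pixel& at(int x, int y) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                      + static_cast<std::size_t>(x)];
    }
};

// Models glClear with the scissor test enabled: only pixels inside the
// box are written; everything outside keeps its previous contents.
void clearScissored(Surface& s, const Box& box, int value) {
    for (int y = box.y; y < box.y + box.h; ++y)
        for (int x = box.x; x < box.x + box.w; ++x)
            s.at(x, y) = Pixel{value, true};
}

// Models glInvalidateFramebuffer: every pixel becomes undefined,
// regardless of any scissor box.
void invalidateAll(Surface& s) {
    for (Pixel& p : s.pixels) p.defined = false;
}

// Counts pixels whose contents are undefined.
int undefinedCount(const Surface& s) {
    int n = 0;
    for (const Pixel& p : s.pixels)
        if (!p.defined) ++n;
    return n;
}
```

In this model, invalidating a 4x4 surface and then redrawing only a 2x2 box leaves 12 undefined pixels, whereas a scissored clear of the same box leaves none.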
HTH, Pete
Hi Pete,
My code more or less looks like this:
I have 2 renderings per frame:
-#1 deferred fullscreen
-#2 deferred to small rectangle on the left side, this rendering is partial (doesn't actually draw to default FBO)
---
Rendering #1, deferred fullscreen:
-set custom RT#0 (fbo attachment 0) for color, RT#1 (fbo attachment 1) for normals
-draw meshes
-set custom RT#2 (fbo attachment 0) for light
-calculate per pixel light based on depth and RT#1 normals
-set custom RT#3 (fbo attachment 0) for lit-color
-combine RT#0 color with RT#2 light
-set default FBO
-copy RT#3 lit-color results into default FBO
start Rendering #2, deferred to small viewport in left rectangle:
-for this rendering I reuse the same RT's from Rendering #1, (HOWEVER reusing might be in different order, like instead of RT#0, RT#1, RT#2.., there's RT#2, RT#1, RT#0)
I've also noticed that the problem on Mali happens even without copying the results from Rendering #2 into the default FBO.
My guess is that the 'glInvalidateFramebuffer' calls from Rendering #2 affect somehow the Rendering #1, because I reuse the same RT's for both renderings.
Also I operate on viewports instead of scissor rectangle if that matters.
I've just run another test that makes sure any scissor is disabled before clearing the RTs. When I replace all glInvalidateFramebuffer calls with glClear or glClearBufferfv to a purple color, my app works OK and I don't see any purple pixels. So it really looks like there's a bug with glInvalidateFramebuffer in the driver.
I would expect the behavior of a sub-frame render to be the same (whether you are using scissors or viewports).
Calling glInvalidateFramebuffer() will invalidate the memory contents of the entire surface - irrespective of what scissor or viewport is set - so any region outside of that will contain garbage unless it is explicitly redrawn. If you are reusing attachments across framebuffers then an invalidate in one framebuffer will render the contents of that attachment invalid in all other framebuffers that are using it (i.e. the invalidate applies to the memory content of the attachment not the framebuffer container).
Calling glClear on the other hand will honour both scissor and viewport, leaving the part outside untouched.
The behavior of the two is very different in these cases, so don't expect the rendering to be the same. That said, I'll check the APK now - the devil is in the detail in these cases.
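The point about shared attachments can be sketched with a small toy model. This is not OpenGL ES code; `Texture`, `Framebuffer` and `invalidateColor0` are hypothetical names, and the "defined" flag is only an illustration of the idea that the invalidate applies to the attachment's memory rather than to the framebuffer container.

```cpp
#include <cassert>
#include <memory>

// Toy model: the texture owns the storage, so the "defined" state
// lives here and not in the framebuffer.
struct Texture {
    bool contentsDefined = true;
};

// A framebuffer object is only a container that points at attachments.
struct Framebuffer {
    std::shared_ptr<Texture> color0;
};

// Models invalidating color attachment 0 through one framebuffer.
// Every other framebuffer that shares the texture sees the change.
void invalidateColor0(Framebuffer& fbo) {
    assert(fbo.color0 != nullptr);
    fbo.color0->contentsDefined = false;
}
```

If two `Framebuffer` objects hold the same `Texture`, calling `invalidateColor0` on one makes `contentsDefined` false when read through the other as well.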
Cheers, Pete
glClear and glClearBufferfv are not affected by the viewport, only by scissor.
I use only default FBO (0) and 1 custom FBO, for which I change all attachments depending on what I need.
Esenthel said: I use only default FBO (0) and 1 custom FBO, for which I change all attachments depending on what I need.
It's unrelated to this issue - is there any particular reason for not just using multiple pre-generated FBOs? It would be lower overhead on the CPU.
My engine is cross-platform, including DirectX 11 support, and there is no concept of a framebuffer there; you just attach render targets to slots 0, 1, 2, and so on. I find this approach much more natural, so I've built the render target management around that concept. I also have a lot of post-process effects, most of which need a different kind of render target attached. Creating an FBO for each combination would result in a lot of FBOs. It would also tie the render targets strictly to those FBOs, which would restrict which render targets I can use. With my approach, I allocate render targets on demand and keep them in a pool. Once an effect is finished, I mark its render target as available for reuse, so it can be used for another post-process effect. This way there's no restriction on which render targets can be used for a post-process effect; I just reuse the first available one that matches the desired resolution and format.
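A minimal sketch of that kind of pool, to show the idea rather than my actual engine code. All names here (`RenderTargetPool`, `acquire`, `release`, `Format`) are made up for the example, and the GPU texture creation is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical formats, for illustration only.
enum class Format { RGBA8, RGBA16F, Depth24 };

struct RenderTarget {
    int width;
    int height;
    Format format;
    bool inUse;
};

class RenderTargetPool {
public:
    // Returns the index of the first free target that matches the
    // requested resolution and format, allocating a new one if none fits.
    std::size_t acquire(int width, int height, Format format) {
        assert(width > 0 && height > 0);
        for (std::size_t i = 0; i < targets_.size(); ++i) {
            RenderTarget& rt = targets_[i];
            if (!rt.inUse && rt.width == width && rt.height == height &&
                rt.format == format) {
                rt.inUse = true;
                return i;
            }
        }
        // A real engine would create the GPU texture at this point.
        targets_.push_back(RenderTarget{width, height, format, true});
        return targets_.size() - 1;
    }

    // Marks a target as available for the next effect that needs it.
    void release(std::size_t index) {
        assert(index < targets_.size() && targets_[index].inUse);
        targets_[index].inUse = false;
    }

    // Number of targets allocated so far, in use or not.
    std::size_t size() const { return targets_.size(); }

private:
    std::vector<RenderTarget> targets_;
};
```

Because a released target is handed to whichever effect asks next, the same texture can end up attached in a different slot or order from one rendering to the next.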
Just to confirm - I've taken a look at the APK and reproduced the issue on a Galaxy S10. As far as I can tell it is indeed a driver problem, so I'll raise a ticket for the driver team to take a look at it. Thanks for reporting the problem.
Thank you very much for taking the time to investigate.
Is the only workaround simply not to call glInvalidateFramebuffer?
What devices are affected, all Mali G-series? What would be the driver version GL_VERSION that fixes this problem?
I'm looking for a way to identify the affected devices based on GL_RENDERER, GL_VENDOR, GL_VERSION, ..
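For example, something along these lines could pull the driver numbers out of the GL_VERSION string I posted above. This is only a sketch: it assumes the string contains a `v<number>.r<number>p<number>` token as it does on my Mate 20 X, the helper names are made up, and which versions are actually affected is not known at this point, so no threshold is hard-coded.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// The three numbers found in a token such as "v1.r16p0". What each
// number means is an assumption left to the caller.
struct MaliDriverVersion {
    int vNumber;
    int rNumber;
    int pNumber;
};

// Tries to extract "v<n>.r<n>p<n>" from a GL_VERSION string.
// Returns false if no such token is found.
bool parseMaliDriverVersion(const char* glVersion, MaliDriverVersion* out) {
    assert(glVersion != nullptr && out != nullptr);
    for (const char* p = std::strchr(glVersion, 'v'); p != nullptr;
         p = std::strchr(p + 1, 'v')) {
        int v = 0, r = 0, patch = 0;
        if (std::sscanf(p, "v%d.r%dp%d", &v, &r, &patch) == 3) {
            out->vNumber = v;
            out->rNumber = r;
            out->pNumber = patch;
            return true;
        }
    }
    return false;
}

// True if the GL_RENDERER string mentions "Mali".
bool isMaliRenderer(const char* glRenderer) {
    assert(glRenderer != nullptr);
    return std::strstr(glRenderer, "Mali") != nullptr;
}
```

The strings would come from glGetString(GL_VERSION) and glGetString(GL_RENDERER) at startup, and the parsed numbers could then be compared against whatever range turns out to be affected.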
Even if you release a new driver update, what are the chances that major phone manufacturers (Samsung, Huawei, ..) will include this driver in a software update? A few years ago I contacted Samsung about their GPU driver not being updated for the Galaxy Note 4 (which suffered from shadows not being properly rendered for alpha-tested shaders), and they didn't bother to reply or release an update. Personally I think the way Android manufacturers handle updates, and GPU driver updates in particular, is a big problem. I think a better system should be implemented, for example one that allows GPU drivers to be installed manually.
Thank you very much,
Esenthel said: Is the only workaround simply not to call glInvalidateFramebuffer? What devices are affected, all Mali G-series? What would be the driver version GL_VERSION that fixes this problem?
I'll let you know when I have a more concrete diagnosis from the driver team.
On Mali, using glClear at the start of the render pass performs the same as using glInvalidateFramebuffer; both will set the tile memory to the clear color, which is a free operation at the start of the render pass. Given that you know this works, it seems like the safest option.
I have an untested hunch that another workaround would be to use a different container FBO for each of the different resolutions and/or viewports that you are rendering. This is just an educated guess and I've not had time to test it.
Esenthel said: Even if you release a new driver update, what are the chances that major phone manufacturers (Samsung, Huawei, ..) will include this driver in a software update?
We make the drivers available as soon as we have fixes. How those drivers get downstream to specific devices is unfortunately completely out of our hands.
I'll update this thread when I know more.
We have encountered a similar problem. We are currently passing GL_DEPTH_STENCIL_ATTACHMENT to glInvalidateFramebuffer, which works fine on other devices but not on the Huawei Mate 20, where a GLES error is raised.
This sounds like a different problem. Please raise a new forum post with details and a reproducer, and we can investigate. Thanks, Pete
Thank you very much, I'll await more info.
Was the bug fixed?
Sorry for the delayed response; it took a while for the driver team to hunt this one down.
On some driver versions we have a state tracking bug which causes an optimization to incorrectly trigger, causing tiles to be discarded even though they are different to the content in memory. This only occurs in cases where the render pass does not redraw the whole screen, so the rendered bounding box changes between uses of the same framebuffer.
The only workaround we have for now is ensuring that the whole screen is redrawn in render passes that use invalidate.
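One way an application might apply this, combined with the earlier observation in this thread that glClear works, is to fall back to a clear whenever a pass does not cover the whole target. The sketch below is only a decision helper, not driver-verified code; `chooseDiscardMethod`, `coversWholeTarget` and the `driverAffected` flag are hypothetical names, and how to detect an affected driver is left to the application.

```cpp
#include <cassert>

struct Rect { int x, y, width, height; };

enum class DiscardMethod { Invalidate, Clear };

// True when the area redrawn by the pass covers the entire render target.
bool coversWholeTarget(const Rect& drawn, int targetWidth, int targetHeight) {
    assert(targetWidth > 0 && targetHeight > 0);
    return drawn.x <= 0 && drawn.y <= 0 &&
           drawn.x + drawn.width >= targetWidth &&
           drawn.y + drawn.height >= targetHeight;
}

// Picks how to discard old contents at the start of a render pass.
// On an affected driver, invalidate is used only for full-target passes.
DiscardMethod chooseDiscardMethod(bool driverAffected, const Rect& drawn,
                                  int targetWidth, int targetHeight) {
    if (driverAffected && !coversWholeTarget(drawn, targetWidth, targetHeight))
        return DiscardMethod::Clear;    // caller would issue glClear
    return DiscardMethod::Invalidate;   // caller would issue glInvalidateFramebuffer
}
```

The caller would pass the viewport or scissor rectangle of the pass as `drawn` and then issue the matching GL call.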
Kind regards Pete