I have a question regarding the efficiency of discarding fragments for scenes that are excellently sorted, but have many overlapping triangles.
If you have a large number of large triangles (e.g. 1000) that are perfectly front-to-back sorted but overlap (or occlude) one another, roughly how much of a hardware performance penalty will this incur? I understand that the sorting will result in zero overdraw and thus fewer fragments being processed by the fragment shader, but in this case will Mali be able to discard the fragments that are not seen at a very low performance cost?
If there is a significant performance cost, what is a rule-of-thumb maximum amount of triangle overlap before performance starts to degrade?
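To make the scenario concrete, here is a toy Python model of an early depth test. It is only a sketch of the question, not a description of how Mali works: the function name, the 8x8 target, and the assumption that every layer is opaque and covers the whole target are all made up for illustration.

```python
def count_fragments(width, height, layer_depths):
    """Toy early-depth-test model: each layer is opaque and covers the whole target."""
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    rasterized = 0
    shaded = 0
    for depth in layer_depths:  # layers in submission order
        for y in range(height):
            for x in range(width):
                rasterized += 1  # the rasterizer still emits this fragment
                if depth < depth_buffer[y][x]:  # early depth test, LESS comparison
                    depth_buffer[y][x] = depth
                    shaded += 1  # only survivors reach the fragment shader


    return rasterized, shaded


# 1000 hypothetical layers, nearest first (smaller depth = closer)
front_to_back = [i / 1000 for i in range(1000)]
back_to_front = list(reversed(front_to_back))

print(count_fragments(8, 8, front_to_back))
print(count_fragments(8, 8, back_to_front))
```

With perfect front-to-back sorting only the first layer is shaded, but every layer is still rasterized and depth-tested. That remaining culling work is the cost being asked about.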
Sean
Hi Peter,
Many thanks for the response! If I understand you correctly, you are implying that pixel culling is done in hardware in the background, and as such a top-layer triangle can hide the performance cost of the occluded layers beneath it if its fragment shader runs for longer.
Somewhat off topic: Assuming a modest amount (100% to 200%) of overdraw, is it valid to assume that Mali's tile based rendering should do well at minimizing external memory writes?
Not so much "in the background" as just deeply pipelined.
We load triangles at one end of the pipeline, rasterize them to fragments, issue fragments to get colored, run the shader program, and finally colored tiles drop out the other end of the pipeline. All of these stages run in parallel, so if one bit stalls for a while doing "redundant" work that's generally not too much of an issue unless:
In well-written content the "run the shader program" part of the pipeline is the dominant part, so the other stages processing some redundant work is not the end of the world (in reality it will have some small knock-on effect due to shared resources, such as cache or memory bandwidth, but it is normally minor).
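As a rough illustration of that point, here is a minimal Python sketch, assuming the stages overlap perfectly so that the slowest stage sets the pace. The stage names and costs are hypothetical numbers in arbitrary units, not measurements, and the model deliberately ignores the shared-resource effects mentioned above.

```python
def pipeline_time(stage_costs):
    """Steady-state cost when all stages overlap fully: the slowest stage wins."""
    return max(stage_costs.values())


def serial_time(stage_costs):
    """Cost if the same stages ran one after another, for comparison."""
    return sum(stage_costs.values())


# Hypothetical per-tile costs (arbitrary units)
shader_bound = {"triangle_setup": 2, "rasterize_and_cull": 3, "shade": 10}

# Doubling the culling work stays hidden behind the shader stage
more_culling = dict(shader_bound, rasterize_and_cull=6)

# Only when culling exceeds the shader cost does it become the bottleneck
cull_bound = dict(shader_bound, rasterize_and_cull=15)

for name, costs in [("shader_bound", shader_bound),
                    ("more_culling", more_culling),
                    ("cull_bound", cull_bound)]:
    print(name, pipeline_time(costs), serial_time(costs))
```

In this model the extra culling work costs nothing until that stage becomes slower than shading, at which point it starts to set the overall time.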
> is it valid to assume that Mali's tile based rendering should do well at minimizing external memory writes?
Yes, all of the color / depth / stencil framebuffer state will remain inside the tile until writeout at the end of the tile, so any blending, depth testing, or MSAA is "free". With suitable use of glInvalidateFramebuffer (or the EXT_discard_framebuffer extension in GLES 2.0), the transient state (which depth and stencil often are) need never hit main memory at all.
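For a back-of-envelope feel for the numbers, here is a small Python sketch. It assumes a 1920x1080 target, 32-bit color, and a packed 32-bit depth/stencil format; those figures and the function name are illustrative assumptions, and the estimate ignores any framebuffer compression. Overdraw does not appear in the formula because, as described above, each pixel is written out once per tile however many layers were resolved inside it.

```python
def framebuffer_write_bytes(width, height, color_bpp=4,
                            depth_stencil_bpp=4, keep_depth_stencil=True):
    """Estimate bytes written to main memory for one frame of tile writeout.

    Assumes every pixel is written exactly once at end of tile, and that
    depth/stencil is skipped entirely when it has been invalidated.
    """
    pixels = width * height
    bytes_per_pixel = color_bpp
    if keep_depth_stencil:
        bytes_per_pixel += depth_stencil_bpp
    return pixels * bytes_per_pixel


with_ds = framebuffer_write_bytes(1920, 1080)
without_ds = framebuffer_write_bytes(1920, 1080, keep_depth_stencil=False)

print(with_ds, without_ds, with_ds - without_ds)
```

Under these assumptions, invalidating depth and stencil halves the per-frame writeout, since only the color data leaves the tile.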
HTH,
Pete
Ah, I see... A very important distinction.
Thanks, this has been very helpful (and entertaining)!
Cheers,