I have a question regarding the efficiency of discarding fragments in scenes that are well sorted but have many overlapping triangles.
If you have a large number of large triangles (e.g. 1000) that are perfectly sorted front-to-back but overlap (i.e. occlude) one another, roughly how much of a hardware performance penalty will this incur? I understand that the sorting will result in zero overdraw and thus fewer fragments being processed by the fragment shader, but in this case will Mali be able to discard the fragments that are not seen at a very low performance cost?
If there is a significant performance cost, what is a rule-of-thumb maximum amount of triangle overlap before performance starts to degrade?
Sean
Hi Sean,
Rendering front-to-back is generally really efficient - although I doubt we could hide the cost of loading 999 occluded triangles; that's quite a lot of bandwidth spent on triangle loading just to work out that they aren't visible. How much we can actually hide depends on the shader complexity of the visible front layer: if that only takes a single cycle in throughput terms, then you will be able to hide a lot less overdraw overhead than in a situation where the front triangle's shader takes a thousand cycles per pixel.
Most 3D content is pretty good at implementing culling in the application, so for well-written applications the overdraw rate for opaque geometry is generally between 20% and 100% of the visible fragments. We can usually hide that with no penalty, but making it as low as possible via application-side techniques is always better, for example by sorting opaque draws front-to-back before submission, as in the sketch below.
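As a minimal sketch of that application-side sorting (the names and the per-draw depth field are illustrative, assuming the application has already computed a view-space depth per draw):

```c
#include <stdlib.h>

/* Hypothetical per-draw record: 'depth' is the draw's nearest
 * view-space depth, computed by the application. */
typedef struct {
    float depth;
    int   draw_id;
} DrawRecord;

/* Sort opaque draws nearest-first, so earlier draws occlude later
 * ones and early depth testing can kill the hidden fragments. */
static int compare_front_to_back(const void *a, const void *b)
{
    float da = ((const DrawRecord *)a)->depth;
    float db = ((const DrawRecord *)b)->depth;
    return (da > db) - (da < db);
}

void sort_opaque_draws(DrawRecord *draws, size_t count)
{
    qsort(draws, count, sizeof(DrawRecord), compare_front_to_back);
}
```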
Pete
Hi Peter,
Many thanks for the response! If I understand you correctly, you are implying that fragment culling is done in hardware in the background, and as such a top-layer triangle can hide the performance cost of the occluded layers beneath it if its fragment shader runs for longer.
Somewhat off topic: Assuming a modest amount (100% to 200%) of overdraw, is it valid to assume that Mali's tile based rendering should do well at minimizing external memory writes?
Not so much "in the background" as just deeply pipelined.
We load triangles at one end of the pipeline, rasterize them into fragments, issue the fragments to get colored, run the shader program, and finally colored tiles drop out the other end of the pipeline. All of these stages run in parallel, so if one of them stalls for a while doing "redundant" work, that's generally not too much of an issue unless that stage becomes the bottleneck for the whole pipeline.
In well-written content the "run the shader program" part of the pipeline is the dominant part, so the other stages processing some redundant work is not the end of the world (in reality it will have some small knock-on effect due to shared resources such as cache and memory bandwidth, but it is normally minor).
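A rough way to picture that is a toy steady-state throughput model (illustrative only, not real Mali microarchitecture or numbers): with every stage running in parallel, the cost per tile is set by the slowest stage, so extra work in a non-dominant stage is simply absorbed.

```c
#include <stdio.h>

/* Toy pipeline model: in steady state, throughput is limited by
 * the slowest stage, not the sum of all stages. All cycle counts
 * below are made-up, hypothetical per-tile costs. */
int main(void)
{
    double load_triangles = 400.0;  /* includes occluded layers   */
    double rasterize      = 150.0;
    double shade          = 1200.0; /* dominant: complex shaders  */
    double writeback      = 100.0;

    double stages[] = { load_triangles, rasterize, shade, writeback };
    double bottleneck = 0.0;
    for (int i = 0; i < 4; i++)
        if (stages[i] > bottleneck)
            bottleneck = stages[i];

    /* While shading dominates, growing the triangle-loading cost is
     * hidden until it exceeds 'shade' and becomes the new bottleneck. */
    printf("cycles per tile ~= %.0f\n", bottleneck);
    return 0;
}
```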
> is it valid to assume that Mali's tile based rendering should do well at minimizing external memory writes?
Yes: all of the color / depth / stencil framebuffer state stays inside the tile until writeout at the end of the tile, so any blending, depth testing, or MSAA is "free" in terms of external bandwidth. With suitable use of glInvalidateFramebuffer (or the EXT_discard_framebuffer extension in OpenGL ES 2.0), transient state (which depth and stencil often are) need never hit main memory at all.
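For example, a sketch assuming OpenGL ES 3.0 with an application-created FBO bound (on the default framebuffer the attachment enums would be GL_DEPTH and GL_STENCIL instead; on GLES 2.0 use glDiscardFramebufferEXT from GL_EXT_discard_framebuffer):

```c
#include <GLES3/gl3.h>

/* After the last draw that needs depth/stencil, tell the driver the
 * transient attachments can be dropped, so the tile's depth/stencil
 * contents never have to be written back to main memory. */
void discard_transient_attachments(void)
{
    const GLenum attachments[] = {
        GL_DEPTH_ATTACHMENT,
        GL_STENCIL_ATTACHMENT
    };
    glInvalidateFramebuffer(GL_FRAMEBUFFER, 2, attachments);
    /* Color is still needed for display, so it is not invalidated. */
}
```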
HTH,
Ah, I see... A very important distinction.
Thanks, this has been very helpful (and entertaining)!
Cheers,