Hypothetical driver stall in upscale pass

Apologies for the odd title, but I was wondering if you could shed some light on driver-side behavior.
We're seeing suspiciously high time "around" a single draw call: slightly above 2ms on a Mali-G72MP2 (Samsung A20). The draw call is an upscale pass (in Unity). I'd expect it to cost next to "nothing", CPU-wise, so I was wondering whether what we're seeing could be the driver stalling to make sure the framebuffer being upscaled is fully resolved before the draw call?

A simplified view of the GL calls would be:

glInvalidateFramebuffer( intermediateFBO ); // DEPTH and Stencil
...
glBindTexture( intermediateTexture); // the texture attached to intermediateFBO
glBindFramebuffer( backbuffer );
glInvalidateFramebuffer( backbuffer ); // COLOR
...
glDrawElements(); // the fullscreen quad

  • Hi JPJ, 

    I don't have a good answer here unfortunately - there are a few things it could be, but TBH neither seems a perfect fit for something mid-frame. 

    • Is the content hitting 60 FPS?
    • Does the content have a lot of simple render passes?

    For content hitting vsync there is always going to be a stall somewhere. It's normally during either the first draw call or the final eglSwapBuffers call, but it's possible you hit something mid-frame (though I view this as the least likely of the options).

    The driver includes a rate limiter to manage the number of render passes in flight. The first draw will "commit" that to be a real render pass, so it's possible that is tripping the max outstanding throttle if there are a number of other simple post passes before it. 
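
To make the throttle idea concrete, here is a toy model in C. It is only a sketch: the function name `commit_stall_ms`, the limit `max_in_flight`, and all timings are hypothetical, and nothing here reflects the actual Mali driver implementation or its real limit. It assumes the GPU retires render passes in submission order, and that committing a new pass blocks the CPU until the number of unfinished passes drops below the limit.

```c
#include <stddef.h>

/*
 * Toy model of a "max outstanding render passes" throttle.
 * Hypothetical: this is NOT the real driver logic.
 *
 * completion_ms: GPU completion time of each pass already submitted,
 *                in ascending order (in-order retirement is assumed).
 * count:         number of passes already submitted.
 * max_in_flight: hypothetical limit on unfinished passes (0 = no limit).
 * now_ms:        CPU time at which the next pass is committed.
 *
 * Returns how long the CPU would block before the new pass is accepted.
 */
double commit_stall_ms(const double *completion_ms, size_t count,
                       size_t max_in_flight, double now_ms)
{
    /* Count passes the GPU has not finished yet at now_ms. */
    size_t in_flight = 0;
    for (size_t i = 0; i < count; ++i) {
        if (completion_ms[i] > now_ms) {
            ++in_flight;
        }
    }

    /* Below the limit (or no limit): the commit does not block. */
    if (max_in_flight == 0 || in_flight < max_in_flight) {
        return 0.0;
    }

    /* Unfinished passes are the last in_flight entries of the array.
     * Wait until enough of them retire to drop below the limit. */
    size_t first_unfinished = count - in_flight;
    size_t must_retire = in_flight - max_in_flight + 1;
    return completion_ms[first_unfinished + must_retire - 1] - now_ms;
}
```

In this model, with three passes finishing at 5, 10 and 12 ms and a limit of two, committing a fourth pass at 8 ms blocks the CPU for 2 ms; raising the limit to three removes the stall.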

    Hard to give a better answer, sorry, 

    Pete

  • Thanks for the reply Pete!

    The content is throttled to 30 FPS but the GPU frame time is hovering around 24ms (with some fairly long gaps in between frames). The CPU side seems to be struggling (at least from my analysis of Streamline and Unity's profiler). We have fewer than a handful of passes (shadow, 3D and upscale really). What I found strange was the "stall" happening before the presumable eglSwapBuffers one (which I understand to be represented in Unity by the PresentFrame call, and which seems to take ~5ms on median). 

    At first I assumed that the driver could be "flushing" data to the GPU, but the stall seems to happen every frame, followed by a perhaps more expected stall in PresentFrame.

    I attached some images below to better illustrate the point, but I'll approach Unity regarding this.

    Also, I should add that the UberPost you see in the image is only doing a lightweight vignette (and we also tried just a simple blit, which produced similar results).

  • My only thought here is just "beware of frequency scaling". If you are hitting your vsync or software-enforced performance target, whether you get idle time per frame or not depends on the workload vs the available frequency choices. 

    Back-of-envelope scribbles seem to show that the GPU is running at ~325MHz, which is likely the min frequency for this device. The CPU might well have more frequency choice available, meaning it "just fits" the workload demand.
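
One way to arrive at a figure like this, sketched in C: divide a GPU active cycle count by the GPU active time over the same interval. The helper names and the cycle count used in the example are hypothetical; only the ~24ms GPU frame time and the 30 FPS target come from this thread, and this is not necessarily the calculation used above.

```c
/*
 * Back-of-envelope GPU clock estimate (hypothetical helpers).
 *
 * active_cycles: GPU active cycle count for one frame, for example read
 *                from a profiler counter (assumed to be available).
 * active_ms:     GPU active time for the same frame, in milliseconds.
 *
 * Cycles divided by microseconds gives MHz directly.
 */
double estimate_gpu_mhz(double active_cycles, double active_ms)
{
    return active_cycles / (active_ms * 1000.0);
}

/*
 * Fraction of the frame budget the GPU spends busy at a given target
 * frame rate. The remainder shows up as idle time per frame.
 */
double gpu_busy_fraction(double active_ms, double target_fps)
{
    double budget_ms = 1000.0 / target_fps;
    return active_ms / budget_ms;
}
```

For instance, a made-up count of 7.8 million active cycles over 24 ms corresponds to 325 MHz, and 24 ms of GPU work in a 30 FPS frame (about 33.3 ms) means the GPU is busy for roughly 72% of the budget. If the clock cannot drop any lower, that remaining time appears as gaps between frames.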

  • Cheers Pete! I think I see what you mean. I had a look at the Streamline plot and it does seem that one of the A73 cores runs, at times, well below the max 1.6GHz spec'ed for the Exynos 7884.