After a couple of false starts (turns out the Mali Debugger gets awfully confused if you tidy up after yourself and delete shader programs after linking) I now have some quantitative data in my quest to achieve decent performance in the game I'm working on. Unfortunately, it's just made me more confused than ever.
The scene I'm tackling has 200 draw calls, an average pixel overdraw of 1.3, and none of the shaders involved have a cycle cost over 5 (and that's used very sparingly; the rest cost no more than 3).
Based on the GPU specs, and a target framebuffer of 1920x1080 with no MSAA, the theoretical cycle budget is 45 cycles per pixel (cpp). At worst I'm asking it for 8cpp.
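As a rough illustration of how a per-pixel budget like this can be estimated, the sketch below divides the shader cycles available per second by the pixels shaded per second. The core count (8) and clock (700 MHz) are assumptions picked for illustration because they land close to 45; the post does not say which GPU figures were used. The function names are hypothetical.

```python
def cycle_budget_cpp(shader_cores, clock_hz, width, height, target_fps):
    """Theoretical shader cycles available per framebuffer pixel per frame."""
    cycles_per_second = shader_cores * clock_hz
    pixels_per_second = width * height * target_fps
    return cycles_per_second / pixels_per_second


def worst_case_demand_cpp(overdraw, max_shader_cycles):
    """Upper bound on cycles per pixel if every fragment ran the costliest shader."""
    return overdraw * max_shader_cycles


# Assumed GPU figures (not from the post): 8 shader cores at 700 MHz.
budget = cycle_budget_cpp(8, 700e6, 1920, 1080, 60)
# Overdraw of 1.3 and a maximum shader cost of 5 cycles are from the post.
demand = worst_case_demand_cpp(1.3, 5)

print(f"budget: {budget:.1f} cpp, worst-case demand: {demand:.1f} cpp")
```

Under those assumptions the budget comes out at about 45 cpp and the worst-case demand at 6.5 cpp, which sits inside the 8cpp ceiling quoted above.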
On an iPod6, the same scene runs and renders in under 11ms. On an iPhone6s, it's under 8ms. Yet on a Note 5, it's taking 25-30ms. Even if I strip out half the scene, it still doesn't hit 60fps.
I'm not doing anything that would cause a stall - I'm not trying to modify memory committed to the GPU, I'm not trying to read back the contents of a texture. My engine generates 'render packets' (scene rendering command chunks) a frame behind, and dispatches them all right at the start of the next frame to make the most of the available GPU time.
I'm rendering to a number of off-screen textures (some for animated 'TV screen' purposes, some for character shadows), but they're all small (128x128 or 256x128), and they're all organised so as to be written to once per frame before the main render starts. And yes, I'm disabling scissoring and doing a glClear as the first command each time to avoid a logical load (that was slowing the iOS version down too). I'd like to discard the z-buffer too, for the few that use one, but unfortunately I can't seem to get glDiscardFramebufferEXT to work. In any case, that doesn't cripple the iOS version. Even if I fully disable all off-screen rendering, the remainder of the scene does not render inside 16ms.
I've used Game Tuner and forced it to prevent the GPU clocking down - still nothing. I'm pretty much out of ideas. Anyone else got anything?
EDIT: Done more profiling/fiddling and got some additional numbers to narrow the problem down:
Game logic and render packet generation (that's the bit that walks my scene graphs, concatenates matrices, organises and if necessary sorts the things that need drawing, and creates a streamlined list of draw call packets for next frame) are consuming a total of 15% of the frame time. This means that even if I threaded off the GL rendering (very difficult as I'm working at arm's length via a third party cross-platform language that generates the Java app itself), I wouldn't hit 60fps or even get close to it.
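The reasoning above can be checked with a little arithmetic. This sketch assumes the 15% share applies to the 25-30ms frame times measured on the Note 5, and that moving the GL work to another thread would take exactly that CPU share off the critical path; both are simplifications. The function name is hypothetical.

```python
VSYNC_BUDGET_MS = 1000.0 / 60.0  # about 16.7 ms per frame at 60 fps


def remaining_after_cpu(frame_ms, cpu_fraction):
    """Frame time left once game logic and packet generation are removed."""
    return frame_ms * (1.0 - cpu_fraction)


# 25-30 ms frame times and the 15% CPU share are the figures from the post.
for frame_ms in (25.0, 30.0):
    rest = remaining_after_cpu(frame_ms, 0.15)
    verdict = "fits" if rest <= VSYNC_BUDGET_MS else "misses"
    print(f"{frame_ms:.0f} ms frame: {rest:.2f} ms remains, {verdict} the 60 fps budget")
```

Both cases leave more than 21ms of rendering work, so the 16.7ms target is missed either way.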
As a test I stripped the scene down until it just borderline hits 60fps (it still flickers into 2 frames sometimes). Here's what I ended up with:
I removed all render-to-texture calls.
I removed all the skinned characters from the scene.
All that's left are 50 low-poly static objects, all in VBOs, with super cheap shaders and blending disabled, and a large opaque quad for the floor. Most of the objects aren't even on-screen in the view I'm sampling, and there's almost no overdraw due to the layout of the scene.
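The "flickers into 2 frames" behaviour mentioned above is what you would expect when a frame only just misses a refresh. The sketch below assumes a 60Hz display with plain double-buffered vsync, where a finished frame waits for the next refresh; the thread does not confirm the buffering mode, and triple buffering would behave differently. The function name is hypothetical.

```python
import math

REFRESH_HZ = 60  # assumed display refresh rate


def displayed_fps(frame_ms, refresh_hz=REFRESH_HZ):
    """Effective frame rate when each frame is held until the next refresh."""
    interval_ms = 1000.0 / refresh_hz
    # A frame occupies a whole number of refresh intervals, at least one.
    intervals = max(1, math.ceil(frame_ms / interval_ms))
    return refresh_hz / intervals


for frame_ms in (16.0, 17.0, 30.0, 34.0):
    print(f"{frame_ms:.0f} ms of work -> {displayed_fps(frame_ms):.0f} fps on screen")
```

Under that model a frame that takes 17ms is shown at 30fps, the same as one that takes 30ms, so a scene hovering around the 16.7ms mark will alternate between one and two refresh intervals.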
I ran the same configuration - same GL calls, same resources, same everything - on an iPod6. Total run+render time: 3.9ms. iPhone6s? 2.8ms. What the hell is going on?
Hi Peeling,
Looking at your message, it looks like you have found a bug in the overdraw capture. The reason that Fragment Count is disabled for you is that it checked whether you had deleted your shaders, knew that it wouldn't be able to get the data back again, and so disabled itself. Overdraw should have done the same thing. I agree that more can be done in this area and I will look at improving it in the future.
Getting profiling information from the device is difficult. Every time a new version of Android is released, the amount of control we have over profiling data is reduced. On certain devices before Android 7 you will be able to get CPU-based data, but from Android 7.0 onwards this is impossible because a process cannot see data from a different process. We are actively working on this issue, and early next year we hope to be able to profile at least your own application with Streamline, at the cost of no longer seeing information about other applications.
For GPU counters, you should be able to get data on select devices such as the Samsung Galaxy S7, but this depends on whether the device manufacturer gives the user read access to the counters. Some manufacturers do and some don't. I appreciate that there are some situations where you won't be able to root your phone, and we are working on making as many features as possible available under these circumstances.
Hope this helps,
Stephen
As an aside, would it be possible to temporarily not delete your shaders so that you can get access to overdraw and fragment count information?
Hello again :) I already did disable the deletion of shaders and was able to get overdraw measurements working correctly. However, the fragment count button remains greyed out even with that change.
I also rooted my phone in the hope of getting Streamline working, but unfortunately after rooting the MGD won't connect to the phone any more (I posted a separate thread about the issue), so I'm investigating elsewhere for now. One thing I've discovered is that glDrawElements seems to be blocking for an insane amount of time when asked to render straightforward geometry stored in VBOs. No idea why yet!
Interesting regarding the Fragment Count. Did you start MGD from the beginning of the application, or did you connect halfway through?
Not sure, and can't check any more because since rooting the phone, MGD won't connect.
Hi again - after a few more false starts I managed to grab fragment data from the MGD.
(This is with all the content re-enabled, and thus running at 20-30 fps)
Total vertices: 401,000 (almost all of them in static VBOs)
Total cycles for all vertex shaders: 4,046,949
Total fragments: 7,868,561
Total cycles for all fragment shaders: 18,411,192
I'm able to compare this to a couple of Unity games I've written that comfortably run at 60fps. Vertex count and cycle count are definitely higher here, but fragment cycles are very similar. And I can remove over 50% of the vertices from the scene and STILL not hit 60fps.
(I actually tried just that for comparison: even with 240,000 fewer vertices it won't get anywhere near 60fps. Fragment usage was effectively unchanged, since the removed geometry exposed other, simpler geometry that uses the same fragment shader.)
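To put the MGD totals above in per-item terms, here is a small sketch that turns them into averages. It assumes the totals cover a single frame, which the post does not state. Dividing by the 1920x1080 screen area also lumps in fragments drawn to the off-screen targets, so the per-pixel figures are only indicative. The `ratio` helper is hypothetical.

```python
SCREEN_PIXELS = 1920 * 1080  # target framebuffer from earlier in the thread

# Counter totals as reported from MGD in the post.
vertices = 401_000
vertex_cycles = 4_046_949
fragments = 7_868_561
fragment_cycles = 18_411_192


def ratio(total, count):
    """Average of `total` per item in `count`."""
    return total / count


print(f"cycles per vertex:          {ratio(vertex_cycles, vertices):.2f}")
print(f"cycles per fragment:        {ratio(fragment_cycles, fragments):.2f}")
print(f"fragments per screen pixel: {ratio(fragments, SCREEN_PIXELS):.2f}")
print(f"fragment cycles per pixel:  {ratio(fragment_cycles, SCREEN_PIXELS):.2f}")
```

If the single-frame assumption holds, the fragment cost works out to a little under 9 cycles per screen pixel, still a small fraction of the 45cpp budget quoted earlier in the thread. That would point away from raw shader arithmetic as the bottleneck.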
Hi Peeling,
I will point this thread at the developer relations team to see if they have any suggestions that can help you improve your performance.
Cheers,
Stephen
Thank you very much :) This is a real head-scratcher.
If you are comfortable sharing the APK (it can even be a simpler version that shows the problem), we can have a look at it to see if anything is triggering the issue.
If the APK is too large to attach, you can upload it somewhere and send me a link in a private message.
Regards,
DDD
Yep, I've got the OK to do that. Will get it to you ASAP. Thanks!