For the best part of a year, on and off, I've been trying without success to get some semblance of decent performance out of some of our Android test devices.
The Note 5 has been particularly reluctant to give up the goods. I've multithreaded our engine, such that one thread does nothing but translate pre-compiled render packets into GL calls. The other thread updates the game and generates the packets, and completes well within the desired 16ms deadline.
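For anyone following along, the split described above can be sketched roughly as below. This is a minimal illustration and not the actual engine code: `FramePackets`, `FrameHandoff` and `HandoffDemo` are hypothetical names, the packets are plain ints, and the queue depth of two is an assumption. Only `java.util.concurrent` is used; the spot where GL calls would be issued is marked with a comment.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical stand-in for one frame's worth of pre-compiled render packets. */
final class FramePackets {
    final int frameIndex;
    final int[] packets; // opaque command words in this sketch

    FramePackets(int frameIndex, int[] packets) {
        this.frameIndex = frameIndex;
        this.packets = packets;
    }
}

/** Hands finished frames from the update thread to the render thread. */
final class FrameHandoff {
    // Assumed depth: at most two finished frames may wait for the render thread.
    private final BlockingQueue<FramePackets> queue = new ArrayBlockingQueue<>(2);

    /** Called on the update thread; blocks if the render thread has fallen behind. */
    void submit(FramePackets frame) throws InterruptedException {
        queue.put(frame);
    }

    /** Called on the render thread; blocks until a frame is ready. */
    FramePackets take() throws InterruptedException {
        return queue.take();
    }
}

final class HandoffDemo {
    /** Runs both sides for the given number of frames; returns the packets consumed. */
    static int run(int frames) {
        FrameHandoff handoff = new FrameHandoff();
        Thread update = new Thread(() -> {
            try {
                for (int i = 0; i < frames; i++) {
                    // Stand-in for game update + packet generation.
                    handoff.submit(new FramePackets(i, new int[] {i, i, i}));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        update.start();

        int consumed = 0;
        try {
            for (int i = 0; i < frames; i++) {
                FramePackets frame = handoff.take();
                if (frame.frameIndex != i) {
                    throw new IllegalStateException("frames arrived out of order");
                }
                consumed += frame.packets.length; // a real renderer would issue GL calls here
            }
            update.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }
}
```

The bounded queue means the update thread stalls rather than racing arbitrarily far ahead when the render thread is slow, which matches the symptom here: the update side finishes early and then waits.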
On an iPod6 (single-threaded), framerate is nailed to 60, and the GPU time is measured at 8ms.
On the Note 5, the exact same sequence of GL calls exceeds 16ms, and fluctuates wildly. Attempting to profile it gives results like this:
Each green chunk is one frame from a static, unvarying scene. Notice how sometimes it can take two or three times as long to dispatch the exact same GL calls. Meanwhile, a hardware monitor tells me the GPU is barely ticking over, at base clock speed and 50% or lower load, and the CPU rarely clocks up anywhere near its maximum either.
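To put numbers on that kind of spread, a small helper along these lines could summarise per-frame dispatch times. It is only a sketch: `FrameTimeStats` is a hypothetical name, and it assumes the samples are nanosecond durations taken with `System.nanoTime()` around the render thread's per-frame work.

```java
import java.util.Arrays;

/** Summary of per-frame dispatch times, in milliseconds. */
final class FrameTimeStats {
    final double minMs;
    final double medianMs;
    final double maxMs;
    final int missed; // frames that took longer than the budget

    private FrameTimeStats(double minMs, double medianMs, double maxMs, int missed) {
        this.minMs = minMs;
        this.medianMs = medianMs;
        this.maxMs = maxMs;
        this.missed = missed;
    }

    /** Builds a summary from frame durations in nanoseconds and a budget in milliseconds. */
    static FrameTimeStats of(long[] frameNanos, double budgetMs) {
        if (frameNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = frameNanos.clone();
        Arrays.sort(sorted);

        int missed = 0;
        for (long ns : frameNanos) {
            if (ns / 1e6 > budgetMs) {
                missed++;
            }
        }
        // Upper median when the sample count is even.
        double median = sorted[sorted.length / 2] / 1e6;
        return new FrameTimeStats(sorted[0] / 1e6, median,
                sorted[sorted.length - 1] / 1e6, missed);
    }
}
```

A large gap between the median and the maximum on a static scene points at something outside the GL call sequence itself, since the work submitted is identical every frame.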
It's almost as if the phone isn't really trying, but I can't find any clue as to what's actually going on. Help!
I can't explain why the CPU frequency isn't ramping up under load - normally, if the device is busy, the frequency should increase unless the device has hit some physical limit such as a temperature threshold. That isn't under our control: in this case all of the CPU and GPU frequency control is provided by Samsung.
In terms of a high baseline CPU processing cost, how many draw calls per frame are you making? Draw calls can be expensive on Mali, especially on older devices running older drivers, so we generally recommend keeping total draw call count under 500 draws a frame. Without knowing your application in more detail it's hard to provide specific advice - there are many things which can cause high CPU load such as bulk data upload, or resource copies due to drivers creating ghosts to avoid pipeline drains. Resource ghosting in particular is sensitive to the relative pipeline latency of the frames being built on the CPU and frames completing on the GPU, so that could explain why some frames are slower than others (e.g. some frames trigger a ghost being built, some don't).
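As an illustration of the ghosting point above: one common way to sidestep it for dynamic geometry is to rotate through several buffers, so the CPU never writes into a buffer that a frame still in flight on the GPU may be reading. The sketch below shows only the rotation logic. `BufferRing` is a hypothetical name, the handles are assumed to be GL buffer names created up front (for example with `glGenBuffers`), and a ring depth of three is a guess at the pipeline latency rather than a measured value.

```java
/** Round-robin selector over a fixed set of pre-created buffer handles. */
final class BufferRing {
    private final int[] handles; // hypothetical GL buffer names, created elsewhere
    private int next = 0;

    BufferRing(int[] handles) {
        if (handles.length < 2) {
            throw new IllegalArgumentException("need at least two buffers");
        }
        this.handles = handles.clone();
    }

    /** Returns the buffer to fill this frame, then advances to the next slot. */
    int acquireForFrame() {
        int handle = handles[next];
        next = (next + 1) % handles.length;
        return handle;
    }
}
```

Each frame, the render thread would call `acquireForFrame()` once and upload that frame's sprite geometry into the returned buffer. Whether this helps depends on whether ghosting is actually the cause, which a capture should show.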
Is there any way you can share a Mali Graphics Debugger capture of a typical frame API sequence, or a reproducer for your application?
I'm not at that computer at the moment, but I'll see about getting you a capture on Friday (next time I'm in the office).
Draw calls in that scene number ~260, mostly simple static geometry in VBOs, the rest around 150KB of dynamic sprite geometry. Shaders are extremely simple; the most we do is apply a world-space projection lighting texture in addition to the base texture and vertex colours.
It's nowhere near what the device is actually capable of: I have test scenes in Unity with far more complex geometry and shaders that run easily at 60fps. By comparison our engine is doing very little - as I said, an iPod6 chews through it in 8ms flat.
I can't shake the feeling I'm trying to drive with the handbrake on. The app doesn't register as a 'game' when I look at Samsung's game launcher - could that have something to do with it? Failing that, there must be a huge overhead in talking to GL from Java (I'm pretty sure Unity's engine is compiled to native code).
My second thread, which has a LOT more to do (all the game update, all the scene walking and packet prep), completes in a fraction of the time. The render thread just seems to take forever by comparison.