For the best part of a year, on and off, I've been trying without success to get some semblance of decent performance out of some of our Android test devices.
The Note 5 has been particularly reluctant to give up the goods. I've multithreaded our engine, such that one thread does nothing but translate pre-compiled render packets into GL calls. The other thread updates the game and generates the packets, and completes well within the desired 16ms deadline.
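To make the arrangement concrete, here is a minimal sketch of that kind of two-thread handoff. It is not our actual engine code: `RenderPacket`, `PacketSink` and `FramePipeline` are hypothetical names, and the sink stands in for the GL backend so the sketch runs off-device.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical packet type: one pre-compiled render command.
final class RenderPacket {
    final int opcode; // illustrative only, e.g. "bind texture" or "draw VBO"
    final int arg;
    RenderPacket(int opcode, int arg) { this.opcode = opcode; this.arg = arg; }
}

// Stand-in for the GL backend (assumption: the real one issues GL calls here).
interface PacketSink {
    void execute(RenderPacket p);
}

final class FramePipeline {
    // At most two finished frames may be waiting for the render thread.
    private final BlockingQueue<List<RenderPacket>> frames = new ArrayBlockingQueue<>(2);

    // Game thread: hand over a finished frame; blocks while the queue is full.
    void submit(List<RenderPacket> frame) throws InterruptedException {
        frames.put(frame);
    }

    // Render thread: take the next frame, replay it, return dispatch time in ns.
    long renderNext(PacketSink sink) throws InterruptedException {
        List<RenderPacket> frame = frames.take();
        long start = System.nanoTime();
        for (RenderPacket p : frame) {
            sink.execute(p);
        }
        return System.nanoTime() - start;
    }
}
```

The value returned by `renderNext` is CPU-side dispatch time only; it says nothing about how long the GPU takes to execute the work.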
On an iPod6 (single-threaded), framerate is nailed to 60, and the GPU time is measured at 8ms.
On the Note 5, the exact same sequence of GL calls exceeds 16ms, and fluctuates wildly. Attempting to profile it gives results like this:
Each green chunk is one frame from a static, unvarying scene. Notice how sometimes it can take two or three times as long to dispatch the exact same GL calls. Meanwhile, a hardware monitor tells me the GPU is barely ticking over, at base clock speed and 50% or lower load, and the CPU rarely gets anywhere near maximum either.
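To put numbers on that fluctuation rather than eyeballing a trace, something like the following could summarise a run of per-frame dispatch times. It is a rough sketch: `FrameStats` is a hypothetical helper, and it assumes the samples are nanosecond durations taken with `System.nanoTime()` around each frame's GL dispatch.

```java
import java.util.Arrays;

// Hypothetical helper: summarises per-frame dispatch times against a budget.
final class FrameStats {
    final double minMs;
    final double medianMs;
    final double maxMs;
    final int overBudget; // number of frames that exceeded budgetMs

    FrameStats(long[] frameNanos, double budgetMs) {
        if (frameNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = frameNanos.clone();
        Arrays.sort(sorted);
        minMs = sorted[0] / 1e6;
        maxMs = sorted[sorted.length - 1] / 1e6;
        medianMs = sorted[sorted.length / 2] / 1e6; // upper median for even counts
        int over = 0;
        for (long ns : frameNanos) {
            if (ns / 1e6 > budgetMs) {
                over++;
            }
        }
        overBudget = over;
    }
}
```

With a 16ms budget, a large gap between `medianMs` and `maxMs` on a static scene would show the same spikes as the profile, in a form that is easy to compare between devices.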
It's almost as if the phone isn't really trying, but I can't find any clue as to what's actually going on. Help!
I'm not at that computer at the moment, but I'll see about getting you a capture on Friday (next time I'm in the office).
Draw calls in that scene number ~260, mostly simple static geometry in VBOs, the rest around 150KB of dynamic sprite geometry. Shaders are extremely simple; the most we do is apply a world-space projection lighting texture in addition to the base texture and vertex colours.
It's nowhere near what the device is actually capable of: I have test scenes in Unity with far more complex geometry and shaders that run easily at 60fps. By comparison our engine is doing very, very little. As I said, an iPod6 chews through it in 8ms flat.
I can't shake the feeling I'm trying to drive with the handbrake on. The app doesn't register as a 'game' when I look at Samsung's game launcher - could that have something to do with it? Failing that, there must be a huge overhead in talking to GL from Java (I'm pretty sure Unity's engine is compiled to native code).
My second thread, which has a LOT more to do (all the game update, all the scene walking and packet prep) completes in a fraction of the time. The render thread just seems to take forever by comparison.