I've been profiling a 3D scene on the Samsung Galaxy S7 and I've noticed that the CPU time spent in glDrawElements and glDrawArrays is much higher than on Adreno and PowerVR GPUs.
For some context, in an effort to improve performance on Mali devices, I moved all the OpenGL calls to a separate render thread. After that change, the render thread is now bottlenecking the entire application at a frame time of ~50-60 ms in a scene with 335 draw calls (measured after letting the device sit for 5 minutes so that it thermal throttles).
While I would normally put this down to being GPU-bound, I ran a DS-5 capture on the device and noticed that the GPU's vertex and fragment time was a lot lower than this (around 30 ms when the device throttles).
Is there any explanation for why the GL calls are taking so long while the GPU isn't at 100% utilization? It looks like every GL call is more expensive on Mali, for some reason.
Here's an attached picture of our DS-5 capture, with the render thread isolated in CPU Activity:
In addition, the Unreal Engine mobile optimization guidelines recommend keeping scenes to <= 700 draw calls. While I'm not using the Unreal Engine, is this nevertheless a realistic target for this GPU?
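To put rough numbers on this, here is a back-of-the-envelope sketch using only the figures quoted above. It assumes the whole render-thread frame time is spent issuing draw calls and that CPU cost scales linearly with draw-call count. Neither is measured, so the per-call figure is an upper bound rather than a profile result. The function names are just for illustration.

```python
# Back-of-the-envelope estimate from the figures quoted in this post.
# Assumption: the entire render-thread frame time is attributed to draw calls,
# so the per-call figure is an upper bound, not a measurement.

DRAW_CALLS = 335               # draw calls in the profiled scene
FRAME_TIME_MS = (50.0, 60.0)   # throttled render-thread frame time range
TARGET_DRAW_CALLS = 700        # Unreal Engine mobile guideline


def per_call_us(frame_ms, draw_calls):
    """Average CPU time per draw call, in microseconds."""
    return frame_ms * 1000.0 / draw_calls


def projected_frame_ms(frame_ms, draw_calls, target_calls):
    """Frame time if CPU cost scaled linearly with the draw-call count."""
    return frame_ms / draw_calls * target_calls


for frame_ms in FRAME_TIME_MS:
    cost = per_call_us(frame_ms, DRAW_CALLS)
    projected = projected_frame_ms(frame_ms, DRAW_CALLS, TARGET_DRAW_CALLS)
    print(f"{frame_ms:.0f} ms frame: ~{cost:.0f} us per draw call, "
          f"~{projected:.0f} ms at {TARGET_DRAW_CALLS} draw calls")
```

Under those assumptions this works out to roughly 149-179 us per draw call, and about 104-125 ms per frame at 700 draw calls, which is why I'm asking whether that target is realistic here.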
Hi cedega,
The systrace you sent looks more like what I would have expected. Since the Render thread is the bottleneck, it no longer waits on the Main thread before it starts executing again, as was happening in your previous systrace. As you mentioned, the Render thread takes 10 ms longer when throttled, and the whole execution is around 40 ms.
If you can send me the APK, I will try to have a look at it.