I've been profiling a 3D scene on the Samsung Galaxy S7 and noticed that the CPU time spent in glDrawElements and glDrawArrays is much larger than on Adreno and PowerVR GPUs.
For some context, in an effort to improve performance on Mali devices, I moved all the OpenGL calls to a separate render thread. After that change, the render thread is now bottlenecking the entire application at a frame time of roughly 50-60 ms in a scene with 335 draw calls (measured after letting the device sit for 5 minutes so that it thermally throttles).
While I would normally put this down to being GPU-bound, I ran a DS-5 capture on the device and noticed that the GPU's vertex and fragment time was much lower than this (around 30 ms when the device throttles).
Is there any explanation for why the GL calls are taking so long while the GPU isn't at 100% utilization? It looks like every GL call is more expensive on Mali, for some reason.
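To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures quoted above (50-60 ms frame time, 335 draw calls, about 30 ms of GPU time). It attributes the whole render-thread frame time to draw calls, so the per-call figure is an upper bound rather than a measurement. The function name `per_call_cost_us` is made up for illustration.

```python
# Rough upper bound on render-thread CPU cost per draw call.
# Assumption: the whole frame time is attributed to the 335 draw calls.

def per_call_cost_us(frame_ms: float, draw_calls: int) -> float:
    """Average render-thread time per draw call, in microseconds."""
    return frame_ms * 1000.0 / draw_calls


DRAW_CALLS = 335      # draw calls in the profiled scene
GPU_BUSY_MS = 30.0    # approximate vertex + fragment time from the DS-5 capture

for frame_ms in (50.0, 60.0):
    cost = per_call_cost_us(frame_ms, DRAW_CALLS)
    slack = frame_ms - GPU_BUSY_MS  # frame time not explained by GPU work
    print(f"{frame_ms:.0f} ms frame: ~{cost:.0f} us per draw call, "
          f"~{slack:.0f} ms not accounted for by GPU time")
```

That works out to roughly 149-179 microseconds of CPU time per draw call, with 20-30 ms of each frame not covered by the GPU's vertex and fragment work.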
Here's an attached picture of our DS-5 capture, with the render thread isolated on the CPU Activity
In addition, the Unreal Engine mobile optimization guidelines recommend keeping scenes to <= 700 draw calls. While I'm not using the Unreal Engine, is this nevertheless a realistic target for this GPU?
Hi cedega
Can you provide a systrace captured while the device is throttled? I had a look at the traces you sent and noticed some serialization between the main thread and the render thread. Specifically, I see the render thread waiting for the main thread before it starts. Are the game and render threads sharing something that needs synchronization? When the device throttles, I would expect the main thread to take longer to complete as well, and the sync point would then make overall performance suffer.
If possible, it would be good to have a test APK to understand whether there is a specific reason why you are seeing this high CPU usage.
-DDD
The main thread serializes a CPU-side command buffer and hands it to the render thread, signaling via a semaphore. This process is double-buffered, so when we are GPU-bound the render thread runs one frame behind the main thread, and the two can execute in parallel.
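For readers unfamiliar with this pattern, here is a minimal sketch of a double-buffered handoff guarded by two counting semaphores. It is not the actual engine code: it is written in Python for brevity, the names (`produce_frames`, `render_frames`, `free_slots`, `ready`) are hypothetical, and appending to a list stands in for issuing GL calls.

```python
import threading

NUM_BUFFERS = 2   # double-buffered
FRAMES = 8        # arbitrary frame count for the demo

buffers = [None] * NUM_BUFFERS
free_slots = threading.Semaphore(NUM_BUFFERS)  # buffers the main thread may fill
ready = threading.Semaphore(0)                 # filled buffers awaiting rendering
rendered = []                                  # (frame, first command) in draw order


def produce_frames():
    """Main thread: serialize one command buffer per frame."""
    for frame in range(FRAMES):
        free_slots.acquire()  # blocks while both buffers are still in use
        buffers[frame % NUM_BUFFERS] = [f"draw {frame}.{i}" for i in range(3)]
        ready.release()       # wake the render thread


def render_frames():
    """Render thread: consume command buffers in frame order."""
    for frame in range(FRAMES):
        ready.acquire()       # sleeps until a command buffer is available
        commands = buffers[frame % NUM_BUFFERS]
        rendered.append((frame, commands[0]))  # stand-in for issuing GL calls
        free_slots.release()  # hand the buffer back to the main thread


threads = [threading.Thread(target=produce_frames),
           threading.Thread(target=render_frames)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(rendered)
```

With two buffers, the main thread can fill the buffer for the next frame while the render thread is still consuming the current one, and it only blocks once both buffers are in use. Time spent in either `acquire()` call is sleep time, not CPU work.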
I don't think the time spent sleeping on the semaphore is shown in the systrace, which may make it look like there is resource contention.
The render thread can also be occasionally interrupted via a blocking request from the main thread, but these events shouldn't be frequent.
I've attached the throttled systrace for the S7. I can send an APK as well, but it would have to be done privately.
S7_throttled.zip
Hi cedega,
The systrace you sent looks more like what I would have expected. Since the render thread is the bottleneck, it no longer waits on the main thread before it starts executing again, as it did in your previous systrace. I see the render thread takes 10 ms longer when throttled, as you mentioned, and the whole execution is around 40 ms.
If you can send the APK to me, I will have a look at it.