I've been profiling a 3D scene on the Samsung Galaxy S7 and noticed that the CPU time spent in glDrawElements and glDrawArrays is much higher than on Adreno and PowerVR GPUs.
For context, to improve performance on Mali devices, I moved all OpenGL calls to a separate render thread. Since that change, the render thread has become the bottleneck for the entire application, at a frame time of ~50-60 ms in a scene with 335 draw calls (measured after letting the device sit for 5 minutes so that it thermally throttles).
I would normally attribute this to being GPU-bound, but a DS-5 capture on the device shows that the GPU's vertex and fragment time is much lower than this (~30 ms when the device is throttled).
Is there an explanation for why the GL calls take so long while the GPU isn't at 100% utilization? It looks as though every GL call is more expensive on Mali for some reason.
Attached is a screenshot of our DS-5 capture, with the render thread isolated in the CPU Activity chart.
In addition, Unreal Engine's mobile optimization guidelines recommend keeping scenes to 700 draw calls or fewer. Although I'm not using Unreal Engine, is this nevertheless a realistic target for this GPU?
Hey Daniele and Peter,
The render thread is scheduled on a big core, which starts at 2.26 GHz and throttles down to 1.25 GHz within 5 minutes. The device's 4 little cores stay at a constant 1.59 GHz.
The render thread only executes OpenGL commands; all scene and game-related work is done on the main thread (which is usually waiting on the render thread). The render thread executes the OpenGL commands for the previous frame, so it runs in parallel with the main thread.
Performance is good until the device is thermally throttled. However, it looks like the bulk of the CPU work is done by the render thread, which I suspect is causing the aggressive thermal throttling.
I've attached Android systraces of my application running on the S7 and on a Redmi Note 4 (Mali-T880 MP4), both captured before thermal throttling kicked in. HH_Render and HH_Main are the render thread and main thread, respectively.
systrace.zip
Hi cedega,
Can you provide a systrace captured while the device is throttled? I had a look at the traces you sent and noticed some serialization between the main thread and the render thread. Specifically, the render thread waits for the main thread to start. Are the game and render threads sharing something that needs synchronization? When the device throttles, I would expect the main thread to take longer to complete as well, and that sync point to hurt overall performance.
If possible, it would be good to have a test APK so we can understand whether there is a specific reason you are seeing such high CPU usage.
-DDD
The main thread serializes a CPU-side command buffer and hands it to the render thread via a semaphore. This hand-off is double-buffered, so when we are GPU-bound the render thread renders one frame behind the main thread (and can therefore execute in parallel with it).
I don't think the time spent sleeping on the semaphore shows up in the systrace, which may make it look like there is resource contention.
The render thread can also occasionally be interrupted by a blocking request from the main thread, but these events should be infrequent.
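For readers unfamiliar with this pattern, here is a minimal Python sketch of a double-buffered, semaphore-based hand-off like the one described, assuming two buffer slots and one counting semaphore per direction. `FramePipeline`, `DrawCmd`, `submit`, `execute_next`, and `run_demo` are hypothetical names, and the replay loop only counts commands where a real engine would issue GL calls; this is not the actual engine code.

```python
import threading
from dataclasses import dataclass


@dataclass
class DrawCmd:
    # Hypothetical serialized command; a real engine would encode GL state and draw parameters.
    mesh_id: int
    index_count: int


class FramePipeline:
    """Double-buffered hand-off of recorded command buffers from the main thread to the render thread."""

    SLOTS = 2  # double buffering

    def __init__(self):
        self._buffers = [None] * self.SLOTS
        self._free = threading.Semaphore(self.SLOTS)  # slots the main thread may fill
        self._ready = threading.Semaphore(0)          # filled slots waiting for the render thread
        self._write_idx = 0  # touched only by the main thread
        self._read_idx = 0   # touched only by the render thread

    def submit(self, cmds):
        """Main thread: blocks only while both slots are still owned by the render thread."""
        self._free.acquire()
        self._buffers[self._write_idx] = cmds
        self._write_idx = (self._write_idx + 1) % self.SLOTS
        self._ready.release()  # wake the render thread

    def execute_next(self):
        """Render thread: sleeps on the semaphore until a frame is ready, then replays it."""
        self._ready.acquire()
        cmds = self._buffers[self._read_idx]
        self._buffers[self._read_idx] = None
        self._read_idx = (self._read_idx + 1) % self.SLOTS
        draws = 0
        for cmd in cmds:
            # Placeholder for the real GL call, e.g. glDrawElements(...)
            draws += 1
        self._free.release()  # slot can be refilled by the main thread
        return draws


def run_demo(frames=100, draws_per_frame=335):
    pipe = FramePipeline()
    executed = []

    def render_loop():
        for _ in range(frames):
            executed.append(pipe.execute_next())

    render = threading.Thread(target=render_loop)
    render.start()
    for _ in range(frames):
        # Main thread records the next frame while the render thread replays an earlier one.
        pipe.submit([DrawCmd(mesh_id=i, index_count=36) for i in range(draws_per_frame)])
    render.join()
    return executed


if __name__ == "__main__":
    print(sum(run_demo()))  # 33500
```

Note that time spent blocked in `acquire()` is idle time, which matches the point that semaphore sleeps may not be obvious in a systrace.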
I've attached the throttled systrace for the S7. I can send an APK as well, but it would have to be done privately.
S7_throttled.zip
Hi cedega,
This systrace looks more like what I would have expected. Since the render thread is the bottleneck, it no longer waits for the main thread before it starts executing again, as it did in your previous systrace. As you mentioned, the render thread takes about 10 ms longer when throttled, and the whole execution takes around 40 ms.
If you can send me the APK, I will try to take a look.