This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Draw call performance on the Mali-T880 MP12

I've been profiling a 3D scene on the Samsung Galaxy S7 and I've noticed that glDrawElements and glDrawArrays CPU time is a lot larger compared to Adreno and PowerVR GPUs.

For some context, in an effort to improve performance on Mali devices, I moved all the OpenGL calls to a separate render thread. After that change, the render thread now is bottle-necking the entire application at a ~50-60ms frame time in a scene with 335 draw calls (after letting the device sit for 5 minutes to thermal throttle).

While I would normally excuse this as being GPU-bound, I ran a DS-5 capture on the device and noticed that the GPU's vertex and fragment time was taking a lot less than this (around ~30ms when the device throttles).

Is there any explanation for why the GL calls are taking so long while the GPU isn't 100%? It looks like every GL call is more expensive on Mali, for some reason.

Here's an attached picture of our DS-5 capture, with the render thread isolated on the CPU Activity

In addition, the Unreal Engine (in the mobile optimization guidelines) recommends scenes to be <= 700 draw calls. While I'm not using the Unreal Engine, is this nevertheless a realistic target for this GPU?

Parents
  • The main thread serializes a CPU-side command buffer and sends it to the render thread via semaphore. This process is double-buffered, so the render thread will be rendering 1 frame behind in the case of being GPU-bound (and hence able to execute in parallel).

    I don't think the sleeping from the semaphore is shown in the systrace, which may make it look like there is resource contention.

    The render thread can also be occasionally interrupted via a blocking request from the main thread, but these events shouldn't be frequent.

    I've attached the throttled systrace for the S7. I can send an APK as well, but it would have to be done privately.

    S7_throttled.zip

Reply
  • The main thread serializes a CPU-side command buffer and sends it to the render thread via semaphore. This process is double-buffered, so the render thread will be rendering 1 frame behind in the case of being GPU-bound (and hence able to execute in parallel).

    I don't think the sleeping from the semaphore is shown in the systrace, which may make it look like there is resource contention.

    The render thread can also be occasionally interrupted via a blocking request from the main thread, but these events shouldn't be frequent.

    I've attached the throttled systrace for the S7. I can send an APK as well, but it would have to be done privately.

    S7_throttled.zip

Children
  • Hi ,

    The systrace you sent looks more as I would have expected. Since the Render thread is the bottleneck it doesn't wait on the Main thread to start executing again as it was happening in your previous systrace. I see the render thread takes 10ms more when throttled as you mentioned and the whole execution is around 40ms.

    If you can send the apk to me I will try to have a look at it.