
Draw call performance on the Mali-T880 MP12

I've been profiling a 3D scene on the Samsung Galaxy S7, and I've noticed that the CPU time spent in glDrawElements and glDrawArrays is much higher than on devices with Adreno and PowerVR GPUs.

For some context, to improve performance on Mali devices, I moved all the OpenGL calls to a separate render thread. After that change, the render thread is now the bottleneck for the entire application, at a ~50-60 ms frame time in a scene with 335 draw calls (measured after letting the device sit for 5 minutes so that it thermally throttles).

While I would normally attribute this to being GPU-bound, I ran a DS-5 capture on the device and noticed that the GPU's vertex and fragment time was much lower than this (around 30 ms once the device throttles).

Is there any explanation for why the GL calls take so long while the GPU isn't at 100% utilization? For some reason, every GL call seems to be more expensive on Mali.
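As a rough sanity check on the numbers above, the following minimal C sketch computes an upper bound on the average CPU time per draw call and the part of the frame not covered by the reported GPU time. It assumes, hypothetically, that the render thread spends its whole frame issuing GL calls; the function names are illustrative only.

```c
#include <assert.h>

/* Upper bound on average CPU time per draw call, in microseconds.
   Hypothetical model: assumes the render thread spends the whole frame
   issuing GL calls, with no other work (culling, game logic) on it. */
static double cpu_us_per_draw(double frame_ms, int draw_calls)
{
    assert(draw_calls > 0);
    return frame_ms * 1000.0 / (double)draw_calls;
}

/* Portion of the frame not covered by the reported GPU time, in ms.
   A large value suggests the GPU is waiting on the CPU. */
static double cpu_only_ms(double frame_ms, double gpu_ms)
{
    return frame_ms > gpu_ms ? frame_ms - gpu_ms : 0.0;
}
```

With the figures from this post, cpu_us_per_draw(50.0, 335) is about 149 µs and cpu_us_per_draw(60.0, 335) about 179 µs, while cpu_only_ms(50.0, 30.0) leaves 20 ms of each frame not covered by the reported GPU time.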

Here's an attached screenshot of our DS-5 capture, with the render thread isolated in CPU Activity.

In addition, Unreal Engine's mobile optimization guidelines recommend keeping scenes at or below 700 draw calls. Although I'm not using Unreal Engine, is this nevertheless a realistic target for this GPU?

  • Hi,

    I agree that the application is currently CPU-bound. The GPU execution (vertex and fragment) is clearly serialized, because the CPU isn't submitting work fast enough to keep the GPU busy.

    We usually suggest 500 draw calls or fewer, depending on how many vertices you are drawing. This also depends on the device's CPU configuration. What frequency is the core that the render thread runs on? It isn't visible in the screenshot you attached.

    Each draw call has a fixed cost that is independent of the number of vertices drawn. This means that issuing draw calls with many vertices lets you amortize this fixed cost more effectively.
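    To make the amortization argument concrete, here is a minimal C sketch of a hypothetical cost model in which every draw call pays the same fixed CPU overhead regardless of vertex count. The parameter fixed_us is a placeholder to be measured on the target device, not a published Mali figure.

```c
#include <assert.h>

/* Hypothetical model: each draw call costs a fixed amount of CPU time
   (driver-side work) regardless of how many vertices it draws.
   fixed_us is a placeholder, to be measured on the target device. */
static double submit_cost_us(int draw_calls, double fixed_us)
{
    assert(draw_calls >= 0 && fixed_us >= 0.0);
    return draw_calls * fixed_us;
}

/* The same fixed cost amortized over the vertices of one draw call:
   larger draw calls spread the overhead over more vertices. */
static double fixed_us_per_vertex(double fixed_us, long vertices_per_draw)
{
    assert(vertices_per_draw > 0);
    return fixed_us / (double)vertices_per_draw;
}
```

    For example, with a placeholder fixed_us of 100 µs, 335 draw calls cost 33.5 ms of CPU time per frame under this model, while the same geometry merged into 50 draw calls costs 5 ms.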

    If you can, we suggest that you:
    - Batch as many draw calls as possible (draw objects that share the same GL state with a single draw call).
    - If you are targeting OpenGL ES 3.0, use instancing to draw groups of the same object.
    - If it's a VR app, use the Multiview extension to almost halve the CPU draw call cost.
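    The first suggestion (batching by GL state) can be sketched in C as follows. This is a minimal illustration of static batching under stated assumptions: Mesh, count_batches and merge_indices are hypothetical names, a mesh's state is reduced to a program and a texture name, and the caller is assumed to concatenate the meshes' vertex data in the same order into one shared vertex buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mesh record: 16-bit indices into its own vertices,
   plus the GL state it needs (program and texture names). */
typedef struct {
    unsigned program;
    unsigned texture;
    const uint16_t *indices;
    size_t index_count;
    size_t vertex_count;
} Mesh;

static int same_state(const Mesh *a, const Mesh *b)
{
    return a->program == b->program && a->texture == b->texture;
}

static int cmp_state(const void *pa, const void *pb)
{
    const Mesh *a = pa, *b = pb;
    if (a->program != b->program) return a->program < b->program ? -1 : 1;
    if (a->texture != b->texture) return a->texture < b->texture ? -1 : 1;
    return 0;
}

/* Sorts meshes by state and returns how many draw calls remain if each
   run of meshes with identical state is merged into a single call. */
static size_t count_batches(Mesh *meshes, size_t n)
{
    size_t batches = 0;
    qsort(meshes, n, sizeof *meshes, cmp_state);
    for (size_t i = 0; i < n; ++i)
        if (i == 0 || !same_state(&meshes[i - 1], &meshes[i]))
            ++batches;
    return batches;
}

/* Appends the indices of meshes[first..last) to out, offsetting each
   mesh's indices by the vertices merged before it, so they address one
   combined vertex buffer. Returns the number of indices written. */
static size_t merge_indices(const Mesh *meshes, size_t first, size_t last,
                            uint16_t *out)
{
    size_t written = 0, base = 0;
    for (size_t m = first; m < last; ++m) {
        for (size_t i = 0; i < meshes[m].index_count; ++i) {
            size_t idx = base + meshes[m].indices[i];
            assert(idx <= (size_t)UINT16_MAX); /* GL_UNSIGNED_SHORT range */
            out[written++] = (uint16_t)idx;
        }
        base += meshes[m].vertex_count;
    }
    return written;
}
```

    Each merged run can then be drawn with one glDrawElements call. Large batches may exceed the 16-bit index range; GL_UNSIGNED_INT indices are core in OpenGL ES 3.0. For the instancing suggestion, ES 3.0 provides glDrawElementsInstanced together with glVertexAttribDivisor for per-instance attributes such as transforms.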

    I understand that all the draw calls are issued from one thread, but is anything else running on the same thread (culling algorithms, game logic, etc.)?
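    One way to answer that question is to time each phase of the frame on the render thread. Below is a minimal C sketch assuming a POSIX clock_gettime with CLOCK_MONOTONIC (available on Android); the Phase values and the PhaseTimer helpers are hypothetical names for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <time.h>

/* Hypothetical phases of one frame on the render thread. */
typedef enum { PHASE_CULLING, PHASE_GAME_LOGIC, PHASE_GL_SUBMIT, PHASE_COUNT } Phase;

typedef struct {
    double start_ms;
    double total_ms[PHASE_COUNT];
} PhaseTimer;

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

static void phase_begin(PhaseTimer *t)
{
    t->start_ms = now_ms();
}

/* Adds the time since the matching phase_begin to phase p's total. */
static void phase_end(PhaseTimer *t, Phase p)
{
    assert(p < PHASE_COUNT);
    t->total_ms[p] += now_ms() - t->start_ms;
}
```

    Wrapping each block of per-frame work in phase_begin/phase_end and logging total_ms periodically would show how much of the frame goes to GL submission versus other work. Because GL calls can return before the GPU has executed them, PHASE_GL_SUBMIT reflects CPU submission cost only, and CLOCK_MONOTONIC also counts time the thread spends descheduled (CLOCK_THREAD_CPUTIME_ID measures thread CPU time instead).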

    Regards,

    DDD
