
Overhead of the Driver

We have found that when our product issues a lot of draw calls, the CPU-side overhead of the OpenGL ES API calls becomes high. We assume this is driver overhead, but we don't know what makes a draw call expensive, so we can only infer it through repeated testing. We have reached some conclusions from these tests, but we are not confident that they are correct. So I would like to ask an expert:

1. What is the main source of driver overhead?

2. If the overhead can be broken down by type, what are the rough proportions? For example: textures, FBOs, vertices, UBOs, uniforms, shaders, TBOs... If sorted, what is the order?

3. Also, do these overheads depend on specific parameters, such as size, count, etc.?

  • Hi Shawn, 

    Draw calls are the most expensive path in the driver (except for bulk data upload and shader compile and link), as they commit state to the rendering pipeline. For most mobile devices, aim for around 500 draw calls per frame; you can go higher on a high-end device, at the expense of CPU load and CPU power consumption. Use offline static batching, instancing, etc., to get the draw call count down as much as you can.

    Draw calls get more expensive when you change state (texture bindings, attribute bindings, enable settings, etc.), because more of the descriptors need to be rebuilt before the draw can be processed, so aim to minimize state changes between draw calls. Creating texture atlases, etc., is often necessary to minimize state changes.

    Cheers, 
    Pete
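
The two techniques above, minimizing state changes and cutting the draw count, are often combined by sorting draws by their state before submission. The C sketch below illustrates that idea under stated assumptions: `DrawItem`, `sort_draws`, `count_state_changes`, and `count_batches` are hypothetical names, the integer IDs stand in for GL object names, and the sort order (program, then texture) assumes the common rule of thumb that switching programs costs more than switching textures. It makes no OpenGL ES calls; it only shows how ordering reduces the number of binds, and how many runs of identical state remain as candidates for batching or instancing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-draw record; the IDs stand in for GL object names. */
typedef struct {
    uint32_t program; /* shader program */
    uint32_t texture; /* texture bound to unit 0 */
    uint32_t mesh;    /* vertex/index buffers */
} DrawItem;

/* Orders by program, then texture, then mesh so draws that share state
 * become adjacent (assumes program switches cost more than texture switches). */
static int compare_draws(const void *pa, const void *pb)
{
    const DrawItem *a = pa;
    const DrawItem *b = pb;
    if (a->program != b->program) return a->program < b->program ? -1 : 1;
    if (a->texture != b->texture) return a->texture < b->texture ? -1 : 1;
    if (a->mesh != b->mesh) return a->mesh < b->mesh ? -1 : 1;
    return 0;
}

void sort_draws(DrawItem *draws, size_t count)
{
    qsort(draws, count, sizeof *draws, compare_draws);
}

/* Number of program/texture binds a renderer that skips redundant binds
 * would issue when submitting the draws in this order. */
size_t count_state_changes(const DrawItem *draws, size_t count)
{
    size_t changes = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || draws[i].program != draws[i - 1].program) ++changes;
        if (i == 0 || draws[i].texture != draws[i - 1].texture) ++changes;
    }
    return changes;
}

/* Number of runs of consecutive draws with identical program and texture.
 * Each run is a candidate for a single draw call via static batching
 * (merged geometry) or instancing. */
size_t count_batches(const DrawItem *draws, size_t count)
{
    size_t batches = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || draws[i].program != draws[i - 1].program ||
            draws[i].texture != draws[i - 1].texture)
            ++batches;
    }
    return batches;
}
```

For example, five draws with (program, texture) pairs (1,10), (2,20), (1,10), (2,20), (1,11) cost 10 binds in submission order; after `sort_draws` the same draws need 5 binds and form 3 runs. Real renderers usually also account for transparency and depth ordering, which this sketch ignores.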

  • PS. I recommend reviewing our best practices guide: developer.arm.com/.../

  • Thank you very much for your reply.

    We have already done a lot of work to reduce the draw call count, which is now very low.

    We are now more concerned with the driver overhead of a single draw call. For example, some draw calls take about 2 ms of driver time, and we want to know why. Can you provide any analysis of driver overhead at the level of a single draw call?

    For example:

    For the same draw call, does the driver overhead differ between a 1024 texture and a 512 texture?

    Some data can be stored either in a texture or in a uniform. Which has lower driver overhead?

    Some values can be passed to the GPU for computation as vertex attributes, or through a texture or uniform. What is the driver overhead of each option?

    Our measurements suggest that these choices affect the driver overhead of a single draw call, but we are not sure our conclusions are correct. Can you give us any information about this?

    Alternatively, can you tell me the main factors that affect the driver overhead of each draw call? For example: format checking accounts for 20%, data transfer accounts for 30%, larger data means higher overhead, etc.

  • Bulk data size shouldn't matter; that data will already be in memory when the draw happens.

    The first draw in each frame and/or the first draw to the window surface might be slower, as it's tied to frame sync (especially if you are hitting vsync, since that might be where the frame wait is inserted).
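
To check whether the time goes into the first draw of a frame (where the frame wait may be inserted) or into later draws, one option besides a profiler is to time each draw call on the CPU. The C sketch below is one way to do that, assuming a POSIX system with `clock_gettime`; `now_us`, `DrawTimingSummary`, and `summarize_frame` are hypothetical helpers, not part of any Arm or OpenGL ES API. The timings measure only how long each API call occupies the calling thread, not GPU execution time.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime and CLOCK_MONOTONIC */
#endif
#include <stddef.h>
#include <time.h>

/* Monotonic time in microseconds. Wrap this around each glDraw* call to
 * measure how long the API call takes on the calling thread. */
double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Hypothetical per-frame summary of measured draw-call CPU times. */
typedef struct {
    double first_us;     /* cost of the first draw in the frame */
    double rest_mean_us; /* mean cost of the remaining draws */
    double rest_max_us;  /* slowest of the remaining draws */
} DrawTimingSummary;

/* Separates the first draw (which may absorb frame synchronization) from
 * the rest, so the two can be compared. */
DrawTimingSummary summarize_frame(const double *draw_us, size_t count)
{
    DrawTimingSummary s = {0.0, 0.0, 0.0};
    if (count == 0) return s;
    s.first_us = draw_us[0];
    double sum = 0.0;
    for (size_t i = 1; i < count; ++i) {
        sum += draw_us[i];
        if (draw_us[i] > s.rest_max_us) s.rest_max_us = draw_us[i];
    }
    if (count > 1) s.rest_mean_us = sum / (double)(count - 1);
    return s;
}
```

In a render loop you might wrap each call as `double t0 = now_us(); glDrawElements(GL_TRIANGLES, n, GL_UNSIGNED_SHORT, 0); times[i] = now_us() - t0;` and call `summarize_frame` once per frame. A large `first_us` with small later values is consistent with the frame-sync explanation above; a large `rest_max_us` points to a specific later draw worth inspecting, for example for state changes issued just before it.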

  • But we found that, beyond the first draw call, several other draw calls in a frame are also relatively expensive: there are multiple draw API calls that take a lot of time. What is the reason for this?

  • Hard to say without any data. Possibly the application thread is just getting descheduled and something else is running? Have you tried a profiler, such as our Streamline profiler, to see what's going on?

  • We have checked, and the overhead is entirely in the OpenGL ES API calls.

    Or, can I understand it this way:

    In the Mali GPU driver, apart from the first draw call in a frame, the overhead of the other draw calls is very low and can be ignored, right? And there is no blocking logic in the API call layer, right?

  • In the general case, no, draws shouldn't block. They can block in some corner cases (high vertex counts, high draw counts), but real content tends not to get close to these limits.

    Pete