Performance issue with using EBO

Hello!

We have a performance issue with landscape rendering in our game. Rendering works as follows: the vertices are stored in 16 static VBOs; every frame we generate indices and upload them to a single element buffer object (in case it matters, it is created with the 'dynamic' usage hint), then issue several glDrawElements calls. The problem is with that element buffer object: when we use it, performance is much lower than when we pass the indices to glDrawElements from 'non-GPU' memory (so-called 'user provided'). We used DS-5 Streamline on a test application with a single landscape in the scene to find the problem. The capture shows that the bottleneck is in vertex processing. With the buffer object, the vertex processor processes about ten times more vertices but fetches the same number. It looks like post-T&L cache misses. This greatly decreases performance in the real game, which has complicated scenes. Could you help us understand what is happening and how we can resolve the problem (other than by using 'user provided' indices)?

The captures were made with DS-5 v5.21.1 on a Samsung Galaxy Note 2 with a Mali-400 GPU, driver version r3p2. The test applications and captures are attached ("IB" is the element buffer object case, "UP" is the 'user provided' case).

Thanks in advance for your help!

9421.zip
  • Mali-400 has relatively light vertex processing capability compared with its fragment processing; it can have multiple fragment processing cores (there are four in the Galaxy S2), but only ever has a single vertex processing core. Things to check:

    • How many vertices per frame are you trying to render?
    • How many primitives per frame are you trying to render?
    • How good are your index buffers? Are you getting good reuse of vertices across a connected mesh, or are you processing duplicate vertices at the same location?
    • How good is your attribute buffer packing? Does it contain unused attributes which cause cache pollution unnecessarily?
    • Are your index buffers contiguous, and without holes?

    This final point is a critical one to get right; Mali will shade all vertices between the min and max index value in the index buffer. For example, if your index buffer says [0, 1, 49] then you will end up shading 50 vertices, and only actually use three. Removing holes avoids redundant vertex computation.
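
To see how much this costs for a given draw call, the index data can be checked on the CPU before it is uploaded. The following is a minimal sketch in C, assuming 16-bit indices and the min-to-max shading behaviour described above; `IndexRangeStats` and `index_range_stats` are hypothetical names used for illustration and are not part of any GL or Mali API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type for illustration, not a GL or driver structure. */
typedef struct {
    uint16_t min_index;   /* smallest index value in the draw call */
    uint16_t max_index;   /* largest index value in the draw call */
    size_t   span;        /* vertices in [min_index, max_index] */
    size_t   referenced;  /* distinct vertices the indices actually use */
} IndexRangeStats;

/* Scan a 16-bit index list and report the min..max span next to the
 * number of distinct vertices referenced. Not thread-safe because of
 * the static scratch table. */
IndexRangeStats index_range_stats(const uint16_t *indices, size_t count)
{
    static unsigned char seen[65536];  /* one flag per possible 16-bit index */
    IndexRangeStats stats;

    assert(indices != NULL && count > 0);
    memset(seen, 0, sizeof seen);

    stats.min_index = indices[0];
    stats.max_index = indices[0];
    stats.referenced = 0;

    for (size_t i = 0; i < count; ++i) {
        uint16_t idx = indices[i];
        if (idx < stats.min_index) stats.min_index = idx;
        if (idx > stats.max_index) stats.max_index = idx;
        if (!seen[idx]) {
            seen[idx] = 1;
            ++stats.referenced;
        }
    }

    stats.span = (size_t)stats.max_index - stats.min_index + 1;
    return stats;
}
```

For the index list [0, 1, 49] this reports a span of 50 and 3 referenced vertices. A span much larger than the referenced count suggests the draw call has holes worth removing.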

    One common issue related to this with landscapes is a dynamic level of detail scheme in which only a single "full detail" master mesh is used, and lower levels of detail simply index sparsely into that master mesh. Because Mali will shade all of the index values between min and max, you generally end up with redundant vertex processing. To avoid this, create specific VBO regions for the vertex data of each level of detail, and switch to the unique index ranges for each. E.g. it should look something like this:

    mali-lod.png
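
As a rough illustration of that layout, the sketch below computes where each level of detail would start if every level gets its own contiguous vertex region in one VBO. It assumes a square grid whose side is 2^n + 1 vertices and that each level halves the cell count per side; `LodRegion`, `layout_lod_regions` and `MAX_LODS` are hypothetical names, and the actual terrain format in the attached test application may differ.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LODS 8  /* arbitrary limit chosen for this sketch */

/* Hypothetical description of one level of detail's private VBO region. */
typedef struct {
    size_t first_vertex;  /* offset of the region in the VBO, in vertices */
    size_t side;          /* vertices along one side of the grid */
    size_t vertex_count;  /* side * side */
} LodRegion;

/* Place one contiguous vertex region per level of detail, finest first.
 * full_side is the vertex count per side of the full-detail grid and
 * must be (2^n + 1) with n >= lod_count - 1.
 * Returns the total number of vertices the VBO needs to hold. */
size_t layout_lod_regions(size_t full_side, size_t lod_count, LodRegion *out)
{
    size_t next = 0;

    assert(out != NULL && lod_count > 0 && lod_count <= MAX_LODS);

    for (size_t lod = 0; lod < lod_count; ++lod) {
        size_t cells = (full_side - 1) >> lod;  /* halve the cells per level */

        /* The grid must divide evenly at every level. */
        assert(cells > 0 && (cells << lod) == full_side - 1);

        out[lod].side = cells + 1;
        out[lod].vertex_count = out[lod].side * out[lod].side;
        out[lod].first_vertex = next;
        next += out[lod].vertex_count;
    }
    return next;
}
```

For a 9 x 9 master grid with three levels, the regions start at vertices 0, 81 and 106 and the VBO holds 115 vertices in total. Indices for the coarsest level then cover only its own 9 vertices, whereas indexing sparsely into the master mesh would span all 81.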

    General optimizations to consider:

    • Reducing mesh complexity, using dynamic LOD schemes to select lower detail meshes for more distant models.
    • Reducing vertex shader complexity, so the vertex processor has to do less work per vertex.

    HTH,

    Pete
