Hello!
We have a performance issue with landscape rendering in our game application. Rendering works as follows: the vertices are stored in 16 static VBOs; every frame we generate indices and upload them to a single element buffer object (in case it matters, it is created with the 'dynamic' hint), then issue several glDrawElements calls. The problem lies in using that element buffer object: with it, performance is much lower than when we pass indices to glDrawElements from 'non-GPU' memory (so-called 'user provided' indices). We used DS-5 Streamline on a test application with a single landscape in the scene to find the problem. The capture shows that the bottleneck is vertex processing. When the buffer is used, the vertex processor processes about ten times more vertices but fetches the same count. It looks like post-T&L cache misses. This greatly decreases performance in the real game with complicated scenes. Could you help us understand what is happening and how we can resolve the problem (other than by using 'user provided' indices)?
Captures were made with DS-5 v5.21.1 on a Samsung Galaxy Note 2 with a Mali-400 GPU; the driver version is r3p2. The test applications and captures are attached ("IB" is the case with the element buffer object, "UP" is the 'user provided' case).
Thanks in advance for your help!
Mali-400 has relatively light vertex processing capability versus its fragment processing; it can have multiple fragment processing cores (there are four in the Galaxy S2), but only ever has a single vertex processing core. Things to check:
This final point is a critical one to get right; Mali will shade all vertices between the min and max index value in the index buffer. For example, if your index buffer says [0, 1, 49] then you will end up shading 50 vertices, and only actually use three. Removing holes avoids redundant vertex computation.
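To see how much redundant work a given index list implies, one could compare the min-to-max range with the number of distinct indices actually used. The helper below is only a sketch of that check: the name index_range_stats, the IndexRangeStats struct, and the choice of 16-bit indices are assumptions made for illustration, and the "range" figure simply applies the min/max rule described above rather than measuring anything on the GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type: summary of one index list. */
typedef struct {
    unsigned min_index;    /* smallest index referenced */
    unsigned max_index;    /* largest index referenced */
    size_t   range_count;  /* max - min + 1: vertices shaded under the min/max rule */
    size_t   used_count;   /* distinct indices actually referenced */
} IndexRangeStats;

/* Scans `count` 16-bit indices and reports the range and the distinct count. */
IndexRangeStats index_range_stats(const unsigned short *indices, size_t count)
{
    IndexRangeStats s;
    unsigned char *seen;
    size_t i;

    assert(indices != NULL && count > 0);

    /* First pass: find the smallest and largest index. */
    s.min_index = s.max_index = indices[0];
    for (i = 1; i < count; ++i) {
        if (indices[i] < s.min_index) s.min_index = indices[i];
        if (indices[i] > s.max_index) s.max_index = indices[i];
    }
    s.range_count = (size_t)(s.max_index - s.min_index) + 1;

    /* Second pass: count distinct indices with a small "seen" table. */
    seen = calloc(s.range_count, 1);
    assert(seen != NULL);
    s.used_count = 0;
    for (i = 0; i < count; ++i) {
        size_t slot = (size_t)(indices[i] - s.min_index);
        if (!seen[slot]) {
            seen[slot] = 1;
            ++s.used_count;
        }
    }
    free(seen);
    return s;
}
```

For the index list [0, 1, 49] from the example above, this reports a range_count of 50 and a used_count of 3; a large gap between the two suggests holes worth removing.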
One common issue related to this with landscapes is a dynamic level of detail scheme in which only a single "full detail" master mesh is used, and the lower levels of detail simply index sparsely into that master mesh. Because Mali will shade all of the index values between min and max, you generally end up with redundant vertex processing. To avoid this, create specific VBO regions for the vertex data of each level of detail, and switch to the unique index ranges for each. E.g. it should look something like this:
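The repacking step for one level of detail can be sketched in C as below. This is an illustration under assumptions, not code from the Mali documentation: the function name compact_lod_indices, the 16-bit index type, and the idea of returning a "compact slot to master vertex" table are all hypothetical. The table tells you which master-mesh vertices to copy, in order, into that level's own VBO region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Rewrites `indices` in place so that they reference a dense range
 * starting at 0, and fills `compact_to_master` with the master-mesh
 * vertex index that each compact slot should be copied from.
 *
 * `compact_to_master` must have room for `count` entries.
 * Returns the number of compact vertices, or 0 if allocation fails.
 */
size_t compact_lod_indices(unsigned short *indices, size_t count,
                           size_t master_vertex_count,
                           unsigned short *compact_to_master)
{
    /* remap[m] holds (compact index + 1); 0 means "not seen yet". */
    unsigned *remap = calloc(master_vertex_count, sizeof *remap);
    size_t used = 0;
    size_t i;

    if (remap == NULL)
        return 0;

    for (i = 0; i < count; ++i) {
        unsigned short m = indices[i];
        assert(m < master_vertex_count);
        if (remap[m] == 0) {
            /* First use of this master vertex: give it the next dense slot. */
            compact_to_master[used] = m;
            remap[m] = (unsigned)(++used);
        }
        indices[i] = (unsigned short)(remap[m] - 1);
    }

    free(remap);
    return used;
}
```

With sparse indices {0, 8, 16, 8, 24, 16} into a 25-vertex master mesh, the indices become {0, 1, 2, 1, 3, 2} and only four vertices (master vertices 0, 8, 16 and 24) need to be stored for that level, so the min-to-max range contains no holes.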
General optimizations to consider:
HTH,
Pete
Thank you for the answer, Peter.
Sorry, but you probably misunderstood my question. When we use 'user provided' indices, performance is satisfactory. When we draw the same geometry with identical indices but through an IBO, performance decreases. We are trying to understand why using an IBO causes the effect described.
Best Regards,
Igor
Hi Igor,
Quite right - I misunderstood your question!
When it is going slowly are you updating the IBO in software every frame or multiple times per frame? The usual reason why this kind of usage is slow is because of the need to create a copy of the IBO when it is still referenced by a pending draw. More information on this type of problem can be found here:
Mali Performance 6: Efficiently Updating Dynamic Resources
The blog mostly talks about textures, but exactly the same issues apply to all buffer-based resources.
Cheers, Pete
Hello Peter,
We have 3 IBOs and reuse them cyclically, one per frame (IBO0 on frame 0, IBO1 on frame 1, IBO2 on frame 2, IBO0 on frame 3, and so on), to be sure that the current IBO is not in use. We update one IBO per frame with a single glBufferData call.