
Why does Vulkan have much higher input primitives than OpenGL?

I created a very simple scene in UE4 (an empty level, so only the default material floor and a sky) and noticed that the Vulkan version has much higher input primitives than the OpenGL version (the data below covers a 10-second range):

Then I used RenderDoc to capture the scene on both Vulkan and OpenGL, and it turns out the primitives sent to the GPU are the same:

Streamline describes input primitives as "The total number of input primitives to the rendering process", so I assumed they should match the vertex count submitted to the GPU. If we submit the same number of vertices (with the same primitive type, of course), we should expect the same number of input primitives.

Can a Mali expert help take a look at this issue? I can provide the APK, Streamline files and RenderDoc capture files if needed.

Thanks!

  • What GPU are you running on, and with which driver version?

    Older OpenGL ES drivers had some optimizations to remove large spatial jumps in index buffers, but this has a high CPU cost and generally isn't possible on Vulkan due to its more literal approach to memory management. On newer OpenGL ES and Vulkan drivers I'd expect to see the same behaviour on both. On pre-Valhall hardware this will look like the Vulkan numbers in Streamline; on Valhall or newer hardware it should look like the OpenGL ES numbers in Streamline.

    In general we recommend using index buffers where every index between the min and max index is used, and avoiding indices that sparsely sample a large index range. 

    HTH, 
    Pete

  • If you can send the APK/Streamline/RenderDoc files to developer at arm dot com, please do (I believe you've sent us something before), and I'll find someone to look at it and see what we can work out. 

  • Hi Peter, I'm running it on a Mali-G78, with driver version:

    GLES: ARM, Mali-G78, OpenGL ES 3.2 v1.r34p0-01eac0.a1b116bd871d46ef040e8feef9ed691e

  • Hi Ben, I have sent you the APK. Thanks for your help!

  • I will find a Valhall device to double-check, thanks!

  • I just confirmed on a Redmi K50 Pro (Mali-G710), and it seems to show the same high input primitives.

    So Peter, you are right. Thanks!

  • Hmm, Mali-G78 is Valhall, so it should be new enough to have consistent driver behavior. I suspect some other issue is at play here. Is the result 100% reproducible for you?

  • Yes. It's 100% reproducible.

  • Hi iculi,

    One thing you can try here is, instead of comparing e.g. a 10-second range, to zoom in to a single frame and select and compare that directly. This is usually much easier because then you don't need to worry about what the frame rate is. 

    For this frame, based on RenderDoc, I see we have 2 main draws of 288 and 11904 indices. That's (11904 + 288) / 3 = 4064 primitives, and on my G77 device I see 4080 primitives total per frame (measured using the zoom-in-to-frame approach). That makes sense as there are of course a few post-processing full-screen-quad draws, as well as some UI, here, in addition to those main-pass draws.

    Looking at your Redmi results, you have 4.9M primitives. Is that still for a 10-second range? If so, it could make sense given 4080 primitives * 120 fps * 10 seconds = 4.896M primitives, which matches almost exactly.

    On my G77 device I see the same number of primitives per frame in both Vulkan and GLES, as one would expect. The only difference I spot is that more tiles/tasks are rendered in Vulkan, and consequently it runs a bit slower. In my experience this is usually because UI rendering in UE involves two render passes in Vulkan, due to a limitation in Unreal Engine / Slate. So overall this seems as expected.

    If you see a significant difference in input primitives per frame on your G78, I'd say that is generally unexpected. It's possible, however, that the device vendor has implemented some special optimization that causes this, which could explain it.

    Cheers,
    Christian

  • Hi Christian. The Redmi results are still for a 10-second range. I chose 10 seconds because I think the sum over a range is more stable than a single frame. I tried zooming in to one frame, and I can see that on Vulkan there are many spikes, which cause the high input primitives.

    These two Streamline captures were both taken on a G78 device running at the same 60 fps.
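Pete's recommendation earlier in the thread is to keep index buffers dense, so that every index between the min and max index is actually used. A minimal Python sketch of one way to check this offline is below. It assumes you can dump a draw's index buffer as a list of integers (for example from a RenderDoc capture); the helper name `index_range_utilization` is hypothetical and is not part of any Arm, RenderDoc or UE4 API. It only measures how sparse the referenced range is and does not model what any particular driver does with it.

```python
from typing import Sequence


def index_range_utilization(indices: Sequence[int]) -> float:
    """Hypothetical helper: fraction of the [min, max] index range actually referenced.

    1.0 means every index between min and max is used (a dense range);
    values far below 1.0 mean the draw sparsely samples a large index range.
    """
    if not indices:
        raise ValueError("index buffer is empty")
    lo, hi = min(indices), max(indices)
    span = hi - lo + 1          # number of vertices covered by the min..max range
    used = len(set(indices))    # distinct vertices the draw actually references
    return used / span


# Dense: two triangles sharing an edge, indices 0..3 all used.
dense = [0, 1, 2, 2, 1, 3]
# Sparse: two triangles whose vertices are scattered across 0..9999.
sparse = [0, 5000, 9999, 9999, 5000, 42]

print(index_range_utilization(dense))   # 1.0
print(index_range_utilization(sparse))  # 0.0004
```

A value close to 1.0 is what the recommendation asks for; a very low value flags a draw that may be worth re-indexing so its vertices are packed into a tight range.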
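Christian's per-frame arithmetic can be reproduced with a short Python sketch. It assumes triangle-list draws (every 3 indices form one primitive) and a steady frame rate across the capture range; the helper names are hypothetical. The index counts (288 and 11904), the 4080 primitives per frame and the 120 fps figure come from Christian's reply, and the 60 fps figure comes from the last reply. The 60 fps line only shows what the total would be if the per-frame count were unchanged at that frame rate; it is not a measured result.

```python
def triangles_from_indices(index_counts):
    """Triangle lists: every 3 indices form one input primitive."""
    total = sum(index_counts)
    if total % 3:
        raise ValueError("triangle-list index count must be a multiple of 3")
    return total // 3


def range_total(primitives_per_frame, fps, seconds):
    """Expected counter total over a capture range at a steady frame rate."""
    return primitives_per_frame * fps * seconds


def per_frame(total, fps, seconds):
    """Normalize a range total back to an average per-frame count."""
    return total / (fps * seconds)


print(triangles_from_indices([288, 11904]))  # 4064 (main-pass draws only)
print(range_total(4080, 120, 10))            # 4896000
print(per_frame(4_896_000, 120, 10))         # 4080.0
# Hypothetical: the same 4080 primitives per frame, but at 60 fps.
print(range_total(4080, 60, 10))             # 2448000
```

Because a range total scales with frame rate, two captures can only be compared by their totals if they ran at the same frame rate for the same duration; dividing by `fps * seconds` removes that dependency, which is why comparing single frames is often simpler.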