Processing order of vertices and fragments on Arm GPUs

When an Arm GPU processes vertices and fragments, in what order are they distributed and processed? Are vertices processed in index order? What about fragments? Can we control the index order on the CPU side to improve GPU efficiency?
For example, does the optimization in the link below improve performance on Arm GPUs? If so, which GPU features does it rely on? github.com/.../meshoptimizer
  • Hi Shawn, 

    Older Mali GPUs (Utgard, Midgard) render vertices in incrementing index order, between min and max referenced index, ignoring the actual order in the index buffer. 

    Newer Mali GPUs (Bifrost) render vertices in the order in the index buffer (in contiguous blocks of 4), and can skip ranges that are not referenced (except those that are in the same block of 4 as a referenced vertex).  

    Using mesh conditioning tools such as meshoptimizer is a very good idea; the algorithms applied will work well for both types of Mali GPU, either to minimize bandwidth or to reduce shading cost.

    HTH, 
    Pete
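
To make the two behaviours described above concrete, here is a small Python sketch that computes which vertices would be shaded under each model. It is only an illustration of the description in the reply, not a model of the actual hardware: the function names are hypothetical, and the assumption that the "blocks of 4" are aligned groups of four consecutive vertex indices is mine and is not stated in the reply. The sketch returns the set of shaded vertices only; it does not model the order in which they are processed.

```python
def shaded_by_range(indices):
    """Older-GPU model: every vertex between the min and max referenced index."""
    if not indices:
        return []
    return list(range(min(indices), max(indices) + 1))


def shaded_by_blocks(indices, block_size=4):
    """Newer-GPU model: every vertex that shares a block with a referenced one.

    Assumption: blocks are aligned groups of `block_size` consecutive indices.
    The result is sorted ascending and says nothing about execution order.
    """
    blocks = sorted({i // block_size for i in indices})
    return [b * block_size + k for b in blocks for k in range(block_size)]


# Two triangles' worth of vertices with a large unused gap between them.
indices = [0, 1, 2, 40, 41, 42]
print(len(shaded_by_range(indices)))  # 43
print(shaded_by_blocks(indices))      # [0, 1, 2, 3, 40, 41, 42, 43]
```

Under these assumptions the gap costs 37 extra vertex shader invocations in the range model, but only two in the block model.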

  • Older Mali GPUs (Utgard, Midgard) render vertices in incrementing index order, between min and max referenced index, ignoring the actual order in the index buffer. 

    Thank you very much for your reply, but it raised a new question. For example, suppose the vertex buffer contains vertices 0, 1, 2, 3, 4, 5, 6, and 7. In actual drawing, the index buffer only uses three vertices 0, 5, and 7. But the GPU side will also execute the calculation of the five vertices 1, 2, 3, 4, and 6?

  • In actual drawing, the index buffer only uses three vertices 0, 5, and 7. But the GPU side will also execute the calculation of the five vertices 1, 2, 3, 4, and 6?

    Yes. Don't leave unused indices in the active index range; aim to reference every vertex between min and max.

    For implementing mesh level-of-detail, duplicate vertices and give each LOD its own compacted index range; don't make low-resolution meshes by sparsely sampling from the LOD 0 mesh.
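
The compaction advice above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not meshoptimizer's API: `compact_mesh` is a hypothetical helper, and the vertex data is a list of placeholder strings. It copies only the referenced vertices into a new buffer, in order of first use, and rewrites the indices so that every index between min and max is referenced.

```python
def compact_mesh(vertices, indices):
    """Return (new_vertices, new_indices) that reference only used vertices.

    Vertices are emitted in order of first use in the index buffer, so the
    new index range is dense: every index in [0, len(new_vertices)) is used.
    """
    remap = {}          # old vertex index -> new vertex index
    new_vertices = []
    new_indices = []
    for old in indices:
        if old not in remap:
            remap[old] = len(new_vertices)
            new_vertices.append(vertices[old])
        new_indices.append(remap[old])
    return new_vertices, new_indices


# The example from the question: 8 vertices, only 0, 5 and 7 referenced.
lod0_vertices = ["v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7"]
sparse_indices = [0, 5, 7]

lod_vertices, lod_indices = compact_mesh(lod0_vertices, sparse_indices)
print(lod_vertices)  # ['v0', 'v5', 'v7']
print(lod_indices)   # [0, 1, 2]
```

With the compacted buffer the active index range is 0 to 2, so a GPU that shades everything between min and max would process three vertices instead of eight. The cost is the duplicated vertex data for each LOD, which is the trade-off the reply recommends.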
