Mali T400 rendering speed limit with Wayland

Hi,

I am trying to find the practical limit on the number of triangles per frame that the Mali T400 can render while sustaining 60 FPS on a 1024x600 display, using Wayland, on a ZynqMP+.

With the program and hardware setup described below, I could reach around 32 000 triangles per frame before performance drops below 60 FPS. This is lower than I expected, given the "0.11 Mtriangles/sec/MHz" reported in the ZynqMP+ datasheet (page 2). What steps could I take to render more triangles per frame?

To render as many triangles as possible, I reused the sample program "weston-simple-egl" from the Weston (Wayland compositor) project. I changed the rendering to draw a fullscreen window (1024x600) with a GL_TRIANGLE_STRIP spanning around 95% of the screen. I tested the program at 32 bits per pixel (bpp) and 16 bpp, but saw no significant gain. The Mali GPU on the system is clocked at 600 MHz. The vertex and fragment shaders are simple pass-through shaders that forward the vertices and fragments unchanged.

The bottleneck seems to be the `eglSwapBuffers` call. It takes more and more time as the number of triangles rises. With 32 000 triangles, it can take up to 18 ms (!), which explains the FPS drop, since the frame budget at 60 FPS is only about 16.7 ms. Unfortunately, `eglSwapBuffers` is implemented in the closed-source library libmali, so I couldn't dig deeper. I assume the `eglSwapBuffers` call returns when an IRQ from the GPU signals that the queued jobs are done.

So, in summary, am I actually hitting a hardware limit at 32 000 triangles per frame under Wayland, or is there something I could do to improve performance?

  • Hi gchamp, 

    Calling eglSwapBuffers() will block until the next window buffer is available, which "on average" is related to rendering performance, although there are some queuing effects here. Specifically, the window system won't release an old buffer until it has a new one to replace it, and we need the new buffer to start queuing commands for it.

    The performance does sound low for a simple triangle grid test app; we'd expect ~10 cycles per vertex, not 300. Is the other performance you are seeing (e.g. fragment shading a simple quad with a blit texture) consistent with a 600 MHz GPU?

    Cheers, 
    Pete
