Mali-400 rendering speed limit with Wayland


I am trying to find the practical limit of triangles per frame that the Mali-400 can render while keeping up at 60 FPS on a 1024x600 display under Wayland on a ZynqMP+.

With the program and hardware setup described below, I could reach around 32 000 triangles per frame before performance dips below 60 FPS. This number is much lower than I expected given the "0.11 Mtriangles/sec/MHz" reported in the ZynqMP+ datasheet (page 2). What steps could I take to render more triangles per frame?

To render as many triangles as possible, I reused the sample program "weston-simple-egl" from the Weston (Wayland compositor) project. I changed the rendering to draw a fullscreen window (1024x600) with a GL_TRIANGLE_STRIP spanning around 95% of the screen. I tested the program with 32 bits per pixel (bpp) and 16 bpp, but saw no significant gain. The Mali GPU on the system is clocked at 600 MHz. The vertex and fragment shaders simply pass the vertices and fragments through unchanged.

The bottleneck seems to be the `eglSwapBuffers` call. It takes more and more time as the number of triangles rises. With 32 000 triangles, it can take up to 18 ms (!), which explains the FPS drop. Unfortunately, eglSwapBuffers is implemented by the closed-source library libmali, so I couldn't dig deeper. I assume the `eglSwapBuffers` call returns when an IRQ comes back from the GPU indicating that the queued jobs are done.

So, in summary, am I effectively hitting a hardware limit at 32 000 triangles per frame under Wayland, or is there something I could do to improve performance?

  • Hi,

    > kernel settings in petalinux

    I'm not using petalinux, so I have little insight into what to change there. The defconfig used to compile the kernel must have `CONFIG_DRM_LIMA` and `CONFIG_DRM_XLNX` enabled.
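    For reference, the relevant part of my kernel config (lima as a module here, since lima.ko gets loaded; `=y` should work too):

    ```
    CONFIG_DRM=y
    CONFIG_DRM_LIMA=m
    CONFIG_DRM_XLNX=y
    ```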

    > device tree binding of gpu

    I changed the interrupt-names in zynqmp.dtsi so they match what lima_device.c is looking for:

    diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
    index b0b306ed796d..97e776231428 100644
    --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
    +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
    @@ -462,7 +462,7 @@
                            reg = <0x0 0xfd4b0000 0x0 0x10000>;
                            interrupt-parent = <&gic>;
                            interrupts = <0 132 4>, <0 132 4>, <0 132 4>, <0 132 4>, <0 132 4>, <0 132 4>;
    -                       interrupt-names = "IRQGP", "IRQGPMMU", "IRQPP0", "IRQPPMMU0", "IRQPP1", "IRQPPMMU1";
    +                       interrupt-names = "gp", "gpmmu", "pp0", "ppmmu0", "pp1", "ppmmu1";
                            clock-names = "gpu", "gpu_pp0", "gpu_pp1";
                            power-domains = <&zynqmp_firmware PD_GPU>;

    lima_device.c also looks for the clock-names `bus` and `core`, so I changed the driver code to use the clocks `gpu`, `gpu_pp0` and `gpu_pp1` instead. I couldn't really find any docs on those clocks, so I can only attest that, empirically, it works.

    Maybe you have already done that, since otherwise there are errors in `dmesg` when lima.ko is loaded.

    > what to do with the mesa libraries. Maybe i am just missing some kind of links.

    Yes, mesa requires a small patch so it knows it can use Xilinx's DRM driver.

    I'm really not an expert in the Linux graphics ecosystem, but from what I could gather, lima is a `render only` driver and Xilinx's DRM driver is `display only` (I think the Xilinx DRM driver was never merged upstream, so make sure to use the latest one from their fork), and as you said there's a bit of glue code involved to link them.

    This patch is valid for mesa 19.1.6:

     src/gallium/drivers/kmsro/ | 1 +
     src/gallium/targets/dri/  | 1 +
     src/gallium/targets/dri/target.c     | 1 +
     3 files changed, 3 insertions(+)
    diff --git a/src/gallium/drivers/kmsro/ b/src/gallium/drivers/kmsro/
    index 7c39f97..dbcb389 100644
    --- a/src/gallium/drivers/kmsro/
    +++ b/src/gallium/drivers/kmsro/
    @@ -50,5 +50,6 @@ GALLIUM_TARGET_DRIVERS += repaper
     GALLIUM_TARGET_DRIVERS += sun4i-drm
     $(eval GALLIUM_LIBS += $(LOCAL_MODULE) libmesa_winsys_kmsro)
    diff --git a/src/gallium/targets/dri/ b/src/gallium/targets/dri/
    index 8da21b3..ab57908 100644
    --- a/src/gallium/targets/dri/
    +++ b/src/gallium/targets/dri/
    @@ -85,6 +85,7 @@ foreach d : [[with_gallium_kmsro, [
    +               '',
                  [with_gallium_radeonsi, ''],
                  [with_gallium_nouveau, ''],
    diff --git a/src/gallium/targets/dri/target.c b/src/gallium/targets/dri/target.c
    index f71f690..e8f4340 100644
    --- a/src/gallium/targets/dri/target.c
    +++ b/src/gallium/targets/dri/target.c
    @@ -110,6 +110,7 @@ DEFINE_LOADER_DRM_ENTRYPOINT(st7586)
     #if defined(GALLIUM_LIMA)

    Finally, you can use `kmscube` to test the setup before debugging in weston directly. You should get an output like this when everything works correctly (plus a 3D cube on your display):

    # kmscube 
    Using display 0x55bab3fbd0 with EGL version 1.4
    EGL information:
      version: "1.4"
      vendor: "Mesa Project"
      client extensions: "EGL_EXT_client_extensions EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_KHR_debug EGL_EXT_platform_device EGL_EXT_platform_wayland EGL_KHR_platform_wayland EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless"
    OpenGL ES 2.x information:
      version: "OpenGL ES 2.0 Mesa 20.1.0"
      shading language version: "OpenGL ES GLSL ES 1.0.16"
      vendor: "lima"
      renderer: "Mali400"
