Mali T400 rendering speed limit with Wayland

Hi,

I am trying to find the practical limit on the number of triangles per frame that the Mali T400 can render while sustaining 60 FPS on a 1024x600 display with a Wayland integration on a ZynqMP+.

With the program and hardware setup described below, I could reach around 32 000 triangles per frame before performance dipped below 60 FPS. This number is lower than I expected considering the "0.11 Mtriangles/sec/MHz" reported in the ZynqMP+ datasheet (page 2). What steps could I take to render more triangles per frame?

To render as many triangles as possible, I reused the sample program "weston-simple-egl" from the Weston (Wayland compositor) project. I changed the rendering to draw a fullscreen window (1024x600) with a GL_TRIANGLE_STRIP spanning around 95% of the screen. I tested the program with 32 bits per pixel (bpp) and 16 bpp, but saw no significant gain. The Mali GPU on the system is clocked at 600 MHz. The vertex and fragment shaders are pass-through: they forward the vertices and fragments unchanged.
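
As a rough sanity check on the numbers above, the datasheet figure can be turned into a per-frame budget. This is only a sketch: it assumes the quoted 0.11 Mtriangles/sec/MHz scales linearly with the 600 MHz clock, which is a theoretical peak rather than something a real workload is guaranteed to reach. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope comparison of the datasheet rate with the measurement.
# Assumption: the datasheet rate scales linearly with the GPU clock.

DATASHEET_MTRI_PER_SEC_PER_MHZ = 0.11   # figure quoted from the datasheet
GPU_CLOCK_MHZ = 600                      # GPU clock stated in this post
TARGET_FPS = 60
MEASURED_TRIANGLES_PER_FRAME = 32_000    # observed limit in this post


def peak_triangles_per_frame(mtri_per_sec_per_mhz, clock_mhz, fps):
    """Triangles per frame if the datasheet rate were fully sustained."""
    triangles_per_sec = mtri_per_sec_per_mhz * 1e6 * clock_mhz
    return triangles_per_sec / fps


def fraction_of_peak(measured_per_frame, peak_per_frame):
    """Measured throughput as a fraction of the theoretical peak."""
    return measured_per_frame / peak_per_frame


peak = peak_triangles_per_frame(
    DATASHEET_MTRI_PER_SEC_PER_MHZ, GPU_CLOCK_MHZ, TARGET_FPS
)
ratio = fraction_of_peak(MEASURED_TRIANGLES_PER_FRAME, peak)
print(f"theoretical peak: {peak:,.0f} triangles/frame")
print(f"measured:         {MEASURED_TRIANGLES_PER_FRAME:,} triangles/frame "
      f"({ratio:.1%} of peak)")
```

Under these assumptions the peak works out to about 1.1 million triangles per frame at 60 FPS, so the measured 32 000 is roughly 3% of it.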

The bottleneck seems to be the `eglSwapBuffers` call. It takes more and more time as the number of triangles rises. With 32 000 triangles, it can take up to 18 ms (!), which exceeds the roughly 16.7 ms frame budget at 60 FPS and explains the FPS drop. Unfortunately, `eglSwapBuffers` is implemented by the closed-source library libmali, so I couldn't dig deeper. I assume the `eglSwapBuffers` call returns when an IRQ comes back from the GPU indicating that the queued jobs are done.
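
To put the 18 ms in context, here is a small sketch of the frame-time budget. It assumes presentation is locked to the 60 Hz refresh (a swap interval of 1), so a frame that misses one refresh waits for the next. Whether that holds for this particular compositor setup is an assumption, and the helper names are illustrative only.

```python
import math


def frame_budget_ms(refresh_hz):
    """Time available per frame, in milliseconds, at a given refresh rate."""
    return 1000.0 / refresh_hz


def vsync_locked_fps(frame_time_ms, refresh_hz=60):
    """Frame rate when every frame must wait for a whole display refresh.

    Assumption: strict vsync, so a frame taking slightly longer than one
    refresh period occupies two periods, and so on.
    """
    budget = frame_budget_ms(refresh_hz)
    refreshes_used = max(1, math.ceil(frame_time_ms / budget))
    return refresh_hz / refreshes_used


budget = frame_budget_ms(60)
print(f"budget at 60 Hz: {budget:.2f} ms")
for frame_time in (10.0, 18.0):
    print(f"{frame_time:.1f} ms per frame -> "
          f"{vsync_locked_fps(frame_time):.0f} FPS under strict vsync")
```

Under strict vsync an 18 ms frame would be presented every second refresh. If the observed rate sits somewhere between the two steps, frames are probably only missing the deadline some of the time.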

So, in summary, am I effectively hitting a hardware limit at 32 000 triangles per frame under Wayland, or is there something I could do to improve performance?

Parents
  • To sort of give closure on this topic, I upgraded to Linux 5.7 and switched to the open source lima driver. Performance seems slightly better, particularly since dynamic heap memory management was implemented. With Xilinx's binary blob, I was seeing "PLBU out of memory interrupts" coming back from the GPU for most frames.

    For the heavy performance drop when switching to QtWayland, it seems I was hit very hard by this Qt bug, which caused frequent 100 ms freezes: https://bugreports.qt.io/browse/QTBUG-76813

    Thank you for the answers and help provided in the thread.

Children
  • Hi,

    I am trying to switch from the Xilinx Mali kernel drivers to the lima kernel driver and I am somewhat stuck.

    Weston with Wayland runs with GPU support using the mali.ko and libMali.so provided by Xilinx under Ubuntu 20.

    Doing the same thing with lima and a self-compiled Mesa library is another matter.

    So far I was able to load the lima.ko module and build the Mesa drivers, but when running Weston I only get software rendering, with no GPU acceleration.

    Could you please give me a hint about what you did? For example:

    - kernel settings in petalinux

    - device tree binding of gpu

    - what to do with the mesa libraries. Maybe i am just missing some kind of links.

    Regards,

    p00chie

  • Hi,

    > kernel settings in petalinux

    I'm not using PetaLinux, so I have little insight into what to change there. The defconfig used to compile the kernel must enable `CONFIG_DRM_LIMA` and `CONFIG_DRM_XLNX`.

    > device tree binding of gpu

    I changed the interrupt-names in zynqmp.dtsi so they match what lima_device.c is looking for:

    diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
    index b0b306ed796d..97e776231428 100644
    --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
    +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
    @@ -462,7 +462,7 @@
                            reg = <0x0 0xfd4b0000 0x0 0x10000>;
                            interrupt-parent = <&gic>;
                            interrupts = <0 132 4>, <0 132 4>, <0 132 4>, <0 132 4>, <0 132 4>, <0 132 4>;
    -                       interrupt-names = "IRQGP", "IRQGPMMU", "IRQPP0", "IRQPPMMU0", "IRQPP1", "IRQPPMMU1";
    +                       interrupt-names = "gp", "gpmmu", "pp0", "ppmmu0", "pp1", "ppmmu1";
                            clock-names = "gpu", "gpu_pp0", "gpu_pp1";
                            power-domains = <&zynqmp_firmware PD_GPU>;
                    };
    

    lima_device.c also looks for the clock-names `bus` and `core`, so I changed the driver code to use the clocks `gpu`, `gpu_pp0` and `gpu_pp1` instead. I couldn't really find any documentation on those clocks, so all I can say is that, empirically, it works.

    Maybe you already did that, since otherwise there are errors in `dmesg` when lima.ko is loaded.


    > what to do with the mesa libraries. Maybe i am just missing some kind of links.

    Yes, mesa requires a small patch so it knows it can use Xilinx' drm driver.

    I'm really not an expert in the Linux graphics ecosystem, but from what I could gather, lima is a `render only` driver and Xilinx's DRM driver is `display only` (I think the Xilinx DRM driver was never merged upstream, so make sure to use the latest one from their fork), and there's a bit of glue code involved to link them, as you said.

    This patch is valid for mesa 19.1.6:

    ---
     src/gallium/drivers/kmsro/Android.mk | 1 +
     src/gallium/targets/dri/meson.build  | 1 +
     src/gallium/targets/dri/target.c     | 1 +
     3 files changed, 3 insertions(+)
    
    diff --git a/src/gallium/drivers/kmsro/Android.mk b/src/gallium/drivers/kmsro/Android.mk
    index 7c39f97..dbcb389 100644
    --- a/src/gallium/drivers/kmsro/Android.mk
    +++ b/src/gallium/drivers/kmsro/Android.mk
    @@ -50,5 +50,6 @@ GALLIUM_TARGET_DRIVERS += repaper
     GALLIUM_TARGET_DRIVERS += st7586
     GALLIUM_TARGET_DRIVERS += st7735r
     GALLIUM_TARGET_DRIVERS += sun4i-drm
    +GALLIUM_TARGET_DRIVERS += xlnx
     $(eval GALLIUM_LIBS += $(LOCAL_MODULE) libmesa_winsys_kmsro)
     endif
    diff --git a/src/gallium/targets/dri/meson.build b/src/gallium/targets/dri/meson.build
    index 8da21b3..ab57908 100644
    --- a/src/gallium/targets/dri/meson.build
    +++ b/src/gallium/targets/dri/meson.build
    @@ -85,6 +85,7 @@ foreach d : [[with_gallium_kmsro, [
                    'st7735r_dri.so',
                    'stm_dri.so',
                   'sun4i-drm_dri.so',
    +               'xlnx_dri.so',
                  ]],
                  [with_gallium_radeonsi, 'radeonsi_dri.so'],
                  [with_gallium_nouveau, 'nouveau_dri.so'],
    diff --git a/src/gallium/targets/dri/target.c b/src/gallium/targets/dri/target.c
    index f71f690..e8f4340 100644
    --- a/src/gallium/targets/dri/target.c
    +++ b/src/gallium/targets/dri/target.c
    @@ -110,6 +110,7 @@ DEFINE_LOADER_DRM_ENTRYPOINT(st7586)
     DEFINE_LOADER_DRM_ENTRYPOINT(st7735r)
     DEFINE_LOADER_DRM_ENTRYPOINT(stm)
     DEFINE_LOADER_DRM_ENTRYPOINT(sun4i_drm)
    +DEFINE_LOADER_DRM_ENTRYPOINT(xlnx)
     #endif
    
     #if defined(GALLIUM_LIMA)
    

    Finally, you can use `kmscube` to test the setup before debugging in Weston directly. You should get an output like this when everything works correctly (plus a 3D cube on your display):

    # kmscube 
    
    eglGetPlatformDisplayEXT
    Using display 0x55bab3fbd0 with EGL version 1.4
    ===================================
    EGL information:
      version: "1.4"
      vendor: "Mesa Project"
      client extensions: "EGL_EXT_client_extensions EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_KHR_debug EGL_EXT_platform_device EGL_EXT_platform_wayland EGL_KHR_platform_wayland EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless"
    ===================================
    OpenGL ES 2.x information:
      version: "OpenGL ES 2.0 Mesa 20.1.0"
      shading language version: "OpenGL ES GLSL ES 1.0.16"
      vendor: "lima"
      renderer: "Mali400"
    ===================================
    

  • Thanks for your support!

    After making the changes, Mesa 19.1.6 failed to build :(

    I switched to 20.1.0, the version shown in your output, made the changes and installed it again.

    Here is my build config:

    meson build/ --buildtype release --prefix=/usr/local --libdir=lib/aarch64-linux-gnu -Dgallium-drivers=lima,kmsro,swrast -Dplatforms=x11,drm,surfaceless,wayland -Dvulkan-drivers= -Ddri-drivers= -Dllvm=false

    Unfortunately I couldn't build kmscube, but it's in the Ubuntu 20.10 repo.

    After running kmscube I got this output:

    Using display 0x558dcae390 with EGL version 1.4
    ===================================
    EGL information:
      version: "1.4"
      vendor: "Mesa Project"
      client extensions: "EGL_EXT_client_extensions EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_KHR_debug EGL_EXT_platform_device EGL_EXT_platform_wayland EGL_KHR_platform_wayland EGL_EXT_platform_x11 EGL_KHR_platform_x11 EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless"
      display extensions: "EGL_ANDROID_blob_cache EGL_ANDROID_native_fence_sync EGL_EXT_buffer_age EGL_EXT_image_dma_buf_import EGL_EXT_image_dma_buf_import_modifiers EGL_KHR_cl_event2 EGL_KHR_config_attribs EGL_KHR_create_context EGL_KHR_create_context_no_error EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_colorspace EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_partial_update EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_EXT_pixel_format_float EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_MESA_image_dma_buf_export EGL_MESA_query_driver EGL_WL_bind_wayland_display "
    ===================================
    OpenGL ES 2.x information:
      version: "OpenGL ES 2.0 Mesa 20.1.0 (git-7de17e2520)"
      shading language version: "OpenGL ES GLSL ES 1.0.16"
      vendor: "lima"
      renderer: "Mali400"
      extensions: "GL_EXT_blend_minmax GL_EXT_multi_draw_arrays GL_EXT_texture_format_BGRA8888 GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth24 GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_mapbuffer GL_OES_rgb8_rgba8 GL_OES_standard_derivatives GL_OES_stencil8 GL_OES_texture_3D GL_OES_texture_npot GL_OES_vertex_half_float GL_OES_EGL_image GL_OES_depth_texture GL_OES_packed_depth_stencil GL_OES_get_program_binary GL_APPLE_texture_max_level GL_EXT_discard_framebuffer GL_EXT_read_format_bgra GL_EXT_frag_depth GL_NV_fbo_color_attachments GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_vertex_array_object GL_EXT_occlusion_query_boolean GL_EXT_unpack_subimage GL_NV_draw_buffers GL_NV_read_buffer GL_NV_read_depth GL_NV_read_depth_stencil GL_NV_read_stencil GL_EXT_draw_buffers GL_EXT_map_buffer_range GL_KHR_debug GL_KHR_texture_compression_astc_ldr GL_NV_pixel_buffer_object GL_OES_required_internalformat GL_OES_surfaceless_context GL_EXT_separate_shader_objects GL_EXT_compressed_ETC1_RGB8_sub_texture GL_EXT_draw_elements_base_vertex GL_EXT_texture_border_clamp GL_KHR_context_flush_control GL_OES_draw_elements_base_vertex GL_OES_texture_border_clamp GL_KHR_no_error GL_KHR_texture_compression_astc_sliced_3d GL_KHR_parallel_shader_compile "
    ===================================
    failed to set mode: Invalid argument
    

    So I think the driver should be OK. Maybe something is missing in the DRM or GPU setup?

    First, the dmesg output for the GPU (lima):

    [   10.085688] lima fd4b0000.gpu: IRQ pmu not found
    [   10.090471] lima fd4b0000.gpu: IRQ ppmmu2 not found
    [   10.095394] lima fd4b0000.gpu: IRQ ppmmu3 not found
    [   10.100322] lima fd4b0000.gpu: gp - mali400 version major 1 minor 1
    [   10.100353] lima fd4b0000.gpu: pp0 - mali400 version major 1 minor 1
    [   10.100373] lima fd4b0000.gpu: pp1 - mali400 version major 1 minor 1
    [   10.100385] lima fd4b0000.gpu: IRQ pp2 not found
    [   10.105041] lima fd4b0000.gpu: IRQ pp3 not found
    [   10.109699] lima fd4b0000.gpu: l2 cache 64K, 4-way, 64byte cache line, 128bit external bus
    [   10.166862] lima fd4b0000.gpu: bus rate = 599999994
    [   10.166871] lima fd4b0000.gpu: mod rate = 599999994
    [   10.172464] [drm] Initialized lima 1.0.0 20190217 for fd4b0000.gpu on minor 1
    

    Second, the dmesg output for DRM:

    [    3.537453] OF: graph: no port node found in /amba/zynqmp-display@fd4a0000
    [    3.544426] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
    [    3.551034] [drm] No driver support for vblank timestamp query.
    [    3.557007] xlnx-drm xlnx-drm.0: bound fd4a0000.zynqmp-display (ops 0xffffffc010cf8740)
    [    3.734818] Console: switching to colour frame buffer device 240x75
    [    3.757765] zynqmp-display fd4a0000.zynqmp-display: fb0: xlnxdrmfb frame buffer device
    [    3.765918] [drm] Initialized xlnx 1.0.0 20130509 for fd4a0000.zynqmp-display on minor 0
    [    3.774046] zynqmp-display fd4a0000.zynqmp-display: ZynqMP DisplayPort Subsystem driver probed
    

    Did kmscube work for you?

    After messing around with custom Weston launches without success, just using weston-launch did it: the mali/lima/Utgard back to valhalla is working :)

    Output of cat /proc/interrupts:

    root@bcp-linux:/etc/ld.so.conf.d# cat /proc/interrupts 
               CPU0       CPU1       CPU2       CPU3       
      3:     954639     823223     807559     890756     GICv2  30 Level     arch_timer
      6:          0          0          0          0     GICv2  67 Level     zynqmp_ipi
      7:          0          0          0          0     GICv2 175 Level     arm-pmu
      8:          0          0          0          0     GICv2 176 Level     arm-pmu
      9:          0          0          0          0     GICv2 177 Level     arm-pmu
     10:          0          0          0          0     GICv2 178 Level     arm-pmu
     12:          0          0          0          0     GICv2 156 Level     zynqmp-dma
     13:          0          0          0          0     GICv2 157 Level     zynqmp-dma
     14:          0          0          0          0     GICv2 158 Level     zynqmp-dma
     15:          0          0          0          0     GICv2 159 Level     zynqmp-dma
     16:          0          0          0          0     GICv2 160 Level     zynqmp-dma
     17:          0          0          0          0     GICv2 161 Level     zynqmp-dma
     18:          0          0          0          0     GICv2 162 Level     zynqmp-dma
     19:          0          0          0          0     GICv2 163 Level     zynqmp-dma
     20:        155          0          0          0     GICv2 164 Level     gpmmu, ppmmu0, ppmmu1, gp, pp0, pp1
     21:          0          0          0          0     GICv2 109 Level     zynqmp-dma
     22:          0          0          0          0     GICv2 110 Level     zynqmp-dma
     23:          0          0          0          0     GICv2 111 Level     zynqmp-dma
     24:          0          0          0          0     GICv2 112 Level     zynqmp-dma
     25:          0          0          0          0     GICv2 113 Level     zynqmp-dma
     26:          0          0          0          0     GICv2 114 Level     zynqmp-dma
     27:          0          0          0          0     GICv2 115 Level     zynqmp-dma
     28:          0          0          0          0     GICv2 116 Level     zynqmp-dma
     29:          1          0          0          0     GICv2 144 Level     fd070000.memory-controller
     30:     525015          0          0          0     GICv2  89 Level     eth0, eth0
     32:          0          0          0          0     GICv2  49 Level     cdns-i2c
     33:          0          0          0          0     GICv2  42 Level     ff960000.memory-controller
     34:          0          0          0          0     GICv2  57 Level     axi-pmon, axi-pmon
     35:          0          0          0          0     GICv2 155 Level     axi-pmon, axi-pmon
     36:          0          0          0          0     GICv2  47 Level     ff0f0000.spi
     37:          0          0          0          0     GICv2  58 Level     ffa60000.rtc
     38:          0          0          0          0     GICv2  59 Level     ffa60000.rtc
     39:      82759          0          0          0     GICv2  81 Level     mmc0
     40:        814          0          0          0     GICv2  53 Level     xuartps
     41:          0          0          0          0     GICv2  88 Level     ams-irq
     42:     742775          0          0          0     GICv2 154 Level     fd4c0000.dma
     43:       7692          0          0          0     GICv2 151 Level     fd4a0000.zynqmp-display
     45:          0          0          0          0     GICv2 122 Edge      M_AXI_S2O
     46:          0          0          0          0     GICv2 126 Edge      M_AXI_O2S
     47:          0          0          0          0     GICv2 123 Edge      M_AXI_S2O_INTR0
     48:          0          0          0          0     GICv2 124 Edge      M_AXI_S2O_INTR1
     49:          0          0          0          0     GICv2 125 Edge      M_AXI_S2O_INTR2
     50:          0          0          0          0     GICv2 127 Edge      M_AXI_O2S_INTR0
     51:          0          0          0          0     GICv2 128 Edge      M_AXI_O2S_INTR1
     84:       7647          0          0          0     GICv2  97 Level     xhci-hcd:usb1
     85:          3          0          0          0     GICv2 101 Level     dwc3-otg
    IPI0:    101043     227560     313069     254156       Rescheduling interrupts
    IPI1:      1709       6727       6785       6559       Function call interrupts
    IPI2:         0          0          0          0       CPU stop interrupts
    IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
    IPI4:         0          0          0          0       Timer broadcast interrupts
    IPI5:         0          0          0          0       IRQ work interrupts
    IPI6:         0          0          0          0       CPU wake-up interrupts
    

    But within Weston, glmark2-es2-wayland failed with:

    error: import buffer not properly aligned

    Can you start it?

    Thanks for your support!

  • > So I think the driver should be OK. Maybe something is missing in the DRM or GPU setup?

    I think you're right, everything seems to be initialized correctly in the logs.

    > Did kmscube work for you?

    Yes, it works. You are getting the error "failed to set mode: Invalid argument", which I think is an issue with the buffer format kmscube uses by default. I have this patch locally for kmscube:

    diff --git a/common.c b/common.c
    index b6f3e9b..d772a79 100644
    --- a/common.c
    +++ b/common.c
    @@ -43,7 +43,7 @@ gbm_surface_create_with_modifiers(struct gbm_device *gbm,
     const struct gbm * init_gbm(int drm_fd, int w, int h, uint64_t modifier)
     {
            gbm.dev = gbm_create_device(drm_fd);
    -       gbm.format = GBM_FORMAT_XRGB8888;
    +       gbm.format = GBM_FORMAT_RGB565;
            gbm.surface = NULL;
     
            if (gbm_surface_create_with_modifiers) {
    

    > But within Weston, glmark2-es2-wayland failed with

    > error: import buffer not properly aligned

    > Can you start it?

    It starts, but it doesn't render correctly: the image never shows up on screen. I also carry this patch in mesa, which addresses what seems to be the cause of your crash:

    Subject: [PATCH] lima: lima_resource: relax stride check
    
    See https://gitlab.freedesktop.org/mesa/mesa/-/issues/3070
    
    Suggested-by: Vasily Khoruzhick <anarsoul@gmail.com>
    ---
     src/gallium/drivers/lima/lima_resource.c | 17 +++++++++++++++--
     1 file changed, 15 insertions(+), 2 deletions(-)
    
    diff --git a/src/gallium/drivers/lima/lima_resource.c b/src/gallium/drivers/lima/lima_resource.c
    index 4644ea4..fd7614d 100644
    --- a/src/gallium/drivers/lima/lima_resource.c
    +++ b/src/gallium/drivers/lima/lima_resource.c
    @@ -351,8 +351,21 @@ lima_resource_from_handle(struct pipe_screen *pscreen,
           stride = util_format_get_stride(pres->format, width);
           size = util_format_get_2d_size(pres->format, stride, height);
     
    -      if (res->levels[0].stride != stride || res->bo->size < size) {
    -         debug_error("import buffer not properly aligned\n");
    +      if (res->tiled && res->levels[0].stride != stride) {
    +         fprintf(stderr, "tiled imported buffer has mismatching stride: %d (BO) != %d (expected)",
    +                     res->levels[0].stride, stride);
    +         goto err_out;
    +      }
    +
    +      if (!res->tiled && res->levels[0].stride < stride) {
    +         fprintf(stderr, "linear imported buffer has mismatching stride: %d (BO) < %d (expected)",
    +                     res->levels[0].stride, stride);
    +         goto err_out;
    +      }
    +
    +      if (res->bo->size < size) {
    +         fprintf(stderr, "imported bo size is smaller than expected: %d (BO) < %d (expected)\n",
    +                     res->bo->size, size);
              goto err_out;
           }
     
    -- 
    

    Weston also ships a few sample clients to test with, such as "weston-simple-egl", which does work for me.
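
The effect of the relaxed stride check in the mesa patch above can be modeled in a few lines. This is a Python restatement of the conditions visible in the diff, not the driver code itself, and the buffer dimensions in the example are hypothetical values chosen only to show a padded linear buffer.

```python
def original_check(bo_stride, bo_size, expected_stride, expected_size):
    """Pre-patch condition: any stride difference rejects the import."""
    return bo_stride == expected_stride and bo_size >= expected_size


def relaxed_check(tiled, bo_stride, bo_size, expected_stride, expected_size):
    """Post-patch conditions, in the same order as in the diff."""
    if tiled and bo_stride != expected_stride:
        return False   # tiled buffers still need an exact stride match
    if not tiled and bo_stride < expected_stride:
        return False   # linear buffers may be padded, but not too narrow
    return bo_size >= expected_size


# Hypothetical linear buffer whose rows are padded beyond the minimum.
HEIGHT = 600
EXPECTED_STRIDE = 4096           # e.g. 1024 pixels at 4 bytes per pixel
PADDED_STRIDE = 4352             # made-up padded row size from an exporter
expected_size = EXPECTED_STRIDE * HEIGHT
padded_size = PADDED_STRIDE * HEIGHT

print("original:", original_check(PADDED_STRIDE, padded_size,
                                  EXPECTED_STRIDE, expected_size))
print("relaxed: ", relaxed_check(False, PADDED_STRIDE, padded_size,
                                 EXPECTED_STRIDE, expected_size))
```

In this model a linear buffer with extra row padding is rejected by the original condition but accepted by the relaxed one, while tiled buffers keep the exact-match requirement.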

  • Thanks for the advice about GBM_FORMAT_RGB565.

    I had to change it in another spot, but I forgot about it.

    Are you still on the 5.4.0 kernel from Xilinx?

    I think a lot of work has gone into the lima driver in the Linux kernel, but I couldn't merge the current lima sources into the 5.4 Xilinx kernel sources because there were way too many changes. Maybe with a newer version from Xilinx I could give it another try.

    The official route with libMali and Weston 9 under Ubuntu 20.10 seems a bit more mature, since for example glmark2 runs without any errors under Wayland.

  • Yes, I'm using Xilinx's 5.4.0 kernel, but I cherry-picked a few patches from the upstream lima driver, mainly dynamic heap memory. It's possible libMali is more battle-tested than the open source stack. For my use case, lima was doing a better job and I have insight into the whole stack for debugging, so I went with that. But your mileage may vary :-P

  • This fix finally resolved the problem of many applications not loading at all.

    But the applications still do not display correctly. After resizing the application window, the output becomes more or less OK.

    Were you able to get rid of the rendering artifacts?