
r12p0 Wayland driver (odroid-xu3) frees objects too early, leading to a segmentation fault

totem (gnome-videos) crashes on exit with the following backtrace:

Core was generated by `totem bbb_720p.mov'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  get_next_argument (signature=0x2c <error: Cannot access memory at address 0x2c>, details=details@entry=0xbee39a9c) at ../src/connection.c:430
430             for(; *signature; ++signature) {
[Current thread is 1 (Thread 0xb213cd70 (LWP 12827))]
(gdb) bt
#0  get_next_argument (signature=0x2c <error: Cannot access memory at address 0x2c>, details=details@entry=0xbee39a9c) at ../src/connection.c:430
#1  0xb4ce69ba in wl_argument_from_va_list (signature=<optimized out>, args=args@entry=0xbee39acc, count=count@entry=20, ap=..., ap@entry=...) at ../src/connection.c:493
#2  0xb4ce5598 in wl_proxy_marshal (proxy=0x7f6bedb0, opcode=1) at ../src/wayland-client.c:692
#3  0xb4f8685e in window_surface_delete () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#4  0xb4f7e1e4 in eglp_window_surface_specific_deinitialization () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#5  0xb4f7cd14 in eglp_delete_surface () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#6  0xb4f7ce74 in eglp_destroy_all_non_current_surfaces () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#7  0xb4f7a71a in eglp_try_display_finish_terminating () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#8  0xb4f7b1e2 in eglTerminate () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#9  0xb4f7b22c in eglp_unload_callback () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#10 0xb4decc24 in osup_term_unload_hooks () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#11 0xb4dde4ca in osup_c_unload_hook () from /usr/lib/arm-linux-gnueabihf/egl-current/libwayland-egl.so.1
#12 0xb6fd3f42 in ?? () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Following it, I get:

(gdb) print (struct wl_proxy) *0x7f6bedb0
$3 = {object = {interface = 0x7fe1bfc8, implementation = 0x7fb51c30, id = 44}, display = 0x7f660ec0, queue = 0x7f660f2c, flags = 2, refcount = 1, user_data = 0x0, dispatcher = 0x0, version = 3}

(gdb) print (struct wl_interface) *0x7fe1bfc8 # => this is proxy->interface - you can see the name is garbage already
$4 = {name = 0xa93e931d "iXh\377\367Ňş\022KP!0\265{D\021L\205\260\025F\034Y#h\003\223\377\367\f\354\016IjF", version = 49, method_count = -2147421248, methods = 0x7f6beda8, event_count = 0, events = 0x0}

(gdb) print (struct wl_message) *0x7f6beda8 # => this is proxy->interface->methods => you can see the signature field cannot be accessed (0x31 is invalid) leading to the segmentation fault
$5 = {name = 0x0, signature = 0x31 <error: Cannot access memory at address 0x31>, types = 0x7fe1bfc8}

This means that window_surface_delete sends garbage to the wayland-client library.

Interestingly, this trace was obtained from the core file: if I run totem under gdb, there is no segmentation fault, probably because execution is slowed down and the free does not happen early enough.

Any help would be appreciated.

 

EGL_VERSION = 1.4 Midgard-"r12p0-04rel0"
EGL_VENDOR = ARM
EGL_EXTENSIONS =  EGL_WL_bind_wayland_display  EGL_KHR_partial_update EGL_KHR_config_attribs EGL_KHR_image EGL_KHR_image_base EGL_KHR_fence_sync EGL_KHR_wait_sync EGL_KHR_gl_colorspace EGL_KHR_get_all_proc_addresses EGL_IMG_context_priority EGL_ARM_pixmap_multisample_discard EGL_KHR_gl_texture_2D_image EGL_KHR_gl_renderbuffer_image EGL_KHR_create_context EGL_KHR_surfaceless_context EGL_KHR_gl_texture_cubemap_image EGL_EXT_create_context_robustness EGL_KHR_cl_event2
EGL_CLIENT_APIS = OpenGL_ES
GL_VERSION: OpenGL ES 3.1 v1.r12p0-04rel0.f9ea82e6bf7f0bb7544260636f375425
GL_RENDERER: Mali-T628
GL_EXTENSIONS:
    GL_ARM_rgba8, GL_ARM_mali_shader_binary, GL_OES_depth24,
    GL_OES_depth_texture, GL_OES_depth_texture_cube_map,
    GL_OES_packed_depth_stencil, GL_OES_rgb8_rgba8, GL_EXT_read_format_bgra,
    GL_OES_compressed_paletted_texture, GL_OES_compressed_ETC1_RGB8_texture,
    GL_OES_standard_derivatives, GL_OES_EGL_image, GL_OES_EGL_image_external,
    GL_OES_EGL_image_external_essl3, GL_OES_EGL_sync, GL_OES_texture_npot,
    GL_OES_vertex_half_float, GL_OES_required_internalformat,
    GL_OES_vertex_array_object, GL_OES_mapbuffer,
    GL_EXT_texture_format_BGRA8888, GL_EXT_texture_rg,
    GL_EXT_texture_type_2_10_10_10_REV, GL_OES_fbo_render_mipmap,
    GL_OES_element_index_uint, GL_EXT_shadow_samplers,
    GL_OES_texture_compression_astc, GL_KHR_texture_compression_astc_ldr,
    GL_KHR_texture_compression_astc_hdr,
    GL_KHR_texture_compression_astc_sliced_3d, GL_KHR_debug,
    GL_EXT_occlusion_query_boolean, GL_EXT_disjoint_timer_query,
    GL_EXT_blend_minmax, GL_EXT_discard_framebuffer,
    GL_OES_get_program_binary, GL_OES_texture_3D, GL_EXT_texture_storage,
    GL_EXT_multisampled_render_to_texture, GL_OES_surfaceless_context,
    GL_OES_texture_stencil8, GL_EXT_shader_pixel_local_storage,
    GL_ARM_shader_framebuffer_fetch,
    GL_ARM_shader_framebuffer_fetch_depth_stencil, GL_ARM_mali_program_binary,
    GL_EXT_sRGB, GL_EXT_sRGB_write_control, GL_EXT_texture_sRGB_decode,
    GL_KHR_blend_equation_advanced, GL_KHR_blend_equation_advanced_coherent,
    GL_OES_texture_storage_multisample_2d_array, GL_OES_shader_image_atomic,
    GL_EXT_robustness, GL_EXT_texture_border_clamp,
    GL_OES_texture_border_clamp, GL_EXT_texture_cube_map_array,
    GL_OES_texture_cube_map_array, GL_OES_sample_variables,
    GL_OES_sample_shading, GL_OES_shader_multisample_interpolation,
    GL_EXT_shader_io_blocks, GL_OES_shader_io_blocks, GL_EXT_gpu_shader5,
    GL_OES_gpu_shader5, GL_EXT_texture_buffer, GL_OES_texture_buffer,
    GL_EXT_copy_image, GL_OES_copy_image

 

  • Thanks for the bug report, I've raised this with the driver team.
  • I've been looking into this from the driver side.

    For reference, could you identify the version of totem that you're using, and how that's configured to use Wayland? Assuming that totem is interfacing with Wayland via GTK+ 3... the GTK+ 3 library version will also be helpful. In the explanation below, where I mention 'application', I mean 'totem and whatever library is being used' :)

    This segfault can happen if the application frees the Wayland surface too early, specifically while the associated EGL surface is still current. If this is the case, the application is doing something like the following during cleanup:

    /* egl_surface may still be current at this point */
    eglDestroySurface(display, egl_surface);
    wl_egl_window_destroy(wl_egl_window_win);
    wl_surface_destroy(wl_surface);

    If egl_surface was either the draw or read argument in the previous call to eglMakeCurrent, egl_surface and wl_egl_window_win are only marked for deletion and are still in use. Destroying wl_surface then results in the segfault when the driver subsequently needs to do something with the wl_surface (in this case, as part of the deferred deletion). Sections 3.2 and 3.5.5 of the EGL 1.5 specification cover the lifetime of EGL objects.

    There are two possible application fixes you could consider:
     * Call eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT) before destroying the surface.
     * Call eglTerminate(display) instead of destroying the surfaces individually.

    In the meantime I'm looking into whether we can reproduce this locally, or whether a change to the driver is feasible.

  • Thanks for the quick reply.

    I am using stock Debian stretch packages, so it's GTK3 3.22.8 (3.22.8-1 debian packaging version) and totem version 3.22.0 (3.22.0-2 debian packaging version).

    Of course, totem is not the only application with this issue; gnome-maps is another one. When using GNOME 3 (gnome-shell), the segmentation fault actually kills the entire session (gnome-shell crashes); in weston, only the application crashes. This may be because the issue lies not in the application but in GTK3, which gnome-shell also uses and weston does not (?).
    Indeed, there is no call to eglDestroySurface, wl_egl_window_destroy or wl_surface_destroy in totem source.

    In GTK3, all calls to eglDestroySurface and wl_egl_window_destroy are in gdk/wayland/gdkwindow-wayland.c (this is the version in GTK 3.22.8):
    github.com/.../gdkwindow-wayland.c
    but those calls look OK. All calls to wl_surface_destroy are in gdk/wayland/gdkdevice-wayland.c:
    github.com/.../gdkdevice-wayland.c
    but they only destroy the pointer surface, so this leaves little to look at in GTK3, unless both weston and gnome-shell are the cause, which I doubt.

    If you would like to debug on the exact rootfs on which I can reproduce the error, you can download the XU4 image from here: forum.odroid.com/viewtopic.php
  • Hi, is there any news on this?
    Could you reproduce it on your end?

    Thanks.
  • We haven't reproduced this issue locally yet (it's on our backlog of things to take a look at).

    I'm reasonably confident that this is an issue in GDK (or how totem is calling GTK+) rather than the driver.

  • Is anyone still looking at this issue?

    It also affects other Midgard drivers; I have confirmed it with the RK3399 Mali-T860 r14p0 driver: www.youtube.com/watch