[Mali-G925 / driver 49.1.0] SIGSEGV in libGLES_mali.so on every CPU-side Vulkan sync API (vkWaitForFences, vkWaitSemaphores, vkGetSemaphoreCounterValue, vkQueueWaitIdle) after compute swapchain submit

[Device & Driver]

Manufacturer : Samsung
Model : Galaxy Tab S11 (SM-X736B / gts11eea)
Build : samsung/gts11eea/gts11:16/BP4A.251205.006/X736BXXU5AZBC_OXM5AZBC:userdebug
Kernel : 6.6.102-android15-8-abogkiX736BXXU5AZBC-4k
SoC : MediaTek MT6991
GPU : Mali-G925-Immortalis MC12
GPU driver : 49.1.0
Vulkan API : 1.3.278
Vulkan loader : Android system libvulkan.so
Mali ICD : /vendor/lib64/egl/mt6991/libGLES_mali.so
Mali ICD BuildId : 8ffcdf0fe7b476c1


[Summary]

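Every CPU-side wait/idle Vulkan entry point we tried SIGSEGVs after the application performs a
vkQueueSubmit of a swapchain-bound command buffer. All 5 entry points listed under [Reproduction]
crash. The faulting frame is one of two things. It is either the libvulkan.so loader trampoline that
dispatches into the Mali ICD (WaitForFences+4, QueueWaitIdle+4), or code inside libGLES_mali.so
reached from ICD entry points near offset 0x995000 (0x995098, 0x994fd4).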

A/B control: the identical APK on a Galaxy S25 Ultra (Qualcomm Adreno 830, driver 512.800.1) runs
1000+ frames without a crash.


[Reproduction]

1. Initialize Vulkan instance + device through Android system loader (we use ncnn 20260113 with
NCNN_SIMPLEVK=1, but any path triggers it).
2. Create VkSurfaceKHR from ANativeWindow.
3. Create VkSwapchainKHR: FIFO, 4-5 images, VK_FORMAT_R8G8B8A8_UNORM, usage = STORAGE_BIT |
TRANSFER_DST_BIT | COLOR_ATTACHMENT_BIT.
4. Allocate host-visible staging VkBuffer, memcpy RGBA into it.
5. Record cmd buffer: image layout transition -> vkCmdCopyBufferToImage -> layout transition for
present.
6. vkAcquireNextImageKHR (binary semaphore sem_acq).
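7. vkQueueSubmit: pWaitSemaphores=[sem_acq], pSignalSemaphores=[sem_ren, in_flight_sem]. The
timeline semaphore in_flight_sem is signaled to ++signal_val via VkTimelineSemaphoreSubmitInfo. In
the fence-based variant (see variant 1), the per-slot VkFence waited on in step 9 is the one passed
to vkQueueSubmit.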
8. vkQueuePresentKHR: pWaitSemaphores=[sem_ren].
9. On the next frame, call ANY of:
- vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX)
- vkWaitSemaphores(device, &swi, UINT64_MAX)
- vkWaitSemaphoresKHR(...)
- vkGetSemaphoreCounterValue(device, sem, &value)
- vkQueueWaitIdle(queue)
10. SIGSEGV within 0-6 frames. The crash occurs either inside libGLES_mali.so or in the libvulkan.so
trampoline that dispatches to it.
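Steps 6-9 assume a frame-slot ring. Each CPU slot remembers the timeline value its last submit will signal, and the CPU waits for that value before it reuses the slot's staging buffer. The sketch below shows that bookkeeping in plain C++ so it can be checked off-device. It is a sketch under stated assumptions, not the app's code: `FrameRing`, `begin_frame`, `end_frame`, and the `wait_fn` callback are hypothetical names. On the device, `wait_fn` would wrap vkWaitSemaphores, or vkWaitForFences on the slot's fence in the fence variant.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical per-slot bookkeeping for N CPU frame slots sharing one timeline semaphore.
template <std::size_t N>
class FrameRing {
public:
    // Picks the slot for this frame. If the slot was used before, its previous submit
    // signals slot_value_[slot]; the CPU must not touch the slot's staging buffer until
    // the timeline counter reaches that value, so wait_fn is called with it first.
    std::size_t begin_frame(const std::function<void(uint64_t)>& wait_fn) {
        assert(!in_frame_ && "begin_frame called twice without end_frame");
        in_frame_ = true;
        const std::size_t slot = frame_ % N;
        if (slot_value_[slot] != 0) wait_fn(slot_value_[slot]);
        return slot;
    }

    // Records the value this frame's vkQueueSubmit will signal (the ++signal_val of
    // step 7) and returns it for VkTimelineSemaphoreSubmitInfo::pSignalSemaphoreValues.
    uint64_t end_frame(std::size_t slot) {
        assert(in_frame_);
        in_frame_ = false;
        slot_value_[slot] = ++signal_val_;
        ++frame_;
        return signal_val_;
    }

private:
    std::array<uint64_t, N> slot_value_{};  // 0 = slot never submitted
    uint64_t signal_val_ = 0;
    std::size_t frame_ = 0;
    bool in_frame_ = false;
};
```

With N = 2 (kFramesInFlight=2), frames 0 and 1 never wait, frame 2 waits for value 1, frame 3 waits for value 2, and so on. The same submit also signals the binary sem_ren, so VkTimelineSemaphoreSubmitInfo needs one value per signal semaphore. The entry for the binary semaphore is ignored.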


[Stack traces — 5 variants on same device + driver]

--- Variant 1 : vkWaitForFences ---
F libc : Fatal signal 11 (SIGSEGV) fault addr 0x306e69be (read)
#00 pc 0x21804 /system/lib64/libvulkan.so
vulkan::api::WaitForFences+4
#01 pc 0x3b0c1c app::present_real_frame+856

--- Variant 2 : vkWaitSemaphores (timeline) ---
F libc : Fatal signal 11 (SIGSEGV) fault addr 0x720600007214 (read)
#00 pc 0x995098 /vendor/lib64/egl/mt6991/libGLES_mali.so
#01 pc 0x3b0698 app::present_real_frame+936

--- Variant 3 : vkGetSemaphoreCounterValue (poll) ---
F libc : Fatal signal 11 (SIGSEGV) fault addr 0x5ffffa67f0 (read)
#00 pc 0x1dd63a4 /vendor/lib64/egl/mt6991/libGLES_mali.so
#01 pc 0x994fd4 /vendor/lib64/egl/mt6991/libGLES_mali.so
#02 pc 0x3b0814 app::present_real_frame+912

--- Variant 4 : vkQueueWaitIdle ---
F libc : Fatal signal 11 (SIGSEGV) fault addr 0xbea048453f5f7f8b (read)
#00 pc 0x21594 /system/lib64/libvulkan.so
vulkan::api::QueueWaitIdle+4
#01 pc 0x3b074c app::present_real_frame+860

--- Variant 5 : skip wait, direct memcpy ---
Different crash: the CPU memcpy into the staging buffer faults on an unmapped page (SEGV_MAPERR in
libc memcpy, not in the ICD). Skipping the wait is therefore not a usable workaround.

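Fault addresses across variants 1-4 do not look like heap pointers:
- 0x306e69be fits in 32 bits and is not even 4-byte aligned.
- 0x720600007214 reads like two 32-bit values (0x7206, 0x7214) packed into one 64-bit word.
- 0xbea048453f5f7f8b lies outside any user VA range even with the top byte ignored.

This pattern suggests that a corrupt field of an internal object is being loaded and used as a
pointer, rather than a plain read of uninitialized memory.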
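Whether a fault address could be a user-space pointer at all can be checked mechanically. With top-byte-ignore, which Android relies on for its 0xb4-tagged heap pointers, bits [63:56] are dropped and every bit from the user VA size upward must be zero. The sketch below assumes a 39-bit user VA. That is an inference, not a confirmed kernel setting: every valid-looking address in the dumps (pc 0x7c..., sp 0x79..., the 0xb4-tagged heap pointers) lies below 2^39. `could_be_user_pointer` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `addr` could be a user-space virtual address on an AArch64 kernel
// with `va_bits` of user VA, assuming top-byte-ignore (TBI) is enabled for EL0.
bool could_be_user_pointer(uint64_t addr, unsigned va_bits) {
    assert(va_bits > 0 && va_bits < 56);
    const uint64_t untagged = addr & 0x00ffffffffffffffULL;  // drop tag bits [63:56]
    return (untagged >> va_bits) == 0;                        // bits [55:va_bits] must be 0
}
```

Under the 39-bit assumption, 0xbea048453f5f7f8b and 0x720600007214 are ruled out outright. 0x306e69be passes the range check, so for that address the argument rests on its misalignment and small magnitude.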


[Vulkan capability advertising vs actual behavior]

ncnn enumeration:
[0 Mali-G925-Immortalis MC12]
queueC=0[2] queueT=0[2]
fp16-p/s/u/a = 1/1/1/1
int8-p/s/u/a = 1/1/1/1
bf16-p/s = 1/0
subgroup = 16 (16~16)
ops = 1/1/1/1/1/1/1/1/1/1
fp16-cm = 4x8x8/16x32x32

Related issue: fp16 storage is advertised as supported. However, compute inference run with
opt.use_fp16_storage=true diverges from the CPU fp32 reference by mae = 0.434 over a 921600-pixel
(1280x720) golden image, against a 0.05 gate (FAIL). fp16_packed gives mae = 0.346 (also FAIL).
Pure-fp32 Vulkan passes at mae = 0.045.
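The sketch below shows the kind of golden comparison these numbers come from. It assumes a single-channel float output, for example an alpha matte in [0,1], compared pixel by pixel against the CPU fp32 result. `mean_abs_error`, `golden_pass`, the strict `<` comparison, and the data layout are assumptions, not the app's actual test harness.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute error between a GPU output and the CPU fp32 reference,
// one float per pixel (e.g. 1280 x 720 = 921600 values).
double mean_abs_error(const std::vector<float>& gpu, const std::vector<float>& ref) {
    assert(gpu.size() == ref.size() && !ref.empty());
    double sum = 0.0;  // accumulate in double: ~1e6 float terms would lose precision
    for (std::size_t i = 0; i < ref.size(); ++i)
        sum += std::fabs(static_cast<double>(gpu[i]) - static_cast<double>(ref[i]));
    return sum / static_cast<double>(ref.size());
}

// Pass/fail against the golden gate (0.05 in the report); strict '<' is an assumption.
bool golden_pass(const std::vector<float>& gpu, const std::vector<float>& ref,
                 double gate = 0.05) {
    return mean_abs_error(gpu, ref) < gate;
}
```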


[Galaxy S25 Ultra control — same APK, same source]

Model : Galaxy S25 Ultra (SM-S938N)
GPU : Adreno 830
GPU driver : 512.800.1
Vulkan API : 1.3.284

Swapchain init OK at 1080x2160, 4 images, FIFO. vkWaitForFences and vkWaitSemaphores both kept
working for as long as we ran them (1000+ frames verified). No SIGSEGV in any sync API. fp16_storage
mae is well under the 0.05 gate.


[Expected behavior]

vkWaitForFences / vkWaitSemaphores / vkGetSemaphoreCounterValue / vkQueueWaitIdle must complete
without a segfault when called on valid objects of a valid VkDevice (Vulkan 1.3 specification,
"Synchronization and Cache Control" chapter). The current driver violates this within 0-6 frames of
any swapchain-bound compute submission.


[Impact]

On-device GPU compositing + Vulkan WSI present path is unusable on the affected device. Apps that
present from a compute queue (matting, ML inference, custom GPU UI) cannot use a VkSwapchainKHR.
They must fall back to ANativeWindow_lock + memcpy or to an EGL/GLES bridge workaround.

[Workaround implemented for reference]

EGL/GLES bridge present path using AHardwareBuffer + eglSwapBuffers,
replacing VkSwapchainKHR + vkQueuePresentKHR entirely. The Mali GL ES
path in the same vendor library appears to use a separate, more
mature sync implementation, and it does NOT crash on the same hardware.

Verified on Tab S11 Mali-G925: 60+ seconds of continuous RTSP feed
rendering with no SIGSEGV in libGLES_mali.so. This is the same APK
that crashed within 0-6 frames when using vkWaitForFences /
vkWaitSemaphores / vkGetSemaphoreCounterValue / vkQueueWaitIdle.

Self-contained C++ workaround (Android NDK + EGL + GLES 3.0 + AHB):

  // Workaround for Mali-G925 Vulkan WSI sync SIGSEGV.
  // Replaces VkSwapchainKHR present with EGL/GLES + AHardwareBuffer.

  #include <EGL/egl.h>
  #include <EGL/eglext.h>
  #include <GLES3/gl3.h>
  #include <GLES2/gl2ext.h>
  #include <android/hardware_buffer.h>
  #include <android/native_window.h>
  #include <atomic>
  #include <cstdint>
  #include <cstring>

  // Extension entry points, resolved at runtime through eglGetProcAddress.
  typedef EGLClientBuffer (EGLAPIENTRYP PFN_eglGetNativeClientBufferANDROID)(const AHardwareBuffer*);
  typedef EGLImageKHR     (EGLAPIENTRYP PFN_eglCreateImageKHR)(EGLDisplay, EGLContext, EGLenum,
                                                              EGLClientBuffer, const EGLint*);
  typedef EGLBoolean      (EGLAPIENTRYP PFN_eglDestroyImageKHR)(EGLDisplay, EGLImageKHR);
  typedef void            (GL_APIENTRYP PFN_glEGLImageTargetTexture2DOES)(GLenum, GLeglImageOES);

  struct GlBridge {
      EGLDisplay display = EGL_NO_DISPLAY;
      EGLConfig  config  = nullptr;
      EGLContext context = EGL_NO_CONTEXT;
      EGLSurface surface = EGL_NO_SURFACE;
      ANativeWindow* window = nullptr;
      int surface_w = 0, surface_h = 0;
      bool egl_ready = false;                      // only touched on the present thread

      GLuint program = 0, texture = 0;
      AHardwareBuffer* ahb = nullptr;
      EGLImageKHR     ahb_img = EGL_NO_IMAGE_KHR;
      int ahb_w = 0, ahb_h = 0, ahb_stride = 0;    // ahb_stride is in pixels

      PFN_eglGetNativeClientBufferANDROID  fnGetNativeBuffer = nullptr;
      PFN_eglCreateImageKHR                fnCreateImage     = nullptr;
      PFN_eglDestroyImageKHR               fnDestroyImage    = nullptr;
      PFN_glEGLImageTargetTexture2DOES     fnImageTarget2D   = nullptr;

      // Written on the main thread, read on the camera thread.
      std::atomic<bool> initialized{false};
  };

  static GlBridge g;

  // Vertex: one fullscreen triangle generated from gl_VertexID (no VBO needed).
  // Vertices land at NDC (-1,-1), (3,-1), (-1,3); v_uv spans [0,1] on screen,
  // with texture row 0 (first row in memory) at the top.
  static const char* kVS = R"(#version 300 es
  out vec2 v_uv;
  void main() {
      vec2 p = vec2((gl_VertexID & 1) * 2, gl_VertexID & 2);
      gl_Position = vec4(p * 2.0 - 1.0, 0.0, 1.0);
      v_uv = vec2(p.x, 1.0 - p.y);
  })";

  static const char* kFS = R"(#version 300 es
  precision mediump float;
  in vec2 v_uv;
  uniform sampler2D u_tex;
  out vec4 frag;
  void main() { frag = texture(u_tex, v_uv); })";

  static GLuint compile_shader(GLenum type, const char* src) {
      GLuint s = glCreateShader(type);
      glShaderSource(s, 1, &src, nullptr);
      glCompileShader(s);
      GLint ok = GL_FALSE;
      glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
      if (ok != GL_TRUE) { glDeleteShader(s); return 0; }
      return s;
  }

  static GLuint build_program(const char* vs_src, const char* fs_src) {
      GLuint vs = compile_shader(GL_VERTEX_SHADER, vs_src);
      GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fs_src);
      GLuint prog = 0;
      if (vs && fs) {
          prog = glCreateProgram();
          glAttachShader(prog, vs);
          glAttachShader(prog, fs);
          glLinkProgram(prog);
          GLint ok = GL_FALSE;
          glGetProgramiv(prog, GL_LINK_STATUS, &ok);
          if (ok != GL_TRUE) { glDeleteProgram(prog); prog = 0; }
      }
      glDeleteShader(vs);   // ignored for 0; otherwise freed once the program releases it
      glDeleteShader(fs);
      return prog;
  }

  // CRITICAL: EGL context is thread-affined. setOutputWindow runs on the main
  // thread; present runs on the camera callback thread. Lazy-init EGL on the
  // THREAD that will own the context (= camera thread = first present call).
  // Otherwise eglMakeCurrent fails with EGL_BAD_ACCESS, because the context
  // is still current on the main thread.
  static bool bootstrap_egl_on_calling_thread() {
      g.display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
      if (g.display == EGL_NO_DISPLAY || !eglInitialize(g.display, nullptr, nullptr))
          return false;

      const EGLint cfg_attrs[] = {
          EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
          EGL_RENDERABLE_TYPE, EGL_OPENGL_ES3_BIT,
          EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8, EGL_ALPHA_SIZE, 8,
          EGL_NONE
      };
      EGLint n_cfg = 0;
      if (!eglChooseConfig(g.display, cfg_attrs, &g.config, 1, &n_cfg) || n_cfg < 1)
          return false;

      EGLint native_vis = 0;
      eglGetConfigAttrib(g.display, g.config, EGL_NATIVE_VISUAL_ID, &native_vis);
      ANativeWindow_setBuffersGeometry(g.window, 0, 0, native_vis);

      g.surface = eglCreateWindowSurface(g.display, g.config, g.window, nullptr);
      if (g.surface == EGL_NO_SURFACE) return false;
      const EGLint ctx_attrs[] = { EGL_CONTEXT_CLIENT_VERSION, 3, EGL_NONE };
      g.context = eglCreateContext(g.display, g.config, EGL_NO_CONTEXT, ctx_attrs);
      if (g.context == EGL_NO_CONTEXT) return false;
      if (!eglMakeCurrent(g.display, g.surface, g.surface, g.context)) return false;
      eglQuerySurface(g.display, g.surface, EGL_WIDTH,  &g.surface_w);
      eglQuerySurface(g.display, g.surface, EGL_HEIGHT, &g.surface_h);

      g.program = build_program(kVS, kFS);
      if (!g.program) return false;

      // Resolve AHB / EGLImage extension entry points.
      g.fnGetNativeBuffer = (PFN_eglGetNativeClientBufferANDROID)
          eglGetProcAddress("eglGetNativeClientBufferANDROID");
      g.fnCreateImage     = (PFN_eglCreateImageKHR)eglGetProcAddress("eglCreateImageKHR");
      g.fnDestroyImage    = (PFN_eglDestroyImageKHR)eglGetProcAddress("eglDestroyImageKHR");
      g.fnImageTarget2D   = (PFN_glEGLImageTargetTexture2DOES)
          eglGetProcAddress("glEGLImageTargetTexture2DOES");
      return g.fnGetNativeBuffer && g.fnCreateImage && g.fnDestroyImage && g.fnImageTarget2D;
  }

  // Allocate the AHB once (re-allocate on size change) and bind it as a GL texture
  // through an EGLImage. The CPU writes into the AHB pages and GL samples the same
  // physical memory, so there is no glTexSubImage2D upload.
  static bool ensure_ahb(int w, int h) {
      if (g.ahb && g.ahb_w == w && g.ahb_h == h) return true;

      // Release the previous texture, EGLImage and buffer.
      if (g.texture) { glDeleteTextures(1, &g.texture); g.texture = 0; }
      if (g.ahb_img != EGL_NO_IMAGE_KHR) {
          g.fnDestroyImage(g.display, g.ahb_img);
          g.ahb_img = EGL_NO_IMAGE_KHR;
      }
      if (g.ahb) { AHardwareBuffer_release(g.ahb); g.ahb = nullptr; }
      g.ahb_w = g.ahb_h = 0;

      AHardwareBuffer_Desc desc = {};
      desc.width = (uint32_t)w; desc.height = (uint32_t)h; desc.layers = 1;
      desc.format = AHARDWAREBUFFER_FORMAT_R8G8B8A8_UNORM;
      desc.usage  = AHARDWAREBUFFER_USAGE_GPU_SAMPLED_IMAGE
                  | AHARDWAREBUFFER_USAGE_CPU_WRITE_OFTEN;
      if (AHardwareBuffer_allocate(&desc, &g.ahb) != 0) { g.ahb = nullptr; return false; }

      AHardwareBuffer_Desc actual = {};
      AHardwareBuffer_describe(g.ahb, &actual);
      g.ahb_stride = (int)actual.stride;   // row pitch in pixels, may exceed w

      EGLClientBuffer cb = g.fnGetNativeBuffer(g.ahb);
      const EGLint img_attrs[] = { EGL_IMAGE_PRESERVED_KHR, EGL_TRUE, EGL_NONE };
      g.ahb_img = g.fnCreateImage(g.display, EGL_NO_CONTEXT,
                                  EGL_NATIVE_BUFFER_ANDROID, cb, img_attrs);
      if (g.ahb_img == EGL_NO_IMAGE_KHR) return false;

      glGenTextures(1, &g.texture);
      glBindTexture(GL_TEXTURE_2D, g.texture);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
      g.fnImageTarget2D(GL_TEXTURE_2D, (GLeglImageOES)g.ahb_img);

      g.ahb_w = w; g.ahb_h = h;   // mark valid only once everything succeeded
      return true;
  }

  // Public entry: called from the main thread when the Surface arrives.
  // Stash the window only; defer EGL bootstrap until first present (thread-affined).
  extern "C" int gl_bridge_init(ANativeWindow* win) {
      if (!win) return -1;
      g.window = win;
      ANativeWindow_acquire(win);
      g.initialized.store(true, std::memory_order_release);
      return 0;
  }

  // Public entry: called from the camera callback thread once per frame.
  // `rgba` is a tightly-packed RGBA8 buffer of w*h*4 bytes.
  extern "C" int gl_bridge_present(const uint8_t* rgba, int w, int h) {
      if (!g.initialized.load(std::memory_order_acquire)) return -1;
      if (!g.egl_ready) {
          if (!bootstrap_egl_on_calling_thread()) return -2;
          g.egl_ready = true;
      }
      if (!eglMakeCurrent(g.display, g.surface, g.surface, g.context)) return -3;
      if (!ensure_ahb(w, h)) return -4;

      // The single AHB is reused every frame, so the previous frame's draw must be
      // done sampling it before the CPU overwrites it. AHardwareBuffer_lock() is not
      // guaranteed to wait for pending GL reads, so glFinish() is the conservative
      // choice here; an EGL fence sync after the draw would avoid a full stall.
      glFinish();

      // Single CPU copy straight into the AHB; GL samples it after unlock.
      void* mapped = nullptr;
      if (AHardwareBuffer_lock(g.ahb, AHARDWAREBUFFER_USAGE_CPU_WRITE_OFTEN,
                               -1, nullptr, &mapped) != 0) return -5;
      const size_t dst_row = (size_t)g.ahb_stride * 4;
      const size_t src_row = (size_t)w * 4;
      if (dst_row == src_row) {
          memcpy(mapped, rgba, src_row * h);
      } else {
          for (int y = 0; y < h; ++y)
              memcpy((uint8_t*)mapped + dst_row * y, rgba + src_row * y, src_row);
      }
      AHardwareBuffer_unlock(g.ahb, nullptr);

      glViewport(0, 0, g.surface_w, g.surface_h);
      glClear(GL_COLOR_BUFFER_BIT);
      glUseProgram(g.program);
      glActiveTexture(GL_TEXTURE0);
      glBindTexture(GL_TEXTURE_2D, g.texture);
      glUniform1i(glGetUniformLocation(g.program, "u_tex"), 0);
      glDrawArrays(GL_TRIANGLES, 0, 3);   // one fullscreen triangle, see kVS

      eglSwapBuffers(g.display, g.surface);
      return 0;
  }

Two critical points:
1. EGL context is thread-affined. Bootstrap MUST run on the thread
that will own the context, not the JNI thread that received
the Surface. We defer EGL setup to the first gl_bridge_present()
call so it lands on the camera callback thread automatically.
2. The memcpy into the locked AHB is the only CPU copy; there is no
glTexSubImage2D and no driver-side staging. GL samples the AHB
writes through shared memory after unlock.
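Point 1 can be enforced rather than relied on. The sketch below records the first thread to call present and rejects calls from any other thread. Without such a check, those calls would surface as EGL_BAD_ACCESS from eglMakeCurrent while the context is still current on the first thread. `ThreadAffinity`, `claim_or_check`, and `release` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical owner-thread guard for a thread-affine resource such as an EGL context.
class ThreadAffinity {
public:
    // The first caller becomes the owner; later calls return true only on that thread.
    bool claim_or_check() {
        std::lock_guard<std::mutex> lock(mu_);
        const std::thread::id self = std::this_thread::get_id();
        if (owner_ == std::thread::id{}) owner_ = self;  // default id means "no owner yet"
        return owner_ == self;
    }

    // Call after the context has been released/destroyed so another thread may take over.
    void release() {
        std::lock_guard<std::mutex> lock(mu_);
        owner_ = std::thread::id{};
    }

private:
    std::mutex mu_;
    std::thread::id owner_{};
};
```

In gl_bridge_present the check would sit just before the bootstrap step. It would return an error code instead of calling eglMakeCurrent from the wrong thread.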


[Requested action]

1. Acknowledge the bug exists.
2. Identify root cause inside libGLES_mali.so ~0x99xxxx (semaphore object lifecycle? internal fence
pool corruption after compute submit? timeline-semaphore subsystem missing initialization?).
3. Provide a fixed Mali driver / firmware update via Samsung OTA for Galaxy Tab S11 SM-X736B and any
other device shipping driver 49.1.0.


[Reproducer]

Available on request. Minimal Vulkan-only reproducer (~600 LoC C++) can be supplied if helpful.

[Attached files]
- mali-crash-01-vkWaitForFences.txt
- mali-crash-02-vkWaitSemaphores.txt
- mali-crash-03-vkGetSemaphoreCounterValue.txt
- mali-crash-04-vkQueueWaitIdle.txt
- mali-crash-05-skip-wait-memcpy.txt

=== Mali-G925 SIGSEGV — variant 1 — vkWaitForFences ===

Device   : Samsung Galaxy Tab S11 (SM-X736B)
Build    : samsung/gts11eea/gts11:16/BP4A.251205.006/X736BXXU5AZBC_OXM5AZBC:userdebug
Kernel   : 6.6.102-android15-8-abogkiX736BXXU5AZBC-4k
GPU      : Mali-G925-Immortalis MC12
Driver   : 49.1.0
Vulkan   : 1.3.278

App      : com.samsung.aifredo.debug
Source   : vulkan_swapchain.cpp using VkFence per-CPU-slot recycle pattern
Build ID : varies per APK rebuild

=== logcat -b crash excerpt ===

F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR),
            fault addr 0x306e69be in tid <ImageReader-640> (camera thread)
F DEBUG   : Cmdline: com.samsung.aifredo.debug
F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR),
            fault addr 0x00000000306e69be (read)
F DEBUG   : esr: 0000000092000006 (Data Abort Exception 0x24)
F DEBUG   : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)

=== backtrace ===

#00 pc 0x21804  /system/lib64/libvulkan.so
    vulkan::api::(anonymous namespace)::WaitForFences(
        VkDevice_T*, unsigned int, VkFence_T* const*,
        unsigned int, unsigned long)+4
#01 pc 0x3b0c1c  /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+856
#02 pc 0x369484  /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image(unsigned char const*, int, int) const+3364
#03 pc 0x367fa8  /data/app/.../librvmncnn.so
    (unwound to AImageReader callback)
#04 pc 0x39b60   /system/lib64/libmediandk.so
    AImageReader::CallbackHandler::onMessageReceived(...)+416
#05 pc 0x1c818   /system/lib64/libstagefright_foundation.so
    android::AHandler::deliverMessage(...)+184
#06 pc 0x23bbc   /system/lib64/libstagefright_foundation.so
    android::AMessage::deliver()+172
#07 pc 0x1de58   /system/lib64/libstagefright_foundation.so
    android::ALooper::loop()+536
#08 pc 0x18120   /system/lib64/libutils.so
    android::Thread::_threadLoop(void*)+528
#09 pc 0x1590fc  /system/lib64/libandroid_runtime.so
    android::AndroidRuntime::javaThreadShell(void*)+140

=== analysis ===

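vkWaitForFences goes through the libvulkan.so trampoline. The
trampoline loads the dispatch pointer stored at the start of the
VkDevice object and jumps to the Mali ICD. The fault fires at
WaitForFences+4, i.e. inside the trampoline itself, before control
reaches the ICD. The most likely reading is that it faulted while
following that dispatch pointer, which would mean the VkDevice
object's dispatch data was already garbage at call time.

Fault address 0x306e69be fits in 32 bits and is not 4-byte aligned,
so it is not a heap pointer. This is consistent with internal object
memory being overwritten after the swapchain submit. This frame
alone cannot show which component wrote over it.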

Time to crash : 0-6 frames after first vkQueueSubmit on the swapchain
                command buffer.
Reproducibility: 100% with default swapchain pattern (FIFO, 4-5 images,
                 per-frame fence recycle across kFramesInFlight=2).

=== Mali-G925 SIGSEGV — variant 4 — vkQueueWaitIdle ===

Device   : Samsung Galaxy Tab S11 (SM-X736B)
GPU      : Mali-G925-Immortalis MC12
Driver   : 49.1.0
Vulkan   : 1.3.278

App      : com.samsung.aifredo.debug
Source   : vulkan_swapchain.cpp:1070 — heavy-handed full-queue stall
           (vkQueueWaitIdle on the swapchain present queue) in place
           of any semaphore wait. Different API surface from variants
           1-3; intended to bypass the broken sync-object subsystem
           by waiting on the queue itself.

Rationale  : if semaphore + fence subsystems are corrupt, maybe the
             queue-drain API takes a different code path. Hypothesis
             FAILED — driver still crashes.

=== logcat -b crash excerpt ===

F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR),
            fault addr 0xbea048453f5f7f8b in tid <Thread-2>
F DEBUG   : esr: 0000000092000004 (Data Abort Exception 0x24)

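Fault address 0xbea048453f5f7f8b is not a plausible pointer. Even
with the top byte ignored, bits 55:48 are 0xa0, far outside the user
VA range. It looks like arbitrary data loaded from a corrupt object
field and then used as a pointer. The level-0 translation fault in
the ESR (DFSC 0x04) fits an address that no page table covers.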

=== backtrace ===

#00 pc 0x21594  /system/lib64/libvulkan.so
    vulkan::api::(anonymous namespace)::QueueWaitIdle(VkQueue_T*)+4
#01 pc 0x3b074c /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+860
    (corresponds to vulkan_swapchain.cpp:1070 = vkQueueWaitIdle(s.queue))
#02 pc 0x369544 /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image+3364
#03 pc 0x36af88 /data/app/.../librvmncnn.so
    NdkCameraWindow::rtsp_thread_func+1984
#04 pc 0x8aadc  /apex/com.android.runtime/lib64/bionic/libc.so
    __pthread_start+236

=== analysis ===

vkQueueWaitIdle also goes through a libvulkan.so trampoline, and
again the fault is at +4 (the same pattern as variant 1's
WaitForFences). Most likely the loader faulted while following the
dispatch pointer stored in the VkQueue object, before entering the
ICD.

Even the heaviest CPU-side sync API in Vulkan fails in the same
0-6 frame window. Together with variants 1-3, this points at state
shared by all of these entry points being corrupted after the
compute queue submit. That shared state is either the dispatchable
device/queue objects or the ICD's sync bookkeeping; a bug specific
to fences or semaphores looks less likely.

Conclusion: on this driver we have found no caller-facing wait/idle
API, and no ordering of compute and present submissions, that
avoids the crash.

=== Mali-G925 SIGSEGV — variant 2 — vkWaitSemaphores (timeline) ===

Device   : Samsung Galaxy Tab S11 (SM-X736B)
Build    : samsung/gts11eea/gts11:16/BP4A.251205.006/X736BXXU5AZBC_OXM5AZBC:userdebug
GPU      : Mali-G925-Immortalis MC12
Driver   : 49.1.0
Vulkan   : 1.3.278

App      : com.samsung.aifredo.debug
Source   : vulkan_swapchain.cpp:1058 after VkFence -> timeline VkSemaphore
           (VK_SEMAPHORE_TYPE_TIMELINE) migration (per-CPU-slot timeline
           semaphore with explicit signal value tracking; pfn_WaitSemaphores
           resolved via vkGetDeviceProcAddr).

=== logcat -b crash excerpt ===

F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR),
            fault addr 0x0000720600007214 in tid <Thread-2>
            (async inference worker)
F DEBUG   : Cmdline: com.samsung.aifredo.debug
F DEBUG   : esr: 0000000092000006 (Data Abort Exception 0x24)

=== register dump (key) ===

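x0  000072060000720c   <- base being dereferenced (fault addr = x0 + 0x8)
x1  0000000000000001
x2  00000079443545a0
sp  00000079443541a0
pc  0000007c06861804   <- page offset 0x804 matches libvulkan.so + 0x21804
                          (WaitForFences+4, variant 1), not frame #00 below
                          (libGLES_mali.so + 0x995098); re-check which
                          tombstone this dump was taken from
lr  00000078ac224c20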

=== backtrace ===

#00 pc 0x995098 /vendor/lib64/egl/mt6991/libGLES_mali.so
    BuildId: 8ffcdf0fe7b476c1
#01 pc 0x3b0698 /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+936
    (corresponds to vulkan_swapchain.cpp:1058 = pfn_WaitSemaphores call)
#02 pc 0x369484 /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image+3364
#03 pc 0x36add8 /data/app/.../librvmncnn.so
    NdkCameraWindow::rtsp_thread_func+1984
#04 pc 0x8aadc  /apex/com.android.runtime/lib64/bionic/libc.so
    __pthread_start+236

=== analysis ===

The timeline-semaphore migration (vkWaitSemaphores, core in Vulkan
1.2) was meant to avoid the vkWaitForFences crash. The timeline path
crashes too, this time inside the ICD. Frame #00 is
libGLES_mali.so + 0x995098, called directly from the app.
pfn_WaitSemaphores was resolved with vkGetDeviceProcAddr, which on
Android typically returns the driver entry point and so skips the
libvulkan trampoline. Variant 3's ICD entry frame (0x994fd4) sits in
the same ~0x995000 neighbourhood.

Fault addr 0x720600007214 is x0 (0x72060000720c) + 0x8, i.e. a load
from offset 8 of an "object" at 0x72060000720c. That base looks like
two 32-bit integers packed into one 64-bit word rather than a
pointer. This suggests the ICD read a field from a corrupt internal
table and used the result as a pointer.

This is consistent with the ARM/Mali sync-object path failing for
every caller-facing wait API we tried. The 5-pass golden self-test
cascade also exposed a related fp16_storage accuracy failure on this
driver (mae = 0.434 vs 0.05 gate). The WSI sync crash is independent
of the fp16 path.

=== Mali-G925 SIGSEGV — variant 5 — skip wait, direct memcpy ===

Device   : Samsung Galaxy Tab S11 (SM-X736B)
GPU      : Mali-G925-Immortalis MC12
Driver   : 49.1.0
Vulkan   : 1.3.278

App      : com.samsung.aifredo.debug
Source   : vulkan_swapchain.cpp present_real_frame with all four
           CPU-side wait/idle APIs commented out (no sync between
           the CPU memcpy into staging buffer and the GPU's
           prior submit that may still be reading it).

Rationale  : if every Vulkan sync API crashes, maybe we can skip
             the wait entirely and rely on swapchain implicit
             pipelining (FIFO + kFramesInFlight=2 + 4+ swap images).
             Hypothesis FAILED: the memcpy into the staging buffer
             now hits a different SIGSEGV.

=== logcat -b crash excerpt ===

F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR),
            fault addr 0x0000005fffa0d000 (WRITE) in tid <Thread-2>
F DEBUG   : esr: 0000000092000006 (Data Abort Exception 0x24)

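Note: the excerpt header reports a WRITE, which would fit the CPU
storing into an unmapped staging page. However, ESR 0x92000006 has
WnR (bit 6) clear, which normally indicates a read. A read would
point at the memcpy source rather than the staging destination, so
the access direction should be confirmed from the full tombstone.
Either way the faulting PC is in libc memcpy, not in the ICD, so
this is a distinct failure mode from variants 1-4.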
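The access direction and fault level can be read directly from the ESR value in each excerpt. For a data abort (EC 0x24 when taken from EL0), ISS bit 6 (WnR) is 1 for a write and 0 for a read. DFSC values 0x04-0x07 are translation faults at levels 0-3. The decoder sketch below could be run over all five excerpts; `decode_data_abort` is a hypothetical helper that covers only these fields.

```cpp
#include <cassert>
#include <cstdint>

// Fields of an AArch64 ESR_ELx value relevant to a data abort.
struct DataAbort {
    uint32_t ec;    // exception class, bits [31:26]; 0x24 = data abort from a lower EL (EL0)
    bool is_write;  // WnR, ISS bit 6: 1 = write, 0 = read
    uint32_t dfsc;  // data fault status code, ISS bits [5:0]
    int level;      // translation-table level for translation faults, otherwise -1
};

DataAbort decode_data_abort(uint64_t esr) {
    DataAbort d{};
    d.ec = static_cast<uint32_t>((esr >> 26) & 0x3f);
    d.is_write = ((esr >> 6) & 1u) != 0;
    d.dfsc = static_cast<uint32_t>(esr & 0x3f);
    // DFSC 0b0001LL (0x04..0x07) = translation fault at level LL.
    d.level = (d.dfsc >= 0x04 && d.dfsc <= 0x07) ? static_cast<int>(d.dfsc & 0x3) : -1;
    return d;
}
```

Applied to the excerpts, the decoder gives:
- 0x92000006 (variants 1, 2, 5): level-2 translation fault.
- 0x92000004 (variant 4): level-0 translation fault.
- 0x92000007 (variant 3): level-3 translation fault.

All three values have WnR = 0.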

=== backtrace ===

#00 pc 0x6b640  /apex/com.android.runtime/lib64/bionic/libc.so
    __memcpy_aarch64_simd+256
#01 pc 0x3b064c /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+1052
    (corresponds to vulkan_swapchain.cpp memcpy() into staging_mapped[slot]
     after the wait block was removed)
#02 pc 0x369484 /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image+3364
#03 pc 0x36aec8 /data/app/.../librvmncnn.so
    NdkCameraWindow::rtsp_thread_func+1984
#04 pc 0x8aadc  /apex/com.android.runtime/lib64/bionic/libc.so
    __pthread_start+236

=== analysis ===

Symbolized site (addr2line) reports the offending call as:
  memcpy(void*, const void*, size_t)
  string.h:53

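The memcpy copies into staging_mapped[slot], the host-visible
staging buffer for the current frame slot. SEGV_MAPERR means the
faulting address had no mapping at all at the time of the access.
A GPU read still in flight on this slot would be a data race (wrong
pixels), but it cannot by itself unmap a CPU page. The candidates
are therefore:
  - the staging memory had been unmapped or freed under our feet
    (by the app's own slot recycling or by the driver),
    OR
  - staging_mapped[slot] was stale, or the copy ran past the end
    of the mapping,
    OR
  - if the access really was a read (per the ESR), the memcpy
    source rather than the staging buffer was invalid.

The wait is still FUNCTIONALLY required for correctness. Without
it, the CPU can overwrite a buffer the GPU is still reading from
the prior submit on this slot. Removing the wait safely would mean
rearchitecting the staging buffer ring so that no slot is reused
while the GPU may read it. That ring still needs a GPU-completion
signal, which comes from the same sync APIs.

Variants 1-4 show that every sync API crashes, and variant 5 shows
that skipping the wait is not viable. Together they are why we
conclude the bug cannot be worked around inside the Vulkan WSI path
in app space. ARM driver patch required.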
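A debug build could separate a stale or out-of-range staging pointer from memory being unmapped underneath the app. It would check every copy against the mapping recorded when vkMapMemory returned. The sketch below uses hypothetical names (`MappedRange`, `copy_fits`). It only checks the app's own bookkeeping and cannot detect a mapping that the driver removed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record of one staging slot's host mapping, captured after vkMapMemory.
struct MappedRange {
    std::uintptr_t base = 0;  // pointer returned by vkMapMemory
    std::size_t size = 0;     // mapped size in bytes
    bool mapped = false;      // cleared when the app unmaps or frees the memory
};

// True if writing `len` bytes at `dst` stays inside a live mapping (overflow-safe).
bool copy_fits(const MappedRange& r, const void* dst, std::size_t len) {
    if (!r.mapped) return false;
    const std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst);
    if (d < r.base) return false;
    const std::size_t offset = static_cast<std::size_t>(d - r.base);
    return offset <= r.size && len <= r.size - offset;
}
```

Suppose such a guard fires right before the faulting memcpy: the bug is then in the app's slot bookkeeping. If the guard passes and the write still faults with SEGV_MAPERR, the mapping disappeared without the app's knowledge.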

=== Mali-G925 SIGSEGV — variant 3 — vkGetSemaphoreCounterValue (poll) ===

Device   : Samsung Galaxy Tab S11 (SM-X736B)
GPU      : Mali-G925-Immortalis MC12
Driver   : 49.1.0
Vulkan   : 1.3.278

App      : com.samsung.aifredo.debug
Source   : vulkan_swapchain.cpp:1081 — non-blocking poll loop using
           vkGetSemaphoreCounterValue (Vulkan 1.2 core / VK_KHR_timeline_semaphore)
           in place of blocking vkWaitSemaphores. Tight spin with
           usleep(10) per iteration, max ~50ms spin then bail.

Rationale  : if Mali ICD crashes on blocking semaphore wait, maybe it
             handles non-blocking counter-value introspection without
             the corrupting state-machine path. Hypothesis FAILED —
             same ICD region.
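The poll loop described above can be sketched as follows. The Vulkan call is replaced by a hypothetical `read_counter` callback; on the device it would wrap vkGetSemaphoreCounterValue. The 10 µs sleep and ~50 ms budget mirror the description. The sketch makes the tested pattern concrete and is not a workaround, since on this driver the counter read itself crashes.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Polls a timeline counter until it reaches `target` or the time budget runs out.
// Returns true if the target was observed, false on timeout ("bail").
bool poll_until(const std::function<uint64_t()>& read_counter, uint64_t target,
                std::chrono::microseconds step = std::chrono::microseconds(10),
                std::chrono::milliseconds budget = std::chrono::milliseconds(50)) {
    const auto deadline = std::chrono::steady_clock::now() + budget;
    for (;;) {
        if (read_counter() >= target) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(step);
    }
}
```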

=== logcat -b crash excerpt ===

F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR),
            fault addr 0x0000005ffffa67f0 in tid <Thread-2>
F DEBUG   : esr: 0000000092000007 (Data Abort Exception 0x24)

=== register dump (key) ===

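(AAPCS64: vkGetSemaphoreCounterValue(device, semaphore, pValue)
 enters with device in x0, semaphore in x1, pValue in x2. Frame #00
 is a deeper helper, so these registers may no longer hold the
 arguments and the labels below are tentative.)

x0  b4000079e0289ee0   <- 0xb4-tagged heap pointer (probably valid)
x1  b400007ae02e7ba0   <- 0xb4-tagged heap pointer (probably valid)
x8  0000005ffffa67f0   <- faulting address (summary trace and ESR both indicate a read)
pc  0000007902ead3a4   <- inside libGLES_mali.so, deeper handler (frame #00)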

=== backtrace ===

#00 pc 0x1dd63a4 /vendor/lib64/egl/mt6991/libGLES_mali.so
    BuildId: 8ffcdf0fe7b476c1
#01 pc 0x994fd4  /vendor/lib64/egl/mt6991/libGLES_mali.so
    BuildId: 8ffcdf0fe7b476c1
    (note: ~0x99xxxx region — same general code as variant 2's 0x995098)
#02 pc 0x3b0814  /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+912
    (corresponds to vulkan_swapchain.cpp:1081 = pfn_GetSemaphoreCounterValue)
#03 pc 0x369584  /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image+3364
#04 pc 0x36afc8  /data/app/.../librvmncnn.so
    NdkCameraWindow::rtsp_thread_func+1984
#05 pc 0x8aadc   /apex/com.android.runtime/lib64/bionic/libc.so
    __pthread_start+236

=== analysis ===

Two libGLES_mali.so frames are now visible (#00 is the innermost,
faulting frame):
  - #01 0x994fd4: ICD entry reached directly from the app
    (pfn_GetSemaphoreCounterValue, presumably resolved through
    vkGetDeviceProcAddr like pfn_WaitSemaphores in variant 2)
  - #00 0x1dd63a4: deeper helper where the fault occurs

The ICD entry at ~0x995000 sits next to variant 2's 0x995098. This
strongly indicates that both calls enter the same part of the ICD's
sync subsystem. Whether the caller uses a blocking wait or a
non-blocking poll, control reaches the same corrupted state.

App-side conclusion: ARM driver patch required. None of the Vulkan
sync APIs we tried on Mali survives the period after the compute
submit.
