[Device & Driver]
Manufacturer  : Samsung
Model         : Galaxy Tab S11 (SM-X736B / gts11eea)
Build         : samsung/gts11eea/gts11:16/BP4A.251205.006/X736BXXU5AZBC_OXM5AZBC:userdebug
Kernel        : 6.6.102-android15-8-abogkiX736BXXU5AZBC-4k
SoC           : MediaTek MT6991
GPU           : Mali-G925-Immortalis MC12
GPU driver    : 49.1.0
Vulkan API    : 1.3.278
Vulkan loader : Android system libvulkan.so
Mali ICD      : /vendor/lib64/egl/mt6991/libGLES_mali.so
                BuildId: 8ffcdf0fe7b476c1
[Summary]
Every CPU-side wait/idle Vulkan entry point SIGSEGVs inside the Mali ICD after the application performs a vkQueueSubmit on a swapchain present command buffer. All five entry points tested (listed in step 9 of the reproduction) fault either in the ~0x99xxxx region of libGLES_mali.so or at the libvulkan.so wrapper that dispatches into the ICD.
Confirmed with A/B comparison against Qualcomm Adreno 830 / driver 512.800.1 on Galaxy S25 Ultra running identical APK — Adreno survives 1000+ frames without crash.
[Reproduction]
1. Initialize Vulkan instance + device through Android system loader (we use ncnn 20260113 with NCNN_SIMPLEVK=1, but any path triggers it).
2. Create VkSurfaceKHR from ANativeWindow.
3. Create VkSwapchainKHR: FIFO, 4-5 images, VK_FORMAT_R8G8B8A8_UNORM, usage = STORAGE_BIT | TRANSFER_DST_BIT | COLOR_ATTACHMENT_BIT.
4. Allocate host-visible staging VkBuffer, memcpy RGBA into it.
5. Record cmd buffer: image layout transition -> vkCmdCopyBufferToImage -> layout transition for present.
6. vkAcquireNextImageKHR (binary semaphore sem_acq).
7. vkQueueSubmit: pWaitSemaphores=[sem_acq], pSignalSemaphores=[sem_ren, in_flight_sem], timeline signal value ++signal_val.
8. vkQueuePresentKHR: pWaitSemaphores=[sem_ren].
9. On the next frame, call ANY of:
   - vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX)
   - vkWaitSemaphores(device, &swi, UINT64_MAX)
   - vkWaitSemaphoresKHR(...)
   - vkGetSemaphoreCounterValue(device, sem, &value)
   - vkQueueWaitIdle(queue)
10. SIGSEGV inside libGLES_mali.so within 0-6 frames.
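The per-slot timeline bookkeeping implied by steps 7 and 9 can be sketched without any Vulkan calls. This is a minimal illustration, not the application's source: the names FrameRing, slot_val, begin_submit and end_frame are hypothetical, and kFramesInFlight = 2 is taken from the attached logs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for the timeline-semaphore variant of the repro.
// Field and function names are illustrative; they are not from the app source.
constexpr int kFramesInFlight = 2;

struct FrameRing {
    uint64_t signal_val = 0;                           // last value handed to a submit
    std::array<uint64_t, kFramesInFlight> slot_val{};  // value each slot was last submitted with
    uint64_t frame = 0;                                // frames completed so far

    // Staging slot used by the current frame.
    int slot() const { return static_cast<int>(frame % kFramesInFlight); }

    // Value the CPU must wait for before rewriting this slot's staging buffer.
    // 0 means the slot has never been submitted, so no wait is needed.
    uint64_t wait_value() const { return slot_val[slot()]; }

    // Called while building the submit for the current frame (step 7).
    // Returns the timeline value that the submit should signal.
    uint64_t begin_submit() {
        assert(signal_val < UINT64_MAX);
        slot_val[slot()] = ++signal_val;
        return signal_val;
    }

    // Called after the present for the current frame (step 8).
    void end_frame() { ++frame; }
};
```

With two slots, the third frame reuses slot 0 and must wait for value 1 before touching the staging buffer; that wait is the call in step 9 that faults.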
[Stack traces — 5 variants on same device + driver]
--- Variant 1 : vkWaitForFences ---
F libc : Fatal signal 11 (SIGSEGV) fault addr 0x306e69be (read)
#00 pc 0x21804  /system/lib64/libvulkan.so  vulkan::api::WaitForFences+4
#01 pc 0x3b0c1c app::present_real_frame+856

--- Variant 2 : vkWaitSemaphores (timeline) ---
F libc : Fatal signal 11 (SIGSEGV) fault addr 0x720600007214 (read)
#00 pc 0x995098 /vendor/lib64/egl/mt6991/libGLES_mali.so
#01 pc 0x3b0698 app::present_real_frame+936

--- Variant 3 : vkGetSemaphoreCounterValue (poll) ---
F libc : Fatal signal 11 (SIGSEGV) fault addr 0x5ffffa67f0 (read)
#00 pc 0x1dd63a4 /vendor/lib64/egl/mt6991/libGLES_mali.so
#01 pc 0x994fd4  /vendor/lib64/egl/mt6991/libGLES_mali.so
#02 pc 0x3b0814  app::present_real_frame+912

--- Variant 4 : vkQueueWaitIdle ---
F libc : Fatal signal 11 (SIGSEGV) fault addr 0xbea048453f5f7f8b (read)
#00 pc 0x21594  /system/lib64/libvulkan.so  vulkan::api::QueueWaitIdle+4
#01 pc 0x3b074c app::present_real_frame+860

--- Variant 5 : skip wait, direct memcpy ---
Different crash: CPU memcpy hits unmapped staging buffer page. Demonstrates wait is functionally required.
Fault addresses across variants 1-4 are not random heap pointers — small offsets (0x...7214, 0x...69be) or tagged-looking 0xbea... values — suggesting the ICD computes a bad index off a corrupt internal struct rather than dereferencing uninit memory.
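The address observations above can be checked mechanically. The helpers below are a small sketch for triaging fault addresses. The function names are hypothetical, and the note about a 0xb4 tag rests on the assumption that the b4... register values in the attached variant 3 dump are ordinary tagged heap pointers.

```cpp
#include <cassert>
#include <cstdint>

// Top byte of a 64-bit address. The x0/x1 values in the variant 3 register
// dump carry 0xb4 here; a different top byte suggests the value is not a
// pointer the allocator handed out (assumption, see lead-in).
constexpr uint8_t top_byte(uint64_t addr) {
    return static_cast<uint8_t>(addr >> 56);
}

// True when addr is a multiple of align (align must be a power of two).
constexpr bool is_aligned(uint64_t addr, uint64_t align) {
    return (addr & (align - 1)) == 0;
}

// True when the address fits in 32 bits, which is too small for a heap or
// mmap pointer in a typical 64-bit Android process.
constexpr bool fits_in_32_bits(uint64_t addr) {
    return (addr >> 32) == 0;
}

// Distance from a base register to the faulting address; a small result
// points at a field load through a bad base pointer.
inline uint64_t field_offset(uint64_t base, uint64_t fault) {
    assert(fault >= base);
    return fault - base;
}
```

Applied to the logs: the variant 1 address fits in 32 bits, the variant 4 address has top byte 0xbe, and the variant 2 fault address is exactly 8 bytes past the x0 value in that variant's register dump.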
[Vulkan capability advertising vs actual behavior]
ncnn enumeration:
  [0 Mali-G925-Immortalis MC12]  queueC=0[2]  queueT=0[2]
  fp16-p/s/u/a = 1/1/1/1
  int8-p/s/u/a = 1/1/1/1
  bf16-p/s     = 1/0
  subgroup     = 16 (16~16)  ops = 1/1/1/1/1/1/1/1/1/1
  fp16-cm      = 4x8x8/16x32x32
Related issue: fp16_storage advertised as supported but compute inference compiled with opt.use_fp16_storage=true diverges from CPU fp32 reference by mae = 0.434 over a 921600-pixel golden image at 1280x720 (threshold 0.05 -> FAIL). fp16_packed mae = 0.346 (also FAIL). Pure-fp32 Vulkan passes at mae = 0.045.
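For reference, the mae gate quoted above is a plain mean absolute error between the GPU output and the CPU fp32 reference. The sketch below shows one way to compute it. The function names are hypothetical, and it assumes both buffers hold one float per pixel on the same scale and that the gate passes when mae <= threshold; the report states only the threshold value, not the comparison operator.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute error between a test buffer and a reference buffer of the
// same length. Accumulates in double to limit rounding drift over large
// images (921600 elements for a 1280x720 frame).
double mean_abs_error(const std::vector<float>& test,
                      const std::vector<float>& ref) {
    assert(test.size() == ref.size() && !ref.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        sum += std::fabs(static_cast<double>(test[i]) -
                         static_cast<double>(ref[i]));
    }
    return sum / static_cast<double>(ref.size());
}

// Assumed gate: pass when mae is at or below the threshold (0.05 in the report).
bool passes_gate(double mae, double threshold = 0.05) {
    return mae <= threshold;
}
```

Under that assumption the reported figures classify as stated: 0.045 passes, 0.346 and 0.434 fail.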
[Galaxy S25 Ultra control — same APK, same source]
Model      : Galaxy S25 Ultra (SM-S938N)
GPU        : Adreno 830
GPU driver : 512.800.1
Vulkan API : 1.3.284
Swapchain init OK at 1080x2160, 4 images, FIFO. vkWaitForFences and vkWaitSemaphores both work indefinitely (verified 1000+ frames). No SIGSEGV in any sync API. fp16_storage mae well under 0.05 gate.
[Expected behavior]
vkWaitForFences / vkWaitSemaphores / vkGetSemaphoreCounterValue / vkQueueWaitIdle must complete without segfault per Vulkan 1.3 spec section 7 (Synchronization) when called on valid objects against a valid VkDevice. Current driver violates this within 0-6 frames of any swapchain-bound compute submission.
[Impact]
On-device GPU compositing + Vulkan WSI present path is unusable on the affected device. Apps that present from a compute queue (matting, ML inference, custom GPU UI) have no path to use a VkSwapchainKHR — must fall back to ANativeWindow_lock + memcpy or implement an EGL/GLES bridge workaround.
[Workaround implemented for reference]
EGL/GLES bridge present path using AHardwareBuffer + eglSwapBuffers, replacing VkSwapchainKHR + vkQueuePresentKHR entirely. The Mali GL ES driver path uses a separate (mature) sync subsystem inside the same vendor library and does NOT crash on the same hardware.
Verified on Tab S11 Mali-G925: 60+ seconds of continuous RTSP feed rendering, no SIGSEGV in libGLES_mali.so. Same APK that crashed within 0-6 frames using vkWaitForFences / vkWaitSemaphores / vkGetSemaphoreCounterValue / vkQueueWaitIdle.
Self-contained C++ workaround (Android NDK + EGL + GLES 3.0 + AHB):
// Workaround for Mali-G925 Vulkan WSI sync SIGSEGV.
// Replaces VkSwapchainKHR present with EGL/GLES + AHardwareBuffer.
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES3/gl3.h>
#include <GLES2/gl2ext.h>
#include <android/hardware_buffer.h>
#include <android/native_window.h>
#include <cstdint>
#include <cstring>

typedef EGLClientBuffer (EGLAPIENTRYP PFN_eglGetNativeClientBufferANDROID)(const AHardwareBuffer*);
typedef EGLImageKHR (EGLAPIENTRYP PFN_eglCreateImageKHR)(EGLDisplay, EGLContext, EGLenum, EGLClientBuffer, const EGLint*);
typedef void (GL_APIENTRYP PFN_glEGLImageTargetTexture2DOES)(GLenum, GLeglImageOES);

struct GlBridge {
    EGLDisplay display = EGL_NO_DISPLAY;
    EGLConfig  config  = nullptr;
    EGLContext context = EGL_NO_CONTEXT;
    EGLSurface surface = EGL_NO_SURFACE;
    ANativeWindow* window = nullptr;
    int surface_w = 0, surface_h = 0;
    GLuint program = 0, texture = 0;
    AHardwareBuffer* ahb = nullptr;
    EGLImageKHR ahb_img = EGL_NO_IMAGE_KHR;
    int ahb_w = 0, ahb_h = 0, ahb_stride = 0;
    PFN_eglGetNativeClientBufferANDROID fnGetNativeBuffer = nullptr;
    PFN_eglCreateImageKHR               fnCreateImage     = nullptr;
    PFN_glEGLImageTargetTexture2DOES    fnImageTarget2D   = nullptr;
    bool initialized = false;
};
static GlBridge g;

// Shader build helper, defined elsewhere (omitted: standard glCompileShader / glLinkProgram).
GLuint build_program(const char* vs, const char* fs);

// Vertex: fullscreen triangle from gl_VertexID; no VBO needed.
static const char* kVS = R"(#version 300 es
out vec2 v_uv;
void main() {
    vec2 p = vec2((gl_VertexID & 1) * 2, (gl_VertexID & 2));
    gl_Position = vec4(p * 2.0 - 1.0, 0.0, 1.0);
    v_uv = vec2(p.x, 1.0 - p.y);
})";

static const char* kFS = R"(#version 300 es
precision mediump float;
in vec2 v_uv;
uniform sampler2D u_tex;
out vec4 frag;
void main() { frag = texture(u_tex, v_uv); })";

// CRITICAL: EGL context is thread-affined. setOutputWindow runs on main
// thread; present runs on camera callback thread. Lazy-init EGL on the
// THREAD that will own the context (= camera thread = first present call).
// Otherwise eglMakeCurrent returns EGL_BAD_ACCESS.
static bool bootstrap_egl_on_calling_thread() {
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, nullptr, nullptr)) return false;

    const EGLint cfg_attrs[] = {
        EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES3_BIT,
        EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8, EGL_ALPHA_SIZE, 8,
        EGL_NONE };
    EGLint n_cfg = 0;
    if (!eglChooseConfig(dpy, cfg_attrs, &g.config, 1, &n_cfg) || n_cfg < 1) return false;

    EGLint native_vis = 0;
    eglGetConfigAttrib(dpy, g.config, EGL_NATIVE_VISUAL_ID, &native_vis);
    ANativeWindow_setBuffersGeometry(g.window, 0, 0, native_vis);

    g.surface = eglCreateWindowSurface(dpy, g.config, g.window, nullptr);
    const EGLint ctx_attrs[] = { EGL_CONTEXT_CLIENT_VERSION, 3, EGL_NONE };
    g.context = eglCreateContext(dpy, g.config, EGL_NO_CONTEXT, ctx_attrs);
    if (g.surface == EGL_NO_SURFACE || g.context == EGL_NO_CONTEXT) return false;
    if (!eglMakeCurrent(dpy, g.surface, g.surface, g.context)) return false;

    eglQuerySurface(dpy, g.surface, EGL_WIDTH,  &g.surface_w);
    eglQuerySurface(dpy, g.surface, EGL_HEIGHT, &g.surface_h);

    // Compile vert + frag into a program.
    g.program = build_program(kVS, kFS);

    // Resolve AHB extension entry points.
    g.fnGetNativeBuffer = (PFN_eglGetNativeClientBufferANDROID)eglGetProcAddress("eglGetNativeClientBufferANDROID");
    g.fnCreateImage     = (PFN_eglCreateImageKHR)eglGetProcAddress("eglCreateImageKHR");
    g.fnImageTarget2D   = (PFN_glEGLImageTargetTexture2DOES)eglGetProcAddress("glEGLImageTargetTexture2DOES");
    if (!g.fnGetNativeBuffer || !g.fnCreateImage || !g.fnImageTarget2D) return false;

    // Publish the display last so a failed bootstrap is retried on the next present.
    g.display = dpy;
    return true;
}

// Allocate AHB once + bind as GL texture via EGLImage. Zero-copy upload:
// CPU writes into AHB pages, GL sees the same physical memory.
// NOTE: on resize the previous EGLImage is not destroyed here; a production
// version should release it with eglDestroyImageKHR.
static bool ensure_ahb(int w, int h) {
    if (g.ahb && g.ahb_w == w && g.ahb_h == h) return true;
    if (g.ahb) { AHardwareBuffer_release(g.ahb); g.ahb = nullptr; }

    AHardwareBuffer_Desc desc = {};
    desc.width  = w;
    desc.height = h;
    desc.layers = 1;
    desc.format = AHARDWAREBUFFER_FORMAT_R8G8B8A8_UNORM;
    desc.usage  = AHARDWAREBUFFER_USAGE_GPU_SAMPLED_IMAGE | AHARDWAREBUFFER_USAGE_CPU_WRITE_OFTEN;
    if (AHardwareBuffer_allocate(&desc, &g.ahb) != 0) { g.ahb = nullptr; return false; }

    AHardwareBuffer_Desc actual = {};
    AHardwareBuffer_describe(g.ahb, &actual);
    g.ahb_w = w;
    g.ahb_h = h;
    g.ahb_stride = actual.stride;   // row stride in pixels, may exceed w

    EGLClientBuffer cb = g.fnGetNativeBuffer(g.ahb);
    const EGLint img_attrs[] = { EGL_IMAGE_PRESERVED_KHR, EGL_TRUE, EGL_NONE };
    g.ahb_img = g.fnCreateImage(g.display, EGL_NO_CONTEXT, EGL_NATIVE_BUFFER_ANDROID, cb, img_attrs);
    if (g.ahb_img == EGL_NO_IMAGE_KHR) return false;

    if (g.texture) glDeleteTextures(1, &g.texture);
    glGenTextures(1, &g.texture);
    glBindTexture(GL_TEXTURE_2D, g.texture);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    g.fnImageTarget2D(GL_TEXTURE_2D, (GLeglImageOES)g.ahb_img);
    return true;
}

// Public entry: called from main thread when Surface arrives.
// Stash window only; defer EGL bootstrap until first present (thread-affined).
extern "C" int gl_bridge_init(ANativeWindow* win) {
    g.window = win;
    ANativeWindow_acquire(win);
    g.initialized = true;
    return 0;
}

// Public entry: called from camera callback thread per frame.
// `rgba` is a tightly-packed RGBA8 buffer of w*h*4 bytes.
extern "C" int gl_bridge_present(const uint8_t* rgba, int w, int h) {
    if (!g.initialized) return -1;
    if (g.display == EGL_NO_DISPLAY) {
        if (!bootstrap_egl_on_calling_thread()) return -2;
    }
    eglMakeCurrent(g.display, g.surface, g.surface, g.context);
    if (!ensure_ahb(w, h)) return -3;

    // Zero-copy upload via AHB lock; GL sees changes after unlock.
    void* mapped = nullptr;
    if (AHardwareBuffer_lock(g.ahb, AHARDWAREBUFFER_USAGE_CPU_WRITE_OFTEN, -1, nullptr, &mapped) != 0 || !mapped)
        return -4;
    const int dst_row = g.ahb_stride * 4;   // bytes per AHB row
    const int src_row = w * 4;              // bytes per source row
    if (dst_row == src_row) {
        memcpy(mapped, rgba, (size_t)src_row * h);
    } else {
        for (int y = 0; y < h; ++y)
            memcpy((uint8_t*)mapped + (size_t)dst_row * y, rgba + (size_t)src_row * y, src_row);
    }
    AHardwareBuffer_unlock(g.ahb, nullptr);

    glViewport(0, 0, g.surface_w, g.surface_h);
    glClear(GL_COLOR_BUFFER_BIT);
    glUseProgram(g.program);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, g.texture);
    glUniform1i(glGetUniformLocation(g.program, "u_tex"), 0);
    glDrawArrays(GL_TRIANGLES, 0, 3);   // one oversized triangle covers the viewport
    eglSwapBuffers(g.display, g.surface);
    return 0;
}
Two critical points:
1. EGL context is thread-affined. Bootstrap MUST run on the thread that will own the context, not the JNI thread that received the Surface. We defer EGL setup to the first present() call so it lands on the camera callback thread automatically.
2. The memcpy under AHB lock/unlock is the only CPU copy: no glTexSubImage2D, no driver-side staging. GL sees AHB writes through shared memory.
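Point 1 reduces to a "first caller owns it" pattern, sketched below without any EGL calls. ThreadOwnedState, ensure_on_calling_thread and require_owner are hypothetical names introduced for illustration, and the init callable stands in for the EGL bootstrap.

```cpp
#include <cassert>
#include <thread>

// Records which thread performed the one-time setup.
struct ThreadOwnedState {
    bool ready = false;
    std::thread::id owner;
};

// Runs init() on the first call and remembers the calling thread as owner.
// Later calls return true only when made from that same thread. A failed
// init leaves the state untouched so the next call retries.
template <typename InitFn>
bool ensure_on_calling_thread(ThreadOwnedState& s, InitFn init) {
    if (!s.ready) {
        if (!init()) return false;
        s.owner = std::this_thread::get_id();
        s.ready = true;
        return true;
    }
    return s.owner == std::this_thread::get_id();
}

// Debug guard to place before calls that need the thread-affined context.
inline void require_owner(const ThreadOwnedState& s) {
    assert(s.ready && s.owner == std::this_thread::get_id());
}
```

Because the init step runs inside the first present-style call, the owner is whichever thread delivers frames, with no hand-off from the thread that received the Surface.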
[Requested action]
1. Acknowledge the bug exists.
2. Identify root cause inside libGLES_mali.so ~0x99xxxx (semaphore object lifecycle? internal fence pool corruption after compute submit? timeline-semaphore subsystem missing initialization?).
3. Provide a fixed Mali driver / firmware update via Samsung OTA for Galaxy Tab S11 SM-X736B and any other device shipping driver 49.1.0.
[Reproducer]
Available on request. Minimal Vulkan-only reproducer (~600 LoC C++) can be supplied if helpful.
[Attached files]
- mali-crash-01-vkWaitForFences.txt
- mali-crash-02-vkWaitSemaphores.txt
- mali-crash-03-vkGetSemaphoreCounterValue.txt
- mali-crash-04-vkQueueWaitIdle.txt
- mali-crash-05-skip-wait-memcpy.txt
=== Mali-G925 SIGSEGV - variant 1 - vkWaitForFences ===
Device   : Samsung Galaxy Tab S11 (SM-X736B)
Build    : samsung/gts11eea/gts11:16/BP4A.251205.006/X736BXXU5AZBC_OXM5AZBC:userdebug
Kernel   : 6.6.102-android15-8-abogkiX736BXXU5AZBC-4k
GPU      : Mali-G925-Immortalis MC12
Driver   : 49.1.0
Vulkan   : 1.3.278
App      : com.samsung.aifredo.debug
Source   : vulkan_swapchain.cpp using VkFence per-CPU-slot recycle pattern
Build ID : varies per APK rebuild

=== logcat -b crash excerpt ===
F libc  : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x306e69be in tid <ImageReader-640> (camera thread)
F DEBUG : Cmdline: com.samsung.aifredo.debug
F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x00000000306e69be (read)
F DEBUG : esr: 0000000092000006 (Data Abort Exception 0x24)
F DEBUG : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)

=== backtrace ===
#00 pc 0x21804  /system/lib64/libvulkan.so
    vulkan::api::(anonymous namespace)::WaitForFences(VkDevice_T*, unsigned int, VkFence_T* const*, unsigned int, unsigned long)+4
#01 pc 0x3b0c1c /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+856
#02 pc 0x369484 /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image(unsigned char const*, int, int) const+3364
#03 pc 0x367fa8 /data/app/.../librvmncnn.so
    (unwound to AImageReader callback)
#04 pc 0x39b60  /system/lib64/libmediandk.so
    AImageReader::CallbackHandler::onMessageReceived(...)+416
#05 pc 0x1c818  /system/lib64/libstagefright_foundation.so
    android::AHandler::deliverMessage(...)+184
#06 pc 0x23bbc  /system/lib64/libstagefright_foundation.so
    android::AMessage::deliver()+172
#07 pc 0x1de58  /system/lib64/libstagefright_foundation.so
    android::ALooper::loop()+536
#08 pc 0x18120  /system/lib64/libutils.so
    android::Thread::_threadLoop(void*)+528
#09 pc 0x1590fc /system/lib64/libandroid_runtime.so
    android::AndroidRuntime::javaThreadShell(void*)+140

=== analysis ===
vkWaitForFences delegates from libvulkan loader to Mali ICD. Fault fires at offset +4 of libvulkan's WaitForFences wrapper (entry on ICD call). Fault address 0x306e69be fits in 32 bits and is only 2-byte aligned, so it is not a heap pointer. This is consistent with the ICD dereferencing a corrupt internal struct field or index after the compute submit corrupted its sync-object table.
Time to crash   : 0-6 frames after first vkQueueSubmit on the swapchain command buffer.
Reproducibility : 100% with default swapchain pattern (FIFO, 4-5 images, per-frame fence recycle across kFramesInFlight=2).
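The esr value in the log above can be decoded to cross-check the "(read)" and SEGV_MAPERR annotations. This is a sketch based on the AArch64 ESR_ELx layout for data aborts (EC in bits [31:26], WnR in bit 6, DFSC in bits [5:0]); the struct and function names are hypothetical, and it handles only the plain translation-fault encodings.

```cpp
#include <cassert>
#include <cstdint>

// Fields of ESR_ELx that matter for a data abort.
struct EsrDataAbort {
    uint32_t ec;     // exception class, bits [31:26]
    bool is_write;   // WnR, bit 6: true when the faulting access was a write
    uint32_t dfsc;   // data fault status code, bits [5:0]
};

inline EsrDataAbort decode_esr(uint64_t esr) {
    EsrDataAbort d;
    d.ec = static_cast<uint32_t>((esr >> 26) & 0x3f);
    d.is_write = ((esr >> 6) & 1) != 0;
    d.dfsc = static_cast<uint32_t>(esr & 0x3f);
    return d;
}

// DFSC 0b0001LL encodes a translation fault at table level LL (0..3),
// i.e. no valid mapping for the address.
inline bool is_translation_fault(const EsrDataAbort& d) {
    return (d.dfsc & 0x3c) == 0x04;
}

inline int translation_level(const EsrDataAbort& d) {
    assert(is_translation_fault(d));
    return static_cast<int>(d.dfsc & 0x3);
}
```

For 0x92000006 this yields EC 0x24 (data abort from a lower exception level), a read access, and a level 2 translation fault, which agrees with the unmapped-address read reported by debuggerd.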
=== Mali-G925 SIGSEGV - variant 4 - vkQueueWaitIdle ===
Device    : Samsung Galaxy Tab S11 (SM-X736B)
GPU       : Mali-G925-Immortalis MC12
Driver    : 49.1.0
Vulkan    : 1.3.278
App       : com.samsung.aifredo.debug
Source    : vulkan_swapchain.cpp:1070: heavy-handed full-queue stall (vkQueueWaitIdle on the swapchain present queue) in place of any semaphore wait. Different API surface from variants 1-3; intended to bypass the broken sync-object subsystem by waiting on the queue itself.
Rationale : if semaphore + fence subsystems are corrupt, maybe the queue-drain API takes a different code path. Hypothesis FAILED: driver still crashes.

=== logcat -b crash excerpt ===
F libc  : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xbea048453f5f7f8b in tid <Thread-2>
F DEBUG : esr: 0000000092000004 (Data Abort Exception 0x24)

Fault address 0xbea048453f5f7f8b is not a plausible user-space pointer: its top byte (0xbe) does not match the 0xb4 tag carried by the heap pointers in the variant 3 register dump. This is consistent with the ICD blindly using a corrupt struct field as a pointer and the access being rejected as unmapped.

=== backtrace ===
#00 pc 0x21594  /system/lib64/libvulkan.so
    vulkan::api::(anonymous namespace)::QueueWaitIdle(VkQueue_T*)+4
#01 pc 0x3b074c /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+860
    (corresponds to vulkan_swapchain.cpp:1070 = vkQueueWaitIdle(s.queue))
#02 pc 0x369544 /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image+3364
#03 pc 0x36af88 /data/app/.../librvmncnn.so
    NdkCameraWindow::rtsp_thread_func+1984
#04 pc 0x8aadc  /apex/com.android.runtime/lib64/bionic/libc.so
    __pthread_start+236

=== analysis ===
vkQueueWaitIdle delegates from libvulkan to ICD. Fault fires at the entry +4 of the libvulkan QueueWaitIdle wrapper (same delegation pattern as variant 1's WaitForFences). Even the heaviest sync API in Vulkan triggers the same Mali driver crash. The bug is in a shared internal helper that ALL sync APIs route through after the compute queue submit corrupts state.
Conclusion: the ICD's sync subsystem has a single broken helper that all 4 caller-facing APIs reach, and no combination of compute-and-present queue ordering avoids it on this driver.
=== Mali-G925 SIGSEGV - variant 2 - vkWaitSemaphores (timeline) ===
Device : Samsung Galaxy Tab S11 (SM-X736B)
Build  : samsung/gts11eea/gts11:16/BP4A.251205.006/X736BXXU5AZBC_OXM5AZBC:userdebug
GPU    : Mali-G925-Immortalis MC12
Driver : 49.1.0
Vulkan : 1.3.278
App    : com.samsung.aifredo.debug
Source : vulkan_swapchain.cpp:1058 after VkFence -> VkTimelineSemaphore migration (per-CPU-slot timeline semaphore with explicit signal value tracking; pfn_WaitSemaphores resolved via vkGetDeviceProcAddr).

=== logcat -b crash excerpt ===
F libc  : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000720600007214 in tid <Thread-2> (async inference worker)
F DEBUG : Cmdline: com.samsung.aifredo.debug
F DEBUG : esr: 0000000092000006 (Data Abort Exception 0x24)

=== register dump (key) ===
x0 000072060000720c  <- target pointer being deref'd
x1 0000000000000001
x2 00000079443545a0
sp 00000079443541a0
pc 0000007c06861804  <- inside libvulkan.so QueueWaitIdle dispatch
lr 00000078ac224c20

=== backtrace ===
#00 pc 0x995098 /vendor/lib64/egl/mt6991/libGLES_mali.so
    BuildId: 8ffcdf0fe7b476c1
#01 pc 0x3b0698 /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+936
    (corresponds to vulkan_swapchain.cpp:1058 = pfn_WaitSemaphores call)
#02 pc 0x369484 /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image+3364
#03 pc 0x36add8 /data/app/.../librvmncnn.so
    NdkCameraWindow::rtsp_thread_func+1984
#04 pc 0x8aadc  /apex/com.android.runtime/lib64/bionic/libc.so
    __pthread_start+236

=== analysis ===
Timeline-semaphore migration (Vulkan 1.2 vkWaitSemaphores) was supposed to avoid the vkWaitForFences crash. Mali driver's Vulkan 1.2 timeline-semaphore path also crashes, deeper inside the same ICD region. Crash point at libGLES_mali.so + 0x995098: same general function area as variant 1's libvulkan delegation target.
Fault addr 0x720600007214 carries no pointer tag (its top byte is zero) and equals x0 + 8, i.e. a field load through a bad base pointer. This suggests the ICD's sync subsystem is computing a bad index off a corrupt internal table and using the result as a pointer.
Confirms: ARM/Mali sync-object subsystem is broken for ANY caller-facing wait API. The 5-pass golden self-test cascade had also exposed a related fp16_storage miscompile on this driver (mae=0.434 vs 0.05 gate), but the WSI sync crash is independent of the fp16 path.
=== Mali-G925 SIGSEGV - variant 5 - skip wait, direct memcpy ===
Device    : Samsung Galaxy Tab S11 (SM-X736B)
GPU       : Mali-G925-Immortalis MC12
Driver    : 49.1.0
Vulkan    : 1.3.278
App       : com.samsung.aifredo.debug
Source    : vulkan_swapchain.cpp present_real_frame with all four CPU-side wait/idle APIs commented out (no sync between the CPU memcpy into staging buffer and the GPU's prior submit that may still be reading it).
Rationale : if every Vulkan sync API crashes, maybe we can skip the wait entirely and rely on swapchain implicit pipelining (FIFO + kFramesInFlight=2 + 4+ swap images). Hypothesis FAILED: race condition on the staging buffer page leads to a different SIGSEGV.

=== logcat -b crash excerpt ===
F libc  : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000005fffa0d000 (WRITE) in tid <Thread-2>
F DEBUG : esr: 0000000092000006 (Data Abort Exception 0x24)

Note: the fault is a WRITE access (CPU writing to unmapped page), NOT a read like variants 1-4. Distinct failure mode.

=== backtrace ===
#00 pc 0x6b640  /apex/com.android.runtime/lib64/bionic/libc.so
    __memcpy_aarch64_simd+256
#01 pc 0x3b064c /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+1052
    (corresponds to vulkan_swapchain.cpp memcpy() into staging_mapped[slot] after the wait block was removed)
#02 pc 0x369484 /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image+3364
#03 pc 0x36aec8 /data/app/.../librvmncnn.so
    NdkCameraWindow::rtsp_thread_func+1984
#04 pc 0x8aadc  /apex/com.android.runtime/lib64/bionic/libc.so
    __pthread_start+236

=== analysis ===
Symbolized site (addr2line) reports the offending call as:
    memcpy(void*, const void*, size_t)   string.h:53
The CPU was writing into staging_mapped[slot], the host-visible staging buffer for the current frame slot. Without the wait, the buffer was either:
- still being read by the GPU from the prior submit on this slot, OR
- had been unmapped/freed by the driver under our feet, OR
- the physical page was reclaimed.
This variant shows the wait is FUNCTIONALLY required for correctness: we cannot simply skip it without rearchitecting the staging buffer ring to avoid all CPU reuse during GPU read. The combination of variants 1-4 (all sync APIs crash) and variant 5 (skip wait causes memcpy race) demonstrates the bug cannot be worked around in app-space while staying on the Vulkan WSI present path. ARM driver patch required.
=== Mali-G925 SIGSEGV - variant 3 - vkGetSemaphoreCounterValue (poll) ===
Device    : Samsung Galaxy Tab S11 (SM-X736B)
GPU       : Mali-G925-Immortalis MC12
Driver    : 49.1.0
Vulkan    : 1.3.278
App       : com.samsung.aifredo.debug
Source    : vulkan_swapchain.cpp:1081: non-blocking poll loop using vkGetSemaphoreCounterValue (Vulkan 1.2 + KHR_timeline_semaphore) in place of blocking vkWaitSemaphores. Tight spin with usleep(10) per iteration, max ~50ms spin then bail.
Rationale : if Mali ICD crashes on blocking semaphore wait, maybe it handles non-blocking counter-value introspection without the corrupting state-machine path. Hypothesis FAILED: same ICD region.

=== logcat -b crash excerpt ===
F libc  : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000005ffffa67f0 in tid <Thread-2>
F DEBUG : esr: 0000000092000007 (Data Abort Exception 0x24)

=== register dump (key) ===
x0 b4000079e0289ee0  <- VkSemaphore handle (probably valid)
x1 b400007ae02e7ba0  <- pValue output pointer (probably valid)
x8 0000005ffffa67f0  <- ICD attempted to access here (fault)
pc 0000007902ead3a4  <- inside libGLES_mali.so deeper handler

=== backtrace ===
#00 pc 0x1dd63a4 /vendor/lib64/egl/mt6991/libGLES_mali.so
    BuildId: 8ffcdf0fe7b476c1
#01 pc 0x994fd4  /vendor/lib64/egl/mt6991/libGLES_mali.so
    BuildId: 8ffcdf0fe7b476c1
    (note: ~0x99xxxx region, same general code as variant 2's 0x995098)
#02 pc 0x3b0814  /data/app/.../librvmncnn.so
    aifredo_swapchain_present_real_frame+912
    (corresponds to vulkan_swapchain.cpp:1081 = pfn_GetSemaphoreCounterValue)
#03 pc 0x369584  /data/app/.../librvmncnn.so
    NdkCameraWindow::on_image+3364
#04 pc 0x36afc8  /data/app/.../librvmncnn.so
    NdkCameraWindow::rtsp_thread_func+1984
#05 pc 0x8aadc   /apex/com.android.runtime/lib64/bionic/libc.so
    __pthread_start+236

=== analysis ===
Two libGLES_mali.so frames now visible:
- #00 0x1dd63a4: the faulting frame (deeper handler)
- #01 0x994fd4: its caller, reached from the app's vkGetSemaphoreCounterValue call
The 0x99xxxx address of frame #01 matches variant 2 (0x995098), a strong indicator of a single broken function or struct in the ICD's sync subsystem. Whether the caller uses blocking wait or non-blocking poll, control reaches the same corrupted state.
App-side conclusion: ARM driver patch required. No app-visible sync API on this Mali driver survives the post-compute-submit period.
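For reference, the bounded poll described in this variant's Source field has roughly the shape below. This is a sketch only: poll_until is a hypothetical name, and read_counter stands in for a wrapper around vkGetSemaphoreCounterValue so the loop can be shown without Vulkan headers. The 10 microsecond step and 50 ms budget mirror the usleep(10) and ~50ms figures in the log.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Polls read_counter() until it reaches target or the time budget runs out.
// Returns true when the target value was observed, false on timeout.
template <typename ReadCounter>
bool poll_until(ReadCounter read_counter, uint64_t target,
                std::chrono::microseconds budget = std::chrono::milliseconds(50),
                std::chrono::microseconds step = std::chrono::microseconds(10)) {
    assert(step.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + budget;
    for (;;) {
        if (read_counter() >= target) return true;                      // signaled
        if (std::chrono::steady_clock::now() >= deadline) return false; // bail out
        std::this_thread::sleep_for(step);                              // yield briefly
    }
}
```

On the affected driver the fault occurs inside the counter read itself, so bounding the loop does not help; the sketch only documents what the app was doing when it crashed.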