Hello. I am experiencing a crash in the Rader code path for FFT. This is on an M1 Mac, macOS 26.3, building with clang 21. I'm using ArmPL 26.01, but the crash happens with 25.04 as well. I also see this on macOS 12.6 and have a customer that is seeing this too on his M-series Mac (unspecified details).
The crash can be reproduced with a modified version of fftw_dft_2d_c_example.c in examples_lp64_mp. (I think it happens without OpenMP too.) The key change is going from a (5 x 2) array to a (256 * 23 x 256 * 23) array. With 256 * 19 it runs fine; primes of 23 and over crash.
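To make the size pattern concrete, here is a minimal sketch of a check that flags dimensions matching what I observed. The function names are hypothetical, and the threshold of 23 is only what I saw in my tests (19 works, 23 fails); it is not a documented ArmPL limit.

```c
#include <stdint.h>

/* Hypothetical helper: largest prime factor of n, found by trial division.
 * Returns 0 for n < 2, which has no prime factors. */
int64_t largest_prime_factor(int64_t n)
{
    int64_t largest = 0;
    if (n < 2) {
        return 0;
    }
    /* Try 2 first, then odd candidates only. */
    for (int64_t p = 2; p * p <= n; p += (p == 2) ? 1 : 2) {
        while (n % p == 0) {
            largest = p;
            n /= p;
        }
    }
    /* Whatever remains above 1 is itself a prime factor, and the largest. */
    if (n > 1) {
        largest = n;
    }
    return largest;
}

/* Hypothetical check: nonzero when a dimension matches the sizes that
 * crashed for me. The value 23 comes from this report only (assumption). */
int size_matches_reported_pattern(int64_t n)
{
    return largest_prime_factor(n) >= 23;
}
```

Calling size_matches_reported_pattern on each dimension before creating the plan would flag 256 * 23 but not 256 * 19 or the stock sizes 5 and 2.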
Here's a snippet of the crash info:
Triggered by Thread: 0, Dispatch Queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Subtype: KERN_PROTECTION_FAILURE at 0x000000082ec54000
Exception Codes: 0x0000000000000002, 0x000000082ec54000
Termination Reason: Namespace SIGNAL, Code 10, Bus error: 10
Terminating Process: exc handler [40031]
VM Region Info: 0x82ec54000 is in 0x82e400000-0xb2c000000; bytes after start: 8732672 bytes before end: 12838420479
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
MALLOC_SMALL 82e000000-82e400000 [ 4096K] rw-/rwx SM=PRV
---> commpage (reserved) 82e400000-b2c000000 [ 12.0G] ---/--- SM=NUL reserved VM address space (unallocated)
GAP OF 0x86000000 BYTES
MALLOC_LARGE bb2000000-bba000000 [128.0M] rw-/rwx SM=PRV
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libarmpl_lp64_mp.dylib 0x10b87a8f4 arm::fft1d::level_rader_t<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const + 556
1 libarmpl_lp64_mp.dylib 0x10b8092d8 void arm::fft1d::execute<std::__1::complex<double>, std::__1::complex<double>>(arm::fft1d::composition<std::__1::complex<double>, std::__1::complex<double>> const&, long long, std::__1::complex<double> const*, std::__1::complex<double>*, long long, long long, long long, long long) + 472
2 libarmpl_lp64_mp.dylib 0x10b7d0c80 void arm::fft1d::parallel::parallel_loop<arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)>(int, arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)) (.omp_outlined) + 292
3 libomp.dylib 0x104ded1cc __kmp_invoke_microtask + 156
4 ??? 0x0 ???
5 ??? 0x550 ???
6 ??? 0xbb2000000 ???
Thread 1:
0 ??? 0x104cc4594 ???
1 libarmpl_lp64_mp.dylib 0x10b809290 void arm::fft1d::execute<std::__1::complex<double>, std::__1::complex<double>>(arm::fft1d::composition<std::__1::complex<double>, std::__1::complex<double>> const&, long long, std::__1::complex<double> const*, std::__1::complex<double>*, long long, long long, long long, long long) + 400
2 libarmpl_lp64_mp.dylib 0x10b7d0c80 void arm::fft1d::parallel::parallel_loop<arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)>(int, arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)) (.omp_outlined) + 292
3 libomp.dylib 0x104ded1cc __kmp_invoke_microtask + 156
4 ??? 0x1 ???
5 ??? 0x380 ???
6 ??? 0x200 ???
7 ??? 0x726854207265 ???
Thread 2:
0 ??? 0x104cc42a4 ???
1 libarmpl_lp64_mp.dylib 0x10b809290 void arm::fft1d::execute<std::__1::complex<double>, std::__1::complex<double>>(arm::fft1d::composition<std::__1::complex<double>, std::__1::complex<double>> const&, long long, std::__1::complex<double> const*, std::__1::complex<double>*, long long, long long, long long, long long) + 400
2 libarmpl_lp64_mp.dylib 0x10b7d0c80 void arm::fft1d::parallel::parallel_loop<arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)>(int, arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)) (.omp_outlined) + 292
3 libomp.dylib 0x104ded1cc __kmp_invoke_microtask + 156
4 ??? 0x2 ???
5 ??? 0x380 ???
6 ??? 0x200 ???
7 ??? 0x726854207265 ???
Thread 3:
0 libomp.dylib 0x104d8a830 __kmp_launch_thread + 392
1 libomp.dylib 0x104dcffb4 __kmp_launch_worker(void*) + 280
2 libsystem_pthread.dylib 0x198727c08 _pthread_start + 136
3 libsystem_pthread.dylib 0x198722ba8 thread_start + 8
Thread 4:
0 ??? 0x104cc4714 ???
1 libarmpl_lp64_mp.dylib 0x10b809290 void arm::fft1d::execute<std::__1::complex<double>, std::__1::complex<double>>(arm::fft1d::composition<std::__1::complex<double>, std::__1::complex<double>> const&, long long, std::__1::complex<double> const*, std::__1::complex<double>*, long long, long long, long long, long long) + 400
2 libarmpl_lp64_mp.dylib 0x10b7d0c80 void arm::fft1d::parallel::parallel_loop<arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)>(int, arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)) (.omp_outlined) + 292
3 libomp.dylib 0x104ded1cc __kmp_invoke_microtask + 156
4 ??? 0x6 ???
5 ??? 0x380 ???
6 ??? 0x200 ???
7 ??? 0x726854207265 ???
Thread 5:
0 ??? 0x104cc4344 ???
1 libarmpl_lp64_mp.dylib 0x10b809290 void arm::fft1d::execute<std::__1::complex<double>, std::__1::complex<double>>(arm::fft1d::composition<std::__1::complex<double>, std::__1::complex<double>> const&, long long, std::__1::complex<double> const*, std::__1::complex<double>*, long long, long long, long long, long long) + 400
2 libarmpl_lp64_mp.dylib 0x10b7d0c80 void arm::fft1d::parallel::parallel_loop<arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)>(int, arm::fft1d::batched_1d_plan<std::__1::complex<double>, std::__1::complex<double>>::execute(long long, void const*, long long, long long, void*, long long, long long) const::'lambda'(int)) (.omp_outlined) + 292
3 libomp.dylib 0x104ded1cc __kmp_invoke_microtask + 156
4 ??? 0x7 ???
5 ??? 0x380 ???
6 ??? 0x200 ???
7 ??? 0x726854207265 ???
Thread 0 crashed with ARM Thread State (64-bit):
x0: 0x000000082cc25200 x1: 0x000000082cc25200 x2: 0x0000000000000020 x3: 0x0000000000000020
x4: 0x0000000000000020 x5: 0x0000000000000100 x6: 0x0000000000000200 x7: 0x0000000000000080
x8: 0x0000000000000000 x9: 0x000000082d3183c8 x10: 0x000000082d554000 x11: 0x000000082cc25000
x12: 0x0000000000000016 x13: 0x0000000000170000 x14: 0x0000000000000080 x15: 0x00000000000000a0
x16: 0x0000000000000040 x17: 0x0000000000000060 x18: 0x0000000000000000 x19: 0x000000082cc27e00
x20: 0x0000000000170000 x21: 0x000000082cc27c00 x22: 0x0000000000000200 x23: 0x0000000000000008
x24: 0x000000082cc25000 x25: 0x000000082d56b000 x26: 0x000000082d2fc780 x27: 0x0000000000000020
x28: 0x0000000000000016 fp: 0x000000016b176560 lr: 0x000000010b87a8b0
sp: 0x000000016b176420 pc: 0x000000010b87a8f4 cpsr: 0x20001000
far: 0x000000082ec54000 esr: 0x92000047 (Data Abort) byte write Translation fault
Binary Images:
0x104c88000 - 0x104c8bfff fftw_dft_2d_c_doug.exe (*) <bc1e303d-deef-382a-a313-e82127e4f3d0> */fftw_dft_2d_c_doug.exe
0x108f68000 - 0x10c53ffff libarmpl_lp64_mp.dylib (*) <daeaca78-38e9-37be-b7cf-421def5bc699> /Users/USER/Downloads/*/libarmpl_lp64_mp.dylib
0x104d64000 - 0x104df7fff libomp.dylib (*) <86c894d2-ffc2-3ca7-88cd-11f483173719> /Users/USER/Downloads/*/libomp.dylib
0x0 - 0xffffffffffffffff ??? (*) <00000000-0000-0000-0000-000000000000> ???
0x198721000 - 0x19872dacb libsystem_pthread.dylib (*) <0596a7b6-bce2-3f06-a2e8-3eaab5371ed8> /usr/lib/system/libsystem_pthread.dylib
Rather than paste the modified code, I have described the change to fftw_dft_2d_c_example.c above: only the array dimensions differ from the stock example.
In order to get a build, I had to edit the provided Makefile to remove -fopenmp from both CFLAGS and CLINKFLAGS. The stock examples all ran fine this way.
Though this does appear to be an issue with ArmPL, is there anything I can do in my code to avoid the crash?
Thanks,
Doug

Hi Doug,
Thanks for raising this issue. We'll try to reproduce what you're seeing and investigate further.
Kevin