I have been using ArmPL on Linux for quite some time and have experienced numerous mysterious, sporadic crashes whose cause I haven't been able to identify. Recently, I started using ArmPL on macOS too, and the same type of crash started occurring on that platform as well. At first, I thought the issue was related to the OpenMP library, but after some experimenting I came to the conclusion that the crash is related to ArmPL. Here is my setup:
LINUX
MAC
The crash typically occurs after running the application for some time. Note that I use a wrapper around ArmPL. On macOS, I get the following output:
C [libomp.dylib+0x5750] ___kmp_fast_free+0xf0
C [libomp.dylib+0x36704] __kmp_release_deps(int, kmp_taskdata*)+0xb0
C [libomp.dylib+0x35894] void __kmp_task_finish<false>(int, kmp_task*, kmp_taskdata*)+0x148
C [libomp.dylib+0x306c0] __kmp_invoke_task(int, kmp_task*, kmp_taskdata*)+0x2b0
C [libomp.dylib+0x33960] int __kmp_execute_tasks_64<false, true>(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int)+0x31c
C [libomp.dylib+0x3d594] kmp_flag_64<false, true>::wait(kmp_info*, int, void*)+0x618
C [libomp.dylib+0x39b30] __kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*)+0x98
C [libomp.dylib+0x38730] __kmp_barrier+0x500
C [libomp.dylib+0xf170] __kmpc_barrier+0x154
C [libomp.dylib+0x6adec] __kmp_invoke_microtask+0x9c
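Each frame above has the same shape: a library name with an offset in brackets, followed by a symbol and an offset into that symbol. As a rough sketch for anyone who wants to process these traces (for example, to feed the library offsets to a symbolizer or to count frames per library), the lines can be parsed as shown below. The names `FRAME_RE`, `parse_frame` and `frames_per_library` are made up for this illustration, and the pattern is only an assumption based on the lines shown in this post.

```python
import re
from collections import Counter

# One native frame as it appears in this post:
#   C [library+0xLIBOFFSET] symbol+0xSYMOFFSET
# The symbol may contain spaces, commas, and template brackets, so it is
# matched greedily up to the last "+0x" on the line.
FRAME_RE = re.compile(
    r"^C\s+\[(?P<lib>.+?)\+0x(?P<lib_off>[0-9a-fA-F]+)\]\s+"
    r"(?P<symbol>.+)\+0x(?P<sym_off>[0-9a-fA-F]+)\s*$"
)


def parse_frame(line):
    """Return a dict describing one frame, or None if the line is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "lib": match.group("lib"),
        "lib_off": int(match.group("lib_off"), 16),
        "symbol": match.group("symbol"),
        "sym_off": int(match.group("sym_off"), 16),
    }


def frames_per_library(lines):
    """Count how many parsed frames belong to each library."""
    frames = [f for f in map(parse_frame, lines) if f is not None]
    return Counter(f["lib"] for f in frames)


if __name__ == "__main__":
    sample = [
        "C [libomp.dylib+0x5750] ___kmp_fast_free+0xf0",
        "C [libomp.dylib+0x36704] __kmp_release_deps(int, kmp_taskdata*)+0xb0",
    ]
    for line in sample:
        print(parse_frame(line))
```

Lines that are not frames, such as the `[thread ... also had an error]` messages, are simply skipped.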
On Linux, I get this:
C [libomp.so+0x1db1c] ___kmp_fast_free+0x120
C [libomp.so+0x58c5c] __kmp_free_task_and_ancestors(int, kmp_taskdata*, kmp_info*)+0x90
C [libomp.so+0x57b34] void __kmp_task_finish<false>(int, kmp_task*, kmp_taskdata*)+0xe8
C [libomp.so+0x55068] __kmp_invoke_task(int, kmp_task*, kmp_taskdata*)+0x3cc
C [libomp.so+0x5a8d4] int __kmp_execute_tasks_64<false, true>(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int)+0x2dc
C [libomp.so+0x620bc] kmp_flag_64<false, true>.wait(kmp_info*, int, void*)+0x620
C [libomp.so+0x5e320] __kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*)+0x90
C [libomp.so+0x5d110] __kmp_barrier+0x754
C [libomp.so+0x28634] __kmpc_barrier+0x144
C [libomp.so+0x8721c] GOMP_barrier+0x40
C [libomp.so+0x87f20] __kmp_GOMP_microtask_wrapper(int*, int*, void (*)(void*), void*)+0x34
C [libomp.so+0xa16cc] __kmp_invoke_microtask+0x9c
If I run without a wrapper, I get this:
[thread 958273 also had an error]
[thread 958269 also had an error]
[thread 958267 also had an error]
[thread 958284 also had an error]
[thread 958280 also had an error]
[thread 958276 also had an error]
[thread 958281 also had an error]
[thread 958271 also had an error]
[thread 958272 also had an error]
[thread 958275 also had an error]
[thread 958283 also had an error]
[thread 958282 also had an error]
[thread 958270 also had an error]
[thread 958288 also had an error]
[thread 958287 also had an error]
[thread 958286 also had an error]
[thread 958262 also had an error]
[thread 958266 also had an error]
[thread 958274 also had an error]
[thread 958277 also had an error]
[thread 958290 also had an error]
[thread 958279 also had an error]
[thread 958278 also had an error]
[thread 958289 also had an error]
[thread 958285 also had an error]
[thread 958265 also had an error]
[thread 958268 also had an error]
[thread 958261 also had an error]
[thread 958263 also had an error]
[thread 958252 also had an error]
[thread 958264 also had an error]
C [libarmpl_mp.so+0x176e034] zdot_conj_kernel+0xf4
C [libarmpl_mp.so+0x258a444] std::complex<double> armpl::clag::reduce_add_parallel<std::complex<double>, bool armpl::clag::strat::dot::impl<std::complex<double>, std::complex<double>, armpl::clag::spec::neoverse_n1_machine_spec>(armpl::clag::spec::problem_context_2T<std::complex<double>, std::complex<double>, (armpl::clag::spec::problem_type)43, armpl::clag::spec::neoverse_n1_machine_spec> const&) const::{lambda(long)#1}>(int, bool armpl::clag::strat::dot::impl<std::complex<double>, std::complex<double>, armpl::clag::spec::neoverse_n1_machine_spec>(armpl::clag::spec::problem_context_2T<std::complex<double>, std::complex<double>, (armpl::clag::spec::problem_type)43, armpl::clag::spec::neoverse_n1_machine_spec> const&) const::{lambda(long)#1}) [clone ._omp_fn.0]+0xc4
C [libomp.so+0x87f20] __kmp_GOMP_microtask_wrapper(int*, int*, void (*)(void*), void*)+0x34
C [libomp.so+0xa16cc] __kmp_invoke_microtask+0x9c
The crash doesn't occur when using other BLAS/LAPACK implementations (OpenBLAS, vecLib). Any help with solving this problem will be much appreciated.
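The `zdot_conj_kernel` frame and the `std::complex<double>` dot template suggest that the failing call is a conjugated complex dot product, which I assume corresponds to BLAS ZDOTC. In case it helps with isolating the problem, here is a minimal sketch of a stress harness that calls a dot routine repeatedly from several threads and checks each result against a plain reference. Everything in it is hypothetical: `zdotc_reference` and `stress` are names invented for this illustration, the `dot` argument stands in for whatever binding actually reaches the library, and the thread and iteration counts are arbitrary.

```python
import threading


def zdotc_reference(x, y):
    """Conjugated dot product: sum of conj(x[i]) * y[i] over all i."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def stress(dot, n=1024, threads=4, iterations=200):
    """Call dot(x, y) repeatedly from several threads.

    Returns the number of results that differ from the reference by more
    than a small relative tolerance. The tolerance allows for a parallel
    reduction adding the partial sums in a different order.
    """
    x = [complex(i, -i) for i in range(n)]
    y = [complex(1, i % 7) for i in range(n)]
    expected = zdotc_reference(x, y)
    tolerance = 1e-9 * max(1.0, abs(expected))
    mismatches = []
    lock = threading.Lock()

    def worker():
        for _ in range(iterations):
            got = dot(x, y)
            if abs(got - expected) > tolerance:
                with lock:
                    mismatches.append(got)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return len(mismatches)


if __name__ == "__main__":
    # Replace zdotc_reference with a function that calls the library under test.
    print("mismatches:", stress(zdotc_reference))
```

As written, the harness only exercises the pure Python reference, so it reports zero mismatches. It becomes useful once `dot` is replaced with a call into the BLAS implementation being investigated.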
Sorry for the late reply. I was on a very long journey. The code is compiled with GCC 12 and the crash occurs on Rocky Linux 8 (generic ARMv8) and on Ubuntu 22.04 (Neoverse-N1). I have seen similar crashes on Amazon Linux 2 with Graviton 2 and 3. I could try building the code with GCC 11. This is going to take some time.