I have been using ArmPL on Linux for quite some time and have experienced numerous sporadic crashes whose cause I haven't been able to identify. Recently, I started using ArmPL on macOS too, and the same type of crash started occurring on that platform as well. At first, I thought the issue was related to the OpenMP library, but after some experimenting I concluded that the crash is related to ArmPL. Here is my setup:
LINUX
MAC
The crash typically occurs after running the application for some time. Note that I use a wrapper around ArmPL. On macOS, I get the following output:
C [libomp.dylib+0x5750] ___kmp_fast_free+0xf0
C [libomp.dylib+0x36704] __kmp_release_deps(int, kmp_taskdata*)+0xb0
C [libomp.dylib+0x35894] void __kmp_task_finish<false>(int, kmp_task*, kmp_taskdata*)+0x148
C [libomp.dylib+0x306c0] __kmp_invoke_task(int, kmp_task*, kmp_taskdata*)+0x2b0
C [libomp.dylib+0x33960] int __kmp_execute_tasks_64<false, true>(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int)+0x31c
C [libomp.dylib+0x3d594] kmp_flag_64<false, true>::wait(kmp_info*, int, void*)+0x618
C [libomp.dylib+0x39b30] __kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*)+0x98
C [libomp.dylib+0x38730] __kmp_barrier+0x500
C [libomp.dylib+0xf170] __kmpc_barrier+0x154
C [libomp.dylib+0x6adec] __kmp_invoke_microtask+0x9c
On Linux, I get this:
C [libomp.so+0x1db1c] ___kmp_fast_free+0x120
C [libomp.so+0x58c5c] __kmp_free_task_and_ancestors(int, kmp_taskdata*, kmp_info*)+0x90
C [libomp.so+0x57b34] void __kmp_task_finish<false>(int, kmp_task*, kmp_taskdata*)+0xe8
C [libomp.so+0x55068] __kmp_invoke_task(int, kmp_task*, kmp_taskdata*)+0x3cc
C [libomp.so+0x5a8d4] int __kmp_execute_tasks_64<false, true>(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int)+0x2dc
C [libomp.so+0x620bc] kmp_flag_64<false, true>.wait(kmp_info*, int, void*)+0x620
C [libomp.so+0x5e320] __kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*)+0x90
C [libomp.so+0x5d110] __kmp_barrier+0x754
C [libomp.so+0x28634] __kmpc_barrier+0x144
C [libomp.so+0x8721c] GOMP_barrier+0x40
C [libomp.so+0x87f20] __kmp_GOMP_microtask_wrapper(int*, int*, void (*)(void*), void*)+0x34
C [libomp.so+0xa16cc] __kmp_invoke_microtask+0x9c
If I run without a wrapper, I get this:
[thread 958273 also had an error]
[thread 958269 also had an error]
[thread 958267 also had an error]
[thread 958284 also had an error]
[thread 958280 also had an error]
[thread 958276 also had an error]
[thread 958281 also had an error]
[thread 958271 also had an error]
[thread 958272 also had an error]
[thread 958275 also had an error]
[thread 958283 also had an error]
[thread 958282 also had an error]
[thread 958270 also had an error]
[thread 958288 also had an error]
[thread 958287 also had an error]
[thread 958286 also had an error]
[thread 958262 also had an error]
[thread 958266 also had an error]
[thread 958274 also had an error]
[thread 958277 also had an error]
[thread 958290 also had an error]
[thread 958279 also had an error]
[thread 958278 also had an error]
[thread 958289 also had an error]
[thread 958285 also had an error]
[thread 958265 also had an error]
[thread 958268 also had an error]
[thread 958261 also had an error]
[thread 958263 also had an error]
[thread 958252 also had an error]
[thread 958264 also had an error]
C [libarmpl_mp.so+0x176e034] zdot_conj_kernel+0xf4
C [libarmpl_mp.so+0x258a444] std::complex<double> armpl::clag::reduce_add_parallel<std::complex<double>, bool armpl::clag::strat::dot::impl<std::complex<double>, std::complex<double>, armpl::clag::spec::neoverse_n1_machine_spec>(armpl::clag::spec::problem_context_2T<std::complex<double>, std::complex<double>, (armpl::clag::spec::problem_type)43, armpl::clag::spec::neoverse_n1_machine_spec> const&) const::{lambda(long)#1}>(int, bool armpl::clag::strat::dot::impl<std::complex<double>, std::complex<double>, armpl::clag::spec::neoverse_n1_machine_spec>(armpl::clag::spec::problem_context_2T<std::complex<double>, std::complex<double>, (armpl::clag::spec::problem_type)43, armpl::clag::spec::neoverse_n1_machine_spec> const&) const::{lambda(long)#1}) [clone ._omp_fn.0]+0xc4
C [libomp.so+0x87f20] __kmp_GOMP_microtask_wrapper(int*, int*, void (*)(void*), void*)+0x34
C [libomp.so+0xa16cc] __kmp_invoke_microtask+0x9c
The crash doesn't occur when using other BLAS/LAPACK implementations (OpenBLAS, vecLib). Any help with solving this problem will be much appreciated.
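For comparing traces like the ones above, it can help to group the native frames by library, so it is easier to see whether a crash sits entirely inside libomp or starts in an ArmPL kernel. The following is only a minimal sketch, assuming frames have the form `C [library+0xoffset] symbol+0xoffset` as shown in this post. The names `parse_frames` and `libraries_in_trace` are hypothetical helpers, not part of ArmPL or any other tool.

```python
import re
from collections import Counter

# Assumed frame format, taken from the traces in this thread:
#   C [libomp.so+0x1db1c] ___kmp_fast_free+0x120
FRAME_RE = re.compile(
    r"^C\s+\[(?P<lib>[^\]]+?)\+0x(?P<off>[0-9a-fA-F]+)\]\s+"
    r"(?P<sym>.+)\+0x[0-9a-fA-F]+$"
)


def parse_frames(text):
    """Return (library, symbol) pairs for every native frame line in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # lines such as "[thread ... also had an error]" are skipped
            frames.append((match.group("lib"), match.group("sym")))
    return frames


def libraries_in_trace(text):
    """Count frames per library, keyed in order of first appearance."""
    return Counter(lib for lib, _ in parse_frames(text))


# Sample input copied from the Linux trace without the wrapper.
sample = """\
[thread 958273 also had an error]
C [libarmpl_mp.so+0x176e034] zdot_conj_kernel+0xf4
C [libomp.so+0x87f20] __kmp_GOMP_microtask_wrapper(int*, int*, void (*)(void*), void*)+0x34
C [libomp.so+0xa16cc] __kmp_invoke_microtask+0x9c
"""

if __name__ == "__main__":
    print(parse_frames(sample)[0])
    print(dict(libraries_in_trace(sample)))
```

On the sample, the topmost frame belongs to libarmpl_mp.so, whereas every frame in the wrapper traces belongs to libomp.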
Hello again. It's probably worth trying the new 23.10 release for this issue as well, especially if that's easier than trying GCC 11.
https://developer.arm.com/downloads/-/arm-performance-libraries
https://developer.arm.com/downloads/-/arm-compiler-for-linux/
Chris.
Hello, I can confirm that the problem disappeared after upgrading to 23.10. Both the Linux and macOS versions run smoothly now. Thanks!