Crash in ArmPL zdot_conj_kernel

I have been using ArmPL on Linux for quite some time and have experienced numerous sporadic crashes whose cause I haven't been able to identify. Recently, I started using ArmPL on macOS too, and the same type of crash started occurring on that platform as well. At first, I thought the issue was related to the OpenMP library, but after some experimenting I came to the conclusion that the crash is related to ArmPL. Here is my setup:

LINUX

  • ArmPL 23.04.
  • OpenMP library provided by ARM.
  • Code is compiled with GCC.

MACOS

  • ArmPL 23.06.
  • Vanilla LLVM OpenMP.
  • Code is compiled with vanilla LLVM Clang.

The crash typically occurs after the application has been running for some time. Note that I call ArmPL through a wrapper. On macOS, I get the following output:

C  [libomp.dylib+0x5750]  ___kmp_fast_free+0xf0
C  [libomp.dylib+0x36704]  __kmp_release_deps(int, kmp_taskdata*)+0xb0
C  [libomp.dylib+0x35894]  void __kmp_task_finish<false>(int, kmp_task*, kmp_taskdata*)+0x148
C  [libomp.dylib+0x306c0]  __kmp_invoke_task(int, kmp_task*, kmp_taskdata*)+0x2b0
C  [libomp.dylib+0x33960]  int __kmp_execute_tasks_64<false, true>(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int)+0x31c
C  [libomp.dylib+0x3d594]  kmp_flag_64<false, true>::wait(kmp_info*, int, void*)+0x618
C  [libomp.dylib+0x39b30]  __kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*)+0x98
C  [libomp.dylib+0x38730]  __kmp_barrier+0x500
C  [libomp.dylib+0xf170]  __kmpc_barrier+0x154
C  [libomp.dylib+0x6adec]  __kmp_invoke_microtask+0x9c

On Linux, I get this:

C [libomp.so+0x1db1c] ___kmp_fast_free+0x120
C [libomp.so+0x58c5c] __kmp_free_task_and_ancestors(int, kmp_taskdata*, kmp_info*)+0x90
C [libomp.so+0x57b34] void __kmp_task_finish<false>(int, kmp_task*, kmp_taskdata*)+0xe8
C [libomp.so+0x55068] __kmp_invoke_task(int, kmp_task*, kmp_taskdata*)+0x3cc
C [libomp.so+0x5a8d4] int __kmp_execute_tasks_64<false, true>(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int)+0x2dc
C [libomp.so+0x620bc] kmp_flag_64<false, true>.wait(kmp_info*, int, void*)+0x620
C [libomp.so+0x5e320] __kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*)+0x90
C [libomp.so+0x5d110] __kmp_barrier+0x754
C [libomp.so+0x28634] __kmpc_barrier+0x144
C [libomp.so+0x8721c] GOMP_barrier+0x40
C [libomp.so+0x87f20] __kmp_GOMP_microtask_wrapper(int*, int*, void (*)(void*), void*)+0x34
C [libomp.so+0xa16cc] __kmp_invoke_microtask+0x9c

If I run without the wrapper on Linux, I get this:

[thread 958273 also had an error][thread 958269 also had an error][thread 958267 also had an error][thread 958284 also had an error][thread 958280 also had an error][thread 958276 also had an error][thread 958281 also had an error][thread 958271 also had an error][thread 958272 also had an error][thread 958275 also had an error][thread 958283 also had an error][thread 958282 also had an error][thread 958270 also had an error][thread 958288 also had an error][thread 958287 also had an error][thread 958286 also had an error][thread 958262 also had an error][thread 958266 also had an error][thread 958274 also had an error][thread 958277 also had an error][thread 958290 also had an error][thread 958279 also had an error][thread 958278 also had an error][thread 958289 also had an error][thread 958285 also had an error][thread 958265 also had an error][thread 958268 also had an error][thread 958261 also had an error][thread 958263 also had an error][thread 958252 also had an error][thread 958264 also had an error]

C  [libarmpl_mp.so+0x176e034]  zdot_conj_kernel+0xf4
C  [libarmpl_mp.so+0x258a444]  std::complex<double> armpl::clag::reduce_add_parallel<std::complex<double>, bool armpl::clag::strat::dot::impl<std::complex<double>, std::complex<double>, armpl::clag::spec::neoverse_n1_machine_spec>(armpl::clag::spec::problem_context_2T<std::complex<double>, std::complex<double>, (armpl::clag::spec::problem_type)43, armpl::clag::spec::neoverse_n1_machine_spec> const&) const::{lambda(long)#1}>(int, bool armpl::clag::strat::dot::impl<std::complex<double>, std::complex<double>, armpl::clag::spec::neoverse_n1_machine_spec>(armpl::clag::spec::problem_context_2T<std::complex<double>, std::complex<double>, (armpl::clag::spec::problem_type)43, armpl::clag::spec::neoverse_n1_machine_spec> const&) const::{lambda(long)#1}) [clone ._omp_fn.0]+0xc4
C  [libomp.so+0x87f20]  __kmp_GOMP_microtask_wrapper(int*, int*, void (*)(void*), void*)+0x34
C  [libomp.so+0xa16cc]  __kmp_invoke_microtask+0x9c
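
The top frame points at the conjugated complex dot product. Below is a minimal sketch of a standalone stress test for that operation. It assumes the kernel is reached through the standard CBLAS entry point cblas_zdotc_sub; the trace only shows the internal symbol, so that mapping is an assumption. The names ref_zdotc, abs_diff, stress_zdotc and the USE_CBLAS macro are hypothetical and not part of ArmPL. Without USE_CBLAS, the code only checks the portable reference against itself.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef USE_CBLAS        /* hypothetical macro: define it to test a real BLAS */
#include <cblas.h>      /* assumption: CBLAS header of the library under test */
#endif

/* Reference conjugated dot product: out = sum_i conj(x[i]) * y[i].
 * Complex values are stored as interleaved (re, im) pairs, which is the
 * layout CBLAS expects behind its void pointers. */
static void ref_zdotc(size_t n, const double *x, const double *y, double out[2])
{
    double re = 0.0, im = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double a = x[2 * i], b = x[2 * i + 1];   /* x[i] = a + b*i */
        const double c = y[2 * i], d = y[2 * i + 1];   /* y[i] = c + d*i */
        re += a * c + b * d;   /* real part of (a - b*i) * (c + d*i) */
        im += a * d - b * c;   /* imaginary part of the same product */
    }
    out[0] = re;
    out[1] = im;
}

/* Absolute difference without pulling in libm. */
static double abs_diff(double p, double q)
{
    return p > q ? p - q : q - p;
}

/* Runs `iters` dot products of length n and returns how many results
 * differ from the reference, or -1 if allocation fails. */
static int stress_zdotc(size_t n, int iters)
{
    double *x = malloc(2 * n * sizeof *x);
    double *y = malloc(2 * n * sizeof *y);
    if (!x || !y) {
        free(x);
        free(y);
        return -1;
    }

    /* Small integer inputs keep every partial sum exactly representable. */
    for (size_t i = 0; i < n; ++i) {
        x[2 * i]     = (double)(i % 7);
        x[2 * i + 1] = (double)(i % 3);
        y[2 * i]     = (double)(i % 5);
        y[2 * i + 1] = -(double)(i % 2);
    }

    double expected[2];
    ref_zdotc(n, x, y, expected);

    int mismatches = 0;
    for (int it = 0; it < iters; ++it) {
        double got[2];
#ifdef USE_CBLAS
        cblas_zdotc_sub((int)n, x, 1, y, 1, got);   /* library under test */
#else
        ref_zdotc(n, x, y, got);                    /* self-check only */
#endif
        if (abs_diff(got[0], expected[0]) > 1e-6 ||
            abs_diff(got[1], expected[1]) > 1e-6)
            ++mismatches;
    }

    free(x);
    free(y);
    return mismatches;
}
```

To exercise a real library, call stress_zdotc from a main function, compile with -DUSE_CBLAS, and link against the threaded BLAS build. A large n and many iterations are probably needed to reach the multithreaded path seen in the trace, but that is a guess rather than something the trace confirms.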

The crash doesn't occur when using other BLAS/LAPACK implementations (OpenBLAS, vecLib). Any help with solving this problem will be much appreciated.