Performance: a single thread looks OK, but multi-threaded runs (8 threads) perform poorly compared to linux64.
The ARM machine has 64 cores, and I have verified that 8 threads are started. Why is performance so much slower?
Please share the following information so we can make progress:
Could you share details of the ARM and x86 machines that run your code (machine model, CPU model, RAM size, CPU and RAM frequencies, etc.)?
Would it be possible to provide the actual numbers from your experiment?
Also, would it be possible to narrow down the code sections that perform worse on the ARM machine?
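
If it helps with narrowing things down, one possible approach is to wrap each suspect section in a small timer and compare the per-section totals between the two machines. The sketch below is only an illustration in Python: the names `timed`, `report`, and `section_totals`, as well as the "setup" and "compute" workloads, are hypothetical placeholders and not part of your code. The same idea carries over to whatever language the real code is written in.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named section (hypothetical helper).
section_totals = defaultdict(float)
_totals_lock = threading.Lock()  # protects section_totals when several threads report


@contextmanager
def timed(name):
    """Add the wall-clock time spent inside the `with` block to section_totals[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        with _totals_lock:
            section_totals[name] += elapsed


def report():
    """Return (section name, seconds) pairs, slowest section first."""
    with _totals_lock:
        items = list(section_totals.items())
    return sorted(items, key=lambda kv: kv[1], reverse=True)


# Placeholder workloads standing in for the real code sections.
with timed("setup"):
    data = list(range(100_000))

with timed("compute"):
    total = sum(x * x for x in data)

for name, seconds in report():
    print(f"{name}: {seconds:.6f} s")
```

Running the same instrumented build with 1 thread and with 8 threads on both machines, and posting the per-section totals, would show which sections stop scaling on ARM. Note that when several threads run the same section, the totals are summed across threads, so they can exceed the elapsed wall-clock time.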