Performance: single-threaded looks OK, but multi-threaded (8 threads) is poor compared to linux64.
The ARM machine has 64 cores, and I have verified that 8 threads are started. Why is performance so much slower?
It depends on a lot of things, and there is not enough context here for us to work with.
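Two possibilities, though:
1) With more threads and more cores in use, the temperature can rise sharply, and the thermal management driver may then kick in and throttle the CPUs. Watching CPU frequency and temperature while the benchmark runs would confirm or rule this out.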
2) The workload uses a shared memory space with global variables, and you are suffering from false sharing, where threads write to different variables that happen to sit in the same cache line. A single-threaded run causes no memory thrashing, whereas a multithreaded run may thrash on every access as cache lines are invalidated, cleaned, and fetched back and forth between cores.