Why is performance higher on LITTLE cores?

zois over 4 years ago

Hi all,

I am using the HiKey970 board to run inferences on neural networks. The board comprises ARM Cortex-A73 and ARM Cortex-A53 cores.
I am using `taskset` to pin the inference process (which spawns 4 threads) first to the LITTLE cores (0-3) and then to the big cores (4-7). Contrary to what I expected, inference takes almost twice as long on the big cores as on the LITTLE cores.
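For reference, the same pin-and-time comparison can be sketched from inside a Python process rather than through `taskset`. This is only a rough illustration under stated assumptions: the CPU numbering (0-3 LITTLE, 4-7 big) is taken from the post, `busy_work` is a hypothetical stand-in for one inference run, and `time_on_cpus` is a made-up helper name, not part of any library. It relies on `os.sched_setaffinity`, which is available on Linux only.

```python
import os
import time

LITTLE_CPUS = {0, 1, 2, 3}  # Cortex-A53 cluster, numbering as given in the post
BIG_CPUS = {4, 5, 6, 7}     # Cortex-A73 cluster, numbering as given in the post


def busy_work(n=200_000):
    """Hypothetical stand-in for one inference run (pure CPU work)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_on_cpus(cpus, workload, repeats=3):
    """Pin the caller to `cpus`, run `workload`, return the best wall time (s).

    Returns None if none of the requested CPUs is available to the caller.
    Threads started while the mask is in place inherit it.
    """
    original = os.sched_getaffinity(0)
    usable = set(cpus) & original
    if not usable:
        return None
    os.sched_setaffinity(0, usable)
    try:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            workload()
            best = min(best, time.perf_counter() - start)
        return best
    finally:
        # Always restore the affinity mask the caller started with.
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    for name, cpus in (("LITTLE", LITTLE_CPUS), ("big", BIG_CPUS)):
        print(name, time_on_cpus(cpus, busy_work))
```

Taking the best of several repeats reduces noise from frequency ramp-up and other processes, though it does not remove it; a real comparison would replace `busy_work` with the actual inference call.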

Is there an explanation for this behavior? Are there tools that can help me understand why the threads are slower when using big cores?

To be more precise, the board is flashed with kernel version 4.9.78-147538-g244928755bbe. The code that I am using can be found in this repo.


vstehle over 4 years ago in reply to zois +1

Hi zois, is it possible that your inference tasks in fact run on the NPU? If so, the task would not be CPU-bound, and that would explain the behaviour you are seeing.


zois over 4 years ago in reply to vstehle

Hi vstehle,

I don't think this is the case: the API for the NPU is not very open, and the code I am using makes no calls to it. What is strange to me is that this behavior appears for only one network, ResNet50. The other networks behave as expected, with performance better or equal on the big cores.

I am now looking into whether it is a synchronization issue in the implementation of that specific network.

I also tried moving all system processes and threads onto the cores that I am not using for inference, using `cset`. The behavior is unchanged: the LITTLE cores still perform better, almost twice as fast as the big cores.
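One way to double-check that the threads really stay on the intended cluster is to read, for each thread, the CPU it last ran on from `/proc`. The sketch below is a minimal illustration, not a profiling tool: `parse_last_cpu` and `thread_last_cpus` are hypothetical helper names, and it assumes the Linux `/proc/[pid]/task/[tid]/stat` layout, where field 39 is the CPU number the thread last executed on.

```python
import os


def parse_last_cpu(stat_line):
    """Return field 39 (CPU last executed on) from a /proc stat line."""
    # The comm field (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the LAST ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 sits at index 39 - 3 = 36.
    return int(rest[36])


def thread_last_cpus(pid="self"):
    """Map each thread ID of `pid` to the CPU it last ran on."""
    task_dir = f"/proc/{pid}/task"
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                result[int(tid)] = parse_last_cpu(f.read())
        except OSError:
            continue  # the thread exited between listdir() and open()
    return result


if __name__ == "__main__":
    # Pass the PID of the inference process instead of "self" to inspect it.
    for tid, cpu in sorted(thread_last_cpus().items()):
        print(f"thread {tid} last ran on CPU {cpu}")
```

Sampling this a few times during a run on cores 4-7 should only ever show CPUs 4 to 7; anything else would suggest the pinning is not applied to every thread.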