Why is performance higher on LITTLE cores?

Hi all,

I am using the HiKey970 board to run inferences on neural networks. The board comprises ARM Cortex-A73 and ARM Cortex-A53 cores.
I am using `taskset` to pin the inference process (which spawns 4 threads), once to the LITTLE cores (0-3) and once to the big cores (4-7). Contrary to what I expected, the inference time is almost double when running on the big cores compared to the LITTLE cores.
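
For reference, the same pinning can be approximated from inside a Python process. This is a minimal sketch, not the code from the repo: `time_on_cpus` and the placeholder workload are hypothetical names, and the core numbering (0-3 LITTLE, 4-7 big) is assumed from the description above. It relies on `os.sched_setaffinity`, which is Linux-only and affects the calling thread; threads created after the call inherit the mask, much like a process launched under `taskset`.

```python
import os
import time


def time_on_cpus(cpus, workload):
    """Pin the calling thread to `cpus`, run `workload`, return elapsed seconds.

    Threads started by `workload` inherit the affinity mask.
    The original mask is restored afterwards.
    """
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, set(cpus))
    try:
        start = time.perf_counter()
        workload()
        return time.perf_counter() - start
    finally:
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    def placeholder_workload():
        # Stand-in for the real inference call (hypothetical).
        sum(i * i for i in range(1_000_000))

    available = os.sched_getaffinity(0)
    # Cluster layout assumed from the post: 0-3 LITTLE, 4-7 big.
    for name, cpus in (("LITTLE", {0, 1, 2, 3}), ("big", {4, 5, 6, 7})):
        usable = cpus & available
        if usable:
            print(f"{name}: {time_on_cpus(usable, placeholder_workload):.3f} s")
        else:
            print(f"{name}: cores {sorted(cpus)} not available on this machine")
```

Threads that already exist before the call keep their old mask, so the pinning should happen before the inference runtime creates its worker threads.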

Is there an explanation for this behavior? Are there tools that can help me understand why the threads are slower when using big cores?

To be more precise, the board is flashed with kernel version 4.9.78-147538-g244928755bbe. The code that I am using can be found in this repo.

Top replies

vstehle over 3 years ago in reply to zois +1

Hi zois, is it possible that your inference tasks in fact run on the NPU? If that is the case, the tasks would not be CPU bound, which would explain the behaviour you are seeing.

0 zois over 3 years ago in reply to Willy Wolff

Hi Willy,

It seems that I am not able to keep the DDR frequency constant. Bumping the kernel version could help, but that is beyond my time budget: I would need to learn quite a bit before I could port a newer kernel to the board.

You are right, I need to look at the hardware counters more closely to understand how the architecture of each cluster affects execution. I plan to do that next.

Regarding the clusters' frequencies, I have set both to their highest possible values: 1.86 GHz for LITTLE and 2.36 GHz for big. Even with that configuration, I still see the performance difference I mentioned. Additionally, I monitored the frequency while running ResNet50 and did not see any CPU frequency transitions during execution, so I think frequency scaling can be ruled out as a cause.
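
One way to do this kind of frequency monitoring is to poll the cpufreq sysfs files while the benchmark runs. The following is a minimal sketch under stated assumptions, not the script actually used here: the function names are hypothetical, and it assumes the kernel exposes `cpuN/cpufreq/scaling_cur_freq` (in kHz) under `/sys/devices/system/cpu`, as the standard Linux cpufreq interface does.

```python
import time
from pathlib import Path

CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_cur_freq_khz(cpu, root=CPUFREQ_ROOT):
    """Return the current frequency of one core in kHz, as reported by cpufreq."""
    path = Path(root) / f"cpu{cpu}" / "cpufreq" / "scaling_cur_freq"
    return int(path.read_text().strip())


def sample_freqs(cpus, samples, interval_s, root=CPUFREQ_ROOT):
    """Poll the given cores `samples` times; return {cpu: [kHz, ...]}."""
    cpus = list(cpus)
    history = {cpu: [] for cpu in cpus}
    for i in range(samples):
        for cpu in cpus:
            history[cpu].append(read_cur_freq_khz(cpu, root))
        if i + 1 < samples:
            time.sleep(interval_s)
    return history


def cores_with_transitions(history):
    """Return the sorted cores whose sampled frequency changed at least once."""
    return sorted(cpu for cpu, freqs in history.items() if len(set(freqs)) > 1)
```

For example, `cores_with_transitions(sample_freqs(range(8), samples=100, interval_s=0.1))` would list the cores that changed frequency during a 10-second window. Polling can miss short-lived transitions, and `scaling_cur_freq` usually reports the last frequency requested by the driver, which may not match what the hardware is actually running at.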

Thanks for the publication. I checked the scripts and don't see anything unusual or unexpected in the board configuration or in the arguments passed to the executables.

I monitored the temperature as you suggested. There is a ~9 °C difference on the board between using the big and the LITTLE cores. However, as I mentioned above, I was also monitoring the core frequencies and did not observe any transition, even though the temperature is higher.
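
Temperature readings of this kind can be collected from the generic Linux thermal sysfs interface. The sketch below is an illustration only: `read_zone_temps_c` is a hypothetical helper, and which thermal zones the HiKey970 kernel exposes (and what they are called) is not stated in this thread. It assumes each `thermal_zone*` directory under `/sys/class/thermal` has a `temp` file in millidegrees Celsius and, optionally, a `type` file.

```python
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def read_zone_temps_c(root=THERMAL_ROOT):
    """Return {zone label: temperature in degrees C} for every readable zone."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        type_file = zone / "type"
        try:
            kind = type_file.read_text().strip() if type_file.is_file() else "unknown"
            # The kernel reports millidegrees Celsius.
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones cannot be read or report no valid value; skip them.
            continue
        temps[f"{zone.name} ({kind})"] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    for label, temp_c in read_zone_temps_c().items():
        print(f"{label}: {temp_c:.1f} C")
```

Sampling this alongside the core frequencies during a run makes it easier to see whether a temperature rise coincides with any throttling.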