Hi all,

I am using the HiKey970 board to run inference on neural networks. The board comprises ARM Cortex-A73 and ARM Cortex-A53 cores.

I am using `taskset` to pin the inference process (which spawns 4 threads) once to the LITTLE cores (0-3) and once to the big cores (4-7). Contrary to what I was expecting, the inference time is almost double when running on the big cores compared to the LITTLE cores.

Is there an explanation for this behavior? Are there tools that can help me understand why the threads are slower when using the big cores?

To be more precise, the board is flashed with kernel version 4.9.78-147538-g244928755bbe, and the code that I am using can be found in this repo.
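For anyone who wants to reproduce the comparison, the pinning described above can be approximated as follows. This is a minimal sketch, not the code from the repo: the helper names `parse_cpu_list` and `timed_pinned_run` are made up for illustration, and it assumes a Linux system where `os.sched_setaffinity` is available. Threads spawned by the child process inherit the affinity mask, which is the same effect `taskset` has.

```python
import os
import subprocess
import time


def parse_cpu_list(spec):
    """Expand a taskset-style CPU list such as "0-3" or "0,2,4-5" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def timed_pinned_run(cmd, cpus):
    """Run cmd restricted to the given CPUs and return the wall-clock time in seconds."""

    def pin():
        # Executed in the child process before exec; pid 0 means "this process".
        os.sched_setaffinity(0, cpus)

    start = time.perf_counter()
    subprocess.run(cmd, check=True, preexec_fn=pin)
    return time.perf_counter() - start
```

A call such as `timed_pinned_run(["./inference_binary"], parse_cpu_list("4-7"))` would then time one run on the big cluster, where `./inference_binary` is a placeholder for the actual executable. Repeating each measurement several times and comparing medians makes the big versus LITTLE comparison less sensitive to one-off noise.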
Hi Willy,

It seems that I am not able to keep the DDR frequency constant. Bumping the kernel could help, but that is outside my time budget: I would first need to learn how to port a newer kernel to the board.
You are right, I need to look at the hardware counters more closely in order to understand how the architecture of each cluster affects execution. I plan to do that next.

Regarding the clusters' frequencies, I have set both to the highest possible value: 1.86 GHz for LITTLE and 2.36 GHz for big. Even with that configuration, I still see the performance difference I mentioned. Additionally, I monitored the frequency while running ResNet50 and did not see any transitions in CPU frequency during execution, so I think that can be ruled out as a cause.

Thanks for the publication. I checked the scripts and don't see anything odd or unexpected in the configuration of the board or in the arguments to the executables.

I monitored the temperature as you suggested. There is a ~9 °C difference on the board between using the big and the LITTLE cores. As I mentioned, though, I was monitoring the frequency of the cores at the same time and did not observe any transition, even though the temperature is higher.
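The kind of monitoring described above can be sketched as follows. It polls the standard Linux sysfs files `cpufreq/scaling_cur_freq` (in kHz) and `thermal_zone*/temp` (in millidegrees Celsius). Whether the HiKey970 kernel exposes these files, and which thermal zone corresponds to which cluster, are assumptions here; the function names are hypothetical. Keep in mind that polling can miss throttling events shorter than the sampling interval.

```python
import time
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def sample(cpus, zone=0, sysfs="/sys"):
    """Take one reading: ({cpu: frequency in kHz}, temperature in millidegrees C)."""
    root = Path(sysfs)
    freqs = {
        cpu: read_int(root / f"devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq")
        for cpu in cpus
    }
    # Assumption: thermal_zone<zone> is the sensor of interest on this board.
    temp = read_int(root / f"class/thermal/thermal_zone{zone}/temp")
    return freqs, temp


def monitor(cpus, seconds, interval=0.1, zone=0, sysfs="/sys"):
    """Collect samples for roughly `seconds`, sleeping `interval` between readings."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        samples.append(sample(cpus, zone, sysfs))
        time.sleep(interval)
    return samples


def frequency_changes(samples):
    """List (sample index, cpu, old kHz, new kHz) for every observed change."""
    changes = []
    for i in range(1, len(samples)):
        previous, current = samples[i - 1][0], samples[i][0]
        for cpu, khz in current.items():
            if previous.get(cpu) != khz:
                changes.append((i, cpu, previous.get(cpu), khz))
    return changes
```

Running `monitor(range(4, 8), seconds=30)` in a separate process while the inference executes, then passing the result to `frequency_changes`, gives an explicit list of any transitions instead of relying on watching the values by eye. An empty list supports the conclusion that the CPU frequency stayed fixed, at least at the chosen sampling interval.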