Hi all,
I am using the HiKey970 board to run inference on neural networks. The board has ARM Cortex-A73 (big) and ARM Cortex-A53 (LITTLE) cores. I use `taskset` to pin the inference process (which spawns 4 threads) once to the LITTLE cores (0-3) and once to the big cores (4-7). Contrary to what I expected, inference takes almost twice as long on the big cores as on the LITTLE cores.
Is there an explanation for this behavior? Are there tools that can help me understand why the threads are slower on the big cores?
To be more precise, the board is flashed with kernel version 4.9.78-147538-g244928755bbe, and the code I am using can be found in this repo.
The VGG models (VGG16 and VGG19) behave as expected: performance is ~40% better on the big cores than on the LITTLE cores. The best performance is obtained on the board's GPU.
You may need to investigate further, probably with performance counters. You could use Streamline to do so: developer.arm.com/.../streamline-performance-analyzer
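Streamline is one option. From the command line, `perf stat` can read cycle and instruction counters for each cluster; a much lower instructions-per-cycle (IPC) figure on the Cortex-A73 cores would suggest the threads are stalling (for example on memory) rather than being limited by clock speed. This is a minimal sketch, assuming `perf` and `taskset` are installed on the board and that `perf stat -x,` writes the CSV layout of recent perf versions (value, unit, event name, ...). The function names are illustrative only.

```shell
#!/bin/sh
# Sketch: compare IPC between clusters using perf stat.
# Assumes perf and taskset are installed and perf's CSV layout is
# "value,unit,event,..." (recent perf versions).

# ipc_from_perf_csv: read `perf stat -x,` output on stdin and print
# instructions/cycles. Counters reported as "<not supported>" or
# "<not counted>" are summed as zero by awk; prints "n/a" if no cycles.
ipc_from_perf_csv() {
    awk -F, '
        $3 ~ /cycles/       { cycles += $1 }
        $3 ~ /instructions/ { insns  += $1 }
        END {
            if (cycles > 0) printf "%.2f\n", insns / cycles
            else            print "n/a"
        }'
}

# profile_on_cpus CPULIST CMD [ARGS...]: run CMD pinned to CPULIST under
# perf stat and print the resulting IPC.
profile_on_cpus() {
    cpus=$1
    shift
    tmp=$(mktemp)
    # -x, selects CSV output; -o writes the report to a file instead of stderr.
    perf stat -x, -o "$tmp" -e cycles,instructions taskset -c "$cpus" "$@"
    printf 'CPUs %s: IPC %s\n' "$cpus" "$(ipc_from_perf_csv < "$tmp")"
    rm -f "$tmp"
}
```

For example, after sourcing the script, `profile_on_cpus 0-3 ./inference` followed by `profile_on_cpus 4-7 ./inference`, where `./inference` is a placeholder for the actual benchmark command.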
A fixed DDR frequency would help. Since the performance governor is not available, you could instead force min_freq and max_freq to the same value. The available DDR frequencies (in Hz) are:
415000000 830000000 1244000000 1866000000
sudo bash -c "echo 1866000000 > /sys/class/devfreq/ddr_devfreq/min_freq"
sudo bash -c "echo 1866000000 > /sys/class/devfreq/ddr_devfreq/max_freq"
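The two commands above can be paired with a check that the setting actually took effect. This is a minimal sketch, assuming the devfreq device also exposes the standard `available_frequencies` and `cur_freq` attributes next to `min_freq`/`max_freq`; `DEVFREQ_DIR` is a hypothetical override for dry runs. It writes `max_freq` before `min_freq`, so on kernels that reject a `min_freq` above the current `max_freq` the update does not fail if the maximum had previously been lowered.

```shell
#!/bin/sh
# Sketch: pin the DDR devfreq device to its highest available frequency
# and verify it. Must run as root on the board. DEVFREQ_DIR is a
# hypothetical override (defaults to the path used above).
DEVFREQ_DIR=${DEVFREQ_DIR:-/sys/class/devfreq/ddr_devfreq}

pin_ddr_to_max() {
    # available_frequencies is a space-separated list in Hz.
    top=$(tr ' ' '\n' < "$DEVFREQ_DIR/available_frequencies" | sort -n | tail -n 1)
    # Raise max_freq first so min_freq never exceeds max_freq mid-update.
    echo "$top" > "$DEVFREQ_DIR/max_freq"
    echo "$top" > "$DEVFREQ_DIR/min_freq"
}

# ddr_is_pinned: print min/max/cur and succeed only if all three match.
ddr_is_pinned() {
    min=$(cat "$DEVFREQ_DIR/min_freq")
    max=$(cat "$DEVFREQ_DIR/max_freq")
    cur=$(cat "$DEVFREQ_DIR/cur_freq")
    echo "DDR min=$min max=$max cur=$cur (Hz)"
    [ "$min" = "$max" ] && [ "$cur" = "$max" ]
}
```

After sourcing the script as root, `pin_ddr_to_max; ddr_is_pinned` prints the three values and returns non-zero if `cur_freq` has not settled at the pinned value.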
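Also, try to use the same frequency on both clusters. The available frequencies (in kHz) for the LITTLE cluster (policy0) and the big cluster (policy4) are: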
509000 1018000 1210000 1402000 1556000 1690000 1844000
682000 1018000 1210000 1364000 1498000 1652000 1863000 2093000 2362000
sudo bash -c "echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor"
sudo bash -c "echo 1018000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq"
sudo bash -c "echo performance > /sys/devices/system/cpu/cpufreq/policy4/scaling_governor"
sudo bash -c "echo 1018000 > /sys/devices/system/cpu/cpufreq/policy4/scaling_max_freq"
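The two lists above have only 1018000 and 1210000 kHz in common, so either value can serve as the shared cap. This is a minimal sketch that computes the intersection, assuming (as seems to be the case here) that the lists come from each policy's `scaling_available_frequencies` file; `CPUFREQ_DIR` is a hypothetical override for testing.

```shell
#!/bin/sh
# Sketch: list CPU frequencies (kHz) supported by both clusters.
# Assumes policy0 = LITTLE (A53) and policy4 = big (A73), and that the
# cpufreq driver exposes scaling_available_frequencies.
CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpufreq}

# common_freqs LIST_A LIST_B: print values present in both
# space-separated lists, one per line, in ascending order.
common_freqs() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 1; i <= NF; i++) seen[$i] = 1; next }
                { for (i = 1; i <= NF; i++) if ($i in seen) print $i }' | sort -n
}

# print_common_cluster_freqs: read both policies and print the shared values.
print_common_cluster_freqs() {
    little=$(cat "$CPUFREQ_DIR/policy0/scaling_available_frequencies")
    big=$(cat "$CPUFREQ_DIR/policy4/scaling_available_frequencies")
    common_freqs "$little" "$big"
}
```

Any value printed by `print_common_cluster_freqs` can then be written to both `scaling_max_freq` files, as in the commands above.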
Remember to keep monitoring temperature and frequency changes while you investigate. You could also reduce interference on the hardware by removing "useless" processes: stop all "unneeded" background services and run everything from a serial console, without any GUI. This reduces context switching and other disturbances for your application.
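For the monitoring part, a small logger can sample the thermal zones and the current frequency of each cpufreq policy while the benchmark runs. This is a sketch assuming the standard Linux sysfs layout (`thermal_zone*/temp` in millidegrees Celsius, `policy*/scaling_cur_freq` in kHz); `SYSFS_ROOT` is a hypothetical override so the functions can be tried on a fake directory tree.

```shell
#!/bin/sh
# Sketch: log temperatures and current CPU frequencies during a run.
# SYSFS_ROOT is a hypothetical override (defaults to /sys).
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# sample_once: print one line with all thermal-zone temps (millidegrees C)
# and all policy frequencies (kHz). Missing files are skipped.
sample_once() {
    temps=""
    for f in "$SYSFS_ROOT"/class/thermal/thermal_zone*/temp; do
        if [ -r "$f" ]; then temps="$temps $(cat "$f")"; fi
    done
    freqs=""
    for f in "$SYSFS_ROOT"/devices/system/cpu/cpufreq/policy*/scaling_cur_freq; do
        if [ -r "$f" ]; then freqs="$freqs $(cat "$f")"; fi
    done
    echo "temps(mC):$temps | freqs(kHz):$freqs"
}

# monitor COUNT INTERVAL: take COUNT timestamped samples, INTERVAL seconds apart.
monitor() {
    n=0
    while [ "$n" -lt "$1" ]; do
        printf '%s ' "$(date +%T)"
        sample_once
        n=$((n + 1))
        if [ "$n" -lt "$1" ]; then sleep "$2"; fi
    done
}
```

For example, `monitor 300 1 > run.log &` started just before the inference records 300 one-second samples; a drop in the frequencies as temperatures climb would point to thermal throttling.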
Thanks for the advice, I will try it out.
I am posting the following about setting the memory and GPU frequency, in case someone else comes across this thread. For this specific distribution, kernel and device (Lebian, 4.9.78-147538-g244928755bbe, HiKey 970),