Why is performance higher on LITTLE cores?

Hi all,

I am using the HiKey970 board to run inference on neural networks. The board has ARM Cortex-A73 (big) and ARM Cortex-A53 (LITTLE) cores.
I am using `taskset` to pin the inference process (which spawns 4 threads) once to the LITTLE cores (0-3) and once to the big cores (4-7). Contrary to what I expected, the inference time almost doubles on the big cores compared to the LITTLE cores.
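
For anyone reproducing this comparison, here is a minimal sketch of timing one pinned run per cluster. The core numbering (0-3 LITTLE, 4-7 big) comes from the post above; the helper name `time_pinned` and the `./run_inference` command are hypothetical placeholders, and `date +%s.%N` assumes GNU date.

```shell
#!/usr/bin/env bash
# Sketch only: time one run of a command pinned to a CPU list with taskset.
# ./run_inference below is a hypothetical stand-in for the real inference binary.

# time_pinned CPU_LIST CMD [ARGS...]
# Prints the elapsed wall-clock time of the pinned run, in seconds.
time_pinned() {
    local cpus start end
    cpus=$1
    shift
    start=$(date +%s.%N)                 # GNU date: seconds.nanoseconds
    taskset -c "$cpus" "$@" >/dev/null   # discard the command's stdout
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Usage on the board:
#   echo "LITTLE: $(time_pinned 0-3 ./run_inference) s"
#   echo "big:    $(time_pinned 4-7 ./run_inference) s"
```

Repeating each measurement several times and comparing the spread is probably more informative than a single run per cluster.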

Is there an explanation for this behavior? Are there tools that can help me understand why the threads are slower when using big cores?

To be more precise, the board is flashed with kernel version 4.9.78-147538-g244928755bbe, and the code that I am using can be found in this repo.

Parents
  • Hi zois,

    Interesting.

    You may need to investigate deeper, probably with performance counters.
    You could use Streamline to do so: developer.arm.com/.../streamline-performance-analyzer

    Fixing the DDR frequency would make the runs easier to compare. Since the performance governor is unavailable, you could instead force the min and max of the range to the same value:

    cat /sys/class/devfreq/ddr_devfreq/available_frequencies
    415000000 830000000 1244000000 1866000000
    
    sudo bash -c "echo 1866000000 > /sys/class/devfreq/ddr_devfreq/min_freq"
    sudo bash -c "echo 1866000000 > /sys/class/devfreq/ddr_devfreq/max_freq"
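
The same step can be sketched as a small helper that reads `available_frequencies` and writes the highest value to both `max_freq` and `min_freq`, so the number does not have to be typed by hand. The function names `highest_freq` and `pin_devfreq` are made up for this illustration; the sysfs file names are the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch only: pin a devfreq device to its highest advertised frequency.

# highest_freq: read a whitespace-separated list of frequencies on stdin
# and print the largest one.
highest_freq() {
    tr -s ' \t' '\n' | grep -E '^[0-9]+$' | sort -n | tail -n 1
}

# pin_devfreq DIR: write the highest available frequency to DIR/max_freq
# and DIR/min_freq. DIR is a devfreq directory such as
# /sys/class/devfreq/ddr_devfreq (path taken from the reply above).
pin_devfreq() {
    local dir freq
    dir=$1
    freq=$(highest_freq < "$dir/available_frequencies") || return 1
    [ -n "$freq" ] || return 1
    # Write max first so that min is never above max.
    echo "$freq" > "$dir/max_freq" && echo "$freq" > "$dir/min_freq"
}

# Usage on the board (as root):
#   pin_devfreq /sys/class/devfreq/ddr_devfreq
```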


    Also, try to use the same frequency on both clusters.

    cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
    509000 1018000 1210000 1402000 1556000 1690000 1844000
    cat /sys/devices/system/cpu/cpufreq/policy4/scaling_available_frequencies
    682000 1018000 1210000 1364000 1498000 1652000 1863000 2093000 2362000
    
    sudo bash -c "echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor"
    sudo bash -c "echo 1018000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq"
    
    sudo bash -c "echo performance > /sys/devices/system/cpu/cpufreq/policy4/scaling_governor"
    sudo bash -c "echo 1018000 > /sys/devices/system/cpu/cpufreq/policy4/scaling_max_freq"
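
The two lists above share only 1018000 and 1210000 kHz, so those are the only values at which both clusters can run at exactly the same frequency. As a rough sketch, the highest shared value can be picked automatically; `common_max_freq` is a hypothetical helper name, and the policy paths are the ones from the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: find the highest frequency present in two
# scaling_available_frequencies files.

# common_max_freq FILE_A FILE_B
# Prints the largest value that appears in both files; fails if none is shared.
common_max_freq() {
    awk 'NR == FNR { for (i = 1; i <= NF; i++) seen[$i] = 1; next }
         { for (i = 1; i <= NF; i++)
               if (($i in seen) && $i + 0 > best) best = $i + 0 }
         END { if (best) printf "%d\n", best; else exit 1 }' "$1" "$2"
}

# Usage on the board (as root), capping both clusters at the shared value:
#   base=/sys/devices/system/cpu/cpufreq
#   f=$(common_max_freq "$base/policy0/scaling_available_frequencies" \
#                       "$base/policy4/scaling_available_frequencies")
#   echo "$f" > "$base/policy0/scaling_max_freq"
#   echo "$f" > "$base/policy4/scaling_max_freq"
```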

    Remember to keep monitoring temperature and frequency changes while you're investigating.
    You could also limit distractions for your hardware by stopping all unneeded background services and running everything from a serial console, without any GUI. This reduces context switching and other interference with your application.
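
A minimal sketch of such monitoring, to be run in a second terminal while the inference is running. The `policy0` and `policy4` paths follow the commands above; `thermal_zone0` is an assumption, since the zone that corresponds to the CPU clusters may have a different index on this board. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print the current frequency of both clusters and one
# thermal zone reading on a single line.

# read_or_na FILE: print the file contents, or "n/a" if it cannot be read.
read_or_na() {
    if [ -r "$1" ]; then cat "$1"; else echo "n/a"; fi
}

# sample_state [ROOT]
# ROOT is an optional prefix for the sysfs tree (empty on a real system).
# thermal_zone0 is an assumed zone index; adjust it for the board.
sample_state() {
    local root little big temp
    root=${1:-}
    little=$(read_or_na "$root/sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq")
    big=$(read_or_na "$root/sys/devices/system/cpu/cpufreq/policy4/scaling_cur_freq")
    temp=$(read_or_na "$root/sys/class/thermal/thermal_zone0/temp")
    # Frequencies are in kHz, temperature in millidegrees Celsius.
    echo "little_khz=$little big_khz=$big temp_millic=$temp"
}

# Usage on the board:
#   while sleep 1; do sample_state; done
```

If the big cluster's frequency drops or the temperature climbs during a run, thermal throttling is worth ruling out before looking at anything else.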

Children
  • Thanks for the advice, I will try it out.

    I am posting the following about setting the memory and GPU frequency, in case someone else comes across this thread.
    For the specific distribution, kernel and device (Lebian, 4.9.78-147538-g244928755bbe, HiKey970),

    /sys/devices/platform/ddr_devfreq/devfreq/ddr_devfreq/cur_freq

    is not writable. Even if we change the permissions and write to the file, the frequency does not seem to change. Instead, if we change the memory or GPU governor to `userspace`, a new directory named `userspace` is created under

    /sys/devices/platform/ddr_devfreq/devfreq/ddr_devfreq

    Within that directory, there is a file `cur_freq` that we can write to in order to set the frequency to the desired value.
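
The steps described above can be sketched as a small helper. `set_devfreq_userspace` is a hypothetical name, and the `governor` file under the device directory is assumed to be the standard devfreq one. The file name defaults to `cur_freq` as reported above for this kernel; as far as I know, mainline kernels name the userspace governor's file `set_freq`, so it is left as an optional argument.

```shell
#!/usr/bin/env bash
# Sketch only: switch a devfreq device to the userspace governor and
# write the desired frequency into the directory that then appears.

# set_devfreq_userspace DIR FREQ [FILE]
# DIR  : devfreq device directory, for example
#        /sys/devices/platform/ddr_devfreq/devfreq/ddr_devfreq
# FREQ : target frequency in Hz
# FILE : file inside DIR/userspace (default cur_freq, as reported in the post)
set_devfreq_userspace() {
    local dir freq file
    dir=$1
    freq=$2
    file=${3:-cur_freq}
    echo userspace > "$dir/governor" || return 1
    # Per the post, the kernel creates DIR/userspace after the switch.
    if [ ! -d "$dir/userspace" ]; then
        echo "no userspace directory under $dir" >&2
        return 1
    fi
    echo "$freq" > "$dir/userspace/$file"
}

# Usage on the board (as root):
#   set_devfreq_userspace /sys/devices/platform/ddr_devfreq/devfreq/ddr_devfreq 1866000000
```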