Square root calculation results. FPU logic of A15 and A7 CPUs on Odroid-XU3 board.

Hello,

I ran some experiments on an Odroid-XU3 and noticed an interesting effect in square root calculation.

I got unexpected results when measuring the execution time of 50 million square root operations:

  double temp = 5.0;
  double squareRoot;

  for (int i = 0; i < 50000000; i++)
  {
     squareRoot = sqrt(temp);
     if ((int)squareRoot % 2 == 0)
         temp += 0.5;
     else
         temp += 0.7;
  }

 

At maximum frequency, the Cortex-A15 (2.0 GHz) is less than one and a half times as fast as the Cortex-A7 (1.4 GHz): 10.9 seconds versus 13.2 seconds, respectively. However, when both cores ran at the same frequency (1.4 GHz, 1.3 GHz, 1.2 GHz, …, 300 MHz, 200 MHz), the A7 was faster than the A15. For example, at 1.0 GHz the A15 core finished the task in 21.9 seconds, whereas the A7 core needed only 18.5 seconds and four times less power.

The same trend was observed with the sine and cosine functions. Experiments with logarithm, addition, subtraction, multiplication and division gave the expected results: in those cases, the A15 was more than twice as fast as the A7 at the same frequency and almost three times as fast at maximum frequency.

I used the taskset command to bind the task to a specific core, and cpufreq utils to change the frequency.
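
For reference, the same pinning can also be done from inside the program. This is only a sketch, assuming Linux with glibc. The helper names `pin_to_cpu` and `read_cur_freq_khz` are made up for illustration, and the sysfs file is the standard cpufreq `scaling_cur_freq` entry, which reports the frequency in kHz.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for sched_setaffinity and the CPU_* macros */
#endif
#include <sched.h>
#include <stdio.h>

/* Pins the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by the kernel). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Reads the current frequency of `cpu` in kHz from sysfs.
 * Returns -1 if the file cannot be opened or parsed. */
long read_cur_freq_khz(int cpu)
{
    char path[128];
    long khz = -1;
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &khz) != 1)
        khz = -1;
    fclose(f);
    return khz;
}
```

Checking the frequency right before and after the timed loop helps confirm that the governor did not change it during the run.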

Could you please advise where I can find out why the A15 (or A7) behaves this way during square root calculation? Where is the operation of the FPU described for these processors?

Thank you for your help.

Best regards

  • It does not seem to happen on the Odroid XU4, though...

    ===================================================
    root@odroid:/home/odroid/Projects/astro/src/examples# cat square_root.c

    #include <math.h>

    int main(int argc, char** argv) {
        double temp = 5.0;
        double squareRoot;

        for (int i = 0; i < 50000000; i++)
        {
            squareRoot = sqrt(temp);
            if ((int)squareRoot % 2 == 0)
                temp += 0.5;
            else
                temp += 0.7;
        }
        return 0;
    }

    ===================================================
    root@odroid:/home/odroid/Projects/astro/src/examples# c++ square_root.c
    root@odroid:/home/odroid/Projects/astro/src/examples# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
    1000000
    root@odroid:/home/odroid/Projects/astro/src/examples# cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq
    1000000
    root@odroid:/home/odroid/Projects/astro/src/examples# c++ square_root.c
    root@odroid:/home/odroid/Projects/astro/src/examples# time taskset -c 0 ./a.out

    real 0m15.693s
    user 0m15.610s
    sys 0m0.050s

    root@odroid:/home/odroid/Projects/astro/src/examples# time taskset -c 4 ./a.out
    real 0m13.815s
    user 0m13.795s
    sys 0m0.010s

    ===================================================
    For the sin and cos functions, do you think it could be because of the heavy memory access these functions can involve? Implementations of sin and cos sometimes rely heavily on preprocessing and table lookups.

  • 1st) Trigonometric functions are done in SW.
    2nd) Did you compile the "test" for A15 or A7? Or did you use one build optimized for A15 and one for A7?
    3rd) What else is going on?
    4th) Did you look at the resulting object code? Does it use the SQRT opcode?