Hello,
I ran some experiments on an Odroid XU3 and noticed an interesting effect in square root calculation.
I got unexpected results when measuring the execution time of 50 million square root operations:
    double temp = 5.0;
    double squareRoot;
    for (int i = 0; i < 50000000; i++) {
        squareRoot = sqrt(temp);
        if ((int)squareRoot % 2 == 0)
            temp += 0.5;
        else
            temp += 0.7;
    }
At its maximum frequency (2.0 GHz), the Cortex-A15 is less than one and a half times as fast as the Cortex-A7 at its maximum frequency (1.4 GHz): 10.9 seconds versus 13.2 seconds, respectively. However, when the execution time was measured at identical frequencies (1.4 GHz, 1.3 GHz, 1.2 GHz, …, 300 MHz, 200 MHz), the A7 was faster than the A15. For example, at 1.0 GHz the A15 core finished the task in 21.9 seconds, whereas the A7 core needed only 18.5 seconds and four times less power.
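To compare the two cores independently of clock speed, the timings above can be converted into average clock cycles per loop iteration. The following is only a small sketch of that arithmetic; the function name cycles_per_iteration is made up for illustration and is not part of the original test program.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post).
   Average core clock cycles spent per loop iteration:
   cycles = elapsed seconds * clock frequency in Hz / iteration count. */
double cycles_per_iteration(double seconds, double freq_hz, long iterations)
{
    return seconds * freq_hz / (double)iterations;
}

/* Print one labelled measurement in cycles per iteration. */
void print_cycles(const char *label, double seconds, double freq_hz,
                  long iterations)
{
    printf("%s: %.1f cycles per iteration\n", label,
           cycles_per_iteration(seconds, freq_hz, iterations));
}
```

With the figures quoted above (50 million iterations), this gives about 438 cycles per iteration for the A15 at 1.0 GHz (21.9 s) and 436 at 2.0 GHz (10.9 s), against about 370 for the A7 at both 1.0 GHz (18.5 s) and 1.4 GHz (13.2 s). The per-iteration cost staying nearly constant across frequencies suggests the loop is bound by the core itself rather than by memory. Several hundred cycles per iteration also seems like a lot for a single square root, so it may be worth checking whether the build actually uses a hardware instruction such as VSQRT or falls back to a software routine.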
The same trend was observed with the sine and cosine functions. Experiments with logarithm, addition, subtraction, multiplication and division gave the anticipated result: in those cases the A15 was more than twice as fast as the A7 at the same frequency and almost three times as fast at maximum frequency.
I used the taskset command to bind the task to a specific core. To change the frequency, I used cpufrequtils.
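When repeating this kind of measurement, it can help to confirm that the requested frequency is really in effect on the core under test, since the governor or thermal management may change it during a run. The sketch below reads the scaling_cur_freq file from sysfs, the same path queried with cat further down in this thread. The helper names read_khz_from and read_cur_freq_khz are made up for illustration, and the sketch assumes a Linux kernel that exposes cpufreq under /sys.

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer (frequency in kHz) from a
   sysfs-style text file. Returns -1 if the file cannot be opened or
   does not start with a number. */
long read_khz_from(const char *path)
{
    FILE *f = fopen(path, "r");
    long khz = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &khz) != 1)
        khz = -1;
    fclose(f);
    return khz;
}

/* Hypothetical helper: current frequency of the given CPU in kHz,
   or -1 if cpufreq information is not available for that CPU. */
long read_cur_freq_khz(int cpu)
{
    char path[128];

    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
    return read_khz_from(path);
}
```

Calling read_cur_freq_khz for the pinned core before and after the timed loop, and discarding runs where the two values differ, is one simple way to rule out frequency changes as the cause of the difference.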
Could you please advise where I can find out why the A15 (or A7) behaves this way during square root calculation? Where is the operation of the FPU described for these processors?
Thank you for your help.
Best regards
It does not seem to happen on the Odroid XU4, though...
===================================================
root@odroid:/home/odroid/Projects/astro/src/examples# cat square_root.c
#include <math.h>
int main(int argc, char** argv) {
    double temp = 5.0;
    double squareRoot;
    for (int i = 0; i < 50000000; i++) {
        squareRoot = sqrt(temp);
        if ((int)squareRoot % 2 == 0)
            temp += 0.5;
        else
            temp += 0.7;
    }
    return 0;
}
===================================================
root@odroid:/home/odroid/Projects/astro/src/examples# c++ square_root.c
root@odroid:/home/odroid/Projects/astro/src/examples# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
1000000
root@odroid:/home/odroid/Projects/astro/src/examples# cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq
1000000
root@odroid:/home/odroid/Projects/astro/src/examples# c++ square_root.c
root@odroid:/home/odroid/Projects/astro/src/examples# time taskset -c 0 ./a.out
real 0m15.693s
user 0m15.610s
sys 0m0.050s
root@odroid:/home/odroid/Projects/astro/src/examples# time taskset -c 4 ./a.out
real 0m13.815s
user 0m13.795s
sys 0m0.010s
===================================================
For the sin and cos functions, do you think it could be because of the heavy memory access these functions can involve? Implementations of sin and cos sometimes rely heavily on precomputation and table lookups.