Hi,
We found the following document on Cortex-A9 performance.
List of ARM microarchitectures - Wikipedia, the free encyclopedia
It claims 2.5 DMIPS/MHz per core for a dual-core Cortex-A9 at 2 GHz. However, our Dhrystone result on a dual-core Cortex-A9 at 1.2 GHz showed only roughly 1 DMIPS/MHz per core.
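For reference, DMIPS is conventionally Dhrystones per second divided by 1757 (the VAX 11/780 score, taken as 1 MIPS), and DMIPS/MHz divides that by the clock in MHz. A small sketch of the conversion; the helper names are hypothetical:

```c
/* Conventional VAX 11/780 reference: 1757 Dhrystones per second = 1 DMIPS. */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Hypothetical helper: DMIPS/MHz for one core, given the Dhrystones per
   second that core achieved and its clock frequency in MHz. */
double dmips_per_mhz(double dhrystones_per_sec, double mhz)
{
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC / mhz;
}

/* Inverse: Dhrystones per second one core must reach to hit a given
   DMIPS/MHz figure at a given clock in MHz. */
double required_dhrystones_per_sec(double target_dmips_per_mhz, double mhz)
{
    return target_dmips_per_mhz * mhz * VAX_DHRYSTONES_PER_SEC;
}
```

At 1.2 GHz, 2.5 DMIPS/MHz therefore means about 5,271,000 Dhrystones per second per core, while 1 DMIPS/MHz is 2,108,400.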
We downloaded the Dhrystone benchmark from the following link and cross-compiled it with gcc version 4.5.2.
dhrystone 2.1 - Download, Browsing & More | Fossies Archive
We realize that Dhrystone results vary with the compiler and OS, but I would like to know whether our result is reasonable.
Best,
Ying
No, I don't think that result is reasonable.
I think the test should fit in the cache, so memory speed shouldn't matter.
I'm sorry, this is the "is it plugged in?" question, but just to be certain:
What flags did you supply to gcc? Did you specify gcc -O3 -Ofast -mcpu=cortex-a9 -mfpu=neon-fp16 -DNDEBUG? (The FPU option shouldn't really matter, but it gives a standard set; only -O3 should have a large effect.)
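One way to make that check concrete is to have the benchmark report which options the compiler actually saw. A sketch using GCC's predefined macros (__OPTIMIZE__, __ARM_ARCH_7A__, __ARM_NEON__, NDEBUG, __VERSION__); describe_build is a hypothetical helper, and these macro names are GCC's, so other compilers may differ:

```c
#include <stdio.h>

/* Hypothetical helper: writes a one-line summary of the build options into
   buf, based on GCC's predefined macros. An undefined macro means the
   corresponding option was not in effect for this translation unit. */
void describe_build(char *buf, size_t len)
{
#ifdef __OPTIMIZE__
    const char *opt = "optimizing";
#else
    const char *opt = "NOT optimizing (-O0)";
#endif
#ifdef __ARM_ARCH_7A__
    const char *arch = "ARMv7-A";
#else
    const char *arch = "not ARMv7-A";
#endif
#ifdef __ARM_NEON__
    const char *neon = "NEON enabled";
#else
    const char *neon = "NEON not enabled";
#endif
#ifdef NDEBUG
    const char *dbg = "NDEBUG defined";
#else
    const char *dbg = "NDEBUG not defined";
#endif
    snprintf(buf, len, "gcc %s: %s, %s, %s, %s",
             __VERSION__, opt, arch, neon, dbg);
}
```

Compiling this into the same file as the benchmark and printing the string next to the score removes any doubt about which flags the Makefile really passed.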
How are you certain it was running at 1.2 GHz rather than being thermally throttled, as many smartphones are after doing hard work for a while?
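A quick sketch for checking this on Linux, assuming the kernel has cpufreq enabled: it reads the standard cpufreq sysfs file for one CPU, which reports in kHz the frequency the kernel believes the core is running at. This is not an independent measurement of the clock, and the file is absent if cpufreq is not configured:

```c
#include <stdio.h>

/* Returns the current frequency the Linux cpufreq driver reports for one
   CPU, in kHz, or -1 if the sysfs file is missing or unreadable. */
long read_cur_freq_khz(int cpu)
{
    char path[96];
    long khz = -1;
    FILE *f;

    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &khz) != 1)
        khz = -1;
    fclose(f);
    return khz;
}
```

Sampling it for both cores before and after the benchmark (or repeatedly from a second shell while it runs) would show whether the clock drops.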
I checked the Makefile of the benchmark I downloaded, and it uses the flags -O -DTIMES -DHZ=60. With HZ=60 I got 1 DMIPS/MHz per core.
However, I found that the internal timer frequency of my kernel is 300 Hz. So when I changed HZ to 300, I got 1.36 DMIPS/MHz per core, which is still lower than 2.5 DMIPS/MHz per core.
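One thing worth checking here: with -DTIMES, Dhrystone 2.1 times itself with times(), and on Linux the values times() returns are in clock ticks of sysconf(_SC_CLK_TCK) (the user-visible USER_HZ, commonly 100), not the kernel's internal timer frequency. Because the benchmark computes Dhrystones per second as HZ * runs / ticks, a -DHZ that differs from the real tick rate scales the reported score by HZ / ticks-per-second. A sketch (helper names hypothetical) that queries the tick rate at run time and converts times() output to seconds without a compile-time HZ:

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/times.h>
#include <unistd.h>

/* Ticks per second used by times(); on Linux this is USER_HZ,
   which need not equal the kernel's internal timer frequency. */
long clock_ticks_per_sec(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* User CPU seconds consumed by this process so far, converted with the
   run-time tick rate instead of a compile-time HZ constant. */
double user_cpu_seconds(void)
{
    struct tms t;
    times(&t);
    return (double)t.tms_utime / (double)clock_ticks_per_sec();
}
```

Printing clock_ticks_per_sec() on the target gives the value that -DHZ should match.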
I also found a configuration used by other people and changed the flags to -O3 -DTIME -march=armv7-a. With it I got 1.33 DMIPS/MHz per core, which is similar to the result above with HZ=300.
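With -DTIME, Dhrystone 2.1 times itself with time(), which has only one-second resolution, so the run must be long for the result to be meaningful; with -DTIMES the resolution is one clock tick. A rough sketch, assuming the timing error is at most one timer step, of how many runs that implies (min_runs is a hypothetical helper):

```c
/* Hypothetical helper: smallest number of runs that keeps the timer
   quantization error below max_rel_err, given an estimated Dhrystones per
   second and the timer resolution in seconds (1.0 for time(),
   1.0 / sysconf(_SC_CLK_TCK) for times()). */
unsigned long long min_runs(double est_dhrystones_per_sec,
                            double timer_resolution_s,
                            double max_rel_err)
{
    /* The run must last at least resolution / max_rel_err seconds. */
    double runs = est_dhrystones_per_sec * timer_resolution_s / max_rel_err;
    unsigned long long n = (unsigned long long)runs;
    if ((double)n < runs)
        n++; /* round up to a whole run */
    return n;
}
```

For example, at about 2.9 million Dhrystones per second (1.37 DMIPS/MHz at 1.2 GHz), keeping the error under 1% with time() needs a run of at least 100 seconds, roughly 289 million runs.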
Which of these is the correct setting? I also think the result is still not reasonable. I am not sure about the state of the system while the benchmark runs; I just run it from the console after the system boots.
Any help would be appreciated.
Well, I haven't the foggiest idea what could be causing the problem.
There is a note from ARM about running the benchmark at
http://infocenter.arm.com/help/topic/com.arm.doc.dai0273a/DAI0273A_dhrystone_benchmarking.pdf
They use armcc, but I can't see that making such a big difference for something like this.
The test does use some string functions, but even if one rewrote them in C instead of using the optimised library versions, I still don't think it would make a difference of that size.
It would be nice to have a very simple loop, tuned to the A9, that could be used to check that the clock is exactly what one thinks it is.
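As a rough starting point for such a check, here is a portable C sketch (not tuned to the A9; the names are hypothetical) that times a chain of dependent additions with clock_gettime(CLOCK_MONOTONIC). It only reports loop iterations per microsecond; turning that into MHz requires knowing how many cycles one iteration takes on the A9, which is an assumption to verify against the generated assembly:

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs n iterations of a chain of dependent additions. The empty asm
   statement (a GCC/Clang extension) makes x opaque to the optimizer, so the
   loop cannot be folded into a closed-form result, vectorized, or moved
   across the clock_gettime() calls. */
static uint32_t dependent_adds(uint32_t n)
{
    uint32_t x = 0;
    for (uint32_t i = 0; i < n; i++) {
        x += i;
        __asm__ volatile("" : "+r"(x) : : "memory");
    }
    return x;
}

/* Loop iterations completed per microsecond. Multiplying by the loop's
   cycles per iteration (established separately) gives an estimate of MHz. */
double iterations_per_us(uint32_t n)
{
    volatile uint32_t sink;
    double t0 = now_seconds();
    sink = dependent_adds(n);
    double t1 = now_seconds();
    (void)sink;
    return (double)n / ((t1 - t0) * 1e6);
}
```

Use a large n (tens of millions) so the run lasts well beyond the timer resolution; with older glibc versions clock_gettime also needs linking with -lrt.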
It seems that the benchmark result depends heavily on the compiler. I am not using armcc.
The compiler is gcc version 4.5.2 (Sourcery G++ Lite 2011.03-41) and the Linux kernel is 3.4.5.
The results and the flags are as follows.
DMIPS/MHz/core=0.49 with flags -O0 -DTIME -march=armv7-a
DMIPS/MHz/core=1.07 with flags -O1 -DTIME -march=armv7-a
DMIPS/MHz/core=1.31 with flags -O2 -DTIME -march=armv7-a
DMIPS/MHz/core=1.37 with flags -O3 -DTIME -march=armv7-a
Which compiler flags should I use?
Notably, all of them are much lower than the official 2.5 DMIPS/MHz/core...