Issue about Cortex-A9 Dhrystone performance

Hi,

We found the following document on Cortex-A9 performance.

List of ARM microarchitectures - Wikipedia, the free encyclopedia

It claims 2.5 DMIPS/MHz per core for a dual-core Cortex-A9 at 2GHz. However, our Dhrystone result on a dual-core Cortex-A9 at 1.2GHz showed only roughly 1 DMIPS/MHz per core.

We downloaded the Dhrystone benchmark from the following link and cross-compiled it with gcc version 4.5.2.

dhrystone 2.1 - Download, Browsing & More | Fossies Archive

We realize that Dhrystone measurements vary with the compiler and the OS, but I would like to know whether our result is reasonable.

Best,

Ying

  • No, I don't think that result is reasonable.

    I think the test should fit in the cache, so the memory speed shouldn't matter.

    I'm sorry, this is the "is it plugged in?" question, but just to be certain:

    What flags did you supply to gcc? Did you specify gcc -O3 -Ofast -mcpu=cortex-a9 -mfpu=neon-fp16 -DNDEBUG? (The FPU option shouldn't really matter, but it helps to have a standard set; only the -O3 should have a large effect.)

    How are you certain it was running at 1.2GHz rather than being thermally throttled, as many smartphones are after they have done some hard work for a while?

  • I checked the Makefile of the benchmark I downloaded, and it uses the flags -O -DTIMES -DHZ=60. With HZ=60 I got 1 DMIPS/MHz per core.

    However, I found that the internal timer frequency of my kernel is 300. When I changed HZ to 300, I got 1.36 DMIPS/MHz per core, which is still lower than 2.5 DMIPS/MHz per core.

    I also found a configuration used by other people and changed the flags to -O3 -DTIME -march=armv7-a. I got 1.33 DMIPS/MHz per core, which is similar to the result above with HZ=300.

    I was wondering which of these is the correct setting, and I still don't think the result is reasonable. I am not sure what state the system was in while the benchmark ran; I just ran the benchmark in console mode after the system booted.

    Any help would be appreciated.
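
    One thing that may be worth checking here: with -DTIMES, Dhrystone 2.1 measures its run time with times() and converts the tick count to seconds by dividing by the HZ macro, so HZ should match the tick rate that times() reports. On Linux that rate is available as sysconf(_SC_CLK_TCK), and it is not necessarily the same as the kernel's internal timer frequency. A minimal Python sketch for reading it on the target (the helper name is made up for illustration):

```python
import os


def times_ticks_per_second() -> int:
    """Return the tick rate used by times(), i.e. sysconf(_SC_CLK_TCK)."""
    return os.sysconf("SC_CLK_TCK")


if __name__ == "__main__":
    # Compare this value with the -DHZ=... setting in the Dhrystone Makefile.
    print("times() ticks per second:", times_ticks_per_second())
```

    If the printed value differs from the HZ used at build time, the reported Dhrystones per second would be off by the ratio of the two.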

  • Well I haven't the foggiest what could be causing the problem.

    There is a note from ARM about running the benchmark at

    http://infocenter.arm.com/help/topic/com.arm.doc.dai0273a/DAI0273A_dhrystone_benchmarking.pdf

    They use armcc, but I can't see it making that big a difference for something like this.

    The test does use some string functions, but even if one rewrote them in C instead of using the optimised ones in the library, I still don't think it would make a difference of that size.

    It would be nice to have a very simple loop tuned to the A9 which could be used to check the clock is exactly what one thinks it is.

  • Based on your earlier comment, I'm assuming you're running Dhrystone as an application under an OS. The overhead of the OS is going to have some effect. A problem I once faced was that the OS was doing dynamic power management while I ran the benchmark. You might want to see if that is the case for your platform.

  • It seems that the benchmark result is affected by the compiler. I am not using armcc.

    The compiler is gcc version 4.5.2 (Sourcery G++ Lite 2011.03-41) and the Linux kernel is 3.4.5.

    The results and the flags are as follows.

    DMIPS/MHz/core=0.49 with flags of -O0 -DTIME -march=armv7-a

    DMIPS/MHz/core=1.07 with flags of -O1 -DTIME -march=armv7-a

    DMIPS/MHz/core=1.31 with flags of -O2 -DTIME -march=armv7-a

    DMIPS/MHz/core=1.37 with flags of -O3 -DTIME -march=armv7-a

    Which compiler flags should I use?

    Notably, all of them are much lower than the official 2.5 DMIPS/MHz/core...

  • Yes, I am running Dhrystone as an application under Linux. I am not sure how to check whether the system uses dynamic power management.

    Can you give me some information?
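
    On Linux systems that have a cpufreq driver loaded, the current governor and clock frequency are usually exposed under /sys/devices/system/cpu/cpuN/cpufreq/. A minimal sketch for reading them, assuming that sysfs layout is present on the board (the function name and the choice of files are illustrative, and the frequency files report kHz):

```python
from pathlib import Path

# Standard cpufreq sysfs attributes; they exist only if a cpufreq driver is active.
CPUFREQ_FILES = (
    "scaling_governor",
    "scaling_cur_freq",
    "scaling_min_freq",
    "scaling_max_freq",
)


def read_cpufreq(cpu: int = 0, base: str = "/sys/devices/system/cpu") -> dict:
    """Read cpufreq attributes for one CPU; unreadable files map to None."""
    cpufreq_dir = Path(base) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in CPUFREQ_FILES:
        try:
            info[name] = (cpufreq_dir / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in read_cpufreq(0).items():
        print(f"{key}: {value}")
```

    If every value comes back as None, the kernel probably has no cpufreq support enabled, and the clock would have to be checked some other way (for example, from the bootloader or clock-controller configuration).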

  • Just checking again. I'm starting to feel like one of those service desk people, but I'm getting desperate: you didn't divide by 2 because there were two cores, did you? Only one processor would be running the test.

  • I know only one core would be running the test.

    Take -O3 as an example.

    I got 2898550.8 Dhrystones per Second, and I divided by 1757 to get 1649 DMIPS/core. And then I divided by 1200 to get 1.37 DMIPS/MHz/core.

    I thought the total for 2 cores would be 1649*2=3298 DMIPS.

    Is this calculation correct?
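
    The arithmetic above can be written out as a short sketch. The constant 1757 is the Dhrystones-per-second score of the VAX 11/780 reference machine that defines 1 DMIPS; the function names are illustrative:

```python
# Dhrystones per second of the VAX 11/780 reference machine (defined as 1 DMIPS).
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips(dhrystones_per_second: float) -> float:
    """Convert a raw Dhrystone score into DMIPS."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dhrystones_per_second: float, clock_mhz: float) -> float:
    """Normalise DMIPS by the clock frequency the core actually ran at."""
    return dmips(dhrystones_per_second) / clock_mhz


if __name__ == "__main__":
    score = 2898550.8  # Dhrystones per second reported for the -O3 build
    print(f"{dmips(score):.1f} DMIPS")                       # 1649.7 DMIPS
    print(f"{dmips_per_mhz(score, 1200):.2f} DMIPS/MHz")     # 1.37 DMIPS/MHz
```

    Note that the result is only meaningful if clock_mhz is the frequency the core was really running at during the test.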

  • Yes the calculation giving 1.37 DMIPS/MHz/core is correct.

    I had a search for problems like this and I found that Phoronix had got similar figures with a Pandaboard a couple of years ago. I can't see any resolution of the problem there though.

    ARM Cortex-A9 PandaBoard ES Benchmarks - Page 4

  • Thank you for all your suggestions. I finally found that our processor is actually set to run at 800MHz. Since I was dividing by the nominal 1200MHz, the most I could reasonably expect was 2.5 x (800/1200) = 1.67 DMIPS/MHz/core. Although I got 1.37 DMIPS/MHz/core with gcc 4.5, I can get 1.6 DMIPS/MHz/core with gcc 4.9, which is close to the official value.
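
    The same correction can be applied in the other direction: the figures computed against 1200MHz can be re-normalised to the 800MHz clock. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def renormalise(value_per_mhz: float, assumed_mhz: float, actual_mhz: float) -> float:
    """Convert a per-MHz figure computed with assumed_mhz into one based on actual_mhz."""
    return value_per_mhz * assumed_mhz / actual_mhz


if __name__ == "__main__":
    # Official 2.5 DMIPS/MHz at a real 800MHz, as it appears when divided by 1200MHz.
    print(renormalise(2.5, 800, 1200))
    # Measured figures (computed against 1200MHz) expressed against the real 800MHz.
    print(renormalise(1.37, 1200, 800))  # gcc 4.5 result
    print(renormalise(1.6, 1200, 800))   # gcc 4.9 result
```

    Under these assumptions the gcc 4.5 build works out to roughly 2.06 DMIPS/MHz/core and the gcc 4.9 build to 2.4 DMIPS/MHz/core at the true clock.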

  • Hello daith,

    I am also calculating DMIPS for my system, which has 4 cores and runs Linux.

    I am getting 1.67 DMIPS/MHz.

    Should I divide it by 4 to get the per-core figure?