R5 vs A9 Performances

Hello guys,
I've been running the same code (which you can find here: https://gist.github.com/poz1/1714ddd68da5816624d6867ad6cc5d98 ) on an R5 board and an A9 board.
Optimisations are enabled, and my goal was to find the "right clock" for the A9 so that it matches the performance of the R5.

I know they are conceptually different, but I was expecting the A9 (at 650 MHz) to be faster than the R5 (at 500 MHz).

Instead the outputs I got are:

- R5

Starting computation
Output took 9879627907556208991 clock cycles.
Output took 32976237.10 us.

- A9

Starting computation
Output took 36834640184 clock cycles.
Output took 56668677.21 us.

I am puzzled because the R5 uses many more clock cycles but takes about half the time (???) to complete.

Do you have any idea how this could be explained?
Thank you :)

0 42Bastian Schick over 4 years ago

There are a lot of "uncertainties" in this code.

Take XTime, for example. It looks like you are running the R5 test on a US+ and the A9 test on a ZYNQ 7000.

Does XTime really give the number of CPU cycles? I rather think it gives timer cycles, and those are likely to differ from board to board.
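
One quick sanity check on the numbers from the first post is to divide each reported cycle count by the reported time, which gives the tick rate the counter would have needed. A minimal sketch in plain C; the function name is hypothetical, and the inputs are only the figures quoted above:

```c
/* Tick rate (Hz) implied by a (cycle count, elapsed time) pair.
 * Hypothetical helper: feed it the figures printed by the test program. */
double implied_hz(double cycles, double elapsed_us)
{
    return cycles / (elapsed_us * 1e-6);
}
```

With the A9 figures (36834640184 cycles, 56668677.21 us) this comes out at roughly 650 MHz, so that count is consistent with the stated CPU clock. With the R5 figures (9879627907556208991 cycles, 32976237.10 us) it comes out near 3e17 Hz, which no real counter runs at, so that cycle count probably cannot be trusted as printed.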

0 Poz1 over 4 years ago in reply to 42Bastian Schick

Hello 42Bastian Schick and thank you for your help :)
Yes, it's a ZYNQ7000 (A9) and an UltraScale+ MPSoC (R5).

Following your input, I searched and found this: https://www.xilinx.com/support/answers/66568.html where they say that "It works at the APU clock frequency." (which on the MPSoC is a 1.5 GHz A53, and would explain the much higher clock cycle count)

So thank you for your suggestion :)

What still puzzles me is the difference in time between the A9 and the R5. Shouldn't they be comparable? (Or at least, shouldn't the A9 be the faster one?)

Thank you again :)

0 42Bastian Schick over 4 years ago in reply to Poz1

Did you measure those times by hand (with a stopwatch)?

For the cycles you should read the PMU counters.
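
As a rough sketch of what reading the PMU cycle counter can look like on an ARMv7 core such as the A9 or R5, assuming GCC-style inline assembly and privileged (bare-metal) execution. The function and macro names are made up for illustration; only the CP15 register encodings come from the architecture:

```c
#include <stdint.h>

/* PMCR bits (ARMv7 Performance Monitors Control Register). */
#define PMCR_ENABLE       (1u << 0)   /* E: enable all counters */
#define PMCR_CYCLE_RESET  (1u << 2)   /* C: reset the cycle counter */
/* Bit 31 of PMCNTENSET enables the cycle counter (PMCCNTR). */
#define PMCNTENSET_CYCLE  (1u << 31)

#if defined(__arm__)   /* only meaningful when built for a 32-bit ARM target */
static inline void pmu_start_cycle_counter(void)
{
    uint32_t pmcr;
    /* Read-modify-write PMCR: enable counters and reset the cycle counter. */
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr));
    pmcr |= PMCR_ENABLE | PMCR_CYCLE_RESET;
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(pmcr));
    /* PMCNTENSET: switch on the cycle counter itself. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(PMCNTENSET_CYCLE));
}

static inline uint32_t pmu_read_cycles(void)
{
    uint32_t cycles;
    /* PMCCNTR: current cycle count. */
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles));
    return cycles;
}
#endif
```

Note that PMCCNTR is only 32 bits wide, so at 650 MHz it wraps after about 6.6 seconds. For a run of tens of seconds like the one above, either time a shorter section or set the divide-by-64 bit (PMCR.D, bit 3).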

The R5 has no access to the A53 timers, so XTime has a different time base there.

I suggest using a dedicated timer, checking its frequency with a GPIO, and then using it to measure the time.
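
The arithmetic behind such a measurement can be sketched as follows, assuming a free-running 32-bit up-counter whose frequency has been verified separately. Both helper names are hypothetical, and reading the actual timer register is left out because it depends on the board:

```c
#include <stdint.h>

/* Ticks elapsed on a free-running 32-bit up-counter. Unsigned subtraction
 * gives the right answer even if the counter wrapped once in between. */
uint32_t ticks_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert ticks to microseconds for a timer running at timer_hz. */
double ticks_to_us(uint32_t ticks, double timer_hz)
{
    return (double)ticks * 1e6 / timer_hz;
}
```

For example, 100000000 ticks of a 100 MHz timer correspond to 1000000 us. If the measured interval can exceed one full wrap of the counter, the wraps have to be counted as well (or a wider or slower timer used).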

Anyway, my experience across all ARM cores is that small routines simply scale with the clock, with a slight performance advantage for cores with a longer pipeline.

0 Poz1 over 4 years ago in reply to 42Bastian Schick

Hello 42Bastian Schick!
First of all I want to thank you for your precious help!
I created a separate timer as suggested and now everything is right :)

The question about performance still remains, though: why is the R5 twice as fast as the A9?
Could it be because the R5 uses LPDDR4 (on the board I have) while the A9 has DDR3?

Thank you again :)

0 42Bastian Schick over 4 years ago in reply to Poz1

Are you running both bare-metal?

The type of DDR memory should not matter much, since the code, at least, runs from cache.

But the data cache size might make the difference.

Do you have ECC enabled on the CA9? If so, it has only a 16-bit data bus.

0 Poz1 over 4 years ago in reply to 42Bastian Schick
Yup, 16 bit. It says:

512MB DDR3 with 16-bit bus @ 1050Mbps

EDIT:
Actually, since this board is an FPGA, I have the option to narrow the R5's bus to 16 bits. I am just unsure whether it would kill the board or not :D

0 42Bastian Schick over 4 years ago in reply to Poz1

If bare-metal, try to run it from the OCM, which is 256K and should be sufficient for code and data.

But the 16-bit bus seems to me a plausible explanation for why you see such a big difference between the CA9 and the R5.
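
To put a number on the bus width, here is a small sketch of the usual peak-bandwidth arithmetic, assuming the quoted "1050Mbps" is the transfer rate per data pin. The function name is hypothetical, and real throughput will be lower than this theoretical peak:

```c
/* Theoretical peak bandwidth in MB/s for a DDR interface:
 * bus width (bits) x transfer rate per pin (Mbit/s) / 8 bits per byte. */
double peak_bandwidth_mb_s(unsigned bus_bits, double mbps_per_pin)
{
    return bus_bits * mbps_per_pin / 8.0;
}
```

For the 16-bit DDR3 at 1050 Mbps this gives 2100 MB/s, half of what a 32-bit bus at the same rate would offer. It only matters once the working set no longer fits in the caches.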

+1 Stuart Hirons over 4 years ago in reply to 42Bastian Schick

Hi,

Has the Cortex-A9 had its MMU set up and its caches enabled before running this code (from Normal memory)?

There's been no mention of this, and I have seen it happen before.
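
One way to check this at run time is to read SCTLR and look at the enable bits. A minimal sketch, assuming GCC-style inline assembly and privileged execution on an ARMv7 core; the helper names are hypothetical, and only the bit positions and the CP15 encoding come from the architecture:

```c
#include <stdint.h>
#include <stdbool.h>

/* SCTLR bit positions on ARMv7-A/R. */
#define SCTLR_M  (1u << 0)    /* MMU enable (MPU enable on the R5) */
#define SCTLR_C  (1u << 2)    /* data/unified cache enable */
#define SCTLR_I  (1u << 12)   /* instruction cache enable */

/* True only if the MMU/MPU and both caches are switched on. */
bool mmu_and_caches_on(uint32_t sctlr)
{
    const uint32_t need = SCTLR_M | SCTLR_C | SCTLR_I;
    return (sctlr & need) == need;
}

#if defined(__arm__)   /* only meaningful when built for a 32-bit ARM target */
static inline uint32_t read_sctlr(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    return v;
}
#endif
```

Calling mmu_and_caches_on(read_sctlr()) just before the timed section on each board would show whether the two runs start from the same configuration. It does not check the memory attributes in the page tables, so the region still has to be mapped as cacheable Normal memory.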

Just a thought.

regards

Stuart

0 Poz1 over 4 years ago in reply to Stuart Hirons

Hello, thanks for all the precious ideas :)
Unfortunately I had to put the project on hold until mid-February :(
I will surely test this and let you know :)

Thanks again,
Alessandro