Differences between NEON in Cortex-A8 and A9

Note: This was originally posted on 25th July 2011 at http://forums.arm.com

I am currently working on a single-core Cortex-A9 chip (the AML8726-m, if you want to know more), and the datasheet says it includes NEON. But when I test the code from http://hilbert-space.de/?p=22, I see no acceleration; sometimes the NEON assembly version even runs slower than the plain ARM C code. The same code gets a good speedup on my i.MX515, which is a Cortex-A8 chip.


I am using the Android NDK to build a test app that runs on Android. Could that be the reason?
Can anyone tell me why this happens?


Here are some results:
#####On A8#####
arm c code: 116.*** ms
neon c code: 83.*** ms
neon asm code: 51.*** ms
#####On A9#####
arm c code: 107.*** ms
neon c code: 106-107.*** ms
neon asm code: 106-107.*** ms

Android is a Linux-based OS, so I can call gettimeofday() to measure elapsed time with microsecond resolution. The results on the A9 are not identical, just almost the same, and I am sure I did not run the same binary three times.

Thanks and looking forward to any useful suggestions.
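
The gettimeofday() measurement described above can be sketched roughly as follows. This is only an illustration, not the code that produced the numbers in this post: the helper names (elapsed_us, rgb_to_gray_c, best_time_us), the 3-bytes-per-pixel layout and the 77/151/28 weights are assumptions. A NEON variant with the same signature could be passed to best_time_us in place of the C version. Taking the fastest of several runs reduces noise from the scheduler and cold caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Microseconds elapsed between two gettimeofday() samples. */
static int64_t elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000
         + (int64_t)(end->tv_usec - start->tv_usec);
}

/* Plain C reference conversion (assumed layout: packed R,G,B bytes).
 * The weights 77/151/28 sum to 256, so the result fits in 8 bits. */
static void rgb_to_gray_c(const uint8_t *rgb, uint8_t *gray, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        unsigned r = rgb[3 * i];
        unsigned g = rgb[3 * i + 1];
        unsigned b = rgb[3 * i + 2];
        gray[i] = (uint8_t)((r * 77 + g * 151 + b * 28) >> 8);
    }
}

/* Any conversion routine (C, intrinsics or assembly) with this signature
 * can be timed by the harness below. */
typedef void (*convert_fn)(const uint8_t *rgb, uint8_t *gray, size_t pixels);

/* Run fn `runs` times and return the fastest run in microseconds. */
static int64_t best_time_us(convert_fn fn, const uint8_t *rgb, uint8_t *gray,
                            size_t pixels, int runs)
{
    assert(runs > 0);
    int64_t best = INT64_MAX;
    for (int i = 0; i < runs; i++) {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        fn(rgb, gray, pixels);
        gettimeofday(&t1, NULL);
        int64_t us = elapsed_us(&t0, &t1);
        if (us < best)
            best = us;
    }
    return best;
}
```

Note that gettimeofday() reports wall-clock time, so a clock adjustment during the run can distort a sample; clock_gettime() with CLOCK_MONOTONIC is a common alternative on Linux.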


  • Note: This was originally posted on 1st August 2011 at http://forums.arm.com


    This time, with a small image (128*128 resolution), the time is shortened from 16.7 ms to 11.3 ms on my i.MX51.


    I don't remember exactly what improvement I got when I ran the test!
    I thought it was nearly 2 times faster!



    But on my A9, the improvement is tiny, just 1 ms, from 20 ms to 19 ms.
    So I'm confused again.



    Well.
    I don't know why, but it is not really a surprise.

    The Cortex-A9 focuses on out-of-order execution and on high-frequency SoCs.
    The cycle table is not detailed, but what is given leads me to suppose that the Cortex-A9 is slower than the Cortex-A8 (at the same frequency).
    With NEON (and so with the code you tried), there should be no difference between processors running at the same frequency.
    On the other hand, the Cortex-A9 should be able to run at a higher frequency than the Cortex-A8.

    Finally, the Cortex-A9 seems to be designed to speed up the poor code produced by compilers, and should not do well on Cortex-A8-optimized code.
    For me, this CPU (the A9) is not a good choice for the moment. Below 1.2 or 1.5 GHz, it is not a valid choice for an assembly coder.

    Maybe one day ARM will give us the pipeline stages of the A9 instructions, and then we will know a little more about it.
    But that does not seem to be coming soon!


    Etienne
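
The "at the same frequency" comparison in the reply above can be made concrete by converting each measured time into cycles per pixel. A minimal sketch, assuming the core clock of each board is known; the function name cycles_per_pixel is hypothetical, and no clock frequency is stated in this thread, so any value passed in is the reader's own.

```c
#include <assert.h>

/* Cycles spent per pixel, given the elapsed time in milliseconds,
 * the core clock in MHz and the number of pixels processed.
 * 1 ms at 1 MHz is 1000 cycles, hence the factor of 1000. */
static double cycles_per_pixel(double elapsed_ms, double clock_mhz, double pixels)
{
    assert(pixels > 0.0);
    return elapsed_ms * clock_mhz * 1000.0 / pixels;
}
```

For the 128*128 image mentioned above, the call would look like cycles_per_pixel(11.3, clock_mhz, 128.0 * 128.0), with clock_mhz set to the board's real clock. If the benchmark loops over the image several times, the pixel count has to include those repetitions.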