Differences between NEON in Cortex-A8 and A9

Note: This was originally posted on 25th July 2011 at http://forums.arm.com

Currently I am working on a single-core Cortex-A9 chip (the AML8726-m, if you want to know more), and the datasheet says it includes NEON. But when I test the code from http://hilbert-space.de/?p=22, I see no speedup on it; sometimes the NEON-assembly-optimized code even runs slower than the plain ARM C code. The same code gets a pretty good speedup on my i.MX515, which is a Cortex-A8 chip.


I am using the Android NDK to build a test app that runs on Android. Could that be the reason?
Can anyone tell me why this happens?


Here are some results:
#####On A8#####
arm c code: 116.*** ms
neon c code: 83.*** ms
neon asm code: 51.*** ms
#####On A9#####
arm c code: 107.*** ms
neon c code: 106-107.*** ms
neon asm code: 106-107.*** ms

Android is a Linux-based OS, so I can call gettimeofday() to measure elapsed time with microsecond resolution. The three results on the A9 are not identical, only almost the same, and I am sure I didn't run the same binary three times.
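For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a gettimeofday()-based timer in C. It is not the code I actually ran: `now_us`, `best_time_us`, `struct work` and `sum_bytes` are hypothetical names, and the byte-summing workload is only a stand-in for the real ARM C / NEON routines. Taking the best of several runs is an assumption on my part to reduce noise from the scheduler.

```c
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in microseconds, via gettimeofday(). */
static long long now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Hypothetical helper: call fn(arg) `runs` times and return the
 * shortest elapsed time in microseconds (-1 if runs <= 0). */
static long long best_time_us(void (*fn)(void *), void *arg, int runs)
{
    long long best = -1;
    for (int i = 0; i < runs; i++) {
        long long t0 = now_us();
        fn(arg);
        long long dt = now_us() - t0;
        if (best < 0 || dt < best)
            best = dt;
    }
    return best;
}

/* Stand-in workload (an assumption, not the benchmark from the link):
 * sum all bytes of a buffer and store the result. */
struct work {
    const unsigned char *data;
    size_t len;
    unsigned long sum;
};

static void sum_bytes(void *arg)
{
    struct work *w = arg;
    unsigned long s = 0;
    for (size_t i = 0; i < w->len; i++)
        s += w->data[i];
    w->sum = s;
}
```

Usage would be something like `best_time_us(sum_bytes, &w, 5)` for each variant, with the buffer large enough that a run takes well over a few microseconds.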

Thanks and looking forward to any useful suggestions.


  • Note: This was originally posted on 9th August 2012 at http://forums.arm.com

    I've also found the same sort of problem in most of my image processing code, where NEON typically gives about a 20x boost on a Cortex-A8 but only about a 3x boost on a Cortex-A9 CPU! Like the guys have mentioned already in this thread, there are many reasons why the Cortex-A9 is faster in some ways and slower in others (I also compare Cortex-A8 with Cortex-A9 on my webpage "http://www.shervinemami.info/armAssembly.html"). But as you've noticed, it's very important to try different prefetch distances and different positions for the PLD cache-preload instructions, because, like someone else mentioned earlier in the thread, your device is mostly just waiting on data from memory rather than doing NEON operations on it!

    So if you are working with megapixel images, you should worry less about counting NEON clock cycles and think in terms of memory stalls, because that is where most of the time will go!
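    To make the "try different amounts and positions" advice concrete, here is a rough sketch in plain C rather than NEON assembly. It assumes GCC or Clang, whose `__builtin_prefetch` builtin is normally emitted as a PLD instruction on ARM targets. The function name `add_brightness` and the parameters `prefetch_ahead` and `line` are made up for illustration; the right values differ between chips and have to be found by measurement.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating brightness add over a byte buffer, with a tunable prefetch.
 * prefetch_ahead: how many bytes ahead of the current position to
 *                 prefetch (0 disables prefetching); tune per device.
 * line:           assumed cache line size in bytes; one prefetch is
 *                 issued per line instead of per byte. */
static void add_brightness(uint8_t *dst, const uint8_t *src, size_t n,
                           uint8_t delta, size_t prefetch_ahead, size_t line)
{
    for (size_t i = 0; i < n; i++) {
        /* Issue one read prefetch at the start of each assumed cache
         * line, and only while the target is still inside the buffer. */
        if (prefetch_ahead != 0 && line != 0 && i % line == 0 &&
            i + prefetch_ahead < n)
            __builtin_prefetch(src + i + prefetch_ahead, 0, 0);

        unsigned v = (unsigned)src[i] + delta;
        dst[i] = v > 255 ? 255 : (uint8_t)v;   /* clamp to 8 bits */
    }
}
```

    Timing this with several values of `prefetch_ahead` (for example 0, 64, 128, 256) on each device is the kind of experiment I mean; a distance that helps on one core may do nothing or even hurt on another.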