Cortex-A9: NEON assembly code is not giving the expected performance compared with ARM assembly code

Note: This was originally posted on 27th November 2012 at http://forums.arm.com

I have a problem: I have hand-written ARM9 assembly code and NEON assembly code. I expected the NEON assembly to run about 4x faster than the ARM assembly, but I do not see that improvement.

Can you please explain what the reason could be?

I am using a Cortex-A9 processor, with this configuration in my Makefile: "CFLAGS=--cpu=Cortex-A9 -O2 -Otime --apcs=/fpic --no_hide_all"

Please let me know whether I need to change anything in the Makefile settings to get the NEON performance improvement.
  • Note: This was originally posted on 30th November 2012 at http://forums.arm.com

    Sorry, but it is still not obvious what your tests are doing unless you show something more concrete, such as the actual code. For all we know, most of the delay could come from calling a C function on every iteration, or from loading data from RAM, etc. It is also not clear what you are asking about these results.

    As several people have already said in your other threads, the NEON pipeline and cache system differ substantially between Cortex-A8 and Cortex-A9, so you should expect very different NEON speeds on the two. NEON also only gives a speedup if you use it efficiently and for suitable algorithms. If you use NEON for things that aren't suited to it, you will get a slowdown instead of a speedup, even in assembly code.

    -Shervin.
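
To make the point about measurement overhead concrete, here is a minimal C sketch of a test harness. It is not the original poster's code. The saturating 8-bit add is only an assumed example of a NEON-friendly workload, and all function names (`add_u8_scalar`, `add_u8_simd`, `time_passes`, `run_comparison`) are hypothetical. The NEON path uses intrinsics from `arm_neon.h` and is compiled only when the toolchain defines `__ARM_NEON`; elsewhere it falls back to the scalar loop so the harness still builds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Scalar reference: one saturating 8-bit add per loop iteration. */
static void add_u8_scalar(uint8_t *dst, const uint8_t *a,
                          const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* SIMD version: 16 saturating adds per NEON instruction when available,
 * with the scalar loop handling the tail (or everything without NEON). */
static void add_u8_simd(uint8_t *dst, const uint8_t *a,
                        const uint8_t *b, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb));
    }
#endif
    add_u8_scalar(dst + i, a + i, b + i, n - i);
}

typedef void (*add_fn)(uint8_t *, const uint8_t *, const uint8_t *, size_t);

/* Time `reps` whole-buffer passes, so the function-call cost is paid
 * once per pass rather than once per element. */
static double time_passes(add_fn fn, uint8_t *dst, const uint8_t *a,
                          const uint8_t *b, size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(dst, a, b, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Runs both versions on the same data and prints the timings.
 * Returns 0 if the outputs match, 1 on mismatch, -1 on allocation failure. */
static int run_comparison(size_t n, int reps)
{
    assert(n > 0 && reps > 0);
    uint8_t *buf = malloc(4 * n);
    if (!buf)
        return -1;
    uint8_t *a = buf, *b = buf + n, *ref = buf + 2 * n, *out = buf + 3 * n;
    for (size_t i = 0; i < n; i++) {
        a[i] = (uint8_t)(i * 3u);
        b[i] = (uint8_t)(i * 5u + 100u);
    }
    double t_scalar = time_passes(add_u8_scalar, ref, a, b, n, reps);
    double t_simd = time_passes(add_u8_simd, out, a, b, n, reps);
    int mismatch = memcmp(ref, out, n) != 0;
    printf("neon=%d scalar=%.3fs simd=%.3fs mismatch=%d\n",
           HAVE_NEON, t_scalar, t_simd, mismatch);
    free(buf);
    return mismatch;
}
```

Calling something like `run_comparison(4096, 100000)` from `main` keeps the working set small (four 4 KB buffers), which should reduce the influence of RAM latency; a much larger `n` would instead tend to measure memory bandwidth. Note that an optimizing compiler may auto-vectorize the scalar loop, which would narrow the gap, so it is worth checking the generated assembly before drawing conclusions.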