Cortex-A9 : NEON assembly code is not giving expected performance compared with ARM assembly code

Note: This was originally posted on 27th November 2012 at http://forums.arm.com

I have a problem: I have hand-written ARM assembly code and a NEON assembly version of the same code. I expected the NEON version to be about 4x faster than the ARM version, but I do not see that improvement.

Can you please explain what the reason could be?

I am using a Cortex-A9 processor, and the configuration in my Makefile is: "CFLAGS=--cpu=Cortex-A9 -O2 -Otime --apcs=/fpic --no_hide_all"

Please let me know whether I need to change anything in the Makefile settings to get the NEON performance improvement.
  • Note: This was originally posted on 22nd March 2013 at http://forums.arm.com

    I'd also suggest posting a code example. A common problem with "benchmarks" of small code sections is that they often do not test what you think they are testing (either because the code under test is inefficient, or because the timing method doesn't scale down to very short intervals).

    It's much easier to give precise answers if we actually know exactly what your code is trying to do =P
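
To illustrate the timing point, here is a minimal sketch of a harness that times many back-to-back calls and reports the average, so the interval being measured is long compared with the timer's resolution. It is not the original poster's code: `kernel_under_test`, `fill_inputs`, `time_per_call` and the buffer size `N` are hypothetical names chosen for illustration, and it assumes a POSIX system where `clock_gettime` is available.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime; assumes a POSIX system */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define N 4096  /* elements processed per call; hypothetical size */

static int16_t a[N], b[N], out[N];

typedef void (*kernel_fn)(const int16_t *, const int16_t *, int16_t *, size_t);

/* Hypothetical stand-in for the routine being measured
 * (in practice, the ARM or NEON assembly entry point). */
static void kernel_under_test(const int16_t *x, const int16_t *y,
                              int16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(x[i] + y[i]);
}

/* Fill the inputs with small values so the sums cannot overflow int16_t. */
static void fill_inputs(void)
{
    for (size_t i = 0; i < N; i++) {
        a[i] = (int16_t)(i & 0xFF);
        b[i] = (int16_t)((2 * i) & 0xFF);
    }
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average seconds per call over `reps` back-to-back calls. */
static double time_per_call(kernel_fn fn, int reps)
{
    /* Calling through a volatile pointer keeps the compiler from
     * inlining the kernel and discarding the repeated calls. */
    kernel_fn volatile call = fn;
    assert(reps > 0);

    call(a, b, out, N);  /* warm-up call, excluded from the measurement */

    double t0 = now_seconds();
    for (int r = 0; r < reps; r++)
        call(a, b, out, N);
    double t1 = now_seconds();

    return (t1 - t0) / reps;
}
```

A typical use would be to call `fill_inputs()` once, then `time_per_call(kernel_under_test, 1000)` for each version with the same buffers and repetition count, and compare the two averages. On a bare-metal target without `clock_gettime`, a hardware cycle counter could take the place of `now_seconds`.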