Cortex-A9 : NEON assembly code is not giving expected performance compared with ARM assembly code

Note: This was originally posted on 27th November 2012 at http://forums.arm.com

I have a problem: I have both hand-written ARM assembly code and NEON assembly code. I expected the NEON assembly to be about 4x faster than the ARM assembly, but I do not see that improvement.

Can you please explain what the reason could be?

I am using a Cortex-A9 processor, and the configuration in my Makefile is: "CFLAGS=--cpu=Cortex-A9 -O2 -Otime --apcs=/fpic --no_hide_all"

Please let me know whether I need to change anything in the Makefile settings to get the NEON performance improvement.
  • Note: This was originally posted on 22nd March 2013 at http://forums.arm.com

    Thanks again.

    One more observation: on Cortex-A8 I was able to get a performance difference with the same code.
    So is it expected that I cannot observe much performance difference between (1) and (2) on Cortex-A9, given that NEON processing on Cortex-A9 is done 64 bits at a time, so a 128-bit operation takes 2 cycles to complete?

    But there should still be some difference in speed between (1) and (2), right?

    Are there any solutions to overcome this issue?

    Can I expect any performance improvement if I add PLD instructions?

    Regards,
    KP
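
On the PLD question, one way to find out is to measure a prefetching and a non-prefetching version of the same loop on the target. The following is only a minimal C sketch of that idea, not the code from this thread. It assumes a GCC- or Clang-style compiler, where `__builtin_prefetch` usually becomes a PLD on ARM targets; armcc documents a `__pld` intrinsic for the same purpose, so check your toolchain's documentation. The names `PREFETCH_READ`, `PREFETCH_DISTANCE`, `sum_plain` and `sum_with_prefetch` are hypothetical, and the prefetch distance and the 32-byte line size are assumptions to be tuned by measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: issue a read-prefetch hint where the compiler
   supports it, otherwise do nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* Assumed prefetch distance in elements (64 x 4 bytes = 256 bytes ahead). */
#define PREFETCH_DISTANCE 64
/* Assumed elements per cache line (8 x 4 bytes = 32 bytes). */
#define ELEMS_PER_LINE 8

/* Baseline: plain sequential sum with no prefetch hints. */
uint32_t sum_plain(const uint32_t *src, size_t n)
{
    assert(src != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += src[i];
    return acc;
}

/* Same sum, with one prefetch hint per assumed cache line. */
uint32_t sum_with_prefetch(const uint32_t *src, size_t n)
{
    assert(src != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Only hint addresses that are still inside the array. */
        if ((i % ELEMS_PER_LINE) == 0 && i + PREFETCH_DISTANCE < n)
            PREFETCH_READ(&src[i + PREFETCH_DISTANCE]);
        acc += src[i];
    }
    return acc;
}
```

Both functions return the same result, so timing them over a buffer much larger than the cache gives a rough indication of whether the loop is limited by memory latency. If it is, prefetching may help; if the loop is limited by instruction throughput, PLD is unlikely to change much.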