Cortex-A9 : NEON assembly code is not giving expected performance compared with ARM assembly code
Mohamed Jauhar
over 12 years ago
Note: This was originally posted on 27th November 2012 at http://forums.arm.com
I have a problem: I have hand-written ARM assembly code and NEON assembly code. I expected the NEON assembly to be about 4x faster than the ARM assembly, but I do not see that improvement.
Can you please explain what the reason could be?
I am using a Cortex-A9 processor, and the configuration in my Makefile is: "CFLAGS=--cpu=Cortex-A9 -O2 -Otime --apcs=/fpic --no_hide_all"
Please let me know whether I need to change anything in the Makefile settings to get the NEON performance improvement.
Shervin Emami
over 12 years ago
Note: This was originally posted on 22nd March 2013 at http://forums.arm.com
Like I said, the multiplication hardware is only 32 bits wide, so multiplying Q registers takes roughly the same time as multiplying S registers four times, as shown in the cycle timings in the Cortex-A9 NEON TRM.
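To make the comparison concrete, here is a rough sketch of the two kinds of code being compared, written in C with NEON intrinsics rather than hand-written assembly. It assumes 32-bit integer multiplies, which is only a guess, since the thread does not show what the original code computes. The function names mul_scalar and mul_neon are made up for illustration; the intrinsics themselves (vld1q_s32, vmulq_s32, vst1q_s32) come from arm_neon.h. On a build without NEON the second function simply falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar reference (hypothetical name): one 32-bit multiply per iteration.
   The multiply is done in unsigned arithmetic so that overflow wraps
   instead of being undefined behaviour. */
void mul_scalar(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
    }
}

/* Q-register version (hypothetical name): each vmulq_s32 multiplies four
   32-bit lanes when NEON is available at compile time. */
void mul_neon(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);
        int32x4_t vb = vld1q_s32(b + i);
        vst1q_s32(out + i, vmulq_s32(va, vb));
    }
#endif
    /* Leftover elements, or the whole array on a non-NEON build. */
    for (; i < n; i++) {
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
    }
}
```

Timing both functions on the target over a large array would show how far the measured ratio falls short of 4x for a multiply-heavy loop like this one.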