NEON vs VFP usage
Marius Cetateanu
over 7 years ago
Note: This was originally posted on 29th August 2011 at http://forums.arm.com
Hi,
Could I use NEON and VFP at the same time in my application?
What would be the downsides of that?
I also read in the documentation that the compilation flags are as follows:
GCC
-mfpu=neon -mfloat-abi=softfp
-mfpu=vfpv3 -mfloat-abi=softfp
ARMCC
--cpu=Cortex-A9 --apcs=/softfp
--cpu=Cortex-A9 --fpu=VFPv3 --apcs=/softfp
Does this control just the usage of NEON intrinsics? Does specifying the option for one (e.g. neon)
prevent me from using the other (e.g. vfp) directly in the code?
Also, specifying "softfp" seems to incur some overhead in my application (at least in my preliminary benchmarks).
I tried to use the "hard" option, but then I get linker errors because the runtime libraries are not built with support for it.
Can I get runtime libraries built with "hard" somewhere, or do I have to build them myself?
Thanks
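To make the "both at the same time" question concrete, here is a minimal sketch, assuming GCC's documented behaviour that -mfpu=neon is an alias for neon-vfpv3, so the scalar VFPv3 instructions remain available alongside NEON. The helper name scale_f32 is hypothetical. The __ARM_NEON guard only lets the same file build on targets without NEON; it is not something the original post mentions.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: multiply n floats in place by k.
 * The main loop uses NEON intrinsics when they are available; the tail
 * loop is plain C, which the compiler is free to lower to scalar
 * floating-point (VFP) instructions on an ARM target. */
static void scale_f32(float *x, size_t n, float k)
{
    size_t i = 0;

    assert(x != NULL || n == 0);

#if HAVE_NEON
    /* Four single-precision lanes per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(x + i);
        v = vmulq_n_f32(v, k);
        vst1q_f32(x + i, v);
    }
#endif

    /* Scalar loop: handles the leftover elements, or every element
     * when NEON is not enabled at compile time. */
    for (; i < n; ++i)
        x[i] *= k;
}
```

With the GCC flags quoted above, this would be built with something like -mfpu=neon -mfloat-abi=softfp. Whether the scalar tail actually ends up as VFP code depends on the compiler and optimization level.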
Gilead Kutnick
over 7 years ago
Note: This was originally posted on 1st September 2011 at http://forums.arm.com
First see if you can actually get your code working in single precision. If you need double precision then NEON won't be an option, period.
The advice given applies to Cortex-A8 more than Cortex-A9. VFP on Cortex-A9 is roughly as fast as the NEON equivalents in terms of issue rate and latency. If your code doesn't lend itself to vectorization then you're probably better off sticking with VFP, unless you want to have good performance on Cortex-A8. Mixing the two is probably a bad idea on both processors.
I have found myself mixing VFP and NEON in one instance: when I want a large integer reciprocal on Cortex-A8, VFP can do it (not a complete 64 bits, but close enough) faster than I could with vrecpe plus Newton-Raphson steps. But the VFP divide instruction blocks further NEON instructions from executing. On the other hand, you can still execute integer instructions, so if you can schedule these during the divide you can recover a lot of the time.
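For readers unfamiliar with the "vrecpe plus Newton-Raphson steps" approach mentioned above, here is a minimal single-precision sketch, not the poster's code. The function name recip_nr is hypothetical. On builds without NEON it substitutes a crude bit-trick estimate (subtracting the float's bit pattern from 0x7F000000, accurate to within about 12.5%) for vrecpe, purely so that the refinement loop can be tried on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: approximate 1/d for a positive, normal float d,
 * using an initial estimate refined by `steps` Newton-Raphson iterations
 * x <- x * (2 - d * x). Each iteration roughly squares the relative error. */
static float recip_nr(float d, int steps)
{
    assert(d > 0.0f);

#if HAVE_NEON
    float32x2_t dv = vdup_n_f32(d);
    float32x2_t x = vrecpe_f32(dv);          /* low-precision estimate */
    for (int s = 0; s < steps; ++s)
        x = vmul_f32(x, vrecps_f32(dv, x));  /* vrecps returns 2 - d*x */
    return vget_lane_f32(x, 0);
#else
    /* Stand-in for vrecpe on non-NEON hosts (an assumption of this
     * sketch): negate the exponent and roughly invert the mantissa
     * with a single integer subtraction. */
    uint32_t bits;
    float x;
    memcpy(&bits, &d, sizeof bits);
    bits = 0x7F000000u - bits;
    memcpy(&x, &bits, sizeof x);
    for (int s = 0; s < steps; ++s)
        x = x * (2.0f - d * x);
    return x;
#endif
}
```

Three steps are enough to reach roughly single-precision accuracy from either starting estimate. Each extra step costs a dependent multiply pair, and that cost is what the post is weighing against a single VFP divide.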
One interesting thing is that on Cortex-A8 I've found it faster to convert from floating point to fixed point and vice versa in software than with VFP, provided you can ignore inf/NaN/denormal. This is especially true if the values have a fixed exponent, for example because the numbers have been normalized. If the data is already in VFP/NEON registers I haven't found any penalty in switching to NEON to work on it instead of VFP.
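As an illustration of what a software float-to-fixed conversion can look like, here is a minimal integer-only sketch, assuming an IEEE 754 single-precision input and a Q16.16 output format. Both the format and the name float_to_q16 are choices made for this example, not taken from the post. In line with the caveat above, it does not handle inf or NaN, and it flushes denormals to zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a float to Q16.16 fixed point using only
 * integer operations, truncating toward zero.
 * Precondition: |f| < 32768, which also excludes inf and NaN. */
static int32_t float_to_q16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    int32_t exp = (int32_t)((bits >> 23) & 0xFFu) - 127;  /* unbiased exponent */
    uint32_t mant = (bits & 0x007FFFFFu) | 0x00800000u;   /* 1.m as a 24-bit integer */

    assert(exp <= 14);

    /* value = mant * 2^(exp - 23), so Q16.16 = mant * 2^(exp - 7). */
    int32_t shift = exp - 7;
    uint32_t mag;
    if (shift >= 0)
        mag = mant << shift;        /* shift is at most 7 here, so no overflow */
    else if (shift > -32)
        mag = mant >> -shift;       /* discards fraction bits below 2^-16 */
    else
        mag = 0;                    /* zero and denormal inputs land here */

    return (bits >> 31) ? -(int32_t)mag : (int32_t)mag;
}
```

If the exponent is known in advance, the shift becomes a constant and the exponent extraction can be dropped.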