Optimization of algorithm on ARM Cortex-A8 (using NEON)
Girisha SG
over 12 years ago
Note: This was originally posted on 1st June 2011 at
http://forums.arm.com
I am new to code optimization for speed (the main focus), memory, etc.
I am looking for advice on the following:
1. Our algorithms are implemented with floating-point operations. As I understand it, NEON has floating-point units, so we don't have to rewrite the algorithms in fixed point. Are floating-point operations slower than fixed-point operations, or about the same? Are there any other differences with respect to final CPU load or speed?
2. Do we need to take any special care or use special optimization techniques for NEON?
3. I would like to know optimization techniques for the ARM Cortex-A8 other than using intrinsics (for example cache/memory optimizations), including dos and don'ts.
4. Are there keywords or anything else to pass to the compiler/linker to instruct RVDS to generate optimized code?
5. How can I instruct the compiler/linker to use the NEON accelerator?
6. Are there any other guidelines for speed optimization using the RVDS and GCC compilers?
Etienne SOBOLE
over 12 years ago
Note: This was originally posted on 1st June 2011 at
http://forums.arm.com
1 - Floating point is a little slower than integer operations, but the gap is not large.
If your source data and your results are 32-bit floats, there is little point in using fixed-point operations.
If your data is integer, it could be a good idea to try fixed point.
2 - Yes, there are a lot of hints for optimizing NEON:
- data alignment
- pipeline optimisation
- never transfer a NEON register to an ARM register.
- NEON does not have divide or square root operations; if you have to use the VFP for these operations, it can reduce performance significantly.
3 - The only correct way to optimize NEON code is to do it in assembly.
4, 5, 6 - I can't help you with those points.
Etienne
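To illustrate the missing divide mentioned above: one common workaround is to multiply by a reciprocal estimate refined with Newton-Raphson steps, rather than falling back to the VFP. The following is only a sketch using NEON intrinsics instead of assembly; the function name `divide_f32` is hypothetical, the element count is assumed to be a multiple of 4, and the result is an approximation (not correctly rounded), so it is only suitable where a small relative error is acceptable. A scalar fallback is included so the same file also builds on non-NEON targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = num[i] / den[i] for n elements.
 * Assumption: n is a multiple of 4.
 * NEON builds multiply by a refined reciprocal estimate instead of dividing;
 * other builds use plain scalar division. */
void divide_f32(float *out, const float *num, const float *den, size_t n)
{
    assert(n % 4 == 0);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (size_t i = 0; i < n; i += 4) {
        float32x4_t d = vld1q_f32(den + i);
        float32x4_t r = vrecpeq_f32(d);        /* rough estimate of 1/d */
        r = vmulq_f32(r, vrecpsq_f32(d, r));   /* refine: r * (2 - d*r) */
        r = vmulq_f32(r, vrecpsq_f32(d, r));   /* second refinement step */
        vst1q_f32(out + i, vmulq_f32(vld1q_f32(num + i), r));
    }
#else
    for (size_t i = 0; i < n; ++i)
        out[i] = num[i] / den[i];
#endif
}
```

Whether two refinement steps are enough depends on the accuracy the algorithm needs; a single step is cheaper but less precise.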
Shervin Emami
over 12 years ago
Note: This was originally posted on 4th June 2011 at
http://forums.arm.com
If you are using GCC instead of ARM's RVDS compiler, then unfortunately you can't just enable compiler flags and expect it to produce efficient code, because the compiler almost never knows how to take advantage of NEON. So if you want to use NEON, my recommendation is one of two options: either buy the RVDS compiler, modify your C code slightly to take advantage of it, and hope it makes a noticeable improvement (such as a 1.5x or 2x speed boost), or learn ARM & NEON Assembly language.
I have recently written a tutorial explaining some things about optimizing for ARM (using C/C++ libraries or ideally Assembly language), with an example for rotating an image or matrix, at:
http://www.shervinemami.info/armAssembly.html
Cheers,
Shervin Emami.
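As a rough idea of what "modifying your C code slightly" for a vectorizing compiler can look like, here is a minimal sketch. The function name `saxpy` and the multiple-of-4 length are assumptions for illustration. The `restrict` qualifiers promise that the two arrays do not overlap, and the loop body is kept simple and branch-free. Whether a particular compiler actually emits NEON code for this loop depends on its version and options; with GCC, for example, the documentation notes that floating-point loops are only auto-vectorized for NEON when `-funsafe-math-optimizations` (implied by `-ffast-math`) is also given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i].
 * restrict tells the compiler that x and y never alias, which removes one
 * common obstacle to auto-vectorization.
 * Assumption: n is a multiple of 4, so no scalar tail loop is needed. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Inspecting the generated assembly is the only reliable way to confirm that the loop was vectorized.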
Shervin Emami
over 12 years ago
Note: This was originally posted on 4th June 2011 at
http://forums.arm.com
And as Etienne said, you should definitely try to use 32-bit floats rather than 64-bit floats, because NEON does not support 64-bit floats, so they will be MUCH slower than 32-bit floats or 64-bit fixed-point maths!
Cheers,
Shervin Emami.
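One easy way to end up with 64-bit floats by accident in C is an unsuffixed literal. The sketch below (function names are hypothetical) shows the same operation written twice: in the first version the literal `0.5` has type `double`, so each sample is widened, multiplied in double precision, and narrowed again; in the second, `0.5f` keeps everything in single precision. GCC's `-Wdouble-promotion` warning can help find such cases.

```c
#include <assert.h>
#include <stddef.h>

/* Accidentally double precision: 0.5 is a double literal, so buf[i] is
 * converted to double, multiplied, and converted back to float. */
void halve_promoted(float *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; ++i)
        buf[i] = buf[i] * 0.5;
}

/* Single precision throughout: the f suffix keeps the multiply in float. */
void halve_single(float *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; ++i)
        buf[i] = buf[i] * 0.5f;
}
```

The same applies to math library calls: `sqrtf`, `sinf` and similar take and return `float`, while `sqrt` and `sin` work in `double`.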
Ruben Buchatskiy
over 12 years ago
Note: This was originally posted on 6th June 2011 at
http://forums.arm.com
Try compiling with these options in GCC:
-O2 -mcpu=cortex-a8 -mtune=cortex-a8 -mfloat-abi=softfp -mfpu=neon
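A quick way to check that options like these really enabled NEON is a small smoke test built with the same flags. This is a sketch only: the function names are hypothetical, and it relies on the predefined macros `__ARM_NEON` / `__ARM_NEON__`, which GCC and ARM compilers define when NEON code generation is enabled. On other targets the scalar path is used, so the file still builds.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Returns 1 if this file was compiled with NEON enabled, otherwise 0. */
int built_with_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Sums four floats, using NEON when available.
 * Both paths add in the same order: (v0 + v2) + (v1 + v3). */
float sum4(const float *v)
{
    assert(v != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t q = vld1q_f32(v);
    float32x2_t s = vadd_f32(vget_low_f32(q), vget_high_f32(q));
    s = vpadd_f32(s, s);            /* pairwise add of the two lanes */
    return vget_lane_f32(s, 0);
#else
    return (v[0] + v[2]) + (v[1] + v[3]);
#endif
}
```

If `built_with_neon()` returns 0 on the target, the `-mfpu=neon` option did not take effect for that translation unit.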