Optimization of algorithm on ARM Cortex-A8 (using NEON)

Note: This was originally posted on 1st June 2011 at http://forums.arm.com

I am new to code optimization for speed (the main focus), memory, etc.

I am looking for guidance on the following:

1. Our algorithms are implemented using floating-point operations. As I understand it, NEON has floating-point units, so we do not have to rewrite the algorithms in fixed point. Are floating-point operations slower than fixed-point operations, or are they the same? Are there any other differences with respect to final CPU load/speed?
2. Do we need to take any special care or use any special optimization techniques for NEON?
3. I would like to know optimization techniques for ARM Cortex-A8 other than using intrinsics (like cache/memory optimizations), and the dos and don'ts.
4. Are there keywords or anything else to pass to the compiler/linker to instruct RVDS to generate optimized code?
5. How can I instruct the compiler/linker to use the NEON accelerator?
6. Any other guidelines for speed optimization using the RVDS and GCC compilers.
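
To make the questions concrete, the kind of floating-point loop I have in mind looks roughly like the sketch below. The function name `scale_add` and its parameters are hypothetical and only for illustration; it is not our actual algorithm. It is plain C99 with no NEON intrinsics, so whether NEON gets used depends entirely on the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel for illustration only: out[i] = a * x[i] + y[i].
 * "restrict" promises the compiler that the three arrays do not overlap,
 * which it generally needs to know before it can vectorize the loop. */
void scale_add(float *restrict out, const float *restrict x,
               const float *restrict y, float a, size_t n)
{
    /* Precondition check; compiled out when NDEBUG is defined. */
    assert(out != NULL && x != NULL && y != NULL);

    for (size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
```

My understanding, which may be incomplete, is that GCC would need options along the lines of `-std=c99 -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -ffast-math` to vectorize a loop like this for NEON, and that RVDS armcc would need something like `--c99 --cpu=Cortex-A8 -O3 -Otime --vectorize`. Please correct me if these are not the right options.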