Optimizing code for ARM: ARMv7, ARMv8, Memory Models and NEON Intrinsics

Matthew Gretton-Dann titled this presentation: Porting & Optimising Code 32-bit to 64-bit

The title is accurate, but the talk is really a high-level overview of the differences between the ARMv7 and ARMv8 architectures and of the C++11 memory model (which becomes more interesting as threading complexity increases). Mostly, though, he covers why you should consider using NEON intrinsics. If you're writing algorithms that deal with lots of data, you may be safer and faster writing the vector code with intrinsics and trusting the compiler to handle register allocation and instruction scheduling than writing it in assembly. NEON intrinsic code has the added benefit of being portable across ARM architectures, and you didn't hear it from me, but there are open source drop-in replacements for arm_neon.h that may gain you even more portability.
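To give a feel for what intrinsic code looks like, here is a minimal sketch that is not taken from the presentation. The function name add_f32 is hypothetical. It assumes a compiler that defines __ARM_NEON when NEON is available, and falls back to a plain scalar loop everywhere else, so the same source builds for ARMv7, ARMv8 and non-ARM hosts.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON)
#include <arm_neon.h>  // NEON intrinsics, available on ARMv7 (with NEON) and ARMv8
#endif

// Hypothetical helper for illustration: dst[i] = a[i] + b[i] for n floats.
void add_f32(const float* a, const float* b, float* dst, std::size_t n) {
    assert(n == 0 || (a && b && dst));
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Process four floats per iteration in 128-bit Q registers.
    // The compiler picks the registers and schedules the instructions.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);       // load 4 floats from a
        float32x4_t vb = vld1q_f32(b + i);       // load 4 floats from b
        vst1q_f32(dst + i, vaddq_f32(va, vb));   // add lanes and store
    }
#endif
    // Scalar tail: handles the leftover elements on NEON builds,
    // and the whole array when NEON is not available.
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The same three intrinsics (vld1q_f32, vaddq_f32, vst1q_f32) compile for both 32-bit and 64-bit ARM targets, which is the portability argument made above. Whether this beats the compiler's own auto-vectorization of the scalar loop depends on the compiler and flags, so measure before committing to it.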

We have some more white papers on porting from 32-bit ARM to 64-bit ARM in the pipeline, but if you prefer to sit down and watch a lecture on the aforementioned topics, grab his presentation PDF and watch this video. The camera operator takes a nap for the first 18 minutes, but the PDF is easy to follow along with.

If you need some help identifying which algorithms to optimize first in your Android or Linux apps, check out the free Streamline profiler in ARM's DS-5 Community Edition. Follow this Android Community and Advanced Android Application Development category if you're interested in hearing more about NEON, porting and transitioning to Android on ARMv8 (64-bit).