Once you move beyond short sequences of optimised Arm assembler, the next likely step will be managing more complex, optimised routines using macros and functions.
Ideally, caches act as magic make-it-go-faster logic sitting between processor cores and memory banks. But there are cases where cache behaviour must be considered to get the desired result.
This is the first part of a series on how to write SIMD code for Neon using assembly language. It covers getting started with Neon, using it efficiently, and more.
Sitting in the airport at the end of a week's business trip to the US, I reflected on the week. It turned out that my colleague on this trip has an even worse sense of direction than I do.…
Arm implements conditional execution using a set of flags which store state information about a previous operation. In this post I shed some light on the operation of these flags.
This post will show you how we can deal with these limitations and how the latest revision of the Arm architecture (Armv7) provides a simple and efficient solution.