Writing high-performance software for Arm often means diving deep into its SIMD technologies. Many developers know NEON, the fixed-width vector extension, but Arm’s latest SVE (Scalable Vector Extension) and SME (Scalable Matrix Extension) take things further.
They are not just wider vectors. They introduce new concepts such as predication, scalable vectors, streaming modes, and matrix tiles. Together, these features let the same code adapt to whatever vector length the hardware implements, but that flexibility comes with added complexity.
That is where SIMD Loops steps in.
SIMD Loops is an open-source project designed to help developers learn SVE and SME through hands-on experimentation. It provides dozens of real-world loop kernels. Examples include matrix multiplication, vector reduction, sorting, and string processing. Each kernel is written in C, Arm intrinsics, and inline assembly.
Each loop is carefully annotated to showcase key architectural features in action. This lets you see exactly how instructions like fmopa (floating-point outer product and accumulate) or fmla (floating-point fused multiply-add) work in practice.
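To give a rough feel for what such a kernel can look like, here is a minimal sketch of a predicated, vector-length-agnostic loop written with SVE intrinsics. It is not taken from the SIMD Loops repository: the function name saxpy_sketch and the choice of kernel (y = a * x + y) are assumptions made for illustration, and the scalar fallback is there only so the code still builds on a toolchain that is not targeting SVE.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical example kernel: y[i] += a * x[i] for i in [0, n). */
void saxpy_sketch(float a, const float *x, float *y, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() is the number of 32-bit lanes per vector. It is only
       known at run time, so the loop never hard-codes a vector width. */
    for (size_t i = 0; i < n; i += svcntw()) {
        /* Predicate: a lane is active while i + lane < n. The final
           partial vector is handled here, with no scalar tail loop. */
        svbool_t pg = svwhilelt_b32((uint64_t)i, (uint64_t)n);

        svfloat32_t vx = svld1_f32(pg, x + i);  /* inactive lanes load as zero */
        svfloat32_t vy = svld1_f32(pg, y + i);

        /* vy + vx * a on active lanes; compilers typically emit FMLA. */
        vy = svmla_n_f32_m(pg, vy, vx, a);

        svst1_f32(pg, y + i, vy);               /* only active lanes are stored */
    }
#else
    /* Plain C fallback when the compiler is not targeting SVE. */
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
#endif
}
```

With a compiler that supports SVE, building with an option such as -march=armv8-a+sve should select the intrinsics path. The same binary then runs unchanged on hardware with 128-bit or wider vectors, which is the point of combining scalable vectors with predication.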
Unlike a recipe book, SIMD Loops does not just hand you solutions. It helps you understand the architecture itself. You will see how different vector instruction sets (for example, NEON, SVE, SVE2, SME, and SME2.1) handle the same kernel, compare their performance, and gain a foundation for writing your own high-performance code.
Whether you are moving from NEON or starting fresh with SVE/SME, SIMD Loops offers a clear, practical pathway to mastering Arm’s most advanced SIMD technologies.
Learn more and explore practical examples in our guided SIMD Loops Learning Path.