As Arm servers have become more widely used in recent years, more cloud providers have begun to offer Arm-based cloud instances, and more developers are writing software for the Arm platform.
Synchronization is a recurring concern when migrating software to Arm. Arm-based servers typically have more CPU cores than servers based on other architectures, which makes a solid understanding of synchronization all the more important.
One of the most significant differences between Arm and x86 CPUs is their memory model: the Arm architecture has a weak memory model, which differs from the TSO (Total Store Order) model of the x86 architecture. Because of this difference, a program that works well on one architecture may run into performance problems or fail outright on the other. Arm's more relaxed memory model gives the compiler and the hardware more freedom to reorder memory accesses, which can boost system performance. The tradeoff is that the model is harder to understand, and code written for it is more prone to subtle errors.
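To make the difference concrete, here is a minimal C++ sketch of the classic message-passing pattern; it is our own illustration, not code from the whitepaper, and `run_message_passing` is a hypothetical name. If the flag were a plain variable, x86's TSO model would happen to keep the two hardware stores in order, while Arm's weaker model could let another core see the flag before the payload (and the compiler may reorder plain accesses on either architecture). Release/acquire ordering on an atomic flag makes the handoff correct on both.

```cpp
#include <atomic>
#include <thread>

// Hypothetical message-passing handoff: one thread writes a payload and then
// raises a flag; another thread waits for the flag and then reads the payload.
int run_message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // synchronization flag
    int observed = -1;

    std::thread producer([&] {
        payload = 42;  // write the data first
        // Release: the payload write cannot be reordered after this store.
        // With a plain flag, another core on Arm could see the flag first.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: the payload read cannot be performed before the flag read.
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the producer publishes
        }
        observed = payload;  // guaranteed to see 42
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Calling `run_message_passing()` should always return 42, on Arm and on x86 alike.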
We wrote this document to share synchronization expertise on the Arm architecture and to help developers coming from other architectures develop on Arm systems.
This document first introduces synchronization on the Armv8-A architecture, covering atomic instructions, Arm memory ordering, and data access barrier instructions.
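As a small illustration of atomic instructions, here is a hypothetical C++ sketch (`count_with_atomics` is not from the whitepaper) in which several threads increment a shared counter with `std::atomic::fetch_add`. On AArch64, compilers typically lower a relaxed `fetch_add` to a single `LDADD` when targeting Armv8.1-A or later (the Large System Extension atomics), to an `LDXR`/`STXR` retry loop for baseline Armv8.0-A, or to a runtime-dispatched helper when GCC's outline atomics are enabled. The exact code depends on your compiler and flags, so it is worth inspecting the generated assembly.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical shared counter incremented concurrently by several threads.
// fetch_add is a single atomic read-modify-write, so no increment is lost.
long count_with_atomics(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Relaxed is enough: only the final total matters, and no
                // other data is published through this counter.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() synchronizes with each worker, so the final load sees every update.
    for (auto& w : workers) w.join();
    return counter.load(std::memory_order_relaxed);
}
```

For example, `count_with_atomics(4, 10000)` should return 40000; replacing the atomic with a plain `long` and `++` would be a data race, and increments could be lost.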
Next, to help the reader understand these concepts, we select three typical cases and analyze them in depth. Because synchronization-related programming is complex and subtle, correctness and performance must be balanced carefully. We suggest first getting the logic correct with heavier instructions, and then improving performance by removing redundant barriers or switching to lighter barriers where appropriate. A deep understanding of the Arm memory model and its related instructions is necessary to implement synchronization that is both correct and fast.
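The heavy-first, lighten-later approach can be sketched with a simple test-and-set spinlock in C++. This is an illustration under our own assumptions, not one of the three cases analyzed in the whitepaper, and `SpinLock` and `count_with_spinlock` are hypothetical names. A conservative first version would call `test_and_set()` and `clear()` with their default `memory_order_seq_cst`; once the locking logic is verified, acquire ordering on lock and release ordering on unlock is sufficient. Whether the lighter ordering gives a measurable speedup depends on the compiler and the hardware.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock.
// Heavier first draft: flag_.test_and_set() and flag_.clear() with the
// default memory_order_seq_cst. Lighter version below: acquire/release.
class SpinLock {
public:
    void lock() {
        // Acquire: accesses in the critical section cannot move above this.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // spin while another thread holds the lock
        }
    }
    void unlock() {
        // Release: writes in the critical section become visible before
        // the lock is observed as free.
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Uses the lock to protect a plain (non-atomic) counter.
long count_with_spinlock(int num_threads, int increments_per_thread) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;  // safe: only one thread is inside at a time
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

If the acquire/release version ever produced a wrong total while the `seq_cst` version did not, that would point to a missing ordering constraint, which is exactly why starting from the heavier version is the safer path.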
In the Appendix, we first introduce a memory model tool, the litmus test suite, which helps in understanding the memory model and in verifying programs on various architectures. We then give a brief overview of the C++ memory model and its mapping to the Armv8-A implementation. We'd like to emphasize that in most development cases, developers don't need to write architecture-dependent assembly code. Instead, they should rely on a well-defined, language-level memory model to write high-quality code without having to worry about architectural differences.
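The store-buffering (SB) litmus test, written here as a minimal C++ sketch rather than in the litmus tool's own format, shows how C++ orderings relate to Armv8-A instructions. The function names are hypothetical, and the instruction mapping in the comments reflects what mainstream compilers commonly emit; the exact output depends on the compiler, version, and target flags. Launching threads in a loop only samples the possible outcomes, so seeing zero forbidden outcomes is consistent with the guarantee but does not prove it the way an exhaustive litmus tool would.

```cpp
#include <atomic>
#include <thread>

// Store-buffering (SB) shape: each thread stores to one variable, then loads
// the other. The outcome r1 == 0 && r2 == 0 is forbidden with seq_cst but
// allowed with weaker orderings, and is observable even on x86 (TSO).
//
// Commonly used C++ -> AArch64 mappings (compiler-dependent):
//   load(relaxed) -> LDR             store(relaxed) -> STR
//   load(acquire) -> LDAR (LDAPR with RCpc)   store(release) -> STLR
//   load(seq_cst) -> LDAR            store(seq_cst) -> STLR
//   atomic_thread_fence(acquire)           -> DMB ISHLD
//   atomic_thread_fence(release / seq_cst) -> DMB ISH
// A load-acquire (LDAR) is not reordered before an earlier store-release
// (STLR), which rules out r1 == r2 == 0 for the seq_cst version below.

// Runs the SB test once; returns true if the forbidden outcome was observed.
bool store_buffering_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts forbidden outcomes (expected: always 0).
int count_forbidden_outcomes(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) {
        if (store_buffering_once()) ++hits;
    }
    return hits;
}
```

Switching the four accesses to `memory_order_relaxed` makes the both-zero outcome legal on Arm and on x86, although a short loop like this may or may not happen to observe it.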
Download the whitepaper: https://developer.arm.com/documentation/107630/1-0/?lang=en