Synchronization Overview and Case Study on Arm Architecture

June 27, 2022


As Arm servers have become more widely used in recent years, more cloud providers have begun to offer Arm-based cloud instances, and more developers are writing software for the Arm platform.

Synchronization is a recurring concern when migrating software. Arm-based servers typically have more CPU cores than servers based on other architectures, which makes a sound understanding of synchronization all the more important.

One of the most significant differences between Arm and x86 CPUs is their memory model: the Arm architecture has a weak memory model, whereas the x86 architecture follows the TSO (Total Store Order) model. Because of this difference, a program can work well on one architecture but run into performance problems or failures on the other. The more relaxed memory model of Arm servers leaves more room for compiler and hardware optimizations that boost system performance. The tradeoff is that it is harder to understand, which makes it easier to write incorrect code.
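The difference between the two memory models can be illustrated with the classic message-passing pattern. The following is a minimal sketch, not taken from the whitepaper; the names `payload`, `ready`, `producer`, `consumer`, and `run_message_passing` are hypothetical. If both flag accesses used `std::memory_order_relaxed`, TSO hardware would still keep the two stores and the two loads in order, while a weakly ordered Arm core would be allowed to reorder them, so the reader could see the flag set and still read stale data. Release/acquire ordering makes the code correct under the C++ memory model on either architecture.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the data

void producer() {
    payload = 42;  // 1. write the data
    // 2. publish: the release store keeps the write above from moving below it
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // 3. wait: the acquire load keeps the read below from moving above it
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // 4. ordered after the producer's write to payload
}

int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    assert(seen == 42);  // guaranteed by the release/acquire pairing
    return seen;
}
```

Calling `run_message_passing()` from `main` runs one producer and one consumer thread and returns the value the consumer observed.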

We produced this document to share synchronization expertise on the Arm architecture and to help developers coming from other architectures develop software on Arm systems.

This document first introduces the synchronization approach on the Armv8-A architecture, including atomic instructions, Arm memory ordering, and data access barrier instructions.

Next, to help the reader build a better understanding, we select three typical cases and analyze them in depth. Because synchronization-related programming is complicated and intricate, correctness and performance must be balanced carefully. We suggest first getting the logic correct with the heavier instructions, and then improving performance by removing redundant barriers or switching to lighter barriers where necessary. A deep understanding of the Arm memory model and its related instructions is necessary to build an accurate, high-performance synchronization implementation.
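As a rough illustration of this "correct first, then lighter" workflow, here is a minimal spinlock sketch in C++. It is not one of the three cases from the whitepaper; `SpinLock` and `run_counter` are hypothetical names. A first version would use the default `std::memory_order_seq_cst` on every operation. Once the logic is verified, the orderings can be relaxed to acquire on lock and release on unlock, which is what a lock needs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical example class, for illustration only.
class SpinLock {
    std::atomic<bool> locked{false};

public:
    void lock() {
        // Step 1 (correct first): locked.exchange(true) with default seq_cst.
        // Step 2 (then lighter): acquire is sufficient when taking a lock.
        while (locked.exchange(true, std::memory_order_acquire)) {
            // Wait on a relaxed load; ordering comes from the exchange above.
            while (locked.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        // Step 1: locked.store(false) with default seq_cst.
        // Step 2: release is sufficient when dropping a lock.
        locked.store(false, std::memory_order_release);
    }
};

// Increment a plain counter from several threads under the lock.
long run_counter(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

With a correct lock, `run_counter(4, 10000)` returns 40000. Whether the lighter orderings bring a measurable gain depends on the workload and the hardware, so it is worth measuring before and after.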

In the Appendix, we first introduce a memory model tool (the litmus test suite), which helps in understanding the memory model and verifying programs on various architectures. We then present a brief overview of the C++ memory model and how it maps to the Armv8-A implementation. We would like to emphasize that, in most cases, developers do not need to write architecture-dependent assembly code. Instead, they should rely on a well-defined, language-level memory model to write high-quality code without having to worry about architectural differences.
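To give a feel for such a mapping, the sketch below pairs a few C++ atomic operations with the AArch64 instructions that mainstream compilers typically emit for them. The instruction names in the comments are our own summary rather than a table from the whitepaper, and the actual output depends on the compiler, its version, and the target flags (for example, whether Armv8.1-A LSE atomics are enabled). The variable `shared_value` and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable, for illustration only.
std::atomic<int> shared_value{0};

// Relaxed load: typically a plain LDR.
int load_relaxed() { return shared_value.load(std::memory_order_relaxed); }

// Acquire load: typically LDAR (or LDAPR where RCpc is available).
int load_acquire() { return shared_value.load(std::memory_order_acquire); }

// Release store: typically STLR.
void store_release(int v) { shared_value.store(v, std::memory_order_release); }

// Sequentially consistent load and store: typically LDAR and STLR.
int load_seq_cst() { return shared_value.load(std::memory_order_seq_cst); }
void store_seq_cst(int v) { shared_value.store(v, std::memory_order_seq_cst); }

// Atomic read-modify-write: typically LDADDAL with LSE atomics,
// otherwise an LDAXR/STLXR retry loop. Returns the previous value.
int add_one() { return shared_value.fetch_add(1, std::memory_order_seq_cst); }

// Full fence: typically DMB ISH.
void full_fence() { std::atomic_thread_fence(std::memory_order_seq_cst); }
```

The same source compiles unchanged for x86, where the compiler picks different instructions; this is the practical benefit of writing against the language-level memory model rather than in assembly.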

Download the Whitepaper
