Here is a minimal C implementation of a spinlock "lock" operation using GCC's built-in atomics:
#include <stdbool.h>

void spin_lock(bool *l)
{
    while (__atomic_test_and_set(l, __ATOMIC_ACQUIRE))
        ;
}
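For completeness, the matching unlock I have in mind is just a release clear (this is only a sketch to show how the lock is meant to be used, not part of the generated code I am asking about):

#include <stdbool.h>

/* Sketch of the matching unlock: a plain clear with release ordering. */
void spin_unlock(bool *l)
{
    __atomic_clear(l, __ATOMIC_RELEASE);
}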
I am concerned by GCC's output when compiling for AArch64:
spin_lock:
        mov     w2, 1
        .p2align 2
.L4:
        ldaxrb  w1, [x0]
        stxrb   w3, w2, [x0]
        cbnz    w3, .L4
        uxtb    w1, w1
        cbnz    w1, .L4
        ret
The ldaxrb surely prevents subsequent memory accesses from being reordered before it, but, to my understanding, nothing prevents those accesses from being reordered into the window between the ldaxrb and the stxrb. If I understand correctly, the acquire barrier should be placed after the stxrb, not before it.
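To illustrate what I would have expected to be equivalent, here is a variant where the ordering constraint is expressed as a separate fence after the successful test-and-set, so it clearly sits after the store-exclusive (again just a sketch; I have not checked what GCC emits for it):

#include <stdbool.h>

/* Sketch: same lock, but with the acquire ordering expressed as an
 * explicit fence after the test-and-set loop has succeeded. */
void spin_lock_fence(bool *l)
{
    while (__atomic_test_and_set(l, __ATOMIC_RELAXED))
        ;
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
}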
When compiling for ARM, however, GCC correctly inserts a dmb after strexb:
spin_lock:
        mov     r2, #1
.L4:
        ldrexb  r3, [r0]
        strexb  r1, r2, [r0]
        cmp     r1, #0
        bne     .L4
        tst     r3, #255
        dmb     sy
        bne     .L4
        bx      lr
Am I missing something? If GCC's output for AArch64 is correct, could anyone explain what forces the acquire memory ordering I specified? If it is not, what would be a correct solution (besides GCC's solution for ARM)?
I am using Linaro's gcc-linaro-5.3-2016.02-x86_64_aarch64-elf and gcc-linaro-4.9-2015.02-3-x86_64_arm-eabi toolchains.