
Spin-lock implementation for AArch64: how to enforce acquire semantics?

Here is a minimal C implementation of a spinlock "lock" operation using GCC's built-in atomics:

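#include <stdbool.h>

/* Atomically set the flag (acquire ordering) and spin while it was already set. */
void spin_lock(bool *l) {
  while (__atomic_test_and_set(l, __ATOMIC_ACQUIRE))
    ;  /* lock already held by someone else: retry */
}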
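For context, a lock like this is normally paired with an unlock that clears the flag with release semantics, so that writes made inside the critical section become visible to the next thread whose acquiring test-and-set observes the cleared flag. The sketch below assumes GCC or Clang `__atomic` built-ins; `spin_unlock` and `spin_trylock` are hypothetical names, not from the original post, and `spin_lock` is repeated so the block compiles on its own.

```c
#include <stdbool.h>

/* Repeated from the question: spin until the previous value was clear. */
void spin_lock(bool *l) {
  while (__atomic_test_and_set(l, __ATOMIC_ACQUIRE))
    ;
}

/* Hypothetical helper: a single attempt; returns true if the lock was taken. */
bool spin_trylock(bool *l) {
  return !__atomic_test_and_set(l, __ATOMIC_ACQUIRE);
}

/* Hypothetical counterpart: the release store orders every access in the
   critical section before the store that makes the lock available again. */
void spin_unlock(bool *l) {
  __atomic_clear(l, __ATOMIC_RELEASE);
}
```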

I am concerned by GCC's output when compiling for AArch64:

spin_lock:
    mov    w2, 1
    .p2align 2
.L4:
    ldaxrb    w1, [x0]
    stxrb    w3, w2, [x0]
    cbnz    w3, .L4
    uxtb    w1, w1
    cbnz    w1, .L4
    ret

The ldaxrb surely prevents subsequent memory accesses from being reordered before it, but, as I understand it, nothing prevents those accesses from being performed between the ldaxrb and the stxrb, that is, before the store that actually takes the lock. If I understand correctly, the acquire barrier should be placed after the stxrb, not before it.
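The ordering proposed here, with the acquire barrier after the exclusive store has succeeded, can be expressed in C by making the test-and-set itself relaxed and issuing a separate acquire fence once the loop exits. This is a sketch of that alternative under GCC's `__atomic` built-ins, not what GCC emits for the original code; `spin_lock_fence` and `spin_unlock_fence` are hypothetical names. Under the C11/GCC memory model, a relaxed read-modify-write followed by an acquire fence synchronizes with the release store it reads from, and on AArch64 GCC typically lowers the acquire fence to a `dmb ishld` placed after the LDXRB/STXRB loop.

```c
#include <stdbool.h>

/* Hypothetical variant: a test-and-set with no ordering of its own, followed
   by an acquire fence that is only reached once the lock has been won. */
void spin_lock_fence(bool *l) {
  while (__atomic_test_and_set(l, __ATOMIC_RELAXED))
    ;
  /* Orders the read of the lock flag (by the successful test-and-set)
     before all later loads and stores in the critical section. */
  __atomic_thread_fence(__ATOMIC_ACQUIRE);
}

/* Matching unlock: release store so critical-section accesses stay inside. */
void spin_unlock_fence(bool *l) {
  __atomic_clear(l, __ATOMIC_RELEASE);
}
```

Because the fence sits outside the retry loop, a failed attempt never pays for the barrier, which is similar in spirit to where GCC places the `dmb` in its ARM output.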

When compiling for ARM, however, GCC correctly inserts a dmb after strexb:

spin_lock:
    mov    r2, #1
.L4:
    ldrexb    r3, [r0]
    strexb    r1, r2, [r0]
    cmp    r1, #0
    bne    .L4
    tst    r3, #255
    dmb    sy
    bne    .L4
    bx    lr

Am I missing something? If GCC's output for AArch64 is correct, could anyone explain what enforces the acquire memory ordering I specified? Otherwise, what would be a correct solution (besides GCC's solution for ARM)?

I am using Linaro's gcc-linaro-5.3-2016.02-x86_64_aarch64-elf and gcc-linaro-4.9-2015.02-3-x86_64_arm-eabi toolchains.

    > The ldaxrb surely prevents subsequent memory accesses from being reordered before it, but, to my understanding, nothing prevents those accesses from being reordered between the ldaxrb and stxrb. If I understand correctly, the acquire barrier should be placed after stxrb, not before.

    While having a barrier after the STXR would certainly work, I don't think it is necessary.

    The LDAXR guarantees that explicit accesses after it aren't reordered before it. The processor might start speculatively performing those later accesses between the LDAXR and the STXR. If the STXR fails, they will simply be discarded. If the STXR succeeds, those accesses still happen _after_ you saw the spinlock available, and you know that no other thread/processor got in between the LDAXR and the STXR, as otherwise the STXR wouldn't have succeeded.
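The retry behaviour described here, where a failed store-exclusive abandons the attempt and sends the thread back to the load-exclusive, is also what a weak compare-and-exchange loop expresses at the C level. A sketch assuming GCC's `__atomic` built-ins; `spin_lock_cas` and `spin_unlock_cas` are hypothetical names. On AArch64 without the LSE atomics extension, GCC typically expands each weak CAS attempt into a single LDAXRB/STXRB pair, so a spurious or genuine failure just goes round the loop again.

```c
#include <stdbool.h>

/* Hypothetical CAS-based lock: try to change false -> true. A weak CAS may
   fail spuriously (much like a failed store-exclusive); either way, retry. */
void spin_lock_cas(bool *l) {
  bool expected;
  do {
    expected = false; /* a failed CAS overwrites this with the current value */
  } while (!__atomic_compare_exchange_n(l, &expected, true, /* weak */ true,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
}

/* Release store so that critical-section accesses cannot move past it. */
void spin_unlock_cas(bool *l) {
  __atomic_store_n(l, false, __ATOMIC_RELEASE);
}
```

Acquire ordering is requested only for the success case: a failed attempt, like a failed STXR, never enters the critical section, so it needs no ordering.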

    So while it is technically true that they could be performed before you owned the spinlock, I don't see that this actually breaks anything.

    > When compiling for ARM, however, GCC correctly inserts a dmb after strexb:

    I'm not a GCC expert, but is it possible that you were building for ARMv7-A, which doesn't include the load-acquire instructions? (LDA, LDAEX and the other load-acquire forms were introduced with ARMv8.)

    Once you switch to using a DMB, you could arguably place it where GCC put it (line 9 of the ARM listing, between the tst and the bne) or at pretty much any point after the LDREX. I suspect the rationale for this placement is that if the STREX fails, you avoid the cost of the barrier.
