In the ARM Architecture Reference Manual issue D.a (ARM DDI 0487D.a) section K11.3.1 "Acquiring a lock" has the following example code:
AArch32 Px PLDW[R1] ; preload into cache in unique state Loop LDAEX R5, [R1] ; read lock with acquire CMP R5, #0 ; check if 0 STREXEQ R5, R0, [R1] ; attempt to store new value CMPEQ R5, #0 ; test if store suceeded BNE Loop ; retry if not ; loads and stores in the critical region can now be performed
My question is I can not find documentation in Chapter E2 "The AArch32 Application Level Memory Model" that would prevent a speculative store after the BNE from being accelerated before the STREXEQ causing the store to be observed by another observer before the lock is acquired. Version A.j of the manual contained the following statement: "Speculative writes by an observer cannot be observed by another observer." Which would preclude this from happening but I can find no equivalent statement in the latest manual.
Oh. Would it not be proper to open a support case with Arm? Or, on the side, to check the object code the modern c++ compilers generate for locking? If licensing isn't an issue, one can also check the source or the object code of current mainstream OSes.
One can also implement locking without relying on instructions with acquire/release semantics - by adopting the no-ordering ldrex/strex pair and by placing explicit barriers (dmb) at appropriate locations in the lock/unlock routines. This should help the project move forward, while awaiting a reply from Arm.
Regardless of the presence or absence of any documentation, propagating speculative writes lends them a sort of certainty to which they are not entitled; not until it can be known that they can be committed.