In the ARM Architecture Reference Manual, issue D.a (ARM DDI 0487D.a), section K11.3.1 "Acquiring a lock" gives the following example code:
AArch32
Px
    PLDW [R1]               ; preload into cache in unique state
Loop
    LDAEX R5, [R1]          ; read lock with acquire
    CMP R5, #0              ; check if 0
    STREXEQ R5, R0, [R1]    ; attempt to store new value
    CMPEQ R5, #0            ; test if store succeeded
    BNE Loop                ; retry if not
    ; loads and stores in the critical region can now be performed
My question: I cannot find anything in Chapter E2 "The AArch32 Application Level Memory Model" that would prevent a speculative store after the BNE from being moved ahead of the STREXEQ, so that another observer sees the store before the lock is acquired. Version A.j of the manual contained the following statement: "Speculative writes by an observer cannot be observed by another observer." That would preclude this from happening, but I can find no equivalent statement in the latest manual.
A speculative write that is program-order-after a conditional branch is not propagated to other threads before the conditional branch is resolved. The PE may speculate into the critical region, but that speculation is discarded once the branch is resolved in a direction that leads away from the critical region.
Edit: The above is applicable to a load-store control dependency.
I am not sure what you mean by a "load-store control dependency". I can not find any such phrase in the v8 manual. A control dependency is described as:
Control dependency
A Control dependency from a read R1 to a subsequent instruction I2 exists if and only if either:
• There is a Register dependency from the data value returned by R1 to the data value used in the evaluation of a conditional branch, and I2 is only executed as a result of one of the possible outcomes of that conditional branch.
This does not seem to apply in this case because the conditional branch is based on the result of the STREXEQ and not the LDAEX.
If LDAEX returns non-zero in R5, the BNE LOOP directly depends on that value. But you are correct that, in case STREX does run and fails, the branch to the loop depends on the STREX instruction.
In any case, a speculative write cannot be propagated to any other thread before the branch is resolved.
In the subsection "Instantaneous Instruction Execution" here, it says ("i cannot propagate yet, because e is unresolved;") that a speculatively executed write cannot be propagated before the branch condition under which it was speculated is resolved.
Just below Fig 1.5 and Fig 1.6 here, it says "execution of Figure 1.6, where ... d is a write of x ... is forbidden, because in Power and ARM d is only allowed to propagate when all program-order-earlier control-flow is determined, ..."
This paper explains the states of a store; specifically, it mentions that a store is committed before it is propagated to other threads, and before committing a store, all po-before branch instructions must be "finished". It says this of a conditional branch which is labelled as finished: "When a conditional branch is finished, any untaken alternative paths are discarded, and instruction instances that follow (in program order) a non-finished conditional branch cannot be finished until that conditional branch is."
Thanks for the links; they provided some interesting reading material. Unfortunately, the product I work on is for aircraft, and if it is incorrect people could die, so I really need formal ARM-provided documentation that backs up the claims made by these papers.
You should contact the manufacturer of the SoC instead of this forum. Anyway, you would not rely on a single SoC in a safety-critical system, would you?
I did; they sent me to this forum. This is somewhat understandable because the ARM architecture, not the SoC, is what defines the memory model. I cannot speak to the system design; I write a general-purpose OS that gets used on multiple systems (most of which I never see). Safety-critical systems normally have redundancy using dissimilar hardware, but this does not mean the software can be developed assuming a bug will be caught by some other means.
I realize the ARM architecture is not developed to any safety-critical standard, so there is no guarantee the processor will work as documented, but that is all I have to go on, and at this point my interpretation of the memory model section is that the code in K11.3.1 "Acquiring a lock" is incorrect. Based on things I have read (including the papers referred to above), I do not actually believe this section is incorrect, and I was hoping someone on this forum would be able to point to the ARM documentation that I am missing.
Oh. Would it not be proper to open a support case with Arm? Or, on the side, to check the object code that modern C++ compilers generate for locking? If licensing isn't an issue, one can also check the source or the object code of current mainstream OSes.
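As a starting point for that kind of inspection, here is a minimal C++ sketch of a spinlock built on std::atomic with acquire/release ordering. The names SpinLock, try_lock, lock and unlock are hypothetical and chosen only for illustration; this is not code from the Arm manual. The idea would be to compile it for the target of interest and read the disassembly of lock(); the instructions emitted depend on the compiler, its version and the flags used, so no expected listing is given here.

```cpp
#include <atomic>

// Hypothetical illustration only: a spinlock whose lock word is 0 when free
// and 1 when held, mirroring the lock word tested in the K11.3.1 example.
class SpinLock {
public:
    // Single attempt: succeeds only if the word changes from 0 to 1.
    bool try_lock() {
        unsigned expected = 0;
        return word_.compare_exchange_strong(expected, 1u,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    // Spin until an attempt succeeds.
    void lock() {
        while (!try_lock()) {
            // busy-wait; a real implementation would likely back off here
        }
    }

    // Release ordering keeps critical-region accesses before the unlock store.
    void unlock() { word_.store(0u, std::memory_order_release); }

private:
    std::atomic<unsigned> word_{0};
};
```

Comparing the generated loop against the manual's LDAEX/STREXEQ sequence only shows what one compiler does; it is not a substitute for a statement in the architecture documentation.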
One can also implement locking without relying on instructions with acquire/release semantics - by adopting the no-ordering ldrex/strex pair and by placing explicit barriers (dmb) at appropriate locations in the lock/unlock routines. This should help the project move forward, while awaiting a reply from Arm.
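A rough C++ analogue of that barrier-based approach is sketched below: the lock word is updated with relaxed atomics and the ordering comes from explicit fences. FencedSpinLock and its member names are hypothetical. The assumption that a compiler targeting AArch32 lowers the relaxed compare-exchange to an LDREX/STREX loop and each fence to a DMB is exactly that, an assumption to confirm in the disassembly for the toolchain in use.

```cpp
#include <atomic>

// Hypothetical illustration only: ordering is provided by explicit fences
// rather than by acquire/release semantics on the lock-word accesses.
class FencedSpinLock {
public:
    bool try_lock() {
        unsigned expected = 0;
        // No ordering on the exclusive update itself.
        if (!word_.compare_exchange_strong(expected, 1u,
                                           std::memory_order_relaxed,
                                           std::memory_order_relaxed)) {
            return false;
        }
        // Barrier after taking the lock: critical-region accesses must not
        // be observed before the lock acquisition.
        std::atomic_thread_fence(std::memory_order_acquire);
        return true;
    }

    void lock() {
        while (!try_lock()) {
            // busy-wait
        }
    }

    void unlock() {
        // Barrier before releasing: critical-region accesses complete first.
        std::atomic_thread_fence(std::memory_order_release);
        word_.store(0u, std::memory_order_relaxed);
    }

private:
    std::atomic<unsigned> word_{0};
};
```

The barrier sits after the successful store in lock() and before the clearing store in unlock(), which is the placement the suggestion above relies on.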
Regardless of the presence or absence of any documentation, propagating speculative writes lends them a sort of certainty to which they are not entitled; not until it can be known that they can be committed.