Hi all:
I have some questions about DMB and DSB in armv8.
(1)
The ARMv8 Reference Manual says "The DMB instruction does not ensure the completion of any of the memory accesses for which it ensures relative order".
But the ARM Cortex-A Series Programmer’s Guide for ARMv8-A explains some DMB/DSB parameters. For example:
<option> | Ordered Accesses (before – after) | Shareability Domain
LD | Load – Load, Load – Store | Full system
Load - Load/Store: This means that the barrier requires all loads to complete before the barrier but does not require stores to complete. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.
Since "Load - Load/Store" means the barrier requires all loads to complete before the barrier, it seems to me that it does ensure the completion of those memory accesses, which appears to contradict the Reference Manual. So I am confused.
(2)
In ARM Cortex-A Series Programmer’s Guide for ARMv8-A doc, it also says DSB "enforces the same ordering as the Data Memory Barrier, but has the additional effect of blocking execution of any further instructions, not just loads or stores, or both, until synchronization is complete".
Since DSB blocks all further instructions, what is the "ST" in "DSB ST" for?
(3)
I already know that DSB can safely replace DMB, but in what situations must we use DSB rather than DMB? What is the difference between DSB and DMB? An example would be great.
Thanks!
According to your example, W1 can be complete only if it is visible to all the other PEs. Is that right?
Do you mean that every PE /needs/ to load A, before W1 can be considered complete? Then, no. The definition of completion is not as strict (it says '/if/ a load to address...')
If the potential for reading the original (common) value (that existed before W1 clobbered A) does not exist, then W1 is complete. One of the PEs may not load A during the entire execution of the program. But if it /were/ to load A, whether it received the value from W1, or the original, would determine if W1 ever completed.
Edit: Definition 3.3
Late Edit2: The notion of visibility, that was implicitly assumed (and I failed to expose) in my descriptions, is what Arm describes as "A write W1 from an Observer is Observed-by a read R2 from a different Observer if and only if R2 Reads-from W1".
Accordingly, their definition of Completion of a Write includes a statement: "Any read to the same Location by an Observer within the shareability domain will either Reads-from W1, or Reads-from a write that is Coherence-after W1".
Here, "PE0.W1 Observed-by PE1.R2" == "PE1.R2 reads-from PE0.W1".
Any condition which requires a Coherence-after relation between the operations cannot be described by the example system, because its memory can contradict itself about the order of operations at a single location.
So,
- the clause about a load reading from a write subsequent to W1,
- the additional condition on the write completion, that "Any write to the same Location by an Observer within the shareability domain will be Coherence-after W1",
- the notion of visibility of a RW1 from a PE by a W2 from another PE,
these cannot be described by the example system.
The utility of the example system does not extend beyond showing that writes need propagation before they can complete, and that they might become visible to loads of other PEs at different intervals.
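To make the point about propagation concrete, here is a minimal Python sketch of a toy model. It is not the example system discussed above and it is not Arm's formal model: the class `ToySystem`, its methods, and the constants `ORIGINAL` and `W1_VALUE` are hypothetical names made up for illustration. It assumes each PE has its own view of location A and that W1 reaches those views one at a time. Like the example system, it cannot express any Coherence-after relation.

```python
ORIGINAL = 0   # value of A before W1 (hypothetical)
W1_VALUE = 1   # value written by W1 (hypothetical)


class ToySystem:
    """Toy model: every PE holds its own view of location A."""

    def __init__(self, num_pes):
        self.views = [ORIGINAL] * num_pes

    def write_w1(self, writer):
        # The writing PE sees its own write immediately.
        self.views[writer] = W1_VALUE

    def propagate_to(self, pe):
        # W1 becomes visible to one more PE.
        self.views[pe] = W1_VALUE

    def load(self, pe):
        # What this PE would read if it were to load A right now.
        return self.views[pe]

    def w1_complete(self):
        # W1 is complete once no PE could still read the original value,
        # regardless of whether any PE actually performs a load.
        return all(view == W1_VALUE for view in self.views)


system = ToySystem(num_pes=3)
system.write_w1(writer=0)
print("after W1 on PE0:      ", system.views, system.w1_complete())
system.propagate_to(1)
print("after reaching PE1:   ", system.views, system.w1_complete())
system.propagate_to(2)
print("after reaching PE2:   ", system.views, system.w1_complete())
```

Note that `w1_complete()` never looks at whether a PE actually loaded A. It only checks whether a load, if one were issued, could still return the original value. PE1 and PE2 also see W1 at different steps, which is the "different intervals" part.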
I have got the point. Thanks! It really helps a lot.