Hello arm experts,
I am trying to understand when a load from a memory location might produce side effects that other observers in the system care about. So far, all the examples I can find around dmb memory barriers in the Armv8 reference material focus on the observability of *writes*, whose importance and shareability domains are fairly self-explanatory. What I have not been able to find is an example of when one might prefer dmb ishld over dmb nshld. Whether or not the memory address is in shareable memory, or visible to coherent caches, surely a read access cannot produce observable effects that would affect the correctness of the PE executing the dmb instruction?
If this is correct, then why does Armv8 offer various domains instead of a single dmb ld with the least restrictive domain possible? And if this is not correct, what would be a practical example where the difference between dmb nshld, dmb ishld, and dmb oshld would matter?
Thanks!
I thought NSH was covered by the model, but you could ask the team who work on it. There's a contact email address on the page that describes the model.
Vijay G said: What I am trying to determine is whether there is a practical situation where correctness between threads running on different PEs could depend on other PEs having observed that a particular PE performed loads in a certain order.
I'm not sure I understand. Isn't the mailbox example just that? For the message to be passed correctly, the reads would have to appear to happen in order.
In the mailbox example, P0 is writing the message and the flag, and P1 is reading the message and the flag. Does P0 need to observe P1's loads, in order to ensure program correctness?
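For reference, here is a minimal sketch of the basic mailbox pattern in C11 atomics. The names `message`, `flag`, `p0_send` and `p1_try_receive` are made up for illustration. C11 fences do not let you pick a shareability domain; on AArch64, GCC and Clang typically lower the release fence to DMB ISH and the acquire fence to DMB ISHLD.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int message;        /* payload, plain (non-atomic) data */
static atomic_int flag;    /* 0 = empty, 1 = message valid */

/* P0: write the message, then set the flag. */
void p0_send(int value)
{
    /* Single-shot mailbox in this sketch: it must be empty before sending. */
    assert(atomic_load_explicit(&flag, memory_order_relaxed) == 0);

    message = value;
    /* Orders the message write before the flag write. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* P1: check the flag once; if it is set, read the message. */
bool p1_try_receive(int *out)
{
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return false;
    /* Orders the flag read before the message read. */
    atomic_thread_fence(memory_order_acquire);
    *out = message;
    return true;
}
```

In a real system `p0_send` and `p1_try_receive` would run on different PEs, with P1 polling until the call returns true.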
Hmm. Suppose you extend the mailbox example so that P1 clears the flag to acknowledge receipt of the message. When P0 sees the flag cleared, it is permitted to write the message field again. The property we'd need to guarantee is that a write by P0 to message, made after seeing the cleared flag, can't change the value of message that P1 read before it cleared the flag.
This would give you a chain of dependencies.
I don't know if that's what you meant by one PE observing another's reads. But it's a real (if simplified) example of where the writes by one PE must be ordered with respect to "earlier" reads by a different PE.
If I understand correctly, in the extended example above, we would expect the following:
Martin Weidmann said: P0's write to the flag must not be re-ordered relative to its first write of the message.
This would be satisfied by a DMB ISH or DMB ISHST instruction on P0 between writing the message and writing the flag.
Martin Weidmann said: P1's read of the message must not be re-ordered relative to the read of the flag.
This could be satisfied by a DMB NSHLD instruction on P1 between loading the flag and loading the message (i.e. we would not need to use DMB ISH or DMB ISHLD to ensure P0 observed P1 loading the flag.)
Martin Weidmann said: P1's write to the flag must not be re-ordered relative to its read of message.
This could be satisfied by a DMB NSHLD instruction on P1 between loading the message and writing the flag. Aside: if P1 were to produce some other state where observers expected to see the flag cleared before seeing that state from P1, then P1 should use DMB ISHST after writing the flag, or perhaps write the flag using STLR.
Martin Weidmann said: P0's second write to message must not be re-ordered relative to its reads of the cleared flag.
And this could be satisfied by a DMB NSHx instruction on P0 between loading the flag and writing the message (i.e. we would not need to use DMB ISHx to ensure P1 observed P0 loading the flag.) But, you would probably use DMB ISHx here just to prevent the second message write from being observed before the first message write (depending exactly how you wrote your flag polling loop on P0.)
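To make the four placements above concrete, here is a rough sketch in C. It only mirrors the barrier positions proposed in this post; whether the NSH variants are actually sufficient is the open question of this thread, so please don't read it as a verified implementation. `DMB()` is a hypothetical helper macro, and `p0_try_send` and `p1_receive_and_ack` are made-up names. On non-AArch64 targets the macro falls back to a full C11 fence so the code still builds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: emit a DMB with the given option on AArch64.
 * Elsewhere, fall back to a full C11 fence (not a DMB, just a stand-in). */
#if defined(__aarch64__)
#define DMB(opt) __asm__ volatile("dmb " #opt ::: "memory")
#else
#define DMB(opt) atomic_thread_fence(memory_order_seq_cst)
#endif

static atomic_int message;  /* payload */
static atomic_int flag;     /* 1 = message valid, 0 = empty/acknowledged */

/* P0: send only when the flag is clear (first send or after an ack). */
bool p0_try_send(int value)
{
    if (atomic_load_explicit(&flag, memory_order_relaxed) != 0)
        return false;
    DMB(ishld);  /* flag read before message write (ISH variant, as suggested above) */
    atomic_store_explicit(&message, value, memory_order_relaxed);
    DMB(ishst);  /* message write before flag write */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return true;
}

/* P1: read the flag, then the message, then clear the flag as the ack. */
bool p1_receive_and_ack(int *out)
{
    assert(out != NULL);
    if (atomic_load_explicit(&flag, memory_order_relaxed) != 1)
        return false;
    DMB(nshld);  /* flag read before message read (placement under discussion) */
    *out = atomic_load_explicit(&message, memory_order_relaxed);
    DMB(nshld);  /* message read before flag write (placement under discussion) */
    atomic_store_explicit(&flag, 0, memory_order_relaxed);
    return true;
}
```

Swapping `nshld` for `ishld` in `p1_receive_and_ack` gives the more conservative variant, which makes it easy to compare the two in testing.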
Martin Weidmann said: I don't know if that's what you meant by one PE observing another's reads.
Not quite, I don't think. Some more context here might help to clarify. My team would like to ensure that a given PE does not re-order its own loads relative to each other; you could say it is quite like P1 in the original mailbox example. We would like to use the least restrictive barrier possible for this, and based on the documentation and our own tests, DMB NSHLD appears to be sufficient. But a question has been raised as to what exactly the effects of P1 loading a value are that other observers can observe, and whether there are any practical cases where observers could need to see those effects (thus necessitating the use of DMB ISHLD or DMB OSHLD instead).