[ARMV8] dmb nshld vs dmb ishld -- practical differences?

Hello arm experts,

I am trying to understand when a load access of a memory location might produce side effects that other observers in the system may care about. So far, all the examples I can find around dmb memory barriers in the ARMV8 reference material are focused on observability of *writes*, whose importance and shareability domains are fairly self-explanatory. What I have not been able to find is an example of when one might prefer dmb ishld over dmb nshld. Whether the memory address is in shareable memory or not, or visible to coherent caches or not, surely a read access cannot produce observable effects that would affect the correctness of the PE executing the dmb instruction?

If this is correct, then why does ARMV8 offer various domains instead of simply some dmb ld with the least restrictive domain possible? And, if this is not correct, then what would be a practical example where the difference between dmb nshld, dmb ishld, and dmb oshld, would matter?

Thanks!

0 Martin Weidmann over 1 year ago
The share-ability argument is saying who has to see the guarantees of the barrier (NSH=just this observer, ISH=observers in the Inner domain...). So the answer to your question is that you'd use ISHLD over NSHLD when it mattered that observers in the Inner share-ability domain saw the loads in order.

Let's take an example;

Thread 0      | Thread 1      ;
MOV W0,#1     | LDR W0,[FLAG] ;
STR W0,[MSG]  | DMB xLD       ;
DMB xST       | LDR W2,[MSG]  ;
MOV W2,#1     |               ;
STR W2,[FLAG] |               ;

Thread 0 is going to write a message (STR to MSG), then it is going to write a flag to say the message is valid (STR to FLAG). Thread 1 does the reverse, it reads the Flag first and then the message.

What we care about is that if Thread 1 sees the flag set then it MUST see the message written also. To ensure that, we put a DMB xST in Thread 0 and a DMB xLD in Thread 1.

Now, share-ability. If Thread 0 and Thread 1 both run on the same PE (i.e. same non-shareable domain), then we could replace x with NSH. However, if the two threads might run on different PEs within the same Inner domain, then we need to replace x with ISH to get the guarantee we need. Similarly, if the two threads ran on different PEs in the same Outer domain, then we'd need OSH.

There's a great tool for experimenting with ordering-type questions: the Memory Model Tool (arm.com). We can actually ask it that type of question. Here's the above test converted into the format used by the tool:

{
0:X1=x; 0:X3=y;
1:X1=x; 1:X3=y;
}
P0 | P1 ;
MOV W0,#1 | LDR W0,[X3] ;
STR W0,[X1] | DMB NSHLD ;
DMB NSHST | LDR W2,[X1] ;
MOV W2,#1 | ;
STR W2,[X3] | ;
exists
(1:X0=1 /\ 1:X2=0)

The "exists (1:X0=1 /\ 1:X2=0)" line is a question to the tool. It's saying "is it possible for P1 to end the test with X0=1 and X2=0", or "is it possible for P1 to end the test having seen the Flag but not the Message".

If we run the test in the tool it says:

Test MP Allowed
States 4
1:X0=0; 1:X2=0;
1:X0=0; 1:X2=1;
1:X0=1; 1:X2=0;
1:X0=1; 1:X2=1;
Ok
Witnesses
Positive: 1 Negative: 3

So... yes, it is possible! There are four legal outcomes, one of which matches the pattern we asked the tool to look for.

Meaning, if we specify NSH as the share-ability for the barriers, the ordering guarantee only applies to the Non-shareable domain. As these are two different PEs, and therefore in different Non-shareable domains, the barriers are not enough to get the desired effect.

Now, let's change it from NSH to ISH:

{
0:X1=x; 0:X3=y;
1:X1=x; 1:X3=y;
}
P0 | P1 ;
MOV W0,#1 | LDR W0,[X3] ;
STR W0,[X1] | DMB ISHLD ;
DMB ISHST | LDR W2,[X1] ;
MOV W2,#1 | ;
STR W2,[X3] | ;
exists
(1:X0=1 /\ 1:X2=0)

Now the model says:

Test MP Allowed
States 3
1:X0=0; 1:X2=0;
1:X0=0; 1:X2=1;
1:X0=1; 1:X2=1;
No
Witnesses
Positive: 0 Negative: 3

Now the tool is saying there is no possible/legal result where P1 sees the flag but not the message.
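
As a rough cross-check of the two state lists above, here is a minimal Python sketch of a toy model. It is not the Arm memory model tool and does not use its cat files. It assumes a single shared memory with interleaved accesses, treats a barrier that takes effect as "this thread's two accesses stay in program order", and treats a barrier that does not take effect as "the two accesses may swap". The names interleavings, run and outcomes are made up for this illustration.

```python
from itertools import permutations

# Each access is (kind, location, value-to-store or destination register).
P0 = [("W", "x", 1), ("W", "y", 1)]        # STR message ; STR flag
P1 = [("R", "y", "X0"), ("R", "x", "X2")]  # LDR flag ; LDR message


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(trace):
    """Execute one interleaving against a single shared memory; return (X0, X2)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, loc, arg in trace:
        if kind == "W":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["X0"], regs["X2"]


def outcomes(ordered):
    """Set of reachable (X0, X2) pairs.

    ordered=True  keeps program order inside each thread (barriers effective).
    ordered=False also lets each thread's two accesses swap (toy assumption).
    """
    orders0 = [P0] if ordered else [list(p) for p in permutations(P0)]
    orders1 = [P1] if ordered else [list(p) for p in permutations(P1)]
    seen = set()
    for a in orders0:
        for b in orders1:
            for trace in interleavings(a, b):
                seen.add(run(trace))
    return seen


print(sorted(outcomes(ordered=False)))
print(sorted(outcomes(ordered=True)))
```

In this toy model the first call reaches four (X0, X2) pairs, including (1, 0), and the second reaches three, with (1, 0) gone. That mirrors the shape of the two tool outputs, but it says nothing about how the tool itself treats the NSH, ISH and OSH options.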

0 Vijay G over 1 year ago in reply to Martin Weidmann

Thanks for the examples Martin! I tried simulating the following, thinking that the observation of loads shouldn't affect the result:

AArch64 MP
"PodWW Rfe PodRR Fre"
Cycle=Rfe PodRR Fre PodWW
Generator=diycross7 (version 7.54+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=PodWW Rfe PodRR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
MOV W0,#1 | LDR W0,[X1] ;
STR W0,[X1] | DMB NSHLD ;
DMB ISHST | LDR W2,[X3] ;
MOV W2,#1 | ;
STR W2,[X3] | ;
exists
(1:X0=1 /\ 1:X2=0)

And, surprisingly got:

Test MP Allowed
States 4
1:X0=0; 1:X2=0;
1:X0=0; 1:X2=1;
1:X0=1; 1:X2=0;
1:X0=1; 1:X2=1;
Ok
Witnesses
Positive: 1 Negative: 3
Flag Assuming-common-inner-shareable-domain
Condition exists (1:X0=1 /\ 1:X2=0)
Observation MP Sometimes 1 3

Looking at the execution diagrams for this, as well as substituting dmb nsh, dmb nshld, and dmb nshst, I noticed that the execution flow was labelled as po (program order) for the NSHx cases. Looking at armfences.cat and aarch64fences.cat, it doesn't look like the NSHx barriers are implemented in the simulator, so they don't order memory accesses even on a single PE. Is that correct?

What I am trying to determine, is if there is a practical situation where correctness between threads running on different PEs could depend on other PEs having observed that a particular PE performed loads in a certain order.

0 Vijay G over 1 year ago in reply to Martin Weidmann

Thanks for the examples Martin! I wanted to test if DMB NSHx would at least barrier accesses on the self-same PE, so I tried:

AArch64 MP
"PodWW Rfe PodRR Fre"
Cycle=Rfe PodRR Fre PodWW
Generator=diycross7 (version 7.54+01(dev))
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=PodWW Rfe PodRR Fre
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
MOV W0,#1 | LDR W0,[X1] ;
STR W0,[X1] | DMB NSHLD ;
DMB ISHST | LDR W2,[X3] ;
MOV W2,#1 | ;
STR W2,[X3] | ;
exists
(1:X0=1 /\ 1:X2=0)

And the output was:

Test MP Allowed
States 4
1:X0=0; 1:X2=0;
1:X0=0; 1:X2=1;
1:X0=1; 1:X2=0;
1:X0=1; 1:X2=1;
Ok
Witnesses
Positive: 1 Negative: 3
Flag Assuming-common-inner-shareable-domain
Condition exists (1:X0=1 /\ 1:X2=0)
Observation MP Sometimes 1 3

Looking through armfences.cat and aarch64fences.cat, it looks like DMB NSHx are not actually implemented in the simulator. Is that correct?

I am trying to determine if there's a practical case where PE-X could depend on having observed a certain order of loads by PE-Y, or if DMB NSHLD would be generally safe to use.

0 Martin Weidmann over 1 year ago in reply to Vijay G

I thought NSH was covered by the model, but you could ask the team who work on it. There's a contact email address on the page that describes the model.

Vijay G said:
What I am trying to determine, is if there is a practical situation where correctness between threads running on different PEs could depend on other PEs having observed that a particular PE performed loads in a certain order.

I'm not sure I understand. Isn't the mailbox example just that? For the message to be passed correctly, the reads have to appear to happen in order.

0 Vijay G over 1 year ago in reply to Martin Weidmann

In the mailbox example, P0 is writing the message and the flag, and P1 is reading the message and the flag. Does P0 need to observe P1's loads, in order to ensure program correctness?

0 Martin Weidmann over 1 year ago in reply to Vijay G
Hmm. Suppose you extend the mailbox example so that P1 clears the flag to acknowledge receipt of the message. When P0 sees the flag cleared, it is permitted to write the message field again. The property we'd need to guarantee is that a write by P0 to message after seeing the cleared flag can't change the value of message read by P1 before it cleared the flag.

This would give you a chain of dependencies.

P0's write to the flag must not be re-ordered relative to its first write of the message.

P1's read of the message must not be re-ordered relative to the read of the flag.

P1's write to the flag must not be re-ordered relative to its read of message.

P0's second write to message must not be re-ordered relative to its read of the cleared flag.

I don't know if that's what you meant by one PE observing another's reads. But it's a real (if simplified) example of where the writes by one PE must be ordered with respect to "earlier" reads by a different PE.
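
The chain above can be exercised in a toy model. The following Python sketch is an illustration only, not output from the Arm tool. It assumes a single shared memory with interleaved accesses and straight-line code with no polling loops, and it models a missing ordering on P1 as its message load being swapped with its acknowledging flag write. The function names and the registers r0, r1 and r2 are hypothetical.

```python
# P0: write msg=1, write flag=1, read flag -> r0, write msg=2
P0 = [("W", "msg", 1), ("W", "flag", 1), ("R", "flag", "r0"), ("W", "msg", 2)]
# P1: read flag -> r1, read msg -> r2, write flag=0 (acknowledge)
P1 = [("R", "flag", "r1"), ("R", "msg", "r2"), ("W", "flag", 0)]
# P1 with its message load re-ordered after its acknowledging store
P1_SWAPPED = [P1[0], P1[2], P1[1]]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(trace):
    """Execute one interleaving against a single shared memory; return registers."""
    mem = {"msg": 0, "flag": 0}
    regs = {}
    for kind, loc, arg in trace:
        if kind == "W":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs


def handshake_results(p1):
    """Values of r2 (message read by P1) over all runs where the handshake
    completed: P1 saw the flag set (r1 == 1) and P0 saw it cleared (r0 == 0)."""
    results = set()
    for trace in interleavings(P0, p1):
        regs = run(trace)
        if regs["r1"] == 1 and regs["r0"] == 0:
            results.add(regs["r2"])
    return results


print(handshake_results(P1))
print(handshake_results(P1_SWAPPED))
```

With P1 in program order, every completed handshake in this toy model has P1 reading the first message (value 1). With the load and the acknowledging store swapped, P1 can also read the second message (value 2), which is the failure the third ordering in the chain is there to prevent.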

0 Vijay G over 1 year ago in reply to Martin Weidmann

If I understand correctly, in the extended example above, we would expect the following:

Martin Weidmann said:
P0's write to the flag must not be re-ordered relative to its first write of the message.

This would be satisfied by a DMB ISH or DMB ISHST instruction on P0 between writing the message and writing the flag.

Martin Weidmann said:
P1's read of the message must not be re-ordered relative to the read of the flag.

This could be satisfied by a DMB NSHLD instruction on P1 between loading the flag and loading the message (i.e. we would not need to use DMB ISH or DMB ISHLD to ensure P0 observed P1 loading the flag.)

Martin Weidmann said:
P1's write to the flag must not be re-ordered relative to its read of message.

This could be satisfied by a DMB NSHLD instruction on P1 between loading the message and writing the flag. Aside: If P1 were to produce some other state where observers expected to see the flag cleared before seeing state from P1, then P1 should use DMB ISHST after writing the flag, or, perhaps write the flag using STLR.

Martin Weidmann said:
P0's second write to message must not be re-ordered relative to its read of the cleared flag.

And this could be satisfied by a DMB NSHx instruction on P0 between loading the flag and writing the message (i.e. we would not need to use DMB ISHx to ensure P1 observed P0 loading the flag.) But, you would probably use DMB ISHx here just to prevent the second message write from being observed before the first message write (depending exactly how you wrote your flag polling loop on P0.)

Martin Weidmann said:
I don't know if that's what you meant by one PE observing another's reads.

Not quite, I don't think. Some more context here might help to clarify. My team would like to ensure that a given PE does not re-order its own loads relative to each other; you could say it is quite like P1 in the original mailbox example. We would like to use the least restrictive barrier possible for this, and based on the documentation and our own tests, DMB NSHLD appears to be sufficient. But a question has been raised as to what exactly the effects are of P1 loading a value that other observers can observe, and whether there are any practical cases where observers could need to see those effects (thus necessitating the use of DMB ISHLD or DMB OSHLD instead).