
[ARMV8] dmb nshld vs dmb ishld -- practical differences?

Hello arm experts,

I am trying to understand when a load from a memory location might produce side effects that other observers in the system care about. So far, all the examples of dmb memory barriers I can find in the Armv8 reference material focus on the observability of *writes*, whose importance and shareability domains are fairly self-explanatory. What I have not been able to find, is an example of when one might prefer dmb ishld over dmb nshld, for example. Whether or not the memory address is in shareable memory, or visible to coherent caches, surely a read access cannot produce observable effects that would affect the correctness of the PE executing the dmb instruction?

If this is correct, then why does Armv8 offer various domains instead of a single dmb ld with the least restrictive domain possible? And if this is not correct, what would be a practical example where the difference between dmb nshld, dmb ishld, and dmb oshld matters?

Thanks!

Parents
  • And, surprisingly got:

    It might be that P1 expects an LD barrier that corresponds to the ST barrier, in order for it to respond to the invalidation of [X3] upon seeing the LD barrier. NSHLD isn't expected to pair with ISHST. In the absence of the appropriate LD barrier, P1 can delay the invalidation, and thus read the stale value.

    NSHLD guarantees ordering of only LD-LD and LD-ST instructions occurring on the PE executing the barrier, and the scope of the ordering is limited to that PE. From the test you showed, it cannot be concluded that NSHLD failed; it seems that NSHLD did not involve itself with the invalidation of [X3], which originated outside of this PE. Tests on actual hardware can perhaps show the expected behaviour, if it occurs frequently enough.

    > My team would like to ensure that a given PE does not re-order its own loads relative to each other

    Why?

    Also, barriers are not needed if the PE itself is the only observer of the effects of its LD/ST instructions (unless there actually are multiple observers being considered for this single PE, such as in the cases of self-modifying code, cache/TLB/page-table management, etc., or there actually are multiple PEs involved, though that doesn't seem likely since you are looking at NSH scope).

    Have you looked at introducing artificial dependencies between the loads (e.g. ANDing the value returned by the first load with 0, and adding the result to the address of the second load)?
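
    A minimal sketch of such an artificial address dependency, written as C with AArch64 inline assembly. The function name load_pair_dependent and its parameters are hypothetical, chosen only for illustration. The AND uses XZR because 0 cannot be encoded as an immediate operand of AND. On other architectures the sketch falls back to two plain loads, which gives no ordering guarantee and exists only so that the code compiles.

```c
#include <stdint.h>

/* Hypothetical helper: read *first, then *second, with the second load's
 * address made dependent on the value returned by the first load. */
uint64_t load_pair_dependent(const uint64_t *first,
                             const uint64_t *second,
                             uint64_t *first_out)
{
    uint64_t a, b;
#if defined(__aarch64__)
    uint64_t zero;
    __asm__ volatile(
        "ldr  %0, [%3]\n\t"      /* a = *first                          */
        "and  %2, %0, xzr\n\t"   /* zero = a & 0, still depends on a    */
        "ldr  %1, [%4, %2]\n\t"  /* b = *(second + zero)                */
        : "=&r"(a), "=&r"(b), "=&r"(zero)
        : "r"(first), "r"(second)
        : "memory");
#else
    /* Fallback for non-AArch64 builds: plain loads, no ordering implied. */
    a = *first;
    b = *second;
#endif
    *first_out = a;
    return b;
}
```

    The dependency has to be built in assembly: written in plain C, the compiler would fold "value & 0" to a constant and the dependency would disappear.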

    > What I have not been able to find, is an example of when one might prefer dmb ishld over dmb nshld, for example

    The LD barriers are usually paired with ST barriers. The ST barrier decides the scope, and the observers in that scope that wish to read from the affected stores in an orderly fashion must employ a suitable LD barrier.
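
    The pairing described above is the usual message-passing pattern. Below is a rough sketch of it using portable C11 fences rather than explicit dmb instructions; the names payload, ready, publish and try_consume are hypothetical. It assumes the writer and the reader run on different PEs in the same Inner Shareable domain.

```c
#include <stdatomic.h>
#include <stdint.h>

static uint64_t payload;   /* ordinary data written by the producer */
static atomic_int ready;   /* flag, zero-initialised: 0 = no data yet */

/* Writer side: make the payload visible before the flag. */
void publish(uint64_t value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);   /* store-side barrier */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: returns 1 and fills *out if data was published, else 0. */
int try_consume(uint64_t *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);   /* load-side barrier */
    *out = payload;
    return 1;
}
```

    As far as I know, GCC and Clang targeting AArch64 typically emit dmb ishld for the acquire fence and dmb ish for the release fence, which is stronger than the dmb ishst discussed here. C11 offers no way to request the NSH or OSH domain, so choosing one of those would need an explicit dmb in inline assembly.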
