In the ARM Cortex-A Series Programmer’s Guide for ARMv8-A, section 13.2.4, "Non-temporal load and store pair", it talks about a relaxation of the memory ordering requirements and then gives the example
LDR X0, [X3]
DMB NSHLD
LDNP X2, X1, [X0]
saying the memory barrier is needed, otherwise it might read from an unpredictable address. I don't follow this at all; it just seems wrong to me.
I think you have complicated it unnecessarily by introducing the need for two processes. It is more fundamental than that.
The sequence right at the start of this thread shows that this is fundamental behaviour at the level of instruction execution on a single processor. In this sequence...
...what the architecture tells you is that the second instruction may complete (i.e. access the buffer at [x0]) before the first instruction completes (setting the value of x0 by reading from [x3]). If that matters to you, then you either need to insert a barrier (as in the example) or use a standard LDP instruction, rather than LDNP.
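As a rough sketch of those two options, assuming the behaviour described above: the first sequence is the one from the Programmer's Guide, and the second simply substitutes the standard LDP for LDNP. The second sequence is my own illustration and is not taken from the guide.

```
// Option 1: keep the non-temporal load and add a barrier so that the
// load into X0 completes before LDNP accesses the buffer at [X0]
LDR  X0, [X3]
DMB  NSHLD
LDNP X2, X1, [X0]

// Option 2 (illustrative): use the standard load pair instead, with no barrier
LDR  X0, [X3]
LDP  X2, X1, [X0]
```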
I hope that makes it clear!
Chris
The wording in the document makes it seem like the wrong address may be used for LDNP - not the one loaded from [X3]. I can't believe that is so. I really do think it is talking about address dependencies not being observed. With that, as you say, it could use data from [X0] from before the load of X0 is done.
If the problem is at the level of a single process then an example like
store 1 in buffer
store 2 in buffer
use LDNP to load from the buffer - it may get 1
would do the trick. Sounds ghastly but if true that would get the message across.
In fact just looking again at that document in 13.1.1 it talks about 'address dependencies' but uses the term to refer to a store-load dependency.
Looking at the ARM site I see that "Barrier Litmus Tests and Cookbook" is superseded, but I don't know by what, so I don't know what the status of address dependency as a method to implement barriers is. I wouldn't mourn its loss, but it looked like it was there for some good reason. In the ARMv8 ARM it talks about address dependency in the same way as this document - as a dependency between two reads or a read and a write to the same location - but the structure of the example here is as in the Litmus test document.
"The wording in the document makes it seem like the wrong address may be used for LDNP - not the one loaded from [X3]. I can't believe that is so."
It is indeed so! That is exactly what it is saying. And, in some circumstances, that is the behaviour which the programmer wants. Clearly, when using these instructions, you must be careful not to use them in ways which give undesirable behaviour.
"I really do think it is talking about address dependencies not being observed. With that as you say it could use data from [X0] from before when the load of X0 is done."
Yes, that's exactly what it is saying. Strange though it may seem!
PS - the Barrier Litmus Test document has been superseded since all of its content is now included in the ARMv8 ARM. It may not be expressed in exactly the same wording, but all the content has been included.
Looking at the ARMv8 ARM I see that it does describe address dependency in the way I mean in section B2.7.2, and it is consistent with the way the term is used in the Litmus test, so yes, the Litmus test has been incorporated, thanks.
So you are basically saying that an LDNP instruction does not even follow the basic register data dependency as described in 'Address dependencies and order' in that section? I am afraid I think you have somehow got the wrong end of the stick somewhere, as I think this type of behavior is completely broken. The two things you said yes to above are different: address dependencies are not the same as register data dependencies. The first was a register data dependency and the second was an address dependency.