How does atomicity work with the memory accesses?
Are you asking about the atomic memory access instructions added in ARMv8.1-A?
yes
That sounds like two or three questions to me.
How does it work architecturally: what does a conforming implementation have to ensure?
How would it be implemented in a system that wanted to exploit it well?
What use are the instructions, compared to what is already there, in making a system work better?
My understanding is:
Architecturally they should act as if they were implemented using a load/store exclusive loop, with a few exceptions: for CAS a monitor might not be cleared if the value isn't changed; the performance counters might differ from what a loop would give; and the store forms don't tell you what was there originally and need not be counted as doing a load for memory barrier purposes.
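To make the "acts as if it were a load/store exclusive loop" point concrete, here is a minimal sketch in portable C11 rather than assembly. The function names are made up for illustration. The mapping to instructions is an assumption about typical toolchains: when targeting ARMv8.1-A a compiler may lower the single fetch-add to one LDADD-family instruction, while for the base architecture it may emit an LDXR/STXR-style retry loop with the same shape as the explicit loop here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One atomic read-modify-write; returns the value that was there before.
 * (Assumption: a v8.1 target may turn this into a single atomic instruction.) */
uint32_t add_single(_Atomic uint32_t *p, uint32_t v)
{
    return atomic_fetch_add_explicit(p, v, memory_order_acq_rel);
}

/* Same observable result written as an explicit retry loop, mirroring the
 * structure of a load-exclusive / store-exclusive sequence. */
uint32_t add_retry_loop(_Atomic uint32_t *p, uint32_t v)
{
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + v,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* On failure 'old' has been refreshed with the current value; retry. */
    }
    return old;
}

/* "Store" flavour: the caller never sees the old value, which is the
 * situation the store forms of the atomic instructions cover. */
void add_no_result(_Atomic uint32_t *p, uint32_t v)
{
    (void)atomic_fetch_add_explicit(p, v, memory_order_relaxed);
}
```

Both add_single and add_retry_loop return the previous value and leave the same result in memory; the difference is only in how many steps (and how much retrying under contention) it takes to get there.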
In practice they can be performed by exporting the operation to a point further out where the memory can be held exclusively and the operation done atomically there, with the line cleared from any caches in between. I guess one could even send it off to a cache associated with another PE that held the data exclusively.
In a system that wanted to exploit it well, I guess there are problems with cache line boundaries that one would have to be careful about when a structure contains both items one might want to apply atomic operations to and other non-atomic data. But overall, even ignoring that, having atomic operations cuts down on the conflicts and data movement that using the load/store exclusive operations involves. This is especially important in large systems with lots of processors. It also helps avoid problems with debugging; nasty things have to be done in debuggers to cope with exclusive monitor loops! :) Basically they're cleaner and faster.
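The cache line point can be sketched in C11 as follows. This is only an illustration: the struct, its field names and the 64-byte line size are assumptions (the real line size is implementation defined), and whether the separation pays off depends on the system.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; check the actual value for the target system. */
#define CACHE_LINE 64

/* Hypothetical structure mixing a hot atomic counter with ordinary data.
 * Aligning each part to its own line keeps atomic updates to 'hits' from
 * pulling the line holding 'config' away from other processors. */
struct stats {
    alignas(CACHE_LINE) _Atomic uint64_t hits;  /* updated atomically by many PEs */
    alignas(CACHE_LINE) uint64_t config[4];     /* plain, mostly-read data */
};

/* Bump the counter; the old value is not needed, so relaxed ordering and a
 * discarded result are enough here. */
void record_hit(struct stats *s)
{
    (void)atomic_fetch_add_explicit(&s->hits, 1, memory_order_relaxed);
}
```

The cost is some wasted space in the padded structure, which is usually an acceptable trade when the counter is heavily contended.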
Can you give an example, or perhaps assembly code?
I'm thinking I may have misunderstood the question as asking why the atomic instructions in ARMv8.1 are wanted when their work can be done using the acquire/release and exclusive load/store instructions in the base architecture. If you are asking instead why it is extremely desirable to be able to support atomic operations irrespective of the architecture, then here's an introduction, and Wikipedia has lots of more detailed entries if one searches on the various terms:
Atomic Operations in Hardware