Why do we need atomicity in ARM Architecture?

How does atomicity work with the memory accesses?

  • That sounds like two or three questions to me.

    How does it work architecturally: what does a conforming implementation have to ensure?

    How would it be implemented in a system that wanted to exploit it well?

    What use are the atomic instructions, compared to what is already there, in making a system work better?

    My understanding is:

    Architecturally, the atomic instructions should act as if they were implemented using a load/store-exclusive loop, with a few exceptions: for CAS, a monitor might not be cleared if the value isn't changed; the performance counters might differ from what you would expect; and the store-only forms don't tell you what was there originally and need not be counted as doing a load for memory barrier purposes.
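To make the "act as if" relationship concrete, here is a minimal C11 sketch of my own (the function names are hypothetical, not from any Arm document). The first function spells out the retry loop by hand; without the atomic instructions a compiler typically lowers this kind of code to a load/store-exclusive loop. The second asks for the same read-modify-write as a single operation, which a compiler targeting Armv8.1-A or later may turn into one atomic instruction. The exact code generated depends on the compiler and flags, so treat the comments about instruction selection as an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Retry-loop form: read, compute, try to publish, and repeat if another
   observer changed the value in between. */
static uint32_t fetch_add_retry_loop(_Atomic uint32_t *p, uint32_t n)
{
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + n,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* On failure 'old' now holds the current value; just retry. */
    }
    return old; /* value that was in memory before the add */
}

/* Single-operation form: same observable result, no visible loop. */
static uint32_t fetch_add_single(_Atomic uint32_t *p, uint32_t n)
{
    return atomic_fetch_add_explicit(p, n, memory_order_acq_rel);
}

/* Small self-check: both forms return the old value and apply the add. */
static void fetch_add_self_check(void)
{
    _Atomic uint32_t counter = 10;
    assert(fetch_add_retry_loop(&counter, 1) == 10);
    assert(fetch_add_single(&counter, 1) == 11);
    assert(atomic_load(&counter) == 12);
}
```

Both functions give the caller the same answer; the difference is in how much traffic and retrying the hardware has to do under contention.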

    In practice they can be performed by exporting the operation to a point further out in the memory system where the memory can be held exclusively and the operation done atomically there, with any copies in caches in between being cleared. I guess one could even send it off to a cache associated with another PE that held the data exclusively.

    In a system that wanted to exploit it well, I guess one would have to be careful about cache line boundaries when a structure contains both items one might want to apply atomic operations to and other non-atomic data. Even ignoring that, having atomic operations cuts down on the conflicts and data movement that the load/store-exclusive operations involve. This is especially important in large systems with lots of processors. It also helps avoid problems with debugging: nasty things have to be done in debuggers to cope with exclusive monitor loops! :) Basically they're cleaner and faster.
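The cache line point can be sketched in C11 as well. This is only an illustration under assumptions: the struct and field names are made up, and the 64-byte line size is a guess that should be checked against the actual core. The idea is to give the atomically updated field a cache line of its own so that traffic on it doesn't drag the neighbouring non-atomic data around.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Atomic counter and ordinary data packed together: they will usually
   end up in the same cache line. */
struct stats_shared_line {
    _Atomic uint64_t hits; /* updated atomically by many PEs */
    uint64_t config;       /* read-mostly, non-atomic */
};

/* Same fields, each forced onto its own cache line. */
struct stats_padded {
    alignas(CACHE_LINE) _Atomic uint64_t hits;
    alignas(CACHE_LINE) uint64_t config;
};

/* Compile-time check that the two fields cannot share a line. */
static_assert(offsetof(struct stats_padded, config)
                  - offsetof(struct stats_padded, hits) >= CACHE_LINE,
              "hits and config must sit in different cache lines");
```

The cost is memory: the padded struct takes two full lines instead of 16 bytes, so this is worth doing only for data that really is contended.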
