Hi All,
I am a bit confused about exclusive access and locked access.
Could you give me an example of a scenario in which a master would use exclusive access, and one in which it would use locked access?
Thanks a lot!
"Atomic" accesses such as exclusive and locked accesses are used whenever a transaction source needs to know that it can complete a series of transfers to a target without any other transaction source being able to access that target address range.
So examples of this could be any reconfiguration exercise where you don't want another transaction source being able to use this target while it is only partially configured (or reconfigured), so maybe where you are setting up a lookup table and don't want it used before you have finished setting it up. Or where you are passing semaphores between processes, so need to know that the semaphore value you read, modify and then write, hasn't been written to between the read and write accesses.
Locked accesses are simpler to implement: they just stop any other transaction source from accessing the target while it is locked. But this has a latency impact if the "locked" target is a large device, because other sources may want access to areas that the "locked" sequence is not updating.
Exclusive accesses improve access latency for other sources by allowing all accesses to the target and monitoring only the address range being exclusively accessed. The exclusive sequence is marked as successful if no other source wrote to any of the monitored locations during the sequence, or as failed (and needing to be repeated) if any of the monitored locations were updated. The cost is complexity: you need monitoring logic that tracks accesses to the "exclusive" address range to know whether the exclusive sequence succeeded.
Hi Colin,
Thanks for the reply. I still have some questions.
Q1: Why does a master need to do an exclusive read before an exclusive write?
Q2: Suppose there are 2 masters (M0, M1), 2 slaves (S0, S1) and 1 interconnect. If M0 writes data to S0 (address A), the BRESP that M0 receives is sent by the interconnect instead of S0, so receiving BRESP does not mean that the data has been written to S0. If M0 wants to be sure the data has actually been written to S0 (address A) before M1 or M0 reads or writes that address, how does M0 do that using atomic accesses?
Thank you
tom said: Q1: Why does a master need to do an exclusive read before an exclusive write?
An exclusive write is saying "this write should only succeed if the location has NOT been written to since I did an exclusive read from it". That lets software implement things like semaphores and mutexes.
You can think of the exclusive read as tagging the address. Subsequent writes to the location clear that tag. The exclusive write happens only if the tag is still there, as that shows no one wrote to the location since you read it. If the tag isn't there, someone else wrote to the location and the exclusive store fails.
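The tagging idea can be sketched as a small Python model. This is only an illustration of the description above, not any real monitor implementation: the `ExclusiveMonitor` class, its method names, and the addresses are all hypothetical, and it assumes one tag per (master, address) pair.

```python
class ExclusiveMonitor:
    """Toy model (hypothetical): one tag per (master, address) pair."""

    def __init__(self):
        self.memory = {}
        self.tags = set()  # (master, address) pairs currently tagged

    def exclusive_read(self, master, address):
        # The exclusive read tags the address for this master.
        self.tags.add((master, address))
        return self.memory.get(address, 0)

    def write(self, master, address, value):
        # An ordinary write always lands and clears every tag on the address.
        self.memory[address] = value
        self._clear_tags(address)

    def exclusive_write(self, master, address, value):
        # The exclusive write lands only if this master's tag is still there.
        if (master, address) not in self.tags:
            return False
        self.memory[address] = value
        self._clear_tags(address)
        return True

    def _clear_tags(self, address):
        self.tags = {(m, a) for (m, a) in self.tags if a != address}


monitor = ExclusiveMonitor()
monitor.write("M1", 0x100, 5)

seen = monitor.exclusive_read("M0", 0x100)      # M0 tags 0x100
monitor.write("M1", 0x100, 9)                   # M1's write clears the tag
first_try = monitor.exclusive_write("M0", 0x100, seen + 1)   # fails

seen = monitor.exclusive_read("M0", 0x100)      # M0 starts over
second_try = monitor.exclusive_write("M0", 0x100, seen + 1)  # succeeds

no_read = monitor.exclusive_write("M1", 0x200, 1)  # no prior exclusive read
```

In this model the first attempt fails because M1 wrote in between, the retry succeeds, and an exclusive write with no preceding exclusive read always fails.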
So to answer your question: If you don't do the exclusive read the exclusive write is almost certain to fail.
Martin has hopefully answered your first question, although a simpler way of answering might simply be to state that exclusives are there to support read-modify-write sequences, so a read is always the first step in the sequence.
For your second question, the exclusive access monitor is not part of the interconnect. It is usually part of the destination target device, so any write response the monitor returns will need to factor in both whether the exclusive nature of the transfer completed successfully or not, and also the OKAY/ERROR nature of the transfer.
So if the exclusive write was a "non-bufferable" write, the response returned on BRESP will indicate if the transfer completed exclusively, and also if the transfer was stored in the "S0" destination.
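As a rough illustration of reading that response, the sketch below decodes the 2-bit AXI response value. The encodings are the standard AXI ones; the `Resp` enum and the `classify_exclusive_write` helper are hypothetical names used only for this example.

```python
from enum import IntEnum


class Resp(IntEnum):
    """AXI BRESP/RRESP encodings."""
    OKAY = 0b00
    EXOKAY = 0b01
    SLVERR = 0b10
    DECERR = 0b11


def classify_exclusive_write(bresp):
    """Interpret BRESP for a write that was issued as exclusive (hypothetical helper)."""
    resp = Resp(bresp)
    if resp is Resp.EXOKAY:
        return "exclusive write succeeded"
    if resp is Resp.OKAY:
        # The exclusive part did not succeed (or the target has no monitor).
        return "exclusive write failed, repeat the read-modify-write"
    return "transfer error"
```

A master that sees anything other than EXOKAY would normally restart the sequence from the exclusive read, or handle the error.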
I said above that the monitor is "usually" part of the destination device. This can be fully integrated with it, or perhaps a standalone monitor attached to the destination device. But it could also be further away from the final destination device, as long as it is able to see all transfers that can access the final destination device.
Hi Martin, thanks for the reply. You said the exclusive write happens only if the tag is still there. If the master has already tagged the address, why would the tag be missing? Can other masters delete the tag?
Hi Colin, thanks for the reply.
If a master uses non-bufferable writes, is performance usually degraded (because BRESP must be obtained from the final destination)?
The exclusive access monitor watches for successful writes, or possibly other exclusive reads, from other masters in the system to the monitored address.
Let's say you have two CPUs trying to down a counting semaphore. Both need to atomically test whether the semaphore is greater than zero, and if it is, decrement its value in memory.
Both CPUs will try to do something along the lines of LDREX, SUBS, BMI, STREX, CMPS, BNE in a loop. SUBS will detect 0 by the result of the subtract going negative, and set N=1 if it does. BMI is to branch to a "go to sleep, semaphore is busy" path. The CMPS, BNE will loop if the STREX failed. CMPS, BNE may be replaced with CBZ/CBNZ if available.
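The shape of that loop can be sketched in Python against a toy memory. Everything here is an assumption made for illustration: `ToyMemory`, its `interference` test hook (which pretends another CPU writes right after our load), and `semaphore_down` are hypothetical names, and the status convention (0 = stored, 1 = failed) mirrors STREX.

```python
class ToyMemory:
    """Single semaphore word with a one-bit exclusive tag (hypothetical model)."""

    def __init__(self, value):
        self.value = value
        self.tagged = False
        self.interference = []  # values "another CPU" writes after our next load

    def load_exclusive(self):
        self.tagged = True
        value = self.value
        if self.interference:
            # Simulate another CPU's write landing between our load and store.
            self.value = self.interference.pop(0)
            self.tagged = False
        return value

    def store_exclusive(self, value):
        if not self.tagged:
            return 1  # failed, as STREX would report
        self.value = value
        self.tagged = False
        return 0  # stored


def semaphore_down(mem):
    """Return (acquired, attempts) for one 'down' operation."""
    attempts = 0
    while True:
        attempts += 1
        count = mem.load_exclusive()         # LDREX
        count -= 1                           # SUBS
        if count < 0:                        # BMI: semaphore is busy
            return False, attempts
        if mem.store_exclusive(count) == 0:  # STREX, then CMP/BNE
            return True, attempts


free = ToyMemory(2)
free_result = semaphore_down(free)

contended = ToyMemory(2)
contended.interference = [1]    # another CPU takes one unit mid-sequence
contended_result = semaphore_down(contended)

busy = ToyMemory(0)
busy_result = semaphore_down(busy)
```

The contended case needs two trips round the loop: the first store is refused because the word changed after the load, and the second attempt works from the fresh value.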
Imagine the following interleaved accesses to the semaphore by the two CPUs:
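One interleaving consistent with this description can be sketched as a small Python model. It assumes a single monitor per address that the most recent exclusive read takes over; the `SnatchingMonitor` class and its `ldrex`/`strex` method names are hypothetical, and the semaphore is assumed to start at 1.

```python
class SnatchingMonitor:
    """Hypothetical policy: the latest exclusive read takes the monitor."""

    def __init__(self, value):
        self.value = value
        self.owner = None

    def ldrex(self, cpu):
        self.owner = cpu  # a later LDREX snatches the monitor
        return self.value

    def strex(self, cpu, value):
        if self.owner != cpu:
            return 1      # denied; a denied store does not redirect the monitor
        self.value = value
        self.owner = None
        return 0


sem = SnatchingMonitor(1)
a_seen = sem.ldrex("A")                 # CPU A: LDREX, monitor -> A
b_seen = sem.ldrex("B")                 # CPU B: LDREX, monitor -> B
a_status = sem.strex("A", a_seen - 1)   # CPU A: STREX denied
b_status = sem.strex("B", b_seen - 1)   # CPU B: STREX succeeds
a_retry_seen = sem.ldrex("A")           # CPU A loops and now reads 0
```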
In this scenario, CPU A was blocked from updating the semaphore, because B raced with it and snatched the exclusive monitor away from it. Now, notice I said "successful writes." Because the STREX was denied, there's no reason for it to redirect the monitor.
The monitor can implement a different policy. If a second CPU tries to obtain the monitor for the same address before the first CPU has made its write, the monitor can decide not to switch, like so.
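A sketch of that alternative policy, under the same simplifying assumptions (one monitor per address, semaphore starting at 1, hypothetical `StickyMonitor` class and method names):

```python
class StickyMonitor:
    """Hypothetical policy: the first exclusive reader keeps the monitor."""

    def __init__(self, value):
        self.value = value
        self.owner = None

    def ldrex(self, cpu):
        if self.owner is None:
            self.owner = cpu  # only claim the monitor if nobody holds it
        return self.value

    def strex(self, cpu, value):
        if self.owner != cpu:
            return 1          # denied
        self.value = value
        self.owner = None     # release the monitor after a successful store
        return 0


sem = StickyMonitor(1)
a_seen = sem.ldrex("A")                 # CPU A: LDREX, monitor -> A
b_seen = sem.ldrex("B")                 # CPU B: LDREX, monitor stays with A
a_status = sem.strex("A", a_seen - 1)   # CPU A: STREX succeeds
b_status = sem.strex("B", b_seen - 1)   # CPU B: STREX denied, must loop
```

Here the outcome is reversed: A completes its update and B has to retry.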
The second strategy has some advantages if you can guarantee that LDREX/STREX pairs arrive close to each other, or if you place a timeout on the monitor. Without that, an interrupt or a long stall between LDREX and STREX could cause undue fairness issues in the system. Note: CLREX can be used to clear active monitors in an interrupt handler.
For device and non-cacheable memory, exclusive access monitors are typically implemented in the endpoint device, or at an upstream point in the interconnect that all accesses to that endpoint must come through. Home Nodes, for example, in CHI interconnects, could implement a monitor for downstream memory or devices. Or, you could imagine a proxy bridge in AXI interconnects that could work with arbitrary IP behind it.
For normal memory in a coherent interconnect, the L1 cache controller can implement the exclusive monitor for normal, cacheable, shared address ranges, by pulling the line into the Exclusive state. Coherence snoops due to other CPUs' accesses would cancel the monitor. That strategy looks like the first scenario above. I believe this is what the Cortex-A53 does with its internal exclusive monitor. (See the TRM.)
Note: My description above is meant to be generic for this class of primitive, and not specific to a particular ARM implementation, except where noted. Other platforms call this primitive "Load Link, Store Conditional." That may help you find other resources on how this form of optimistic synchronization primitive works.
Yes, that's definitely true.
But in your question when you were assuming the monitor was in the interconnect, you were concerned that the access hadn't actually reached target S0. If you use a bufferable write, the response for that write transaction can be returned by any intermediate component (such as a write buffer) according to the AXI protocol, so it might not even reach an exclusive access monitor before a non-exclusive response is returned.
The AXI protocol states that you must use AxCACHE encodings that ensure the target monitoring the transactions will actually see the transactions. The obvious concern here would be if the transaction was marked as cacheable, and so was stored in a cache and not accessing the target, so not monitored. But bufferable is also then a concern, and a bufferable transaction could be held in a buffer before the exclusive monitor logic, and so the exclusivity of the access is not tested when a write response is returned to the transaction source.
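A conservative check along these lines might look like the sketch below. Bit 0 of AxCACHE is the Bufferable bit and bits [3:2] are the allocate hints; the `exclusive_axcache_concerns` helper and the wording of its messages are hypothetical, and it simply flags the two concerns raised in this post.

```python
BUFFERABLE = 0b0001     # AxCACHE[0]
ALLOCATE_BITS = 0b1100  # AxCACHE[3:2], cache allocate hints


def exclusive_axcache_concerns(axcache):
    """List reasons an exclusive access might not be seen by the target's monitor."""
    if not 0 <= axcache <= 0b1111:
        raise ValueError("AxCACHE is a 4-bit field")
    concerns = []
    if axcache & ALLOCATE_BITS:
        concerns.append("may be held in a cache")
    if axcache & BUFFERABLE:
        concerns.append("response may come from an intermediate buffer")
    return concerns


device_non_bufferable = exclusive_axcache_concerns(0b0000)
device_bufferable = exclusive_axcache_concerns(0b0001)
write_back_allocate = exclusive_axcache_concerns(0b1111)
```

An empty list means neither concern applies to that encoding; it is not a statement that any particular system routes the access to the monitor.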
So yes, non-bufferable transfers might not return as quick a BRESP as a bufferable one, but we want to ensure that it is the final destination S0 target that returns the response, so a small added latency is the penalty you might have to accept. Hopefully exclusive accesses are not a significant number of your transactions, so it shouldn't be a significant performance degradation.
Hi Jzbiciak,
Thank you for your comment. For the example of interleaved accesses to the semaphore by the two CPUs, is there a specification or tutorial?
I don't know of a specific tutorial that walks through the various cases with examples. There are some videos online regarding Load link/Store conditional (LL/SC), which is the name for what LDREX/STREX (or LDXR/STXR if you speak ARMv8) provide. Those videos talk in general principles. There's some architecture course material online as well.
If you want more concrete implementation details, you can look at real-world implementations and how they map onto architecture specifications. For example: