Hi All,
I am a bit confused about exclusive access and locked access.
Could you give me an example of a scenario in which a master would use exclusive access, and one in which it would use locked access?
Thanks a lot!
Hi Martin, thanks for the reply. You said the exclusive write happens only if the tag is still there. If the master has already tagged the address, why would the tag be missing? Can other masters delete the tag?
Hi Colin, thanks for the reply.
If a master uses non-bufferable writes, is performance usually degraded (because BRESP must be obtained from the final destination)?
The exclusive access monitor watches for successful writes, or possibly other exclusive reads, from other masters in the system to the monitored address.
Let's say you have two CPUs trying to down a counting semaphore. Both need to atomically test whether the semaphore is greater than zero, and if it is, decrement its value in memory.
Both CPUs will try to do something along the lines of LDREX, SUBS, BMI, STREX, CMPS, BNE in a loop. SUBS detects a semaphore that is already 0, because the result of the subtract goes negative and sets N=1. BMI then branches to a "go to sleep, semaphore is busy" path. The CMPS, BNE pair loops back if the STREX failed. CMPS, BNE may be replaced with CBZ/CBNZ if available.
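As a rough illustration of that loop, here is a minimal sketch in C11 atomics rather than assembly. The function name sem_try_down is hypothetical, not from any ARM library. On ARM cores without single-instruction atomics, compilers typically lower the weak compare-exchange to an LDREX/STREX (or LDXR/STXR) retry loop, so the comments map each step back to the instruction sequence described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: try to "down" a counting semaphore.
 * Returns true if the count was decremented, false if it was already 0
 * (the "go to sleep, semaphore is busy" path). */
static bool sem_try_down(atomic_int *count)
{
    assert(count != NULL);

    /* Roughly LDREX: read the current count. */
    int old = atomic_load_explicit(count, memory_order_relaxed);

    for (;;) {
        /* Roughly SUBS + BMI: a negative result means the semaphore is busy. */
        if (old - 1 < 0)
            return false;

        /* Roughly STREX + CMPS/BNE: the store only lands if nobody else
         * changed the count in between. On failure, 'old' is refreshed
         * with the current value and the loop retries. */
        if (atomic_compare_exchange_weak_explicit(count, &old, old - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
}
```

With a count initialised to 1, the first call succeeds and leaves the count at 0, and a second call takes the "busy" path and returns false without writing.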
Imagine the following interleaved accesses to the semaphore by the two CPUs:
In this scenario, CPU A was blocked from updating the semaphore, because B raced with it and snatched the exclusive monitor away from it. Now, notice I said "successful writes." Because the STREX was denied, there's no reason for it to redirect the monitor.
The monitor can implement a different policy. If a second CPU tries to obtain the monitor for the same address before the first CPU has made its write, the monitor can decide not to switch, like so.
The second strategy has some advantages if you can guarantee that LDREX/STREX pairs arrive close to each other, or if you place a timeout on the monitor. Without such a guarantee, an interrupt or a long stall between LDREX and STREX could cause undue fairness issues in the system. Note: CLREX can be used to clear active monitors in an interrupt handler.
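The two monitor policies described above can be sketched as a toy software model. This is only an illustration under stated assumptions: a single monitor guarding a single word, with hypothetical names (monitor_t, load_exclusive, store_exclusive, clear_exclusive) that do not correspond to any real hardware interface. The store function follows the STREX status convention of returning 0 on success and 1 on failure.

```c
#include <assert.h>
#include <stddef.h>

enum { NO_OWNER = -1 };

/* The two policies discussed above (names are illustrative). */
typedef enum {
    POLICY_LAST_LOAD_WINS,   /* a later exclusive load takes the monitor */
    POLICY_FIRST_LOAD_HOLDS  /* the first exclusive load keeps it until it writes */
} policy_t;

/* Toy model: one monitor guarding one memory word. */
typedef struct {
    policy_t policy;
    int owner;   /* CPU id holding the reservation, or NO_OWNER */
    int value;   /* the monitored word, e.g. the semaphore count */
} monitor_t;

static void monitor_init(monitor_t *m, policy_t policy, int initial)
{
    assert(m != NULL);
    m->policy = policy;
    m->owner = NO_OWNER;
    m->value = initial;
}

/* Models LDREX: returns the value and may claim the monitor. */
static int load_exclusive(monitor_t *m, int cpu)
{
    if (m->policy == POLICY_LAST_LOAD_WINS || m->owner == NO_OWNER)
        m->owner = cpu;
    return m->value;
}

/* Models STREX: 0 = success, 1 = failure.
 * A denied store changes neither memory nor the monitor. */
static int store_exclusive(monitor_t *m, int cpu, int new_value)
{
    if (m->owner != cpu)
        return 1;
    m->value = new_value;
    m->owner = NO_OWNER;   /* a successful write clears the reservation */
    return 0;
}

/* Models CLREX: drop this CPU's reservation, if it holds one. */
static void clear_exclusive(monitor_t *m, int cpu)
{
    if (m->owner == cpu)
        m->owner = NO_OWNER;
}
```

Running the sequence "A loads, B loads, A stores, B stores" through this model gives the two outcomes described: under POLICY_LAST_LOAD_WINS, A's store is denied and B's succeeds; under POLICY_FIRST_LOAD_HOLDS, A's store succeeds and B's is denied, so B has to retry from its load.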
For device and non-cacheable memory, exclusive access monitors are typically implemented in the endpoint device, or at an upstream point in the interconnect that all accesses to that endpoint must come through. Home Nodes, for example, in CHI interconnects, could implement a monitor for downstream memory or devices. Or, you could imagine a proxy bridge in AXI interconnects that could work with arbitrary IP behind it.
For normal memory in a coherent interconnect, the L1 cache controller can implement the exclusive monitor for normal, cacheable, shared address ranges, by pulling the line into the Exclusive state. Coherence snoops due to other CPUs' accesses would cancel the monitor. That strategy looks like the first scenario above. I believe this is what the Cortex-A53 does with its internal exclusive monitor. (See the TRM.)
Note: My description above is meant to be generic for this class of primitive, and not specific to a particular ARM implementation, except where noted. Other platforms call this primitive "Load Link, Store Conditional." That may help you find other resources on how this form of optimistic synchronization primitive works.
Yes, that's definitely true.
But in your question when you were assuming the monitor was in the interconnect, you were concerned that the access hadn't actually reached target S0. If you use a bufferable write, the response for that write transaction can be returned by any intermediate component (such as a write buffer) according to the AXI protocol, so it might not even reach an exclusive access monitor before a non-exclusive response is returned.
The AXI protocol states that you must use AxCACHE encodings that ensure the target monitoring the transactions will actually see them. The obvious concern is a transaction marked as cacheable, which could be satisfied by a cache and never reach the target, so it would not be monitored. But bufferable is also a concern: a bufferable transaction could be held in a buffer ahead of the exclusive monitor logic, so the exclusivity of the access has not been tested when a write response is returned to the transaction source.
So yes, non-bufferable transfers might not return as quick a BRESP as a bufferable one, but we want to ensure that it is the final destination S0 target that returns the response, so a small added latency is the penalty you might have to accept. Hopefully exclusive accesses are not a significant number of your transactions, so it shouldn't be a significant performance degradation.
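To make the signal encodings behind this discussion concrete, here is a small sketch in C. The AxLOCK and response values are the standard AXI encodings; the two helper functions are hypothetical. In particular, axcache_reaches_monitor is a deliberately conservative rule that follows the reasoning above (neither bufferable nor cacheable/modifiable) and is an assumption, not a rule quoted from the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AXI3 AxLOCK[1:0] encodings. AXI4 removes locked access and narrows
 * AxLOCK to one bit (0 = normal, 1 = exclusive). */
enum axlock {
    AXLOCK_NORMAL    = 0x0,
    AXLOCK_EXCLUSIVE = 0x1,
    AXLOCK_LOCKED    = 0x2   /* AXI3 only */
};

/* RRESP/BRESP encodings. */
enum axresp {
    RESP_OKAY   = 0x0,
    RESP_EXOKAY = 0x1,
    RESP_SLVERR = 0x2,
    RESP_DECERR = 0x3
};

#define AXCACHE_BUFFERABLE (1u << 0)
#define AXCACHE_MODIFIABLE (1u << 1)   /* called "Cacheable" in AXI3 */

/* Hypothetical, conservative check: treat an exclusive access as certain
 * to reach a monitor at the target only if no intermediate buffer or
 * cache is allowed to answer on the target's behalf. */
static bool axcache_reaches_monitor(uint8_t axcache)
{
    assert((axcache & 0xF0u) == 0);   /* AxCACHE is a 4-bit field */
    return (axcache & (AXCACHE_BUFFERABLE | AXCACHE_MODIFIABLE)) == 0;
}

/* An exclusive write updated memory only if the response is EXOKAY.
 * A plain OKAY means the exclusive write failed, or that the target
 * does not support exclusive access. */
static bool exclusive_write_succeeded(enum axresp bresp)
{
    return bresp == RESP_EXOKAY;
}
```

Under this rule, AxCACHE = 0b0000 passes, while 0b0001 (bufferable) and 0b0011 are rejected, which matches the concern that an early response from a write buffer tells the master nothing about exclusivity.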
Hi Jzbiciak,
Thank you for your comment. For the example of interleaved accesses to the semaphore by the two CPUs, is there a specification or tutorial?
Thank you
I don't know of a specific tutorial that walks through the various cases with examples. There are some videos online regarding Load link/Store conditional (LL/SC), which is the name for what LDREX/STREX (or LDXR/STXR if you speak ARMv8) provide. Those videos talk in general principles. There's some architecture course material online as well.
If you want more concrete implementation details, you can look at real-world implementations and how they map onto architecture specifications. For example: