Hi All,
I am a bit confused about exclusive access and locked access.
Could you give me an example of a scenario in which a master would use exclusive access, and one in which it would use locked access?
Thanks a lot!
The exclusive access monitor watches for successful writes, or possibly other exclusive reads, from other masters in the system to the monitored address.
Let's say you have two CPUs trying to down a counting semaphore. Both need to atomically test whether the semaphore is greater than zero, and if it is, decrement its value in memory.
Both CPUs will try to do something along the lines of LDREX, SUBS, BMI, STREX, CMPS, BNE in a loop. SUBS will detect 0 by the result of the subtract going negative, and set N=1 if it does. BMI is to branch to a "go to sleep, semaphore is busy" path. The CMPS, BNE will loop if the STREX failed. CMPS, BNE may be replaced with CBZ/CBNZ if available.
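To make the control flow of that loop concrete, here is a rough sketch in Python rather than assembly. The class `ToyExclusiveWord` and the function `sem_down` are names I made up for illustration; the monitor is a toy single-address model, not a description of any real core.

```python
class ToyExclusiveWord:
    """Hypothetical model of one memory word guarded by one exclusive monitor."""

    def __init__(self, value):
        self.value = value   # the semaphore count in memory
        self.owner = None    # CPU that currently holds the monitor, if any

    def ldrex(self, cpu):
        self.owner = cpu     # exclusive read arms the monitor for this CPU
        return self.value

    def strex(self, cpu, new_value):
        if self.owner != cpu:
            return 1         # non-zero status: store was denied
        self.value = new_value
        self.owner = None    # a successful exclusive write clears the monitor
        return 0             # zero status: store succeeded


def sem_down(sem, cpu):
    """Return True if the semaphore was taken, False if it is busy."""
    while True:
        count = sem.ldrex(cpu)           # LDREX: read the current count
        count -= 1                       # SUBS: decrement
        if count < 0:                    # BMI: count was 0, take the "busy" path
            return False
        if sem.strex(cpu, count) == 0:   # STREX: try to commit the new count
            return True
        # STREX failed: fall through and retry, like the compare-and-branch


sem = ToyExclusiveWord(2)
results = [sem_down(sem, "A"), sem_down(sem, "B"), sem_down(sem, "A")]
print(results, sem.value)  # [True, True, False] 0
```

The third call fails without writing anything, because the count was already 0 when it was read.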
Imagine the following interleaved accesses to the semaphore by the two CPUs:
In this scenario, CPU A was blocked from updating the semaphore, because B raced with it and snatched the exclusive monitor away from it. Now, notice I said "successful writes." Because the STREX was denied, there's no reason for it to redirect the monitor.
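Here is one interleaving consistent with that description, replayed against a toy model. The ordering of the four accesses is my assumption, and `LastReaderWinsMonitor` is a hypothetical name for a monitor that simply follows the most recent exclusive read.

```python
class LastReaderWinsMonitor:
    """Toy single-address monitor: the most recent LDREX owns it."""

    def __init__(self, value):
        self.value = value
        self.owner = None

    def ldrex(self, cpu):
        self.owner = cpu          # a newer exclusive read takes the monitor
        return self.value

    def strex(self, cpu, new_value):
        if self.owner != cpu:
            return 1              # denied: memory and monitor are left untouched
        self.value = new_value
        self.owner = None         # successful write clears the monitor
        return 0


sem = LastReaderWinsMonitor(1)
a = sem.ldrex("A")                 # A reads 1 and takes the monitor
b = sem.ldrex("B")                 # B reads 1 and takes the monitor from A
a_status = sem.strex("A", a - 1)   # A is denied; B still owns the monitor
b_status = sem.strex("B", b - 1)   # B succeeds and stores 0
print(a_status, b_status, sem.value)  # 1 0 0
```

A's failed STREX leaves the monitor with B, which is why B's store still goes through.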
The monitor can implement a different policy. If a second CPU tries to obtain the monitor for the same address before the first CPU has made its write, the monitor can decide not to switch, like so.
The second strategy has some advantages if you can guarantee that LDREX/STREX pairs arrive close to each other, or if you place a timeout on the monitor. Otherwise, an interrupt or a long stall between LDREX and STREX could cause undue fairness issues in the system. Note: CLREX can be used to clear active monitors in an interrupt handler.
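A toy model of this second policy shows both the behaviour and the fairness problem. `FirstClaimantMonitor` is a hypothetical name, and the stall on CPU A followed by a CLREX is an assumed scenario, not something from a specification.

```python
class FirstClaimantMonitor:
    """Toy single-address monitor: the first LDREX keeps it until it is released."""

    def __init__(self, value):
        self.value = value
        self.owner = None

    def ldrex(self, cpu):
        if self.owner is None:    # only claim the monitor if nobody holds it
            self.owner = cpu
        return self.value

    def strex(self, cpu, new_value):
        if self.owner != cpu:
            return 1              # denied
        self.value = new_value
        self.owner = None
        return 0

    def clrex(self, cpu):
        if self.owner == cpu:     # release the monitor without writing
            self.owner = None


sem = FirstClaimantMonitor(1)
a = sem.ldrex("A")                 # A claims the monitor, then stalls (e.g. interrupt)
b = sem.ldrex("B")                 # B reads 1, but the monitor stays with A
b_first = sem.strex("B", b - 1)    # B is denied for as long as A holds the monitor
sem.clrex("A")                     # A's interrupt handler clears its monitor
b = sem.ldrex("B")                 # B can now claim the monitor
b_retry = sem.strex("B", b - 1)    # and its store succeeds
print(b_first, b_retry, sem.value)  # 1 0 0
```

Without the CLREX (or a timeout), B would keep failing until A finally issued its STREX.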
For device and non-cacheable memory, exclusive access monitors are typically implemented in the endpoint device, or at an upstream point in the interconnect that all accesses to that endpoint must come through. Home Nodes, for example, in CHI interconnects, could implement a monitor for downstream memory or devices. Or, you could imagine a proxy bridge in AXI interconnects that could work with arbitrary IP behind it.
For normal memory in a coherent interconnect, the L1 cache controller can implement the exclusive monitor for normal, cacheable, shared address ranges, by pulling the line into the Exclusive state. Coherence snoops due to other CPUs' accesses would cancel the monitor. That strategy looks like the first scenario above. I believe this is what the Cortex-A53 does with its internal exclusive monitor. (See the TRM.)
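As a very rough sketch of that idea, the model below keeps one local monitor flag per CPU and clears the other CPUs' flags whenever someone else touches the line. `SnoopedLine` is a hypothetical name, the real coherence states are collapsed into a single "snoop" step, and this is not meant to describe what the Cortex-A53 actually does.

```python
class SnoopedLine:
    """Toy cache line with one local exclusive monitor per CPU."""

    def __init__(self, value, cpus):
        self.value = value
        self.armed = {cpu: False for cpu in cpus}   # per-CPU monitor flags

    def _snoop_others(self, cpu):
        # Taking the line for `cpu` cancels every other CPU's monitor.
        for other in self.armed:
            if other != cpu:
                self.armed[other] = False

    def ldrex(self, cpu):
        self._snoop_others(cpu)     # pull the line in, snooping the others
        self.armed[cpu] = True
        return self.value

    def store(self, cpu, new_value):
        self._snoop_others(cpu)     # a plain store also snoops the others
        self.value = new_value

    def strex(self, cpu, new_value):
        if not self.armed[cpu]:
            return 1                # monitor was cancelled: store denied
        self._snoop_others(cpu)
        self.value = new_value
        self.armed[cpu] = False
        return 0


line = SnoopedLine(1, ["A", "B"])
a = line.ldrex("A")                 # A arms its local monitor
line.store("B", 5)                  # B's ordinary write snoops A's copy
a_status = line.strex("A", a - 1)   # A's monitor is gone, so the store fails
print(a_status, line.value)         # 1 5
```

This also shows that a plain, non-exclusive write from another CPU is enough to make the STREX fail.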
Note: My description above is meant to be generic for this class of primitive, and not specific to a particular ARM implementation, except where noted. Other platforms call this primitive "Load Link, Store Conditional." That may help you find other resources on how this form of optimistic synchronization primitive works.
Hi Jzbiciak,
Thank you for your comment. For the example of interleaved accesses to the semaphore by the two CPUs, is there a specification or tutorial?
Thank you
I don't know of a specific tutorial that walks through the various cases with examples. There are some videos online regarding Load link/Store conditional (LL/SC), which is the name for what LDREX/STREX (or LDXR/STXR if you speak ARMv8) provide. Those videos talk in general principles. There's some architecture course material online as well.
If you want more concrete implementation details, you can look at real-world implementations and how they map onto architecture specifications. For example: