
question on Exclusive access

Hi All,

I am a bit confused about exclusive access and locked access.

Could you give me an example of a scenario in which a master would use exclusive access, and one in which it would use locked access?

Thanks a lot!

Parents
  • The exclusive access monitor watches for successful writes, or possibly other exclusive reads, from other masters in the system to the monitored address.  

    Let's say you have two CPUs trying to down a counting semaphore.  Both need to atomically test whether the semaphore is greater than zero, and if it is, decrement its value in memory.

    Both CPUs will try to do something along the lines of LDREX, SUBS, BMI, STREX, CMP, BNE in a loop.  If the semaphore is 0, the result of the subtract goes negative and SUBS sets N=1.  BMI then branches to a "go to sleep, semaphore is busy" path.  The CMP, BNE pair loops back if the STREX failed.  CMP, BNE may be replaced with CBZ/CBNZ if available.
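    As a rough illustration of the same retry loop at the C level, here is a minimal sketch using C11 atomics.  The names `sketch_sem_t` and `sketch_sem_try_down` are made up for this example.  On ARM targets that rely on exclusives, a compiler will typically lower the compare-exchange to an LDREX/STREX (or LDXR/STXR) loop, but that mapping is an assumption about the toolchain, not something this code guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counting semaphore: just an atomic counter. */
typedef struct {
    atomic_int count;
} sketch_sem_t;

/* Try to "down" the semaphore once.
 * Returns true if the count was > 0 and was decremented,
 * false if the count was 0 (the caller would take the
 * "go to sleep, semaphore is busy" path). */
static bool sketch_sem_try_down(sketch_sem_t *sem)
{
    /* Roughly the LDREX step: read the current value. */
    int old = atomic_load_explicit(&sem->count, memory_order_relaxed);

    for (;;) {
        /* Roughly the SUBS + BMI step: nothing left to take. */
        if (old <= 0) {
            return false;
        }
        /* Roughly the STREX step: the store only happens if the value
         * is still 'old'. On failure, 'old' is reloaded with the
         * current value and the loop retries. */
        if (atomic_compare_exchange_weak_explicit(
                &sem->count, &old, old - 1,
                memory_order_acquire, memory_order_relaxed)) {
            return true;
        }
    }
}
```

    The weak compare-exchange is allowed to fail spuriously, which mirrors how a STREX can fail even without a real conflict; the loop simply tries again.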

    Imagine the following interleaved accesses to the semaphore by the two CPUs:

    CPU A    CPU B    Comment
    LDREX             Monitor now points to CPU A for the semaphore address.
             LDREX    Monitor now points to CPU B for the semaphore address.
    STREX             Monitor denies the write, as the monitor points to CPU B.
             STREX    Monitor permits the write, as the monitor points to CPU B.  The monitor now resets.

    In this scenario, CPU A was blocked from updating the semaphore, because B raced with it and snatched the exclusive monitor away from it.  Now, notice I said "successful writes."  Because CPU A's STREX was denied, it never wrote to memory, so there's no reason for it to redirect the monitor away from CPU B.
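    The first policy can be modelled in a few lines of C.  This is a toy, single-address simulation, not real hardware behaviour: `monitor_sim_t`, `sim_ldrex`, and `sim_strex` are hypothetical names, and CPU ids are plain integers.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Toy model of one exclusive monitor guarding one memory word. */
typedef struct {
    int owner;  /* CPU id the monitor points to, or NO_OWNER */
    int value;  /* the monitored word (the semaphore count) */
} monitor_sim_t;

/* Exclusive load: the most recent loader always takes the monitor. */
static int sim_ldrex(monitor_sim_t *m, int cpu)
{
    m->owner = cpu;
    return m->value;
}

/* Exclusive store: returns 0 on success and 1 on failure,
 * following the STREX status convention. */
static int sim_strex(monitor_sim_t *m, int cpu, int new_value)
{
    if (m->owner != cpu) {
        return 1;           /* denied; the monitor is left untouched */
    }
    m->value = new_value;   /* permitted write */
    m->owner = NO_OWNER;    /* the monitor resets */
    return 0;
}
```

    Replaying the interleaving above (A loads, B loads, A stores, B stores) with this model gives a failed store for CPU A and a successful one for CPU B.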

    The monitor can implement a different policy.  If a second CPU tries to obtain the monitor for the same address before the first CPU has made its write, the monitor can decide not to switch, like so.

    CPU A    CPU B    Comment
    LDREX             Monitor now points to CPU A for the semaphore address.
             LDREX    Monitor ignores CPU B, as CPU A has an active monitor.
    STREX             Monitor permits the write, as the monitor points to CPU A. The monitor now resets.
             STREX    Monitor denies the write, because no monitor is active.

    The second strategy has some advantages if LDREX/STREX pairs are guaranteed to arrive close to each other, or you place a timeout on the monitor.  Otherwise, if a CPU takes an interrupt or a long stall between its LDREX and STREX, it holds the monitor the whole time and locks out the other CPUs, which causes fairness issues in the system.  Note: CLREX can be used to clear active monitors in an interrupt handler.
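    Here is a toy C model of this second, "first holder keeps the monitor" policy, including a CLREX-style release.  As before, this is a single-address sketch under simplifying assumptions; `sticky_monitor_t` and the `sticky_*` functions are hypothetical names, and a real monitor (or a timeout on it) would be more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define STICKY_NO_OWNER (-1)

/* Toy model of one exclusive monitor guarding one memory word. */
typedef struct {
    int owner;  /* CPU id holding the monitor, or STICKY_NO_OWNER */
    int value;  /* the monitored word (the semaphore count) */
} sticky_monitor_t;

/* Exclusive load: only claims the monitor if nobody holds it. */
static int sticky_ldrex(sticky_monitor_t *m, int cpu)
{
    if (m->owner == STICKY_NO_OWNER) {
        m->owner = cpu;
    }
    /* A later loader still gets the data, just not the monitor. */
    return m->value;
}

/* Exclusive store: returns 0 on success and 1 on failure. */
static int sticky_strex(sticky_monitor_t *m, int cpu, int new_value)
{
    if (m->owner != cpu) {
        return 1;                   /* denied */
    }
    m->value = new_value;           /* permitted write */
    m->owner = STICKY_NO_OWNER;     /* the monitor resets */
    return 0;
}

/* CLREX-style release, e.g. from an interrupt handler, so a stalled
 * holder does not keep other CPUs locked out. */
static void sticky_clrex(sticky_monitor_t *m, int cpu)
{
    if (m->owner == cpu) {
        m->owner = STICKY_NO_OWNER;
    }
}
```

    With this model, the interleaving in the second table ends with CPU A's store succeeding and CPU B's store failing.  If CPU A calls `sticky_clrex` before storing, its own store then fails and it has to restart from the exclusive load.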

    For device and non-cacheable memory, exclusive access monitors are typically implemented in the endpoint device, or at an upstream point in the interconnect that all accesses to that endpoint must come through.  Home Nodes, for example, in CHI interconnects, could implement a monitor for downstream memory or devices.  Or, you could imagine a proxy bridge in AXI interconnects that could work with arbitrary IP behind it.

    For normal memory in a coherent interconnect, the L1 cache controller can implement the exclusive monitor for normal, cacheable, shared address ranges, by pulling the line into the Exclusive state.  Coherence snoops due to other CPUs' accesses would cancel the monitor.  That strategy looks like the first scenario above.  I believe this is what the Cortex-A53 does with its internal exclusive monitor.  (See the TRM.)

    Note: My description above is meant to be generic for this class of primitive, and not specific to a particular ARM implementation, except where noted.  Other platforms call this primitive "Load Link, Store Conditional." That may help you find other resources on how this form of optimistic synchronization primitive works.

Children
  • Hi Jzbiciak,

    Thank you for your comment.  For the example of interleaved accesses to the semaphore by the two CPUs, is there any specification or tutorial?

    Thank you

  • I don't know of a specific tutorial that walks through the various cases with examples.  There are some videos online regarding Load link/Store conditional (LL/SC), which is the name for what LDREX/STREX (or LDXR/STXR if you speak ARMv8) provide.  Those videos talk in general principles.  There's some architecture course material online as well.

    If you want more concrete implementation details, you can look at real-world implementations and how they map onto architecture specifications.  For example:

    • Sections 6.5.1 through 6.5.3 of the Cortex-A53 r0p4 TRM (DDI0500G) describe its internal exclusive monitor.
    • Section B2.9 in the ARMv8 ARM (DDI 0487H.a) walks through how ARM defines the behavior for local and global monitors, along with state machine descriptions.
    • Chapter 6 of the AMBA 5 CHI spec (IHI0050E.b) describes the operation of exclusive accesses in a CHI fabric.
    • Section A7.2 in the AMBA AXI and ACE spec (IHI0020H) describes the operation of exclusive accesses in an AXI fabric.