LL/SC exclusive access by register width or cache line width?

Note: This was originally posted on 30th May 2012 at http://forums.arm.com

Hi.

I'm working on the next release of my lock-free data structure library.

I'm using LL/SC on ARM.

To use LL/SC as LL/SC (rather than emulating CAS) there has to be a single STR between the LDREX and STREX.

Now, I've written the code and this works.

What concerns me however is the possibility it may not work.

I've read that on PowerPC, if you access the same cache line as the LL/SC target, you break the LL/SC.

So I'm thinking if my STR target is on the same cache line as my LL/SC target, then pow, I'm dead.

Now, the LL/SC target and STR targets are always in different malloc()s so the chance of them being directly in the same cache line is probably small (and I can guarantee this by padding the LL/SC target so it begins on a cache line boundary and fills that cache line).

But there could be false sharing, if the STR target is in just the right (wrong!) place in memory.

Looking at the LDREX/STREX documentation, it describes exclusive access in terms of "the physical address".  This implies register width granularity, not cache line width granularity.

And that's my question - is LDREX/STREX sensitive to other memory accesses at register width granularity or at cache line width granularity?
  • Note: This was originally posted on 31st May 2012 at http://forums.arm.com


    Purely from the idealized architecture point of view the ARM ARM defines the size being tracked for exclusive access as "a small block" - but the size of that is implementation defined (and has varied across a number of ARM core implementations). The exclusive monitor spec in the ARM ARM defines a normal store outside of "this block" between two exclusives will work fine, but within it the behaviour is implementation defined (may clear the exclusive monitor, may not).

    So yes, if you can guarantee that your "normal store" hits a different "small block" to the exclusives I think you are OK. However I don't know of a programmatic way to determine the monitor block size ...


    Hi and thanks!

    I've been told in StackOverflow the block size ranges from 8 to 2048 bytes.  This is fantastic - all other LL/SC implementations seem to mark on cache lines, which means false sharing fails the LL/SC - and it's far easier to align and pad the data structure instance state to 2048 bytes than fix up a malloc wrapper to avoid false sharing!
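A minimal sketch of the align-and-pad idea, assuming the 2048-byte upper bound quoted above is the block size to defend against. `GRANULE_BYTES`, `struct llsc_state` and `llsc_state_new` are names made up for illustration, not part of any library, and the sketch relies on C11 `aligned_alloc` being available on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: worst-case exclusive-monitor block size quoted in this thread. */
#define GRANULE_BYTES 2048u

/* Hypothetical instance state: the LL/SC target sits at the start of a
   block and padding fills the rest, so nothing else can share the block. */
struct llsc_state {
    uint32_t target;
    unsigned char pad[GRANULE_BYTES - sizeof(uint32_t)];
};

/* The struct must occupy exactly one block for the padding to work. */
_Static_assert(sizeof(struct llsc_state) == GRANULE_BYTES,
               "llsc_state must fill exactly one block");

/* Allocates a zeroed state whose address is a multiple of GRANULE_BYTES.
   Returns NULL on allocation failure; release with free(). */
static struct llsc_state *llsc_state_new(void)
{
    /* aligned_alloc requires the size to be a multiple of the alignment,
       which the static assertion above guarantees. */
    struct llsc_state *s = aligned_alloc(GRANULE_BYTES, sizeof *s);
    if (s != NULL) {
        memset(s, 0, sizeof *s);
        assert(((uintptr_t)s % GRANULE_BYTES) == 0);
    }
    return s;
}
```

Because the state is block-aligned and block-sized, any other allocation necessarily lands in a different block, whatever the real block size is, as long as it is a power of two no larger than `GRANULE_BYTES`.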

    LL/SC performs better than LL/SC emulating contiguous double-word CAS. I see about a 10% improvement in single core performance (which is about the ratio of CAS/DCAS performance, where the LL/SC is single word), and I also see better scaling, by about 5% (e.g. 1 core = 100%, 2 cores = 95%, 3 cores = 90%, 4 cores = 70%, vs about 5% less for DCAS, e.g. 100, 90, 85, 65). Note that on my test system the Linux OS provides no cache info, so I can't tell you who's sharing what with whom.
  • Note: This was originally posted on 31st May 2012 at http://forums.arm.com

    > > I've been told in StackOverflow the block size ranges from 8 to 2048 bytes.

    > On some of the older uni-processor ARM11 cores (e.g. ARM1176) which don't have SMP support there is no address tracking at all; the local monitor is just a single bit which is
    > either 1 or 0 (set by LDREX, cleared by STREX or CLREX, STREX fails if the bit is zero). So the logical block size is 4GB in these cases. That said I think this design allows
    > "other stores" to land in-between the LDREX and the STREX, so you'll probably get away with it ;P
    Erk!

    Hmm.  Actually, in the unit test for the abstraction layer I can probe the block size by doing a malloc and trying LL/STR/SC with the STR being varying distances from the LL/SC target.

    When people run the test on their target platform they'll know then if it works or not.
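A rough sketch of what such a probe could look like, assuming GCC/Clang inline assembly and an ARM target that has the exclusive instructions (ARMv6 or later; the AArch64 branch uses LDXR/STXR). `ll_str_sc` and `probe_min_safe_distance` are hypothetical names. On any other target the helper just returns -1. A success here only says the STREX was able to succeed on this particular core; as noted earlier in the thread, behaviour inside the block is implementation defined, and a monitor with no address tracking may succeed at every distance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Runs one load-exclusive / plain store / store-exclusive sequence.
   Returns 1 if the store-exclusive succeeded, 0 if it failed,
   and -1 when not compiled for an ARM target. */
static int ll_str_sc(volatile uint32_t *target, volatile uint32_t *other)
{
#if defined(__aarch64__)
    uint32_t value, status;
    __asm__ volatile(
        "ldxr  %w0, [%2]\n\t"        /* LL on target                */
        "str   %w0, [%3]\n\t"        /* intervening plain store     */
        "stxr  %w1, %w0, [%2]\n\t"   /* SC on target, 0 = success   */
        : "=&r"(value), "=&r"(status)
        : "r"(target), "r"(other)
        : "memory");
    return status == 0;
#elif defined(__arm__)
    uint32_t value, status;
    __asm__ volatile(
        "ldrex %0, [%2]\n\t"         /* LL on target                */
        "str   %0, [%3]\n\t"         /* intervening plain store     */
        "strex %1, %0, [%2]\n\t"     /* SC on target, 0 = success   */
        : "=&r"(value), "=&r"(status)
        : "r"(target), "r"(other)
        : "memory");
    return status == 0;
#else
    (void)target;
    (void)other;
    return -1;
#endif
}

/* Doubles the distance between the LL/SC target (buf[0]) and the STR
   target until the sequence succeeds in at least one of `tries` attempts.
   Retries matter because an interrupt or context switch can clear the
   monitor. Returns that distance in bytes, or 0 if nothing succeeded
   (always 0 on non-ARM builds). */
static size_t probe_min_safe_distance(volatile uint32_t *buf,
                                      size_t buf_bytes, int tries)
{
    assert(buf != NULL && tries > 0);
    for (size_t dist = sizeof(uint32_t); dist < buf_bytes; dist *= 2) {
        for (int i = 0; i < tries; i++) {
            if (ll_str_sc(&buf[0], &buf[dist / sizeof(uint32_t)]) == 1)
                return dist;
        }
    }
    return 0;
}
```

For the distances to mean anything, `buf` should itself start on a boundary at least as large as the biggest block being tested (for example 2048 bytes), otherwise a short distance may already cross into the next block.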
  • Note: This was originally posted on 30th May 2012 at http://forums.arm.com

    Purely from the idealized architecture point of view the ARM ARM defines the size being tracked for exclusive access as "a small block" - but the size of that is implementation defined (and has varied across a number of ARM core implementations). The exclusive monitor spec in the ARM ARM defines a normal store outside of "this block" between two exclusives will work fine, but within it the behaviour is implementation defined (may clear the exclusive monitor, may not).

    So yes, if you can guarantee that your "normal store" hits a different "small block" to the exclusives I think you are OK. However I don't know of a programmatic way to determine the monitor block size ...
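The "different small block" condition can at least be checked for a given, assumed block size. The sketch below is illustrative only: `shares_granule` is a hypothetical helper, and the block size is a parameter the caller has to supply because, as said above, the hardware does not report it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the byte ranges [a, a + a_len) and [b, b + b_len)
   touch at least one common block, where blocks are `granule` bytes
   long and aligned to `granule`. `granule` must be a power of two. */
static bool shares_granule(const void *a, size_t a_len,
                           const void *b, size_t b_len,
                           size_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    assert(a_len > 0 && b_len > 0);

    /* Index of the first and last block each range touches. */
    uintptr_t a_first = (uintptr_t)a / granule;
    uintptr_t a_last  = ((uintptr_t)a + a_len - 1) / granule;
    uintptr_t b_first = (uintptr_t)b / granule;
    uintptr_t b_last  = ((uintptr_t)b + b_len - 1) / granule;

    /* The two index intervals overlap if neither ends before the other starts. */
    return a_first <= b_last && b_first <= a_last;
}
```

A debug build could call this with the LL/SC target, the STR target, and the largest block size it wants to tolerate, and refuse to proceed if it returns true.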
  • Note: This was originally posted on 31st May 2012 at http://forums.arm.com

    > I've been told in StackOverflow the block size ranges from 8 to 2048 bytes.

    On some of the older uni-processor ARM11 cores (e.g. ARM1176) which don't have SMP support there is no address tracking at all; the local monitor is just a single bit which is either 1 or 0 (set by LDREX, cleared by STREX or CLREX, STREX fails if the bit is zero). So the logical block size is 4GB in these cases. That said I think this design allows "other stores" to land in-between the LDREX and the STREX, so you'll probably get away with it ;P

    Iso