static inline u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old, u64 new)
{
	u64 oldval;
	unsigned long res;

	smp_mb();

	do {
		__asm__ __volatile__("@ atomic64_cmpxchg\n"
		"ldrexd		%1, %H1, [%3]\n"
		"mov		%0, #0\n"
		"teq		%1, %4\n"
		"teqeq		%H1, %H4\n"
		"strexdeq	%0, %5, %H5, [%3]"
		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
		: "r" (&ptr->counter), "r" (old), "r" (new)
		: "cc");
	} while (res);

	smp_mb();

	return oldval;
}
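For context, callers typically use atomic64_cmpxchg() in a read-modify-write retry loop: read the current value, compute a new one, and retry if another CPU raced in between. A minimal sketch follows; atomic64_read() and atomic64_cmpxchg() are the real kernel APIs, but the helper name and the saturation logic are invented for illustration:

/* Illustrative only: add to a 64-bit counter, saturating at a ceiling.
 * atomic64_cmpxchg() returns the value it observed, so the store took
 * effect only if the return value equals the 'old' we passed in. */
static u64 atomic64_add_capped(atomic64_t *v, u64 delta, u64 cap)
{
	u64 old, new;

	do {
		old = atomic64_read(v);
		new = old + delta;
		if (new > cap)
			new = cap;
		/* Retry if another CPU changed *v between read and cmpxchg */
	} while (atomic64_cmpxchg(v, old, new) != old);

	return new;
}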
The main issue seems to be a misunderstanding about how coherent caches work in an SMP system. The CPU hardware ensures that they "just work": dirty lines for shared memory are migrated between cores as needed, so each core always sees the latest data. If one core explicitly invalidates a cache line while another core still needs that data, that is a software bug; explicit cache maintenance is not needed for CPU-to-CPU coherency in SMP.
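To make the point concrete, here is a minimal producer/consumer sketch (the variable and function names are made up for this example): neither side issues any cache flush or invalidate, because the coherency hardware already propagates the stores between cores. The only software obligation is ordering, which the barriers provide.

/* Illustrative sketch: data handoff between two cores on coherent SMP.
 * No cache maintenance is required; smp_wmb()/smp_rmb() only enforce
 * the order in which the plain stores and loads become visible. */
static u64 shared_payload;
static int payload_ready;

/* Runs on core A */
static void producer(void)
{
	shared_payload = 0xdeadbeef;	/* plain store; hardware keeps it coherent */
	smp_wmb();			/* order payload store before the flag store */
	WRITE_ONCE(payload_ready, 1);
}

/* Runs on core B */
static u64 consumer(void)
{
	while (!READ_ONCE(payload_ready))
		cpu_relax();
	smp_rmb();			/* order flag load before the payload load */
	return shared_payload;		/* guaranteed to observe core A's store */
}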