static inline u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old, u64 new)
{
	u64 oldval;
	unsigned long res;

	smp_mb();

	do {
		__asm__ __volatile__("@ atomic64_cmpxchg\n"
		"ldrexd		%1, %H1, [%3]\n"
		"mov		%0, #0\n"
		"teq		%1, %4\n"
		"teqeq		%H1, %H4\n"
		"strexdeq	%0, %5, %H5, [%3]"
		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
		: "r" (&ptr->counter), "r" (old), "r" (new)
		: "cc");
	} while (res);

	smp_mb();

	return oldval;
}
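For context, callers typically use atomic64_cmpxchg() in a read-modify-write retry loop: read the current value, compute a new one, and retry if another CPU raced in between. A minimal sketch follows; atomic64_read() and atomic64_cmpxchg() are the real kernel APIs, but the helper name and the saturation logic are invented for illustration:

/* Illustrative only: add to a 64-bit counter, saturating at a ceiling.
 * atomic64_cmpxchg() returns the value it observed, so the store took
 * effect only if the return value equals the 'old' we passed in. */
static u64 atomic64_add_capped(atomic64_t *v, u64 delta, u64 cap)
{
	u64 old, new;

	do {
		old = atomic64_read(v);
		new = old + delta;
		if (new > cap)
			new = cap;
		/* Retry if another CPU changed *v between read and cmpxchg */
	} while (atomic64_cmpxchg(v, old, new) != old);

	return new;
}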
The main issue seems to be a misunderstanding about how coherent caches work in an SMP system. The CPU hardware ensures that they "just work": dirty lines for shared memory are migrated between cores as needed, so each core always sees the latest data. If one core explicitly invalidates a cache line while another core still needs that data, that is a software bug; explicit cache maintenance is not needed for CPU-to-CPU coherency in SMP.
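To make the point concrete, here is a minimal producer/consumer sketch (the variable and function names are made up for this example): neither side issues any cache flush or invalidate, because the coherency hardware already propagates the stores between cores. The only software obligation is ordering, which the barriers provide.

/* Illustrative sketch: data handoff between two cores on coherent SMP.
 * No cache maintenance is required; smp_wmb()/smp_rmb() only enforce
 * the order in which the plain stores and loads become visible. */
static u64 shared_payload;
static int payload_ready;

/* Runs on core A */
static void producer(void)
{
	shared_payload = 0xdeadbeef;	/* plain store; hardware keeps it coherent */
	smp_wmb();			/* order payload store before the flag store */
	WRITE_ONCE(payload_ready, 1);
}

/* Runs on core B */
static u64 consumer(void)
{
	while (!READ_ONCE(payload_ready))
		cpu_relax();
	smp_rmb();			/* order flag load before the payload load */
	return shared_payload;		/* guaranteed to observe core A's store */
}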