Hello,
I am trying to implement atomic_add and atomic_get on FVP_Base_RevC-2xAEMv8A_11.3_30. The FVP is configured to run 4 cores.
After boot, core0 sets up the translation tables while the other cores wait. Once the tables are set up, core0 signals the other cores to exit their wait. Each core then sets up its TTBRx, TCR and SCTLR registers to enable the MMU, ICache and DCache.
After jumping to set the PC, each core is made to run these few lines, where count is a global variable.
    _atomic_add(&count, 10);
    // POINT0
    if (_atomic_get(&count) == 10)
        for (;;) asm volatile("wfe");
    else
        for (;;) asm volatile("wfi");
The expectation is that one core will wait at wfe, while the other cores will wait at wfi. But it turns out that all 4 cores wait at wfe. That is, atomic_add does not behave as expected. The count cannot be 10 for /all/ cores, but it happens to be so.
If "dc cvac, &count" is placed at POINT0, the cores start behaving as expected.
The atomics:
    .globl _atomic_add
    _atomic_add:
        ldxr w2, [x0]
        add w2, w2, w1
        stxr w3, w2, [x0]
        cbnz w3, _atomic_add
        ret

    .globl _atomic_get
    _atomic_get:
        ldr w0, [x0]
        ret
Why does ldxr/stxr require me to clean the cache? It seems as if the L1 memory system of each core is unable to 'snoop' or request the cache line for the count variable. AFAIU, ldxr/stxr work with the L1 memory system to provide the required coherency, so 'dc cvac' is not required.
Thank you,
Amol
Edit: cvac, not civac.
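A side note on the test itself: the add and the read are two separate steps, so another core can add between them. With working coherency at most one core can read 10, and it is possible that none does. If the aim is to pick out the first core reliably, one option is to use the value returned by a fetch-and-add. Below is a minimal sketch using C11 <stdatomic.h> instead of the hand-written ldxr/stxr routines. The helper name add_and_check_first is hypothetical and not part of the original code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: atomically add delta to *counter and report
 * whether this caller was the first one, i.e. whether the counter
 * was still 0 immediately before the add. */
static inline bool add_and_check_first(atomic_int *counter, int delta)
{
    /* atomic_fetch_add returns the value held before the addition,
     * so the add and the check form a single atomic step. */
    int old = atomic_fetch_add(counter, delta);
    return old == 0;
}
```

Each core would call add_and_check_first(&count, 10), with count declared as an atomic_int, and exactly one caller gets true.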
What did you put as the cacheability + shareability for the address range where "count" lives?
My first thought would be that you'd marked the memory as non-shared, and hence coherency isn't being maintained. But that's a guess based on limited information.
:'‑)
Thank you! That was indeed the problem. The PTEs had their shareability attributes left at their uninitialized value, 0 (non-shareable).
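For anyone landing here with the same symptom, here is a rough sketch of what setting the shareability field can look like. In a stage 1 block or page descriptor, SH[1:0] sits in bits [9:8]: 0b00 is non-shareable, 0b10 outer shareable and 0b11 inner shareable. The sketch assumes a 4 KiB granule, a level 3 page entry, and that the caller passes a MAIR index that has been programmed as Normal Write-Back memory. The macro names and the helper make_shared_page_pte are hypothetical and are not taken from the original poster's code.

```c
#include <assert.h>
#include <stdint.h>

/* Lower attributes of a stage 1 block/page descriptor (VMSAv8-64). */
#define PTE_VALID        (1ULL << 0)
#define PTE_TYPE_PAGE    (1ULL << 1)            /* bits[1:0] = 0b11: level 3 page */
#define PTE_ATTRINDX(i)  ((uint64_t)(i) << 2)   /* AttrIndx[2:0], index into MAIR_ELx */
#define PTE_SH_NONE      (0ULL << 8)            /* non-shareable: what a zeroed field gives */
#define PTE_SH_OUTER     (2ULL << 8)            /* outer shareable */
#define PTE_SH_INNER     (3ULL << 8)            /* inner shareable */
#define PTE_AF           (1ULL << 10)           /* access flag */

/* Hypothetical helper: build a level 3 page descriptor for normal
 * memory that is shared between cores. AP bits are left at 0, which
 * gives read/write access at EL1 and no access at EL0. */
static inline uint64_t make_shared_page_pte(uint64_t pa, unsigned mair_idx)
{
    assert((pa & 0xFFFULL) == 0);   /* output address must be 4 KiB aligned */
    assert(mair_idx < 8);           /* AttrIndx is a 3-bit field */
    return pa | PTE_ATTRINDX(mair_idx) | PTE_SH_INNER | PTE_AF
              | PTE_TYPE_PAGE | PTE_VALID;
}
```

With SH left at PTE_SH_NONE, each core is free to treat its cached copy of the line as private, which matches the behaviour described in the question.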