fvp: ldxr/stxr and cache clean

Hello,

I am trying to implement atomic_add and atomic_get on FVP_Base_RevC-2xAEMv8A_11.3_30. The FVP is configured to run 4 cores.

After boot, core0 sets up the translation tables while the other cores wait. Once the tables are set up, core0 signals the other cores to exit their wait. Each core then programs its TTBRx, TCR, and SCTLR registers to enable the MMU, ICache, and DCache.

After jumping to set the PC, each core is made to run these few lines, where count is a global variable.

_atomic_add(&count, 10);
// POINT0
if (_atomic_get(&count) == 10)
        for (;;)
                asm volatile("wfe");
else
        for (;;)
                asm volatile("wfi");

The expectation is that one core will wait at wfe, while the other cores will wait at wfi. But it turns out that all 4 cores wait at wfe. That is, atomic_add does not behave as expected. The count cannot be 10 for /all/ cores, but it happens to be so.
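
For comparison, the same "exactly one core is first" check can be written so that it does not depend on a separate read after the add. This is only a sketch in C11, assuming a toolchain with <stdatomic.h>; the function name add_and_check_first is hypothetical, and the static count stands in for the global above. atomic_fetch_add returns the value held before the addition, so at most one caller can observe 0.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the global `count` from the post (zero-initialized). */
static atomic_int count;

/* Hypothetical helper: adds 10 and reports whether this caller was the
 * first to do so. The read and the write happen as one atomic
 * read-modify-write, so only one caller can see a previous value of 0. */
static bool add_and_check_first(void)
{
    int previous = atomic_fetch_add(&count, 10);
    return previous == 0;
}
```

On AArch64 a compiler typically lowers atomic_fetch_add to an ldxr/stxr-style loop or to an LSE atomic instruction, so it still relies on count living in memory that is kept coherent between the cores. It does not replace correct memory attributes.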

If "dc cvac, &count" is placed at POINT0, the cores start behaving as expected.

The atomics:

.globl _atomic_add
_atomic_add:
        ldxr    w2, [x0]
        add     w2, w2, w1
        stxr    w3, w2, [x0]
        cbnz    w3, _atomic_add
        ret


.globl _atomic_get
_atomic_get:
        ldr     w0, [x0]
        ret

Why does ldxr/stxr require me to clean the cache? It seems as if the L1 memory system of each core is unable to 'snoop' or request the cache line for the count variable. AFAIU, ldxr/stxr work with the L1 memory system to provide the required coherency, so 'dc cvac' is not required.

Thank you,

Amol

Edit: cvac, not civac.

Top replies

Martin Weidmann over 7 years ago +1 verified

What did you put as the cacheability + shareability for the address range where "count" lives? My first thought would be that you'd marked the memory as non-shared, and hence coherency isn't being maintained...
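
To make the question concrete, here is a minimal sketch of the descriptor bits involved, assuming stage 1 block or page descriptors in the VMSAv8-64 format. The macro and function names are hypothetical, and the MAIR index of 0 is an assumption about how MAIR_ELx was programmed, not something stated in the post. The SH field sits in bits [9:8]: 0b00 is Non-shareable, 0b10 Outer Shareable, 0b11 Inner Shareable.

```c
#include <stdint.h>

/* Lower attribute fields of a stage 1 block/page descriptor. */
#define DESC_ATTRINDX(i)  ((uint64_t)(i) << 2)   /* bits [4:2]: index into MAIR_ELx */
#define DESC_SH_NONE      ((uint64_t)0x0 << 8)   /* bits [9:8]: Non-shareable */
#define DESC_SH_OUTER     ((uint64_t)0x2 << 8)   /* bits [9:8]: Outer Shareable */
#define DESC_SH_INNER     ((uint64_t)0x3 << 8)   /* bits [9:8]: Inner Shareable */
#define DESC_AF           ((uint64_t)0x1 << 10)  /* bit 10: Access flag */

/* MAIR attribute byte: Normal memory, Inner and Outer Write-Back,
 * read/write-allocate, non-transient. */
#define MAIR_NORMAL_WB    0xFFull

/* Assumption: the byte above was written to MAIR_ELx slot 0. */
#define MAIR_IDX_NORMAL   0

/* Attribute bits for cacheable Normal memory shared between cores. */
static uint64_t normal_shared_attrs(void)
{
    return DESC_ATTRINDX(MAIR_IDX_NORMAL) | DESC_SH_INNER | DESC_AF;
}

/* Extracts the SH field from a descriptor for inspection. */
static unsigned desc_shareability(uint64_t desc)
{
    return (unsigned)((desc >> 8) & 0x3);
}
```

If desc_shareability returns 0 for the descriptor that maps "count", the range is Non-shareable, which would match the behaviour described in the question.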