ARM11 MPCore and stale cacheline

Note: This was originally posted on 18th February 2013 at http://forums.arm.com

Hello all,

We are in the middle of porting FreeBSD to an ARM11 MPCore CPU.

Unfortunately, we are stuck on a strange issue, probably a stale cacheline.

In short, after a specific write pattern on the first core and a single write on the second, we get a stale cacheline on the first one. The write from the second core (and yes, it is followed by a DSB) is not visible on the first CPU. But after executing a DMB on the first core, we get the current data.

We have verified that both cores are in SMP mode and that the accessed memory is mapped using a 1MB section with the shared bit set, without any aliasing. The hardware is a Cavium Networks CNS3420, a dual-core ARM11 MPCore CPU, revision r2p0.

Unfortunately, we don't have access to any ARM11 MPCore errata. Is there any erratum that can cause this problem? Is it possible to get the errata sheet even though we are not an ARM customer?

We can post pseudocode that triggers the issue here, if necessary.

Many thanks

Michal Meloun
  • Note: This was originally posted on 22nd February 2013 at http://forums.arm.com


    I'm not 100% sure it applies to ARM11 MPCore, but on ARMv7A it is not architecturally valid to use clean and invalidate of the whole cache once the CPU is running; you have to do it by set-way or the SCU doesn't necessarily pick up the snoop correctly.

    Good catch, thanks. The reference manual is a bit cryptic on this point (at least for me), so I totally missed this fact. Unfortunately, replacing the full cache maintenance operations with a full set/way loop has no effect.
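
For readers unfamiliar with a "full set/way loop", the sketch below shows roughly how the operand for each line is formed and how the loop walks the cache. It is not the code from the test case. It assumes the ARM11 MPCore L1 data cache geometry of 4 ways and 32-byte lines, and it assumes a 32 KB cache (the size is configurable, so check the Cache Type Register on real hardware). The names `setway_operand` and `for_each_line` are made up for this illustration, and the callback stands in for the actual CP15 write.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed geometry: 4 ways and 32-byte lines as on ARM11 MPCore L1;
 * the 32 KB size is an assumption for this sketch. */
#define CACHE_BYTES (32u * 1024u)
#define NUM_WAYS    4u
#define LINE_BYTES  32u
#define NUM_SETS    (CACHE_BYTES / (NUM_WAYS * LINE_BYTES))   /* 256 */

/* Build a set/way operand: way number in bits [31:30] (4 ways),
 * set index starting at bit 5 (32-byte lines). */
uint32_t setway_operand(uint32_t way, uint32_t set)
{
    return (way << 30) | (set << 5);
}

/* Visit every line of the cache once. On the target, 'op' would issue
 * the clean-and-invalidate by set/way write, e.g.
 *   mcr p15, 0, <operand>, c7, c14, 2
 * A null 'op' only counts the lines. Returns the number of lines visited. */
uint32_t for_each_line(void (*op)(uint32_t operand))
{
    uint32_t count = 0;

    for (uint32_t way = 0; way < NUM_WAYS; way++) {
        for (uint32_t set = 0; set < NUM_SETS; set++) {
            if (op != NULL)
                op(setway_operand(way, set));
            count++;
        }
    }
    return count;
}
```

With the assumed geometry the loop issues 1024 operations, one per cache line, followed on the target by a DSB.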


    Additionally why do you have the clean and invalidates everywhere? They shouldn't be needed. The SCU hardware should ensure everything syncs.

    They were added to "be really sure" and to "put the core into a more defined state" when we made the testcase. The whole synchronization sequence can be replaced by a DSB without any effect.


    ... and to check the obvious - you have marked these pages as shared in the MMU, and enabled the SCU?

    Checked using the CP15 VA to PA translation ("mrc  p15, 0, %0, c7, c4, 0").  Both cores report the same value "": normal memory, shared, inner and outer WB WA (beware, ARMv6K uses a different format than ARMv7-A).

    389 (0xc114c000:cpu0):  cache_test: WAIT_ITEM va: 0xC0484000 -> pa: 0x20484194
    390 (0xc114a900:cpu1): intr_event_handle: exec 0xc005b4dc(0xf) for ipi_test
    391 (0xc114a900:cpu1):  ipi_test_handler: WAIT_ITEM va: 0xC0484000 -> pa: 0x20484194
    392 (0xc114c000:cpu0): 0x00000000 0x00010003 0x17020003 0x00030003 0x16040003
    393 (0xc114c000:cpu0): 0x00000000 0x00010003 0x17020003 0x00030003 0x17040003


    When we get stale data, any of the following actions helps (all on cpu0):
    - DMB
    - Reading at least 4 words at the same cache index as the wait variable, in different ways.
    - Any write to another word in the same cacheline.
    - Cacheline flush and invalidate by MVA of the wait variable.
    Repeated reads of any word in the same cacheline, or a longer timeout, do not help.
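
To make the "same cache index, different ways" workaround concrete, here is a minimal sketch of how such addresses can be computed. It assumes the same geometry as the hardware is believed to have (4 ways, 32-byte lines) and a 32 KB data cache, which is an assumption and should be checked against the Cache Type Register. The helper names `cache_set_index` and `same_index_addrs` are hypothetical and do not come from the test case.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed geometry: 4 ways and 32-byte lines as on ARM11 MPCore L1;
 * the 32 KB size is an assumption for this sketch. */
#define CACHE_BYTES (32u * 1024u)
#define NUM_WAYS    4u
#define LINE_BYTES  32u
#define WAY_BYTES   (CACHE_BYTES / NUM_WAYS)   /* 8 KB per way */

/* Set index that an address maps to. */
uint32_t cache_set_index(uint32_t addr)
{
    return (addr % WAY_BYTES) / LINE_BYTES;
}

/* Fill out[] with n line-aligned addresses that map to the same set
 * index as 'addr' but belong to different cache lines. Reading one
 * word from each of them makes those lines compete with the line
 * that holds 'addr'. Returns n. */
size_t same_index_addrs(uint32_t addr, uint32_t *out, size_t n)
{
    uint32_t line = addr & ~(LINE_BYTES - 1u);

    for (size_t i = 0; i < n; i++)
        out[i] = line + (uint32_t)(i + 1u) * WAY_BYTES;
    return n;
}
```

For the wait variable at va 0xC0484000 from the log, this yields 0xC0486000, 0xC0488000, 0xC048A000 and 0xC048C000 under the assumed cache size. Whether reading them actually displaces the stale line depends on the replacement policy, so treat this as an aid for building the experiment rather than a guaranteed eviction.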