
Cache type and cache operation sequence

I have a shared memory region in DDR, shared between two separate ARM execution environments (say A and B) in a heterogeneous compute SoC.

SW on each execution unit (A and B) reads and writes this shared location (imagine an array/matrix that is being worked on): a read first, followed by a write.

i.e.

A :  Read array elements, modify, Write

A : Send IPC message to B

B: Read array elements, modify, Write

B: Send IPC message to A

This continues until a certain condition is satisfied.

What should the memory attributes and cache operations be on A and B?

1. Both A and B can have the shared memory mapped as Write-back with Read-alloc

    At both A and B, the SW first invalidates/flushes the region before the read, and after the write does a clean (to push the data out to memory/DDR) before sending the IPC.

2. Both A and B can have the shared memory mapped as Write-through with Read-alloc

    At both A and B, the SW would then just need to invalidate/flush the region before the read; no cache operation is required after the write and before sending the IPC.

What would be the right way to do this?

  • Hello,

    Both 1 and 2 would be possible.

    To realize it, you must separate A's accesses from B's accesses, for example by using exclusive-access instructions such as LDREX/STREX.

    Best regards,

    Yasuhiko Koumoto.

  • One point to note: check the TRM for the CPUs in your system to see which cache write modes they support. IIRC, not all of the ARM cores support write-through caching (it is rarely used in practice because of the bandwidth implications it has for "normal software").

  • It partly depends on the capabilities of your SoC. For many systems, the processing elements will be in the same Shareability domain. This is an architectural term, but what it essentially means is that -- if the right memory type is used -- the processors will snoop into each other's caches for data; for example using the AMBA ACE or CHI protocols. Depending on the SoC, that would be either "Normal, Cacheable, Inner Shareable", or "Normal, Cacheable, Outer Shareable".

    The Allocate flags are performance optimization hints and should not affect correctness. Moreover, you should not rely on them for correctness, as their behavior is not guaranteed.

    You do need to consider memory ordering. You must insert some form of barrier operation between the producer writing the data and sending the message, and between the consumer receiving the message and reading the data. For example, a DMB instruction, or (on ARMv8-A CPUs) the Load-Acquire/Store-Release instructions. For more information, see chapter K10 "Barrier Litmus Tests" in the ARMv8-A Architecture Reference Manual.

    If the SoC doesn't put the processing elements in the same Shareability domain, then you have to use other memory types or software cache maintenance as you described. Write-Through memory has the property that all producer writes become globally observable in finite time without explicit cache maintenance, but the consumer may still need cache maintenance (to evict stale data); Non-cacheable avoids the need for cache maintenance entirely. Both have lower performance, particularly if your code repeatedly references the data at either the producer or consumer end (and so would benefit from it being cached).

    As Peter says, not all processors support Write-Through. That doesn't mean they ignore the attribute, though: they simply treat it as Non-cacheable.