
Cache maintenance operations to PoC

Hi experts,

I'm quite confused about cache maintenance operations to PoC on Cortex-A9 (with the PL310 L2 cache controller).

I'm referring to the following operations:

- DCIMVAC, invalidate data cache by MVA to PoC      (mcr  p15, 0, r0, c7, c6, 1)

- DCCMVAC, clean data cache by MVA to PoC           (mcr  p15, 0, r0, c7, c10, 1)

- DCCIMVAC, clean and inv data cache by MVA to PoC  (mcr  p15, 0, r0, c7, c14, 1)

As far as I know, on Cortex-A9, PoC is main external system memory (RAM) and PoU is L2 cache.

So my questions/doubts are:

1) Do these operations really clean/invalidate L2 as well? I'm pretty sure that the PL310 needs to be cleaned/invalidated by separate instructions, so I think the definition "to PoC" is quite misleading.

2) What happens if L2 (PL310) is disabled?

3) On other processors where the L2 cache is "on-core" (for example Cortex-A8 and Cortex-A15), do these operations behave differently?

Could anyone please shed some light on this?

Thanks in advance

Regards

Luke

  • Hi Luke,

    You're right; PoU and PoC are the same place on the Cortex-A9 - the back side of the L1 cache, towards the SCU/L2 interface.

    1) The L2C-310 needs special attention through a memory mapped interface to perform maintenance. This is defined by the L2C-310 - not the Cortex-A9.

    2) If the L2C-310 is disabled, it does not allocate into its cache, and forwards all transfers to L3. As long as all its lines have been invalidated, every access will go to L3 (although data may still end up in the store buffer at the L2C-310).

    3) They do indeed, and you're going to have to look at the TRMs to find out why (mostly in the possibility of having "outer" or "inner" cacheability policies). See below.

    PoU and PoC are cache-related terms but they're to do with observers from the point of view of the ARM Architecture, and nothing to do with the system coherency. Some call any cache level that you can discover or use via system registers (as you gave, or listed in the CLIDR or usable in the CSSELR) "architectural caches." It's those "architectural caches" that these points and levels describe and deal with. This is all defined in B2.2.6 of the ARMv7-A Architecture Reference Manual.
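To make the CLIDR point concrete, here is a small C sketch of how the register's fields could be decoded. The field positions (Ctype1..Ctype7 in bits [20:0], LoUIS in [23:21], LoC in [26:24], LoUU in [29:27]) follow the ARMv7-A ARM; the struct and function names are made up for this example. On a real core the value would be read with `mrc p15, 1, <Rt>, c0, c0, 1` in a privileged mode, which is left out here so the decoder can be tried on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the ARMv7-A CLIDR (names are illustrative). */
struct clidr_fields {
    unsigned ctype[7]; /* cache type per level, 0 = no cache at this level */
    unsigned louis;    /* Level of Unification, Inner Shareable */
    unsigned loc;      /* Level of Coherency */
    unsigned louu;     /* Level of Unification, Uniprocessor */
};

/* Split a raw CLIDR value into its fields. */
struct clidr_fields clidr_decode(uint32_t clidr)
{
    struct clidr_fields f;
    for (int i = 0; i < 7; i++) {
        f.ctype[i] = (clidr >> (3 * i)) & 0x7u;
    }
    f.louis = (clidr >> 21) & 0x7u;
    f.loc   = (clidr >> 24) & 0x7u;
    f.louu  = (clidr >> 27) & 0x7u;
    return f;
}

/* Count the cache levels the core can see: levels are reported
 * contiguously, so stop at the first level with Ctype == 0. */
unsigned clidr_visible_levels(const struct clidr_fields *f)
{
    assert(f != NULL);
    unsigned n = 0;
    while (n < 7 && f->ctype[n] != 0) {
        n++;
    }
    return n;
}
```

For instance, decoding the sample value 0x09200003 (chosen here for illustration; read the real value on your own core) gives one visible level with separate instruction and data caches and LoC = LoUU = 1. That is the kind of result you would expect when an outer cache such as the L2C-310 is not an "architectural cache": it simply does not appear in the register.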

    As a point on the difference in behaviour: on a Cortex-A15, for example, L1 and L2 are in the inner domain, but you can have an L3 in the interconnect which, by way of the interconnect being coherent, is *ALSO* in the inner domain. The processor can't know about the L3 cache - it's not part of the "cluster," but part of the interconnect/fabric - and therefore it can't be referenced in the CLIDR register contents. There is a way - a requirement - to configure the Cortex-A15 at design time to follow up the PoC operations with an extra cache maintenance broadcast which will get that data out of L3 and towards the actual system memory.

    With Cortex-A9 and L2C-310 you have to follow up the cache maintenance operation with the memory-mapped write yourself. All the Cortex-A15 has done is make that an automated procedure.
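As a rough illustration of that two-step sequence, here is a minimal C sketch of a "clean a range all the way to memory" helper. It is a sketch under assumptions, not a definitive driver: the offsets 0x7B0 (Clean Line by PA) and 0x730 (Cache Sync) are taken from the L2C-310 TRM, but the function name, the `BARE_METAL_CORTEX_A9` build macro, the 32-byte line size and the assumption of a physically contiguous buffer are all choices made for this example. The controller base is passed in as a pointer so the logic can also be exercised against a plain array on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* L2C-310 register offsets, in bytes from the controller base. */
#define L2C310_CACHE_SYNC     0x730u
#define L2C310_CLEAN_LINE_PA  0x7B0u

#define CACHE_LINE_BYTES 32u /* assumed line size for L1 and L2C-310 */

/* DCCMVAC: clean one L1 data cache line by MVA to PoC.
 * Only emitted when building for the target (hypothetical macro). */
static inline void l1_clean_line(uintptr_t mva)
{
#if defined(BARE_METAL_CORTEX_A9)
    __asm__ volatile("mcr p15, 0, %0, c7, c10, 1" : : "r"(mva) : "memory");
#else
    (void)mva; /* host build: nothing to do */
#endif
}

static inline void data_sync_barrier(void)
{
#if defined(BARE_METAL_CORTEX_A9)
    __asm__ volatile("dsb" : : : "memory");
#endif
}

/* Clean [va, va+len) out of L1 with CP15 operations, then clean the
 * matching physical lines out of the L2C-310 through its memory-mapped
 * registers. Returns the number of cache lines handled. */
size_t clean_range_to_memory(volatile uint32_t *l2_base,
                             uintptr_t va, uintptr_t pa, size_t len)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_BYTES - 1u;

    assert(l2_base != NULL);
    assert((va & mask) == (pa & mask)); /* same offset within a line */
    if (len == 0) {
        return 0;
    }

    uintptr_t va_line = va & ~mask;
    uintptr_t pa_line = pa & ~mask;
    size_t span  = len + (size_t)(va - va_line);
    size_t lines = (span + CACHE_LINE_BYTES - 1u) / CACHE_LINE_BYTES;

    /* Step 1: the architectural operation, which only reaches L1 here. */
    for (size_t i = 0; i < lines; i++) {
        l1_clean_line(va_line + i * CACHE_LINE_BYTES);
    }
    data_sync_barrier();

    /* Step 2: the follow-up memory-mapped writes to the L2C-310. */
    for (size_t i = 0; i < lines; i++) {
        l2_base[L2C310_CLEAN_LINE_PA / 4u] =
            (uint32_t)(pa_line + i * CACHE_LINE_BYTES);
    }

    /* Drain the controller's buffers; bit 0 reads as 1 while busy. */
    l2_base[L2C310_CACHE_SYNC / 4u] = 0u;
    while (l2_base[L2C310_CACHE_SYNC / 4u] & 1u) {
        /* wait for the sync to complete */
    }
    data_sync_barrier();
    return lines;
}
```

The ordering is the point of the sketch: L1 is cleaned first so that dirty data lands in the L2C-310, and only then is the L2C-310 told to push the same physical lines out. An invalidate would normally run in the opposite order (outer cache first), and older controller revisions may need polling after each line operation, so check the TRM for your revision.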

    Does that explain it?

    Thanks,

    Matt

  • Thank you Matt.

    It's a bit clearer now.

    Regards

    Luke

  • Just for an extra bit of clarification, we had a little discussion here in the office about the meaning of this paragraph in the ARMv7-A ARM:

    • For MVA operations, two conceptual points are defined:
      • Point of coherency (PoC)
        • For a particular MVA, the PoC is the point at which all agents that can access memory are guaranteed to see the same copy of a memory location. In many cases, this is effectively the main system memory, although the architecture does not prohibit the implementation of caches beyond the PoC that have no effect on the coherence between memory system agents.

    (Our emphasis added)

    This might be considered a little unclear - it could either mean that:

    • Those caches have no 'effect' on the coherency of the system (i.e. it is maintained) in that they are handled by a combination of extra coherency logic and system barriers which will handle whether it goes to main memory (Cortex-A15 with a CCN-504 for example)
    • It can be cached in lieu of main memory with no ill effects such as an "L4" cache built in to a memory controller, such that all accesses to main memory will be filled by the cache with no ill effects (transparent caches past any coherency-maintaining logic are implicitly coherent)
    • They have no part in the coherency of the system, in that they sit outside it and can interfere with coherency unless they are maintained separately (this is essentially anything with an L2C-310).

    We've decided that it technically means all of the above, but it is most likely intended to cover the final L2C-310-style case, where an external cache may need extra work to maintain.

    Thanks,

    Matt

  • Hi Matt,

    thank you for the clarification.

    Regards

    Luke