Cache maintenance operations to PoC

Hi experts,

I'm quite confused about cache maintenance operations to PoC on a Cortex-A9 (with a PL310 L2 cache controller).

I'm referring to the following operations:

- DCIMVAC, invalidate data cache by MVA to PoC            (mcr  p15, 0, r0, c7, c6, 1)

- DCCMVAC, clean data cache by MVA to PoC                 (mcr  p15, 0, r0, c7, c10, 1)

- DCCIMVAC, clean and invalidate data cache by MVA to PoC (mcr  p15, 0, r0, c7, c14, 1)
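
For context, here is a minimal sketch of how these three operations are commonly issued over an address range from C. It assumes 32-byte L1 data cache lines (as on Cortex-A9) and privileged execution on the target; the helper names (`dcimvac`, `dccmvac`, `dccimvac`, `dcache_op_range`) are hypothetical and only illustrate the loop. Off-target the operations are replaced by stubs so the sketch still compiles.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte L1 data cache lines, as on Cortex-A9. */
#define L1_LINE_SIZE 32u

typedef void (*dcache_line_op)(uintptr_t mva);

#if defined(__arm__)
/* Target build: the CP15 encodings listed above (privileged mode only). */
static inline void dcimvac(uintptr_t mva)
{ __asm__ volatile("mcr p15, 0, %0, c7, c6, 1" : : "r"(mva) : "memory"); }
static inline void dccmvac(uintptr_t mva)
{ __asm__ volatile("mcr p15, 0, %0, c7, c10, 1" : : "r"(mva) : "memory"); }
static inline void dccimvac(uintptr_t mva)
{ __asm__ volatile("mcr p15, 0, %0, c7, c14, 1" : : "r"(mva) : "memory"); }
static inline void op_barrier(void)
{ __asm__ volatile("dsb" : : : "memory"); }
#else
/* Host build: stubs that only record the last address they were given. */
static uintptr_t last_mva;
static inline void dcimvac(uintptr_t mva)  { last_mva = mva; }
static inline void dccmvac(uintptr_t mva)  { last_mva = mva; }
static inline void dccimvac(uintptr_t mva) { last_mva = mva; }
static inline void op_barrier(void) { }
#endif

/* Apply `op` to every cache line overlapping [start, start + len).
 * Returns the number of lines touched. Does not handle a range that
 * wraps around the top of the address space. */
static inline size_t dcache_op_range(uintptr_t start, size_t len,
                                     dcache_line_op op)
{
    if (len == 0)
        return 0;

    uintptr_t addr = start & ~(uintptr_t)(L1_LINE_SIZE - 1u);
    uintptr_t end = start + len;
    size_t lines = 0;

    for (; addr < end; addr += L1_LINE_SIZE) {
        op(addr);
        lines++;
    }
    op_barrier(); /* wait for the maintenance operations to complete */
    return lines;
}
```

For example, `dcache_op_range(buf, size, dccmvac)` would clean every line covering a buffer. Whether that also reaches an external L2 is exactly the question below.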

As far as I know, on Cortex-A9, PoC is main external system memory (RAM) and PoU is L2 cache.

So my questions/doubts are:

1) Do these operations really clean/invalidate L2 as well? I'm pretty sure that the PL310 needs to be cleaned/invalidated by separate instructions, so I think the definition "to PoC" is quite misleading.

2) What happens if L2 (PL310) is disabled?

3) On other processors where L2 cache is "on-core" (for example Cortex-A8 and Cortex-A9) do these operations have different behavior?

Could anyone please shed some light on this?

Thanks in advance

Regards

Luke

  • Hi Luke,

    You're right; PoU and PoC are the same place on the Cortex-A9 - the back side of the L1 cache, towards the SCU L2 interface.

    1) The L2C-310 needs special attention through a memory mapped interface to perform maintenance. This is defined by the L2C-310 - not the Cortex-A9.

    2) If the L2C-310 is disabled, it does not allocate into its cache and forwards all transfers to L3. As long as all its lines have been invalidated, accesses will always go to L3 (although data may still end up in the store buffer at the L2C-310).

    3) They do indeed, and you'll have to look at the TRMs to find out why (mostly because of the possibility of having "outer" or "inner" cacheability policies). See below.

    PoU and PoC are cache-related terms but they're to do with observers from the point of view of the ARM Architecture, and nothing to do with the system coherency. Some call any cache level that you can discover or use via system registers (as you gave, or listed in the CLIDR or usable in the CSSELR) "architectural caches." It's those "architectural caches" that these points and levels describe and deal with. This is all defined in B2.2.6 of the ARMv7-A Architecture Reference Manual.

    As a point on the difference in behaviour: on a Cortex-A15, for example, L1 and L2 are in the inner domain, but you can have an L3 in the interconnect which, by way of the interconnect being coherent, is *ALSO* in the inner domain. The processor can't know about the L3 cache - it's not part of the "cluster," but part of the interconnect/fabric - and therefore it can't be referenced in the CLIDR register contents. There is a way - a requirement - to configure the Cortex-A15 at design time to follow up the PoC operations with an extra cache maintenance broadcast which will get that data out of L3 and towards the actual system memory.

    With Cortex-A9 and L2C-310 you have to follow up the cache maintenance operation with the memory mapped write. All the Cortex-A15 has done is made it an automated procedure.
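
The two-step procedure for a clean can be sketched roughly as follows. This is only an illustration under stated assumptions, not a reference implementation: the L2C-310 base address is SoC-specific and is passed in by the caller, the region is assumed to be identity mapped (virtual address equals physical address, since the CP15 operation takes an MVA while the L2C-310 takes a physical address), and the function and macro names are hypothetical. The register offsets used are the L2C-310 Clean Line by PA (0x7B0) and Cache Sync (0x730) registers. Real code may also need to poll for completion and take a lock around the controller accesses.

```c
#include <stddef.h>
#include <stdint.h>

/* L2C-310 register offsets, in bytes from the controller base. */
#define L2C_CACHE_SYNC     0x730u
#define L2C_CLEAN_LINE_PA  0x7B0u

/* Both the Cortex-A9 L1 and the L2C-310 use 32-byte lines. */
#define CACHE_LINE_SIZE    32u

#if defined(__arm__)
static inline void l1_clean_line(uintptr_t mva)   /* DCCMVAC */
{ __asm__ volatile("mcr p15, 0, %0, c7, c10, 1" : : "r"(mva) : "memory"); }
static inline void barrier_dsb(void)
{ __asm__ volatile("dsb" : : : "memory"); }
#else
/* Host build: no-op stubs so the sketch compiles off-target. */
static inline void l1_clean_line(uintptr_t mva) { (void)mva; }
static inline void barrier_dsb(void) { }
#endif

static inline void l2c_write(volatile uint32_t *base, uint32_t off,
                             uint32_t val)
{
    base[off / 4u] = val;
}

/* Clean [start, start + len) out of L1 and then out of the L2C-310.
 * Assumption: VA == PA for this region. Returns the number of lines. */
static inline size_t clean_range_to_memory(volatile uint32_t *l2c_base,
                                           uintptr_t start, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t first = start & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
    uintptr_t end = start + len;
    size_t lines = 0;

    /* Step 1: inner cache, via the CP15 operation "to PoC". */
    for (uintptr_t a = first; a < end; a += CACHE_LINE_SIZE)
        l1_clean_line(a);
    barrier_dsb();

    /* Step 2: outer cache, via the memory-mapped interface. */
    for (uintptr_t a = first; a < end; a += CACHE_LINE_SIZE) {
        l2c_write(l2c_base, L2C_CLEAN_LINE_PA, (uint32_t)a);
        lines++;
    }

    /* Step 3: Cache Sync drains the L2C-310 buffers towards memory. */
    l2c_write(l2c_base, L2C_CACHE_SYNC, 0u);
    barrier_dsb();
    return lines;
}
```

The order matters: a clean goes inner first and then outer, so that dirty L1 data has reached L2 before L2 is cleaned. For an invalidate the usual order is the reverse (outer first, then inner).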

    Does that explain it?

    Thanks,

    Matt
