Hi,
I have a system with multiple quad-core Cortex-A53 clusters and the CCN-512. L1 and L2 are integrated caches, whereas L3 is an outer cache in the 8x HN-F of the CCN-512.
My question is how I should interpret the shareability domains (inner, outer) that are controlled through the TCR register and the page descriptor. Is it the case that Inner Shareable means within the cluster, and that coherency is maintained only within the cluster for a memory region marked Cacheable WB-WA for Inner and Outer?
Or is my understanding wrong in the context of the "Snoop and Maintenance Requests" chapter of the "ARM Cortex-A53 MPCore Processor TRM"? It says there that asserting BROADCASTINNER requires BROADCASTOUTER to be asserted as well, which would suggest that setting inner shareability makes snoop and maintenance requests broadcast to the observers in both the Inner and Outer domains.
Please help!
Thanks,
Marek
2313.Shareability-Versus-Coherency.pptx
I have included a little bit of my analysis of it but I'm still confused...
Folks, if you have any "unconfirmed theories", I'd appreciate it if you shared them with me.
1) The shareability domains are SoC specific and not core specific (such as A53/A57 etc.).
2) Based on your comments, it appears that the L1/L2 of a cluster are in the inner shareable domain, and L3 is in the outer shareable domain.
3) Now, there is inner/outer cacheable and inner/outer shareable. Your statement that L3 is an outer cache is a little confusing. It is possible that L1/L2 are inner caches and L3 is the outer cache, but they are all in the same inner shareable coherency domain. Can you provide more details on this?
4) The TCR register controls the cacheability and shareability attributes of the translation tables themselves. The attributes in the TCR register are used when memory accesses to the translation tables are sent out to the caches/bus. The attributes inside a translation table entry control the cacheability and shareability attributes with which accesses to the physical address corresponding to that entry are made.
5) Per the ARMv8 ARM, the architecture assumes that a given inner shareable domain is controlled by a single OS/hypervisor. So if all cores of your SoC are running the same OS, this should be a reasonable indicator that L1/L2 and L3 on your SoC are all part of the inner shareable domain, in which case you don't need to use the Outer Shareable attribute anywhere.
6) Per the "Snoop and Maintenance Requests" chapter of the "ARM Cortex-A53 MPCore Processor TRM", it appears that when they say "broadcast externally", they mean broadcast externally with respect to the A53 cluster. If there are multiple A53 cores in a cluster and the BROADCASTINNER pin is set to 1, it means that the A53 SCU will broadcast any actions that need to be broadcast (TLBIs, barriers etc.) externally, on the bus to which the A53 is connected. I don't think that particular statement is talking about inner or outer shareability.
Finally, assuming L1/L2 are inner shareable and L3 is outer shareable, each cluster is an inner shareable domain, and all your clusters belong to the same outer shareable domain. If there is a broadcast instruction/operation (such as a TLBI or memory barrier) with the inner shareable attribute, coherency is enforced only in the inner shareable domain (the L1/L2 caches of the cluster). However, if there is a broadcast operation with the outer shareable attribute, coherency is enforced in both the inner shareable and outer shareable domains (L1/L2 and L3). In this example, a DMB ISH will affect only the L1/L2 caches of the cluster in which the DMB is executed. A DMB OSH will affect the L1/L2 of the cluster where the DMB is executed, the L3s, and the L1/L2s of the other clusters.
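To make point 4 a bit more concrete, here is a rough sketch of how the table-walk attributes for TTBR0 are packed into TCR_EL1. The field positions (IRGN0 at bits [9:8], ORGN0 at [11:10], SH0 at [13:12]) and the encodings are from the ARMv8-A definition of TCR_EL1; the helper names are made up for illustration, and it is written in Python only so the bit arithmetic is easy to check on a host machine.

```python
# Sketch only: pack/unpack the TTBR0 table-walk attributes of TCR_EL1.
# Helper names are hypothetical; field positions follow the ARMv8-A ARM.

# Shareability encodings (0b01 is reserved)
SH_NON, SH_OUTER, SH_INNER = 0b00, 0b10, 0b11

# Cacheability encodings for IRGN0 / ORGN0
RGN_NC = 0b00        # Non-cacheable
RGN_WB_WA = 0b01     # Write-Back, Read-Allocate, Write-Allocate
RGN_WT = 0b10        # Write-Through, Read-Allocate, no Write-Allocate
RGN_WB_NO_WA = 0b11  # Write-Back, Read-Allocate, no Write-Allocate


def tcr_walk_attrs(sh0, orgn0, irgn0):
    """Return the TCR_EL1 bits that describe TTBR0 translation table walks."""
    return (sh0 << 12) | (orgn0 << 10) | (irgn0 << 8)


def decode_walk_attrs(tcr):
    """Extract (SH0, ORGN0, IRGN0) from a TCR_EL1 value."""
    return (tcr >> 12) & 0b11, (tcr >> 10) & 0b11, (tcr >> 8) & 0b11


# Table walks: Inner Shareable, Write-Back Write-Allocate for inner and outer
tcr = tcr_walk_attrs(SH_INNER, RGN_WB_WA, RGN_WB_WA)
print(hex(tcr))  # 0x3500
```

These bits only describe how the MMU fetches the tables. The attributes of the data accesses themselves come from the SH field and the MAIR index in each descriptor.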
Raghu, thank you very much for the detailed explanation. My comments are:
1) Yes, I know that shareability is SoC specific, not processor specific. I also have a Cortex-A57 with the CCN-504, which also has an L3, but I wanted to be processor specific in the question.
3) I cannot provide more details. This is all I have. I have a system with 8x quad Cortex-A53 clusters and 4x Ceva clusters, with the CCN-512 as the coherent interconnect. The 8x HN-Fs (fully coherent Home Nodes) together host 24M of L3 cache, 3M each. As per the CCN-512 documentation, each HN-F manages the PoC and PoS and tracks HN-F caching in the snoop filter. The snoop filter tag RAM is 4M.
5) I read that too, and it confused me even more, as it would suggest that if all cores run the same OS then L1 through L3 would be in the same inner shareability domain. But on top of that I read the following in the “ARM® Architecture Reference Manual ARMv8, for ARMv8-A architecture profile”:
Example B2-1 Use of shareability attributes
In an implementation, a particular subsystem with two clusters of PEs has the requirement that:
• In each cluster, the data caches or unified caches of the PEs in the cluster are transparent for all data accesses to memory locations with the Inner Shareable attribute.
• However, between the two clusters, the caches:
  - Are not required to be coherent for data accesses that have only the Inner Shareable attribute.
  - Are coherent for data accesses that have the Outer Shareable attribute.
In this system, each cluster is in a different shareability domain for the Inner Shareable attribute, but all components of the subsystem are in the same shareability domain for the Outer Shareable attribute.
A system might implement two such subsystems. If the data caches or unified caches of one subsystem are not transparent to the accesses from the other subsystem, this system has two Outer Shareable shareability domains.
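One way to read the quoted example is as a small membership model: every PE belongs to one Inner Shareable domain and one Outer Shareable domain, and hardware coherency for a location depends on which domain the two observers share. The sketch below is only a toy model of that reading; the class, function, and domain names are made up for illustration and say nothing about how any particular SoC is actually wired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PE:
    name: str
    inner_domain: str  # Inner Shareable domain this PE belongs to
    outer_domain: str  # Outer Shareable domain this PE belongs to


def hw_coherent(a, b, attr):
    """True if the caches of a and b are required to be coherent for a
    Normal cacheable location with shareability attribute `attr`."""
    if attr not in ("NSH", "ISH", "OSH"):
        raise ValueError("unknown shareability attribute: " + attr)
    if a == b:
        return True
    if attr == "ISH":
        return a.inner_domain == b.inner_domain
    if attr == "OSH":
        return a.outer_domain == b.outer_domain
    return False  # Non-shareable: no coherency required between PEs


# Example B2-1: two clusters, one inner domain each, one shared outer domain
pe0 = PE("cluster0.cpu0", "inner0", "outer0")
pe1 = PE("cluster0.cpu1", "inner0", "outer0")
pe4 = PE("cluster1.cpu0", "inner1", "outer0")

print(hw_coherent(pe0, pe1, "ISH"))  # True: same cluster
print(hw_coherent(pe0, pe4, "ISH"))  # False: different inner domains
print(hw_coherent(pe0, pe4, "OSH"))  # True: same outer domain
```

If all clusters were instead placed in a single Inner Shareable domain, every PE would get the same `inner_domain` value and the second call would return True as well.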
That would suggest that in the design above Cluster#0 runs one OS whereas Cluster#1 runs another. Is that right?
6) This is also my understanding, and I would infer from it that, if in any doubt, I'm better off setting outer shareability, as then coherency will be maintained for both the inner (L1/L2) and outer (L3) shareable domains.
For 5) Yes. In the system described in Example B2-1, the different clusters would typically, but not necessarily, be running different OSes. The document only refers to the expected use case; the architecture does not explicitly forbid running multiple OSes in the same inner shareable domain.
For 6) Even if you mark your memory as outer shareable, there is no guarantee that coherency will be maintained. The SoC hardware may not even have or recognize an outer shareable domain, or may simply treat OSH the same as ISH. It is also possible that the SoC is designed in such a way that coherency between the clusters is expected to be maintained manually. You may want to mark your memory as outer shareable and experiment to see whether coherency is maintained, but the safest approach to designing software for the SoC is to know exactly how the SoC is designed, to avoid ending up with hard-to-debug coherency issues. In your case, as stated earlier, it is possible that L1/L2 are inner caches and the L3s are outer caches but all of them are part of the inner shareable domain, in which case you can just mark all your memory as inner shareable and not have to worry about coherency between clusters.
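If you do want to try the experiment, the change in the page tables is small: only the SH field, bits [9:8] of a stage 1 block or page descriptor, is different. The sketch below shows the bit manipulation on a made-up descriptor value; the helper names and the example address are assumptions for illustration, and it is in Python only so the arithmetic can be checked on a host.

```python
# Sketch only: rewrite the SH field (bits [9:8]) of an AArch64 stage 1
# block/page descriptor. Helper names and the example value are hypothetical.

SH_SHIFT = 8
SH_MASK = 0b11 << SH_SHIFT
SH_NON, SH_OUTER, SH_INNER = 0b00, 0b10, 0b11  # 0b01 is reserved


def shareability(desc):
    """Return the SH field of a descriptor."""
    return (desc >> SH_SHIFT) & 0b11


def with_shareability(desc, sh):
    """Return a copy of the descriptor with its SH field replaced."""
    if sh not in (SH_NON, SH_OUTER, SH_INNER):
        raise ValueError("reserved SH encoding")
    return (desc & ~SH_MASK) | (sh << SH_SHIFT)


# Made-up level 3 page descriptor: output address 0x8000_0000, AF set,
# AttrIndx = 0, Inner Shareable, valid page (bits [1:0] = 0b11)
desc = 0x8000_0000 | (1 << 10) | (SH_INNER << SH_SHIFT) | 0b11

osh_desc = with_shareability(desc, SH_OUTER)
print(hex(desc), hex(osh_desc))  # 0x80000703 0x80000603
```

On real hardware the old translation also has to be invalidated from the TLBs before the new attribute takes effect, and whether the interconnect then treats OSH any differently from ISH is exactly the SoC-specific part discussed above.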
Great. Thank you, Raghu. This is invaluable information. I have asked the HW design team to give me these details.