I'm studying the Realm Management Extension (RME), and a question came to mind. The Arm ARM and other documentation (e.g., DEN0126) suggest that, conceptually, the GPC is performed before any memory access, including accesses that hit in the caches. However, since cache lines are tagged with the associated PA, I imagine that this cache tag is used in the coherency protocol as part of the snooped address.

If so, imagine a hypothetical scenario where two coherent cores use different GPTs, with mutually exclusive PA regions marked as Non-secure and the rest of the PA space marked as Root, and both cores running in the Normal world. Could one core access the other core's memory by fetching the data over the coherency bus if it were present in the other core's cache (and thus tagged as Non-secure there), despite being marked as Root in the first core's local GPT? Or would the line be fetched but then blocked by the GPC? If it is not blocked, that would contradict my first observation. What behavior should I expect in future implementations? Can you point me to other documentation that would clear this up for me?
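To make the scenario concrete, here is a toy model of what I mean (illustrative only: the GPI values follow the RME encodings, but the windows, the granularity, and the check itself are simplified, and nothing here is a real GPT walk):

#include <stdbool.h>
#include <stdio.h>

/* Two PA windows, mutually exclusive between the cores. */
#define GPI_NS   0x9  /* Non-secure PAS */
#define GPI_ROOT 0xA  /* Root PAS */
#define WINDOW_A 0    /* window owned by core 0 */
#define WINDOW_B 1    /* window owned by core 1 */

/* gpt[core][window] = GPI assigned to that window in that core's GPT */
static const int gpt[2][2] = {
    [0] = { [WINDOW_A] = GPI_NS,   [WINDOW_B] = GPI_ROOT },
    [1] = { [WINDOW_A] = GPI_ROOT, [WINDOW_B] = GPI_NS   },
};

/* Architecturally the GPC runs before the access reaches the memory
 * system, so a Normal-world access should only proceed if the local
 * GPT marks the window Non-secure. */
static bool gpc_allows_ns_access(int core, int window)
{
    return gpt[core][window] == GPI_NS;
}

int main(void)
{
    /* Core 0 touching window B: its local GPC should fault (GPF),
     * regardless of whether core 1 holds the line in its cache
     * tagged as Non-secure - that is the crux of the question. */
    printf("core 0 -> window B allowed: %d\n", gpc_allows_ns_access(0, WINDOW_B));
    printf("core 1 -> window B allowed: %d\n", gpc_allows_ns_access(1, WINDOW_B));
    return 0;
}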
Note that I am perfectly aware that CCA was designed for a single GPT shared across all PEs. However, the spec seems to suggest that this is somewhat implementation dependent: mixing GPTs is CONSTRAINED UNPREDICTABLE behavior, and one of the permitted variants allows it. Also, I imagine we're only likely to find TLB entries with cached GPT information shared across PEs in SMT implementations.
So I was thinking about this more overnight.
Another potential problem with this approach is the GIC. If you have RME, then it's very likely the system will have a GICv3 or GICv4 interrupt controller. GICv3/4 uses memory in the Non-secure PAS to store structures for some interrupt types, for example the LPI configuration and pending tables and the ITS tables.
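For concreteness, here is a minimal sketch of why the GIC itself masters memory accesses (the register offsets are the architected GICv3 ones, but the base address is a made-up platform value and the field encodings are heavily simplified): software hands the redistributor the physical addresses of the LPI tables, which the GIC then reads and writes on its own.

#include <stdint.h>

/* GICR_PROPBASER / GICR_PENDBASER live at offsets 0x0070 / 0x0078 of
 * the redistributor's RD_base frame; the base itself is assumed. */
#define GICR_BASE       0x2f100000UL  /* platform-specific, assumed */
#define GICR_PROPBASER  (*(volatile uint64_t *)(GICR_BASE + 0x0070))
#define GICR_PENDBASER  (*(volatile uint64_t *)(GICR_BASE + 0x0078))

void lpi_tables_init(uint64_t prop_pa, uint64_t pend_pa)
{
    /* Simplified: just the table PA plus an IDbits value; the real
     * registers also carry cacheability/shareability attributes.
     * Once these are programmed, the GIC fetches LPI configuration
     * from prop_pa and updates pending state at pend_pa entirely on
     * its own - i.e. the GIC is a bus master into Non-secure memory. */
    GICR_PROPBASER = (prop_pa & ~0xfffULL) | 0x0f;  /* IDbits = 16 - 1 */
    GICR_PENDBASER = (pend_pa & ~0xfffULL);
}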
The GIC has to be subject to GPC - just like anything else that can access memory. That would be achieved by putting it behind an SMMU; see: Learn the architecture - Realm Management Extension (arm.com)
Which leads to a question: how many GICs are there in the system? One shared GIC (typical)? Or multiple GICs (less common)? If it's a shared GIC, which GPT is the SMMU using to perform the GIC's GPCs?
I'm using the GIC as an example (because GICs are what I know most about), but the same issues apply to any other kind of accelerator in the system. To make it work you'd need an SMMU per core, or per group of cores sharing a GPT. Which means that at design time you have to know which accelerator to put behind which SMMU. At that point you've practically got 'n' systems with some amount of shared memory - roughly as sketched below.
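As a rough sketch of what that partitioning would look like (the register name SMMU_ROOT_GPT_BASE comes from the SMMUv3+RME spec, but the offset and base addresses here are placeholders I made up):

#include <stdint.h>

#define SMMU0_ROOT_BASE   0x40000000UL  /* assumed */
#define SMMU1_ROOT_BASE   0x50000000UL  /* assumed */
#define ROOT_GPT_BASE_OFF 0x28          /* placeholder offset */

static void smmu_set_gpt(uintptr_t root_base, uint64_t gpt_pa)
{
    /* Program this SMMU instance's GPT base; every GPC performed on
     * behalf of devices behind this SMMU then walks that GPT. */
    *(volatile uint64_t *)(root_base + ROOT_GPT_BASE_OFF) = gpt_pa;
}

void partition(uint64_t gpt0_pa, uint64_t gpt1_pa)
{
    /* One SMMU (and one GPT) per group of cores and accelerators.
     * A shared device like the GIC can only sit behind one of them,
     * which is exactly the problem raised above. */
    smmu_set_gpt(SMMU0_ROOT_BASE, gpt0_pa);
    smmu_set_gpt(SMMU1_ROOT_BASE, gpt1_pa);
}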
Martin Weidmann said:
If it's a shared GIC, which GPT is the SMMU using to perform the GIC's GPCs?
That's a very good point. And as I said before, we'd like to leverage GPC as an extra protection layer on top of traditional virtual memory isolation...
Martin Weidmann said:
Which means at design time you have to know which accelerator to put behind which SMMU.
Also, that requirement actually fits within our constraints.
Nevertheless, I'd love to hear your thoughts on how faithfully the FVP models these kinds of issues.