I'm studying the Realm Management Extension (RME), and a question came to mind. The Arm ARM and other documentation (e.g., DEN0126) suggest that, conceptually, the GPC is performed before any memory access (including accesses to the caches). However, since cache lines are tagged with the associated PA, I imagine this tag is used in coherency protocols as part of the snooped address. If so, imagine a hypothetical scenario in which two coherent cores use different GPTs, with mutually exclusive regions marked as Non-secure and the rest of the PA space marked as Root, both cores running in the Normal world. Could one core access the other core's memory by fetching the data over the coherency bus if it were present in the other core's cache (and thus tagged as Non-secure there), despite being marked as Root in its local GPT? Would the line be fetched but the access blocked by the GPC? If not, that would contradict my first observation. What behavior should I expect in future implementations? Can you point me to other documentation that would clear this up for me?
Note that I am perfectly aware that CCA was designed around a single GPT shared across all PEs. However, the spec seems to suggest that this is somewhat implementation dependent (it is CONSTRAINED UNPREDICTABLE, and one of the permitted variants allows it). Also, I imagine we're only likely to find TLB entries with cached GPT information shared across PEs in SMT implementations.
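To make the scenario concrete, here's a minimal, purely illustrative C model of a per-PE GPC under the assumptions above. The flat single-level "GPT" array, the 4KB granule size, and the one-PAS-per-granule encoding are simplifications of the real GPT walk and GPI encodings. It only captures the reading that the check is a function of the PA, the access's PAS and the local GPT, evaluated before the access proceeds, independently of whether the data would come from DRAM or a remote cache:

```c
/* Toy model of a per-PE Granule Protection Check (GPC).
 * Hypothetical and simplified: the GPI encoding, granule size and the
 * flat single-level "GPT" array below are illustrative, not the
 * architected GPT walk. The point is only that the check is a function
 * of (PA, access PAS, local GPT), independent of where the data would
 * be sourced from.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { PAS_NONSECURE, PAS_SECURE, PAS_ROOT, PAS_REALM } pas_t;

/* Granule Protection Information for one granule (simplified). */
typedef struct { pas_t allowed; } gpi_t;

#define GRANULE_SHIFT 12          /* assume 4KB granules */
#define NUM_GRANULES  16          /* tiny toy physical address space */

/* Each PE walks its *own* GPT in this hypothetical setup. */
static bool gpc_permits(const gpi_t *gpt, uint64_t pa, pas_t access_pas)
{
    const gpi_t *g = &gpt[pa >> GRANULE_SHIFT];
    return g->allowed == access_pas;   /* real GPI also has All/No-access */
}

int main(void)
{
    gpi_t gpt_core0[NUM_GRANULES], gpt_core1[NUM_GRANULES];

    /* Mutually exclusive Non-secure windows, rest marked Root. */
    for (int i = 0; i < NUM_GRANULES; i++) {
        gpt_core0[i].allowed = (i < 8)  ? PAS_NONSECURE : PAS_ROOT;
        gpt_core1[i].allowed = (i >= 8) ? PAS_NONSECURE : PAS_ROOT;
    }

    uint64_t pa = 9ull << GRANULE_SHIFT;   /* granule owned by core 1 */

    /* Under the "GPC before any access" reading, core 0's local GPC
     * faults the access before it reaches the caches or the coherency
     * fabric. */
    printf("core0 NS access to PA 0x%llx: %s\n", (unsigned long long)pa,
           gpc_permits(gpt_core0, pa, PAS_NONSECURE) ? "permitted"
                                                     : "GPF (blocked)");
    printf("core1 NS access to PA 0x%llx: %s\n", (unsigned long long)pa,
           gpc_permits(gpt_core1, pa, PAS_NONSECURE) ? "permitted"
                                                     : "GPF (blocked)");
    return 0;
}
```

Under that reading, core 0's Non-secure access to a granule its local GPT marks as Root faults before any snoop could return the line; whether real implementations behave this way is exactly my question.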
It's an interesting question, but a few points to consider...
First (and I know I'm sounding like a broken record), RME expects all PEs to see a coherent set of GPTs. So, if EL3 followed the spec, this situation wouldn't arise.
Second, cache operations by VA are treated as accesses, which hits R_GRGXY quoted above.
Third, taking a step back - I think you're trying to use the wrong mechanism to solve the problem.
Based on your description, I think what you're trying to do is achieve isolation between different PEs running in the same Security state. For example, two PEs both in Non-secure state, but with isolation between them. Right?
If yes, GPTs are designed to allocate resources between Security states, not within a given Security state. For isolation within a Security state you should be looking at S1 or S2 translation.
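For what it's worth, here is a toy sketch of what intra-Security-state isolation looks like with stage 2 translation, heavily simplified: each PE (or VM) is given its own stage-2 mapping onto a disjoint PA range, and any unmapped IPA takes a stage-2 fault. The flat per-PE table and translate helper are illustrative, not the architected multi-level walk or real EL2 programming (which would involve per-VM tables pointed to by VTTBR_EL2):

```c
/* Toy sketch of stage-2 based isolation within one Security state.
 * Hypothetical and simplified: a single-level "stage-2 table" per PE
 * maps IPA pages to PA pages; anything unmapped faults.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define IPA_PAGES  8

typedef struct {
    bool     valid;
    uint64_t pa;      /* output address for this IPA page */
} s2_entry_t;

static bool s2_translate(const s2_entry_t *s2, uint64_t ipa, uint64_t *pa_out)
{
    const s2_entry_t *e = &s2[ipa >> PAGE_SHIFT];
    if (!e->valid)
        return false;                       /* stage-2 translation fault */
    *pa_out = e->pa | (ipa & ((1u << PAGE_SHIFT) - 1));
    return true;
}

int main(void)
{
    s2_entry_t pe0[IPA_PAGES] = {0}, pe1[IPA_PAGES] = {0};

    /* Disjoint PA ranges for the two PEs, both in Non-secure state. */
    for (int i = 0; i < IPA_PAGES; i++) {
        pe0[i] = (s2_entry_t){ true, (uint64_t)i        << PAGE_SHIFT };
        pe1[i] = (s2_entry_t){ true, (uint64_t)(i + 16) << PAGE_SHIFT };
    }

    uint64_t pa;
    uint64_t ipa = 3ull << PAGE_SHIFT;
    if (s2_translate(pe0, ipa, &pa))
        printf("PE0 IPA 0x%llx -> PA 0x%llx\n",
               (unsigned long long)ipa, (unsigned long long)pa);
    if (s2_translate(pe1, ipa, &pa))
        printf("PE1 IPA 0x%llx -> PA 0x%llx (different backing memory)\n",
               (unsigned long long)ipa, (unsigned long long)pa);
    return 0;
}
```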
Martin Weidmann said: cache operations by VA are treated as accesses
Nevertheless, set/way-based CMOs would not have an address to check...
Martin Weidmann said: Based on your description, I think what you're trying to do is achieve isolation between different PEs running in the same Security state. For example, two PEs both in Non-secure state, but with isolation between them. Right? If yes, GPTs are designed to allocate resources between Security states, not within a given Security state. For isolation within a Security state you should be looking at S1 or S2 translation.
You are completely right. We are just exploring the possibility of leveraging RME to further harden the isolation guarantees provided by page tables. I guess we'll need to wait for real silicon to understand whether this is feasible. In your experience, do the FVP models provide a faithful emulation of this type of behavior? To what degree do you believe the behavior we see on the models will represent future real implementations?
So I was thinking about this more overnight.
Another potential problem with this approach is the GIC. If you have RME, then it's very likely the system will have a GICv3 or GICv4 interrupt controller. GICv3/4 uses memory in the Non-secure PAS to store structures for some types of interrupt.
The GIC has to be subject to GPC - just like anything else that can access memory. Which would be achieved by putting it behind an SMMU: Learn the architecture - Realm Management Extension (arm.com)
Which leads to a question: how many GICs are there in the system? One shared GIC (typical)? Or multiple GICs (less common)? If it's a shared GIC, which GPT is the SMMU using to perform the GIC's GPCs?
I'm using the GIC as an example (because GICs are what I know most about), but the issues apply to any other kind of accelerator in the system. To make it work you'd need an SMMU per core, or per group of cores sharing a GPT. Which means at design time you have to know which accelerator to put behind which SMMU. At which point you've practically got 'n' systems, with some amount of shared memory.
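To illustrate the shared-GIC case with the same kind of toy model as before (flat "GPT" arrays, simplified encodings, and a hypothetical placement of per-core interrupt structures, e.g. LPI pending tables, in each core's Non-secure window): whichever single GPT the SMMU in front of the GIC is given, the GIC's accesses to at least one core's structures end up faulting.

```c
/* Toy illustration of the shared-GIC problem: the SMMU protecting the
 * GIC can walk only one GPT, so with the per-core GPTs from the earlier
 * hypothetical, the GIC's Non-secure accesses to one core's interrupt
 * structures are bound to fault whichever GPT the SMMU is given.
 * Encodings and the flat "GPT" arrays are simplified/hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { PAS_NONSECURE, PAS_ROOT } pas_t;

#define GRANULE_SHIFT 12
#define NUM_GRANULES  16

static bool gpc_permits(const pas_t *gpt, uint64_t pa, pas_t access_pas)
{
    return gpt[pa >> GRANULE_SHIFT] == access_pas;
}

int main(void)
{
    pas_t gpt_core0[NUM_GRANULES], gpt_core1[NUM_GRANULES];
    for (int i = 0; i < NUM_GRANULES; i++) {
        gpt_core0[i] = (i < 8)  ? PAS_NONSECURE : PAS_ROOT;
        gpt_core1[i] = (i >= 8) ? PAS_NONSECURE : PAS_ROOT;
    }

    /* Hypothetical Non-secure memory holding per-core interrupt
     * structures, one page in each core's window. */
    uint64_t core0_tbl = 2ull  << GRANULE_SHIFT;
    uint64_t core1_tbl = 10ull << GRANULE_SHIFT;

    /* Suppose the SMMU protecting the GIC is given core 0's GPT. */
    const pas_t *smmu_gpt = gpt_core0;

    printf("GIC write to core0 table: %s\n",
           gpc_permits(smmu_gpt, core0_tbl, PAS_NONSECURE) ? "ok" : "GPF");
    printf("GIC write to core1 table: %s\n",
           gpc_permits(smmu_gpt, core1_tbl, PAS_NONSECURE) ? "ok" : "GPF");
    return 0;
}
```

Which is another way of saying that a device shared by cores with divergent GPTs has no single consistent view of the Non-secure PAS to work in.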
Martin Weidmann said: If it's a shared GIC, which GPT is the SMMU using to perform the GIC's GPCs?
That's a very good point. And as I said before, we'd like to leverage GPC as an extra protection layer on top of traditional virtual memory isolation...
Martin Weidmann said: Which means at design time you have to know which accelerator to put behind which SMMU.
Also, this is actually within our design constraints.
Nevertheless, I'd love to hear your thoughts regarding FVP model fidelity to these kinds of issues.