I'm studying the Realm Management Extension (RME), and a question came to mind. The Arm ARM and other documentation (e.g., DEN0126) suggest that, conceptually, the GPC is performed before any memory access (including accesses that hit in the caches). However, since cache lines are tagged with the associated PA, I imagine that this cache tag is used in coherency protocols as part of the snooped address. If so, imagine a hypothetical scenario where we are using different GPTs in two coherent cores, with mutually exclusive regions marked as Non-secure and the rest of the PA space marked as Root, both cores running in the Normal world. Could one of the cores access the other core's memory by fetching the data via the coherency bus if it were present in the other core's cache (thus tagged as Non-secure) despite being marked as Root in its local GPT? Would the line be fetched but the access blocked by the GPC? If not, this would contradict my first observation. What behavior should I expect in future implementations? Can you point me to other documentation that would clear this up for me?
Note that I am perfectly aware that CCA was designed around a single GPT shared across all PEs. However, the spec seems to leave this somewhat implementation dependent (it is CONSTRAINED UNPREDICTABLE behavior, and one of the permitted variants allows it). Also, I imagine we'll likely only find TLB entries with cached GPT information shared across PEs in SMT implementations.
josecm said: imagine a hypothetical scenario where we are using different GPTs in two coherent cores
As you noted, that's not how the architecture is designed to work. The GPTs control which locations are in which physical address space - that needs to be consistent across all the devices in the system.
josecm said: Could one of the cores access the other core's memory by fetching the data via the coherency bus if it were present in the other core's cache (thus tagged as Non-secure) despite being marked as Root in its local GPT?
No. From the spec:
R_GRGXY If GPCCR_EL3.GPC is 1, enabling granule protection checks, then all accesses are subject to granule protection checks, except for fetches of Granule Protection Table (GPT) information and accesses governed by the GPCCR_EL3.GPCP control.
Meaning that any access (e.g. a load or instruction fetch) would have to be checked against the GPT. If the attempted access doesn't match what's in the GPT for that PA, then you'll get a Granule Protection Fault (GPF). The requested PA being in a cache doesn't remove the need to perform the Granule Protection Check.
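As a mental model only - not how any implementation is built, and not the architectural GPT/GPI encodings - you can picture a check like the one below sitting on every access path. All names and types here are mine:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model only. It just captures the rule: every access carries a
 * (PA, access PAS) pair that is checked, whether or not the line is cached. */
typedef enum { PAS_SECURE, PAS_NON_SECURE, PAS_ROOT, PAS_REALM } pas_t;

typedef struct {
    uint64_t base;          /* granule-aligned PA of the region       */
    uint64_t size;
    pas_t    assigned_pas;  /* PAS the GPT assigns this region to     */
} gpt_entry_t;

/* Stands in for the GPT walk (or a hit on cached GPT information). */
static const gpt_entry_t *gpt_lookup(const gpt_entry_t *gpt, size_t n, uint64_t pa)
{
    for (size_t i = 0; i < n; i++)
        if (pa >= gpt[i].base && pa - gpt[i].base < gpt[i].size)
            return &gpt[i];
    return NULL;
}

/* Applied to every load, store, instruction fetch and CMO by VA when
 * GPCCR_EL3.GPC is 1. A hit in any cache does not skip this step. */
bool gpc_permits(const gpt_entry_t *gpt, size_t n, uint64_t pa, pas_t access_pas)
{
    const gpt_entry_t *e = gpt_lookup(gpt, n, pa);
    if (e == NULL)
        return false;                     /* treated here as a GPF            */
    return e->assigned_pas == access_pas; /* simplified: no "all access" GPI  */
}
```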
What you could potentially cause is corruption through incoherent access. If PE A thinks the location is NS and PE B thinks the location is S, then cache lines with both PAs could exist independently in the system. Which is going to be a problem when those lines get written back past the Point of Physical Aliasing (PoPA).
But again... the intention of the spec is that PEs don't have inconsistent GPTs:
I_BSPQD To avoid CONSTRAINED UNPREDICTABLE behavior, Root firmware must ensure that both:
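In practice, that means Root firmware programs every PE with the same GPT base and the same GPC configuration before turning checks on. A minimal sketch, assuming ordinary EL3 boot code; the function and parameter names are mine, and the GPCCR_EL3 field values are left to the caller (the layout is in the Arm ARM):

```c
#include <stdint.h>

/* Sketch of per-PE GPC setup at EL3. The MSRs use the S-form encodings of
 * GPTBR_EL3 (s3_6_c2_c1_4) and GPCCR_EL3 (s3_6_c2_c1_6); 'tlbi paallos'
 * (invalidate cached GPT information, outer shareable) needs an RME-aware
 * assembler. */
static inline void gpc_init_this_pe(uint64_t gptbr,         /* shared L0 GPT base       */
                                    uint64_t gpccr_cfg,      /* config, GPC bit clear    */
                                    uint64_t gpccr_enabled)  /* same config, GPC bit set */
{
    __asm__ volatile("msr s3_6_c2_c1_4, %0" :: "r"(gptbr));
    __asm__ volatile("msr s3_6_c2_c1_6, %0" :: "r"(gpccr_cfg));
    __asm__ volatile("isb");
    __asm__ volatile("tlbi paallos");   /* drop any stale cached GPT information */
    __asm__ volatile("dsb osh");
    __asm__ volatile("msr s3_6_c2_c1_6, %0" :: "r"(gpccr_enabled));
    __asm__ volatile("isb");
}

/* Every PE runs this during boot with the *same* arguments, so the whole
 * system checks against one GPT - which is the property the rule is after. */
```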
Martin Weidmann, I just read your answer a couple of minutes ago, but when I came back to re-read it, it seems to have been deleted.
Huh... I'll ask the admin.
Thanks for the detailed answer. But it raised another question in my mind:
Martin Weidmann said: What you could potentially cause is corruption through incoherent access. If PE A thinks the location is NS and PE B thinks the location is S, then cache lines with both PAs could exist independently in the system. Which is going to be a problem when those lines get written back past the Point of Physical Aliasing (PoPA).
Even if I guarantee no aliasing of physical addresses across the different GPTs, wouldn't cache invalidation instructions cause a similar issue? In a shared, physically tagged cache, a line belonging to PE A (and thus tagged as Non-secure) could be invalidated by PE B running in the Normal world, even if PE B's GPT marks that address as Root or as no-access. Or even just by using an invalidation by set/way...
It's an interesting question, but a few points to consider...
First (and I know I'm sounding like a broken record), RME expects all the PEs to be seeing a coherent set of GPTs. So... if EL3 followed the spec, this situation wouldn't arise.
Second, cache operations by VA are treated as accesses, which hits R_GRGXY quoted above.
Third, taking a step back - I think you're trying to use the wrong mechanism to solve the problem.
Based on your description, I think what you're trying to do is achieve isolation between different PEs running in the same Security state. For example, two PEs both in Non-secure state, but with isolation between them. Right?
If yes, GPTs are designed to allocate resources between Security states, not within a given Security state. For isolation within a Security state you should be looking at S1 or S2 translation.
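To make that concrete: within Non-secure state, the usual tool for giving each PE (or VM) its own view of memory is a private set of stage 2 tables. A rough illustration only - the helper is mine, and building the tables and enabling stage 2 via HCR_EL2.VM are not shown:

```c
#include <stdint.h>

/* Per-PE / per-VM stage 2 root: a PA that these tables do not map can never
 * be generated by this PE in the first place, whether or not the line is
 * cached somewhere. VTTBR_EL2 holds the table base address in its low bits
 * and the VMID in bits [55:48] (or [63:48] with 16-bit VMIDs). */
static inline void install_stage2(uint64_t s2_table_pa, uint64_t vmid)
{
    uint64_t vttbr = (s2_table_pa & 0x0000FFFFFFFFFFFEULL)  /* BADDR */
                   | (vmid << 48);                          /* VMID  */
    __asm__ volatile("msr vttbr_el2, %0" :: "r"(vttbr));
    __asm__ volatile("isb");
}
```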
Martin Weidmann said: cache operations by VA are treated as accesses
Nevertheless, set/way-based CMOs would not have an address to check...
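For concreteness, the contrast I have in mind is roughly this (GCC/Clang-style inline asm; whether either form is permitted at a given EL depends on trap configuration):

```c
#include <stdint.h>

/* By-VA maintenance: the VA is translated, so there is a PA and (per
 * R_GRGXY) a Granule Protection Check to go with it. */
static inline void clean_inval_by_va(void *va)
{
    __asm__ volatile("dc civac, %0" :: "r"(va) : "memory");
}

/* By set/way: the operand encodes cache level, set and way - no address is
 * translated, so there is no PA for a GPC to be performed against. */
static inline void clean_inval_by_set_way(uint64_t level_set_way)
{
    __asm__ volatile("dc cisw, %0" :: "r"(level_set_way) : "memory");
}
```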
Martin Weidmann said: Based on your description, I think what you're trying to do is achieve isolation between different PEs running in the same Security state. For example, two PEs both in Non-secure state, but with isolation between them. Right? If yes, GPTs are designed to allocate resources between Security states, not within a given Security state. For isolation within a Security state you should be looking at S1 or S2 translation.
You are completely right. We are just exploring the possibility of leveraging RME to further harden isolation guarantees that are otherwise based only on page tables. I guess we'll need to wait for real silicon to understand whether this is feasible. In your experience, do the FVP models provide a faithful emulation of this type of behavior? To what degree should we expect the behavior we see on the models to represent future real implementations?
So I was thinking about this more overnight.
Another potential problem with this approach is the GIC. If you have RME, then it's very likely the system will have a GICv3 or GICv4 interrupt controller. GICv3/4 uses memory in the Non-secure PAS for storing structures for some types of interrupt.
The GIC has to be subject to GPC - just like anything else that can access memory. Which would be achieved by putting it behind an SMMU: Learn the architecture - Realm Management Extension (arm.com)
Which leads to a question: how many GICs are there in the system? One shared GIC (typical), or multiple GICs (less common)? If it's a shared GIC, which GPT is the SMMU using to perform the GIC's GPCs?
I'm using the GIC as an example (because GICs are what I know most about), but the issues apply to any other kind of accelerator in the system. To make it work you'd need an SMMU per core, or per group of cores sharing a GPT. Which means at design time you have to know which accelerator to put behind which SMMU. At which point you've practically got 'n' systems, with some amount of shared memory.
Martin Weidmann said: If it's a shared GIC, which GPT is the SMMU using to perform the GIC's GPCs?
That's a very good point. And as I said before, we'd like to leverage GPC as an extra protection layer on top of traditional virtual memory isolation...
Martin Weidmann said: Which means at design time you have to know which accelerator to put behind which SMMU.
Also, this actually fits within our design constraints.
Nevertheless, I'd love to hear your thoughts regarding FVP model fidelity to these kinds of issues.