A quick question:
Is it possible to run two different OSes on different ARMv9 cores (such as the A720) in the same cluster without EL2 (Hypervisor) enabled? If not, what stops this?
Is it possible: yes
Are there several potential challenges: also yes
The challenges come from the lack of isolation, and from the fact that some of the peripherals are shared (when each OS likely assumes they aren't).
Taking the isolation point, if OS A does something wrong (whether through a bug, malice, or something else) there's nothing to stop it interfering with OS B. For example, imagine a bug in OS A caused it to write to random locations. Those locations could be in OS B's kernel memory, and those writes could take down OS B.
For the shared peripherals, my go-to example is the GIC (Generic Interrupt Controller). The GIC drivers I've seen assume that they have full control of the GIC, at least for that Security state. The driver has no concept that there might be another OS/driver doing stuff to the same GIC at the same time. Could you re-work the drivers to be aware of this and operate accordingly? Sure, but that's work that hasn't been done in the drivers I have seen. Plus, again, if the driver in OS A went wrong, it could do stuff to the GIC that breaks OS B - again violating isolation.
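To make the shared-GIC problem concrete, here is a toy sketch in Python. It is not the real GIC register map or any real driver: `ToyInterruptController` and `NaiveDriver` are hypothetical names, and the "clear every enable, then enable my own" start-up sequence is an assumption made purely for illustration of a driver that believes it owns the whole controller.

```python
class ToyInterruptController:
    """Toy stand-in for a shared interrupt controller (not the real GIC layout)."""

    def __init__(self, num_irqs=64):
        self.enabled = [False] * num_irqs


class NaiveDriver:
    """Hypothetical driver that assumes it is the only user of the controller."""

    def __init__(self, name, controller, my_irqs):
        self.name = name
        self.controller = controller
        self.my_irqs = list(my_irqs)

    def init(self):
        # Assumed start-up behaviour: put the whole controller in a known state...
        for irq in range(len(self.controller.enabled)):
            self.controller.enabled[irq] = False
        # ...then enable only the interrupts this OS cares about.
        for irq in self.my_irqs:
            self.controller.enabled[irq] = True

    def lost_irqs(self):
        """Interrupts this driver enabled that are no longer enabled."""
        return [irq for irq in self.my_irqs if not self.controller.enabled[irq]]


gic = ToyInterruptController()
os_a = NaiveDriver("OS A", gic, [32, 33])
os_b = NaiveDriver("OS B", gic, [40])

os_a.init()
os_b.init()  # OS B boots later and wipes OS A's configuration

print(os_a.name, "lost", os_a.lost_irqs())
print(os_b.name, "lost", os_b.lost_irqs())
```

Neither driver is buggy on its own terms in this sketch. The breakage comes only from both of them assuming sole ownership of the same device.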
A hypervisor, potentially quite a thin hypervisor, could fix both the above problems. Stage 2 translation could be used to prevent one OS messing with the private resources of the other OS. Presenting a Virtual GIC to each OS, with the hypervisor handling mapping to the physical GIC, means you can run with a standard driver and maintain isolation.
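As a rough illustration of the isolation idea, here is a toy model in Python of a second translation stage owned by the hypervisor. It only sketches the concept: the real mechanism uses hardware page tables, and the names `ToyStage2` and `Stage2Fault`, the 4 KiB page size, and the addresses are all assumptions chosen for the example.

```python
PAGE_SIZE = 4096  # 4 KiB pages, chosen for illustration


class Stage2Fault(Exception):
    """Raised when a guest touches an address the hypervisor never mapped."""


class ToyStage2:
    """Per-guest table from guest (intermediate) physical pages to real pages."""

    def __init__(self):
        self.tables = {}  # guest id -> {guest page number: physical page number}

    def map_page(self, guest, ipa, pa):
        # Only the hypervisor calls this; guests cannot edit their own table.
        self.tables.setdefault(guest, {})[ipa // PAGE_SIZE] = pa // PAGE_SIZE

    def translate(self, guest, ipa):
        pa_page = self.tables.get(guest, {}).get(ipa // PAGE_SIZE)
        if pa_page is None:
            raise Stage2Fault(f"{guest}: no mapping for address {ipa:#x}")
        return pa_page * PAGE_SIZE + ipa % PAGE_SIZE


s2 = ToyStage2()
# Each OS sees "its" RAM at address 0, backed by different physical pages.
s2.map_page("OS_A", 0x0000, 0x8000_0000)
s2.map_page("OS_B", 0x0000, 0x9000_0000)

print(hex(s2.translate("OS_A", 0x0010)))
print(hex(s2.translate("OS_B", 0x0010)))

try:
    # A stray write from OS A aimed at OS B's physical memory is trapped.
    s2.translate("OS_A", 0x9000_0000)
except Stage2Fault as fault:
    print("fault:", fault)
```

In this sketch a wild access from OS A ends in a fault that the hypervisor can handle, rather than in silent corruption of OS B's memory.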
Thanks a lot, Martin Weidmann. Got your point. For the isolation issue, technically it can be handled by some HW isolation mechanism implemented in the SoC, right?
Besides the isolation and GIC issues, I am wondering whether there is any other restriction that makes this impossible. Is the DSU (L3 cache, snoop, etc.) one of the restrictions?
Thanks.
SoC-level isolation, such as system MPUs, would be tricky. The issue is that once the transaction leaves the processor, how are you going to know which core (and hence which OS) it came from? Also, the shared in-processor caches would be above the system MPU, so unaffected by SoC-level filtering. Not saying you couldn't come up with a solution - but I suspect none would be as simple or well understood as using the existing EL2 controls. Allowing multiple different software stacks to run in isolated boxes is what EL2 is there for - what's the advantage of inventing a different mechanism?
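A toy Python model of those two points, under stated assumptions: the transaction leaving the cluster carries only an address and no core identifier (an assumption for this sketch, not a statement about any particular interconnect), and the shared cache sits between the cores and the system MPU. All class names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    address: int  # assumed: nothing here says which core issued the access


class ToySystemMPU:
    """Filters by address only, because that is all it can see in this model."""

    def __init__(self, allowed_ranges):
        self.allowed_ranges = allowed_ranges

    def allows(self, txn):
        return any(lo <= txn.address < hi for lo, hi in self.allowed_ranges)


class ToySharedCache:
    """Shared cache in front of the MPU; hits never reach the MPU."""

    def __init__(self, mpu):
        self.mpu = mpu
        self.lines = {}
        self.mpu_checks = 0

    def read(self, core, address):
        if address in self.lines:
            return self.lines[address]  # hit: served without any MPU check
        self.mpu_checks += 1
        if not self.mpu.allows(Transaction(address)):  # core id is dropped here
            raise PermissionError(f"blocked access to {address:#x}")
        self.lines[address] = f"data@{address:#x}"
        return self.lines[address]


# OS B's RAM must be allowed, since OS B's own core needs to reach it.
cache = ToySharedCache(ToySystemMPU([(0x9000_0000, 0xA000_0000)]))

cache.read(core=1, address=0x9000_0040)          # OS B's core: miss, MPU checked
leaked = cache.read(core=0, address=0x9000_0040)  # OS A's core: hit, no check
print(leaked, "MPU checks:", cache.mpu_checks)
```

Even on a miss, the MPU in this model would have let OS A's access through, because it has no way to tell the two cores apart.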
The GIC was more of an example of a kind of issue. You could go through the devices that need to be shared (GIC, power control, system timer....) and come up with solutions for each. The question would be - why is that better than using a hypervisor of some form?
Thanks a lot for your crystal clear explanations, Martin Weidmann.