Route EL1 synchronous exceptions to Hypervisor at EL2

Hi Everyone,

We're developing a hypervisor for the ARM Cortex-A53 that has an embedded health monitor.

Right now, we are attempting to route EL1 synchronous exceptions, such as stage 1 MMU translation faults, to EL2.

The reason is that we want to inform the health monitor about the degraded state of an executing guest OS.

However, looking into the documentation, it does not seem that this behavior is supported through the configuration of HCR_EL2.

For the sake of clarity, the exception I'm currently getting at EL1 reports the following ESR_EL1: 0x96000005
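For anyone reading along, here is a minimal sketch of how that syndrome value breaks down, assuming the standard AArch64 ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and for Data Aborts DFSC in ISS[5:0] and WnR in ISS[6]). The helper names are made up for illustration and do not come from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Exception classes relevant here (ESR_ELx.EC). */
#define EC_DABT_LOWER_EL 0x24u /* Data Abort from a lower Exception level */
#define EC_DABT_SAME_EL  0x25u /* Data Abort without a change in Exception level */

/* Hypothetical helpers: extract the top-level ESR_ELx fields. */
static unsigned esr_ec(uint64_t esr)  { return (unsigned)((esr >> 26) & 0x3Fu); }
static unsigned esr_il(uint64_t esr)  { return (unsigned)((esr >> 25) & 0x1u); }
static unsigned esr_iss(uint64_t esr) { return (unsigned)(esr & 0x1FFFFFFu); }

/* Data Fault Status Code, only meaningful when EC is a Data Abort. */
static unsigned esr_dfsc(uint64_t esr)
{
    assert(esr_ec(esr) == EC_DABT_LOWER_EL || esr_ec(esr) == EC_DABT_SAME_EL);
    return esr_iss(esr) & 0x3Fu;
}

/* Write-not-Read bit: 1 if the abort was caused by a write, 0 for a read. */
static unsigned esr_wnr(uint64_t esr)
{
    return (esr_iss(esr) >> 6) & 0x1u;
}

/* Coarse fault class for the MMU fault encodings 0b00xxLL. */
static const char *dfsc_class(unsigned dfsc)
{
    switch (dfsc >> 2) {
    case 0x0: return "address size fault";
    case 0x1: return "translation fault";
    case 0x2: return "access flag fault";
    case 0x3: return "permission fault";
    default:  return "other";
    }
}

/* Lookup level (0..3) for the MMU fault encodings above. */
static unsigned dfsc_level(unsigned dfsc)
{
    return dfsc & 0x3u;
}
```

With 0x96000005 these give EC = 0x25 (Data Abort taken without a change in Exception level), IL = 1, DFSC = 0x05 (translation fault, level 1) and WnR = 0 (a read), which is consistent with the fault being taken at EL1 itself rather than routed to EL2.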

For some reason, the code tries to read from an address that is not permitted by the stage 1 MMU configuration.

When the hypervisor schedules the guest OS (a bare-metal application) and the app code tries to read from an address it is not permitted to access, a Data Abort exception is taken without a change in EL.

The only way I've found so far is to provide a hypervisor call (HVC) for the guest OS to invoke, but this requires the guest OS to call it.

This would undermine our goal of having a general-purpose hypervisor that implements full virtualization without requiring any kind of coupling from the guest OS side.

Is there any option that would allow the hypervisor to trap such exceptions?

Thanks in advance.

  • I don't think there is an easy way to do what you are describing.  The assumption is that exceptions like MMU faults are best dealt with by the thing that controls the cause of the exception.  In this case, that is the EL1 OS and the stage 1 MMU.  It's the OS that knows why it set up the MMU in the way that caused the fault, therefore it's the OS that is most likely to know what to do in response.

    But taking a step back:  The hypervisor's health monitor - what is it looking for/trying to achieve?

    Bouncing all sync exceptions from EL1 to EL2 is going to have interesting effects on performance.  And I'd expect most exceptions taken to EL1 to be normal/expected (paging faults, system calls, lazy context switching...).  The hypervisor would need to know a lot about the VM's OS to understand whether an exception it had trapped was normal operation or not.