How to trap Guest data aborts

Hi,

I am trying to understand whether a guest OS data abort caused by accessing some memory (e.g. the GIC distributor space) can be routed to EL2.

I looked into the HCR_EL2 register bits and tried setting the AMO bit, but it doesn't help. Control still reaches the EL1 synchronous exception handler.

I would like to provide some RW memory functionality from EL2, so I want to trap these accesses.

Thanks.

  • Broadly speaking, there are three causes of data aborts while in the guest:

    • Stage 1 MMU fault (the access was blocked by the guest's translation tables)
    • Stage 2 MMU fault (the access was allowed by the guest, but blocked by the hypervisor)
    • External abort (the access was allowed by both the guest and hypervisor, but failed in the memory system)

    Stage 1 faults will be taken to NS.EL1 (the guest).  (Unless you set HCR_EL2.TGE==1, but that does a whole load of other stuff as well)

    Stage 2 faults will be taken to EL2 (the Hypervisor)

    Where external aborts go will depend on whether they are asynchronous or not.  Asynchronous aborts (and external aborts will normally be so) are controlled by SCR_EL3 and HCR_EL2.

    I wasn't entirely clear what type of abort you are referring to, but I hope this helps.

  • Hi Martin,

    I am facing an issue where my stage 2 MMU fault is not getting trapped by the hypervisor.

    My understanding is that

    Stage1 (VA->IPA)   Linux OS

    Stage2 (IPA->PA)   Hyp

    I am setting up my GIC memory at stage2 (Hypervisor) level as DnGnRE (device memory) and with no permissions.

    I expect that when the Linux OS tries to access this memory it should trap to EL2, but instead it goes to the el1_sync handler.

    I can clearly see that when the GIC memory VA is accessed, the data abort happens at EL1 (Linux), which is valid, and FAR_EL1 also shows the correct virtual address. I also checked my Linux MMU setup and it does map VA->IPA correctly, but somehow this abort doesn't get trapped to EL2.

    I have checked all the bits in HCR_EL2 in order to trap Linux GIC RWs (Distributor) to EL2 but it is not happening.

    Please let me know if I am missing any HCR_EL2 bits other than PTW and AMO.

    Thanks.

  • A given access might fail both the stage 1 and stage 2 checks.  In that case the stage 1 fault is taken.  This is what I suspect is happening to you.

    What does ESR_EL1 show?
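
The routing rules described in this thread can be sketched as a small C model. This is only an illustration of the statements above (stage 1 faults go to EL1 unless HCR_EL2.TGE is set, stage 2 faults go to EL2, and the stage 1 fault wins when both checks fail). The function name `guest_mmu_fault_target_el` is hypothetical, and the model ignores external aborts and everything else that TGE changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2.TGE is bit 27 in the ARMv8-A architecture. */
#define HCR_EL2_TGE (UINT64_C(1) << 27)

/*
 * Hypothetical helper: returns the exception level (1 or 2) that takes a
 * synchronous MMU fault raised by a guest access.
 *   s1_fault: the guest's own (stage 1) tables block the access
 *   s2_fault: the hypervisor's (stage 2) tables block the access
 */
static inline int guest_mmu_fault_target_el(bool s1_fault, bool s2_fault,
                                            uint64_t hcr_el2)
{
    assert(s1_fault || s2_fault);   /* only meaningful if something faulted */

    if (s1_fault) {
        /* Stage 1 is reported first, even if stage 2 would also fail. */
        return (hcr_el2 & HCR_EL2_TGE) ? 2 : 1;
    }
    /* Only stage 2 blocked the access: taken to the hypervisor. */
    return 2;
}
```

With this model, an access that fails both stages and has TGE clear returns 1, which matches an abort arriving in the guest's el1_sync handler instead of at EL2.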

  • Hi Martin,

    Yes, you are correct. ESR_EL1 is 0x96000045 and FAR_EL1 shows the valid virtual address for the fault at Linux/EL1, 0xFFFF800000000000 (--> IPA 0x2C010000), but based on my mappings this should not be happening. Again, I am mapping 0x2C010000 --> some EL2 memory space.


    Map/permissions at EL1 for the mapping are:

    0xFFFFFF8000000000   Level 3 Page   NP:0x2C010000   UXN=1, PXN=1, Contiguous=0, nG=0, AF=1, SH=0x3, AP=0x3, AttrIndx=0x1

    Map/permissions at EL2:

    0x2C010000   Level 3 Page   NP:0x80280000   XN=1, Contiguous=0, AF=1, SH=0x3, S2AP=0x0, MemAttr=0x0

    I am not sure why I am getting a stage 1 fault rather than a stage 2 fault.

    Thanks.
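
For reference, the EL2 entry quoted above can be packed into a stage 2 page descriptor as in the sketch below. It assumes the ARMv8.0 VMSAv8-64 stage 2 layout with a 4KB granule (MemAttr at bits [5:2], S2AP at [7:6], SH at [9:8], AF at bit 10, XN at bit 54). The helper names are hypothetical, and the granule size is an assumption because the thread does not state it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stage 2 page descriptor fields (ARMv8.0 layout, 4KB granule assumed). */
#define S2_DESC_PAGE      UINT64_C(0x3)               /* valid level 3 page */
#define S2_MEMATTR_SHIFT  2                           /* MemAttr[3:0] */
#define S2_S2AP_SHIFT     6                           /* S2AP[1:0]    */
#define S2_SH_SHIFT       8                           /* SH[1:0]      */
#define S2_AF             (UINT64_C(1) << 10)         /* access flag  */
#define S2_XN             (UINT64_C(1) << 54)         /* execute never */
#define S2_OA_MASK        UINT64_C(0x0000FFFFFFFFF000) /* output address */

/* Hypothetical helper: build a level 3 stage 2 page descriptor. */
static inline uint64_t s2_page_desc(uint64_t pa, unsigned memattr,
                                    unsigned s2ap, unsigned sh,
                                    bool af, bool xn)
{
    assert(memattr < 16 && s2ap < 4 && sh < 4);
    return (pa & S2_OA_MASK)
         | ((uint64_t)memattr << S2_MEMATTR_SHIFT)
         | ((uint64_t)s2ap << S2_S2AP_SHIFT)
         | ((uint64_t)sh << S2_SH_SHIFT)
         | (af ? S2_AF : 0)
         | (xn ? S2_XN : 0)
         | S2_DESC_PAGE;
}

/* S2AP: 0b00 no access, 0b01 read-only, 0b10 write-only, 0b11 read/write. */
static inline bool s2_allows_read(uint64_t desc)
{
    return (desc >> S2_S2AP_SHIFT) & 0x1;
}

static inline bool s2_allows_write(uint64_t desc)
{
    return (desc >> S2_S2AP_SHIFT) & 0x2;
}
```

Calling `s2_page_desc(0x80280000, 0x0, 0x0, 0x3, true, true)` reproduces the quoted fields. With S2AP=0 neither reads nor writes are permitted, so an access that passes stage 1 would be expected to raise a stage 2 permission fault.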

  • I think you might need to re-check your translation tables.  The ESR_EL1 code reported means "Translation fault, first level".  That means a fault from your level 1 table (the name is a little confusing, as it's not necessarily "first" if you have a level 0 table).
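
The decoding of the reported ESR_EL1 value can be checked with a few field extractors. This is a minimal sketch using the ESR_ELx layout for data aborts (EC in bits [31:26], IL in bit 25, WnR in bit 6, DFSC in bits [5:0]). The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for a data abort syndrome in ESR_ELx. */
static inline unsigned esr_ec(uint32_t esr)   { return (esr >> 26) & 0x3Fu; }
static inline unsigned esr_il(uint32_t esr)   { return (esr >> 25) & 0x1u; }
static inline unsigned esr_wnr(uint32_t esr)  { return (esr >> 6) & 0x1u; }
static inline unsigned esr_dfsc(uint32_t esr) { return esr & 0x3Fu; }

/* MMU fault classes encoded in DFSC[5:2] for codes below 0x10. */
enum dfsc_class {
    DFSC_ADDRESS_SIZE = 0,
    DFSC_TRANSLATION  = 1,
    DFSC_ACCESS_FLAG  = 2,
    DFSC_PERMISSION   = 3
};

/* Class of an MMU fault; only valid for DFSC values 0x00..0x0F. */
static inline enum dfsc_class dfsc_class_of(unsigned dfsc)
{
    assert(dfsc < 0x10);
    return (enum dfsc_class)(dfsc >> 2);
}

/* Translation table level at which the MMU fault was detected. */
static inline unsigned dfsc_level(unsigned dfsc)
{
    assert(dfsc < 0x10);
    return dfsc & 0x3u;
}
```

For 0x96000045 this gives EC=0x25 (data abort taken without a change in exception level), WnR=1 (the access was a write) and DFSC=0x05, a translation fault at level 1.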

  • Hi Martin,

    I was trying to narrow down the above problem and found an issue on the Juno r0 platform.

    I found that when I set up the MMU at EL1, write 0x000000FF440C0400 to MAIR_EL1, and then write to SCTLR_EL1 to enable the MMU, the value of MAIR_EL1 changes to 0x000000CC440C0400, which changes the memory attributes at EL1. The same thing happens with the Linux kernel.

    The only difference between bare metal and the code above is that I have my own hypervisor, which sets up VTTBR, but those settings shouldn't cause this behavior, should they? I am not sure whether it is a chip bug in Juno r0.

    I want to confirm whether such a silicon bug exists and whether there is any workaround for it.

    Thanks.

  • When exactly do they change?  If you write the register and read it back immediately, what value do you see?

  • Hi Martin,

    I am writing 0x000000FF440C0400 to MAIR_EL1 and I can see this value in the DS-5 debugger, but as soon as I enable the MMU by writing to SCTLR_EL1, the value of MAIR_EL1 shown in DS-5 changes to 0x000000CC440C0400.


    Thanks.
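
To see which attribute slot differs between the two MAIR_EL1 values quoted above, the register can be split into its eight 8-bit Attr fields. The sketch below is only a host-side comparison aid, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* MAIR_EL1 holds eight 8-bit fields; Attr<n> sits in bits [8n+7:8n]. */
static inline uint8_t mair_attr(uint64_t mair, unsigned idx)
{
    assert(idx < 8);
    return (uint8_t)(mair >> (8 * idx));
}

/* Bitmask of the Attr indices whose values differ between two MAIR values. */
static inline unsigned mair_changed_mask(uint64_t before, uint64_t after)
{
    unsigned mask = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (mair_attr(before, i) != mair_attr(after, i))
            mask |= 1u << i;
    }
    return mask;
}
```

Comparing 0x000000FF440C0400 with 0x000000CC440C0400 shows that only Attr4 differs (0xFF versus 0xCC). Attr1, which the stage 1 GIC mapping quoted earlier selects through AttrIndx=0x1, is 0x04 (Device-nGnRE) in both values.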