Hi,
I am trying to understand the following: if a guest OS data abort happens due to accessing some memory (e.g. the GIC distributor space), is there any way I can route it to EL2?
I looked into the HCR_EL2 register bits and tried setting the AMO bit, but it doesn't help. I still see control reaching the EL1 sync handler.
I would like to provide some RW memory functionality from EL2, so I want to trap the access.
Thanks.
Broadly speaking, there are three causes of data aborts while in the guest:
Stage 1 faults will be taken to NS.EL1 (the guest). (Unless you set HCR_EL2.TGE==1, but that does a whole load of other stuff as well)
Stage 2 faults will be taken to EL2 (the Hypervisor)
Where external aborts go will depend on whether they are asynchronous or not. Asynchronous aborts (and external aborts will normally be so) are controlled by SCR_EL3 and HCR_EL2.
I wasn't entirely clear what type of abort you are referring to, but I hope this helps.
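The routing described above can be sketched as a small C model. This is only an illustration, not hypervisor code: the bit positions are the architectural HCR_EL2 ones, but `abort_target`, the two enums, and the simplifications (SCR_EL3 routing to EL3 is ignored, and a stage 2 fault is assumed to occur only when HCR_EL2.VM is set) are assumptions made for the example.

```c
#include <stdint.h>

/* HCR_EL2 bit positions (ARMv8-A). */
#define HCR_EL2_VM  (UINT64_C(1) << 0)   /* enable stage 2 translation */
#define HCR_EL2_PTW (UINT64_C(1) << 2)   /* protected table walk */
#define HCR_EL2_AMO (UINT64_C(1) << 5)   /* route SError (async abort) to EL2 */
#define HCR_EL2_TGE (UINT64_C(1) << 27)  /* trap general exceptions to EL2 */

/* Hypothetical names, used only for this sketch. */
enum abort_kind { STAGE1_FAULT, STAGE2_FAULT, ASYNC_EXTERNAL_ABORT };
enum target_el  { TARGET_EL1 = 1, TARGET_EL2 = 2 };

/* Simplified model of where an abort raised while running the guest
 * is taken. Routing to EL3 via SCR_EL3 is deliberately left out. */
static inline enum target_el abort_target(enum abort_kind kind, uint64_t hcr)
{
    switch (kind) {
    case STAGE1_FAULT:
        /* Goes to the guest unless TGE is set. AMO has no effect here. */
        return (hcr & HCR_EL2_TGE) ? TARGET_EL2 : TARGET_EL1;
    case STAGE2_FAULT:
        /* Only possible with HCR_EL2.VM set; always taken to EL2. */
        return TARGET_EL2;
    case ASYNC_EXTERNAL_ABORT:
        return (hcr & (HCR_EL2_AMO | HCR_EL2_TGE)) ? TARGET_EL2 : TARGET_EL1;
    }
    return TARGET_EL1;
}
```

In this model, `abort_target(STAGE1_FAULT, HCR_EL2_VM | HCR_EL2_AMO)` still yields `TARGET_EL1`, which matches the observation that setting AMO alone does not move a synchronous stage 1 data abort to EL2.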
Hi Martin,
I am facing an issue where my stage 2 MMU fault is not getting trapped by the hypervisor.
My understanding is that
Stage1 (VA->IPA) Linux OS
Stage2 (IPA->PA) Hyp
I am setting up my GIC memory at stage 2 (hypervisor) level as Device-nGnRE (device memory) with no permissions.
I expect that when the Linux OS tries to access this memory it should trap to EL2, but instead it goes to the el1_sync handler.
I can clearly see that when the GIC memory VA is accessed, the data abort happens at EL1 (Linux), which is valid, and the FAR_EL1 register also shows the correct
virtual address. I also checked my Linux MMU setup and it does map VA->IPA correctly, but somehow this abort doesn't get trapped to EL2.
I have checked all the bits in HCR_EL2 in order to trap Linux GIC reads and writes (distributor) to EL2, but it is not happening.
Please let me know if I am missing any HCR_EL2 bits other than PTW and AMO.
A given access might fail both the stage 1 and stage 2 checks. In that case the stage 1 fault is taken. This is what I suspect is happening to you.
What does ESR_EL1 report?
Yes, you are correct. ESR_EL1 is 0x96000045 and FAR_EL1 shows the valid virtual address for the fault at Linux/EL1: 0xFFFF800000000000 --> 0x2C010000 IPA,
but based on my mappings this should not be happening. Again, I am mapping 0x2c01_0000 --> some EL2 memory space.
Map/permissions at EL1 level for the mappings are -
Map/permission at EL2
I am not sure why I am getting a stage 1 fault rather than a stage 2 fault.
I think you might need to re-check your translation tables. The ESR_EL1 code reported means "Translation fault, first level". That means a fault from your level 1 table (the name is a little confusing, as it's not necessarily "first" if you have a level 0 table).
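For reference, the reported ESR_EL1 value can be picked apart with a few shifts and masks. The following is a minimal sketch; the field positions (EC in bits [31:26], WnR in bit 6, DFSC in bits [5:0] for a data abort) are architectural, while the helper names and the `fault_class` enum are made up for the example.

```c
#include <stdint.h>

/* Hypothetical classification of the DFSC field, for illustration only. */
enum fault_class {
    FAULT_ADDRESS_SIZE,
    FAULT_TRANSLATION,
    FAULT_ACCESS_FLAG,
    FAULT_PERMISSION,
    FAULT_OTHER
};

/* Exception class, ESR_ELx[31:26]. 0x24/0x25 are data aborts from a
 * lower EL / the same EL respectively. */
static inline unsigned esr_ec(uint32_t esr)   { return (esr >> 26) & 0x3F; }

/* Write-not-read, ISS bit 6: 1 means the faulting access was a write. */
static inline unsigned esr_wnr(uint32_t esr)  { return (esr >> 6) & 0x1; }

/* Data fault status code, ISS[5:0]. */
static inline unsigned esr_dfsc(uint32_t esr) { return esr & 0x3F; }

/* For the MMU fault codes 0b0000LL..0b0011LL the low two bits give the
 * translation table level at which the fault was detected. */
static inline unsigned dfsc_level(unsigned dfsc) { return dfsc & 0x3; }

static inline enum fault_class dfsc_class(unsigned dfsc)
{
    switch (dfsc >> 2) {
    case 0x0: return FAULT_ADDRESS_SIZE;
    case 0x1: return FAULT_TRANSLATION;
    case 0x2: return FAULT_ACCESS_FLAG;
    case 0x3: return FAULT_PERMISSION;
    default:  return FAULT_OTHER;
    }
}
```

Applied to 0x96000045, this gives EC = 0x25 (data abort taken without a change of exception level), WnR = 1 (a write), and DFSC = 0x05, i.e. a translation fault at level 1.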
I was trying to narrow down the above problem and found one issue with the Juno r0 platform.
I found that when I set up the MMU at EL1 and write the MAIR_EL1 attributes as 0x000000FF440C0400 and
then write to SCTLR_EL1 to enable the MMU, the MAIR_EL1 value changes to 0x000000CC440C0400, which
causes the memory attributes at EL1 to change. The same thing happens with the Linux kernel.
The only difference between bare metal and the above code is that I have my own hypervisor which sets up VTTBR,
but those settings shouldn't cause the above behavior. I am not sure if it is a chip bug in Juno r0.
I want to confirm whether such a silicon bug exists and whether there is any workaround for the above problem.
When exactly do they change? If you write the register and read it back immediately, what value do you see?
I am writing the MAIR_EL1 value as 0x000000FF440C0400 and I can see this value in the DS-5 debugger, but as soon as I enable the MMU by writing to SCTLR_EL1, the value in MAIR_EL1 changes to 0x000000CC440C0400, which I also see in DS-5.
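To see exactly which attribute changed between the two MAIR_EL1 values, it helps to split the register into its eight 8-bit Attr<n> fields. The sketch below does that; the helper names are made up for the example, and it says nothing about why the value changes.

```c
#include <stdint.h>

/* MAIR_EL1 holds eight 8-bit fields: Attr<n> is bits [8n+7:8n]. */
static inline uint8_t mair_attr(uint64_t mair, unsigned idx)
{
    return (uint8_t)(mair >> (8u * idx));
}

/* Returns a bitmask with bit n set when Attr<n> differs between a and b. */
static inline unsigned mair_diff_mask(uint64_t a, uint64_t b)
{
    unsigned mask = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        if (mair_attr(a, idx) != mair_attr(b, idx))
            mask |= 1u << idx;
    }
    return mask;
}
```

For 0x000000FF440C0400 against 0x000000CC440C0400, `mair_diff_mask` reports only Attr4, which goes from 0xFF to 0xCC. As I read the MAIR_EL1 encoding, both values are Normal memory, Inner and Outer Write-Back Non-transient, and the bits that were cleared are the read-allocate and write-allocate hints. That reading is worth checking against the Arm ARM.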