
VMSAv8-64 -- worst-case effects of misprogramming of the Contiguous bit

I would like to precisely understand the implications of misprogramming the Contiguous bit in VMSAv8-64 translation tables.

I have a hypervisor running at EL2 in the AArch64 execution state, using two-stage memory translation for the guests. At some point, the hypervisor needs to remove a guest's access to an IPA range, and sometime later restore it; one way to do so is by clearing the Access Flag in the relevant Stage 2 descriptors, then invalidating the relevant TLB entries, and sometime later setting the Access Flags again. Some other CPUs may be concurrently accessing the affected memory ranges.
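For concreteness, here is a minimal C sketch of the revoke/restore sequence described above. It assumes a 4KB granule and level 3 Stage 2 page descriptors, where AF is bit 10. The function names and the invalidation callback are hypothetical. On real hardware the callback would issue the TLB maintenance sequence noted in the comment; here it is a placeholder that only records its arguments, so the sketch compiles on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* VMSAv8-64 Stage 2 page descriptor, 4KB granule: AF is bit 10. */
#define S2_DESC_AF (1ULL << 10)

/* Hypothetical invalidation hook. At EL2 it would issue something like:
 *   DSB ISHST; TLBI IPAS2E1IS, Xt (once per page, Xt = IPA >> 12); DSB ISH;
 *   TLBI VMALLE1IS; DSB ISH; ISB
 * The placeholder below only records its arguments. */
typedef void (*s2_tlb_inval_fn)(uint64_t ipa, size_t npages);

static uint64_t last_inval_ipa;
static size_t last_inval_npages;

static void s2_tlb_inval_placeholder(uint64_t ipa, size_t npages)
{
    last_inval_ipa = ipa;
    last_inval_npages = npages;
}

/* Remove guest access to npages consecutive pages starting at ipa. */
static void s2_revoke(volatile uint64_t *ptes, size_t npages, uint64_t ipa,
                      s2_tlb_inval_fn inval)
{
    for (size_t i = 0; i < npages; i++)
        ptes[i] &= ~S2_DESC_AF;  /* new walks take an Access flag fault (no hardware AF update assumed) */
    inval(ipa, npages);          /* discard entries cached before the stores */
}

/* Sometime later, restore access by setting the Access Flags again. */
static void s2_restore(volatile uint64_t *ptes, size_t npages)
{
    for (size_t i = 0; i < npages; i++)
        ptes[i] |= S2_DESC_AF;
}
```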

I foresee no problem as long as none of the affected descriptors has the Contiguous bit set. If some of them do, however, I cannot atomically update a block of adjacent descriptors constrained by the Contiguous bit, so I would temporarily violate the constraint, and other CPUs could observe that violation.
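To make the "temporarily violate the constraint" window concrete, here is a simplified C check of the Contiguous-bit constraint for one aligned group of 16 level 3 descriptors (4KB granule, 48-bit output address). This is a sketch under stated assumptions, not the manual's exact rule: it conservatively requires every bit outside the output address to be identical across the group. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define S2_DESC_AF     (1ULL << 10)            /* Access Flag */
#define S2_DESC_CONTIG (1ULL << 52)            /* Contiguous bit */
#define S2_OA_MASK     0x0000FFFFFFFFF000ULL   /* OA[47:12], 4KB granule */
#define CONTIG_N       16                      /* level 3 group size, 4KB granule */
#define PAGE_SIZE_4K   0x1000ULL

/* Simplified model: true if the group satisfies the Contiguous constraint. */
static bool contig_block_consistent(const uint64_t blk[CONTIG_N])
{
    uint64_t base = blk[0] & S2_OA_MASK;
    uint64_t attrs = blk[0] & ~S2_OA_MASK;

    if ((attrs & 3) != 3 || !(attrs & S2_DESC_CONTIG))
        return false;  /* not a valid page descriptor with Contiguous set */
    if (base % (CONTIG_N * PAGE_SIZE_4K) != 0)
        return false;  /* output range must be aligned to the 64KB span */
    for (size_t i = 1; i < CONTIG_N; i++) {
        if ((blk[i] & ~S2_OA_MASK) != attrs)
            return false;  /* attributes or permissions (including AF) differ */
        if ((blk[i] & S2_OA_MASK) != base + i * PAGE_SIZE_4K)
            return false;  /* output addresses are not contiguous */
    }
    return true;
}
```

Clearing AF one descriptor at a time makes the group fail this check from the first store until the last one, which is exactly the window in which other CPUs could walk the tables.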

Ideally, I would like a guarantee for a worst-case scenario such as "if some descriptor d1 with value v1 has the Contiguous bit set, then any access involving a descriptor d2 in the same block b of adjacent descriptors as d1 may behave as if d2's value were consistent with v1" (here, consistent means satisfying the Contiguous bit constraint). This is typically what could happen if a translation table walk loaded d1, cached an entry for b in the TLB based on v1, and a subsequent walk reused this entry rather than loading d2.
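The worst case described above can be modelled as follows, assuming an implementation that caches a single TLB entry covering the whole 64KB group (4KB granule, level 3, 16 entries). Under that assumption, an access through d2 behaves as if d2 held v1's attributes and permissions, with the output address offset to d2's page. This is a hypothetical illustration of the desired guarantee, not a statement of what any particular implementation does; the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

#define S2_OA_MASK   0x0000FFFFFFFFF000ULL  /* OA[47:12], 4KB granule */
#define CONTIG_N     16                     /* level 3 group size, 4KB granule */
#define PAGE_SIZE_4K 0x1000ULL

/* Hypothetical model: the value that the descriptor at index idx2
 * (0..CONTIG_N-1) of the group "appears" to hold if a walk loaded v1
 * and cached one entry for the whole group. */
static uint64_t contig_implied_desc(uint64_t v1, size_t idx2)
{
    uint64_t span = CONTIG_N * PAGE_SIZE_4K;
    uint64_t base = (v1 & S2_OA_MASK) & ~(span - 1);  /* group-aligned OA */
    return (v1 & ~S2_OA_MASK) | (base + (uint64_t)idx2 * PAGE_SIZE_4K);
}
```

Under this model, an access through d2 could observe v1's AF and permissions even after d2 itself has been updated, until the TLB invalidation completes.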

I turned to the ARMv8-A Reference Manual for answers, and got confused by the wording in section D4.2.6, under "Misprogramming of the Contiguous bit":

In some implementations, such misprogramming might also give rise to a TLB Conflict abort.
The architecture guarantees that misprogramming of the Contiguous bit cannot provide a mechanism for any of the following to occur:

  • Software executing at EL1 or EL0 accessing regions of physical memory that are not accessible by programming the translation tables, from EL1, with arbitrary chosen values that do not misprogram the Contiguous bit.
  • Software executing at EL1 or EL0 accessing regions of physical memory with attributes or permissions that are not possible by programming the translation tables, from EL1, with arbitrary chosen values that do not misprogram the Contiguous bit.
  • Software executing in Non-secure state accessing Secure physical memory.

It seems that I may have to worry about TLB Conflict aborts; beyond that, I am unsure what to expect. My interpretation of the manual is that misprogramming the Contiguous bit at privilege level X must never allow X to escape the access restrictions that more privileged levels have enforced, assuming that those levels do not misprogram the Contiguous bit. This would be too weak a guarantee for my purposes: in my case, EL2 would misprogram the Contiguous bit, but I would still like guarantees for the accesses from EL0 and EL1.

Does this mean that the only portable options are not to use the Contiguous bit in such cases, or to make sure that other CPUs cannot access the affected ranges while the descriptors are being modified, and until the TLB entries have been invalidated?

  • Ok, somehow it seems to have lost my post. Apologies if this ends up showing up twice!

    A TLB Conflict abort can occur when the processor is able to create multiple valid TLB entries for a given address, which misprogramming the Contiguous bit can lead to. However, the processor cannot create a TLB entry if the translation results in a Translation fault or an Access Flag fault.

    So when you clear the AF bit (*), you prevent the processor from creating a TLB entry from that translation table entry.

    You'd need to clear the AF bit of every entry in the contiguous block, and only then do the TLB invalidation. Bear in mind that the processor might read the entries partway through this routine.

    (*) This assumes you don't have hardware management of the AF bit enabled (VTCR_EL2.HA for Stage 2).
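A minimal C sketch of the suggested procedure, assuming a 4KB granule, a level 3 table of 512 descriptors mapping a 2MB-aligned IPA range that starts at `table_ipa`, and 16-entry contiguous groups. All names are hypothetical, and the invalidation callback is a placeholder for the EL2 TLB maintenance sequence given in the comment.

```c
#include <stddef.h>
#include <stdint.h>

#define S2_DESC_AF   (1ULL << 10)  /* Access Flag */
#define CONTIG_N     16            /* level 3 group size, 4KB granule */
#define PAGE_SIZE_4K 0x1000ULL

/* Placeholder for, at EL2: DSB ISHST; TLBI IPAS2E1IS per page; DSB ISH;
 * TLBI VMALLE1IS; DSB ISH; ISB. Here it only records its arguments. */
typedef void (*s2_tlb_inval_fn)(uint64_t ipa, size_t npages);

static uint64_t last_inval_ipa;
static size_t last_inval_npages;

static void s2_tlb_inval_placeholder(uint64_t ipa, size_t npages)
{
    last_inval_ipa = ipa;
    last_inval_npages = npages;
}

/* Clear AF on every descriptor of the contiguous group containing idx,
 * then invalidate. Returns the index of the group's first descriptor. */
static size_t s2_revoke_contig_group(volatile uint64_t *l3_table,
                                     uint64_t table_ipa, size_t idx,
                                     s2_tlb_inval_fn inval)
{
    size_t first = idx & ~(size_t)(CONTIG_N - 1);  /* align down to the group */
    for (size_t i = first; i < first + CONTIG_N; i++)
        l3_table[i] &= ~S2_DESC_AF;  /* no new TLB entries from these */
    inval(table_ipa + first * PAGE_SIZE_4K, CONTIG_N);  /* then drop cached ones */
    return first;
}
```

The invalidation deliberately runs only after the whole group has been updated, so any walk that happens during the loop either faults or hits an entry that the final invalidation removes.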
