Is an ISB necessary between modifying a TTBR and flushing the TLB?

I ran into a problem when replacing the active TTBR0_EL2 register. It seems that either subsequent data loads do not use the new page table, or instruction execution goes wrong. In short, a Data Abort is triggered, and the ISS indicates that the fault is at translation level 1.

I compared this with how the Linux kernel changes the page table base address and found that Linux places an `isb` instruction between the TTBR write and the TLB invalidation. I added one and everything worked. I have read the Arm manual and some online material, but I could not work out why.

  1. Why is a memory barrier needed?
  2. If one is needed, why can't a `dsb` be used instead?

Configuration: MMU on, dcache off, icache off

```asm
// linux/arch/arm64/mm/proc.S
    .macro  __idmap_cpu_set_reserved_ttbr1, tmp1, tmp2
    adrp    \tmp1, reserved_pg_dir
    phys_to_ttbr \tmp2, \tmp1
    offset_ttbr1 \tmp2, \tmp1
    msr     ttbr1_el1, \tmp2
    isb
    tlbi    vmalle1
    dsb     nsh
    isb
    .endm
```
  • Why is a memory barrier needed?

    I think this is the section of the Arm ARM you need:

    B2.10.1 Instruction Synchronization Barrier (ISB)
    An ISB instruction ensures that all instructions that come after the ISB instruction in program order are fetched from
    the cache or memory after the ISB instruction has completed. Using an ISB ensures that the effects of
    context-changing operations executed before the ISB are visible to the instructions fetched after the ISB instruction.
    Examples of context-changing operations that require the insertion of an ISB instruction to ensure the effects of the
    operation are visible to instructions fetched after the ISB instruction are:

    • Completed cache and TLB maintenance instructions.
    • Changes to System registers

    A change to a TTBR is a change to the PE's context. A change to context is only guaranteed to be visible after a context synchronization event, which is either an ISB or an exception entry/exit (note: see FEAT_ExS). In your case this means that, without the ISB, the PE might still be using the "old" context at the time of the TLBI, leading it to re-walk the tables using the old configuration.

    If yes, why can't a `dsb` be used instead?

    Because ISBs and DSBs do different things. Sometimes you need one, sometimes the other, sometimes a combination; it all depends on what you are trying to achieve. A single "super" barrier that did everything would be inefficient, because you would often be over-ordering.

            msr  ttbr1_el1, \tmp2
            isb  <-- ensure change in context (write to TTBR) is visible to following instructions
            tlbi vmalle1
            dsb  nsh <-- ensure TLBI completes before processing next instruction
            isb  <-- ensure instructions beyond this point are fetched using the post-TLBI translations
  • Thanks for the reply.

    "without the ISB the PE might still be using the "old" context at the time of TLBI.  Leading it to attempt to re-walk the tables using the old configuration."

    Assuming a single-core environment, only the next few instructions might use the old page table. The final `isb` must cause context synchronization, so the instructions after this code block should behave normally, right? I also have the final `isb`, but the ELR for the Data Abort shows that the exception occurred after this code block, at an `ldr` instruction.

  • That doesn't quite work.

    Let's work through the example with and without the first ISB.  First, without:

    1        msr    ttbr1_el1, \tmp2
    2        nop 
    3        tlbi   vmalle1
    4        dsb    nsh
    5        isb
    6 <some instruction>

    The effect of writing TTBR1 is not guaranteed to be visible until the ISB at line (5). The TLBI at line (3) invalidates cached translations. As soon as the TLBI completes, the processor is allowed to speculatively walk the translation tables and cache entries in the TLBs.

    Meaning we have a problem.  There is a window between lines 3 and 5 where the processor might do speculative table walks and might be using the old TTBR1 value.  We don't know that it will, but it might - so we should assume it will.

    The ISB at line 5 means the subsequent instructions see the effect of the TTBR update.  But that doesn't do anything about the speculative table walks that have already happened, or the already cached TLB entries.  Meaning that instructions at line 6 onwards could be using stale translations.

    Now let's put the ISB at line 2 back in:

    1        msr    ttbr1_el1, \tmp2
    2        isb 
    3        tlbi   vmalle1
    4        dsb    nsh
    5        isb
    6 <some instruction>

    With an ISB at line 2, the update to TTBR1 at line (1) must be visible before the TLBI at line (3).  The MMU might still speculatively refill the TLBs between lines 3 and 5, but it must do so using the new TTBR1 value.  Therefore, from line 6 we can guarantee that we only see the new translations.
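
    Applied to the original question's register (TTBR0_EL2 rather than the TTBR1_EL1 used in the Linux macro), the same ordering might look like the sketch below. It is an illustration, not code from this thread. It assumes EL2 without VHE, a single core, that `x0` (a register chosen only for this example) holds the physical base address of the new top-level table, and that the code being executed is mapped at the same address in both the old and the new tables. The leading `dsb` is only needed if the new tables were written shortly beforehand.

    ```asm
    // Sketch only: x0 is assumed to hold the new table base (illustrative register choice)
    dsb   ish              // assumption: tables were just written; complete those writes first
    msr   ttbr0_el2, x0    // install the new translation table base
    isb                    // make the TTBR0_EL2 update visible before the TLBI
    tlbi  alle2            // invalidate all EL2 stage 1 TLB entries on this PE
    dsb   nsh              // wait for the invalidation to complete
    isb                    // following instructions are fetched using the new translations
    ```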

  • This solves my confusion. Thanks again.

  • BTW, do you think break-before-make is necessary in my case? (I am currently running at EL2 from a low address, and I am trying to switch TTBR0_EL2 to a completely new page table.)

  • It depends on what you're trying to achieve.

    The advantage of break-before-make (BBM) is predictability. TLBs are not permitted to cache invalid translations (ones that result in a Translation Fault). BBM therefore lets you ensure that the old and new translations can't be in the TLB at the same time (which would be a bad thing).

    But you don't always need that. For example, if you're doing a task switch, the different tasks would typically be ASID (or ASID+VMID) tagged. It's perfectly valid to have the same translation in the TLBs if the entries are tagged with different ASIDs. That assumes that you can atomically switch the TTBR and ASID, which in AArch64 you can.
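
    The atomic switch is possible because, in the EL1&0 translation regime, the ASID lives in bits [63:48] of the TTBR itself, so one `msr` changes both the table base and the ASID. A minimal sketch follows, assuming TCR_EL1.A1 = 0 (ASID taken from TTBR0_EL1), 16-bit ASIDs (TCR_EL1.AS = 1), and that the ASID in `x1` is not shared with any stale entries still in the TLB. The register choices are illustrative only.

    ```asm
    // Sketch only: x0 = physical base of the new task's top-level table,
    //              x1 = ASID assigned to that task (both hypothetical inputs)
    bfi   x0, x1, #48, #16   // insert the ASID into bits [63:48] of the value
    msr   ttbr0_el1, x0      // table base and ASID change in a single register write
    isb                      // context synchronization so the new context is in effect
    ```

    No TLBI appears in this sketch because entries belonging to other tasks carry different ASIDs and therefore do not match; invalidation is only needed when an ASID is reused for a different address space.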