I ran into a problem when replacing the active TTBR0_EL2. After the switch, either data loads do not use the new page table or instruction execution goes wrong. In short, a Data Abort is triggered, and the ISS reports the fault at translation level 1.
I compared this with how the Linux kernel changes the page table base address and found that it places an `isb` between the TTBR write and the TLB invalidation. Adding that `isb` made everything work. I have read the ARM manual and some online material, but I still can't figure out why it is needed.
Configuration: MMU on, dcache off, icache off
```asm
; linux/arch/arm64/mm/proc.S
	.macro	__idmap_cpu_set_reserved_ttbr1, tmp1, tmp2
	adrp	\tmp1, reserved_pg_dir
	phys_to_ttbr \tmp2, \tmp1
	offset_ttbr1 \tmp2, \tmp1
	msr	ttbr1_el1, \tmp2
	isb
	tlbi	vmalle1
	dsb	nsh
	isb
	.endm
```
That doesn't quite work.
Let's work through the example with and without the first ISB. First, without:
    1 msr ttbr1_el1, \tmp2
    2 nop
    3 tlbi vmalle1
    4 dsb nsh
    5 isb
    6 <some instruction>

The effect of writing TTBR1 is not guaranteed to be visible until the ISB at line (5). The TLBI at line (3) invalidates cached translations. As soon as the TLBI completes, the processor is allowed to speculatively walk the translation tables and cache entries in the TLBs.
Meaning we have a problem. There is a window between lines 3 and 5 where the processor might do speculative table walks and might be using the old TTBR1 value. We don't know that it will, but it might - so we should assume it will.
The ISB at line 5 means the subsequent instructions see the effect of the TTBR update. But that doesn't do anything about the speculative table walks that have already happened, or the already cached TLB entries. Meaning that instructions at line 6 onwards could be using stale translations.
Now let's put the ISB at line 2 back in:

    1 msr ttbr1_el1, \tmp2
    2 isb
    3 tlbi vmalle1
    4 dsb nsh
    5 isb
    6 <some instruction>
With an ISB at line 2, the update to TTBR1 at line (1) must be visible before the TLBI at line (3). The MMU might still speculatively refill the TLBs between lines 3 and 5, but it must do so using the new TTBR1 value. Therefore, from line 6 we can guarantee that we only see the new translations.
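For the TTBR0_EL2 case from the question, the same ordering might look like the sketch below. This is only an illustration under assumptions: the label `switch_ttbr0_el2` and the use of `x0` are hypothetical, `x0` is assumed to hold the physical base address of the new table, and only the local core is considered (hence `nsh`). The leading `dsb ishst` is an addition that is not in the Linux macro; it is there on the assumption that the new tables were written just before the switch.

```asm
// Hypothetical helper: x0 = physical base address of the new translation table
switch_ttbr0_el2:
    dsb   ishst            // complete earlier writes to the new tables
    msr   ttbr0_el2, x0    // point TTBR0_EL2 at the new tables
    isb                    // make the TTBR0_EL2 write visible before the TLBI
    tlbi  alle2            // invalidate all EL2 TLB entries on this core
    dsb   nsh              // wait for the invalidation to complete
    isb                    // following instructions see only new translations
    ret
```

This only works if the new tables map the code running this sequence (and its stack) at the same virtual addresses as the old ones, because the core keeps fetching from the same VA across the switch.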
This solves my confusion. Thanks again.
BTW, do you think break-before-make is necessary for me? (I currently run at EL2 from low addresses, and I am trying to switch TTBR0_EL2 to a completely new page table.)
It depends on what you're trying to achieve.
The advantage of break-before-make (BBM) is predictability. TLBs are not permitted to cache invalid translations (ones that result in a Translation Fault). BBM therefore lets you ensure that the old and new translations can't be in the TLB at the same time (which would be a bad thing).
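As a rough illustration of BBM applied to a single descriptor at EL2 (not to a whole-table switch), a sequence could look like the following. The label and register assignments are hypothetical, the inner-shareable variants are used on the assumption that other cores may share the tables, and the VA is assumed to be a low address below 2^48.

```asm
// Hypothetical inputs: x0 = address of the descriptor to change,
//                      x1 = virtual address it maps,
//                      x2 = new descriptor value
bbm_update_el2:
    str   xzr, [x0]        // break: replace the old entry with an invalid one
    dsb   ishst            // make the invalid entry visible to table walks
    lsr   x1, x1, #12      // TLBI by VA takes VA[55:12] in bits [43:0]
    tlbi  vae2is, x1       // drop any cached copy of the old translation
    dsb   ish              // wait for the invalidation to complete
    str   x2, [x0]         // make: install the new entry
    dsb   ishst            // make the new entry visible to table walks
    isb                    // resynchronize this core
    ret
```

Between the break and the make, any access to that address takes a Translation Fault, so the code doing the update must not depend on the mapping it is changing.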
But you don't always need that. For example, if you're doing a task switch, the different tasks would typically be ASID (or ASID+VMID) tagged. It's perfectly valid to have the same translation in the TLBs if the entries are tagged with different ASIDs. That assumes that you can atomically switch the TTBR and ASID, which in AArch64 you can.
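The atomic switch is possible because the ASID lives in the top bits of the TTBR itself, so one `msr` updates both. A minimal sketch for an EL1 task switch, assuming `TCR_EL1.A1 == 0` (the ASID is taken from TTBR0_EL1) and 16-bit ASIDs (`TCR_EL1.AS == 1`); the label and register use are hypothetical. Note that TTBR0_EL2 only carries an ASID when EL2 runs with `HCR_EL2.E2H == 1`.

```asm
// Hypothetical inputs: x0 = physical base address of the task's tables,
//                      x1 = ASID assigned to the task
switch_task_ttbr0_el1:
    bfi   x0, x1, #48, #16 // place the ASID in TTBR0_EL1[63:48]
    msr   ttbr0_el1, x0    // base address and ASID change in one write
    isb                    // following instructions use the new context
    ret
```

No TLBI is needed here as long as the ASID is not currently in use by a different set of tables; stale entries from the previous task simply stop matching.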