I'm using an i.MX8QM system, which features a dual-core A72 cluster plus a quad-core A53 cluster. Running at EL2 on one of the A53 cores, I want to unmap a single page for all cores, so after I remove the entry from the page table I use the TLB invalidation instructions accompanied by the usual synchronization instructions.
If I execute a "tlbi alle2is" instruction, all goes fine: the translation is invalidated for all cores. However, if I use "tlbi vae2is", the cached TLB entries are invalidated only for the A53 cluster. If I execute it from one of the A72 cores, everything goes fine again and every core sees the entry invalidated. In all cases, if I remove the "is" part of the instruction, only the core where it's executing has the PTE invalidated.
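For illustration, the unmap-and-invalidate sequence described above might look roughly like the following sketch. It is not the poster's actual code: the names tlbi_va_operand and el2_unmap_page are hypothetical, a 4 KB granule is assumed, and the EL2-only part is guarded so that it is only built for AArch64.

```c
#include <stdint.h>

/* Hypothetical helper: build the register operand for "tlbi vae2is".
 * Bits [43:0] of the operand carry VA[55:12]; the upper bits are left
 * as zero in this sketch. */
static inline uint64_t tlbi_va_operand(uint64_t va)
{
    return (va >> 12) & ((UINT64_C(1) << 44) - 1);
}

#if defined(__aarch64__)
/* Hypothetical routine: clear one page table entry and invalidate the
 * translation for every core in the Inner Shareable domain.
 * Must execute at EL2; assumes pte points at the live descriptor. */
static inline void el2_unmap_page(volatile uint64_t *pte, uint64_t va)
{
    *pte = 0;                                    /* remove the mapping */
    __asm__ volatile("dsb ishst" ::: "memory");  /* make the PTE write visible to the table walkers */
    __asm__ volatile("tlbi vae2is, %0"           /* broadcast invalidate by VA */
                     :: "r"(tlbi_va_operand(va)) : "memory");
    __asm__ volatile("dsb ish" ::: "memory");    /* wait for the invalidation to complete */
    __asm__ volatile("isb" ::: "memory");        /* resynchronize this core */
}
#endif
```

Dropping the "is" suffix ("tlbi vae2") limits the invalidation to the executing core, which matches the behaviour reported above.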
I have a synchronization barrier that guarantees the A72 cores do not use that address until well after the invalidation.
What can I be doing wrong here?
Are you sure you have the same VA on all cores?
I'm pretty sure, because it works if I execute the instruction from the A72 cores.
Someone on the NXP community forums pointed out that this is actually a hardware bug, described in NXP erratum ERR050104 (https://community.nxp.com/external-link.jspa?url=https%3A%2F%2Fwww.nxp.com%2Fdocs%2Fen%2Ferrata%2FIMX8_1N94W.pdf).
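The erratum document is the authoritative source for the recommended workaround. As a rough sketch that relies only on the observation in the question that "tlbi alle2is" reaches every core, software could fall back to the full EL2 invalidation on affected parts. The names tlbi_op, choose_el2_tlbi and el2_tlb_invalidate, and the quirk flag, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum tlbi_op { TLBI_OP_VAE2IS, TLBI_OP_ALLE2IS };

/* Hypothetical policy: if broadcast invalidate-by-VA cannot be trusted
 * on this SoC, use the invalidate-all form instead. */
static inline enum tlbi_op choose_el2_tlbi(bool va_broadcast_unreliable)
{
    return va_broadcast_unreliable ? TLBI_OP_ALLE2IS : TLBI_OP_VAE2IS;
}

#if defined(__aarch64__)
/* Must execute at EL2. Assumes the page table entry was already updated. */
static inline void el2_tlb_invalidate(uint64_t va, bool va_broadcast_unreliable)
{
    __asm__ volatile("dsb ishst" ::: "memory");
    if (choose_el2_tlbi(va_broadcast_unreliable) == TLBI_OP_ALLE2IS) {
        /* Heavier: drops every EL2 translation on all cores. */
        __asm__ volatile("tlbi alle2is" ::: "memory");
    } else {
        /* Operand bits [43:0] carry VA[55:12]. */
        uint64_t arg = (va >> 12) & ((UINT64_C(1) << 44) - 1);
        __asm__ volatile("tlbi vae2is, %0" :: "r"(arg) : "memory");
    }
    __asm__ volatile("dsb ish" ::: "memory");
    __asm__ volatile("isb" ::: "memory");
}
#endif
```

The trade-off is that invalidating everything forces all cores to re-walk the page tables for every EL2 translation, so it costs more than a single-page invalidation.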