Inconsistent shareability domain on tlbi instructions

I'm using a IMX8QM system which features a dual-core A72 cluster plus a quad-core A53 cluster. Running on EL2 from one of the A53 cores I want to unmap a single page for all cores, so after I remove the entry for the page table I use the tlb invalidation instructions accompanied by the usual synchronization instructions.

If I execute a "tlbi alle2is" instruction all goes fine. The translation is invalidated for all cores. However, if I use "tlbi vae2is" the cached TLB entries are invalidated only for the A53 cluster. If I execute it from one of the A72 cores everything goes fine again, every core sees the entry invalidated. In all cases, if I remove the "is" part of the instruction only the core where its executing has the pte invalidated. 

I have a synchronization barrier that guarantees the A72 cores do not use that address until well after the invalidation.

What can I be doing wrong here?

More questions in this forum