Since ARM caches are physically indexed, is there any way to flush based on the PA? I know I can get the set, but what about the way? If I am flushing from L1, would I have to flush all ways in L1 and then in L2 (assuming there is no L3) to get the data out to system memory?
Is there an example of this written somewhere? All examples I have seen use VA.
Thanks!
Please also find below a ready-to-use function for address translation that prints all the attributes. This is for EL3, but you should be able to modify it for use at EL2 and EL1 too.
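#include <stdio.h>

/* Translate a VA with the EL3 stage 1 read translation and print PAR_EL1. */
static void display_mapping(unsigned long address)
{
    unsigned long par_el1;

    printf("----- Translating VA 0x%lx\n", address);

    /* AT S1E3R writes its result to PAR_EL1; the ISB makes sure the
       result is visible before PAR_EL1 is read. */
    __asm__ __volatile__ ("at s1e3r, %0" : : "r" (address));
    __asm__ __volatile__ ("isb");
    __asm__ __volatile__ ("mrs %0, PAR_EL1" : "=r" (par_el1));

    if (0 != (par_el1 & 1)) {
        /* PAR_EL1.F == 1: translation aborted, print the fault fields. */
        printf("Address Translation Failed: 0x%lx\n"
               " FSC: 0x%lx\n"
               " PTW: 0x%lx\n"
               "   S: 0x%lx\n",
               address,
               (par_el1 & 0x7e) >> 1,
               (par_el1 & 0x100) >> 8,
               (par_el1 & 0x200) >> 9);
    } else {
        /* PAR_EL1.F == 0: print shareability, NS, PA[47:12] and attributes. */
        printf("Address Translation Succeeded: 0x%lx\n"
               "  SH: 0x%lx\n"
               "  NS: 0x%lx\n"
               "  PA: 0x%lx\n"
               "ATTR: 0x%lx\n",
               address,
               (par_el1 & 0x180) >> 7,
               (par_el1 & 0x200) >> 9,
               par_el1 & 0xfffffffff000,
               (par_el1 & 0xff00000000000000) >> 56);
    }
}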
But this is a VA-to-PA mapping. AFAIK there is no way to get a VA by writing a PA to a register. Right?
As far as I know, there is no way to get a VA from a PA unless you connect a CoreSight debugger, e.g. DS-5, which can tell you that.
But let's take it the other way round: why would anyone want to flush (or invalidate, or clean and invalidate) by PA? ARM gives you the option to flush by set/way or by VA. I cannot see a use case where these would not be sufficient.
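To make the set/way option concrete for the original question, here is a rough sketch of how the set could be derived from a PA and how the operand for DC CSW / DC CISW is laid out (way in the top bits, set above the line offset, level in bits [3:1]). The struct cache_geom and the function names are made up for illustration, the geometry would really come from CCSIDR_EL1, and the sketch assumes a plain modulo index, which a given L2 implementation may not use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one cache level. On real hardware these
 * values are read from CCSIDR_EL1 after selecting the level in CSSELR_EL1. */
struct cache_geom {
    unsigned level;      /* 1 for L1, 2 for L2, ... */
    unsigned line_bytes; /* power of two */
    unsigned num_sets;   /* power of two */
    unsigned num_ways;
};

/* Smallest n such that (1 << n) >= x. */
unsigned log2_ceil(unsigned x)
{
    unsigned n = 0;
    while ((1u << n) < x)
        n++;
    return n;
}

/* Set selected by a PA, ASSUMING a simple (non-hashed) physical index. */
unsigned pa_to_set(const struct cache_geom *g, uint64_t pa)
{
    assert(g->line_bytes != 0 && g->num_sets != 0);
    return (unsigned)((pa / g->line_bytes) % g->num_sets);
}

/* Build the register operand for a set/way maintenance instruction. */
uint64_t setway_operand(const struct cache_geom *g, unsigned set, unsigned way)
{
    unsigned a = log2_ceil(g->num_ways);   /* bits needed for the way */
    unsigned l = log2_ceil(g->line_bytes); /* bits of line offset */
    uint64_t op = ((uint64_t)set << l) | ((uint64_t)(g->level - 1) << 1);

    assert(set < g->num_sets && way < g->num_ways);
    if (a > 0)                             /* a == 0 for a direct-mapped cache */
        op |= (uint64_t)way << (32 - a);
    return op;
}
```

Since the way holding a line is not known, you would loop way from 0 to num_ways - 1 and issue the instruction with each operand, first for L1 and then for L2. Keep in mind that set/way operations only act on the caches of the core that executes them and are meant for cache power-up and power-down sequences, so for everything else the by-VA operations are the safer choice.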
DMA controllers use the PA. So I could think of a scenario where you need to flush the cache before activating the DMA. But as I first wrote, in general one should know which PA is mapped to which VA, especially because a PA range could be mapped to multiple VA ranges.
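For the DMA scenario, a minimal sketch of cleaning a buffer by VA before handing its PA to the DMA controller could look like the following. The names clean_for_dma and for_each_line are invented here, the cache line size is passed in by the caller (on AArch64 it can be derived from CTR_EL0.DminLine), and the instructions are stubbed out on non-AArch64 hosts so the address arithmetic can be checked anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*line_op_fn)(uintptr_t va);

#if defined(__aarch64__)
/* Clean one data cache line by VA to the Point of Coherency. */
static void dc_cvac(uintptr_t va)
{
    __asm__ __volatile__ ("dc cvac, %0" : : "r" (va) : "memory");
}
/* Wait for the preceding maintenance operations to complete. */
static void dsb_sy(void)
{
    __asm__ __volatile__ ("dsb sy" : : : "memory");
}
#else
/* Host stubs: no cache maintenance, only the loop logic runs. */
static void dc_cvac(uintptr_t va) { (void)va; }
static void dsb_sy(void) { }
#endif

/* Apply op to every cache line overlapping [va, va + len).
 * Returns the number of lines touched. */
size_t for_each_line(uintptr_t va, size_t len, size_t line_bytes, line_op_fn op)
{
    uintptr_t mask = ~(uintptr_t)(line_bytes - 1);
    uintptr_t cur, last;
    size_t n = 0;

    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);
    if (len == 0)
        return 0;

    cur = va & mask;               /* first line, rounded down */
    last = (va + len - 1) & mask;  /* line holding the final byte */
    for (;;) {
        op(cur);
        n++;
        if (cur == last)
            break;
        cur += line_bytes;
    }
    dsb_sy();
    return n;
}

/* Clean a buffer so that a non-coherent DMA read sees up-to-date memory. */
size_t clean_for_dma(const void *buf, size_t len, size_t line_bytes)
{
    return for_each_line((uintptr_t)buf, len, line_bytes, dc_cvac);
}
```

This covers the memory-to-device direction. For a device-to-memory transfer the lines would need to be invalidated (DC IVAC or DC CIVAC) rather than only cleaned, and if the same PA range is mapped at several VAs, maintenance through any one of those mappings acts on the same physical lines in a PIPT cache.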
If we consider processing elements other than ARM cores, then yes, there could be plenty of them: besides DMA, there are DSPs, another Cortex core (e.g. a Cortex-M integrated into the SoC), an FPGA outside the SoC, etc. Some may have coherent caches, like the ARM cores and/or a DSP, if they are configured to both send and receive snoops; others will not, being able only to send snoops.
But unless that is implemented, these other elements cannot run (manual) cache maintenance operations, though they may participate in the coherency managed in hardware.
On top of that, though I am by no means an expert on this and only know of it, the system may have a System MMU. Then even the DMA should not need to know the PA-to-VA mapping, as it operates only on VAs.