Since ARM caches are physically indexed is there any way to flush based on the PA? I know I can get the set, but what about the way? If I am flushing from L1 would I have to flush all ways in L1 and then L2 assuming there is no L3 to get to system memory?
Is there an example of this written somewhere? All examples I have seen use VA.
Thanks!
AFAIK, the only way is that you "know" the VA of the PA you want to flush.
Not quite. Taking Cortex-A53:
This applies to the inner caches. Outer caches, e.g. an L3, could behave differently, but as you say you don't have one.
m0sf3tz said: I know I can get the set, but what about the way?
If you want to flush a set manually (that is, with software cache maintenance operations), you must clean that set in each of the ways, because you cannot tell which way your data is in. You can determine the set but not the way, as you noticed. The hardware surely has the means to determine the way, but it is not exposed to us.
m0sf3tz said: If I am flushing from L1 would I have to flush all ways in L1 and then L2 assuming there is no L3 to get to system memory?
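To make the "same set, every way" point concrete, here is a minimal sketch in portable C that derives the set index from a PA and builds one set/way operand per way, using the ARMv8-A operand layout (Way in the top bits, Set starting at the line-offset width, Level in bits [3:1]). The struct and function names are hypothetical, and the geometry used in the usage note (64-byte lines, 4 ways, 128 sets) is an assumed L1 configuration; on real hardware it should be read from CCSIDR_EL1. The sketch only computes the operands; each one would then be passed to DC CISW at EL1 or higher.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry descriptor; real values come from CCSIDR_EL1. */
struct cache_geom {
    unsigned line_bytes; /* power of two */
    unsigned ways;
    unsigned sets;       /* power of two */
};

/* Smallest n such that 2^n >= x. */
static unsigned log2_ceil(unsigned x)
{
    unsigned n = 0;
    while ((1u << n) < x)
        n++;
    return n;
}

/* Set selected by a physical address in a physically indexed cache. */
static unsigned set_index_of_pa(const struct cache_geom *g, uint64_t pa)
{
    return (unsigned)((pa / g->line_bytes) % g->sets);
}

/* Operand for DC ISW / DC CSW / DC CISW. 'level' is 1-based (1 = L1). */
static uint64_t setway_operand(const struct cache_geom *g, unsigned level,
                               unsigned way, unsigned set)
{
    unsigned a = log2_ceil(g->ways);       /* width of the Way field */
    unsigned l = log2_ceil(g->line_bytes); /* bit where the Set field starts */

    assert(level >= 1 && level <= 7);
    assert(way < g->ways && set < g->sets);

    uint64_t op = ((uint64_t)set << l) | ((uint64_t)(level - 1) << 1);
    if (a != 0) /* a direct-mapped cache has no Way field */
        op |= (uint64_t)way << (32 - a);
    return op;
}

/* Fill out[] (at least g->ways entries) with one operand per way for the
 * set that 'pa' maps to. Returns the number of operands written. */
static unsigned operands_for_pa(const struct cache_geom *g, unsigned level,
                                uint64_t pa, uint64_t *out)
{
    unsigned set = set_index_of_pa(g, pa);
    for (unsigned way = 0; way < g->ways; way++)
        out[way] = setway_operand(g, level, way, set);
    return g->ways;
}
```

With the assumed geometry, PA 0x80001240 maps to set 0x49, and the four operands differ only in the Way field. The same loop would be repeated with the L2 geometry and level 2 to push the line out to memory.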
You have a variety of options:
There are loads of examples in U-Boot at arch/arm/cpu/armv8/cache.S.
If you have a system with more than one cluster, you should also take into account which cache maintenance operations are broadcast:

Instruction    Description                                                       Broadcast
IC IALLUIS     I-cache invalidate all to Point of Unification, Inner Shareable   Yes (inner only)
IC IALLU       I-cache invalidate all to Point of Unification                    No (a)
IC IVAU, Xt    I-cache invalidate by address to Point of Unification             Maybe (b)
DC ZVA, Xt     D-cache zero by address                                           No
DC IVAC, Xt    D-cache invalidate by address to Point of Coherency               Yes
DC ISW, Xt     D-cache invalidate by Set/Way                                     No
DC CVAC, Xt    D-cache clean by address to Point of Coherency                    Maybe (b)
DC CSW, Xt     D-cache clean by Set/Way                                          No
DC CVAU, Xt    D-cache clean by address to Point of Unification                  Maybe (b)
DC CIVAC, Xt   D-cache clean and invalidate by address to Point of Coherency     Yes
DC CISW, Xt    D-cache clean and invalidate by Set/Way                           No

If an operation is not broadcast, the OS, U-Boot (or whatever software is running) must issue the clean or invalidate operation locally on each core, even if a cache-coherent network/interconnect connects the clusters.
Loads of details are in the "ARM Cortex-A Series Programmer's Guide for ARMv8-A", Version 1.0.
Please also find below a ready-to-use function for address translation that prints all the attributes. This is for EL3, but you should be able to modify it for use at EL2 and EL1 too.
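As a rough illustration of how software might act on the broadcast column, the sketch below encodes the table as a lookup and answers "do I have to repeat this operation on every core?". All names are hypothetical, and treating "Maybe" as "issue per core" is my own conservative assumption for the sketch, not something the guide prescribes.

```c
#include <stddef.h>
#include <string.h>

enum broadcast { BC_NO, BC_YES, BC_MAYBE };

struct maint_op {
    const char    *name;
    enum broadcast bc;
};

/* Broadcast behaviour as listed in the table above. */
static const struct maint_op maint_ops[] = {
    { "IC IALLUIS", BC_YES   },
    { "IC IALLU",   BC_NO    },
    { "IC IVAU",    BC_MAYBE },
    { "DC ZVA",     BC_NO    },
    { "DC IVAC",    BC_YES   },
    { "DC ISW",     BC_NO    },
    { "DC CVAC",    BC_MAYBE },
    { "DC CSW",     BC_NO    },
    { "DC CVAU",    BC_MAYBE },
    { "DC CIVAC",   BC_YES   },
    { "DC CISW",    BC_NO    },
};

/* Look an operation up by mnemonic; returns NULL if it is not in the table. */
static const struct maint_op *find_op(const char *name)
{
    for (size_t i = 0; i < sizeof maint_ops / sizeof maint_ops[0]; i++)
        if (strcmp(maint_ops[i].name, name) == 0)
            return &maint_ops[i];
    return NULL;
}

/* 1: issue on each core, 0: one core is enough, -1: unknown operation.
 * Assumption: "Maybe" is handled conservatively as "not broadcast". */
static int must_issue_per_core(const char *name)
{
    const struct maint_op *op = find_op(name);
    if (op == NULL)
        return -1;
    return op->bc != BC_YES;
}
```

Note that every set/way operation comes out as "per core", which is one reason the by-VA operations are usually preferred on multi-cluster systems.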
#include <stdio.h>

/* Translate a VA with the EL3 stage 1 read lookup and print the PAR_EL1 result. */
static void display_mapping(unsigned long address)
{
    unsigned long par_el1;

    printf("----- Translating VA 0x%lx\n", address);

    /* The ISB makes sure the AT result is visible before PAR_EL1 is read. */
    __asm__ __volatile__ ("at s1e3r, %0\n"
                          "isb\n" : : "r" (address) : "memory");
    __asm__ __volatile__ ("mrs %0, PAR_EL1\n" : "=r" (par_el1));

    if (0 != (par_el1 & 1)) {
        /* F == 1: translation aborted, fault information is in bits [9:1]. */
        printf("Address Translation Failed: 0x%lx\n"
               " FSC: 0x%lx\n"
               " PTW: 0x%lx\n"
               "   S: 0x%lx\n",
               address,
               (par_el1 & 0x7e) >> 1,
               (par_el1 & 0x100) >> 8,
               (par_el1 & 0x200) >> 9);
    } else {
        /* F == 0: PA is in bits [47:12], memory attributes in bits [63:56]. */
        printf("Address Translation Succeeded: 0x%lx\n"
               "  SH: 0x%lx\n"
               "  NS: 0x%lx\n"
               "  PA: 0x%lx\n"
               "ATTR: 0x%lx\n",
               address,
               (par_el1 & 0x180) >> 7,
               (par_el1 & 0x200) >> 9,
               par_el1 & 0xfffffffff000UL,
               (par_el1 & 0xff00000000000000UL) >> 56);
    }
}
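If you want to check the bit slicing without running at EL3, the decoding can be separated from the AT instruction. The following is a minimal sketch of that idea in portable C; the struct and function names are hypothetical, and it uses the same PAR_EL1 fields as the function above (F in bit 0, fault status in [6:1], PTW in bit 8, S in bit 9, SH in [8:7], NS in bit 9, PA in [47:12], ATTR in [63:56]).

```c
#include <stdint.h>

/* Hypothetical container for the decoded PAR_EL1 fields. */
struct par_fields {
    int      failed;       /* F bit: 1 if the translation aborted */
    unsigned fst, ptw, s;  /* meaningful only when failed == 1 */
    unsigned sh, ns, attr; /* meaningful only when failed == 0 */
    uint64_t pa;           /* page-aligned output address, failed == 0 */
};

/* Split a raw PAR_EL1 value into its fields. */
static struct par_fields decode_par(uint64_t par)
{
    struct par_fields f = {0};

    f.failed = (int)(par & 1);
    if (f.failed) {
        f.fst = (unsigned)((par >> 1) & 0x3f);
        f.ptw = (unsigned)((par >> 8) & 1);
        f.s   = (unsigned)((par >> 9) & 1);
    } else {
        f.sh   = (unsigned)((par >> 7) & 3);
        f.ns   = (unsigned)((par >> 9) & 1);
        f.pa   = par & 0x0000fffffffff000ULL;
        f.attr = (unsigned)((par >> 56) & 0xff);
    }
    return f;
}

/* PAR_EL1 only holds PA[47:12]; add the offset within the 4KB page from
 * the VA to obtain the full physical address. */
static uint64_t full_pa(const struct par_fields *f, uint64_t va)
{
    return f->pa | (va & 0xfffULL);
}
```

On target you would feed it the value read with `mrs %0, PAR_EL1` after the AT instruction; off target it can be exercised with hand-built values.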
But this is VA to PA mapping. AFAIK there is no way to get a VA by writing PA to a register. Right?
As far as I know, there is no way to get a VA from a PA unless you connect a CoreSight debugger, e.g. DS-5, which will tell you that.
But let's take it the other way round: why would anyone want to flush (or invalidate, or clean and invalidate) by PA? ARM gives you the option to flush by set/way or by VA. I cannot see a use case where these wouldn't be sufficient.
DMA controllers use the PA. So I could think of a scenario, where you need to flush the cache before activating the DMA. But as I first wrote, in general one should know which PA is mapped to which VA. Esp. because a PA range could be mapped to multiple VA ranges.
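For the DMA case, the usual approach is to clean (or clean and invalidate) the buffer by VA, one cache line at a time, before starting the transfer. Below is a minimal sketch under a few assumptions: the function names are hypothetical, the line size is taken from the DminLine field of CTR_EL0 (bits [19:16], log2 of the line size in 4-byte words), and the AArch64-only part assumes the code runs somewhere DC CIVAC and reads of CTR_EL0 are permitted. The range walk itself is plain C so it can be checked on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest D-cache line size in bytes, from a raw CTR_EL0 value. */
static unsigned dcache_line_bytes(uint64_t ctr_el0)
{
    return 4u << ((ctr_el0 >> 16) & 0xf);
}

typedef void (*line_op_fn)(uintptr_t va, void *ctx);

/* Visit every cache line overlapping [start, start + len). 'line' must be
 * a power of two. 'op' may be NULL to just count. Returns lines visited. */
static size_t for_each_line(uintptr_t start, size_t len, unsigned line,
                            line_op_fn op, void *ctx)
{
    size_t n = 0;

    if (len == 0)
        return 0;

    uintptr_t va  = start & ~((uintptr_t)line - 1); /* round down to a line */
    uintptr_t end = start + len;

    for (; va < end; va += line) {
        if (op != NULL)
            op(va, ctx);
        n++;
    }
    return n;
}

#if defined(__aarch64__)
/* Clean and invalidate one line by VA to the Point of Coherency. */
static void civac_line(uintptr_t va, void *ctx)
{
    (void)ctx;
    __asm__ __volatile__ ("dc civac, %0" : : "r" (va) : "memory");
}

/* Hypothetical helper: push a buffer out to memory before handing its
 * physical address to a non-coherent DMA engine. */
static void clean_inval_range(const void *buf, size_t len)
{
    uint64_t ctr;

    __asm__ __volatile__ ("mrs %0, ctr_el0" : "=r" (ctr));
    for_each_line((uintptr_t)buf, len, dcache_line_bytes(ctr),
                  civac_line, NULL);
    __asm__ __volatile__ ("dsb sy" : : : "memory"); /* wait for completion */
}
#endif
```

Because the maintenance is done by VA, the software only needs to hold on to the VA it used to map the DMA buffer; the PA is needed solely for programming the DMA controller.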
If we consider processing elements other than the ARM cores then yes, there could be loads of them: besides DMA, there may be a DSP, another Cortex core (e.g. a Cortex-M integrated into the SoC), an FPGA outside of the SoC, etc. Some, such as ARM cores and/or DSPs, may have coherent caches if they are configured to both send and receive snoops; others won't, being able only to send snoops.
But unless that is implemented, these other elements cannot run (manual) cache maintenance operations, though they may participate in the coherency managed in hardware.
On top of that (I'm by no means an expert in this, I only know of it), the system may have a System MMU. In that case even the DMA shouldn't need to know the PA-to-VA mapping, as it operates only on VAs.