I have a Zynq UltraScale+ design with the following setup, and I'm having trouble correctly invalidating the L2 cache regions that the FPGA fabric accesses over the ACP port:
The typical setup is something like the following:
What I'm seeing is that even though A53 core 0 invalidates the L2 cache for all the memory regions that the FPGA accelerator will access, the accelerator still reads stale data from the cache. I can confirm this by dynamically changing the ARCACHE flags on the ACP transactions (before starting any tasks) so that they never allocate into the L2 cache: the correct data is then read until I re-enable cache allocation, at which point the cache gets filled and stale data starts being returned.
It seems that the A53 core's invalidation attempts are not actually invalidating the cache lines that were allocated by the ACP reads from the FPGA. It IS correctly invalidating the cache for memory regions accessed by the A53 core tasks, since accesses from those tasks behave consistently with what has been invalidated.
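To make "invalidate the L2 cache for a memory region" concrete, here is a minimal sketch of a by-address invalidation over a buffer. It is an illustration under stated assumptions, not the code from this design: it assumes a 64-byte cache line (the Cortex-A53 line size), and the names `for_each_cache_line`, `record_line`, `invalidate_range` and the `BARE_METAL_TARGET` macro are hypothetical. The range walk is kept separate from the maintenance instruction so it can be dry-run on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on the Cortex-A53. */
#define CACHE_LINE_SIZE 64u

/* Callback applied to each cache-line-aligned address in a range. */
typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/* Apply `op` to every cache line overlapping [addr, addr + len).
 * Returns the number of lines visited. Does not handle address wrap-around. */
static size_t for_each_cache_line(uintptr_t addr, size_t len,
                                  line_op_fn op, void *ctx)
{
    if (len == 0) {
        return 0;
    }
    uintptr_t line = addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1u); /* round down */
    uintptr_t end = addr + len;                                 /* exclusive  */
    size_t count = 0;
    for (; line < end; line += CACHE_LINE_SIZE) {
        op(line, ctx);
        count++;
    }
    return count;
}

/* Host-side dry run: record the first and last line that would be touched. */
struct line_span {
    uintptr_t first;
    uintptr_t last;
    size_t n;
};

static void record_line(uintptr_t line_addr, void *ctx)
{
    struct line_span *s = (struct line_span *)ctx;
    if (s->n == 0) {
        s->first = line_addr;
    }
    s->last = line_addr;
    s->n++;
}

/* Hypothetical guard macro: only build the real maintenance ops on target. */
#if defined(__aarch64__) && defined(BARE_METAL_TARGET)
static void dc_civac_line(uintptr_t line_addr, void *ctx)
{
    (void)ctx;
    /* Clean and invalidate one line by virtual address to the point of coherency. */
    __asm__ volatile("dc civac, %0" : : "r"(line_addr) : "memory");
}

static void invalidate_range(uintptr_t addr, size_t len)
{
    for_each_cache_line(addr, len, dc_civac_line, NULL);
    /* Wait for the maintenance operations to complete. */
    __asm__ volatile("dsb sy" : : : "memory");
}
#endif
```

A buffer that starts or ends partway through a line still has that whole line cleaned and invalidated, so buffers shared with the accelerator are usually aligned and padded to the line size. Whether by-address maintenance of this kind reaches lines allocated through the ACP is exactly the open question here; the sketch only shows what the invalidation step amounts to.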
Is there something else I need to be doing to get it to correctly invalidate the L2 cache?