Correctly invalidating Cortex-A53 shared L2 cache for access through ACP?

Dylan Barrie 1 month ago

I've got a Zynq UltraScale+ design with the following setup, and I'm having trouble correctly invalidating L2 cache regions that are accessed over the ACP port from the FPGA fabric:

PCIe interface in the FPGA fabric, copies data to DDR
A53 cores used to run tasks, operating on that memory
- MMU is used by the A53 cores, and all memory regions in question here are marked as Normal Cacheable, Inner Shareable.
FPGA contains hardware acceleration for parts of those tasks, accessing the L2 cache through the ACP port
- FPGA accelerator contains its own (small) L1 cache, and only reads memory.
- Reads through the ACP port are set to allocate in the L2 cache.
- Reads are specified as Outer Shareable (since this device is outside of the "inner" A53 cluster).
- Note that the FPGA accelerator does NOT touch the same memory as is used by the A53 cores! The A53 cores should never be pulling this memory into their L1 caches.
Before running tasks, A53 cores invalidate their own L1 caches for the regions of memory they will be accessing, followed by core 0 invalidating L2 for those same regions
- Core 0 also invalidates the L2 for memory regions that will be read by the FPGA accelerator
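
For readers following along, the range invalidation step described above can be sketched roughly as below. This is only an illustration of walking the cache lines that overlap a buffer, not the actual firmware: `for_each_cache_line`, `line_op_fn`, and `count_line` are hypothetical names, and the hook is where a real build would issue `dc ivac` on each line (followed by a `dsb` once the loop finishes). The 64-byte line size used in the usage note is an assumption about the A53 caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: performs one maintenance operation on the cache line
 * that starts at `addr`. On real hardware this would wrap `dc ivac, <Xt>`. */
typedef void (*line_op_fn)(uintptr_t addr, void *ctx);

/* Apply `op` to every cache line overlapping [start, start + len).
 * Returns the number of lines visited. `line_size` must be a power of two. */
static size_t for_each_cache_line(uintptr_t start, size_t len, size_t line_size,
                                  line_op_fn op, void *ctx)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (len == 0)
        return 0;

    uintptr_t mask  = ~(uintptr_t)(line_size - 1);
    uintptr_t first = start & mask;             /* round start down to a line */
    uintptr_t last  = (start + len - 1) & mask; /* line holding the last byte */
    size_t count = 0;

    for (uintptr_t line = first; ; line += line_size) {
        op(line, ctx);
        count++;
        if (line == last)
            break;
    }
    return count;
}

/* Stand-in hook for host-side testing: only counts how often it is called. */
static void count_line(uintptr_t addr, void *ctx)
{
    (void)addr;
    (*(size_t *)ctx)++;
}
```

With a 64-byte line, a 64-byte buffer that starts one byte past a line boundary touches two lines, so an unaligned buffer needs one more maintenance operation than its length alone suggests.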

The typical setup is something like the following:

Host CPU sets up a bunch of data, uploads to device, and kicks off tasks to run
Similar tasks are executed on the same data for a while...
Host CPU changes some portion of the data, uploads to device, and kicks off new tasks to run
Run similar tasks for a bit, repeat

The behavior I am seeing is that, despite A53 core 0 invalidating the L2 cache for all the memory regions that the FPGA accelerator will access, the accelerator still reads stale data from the cache. I can confirm this by dynamically changing the ARCACHE flags on the ACP transactions to disable allocation into the L2 cache (before starting any tasks): the correct data is then read until cache allocation is re-enabled, at which point the cache gets filled and stale data starts being returned.

It seems that the attempts by the A53 core to invalidate the L2 cache are not actually invalidating the portions of the cache that were allocated by the ACP reads from the FPGA. It IS correctly invalidating the cache for regions of memory that are accessed by the A53 core tasks, as cache accesses through those behave consistently with respect to what's been invalidated.

Is there something else I need to be doing to get it to correctly invalidate the L2 cache?

0 Dylan Barrie 1 month ago
As some additional evidence toward it being something funky with the L2, if I fully invalidate the entire L2 cache (via set/way iteration) instead of just invalidating the address ranges that have been modified, I no longer see stale data through the ACP.
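
A rough sketch of how the operand for a set/way operation (`dc isw`) is put together, in case it helps anyone reproduce the full-L2 invalidation. The helper names are hypothetical, and the geometry in the usage note (16 ways, 64-byte lines) is an assumption about the A53 L2; real code should read the geometry from `CCSIDR_EL1` after selecting the cache with `CSSELR_EL1`.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(n)) for 1 <= n <= 2^31. */
static unsigned ceil_log2(uint32_t n)
{
    assert(n >= 1 && n <= ((uint32_t)1 << 31));
    unsigned bits = 0;
    while (((uint32_t)1 << bits) < n)
        bits++;
    return bits;
}

/* Build the register operand for a data cache set/way operation.
 * level      : 1-based cache level (2 for the shared L2)
 * way, set   : indices of the line to operate on
 * num_ways   : associativity of that cache level
 * line_bytes : cache line length in bytes
 * Layout: way in the top bits of the low word, set above the line offset
 * bits, and (level - 1) in bits [3:1]. */
static uint64_t setway_operand(unsigned level, uint32_t way, uint32_t set,
                               uint32_t num_ways, uint32_t line_bytes)
{
    assert(level >= 1 && level <= 7);
    assert(num_ways >= 1 && way < num_ways);

    unsigned way_bits   = ceil_log2(num_ways);
    unsigned line_shift = ceil_log2(line_bytes);

    uint64_t operand = ((uint64_t)set << line_shift)
                     | ((uint64_t)(level - 1) << 1);
    if (way_bits != 0) /* a direct-mapped cache has no way field */
        operand |= (uint64_t)way << (32 - way_bits);
    return operand;
}
```

A full invalidation would loop over every way and set, passing each operand to `dc isw`. Set/way operations do not look at the address tag at all, which fits with this approach clearing lines that the by-address invalidation missed.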

I'm thinking that there must be something that's making the addresses specified by the firmware when invalidating the L2 cache not match the address that the ACP is trying to access, as far as the L2 is concerned. Some more context:

The firmware here is running in EL3

The interrupt handler that triggers this cache invalidation is also executing at EL3 (confirmed by reading CurrentEL)

The ACP transactions are marked as data, secure, unprivileged (ARPROT = 3'b000)
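
As a quick reference for the ARPROT value above, here is a small decoder for the three AXI protection bits. The struct and function names are made up for illustration. The bit meanings follow the AXI AxPROT encoding: bit 0 is privileged, bit 1 is non-secure, bit 2 is instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a 3-bit AXI AxPROT value (hypothetical helper type). */
struct axi_prot {
    bool privileged;   /* bit 0: 1 = privileged,  0 = unprivileged */
    bool non_secure;   /* bit 1: 1 = non-secure,  0 = secure       */
    bool instruction;  /* bit 2: 1 = instruction, 0 = data         */
};

static struct axi_prot decode_axprot(uint8_t prot)
{
    assert(prot < 8); /* AxPROT is only three bits wide */
    struct axi_prot p = {
        .privileged  = (prot & 0x1) != 0,
        .non_secure  = (prot & 0x2) != 0,
        .instruction = (prot & 0x4) != 0,
    };
    return p;
}
```

Decoding `3'b000` gives unprivileged, secure, data, matching the description above. The bit that matters for this problem is bit 1, since it selects the secure or non-secure physical address space.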

I've been poring over the Cortex-A series programmer's guide and the TRM sections on the MMU and cache, and from my understanding there may be a mismatch between virtual addresses in the L2 when one is tagged as secure and another with the same VA is tagged as non-secure, as the two security domains are completely separate from each other. Everything in the firmware should be operating in the secure domain (it's all in EL3), but maybe there is something I've missed that could affect that?
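
To make the suspected mismatch concrete, here is a toy model of a cache line tag that carries a security bit alongside the address. It is a simplification for illustration only: the type and function names are invented, and it assumes the lookup compares the physical address together with the NS state, so a by-address invalidate issued with the wrong security state misses the line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one cache line's tag (hypothetical, not the real L2 layout). */
struct line_tag {
    uint64_t pa;    /* line-aligned physical address          */
    bool     ns;    /* true if allocated by a non-secure access */
    bool     valid;
};

/* A lookup hits only when both the address and the security state match. */
static bool tag_matches(const struct line_tag *line, uint64_t pa, bool ns)
{
    assert(line != NULL);
    return line->valid && line->pa == pa && line->ns == ns;
}

/* Model of invalidate-by-address: clears the line only on a hit.
 * Returns true if the line was actually invalidated. */
static bool invalidate_if_match(struct line_tag *line, uint64_t pa, bool ns)
{
    if (!tag_matches(line, pa, ns))
        return false;
    line->valid = false;
    return true;
}
```

In this model, a line allocated by a secure ACP read stays valid when the invalidate arrives with the same address but `ns = true`, which is the stale-data symptom described in the question.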

+1 Dylan Barrie 1 month ago

And after looking more closely at the MMU translation table, I realized that the memory region in question was marked as non-secure. Thus, the addresses don't match: the CPU performs the invalidation with the non-secure address, while the FPGA reads from the secure address.

Won't be making that mistake again, sheesh!
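
For anyone hitting the same thing, a minimal sketch of checking for this in a translation table entry. It assumes AArch64 stage 1 block or page descriptors, where the NS attribute sits at bit 5 of the lower attributes; the macro and function names are hypothetical. Table descriptors use a separate NSTable bit instead, which this sketch does not handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_VALID (1ull << 0) /* descriptor valid bit                    */
#define DESC_NS    (1ull << 5) /* NS attribute in block/page descriptors  */

/* True if a valid block/page descriptor maps a non-secure physical address. */
static bool desc_is_nonsecure(uint64_t desc)
{
    assert((desc & DESC_VALID) != 0);
    return (desc & DESC_NS) != 0;
}

/* Return a copy of the descriptor with NS cleared (secure mapping).
 * All other attribute and address bits are left unchanged. */
static uint64_t desc_make_secure(uint64_t desc)
{
    return desc & ~DESC_NS;
}
```

After changing a live descriptor, the usual barriers and TLB invalidation still apply before the new attribute takes effect.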