I am currently working on a Xilinx Zynq UltraScale+ (US+) SoC, where the R5 (2 cores in lockstep), the A53 (4 cores), the PL, and the GPU are integrated on a single chip.
So far we have been using a software-based cache coherency mechanism to communicate between the R5 and A53 worlds…
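For readers unfamiliar with the term, here is a minimal sketch of what software-managed coherency usually means for a shared-memory mailbox: the sender flushes its cache after writing, and the receiver invalidates its cache before reading. This is a host-side simulation, not our actual code. `mailbox_t`, `core_view_t`, `sim_cache_flush`, `sim_cache_invalidate`, `mailbox_send`, and `mailbox_recv` are all hypothetical names. On the Xilinx standalone BSP the two stand-ins would presumably map to `Xil_DCacheFlushRange` and `Xil_DCacheInvalidateRange` from `xil_cache.h`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_WORDS 4

/* Hypothetical mailbox layout placed in DDR shared by both cores. */
typedef struct {
    uint32_t payload[MSG_WORDS];
    uint32_t ready;               /* set to 1 once payload is valid */
} mailbox_t;

/* Simulation only: each core sees DDR through its own private cached copy. */
typedef struct {
    mailbox_t *ddr;               /* backing memory shared by both cores */
    mailbox_t  cache;             /* this core's cached view of the mailbox */
} core_view_t;

/* Stand-in for a flush (clean) by range: write cached data back to DDR. */
static void sim_cache_flush(core_view_t *c)
{
    *c->ddr = c->cache;
}

/* Stand-in for an invalidate by range: drop cached data, reload from DDR. */
static void sim_cache_invalidate(core_view_t *c)
{
    c->cache = *c->ddr;
}

/* Sender side (e.g. R5): write payload and flag, then flush to DDR. */
static void mailbox_send(core_view_t *tx, const uint32_t *data)
{
    memcpy(tx->cache.payload, data, sizeof tx->cache.payload);
    tx->cache.ready = 1;
    sim_cache_flush(tx);
}

/* Receiver side (e.g. A53): invalidate first, then read.
 * Returns 1 if a message was copied to out, 0 if none is ready. */
static int mailbox_recv(core_view_t *rx, uint32_t *out)
{
    sim_cache_invalidate(rx);
    if (!rx->cache.ready) {
        return 0;
    }
    memcpy(out, rx->cache.payload, sizeof rx->cache.payload);
    return 1;
}
```

On real hardware the buffer would also need to be cache-line aligned, and memory barriers would be needed so the payload is visible before the flag; the simulation leaves both out.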