Hello everyone, this is my first question to the ARM community; please excuse my ignorance. Fairly recently, I shared OCM (on-chip memory) on a Xilinx Zynq processor (a dual-core ARM Cortex-A9). To pass messages between the two cores, I followed the Xilinx example and turned off the L1 and L2 caches for this OCM. At first, though, I tried to keep L1 caching enabled for the OCM and rely on DMB to do the job, which did NOT work. I realized that I did NOT understand caches after all. Scanning the web, this is what I gathered:
But beyond the basics, I have a few questions:
Thank you very much for considering my questions. Perhaps these are such basic questions, and I should have learned this in school, but I could not find a clear discussion online.
There is a better option than cleaning/invalidating the whole L1/L2 cache: DCIMVAC (Invalidate data cache line by MVA to PoC) and DCCMVAC (Clean data cache line by MVA to PoC). Please refer to the "ARM® Architecture Reference Manual ARMv7-A" for details. Good luck.
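As a rough illustration of the by-MVA operations named above, here is a minimal sketch of cleaning or invalidating only the cache lines that cover one buffer. It assumes 32-byte L1 data cache lines (as on the Cortex-A9) and privileged execution on the ARM core. The function names, the `lines_touched` counter, and the non-ARM stub branch are hypothetical and exist only so the sketch can be compiled and checked on a host machine. It does not cover any outer cache controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte L1 data cache lines, as on the Cortex-A9. */
#define CACHE_LINE_SIZE 32u

/* Hypothetical counter, only updated when built for a non-ARM host. */
unsigned long lines_touched;

/* DCCMVAC: clean one data cache line by MVA to PoC. */
static inline void dccmvac(uintptr_t mva)
{
#if defined(__arm__)
    __asm__ volatile("mcr p15, 0, %0, c7, c10, 1" :: "r"(mva) : "memory");
#else
    (void)mva;
    lines_touched++;            /* host stub: just count the line */
#endif
}

/* DCIMVAC: invalidate one data cache line by MVA to PoC. */
static inline void dcimvac(uintptr_t mva)
{
#if defined(__arm__)
    __asm__ volatile("mcr p15, 0, %0, c7, c6, 1" :: "r"(mva) : "memory");
#else
    (void)mva;
    lines_touched++;            /* host stub: just count the line */
#endif
}

/* Wait for the preceding maintenance operations to complete. */
static inline void dsb(void)
{
#if defined(__arm__)
    __asm__ volatile("dsb" ::: "memory");
#endif
}

/* Write back every line that overlaps [buf, buf + len). */
void clean_dcache_range(const void *buf, size_t len)
{
    if (len == 0)
        return;
    uintptr_t addr = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end  = (uintptr_t)buf + len;
    for (; addr < end; addr += CACHE_LINE_SIZE)
        dccmvac(addr);
    dsb();
}

/* Discard every line in [buf, buf + len). Invalidation throws away dirty
 * data, so this sketch insists on a line-aligned, line-sized buffer to
 * avoid destroying neighbouring variables that share a line. */
void invalidate_dcache_range(void *buf, size_t len)
{
    assert(((uintptr_t)buf % CACHE_LINE_SIZE) == 0);
    assert((len % CACHE_LINE_SIZE) == 0);
    uintptr_t addr = (uintptr_t)buf;
    uintptr_t end  = addr + len;
    for (; addr < end; addr += CACHE_LINE_SIZE)
        dcimvac(addr);
    dsb();
}
```

In this scheme the writer would call `clean_dcache_range()` after filling the buffer, and the reader would call `invalidate_dcache_range()` before reading it, so only the lines belonging to the message are touched.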
Hello,
Because each core has its own L1 cache, cleaning the cache on one core alone will not help.
You need to keep both cores synchronized.
To do so while still using the caches, you should enable the SCU (Snoop Control Unit) and the MMU, and mark the OCM region as shared.
Once all of that is set up, cleaning or invalidating is not necessary.
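To make "mark the OCM region as shared" a little more concrete, here is a minimal sketch of building a first-level section descriptor with the S (shareable) bit set. It assumes the ARMv7-A short-descriptor translation table format with TEX remap disabled, and Normal memory that is write-back, write-allocate (TEX=0b001, C=1, B=1). The function name, the macro names, and the example base address are hypothetical. Both cores would need matching attributes for the region, and on the Cortex-A9 each core also has to take part in coherency (SCU enabled, SMP bit set in the Auxiliary Control Register).

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor, first-level section entry (1 MB). */
#define SECTION_SIZE    0x00100000u
#define DESC_SECTION    0x2u                      /* bits[1:0] = 0b10 */
#define DESC_B          (1u << 2)                 /* bufferable       */
#define DESC_C          (1u << 3)                 /* cacheable        */
#define DESC_XN         (1u << 4)                 /* execute never    */
#define DESC_DOMAIN(d)  ((uint32_t)(d) << 5)      /* bits[8:5]        */
#define DESC_AP_RW      (0x3u << 10)              /* AP[1:0] = 0b11   */
#define DESC_TEX(t)     ((uint32_t)(t) << 12)     /* bits[14:12]      */
#define DESC_S          (1u << 16)                /* shareable        */

/* Build a descriptor for shared, cacheable (write-back, write-allocate)
 * Normal memory. Hypothetical helper, not taken from any vendor BSP. */
uint32_t shared_wbwa_section(uint32_t phys_base, unsigned domain)
{
    assert((phys_base & (SECTION_SIZE - 1u)) == 0); /* 1 MB aligned */
    assert(domain < 16u);
    return phys_base
         | DESC_S            /* lets the SCU keep the L1 caches coherent */
         | DESC_TEX(1u)      /* TEX=001 with C=1, B=1: WB, write-allocate */
         | DESC_AP_RW        /* read/write at any privilege level */
         | DESC_DOMAIN(domain)
         | DESC_XN           /* data only, never executed */
         | DESC_C
         | DESC_B
         | DESC_SECTION;
}
```

For example, `shared_wbwa_section(0xFFF00000u, 0)` yields the entry for the 1 MB section at that (assumed) address; the value would then be written into the first-level table, followed by the usual TLB maintenance.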
Best regards,
Yasuhiko Koumoto.
Thank you for the helpful comment, Koumoto san. Please excuse me for being dense: I thought that if I can somehow make sure to force migration of the data from CPU1's L1 cache to the OCM itself, and if cache coherency of CPU0 was enabled, the cache coherency HW itself will somehow take care of synchronization. It sounds like SCU is the HW that is taking care of coherency. Since SCU is a shared HW between CPU0 and CPU1, I believe CPU1 IS using SCU by default. MMU is a per-CPU HW, but I have enabled MMU on both CPU0 (Linux) and CPU1 (bare-metal).
So unless I misunderstood, I think that if I can just figure out WHICH cache-line to flush for the data I just wrote--WITHOUT flushing the WHOLE L1 cache, I am set. I am still trying to understand yifanfeng's earlier comment, because I really don't want to touch the entire L1 cache. I just cannot believe that when programmers have to flush the cache, they would flush the whole cache...
The SCU is the hardware that maintains cache coherency between CPU0 and CPU1. It operates on a per-cache-line basis.
An MPCore configuration includes the SCU.
If a shared cache line is modified on one CPU, the corresponding cache line on the other CPU is invalidated.
The whole L1 cache is never flushed unless you force it with cache maintenance instructions.
By the way, as of ARMv7, the single operations that act on the whole data cache have been removed.
I think I understand. If the OCM had been configured as shared, the SCU would have done the heavy lifting for me, and I would not have had to worry about L1 cache operations. The root of the problem is that the Xilinx OCM driver uses the device memory model (NOT the normal, cached model), via the devm_ioremap_resource() call, as I found out in my blog entry. Therefore, the other CPU had to turn off the cache too.
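For reference, once a region really is mapped as shared, cacheable Normal memory on both cores, the message passing itself comes down to ordering the payload writes before the "ready" flag. A minimal sketch using C11 atomics follows; on ARMv7 a compiler typically implements the release/acquire operations with DMB barriers. The `struct mailbox` layout and the function names are hypothetical, and the sketch assumes one sender and one receiver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define MSG_MAX 56u

/* Hypothetical single-slot mailbox placed in the shared region. */
struct mailbox {
    atomic_uint ready;            /* 0 = empty, 1 = message present */
    uint32_t    len;
    uint8_t     payload[MSG_MAX];
};

/* Sender side. Returns 0 on success, -1 if the slot is still full. */
int mailbox_send(struct mailbox *mb, const void *msg, uint32_t len)
{
    assert(len <= MSG_MAX);
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) != 0)
        return -1;
    memcpy(mb->payload, msg, len);
    mb->len = len;
    /* Release: payload and len become visible before the flag does. */
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
    return 0;
}

/* Receiver side. Returns the number of bytes copied, or -1 if empty. */
int mailbox_receive(struct mailbox *mb, void *out, uint32_t out_size)
{
    /* Acquire: the payload is read only after the flag has been seen. */
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
        return -1;
    uint32_t len = mb->len;
    if (len > out_size)
        len = out_size;
    memcpy(out, mb->payload, len);
    /* Hand the slot back to the sender. */
    atomic_store_explicit(&mb->ready, 0, memory_order_release);
    return (int)len;
}
```

The barriers only order accesses; they do not move data out of a cache. That is why this pattern depends on the hardware coherency described above, and why DMB alone cannot help when the two sides map the region with different memory attributes.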
Thanks for the explanation Koumoto san.