Hi All!
I am working with a Xilinx Zynq 7000 SoC which uses the Cortex A9 as a CPU.
I've observed a problem wherein a section of memory marked strongly-ordered and non-cacheable (0xc02) in the MMU table gets corrupted by what appears to be L1 evictions back to DDR.
Setup: Linux master on CPU0 with FreeRTOS on CPU1. During the boot process, the region from 512MB to 1GB is marked 0xc02 in the translation table and aliased back to the lower region (0 to 512MB). This has the effect of allowing accesses to the same physical memory with different region attributes. The Linux CPU owns the L2 cache and its controller, the L2 cache is disabled on CPU1.
Thus, a pointer offset by 0x20000000 from the original value returned by malloc should be uncacheable, with all accesses through it going straight to DDR. I am using a buffer of 1024 integers, allocated with malloc and then offset to make all accesses uncacheable.
Issue: After performing a memcpy to the uncached buffer, the value matches the source exactly. However, after a short amount of time, the uncached buffer drifts from the source (which remains unchanged throughout). When the buffer is instead marked as cached, this corruption does not occur, which leads me to believe that stale data is being evicted from the L1 cache and overwriting the new clean data that was placed in DDR.
I have tried disabling, flushing, and invalidating the cache (both before and after the memcpy), but none of this helped. The buffer is not aligned to the L1 cache line size, which could corrupt the first and last entries via accesses to adjacent data through the cached pointers, but the corruption is spread randomly throughout the buffer in chunks of 8 entries (8 * 4 = 32 bytes, the L1 line size). Additionally, I've tried disabling the prefetch bits in the ACTLR. Looking at the disassembly of memcpy, though, it only issues PLD instructions for the source, not the destination.
What else could be the cause of this, and what else could I try to fix the issue of not being able to write to an uncached region?
Thanks!!!
This has the effect of allowing accesses to the same physical memory with different region attributes
No it does not "allow" it; mapping the same physical memory region with different page table attributes is explicitly and repeatedly documented as "you must not ever do this" in the ARM ARM.
Horrible things will likely go wrong, even if your software never touches the "wrong" mapping. Many A-profile cores will speculatively prefetch "normal / cached" memory, so they may access a conflicting mapping even though it is never explicitly accessed by the executing code. And that's not to mention the impact of aliased mappings on systems relying on hardware cache coherency (which is likely your problem here).
This presentation from Mark Rutland is a good run-through of the pitfalls of modern memory systems and caches:
Taming the Chaos of Modern Caches (Linux.com)
The Linux CPU owns the L2 cache and its controller, the L2 cache is disabled on CPU1.
It's a single cache shared by both cores. It's either on or off - it can't be on for one, and off for the other.
HTH,
Pete
Hi Pete, dedowes,
The Linux CPU owns the L2 cache and its controller, the L2 cache is disabled on CPU1.

It's a single cache shared by both cores. It's either on or off - it can't be on for one, and off for the other.
As an interesting side-note, while you can't "disable" the L2 cache on a per-CPU basis, the CPU's SCTLR.C bit will affect whether that CPU can generate what the L2 cache (or, more specifically, the L2 memory system) sees as cacheable transactions. This is somewhat architecturally defined, but disabling CPU1's L1 just to prevent allocation into L2 is probably an awful idea. Note that disabling caches does NOT prevent lookups or hits in caches, nor does it technically prevent a Device or Strongly-ordered memory access from being looked up in a cache (that seems counter-intuitive, but it is architecturally acceptable).
It's possible, if that L2 cache is an L2C-310 synthesized with the option that enables "Lockdown by Master ID", to configure the L2 cache to lock all ways against allocation by a particular CPU, effectively 'disabling' L2 for that CPU. Any transaction will still have to pass through the L2C-310 on its way to L3, though; there is no short-circuit.
~
What's really wrong here, though, is exactly as Pete says: you can't map two virtual addresses with two sets of conflicting attributes to the same physical address. From your description of the problem, it isn't so much the L2 cache causing the issue here, nor who "owns" it, but flouting the rules of memory coherency, which will bite you on any processor architecture (not just ARM). Even if you consider a single CPU and the OS running on it to be the only thing generating accesses to that physical memory location, you still have to abide by the rules of the memory model, because there are multiple observers (not just CPUs, but also the MMU table walker and the instruction- and data-side logic).
If you've got cacheable memory then by any definition you HAVE to deal with cache coherency. Simply mapping it as strongly-ordered somewhere else would never remove the requirement to maintain the caches for the cacheable alias, not only before and after using the cacheable alias, but before and after using the strongly-ordered alias too.
Unfortunately, you're detailing the symptoms of a problem but you never really described the original intent - is this 512MB strongly-ordered alias an attempt by FreeRTOS to read the memory that Linux is using? Or is it a buffer owned solely by FreeRTOS as an alias of its own (not-shared-with-Linux) cacheable memory?
Ta,
Matt
As an interesting side-note, while you can't "disable" the L2 cache on a per-CPU basis, the CPU SCTLR.C bit will affect whether the CPU can generate what the L2 cache (or, more specifically, the L2 memory system) sees as cacheable transactions.
Interesting - thanks Matt - I didn't know about that one.
Cheers,
I figured I would cross-check this, since I had a very small doubt about it, and it turns out that the Cortex-A9 isn't quite as nice about it. The Cortex-A9 TRM r4p1 states that when SCTLR.C=0 all accesses to Cacheable regions are treated as Normal Non-Cacheable without lookup in L1, but then it goes on to say:
ARUSER[4:0] and AWUSER[4:0] directly reflect the value of the Inner attributes and Shared attribute as defined in the corresponding page descriptor. They do not reflect how the Cortex-A9 processor interprets them, and whether the access was treated as Cacheable or not
That may interfere with things a little, since even though the core will never allocate into L1, it will still present Shareability and Cacheability attributes externally, to the L2C-310. I am sure there are some cute use cases for skipping L1 but wanting to use L2, but as hard as I look, I actually can't find anything architecturally that would prevent a processor from presenting the translation-table attributes rather than the ones it actually used (in fact, doing so would not be terribly friendly to designs with a system cache). The case of requiring a processor not to output any cacheability or shareability attributes on the bus is well handled by marking translation tables correctly, or just by not turning on the MMU in the first place.
So, I was dead wrong. Oops.
Thanks for checking