Hi all,
A question about the Cortex-A8 processor.
If I enable the L1 and L2 caches, I see a performance boost even though the MMU is disabled. I was under the impression that the MMU has to be enabled in order to use the caches. I also do not see any errors or data mismatches: my project works the same way whether the cache is disabled or enabled (with better performance in the second case), leading me to believe that there are no problems with using the cache without enabling the MMU. Is this the expected behavior?
I am also defining the page directory to mark some memory as uncacheable. This also seems to work when I do not enable the MMU. Can I ignore the MMU and use the cache safely?
I read this post, but I am still not clear on what to expect from a software point of view.
For further context, I am using the BBB device for a project that requires good performance, which is when I stumbled on this. The project does not require virtual memory. It does need uncacheable memory for some device drivers that use DMA; every other piece of memory can be cached, and all processes share the same address space.
Cheers!
There is a default cache behavior which applies to _all_ memory as long as the MMU is disabled.
See section "B3.2.1 VMSA behavior when a stage 1 MMU is disabled" in the ARM Architecture Reference Manual, ARMv7-A/R edition (ARM DDI 0406C.c):
- data: "The stage 1 translation assigns the Strongly-ordered memory type. Note: This means the access is Non-cacheable. Unexpected data cache hit behavior is IMPLEMENTATION DEFINED."
- instruction: "The stage 1 translation assigns the Cacheable, Inner Write-Through no Write-Allocate, Outer Write-Through no Write-Allocate attribute."
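These two defaults are selected by the SCTLR control bits. The register read itself needs a target-side instruction (MRC p15, 0, Rd, c1, c0, 0 on ARMv7-A), but the bit decoding can be sketched as plain C; the helper names here are my own, not from any header:

```c
#include <stdint.h>
#include <stdbool.h>

/* SCTLR bit positions on ARMv7-A */
#define SCTLR_M (1u << 0)   /* MMU enable                    */
#define SCTLR_C (1u << 2)   /* data/unified cache enable     */
#define SCTLR_I (1u << 12)  /* instruction cache enable      */

/* With SCTLR.M == 0, data accesses get the Strongly-ordered
 * (Non-cacheable) default described in B3.2.1, regardless of
 * whether SCTLR.C is set. */
static bool data_defaults_strongly_ordered(uint32_t sctlr)
{
    return (sctlr & SCTLR_M) == 0;
}

/* Instruction fetches, by contrast, can still be cached with the
 * MMU off, provided SCTLR.I is set. */
static bool instruction_fetches_cacheable(uint32_t sctlr)
{
    return (sctlr & SCTLR_I) != 0;
}
```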
The MMU is primarily used to perform address translation. If the MMU is disabled, that effectively means no translation is happening. If you want to perform address translation (using sections or supersections), you have to use the MMU. The MMU registers tell the hardware how to perform the translation and what to do once an address/data item has been fetched. Keep in mind that there are instruction caches, data caches and unified caches. Typically, the L1 instruction cache does not specifically need a translation table, and it might be enabled as a default behavior. Hence, even if the MMU is disabled, the L1 instruction cache can still be used, providing the performance boost.
From the quoted text of the ARM ARM you can see that, without the MMU, data accesses are not cached. Write-back behavior in particular is what boosts software performance.
What he will notice is the speed-up of instruction fetches.
Some of our test cases read from the same memory location continuously. This gave a speedup of 36 times when the cache was enabled. On changing the reads to locations that are one cache line width apart, there is no more speedup. It looks like, in the first case, every access was a cache hit, and in the second case, every access was a miss. I am fairly confident that the data accesses are cached.
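For reference, a test like the one described could be sketched as below. This is my own reconstruction, not the poster's actual code; the 64-byte line size is an assumption for the Cortex-A8 L1:

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 64u           /* assumed L1 line size */
#define ITERS (1u << 20)

/* Read the same word repeatedly: after the first access, every
 * read should hit in L1 if the data cache is really active. */
static uint32_t read_same(volatile uint32_t *p)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < ITERS; i++)
        sum += *p;
    return sum;
}

/* Stride by one cache line so each access lands on a different
 * line; over a large enough buffer, every access can miss. */
static uint32_t read_strided(volatile uint32_t *buf, size_t words)
{
    uint32_t sum = 0;
    size_t step = CACHE_LINE / sizeof(uint32_t);
    for (uint32_t i = 0; i < ITERS; i++)
        sum += buf[(i * step) % words];
    return sum;
}
```

Timing the two loops (e.g. with the cycle counter) with caches on versus off would distinguish data-side caching from instruction-side caching.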
Also, we do not need address translation at all. We are using physical addresses throughout.
This is what confuses me. So, as I understand it, your suggestion is to not use the caches without enabling the MMU, since this is unsafe?
Using the cache without the MMU is not unsafe. But from what is written in the ARM ARM, I would not expect a speed boost on data accesses. Are you sure your test is correct?
I think you're missing the point.
Enable the MMU. Even if you don't need virtual addressing, you should enable it. Identity mapping VA->PA is definitely a supported use case. There is absolutely no point in running without the MMU enabled, except if you are EXTREMELY resource-constrained and can't spare the memory to write the tables (you only need 16KiB, though, for the bare essentials at 1MB and 16MB granularity).
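A minimal identity map at 1MB granularity can be built as a 4096-entry first-level table (16KiB) of section descriptors. The sketch below is illustrative only: it uses the ARMv7 short-descriptor section format, and the DMA region address is a made-up example, not a real AM335x requirement. You still have to align the table to 16KiB, load it into TTBR0, set up DACR, and perform the required cache/TLB maintenance before setting SCTLR.M:

```c
#include <stdint.h>

/* 4096 x 4-byte entries = 16 KiB, one 1 MiB section each. */
#define NUM_SECTIONS 4096u

/* ARMv7 short-descriptor section bits */
#define SEC_TYPE   0x2u          /* bits[1:0] = 0b10: section descriptor */
#define SEC_B      (1u << 2)     /* Bufferable                           */
#define SEC_C      (1u << 3)     /* Cacheable                            */
#define SEC_AP_RW  (0x3u << 10)  /* AP[1:0] = 0b11: full access          */

/* Normal memory, write-back cacheable (TEX=0b000, C=1, B=1) */
#define ATTR_NORMAL_WB (SEC_AP_RW | SEC_C | SEC_B | SEC_TYPE)
/* Shareable Device memory (TEX=0b000, C=0, B=1): for DMA buffers/MMIO */
#define ATTR_DEVICE    (SEC_AP_RW | SEC_B | SEC_TYPE)

/* Hypothetical 1 MiB region kept uncacheable for the DMA drivers. */
#define DMA_SECTION_MB 0x480u

static void build_identity_map(uint32_t ttb[NUM_SECTIONS])
{
    for (uint32_t mb = 0; mb < NUM_SECTIONS; mb++) {
        uint32_t attr = (mb == DMA_SECTION_MB) ? ATTR_DEVICE
                                               : ATTR_NORMAL_WB;
        /* Section base address = mb << 20, so VA == PA. */
        ttb[mb] = (mb << 20) | attr;
    }
}
```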
What you're probably seeing is an artefact of the size and alignment requirements of internal structures of the Load/Store unit and L2 interface. Whether the caches are 'enabled' (which in the Arm architecture means more 'capable of allocating into' than 'turned on') or not, every request goes through the memory system hierarchy in order: a non-cacheable access passes through the L1 cache controller, which may pass it to the L2 cache controller, and so on.
Whatever performance gain you get from enabling caches without enabling the MMU is beside the point, really, because you shouldn't be trying to run without it.