Hi all,
A question about the Cortex-A8 processor.
If I enable the L1 and L2 caches, I see a performance boost even if the MMU is disabled. I was under the impression that the MMU must be enabled in order to use the caches. I also do not see any errors or mismatches. My project behaves the same way whether the caches are disabled or enabled (with better performance in the second case), which leads me to believe that there are no problems with using the caches without enabling the MMU. Is this the expected behavior?
Additionally, I am also defining the page directory to mark the uncacheable memory. This also seems to work when I do not enable the MMU. Can I ignore the MMU and use the cache safely?
I read this post but I am still not clear on what to expect from a software point of view.
For further context, I stumbled on this while using the BBB for a project that requires good performance. The project does not require virtual memory. It needs uncacheable memory for some device drivers which use DMA. Every other piece of memory can be cached, and all processes share the same address space.
Cheers!
Some of our test cases read from the same memory location continuously. These ran 36 times faster when the cache was enabled. When we changed the reads to target locations that are (cache_line_width) apart, the speedup disappeared. It looks as though every access was a cache hit in the first case and a miss in the second. I am fairly confident that the data accesses are cached.
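For anyone who wants to reproduce this, the test is roughly the following sketch. The function name `read_strided` and the 64-byte line size are assumptions made for illustration, not our exact code; the timing itself (for example with the PMU cycle counter) is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 data cache line size in bytes; check the TRM for your core. */
#define CACHE_LINE_BYTES 64u

/*
 * Perform n_reads byte reads from buf, advancing by `stride` bytes after
 * each read and wrapping to offset 0 at the end of the buffer.
 *   stride == 0                -> every read hits the same location
 *   stride == CACHE_LINE_BYTES -> every read touches a different line
 * The volatile qualifier keeps the compiler from removing the loads.
 * The returned sum only exists so the result can be checked.
 */
uint32_t read_strided(const volatile uint8_t *buf, size_t buf_len,
                      size_t stride, size_t n_reads)
{
    uint32_t sum = 0;
    size_t off = 0;

    for (size_t i = 0; i < n_reads; i++) {
        sum += buf[off];
        off += stride;
        if (off >= buf_len) {
            off = 0; /* wrap around to the start of the buffer */
        }
    }
    return sum;
}
```

Time one call with `stride` set to 0 and one with `stride` set to `CACHE_LINE_BYTES`, with the caches off and then on. For the strided case the buffer should be larger than the cache, otherwise the lines are still resident after the first pass and later passes hit.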
Also, we do not need address translation at all. We are using physical addresses throughout.
This is what confuses me. So as I understand, your suggestion is to not use the caches without enabling MMU, since this is unsafe?
Using the cache without the MMU is not unsafe. But from what is written in the TRM, I would not expect a speed boost on data accesses. Are you sure your test is correct?
I think you're missing the point.
Enable the MMU. Even if you don't need virtual addressing, you should enable it. Identity mapping VA->PA is definitely a supported use case. There is absolutely no point in running without the MMU enabled, except if you are EXTREMELY resource-constrained and can't spare the memory to write the tables (you only need 16KiB, though, for the bare essentials at 1MB and 16MB granularity).
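To make the identity-mapping suggestion concrete, here is a minimal sketch of a first-level table built from 1MB sections in the ARMv7-A short-descriptor format, assuming TEX remap is off (SCTLR.TRE = 0) and domain 0 is set to client in the DACR. The function names, the choice of Strongly-ordered as the default for everything outside RAM, and the RAM/DMA ranges passed by the caller are assumptions for illustration. Loading TTBR0 and the DACR, invalidating the TLB and caches, and writing SCTLR need CP15 instructions and are not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_SECTIONS   4096u            /* 4096 x 1MB covers 4GB; 16KiB of table */

/* First-level section descriptor fields (short-descriptor format). */
#define DESC_SECTION   0x2u             /* bits[1:0] = 0b10: section entry   */
#define DESC_B         (1u << 2)
#define DESC_C         (1u << 3)
#define DESC_XN        (1u << 4)        /* execute never                     */
#define DESC_DOMAIN(d) ((uint32_t)(d) << 5)
#define DESC_AP_FULL   (3u << 10)       /* AP[1:0] = 0b11, AP[2] = 0         */
#define DESC_TEX(t)    ((uint32_t)(t) << 12)

/* Memory types with TEX remap disabled. */
#define ATTR_STRONGLY_ORDERED (DESC_XN)                         /* TEX=000 C=0 B=0 */
#define ATTR_NORMAL_WBWA      (DESC_TEX(1) | DESC_C | DESC_B)   /* write-back, write-allocate */
#define ATTR_NORMAL_NC        (DESC_TEX(1))                     /* Normal, non-cacheable */

/* SCTLR bits: MMU enable, data/unified cache enable, instruction cache enable. */
#define SCTLR_M (1u << 0)
#define SCTLR_C (1u << 2)
#define SCTLR_I (1u << 12)

static uint32_t section_desc(uint32_t index, uint32_t attr)
{
    /* Identity map: section base address (bits[31:20]) equals the index. */
    return (index << 20) | DESC_SECTION | DESC_DOMAIN(0) | DESC_AP_FULL | attr;
}

static void set_range(uint32_t *table, uint32_t base, uint32_t size, uint32_t attr)
{
    if (size == 0) {
        return;
    }
    uint32_t first = base >> 20;
    uint32_t last = (uint32_t)(((uint64_t)base + size - 1u) >> 20);
    if (last >= NUM_SECTIONS) {
        last = NUM_SECTIONS - 1u;
    }
    for (uint32_t i = first; i <= last; i++) {
        table[i] = section_desc(i, attr);
    }
}

/*
 * Fill a 4096-entry table with a VA == PA mapping.
 * Everything defaults to Strongly-ordered (safe for peripherals), the RAM
 * range becomes cacheable, and the DMA range inside it is then overridden
 * as non-cacheable. On hardware the table must be 16KiB-aligned.
 */
void build_identity_map(uint32_t *table,
                        uint32_t ram_base, uint32_t ram_size,
                        uint32_t dma_base, uint32_t dma_size)
{
    set_range(table, 0u, 0xFFFFFFFFu, ATTR_STRONGLY_ORDERED);
    set_range(table, ram_base, ram_size, ATTR_NORMAL_WBWA);
    set_range(table, dma_base, dma_size, ATTR_NORMAL_NC);
}

/* Value to write back to SCTLR once the table and DACR are in place. */
uint32_t sctlr_with_mmu_and_caches(uint32_t sctlr)
{
    return sctlr | SCTLR_M | SCTLR_C | SCTLR_I;
}
```

As a usage example, `build_identity_map(table, 0x80000000u, 0x20000000u, 0x9FF00000u, 0x00100000u)` would describe 512MB of RAM at 0x80000000 with the last 1MB reserved for DMA buffers; those addresses are placeholders, so check them against your board's memory map. The uncached region is rounded out to whole 1MB sections, which is why DMA buffers are usually grouped together.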
What you're probably seeing is an artefact of the size and alignment requirements of internal structures of the Load/Store unit and L2 interface. Whether or not the caches are 'enabled' (which in the Arm architecture means 'capable of allocating into' rather than 'turned on'), every request goes through the memory system hierarchy in order: a non-cacheable access passes through the L1 cache controller, which may pass it to the L2 cache controller, and so on.
Whatever performance gain you get from enabling caches without enabling the MMU is beside the point, really, because you shouldn't be trying to run without it.