I'm thinking about using a Cortex-A7 "bare-metal" in an application where I don't need much memory, so I'd like to avoid using external memory.
The CPU boots from an external 4 MByte SPI NOR flash chip.
It has 512 KBytes of L2 cache and 32 KBytes of internal SRAM that is just used during initial boot since it's so slow.
Using MMU and L2 cache configuration, I am wondering if there is a way to fill the whole L2 cache with code/data?
Since the internal SRAM is 16 times smaller than the L2 cache, it might be tricky.
Could the following work?
1. The CPU boots initial code from SPI flash (about 8-16 KB, let's say at 0x00000000, where the SRAM is located).
2. First, the MMU is configured so that this bootloader code/data is never cached.
Then,
3. CPU loads one block of 16KB from SPI FLASH, and writes it at a fixed address in internal SRAM ( 0x00004000 )
4. CPU reads 16KB of data from increasing addresses:
for 1st block : 0x80000000-0x80003fff
for 2nd block: 0x80004000-0x80007fff
... and so on ... with MMU/L2 cache configured so that those addresses always map to 0x00004000 - 0x00007fff where the block is located ( the question is here, can this be done? )
5. Those reads provoke L2 cache misses, which fill 16 KB of the L2 cache with the block data.
6. Repeat steps 3-4-5 32 times to fill the whole 512 KB of L2 cache.
7. Configure the MMU and L1 caches (or maybe that must also be done in the previous steps?).
8. Jump to the program entry point (so, somewhere between 0x80000000 and 0x80000000 + 512 KB).
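The address arithmetic behind steps 3 to 6 can be sketched as follows. This is only a host-side illustration of the proposed layout, not code that programs the MMU; the macro and function names are made up for the example, and the addresses are the ones given in the steps.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the numbered steps; all names are hypothetical. */
#define SRAM_BLOCK_PA  0x00004000u  /* staging buffer in internal SRAM     */
#define ALIAS_BASE_VA  0x80000000u  /* start of the cacheable alias window */
#define BLOCK_SIZE     0x4000u      /* 16 KB per block                     */
#define L2_SIZE        0x80000u     /* 512 KB of L2 cache                  */
#define NUM_BLOCKS     (L2_SIZE / BLOCK_SIZE)

/* First virtual address used to read back block n (n = 0 .. NUM_BLOCKS-1). */
static uint32_t block_va(uint32_t n)
{
    assert(n < NUM_BLOCKS);
    return ALIAS_BASE_VA + n * BLOCK_SIZE;
}

/* Physical address the MMU would have to produce for an alias address:
 * every 16 KB window folds back onto the same SRAM buffer. */
static uint32_t alias_va_to_pa(uint32_t va)
{
    assert(va >= ALIAS_BASE_VA && va - ALIAS_BASE_VA < L2_SIZE);
    return SRAM_BLOCK_PA + ((va - ALIAS_BASE_VA) % BLOCK_SIZE);
}
```

The modulo in alias_va_to_pa makes the open question visible: all 32 windows resolve to the same physical addresses, so a cache that tags its lines by physical address would see one 16 KB region rather than 32 distinct ones.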
This isn't how the L2 in the Cortex-A7 was intended to be used.
The first problem is that the architecture allows cache lines to be speculatively filled and evicted, meaning that there is no guarantee that a given line will stay in the cache. The processor might attempt to write it back to memory - which in this case doesn't exist.
Cache line locking would fix this, but cache lock down is not supported on the Cortex-A7 (or any of the other recent Cortex-A processors). You can reduce the possibility of eviction by only mapping as much cacheable memory as you have cache space. However, that doesn't actually guarantee you wouldn't get evictions, just makes it unlikely.
Hi Martin, thanks for your answer,
Normally I wouldn't play much with the cache/MMU; I just configure the MMU at startup, then let it do its job, and that works well for me.
The only cache operation I do is invalidation after some DMA transfers to cacheable areas.
I can see that the functioning of the cache & MMU is quite complex (the ARMv7-A architecture reference manual is a "mammoth" book!!!), but with such complexity I hope there is a trick/loophole to achieve what I want.
With external memory, I am able to fill the whole L2 cache. For that, I read (and print) the content of 512 KB of cacheable external memory (the size of my L2 cache), then do a DMA transfer to overwrite this memory area with new data, and re-read and re-print those 512 KB. It still shows the old data, proving that it was all held in the L2 cache (the L1 DCache is only 32 KB). If I try to do this with more than 512 KB, I start to get random inconsistencies in blocks of 64 bytes, which makes sense.
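The comparison step of that experiment could be automated rather than read off a printout. Below is a minimal sketch, assuming a snapshot of the old data is kept in a separate buffer; the function name and the two-buffer arrangement are assumptions for illustration, and the 64-byte granularity is the one observed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64u  /* granularity of the observed inconsistencies */

/* Count the 64-byte lines in which `seen` (what the CPU reads back after
 * the DMA overwrite) differs from `old` (the snapshot taken before it).
 * A result of 0 means every line was still served from the cache. */
static size_t count_refetched_lines(const uint8_t *old, const uint8_t *seen,
                                    size_t len)
{
    assert(old != NULL && seen != NULL);
    size_t changed = 0;
    for (size_t off = 0; off + CACHE_LINE <= len; off += CACHE_LINE) {
        if (memcmp(old + off, seen + off, CACHE_LINE) != 0)
            changed++;
    }
    return changed;
}
```

With a 512 KB region the count would be expected to stay at 0 if the whole region is still cached, and to become non-zero once the region is larger than the cache.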
I haven't tried to execute code yet, but it does seem logical that L1 ICache & DCache would get filled with data all grabbed from L2 cache since they are all there, without the need to read from external memory.
I'm not too worried if the CPU attempts to write data back to external memory: I could configure the memory controller as if the RAM chip were there, so writes would not block anything but would go nowhere.
However, any read would obviously return garbage.
The problem is, since I will not have external memory and only have 64KB SRAM, I can't figure how I could fill the whole 512KB L2.
I've tried using the MMU with two-level tables (for one of the 4096 1 MB section descriptors, I used a second-level table of 256 4 KB page descriptors).
I mapped those 256 4 KB entries to the same 4 KB cacheable SRAM area.
My hope was to use DMA transfers to the 4 KB SRAM area and then invalidate the 4 KB of DCache corresponding to the current virtual address, in order to provoke an update of that 4 KB of L2 cache from SRAM.
But that does not work... I'm feeling there must be a way, but i haven't found it yet
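For reference, the aliased second-level table described above might be built roughly like this. It is a sketch under stated assumptions rather than the code actually used: it assumes the ARMv7-A short-descriptor format with TEX remap and the access flag disabled (SCTLR.TRE = 0, SCTLR.AFE = 0), write-back write-allocate attributes, and full read/write access. The function and macro names are invented for the example, and the table would still have to be installed via TTBR0 with the usual barriers and TLB maintenance.

```c
#include <assert.h>
#include <stdint.h>

#define L2_ENTRIES      256u        /* one second-level table covers 1 MB */
#define PAGE_SIZE       0x1000u     /* 4 KB small pages                   */

/* Small-page descriptor fields (short-descriptor format). */
#define PTE_SMALL_PAGE  (1u << 1)
#define PTE_B           (1u << 2)
#define PTE_C           (1u << 3)
#define PTE_AP_RW       (3u << 4)   /* AP[1:0] = 0b11, AP[2] = 0 */
#define PTE_TEX(x)      ((uint32_t)(x) << 6)

/* Normal memory, write-back write-allocate: TEX = 0b001, C = 1, B = 1. */
#define PTE_WBWA        (PTE_TEX(1u) | PTE_C | PTE_B)

/* Point all 256 small-page entries at the same 4 KB physical page. */
static void fill_alias_table(uint32_t table[L2_ENTRIES], uint32_t sram_page_pa)
{
    assert((sram_page_pa & (PAGE_SIZE - 1u)) == 0);  /* 4 KB aligned */
    for (uint32_t i = 0; i < L2_ENTRIES; i++)
        table[i] = sram_page_pa | PTE_WBWA | PTE_AP_RW | PTE_SMALL_PAGE;
}

/* First-level descriptor that refers to a second-level table. */
static uint32_t l1_page_table_desc(uint32_t l2_table_pa, uint32_t domain)
{
    assert((l2_table_pa & 0x3FFu) == 0);  /* table must be 1 KB aligned */
    assert(domain < 16u);
    return l2_table_pa | (domain << 5) | 0x1u;
}
```

Every entry ends up with the same output address, so the 256 virtual pages are 256 names for one physical page. If the cache looks lines up by physical address, reading through a different virtual page would hit the lines already allocated instead of allocating new ones, which would be consistent with the behaviour reported here.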
I have some vague memory about U-boot wanting to do something like this to avoid some problem they perceived in accessing DRAM early on. Sounds like a lot of unnecessary work to me but if anyone has a solution with the current cores I'd guess they would.