I'm thinking about using a Cortex-A7 in a bare-metal design where I don't need much memory, so I'd like to avoid using external memory.
The CPU boots from an external 4 MB SPI NOR flash chip.
It has 512 KB of L2 cache and 32 KB of internal SRAM that is just used during initial boot, since it's so slow.
Using MMU and L2 cache configuration, I am wondering whether there is a way to fill the whole L2 cache with code/data.
Since the internal SRAM is 16 times smaller than the L2 cache, it might be tricky.
Could the following work?
1. The CPU boots initial code from SPI flash (say 8-16 KB, at 0x00000000 where the SRAM is located).
2. First, the MMU is configured so that this bootloader code/data is never cached.
Then,
3. The CPU loads one 16 KB block from SPI flash and writes it at a fixed address in internal SRAM (0x00004000).
4. The CPU reads 16 KB of data from increasing virtual addresses:
   for the 1st block: 0x80000000-0x80003fff
   for the 2nd block: 0x80004000-0x80007fff
   ... and so on ... with the MMU/L2 cache configured so that those addresses always map to 0x00004000-0x00007fff where the block is located (this is the question: can this be done?).
5. Those reads provoke L2 cache misses, which fill 16 KB of the L2 cache with the block's data.
6. Repeat steps 3-5 32 times to fill the whole 512 KB of L2 cache.
7. Configure the MMU and L1 caches (or maybe that must also be done in the previous steps?).
8. Jump to the program entry point (so, somewhere between 0x80000000 and 0x80000000 + 512 KB).
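The fill sequence in steps 3-6 could be sketched roughly like this. The `spi_read_block()` and remap helpers are hypothetical stand-ins for board-specific code (they don't exist in any real BSP), and the loop should only run on the target, not a host:

```c
#include <stdint.h>

#define CACHE_FILL_BASE 0x80000000u /* virtual region to be "backed" by L2   */
#define BLOCK_SIZE      0x4000u     /* 16 KB: size of the SRAM bounce buffer */
#define NUM_BLOCKS      32u         /* 32 x 16 KB = 512 KB, the whole L2     */

/* Virtual base address of the n-th 16 KB block (step 4). */
static uint32_t block_virt_base(uint32_t n)
{
    return CACHE_FILL_BASE + n * BLOCK_SIZE;
}

/* Steps 3-6: for each block, copy from flash to SRAM, remap, then touch. */
static void fill_l2_from_flash(void)
{
    for (uint32_t n = 0; n < NUM_BLOCKS; n++) {
        uint32_t vbase = block_virt_base(n);
        /* Step 3: spi_read_block(n, (void *)0x00004000);       (hypothetical) */
        /* Step 4: remap vbase..vbase+16K onto SRAM 0x00004000; (hypothetical) */
        /* Step 5: dummy-read one word per 64-byte line to force L2 fills:     */
        for (uint32_t off = 0; off < BLOCK_SIZE; off += 64)
            (void)*(volatile uint32_t *)(uintptr_t)(vbase + off);
    }
}
```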
This isn't how the L2 in the Cortex-A7 was intended to be used.
The first problem is that the architecture allows cache lines to be speculatively filled and evicted, meaning there is no guarantee that a given line will stay in the cache. The processor might attempt to write it back to memory - which in this case doesn't exist.
Cache line locking would fix this, but cache lockdown is not supported on the Cortex-A7 (or any of the other recent Cortex-A processors). You can reduce the possibility of eviction by only mapping as much cacheable memory as you have cache space. However, that doesn't actually guarantee you won't get evictions, it just makes them unlikely.
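For reference, how much memory is mapped cacheable comes down to the attributes in each translation-table entry. A minimal sketch of an ARMv7 short-descriptor first-level section entry follows (the bit positions are from the architecture manual; note a section is 1 MB, so capping cacheable space at exactly 512 KB would in practice need second-level small pages):

```c
#include <stdint.h>

/* ARMv7 short-descriptor first-level "section" entry (1 MB granule).
 * TEX=001, C=1, B=1 -> write-back, write-allocate normal memory;
 * AP[1:0]=11 -> full read/write access. */
static uint32_t section_desc(uint32_t phys_base, int cacheable)
{
    uint32_t d = (phys_base & 0xFFF00000u) | 0x2u; /* bits[1:0]=10: section */
    d |= 3u << 10;                                 /* AP[1:0]: full access  */
    if (cacheable)
        d |= (1u << 12) | (1u << 3) | (1u << 2);   /* TEX[0] | C | B        */
    return d;
}
```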
Hi Martin, thanks for your answer,
Normally I don't play much with the cache/MMU; I just configure the MMU at startup, then let it do its job, and it works well for me.
The only cache operations I do are invalidations after some DMA transfers to cacheable areas.
I can see that the functioning of the cache & MMU is quite complex (the ARMv7-A Architecture Reference Manual is a "mammoth" book!), but with such complexity I hope there is a trick/loophole to achieve what I want.
With external memory, I am able to fill the whole L2 cache. For that, I read (and print) the contents of 512 KB of cacheable external memory (the size of my L2 cache), then do a DMA transfer to overwrite this memory area with new data, and re-read and re-print those 512 KB. It still shows the old data, proving that it was all held in the L2 cache (the L1 DCache is only 32 KB). If I try to do this with more than 512 KB, I start to get random inconsistencies in blocks of 64 bytes, which makes sense.
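The 64-byte granularity of those inconsistencies matches the L2 cache line size. The verification pass could be written as a small helper like this (hypothetical, just to illustrate the check):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define LINE 64u  /* Cortex-A7 L2 cache line size in bytes */

/* Count how many 64-byte lines of 'readback' still match the pre-DMA
 * contents 'before' -- i.e. lines served from L2 rather than from the
 * (DMA-overwritten) external memory. len must be a multiple of LINE. */
static size_t stale_lines(const uint8_t *readback, const uint8_t *before,
                          size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i += LINE)
        if (memcmp(readback + i, before + i, LINE) == 0)
            count++;
    return count;
}
```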
I haven't tried to execute code yet, but it does seem logical that the L1 ICache & DCache would get filled with data grabbed entirely from the L2 cache, since it is all there, without the need to read from external memory.
I'm not too worried if the CPU attempts to write back data to external memory; I could configure the memory controller as if the RAM chip were there, so writes would not block anything but would go nowhere.
However, any read would obviously return garbage.
The problem is, since I will not have external memory and only have 64 KB of SRAM, I can't figure out how I could fill the whole 512 KB L2.
I've tried using the MMU with 2-level tables (for one of the 4096 1 MB section descriptors, I used a second-level table of 256 4 KB page descriptors).
I mapped those 256 4 KB entries to the same 4 KB cacheable SRAM area.
My hope was to use DMA transfers to the 4 KB SRAM area and invalidate the 4 KB of DCache corresponding to the current virtual address, in order to provoke an update of the 4 KB L2 cache area, which it would fetch from SRAM.
But that does not work... I'm feeling there must be a way, but I haven't found it yet.
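The aliased mapping I tried looks roughly like this (ARMv7 short-descriptor format; the table and macro names are just illustrative):

```c
#include <stdint.h>

#define SRAM_PAGE_PHYS 0x00004000u /* the single 4 KB SRAM page being aliased */

/* Second-level table: 256 small-page entries covering one 1 MB section. */
static uint32_t l2_table[256] __attribute__((aligned(1024)));

/* ARMv7 short-descriptor second-level "small page" entry.
 * TEX=001, C=1, B=1 -> write-back write-allocate; AP[1:0]=11 -> full access. */
static uint32_t small_page_desc(uint32_t phys)
{
    return (phys & 0xFFFFF000u)
         | (1u << 6)             /* TEX[0]                          */
         | (3u << 4)             /* AP[1:0]                         */
         | (1u << 3) | (1u << 2) /* C | B                           */
         | 0x2u;                 /* bits[1:0]=10: small page, XN=0  */
}

/* All 256 virtual pages point at the same physical SRAM page. */
static void build_alias_table(void)
{
    for (int i = 0; i < 256; i++)
        l2_table[i] = small_page_desc(SRAM_PAGE_PHYS);
}
```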
I have some vague memory of U-Boot wanting to do something like this, to avoid some problem they perceived in accessing DRAM early on. Sounds like a lot of unnecessary work to me, but if anyone has a solution on the current cores, I'd guess it would be them.
Actually, what I wrote here made me think of a trivial solution, and I have successfully populated the whole 512 KB L2 cache with data.
I'm not too worried if the CPU attempted to write back data to ext memory, i could configure the memory controller as if the RAM chip was there
Now, the Cortex-A Programmer's Guide says: when the core executes a store instruction, a cache lookup on the address(es) to be written is performed. For a cache hit on a write, there are two choices (write-through & write-back).
The important point is that both of them will update the cache.
So, even if there is no physical DRAM chip on board, as long as the DRAM controller is initialized, writing to DRAM locations can populate the L2 through this mechanism.
To test that this works, I modified the DRAM init code so that reading/writing to DRAM is allowed but produces garbage.
Then, I just wrote 512 KB of valid data to that "broken" DRAM; when re-reading, it is all good, while if I have the DCache disabled it is all garbage.
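The test loop is roughly the following. On the target, `base` would point at the start of the "broken" DRAM window; the names and the pattern are illustrative only:

```c
#include <stdint.h>

#define FILL_WORDS ((512u * 1024u) / 4u) /* one full L2's worth of words */

/* Deterministic word pattern so the readback can be verified. */
static uint32_t pattern(uint32_t i)
{
    return i * 0x9E3779B9u; /* any mixing constant will do */
}

/* Write 512 KB of cacheable "DRAM": with write-allocate caches each store
 * miss allocates an L2 line, so this populates the whole L2 even though
 * nothing answers behind the controller. Returns the number of words that
 * read back wrong (0 while everything is still resident in the cache). */
static uint32_t fill_and_check(volatile uint32_t *base)
{
    uint32_t bad = 0;
    for (uint32_t i = 0; i < FILL_WORDS; i++)
        base[i] = pattern(i);
    for (uint32_t i = 0; i < FILL_WORDS; i++)
        if (base[i] != pattern(i))
            bad++;
    return bad;
}
```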
Now it's time to try to execute code from L2 to see if that works...
As Martin has already said, this isn't going to work. Data will be randomly evicted (if dirty) or discarded (if clean), and when next needed it will be reloaded from external memory, which will return garbage. What you are trying is architecturally unsafe, so according to the spec it can't work reliably, and it may be prone to somewhat unpredictable failures.
What you are basically describing is a means to lock data into the caches (commonly called cache lockdown), which forces the cache to hold on to data (and not write it out to external memory). The Cortex-A family caches do not support this feature, although some ARM cores in the past have done so.
HTH, Pete