I'm thinking about using a Cortex-A7 "bare-metal", where I don't need much memory, so I'd like to avoid using external memory.
The CPU boots from an external 4 MB SPI NOR flash chip.
It has 512 KB of L2 cache and 32 KB of internal SRAM that is only used during the initial boot since it's so slow.
Using the MMU and L2 cache configuration, I am wondering whether there is a way to fill the whole L2 cache with code/data.
Since the internal SRAM is 16 times smaller than the L2 cache, it might be tricky.
Could the following work?
1. The CPU boots initial code from the SPI flash (something like 8-16 KB, let's say at 0x00000000, where the SRAM is located).
2. First, the MMU is configured so that this bootloader code/data is never cached.
Then,
3. The CPU loads one 16 KB block from the SPI flash and writes it to a fixed address in internal SRAM (0x00004000).
4. The CPU reads 16 KB of data from increasing addresses:
for the 1st block: 0x80000000-0x80003FFF
for the 2nd block: 0x80004000-0x80007FFF
... and so on ... with the MMU/L2 cache configured so that those addresses always map to 0x00004000-0x00007FFF, where the block is located (this is the key question: can this be done? See the sketch after this list).
5. Those reads cause L2 cache misses, which fill 16 KB of the L2 cache with the block's data.
6. Repeat steps 3-4-5 32 times to fill the whole 512 KB of L2 cache.
7. Configure the MMU and L1 caches (or maybe that must also be done in the previous steps?).
8. Jump to the program entry point (so, somewhere between 0x80000000 and 0x80000000 + 512 KB).
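To make steps 3-5 concrete, here is a rough sketch of what I have in mind, in C. The spi_flash_read() helper, the l2_pagetable layout and the small-page attribute bits (0x45E) are assumptions of mine, to be checked against the actual boot code and the ARM ARM:

    #include <stdint.h>

    #define SRAM_BLOCK   0x00004000u          /* fixed 16 KB staging area in SRAM   */
    #define WINDOW_BASE  0x80000000u          /* virtual window I want to "fill"    */
    #define BLOCK_SIZE   (16u * 1024u)
    #define NUM_BLOCKS   32u                  /* 32 x 16 KB = 512 KB of L2          */

    extern void spi_flash_read(uint32_t off, void *dst, uint32_t len); /* hypothetical */
    extern uint32_t l2_pagetable[256];        /* short-descriptor 2nd-level table
                                                 covering the 1 MB section at
                                                 WINDOW_BASE                        */

    /* Point the four 4 KB small pages of block 'b' at the SRAM staging area.
       0x45E = small page, TEX=001 C=1 B=1 (write-back, write-allocate),
       AP=01, S=1 -- illustrative bits only.                                  */
    static void map_block_to_sram(uint32_t b)
    {
        for (uint32_t i = 0; i < 4u; i++)
            l2_pagetable[b * 4u + i] = (SRAM_BLOCK + i * 4096u) | 0x45Eu;

        uint32_t zero = 0;
        __asm__ volatile("dsb\n\t"
                         "mcr p15, 0, %0, c8, c7, 0\n\t"  /* TLBIALL */
                         "dsb\n\t"
                         "isb\n\t" :: "r"(zero) : "memory");
    }

    void fill_l2_from_flash(uint32_t flash_base)
    {
        for (uint32_t b = 0; b < NUM_BLOCKS; b++) {
            /* step 3: stage one block in SRAM */
            spi_flash_read(flash_base + b * BLOCK_SIZE, (void *)SRAM_BLOCK, BLOCK_SIZE);
            /* step 4: retarget the alias window at the staging area */
            map_block_to_sram(b);
            /* step 5: touch one word per 64-byte line so every line misses
               and is (hopefully) allocated into L2                          */
            volatile uint32_t *p = (volatile uint32_t *)(WINDOW_BASE + b * BLOCK_SIZE);
            for (uint32_t w = 0; w < BLOCK_SIZE / 4u; w += 16u)
                (void)p[w];
        }
    }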
Actually, what I wrote here made me think of a trivial solution, and I successfully populated the whole 512 KB L2 cache with data.
I'm not too worried about the CPU attempting to write data back to external memory; I could configure the memory controller as if the RAM chip were there.
Now, the Cortex-A Programmer's Guide says: "When the core executes a store instruction, a cache lookup on the address(es) to be written is performed. For a cache hit on a write, there are two choices" (write-through and write-back).
But the important point is that both of them update the cache.
So, even if there is no physical DRAM chip on board, as long as the DRAM controller is initialized, writing to DRAM locations can populate the L2 through this mechanism.
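As a rough sketch of that idea (DRAM_BASE and IMAGE_SIZE are made-up values; the region is assumed to be mapped Normal, write-back/write-allocate, cacheable):

    #include <stdint.h>

    #define DRAM_BASE   0x80000000u      /* hypothetical base of the (absent) DRAM */
    #define IMAGE_SIZE  (512u * 1024u)   /* exactly the L2 size                    */

    /* Copy the image through a write-back/write-allocate mapping: each store
       miss allocates a line and the data is merged into the cache. Because
       the whole 512 KB is written, every 64-byte line is fully overwritten,
       so any garbage pulled in by the allocation line fill is masked. Nothing
       has to reach the (non-existent) DRAM as long as no line is evicted.    */
    void populate_l2(const uint32_t *src)
    {
        volatile uint32_t *dst = (volatile uint32_t *)DRAM_BASE;
        for (uint32_t i = 0; i < IMAGE_SIZE / 4u; i++)
            dst[i] = src[i];
    }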
To test that this works, I modified the DRAM init code so that reading/writing to DRAM is allowed but produces garbage.
Then I just wrote 512 KB of valid data to that "broken" DRAM; when re-reading, it all comes back correct, whereas if I have the D-cache disabled it is all garbage.
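The check itself is roughly the following (reusing the made-up DRAM_BASE/IMAGE_SIZE from the sketch above; the test pattern is arbitrary):

    /* Write a recognisable pattern to the "broken" DRAM, then read it back.
       With the D-cache enabled every word should come back from L2 and match;
       with it disabled the reads go out to the garbage-producing DRAM.        */
    int check_l2_holds_data(void)
    {
        volatile uint32_t *p = (volatile uint32_t *)DRAM_BASE;

        for (uint32_t i = 0; i < IMAGE_SIZE / 4u; i++)
            p[i] = 0xA5A50000u + i;
        for (uint32_t i = 0; i < IMAGE_SIZE / 4u; i++)
            if (p[i] != 0xA5A50000u + i)
                return -1;               /* mismatch: data was not held in the cache */
        return 0;
    }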
Now it's time to try to execute code from L2 to see if that works...
As Martin has already said, this isn't going to work. Data will be randomly evicted (if dirty) or discarded (if clean), and when it is next needed it will be reloaded from external memory, which will return garbage. What you are trying is architecturally unsafe, so according to the spec it can't work reliably, and it may be prone to somewhat unpredictable failures.
What you are basically describing is a means of locking the caches (commonly called cache lockdown), which forces the cache to hold on to data (and not write it out to external memory). The Cortex-A family caches do not support this feature, although some ARM cores in the past have done so.
HTH, Pete