I'm thinking about using a cortex-a7 in "bare-metal" where I don't need much memory, so i'd like to avoid using external memory.
The CPU boots from an external 4MBytes SPI NOR FLASH chip.
It has 512 KBytes of L2 cache and 32 KBytes of internal SRAM that is just used during initial boot since it's so slow.
Using MMU and L2 cache configurations, I am wondering if there is a way to fill the whole L2 cache with code/data ?
Since the internal SRAM is 16 times smaller than the L2 cache, it might be tricky.
Could the following work ?
1. CPU boots initial code from SPI FLASH (like a 8-16KB , let's says @ 0x00000000 where SRAM is located )
2. First, MMU is configured so that this bootloader code/data is never cached.
Then,
3. CPU loads one block of 16KB from SPI FLASH, and writes it at a fixed address in internal SRAM ( 0x00004000 )
4. CPU reads 16KB of data from increasing addresses:
for 1st block : 0x80000000-0x80003fff
for 2nd block: 0x80004000-0x80007fff
... and so on ... with MMU/L2 cache configured so that those addresses always map to 0x00004000 - 0x00007fff where the block is located ( the question is here, can this be done? )
5. Those reads provoke L2 cache-misses which fills the 16KB of L2 cache with the block data.
6. Repeat 3-4-5 steps 32 times to fill the whole 512KB of L2 cache
Configure MMU L1 caches (or maybe that must be also done in previous steps?)
Jump to program entry point (so, somewhere between 0x80000000 and 0x80000000 + 512KB).
I have some vague memory about U-boot wanting to do something like this to avoid some problem they perceived in accessing DRAM early on. Sounds like a lot of unnecessary work to me but if anyone has a solution with the current cores I'd guess they would.