I'm thinking about using a Cortex-A7 "bare-metal" in an application where I don't need much memory, so I'd like to avoid using external memory.
The CPU boots from an external 4 MByte SPI NOR flash chip.
It has 512 KBytes of L2 cache and 32 KBytes of internal SRAM that is just used during initial boot since it's so slow.
Using the MMU and L2 cache configuration, I am wondering if there is a way to fill the whole L2 cache with code/data?
Since the internal SRAM is 16 times smaller than the L2 cache, it might be tricky.
Could the following work?
1. CPU boots initial code from SPI flash (around 8-16 KB, let's say at 0x00000000, where the SRAM is located)
2. First, MMU is configured so that this bootloader code/data is never cached.
Then,
3. CPU loads one block of 16KB from SPI FLASH, and writes it at a fixed address in internal SRAM ( 0x00004000 )
4. CPU reads 16KB of data from increasing addresses:
for 1st block : 0x80000000-0x80003fff
for 2nd block: 0x80004000-0x80007fff
... and so on ... with MMU/L2 cache configured so that those addresses always map to 0x00004000 - 0x00007fff where the block is located ( the question is here, can this be done? )
5. Those reads cause L2 cache misses, which fill 16KB of the L2 cache with the block data.
6. Repeat steps 3-5 32 times to fill the whole 512KB of L2 cache
7. Configure MMU L1 caches (or maybe that must also be done in previous steps?)
8. Jump to program entry point (so, somewhere between 0x80000000 and 0x80000000 + 512KB).
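To make the address arithmetic in the steps above concrete, here is a rough C sketch. It only shows that the aliasing asked about in step 4 can be written down as page-table entries; it says nothing about whether the L2 would then keep 32 distinct blocks, which is the actual open question. Assumptions that are not in the post: the ARMv7-A short-descriptor format with 4KB small pages, TEX remap disabled (SCTLR.TRE = 0), and all function and macro names, which are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the post. */
#define L2_SIZE        (512u * 1024u)          /* L2 cache size */
#define BLOCK_SIZE     (16u * 1024u)           /* block copied from SPI flash */
#define SRAM_BLOCK_PA  0x00004000u             /* fixed staging buffer in SRAM */
#define WINDOW_VA      0x80000000u             /* start of the virtual window */
#define NUM_BLOCKS     (L2_SIZE / BLOCK_SIZE)  /* 32 passes of steps 3-5 */

/* Assumed: ARMv7-A short-descriptor small pages (4KB each). */
#define PAGE_SIZE      4096u
#define NUM_PAGES      (L2_SIZE / PAGE_SIZE)   /* 128 entries for 512KB */

/* Small-page descriptor bits, TEX remap disabled (assumption). */
#define PTE_SMALL_PAGE (1u << 1)               /* descriptor type: small page */
#define PTE_B          (1u << 2)
#define PTE_C          (1u << 3)
#define PTE_AP_RW      (3u << 4)               /* AP[1:0]=0b11, AP[2]=0: full access */
#define PTE_TEX_001    (1u << 6)               /* TEX=001,C=1,B=1: write-back, write-allocate */

/* Virtual start address of the window used for a given block (hypothetical helper). */
uint32_t block_va(uint32_t block)
{
    assert(block < NUM_BLOCKS);
    return WINDOW_VA + block * BLOCK_SIZE;
}

/* Physical address a window address should alias to: always inside the
 * 16KB staging buffer, whichever block the address belongs to. */
uint32_t alias_pa(uint32_t va)
{
    assert(va >= WINDOW_VA && va - WINDOW_VA < L2_SIZE);
    return SRAM_BLOCK_PA + ((va - WINDOW_VA) % BLOCK_SIZE);
}

/* Build one cacheable, read/write small-page descriptor for a physical page. */
uint32_t small_page_pte(uint32_t pa)
{
    return (pa & 0xFFFFF000u) | PTE_TEX_001 | PTE_AP_RW
         | PTE_C | PTE_B | PTE_SMALL_PAGE;
}

/* Fill the first NUM_PAGES entries of a second-level table so that the
 * whole 512KB window aliases onto the same four physical 4KB pages. */
void build_alias_table(uint32_t *table)
{
    for (uint32_t i = 0; i < NUM_PAGES; i++) {
        table[i] = small_page_pte(alias_pa(WINDOW_VA + i * PAGE_SIZE));
    }
}
```

A boot loader would call build_alias_table() on the second-level table that covers 0x80000000 before enabling the MMU, then read through block_va(n) after copying block n into SRAM. Because every entry ends up pointing at the same physical pages, this is exactly the part of the scheme that depends on how the L2 tags its lines.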
You're very lucky that your DRAM controller returns garbage data if there's no DRAM connected. The more common scenario is that it locks the bus by never allowing a transaction to finish (because it's waiting on data from RAM that doesn't exist and that can never arrive).
The DRAM controller blindly assumes the DRAM chip honors its read/write requests according to the various clocks it delivers relative to all the timing information it's been configured with.
It will not wait on the DRAM IC, so it's not a problem if the chip is not there.
Hi 0xffff,
Understood, but again, if you do move to a core with lockdown features, you might not get the same DRAM controller, and therefore might not have such an easy way to 'pollute' the cache with read-allocated garbage, write over it, and then lock it.
Any progress?
I am currently working with an Allwinner A13 (ARM).
I am looking at how to avoid external DDR3 memory, because the program size is less than 128 KB: boot from an SD card with a loader that feeds the program into the L2 cache.
As per the multiple replies higher up the thread, this isn't possible in a manner which is reliable unless the CPU supports cache lockdown which the Cortex-A cores do not.
HTH,
Pete