This question was raised during the 'How to implement a secure IoT system on ARMv8-M' webinar; you can view all the questions in the round-up blog post.
There are two types of caches available in the CoreLink SDK-200 System Design Kit:
To measure the instruction cache efficiency, we use counters available within these IP blocks. They count the accesses that result in a “hit” (the requested data is found in the cache) or a “miss” (the requested data is not in the cache, so an access to the actual location has to be performed). From these counts we can calculate the percentage of accesses that hit, called the “hit rate”.
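As a rough illustration, the hit rate can be derived from the two counters as shown in the sketch below. The register names and addresses (CACHE_HIT_CNT, CACHE_MISS_CNT) are hypothetical placeholders, not the actual counter interface; refer to the product's Technical Reference Manual for the real register map.

```c
/* Minimal sketch: computing the instruction cache hit rate from the
 * hit/miss counters described above. CACHE_HIT_CNT and CACHE_MISS_CNT
 * are hypothetical placeholder registers, not the documented interface. */
#include <stdint.h>

#define CACHE_HIT_CNT   (*(volatile uint32_t *)0x50010400u) /* hypothetical */
#define CACHE_MISS_CNT  (*(volatile uint32_t *)0x50010404u) /* hypothetical */

/* Returns the hit rate as a percentage (0-100). */
static uint32_t cache_hit_rate_percent(void)
{
    uint32_t hits   = CACHE_HIT_CNT;
    uint32_t misses = CACHE_MISS_CNT;
    uint32_t total  = hits + misses;

    if (total == 0u) {
        return 0u; /* no accesses recorded yet */
    }
    /* hit rate = hits / (hits + misses) */
    return (uint32_t)(((uint64_t)hits * 100u) / total);
}
```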
When reading code from Flash, the higher the hit rate, the lower the power consumption. This is because a Flash read requires much more energy than a read from RAM (or, as in this case, from the cache memory).
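To make this relationship concrete, a simple first-order model of the average read energy could look like the sketch below. The per-access energy figures are illustrative placeholders, not characterised values for any particular device.

```c
/* First-order sketch of why a higher hit rate lowers read energy.
 * E_FLASH_PJ and E_CACHE_PJ are illustrative placeholder values
 * (picojoules per access), not measured figures. */
#define E_FLASH_PJ  100.0  /* assumed cost of a Flash read (placeholder) */
#define E_CACHE_PJ   10.0  /* assumed cost of a cache read (placeholder) */

/* Average energy per instruction fetch for a given hit rate (0.0 - 1.0):
 * E_avg = hit_rate * E_cache + (1 - hit_rate) * E_flash */
static double avg_fetch_energy_pj(double hit_rate)
{
    return hit_rate * E_CACHE_PJ + (1.0 - hit_rate) * E_FLASH_PJ;
}
```

With these assumed numbers, raising the hit rate from 90% to 99% would cut the average fetch energy roughly in half, which is why the hit rate matters so much for Flash-based execution.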
The hit rate depends on many parameters, including the type of code (very linear code results in a lower hit rate than code containing many loops) and the size of the code relative to the size of the cache. A study performed in ARM shows the impact of cache size on the miss rate (the complement of the hit rate, i.e. the fraction of accesses actually forwarded to the Flash). The results are shown below:
As you can see, a 2KB cache is already very efficient, and this is the default size we have selected in the CoreLink SSE-200 subsystem.
Now back to the last part of your question. For code stored in Flash, we do not expect to write the Flash very often: essentially only when we want to upgrade the code (which is a great idea for security, but that is another topic…), so probably a few times per year. It is therefore not really necessary to optimize this path, and the caches in CoreLink SDK-200 completely ignore write operations. Of course, once the update is complete, the cache needs to be invalidated (flushed) to avoid mismatches between the content of the Flash and the cached code.
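As a rough illustration, the post-update step could look like the sketch below. The register names and bit positions (CACHE_CTRL, CACHE_STAT and their fields) are hypothetical placeholders; the actual cache maintenance interface is defined in the subsystem's Technical Reference Manual.

```c
/* Minimal sketch: invalidating the instruction cache after a firmware
 * update so that stale cached code is not executed. All register names,
 * addresses and bit definitions here are hypothetical placeholders. */
#include <stdint.h>

#define CACHE_CTRL          (*(volatile uint32_t *)0x50010000u) /* hypothetical */
#define CACHE_STAT          (*(volatile uint32_t *)0x50010004u) /* hypothetical */
#define CACHE_CTRL_INV_REQ  (1u << 2)                           /* hypothetical */
#define CACHE_STAT_INV_BUSY (1u << 0)                           /* hypothetical */

static void invalidate_icache_after_update(void)
{
    /* Request a full invalidation of the cached Flash contents. */
    CACHE_CTRL |= CACHE_CTRL_INV_REQ;

    /* Wait until the cache reports that the invalidation is complete
     * before fetching any of the newly programmed code. */
    while (CACHE_STAT & CACHE_STAT_INV_BUSY) {
        /* spin */
    }
}
```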