
Guidelines on reducing Cache Miss rate

Hi Experts,

Is there any document with general software guidelines for reducing the cache miss rate on the ARMv7 architecture?

If it is more specific to A/R/M, that would be great.

  • A quick Google search for "reduce cache miss rate" turns up this page: Reducing Cache Miss Rate, which is quite useful.

    Most recent ARM cores support prefetch instructions such as PLD and PLI. These can improve the performance of loops, especially over data which exhibits low locality.
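
    As a rough sketch, you rarely need to write PLD by hand: GCC and Clang lower `__builtin_prefetch` to PLD on ARM targets. The prefetch distance of 8 elements below is an assumption you would tune per core and per workload, not a recommended value.

    ```c
    #include <stddef.h>

    /* Sum a large array, prefetching a few cache lines ahead of the
       current access. The compiler emits PLD for __builtin_prefetch
       on ARM; arguments are (address, rw=0 for read, locality hint). */
    long sum_with_prefetch(const long *data, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + 8 < n)
                __builtin_prefetch(&data[i + 8], 0, 1); /* read, low temporal reuse */
            total += data[i];
        }
        return total;
    }
    ```

    Note that prefetching only helps when the data is not already cached and the prefetch lands early enough to hide the memory latency; over-aggressive prefetching can evict useful lines and make things worse.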

    Very often, though, the most important thing you can do is look at your algorithm. And then look at your implementation of that algorithm. A zero-copy algorithm can make better use of available cache space, for instance. And a simplistic implementation of a matrix multiplication operation will often show very poor performance, especially for matrices which are large compared to the cache size, because of a high level of contention. Re-implementing it using strips or blocks/tiles can improve performance by increasing the amount of reuse of cached data.
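
    A minimal sketch of the tiled approach, for square row-major matrices; `BLOCK` is an assumed tile size that you would tune so that the three active tiles fit in the data cache together:

    ```c
    #include <stddef.h>

    #define BLOCK 32  /* assumed tile size; tune for your cache */

    /* Blocked (tiled) multiply: C = A * B for n x n row-major doubles.
       Each tile of A, B and C is reused many times while it is still
       cache-resident, instead of streaming whole rows and columns. */
    void matmul_tiled(const double *a, const double *b, double *c, size_t n)
    {
        for (size_t i = 0; i < n * n; i++)
            c[i] = 0.0;

        for (size_t ii = 0; ii < n; ii += BLOCK)
            for (size_t kk = 0; kk < n; kk += BLOCK)
                for (size_t jj = 0; jj < n; jj += BLOCK)
                    /* multiply one tile pair, accumulating into C */
                    for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                        for (size_t k = kk; k < kk + BLOCK && k < n; k++) {
                            double aik = a[i * n + k];
                            for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                                c[i * n + j] += aik * b[k * n + j];
                        }
    }
    ```

    The arithmetic is identical to the naive triple loop; only the iteration order changes, so each loaded cache line is used many times before it is evicted.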

    Think also about data structures. Sparse arrays and linked lists often cache extremely poorly; in these cases the effect of caches is sometimes to increase memory traffic rather than reduce it.
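
    To illustrate the point, here are the same values stored two ways (names are illustrative only): each linked-list node can land on a different cache line, so every step of the traversal is a dependent load that may miss, while packing the payloads into a contiguous array gives sequential accesses that the hardware prefetcher handles well.

    ```c
    #include <stddef.h>

    struct node { int value; struct node *next; };

    /* Pointer-chasing traversal: each node may be on its own line,
       and the next address is not known until the load completes. */
    int sum_list(const struct node *head)
    {
        int total = 0;
        for (; head; head = head->next)
            total += head->value;
        return total;
    }

    /* Contiguous traversal: several values per cache line, and the
       access pattern is predictable, so misses are amortised. */
    int sum_array(const int *values, size_t n)
    {
        int total = 0;
        for (size_t i = 0; i < n; i++)
            total += values[i];
        return total;
    }
    ```

    If you must keep list semantics, allocating nodes from a contiguous pool recovers much of the locality.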

    Another good technique is to align data structures with cache line boundaries. If an individual data element fits within a cache line then the whole element can be loaded with only one cache miss.
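
    A sketch of this, assuming a 64-byte cache line (typical for Cortex-A cores, but check your core's documentation); the struct and field names are hypothetical:

    ```c
    #include <stdalign.h>

    #define CACHE_LINE 64  /* assumed line size; core-specific */

    /* Aligning the struct to the line size guarantees it never
       straddles two lines, so touching any field pulls the whole
       element in with a single miss. Alignment also pads sizeof
       up to a multiple of the line, which additionally prevents
       false sharing when elements are updated by different cores. */
    struct packet_stats {
        unsigned long packets;
        unsigned long bytes;
        unsigned long errors;
    } __attribute__((aligned(CACHE_LINE)));
    ```

    The `__attribute__((aligned(...)))` form is GCC/Clang syntax; for heap objects the same effect needs an aligned allocator such as `aligned_alloc` or `posix_memalign`.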

    There is a lot of other useful material out there.

    Hope this helps.

    Chris

