Guidelines on reducing Cache Miss rate

Hi Experts,

Is there any document giving general software guidelines for reducing the cache miss rate on ARMv7 systems?

If it is specific to the A/R/M profiles, even better.

  • It would be good to have a good book or website about that. I can give a couple of tips:

    Get a package which somebody else has already optimised.

    Otherwise, if you're a lot more desperate:

    For matrices, try to work on tiles (square patches) that fit easily in the cache, one at a time. A lot of work has been done on strategies in this area, so finding an existing algorithm is best.
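
    As a rough sketch of the tiling idea, here is a blocked matrix multiply. The function name matmul_tiled and the tile edge TILE = 32 are assumptions for illustration: three 32 x 32 float tiles take about 12 KB, which should fit a typical 32 KB L1 data cache, but the right value depends on your core.

```c
#include <stddef.h>

/* Hypothetical tile edge: tune it so the three working tiles fit in L1. */
#define TILE 32

/* Computes C += A * B for n x n row-major matrices, one tile at a time.
 * The caller must zero C first if a plain product is wanted. */
void matmul_tiled(size_t n, const float *a, const float *b, float *c)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = (ii + TILE < n) ? ii + TILE : n;
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = (kk + TILE < n) ? kk + TILE : n;
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = (jj + TILE < n) ? jj + TILE : n;
                /* Everything touched in here stays within three tiles. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        float aik = a[i * n + k];
                        /* Innermost loop walks rows of B and C contiguously. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

    The result is the same as the plain three-loop version; only the order of memory accesses changes. An optimised BLAS will normally still beat a hand-written loop like this.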

    When walking a linked list, prefetch the next node before working on the current one. If you're going to traverse the list a number of times, it may be worthwhile copying it into a table or a contiguous array, in order, at the start.
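
    Both list suggestions might look something like this. The struct layout and the names sum_list and flatten_list are made up for the example, and __builtin_prefetch is a GCC/Clang builtin (it typically becomes a PLD instruction on ARMv7), so other compilers need their own equivalent.

```c
#include <stddef.h>

/* Hypothetical node layout, for illustration only. */
struct node {
    struct node *next;
    int value;
};

/* Walks the list, requesting the next node before using the current one. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        /* Prefetching a NULL address is harmless: prefetches do not fault. */
        __builtin_prefetch(n->next);
        total += n->value;
    }
    return total;
}

/* Copies up to cap values into a contiguous array, in list order, so that
 * later passes can scan the array instead of chasing pointers.
 * Returns the number of values copied. */
size_t flatten_list(const struct node *head, int *out, size_t cap)
{
    size_t count = 0;
    for (const struct node *n = head; n != NULL && count < cap; n = n->next)
        out[count++] = n->value;
    return count;
}
```

    The prefetch only helps if there is enough work per node to hide the memory latency; with a loop body as small as this one the gain may be slight, which is where the flattened copy tends to win.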

    Run a few threads of similar work, say four, interleaved in software, as if emulating a multi-core processor. Prefetch anything that might cause a cache miss, then move on to the next bit of work on the next thread, and cycle round the threads that way.
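
    One way to read the software-thread idea, sketched here with four independent linked lists as the "threads". LANES, struct item and sum_lists_interleaved are hypothetical names, and __builtin_prefetch is again the GCC/Clang builtin.

```c
#include <stddef.h>

#define LANES 4 /* number of interleaved software "threads" (assumed) */

/* Hypothetical work item: each lane walks its own list of these. */
struct item {
    struct item *next;
    int value;
};

/* Sums LANES independent lists, taking one step per lane per round so the
 * prefetch issued for a lane has a full round of other work to complete. */
void sum_lists_interleaved(const struct item *const heads[LANES],
                           long totals[LANES])
{
    const struct item *cur[LANES];
    size_t active = 0;

    for (size_t l = 0; l < LANES; l++) {
        cur[l] = heads[l];
        totals[l] = 0;
        if (cur[l] != NULL) {
            __builtin_prefetch(cur[l]);
            active++;
        }
    }

    while (active > 0) {
        for (size_t l = 0; l < LANES; l++) {
            const struct item *n = cur[l];
            if (n == NULL)
                continue;               /* this lane has finished */
            totals[l] += n->value;      /* one bit of work for this lane */
            cur[l] = n->next;
            if (cur[l] != NULL)
                __builtin_prefetch(cur[l]); /* request it, then switch lane */
            else
                active--;
        }
    }
}
```

    The same round-robin pattern applies to other latency-bound work such as batches of hash-table lookups; four lanes is just the figure suggested above, not a measured optimum.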

    Study your processor to see what problems might be caused by addresses that clash in the cache (map to the same cache set), and try to ensure you avoid them. This used to be an awful problem in the past, but ARM processors have usually been among the better ones in this respect.
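
    To illustrate what a clash looks like, the sketch below models a set-associative cache with an assumed geometry of 64-byte lines and 128 sets. Those numbers are placeholders; take the real ones from your core's technical reference manual. It counts how many sets a walk down one matrix column touches, and shows the usual fix of padding the row stride. It also treats the addresses as if they index the cache directly, which glosses over virtual to physical mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache geometry, for illustration only. */
#define LINE_BYTES 64u
#define NUM_SETS   128u

/* Set index an address maps to under the assumed geometry. */
static unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Counts the distinct sets touched when reading one element from each of
 * `rows` rows that are stride_bytes apart (a column walk). */
unsigned sets_touched(size_t rows, size_t stride_bytes)
{
    unsigned char seen[NUM_SETS] = { 0 };
    unsigned distinct = 0;
    for (size_t r = 0; r < rows; r++) {
        unsigned s = cache_set((uintptr_t)(r * stride_bytes));
        if (!seen[s]) {
            seen[s] = 1;
            distinct++;
        }
    }
    return distinct;
}

/* Simplistic padding rule: if every row would start in the same set, add
 * one cache line so that successive rows move on by one set. It only
 * catches the worst case; smaller power-of-two strides can still clash. */
size_t padded_stride(size_t row_bytes)
{
    size_t way_bytes = (size_t)LINE_BYTES * NUM_SETS;
    return (row_bytes % way_bytes == 0) ? row_bytes + LINE_BYTES : row_bytes;
}
```

    With this geometry, 16 rows at a stride of 8192 bytes all land in a single set, while the padded stride of 8256 bytes spreads them over 16 different sets.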
