MALI T760MP4 L2 cache hit ratio low (64%)

Hi all:

I used gator and Streamline to check the Mali-T760 MP4 L2 cache hit ratio, and the result looks bad: the read hit ratio is only about 64%. Is the Mali L2 cache a tile-based cache? If so, how should I optimize the data format for it?

Thank you

-Jack

  • GPU caches are relatively small (an MP4 probably has only 256 KB of L2 cache), so for a data plane workload where the data size is large relative to the cache, a significant proportion of cache misses is expected (i.e. every cache line will miss at least once, on its first access). Note that you may get multiple L1 hits per L2 line load, so your effective hit rate from the point of view of the shader core's load instructions may be much higher than this.

    It's hard to give specific advice on data repacking without knowing specifics of the algorithm - but in general making your data types as narrow as possible, avoiding redundant data interleaved with used data, and ensuring good temporal reuse locality are all important.

    For compute workloads how you split your work groups can also impact access patterns, so it's not always only about the data - the work group partitioning defines the data access pattern and the data layout defines how that maps to memory structures.

    HTH,
    Pete
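
To make the layout and work group points above concrete, here is a minimal sketch that counts how many distinct cache lines a set of reads touches. It is a toy model, not a description of the Mali memory system: the 64-byte line size, the struct sizes, the 1024-wide buffer, and every function name are assumptions chosen for illustration.

```python
CACHE_LINE_BYTES = 64  # assumed line size; check the documentation for your GPU


def lines_touched(byte_addresses, access_bytes, line_bytes=CACHE_LINE_BYTES):
    """Return the number of distinct cache lines covered by the given reads."""
    lines = set()
    for addr in byte_addresses:
        first = addr // line_bytes
        last = (addr + access_bytes - 1) // line_bytes
        lines.update(range(first, last + 1))
    return len(lines)


def aos_addresses(n_items, struct_bytes, field_offset):
    """Addresses of one field when it is interleaved with other struct members."""
    return [i * struct_bytes + field_offset for i in range(n_items)]


def soa_addresses(n_items, field_bytes):
    """Addresses of one field stored as its own tightly packed array."""
    return [i * field_bytes for i in range(n_items)]


def tile_addresses(x0, y0, tile_w, tile_h, row_pitch_elems, elem_bytes):
    """Addresses read by one work group covering a tile of a row-major 2D buffer."""
    return [((y0 + y) * row_pitch_elems + (x0 + x)) * elem_bytes
            for y in range(tile_h)
            for x in range(tile_w)]


if __name__ == "__main__":
    n = 1024
    # One 4-byte field read out of hypothetical 32-byte structs vs a packed array.
    print("interleaved:", lines_touched(aos_addresses(n, 32, 0), 4))
    print("packed 4-byte:", lines_touched(soa_addresses(n, 4), 4))
    print("packed 2-byte:", lines_touched(soa_addresses(n, 2), 2))

    # 64 work items per group, three shapes, over a 1024-wide buffer of 4-byte elements.
    for w, h in [(64, 1), (8, 8), (1, 64)]:
        print(f"{w}x{h} group:", lines_touched(tile_addresses(0, 0, w, h, 1024, 4), 4))
```

In this model, reading only the used field from a packed array touches far fewer lines than reading it out of interleaved structs, and the work group shape changes the line count for the same number of work items. Which shape is best in practice depends on the kernel: a wide group minimizes lines for a single read per item, while a square group can reuse more data when each item also reads its neighbours.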
