
Help interpreting L/S and Texture memory usage

I have conflicting thoughts about how to interpret the charts from two Streamline captures of the same scenario, each taken over 4 s on a Mali-G71 (Samsung A20e), so I was wondering what your view is on the following.

For instance, while the L2 and external texture bytes read per cycle have increased (from Before to After), the total number of texture bytes read from both the L2 cache and external memory is lower.

My interpretation is that in After we're performing fewer filtering operations, but each one is more bandwidth-expensive (less coherent access, or a heavier texture format?), hence the increase in bytes/cycle. However, all in all there's still an improvement because in total we're reading less data. Is this a reasonable reading of these charts?

On the flip side, there's the L/S read data. There's an overall (positive) drop in all the metrics (and an increase in the full read cycles), so the L/S data seems clear to me compared to the texture unit data.

Any views are appreciated. Thanks!

[Images: Streamline capture charts, Before and After]
  • Hi JPJ, 

    My interpretation is that in After we're performing fewer filtering operations, but each one is more bandwidth-expensive (less coherent access, or a heavier texture format?), hence the increase in bytes/cycle. However, all in all there's still an improvement because in total we're reading less data. Is this a reasonable reading of these charts?

    Yes, that's my reading of it too. Longer explanation:

    The critical numbers for system performance are the absolute quantities for Texturing active cycles and the total byte reads from L2 and external memory. In your "after" case you have substantial drops in all three:

    • Tex active drops from 365M to 272M cycles.
    • Tex bytes from external drops from 111 MB to 102 MB.
    • Tex bytes from L2 drops from 303 MB to 268 MB.
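
As a rough check of why the per-cycle numbers can rise while the totals fall, the figures above can be plugged into a small script. This is only a sketch: it assumes "MB" means 10**6 bytes and that the per-cycle chart is normalized by texturing active cycles, neither of which is confirmed in this thread, and the function names are made up for illustration.

```python
# Figures quoted in the reply above. "MB" is assumed to mean 10**6 bytes.
CAPTURES = {
    "before": {"tex_active_cycles": 365e6, "ext_bytes": 111e6, "l2_bytes": 303e6},
    "after": {"tex_active_cycles": 272e6, "ext_bytes": 102e6, "l2_bytes": 268e6},
}


def percent_change(before, after):
    """Relative change from before to after, in percent (negative means a drop)."""
    return 100.0 * (after - before) / before


def bytes_per_active_cycle(capture, key):
    """Bytes read per texturing active cycle (assumed normalization)."""
    return capture[key] / capture["tex_active_cycles"]


# Absolute quantities: all three drop.
for key in ("tex_active_cycles", "ext_bytes", "l2_bytes"):
    change = percent_change(CAPTURES["before"][key], CAPTURES["after"][key])
    print(f"{key}: {change:+.1f}%")

# Per-cycle ratios: both rise, because cycles drop faster than bytes.
for key in ("ext_bytes", "l2_bytes"):
    before = bytes_per_active_cycle(CAPTURES["before"], key)
    after = bytes_per_active_cycle(CAPTURES["after"], key)
    print(f"{key} per active cycle: {before:.2f} -> {after:.2f}")
```

Under those assumptions the active cycles fall by roughly a quarter while the byte totals fall by roughly a tenth, so the bytes-per-cycle ratio goes up even though every absolute number improved.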

    The texture bytes per cycle numbers provide an indicator of how well the texture cache is working. A "good" number depends on the texture formats the content uses, so there is no single right answer.

    For example, if you have an application blitting two compressed textures (e.g. 4bpp) and an uncompressed texture (e.g. 32bpp) then the expected number here is:  

    • ((4 + 4 + 32) / (8 * 3)) = 1.67 bytes per access.  

    If you optimize this to remove one of the compressed layers the average per access goes up even though the overall scene load drops:  

    •  ((4 + 32) / (8 * 2)) = 2.25 bytes per access 
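
The two calculations above can be reproduced with a short sketch. The helper names are hypothetical, and the model is deliberately simple: one access per layer, with each layer's cost taken as its bits per pixel divided by 8.

```python
def average_bytes_per_access(bits_per_pixel_layers):
    """Mean bytes fetched per texture access, one access per layer."""
    total_bits = sum(bits_per_pixel_layers)
    return total_bits / (8 * len(bits_per_pixel_layers))


def total_bytes_per_pixel(bits_per_pixel_layers):
    """Total bytes fetched per output pixel across all layers."""
    return sum(bits_per_pixel_layers) / 8


three_layers = [4, 4, 32]  # two compressed layers plus one uncompressed
two_layers = [4, 32]       # one compressed layer removed

for name, layers in (("three layers", three_layers), ("two layers", two_layers)):
    print(
        f"{name}: {average_bytes_per_access(layers):.2f} bytes/access, "
        f"{total_bytes_per_pixel(layers):.2f} bytes/pixel in total"
    )
```

Removing the cheap compressed layer raises the average cost per access because the remaining mix is dominated by the expensive uncompressed layer, yet the total data fetched per pixel still goes down.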

    There is some indication this is exactly what is happening in your case. Your percentage of compressed and percentage of mipmapped textures both drop from ~36% to ~30%, indicating that a higher percentage of the total is now uncompressed texture data.    

    *EDIT* Added an answer to the L/S comment too

    On the flip side, there's the L/S read data. There's an overall (positive) drop in all the metrics (and an increase in the full read cycles), so the L/S data seems clear to me compared to the texture unit data.

    The load/store data tends to behave a little more rationally because the counters all count physical accesses (full or partial reads of 64-byte cache lines), not a higher-level concept like "a vertex" or even "an attribute". Cache line sizes don't change, so the ratios of "accesses" to L2/ext traffic should be more consistent unless you start thrashing the cache.

     HTH,  Pete  
