As the documentation and several other sources explain, when you work with the Mali Offline Compiler you should first focus on the stage with the highest cycle count in its output (i.e. arithmetic, load/store, or texture).

One thing I noticed is that in pretty much any shader the texture unit is never the bottleneck.

Example:
    Hardware: Mali-T720 r1p1
    Architecture: Midgard
    Driver: r23p0-00rel0
    Shader type: OpenGL ES Fragment

    Main shader
    ===========

    Work registers: 4
    Uniform registers: 1
    Stack spilling: false

                                    A      LS       T    Bound
    Total instruction cycles:   22.00    1.00    5.00        A
    Shortest path cycles:       10.75    1.00    5.00        A
    Longest path cycles:        10.75    1.00    5.00        A

    A = Arithmetic, LS = Load/Store, T = Texture
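To make the "focus on the highest column first" rule concrete, here is a minimal Python sketch that picks the bound unit for each row of the report above. The cycle counts are copied by hand from that output; the helper name `bound_unit` and the dictionary layout are my own illustration, not part of any Mali tool.

```python
def bound_unit(cycles):
    """Return the label of the pipeline with the highest cycle count."""
    return max(cycles, key=cycles.get)


# Cycle counts copied by hand from the report above (A, LS, T columns).
report = {
    "total": {"A": 22.00, "LS": 1.00, "T": 5.00},
    "shortest": {"A": 10.75, "LS": 1.00, "T": 5.00},
    "longest": {"A": 10.75, "LS": 1.00, "T": 5.00},
}

for row, cycles in report.items():
    # Every row of this particular shader comes out arithmetic bound.
    print(row, "bound by", bound_unit(cycles))
```

This only reproduces what the "Bound" column already says; it is useful mainly as a starting point for experimenting with adjusted numbers.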
If you add another texture, you need to do something with it (blend it with other computation at least), which means arithmetic cycles go up as well. So, as I said, texture cycles are almost never higher than the other columns.

When I work on optimizing shaders, my current intuition is to still be quite aggressive and try to reduce texture fetches as much as possible. I usually don't trade arithmetic for texture fetches, i.e. I don't move computation from arithmetic into a baked texture unless it's something very expensive.

Another thing: the Mali Offline Compiler assumes that a texture fetch is bilinear and that the texture has mipmaps. We currently mostly use bilinear filtering without mipmaps on mobile. Rationale: when you start using mipmaps you also need trilinear filtering, otherwise the transition between mipmap levels becomes visible. Trilinear filtering means double the cycles, and more memory throughput is needed (fetching 8 texels instead of 4 for bilinear). On the other hand, not using mipmaps means poor cache utilization, which also means more memory throughput is needed. No idea what's better in practice. I guess it depends on the project/hardware. Or is there a universal answer?

Also, fetching a texture means latency. This latency is hidden to some degree if the shader uses relatively few resources, but I assume it's still there.

Once I switch to another project in the company, I'll have time to do extensive tests on the cost of textures and hopefully build some intuition. As I am impatient and curious, I do hope other, more experienced devs will share their intuition here.

So my questions:

1. Is it a good strategy to aggressively optimize out texture fetches and treat them as very expensive (even if texture is not the bottleneck according to the Mali Offline Compiler)? Should I adjust the score from the Mali Offline Compiler, i.e. multiply it by 2 (so it's trilinear), or should I use a GPU profiler and look at GPU metrics like memory throughput to make the final decision? How do you do it in practice?

2. Bilinear without mipmaps vs. trilinear with mipmaps: which do you think is better in practice? How do you choose what to use? Does it depend on the hardware? We do need to support Midgard devices (we support very old devices; we're a mobile development company).

3. If you can share any links/books/resources explaining anything above that might help me, please do. I have already read the official Mali documentation and optimization guides.
The best strategy for optimizing texture fetches would depend on the specific hardware and project you're working with. The Mali offline compiler provides a good starting point for identifying potential performance bottlenecks, but it's not always an accurate representation of what's happening on the actual hardware. To make the final decision, it's best to use a GPU profiler and look at metrics such as memory throughput to determine the actual performance impact of your optimizations.
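As a rough illustration of the "multiply the texture score by 2" idea from the question, the sketch below rescales the texture column and checks whether the bound unit changes. The 2x factor is the asker's assumption for trilinear filtering, and the function name `adjusted_bound` is hypothetical. Static cycle counts say nothing about cache misses or bandwidth, so this is only a sanity check before profiling on a device, not a replacement for it.

```python
def adjusted_bound(cycles, texture_scale=2.0):
    """Scale the texture cycles and return (adjusted cycles, bound unit).

    texture_scale=2.0 models the assumption that trilinear filtering
    costs twice the texture cycles of bilinear filtering.
    """
    adjusted = dict(cycles)
    adjusted["T"] = cycles["T"] * texture_scale
    unit = max(adjusted, key=adjusted.get)
    return adjusted, unit


# Shortest-path cycles from the example report in the question.
shortest = {"A": 10.75, "LS": 1.00, "T": 5.00}

adjusted, unit = adjusted_bound(shortest)
print(adjusted["T"], unit)
```

For the example shader the doubled texture cost (10.0) lands just under the arithmetic cost (10.75), so one more texture fetch could plausibly tip the balance under this assumption.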
The choice between bilinear filtering without mipmaps and trilinear filtering with mipmaps is a tradeoff between performance and visual quality. Bilinear filtering without mipmaps costs fewer filtering cycles per fetch, but minified textures can shimmer or alias and, as noted in the question, cache utilization suffers. Trilinear filtering with mipmaps gives smoother results, including across mip level transitions, at a higher per-fetch filtering cost. It's important to consider the specific hardware and project you're working with, as well as the target audience, when making this decision.
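One part of this tradeoff that can be estimated offline is the extra memory a mip chain needs. The sketch below counts texels in a full mip chain for a given base size; the function name `mip_chain_texels` is my own, and it assumes each level halves both dimensions (rounding down, minimum 1) until 1x1. Per-texel size, compression, and alignment are ignored.

```python
def mip_chain_texels(width, height):
    """Count texels in a full mip chain, from the base level down to 1x1."""
    total = 0
    while True:
        total += width * height
        if width == 1 and height == 1:
            break
        # Each level halves both dimensions, never going below 1.
        width = max(1, width // 2)
        height = max(1, height // 2)
    return total


base = 1024 * 1024
full = mip_chain_texels(1024, 1024)
# Ratio of full chain to base level; approaches 4/3 for square textures.
print(full, round(full / base, 3))
```

So a full chain costs roughly one third more storage than the base level alone, which has to be weighed against the cache behavior of sampling an unmipped texture when it is minified.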
Here are a few resources that might be helpful for further learning: