Are dependent texture reads still an issue on ARM hardware? (I'm speaking about low-end OpenGL ES 3.0 HW).
For Mali there was never any specific issue with dependent texture reads, other than the usual cost of managing cache misses on the lookup path. That cost is not specific to dependent reads; the same problem can occur for non-dependent reads too.
A GPU cache miss can only be hidden if the shader core has "other work" to do: either non-dependent work from the same thread, or other threads to run. If you have a high number of cache misses, the stalls cannot be completely hidden and start to eat into your overall efficiency, because the GPU runs out of work to do while it waits.
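To put rough numbers on the latency-hiding argument, here is a toy model (not how any real Mali core schedules work). It assumes every thread alternates a fixed amount of compute with one cache miss of fixed latency, and that the core can switch threads for free. The function name and the cycle counts are made up for illustration.

```python
def core_utilization(threads: int, work_cycles: int, miss_latency: int) -> float:
    """Fraction of cycles the core is busy in an idealized model.

    Assumption: each thread repeats `work_cycles` of compute followed by one
    miss that stalls it for `miss_latency` cycles, and other threads can run
    during that stall at no switching cost.
    """
    if threads < 1 or work_cycles < 1 or miss_latency < 0:
        raise ValueError("threads and work_cycles must be >= 1, miss_latency >= 0")
    # Each thread offers work_cycles of work per (work_cycles + miss_latency)
    # cycles; the core cannot be more than 100% busy.
    return min(1.0, threads * work_cycles / (work_cycles + miss_latency))


if __name__ == "__main__":
    # Hypothetical numbers: 20 cycles of work per 180-cycle miss.
    for n in (1, 4, 10, 16):
        print(n, "threads:", core_utilization(n, 20, 180))
```

In this model the stall is fully hidden only once there are enough threads to cover the miss latency; with fewer threads, or longer misses, the core sits idle for part of the time.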
Entry-level GPUs can be more susceptible to problems here because the L2 cache is smaller, so you are more likely to get cache pressure causing evictions, followed by misses that need fetches from DRAM.
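The effect of cache size on eviction can be seen in a small simulation. This is only a sketch: it assumes a fully associative LRU cache and a synthetic access pattern that loops over the same set of cache lines, neither of which is claimed to match real Mali hardware. The capacities and working-set size are arbitrary.

```python
from collections import OrderedDict


def miss_rate(accesses, capacity):
    """Miss rate of a fully associative LRU cache holding `capacity` lines.

    `accesses` is a sequence of cache-line IDs (hypothetical, for illustration).
    """
    accesses = list(accesses)
    if capacity < 1 or not accesses:
        raise ValueError("need capacity >= 1 and at least one access")
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(accesses)


if __name__ == "__main__":
    # Loop over a 96-line working set ten times.
    pattern = [i % 96 for i in range(960)]
    print("128-line cache:", miss_rate(pattern, 128))
    print(" 64-line cache:", miss_rate(pattern, 64))
```

With the larger cache the working set fits, so only the first pass misses. With the smaller cache each line is evicted before it is reused, so every access misses.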
Things that help:
... but the impact here is very content dependent, so if in doubt benchmark your usage on the devices you care about.
Thanks for the quick answer! Just to clarify my question: by dependent texture reads I meant the ability of old OpenGL ES 2.0 devices to pre-fetch texture data before even running the fragment shader, IF the texture coordinates were used unchanged in the fragment shader.
Don't think Mali ever did that - with enough threads running you don't need to bother.
Cool, thanks! This was a common performance optimization on PowerVR OpenGL ES 2.0 HW (old iPhones too), so I was wondering how that applied here.