Hi,
I'm trying to understand the performance of one of my test shaders. I'm using a Mali-G78 on a Pixel 6a (I believe it has 20 cores). Streamline reports around 15 giga-instructions per second, with arithmetic unit utilization of around 99%. According to my calculation, the shader processes around 600 giga scalar add/mul operations per second: I count the operations in the shader (which I don't think can be optimized away), multiply by the number of pixels, by 4 because I'm using vec4, and by the fps.
I am not sure how to reconcile the 15 giga-instructions/s with my calculated 600 GFLOPS. If I assume one instruction operates on 32 f32 values simultaneously, that gives 15 * 32 = 480 GFLOPS, which is still well below what I estimate from my shader and fill rate.
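To make the two estimates concrete, here is a minimal Python sketch of the arithmetic. The function names are mine, and the 32 lanes per instruction is the same assumption as above, not a confirmed hardware figure. The inputs to the shader-side estimate (operations per pixel, resolution, fps) are placeholders for illustration only, not my real numbers.

```python
GIGA = 10**9

def counter_flops(instr_per_s, lanes):
    """Counter-side estimate: instructions/s times the ASSUMED lanes per instruction."""
    return instr_per_s * lanes

def shader_flops(vec4_ops_per_pixel, pixels, components, fps):
    """Shader-side estimate: ops counted in the shader x pixels x vector width x fps."""
    return vec4_ops_per_pixel * pixels * components * fps

# 15 giga-instructions/s from Streamline, 32 f32 lanes assumed.
measured = counter_flops(15 * GIGA, 32)
print(measured / GIGA)            # 480.0

# Gap between my 600 G estimate and the counter-side figure.
print((600 * GIGA) / measured)    # 1.25

# Placeholder inputs (hypothetical): 1000 vec4 ops/pixel, 1920x1080, 60 fps.
print(shader_flops(1000, 1920 * 1080, 4, 60) / GIGA)
```

So under the 32-lane assumption, the shader-side estimate is about 25% higher than what the counters suggest.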
Thanks,
Lorenzo
I can confirm that I now match the expected 480 G. My previous number (600 G) came from not waiting for the last frame to complete (those frames are very slow).
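For anyone hitting the same thing, here is a rough Python sketch of how stopping the timer too early inflates the figure. The workload and timings are hypothetical, chosen only to reproduce the 480 G vs 600 G ratio. They are not my actual measurements.

```python
GIGA = 10**9

def apparent_rate(total_ops, elapsed_ms):
    """Ops per second for a workload timed over elapsed_ms milliseconds."""
    return total_ops * 1000 / elapsed_ms

# Hypothetical workload: 480 giga-ops that really take 1000 ms on the GPU.
total_ops = 480 * GIGA

# Timer stopped after the last frame has actually completed.
print(apparent_rate(total_ops, 1000) / GIGA)  # 480.0

# Timer stopped 200 ms early, before the slow last frame has finished.
print(apparent_rate(total_ops, 800) / GIGA)   # 600.0
```

All of the work is counted in both cases, but the second timing misses part of the time it took, so the rate looks higher than the hardware can deliver.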