I'm trying to use PLS (pixel local storage) to implement deferred rendering in Unity, but I found that using it makes my framerate much lower. RenderDoc shows an increase in the time spent rendering point lights. Streamline shows that my bandwidth is significantly reduced, but fragment active has also increased greatly.
This is the result of using pixel local storage
This is the result of not using pls
Thanks for your answer
Yes, this isn't uncommon, but you often still see a net gain overall in sustained performance, because memory power consumption drops dramatically. The performance loss should be significantly smaller on the newer Mali-G310/510/610/710 hardware.
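To give a feel for the size of the bandwidth saving, here is a rough back-of-the-envelope sketch. The resolution, the 16 bytes per pixel of G-buffer data, and the frame rate are illustrative assumptions, not numbers from your capture, and the function name is made up for this example. It also assumes the lighting pass reads each G-buffer pixel only once, so for overlapping point light volumes it is likely a lower bound.

```python
# Rough estimate of the G-buffer DRAM traffic that keeping data in
# tile-local storage can avoid. All inputs are illustrative assumptions.

def gbuffer_traffic_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Bytes per second spent writing the G-buffer to DRAM and reading it back once."""
    pixels = width * height
    write_bytes = pixels * bytes_per_pixel  # geometry pass stores the G-buffer
    read_bytes = pixels * bytes_per_pixel   # lighting pass loads it again (once per pixel)
    return (write_bytes + read_bytes) * fps

# Assumed: 1920x1080 target, 16 bytes of G-buffer data per pixel, 60 frames per second.
traffic = gbuffer_traffic_bytes_per_second(1920, 1080, 16, 60)
print(f"{traffic / 1e9:.2f} GB/s")  # prints "3.98 GB/s"
```

With PLS that traffic stays on-chip in the tile memory, which is where the drop in memory power comes from.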
See this post from last month, which describes why this happens (in the context of Vulkan subpasses, but PLS behaves effectively the same as far as the hardware is concerned).
... and the "Tile access synchronization" section here for why it improves in the newer hardware: