Hi
I am using a Samsung S9 (Mali-G72, OpenGL ES 3.2 r9p0).
I implemented a method to copy the current pixel to an image.
I use Framebuffer Fetch to read the pixel, then use Image Store (not atomic) to write it to an image.
In the next frame, that image is processed by a Dual Filter (down/up sample) to produce the bloom texture.
But when I used this method in OpenGL ES, the frame rate dropped 30fps.
I can see that the CPU is waiting for something, and the GPU slows down.
Using the same method in Vulkan is not a problem.
In Streamline I can see low usage on Mali L2 Cache Stall.
So how can I find out what the problem is? (There is no problem on other vendors' GPUs either.)
Thanks
imageLoad/imageStore is really designed for cases where you need read-modify-write access to images, and using it will disable many optimizations that you would get for free by writing out via the framebuffer (such as framebuffer compression). For high core count configurations such as the Galaxy S8 and S9 (20 and 18 cores respectively), it is very easy to become bottlenecked on main memory if you have all cores touching memory regularly, so the loss of framebuffer compression is likely going to be painful.
Framebuffer fetch also needs to be used with some care because it can cause dependency stalls on the pixel pipeline (a fragment in a later layer must wait for the earlier layer to commit a result to tile memory before it can be read back again - if too many threads stall then that can be expensive).
Without knowing exactly what you are trying to do it's hard to give specific advice. Are you able to share a reproducer APK and/or your Streamline files?
Regards, Pete
https://drive.google.com/file/d/1oen9kZPDZg3MxTsrDt5pmQ7BSepBc6J_/view?usp=sharing
173fps  0s~9s    disable all
167fps  9s~15s   color grading (Framebuffer Fetch)
88fps   15s~24s  color grading+bloom (with Framebuffer Fetch and Image Store)
120fps  24s~31s  color grading+bloom (with BlitFramebuffer)
167fps  31s~36s  color grading (the same as 9s~15s)
Streamline, locked to 60fps:
https://drive.google.com/file/d/13Y9kbdkHYCSJRSuLvWpaZmJEb4VojFzp/view?usp=sharing
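
Frame rates are hard to compare directly, so it can help to convert the figures listed above into time per frame. This is only a rough sketch using the fps values quoted in this post; the labels and the `frame_time_ms` helper are illustrative names, and the results are averages over each segment rather than profiler measurements.

```python
# Frame rates quoted in the post, keyed by an illustrative label.
reported_fps = {
    "disable all": 173,
    "color grading (Framebuffer Fetch)": 167,
    "color grading+bloom (Framebuffer Fetch + Image Store)": 88,
    "color grading+bloom (BlitFramebuffer)": 120,
}


def frame_time_ms(fps):
    """Convert a frame rate into the average time per frame in milliseconds."""
    return 1000.0 / fps


# Use the color-grading-only segment as the baseline for the bloom cost.
baseline_ms = frame_time_ms(reported_fps["color grading (Framebuffer Fetch)"])

for label, fps in reported_fps.items():
    ms = frame_time_ms(fps)
    delta = ms - baseline_ms
    print(f"{label}: {ms:.2f} ms/frame ({delta:+.2f} ms vs color grading only)")
```

On these numbers the Image Store path adds roughly 5.4 ms per frame on top of color grading, while the BlitFramebuffer path adds roughly 2.3 ms.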