1. I understand that in the OpenGL graphics pipeline, pixel local storage can be used to share data between threads.
2. According to the OpenCL developer guide, OpenCL __local memory is not implemented on Arm Mali GPUs.
3. I also understand that Mali GPUs do not implement dedicated on-chip shared memory for compute shaders.
So can developers not use shared memory in the compute pipeline? Or does anyone have a good way to use shared memory? I think shared memory is very important for HPC optimization.
You can use workgroup shared memory, so work items in a compute shader can exchange data during workgroup execution. On Mali it is not backed by dedicated local RAM; it is ordinary system memory accessed through the normal load/store cache. The important thing on Mali is therefore to avoid copying read-only data from global memory to local memory: the copy only pollutes the cache and makes performance worse.
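
As a rough illustration of the "read directly from global memory" advice, here is a minimal sketch of an OpenCL C style kernel that applies a 3-tap filter without staging its input in __local memory. The kernel name blur3, the helper run_blur3, and the shim macros are all hypothetical and not taken from any Arm sample. The shims exist only so the kernel also compiles as plain C for a quick host-side check; in a real .cl file you would delete them.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side shims (assumption: present only so this compiles as plain C.
   Delete them in a real .cl file, where OpenCL itself provides the
   address-space qualifiers and get_global_id()). */
#define __kernel
#define __global
static size_t emulated_gid;
static size_t get_global_id(unsigned dim) { (void)dim; return emulated_gid; }

/* 3-tap box filter. Each work item reads its neighbours straight from
   global memory: there is no __local staging buffer and no barrier.
   Adjacent work items touch overlapping addresses, so on a GPU without
   dedicated local RAM the reuse is left to the load/store cache. */
__kernel void blur3(__global const float *in, __global float *out, int n)
{
    int i = (int)get_global_id(0);
    if (i >= n)
        return;                            /* guard padded global sizes */
    int lo = (i > 0) ? i - 1 : 0;          /* clamp at the left edge    */
    int hi = (i < n - 1) ? i + 1 : n - 1;  /* clamp at the right edge   */
    out[i] = (in[lo] + in[i] + in[hi]) / 3.0f;
}

/* Hypothetical stand-in for an NDRange launch: runs the kernel once per
   global id, sequentially, on the host. */
static void run_blur3(const float *in, float *out, int n)
{
    assert(in != out);  /* in-place use would read already-written values */
    for (emulated_gid = 0; emulated_gid < (size_t)n; ++emulated_gid)
        blur3(in, out, n);
}
```

The variant to avoid on Mali would first copy a tile of in into a __local array, issue a barrier, and then filter from the tile. Because the input is read-only, that copy adds traffic without adding any faster storage. Workgroup shared memory remains the right tool when work items actually need to exchange values they have computed.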