Hello, we are developing a product based on the Mali-T764 (RK3288) using OpenCL. Our kernel uses about 1 kB of local memory per work-group. Where is this local memory allocated? Is it possible to use the L2 cache (1 MB on the RK3288) as local memory? That could greatly speed up our program. Many thanks!
I've added a new section on the memory system to my blog post on the GPU shader core:
The Mali GPU: An Abstract Machine, Part 3 - The Shader Core
... which should answer the question.
Cheers, Pete