Hello, we are developing a product based on the Mali T764 (RK3288) with OpenCL. In our kernel, we use about 1 kB of local memory per workgroup. I was wondering where this local memory is allocated, and whether it is possible for us to take advantage of the L2 cache (1 MB on the RK3288) as the local memory, which might greatly speed up our program. Many thanks!
Hi Peter, could you please tell me the maximum number of work items that can run at the same time on the Mali T764 (RK3288), and the size of the L1 cache in that GPU?
Many thanks!
Tan
Hi Tan,
The maximum occupancy on a Midgard GPU is 256 threads per shader core, so 1024 on a T760 MP4. The Mali-T760 page on ARM's site doesn't say anything about the L1 cache, so it might not be public information. Pete will know.
HTH,
Chris
Thanks Chris. Peter has added this to his blog:
T760 has two 16KB L1 data caches per shader core; one for texture access and one for generic memory access.