About local memory in OpenCL

Hello, we are developing a product based on the Mali T764 (RK3288) with OpenCL. Our kernel uses about 1KB of local memory per work-group. I was wondering where this local memory is allocated, and whether we could take advantage of the L2 cache (1MB on the RK3288) as local memory, which might greatly speed up our program. Many thanks!

  • The GPU L2 in the RK3288 isn't 1MB; it's only 256KB. The 1MB cache is the CPU L2 cache, which has nothing to do with Mali at all ...

    Our problem now is the frequent data transfer

    Based on what you are saying, you are reading and writing the same 1KB of memory multiple times from the same work item. That should be fine and should fit entirely inside the L1, let alone the L2, so memory bandwidth _may_ not be your problem, although a lot depends on how that is laid out in memory. How do you know that L2 to main memory bandwidth is your problem?

    I'd suggest looking at some of the video tutorials here, as they go into a lot of detail about how memory accesses can be optimized in compute kernels, and explain how to profile using the performance counters.

    GPU Compute, OpenCL and RenderScript Tutorials - Mali Developer Center

    HTH,
    Pete

    EDIT: Fixed cache size, apparently 256KB.
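
As a rough sanity check on the "should fit in cache" argument above, the working set can be estimated with simple arithmetic. The sketch below uses the two figures from this thread (about 1KB of local memory per work-group, 256KB GPU L2). The function names and the `resident_groups` parameter are hypothetical, introduced only for illustration; the real number of work-groups in flight depends on the device and the kernel, and the cache also holds other data, so treat the result as an optimistic upper bound rather than a measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Figures taken from the thread. */
#define LOCAL_BYTES_PER_GROUP ((size_t)1024)        /* about 1KB of local memory per work-group */
#define GPU_L2_BYTES          ((size_t)256 * 1024)  /* 256KB GPU L2 on the RK3288 */

/* Total bytes of local-memory buffers that are live at once.
 * resident_groups is a hypothetical input: the number of work-groups
 * in flight at the same time, which is device and kernel dependent. */
size_t working_set_bytes(size_t bytes_per_group, size_t resident_groups)
{
    return bytes_per_group * resident_groups;
}

/* True if that working set is no larger than the given cache size. */
bool fits_in_cache(size_t bytes_per_group, size_t resident_groups,
                   size_t cache_bytes)
{
    assert(cache_bytes > 0);
    return working_set_bytes(bytes_per_group, resident_groups) <= cache_bytes;
}

/* Largest number of resident work-groups whose buffers still fit. */
size_t max_groups_in_cache(size_t bytes_per_group, size_t cache_bytes)
{
    assert(bytes_per_group > 0);
    return cache_bytes / bytes_per_group;
}
```

For example, `max_groups_in_cache(LOCAL_BYTES_PER_GROUP, GPU_L2_BYTES)` gives 256, so the 1KB buffers alone would only exceed the L2 with more than 256 work-groups resident at once. To find out where local memory actually lives on a given device, the standard OpenCL query `clGetDeviceInfo` with `CL_DEVICE_LOCAL_MEM_TYPE` reports either `CL_LOCAL` (dedicated storage) or `CL_GLOBAL` (backed by global memory), and `CL_DEVICE_LOCAL_MEM_SIZE` reports its size.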
