OpenCL code portability across various Mali GPUs

I wonder how portable OpenCL code is. If I write something for a T-628 MP6, will it run on a T-880, etc.? The hardware obviously has to support the minimum OpenCL version the code requires, but what about differences in GPU architecture that could make code run well on one GPU yet perform poorly on another?

  • Portability of performance is an interesting question, and the only real answer is "it depends on what you are doing". There are certainly micro-architectural differences between the GPUs that can affect performance; how much they matter depends on how sensitive your algorithm is to those differences.

    In my experience, OpenCL kernels are most sensitive to the performance of the memory system. Many common OpenCL workloads, such as computer vision algorithms, are actually quite computationally light, performing few FMAs per memory access, so how well an algorithm fits its data into the L1 and L2 data caches is critically important. The L2 cache size on Mali is configurable: even within a single product line, our partners can choose a cache size that fits their performance requirements and shader core count (typical implementations use 64KB per shader core).

    The memory system outside the GPU, linking it to DDR, is out of our control, so on top of differences in L2 size there may be differences in available memory bandwidth and memory latency.

    HTH,
    Pete
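The point about memory-bound kernels, cache size and DDR bandwidth can be made concrete with a back-of-the-envelope roofline estimate (attainable throughput = min(peak compute, arithmetic intensity x memory bandwidth)). This is a minimal sketch, not an Arm tool or anything from the thread: the 3x3 float convolution, the two "SoC" descriptions and all of their peak and bandwidth numbers are hypothetical placeholders. It also assumes each shader core works on its own tile out of its own share of L2, using the 64KB-per-core figure above as one example. On real hardware, OpenCL reports a cache size through clGetDeviceInfo with CL_DEVICE_GLOBAL_MEM_CACHE_SIZE; this sketch makes no assumption about how that value maps onto a given Mali L2 configuration.

```python
from dataclasses import dataclass

FLOAT_BYTES = 4  # FP32 pixels


@dataclass
class Device:
    """Hypothetical device description; every number is a placeholder, not a Mali spec."""
    name: str
    peak_gflops: float      # assumed peak FP32 throughput, GFLOP/s
    dram_gb_per_s: float    # assumed achievable DDR bandwidth, GB/s
    l2_bytes_per_core: int  # L2 share per shader core chosen by the SoC vendor


def naive_conv3x3_intensity() -> float:
    """FLOPs per DRAM byte when every output pixel re-reads its 9 inputs from DRAM."""
    flops = 9 * 2                        # 9 FMAs, 2 FLOPs each
    bytes_moved = (9 + 1) * FLOAT_BYTES  # 9 loads + 1 store
    return flops / bytes_moved


def tiled_conv3x3_intensity(tile: int) -> float:
    """FLOPs per DRAM byte when a tile x tile output block and its 1-pixel input
    halo each cross the DRAM bus once and are otherwise reused from cache."""
    flops = 18 * tile * tile
    bytes_moved = ((tile + 2) ** 2 + tile * tile) * FLOAT_BYTES
    return flops / bytes_moved


def max_tile_side(budget_bytes: int) -> int:
    """Largest square tile whose input (with halo) plus output fits in budget_bytes."""
    t = 0
    while ((t + 3) ** 2 + (t + 1) ** 2) * FLOAT_BYTES <= budget_bytes:
        t += 1
    return t


def roofline_gflops(dev: Device, flops_per_byte: float) -> float:
    """Roofline model: the lesser of the compute peak and intensity x bandwidth."""
    return min(dev.peak_gflops, flops_per_byte * dev.dram_gb_per_s)


if __name__ == "__main__":
    devices = [
        Device("SoC A (hypothetical)", peak_gflops=100.0, dram_gb_per_s=8.0,
               l2_bytes_per_core=64 * 1024),
        Device("SoC B (hypothetical)", peak_gflops=200.0, dram_gb_per_s=25.0,
               l2_bytes_per_core=32 * 1024),
    ]
    for dev in devices:
        tile = max_tile_side(dev.l2_bytes_per_core)
        naive = roofline_gflops(dev, naive_conv3x3_intensity())
        tiled = roofline_gflops(dev, tiled_conv3x3_intensity(tile))
        print(f"{dev.name}: naive {naive:.1f} GFLOP/s, "
              f"{tile}x{tile} tiles {tiled:.1f} GFLOP/s")
```

With these made-up numbers both devices stay memory-bound even after tiling, so the estimate tracks DDR bandwidth rather than peak compute, and the tile size that fits (and hence the achievable reuse) changes with the L2 size the SoC vendor chose. That is one way the same kernel can perform quite differently on two Mali-based devices.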