Mali OpenCL: clEnqueueNDRangeKernel and clEnqueueTask have high API overhead

Dear All,

One of my use cases for ARM Mali graphics is running video (HEVC) decode kernels. However, we have found that the overhead of the OpenCL kernel launch APIs clEnqueueNDRangeKernel and clEnqueueTask is much higher than the execution time of the kernels themselves. This considerably reduces the overall video decoding speed.

Is there anything we can do to reduce this overhead? Any tips? If you need more details about the issue, I can explain.

Regards

Paul

  • Hi,

    clCreateKernel has nothing to do with clEnqueueNDRangeKernel.

    In your case, if you want to speed up kernel creation, you need to use a binary program.

    To generate a binary program:

    - Build a program from sources.

    - Build all the kernels in the program (if you don't, the binary you save will be an IR rather than an actual binary).

    - Retrieve the program binary using clGetProgramInfo and save it to a file.

    This should be much quicker than building from sources. If it's not, it's likely that your driver is too old.
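The save and reload steps could be sketched roughly as follows. This is a minimal illustration, not code from the thread: the helper names (`save_blob`, `load_blob`, `save_program_binary`, `load_program_binary`) and the `USE_OPENCL` macro are hypothetical, and the OpenCL part assumes the program was built for exactly one device. The file helpers compile on their own; the OpenCL part is only compiled when `USE_OPENCL` is defined and an OpenCL SDK is available.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write `size` bytes to `path`. Returns 0 on success, -1 on failure. */
static int save_blob(const char *path, const unsigned char *data, size_t size)
{
    assert(path != NULL && data != NULL);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, 1, size, f);
    int closed = fclose(f);
    return (written == size && closed == 0) ? 0 : -1;
}

/* Read a whole file into a malloc'd buffer (caller frees). NULL on failure. */
static unsigned char *load_blob(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len <= 0) { fclose(f); return NULL; }
    rewind(f);
    unsigned char *buf = malloc((size_t)len);
    if (buf && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf) *size_out = (size_t)len;
    return buf;
}

#ifdef USE_OPENCL /* hypothetical switch: define it when an OpenCL SDK is present */
#include <CL/cl.h>

/* Save the binary of a program already built for exactly ONE device (assumption). */
static int save_program_binary(cl_program program, const char *path)
{
    size_t size = 0;
    if (clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                         sizeof(size), &size, NULL) != CL_SUCCESS || size == 0)
        return -1;
    unsigned char *bin = malloc(size);
    if (!bin) return -1;
    int rc = -1;
    /* CL_PROGRAM_BINARIES expects an array of pointers, one per device. */
    if (clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                         sizeof(bin), &bin, NULL) == CL_SUCCESS)
        rc = save_blob(path, bin, size);
    free(bin);
    return rc;
}

/* Recreate the program from the saved file; NULL if loading or building fails. */
static cl_program load_program_binary(cl_context ctx, cl_device_id dev,
                                      const char *path)
{
    size_t size = 0;
    unsigned char *bin = load_blob(path, &size);
    if (!bin) return NULL;
    const unsigned char *bins[] = { bin };
    cl_int status = CL_SUCCESS, err = CL_SUCCESS;
    cl_program p = clCreateProgramWithBinary(ctx, 1, &dev, &size, bins,
                                             &status, &err);
    free(bin);
    if (err != CL_SUCCESS || status != CL_SUCCESS) {
        if (p) clReleaseProgram(p);
        return NULL;
    }
    /* clBuildProgram is still required, even for a program created from a binary. */
    if (clBuildProgram(p, 1, &dev, NULL, NULL, NULL) != CL_SUCCESS) {
        clReleaseProgram(p);
        return NULL;
    }
    return p;
}
#endif
```

A typical pattern would be to try `load_program_binary` at start-up and fall back to building from sources, followed by `save_program_binary`, when it returns NULL. A saved binary is tied to the device and driver that produced it, so the fallback path is worth keeping for driver updates.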

    Hope this helps,

    Anthony
