GPU empty Kernel overhead

Hi,

I am seeing a GPU kernel take a very long time when I run it empty. I am using a "Samsung Exynos Octa 5420 Board", which has a Mali GPU. I have one kernel with a work group size of around "3000". When I run it with logic inside the kernel and pass arguments, it takes 2 msec, but when I run the same kernel without any logic in it and without passing any arguments, it takes 19 msec. I had understood that a kernel with less load runs faster, so why does a kernel without any load take so much longer? For the purposes of this question, assume the kernel logic is a simple factorial of NxN elements. I hope I have given complete information about my problem; please let me know if you need anything more to solve it.

Thanks & Regards,

Narendra Kumar Chepuri.

  • Hi Narendra,

    Just FYI, the clEnqueueNDRangeKernel entry point does not block until the end of kernel execution; it is an asynchronous call that enqueues the kernel for execution and then returns. Measuring the duration of this API call therefore tells you only how long the entry point took to return, not how long the kernel ran on the GPU.

    Hth,

    Chris
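
To illustrate the point in the reply above, here is a minimal sketch that reproduces the effect without a GPU. It is an analogy, not OpenCL code: a single-worker thread pool stands in for the command queue, `submit` plays the role of the enqueue call, and `future.result()` plays the role of waiting for completion (in OpenCL, typically `clFinish`, or an event queried with `clGetEventProfilingInfo` on a queue created with `CL_QUEUE_PROFILING_ENABLE`). The names `fake_kernel` and `measure` are hypothetical and exist only for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration_s):
    """Stand-in for GPU work: simply sleeps for duration_s seconds."""
    time.sleep(duration_s)


def measure(duration_s=0.2):
    """Return (enqueue_time, total_time) in seconds for one fake kernel."""
    # The single-worker pool is an assumed stand-in for an OpenCL command queue.
    with ThreadPoolExecutor(max_workers=1) as queue:
        start = time.perf_counter()
        # submit() returns as soon as the work is queued, like an enqueue call.
        future = queue.submit(fake_kernel, duration_s)
        enqueue_time = time.perf_counter() - start
        # result() blocks until the work has finished, like waiting on the queue.
        future.result()
        total_time = time.perf_counter() - start
    return enqueue_time, total_time


if __name__ == "__main__":
    enqueue_time, total_time = measure()
    print(f"enqueue only:   {enqueue_time * 1e3:.2f} ms")
    print(f"enqueue + wait: {total_time * 1e3:.2f} ms")
```

Timing only the `submit` call says almost nothing about how long the work takes; the second measurement, taken after waiting, is the one that reflects execution. The same reasoning suggests that the 2 msec and 19 msec figures in the question may both describe host-side call overhead rather than kernel run time.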
