Using OpenCL on Odroid-XU3 is slower than without OpenCL

Hello, I'm using an Odroid-XU3.
I installed OpenCV 3.0.0-rc1 on the Odroid-XU3 running Ubuntu 14.04.

I have two questions.

First, while installing OpenCV, there was no option to set an OpenCL SDK directory.
I just checked 'WITH_OPENCL' and so on.
But calling the OpenCL functions still works.
Why?

Second, I tested OpenCV with OpenCL.
I followed the website http://www.learnopencv.com/opencv-tra...
However, the running time is 400ms with OpenCL and 152ms without OpenCL.
I would like to solve this strange problem.

Please help me.
Thank you.

  • Hi zz5414,

    I'm not so sure about your first question, but perhaps someone else on here can help with that.

    Regarding the performance you're seeing with OpenCL enabled...

    The OpenCL layer in OpenCV has been developed with desktop GPUs in mind.  And indeed on desktops you would expect to see a performance improvement when using CL linked to a suitable GPU.  As you probably know, however, OpenCL is not "performance portable".  What is optimal for one platform is not necessarily optimal for another, and kernels that target the architectural features of one particular GPU can run quite a bit slower on other architectures.  In theory, you could optimise the kernels being used for Mali, but unfortunately this is only part of the problem.

    A fundamental aspect of OpenCV is that it's based on a synchronous architecture.  This means that you make a call into an OpenCV API and you wait for the result.  When you're using the OpenCL layer this means that you have sync points between the CPU and GPU before and after each call.  Moving between processors in this way is always going to be highly sub-optimal.  A much better approach - and indeed a necessary one if you want to get the best out of a mobile architecture - is to have a graph framework where you can queue a collection of work for the GPU that can then work independently until it has finished.  Whilst the jobs run there is no need for any interaction with the CPU.  This also makes it possible to pipeline the workload amongst all available processors.
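
    To make the cost of those sync points concrete, here is a toy cost model in Python. It is only a sketch: the function names, the per-kernel times and the sync cost are all hypothetical numbers picked for illustration, not measurements from OpenCV or from a Mali GPU.

```python
# Toy cost model of per-call sync points versus a queued chain of kernels.
# Every number below is hypothetical and only illustrates the argument.

def synchronous_ms(kernel_ms, sync_ms):
    """Each call pays one CPU<->GPU sync before and one after its kernel."""
    return sum(sync_ms + k + sync_ms for k in kernel_ms)

def queued_ms(kernel_ms, sync_ms):
    """The whole chain is queued up front: one sync at the start, one at the end."""
    return sync_ms + sum(kernel_ms) + sync_ms

kernel_ms = [4.0, 6.0, 5.0, 3.0]  # hypothetical GPU time of each kernel
sync_ms = 2.0                     # hypothetical cost of a single sync point

sync_total = synchronous_ms(kernel_ms, sync_ms)
queued_total = queued_ms(kernel_ms, sync_ms)

print(f"synchronous calls: {sync_total} ms")
print(f"queued chain:      {queued_total} ms")
```

    In this model the sync overhead grows with the number of calls in the synchronous case, but stays constant when the work is queued as one chain.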

    So I'm afraid it's not a straightforward fix.  This is a known issue with OpenCV and it's true generally across all mobile platforms.

    One thing you should look into is enabling NEON extensions with OpenCV.  I believe there is a switch in the build system to do this (the ENABLE_NEON option in CMake).  That can yield some measure of better performance.

    If you're interested in looking at OpenCL on Mali - and it's certainly possible to create your own non-synchronous chain of kernels running very optimally - then I'd recommend looking at the malideveloper site here: http://malideveloper.arm.com/develop-for-mali/opencl-renderscript-tutorials/

    I appreciate that doesn't solve your problem, but I hope it's useful nevertheless.

    Tim
