Since KleidiCV's initial release in May 2024, the project has made significant strides, with version 0.2.0 released in September 2024 and 0.3.0 in December 2024. These updates bring many new features and performance enhancements.
As the ever-increasing number of cameras on mobile devices attests, camera pipelines and image processing are among the most common computing tasks performed today. OpenCV is at the heart of many of these pipelines, running on hundreds of millions of devices. That is why we are thrilled to announce that KleidiCV is now enabled by default in OpenCV 4.11 on Android.
One of the most notable additions in KleidiCV 0.2.0 is multithreading support. KleidiCV 0.1.0 shipped with only single-threaded functions, leaving application developers to set up multithreading at a higher level. KleidiCV 0.2.0 integrates with OpenCV's existing multithreading framework, so each operation is multithreaded with no additional effort from application developers. Image processing is an "embarrassingly parallel" problem: different parts of an image can usually be processed independently, so many KleidiCV functions scale almost linearly with the number of CPUs. For example, on a Linux server with 16 CPUs available, you can expect many KleidiCV functions to run almost 16 times faster with multithreading enabled.
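To illustrate why this kind of work scales so well, the sketch below splits an image into one contiguous band of rows per thread. It is only a minimal illustration built on std::thread with a hypothetical per-pixel operation (inverting an 8-bit grayscale image); it is not how KleidiCV or OpenCV schedule work internally (OpenCV uses its own parallel framework, for example cv::parallel_for_). The names invert_rows and invert_parallel are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-pixel operation: invert an 8-bit grayscale pixel.
// Each output pixel depends only on the matching input pixel, so any
// band of rows can be processed independently of the others.
static void invert_rows(const std::vector<std::uint8_t>& src,
                        std::vector<std::uint8_t>& dst,
                        std::size_t width, std::size_t row_begin,
                        std::size_t row_end) {
    for (std::size_t y = row_begin; y < row_end; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            dst[y * width + x] =
                static_cast<std::uint8_t>(255 - src[y * width + x]);
        }
    }
}

// Splits a width x height image into one band of rows per thread.
// Threads write disjoint parts of dst, so no locking is needed.
void invert_parallel(const std::vector<std::uint8_t>& src,
                     std::vector<std::uint8_t>& dst,
                     std::size_t width, std::size_t height,
                     unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    dst.resize(src.size());  // size dst before any thread writes to it
    const std::size_t band = (height + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * band;
        const std::size_t end = std::min(height, begin + band);
        if (begin >= end) break;  // fewer bands than threads
        workers.emplace_back(invert_rows, std::cref(src), std::ref(dst),
                             width, begin, end);
    }
    for (auto& w : workers) w.join();
}
```

Because the bands never overlap, the threads share no mutable state. With enough rows per band, total throughput grows roughly with the number of cores.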
KleidiCV's new multithreading support, combined with the impressive performance uplifts it already provided, makes enabling it in OpenCV essentially a no-brainer. We have therefore worked with the good folks at OpenCV to enable KleidiCV by default on Android in OpenCV 4.11. For those building OpenCV 4.10 from source, KleidiCV could already be enabled by setting a single configuration flag. That option remains in OpenCV 4.11 and is still required to enable KleidiCV on Linux, but on Android it is now switched on by default. This brings the performance benefits of KleidiCV 0.3.0 to every Android application that uses OpenCV as soon as it upgrades to OpenCV 4.11. The easiest way to integrate OpenCV with KleidiCV into your Android application is through the OpenCV Maven package.
KleidiCV now accelerates many more OpenCV functions, including cv::exp(), cv::pyrDown(), cv::buildOpticalFlowPyramid() and more. Support for already-accelerated functions has also been expanded, for example to float32 images in cv::resize() and to more kernel sizes in cv::GaussianBlur(). See the changelog for a complete list.
With the addition of multithreading support, we see even more stratospheric speedups than we previously reported. The Sobel benchmarks show a speedup of over 400%, that is, more than 5 times the performance!
The measurements compare OpenCV 4.11 with KleidiCV disabled and enabled, running on two Cortex-A710 cores of a Samsung Galaxy S22 (SM-S901B). Unless otherwise specified, the benchmarks operate on 1080p images. If you would like more detail on how we ran our benchmarks, the scripts we used are available in the KleidiCV repository.
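It is easy to confuse a percentage speedup with a "times faster" figure. This small sketch shows how the two relate. The timings in the usage below are made-up numbers chosen for easy arithmetic, not measured results, and speedup_ratio and speedup_percent are hypothetical helper names.

```cpp
#include <stdexcept>

// Speedup ratio: how many times faster the optimized run is than the baseline.
double speedup_ratio(double baseline_ms, double optimized_ms) {
    if (baseline_ms <= 0.0 || optimized_ms <= 0.0) {
        throw std::invalid_argument("timings must be positive");
    }
    return baseline_ms / optimized_ms;
}

// The same speedup expressed as a percentage improvement over the baseline:
// a ratio of 5.0 is a 400% improvement, because (5.0 - 1.0) * 100 = 400.
double speedup_percent(double baseline_ms, double optimized_ms) {
    return (speedup_ratio(baseline_ms, optimized_ms) - 1.0) * 100.0;
}
```

For example, a hypothetical operation that takes 10 ms without KleidiCV and 2 ms with it has a ratio of 5.0, which is a 400% speedup.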
KleidiCV accelerates a small subset of OpenCV functionality. In some cases this will match what your application uses; in others it will not. To make the most of KleidiCV, you may be able to make some small changes to your application so that it matches the functionality KleidiCV provides. The authoritative list of requirements is in KleidiCV's OpenCV documentation, but here are some examples:
Operations like Gaussian blur read many pixels of the source image for each pixel of the destination image. If the src and dst images are the same, extra effort is needed to avoid reading pixels that have already been overwritten. To let KleidiCV take the most efficient approach, it therefore only supports separate src and dst images. If src and dst are the same, OpenCV's usual Gaussian blur is used instead, and you will not see a performance boost.
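The in-place problem is easy to see in one dimension. In OpenCV terms, the in-place case means passing the same cv::Mat as both src and dst. The sketch below uses a hypothetical 3-tap box blur as a simple stand-in for a Gaussian kernel; the function names are made up for illustration. When the loop writes into the buffer it is reading from, each step sees a neighbour that has already been blurred, so the result is different.

```cpp
#include <cstddef>
#include <vector>

// 3-tap box blur over a 1D row, replicating the edge pixels at the ends.
// Reads only from src and writes only to dst, which must not alias.
void box3_separate(const std::vector<int>& src, std::vector<int>& dst) {
    const std::size_t n = src.size();
    dst.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        const int left = src[i == 0 ? 0 : i - 1];
        const int right = src[i + 1 < n ? i + 1 : n - 1];
        dst[i] = (left + src[i] + right) / 3;
    }
}

// The same loop run naively in place: when element i is computed,
// element i - 1 already holds its blurred value, so the output is wrong.
void box3_naive_in_place(std::vector<int>& data) {
    const std::size_t n = data.size();
    for (std::size_t i = 0; i < n; ++i) {
        const int left = data[i == 0 ? 0 : i - 1];
        const int right = data[i + 1 < n ? i + 1 : n - 1];
        data[i] = (left + data[i] + right) / 3;
    }
}
```

For the input {0, 0, 9, 0, 0}, the separate-buffer version produces {0, 3, 3, 3, 0}, but the naive in-place version produces {0, 3, 4, 1, 0}. A correct in-place implementation needs extra work, such as buffering the original rows, and that is the overhead KleidiCV avoids by requiring separate images.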
OpenCV lets you specify sigmaX and sigmaY as 0. If you do, KleidiCV uses a sigma value computed from the kernel size, which allows it to run significantly faster than with non-zero sigmas. If sigmaX or sigmaY must be non-zero but a bit-exact blur is not required, the hint parameter (new in OpenCV 4.11) can be set to ALGO_HINT_APPROX so that KleidiCV can still provide a major speedup. The kernel size must also be one of the sizes KleidiCV accelerates; in KleidiCV 0.3.0 these are 3x3, 5x5, 7x7 and 15x15.
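The Gaussian blur guidelines above can be condensed into a quick checklist. The sketch below encodes them as a hypothetical helper that is not part of OpenCV or KleidiCV. It returns true only for the configurations this post says get the large speedups. Other configurations may fall back to OpenCV's own code or see smaller gains, and KleidiCV's OpenCV documentation remains the authoritative source. The sketch also includes the formula that OpenCV documents for cv::getGaussianKernel to derive sigma from the kernel size when sigma is non-positive.

```cpp
#include <algorithm>
#include <array>

// Formula documented for cv::getGaussianKernel: when sigma is non-positive,
// it is computed from the kernel size.
double default_gaussian_sigma(int ksize) {
    return 0.3 * ((ksize - 1) * 0.5 - 1.0) + 0.8;
}

// Hypothetical helper encoding this post's guidelines for cv::GaussianBlur
// with KleidiCV 0.3.0. Not an OpenCV or KleidiCV API.
bool gaussian_blur_likely_accelerated(int ksize_width, int ksize_height,
                                      double sigma_x, double sigma_y,
                                      bool approx_hint, bool in_place) {
    static constexpr std::array<int, 4> kSupportedSizes = {3, 5, 7, 15};
    if (in_place) return false;                     // src and dst must differ
    if (ksize_width != ksize_height) return false;  // only the listed NxN sizes
    const bool size_ok =
        std::find(kSupportedSizes.begin(), kSupportedSizes.end(),
                  ksize_width) != kSupportedSizes.end();
    if (!size_ok) return false;
    // Zero sigmas use the size-derived sigma (fastest path); non-zero sigmas
    // need the approximate hint (ALGO_HINT_APPROX in OpenCV 4.11).
    const bool zero_sigmas = (sigma_x == 0.0 && sigma_y == 0.0);
    return zero_sigmas || approx_hint;
}
```

For example, a 5x5 blur with both sigmas set to 0 into a separate output image passes the checklist. The same blur with sigmas of 1.5 passes only when the approximate hint is requested. For reference, the documented formula gives a sigma of 0.8 for a 3x3 kernel and 2.6 for a 15x15 kernel.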
Some OpenCV functions take a border mode as an argument, for example reflect or wrap. Each KleidiCV function supports a subset of these, which may or may not include OpenCV's default border mode. For example, KleidiCV's Sobel function is extremely fast, but using it from OpenCV requires explicitly specifying the replicate border mode (cv::BORDER_REPLICATE).
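Border modes only affect pixels near the image edges, but they do change the output there. That is why KleidiCV cannot silently use a different mode from the one you requested. OpenCV's default, BORDER_DEFAULT, is BORDER_REFLECT_101, so a Sobel call that relies on the default will not match the replicate mode that KleidiCV's Sobel needs. In OpenCV, the mode is passed as the final borderType argument of cv::Sobel. The sketch below is a one-dimensional illustration. border_index and derivative_x are hypothetical names (OpenCV's own helper for this is cv::borderInterpolate), and the index mappings follow the patterns in OpenCV's BorderTypes documentation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Border modes named after OpenCV's cv::BORDER_REPLICATE, cv::BORDER_REFLECT,
// cv::BORDER_REFLECT_101 and cv::BORDER_WRAP.
enum class Border { Replicate, Reflect, Reflect101, Wrap };

// Maps index p onto [0, len). Handles p at most one row length outside the
// row, which is enough for small kernels; Reflect101 assumes len >= 2.
int border_index(int p, int len, Border mode) {
    if (p >= 0 && p < len) return p;
    switch (mode) {
        case Border::Replicate:   // aaaaaa|abcdefgh|hhhhhhh
            return p < 0 ? 0 : len - 1;
        case Border::Reflect:     // fedcba|abcdefgh|hgfedcb
            return p < 0 ? -p - 1 : 2 * len - p - 1;
        case Border::Reflect101:  // gfedcb|abcdefgh|gfedcba
            return p < 0 ? -p : 2 * len - p - 2;
        case Border::Wrap:        // cdefgh|abcdefgh|abcdefg
            return p < 0 ? p + len : p - len;
    }
    throw std::logic_error("unknown border mode");
}

// Horizontal derivative with the [-1 0 1] kernel along one row, i.e. the
// 1D core of a Sobel x-derivative without the smoothing step.
std::vector<int> derivative_x(const std::vector<int>& row, Border mode) {
    const int n = static_cast<int>(row.size());
    std::vector<int> out(row.size());
    for (int x = 0; x < n; ++x) {
        const int right = row[static_cast<std::size_t>(border_index(x + 1, n, mode))];
        const int left = row[static_cast<std::size_t>(border_index(x - 1, n, mode))];
        out[static_cast<std::size_t>(x)] = right - left;
    }
    return out;
}
```

For the row {10, 20, 40, 80}, replicate borders give {10, 30, 60, 40}, but reflect-101 borders give {0, 30, 60, 0}. The interior values are identical and only the edges differ, yet that difference is enough to rule out a silent substitution.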
An in-depth example of building an Android application that takes advantage of KleidiCV's performance enhancements is provided as an Arm Learning Path.
Although KleidiCV is powerful, it is still small. In future releases we will continue expanding its capabilities to accelerate more of OpenCV's functionality and to remove some of the constraints listed above. Ultimately, though, our users matter most in deciding what is most important for KleidiCV's development, so we would love to hear your feedback! You can raise an issue on our GitLab repo.
KleidiCV 0.2.0 and 0.3.0 deliver major improvements in speed and functionality while maintaining KleidiCV's high quality bar. With KleidiCV now enabled by default on Android in OpenCV 4.11, applications can see speedups of up to 5X on some key functions as soon as they upgrade. To make the most of KleidiCV, you can make some small changes to your application, and we would love to hear your feedback.