Computer vision (CV) is a critical part of millions of AI workloads running everywhere from cloud to edge, and of any AI application that seeks to understand camera and video data. Arm KleidiCV is an open-source library of optimized, performance-critical routines for Arm CPUs. It is designed to be integrated into any CV framework to deliver the best performance for CV workloads on Arm, with no action needed from application developers.
Camera quality is one of the most important factors when consumers choose a mobile phone. But the quality of a camera isn't just about the lens or sensor.
Between photons hitting a camera sensor and an image appearing on a screen, the image data must go through many transformations. These typically include, but are not limited to:
Some of these stages are best done by dedicated hardware, but many are better performed in software. Implementing a stage in software gives it unmatched flexibility, allowing camera pipeline designers to iterate rapidly on their ideas and craft a photography experience that differentiates their product from competitors.
Camera pipelines must not only produce a high-quality image but also run quickly, both so that consumers can record 4K video at 60 fps and so that they can take high-quality still images in rapid succession. A camera that can take beautiful images is no use if the photo opportunity has disappeared by the time the camera has woken up.
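To put a number on "run quickly": assuming "4K" means UHD (3840×2160) and that every frame passes through the whole software pipeline on a single core, the time budget works out roughly as follows.

```latex
\begin{aligned}
t_{\text{frame}} &= \frac{1}{60\ \text{fps}} \approx 16.7\ \text{ms} \\
N_{\text{pixels}} &= 3840 \times 2160 = 8\,294\,400\ \text{pixels per frame} \\
\text{pixel rate} &= 8\,294\,400 \times 60 \approx 4.98 \times 10^{8}\ \text{pixels/s} \\
t_{\text{pixel}} &= \frac{t_{\text{frame}}}{N_{\text{pixels}}} \approx 2.0\ \text{ns}
\end{aligned}
```

Under those assumptions, all software stages combined get about 2 ns per pixel, which leaves very little room for slow per-pixel code.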
So it is desirable to run camera pipelines in software to create a stunning look, but that software needs to be fast.
At Arm, we have taken many of the operations that make up the nuts and bolts of image processing and highly optimized them to make the most of the performance and efficiency of existing and future Arm CPUs. How did we do it? Rather than plain C code, KleidiCV is written using ACLE (Arm C Language Extensions) intrinsics, which map directly onto powerful Arm SIMD (single instruction, multiple data) instructions. Each KleidiCV function has three implementations, targeting Neon, SVE2 (Scalable Vector Extension 2), and Streaming SVE with SME2 (Scalable Matrix Extension 2). KleidiCV automatically detects the hardware it is running on and selects the best implementation accordingly.
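The sketch below illustrates the two ideas in the paragraph above: an ACLE intrinsic (`vqaddq_u8`, a real Neon intrinsic that adds 16 bytes with saturation) replacing a scalar loop, and selecting an implementation based on CPU features. It is not KleidiCV's code or API; the function names (`add_sat_u8_scalar`, `add_sat_u8_neon`, `select_add_sat_u8`) are hypothetical, and the SVE2 check only marks where a real library would return an SVE2 kernel, since this sketch has none.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP2 (and HWCAP2_* on recent glibc)
#endif

// Portable fallback: saturating add of two 8-bit buffers, one pixel at a time.
static void add_sat_u8_scalar(const uint8_t* a, const uint8_t* b, uint8_t* dst, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    unsigned s = unsigned(a[i]) + unsigned(b[i]);
    dst[i] = static_cast<uint8_t>(s > 255u ? 255u : s);
  }
}

#if defined(__ARM_NEON)
// Neon path: one vqaddq_u8 processes 16 pixels with saturation.
static void add_sat_u8_neon(const uint8_t* a, const uint8_t* b, uint8_t* dst, size_t n) {
  size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    uint8x16_t va = vld1q_u8(a + i);
    uint8x16_t vb = vld1q_u8(b + i);
    vst1q_u8(dst + i, vqaddq_u8(va, vb));
  }
  add_sat_u8_scalar(a + i, b + i, dst + i, n - i);  // leftover tail (< 16 pixels)
}
#endif

using AddSatFn = void (*)(const uint8_t*, const uint8_t*, uint8_t*, size_t);

// Hypothetical dispatcher: query CPU features once and return the best kernel.
static AddSatFn select_add_sat_u8() {
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP2_SVE2)
  if (getauxval(AT_HWCAP2) & HWCAP2_SVE2) {
    // A real library would return its SVE2 kernel here. This sketch has none,
    // so it falls through to the Neon kernel.
  }
#endif
#if defined(__ARM_NEON)
  return add_sat_u8_neon;
#else
  return add_sat_u8_scalar;
#endif
}
```

On AArch64, Neon is in practice always available, so choosing it can be a compile-time decision; SVE2 and SME2 are optional features, which is where a runtime check such as `getauxval` on Linux becomes necessary.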
KleidiCV is a small but growing collection of simple yet fast low-level operations on images. These include:
KleidiCV can be used as a lightweight, standalone image processing library. Alternatively, it can be used seamlessly as part of the extremely popular OpenCV library: if you already use OpenCV, you can enable KleidiCV with very little effort to accelerate your image processing.
Using OpenCV's own benchmarks, we can see how much enabling KleidiCV accelerates OpenCV.
The code was built with Android NDK r26d. The benchmarks below show the performance uplift from KleidiCV on the Cortex®-X2 core of a Samsung Galaxy S22 phone at an image size of 1920×1080.
At present, KleidiCV has no built-in support for multithreading. (Image processing is an "embarrassingly parallel" problem, so in principle adding multithreading to KleidiCV is easy, but we are taking the time to get our API right so that developers have the control they need in a multitasking environment with heterogeneous CPUs.) Therefore, to keep comparisons meaningful, we benchmark with multithreading disabled in OpenCV. The benchmarks show single-core performance.
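Because rows of an image can usually be processed independently, a caller can parallelize a per-row operation on its own. The sketch below is one hypothetical way to do that with `std::thread`, splitting the image into horizontal bands; it is not a KleidiCV feature, and `invert_row` and `invert_parallel` are illustrative names for a simple 8-bit inversion.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-row kernel: invert an 8-bit grayscale row (255 - x).
static void invert_row(const uint8_t* src, uint8_t* dst, size_t width) {
  for (size_t x = 0; x < width; ++x) dst[x] = static_cast<uint8_t>(255 - src[x]);
}

// Split the image into horizontal bands and process each band on its own thread.
// Rows never overlap between bands, so no locking is needed beyond join().
static void invert_parallel(const uint8_t* src, size_t src_stride,
                            uint8_t* dst, size_t dst_stride,
                            size_t width, size_t height, unsigned num_threads) {
  num_threads = std::max(1u, num_threads);
  const size_t band = (height + num_threads - 1) / num_threads;  // rows per band, rounded up
  std::vector<std::thread> workers;
  for (size_t y0 = 0; y0 < height; y0 += band) {
    const size_t y1 = std::min(height, y0 + band);
    workers.emplace_back([=] {
      for (size_t y = y0; y < y1; ++y)
        invert_row(src + y * src_stride, dst + y * dst_stride, width);
    });
  }
  for (auto& t : workers) t.join();
}
```

Equal-sized bands are the simplest split, but on heterogeneous CPUs a band scheduled on a slower core finishes last; smaller chunks pulled from a shared queue would balance better. Trade-offs like this are part of why the threading API needs care.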
The uplift varies a lot between operations. In some cases the improvement is slight, but in the best case KleidiCV runs in a fraction of the time taken by standard OpenCV. The mean uplift across the different operations is over 75%.
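The text does not define "uplift" precisely. A common definition, assumed in the sketch below, is baseline time divided by optimized time minus one, so 0.75 means "1.75 times as fast", averaged arithmetically over operations. The `Timing` struct and function names are hypothetical, and no benchmark data is reproduced here.

```cpp
#include <cstddef>
#include <vector>

// One benchmark result: run time without and with KleidiCV (any consistent unit).
struct Timing {
  double baseline;   // standard OpenCV
  double optimized;  // OpenCV with KleidiCV enabled
};

// Assumed definition: uplift = baseline / optimized - 1 (0.75 means 1.75x as fast).
static double uplift(const Timing& t) { return t.baseline / t.optimized - 1.0; }

// Arithmetic mean of the per-operation uplifts; 0 for an empty set.
static double mean_uplift(const std::vector<Timing>& results) {
  if (results.empty()) return 0.0;
  double sum = 0.0;
  for (const Timing& t : results) sum += uplift(t);
  return sum / static_cast<double>(results.size());
}
```

An arithmetic mean of uplifts is easy to read but is pulled upward by a single very large speedup; a geometric mean of speedup ratios is a common alternative when summarizing many operations.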
If you'd like to see more details of how we run our benchmarks, the scripts we used are available in the KleidiCV repository.
In a Java project, the easiest way to get OpenCV with KleidiCV enabled is to use the OpenCV 4.10 Maven package, which will be available at https://central.sonatype.com/artifact/org.opencv/opencv.
Alternatively, you can enable KleidiCV 0.1 when building OpenCV 4.10 with CMake by adding the argument -DWITH_KLEIDICV=ON:
cmake -S /path/to/opencv-4.10 -B build-opencv-with-kleidicv -DWITH_KLEIDICV=ON
cmake --build build-opencv-with-kleidicv --parallel
You can also build KleidiCV as a standalone library. See the build documentation for details.
At Arm we take security extremely seriously. Our Security Development Lifecycle is embedded into every step of how we work.
Where possible, KleidiCV functions will validate their parameters, for example returning an error should an argument fall outside its valid range.
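As an illustration of what validating parameters before touching memory can look like, here is a hypothetical 8-bit threshold function that returns a status code. The `Status` enum, `threshold_u8`, and its checks are assumptions made for this sketch and are not KleidiCV's actual API or error codes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical status codes for this sketch.
enum class Status { Ok, NullPointer, Range };

// Hypothetical threshold: dst = (src > thresh) ? max_value : 0, on 8-bit images.
// Every argument is checked before any pixel is read or written.
static Status threshold_u8(const uint8_t* src, size_t src_stride,
                           uint8_t* dst, size_t dst_stride,
                           size_t width, size_t height,
                           int thresh, int max_value) {
  if (src == nullptr || dst == nullptr) return Status::NullPointer;
  // Each row must fit within its stride, or rows would overlap.
  if (src_stride < width || dst_stride < width) return Status::Range;
  // Threshold and output value must be representable as 8-bit pixels.
  if (thresh < 0 || thresh > 255 || max_value < 0 || max_value > 255) return Status::Range;
  for (size_t y = 0; y < height; ++y) {
    const uint8_t* s = src + y * src_stride;
    uint8_t* d = dst + y * dst_stride;
    for (size_t x = 0; x < width; ++x)
      d[x] = static_cast<uint8_t>(s[x] > thresh ? max_value : 0);
  }
  return Status::Ok;
}
```

Returning an error instead of asserting lets the caller decide how to recover, and it keeps out-of-range arguments from turning into out-of-bounds memory accesses.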
The project has extensive automated tests. The core library code has 100% line coverage and branch coverage of well over 99%.
KleidiCV is available now as source code under the Apache License, Version 2.0.
The KleidiCV library is still small, but watch this space as we add more functionality. Your feedback is welcome. You can raise an issue on our GitLab repo:
KleidiCV GitLab repo
You can also check out Arm KleidiAI, a library offering a similar set of optimized routines for integration into any AI framework, enabling AI acceleration on Arm CPUs.