Updates in KleidiCV: Multithreading support and OpenCV 4.11 integration

Mark Horvath
February 25, 2025
This blog post is co-authored by Michael Platings, Staff Software Engineer at Arm.

Since KleidiCV's initial release in May 2024, the project has made significant strides with the release of versions 0.2.0 in September 2024 and 0.3.0 in December 2024. These updates bring many new features and performance enhancements.

As the ever-increasing number of cameras on mobile devices attests, camera pipelines and image processing are among the most popular computing tasks performed today. OpenCV is at the heart of many of these pipelines, running on hundreds of millions of devices. We are therefore thrilled to announce that KleidiCV is now enabled by default in OpenCV 4.11 on Android.

KleidiCV 0.2.0 and 0.3.0: What's new?

Multithreading support

One of the most notable additions in KleidiCV 0.2.0 is multithreading support. KleidiCV 0.1.0 was released with only single-threaded functions, leaving application developers to set up multithreading at a higher level. KleidiCV 0.2.0 now integrates into OpenCV's existing multithreading framework so that each operation is multithreaded with no additional effort from application developers. Image processing is an "embarrassingly parallel" problem, so many functions in KleidiCV scale almost linearly with the number of CPUs. For example, on a Linux server with 16 CPUs available, you can expect many KleidiCV functions to run almost 16 times faster with multithreading enabled.
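To make the "embarrassingly parallel" point concrete, here is a minimal sketch in plain C++ that splits an image into horizontal stripes and processes each stripe on its own thread. It is not KleidiCV's or OpenCV's implementation; the function names and the invert operation are hypothetical, chosen only because each output pixel depends on a single input pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <system_error>
#include <thread>
#include <vector>

// Hypothetical per-pixel operation: invert the rows [row_begin, row_end) of an
// 8-bit grayscale image stored row-major in a flat buffer.
void invert_rows(std::vector<std::uint8_t>& image, std::size_t width,
                 std::size_t row_begin, std::size_t row_end) {
    for (std::size_t i = row_begin * width; i < row_end * width; ++i) {
        image[i] = static_cast<std::uint8_t>(255 - image[i]);
    }
}

// Split the image into roughly equal stripes and process each one on its own
// thread. Stripes never overlap, so the workers need no locking.
void invert_parallel(std::vector<std::uint8_t>& image, std::size_t width,
                     std::size_t height, unsigned num_threads) {
    assert(num_threads > 0);
    assert(image.size() == width * height);
    const std::size_t rows_per_stripe = (height + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < height; begin += rows_per_stripe) {
        const std::size_t end = std::min(begin + rows_per_stripe, height);
        try {
            workers.emplace_back([&image, width, begin, end] {
                invert_rows(image, width, begin, end);
            });
        } catch (const std::system_error&) {
            // Thread creation can fail; process this stripe on the caller instead.
            invert_rows(image, width, begin, end);
        }
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
}
```

Because the stripes share no data, the work divides cleanly across cores, which is why near-linear scaling is plausible for this class of operation. With OpenCV you do not write this yourself: the library's own framework does the splitting, and the thread count can be adjusted with cv::setNumThreads().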

Enhanced integration with OpenCV 4.11

KleidiCV's new multithreading support, combined with the impressive performance uplifts it already provided, makes enabling it in OpenCV essentially a no-brainer. Therefore, we are happy to announce that we have worked with the good folks of OpenCV to enable KleidiCV by default on Android in OpenCV 4.11. For those building OpenCV 4.10 from source, it was already easy to enable KleidiCV by setting a single configuration flag. This option remains in OpenCV 4.11, and on Linux it is still required to enable KleidiCV. On Android, however, that flag is now on by default. This brings the performance benefits of KleidiCV 0.3.0 to all Android applications using OpenCV as soon as they upgrade to OpenCV 4.11. The easiest way to integrate OpenCV with KleidiCV into your Android application is through the OpenCV Maven package.

Expanded feature set

Many more OpenCV functions are accelerated, including cv::exp(), cv::pyrDown(), cv::buildOpticalFlowPyramid() and more. Further, feature support for already-supported functions is expanded, for example float32 images in cv::resize and more kernel sizes in cv::GaussianBlur. See the changelog for a complete list.

Performance uplift

With the addition of multithreading support, we see even larger speedups than we previously reported. The benchmarks for Sobel show a speedup of over 400%, which means it runs more than 5 times as fast!

Graph: KleidiCV benchmarking

The measurements compare OpenCV 4.11 with KleidiCV disabled and enabled, running on two Cortex-A710 cores of a Samsung Galaxy S22 SM-S901B. Unless otherwise specified, the benchmarks are operating on 1080p images. If you'd like to see more details of how we run our benchmarks, the scripts we used are available in the KleidiCV repository.

Making the most of KleidiCV in your OpenCV project

KleidiCV accelerates a small subset of OpenCV functionality. In some cases this will match what your application uses; in others it will not. To make the most of KleidiCV, you may be able to make some small changes in your application to match the functionality that KleidiCV provides. The authoritative list of requirements is in KleidiCV's OpenCV documentation, but here are some examples:

Use separate src and dst images

Operations like Gaussian blur read many pixels in the source image for each pixel in the destination image. If the src and dst images are the same, then extra effort must be made to avoid reading pixels that have already been changed. Therefore, to allow KleidiCV to take the most efficient approach, it only supports separate src and dst images. If src and dst are the same, then OpenCV's usual Gaussian blur will be used instead, and you will not see a performance boost.
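The hazard described above can be seen in a toy example. The sketch below is plain C++, not KleidiCV or OpenCV code: a hypothetical 3-tap box blur on a 1-D signal, written so that the same buffer can be passed as both input and output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 3-tap box blur on a 1-D signal, with edge samples replicated.
// `src` and `dst` are allowed to be the same vector, which is exactly the
// hazard: the left neighbour has then already been overwritten.
void box_blur3(const std::vector<int>& src, std::vector<int>& dst) {
    assert(!src.empty() && src.size() == dst.size());
    const std::size_t n = src.size();
    for (std::size_t i = 0; i < n; ++i) {
        const int left = src[i == 0 ? 0 : i - 1];
        const int right = src[i + 1 == n ? i : i + 1];
        dst[i] = (left + src[i] + right) / 3;
    }
}
```

For the input {0, 30, 0, 0, 0}, a separate destination gives {10, 10, 10, 0, 0}, while blurring in place gives {10, 13, 4, 1, 0} because each step reads a neighbour that was just modified. A correct in-place blur needs a temporary copy or row buffering, which is the extra effort a separate destination avoids.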

Use optimal Gaussian blur parameters

OpenCV supports specifying sigmaX and sigmaY as 0. In that case KleidiCV uses a sigma value computed from the kernel size, which allows it to run significantly faster than with non-zero sigmas. If sigmaX or sigmaY must be non-zero but a bit-exact blur is not required, then the hint parameter (new in OpenCV 4.11) can be set to ALGO_HINT_APPROX so that KleidiCV can still provide a major speedup. The kernel size must match one of the kernel sizes that KleidiCV accelerates - in KleidiCV 0.3.0 these are 3x3, 5x5, 7x7 and 15x15.
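As a rough aid, the conditions above can be written down as a small checklist. The sketch below is plain C++ with hypothetical names (BlurRequest, matches_article_checklist); it only encodes what this post states and is not KleidiCV's actual dispatch logic. The sigma formula is the one OpenCV documents for cv::getGaussianKernel when sigma is non-positive; that this is the value KleidiCV uses is an assumption.

```cpp
#include <cassert>

// Square kernel sizes this post lists as accelerated in KleidiCV 0.3.0.
bool is_accelerated_kernel_size(int ksize) {
    return ksize == 3 || ksize == 5 || ksize == 7 || ksize == 15;
}

// Sigma that OpenCV derives from the kernel size when sigma is passed as 0,
// per the cv::getGaussianKernel documentation.
double default_sigma(int ksize) {
    assert(ksize > 0 && ksize % 2 == 1);
    return 0.3 * ((ksize - 1) * 0.5 - 1.0) + 0.8;
}

// Hypothetical description of a Gaussian blur call, for illustration only.
struct BlurRequest {
    int ksize;          // square kernel: ksize x ksize
    double sigma_x;
    double sigma_y;
    bool approx_hint;   // true if ALGO_HINT_APPROX is passed
    bool separate_dst;  // true if dst is a different image from src
};

// True when the request satisfies the conditions described in this post.
bool matches_article_checklist(const BlurRequest& r) {
    const bool sigma_ok =
        (r.sigma_x == 0.0 && r.sigma_y == 0.0) || r.approx_hint;
    return r.separate_dst && is_accelerated_kernel_size(r.ksize) && sigma_ok;
}
```

In OpenCV 4.11 terms, a call shaped like cv::GaussianBlur(src, dst, cv::Size(5, 5), 0, 0, cv::BORDER_DEFAULT, cv::ALGO_HINT_DEFAULT) with distinct src and dst would pass this checklist, while a 9x9 kernel would not. Check KleidiCV's OpenCV documentation for the authoritative conditions, including supported border types.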

Use supported border types

Some OpenCV functions take a border mode as an argument, for example reflect or wrap. Each KleidiCV function supports a subset of these, which may or may not match the OpenCV default border mode. For example, KleidiCV's Sobel function is extremely fast, but using it from OpenCV requires explicitly specifying a border mode of replicate.
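To show what the choice of border mode changes, here is a minimal sketch in plain C++ of how two modes map an out-of-range pixel index back into the image. The helper names are hypothetical and this is not OpenCV's or KleidiCV's code; the patterns in the comments follow OpenCV's BorderTypes documentation, where BORDER_DEFAULT is BORDER_REFLECT_101.

```cpp
#include <cassert>

// BORDER_REPLICATE: aaaaaa|abcdefgh|hhhhhhh
// Out-of-range indices clamp to the nearest edge pixel.
int replicate_index(int i, int n) {
    assert(n > 0);
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

// BORDER_REFLECT_101 (OpenCV's BORDER_DEFAULT): gfedcb|abcdefgh|gfedcba
// Mirrors around the edge pixel without repeating it. This sketch handles a
// single reflection only, i.e. -n < i < 2n - 1.
int reflect101_index(int i, int n) {
    assert(n > 1 && i > -n && i < 2 * n - 1);
    if (i < 0) return -i;
    if (i >= n) return 2 * (n - 1) - i;
    return i;
}
```

The two modes read different pixels at the image edge, so their results differ near the borders and one cannot silently stand in for the other. In OpenCV the mode is the last argument of cv::Sobel, so requesting replicate means passing cv::BORDER_REPLICATE there instead of relying on the default.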

An in-depth example of building an Android application that takes advantage of KleidiCV's performance enhancements is provided as an Arm Learning Path.

Looking forward

Although KleidiCV is powerful, it's still small. In future releases we will continue expanding its capabilities to more comprehensively accelerate OpenCV's functionality, and remove some of the constraints listed above. Ultimately, though, it's our users who matter most in deciding what's most important for KleidiCV's development, so we would love to hear your feedback! You can raise an issue on our GitLab repo.

Summary

KleidiCV 0.2.0 and 0.3.0 deliver major improvements in speed and functionality while maintaining a high quality bar. With KleidiCV now enabled by default on Android in OpenCV 4.11, applications can see speedups of around 5x on some key functions as soon as they upgrade. To make the most of KleidiCV you can make some small changes to your application, and we would love to hear your feedback.
