Heterogeneous Multiprocessing Gets a Boost with the New OpenCL for NEON Driver


OpenCL - First Mali and Now NEON

I am currently in Santa Clara for ARM TechCon where the latest technologies from ARM and its partners will be on show from tomorrow. There will be a number of exciting announcements from ARM this week, but the one that I have been most involved in is the launch today of a new product that supports OpenCL™ on CPUs with ARM® NEON™ technology and also on the already supported ARM Mali™ Midgard GPUs. NEON is a 128-bit SIMD (Single Instruction, Multiple Data) architecture extension included in all the latest ARM Cortex®-A class processors, so along with Mali GPUs it’s already widely available in current generation devices and an extremely suitable candidate to benefit from the advantages of OpenCL.

What is OpenCL Anyway?

It’s worth starting with a brief explanation of why support for the OpenCL compute API is important. There are a number of industry trends that create challenges for the software developer. For example, heterogeneous multiprocessing is great for performance and efficiency, but the diversity of instruction sets can often lead to a lack of portability. Another example is that parallel computing gets a task done more quickly, but programming parallel systems is notoriously difficult. This is where OpenCL comes in. It is both a programming language (OpenCL C) that enables easier, more portable and more efficient programming across heterogeneous platforms, and an API that coordinates parallel computation on those heterogeneous processors. Through that API an application can spread tasks across all the available processors in a system; OpenCL even simplifies the programming of multi-core NEON by treating all the cores as a single OpenCL device. This is all about efficiently matching the ‘right task to the right processor’.

Figure 1: OpenCL is especially suited to parallel processing of large data sets
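
To make the idea of running one small function per data element concrete, here is a minimal sketch in plain C rather than real OpenCL C, so it builds with any C compiler. The names `saxpy_work_item` and `emulate_ndrange` are hypothetical and are not part of any ARM driver or of the OpenCL API. In genuine OpenCL C the kernel would be marked `__kernel` and would read its index with `get_global_id(0)`, and the host would enqueue it through the API instead of looping over it.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for an OpenCL C kernel body (hypothetical name).
 * In real OpenCL C this would be a __kernel function that obtains its
 * index from get_global_id(0) instead of taking it as a parameter. */
void saxpy_work_item(size_t gid, float a,
                     const float *x, const float *y, float *out)
{
    out[gid] = a * x[gid] + y[gid];
}

/* Hypothetical helper that emulates a 1-D launch: the global range is cut
 * into work-groups, and every work-item in every group runs the same
 * kernel body. A real runtime is free to run the groups in parallel, for
 * example one per CPU core; this sketch runs them one after another. */
void emulate_ndrange(size_t global_size, size_t local_size, float a,
                     const float *x, const float *y, float *out)
{
    /* Assumption for simplicity: the global size is a whole number of groups. */
    assert(local_size > 0 && global_size % local_size == 0);

    for (size_t group = 0; group < global_size / local_size; ++group) {
        for (size_t lid = 0; lid < local_size; ++lid) {
            saxpy_work_item(group * local_size + lid, a, x, y, out);
        }
    }
}
```

Calling `emulate_ndrange(8, 4, 2.0f, x, y, out)` on 8-element arrays runs two work-groups of four work-items each. Because no work-item depends on another, a runtime could hand the groups to different cores, which is roughly what treating multi-core NEON as a single device implies.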

Where Can I Use OpenCL?

OpenCL can be used wherever an algorithm lends itself to parallelisation and processes a large data set. Such algorithms and use cases can be found in many types of device and include:


  1. The stabilization, editing, correction and enhancement of images; stitching panoramic images
  2. Face, smile and landmark recognition (for tagging with metadata)
  3. Computer vision, augmented reality

Digital TV

  1. Upscaling, downscaling; conversion from 2D to Stereo 3D
  2. Support for emerging codec standards (e.g. HEVC)
  3. Pre- and post-processing (stabilizing, transcoding, colour-conversion)
  4. User interfaces: multi-viewer gesture-based UI and speech control


Automotive

  1. Advanced Driver Assistance Systems (ADAS)
  2. Lane departure and collision warnings; road sign and pedestrian detection
  3. Dashboard, infotainment, advanced navigation and dynamic cruise control

A Tale of Two Profiles

OpenCL supports two ‘profiles’:

  1. A ‘Full Profile’, which provides the full set of OpenCL features
  2. An ‘Embedded Profile’, which is a strict subset of the Full Profile and is provided for compatibility with legacy systems

The OpenCL for NEON driver and the OpenCL for Mali Midgard GPU driver both support Full Profile. The heritage of OpenCL from desktop systems means that most existing OpenCL software algorithms have been developed for Full Profile. This makes ARM’s Full Profile support very attractive to programmers, who can develop on the desktop with mature tools, work more productively and get products to market faster. Another key benefit is that floating-point calculations in OpenCL Full Profile are compliant with the IEEE-754 standard, guaranteeing the precision of results.

OpenCL for NEON and Mali - Better Together

The OpenCL for NEON and Mali Midgard GPU drivers are designed to operate together within the same OpenCL context. This close coupling enables them to operate with maximum efficiency. For example, memory coherency and inter-queue dependencies are resolved automatically within the drivers. We refer to this version of OpenCL for NEON as the ‘plug-in’ because it ‘plugs into’ the Mali Midgard GPU OpenCL driver.


Figure 2: The benefits of keeping the CPU and GPU in one OpenCL context

And Not Forgetting the Utgard GPUs - Mali-400 MP & Mali-450 MP

There is also a ‘standalone’ version of OpenCL for NEON that is available to use alongside Mali Utgard GPUs, such as the Mali-400 MP and Mali-450 MP. These GPUs focus on supporting graphics APIs very efficiently and do not support compute APIs such as OpenCL. Adding OpenCL support on the CPU with NEON is therefore an excellent way to add compute capability to the system. The ‘standalone’ version is also suitable for use when there is no GPU in the system.

Reaching Out

In addition, as Figure 3 shows, the ARM OpenCL framework can be connected to other OpenCL frameworks in order to extend OpenCL beyond NEON and Mali GPUs to proprietary hardware devices, for example those built with FPGA fabric. This is achieved by using the Khronos Installable Client Driver (ICD) mechanism, which the ARM OpenCL framework supports.


Figure 3: Using the Khronos ICD to connect the ARM OpenCL context with other devices

In Summary

We've seen that OpenCL for NEON will enhance compute processing on any platform that uses a Cortex-A class processor with NEON. This is true whether the platform includes a Mali Midgard GPU, an Utgard GPU, or no graphics processor at all. However, the coupling of NEON with a Midgard GPU delivers the greatest efficiencies.

As algorithms for mobile use cases become more complex, technologies such as OpenCL for NEON are increasingly important for their successful execution. The OpenCL for NEON product is available for licensing immediately; if you would like further information please contact your local ARM sales representative.

Further Reading

For more information on OpenCL, Compute and current use cases that are being developed by the ARM Ecosystem:

Realizing the Benefits of GPU Compute for Real Applications with Mali GPUs

Interested in GPU Compute? You have choices!

GPU Compute, OpenCL and RenderScript Tutorials on the Mali Developer Center

The Mali Ecosystem demonstrate GPU Compute solutions at the 2014 Multimedia Seminars

ARM is an official Khronos Adopter and an active contributor to OpenCL as a Working Group Member

Graphics & Multimedia blog