Introducing PyOpenCL

March 27, 2014


I will start my blog series with a subseries on the technology that I know best and have come to love: OpenCL from the Khronos Group. OpenCL aims to provide functional portability across computer systems composed of general-purpose processors (e.g. CPUs) and special-purpose accelerators (e.g. GPUs), which I will now and then refer to as accelerated systems.

OpenCL is often criticised for presenting a steep learning curve for software engineers but I don't think that's fair: the verbosity of the OpenCL API is really a consequence of its generality. Once you have written a couple of OpenCL programs, you realise that most of the code running on the host processor is actually boilerplate.

I will sidestep this verbosity of OpenCL altogether by using PyOpenCL - a neat Python module written by Andreas Klöckner. (If you are reading this Andreas, keep up the good work!)

Just install PyOpenCL and NumPy - and you are ready to roll!

In [1]:

import pyopencl as cl

import numpy as np

Trivial example

Suppose we want to create an array containing integers from 0 to 15.

In [2]:

N = 16

That's trivial using NumPy:

In [3]:

np_range = np.arange(N, dtype=np.int32)

np_range

Out[3]:

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=int32)

but our PyOpenCL example will fill in a similar array using OpenCL:

In [4]:

cl_range = np.zeros(N, dtype=np.int32)

cl_range

Out[4]:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

Gimme some context!

Creating a context could hardly be easier:

In [5]:

context = cl.create_some_context()

Ditto creating a command queue:

In [6]:

queue = cl.CommandQueue(context)
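When several OpenCL platforms or devices are installed, `cl.create_some_context()` may ask you to choose one, so it helps to know what is available. The following is a minimal sketch rather than part of the original notebook: the helper name `list_opencl_devices` is my own, and the function deliberately returns an empty list when PyOpenCL or an OpenCL platform is missing.

```python
try:
    import pyopencl as cl
except ImportError:  # PyOpenCL is not installed on this machine
    cl = None


def list_opencl_devices():
    """Return (platform name, device name) pairs, or [] if none can be queried."""
    if cl is None:
        return []
    try:
        return [(platform.name, device.name)
                for platform in cl.get_platforms()
                for device in platform.get_devices()]
    except cl.Error:
        # Raised, for example, when no OpenCL platform is installed.
        return []


for platform_name, device_name in list_opencl_devices():
    print(platform_name, "/", device_name)
```

If you would rather not be prompted, a context can also be built explicitly from one of the listed devices with `cl.Context(devices=[device])`.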

Building and running

An OpenCL C program equivalent to np.arange(N) follows:

In [7]:

source = '''
kernel void arange(global int * buffer)
{
    const size_t gid = get_global_id(0);
    buffer[gid] = convert_int(gid);
}
'''

The kernel will be launched as N work-items over a one-dimensional range [0, N-1]. Each work-item will get its unique index gid in the range (that is, an integer between 0 and N-1 inclusive) and write it into argument buffer at offset gid.
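To make the execution model concrete, here is a host-side emulation of that launch in plain Python. It is only a sketch of the semantics, not of how a device actually schedules work: the function `arange_work_item` is a hypothetical stand-in for the kernel body, and the loop visits the indices in reverse to stress that the result does not depend on the order in which work-items run.

```python
N = 16


def arange_work_item(gid, buffer):
    """Stand-in for one work-item: write its own index at offset gid."""
    buffer[gid] = gid


buffer = [0] * N

# A device is free to run work-items in any order, or in parallel.
# Each one touches only buffer[gid], so the order does not matter.
for gid in reversed(range(N)):
    arange_work_item(gid, buffer)

print(buffer)
# prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

Because no two work-items write to the same element, the kernel needs no synchronisation.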

Let's build the program:

In [8]:

program = cl.Program(context, source).build()

allocate a memory buffer:

In [9]:

memory_flags = cl.mem_flags.WRITE_ONLY | cl.mem_flags.ALLOC_HOST_PTR

memory = cl.Buffer(context, flags=memory_flags, size=cl_range.nbytes)

launch the kernel:

In [10]:

event = program.arange(queue, [N], None, memory)

and copy the results from the buffer to cl_range:

In [11]:

cl.enqueue_copy(queue, cl_range, memory, wait_for=[event])

cl_range

Out[11]:

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=int32)

Finally, let's confirm that arrays np_range and cl_range match element-wise:

In [12]:

np.all(np_range == cl_range)

Out[12]:

True
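For reuse outside a notebook, the steps above can be gathered into a single function. This is a sketch under a few assumptions of mine: the name `cl_arange` is hypothetical, the first device found is used instead of calling `cl.create_some_context()`, the buffer is created with `WRITE_ONLY` alone, and the function returns `None` when no OpenCL device can be found.

```python
try:
    import numpy as np
    import pyopencl as cl
except ImportError:  # NumPy or PyOpenCL is not installed
    cl = None

SOURCE = """
kernel void arange(global int * buffer)
{
    const size_t gid = get_global_id(0);
    buffer[gid] = convert_int(gid);
}
"""


def first_device():
    """Return the first OpenCL device found, or None if there is none."""
    if cl is None:
        return None
    try:
        for platform in cl.get_platforms():
            devices = platform.get_devices()
            if devices:
                return devices[0]
    except cl.Error:
        pass  # e.g. no OpenCL platform installed
    return None


def cl_arange(n):
    """Compute np.arange(n, dtype=np.int32) with OpenCL; None if unavailable."""
    device = first_device()
    if device is None:
        return None
    context = cl.Context(devices=[device])
    queue = cl.CommandQueue(context)
    program = cl.Program(context, SOURCE).build()

    result = np.zeros(n, dtype=np.int32)
    memory = cl.Buffer(context, cl.mem_flags.WRITE_ONLY, size=result.nbytes)

    # Launch n work-items, then copy back once the kernel has finished.
    event = program.arange(queue, (n,), None, memory)
    cl.enqueue_copy(queue, result, memory, wait_for=[event])
    return result


print(cl_arange(16))
```

On a machine without an OpenCL device this prints `None`; otherwise it should print the same sixteen integers as the notebook cells above.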

Concluding remarks

I am very interested in getting feedback. I am aiming primarily at software engineers already familiar with OpenCL so I can jump straight to more advanced topics. But if you feel you would like a little more explanation, just ask me: I will be happy to explain, point you to some excellent learning resources out there, or indeed consider filling in details in my later posts.

Also, I prepared this post using the wonderful IPython Notebook environment ("Introducing PyOpenCL" as IPython notebook). I haven't seen much in the way of using PyOpenCL and IPython together, so I would be very grateful for any links.

In my next post, I will start discussing how to optimise OpenCL kernels for the ARM Mali-T600 and Mali-T700 GPU series.

Chris Varnsverry over 11 years ago

FYI there are also Mali based odroid boards available
Tom Gall over 11 years ago

The ARM based Samsung Chromebook with a Mali T604 is a good choice. I suspect the upcoming Chromebook 2 will also be a great choice but given the hardware isn't available yet it's too early. The Arndale board is also a good choice.
I like the odroid boards, but the odroid-xu, for instance, doesn't include OpenCL support for Linux unless you sign an NDA with Imagination.
Tarik Saidani over 11 years ago

Is there any affordable, OpenCL-enabled ARM Mali GPU device that can be used for development?