Using OpenCL with Qt

Published Wednesday April 7th, 2010 | by

Recently we have been experimenting with OpenCL and Qt, to see what Qt needs to make it easier to use OpenCL and to see what Qt could use it for internally.  In this post we are going to give an introduction to OpenCL, the QtOpenCL wrapper library, show how to write your first QtOpenCL program, and tell you where to get more information on the project.

What is OpenCL anyway?

For those new to it, OpenCL is an open, royalty-free standard for parallel programming in a heterogeneous computing environment. The most common use you’ve probably heard of is running C-like code on your system’s GPU. These days GPUs outperform CPUs on highly parallel workloads, having been designed to pump out hundreds of thousands of textured triangles per second in your favorite shoot-em-up video game. To do this, the GPU has access to parallel vector processing that far exceeds the capability of x86/SSE or ARM/NEON instructions on your average CPU.

For years, shader languages like GLSL have made the vector capabilities of the GPU available for arbitrary shader effects in OpenGL, but you are basically limited to whatever parameters a “draw triangle” call takes. It’s also quite typical for OpenGL implementations to cut corners by using fixed-point and lower precisions. The shader source code may say “float”, but it could be as little as 8 bits of actual precision. While great for pumping out triangles where you won’t notice an “off-by-0.001” error, this isn’t very useful for supercomputing, common mathematical algorithms, and super-precise pixel blending.

Enter OpenCL. It defines a new C-style language that is stricter about mathematical precision, and which allows arbitrary arguments to be provided to an OpenCL function – known as a kernel – to do almost anything that C can do. Special vector types like “float4” are provided as well as an extensive mathematical library. But its most impressive feature is work sizes – it is very easy to split your task up into small chunks that OpenCL can scatter across all of the available compute units (these can belong to whatever CPUs and GPUs OpenCL can find; OpenCL uses everything available). Unlike regular C where you can spend a lot of time writing outer loops and launching worker threads for subparts of your problem, OpenCL does it for you. We’ll see how that works shortly.


The QtOpenCL library wraps the OpenCL 1.0 API in a Qt-style API.  It takes the pain out of OpenCL initialization, program compilation, and kernel execution.  It also provides convenience functions for interfacing to existing Qt facilities such as QImage and QtOpenGL.

The following links should get you started with downloading and using QtOpenCL with either Qt 4.6 or 4.7:

QtOpenCL is still a work in progress, distributed as a standalone module outside of the normal Qt source repositories.  Suggestions and patches are welcome to make it better.

Hello QtOpenCL

We are going to make a simple program that modifies an image by multiplying the grayscale version of an image by a color.  The code is in the QtOpenCL repository under the “examples/opencl/colorize” directory. We’ll be using the following member variables in the ColorizeWidget class:

QCLContext context;
QCLProgram program;
QCLKernel colorize;
QImage dstImage;
QCLImage2D srcImageBuffer;
QCLImage2D dstImageBuffer;
QColor color;

The first thing we need to do is create the QCLContext, which determines which CPU or GPU computing device to use and opens it for our use:

if (!context.create())
    qFatal("Could not create OpenCL context");

In this example we don’t really care if the computing device is a CPU or GPU, but if we really wanted to use the same GPU as the OpenGL implementation, we could do this instead:

if (!context.create(QCLDevice::GPU))
    qFatal("Could not create OpenCL context");

The next thing we need to do is build our OpenCL program from the source file:

program = context.buildProgramFromSourceFile(QLatin1String(":/"));

Now might be a good time to look at the OpenCL code itself inside

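// Clamp out-of-range coordinates to the image edge; linear filtering.
const sampler_t samp = CLK_ADDRESS_CLAMP_TO_EDGE |
                       CLK_FILTER_LINEAR;

__kernel void colorize(__read_only image2d_t srcImage,
                       __write_only image2d_t dstImage,
                       float4 color)
{
    // One work-item per pixel: the global ids are the (x, y) position.
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    float4 srcColor = read_imagef(srcImage, samp, pos);
    // Weighted grayscale; the weights sum to 32/32.
    float gray = srcColor.x * 11.0f / 32.0f +
                 srcColor.y * 16.0f / 32.0f +
                 srcColor.z * 5.0f / 32.0f;
    // Tint the gray value with the color, keeping the source alpha.
    float4 pixel = (float4)(color.xyz * gray, srcColor.w);
    write_imagef(dstImage, pos, clamp(pixel, 0.0f, 1.0f));
}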

We’ll break it down step by step:

  • The “__kernel” keyword introduces a special entry point function called “colorize” that we will be using later in our C++ code.
  • The “colorize” entry point takes three parameters corresponding to the source image, destination image, and the color to combine with the image.
  • The “pos” variable is set to a 2-dimensional int vector that contains the 0th and 1st global identifiers.  What?  Well, in OpenCL, every kernel execution is given an implicit argument that indicates which part of the overall work job is being performed.  In our case, we use the (x, y) co-ordinates of the image pixel we want to process.  So this line is basically fetching the current pixel, and the “colorize” function is only working on one pixel at a time.
  • A “srcColor” value is read from the source image at “pos”.  You can ignore the “samp” sampler for now – it’s an OpenCL technique for adjusting how values are extracted from image objects – we’re using a simple linear sampler.
  • We convert the “srcColor” into grayscale and then combine it with the “color”.
  • Finally, we write the pixel to the destination image.
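To make the “one pixel at a time” idea concrete, here is a rough sketch of the same computation as ordinary C++ running on the host, with the x and y loops written out by hand. It is only an illustration: the Pixel struct and the colorizeOne() and colorizeImage() names are hypothetical and are not part of QtOpenCL or the colorize example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RGBA pixel with float channels in [0, 1].
struct Pixel { float r, g, b, a; };

// The work done for a single pixel, mirroring the body of the kernel.
Pixel colorizeOne(const Pixel &src, const Pixel &color)
{
    // Same grayscale weights as the kernel: 11/32 + 16/32 + 5/32 == 1.
    float gray = src.r * 11.0f / 32.0f +
                 src.g * 16.0f / 32.0f +
                 src.b * 5.0f / 32.0f;
    auto clamp01 = [](float v) { return std::min(std::max(v, 0.0f), 1.0f); };
    // Tint the gray value with the color and keep the source alpha.
    return { clamp01(color.r * gray), clamp01(color.g * gray),
             clamp01(color.b * gray), clamp01(src.a) };
}

// The explicit loops; (x, y) plays the role of get_global_id(0) and
// get_global_id(1) in the kernel.
std::vector<Pixel> colorizeImage(const std::vector<Pixel> &src,
                                 std::size_t width, std::size_t height,
                                 const Pixel &color)
{
    std::vector<Pixel> dst(src.size());
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            dst[y * width + x] = colorizeOne(src[y * width + x], color);
    return dst;
}
```

In the OpenCL version only the body of colorizeOne() survives as the kernel; the two loops are replaced by the global work size.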

This is all fairly straightforward. The main difference from a regular C function that does the above is that we haven’t included the for loops to iterate over x and y – OpenCL provides them for us. Back to the C++ code now. OpenCL kernels running on a compute device cannot directly access host memory, so we need to arrange to copy our source image into an image buffer:

QImage img(QLatin1String(":/qtlogo.png"));
srcImageBuffer = context.createImage2DCopy(img, QCLMemoryObject::ReadOnly);

We specify the source image as “ReadOnly” because the kernel will be reading it. The destination image is created in a similar fashion, but as “WriteOnly”:

dstImage = QImage(img.size(), QImage::Format_ARGB32);
dstImageBuffer = context.createImage2DDevice(dstImage.format(), dstImage.size(), QCLMemoryObject::WriteOnly);

The createImage2DDevice() function creates an image in the fastest possible OpenCL device memory. The final initialization step is to find the kernel entry point:

colorize = program.createKernel("colorize");
colorize.setGlobalWorkSize(img.width(), img.height());
colorize.setLocalWorkSize(8, 8);

We set the “global” work size to the dimensions of the image, which causes OpenCL to create the implicit for loops that iterate over the x and y values for us. We set the “local” work size to 8×8, which indicates that OpenCL should process the data in 8×8 chunks and that every item in the chunk can be processed in parallel. This is how OpenCL gets its performance boost: by tweaking the local work size, we can tune the parallelism to make efficient use of the computing resources. I’ve found that 8×8 works quite well for images, so that’s what we’ll use in this example. Now that we have initialized our OpenCL context and kernel, it is on to the paintEvent():

colorize(srcImageBuffer, dstImageBuffer, color);
dstImageBuffer.read(&dstImage);
QPainter painter(this);
painter.drawImage(0, 0, dstImage);

The first line executes the kernel for us with QCLKernel’s operator() overload. The second line then reads the contents of “dstImageBuffer” from the OpenCL device back into “dstImage” in host memory. And then we paint it to the window as per normal Qt. And that’s basically it!

Well … not so fast!  I glossed over one small little detail – the kernel executes in the background and returns to the C++ program immediately.  So after the first line, execution will continue.  But the read() call will automatically block waiting for the kernel to complete execution, so all is fine in this example.  But if we really wanted to wait for the kernel to complete execution, we can use a QCLEvent:

QCLEvent event = colorize(srcImageBuffer, dstImageBuffer, color);
event.waitForFinished();
dstImageBuffer.read(&dstImage);
QPainter painter(this);
painter.drawImage(0, 0, dstImage);
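The launch-now, wait-later pattern described above is not specific to OpenCL. As a loose analogy only, the sketch below uses std::async from the C++ standard library in place of a kernel launch, with the returned std::future standing in for the QCLEvent. The launchSum() and waitForResult() names are hypothetical, and nothing here is QtOpenCL API.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a kernel launch: starts the work on another
// thread and returns immediately with a handle to the pending result.
std::future<long> launchSum(std::vector<int> data)
{
    return std::async(std::launch::async, [data] {
        return std::accumulate(data.begin(), data.end(), 0L);
    });
}

// Blocks until the background work has finished, then fetches the result.
long waitForResult(std::future<long> &handle)
{
    handle.wait();        // explicit wait, like waiting on the event
    return handle.get();  // get() on its own would also block until done
}
```

The caller is free to do other work between launchSum() and waitForResult(), just as the C++ program keeps running after the kernel call returns.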

Other Examples

The QtOpenCL repository has a number of examples that you can play with:

  • Vector addition example – another simple introduction to QtOpenCL.
  • Mandelbrot viewer program that demonstrates generating QImage data and GL textures via OpenCL.
  • Gaussian blur example and benchmarks that compare it with Qt’s graphics effects.
  • Bezier patch sub-division example to demonstrate using OpenCL like a geometry shader to generate large numbers of vertices.
  • Simple path and image drawing and blending.

And now an obligatory screenshot.  The mandelbrot viewer zooms into the well-known set down to this image:

Mandelbrot screenshot

On my Linux desktop’s NVIDIA GeForce GTX 275, this can get up to 120 frames per second, running across 30 compute units, without breaking a sweat.  To put this into perspective, the same algorithm running on the CPU struggles to achieve 5 frames per second.  Offloading all that work, and breaking it up into 8×8 work chunks makes a huge difference (initially performance wasn’t that great until I realized that it was using a 1×1 work size).
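To get a feel for why the local work size matters, the sketch below counts how many work-groups a 2D job is split into for a given local size. It is an illustration under stated assumptions: roundUp() and workGroupCount() are hypothetical helpers, not QtOpenCL functions. The rounding reflects the OpenCL 1.0 rule that the global work size must be evenly divisible by the local work size; whether QtOpenCL pads the size for you is not covered here.

```cpp
#include <cstddef>

// Round value up to the next multiple of `multiple` (multiple must be > 0).
std::size_t roundUp(std::size_t value, std::size_t multiple)
{
    return ((value + multiple - 1) / multiple) * multiple;
}

// Number of work-groups for a width x height job split into
// localW x localH chunks, after padding each dimension.
std::size_t workGroupCount(std::size_t width, std::size_t height,
                           std::size_t localW, std::size_t localH)
{
    return (roundUp(width, localW) / localW) *
           (roundUp(height, localH) / localH);
}
```

For a hypothetical 512×512 image, workGroupCount(512, 512, 1, 1) gives 262144 groups of one work-item each, while workGroupCount(512, 512, 8, 8) gives 4096 groups of 64 work-items that can each be processed in parallel.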

QtOpenCL and QtConcurrent

There is a little bit of interaction between QtOpenCL and QtConcurrent, as described here. Because of QtConcurrent’s background on homogeneous multi-core CPUs, there’s a bit of work that needs to be done to truly marry the two worlds, but nothing is impossible. For now, the most useful feature of the interaction is that you can get a QFuture for a kernel execution and pass it to a QFutureWatcher for signal notification:

QCLEvent event = kernel(arg1, arg2);
QFutureWatcher<void> *watcher = new QFutureWatcher<void>(this);
watcher->setFuture(event.toFuture());
connect(watcher, SIGNAL(finished()), this, SLOT(eventFinished()));

Or alternatively, using the implicit conversion between QCLEvent and QFuture:

QFutureWatcher<void> *watcher = new QFutureWatcher<void>(this);
watcher->setFuture(kernel(arg1, arg2));
connect(watcher, SIGNAL(finished()), this, SLOT(eventFinished()));

Embedded Devices

Right now, QtOpenCL works very well with desktop OpenCL implementations, like that from NVIDIA (we’ve tested it under Linux, Mac, and Windows). Embedded devices are currently another matter – OpenCL implementations are still very basic in that space. OpenCL on embedded CPUs is only slightly faster than using ARM/NEON instructions, for example. And embedded GPUs are usually hard-wired for GLSL/ES, lacking many of the features that make OpenCL really sing. But like everything in the embedded space, things are likely to change very quickly. By releasing QtOpenCL, hopefully we can stimulate the embedded vendors to accelerate development by giving them something to test with. Be the first embedded device on the block to get the mandelbrot demo running at 10fps, or 20fps, or 60fps!

Future Work

A lot of stuff remains to be done, particularly with respect to how we can use OpenCL inside Qt itself.  There are many places where it could be useful:

  • Accelerating image blending and path drawing in the raster paint engine.
  • Fast on-the-fly decompression and scaling of JPEG images.
  • Graphics effects: blur, colorize, bloom, etc, etc, etc.
  • Particle effects and other physics simulations.
  • Mesh subdivision and morphing algorithms in Qt/3D.

The possibilities are endless.  We look forward to your patch! :-)


Posted in OpenGL, Painting, Performance

50 comments to Using OpenCL with Qt

pns says:

Is there a plan to add a physics engine to Qt along with GL support? OpenGL + Qt Kinetics with an OpenCL-based physics engine can take GUIs to the next level. With 3D displays, a bumptop kind of desktop implementation will be possible with the above setup.

zchydem says:

Thanks for this blog post! It is really nice to see that you guys are a few steps ahead of others. I’ve been waiting for something to happen on the OpenCL front. I’d better check out all the examples and start to learn QtOpenCL.

detro says:

Time to reimplement Sudoku solver with backtracking ;)

NuShrike says:

Would be nice to upgrade QtGui to support multi-threaded usage which would extend to QGraphicsView with QtConcurrent, OpenGL and OpenCL. You would then really give WPF a run for its money.

Kensai says:

This is amazing indeed!

It’s great to hear Qt is not staying behind in this important GPGPU front. Waiting for the first real-world use of the QtOpenCL library.

Aamer says:

I always felt OpenCL was down the pipeline after Qt Concurrent, and now it’s true!!
This will indeed be a major crown in Qt’s glory :-)

kypeli says:

Excellent post and so nice to see Qt shrinking the gap between OpenCL and normal applications. This makes all the difference to make OpenCL more approachable. I think the only problem now is to find some good and useful OpenCL tutorials and references :) I found the Khronos docs horrible.

manudwarf says:

Hi !
That’s exciting news! Let’s hope it will be integrated in Qt ASAP!
I made a summary in French on my website.

Thank you and have fun ;)

RCL says:

What about exposing platform / devices (which are used to create context) and providing integration with QGLContext (for devices having GL interop ability)?

DrOctavius says:

G r e a t NEW Feature!

DrOctavius says:

What about testing this in a Tegra2 SoC? Compared to other embedded devices (OMAP3/4) the Tegra2 has a powerful NVIDIA GPU built-in.

RCL: “What about exposing platform / devices (which are used to create context) and providing integration with QGLContext (for devices having GL interop ability)?”

There are QCLDevice and QCLPlatform classes which can be passed to QCLContext::create() if you want a specific device. Most of the OpenCL implementations I’ve come across so far (but not all), only have a single platform and device so the default create() call Just Works (TM), so that’s what I used in the example.

The QCLContextGL class takes care of GL interop: It is separate from QCLContext so that we can (eventually) use the basic OpenCL stuff in QtGui without creating a dependency to QtOpenGL.

john dalton says:

Thank you for putting this together! We will be very heavy users of this.

Is this going to be a part of the Qt 4.7 release, or still separate at that time? If separate, if you could supply QtOpenCL dynamic library binaries for Qt commercial customers at the 4.7 release that would be extremely useful for us.

Hoang Vu says:

This is awesome. Will it be integrated into Qt 4.8 ?

Julien McArdle says:

This is a very cool concept. Props on doing this, and I look forward to future developments!

I understand why the .cl code is separate, but it would be nice if eventually that code could be integrated into the .cpp file. Have it be as easy to deal with as starting a new thread.

In any case, great work.

Andrew Ford says:

How about having this functionality accessible from QtScript? Imagine having the speed and efficiency of OpenCL, but the flexibility of scripting… You can have your cake and eat it too!

hello says:

you guys should work on your website performance, it takes 30 seconds to load the article, most people would give up.

Igor Gaponenko says:

Any plans to support AMD/ATI cards? I guess that would be more important and useful than embedded processors.

Werner says:

This is great news. I was thinking about using GPU power for my apps for quite a long time but never had the nerve to delve into that. Now, with Qt support, I am just about to step off the plank. I understand that I will need both the QtOpenCL wrapper and the OpenCL library itself. While the blog and the Qt documentation are very explicit about the former, they do not say much about the latter. Does anybody have a link to a tutorial or something similar? I am using an NVIDIA GPU and Windows.
Many thanks to the Trolls, anyway!

Tim Boundy says:

“Any plans to support AMD/ATI cards?”

That all depends on AMD. Currently you can download the ATI Stream SDK, which provides AMD’s implementation of OpenCL. This should work automatically with QtOpenCL, although I haven’t tested it. I think AMD plans on eventually bundling their OpenCL implementation with the GPU drivers much like what NVIDIA does.

Julien McArdle: “I understand why the .cl code is separate, but it would be nice if eventually that code could be integrated into the .cpp file.”

If you use buildProgramFromSourceCode() instead of buildProgramFromSourceFile(), you can embed OpenCL source directly into the C++ source.

Igor Gaponenko: “Any plans to support AMD/ATI cards?”

I welcome any patches or bug reports to better support other OpenCL implementations. I myself lack for AMD/ATI hardware at present.

There’s also a community opportunity – LLVM does sort of support the OpenCL language in the clang front-end, but it is fairly basic and the runtime library is missing – both NVIDIA and Apple have their own proprietary implementations built on top of LLVM. It would be really nice if there was a truly open reference OpenCL implementation, at least for running on multi-core CPUs, which can then be extended to GPUs later.

Mark Turney says:

Rhys, I have been excited about OpenCL, but I just haven’t had the time to dig into it, so this definitely lowers the barrier of entry. Also, I just want to warn you that a Reddit link on the page is causing a 30 second delay before users can see this story (and it causes the same issue with the main Qt Labs Blogs page).

mat69 says:

I have been using OpenCL for a few months now (private projects) and really like the possibilities it offers. Does your wrapper make error-handling easier?

detro says:

Anyone succeeded building it on Mac OS X Snow Leopard?

Great Post!

Clearly depicts the advantages of using OpenCL over a “traditional” CPU-only approach. It is fair to mention that, when developing OpenCL applications that target the GPU, you are actually obtaining a huge performance increase (depending on the algorithm, of course), at a fraction of what trying to obtain the same speedup on an x86 processor would cost.

I liked the interface you’ve developed for the wrapper, Qt-fying OpenCL. Well done!

Keep up the good work,

mat69: “Does your wrapper make error-handling easier?”

When an error occurs in a QtOpenCL function, it generates a qWarning() with the name of the error (e.g. “CL_INVALID_PLATFORM”) and the function that failed, and also records the error code as QCLContext::lastError(). You can also use QCLContext::errorName() to get the name of the error in your own code if you want to display it in a dialog box or whatever. Hopefully this qualifies as “easier”. :-)

detro: “Anyone succeeded building it on Mac OS X Snow Leopard?”

Drop me an e-mail with whatever errors you are getting (address in the commit logs) and I’ll see if my Mac person can help you out.

Sebastien BAUDOUIN says:

I downloaded the package and compiled it on my PC.
I faced several resource issues (like the .cl file being found in debug but not in release mode). Do you know where the issue is coming from?

The work that has been done is really good. I’m really interested in the possibility of executing a CLKernel on a QtConcurrent queue and so using the CPU instead of the GPU.

I checked the demos and examples using an NVIDIA OpenCL driver, but I also wanted to understand the interaction with QtConcurrent. As I understand it, if we want a .cl file to be compiled and executed on a CPU, my OpenCL library needs to provide a compiler for the CPU target. As NVIDIA (in my case) is not providing one, I’m stuck. I think there is another way to test that, using the support for native kernels. Do you plan to have it?

Do you provide any basic test to experiment with QtOpenCL and QtConcurrent coexistence?

Baudouin Sebastien says:

The more I read about QtOpenCL, the more I think I misunderstood the link between QtOpenCL and QtConcurrent.

When we write following code where kernel is a QCLKernel object:
kernel.setGlobalWorkSize(100, 100);
QCLEvent event = QtConcurrent::run(kernel, a1, b1);

Do we use QtConcurrent as a synchronisation mechanism to be notified when the kernel finishes its execution, or do we use it to execute the kernel on a separate Qt CPU thread?

If it is for execution on a Qt CPU thread, I would expect that the kernel is compiled for a CPU device (and so the OpenCL library supports CPU devices) or that we use buildProgramFromSourceCode to get host-compatible binary code.

My initial goal was to use QtOpenCL as a way to support OpenCL on the CPU in an efficient way, like what “Grand Central” proposes (without the “block” support), without needing an OpenCL-for-CPU “stack”, which would surely be less optimized than QtConcurrent could be.

If someone could clarify the situation it would be nice. Nevertheless I will make some attempts to better understand.

Thanks again for your work.

Yes, you will need an OpenCL implementation that compiles for the CPU to be able to run OpenCL kernels there. It is allowed by the specification to have multiple OpenCL platform implementations, so theoretically you could run NVIDIA and another implementation side by side.

If the OpenCL implementation supports native kernels, then it is possible to pass a native function pointer in and run it as a kernel with clEnqueueNativeKernel(), but my NVIDIA implementation at least does not support that feature. QtOpenCL doesn’t currently provide a way to call clEnqueueNativeKernel() as its usage is a little awkward – suggestions welcome.

The QtConcurrent integration right now is quite basic, making it possible to use a similar programming style and QFutureWatcher, but not much else. At the moment if you wanted to provide a CPU “fallback”, you would need to implement an OpenCL kernel for the GPU, and a separate QtConcurrent function compiled with the host compiler.

Sebastien BAUDOUIN says:

Thanks for your answer.
I’m still wondering how you manage the support of several OpenCL libraries in QtOpenCL?

Do you have any info on the resource issue I have, where I can see the .cl file in the debug config but not in the release config?

Best Regards

benoit says:

Excellent. I need to give it a try

A2K says:

Very nice!
I’ve started to implement a wrapper to avoid the tedious API calls of OpenCL apps but now I can use this one ;)

Cristian Magherusan-Stanciu says:

Just a thought: are there any plans to add this to the mobile version of Qt, so we can squeeze most of the performance of our mobile devices?

AlHadr says:

Anyone have an approach to dynamically load OpenCL.dll at runtime instead of dynamically linking? That would allow multiple code paths and allow the application to run even on systems that do not have OpenCL.

wdobbe says:

It seems the configure step is completely broken on OS X 10.6 . I couldn’t build it.

sapym says:

I cannot build the library on Windows…
Is there anybody that can help me ….please… :(

Vincent says:

I’ve written a blog post on how to set up this plugin with Qt Creator:
For Windows users it is hard, since there is not really a “build-essentials” available. The focus of the blog is more on *NIX OSes like Ubuntu and Mac.

Sine says:

This is terrific! I was wondering about integrating OpenCL into Qt. It’s great to see that Qt team is showing the way for the rest!!
Congrats guys!!

mgb says:

There is a small error in the .pro files.
Using the vs2008 plugin on windows all the .vcproj files below the top src level get created with _(QTDIR) rather than $(QTDIR) for the rcc command line.
Easy to fix manually – but might be confusing.

+jaakko says:

About mac compilation,
I’m not sure what’s going on, but I added following to and (and just the LIBS to

mac {
LIBS += -framework OpenCL
QMAKE_LFLAGS_SONAME = -Wl,-install_name,@executable_path/../Frameworks/
}

LIBS was for undefined opencl symbols. The QMAKE_LFLAGS_SONAME allows me to use the OpenCL as ‘private framework’ ie put .so files to directory. If you’re using the command line only, I think you could just define the location of the Qt OpenCL libraries with the DYLD_LIBRARY_PATH environment variable.

Now everything works, and it’s looking really interesting.

Vincent says:

In I explained how to get OpenCL in QT Creator running. In I describe AMD’s Hello World Example written in QT. With these two, you can do the other examples more easily.

Will this work on POWERVR too?
Thank you QT, for this great extension!

sapym says:

I’m sorry… really having a hard time making it work under Windows with VS2005…
Is there a chance that somebody can describe the full installation procedure for me?
- Qt 4.7 beta built in non-static mode, debug-and-release?
- How to build OpenCL correctly

I’m kinda lost….

mgb says:

@sapym with vs2008 (there is a free 90day demo download) and the vs-addin it’s relatively simple.
Build Qt4.7 in debug/release, set the QTDIR environment variable to point to this.
Use the vs-addin to open the .pro file and build the solution, copy the resulting dlls from QtOpenCL to somewhere on the path (e.g. %QTDIR%\bin)

There are a couple of small bugs:
The created dll are named the same for debug/release – rename the debug ones to QtOpenCLd.dll and QtOpenCLGLd.dll
The custom build step for the .qrc files in release mode puts .debug in the path, instead of .release. Change properties->custom build step->’commandline’ and ‘outputs’
The application type for clinfo.exe is set to ‘windows’. This prevents it writing anything to stdout, change it to console (see properties->linker->System)

mgb says:

And of course, you also need the opencl libs from your graphics card maker – confusingly with Nvidia it’s in the CUDA sdk

mgb says:

@Rhys – a performance question about using QtOpenCL rather than QtOpenCLGL.
Since the dstimage is associated with a QPainter which does the actual drawing – does this mean that the resulting image is copied back into QPainter and then back to the GPU to display it?
Or is QPainter smart enough to just wrap the image data on the graphics card?

Should I use QtOpenCLGL for best performance ?

Smaki says:

Not to be nasty, but a QtOpenCL is almost pointless, take a gander at Khronos’ web page on OpenCL and you will find a compact, simple to use C++ OpenCL interface, cl.hpp.

Nadav says:

It would be nice to see an open source implementation of OpenCL for linux.

debhal says:

@Cristian – We’re still keen to see some portable hardware with GPUCPU support, and when we do you can be sure we’ll revisit this for mobile devices.

@mgb – OpenCL can map the same memory for gpu processing and CPU addressing, so it’s not QPainter that’s smart enough to avoid the copy, but OpenCL. Being able to dodge all the uploading/downloading overhead is one of the really exciting things about it.

@Smaki – Sure you don’t need QtOpenCL to use OpenCL, we just try to make it easier. The 5-line example above saves some half a page of boilerplate. If you love boilerplate, go for it!

K says:

The links for build instructions and for full documentation are broken and are getting redirected to, please fix them.

JanneK says:

@K Hi, these posts were moved over from and redirection from that URL goes to front page.
Let’s hope this gets fixed soon.

Documentation of QtConcurrent link:

Build instructions:
