Gunnar Sletta

New Scene Graph Renderer

Published Monday September 2nd, 2013 | by

Few phrases are more misused in software, whispered in social channels, promised over the negotiating table or shouted out loud in blogs, as the words “The next release will make everything better!”

So… I won’t say that.

But…

Qt 5.2 introduces a new renderer to the Qt Quick Scene Graph.

When we set out to do the scene graph some three years ago, one of my visions was that we would be able to really take advantage of OpenGL and have game-like performance. The renderer we have in Qt 5.0 went a long way towards that goal. For opaque content, we sort based on state and render content front-to-back to minimize GPU overdraw. In the playground/scenegraph repository, we have a different renderer which batches similar primitives together to reduce the number of draw calls (which implicitly also reduces state changes).

The new renderer in Qt 5.2 combines both of these techniques and also tries to identify non-changing parts of the scene so they can be retained on the GPU. Qt 5.2 also adds the texture atlas support which was previously located in the playground/scenegraph’s customcontext. This greatly helps batching of textured content. I think that with Qt 5.2, we are now pretty close to that goal.
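To make the state sorting and front-to-back ordering mentioned above a bit more concrete, here is a minimal sketch. It is not the Qt renderer's code: the `Node` struct, the `renderOrder()` function, and the choice of sorting by state first and depth second are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a scene graph node; not a Qt type.
struct Node {
    int material;  // identifies the GL state (shader, texture, blend mode)
    int z;         // stacking order, larger values are closer to the viewer
    bool opaque;
};

// Returns the nodes in the order they would be drawn.
std::vector<Node> renderOrder(std::vector<Node> nodes)
{
    // Opaque nodes are drawn first, translucent nodes after them.
    auto firstTranslucent = std::stable_partition(
        nodes.begin(), nodes.end(), [](const Node &n) { return n.opaque; });

    // Opaque: group by state, and within one state draw front-to-back so
    // the depth test can reject pixels that are already covered.
    std::sort(nodes.begin(), firstTranslucent, [](const Node &a, const Node &b) {
        if (a.material != b.material)
            return a.material < b.material;
        return a.z > b.z;
    });

    // Translucent: blending only works back-to-front, so keep stacking order.
    std::stable_sort(firstTranslucent, nodes.end(),
                     [](const Node &a, const Node &b) { return a.z < b.z; });
    return nodes;
}
```

The opaque pass is free to reorder because the z-buffer resolves visibility; the translucent pass is not, which is why only opaque content benefits from this kind of sorting.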

A new doc article explaining the renderer in more detail has been added for the more curious, though a deep understanding of the renderer is not needed to write well-performing applications. However, I suspect many people will still find it interesting.

http://doc-snapshot.qt-project.org/qt5-stable/qtquick-visualcanvas-scenegraph-renderer.html

 

There are still a lot of other ideas that could be employed for those who want to have a stab at it. If you have ideas, ping “sletta” on IRC or clone and start coding.

Now some numbers:

Three of the benchmarks are available here: https://github.com/qtproject/playground-scenegraph/tree/master/benchmarks

  • Extreme Table contains a large static table with some animated content on top. It shows the benefit of the GPU retention.
  • List Bench shows a number of simultaneously scrolling lists with an icon, alternating background color, and two texts per cell.
  • Flying Icons contains over 3000 images which are being animated with a unique animation per item.

I also included the front-page of the Qt Quick Controls gallery example and Samegame. Samegame is played in “Zen” mode while clicking to the rhythm of “Where Eagles Dare” by Iron Maiden. (I know… very scientific)

The number of OpenGL draw calls and amount of traffic is measured using apitrace, an awesome tool if you’re debugging OpenGL. As can be seen by the green numbers, we’ve managed to cut down the number of glXxx calls quite a bit, especially for the cases where we have plenty of re-occurrence, meaning lists, tables and grids.

The amount of traffic per frame is also significantly improved. The best examples are the “ExtremeTable” and “ListBench” which are sampled at a frame where no new delegates were being added or removed. I’m quite happy that the “ListBench” comes out as a zero-transfer frame. There is of course some traffic, the draw calls themselves and a couple of uniforms; but no vertex data and no texture data, which is what the tool measures. “FlyingIcons” changes the entire scene every frame so nothing can be retained, so minimal difference is expected. Controls Gallery is mostly static, but has an animated progress bar which needs a new texture every frame. This is the majority of the 20+kb transfer. Samegame comes out pretty high, primarily because of its extensive use of particle systems. A lesson learned is that if you are on a tiny memory bus, limit your particles.

These are theoretical numbers, but they give a good idea that the new renderer is on the right track. Let us look at numbers from running on hardware. I’ve created tables out of the cases where the new renderer had the most impact. The full table of numbers is at the very bottom.

Note, these benchmarks are run without the new V4-based QML engine. This is because the V4 engine also affects the results and I wanted to focus on rendering only.

The rendering time is measured with vsync enabled, but excluding the swap. So when we see that the MacBook spends 16ms rendering, it is actually being throttled while we are issuing OpenGL commands. When I looked at this in Mac OS X’s Instruments, I saw that the driver was spending a lot of time in synchronous waiting, in other words we had stalls in the pipeline. With the Qt 5.2 renderer, the synchronous waiting while drawing is gone. This is good news, as lists and list-like content are quite common in UIs.

On both the MacBook with the integrated chip and the Nexus, the new renderer drastically reduces the time spent issuing OpenGL commands. It should also be said that the only reason the render times for the “ExtremeTable” did not go all the way to 0 is https://bugreports.qt-project.org/browse/QTBUG-32997

The high CPU load of “FlyingIcons” is primarily due to it running 3000+ animations in parallel, but as we can see from the time spent in the renderer, there is still a significant improvement.

Here are the rest of the numbers:

So this matches up pretty well with the theoretical results at the beginning. For reoccurring content, such as lists and grids, Qt 5.2 is quite an improvement over Qt 5.1. For those use cases we didn’t radically improve, we at least didn’t make them any worse.

Enjoy!



Posted in Graphics, OpenGL, Qt Quick 2.0

Introducing Boot to Qt – A Technology Preview

Published Tuesday May 21st, 2013 | by

For a few months now, we have been working on a new project under the codename Boot to Qt, and today we launch it as a technology preview.

Boot to Qt is a commercial offering that provides a fully integrated solution for the creation of slick user interfaces on embedded devices. The offering includes:

  • A light-weight UI stack for embedded Linux, based on the Qt Framework – Boot to Qt is built on an Android kernel/baselayer and offers an elegant means of developing beautiful and performant embedded devices.
  • Ready-made images – We have images for several different devices which include the Boot to Qt software stack, making it possible to get up and running with minimal effort from day one.
  • Full Qt Creator Integration – One-click deploy and run on hardware and a fully featured development environment.
  • Simulator – A VirtualBox based simulator which allows device development without hardware and makes it possible to simulate hardware input, such as GPS and connectivity.

This technology preview focuses on the stack built on an Android baselayer. We also want to provide a similar software stack, with the same convenience of ready-made images and IDE integration, for traditional embedded Linux, hopefully with a preview coming some time this summer.

We are expecting to have an official release towards the end of this year.

The following video shows Boot to Qt in action on our reference hardware:

And the following video shows how the Boot to Qt SDK works:

Scope of Boot to Qt

The software stack includes most of the Qt Framework:

  • Qt Core, Qt Gui, Qt Network, Qt Widgets, Qt Xml
  • Qt QML and Qt Quick
  • Qt Quick Controls
  • Qt Graphical Effects
  • Boot to Qt specific additions, including virtual keyboard, brightness control and power off/reboot functionality

The hardware devices supported in the Technology Preview are:

This is not a fixed set, but a place for us to start. If you have suggestions for other devices, let us know. The stack can also run on x86 hardware.

Right now, the stack is single-process. The launcher is a QML application which launches other QML applications in-process. We have looked briefly into using Android Gralloc APIs to do multiprocess sharing of hardware buffers, and we know it can be done, but we consider this out of the 1.0 scope.

We have also had similar discussions around Multimedia and Webkit. We want to have them in place, but maybe not in the initial release. The current software stack is already quite powerful and serves a number of different use cases.

Performance

Qt 5 introduced a new OpenGL ES 2.0 based scene graph to power Qt Quick 2. This makes Qt Quick very suitable for running on embedded hardware, even hardware with moderate specs. The demo launcher we ship with the images, for instance, runs velvet-smooth at 60 FPS on all our hardware devices.

We were looking at CPU usage while playing around in the application launcher on the Nexus 7. When idle, it uses a shader to add a glow on the currently selected item and has a small particle system on the Qt logo in the corner. We found that when the launcher was just animating the glow on the active item and running the small particle system on the Qt logo, the CPU load was running at about 50%. When we flicked it, it dropped to 30% and when the finger was down and we were moving the list via touch, it dropped to below 20%. So it seemed that the more we did, the lower the CPU load became. What we were observing was CPU frequency scaling. The CPU is a quad-core clocked at 1.2GHz (with a special 1.3GHz single-core operating mode), but when idle, it had disabled 3 cores and had scaled the one remaining core to 102MHz. So we were able to animate a large part of a 1280×800 screen at 60FPS on a CPU clocked at 102MHz, and were still only using half of that.

For reference, the same animation runs at 2% CPU on the i.MX6 and 15% on the Beagle, neither of which does frequency scaling.

We also have pretty decent startup times. Below is a diagram comparing Boot to Qt to native Android. Now of course, full Android brings in a lot of additional stuff, but that is also the point. Most embedded devices do not need that.

Startup times, in seconds, from power-on until device reaches the B2Qt launcher or the Android Homescreen.
Lower is better

This is not too shabby, but we believe we can cut this down a bit more, at least when we start exploring various embedded Linux configurations. As an example, Qt 5 on Raspberry Pi can start rendering after as little as 3 seconds.

Getting Access

For more information, see the product page.

Boot to Qt is available for evaluation upon request. If you want to try it out or if you are just interested in the software, please use the contact form on the product page and we will be happy to get you started. Of course, feel free to leave comments and questions on this blog too.

Enjoy!



Posted in Embedded, Qt

Introducing QWidget::createWindowContainer()

Published Tuesday February 19th, 2013 | by

Qt 5 introduced a new set of OpenGL classes in Qt Gui and a new rendering pipeline for Qt Quick with the scenegraph. As awesome as these are, they were based on the newly introduced QWindow, making it very hard to use them in existing applications.

To remedy this problem, Qt 5.1 introduces the function QWidget::createWindowContainer(). This function creates a QWidget wrapper for an existing QWindow, allowing it to live inside a QWidget-based application. Using QQuickView or QOpenGLContext together with widgets is now possible.

How to use

QQuickView *view = new QQuickView();
...

QWidget *container = QWidget::createWindowContainer(view);
container->setMinimumSize(...);
container->setMaximumSize(...);
container->setFocusPolicy(Qt::TabFocus);

widgetLayout->addWidget(container);

How it works

The window container works by forcing the use of native child widgets inside the widgets hierarchy and will reparent the window in the windowing system. After that, the container will manage the window’s geometry and visibility. The rendering of the window happens directly in the window without any interference from the widgets, resulting in optimal performance.

As can be seen from the code-snippet above, the container can also receive focus.

Embedding the “Other Way”

This feature covers the use case where an application wants to port an existing view to either Qt Quick or the new OpenGL classes in Qt Gui. What about the use case where an application’s main view is written with widgets, say QGraphicsView, and the application wants to keep this and rewrite the surrounding UI with Qt Quick? Well, this is doable by keeping the main application QWidget-based and making each of the big UI blocks a QQuickView embedded in its own container.

Enjoy!



Posted in OpenGL, Qt, Qt Quick 2.0

Render Thread Animations in Qt Quick 2.0

Published Monday August 20th, 2012 | by

One of the shortcomings of the Qt Quick API is that despite having a dedicated rendering thread, our animations are always running on the GUI thread.

Running animations outside the application’s main thread has the advantage that it greatly reduces jerkiness, as operations that block the main thread will not hinder the animations from running.

There are three primary problems hindering us:

  • Animations update properties and these are tied to QObjects and the meta object system. To avoid threading insanity, we can only read and write these on the GUI thread.
  • Properties often have bindings in QML, which trigger JavaScript execution which must happen on the GUI thread.
  • The threaded render loop in the QtQuick library is driven by the GUI thread and does not redraw unless told, so if the GUI is blocked, nothing updates.

As I mentioned in my previous post, the render loop in the “customcontext” in ssh://codereview.qt-project.org:29418/playground/scenegraph.git fixes the third problem, but that leaves the issue of access to QObjects and JavaScript execution.

A colleague of mine, Marko Niemelä, has been working on an animation system that solves the QObject / QML binding part. His work is in the “animators” directory of the playground repository.

This is not Qt 5.0 material, but maybe we can get it in good shape for 5.1.

Enjoy!



Posted in Performance, Qt Quick 2

Scene Graph Adaptation Layer

Published Wednesday August 1st, 2012 | by

Both the public documentation for the scene graph and some of my previous posts on the subject have spoken of a backend or adaptation API which makes it possible to adapt the scene graph to various hardware. This is an undocumented plugin API which will remain undocumented, but I will try to go through it here, so others know where to start and what to look for. This post is more about the concepts and the problems we have tried to solve than the actual code, as I believe that the code and the API will most likely change over time, but the problems we are trying to solve and the ideas on how to solve them will remain.

Some of these things will probably make their way into the default Qt Quick 2.0 implementation as the code matures and the APIs stabilize, but for now they have been developed in a separate repo to freely play around with ideas while not destabilizing the overall Qt project.

The code is available in the customcontext directory of ssh://codereview.qt-project.org:29418/playground/scenegraph.git

Renderer

When we started the scene graph project roughly two years ago, one of the things we wanted to enable was to make sure we could make optimal use of the underlying hardware. For instance, based on how the hardware worked and which features it supports, we would traverse the graph differently and organize the OpenGL draw calls accordingly. The part of the scene graph that is responsible for how the graph gets turned into OpenGL calls is the renderer, so being able to replace it would be crucial.

One idea we had early on was to have a default renderer in the source tree that would be good for most use cases, and which would serve as a baseline for other implementations. Today this is the QSGDefaultRenderer. Other renderers would then copy this code, subclass it or completely replace it (by reimplementing QSGRenderer instead) depending on how the hardware worked.

Example. On my MacBook Pro at the time (Nvidia 8600M GT), I found that if I did the following:

  1. Clear to transparent,
  2. render all objects with some opaque pixels front to back with blending disabled, while doing “discard” on any non-opaque pixel in the fragment shader, but writing the stacking order to the z-buffer,
  3. then render all objects with translucency again with z-testing enabled, this time without the discard,

I got a significant speedup for scenes with a lot of overlapping elements, as the time spent blending was greatly reduced and a vast amount of pixels could be ignored during the fragment processing. Now, in the end, it turned out (perhaps not surprisingly) that “discard” in the fragment shader on both the Tegra and the SGX is a performance killer, so even though this would have been a good solution for my MacBook, it would not have been a good solution for the embedded hardware (which was the overall goal at the time).

On other hardware we have seen that the overhead of each individual glDrawXxx call is quite significant, so there the strategy has been to try to find different geometries that should be rendered with the same material and batch them together while still maintaining the visual stacking order. This is the approach taken by the “overlap renderer” in the playground repository. Kudos to Glenn Watson in the Brisbane office for the implementation.
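The idea of batching by material without breaking the stacking order can be sketched as follows. This is only an illustration of the principle, not the overlap renderer's actual code; `Item`, `Batch` and `buildBatches()` are hypothetical names, and bounding rectangles stand in for real geometry.

```cpp
#include <cstddef>
#include <vector>

struct Rect { int x, y, w, h; };
struct Item { int material; Rect bounds; };              // hypothetical types
struct Batch { int material; std::vector<Item> items; }; // one draw call each

bool overlaps(const Rect &a, const Rect &b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w
        && a.y < b.y + b.h && b.y < a.y + a.h;
}

bool overlapsAny(const Batch &batch, const Rect &r)
{
    for (const Item &item : batch.items)
        if (overlaps(item.bounds, r))
            return true;
    return false;
}

// Items arrive in paint order (back to front). An item may join an earlier
// batch with the same material only if it does not overlap anything in the
// batches that are drawn after that one.
std::vector<Batch> buildBatches(const std::vector<Item> &paintOrder)
{
    std::vector<Batch> batches;
    for (const Item &item : paintOrder) {
        bool merged = false;
        for (std::size_t i = batches.size(); i-- > 0; ) {
            if (batches[i].material == item.material) {
                batches[i].items.push_back(item);
                merged = true;
                break;
            }
            if (overlapsAny(batches[i], item.bounds))
                break;  // moving further back would change what is on top
        }
        if (!merged)
            batches.push_back(Batch{item.material, {item}});
    }
    return batches;
}
```

For a list where each row has a background, an icon and a text, and rows do not overlap each other, this yields three batches no matter how many rows there are.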

The overlap renderer also has some compile-time options that can be used to speed things up:

  • Geometry sorting – based on materials, QSGGeometryNodes are sorted and batched together so that state changes during the rendering are minimal and also draw calls are kept low. Take for instance a list with background, icon and text. The list is drawn with 3 draw calls, regardless of how many items there are in it.
  • glMapBuffer – By letting the driver allocate the vertex buffer for us, we potentially remove one vertex buffer allocation when we want to move our geometry from the scene graph geometry to the GPU. glVertexAttribPointer (which is all we have on stock OpenGL ES 2.0) mandates that the driver takes a deep copy, which is more costly.
  • Half-floats – The renderer does CPU-side vertex transformation and transfers the vertex data to the GPU in half-floats to reduce the memory bandwidth. Since the vertex data is already in device space when transferred, the loss of precision can be neglected.
  • Neon assembly – to speed up the CPU-side vertex transformation for ARM.
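As a rough illustration of the half-float option in the list above, the following sketch packs a 32-bit float into the 16-bit half-float layout (1 sign bit, 5 exponent bits, 10 mantissa bits). It is a simplified conversion written for this post, not the renderer's code: the name `floatToHalf()` is hypothetical, the mantissa is truncated rather than rounded, very small values are flushed to zero, and NaN is mapped to infinity.

```cpp
#include <cstdint>
#include <cstring>

// Simplified float -> half conversion (truncating, no denormals, NaN -> inf).
std::uint16_t floatToHalf(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // reinterpret without UB

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    // Re-bias the exponent from 127 (float) to 15 (half).
    const std::int32_t exponent =
        static_cast<std::int32_t>((bits >> 23) & 0xFFu) - 127 + 15;
    const std::uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent <= 0)   // too small for a normal half: signed zero
        return static_cast<std::uint16_t>(sign);
    if (exponent >= 31)  // too large (or inf/NaN): signed infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);

    // Keep the top 10 of the 23 mantissa bits.
    return static_cast<std::uint16_t>(
        sign | (static_cast<std::uint32_t>(exponent) << 10) | (mantissa >> 13));
}
```

Each vertex coordinate then takes two bytes instead of four, which is where the bandwidth saving comes from.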

If you are curious about this, then I would really want to see us being able to detect the parts of a scene graph that are completely unchanged for a longer period of time and store that geometry completely in the GPU as vertex buffer objects (VBO) to remove the vertex transfer altogether. I hereby dare you to crack that nut :)

And if you have hardware with different performance profiles, or if you know how to code directly in the language of the GPU, then the possibility is there to implement a custom renderer to make QML fly even better.

Texture implementation

The default implementation of textures in the QtQuick 2.0 library is rather straightforward. It uses an OpenGL texture with the GL_RGBA format. If supported, it tries to use the GL_BGRA format, which saves us one RGBA to BGRA conversion. The GL_BGRA format is available on desktop GL, but is often not available on embedded graphics hardware. In addition to the conversion, which takes time, we also use the glTexImage2D function to upload the texture data, and that takes a deep copy of the bits, which again takes time.
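The conversion that GL_BGRA lets us skip is essentially a per-pixel swap of the red and blue channels. A minimal sketch, assuming tightly packed 4-byte pixels (`swapRedBlue()` is a hypothetical helper, not a Qt function):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Converts between RGBA and BGRA byte order in place by swapping the first
// and third byte of every 4-byte pixel. Trailing bytes that do not form a
// complete pixel are left untouched.
void swapRedBlue(std::vector<std::uint8_t> &pixels)
{
    for (std::size_t i = 0; i + 3 < pixels.size(); i += 4)
        std::swap(pixels[i], pixels[i + 2]);
}
```

It is cheap per pixel, but it touches every byte of every image once more before upload, which adds up on memory-constrained hardware.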

Faster pixel transfer

The scene graph adaptation makes it possible to customize how the default textures, used by the Image and BorderImage elements, are created and managed. This opens up for things like:

  • On Mac OS X, we can make use of the “GL_APPLE_client_storage” extension which tells the driver that OpenGL does not need to store a CPU-side copy of the pixel data. This effectively makes glTexImage2D a no-op and the copying of pixels from the CPU side to the GPU happens as an asynchronous DMA transfer. The only requirement is that the app (scene graph in this case) needs to retain the pixel bits until the frame is rendered. As the scene graph is already retained this solves itself. The scene graph actually had this implemented some time ago, but as I didn’t want to maintain a lot of stuff while the API was constantly changing, it got removed. I hope to bring it back at some point :)
  • On X11, we can make use of the GLX_EXT_texture_from_pixmap where available and feasible to directly map a QImage to an XPixmap and then map the XPixmap to a texture. On a shared memory architecture, this can (depending on the rest of the graphics stack) result in zero-copy textures. A potential hurdle here is that XPixmap bits need to be in linear form while GPUs tend to prefer a hardware specific non-linear layout of the pixels, so this might result in slower rendering times.
  • Use of hardware specific EGLImage based extensions to directly convert pixel bits into textures. This also has the benefit that the EGLImage (as it is thread unaware) can be prepared completely in QML’s image decoding thread. Mapping it to OpenGL later will then have zero impact on the rendering.
  • Pixel buffer objects can also be used to speed up the transfer where available.

Texture Atlas

Another thing the texture customization opens up for is the use of texture atlases. The QSGTexture class has some virtual functions which allow it to map to a sub-region of a texture rather than the whole texture, and the internal consumers of textures respect these sub-regions. The scene graph adaptation in the playground repo implements a texture atlas so that only one texture id can be used for all icons and image resources. If we combine this with the “overlap renderer” which can batch multiple geometries with identical material state together, it means that most Image and BorderImage elements in QML will point to the same texture and will therefore have the same material state.
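To show what mapping to a sub-region means in practice, here is a minimal sketch of an atlas that hands out normalized sub-rectangles, comparable to what QSGTexture::normalizedTextureSubRect() reports. The `Atlas` class and its simple row-by-row ("shelf") packing are assumptions for illustration; the playground implementation may pack differently.

```cpp
#include <algorithm>

// Sub-region in normalized texture coordinates (0..1).
struct SubRect { float x, y, w, h; };

// Hypothetical shelf packer: fills the atlas row by row, left to right.
class Atlas {
public:
    Atlas(int width, int height) : m_width(width), m_height(height) {}

    // Reserves a w x h pixel region. Returns false if it does not fit.
    bool allocate(int w, int h, SubRect *out)
    {
        int x = m_cursorX, y = m_cursorY, shelf = m_shelfHeight;
        if (x + w > m_width) {  // row is full, start a new one below it
            x = 0;
            y += shelf;
            shelf = 0;
        }
        if (w > m_width || y + h > m_height)
            return false;       // atlas is full; state is left unchanged

        out->x = float(x) / m_width;
        out->y = float(y) / m_height;
        out->w = float(w) / m_width;
        out->h = float(h) / m_height;

        m_cursorX = x + w;
        m_cursorY = y;
        m_shelfHeight = std::max(shelf, h);
        return true;
    }

private:
    int m_width, m_height;
    int m_cursorX = 0, m_cursorY = 0, m_shelfHeight = 0;
};
```

Every image allocated this way shares one texture id and differs only in its texture coordinates, which is what allows the renderer to treat them as having identical material state.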

Implementation of QML elements

The renderer can tweak and change the geometry it is given, but in some cases, more aggressive changes are needed for a certain hardware. For instance, when we wrote the scene graph, we started out with using vertex coloring for rectangle nodes. This had the benefit that we could represent both gradients, solid fills and the rectangle outline using the same material. However, on the N900 and the N9 (which we used at the time) the performance dropped significantly when we added a “varying lowp vec4” to the fragment shader. So we figured that for this hardware we would want to use textures for the color tables instead.

When looking at desktops and newer embedded graphics chips, vertex coloring adds no penalty and is the favorable approach, and also what we use in the code today, but the ability to adapt the implementation is there. Also, if we consider batching possibilities in the renderer, then using vertex coloring means we no longer store color information in the material and all rectangles, regardless of fill style or border can be batched together.

The adaptation also allows customization of glyph nodes, and currently has the option of choosing between distance fields based glyph rendering (supports sub pixel positioning, scaling and free transformation) and the traditional bitmap based glyph rendering (similar to what QPainter uses). This can then also be used to hook into system glyph caches, should these exist.

Animation Driver

The animation driver is an implementation of QAnimationDriver which hooks into the QAbstractAnimation based system in QtCore. The reason for doing this is to be able to more closely tie animations to the screen’s vertical blank. In Qt 4, the animation system is fully driven by a QTimer which by default ticks every 16 milliseconds. Since we know that desktop and mobile displays usually update at 60 Hz these days, this might sound ok, but as has been pointed out before, this is not really the case. The problem with timer based animations is that they will drift compared to the actual vertical blank and the result is either:

  • The animation advances faster than the screen updates leading to the animation occasionally running twice before a frame is presented. The visual result is that the animation jumps ahead in time, which is very unpleasant on the eyes.
  • The animation advances slower than the screen updates leading to the animation occasionally not running before a frame is presented. The visual result is that the animation stops for a frame, which again is very unpleasant on the eyes.
  • One might be extremely lucky and the two could line up perfectly, and if they did that is great. However, if you are constantly animating, you would need very high accuracy for a drift to not occur over time. In addition, the vertical blank delta tends to vary slightly over time depending on factors like temperature, so chances are that even if we get lucky, it will not last.
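The drift described in the list above is easy to reproduce with a small simulation. The sketch below counts, over a number of frames, how often a fixed-interval timer fires twice within one vsync interval (the animation jumps ahead) or not at all (the animation stalls). The function and struct names are made up for this illustration, and the timing values in the note below are just examples.

```cpp
struct DriftStats {
    int doubled;  // frames in which the timer fired more than once
    int skipped;  // frames in which the timer did not fire at all
};

// All times are in microseconds. The timer and the display both start at 0.
DriftStats countDrift(long long timerUs, long long vsyncUs, int frames)
{
    DriftStats stats{0, 0};
    long long nextTick = 0;
    for (int frame = 0; frame < frames; ++frame) {
        const long long frameEnd = (frame + 1) * vsyncUs;
        int ticks = 0;
        while (nextTick < frameEnd) {  // timer ticks landing in this frame
            ++ticks;
            nextTick += timerUs;
        }
        if (ticks == 0)
            ++stats.skipped;
        else if (ticks > 1)
            ++stats.doubled;
    }
    return stats;
}
```

With a 16 ms timer against a 16.667 ms vsync interval, `countDrift(16000, 16667, 60)` reports three doubled frames in one second; with a 17 ms timer, `countDrift(17000, 16667, 60)` reports one skipped frame instead.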

I try to illustrate:

Timer-driven animations

The image tries to illustrate how advancing animations based on timers alone will almost certainly result in non-smooth animation

The scene graph took an alternative approach to this by introducing the animation driver, which instead of using a timer, introduces an explicit QAnimationDriver::advance() which allows exact control over when the animation is advanced. The threaded renderer we currently use on Mac and EGLFS (and other plugins that specify BufferQueueing and ThreadedOpenGL as capabilities), uses the animation driver to tick exactly once, and only once, per frame. For a long time, I was very happy with this approach, but there is one problem still remaining…

Even though animations are advanced once per frame, they are still advanced based on the current clock time when the animation is run. This leads to very subtle errors, which are in many cases not visible, but if we keep in mind that QML loaders, event processing and timers are all fired on the same thread as the animations, it should be easy to see that the clock time can vary greatly from frame to frame. This can result in an object that should move 10 pixels per frame moving, for instance, 8, 12, 7 and 13 pixels over 4 frames. As the frames are still presented to screen at the fixed intervals of the vertical blank, this means that every time we present a new frame, the speed will seem different. Typically this happens in the case of flicking a ListView, where every time a new delegate is created on screen, that animation advance is delayed by a few milliseconds, causing the following frame to feel like it skips a bit, even though the rendering performance is perfect.

I try to illustrate:

Animations using predictive times vs clock times


So some time ago, we added a “time” argument to QAnimationDriver::advance(), allowing the driver to predict when the frame would be presented to screen and thus advance it accordingly. The result is that even though the animations are advanced at the wrong clock time, they are calculated for the time they get displayed, resulting in velvet motion.

A simple way to get evenly spaced animation steps would be to increment the time by a fixed delta on every frame, regardless of the clock, and Qt already implements this option. It is enabled by setting

QML_FIXED_ANIMATION_STEP=1

in the environment. However, the problem with this approach is that there are frames that take more than the vsync delta to render. This can be because it has loads of content to show, because it hooks in some GL underlay that renders a complex scene, because a large texture needed to be uploaded, a shader needed to be compiled or a number of other scenarios. Some applications manage to avoid this, but on the framework level, recovery in this situation needs to be handled in a graceful manner. So in the case of the frame rendering taking too much time, we need to adapt, otherwise we slow down the animation. For most applications on a desktop system, one would get away with skipping a frame and then continuing a little bit delayed, but if every frame takes long to render then animations will simply not work.

So the perfect solution is a hybrid. Something that advances with a fixed interval while at the same time keeping track of the exact time when frames get to screen, and adapts when the two are out of sync. This requires a very accurate vsync delta though, which is why it is not implemented in any of our standard plugins, and why this logic is pluggable via the adaptation layer, so that on a given hardware, you can get the right values and do the right thing. (The animation driver in the playground repo implements this based on QScreen::refreshRate()).
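One way such a hybrid could look is sketched below. This is not the animation driver from the playground repo; `PredictiveClock` is a hypothetical class, times are in microseconds, and the catch-up policy (skip whole vsync intervals when a frame took too long) is just one possible choice.

```cpp
#include <cstdint>

// Hypothetical hybrid animation clock: advances by a fixed vsync delta per
// frame, but jumps forward in whole vsync steps when it has fallen behind
// the wall clock, so animation time never lags behind what is presented.
class PredictiveClock {
public:
    explicit PredictiveClock(std::int64_t vsyncUs) : m_vsyncUs(vsyncUs) {}

    // wallClockUs: the time at which we start preparing the frame.
    // Returns the animation time to use, i.e. the predicted presentation time.
    std::int64_t advance(std::int64_t wallClockUs)
    {
        m_timeUs += m_vsyncUs;        // normal case: exactly one fixed step
        if (m_timeUs < wallClockUs) { // a slow frame made us miss vsyncs
            const std::int64_t behind = wallClockUs - m_timeUs;
            const std::int64_t missed =
                (behind + m_vsyncUs - 1) / m_vsyncUs;  // round up
            m_timeUs += missed * m_vsyncUs;
        }
        return m_timeUs;
    }

private:
    std::int64_t m_vsyncUs;
    std::int64_t m_timeUs = 0;
};
```

As long as frames arrive on time, the returned values are spaced exactly one vsync delta apart regardless of jitter in the wall clock. After a slow frame the animation skips ahead instead of slowing down.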

And last, despite all the “right” things that Qt may or may not do, this still requires support from the underlying system. Both the OpenGL driver and the windowing system may impose their own buffering schemes and delays which may turn our velvet into sandpaper. We’ve come to distinguish between:

  • Non blocking – This leads to low latency with tearing and uneven animation, but you can render as fast as possible and current time is as good as anything. In fact, since nothing is throttling your rendering, you probably want to drive the animation based on a timer as you would otherwise be spinning at 100% CPU (a problem Qt 5.0 has had on several setups over the last year). Qt 5.0 on Linux and Windows currently assumes this mode of rendering as it is the most common default setting from the driver side.
  • Double buffered w/blocking swap – This leads to fairly good results and for a long time I believed this was the holy grail for driving animations. Event processing typically happens just after we return from the swap and as long as we advance animations once per frame, they end up being advanced with current clock time with an almost fixed delta, which is usually good enough. However, because you need to fully prepare one buffer and present it before you can start the next, you have only one vsync interval to run animations, issue GL calls AND let the chip render the frame. The threaded render loop makes it possible to at least do animations while the chip is rendering (CPU blocked inside swapBuffers), but it is still cutting it a bit short.
  • 3 or more buffers w/blocking – Combined with predictive animation delta and adaptive catch-up for slow frames, this gives perfect results. This has the added benefit that if the rendering is faster than the vsync delta, we can queue up prepared frames. Having a queue of prepared frames means we are much more tolerant towards single frames being slow and we can usually survive a couple of frames that take long to render, as long as the average rendering time is less than the vsync delta. The downside of this approach is that it increases touch latency.

So, we did not manage to come up with a perfect catch-all solution, but the scene graph does offer hooks to make sure that a UI stack on a given platform can make the best possible call and implement the solution that works best there.

Render Loop

The implementation inside the library contains two different render loops, one called QQuickRenderThreadSingleContextWindowManager and another one called QQuickTrivialWindowManager. These rather long and strange names have grown out of the need to support multiple windows using the QtQuick.Window module, and they were named window managers for that reason, but what they really are is render loops. They control when and how the scene graph does its rendering, how the OpenGL context is managed and when animations should run.

The QQuickRenderThreadSingleContextWindowManager (what a mouthful) advances animations on the GUI thread while all rendering and other OpenGL related activities happen on the rendering thread. The QQuickTrivialWindowManager does everything on the GUI thread as we did face a number of problems with using a dedicated render thread, particularly on X11. Via the adaptation layer, it is possible to completely rewrite the render loop to fit a given system.

One problem that QML has not solved is that all animations must happen on the GUI thread. The scene graph has no problems updating itself in another thread, for instance using the QSGNode::preprocess() function, but QObject based bindings need to have sane threading behavior, so these need to happen on the GUI thread. As a result, the threaded render loop is at the mercy of the GUI thread and its ability to stay responsive. Guaranteeing execution on the GUI thread every couple of milliseconds is hard to do, so jerky animations are still very much possible. The traditional approach to this has been to promote the use of threading from the GUI thread to offload heavy operations. But as soon as an application reaches a certain complexity, having all things forked off to other threads, including semi-complex JavaScript, becomes hard and at times unbearable. Having some enablers available that allow certain elements to animate despite what goes on in the application’s main thread is therefore very much needed.

To remedy this, I started playing with the idea that the GUI thread would rather be a slave of the render thread and that some animations could run in the render loop. The render loop in the playground repo implements the enablers for this, opening up for a potential animation system to run there regardless of how the GUI thread is running.

Conclusion

There are a lot of ideas here and a lot of work still to be done, and much of this does not “just work” as Qt normally does. Partially this is because we have been very focused on the embedded side of things recently, but also because graphics is hard and making the most of some hardware requires tweaks on many levels. The good news is that this API makes it at least possible to harness some of the ability on the lower levels when they are available, and it is all transparent to the application programmer writing QML and C++ using the public APIs.

Thank you for reading!


Posted in Performance, Qt Quick 2.0

QML Scene Graph in Master

Published Tuesday May 31st, 2011 | by

Earlier this week, we integrated the QML Scene Graph into qtdeclarative-staging.git#master. Grab the modularized Qt 5 repositories and start hacking!

The primary way to make use of it is by running qmlscene or by instantiating a QSGView and feeding it your own .qml file. The import name has been upgraded to QtQuick 2.0, so upgrade your .qml. If you do not upgrade your .qml files, the qmlscene binary will try to load all QtQuick 1.0 files as 2.0. QDeclarativeItem based plugins will not be loaded.

For a quick tour of what the QML Scene Graph is all about, we’ve compiled this video:

The source code for the video is available here. It uses the QML presentation system.

I’ll answer the question of why we are not using an existing system, to preempt the comment: We wanted something small and lightweight to solve the use cases for QML. The scene graph core is less than 10K lines of code (including class documentation) and tailored to our use case. Everything else is integration with QML and Qt, code we would have had to write for any system. We could not have achieved something so lean for QML based on existing technologies.

Disclaimer: Do not take this blog post as the final documentation. I’m explaining the current state of things. They may change in the time to come.

Primary Features

The scene graph is not so much about offering new features as it is about changing the core infrastructure of our graphics stack to ensure that Qt and QML work their best. Some of my colleagues and I did a series of posts outlining our existing graphics stack and some of the issues with it. In the initial scene graph blog I explained how we intend to address these issues.

The QML team is working on additions to QML, like the new particle system, but I’ll let them comment on that when they feel they are ready.

  • Kim already talked about the ShaderEffectItem in his post The Convenient Power of QML Scene Graph. The main idea of the shader effect item is to open up the floodgates and let creativity run loose.
  • We’re using a new method of text drawing. The default is now based on distance fields, which gives us scalable glyphs with just a single texture. This technique supports floating point positioning and when GPU processing power allows, we can also use it to do sub-pixel anti-aliasing. This effectively makes Text elements in QML faster, nicer and more flexible than before. We also have a different font rendering mechanism in place that is similar to native font rendering (how we draw glyphs with QPainter today), but that is not enabled at the moment.
  • We have changed some of the internals of Qt’s animation framework to be able to drive animations based on the vertical blank signal. I talked about the concept in my post Velvet and the QML Scene Graph.
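
For readers who have not met distance fields before, the sketch below shows the general idea behind the text rendering mentioned in the list above, written as plain C++ rather than as a shader. It assumes the common convention that the texture stores values in [0, 1] with 0.5 lying on the glyph outline. The names glyphCoverage and softness are made up, and the actual scene graph shader may well differ.

```cpp
#include <algorithm>
#include <cassert>

// Classic Hermite smoothstep: 0 below edge0, 1 above edge1, smooth in between.
double smoothstep(double edge0, double edge1, double x)
{
    assert(edge0 < edge1);
    const double t = std::max(0.0, std::min(1.0, (x - edge0) / (edge1 - edge0)));
    return t * t * (3.0 - 2.0 * t);
}

// distance: value sampled from the distance field texture (0.5 = on the outline).
// softness: half-width of the anti-aliased edge; it would shrink as the glyph
// is scaled up so that the edge stays roughly one pixel wide on screen.
double glyphCoverage(double distance, double softness)
{
    return smoothstep(0.5 - softness, 0.5 + softness, distance);
}
```

Because the outline is reconstructed from interpolated distances rather than from stored pixels, the same texture can be sampled at any scale or fractional position, which is what makes a single texture per glyph sufficient.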

Public API and Back-end API

We have split the API into two different types: the public API, which is what we expect application developers to use, and the back-end API, which we expect system integrators to use. The public API contains all the needed classes to render graphics in QML: to introduce new primitives and custom shading, plus some convenience on top of the low-level API. All files that are visible in the generated documentation are to be considered public API. I wish I could point to the public docs, but we don’t have automatic documentation generation for the modularized repositories yet, so that will have to come later.

The back-end API includes things like the renderer and texture implementations. The idea is that we can optimize certain parts of the system on a per-hardware basis where needed. The back-end API is in private headers. Some of it might move into the public API over time, but we are not comfortable with locking down this API just yet.

Rendering Model

The scene graph is fundamentally a tree of predefined nodes. Rendering is done by populating the tree with geometry nodes. The geometry node consists of a geometry which defines the vertices or mesh to be rendered and a material which defines what to do with that geometry. The material is essentially a QGLShaderProgram with some added logic.

When it’s time to draw the nodes, a renderer will take the tree and render it. The renderer is free to reorganize the tree as it sees fit to improve performance, as long as the visual rendering looks the same. The default renderer will separate opaque geometry from translucent geometry. The opaque geometry is rendered first, ordered by its material to minimize state changes. Opaque geometry is rendered to both the color buffer and the depth buffer. The depth of items is decided by their original ordering in the graph. When drawing the translucent geometry, depth testing is enabled, so if opaque geometry is covering transparent geometry, the GPU will not be doing any work for the transparent pixels. The default renderer also has a switch for rendering the opaque geometry strictly front-to-back (right now enabled by passing --opaque-front-to-back to qmlscene). Custom renderers that use different algorithms for rendering the tree can be implemented through the back-end API.
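
The ordering described above can be sketched in a few lines of standard C++. This is an illustration only and not the renderer's code: GeometryNode and buildDrawList are hypothetical names, the material is reduced to an integer id, and keeping translucent nodes in their original tree order is an assumption made here to preserve the visual result.

```cpp
#include <algorithm>
#include <vector>

struct GeometryNode {
    int order;      // position in the original tree traversal; also decides depth
    int materialId; // stand-in for the node's material / shader program
    bool opaque;
};

// Returns the nodes in the order a renderer of this kind might draw them.
std::vector<GeometryNode> buildDrawList(std::vector<GeometryNode> nodes)
{
    // Opaque nodes first, translucent nodes after; both groups keep tree order.
    auto firstTranslucent = std::stable_partition(
        nodes.begin(), nodes.end(),
        [](const GeometryNode& n) { return n.opaque; });

    // Group opaque nodes by material to minimize state changes. The depth
    // buffer, not the draw order, resolves which opaque pixel ends up visible.
    std::stable_sort(nodes.begin(), firstTranslucent,
                     [](const GeometryNode& a, const GeometryNode& b) {
                         return a.materialId < b.materialId;
                     });
    return nodes;
}
```

Reordering the opaque nodes is safe because each one writes a depth derived from its original position, so the final image does not depend on the order in which they are submitted.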

The scene graph operates on a single OpenGL context, but does not create or own it. The context can come from anywhere, and in the case of QSGView, the context comes from the QGLWidget base-class. With the graphics stack inversion which is expected to land in master later this summer, the OpenGL context will come from the window itself.

Threading Model

The QML Scene Graph is thread agnostic; it can run on any thread that has an OpenGL context bound. However, once the scene graph is set up in that context / thread, it cannot be moved. Initially we wanted to run QML animations and all the OpenGL calls in a dedicated rendering thread. Because of how QML works, this turned out to not be possible. QML animations are bound to QML properties which are QObjects which will trigger C++ code. If we were to process animations in the rendering thread and these would call into C++ we would have a synchronization mess, so instead we came up with a different approach.

The OpenGL context and all scene graph related code runs on the rendering thread, unless explicitly stated otherwise in the documentation. Animations are run in the GUI thread, but driven by an event sent from the rendering thread. Before a frame rendering starts, there is a short period where we block the GUI thread to copy the QML tree and changes in it into the scene graph. The scene graph thus represents a snapshot of the QML tree at that point in time. In terms of user API, this happens during QSGItem::updatePaintNode(). I tried to visualize this in a diagram.

The benefit of this model is that during animations, we can for the upcoming frame calculate the animations and evaluate the JavaScript related to bindings on the GUI thread while we are rendering the current frame on the render thread. Advancing the animations and evaluating the JavaScript typically takes quite a bit longer than doing the rendering, so that continues to happen while the render thread blocks for the next vsync signal. So even for a single-core CPU, there is a benefit in that animations are advanced while the render thread is idly waiting for the vsync signal, typically via swapBuffers().
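
The handshake between the two threads can be sketched with nothing but the C++ standard library. This is a heavily simplified illustration and not the scene graph implementation: the "QML tree" is a single integer, "rendering" just records the snapshot, and the names SyncPoint and runFrames are invented for the example.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct SyncPoint {
    std::mutex mutex;
    std::condition_variable cv;
    bool syncRequested = false;
    bool quit = false;
    int guiValue = 0; // stand-in for the QML tree, owned by the GUI thread
};

// Runs frameCount frames and returns the values the render thread "drew".
std::vector<int> runFrames(int frameCount)
{
    SyncPoint s;
    std::vector<int> rendered;

    std::thread renderThread([&] {
        for (;;) {
            int snapshot = 0;
            {
                std::unique_lock<std::mutex> lock(s.mutex);
                s.cv.wait(lock, [&] { return s.syncRequested || s.quit; });
                if (!s.syncRequested)
                    return;              // quit requested, nothing left to sync
                snapshot = s.guiValue;   // sync phase: the GUI thread is blocked
                s.syncRequested = false;
            }
            s.cv.notify_all();             // let the GUI thread continue
            rendered.push_back(snapshot);  // "render" while the GUI thread animates
        }
    });

    for (int frame = 1; frame <= frameCount; ++frame) {
        s.guiValue = frame;              // advance animations on the GUI thread
        std::unique_lock<std::mutex> lock(s.mutex);
        s.syncRequested = true;
        s.cv.notify_all();
        s.cv.wait(lock, [&] { return !s.syncRequested; }); // blocked during sync
    }
    {
        std::lock_guard<std::mutex> lock(s.mutex);
        s.quit = true;
    }
    s.cv.notify_all();
    renderThread.join();
    return rendered;
}
```

The GUI thread only blocks for the short copy inside the lock. Everything after the copy runs on the render thread while the GUI thread is already preparing the next frame, which is the overlap the paragraph above describes.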

Integration with QPainter

The QML Scene Graph does not use QPainter itself, but there are different ways to integrate it. The most obvious way is to make use of the QSGPaintedItem class. This class will open a painter on a QImage or an FBO depending on what the user requests and has a virtual paint() function which will be called when the user has requested an update() on the item. This is the primary porting class when changing a QDeclarativeItem to work with the scene graph. The two classes are API equivalent, but their internal workings are somewhat different. The paint() function is by default called during the “sync” phase when the GUI thread is blocked to avoid threading issues, but it can be toggled to also run on the rendering thread, decoupled from the GUI thread.

Another option is to manually render to an FBO or a QImage and add the result to a QSGSimpleTextureNode or similar.

Integration with OpenGL

We have three primary ways of integrating with OpenGL.

  • The QSGEngine::beforeRendering() signal is emitted on the rendering thread after the rendering context is bound. This signal can be used to execute GL code in the background of the scene graph with the QML UI rendered on top. Naturally, we need to not clear the background when rendering the scene graph when using this mode, and there are properties in QSGEngine that help with that. A typical use case here would be to have a 3D game engine render in the background and have a QML UI on top.
  • The QSGEngine::afterRendering() signal is emitted on the rendering thread after the scene graph has completed its rendering, but before swapping happens. This can be used to render for instance 3D content on top of a QML UI.
  • Render to an FBO and compose the texture inside the scene graph. This is the preferred way to embed content inside QML that should conform to the QML states like opacity, clipping and transformation. Below is an example of Ogre3D embedded into the QML Scene Graph using an offscreen texture. An easy way to do this is to subclass the QSGPaintedItem, use FBO based rendering and do QPainter::beginNativePainting() in the paint() function.
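
The order in which the first two hooks fire relative to the scene graph's own rendering can be illustrated without Qt. The sketch below uses std::function in place of signals and a list of strings in place of GL calls. The names Hook and renderFrame are hypothetical and only mirror the sequence described in the list above.

```cpp
#include <functional>
#include <string>
#include <vector>

using Hook = std::function<void(std::vector<std::string>&)>;

// Simulates one frame and returns the steps in the order they happened.
std::vector<std::string> renderFrame(const Hook& beforeRendering,
                                     const Hook& afterRendering)
{
    std::vector<std::string> steps;
    steps.push_back("bind context");
    if (beforeRendering)
        beforeRendering(steps);      // e.g. a game engine drawing the background
    steps.push_back("render scene graph");
    if (afterRendering)
        afterRendering(steps);       // e.g. 3D content on top of the QML UI
    steps.push_back("swap buffers");
    return steps;
}
```

Anything drawn in the first hook ends up underneath the QML UI, and anything drawn in the second ends up on top of it, in both cases before the frame is swapped to the screen.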

 

Debug Modes

Right now we offer a few environment variables to help track down runaway animations.

  • QML_TRANSLUCENT_MODE=1 Setting this environment variable makes all geometry nodes render with an opacity of 0.5. Some materials may choose to completely ignore opacity, in which case this variable has no effect for it, but these should be few. This is helpful if you have expensive QML elements which are completely obscured by something else.
  • QML_FLASH_MODE=1 Setting this environment variable is similar to the QT_FLUSH_UPDATE we have in Qt. Any QML element that has any kind of graphical update happening to it will get a flash rectangle on top of it for one frame. This is helpful in tracking down runaway animations.

Together these two can for instance be used to track down runaway animations behind the current view.

Where to find us?

When you start using the new stuff, you might find issues to report, features that are missing or suggestions for improvements. The relevant places to contact us are:

  • Bugs and features: http://bugreports.qt.nokia.com. For scene graph related topics, meaning the rendering APIs, there is a “SceneGraph” component.
  • IRC: #qt-graphics on freenode.net is where most of the graphics people are.
  • Mail: There is the qt5-feedback mailing list which was set up a few weeks back.
  • We will also have a discussion on QML Scene Graph during the Qt Contributor Summit.

Some Numbers

We thought it would be nice to share a few numbers on where we are at right now. Below are the numbers from running the photoviewer demo under demos/declarative with both QML 1 using Raster, OpenGL and Mesa software rendering using LLVMpipe and QML 2 using OpenGL and Mesa/LLVMpipe. It was run on an Intel Sandy Bridge i7-2600K using the on-die Intel HD Graphics 3000 GPU, Linux, Qt 5 HEAD using the XCB back-end (equivalent to X11 in 4.8 for Raster and the OpenGL paint engines).

As you can see, the QML Scene Graph gives an overall 2.5x speed up of an arbitrary QML example compared to the graphics stack we have in QML 1. The other interesting part is that LLVMpipe is in the same range as our software raster engine; in fact it is a little bit faster. This is not too surprising given that they are essentially doing exactly the same thing. With QML 2, the multi-threaded LLVMpipe version is in fact faster than the OpenGL based QML 1. I hope this helps to reduce some of the concerns that have been raised towards Qt 5’s dependency on OpenGL.


Posted in OpenGL, Painting, Performance, Qt, Qt Quick

A QML Presentation System

Published Monday May 30th, 2011 | by

When I was preparing for Qt Developer Days last year, I started out with an unnamed tool to create my presentation and was annoyed with some of its shortcomings. At the time, I decided to do my slides in QML instead, partially to learn it a bit better and partially because I thought it would be kinda cool. I have since then simplified it a bit and by now I have something that I personally find useful, so maybe someone else will too. It’s all QML and JavaScript, so no compilation required.

Repository is here: https://qt.gitorious.org/qt-labs/qml-presentation-system
All .qml files use import QtQuick 2.0, so you need a fresh build of Qt 5 with the “qtquick2” branch of qtdeclarative-staging.

And here is a video walking through the tutorial and examples in the repository.


Posted in Qt Quick


Velvet and the QML Scene Graph

Published Thursday December 2nd, 2010 | by

First of all, let me start with a bit of clarification. I was at DevDays this year and met a lot of people and I came to understand that we had done a poor job naming the scene graph project. Because of name similarity, it gives the indication that it is similar to projects like Open Scene Graph, which is not the case and was never the intention. Our scene graph is a compact and small 2D scene graph for rendering QML files. So, from now on, we will refer to it as the QML Scene Graph. Its sole purpose for existence is to make QML better.

Moving on…

Animations should feel like velvet – silky smooth and pleasing. From the technical perspective this requires a few things:

  • Draw one frame for every frame the display can draw. Modern displays, such as the LCD or LED screens you are using to read this, are almost always clocked at 60Hz. It depends on the resolution (dpi) of the display of course, but some magic happens around 60Hz. If you update 2D graphics at 30Hz, you can really see every single frame as individual frames. As you get closer to 60Hz, the frames start to blend together and the eyes perceive fluid motion rather than frames and that is the key difference between velvet and sand paper.
  • Be done on time. To reach 60 Hz, you need to be done in no more than 16.66 ms. If you ever shoot above, you missed your mark. That means that while doing animations, you cannot do anything other than update a few properties and then draw the stuff. If it takes time, it either needs to be ready beforehand, or you do the work in a background thread as described in my previous post. Until recently, I was convinced that missing the mark every now and again would not be disastrous, but it really makes the difference between velvet and sand paper.
  • Do not draw in the middle of vertical refresh. That leads to Screen Tearing, basically chopping your nice frame in two and ruining what should be a moment of visual pleasure. The solution is to use some form of synchronization that ties you to the screen’s vertical refresh rate. Out of the box, Qt does not always help you here. On Mac OS X and on the Symbian N8, we are locked to vertical refresh and if you try to draw more often than that, the windowing system will block your rendering thread. On Linux / X11, Maemo and Windows, we are not locked and you can typically see tearing. Fortunately, there is a pretty simple solution. QGLWidget combined with QGLFormat::setSwapInterval set to 1, will enable the QGLWidget to be synchronized to vertical refresh. In the QML Scene Graph, we require OpenGL and we set the swap interval to 1 by default.
  • Advance the animation relative to the time between frames. If your object is moving at 1 pixel per millisecond, then it needs to move 16.66 pixels per frame. If you missed a frame, then it needs to move 33.33 pixels the next one. If you always hit your 16.66 ms mark, then this problem goes away, of course.
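
The last point in the list is plain arithmetic, and a tiny sketch makes it explicit. The helper names frameBudgetMs and advancePosition are made up for this example.

```cpp
#include <cassert>

// Time available per frame at a given refresh rate, in milliseconds.
double frameBudgetMs(double refreshHz)
{
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz;
}

// Advances a position by velocity * elapsed time, so the distance covered
// depends on how much time really passed, not on how many frames were drawn.
double advancePosition(double position, double pixelsPerMs, double elapsedMs)
{
    return position + pixelsPerMs * elapsedMs;
}
```

At 60 Hz and 1 pixel per millisecond, one frame moves the object by 1000 / 60 pixels. If a frame was missed, the elapsed time doubles and so does the step, which keeps the speed constant but makes the jump visible.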

Yet with all these things in place, Qt cannot yet guarantee me velvet…

The missing ingredient is where the animation “tick” comes from. Qt’s animation framework uses a timer, which advances the animations and updates properties in all the objects. This sends off several update requests and eventually the frame gets repainted. There are two faults in this setup. Firstly the timer is started at 1000 / 60, which rounds down to 16, so it’s not 16.66 as it should have been. In addition, the timer is not accurate. It can fire at 15 or 17, and if it fires at 17, then the animation misses a frame.
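
The rounding problem can be checked with integer arithmetic. The sketch below assumes an idealized timer that fires exactly every 1000 / 60 = 16 ms (truncated) and counts, over one second, how many 60 Hz display frames receive two animation ticks instead of one. The function names are hypothetical, and a real timer with jitter would behave worse than this.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Integer division truncates: 1000 / 60 gives 16, not 16.66.
int timerIntervalMs(int refreshHz)
{
    assert(refreshHz > 0 && refreshHz <= 1000);
    return 1000 / refreshHz;
}

// Counts display frames within one second that contain two or more timer ticks.
int framesWithTwoTicks(int refreshHz)
{
    const int interval = timerIntervalMs(refreshHz);
    std::vector<int> ticksInFrame(static_cast<std::size_t>(refreshHz), 0);
    for (int t = 0; t < 1000; t += interval) {
        const int frame = t * refreshHz / 1000; // index of the frame containing t
        ++ticksInFrame[static_cast<std::size_t>(frame)];
    }
    return static_cast<int>(std::count_if(ticksInFrame.begin(), ticksInFrame.end(),
                                          [](int ticks) { return ticks >= 2; }));
}
```

Even with a perfectly accurate 16 ms timer, the animation is advanced twice within a single displayed frame a few times per second, so the motion shown on screen is not evenly spaced.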

This had been bothering me for a while, so while in San Francisco for DevDays, I had some “time off” to start digging. The result is that in the QML Scene Graph we drive the animations a bit differently. We do something along the lines of:

while (animationIsRunning) {
    processEvents();
    advanceAnimations();
    paintQMLScene();
    swapAndBlockForNextVSync();
}

The result is that we are always progressing the animation in sync with the vertical refresh, so once every 16.66 ms, and exactly once per frame. I said that I was initially not convinced that missing the occasional frame was that bad, but it took me ONE look at the result and I realized we finally had it. Velvet!

We cannot do this generally in Qt because the method we have for vertical synchronization is only through OpenGL’s swapBuffers(), so we can only tie it to one window. With Wayland or through custom OpenGL extensions, we can potentially get the vertical synchronization without going through swap, which means we could in theory advance animations across multiple windows, but that is out of scope for me right now. For now, it is fixed for that single window running the QML Scene Graph.

The repository is here: http://qt.gitorious.org/qt-labs/scene-graph


Posted in OpenGL, Painting, Performance

Qt Scene Graph – Round 2

Published Friday October 8th, 2010 | by

Earlier this year, I wrote a post introducing the qt scene graph project. That project has since then gone dormant, but we’ve been playing around with a fork internally for a while now. The idea is to take the basic structure we had in the scene graph and place QML on top of it.

The repository is available here:
http://qt.gitorious.org/qt-labs/scene-graph

To compile it you need a non-make-installed source build of Qt Master, then download the repository and simply do:
qmake -r
make

And you should have a QtSceneGraph library in lib and a qmlscene binary in bin. Most plain .qml files run out of the box with qmlscene. I say “most” because we forked QML back in July and are missing a lot of bug-fixes in QML since then, and some aspects of focus handling and mouse/keyboard handling are simply missing. The native plugin API is also something that is still being worked on.

As you will understand, it’s far from complete yet, but it gives you an idea of what we are researching to see how Qt and QML can better take advantage of graphics hardware. We hope you find it interesting, enjoy!

(update 2010-10-11: URL of project on Gitorious changed.)


Posted in Graphics, OpenGL, Performance, Qt Quick
