New Scene Graph Renderer

Published Monday September 2nd, 2013 | by

Few phrases are more misused in software, whether whispered in social channels, promised over the negotiating table or shouted out loud in blogs, than the words “The next release will make everything better!”

So… I won’t say that.

But…

Qt 5.2 introduces a new renderer to the Qt Quick Scene Graph.

When we set out to do the scene graph some three years ago, one of my visions was that we would be able to really take advantage of OpenGL and have game-like performance. The renderer we have in Qt 5.0 went a long way towards that goal. For opaque content, we sort based on state and render content front-to-back to minimize GPU overdraw. In the playground/scenegraph repository, we have a different renderer which batches similar primitives together to reduce the number of draw calls (which implicitly also reduces state changes).

The new renderer in Qt 5.2 combines both of these techniques and also tries to identify non-changing parts of the scene so they can be retained on the GPU. Qt 5.2 also adds the texture atlas support which was previously located in the playground/scenegraph’s customcontext. This greatly helps batching of textured content. I think that with Qt 5.2, we are now pretty close to that goal.
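As a rough illustration of how sorting by state and batching cut down draw calls, here is a minimal standalone sketch. It is not the renderer's actual code: `Primitive`, `Batch`, `buildBatches` and the integer `stateId` key are hypothetical names, and it assumes every primitive is opaque so that reordering is safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an opaque scene graph primitive.
struct Primitive {
    int stateId;  // hypothetical key for "same material and texture"
    float depth;  // smaller means closer to the viewer
};

// One batch corresponds to one draw call.
struct Batch {
    int stateId;
    std::vector<Primitive> items;
};

// Sort opaque primitives by state, then front-to-back, and merge runs that
// share a state into a single batch.
std::vector<Batch> buildBatches(std::vector<Primitive> prims)
{
    std::sort(prims.begin(), prims.end(),
              [](const Primitive &a, const Primitive &b) {
                  if (a.stateId != b.stateId)
                      return a.stateId < b.stateId;
                  return a.depth < b.depth;
              });

    std::vector<Batch> batches;
    for (const Primitive &p : prims) {
        if (batches.empty() || batches.back().stateId != p.stateId)
            batches.push_back(Batch{p.stateId, {}});
        batches.back().items.push_back(p);
    }

    // Batching can never need more draw calls than one per primitive.
    assert(batches.size() <= prims.size());
    return batches;
}
```

Five primitives that use only two distinct states come out as two batches, which is the kind of reduction a texture atlas makes possible for textured content: more items end up sharing one state.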

A new doc article explaining the renderer in more detail has been added for the more curious, though a deep understanding of the renderer is not needed to write well-performing applications. However, I suspect many people will still find it interesting.

http://doc-snapshot.qt-project.org/qt5-stable/qtquick-visualcanvas-scenegraph-renderer.html

 

There are still a lot of other ideas that could be employed for those who want to have a stab at it. If you have ideas, ping “sletta” on IRC or clone and start coding.

Now some numbers:

Three of the benchmarks are available here: https://github.com/qtproject/playground-scenegraph/tree/master/benchmarks

  • Extreme Table contains a large static table with some animated content on top. It shows the benefit of the GPU retention.
  • List Bench shows a number of simultaneously scrolling lists with an icon, alternating background color, and two texts per cell.
  • Flying Icons contains over 3000 images which are being animated with a unique animation per item.

I also included the front-page of the Qt Quick Controls gallery example and Samegame. Samegame is played in “Zen” mode while clicking to the rhythm of “Where Eagles Dare” by Iron Maiden. (I know… very scientific)

The number of OpenGL draw calls and the amount of traffic are measured using apitrace, an awesome tool if you’re debugging OpenGL. As can be seen by the green numbers, we’ve managed to cut down the number of glXxx calls quite a bit, especially for the cases where we have plenty of re-occurrence, meaning lists, tables and grids.

The amount of traffic per frame is also significantly improved. The best examples are the “ExtremeTable” and “ListBench” which are sampled at a frame where no new delegates were being added or removed. I’m quite happy that the “ListBench” comes out as a zero-transfer frame. There is of course some traffic, the draw calls themselves and a couple of uniforms; but no vertex data and no texture data, which is what the tool measures. “FlyingIcons” changes the entire scene every frame so nothing can be retained, so minimal difference is expected. Controls Gallery is mostly static, but has an animated progress bar which needs a new texture every frame. This is the majority of the 20+kb transfer. Samegame comes out pretty high, primarily because of its extensive use of particle systems. A lesson learned is that if you are on a tiny memory bus, limit your particles.
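Why an unchanged frame can come out as zero-transfer is easy to model. The sketch below is only an assumption-laden illustration, not the renderer's implementation: `RetainedBatch` is a hypothetical class, no OpenGL is involved, and uploads are merely counted in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a batch whose vertex data is retained on the GPU.
class RetainedBatch {
public:
    // Replace the vertex data and mark the batch as needing a re-upload.
    void setVertices(const std::vector<float> &vertices)
    {
        m_vertices = vertices;
        m_dirty = true;
    }

    // Called once per frame. Returns the number of bytes that would be
    // transferred to the GPU for this frame.
    std::size_t prepareFrame()
    {
        if (!m_dirty)
            return 0; // data already lives on the GPU, nothing to send
        m_dirty = false;
        return m_vertices.size() * sizeof(float);
    }

private:
    std::vector<float> m_vertices;
    bool m_dirty = false;
};
```

A static table pays for its vertices once and then reports zero bytes on every following frame, while a scene that calls `setVertices()` each frame, as “FlyingIcons” effectively does, pays the full amount every time.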

These are theoretical numbers, but they give a good idea that the new renderer is on the right track. Let us look at numbers from running on hardware. I’ve created tables out of the cases where the new renderer had the most impact. The full table of numbers is at the very bottom.

Note, these benchmarks are run without the new V4-based QML engine. This is because the V4 engine also affects the results and I wanted to focus on rendering only.

The rendering time is measured with vsync enabled, but excluding the swap. So when we see that the MacBook spends 16ms rendering, it is actually being throttled while we are issuing OpenGL commands. When I looked at this in Mac OS X’s Instruments, I saw that the driver was spending a lot of time in synchronous waiting, meaning the pipeline was stalling. With the Qt 5.2 rendering, the synchronous waiting while drawing is gone. This is good news, as lists and list-like content are quite common in UIs.

On both the MacBook with the integrated chip and the Nexus, the new renderer drastically reduces the time spent issuing OpenGL commands. It should also be said that the only reason the render times for the “ExtremeTable” did not go all the way to 0 is https://bugreports.qt-project.org/browse/QTBUG-32997

The high CPU load of “FlyingIcons” is primarily due to it running 3000+ animations in parallel, but as we can see from the time spent in the renderer, there is still a significant improvement.

Here are the rest of the numbers:

So this matches up pretty well with the theoretical results at the beginning. For reoccurring content, such as lists and grids, Qt 5.2 is quite an improvement over Qt 5.1. For those use cases we didn’t radically improve, we at least didn’t make them any worse.

Enjoy!


Posted in Graphics, OpenGL, Qt Quick 2.0

40 comments to New Scene Graph Renderer

Lilian says:

Wow, this is really great news.
Thank you guys for your work!

Always very happy to hear about performance improvements.

Andreykon says:

Gunnar, is it possible to use perspective projection in scene renderer? What should I subclass? It has to be possible somehow because qt3d does it!

Gunnar Sletta says:

The scene graph renderer is intended to render 2D content and this is what it optimizes for. However, it is possible to create subtrees that render 3D content with perspective projection. It is exactly the same as how you would do 3D in a plain OpenGL context and you need to take into account that the renderer will use the depth buffer for its own purposes, but it is possible.

What you do is implement your custom item with a QQuickItem::updatePaintNode implementation that creates a subtree. At the top of the subtree is a QSGTransformNode which sets up the projective transform, and beneath it you populate with one or more QSGGeometryNodes which have 3D geometry. I really should create an example on how to do this at some point :)
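To make the "projective transform" part concrete, here is a small standalone sketch of a standard OpenGL-style perspective matrix. It deliberately avoids Qt so it can be compiled on its own; `Mat4`, `Vec4`, `perspective` and `transform` are hypothetical helper names. In an actual item, the equivalent matrix would presumably be built as a QMatrix4x4 and handed to QSGTransformNode::setMatrix().

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Column-major 4x4 matrix, the layout OpenGL expects.
using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

// Standard OpenGL-style perspective projection (same form as gluPerspective).
Mat4 perspective(float verticalFovDegrees, float aspect, float nearPlane, float farPlane)
{
    assert(nearPlane > 0.0f && farPlane > nearPlane && aspect > 0.0f);
    const float pi = 3.14159265358979f;
    const float f = 1.0f / std::tan(verticalFovDegrees * pi / 360.0f);

    Mat4 m{}; // all zeros
    m[0] = f / aspect;
    m[5] = f;
    m[10] = (farPlane + nearPlane) / (nearPlane - farPlane);
    m[11] = -1.0f; // copies -z into w for the perspective divide
    m[14] = (2.0f * farPlane * nearPlane) / (nearPlane - farPlane);
    return m;
}

// Multiply a column-major matrix with a column vector.
Vec4 transform(const Mat4 &m, const Vec4 &v)
{
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}
```

After dividing by `w`, a point on the near plane lands at depth -1 and a point on the far plane at depth +1. This is the range that then has to coexist with the renderer's own use of the depth buffer.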

Qt3D in its current form renders into an FBO with its own depth buffer. The next generation of Qt3D will support multiple render passes to be able to do shadows, depth blurring and other nifty things so it does not use the scene graph directly. For some cases, it might be able to avoid the FBO by using beforeRendering, but that can only happen when the Qt3D scene is not subject to, for instance, clipping and inherited transparency.

I added a section about mixing with 3D to the renderer docs: http://doc-snapshot.qt-project.org/qt5-dev/qtquick/qtquick-visualcanvas-scenegraph-renderer.html#mixing-with-3d-primitives

Andreykon says:

Thank you for the answer.
But I have a feeling that on top of my QSGTransformNode there is some higher tree-level projection matrix and my matrix will be multiplied by that matrix anyway. Am I wrong?

Andreykon says:

I tried to implement what you’ve said, but from what I can see something is wrong with the z value. You mention in the documentation that something unusual happens to the z value, but I can’t understand how I should handle it.

Gunnar Sletta says:

There is a transform node above your nodes reflecting the modelview transform of the item and there is a projection matrix which will convert the modelview to screen space and adjust the z values a bit.

Each item will have its z-values compressed into a band. One band for each item so two 3D items will not be able to interleave unless you place them all in the same subtree under your own projection matrix.
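The banding idea can be pictured with a tiny standalone sketch. The exact mapping the renderer applies is not spelled out here, so the equal-width bands and the function name `compressToBand` are assumptions made purely for illustration.

```cpp
#include <cassert>

// Map a local depth value in [0, 1] into the slice of the depth range
// reserved for item `itemIndex` out of `itemCount` items.
// Equal-width bands are an assumption made for illustration only.
float compressToBand(float localZ, int itemIndex, int itemCount)
{
    assert(itemCount > 0 && itemIndex >= 0 && itemIndex < itemCount);
    assert(localZ >= 0.0f && localZ <= 1.0f);
    const float bandWidth = 1.0f / static_cast<float>(itemCount);
    return (static_cast<float>(itemIndex) + localZ) * bandWidth;
}
```

With four items, the farthest point of item 0 maps to 0.25 and the nearest point of item 1 also maps to 0.25, so geometry from two different items can never interleave in depth, whatever their local z values are.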

Andreykon says:

But what if I want QML items like Text or Button to be in the subtree of my custom 3D item? Will it work correctly as well?

Gunnar Sletta says:

This would work if you implemented the texts and buttons under your perspective transform node. Other than that, you would have to rely on the 2.5D transforms provided by Item.transform, like the Flipable element is doing.

Amazing job! I really admire the job you’re doing with the Scene Graph. How can I try it? Is it already in the development branch of qtdeclarative?

Gunnar Sletta says:

This was integrated into the “dev” branch yesterday.

kamre says:

What about performance with 10^5 antialiased line segments in the scene? Any improvements for such a use case?

Gunnar Sletta says:

It depends on how you implement the lines and what the rest of the scene looks like :)

To get the most out of the renderer, you would rely on multisampling for antialiasing and implement each line segment as two triangles (GL_TRIANGLES). Using multisampling for antialiasing means the primitives are opaque so they can be batched freely independently of what the rest of the scene looks like (ref: http://doc-snapshot.qt-project.org/qt5-dev/qtquick/qtquick-visualcanvas-scenegraph-renderer.html#opaque-primitives). Using GL_TRIANGLES is preferable as the renderer only tries to batch GL_TRIANGLES and GL_TRIANGLE_STRIP. I did it this way, though it would be theoretically possible to also batch GL_LINES, because QtQuick internally only uses these two drawing modes and I figured that any non-trivial vector path implementation would also end up using these two.
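Turning one line segment into two triangles could look roughly like the standalone sketch below. `Point` and `segmentToTriangles` are hypothetical names, and the code only produces vertex positions; filling them into a QSGGeometry is left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point { float x; float y; };

// Expand the segment a-b into a quad of the given width and return it as
// two triangles (six vertices), suitable for drawing with GL_TRIANGLES.
std::array<Point, 6> segmentToTriangles(Point a, Point b, float width)
{
    const float dx = b.x - a.x;
    const float dy = b.y - a.y;
    const float length = std::sqrt(dx * dx + dy * dy);
    assert(length > 0.0f && width > 0.0f);

    // Unit normal of the segment, scaled to half the line width.
    const float nx = -dy / length * (width * 0.5f);
    const float ny = dx / length * (width * 0.5f);

    const Point a0{a.x + nx, a.y + ny};
    const Point a1{a.x - nx, a.y - ny};
    const Point b0{b.x + nx, b.y + ny};
    const Point b1{b.x - nx, b.y - ny};

    // Two triangles sharing the diagonal a1-b0.
    return {{a0, a1, b0, b0, a1, b1}};
}
```

A horizontal segment from (0, 0) to (10, 0) with width 2 becomes a quad spanning y = -1 to y = 1. With multisampling handling the antialiasing, such quads stay opaque and remain eligible for batching.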

If the lines are static, you would batch them and upload them into one VBO and store them on the GPU and render them again and again with a single glDrawElements(). It should be instantaneous.
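What "batch them into one VBO" amounts to on the CPU side can be sketched as follows. This is a simplified standalone illustration with hypothetical names (`Vertex`, `Quad`, `MergedGeometry`, `mergeQuads`); it builds the arrays that would be uploaded once and then drawn with a single indexed draw call, but performs no OpenGL calls itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x; float y; };

// A quad given by its four corners, e.g. one widened line segment.
struct Quad { Vertex corners[4]; };

// CPU-side copy of what would go into one vertex buffer and one index buffer.
struct MergedGeometry {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Append all quads to a single buffer pair. Each quad contributes four
// vertices and six indices (two triangles), with the indices offset by the
// number of vertices already in the buffer.
MergedGeometry mergeQuads(const std::vector<Quad> &quads)
{
    // 16-bit indices can address at most 65536 vertices.
    assert(quads.size() * 4 <= 65536);

    MergedGeometry out;
    out.vertices.reserve(quads.size() * 4);
    out.indices.reserve(quads.size() * 6);

    for (const Quad &q : quads) {
        const std::uint16_t base = static_cast<std::uint16_t>(out.vertices.size());
        for (const Vertex &v : q.corners)
            out.vertices.push_back(v);
        const std::uint16_t local[6] = {0, 1, 2, 2, 1, 3};
        for (std::uint16_t i : local)
            out.indices.push_back(static_cast<std::uint16_t>(base + i));
    }
    return out;
}
```

Note that with 16-bit indices one buffer holds at most 16384 quads, so 10^5 segments would need 32-bit indices or several buffers in this simplified scheme.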

Using GL_LINES would cause the renderer to issue a unique glDrawXxx call plus setting the matrix for every one, like it does in Qt 5.1. The upload would still be in one VBO though, so it might still offer some benefit.

If the lines change every frame, the vertices would be re-uploaded every frame in one glBufferData(), but if the conditions are right, it would still be drawn in one go and the benefits would be similar to those of the “FlyingIcons”.

If the lines are implemented using per-vertex antialiasing and GL_TRIANGLES, they could still be batched, but depending on what the rest of the scene looks like, it might have to be split into several batches.

kamre says:

Thank you for the answer!
Need to dig deeper with all this batching and to write some tests for different scenarios…

Li Li (Anthony Li) says:

The same question. How does this relate to the Qt3D module?

Could we get this update from dev or stable branch?

Gunnar Sletta says:

Qt3D is a separate module which doesn’t use the scene graph so it is not directly related. See my comment to “Andreykon” above.

The code is available in the “dev” branch.

HGH says:

Is “Samegame” on Nexus 7 actually slower?

Gunnar Sletta says:

It is “different” :)
I was a bit puzzled by this and was unsure how to include it in the blog, so thanks for bringing it up.

What happens on the Nexus is that in the 5.1 case, we’re rendering most frames very fast, but a few times per second, I’m getting 30ms frames which cause a skip and the animations jerk. With the 5.2 renderer, it renders the first frames very fast and then stabilizes on 16ms and does NOT skip frames. I reasoned that what is happening is that we’re queuing up the buffer pipeline and once it is full we get throttled while rendering, which is not ideal, but better than 30ms frames a few times a second.

Dominik Holland says:

Really amazing work Gunnar !

Thx to all who made this possible ;-)

This looks awesome!

We will soon run our internal performance tests of the new renderer on a range of mobile devices, mainly iOS & Android as these are most important for our use case (mobile games).

The current version of V-Play uses Qt4 and a customized version of cocos2d-x (which also uses a scene graph) to get the rendering performance needed for high-end games. If we can fully switch to Qt 5.2 this would simplify a lot of things, so keep up the good work. We are really looking forward to Qt 5.2!

m][sko says:

from doc

The default renderer does not do any CPU-side viewport clipping nor occlusion detection. If something is not supposed to be visible, it should not be shown. Use Item::visible: false for items that should not be drawn. The primary reason for not adding such logic is that it adds additional cost which would also hurt applications that took care in behaving well.

not really good for game engines
but really easy to add with some small side-effects

I hope that list element handle this wisely
Neverending list is nice future

Steven Ceuppens says:

Great work Gunnar!

Hi Gunnar, thanks for the great post!

The documentation for the scene graph renderer you linked mentions that PNG images are treated as transparent by default. Is there a way from QML to tell the engine that a specific image is fully opaque without resorting to QQuickImageProvider?

Gunnar Sletta says:

There is a way, but unfortunately it uses ShaderEffect which at the moment prevents batching, so the best if you have many opaque images on the screen is a custom image provider.

ShaderEffect {
    property variant source: Image { source: "translucent.png" }
    blending: false
}

David says:

Hi Gunnar,

This looks really great. Awesome work!

Can you add some benchmark numbers for the software fall back option? (llvmpipe?)

I’m currently using Qt 4.8 w/qwidgets for a line-of-business app that runs in an environment with a fairly good x64 CPU, but no GPU is available.

Qt5 and QtQuick in particular look like a good fit for my app, but I worry about performance.

Is there going to be a well-supported “CPU rendered” fallback?

Thank you

Gunnar Sletta says:

We are not currently looking into CPU fallbacks, so the only option which is easily available is llvmpipe. I will try to remember to include it in future benchmarks.

The numbers at the bottom of http://blog.qt.digia.com/blog/2011/05/31/qml-scene-graph-in-master/ indicate that the scene graph on top of llvmpipe offers a benefit over Qt 4.8 with raster, but such numbers will also be subject to use case and hardware.

David says:

Hi Gunnar,

Thanks for the reply. Yes, including llvm-pipe for the future would be great.

The numbers from that post you reference are over 2.5 years old. (!) I’m sure Qt, LLVM, and llvm-pipe have had many changes since then.

If you have time to run even a quick sanity check benchmark now, that would be very awesome.

David

Dennis Adams says:

What has become of the work to allow the scenegraph rendering to proceed even if the GUI thread is busy? This is needed for smooth animation and video playback.

Gunnar Sletta says:

5.1 introduced the low-level enabler for render thread animations with the new render loop. There is an example “examples/quick/scenegraph/threadedanimation” which now uses public scene graph API to spin a progress indicator.

I don’t know when I will have time to look at the public QML API for render thread animations. I’m hoping it will be soon, but I don’t want to make any promises.

Dennis Adams says:

With Qt 5.1, on some platforms the scenegraph is not tied to vsync and instead uses timers, which results in missed or duplicated frames, which hurts smooth animation and video playback. Does Qt 5.2 improve this?

In a related question, when does Windows get a separate QML render thread like Mac OS?

Gunnar Sletta says:

In 5.1, all platforms drive animations by vsync. Unless you have defined QSG_RENDER_LOOP=plain, we don’t use timers for animations anymore. If you run into specific issues, please file bugs at bugreports.qt-project.org.

The threaded render loop can be enabled on Windows by setting the environment variable QSG_RENDER_LOOP=threaded, but there are issues with window resizing. I’m not sure when it can be made the default.

William R. J. Ribeiro says:

Amazing stuff! This looks very promising, especially for mobile where resources are so limited (not talking about those new 8 Core + 4Gb beasts…).

But what’s next? Any changes on the planning for its release? Nothing special to keep us wanting it even more?
Thanks!

m][sko says:

plz can you fix this bug in qt multimedia
https://bugreports.qt-project.org/browse/QTBUG-32939
as you are changing a lot of stuff in the scene renderer, I think it is the right time.

Or at least point me how to fix it and I will post patch
as I don’t understand scenegraph architecture fully.

Nvidia says:

Why are Nvidia platforms so much faster than the others?

Thanks for putting effort to explain things. Qt is amazing as always.

Are all these sample applications used for testing available as packages, for example for the Nexus 7? I would like to try :-)

Donald says:

You beautiful bastard :)

Tyler says:

Hi, I’m very impressed with Qt overall — the quality of the code, the documentation (stellar!), and the project management.

That being said, while some basic support of OpenGL/3D is essential, I would hate to see you allocate resources with the goal of making a game engine.

I’ve been an engine programmer in the games industry for a long time. A game engine is much, much more than a scene graph, and even a good scene graph is much, much more than what you have done/planned.

Many other projects exist with active communities whose focus is the creation of performant game engines. I humbly suggest that the Qt team’s effort would be better spent in recruiting these teams to provide interoperability with Qt.

A good game engine is too much work. A good scene graph is probably too much work. Trying to provide everything yourselves will, IMHO, be disappointing for everyone.

Just a suggestion.

Gunnar Sletta says:

Sorry if this was not clear, but Qt Quick is not trying to be a game engine. I believe it can be quite helpful for “casual games”, but has never and will never attempt to be a fully fledged game engine.

What Qt Quick and the scene graph tries to do is provide a kick ass graphics stack for modern user interfaces.

S.M.Mousavi says:

Thank you!!
This is really good news :D

Petar says:

Hi,

I currently have issues with Qt5, and the fact that it does not support flag “-graphicssystem native”

I use QtCreator as code editor, but I must use it through NX Client. Using it without native graphics system makes it very slow, so I’m still using Qt4 to compile it.

Does this graphics enhancement also apply to my situation? Does this mean I will be able to use QtCreator compiled with Qt5 without slow graphics through NX Client?

Regards,
Petar
