Qt Weekly #4: Benchmarking Code

Published Thursday April 3rd, 2014

Qt provides several built-in ways to benchmark the runtime costs of code. Let’s have a quick glance at the most common ones.

There comes a time in every non-trivial application when performance starts to matter. In fact, it’s common for developers to try to get ahead of this by doing micro-optimizations and aiming for coding patterns that ‘common wisdom’ considers fast. However, hardware and compilers evolve, and soon you’ll get long discussions at the coffee machine about the best way to iterate, to pass arguments, and so on, and about whether it matters in the end.

Wouldn’t it be cool to come up with some hard data in these cases? Qt provides a few simple ways to measure timing characteristics of code.

Qt Test library

The Qt Test library makes it very easy to create a benchmark, even more so if you use the “Other Project/Qt Unit Test” project wizard in Qt Creator to generate the necessary stub code. Then you just have to put the lines to be benchmarked inside the QBENCHMARK macro:

void BenchmarkTest::testCase1()
{
    QBENCHMARK {
        // code to benchmark goes here
    }
}


The test will rerun the code inside the macro as often as needed to get a reasonably accurate measurement. You can fine-tune how often the code runs with command line arguments such as the -iterations option. If your code can’t simply be repeated this way, you might have to use the QBENCHMARK_ONCE macro together with the -minimumtotal option, which re-runs the whole test case instead.

It’s also worth considering which back-end best suits the use case: while ‘wall-time’ is the default and available everywhere, measuring with the CPU tick counter (‘-tickcounter’ argument) usually gives more stable results.

In any case, you should take care that the results are reasonably stable and meaningful. If you want to do some more statistical analysis of the data, you can use the -csv switch and import the results into a spreadsheet.

Tip 1: Always use release builds of Qt and your benchmark.

Tip 2: Make sure the code in QBENCHMARK has a side effect, so that the compiler does not optimize it away. A common practice is to write the results to a ‘volatile’ variable.
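To illustrate Tip 2, here is a minimal sketch in plain C++. It uses std::chrono instead of QBENCHMARK only to stay self-contained, and the names sumOfSquares, timeLoopNs, g_input and g_sink are hypothetical. Reading the input from a volatile variable is an extra precaution that is not part of the tip itself: it keeps the compiler from computing the result once and reusing it across iterations.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical workload: a pure function whose call the optimizer could
// drop entirely if the result were never used.
std::uint64_t sumOfSquares(std::uint64_t n)
{
    std::uint64_t sum = 0;
    for (std::uint64_t i = 1; i <= n; ++i)
        sum += i * i;
    return sum;
}

// Reads and writes of volatile objects are observable side effects,
// so the compiler has to perform every one of them.
volatile std::uint64_t g_input = 0;
volatile std::uint64_t g_sink = 0;

// Runs the workload `iterations` times and returns the elapsed wall time
// in nanoseconds. The loop body stands in for the body of a QBENCHMARK block.
std::int64_t timeLoopNs(int iterations, std::uint64_t n)
{
    assert(iterations > 0);
    g_input = n;
    const auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        g_sink = sumOfSquares(g_input); // result is stored, not discarded
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count();
}
```

In a Qt Test function, the same assignment to a volatile variable would go inside the QBENCHMARK block, and the timing loop would be left to the macro.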

QElapsedTimer

Using the Qt Test library works well if the code you want to measure is self-contained. But what if you want to measure the runtime costs of code inside your application that can’t be easily ‘factored out’ into a test case? Qt has a number of timers, but the most useful one for benchmarking is QElapsedTimer:


QElapsedTimer timer;
timer.start();
slowOperation1();
qDebug() << "The slow operation took" << timer.nsecsElapsed() << "nanoseconds";

QElapsedTimer will use the most accurate clock available. This also means, however, that the actual resolution and accuracy of the timer can vary greatly between systems.

Tip 3: On Linux, the CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION macros can be used in a similar way to get even more accurate and stable results (based on CPU instructions executed). They only take effect when the program runs under Valgrind’s Callgrind tool.

console.time(), console.timeEnd()

Finally, there are also helpers in QML/JS to measure the time between two events:

console.time("loadPuzzle");
Logic.startNewGame(gameCanvas,"puzzle","levels/level"+acc+".qml");
console.timeEnd("loadPuzzle");

will print e.g.

loadPuzzle: 90ms

as debug output.

What else?

The APIs described here are simple and easy to use if you know what you want to measure. But they don’t help you much if you don’t yet know where to start. Tools like Valgrind/Callgrind, Intel VTune, and Apple’s Xcode Instruments excel at analyzing your complete program. For Qt Quick, Qt Creator ships a QML Profiler.

Qt Weekly is a new blog post series that aims to give your daily Qt usage a boost. Stay tuned for next week’s post!


Posted in Performance, Qt

One comment to Qt Weekly #4: Benchmarking Code

Another hint: on Linux, there’s an extra backend called -perf, which uses the Linux performance counter API and can get you information from the system, like CPU cycles, branch mis-predictions, cache misses, etc. It’s as stable as the tick counter.
