Iterating efficiently

Published Friday January 23rd, 2009 | by

Qt provides a whole lot of ways to iterate over a QList: You can use STL like iterators, Java like iterators, use the foreach macro or access the list elements via their index. Personally I fell in love with foreach from the first day, but it was not long that somebody remarked “foreach is slow” and that I should use STL const iterators …

With the new benchmarking extensions of QTestLib (Morten already blogged about them), it’s really easy to finally sort this out :-) . We will iterate over a QStringList with 100.000 elements (imagine you want to make a locale aware search or something alike, we will not modify the list), and compare numerous ways of iterating.

The test

Here is complete source code of the benchmark:

#include <QtCore/QStringList>
#include <assert.h>
#include <qtest.h>

class tst_iteration : public QObject
{
    Q_OBJECT
public:
    enum IterationType {
        Foreach,
        ForeachConstRef,
        Java,
        Stl,
        StlConst,
        IndexAt
    };

private slots:
    void qStringList_data() {
        QTest::addColumn<QStringList>("list");
        QTest::addColumn<IterationType>("type");

        QStringList list;
        for (int m = 100000; m >= 0; --m)
            list < < QString::number(m);

        QTest::newRow("index based") << list << IndexAt;
        QTest::newRow("Java") << list << Java;
        QTest::newRow("STL") << list << Stl;
        QTest::newRow("STL (const iterators)") << list << StlConst;
        QTest::newRow("foreach") << list << Foreach;
        QTest::newRow("foreach (const references)") << list << ForeachConstRef;
    }

The qStringList_data() method defines the test data for each test run: Each run will use the same list with 100.000 elements in it, but use a different type of loop.

    void qStringList() {
        QFETCH(QStringList, list);
        QFETCH(IterationType, type);

        int dummy = 0;

        switch (type) {
        case IndexAt:
            QBENCHMARK {
                const int listSize = list.size();
                for (int i = 0; i < listSize; ++i)
                    dummy += list.at(i).size();
            }
            break;
        case Java:
            QBENCHMARK {
                QListIterator iter(list);
                while (iter.hasNext())
                    dummy += (iter.next()).size();
            }
            break;
        case Stl:
            QBENCHMARK {
                QStringList::iterator iter = list.begin();
                const QStringList::iterator end = list.end();
                for (; iter != end; ++iter)
                    dummy += (*iter).size();
            }
            break;
        case StlConst:
            QBENCHMARK {
                QStringList::const_iterator iter = list.constBegin();
                const QStringList::const_iterator end = list.constEnd();
                for (; iter != end; ++iter) {
                    dummy += (*iter).size();
                }
            }
            break;
        case Foreach:
            QBENCHMARK {
                foreach (QString s, list) {
                    dummy += s.size();
                }
            }
            break;
        case ForeachConstRef:
            QBENCHMARK {
                foreach (const QString &s, list)
                    dummy += s.size();
            }
            break;
        default:
            break;
        }
        assert(dummy);
    }

Note that I access the size() of each list element. This “side effect” prevents the compiler from being too clever and optimizing the whole loop away :)

Just for the sake of completeness here is the rest of the file:

}; // class tst_iteration

Q_DECLARE_METATYPE(tst_iteration::IterationType);

QTEST_MAIN(tst_iteration)

#include "main.moc"

Results

The benchmarking library allows you to run the same tests with different “backends”: That is, you can let it measure the actual time, use the callgrind module of valgrind to measure cache misses or use the CPU ticks. For details see Morten’s post. Here are the results for measuring the actual time on my developer machine (2Core Intel, Ubuntu32, gcc 4.3.2):

RESULT : tst_iteration::qStringList():"index based":
     0.32 msec per iteration (total: 21, iterations: 64)
RESULT : tst_iteration::qStringList():"Java":
     0.39 msec per iteration (total: 25, iterations: 64)
RESULT : tst_iteration::qStringList():"STL":
     0.39 msec per iteration (total: 25, iterations: 64)
RESULT : tst_iteration::qStringList():"STL (const iterators)":
     0.35 msec per iteration (total: 23, iterations: 64)
RESULT : tst_iteration::qStringList():"foreach":
     3.3 msec per iteration (total: 27, iterations: 8 )
RESULT : tst_iteration::qStringList():"foreach (const references)":
     0.40 msec per iteration (total: 26, iterations: 64)

Aside from the first simple foreach, all ways of iterating do not really differ performance wise. But why is the foreach without references so much slower? Well, in this case a new QString object is created inside the loop. It doesn’t detach (it’s still a shallow copy), but since we are not doing that much else in the loop, this makes a difference by a factor of 10!

Bottom line: If you – like me – value the readability of foreach, you should probably get used to writing foreach (const Type &, list) (that is, whenever you do not alter the value). Otherwise the different styles of iterating do not really differ performance wise. Whatever you do in the loop body will in most cases have a much larger impact :)

Update: Rerunning the benchmark on Windows 32 bit with a Visual Studio 2005 compiler (release build) actually shows more severe differences in runtime performance:

RESULT : tst_iteration::qStringList():"index based":
      0.12 msec per iteration (total: 32, iterations: 256)
RESULT : tst_iteration::qStringList():"Java":
      0.24 msec per iteration (total: 31, iterations: 128)
RESULT : tst_iteration::qStringList():"STL":
      0.18 msec per iteration (total: 47, iterations: 256)
RESULT : tst_iteration::qStringList():"STL (const iterators)":
      0.18 msec per iteration (total: 47, iterations: 256)
RESULT : tst_iteration::qStringList():"foreach":
      3.8 msec per iteration (total: 31, iterations: 8)
RESULT : tst_iteration::qStringList():"foreach (const references)":
      0.48 msec per iteration (total: 31, iterations: 64)

MSVC seems to be having problems optimizing the template code away …

Did you like this? Share it:
Bookmark and Share

Posted in Test

4 comments to Iterating efficiently

Ingo says:

If suppose you did the measurements with a release build. Did you?

What about lists of PODs, e.g. a list of pointers? For those foreach and foreach with const references should be identical, right?

Andreas says:

How about a sweeping change? Replace all “foreach (T” with “foreach (const T &” in Qt and KDE?

mongaulois says:

hi,
unfortunately, you should use Qt foreach only for a read-only iterating and Qt container :(
documentation :
“Qt automatically takes a copy of the container when it enters a foreach loop. If you modify the container as you are iterating, that won’t affect the loop. (If you don’t modify the container, the copy still takes place, but thanks to implicit sharing copying a container is very fast.) Similarly, declaring the variable to be a non-const reference, in order to modify the current item in the list will not work either.”

boost foreach is really better

kkoehne says:

@Ingo: Yes, I made the measurements with a release build. Actually that is the reason why I use the C assert, and not Q_ASSERT: Q_ASSERT becomes a NOOP in the release build, and gcc optimized away some of the loops …
You are also right that the foreach const reference idiom does not gain you anything with pointers, since there is no object creation in the first place.

@Andreas: Could be worth the effort :) I’m not sure if Andre (who did the first measurements, I just polished them and made this blog entry out of it) already did that.

Commenting closed.