
digitalmars.D - Compiling in std.regex affecting performance

The following code:

import std.datetime;
import std.regex;
import std.stdio;
import std.array;

void test()
{
    auto sw = StopWatch(AutoStart.yes);

    auto text  = "bla\n".replicate(1024);
    auto lines = text.split("\n");

    sw.stop();
    writefln("done in %s usecs.", sw.peek.usecs);
}

void unused()
{
    // auto pattern1 = regex(r"^(import|file|binary|config)\s+([^\(]+)\(?([^\)]*)\)?\s*$");
}

void main()
{
    test();
}

Over several runs sw reports a steady average of 700 usecs on my
machine. If I uncomment the regex call in the unused function, the
timing goes a little crazy at first and then settles near 2000 usecs.
It doesn't drop by much after that and never reaches the 700 usecs that
I got without regex compiled in. (Btw, the call in the unused function
is what actually pulls in regex. A lone import won't compile in
std.regex, since the compiler can figure out it's unused and doesn't
need compiling.) Timings with regex:

D:\dev\code\d_code>dmd test.d && test.exe
done in 48707 usecs.

D:\dev\code\d_code>dmd test.d && test.exe
done in 64673 usecs.

D:\dev\code\d_code>dmd test.d && test.exe
done in 2233 usecs.

D:\dev\code\d_code>dmd test.d && test.exe
done in 2181 usecs.

D:\dev\code\d_code>dmd test.d && test.exe
done in 1909 usecs.
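
One way to separate the cold first runs from the steady state is to
repeat the measurement several times inside a single process. A minimal
sketch, assuming a recent Phobos where StopWatch lives in
std.datetime.stopwatch and peek returns a Duration (with the 2012
library, keep import std.datetime and sw.peek.usecs). The helper
timeOnce and the run count of 10 are made up for illustration:

```d
import std.array : replicate, split;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical helper: times one replicate + split pass and returns
// the elapsed microseconds. The number of pieces produced by split is
// passed back through lineCount so the result is visibly used.
long timeOnce(out size_t lineCount)
{
    auto sw = StopWatch(AutoStart.yes);

    auto text  = "bla\n".replicate(1024);
    auto lines = text.split("\n");

    sw.stop();
    lineCount = lines.length;
    return sw.peek.total!"usecs";
}

void main()
{
    enum runs = 10; // assumed repeat count, not from the original post

    foreach (i; 0 .. runs)
    {
        size_t n;
        auto usecs = timeOnce(n);
        writefln("run %s: %s pieces in %s usecs", i, n, usecs);
    }
}
```

If only the first iteration is slow, that would point at start-up cost
rather than something paid on every pass. If every iteration stays slow
with regex linked in, the overhead is recurring.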

The map files give some clues. Here's a visual comparison (thx
Vladimir ;) ):

Without regex: http://thecybershadow.net/d/mapview/view.php?id=4f07a35cbacaa
With regex: http://thecybershadow.net/d/mapview/view.php?id=4f07a45467101

You can see that unitab adds 128KB to the exe.

So help me out here, what causes these slowdowns? Is this due to
cache misses (i.e. the text segment doesn't fit in cache)? Obviously
the GC has to allocate on the heap and this can affect timings, but
multiple runs do show that pulling regex into the exe affects
performance.
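
A rough way to check the GC part of the question is to time the same
workload with automatic collections suppressed. This is only a sketch:
timeWorkload is a hypothetical helper, the std.datetime.stopwatch import
assumes a recent Phobos, and GC.disable() only suppresses automatic
collections (the runtime may still collect if it runs out of memory):

```d
import core.memory : GC;
import std.array : replicate, split;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical helper: runs the replicate + split workload once and
// returns the elapsed microseconds.
long timeWorkload()
{
    auto sw = StopWatch(AutoStart.yes);

    auto text  = "bla\n".replicate(1024);
    auto lines = text.split("\n");

    sw.stop();
    return sw.peek.total!"usecs";
}

void main()
{
    // Warm-up pass, discarded, so both measurements below start from
    // an already initialised heap.
    timeWorkload();

    auto withGC = timeWorkload();

    GC.disable();              // suppress automatic collections
    scope (exit) GC.enable();  // restore normal behaviour on exit
    auto withoutGC = timeWorkload();

    writefln("collections allowed:    %s usecs", withGC);
    writefln("collections suppressed: %s usecs", withoutGC);
}
```

If the gap between the two numbers grows once regex is linked in, the GC
would be a likely suspect. If both numbers rise together, something
other than collections is responsible.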

Hope I'll learn something new today.. :)
Jan 06 2012