www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 9645] New: std.algorithm.splitter on string with char as separator performs badly in certain situations

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9645

           Summary: std.algorithm.splitter on string with char as
                    separator performs badly in certain situations
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: schveiguy yahoo.com


--- Comment #0 from Steven Schveighoffer <schveiguy yahoo.com> 2013-03-04
10:55:33 PST ---
std.algorithm.splitter where the separator is a char can perform worse than the
version where the separator is a substring.  It should be at worse equivalent,
since you can always degrade to the substring version given the single char
separator.  I would expect it to be faster.

In cases where the content to separator ratio is 1 to 1 up to a certain point
(did not test for it), then the single-char version performs better than its
substring counterpart.  But in cases where the ratio of content to separator is
large (arguably the more common case), it does horribly.

A simple test (ratio of 35:1 for content to separator):

import std.datetime;
import std.algorithm;
import std.stdio;

string line =
"bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbba";

void main()
{
    enum iterations = 1_000_000;
    auto t1 = Clock.currTime();
    foreach(i; 0..iterations)
    {
        foreach(s; splitter(line, 'a'))
        {
            if(i == 0)
                writeln(s);
        }
    }
    writefln("char splitter took %s", Clock.currTime()-t1);

    t1 = Clock.currTime();
    foreach(i; 0..iterations)
    {
        foreach(s; splitter(line, "a"))
        {
            if(i == 0)
                writeln(s);
        }
    }
    writefln("substring splitter took %s", Clock.currTime()-t1);
}

result (macbook pro 2.2GHz core i7):

bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

char splitter took 2 secs, 702 ms, and 188 μs
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

substring splitter took 1 sec, 71 ms, and 736 μs

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 04 2013
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9645


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc


--- Comment #1 from bearophile_hugs eml.cc 2013-03-04 13:37:29 PST ---
See also Issue 8013

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 04 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9645


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |monarchdodra gmail.com


--- Comment #2 from monarchdodra gmail.com 2013-08-22 07:36:21 PDT ---
(In reply to comment #0)
 char splitter took 2 secs, 702 ms, and 188 μs

 substring splitter took 1 sec, 71 ms, and 736 μs

Changes where made to improve the situation, However, due to performance bug 10848, it is still slower. That said, I have a standing performance improvement pull for find: https://github.com/D-Programming-Language/phobos/pull/1492 fix Issue 10403 - memchr optimization for std.algorithm.find http://d.puremagic.com/issues/show_bug.cgi?id=10403 With it, I'm getting: char splitter took 429 ms, 298 μs, and 8 hnsecs substring splitter took 2 secs, 323 ms, 301 μs, and 7 hnsecs Gone from twice as slow to five times as fast. Not bad :) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 22 2013