www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 9646] New: std.algorithm.splitter for strings has opportunities for improvement

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9646

           Summary: std.algorithm.splitter for strings has opportunities
                    for improvement
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: schveiguy yahoo.com


--- Comment #0 from Steven Schveighoffer <schveiguy yahoo.com> 2013-03-04
11:10:42 PST ---
Based on the way std.algorithm.splitter is implemented, it should perform at
least as well as a simple hand-written range that does a brute force search for
separators.

However, with certain test data, and a simple range, it does not perform as
well.

An example (take from a post on the newsgroups), along with a simple MySplitter
type that can split on any substring.  I have not done any in-depth research as
to why this works better:

import std.stdio;
import std.datetime;
import std.algorithm;

struct MySplitter
{
    private string s;
    private string separator;
    private string source;
    this(string src, string sep)
    {
        source = src;
        separator = sep;
        popFront();
    }

     property string front()
    {
        return s;
    }

     property bool empty()
    {
        return s.ptr == null;
    }

    void popFront()
    {
        if(!source.length)
        {
            s = source;
            source = null;
        }
        else
        {
            size_t i = 0;
            bool found = false;
outer:
            for(; i + separator.length <= source.length; i++)
            {
                if(source[i] == separator[0])
                {
                    found = true;
                    for(size_t j = i+1, k=1; k < separator.length; ++j, ++k)
                        if(source[j] != separator[k])
                        {
                            found = false;
                            continue outer;
                        }
                    break;
                }
            }
            s = source[0..i];
            if(found)
                source = source[i + separator.length..$];
            else
                source = source[$..$];
        }
    }
}

auto message = "REGISTER sip:example.com SIP/2.0\r
Content-Length: 0\r
Contact:
<sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summaryq\";q=0.9\r
To: <sip:12345 comm.example.com>\r
User-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r
Max-Forwards: 70\r
CSeq: 1 REGISTER\r
Via: SIP/2.0/TLS
10.1.3.114:59788;branch=z9hG4bK2910497772630690\r
Call-ID: 2910497622026445\r
From: <sip:12345 comm.example.com>;tag=2910497618150713\r\n\r\n";


void main()
{
    enum iterations = 5_000_000;
    auto t1 = Clock.currTime();
    foreach(i; 0..iterations)
    {
        foreach(notused; splitter(message, "\r\n"))
        {
            if(!i)
                writeln(notused);
        }
    }
    writefln("std.algorithm.splitter took %s", Clock.currTime()-t1);
    t1 = Clock.currTime();
    foreach(i; 0..iterations)
    {
        foreach(notused; MySplitter(message, "\r\n"))
        {
            if(!i)
                writeln(notused);
        }
    }
    writefln("MySplitter took %s", Clock.currTime()-t1);
}


result (macbook pro 2.2GHz i7):

REGISTER sip:example.com SIP/2.0
Content-Length: 0
Contact:
<sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events="message-summaryq";q=0.9
To: <sip:12345 comm.example.com>
User-Agent: ("VENDOR=MyCompany" "My User Agent")
Max-Forwards: 70
CSeq: 1 REGISTER
Via: SIP/2.0/TLS
10.1.3.114:59788;branch=z9hG4bK2910497772630690
Call-ID: 2910497622026445
From: <sip:12345 comm.example.com>;tag=2910497618150713


std.algorithm.splitter took 5 secs, 197 ms, and 89 μs
REGISTER sip:example.com SIP/2.0
Content-Length: 0
Contact:
<sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events="message-summaryq";q=0.9
To: <sip:12345 comm.example.com>
User-Agent: ("VENDOR=MyCompany" "My User Agent")
Max-Forwards: 70
CSeq: 1 REGISTER
Via: SIP/2.0/TLS
10.1.3.114:59788;branch=z9hG4bK2910497772630690
Call-ID: 2910497622026445
From: <sip:12345 comm.example.com>;tag=2910497618150713


MySplitter took 3 secs, 668 ms, and 714 μs

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 04 2013
parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9646


Andrei Alexandrescu <andrei erdani.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei erdani.com


--- Comment #1 from Andrei Alexandrescu <andrei erdani.com> 2013-03-04 11:31:47
PST ---
The forum discussion: http://goo.gl/CZVCB

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 04 2013