www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Slower than Python

reply "cvk012c" <cvk012c motorolasolutions.com> writes:
Tried to port my SIP parser from Python to D to boost performance
but got opposite result.
I created a simple test script which splits SIP REGISTER message
10 million times. Python version takes about 14 seconds to
execute, D version is about 23 seconds which is 1.6 times slower.
I used DMD 2.062 and compiled my script with options -release and
-O. I used Python 3.3 64 bit.
I ran both scripts on the same hardware with Windows 7 64.
Is this general problem with string performance in D or just
splitter() issue?
Did anybody compared performance of D string manipulating
functions with other languages like Python,Perl,Java and C++?


Here is Python version of test script:

import time

message = "REGISTER sip:comm.example.com SIP/2.0\r\n\
Content-Length: 0\r\n\
Contact:
<sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summary\";q=0.9\r\n\
To: <sip:12345 comm.example.com>\r\n\
User-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\n\
Max-Forwards: 70\r\n\
CSeq: 1 REGISTER\r\n\
Via: SIP/2.0/TLS
10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\n\
Call-ID: 2910497622026445\r\n\
From: <sip:12345 comm.example.com>;tag=2910497618150713\r\n\r\n"

t1 = time.perf_counter()
for i in range(10000000):
    for notused in message.split("\r\n"):
      pass
print(time.perf_counter()-t1)


Here is D version:
import std.stdio,std.algorithm,std.datetime;

void main()
{
    auto message = "REGISTER sip:example.com SIP/2.0\r\n~
Content-Length: 0\r\n~
Contact:
<sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summary\";q=0.9\r\n~
To: <sip:12345 comm.example.com>\r\n~
User-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\n~
Max-Forwards: 70\r\n~
CSeq: 1 REGISTER\r\n~
Via: SIP/2.0/TLS
10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\n~
Call-ID: 2910497622026445\r\n~
From: <sip:12345 comm.example.com>;tag=2910497618150713\r\n\r\n";

    auto t1 = Clock.currTime();
    foreach(i; 0..10000000)
    {
      foreach(notused; splitter(message, "\r\n"))
      {
      }
    }
    writeln(Clock.currTime()-t1);
}
Mar 01 2013
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/1/13 3:30 PM, cvk012c wrote:
 Tried to port my SIP parser from Python to D to boost performance
 but got opposite result.
 I created a simple test script which splits SIP REGISTER message
 10 million times. Python version takes about 14 seconds to
 execute, D version is about 23 seconds which is 1.6 times slower.
 I used DMD 2.062 and compiled my script with options -release and
 -O. I used Python 3.3 64 bit.
 I ran both scripts on the same hardware with Windows 7 64.

Add -inline to the options. Andrei
Mar 01 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/1/13 3:58 PM, simendsjo wrote:
 On Friday, 1 March 2013 at 20:50:15 UTC, Andrei Alexandrescu wrote:
 On 3/1/13 3:30 PM, cvk012c wrote:
 Tried to port my SIP parser from Python to D to boost performance
 but got opposite result.
 I created a simple test script which splits SIP REGISTER message
 10 million times. Python version takes about 14 seconds to
 execute, D version is about 23 seconds which is 1.6 times slower.
 I used DMD 2.062 and compiled my script with options -release and
 -O. I used Python 3.3 64 bit.
 I ran both scripts on the same hardware with Windows 7 64.

Add -inline to the options. Andrei

--noboundscheck can also help if you don't mind missing the safety net. $ rdmd -O -release sip 22 secs, 977 ms, 299 μs, and 8 hnsecs $ rdmd -O -release -inline sip 12 secs, 245 ms, 567 μs, and 9 hnsecs $ rdmd -O -release -inline -noboundscheck sip 10 secs, 171 ms, 209 μs, and 9 hnsecs

Also, the D version has a different string to parse (~ is not a line continuation character). The fixed version: auto message = "REGISTER sip:example.com SIP/2.0\r Content-Length: 0\r Contact: <sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summaryq\";q=0.9\r To: <sip:12345 comm.example.com>\r User-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r Max-Forwards: 70\r CSeq: 1 REGISTER\r Via: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r Call-ID: 2910497622026445\r From: <sip:12345 comm.example.com>;tag=2910497618150713\r\n\r\n"; That shaves one extra second bringing it down to 9. Andrei
Mar 01 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/1/13 4:28 PM, cvk012c wrote:
 On my hardware with -inline options it now takes about 15 secs which is
 still slower than Python but with both -inline and -noboundscheck it
 takes 13 secs and finally beats Python.
 But I still kind of disappointed because I expected a much better
 performance boost and got only 7%. Counting that Python is not the
 fastest scripting language I think that similar Perl and Java scripts
 will outperform D easily.

I doubt that. 1. Microbenchmarks are a crapshoot, they exercise a tiny portion of the language and library. 2. With Python, after comparing 2-3 idioms - well, this is pretty much it. We doubled the speed in no time by just tuning options. D being a systems language allows you take your code to a million places if you want to optimize. 3. split has been discussed and improved for years in the Python community (just google for e.g. python split performance). Perl isn't all that fast actually, try this (takes 30+ seconds): $message = "REGISTER sip:comm.example.com SIP/2.0\r Content-Length: 0\r Contact: <sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summary\";q=0.9\r To: <sip:12345 comm.example.com>\r User-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r Max-Forwards: 70\r CSeq: 1 REGISTER\r Via: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r Call-ID: 2910497622026445\r From: <sip:12345 comm.example.com>;tag=2910497618150713\r\n\r\n"; foreach my $i (0 .. 10000000) { foreach my $notused (split(/\r\n/, $message)) { } } Andrei
Mar 01 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-01 23:01, Andrei Alexandrescu wrote:

 I doubt that.

 1. Microbenchmarks are a crapshoot, they exercise a tiny portion of the
 language and library.

If that tiny portion is what's only used, then the rest doesn't matter.
 2. With Python, after comparing 2-3 idioms - well, this is pretty much
 it. We doubled the speed in no time by just tuning options. D being a
 systems language allows you take your code to a million places if you
 want to optimize.

Perhaps we should look over the default configuration. -- /Jacob Carlborg
Mar 02 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/2/13 11:11 AM, Jacob Carlborg wrote:
 On 2013-03-01 23:01, Andrei Alexandrescu wrote:

 I doubt that.

 1. Microbenchmarks are a crapshoot, they exercise a tiny portion of the
 language and library.

If that tiny portion is what's only used, then the rest doesn't matter.

Agreed. But I doubt that's the case in nontrivial applications. Andrei
Mar 02 2013
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 03/01/2013 10:28 PM, cvk012c wrote:
 ...

 On my hardware with -inline options it now takes about 15 secs which is
 still slower than Python but with both -inline and -noboundscheck it
 takes 13 secs and finally beats Python.
 But I still kind of disappointed because I expected a much better
 performance boost and got only 7%. Counting that Python is not the
 fastest scripting language I think that similar Perl and Java scripts
 will outperform D easily.

Never make such statements without doing actual measurements. Furthermore, it is completely meaningless anyway. Performance benchmarks always compare language implementations, not languages. (Case in point: You get twice the speed by using another compiler backend implementation.)
Mar 01 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/1/13 5:31 PM, Steven Schveighoffer wrote:
 Phobos kind of refuses to treat strings like arrays of characters, it
 insists on decoding.

There's no decoding in find and splitter as far as I remember. Andrei
Mar 01 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/1/13 5:47 PM, Steven Schveighoffer wrote:
 On Fri, 01 Mar 2013 17:35:04 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 3/1/13 5:31 PM, Steven Schveighoffer wrote:
 Phobos kind of refuses to treat strings like arrays of characters, it
 insists on decoding.

There's no decoding in find and splitter as far as I remember.

Looking at splitter, it uses skipOver to skip over the separator, and that seems to call R.front and R.popFront. Actually, it calls it for both the source string and the separator.

You may be looking at the wrong splitter overload. Andrei
Mar 01 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/1/13 6:26 PM, Steven Schveighoffer wrote:
 On Fri, 01 Mar 2013 18:22:54 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 So it's just pure heuristics. Not hard to see how that would affect a
 10 million cycle program.

 We may be able to create a string-specific version of splitter that
 would take advantage of the representation.

Just found a disturbing artifact: splitter(message, '\n') is more than twice as slow as splitter(message, "\n") -Steve

That is a problem. Could you please file a but report? Thanks, Andrei
Mar 02 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/2/13 12:20 PM, Andrei Alexandrescu wrote:
 On 3/1/13 6:26 PM, Steven Schveighoffer wrote:
 On Fri, 01 Mar 2013 18:22:54 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 So it's just pure heuristics. Not hard to see how that would affect a
 10 million cycle program.

 We may be able to create a string-specific version of splitter that
 would take advantage of the representation.

Just found a disturbing artifact: splitter(message, '\n') is more than twice as slow as splitter(message, "\n") -Steve

That is a problem. Could you please file a but report?

s/but/bug/ :o) Andrei
Mar 02 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/1/2013 1:28 PM, cvk012c wrote:
 But I still kind of disappointed because I expected a much better performance
 boost and got only 7%. Counting that Python is not the fastest scripting
 language I think that similar Perl and Java scripts will outperform D easily.
 Thanks Andrei and simendsjo for a quick response though.

Python's splitter, which you are measuring, isn't written in Python. It is written in C. You're actually comparing a bit of C code with a bit of D code.
Mar 01 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-02 02:10, Walter Bright wrote:

 Python's splitter, which you are measuring, isn't written in Python. It
 is written in C. You're actually comparing a bit of C code with a bit of
 D code.

Does that really matter. He's using Python, if the function is part of the standard library and if it's implement in Python or C doesn't really matter. -- /Jacob Carlborg
Mar 02 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/2/13 11:09 AM, Jacob Carlborg wrote:
 On 2013-03-02 02:10, Walter Bright wrote:

 Python's splitter, which you are measuring, isn't written in Python. It
 is written in C. You're actually comparing a bit of C code with a bit of
 D code.

Does that really matter. He's using Python, if the function is part of the standard library and if it's implement in Python or C doesn't really matter.

The point here is that often one needs to write code that's more than gluing library calls together. I've seen that many, many times. Andrei
Mar 02 2013
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-02 17:48, John Colvin wrote:

 It does.

 Failing to beat mature, optimised C is not anything to be ashamed of,
 but being slower than python would be an abject failure of D as a
 compiled, systems programming language.

Then D needs to get faster, or we need to switch to C for some std lib functions. In that case, as Walter said, we have failed. -- /Jacob Carlborg
Mar 03 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/3/2013 7:09 AM, Jacob Carlborg wrote:
 On 2013-03-02 17:48, John Colvin wrote:

 It does.

 Failing to beat mature, optimised C is not anything to be ashamed of,
 but being slower than python would be an abject failure of D as a
 compiled, systems programming language.

Then D needs to get faster, or we need to switch to C for some std lib functions. In that case, as Walter said, we have failed.

Nothing in this thread suggests that D needs to switch its library implementations to C. Interestingly, I tried the indexOf() benchmark using C's memchr() and memcmp() from VC's runtime library. It was not faster than Andrei's optimized D one.
Mar 03 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/3/13 9:31 PM, deadalnix wrote:
 On Sunday, 3 March 2013 at 20:18:58 UTC, Walter Bright wrote:
 On 3/3/2013 7:09 AM, Jacob Carlborg wrote:
 On 2013-03-02 17:48, John Colvin wrote:

 It does.

 Failing to beat mature, optimised C is not anything to be ashamed of,
 but being slower than python would be an abject failure of D as a
 compiled, systems programming language.

Then D needs to get faster, or we need to switch to C for some std lib functions. In that case, as Walter said, we have failed.

Nothing in this thread suggests that D needs to switch its library implementations to C. Interestingly, I tried the indexOf() benchmark using C's memchr() and memcmp() from VC's runtime library. It was not faster than Andrei's optimized D one.

Maybe it is time to look at the python implementation and see why it is faster.

Was the conclusion that it's faster? Andrei
Mar 03 2013
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-03 21:18, Walter Bright wrote:

 Nothing in this thread suggests that D needs to switch its library
 implementations to C.

Then that's good.
 Interestingly, I tried the indexOf() benchmark using C's memchr() and
 memcmp() from VC's runtime library. It was not faster than Andrei's
 optimized D one.

-- /Jacob Carlborg
Mar 03 2013
prev sibling parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 03/03/2013 07:31 PM, bearophile wrote:
 jerro:

 $ time python3 test.py

Are Python3 strings still like wstrings/dstrings, or have they applied the patch that makes them UTF8? Bye, bearophile

Looks like 3.3 will store them as utf8, but not 3.2 or below.
Mar 04 2013
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-03 16:41, John Colvin wrote:

 I agree that anything that makes us faster is good, but I wouldn't go so
 far as to say we've failed if we're not as fast as the very fastest.

No, but if we need to drop down to C to get fast enough. -- /Jacob Carlborg
Mar 03 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/3/2013 11:38 PM, Jacob Carlborg wrote:
 On 2013-03-03 16:41, John Colvin wrote:

 I agree that anything that makes us faster is good, but I wouldn't go so
 far as to say we've failed if we're not as fast as the very fastest.

No, but if we need to drop down to C to get fast enough.

I can't think of a reason to need to do that.
Mar 03 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-03-04 09:18, Jonathan M Davis wrote:

 Given how D works, there is something _very_ wrong if we have to drop to C, if
 nothing else, because it's trivial to write code in D which is equivalent to
 what it would be in C. If our implementation of something is worse than a C
 implementation, that merely means that it needs to be refactored and
 optimized. It's possible that some of our abstractions will hamper efficiency
in
 some cases (e.g. depending on how ranges are used - particularly with strings
 - you risk a performance hit in comparison to pure C), but that should
 generally be fixable via specializations and whatnot.

 If we have an efficiency problem, it's merely an implementation issue which
 needs to be sorted out. There should be nothing inherent to D which makes it
 so that you can't write code as fast as C or C++.

That's what I'm saying. -- /Jacob Carlborg
Mar 04 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/1/2013 5:19 PM, anon123 wrote:
 My version is functionally equivalent,

I don't think so. Take a look at what happens to the message variable. It is never restored to its original string on the next iteration of the loop.
Mar 01 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/1/13 8:05 PM, cvk012c wrote:
 In my latest version of D script I didn't use splitter at all. I used
 string specific indexOf function. Still result is very bad. For text
 based protocols, such as SIP, performance of string manipulating
 functions is very important. Unfortunately, looks like it is not D
 strongest point at this time.

That conclusion would be hasty if not missing the whole point. You essentially measured the speed of one loop in various translators implementing various languages. Java code doing straight computation is on a par with C speed, no two ways about that. Python code using library primitives ain't no slouch either. Performance tuning in these languages becomes more difficult in larger applications where data layout, allocation, and indirect function calls start to dominate. The claim of systems languages being fast materializes only when you need to optimize data layout and nonstandard, custom algorithms. At that point systems languages give you additional options, whereas with high-level languages optimization becomes a very difficult proposition. Andrei
Mar 02 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/2/2013 7:43 AM, Russel Winder wrote:
 For writing interpreters, RPython spanks C.

What's RPython doing that makes it faster?
Mar 02 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/2/2013 11:42 AM, Russel Winder wrote:
 On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
 On 3/2/2013 7:43 AM, Russel Winder wrote:
 For writing interpreters, RPython spanks C.

What's RPython doing that makes it faster?

Allowing PyPy to have a good JIT compiler.

I don't understand. Does that JIT generate faster code than a C compiler would generate?
Mar 02 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/2/2013 12:08 PM, H. S. Teoh wrote:
 On Sat, Mar 02, 2013 at 12:02:08PM -0800, Walter Bright wrote:
 On 3/2/2013 11:42 AM, Russel Winder wrote:
 On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
 On 3/2/2013 7:43 AM, Russel Winder wrote:
 For writing interpreters, RPython spanks C.

What's RPython doing that makes it faster?

Allowing PyPy to have a good JIT compiler.

I don't understand. Does that JIT generate faster code than a C compiler would generate?

I don't know, but my wild guess is that a JIT optimizes the *right* hotspots based on real-time performance measurements, whereas a lot of C programmers are obsessed with optimizing what they *think* are the hotspots, but which really aren't.

I meant what the C *compiler* generates.
Mar 02 2013
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
03-Mar-2013 12:03, Russel Winder пишет:
 On Sat, 2013-03-02 at 12:52 -0800, Walter Bright wrote:
 On 3/2/2013 12:08 PM, H. S. Teoh wrote:
 On Sat, Mar 02, 2013 at 12:02:08PM -0800, Walter Bright wrote:
 On 3/2/2013 11:42 AM, Russel Winder wrote:
 On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
 On 3/2/2013 7:43 AM, Russel Winder wrote:
 For writing interpreters, RPython spanks C.

What's RPython doing that makes it faster?

Allowing PyPy to have a good JIT compiler.

I don't understand. Does that JIT generate faster code than a C compiler would generate?

I don't know, but my wild guess is that a JIT optimizes the *right* hotspots based on real-time performance measurements, whereas a lot of C programmers are obsessed with optimizing what they *think* are the hotspots, but which really aren't.

I meant what the C *compiler* generates.

Yes because the C/C++/D/etc. compilers are attempting to predict the control flow of the program in execution and optimize all cases for all possibilities. JIT's are just focussing on the runtime bottlenecks with the actual data as being used. This allows for more focussed code generation in the actual context. I would suspect that in many cases the generated code is effectively the same but JITs can often do unexpected and faster codes because they have more data to optimize with and less optimization to do.

Only one thing missing here is that JITs typically has much less time to do all and any of this.
 To say more would require actual comparative data and I suspect no-one
 on this list will do that.

-- Dmitry Olshansky
Mar 03 2013
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/3/2013 12:03 AM, Russel Winder wrote:
 Yes because the C/C++/D/etc. compilers are attempting to predict the
 control flow of the program in execution and optimize all cases for all
 possibilities. JIT's are just focussing on the runtime bottlenecks with
 the actual data as being used. This allows for more focussed code
 generation in the actual context. I would suspect that in many cases the
 generated code is effectively the same but JITs can often do unexpected
 and faster codes because they have more data to optimize with and less
 optimization to do.

I've heard this was the advantage of JITs for the last 15 years. I haven't seen any data to justify it, and I have trouble thinking of any significant improvements to code gen that could be made with runtime data. That is for JITting statically typed languages. The situation is a bit better for dynamic ones, because the JIT can do a better job than a compiler because most of the time, only one type is used in a particular spot, and the JIT can pick that up and "statically" type it, as opposed to a compiler which cannot.
Mar 03 2013
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/2/2013 11:14 AM, cvk012c wrote:
 That is true, but what prevents D to do the same?

Nothing.
 I will go even further: what prevents D to be faster than C?

Nothing. In fact, there are semantics in D that allow faster code to be generated than can be for C. They are unexploited at the moment, but they are there.
Mar 02 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-01 23:31, Steven Schveighoffer wrote:

 Phobos kind of refuses to treat strings like arrays of characters, it
 insists on decoding.

 With DMD and a hand-written splitter, it takes 6 seconds instead of 10
 on my system (64-bit macosx).

If you need to write a custom splitter to get acceptable performance then it sounds to me that D or Phobos have failed. -- /Jacob Carlborg
Mar 02 2013
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/2/13 11:12 AM, Jacob Carlborg wrote:
 On 2013-03-01 23:31, Steven Schveighoffer wrote:

 Phobos kind of refuses to treat strings like arrays of characters, it
 insists on decoding.

 With DMD and a hand-written splitter, it takes 6 seconds instead of 10
 on my system (64-bit macosx).

If you need to write a custom splitter to get acceptable performance then it sounds to me that D or Phobos have failed.

I wrote a custom splitter specialized for arrays. It's as fast. Andrei
Mar 02 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/2/13 2:37 PM, Steven Schveighoffer wrote:
 On Sat, 02 Mar 2013 12:14:07 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 3/2/13 11:12 AM, Jacob Carlborg wrote:
 On 2013-03-01 23:31, Steven Schveighoffer wrote:

 Phobos kind of refuses to treat strings like arrays of characters, it
 insists on decoding.

 With DMD and a hand-written splitter, it takes 6 seconds instead of 10
 on my system (64-bit macosx).

If you need to write a custom splitter to get acceptable performance then it sounds to me that D or Phobos have failed.

I wrote a custom splitter specialized for arrays. It's as fast.

Why doesn't it get used for strings in this instance? Maybe there is a bad constraint somewhere.

I fear there's a misunderstanding somewhere. Splitter and find are already specialized appropriately for strings. Andrei
Mar 02 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/2/13 5:48 PM, Steven Schveighoffer wrote:
 On Sat, 02 Mar 2013 16:16:22 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 I fear there's a misunderstanding somewhere. Splitter and find are
 already specialized appropriately for strings.

OK, I would then counter that I have created a splitter range

Prolly a link would clarify that. Andrei
Mar 02 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/2/2013 8:12 AM, Jacob Carlborg wrote:
 If you need to write a custom splitter to get acceptable performance then it
 sounds to me that D or Phobos have failed.

In D, you can write custom versions of algorithms to exploit unique characteristics of particular data structures. I don't regard that as a failure. But if, to write a custom version, you had to switch to another language, then I would agree that is a failure of a "systems" programming language.
Mar 02 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-02 20:27, Walter Bright wrote:

 In D, you can write custom versions of algorithms to exploit unique
 characteristics of particular data structures. I don't regard that as a
 failure.

If you use the most obvious function/way from the standard library to perform a task in Python and you do the same in D. If the D version is slower that's not good. As you've seen here people will use what's available in the standard library and if that's not good enough compared to what's available in other languages' standard libraries, they will complain. If a function in Phobos can't match the corresponding function in another std lib, I think we're doing something wrong. Or the they just happened to use a language where the std lib was optimized better for that particular task. -- /Jacob Carlborg
Mar 03 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/3/2013 7:07 AM, Jacob Carlborg wrote:
 If you use the most obvious function/way from the standard library to perform a
 task in Python and you do the same in D. If the D version is slower that's not
good.

Sure. But you should be cognizant of exactly what it is you're comparing, or you're apt to draw very wrong conclusions.
Mar 03 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/1/13 8:29 PM, Steven Schveighoffer wrote:
 On Fri, 01 Mar 2013 20:05:34 -0500, cvk012c
 <cvk012c motorolasolutions.com> wrote:

 In my latest version of D script I didn't use splitter at all. I used
 string specific indexOf function. Still result is very bad. For text
 based protocols, such as SIP, performance of string manipulating
 functions is very important. Unfortunately, looks like it is not D
 strongest point at this time.

indexOf uses the same mechanism as splitter to find the separators. If it doesn't improve anything, I'd say that is where the problem lies (std.algorithm.find).

find and indexOf on arrays are on par with handwritten loops. Andrei
Mar 02 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/2/13 2:32 PM, Steven Schveighoffer wrote:
 This is not a personal attack, please don't take it that way.

Not at all. I'm just confused about this exchange. You mention some correct facts ("MySplitter finished in 6 seconds instead of 10"), then some mistaken suppositions ("a string-specific version could be written"), and derive conclusions from both ("Maybe there is a bad constraint somewhere"). Hard to deal with the mix.
 My
 anecdotal tests with hand writing a custom splitter range to handle the
 OP's program gave me a 40% savings. Whether it's find or not, I'm not
 sure, but there definitely is room for improvement.

I think I understand where the confusion comes from. If you're referring to MySplitter, that's not comparable. It uses this at the core: for(; i + 1 < source.length; i++) { if(source[i] == '\r' && source[i + 1] == '\n') { found = true; break; } ... } This is not close to the code that would work with arrays. We could specialize things for small arrays, but that hasn't been done yet. My point is it's not comparable with the classic brute force subsequence search. What is comparable is shown below. indexOf2 defines the straight array-in-array brute force search. With dmd on my machine it runs in some 7 seconds. So does indexOf (this was my point). Then indexOf3 uses a simple trick that the Java implementation does: cache the first character. That reduces the running time to 5 seconds. Then indexOf4 optimizes the loop searching that first character, bringing run time to about 4 seconds. On that machine the Java implementation runs in 3.5 seconds. Andrei import std.stdio, std.string, std.datetime; long indexOf2(string haystack, string needle) { if (!needle.length) return 0; if (needle.length > haystack.length) return -1; auto limit = haystack.length - needle.length; bigloop: for (size_t i; i <= limit; ++i) { foreach (q; 0 .. needle.length) { if (haystack[i + q] != needle[q]) continue bigloop; } return i; } return -1; } long indexOf3(string haystack, string needle) { if (!needle.length) return 0; if (needle.length > haystack.length) return -1; immutable c = needle[0]; auto limit = haystack.length - needle.length; assert(limit < uint.max, "not handled"); bigloop: for (uint i; i <= limit; ++i) { if (haystack.ptr[i] != c) continue; foreach (q; 1 .. needle.length) { if (haystack[i + q] != needle[q]) continue bigloop; } return i; } return -1; } long indexOf4(string haystack, string needle) { if (!needle.length) return 0; if (needle.length > haystack.length) return -1; immutable c = needle[0]; auto limit = haystack.length - needle.length; size_t i = 0; bigloop: for (;;) { while (i <= limit && haystack[i++] != c) {} if (i-- > limit) break; foreach (q; 1 .. needle.length) { if (haystack[i + q] != needle[q]) continue bigloop; } return i; } return -1; } unittest { assert(indexOf2("hello, world", ",") == 5); assert(indexOf2("hello, world", "a") == -1); assert(indexOf3("hello, world", ",") == 5); assert(indexOf3("hello, world", "a") == -1); assert(indexOf4("hello, world", ",") == 5); assert(indexOf4("hello, world", "a") == -1); } void main(string[] args) { auto message = "REGISTER sip:example.com SIP/2.0\r\nContent-Length: 0\r\nContact: <sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-su mary\";q=0.9\r\nTo: <sip:12345 comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID: 2910497622026445\r\nFrom: <sip:12345 comm.example.com>;tag=2910497618150713\r\n\r\n"; auto t1 = Clock.currTime(); for (int i=0; i<10000000; i++) { long index1 = 0; while (true) { auto index2 = indexOf(message[index1 .. $], "\r\n"); if (index2 == -1) break; auto notused = message[index1 .. index1 + index2]; if (args.length == 42) writeln(notused); index1 += index2 + 2; } } writeln(Clock.currTime()-t1); }
Mar 02 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/4/13 10:39 AM, Steven Schveighoffer wrote:
 struct MySplitter
 {
 private string s;
 private string separator;
 private string source;
 this(string src, string sep)
 {
 source = src;
 separator = sep;
 popFront();
 }

  property string front()
 {
 return s;
 }

  property bool empty()
 {
 return s.ptr == null;
 }

 void popFront()
 {
 if(!source.length)
 {
 s = source;
 source = null;
 }
 else
 {
 size_t i = 0;
 bool found = false;
 for(; i + separator.length <= source.length; i++)
 {
 if(source[i] == separator[0])
 {
 found = true;
 for(size_t j = i+1, k=1; k < separator.length; ++j, ++k)
 if(source[j] != separator[k])
 {
 found = false;
 break;
 }
 if(found)
 break;
 }
 }
 s = source[0..i];
 if(found)
 source = source[i + separator.length..$];
 else
 source = source[$..$];
 }
 }
 }

 Takes 7 seconds on my machine instead of 6, but not 10 like
 std.algorithm.splitter. I don't even like the loop that well, it looks
 crude, I can probably optimize it further.

 And it does not use any of your specified tricks.

 Is this sufficiently comparable, or am I missing something else?

 -Steve

That's comparable, please file an enh request. Thanks, Andrei
Mar 04 2013
prev sibling next sibling parent "simendsjo" <simendsjo gmail.com> writes:
On Friday, 1 March 2013 at 20:50:15 UTC, Andrei Alexandrescu 
wrote:
 On 3/1/13 3:30 PM, cvk012c wrote:
 Tried to port my SIP parser from Python to D to boost 
 performance
 but got opposite result.
 I created a simple test script which splits SIP REGISTER 
 message
 10 million times. Python version takes about 14 seconds to
 execute, D version is about 23 seconds which is 1.6 times 
 slower.
 I used DMD 2.062 and compiled my script with options -release 
 and
 -O. I used Python 3.3 64 bit.
 I ran both scripts on the same hardware with Windows 7 64.

Add -inline to the options. Andrei

--noboundscheck can also help if you don't mind missing the safety net. $ rdmd -O -release sip 22 secs, 977 ms, 299 μs, and 8 hnsecs $ rdmd -O -release -inline sip 12 secs, 245 ms, 567 μs, and 9 hnsecs $ rdmd -O -release -inline -noboundscheck sip 10 secs, 171 ms, 209 μs, and 9 hnsecs
Mar 01 2013
prev sibling next sibling parent "jerro" <a a.com> writes:
On Friday, 1 March 2013 at 20:30:24 UTC, cvk012c wrote:
 Tried to port my SIP parser from Python to D to boost 
 performance
 but got opposite result.
 I created a simple test script which splits SIP REGISTER message
 10 million times. Python version takes about 14 seconds to
 execute, D version is about 23 seconds which is 1.6 times 
 slower.
 I used DMD 2.062 and compiled my script with options -release 
 and
 -O. I used Python 3.3 64 bit.
 I ran both scripts on the same hardware with Windows 7 64.
 Is this general problem with string performance in D or just
 splitter() issue?
 Did anybody compared performance of D string manipulating
 functions with other languages like Python,Perl,Java and C++?

I'm guessing you are building without optimization options. When compiled with "dmd -O -inline -noboundscheck -release tmp" the D code takes 11.1 seconds on my machine and the python script takes 16.1 seconds. You can make the D code faster by building with LDC or GDC: ldmd2 -O -noboundscheck -release tmp: 6.8 seconds gdmd -O -nboundscheck -inline -release tmp: 6.1 seconds So no, not slower than Python. I also suspect that much of the work in the Python code is actually done by functions that are implemented in C.
Mar 01 2013
prev sibling next sibling parent Robert <jfanatiker gmx.at> writes:
Hm, just recently a friend of mine and I hacked together at the FB
hacking cup, he in python and I in D. My solutions always were at least
faster by a factor of 80. For your example, I could not get a factor of
80, but with 
 -inline
it is at least faster than the python version (about 30% faster on my
machine)

Best regards,

Robert
Mar 01 2013
prev sibling next sibling parent reply "cvk012c" <cvk012c motorolasolutions.com> writes:
On Friday, 1 March 2013 at 20:58:09 UTC, simendsjo wrote:
 On Friday, 1 March 2013 at 20:50:15 UTC, Andrei Alexandrescu 
 wrote:
 On 3/1/13 3:30 PM, cvk012c wrote:
 Tried to port my SIP parser from Python to D to boost 
 performance
 but got opposite result.
 I created a simple test script which splits SIP REGISTER 
 message
 10 million times. Python version takes about 14 seconds to
 execute, D version is about 23 seconds which is 1.6 times 
 slower.
 I used DMD 2.062 and compiled my script with options -release 
 and
 -O. I used Python 3.3 64 bit.
 I ran both scripts on the same hardware with Windows 7 64.

Add -inline to the options. Andrei

--noboundscheck can also help if you don't mind missing the safety net. $ rdmd -O -release sip 22 secs, 977 ms, 299 μs, and 8 hnsecs $ rdmd -O -release -inline sip 12 secs, 245 ms, 567 μs, and 9 hnsecs $ rdmd -O -release -inline -noboundscheck sip 10 secs, 171 ms, 209 μs, and 9 hnsecs

On my hardware with -inline options it now takes about 15 secs which is still slower than Python but with both -inline and -noboundscheck it takes 13 secs and finally beats Python. But I still kind of disappointed because I expected a much better performance boost and got only 7%. Counting that Python is not the fastest scripting language I think that similar Perl and Java scripts will outperform D easily. Thanks Andrei and simendsjo for a quick response though.
Mar 01 2013
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Mar 02, 2013 at 12:02:08PM -0800, Walter Bright wrote:
 On 3/2/2013 11:42 AM, Russel Winder wrote:
On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
On 3/2/2013 7:43 AM, Russel Winder wrote:
For writing interpreters, RPython spanks C.

What's RPython doing that makes it faster?

Allowing PyPy to have a good JIT compiler.

I don't understand. Does that JIT generate faster code than a C compiler would generate?

I don't know, but my wild guess is that a JIT optimizes the *right* hotspots based on real-time performance measurements, whereas a lot of C programmers are obsessed with optimizing what they *think* are the hotspots, but which really aren't. T -- Everybody talks about it, but nobody does anything about it! -- Mark Twain
Mar 02 2013
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, 2013-03-02 at 12:52 -0800, Walter Bright wrote:
 On 3/2/2013 12:08 PM, H. S. Teoh wrote:
 On Sat, Mar 02, 2013 at 12:02:08PM -0800, Walter Bright wrote:
 On 3/2/2013 11:42 AM, Russel Winder wrote:
 On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
 On 3/2/2013 7:43 AM, Russel Winder wrote:
 For writing interpreters, RPython spanks C.

What's RPython doing that makes it faster?

Allowing PyPy to have a good JIT compiler.

I don't understand. Does that JIT generate faster code than a C compiler would generate?

I don't know, but my wild guess is that a JIT optimizes the *right* hotspots based on real-time performance measurements, whereas a lot of =


 programmers are obsessed with optimizing what they *think* are the
 hotspots, but which really aren't.

I meant what the C *compiler* generates.

Yes because the C/C++/D/etc. compilers are attempting to predict the control flow of the program in execution and optimize all cases for all possibilities. JIT's are just focussing on the runtime bottlenecks with the actual data as being used. This allows for more focussed code generation in the actual context. I would suspect that in many cases the generated code is effectively the same but JITs can often do unexpected and faster codes because they have more data to optimize with and less optimization to do. To say more would require actual comparative data and I suspect no-one on this list will do that. =20 --=20 Russel. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.n= et 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Mar 03 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Sunday, 3 March 2013 at 08:03:53 UTC, Russel Winder wrote:
 Yes because the C/C++/D/etc. compilers are attempting to 
 predict the
 control flow of the program in execution and optimize all cases 
 for all
 possibilities. JIT's are just focussing on the runtime 
 bottlenecks with
 the actual data as being used. This allows for more focussed 
 code
 generation in the actual context. I would suspect that in many 
 cases the
 generated code is effectively the same but JITs can often do 
 unexpected
 and faster codes because they have more data to optimize with 
 and less
 optimization to do.

That is theory, but in practice, it doesn't work that well : you have instrument the code to get measure to optimize according to runtime data. Plus you can't monkey patch codegen (it is unspecified how x86 CPU load instructions, and so it isn't guaranteed that the CPU won't see garbage). That cost, plus the cost of the optimization itself will make the whole thing worth it only if the win when you optimize is high.
 To say more would require actual comparative data and I suspect 
 no-one
 on this list will do that.

It isn't worth. As explained above, you can conclude that this is highly dependent of the code you manipulate and the runtime data that is throw at it. Note that I discussed with people from LLVM on how this can be done in statically compiled code. In fact, it can but it is rather complicated and not worth automate for now.
Mar 03 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 and I have trouble thinking of any significant improvements to 
 code gen that could be made with runtime data.

I've seen that the JavaHotSpot is able to unroll loops with a length known only at run-time. It unrolls them only when the run-time statistics say it's advantageous. (In theory the same is possible with profile-driven optimization). Bye, bearophile
Mar 03 2013
prev sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Sunday, 3 March 2013 at 11:57:36 UTC, bearophile wrote:
 Walter Bright:

 and I have trouble thinking of any significant improvements to 
 code gen that could be made with runtime data.

I've seen that the JavaHotSpot is able to unroll loops with a length known only at run-time. It unrolls them only when the run-time statistics say it's advantageous. (In theory the same is possible with profile-driven optimization). Bye, bearophile

I believe the Hotspot runtime compiler is about the only one to perform runtime optimizations like that. I don't know if the CIL does this kind of things.
Mar 03 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
cvk012c:

 I think that similar Perl and Java scripts will outperform D 
 easily.
 Thanks Andrei and simendsjo for a quick response though.

Why don't you write a Java version? It takes only few minutes, and you will have one more data point. Python string functions are written in C, compiled very efficiently (the standard Python binaries on Windows are compiled with the Microsoft Compiler, but also Intel compile can be found), and they are well optimized in several years of work by people like Hettinger :-) You will see Python2 code like this easily beat D for normal text files: for line in file(foo.txt): ... Both D and general performance aren't magical things. Performance comes from a long work of optimization of algorithms, code and compilers/virtual machines that run them. Bye, bearophile
Mar 01 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Mar 2013 16:28:08 -0500, cvk012c  
<cvk012c motorolasolutions.com> wrote:

 On my hardware with -inline options it now takes about 15 secs which is  
 still slower than Python but with both -inline and -noboundscheck it  
 takes 13 secs and finally beats Python.
 But I still kind of disappointed because I expected a much better  
 performance boost and got only 7%. Counting that Python is not the  
 fastest scripting language I think that similar Perl and Java scripts  
 will outperform D easily.
 Thanks Andrei and simendsjo for a quick response though.

Phobos kind of refuses to treat strings like arrays of characters, it insists on decoding. With DMD and a hand-written splitter, it takes 6 seconds instead of 10 on my system (64-bit macosx). struct MySplitter { private string s; private string source; this(string src) { source = src; popFront(); } property string front() { return s; } property bool empty() { return s.ptr == null; } void popFront() { s = source; if(!source.length) { source = null; } else { size_t i = 0; bool found = false; for(; i + 1 < source.length; i++) { if(source[i] == '\r' && source[i + 1] == '\n') { found = true; break; } } s = source[0..i]; if(found) source = source[i + 2..$]; else source = source[$..$]; } } } I'm sure splitter could be optimized to do the same thing I'm doing. Probably can reduce that a bit using pointers instead of strings. -Steve
Mar 01 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Mar 2013 17:35:04 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/1/13 5:31 PM, Steven Schveighoffer wrote:
 Phobos kind of refuses to treat strings like arrays of characters, it
 insists on decoding.

There's no decoding in find and splitter as far as I remember.

Looking at splitter, it uses skipOver to skip over the separator, and that seems to call R.front and R.popFront. Actually, it calls it for both the source string and the separator. Maybe fixing skipOver would fix the problem? Or don't call skipOver at all, since find isn't doing decoding, why do decoding when skipping the separator? I still feel we need a specialized string type... -Steve
Mar 01 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Mar 2013 18:07:45 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/1/13 5:47 PM, Steven Schveighoffer wrote:
 On Fri, 01 Mar 2013 17:35:04 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 3/1/13 5:31 PM, Steven Schveighoffer wrote:
 Phobos kind of refuses to treat strings like arrays of characters, it
 insists on decoding.

There's no decoding in find and splitter as far as I remember.

Looking at splitter, it uses skipOver to skip over the separator, and that seems to call R.front and R.popFront. Actually, it calls it for both the source string and the separator.

You may be looking at the wrong splitter overload.

Yes, I was. Very difficult to tell with the way template constraints are written! So it's just pure heuristics. Not hard to see how that would affect a 10 million cycle program. We may be able to create a string-specific version of splitter that would take advantage of the representation. -Steve
Mar 01 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Mar 2013 18:22:54 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 So it's just pure heuristics.  Not hard to see how that would affect a  
 10 million cycle program.

 We may be able to create a string-specific version of splitter that  
 would take advantage of the representation.

Just found a disturbing artifact: splitter(message, '\n') is more than twice as slow as splitter(message, "\n") -Steve
Mar 01 2013
prev sibling next sibling parent "cvk012c" <cvk012c motorolasolutions.com> writes:
On Friday, 1 March 2013 at 21:52:13 UTC, bearophile wrote:
 cvk012c:

 I think that similar Perl and Java scripts will outperform D 
 easily.
 Thanks Andrei and simendsjo for a quick response though.

Why don't you write a Java version? It takes only few minutes, and you will have one more data point.

You are right. Why not. But instead of using Java split() method I used combination of indexOf() and substring() methods to do the same job. The reason: Java split method implemented as a regular expression which will be unfair to compare to D splitter. Again, I created a similar D version of the script, compiled it with all suggested options: -release -O -inline -noboundscheck and this time D version is more then twice slower than Java: 8.4 seconds vs 4. D experts, please, take a look at my code and tell me what is wrong with it. Java version: public class Test { public static void main(String[] args) { String message = "REGISTER sip:comm.example.com SIP/2.0\r\nContent-Length: 0\r\nContact: <sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-su mary\";q=0.9\r\nTo: <sip:12345 comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID: 2910497622026445\r\nFrom: <sip:12345 comm.example.com>;tag=2910497618150713\r\n\r\n"; long t1 = System.currentTimeMillis(); for (int i=0; i<10000000; i++) { int index1 = 0; while (true) { int index2 = message.indexOf("\r\n", index1); if (index2 == -1) break; String notused = message.substring(index1, index2); index1 = index2+2; } } System.out.println(System.currentTimeMillis()-t1); } } D version: import std.stdio,std.string,std.datetime; void main() { auto message = "REGISTER sip:example.com SIP/2.0\r\nContent-Length: 0\r\nContact: <sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-su mary\";q=0.9\r\nTo: <sip:12345 comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID: 2910497622026445\r\nFrom: <sip:12345 comm.example.com>;tag=2910497618150713\r\n\r\n"; auto t1 = Clock.currTime(); for (int i=0; i<10000000; i++) { auto substring = message; while (true) { auto index = indexOf(substring, "\r\n"); if (index == -1) break; auto notused = substring[0..index]; substring = substring[index+2..$]; } } writeln(Clock.currTime()-t1); }
Mar 01 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Mar 2013 19:19:35 -0500, cvk012c  
<cvk012c motorolasolutions.com> wrote:

 On Friday, 1 March 2013 at 21:52:13 UTC, bearophile wrote:
 cvk012c:

 I think that similar Perl and Java scripts will outperform D easily.
 Thanks Andrei and simendsjo for a quick response though.

Why don't you write a Java version? It takes only few minutes, and you will have one more data point.

You are right. Why not. But instead of using Java split() method I used combination of indexOf() and substring() methods to do the same job. The reason: Java split method implemented as a regular expression which will be unfair to compare to D splitter. Again, I created a similar D version of the script, compiled it with all suggested options: -release -O -inline -noboundscheck and this time D version is more then twice slower than Java: 8.4 seconds vs 4. D experts, please, take a look at my code and tell me what is wrong with it.

Try my hand-written version (elsewhere in thread). I think it can be done better too (use pointers instead of arrays). The issue is a combination of the fact that: 1. splitter is designed for any range, not just strings. Not an excuse really, but a string-specific version could be written that does better (clearly). 2. dmd is not always the best optimizer. I saw one other person who said using a different d compiler resulted in a quicker time. 3. Any time you are looping 10 million times, small insignificant differences will be magnified. Note one other thing -- Be VERY wary of test cases that are fully-visible at compile time. Very smart compilers are known to reduce your code to something that isn't realistic. I remember seeing a similar comparison not too long ago where one wondered why g++ was so much faster (0.09 seconds or something) than D (4 or more seconds). Turns out, the g++ compiler optimized out his ENTIRE program :). You may want to read in the "message" string at runtime to avoid such issues. -Steve
Mar 01 2013
prev sibling next sibling parent "cvk012c" <cvk012c motorolasolutions.com> writes:
On Saturday, 2 March 2013 at 00:47:02 UTC, Steven Schveighoffer 
wrote:
 On Fri, 01 Mar 2013 19:19:35 -0500, cvk012c 
 <cvk012c motorolasolutions.com> wrote:

 On Friday, 1 March 2013 at 21:52:13 UTC, bearophile wrote:
 cvk012c:

 I think that similar Perl and Java scripts will outperform D 
 easily.
 Thanks Andrei and simendsjo for a quick response though.

Why don't you write a Java version? It takes only few minutes, and you will have one more data point.

You are right. Why not. But instead of using Java split() method I used combination of indexOf() and substring() methods to do the same job. The reason: Java split method implemented as a regular expression which will be unfair to compare to D splitter. Again, I created a similar D version of the script, compiled it with all suggested options: -release -O -inline -noboundscheck and this time D version is more then twice slower than Java: 8.4 seconds vs 4. D experts, please, take a look at my code and tell me what is wrong with it.

The issue is a combination of the fact that: 1. splitter is designed for any range, not just strings. Not an excuse really, but a string-specific version could be written that does better (clearly).

In my latest version of D script I didn't use splitter at all. I used string specific indexOf function. Still result is very bad. For text based protocols, such as SIP, performance of string manipulating functions is very important. Unfortunately, looks like it is not D strongest point at this time.
Mar 01 2013
prev sibling next sibling parent "anon123" <z z.z> writes:
On Saturday, 2 March 2013 at 01:05:35 UTC, cvk012c wrote:
 On Saturday, 2 March 2013 at 00:47:02 UTC, Steven Schveighoffer 
 wrote:
 On Fri, 01 Mar 2013 19:19:35 -0500, cvk012c 
 <cvk012c motorolasolutions.com> wrote:

 On Friday, 1 March 2013 at 21:52:13 UTC, bearophile wrote:
 cvk012c:

 I think that similar Perl and Java scripts will outperform 
 D easily.
 Thanks Andrei and simendsjo for a quick response though.

Why don't you write a Java version? It takes only few minutes, and you will have one more data point.

You are right. Why not. But instead of using Java split() method I used combination of indexOf() and substring() methods to do the same job. The reason: Java split method implemented as a regular expression which will be unfair to compare to D splitter. Again, I created a similar D version of the script, compiled it with all suggested options: -release -O -inline -noboundscheck and this time D version is more then twice slower than Java: 8.4 seconds vs 4. D experts, please, take a look at my code and tell me what is wrong with it.

The issue is a combination of the fact that: 1. splitter is designed for any range, not just strings. Not an excuse really, but a string-specific version could be written that does better (clearly).

In my latest version of D script I didn't use splitter at all. I used string specific indexOf function. Still result is very bad. For text based protocols, such as SIP, performance of string manipulating functions is very important. Unfortunately, looks like it is not D strongest point at this time.

My version is functionally equivalent, and measures 55 - 94 ms (depending on compiler; LDC is best). Your version performed in about 7s on my machine. Similar optimization taking advantage of immutability can be done on your python translation that results in measurements of <1ms. import std.stdio,std.string,std.datetime; void main() { auto message = "REGISTER sip:example.com SIP/2.0\r\nContent-Length: 0\r\nContact: <sip:12345 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-su mary\";q=0.9\r\nTo: <sip:12345 comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID: 2910497622026445\r\nFrom: <sip:12345 comm.example.com>;tag=2910497618150713\r\n\r\n"; auto t1 = Clock.currTime(); for (int i=0; i<10_000_000; i++) { while (true) { auto index = indexOf(message, "\r\n"); if (index == -1) break; auto notused = message[0..index]; message = message[index+2..$]; } } writeln(Clock.currTime()-t1); }
Mar 01 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Mar 2013 20:05:34 -0500, cvk012c  
<cvk012c motorolasolutions.com> wrote:

 In my latest version of D script I didn't use splitter at all. I used  
 string specific indexOf function. Still result is very bad. For text  
 based protocols, such as SIP, performance of string manipulating  
 functions is very important. Unfortunately, looks like it is not D  
 strongest point at this time.

indexOf uses the same mechanism as splitter to find the separators. If it doesn't improve anything, I'd say that is where the problem lies (std.algorithm.find). -Steve
Mar 01 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Mar 2013 20:19:19 -0500, anon123 <z z.z> wrote:

 My version is functionally equivalent, and measures 55 - 94 ms  
 (depending on compiler; LDC is best). Your version performed in about 7s  
 on my machine.

Try printing out message each time through the 10,000,000 iterations -Steve
Mar 01 2013
prev sibling next sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Friday, 1 March 2013 at 22:02:02 UTC, Timon Gehr wrote:
 On 03/01/2013 10:28 PM, cvk012c wrote:
 ...

 On my hardware with -inline options it now takes about 15 secs 
 which is
 still slower than Python but with both -inline and 
 -noboundscheck it
 takes 13 secs and finally beats Python.
 But I still kind of disappointed because I expected a much 
 better
 performance boost and got only 7%. Counting that Python is not 
 the
 fastest scripting language I think that similar Perl and Java 
 scripts
 will outperform D easily.

Never make such statements without doing actual measurements. Furthermore, it is completely meaningless anyway. Performance benchmarks always compare language implementations, not languages. (Case in point: You get twice the speed by using another compiler backend implementation.)

Still, there is a case to be made for a performance tests suite that could be run after (or before) each release of the language, like http://speed.pypy.org .
Mar 01 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 2 March 2013 at 00:47:02 UTC, Steven Schveighoffer 
wrote:
 Try my hand-written version (elsewhere in thread).  I think it 
 can be done better too (use pointers instead of arrays).

That is usually a bad idea as it will fuck up pretty bad the aliasing analysis capabilities of the compiler.
Mar 02 2013
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 2 March 2013 at 01:10:39 UTC, Walter Bright wrote:
 On 3/1/2013 1:28 PM, cvk012c wrote:
 But I still kind of disappointed because I expected a much 
 better performance
 boost and got only 7%. Counting that Python is not the fastest 
 scripting
 language I think that similar Perl and Java scripts will 
 outperform D easily.
 Thanks Andrei and simendsjo for a quick response though.

Python's splitter, which you are measuring, isn't written in Python. It is written in C. You're actually comparing a bit of C code with a bit of D code.

This. If you wrote the splitter in pure python you would see an enormous performance gap between it and the D version. Having said that, maybe there are even more improvements we can make to improve speed in that part of photos, but considering we're already on a par with some quite mature and optimised C, there really isn't a problem.
Mar 02 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
deadalnix:

 That is usually a bad idea as it will fuck up pretty bad the 
 aliasing analysis capabilities of the compiler.

With LDC I've seen that using raw pointers sometimes gives a little extra performance, but with DMD I've some times a lower performance. Bye, bearophile
Mar 02 2013
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, 2013-03-02 at 10:33 -0500, Andrei Alexandrescu wrote:
[=E2=80=A6]
 That conclusion would be hasty if not missing the whole point. You=20
 essentially measured the speed of one loop in various translators=20
 implementing various languages. Java code doing straight computation is=

 on a par with C speed, no two ways about that. Python code using library=

 primitives ain't no slouch either. Performance tuning in these languages=

 becomes more difficult in larger applications where data layout,=20
 allocation, and indirect function calls start to dominate.

Interestingly, there isn't only one Python implementation. There is only one language but there is CPython, PyPy, Jython, IronPython, to mention but 4. On computationally intensive code, PyPy (Python execution environment in RPython) is generally 10 to 30 times faster than CPython (Python execution environment written in C). C is a (reasonably) well known and used language thought to create fast code. RPython is Python but with some restrictions that is statically compiled. For writing interpreters, RPython spanks C. PyPy is not the only language using RPython to implement the interpreter. C's days in this game are seriously numbered. --=20 Russel. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.n= et 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Mar 02 2013
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 2 March 2013 at 15:43:57 UTC, Russel Winder wrote:
 On Sat, 2013-03-02 at 10:33 -0500, Andrei Alexandrescu wrote:
 […]
 That conclusion would be hasty if not missing the whole point. 
 You essentially measured the speed of one loop in various 
 translators implementing various languages. Java code doing 
 straight computation is on a par with C speed, no two ways 
 about that. Python code using library primitives ain't no 
 slouch either. Performance tuning in these languages becomes 
 more difficult in larger applications where data layout, 
 allocation, and indirect function calls start to dominate.

Interestingly, there isn't only one Python implementation. There is only one language but there is CPython, PyPy, Jython, IronPython, to mention but 4. On computationally intensive code, PyPy (Python execution environment in RPython) is generally 10 to 30 times faster than CPython (Python execution environment written in C). C is a (reasonably) well known and used language thought to create fast code. RPython is Python but with some restrictions that is statically compiled. For writing interpreters, RPython spanks C. PyPy is not the only language using RPython to implement the interpreter. C's days in this game are seriously numbered.

I'm not sure that's entirely fair. PyPy is fast because it implements a JIT, not because it's written in RPython.
Mar 02 2013
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 2 March 2013 at 15:43:57 UTC, Russel Winder wrote:
 C is a (reasonably) well known and used language thought to 
 create fast
 code. RPython is Python but with some restrictions that is 
 statically
 compiled.  For writing interpreters, RPython spanks C. PyPy is 
 not the
 only language using RPython to implement the interpreter. C's 
 days in
 this game are seriously numbered.

Source? (Googling "rpython interpreter speed" didn't show anything) There's always claims that systems like RPython, or Haskell, or LuaJIT, or Java HotSpot are able to rival or beat C, yet all I ever see are small toy programs. Show me something big. Something with 100's of thousands of lines of code. If all this is true, where are the video games written in these languages that rival performance/fidelity of AAA titles? I've never seen that. Ever.
Mar 02 2013
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 2 March 2013 at 16:06:57 UTC, Peter Alexander wrote:
 On Saturday, 2 March 2013 at 15:43:57 UTC, Russel Winder wrote:
 C is a (reasonably) well known and used language thought to 
 create fast
 code. RPython is Python but with some restrictions that is 
 statically
 compiled.  For writing interpreters, RPython spanks C. PyPy is 
 not the
 only language using RPython to implement the interpreter. C's 
 days in
 this game are seriously numbered.

Source? (Googling "rpython interpreter speed" didn't show anything) There's always claims that systems like RPython, or Haskell, or LuaJIT, or Java HotSpot are able to rival or beat C, yet all I ever see are small toy programs. Show me something big. Something with 100's of thousands of lines of code. If all this is true, where are the video games written in these languages that rival performance/fidelity of AAA titles? I've never seen that. Ever.

Rpython has been used to write a very fast jit (pypy) which is not the same as an interpreter. The interpreter is what you fall back to when you can't use the jit. I don't know how fast pypys interpreter is specifically, but it's not it's main selling point.
Mar 02 2013
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 2 March 2013 at 16:09:02 UTC, Jacob Carlborg wrote:
 On 2013-03-02 02:10, Walter Bright wrote:

 Python's splitter, which you are measuring, isn't written in 
 Python. It
 is written in C. You're actually comparing a bit of C code 
 with a bit of
 D code.

Does that really matter. He's using Python, if the function is part of the standard library and if it's implement in Python or C doesn't really matter.

It does. Failing to beat mature, optimised C is not anything to be ashamed of, but being slower than python would be an abject failure of D as a compiled, systems programming language.
Mar 02 2013
prev sibling next sibling parent "jerro" <a a.com> writes:
 Does that really matter. He's using Python, if the function is 
 part of the standard library and if it's implement in Python or 
 C doesn't really matter.

You can look at it that way, but still, the fact that most of the work in the Python version is done by C code makes the timings less surprising. If split() was implemented in pure python and it would be only three times slower (when run with CPython) than std.algorithm.splitter, that would be very surprising and a sign that std.algorithm.splitter is doing something horribly wrong.
Mar 02 2013
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, 2013-03-02 at 16:55 +0100, John Colvin wrote:
[=E2=80=A6]
 I'm not sure that's entirely fair. PyPy is fast because it=20
 implements a JIT, not because it's written in RPython.

The second sentence is almost entirely true, the first I am not so sure. Armin Rigo used to manage Psycho which was an amendment to the CPython code base to include a JIT. He quit trying to keep that going and so CPython doesn't have a JIT of any real importance. Unladen Swallow dissapeared. The JIT framework is an integral part of RPython rather than being specific only to PyPy. The point is that for I/O bound systems all languages are more or less equivalent, for computational intensive code, JITs are needed for virtual machine based systems or they will not compete. --=20 Russel. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.n= et 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Mar 02 2013
prev sibling next sibling parent "cvk012c" <cvk012c motorolasolutions.com> writes:
On Saturday, 2 March 2013 at 15:33:42 UTC, Andrei Alexandrescu 
wrote:
 On 3/1/13 8:05 PM, cvk012c wrote:
 In my latest version of D script I didn't use splitter at all. 
 I used
 string specific indexOf function. Still result is very bad. 
 For text
 based protocols, such as SIP, performance of string 
 manipulating
 functions is very important. Unfortunately, looks like it is 
 not D
 strongest point at this time.

That conclusion would be hasty if not missing the whole point. You essentially measured the speed of one loop in various translators implementing various languages. Java code doing straight computation is on a par with C speed, no two ways about that. Python code using library primitives ain't no slouch either.

That is true, but what prevents D to do the same? Why Java code can be on par with C speed and D cannot? If Java and Python folks can do it, why D creators cannot? If performance optimizations is always number one priority in Java and Python community, why it is not a case for D? I will go even further: what prevents D to be faster than C? That's where D will really shine and people will start talking about. But that is probably a distant future of D and for now, my personal opinion is that ANY part of D code should not be slower than at least Java. If it is, it should be reported as an issues and fixed. Being more than twice slower than Java is a failure.
Mar 02 2013
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, 2013-03-02 at 17:06 +0100, Peter Alexander wrote:
[=E2=80=A6]
 Source? (Googling "rpython interpreter speed" didn't show=20
 anything)

For me, my source is my microbenchmarks, which are testing just loop with computation, i.e compute intensive data parallel. There is plenty of other stuff.
 There's always claims that systems like RPython, or Haskell, or=20
 LuaJIT, or Java HotSpot are able to rival or beat C, yet all I=20
 ever see are small toy programs. Show me something big. Something=20
 with 100's of thousands of lines of code.

100,000+ lines of code is a total irrelevance regarding system performance, as anyone who knows anything of profiling and benchmarking will tell you. If C is the only viable language why has Java got the majority of the commercial market. C++ and Fortran have the vast majority of the computational world.=20
 If all this is true, where are the video games written in these=20
 languages that rival performance/fidelity of AAA titles? I've=20
 never seen that. Ever.

Where is D is this arena. It is either C++ or C++ in the high performance graphics arena. If there are no AAA titles written in D why does this email list have anyone on it? Clearly I think your response is just trolling, I will stop feeding the troll. --=20 Russel. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.n= et 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Mar 02 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Mar 2013 12:17:54 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/1/13 8:29 PM, Steven Schveighoffer wrote:
 On Fri, 01 Mar 2013 20:05:34 -0500, cvk012c
 <cvk012c motorolasolutions.com> wrote:

 In my latest version of D script I didn't use splitter at all. I used
 string specific indexOf function. Still result is very bad. For text
 based protocols, such as SIP, performance of string manipulating
 functions is very important. Unfortunately, looks like it is not D
 strongest point at this time.

indexOf uses the same mechanism as splitter to find the separators. If it doesn't improve anything, I'd say that is where the problem lies (std.algorithm.find).

find and indexOf on arrays are on par with handwritten loops.

This is not a personal attack, please don't take it that way. My anecdotal tests with hand writing a custom splitter range to handle the OP's program gave me a 40% savings. Whether it's find or not, I'm not sure, but there definitely is room for improvement. Using indexOf instead of splitter seems to result in the same time usage, so something that is common between the two would be a logical choice to target. It would be interesting to examine the assembly differences, but I haven't got the time right now. -Steve
Mar 02 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Mar 2013 12:14:07 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/2/13 11:12 AM, Jacob Carlborg wrote:
 On 2013-03-01 23:31, Steven Schveighoffer wrote:

 Phobos kind of refuses to treat strings like arrays of characters, it
 insists on decoding.

 With DMD and a hand-written splitter, it takes 6 seconds instead of 10
 on my system (64-bit macosx).

If you need to write a custom splitter to get acceptable performance then it sounds to me that D or Phobos have failed.

I wrote a custom splitter specialized for arrays. It's as fast.

Why doesn't it get used for strings in this instance? Maybe there is a bad constraint somewhere. -Steve
Mar 02 2013
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
 On 3/2/2013 7:43 AM, Russel Winder wrote:
 For writing interpreters, RPython spanks C.

What's RPython doing that makes it faster?

Allowing PyPy to have a good JIT compiler. --=20 Russel. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.n= et 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Mar 02 2013
prev sibling next sibling parent "jerro" <a a.com> writes:
On Saturday, 2 March 2013 at 19:42:47 UTC, Russel Winder wrote:
 On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
 On 3/2/2013 7:43 AM, Russel Winder wrote:
 For writing interpreters, RPython spanks C.

What's RPython doing that makes it faster?

Allowing PyPy to have a good JIT compiler.

And why do you think this wouldn't be possible if it was written in C? There are fast JITs written in C.
Mar 02 2013
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 2 March 2013 at 19:19:09 UTC, Russel Winder wrote:
 100,000+ lines of code is a total irrelevance regarding system
 performance, as anyone who knows anything of profiling and 
 benchmarking
 will tell you.

I don't care about benchmarking, I care about writing large, fast systems. Small toy programs are irrelevant because they are not representative of real programs. Real programs are large (many thousands or millions of lines of code) and you don't have the luxury of being able to hand optimize every line of code.
 If C is the only viable language why has Java got the majority 
 of the
 commercial market. C++ and Fortran have the vast majority of the
 computational world.

Commercial market for what? For high performance applications C++ is still champion.
 If all this is true, where are the video games written in 
 these languages that rival performance/fidelity of AAA titles? 
 I've never seen that. Ever.

Where is D is this arena. It is either C++ or C++ in the high performance graphics arena. If there are no AAA titles written in D why does this email list have anyone on it?

D is getting there. Remedy Games are sponsoring DConf, and I predict we'll start to see some high quality indie games over the next few years written in D. Up till now, I imagine language stability has been the main thing holding back D. Tomasz was working on a high quality renderer before he got fed up with D's instability (http://h3.gd/code/nucleus/).
Mar 02 2013
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 2 March 2013 at 20:02:09 UTC, Walter Bright wrote:
 On 3/2/2013 11:42 AM, Russel Winder wrote:
 On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
 On 3/2/2013 7:43 AM, Russel Winder wrote:
 For writing interpreters, RPython spanks C.

What's RPython doing that makes it faster?

Allowing PyPy to have a good JIT compiler.

I don't understand. Does that JIT generate faster code than a C compiler would generate?

In some cases yes, it can, but as you know that has nothing to do with the language the jit is written in. As I've said elsewhere, pypy being fast compared to cpython has nothing to do with the former being written in rpython and the latter in c* *other than perhaps the higher productivity of writing in rpython allowing more resources to be dedicated to optimisation.
Mar 02 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Mar 2013 16:16:22 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I fear there's a misunderstanding somewhere. Splitter and find are  
 already specialized appropriately for strings.

OK, I would then counter that I have created a splitter range that beats its performance, by a lot. As far as I can tell, I'm not doing anything special. All I feel that I'm doing differently is instead of using find or indexOf, I'm looking at each char to see if it matches the first in the separator, and if it does, then I'm comparing the remaining characters. Pretty braindead, probably room for improvement. Splitter should beat this, or match it at least. From testing the sample code in this thread, my simple version beats splitter by 40%. Something isn't right somewhere, the difference should not be that drastic. -Steve
Mar 02 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Mar 2013 18:14:28 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/2/13 5:48 PM, Steven Schveighoffer wrote:
 On Sat, 02 Mar 2013 16:16:22 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 I fear there's a misunderstanding somewhere. Splitter and find are
 already specialized appropriately for strings.

OK, I would then counter that I have created a splitter range

Prolly a link would clarify that.

Forget this, I read your other post. The MySplitter range is the one I was referring to. I saw your points and will respond there, but not today. I don't have time right now to continue the debate. -Steve
Mar 02 2013
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 3 March 2013 at 15:09:58 UTC, Jacob Carlborg wrote:
 On 2013-03-02 17:48, John Colvin wrote:

 It does.

 Failing to beat mature, optimised C is not anything to be 
 ashamed of,
 but being slower than python would be an abject failure of D 
 as a
 compiled, systems programming language.

Then D needs to get faster, or we need to switch to C for some std lib functions. In that case, as Walter said, we have failed.

I agree that anything that makes us faster is good, but I wouldn't go so far as to say we've failed if we're not as fast as the very fastest.
Mar 03 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Sunday, 3 March 2013 at 20:18:58 UTC, Walter Bright wrote:
 On 3/3/2013 7:09 AM, Jacob Carlborg wrote:
 On 2013-03-02 17:48, John Colvin wrote:

 It does.

 Failing to beat mature, optimised C is not anything to be 
 ashamed of,
 but being slower than python would be an abject failure of D 
 as a
 compiled, systems programming language.

Then D needs to get faster, or we need to switch to C for some std lib functions. In that case, as Walter said, we have failed.

Nothing in this thread suggests that D needs to switch its library implementations to C. Interestingly, I tried the indexOf() benchmark using C's memchr() and memcmp() from VC's runtime library. It was not faster than Andrei's optimized D one.

Maybe it is time to look at the python implementation and see why it is faster.
Mar 03 2013
prev sibling next sibling parent "jerro" <a a.com> writes:
 Maybe it is time to look at the python implementation and see 
 why it is faster.

It isn't faster: $ time python3 test.py real 0m14.217s user 0m14.209s sys 0m0.004s $ gdmd -O -inline -release -noboundscheck test $ time ./test real 0m5.323s user 0m5.312s sys 0m0.008s D code here uses the same string as the python code, not the one in cvk012c's D code.
Mar 03 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
jerro:

 $ time python3 test.py

Are Python3 strings still like wstrings/dstrings, or have they applied the patch that makes them UTF8? Bye, bearophile
Mar 03 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Monday, 4 March 2013 at 03:20:57 UTC, jerro wrote:
 Maybe it is time to look at the python implementation and see 
 why it is faster.

It isn't faster: $ time python3 test.py real 0m14.217s user 0m14.209s sys 0m0.004s $ gdmd -O -inline -release -noboundscheck test $ time ./test real 0m5.323s user 0m5.312s sys 0m0.008s D code here uses the same string as the python code, not the one in cvk012c's D code.

Using noboundcheck isn't fair as you give up on safety, so you are not equivalent to python either.
Mar 03 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Monday, 4 March 2013 at 04:36:30 UTC, Andrei Alexandrescu 
wrote:
 On 3/3/13 9:31 PM, deadalnix wrote:
 On Sunday, 3 March 2013 at 20:18:58 UTC, Walter Bright wrote:
 On 3/3/2013 7:09 AM, Jacob Carlborg wrote:
 On 2013-03-02 17:48, John Colvin wrote:

 It does.

 Failing to beat mature, optimised C is not anything to be 
 ashamed of,
 but being slower than python would be an abject failure of 
 D as a
 compiled, systems programming language.

Then D needs to get faster, or we need to switch to C for some std lib functions. In that case, as Walter said, we have failed.

Nothing in this thread suggests that D needs to switch its library implementations to C. Interestingly, I tried the indexOf() benchmark using C's memchr() and memcmp() from VC's runtime library. It was not faster than Andrei's optimized D one.

Maybe it is time to look at the python implementation and see why it is faster.

Was the conclusion that it's faster? Andrei

It seems that I was wrong, and missed the point that both benchmark weren't using the same string. Sorry for that.
Mar 03 2013
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, March 03, 2013 23:40:18 Walter Bright wrote:
 On 3/3/2013 11:38 PM, Jacob Carlborg wrote:
 On 2013-03-03 16:41, John Colvin wrote:
 I agree that anything that makes us faster is good, but I wouldn't go so
 far as to say we've failed if we're not as fast as the very fastest.

No, but if we need to drop down to C to get fast enough.

I can't think of a reason to need to do that.

Given how D works, there is something _very_ wrong if we have to drop to C, if nothing else, because it's trivial to write code in D which is equivalent to what it would be in C. If our implementation of something is worse than a C implementation, that merely means that it needs to be refactored and optimized. It's possible that some of our abstractions will hamper efficiency in some cases (e.g. depending on how ranges are used - particularly with strings - you risk a performance hit in comparison to pure C), but that should generally be fixable via specializations and whatnot. If we have an efficiency problem, it's merely an implementation issue which needs to be sorted out. There should be nothing inherent to D which makes it so that you can't write code as fast as C or C++. - Jonathan M Davis
Mar 04 2013
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, 2013-03-04 at 04:31 +0100, bearophile wrote:
[=E2=80=A6]
 Are Python3 strings still like wstrings/dstrings, or have they=20
 applied the patch that makes them UTF8?

Python3 is Unicode by default, with UTF-8 the default encoding. --=20 Russel. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.n= et 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Mar 04 2013
prev sibling next sibling parent "Don" <turnyourkidsintocash nospam.com> writes:
On Monday, 4 March 2013 at 03:58:20 UTC, deadalnix wrote:
 On Monday, 4 March 2013 at 03:20:57 UTC, jerro wrote:
 Maybe it is time to look at the python implementation and see 
 why it is faster.

It isn't faster: $ time python3 test.py real 0m14.217s user 0m14.209s sys 0m0.004s $ gdmd -O -inline -release -noboundscheck test $ time ./test real 0m5.323s user 0m5.312s sys 0m0.008s D code here uses the same string as the python code, not the one in cvk012c's D code.

Using noboundcheck isn't fair as you give up on safety, so you are not equivalent to python either.

But that does suggest that the main problem is that DMD has very little optimization aimed at reducing the number of bounds checks required.
Mar 04 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 there are semantics in D that allow faster code to be generated 
 than can be for C. They are unexploited at the moment, but they 
 are there.

This is interesting. Maybe you should write a small article on this. Bye, bearophile
Mar 04 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Russel Winder:

 Allowing PyPy to have a good JIT compiler.

PyPy is not just a JIT, it's a more complex beast. It's more like a meta-JIT. The only one I know of. Bye, bearophile
Mar 04 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Mar 2013 17:29:38 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/2/13 2:32 PM, Steven Schveighoffer wrote:
 This is not a personal attack, please don't take it that way.

Not at all. I'm just confused about this exchange. You mention some correct facts ("MySplitter finished in 6 seconds instead of 10"), then some mistaken suppositions ("a string-specific version could be written"), and derive conclusions from both ("Maybe there is a bad constraint somewhere"). Hard to deal with the mix.

I didn't realize the situation. I assumed that there wasn't a version of splitter for strings that took array-specific shortcuts. My corrected statement should read "the existing string-specific version could be improved." My conclusion of "Maybe there is a bad constraint somewhere" was not derived from that, it was based on your statement elsewhere that "I wrote a custom splitter specialized for arrays. It's as fast." Given that my tests have shown I can write a faster one quite easily than the implementation selected by phobos, and that you said you wrote one that's "as fast," I took that to mean you had written one that was more optimized than the chosen splitter version (and logically assumed you had included this version in Phobos with the intent it would be chosen when compiling with strings), leading me to suggest possibly that due to some bug, the "fast" implementation wasn't being chosen. I didn't realize that "as fast" didn't mean "as fast as yours". I actually don't know what you meant by that now.
 My
 anecdotal tests with hand writing a custom splitter range to handle the
 OP's program gave me a 40% savings. Whether it's find or not, I'm not
 sure, but there definitely is room for improvement.

I think I understand where the confusion comes from. If you're referring to MySplitter, that's not comparable. It uses this at the core: for(; i + 1 < source.length; i++) { if(source[i] == '\r' && source[i + 1] == '\n') { found = true; break; } ... } This is not close to the code that would work with arrays. We could specialize things for small arrays, but that hasn't been done yet. My point is it's not comparable with the classic brute force subsequence search.

Very good point, here is a new version that takes any string as a separator: struct MySplitter { private string s; private string separator; private string source; this(string src, string sep) { source = src; separator = sep; popFront(); } property string front() { return s; } property bool empty() { return s.ptr == null; } void popFront() { if(!source.length) { s = source; source = null; } else { size_t i = 0; bool found = false; for(; i + separator.length <= source.length; i++) { if(source[i] == separator[0]) { found = true; for(size_t j = i+1, k=1; k < separator.length; ++j, ++k) if(source[j] != separator[k]) { found = false; break; } if(found) break; } } s = source[0..i]; if(found) source = source[i + separator.length..$]; else source = source[$..$]; } } } Takes 7 seconds on my machine instead of 6, but not 10 like std.algorithm.splitter. I don't even like the loop that well, it looks crude, I can probably optimize it further. And it does not use any of your specified tricks. Is this sufficiently comparable, or am I missing something else? -Steve
Mar 04 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 03 Mar 2013 22:58:07 -0500, deadalnix <deadalnix gmail.com> wrote:

 On Monday, 4 March 2013 at 03:20:57 UTC, jerro wrote:
 Maybe it is time to look at the python implementation and see why it  
 is faster.

It isn't faster: $ time python3 test.py real 0m14.217s user 0m14.209s sys 0m0.004s $ gdmd -O -inline -release -noboundscheck test $ time ./test real 0m5.323s user 0m5.312s sys 0m0.008s D code here uses the same string as the python code, not the one in cvk012c's D code.

Using noboundcheck isn't fair as you give up on safety, so you are not equivalent to python either.

In this type of test, there is no danger with using noboundscheck. It's a very fair switch to use, D is just able to do it where python is not. A sufficiently smart compiler could eliminate the bounds check for this code, since it can be proven not to go out of bounds (in fact, a simple run without the noboundscheck proves it in very deterministic code like this). -Steve
Mar 04 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Mar 2013 12:20:22 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/1/13 6:26 PM, Steven Schveighoffer wrote:
 On Fri, 01 Mar 2013 18:22:54 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 So it's just pure heuristics. Not hard to see how that would affect a
 10 million cycle program.

 We may be able to create a string-specific version of splitter that
 would take advantage of the representation.

Just found a disturbing artifact: splitter(message, '\n') is more than twice as slow as splitter(message, "\n") -Steve

That is a problem. Could you please file a but report?

http://d.puremagic.com/issues/show_bug.cgi?id=9645 In trying to make a minimal test case, I found that the algorithm performs worse vs. the substring version as the number of content characters between separators grows. It actually starts out performing better than the substring version, by quite a bit (I would think we should be able to optimize there as well, the difference was about half) when the ratio is 1:1 (i.e. splitting "ababababa" on 'a' or "a"), but gets worse as you put more content between the separators. That should be a large clue. -Steve
Mar 04 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 04 Mar 2013 12:35:23 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 That's comparable, please file an enh request.

http://d.puremagic.com/issues/show_bug.cgi?id=9646
Mar 04 2013
prev sibling next sibling parent "Zach the Mystic" <reachBUTMINUSTHISzach gOOGLYmail.com> writes:
On Saturday, 2 March 2013 at 17:20:23 UTC, Andrei Alexandrescu 
wrote:
 On 3/1/13 6:26 PM, Steven Schveighoffer wrote:
 On Fri, 01 Mar 2013 18:22:54 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 So it's just pure heuristics. Not hard to see how that would 
 affect a
 10 million cycle program.

 We may be able to create a string-specific version of 
 splitter that
 would take advantage of the representation.

Just found a disturbing artifact: splitter(message, '\n') is more than twice as slow as splitter(message, "\n") -Steve

That is a problem. Could you please file a but report? Thanks, Andrei

I would have filed a bug report, but I was WAY too busy.
Mar 04 2013
prev sibling parent "Zach the Mystic" <reachBUTMINUSTHISzach gOOGLYmail.com> writes:
On Saturday, 2 March 2013 at 17:21:17 UTC, Andrei Alexandrescu
wrote:
 On 3/2/13 12:20 PM, Andrei Alexandrescu wrote:
 On 3/1/13 6:26 PM, Steven Schveighoffer wrote:
 On Fri, 01 Mar 2013 18:22:54 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 So it's just pure heuristics. Not hard to see how that would 
 affect a
 10 million cycle program.

 We may be able to create a string-specific version of 
 splitter that
 would take advantage of the representation.

Just found a disturbing artifact: splitter(message, '\n') is more than twice as slow as splitter(message, "\n") -Steve

That is a problem. Could you please file a but report?

s/but/bug/ :o)

Ohhhhh!
 Andrei

Mar 04 2013