digitalmars.D - Investigation: downsides of being generic and correct


"Dicebot" <m.strashun gmail.com> writes:

Want to bring into discussion people that are not on Google+. 
Samuel recently has posted there some simple experiments with 
bioinformatics and bad performance of Phobos-based snippet has 
surprised me.

I did explore issue a bit and reported results in a blog post 
(snippets are really small and simple) : 
http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

One open question remains though - can D/Phobos do better here? 
Can some changes be done to Phobos functions in question to 
improve performance or creating bioinformatics-specialized 
library is only practical solution?

May 16 2013

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 One open question remains though - can D/Phobos do better here? 
 Can some changes be done to Phobos functions in question to 
 improve performance or creating bioinformatics-specialized 
 library is only practical solution?

Of course things can be improved. For a start, pattern could be a 
template parameter so that most of the checks are inlined and 
const-folded.

Using count!(c => c=='G' || c=='C')(line) from std.algorithm 
would probably perform better as well.

Simply put, countchars is just the obvious naive implementation 
of the algorithm. It hasn't been tuned at all, and isn't suitable 
for use in a small kernel like this.
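A minimal sketch of both suggestions, for illustration only. `countGC` uses `count` from std.algorithm as proposed above. `countAnyOf` is a hypothetical helper (it is not a Phobos function) showing what taking the character set as a template parameter could look like. Both assume ASCII input, and `countAnyOf` compares raw code units, so it does no UTF decoding at all.

```d
import std.algorithm : count;
import std.stdio : writeln;

/// Counts 'G' and 'C' with std.algorithm.count and a lambda predicate.
size_t countGC(const(char)[] line)
{
    return line.count!(c => c == 'G' || c == 'C');
}

/// Hypothetical helper: the character set is a compile-time argument,
/// so the lookup table below is built once, during compilation.
size_t countAnyOf(string set)(const(char)[] line)
{
    static immutable bool[256] table = () {
        bool[256] t;
        foreach (char c; set)
            t[c] = true;
        return t;
    }();

    size_t n = 0;
    foreach (char ch; line) // iterate code units, no UTF decoding
    {
        if (table[ch])
            ++n;
    }
    return n;
}

void main()
{
    writeln(countGC("GATTACA"));          // prints 2
    writeln(countAnyOf!"GC"("GATTACA"));  // prints 2
}
```

Whether either version actually beats countchars on a given compiler would still have to be measured.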

May 16 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Dicebot:

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

In the first of his posts I don't see -noboundscheck used, and it 
compares different algorithms from C++ (a switch) and D (two 
nested ifs, that are not optimal).
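To make the comparison concrete, here is a rough D sketch of a switch-based counter, roughly the shape attributed to the C++ version above. It is not the code from the blog post; the name `countGCSwitch` is made up for illustration, and the input is assumed to be ASCII.

```d
import std.stdio : writeln;

/// Counts 'G' and 'C' code units with a single switch per character.
size_t countGCSwitch(const(char)[] line)
{
    size_t n = 0;
    foreach (char c; line) // code units only, no UTF decoding
    {
        switch (c)
        {
            case 'G', 'C':
                ++n;
                break;
            default:
                break;
        }
    }
    return n;
}

void main()
{
    writeln(countGCSwitch("GATTACA")); // prints 2
}
```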

From my experience, if you take some care you are able to write D
code for LDC that is about as fast as equivalent C code, or
better.


 One open question remains though - can D/Phobos do better here?

Of course.

Bye,
bearophile

May 16 2013

"Dicebot" <m.strashun gmail.com> writes:

On Thursday, 16 May 2013 at 11:37:14 UTC, bearophile wrote:
 Dicebot:
 In the first of his posts I don't see -noboundscheck used, and 
 it compares different algorithms from C++ (a switch) and D (two 
 nested ifs, that are not optimal).

 From my experience, if you take some care you are able to write
 D code for LDC that is about as fast as equivalent C code,
 or better.

Sure. I am not interested in benchmarks. What made me curious was 
"what made this code so slow if you _don't_ have some care".

May 17 2013

"Juan Manuel Cabo" <juanmanuel.cabo gmail.com> writes:

On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 Want to bring into discussion people that are not on Google+. 
 Samuel recently has posted there some simple experiments with 
 bioinformatics and bad performance of Phobos-based snippet has 
 surprised me.

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

 One open question remains though - can D/Phobos do better here? 
 Can some changes be done to Phobos functions in question to 
 improve performance or creating bioinformatics-specialized 
 library is only practical solution?

I bet the problem is in readln. Currently, File.byLine() and 
readln() are extremely slow, because they call fgetc() one char 
at a time.

I made a "byLineFast" implementation some time ago that is 10x
faster than std.stdio.byLine. It reads lines through rawRead,
using buffers instead of going char by char.

I don't have the time to make it Phobos-ready (unicode, etc.).
But I'll paste it here for anyone to use (it works perfectly).

--jm

-------------------------------------

module ByLineFast;

import std.stdio;
import std.string: indexOf;
import core.stdc.string : memmove;


/**
   Reads by line in an efficient way (10 times faster than File.byLine
   from std.stdio).
   This is accomplished by reading entire buffers (fgetc() is not used),
   and allocating as little as possible.

   The char \n is considered as separator, removing the previous \r
   if it exists.

   The \n is never returned. The \r is not returned if it was
   part of a \r\n (but it is returned if it was by itself).

   The returned slice is always a view into a temporary buffer
   that later reads overwrite, so it must not be stored. If
   necessary, you must use .dup or .idup to copy it to another string.

   Example:

         File f = File("file.txt");
         foreach (char[] line; ByLineFast(f)) {
             ...process line...
             //Make a copy:
             string copy = line.idup;
         }

   The file isn't closed when done iterating, unless it was
   the only reference to the file (same as std.stdio.byLine).
   (example: ByLineFast(File("file.txt"))).
*/
struct ByLineFast {
     File file;
     char[] line;
     bool first_call = true;
     char[] buffer;
     char[] strBuffer;

     this(File f, int bufferSize=4096) {
         assert(bufferSize > 0);
         file = f;
         buffer.length = bufferSize;
     }

     @property bool empty() const {
         //It's important to check "line !is null" instead of
         //"line.length != 0", otherwise, no empty lines can
         //be returned, the iteration would be closed.
         if (line !is null) {
             return false;
         }
         if (!file.isOpen) {
             //Clean the buffer to avoid pointer false positives:
             (cast(char[])buffer)[] = 0;
             return true;
         }

         //First read. Determine if it's empty and put the char back.
         auto mutableFP = (cast(File*) &file).getFP();
         auto c = fgetc(mutableFP);
         if (c == -1) {
             //Clean the buffer to avoid pointer false positives:
             (cast(char[])buffer)[] = 0;
             return true;
         }
         if (ungetc(c, mutableFP) != c) {
             assert(false, "Bug in cstdlib implementation");
         }
         return false;
     }

     @property char[] front() {
         if (first_call) {
             popFront();
             first_call = false;
         }
         return line;
     }

     void popFront() {
         if (strBuffer.length == 0) {
             strBuffer = file.rawRead(buffer);
             if (strBuffer.length == 0) {
                 file.detach();
                 line = null;
                 return;
             }
         }

         //indexOf returns ptrdiff_t, which doesn't fit in an int
         //on 64-bit targets.
         auto pos = strBuffer.indexOf('\n');
         if (pos != -1) {
             if (pos != 0 && strBuffer[pos-1] == '\r') {
                 line = strBuffer[0 .. (pos-1)];
             } else {
                 line = strBuffer[0 .. pos];
             }
             //Pop the line, skipping the terminator:
             strBuffer = strBuffer[(pos+1) .. $];
         } else {
             //More needs to be read here. Copy the tail of the buffer
             //to the beginning, and try to read with the empty part of
             //the buffer.
             //If no buffer was left, extend the size of the buffer before
             //reading. If the file has ended, then the line is the entire
             //buffer.

             if (strBuffer.ptr != buffer.ptr) {
                 //Must use memmove because there might be overlap
                 memmove(buffer.ptr, strBuffer.ptr,
                         strBuffer.length * char.sizeof);
             }
             //size_t, because .length doesn't fit in an int on
             //64-bit targets.
             size_t spaceBegin = strBuffer.length;
             if (strBuffer.length == buffer.length) {
                 //Must extend the buffer to keep reading.
                 assumeSafeAppend(buffer);
                 buffer.length = buffer.length * 2;
             }
             char[] readPart = file.rawRead(buffer[spaceBegin .. $]);
             if (readPart.length == 0) {
                 //End of the file. Return what's in the buffer.
                 //The next popFront() will try to read again, and then
                 //mark empty condition.
                 if (spaceBegin != 0 && buffer[spaceBegin-1] == '\r') {
                     line = buffer[0 .. spaceBegin-1];
                 } else {
                     line = buffer[0 .. spaceBegin];
                 }
                 strBuffer = null;
                 return;
             }
             strBuffer = buffer[0 .. spaceBegin + readPart.length];
             //Now that we have new data in strBuffer, we can go on.
             //If a line isn't found, the buffer will be extended
             //again to read more.
             popFront();
         }
     }
}

May 16 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/16/13 9:48 AM, Juan Manuel Cabo wrote:
 I bet the problem is in readln. Currently, File.byLine() and readln()
 are extremely slow, because they call fgetc() one char at a time.

Depends on the OS.

Andrei

May 16 2013

"Dicebot" <m.strashun gmail.com> writes:

On Thursday, 16 May 2013 at 13:48:45 UTC, Juan Manuel Cabo wrote:
 On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 Want to bring into discussion people that are not on Google+. 
 Samuel recently has posted there some simple experiments with 
 bioinformatics and bad performance of Phobos-based snippet has 
 surprised me.

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

 One open question remains though - can D/Phobos do better 
 here? Can some changes be done to Phobos functions in question 
 to improve performance or creating bioinformatics-specialized 
 library is only practical solution?

 I bet the problem is in readln. Currently, File.byLine() and 
 readln() are extremely slow, because they call fgetc() one char 
 at a time.

Both manual and naive phobos version use same readln approach, 
but former is more than 10x faster. It was my first guess too, 
but comparing to snippets have shown that this is not the issue 
this time.

May 17 2013

"Juan Manuel Cabo" <juanmanuel.cabo gmail.com> writes:

On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 Want to bring into discussion people that are not on Google+. 
 Samuel recently has posted there some simple experiments with 
 bioinformatics and bad performance of Phobos-based snippet has 
 surprised me.

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

 One open question remains though - can D/Phobos do better here? 
 Can some changes be done to Phobos functions in question to 
 improve performance or creating bioinformatics-specialized 
 library is only practical solution?


May I also recommend my tool "avgtime" to make simple benchmarks, 
instead of "time" (you can see an ascii histogram as the output):

      https://github.com/jmcabo/avgtime/tree/

For example:

$ avgtime -r10 -h -q  ls
------------------------
Total time (ms): 27.413
Repetitions    : 10
Sample mode    : 2.6 (4 ocurrences)
Median time    : 2.6695
Avg time       : 2.7413
Std dev.       : 0.260515
Minimum        : 2.557
Maximum        : 3.505
95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
Histogram      :
     msecs: count  normalized bar





--jm
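The interval figures in that output can be cross-checked by hand. The sketch below assumes (this is inferred from the printed numbers, not taken from the avgtime source) that the "conf.int." rows are avg ± z * stddev and the "EstimatedAvg" rows are avg ± z * stddev / sqrt(n), with the usual two-sided normal quantiles z = 1.95996 and z = 2.57583. The helper names are made up.

```d
import std.math : sqrt;
import std.stdio : writefln;

/// Half-width of the interval for a single run: z * stdDev.
double halfWidth(double z, double stdDev)
{
    return z * stdDev;
}

/// Half-width of the interval for the mean of n runs.
double halfWidthOfMean(double z, double stdDev, size_t n)
{
    return z * stdDev / sqrt(cast(double) n);
}

void main()
{
    // Figures copied from the output above.
    enum double avg = 2.7413;
    enum double stdDev = 0.260515;
    enum size_t runs = 10;

    // Standard normal quantiles (an assumption about what the tool uses).
    enum double z95 = 1.95996;
    enum double z99 = 2.57583;

    immutable e95 = halfWidth(z95, stdDev);
    immutable m95 = halfWidthOfMean(z95, stdDev, runs);
    immutable e99 = halfWidth(z99, stdDev);
    immutable m99 = halfWidthOfMean(z99, stdDev, runs);

    writefln("95%% conf.int.  : [%.4f, %.4f]  e = %.6f", avg - e95, avg + e95, e95);
    writefln("99%% conf.int.  : [%.4f, %.4f]  e = %.6f", avg - e99, avg + e99, e99);
    writefln("EstimatedAvg95%%: [%.4f, %.4f]  e = %.6f", avg - m95, avg + m95, m95);
    writefln("EstimatedAvg99%%: [%.4f, %.4f]  e = %.6f", avg - m99, avg + m99, m99);
}
```

Under those assumptions the computed half-widths agree with the reported e values to about four decimal places.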

May 16 2013

1100110 <0b1100110 gmail.com> writes:

 May I also recommend my tool "avgtime" to make simple benchmarks,
 instead of "time" (you can see an ascii histogram as the output):

      https://github.com/jmcabo/avgtime/tree/

 For example:

 $ avgtime -r10 -h -q  ls
 ------------------------
 Total time (ms): 27.413
 Repetitions    : 10
 Sample mode    : 2.6 (4 ocurrences)
 Median time    : 2.6695
 Avg time       : 2.7413
 Std dev.       : 0.260515
 Minimum        : 2.557
 Maximum        : 3.505
 95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
 99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
 EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
 EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
 Histogram      :
     msecs: count  normalized bar




 --jm

Thank you for self-promotion, I miss that tool.

May 16 2013

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On Thu, 16 May 2013 09:03:36 -0500
1100110 <0b1100110 gmail.com> wrote:

 May I also recommend my tool "avgtime" to make simple benchmarks,
 instead of "time" (you can see an ascii histogram as the output):
 
      https://github.com/jmcabo/avgtime/tree/
 
 For example:
 
 $ avgtime -r10 -h -q  ls
 ------------------------
 Total time (ms): 27.413
 Repetitions    : 10
 Sample mode    : 2.6 (4 ocurrences)
 Median time    : 2.6695
 Avg time       : 2.7413
 Std dev.       : 0.260515
 Minimum        : 2.557
 Maximum        : 3.505
 95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
 99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
 EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
 EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
 Histogram      :
     msecs: count  normalized bar




 
 --jm
 

 
 Thank you for self-promotion, I miss that tool.
 
 

Indeed. I had totally forgotten about that, and yet it *should* be the
first thing I think of when I think "timing a program". IMO, that
should be a standard tool in any unixy installation.

May 16 2013

1100110 <0b1100110 gmail.com> writes:

On 05/16/2013 01:46 PM, Nick Sabalausky wrote:
 On Thu, 16 May 2013 09:03:36 -0500
 1100110 <0b1100110 gmail.com> wrote:

 May I also recommend my tool "avgtime" to make simple benchmarks,
 instead of "time" (you can see an ascii histogram as the output):

      https://github.com/jmcabo/avgtime/tree/

 For example:

 $ avgtime -r10 -h -q  ls
 ------------------------
 Total time (ms): 27.413
 Repetitions    : 10
 Sample mode    : 2.6 (4 ocurrences)
 Median time    : 2.6695
 Avg time       : 2.7413
 Std dev.       : 0.260515
 Minimum        : 2.557
 Maximum        : 3.505
 95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
 99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
 EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
 EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
 Histogram      :
     msecs: count  normalized bar





 --jm

 Thank you for self-promotion, I miss that tool.

 Indeed. I had totally forgotten about that, and yet it *should* be the
 first thing I think of when I think "timing a program". IMO, that
 should be a standard tool in any unixy installation.

+1

That's worth creating a package for.

May 16 2013

"Juan Manuel Cabo" <juanmanuel.cabo gmail.com> writes:

On Thursday, 16 May 2013 at 22:58:42 UTC, 1100110 wrote:
 On 05/16/2013 01:46 PM, Nick Sabalausky wrote:
 On Thu, 16 May 2013 09:03:36 -0500
 1100110 <0b1100110 gmail.com> wrote:
 
 May I also recommend my tool "avgtime" to make simple 
 benchmarks,
 instead of "time" (you can see an ascii histogram as the 
 output):

      https://github.com/jmcabo/avgtime/tree/

 For example:

 $ avgtime -r10 -h -q  ls
 ------------------------
 Total time (ms): 27.413
 Repetitions    : 10
 Sample mode    : 2.6 (4 ocurrences)
 Median time    : 2.6695
 Avg time       : 2.7413
 Std dev.       : 0.260515
 Minimum        : 2.557
 Maximum        : 3.505
 95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
 99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
 EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
 EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
 Histogram      :
     msecs: count  normalized bar





 --jm

 Thank you for self-promotion, I miss that tool.

 
 Indeed. I had totally forgotten about that, and yet it 
 *should* be the
 first thing I think of when I think "timing a program". IMO, 
 that
 should be a standard tool in any unixy installation.
 
 

 +1

 That's worth creating a package for.

Thanks!
I currently don't have much time to make an Ubuntu/Arch/etc.
package, between work and the university. I might in the future.

Keep in mind that it also works on Windows, though the process
creation overhead is bigger on Windows than on Linux (because of
the OS). Also, you can open the source up and easily modify it to
measure your times directly, inside your programs.

--jm

May 16 2013

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On Fri, 17 May 2013 03:01:38 +0200
"Juan Manuel Cabo" <juanmanuel.cabo gmail.com> wrote:

 On Thursday, 16 May 2013 at 22:58:42 UTC, 1100110 wrote:
 On 05/16/2013 01:46 PM, Nick Sabalausky wrote:
 
 Indeed. I had totally forgotten about that, and yet it 
 *should* be the
 first thing I think of when I think "timing a program". IMO, 
 that
 should be a standard tool in any unixy installation.
 



[...]
 
 Keep in mind that it also works in windows. Though the process 
 creation overhead is bigger in windows than in linux (because of 
 the OS). Also, you can open the source up and easily modify it to 
 measure your times directly, inside your programs.

Yea, I almost said "should be a standard tool in any OS installation",
but there's a *lot* of things that should be a standard part of any
Windows box (bash, grep, a pre-Vista GUI...) and yet never will be ;)

May 16 2013

"Dicebot" <m.strashun gmail.com> writes:

On Thursday, 16 May 2013 at 13:52:01 UTC, Juan Manuel Cabo wrote:
 ...

Thanks for the tool, it is a good one. But I was not doing 
benchmarks this time, only cared about 2x difference at least, so 
"time" was enough :)

May 17 2013

"nazriel" <spam dzfl.pl> writes:

On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 Want to bring into discussion people that are not on Google+. 
 Samuel recently has posted there some simple experiments with 
 bioinformatics and bad performance of Phobos-based snippet has 
 surprised me.

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

 One open question remains though - can D/Phobos do better here? 
 Can some changes be done to Phobos functions in question to 
 improve performance or creating bioinformatics-specialized 
 library is only practical solution?

Very nice blog post.

Something similar should go into D wiki database so it won't get 
lost in "In 80s we had..." topics.

For sure there is a space for improvements in Phobos but such 
articles are good start to prevent wave of "D is slow and sucks" 
and force people to rethink if they are using right tools 
(functions in this case ie UTF8 aware vs plain ASCII ones) for 
their job.

Btw, you've got nice articles on your blog overall. Bookmarked
;)

May 16 2013

"Dicebot" <m.strashun gmail.com> writes:

On Thursday, 16 May 2013 at 14:23:22 UTC, nazriel wrote:
 Very nice blog post.

 Something similar should go into D wiki database so it won't 
 get lost in "In 80s we had..." topics.

 For sure there is a space for improvements in Phobos but such 
 articles are good start to prevent wave of "D is slow and 
 sucks" and force people to rethink if they are using right 
 tools (functions in this case ie UTF8 aware vs plain ASCII 
 ones) for their job.

Thank you, I am glad at least someone has noticed it is not a
call for a benchmarking contest :) Yes, my interest was exactly
in the case when a newbie comes and tries to write some trivial
code. If it runs too slowly, that abstract guy won't benchmark or
investigate stuff, he will just say "D sucks" and move on to the
next language.

It is more of an informational issue than a Phobos one.

May 17 2013

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On Thu, 16 May 2013 12:35:11 +0200
"Dicebot" <m.strashun gmail.com> wrote:
 
 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html
 

For anyone else who has trouble viewing that like I did, there
appears to be an HTML version of it here:
http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html?m=1

May 16 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Thursday, May 16, 2013 12:35:11 Dicebot wrote:
 Want to bring into discussion people that are not on Google+.
 Samuel recently has posted there some simple experiments with
 bioinformatics and bad performance of Phobos-based snippet has
 surprised me.
 
 I did explore issue a bit and reported results in a blog post
 (snippets are really small and simple) :
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html
 
 One open question remains though - can D/Phobos do better here?
 Can some changes be done to Phobos functions in question to
 improve performance or creating bioinformatics-specialized
 library is only practical solution?

1. In general, if you want to operate on ASCII, and you want your code to be 
fast, use immutable(ubyte)[], not immutable(char)[]. Obviously, that's not 
going to work in this case, because the function is in std.string, but maybe
that's a reason for some std.string functions to have ubyte overloads which 
are ASCII-specific.

2. We actually discussed removing all of the pattern stuff completely and 
replacing it with regexes (which is why countchars doesn't follow Phobos' 
naming scheme correctly - I left the pattern-using functions alone). However, 
that requires that someone who is appropriately familiar with regexes go and 
implement new versions of all of these functions which use std.regex. It 
should definitely be done, but no one has taken the time to do so yet.

3. While some functions in Phobos are well-optimized, there are plenty of them 
which aren't. They do the job, but no one has taken the time to optimize their 
implementations. This should be fixed, but again, it requires that someone 
spends the time to do the optimizations, and while that has been done for some 
functions, it definitely hasn't been done for all. And if python is faster than 
D at something, odds are that either the code in question is poorly written or 
that whatever Phobos functions it's using haven't been properly optimized yet.
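As a rough illustration of point 1, the sketch below views a string as immutable(ubyte)[] through `std.string.representation` and counts bytes directly, so no code points are decoded along the way. The function name `countGCBytes` is made up for this example, and the input is assumed to be ASCII.

```d
import std.algorithm : count;
import std.stdio : writeln;
import std.string : representation;

/// Counts 'G' and 'C' bytes without any UTF decoding.
size_t countGCBytes(string line)
{
    // Same memory, reinterpreted as raw bytes; nothing is copied.
    immutable(ubyte)[] bytes = line.representation;
    return bytes.count!(b => b == 'G' || b == 'C');
}

void main()
{
    writeln(countGCBytes("GATTACA")); // prints 2
}
```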

- Jonathan M Davis

May 16 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 5/16/2013 12:15 PM, Jonathan M Davis wrote:
 And if python is faster than
 D at something, odds are that either the code in question is poorly written or
 that whatever Phobos functions it's using haven't been properly optimized yet.

We should also be aware that while Python code itself is slow, its library
functions are heavily optimized C code. So, if the benchmark consists of
calling a Python library function, it'll run as fast as any optimized C code.

May 16 2013

Jacob Carlborg <doob me.com> writes:

On 2013-05-16 21:54, Walter Bright wrote:

 We should also be aware that while Python code itself is slow, its
 library functions are heavily optimized C code. So, if the benchmark
 consists of calling a Python library function, it'll run as fast as any
 optimized C code.

But someone using Python won't care about that. Most of them will think 
they just use Python and have no idea there's optimized C code under the 
hood.

-- 
/Jacob Carlborg

May 17 2013

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Friday, 17 May 2013 at 08:28:38 UTC, Jacob Carlborg wrote:
 On 2013-05-16 21:54, Walter Bright wrote:

 We should also be aware that while Python code itself is slow, 
 its
 library functions are heavily optimized C code. So, if the 
 benchmark
 consists of calling a Python library function, it'll run as 
 fast as any
 optimized C code.

 But someone using Python won't care about that. Most of them 
 will think they just use Python and have no idea there's 
 optimized C code under the hood.

I'm not sure how we can respond to that.

If naive D code has to be significantly faster than optimised C 
for people to not go "D sucks, it's only as fast as python" then 
we're pretty much doomed by people's stupidity.

May 17 2013

"deadalnix" <deadalnix gmail.com> writes:

On Friday, 17 May 2013 at 10:09:11 UTC, John Colvin wrote:
 If naive D code has to be significantly faster than optimised C 
 for people to not go "D sucks, it's only as fast as python" 
 then we're pretty much doomed by people's stupidity.

No. The whole benefit of D is lost if you have to tweak 
everything in complex way to get it run fast.

It means we failed at designing nice API.

Devs don't have years to spend on every existing language to know
if it is good or not and figure out all the subtleties.

May 17 2013

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Friday, 17 May 2013 at 11:26:27 UTC, deadalnix wrote:
 On Friday, 17 May 2013 at 10:09:11 UTC, John Colvin wrote:
 If naive D code has to be significantly faster than optimised 
 C for people to not go "D sucks, it's only as fast as python" 
 then we're pretty much doomed by people's stupidity.

 No. The whole benefit of D is lost if you have to tweak 
 everything in complex way to get it run fast.

Define fast.

In some cases, if a naive call to a generic phobos function is as 
fast as an equivalent python library function then i'd say that's 
pretty good. Those python library functions are often 
impressively fast.

May 17 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, May 16, 2013 12:54:35 Walter Bright wrote:
 On 5/16/2013 12:15 PM, Jonathan M Davis wrote:
 And if python is faster than
 D at something, odds are that either the code in question is poorly
 written or that whatever Phobos functions it's using haven't been
 properly optimized yet.

 We should also be aware that while Python code itself is slow, its library
 functions are heavily optimized C code. So, if the benchmark consists of
 calling a Python library function, it'll run as fast as any optimized C
 code.

I keep forgetting about that. That's a good thing to keep in mind when 
comparing performance - though part of me thinks that it says very poor things 
about your language if you have to write your code in other languages in order 
to make it fast enough (even if it were only the standard library where that 
happened).

- Jonathan M Davis

May 17 2013

"Dicebot" <m.strashun gmail.com> writes:

On Thursday, 16 May 2013 at 19:15:57 UTC, Jonathan M Davis wrote:
 1. In general, if you want to operate on ASCII, and you want 
 your code to be
 fast, use immutable(ubyte)[], not immutable(char)[]. Obviously, 
 that's not
 going to work in this case, because the function is in
 std.string, but maybe
 that's a reason for some std.string functions to have ubyte 
 overloads which
 are ASCII-specific.

I was thinking exactly about that. Only thing I want to be 
advised on - is it better to add those overloads in std.string or 
separate module is better from the point of self-documentation?

 2. We actually discussed removing all of the pattern stuff 
 completely and
 replacing it with regexes.

Is it kind of pre-approved? I am willing to add this to my TODO
list together with needed benchmarks, but had some doubts that 
std.string depending on std.regex will be tolerated.

 3. While some functions in Phobos are well-optimized, there are 
 plenty of them
 which aren't. They do the job, but no one has taken the time to 
 optimize their
 implementations. This should be fixed, but again, it requires 
 that someone
 spends the time to do the optimizations, and while that has 
 been done for some
 functions, it definitely hasn't been done for all. And if 
 python is faster than
 D at something, odds are that either the code in question is 
 poorly written or
 that whatever Phobos functions it's using haven't been properly 
 optimized yet.

I understand that. What I tried to bring attention to is how big 
difference it may be for someone who just picks random functions 
and writes some simple code. It is very tempting to just say 
"Phobos (D) sucks" and don't get into details. In other words I 
consider it more of informational/marketing issue than a 
technical one.

 - Jonathan M Davis

Thanks for your response, it was really helpful.

May 17 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, May 17, 2013 11:15:24 Dicebot wrote:
 On Thursday, 16 May 2013 at 19:15:57 UTC, Jonathan M Davis wrote:
 1. In general, if you want to operate on ASCII, and you want
 your code to be
 fast, use immutable(ubyte)[], not immutable(char)[]. Obviously,
 that's not
 going to work in this case, because the function is in
 std.string, but maybe
 that's a reason for some std.string functions to have ubyte
 overloads which
 are ASCII-specific.

 
 I was thinking exactly about that. Only thing I want to be
 advised on - is it better to add those overloads in std.string or
 separate module is better from the point of self-documentation?

I'm not sure. My first inclination would be to simply put them as overloads in 
the same module, but that probably merits some discussion. And while I think 
that having ubyte overloads for strings for ASCII is something that we should 
at least explore, it probably merits some discussion as well, as we haven't 
really done a lot with handling ASCII outside of std.ascii at this point 
(which currently only operates on characters, not strings). My first 
inclination is to handle ASCII where necessary by accepting arrays of ubytes, 
but others here may have other ideas about that (which may or may not be 
better).

A side note to that is that we might want to consider having a function
called assumeASCII which casts from string to immutable(ubyte)[] (similar to
assumeUnique). I think that that might have been suggested before, but even if
it has, we've never actually added it.
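Just to show the shape such a function could take, here is a hypothetical sketch. assumeASCII does not exist in Phobos; the signature and the assert-based check are assumptions made for illustration.

```d
import std.stdio : writeln;

/// Hypothetical helper modelled on the suggestion above; not part of Phobos.
/// The caller promises that s contains only ASCII.
immutable(ubyte)[] assumeASCII(string s) pure nothrow @nogc
{
    // Verify the promise in builds that keep asserts (i.e. without -release).
    foreach (char c; s)
        assert(c < 0x80, "assumeASCII: non-ASCII code unit");

    // Reinterpret the same memory as bytes; nothing is copied.
    return cast(immutable(ubyte)[]) s;
}

void main()
{
    immutable(ubyte)[] dna = assumeASCII("GATTACA");
    writeln(dna.length);     // prints 7
    writeln(dna[0] == 'G');  // prints true
}
```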

 2. We actually discussed removing all of the pattern stuff
 completely and
 replacing it with regexes.

 
 Is it kind of pre-approved? I am willing to add this to my TODO
 list together with needed benchmarks, but had some doubts that
 std.string depending on std.regex will be tolerated.

AFAIK, there would be no problem with doing so. Maybe Dmitry would have 
something to say about it, since he's the regex guru, but IIRC, the last time 
it was discussed, it was pretty clear that we wanted those functions to be 
using std.regex instead of patterns. So, if you did the work and did it at the 
appropriate quality level, I expect that it would be merged in. And we might 
or might not deprecate the pattern functions at that point (that was
originally my intention and is why I never fixed their names, but we're not 
deprecating much now, so I don't know if we'll want to in this case).
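For a feel of what a regex-based counterpart to countchars might look like, here is a minimal sketch using `regex` and `matchAll` from std.regex. The name `countMatches` is made up, this is not a proposed Phobos signature, and it makes no claim about performance.

```d
import std.range : walkLength;
import std.regex : matchAll, regex;
import std.stdio : writeln;

/// Counts the non-overlapping matches of a regex in line.
/// With a character class such as "[GC]" this counts characters.
size_t countMatches(string line, string pattern)
{
    auto re = regex(pattern);
    return matchAll(line, re).walkLength;
}

void main()
{
    writeln(countMatches("GATTACA", "[GC]")); // prints 2
}
```

Compiling the regex on every call is wasteful; a real implementation would presumably cache it or take it at compile time.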

 I understand that. What I tried to bring attention to is how big
 difference it may be for someone who just picks random functions
 and writes some simple code. It is very tempting to just say
 "Phobos (D) sucks" and don't get into details. In other words I
 consider it more of informational/marketing issue than a
 technical one.

We need to do more to optimize Phobos, but given our stance of correctness by 
default, we're kind of stuck with string functions taking a performance hit in 
a number of common cases simply due to the necessary decoding of code points. 
We can do better at making them fast, and reduce problems like this, but 
ultimately, if you want fast ASCII-only operations, you almost certainly need 
to operate on something like ubyte[] rather than string, and that requires 
educating people. It's one of the costs of trying to be both correct and 
performant.
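A small sketch of where that decoding cost comes from: range functions see a string as code points and must decode it, while .length and the ubyte view work on raw code units. The helper names below are made up for the example.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.string : representation;

/// Number of UTF-8 code units: a stored field, no work needed.
size_t codeUnits(string s)
{
    return s.length;
}

/// Number of code points: has to walk and decode the whole string.
size_t codePoints(string s)
{
    return s.walkLength;
}

void main()
{
    string s = "na\u00EFve"; // U+00EF takes two code units in UTF-8

    writeln(codeUnits(s));               // prints 6
    writeln(codePoints(s));              // prints 5
    writeln(s.representation.length);    // prints 6
}
```

For pure ASCII input both counts are the same, which is why the byte view loses nothing there.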

- Jonathan M Davis

May 17 2013

Samuel Lampa <samuel.lampa gmail.com> writes:

On 05/17/2013 11:41 AM, Jonathan M Davis wrote:
 We need to do more to optimize Phobos, but given our stance of correctness by
 default, we're kind of stuck with string functions taking a performance hit in
 a number of common cases simply due to the necessary decoding of code points.
 We can do better at making them fast, and reduce problems like this, but
 ultimately, if you want fast ASCII-only operations, you almost certainly need
 to operate on something like ubyte[] rather than string, and that requires
 educating people. It's one of the costs of trying to be both correct and
 performant.

At least I'm now educated on this :")

// Samuel

May 17 2013
