digitalmars.D.learn - Why is stdin.byLine.writeln so slow?

"Jyxent" <jyxent example.com> writes:

I've been playing around with D and noticed that:

stdin.byLine.writeln

takes ~20 times as long as:

foreach(line; stdin.byLine) writeln(line);

I asked on IRC and this was suggested:

stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)

which is slightly faster than the foreach case.

It was suggested that there is something slow about writeln 
taking the input range, but I'm not sure I see why.  If I follow 
the code correctly, formatRange in std.format will eventually be 
called and iterate over the range.

Jun 13 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 13 June 2014 at 20:48:16 UTC, Jyxent wrote:
 I've been playing around with D and noticed that:

 stdin.byLine.writeln

 takes ~20 times as long as:

 foreach(line; stdin.byLine) writeln(line);

 I asked on IRC and this was suggested:

 stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)

 which is slightly faster than the foreach case.

 It was suggested that there is something slow about writeln 
 taking the input range, but I'm not sure I see why.  If I 
 follow the code correctly, formatRange in std.format will 
 eventually be called and iterate over the range.

Because:
stdin.byLine.writeln
and
foreach(line; stdin.byLine) writeln(line);
Don't produce the same output. One prints a range that contains 
strings, whereas the second repeatedly prints strings.

Given this input:
line 1
line	2
Yo!

Then "stdin.byLine.writeln" will produce this string:
["line 1", "line\t2", "Yo!"]

So that's the extra overhead which is slowing you down, because 
*each* character needs to be individually parsed, and potentially 
escaped (eg: "\t").

The "copy" option is the same as the foreach one, since each 
string is individually passed to the writeln, which doesn't parse 
your string. The "lockingTextWriter" is just sugar to squeeze out 
extra speed.
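
The difference in output can be reproduced without any input. This is a minimal sketch, not code from the thread: a string array stands in for `stdin.byLine`, the helper name `formatAsRange` is made up for illustration, and `format("%s", ...)` is assumed to produce the same text that `writeln` prints for a range.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: the text writeln(lines) is assumed to print,
// minus the trailing newline.
string formatAsRange(string[] lines)
{
    return format("%s", lines);
}

void main()
{
    // Stand-in for stdin.byLine, so the sketch needs no input.
    auto lines = ["line 1", "line\t2", "Yo!"];

    // Whole range at once: brackets, quotes, and an escaped tab.
    writeln(formatAsRange(lines)); // ["line 1", "line\t2", "Yo!"]

    // One element at a time: each string is written verbatim.
    foreach (line; lines)
        writeln(line);
}
```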

Jun 13 2014

"Ali Çehreli" <acehreli yahoo.com> writes:

On 06/13/2014 02:08 PM, monarch_dodra wrote:

 Given this input:
 line 1
 line    2
 Yo!

 Then "stdin.byLine.writeln" will produce this string:
 ["line 1", "line\t2", "Yo!"]

Do you mean writeln() first generates an array and then prints that 
array? I've always imagined that it used the range interface and did 
similar to what copy() does.

Is there a good reason why the imagined-by-me-range-overload of 
writeln() behaves that way?

Ali

Jun 13 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 13 June 2014 at 21:17:27 UTC, Ali Çehreli wrote:
 On 06/13/2014 02:08 PM, monarch_dodra wrote:

 Given this input:
 line 1
 line    2
 Yo!

 Then "stdin.byLine.writeln" will produce this string:
 ["line 1", "line\t2", "Yo!"]

 Do you mean writeln() first generates an array and then prints 
 that array?

No, it just receives a range, so it does range formatting, e.g.:
"[" ~ Element ~ ", " ~ Element ... "]".

 I've always imagined that it used the range interface and did 
 similar to what copy() does.

That wouldn't make sense. Then, if I did "[1, 2, 3].writeln();", 
it would print:
123
instead of
[1, 2, 3]

 Is there a good reason why the imagined-by-me-range-overload of 
 writeln() behaves that way?

 Ali

As I said, it's a range, so it prints a range. That's all there 
is to it.

That said, you can use one of D's most powerful formatting 
abilities, range formatting:
writefln("%-(%s\n%)", stdin.byLine());

And BOOM. Does what you want. I freaking love range formatting.
More info here:


TLDR:
%( => range start
%) => range end
%-( => range start without element escape (for strings mostly).
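
The specifiers in the summary above can be checked with `std.format.format`, which returns the string that `writef` would print. This is a small sketch using a string array as a stand-in for `stdin.byLine`; the sample data is an assumption for illustration.

```d
import std.format : format;
import std.stdio : writeln;

void main()
{
    auto lines = ["line 1", "line\t2", "Yo!"];

    // %( ... %): elements are quoted and escaped; ", " only goes between them.
    assert(format("%(%s, %)", lines) == `"line 1", "line\t2", "Yo!"`);

    // %-( ... %): elements are written raw, joined by a newline.
    assert(format("%-(%s\n%)", lines) == "line 1\nline\t2\nYo!");

    // Prints the three lines unescaped, one per line.
    writeln(format("%-(%s\n%)", lines));
}
```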

Jun 13 2014

"Ali Çehreli" <acehreli yahoo.com> writes:

On 06/13/2014 03:02 PM, monarch_dodra wrote:

 No, it just receives a range, so it does range formatting, e.g.:
 "[" ~ Element ~ ", " ~ Element ... "]".

It still looks like it could send the formatting characters as well as 
the elements separately to the output stream:

"["
Element
", "
...
"]"

I am assuming that the slowness in OP's example is due to constructing a 
long string.

Ali

Jun 13 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 13 June 2014 at 22:12:01 UTC, Ali Çehreli wrote:
 On 06/13/2014 03:02 PM, monarch_dodra wrote:

 No, it just receives a range, so it does range formatting, e.g.:
 "[" ~ Element ~ ", " ~ Element ... "]".

 It still looks like it could send the formatting characters as 
 well as the elements separately to the output stream:

 "["
 Element
 ", "
 ...
 "]"

 I am assuming that the slowness in OP's example is due to 
 constructing a long string.

 Ali

We'd have to check, but I don't think that formatted write actually 
ever allocates anywhere, so there should be no "constructing a 
long string". The real issue (I think) is that when you ask 
formatted write to write a string, it just pipes the entire char 
array at once to the underlying stream.

If the characters are escaped though (which is the case when you 
print an array of strings), then formattedWrite needs to check 
each character individually, and then also pass each character 
individually to the underlying stream. And *that* could 
definitely justify the order of magnitude slowdown observed.

What's more, this *may* trigger a per-character decode-encode 
loop. I'd have to check. But that shouldn't be observable next to 
the stream overhead anyway.
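
To make the contrast concrete, here is a rough sketch of the two write patterns described above. It is not the Phobos implementation: `putEscaped` is a hypothetical helper that handles only a few escapes, and an `Appender` stands in for the output stream.

```d
import std.array : appender;

// Hypothetical helper, not Phobos code: writes `s` quoted and escaped,
// inspecting and emitting one character at a time.
void putEscaped(Sink)(ref Sink sink, const(char)[] s)
{
    sink.put('"');
    foreach (char c; s)
    {
        switch (c)
        {
            case '\t': sink.put(`\t`); break;
            case '\n': sink.put(`\n`); break;
            case '"':  sink.put(`\"`); break;
            case '\\': sink.put(`\\`); break;
            default:   sink.put(c);    break;
        }
    }
    sink.put('"');
}

void main()
{
    // Unescaped: the whole slice goes to the sink in a single call.
    auto plain = appender!string();
    plain.put("line\t2");
    assert(plain.data == "line\t2");

    // Escaped: at least one call to the sink per character.
    auto escaped = appender!string();
    putEscaped(escaped, "line\t2");
    assert(escaped.data == `"line\t2"`);
}
```

With a real stream behind the sink, each `put` call carries its own overhead, which is the cost the post attributes the slowdown to.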

Jun 13 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Friday, 13 June 2014 at 22:12:01 UTC, Ali Çehreli wrote:
 On 06/13/2014 03:02 PM, monarch_dodra wrote:

 No, it just receives a range, so it does range formatting, e.g.:
 "[" ~ Element ~ ", " ~ Element ... "]".

 It still looks like it could send the formatting characters as 
 well as the elements separately to the output stream:

 "["
 Element
 ", "
 ...
 "]"

 I am assuming that the slowness in OP's example is due to 
 constructing a long string.

It already does what you suggest, and doesn't construct one 
big string. You can test this:

     void main() {
         import std.stdio;
         stdin.byLine.writeln;
     }

When you type several lines into the terminal, it will output 
the first element as soon as you press Enter for the first line.

Jun 14 2014

"H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:

On Fri, Jun 13, 2014 at 10:02:49PM +0000, monarch_dodra via Digitalmars-d-learn
wrote:
[...]
 That said, you can use one of D's most powerful formatting abilities,
 range formatting:
 writefln("%-(%s\n%)", stdin.byLine());
 
 And BOOM. Does what you want. I freaking love range formatting.
 More info here:

 
 TLDR:
 %( => range start
 %) => range end
 %-( => range start without element escape (for strings mostly).

I wrote part of that documentation, and my favorite example is matrix
formatting:

	auto mat = [[1,2,3], [4,5,6], [7,8,9]];
	writefln("[%([%(%d %)]%|\n %)]", mat);

Output:

	[[1 2 3]
	 [4 5 6]
	 [7 8 9]]

D coolness at its finest!
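
The role of `%|` in the matrix format can be seen in isolation. The sketch below uses `std.format.format` so the results can be asserted; the small integer arrays are sample data chosen for illustration.

```d
import std.format : format;

void main()
{
    // Without %|, everything after the last specifier (", ") is the separator.
    assert(format("%(%d, %)", [1, 2, 3]) == "1, 2, 3");

    // With %|, "]" stays part of each element and only ", " separates.
    assert(format("%([%d]%|, %)", [1, 2, 3]) == "[1], [2], [3]");

    // The matrix format from the post: "\n " separates the rows.
    auto mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
    assert(format("[%([%(%d %)]%|\n %)]", mat)
        == "[[1 2 3]\n [4 5 6]\n [7 8 9]]");
}
```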

Whoever invented %(, %|, %) is a genius. It takes C's printf formatting
from weak sauce to whole new levels of awesome. I remember debugging
some range-based code, and being able to write stuff like:

	debug writefln("%(%(%s, %); %)", buggyNestedRange().take(10));

at strategic spots in the code is just pure win.  In C/C++, you'd have
to manually write nested loops to print out the data, which may involve
manually calling accessor methods, manually counting them, perhaps
storing intermediate output fragments in temporary buffers,
encapsulating all this jazz in a throwaway function so that you can use
it at multiple strategic points (in D you just copy-n-paste the single
line above), etc..  Pure lose.

(Speaking of which, this might be an awesome lightning talk topic at the
next DConf. ;-) Or did somebody already do it?)


T

-- 
Having a smoking section in a restaurant is like having a peeing section in a
swimming pool. -- Edward Burr

Jun 13 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 13 June 2014 at 22:25:25 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 In C/C++, you'd have to manually write nested loops to print out
 the data, which may involve manually calling accessor methods,
 manually counting them, perhaps storing intermediate output
 fragments in temporary buffers, encapsulating all this jazz in a
 throwaway function so that you can use it at multiple strategic
 points (in D you just copy-n-paste the single line above), etc..
 Pure lose.

 T

In C++, I usually use copy/transform:

*std::copy(begin(), end(), std::ostream_iterator<T>(std::cout, 
"\n")) = "\n";
or
*std::transform(begin(), end(), 
std::ostream_iterator<T>(std::cout, "\n"), [](???){???}) = "\n";

It's a bit verbose, and looks like ass to the non-initiated, but 
once you are used to it, it's quite convenient. It's just 
something that grows on you. You can stack on a "for_each" if you 
need more "depth".

std::for_each(begin(), end(), [](R& r){
   *std::copy(r.begin(), r.end(), 
std::ostream_iterator<T>(std::cout, "\n")) = "\n";
});

Though arguably, that's just a loop in disguise :)

Jun 13 2014

"Jyxent" <jyxent example.com> writes:

On Friday, 13 June 2014 at 21:08:08 UTC, monarch_dodra wrote:
 On Friday, 13 June 2014 at 20:48:16 UTC, Jyxent wrote:
 I've been playing around with D and noticed that:

 stdin.byLine.writeln

 takes ~20 times as long as:

 foreach(line; stdin.byLine) writeln(line);

 I asked on IRC and this was suggested:

 stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)

 which is slightly faster than the foreach case.

 It was suggested that there is something slow about writeln 
 taking the input range, but I'm not sure I see why.  If I 
 follow the code correctly, formatRange in std.format will 
 eventually be called and iterate over the range.

 Because:
 stdin.byLine.writeln
 and
 foreach(line; stdin.byLine) writeln(line);
 Don't produce the same output. One prints a range that contains 
 strings, whereas the second repeatedly prints strings.

 Given this input:
 line 1
 line	2
 Yo!

 Then "stdin.byLine.writeln" will produce this string:
 ["line 1", "line\t2", "Yo!"]

 So that's the extra overhead which is slowing you down, because 
 *each* character needs to be individually parsed, and 
 potentially escaped (eg: "\t").

 The "copy" option is the same as the foreach one, since each 
 string is individually passed to the writeln, which doesn't 
 parse your string. The "lockingTextWriter" is just sugar to 
 squeeze out extra speed.

Hah.  You're right.  I had seen writeln being used this way and 
just assumed that it printed every line, without looking at the 
output too closely.

Thanks for clearing that up.

Jun 13 2014
