digitalmars.D.learn - Why is stdin.byLine.writeln so slow?

reply "Jyxent" <jyxent example.com> writes:
I've been playing around with D and noticed that:

stdin.byLine.writeln

takes ~20 times as long as:

foreach(line; stdin.byLine) writeln(line);

I asked on IRC and this was suggested:

stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)

which is slightly faster than the foreach case.

It was suggested that there is something slow about writeln 
taking the input range, but I'm not sure I see why.  If I follow 
the code correctly, formatRange in std.format will eventually be 
called and iterate over the range.
Jun 13 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 13 June 2014 at 20:48:16 UTC, Jyxent wrote:
 I've been playing around with D and noticed that:

 stdin.byLine.writeln

 takes ~20 times as long as:

 foreach(line; stdin.byLine) writeln(line);

 I asked on IRC and this was suggested:

 stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)

 which is slightly faster than the foreach case.

 It was suggested that there is something slow about writeln 
 taking the input range, but I'm not sure I see why.  If I 
 follow the code correctly, formatRange in std.format will 
 eventually be called and iterate over the range.
Because:
stdin.byLine.writeln
and
foreach(line; stdin.byLine) writeln(line);
don't produce the same output. One prints a range that contains strings, whereas the second repeatedly prints strings.

Given this input:
line 1
line    2
Yo!

Then "stdin.byLine.writeln" will produce this string:
["line 1", "line\t2", "Yo!"]

So that's the extra overhead which is slowing you down, because *each* character needs to be individually parsed, and potentially escaped (eg: "\t").

The "copy" option is the same as the foreach one, since each string is individually passed to the writeln, which doesn't parse your string. The "lockingTextWriter" is just sugar to squeeze out extra speed.
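
A minimal sketch of the difference described above. The helper names asRange and lineByLine are made up for illustration, and an in-memory array of strings stands in for stdin.byLine so the result can be inspected without piping input:

```d
import std.format : format;
import std.stdio : write, writeln;

// Formats the whole range at once, the way `someRange.writeln` does:
// brackets, separators, and escaped string elements.
string asRange(string[] lines)
{
    return format("%s", lines);
}

// Formats each element separately, the way the foreach loop does:
// every string is written as-is, followed by a newline.
string lineByLine(string[] lines)
{
    string result;
    foreach (line; lines)
        result ~= format("%s\n", line);
    return result;
}

void main()
{
    auto lines = ["line 1", "line\t2", "Yo!"];
    writeln(asRange(lines));   // one bracketed list with escaped elements
    write(lineByLine(lines));  // three raw lines
}
```

The first call has to inspect every character to decide whether it needs escaping. The second just forwards each string.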
Jun 13 2014
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 06/13/2014 02:08 PM, monarch_dodra wrote:

 Given this input:
 line 1
 line    2
 Yo!

 Then "stdin.byLine.writeln" will produce this string:
 ["line 1", "line\t2", "Yo!"]
Do you mean writeln() first generates an array and then prints that array? I've always imagined that it used the range interface and did something similar to what copy() does.

Is there a good reason why the imagined-by-me-range-overload of writeln() behaves that way?

Ali
Jun 13 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 13 June 2014 at 21:17:27 UTC, Ali Çehreli wrote:
 On 06/13/2014 02:08 PM, monarch_dodra wrote:

 Given this input:
 line 1
 line    2
 Yo!

 Then "stdin.byLine.writeln" will produce this string:
 ["line 1", "line\t2", "Yo!"]
 Do you mean writeln() first generates an array and then prints that array?
No, it just receives a range, so it does range formatting. eg: "[" ~ Element ~ ", " ~ Element ... "]".
 I've always imagined that it used the range interface and did 
 similar to what copy() does.
That wouldn't make sense. Then, if I did "[1, 2, 3].writeln();", it would print:
123
instead of
[1, 2, 3]
 Is there a good reason why the imagined-by-me-range-overload of 
 writeln() behaves that way?

 Ali
As I said, it's a range, so it prints a range. That's all there is to it.

That said, you can use one of D's most powerful formatting abilities: Range formatting:
writefln("%-(%s\n%)", stdin.byLine());

And BOOM. Does what you want. I freaking love range formatting. More info here:

TLDR:
%( => range start
%) => range end
%-( => range start without element escape (for strings mostly).
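
To make the three behaviours concrete, here is a small sketch using std.format.format on in-memory arrays. The function names are hypothetical, and the arrays stand in for stdin.byLine:

```d
import std.format : format;
import std.stdio : writeln;

// Default formatting of a range: brackets and ", " separators.
string defaultText(int[] values)
{
    return format("%s", values);
}

// "%(": custom separator, but string elements are still quoted and escaped.
string escapedLines(string[] lines)
{
    return format("%(%s\n%)", lines);
}

// "%-(": custom separator, elements written verbatim.
string rawLines(string[] lines)
{
    return format("%-(%s\n%)", lines);
}

void main()
{
    writeln(defaultText([1, 2, 3]));
    writeln(escapedLines(["line 1", "line\t2"]));
    writeln(rawLines(["line 1", "line\t2"]));
}
```

Note that the text after the element specifier ("\n" here) acts as a separator, so it is not emitted after the last element.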
Jun 13 2014
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 06/13/2014 03:02 PM, monarch_dodra wrote:

 No, it just receives a range, so it does range formatting. eg:
 "[" ~ Element ~ ", " ~ Element ... "]".
It still looks like it could send the formatting characters as well as the elements separately to the output stream:

"[" Element ", " ... "]"

I am assuming that the slowness in OP's example is due to constructing a long string.

Ali
Jun 13 2014
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 13 June 2014 at 22:12:01 UTC, Ali Çehreli wrote:
 On 06/13/2014 03:02 PM, monarch_dodra wrote:

 No, it just receives a range, so it does range formatting. eg:
 "[" ~ Element ~ ", " ~ Element ... "]".
 It still looks like it could send the formatting characters as well as the elements separately to the output stream:

 "[" Element ", " ... "]"

 I am assuming that the slowness in OP's example is due to constructing a long string.

 Ali
We'd have to check, but I don't think that formatted write actually ever allocates anywhere, so there should be no "constructing a long string".

The real issue (I think) is that when you ask formatted write to write a string, it just pipes the entire char array at once to the underlying stream. If the characters are escaped though (which is the case when you print an array of strings), then formattedWrite needs to check each character individually, and then also pass each character individually to the underlying stream.

And *that* could definitely justify the order of magnitude slowdown observed.

What's more, this *may* trigger a per-character decode-encode loop. I'd have to check. But that shouldn't be observable next to the stream overhead anyways.
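
One way to check this rather than guess is to format into a sink that counts how often it is called. The sketch below assumes a hypothetical CountingSink class (not part of Phobos) and a helper named render. The exact counts depend on the Phobos version in use, so none are claimed here:

```d
import std.format : formattedWrite;
import std.stdio : writefln;

// Hypothetical output range that records every put() call it receives.
// A class is used so the formatter always works on the same instance.
final class CountingSink
{
    size_t calls;  // how many times put() was called
    char[] text;   // everything written so far

    void put(const(char)[] s) { ++calls; text ~= s; }
    void put(dchar c)         { ++calls; text ~= c; }
}

// Formats `value` with "%s" into a fresh sink and returns the sink.
CountingSink render(T)(T value)
{
    auto sink = new CountingSink;
    formattedWrite(sink, "%s", value);
    return sink;
}

void main()
{
    auto line = "line\t2";
    auto plain = render(line);      // a bare string
    auto inRange = render([line]);  // the same string inside a range

    writefln("plain:    %s put() calls for %s", plain.calls, plain.text);
    writefln("in range: %s put() calls for %s", inRange.calls, inRange.text);
}
```

If the explanation above holds, the second count should grow with the length of the string while the first stays small.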
Jun 13 2014
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Friday, 13 June 2014 at 22:12:01 UTC, Ali Çehreli wrote:
 On 06/13/2014 03:02 PM, monarch_dodra wrote:

 No, it just receives a range, so it does range formatting. eg:
 "[" ~ Element ~ ", " ~ Element ... "]".
 It still looks like it could send the formatting characters as well as the elements separately to the output stream:

 "[" Element ", " ... "]"

 I am assuming that the slowness in OP's example is due to constructing a long string.
It already does what you suggest, and doesn't construct one big string. You can test this:

void main() {
    import std.stdio;
    stdin.byLine.writeln;
}

When you type in several lines in the terminal, it will output the first element as soon as you press enter for the first line.
Jun 14 2014
prev sibling parent reply "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Fri, Jun 13, 2014 at 10:02:49PM +0000, monarch_dodra via Digitalmars-d-learn
wrote:
[...]
 That said, you can use one of D's most powerful formatting abilities:
 Range formatting:
 writefln("%-(%s\n%)", stdin.byLine());
 
 And BOOM. Does what you want. I freaking love range formatting.
 More info here:

 
 TLDR:
 %( => range start
 %) => range end
 %-( => range start without element escape (for strings mostly).
I wrote part of that documentation, and my favorite example is matrix formatting:

auto mat = [[1,2,3], [4,5,6], [7,8,9]];
writefln("[%([%(%d %)]%|\n %)]", mat);

Output:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

D coolness at its finest! Whoever invented %(, %|, %) is a genius. It takes C's printf formatting from weak sauce to whole new levels of awesome.

I remember debugging some range-based code, and being able to write stuff like:

debug writefln("%(%(%s, %); %)", buggyNestedRange().take(10));

at strategic spots in the code is just pure win. In C/C++, you'd have to manually write nested loops to print out the data, which may involve manually calling accessor methods, manually counting them, perhaps storing intermediate output fragments in temporary buffers, encapsulating all this jazz in a throwaway function so that you can use it at multiple strategic points (in D you just copy-n-paste the single line above), etc.. Pure lose.

(Speaking of which, this might be an awesome lightning talk topic at the next DConf. ;-) Or did somebody already do it?)

T

--
Having a smoking section in a restaurant is like having a peeing section in a swimming pool. -- Edward Burr
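
The two nested format strings above can be tried in isolation with std.format.format. In this sketch the function names are made up for illustration, and a plain int[][] stands in for buggyNestedRange():

```d
import std.format : format;
import std.stdio : writeln;

// Outer range: each row is wrapped in [ ], rows are separated by "\n ".
// "%|" marks where the element format ends and the separator begins.
string matrixText(int[][] mat)
{
    return format("[%([%(%d %)]%|\n %)]", mat);
}

// Inner elements separated by ", ", inner ranges separated by "; ".
string nestedText(int[][] ranges)
{
    return format("%(%(%s, %); %)", ranges);
}

void main()
{
    writeln(matrixText([[1, 2, 3], [4, 5, 6], [7, 8, 9]]));
    writeln(nestedText([[1, 2], [3, 4]]));
}
```

Without "%|", everything after the last element specifier is treated as the separator, which is why the closing "]" of each row needs the explicit marker in the matrix case.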
Jun 13 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 13 June 2014 at 22:25:25 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 In C/C++, you'd have to manually write nested loops to print out the
 data, which may involve manually calling accessor methods, manually
 counting them, perhaps storing intermediate output fragments in
 temporary buffers, encapsulating all this jazz in a throwaway function
 so that you can use it at multiple strategic points (in D you just
 copy-n-paste the single line above), etc..  Pure lose.

 T
In C++, I usually use copy/transform:

*std::copy(begin(), end(), std::ostream_iterator<T>(std::cout, "\n")) = "\n";
or
*std::transform(begin(), end(), std::ostream_iterator<T>(std::cout, "\n"), [](???){???}) = "\n";

It's a bit verbose, and looks like ass to the uninitiated, but once you are used to it, it's quite convenient. It's just something that grows on you.

You can stack on a "for_each" if you need more "depth":

std::for_each(begin(), end(), [](R& r){
    *std::copy(r.begin(), r.end(), std::ostream_iterator<T>(std::cout, "\n")) = "\n";
});

Though arguably, that's just a loop in disguise :)
Jun 13 2014
prev sibling parent "Jyxent" <jyxent example.com> writes:
On Friday, 13 June 2014 at 21:08:08 UTC, monarch_dodra wrote:
 On Friday, 13 June 2014 at 20:48:16 UTC, Jyxent wrote:
 I've been playing around with D and noticed that:

 stdin.byLine.writeln

 takes ~20 times as long as:

 foreach(line; stdin.byLine) writeln(line);

 I asked on IRC and this was suggested:

 stdin.byLine(KeepTerminator.yes).copy(stdout.lockingTextWriter)

 which is slightly faster than the foreach case.

 It was suggested that there is something slow about writeln 
 taking the input range, but I'm not sure I see why.  If I 
 follow the code correctly, formatRange in std.format will 
 eventually be called and iterate over the range.
 Because:
 stdin.byLine.writeln
 and
 foreach(line; stdin.byLine) writeln(line);
 don't produce the same output. One prints a range that contains strings, whereas the second repeatedly prints strings.

 Given this input:
 line 1
 line    2
 Yo!

 Then "stdin.byLine.writeln" will produce this string:
 ["line 1", "line\t2", "Yo!"]

 So that's the extra overhead which is slowing you down, because *each* character needs to be individually parsed, and potentially escaped (eg: "\t").

 The "copy" option is the same as the foreach one, since each string is individually passed to the writeln, which doesn't parse your string. The "lockingTextWriter" is just sugar to squeeze out extra speed.
Hah. You're right. I had seen writeln being used this way and just assumed that it printed every line, without looking at the output too closely. Thanks for clearing that up.
Jun 13 2014