www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - wc in D: 712 Characters Without a Single Branch

reply Mike Parker <aldacron gmail.com> writes:
Robert Schadek was inspired by a post he saw on Hacker News a 
while back showing an implementation of wc in Haskell totaling 80 
lines. He decided he could do better in D. So he did. This post 
on the D blog shows what he came up with and also provides a 
brief introduction to ranges.


The blog:
https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-without-a-single-branch/

Reddit:
https://www.reddit.com/r/programming/comments/ev5w91/wc_in_d_712_characters_without_a_single_branch/

It's also on Hacker News, but please don't post a direct link to 
it if you find it there.

https://news.ycombinator.com/
Jan 28
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/28/20 9:01 AM, Mike Parker wrote:
 Robert Schadek was inspired by a post he saw on Hacker News a while back 
 showing an implementation of wc in Haskell totaling 80 lines. He decided 
 he could do better in D. So he did. This post on the D blog shows what 
 he came up with and also provides a brief introduction to ranges.
 
 
 The blog:
 https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-with
ut-a-single-branch/ 
 
 
 Reddit:
 https://www.reddit.com/r/programming/comments/ev5w91/wc_in_d_712_characters_with
ut_a_single_branch/ 
 
 
 It's also on Hacker News, but please don't post a direct link to it if 
 you find it there.
 
 https://news.ycombinator.com/
Nice! I bet you get better performance with a tweak: you are currently iterating each line 2x (3x if you count searching for the line ending). If you did both at once, the toLine function would be longer, and probably have some branching in it. But I bet you get close to double performance for that. You could also combine all 3 checks into one (if you are checking for spaces, you might as well check for newlines). But newline searching is pretty fast for byLine. iopipe could probably do a bit better ;) In any case, I totally get what you are saying: 15 minutes to write a program that works, is easily understandable, but has performance issues for unusual input is pretty cool. -Steve
Jan 28
prev sibling next sibling parent OlaOst <olaa81 gmail.com> writes:
On Tuesday, 28 January 2020 at 14:01:35 UTC, Mike Parker wrote:
 Robert Schadek was inspired by a post he saw on Hacker News a 
 while back showing an implementation of wc in Haskell totaling 
 80 lines. He decided he could do better in D. So he did. This 
 post on the D blog shows what he came up with and also provides 
 a brief introduction to ranges.


 The blog:
 https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-without-a-single-branch/

 Reddit:
 https://www.reddit.com/r/programming/comments/ev5w91/wc_in_d_712_characters_without_a_single_branch/

 It's also on Hacker News, but please don't post a direct link 
 to it if you find it there.

 https://news.ycombinator.com/
Golfed down to 9 lines: import std; void main(string[] args) { args[1].File.byLine(Yes.keepTerminator) .map!(l => [l.byCodePoint.walkLength, l.splitter.walkLength]) .fold!((o, l) => [o[0]+1, o[1]+l[1], o[2]+l[0]])([0uL, 0, 0]) .writefln!"%(%u %) %s"(args[1]); } Sacrificed most of the readability of course, but I think it is functionally equivalent.
Jan 28
prev sibling next sibling parent Paul Backus <snarwin gmail.com> writes:
On Tuesday, 28 January 2020 at 14:01:35 UTC, Mike Parker wrote:
 Robert Schadek was inspired by a post he saw on Hacker News a 
 while back showing an implementation of wc in Haskell totaling 
 80 lines. He decided he could do better in D. So he did. This 
 post on the D blog shows what he came up with and also provides 
 a brief introduction to ranges.


 The blog:
 https://dlang.org/blog/2020/01/28/wc-in-d-712-characters-without-a-single-branch/
 unittest {
 	foreach(it; Iota(1,10).Filter(&isEven)) {
 		writeln(it);
 	}
 }
Today I learned you can use UFCS with struct literals/constructors.
Jan 28
prev sibling next sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 28 January 2020 at 14:01:35 UTC, Mike Parker wrote:
 [snip]
BTW, while playing with a solution of my own [0] I noticed that both mine and Robert's version return different results for the following input [1]:
       expected: 
 '\u0003\u0000\u0000\u00005èÆÕL]\u0012|Î¾ž\u001a7«›\u00052\u0011(ЗY\n<\u0010\u0000\u0000\u0000\u0000\u0000\u0000e!ßd/ñõì\f:z¦Î¦±ç·÷Í¢Ëß\u00076*…\bŽ—ñžùC1É
Àé2\u001aӆBŒ' },
To reproduce: $ curl -fsSL https://github.com/ethjs/ethjs-util/raw/e9aede668177b6d1ea62d741ba1c19402bc337b3/src tests/test.index.js | sed '350q;d' > input $ ./robert input 1 4 190 input $ wc -lwm input 1 3 190 input [0]:
 import std.algorithm : count, splitter;
 import std.stdio : File, writefln;
 import std.typecons : Yes;

 void main(string[] args) {
     size_t lines, words, bytes;
     foreach (line; args[1].File.byLine(Yes.keepTerminator)) {
         lines++;
         bytes += line.count;
         words += line.splitter.count;
     }
     writefln!"%u %u %u %s"(lines, words, bytes, args[1]);
 }
[1]: https://github.com/ethjs/ethjs-util/blob/e9aede668177b6d1ea62d741ba1c19402bc337b3/src/tests/test.index.js#L350
Jan 28
next sibling parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 28 January 2020 at 21:40:40 UTC, Petar Kirov 
[ZombineDev] wrote:
 [snip]

 import std.algorithm : count, splitter;
 import std.stdio : File, writefln;
 import std.typecons : Yes;

 void main(string[] args) {
     size_t lines, words, bytes;
     foreach (line; args[1].File.byLine(Yes.keepTerminator)) {
         lines++;
         bytes += line.count;
         words += line.splitter.count;
     }
     writefln!"%u %u %u %s"(lines, words, bytes, args[1]);
 }
[1]: https://github.com/ethjs/ethjs-util/blob/e9aede668177b6d1ea62d741ba1c19402bc337b3/src/tests/test.index.js#L350
s/bytes/chars/
Jan 28
prev sibling parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 28 January 2020 at 21:40:40 UTC, Petar Kirov 
[ZombineDev] wrote:
 BTW, while playing with a solution of my own [0] I noticed that 
 both mine and Robert's version return different [... snip]
I found the culprit - iswspace. For more info see: https://www.mail-archive.com/bug-coreutils gnu.org/msg30803.html
Jan 28
prev sibling parent reply sarn <sarn theartofmachinery.com> writes:
On Tuesday, 28 January 2020 at 14:01:35 UTC, Mike Parker wrote:
 Robert Schadek was inspired by a post he saw on Hacker News a 
 while back showing an implementation of wc in Haskell totaling 
 80 lines.
I enjoyed the article overall, but I think this part lets it down a bit:
 Is the Haskell wc faster? For big files, absolutely, but then 
 it is using threads. For small files, GNU’s coreutils still 
 beats the competition. At this stage my version is very likely 
 IO bound, and it’s fast enough anyway.
Admit it, "my version is very likely IO bound" is hand-wavey. The top comment on HN right now is pointing out that it doesn't make sense. It would be a better tech article if it stuck to the facts by either cutting out the hand-wavey bit (and just saying "it's fast enough anyway") or doing a test. (Quick and dirty way: running the code on a large file and seeing if it maxes out a CPU using the "top" command. More elegant way: using tools like these https://github.com/sysstat/sysstat/) Remember: plenty of us on the forums are happy to help make a D article more interesting.
Jan 28
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/28/20 5:11 PM, sarn wrote:
 Admit it, "my version is very likely IO bound" is hand-wavey. The top 
 comment on HN right now is pointing out that it doesn't make sense.
I don't think it's i/o bound. Doing the split/walk-length and then walk-length again is likely making it 2x slower. Phobos' byLine is not the fastest you can get, but it's not a slouch either. It also tries to use as many libc tricks as it can. I would say the key point is that it's fast enough for most purposes, no matter what the actual performance problem is. -Steve
Jan 28