
digitalmars.D.learn - [improve-it] Parsing NG archive and sorting by post-count

reply Andrej Mitrovic <none none.none> writes:
I thought about making a kind of code-golf contest (stackoverflow usually has
these contests). Only I would focus on improving each other's code.

So here's my idea of the day: Parse the newsgroup archive files from
http://www.digitalmars.com/NewsGroup.html, and for each .html file output
another .html file which has a list of topics sorted in post count order. Sure,
there is NG software which does this automatically. But this is about doing it
in D.

Here's my implementation: https://gist.github.com/871631

Download a few .html files, save them in their own folder. Then copy my script
into a .d file in the same folder, and just run it with RDMD. It will output
the files in an `output` subfolder. It works on Windows, since that's all I've
tested it with.

There are a few things I've noticed: Using just a simple hash with the post
count as the Key type wouldn't work. There are many topics which have the same
post count, and AAs can't hold duplicate keys. So I worked around this by
making a wrapper which hides all the details of storing duplicates and
traversal. I've called it `CommonAA`.

I've also implemented an `allSatisfy` function which works on runtime
arguments. There's a similar function in std.typetuple, but it's only useful for
compile-time arguments. There's probably a similar method someplace in
std.algorithm, but I was too lazy to check. I thought it would be nice to have.
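
A rough sketch of what a runtime `allSatisfy` could look like. This is not the version from the gist; the name `allSatisfyRT` is made up for illustration. It also assumes a recent Phobos, where `std.algorithm` provides `all` for ranges (it was not available when this thread was written).

```d
import std.algorithm : all;

// Hypothetical helper (not the gist's allSatisfy): returns true only if
// `pred` holds for every runtime argument passed in.
bool allSatisfyRT(alias pred, Args...)(Args args)
{
    foreach (arg; args)
        if (!pred(arg))
            return false;
    return true;
}

void main()
{
    assert(allSatisfyRT!(x => x > 0)(1, 2, 3));
    assert(!allSatisfyRT!(x => x > 0)(1, -2, 3));

    // The same check on a range, using the Phobos function.
    assert([1, 2, 3].all!(x => x > 0));
}
```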

I can see some ways to improve this. For one, I could have used Regex instead
of indexOf. I could have also tried to avoid using a wrapper, however I haven't
figured out a way to do this while having duplicate key types and having to
sort them while keeping the Key types linked to the Values.
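
One possible way to drop the wrapper, sketched under assumptions: keep each topic and its post count together in a small struct and sort an array of those. The `Topic` struct, its field names, and `byPostCount` are hypothetical and do not come from the gist.

```d
import std.algorithm : sort;
import std.stdio : writefln;

// Hypothetical record type; the field names are illustrative only.
struct Topic
{
    string title;
    size_t posts;
}

// Sorts in place, highest post count first, and returns the same array.
// Topics with equal counts end up in no particular relative order.
Topic[] byPostCount(Topic[] topics)
{
    sort!((a, b) => a.posts > b.posts)(topics);
    return topics;
}

void main()
{
    auto topics = byPostCount([
        Topic("Topic A", 12),
        Topic("Topic B", 40),
        Topic("Topic C", 12),
    ]);

    foreach (t; topics)
        writefln("%s posts: %s", t.posts, t.title);
}
```

Because the count travels with the title, duplicate counts are not a problem and no key/value linking is needed.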

Anywho, let's see you improve my code! It's just for fun and maybe we'll learn
some tricks from one another. Have fun!
Mar 15 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 I've also implemented an `allSatisfy` function which works on runtime
arguments. There's a similar function in std.typetuple, but it's only useful for
compile-time arguments. There's probably a similar method someplace in
std.algorithm, but I was too lazy to check. I thought it would be nice to have.

http://d.puremagic.com/issues/show_bug.cgi?id=4405
 Anywho, let's see you improve my code! It's just for fun and maybe we'll learn
some tricks from one another. Have fun!

I suggest you to add unit tests and Contracts to your CommonAA() and
allSatisfy() :-)

Have you tried to replace this:

        if (key in payload)
        {
            payload[key] ~= val;
        }
        else
        {
            payload[key] = [val];
        }

With just:

        payload[key] ~= val;

I suggest to replace this:
sortedKeys.sort;

With:
sortedKeys.sort();

Bye,
bearophile
Mar 15 2011
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/15/11, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 I suggest to replace this:
 sortedKeys.sort;

 With:
 sortedKeys.sort();

Yes, I prefer it that way too.

Correction: DMD complains about having parentheses, in fact it's an error:
ngparser.d(28): Error: undefined identifier module ngparser.sort

So I've had to remove them. And again that's that uninformative error
message which I don't like.
Mar 15 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 Correction: DMD complains about having parentheses, in fact it's an error:
 ngparser.d(28): Error: undefined identifier module ngparser.sort
 
 So I've had to remove them. And again that's that uninformative error
 message which I don't like.

Sorry, this time the uninformative text was mine :-) When I have suggested
you to add the () after the sort, I meant to suggest you to use the
std.algorithm sort instead of the deprecated built-in one, because the
built-in one is slow and it has bad bugs, like this one I've found:
http://d.puremagic.com/issues/show_bug.cgi?id=2819

Bye,
bearophile
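
A minimal sketch of the library sort being suggested here, assuming `std.algorithm` is imported (without the import, `sort()` is not found, which is likely what the error above was reporting). The variable name `sortedKeys` is borrowed from the thread; the values are made up.

```d
import std.algorithm : sort;

void main()
{
    int[] sortedKeys = [12, 40, 3, 12];

    // Library sort: ascending by default, sorts the array in place.
    sort(sortedKeys);
    assert(sortedKeys == [3, 12, 12, 40]);

    // A predicate switches the order, e.g. highest post count first.
    sort!"a > b"(sortedKeys);
    assert(sortedKeys == [40, 12, 12, 3]);
}
```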
Mar 15 2011
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/16/11, bearophile <bearophileHUGS lycos.com> wrote:
 I meant to suggest you to use the
 std.algorithm sort instead of the deprecated built-in one, because the
 built-in one is slow and it has bad bugs, like this one I've found:
 http://d.puremagic.com/issues/show_bug.cgi?id=2819

Thanks, I didn't know about the bugs.
Mar 15 2011
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/15/11, bearophile <bearophileHUGS lycos.com> wrote:
 Andrej Mitrovic:

 I've also implemented an `allSatisfy` function which works on runtime
 arguments. There's a similar function in std.typetuple, but it's only
 useful for compile-time arguments. There's probably a similar method
 someplace in std.algorithm, but I was too lazy to check. I thought it
 would be nice to have.

http://d.puremagic.com/issues/show_bug.cgi?id=4405

Cool, I was afraid I was reinventing the wheel.
 I suggest you to add unit tests and Contracts to your CommonAA() and
 allSatisfy() :-)

allSatisfy definitely doesn't work for a bunch of cases, like passing a delegate instead of a literal. And CommonAA doesn't take into account things like removing elements, etc. It's definitely a half-ass implementation. :p
 Have you tried to replace this:

         if (key in payload)
         {
             payload[key] ~= val;
         }
         else
         {
             payload[key] = [val];
         }

 With just:

         payload[key] ~= val;

Good catch. Since the value type is an array I could simply append to it. Although one didn't exist yet, so I figured I had to assign something to an empty spot in an AA. Oh well..
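
A small sketch of why the shorter form is enough: appending through an AA index creates the entry when the key is missing, so no explicit first assignment is needed. The `payload` name comes from the thread; the key and value types here are assumptions.

```d
void main()
{
    // Assumed layout: post count -> list of topic titles.
    string[][size_t] payload;

    payload[5] ~= "first";   // key 5 is absent: the entry is created
    payload[5] ~= "second";  // key 5 exists: the value is appended

    assert(payload.length == 1);
    assert(payload[5] == ["first", "second"]);
}
```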
 I suggest to replace this:
 sortedKeys.sort;

 With:
 sortedKeys.sort();

Yes, I prefer it that way too. Since DMD doesn't complain about it (is sort even a property?), I missed it. Thanks for the input.
Mar 15 2011