
digitalmars.D - A Class in Composition

reply "Chris" <wendlec tcd.ie> writes:
I had a very simple method that would read a text file, parse it 
and create a lexicon of the type string[string].

public void loadLexicon(ref string[string] lex, string src)

It was high time I made the method more sophisticated and 
flexible (allowing for comments in the source file, checking for 
formatting errors etc). Because neither the comment bit (C-style 
line comments "//") nor the formatting check are too demanding I 
decided to do it in the existing loop that looked something like:

while (file.readln(buf)) {
   line = to!string(buf);
   // Do something with line
}

Soon, very soon indeed, I ran into all the problems associated 
with loops, which basically boil down to "Where the f.. am I 
now?". So I said to myself that component programming was the
perfect fit for this kind of problem. I rearranged the program 
and now I have one line in the function that does the job:

public void loadLexicon(ref string[string] lex, string src) {
   // ...
   // arr is of type string[] and holds the lines of the source file.
   auto dictionary = Lexicon(); // The output range
   lex = arr.byCommentFilter().byEntry().copy(dictionary).lexicon;
}

It's very nice indeed. The beauty of it is not only that I now 
have a nice, sequentially ordered "one-liner"; outsourcing the 
checks and filters also freed my head from the loop logic, which 
helped me to focus on the respective algorithms. This led to a 
leaner and cleaner implementation of each algorithm, because 
there are no dependencies on loop conditions or the position in 
the loop.

I could easily remove the comment filter, if the deployment 
version of the program ships with tidied up source files, or I 
could add a new component if necessary. In a loop, however 
trivial it may appear at first glance, it would not be so simple 
to add or remove parts of the logic.
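For illustration, a minimal stand-in for a byCommentFilter-style 
stage could look like the sketch below. The filter logic and the 
line format are assumptions, not the actual code from the program 
discussed above; the point is only that the stage is a 
self-contained component you can drop in or out of the chain:

```d
import std.algorithm : equal, filter, startsWith;
import std.string : strip;

// Hypothetical stand-in for a byCommentFilter component: drops
// blank lines and C-style "//" line comments from a range of lines.
auto byCommentFilter(R)(R lines)
{
    return lines.filter!(l => l.strip.length != 0 && !l.strip.startsWith("//"));
}

void main()
{
    auto src = ["// a comment", "dog : Hund", "", "cat : Katze"];
    // Only the non-comment, non-blank lines survive the stage.
    assert(src.byCommentFilter.equal(["dog : Hund", "cat : Katze"]));
}
```

Removing the stage from the pipeline is then just deleting one 
call in the chain; no loop body has to be re-threaded.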

One drawback is that using ranges creates some overhead and 
code duplication. But the neatness of it is amazing, and I
daresay that a lot of bugs are hidden in loops simply because the 
loop logic distracts and confuses the programmer and bugs finally 
find loopholes they can slip through.

Another drawback is that ranges demand a lot of boilerplate code. 
If properly implemented, there is a lot of code you have to write 
over and over again such as

if (isInputRange!Range && isInputRange!(ElementType!Range) &&
         is(ElementType!(ElementType!Range) == MyType))

I don't know if this could possibly be reduced. Or maybe it's 
just my lack of experience.
Aug 27 2013
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Chris:

 Another drawback is that ranges demand a lot of boilerplate 
 code. If properly implemented, there is a lot of code you have 
 to write over and over again such as

 if (isInputRange!Range && isInputRange!(ElementType!Range) &&
         is(ElementType!(ElementType!Range) == MyType))

 I don't know if this could possibly be reduced. Or maybe it's 
 just my lack of experience.

One way to reduce that "boilerplate code" is to introduce in D a C#-style "yield" that's syntax sugar to create Input Ranges:

yield!int inverseGrayCode() {
    yield 0;
    foreach (x; inverseGrayCode()) {
        if (x & 1) {
            yield 2 * x + 1;
            yield 2 * x;
        } else {
            if (x)
                yield 2 * x;
            yield 2 * x + 1;
        }
    }
}

The yield keyword could also be used to denote a range of a type, to reduce this code of yours:

if (isInputRange!Range && isInputRange!(ElementType!Range) &&
        is(ElementType!(ElementType!Range) == MyType))

Bye,
bearophile
Aug 27 2013
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 27 August 2013 at 14:13:25 UTC, Chris wrote:
 Another drawback is that ranges demand a lot of boilerplate 
 code. If properly implemented, there is a lot of code you have 
 to write over and over again such as

 if (isInputRange!Range && isInputRange!(ElementType!Range) &&
         is(ElementType!(ElementType!Range) == MyType))

 I don't know if this could possibly be reduced. Or maybe it's 
 just my lack of experience.

If you've got a lot of repeated template constraints, simply moving them out into a separate template can be useful, e.g.

template isIRofIRofT(Range, T)
{
    enum isIRofIRofT = isInputRange!Range &&
        isInputRange!(ElementType!Range) &&
        is(ElementType!(ElementType!Range) == T);
}

if (isIRofIRofT!(Range, MyType))
Aug 27 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Aug 27, 2013 at 04:13:24PM +0200, Chris wrote:
 I had a very simple method that would read a text file, parse it and
 create a lexicon of the type string[string].
 
 public void loadLexicon(ref string[string] lex, string src)
 
 It was high time I made the method more sophisticated and flexible
 (allowing for comments in the source file, checking for formatting
 errors etc). Because neither the comment bit (C-style line comments
 "//") nor the formatting check are too demanding I decided to do it
 in the existing loop that looked something like:
 
 while (file.readln(buf)) {
   line = to!string(buf);
   // Do something with line
 }
 
 Soon, very soon indeed, I ran into all the problems associated with
 loops, which basically boils down to "Where the f.. am I now?".

Yep, this is a typical structure conflict caused by the mismatch between the lines in a file and the structures they represent. :)
 So I said to myself that component programming was the perfect fit for
 this kind of problem. I rearranged the program and now I have one line
 in the function that does the job:
 
 public void loadLexicon(ref string[string] lex, string src) {
   // ...
   // arr is of type string[] and holds the lines of the source file.
   auto dictionary = Lexicon(); // The output range
   lex = arr.byCommentFilter().byEntry().copy(dictionary).lexicon;
 }
 
 It's very nice indeed. The beauty of it is not only that I now have
 a nice, sequentially ordered "one-liner", the outsourcing of checks
 and filters freed my head from the loop logic, which helped me to
 focus on the respective algorithms. This lead to a leaner and
 cleaner implementation of each algorithm, because there are no
 dependecies on loop conditions or the position in the loop.

I usually separate out the loop body into smaller functions that handle
each part of the parsing. Since code is the most straightforward when
its structure matches that of the data, this usually means the outer
loop no longer iterates over lines, but over larger, logical structures
(e.g., an entry in your lexicon). That way, your code becomes something
like:

void loadLexicon(...) {
    auto dictionary = Lexicon();
    do {
        dictionary.put(parseEntry(input, ...));
    } while (!input.empty);
}

Entry parseEntry(...) {
    skipDelimitingBlankLines(...);
    auto key = parseHeadword(...);
    auto value = parseBody(...);
    return Entry(key, value);
}

Key parseHeadword(...) { ... }
Value parseBody(...) { ... }

This way, your code structure has a 1-to-1 correspondence with the
format of your input, which makes it far more readable and less
bug-prone. Of course, we can then take this to the next level, which is
to encapsulate each of these functions into a range-based component, and
so we end up with your nice one-line function. :)
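To make the shape of that structure concrete, here is a runnable toy
version. The "key : value" line format, the blank-line delimiters, and
all the names are assumptions made up for this sketch, not the actual
program from this thread:

```d
import std.algorithm : findSplit;
import std.string : strip;

struct Entry { string key; string value; }

// Toy format assumed here: one "key : value" entry per line,
// blank lines delimit entries.
Entry parseEntry(ref string[] input)
{
    // skipDelimitingBlankLines
    while (input.length && input[0].strip.length == 0)
        input = input[1 .. $];
    auto parts = input[0].findSplit(":");
    input = input[1 .. $];
    return Entry(parts[0].strip, parts[2].strip);
}

string[string] loadLexicon(string[] src)
{
    string[string] lex;
    do
    {
        auto e = parseEntry(src);
        lex[e.key] = e.value;
    } while (src.length);
    return lex;
}

void main()
{
    auto lex = loadLexicon(["dog : Hund", "", "cat : Katze"]);
    assert(lex["dog"] == "Hund" && lex["cat"] == "Katze");
}
```

The outer loop iterates over entries, not lines, so each parse step
maps directly onto one piece of the input format.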
 I could easily remove the comment filter, if the deployment version
 of the program ships with tidied up source files, or I could add a
 new component if necessary. In a loop, however trivial it may appear
 at first glance, it would not be so simple to add or remove parts of
 the logic.

Yeah, when loops get beyond the simplest incarnations, their complexity quickly explodes into something unmanageable, usually resulting in tight coupling between distant parts of the code. That makes it very hard (or outright impossible) to add/remove parts.
 One drawback is, that using ranges creates some overheads and code
 duplication. But the neatness of it is amazing, and I daresay that a
 lot of bugs are hidden in loops simply because the loop logic
 distracts and confuses the programmer and bugs finally find
 loopholes they can slip through.

Overly complex loops make a programmer go loopy and bugs crawl out from
loopholes... ;-)

As for overheads, I wonder if, given enough experience over time, we can
one day identify common patterns that the compiler can take advantage of
and optimize into more efficient code. For example, if the compiler
recognizes a linear composition of range-based components, it could
transform them into optimized nested loops that eliminate some of the
overhead involved. I'm not sure how feasible this is with the current
DMD implementation, though. But it's certainly something to keep in mind
for the future.
 Another drawback is that ranges demand a lot of boilerplate code. If
 properly implemented, there is a lot of code you have to write over
 and over again such as
 
 if (isInputRange!Range && isInputRange!(ElementType!Range) &&
         is(ElementType!(ElementType!Range) == MyType))
 
 I don't know if this could possibly be reduced. Or maybe it's just
 my lack of experience.

You can encapsulate this into a template:

template isRoR(R, ElemType)
{
    enum isRoR = isInputRange!R &&
        isInputRange!(ElementType!R) &&
        is(ElementType!(ElementType!R) == ElemType);
}

// Or, once 2.064 is out:
enum isRoR(R, ElemType) = isInputRange!R &&
    isInputRange!(ElementType!R) &&
    is(ElementType!(ElementType!R) == ElemType);

// Now the sig constraint is much more readable and easier to
// type.
auto myFunc(R)(R range)
    if (isRoR!(R, MyType))
{ ... }

Another source of boilerplate is in repeatedly defining .empty, .front,
and .popFront. Every single input range, unless it can be defined as a
composition of existing Phobos ranges, must have this boilerplate:

struct MyRange(T) {
    @property bool empty() { ... }
    @property T front() { ... }
    void popFront() { ... }
}

And once you need a forward range, you need to also add:

@property typeof(this) save() { ... }

There might be a way to reduce the amount of boilerplate by using mixins
or factory functions, though.

T

-- 
When solving a problem, take care that you do not become part of the problem.
Aug 28 2013
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
On Wednesday, 28 August 2013 at 18:30:11 UTC, H. S. Teoh wrote:
 As for overheads, I wonder if, given enough experience over 
 time, we can
 one day identify common patterns that the compiler can take 
 advantage of
 and optimize into more efficient code. For example, if the 
 compiler
 recognizes a linear composition of range-based components, it 
 could
 transform them into optimized nested loops that eliminate some 
 of the
 overhead involved. I'm not sure how feasible this is with the 
 current
 DMD implementation, though. But it's certainly something to 
 keep in mind
 for the future.

Thanks for your comments. I was thinking the same thing, i.e. that 
maybe future implementations of the compiler can optimize the 
overhead away. If CP is THE major thing in D (and it makes perfect 
sense that it is), then the compiler (and syntax) will adapt to it 
over time, I guess (and hope).

I'll try to introduce CP/Ranges wherever it makes sense from now on. 
It was unbelievable how well the code worked once I could free 
myself from the loop. And I can take the components and use them "as 
is" in a separate program I use to add entries to the lexicon source 
files.

I'm only wondering how to integrate the new pattern properly into an 
otherwise OO framework. I seriously wonder whether CP/Ranges will 
make OO obsolete. After all, structs are "classy".
Aug 29 2013
prev sibling parent "Chris" <wendlec tcd.ie> writes:
I re-used the components (see above) in a different program 
(again a parser with different needs) and I ended up with more or 
less the same one-liner.

auto dictionary = Lexicon(); // output range
string[string] lex = arr.byEntry().copy(dictionary).lexicon;

The only change necessary was the logic bit in the Entry struct 
(two lines had to be changed). I could omit the byCommentFilter() 
bit, because there are no comments in the source, without any 
complaints or side effects. Great stuff!
Aug 30 2013