www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Code layout for range-intensive D code

reply "bearophile" <bearophileHUGS lycos.com> writes:
The introduction of UFCS in D offers new ways to format D code, 
especially when your code uses many high order functions. What is 
a good layout of the D code in such situations? I have tried 
several alternative layouts, and in the end I found to appreciate 
a layout similar to the one used in F# code. Below I show a kind 
of  extreme example :-)

A textual matrix of bits like this is the input of a little 
nonogram puzzle:

0 1 1 1 1 0
1 0 0 1 1 1
1 0 1 1 1 1
1 1 1 1 1 1
0 1 1 1 1 0


A program has to produce an output like this, in the first part 
of the output it looks at the columns and counts the lengths of 
the groups of "1", and in the second part of the output it does 
the same on the rows:

3
1 2
1 3
5
5
3

4
1 3
1 4
6
4


This is a possible solution program:


import std.stdio, std.algorithm, std.string, std.range, std.conv;

void main() {

     auto t = "table.txt"
              .File()
              .byLine()
              .map!(r => r.removechars("^01".dup))()
              .array();

     const transposed = t[0]
                        .length
                        .iota()
                        .map!(i => t.transversal(i).array())()
                        .array();

     (t ~ [(char[]).init] ~ transposed)
     .map!(r => r
                .group()
                .filter!(p => p[0] == '1')()
                .map!(p => p[1].text())()
                .join(" ")
          )()
     .join("\n")
     .writeln();
}


(Note: the second argument of removechars is "^01".dup because 
removechars is a bit stupid, it requires the same type argument 
on both arguments, and the 'r' given by byLine() is a char[]. 
Here the code performs the string->char[] conversion many times 
because the typical inputs for this program are small enough, 
otherwise it's a premature optimization.)


As you see you have to break the lines, because the processing 
chains often become too much long for single lines.
At first I have put the dots at the end of the lines, but later I 
have found that putting the dots at their start is better, it 
reminds me we are inside a processing chain still.
Putting a single operation on each line (instead of two or three) 
helps readability, allowing a bit of single-line nesting like in 
".map!(i => t.transversal(i).array())()".
And putting the dots and first part aligned vertically helps the 
eye find what chain we are in. In the last part of the program 
you see a nested chain too, inside a map.

I think this code layout is similar to the one used with F# pipe 
operators. In F# code that layout is a kind of standard, I see it 
used by most F# programmers. Maybe some D programmers will want 
to use this kind of layout for such kind of 
higher-order-function-heavy code.

I have found that breaking the chains and giving variable names 
to the intermediate parts of those processing chains doesn't help 
the readability a lot, and the names for those intermediate 
temporary variables tend to be dull and repetitive. On the other 
hand putting just one processing step on each row gives space for 
a short comment on each row, where you thik you need it:

auto t = File("table.txt", "r")                 // comment #1
          .byLine()                              // comment #2
          .map!(r => r.removechars("^01".dup))() // comment #3
          .array();                              // comment #4


In practice I think comment #3 is the only useful here, barely.

Bye,
bearophile
Jun 09 2012
next sibling parent Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
09.06.2012 14:43, bearophile пишет:
 The introduction of UFCS in D offers new ways to format D code,
 especially when your code uses many high order functions.

I have to mention that one shouldn't write range-intensive D code for now. It's too risky to use high level functions in D because it can result in any kind of memory corruption because of at least "Issue 7965 - Invalid outer function scope pointer in some cases": http://d.puremagic.com/issues/show_bug.cgi?id=7965 Not sure about other such issues but at least that one makes high level functions almost unusable because of a danger. -- Денис В. Шеломовский Denis V. Shelomovskij
Jun 09 2012
prev sibling next sibling parent "Nick Sabalausky" <SeeWebsiteToContactMe semitwist.com> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:lkplokawtyisvwhvfsew forum.dlang.org...
 The introduction of UFCS in D offers new ways to format D code, especially 
 when your code uses many high order functions. What is a good layout of 
 the D code in such situations? I have tried several alternative layouts, 
 and in the end I found to appreciate a layout similar to the one used in 
 F# code. Below I show a kind of  extreme example :-)

     auto t = "table.txt"
              .File()
              .byLine()
              .map!(r => r.removechars("^01".dup))()
              .array();

     const transposed = t[0]
                        .length
                        .iota()
                        .map!(i => t.transversal(i).array())()
                        .array();

     (t ~ [(char[]).init] ~ transposed)
     .map!(r => r
                .group()
                .filter!(p => p[0] == '1')()
                .map!(p => p[1].text())()
                .join(" ")
          )()
     .join("\n")
     .writeln();
 }

That's basically what I do, except I always indent by exactly one tab.
Jun 10 2012
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 09.06.2012 14:43, bearophile wrote:
[snip]
 import std.stdio, std.algorithm, std.string, std.range, std.conv;

 void main() {

 auto t = "table.txt"
 .File()
 .byLine()
 .map!(r => r.removechars("^01".dup))()
 .array();

 const transposed = t[0]
 .length
 .iota()

//hopefully I'm not alone in that this: iota(t[0].length) .map!(...) ... is so much more readable. -- Dmitry Olshansky
Jun 10 2012
prev sibling next sibling parent reply "ixid" <nuaccount gmail.com> writes:
Having to append .array() all the time is rather annoying. I 
can't help but feel that there's a better solution than this. Are 
lazy Result methods really the default way of doing things? I'd 
rather have eager versions.
Jun 10 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 11.06.2012 1:49, ixid wrote:
 Having to append .array() all the time is rather annoying. I can't help
 but feel that there's a better solution than this. Are lazy Result
 methods really the default way of doing things? I'd rather have eager
 versions.

And then try to get lazy version back. Now that would be harder then appending .array() wouldn't it? ;) -- Dmitry Olshansky
Jun 10 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Sunday, 10 June 2012 at 21:49:14 UTC, ixid wrote:
 Having to append .array() all the time is rather annoying. I 
 can't help but feel that there's a better solution than this. 
 Are lazy Result methods really the default way of doing things? 
 I'd rather have eager versions.

Problem with eager versions is that you end up doing multiple memory allocations and passes when combining range manipulators. e.g. r.take(n).filter!(blah).map!(whatever).array() Only allocates once, and is only one linear pass over the memory. If they were all eager, 'take' would create a new array, then 'filter' would pass over that and create a new array, then 'map' would pass over that and create yet another array. Not very efficient.
Jun 10 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, June 10, 2012 23:49:12 ixid wrote:
 Having to append .array() all the time is rather annoying. I
 can't help but feel that there's a better solution than this. Are
 lazy Result methods really the default way of doing things? I'd
 rather have eager versions.

Eager versions are only good if you don't want to pass the result to another function. And since Phobos (and plenty of user code) uses ranges all over the place, you'd end up allocating memory all over the place, whereas right now, you only allocate it when you explicitly call a function which allocates memory to hold the data - e.g. array. Something like foreach(e; map!"to!string(a)"(filter!"a <= 50"(arr, 42))) { doSomething(e); if(someCondition) break; } would end up allocating for both the result of filter and map, when it doesn't actually need to allocate for _either_ of them. Right now, it can process them lazily, only filtering and mapping for the elements that actually get processed. By using lazy ranges, you avoid both unnecessary allocations and avoid having to process all of the elements in a range if you don't need to. On top of all of that, if a function doesn't return a lazy range, you _can't_ use it with infinite ranges, so if range-based functions all eagerly operated on ranges, infinite ranges would be useless. - Jonathan M Davis
Jun 10 2012
prev sibling next sibling parent "ixid" <nuaccount gmail.com> writes:
Thank you. May I ask though, is the argument against 
automatically appending .array() when a single or chain of lazy 
functions are used to set a variable or set of variables just 
syntactic salt against accidentally doing it eagerly?
Jun 10 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, June 11, 2012 02:11:24 ixid wrote:
 Thank you. May I ask though, is the argument against
 automatically appending .array() when a single or chain of lazy
 functions are used to set a variable or set of variables just
 syntactic salt against accidentally doing it eagerly?

D doesn't do much of anything like that automatically, Aside from the fact tha foreach supports the ranges, _nothing_ in the compiler supports them. They're entirely a library artifact. So, the compiler isn't going to do _anything_ to them automatically. And besides, it's not necessarily the case that you want the result of a range-based function to be an array. What if I want to keep the result of map on the stack? auto a = map!"to!string(a)"(arr); I could pass it to multiple functions later without having to allocate anything. Forcing it to be an array would stop that. And it would be downright nasty for that to happen when dealing with containers, because a number of containers require that you pass them the _exact_ range type that you give them (e.g. remove does this). Converting those ranges to arrays just because you assigned them to a variable would be a _big_ problem. And as for infinite ranges again, if they automatically were converted to arrays when assigned to variables, then you couldn't have variables for them at all, because they _have_ to be eager. Take a random number generator for instance. If all ranges were automatically converted to arrays when assigned to variables, then you couldn't do auto generator = rndGen(); You'd be stuck in an infinite loop if you tried. Having to use std.array.array to explicitly convert a range to an array really doesn't cost you much, and trying to do it automatically would be incredibly error-prone and costly. And really, the more that you use rang-based functions, the less that you need to convert ranges to arrays. - Jonathan M Davis
Jun 10 2012
prev sibling next sibling parent "ixid" <nuaccount gmail.com> writes:
I must say that this is what I love about the D community- 
getting a very thorough answer to a rather basic and newbie 
question from someone who knows what they're talking about.
Jun 10 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, June 10, 2012 17:31:37 Jonathan M Davis wrote:
 On Monday, June 11, 2012 02:11:24 ixid wrote:
 Thank you. May I ask though, is the argument against
 automatically appending .array() when a single or chain of lazy
 functions are used to set a variable or set of variables just
 syntactic salt against accidentally doing it eagerly?


Ouch, I obviously need to do a better job of re-reading my posts.
 D doesn't do much of anything like that automatically, Aside from the fact
 tha foreach supports the ranges, _nothing_ in the compiler supports them.

"D doesn't do much of anything like that automatically_._ Aside from the fact _that_ foreach supports ranges, nothing in the compiler supports them."
 They're entirely a library artifact. So, the compiler isn't going to do
 _anything_ to them automatically. And besides, it's not necessarily the
 case that you want the result of a range-based function to be an array.
 What if I want to keep the result of map on the stack?
 
 auto a = map!"to!string(a)"(arr);
 
 I could pass it to multiple functions later without having to allocate
 anything. Forcing it to be an array would stop that. And it would be
 downright nasty for that to happen when dealing with containers, because a
 number of containers require that you pass them the _exact_ range type that
 you give them (e.g. remove does this).

"a number of containers require that you pass them the exact range type that _they_ give _you_..."
 Converting those ranges to arrays
 just because you assigned them to a variable would be a _big_ problem.
 
 And as for infinite ranges again, if they automatically were converted to
 arrays when assigned to variables, then you couldn't have variables for them
 at all, because they _have_ to be eager.

"because they have to be _lazy_." - Jonathan M Davis
Jun 10 2012
prev sibling next sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
On Mon, Jun 11, 2012 at 2:36 AM, ixid <nuaccount gmail.com> wrote:
 I must say that this is what I love about the D community- getting a very
 thorough answer to a rather basic and newbie question from someone who knows
 what they're talking about.

auto answers = questions.filter!(isBasic) .filter!(fromNewbie) .map!(jonathanMDavis) .array(); Hey, you can even get lazy answers :)
Jun 11 2012
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
ixid:

 Having to append .array() all the time is rather annoying. I 
 can't help but feel that there's a better solution than this. 
 Are lazy Result methods really the default way of doing things? 
 I'd rather have eager versions.

In most cases having a lazy range is the right default. On the other hand in D many times you can't use lazy ranges, you need arrays, so the ".array()" is quite common after a map or filter. So I propose to add to Phobos two small functions named amap/afilter that produce arrays: http://d.puremagic.com/issues/show_bug.cgi?id=5756 If you think those are redundant, then first of all take a look here: http://d.puremagic.com/issues/show_bug.cgi?id=8155 Bye, bearophile
Jun 11 2012