digitalmars.D.learn - Range of n lines from stdin

reply "Ivan Kazmenko" <gassa mail.ru> writes:
Quick question.

(1) I can do
n.iota.map!(_ => readln)
to get the next n lines from stdin.

(2) However, when I do
readln.repeat(n)
it looks clearer but works differently: preserves front and reads 
only one line.

(3) In the particular case of readln, we can substitute it with
stdin.byLine.take(n)
but the question remains for other impure functions.

So, what I ask for is some non-caching repeat for functions with 
side effects.  More idiomatic than (1).  Is there something like 
that in Phobos?  Is it an OK style to have an impure function in 
an UFCS chain?

If repeat could know whether its first argument is pure, it could 
then enable or disable front caching depending on purity... no 
way currently?

Ivan Kazmenko.
Dec 27 2013
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 27 Dec 2013 14:26:59 +0000
schrieb "Ivan Kazmenko" <gassa mail.ru>:

 Quick question.
 
 (1) I can do
 n.iota.map!(_ => readln)
 to get the next n lines from stdin.
 
 (2) However, when I do
 readln.repeat(n)
 it looks clearer but works differently: preserves front and reads 
 only one line.
 
 (3) In the particular case of readln, we can substitute it with
 stdin.byLine.take(n)
 but the question remains for other impure functions.
 
 So, what I ask for is some non-caching repeat for functions with 
 side effects.  More idiomatic than (1).  Is there something like 
 that in Phobos?  Is it an OK style to have an impure function in 
 an UFCS chain?
 
 If repeat could know whether its first argument is pure, it could 
 then enable or disable front caching depending on purity... no 
 way currently?
repeat() is only meant to repeat the same first element over and over. I think it would be wrong if it changed its value during iteration. A wrapper struct could be more idiomatic: FuncRange!readln.take(n)
 Ivan Kazmenko.
-- Marco
Dec 27 2013
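
A minimal sketch of what such a wrapper might look like. `FuncRange` is not part of Phobos; the name comes from the post above, and the caching behaviour, the `nextValue` helper and the always-false `empty` are assumptions made for illustration.

```d
import std.algorithm : equal;
import std.range : take;

// Hypothetical source with a side effect: returns 0, 1, 2, ...
int calls = 0;
int nextValue() { return calls++; }

// Hypothetical FuncRange: an infinite input range that calls `fun`
// at most once per element and caches the result until popFront.
struct FuncRange(alias fun)
{
    private typeof(fun()) cached;
    private bool primed = false;

    // An arbitrary function has no general "empty" condition.
    enum bool empty = false;

    @property auto front()
    {
        if (!primed)
        {
            cached = fun();
            primed = true;
        }
        return cached;
    }

    void popFront()
    {
        if (!primed)
            cached = fun(); // consume the element even if it was never read
        primed = false;
    }
}
```

With this shape, `FuncRange!nextValue().take(3)` yields three distinct values while `front` stays stable between calls to `popFront`. It still cannot detect end-of-file for something like `readln`.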
prev sibling next sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 12/27/2013 06:26 AM, Ivan Kazmenko wrote:

 n.iota.map!(_ => readln)
 to get the next n lines from stdin.
 So, what I ask for is some non-caching repeat for functions with side
 effects.  More idiomatic than (1).
This request comes up once in a while.
 Is there something like that in Phobos?
As far as I know, no. Although, bearophile may have a bug report to track the issue. :)
 Is it an OK style to have an impure function in an UFCS chain?
I don't think it is any different from side effects in other parts of the language. In other words, side effects are a part of D. :)
Ali
Dec 27 2013
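
For readers on a newer compiler: Phobos releases that postdate this thread provide `std.range.generate`, which builds an infinite range from repeated calls to a function. A short sketch, assuming such a release is available; `firstPowersOfTwo` is a hypothetical helper name.

```d
import std.array : array;
import std.range : generate, take;

// Hypothetical helper: collects the results of n successive calls to a
// function with a side effect (here, doubling a local variable).
int[] firstPowersOfTwo(size_t n)
{
    int i = 1;
    return generate!(() => i *= 2)().take(n).array;
}
```

As with any infinite range, the caller decides how many elements to take, so the "what is the empty condition?" question does not go away.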
prev sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Friday, 27 December 2013 at 14:27:01 UTC, Ivan Kazmenko wrote:
 Quick question.

 (1) I can do
 n.iota.map!(_ => readln)
 to get the next n lines from stdin.
This has several issues:

* The result claims to have all kinds of range capabilities that don't make sense at all. Attempting to actually use these capabilities, likely indirectly through range algorithms, can cause all kinds of havoc.

* It will allocate a new buffer for the read line every time `front` is called, which is less granular than `byLine`'s allocation behaviour.

* If `stdin` (or whatever file) only has `i` number of lines left in it where `i < n`, the range will erroneously report `n - i` number of empty lines at the end.

* It's not showing intent as clearly as it should be.
 (3) In the particular case of readln, we can substitute it with
 stdin.byLine.take(n)
 but the question remains for other impure functions.
`byLine` with `take` has very different characteristics from `map` + `readln`, as explained above.
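
A small sketch of the `byLine` plus `take` approach. The helper name `firstLines` is hypothetical. Because `byLine` reuses its buffer, each line is copied with `idup` before being stored.

```d
import std.algorithm : map;
import std.array : array;
import std.range : take;
import std.stdio : File;

// Hypothetical helper: returns at most n lines from `input`.
// If fewer than n lines remain, the result is simply shorter;
// no empty lines are invented at end-of-file.
string[] firstLines(File input, size_t n)
{
    // byLine reuses its internal buffer, so copy each line before keeping it.
    return input.byLine.take(n).map!(line => line.idup).array;
}
```

Calling `firstLines(stdin, n)` after importing `stdin` from `std.stdio` would read from standard input.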
 So, what I ask for is some non-caching repeat for functions 
 with side effects.  More idiomatic than (1).  Is there 
 something like that in Phobos?
It's hard to generalize. For one, what is the empty condition?
 Is it an OK style to have an impure function in an UFCS chain?
I assume by UFCS chain you mean range compositions in particular. It's not really about purity; impure links in the chain are fine (e.g. `byLine`). The issue is when the side effects are the only result - I think that is very bad style, and should either be rewritten in terms of return values, or rewritten to use an imperative style. Some people think otherwise, and it results in a lot of Phobos requests for a function that just eagerly evaluates a range and nothing more (sometimes called `eat`). I have yet to convince these people that they are wrong :)
 If repeat could know whether its first argument is pure, it 
 could then enable or disable front caching depending on 
 purity... no way currently?
`readln.repeat(n)` can also be written `repeat(readln(), n)`. Maybe that makes it more obvious what it does - reads one line from standard input and passes that to `repeat`, which returns a range that returns that same line `n` times.
Dec 27 2013
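
The point about `repeat` can be checked with a counter instead of `readln`. This is a sketch; `nextValue` and `repeatCallsOnce` are hypothetical names introduced for the example.

```d
import std.algorithm : equal;
import std.range : repeat;

// Hypothetical function with a side effect: returns 0, 1, 2, ...
int calls = 0;
int nextValue() { return calls++; }

bool repeatCallsOnce()
{
    calls = 0;
    // Identical to repeat(nextValue(), 3): the argument is evaluated once,
    // before repeat is even entered.
    auto r = nextValue().repeat(3);
    return equal(r, [0, 0, 0]) && calls == 1;
}
```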
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Friday, 27 December 2013 at 18:32:29 UTC, Jakob Ovrum wrote:
 (1) I can do
 n.iota.map!(_ => readln)
 to get the next n lines from stdin.
This has several issues: * The result claims to have all kinds of range capabilities that don't make sense at all. Attempting to actually use these capabilities, likely indirectly through range algorithms, can cause all kinds of havoc.
Hmm?.. From my experience, attempting to use a range in a wrong way usually results in a compilation error. For example, I can't do
n.iota.map!(_ => readln).sort()
since MapResult isn't a random access range with swappable elements. I can instead do
n.iota.map!(_ => readln).array().sort()
and it allocates an array and works as expected. So, how do I misuse that range?
  * It will allocate a new buffer for the read line every time 
 `front` is called, which is less granular than `byLine`'s 
 allocation behaviour.

  * If `stdin` (or whatever file) only has `i` number of lines 
 left in it where `i < n`, the range will erroneously report `n 
 - i` number of empty lines at the end.

  * It's not showing intent as clear as it should be.
Thank you for pointing these out! So it's not performant, not correct and not idiomatic. I understood only a part of that, but already asked for a better alternative. Well, that's more arguments to the same point. And yeah, stdin.byLine serves rather well in this particular case.
 So, what I ask for is some non-caching repeat for functions 
 with side effects.  More idiomatic than (1).  Is there 
 something like that in Phobos?
It's hard to generalize. For one, what is the empty condition?
Hmm. For example, that could be a RNG emitting (a range of) random numbers, then "empty" is always false. But we still want a new random number each time. Something like n.iota.map!(_ => uniform(0, 10))
 Is it an OK style to have an impure function in an UFCS chain?
I assume by UFCS chain you mean range compositions in particular. It's not really about purity; impure links in the chain are fine (e.g. `byLine`). The issue is when the side effects are the only result - I think that is very bad style, and should either be rewritten in terms of return values, or rewritten to use an imperative style.
So, something like
n.iota.map !(_ => readln).writeln;
is bad style but
writeln (n.iota.map !(_ => readln));
better shows what's the main action? Makes sense for me.
 If repeat could know whether its first argument is pure, it 
 could then enable or disable front caching depending on 
 purity... no way currently?
`readln.repeat(n)` can also be written `repeat(readln(), n)`. Maybe that makes it more obvious what it does - reads one line from standard input and passes that to `repeat`, which returns a range that returns that same line `n` times.
The confusion for me is this: does "repeat" mean "eagerly get a value once and then lazily repeat it n times" or "do what the first argument suggests (emit constant, call function, etc.) n times"? I guess it depends on the defaults of the language. Currently, I had no strong preference for one definition over the other when I saw the name. Maybe I would indeed prefer the first definition if I knew D better, I don't know.

In the first definition, the "eagerly vs. lazily" contradiction in my mind is what scares me off from making it the default: if "repeat" is a lazy range by itself, why would it treat its argument eagerly? What if the argument is a lazy range itself, having a new value each time repeat asks for it? The first definition makes much more sense for me when I treat it this way: "repeat expects its first argument to be pure (not able to change between calls)".

Perhaps there's a wholly different way of thinking about this in which the first definition makes much more sense than the second one from the start. If so, please share it.

Ivan Kazmenko.
Dec 27 2013
next sibling parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Friday, 27 December 2013 at 20:30:52 UTC, Ivan Kazmenko wrote:
 On Friday, 27 December 2013 at 18:32:29 UTC, Jakob Ovrum wrote:
 If repeat could know whether its first argument is pure, it 
 could then enable or disable front caching depending on 
 purity... no way currently?
`readln.repeat(n)` can also be written `repeat(readln(), n)`. Maybe that makes it more obvious what it does - reads one line from standard input and passes that to `repeat`, which returns a range that returns that same line `n` times.
The confusion for me is this: does "repeat" mean "eagerly get a value once and then lazily repeat it n times" or "do what the first argument suggests (emit constant, call function, etc.) n times"? I guess it depends on the defaults of the language. Currently, I had no strong preference for one definition over the other when I saw the name. Maybe I would indeed prefer the first definition if I knew D better, I don't know. In the first definition, the "eagerly vs. lazily" contradiction in my mind is what scares me off from making it the default: if "repeat" is a lazy range by itself, why would it treat its argument eagerly? What if the argument is a lazy range itself, having a new value each time repeat asks for it? The first definition makes much more sense for me when I treat it this way: "repeat expects its first argument to be pure (not able to change between calls)". Perhaps there's a wholly different way of thinking about this in which the first definition makes much more sense than the second one from the start. If so, please share it.
Maybe the imperative should be "repeat is a function, and arguments of functions should be evaluated only once"? It does make sense from a language point of view, but somewhat breaks the abstraction for me.
Dec 27 2013
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 27 Dec 2013 20:34:02 +0000
schrieb "Ivan Kazmenko" <gassa mail.ru>:

 Maybe the imperative should be "repeat is a function, and 
 arguments of functions should be evaluated only once"?  It does 
 make sense from a language point of view, but somewhat breaks the 
 abstraction for me.
The documentation is clear about it: "Repeats one value forever." It has nothing to do with purity, whether the input range is lazy or the element is fetched eagerly. If it was meant to do what you expected it would read: "Constructs a range from lazily evaluating the expression passed to it over and over." This is not a limitation of the language either I think, since arguments to functions can be declared "lazy".
-- Marco
Dec 27 2013
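
To illustrate the `lazy` storage class mentioned above, here is a sketch of a function that re-evaluates its argument expression on every use. `evalTimes` and `nextValue` are hypothetical names; this is not how `repeat` is declared.

```d
// Hypothetical function with a side effect: returns 0, 1, 2, ...
int calls = 0;
int nextValue() { return calls++; }

// Hypothetical helper: evaluates the argument expression n times.
T[] evalTimes(T)(lazy T value, size_t n)
{
    T[] result;
    foreach (_; 0 .. n)
        result ~= value; // each use of a lazy parameter re-runs the expression
    return result;
}
```

Here `evalTimes(nextValue(), 3)` calls `nextValue` three times, and the `lazy` keyword in the signature is what tells the reader so.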
prev sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Friday, 27 December 2013 at 20:30:52 UTC, Ivan Kazmenko wrote:
 Hmm?..  From my experience, attempting to use a range in a 
 wrong way usually results in a compilation error.  For example, 
 I can't do
 n.iota.map!(_ => readln).sort()
 since MapResult isn't a random access range with swappable 
 elements. I can instead do
 n.iota.map!(_ => readln).array().sort()
 and it allocates an array and works as expected.  So, how do I 
 misuse that range?
Yes, the idea is that ranges only present interfaces that make sense, so cases of misuse will result in a compilation error. However, hacks like using map with functions that ignore their argument(s) throw that out of the window: `r` in `auto r = n.iota.map!(_ => readln);` claims to support forward, bidirectional and random access (all read-only, as the argument function returns by value) as well as slicing, but none of these make any sense; all access primitives do exactly the same thing, with the result being different every time. Even the simplest invariants fail, such as `r.front == r.front`, and `popFront`, `popBack` and slicing only have a binary effect, whether or not the range is empty yet.
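
The broken invariants described here can be reproduced with a counter standing in for `readln`. A sketch; `nextValue` and `sideEffectRange` are hypothetical names.

```d
import std.algorithm : map;
import std.range : iota, isRandomAccessRange;

// Hypothetical function with a side effect: returns 0, 1, 2, ...
int calls = 0;
int nextValue() { return calls++; }

// The mapping function ignores its argument, as in the readln example.
auto sideEffectRange(int n)
{
    return iota(n).map!(_ => nextValue());
}

// The result advertises random access even though indexing is meaningless.
static assert(isRandomAccessRange!(typeof(sideEffectRange(3))));
```

Every read of `front` or `r[i]` calls `nextValue` again, so `r.front == r.front` is false for this range.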
 Hmm.  For example, that could be a RNG emitting (a range of) 
 random numbers, then "empty" is always false.  But we still 
 want a new random number each time.  Something like
 n.iota.map!(_ => uniform(0, 10))
That would only provide `n` random numbers, not an infinite number. All the random number generator types in `std.random` are infinite forward ranges of random numbers, which is completely fine. For any PRNG `r`, `r.front == r.front` is true, and remains the same number until `r.popFront()`, it correctly has no length and is always non-empty (infinite range), and `r.save` works correctly etc.
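
A sketch of using a `std.random` generator directly as a range, with `Mt19937` chosen as an example engine. The helper names `firstRaw` and `frontIsStable` are hypothetical.

```d
import std.array : array;
import std.random : Mt19937;
import std.range : take;

// Hypothetical helper: the first n raw outputs of a seeded generator.
uint[] firstRaw(uint seed, size_t n)
{
    auto rng = Mt19937(seed);
    return rng.take(n).array;
}

// Hypothetical helper: checks the invariants discussed above.
bool frontIsStable(uint seed)
{
    auto rng = Mt19937(seed);
    immutable first = rng.front;
    immutable again = rng.front; // reading front does not advance the generator
    auto saved = rng.save;       // independent copy at the same position
    rng.popFront();              // only popFront moves to the next number
    return first == again && saved.front == first;
}
```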
 So, something like
 n.iota.map !(_ => readln).writeln;
 is bad style but
 writeln (n.iota.map !(_ => readln));
 better shows what's the main action?  Makes sense for me.
No, it has nothing to do with syntax. The two examples are completely equivalent, and the only problem is that it breaks the invariant that the result of map's transformation function should be derived from the arguments it was given. The fact that the transformation function is impure is not in itself a problem: pure functions can also ignore arguments, and impure functions can return consistent results while still being necessarily impure.
 Perhaps there's a wholly different way of thinking about this 
 in which the first definition makes much more sense than the 
 second one from the start.  If so, please share it.
All you have to do is look at the signature of the function, which is the primary part of its documentation:

Repeat!T repeat(T)(T value);

It takes one value of any type T, not a function pointer or delegate that returns T. Even if you give it a function pointer or delegate (which your example does not), it will simply repeat that function pointer or delegate, never calling it.

As I already explained, `readln.repeat(n)` is just a different way of writing `readln().repeat(n)` which in turn is also equivalent to `repeat(readln(), n)`. This should make it perfectly clear what it does - `readln` is called and its return value is passed to `repeat`. Barring one relatively obscure exception[1], this is the only way to interpret the expression regardless of the signature of the function, as a consequence of basic language rules common to the entire C family of programming languages.

[1] ... in D we have something (slightly controversial) called the `lazy` parameter storage class, but when used, it is clearly visible in the signature of the function. http://dlang.org/function.html#parameters
Dec 28 2013
parent "Ivan Kazmenko" <gassa mail.ru> writes:
Many thanks to Marco, Ali and Jakob for the answers!

On Saturday, 28 December 2013 at 08:56:53 UTC, Jakob Ovrum wrote:
 On Friday, 27 December 2013 at 20:30:52 UTC, Ivan Kazmenko 
 wrote:
 Hmm?..  From my experience, attempting to use a range in a 
 wrong way usually results in a compilation error.  For 
 example, I can't do
 n.iota.map!(_ => readln).sort()
 since MapResult isn't a random access range with swappable 
 elements. I can instead do
 n.iota.map!(_ => readln).array().sort()
 and it allocates an array and works as expected.  So, how do I 
 misuse that range?
Yes, the idea is that ranges only present interfaces that make sense, so cases of misuse will result in a compilation error. However, hacks like using map with functions that ignore their argument(s) throw that out of the window: `r` in `auto r = n.iota.map!(_ => readln);` claims to support forward, bidirectional and random access (all read-only, as the argument function returns by value) as well as slicing, but none of these make any sense; all access primitives do exactly the same thing, with the result being different every time. Even the simplest invariants fail, such as `r.front == r.front`, and `popFront`, `popBack` and slicing only have a binary effect, whether or not the range is empty yet.
OK, I'm now beginning to understand how hacky that is.
 All the random number generator types in `std.random` are 
 infinite forward ranges of random numbers, which is completely 
 fine. For any PRNG `r`, `r.front == r.front` is true, and 
 remains the same number until `r.popFront()`, it correctly has 
 no length and is always non-empty (infinite range), and 
 `r.save` works correctly etc.
So, for both of my examples, support for the desired behavior is provided from the other side: not a non-caching repeat for a given function, but a range of lines or random numbers with the desired properties instead of such a function. Maybe that's usually the right thing to do in the general case, too...
 Perhaps there's a wholly different way of thinking about this 
 in which the first definition makes much more sense than the 
 second one from the start.  If so, please share it.
All you have to do is look at the signature of the function, which is the primary part of its documentation: Repeat!T repeat(T)(T value); It takes one value of any type T, not a function pointer or delegate that returns T. Even if you give it a function pointer or delegate (which your example does not), it will simply repeat that function pointer or delegate, never calling it.
So what I initially wanted is possible with something like (I just checked):
(&readln!(string)) . repeat(n) . map!(f => f('\n'))
However, I'm having a hard time trying to get rid of "!(string)" and "'\n'" to make something like the following work:
(&readln) . repeat(n) . map!(f => f())
Anyway, even the second line (which does not compile) looks a bit cryptic. And it still has the problem of silently adding empty lines after end-of-file was reached.
 [1] ... in D we have something (slightly controversial) called 
 the `lazy` parameter storage class, but when used, it is 
 clearly visible in the signature of the function. 
 http://dlang.org/function.html#parameters
Thank you for the link. This is indeed what my other expectation for repeat was.
Ivan Kazmenko.
Dec 28 2013
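
The function-pointer variant from the last post can be tried with a plain, non-template function, which avoids the "!(string)" and "'\n'" workaround. A sketch; `nextValue` and `pointerRepeatCallsEachTime` are hypothetical names.

```d
import std.algorithm : equal, map;
import std.range : repeat;

// Hypothetical function with a side effect: returns 0, 1, 2, ...
int calls = 0;
int nextValue() { return calls++; }

bool pointerRepeatCallsEachTime()
{
    calls = 0;
    // repeat stores the same function pointer three times;
    // map is what actually calls it, once per read of front.
    auto r = (&nextValue).repeat(3).map!(f => f());
    return equal(r, [0, 1, 2]) && calls == 3;
}
```

This shares the problem discussed earlier in the thread: the mapped range still calls the function on every read of `front`, so `r.front == r.front` does not hold.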