digitalmars.D.learn - Range of n lines from stdin

Ivan Kazmenko (19/19) Dec 27 2013 Quick question.

Marco Leise (8/32) Dec 27 2013 repeat() is only meant to repeat the same first element over
Ali Çehreli (7/13) Dec 27 2013 As far as I know, no. Although, bearophile may have a bug report to
Jakob Ovrum (30/44) Dec 27 2013 This has several issues:

Ivan Kazmenko (43/77) Dec 27 2013 Hmm?.. From my experience, attempting to use a range in a wrong

Ivan Kazmenko (5/32) Dec 27 2013 Maybe the imperative should be "repeat is a function, and

Marco Leise (13/17) Dec 27 2013 The documentation is clear about it:

Jakob Ovrum (49/70) Dec 28 2013 Yes, the idea is that ranges only present interfaces that make

Ivan Kazmenko (20/63) Dec 28 2013 OK, I'm now beginning to understand how hacky is that.

"Ivan Kazmenko" <gassa mail.ru> writes:

Quick question.

(1) I can do
n.iota.map!(_ => readln)
to get the next n lines from stdin.

(2) However, when I do
readln.repeat(n)
it looks clearer but works differently: preserves front and reads 
only one line.

(3) In the particular case of readln, we can substitute it with
stdin.byLine.take(n)
but the question remains for other impure functions.

So, what I ask for is some non-caching repeat for functions with 
side effects.  More idiomatic than (1).  Is there something like 
that in Phobos?  Is it an OK style to have an impure function in 
an UFCS chain?

If repeat could know whether its first argument is pure, it could 
then enable or disable front caching depending on purity... no 
way currently?

Ivan Kazmenko.

Dec 27 2013

Marco Leise <Marco.Leise gmx.de> writes:

On Fri, 27 Dec 2013 14:26:59 +0000,
"Ivan Kazmenko" <gassa mail.ru> wrote:

 Quick question.
 
 (1) I can do
 n.iota.map!(_ => readln)
 to get the next n lines from stdin.
 
 (2) However, when I do
 readln.repeat(n)
 it looks clearer but works differently: preserves front and reads 
 only one line.
 
 (3) In the particular case of readln, we can substitute it with
 stdin.byLine.take(n)
 but the question remains for other impure functions.
 
 So, what I ask for is some non-caching repeat for functions with 
 side effects.  More idiomatic than (1).  Is there something like 
 that in Phobos?  Is it an OK style to have an impure function in 
 an UFCS chain?
 
 If repeat could know whether its first argument is pure, it could 
 then enable or disable front caching depending on purity... no 
 way currently?

repeat() is only meant to repeat the same first element over
and over. I think it would be wrong if it changed its value
during iteration. A wrapper struct could be more idiomatic:

  FuncRange!readln().take(n)
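A minimal sketch of the wrapper Marco describes, assuming a struct named `FuncRange` as in his post (the name is his illustration, not a Phobos symbol; `counter` and `next` are hypothetical stand-ins for `readln` so the behavior is easy to check). It caches the result of the current call, so `front == front` holds, and reports itself as infinite, leaving the stopping point to `take(n)`. Because it is a struct, it has to be instantiated as a value, `FuncRange!readln()`, before `.take(n)` can be applied.

```d
import std.array : array;
import std.range : isInfinite, isInputRange, take;

// Hypothetical FuncRange (name taken from Marco's post, not from Phobos).
// Each element is the result of one fresh call to `fun`.
struct FuncRange(alias fun)
{
    private typeof(fun()) current; // cached so that front == front holds
    private bool fetched;          // has `current` been filled yet?

    // Infinite: never empty, so the caller bounds it, e.g. with take(n).
    enum bool empty = false;

    @property auto front()
    {
        if (!fetched)
        {
            current = fun();
            fetched = true;
        }
        return current;
    }

    void popFront()
    {
        if (!fetched)
            fun(); // skipping an unseen element still consumes one call
        fetched = false;
    }
}

int counter;
int next() { return ++counter; } // stand-in for readln: new value per call

static assert(isInputRange!(FuncRange!next));
static assert(isInfinite!(FuncRange!next));

void main()
{
    import std.stdio : writeln;

    counter = 0;
    writeln(FuncRange!next().take(3)); // [1, 2, 3]
    // FuncRange!readln().take(n) should likewise call readln n times
    // (still yielding empty strings once input runs out).
}
```

Later Phobos releases added `std.range.generate`, which builds an infinite range from successive calls to a function along much the same lines.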

 Ivan Kazmenko.

-- 
Marco

Dec 27 2013

Ali Çehreli <acehreli yahoo.com> writes:

On 12/27/2013 06:26 AM, Ivan Kazmenko wrote:

 n.iota.map!(_ => readln)
 to get the next n lines from stdin.

 So, what I ask for is some non-caching repeat for functions with side
 effects.  More idiomatic than (1).

This request comes up once in a while.

 Is there something like that in Phobos?

As far as I know, no. Although, bearophile may have a bug report to 
track the issue. :)

 Is it an OK style to have an impure function in an UFCS chain?

I don't see it as any different from side effects in other parts of the 
language. In other words, side effects are a part of D. :)

Ali

Dec 27 2013

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Friday, 27 December 2013 at 14:27:01 UTC, Ivan Kazmenko wrote:
 Quick question.

 (1) I can do
 n.iota.map!(_ => readln)
 to get the next n lines from stdin.

This has several issues:

  * The result claims to have all kinds of range capabilities that 
don't make sense at all. Attempting to actually use these 
capabilities, likely indirectly through range algorithms, can 
cause all kinds of havoc.

  * It will allocate a new buffer for the read line every time 
`front` is called, which is less granular than `byLine`'s 
allocation behaviour.

  * If `stdin` (or whatever file) only has `i` number of lines 
left in it where `i < n`, the range will erroneously report `n - 
i` number of empty lines at the end.

  * It doesn't show intent as clearly as it should.

 (3) In the particular case of readln, we can substitute it with
 stdin.byLine.take(n)
 but the question remains for other impure functions.

`byLine` with `take` has very different characteristics from 
`map` + `readln`, as explained above.
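To make Jakob's third point concrete, here is a small comparison (the helper names `firstLinesByLine` and `firstLinesMapReadln` and the scratch file names are hypothetical; a temporary file stands in for stdin so the run is reproducible). The `byLine` version stops when the input runs out, while the `map` + `readln` version always yields exactly `n` elements. Because `byLine` reuses its buffer, each line is copied with `idup` before being stored.

```d
import std.algorithm : map;
import std.array : array;
import std.range : iota, take;
import std.stdio : File;

// Read at most `n` lines via byLine: stops cleanly at end of file.
// byLine reuses its buffer, so each line is copied with idup.
string[] firstLinesByLine(File f, size_t n)
{
    return f.byLine.take(n).map!(line => line.idup).array;
}

// The map + readln variant from the thread, for comparison: it always
// yields exactly `n` elements, padding with empty strings past EOF.
string[] firstLinesMapReadln(File f, size_t n)
{
    return n.iota.map!(_ => f.readln()).array;
}

void main()
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;
    import std.stdio : writeln;

    // A two-line scratch file stands in for stdin (hypothetical file name).
    auto path = buildPath(tempDir, "range_lines_demo.txt");
    write(path, "a\nb\n");
    scope (exit) remove(path);

    writeln(firstLinesByLine(File(path), 4));    // ["a", "b"]
    writeln(firstLinesMapReadln(File(path), 4)); // "a\n", "b\n", then two empty strings
}
```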

 So, what I ask for is some non-caching repeat for functions 
 with side effects.  More idiomatic than (1).  Is there 
 something like that in Phobos?

It's hard to generalize. For one, what is the empty condition?

 Is it an OK style to have an impure function in an UFCS chain?

I assume by UFCS chain you mean range compositions in particular.

It's not really about purity; impure links in the chain are fine 
(e.g. `byLine`). The issue is when the side effects are the only 
result - I think that is very bad style, and should either be 
rewritten in terms of return values, or rewritten to use an 
imperative style.

Some people think otherwise, and it results in a lot of Phobos 
requests for a function that just eagerly evaluates a range and 
nothing more (sometimes called `eat`). I have yet to convince 
these people that they are wrong :)

 If repeat could know whether its first argument is pure, it 
 could then enable or disable front caching depending on 
 purity... no way currently?

`readln.repeat(n)` can also be written `repeat(readln(), n)`. 
Maybe that makes it more obvious what it does - reads one line 
from standard input and passes that to `repeat`, which returns a 
range that returns that same line `n` times.
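Jakob's reading of `readln.repeat(n)` can be checked with a counting function in place of `readln` (`calls` and `next` are hypothetical names used only for this illustration): the argument expression runs once, before `repeat` is entered, and `repeat` hands out that single value `n` times.

```d
import std.array : array;
import std.range : repeat;

int calls;
int next() { return ++calls; } // stand-in for readln: new value per call

void main()
{
    import std.stdio : writeln;

    calls = 0;
    // next() runs exactly once; repeat only ever sees the resulting int.
    auto r = next().repeat(3);
    writeln(r);     // [1, 1, 1]
    writeln(calls); // 1
}
```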

Dec 27 2013

"Ivan Kazmenko" <gassa mail.ru> writes:

On Friday, 27 December 2013 at 18:32:29 UTC, Jakob Ovrum wrote:
 (1) I can do
 n.iota.map!(_ => readln)
 to get the next n lines from stdin.

 This has several issues:

  * The result claims to have all kinds of range capabilities 
 that don't make sense at all. Attempting to actually use these 
 capabilities, likely indirectly through range algorithms, can 
 cause all kinds of havoc.

Hmm?..  From my experience, attempting to use a range in a wrong 
way usually results in a compilation error.  For example, I can't 
do
n.iota.map!(_ => readln).sort()
since MapResult isn't a random access range with swappable 
elements. I can instead do
n.iota.map!(_ => readln).array().sort()
and it allocates an array and works as expected.  So, how do I 
misuse that range?

  * It will allocate a new buffer for the read line every time 
 `front` is called, which is less granular than `byLine`'s 
 allocation behaviour.

  * If `stdin` (or whatever file) only has `i` number of lines 
 left in it where `i < n`, the range will erroneously report `n 
 - i` number of empty lines at the end.

  * It doesn't show intent as clearly as it should.

Thank you for pointing these out!  So it's not performant, not 
correct and not idiomatic.  I understood only a part of that, but 
already asked for a better alternative.  Well, that's more 
arguments to the same point.  And yeah, stdin.byLine serves 
rather well in this particular case.

 So, what I ask for is some non-caching repeat for functions 
 with side effects.  More idiomatic than (1).  Is there 
 something like that in Phobos?

 It's hard to generalize. For one, what is the empty condition?

Hmm.  For example, that could be a RNG emitting (a range of) 
random numbers, then "empty" is always false.  But we still want 
a new random number each time.  Something like
n.iota.map!(_ => uniform(0, 10))

 Is it an OK style to have an impure function in an UFCS chain?

 I assume by UFCS chain you mean range compositions in 
 particular.

 It's not really about purity; impure links in the chain are 
 fine (e.g. `byLine`). The issue is when the side effects are 
 the only result - I think that is very bad style, and should 
 either be rewritten in terms of return values, or rewritten to 
 use an imperative style.

So, something like
n.iota.map !(_ => readln).writeln;
is bad style but
writeln (n.iota.map !(_ => readln));
better shows what's the main action?  Makes sense for me.

 If repeat could know whether its first argument is pure, it 
 could then enable or disable front caching depending on 
 purity... no way currently?

 `readln.repeat(n)` can also be written `repeat(readln(), n)`. 
 Maybe that makes it more obvious what it does - reads one line 
 from standard input and passes that to `repeat`, which returns 
 a range that returns that same line `n` times.

The confusion for me is this: does "repeat" mean "eagerly get a 
value once and then lazily repeat it n times" or "do what the 
first argument suggests (emit constant, call function, etc.) n 
times"?  I guess it depends on the defaults of the language.  
Currently, I had no strong preference for one definition over the 
other when I saw the name.  Maybe I would indeed prefer the first 
definition if I knew D better, I don't know.

In the first definition, the "eagerly vs. lazily" contradiction 
in my mind is what scares me off from making it the default: if 
"repeat" is a lazy range by itself, why would it treat its 
argument eagerly?  What if the argument is a lazy range itself, 
having a new value each time repeat asks for it?

The first definition makes much more sense for me when I treat it 
this way: "repeat expects its first argument to be pure (not able 
to change between calls)".

Perhaps there's a wholly different way of thinking about this in 
which the first definition makes much more sense than the second 
one from the start.  If so, please share it.

Ivan Kazmenko.

Dec 27 2013

"Ivan Kazmenko" <gassa mail.ru> writes:

On Friday, 27 December 2013 at 20:30:52 UTC, Ivan Kazmenko wrote:
 On Friday, 27 December 2013 at 18:32:29 UTC, Jakob Ovrum wrote:
 If repeat could know whether its first argument is pure, it 
 could then enable or disable front caching depending on 
 purity... no way currently?

 `readln.repeat(n)` can also be written `repeat(readln(), n)`. 
 Maybe that makes it more obvious what it does - reads one line 
 from standard input and passes that to `repeat`, which returns 
 a range that returns that same line `n` times.

 The confusion for me is this: does "repeat" mean "eagerly get a 
 value once and then lazily repeat it n times" or "do what the 
 first argument suggests (emit constant, call function, etc.) n 
 times"?  I guess it depends on the defaults of the language.  
 Currently, I had no strong preference for one definition over 
 the other when I saw the name.  Maybe I would indeed prefer the 
 first definition if I knew D better, I don't know.

 In the first definition, the "eagerly vs. lazily" contradiction 
 in my mind is what scares me off from making it the default: if 
 "repeat" is a lazy range by itself, why would it treat its 
 argument eagerly?  What if the argument is a lazy range itself, 
 having a new value each time repeat asks for it?

 The first definition makes much more sense for me when I treat 
 it this way: "repeat expects its first argument to be pure (not 
 able to change between calls)".

 Perhaps there's a wholly different way of thinking about this 
 in which the first definition makes much more sense than the 
 second one from the start.  If so, please share it.

Maybe the imperative should be "repeat is a function, and 
arguments of functions should be evaluated only once"?  It does 
make sense from a language point of view, but somewhat breaks the 
abstraction for me.

Dec 27 2013

Marco Leise <Marco.Leise gmx.de> writes:

Am Fri, 27 Dec 2013 20:34:02 +0000
schrieb "Ivan Kazmenko" <gassa mail.ru>:

 Maybe the imperative should be "repeat is a function, and 
 arguments of functions should be evaluated only once"?  It does 
 make sense from a language point of view, but somewhat breaks the 
 abstraction for me.

The documentation is clear about it:

"Repeats one value forever."

It has nothing to do with purity, whether the input range is
lazy or the element is fetched eagerly. If it was meant to do
what you expected it would read:

"Constructs a range from lazily evaluating the expression
passed to it over and over."

This is not a limitation of the language either I think, since
arguments to functions can be declared "lazy".
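For contrast, a `lazy` parameter does re-evaluate the caller's expression on every use. The helper below, `collect`, is hypothetical and not part of Phobos; it is kept eager (it returns an array) to keep the sketch simple, and `calls`/`next` again stand in for an impure function like `readln`.

```d
import std.stdio : writeln;

int calls;
int next() { return ++calls; } // impure stand-in for readln

// Hypothetical helper, not part of Phobos: gathers `n` evaluations of a
// `lazy` argument. Every read of `value` re-runs the caller's expression.
T[] collect(T)(lazy T value, size_t n)
{
    auto result = new T[n];
    foreach (ref slot; result)
        slot = value; // evaluates next() again on each iteration
    return result;
}

void main()
{
    calls = 0;
    writeln(collect(next(), 3)); // [1, 2, 3]
    writeln(calls);              // 3
}
```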

-- 
Marco

Dec 27 2013

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Friday, 27 December 2013 at 20:30:52 UTC, Ivan Kazmenko wrote:
 Hmm?..  From my experience, attempting to use a range in a 
 wrong way usually results in a compilation error.  For example, 
 I can't do
 n.iota.map!(_ => readln).sort()
 since MapResult isn't a random access range with swappable 
 elements. I can instead do
 n.iota.map!(_ => readln).array().sort()
 and it allocates an array and works as expected.  So, how do I 
 misuse that range?

Yes, the idea is that ranges only present interfaces that make 
sense, so cases of misuse will result in a compilation error. 
However, hacks like using map with functions that ignore their 
argument(s) throw that out of the window: `r` in `auto r = 
n.iota.map!(_ => readln);` claims to support forward, 
bidirectional and random access (all read-only, as the argument 
function returns by value) as well as slicing, but none of these 
make any sense; all access primitives do exactly the same thing, 
with the result being different every time. Even the simplest 
invariants fail, such as `r.front == r.front`, and `popFront`, 
`popBack` and slicing only have a binary effect, whether or not 
the range is empty yet.
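Jakob's point about `map` with an argument-ignoring function can be reproduced with a counting function (`calls` and `next` are hypothetical stand-ins for `readln`). The type passes `isRandomAccessRange` because `iota` does, yet every access primitive simply makes another call, so `r.front == r.front` does not hold.

```d
import std.algorithm : map;
import std.range : iota, isRandomAccessRange;

int calls;
int next() { return ++calls; } // impure stand-in for readln

void main()
{
    import std.stdio : writeln;

    calls = 0;
    auto r = iota(3).map!(_ => next());

    // The type advertises what iota can do, not what next() can do.
    static assert(isRandomAccessRange!(typeof(r)));

    auto a = r.front; // 1
    auto b = r.front; // 2: front changed with no popFront in between
    auto c = r.back;  // 3: "back" is just one more call
    auto d = r[0];    // 4: so is indexing
    writeln(a, " ", b, " ", c, " ", d); // 1 2 3 4
}
```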

 Hmm.  For example, that could be a RNG emitting (a range of) 
 random numbers, then "empty" is always false.  But we still 
 want a new random number each time.  Something like
 n.iota.map!(_ => uniform(0, 10))

That would only provide `n` random numbers, not an infinite 
number.

All the random number generator types in `std.random` are 
infinite forward ranges of random numbers, which is completely 
fine. For any PRNG `r`, `r.front == r.front` is true, and remains 
the same number until `r.popFront()`, it correctly has no length 
and is always non-empty (infinite range), and `r.save` works 
correctly etc.
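The properties Jakob lists for `std.random` generators can be checked directly. This sketch uses `Mt19937` with a fixed seed purely for reproducibility; mapping with `% 10` to get digits is only an illustration and carries a slight modulo bias.

```d
import std.algorithm : map;
import std.array : array;
import std.random : Mt19937;
import std.range : isForwardRange, isInfinite, take;

// Checked at compile time: a forward range that is infinite (no length,
// never empty).
static assert(isForwardRange!Mt19937);
static assert(isInfinite!Mt19937);

void main()
{
    import std.stdio : writeln;

    auto gen = Mt19937(42);         // fixed seed, only for reproducibility
    assert(gen.front == gen.front); // front is stable until popFront()

    auto copy = gen.save;           // independent copy of the state
    // Ten digits in [0, 10); `% 10` has a tiny modulo bias, fine for a demo.
    auto digits = gen.map!(x => x % 10).take(10).array;
    auto again  = copy.map!(x => x % 10).take(10).array;
    assert(digits == again);        // same state, same sequence
    writeln(digits);
}
```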

 So, something like
 n.iota.map !(_ => readln).writeln;
 is bad style but
 writeln (n.iota.map !(_ => readln));
 better shows what's the main action?  Makes sense for me.

No, it has nothing to do with syntax. The two examples are 
completely equivalent, and the only problem is that it breaks the 
invariant that the result of map's transformation function should 
be derived from the arguments it was given. The fact that the 
transformation function is impure is not in itself a problem: 
pure functions can also ignore arguments, and impure functions 
can return consistent results while still being necessarily 
impure.

 Perhaps there's a wholly different way of thinking about this 
 in which the first definition makes much more sense than the 
 second one from the start.  If so, please share it.

All you have to do is look at the signature of the function, 
which is the primary part of its documentation:

Repeat!T repeat(T)(T value);

It takes one value of any type T, not a function pointer or 
delegate that returns T. Even if you give it a function pointer 
or delegate (which your example does not), it will simply repeat 
that function pointer or delegate, never calling it.
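Jakob's remark that `repeat` never calls a function pointer it is given can be seen with a counting function (`calls` and `next` are hypothetical): `repeat` copies the pointer, and a call only happens where the code explicitly asks for one, such as inside `map`.

```d
import std.algorithm : map;
import std.array : array;
import std.range : repeat;

int calls;
int next() { return ++calls; } // impure stand-in for readln

void main()
{
    import std.stdio : writeln;

    calls = 0;
    auto fp = &next;                // int function()
    auto fps = repeat(fp, 3).array; // three copies of the same pointer
    writeln(fps.length, " pointers, ", calls, " calls"); // 3 pointers, 0 calls

    // Calls happen only where we ask for them, here inside map:
    writeln(repeat(fp, 3).map!(f => f()).array); // [1, 2, 3]
}
```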

As I already explained, `readln.repeat(n)` is just a different 
way of writing `readln().repeat(n)` which in turn is also 
equivalent to `repeat(readln(), n)`. This should make it 
perfectly clear what it does - `readln` is called and its return 
value is passed to `repeat`. Barring one relatively obscure 
exception[1], this is the only way to interpret the expression 
regardless of the signature of the function, as a consequence of 
basic language rules common to the entire C family of 
programming languages.

[1] ... in D we have something (slightly controversial) called 
the `lazy` parameter storage class, but when used, it is clearly 
visible in the signature of the function. 
http://dlang.org/function.html#parameters

Dec 28 2013

"Ivan Kazmenko" <gassa mail.ru> writes:

Many thanks to Marco, Ali and Jakob for the answers!

On Saturday, 28 December 2013 at 08:56:53 UTC, Jakob Ovrum wrote:
 On Friday, 27 December 2013 at 20:30:52 UTC, Ivan Kazmenko 
 wrote:
 Hmm?..  From my experience, attempting to use a range in a 
 wrong way usually results in a compilation error.  For 
 example, I can't do
 n.iota.map!(_ => readln).sort()
 since MapResult isn't a random access range with swappable 
 elements. I can instead do
 n.iota.map!(_ => readln).array().sort()
 and it allocates an array and works as expected.  So, how do I 
 misuse that range?

 Yes, the idea is that ranges only present interfaces that make 
 sense, so cases of misuse will result in a compilation error. 
 However, hacks like using map with functions that ignore their 
 argument(s) throw that out of the window: `r` in `auto r = 
 n.iota.map!(_ => readln);` claims to support forward, 
 bidirectional and random access (all read-only, as the argument 
 function returns by value) as well as slicing, but none of 
 these make any sense; all access primitives do exactly the same 
 thing, with the result being different every time. Even the 
 simplest invariants fail, such as `r.front == r.front`, and 
 `popFront`, `popBack` and slicing only have a binary effect, 
 whether or not the range is empty yet.

OK, I'm now beginning to understand how hacky that is.

 All the random number generator types in `std.random` are 
 infinite forward ranges of random numbers, which is completely 
 fine. For any PRNG `r`, `r.front == r.front` is true, and 
 remains the same number until `r.popFront()`, it correctly has 
 no length and is always non-empty (infinite range), and 
 `r.save` works correctly etc.

So, for both of my examples, the desired behavior is provided on the 
other side: not by a non-caching repeat over a given function, but by 
a range of lines or random numbers with the desired properties in 
place of such a function.  Maybe that's usually the right thing to do 
in the general case, too...

 Perhaps there's a wholly different way of thinking about this 
 in which the first definition makes much more sense than the 
 second one from the start.  If so, please share it.

 All you have to do is look at the signature of the function, 
 which is the primary part of its documentation:

 Repeat!T repeat(T)(T value);

 It takes one value of any type T, not a function pointer or 
 delegate that returns T. Even if you give it a function pointer 
 or delegate (which your example does not), it will simply 
 repeat that function pointer or delegate, never calling it.

So what I initially wanted is possible with something like (I 
just checked):
(&readln!(string)) . repeat(n) . map!(f => f('\n'))
However, I'm having a hard time trying to get rid of "!(string)" 
and "'\n'" to make something like the following work:
(&readln) . repeat(n) . map!(f => f())
Anyway, even the second line (which does not compile) looks 
a bit cryptic.  And it still has the problem of silently adding 
empty lines after end-of-file was reached.
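One possible way around both `!(string)` and `'\n'` (a sketch, not something suggested in the thread) is to repeat a lambda rather than a pointer to `readln`. `readln` is a template, so `&readln` alone has no single address; inside a lambda, `readln()` is an ordinary call, so its default template argument and default terminator apply. It does not fix the end-of-file padding: past the last line the calls return empty strings. `readLines` and the scratch file names are hypothetical; with stdin the expression would be `(() => readln()).repeat(n).map!(g => g())`.

```d
import std.algorithm : map;
import std.array : array;
import std.range : repeat;
import std.stdio : File;

// Hypothetical helper: repeat a lambda (not a pointer to readln), then
// call it once per element in map. Defaults apply inside the lambda.
string[] readLines(File f, size_t n)
{
    return (() => f.readln()).repeat(n).map!(g => g()).array;
}

void main()
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;
    import std.stdio : writeln;

    auto path = buildPath(tempDir, "range_lines_lambda.txt"); // hypothetical
    write(path, "x\ny\n");
    scope (exit) remove(path);

    writeln(readLines(File(path), 3)); // two lines, then one empty string
}
```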

 [1] ... in D we have something (slightly controversial) called 
 the `lazy` parameter storage class, but when used, it is 
 clearly visible in the signature of the function. 
 http://dlang.org/function.html#parameters

Thank you for the link.  This is indeed what was my other 
expectation for repeat.

Ivan Kazmenko.

Dec 28 2013