digitalmars.D - protocol for using InputRanges

Walter Bright (13/13) Mar 22 2014 It's become clear to me that we've underspecified what an InputRange is....

Jakob Ovrum (4/8) Mar 22 2014 If `front` was destructive there would be little point in having
Adam D. Ruppe (7/10) Mar 22 2014 Remember that front is a getter property, which means it should
Rikki Cattermole (8/24) Mar 22 2014 It would be nice to have ranges full stop described. And how to

Chris (13/43) Mar 27 2014 I agree. I've been using ranges for a while now and have tried

Walter Bright (3/15) Mar 27 2014 Following the protocol empty-front-popFront always works.

Chris (24/43) Mar 28 2014 Not only there. The whole issue of front and popFront has not

w0rp (43/43) Mar 28 2014 I think something has gone wrong in this discussion somewhere. It

Chris (14/14) Mar 28 2014 Earlier Walter wrote:

Regan Heath (6/19) Mar 28 2014 You can build safety on top of performance. You cannot do the opposite....

Chris (3/25) Mar 28 2014 But should unsafe+fast be the default or rather an option for

Regan Heath (5/31) Mar 28 2014 Pass. My point was only that it needs to exist.

Steven Schveighoffer (4/6) Mar 31 2014 -- when it is known empty will return false through logical deduction.

Jonathan M Davis (42/61) Mar 23 2014 You definitely don't have to call empty before calling front if you know...

Tommi (6/8) Mar 23 2014 I thought that "breaking existing code" meant either "causing
Walter Bright (5/9) Mar 25 2014 This is simply not practical. The canonical example is a tty. You cannot...

Marco Leise (22/40) Mar 23 2014 Looking at ranges as a user with all the subjectivity it
w0rp (19/19) Mar 23 2014 I understand it like this.

w0rp (6/6) Mar 23 2014 I should add that I have implemented some ranges where .front and
Szymon Gatner (8/28) Mar 23 2014 That is not consistent with the first part of your reply and is

w0rp (37/70) Mar 24 2014 You read wrong. Say you have this sequence of three numbers and

Walter Bright (2/3) Mar 25 2014 This is another undocumented assumption.

Szymon Gatner (15/23) Mar 23 2014 IMO: yes. Logic of empty() sould be const and not have side

Szymon Gatner (7/34) Mar 23 2014 I tend to think about pair of C++ iterators, which are
Steven Schveighoffer (24/29) Mar 24 2014 Here is the crux of the issue. I think aside from Walter's question, we ...

Joseph Rushton Wakeling (17/26) Mar 23 2014 There are some not-very-nice exceptions to that in std.random, where in ...
Timon Gehr (14/17) Mar 23 2014 An analogous problem exists and is more severe for RandomAccessRanges.

monarch_dodra (10/32) Mar 24 2014 I have an open proposal for a "cache" range adaptor that somewhat

Tommi (8/24) Mar 23 2014 Is InputRange supposed to be a one-pass range and am I supposed

Steven Schveighoffer (4/34) Mar 24 2014 A range interface only works for a buffered stream, which naturally allo...

Dicebot (7/23) Mar 24 2014 I think there is one design mistake with current InputRange rules

Joseph Rushton Wakeling (16/21) Mar 24 2014 I floated some ideas along those lines, for a "first" method that would ...

Dicebot (18/31) Mar 25 2014 I was thinking about something more simple. Current pattern is:

Andrei Alexandrescu (4/33) Mar 25 2014 Suggestion: focusing on what to do within the present context is more

Dicebot (3/6) Mar 25 2014 I do work on template argument pack & friends right now, leave me

Chris (15/31) Mar 25 2014 In many cases it makes sense to put things into front, for
Walter Bright (52/52) Mar 25 2014 It's pretty clear that:

monarch_dodra (55/84) Mar 25 2014 http://dlang.org/phobos/std_range.html#isInputRange

Walter Bright (26/83) Mar 25 2014 It's a reasonable requirement. If your range has an issue with this, it ...

Andrei Alexandrescu (2/4) Mar 25 2014 This part I disagree with. -- Andrei
Andrei Alexandrescu (2/3) Mar 25 2014 No, front requires a buffer in the range. -- Andrei
Andrei Alexandrescu (20/57) Mar 25 2014 s/COMPLETELY/loosely/

Timon Gehr (2/5) Mar 25 2014 'map' may fail this criterion. In this case, is the blame put on the use...

monarch_dodra (4/13) Mar 25 2014 Depends how you define "same" though :/ But yeah.

Walter Bright (20/60) Mar 25 2014 The throwing part, or the not good design part?

Regan Heath (23/44) Mar 26 2014 Surely you'd simply start with a range of pointers to expensive-to-copy ...

Steven Schveighoffer (6/51) Mar 26 2014 These two rules are not necessary if you know the range is not empty. Se...

Steven Schveighoffer (5/23) Mar 26 2014 Gah, I didn't cut out the right rules. I meant the two rules that empty ...

Regan Heath (8/31) Mar 26 2014 I see. I was thinking we ought to make empty mandatory to give more

Steven Schveighoffer (7/14) Mar 26 2014 Yes, but when you know that empty is going to return false, there isn't ...

monarch_dodra (8/12) Mar 26 2014 Not only that, but it's also a performance criteria: If you are

Regan Heath (11/21) Mar 26 2014 What guarantees range2 is longer than range1? The isArray case checks

monarch_dodra (13/29) Mar 26 2014 The interface: The target *shall* have enough room to accommodate

Regan Heath (30/56) Mar 26 2014 Ok. So long as *something* is throwing that Error I am down with this.

Daniel Murphy (3/6) Mar 26 2014 Some ranges will give you their length...

Regan Heath (7/13) Mar 27 2014 Sure. And generally you could use it. copy() doesn't, and I was talkin...

Regan Heath (14/29) Mar 26 2014 Sure, it's not required for some algorithms in some situations.
Andrei Alexandrescu (4/19) Mar 26 2014 I think requiring users to call empty before front on input ranges is a

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (11/13) Mar 26 2014 Then the name should change to "ready". It makes sense to require

Paulo Pinto (6/19) Mar 27 2014 Why not?

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (8/10) Mar 27 2014 Why not what? A query for "empty()" should not have any side

Steven Schveighoffer (17/38) Mar 26 2014 if(!r.empty)

Walter Bright (9/23) Mar 26 2014 The idea is that there are three functions: empty, front, and popFront. ...

Steven Schveighoffer (12/41) Mar 26 2014 OK, but it's logical to assume you *can* avoid a call to empty if you kn...

Walter Bright (6/14) Mar 26 2014 As with *any* API, if you look under the hood and make assumptions about...

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (11/35) Mar 27 2014 I was originally going to do that, but then I took a closer look

Regan Heath (23/57) Mar 27 2014 u =
Andrei Alexandrescu (6/10) Mar 27 2014 Probably we need to amend that. For efficient ranges, front() and

Steven Schveighoffer (25/36) Mar 27 2014 he

Andrei Alexandrescu (2/4) Mar 27 2014 filter comes to mind. -- Andrei

monarch_dodra (6/19) Mar 27 2014 I just want to point out that I think allowing empty to have an

Steven Schveighoffer (8/25) Mar 27 2014 On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra

Andrei Alexandrescu (2/24) Mar 27 2014 Yah, agreed. -- Andrei

Walter Bright (5/6) Mar 27 2014 Completely unworkable. To determine if a range is empty or not, it may h...

Steven Schveighoffer (17/23) Mar 27 2014 In the land of ranges, it's construction and popFront that generally

Walter Bright (7/17) Mar 27 2014 The range protocol is designed to work with streams. It's a giant fail i...

Steven Schveighoffer (10/28) Mar 27 2014 A 1 byte buffered stream? If it's performance you are looking for, you

Walter Bright (5/6) Mar 27 2014 Are ready to implement a parallel universe of stream based algorithms to...

Steven Schveighoffer (7/14) Mar 28 2014 Not necessary. You just need to implement one range on top of a buffered...

Andrei Alexandrescu (2/16) Mar 28 2014 It increasingly seems to me we need to do this. -- Andrei

Steven Schveighoffer (5/25) Mar 28 2014 It is in the works. I need to finish it. I've been incorporating Dmitry'...
monarch_dodra (13/14) Mar 29 2014 What I find funny here is that walter is saying we must enforce
monarch_dodra (12/13) Mar 29 2014 Question: Should this new restriction only be applied to

monarch_dodra (4/18) Mar 30 2014 Did this conversation die the instant I actually had something

Walter Bright (3/5) Mar 28 2014 Meaning it must also write through a pointer if it did read something.

Marco Leise (30/37) Mar 29 2014 I guess we all have a clear concept of streams in our mind
Steven Schveighoffer (4/11) Mar 31 2014 I don't understand this point/question.

Andrei Alexandrescu (4/8) Mar 27 2014 It's not a giant fail, we just need to adjust the notion.

Walter Bright (5/12) Mar 27 2014 Are you suggesting that ranges needn't support streams?

Johannes Pfau (8/28) Mar 28 2014 Ranges have equivalents in other languages:

Walter Bright (4/10) Mar 28 2014 Do you see a point to be able to, in an algorithm, seamlessly swap a soc...

Dmitry Olshansky (8/19) Mar 28 2014 Certainly NOT a socket. There is no escaping the fact that there are

Walter Bright (5/23) Mar 28 2014 Yes, it does require a one element buffer. But seriously, does a one cha...

Dmitry Olshansky (7/34) Mar 28 2014 WAT? The overhead is in issuing system calls, you'd want to do as little...

Walter Bright (2/4) Mar 28 2014 That's why we have things like byLine().

Dmitry Olshansky (6/11) Mar 28 2014 Which uses C's BUFFERED I/O and it reads from it byte by byte via getc.

Walter Bright (2/5) Mar 28 2014 How about a PR to fix it?

w0rp (9/16) Mar 28 2014 Yes. I hold the opinion that there is not a whole lot of reason
Dmitry Olshansky (6/13) Mar 29 2014 Sorry, I'm not inclined to do any work on hacking arbitrary C runtime

Andrei Alexandrescu (2/13) Mar 28 2014 byLine doesn't use getc. -- Andrei

Dmitry Olshansky (11/25) Mar 29 2014 Indeed not every version. Linux looks almost OK.

Johannes Pfau (24/41) Mar 28 2014 Sorry, but no. It sure sounds nice first, but I can't really imagine a

w0rp (7/28) Mar 28 2014 I think a key is to offer something with gives you chunks at a

Johannes Pfau (18/27) Mar 28 2014 byChunk is implemented on top of the file rawRead API though, and

QAston (3/35) Mar 28 2014 There are stream iterators in C++:

Dmitry Olshansky (8/20) Mar 28 2014 If streams are like hot raw sockets then certainly it doesn't make

QAston (6/11) Mar 27 2014 The protocol is not intuitive. I'm not entirely against having to

Walter Bright (10/11) Mar 27 2014 I find empty-front-popFront as perfectly intuitive. I don't find the cou...

Paolo Invernizzi (10/23) Mar 28 2014 I _strongly_ agree with Walter: people learning D in my groups

Regan Heath (6/8) Mar 28 2014 This is actually not true.

Paolo Invernizzi (6/12) Mar 28 2014 What I'm meaning, it's that we don't care: we are always

John Stahara (4/16) Mar 28 2014 To clarify for Mr. Invernizzi: the "we" to which he refers is the group

Regan Heath (6/22) Mar 28 2014 Thanks, that was confusing me :)
H. S. Teoh (7/27) Mar 28 2014 [...]
Paolo Invernizzi (4/25) Mar 28 2014 Thank you John, that's exact: I'm talking about my colleagues

Andrei Alexandrescu (3/9) Mar 27 2014 That's a good point too.

monarch_dodra (9/23) Mar 27 2014 Yes, but the "observability" should be sort lived, since empty is
RivenTheMage (5/19) Mar 28 2014 Maybe new kind of range can help?

Walter Bright (2/10) Mar 27 2014 Right, it does say that, and it is seriously wrong.

Steven Schveighoffer (17/40) Mar 27 2014 Like range.save. It's "required", but frequently omitted, without any

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (9/15) Mar 27 2014 You probably can run-time test this in Debug builds, but…

Walter Bright (6/10) Mar 27 2014 Sure, but what is awkward about empty-front-popFront?

Steven Schveighoffer (6/19) Mar 27 2014 e:

Andrei Alexandrescu (2/6) Mar 27 2014 I think byXchar are important and are not streams. -- Andrei

Steven Schveighoffer (4/10) Mar 27 2014 What's broken about those?

Andrei Alexandrescu (2/12) Mar 27 2014 Speed.

monarch_dodra (21/40) Mar 27 2014 I call shenanigans. This is the code:

Andrei Alexandrescu (3/6) Mar 27 2014 I was on the verge, and we need to discuss this more. A constructor that...

monarch_dodra (25/34) Mar 27 2014 I still think there's ambiguity in the word "lazy". I think this

Walter Bright (5/6) Mar 27 2014 As mentioned earlier, requiring this of a range constructor makes it imp...

Steven Schveighoffer (8/14) Mar 27 2014 Not impossible. Use lazy initialization.

Walter Bright (7/16) Mar 27 2014 That still doesn't work, because then you could be calling front, and fr...

Steven Schveighoffer (18/38) Mar 27 2014 In the cases where you know there is input, that's not true. What I am

Walter Bright (17/33) Mar 27 2014 You can't use that on generic code, because generic code cannot assume t...

Steven Schveighoffer (23/29) Mar 28 2014 There is a difference here that I want to establish.

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (19/54) Mar 28 2014 Does a flag "isInitialized" even have an effect on performance

Steven Schveighoffer (27/42) Mar 28 2014 =

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (22/47) Mar 29 2014 I was more thinking of the fact that you need to read something

Steven Schveighoffer (45/78) Mar 31 2014 :

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (11/36) Mar 31 2014 I've found the example I was talking about:

Steven Schveighoffer (17/26) Mar 31 2014 d@puremagic.com

Timon Gehr (7/13) Mar 27 2014 I feel uneasy about the assertion that code that assumes that 'empty'

Timon Gehr (2/3) Mar 27 2014 arbitrarily restricting interactions to some 'protocol'
Walter Bright (2/4) Mar 27 2014 What if get() fails?

H. S. Teoh (26/31) Mar 27 2014 That's the whole point of Nullable!T: you can return null if get()

monarch_dodra (23/45) Mar 28 2014 It's not that *you* know a particular range doesn't need "empty"

Rainer Schuetze (101/130) Mar 26 2014 IIUC what you are proposing would be covered better by removing front

Rainer Schuetze (10/15) Mar 26 2014 Ouch, pasted before fixing:
Walter Bright (11/15) Mar 27 2014 Different things being ranged over make different decisions as to what g...

Rainer Schuetze (6/22) Mar 27 2014 This loop is intuitive. Not being allowed to call empty or front

Walter Bright (4/7) Mar 27 2014 I can concede that. But I can't concede being able to call front without...

"Tobias =?UTF-8?B?TcO8bGxlciI=?= <troplin bluewin.ch> (24/34) Mar 28 2014 Disclaimer: I'm a C++ programmer just lurking here, I've never

Szymon Gatner (11/33) Mar 28 2014 In c++ if you had a list or deque you would obviously did
w0rp (10/45) Mar 28 2014 Well, it might seem kind of weird at first that an InputRange has
"Oscar =?UTF-8?B?TWFydMOtbiI=?= <omarmed0000 hotmail.com> (15/50) Mar 30 2014 First sorry for my english.

Walter Bright (5/21) Mar 27 2014 What happens if empty is called twice in a row? Two characters get read!...

Rainer Schuetze (5/28) Mar 27 2014 Good catch. Somehow I pasted the wrong version, but posted the

Walter Bright (3/4) Mar 27 2014 I think that is the very definition of looking under the hood in order t...

Regan Heath (16/25) Mar 27 2014 if(r.empty)

Jonathan M Davis (18/40) Mar 26 2014 I don't know. It's not the end of the world if we require it (at least w...

Walter Bright (5/11) Mar 27 2014 Many things, like some ttys, can not be tested for being empty without r...

=?UTF-8?B?Ikx1w61z?= Marques" (33/35) Mar 27 2014 Some thoughts:

Daniel Murphy (17/20) Mar 29 2014 Not by itself - if a range's empty has returned true, calling empty

Peter Alexander (5/7) Mar 29 2014 As someone who has missed this discussion, can I just say this:

Walter Bright <newshound2 digitalmars.com> writes:

It's become clear to me that we've underspecified what an InputRange is. The 
normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

no argument there. But there are two issues:

1. If you know the range is not empty, is it allowed to call r.front without 
calling r.empty first?

If this is true, extra logic will need to be added to r.front in many cases.

2. Can r.front be called n times in a row? I.e. is calling front() destructive?

If true, this means that r.front will have to cache a copy in many cases.

Mar 22 2014

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

If `front` was destructive there would be little point in having 
it separate from `popFront`. I think it must be non-destructive 
to make sense.

Mar 22 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
         auto e = r.front;

Remember that front is a getter property, which means it should 
work like a variable. Typically, reading a variable is not 
destructive and needs no preparation. There's exceptions to the 
rule, but they almost always work this way, so ranges should too.

 If true, this means that r.front will have to cache a copy in 
 many cases.

Indeed, in many of my quick input ranges, I just make front a 
regular variable and popFront updates it to the next item.

Mar 22 2014

"Rikki Cattermole" <alphaglosined gmail.com> writes:

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

It would be nice to have ranges full stop described. And how to 
use/make them in D. I have yet to learn them because of no 
official documentation on them.
On this topic we also need to work on common patterns and 
creating documentation on e.g. the site or wiki for it. Saw that 
we do have a bit in the wiki under tutorials. Maybe if I get some 
time I'll work on that.

Mar 22 2014

"Chris" <wendlec tcd.ie> writes:

On Sunday, 23 March 2014 at 01:07:27 UTC, Rikki Cattermole wrote:
 On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

    while (!r.empty) {
        auto e = r.front;
        ... do something with e ...
        r.popFront();
    }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

 It would be nice to have ranges full stop described. And how to 
 use/make them in D. I have yet to learn them because of no 
 official documentation on them.
 On this topic we also need to work on common patterns and 
 creating documentation on e.g. the site or wiki for it. Saw 
 that we do have a bit in the wiki under tutorials. Maybe if I 
 get some time I'll work on that.

I agree. I've been using ranges for a while now and have tried
different implementations based on advice given on this forum and
depending on each case. After reading this thread I looked at
some of my ranges and I have to say I still have no clue as to
what should and should _not_ be done (regardless of whether you
_can_ do it). For a while I thought that it's my lack of
understanding, but this thread shows that everyone has a
different view of ranges. Guidelines with use cases would be
great. I remember I mentioned this in another thread already. The
thing is that experimenting without proper guidelines leaves your
code in an inconsistent state where you have two or more ranges
doing technically the same thing but each with a different logic.

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 3:21 AM, Chris wrote:
 I agree. I've been using ranges for a while now and have tried
 different implementations based on advice given on this forum and
 depending on each case. After reading this thread I looked at
 some of my ranges and I have to say I still have no clue as to
 what should and should _not_ be done (regardless of whether you
 _can_ do it). For a while I thought that it's my lack of
 understanding, but this thread shows that everyone has a
 different view of ranges. Guidelines with use cases would be
 great. I remember I mentioned this in another thread already. The
 thing is that experimenting without proper guidelines leaves your
 code in an inconsistent state where you have two or more ranges
 doing technically the same thing but each with a different logic.

Following the protocol empty-front-popFront always works.

Where the differences come in is when people skip calling empty.

Mar 27 2014

"Chris" <wendlec tcd.ie> writes:

On Friday, 28 March 2014 at 04:58:28 UTC, Walter Bright wrote:
 On 3/27/2014 3:21 AM, Chris wrote:
 I agree. I've been using ranges for a while now and have tried
 different implementations based on advice given on this forum 
 and
 depending on each case. After reading this thread I looked at
 some of my ranges and I have to say I still have no clue as to
 what should and should _not_ be done (regardless of whether you
 _can_ do it). For a while I thought that it's my lack of
 understanding, but this thread shows that everyone has a
 different view of ranges. Guidelines with use cases would be
 great. I remember I mentioned this in another thread already. 
 The
 thing is that experimenting without proper guidelines leaves 
 your
 code in an inconsistent state where you have two or more ranges
 doing technically the same thing but each with a different 
 logic.

 Following the protocol empty-front-popFront always works.

 Where the differences come in is when people skip calling empty.

Not only there. The whole issue of front and popFront has not 
been made clear yet (unless it's my lack of understanding). What 
should and should not be done there, even if you stick to the 
basic protocol (!empty > front > popFront).

I like the concept of ranges and use them a lot, because they 
help me to create independent components. But sometimes I think 
they are just glorified for-loops in the sense that we know more 
or less what the input is, and part of the reason why we have 
this thread is that we want everything to be specialized and 
generic at the same time. E.g.

 monarch_dodra wrote:

"It's not that *you* know a particular range doesn't need "empty"
called, it's that the algorithm you are using has already
previously validated there are elements in it.

For example, the splitter algorithm will first save a copy of its
range, and then walk it, searching for the "splitting" elements.
It then realizes it has walked N elements."

But then it's no longer generic. If your range depends on another 
algorithm (that you assume or know to have checked something), 
then it's no longer generic. In many cases "generic" means an 
overhead of checks, and is anathema to performance tuning. I 
think it's the issue of generic vs. specialized / optimized that 
started this thread. And this is a tough one to sort out.

Mar 28 2014

"w0rp" <devw0rp gmail.com> writes:

I think something has gone wrong in this discussion somewhere. It 
is like this.

Do not assume .empty is const, this is unreasonable.

Do not assume .front is const, this is unreasonable.

.popFront has the only high-level observable difference in 
removing something from the front of an InputRange.

I say it is unreasonable to expect .empty or .front to be const 
because a range which is strictly an InputRange (not a 
ForwardRange...) is likely to be doing some kind of work where 
stepping through its sequence of data is destructive. As soon as 
the thing is read in the implementation of the InputRange, you 
can't read that thing again. You've cached it or it is gone. Thus 
.empty and .front must be non-const and lazily evaluate caching 
the next value.

I do not see it as a major issue to check .empty or call .front 
many times, even with this caching behaviour. Commonly this is as 
simple as checking whether or not you still have a value to 
provide inside the InputRange. We use assert(!empty) etc. because 
honestly I cannot imagine another way to make this safe.

If you really, really hate this double-checking in .empty or 
.front, you could try to change InputRanges so they behave like 
Java Iterators. I believe this is a really bad idea. You have to 
start thinking about returning nullable types. You have to push 
the job of caching values on to users of the InputRanges, as in 
each and every algorithm. It's just a really bad choice, and with 
the kind of boxing you'll need to do for nullable types, you'll 
surely end off worse as far as performance goes anyway.

Really, the range protocol is as Walter specified at the start of 
this thread. The documentation should reflect how .empty or 
.front may result in caching the next value. I really don't think 
this is a major performance problem worth worrying about, though. 
If it's really, really worth it for certain standard algorithms, 
perhaps something can be done in private range methods or similar 
to offer some kind of speed boost. ('package' privacy for std.* 
is feasible.) I don't think there is a lot to be gained from this 
in general, though.

Streams are different, and may be more appropriate for some 
functions for performance reasons. I still prefer ranges for code 
where performance is less critical, because if you aren't very 
worried about performance, the range API is a lot nicer. I'm not 
dead set on maximum performance if it makes life a lot harder in 
my code, personally. I view these kinds of improvements as 
premature optimisations if I'm writing something new.

Mar 28 2014

"Chris" <wendlec tcd.ie> writes:

Earlier Walter wrote:

"I don't like being in the position of when I need high 
performance code, I have
to implement my own ranges & algorithms, or telling customers 
they need to do so."

I don't think there is a one size fits all. What if customers ask 
for maximum security? In any language, if I want high 
performance, I have to be prepared to walk on thin ice. If I want 
things to be safe and / or generic, I have to accept additonal 
checks (= perfomance penalties). I don't think that a language 
can solve the fundamental problems concerning programming / 
mathematical logic with all the contradictory demands involved. 
It can give us the tools to cope with those problems, but not 
solve them out of the box.

Mar 28 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Fri, 28 Mar 2014 14:15:10 -0000, Chris <wendlec tcd.ie> wrote:

 Earlier Walter wrote:

 "I don't like being in the position of when I need high performance  
 code, I have
 to implement my own ranges & algorithms, or telling customers they need  
 to do so."

 I don't think there is a one size fits all. What if customers ask for  
 maximum security? In any language, if I want high performance, I have to  
 be prepared to walk on thin ice. If I want things to be safe and / or  
 generic, I have to accept additonal checks (= perfomance penalties). I  
 don't think that a language can solve the fundamental problems  
 concerning programming / mathematical logic with all the contradictory  
 demands involved. It can give us the tools to cope with those problems,  
 but not solve them out of the box.

You can build safety on top of performance.  You cannot do the opposite.   
Meaning, one could wrap an unsafe/fast range with a safe/slower one.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 28 2014

"Chris" <wendlec tcd.ie> writes:

On Friday, 28 March 2014 at 15:49:06 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 14:15:10 -0000, Chris <wendlec tcd.ie> 
 wrote:

 Earlier Walter wrote:

 "I don't like being in the position of when I need high 
 performance code, I have
 to implement my own ranges & algorithms, or telling customers 
 they need to do so."

 I don't think there is a one size fits all. What if customers 
 ask for maximum security? In any language, if I want high 
 performance, I have to be prepared to walk on thin ice. If I 
 want things to be safe and / or generic, I have to accept 
 additonal checks (= perfomance penalties). I don't think that 
 a language can solve the fundamental problems concerning 
 programming / mathematical logic with all the contradictory 
 demands involved. It can give us the tools to cope with those 
 problems, but not solve them out of the box.

 You can build safety on top of performance.  You cannot do the 
 opposite.  Meaning, one could wrap an unsafe/fast range with a 
 safe/slower one.

 R

But should unsafe+fast be the default or rather an option for 
cases when you really need it?

Mar 28 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Fri, 28 Mar 2014 16:04:29 -0000, Chris <wendlec tcd.ie> wrote:

 On Friday, 28 March 2014 at 15:49:06 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 14:15:10 -0000, Chris <wendlec tcd.ie> wrote:

 Earlier Walter wrote:

 "I don't like being in the position of when I need high performance  
 code, I have
 to implement my own ranges & algorithms, or telling customers they  
 need to do so."

 I don't think there is a one size fits all. What if customers ask for  
 maximum security? In any language, if I want high performance, I have  
 to be prepared to walk on thin ice. If I want things to be safe and /  
 or generic, I have to accept additonal checks (= perfomance  
 penalties). I don't think that a language can solve the fundamental  
 problems concerning programming / mathematical logic with all the  
 contradictory demands involved. It can give us the tools to cope with  
 those problems, but not solve them out of the box.

 You can build safety on top of performance.  You cannot do the  
 opposite.  Meaning, one could wrap an unsafe/fast range with a  
 safe/slower one.

 R

 But should unsafe+fast be the default or rather an option for cases when  
 you really need it?

Pass.  My point was only that it needs to exist.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 28 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 28 Mar 2014 00:58:35 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 Following the protocol empty-front-popFront always works.

 Where the differences come in is when people skip calling empty.

-- when it is known empty will return false through logical deduction.

-Steve

Mar 31 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, March 22, 2014 17:50:34 Walter Bright wrote:
 It's become clear to me that we've underspecified what an InputRange is. The
 normal way to use it is:
 
      while (!r.empty) {
          auto e = r.front;
          ... do something with e ...
          r.popFront();
      }
 
 no argument there. But there are two issues:
 
 1. If you know the range is not empty, is it allowed to call r.front without
 calling r.empty first?
 
 If this is true, extra logic will need to be added to r.front in many cases.

You definitely don't have to call empty before calling front if you know that 
it's not empty. Both front and empty should normally be pure (or at least act 
that way) and essentially act like variables. In most cases, it works best for 
the work of the range to go in popFront. The exception is when you're dealing 
with a random-access range, since then any element could be accessed, making 
it so that you can't be doing the work in popFront. I think that we have a 
general agreement on this based on previous discussions, though it's certainly 
not unanimous.

 2. Can r.front be called n times in a row? I.e. is calling front()
 destructive?
 
 If true, this means that r.front will have to cache a copy in many cases.

If calling front were destructive, that would break a lot of code. It's 
probably true that most range-based code should avoid calling front multiple 
times (in case front has to do more work than just return the value as well as 
to avoid copying the result if that happens on every call), though if front is 
auto ref, it could be more efficient to call it multiple times. So, it's not 
entirely clear-cut.

But again, front and empty should normally function as if they were variables. 
They should be property functions and should be pure (or at least act like 
they're pure). I'm sure that a _lot_ of code will break if that isn't 
followed.

There are corner cases which can get a bit mucky though - e.g.

auto a = map!(to!string)(range);

In this case, front is pure, but it returns a new value each time (albeit a 
value that's equal each time until popFront is called). And there's no real 
way to fix that if the resulting range is random access (though if it weren't, 
the work could go in popFront, which _would_ make it so that front always 
returned the same result).

And there have been arguments over whether the result of front should be valid 
after popFront has been called (i.e. whether it's transient or not). A lot of 
code assumes that it will be, but we have some nasty exceptions (e.g. 
std.stdio.ByLine) - typically because front's a buffer which gets reused. 
IIRC, in those cases, Andrei favored saying that input ranges that weren't 
forward ranges could have a transient front but that forward ranges couldn't 
(which I tend to agree with, though I'd prefer that _no_ ranges have transient 
fronts, since it can really cause problems - e.g. std.array.array not 
working). I don't think that a consensus was reached on that though, since a 
few folks really liked using transient fronts with more complicated ranges.

In general though, I think that most of us would agree that front and empty 
should be treated as properties - i.e. as if they were variables - and that 
they should have try to stick to those semantics as closely as possible. 
Ranges that stray from that seriously risk not working with a lot of range-
based code.

- Jonathan M Davis

Mar 23 2014

"Tommi" <tommitissari hotmail.com> writes:

On Sunday, 23 March 2014 at 07:54:12 UTC, Jonathan M Davis wrote:
 If calling front were destructive, that would break a lot of 
 code.

I thought that "breaking existing code" meant either "causing 
existing code do something it wasn't supposed to do" or "causing 
existing code not compile", but you're using it in the meaning of 
"making existing code not conform to the language specification". 
Are you sure that's the correct use of the expression?

Mar 23 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/23/2014 12:53 AM, Jonathan M Davis wrote:
 But again, front and empty should normally function as if they were variables.
 They should be property functions and should be pure (or at least act like
 they're pure).

This is simply not practical. The canonical example is a tty. You cannot tell
if 
a character is available unless you try to read it. That is NOT pure.


 I'm sure that a _lot_ of code will break if that isn't followed.

It seems there have been a lot of undocumented assumptions about the behavior
of 
ranges.

Mar 25 2014

Marco Leise <Marco.Leise gmx.de> writes:

Am Sat, 22 Mar 2014 17:50:34 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 It's become clear to me that we've underspecified what an InputRange is. The 
 normal way to use it is:
 
      while (!r.empty) {
          auto e = r.front;
          ... do something with e ...
          r.popFront();
      }
 
 no argument there. But there are two issues:
 
 1. If you know the range is not empty, is it allowed to call r.front without 
 calling r.empty first?
 
 If this is true, extra logic will need to be added to r.front in many cases.

Looking at ranges as a user with all the subjectivity it
entails, I'd expect .empty to be a read-only property, a
 safe-const-pure-nothrow candidate.

Actions that are not logically const are commonly encapsulated
in methods (as opposed to read-only properties like .empty) and
often also carry a verb in their name like "create", "update"
or "append".

 2. Can r.front be called n times in a row? I.e. is calling front() destructive?
 
 If true, this means that r.front will have to cache a copy in many cases.

On the other side of it were (IIRC) there are use cases for
return-by-ref on .front:

o .front gives access to a large struct. Return-by-ref can
  avoid a copy on the call site, but may result in
  several .front calls.
o .front is a view into some struct which holds system
  resources and providing a copy requires internals as in
  the File struct.

I both cases you deal with a range over value types that are
costly to copy and where "auto e = r.front;" is to be avoided.

I'm 100% unsure how to deal with this.

-- 
Marco

Mar 23 2014

"w0rp" <devw0rp gmail.com> writes:

I understand it like this.

* empty - Are there no more values?
* front - Get me the current value.
* popFront - Advance to the next value.

In terms of how I implement an InputRange in general, I typically 
end up with this.

* empty - Advance and cache "current value," return true if we 
ran out of values.
* front - enforce(!empty), which in turn caches the current 
value, which we then return.
* popFront - enforce(!empty), clear the cached value so we can 
later advance.

So .front gives you the same thing again and again until you call 
popFront, you could indeed call .front before .empty, but you may 
get an exception. This obviously isn't how I implement all 
InputRanges, as there are better ways to write other ranges, such 
as ranges which operate on sets of integers or wrap arrays. This 
is just my last resort general case InputRange implementation.

.front assignment obviously replaces the cached value.

Mar 23 2014

"w0rp" <devw0rp gmail.com> writes:

I should add that I have implemented some ranges where .front and 
.popFront are both nothrow, as !empty doesn't "advance and cache" 
for these ranges and the check is moved into an in{} contract. 
For these ranges, they tend to behave like arrays with bounds 
checking, only now the bounds checking is turned off by virtue of 
-release.

Mar 23 2014

"Szymon Gatner" <noemail gmail.com> writes:

On Sunday, 23 March 2014 at 09:26:53 UTC, w0rp wrote:
 I understand it like this.

 * empty - Are there no more values?
 * front - Get me the current value.
 * popFront - Advance to the next value.

That is correct.

 In terms of how I implement an InputRange in general, I 
 typically end up with this.

 * empty - Advance and cache "current value," return true if we 
 ran out of values.
 * front - enforce(!empty), which in turn caches the current 
 value, which we then return.
 * popFront - enforce(!empty), clear the cached value so we can 
 later advance.

 So .front gives you the same thing again and again until you 
 call popFront, you could indeed call .front before .empty, but 
 you may get an exception. This obviously isn't how I implement 
 all InputRanges, as there are better ways to write other 
 ranges, such as ranges which operate on sets of integers or 
 wrap arrays. This is just my last resort general case 
 InputRange implementation.

 .front assignment obviously replaces the cached value.

That is not consistent with the first part of your reply and is 
incorrect imho. Calling empty() twice should not destroy 
anything, even according to your understanding. Neither should 
front(). User should be able to call them as many times as he 
wishes getting same value every time. Only popFront() mutate the 
range.

Mar 23 2014

"w0rp" <devw0rp gmail.com> writes:

On Sunday, 23 March 2014 at 09:38:43 UTC, Szymon Gatner wrote:
 On Sunday, 23 March 2014 at 09:26:53 UTC, w0rp wrote:
 I understand it like this.

 * empty - Are there no more values?
 * front - Get me the current value.
 * popFront - Advance to the next value.

 That is correct.

 In terms of how I implement an InputRange in general, I 
 typically end up with this.

 * empty - Advance and cache "current value," return true if we 
 ran out of values.
 * front - enforce(!empty), which in turn caches the current 
 value, which we then return.
 * popFront - enforce(!empty), clear the cached value so we can 
 later advance.

 So .front gives you the same thing again and again until you 
 call popFront, you could indeed call .front before .empty, but 
 you may get an exception. This obviously isn't how I implement 
 all InputRanges, as there are better ways to write other 
 ranges, such as ranges which operate on sets of integers or 
 wrap arrays. This is just my last resort general case 
 InputRange implementation.

 .front assignment obviously replaces the cached value.

 That is not consistent with the first part of your reply and is 
 incorrect imho. Calling empty() twice should not destroy 
 anything, even according to your understanding. Neither should 
 front(). User should be able to call them as many times as he 
 wishes getting same value every time. Only popFront() mutate 
 the range.

You read wrong. Say you have this sequence of three numbers and 
'x' represents the cached value being empty, for a range 'r.'

---
1 2 3
x

r.empty

1 2 3
^

r.empty

1 2 3
^

r.front == 1

r.popFront()

1 2 3
   x

r.empty

1 2 3
   ^
---

front and popFront each enforce(!empty) first, so this works, etc.

---
1 2 3
x

r.popFront()



1 2 3
^



1 2 3
   x

---

So it works for any sequence of values, and you can call .empty 
and .front as much as you want without changing the result. It 
just so happens that it's actually .empty which does the work of 
actually fetching the next value. There are better ways to 
implement this for certain ranges, but this works for anything.

Mar 24 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/23/2014 2:26 AM, w0rp wrote:
 .front assignment obviously replaces the cached value.

This is another undocumented assumption.

Mar 25 2014

"Szymon Gatner" <noemail gmail.com> writes:

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

IMO: yes. Logic of empty() sould be const and not have side 
effects.

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

Yes, all true. Not sure if there is something like that in Phobos 
but if for example you had RandomValueRange, ever call to front() 
should return the same random number until popFront() is called 
at which point internal front variable is being recalculated and 
cached.

Since only popFront() is non-const, it is there where all the 
magic/mutation should happen.

In my code I also have CachingRange which calls front() on 
underlying range the first time and then stores result 
internally. I use it for ranges which generate front() on the fly 
and I knot it is expensive. I could cache that calculation 
directly in that underlying range but CachingRange is reusable.

Mar 23 2014

"Szymon Gatner" <noemail gmail.com> writes:

On Sunday, 23 March 2014 at 09:34:28 UTC, Szymon Gatner wrote:
 On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 IMO: yes. Logic of empty() sould be const and not have side 
 effects.

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

 Yes, all true. Not sure if there is something like that in 
 Phobos but if for example you had RandomValueRange, ever call 
 to front() should return the same random number until 
 popFront() is called at which point internal front variable is 
 being recalculated and cached.

 Since only popFront() is non-const, it is there where all the 
 magic/mutation should happen.

 In my code I also have CachingRange which calls front() on 
 underlying range the first time and then stores result 
 internally. I use it for ranges which generate front() on the 
 fly and I knot it is expensive. I could cache that calculation 
 directly in that underlying range but CachingRange is reusable.

I tend to think about pair of C++ iterators, which are 
generalization of pointers. So for C pointers 'first' and 'last':

*first -> front()
first != last -> empty()
++first -> popFont()

And I stick to semantics.

Mar 23 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 23 Mar 2014 05:34:26 -0400, Szymon Gatner <noemail gmail.com>  
wrote:

 On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 1. If you know the range is not empty, is it allowed to call r.front  
 without calling r.empty first?

 IMO: yes. Logic of empty() sould be const and not have side effects.

Here is the crux of the issue. I think aside from Walter's question, we  
have situations where it is expensive to construct each value of a range.  
For a range that is never used, that is an expense that you don't have to  
pay.

There was a post a little while ago about how File.byLine reads the first  
line on construction. Consider the case where stdin.byLine needs to be  
constructed, but not *iterated* before writing some information to stdout.  
This gives us a predicament that the first line has to be available before  
you can output any instructions. As we all know, stdin will block on first  
read, unless you piped in a file or something. This makes an interactive  
program simply incorrect.

 From the library point of view, caching the first line on construction is  
the most sensible thing -- This allows empty, front, and popFront to all  
be consistent across the entire iteration. Making empty or front do  
something different on the first access requires awkward machinery (a  
boolean indicating it should do so, plus a check every call thereafter).  
But from the user's point of view, sometimes you want lazy access to the  
first element, for logical reasons.

I think in light of such possibilities, empty and front should not have to  
be const or pure. Logically, it can be viewed that way, but not  
technically.

-Steve

Mar 24 2014

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 23/03/14 08:53, Jonathan M Davis wrote:
 But again, front and empty should normally function as if they were variables.
 They should be property functions and should be pure (or at least act like
 they're pure). I'm sure that a _lot_ of code will break if that isn't
 followed.

There are some not-very-nice exceptions to that in std.random, where in many of 
the range types you have stuff along the following lines:

     auto front()  property
     {
         if (notInitialized)
         {
             init();
         }
         return _value;
     }

see e.g. MersenneTwister, RandomSample, and others.  It's all a nasty 
consequence of that RNGs-as-value-types problem we've discussed previously.

However, the class-based std.random2 that is being discussed on the announce 
list fixes that:
http://forum.dlang.org/thread/cyytvhixkqlbwkmiugex forum.dlang.org

 In general though, I think that most of us would agree that front and empty
 should be treated as properties - i.e. as if they were variables - and that
 they should have try to stick to those semantics as closely as possible.
 Ranges that stray from that seriously risk not working with a lot of range-
 based code.

Yes, absolutely.

Mar 23 2014

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/23/2014 01:50 AM, Walter Bright wrote:
 2. Can r.front be called n times in a row? I.e. is calling front()
 destructive?

 If true, this means that r.front will have to cache a copy in many cases.

An analogous problem exists and is more severe for RandomAccessRanges.

import std.algorithm, std.range;
class Obj{ this(int){} }
void main(){
     auto x=[1,2,3].map!(a=>new Obj(a));
     auto a=x.front;
     auto b=x.front;
     auto c=x[0];
     auto d=x[0];
     assert(a is b); // fail
     assert(b is c); // fail
     assert(c is d); // fail
}

Mar 23 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Sunday, 23 March 2014 at 11:52:53 UTC, Timon Gehr wrote:
 On 03/23/2014 01:50 AM, Walter Bright wrote:
 2. Can r.front be called n times in a row? I.e. is calling 
 front()
 destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

 An analogous problem exists and is more severe for 
 RandomAccessRanges.

 import std.algorithm, std.range;
 class Obj{ this(int){} }
 void main(){
     auto x=[1,2,3].map!(a=>new Obj(a));
     auto a=x.front;
     auto b=x.front;
     auto c=x[0];
     auto d=x[0];
     assert(a is b); // fail
     assert(b is c); // fail
     assert(c is d); // fail
 }

I have an open proposal for a "cache" range adaptor that somewhat 
eliviates the problem:
https://github.com/D-Programming-Language/phobos/pull/1364

But it restricts the range to bidirectional (for the reasons 
above).

Actually, it's mimited to Bidir, but if you *do* use it as 
bidirectional, then 1 element will be evaluated twice. So I'm 
wondering if it might not be better to simply restrict it to 
forward...

Mar 24 2014

"Tommi" <tommitissari hotmail.com> writes:

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

Is InputRange supposed to be a one-pass range and am I supposed 
to able to use InputRange as a lightweight wrapper for a stream? 
If the answer is yes, then I think the fundamental issue is that 
the empty-front-popFront interface is not optimal for something 
like InputRange. But given that's the interface we have, I think 
that InputRange's front must be allowed to be destructive because 
the stream it could potentially be wrapping is destructive.

Mar 23 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 23 Mar 2014 09:00:17 -0400, Tommi <tommitissari hotmail.com> wrote:

 On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an InputRange  
 is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call r.front  
 without calling r.empty first?

 If this is true, extra logic will need to be added to r.front in many  
 cases.

 2. Can r.front be called n times in a row? I.e. is calling front()  
 destructive?

 If true, this means that r.front will have to cache a copy in many  
 cases.

 Is InputRange supposed to be a one-pass range and am I supposed to able  
 to use InputRange as a lightweight wrapper for a stream? If the answer  
 is yes, then I think the fundamental issue is that the  
 empty-front-popFront interface is not optimal for something like  
 InputRange. But given that's the interface we have, I think that  
 InputRange's front must be allowed to be destructive because the stream  
 it could potentially be wrapping is destructive.

A range interface only works for a buffered stream, which naturally allows  
caching.

-Steve

Mar 24 2014

"Dicebot" <public dicebot.lv> writes:

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

I think there is one design mistake with current InputRange rules 
that makes usage so inconsistent. We have `empty` but don't have 
any distinct `not yet started` state. If calling `popFront` at 
least once was required before accessing `front`, I can't imagine 
the case where having any non-trivial code in `front` would have 
been necessary.

Mar 24 2014

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 24/03/14 14:20, Dicebot wrote:
 I think there is one design mistake with current InputRange rules that makes
 usage so inconsistent. We have `empty` but don't have any distinct `not yet
 started` state. If calling `popFront` at least once was required before
 accessing `front`, I can't imagine the case where having any non-trivial code
in
 `front` would have been necessary.

I floated some ideas along those lines, for a "first" method that would be 
automatically called immediately before the first call to front, popFront,
etc., 
whatever came first.

The trouble is that it's so vague _when_ it should be called.  Example from 
std.random: RandomSample not only has a .front which needs a check as to
whether 
the first value has actually been selected, it also has an .index method which 
corresponds to the index of the chosen element of the range being sampled from.

Now, how is a "first" method supposed to know which methods it should precede? 
Any method?  Specified ones?  In any case, even if we could get it to work,
it'd 
be a convenience rather than a solution, and would still in effect result in a 
non-const .front.

OTOH a requirement that .popFront() be called at least once before .front is 
accessed won't wash either.  At least in the case of randomSample, the code 
required to generate the _first_ sample point is different from what is
required 
to .popFront().

Mar 24 2014

"Dicebot" <public dicebot.lv> writes:

On Monday, 24 March 2014 at 22:26:10 UTC, Joseph Rushton Wakeling 
wrote:
 On 24/03/14 14:20, Dicebot wrote:
 I think there is one design mistake with current InputRange 
 rules that makes
 usage so inconsistent. We have `empty` but don't have any 
 distinct `not yet
 started` state. If calling `popFront` at least once was 
 required before
 accessing `front`, I can't imagine the case where having any 
 non-trivial code in
 `front` would have been necessary.

 I floated some ideas along those lines, for a "first" method 
 that would be automatically called immediately before the first 
 call to front, popFront, etc., whatever came first.

I was thinking about something more simple. Current pattern is:

while (!r.empty)
{
     auto useme = r.front;
     r.popFront;
}

And I think this would have been more practical:

r.popFront;

while (!r.empty)
{
    // same
}

So every range is supposed to start in "init" state and provide 
data only after first popFront. No other changes. It is not a 
silver bullet but looks like an improvement over current design 
to me.

Mar 25 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/25/14, 12:16 PM, Dicebot wrote:
 On Monday, 24 March 2014 at 22:26:10 UTC, Joseph Rushton Wakeling wrote:
 On 24/03/14 14:20, Dicebot wrote:
 I think there is one design mistake with current InputRange rules
 that makes
 usage so inconsistent. We have `empty` but don't have any distinct
 `not yet
 started` state. If calling `popFront` at least once was required before
 accessing `front`, I can't imagine the case where having any
 non-trivial code in
 `front` would have been necessary.

 I floated some ideas along those lines, for a "first" method that
 would be automatically called immediately before the first call to
 front, popFront, etc., whatever came first.

 I was thinking about something more simple. Current pattern is:

 while (!r.empty)
 {
      auto useme = r.front;
      r.popFront;
 }

 And I think this would have been more practical:

 r.popFront;

 while (!r.empty)
 {
     // same
 }

 So every range is supposed to start in "init" state and provide data
 only after first popFront. No other changes. It is not a silver bullet
 but looks like an improvement over current design to me.

Suggestion: focusing on what to do within the present context is more 
productive.

Andrei

Mar 25 2014

"Dicebot" <public dicebot.lv> writes:

On Tuesday, 25 March 2014 at 19:44:46 UTC, Andrei Alexan
 Suggestion: focusing on what to do within the present context 
 is more productive.

 Andrei

I do work on template argument pack & friends right now, leave me 
some fun! :P

Mar 25 2014

"Chris" <wendlec tcd.ie> writes:

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

In many cases it makes sense to put things into front, for 
example if you read a file line by line, parsing each line to 
create a dictionary entry. Although it has been said that any 
logic should go into popFront, because front can be called 
several times. Since Ranges usually perform some kind of 
filtering mechanism and return something new, it makes sense that 
things go into front, or at least should be allowed to go into 
front. In my experience it is a bit of a "the hen or the egg" 
problem, to get something you have to call front, but the 
something should only be done in popFront. Ranges are often 
simpler and faster to write, when you put things into front, 
because in most of my use cases front is not called many times in 
a row, just once for each item. I still haven't found the golden 
way(s) of writing ranges.

Mar 25 2014

Walter Bright <newshound2 digitalmars.com> writes:

It's pretty clear that:

1. the protocol is COMPLETELY undocumented and undefined.

2. in this vacuum, people (including me) have made all sorts of assumptions,
and 
assumed those assumptions were universal and shared.

Since ranges+algorithms are central to D, this is a very serious problem. There 
are two competing schools of thought:

1. maximum flexibility

2. maximum performance

(1) has been explained well in this thread already. (2) not so much, so I'll 
stand up for that one.

I've been recently involved in writing high performance apps, as anyone 
following ScopeBuffer knows. With such apps, it's clear that:

1. size matters - the fewer registers an object fits in, the faster it goes
2. every instruction matters - more operations == slower code
3. lazy seems to be faster, mainly because it eliminates the storage management 
cost of buffers and extra copying into and out of those buffers

We want to appeal to the high performance coders. To maximize performance, 
ranges should be optimized to the inner loop case, which is:

     while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

With the additional proviso that ranges are passed to algorithms by value, so 
they should be cheap to copy. Cheap to copy usually means them being small.

A) I know that my range is not empty, so I can skip calling empty.

Since front is guaranteed to succeed if !empty, this puts a requirement on many 
ranges that they have a non-trivial constructor that 'primes the pump'. Of 
course, priming may fail, and so construction may throw, which is not good 
design. Next, priming requires a buffer in the range. Buffers add size, making 
them slower to copy, and will often require another field saying if the buffer 
has data in it or not, further bumping the size. And lastly, it means that lazy 
ranges will be required to read the first element, even if the range isn't then 
subsequently used, which defeats what one would expect from a lazy range. By 
having mere construction of a range initiate reading from source, this makes
the 
useful idiom of separating construction from use impractical. (For example, 
building a pipeline object, then sending it to another thread for execution.)

All this saves for the user is one call to empty for the entire algorithm, at a 
cost incurred with every iteration. I.e. it selects O(n) to save O(1).

B) I want to be able to call front multiple times in a row, and expect to get 
the same value.

This can force the range to incorporate a one element buffer and a flag 
indicating the status of that buffer. The range instance gets bigger and more 
expensive to copy, and the cost of manipulating the flag and the buffer is
added 
to every loop iteration. Note that the user of a range can trivially use:
      auto e = r.front;
      ... using e multiple times ...
instead.

The big problem with (A) and (B) is these costs become present in every range, 
despite being rarely needed and having simple alternatives. This is the wrong 
place to put cost and complexity. The user should be making these decisions 
based on his needs.

Hence, I propose that the protocol for using input ranges be defined as:

     while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

This makes it possible to build pipelines that are firehoses with no kinks or 
constrictions in them. It optimizes for the inner loop case, not boundary cases.

Mar 25 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 25 March 2014 at 20:15:32 UTC, Walter Bright wrote:
 It's pretty clear that:

 1. the protocol is COMPLETELY undocumented and undefined.

http://dlang.org/phobos/std_range.html#isInputRange

The semantics of an input range (not checkable during 
compilation) are assumed to be the following (r is an object of 
type R):
r.empty returns false iff there is more data available in the 
range.
r.front returns the current element in the range. It may return 
by value or by reference. Calling r.front is allowed only if 
calling r.empty has, or would have, returned false.
r.popFront advances to the next element in the range. Calling 
r.popFront is allowed only if calling r.empty has, or would have, 
returned false.

 We want to appeal to the high performance coders. To maximize 
 performance, ranges should be optimized to the inner loop case, 
 which is:

     while (!r.empty) { auto e = r.front; ... e ...; 
 r.popFront(); }

This makes the assumption that r.front is copy constructible at 
all. It also makes the assumption that you want to operate on a 
copy, rather than the actual element in the range.

Finally, it means having to declare a local object: It merely 
means shifting the burden of caching from one context to another. 
If the object is large, chances are you are better off just 
calling front instead of making a copy. Especially if the loop is 
trivial.

If you want high performance, then arguably, just provide a O(1) 
front, and a O(1) empty.

Also, certain ranges, such as "filter" *must* access the front of 
the previous range more than once. Unless you are suggesting we 
add a field for it to cache the result of the previous range?

 With the additional proviso that ranges are passed to 
 algorithms by value, so they should be cheap to copy. Cheap to 
 copy usually means them being small.

Unfortunately yes. That said, any range that does anything will 
have at least two fields, one of which is a slice, or comparable 
to in terms of size, so it's going to be big anyways. So you 
*are* better off passing by ref if you can regardless, unless 
your range is *really* trivial.

I agree that range sizes can be a problem.

 A) I know that my range is not empty, so I can skip calling 
 empty.

 Since front is guaranteed to succeed if !empty, this puts a 
 requirement on many ranges that they have a non-trivial 
 constructor that 'primes the pump'. Of course, priming may 
 fail, and so construction may throw, which is not good design.

If the prime fails, then the range can simply be marked as empty. 
Then if you decide to skip calling empty anyways, it's your own 
fault.

 And lastly, it means that lazy ranges will be required to read 
 the first element, even if the range isn't then subsequently 
 used, which defeats what one would expect from a lazy range.

I'm not yet convinced of adding special code for ranges that 
aren't used. I've heard of these kinds of ranges, but I've 
observed that when you declare one, it almost always ends up 
being used. I don't think we should waste efforts on this rare 
usecase.

As for "Lazy", in range terms, it mostly only means you calculate 
things element at once, as you go up the range chain. As opposed 
to processing the entire input data, one transformation at a time.

Evaluating an element "on use" as opposed to "1 instruction 
before use" doesn't make much of a change in this context.

 All this saves for the user is one call to empty for the entire 
 algorithm, at a cost incurred with every iteration. I.e. it 
 selects O(n) to save O(1).

If code that was actually meant to *do* something was in 
popFront() to begin with, then there'd be no extra overhead.

I've found that if you are creative enough, you can usually 
design the range in such a way that it works efficiently, 
lazilly, and without flags.

 Hence, I propose that the protocol for using input ranges be 
 defined as:

     while (!r.empty) { auto e = r.front; ... e ...; 
 r.popFront(); }

 This makes it possible to build pipelines that are firehoses 
 with no kinks or constrictions in them. It optimizes for the 
 inner loop case, not boundary cases.

I get where you are coming from, but it's simply not manageable 
in a generic fashion, if you want to be able to preserve all the 
power and the diversity of the ranges we have.

The protocol you are suggesting would prevent us from doing a lot 
of the lovely things that ranges empowers us with.

Mar 25 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/25/2014 1:56 PM, monarch_dodra wrote:
 http://dlang.org/phobos/std_range.html#isInputRange

 The semantics of an input range (not checkable during compilation) are assumed
 to be the following (r is an object of type R):
 r.empty returns false iff there is more data available in the range.
 r.front returns the current element in the range. It may return by value or by
 reference. Calling r.front is allowed only if calling r.empty has, or would
 have, returned false.
 r.popFront advances to the next element in the range. Calling r.popFront is
 allowed only if calling r.empty has, or would have, returned false.

I overlooked that. Thanks.

 We want to appeal to the high performance coders. To maximize performance,
 ranges should be optimized to the inner loop case, which is:

     while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

 This makes the assumption that r.front is copy constructible at all.
 It also
 makes the assumption that you want to operate on a copy, rather than the actual
 element in the range.

It's a reasonable requirement. If your range has an issue with this, it can 
return a pointer to the element, and the element can be a struct with access 
functions. Then, the pointer will work as well as a copy.


 Finally, it means having to declare a local object: It merely means shifting
the
 burden of caching from one context to another. If the object is large, chances
 are you are better off just calling front instead of making a copy. Especially
 if the loop is trivial.

This does come up as an issue, and is solvable by returning a pointer as I 
described. It's up to the designer of the range.


 If you want high performance, then arguably, just provide a O(1) front, and a
 O(1) empty.

I don't think the issues can be waved away so easily, or I wouldn't have
brought 
this up.


 Also, certain ranges, such as "filter" *must* access the front of the previous
 range more than once.
 Unless you are suggesting we add a field for it to cache
 the result of the previous range?

This is putting the cost where it belongs - when needed on the user of a range, 
rather than on all ranges. It's the "pay only for what you need" idea rather 
than "pay regardless".


 With the additional proviso that ranges are passed to algorithms by value, so
 they should be cheap to copy. Cheap to copy usually means them being small.

 Unfortunately yes. That said, any range that does anything will have at least
 two fields, one of which is a slice, or comparable to in terms of size, so it's
 going to be big anyways. So you *are* better off passing by ref if you can
 regardless, unless your range is *really* trivial.

Many very useful ranges are trivial. Or at least they should be. An array, for 
example, is a trivial range.


 A) I know that my range is not empty, so I can skip calling empty.

 Since front is guaranteed to succeed if !empty, this puts a requirement on
 many ranges that they have a non-trivial constructor that 'primes the pump'.
 Of course, priming may fail, and so construction may throw, which is not good
 design.

 If the prime fails, then the range can simply be marked as empty. Then if you
 decide to skip calling empty anyways, it's your own fault.

Yes, one can add state flags to indicate failed construction, which I'll argue 
is an ugly design. After all, construction is supposed to construct an object
or 
fail, not leave the 'constructed' object in a zombie state.


 And lastly, it means that lazy ranges will be required to read the first
 element, even if the range isn't then subsequently used, which defeats what
 one would expect from a lazy range.

 I'm not yet convinced of adding special code for ranges that aren't used. I've
 heard of these kinds of ranges, but I've observed that when you declare one, it
 almost always ends up being used. I don't think we should waste efforts on this
 rare usecase.


throw 
this entire category of use cases under the bus just to handle a convenience of 
not needing to call empty in some cases?


 Evaluating an element "on use" as opposed to "1 instruction before use" doesn't
 make much of a change in this context.

Except that it requires the use to start upon construction, which defeats any 
hope of separating construction of a pipeline from using it.


 I've found that if you are creative enough, you can usually design the range in
 such a way that it works efficiently, lazilly, and without flags.

That's not been my experience.


 I get where you are coming from, but it's simply not manageable in a generic
 fashion, if you want to be able to preserve all the power and the diversity of
 the ranges we have.

 The protocol you are suggesting would prevent us from doing a lot of the lovely
 things that ranges empowers us with.

Please show me such a case. Note that I've shown above that this power and 
diversity throws an entire use case category under the bus. Secondly, in my 
experiments with ranges, the power and diversity results in slower pipelines.

Mar 25 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/25/14, 1:15 PM, Walter Bright wrote:
 Of course, priming may fail, and so construction may throw, which is not
 good design.

This part I disagree with. -- Andrei

Mar 25 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/25/14, 1:15 PM, Walter Bright wrote:
 Next, priming requires a buffer in the range.

No, front requires a buffer in the range. -- Andrei

Mar 25 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I thought I had only 1-2 comments, but I have a few more.

On 3/25/14, 1:15 PM, Walter Bright wrote:
 1. the protocol is COMPLETELY undocumented and undefined.

s/COMPLETELY/loosely/

 Since ranges+algorithms are central to D, this is a very serious
 problem.

Agreed.

 We want to appeal to the high performance coders. To maximize
 performance, ranges should be optimized to the inner loop case, which is:

      while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

Actually generic code often prefers:

while (!r.empty) { ... r.front ...; r.popFront(); }

That saves copying r.front if it returns by ref. A bunch of algos in std 
do that.

 Since front is guaranteed to succeed if !empty, this puts a requirement
 on many ranges that they have a non-trivial constructor that 'primes the
 pump'. Of course, priming may fail, and so construction may throw, which
 is not good design.

Again, I disagree with this assertion.

 Next, priming requires a buffer in the range.

Priming has nothing do to with the range being buffered. The entire 
design of ranges is a one-element buffer.

 Buffers add size, making them slower to copy, and will often require
 another field saying if the buffer has data in it or not, further
 bumping the size.

That's an argument for adding an unbuffered range abstraction.

 All this saves for the user is one call to empty for the entire
 algorithm, at a cost incurred with every iteration. I.e. it selects O(n)
 to save O(1).

I don't understand how that O() reasoning works.

 B) I want to be able to call front multiple times in a row, and expect
 to get the same value.

Correct.

 This can force the range to incorporate a one element buffer and a flag
 indicating the status of that buffer.

It may, but many ranges naturally work that way (e.g. arrays).

 The range instance gets bigger and
 more expensive to copy, and the cost of manipulating the flag and the
 buffer is added to every loop iteration. Note that the user of a range
 can trivially use:
       auto e = r.front;
       ... using e multiple times ...
 instead.

That would pessimize code using arrays of large structs.

 The big problem with (A) and (B) is these costs become present in every
 range, despite being rarely needed and having simple alternatives. This
 is the wrong place to put cost and complexity. The user should be making
 these decisions based on his needs.

 Hence, I propose that the protocol for using input ranges be defined as:

      while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

 This makes it possible to build pipelines that are firehoses with no
 kinks or constrictions in them. It optimizes for the inner loop case,
 not boundary cases.

The proposed protocol pessimizes arrays of large structs and will trip 
the unwary if calling r.front again returns something else. I don't 
think the proposal is good.


Andrei

Mar 25 2014

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/25/2014 10:29 PM, Andrei Alexandrescu wrote:
 B) I want to be able to call front multiple times in a row, and expect
 to get the same value.

 Correct.

'map' may fail this criterion. In this case, is the blame put on the user?

Mar 25 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 25 March 2014 at 22:54:14 UTC, Timon Gehr wrote:
 On 03/25/2014 10:29 PM, Andrei Alexandrescu wrote:
 B) I want to be able to call front multiple times in a row, 
 and expect
 to get the same value.

 Correct.

 'map' may fail this criterion. In this case, is the blame put 
 on the user?

Depends how you define "same" though :/ But yeah.

Also, shameless plug about my "cache" range adaptor:
https://github.com/D-Programming-Language/phobos/pull/1364

Mar 25 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:
 We want to appeal to the high performance coders. To maximize
 performance, ranges should be optimized to the inner loop case, which is:

      while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

 Actually generic code often prefers:

 while (!r.empty) { ... r.front ...; r.popFront(); }

 That saves copying r.front if it returns by ref.

Yes, I agree. The idiom foreach (ref e; r) { ... } must also work.


 Since front is guaranteed to succeed if !empty, this puts a requirement
 on many ranges that they have a non-trivial constructor that 'primes the
 pump'. Of course, priming may fail, and so construction may throw, which
 is not good design.

 Again, I disagree with this assertion.

The throwing part, or the not good design part?


 Next, priming requires a buffer in the range.

 Priming has nothing do to with the range being buffered. The entire design of
 ranges is a one-element buffer.
 Buffers add size, making them slower to copy, and will often require
 another field saying if the buffer has data in it or not, further
 bumping the size.

 That's an argument for adding an unbuffered range abstraction.

I don't know how that would fit into the ecosystem.

In any case, with the protocol I suggest, the designer of the range is free to 
distribute the functionality into the three functions, however it makes best 
sense. With arbitrarily calling any in any order, then all three need to have 
some sort of signalling mechanism between them.


 All this saves for the user is one call to empty for the entire
 algorithm, at a cost incurred with every iteration. I.e. it selects O(n)
 to save O(1).

 I don't understand how that O() reasoning works.

I meant adding extra signalling with every iteration of the loop, rather than 
requiring the user to call empty before front.


 B) I want to be able to call front multiple times in a row, and expect
 to get the same value.

 Correct.

 This can force the range to incorporate a one element buffer and a flag
 indicating the status of that buffer.

 It may, but many ranges naturally work that way (e.g. arrays).

Sure, but there are far more range types than arrays.


 The range instance gets bigger and
 more expensive to copy, and the cost of manipulating the flag and the
 buffer is added to every loop iteration. Note that the user of a range
 can trivially use:
       auto e = r.front;
       ... using e multiple times ...
 instead.

 That would pessimize code using arrays of large structs.

You're already requiring copying with the buffering requirement. And besides,
if 
I was creating a range of expensive-to-copy objects, I would add a layer of 
indirection to cheapen it.


 The proposed protocol pessimizes arrays of large structs

Not really more than the existing protocol does (i.e. required buffering).

 and will trip the unwary if calling r.front again returns something else.

I'm not proposing that calling them wrongly would make things unsafe. But I 
don't think it's unreasonable to expect the protocol to be followed, just like 
file open/read/close.


 I don't think the proposal is good.

I don't like being in the position of when I need high performance code, I have 
to implement my own ranges & algorithms, or telling customers they need to do
so.

Mar 25 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Tue, 25 Mar 2014 23:22:18 -0000, Walter Bright  
<newshound2 digitalmars.com> wrote:
 On 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:
 The range instance gets bigger and
 more expensive to copy, and the cost of manipulating the flag and the
 buffer is added to every loop iteration. Note that the user of a range
 can trivially use:
       auto e = r.front;
       ... using e multiple times ...
 instead.

 That would pessimize code using arrays of large structs.

 You're already requiring copying with the buffering requirement. And  
 besides, if I was creating a range of expensive-to-copy objects, I would  
 add a layer of indirection to cheapen it.

Surely you'd simply start with a range of pointers to expensive-to-copy  
objects?  Or, return them by reference from the underlying  
range/array/source.  You want to avoid *ever* copying them except  
explicitly where required.

 The proposed protocol pessimizes arrays of large structs

 Not really more than the existing protocol does (i.e. required  
 buffering).

 and will trip the unwary if calling r.front again returns something  
 else.

 I'm not proposing that calling them wrongly would make things unsafe.  
 But I don't think it's unreasonable to expect the protocol to be  
 followed, just like file open/read/close.

My immediate expectation upon encountering ranges was that r.front would  
return the same item repeatedly until r.popFront was called.  Breaking  
that guarantee will trip a *lot* of people up.

IMO the rules should be something like:
  - r.empty WILL return false if there is more data available in the range.

  - r.empty MUST be called before r.front, r.front WILL succeed if r.empty  
returned false.
  - r.front WILL repeatably return the current element in the range. It MAY  
return by value or by reference.

  - r.empty SHOULD be called before r.popFront, otherwise r.popFront MAY do  
nothing or throw
    (could also make this one a 'MUST')
  - r.popFront WILL advance to the next element in the range.

  - "lazy" ranges SHOULD delay initialisation until r.empty is called

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 26 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 26 Mar 2014 06:46:26 -0400, Regan Heath <regan netmail.co.nz>  
wrote:

 On Tue, 25 Mar 2014 23:22:18 -0000, Walter Bright  
 <newshound2 digitalmars.com> wrote:
 On 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:
 The range instance gets bigger and
 more expensive to copy, and the cost of manipulating the flag and the
 buffer is added to every loop iteration. Note that the user of a range
 can trivially use:
       auto e = r.front;
       ... using e multiple times ...
 instead.

 That would pessimize code using arrays of large structs.

 You're already requiring copying with the buffering requirement. And  
 besides, if I was creating a range of expensive-to-copy objects, I  
 would add a layer of indirection to cheapen it.

 Surely you'd simply start with a range of pointers to expensive-to-copy  
 objects?  Or, return them by reference from the underlying  
 range/array/source.  You want to avoid *ever* copying them except  
 explicitly where required.

 The proposed protocol pessimizes arrays of large structs

 Not really more than the existing protocol does (i.e. required  
 buffering).

 and will trip the unwary if calling r.front again returns something  
 else.

 I'm not proposing that calling them wrongly would make things unsafe.  
 But I don't think it's unreasonable to expect the protocol to be  
 followed, just like file open/read/close.

 My immediate expectation upon encountering ranges was that r.front would  
 return the same item repeatedly until r.popFront was called.  Breaking  
 that guarantee will trip a *lot* of people up.

 IMO the rules should be something like:
   - r.empty WILL return false if there is more data available in the  
 range.

   - r.empty MUST be called before r.front, r.front WILL succeed if  
 r.empty returned false.
   - r.front WILL repeatably return the current element in the range. It  
 MAY return by value or by reference.

   - r.empty SHOULD be called before r.popFront, otherwise r.popFront MAY  
 do nothing or throw
     (could also make this one a 'MUST')
   - r.popFront WILL advance to the next element in the range.

These two rules are not necessary if you know the range is not empty. See  
the conversation inside this pull:  
https://github.com/D-Programming-Language/phobos/pull/1987

-Steve

Mar 26 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 26 Mar 2014 08:29:15 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 26 Mar 2014 06:46:26 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:
 IMO the rules should be something like:
   - r.empty WILL return false if there is more data available in the  
 range.

   - r.empty MUST be called before r.front, r.front WILL succeed if  
 r.empty returned false.
   - r.front WILL repeatably return the current element in the range. It  
 MAY return by value or by reference.

   - r.empty SHOULD be called before r.popFront, otherwise r.popFront  
 MAY do nothing or throw
     (could also make this one a 'MUST')
   - r.popFront WILL advance to the next element in the range.

 These two rules are not necessary if you know the range is not empty.  
 See the conversation inside this pull:  
 https://github.com/D-Programming-Language/phobos/pull/1987

Gah, I didn't cut out the right rules. I meant the two rules that empty  
must be called before others. Those are not necessary.

-Steve

Mar 26 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 26 Mar 2014 08:29:15 -0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Wed, 26 Mar 2014 06:46:26 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:
 IMO the rules should be something like:
   - r.empty WILL return false if there is more data available in the  
 range.

   - r.empty MUST be called before r.front, r.front WILL succeed if  
 r.empty returned false.
   - r.front WILL repeatably return the current element in the range.  
 It MAY return by value or by reference.

   - r.empty SHOULD be called before r.popFront, otherwise r.popFront  
 MAY do nothing or throw
     (could also make this one a 'MUST')
   - r.popFront WILL advance to the next element in the range.

 These two rules are not necessary if you know the range is not empty.  
 See the conversation inside this pull:  
 https://github.com/D-Programming-Language/phobos/pull/1987

 Gah, I didn't cut out the right rules. I meant the two rules that empty  
 must be called before others. Those are not necessary.

I see.  I was thinking we ought to make empty mandatory to give more  
guaranteed structure for range implementors, so lazy initialisation can be  
done in one place only, etc etc.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 26 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>  
wrote:

 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 Gah, I didn't cut out the right rules. I meant the two rules that empty  
 must be called before others. Those are not necessary.

 I see.  I was thinking we ought to make empty mandatory to give more  
 guaranteed structure for range implementors, so lazy initialisation can  
 be done in one place only, etc etc.

Yes, but when you know that empty is going to return false, there isn't  
any logical reason to call it. It is an awkward requirement.

I had the same thinking as you, why pay for an extra check for all 3  
calls? But there was already evidence that people were avoiding empty.

-Steve

Mar 26 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Wednesday, 26 March 2014 at 15:37:38 UTC, Steven Schveighoffer 
wrote:
 Yes, but when you know that empty is going to return false, 
 there isn't any logical reason to call it. It is an awkward 
 requirement.

 -Steve

Not only that, but it's also a performance criteria: If you are 
iterating on two ranges at once (think "copy"), then you *know* 
"range2" is longer than "range1", even if you don't know its 
length.

Why pay for "range2.empty", when you know it'll always be false? 
There is a noticeable performance difference if you *don't* check.

Mar 26 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Wed, 26 Mar 2014 16:38:57 -0000, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Wednesday, 26 March 2014 at 15:37:38 UTC, Steven Schveighoffer wrote:
 Yes, but when you know that empty is going to return false, there isn't  
 any logical reason to call it. It is an awkward requirement.

 -Steve

 Not only that, but it's also a performance criteria: If you are  
 iterating on two ranges at once (think "copy"), then you *know* "range2"  
 is longer than "range1", even if you don't know its length.

What guarantees range2 is longer than range1?  The isArray case checks  
explicitly, but the generic one doesn't.  Is it a property of being an  
output range that it will expand as required, or..

 Why pay for "range2.empty", when you know it'll always be false? There  
 is a noticeable performance difference if you *don't* check.

But aren't you instead paying for 2 checks in front and 2 in popFront, so  
4 checks vs 1?  Or is the argument that these 4 checks cannot be removed  
even if we mandate r.empty is called before r.front/popFront.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 26 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Wednesday, 26 March 2014 at 16:55:48 UTC, Regan Heath wrote:
 On Wed, 26 Mar 2014 16:38:57 -0000, monarch_dodra
 Not only that, but it's also a performance criteria: If you 
 are iterating on two ranges at once (think "copy"), then you 
 *know* "range2" is longer than "range1", even if you don't 
 know its length.

 What guarantees range2 is longer than range1?  The isArray case 
 checks explicitly, but the generic one doesn't.  Is it a 
 property of being an output range that it will expand as 
 required, or..

The interface: The target *shall* have enough room to accommodate 
origin. Failure to meet that criteria is an Error.

Output ranges may or may not auto expand as required. Arguably, 
it's a design flaw I don't want to get started on.

 Why pay for "range2.empty", when you know it'll always be 
 false? There is a noticeable performance difference if you 
 *don't* check.

 But aren't you instead paying for 2 checks in front and 2 in 
 popFront, so 4 checks vs 1?  Or is the argument that these 4 
 checks cannot be removed even if we mandate r.empty is called 
 before r.front/popFront.

I don't know what checks you are talking about. Most ranges don't 
check anything on front or popFront. They merely assume they are 
in a state that where they can do their job. Failure to meet this 
condition prior to the call (note I didn't say "failure to check 
for"), means Error.

It's really not much different from when doing an strcpy. "++p" 
and "*p" don't check anything. The fact that ranges *can* check 
doesn't always mean they should.

Mar 26 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Wed, 26 Mar 2014 17:32:30 -0000, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Wednesday, 26 March 2014 at 16:55:48 UTC, Regan Heath wrote:
 On Wed, 26 Mar 2014 16:38:57 -0000, monarch_dodra
 Not only that, but it's also a performance criteria: If you are  
 iterating on two ranges at once (think "copy"), then you *know*  
 "range2" is longer than "range1", even if you don't know its length.

 What guarantees range2 is longer than range1?  The isArray case checks  
 explicitly, but the generic one doesn't.  Is it a property of being an  
 output range that it will expand as required, or..

 The interface: The target *shall* have enough room to accommodate  
 origin. Failure to meet that criteria is an Error.

Ok.  So long as *something* is throwing that Error I am down with this.

 Output ranges may or may not auto expand as required. Arguably, it's a  
 design flaw I don't want to get started on.

:)

 Why pay for "range2.empty", when you know it'll always be false? There  
 is a noticeable performance difference if you *don't* check.

 But aren't you instead paying for 2 checks in front and 2 in popFront,  
 so 4 checks vs 1?  Or is the argument that these 4 checks cannot be  
 removed even if we mandate r.empty is called before r.front/popFront.

 I don't know what checks you are talking about. Most ranges don't check  
 anything on front or popFront. They merely assume they are in a state  
 that where they can do their job. Failure to meet this condition prior  
 to the call (note I didn't say "failure to check for"), means Error.

Ok.. but lets take a naive range of say int with a 1 element cache in the  
member variable "int cache;".  The simplest possible front would just be  
"return cache;".  But, if cache hasn't been populated yet it's not going  
to throw an Error, it's just going to be wrong.

So, presumably front has to check another boolean to verify it's been  
populated and throw an Error if not.  That's one of the checks I meant.  A  
typical loop over a range will call front one or more times, so you pay  
for that check 1 or more times per loop.

popFront in this example doesn't need to check anything, it just populates  
cache regardless, as does empty.

But, I imagine there are ranges which need some initial setup, and they  
have to do it somewhere, and they need to check they have done it in  
empty, front and popFront for every call.  It's those checks we'd like to  
avoid if we can.

So.. if we mandate that empty MUST be called, then they could just be done  
in one place, empty.

However, in this situation nothing would be enforcing that requirement (in  
release anyway) and things could just go wrong.  So, perhaps the checks  
always need to be there and we gain nothing by mandating empty is called  
first.

idunno.

 It's really not much different from when doing an strcpy. "++p" and "*p"  
 don't check anything. The fact that ranges *can* check doesn't always  
 mean they should.

Sure.  For performance reasons they might not, but isn't this just a tiny  
bit safer if we mandate empty must be called and do one check there..

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 26 2014

"Daniel Murphy" <yebbliesnospam gmail.com> writes:

"Regan Heath"  wrote in message 
news:op.xdb9a9v354xghj puck.auriga.bhead.co.uk...

 What guarantees range2 is longer than range1?  The isArray case checks 
 explicitly, but the generic one doesn't.  Is it a property of being an 
 output range that it will expand as required, or..

Some ranges will give you their length...

Mar 26 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Thu, 27 Mar 2014 02:44:13 -0000, Daniel Murphy  
<yebbliesnospam gmail.com> wrote:

 "Regan Heath"  wrote in message  
 news:op.xdb9a9v354xghj puck.auriga.bhead.co.uk...

 What guarantees range2 is longer than range1?  The isArray case checks  
 explicitly, but the generic one doesn't.  Is it a property of being an  
 output range that it will expand as required, or..

 Some ranges will give you their length...

Sure.  And generally you could use it.  copy() doesn't, and I was talking  
specifically about that example.  :)

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 27 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Wed, 26 Mar 2014 15:37:38 -0000, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:

 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 Gah, I didn't cut out the right rules. I meant the two rules that  
 empty must be called before others. Those are not necessary.

 I see.  I was thinking we ought to make empty mandatory to give more  
 guaranteed structure for range implementors, so lazy initialisation can  
 be done in one place only, etc etc.

 Yes, but when you know that empty is going to return false, there isn't  
 any logical reason to call it. It is an awkward requirement.

Sure, it's not required for some algorithms in some situations.

 I had the same thinking as you, why pay for an extra check for all 3  
 calls? But there was already evidence that people were avoiding empty.

Sure, as above, makes perfect sense.

It seemed from this thread that there was some confusion about how ranges  
should be written and used, and I thought it might help if the  
requirements were more fixed, that's all.

If r.empty was mandatory then every range implementer would have a place  
to lazily initialise, r.front would be simpler, r.popFront too.  Basically  
it would lower the bar for "good" range implementations.

We might just need better documentation tho.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 26 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:
 On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>
 wrote:

 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Gah, I didn't cut out the right rules. I meant the two rules that
 empty must be called before others. Those are not necessary.

 I see.  I was thinking we ought to make empty mandatory to give more
 guaranteed structure for range implementors, so lazy initialisation
 can be done in one place only, etc etc.

 Yes, but when you know that empty is going to return false, there isn't
 any logical reason to call it. It is an awkward requirement.

 I had the same thinking as you, why pay for an extra check for all 3
 calls? But there was already evidence that people were avoiding empty.

I think requiring users to call empty before front on input ranges is a 
concession we should make.

Andrei

Mar 26 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Wednesday, 26 March 2014 at 17:36:08 UTC, Andrei Alexandrescu 
wrote:
 I think requiring users to call empty before front on input 
 ranges is a concession we should make.

Then the name should change to "ready". It makes sense to require 
the user to check that the range is "ready", but not to check 
that it is "not empty". This will also make more sense for async 
implementations that will block if "not ready".

IMO the whole interface needs rethinking if you want to 
gracefully support async data streams where you need to 
distinguish between: "ready" vs "empty", "front" vs 
"firstavailable". Both quick-sort, merge-sort, filter and map 
work well with async data streams.

Mar 26 2014

"Paulo Pinto" <pjmlp progtools.org> writes:

On Wednesday, 26 March 2014 at 18:04:44 UTC, Ola Fosheim Grøstad
wrote:
 On Wednesday, 26 March 2014 at 17:36:08 UTC, Andrei 
 Alexandrescu wrote:
 I think requiring users to call empty before front on input 
 ranges is a concession we should make.

 Then the name should change to "ready". It makes sense to 
 require the user to check that the range is "ready", but not to 
 check that it is "not empty". This will also make more sense 
 for async implementations that will block if "not ready".

 IMO the whole interface needs rethinking if you want to 
 gracefully support async data streams where you need to 
 distinguish between: "ready" vs "empty", "front" vs 
 "firstavailable". Both quick-sort, merge-sort, filter and map 
 work well with async data streams.

Why not?

It is how iterators work in most OO languages.

--
Paulo

Mar 27 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Thursday, 27 March 2014 at 08:13:28 UTC, Paulo Pinto wrote:
 Why not?

 It is how iterators work in most OO languages.

Why not what? A query for "empty()" should not have any side 
effects. In what library does that happen? Tests should in 
general not have side effects unless the name suggest so.

Anyway, I wish D was more clear about use scenarios. If D is 
going to be efficient in the server space then it should also 
make low latency programming easier, meaning supporting async 
collections out-of-the-box.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 26 Mar 2014 13:36:08 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:
 On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>
 wrote:

 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Gah, I didn't cut out the right rules. I meant the two rules that
 empty must be called before others. Those are not necessary.

 I see.  I was thinking we ought to make empty mandatory to give more
 guaranteed structure for range implementors, so lazy initialisation
 can be done in one place only, etc etc.

 Yes, but when you know that empty is going to return false, there isn't
 any logical reason to call it. It is an awkward requirement.

 I had the same thinking as you, why pay for an extra check for all 3
 calls? But there was already evidence that people were avoiding empty.

 I think requiring users to call empty before front on input ranges is a  
 concession we should make.

if(!r.empty)
{
    auto r2 = map!(x => x * 2)(r);
    do
    {
       auto x = r2.front;
       ...
    } while(!r2.empty);
}

Should we be required to re-verify that r2 is not empty before using it?  
It clearly is not, and would be an artificial requirement (one that map  
likely would not enforce!).

This sounds so much like a convention that simply won't be followed, and  
the result will be perfectly valid code.

-Steve

Mar 26 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
 if(!r.empty)
 {
     auto r2 = map!(x => x * 2)(r);
     do
     {
        auto x = r2.front;
        ...
     } while(!r2.empty);
 }

 Should we be required to re-verify that r2 is not empty before using it? It
 clearly is not, and would be an artificial requirement (one that map likely
 would not enforce!).

 This sounds so much like a convention that simply won't be followed, and the
 result will be perfectly valid code.

The idea is that there are three functions: empty, front, and popFront. The 
functionality of the range will be distributed between these 3 functions. 
Different things being ranged over will naturally split their operations into 
the 3 functions in different ways.

If the 3 functions can be called in any order, then extra logic has to be added 
to them to signal to each other. This slows things down.

To write generic code calling ranges, it's necessary to stop assuming what is 
happening under the hood of empty, front, and popFront, and stick to the
protocol.

Mar 26 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 26 Mar 2014 22:48:16 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
 if(!r.empty)
 {
     auto r2 = map!(x => x * 2)(r);
     do
     {
        auto x = r2.front;
        ...
     } while(!r2.empty);
 }

 Should we be required to re-verify that r2 is not empty before using  
 it? It
 clearly is not, and would be an artificial requirement (one that map  
 likely
 would not enforce!).

 This sounds so much like a convention that simply won't be followed,  
 and the
 result will be perfectly valid code.

 The idea is that there are three functions: empty, front, and popFront.  
 The functionality of the range will be distributed between these 3  
 functions. Different things being ranged over will naturally split their  
 operations into the 3 functions in different ways.

 If the 3 functions can be called in any order, then extra logic has to  
 be added to them to signal to each other. This slows things down.

 To write generic code calling ranges, it's necessary to stop assuming  
 what is happening under the hood of empty, front, and popFront, and  
 stick to the protocol.

OK, but it's logical to assume you *can* avoid a call to empty if you know  
what's going on under the hood, no? Then at that point, you have lost the  
requirement -- people will avoid calling empty because they can get away  
with it, and then altering the under-the-hood requirements cause code  
breakage later.

Case in point, the pull request I referenced, the author originally tried  
to just use empty to lazily initialize filter, but it failed due to  
existing code in phobos that did not call empty on filtered data before  
processing. He had to instrument all 3 calls.

-Steve

Mar 26 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
 OK, but it's logical to assume you *can* avoid a call to empty if you know
 what's going on under the hood, no? Then at that point, you have lost the
 requirement -- people will avoid calling empty because they can get away with
 it, and then altering the under-the-hood requirements cause code breakage
later.

 Case in point, the pull request I referenced, the author originally tried to
 just use empty to lazily initialize filter, but it failed due to existing code
 in phobos that did not call empty on filtered data before processing. He had to
 instrument all 3 calls.

As with *any* API, if you look under the hood and make assumptions about the 
behavior based on a particular implementation, assumptions that are not part of 
the API, the risk of breakage inevitably follows.

If you've identified Phobos code that uses ranges but does not follow the 
protocol, the Phobos code is broken - please file a bugzilla issue on it.

Mar 26 2014

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:

On Thursday, 27 March 2014 at 04:17:16 UTC, Walter Bright wrote:
 On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
 OK, but it's logical to assume you *can* avoid a call to empty 
 if you know
 what's going on under the hood, no? Then at that point, you 
 have lost the
 requirement -- people will avoid calling empty because they 
 can get away with
 it, and then altering the under-the-hood requirements cause 
 code breakage later.

 Case in point, the pull request I referenced, the author 
 originally tried to
 just use empty to lazily initialize filter, but it failed due 
 to existing code
 in phobos that did not call empty on filtered data before 
 processing. He had to
 instrument all 3 calls.

 As with *any* API, if you look under the hood and make 
 assumptions about the behavior based on a particular 
 implementation, assumptions that are not part of the API, the 
 risk of breakage inevitably follows.

 If you've identified Phobos code that uses ranges but does not 
 follow the protocol, the Phobos code is broken - please file a 
 bugzilla issue on it.

I was originally going to do that, but then I took a closer look 
at the documentation, which says ([1] in the documentation of 
`isInputRange()`):

"Calling r.front is allowed only if calling r.empty has, or would 
have, returned false."

(And the same for `popFront()`.)

That is, the documentation more or less explicitly states that 
you don't actually need to call `empty` if you know it returned 
`true`.

[1] http://dlang.org/phobos/std_range.html

Mar 27 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Thu, 27 Mar 2014 10:49:42 -0000, Marc Sch=FCtz <schuetzm gmx.net> wro=
te:

 On Thursday, 27 March 2014 at 04:17:16 UTC, Walter Bright wrote:
 On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
 OK, but it's logical to assume you *can* avoid a call to empty if yo=



u  =

 know
 what's going on under the hood, no? Then at that point, you have los=



t  =

 the
 requirement -- people will avoid calling empty because they can get =



 =

 away with
 it, and then altering the under-the-hood requirements cause code  =



 breakage later.

 Case in point, the pull request I referenced, the author originally =



 =

 tried to
 just use empty to lazily initialize filter, but it failed due to  =



 existing code
 in phobos that did not call empty on filtered data before processing=



.  =

 He had to
 instrument all 3 calls.

 As with *any* API, if you look under the hood and make assumptions  =


 about the behavior based on a particular implementation, assumptions =


 =

 that are not part of the API, the risk of breakage inevitably follows=


.
 If you've identified Phobos code that uses ranges but does not follow=


  =

 the protocol, the Phobos code is broken - please file a bugzilla issu=


e  =

 on it.

 I was originally going to do that, but then I took a closer look at th=

e  =

 documentation, which says ([1] in the documentation of `isInputRange()=

`):
 "Calling r.front is allowed only if calling r.empty has, or would have=

,  =

 returned false."

 (And the same for `popFront()`.)

 That is, the documentation more or less explicitly states that you don=

't  =

 actually need to call `empty` if you know it returned `true`.

 [1] http://dlang.org/phobos/std_range.html

That's because up until now we've made no attempt to set this in stone, =
 =

and as such many interpretations have surfaced.  This documentation woul=
d,  =

of course, change to match the final decision made.

R

-- =

Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 27 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/14, 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at the
 documentation, which says ([1] in the documentation of `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would have,
 returned false."

Probably we need to amend that. For efficient ranges, front() and 
popFront() should only be guaranteed to work if either empty() or 
length() were evaluated first, and they returned false or nonzero, 
respectively.

Andrei

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 11:19:15 -0400, Andrei Alexandrescu  =

<SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 3:49 AM, "Marc Sch=C3=BCtz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at t=


he
 documentation, which says ([1] in the documentation of  =


 `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would hav=


e,
 returned false."

 Probably we need to amend that. For efficient ranges, front() and  =

 popFront() should only be guaranteed to work if either empty() or  =

 length() were evaluated first, and they returned false or nonzero,  =

 respectively.

This is a misleading statement. An array is probably the most efficient =
of  =

ranges, which does not require this.

We should be more specific here. For certain types of *nonempty* ranges,=
  =

front is only guaranteed to work if empty/length was called before front=
  =

is called. I would not necessarily lump popFront into that requirement  =

automatically, popFront may be able to cope with not having cached the  =

first element without losing efficiency.

Questions to answer:

1. What types of ranges does this rule apply to?
2. What is the cost of not requiring this in terms of efficiency and eas=
e  =

of implementation.
3. Is there a better mechanism to use than misusing empty? should we  =

introduce another property/call?

At the moment, the only candidates I can think of are lazy caching range=
s.  =

How many of those exist?

-Steve

Mar 27 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:
 At the moment, the only candidates I can think of are lazy caching
 ranges. How many of those exist?

filter comes to mind. -- Andrei

Mar 27 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/14, 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer 
 look at the
 documentation, which says ([1] in the documentation of 
 `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or 
 would have,
 returned false."

 Probably we need to amend that. For efficient ranges, front() 
 and popFront() should only be guaranteed to work if either 
 empty() or length() were evaluated first, and they returned 
 false or nonzero, respectively.

I just want to point out that I think allowing empty to have an 
*observable* side effect will have cataclysmic repercussion in 
validating a release build, what with all our "assert(!empty);" 
calls and all.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra <monarchdodra gmail.co=
m>  =

wrote:

 On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote:=

 On 3/27/14, 3:49 AM, "Marc Sch=C3=BCtz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at =



the
 documentation, which says ([1] in the documentation of  =



 `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would ha=



ve,
 returned false."

 Probably we need to amend that. For efficient ranges, front() and  =


 popFront() should only be guaranteed to work if either empty() or  =


 length() were evaluated first, and they returned false or nonzero,  =


 respectively.

 I just want to point out that I think allowing empty to have an  =

 *observable* side effect will have cataclysmic repercussion in  =

 validating a release build, what with all our "assert(!empty);" calls =

 =

 and all.

This is an excellent point. assert(!empty) is everywhere.

-Steve

Mar 27 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/14, 11:53 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra
 <monarchdodra gmail.com> wrote:

 On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote:
 On 3/27/14, 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at the
 documentation, which says ([1] in the documentation of
 `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would have,
 returned false."

 Probably we need to amend that. For efficient ranges, front() and
 popFront() should only be guaranteed to work if either empty() or
 length() were evaluated first, and they returned false or nonzero,
 respectively.

 I just want to point out that I think allowing empty to have an
 *observable* side effect will have cataclysmic repercussion in
 validating a release build, what with all our "assert(!empty);" calls
 and all.

 This is an excellent point. assert(!empty) is everywhere.

 -Steve

Yah, agreed. -- Andrei

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

Completely unworkable. To determine if a range is empty or not, it may have to 
actually read from its input. TTYs are an example. Although empty may then
cache 
the result, and not read the second time it is called, an observer of TTY could 
see that an item was read from the TTY.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 16:58:12 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

 Completely unworkable. To determine if a range is empty or not, it may  
 have to actually read from its input. TTYs are an example. Although  
 empty may then cache the result, and not read the second time it is  
 called, an observer of TTY could see that an item was read from the TTY.

In the land of ranges, it's construction and popFront that generally  
advances the range. Empty does not. This is why assert(!empty) is all over  
the place, it's considered to be non-destructive.

Note that a stream makes a terrible range, simply because of what you say  
-- reading is destructive, and determining emptiness is dependent on  
reading. You need a buffered stream to make a good range, and then the  
logic becomes much more straightforward.

The awkwardness for shoehorning streams into ranges I see is that the  
advancement (popFront) and the check for termination (empty) are  
decoupled, when in reality they are synced for a stream. This REQUIRES a  
sticky bit for popFront to communicate with empty on whether it is EOF or  
not.

Even a single byte buffer is not enough, you need a bool to indicate the  
stream is done.

-Steve

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 2:13 PM, Steven Schveighoffer wrote:
 Note that a stream makes a terrible range, simply because of what you say --
 reading is destructive, and determining emptiness is dependent on reading. You
 need a buffered stream to make a good range, and then the logic becomes much
 more straightforward.

The range becomes the one element buffer in this case. It is completely
workable.


 The awkwardness for shoehorning streams into ranges I see is that the
 advancement (popFront) and the check for termination (empty) are decoupled,
when
 in reality they are synced for a stream. This REQUIRES a sticky bit for
popFront
 to communicate with empty on whether it is EOF or not.

The range protocol is designed to work with streams. It's a giant fail if they 
do not, or if you want to create a separate, non-range universe to deal with 
streams.


 Even a single byte buffer is not enough, you need a bool to indicate the stream
 is done.

Right. But empty for a stream still has to read. Just follow the protocol, and 
the range will work, even with streams.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 17:24:11 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:13 PM, Steven Schveighoffer wrote:
 Note that a stream makes a terrible range, simply because of what you  
 say --
 reading is destructive, and determining emptiness is dependent on  
 reading. You
 need a buffered stream to make a good range, and then the logic becomes  
 much
 more straightforward.

 The range becomes the one element buffer in this case. It is completely  
 workable.

A 1 byte buffered stream? If it's performance you are looking for, you  
will not find it there.

 The range protocol is designed to work with streams. It's a giant fail  
 if they do not, or if you want to create a separate, non-range universe  
 to deal with streams.

Ranges work well on top of buffered streams, but not AS streams.

 Even a single byte buffer is not enough, you need a bool to indicate  
 the stream
 is done.

 Right. But empty for a stream still has to read. Just follow the  
 protocol, and the range will work, even with streams.

A stream requires one primitive -- read. In one function call, you get the  
data you want, you can tell if it's empty, and the stream object does not  
need to concern itself with projecting a cached element. Adding range  
primitives on top of a stream does not make sense.

-Steve

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
 Adding range primitives on top of a stream does not make sense.

Are ready to implement a parallel universe of stream based algorithms to go 
alongside all the range based ones and be ready to constantly justify that to 
everyone?

I'm also curious what a generic read() should do when the stream is empty.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
 Adding range primitives on top of a stream does not make sense.

 Are ready to implement a parallel universe of stream based algorithms to  
 go alongside all the range based ones and be ready to constantly justify  
 that to everyone?

Not necessary. You just need to implement one range on top of a buffered  
stream, and then it works with all other algorithms that accept input  
ranges.

 I'm also curious what a generic read() should do when the stream is  
 empty.

Returns 0 elements read.

-Steve

Mar 28 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/28/14, 3:42 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
 Adding range primitives on top of a stream does not make sense.

 Are ready to implement a parallel universe of stream based algorithms
 to go alongside all the range based ones and be ready to constantly
 justify that to everyone?

 Not necessary. You just need to implement one range on top of a buffered
 stream, and then it works with all other algorithms that accept input
 ranges.

 I'm also curious what a generic read() should do when the stream is
 empty.

 Returns 0 elements read.

It increasingly seems to me we need to do this. -- Andrei

Mar 28 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 28 Mar 2014 20:14:39 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/28/14, 3:42 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
 Adding range primitives on top of a stream does not make sense.

 Are ready to implement a parallel universe of stream based algorithms
 to go alongside all the range based ones and be ready to constantly
 justify that to everyone?

 Not necessary. You just need to implement one range on top of a buffered
 stream, and then it works with all other algorithms that accept input
 ranges.

 I'm also curious what a generic read() should do when the stream is
 empty.

 Returns 0 elements read.

 It increasingly seems to me we need to do this. -- Andrei

It is in the works. I need to finish it. I've been incorporating Dmitry's  
I/O buffer into my design.

-Steve

Mar 28 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu 
wrote:
 It increasingly seems to me we need to do this. -- Andrei

What I find funny here is that walter is saying we must enforce 
that empty always be called in the range - "for performance".

Yet about a year ago, he made a proposal for a 
"SentinelInputRange", where you could do away with empty 
altogether because "tl,dr: PERFORMANCE!".

http://forum.dlang.org/thread/kgmatj$1v8b$1 digitalmars.com

If you need to deviate from the rules for extra performance, then 
fine. Just document that it is a lower level range, with useage 
limitations.

However, making a global design change because *1* type of range 
needs it, "for performance", I'm not fine with.

Mar 29 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu 
wrote:
 It increasingly seems to me we need to do this. -- Andrei

Question: Should this new restriction only be applied to 
*Input*Ranges? If so, then it might make more sense.

Ranges that also satisfy the *Forward* interface (which has 
save), or specifically, the *Output* interface (that doesn't even 
have empty) should not have this obligation. In particular, 
output ranges *write*, so usually can't even buffer data.

*Input*-only Ranges are the ones that are usually implemented 
over streams, and that are most subject to the "test empty" issue 
too, so they are the ones that need this restriction.

Could this maybe be a good solution for everyone?

Mar 29 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Saturday, 29 March 2014 at 09:06:45 UTC, monarch_dodra wrote:
 On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu 
 wrote:
 It increasingly seems to me we need to do this. -- Andrei

 Question: Should this new restriction only be applied to 
 *Input*Ranges? If so, then it might make more sense.

 Ranges that also satisfy the *Forward* interface (which has 
 save), or specifically, the *Output* interface (that doesn't 
 even have empty) should not have this obligation. In 
 particular, output ranges *write*, so usually can't even buffer 
 data.

 *Input*-only Ranges are the ones that are usually implemented 
 over streams, and that are most subject to the "test empty" 
 issue too, so they are the ones that need this restriction.

 Could this maybe be a good solution for everyone?

Did this conversation die the instant I actually had something 
smart to say?

 andralex,  WalterBright ? Thoughts?

Mar 30 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:
 I'm also curious what a generic read() should do when the stream is empty.

 Returns 0 elements read.

Meaning it must also write through a pointer if it did read something.

How is that faster than a 1 element buffer?

Mar 28 2014

Marco Leise <Marco.Leise gmx.de> writes:

Am Fri, 28 Mar 2014 19:23:29 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:
 I'm also curious what a generic read() should do when the stream is empty.

 Returns 0 elements read.


I guess we all have a clear concept of streams in our mind
from all kinds of programming languages. They typically
operate on bytes, have an EOF flag and offer read/write
methods that you pass a byte pointer and a length into. The
result is the count of bytes read or written.
Optionally they have an "available" property and handle any
combination of the following:
o all basic data types of the programming language
o POD structs
o formatted strings
o bitwise operations
o seeking after calculating offsets

They are used for I/O on heterogeneous data like binary
file formats or network protocols.

Ranges on the other hand work on sequences of items of the
same type, which is a small subset of what streams are
supposed to support. While there should be a connection
between both worlds, one cannot replace the other. There will
always be a separate raw stream type in Phobos.

 Meaning it must also write through a pointer if it did read something.
 
 How is that faster than a 1 element buffer?

You can write your stream in such a way that you "map" a
memory area. E.g. you call "stream.waitForSoManyBytes(123);"
and then "ubyte[] arr = stream.mapMemory(123);" where arr is
then a slice into the stream's internal buffer.
(This also requires a mechanism to release the
 buffer after use, so the stream can reuse it:
 stream.doneWithSoManyBytes(123);)

-- 
Marco

Mar 29 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 28 Mar 2014 22:23:29 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:
 I'm also curious what a generic read() should do when the stream is  
 empty.

 Returns 0 elements read.

 Meaning it must also write through a pointer if it did read something.

 How is that faster than a 1 element buffer?

I don't understand this point/question.

-Steve

Mar 31 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

It's designed to work with containers.

 It's a giant fail
 if they do not, or if you want to create a separate, non-range universe
 to deal with streams.

It's not a giant fail, we just need to adjust the notion.


Andrei

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
 On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

 It's designed to work with containers.

I know we talked about streams when we designed it.


 It's a giant fail
 if they do not, or if you want to create a separate, non-range universe
 to deal with streams.

 It's not a giant fail, we just need to adjust the notion.

Are you suggesting that ranges needn't support streams?

Note also that I suggested a way Steven could create an adapter with the 
behavior he desired, yet still adhere to protocol. No notion adjustments
required.

Mar 27 2014

Johannes Pfau <nospam example.com> writes:

Am Thu, 27 Mar 2014 17:20:25 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
 On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

 It's designed to work with containers.

 
 I know we talked about streams when we designed it.
 
 
 It's a giant fail
 if they do not, or if you want to create a separate, non-range
 universe to deal with streams.

 It's not a giant fail, we just need to adjust the notion.

 
 Are you suggesting that ranges needn't support streams?
 
 Note also that I suggested a way Steven could create an adapter with
 the behavior he desired, yet still adhere to protocol. No notion
 adjustments required.

Ranges have equivalents in other languages:
iterators in c++,

Iterator in java

all these languages have special stream types for raw data. I don't
think it's bad if we also have streams/ranges separate in D.

Mar 28 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,

 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.


Do you see a point to be able to, in an algorithm, seamlessly swap a socket
with 
a string?

empty-front-popFront works with streams and non-streams. Why break this?

Mar 28 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

28-Mar-2014 13:55, Walter Bright пишет:
 On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,

 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.


 Do you see a point to be able to, in an algorithm, seamlessly swap a
 socket with a string?

Certainly NOT a socket. There is no escaping the fact that there are 
specifics to unbuffered direct streams.
What you mention only makes sense with buffering either implicit or (I'd 
prefer) explicit.

 empty-front-popFront works with streams and non-streams. Why break this?

Pipe dream.

-- 
Dmitry Olshansky

Mar 28 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/28/2014 9:48 AM, Dmitry Olshansky wrote:
 28-Mar-2014 13:55, Walter Bright пишет:
 On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,

 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.


 Do you see a point to be able to, in an algorithm, seamlessly swap a
 socket with a string?

 Certainly NOT a socket. There is no escaping the fact that there are specifics
 to unbuffered direct streams.
 What you mention only makes sense with buffering either implicit or (I'd
prefer)
 explicit.

Yes, it does require a one element buffer. But seriously, does a one character 
buffer from a socket have a measurable impact on reading from a network? I'm an 
efficiency wonk as much or more than anyone, and this appears to me to be a 
false savings.

Mar 28 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

28-Mar-2014 21:07, Walter Bright пишет:
 On 3/28/2014 9:48 AM, Dmitry Olshansky wrote:
 28-Mar-2014 13:55, Walter Bright пишет:
 On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,

 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.


 Do you see a point to be able to, in an algorithm, seamlessly swap a
 socket with a string?

 Certainly NOT a socket. There is no escaping the fact that there are
 specifics
 to unbuffered direct streams.
 What you mention only makes sense with buffering either implicit or
 (I'd prefer)
 explicit.

 Yes, it does require a one element buffer. But seriously, does a one
 character buffer from a socket have a measurable impact on reading from
 a network?

WAT? The overhead is in issuing system calls, you'd want to do as little 
of them as possible. Reading byte by byte is an exemplar of idiocy in 
I/O code.

 I'm an efficiency wonk as much or more than anyone, and this
 appears to me to be a false savings.

Oh crap. This is very wrong. Do you often work with I/O and networking?

-- 
Dmitry Olshansky

Mar 28 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
 WAT? The overhead is in issuing system calls, you'd want to do as little of
them
 as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

That's why we have things like byLine().

Mar 28 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

28-Mar-2014 22:29, Walter Bright пишет:
 On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
 WAT? The overhead is in issuing system calls, you'd want to do as
 little of them
 as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

 That's why we have things like byLine().

Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. 
Even though sys calls are amortized by C runtime, we have a function 
call per byte. No wonder it's SLOW.

-- 
Dmitry Olshansky

Mar 28 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:
 Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even
 though sys calls are amortized by C runtime, we have a function call per byte.
 No wonder it's SLOW.

How about a PR to fix it?

Mar 28 2014

"w0rp" <devw0rp gmail.com> writes:

On Friday, 28 March 2014 at 22:05:33 UTC, Walter Bright wrote:
 On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:
 Which uses C's BUFFERED I/O and it reads from it byte by byte 
 via getc. Even
 though sys calls are amortized by C runtime, we have a 
 function call per byte.
 No wonder it's SLOW.

 How about a PR to fix it?

Yes. I hold the opinion that there is not a whole lot of reason 
why something like byChunk can't be fast like we might desire. If 
it's down to getting chunks of data the right way, whatever that 
is, and the problem is an IO bound problem, then I don't see why 
we can't submit pull requests to offer optimisations on it. I 
don't think we need to go down this "the way it does IO isn't 
fast enough so we need to replace it" route. Just optimise the 
existing thing more.

Mar 28 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

29-Mar-2014 02:05, Walter Bright пишет:
 On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:
 Which uses C's BUFFERED I/O and it reads from it byte by byte via
 getc. Even
 though sys calls are amortized by C runtime, we have a function call
 per byte.
 No wonder it's SLOW.

 How about a PR to fix it?

Sorry, I'm not inclined to do any work on hacking arbitrary C runtime 
libraries. Too much work on reverse engineering and making sure it stays 
in sync. We need new I/O package and hopefully Steven has something brewing.

-- 
Dmitry Olshansky

Mar 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/28/14, 11:40 AM, Dmitry Olshansky wrote:
 28-Mar-2014 22:29, Walter Bright пишет:
 On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
 WAT? The overhead is in issuing system calls, you'd want to do as
 little of them
 as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

 That's why we have things like byLine().

 Which uses C's BUFFERED I/O and it reads from it byte by byte via getc.
 Even though sys calls are amortized by C runtime, we have a function
 call per byte. No wonder it's SLOW.

byLine doesn't use getc. -- Andrei

Mar 28 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

29-Mar-2014 06:47, Andrei Alexandrescu пишет:
 On 3/28/14, 11:40 AM, Dmitry Olshansky wrote:
 28-Mar-2014 22:29, Walter Bright пишет:
 On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
 WAT? The overhead is in issuing system calls, you'd want to do as
 little of them
 as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

 That's why we have things like byLine().

 Which uses C's BUFFERED I/O and it reads from it byte by byte via getc.
 Even though sys calls are amortized by C runtime, we have a function
 call per byte. No wonder it's SLOW.

 byLine doesn't use getc. -- Andrei

Indeed not every version. Linux looks almost OK.

There is still this stuff in empty:
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1604

And looking through the code I see that Win64, OSX and FreeBSD versions 
still use getc. Win32 hacks right into DMC run-time, and on linux 
getdelim is used.

The point about ranges only ever making sense over buffered I/O still 
stands. Preferably not C runtime.

-- 
Dmitry Olshansky

Mar 29 2014

Johannes Pfau <nospam example.com> writes:

Am Fri, 28 Mar 2014 02:55:45 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,

 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.

 
 
 Do you see a point to be able to, in an algorithm, seamlessly swap a
 socket with a string?

Sorry, but no. It sure sounds nice first, but I can't really imagine a
use case where I would want any generic algorithms to work directly on a
socket. Having a look on the cheat sheet of std.algorithm 99% of these
algorithms do not make sense to operate on sockets, those that do
(count, until) would be bad in performance terms when operating
directly byte per byte.

You at least want buffered reads when doing IO operations. If you then
extend range interface to give access to an internal buffer you just
reinvented streams.

 
 empty-front-popFront works with streams and non-streams. Why break
 this?
 
 

It 'works' with streams but it's way too slow. You don't want to read
byte-per-byte. Of course you can always implement ranges on top of
streams. Usually these will not provide byte-per-byte access but
efficient higher level abstractions (byLine, byChunk, decodeText).

The point is you can implement ranges on streams easily, but you can't
use ranges as the generic primitive for raw data. What's the element
type of a data range?
ubyte - performance sucks
ubyte[n], ubyte[] now you have a range of ranges, most algorithms wont
work as expected (find, count, ...).

(the call empty/don't call empty discussion is completely unrelated to
this, btw. You can implement ranges on streams either way, but again,
using ranges for raw data streams is not a good idea.)

Mar 28 2014

"w0rp" <devw0rp gmail.com> writes:

On Friday, 28 March 2014 at 16:59:05 UTC, Johannes Pfau wrote:
 It 'works' with streams but it's way too slow. You don't want 
 to read
 byte-per-byte. Of course you can always implement ranges on top 
 of
 streams. Usually these will not provide byte-per-byte access but
 efficient higher level abstractions (byLine, byChunk, 
 decodeText).

 The point is you can implement ranges on streams easily, but 
 you can't
 use ranges as the generic primitive for raw data. What's the 
 element
 type of a data range?
 ubyte - performance sucks
 ubyte[n], ubyte[] now you have a range of ranges, most 
 algorithms wont
 work as expected (find, count, ...).

 (the call empty/don't call empty discussion is completely 
 unrelated to
 this, btw. You can implement ranges on streams either way, but 
 again,
 using ranges for raw data streams is not a good idea.)

I think a key is to offer something with gives you chunks at a 
time right at the top, and the use .joiner on that. I read files 
this way currently.

auto fileByteRange = File("something").byChunk(chunkSize).joiner;

I believe this to be a very good way to get good performance 
without losing the functionality of std.algorithm.

Mar 28 2014

Johannes Pfau <nospam example.com> writes:

Am Fri, 28 Mar 2014 17:22:26 +0000
schrieb "w0rp" <devw0rp gmail.com>:

 
 I think a key is to offer something with gives you chunks at a 
 time right at the top, and the use .joiner on that. I read files 
 this way currently.
 
 auto fileByteRange = File("something").byChunk(chunkSize).joiner;
 

byChunk is implemented on top of the file rawRead API though, and
that's a stream API ;-)
As said before implementing ranges on top of streams is fine, but if you
want ranges to replace streams as the lowest level interface you'll
either suffer from performance issues or you'll have to extend the
range interface and effectively make it a stream interface. (For example
byChunk doesn't offer a way to provide a buffer. I'd expect a low level
API to offer this, but it'll complicate range API a lot. File.rawRead
on the other hand provides exactly that and you can implement byChunk
on top of rawRead easily. The other way round is not as easy).

BTW: If this code performs well of course depends what you with that
fileByteRange range. For example if you only read the complete file into
a memory buffer joiner would reduce performance significantly.

 I believe this to be a very good way to get good performance 
 without losing the functionality of std.algorithm.

Yes, that's exactly how ranges/streams should interface, there's no
real problem for users.
stream.getSomeRange().rangeAPICalls....

Mar 28 2014

"QAston" <qaston gmail.com> writes:

On Friday, 28 March 2014 at 08:34:08 UTC, Johannes Pfau wrote:
 Am Thu, 27 Mar 2014 17:20:25 -0700
 schrieb Walter Bright <newshound2 digitalmars.com>:

 On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
 On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

 It's designed to work with containers.

 
 I know we talked about streams when we designed it.
 
 
 It's a giant fail
 if they do not, or if you want to create a separate, 
 non-range
 universe to deal with streams.

 It's not a giant fail, we just need to adjust the notion.

 
 Are you suggesting that ranges needn't support streams?
 
 Note also that I suggested a way Steven could create an 
 adapter with
 the behavior he desired, yet still adhere to protocol. No 
 notion
 adjustments required.

 Ranges have equivalents in other languages:
 iterators in c++,

 Iterator in java

 all these languages have special stream types for raw data. I 
 don't
 think it's bad if we also have streams/ranges separate in D.

There are stream iterators in C++:
http://www.cplusplus.com/reference/iterator/istream_iterator/

Mar 28 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

28-Mar-2014 04:20, Walter Bright пишет:
 On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
 On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

 It's designed to work with containers.

 I know we talked about streams when we designed it.

If streams are like hot raw sockets then certainly it doesn't make 
sense. Ranges work nice when there are some "slots" down below, that is 
a buffer.

 It's a giant fail
 if they do not, or if you want to create a separate, non-range universe
 to deal with streams.

 It's not a giant fail, we just need to adjust the notion.

 Are you suggesting that ranges needn't support streams?

Yes they "don't support streams". They are an abstraction somewhere on 
top of buffering. This is where they fit nicely.


-- 
Dmitry Olshansky

Mar 28 2014

"QAston" <qaston gmail.com> writes:

 Even a single byte buffer is not enough, you need a bool to 
 indicate the stream
 is done.

 Right. But empty for a stream still has to read. Just follow 
 the protocol, and the range will work, even with streams.

The protocol is not intuitive. I'm not entirely against having to 
call something to do lazy initialization but it's not like anyone 
will expect empty  property to lazily initialize.

Empty and front could be methods added by a mixin template which 
would use efficient methods with strict protocol implemented by 
the user.

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 3:23 PM, QAston wrote:
 The protocol is not intuitive.

I find empty-front-popFront as perfectly intuitive. I don't find the counter 
proposals, which come with baggage like constructors that may fail, and front() 
that may fail in unspecified ways, or throwing entire paradigms out the window, 
as intuitive.

But I concede that other people think differently. Not everyone thinks the
same. 
But consider this: floating point math is not intuitive. There has never been a 
shortage of proposals to make fp intuitive, but they've all failed because they 
are impractical.

Sometimes ya gotta go with what works.

Mar 27 2014

"Paolo Invernizzi" <paolo.invernizzi no.address> writes:

On Friday, 28 March 2014 at 00:41:42 UTC, Walter Bright wrote:
 On 3/27/2014 3:23 PM, QAston wrote:
 The protocol is not intuitive.

 I find empty-front-popFront as perfectly intuitive. I don't 
 find the counter proposals, which come with baggage like 
 constructors that may fail, and front() that may fail in 
 unspecified ways, or throwing entire paradigms out the window, 
 as intuitive.

 But I concede that other people think differently. Not everyone 
 thinks the same. But consider this: floating point math is not 
 intuitive. There has never been a shortage of proposals to make 
 fp intuitive, but they've all failed because they are 
 impractical.

 Sometimes ya gotta go with what works.

I _strongly_ agree with Walter: people learning D in my groups 
have no problems with the empty-front-popFront sequence.

Please don't complicate or change the notion of range: you can 
find an adjustment that don't break code, but for sure that will 
break the mindset of people.

For what concern us, everyone here is happy with the fact that 
empty *must* be checked prior to front/popFront.

--
Paolo

Mar 28 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi  
<paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that empty  
 *must* be checked prior to front/popFront.

This is actually not true.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 28 2014

"Paolo Invernizzi" <paolo.invernizzi no.address> writes:

On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi 
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that 
 empty *must* be checked prior to front/popFront.

 This is actually not true.

 R

What I'm meaning, it's that we don't care: we are always 
respecting the sequence "empty > front > pop", and everybody here 
find it natural.
--
Paolo

Mar 28 2014

John Stahara <john.stahara+dlang gmail.com> writes:

On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:

 On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that empty
 *must* be checked prior to front/popFront.

 This is actually not true.

 R

 
 What I'm meaning, it's that we don't care: we are always respecting the
 sequence "empty > front > pop", and everybody here find it natural.


To clarify for Mr. Invernizzi: the "we" to which he refers is the group 
of people he works with, and /not/ the members of this newsgroup.

--jjs

Mar 28 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Fri, 28 Mar 2014 16:30:36 -0000, John Stahara  
<john.stahara+dlang gmail.com> wrote:

 On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:

 On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that empty
 *must* be checked prior to front/popFront.

 This is actually not true.

 R

 What I'm meaning, it's that we don't care: we are always respecting the
 sequence "empty > front > pop", and everybody here find it natural.


 To clarify for Mr. Invernizzi: the "we" to which he refers is the group
 of people he works with, and /not/ the members of this newsgroup.

 --jjs

Thanks, that was confusing me :)

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 28 2014

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Mar 28, 2014 at 04:30:36PM +0000, John Stahara wrote:
 On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:
 
 On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that empty
 *must* be checked prior to front/popFront.

 This is actually not true.

 R

 
 What I'm meaning, it's that we don't care: we are always respecting
 the sequence "empty > front > pop", and everybody here find it
 natural.

 
 
 To clarify for Mr. Invernizzi: the "we" to which he refers is the
 group of people he works with, and /not/ the members of this
 newsgroup.

[...]

FWIW I find it perfectly natural as well. (But I know not everyone
agrees. :P)


T

-- 
The two rules of success: 1. Don't tell everything you know. -- YHL

Mar 28 2014

"Paolo Invernizzi" <paolo.invernizzi no.address> writes:

On Friday, 28 March 2014 at 16:30:36 UTC, John Stahara wrote:
 On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:

 On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact 
 that empty
 *must* be checked prior to front/popFront.

 This is actually not true.

 R

 
 What I'm meaning, it's that we don't care: we are always 
 respecting the
 sequence "empty > front > pop", and everybody here find it 
 natural.


 To clarify for Mr. Invernizzi: the "we" to which he refers is 
 the group
 of people he works with, and /not/ the members of this 
 newsgroup.

 --jjs

Thank you John, that's exact: I'm talking about my colleagues 
working with D.
-- Paolo

Mar 28 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/14, 1:58 PM, Walter Bright wrote:
 On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

 Completely unworkable. To determine if a range is empty or not, it may
 have to actually read from its input. TTYs are an example. Although
 empty may then cache the result, and not read the second time it is
 called, an observer of TTY could see that an item was read from the TTY.

That's a good point too.

Andrei

Mar 27 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 27 March 2014 at 21:54:00 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/14, 1:58 PM, Walter Bright wrote:
 On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

 Completely unworkable. To determine if a range is empty or 
 not, it may
 have to actually read from its input. TTYs are an example. 
 Although
 empty may then cache the result, and not read the second time 
 it is
 called, an observer of TTY could see that an item was read 
 from the TTY.

 That's a good point too.

 Andrei

Yes, but the "observability" should be sort lived, since empty is 
virtually guaranteed to be followed by "front" anyways. If both 
front/empty do the same (lazy) operation on the stream, then 
whether or not empty was called isn't really "observable" (to a 
certain extent).

In contrast, having front completly fail if empty was NOT called, 
means "empty" has a hell of a lot of observable side effect.

Mar 27 2014

"RivenTheMage" <riven-mage id.ru> writes:

On Thursday, 27 March 2014 at 21:54:00 UTC, Andrei Alexandrescu
wrote:
 On 3/27/14, 1:58 PM, Walter Bright wrote:
 On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

 Completely unworkable. To determine if a range is empty or 
 not, it may
 have to actually read from its input. TTYs are an example. 
 Although
 empty may then cache the result, and not read the second time 
 it is
 called, an observer of TTY could see that an item was read 
 from the TTY.

 That's a good point too.

 Andrei

Maybe new kind of range can help?
Which has "bool popFront()" instead of "bool empty() + void
popFront()".

Mar 28 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at the
 documentation, which says ([1] in the documentation of `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would have,
returned
 false."

 (And the same for `popFront()`.)

 That is, the documentation more or less explicitly states that you don't
 actually need to call `empty` if you know it returned `true`.

 [1] http://dlang.org/phobos/std_range.html

Right, it does say that, and it is seriously wrong.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 00:17:21 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
 OK, but it's logical to assume you *can* avoid a call to empty if you  
 know
 what's going on under the hood, no? Then at that point, you have lost  
 the
 requirement -- people will avoid calling empty because they can get  
 away with
 it, and then altering the under-the-hood requirements cause code  
 breakage later.

 Case in point, the pull request I referenced, the author originally  
 tried to
 just use empty to lazily initialize filter, but it failed due to  
 existing code
 in phobos that did not call empty on filtered data before processing.  
 He had to
 instrument all 3 calls.

 As with *any* API, if you look under the hood and make assumptions about  
 the behavior based on a particular implementation, assumptions that are  
 not part of the API, the risk of breakage inevitably follows.

Like range.save. It's "required", but frequently omitted, without any  
consequences. I'm not saying requiring it would be an invalid decision,  
I'm saying requiring it would be a futile gesture -- the requirement would  
be ignored.

What happens when one of our "clients" code breaks severely because we  
make a change in phobos that assumes empty is always called first on a  
newly-created range?

 If you've identified Phobos code that uses ranges but does not follow  
 the protocol, the Phobos code is broken - please file a bugzilla issue  
 on it.

I think we should work on making a protocol that does not require awkward  
calls, rather than alienating developers who don't follow the awkward  
protocol.

BTW, I think there has been recent talk about not focusing on dust when we  
are laying bricks. This would have my vote as useless dust. This does not  
solve the problem of streams-as-ranges, because streams don't make good  
ranges. It doesn't really solve any problems as far as I can tell.

-Steve

Mar 27 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer 
wrote:
 What happens when one of our "clients" code breaks severely 
 because we make a change in phobos that assumes empty is always 
 called first on a newly-created range?

You probably can run-time test this in Debug builds, but…

 I think we should work on making a protocol that does not 
 require awkward calls, rather than alienating developers who 
 don't follow the awkward protocol.

This is it. There is way too much awkwardness in D already. And 
essentially, the closer the design is to mainstream the more 
annoying it is when you diverge and are inconsistent.

Code should behave as expected without any need for memorizing. 
Having to differentiate between dialects is so much harder than 
differentiating between orthogonal languages.

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 5:45 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrote:
 I think we should work on making a protocol that does not require awkward
 calls, rather than alienating developers who don't follow the awkward protocol.

 This is it.

Sure, but what is awkward about empty-front-popFront?

You can still always use:

     foreach (e; range) { ... }

which will follow the correct protocol automatically.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 17:06:35 -0400, Walter Bright  =

<newshound2 digitalmars.com> wrote:

 On 3/27/2014 5:45 AM, "Ola Fosheim Gr=C3=B8stad"  =

 <ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrot=


e:
 I think we should work on making a protocol that does not require  =



 awkward
 calls, rather than alienating developers who don't follow the awkwar=



d  =

 protocol.

 This is it.

 Sure, but what is awkward about empty-front-popFront?

 You can still always use:

      foreach (e; range) { ... }

 which will follow the correct protocol automatically.

If this is all we needed, ranges would never have been invented.

-Steve

Mar 27 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing on dust when
 we are laying bricks. This would have my vote as useless dust. This does
 not solve the problem of streams-as-ranges, because streams don't make
 good ranges. It doesn't really solve any problems as far as I can tell.

I think byXchar are important and are not streams. -- Andrei

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing on dust when
 we are laying bricks. This would have my vote as useless dust. This does
 not solve the problem of streams-as-ranges, because streams don't make
 good ranges. It doesn't really solve any problems as far as I can tell.

 I think byXchar are important and are not streams. -- Andrei

What's broken about those?

-Steve

Mar 27 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing on dust when
 we are laying bricks. This would have my vote as useless dust. This does
 not solve the problem of streams-as-ranges, because streams don't make
 good ranges. It doesn't really solve any problems as far as I can tell.

 I think byXchar are important and are not streams. -- Andrei

 What's broken about those?

Speed.

Mar 27 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 27 March 2014 at 15:45:14 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing 
 on dust when
 we are laying bricks. This would have my vote as useless 
 dust. This does
 not solve the problem of streams-as-ranges, because streams 
 don't make
 good ranges. It doesn't really solve any problems as far as 
 I can tell.

 I think byXchar are important and are not streams. -- Andrei

 What's broken about those?

 Speed.

I call shenanigans. This is the code:

              property bool empty()
             {
                 if (nLeft)
                     return false;
                 if (r.empty)
                     return true;

If you initialized the next element in both constructor and 
popFront, then you'd get rid of both these checks.

Furthermore, popFront() has a guaranteed "1 call" per element. 
Weareas "empty" and "front" may be called several times in a row 
for a single element.

If you want efficiency, stop being "member-wise" lazy, and put 
some eagerness in your code.

 filter comes to mind. -- Andrei

You *just* turned down a pull that did exactly that, for exactly 
the same reasons.

Have "byXChar" function like filter: construction and popFront do 
the work, and front/empty is 0(1), and const, and strongly pure 
to boot. I *know* the compiler likes that in terms of speed.

Mar 27 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/14, 9:12 AM, monarch_dodra wrote:
 filter comes to mind. -- Andrei

 You *just* turned down a pull that did exactly that, for exactly the
 same reasons.

I was on the verge, and we need to discuss this more. A constructor that 
does arbitrary work for a lazy range doesn't sit well. -- Andrei

Mar 27 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 27 March 2014 at 19:03:26 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/14, 9:12 AM, monarch_dodra wrote:
 filter comes to mind. -- Andrei

 You *just* turned down a pull that did exactly that, for 
 exactly the
 same reasons.

 I was on the verge, and we need to discuss this more. A 
 constructor that does arbitrary work for a lazy range doesn't 
 sit well. -- Andrei

I still think there's ambiguity in the word "lazy". I think this 
is the distinction most make:

Not lazy:
//----
auto r = getSomeRange();
auto arr = someRange.array();
foreach( ref e ; arr)
     doSomething;
remove!fun(arr);
return arr;
//----

Lazy:
//----
return getSomeRange()
     .map!doSomething()
     .filter!fun()
     .array();
//----

What you are talking about (IMO) is better described as "eager" 
vs "not eager".

"A constructor does some work for a lazy range that does eager 
processing of its elements": I think this is a better 
description, and I think is acceptable.

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 9:12 AM, monarch_dodra wrote:
 If you initialized the next element in both constructor

As mentioned earlier, requiring this of a range constructor makes it impossible 
to separate composition of a range with using it.

As also mentioned earlier, this throws under the bus entire categories of use 
cases for ranges, all to avoid a single call to 'empty'.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 9:12 AM, monarch_dodra wrote:
 If you initialized the next element in both constructor

 As mentioned earlier, requiring this of a range constructor makes it  
 impossible to separate composition of a range with using it.

Not impossible. Use lazy initialization.

 As also mentioned earlier, this throws under the bus entire categories  
 of use cases for ranges, all to avoid a single call to 'empty'.

This is going to look weird:

r.empty;

// now we can use it...

It only makes sense if you are using empty in your processing of the range.

-Steve

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 1:59 PM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 3/27/2014 9:12 AM, monarch_dodra wrote:
 If you initialized the next element in both constructor

 As mentioned earlier, requiring this of a range constructor makes it
 impossible to separate composition of a range with using it.

 Not impossible. Use lazy initialization.

That still doesn't work, because then you could be calling front, and front 
might fail if there's no input.

What is wrong with following the protocol? Why must we introduce all these 

style LINQ, etc.? For what gain?


 It only makes sense if you are using empty in your processing of the range.

I.e. follow the danged protocol and it works.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 17:28:05 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 1:59 PM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 3/27/2014 9:12 AM, monarch_dodra wrote:
 If you initialized the next element in both constructor

 As mentioned earlier, requiring this of a range constructor makes it
 impossible to separate composition of a range with using it.

 Not impossible. Use lazy initialization.

 That still doesn't work, because then you could be calling front, and  
 front might fail if there's no input.

In the cases where you know there is input, that's not true. What I am  
protesting is the idea that you *know* input exists, you must still call  
empty. Front is enough in that case.

 What is wrong with following the protocol? Why must we introduce all  
 these caveats for ranges, like streams not working, front that can fail,  


There is no gain to requiring empty on ranges where you logically prove  
they are not empty. It's a wasted call. For things where you're not sure  
if they are empty or not (e.g. an input file), of course empty is required  
to be called, and of course it may do some processing to determine that.  
But I would only expect that on the first call. Subsequent checks would  
already be done by popFront.

 It only makes sense if you are using empty in your processing of the  
 range.

 I.e. follow the danged protocol and it works.

I mean, if you are calling empty regularly, because you aren't sure  
whether the elements are there. That's not always the case, sometimes you  
are sure the elements are there. Requiring a call to empty to do anything  
other than to check whether a range is empty, is just plain wrong. I'm  
sure there's better ways to implement what you want without stitching the  
functionality onto empty.

-Steve

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 2:46 PM, Steven Schveighoffer wrote:
 In the cases where you know there is input, that's not true.

The range doesn't know what you know.

 What I am protesting is the idea that you *know* input exists, you must still
call empty.
 Front is enough in that case.

You can't use that on generic code, because generic code cannot assume that 
front will not fail.


 What is wrong with following the protocol? Why must we introduce all these

 style LINQ, etc.? For what gain?

 There is no gain to requiring empty on ranges where you logically prove they
are
 not empty. It's a wasted call. For things where you're not sure if they are
 empty or not (e.g. an input file), of course empty is required to be called,
and
 of course it may do some processing to determine that. But I would only expect
 that on the first call. Subsequent checks would already be done by popFront.

I know that you want to get rid of empty. But getting rid of empty means that 
front may fail. This is why there is an empty, and any generic code MUST
respect 
that.

What you can do is, in your range:

     enum empty = true;

and then it won't cost anything to call it.


 I mean, if you are calling empty regularly, because you aren't sure whether the
 elements are there. That's not always the case, sometimes you are sure the
 elements are there.

Then use an adapter range that has:

    enum empty = true;

and forwards the front and popFront() calls to its parameter. You'll get the 
optimization you want, and won't violate protocol.


 Requiring a call to empty to do anything other than to check
 whether a range is empty, is just plain wrong.

There are solid reasons to require it (i.e. streams), and you've responded that 
ranges needn't even support streams. If I have interpreted your position 
correctly, I cannot agree with it.

Mar 27 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 27 Mar 2014 19:52:52 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:46 PM, Steven Schveighoffer wrote:
 What I am protesting is the idea that you *know* input exists, you must  
 still call empty.
 Front is enough in that case.

 You can't use that on generic code, because generic code cannot assume  
 that front will not fail.

There is a difference here that I want to establish.

First, I completely agree that for generic code, for when an unknown range  
is passed in, empty is required to be called to verify that front is valid.

Second, while I agree that empty should be required to verify front is  
valid on unknown ranges, it should not be required, when it is known that  
it's not empty.

What you are proposing is that we take advantage of the requirement for  
empty on unknown ranges to require it on known ones. I disagree, the code  
looks awkward and incorrect. When I know something is empty, calling empty  
again doesn't give me information. instead empty now becomes "is it empty,  
and if it needs priming, prime it." I'd rather see another primitive,  
since empty does not convey that message.

A counter proposal -- What if we create a global method in std.range  
called prime. prime(r) will call r.prime if it exists, otherwise calls  
r.empty, and ignores the result. Code that wants to work with "primable"  
ranges, can call that function, and it's harmless for normal ranges, but  
will allow primable ranges to fetch the first element.

I think penalizing all range-using code to force a non-functional empty  
call would 1. artificially invalidate lots of currently valid code and 2.  
look completely awkward in cases where it's not normally necessary.

-Steve

Mar 28 2014

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:

On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:
 On Thursday, 27 March 2014 at 15:45:14 UTC, Andrei Alexandrescu 
 wrote:
 On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing 
 on dust when
 we are laying bricks. This would have my vote as useless 
 dust. This does
 not solve the problem of streams-as-ranges, because streams 
 don't make
 good ranges. It doesn't really solve any problems as far as 
 I can tell.

 I think byXchar are important and are not streams. -- Andrei

 What's broken about those?

 Speed.

 I call shenanigans. This is the code:

              property bool empty()
             {
                 if (nLeft)
                     return false;
                 if (r.empty)
                     return true;

 If you initialized the next element in both constructor and 
 popFront, then you'd get rid of both these checks.

... but of course lose laziness.

 Furthermore, popFront() has a guaranteed "1 call" per element. 
 Weareas "empty" and "front" may be called several times in a 
 row for a single element.

 If you want efficiency, stop being "member-wise" lazy, and put 
 some eagerness in your code.

Does a flag "isInitialized" even have an effect on performance 
where it matters?

It should only be relevant in tight loops, and there I expect a 
moderately well-working optimizer to inline calls to `empty` and 
`front`. It can then see that it has just set "isInitialized" to 
true/false, and remove the checks completely, effectively 
generating code as if the value were only fetched/computed in 
`empty`. That only leaves the larger struct size as a potential 
problem. However I've seen LDC being able to "decompose" a struct 
into separate variables, so I guess it can remove the unnecessary 
member in our case.

In other situations, where you already have more complex code, an 
additional conditional branch will not have as much weight anyway.

Does somehow have numbers for realistic programs, with 
GDC/LDC/DMD? I suspect this entire discussion is moot, because we 
can easily have non-eagerly initialized ranges without 
sacrificing performance in most situations.

Mar 28 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 28 Mar 2014 07:47:22 -0400, Marc Sch=C3=BCtz <schuetzm gmx.net> =
wrote:

 On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:

 I call shenanigans. This is the code:

              property bool empty()
             {
                 if (nLeft)
                     return false;
                 if (r.empty)
                     return true;

 If you initialized the next element in both constructor and popFront,=


  =

 then you'd get rid of both these checks.

 ... but of course lose laziness.

In this case, laziness is not critical. Decoding the element is an O(1) =
 =

operation, and when looping through, you will decode it anyway.

When processing a 20 element string, the cost of checking to see if you =
 =

have decoded on every empty or front call may override the front-loaded =
 =

cost of decoding the first element on construction. It's sure to add to =
 =

the cost if you are processing all 20 elements, since you decode them al=
l  =

anyway.

On other ranges, it's more important when the first element costs a lot =
to  =

fetch. HOWEVER, it's not critically important to delay that unless you a=
re  =

not going to process that element. For example, if you are foreach'ing  =

over all the elements, the delay doesn't matter.

I'd rather the higher level code decide whether to delay or not, dependi=
ng  =

on the situation. Requiring a protocol change for such detailed knowledg=
e  =

seems unbalanced.

-Steve

Mar 28 2014

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:

On Saturday, 29 March 2014 at 01:36:46 UTC, Steven Schveighoffer 
wrote:
 On Fri, 28 Mar 2014 07:47:22 -0400, Marc Schütz 
 <schuetzm gmx.net> wrote:
 On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra 
 wrote:
 If you initialized the next element in both constructor and 
 popFront, then you'd get rid of both these checks.

 ... but of course lose laziness.

 In this case, laziness is not critical. Decoding the element is 
 an O(1) operation, and when looping through, you will decode it 
 anyway.

 When processing a 20 element string, the cost of checking to 
 see if you have decoded on every empty or front call may 
 override the front-loaded cost of decoding the first element on 
 construction. It's sure to add to the cost if you are 
 processing all 20 elements, since you decode them all anyway.

 On other ranges, it's more important when the first element 
 costs a lot to fetch. HOWEVER, it's not critically important to 
 delay that unless you are not going to process that element. 
 For example, if you are foreach'ing over all the elements, the 
 delay doesn't matter.

 I'd rather the higher level code decide whether to delay or 
 not, depending on the situation. Requiring a protocol change 
 for such detailed knowledge seems unbalanced.

I was more thinking of the fact that you need to read something 
on construction, rather than on consumption, and this reading 
might be noticeable. There was the example of 
`stdin.byLine().filter(...)` (or something similar, don't 
remember exactly), which reads from stdin on construction. This 
changes the behaviour of the program, because the read operation 
will (probably) block.

I'd suggest to make it a requirement for ranges and algorithms 
_not_ to start consuming the underlying data until one of 
empty/front/popFront is called, even if that has a negative 
effect on performance. That's why I was asking for performance 
numbers, to see whether there even is an effect. If there isn't, 
that's just another argument for adding that requirement.

This is then, IMO, a very acceptable additional burden to place 
on the writers of ranges. I agree, however, that it's not a good 
idea to change the range protocol, i.e. what _users_ of ranges 
have to abide by. That would be a breaking change, and it would 
be an especially bad one because there I see no way to detect 
that a user failed to call `empty` in an iteration if they knew 
that there are more elements available.

Mar 29 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 29 Mar 2014 17:02:30 -0400, Marc Sch=C3=BCtz <schuetzm gmx.net> =
wrote:

 On Saturday, 29 March 2014 at 01:36:46 UTC, Steven Schveighoffer wrote=

:
 On Fri, 28 Mar 2014 07:47:22 -0400, Marc Sch=C3=BCtz <schuetzm gmx.ne=


t>  =

 wrote:
 On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:
 If you initialized the next element in both constructor and popFron=




t,  =

 then you'd get rid of both these checks.

 ... but of course lose laziness.

 In this case, laziness is not critical. Decoding the element is an O(=


1)  =

 operation, and when looping through, you will decode it anyway.

 When processing a 20 element string, the cost of checking to see if y=


ou  =

 have decoded on every empty or front call may override the front-load=


ed  =

 cost of decoding the first element on construction. It's sure to add =


to  =

 the cost if you are processing all 20 elements, since you decode them=


  =

 all anyway.

 On other ranges, it's more important when the first element costs a l=


ot  =

 to fetch. HOWEVER, it's not critically important to delay that unless=


  =

 you are not going to process that element. For example, if you are  =


 foreach'ing over all the elements, the delay doesn't matter.

 I'd rather the higher level code decide whether to delay or not,  =


 depending on the situation. Requiring a protocol change for such  =


 detailed knowledge seems unbalanced.

 I was more thinking of the fact that you need to read something on  =

 construction, rather than on consumption, and this reading might be  =

 noticeable. There was the example of `stdin.byLine().filter(...)` (or =

 =

 something similar, don't remember exactly), which reads from stdin on =

 =

 construction. This changes the behaviour of the program, because the  =

 read operation will (probably) block.

Blocking operations may have a noticeable impact, but they may not. It  =

depends on what you do between construction and processing.

For example, if you have:

foreach(l; stdin.byLine)

Lazily fetching the first line makes no operational difference whatsoeve=
r,  =

even if it blocks, because you're immediately going to process it.

But if it *is* lazily fetched, you are paying some possibly small, but  =

nonzero, cost for that laziness that you didn't need to pay.

This is why I said it's not critically important to delay unless you are=
  =

not going to process the first element. Because there is no primitive to=
  =

"prime" the range, we must do it on a property fetch, or on construction=
.  =

But after that, popFront is a perfectly reasonable place to get the next=
  =

element.

All this fuss is over the *first* element in the range. I think providin=
g  =

a mechanism to decide whether you want it now or later is reasonable. Fo=
r  =

example a lazy range wrapper.

Note, however, the case Andrei was arguing about was one of the string  =

decoders. When you are blocking for input, hell, even if you aren't  =

blocking, but need to call system calls to get it, the performance cost =
of  =

checking a boolean every call is negligible. But when you are decoding 1=
-4  =

bytes of data in memory, checking a boolean becomes significant.

-Steve

Mar 31 2014

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:

On Monday, 31 March 2014 at 11:40:11 UTC, Steven Schveighoffer 
wrote:
 Blocking operations may have a noticeable impact, but they may 
 not. It depends on what you do between construction and 
 processing.

 For example, if you have:

 foreach(l; stdin.byLine)

 Lazily fetching the first line makes no operational difference 
 whatsoever, even if it blocks, because you're immediately going 
 to process it.

 But if it *is* lazily fetched, you are paying some possibly 
 small, but nonzero, cost for that laziness that you didn't need 
 to pay.

 This is why I said it's not critically important to delay 
 unless you are not going to process the first element. Because 
 there is no primitive to "prime" the range, we must do it on a 
 property fetch, or on construction. But after that, popFront is 
 a perfectly reasonable place to get the next element.

 All this fuss is over the *first* element in the range. I think 
 providing a mechanism to decide whether you want it now or 
 later is reasonable. For example a lazy range wrapper.

I've found the example I was talking about:
http://forum.dlang.org/thread/mailman.323.1393458346.6445.digitalmars-d puremagic.com

I misremembered, filter wasn't even involved. But similar 
situations may arise from using filter or another range that 
eagerly fetches the first element, even if its source doesn't.

 Note, however, the case Andrei was arguing about was one of the 
 string decoders. When you are blocking for input, hell, even if 
 you aren't blocking, but need to call system calls to get it, 
 the performance cost of checking a boolean every call is 
 negligible. But when you are decoding 1-4 bytes of data in 
 memory, checking a boolean becomes significant.

That's what I meant by a tight loop. My hypothesis is that in 
such cases the optimizers of at least GDC and LDC are good enough 
to remove the check for the boolean completely, and that at least 
LDC can then remove the boolean itself from the range structure.

Mar 31 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 31 Mar 2014 13:43:05 -0400, Marc Sch=C3=BCtz <schuetzm gmx.net> =
wrote:

 I've found the example I was talking about:
 http://forum.dlang.org/thread/mailman.323.1393458346.6445.digitalmars-=

d puremagic.com
 I misremembered, filter wasn't even involved. But similar situations m=

ay  =

 arise from using filter or another range that eagerly fetches the firs=

t  =

 element, even if its source doesn't.

Sure, but fetching lazily adds cost. It's possible to add it as an optio=
n  =

when needed, but it's not easy to reclaim the cost if it's the default  =

implementation.

 That's what I meant by a tight loop. My hypothesis is that in such cas=

es  =

 the optimizers of at least GDC and LDC are good enough to remove the  =

 check for the boolean completely, and that at least LDC can then remov=

e  =

 the boolean itself from the range structure.

I suspect that this is an optimization that will break down when not don=
e  =

in the "right way." I'd rather be able to choose, do I need lazy  =

evaluation or not? The cases where lazy evaluation is needed are not  =

common.

-Steve

Mar 31 2014

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/27/2014 05:17 AM, Walter Bright wrote:
 As with *any* API, if you look under the hood and make assumptions about
 the behavior based on a particular implementation, assumptions that are
 not part of the API, the risk of breakage inevitably follows.

 If you've identified Phobos code that uses ranges but does not follow
 the protocol, the Phobos code is broken - please file a bugzilla issue
 on it.

I feel uneasy about the assertion that code that assumes that 'empty' 
checks for an empty range is broken. If the API does not allow fast 
range implementations without arbitrarily restrictions on some 
'protocol', then this is a problem with the API. Maybe 
empty/front/popFront should just be a single primitive, like:

Nullable!T get();

Mar 27 2014

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/27/2014 11:48 PM, Timon Gehr wrote:
 arbitrarily restrictions on some 'protocol'

arbitrarily restricting interactions to some 'protocol'

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 3:48 PM, Timon Gehr wrote:
 Maybe empty/front/popFront should just be a single primitive, like:
 Nullable!T get();

What if get() fails?

Mar 27 2014

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Mar 27, 2014 at 05:11:55PM -0700, Walter Bright wrote:
 On 3/27/2014 3:48 PM, Timon Gehr wrote:
Maybe empty/front/popFront should just be a single primitive, like:
Nullable!T get();

 
 What if get() fails?

That's the whole point of Nullable!T: you can return null if get()
fails.

I can't say I like this, though. There are many cases in my code that
need separation of .empty from .front from .popFront, such as recursive
descent parsers or code that does if-match-then-consume algorithms. One
of the *nice* things about the current range API is that that code works
with arrays, streams, and lots of other stuff, without needing to care
about implementation details. Changing it to get() will require lots of
manual buffering and reintroduce lots of ugly boilerplate that I had to
live with in C++.

I'm with Walter on this one. Generic code should NOT assume anything
about a range it was given, and therefore it should call .empty before
calling .front or .popFront. If you "know" that a particular range
doesn't require calling .empty beforehand, then by definition your code
is no longer generic, and you just have to live with the consequences of
that. Nothing stops you, for example, from exposing a .get function or
whatever else, *in addition* to the standard range API. Then code that
knows how to deal with .get will use it, and your range remains usable
with other generic code.

Nothing about the range API dictates that .empty cannot do useful work
like initialize buffers.  That should be an implementation detail that
generic code shouldn't rely on.


T

-- 
The early bird gets the worm. Moral: ewww...

Mar 27 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 28 March 2014 at 02:14:40 UTC, H. S. Teoh wrote:
 I'm with Walter on this one. Generic code should NOT assume 
 anything
 about a range it was given, and therefore it should call .empty 
 before
 calling .front or .popFront. If you "know" that a particular 
 range
 doesn't require calling .empty beforehand, then by definition 
 your code
 is no longer generic, and you just have to live with the 
 consequences of
 that. Nothing stops you, for example, from exposing a .get 
 function or
 whatever else, *in addition* to the standard range API. Then 
 code that
 knows how to deal with .get will use it, and your range remains 
 usable
 with other generic code.

It's not that *you* know a particular range doesn't need "empty" 
called, it's that the algorithm you are using has already 
previously validated there are elements in it.

For example, the splitter algorithm will first save a copy of its 
range, and then walk it, searching for the "splitting" elements. 
It then realizes it has walked N elements.

 From there, it takes the original range, and packs it into a 
"takeExactly(N)" of the original range. Iterating that 
"takeExactly(N)" is faster than a raw "take", *because* 
"takeExactly" was already promised that the range holds N 
elements, and as such, *doesn't* check for empty.

Ditto for "findSplit".

And again, there are functions, such as "copy", then simply 
*require* that a certain range have at least a certain amount of 
elements. Why check for it, if not providing it is a violation of 
its interface.

On Thursday, 27 March 2014 at 23:52:46 UTC, Walter Bright wrote:
 I know that you want to get rid of empty. But getting rid of 
 empty means that front may fail. This is why there is an empty, 
 and any generic code MUST respect that.

Front only fails *if* the range is empty. Not if you fail to call 
empty. Generic code respects that.

 What you can do is, in your range:

     enum empty = true;

That's not the same. That's making the assumption your range will 
*never* be empty, which is a whole other concept (infinite 
ranges).

Mar 28 2014

Rainer Schuetze <r.sagitario gmx.de> writes:

On 27.03.2014 03:48, Walter Bright wrote:
 On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
 if(!r.empty)
 {
     auto r2 = map!(x => x * 2)(r);
     do
     {
        auto x = r2.front;
        ...
     } while(!r2.empty);
 }

 Should we be required to re-verify that r2 is not empty before using
 it? It
 clearly is not, and would be an artificial requirement (one that map
 likely
 would not enforce!).

 This sounds so much like a convention that simply won't be followed,
 and the
 result will be perfectly valid code.

 The idea is that there are three functions: empty, front, and popFront.
 The functionality of the range will be distributed between these 3
 functions. Different things being ranged over will naturally split their
 operations into the 3 functions in different ways.

 If the 3 functions can be called in any order, then extra logic has to
 be added to them to signal to each other. This slows things down.

 To write generic code calling ranges, it's necessary to stop assuming
 what is happening under the hood of empty, front, and popFront, and
 stick to the protocol.

IIUC what you are proposing would be covered better by removing front 
and return the value by popFront.

Instead of forcing strong, breaking and sometimes unintuitive semantics 
we could also improve dmd's optimizer. This caching range example:

///////////////////////////////////
T getc(T)();

struct irange(T)
{
   bool _cached;
   T _cache;

   this(T[] arr) { _cached = false; }
   bool empty() {
     if(_cached) return false;
     _cache = getc!T();
     return (_cache < 0);
   }
   T front() { empty(); return _cache; }
   void popFront() { _cached = false; }
}

int foo(irange!int rg)
{
   int sum = 0;
   while(!rg.empty)
   {
     sum += rg.front;
   	rg.popFront;
   }
   return sum;
}
///////////////////////////////////

generates this code with "dmd2.065 -O -inline -release -noboundscheck -m64":

_D7irange23fooFS7irange213__T6irangeTiZ6irangeZi:
   0000: push        rbp
   0001: mov         rbp,rsp
   0004: push        rax
   0005: push        rsi
   0006: mov         qword ptr [rbp+10h],rcx
   000A: xor         esi,esi
   000C: lea         rcx,[rbp+10h]
   0010: sub         rsp,20h
   0014: call        _D7irange213__T6irangeTiZ6irange5emptyMFZb
   0019: add         rsp,20h
   001D: xor         al,1
   001F: je          0050
   0021: lea         rcx,[rbp+10h]
   0025: sub         rsp,20h
   0029: call        _D7irange213__T6irangeTiZ6irange5emptyMFZb
   002E: add         rsp,20h
   0032: mov         eax,dword ptr [rbp+14h]
   0035: add         esi,eax
   0037: mov         byte ptr [rbp+10h],0
   003B: lea         rcx,[rbp+10h]
   003F: sub         rsp,20h
   0043: call        _D7irange213__T6irangeTiZ6irange5emptyMFZb
   0048: add         rsp,20h
   004C: xor         al,1
   004E: jne         0021
   0050: mov         eax,esi
   0052: pop         rsi
   0053: lea         rsp,[rbp]
   0057: pop         rbp
   0058: ret

_D7irange213__T6irangeTiZ6irange5emptyMFZb:
   0000: push        rbp
   0001: mov         rbp,rsp
   0004: mov         qword ptr [rbp+10h],rcx
   0008: cmp         byte ptr [rcx],0
   000B: je          0011
   000D: xor         eax,eax
   000F: pop         rbp
   0010: ret
   0011: sub         rsp,20h
   0015: call        _D7irange211__T4getcTiZ4getcFZi
   001A: add         rsp,20h
   001E: mov         rcx,qword ptr [rbp+10h]
   0022: mov         dword ptr [rcx+4],eax
   0025: shr         eax,1Fh
   0028: mov         byte ptr [rcx],al
   002A: xor         al,1
   002C: pop         rbp
   002D: ret

and this with gdc (-O2 on godbolt.org):

int example.foo(example.irange!(int).irange):
	mov	rax, rdi
	push	rbx
	xor	ebx, ebx
	shr	rax, 32
	test	dil, dil
	je	.L5
.L2:
	add	ebx, eax
.L5:
	call	int example.getc!(int).getc()
	test	eax, eax
	js	.L2
	mov	eax, ebx
	pop	rbx
	ret

All traces of the caching but the initial state check have been removed, 
no extra calls.

Mar 26 2014

Rainer Schuetze <r.sagitario gmx.de> writes:

On 27.03.2014 07:53, Rainer Schuetze wrote:
 bool empty() {
      if(_cached) return false;
      _cache = getc!T();
      return (_cache < 0);
    }

Ouch, pasted before fixing:

   bool empty() {
     if(_cached) return false;
     _cache = getc!T();
     _cached = _cache >= 0; // this line added
     return !_cached;
   }

The asm is with an unintended "_cached = _cache < 0;", but that's the 
same, just accepting negative input instead of non-negative.

Mar 26 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
 IIUC what you are proposing would be covered better by removing front and
return
 the value by popFront.

Different things being ranged over make different decisions as to what goes
into 
each function.

 Instead of forcing strong, breaking and sometimes unintuitive semantics

I don't understand what is so unintuitive about:

     while (!empty) { ... front ...; popFront(); }

What I do find unintuitive and unattractive is all those flags in the range 
implementation.

 we could also improve dmd's optimizer.

Yes, that's true. The thing about optimizers, though, is they work on patterns. 
If your code doesn't quite match the right pattern, it won't tickle the 
optimization you might be expecting. The pattern for recognizing the ROL and
ROR 
patterns is one such.

Mar 27 2014

Rainer Schuetze <r.sagitario gmx.de> writes:

On 27.03.2014 09:42, Walter Bright wrote:
 On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
 IIUC what you are proposing would be covered better by removing front
 and return
 the value by popFront.

 Different things being ranged over make different decisions as to what
 goes into each function.

 Instead of forcing strong, breaking and sometimes unintuitive semantics

 I don't understand what is so unintuitive about:

      while (!empty) { ... front ...; popFront(); }

This loop is intuitive. Not being allowed to call empty or front 
multiple times or not at all is unintuitive. They should not be named as 
if they are properties then.

 What I do find unintuitive and unattractive is all those flags in the
 range implementation.

 we could also improve dmd's optimizer.

 Yes, that's true. The thing about optimizers, though, is they work on
 patterns. If your code doesn't quite match the right pattern, it won't
 tickle the optimization you might be expecting. The pattern for
 recognizing the ROL and ROR patterns is one such.

I guess gcc doesn't use patterns here, but value propagation and notices 
that all the checks are unnecessary.

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or front multiple times
 or not at all is unintuitive. They should not be named as if they are
properties
 then.

I can concede that. But I can't concede being able to call front without first 
calling empty, or calling popFront without calling empty and front, or
requiring 
'pump priming' in the constructor.

Mar 27 2014

"Tobias =?UTF-8?B?TcO8bGxlciI=?= <troplin bluewin.ch> writes:

On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
 On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or 
 front multiple times
 or not at all is unintuitive. They should not be named as if 
 they are properties
 then.

 I can concede that. But I can't concede being able to call 
 front without first calling empty, or calling popFront without 
 calling empty and front, or requiring 'pump priming' in the 
 constructor.

Disclaimer: I'm a C++ programmer just lurking here, I've never
actually used D.

I find it very counter-intuitive that 'empty' is required before
front or popFront.
Since 'pump priming' in the constructor isn't wanted either, i'd
suggest the following protocol:

while (popFront())
{
     front;
}

popFront is then required to return !empty.
'empty' as a separate property getter can stay but is not
required for the protocol.

This way, it's clear that the work to fetch the next element is
always done in popFront.

Generally I find dependecies between functions problematic that
require a specific call sequence. If they can be removed, they
should.

With my proposed solution, there's still one minor dependency,
namely that front is not valid before the first popFront. This
could be solved by again combining the two, as proposed by
someone else in this thread.

Tobi

Mar 28 2014

"Szymon Gatner" <noemail gmail.com> writes:

On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:
 On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
 On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or 
 front multiple times
 or not at all is unintuitive. They should not be named as if 
 they are properties
 then.

 I can concede that. But I can't concede being able to call 
 front without first calling empty, or calling popFront without 
 calling empty and front, or requiring 'pump priming' in the 
 constructor.

 Disclaimer: I'm a C++ programmer just lurking here, I've never
 actually used D.

 I find it very counter-intuitive that 'empty' is required before
 front or popFront.
 Since 'pump priming' in the constructor isn't wanted either, i'd
 suggest the following protocol:

 while (popFront())
 {
     front;
 }

In c++ if you had a list or deque you would obviously did 
if(empty()) before calling front() too, and the before a call to 
pop_front(). Container is kind of range and in c++ it looks 
exactly the same:

while (!cont.empty())
{
   auto& var = cont.front();
   cont.pop_front(); // consume
}

3 operations are needed.

Mar 28 2014

"w0rp" <devw0rp gmail.com> writes:

On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:
 On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
 On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or 
 front multiple times
 or not at all is unintuitive. They should not be named as if 
 they are properties
 then.

 I can concede that. But I can't concede being able to call 
 front without first calling empty, or calling popFront without 
 calling empty and front, or requiring 'pump priming' in the 
 constructor.

 Disclaimer: I'm a C++ programmer just lurking here, I've never
 actually used D.

 I find it very counter-intuitive that 'empty' is required before
 front or popFront.
 Since 'pump priming' in the constructor isn't wanted either, i'd
 suggest the following protocol:

 while (popFront())
 {
     front;
 }

 popFront is then required to return !empty.
 'empty' as a separate property getter can stay but is not
 required for the protocol.

 This way, it's clear that the work to fetch the next element is
 always done in popFront.

 Generally I find dependecies between functions problematic that
 require a specific call sequence. If they can be removed, they
 should.

 With my proposed solution, there's still one minor dependency,
 namely that front is not valid before the first popFront. This
 could be solved by again combining the two, as proposed by
 someone else in this thread.

 Tobi

Well, it might seem kind of weird at first that an InputRange has 
these concepts in three distinct methods, but they really are 
three separate operations. After I have spent enough time working 
with Java Iterators, I have found it much nicer to be able to get 
the current value or see if I'm at the end or not without having 
to worry about caching the current value myself. Also as I've 
said before it gives you some opportunity to make trying to fetch 
something out of an empty range less error prone, because you 
don't have to check for a null value, etc.

Mar 28 2014

"Oscar =?UTF-8?B?TWFydMOtbiI=?= <omarmed0000 hotmail.com> writes:

On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:
 On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
 On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or 
 front multiple times
 or not at all is unintuitive. They should not be named as if 
 they are properties
 then.

 I can concede that. But I can't concede being able to call 
 front without first calling empty, or calling popFront without 
 calling empty and front, or requiring 'pump priming' in the 
 constructor.

 Disclaimer: I'm a C++ programmer just lurking here, I've never
 actually used D.

 I find it very counter-intuitive that 'empty' is required before
 front or popFront.
 Since 'pump priming' in the constructor isn't wanted either, i'd
 suggest the following protocol:

 while (popFront())
 {
     front;
 }

 popFront is then required to return !empty.
 'empty' as a separate property getter can stay but is not
 required for the protocol.

 This way, it's clear that the work to fetch the next element is
 always done in popFront.

 Generally I find dependecies between functions problematic that
 require a specific call sequence. If they can be removed, they
 should.

 With my proposed solution, there's still one minor dependency,
 namely that front is not valid before the first popFront. This
 could be solved by again combining the two, as proposed by
 someone else in this thread.

 Tobi

First sorry for my english.

I agree.

All work is done in "bool popFront()" (well, this name may no 
longer be the most appropriate).
In this function the first / next item is obtained and caches. 
Returns true if any element, false otherwise.
"front" return de cached element as now, as many times as desired 
and without side effects.
"empty" function is no longer necessary, but it might be useful 
to keep changing the return type for example to int:

     1 - Sure there is more elements
     0 - Sure there is NO more elements
    -1 - I don't known. Try popFront to know if more element

Of course this function as "front", without side effects

Mar 30 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
  This caching range example:

 ///////////////////////////////////
 T getc(T)();

 struct irange(T)
 {
    bool _cached;
    T _cache;

    this(T[] arr) { _cached = false; }
    bool empty() {
      if(_cached) return false;
      _cache = getc!T();
      return (_cache < 0);

What happens if empty is called twice in a row? Two characters get read! That 
isn't right.

    }
    T front() { empty(); return _cache; }

What happens if empty returns true? EOF? I don't think that's intuitive. You 
could have front throw, but that prevents anyone from building nothrow ranges.

    void popFront() { _cached = false; }
 }

Mar 27 2014

Rainer Schuetze <r.sagitario gmx.de> writes:

On 27.03.2014 10:06, Walter Bright wrote:
 On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
  This caching range example:

 ///////////////////////////////////
 T getc(T)();

 struct irange(T)
 {
    bool _cached;
    T _cache;

    this(T[] arr) { _cached = false; }
    bool empty() {
      if(_cached) return false;
      _cache = getc!T();
      return (_cache < 0);

 What happens if empty is called twice in a row? Two characters get read!
 That isn't right.

Good catch. Somehow I pasted the wrong version, but posted the 
corrections a few minutes later.

    }
    T front() { empty(); return _cache; }

 What happens if empty returns true? EOF? I don't think that's intuitive.
 You could have front throw, but that prevents anyone from building
 nothrow ranges.

The caller is told to guarantee that empty must not return true. I just 
didn't wanted to repeat the stuff in empty. GDC removed it anyway...

Mar 27 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/27/2014 12:24 PM, Rainer Schuetze wrote:
 The caller is told to guarantee that empty must not return true.

I think that is the very definition of looking under the hood in order to 
violate protocol. A range CANNOT be written generically and support that.

Mar 27 2014

"Regan Heath" <regan netmail.co.nz> writes:

On Thu, 27 Mar 2014 02:19:13 -0000, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 if(!r.empty)
 {
     auto r2 = map!(x => x * 2)(r);
     do
     {
        auto x = r2.front;
        ...
     } while(!r2.empty);
 }

if(r.empty)
   return;

auto r2 = map!(x => x * 2)(r);
while(!r2.empty)
{
    auto x = r2.front;
    ...
    r2.popFront();  //bug fix for your version which I noticed because I  
followed "the pattern" :D
}

ahh.. much better.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Mar 27 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, March 26, 2014 10:36:08 Andrei Alexandrescu wrote:
 On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:
 On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>
 
 wrote:
 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer
 
 <schveiguy yahoo.com> wrote:
 Gah, I didn't cut out the right rules. I meant the two rules that
 empty must be called before others. Those are not necessary.

 
 I see.  I was thinking we ought to make empty mandatory to give more
 guaranteed structure for range implementors, so lazy initialisation
 can be done in one place only, etc etc.

 
 Yes, but when you know that empty is going to return false, there isn't
 any logical reason to call it. It is an awkward requirement.
 
 I had the same thinking as you, why pay for an extra check for all 3
 calls? But there was already evidence that people were avoiding empty.

 
 I think requiring users to call empty before front on input ranges is a
 concession we should make.

I don't know. It's not the end of the world if we require it (at least with a 
range that doesn't have length), since you're almost always going to be forced 
to call empty anyway, because ranges are generally used in generic code. 
However, it seems really off to me for there to be any real work going on in 
empty, and I'd be leery of any range which actually required empty to be 
called before calling front (though I definitely think that calling front on 
an empty range should not be considered okay and would expect that to 
generally be undefined behavior).

Regardless, if the range has length, then I'd think that quite a few of the 
algorithms which actually used length wouldn't actually need to call empty, 
making calling empty unnecessary overhead. But for the most part, I think that 
a lot of the weird cases go away when you end up with length and random-access 
ranges and the like, since that usually means that the range isn't generative 
but likely has an array underneath (though map certainly has its quirks thanks 
to the fact that front could technically allocate a new object on every call, 
and it can be random-access).

- Jonathan M Davis

Mar 26 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/26/2014 11:48 PM, Jonathan M Davis wrote:
 However, it seems really off to me for there to be any real work going on in
 empty,

Many things, like some ttys, can not be tested for being empty without reading 
it, and reading of a tty is destructive.

 and I'd be leery of any range which actually required empty to be
 called before calling front (though I definitely think that calling front on
 an empty range should not be considered okay and would expect that to
 generally be undefined behavior).

That's exactly why empty should be called before front - because front is 
expected to succeed.

Mar 27 2014

=?UTF-8?B?Ikx1w61z?= Marques" <luis luismarques.eu> writes:

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is.

Some thoughts:

- A while ago I realized I did not know how some of those 
underspecified details should be implemented (I think my initial 
doubt was if the range was expected/required to enforce(!empty)). 
Assuming it was just my ignorance (and incomplete or hard to find 
documentation), I asked in the IRC channel and, IIRC, the answer 
I got there was that ranges are an abstract concept, so such 
minutia were supposed to be implementation details. From this 
forum thread and that IRC feedback, it seems that, indeed, people 
have been operating under an illusion of a shared universal 
assumption. I'm very happy to see this addressed.

- (Lots of text deleted because I was no longer sure about it ...)

- Is it allowed for an InputRange to become empty and then not 
empty again? For a finite RandomAccessRange to increase in 
length? E.g.:

     // Example 1: (with "outside" control flow)
     if(someQueue.empty)
         addStuffToQueue();
     assert(!someQueue.empty);

     // Example 2: (with only range-related control flow)
     if(!someQueue.empty)
     {
         auto n = someQueue.length;
         someQueue.popFront();
         assert(someQueue.length == n-1); // can this fail?
     }

- Is it allowed to not call front? E.g.:

     void dropAll(Range)(Range range)
     {
         while(!empty)
              range.popFront();
     }

Mar 27 2014

"Daniel Murphy" <yebbliesnospam gmail.com> writes:

""Luís Marques " wrote in message 
news:exmfmqftpykoaxdgluit forum.dlang.org...

 - Is it allowed for an InputRange to become empty and then not empty 
 again? For a finite RandomAccessRange to increase in length? E.g.:

Not by itself - if a range's empty has returned true, calling empty 
repeatedly should continue to return true.

If you change it externally (ie not by using the range interface) then it 
can become non-empty.

eg

int[] arr;

assert(arr.empty);
arr ~= 3;
// no longer empty, but we used a method outside the range interface to 
change it.

I don't think a random number generator or some hardware device producing 
more data would be a good reason to change empty - range.empty is not asking 
'is there more data ready', it's asking 'are there items left to iterate 
over'.

 - Is it allowed to not call front? E.g.:

Yes.

Mar 29 2014

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is.

As someone who has missed this discussion, can I just say this: 
can we please document the result somewhere, and possibly even 
announce it clearly so that people know that something has 
changed?

Mar 29 2014

D Programming

C/C++ Programming

Other

digitalmars.D - protocol for using InputRanges