www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - protocol for using InputRanges

reply Walter Bright <newshound2 digitalmars.com> writes:
It's become clear to me that we've underspecified what an InputRange is. The 
normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

no argument there. But there are two issues:

1. If you know the range is not empty, is it allowed to call r.front without 
calling r.empty first?

If this is true, extra logic will need to be added to r.front in many cases.

2. Can r.front be called n times in a row? I.e. is calling front() destructive?

If true, this means that r.front will have to cache a copy in many cases.
Mar 22 2014
next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

If `front` was destructive there would be little point in having it separate from `popFront`. I think it must be non-destructive to make sense.
Mar 22 2014
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
         auto e = r.front;

Remember that front is a getter property, which means it should work like a variable. Typically, reading a variable is not destructive and needs no preparation. There's exceptions to the rule, but they almost always work this way, so ranges should too.
 If true, this means that r.front will have to cache a copy in 
 many cases.

Indeed, in many of my quick input ranges, I just make front a regular variable and popFront updates it to the next item.
Mar 22 2014
prev sibling next sibling parent reply "Rikki Cattermole" <alphaglosined gmail.com> writes:
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

It would be nice to have ranges full stop described. And how to use/make them in D. I have yet to learn them because of no official documentation on them. On this topic we also need to work on common patterns and creating documentation on e.g. the site or wiki for it. Saw that we do have a bit in the wiki under tutorials. Maybe if I get some time I'll work on that.
Mar 22 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 3:21 AM, Chris wrote:
 I agree. I've been using ranges for a while now and have tried
 different implementations based on advice given on this forum and
 depending on each case. After reading this thread I looked at
 some of my ranges and I have to say I still have no clue as to
 what should and should _not_ be done (regardless of whether you
 _can_ do it). For a while I thought that it's my lack of
 understanding, but this thread shows that everyone has a
 different view of ranges. Guidelines with use cases would be
 great. I remember I mentioned this in another thread already. The
 thing is that experimenting without proper guidelines leaves your
 code in an inconsistent state where you have two or more ranges
 doing technically the same thing but each with a different logic.

Following the protocol empty-front-popFront always works. Where the differences come in is when people skip calling empty.
Mar 27 2014
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, March 22, 2014 17:50:34 Walter Bright wrote:
 It's become clear to me that we've underspecified what an InputRange is. The
 normal way to use it is:
 
      while (!r.empty) {
          auto e = r.front;
          ... do something with e ...
          r.popFront();
      }
 
 no argument there. But there are two issues:
 
 1. If you know the range is not empty, is it allowed to call r.front without
 calling r.empty first?
 
 If this is true, extra logic will need to be added to r.front in many cases.

You definitely don't have to call empty before calling front if you know that it's not empty. Both front and empty should normally be pure (or at least act that way) and essentially act like variables. In most cases, it works best for the work of the range to go in popFront. The exception is when you're dealing with a random-access range, since then any element could be accessed, making it so that you can't be doing the work in popFront. I think that we have a general agreement on this based on previous discussions, though it's certainly not unanimous.
 2. Can r.front be called n times in a row? I.e. is calling front()
 destructive?
 
 If true, this means that r.front will have to cache a copy in many cases.

If calling front were destructive, that would break a lot of code. It's probably true that most range-based code should avoid calling front multiple times (in case front has to do more work than just return the value as well as to avoid copying the result if that happens on every call), though if front is auto ref, it could be more efficient to call it multiple times. So, it's not entirely clear-cut. But again, front and empty should normally function as if they were variables. They should be property functions and should be pure (or at least act like they're pure). I'm sure that a _lot_ of code will break if that isn't followed. There are corner cases which can get a bit mucky though - e.g. auto a = map!(to!string)(range); In this case, front is pure, but it returns a new value each time (albeit a value that's equal each time until popFront is called). And there's no real way to fix that if the resulting range is random access (though if it weren't, the work could go in popFront, which _would_ make it so that front always returned the same result). And there have been arguments over whether the result of front should be valid after popFront has been called (i.e. whether it's transient or not). A lot of code assumes that it will be, but we have some nasty exceptions (e.g. std.stdio.ByLine) - typically because front's a buffer which gets reused. IIRC, in those cases, Andrei favored saying that input ranges that weren't forward ranges could have a transient front but that forward ranges couldn't (which I tend to agree with, though I'd prefer that _no_ ranges have transient fronts, since it can really cause problems - e.g. std.array.array not working). I don't think that a consensus was reached on that though, since a few folks really liked using transient fronts with more complicated ranges. In general though, I think that most of us would agree that front and empty should be treated as properties - i.e. as if they were variables - and that they should have try to stick to those semantics as closely as possible. Ranges that stray from that seriously risk not working with a lot of range- based code. - Jonathan M Davis
Mar 23 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/23/2014 12:53 AM, Jonathan M Davis wrote:
 But again, front and empty should normally function as if they were variables.
 They should be property functions and should be pure (or at least act like
 they're pure).

This is simply not practical. The canonical example is a tty. You cannot tell if a character is available unless you try to read it. That is NOT pure.
 I'm sure that a _lot_ of code will break if that isn't followed.

It seems there have been a lot of undocumented assumptions about the behavior of ranges.
Mar 25 2014
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 22 Mar 2014 17:50:34 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 It's become clear to me that we've underspecified what an InputRange is. The 
 normal way to use it is:
 
      while (!r.empty) {
          auto e = r.front;
          ... do something with e ...
          r.popFront();
      }
 
 no argument there. But there are two issues:
 
 1. If you know the range is not empty, is it allowed to call r.front without 
 calling r.empty first?
 
 If this is true, extra logic will need to be added to r.front in many cases.

Looking at ranges as a user with all the subjectivity it entails, I'd expect .empty to be a read-only property, a safe-const-pure-nothrow candidate. Actions that are not logically const are commonly encapsulated in methods (as opposed to read-only properties like .empty) and often also carry a verb in their name like "create", "update" or "append".
 2. Can r.front be called n times in a row? I.e. is calling front() destructive?
 
 If true, this means that r.front will have to cache a copy in many cases.

On the other side of it were (IIRC) there are use cases for return-by-ref on .front: o .front gives access to a large struct. Return-by-ref can avoid a copy on the call site, but may result in several .front calls. o .front is a view into some struct which holds system resources and providing a copy requires internals as in the File struct. I both cases you deal with a range over value types that are costly to copy and where "auto e = r.front;" is to be avoided. I'm 100% unsure how to deal with this. -- Marco
Mar 23 2014
prev sibling next sibling parent reply "w0rp" <devw0rp gmail.com> writes:
I understand it like this.

* empty - Are there no more values?
* front - Get me the current value.
* popFront - Advance to the next value.

In terms of how I implement an InputRange in general, I typically 
end up with this.

* empty - Advance and cache "current value," return true if we 
ran out of values.
* front - enforce(!empty), which in turn caches the current 
value, which we then return.
* popFront - enforce(!empty), clear the cached value so we can 
later advance.

So .front gives you the same thing again and again until you call 
popFront, you could indeed call .front before .empty, but you may 
get an exception. This obviously isn't how I implement all 
InputRanges, as there are better ways to write other ranges, such 
as ranges which operate on sets of integers or wrap arrays. This 
is just my last resort general case InputRange implementation.

.front assignment obviously replaces the cached value.
Mar 23 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/23/2014 2:26 AM, w0rp wrote:
 .front assignment obviously replaces the cached value.

This is another undocumented assumption.
Mar 25 2014
prev sibling next sibling parent "Szymon Gatner" <noemail gmail.com> writes:
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

IMO: yes. Logic of empty() sould be const and not have side effects.
 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

Yes, all true. Not sure if there is something like that in Phobos but if for example you had RandomValueRange, ever call to front() should return the same random number until popFront() is called at which point internal front variable is being recalculated and cached. Since only popFront() is non-const, it is there where all the magic/mutation should happen. In my code I also have CachingRange which calls front() on underlying range the first time and then stores result internally. I use it for ranges which generate front() on the fly and I knot it is expensive. I could cache that calculation directly in that underlying range but CachingRange is reusable.
Mar 23 2014
prev sibling next sibling parent "w0rp" <devw0rp gmail.com> writes:
I should add that I have implemented some ranges where .front and 
.popFront are both nothrow, as !empty doesn't "advance and cache" 
for these ranges and the check is moved into an in{} contract. 
For these ranges, they tend to behave like arrays with bounds 
checking, only now the bounds checking is turned off by virtue of 
-release.
Mar 23 2014
prev sibling next sibling parent "Szymon Gatner" <noemail gmail.com> writes:
On Sunday, 23 March 2014 at 09:26:53 UTC, w0rp wrote:
 I understand it like this.

 * empty - Are there no more values?
 * front - Get me the current value.
 * popFront - Advance to the next value.

That is correct.
 In terms of how I implement an InputRange in general, I 
 typically end up with this.

 * empty - Advance and cache "current value," return true if we 
 ran out of values.
 * front - enforce(!empty), which in turn caches the current 
 value, which we then return.
 * popFront - enforce(!empty), clear the cached value so we can 
 later advance.

 So .front gives you the same thing again and again until you 
 call popFront, you could indeed call .front before .empty, but 
 you may get an exception. This obviously isn't how I implement 
 all InputRanges, as there are better ways to write other 
 ranges, such as ranges which operate on sets of integers or 
 wrap arrays. This is just my last resort general case 
 InputRange implementation.

 .front assignment obviously replaces the cached value.

That is not consistent with the first part of your reply and is incorrect imho. Calling empty() twice should not destroy anything, even according to your understanding. Neither should front(). User should be able to call them as many times as he wishes getting same value every time. Only popFront() mutate the range.
Mar 23 2014
prev sibling next sibling parent "Szymon Gatner" <noemail gmail.com> writes:
On Sunday, 23 March 2014 at 09:34:28 UTC, Szymon Gatner wrote:
 On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

IMO: yes. Logic of empty() sould be const and not have side effects.
 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

Yes, all true. Not sure if there is something like that in Phobos but if for example you had RandomValueRange, ever call to front() should return the same random number until popFront() is called at which point internal front variable is being recalculated and cached. Since only popFront() is non-const, it is there where all the magic/mutation should happen. In my code I also have CachingRange which calls front() on underlying range the first time and then stores result internally. I use it for ranges which generate front() on the fly and I knot it is expensive. I could cache that calculation directly in that underlying range but CachingRange is reusable.

I tend to think about pair of C++ iterators, which are generalization of pointers. So for C pointers 'first' and 'last': *first -> front() first != last -> empty() ++first -> popFont() And I stick to semantics.
Mar 23 2014
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 23/03/14 08:53, Jonathan M Davis wrote:
 But again, front and empty should normally function as if they were variables.
 They should be property functions and should be pure (or at least act like
 they're pure). I'm sure that a _lot_ of code will break if that isn't
 followed.

There are some not-very-nice exceptions to that in std.random, where in many of the range types you have stuff along the following lines: auto front() property { if (notInitialized) { init(); } return _value; } see e.g. MersenneTwister, RandomSample, and others. It's all a nasty consequence of that RNGs-as-value-types problem we've discussed previously. However, the class-based std.random2 that is being discussed on the announce list fixes that: http://forum.dlang.org/thread/cyytvhixkqlbwkmiugex forum.dlang.org
 In general though, I think that most of us would agree that front and empty
 should be treated as properties - i.e. as if they were variables - and that
 they should have try to stick to those semantics as closely as possible.
 Ranges that stray from that seriously risk not working with a lot of range-
 based code.

Yes, absolutely.
Mar 23 2014
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 03/23/2014 01:50 AM, Walter Bright wrote:
 2. Can r.front be called n times in a row? I.e. is calling front()
 destructive?

 If true, this means that r.front will have to cache a copy in many cases.

An analogous problem exists and is more severe for RandomAccessRanges. import std.algorithm, std.range; class Obj{ this(int){} } void main(){ auto x=[1,2,3].map!(a=>new Obj(a)); auto a=x.front; auto b=x.front; auto c=x[0]; auto d=x[0]; assert(a is b); // fail assert(b is c); // fail assert(c is d); // fail }
Mar 23 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 23 March 2014 at 11:52:53 UTC, Timon Gehr wrote:
 On 03/23/2014 01:50 AM, Walter Bright wrote:
 2. Can r.front be called n times in a row? I.e. is calling 
 front()
 destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

An analogous problem exists and is more severe for RandomAccessRanges. import std.algorithm, std.range; class Obj{ this(int){} } void main(){ auto x=[1,2,3].map!(a=>new Obj(a)); auto a=x.front; auto b=x.front; auto c=x[0]; auto d=x[0]; assert(a is b); // fail assert(b is c); // fail assert(c is d); // fail }

I have an open proposal for a "cache" range adaptor that somewhat eliviates the problem: https://github.com/D-Programming-Language/phobos/pull/1364 But it restricts the range to bidirectional (for the reasons above). Actually, it's mimited to Bidir, but if you *do* use it as bidirectional, then 1 element will be evaluated twice. So I'm wondering if it might not be better to simply restrict it to forward...
Mar 24 2014
prev sibling next sibling parent "Tommi" <tommitissari hotmail.com> writes:
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

Is InputRange supposed to be a one-pass range and am I supposed to able to use InputRange as a lightweight wrapper for a stream? If the answer is yes, then I think the fundamental issue is that the empty-front-popFront interface is not optimal for something like InputRange. But given that's the interface we have, I think that InputRange's front must be allowed to be destructive because the stream it could potentially be wrapping is destructive.
Mar 23 2014
prev sibling next sibling parent "Tommi" <tommitissari hotmail.com> writes:
On Sunday, 23 March 2014 at 07:54:12 UTC, Jonathan M Davis wrote:
 If calling front were destructive, that would break a lot of 
 code.

I thought that "breaking existing code" meant either "causing existing code do something it wasn't supposed to do" or "causing existing code not compile", but you're using it in the meaning of "making existing code not conform to the language specification". Are you sure that's the correct use of the expression?
Mar 23 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 23 Mar 2014 09:00:17 -0400, Tommi <tommitissari hotmail.com> wrote:

 On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an InputRange  
 is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call r.front  
 without calling r.empty first?

 If this is true, extra logic will need to be added to r.front in many  
 cases.

 2. Can r.front be called n times in a row? I.e. is calling front()  
 destructive?

 If true, this means that r.front will have to cache a copy in many  
 cases.

Is InputRange supposed to be a one-pass range and am I supposed to able to use InputRange as a lightweight wrapper for a stream? If the answer is yes, then I think the fundamental issue is that the empty-front-popFront interface is not optimal for something like InputRange. But given that's the interface we have, I think that InputRange's front must be allowed to be destructive because the stream it could potentially be wrapping is destructive.

A range interface only works for a buffered stream, which naturally allows caching. -Steve
Mar 24 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 23 Mar 2014 05:34:26 -0400, Szymon Gatner <noemail gmail.com>  
wrote:

 On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 1. If you know the range is not empty, is it allowed to call r.front  
 without calling r.empty first?

IMO: yes. Logic of empty() sould be const and not have side effects.

Here is the crux of the issue. I think aside from Walter's question, we have situations where it is expensive to construct each value of a range. For a range that is never used, that is an expense that you don't have to pay. There was a post a little while ago about how File.byLine reads the first line on construction. Consider the case where stdin.byLine needs to be constructed, but not *iterated* before writing some information to stdout. This gives us a predicament that the first line has to be available before you can output any instructions. As we all know, stdin will block on first read, unless you piped in a file or something. This makes an interactive program simply incorrect. From the library point of view, caching the first line on construction is the most sensible thing -- This allows empty, front, and popFront to all be consistent across the entire iteration. Making empty or front do something different on the first access requires awkward machinery (a boolean indicating it should do so, plus a check every call thereafter). But from the user's point of view, sometimes you want lazy access to the first element, for logical reasons. I think in light of such possibilities, empty and front should not have to be const or pure. Logically, it can be viewed that way, but not technically. -Steve
Mar 24 2014
prev sibling next sibling parent "w0rp" <devw0rp gmail.com> writes:
On Sunday, 23 March 2014 at 09:38:43 UTC, Szymon Gatner wrote:
 On Sunday, 23 March 2014 at 09:26:53 UTC, w0rp wrote:
 I understand it like this.

 * empty - Are there no more values?
 * front - Get me the current value.
 * popFront - Advance to the next value.

That is correct.
 In terms of how I implement an InputRange in general, I 
 typically end up with this.

 * empty - Advance and cache "current value," return true if we 
 ran out of values.
 * front - enforce(!empty), which in turn caches the current 
 value, which we then return.
 * popFront - enforce(!empty), clear the cached value so we can 
 later advance.

 So .front gives you the same thing again and again until you 
 call popFront, you could indeed call .front before .empty, but 
 you may get an exception. This obviously isn't how I implement 
 all InputRanges, as there are better ways to write other 
 ranges, such as ranges which operate on sets of integers or 
 wrap arrays. This is just my last resort general case 
 InputRange implementation.

 .front assignment obviously replaces the cached value.

That is not consistent with the first part of your reply and is incorrect imho. Calling empty() twice should not destroy anything, even according to your understanding. Neither should front(). User should be able to call them as many times as he wishes getting same value every time. Only popFront() mutate the range.

You read wrong. Say you have this sequence of three numbers and 'x' represents the cached value being empty, for a range 'r.' --- 1 2 3 x r.empty 1 2 3 ^ r.empty 1 2 3 ^ r.front == 1 r.popFront() 1 2 3 x r.empty 1 2 3 ^ --- front and popFront each enforce(!empty) first, so this works, etc. --- 1 2 3 x r.popFront() # enforce(!empty); 1 2 3 ^ # Clear the cached value. 1 2 3 x --- So it works for any sequence of values, and you can call .empty and .front as much as you want without changing the result. It just so happens that it's actually .empty which does the work of actually fetching the next value. There are better ways to implement this for certain ranges, but this works for anything.
Mar 24 2014
prev sibling next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

I think there is one design mistake with current InputRange rules that makes usage so inconsistent. We have `empty` but don't have any distinct `not yet started` state. If calling `popFront` at least once was required before accessing `front`, I can't imagine the case where having any non-trivial code in `front` would have been necessary.
Mar 24 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/25/14, 12:16 PM, Dicebot wrote:
 On Monday, 24 March 2014 at 22:26:10 UTC, Joseph Rushton Wakeling wrote:
 On 24/03/14 14:20, Dicebot wrote:
 I think there is one design mistake with current InputRange rules
 that makes
 usage so inconsistent. We have `empty` but don't have any distinct
 `not yet
 started` state. If calling `popFront` at least once was required before
 accessing `front`, I can't imagine the case where having any
 non-trivial code in
 `front` would have been necessary.

I floated some ideas along those lines, for a "first" method that would be automatically called immediately before the first call to front, popFront, etc., whatever came first.

I was thinking about something more simple. Current pattern is: while (!r.empty) { auto useme = r.front; r.popFront; } And I think this would have been more practical: r.popFront; while (!r.empty) { // same } So every range is supposed to start in "init" state and provide data only after first popFront. No other changes. It is not a silver bullet but looks like an improvement over current design to me.

Suggestion: focusing on what to do within the present context is more productive. Andrei
Mar 25 2014
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 24/03/14 14:20, Dicebot wrote:
 I think there is one design mistake with current InputRange rules that makes
 usage so inconsistent. We have `empty` but don't have any distinct `not yet
 started` state. If calling `popFront` at least once was required before
 accessing `front`, I can't imagine the case where having any non-trivial code
in
 `front` would have been necessary.

I floated some ideas along those lines, for a "first" method that would be automatically called immediately before the first call to front, popFront, etc., whatever came first. The trouble is that it's so vague _when_ it should be called. Example from std.random: RandomSample not only has a .front which needs a check as to whether the first value has actually been selected, it also has an .index method which corresponds to the index of the chosen element of the range being sampled from. Now, how is a "first" method supposed to know which methods it should precede? Any method? Specified ones? In any case, even if we could get it to work, it'd be a convenience rather than a solution, and would still in effect result in a non-const .front. OTOH a requirement that .popFront() be called at least once before .front is accessed won't wash either. At least in the case of randomSample, the code required to generate the _first_ sample point is different from what is required to .popFront().
Mar 24 2014
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

     while (!r.empty) {
         auto e = r.front;
         ... do something with e ...
         r.popFront();
     }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

In many cases it makes sense to put things into front, for example if you read a file line by line, parsing each line to create a dictionary entry. Although it has been said that any logic should go into popFront, because front can be called several times. Since Ranges usually perform some kind of filtering mechanism and return something new, it makes sense that things go into front, or at least should be allowed to go into front. In my experience it is a bit of a "the hen or the egg" problem, to get something you have to call front, but the something should only be done in popFront. Ranges are often simpler and faster to write, when you put things into front, because in most of my use cases front is not called many times in a row, just once for each item. I still haven't found the golden way(s) of writing ranges.
Mar 25 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Monday, 24 March 2014 at 22:26:10 UTC, Joseph Rushton Wakeling 
wrote:
 On 24/03/14 14:20, Dicebot wrote:
 I think there is one design mistake with current InputRange 
 rules that makes
 usage so inconsistent. We have `empty` but don't have any 
 distinct `not yet
 started` state. If calling `popFront` at least once was 
 required before
 accessing `front`, I can't imagine the case where having any 
 non-trivial code in
 `front` would have been necessary.

I floated some ideas along those lines, for a "first" method that would be automatically called immediately before the first call to front, popFront, etc., whatever came first.

I was thinking about something more simple. Current pattern is: while (!r.empty) { auto useme = r.front; r.popFront; } And I think this would have been more practical: r.popFront; while (!r.empty) { // same } So every range is supposed to start in "init" state and provide data only after first popFront. No other changes. It is not a silver bullet but looks like an improvement over current design to me.
Mar 25 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Tuesday, 25 March 2014 at 19:44:46 UTC, Andrei Alexan
 Suggestion: focusing on what to do within the present context 
 is more productive.

 Andrei

I do work on template argument pack & friends right now, leave me some fun! :P
Mar 25 2014
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
It's pretty clear that:

1. the protocol is COMPLETELY undocumented and undefined.

2. in this vacuum, people (including me) have made all sorts of assumptions,
and 
assumed those assumptions were universal and shared.

Since ranges+algorithms are central to D, this is a very serious problem. There 
are two competing schools of thought:

1. maximum flexibility

2. maximum performance

(1) has been explained well in this thread already. (2) not so much, so I'll 
stand up for that one.

I've been recently involved in writing high performance apps, as anyone 
following ScopeBuffer knows. With such apps, it's clear that:

1. size matters - the fewer registers an object fits in, the faster it goes
2. every instruction matters - more operations == slower code
3. lazy seems to be faster, mainly because it eliminates the storage management 
cost of buffers and extra copying into and out of those buffers

We want to appeal to the high performance coders. To maximize performance, 
ranges should be optimized to the inner loop case, which is:

     while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

With the additional proviso that ranges are passed to algorithms by value, so 
they should be cheap to copy. Cheap to copy usually means them being small.

A) I know that my range is not empty, so I can skip calling empty.

Since front is guaranteed to succeed if !empty, this puts a requirement on many 
ranges that they have a non-trivial constructor that 'primes the pump'. Of 
course, priming may fail, and so construction may throw, which is not good 
design. Next, priming requires a buffer in the range. Buffers add size, making 
them slower to copy, and will often require another field saying if the buffer 
has data in it or not, further bumping the size. And lastly, it means that lazy 
ranges will be required to read the first element, even if the range isn't then 
subsequently used, which defeats what one would expect from a lazy range. By 
having mere construction of a range initiate reading from source, this makes
the 
useful idiom of separating construction from use impractical. (For example, 
building a pipeline object, then sending it to another thread for execution.)

All this saves for the user is one call to empty for the entire algorithm, at a 
cost incurred with every iteration. I.e. it selects O(n) to save O(1).

B) I want to be able to call front multiple times in a row, and expect to get 
the same value.

This can force the range to incorporate a one element buffer and a flag 
indicating the status of that buffer. The range instance gets bigger and more 
expensive to copy, and the cost of manipulating the flag and the buffer is
added 
to every loop iteration. Note that the user of a range can trivially use:
      auto e = r.front;
      ... using e multiple times ...
instead.

The big problem with (A) and (B) is these costs become present in every range, 
despite being rarely needed and having simple alternatives. This is the wrong 
place to put cost and complexity. The user should be making these decisions 
based on his needs.

Hence, I propose that the protocol for using input ranges be defined as:

     while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

This makes it possible to build pipelines that are firehoses with no kinks or 
constrictions in them. It optimizes for the inner loop case, not boundary cases.
Mar 25 2014
next sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 25 March 2014 at 20:15:32 UTC, Walter Bright wrote:
 It's pretty clear that:

 1. the protocol is COMPLETELY undocumented and undefined.

http://dlang.org/phobos/std_range.html#isInputRange The semantics of an input range (not checkable during compilation) are assumed to be the following (r is an object of type R): r.empty returns false iff there is more data available in the range. r.front returns the current element in the range. It may return by value or by reference. Calling r.front is allowed only if calling r.empty has, or would have, returned false. r.popFront advances to the next element in the range. Calling r.popFront is allowed only if calling r.empty has, or would have, returned false.
 We want to appeal to the high performance coders. To maximize 
 performance, ranges should be optimized to the inner loop case, 
 which is:

     while (!r.empty) { auto e = r.front; ... e ...; 
 r.popFront(); }

This makes the assumption that r.front is copy constructible at all. It also makes the assumption that you want to operate on a copy, rather than the actual element in the range. Finally, it means having to declare a local object: It merely means shifting the burden of caching from one context to another. If the object is large, chances are you are better off just calling front instead of making a copy. Especially if the loop is trivial. If you want high performance, then arguably, just provide a O(1) front, and a O(1) empty. Also, certain ranges, such as "filter" *must* access the front of the previous range more than once. Unless you are suggesting we add a field for it to cache the result of the previous range?
 With the additional proviso that ranges are passed to 
 algorithms by value, so they should be cheap to copy. Cheap to 
 copy usually means them being small.

Unfortunately yes. That said, any range that does anything will have at least two fields, one of which is a slice, or comparable to in terms of size, so it's going to be big anyways. So you *are* better off passing by ref if you can regardless, unless your range is *really* trivial. I agree that range sizes can be a problem.
 A) I know that my range is not empty, so I can skip calling 
 empty.

 Since front is guaranteed to succeed if !empty, this puts a 
 requirement on many ranges that they have a non-trivial 
 constructor that 'primes the pump'. Of course, priming may 
 fail, and so construction may throw, which is not good design.

If the prime fails, then the range can simply be marked as empty. Then if you decide to skip calling empty anyways, it's your own fault.
 And lastly, it means that lazy ranges will be required to read 
 the first element, even if the range isn't then subsequently 
 used, which defeats what one would expect from a lazy range.

I'm not yet convinced of adding special code for ranges that aren't used. I've heard of these kinds of ranges, but I've observed that when you declare one, it almost always ends up being used. I don't think we should waste efforts on this rare usecase. As for "Lazy", in range terms, it mostly only means you calculate things element at once, as you go up the range chain. As opposed to processing the entire input data, one transformation at a time. Evaluating an element "on use" as opposed to "1 instruction before use" doesn't make much of a change in this context.
 All this saves for the user is one call to empty for the entire 
 algorithm, at a cost incurred with every iteration. I.e. it 
 selects O(n) to save O(1).

If code that was actually meant to *do* something was in popFront() to begin with, then there'd be no extra overhead. I've found that if you are creative enough, you can usually design the range in such a way that it works efficiently, lazilly, and without flags.
 Hence, I propose that the protocol for using input ranges be 
 defined as:

     while (!r.empty) { auto e = r.front; ... e ...; 
 r.popFront(); }

 This makes it possible to build pipelines that are firehoses 
 with no kinks or constrictions in them. It optimizes for the 
 inner loop case, not boundary cases.

I get where you are coming from, but it's simply not manageable in a generic fashion, if you want to be able to preserve all the power and the diversity of the ranges we have. The protocol you are suggesting would prevent us from doing a lot of the lovely things that ranges empowers us with.
Mar 25 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/25/2014 1:56 PM, monarch_dodra wrote:
 http://dlang.org/phobos/std_range.html#isInputRange

 The semantics of an input range (not checkable during compilation) are assumed
 to be the following (r is an object of type R):
 r.empty returns false iff there is more data available in the range.
 r.front returns the current element in the range. It may return by value or by
 reference. Calling r.front is allowed only if calling r.empty has, or would
 have, returned false.
 r.popFront advances to the next element in the range. Calling r.popFront is
 allowed only if calling r.empty has, or would have, returned false.

I overlooked that. Thanks.
 We want to appeal to the high performance coders. To maximize performance,
 ranges should be optimized to the inner loop case, which is:

     while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

This makes the assumption that r.front is copy constructible at all. It also makes the assumption that you want to operate on a copy, rather than the actual element in the range.

It's a reasonable requirement. If your range has an issue with this, it can return a pointer to the element, and the element can be a struct with access functions. Then, the pointer will work as well as a copy.
 Finally, it means having to declare a local object: It merely means shifting
the
 burden of caching from one context to another. If the object is large, chances
 are you are better off just calling front instead of making a copy. Especially
 if the loop is trivial.

This does come up as an issue, and is solvable by returning a pointer as I described. It's up to the designer of the range.
 If you want high performance, then arguably, just provide a O(1) front, and a
 O(1) empty.

I don't think the issues can be waved away so easily, or I wouldn't have brought this up.
 Also, certain ranges, such as "filter" *must* access the front of the previous
 range more than once.
 Unless you are suggesting we add a field for it to cache
 the result of the previous range?

This is putting the cost where it belongs - when needed on the user of a range, rather than on all ranges. It's the "pay only for what you need" idea rather than "pay regardless".
 With the additional proviso that ranges are passed to algorithms by value, so
 they should be cheap to copy. Cheap to copy usually means them being small.

two fields, one of which is a slice, or comparable to in terms of size, so it's going to be big anyways. So you *are* better off passing by ref if you can regardless, unless your range is *really* trivial.

Many very useful ranges are trivial. Or at least they should be. An array, for example, is a trivial range.
 A) I know that my range is not empty, so I can skip calling empty.

 Since front is guaranteed to succeed if !empty, this puts a requirement on
 many ranges that they have a non-trivial constructor that 'primes the pump'.
 Of course, priming may fail, and so construction may throw, which is not good
 design.

If the prime fails, then the range can simply be marked as empty. Then if you decide to skip calling empty anyways, it's your own fault.

Yes, one can add state flags to indicate failed construction, which I'll argue is an ugly design. After all, construction is supposed to construct an object or fail, not leave the 'constructed' object in a zombie state.
 And lastly, it means that lazy ranges will be required to read the first
 element, even if the range isn't then subsequently used, which defeats what
 one would expect from a lazy range.

I'm not yet convinced of adding special code for ranges that aren't used. I've heard of these kinds of ranges, but I've observed that when you declare one, it almost always ends up being used. I don't think we should waste efforts on this rare usecase.

It's not rare - it's the primary way ranges are used in C# Linq. Should we throw this entire category of use cases under the bus just to handle a convenience of not needing to call empty in some cases?
 Evaluating an element "on use" as opposed to "1 instruction before use" doesn't
 make much of a change in this context.

Except that it requires the use to start upon construction, which defeats any hope of separating construction of a pipeline from using it.
 I've found that if you are creative enough, you can usually design the range in
 such a way that it works efficiently, lazilly, and without flags.

That's not been my experience.
 I get where you are coming from, but it's simply not manageable in a generic
 fashion, if you want to be able to preserve all the power and the diversity of
 the ranges we have.

 The protocol you are suggesting would prevent us from doing a lot of the lovely
 things that ranges empowers us with.

Please show me such a case. Note that I've shown above that this power and diversity throws an entire use case category under the bus. Secondly, in my experiments with ranges, the power and diversity results in slower pipelines.
Mar 25 2014
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/25/14, 1:15 PM, Walter Bright wrote:
 Of course, priming may fail, and so construction may throw, which is not
 good design.

This part I disagree with. -- Andrei
Mar 25 2014
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/25/14, 1:15 PM, Walter Bright wrote:
 Next, priming requires a buffer in the range.

No, front requires a buffer in the range. -- Andrei
Mar 25 2014
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I thought I had only 1-2 comments, but I have a few more.

On 3/25/14, 1:15 PM, Walter Bright wrote:
 1. the protocol is COMPLETELY undocumented and undefined.

s/COMPLETELY/loosely/
 Since ranges+algorithms are central to D, this is a very serious
 problem.

Agreed.
 We want to appeal to the high performance coders. To maximize
 performance, ranges should be optimized to the inner loop case, which is:

      while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

Actually generic code often prefers: while (!r.empty) { ... r.front ...; r.popFront(); } That saves copying r.front if it returns by ref. A bunch of algos in std do that.
 Since front is guaranteed to succeed if !empty, this puts a requirement
 on many ranges that they have a non-trivial constructor that 'primes the
 pump'. Of course, priming may fail, and so construction may throw, which
 is not good design.

Again, I disagree with this assertion.
 Next, priming requires a buffer in the range.

Priming has nothing do to with the range being buffered. The entire design of ranges is a one-element buffer.
 Buffers add size, making them slower to copy, and will often require
 another field saying if the buffer has data in it or not, further
 bumping the size.

That's an argument for adding an unbuffered range abstraction.
 All this saves for the user is one call to empty for the entire
 algorithm, at a cost incurred with every iteration. I.e. it selects O(n)
 to save O(1).

I don't understand how that O() reasoning works.
 B) I want to be able to call front multiple times in a row, and expect
 to get the same value.

Correct.
 This can force the range to incorporate a one element buffer and a flag
 indicating the status of that buffer.

It may, but many ranges naturally work that way (e.g. arrays).
 The range instance gets bigger and
 more expensive to copy, and the cost of manipulating the flag and the
 buffer is added to every loop iteration. Note that the user of a range
 can trivially use:
       auto e = r.front;
       ... using e multiple times ...
 instead.

That would pessimize code using arrays of large structs.
 The big problem with (A) and (B) is these costs become present in every
 range, despite being rarely needed and having simple alternatives. This
 is the wrong place to put cost and complexity. The user should be making
 these decisions based on his needs.

 Hence, I propose that the protocol for using input ranges be defined as:

      while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

 This makes it possible to build pipelines that are firehoses with no
 kinks or constrictions in them. It optimizes for the inner loop case,
 not boundary cases.

The proposed protocol pessimizes arrays of large structs and will trip the unwary if calling r.front again returns something else. I don't think the proposal is good. Andrei
Mar 25 2014
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 03/25/2014 10:29 PM, Andrei Alexandrescu wrote:
 B) I want to be able to call front multiple times in a row, and expect
 to get the same value.

Correct.

'map' may fail this criterion. In this case, is the blame put on the user?
Mar 25 2014
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:
 We want to appeal to the high performance coders. To maximize
 performance, ranges should be optimized to the inner loop case, which is:

      while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

Actually generic code often prefers: while (!r.empty) { ... r.front ...; r.popFront(); } That saves copying r.front if it returns by ref.

Yes, I agree. The idiom foreach (ref e; r) { ... } must also work.
 Since front is guaranteed to succeed if !empty, this puts a requirement
 on many ranges that they have a non-trivial constructor that 'primes the
 pump'. Of course, priming may fail, and so construction may throw, which
 is not good design.

Again, I disagree with this assertion.

The throwing part, or the not good design part?
 Next, priming requires a buffer in the range.

ranges is a one-element buffer.
 Buffers add size, making them slower to copy, and will often require
 another field saying if the buffer has data in it or not, further
 bumping the size.


I don't know how that would fit into the ecosystem. In any case, with the protocol I suggest, the designer of the range is free to distribute the functionality into the three functions, however it makes best sense. With arbitrarily calling any in any order, then all three need to have some sort of signalling mechanism between them.
 All this saves for the user is one call to empty for the entire
 algorithm, at a cost incurred with every iteration. I.e. it selects O(n)
 to save O(1).

I don't understand how that O() reasoning works.

I meant adding extra signalling with every iteration of the loop, rather than requiring the user to call empty before front.
 B) I want to be able to call front multiple times in a row, and expect
 to get the same value.

Correct.
 This can force the range to incorporate a one element buffer and a flag
 indicating the status of that buffer.

It may, but many ranges naturally work that way (e.g. arrays).

Sure, but there are far more range types than arrays.
 The range instance gets bigger and
 more expensive to copy, and the cost of manipulating the flag and the
 buffer is added to every loop iteration. Note that the user of a range
 can trivially use:
       auto e = r.front;
       ... using e multiple times ...
 instead.

That would pessimize code using arrays of large structs.

You're already requiring copying with the buffering requirement. And besides, if I was creating a range of expensive-to-copy objects, I would add a layer of indirection to cheapen it.
 The proposed protocol pessimizes arrays of large structs

Not really more than the existing protocol does (i.e. required buffering).
 and will trip the unwary if calling r.front again returns something else.

I'm not proposing that calling them wrongly would make things unsafe. But I don't think it's unreasonable to expect the protocol to be followed, just like file open/read/close.
 I don't think the proposal is good.

I don't like being in the position of when I need high performance code, I have to implement my own ranges & algorithms, or telling customers they need to do so.
Mar 25 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:
 On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>
 wrote:

 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Gah, I didn't cut out the right rules. I meant the two rules that
 empty must be called before others. Those are not necessary.

I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.

Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty.

I think requiring users to call empty before front on input ranges is a concession we should make. Andrei
Mar 26 2014
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
 if(!r.empty)
 {
     auto r2 = map!(x => x * 2)(r);
     do
     {
        auto x = r2.front;
        ...
     } while(!r2.empty);
 }

 Should we be required to re-verify that r2 is not empty before using it? It
 clearly is not, and would be an artificial requirement (one that map likely
 would not enforce!).

 This sounds so much like a convention that simply won't be followed, and the
 result will be perfectly valid code.

The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways. If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down. To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.
Mar 26 2014
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
 OK, but it's logical to assume you *can* avoid a call to empty if you know
 what's going on under the hood, no? Then at that point, you have lost the
 requirement -- people will avoid calling empty because they can get away with
 it, and then altering the under-the-hood requirements cause code breakage
later.

 Case in point, the pull request I referenced, the author originally tried to
 just use empty to lazily initialize filter, but it failed due to existing code
 in phobos that did not call empty on filtered data before processing. He had to
 instrument all 3 calls.

As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows. If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it.
Mar 26 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/14, 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at the
 documentation, which says ([1] in the documentation of `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would have,
 returned false."

Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively. Andrei
Mar 27 2014
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:
 At the moment, the only candidates I can think of are lazy caching
 ranges. How many of those exist?

filter comes to mind. -- Andrei
Mar 27 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/14, 11:53 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra
 <monarchdodra gmail.com> wrote:

 On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote:
 On 3/27/14, 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at the
 documentation, which says ([1] in the documentation of
 `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would have,
 returned false."

Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively.

I just want to point out that I think allowing empty to have an *observable* side effect will have cataclysmic repercussion in validating a release build, what with all our "assert(!empty);" calls and all.

This is an excellent point. assert(!empty) is everywhere. -Steve

Yah, agreed. -- Andrei
Mar 27 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

Completely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.
Mar 27 2014
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 2:13 PM, Steven Schveighoffer wrote:
 Note that a stream makes a terrible range, simply because of what you say --
 reading is destructive, and determining emptiness is dependent on reading. You
 need a buffered stream to make a good range, and then the logic becomes much
 more straightforward.

The range becomes the one element buffer in this case. It is completely workable.
 The awkwardness for shoehorning streams into ranges I see is that the
 advancement (popFront) and the check for termination (empty) are decoupled,
when
 in reality they are synced for a stream. This REQUIRES a sticky bit for
popFront
 to communicate with empty on whether it is EOF or not.

The range protocol is designed to work with streams. It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams.
 Even a single byte buffer is not enough, you need a bool to indicate the stream
 is done.

Right. But empty for a stream still has to read. Just follow the protocol, and the range will work, even with streams.
Mar 27 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

It's designed to work with containers.
 It's a giant fail
 if they do not, or if you want to create a separate, non-range universe
 to deal with streams.

It's not a giant fail, we just need to adjust the notion. Andrei
Mar 27 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
 On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

It's designed to work with containers.

I know we talked about streams when we designed it.
 It's a giant fail
 if they do not, or if you want to create a separate, non-range universe
 to deal with streams.

It's not a giant fail, we just need to adjust the notion.

Are you suggesting that ranges needn't support streams? Note also that I suggested a way Steven could create an adapter with the behavior he desired, yet still adhere to protocol. No notion adjustments required.
Mar 27 2014
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,
 IEnumerator in c#,
 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.

Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string? empty-front-popFront works with streams and non-streams. Why break this?
Mar 28 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
28-Mar-2014 13:55, Walter Bright пишет:
 On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,
 IEnumerator in c#,
 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.

Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string?

Certainly NOT a socket. There is no escaping the fact that there are specifics to unbuffered direct streams. What you mention only makes sense with buffering either implicit or (I'd prefer) explicit.
 empty-front-popFront works with streams and non-streams. Why break this?

Pipe dream. -- Dmitry Olshansky
Mar 28 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/28/2014 9:48 AM, Dmitry Olshansky wrote:
 28-Mar-2014 13:55, Walter Bright пишет:
 On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,
 IEnumerator in c#,
 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.

Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string?

Certainly NOT a socket. There is no escaping the fact that there are specifics to unbuffered direct streams. What you mention only makes sense with buffering either implicit or (I'd prefer) explicit.

Yes, it does require a one element buffer. But seriously, does a one character buffer from a socket have a measurable impact on reading from a network? I'm an efficiency wonk as much or more than anyone, and this appears to me to be a false savings.
Mar 28 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
28-Mar-2014 21:07, Walter Bright пишет:
 On 3/28/2014 9:48 AM, Dmitry Olshansky wrote:
 28-Mar-2014 13:55, Walter Bright пишет:
 On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,
 IEnumerator in c#,
 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.

Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string?

Certainly NOT a socket. There is no escaping the fact that there are specifics to unbuffered direct streams. What you mention only makes sense with buffering either implicit or (I'd prefer) explicit.

Yes, it does require a one element buffer. But seriously, does a one character buffer from a socket have a measurable impact on reading from a network?

WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code.
 I'm an efficiency wonk as much or more than anyone, and this
 appears to me to be a false savings.

Oh crap. This is very wrong. Do you often work with I/O and networking? -- Dmitry Olshansky
Mar 28 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
 WAT? The overhead is in issuing system calls, you'd want to do as little of
them
 as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

That's why we have things like byLine().
Mar 28 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
28-Mar-2014 22:29, Walter Bright пишет:
 On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
 WAT? The overhead is in issuing system calls, you'd want to do as
 little of them
 as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

That's why we have things like byLine().

Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW. -- Dmitry Olshansky
Mar 28 2014
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:
 Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even
 though sys calls are amortized by C runtime, we have a function call per byte.
 No wonder it's SLOW.

How about a PR to fix it?
Mar 28 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
29-Mar-2014 02:05, Walter Bright пишет:
 On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:
 Which uses C's BUFFERED I/O and it reads from it byte by byte via
 getc. Even
 though sys calls are amortized by C runtime, we have a function call
 per byte.
 No wonder it's SLOW.

How about a PR to fix it?

Sorry, I'm not inclined to do any work on hacking arbitrary C runtime libraries. Too much work on reverse engineering and making sure it stays in sync. We need new I/O package and hopefully Steven has something brewing. -- Dmitry Olshansky
Mar 29 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/28/14, 11:40 AM, Dmitry Olshansky wrote:
 28-Mar-2014 22:29, Walter Bright пишет:
 On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
 WAT? The overhead is in issuing system calls, you'd want to do as
 little of them
 as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

That's why we have things like byLine().

Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW.

byLine doesn't use getc. -- Andrei
Mar 28 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
29-Mar-2014 06:47, Andrei Alexandrescu пишет:
 On 3/28/14, 11:40 AM, Dmitry Olshansky wrote:
 28-Mar-2014 22:29, Walter Bright пишет:
 On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:
 WAT? The overhead is in issuing system calls, you'd want to do as
 little of them
 as possible. Reading byte by byte is an exemplar of idiocy in I/O code.

That's why we have things like byLine().

Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW.

byLine doesn't use getc. -- Andrei

Indeed not every version. Linux looks almost OK. There is still this stuff in empty: https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1604 And looking through the code I see that Win64, OSX and FreeBSD versions still use getc. Win32 hacks right into DMC run-time, and on linux getdelim is used. The point about ranges only ever making sense over buffered I/O still stands. Preferably not C runtime. -- Dmitry Olshansky
Mar 29 2014
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
28-Mar-2014 04:20, Walter Bright пишет:
 On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
 On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

It's designed to work with containers.

I know we talked about streams when we designed it.

If streams are like hot raw sockets then certainly it doesn't make sense. Ranges work nice when there are some "slots" down below, that is a buffer.
 It's a giant fail
 if they do not, or if you want to create a separate, non-range universe
 to deal with streams.

It's not a giant fail, we just need to adjust the notion.

Are you suggesting that ranges needn't support streams?

Yes they "don't support streams". They are an abstraction somewhere on top of buffering. This is where they fit nicely. -- Dmitry Olshansky
Mar 28 2014
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
 Adding range primitives on top of a stream does not make sense.

Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone? I'm also curious what a generic read() should do when the stream is empty.
Mar 27 2014
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/28/14, 3:42 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
 Adding range primitives on top of a stream does not make sense.

Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone?

Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges.
 I'm also curious what a generic read() should do when the stream is
 empty.

Returns 0 elements read.

It increasingly seems to me we need to do this. -- Andrei
Mar 28 2014
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:
 I'm also curious what a generic read() should do when the stream is empty.

Returns 0 elements read.

Meaning it must also write through a pointer if it did read something. How is that faster than a 1 element buffer?
Mar 28 2014
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 3:23 PM, QAston wrote:
 The protocol is not intuitive.

I find empty-front-popFront as perfectly intuitive. I don't find the counter proposals, which come with baggage like constructors that may fail, and front() that may fail in unspecified ways, or throwing entire paradigms out the window, as intuitive. But I concede that other people think differently. Not everyone thinks the same. But consider this: floating point math is not intuitive. There has never been a shortage of proposals to make fp intuitive, but they've all failed because they are impractical. Sometimes ya gotta go with what works.
Mar 27 2014
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/14, 1:58 PM, Walter Bright wrote:
 On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

Completely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.

That's a good point too. Andrei
Mar 27 2014
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Mar 28, 2014 at 04:30:36PM +0000, John Stahara wrote:
 On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:
 
 On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that empty
 *must* be checked prior to front/popFront.

This is actually not true. R

What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural.

To clarify for Mr. Invernizzi: the "we" to which he refers is the group of people he works with, and /not/ the members of this newsgroup.

FWIW I find it perfectly natural as well. (But I know not everyone agrees. :P) T -- The two rules of success: 1. Don't tell everything you know. -- YHL
Mar 28 2014
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing on dust when
 we are laying bricks. This would have my vote as useless dust. This does
 not solve the problem of streams-as-ranges, because streams don't make
 good ranges. It doesn't really solve any problems as far as I can tell.

I think byXchar are important and are not streams. -- Andrei
Mar 27 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing on dust when
 we are laying bricks. This would have my vote as useless dust. This does
 not solve the problem of streams-as-ranges, because streams don't make
 good ranges. It doesn't really solve any problems as far as I can tell.

I think byXchar are important and are not streams. -- Andrei

What's broken about those?

Speed.
Mar 27 2014
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/14, 9:12 AM, monarch_dodra wrote:
 filter comes to mind. -- Andrei

You *just* turned down a pull that did exactly that, for exactly the same reasons.

I was on the verge, and we need to discuss this more. A constructor that does arbitrary work for a lazy range doesn't sit well. -- Andrei
Mar 27 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 9:12 AM, monarch_dodra wrote:
 If you initialized the next element in both constructor

As mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it. As also mentioned earlier, this throws under the bus entire categories of use cases for ranges, all to avoid a single call to 'empty'.
Mar 27 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 1:59 PM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 3/27/2014 9:12 AM, monarch_dodra wrote:
 If you initialized the next element in both constructor

As mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it.

Not impossible. Use lazy initialization.

That still doesn't work, because then you could be calling front, and front might fail if there's no input. What is wrong with following the protocol? Why must we introduce all these caveats for ranges, like streams not working, front that can fail, can't do C# style LINQ, etc.? For what gain?
 It only makes sense if you are using empty in your processing of the range.

I.e. follow the danged protocol and it works.
Mar 27 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 2:46 PM, Steven Schveighoffer wrote:
 In the cases where you know there is input, that's not true.

The range doesn't know what you know.
 What I am protesting is the idea that you *know* input exists, you must still
call empty.
 Front is enough in that case.

You can't use that on generic code, because generic code cannot assume that front will not fail.
 What is wrong with following the protocol? Why must we introduce all these
 caveats for ranges, like streams not working, front that can fail, can't do C#
 style LINQ, etc.? For what gain?

There is no gain to requiring empty on ranges where you logically prove they are not empty. It's a wasted call. For things where you're not sure if they are empty or not (e.g. an input file), of course empty is required to be called, and of course it may do some processing to determine that. But I would only expect that on the first call. Subsequent checks would already be done by popFront.

I know that you want to get rid of empty. But getting rid of empty means that front may fail. This is why there is an empty, and any generic code MUST respect that. What you can do is, in your range: enum empty = true; and then it won't cost anything to call it.
 I mean, if you are calling empty regularly, because you aren't sure whether the
 elements are there. That's not always the case, sometimes you are sure the
 elements are there.

Then use an adapter range that has: enum empty = true; and forwards the front and popFront() calls to its parameter. You'll get the optimization you want, and won't violate protocol.
 Requiring a call to empty to do anything other than to check
 whether a range is empty, is just plain wrong.

There are solid reasons to require it (i.e. streams), and you've responded that ranges needn't even support streams. If I have interpreted your position correctly, I cannot agree with it.
Mar 27 2014
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at the
 documentation, which says ([1] in the documentation of `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would have,
returned
 false."

 (And the same for `popFront()`.)

 That is, the documentation more or less explicitly states that you don't
 actually need to call `empty` if you know it returned `true`.

 [1] http://dlang.org/phobos/std_range.html

Right, it does say that, and it is seriously wrong.
Mar 27 2014
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 5:45 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrote:
 I think we should work on making a protocol that does not require awkward
 calls, rather than alienating developers who don't follow the awkward protocol.

This is it.

Sure, but what is awkward about empty-front-popFront? You can still always use: foreach (e; range) { ... } which will follow the correct protocol automatically.
Mar 27 2014
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 03/27/2014 05:17 AM, Walter Bright wrote:
 As with *any* API, if you look under the hood and make assumptions about
 the behavior based on a particular implementation, assumptions that are
 not part of the API, the risk of breakage inevitably follows.

 If you've identified Phobos code that uses ranges but does not follow
 the protocol, the Phobos code is broken - please file a bugzilla issue
 on it.

I feel uneasy about the assertion that code that assumes that 'empty' checks for an empty range is broken. If the API does not allow fast range implementations without arbitrarily restrictions on some 'protocol', then this is a problem with the API. Maybe empty/front/popFront should just be a single primitive, like: Nullable!T get();
Mar 27 2014
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 03/27/2014 11:48 PM, Timon Gehr wrote:
 arbitrarily restrictions on some 'protocol'

arbitrarily restricting interactions to some 'protocol'
Mar 27 2014
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 3:48 PM, Timon Gehr wrote:
 Maybe empty/front/popFront should just be a single primitive, like:
 Nullable!T get();

What if get() fails?
Mar 27 2014
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 27.03.2014 03:48, Walter Bright wrote:
 On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
 if(!r.empty)
 {
     auto r2 = map!(x => x * 2)(r);
     do
     {
        auto x = r2.front;
        ...
     } while(!r2.empty);
 }

 Should we be required to re-verify that r2 is not empty before using
 it? It
 clearly is not, and would be an artificial requirement (one that map
 likely
 would not enforce!).

 This sounds so much like a convention that simply won't be followed,
 and the
 result will be perfectly valid code.

The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways. If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down. To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.

IIUC what you are proposing would be covered better by removing front and return the value by popFront. Instead of forcing strong, breaking and sometimes unintuitive semantics we could also improve dmd's optimizer. This caching range example: /////////////////////////////////// T getc(T)(); struct irange(T) { bool _cached; T _cache; this(T[] arr) { _cached = false; } bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache < 0); } T front() { empty(); return _cache; } void popFront() { _cached = false; } } int foo(irange!int rg) { int sum = 0; while(!rg.empty) { sum += rg.front; rg.popFront; } return sum; } /////////////////////////////////// generates this code with "dmd2.065 -O -inline -release -noboundscheck -m64": _D7irange23fooFS7irange213__T6irangeTiZ6irangeZi: 0000: push rbp 0001: mov rbp,rsp 0004: push rax 0005: push rsi 0006: mov qword ptr [rbp+10h],rcx 000A: xor esi,esi 000C: lea rcx,[rbp+10h] 0010: sub rsp,20h 0014: call _D7irange213__T6irangeTiZ6irange5emptyMFZb 0019: add rsp,20h 001D: xor al,1 001F: je 0050 0021: lea rcx,[rbp+10h] 0025: sub rsp,20h 0029: call _D7irange213__T6irangeTiZ6irange5emptyMFZb 002E: add rsp,20h 0032: mov eax,dword ptr [rbp+14h] 0035: add esi,eax 0037: mov byte ptr [rbp+10h],0 003B: lea rcx,[rbp+10h] 003F: sub rsp,20h 0043: call _D7irange213__T6irangeTiZ6irange5emptyMFZb 0048: add rsp,20h 004C: xor al,1 004E: jne 0021 0050: mov eax,esi 0052: pop rsi 0053: lea rsp,[rbp] 0057: pop rbp 0058: ret _D7irange213__T6irangeTiZ6irange5emptyMFZb: 0000: push rbp 0001: mov rbp,rsp 0004: mov qword ptr [rbp+10h],rcx 0008: cmp byte ptr [rcx],0 000B: je 0011 000D: xor eax,eax 000F: pop rbp 0010: ret 0011: sub rsp,20h 0015: call _D7irange211__T4getcTiZ4getcFZi 001A: add rsp,20h 001E: mov rcx,qword ptr [rbp+10h] 0022: mov dword ptr [rcx+4],eax 0025: shr eax,1Fh 0028: mov byte ptr [rcx],al 002A: xor al,1 002C: pop rbp 002D: ret and this with gdc (-O2 on godbolt.org): int example.foo(example.irange!(int).irange): mov rax, rdi push rbx xor ebx, ebx shr rax, 32 test dil, dil je .L5 .L2: add ebx, eax .L5: call int example.getc!(int).getc() test eax, eax js .L2 mov eax, ebx pop rbx ret All traces of the caching but the initial state check have been removed, no extra calls.
Mar 26 2014
next sibling parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 27.03.2014 07:53, Rainer Schuetze wrote:
 bool empty() {
      if(_cached) return false;
      _cache = getc!T();
      return (_cache < 0);
    }

Ouch, pasted before fixing: bool empty() { if(_cached) return false; _cache = getc!T(); _cached = _cache >= 0; // this line added return !_cached; } The asm is with an unintended "_cached = _cache < 0;", but that's the same, just accepting negative input instead of non-negative.
Mar 26 2014
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
 IIUC what you are proposing would be covered better by removing front and
return
 the value by popFront.

Different things being ranged over make different decisions as to what goes into each function.
 Instead of forcing strong, breaking and sometimes unintuitive semantics

I don't understand what is so unintuitive about: while (!empty) { ... front ...; popFront(); } What I do find unintuitive and unattractive is all those flags in the range implementation.
 we could also improve dmd's optimizer.

Yes, that's true. The thing about optimizers, though, is they work on patterns. If your code doesn't quite match the right pattern, it won't tickle the optimization you might be expecting. The pattern for recognizing the ROL and ROR patterns is one such.
Mar 27 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 27.03.2014 09:42, Walter Bright wrote:
 On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
 IIUC what you are proposing would be covered better by removing front
 and return
 the value by popFront.

Different things being ranged over make different decisions as to what goes into each function.
 Instead of forcing strong, breaking and sometimes unintuitive semantics

I don't understand what is so unintuitive about: while (!empty) { ... front ...; popFront(); }

This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then.
 What I do find unintuitive and unattractive is all those flags in the
 range implementation.

 we could also improve dmd's optimizer.

Yes, that's true. The thing about optimizers, though, is they work on patterns. If your code doesn't quite match the right pattern, it won't tickle the optimization you might be expecting. The pattern for recognizing the ROL and ROR patterns is one such.

I guess gcc doesn't use patterns here, but value propagation and notices that all the checks are unnecessary.
Mar 27 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or front multiple times
 or not at all is unintuitive. They should not be named as if they are
properties
 then.

I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.
Mar 27 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
  This caching range example:

 ///////////////////////////////////
 T getc(T)();

 struct irange(T)
 {
    bool _cached;
    T _cache;

    this(T[] arr) { _cached = false; }
    bool empty() {
      if(_cached) return false;
      _cache = getc!T();
      return (_cache < 0);

What happens if empty is called twice in a row? Two characters get read! That isn't right.
    }
    T front() { empty(); return _cache; }

What happens if empty returns true? EOF? I don't think that's intuitive. You could have front throw, but that prevents anyone from building nothrow ranges.
    void popFront() { _cached = false; }
 }

Mar 27 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 27.03.2014 10:06, Walter Bright wrote:
 On 3/26/2014 11:53 PM, Rainer Schuetze wrote:
  This caching range example:

 ///////////////////////////////////
 T getc(T)();

 struct irange(T)
 {
    bool _cached;
    T _cache;

    this(T[] arr) { _cached = false; }
    bool empty() {
      if(_cached) return false;
      _cache = getc!T();
      return (_cache < 0);

What happens if empty is called twice in a row? Two characters get read! That isn't right.

Good catch. Somehow I pasted the wrong version, but posted the corrections a few minutes later.
    }
    T front() { empty(); return _cache; }

What happens if empty returns true? EOF? I don't think that's intuitive. You could have front throw, but that prevents anyone from building nothrow ranges.

The caller is told to guarantee that empty must not return true. I just didn't wanted to repeat the stuff in empty. GDC removed it anyway...
Mar 27 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/27/2014 12:24 PM, Rainer Schuetze wrote:
 The caller is told to guarantee that empty must not return true.

I think that is the very definition of looking under the hood in order to violate protocol. A range CANNOT be written generically and support that.
Mar 27 2014
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/26/2014 11:48 PM, Jonathan M Davis wrote:
 However, it seems really off to me for there to be any real work going on in
 empty,

Many things, like some ttys, can not be tested for being empty without reading it, and reading of a tty is destructive.
 and I'd be leery of any range which actually required empty to be
 called before calling front (though I definitely think that calling front on
 an empty range should not be considered okay and would expect that to
 generally be undefined behavior).

That's exactly why empty should be called before front - because front is expected to succeed.
Mar 27 2014
prev sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 28 Mar 2014 16:30:36 -0000, John Stahara  
<john.stahara+dlang gmail.com> wrote:

 On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:

 On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that empty
 *must* be checked prior to front/popFront.

This is actually not true. R

What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural.

To clarify for Mr. Invernizzi: the "we" to which he refers is the group of people he works with, and /not/ the members of this newsgroup. --jjs

Thanks, that was confusing me :) R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 28 2014
prev sibling next sibling parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Regan Heath"  wrote in message 
news:op.xdb9a9v354xghj puck.auriga.bhead.co.uk...

 What guarantees range2 is longer than range1?  The isArray case checks 
 explicitly, but the generic one doesn't.  Is it a property of being an 
 output range that it will expand as required, or..

Some ranges will give you their length...
Mar 26 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 17:24:11 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:13 PM, Steven Schveighoffer wrote:
 Note that a stream makes a terrible range, simply because of what you  
 say --
 reading is destructive, and determining emptiness is dependent on  
 reading. You
 need a buffered stream to make a good range, and then the logic becomes  
 much
 more straightforward.

The range becomes the one element buffer in this case. It is completely workable.

A 1 byte buffered stream? If it's performance you are looking for, you will not find it there.
 The range protocol is designed to work with streams. It's a giant fail  
 if they do not, or if you want to create a separate, non-range universe  
 to deal with streams.

Ranges work well on top of buffered streams, but not AS streams.
 Even a single byte buffer is not enough, you need a bool to indicate  
 the stream
 is done.

Right. But empty for a stream still has to read. Just follow the protocol, and the range will work, even with streams.

A stream requires one primitive -- read. In one function call, you get the data you want, you can tell if it's empty, and the stream object does not need to concern itself with projecting a cached element. Adding range primitives on top of a stream does not make sense. -Steve
Mar 27 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 17:28:05 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 1:59 PM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 3/27/2014 9:12 AM, monarch_dodra wrote:
 If you initialized the next element in both constructor

As mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it.

Not impossible. Use lazy initialization.

That still doesn't work, because then you could be calling front, and front might fail if there's no input.

In the cases where you know there is input, that's not true. What I am protesting is the idea that you *know* input exists, you must still call empty. Front is enough in that case.
 What is wrong with following the protocol? Why must we introduce all  
 these caveats for ranges, like streams not working, front that can fail,  
 can't do C# style LINQ, etc.? For what gain?

There is no gain to requiring empty on ranges where you logically prove they are not empty. It's a wasted call. For things where you're not sure if they are empty or not (e.g. an input file), of course empty is required to be called, and of course it may do some processing to determine that. But I would only expect that on the first call. Subsequent checks would already be done by popFront.
 It only makes sense if you are using empty in your processing of the  
 range.

I.e. follow the danged protocol and it works.

I mean, if you are calling empty regularly, because you aren't sure whether the elements are there. That's not always the case, sometimes you are sure the elements are there. Requiring a call to empty to do anything other than to check whether a range is empty, is just plain wrong. I'm sure there's better ways to implement what you want without stitching the functionality onto empty. -Steve
Mar 27 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 29 Mar 2014 17:02:30 -0400, Marc Sch=C3=BCtz <schuetzm gmx.net> =
wrote:

 On Saturday, 29 March 2014 at 01:36:46 UTC, Steven Schveighoffer wrote=

 On Fri, 28 Mar 2014 07:47:22 -0400, Marc Sch=C3=BCtz <schuetzm gmx.ne=


 wrote:
 On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:
 If you initialized the next element in both constructor and popFron=




 then you'd get rid of both these checks.

... but of course lose laziness.

In this case, laziness is not critical. Decoding the element is an O(=


 operation, and when looping through, you will decode it anyway.

 When processing a 20 element string, the cost of checking to see if y=


 have decoded on every empty or front call may override the front-load=


 cost of decoding the first element on construction. It's sure to add =


 the cost if you are processing all 20 elements, since you decode them=


 all anyway.

 On other ranges, it's more important when the first element costs a l=


 to fetch. HOWEVER, it's not critically important to delay that unless=


 you are not going to process that element. For example, if you are  =


 foreach'ing over all the elements, the delay doesn't matter.

 I'd rather the higher level code decide whether to delay or not,  =


 depending on the situation. Requiring a protocol change for such  =


 detailed knowledge seems unbalanced.

I was more thinking of the fact that you need to read something on =

 construction, rather than on consumption, and this reading might be  =

 noticeable. There was the example of `stdin.byLine().filter(...)` (or =

 something similar, don't remember exactly), which reads from stdin on =

 construction. This changes the behaviour of the program, because the  =

 read operation will (probably) block.

Blocking operations may have a noticeable impact, but they may not. It = depends on what you do between construction and processing. For example, if you have: foreach(l; stdin.byLine) Lazily fetching the first line makes no operational difference whatsoeve= r, = even if it blocks, because you're immediately going to process it. But if it *is* lazily fetched, you are paying some possibly small, but = nonzero, cost for that laziness that you didn't need to pay. This is why I said it's not critically important to delay unless you are= = not going to process the first element. Because there is no primitive to= = "prime" the range, we must do it on a property fetch, or on construction= . = But after that, popFront is a perfectly reasonable place to get the next= = element. All this fuss is over the *first* element in the range. I think providin= g = a mechanism to decide whether you want it now or later is reasonable. Fo= r = example a lazy range wrapper. Note, however, the case Andrei was arguing about was one of the string = decoders. When you are blocking for input, hell, even if you aren't = blocking, but need to call system calls to get it, the performance cost = of = checking a boolean every call is negligible. But when you are decoding 1= -4 = bytes of data in memory, checking a boolean becomes significant. -Steve
Mar 31 2014
prev sibling parent "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Monday, 31 March 2014 at 11:40:11 UTC, Steven Schveighoffer 
wrote:
 Blocking operations may have a noticeable impact, but they may 
 not. It depends on what you do between construction and 
 processing.

 For example, if you have:

 foreach(l; stdin.byLine)

 Lazily fetching the first line makes no operational difference 
 whatsoever, even if it blocks, because you're immediately going 
 to process it.

 But if it *is* lazily fetched, you are paying some possibly 
 small, but nonzero, cost for that laziness that you didn't need 
 to pay.

 This is why I said it's not critically important to delay 
 unless you are not going to process the first element. Because 
 there is no primitive to "prime" the range, we must do it on a 
 property fetch, or on construction. But after that, popFront is 
 a perfectly reasonable place to get the next element.

 All this fuss is over the *first* element in the range. I think 
 providing a mechanism to decide whether you want it now or 
 later is reasonable. For example a lazy range wrapper.

I've found the example I was talking about: http://forum.dlang.org/thread/mailman.323.1393458346.6445.digitalmars-d puremagic.com I misremembered, filter wasn't even involved. But similar situations may arise from using filter or another range that eagerly fetches the first element, even if its source doesn't.
 Note, however, the case Andrei was arguing about was one of the 
 string decoders. When you are blocking for input, hell, even if 
 you aren't blocking, but need to call system calls to get it, 
 the performance cost of checking a boolean every call is 
 negligible. But when you are decoding 1-4 bytes of data in 
 memory, checking a boolean becomes significant.

That's what I meant by a tight loop. My hypothesis is that in such cases the optimizers of at least GDC and LDC are good enough to remove the check for the boolean completely, and that at least LDC can then remove the boolean itself from the range structure.
Mar 31 2014
prev sibling parent "w0rp" <devw0rp gmail.com> writes:
On Friday, 28 March 2014 at 16:59:05 UTC, Johannes Pfau wrote:
 It 'works' with streams but it's way too slow. You don't want 
 to read
 byte-per-byte. Of course you can always implement ranges on top 
 of
 streams. Usually these will not provide byte-per-byte access but
 efficient higher level abstractions (byLine, byChunk, 
 decodeText).

 The point is you can implement ranges on streams easily, but 
 you can't
 use ranges as the generic primitive for raw data. What's the 
 element
 type of a data range?
 ubyte - performance sucks
 ubyte[n], ubyte[] now you have a range of ranges, most 
 algorithms wont
 work as expected (find, count, ...).

 (the call empty/don't call empty discussion is completely 
 unrelated to
 this, btw. You can implement ranges on streams either way, but 
 again,
 using ranges for raw data streams is not a good idea.)

I think a key is to offer something with gives you chunks at a time right at the top, and the use .joiner on that. I read files this way currently. auto fileByteRange = File("something").byChunk(chunkSize).joiner; I believe this to be a very good way to get good performance without losing the functionality of std.algorithm.
Mar 28 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 25 March 2014 at 22:54:14 UTC, Timon Gehr wrote:
 On 03/25/2014 10:29 PM, Andrei Alexandrescu wrote:
 B) I want to be able to call front multiple times in a row, 
 and expect
 to get the same value.

Correct.

'map' may fail this criterion. In this case, is the blame put on the user?

Depends how you define "same" though :/ But yeah. Also, shameless plug about my "cache" range adaptor: https://github.com/D-Programming-Language/phobos/pull/1364
Mar 25 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 25 Mar 2014 23:22:18 -0000, Walter Bright  
<newshound2 digitalmars.com> wrote:
 On 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:
 The range instance gets bigger and
 more expensive to copy, and the cost of manipulating the flag and the
 buffer is added to every loop iteration. Note that the user of a range
 can trivially use:
       auto e = r.front;
       ... using e multiple times ...
 instead.

That would pessimize code using arrays of large structs.

You're already requiring copying with the buffering requirement. And besides, if I was creating a range of expensive-to-copy objects, I would add a layer of indirection to cheapen it.

Surely you'd simply start with a range of pointers to expensive-to-copy objects? Or, return them by reference from the underlying range/array/source. You want to avoid *ever* copying them except explicitly where required.
 The proposed protocol pessimizes arrays of large structs

Not really more than the existing protocol does (i.e. required buffering).
 and will trip the unwary if calling r.front again returns something  
 else.

I'm not proposing that calling them wrongly would make things unsafe. But I don't think it's unreasonable to expect the protocol to be followed, just like file open/read/close.

My immediate expectation upon encountering ranges was that r.front would return the same item repeatedly until r.popFront was called. Breaking that guarantee will trip a *lot* of people up. IMO the rules should be something like: - r.empty WILL return false if there is more data available in the range. - r.empty MUST be called before r.front, r.front WILL succeed if r.empty returned false. - r.front WILL repeatably return the current element in the range. It MAY return by value or by reference. - r.empty SHOULD be called before r.popFront, otherwise r.popFront MAY do nothing or throw (could also make this one a 'MUST') - r.popFront WILL advance to the next element in the range. - "lazy" ranges SHOULD delay initialisation until r.empty is called R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 26 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 26 Mar 2014 06:46:26 -0400, Regan Heath <regan netmail.co.nz>  
wrote:

 On Tue, 25 Mar 2014 23:22:18 -0000, Walter Bright  
 <newshound2 digitalmars.com> wrote:
 On 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:
 The range instance gets bigger and
 more expensive to copy, and the cost of manipulating the flag and the
 buffer is added to every loop iteration. Note that the user of a range
 can trivially use:
       auto e = r.front;
       ... using e multiple times ...
 instead.

That would pessimize code using arrays of large structs.

You're already requiring copying with the buffering requirement. And besides, if I was creating a range of expensive-to-copy objects, I would add a layer of indirection to cheapen it.

Surely you'd simply start with a range of pointers to expensive-to-copy objects? Or, return them by reference from the underlying range/array/source. You want to avoid *ever* copying them except explicitly where required.
 The proposed protocol pessimizes arrays of large structs

Not really more than the existing protocol does (i.e. required buffering).
 and will trip the unwary if calling r.front again returns something  
 else.

I'm not proposing that calling them wrongly would make things unsafe. But I don't think it's unreasonable to expect the protocol to be followed, just like file open/read/close.

My immediate expectation upon encountering ranges was that r.front would return the same item repeatedly until r.popFront was called. Breaking that guarantee will trip a *lot* of people up. IMO the rules should be something like: - r.empty WILL return false if there is more data available in the range. - r.empty MUST be called before r.front, r.front WILL succeed if r.empty returned false. - r.front WILL repeatably return the current element in the range. It MAY return by value or by reference. - r.empty SHOULD be called before r.popFront, otherwise r.popFront MAY do nothing or throw (could also make this one a 'MUST') - r.popFront WILL advance to the next element in the range.

These two rules are not necessary if you know the range is not empty. See the conversation inside this pull: https://github.com/D-Programming-Language/phobos/pull/1987 -Steve
Mar 26 2014
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 26 Mar 2014 08:29:15 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 26 Mar 2014 06:46:26 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:
 IMO the rules should be something like:
   - r.empty WILL return false if there is more data available in the  
 range.

   - r.empty MUST be called before r.front, r.front WILL succeed if  
 r.empty returned false.
   - r.front WILL repeatably return the current element in the range. It  
 MAY return by value or by reference.

   - r.empty SHOULD be called before r.popFront, otherwise r.popFront  
 MAY do nothing or throw
     (could also make this one a 'MUST')
   - r.popFront WILL advance to the next element in the range.

These two rules are not necessary if you know the range is not empty. See the conversation inside this pull: https://github.com/D-Programming-Language/phobos/pull/1987

Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary. -Steve
Mar 26 2014
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Mar 27, 2014 at 05:11:55PM -0700, Walter Bright wrote:
 On 3/27/2014 3:48 PM, Timon Gehr wrote:
Maybe empty/front/popFront should just be a single primitive, like:
Nullable!T get();

What if get() fails?

That's the whole point of Nullable!T: you can return null if get() fails. I can't say I like this, though. There are many cases in my code that need separation of .empty from .front from .popFront, such as recursive descent parsers or code that does if-match-then-consume algorithms. One of the *nice* things about the current range API is that that code works with arrays, streams, and lots of other stuff, without needing to care about implementation details. Changing it to get() will require lots of manual buffering and reintroduce lots of ugly boilerplate that I had to live with in C++. I'm with Walter on this one. Generic code should NOT assume anything about a range it was given, and therefore it should call .empty before calling .front or .popFront. If you "know" that a particular range doesn't require calling .empty beforehand, then by definition your code is no longer generic, and you just have to live with the consequences of that. Nothing stops you, for example, from exposing a .get function or whatever else, *in addition* to the standard range API. Then code that knows how to deal with .get will use it, and your range remains usable with other generic code. Nothing about the range API dictates that .empty cannot do useful work like initialize buffers. That should be an implementation detail that generic code shouldn't rely on. T -- The early bird gets the worm. Moral: ewww...
Mar 27 2014
prev sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 28 March 2014 at 02:14:40 UTC, H. S. Teoh wrote:
 I'm with Walter on this one. Generic code should NOT assume 
 anything
 about a range it was given, and therefore it should call .empty 
 before
 calling .front or .popFront. If you "know" that a particular 
 range
 doesn't require calling .empty beforehand, then by definition 
 your code
 is no longer generic, and you just have to live with the 
 consequences of
 that. Nothing stops you, for example, from exposing a .get 
 function or
 whatever else, *in addition* to the standard range API. Then 
 code that
 knows how to deal with .get will use it, and your range remains 
 usable
 with other generic code.

It's not that *you* know a particular range doesn't need "empty" called, it's that the algorithm you are using has already previously validated there are elements in it. For example, the splitter algorithm will first save a copy of its range, and then walk it, searching for the "splitting" elements. It then realizes it has walked N elements. From there, it takes the original range, and packs it into a "takeExactly(N)" of the original range. Iterating that "takeExactly(N)" is faster than a raw "take", *because* "takeExactly" was already promised that the range holds N elements, and as such, *doesn't* check for empty. Ditto for "findSplit". And again, there are functions, such as "copy", then simply *require* that a certain range have at least a certain amount of elements. Why check for it, if not providing it is a violation of its interface. On Thursday, 27 March 2014 at 23:52:46 UTC, Walter Bright wrote:
 I know that you want to get rid of empty. But getting rid of 
 empty means that front may fail. This is why there is an empty, 
 and any generic code MUST respect that.

Front only fails *if* the range is empty. Not if you fail to call empty. Generic code respects that.
 What you can do is, in your range:

     enum empty = true;

That's not the same. That's making the assumption your range will *never* be empty, which is a whole other concept (infinite ranges).
Mar 28 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 26 Mar 2014 08:29:15 -0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Wed, 26 Mar 2014 06:46:26 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:
 IMO the rules should be something like:
   - r.empty WILL return false if there is more data available in the  
 range.

   - r.empty MUST be called before r.front, r.front WILL succeed if  
 r.empty returned false.
   - r.front WILL repeatably return the current element in the range.  
 It MAY return by value or by reference.

   - r.empty SHOULD be called before r.popFront, otherwise r.popFront  
 MAY do nothing or throw
     (could also make this one a 'MUST')
   - r.popFront WILL advance to the next element in the range.

These two rules are not necessary if you know the range is not empty. See the conversation inside this pull: https://github.com/D-Programming-Language/phobos/pull/1987

Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary.

I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 26 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>  
wrote:

 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 Gah, I didn't cut out the right rules. I meant the two rules that empty  
 must be called before others. Those are not necessary.

I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.

Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty. -Steve
Mar 26 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 26 March 2014 at 15:37:38 UTC, Steven Schveighoffer 
wrote:
 Yes, but when you know that empty is going to return false, 
 there isn't any logical reason to call it. It is an awkward 
 requirement.

 -Steve

Not only that, but it's also a performance criteria: If you are iterating on two ranges at once (think "copy"), then you *know* "range2" is longer than "range1", even if you don't know its length. Why pay for "range2.empty", when you know it'll always be false? There is a noticeable performance difference if you *don't* check.
Mar 26 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 26 Mar 2014 15:37:38 -0000, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:

 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 Gah, I didn't cut out the right rules. I meant the two rules that  
 empty must be called before others. Those are not necessary.

I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.

Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement.

Sure, it's not required for some algorithms in some situations.
 I had the same thinking as you, why pay for an extra check for all 3  
 calls? But there was already evidence that people were avoiding empty.

Sure, as above, makes perfect sense. It seemed from this thread that there was some confusion about how ranges should be written and used, and I thought it might help if the requirements were more fixed, that's all. If r.empty was mandatory then every range implementer would have a place to lazily initialise, r.front would be simpler, r.popFront too. Basically it would lower the bar for "good" range implementations. We might just need better documentation tho. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 26 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 26 Mar 2014 16:38:57 -0000, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Wednesday, 26 March 2014 at 15:37:38 UTC, Steven Schveighoffer wrote:
 Yes, but when you know that empty is going to return false, there isn't  
 any logical reason to call it. It is an awkward requirement.

 -Steve

Not only that, but it's also a performance criteria: If you are iterating on two ranges at once (think "copy"), then you *know* "range2" is longer than "range1", even if you don't know its length.

What guarantees range2 is longer than range1? The isArray case checks explicitly, but the generic one doesn't. Is it a property of being an output range that it will expand as required, or..
 Why pay for "range2.empty", when you know it'll always be false? There  
 is a noticeable performance difference if you *don't* check.

But aren't you instead paying for 2 checks in front and 2 in popFront, so 4 checks vs 1? Or is the argument that these 4 checks cannot be removed even if we mandate r.empty is called before r.front/popFront. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 26 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 26 March 2014 at 16:55:48 UTC, Regan Heath wrote:
 On Wed, 26 Mar 2014 16:38:57 -0000, monarch_dodra
 Not only that, but it's also a performance criteria: If you 
 are iterating on two ranges at once (think "copy"), then you 
 *know* "range2" is longer than "range1", even if you don't 
 know its length.

What guarantees range2 is longer than range1? The isArray case checks explicitly, but the generic one doesn't. Is it a property of being an output range that it will expand as required, or..

The interface: The target *shall* have enough room to accommodate origin. Failure to meet that criteria is an Error. Output ranges may or may not auto expand as required. Arguably, it's a design flaw I don't want to get started on.
 Why pay for "range2.empty", when you know it'll always be 
 false? There is a noticeable performance difference if you 
 *don't* check.

But aren't you instead paying for 2 checks in front and 2 in popFront, so 4 checks vs 1? Or is the argument that these 4 checks cannot be removed even if we mandate r.empty is called before r.front/popFront.

I don't know what checks you are talking about. Most ranges don't check anything on front or popFront. They merely assume they are in a state that where they can do their job. Failure to meet this condition prior to the call (note I didn't say "failure to check for"), means Error. It's really not much different from when doing an strcpy. "++p" and "*p" don't check anything. The fact that ranges *can* check doesn't always mean they should.
Mar 26 2014
prev sibling next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Wednesday, 26 March 2014 at 17:36:08 UTC, Andrei Alexandrescu 
wrote:
 I think requiring users to call empty before front on input 
 ranges is a concession we should make.

Then the name should change to "ready". It makes sense to require the user to check that the range is "ready", but not to check that it is "not empty". This will also make more sense for async implementations that will block if "not ready". IMO the whole interface needs rethinking if you want to gracefully support async data streams where you need to distinguish between: "ready" vs "empty", "front" vs "firstavailable". Both quick-sort, merge-sort, filter and map work well with async data streams.
Mar 26 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 26 Mar 2014 17:32:30 -0000, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Wednesday, 26 March 2014 at 16:55:48 UTC, Regan Heath wrote:
 On Wed, 26 Mar 2014 16:38:57 -0000, monarch_dodra
 Not only that, but it's also a performance criteria: If you are  
 iterating on two ranges at once (think "copy"), then you *know*  
 "range2" is longer than "range1", even if you don't know its length.

What guarantees range2 is longer than range1? The isArray case checks explicitly, but the generic one doesn't. Is it a property of being an output range that it will expand as required, or..

The interface: The target *shall* have enough room to accommodate origin. Failure to meet that criteria is an Error.

Ok. So long as *something* is throwing that Error I am down with this.
 Output ranges may or may not auto expand as required. Arguably, it's a  
 design flaw I don't want to get started on.

:)
 Why pay for "range2.empty", when you know it'll always be false? There  
 is a noticeable performance difference if you *don't* check.

But aren't you instead paying for 2 checks in front and 2 in popFront, so 4 checks vs 1? Or is the argument that these 4 checks cannot be removed even if we mandate r.empty is called before r.front/popFront.

I don't know what checks you are talking about. Most ranges don't check anything on front or popFront. They merely assume they are in a state that where they can do their job. Failure to meet this condition prior to the call (note I didn't say "failure to check for"), means Error.

Ok.. but lets take a naive range of say int with a 1 element cache in the member variable "int cache;". The simplest possible front would just be "return cache;". But, if cache hasn't been populated yet it's not going to throw an Error, it's just going to be wrong. So, presumably front has to check another boolean to verify it's been populated and throw an Error if not. That's one of the checks I meant. A typical loop over a range will call front one or more times, so you pay for that check 1 or more times per loop. popFront in this example doesn't need to check anything, it just populates cache regardless, as does empty. But, I imagine there are ranges which need some initial setup, and they have to do it somewhere, and they need to check they have done it in empty, front and popFront for every call. It's those checks we'd like to avoid if we can. So.. if we mandate that empty MUST be called, then they could just be done in one place, empty. However, in this situation nothing would be enforcing that requirement (in release anyway) and things could just go wrong. So, perhaps the checks always need to be there and we gain nothing by mandating empty is called first. idunno.
 It's really not much different from when doing an strcpy. "++p" and "*p"  
 don't check anything. The fact that ranges *can* check doesn't always  
 mean they should.

Sure. For performance reasons they might not, but isn't this just a tiny bit safer if we mandate empty must be called and do one check there.. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 26 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 26 Mar 2014 13:36:08 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:
 On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>
 wrote:

 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Gah, I didn't cut out the right rules. I meant the two rules that
 empty must be called before others. Those are not necessary.

I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.

Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty.

I think requiring users to call empty before front on input ranges is a concession we should make.

if(!r.empty) { auto r2 = map!(x => x * 2)(r); do { auto x = r2.front; ... } while(!r2.empty); } Should we be required to re-verify that r2 is not empty before using it? It clearly is not, and would be an artificial requirement (one that map likely would not enforce!). This sounds so much like a convention that simply won't be followed, and the result will be perfectly valid code. -Steve
Mar 26 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 26 Mar 2014 22:48:16 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:
 if(!r.empty)
 {
     auto r2 = map!(x => x * 2)(r);
     do
     {
        auto x = r2.front;
        ...
     } while(!r2.empty);
 }

 Should we be required to re-verify that r2 is not empty before using  
 it? It
 clearly is not, and would be an artificial requirement (one that map  
 likely
 would not enforce!).

 This sounds so much like a convention that simply won't be followed,  
 and the
 result will be perfectly valid code.

The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways. If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down. To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.

OK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later. Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls. -Steve
Mar 26 2014
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 26 March 2014 at 18:04:44 UTC, Ola Fosheim Grøstad
wrote:
 On Wednesday, 26 March 2014 at 17:36:08 UTC, Andrei 
 Alexandrescu wrote:
 I think requiring users to call empty before front on input 
 ranges is a concession we should make.

Then the name should change to "ready". It makes sense to require the user to check that the range is "ready", but not to check that it is "not empty". This will also make more sense for async implementations that will block if "not ready". IMO the whole interface needs rethinking if you want to gracefully support async data streams where you need to distinguish between: "ready" vs "empty", "front" vs "firstavailable". Both quick-sort, merge-sort, filter and map work well with async data streams.

Why not? It is how iterators work in most OO languages. -- Paulo
Mar 27 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 27 Mar 2014 02:44:13 -0000, Daniel Murphy  
<yebbliesnospam gmail.com> wrote:

 "Regan Heath"  wrote in message  
 news:op.xdb9a9v354xghj puck.auriga.bhead.co.uk...

 What guarantees range2 is longer than range1?  The isArray case checks  
 explicitly, but the generic one doesn't.  Is it a property of being an  
 output range that it will expand as required, or..

Some ranges will give you their length...

Sure. And generally you could use it. copy() doesn't, and I was talking specifically about that example. :) R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 27 2014
prev sibling next sibling parent "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Thursday, 27 March 2014 at 04:17:16 UTC, Walter Bright wrote:
 On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
 OK, but it's logical to assume you *can* avoid a call to empty 
 if you know
 what's going on under the hood, no? Then at that point, you 
 have lost the
 requirement -- people will avoid calling empty because they 
 can get away with
 it, and then altering the under-the-hood requirements cause 
 code breakage later.

 Case in point, the pull request I referenced, the author 
 originally tried to
 just use empty to lazily initialize filter, but it failed due 
 to existing code
 in phobos that did not call empty on filtered data before 
 processing. He had to
 instrument all 3 calls.

As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows. If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it.

I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): "Calling r.front is allowed only if calling r.empty has, or would have, returned false." (And the same for `popFront()`.) That is, the documentation more or less explicitly states that you don't actually need to call `empty` if you know it returned `true`. [1] http://dlang.org/phobos/std_range.html
Mar 27 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 27 Mar 2014 02:19:13 -0000, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 if(!r.empty)
 {
     auto r2 = map!(x => x * 2)(r);
     do
     {
        auto x = r2.front;
        ...
     } while(!r2.empty);
 }

if(r.empty) return; auto r2 = map!(x => x * 2)(r); while(!r2.empty) { auto x = r2.front; ... r2.popFront(); //bug fix for your version which I noticed because I followed "the pattern" :D } ahh.. much better. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 27 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 27 Mar 2014 10:49:42 -0000, Marc Sch=FCtz <schuetzm gmx.net> wro=
te:

 On Thursday, 27 March 2014 at 04:17:16 UTC, Walter Bright wrote:
 On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
 OK, but it's logical to assume you *can* avoid a call to empty if yo=



 know
 what's going on under the hood, no? Then at that point, you have los=



 the
 requirement -- people will avoid calling empty because they can get =



 away with
 it, and then altering the under-the-hood requirements cause code  =



 breakage later.

 Case in point, the pull request I referenced, the author originally =



 tried to
 just use empty to lazily initialize filter, but it failed due to  =



 existing code
 in phobos that did not call empty on filtered data before processing=



 He had to
 instrument all 3 calls.

As with *any* API, if you look under the hood and make assumptions =


 about the behavior based on a particular implementation, assumptions =


 that are not part of the API, the risk of breakage inevitably follows=


 If you've identified Phobos code that uses ranges but does not follow=


 the protocol, the Phobos code is broken - please file a bugzilla issu=


 on it.

I was originally going to do that, but then I took a closer look at th=

 documentation, which says ([1] in the documentation of `isInputRange()=

 "Calling r.front is allowed only if calling r.empty has, or would have=

 returned false."

 (And the same for `popFront()`.)

 That is, the documentation more or less explicitly states that you don=

 actually need to call `empty` if you know it returned `true`.

 [1] http://dlang.org/phobos/std_range.html

That's because up until now we've made no attempt to set this in stone, = = and as such many interpretations have surfaced. This documentation woul= d, = of course, change to match the final decision made. R -- = Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 27 2014
prev sibling next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Thursday, 27 March 2014 at 08:13:28 UTC, Paulo Pinto wrote:
 Why not?

 It is how iterators work in most OO languages.

Why not what? A query for "empty()" should not have any side effects. In what library does that happen? Tests should in general not have side effects unless the name suggest so. Anyway, I wish D was more clear about use scenarios. If D is going to be efficient in the server space then it should also make low latency programming easier, meaning supporting async collections out-of-the-box.
Mar 27 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 00:17:21 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:
 OK, but it's logical to assume you *can* avoid a call to empty if you  
 know
 what's going on under the hood, no? Then at that point, you have lost  
 the
 requirement -- people will avoid calling empty because they can get  
 away with
 it, and then altering the under-the-hood requirements cause code  
 breakage later.

 Case in point, the pull request I referenced, the author originally  
 tried to
 just use empty to lazily initialize filter, but it failed due to  
 existing code
 in phobos that did not call empty on filtered data before processing.  
 He had to
 instrument all 3 calls.

As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows.

Like range.save. It's "required", but frequently omitted, without any consequences. I'm not saying requiring it would be an invalid decision, I'm saying requiring it would be a futile gesture -- the requirement would be ignored. What happens when one of our "clients" code breaks severely because we make a change in phobos that assumes empty is always called first on a newly-created range?
 If you've identified Phobos code that uses ranges but does not follow  
 the protocol, the Phobos code is broken - please file a bugzilla issue  
 on it.

I think we should work on making a protocol that does not require awkward calls, rather than alienating developers who don't follow the awkward protocol. BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell. -Steve
Mar 27 2014
prev sibling next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer 
wrote:
 What happens when one of our "clients" code breaks severely 
 because we make a change in phobos that assumes empty is always 
 called first on a newly-created range?

You probably can run-time test this in Debug builds, but…
 I think we should work on making a protocol that does not 
 require awkward calls, rather than alienating developers who 
 don't follow the awkward protocol.

This is it. There is way too much awkwardness in D already. And essentially, the closer the design is to mainstream the more annoying it is when you diverge and are inconsistent. Code should behave as expected without any need for memorizing. Having to differentiate between dialects is so much harder than differentiating between orthogonal languages.
Mar 27 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing on dust when
 we are laying bricks. This would have my vote as useless dust. This does
 not solve the problem of streams-as-ranges, because streams don't make
 good ranges. It doesn't really solve any problems as far as I can tell.

I think byXchar are important and are not streams. -- Andrei

What's broken about those? -Steve
Mar 27 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 11:19:15 -0400, Andrei Alexandrescu  =

<SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 3:49 AM, "Marc Sch=C3=BCtz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at t=


 documentation, which says ([1] in the documentation of  =


 `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would hav=


 returned false."

Probably we need to amend that. For efficient ranges, front() and =

 popFront() should only be guaranteed to work if either empty() or  =

 length() were evaluated first, and they returned false or nonzero,  =

 respectively.

This is a misleading statement. An array is probably the most efficient = of = ranges, which does not require this. We should be more specific here. For certain types of *nonempty* ranges,= = front is only guaranteed to work if empty/length was called before front= = is called. I would not necessarily lump popFront into that requirement = automatically, popFront may be able to cope with not having cached the = first element without losing efficiency. Questions to answer: 1. What types of ranges does this rule apply to? 2. What is the cost of not requiring this in terms of efficiency and eas= e = of implementation. 3. Is there a better mechanism to use than misusing empty? should we = introduce another property/call? At the moment, the only candidates I can think of are lazy caching range= s. = How many of those exist? -Steve
Mar 27 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/14, 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer 
 look at the
 documentation, which says ([1] in the documentation of 
 `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or 
 would have,
 returned false."

Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively.

I just want to point out that I think allowing empty to have an *observable* side effect will have cataclysmic repercussion in validating a release build, what with all our "assert(!empty);" calls and all.
Mar 27 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 27 March 2014 at 15:45:14 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing 
 on dust when
 we are laying bricks. This would have my vote as useless 
 dust. This does
 not solve the problem of streams-as-ranges, because streams 
 don't make
 good ranges. It doesn't really solve any problems as far as 
 I can tell.

I think byXchar are important and are not streams. -- Andrei

What's broken about those?

Speed.

I call shenanigans. This is the code: property bool empty() { if (nLeft) return false; if (r.empty) return true; If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks. Furthermore, popFront() has a guaranteed "1 call" per element. Weareas "empty" and "front" may be called several times in a row for a single element. If you want efficiency, stop being "member-wise" lazy, and put some eagerness in your code.
 filter comes to mind. -- Andrei

You *just* turned down a pull that did exactly that, for exactly the same reasons. Have "byXChar" function like filter: construction and popFront do the work, and front/empty is 0(1), and const, and strongly pure to boot. I *know* the compiler likes that in terms of speed.
Mar 27 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra <monarchdodra gmail.co=
m>  =

wrote:

 On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote:=

 On 3/27/14, 3:49 AM, "Marc Sch=C3=BCtz" <schuetzm gmx.net>" wrote:
 I was originally going to do that, but then I took a closer look at =



 documentation, which says ([1] in the documentation of  =



 `isInputRange()`):

 "Calling r.front is allowed only if calling r.empty has, or would ha=



 returned false."

Probably we need to amend that. For efficient ranges, front() and =


 popFront() should only be guaranteed to work if either empty() or  =


 length() were evaluated first, and they returned false or nonzero,  =


 respectively.

I just want to point out that I think allowing empty to have an =

 *observable* side effect will have cataclysmic repercussion in  =

 validating a release build, what with all our "assert(!empty);" calls =

 and all.

This is an excellent point. assert(!empty) is everywhere. -Steve
Mar 27 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 27 March 2014 at 19:03:26 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/14, 9:12 AM, monarch_dodra wrote:
 filter comes to mind. -- Andrei

You *just* turned down a pull that did exactly that, for exactly the same reasons.

I was on the verge, and we need to discuss this more. A constructor that does arbitrary work for a lazy range doesn't sit well. -- Andrei

I still think there's ambiguity in the word "lazy". I think this is the distinction most make: Not lazy: //---- auto r = getSomeRange(); auto arr = someRange.array(); foreach( ref e ; arr) doSomething; remove!fun(arr); return arr; //---- Lazy: //---- return getSomeRange() .map!doSomething() .filter!fun() .array(); //---- What you are talking about (IMO) is better described as "eager" vs "not eager". "A constructor does some work for a lazy range that does eager processing of its elements": I think this is a better description, and I think is acceptable.
Mar 27 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 9:12 AM, monarch_dodra wrote:
 If you initialized the next element in both constructor

As mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it.

Not impossible. Use lazy initialization.
 As also mentioned earlier, this throws under the bus entire categories  
 of use cases for ranges, all to avoid a single call to 'empty'.

This is going to look weird: r.empty; // now we can use it... It only makes sense if you are using empty in your processing of the range. -Steve
Mar 27 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 16:58:12 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

Completely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.

In the land of ranges, it's construction and popFront that generally advances the range. Empty does not. This is why assert(!empty) is all over the place, it's considered to be non-destructive. Note that a stream makes a terrible range, simply because of what you say -- reading is destructive, and determining emptiness is dependent on reading. You need a buffered stream to make a good range, and then the logic becomes much more straightforward. The awkwardness for shoehorning streams into ranges I see is that the advancement (popFront) and the check for termination (empty) are decoupled, when in reality they are synced for a stream. This REQUIRES a sticky bit for popFront to communicate with empty on whether it is EOF or not. Even a single byte buffer is not enough, you need a bool to indicate the stream is done. -Steve
Mar 27 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 17:06:35 -0400, Walter Bright  =

<newshound2 digitalmars.com> wrote:

 On 3/27/2014 5:45 AM, "Ola Fosheim Gr=C3=B8stad"  =

 <ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrot=


 I think we should work on making a protocol that does not require  =



 awkward
 calls, rather than alienating developers who don't follow the awkwar=



 protocol.

This is it.

Sure, but what is awkward about empty-front-popFront? You can still always use: foreach (e; range) { ... } which will follow the correct protocol automatically.

If this is all we needed, ranges would never have been invented. -Steve
Mar 27 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 27 March 2014 at 21:54:00 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/14, 1:58 PM, Walter Bright wrote:
 On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

Completely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.

That's a good point too. Andrei

Yes, but the "observability" should be sort lived, since empty is virtually guaranteed to be followed by "front" anyways. If both front/empty do the same (lazy) operation on the stream, then whether or not empty was called isn't really "observable" (to a certain extent). In contrast, having front completly fail if empty was NOT called, means "empty" has a hell of a lot of observable side effect.
Mar 27 2014
prev sibling next sibling parent "QAston" <qaston gmail.com> writes:
 Even a single byte buffer is not enough, you need a bool to 
 indicate the stream
 is done.

Right. But empty for a stream still has to read. Just follow the protocol, and the range will work, even with streams.

The protocol is not intuitive. I'm not entirely against having to call something to do lazy initialization but it's not like anyone will expect empty property to lazily initialize. Empty and front could be methods added by a mixin template which would use efficient methods with strict protocol implemented by the user.
Mar 27 2014
prev sibling next sibling parent "Paolo Invernizzi" <paolo.invernizzi no.address> writes:
On Friday, 28 March 2014 at 00:41:42 UTC, Walter Bright wrote:
 On 3/27/2014 3:23 PM, QAston wrote:
 The protocol is not intuitive.

I find empty-front-popFront as perfectly intuitive. I don't find the counter proposals, which come with baggage like constructors that may fail, and front() that may fail in unspecified ways, or throwing entire paradigms out the window, as intuitive. But I concede that other people think differently. Not everyone thinks the same. But consider this: floating point math is not intuitive. There has never been a shortage of proposals to make fp intuitive, but they've all failed because they are impractical. Sometimes ya gotta go with what works.

I _strongly_ agree with Walter: people learning D in my groups have no problems with the empty-front-popFront sequence. Please don't complicate or change the notion of range: you can find an adjustment that don't break code, but for sure that will break the mindset of people. For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront. -- Paolo
Mar 28 2014
prev sibling next sibling parent "RivenTheMage" <riven-mage id.ru> writes:
On Thursday, 27 March 2014 at 21:54:00 UTC, Andrei Alexandrescu
wrote:
 On 3/27/14, 1:58 PM, Walter Bright wrote:
 On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:
 Yah, agreed. -- Andrei

Completely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.

That's a good point too. Andrei

Maybe new kind of range can help? Which has "bool popFront()" instead of "bool empty() + void popFront()".
Mar 28 2014
prev sibling next sibling parent "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:
 On Thursday, 27 March 2014 at 15:45:14 UTC, Andrei Alexandrescu 
 wrote:
 On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:
 BTW, I think there has been recent talk about not focusing 
 on dust when
 we are laying bricks. This would have my vote as useless 
 dust. This does
 not solve the problem of streams-as-ranges, because streams 
 don't make
 good ranges. It doesn't really solve any problems as far as 
 I can tell.

I think byXchar are important and are not streams. -- Andrei

What's broken about those?

Speed.

I call shenanigans. This is the code: property bool empty() { if (nLeft) return false; if (r.empty) return true; If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks.

... but of course lose laziness.
 Furthermore, popFront() has a guaranteed "1 call" per element. 
 Weareas "empty" and "front" may be called several times in a 
 row for a single element.

 If you want efficiency, stop being "member-wise" lazy, and put 
 some eagerness in your code.

Does a flag "isInitialized" even have an effect on performance where it matters? It should only be relevant in tight loops, and there I expect a moderately well-working optimizer to inline calls to `empty` and `front`. It can then see that it has just set "isInitialized" to true/false, and remove the checks completely, effectively generating code as if the value were only fetched/computed in `empty`. That only leaves the larger struct size as a potential problem. However I've seen LDC being able to "decompose" a struct into separate variables, so I guess it can remove the unnecessary member in our case. In other situations, where you already have more complex code, an additional conditional branch will not have as much weight anyway. Does somehow have numbers for realistic programs, with GDC/LDC/DMD? I suspect this entire discussion is moot, because we can easily have non-eagerly initialized ranges without sacrificing performance in most situations.
Mar 28 2014
prev sibling next sibling parent "QAston" <qaston gmail.com> writes:
On Friday, 28 March 2014 at 08:34:08 UTC, Johannes Pfau wrote:
 Am Thu, 27 Mar 2014 17:20:25 -0700
 schrieb Walter Bright <newshound2 digitalmars.com>:

 On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
 On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

It's designed to work with containers.

I know we talked about streams when we designed it.
 It's a giant fail
 if they do not, or if you want to create a separate, 
 non-range
 universe to deal with streams.

It's not a giant fail, we just need to adjust the notion.

Are you suggesting that ranges needn't support streams? Note also that I suggested a way Steven could create an adapter with the behavior he desired, yet still adhere to protocol. No notion adjustments required.

Ranges have equivalents in other languages: iterators in c++, IEnumerator in c#, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.

There are stream iterators in C++: http://www.cplusplus.com/reference/iterator/istream_iterator/
Mar 28 2014
prev sibling next sibling parent "Tobias =?UTF-8?B?TcO8bGxlciI=?= <troplin bluewin.ch> writes:
On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
 On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or 
 front multiple times
 or not at all is unintuitive. They should not be named as if 
 they are properties
 then.

I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.

Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } popFront is then required to return !empty. 'empty' as a separate property getter can stay but is not required for the protocol. This way, it's clear that the work to fetch the next element is always done in popFront. Generally I find dependecies between functions problematic that require a specific call sequence. If they can be removed, they should. With my proposed solution, there's still one minor dependency, namely that front is not valid before the first popFront. This could be solved by again combining the two, as proposed by someone else in this thread. Tobi
Mar 28 2014
prev sibling next sibling parent "Szymon Gatner" <noemail gmail.com> writes:
On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:
 On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
 On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or 
 front multiple times
 or not at all is unintuitive. They should not be named as if 
 they are properties
 then.

I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.

Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; }

In c++ if you had a list or deque you would obviously did if(empty()) before calling front() too, and the before a call to pop_front(). Container is kind of range and in c++ it looks exactly the same: while (!cont.empty()) { auto& var = cont.front(); cont.pop_front(); // consume } 3 operations are needed.
Mar 28 2014
prev sibling next sibling parent "w0rp" <devw0rp gmail.com> writes:
On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:
 On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
 On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or 
 front multiple times
 or not at all is unintuitive. They should not be named as if 
 they are properties
 then.

I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.

Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } popFront is then required to return !empty. 'empty' as a separate property getter can stay but is not required for the protocol. This way, it's clear that the work to fetch the next element is always done in popFront. Generally I find dependecies between functions problematic that require a specific call sequence. If they can be removed, they should. With my proposed solution, there's still one minor dependency, namely that front is not valid before the first popFront. This could be solved by again combining the two, as proposed by someone else in this thread. Tobi

Well, it might seem kind of weird at first that an InputRange has these concepts in three distinct methods, but they really are three separate operations. After I have spent enough time working with Java Iterators, I have found it much nicer to be able to get the current value or see if I'm at the end or not without having to worry about caching the current value myself. Also as I've said before it gives you some opportunity to make trying to fetch something out of an empty range less error prone, because you don't have to check for a null value, etc.
Mar 28 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 28 Mar 2014 07:47:22 -0400, Marc Sch=C3=BCtz <schuetzm gmx.net> =
wrote:

 On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:

 I call shenanigans. This is the code:

              property bool empty()
             {
                 if (nLeft)
                     return false;
                 if (r.empty)
                     return true;

 If you initialized the next element in both constructor and popFront,=


 then you'd get rid of both these checks.

... but of course lose laziness.

In this case, laziness is not critical. Decoding the element is an O(1) = = operation, and when looping through, you will decode it anyway. When processing a 20 element string, the cost of checking to see if you = = have decoded on every empty or front call may override the front-loaded = = cost of decoding the first element on construction. It's sure to add to = = the cost if you are processing all 20 elements, since you decode them al= l = anyway. On other ranges, it's more important when the first element costs a lot = to = fetch. HOWEVER, it's not critically important to delay that unless you a= re = not going to process that element. For example, if you are foreach'ing = over all the elements, the delay doesn't matter. I'd rather the higher level code decide whether to delay or not, dependi= ng = on the situation. Requiring a protocol change for such detailed knowledg= e = seems unbalanced. -Steve
Mar 28 2014
prev sibling next sibling parent "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Saturday, 29 March 2014 at 01:36:46 UTC, Steven Schveighoffer 
wrote:
 On Fri, 28 Mar 2014 07:47:22 -0400, Marc Schütz 
 <schuetzm gmx.net> wrote:
 On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra 
 wrote:
 If you initialized the next element in both constructor and 
 popFront, then you'd get rid of both these checks.

... but of course lose laziness.

In this case, laziness is not critical. Decoding the element is an O(1) operation, and when looping through, you will decode it anyway. When processing a 20 element string, the cost of checking to see if you have decoded on every empty or front call may override the front-loaded cost of decoding the first element on construction. It's sure to add to the cost if you are processing all 20 elements, since you decode them all anyway. On other ranges, it's more important when the first element costs a lot to fetch. HOWEVER, it's not critically important to delay that unless you are not going to process that element. For example, if you are foreach'ing over all the elements, the delay doesn't matter. I'd rather the higher level code decide whether to delay or not, depending on the situation. Requiring a protocol change for such detailed knowledge seems unbalanced.

I was more thinking of the fact that you need to read something on construction, rather than on consumption, and this reading might be noticeable. There was the example of `stdin.byLine().filter(...)` (or something similar, don't remember exactly), which reads from stdin on construction. This changes the behaviour of the program, because the read operation will (probably) block. I'd suggest to make it a requirement for ranges and algorithms _not_ to start consuming the underlying data until one of empty/front/popFront is called, even if that has a negative effect on performance. That's why I was asking for performance numbers, to see whether there even is an effect. If there isn't, that's just another argument for adding that requirement. This is then, IMO, a very acceptable additional burden to place on the writers of ranges. I agree, however, that it's not a good idea to change the range protocol, i.e. what _users_ of ranges have to abide by. That would be a breaking change, and it would be an especially bad one because there I see no way to detect that a user failed to call `empty` in an iteration if they knew that there are more elements available.
Mar 29 2014
prev sibling parent "Oscar =?UTF-8?B?TWFydMOtbiI=?= <omarmed0000 hotmail.com> writes:
On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:
 On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:
 On 3/27/2014 12:21 PM, Rainer Schuetze wrote:
 This loop is intuitive. Not being allowed to call empty or 
 front multiple times
 or not at all is unintuitive. They should not be named as if 
 they are properties
 then.

I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.

Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } popFront is then required to return !empty. 'empty' as a separate property getter can stay but is not required for the protocol. This way, it's clear that the work to fetch the next element is always done in popFront. Generally I find dependecies between functions problematic that require a specific call sequence. If they can be removed, they should. With my proposed solution, there's still one minor dependency, namely that front is not valid before the first popFront. This could be solved by again combining the two, as proposed by someone else in this thread. Tobi

First sorry for my english. I agree. All work is done in "bool popFront()" (well, this name may no longer be the most appropriate). In this function the first / next item is obtained and caches. Returns true if any element, false otherwise. "front" return de cached element as now, as many times as desired and without side effects. "empty" function is no longer necessary, but it might be useful to keep changing the return type for example to int: 1 - Sure there is more elements 0 - Sure there is NO more elements -1 - I don't known. Try popFront to know if more element Of course this function as "front", without side effects
Mar 30 2014
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, March 26, 2014 10:36:08 Andrei Alexandrescu wrote:
 On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:
 On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz>
 
 wrote:
 On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer
 
 <schveiguy yahoo.com> wrote:
 Gah, I didn't cut out the right rules. I meant the two rules that
 empty must be called before others. Those are not necessary.

I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.

Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty.

I think requiring users to call empty before front on input ranges is a concession we should make.

I don't know. It's not the end of the world if we require it (at least with a range that doesn't have length), since you're almost always going to be forced to call empty anyway, because ranges are generally used in generic code. However, it seems really off to me for there to be any real work going on in empty, and I'd be leery of any range which actually required empty to be called before calling front (though I definitely think that calling front on an empty range should not be considered okay and would expect that to generally be undefined behavior). Regardless, if the range has length, then I'd think that quite a few of the algorithms which actually used length wouldn't actually need to call empty, making calling empty unnecessary overhead. But for the most part, I think that a lot of the weird cases go away when you end up with length and random-access ranges and the like, since that usually means that the range isn't generative but likely has an array underneath (though map certainly has its quirks thanks to the fact that front could technically allocate a new object on every call, and it can be random-access). - Jonathan M Davis
Mar 26 2014
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
On Sunday, 23 March 2014 at 01:07:27 UTC, Rikki Cattermole wrote:
 On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is. The normal way to use it is:

    while (!r.empty) {
        auto e = r.front;
        ... do something with e ...
        r.popFront();
    }

 no argument there. But there are two issues:

 1. If you know the range is not empty, is it allowed to call 
 r.front without calling r.empty first?

 If this is true, extra logic will need to be added to r.front 
 in many cases.

 2. Can r.front be called n times in a row? I.e. is calling 
 front() destructive?

 If true, this means that r.front will have to cache a copy in 
 many cases.

It would be nice to have ranges full stop described. And how to use/make them in D. I have yet to learn them because of no official documentation on them. On this topic we also need to work on common patterns and creating documentation on e.g. the site or wiki for it. Saw that we do have a bit in the wiki under tutorials. Maybe if I get some time I'll work on that.

I agree. I've been using ranges for a while now and have tried different implementations based on advice given on this forum and depending on each case. After reading this thread I looked at some of my ranges and I have to say I still have no clue as to what should and should _not_ be done (regardless of whether you _can_ do it). For a while I thought that it's my lack of understanding, but this thread shows that everyone has a different view of ranges. Guidelines with use cases would be great. I remember I mentioned this in another thread already. The thing is that experimenting without proper guidelines leaves your code in an inconsistent state where you have two or more ranges doing technically the same thing but each with a different logic.
Mar 27 2014
prev sibling next sibling parent reply =?UTF-8?B?Ikx1w61z?= Marques" <luis luismarques.eu> writes:
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is.

Some thoughts: - A while ago I realized I did not know how some of those underspecified details should be implemented (I think my initial doubt was if the range was expected/required to enforce(!empty)). Assuming it was just my ignorance (and incomplete or hard to find documentation), I asked in the IRC channel and, IIRC, the answer I got there was that ranges are an abstract concept, so such minutia were supposed to be implementation details. From this forum thread and that IRC feedback, it seems that, indeed, people have been operating under an illusion of a shared universal assumption. I'm very happy to see this addressed. - (Lots of text deleted because I was no longer sure about it ...) - Is it allowed for an InputRange to become empty and then not empty again? For a finite RandomAccessRange to increase in length? E.g.: // Example 1: (with "outside" control flow) if(someQueue.empty) addStuffToQueue(); assert(!someQueue.empty); // Example 2: (with only range-related control flow) if(!someQueue.empty) { auto n = someQueue.length; someQueue.popFront(); assert(someQueue.length == n-1); // can this fail? } - Is it allowed to not call front? E.g.: void dropAll(Range)(Range range) { while(!empty) range.popFront(); }
Mar 27 2014
parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
""Luís Marques " wrote in message 
news:exmfmqftpykoaxdgluit forum.dlang.org...

 - Is it allowed for an InputRange to become empty and then not empty 
 again? For a finite RandomAccessRange to increase in length? E.g.:

Not by itself - if a range's empty has returned true, calling empty repeatedly should continue to return true. If you change it externally (ie not by using the range interface) then it can become non-empty. eg int[] arr; assert(arr.empty); arr ~= 3; // no longer empty, but we used a method outside the range interface to change it. I don't think a random number generator or some hardware device producing more data would be a good reason to change empty - range.empty is not asking 'is there more data ready', it's asking 'are there items left to iterate over'.
 - Is it allowed to not call front? E.g.:

Yes.
Mar 29 2014
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Thu, 27 Mar 2014 17:20:25 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:
 On 3/27/14, 2:24 PM, Walter Bright wrote:
 The range protocol is designed to work with streams.

It's designed to work with containers.

I know we talked about streams when we designed it.
 It's a giant fail
 if they do not, or if you want to create a separate, non-range
 universe to deal with streams.

It's not a giant fail, we just need to adjust the notion.

Are you suggesting that ranges needn't support streams? Note also that I suggested a way Steven could create an adapter with the behavior he desired, yet still adhere to protocol. No notion adjustments required.

Ranges have equivalents in other languages: iterators in c++, IEnumerator in c#, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.
Mar 28 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi  
<paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that empty  
 *must* be checked prior to front/popFront.

This is actually not true. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 28 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
 Adding range primitives on top of a stream does not make sense.

Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone?

Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges.
 I'm also curious what a generic read() should do when the stream is  
 empty.

Returns 0 elements read. -Steve
Mar 28 2014
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
On Friday, 28 March 2014 at 04:58:28 UTC, Walter Bright wrote:
 On 3/27/2014 3:21 AM, Chris wrote:
 I agree. I've been using ranges for a while now and have tried
 different implementations based on advice given on this forum 
 and
 depending on each case. After reading this thread I looked at
 some of my ranges and I have to say I still have no clue as to
 what should and should _not_ be done (regardless of whether you
 _can_ do it). For a while I thought that it's my lack of
 understanding, but this thread shows that everyone has a
 different view of ranges. Guidelines with use cases would be
 great. I remember I mentioned this in another thread already. 
 The
 thing is that experimenting without proper guidelines leaves 
 your
 code in an inconsistent state where you have two or more ranges
 doing technically the same thing but each with a different 
 logic.

Following the protocol empty-front-popFront always works. Where the differences come in is when people skip calling empty.

Not only there. The whole issue of front and popFront has not been made clear yet (unless it's my lack of understanding). What should and should not be done there, even if you stick to the basic protocol (!empty > front > popFront). I like the concept of ranges and use them a lot, because they help me to create independent components. But sometimes I think they are just glorified for-loops in the sense that we know more or less what the input is, and part of the reason why we have this thread is that we want everything to be specialized and generic at the same time. E.g. monarch_dodra wrote: "It's not that *you* know a particular range doesn't need "empty" called, it's that the algorithm you are using has already previously validated there are elements in it. For example, the splitter algorithm will first save a copy of its range, and then walk it, searching for the "splitting" elements. It then realizes it has walked N elements." But then it's no longer generic. If your range depends on another algorithm (that you assume or know to have checked something), then it's no longer generic. In many cases "generic" means an overhead of checks, and is anathema to performance tuning. I think it's the issue of generic vs. specialized / optimized that started this thread. And this is a tough one to sort out.
Mar 28 2014
prev sibling next sibling parent "w0rp" <devw0rp gmail.com> writes:
I think something has gone wrong in this discussion somewhere. It 
is like this.

Do not assume .empty is const, this is unreasonable.

Do not assume .front is const, this is unreasonable.

.popFront has the only high-level observable difference in 
removing something from the front of an InputRange.

I say it is unreasonable to expect .empty or .front to be const 
because a range which is strictly an InputRange (not a 
ForwardRange...) is likely to be doing some kind of work where 
stepping through its sequence of data is destructive. As soon as 
the thing is read in the implementation of the InputRange, you 
can't read that thing again. You've cached it or it is gone. Thus 
.empty and .front must be non-const and lazily evaluate caching 
the next value.

I do not see it as a major issue to check .empty or call .front 
many times, even with this caching behaviour. Commonly this is as 
simple as checking whether or not you still have a value to 
provide inside the InputRange. We use assert(!empty) etc. because 
honestly I cannot imagine another way to make this safe.

If you really, really hate this double-checking in .empty or 
.front, you could try to change InputRanges so they behave like 
Java Iterators. I believe this is a really bad idea. You have to 
start thinking about returning nullable types. You have to push 
the job of caching values on to users of the InputRanges, as in 
each and every algorithm. It's just a really bad choice, and with 
the kind of boxing you'll need to do for nullable types, you'll 
surely end off worse as far as performance goes anyway.

Really, the range protocol is as Walter specified at the start of 
this thread. The documentation should reflect how .empty or 
.front may result in caching the next value. I really don't think 
this is a major performance problem worth worrying about, though. 
If it's really, really worth it for certain standard algorithms, 
perhaps something can be done in private range methods or similar 
to offer some kind of speed boost. ('package' privacy for std.* 
is feasible.) I don't think there is a lot to be gained from this 
in general, though.

Streams are different, and may be more appropriate for some 
functions for performance reasons. I still prefer ranges for code 
where performance is less critical, because if you aren't very 
worried about performance, the range API is a lot nicer. I'm not 
dead set on maximum performance if it makes life a lot harder in 
my code, personally. I view these kinds of improvements as 
premature optimisations if I'm writing something new.
Mar 28 2014
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
Earlier Walter wrote:

"I don't like being in the position of when I need high 
performance code, I have
to implement my own ranges & algorithms, or telling customers 
they need to do so."

I don't think there is a one size fits all. What if customers ask 
for maximum security? In any language, if I want high 
performance, I have to be prepared to walk on thin ice. If I want 
things to be safe and / or generic, I have to accept additonal 
checks (= perfomance penalties). I don't think that a language 
can solve the fundamental problems concerning programming / 
mathematical logic with all the contradictory demands involved. 
It can give us the tools to cope with those problems, but not 
solve them out of the box.
Mar 28 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 28 Mar 2014 14:15:10 -0000, Chris <wendlec tcd.ie> wrote:

 Earlier Walter wrote:

 "I don't like being in the position of when I need high performance  
 code, I have
 to implement my own ranges & algorithms, or telling customers they need  
 to do so."

 I don't think there is a one size fits all. What if customers ask for  
 maximum security? In any language, if I want high performance, I have to  
 be prepared to walk on thin ice. If I want things to be safe and / or  
 generic, I have to accept additonal checks (= perfomance penalties). I  
 don't think that a language can solve the fundamental problems  
 concerning programming / mathematical logic with all the contradictory  
 demands involved. It can give us the tools to cope with those problems,  
 but not solve them out of the box.

You can build safety on top of performance. You cannot do the opposite. Meaning, one could wrap an unsafe/fast range with a safe/slower one. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 28 2014
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
On Friday, 28 March 2014 at 15:49:06 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 14:15:10 -0000, Chris <wendlec tcd.ie> 
 wrote:

 Earlier Walter wrote:

 "I don't like being in the position of when I need high 
 performance code, I have
 to implement my own ranges & algorithms, or telling customers 
 they need to do so."

 I don't think there is a one size fits all. What if customers 
 ask for maximum security? In any language, if I want high 
 performance, I have to be prepared to walk on thin ice. If I 
 want things to be safe and / or generic, I have to accept 
 additonal checks (= perfomance penalties). I don't think that 
 a language can solve the fundamental problems concerning 
 programming / mathematical logic with all the contradictory 
 demands involved. It can give us the tools to cope with those 
 problems, but not solve them out of the box.

You can build safety on top of performance. You cannot do the opposite. Meaning, one could wrap an unsafe/fast range with a safe/slower one. R

But should unsafe+fast be the default or rather an option for cases when you really need it?
Mar 28 2014
prev sibling next sibling parent "Paolo Invernizzi" <paolo.invernizzi no.address> writes:
On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi 
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that 
 empty *must* be checked prior to front/popFront.

This is actually not true. R

What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural. -- Paolo
Mar 28 2014
prev sibling next sibling parent John Stahara <john.stahara+dlang gmail.com> writes:
On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:

 On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact that empty
 *must* be checked prior to front/popFront.

This is actually not true. R

What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural.

To clarify for Mr. Invernizzi: the "we" to which he refers is the group of people he works with, and /not/ the members of this newsgroup. --jjs
Mar 28 2014
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 28 Mar 2014 16:04:29 -0000, Chris <wendlec tcd.ie> wrote:

 On Friday, 28 March 2014 at 15:49:06 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 14:15:10 -0000, Chris <wendlec tcd.ie> wrote:

 Earlier Walter wrote:

 "I don't like being in the position of when I need high performance  
 code, I have
 to implement my own ranges & algorithms, or telling customers they  
 need to do so."

 I don't think there is a one size fits all. What if customers ask for  
 maximum security? In any language, if I want high performance, I have  
 to be prepared to walk on thin ice. If I want things to be safe and /  
 or generic, I have to accept additonal checks (= perfomance  
 penalties). I don't think that a language can solve the fundamental  
 problems concerning programming / mathematical logic with all the  
 contradictory demands involved. It can give us the tools to cope with  
 those problems, but not solve them out of the box.

You can build safety on top of performance. You cannot do the opposite. Meaning, one could wrap an unsafe/fast range with a safe/slower one. R

But should unsafe+fast be the default or rather an option for cases when you really need it?

Pass. My point was only that it needs to exist. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 28 2014
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Fri, 28 Mar 2014 02:55:45 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 3/28/2014 1:32 AM, Johannes Pfau wrote:
 Ranges have equivalents in other languages:
 iterators in c++,
 IEnumerator in c#,
 Iterator in java

 all these languages have special stream types for raw data. I don't
 think it's bad if we also have streams/ranges separate in D.

Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string?

Sorry, but no. It sure sounds nice first, but I can't really imagine a use case where I would want any generic algorithms to work directly on a socket. Having a look on the cheat sheet of std.algorithm 99% of these algorithms do not make sense to operate on sockets, those that do (count, until) would be bad in performance terms when operating directly byte per byte. You at least want buffered reads when doing IO operations. If you then extend range interface to give access to an internal buffer you just reinvented streams.
 
 empty-front-popFront works with streams and non-streams. Why break
 this?
 
 

It 'works' with streams but it's way too slow. You don't want to read byte-per-byte. Of course you can always implement ranges on top of streams. Usually these will not provide byte-per-byte access but efficient higher level abstractions (byLine, byChunk, decodeText). The point is you can implement ranges on streams easily, but you can't use ranges as the generic primitive for raw data. What's the element type of a data range? ubyte - performance sucks ubyte[n], ubyte[] now you have a range of ranges, most algorithms wont work as expected (find, count, ...). (the call empty/don't call empty discussion is completely unrelated to this, btw. You can implement ranges on streams either way, but again, using ranges for raw data streams is not a good idea.)
Mar 28 2014
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Fri, 28 Mar 2014 17:22:26 +0000
schrieb "w0rp" <devw0rp gmail.com>:

 
 I think a key is to offer something with gives you chunks at a 
 time right at the top, and the use .joiner on that. I read files 
 this way currently.
 
 auto fileByteRange = File("something").byChunk(chunkSize).joiner;
 

byChunk is implemented on top of the file rawRead API though, and that's a stream API ;-) As said before implementing ranges on top of streams is fine, but if you want ranges to replace streams as the lowest level interface you'll either suffer from performance issues or you'll have to extend the range interface and effectively make it a stream interface. (For example byChunk doesn't offer a way to provide a buffer. I'd expect a low level API to offer this, but it'll complicate range API a lot. File.rawRead on the other hand provides exactly that and you can implement byChunk on top of rawRead easily. The other way round is not as easy). BTW: If this code performs well of course depends what you with that fileByteRange range. For example if you only read the complete file into a memory buffer joiner would reduce performance significantly.
 I believe this to be a very good way to get good performance 
 without losing the functionality of std.algorithm.

Yes, that's exactly how ranges/streams should interface, there's no real problem for users. stream.getSomeRange().rangeAPICalls....
Mar 28 2014
prev sibling next sibling parent "Paolo Invernizzi" <paolo.invernizzi no.address> writes:
On Friday, 28 March 2014 at 16:30:36 UTC, John Stahara wrote:
 On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:

 On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:
 On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi
 <paolo.invernizzi no.address> wrote:
 For what concern us, everyone here is happy with the fact 
 that empty
 *must* be checked prior to front/popFront.

This is actually not true. R

What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural.

To clarify for Mr. Invernizzi: the "we" to which he refers is the group of people he works with, and /not/ the members of this newsgroup. --jjs

Thank you John, that's exact: I'm talking about my colleagues working with D. -- Paolo
Mar 28 2014
prev sibling next sibling parent "w0rp" <devw0rp gmail.com> writes:
On Friday, 28 March 2014 at 22:05:33 UTC, Walter Bright wrote:
 On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:
 Which uses C's BUFFERED I/O and it reads from it byte by byte 
 via getc. Even
 though sys calls are amortized by C runtime, we have a 
 function call per byte.
 No wonder it's SLOW.

How about a PR to fix it?

Yes. I hold the opinion that there is not a whole lot of reason why something like byChunk can't be fast like we might desire. If it's down to getting chunks of data the right way, whatever that is, and the problem is an IO bound problem, then I don't see why we can't submit pull requests to offer optimisations on it. I don't think we need to go down this "the way it does IO isn't fast enough so we need to replace it" route. Just optimise the existing thing more.
Mar 28 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 28 Mar 2014 20:14:39 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/28/14, 3:42 AM, Steven Schveighoffer wrote:
 On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:
 Adding range primitives on top of a stream does not make sense.

Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone?

Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges.
 I'm also curious what a generic read() should do when the stream is
 empty.

Returns 0 elements read.

It increasingly seems to me we need to do this. -- Andrei

It is in the works. I need to finish it. I've been incorporating Dmitry's I/O buffer into my design. -Steve
Mar 28 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 Mar 2014 19:52:52 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/27/2014 2:46 PM, Steven Schveighoffer wrote:
 What I am protesting is the idea that you *know* input exists, you must  
 still call empty.
 Front is enough in that case.

You can't use that on generic code, because generic code cannot assume that front will not fail.

There is a difference here that I want to establish. First, I completely agree that for generic code, for when an unknown range is passed in, empty is required to be called to verify that front is valid. Second, while I agree that empty should be required to verify front is valid on unknown ranges, it should not be required, when it is known that it's not empty. What you are proposing is that we take advantage of the requirement for empty on unknown ranges to require it on known ones. I disagree, the code looks awkward and incorrect. When I know something is empty, calling empty again doesn't give me information. instead empty now becomes "is it empty, and if it needs priming, prime it." I'd rather see another primitive, since empty does not convey that message. A counter proposal -- What if we create a global method in std.range called prime. prime(r) will call r.prime if it exists, otherwise calls r.empty, and ignores the result. Code that wants to work with "primable" ranges, can call that function, and it's harmless for normal ranges, but will allow primable ranges to fetch the first element. I think penalizing all range-using code to force a non-functional empty call would 1. artificially invalidate lots of currently valid code and 2. look completely awkward in cases where it's not normally necessary. -Steve
Mar 28 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu 
wrote:
 It increasingly seems to me we need to do this. -- Andrei

What I find funny here is that walter is saying we must enforce that empty always be called in the range - "for performance". Yet about a year ago, he made a proposal for a "SentinelInputRange", where you could do away with empty altogether because "tl,dr: PERFORMANCE!". http://forum.dlang.org/thread/kgmatj$1v8b$1 digitalmars.com If you need to deviate from the rules for extra performance, then fine. Just document that it is a lower level range, with useage limitations. However, making a global design change because *1* type of range needs it, "for performance", I'm not fine with.
Mar 29 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu 
wrote:
 It increasingly seems to me we need to do this. -- Andrei

Question: Should this new restriction only be applied to *Input*Ranges? If so, then it might make more sense. Ranges that also satisfy the *Forward* interface (which has save), or specifically, the *Output* interface (that doesn't even have empty) should not have this obligation. In particular, output ranges *write*, so usually can't even buffer data. *Input*-only Ranges are the ones that are usually implemented over streams, and that are most subject to the "test empty" issue too, so they are the ones that need this restriction. Could this maybe be a good solution for everyone?
Mar 29 2014
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:
 It's become clear to me that we've underspecified what an 
 InputRange is.

As someone who has missed this discussion, can I just say this: can we please document the result somewhere, and possibly even announce it clearly so that people know that something has changed?
Mar 29 2014
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 28 Mar 2014 19:23:29 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:
 I'm also curious what a generic read() should do when the stream is empty.

Returns 0 elements read.


I guess we all have a clear concept of streams in our mind from all kinds of programming languages. They typically operate on bytes, have an EOF flag and offer read/write methods that you pass a byte pointer and a length into. The result is the count of bytes read or written. Optionally they have an "available" property and handle any combination of the following: o all basic data types of the programming language o POD structs o formatted strings o bitwise operations o seeking after calculating offsets They are used for I/O on heterogeneous data like binary file formats or network protocols. Ranges on the other hand work on sequences of items of the same type, which is a small subset of what streams are supposed to support. While there should be a connection between both worlds, one cannot replace the other. There will always be a separate raw stream type in Phobos.
 Meaning it must also write through a pointer if it did read something.
 
 How is that faster than a 1 element buffer?

You can write your stream in such a way that you "map" a memory area. E.g. you call "stream.waitForSoManyBytes(123);" and then "ubyte[] arr = stream.mapMemory(123);" where arr is then a slice into the stream's internal buffer. (This also requires a mechanism to release the buffer after use, so the stream can reuse it: stream.doneWithSoManyBytes(123);) -- Marco
Mar 29 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 29 March 2014 at 09:06:45 UTC, monarch_dodra wrote:
 On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu 
 wrote:
 It increasingly seems to me we need to do this. -- Andrei

Question: Should this new restriction only be applied to *Input*Ranges? If so, then it might make more sense. Ranges that also satisfy the *Forward* interface (which has save), or specifically, the *Output* interface (that doesn't even have empty) should not have this obligation. In particular, output ranges *write*, so usually can't even buffer data. *Input*-only Ranges are the ones that are usually implemented over streams, and that are most subject to the "test empty" issue too, so they are the ones that need this restriction. Could this maybe be a good solution for everyone?

Did this conversation die the instant I actually had something smart to say? andralex, WalterBright ? Thoughts?
Mar 30 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 28 Mar 2014 22:23:29 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:
 I'm also curious what a generic read() should do when the stream is  
 empty.

Returns 0 elements read.

Meaning it must also write through a pointer if it did read something. How is that faster than a 1 element buffer?

I don't understand this point/question. -Steve
Mar 31 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 28 Mar 2014 00:58:35 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 Following the protocol empty-front-popFront always works.

 Where the differences come in is when people skip calling empty.

-- when it is known empty will return false through logical deduction. -Steve
Mar 31 2014
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 31 Mar 2014 13:43:05 -0400, Marc Sch=C3=BCtz <schuetzm gmx.net> =
wrote:

 I've found the example I was talking about:
 http://forum.dlang.org/thread/mailman.323.1393458346.6445.digitalmars-=

 I misremembered, filter wasn't even involved. But similar situations m=

 arise from using filter or another range that eagerly fetches the firs=

 element, even if its source doesn't.

Sure, but fetching lazily adds cost. It's possible to add it as an optio= n = when needed, but it's not easy to reclaim the cost if it's the default = implementation.
 That's what I meant by a tight loop. My hypothesis is that in such cas=

 the optimizers of at least GDC and LDC are good enough to remove the  =

 check for the boolean completely, and that at least LDC can then remov=

 the boolean itself from the range structure.

I suspect that this is an optimization that will break down when not don= e = in the "right way." I'd rather be able to choose, do I need lazy = evaluation or not? The cases where lazy evaluation is needed are not = common. -Steve
Mar 31 2014