www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Should range foreach be iterating over an implicit copy?

reply "Nick Sabalausky" <SeeWebsiteToContactMe semitwist.com> writes:
A small debate has broken out over on D.learn ( 
http://forum.dlang.org/thread/jovicg$jta$1 digitalmars.com#post-jovicg:24jta:2
1:40digitalmars.com ) 
that I thought I should move here.

Basically, the issue is this: Currently, when you have a struct-based range, 
foreach will iterate over a *copy* of it:

    Range r;
    auto save = r;
    foreach(e; r)
        assert(r == save);
    assert(r == save);

One side of the argument is that this behavior is correct and expected since 
structs are value types, and iterating shouldn't consume the range.

My argument is this:

First of all, foreach is conceptually a flow-control statement, not a 
function. So I'd intuitively expect:

    Range r;
    foreach(e; r) {...}

To work like this:

    Range r;
    for(; !r.empty; r.popFront)
    {
        auto e = r.front;
        ...
    }

Additionally, iterating a range normally *does* consume it. And that's 
expected, as it should be: Imagine, for example, an input range that read 
from stdin. Leaving that range unconsumed would make no sense at all. 
Actually, any input range can be expected to *not* leave an uniterated copy 
behind: if it *could* have an uniterated copy left behind, it would be a 
forward range, not an input range.

It actually seems wrong to use the current foreach over an input range: 
Sometimes it might work by pure luck (as in the original example), but you 
can expect that for some input ranges it would fail miserably, because an 
input range is *not* a forward range and by definition cannot reliably save 
a previous state. So iterating over an input range would leave the original 
input range in an undefined state (actually, that could even be true for 
certain forward ranges if foreach doesn't implicitly start out by calling 
"r.save" and the range doesn't expect to be saved by copying).

Suppose foreach *didn't* iterate over a copy: If you wanted to still have an 
un-iterated version (of an actual forward range, or an input range that you 
knew to be safe), then that's trivial:

    Foo a;
    b = a.save(); // Or "b = a;" if you really know what you're doing.
    foreach(elem; a) {}
    assert(a.empty);
    assert(!b.empty);

Note, however, that there is no such simple way to go the other way around: 
to have the current "foreach over a copy" behavior and have access to the 
actual iterated range. Maybe if we had "foreach(e; ref range)", but AFAIK we 
don't.

One counter-argument that was raised is that TDPL has an example on page 381 
that indicates foreach iterates over an implicit copy. I don't have a copy 
handy ATM, so I can't look at it myself, but I'd love to see Andrei weigh in 
on this: I'm curious if this example in TDPL made that copy deliberately, or 
if the full implications of that copy were just an oversight.
May 16 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/16/12 4:37 PM, Nick Sabalausky wrote:
 One counter-argument that was raised is that TDPL has an example on page 381
 that indicates foreach iterates over an implicit copy. I don't have a copy
 handy ATM, so I can't look at it myself, but I'd love to see Andrei weigh in
 on this: I'm curious if this example in TDPL made that copy deliberately, or
 if the full implications of that copy were just an oversight.

It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???" Andrei
May 16 2012
next sibling parent "Xinok" <xinok live.com> writes:
On Wednesday, 16 May 2012 at 21:40:39 UTC, Andrei Alexandrescu 
wrote:
 On 5/16/12 4:37 PM, Nick Sabalausky wrote:
 One counter-argument that was raised is that TDPL has an 
 example on page 381
 that indicates foreach iterates over an implicit copy. I don't 
 have a copy
 handy ATM, so I can't look at it myself, but I'd love to see 
 Andrei weigh in
 on this: I'm curious if this example in TDPL made that copy 
 deliberately, or
 if the full implications of that copy were just an oversight.

It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???" Andrei

I think this is the correct behavior as well. Otherwise, we'd have to take extra care when writing generic code to ensure it doesn't consume the range but doesn't attempt to call .save on non-range types.
May 16 2012
prev sibling parent "Ken" <kennethuil gmail.com> writes:
On Wednesday, 16 May 2012 at 21:40:39 UTC, Andrei Alexandrescu 
wrote:
 On 5/16/12 4:37 PM, Nick Sabalausky wrote:
 One counter-argument that was raised is that TDPL has an 
 example on page 381
 that indicates foreach iterates over an implicit copy. I don't 
 have a copy
 handy ATM, so I can't look at it myself, but I'd love to see 
 Andrei weigh in
 on this: I'm curious if this example in TDPL made that copy 
 deliberately, or
 if the full implications of that copy were just an oversight.

It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???" Andrei

According to the docs, however, an InputRange that is not a ForwardRange won't necessarily behave the way millions of programers used to foreach from other languages expect them to. Such an operation will most likely consume the underlying data, whether foreach makes an implicit copy or not. Put another way, whether a non-ForwardRange is consumed by a foreach the way it currently works is unspecified. The docs also say that the right way to checkpoint a ForwardRange is by calling save(). This tells me that the most intuitive way for foreach to work is: a. If it's a ForwardRange, make a copy with save(), then consume the copy. b. If it's not a ForwardRange, consume the original.
Jun 07 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Wednesday, 16 May 2012 at 21:37:54 UTC, Nick Sabalausky wrote:
 A small debate has broken out over on D.learn (
 http://forum.dlang.org/thread/jovicg$jta$1 digitalmars.com#post-jovicg:24jta:2
1:40digitalmars.com 
 )
 that I thought I should move here.

 Basically, the issue is this: Currently, when you have a 
 struct-based range, foreach will iterate over a *copy* of it:

http://forum.dlang.org/post/dskrijudairkopeoyqkc forum.dlang.org I've replied in the other one, but I'll repost it here, perhaps andrei can correct me. --- On Wednesday, 16 May 2012 at 20:50:55 UTC, Nick Sabalausky wrote:
 I was initially open to the idea I may have just misunderstood 
 something, but I'm becoming more and more convinced that the 
 current "foreach over an implicit copy" behavior is indeed a 
 bad idea.

 I'd love to see Andrei weigh in on this. I'm curious if the 
 example you pointed out in TDPL made the copy deliberately, or 
 if the effects of that copy were just an oversight.

I remember going over hard and long over that section. I think it's deliberate. Range can refer to many things, and I'm assuming array is one of them. So... If the array is consumed when using foreach, that seems dumb right? (An array in the end is just a small struct afterall...) --- int[] x = [1,2,3]; foreach(i; x) { //stuff } assert(x.length == 0, "Should have been consumed!"); --- Structs being value types are by value and so are copied, not referenced. Although referencing a copy does seem a bit silly... int[] x = [1,2,3]; foreach(ref i; x) { i = 10; } assert(x == [10,10,10], "But i changed them to 10!!"); -- That means that foreach(ref; ref) if it's considered (And I think it makes sense to) would work for structs in these few cases. (Course you'd still consume dynamic arrays that way) In std.stream are classes, so they are consumed. I'm probably just rambling...
May 16 2012
prev sibling next sibling parent "Jesse Phillips" <jessekphillips+D gmail.com> writes:
On Wednesday, 16 May 2012 at 21:37:54 UTC, Nick Sabalausky wrote:
 A small debate has broken out over on D.learn (
 http://forum.dlang.org/thread/jovicg$jta$1 digitalmars.com#post-jovicg:24jta:2
1:40digitalmars.com 
 )
 that I thought I should move here.

I am in agreement, a value type rage is confusing to use http://d.puremagic.com/issues/show_bug.cgi?id=7067 I don't agree with Andrei that we want to stop the "hey, where'd my range go" because we spend so much effort to get that to be expected behavior (and it is). The hybrid array really doesn't help, but I wouldn't want it changed. If .save is not called on the range then the expectation is that it will be consumed. It is clear that calling a function with a value type would likely mean it won't be consumed, but that is the odd case, the expectation is that the range would be consumed. And this leads to what is suggested in the referenced bug, a wrapper to prevent an implicit forward range.
May 16 2012
prev sibling next sibling parent reply "Nick Sabalausky" <SeeWebsiteToContactMe semitwist.com> writes:
I'm a little discouraged that my concern about "input ranges can't save 
their state, and yet that's exactly what happens implicitly" hasn't been 
addressed. I was hoping to at least get a "That's not really a problem and 
here's why..."

However, that said...

"Andrei Alexandrescu" wrote...
 It is deliberate and the intent is that millions of programmers used to 
 foreach from other languages don't scream "where is my range???"

"Era Scarecrow" wrote...
 Range can refer to many things, and I'm assuming array is one of them. 
 So... If the array is consumed when using foreach, that seems dumb right? 
 (An array in the end is just a small struct afterall...)

I admit, those are fair points. With those in mind, I've given it some more thought, and here are my current...umm...thoughts: - A range is not a collection, it's a *view* of a collection (or of something else). This is a necessary distinction because ranges and collections work in fundamentally different ways: A range is, *by necessity* consumed as it's iterated - that's what popFront *does*, that's fundamentally how ranges work. For many collections, OTOH, it makes *no* sense to consume the collection while iterating it. Thus, range != collection, it's a view of a collection. - Ranges are D's answer to iterators. I don't think people used to iterators from other languages would expect their iterator to magically reset itself after being used. So I see no reason why they would expect a range (ie, an iterator-pair-with-benefits) to behave differently than that. - D's arrays conflate the ideas of "collection" and "range", hence the odd edge case Era pointed out, and hence the "need" for foreach to automatically make a copy. But in my (not super-extensive) experience creating ranges, I've found that to be a problematic pattern (due to the fundamental distinction between a range and a collection), and learned to prefer making my iterable things *return* a range, rather than actually *be* ranges. - Admittedly, it would be annoying if foreach had to be used like this on all collections: "foreach(e; myArray.rangeOf)", so I guess it would make sense for a range to automatically be obtained when foreach-ing over a collection. However, I'm still not 100% sold on the current method of doing that (making foreach automatically copy struct-based ranges), partly because of the questionable implications it has for input ranges, and partly because (for struct-ranges) it leaves no way to access the range that's actually being iterated. - At the very least, perhaps input ranges just shouldn't be allowed to be structs? After all, structs are intended to be copied around, but input ranges, by definition, can't have their current state copied.
May 16 2012
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 17 May 2012 01:48:06 -0400, Nick Sabalausky  
<SeeWebsiteToContactMe semitwist.com> wrote:

 I'm a little discouraged that my concern about "input ranges can't save
 their state, and yet that's exactly what happens implicitly" hasn't been
 addressed. I was hoping to at least get a "That's not really a problem  
 and
 here's why..."

input ranges can save their state if they are also forward ranges.
 However, that said...

 "Andrei Alexandrescu" wrote...
 It is deliberate and the intent is that millions of programmers used to
 foreach from other languages don't scream "where is my range???"

"Era Scarecrow" wrote...
 Range can refer to many things, and I'm assuming array is one of them.
 So... If the array is consumed when using foreach, that seems dumb  
 right?
 (An array in the end is just a small struct afterall...)

I admit, those are fair points. With those in mind, I've given it some more thought, and here are my current...umm...thoughts:

[snip] This is not a problem with ranges, this is an issue with value types versus reference types. I believe someone has created a byRef struct that wraps a range and iterates it byRef (maybe dsimcha?) Almost all ranges except true input-only ranges (i.e. input ranges that are only input ranges) should be value types. If we *didn't* do this, you would need heap data for each range. -Steve
May 17 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17.05.2012 18:18, Steven Schveighoffer wrote:
 On Thu, 17 May 2012 01:48:06 -0400, Nick Sabalausky
 <SeeWebsiteToContactMe semitwist.com> wrote:

 I'm a little discouraged that my concern about "input ranges can't save
 their state, and yet that's exactly what happens implicitly" hasn't been
 addressed. I was hoping to at least get a "That's not really a problem
 and
 here's why..."

input ranges can save their state if they are also forward ranges.

s/ save state / copy themselves Fixed :) That's why I tried to come up with BacktrackRange or some such. An input range that you can "shake" to reset to some saved point, but can't have two separate states at _the same time_ (nor full copy, it's like ... file) -- Dmitry Olshansky
May 17 2012
prev sibling next sibling parent "Erik Jensen" <eriksjunk rkjnsn.net> writes:
On Thursday, 17 May 2012 at 05:48:44 UTC, Nick Sabalausky wrote:
 - A range is not a collection, it's a *view* of a collection 
 (or of
 something else). This is a necessary distinction because ranges 
 and
 collections work in fundamentally different ways: A range is, 
 *by necessity*
 consumed as it's iterated - that's what popFront *does*, that's
 fundamentally how ranges work. For many collections, OTOH, it 
 makes *no*
 sense to consume the collection while iterating it. Thus, range 
 !=
 collection, it's a view of a collection.

 - Ranges are D's answer to iterators. I don't think people used 
 to iterators
 from other languages would expect their iterator to magically 
 reset itself
 after being used. So I see no reason why they would expect a 
 range (ie, an
 iterator-pair-with-benefits) to behave differently than that.

 - D's arrays conflate the ideas of "collection" and "range", 
 hence the odd
 edge case Era pointed out, and hence the "need" for foreach to 
 automatically
 make a copy. But in my (not super-extensive) experience 
 creating ranges,
 I've found that to be a problematic pattern (due to the 
 fundamental
 distinction between a range and a collection), and learned to 
 prefer making
 my iterable things *return* a range, rather than actually *be* 
 ranges.

 - Admittedly, it would be annoying if foreach had to be used 
 like this on
 all collections: "foreach(e; myArray.rangeOf)", so I guess it 
 would make
 sense for a range to automatically be obtained when foreach-ing 
 over a
 collection. However, I'm still not 100% sold on the current 
 method of doing
 that (making foreach automatically copy struct-based ranges), 
 partly because
 of the questionable implications it has for input ranges, and 
 partly because
 (for struct-ranges) it leaves no way to access the range that's 
 actually
 being iterated.

 - At the very least, perhaps input ranges just shouldn't be 
 allowed to be
 structs? After all, structs are intended to be copied around, 
 but input
 ranges, by definition, can't have their current state copied.

I would be very much in support of having ranges and containers be distinct, with a standard way to get a range from a container. Something very similar is done in C#, where containers have a getEnumerator() method. The enumerator itself has a Current property and moveNext method (similar to front and popFront of a range), and thus is consumed as you use it. In my experience, this system works very well. Another advantage of giving arrays a method to obtain a range instead of them being a range themselves is that using foreach on a const/immutable array would work as expected without having to perform a slice to make it work. I would even go so far as to say that having an array BE a range really doesn't make any sense, conceptually.
May 17 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, May 18, 2012 01:04:35 Erik Jensen wrote:
 I would be very much in support of having ranges and containers
 be distinct, with a standard way to get a range from a container.
 Something very similar is done in C#, where containers have a
 getEnumerator() method. The enumerator itself has a Current
 property and moveNext method (similar to front and popFront of a
 range), and thus is consumed as you use it. In my experience,
 this system works very well.

opSlice is the standard way to return a range from a container. - Jonathan M Davis
May 17 2012
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 17 May 2012 at 14:18:55 UTC, Steven Schveighoffer 
wrote:
 [...]  I believe someone has created a byRef struct that wraps 
 a range and iterates it byRef (maybe dsimcha?)

Nope, me. :) https://github.com/kyllingstad/ltk/blob/master/ltk/range.d#L12 It only supports the input range primitives, because that was all I needed when I wrote it. I offered to contribute it to Phobos (with support for the remaining range types, of course), but the interest for it seemed rather low. -Lars
May 17 2012
prev sibling parent "Ken" <kennethuil gmail.com> writes:
On Friday, 18 May 2012 at 06:17:13 UTC, Lars T. Kyllingstad wrote:
 On Thursday, 17 May 2012 at 14:18:55 UTC, Steven Schveighoffer 
 wrote:
 [...]  I believe someone has created a byRef struct that wraps 
 a range and iterates it byRef (maybe dsimcha?)

Nope, me. :) https://github.com/kyllingstad/ltk/blob/master/ltk/range.d#L12 It only supports the input range primitives, because that was all I needed when I wrote it. I offered to contribute it to Phobos (with support for the remaining range types, of course), but the interest for it seemed rather low. -Lars

I've got a use for it right now. I've implemented groupBy for my project, which is like std.algorithm.group except that instead of giving you the number of adjacent equivalent items, it gives you a lazy subrange of them. So you end up with a range of subranges. Trouble is, if you feed the inner range to foreach, it works on a copy of the inner range. Which means that the outer range skips through the entire inner range you just iterated through in order to get to the next inner range. Now if the range you feed to groupBy is a true non-ForwardRange, where copying it just creates an alias of some kind, this problem goes away. Your wrapper looks like it would do the trick there. At any rate, the whole thing makes an excellent use case for foreach(ref e; ref r).
Jun 07 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Nick Sabalausky:

 One side of the argument is that this behavior is correct and 
 expected since structs are value types, and iterating
 shouldn't consume the range.

In that D.learn thread I've shown that iterating on a fixed-size array, that is a value, doesn't perform a copy of the array.
 Imagine, for example, an input range that read from stdin.
 Leaving that range unconsumed would make no sense at all.
 Actually, any input range can be expected to *not* leave an 
 uniterated copy behind: if it *could* have an uniterated
 copy left behind, it would be a forward range, not an input
 range.

Seems right. Bye, bearophile
May 17 2012
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 17:37:14 -0400, Nick Sabalausky  
<SeeWebsiteToContactMe semitwist.com> wrote:

 A small debate has broken out over on D.learn (
 http://forum.dlang.org/thread/jovicg$jta$1 digitalmars.com#post-jovicg:24jta:2
1:40digitalmars.com  
 )
 that I thought I should move here.

 Basically, the issue is this: Currently, when you have a struct-based  
 range,
 foreach will iterate over a *copy* of it:

     Range r;
     auto save = r;
     foreach(e; r)
         assert(r == save);
     assert(r == save);

 One side of the argument is that this behavior is correct and expected  
 since
 structs are value types, and iterating shouldn't consume the range.

 My argument is this:

 First of all, foreach is conceptually a flow-control statement, not a
 function. So I'd intuitively expect:

     Range r;
     foreach(e; r) {...}

 To work like this:

     Range r;
     for(; !r.empty; r.popFront)
     {
         auto e = r.front;
         ...
     }

Hm... proposal: foreach(e; ref r) { } equates to your desired code. Would this help? -Steve
May 17 2012
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, May 17, 2012 11:05:46 Steven Schveighoffer wrote:
 Hm... proposal:
 
 foreach(e; ref r)
 {
 }
 
 equates to your desired code.  Would this help?

Or you could just do for(; !r.empty; r.popFront()) { auto e = r.front; } I really don't think that that's a big deal. I don't think that the language change would be worth having yet another thing in the language to remember, particularly when it's so easy to just use for to do the job. - Jonathan M Davis
May 17 2012
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 17 May 2012 11:42:04 -0400, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Thursday, May 17, 2012 11:05:46 Steven Schveighoffer wrote:
 Hm... proposal:

 foreach(e; ref r)
 {
 }

 equates to your desired code.  Would this help?

Or you could just do for(; !r.empty; r.popFront()) { auto e = r.front; } I really don't think that that's a big deal. I don't think that the language change would be worth having yet another thing in the language to remember, particularly when it's so easy to just use for to do the job.

Probably true. The only one I'd see as being impossible to duplicate is: foreach(ref e; ref r) -Steve
May 17 2012
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, May 17, 2012 13:23:22 Steven Schveighoffer wrote:
 On Thu, 17 May 2012 11:42:04 -0400, Jonathan M Davis <jmdavisProg gmx.com>
 
 wrote:
 On Thursday, May 17, 2012 11:05:46 Steven Schveighoffer wrote:
 Hm... proposal:
 
 foreach(e; ref r)
 {
 }
 
 equates to your desired code. Would this help?

Or you could just do for(; !r.empty; r.popFront()) { auto e = r.front; } I really don't think that that's a big deal. I don't think that the language change would be worth having yet another thing in the language to remember, particularly when it's so easy to just use for to do the job.

Probably true. The only one I'd see as being impossible to duplicate is: foreach(ref e; ref r)

But that only works if front returns a ref. If it doesn't, the ref would be to a local variable (so I don't know if that even compiles). And if front returns a ref, all you have to do is use front instead of e. So, no it wouldn't be _identical_, but close enough. - Jonathan M Davis
May 17 2012