digitalmars.D - Should range foreach be iterating over an implicit copy?
- Nick Sabalausky (55/55) May 16 2012 A small debate has broken out over on D.learn (
- Andrei Alexandrescu (4/9) May 16 2012 It is deliberate and the intent is that millions of programmers used to
- Xinok (6/20) May 16 2012 I think this is the correct behavior as well. Otherwise, we'd
- Ken (15/29) Jun 07 2012 According to the docs, however, an InputRange that is not a
- Era Scarecrow (32/45) May 16 2012 http://forum.dlang.org/post/dskrijudairkopeoyqkc@forum.dlang.org
- Jesse Phillips (13/17) May 16 2012 I am in agreement, a value type rage is confusing to use
- Nick Sabalausky (38/43) May 16 2012 I'm a little discouraged that my concern about "input ranges can't save
- Steven Schveighoffer (11/28) May 17 2012 input ranges can save their state if they are also forward ranges.
- Dmitry Olshansky (8/16) May 17 2012 s/ save state / copy themselves
- Lars T. Kyllingstad (9/11) May 17 2012 Nope, me. :)
- Ken (14/25) Jun 07 2012 I've got a use for it right now. I've implemented groupBy for my
- Erik Jensen (14/65) May 17 2012 I would be very much in support of having ranges and containers
- Jonathan M Davis (3/10) May 17 2012 opSlice is the standard way to return a range from a container.
- bearophile (6/15) May 17 2012 In that D.learn thread I've shown that iterating on a fixed-size
- Steven Schveighoffer (8/35) May 17 2012 Hm... proposal:
- Jonathan M Davis (10/17) May 17 2012 Or you could just do
- Steven Schveighoffer (5/23) May 17 2012 Probably true. The only one I'd see as being impossible to duplicate is...
- Jonathan M Davis (6/36) May 17 2012 But that only works if front returns a ref. If it doesn't, the ref would...
A small debate has broken out over on D.learn ( http://forum.dlang.org/thread/jovicg$jta$1 digitalmars.com#post-jovicg:24jta:2 1:40digitalmars.com ) that I thought I should move here. Basically, the issue is this: Currently, when you have a struct-based range, foreach will iterate over a *copy* of it: Range r; auto save = r; foreach(e; r) assert(r == save); assert(r == save); One side of the argument is that this behavior is correct and expected since structs are value types, and iterating shouldn't consume the range. My argument is this: First of all, foreach is conceptually a flow-control statement, not a function. So I'd intuitively expect: Range r; foreach(e; r) {...} To work like this: Range r; for(; !r.empty; r.popFront) { auto e = r.front; ... } Additionally, iterating a range normally *does* consume it. And that's expected, as it should be: Imagine, for example, an input range that read from stdin. Leaving that range unconsumed would make no sense at all. Actually, any input range can be expected to *not* leave an uniterated copy behind: if it *could* have an uniterated copy left behind, it would be a forward range, not an input range. It actually seems wrong to use the current foreach over an input range: Sometimes it might work by pure luck (as in the original example), but you can expect that for some input ranges it would fail miserably, because an input range is *not* a forward range and by definition cannot reliably save a previous state. So iterating over an input range would leave the original input range in an undefined state (actually, that could even be true for certain forward ranges if foreach doesn't implicitly start out by calling "r.save" and the range doesn't expect to be saved by copying). Suppose foreach *didn't* iterate over a copy: If you wanted to still have an un-iterated version (of an actual forward range, or an input range that you knew to be safe), then that's trivial: Foo a; b = a.save(); // Or "b = a;" if you really know what you're doing. foreach(elem; a) {} assert(a.empty); assert(!b.empty); Note, however, that there is no such simple way to go the other way around: to have the current "foreach over a copy" behavior and have access to the actual iterated range. Maybe if we had "foreach(e; ref range)", but AFAIK we don't. One counter-argument that was raised is that TDPL has an example on page 381 that indicates foreach iterates over an implicit copy. I don't have a copy handy ATM, so I can't look at it myself, but I'd love to see Andrei weigh in on this: I'm curious if this example in TDPL made that copy deliberately, or if the full implications of that copy were just an oversight.
May 16 2012
On 5/16/12 4:37 PM, Nick Sabalausky wrote:One counter-argument that was raised is that TDPL has an example on page 381 that indicates foreach iterates over an implicit copy. I don't have a copy handy ATM, so I can't look at it myself, but I'd love to see Andrei weigh in on this: I'm curious if this example in TDPL made that copy deliberately, or if the full implications of that copy were just an oversight.It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???" Andrei
May 16 2012
On Wednesday, 16 May 2012 at 21:40:39 UTC, Andrei Alexandrescu wrote:On 5/16/12 4:37 PM, Nick Sabalausky wrote:I think this is the correct behavior as well. Otherwise, we'd have to take extra care when writing generic code to ensure it doesn't consume the range but doesn't attempt to call .save on non-range types.One counter-argument that was raised is that TDPL has an example on page 381 that indicates foreach iterates over an implicit copy. I don't have a copy handy ATM, so I can't look at it myself, but I'd love to see Andrei weigh in on this: I'm curious if this example in TDPL made that copy deliberately, or if the full implications of that copy were just an oversight.It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???" Andrei
May 16 2012
On Wednesday, 16 May 2012 at 21:40:39 UTC, Andrei Alexandrescu wrote:On 5/16/12 4:37 PM, Nick Sabalausky wrote:According to the docs, however, an InputRange that is not a ForwardRange won't necessarily behave the way millions of programers used to foreach from other languages expect them to. Such an operation will most likely consume the underlying data, whether foreach makes an implicit copy or not. Put another way, whether a non-ForwardRange is consumed by a foreach the way it currently works is unspecified. The docs also say that the right way to checkpoint a ForwardRange is by calling save(). This tells me that the most intuitive way for foreach to work is: a. If it's a ForwardRange, make a copy with save(), then consume the copy. b. If it's not a ForwardRange, consume the original.One counter-argument that was raised is that TDPL has an example on page 381 that indicates foreach iterates over an implicit copy. I don't have a copy handy ATM, so I can't look at it myself, but I'd love to see Andrei weigh in on this: I'm curious if this example in TDPL made that copy deliberately, or if the full implications of that copy were just an oversight.It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???" Andrei
Jun 07 2012
On Wednesday, 16 May 2012 at 21:37:54 UTC, Nick Sabalausky wrote:A small debate has broken out over on D.learn ( http://forum.dlang.org/thread/jovicg$jta$1 digitalmars.com#post-jovicg:24jta:2 1:40digitalmars.com ) that I thought I should move here. Basically, the issue is this: Currently, when you have a struct-based range, foreach will iterate over a *copy* of it:http://forum.dlang.org/post/dskrijudairkopeoyqkc forum.dlang.org I've replied in the other one, but I'll repost it here, perhaps andrei can correct me. --- On Wednesday, 16 May 2012 at 20:50:55 UTC, Nick Sabalausky wrote:I was initially open to the idea I may have just misunderstood something, but I'm becoming more and more convinced that the current "foreach over an implicit copy" behavior is indeed a bad idea. I'd love to see Andrei weigh in on this. I'm curious if the example you pointed out in TDPL made the copy deliberately, or if the effects of that copy were just an oversight.I remember going over hard and long over that section. I think it's deliberate. Range can refer to many things, and I'm assuming array is one of them. So... If the array is consumed when using foreach, that seems dumb right? (An array in the end is just a small struct afterall...) --- int[] x = [1,2,3]; foreach(i; x) { //stuff } assert(x.length == 0, "Should have been consumed!"); --- Structs being value types are by value and so are copied, not referenced. Although referencing a copy does seem a bit silly... int[] x = [1,2,3]; foreach(ref i; x) { i = 10; } assert(x == [10,10,10], "But i changed them to 10!!"); -- That means that foreach(ref; ref) if it's considered (And I think it makes sense to) would work for structs in these few cases. (Course you'd still consume dynamic arrays that way) In std.stream are classes, so they are consumed. I'm probably just rambling...
May 16 2012
On Wednesday, 16 May 2012 at 21:37:54 UTC, Nick Sabalausky wrote:A small debate has broken out over on D.learn ( http://forum.dlang.org/thread/jovicg$jta$1 digitalmars.com#post-jovicg:24jta:2 1:40digitalmars.com ) that I thought I should move here.I am in agreement, a value type rage is confusing to use http://d.puremagic.com/issues/show_bug.cgi?id=7067 I don't agree with Andrei that we want to stop the "hey, where'd my range go" because we spend so much effort to get that to be expected behavior (and it is). The hybrid array really doesn't help, but I wouldn't want it changed. If .save is not called on the range then the expectation is that it will be consumed. It is clear that calling a function with a value type would likely mean it won't be consumed, but that is the odd case, the expectation is that the range would be consumed. And this leads to what is suggested in the referenced bug, a wrapper to prevent an implicit forward range.
May 16 2012
I'm a little discouraged that my concern about "input ranges can't save their state, and yet that's exactly what happens implicitly" hasn't been addressed. I was hoping to at least get a "That's not really a problem and here's why..." However, that said... "Andrei Alexandrescu" wrote...It is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???""Era Scarecrow" wrote...Range can refer to many things, and I'm assuming array is one of them. So... If the array is consumed when using foreach, that seems dumb right? (An array in the end is just a small struct afterall...)I admit, those are fair points. With those in mind, I've given it some more thought, and here are my current...umm...thoughts: - A range is not a collection, it's a *view* of a collection (or of something else). This is a necessary distinction because ranges and collections work in fundamentally different ways: A range is, *by necessity* consumed as it's iterated - that's what popFront *does*, that's fundamentally how ranges work. For many collections, OTOH, it makes *no* sense to consume the collection while iterating it. Thus, range != collection, it's a view of a collection. - Ranges are D's answer to iterators. I don't think people used to iterators from other languages would expect their iterator to magically reset itself after being used. So I see no reason why they would expect a range (ie, an iterator-pair-with-benefits) to behave differently than that. - D's arrays conflate the ideas of "collection" and "range", hence the odd edge case Era pointed out, and hence the "need" for foreach to automatically make a copy. But in my (not super-extensive) experience creating ranges, I've found that to be a problematic pattern (due to the fundamental distinction between a range and a collection), and learned to prefer making my iterable things *return* a range, rather than actually *be* ranges. - Admittedly, it would be annoying if foreach had to be used like this on all collections: "foreach(e; myArray.rangeOf)", so I guess it would make sense for a range to automatically be obtained when foreach-ing over a collection. However, I'm still not 100% sold on the current method of doing that (making foreach automatically copy struct-based ranges), partly because of the questionable implications it has for input ranges, and partly because (for struct-ranges) it leaves no way to access the range that's actually being iterated. - At the very least, perhaps input ranges just shouldn't be allowed to be structs? After all, structs are intended to be copied around, but input ranges, by definition, can't have their current state copied.
May 16 2012
On Thu, 17 May 2012 01:48:06 -0400, Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> wrote:I'm a little discouraged that my concern about "input ranges can't save their state, and yet that's exactly what happens implicitly" hasn't been addressed. I was hoping to at least get a "That's not really a problem and here's why..."input ranges can save their state if they are also forward ranges.However, that said... "Andrei Alexandrescu" wrote...[snip] This is not a problem with ranges, this is an issue with value types versus reference types. I believe someone has created a byRef struct that wraps a range and iterates it byRef (maybe dsimcha?) Almost all ranges except true input-only ranges (i.e. input ranges that are only input ranges) should be value types. If we *didn't* do this, you would need heap data for each range. -SteveIt is deliberate and the intent is that millions of programmers used to foreach from other languages don't scream "where is my range???""Era Scarecrow" wrote...Range can refer to many things, and I'm assuming array is one of them. So... If the array is consumed when using foreach, that seems dumb right? (An array in the end is just a small struct afterall...)I admit, those are fair points. With those in mind, I've given it some more thought, and here are my current...umm...thoughts:
May 17 2012
On 17.05.2012 18:18, Steven Schveighoffer wrote:On Thu, 17 May 2012 01:48:06 -0400, Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> wrote:s/ save state / copy themselves Fixed :) That's why I tried to come up with BacktrackRange or some such. An input range that you can "shake" to reset to some saved point, but can't have two separate states at _the same time_ (nor full copy, it's like ... file) -- Dmitry OlshanskyI'm a little discouraged that my concern about "input ranges can't save their state, and yet that's exactly what happens implicitly" hasn't been addressed. I was hoping to at least get a "That's not really a problem and here's why..."input ranges can save their state if they are also forward ranges.
May 17 2012
On Thursday, 17 May 2012 at 14:18:55 UTC, Steven Schveighoffer wrote:[...] I believe someone has created a byRef struct that wraps a range and iterates it byRef (maybe dsimcha?)Nope, me. :) https://github.com/kyllingstad/ltk/blob/master/ltk/range.d#L12 It only supports the input range primitives, because that was all I needed when I wrote it. I offered to contribute it to Phobos (with support for the remaining range types, of course), but the interest for it seemed rather low. -Lars
May 17 2012
On Friday, 18 May 2012 at 06:17:13 UTC, Lars T. Kyllingstad wrote:On Thursday, 17 May 2012 at 14:18:55 UTC, Steven Schveighoffer wrote:I've got a use for it right now. I've implemented groupBy for my project, which is like std.algorithm.group except that instead of giving you the number of adjacent equivalent items, it gives you a lazy subrange of them. So you end up with a range of subranges. Trouble is, if you feed the inner range to foreach, it works on a copy of the inner range. Which means that the outer range skips through the entire inner range you just iterated through in order to get to the next inner range. Now if the range you feed to groupBy is a true non-ForwardRange, where copying it just creates an alias of some kind, this problem goes away. Your wrapper looks like it would do the trick there. At any rate, the whole thing makes an excellent use case for foreach(ref e; ref r).[...] I believe someone has created a byRef struct that wraps a range and iterates it byRef (maybe dsimcha?)Nope, me. :) https://github.com/kyllingstad/ltk/blob/master/ltk/range.d#L12 It only supports the input range primitives, because that was all I needed when I wrote it. I offered to contribute it to Phobos (with support for the remaining range types, of course), but the interest for it seemed rather low. -Lars
Jun 07 2012
On Thursday, 17 May 2012 at 05:48:44 UTC, Nick Sabalausky wrote:- A range is not a collection, it's a *view* of a collection (or of something else). This is a necessary distinction because ranges and collections work in fundamentally different ways: A range is, *by necessity* consumed as it's iterated - that's what popFront *does*, that's fundamentally how ranges work. For many collections, OTOH, it makes *no* sense to consume the collection while iterating it. Thus, range != collection, it's a view of a collection. - Ranges are D's answer to iterators. I don't think people used to iterators from other languages would expect their iterator to magically reset itself after being used. So I see no reason why they would expect a range (ie, an iterator-pair-with-benefits) to behave differently than that. - D's arrays conflate the ideas of "collection" and "range", hence the odd edge case Era pointed out, and hence the "need" for foreach to automatically make a copy. But in my (not super-extensive) experience creating ranges, I've found that to be a problematic pattern (due to the fundamental distinction between a range and a collection), and learned to prefer making my iterable things *return* a range, rather than actually *be* ranges. - Admittedly, it would be annoying if foreach had to be used like this on all collections: "foreach(e; myArray.rangeOf)", so I guess it would make sense for a range to automatically be obtained when foreach-ing over a collection. However, I'm still not 100% sold on the current method of doing that (making foreach automatically copy struct-based ranges), partly because of the questionable implications it has for input ranges, and partly because (for struct-ranges) it leaves no way to access the range that's actually being iterated. - At the very least, perhaps input ranges just shouldn't be allowed to be structs? After all, structs are intended to be copied around, but input ranges, by definition, can't have their current state copied.I would be very much in support of having ranges and containers be distinct, with a standard way to get a range from a container. getEnumerator() method. The enumerator itself has a Current property and moveNext method (similar to front and popFront of a range), and thus is consumed as you use it. In my experience, this system works very well. Another advantage of giving arrays a method to obtain a range instead of them being a range themselves is that using foreach on a const/immutable array would work as expected without having to perform a slice to make it work. I would even go so far as to say that having an array BE a range really doesn't make any sense, conceptually.
May 17 2012
On Friday, May 18, 2012 01:04:35 Erik Jensen wrote:I would be very much in support of having ranges and containers be distinct, with a standard way to get a range from a container. getEnumerator() method. The enumerator itself has a Current property and moveNext method (similar to front and popFront of a range), and thus is consumed as you use it. In my experience, this system works very well.opSlice is the standard way to return a range from a container. - Jonathan M Davis
May 17 2012
Nick Sabalausky:One side of the argument is that this behavior is correct and expected since structs are value types, and iterating shouldn't consume the range.In that D.learn thread I've shown that iterating on a fixed-size array, that is a value, doesn't perform a copy of the array.Imagine, for example, an input range that read from stdin. Leaving that range unconsumed would make no sense at all. Actually, any input range can be expected to *not* leave an uniterated copy behind: if it *could* have an uniterated copy left behind, it would be a forward range, not an input range.Seems right. Bye, bearophile
May 17 2012
On Wed, 16 May 2012 17:37:14 -0400, Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> wrote:A small debate has broken out over on D.learn ( http://forum.dlang.org/thread/jovicg$jta$1 digitalmars.com#post-jovicg:24jta:2 1:40digitalmars.com ) that I thought I should move here. Basically, the issue is this: Currently, when you have a struct-based range, foreach will iterate over a *copy* of it: Range r; auto save = r; foreach(e; r) assert(r == save); assert(r == save); One side of the argument is that this behavior is correct and expected since structs are value types, and iterating shouldn't consume the range. My argument is this: First of all, foreach is conceptually a flow-control statement, not a function. So I'd intuitively expect: Range r; foreach(e; r) {...} To work like this: Range r; for(; !r.empty; r.popFront) { auto e = r.front; ... }Hm... proposal: foreach(e; ref r) { } equates to your desired code. Would this help? -Steve
May 17 2012
On Thursday, May 17, 2012 11:05:46 Steven Schveighoffer wrote:Hm... proposal: foreach(e; ref r) { } equates to your desired code. Would this help?Or you could just do for(; !r.empty; r.popFront()) { auto e = r.front; } I really don't think that that's a big deal. I don't think that the language change would be worth having yet another thing in the language to remember, particularly when it's so easy to just use for to do the job. - Jonathan M Davis
May 17 2012
On Thu, 17 May 2012 11:42:04 -0400, Jonathan M Davis <jmdavisProg gmx.com> wrote:On Thursday, May 17, 2012 11:05:46 Steven Schveighoffer wrote:Probably true. The only one I'd see as being impossible to duplicate is: foreach(ref e; ref r) -SteveHm... proposal: foreach(e; ref r) { } equates to your desired code. Would this help?Or you could just do for(; !r.empty; r.popFront()) { auto e = r.front; } I really don't think that that's a big deal. I don't think that the language change would be worth having yet another thing in the language to remember, particularly when it's so easy to just use for to do the job.
May 17 2012
On Thursday, May 17, 2012 13:23:22 Steven Schveighoffer wrote:On Thu, 17 May 2012 11:42:04 -0400, Jonathan M Davis <jmdavisProg gmx.com> wrote:But that only works if front returns a ref. If it doesn't, the ref would be to a local variable (so I don't know if that even compiles). And if front returns a ref, all you have to do is use front instead of e. So, no it wouldn't be _identical_, but close enough. - Jonathan M DavisOn Thursday, May 17, 2012 11:05:46 Steven Schveighoffer wrote:Probably true. The only one I'd see as being impossible to duplicate is: foreach(ref e; ref r)Hm... proposal: foreach(e; ref r) { } equates to your desired code. Would this help?Or you could just do for(; !r.empty; r.popFront()) { auto e = r.front; } I really don't think that that's a big deal. I don't think that the language change would be worth having yet another thing in the language to remember, particularly when it's so easy to just use for to do the job.
May 17 2012