www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - foreach range with index

reply =?UTF-8?B?THXDrXM=?= Marques <luis luismarques.eu> writes:
In the documentation for foreach with range concepts (not x..y 
ranges) 
(<http://dlang.org/spec/statement.html#foreach-with-ranges>) I 
did not find anything about foreach supporting iteration 
indices/tuple unpacking for ranges, such as `foreach(i, elm; 
range)`. It is "documented" in an example for std.range.enumerate 
though [1].

I found a bug report asking for that to be documented [2] and one 
saying that it should be removed [3].

One of the things that I like in D is the plasticity of the code. 
In the situation that brought me to this issue, I started with a 
plain slice which I iterated with foreach(i, foo; foos). Then 
`foos` became a range, and the foreach became foreach(i, foo; 
foos.enumerate); Thus, I didn't have to restructure my foreach 
iteration, moving the index variable outside the foreach 
statement, and manually incrementing the index, making this 
change less disruptive.

Some thoughts about this:

- It doesn't strike me as very realistic to remove support for 
foreach(i, e; range.enumerate) now, as probably a lot of people 
are relying on this. I find it very useful, especially as 
currently there doesn't seem to be a good alternative.

- The plasticity was not perfect in my example; ideally I 
wouldn't have to change the foreach statement at all (i.e., add 
.enumerate). Consider: my range had random access, why couldn't 
foreach itself generate the index, as it does for slices, instead 
of relying on .enumerate and tuple unpacking? Maybe it would even 
make sense for input ranges, I'm not sure.

- Whatever changes you decide for tuple unpacking and foreach, 
please try to keep in mind this plasticity concern. I'm trying 
some techniques to make data-oriented designs more flexible, and 
this feature seems like an important tool for the approach I'm 
following/devising.

- Since foreach with ranges and unpacking (i.e., indices when 
using .enumerate) seems here to stay for now, please document it. 
The code I was writing was supporting material for an article. 
But I'm afraid to advocate an approach which relies on behavior 
which is undocumented and which people are arguing should be 
removed. Not exactly confidence inspiring...

Thank you for your hard work.

Cheers,
Luís

[1] <http://dlang.org/phobos/std_range.html#enumerate>
[2] <https://issues.dlang.org/show_bug.cgi?id=7361>
[3] <https://issues.dlang.org/show_bug.cgi?id=9817>
Jun 13
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/13/17 1:15 PM, Luís Marques wrote:
 In the documentation for foreach with range concepts (not x..y ranges)
 (<http://dlang.org/spec/statement.html#foreach-with-ranges>) I did not
 find anything about foreach supporting iteration indices/tuple unpacking
 for ranges, such as `foreach(i, elm; range)`. It is "documented" in an
 example for std.range.enumerate though [1].

 I found a bug report asking for that to be documented [2] and one saying
 that it should be removed [3].

 One of the things that I like in D is the plasticity of the code. In the
 situation that brought me to this issue, I started with a plain slice
 which I iterated with foreach(i, foo; foos). Then `foos` became a range,
 and the foreach became foreach(i, foo; foos.enumerate); Thus, I didn't
 have to restructure my foreach iteration, moving the index variable
 outside the foreach statement, and manually incrementing the index,
 making this change less disruptive.

 Some thoughts about this:

 - It doesn't strike me as very realistic to remove support for
 foreach(i, e; range.enumerate) now, as probably a lot of people are
 relying on this. I find it very useful, especially as currently there
 doesn't seem to be a good alternative.
We aren't going to as far as I can tell. I closed that bug report.
 - The plasticity was not perfect in my example; ideally I wouldn't have
 to change the foreach statement at all (i.e., add ..enumerate).
 Consider: my range had random access, why couldn't foreach itself
 generate the index, as it does for slices, instead of relying on
 .enumerate and tuple unpacking? Maybe it would even make sense for input
 ranges, I'm not sure.
The issue here is the difference between foreach on arrays (which has nothing to do with ranges), and foreach on a range. In the former, the array index is wholly a concept of the foreach loop itself. The compiler inserts and maintains the index in the loop, as the array structure has no storage or maintenance of an index. In the latter, the index is maintained by the range, and so stored there. AND it's copied and stored as a temporary for your loop via the unpacking of the tuple. I feel a change in the way foreach and ranges interact may be needed to make things more efficient and less awkward. opApply has some really nice features, one of them being that you can declare loop-specific data such as an index, and another being that foreach'ing an opApply structure with different types/numbers of parameters works just fine. But I think leaving the definition of the index up to the range itself is paramount -- I don't want every range to be able to have a size_t index, as that's not always what you want, and it conflicts with other items. What we may need is a smarter way to get from the type parameters at the front of the foreach to an iterable type.
 - Since foreach with ranges and unpacking (i.e., indices when using
 .enumerate) seems here to stay for now, please document it. The code I
 was writing was supporting material for an article. But I'm afraid to
 advocate an approach which relies on behavior which is undocumented and
 which people are arguing should be removed. Not exactly confidence
 inspiring...
It should be documented, 100%. -Steve
Jun 13
parent reply =?UTF-8?B?THXDrXM=?= Marques <luis luismarques.eu> writes:
On Tuesday, 13 June 2017 at 21:44:43 UTC, Steven Schveighoffer 
wrote:
 But I think leaving the definition of the index up to the range 
 itself is paramount -- I don't want every range to be able to 
 have a size_t index, as that's not always what you want, and it 
 conflicts with other items. What we may need is a smarter way 
 to get from the type parameters at the front of the foreach to 
 an iterable type.
That's why I mentioned random access ranges, as in that case we can sidestep this discussion; since foreach understands input ranges, why can't foreach understand random access ones, and iterate them as if they were slices, including maintaining a foreach-generated index? That is, it would have nothing to do with tuple unpacking. Plus, it would work transparently in the cases you replace `slice` with `slice.algorithm`, where algorithm maintains random-access. Instead of having to add .enumerate in each place the slice (now a range) is iterated, it would just work.
Jun 13
next sibling parent reply =?UTF-8?B?THXDrXM=?= Marques <luis luismarques.eu> writes:
On Wednesday, 14 June 2017 at 02:42:46 UTC, Luís Marques wrote:
 On Tuesday, 13 June 2017 at 21:44:43 UTC, Steven Schveighoffer 
 wrote:
 But I think leaving the definition of the index up to the 
 range itself is paramount -- I don't want every range to be 
 able to have a size_t index, as that's not always what you 
 want, and it conflicts with other items. What we may need is a 
 smarter way to get from the type parameters at the front of 
 the foreach to an iterable type.
That's why I mentioned random access ranges, as in that case we can sidestep this discussion; since foreach understands input ranges, why can't foreach understand random access ones, and iterate them as if they were slices, including maintaining a foreach-generated index? That is, it would have nothing to do with tuple unpacking. Plus, it would work transparently in the cases you replace `slice` with `slice.algorithm`, where algorithm maintains random-access. Instead of having to add .enumerate in each place the slice (now a range) is iterated, it would just work.
Hmm, I suppose that even if we have all of the good stuff of length, opIndex, opSlice, etc., we can't assume the indices are [0..length], right? But there should be some way for, say, map to preserve that information from the slice, the same way random access is preserved.
Jun 13
parent =?UTF-8?B?THXDrXM=?= Marques <luis luismarques.eu> writes:
On Wednesday, 14 June 2017 at 02:48:34 UTC, Luís Marques wrote:
 Hmm, I suppose that even if we have all of the good stuff of 
 length, opIndex, opSlice, etc., we can't assume the indices are 
 [0..length], right? But there should be some way for, say, map 
 to preserve that information from the slice, the same way 
 random access is preserved.
A quick look into std.algorithm.iteration suggests that this is not actually true. For instance: static if (hasLength!R) { foreach (k; 0 .. data.length / 16) So I guess we are assuming that if we have length then indexing is always [0..length]. These are the kinds of details that I never know because there is no spec for ranges :(
Jun 13
prev sibling next sibling parent reply 9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 14 June 2017 at 02:42:46 UTC, Luís Marques wrote:
 On Tuesday, 13 June 2017 at 21:44:43 UTC, Steven Schveighoffer 
 wrote:
 But I think leaving the definition of the index up to the 
 range itself is paramount -- I don't want every range to be 
 able to have a size_t index, as that's not always what you 
 want, and it conflicts with other items. What we may need is a 
 smarter way to get from the type parameters at the front of 
 the foreach to an iterable type.
That's why I mentioned random access ranges, as in that case we can sidestep this discussion; since foreach understands input ranges, why can't foreach understand random access ones, and iterate them as if they were slices, including maintaining a foreach-generated index? That is, it would have nothing to do with tuple unpacking. Plus, it would work transparently in the cases you replace `slice` with `slice.algorithm`, where algorithm maintains random-access. Instead of having to add .enumerate in each place the slice (now a range) is iterated, it would just work.
Random access iteration is more expensive then front/popFront in general case. For arrays it is not true. But for many other kinds of ranges it is. For example ranges that do not have contiguous memory representation (strided vectors, flattened matrixes and etc)
Jun 13
parent reply =?UTF-8?B?THXDrXM=?= Marques <luis luismarques.eu> writes:
On Wednesday, 14 June 2017 at 02:54:30 UTC, 9il wrote:
 Random access iteration is more expensive then front/popFront 
 in general case.
 For arrays it is not true. But for many other kinds of ranges 
 it is. For example ranges that do not have contiguous memory 
 representation (strided vectors, flattened matrixes and etc)
I'm not sure I understand what you are saying. I was suggesting that foreach(range) would iterate range[0...length] in that order. Wouldn't that be an equivalent access pattern to front/popFront?
Jun 13
parent 9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 14 June 2017 at 03:06:26 UTC, Luís Marques wrote:
 On Wednesday, 14 June 2017 at 02:54:30 UTC, 9il wrote:
 Random access iteration is more expensive then front/popFront 
 in general case.
 For arrays it is not true. But for many other kinds of ranges 
 it is. For example ranges that do not have contiguous memory 
 representation (strided vectors, flattened matrixes and etc)
I'm not sure I understand what you are saying. I was suggesting that foreach(range) would iterate range[0...length] in that order. Wouldn't that be an equivalent access pattern to front/popFront?
-------A foreach(__i; 0 .. r.length) { auto e = r[__i]; } -------B for (auto __r = range; !__r.empty; __r.popFront()) { auto e = __r.front; ... } They are equivalent, but "A" is slower then "B" in general case. So, B is preferred.
Jun 13
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/13/17 10:42 PM, Luís Marques wrote:
 On Tuesday, 13 June 2017 at 21:44:43 UTC, Steven Schveighoffer wrote:
 But I think leaving the definition of the index up to the range itself
 is paramount -- I don't want every range to be able to have a size_t
 index, as that's not always what you want, and it conflicts with other
 items. What we may need is a smarter way to get from the type
 parameters at the front of the foreach to an iterable type.
That's why I mentioned random access ranges, as in that case we can sidestep this discussion; since foreach understands input ranges, why can't foreach understand random access ones, and iterate them as if they were slices, including maintaining a foreach-generated index? That is, it would have nothing to do with tuple unpacking. Plus, it would work transparently in the cases you replace `slice` with `slice.algorithm`, where algorithm maintains random-access. Instead of having to add .enumerate in each place the slice (now a range) is iterated, it would just work.
Yes, foreach could be augmented to do that. However, this is essentially a way to implement enumerate without the .enumerate, and nothing more. It's not a lot of added benefit IMO. What I'd rather see is a better mechanism for ranges (or things that provide ranges) to hook the foreach syntax to allow better control and power from ranges without using traditional opApply. For example, if we had: foreach(size_t a, b; x) translates to: for(auto _context = x.opApply!(size_t, void); !context.empty; _context.popFront) { auto (a, b) = _context.front; // the way it works now with tuples, don't think there's a real syntax for this. ... } -Steve
Jun 14
parent reply =?UTF-8?B?THXDrXM=?= Marques <luis luismarques.eu> writes:
On Wednesday, 14 June 2017 at 14:35:32 UTC, Steven Schveighoffer 
wrote:
 What I'd rather see is a better mechanism for ranges (or things 
 that provide ranges) to hook the foreach syntax to allow better 
 control and power from ranges without using traditional opApply.

 For example, if we had:

 foreach(size_t a, b; x)

 translates to:

 for(auto _context = x.opApply!(size_t, void); !context.empty; 
 _context.popFront)
 {
    auto (a, b) = _context.front; // the way it works now with 
 tuples, don't think there's a real syntax for this.
    ...
 }
An approach like that seems fine to me. In any case, why do it in exclusion of a foreach-generated index? Imagine that I define an input range. Can we assume that range.front is *conceptually* equivalent to range[0], even if the range doesn't define random access? Because if we can, then, if the range doesn't define the new opApply or whatever, why not allow foreach to generate the index?
 Yes, foreach could be augmented to do that. However, this is 
 essentially a way to implement enumerate without the 
 .enumerate, and nothing more. It's not a lot of added benefit 
 IMO.
Consider this case. You create a prototype where foo is a slice. Then you start improving the code and foo becomes a range. In your code you have several places that use foo with foreach, and several places where it is used directly. If you define the new foo as newRangeFoo.enumerate, then the several foreach work automatically, but the other uses don't because they have to be changed to foo.value; if you do it the other way around, the direct uses work automatically but the foreach all have to be changed from foo to foo.enumerate. Thus you have lost a lot of the code plasticity. If the foreach statement generated the indices then it would work automatically. This could also be solved by the range defining your new version of opApply and foreach understanding that. Assuming that we can wrap old ranges (e.g. from a library you don't want to change) with that opApply, I would be fine with that solution as long as this is something that actually goes forward. I'm just afraid this is something that will linger because it's a more difficult design decision, while having foreach support index generation seems like a more self contained solution that could be agreed upon and implemented quickly.
Jun 14
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/14/17 11:42 AM, Luís Marques wrote:
 On Wednesday, 14 June 2017 at 14:35:32 UTC, Steven Schveighoffer wrote:
 What I'd rather see is a better mechanism for ranges (or things that
 provide ranges) to hook the foreach syntax to allow better control and
 power from ranges without using traditional opApply.

 For example, if we had:

 foreach(size_t a, b; x)

 translates to:

 for(auto _context = x.opApply!(size_t, void); !context.empty;
 _context.popFront)
 {
    auto (a, b) = _context.front; // the way it works now with tuples,
 don't think there's a real syntax for this.
    ...
 }
An approach like that seems fine to me. In any case, why do it in exclusion of a foreach-generated index? Imagine that I define an input range. Can we assume that range.front is *conceptually* equivalent to range[0], even if the range doesn't define random access? Because if we can, then, if the range doesn't define the new opApply or whatever, why not allow foreach to generate the index?
That's when you use range.enumerate ;)
 Yes, foreach could be augmented to do that. However, this is
 essentially a way to implement enumerate without the .enumerate, and
 nothing more. It's not a lot of added benefit IMO.
Consider this case. You create a prototype where foo is a slice. Then you start improving the code and foo becomes a range. In your code you have several places that use foo with foreach, and several places where it is used directly. If you define the new foo as newRangeFoo.enumerate, then the several foreach work automatically, but the other uses don't because they have to be changed to foo.value; if you do it the other way around, the direct uses work automatically but the foreach all have to be changed from foo to foo.enumerate. Thus you have lost a lot of the code plasticity. If the foreach statement generated the indices then it would work automatically.
This is inherent of arrays working differently than ranges, but being considered ranges via std.array. People consider slices to be ranges, but they really aren't as far as the compiler is concerned. I don't know if there is room to allow range-defined indexes and foreach-defined indexes. foreach(a, b; someRange) has to do one or the other. Otherwise, changes to how someRange operates can make this a completely different operation.
 This could also be solved by the range defining your new version of
 opApply and foreach understanding that. Assuming that we can wrap old
 ranges (e.g. from a library you don't want to change) with that opApply,
 I would be fine with that solution as long as this is something that
 actually goes forward. I'm just afraid this is something that will
 linger because it's a more difficult design decision, while having
 foreach support index generation seems like a more self contained
 solution that could be agreed upon and implemented quickly.
I think opApply is kind of a forgotten piece of the language since ranges have been introduced. We could resurrect it quite easily this way, as opApply with template parameters that return another range does not conflict with opApply that takes a delegate. Easy to see if one works and if not try something else. The huge benefit of this mechanism is to allow complete control to the type over how it should be iterated, and not have to fight the compiler. -Steve
Jun 14
parent reply =?UTF-8?B?THXDrXM=?= Marques <luis luismarques.eu> writes:
On Wednesday, 14 June 2017 at 16:17:50 UTC, Steven Schveighoffer 
wrote:
 I think opApply is kind of a forgotten piece of the language 
 since ranges have been introduced. We could resurrect it quite 
 easily this way, as opApply with template parameters that 
 return another range does not conflict with opApply that takes 
 a delegate. Easy to see if one works and if not try something 
 else. The huge benefit of this mechanism is to allow complete 
 control to the type over how it should be iterated, and not 
 have to fight the compiler.
You say "quite easily". I'm afraid this will linger. Please let's not drop the ball on this, whatever the chosen solution is.
 This is inherent of arrays working differently than ranges, but 
 being considered ranges via std.array. People consider slices 
 to be ranges, but they really aren't as far as the compiler is 
 concerned.
Sorry, I'm not seeing the connection. foreach is an abstract iteration statement and we are just discussion if it is reasonable for foreach to keep track of the iteration number.
 I don't know if there is room to allow range-defined indexes 
 and foreach-defined indexes. foreach(a, b; someRange) has to do 
 one or the other. Otherwise, changes to how someRange operates 
 can make this a completely different operation.
Seems like exactly what we want. If you add a customization to someRange to change the iteration behavior then each relevant foreach automatically switches from the default to the requested behavior. That's exactly the point of programming to abstract interfaces.
Jun 14
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/14/17 12:35 PM, Luís Marques wrote:
 On Wednesday, 14 June 2017 at 16:17:50 UTC, Steven Schveighoffer wrote:
 I don't know if there is room to allow range-defined indexes and
 foreach-defined indexes. foreach(a, b; someRange) has to do one or the
 other. Otherwise, changes to how someRange operates can make this a
 completely different operation.
Seems like exactly what we want. If you add a customization to someRange to change the iteration behavior then each relevant foreach automatically switches from the default to the requested behavior. That's exactly the point of programming to abstract interfaces.
For example: foreach(i, v; hashmap) => i is counter, v is value Later hashmap adds support for iterating key and value. Now i is key, v is value. Code means something completely different. Compare with foreach(i, v; hashmap.enumerate) Intent is clear from the code. -Steve
Jun 14
next sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 06/14/2017 12:22 PM, Steven Schveighoffer wrote:

 foreach(i, v; hashmap) => i is counter, v is value

 Later hashmap adds support for iterating key and value. Now i is key, v
 is value. Code means something completely different.

 Compare with

 foreach(i, v; hashmap.enumerate)

 Intent is clear from the code.

 -Steve
Then, perhaps we're arguing in favor of * writing .enumerate even for slices (implying that automatic indexing for them has been a historical artifact and code that wants to be portable should always write .enumerate) * making sure that enumerate() on arrays don't bring extra cost Ali
Jun 14
next sibling parent reply bachmeier <no spam.net> writes:
On Wednesday, 14 June 2017 at 22:02:30 UTC, Ali Çehreli wrote:

 Then, perhaps we're arguing in favor of

 * writing .enumerate even for slices (implying that automatic 
 indexing for them has been a historical artifact and code that 
 wants to be portable should always write .enumerate)

 * making sure that enumerate() on arrays don't bring extra cost

 Ali
I like this proposal. Is there a reason you don't introduce .enumerate in the section of your book "The counter is automatic only for arrays"?
Jun 14
parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 06/14/2017 03:48 PM, bachmeier wrote:
 On Wednesday, 14 June 2017 at 22:02:30 UTC, Ali Çehreli wrote:

 Then, perhaps we're arguing in favor of

 * writing .enumerate even for slices (implying that automatic indexing
 for them has been a historical artifact and code that wants to be
 portable should always write .enumerate)

 * making sure that enumerate() on arrays don't bring extra cost

 Ali
I like this proposal. Is there a reason you don't introduce .enumerate in the section of your book "The counter is automatic only for arrays"?
No reason other than ranges appear later in the book. But I took a note; I will add your suggestion soon. Otherwise, .enumerate is mentioned here: http://ddili.org/ders/d.en/foreach_opapply.html#ix_foreach_opapply.enumerate,%20std.range Ali
Jun 14
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/14/17 6:02 PM, Ali Çehreli wrote:
 On 06/14/2017 12:22 PM, Steven Schveighoffer wrote:

 foreach(i, v; hashmap) => i is counter, v is value

 Later hashmap adds support for iterating key and value. Now i is key, v
 is value. Code means something completely different.

 Compare with

 foreach(i, v; hashmap.enumerate)

 Intent is clear from the code.

 -Steve
Then, perhaps we're arguing in favor of * writing .enumerate even for slices (implying that automatic indexing for them has been a historical artifact and code that wants to be portable should always write .enumerate) * making sure that enumerate() on arrays don't bring extra cost
I would say making enumerate on *any* range shouldn't bring any extra cost over how foreach works on an array. One idea I had but haven't thought it through completely is a way to mark some parameter to foreach as always referencing the actual index, so you aren't making unnecessary copies for the loop. When you foreach a range, a copy is made just for the loop, and *then* a copy is made each loop iteration for the element itself. Maybe tagging a parameter in foreach as lazy means "always use the range element". -Steve
Jun 14
prev sibling parent reply =?UTF-8?B?THXDrXM=?= Marques <luis luismarques.eu> writes:
On Wednesday, 14 June 2017 at 19:22:24 UTC, Steven Schveighoffer 
wrote:
 For example:

 foreach(i, v; hashmap) => i is counter, v is value

 Later hashmap adds support for iterating key and value. Now i 
 is key, v is value. Code means something completely different.
If we take a step back, I think we are discussing the generalization of foreach. Let's assume we want to preserve the current foreach behavior where associative types iterate with key and value pairs. If the "hashmap" in your example was supposed to be an associative type then `i` wasn't a counter, it was the key already. If "hashmap" didn't define the iteration key (using a new kind of opApply or whatever) then it wasn't an associative type, and therefore your change wouldn't be valid. So, we just have to be careful that when foreach(i, v; foo) is introduced the relevant types in phobos also implement the new behavior, so that this situation doesn't arise afterwards..
Jun 14
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/14/17 6:46 PM, Luís Marques wrote:
 On Wednesday, 14 June 2017 at 19:22:24 UTC, Steven Schveighoffer wrote:
 For example:

 foreach(i, v; hashmap) => i is counter, v is value

 Later hashmap adds support for iterating key and value. Now i is key,
 v is value. Code means something completely different.
If we take a step back, I think we are discussing the generalization of foreach. Let's assume we want to preserve the current foreach behavior where associative types iterate with key and value pairs. If the "hashmap" in your example was supposed to be an associative type then `i` wasn't a counter, it was the key already. If "hashmap" didn't define the iteration key (using a new kind of opApply or whatever) then it wasn't an associative type, and therefore your change wouldn't be valid. So, we just have to be careful that when foreach(i, v; foo) is introduced the relevant types in phobos also implement the new behavior, so that this situation doesn't arise afterwards..
The point I'm making is that we have mechanisms for a type to define what to do when you foreach with 2 parameters per loop iteration. What you are proposing is to allow the compiler to define what to do in the case where the type doesn't. But what happens if it then defines the behavior later? code that was written expecting the first item to be an arbitrary index is now hijacked by the type to mean something else. This can't happen with arrays, because the language says so. The language doesn't have control over an arbitrary struct. It could be any type, I just used a hypothetical hashmap that didn't already have key/value iteration. Probably an unlikely scenario, but it was what popped into my head as a demonstration. And what if you DID want to iterate just an incrementing index with the values when the type defines 2-parameter iteration otherwise? That mechanism isn't available to you without some effort. So again, we still need an enumerate to do what you want. -Steve
Jun 14