digitalmars.D.learn - foreach() behavior on ranges

frame <frame86 live.com> writes:

Consider a simple input range that can be iterated with empty(), 
front() and popFront(). That is comfortable to use with foreach(), 
but what if the foreach loop is cancelled? If the range isn't 
depleted yet and iteration continues, it will supply the same data 
twice on front() in the next use of foreach().

For some reason, foreach() does not call popFront() on a break or 
continue statement. There is no way to detect this unless the range 
itself tracks its status and does an implicit popFront() if 
needed, but then this whole interface is kind of useless.

There is opApply() on the other hand that is designed for 
foreach() and informs via non-0-result if the loop is cancelled - 
but this means that every range must implement it if the range 
should work in foreach() correctly?

This is very inconsistent. Either foreach() should deny usage of 
ranges that have no opApply() method or there should be a reset() 
or cancel() method in the interfaces that may be called by 
foreach() if they are implemented.

How do you handle that issue? Are your ranges designed to have 
this bug or do you implement opApply() always?

Aug 24 2021

bauss <jj_1337 live.dk> writes:

On Tuesday, 24 August 2021 at 08:36:18 UTC, frame wrote:
 Consider a simple input range that can be iterated with 
 empty(), front() and popFront(). That is comfortable to use 
 with foreach() but what if the foreach loop will be cancelled? 
 If a range isn't depleted yet and continued it will supply the 
 same data twice on front() in the next use of foreach().

 For some reason, foreach() does not call popFront() on a break 
 or continue statement. There is no way to detect it except the 
 range itself tracks its status and does an implicit popFront() 
 if needed - but then this whole interface is some kind of 
 useless.

 There is opApply() on the other hand that is designed for 
 foreach() and informs via non-0-result if the loop is cancelled 
 - but this means that every range must implement it if the 
 range should work in foreach() correctly?

 This is very inconsistent. Either foreach() should deny usage 
 of ranges that have no opApply() method or there should be a 
 reset() or cancel() method in the interfaces that may be called 
 by foreach() if they are implemented.

 How do you handle that issue? Are your ranges designed to have 
 this bug or do you implement opApply() always?

A range should be a struct always and thus its state is copied 
when the foreach loop is created.

Which means the state resets every time the loop is initiated.

If your range uses internal state that can't be copied, or 
your ranges are not structs, then your ranges are inherently 
incorrect.

This is what a foreach loop on a range actually compiles to:

```d
for (auto copy = range; !copy.empty; copy.popFront())
{
     ...
}
```

This is easily evident in this example:

https://run.dlang.io/is/YFuWHn

Which prints:
1
2
1
2
3
4
5

Unless I'm misunderstanding your concern?
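
The linked snippet is not reproduced in this thread. As a rough sketch of the kind of code that would print that sequence, assume a hypothetical struct range `Counter` over 1..5 and a helper `iterateTwice` (both names are made up for illustration). The first loop breaks after 2, and the second loop starts over because each foreach works on its own copy of the struct:

```d
// Hypothetical struct range counting from `current` up to `last`.
struct Counter
{
    int current;
    int last;

    bool empty() const { return current > last; }
    int front() const { return current; }
    void popFront() { ++current; }
}

// Records what two consecutive foreach loops see when the first one breaks early.
int[] iterateTwice(R)(R range)
{
    int[] seen;
    foreach (n; range)      // iterates a copy of `range`
    {
        seen ~= n;
        if (n == 2) break;
    }
    foreach (n; range)      // `range` itself was never advanced
        seen ~= n;
    return seen;
}
```

Under these assumptions, `iterateTwice(Counter(1, 5))` yields `[1, 2, 1, 2, 3, 4, 5]`.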

Aug 24 2021

frame <frame86 live.com> writes:

On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:

 A range should be a struct always and thus its state is copied 
 when the foreach loop is created.

This does not conform to the aggregate expression described in 
the manual, where a class object would also be allowed.

 Which means the state resets every time the loop is initiated.

Yes, it should reset - thus foreach() also needs to handle that 
correctly.

Aug 24 2021

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:
 A range should be a struct always and thus its state is copied 
 when the foreach loop is created.

Actually, the range contracts don't mention that it needs to be a 
by-value type. It can also be a reference type, i.e. a class.

 Which means the state resets every time the loop is initiated.

True for any forward range and above, not true for input ranges. 
The problem with them is that some of them are structs, and even 
if they are not forward ranges they do have this behavior due to 
implicit copy on assignment, which can potentially make the code 
confusing.

 If your range uses internal state that can't be copied, or 
 your ranges are not structs, then your ranges are inherently 
 incorrect.

If we follow the definition of ranges, they must not be copyable 
at all. The only way to copy/save would be to have a .save method 
and call that method. Again, this is not properly followed even 
by Phobos implementations.

Note that a better approach would be to replace .save in the 
definition of forward range with a copy constructor; then all 
non-compliant ranges would suddenly become compliant, while those 
that have a .save method should be refactored to a copy 
constructor version.

 This is what a foreach loop on a range actually compiles to:

 ```d
 for (auto copy = range; !copy.empty; copy.popFront())
 {
     ...
 }
 ```

You should add .save on assignment if range is a forward range, 
or just remove the assignment if it is not.

Best regards,
Alexandru.

Aug 24 2021

Ferhat Kurtulmuş <aferust gmail.com> writes:

On Tuesday, 24 August 2021 at 19:06:44 UTC, Alexandru Ermicioi 
wrote:
 On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:
 [...]

 Actually the range contracts don't mention that it needs to be 
 a by value type. It can also be a reference type, i.e. a class.

 [...]

 True for any forward range and above, not true for input 
 ranges. The problem with them is that some of them are structs, 
 and even if they are not forward ranges they do have this 
 behavior due to implicit copy on assignment, which can 
 potentially make the code confusing.

 [...]

 If we follow the definition of ranges, they must not be 
 copy-able at all. The only way to copy/save, would be to have 
 .save method and call that method. This again is not being 
 properly followed by even phobos implementations.

 Note, that a better approach would be to replace .save in 
 definition of forward range with a copy constructor, then all 
 non-compliant ranges would become suddenly compliant, while 
 those that have .save method should be refactored to a copy 
 constructor version.

 [...]

 You should add .save on assignment if range is a forward range, 
 or just remove the assignment if it is not.

 Best regards,
 Alexandru.

Just out of curiosity, if a range implementation uses malloc in 
save, is it only possible to free the memory with the dtor? I 
worry about that especially when using those nogc range 
implementations with standard library. I don't have a list of the 
functions calling save in phobos. Is a save function only 
meaningful for GC ranges?

Aug 24 2021

Ali Çehreli <acehreli yahoo.com> writes:

On 8/24/21 1:44 PM, Ferhat Kurtulmuş wrote:

 Just out of curiosity, if a range implementation uses malloc in save, is
 it only possible to free the memory with the dtor?

Yes, but it depends on the specific case. For example, if the type has a
clear() function that does clean up, then one might call that. I don't
see it as being different from any other resource management.

 Is a save function only meaningful for GC ranges?

save() is to store the iteration state of a range. It should seldom
require memory allocation unless we're dealing with e.g. stdin where we
would have to store input lines just to support save(). It would not be
a good design to hide such potentially expensive storage of lines behind
save().

To me, save() should mostly be as trivial as returning a copy of the
struct object to preserve the state of the original range. Here is a
trivial generator:

import std.range;

struct Squares {
   int current;

   enum empty = false;

   int front() const {
     return current * current;
   }

   void popFront() {
     ++current;
   }

   auto save() {
     return this;
   }
}

void main() {
   auto r = Squares(0);
   r.popFront();  // Drop 0 * 0
   r.popFront();  // Drop 1 * 1

   auto copy = r.save;
   copy.popFront();  // Drop 2 * 2 only from the copy

   assert(r.front == 2 * 2);  // Saved original still has 2 * 2
}

Ali

Aug 24 2021

bauss <jj_1337 live.dk> writes:

On Tuesday, 24 August 2021 at 19:06:44 UTC, Alexandru Ermicioi 
wrote:
 On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:
 A range should be a struct always and thus its state is copied 
 when the foreach loop is created.

 Actually the range contracts don't mention that it needs to be 
 a by value type. It can also be a reference type, i.e. a class.

Of course it doesn't disallow classes but it's generally advised 
that you use structs and that's what you want in 99% of the 
cases. It's usually a red flag when a range starts being a 
reference type.

Aug 24 2021

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Wednesday, 25 August 2021 at 06:51:36 UTC, bauss wrote:
 Of course it doesn't disallow classes but it's generally 
 advised that you use structs and that's what you want in 99% of 
 the cases. It's usually a red flag when a range starts being a 
 reference type.

Well, sometimes you can't avoid ref types, for example when you 
need to mask the implementation of the range. But yes, in most 
cases it is best to use simpler means to represent ranges.

Aug 25 2021

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:
 A range should be a struct always and thus its state is copied 
 when the foreach loop is created.

That's quite a strong assumption, because its state might be a 
reference type, or it might not _have_ state in a meaningful 
sense -- consider an input range that wraps reading from a 
socket, or that just reads from `/dev/urandom`, for two examples.

Deterministic copying per foreach loop is only guaranteed for 
forward ranges.
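
To illustrate the point, here is a minimal sketch assuming a hypothetical class-based input range `CounterRef` and a helper `breakThenResume` (neither is from the thread). Because foreach copies only the class reference, both loops share one iteration state, and the element the first loop broke on shows up again:

```d
// Hypothetical reference-type input range over 1 .. last.
class CounterRef
{
    private int current = 1;
    private int last;

    this(int last) { this.last = last; }

    bool empty() const { return current > last; }
    int front() const { return current; }
    void popFront() { ++current; }
}

// Records what two consecutive foreach loops see when the first one breaks early.
int[] breakThenResume(CounterRef range)
{
    int[] seen;
    foreach (n; range)
    {
        seen ~= n;
        if (n == 2) break;  // 2 stays at the front: break skips popFront
    }
    foreach (n; range)      // same object, so iteration resumes at 2
        seen ~= n;
    return seen;
}
```

With this sketch, `breakThenResume(new CounterRef(5))` yields `[1, 2, 2, 3, 4, 5]`, which is the "same data twice" effect from the original question.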

Aug 25 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 8/25/21 6:06 AM, Joseph Rushton Wakeling wrote:
 On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:
 A range should be a struct always and thus its state is copied when 
 the foreach loop is created.

 
 That's quite a strong assumption, because its state might be a reference 
 type, or it might not _have_ state in a meaningful sense -- consider an 
 input range that wraps reading from a socket, or that just reads from 
 `/dev/urandom`, for two examples.
 
 Deterministic copying per foreach loop is only guaranteed for forward 
 ranges.

structs still provide a mechanism (postblit/copy ctor) to properly save 
a forward range when copying, even if the guts need copying (unlike 
classes). In general, I think it was a mistake to use `.save` as the 
mechanism, as generally `.save` is equivalent to copying, so nobody does 
it, and code works fine for most ranges.

What should have happened is that input-only ranges should not have been 
copyable, and copying should have been the save mechanism. Then it 
becomes way way more obvious what is happening. Yes, this means forgoing 
classes as ranges.

-Steve

Aug 25 2021

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Wednesday, 25 August 2021 at 10:59:44 UTC, Steven 
Schveighoffer wrote:
 structs still provide a mechanism (postblit/copy ctor) to 
 properly save a forward range when copying, even if the guts 
 need copying (unlike classes). In general, I think it was a 
 mistake to use `.save` as the mechanism, as generally `.save` 
 is equivalent to copying, so nobody does it, and code works 
 fine for most ranges.

Consider a struct whose internal fields are just a pointer to its 
"true" internal state.  Does one have any right to assume that 
the postblit/copy ctor would necessarily deep-copy that?

If that struct implements a forward range, though, and that 
pointed-to state is mutated by iteration of the range, then it 
would be reasonable to assume that the `save` method MUST 
deep-copy it, because otherwise the forward-range property would 
not be respected.

With that in mind, I am not sure it's reasonable to assume that 
just because a struct implements a forward-range API, that 
copying the struct instance is necessarily the same as saving the 
range.
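
A small sketch of such a case, assuming a hypothetical struct `SharedCursor` that keeps its position behind a pointer (all names here are illustrative, not taken from Phobos). A plain copy shares the position, while `save` allocates a fresh one:

```d
import std.range.primitives : isForwardRange;

// Hypothetical forward range whose iteration state lives behind a pointer.
struct SharedCursor
{
    private int* pos;
    private int last;

    this(int first, int last)
    {
        pos = new int;
        *pos = first;
        this.last = last;
    }

    bool empty() const { return *pos > last; }
    int front() const { return *pos; }
    void popFront() { ++(*pos); }

    // save deep-copies the pointed-to state; a plain struct copy does not.
    SharedCursor save() const { return SharedCursor(*pos, last); }
}

static assert(isForwardRange!SharedCursor);

int frontAfterAdvancingCopy(SharedCursor r)
{
    auto copy = r;        // shares `pos` with r
    copy.popFront();
    return r.front;       // r has moved as well
}

int frontAfterAdvancingSave(SharedCursor r)
{
    auto saved = r.save;  // independent position
    saved.popFront();
    return r.front;       // r is unaffected
}
```

For `SharedCursor(1, 5)`, the first helper returns 2 and the second returns 1, so copying and saving are observably different for this type.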

Indeed, IIRC quite a few Phobos library functions program 
defensively against that difference by taking a `.save` copy of 
their input before iterating over it.

 What should have happened is that input-only ranges should not 
 have been copyable, and copying should have been the save 
 mechanism. Then it becomes way way more obvious what is 
 happening. Yes, this means forgoing classes as ranges.

I think there's a benefit of a method whose definition is 
explicitly "If you call this, you will get a copy of the range 
which will replay exactly the same results when iterating over 
it".  Just because the meaning of "copy" can be ambiguous, 
whereas a promise about how iteration can be used is not.

Aug 25 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 8/25/21 12:46 PM, Joseph Rushton Wakeling wrote:
 On Wednesday, 25 August 2021 at 10:59:44 UTC, Steven Schveighoffer wrote:
 structs still provide a mechanism (postblit/copy ctor) to properly 
 save a forward range when copying, even if the guts need copying 
 (unlike classes). In general, I think it was a mistake to use `.save` 
 as the mechanism, as generally `.save` is equivalent to copying, so 
 nobody does it, and code works fine for most ranges.

 
 Consider a struct whose internal fields are just a pointer to its "true" 
 internal state.  Does one have any right to assume that the 
 postblit/copy ctor would necessarily deep-copy that?

In a world where copyability means it's a forward range? Yes. We aren't 
in that world, it's a hypothetical "if we could go back and redesign".

 If that struct implements a forward range, though, and that pointed-to 
 state is mutated by iteration of the range, then it would be reasonable 
 to assume that the `save` method MUST deep-copy it, because otherwise 
 the forward-range property would not be respected.
 
 With that in mind, I am not sure it's reasonable to assume that just 
 because a struct implements a forward-range API, that copying the struct 
 instance is necessarily the same as saving the range.

Technically this is true. In practice, it rarely happens. The flaw of 
`save` isn't that it's an unsound API, the flaw is that people get away 
with just copying, and it works 99.9% of the time. So code is simply 
untested with ranges where `save` is important.

 Indeed, IIRC quite a few Phobos library functions program defensively 
 against that difference by taking a `.save` copy of their input before 
 iterating over it.

I'd be willing to bet $10 there is a function in phobos right now, that 
takes forward ranges, and forgets to call `save` when iterating with 
foreach. It's just so easy to do, and works with most ranges in existence.
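
For illustration only, this is roughly what remembering to call `save` looks like in a generic function. The class `Countdown` and the function `sum` are hypothetical; the only library symbol assumed is `isForwardRange` from `std.range.primitives`:

```d
import std.range.primitives : isForwardRange;

// Hypothetical reference-type forward range counting down to 1.
class Countdown
{
    private int current;

    this(int start) { current = start; }

    bool empty() const { return current <= 0; }
    int front() const { return current; }
    void popFront() { --current; }
    Countdown save() { return new Countdown(current); }
}

static assert(isForwardRange!Countdown);

// Sums the elements without consuming the caller's range.
int sum(R)(R range) if (isForwardRange!R)
{
    int total;
    foreach (x; range.save)  // iterate a saved copy, not `range` itself
        total += x;
    return total;
}
```

With `range.save`, calling `sum` twice on the same `Countdown(3)` gives 6 both times. Writing `foreach (x; range)` instead would leave the object empty after the first call.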

 
 What should have happened is that input-only ranges should not have 
 been copyable, and copying should have been the save mechanism. Then 
 it becomes way way more obvious what is happening. Yes, this means 
 forgoing classes as ranges.

 
 I think there's a benefit of a method whose definition is explicitly "If 
 you call this, you will get a copy of the range which will replay 
 exactly the same results when iterating over it".  Just because the 
 meaning of "copy" can be ambiguous, whereas a promise about how 
 iteration can be used is not.

The idea is to make the meaning of a range copy not ambiguous.

-Steve

Aug 25 2021

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Wednesday, 25 August 2021 at 17:01:54 UTC, Steven 
Schveighoffer wrote:
 In a world where copyability means it's a forward range? Yes. 
 We aren't in that world, it's a hypothetical "if we could go 
 back and redesign".

OK, that makes sense.

 Technically this is true. In practice, it rarely happens. The 
 flaw of `save` isn't that it's an unsound API, the flaw is that 
 people get away with just copying, and it works 99.9% of the 
 time. So code is simply untested with ranges where `save` is 
 important.

This is very true, and makes it quite reasonable to try to pursue 
"the obvious/lazy thing == the thing you're supposed to do" 
w.r.t. how ranges are defined.

 I'd be willing to bet $10 there is a function in phobos right 
 now, that takes forward ranges, and forgets to call `save` when 
 iterating with foreach. It's just so easy to do, and works with 
 most ranges in existence.

I'm sure you'd win that bet!

 The idea is to make the meaning of a range copy not ambiguous.

Yes, this feels reasonable.  And then one can reserve the idea of 
a magic deep-copy method for special cases like pseudo-RNGs where 
one wants them to be copyable on user request, but without code 
assuming it can copy them.

Aug 25 2021

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Aug 25, 2021 at 04:46:54PM +0000, Joseph Rushton Wakeling via
Digitalmars-d-learn wrote:
 On Wednesday, 25 August 2021 at 10:59:44 UTC, Steven Schveighoffer wrote:
 structs still provide a mechanism (postblit/copy ctor) to properly
 save a forward range when copying, even if the guts need copying
 (unlike classes). In general, I think it was a mistake to use
 `.save` as the mechanism, as generally `.save` is equivalent to
 copying, so nobody does it, and code works fine for most ranges.

 
 Consider a struct whose internal fields are just a pointer to its
 "true" internal state.  Does one have any right to assume that the
 postblit/copy ctor would necessarily deep-copy that?

[...]
 If that struct implements a forward range, though, and that pointed-to
 state is mutated by iteration of the range, then it would be
 reasonable to assume that the `save` method MUST deep-copy it, because
 otherwise the forward-range property would not be respected.

[...]

What I understand from what Andrei has said in the past, is that a range
is merely a "view" into some underlying storage; it is not responsible
for the contents of that storage.  My interpretation of this is that
.save will only save the *position* of the range, but it will not save
the contents it points to, so it will not (should not) deep-copy.

However, if the range is implemented by a struct that contains a
reference to its iteration state, then yes, to satisfy the definition of
.save it should deep-copy this state.


 With that in mind, I am not sure it's reasonable to assume that just
 because a struct implements a forward-range API, that copying the
 struct instance is necessarily the same as saving the range.

[...]

Andrei has mentioned before that in retrospect, .save was a design
mistake.  The difference between an input range and a forward range
should have been keyed on whether the range type has reference semantics
(input range) or by-value semantics (forward range).  But for various
reasons, including the state of the language at the time the range API
was designed, the .save route was chosen, and we're stuck with it unless
Phobos 2.0 comes into existence.

Either way, though, the semantics of a forward range pretty much
dictates that whatever type a range has, if it claims to be a forward
range then .save must preserve whatever iteration state it has at that
point in time. If this requires deep-copying some state referenced from
a struct, then that's what it takes to satisfy the API.  This may take
the form of a .save method that copies state, or a copy ctor that does
the same, or simply storing iteration state as PODs in the range struct
so that copying the struct equates to preserving the iteration state.


T

-- 
Why waste time reinventing the wheel, when you could be reinventing the engine?
-- Damian Conway

Aug 25 2021

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Wednesday, 25 August 2021 at 19:51:36 UTC, H. S. Teoh wrote:
 What I understand from what Andrei has said in the past, is 
 that a range is merely a "view" into some underlying storage; 
 it is not responsible for the contents of that storage.  My 
 interpretation of this is that .save will only save the 
 *position* of the range, but it will not save the contents it 
 points to, so it will not (should not) deep-copy.

That definition is potentially misleading if we take into account 
that a range is not necessarily iterating over some underlying 
storage: ranges can also be defined by algorithmic processes.  
(Think e.g. iota, or pseudo-RNGs, or a range that iterates over 
the Fibonacci numbers.)

 However, if the range is implemented by a struct that contains 
 a reference to its iteration state, then yes, to satisfy the 
 definition of .save it should deep-copy this state.

Right.  And in the case of algorithmic ranges (rather than 
container-derived ranges), the state is always and only the 
iteration state.  And then as well as that there are ranges that 
are iterating over external IO, which in most cases can't be 
treated as forward ranges but in a few cases might be (e.g. 
saving the cursor position when iterating over a file's contents).

Arguably I think a lot of problems in the range design derive 
from not thinking through those distinctions in detail 
(external-IO-based vs. algorithmic vs. container-based), even 
though superficially those seem to map well to the input vs 
forward vs bidirectional vs random-access range distinctions.

That's also not taking into account edge cases, e.g. stuff like 
RandomShuffle or RandomSample: here one can in theory copy the 
"head" of the range but one arguably wants to avoid correlations 
in the output of the different copies (which can arise from at 
least 2 different sources: copying under-the-hood pseudo-random 
state of the sampling/shuffling algorithm itself, or copying the 
underlying pseudo-random number generator).  Except perhaps in 
the case where one wants to take advantage of the pseudo-random 
feature to reproduce those sequences ... but then one wants that 
to be a conscious programmer decision, not happening by accident 
under the hood of some library function.

(Rabbit hole, here we come.)

 Andrei has mentioned before that in retrospect, .save was a 
 design mistake.  The difference between an input range and a 
 forward range should have been keyed on whether the range type 
 has reference semantics (input range) or by-value semantics 
 (forward range).  But for various reasons, including the state 
 of the language at the time the range API was designed, the 
 .save route was chosen, and we're stuck with it unless Phobos 
 2.0 comes into existence.

 Either way, though, the semantics of a forward range pretty 
 much dictates that whatever type a range has, if it claims to 
 be a forward range then .save must preserve whatever iteration 
 state it has at that point in time. If this requires 
 deep-copying some state referenced from a struct, then that's 
 what it takes to satisfy the API.  This may take the form of a 
 .save method that copies state, or a copy ctor that does the 
 same, or simply storing iteration state as PODs in the range 
 struct so that copying the struct equates to preserving the 
 iteration state.

Yes.  FWIW I agree that when _implementing_ a forward range one 
should probably make sure that copying by value and the `save` 
method produce the same results.

But as a _user_ of code implemented using the current range API, 
it might be a bad idea to assume that a 3rd party forward range 
implementation will necessarily guarantee that.

Aug 26 2021

jfondren <julian.fondren gmail.com> writes:

On Tuesday, 24 August 2021 at 08:36:18 UTC, frame wrote:
 Consider a simple input range that can be iterated with 
 empty(), front() and popFront(). That is comfortable to use 
 with foreach() but what if the foreach loop will be cancelled? 
 If a range isn't depleted yet and continued it will supply the 
 same data twice on front() in the next use of foreach().

I think you strayed from the beaten path, in a second way, as 
soon as your range's lifetime escaped a single expression, to be 
possibly used in two foreach loops. With ranges, as you do more 
unusual things, you're already encouraged to use a more advanced 
range. And ranges already have caveats for surprising behavior, 
like map/filter interactions that redundantly execute code. So I 
see this as a documentation problem. The current behavior of 'if 
you break then the next foreach gets what you broke on' is 
probably a desirable behavior for some uses:

```d
import std;

class MyIntRange {
     int[] _elements;
     size_t _offset;

     this(int[] elems) { _elements = elems; }

     bool empty() { return !_elements || _offset >= _elements.length; }

     int front() { return _elements[_offset]; }

     void popFront() { _offset++; }
}

void main() {
     auto ns = new MyIntRange([0, 1, 1, 2, 3, 4, 4, 4, 5]);
     // calls writeln() as many times as there are numbers:
     while (!ns.empty) {
         foreach (odd; ns) {
             if (odd % 2 == 0) break;
             writeln("odd: ", odd);
         }
         foreach (even; ns) {
             if (even % 2 != 0) break;
             writeln("even: ", even);
         }
     }
}
```

Aug 24 2021

frame <frame86 live.com> writes:

On Tuesday, 24 August 2021 at 09:26:20 UTC, jfondren wrote:

 I think you strayed from the beaten path, in a second way, as 
 soon as your range's lifetime escaped a single expression, to 
 be possibly used in two foreach loops. With ranges, as you do 
 more unusual things, you're already encouraged to use a more 
 advanced range. And ranges already have caveats for surprising 
 behavior, like map/filter interactions that redundantly execute 
 code. So I see this as a documentation problem. The current 
 behavior of 'if you break then the next foreach gets what you 
 broke on' is probably a desirable behavior for some uses:

Yes, I have a special case where a delegate jumps back to the 
range because something must be buffered before it can be 
delivered.

 ```d
 import std;

 class MyIntRange {
     int[] _elements;
     size_t _offset;

     this(int[] elems) { _elements = elems; }

     bool empty() { return !_elements || _offset >= _elements.length; }

     int front() { return _elements[_offset]; }

     void popFront() { _offset++; }
 }

 void main() {
     auto ns = new MyIntRange([0, 1, 1, 2, 3, 4, 4, 4, 5]);
     // calls writeln() as many times as there are numbers:
     while (!ns.empty) {
         foreach (odd; ns) {
             if (odd % 2 == 0) break;
             writeln("odd: ", odd);
         }
         foreach (even; ns) {
             if (even % 2 != 0) break;
             writeln("even: ", even);
         }
     }
 }
 ```

That is just weird. It's not logical and it's a source of bugs. I 
mean, we should use foreach() to avoid loop bugs. And relying on 
that is supposed to be desired behavior?

Aug 24 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 8/24/21 4:36 AM, frame wrote:
 Consider a simple input range that can be iterated with empty(), front() 
 and popFront(). That is comfortable to use with foreach() but what if 
 the foreach loop will be cancelled? If a range isn't depleted yet and 
 continued it will supply the same data twice on front() in the next use 
 of foreach().
 
 For some reason, foreach() does not call popFront() on a break or 
 continue statement. 

continue calls `popFront`. break does not.

 There is no way to detect it except the range itself 
 tracks its status and does an implicit popFront() if needed - but then 
 this whole interface is some kind of useless.

You can call `popFront` if you need to after the loop, or just before 
the break. I have to say, the term "useless" does not even come close to 
describing ranges using foreach in my experience.

 There is opApply() on the other hand that is designed for foreach() and 
 informs via non-0-result if the loop is cancelled - but this means that 
 every range must implement it if the range should work in foreach() 
 correctly?

`opApply` has to return different values because it needs you to pass 
through its instructions to the compiler-generated code. The compiler 
has written the delegate to return the message, and so you need to pass 
through that information. The non-zero result is significant, not just 
non-zero. For instance, if you end with a `break somelabel;` statement, 
it has to know which label to go to.

The correct behavior for `opApply` should be, if the delegate returns 
non-zero, return that value immediately. It should not be doing anything 
else. Would you be happy with a `break somelabel;` actually triggering 
output? What if it just continued the loop instead? You don't get to 
decide what happens at that point, you are acting as the compiler.
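
A minimal sketch of that rule, assuming a hypothetical container `Numbers` and helper `firstEven` (names invented for illustration). The `opApply` body does nothing with a non-zero result except hand it back:

```d
// Hypothetical container exposing foreach support through opApply.
struct Numbers
{
    int[] items;

    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref item; items)
        {
            // Pass any non-zero result straight back to the
            // compiler-generated code; do not interpret it.
            if (auto result = dg(item))
                return result;
        }
        return 0;
    }
}

// Returns the first even element, or -1 if there is none.
int firstEven(Numbers numbers)
{
    int found = -1;
    foreach (n; numbers)
    {
        if (n % 2 == 0)
        {
            found = n;
            break;  // surfaces in opApply as a non-zero delegate result
        }
    }
    return found;
}
```

Here `firstEven(Numbers([1, 3, 4, 5, 6]))` returns 4, and the loop stops as soon as the delegate reports the break.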

 This is very inconsistent. Either foreach() should deny usage of ranges 
 that have no opApply() method or there should be a reset() or cancel() 
 method in the interfaces that may be called by foreach() if they are 
 implemented.
 
 How do you handle that issue? Are your ranges designed to have this bug 
 or do you implement opApply() always?

It's not a bug. So there is no need to "handle" it.

The pattern of using a for(each) loop to align certain things occurs all 
the time in code. Imagine a loop that is looking for a certain line in a 
file, and breaks when the line is there. Would you really want the 
compiler to unhelpfully throw away that line for you?

And if that is what you want, put `popFront` in the loop before you 
exit. You can't "unpopFront" something, so this provides the most 
flexibility.

-Steve

Aug 24 2021

frame <frame86 live.com> writes:

On Tuesday, 24 August 2021 at 13:02:38 UTC, Steven Schveighoffer 
wrote:
 On 8/24/21 4:36 AM, frame wrote:
 Consider a simple input range that can be iterated with 
 empty(), front() and popFront(). That is comfortable to use 
 with foreach() but what if the foreach loop will be cancelled? 
 If a range isn't depleted yet and continued it will supply the 
 same data twice on front() in the next use of foreach().
 
 For some reason, foreach() does not call popFront() on a break 
 or continue statement.

 continue calls `popFront`. break does not.

Of course by the next iteration, you are right.

 You can call `popFront` if you need to after the loop, or just 
 before the break. I have to say, the term "useless" does not 
 even come close to describing ranges using foreach in my 
 experience.

I disagree, because foreach() is a language construct and 
therefore it should behave in a logical way. The methods are fine 
in ranges or if something is done manually. But in the case of 
foreach() it's just unexpected.

It becomes useless for foreach() because you can't rely on them 
if other code breaks the loop and you need to use that range, 
like in my case. But also for ranges - there is no need for a 
popFront() if it is not called in a logical way. Then even empty() 
could fetch the next data if needed. It only makes sense if language 
system code uses it in a strict order and guarantees that this 
order is always kept.


 It's not a bug. So there is no need to "handle" it.

 The pattern of using a for(each) loop to align certain things 
 occurs all the time in code. Imagine a loop that is looking for 
 a certain line in a file, and breaks when the line is there. 
 Would you really want the compiler to unhelpfully throw away 
 that line for you?

I don't get this point. If it breaks from the loop then it 
changes the scope anyway, so my data should be already processed 
or copied. What is thrown away here?

 And if that is what you want, put `popFront` in the loop before 
 you exit. You can't "unpopFront" something, so this provides 
 the most flexibility.

 -Steve

Yes, this is the solution, but not the way it should be. If 
the programmer uses the range methods within the foreach-loop 
then you would expect some bug. There shouldn't be a need to 
manipulate the range just because I break the foreach-loop.

Java, for example just uses next() and hasNext(). You can't run 
into a bug here because one method must move the cursor.

PHP has a rewind() method. So any foreach() would reset the range 
or could clean up before next use of it.

But D just leaves your range in an inconsistent state between 
iteration cycles. This just feels wrong. The next foreach() would 
not continue with popFront() but with empty() again - because it 
even relies on the range being called in a given order. 
As there is no rewind or exit method, this order should be 
maintained on foreach exit too, preparing for the next use. That's it.

You don't see a bug here?

Aug 24 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 8/24/21 2:12 PM, frame wrote:
 You can call `popFront` if you need to after the loop, or just before 
 the break. I have to say, the term "useless" does not even come close 
 to describing ranges using foreach in my experience.

 
 I disagree, because foreach() is a language construct and therefore it 
 should behave in a logical way. The methods are fine in ranges or if 
 something is done manually. But in the case of foreach() it's just unexpected.

I can't agree at all. It's totally expected.

If you have a for loop:

```d
int i;
for(i = 0; i < someArr.length; ++i)
{
    if(someArr[i] == desiredValue) break;
}
```

You are saying, "compiler, please execute the `++i` when I break from 
the loop because I already processed that one". How can that be 
expected? I would *never* expect that. When I break, it means "stop the 
loop, I'm done", and then I use `i` which is where I expected it to be.

 It becomes useless for foreach() because you can't rely on them if other 
 code breaks the loop and you need to use that range, like in my case. 
 But also for ranges - there is no need for a popFront() if it is not 
 called in a logical way. Then even empty() could fetch the next data if 
 needed. It only makes sense if language system code uses it in a 
 strict order and guarantees that this order is always kept.

There is no problem with the ordering. What seems to be the issue is 
that you aren't used to the way ranges work.

What's great about D is that there is a solution for you:

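```d
import std.range.primitives; // ElementType, plus empty/front/popFront for arrays

struct EagerPopfrontRange(R)
{
    R source;
    ElementType!R front; // element currently being processed
    bool empty;
    void popFront() {
      if(source.empty) empty = true;
      else {
         // fetch the element, then immediately advance the source
         front = source.front;
         source.popFront;
      }
    }
}

auto epf(R)(R inputRange) {
    auto result = EagerPopfrontRange!R(inputRange);
    result.popFront; // eager!
    return result;
}

// usage
foreach(v; someRange.epf) { ... }
```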

Now if you break from the loop, the original range is pointing at the 
element *after* the one you last were processing.
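
One caveat worth spelling out: the wrapper stores a copy of `source`, so the original range is only advanced when the range type has reference semantics. The sketch below repeats the wrapper so that it compiles on its own and assumes a hypothetical class-based range named `Counter`; with a plain array or a value-type struct the caller's range would stay where it was.

```d
import std.range.primitives : ElementType;
import std.stdio : writeln;

// Wrapper from the post above, repeated so the example is self-contained.
struct EagerPopfrontRange(R)
{
    R source;
    ElementType!R front;
    bool empty;
    void popFront()
    {
        if (source.empty) empty = true;
        else
        {
            front = source.front;
            source.popFront();
        }
    }
}

auto epf(R)(R inputRange)
{
    auto result = EagerPopfrontRange!R(inputRange);
    result.popFront(); // eager!
    return result;
}

// Hypothetical reference-type range, so the wrapper shares its state.
final class Counter
{
    int current, limit;
    this(int limit) { this.limit = limit; }
    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
}

// Breaks while processing `stopAt`; returns what the source points at next.
int sourceFrontAfterBreak(int stopAt)
{
    auto source = new Counter(5);
    foreach (v; source.epf)
    {
        if (v == stopAt) break;
    }
    return source.front;
}

void main()
{
    writeln(sourceFrontAfterBreak(1)); // 2
}
```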

 It's not a bug. So there is no need to "handle" it.

 The pattern of using a for(each) loop to align certain things occurs 
 all the time in code. Imagine a loop that is looking for a certain 
 line in a file, and breaks when the line is there. Would you really 
 want the compiler to unhelpfully throw away that line for you?

 
 I don't get this point. If it breaks from the loop then it changes the 
 scope anyway, so my data should be already processed or copied. What is 
 thrown away here?

Why does the loop have to contain all your code? Maybe you have code 
after the loop. Maybe the loop's purpose is to align the range based on 
some criteria (e.g. take this byLine range and prime it so it contains 
the first line of the thing I'm looking for).

 
 And if that is what you want, put `popFront` in the loop before you 
 exit. You can't "unpopFront" something, so this provides the most 
 flexibility.

 
 Yes, this is the solution, but not the way it should be. If the 
 programmer uses the range methods within the foreach-loop then you would 
 expect some bug. There shouldn't be a need to manipulate the range just 
 because I break the foreach-loop.

You shouldn't need to in most circumstances. I don't think I've ever 
needed to do this. And I use foreach on ranges all the time.

Granted, I probably would use a while loop to align a range rather than 
foreach.

 
 Java, for example just uses next() and hasNext(). You can't run into a 
 bug here because one method must move the cursor.

This gives a giant clue as to the problem -- you aren't used to this. 
Java's iterator interface is different than D's. It consumes the element 
as you fetch it, instead of acting like a pointer to a current element. 
Once it gives you the element, it's done with it.

D's ranges are closer to a C++ iterator pair (which is modeled after a 
pair of pointers).
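
To make the contrast concrete, here is a rough sketch of a Java-like cursor layered over a D range. `JavaStyleIterator` and `nextAfter` are hypothetical names, not part of Phobos; the point is only that `next()` fetches and advances in one step, whereas `front` and `popFront` keep those two actions separate.

```d
import std.range.primitives : ElementType, empty, front, popFront;
import std.stdio : writeln;

// Hypothetical adapter giving a D range a Java-like hasNext()/next() cursor.
struct JavaStyleIterator(R)
{
    R source;

    bool hasNext() { return !source.empty; }

    // Fetching and advancing are one step, as in Java's Iterator.
    ElementType!R next()
    {
        auto value = source.front;
        source.popFront();
        return value;
    }
}

// Stops at `wanted` and returns the element the cursor yields next.
// Assumes `wanted` is present and is not the last element.
int nextAfter(int[] data, int wanted)
{
    auto it = JavaStyleIterator!(int[])(data);
    while (it.hasNext())
    {
        if (it.next() == wanted) break;
    }
    return it.next();
}

void main()
{
    writeln(nextAfter([1, 2, 3, 4], 2)); // 3
}
```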

 PHP has a rewind() method. So any foreach() would reset the range or 
 could clean up before next use of it.

I'm surprised you bring PHP as an example, as it appears their foreach 
interface works EXACTLY as D does:

```php
$arriter = new ArrayIterator(array(1, 2, 3, 4));
foreach($arriter as $val) { if ($val == 2) break; }
print($arriter->current()); // 2
```

 But D just leaves your range in an inconsistent state between iteration 
 cycles. This just feels wrong. The next foreach() would not continue with 
 popFront() but with empty() again - because it even relies on the 
 range being called in a given order. As there is no rewind or 
 exit method, this order should be maintained on foreach exit too, 
 preparing for the next use. That's it.
 
 You don't see a bug here?
 

I believe the bug is in your expectations. While Java-like iteration 
would be a possible API D could have chosen, it's not what D chose.

-Steve

Aug 24 2021

frame <frame86 live.com> writes:

On Tuesday, 24 August 2021 at 21:15:02 UTC, Steven Schveighoffer 
wrote:

 If you have a for loop:

 ```d
 int i;
 for(i = 0; i < someArr.length; ++i)
 {
    if(someArr[i] == desiredValue) break;
 }
 ```

 You are saying, "compiler, please execute the `++i` when I 
 break from the loop because I already processed that one". How 
 can that be expected? I would *never* expect that. When I 
 break, it means "stop the loop, I'm done", and then I use `i` 
 which is where I expected it to be.

I get your point, you see foreach() as a raw translation to the 
for-loop and I'm fine with that. Automatically calling popFront() on 
break is also only a suggestion if there is no other mechanism to 
tell the range we have cancelled it.

 It becomes useless for foreach() because you can't rely on 
 them if other code breaks the loop and you need to use that 
 range, like in my case. But also for ranges - there is no need 
 for a popFront() if it is not called in a logical way. Then even 
 empty() could fetch the next data if needed. It only makes sense 
 if language system code uses it in a strict order and 
 guarantees that this order is always kept.

 There is no problem with the ordering. What seems to be the 
 issue is that you aren't used to the way ranges work.

Ehm, no...
-> empty()
-> front()
-> popFront()
-> empty()
-> front()
break;

-> empty();
-> front();

clearly violates the order for me.
Well, nobody said that we must move on the range - but come on...

 What's great about D is that there is a solution for you:

 ```d
 struct EagerPopfrontRange(R)
 {
    R source;
    ElementType!R front;
    bool empty;
    void popFront() {
      if(source.empty) empty = true;
      else {
         front = source.front;
         source.popFront;
      }
    }
 }

 auto epf(R)(R inputRange) {
    auto result = EagerPopfrontRange!R(inputRange);
    result.popFront; // eager!
    return result;
 }

 // usage
 foreach(v; someRange.epf) { ... }
 ```

 Now if you break from the loop, the original range is pointing 
 at the element *after* the one you last were processing.

This is nice. But foreach() should do it automatically - avoiding 
this.
foreach() should be seen as a special construct that does that, 
not just a dumb alias for the for-loop. Why? Because it is a 
convenient language construct and usage should be easy. Again, 
there should be no additional popFront() just because I break the 
loop.


 I'm surprised you bring PHP as an example, as it appears their 
 foreach interface works EXACTLY as D does:

Yeah, but the point is, there is a rewind() method. That is 
called every time on foreach().

Aug 25 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 8/25/21 4:31 AM, frame wrote:
 On Tuesday, 24 August 2021 at 21:15:02 UTC, Steven Schveighoffer wrote:
 I'm surprised you bring PHP as an example, as it appears their foreach 
 interface works EXACTLY as D does:

 
 Yeah, but the point is, there is a rewind() method. That is called every 
 time on foreach().

It seems what you are after is forward ranges. Those are able to 
"rewind" when you are done with them. It's just not done through a 
rewind method, but via saving the range before iteration:

```d
foreach(val; forwardRange.save)
{
    ...
    break;
}

// forwardRange hasn't been iterated here
```
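
A runnable variant of the same idea, as a sketch only: `Counter` and `frontAfterLoop` are hypothetical names, and a class is assumed so that iterating without `.save` visibly consumes the range.

```d
import std.stdio : writeln;

// Hypothetical reference-type forward range.
final class Counter
{
    int current, limit;
    this(int current, int limit)
    {
        this.current = current;
        this.limit = limit;
    }
    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
    // save returns an independent copy of the iteration state.
    @property Counter save() const { return new Counter(current, limit); }
}

// Breaks at 2 and reports where the caller's range ended up.
int frontAfterLoop(bool useSave)
{
    auto r = new Counter(0, 5);
    if (useSave)
    {
        foreach (v; r.save) { if (v == 2) break; }
    }
    else
    {
        foreach (v; r) { if (v == 2) break; }
    }
    return r.front;
}

void main()
{
    writeln(frontAfterLoop(true));  // 0
    writeln(frontAfterLoop(false)); // 2
}
```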

-Steve

Aug 25 2021

frame <frame86 live.com> writes:

On Wednesday, 25 August 2021 at 11:02:23 UTC, Steven 
Schveighoffer wrote:
 On 8/25/21 4:31 AM, frame wrote:
 On Tuesday, 24 August 2021 at 21:15:02 UTC, Steven 
 Schveighoffer wrote:
 I'm surprised you bring PHP as an example, as it appears 
 their foreach interface works EXACTLY as D does:

 
 Yeah, but the point is, there is a rewind() method. That is 
 called every time on foreach().

 It seems what you are after is forward ranges. Those are able 
 to "rewind" when you are done with them. It's just not done 
 through a rewind method, but via saving the range before 
 iteration:

 ```d
 foreach(val; forwardRange.save)
 {
    ...
    break;
 }

 // forwardRange hasn't been iterated here
 ```

 -Steve

This could be any custom method for my ranges or a forward range 
returned by some function.

But that doesn't help if some third-party library function 
breaks and returns just an input range. Then it seems that it must 
be implemented very carefully, like the postblit techniques mentioned 
before. Some authors may never care about that.

That it works in 99% of all cases should not be an excuse for a 
design flaw.
The documentation really needs to mention this.

Aug 26 2021

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Aug 24, 2021 at 08:36:18AM +0000, frame via Digitalmars-d-learn wrote:
 Consider a simple input range that can be iterated with empty(),
 front() and popFront(). That is comfortable to use with foreach() but
 what if the foreach loop will be cancelled? If a range isn't depleted
 yet and continued it will supply the same data twice on front() in the
 next use of foreach().

Generally, if you need precise control over range state between multiple
loops, you really should think about using a while loop instead of a for
loop, and call .popFront where it's needed.


 For some reason, foreach() does not call popFront() on a break or continue
 statement. There is no way to detect it except the range itself tracks its
 status and does an implicit popFront() if needed - but then this whole
 interface is some kind of useless.

In some cases, you *want* to retain the same element between loops,
e.g., if you're iterating over elements of some category and stop when
you encounter something that belongs to the next category -- you
wouldn't want to consume that element, but leave it to the next loop to
consume it.  So it's not a good idea to have break call .popFront
automatically.  Similarly, sometimes you might want to reuse an element
(e.g., the loop body detects a condition that warrants retrying).

Basically, once you need anything more than a single sequential
iteration over a range, it's better to be explicit about what exactly
you want, rather than depend on implicit semantics, which may lead to
surprising results.

	while (!range.empty) {
		doSomething(range.front);
		if (someCondition) {
			range.popFront;
			break;
		} else if (someOtherCondition) {
			// Don't consume current element
			break;
		} else if (skipElement) {
			range.popFront;
			continue;
		} else if (retryElement) {
			continue;
		}
		range.popFront;	// normal iteration
	}
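
As a concrete instance of the "stop at the next category" case, the sketch below splits an array into runs of equal parity. `takeRun` is a hypothetical helper, and parity merely stands in for whatever category applies; the element that ends a run is deliberately left in the range for the next call.

```d
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;

// Collects the leading run of elements that share the first element's
// parity, leaving the first element of the next run unconsumed in `r`.
int[] takeRun(ref int[] r)
{
    int[] run;
    if (r.empty) return run;
    immutable parity = r.front % 2;
    while (!r.empty)
    {
        if (r.front % 2 != parity)
            break;          // belongs to the next run: do not pop it
        run ~= r.front;
        r.popFront();       // normal iteration
    }
    return run;
}

void main()
{
    int[] data = [2, 4, 1, 3, 6];
    writeln(takeRun(data)); // [2, 4]
    writeln(takeRun(data)); // [1, 3]
    writeln(data);          // [6]
}
```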


T

-- 
"No, John.  I want formats that are actually useful, rather than
over-featured megaliths that address all questions by piling on
ridiculous internal links in forms which are hideously over-complex." --
Simon St. Laurent on xml-dev

Aug 24 2021

frame <frame86 live.com> writes:

On Tuesday, 24 August 2021 at 16:45:27 UTC, H. S. Teoh wrote:

 In some cases, you *want* to retain the same element between 
 loops, e.g., if you're iterating over elements of some category 
 and stop when you encounter something that belongs to the next 
 category -- you wouldn't want to consume that element, but 
 leave it to the next loop to consume it.  So it's not a good 
 idea to have break call .popFront automatically.  Similarly, 
 sometimes you might want to reuse an element (e.g., the loop 
 body detects a condition that warrants retrying).

I'm only talking about foreach() uses, and that you shouldn't need 
to mix them with manual methods. Such iterations are another topic.

Aug 24 2021

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Tuesday, 24 August 2021 at 08:36:18 UTC, frame wrote:
 How do you handle that issue? Are your ranges designed to have 
 this bug or do you implement opApply() always?

This is expected behavior imho. I think what you need is a 
forward range, not an input range. By the contract of an input range, 
it is a consumable object, hence once used in a foreach it can't 
be used anymore. It is similar to an iterator or a stream object 
in Java.

A forward range also exposes the capability to create save points, 
which is actually used by foreach to do what is done in Java 
by the Iterable interface, for example.

Then there are bidirectional and random access ranges that offer 
even more capabilities.

As far as I know, opApply is from the pre-range era, and is kind of 
left as an option to provide easy foreach integration. In this 
case you can think of objects having opApply as forward ranges, 
though for foreach constructs only.

Regards,
Alexandru.

Aug 24 2021

frame <frame86 live.com> writes:

On Tuesday, 24 August 2021 at 18:52:19 UTC, Alexandru Ermicioi 
wrote:

 A forward range also exposes the capability to create save points, 
 which is actually used by foreach to do what is done in 
 Java by the Iterable interface, for example.

I know, but foreach() doesn't call save().

Aug 25 2021

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Wednesday, 25 August 2021 at 08:15:18 UTC, frame wrote:
 I know, but foreach() doesn't call save().

Hmm, this is a regression probably, or I missed the time frame 
when foreach moved to use of copy constructor for forward ranges.

Do we have a well defined description of what input, forward and 
any other well known range is, and how it does interact with 
language features?

For some reason I didn't manage to find anything on dlang.org.

Aug 25 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 8/25/21 6:06 AM, Alexandru Ermicioi wrote:
 On Wednesday, 25 August 2021 at 08:15:18 UTC, frame wrote:
 I know, but foreach() doesn't call save().

 
 Hmm, this is a regression probably, or I missed the time frame when 
 foreach moved to use of copy constructor for forward ranges.
 
 Do we have a well defined description of what input, forward and any 
 other well known range is, and how it does interact with language features?
 
 For some reason I didn't manage to find anything on dlang.org.

It never has called `save`. It makes a copy, which is almost always the 
equivalent `save` implementation.
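
The copying is easy to observe with a value-type range. In this sketch (`Countdown` and `frontAfterBreak` are hypothetical names) all state lives in the struct, so the loop advances its own copy and the caller's range never moves.

```d
import std.stdio : writeln;

// Hypothetical value-type range: all state lives in the struct itself.
struct Countdown
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

// foreach iterates over a copy, so `r` itself is never advanced.
int frontAfterBreak()
{
    auto r = Countdown(5);
    foreach (v; r) { if (v == 3) break; }
    return r.front;
}

void main()
{
    writeln(frontAfterBreak()); // 5
}
```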

-Steve

Aug 25 2021

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Wednesday, 25 August 2021 at 11:04:35 UTC, Steven 
Schveighoffer wrote:
 It never has called `save`. It makes a copy, which is almost 
 always the equivalent `save` implementation.

 -Steve

Really?

Then what is the use of the .save method?
The only reason I can find is that you can't declare constructors 
in interfaces, hence the use of the .save method instead of a copy 
constructor for defining forward ranges.

We now have two ways of doing the same thing, which can cause 
confusion. It would then be best for ranges to hide the copy constructor 
behind a private modifier (or disable it altogether), and force other 
range wrappers to always call .save, including foreach, since by not 
doing so we introduce a difference in behavior between ref and 
value forward ranges (for foreach use).

Aug 25 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 8/25/21 7:26 AM, Alexandru Ermicioi wrote:
 On Wednesday, 25 August 2021 at 11:04:35 UTC, Steven Schveighoffer wrote:
 It never has called `save`. It makes a copy, which is almost always 
 the equivalent `save` implementation.

 
 Really?
 
 Then what is the use of the .save method?
 The only reason I can find is that you can't declare constructors in 
 interfaces, hence the use of the .save method instead of a copy constructor 
 for defining forward ranges.

The `save` function was used to provide a way for code like 
`isForwardRange` to have a definitive symbol to search for. It's also 
opt-in, whereas if we used copying, it would be opt-out.
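
The opt-in nature can be seen with the traits in `std.range.primitives`. In this sketch the two structs (`InputOnly` and `WithSave`, both hypothetical) are equally copyable, but only the one that defines `save` is recognized as a forward range.

```d
import std.range.primitives : isForwardRange, isInputRange;
import std.stdio : writeln;

// Copyable, but does not opt in: only an input range.
struct InputOnly
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

// Same state, plus `save`: recognized as a forward range.
struct WithSave
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
    @property WithSave save() { return this; }
}

static assert(isInputRange!InputOnly);
static assert(!isForwardRange!InputOnly);
static assert(isForwardRange!WithSave);

void main()
{
    writeln(isInputRange!InputOnly);   // true
    writeln(isForwardRange!InputOnly); // false
    writeln(isForwardRange!WithSave);  // true
}
```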

Why a function, and not just some enum? Because it should be something 
that has to be used, not just a "documenting" attribute if I recall 
correctly.

Keep in mind, UDAs were not a thing yet, and compile-time introspection 
was not as robust as it is now. I'm not even sure you could disable copying.

 
 We now have two ways of doing the same thing, which can cause confusion. 
 It would then be best for ranges to hide the copy constructor behind a private 
 modifier (or disable it altogether), and force other range wrappers to always 
 call .save, including foreach, since by not doing so we introduce a 
 difference in behavior between ref and value forward ranges (for foreach 
 use).

There would be a huge hole in this plan -- arrays. Arrays are the most 
common range anywhere, and if a forward range must not be copyable any 
way but using `save`, it would mean arrays are not forward ranges.

Not to mention that foreach on an array is a language construct, and 
does not involve the range interface.
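
A short sketch of the array case, with hypothetical helper names: slices satisfy `isForwardRange` through the free function `save` in `std.range.primitives`, which simply returns the slice, and a broken-off `foreach` leaves the slice untouched.

```d
import std.range.primitives : isForwardRange, save;
import std.stdio : writeln;

// Arrays are forward ranges even though they are freely copyable.
static assert(isForwardRange!(int[]));

// foreach over an array does not consume the slice it iterates.
size_t lengthAfterBreak()
{
    int[] arr = [1, 2, 3, 4];
    foreach (x; arr) { if (x == 2) break; }
    return arr.length;
}

// For arrays, save yields a slice over the very same elements.
bool saveIsSameSlice()
{
    int[] arr = [1, 2, 3];
    auto copy = arr.save;
    return copy is arr;
}

void main()
{
    writeln(lengthAfterBreak()); // 4
    writeln(saveIsSameSlice());  // true
}
```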

-Steve

Aug 25 2021
