www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - How is chunkBy supposed to behave on copy

reply Dukc <ajieskola gmail.com> writes:
```
void main()
{	import std;
	auto chunks = [14, 16, 18, 20, 22, 20, 18, 16]
	.chunkBy!((a, b) => a / 10 == b / 10);
	
	//[[14, 16, 18], [20, 22, 20], [18, 16]]
	chunks
	.map!(chunk => chunk.array)
	.array
	.writeln;
	
	//[]
	chunks
	.map!(chunk => chunk.array)
	.array
	.writeln;
}
```

This is how chunkBy[1] currently behaves when copying. It 
essentially behaves like a reference range: it will only save 
it's state when `save` is explicitly called, not otherwise, even 
if the chunked source range is a forward range with value 
semantics.

As most Phobos ranges preserve the value semantics of their 
source ranges, this is an odd exception, especially as the 
documentation says nothing about it.

So the question is, how should it behave? I have looked at the 
implementation, it should be trivial to have it to use value 
semantics. On the oher hand, someone may rely on the present 
behaviour.

[1] https://dlang.org/phobos/std_algorithm_iteration.html#.chunkBy
Mar 18
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Wednesday, 18 March 2020 at 14:57:41 UTC, Dukc wrote:
 So the question is, how should it behave? I have looked at the 
 implementation, it should be trivial to have it to use value 
 semantics. On the oher hand, someone may rely on the present 
 behaviour.

 [1] 
 https://dlang.org/phobos/std_algorithm_iteration.html#.chunkBy
As I understand it, the result of using a range after copying it is unspecified, and relying on any particular behavior in that case is a bug. So changing the behavior will only break code that is (in some sense) "already broken." Of course, given the above, the question of how it "should" behave is rendered moot. If you're copying a range, you must either use .save, or only use the copy and not the original from that point forward. In fact, if you want to catch as many errors as possible at compile time, the strictest approach would be to mark the range as non-copyable ( disable this(this)) and require the use of either .save or move to copy it.
Mar 18
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/18/20 10:57 AM, Dukc wrote:
 ```
 void main()
 {    import std;
      auto chunks = [14, 16, 18, 20, 22, 20, 18, 16]
      .chunkBy!((a, b) => a / 10 == b / 10);
 
      //[[14, 16, 18], [20, 22, 20], [18, 16]]
      chunks
      .map!(chunk => chunk.array)
      .array
      .writeln;
 
      //[]
      chunks
      .map!(chunk => chunk.array)
      .array
      .writeln;
 }
 ```
 
 This is how chunkBy[1] currently behaves when copying. It essentially 
 behaves like a reference range: it will only save it's state when `save` 
 is explicitly called, not otherwise, even if the chunked source range is 
 a forward range with value semantics.
 
 As most Phobos ranges preserve the value semantics of their source 
 ranges, this is an odd exception, especially as the documentation says 
 nothing about it.
 
 So the question is, how should it behave? I have looked at the 
 implementation, it should be trivial to have it to use value semantics. 
 On the oher hand, someone may rely on the present behaviour.
 
 [1] https://dlang.org/phobos/std_algorithm_iteration.html#.chunkBy
According to the range spec, save must be called if you wish to preserve state. In practice, this is seldom done, because most of the time, save just does `return this;`, so copying works just the same. The short answer is, use save. The long answer is, rangeBy uses reference counting internally to store the range data. I'm not sure why this is, as it seems possible to do without this mechanism. It seems there is some optimization surrounding pushing along the range without iterating it twice. But I don't know if that's a worthwhile optimization, especially if the allocation and reference counting are more expensive than the iteration. I would think you could get away with simply storing the range twice, and using .save to make sure you don't have problems. -Steve
Mar 18
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Mar 18, 2020 at 12:18:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
 The long answer is, rangeBy uses reference counting internally to
 store the range data. I'm not sure why this is,
Because Andrei demanded it when I submitted the PR. The main reason is to address the issue of what happens when you iterate the outer range while retaining a copy of an inner range. In my original implementation, advancing the inner range also advances the outer range, and vice versa, but only when it's an input range. When it's a forward range, it takes advantage of .save to decouple iteration on an inner range from iteration on the outer range. But that meant that iterating over the entire thing required traversing the original range twice: once in each inner range, and once on the outer range. Andrei didn't like this, so we hashed out several solutions and eventually settled on the reference counting one.
 as it seems possible to do without this mechanism. It seems there is
 some optimization surrounding pushing along the range without
 iterating it twice. But I don't know if that's a worthwhile
 optimization, especially if the allocation and reference counting are
 more expensive than the iteration.
That's up for debate, but yes, the whole point of the reference counting thing is to avoid traversing a forward range twice when iterating over subranges. It really depends on what the original range does, IMO. If it's generating values on-the-fly with an expensive calculation, you could be saving quite a bit of work. But for simple ranges, yeah, reference-counting inner ranges is kinda overkill, probably with a performance hit.
 I would think you could get away with simply storing the range twice,
 and using .save to make sure you don't have problems.
[...] This is what the original implementation did, but Andrei did not like it. T -- Старый друг лучше новых двух.
Mar 18
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/18/20 12:37 PM, H. S. Teoh wrote:
 On Wed, Mar 18, 2020 at 12:18:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 as it seems possible to do without this mechanism. It seems there is
 some optimization surrounding pushing along the range without
 iterating it twice. But I don't know if that's a worthwhile
 optimization, especially if the allocation and reference counting are
 more expensive than the iteration.
That's up for debate, but yes, the whole point of the reference counting thing is to avoid traversing a forward range twice when iterating over subranges. It really depends on what the original range does, IMO. If it's generating values on-the-fly with an expensive calculation, you could be saving quite a bit of work. But for simple ranges, yeah, reference-counting inner ranges is kinda overkill, probably with a performance hit.
Ugh, I think this is way overkill in most cases, and depends heavily on where the performance hit is. Not only that, but you are only seeing a benefit if you iterate a chunk completely (I think). For instance, an array has nearly zero cost to iterate elements, but the predicate for checking the chunking is probably way more expensive. The algorithm would be nicer if you simply iterated the array until you found the boundary, and then returned a slice as the element. Only one iteration through everything necessary (you pre-calculate the "next" range). In other cases, iterating the elements is going to be expensive, so you don't want to do that twice if possible. I think a good solution might be to provide different versions of the function or an enum designating what mechanism to prefer (e.g. IterationPolicy.MinimizePopfront or IterationPolicy.MinimizePredicateEval). And of course, the .save behavior sucks, as always. -Steve
Mar 18
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Mar 18, 2020 at 01:06:02PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 3/18/20 12:37 PM, H. S. Teoh wrote:
[...]
 That's up for debate, but yes, the whole point of the reference
 counting thing is to avoid traversing a forward range twice when
 iterating over subranges.
[...]
 Ugh, I think this is way overkill in most cases, and depends heavily
 on where the performance hit is.
 
 Not only that, but you are only seeing a benefit if you iterate a
 chunk completely (I think).
Yeah probably.
 For instance, an array has nearly zero cost to iterate elements, but
 the predicate for checking the chunking is probably way more
 expensive. The algorithm would be nicer if you simply iterated the
 array until you found the boundary, and then returned a slice as the
 element. Only one iteration through everything necessary (you
 pre-calculate the "next" range).
We *could* just detect hasSlicing!R and switch to an alternative implementation that doesn't need to jump through all of those refcounting hoops.
 In other cases, iterating the elements is going to be expensive, so
 you don't want to do that twice if possible.
 
 I think a good solution might be to provide different versions of the
 function or an enum designating what mechanism to prefer (e.g.
 IterationPolicy.MinimizePopfront or
 IterationPolicy.MinimizePredicateEval).
[...] It sounds like a good idea, but these days I'm wary of proliferating these implementation-detail-based parameters. They're a pain to write, a pain to use, and a pain to maintain, and once you have the option you're committed to supporting it even if one day you invent a better algorithm that completely changes the implementation. I much rather detect hasSlicing on the incoming range, and switching to a better implementation. T -- INTEL = Only half of "intelligence".
Mar 18
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/18/20 2:30 PM, H. S. Teoh wrote:

 It sounds like a good idea, but these days I'm wary of proliferating
 these implementation-detail-based parameters.  They're a pain to write,
 a pain to use, and a pain to maintain, and once you have the option
 you're committed to supporting it even if one day you invent a better
 algorithm that completely changes the implementation.
 
 I much rather detect hasSlicing on the incoming range, and switching to
 a better implementation.
The problem is that you don't know what the bottleneck is, only the user does. auto r = arr.map!(elem => someHorribleCalculation(elem)); static assert(hasSlicing(typeof(r))); // 2x horrible calculations The thing I would do is provide the "implementation detail" mechanisms, but default to something reasonable. It could even change based on whether slicing is available (the default, that is). -Steve
Mar 18
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Mar 18, 2020 at 02:55:35PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 3/18/20 2:30 PM, H. S. Teoh wrote:
[...]
 I much rather detect hasSlicing on the incoming range, and switching
 to a better implementation.
The problem is that you don't know what the bottleneck is, only the user does. auto r = arr.map!(elem => someHorribleCalculation(elem)); static assert(hasSlicing(typeof(r))); // 2x horrible calculations The thing I would do is provide the "implementation detail" mechanisms, but default to something reasonable. It could even change based on whether slicing is available (the default, that is).
[...] Fair enough. Still, I think we should totally detect hasSlicing, or at least built-in arrays, and default that to the slicing implementation instead. T -- Bomb technician: If I'm running, try to keep up.
Mar 18
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Mar 18, 2020 at 02:57:41PM +0000, Dukc via Digitalmars-d wrote:
[...]
 This is how chunkBy[1] currently behaves when copying. It essentially
 behaves like a reference range: it will only save it's state when
 `save` is explicitly called, not otherwise, even if the chunked source
 range is a forward range with value semantics.
The current range API design, which Andrei himself admitted was not the best design in retrospect, does not specify behaviour on copying a range. IOW, it's the application-level equivalent of undefined behaviour. Generally, this is not a problem because usually you're working with your own range types which you already know the semantics of. But in generic code, assumptions of this sort are a no-no, precisely because of breakages of this sort when different ranges behave differently on copy. tl;dr: never copy a range directly, always use .save. Never assume a range remains unchanged after iterating a copy; always assume the worst and use .save whenever you wish the original range to remain unchanged afterwards.
 As most Phobos ranges preserve the value semantics of their source
 ranges,
[...] This is a dangerous assumption. My personal advice is, if you expect a range to retain its contents after iteration, call .save explicitly. Don't assume anything not explicitly required by the range API. T -- Verbing weirds language. -- Calvin (& Hobbes)
Mar 18
parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 18 March 2020 at 16:29:53 UTC, H. S. Teoh wrote:
 This is a dangerous assumption.  My personal advice is, if you 
 expect a range to retain its contents after iteration, call 
 .save explicitly. Don't assume anything not explicitly required 
 by the range API.
This means that even if the api does thing X right now, I should not assume it won't change unless it's documented, right?
Mar 19
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/19/20 10:00 AM, Dukc wrote:
 On Wednesday, 18 March 2020 at 16:29:53 UTC, H. S. Teoh wrote:
 This is a dangerous assumption.  My personal advice is, if you expect 
 a range to retain its contents after iteration, call .save explicitly. 
 Don't assume anything not explicitly required by the range API.
This means that even if the api does thing X right now, I should not assume it won't change unless it's documented, right?
If you want to copy a forward range use save. Anything else is possible to break in an implementation detail later. -Steve
Mar 19
parent reply Dukc <ajieskola gmail.com> writes:
On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer 
wrote:
 If you want to copy a forward range use save. Anything else is 
 possible to break in an implementation detail later.

 -Steve
Ouch - my code has to be totally broken. And I don't think I'm the only one.
Mar 19
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/19/20 10:28 AM, Dukc wrote:
 On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:
 If you want to copy a forward range use save. Anything else is 
 possible to break in an implementation detail later.
Ouch - my code has to be totally broken. And I don't think I'm the only one.
You are not the only one. Almost all range functions are tested with and use arrays, which do exactly the same for .save or copying. The reason why `save` is such a bad solution is because there is generally no penalty for ignoring it (until there is). The only way to "fix" this IMO is to migrate to a system where standard (non-forward) input ranges are not copyable, and deprecate save. It likely will never happen. -Steve
Mar 19
next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Thursday, 19 March 2020 at 14:46:39 UTC, Steven Schveighoffer 
wrote:
 The only way to "fix" this IMO is to migrate to a system where 
 standard (non-forward) input ranges are not copyable, and 
 deprecate save. It likely will never happen.

 -Steve
Perhaps there is another way. Individual ranges can give guarantees about their own copy behaviour if they want. We could document the current copy behaviour of existing Phobos ranges and say that relying on it is thereafter fine. But then the question with `chunkBy` is, should it take the opportunity to start to behave like most other ranges in this regard?
Mar 19
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Mar 19, 2020 at 03:35:28PM +0000, Dukc via Digitalmars-d wrote:
 On Thursday, 19 March 2020 at 14:46:39 UTC, Steven Schveighoffer wrote:
 The only way to "fix" this IMO is to migrate to a system where
 standard (non-forward) input ranges are not copyable, and deprecate
 save. It likely will never happen.
[...]
 Perhaps there is another way. Individual ranges can give guarantees
 about their own copy behaviour if they want. We could document the
 current copy behaviour of existing Phobos ranges and say that relying
 on it is thereafter fine.
The problem is, you *cannot* give any guarantees about copy behaviour, because it depends on the behaviour of the incoming range. For example, if you pass the output of .byChunk to another range algorithm, that second algorithm cannot guarantee copy behaviour anymore. In fact, all you have to do is to wrap a forward range in an InputRangeObject (let's say you need to alternate between two different range types (but with compatible elements) at runtime, then you'll need to do this), and now you have a forward range with by-reference semantics that requires the use of .save in order to retain its current position. This is (likely) part of the reason why the range API does not specify the behaviour of a range once it's iterated over without using .save: it depends on implementation details.
 But then the question with `chunkBy` is, should it take the
 opportunity to start to behave like most other ranges in this regard?
I don't see why this should be a compelling reason to change chunkBy -- since copy behaviour is not specified by the range API (for IMO good reasons -- it's implementation-specific). But I do agree that the implementation of chunkBy could be improved for other reasons, like not using the expensive ref-counting implementation when the underlying range is, say, a native array that you could just slice over. T -- MAS = Mana Ada Sistem?
Mar 19
parent reply Dukc <ajieskola gmail.com> writes:
On Thursday, 19 March 2020 at 16:01:39 UTC, H. S. Teoh wrote:
 The problem is, you *cannot* give any guarantees about copy 
 behaviour, because it depends on the behaviour of the incoming 
 range. For example, if you pass the output of .byChunk to 
 another range algorithm, that second algorithm cannot guarantee 
 copy behaviour anymore.

 In fact, all you have to do is to wrap a forward range in an 
 InputRangeObject (let's say you need to alternate between two 
 different range types (but with compatible elements) at 
 runtime, then you'll need to do this), and now you have a 
 forward range with by-reference semantics that requires the use 
 of .save in order to retain its current position.
The documentation would say that the copy behaviour of the range is the same as the source range has. A guarantee to depend on the source range is still a guarantee, and definitely better than the present situation.
Mar 19
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Mar 19, 2020 at 05:15:51PM +0000, Dukc via Digitalmars-d wrote:
 On Thursday, 19 March 2020 at 16:01:39 UTC, H. S. Teoh wrote:
 
 The problem is, you *cannot* give any guarantees about copy
 behaviour, because it depends on the behaviour of the incoming
 range. For example, if you pass the output of .byChunk to another
 range algorithm, that second algorithm cannot guarantee copy
 behaviour anymore.
 
 In fact, all you have to do is to wrap a forward range in an
 InputRangeObject (let's say you need to alternate between two
 different range types (but with compatible elements) at runtime,
 then you'll need to do this), and now you have a forward range with
 by-reference semantics that requires the use of .save in order to
 retain its current position.
The documentation would say that the copy behaviour of the range is the same as the source range has. A guarantee to depend on the source range is still a guarantee, and definitely better than the present situation.
A guarantee about copy behaviour potentially binds the Phobos maintainers to the specifics of the current implementation of the algorithm. I'm not sure that's what we want, since it may limit future improvements. The thing about the range API is that it's like a contract between user code and Phobos; the way you use it should be according to the contract, anything not stated by the contract should not be relied upon even if currently it happens to work. That's the principle of encapsulation. Behaviours that arise from the specifics of how something is currently implemented are implementation details that shouldn't leak into the calling code, including assumptions about copy behaviour. Arguably, relying on a specific copy behaviour is an instance of leaky abstraction, since outside code shouldn't know nor depend upon Phobos internal implementation details. Part of the power of encapsulation is to leave certain non-essential things unspecified, so that implementors can have greater freedom in how they implement the API. Something that isn't directly related to the particular function (e.g., the primary function of byChunk) shouldn't be a part of the API, IMO, it should be left unspecified with the understanding that relying on the behaviour of something unspecified runs the risk of future breakage. The specific behaviour ought to be inside the "black box" of encapsulation, not relied upon by user code, much less specified in library docs. T -- "Maybe" is a strange word. When mom or dad says it it means "yes", but when my big brothers say it it means "no"! -- PJ jr.
Mar 19
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Mar 19, 2020 at 10:46:39AM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 3/19/20 10:28 AM, Dukc wrote:
 On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:
 
 If you want to copy a forward range use save. Anything else is
 possible to break in an implementation detail later.
 
Ouch - my code has to be totally broken. And I don't think I'm the only one.
You are not the only one. Almost all range functions are tested with and use arrays, which do exactly the same for .save or copying. The reason why `save` is such a bad solution is because there is generally no penalty for ignoring it (until there is).
Yeah, Andrei has said that in retrospect, .save was a design mistake, he should have made the input range vs. forward range distinction keyed on copyability or ref/non-ref instead. But to be fair, back in the day when the range API was first designed, D didn't have the necessary language facilities to be able to handle a better solution, unlike now when we have disable, copy ctors, and a bunch of other language features that make such a solution more feasible.
 The only way to "fix" this IMO is to migrate to a system where
 standard (non-forward) input ranges are not copyable, and deprecate
 save. It likely will never happen.
[...] It will happen if std.v2 happens... ;-) T -- "How are you doing?" "Doing what?"
Mar 19
prev sibling next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, March 18, 2020 10:29:53 AM MDT H. S. Teoh via Digitalmars-d 
wrote:
 On Wed, Mar 18, 2020 at 02:57:41PM +0000, Dukc via Digitalmars-d wrote:
 [...]

 This is how chunkBy[1] currently behaves when copying. It essentially
 behaves like a reference range: it will only save it's state when
 `save` is explicitly called, not otherwise, even if the chunked source
 range is a forward range with value semantics.
The current range API design, which Andrei himself admitted was not the best design in retrospect, does not specify behaviour on copying a range. IOW, it's the application-level equivalent of undefined behaviour. Generally, this is not a problem because usually you're working with your own range types which you already know the semantics of. But in generic code, assumptions of this sort are a no-no, precisely because of breakages of this sort when different ranges behave differently on copy. tl;dr: never copy a range directly, always use .save. Never assume a range remains unchanged after iterating a copy; always assume the worst and use .save whenever you wish the original range to remain unchanged afterwards.
Ranges get copied all the time when passing them around. You're not going to avoid it (even using them with foreach copies them). The key thing is not that a range shouldn't be copied without using save but that if a range is ever copied, then the original shouldn't be used again, and from that point on, only the copy should be used. So, if you pass a range to foreach, a function, or do anything else which would copy it, then don't use the original again, and if you want to use it again, then instead of copying it directly, call save to get an independent copy. Generic code should _never_ rely on the copying semantics of a range, and even in non-generic code, depending on the semantics of copying a range whose implementation you don't fully control is just begging for bugs. - Jonathan M Davis
Mar 20
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via Digitalmars-d
wrote:
 On Wednesday, March 18, 2020 10:29:53 AM MDT H. S. Teoh via Digitalmars-d 
 wrote:
[...]
 tl;dr: never copy a range directly, always use .save.  Never assume
 a range remains unchanged after iterating a copy; always assume the
 worst and use .save whenever you wish the original range to remain
 unchanged afterwards.
Ranges get copied all the time when passing them around. You're not going to avoid it (even using them with foreach copies them). The key thing is not that a range shouldn't be copied without using save but that if a range is ever copied, then the original shouldn't be used again, and from that point on, only the copy should be used.
Yes, that's right. Actually, for by-value ranges the act of passing them as an argument to a range function in the first place already copies them. The catch is really that once this happens, the caller or whoever retains the original copy should no longer assume that the range remains in the same place as before. For some ranges this is true, but for other ranges this assumption is invalid, and will lead to incorrect results.
 So, if you pass a range to foreach, a function, or do anything else
 which would copy it, then don't use the original again, and if you
 want to use it again, then instead of copying it directly, call save
 to get an independent copy. Generic code should _never_ rely on the
 copying semantics of a range, and even in non-generic code, depending
 on the semantics of copying a range whose implementation you don't
 fully control is just begging for bugs.
[...] +1. T -- Knowledge is that area of ignorance that we arrange and classify. -- Ambrose Bierce
Mar 20
parent reply Ben Jones <fake fake.fake> writes:
On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:
 On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via 
 Digitalmars-d wrote:
 [...]
[...]
 [...]
Yes, that's right. Actually, for by-value ranges the act of passing them as an argument to a range function in the first place already copies them. The catch is really that once this happens, the caller or whoever retains the original copy should no longer assume that the range remains in the same place as before. For some ranges this is true, but for other ranges this assumption is invalid, and will lead to incorrect results.
 [...]
[...] +1. T
So range "copy" is really what a C++ person would call range "move" ? It might be a copy, or it might invalidate the original, depending on the type?
Mar 20
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Mar 20, 2020 at 10:15:18PM +0000, Ben Jones via Digitalmars-d wrote:
 On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:
[...]
 Yes, that's right.  Actually, for by-value ranges the act of passing
 them as an argument to a range function in the first place already
 copies them.  The catch is really that once this happens, the caller
 or whoever retains the original copy should no longer assume that
 the range remains in the same place as before.  For some ranges this
 is true, but for other ranges this assumption is invalid, and will
 lead to incorrect results.
[...]
 So range "copy" is really what a C++ person would call range "move" ?
 It might be a copy, or it might invalidate the original, depending on
 the type?
The way it's currently implemented, yes, pretty much. Unless you're in control of the exact range type, you should always assume the worst and not rely on the state of the original range after passing it to a range function. If you need to retain the original state, use .save. T -- You are only young once, but you can stay immature indefinitely. -- azephrahel
Mar 20
prev sibling next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, March 20, 2020 4:15:18 PM MDT Ben Jones via Digitalmars-d wrote:
 On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:
 On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via

 Digitalmars-d wrote:
 [...]
[...]
 [...]
Yes, that's right. Actually, for by-value ranges the act of passing them as an argument to a range function in the first place already copies them. The catch is really that once this happens, the caller or whoever retains the original copy should no longer assume that the range remains in the same place as before. For some ranges this is true, but for other ranges this assumption is invalid, and will lead to incorrect results.
 [...]
[...] +1. T
So range "copy" is really what a C++ person would call range "move" ? It might be a copy, or it might invalidate the original, depending on the type?
You more or less have to view it that way, though no move is actually taking place. The problem is that the semantics of what happens when a range is copied are completely implementation-dependent, making it so that you cannot depend on theem and thus basically have to consider the range to be in an invalid state once it's been copied even if it's not technically in an invalid state. If a range has by-value copying semantics, then when you copy it, you get the same as save. If it's a full-on reference type, then mutating the copy mutates the original. And worst of all, if you have a pseudo-reference type, then you can end up in a state where mutating the copy mutates only part of the original, effectively putting it in an invalid state. But even if you somehow never had to worry about pseudo-reference types, the mere fact that some ranges have by-value copying semantics whereas others are full-on reference types makes it so that you can't rely on what happens to a range once it's been copied. And if code is not being at minimum tested with both value-type ranges and reference-type ranges, the odds are _very_ high that it won't handle ranges that aren't value types correctly. Really, forward ranges should all have by-value copying (thus requiring that classes be wrapped in structs if they're going to be forward ranges), and save should be abolished, but that requires a major redesign and likely would only happen if we did some sort of Phobos v2 (as has occasionally been discussed). And exactly what should happen with basic input ranges is not clear, because while ideally, you'd just require that they have full-on reference semantics, that tends to mean that you're forcing them to be allocated on the heap, which isn't really the sort of thing that we want to force if we can avoid it. So, while it's clear what we should do with some aspects of the range API if we have the opportunity to redesign it, there are still issues that would have to be sorted out. Regardless, as things stand, you can't rely on the semantics of copying a range and basically have to consider that a range has become invalid once it's been copied. Unfortunately, it's not something that seems to be well understood and is often handled incorrectly in code. I've been pointing it out for years (including in my talk at dconf 2015), but we haven't done a good enough job in general messaging how the range API works, and this is one of the details that seems to be very easily missed. - Jonathan M Davis
Mar 20
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via Digitalmars-d
wrote:
[...]
 And exactly what should happen with basic input ranges is not clear,
 because while ideally, you'd just require that they have full-on
 reference semantics, that tends to mean that you're forcing them to be
 allocated on the heap, which isn't really the sort of thing that we
 want to force if we can avoid it.
[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps? T -- Those who've learned LaTeX swear by it. Those who are learning LaTeX swear at it. -- Pete Bleackley
Mar 22
parent reply Paul Backus <snarwin gmail.com> writes:
On Sunday, 22 March 2020 at 15:43:45 UTC, H. S. Teoh wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via 
 Digitalmars-d wrote: [...]
 And exactly what should happen with basic input ranges is not 
 clear, because while ideally, you'd just require that they 
 have full-on reference semantics, that tends to mean that 
 you're forcing them to be allocated on the heap, which isn't 
 really the sort of thing that we want to force if we can avoid 
 it.
[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps? T
Technically, this is already legal--isInputRange accepts non-copyable structs. Of course, if you accept non-copyable ranges as legitimate, someone has to go and fix all of std.algorithm and std.range to handle them correctly, which is a non-trivial amount of work.
Mar 22
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Mar 22, 2020 at 04:23:27PM +0000, Paul Backus via Digitalmars-d wrote:
 On Sunday, 22 March 2020 at 15:43:45 UTC, H. S. Teoh wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via
 Digitalmars-d wrote: [...]
 And exactly what should happen with basic input ranges is not
 clear, because while ideally, you'd just require that they have
 full-on reference semantics, that tends to mean that you're
 forcing them to be allocated on the heap, which isn't really the
 sort of thing that we want to force if we can avoid it.
[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps?
[...]
 Technically, this is already legal--isInputRange accepts non-copyable
 structs.
 
 Of course, if you accept non-copyable ranges as legitimate, someone
 has to go and fix all of std.algorithm and std.range to handle them
 correctly, which is a non-trivial amount of work.
I don't think it's wise to do this with the existing codebase. A change as drastic as removing .save will likely require rewriting not just large chunks of Phobos, but existing user code as well. This is probably best left to Phobos v2, if that ever happens. T -- An elephant: A mouse built to government specifications. -- Robert Heinlein
Mar 23
parent Paul Backus <snarwin gmail.com> writes:
On Monday, 23 March 2020 at 17:37:16 UTC, H. S. Teoh wrote:
 On Sun, Mar 22, 2020 at 04:23:27PM +0000, Paul Backus via 
 Digitalmars-d wrote:
 Technically, this is already legal--isInputRange accepts 
 non-copyable structs.
 
 Of course, if you accept non-copyable ranges as legitimate, 
 someone has to go and fix all of std.algorithm and std.range 
 to handle them correctly, which is a non-trivial amount of 
 work.
I don't think it's wise to do this with the existing codebase. A change as drastic as removing .save will likely require rewriting not just large chunks of Phobos, but existing user code as well. This is probably best left to Phobos v2, if that ever happens. T
Agreed. I was only talking about making pure input ranges non-copyable (i.e., requiring all code that copies a range to use either .save or move), which is not (strictly speaking) a breaking change to the existing range interface.
Mar 23
prev sibling next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, March 22, 2020 9:43:45 AM MDT H. S. Teoh via Digitalmars-d wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via
 Digitalmars-d wrote: [...]

 And exactly what should happen with basic input ranges is not clear,
 because while ideally, you'd just require that they have full-on
 reference semantics, that tends to mean that you're forcing them to be
 allocated on the heap, which isn't really the sort of thing that we
 want to force if we can avoid it.
[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps?
A range that can't be copied is useless. It won't even work with foreach, because anything you iterate over with foreach is copied when it's passed to foreach. And of course, idiomatic range code passes ranges all over the place, resulting in a lot of copying. And to wrap a range in another range (which is required for most range-based algorithms), you have to copy it. IMHO, it would make far more sense to just use opApply than to try to make a range non-copyable. - Jonathan M Davis
Mar 23
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/23/20 2:15 PM, Jonathan M Davis wrote:
 On Sunday, March 22, 2020 9:43:45 AM MDT H. S. Teoh via Digitalmars-d wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via
 Digitalmars-d wrote: [...]

 And exactly what should happen with basic input ranges is not clear,
 because while ideally, you'd just require that they have full-on
 reference semantics, that tends to mean that you're forcing them to be
 allocated on the heap, which isn't really the sort of thing that we
 want to force if we can avoid it.
[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps?
A range that can't be copied is useless. It won't even work with foreach, because anything you iterate over with foreach is copied when it's passed to foreach.
foreach(x; uncopyableRange.move) foreach(x; ucr.refCounted) foreach(x; returnsRvalue()) Not only that, but once you define "non-copyable" as a thing for ranges, then you can handle this without having to always tack-on ".move" or whatnot. It would be possible. It wouldn't be as "pretty". But most ranges are at least forward ranges, so it wouldn't be a terrible problem. -Steve
Mar 23
prev sibling parent Pezbi Mahn <pezbiworkemail gmail.com> writes:
That=E2=80=99s dope homie

On Mon, Mar 23, 2020 at 2:15 PM Jonathan M Davis via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 On Sunday, March 22, 2020 9:43:45 AM MDT H. S. Teoh via Digitalmars-d
 wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via
 Digitalmars-d wrote: [...]

 And exactly what should happen with basic input ranges is not clear,
 because while ideally, you'd just require that they have full-on
 reference semantics, that tends to mean that you're forcing them to b=
e
 allocated on the heap, which isn't really the sort of thing that we
 want to force if we can avoid it.
[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps?
A range that can't be copied is useless. It won't even work with foreach, because anything you iterate over with foreach is copied when it's passed to foreach. And of course, idiomatic range code passes ranges all over the place, resulting in a lot of copying. And to wrap a range in another rang=
e
 (which is required for most range-based algorithms), you have to copy it.
 IMHO, it would make far more sense to just use opApply than to try to mak=
e
 a
 range non-copyable.

 - Jonathan M Davis
Mar 23