digitalmars.D - How is chunkBy supposed to behave on copy

Dukc (32/32) Mar 18 2020 ```

Paul Backus (12/18) Mar 18 2020 As I understand it, the result of using a range after copying it
Steven Schveighoffer (15/49) Mar 18 2020 According to the range spec, save must be called if you wish to preserve...

H. S. Teoh (26/35) Mar 18 2020 Because Andrei demanded it when I submitted the PR. The main reason is

Steven Schveighoffer (17/31) Mar 18 2020 Ugh, I think this is way overkill in most cases, and depends heavily on

H. S. Teoh (18/40) Mar 18 2020 [...]

Steven Schveighoffer (9/17) Mar 18 2020 The problem is that you don't know what the bottleneck is, only the user...

H. S. Teoh (9/23) Mar 18 2020 [...]

H. S. Teoh (21/27) Mar 18 2020 The current range API design, which Andrei himself admitted was not the

Dukc (3/7) Mar 19 2020 This means that even if the api does thing X right now, I should

Steven Schveighoffer (4/11) Mar 19 2020 If you want to copy a forward range use save. Anything else is possible

Dukc (4/7) Mar 19 2020 Ouch - my code has to be totally broken. And I don't think I'm

Steven Schveighoffer (9/17) Mar 19 2020 You are not the only one. Almost all range functions are tested with and...

Dukc (9/13) Mar 19 2020 Perhaps there is another way. Individual ranges can give

H. S. Teoh (24/34) Mar 19 2020 The problem is, you *cannot* give any guarantees about copy behaviour,

Dukc (5/16) Mar 19 2020 The documentation would say that the copy behaviour of the range

H. S. Teoh (26/45) Mar 19 2020 A guarantee about copy behaviour potentially binds the Phobos

H. S. Teoh (13/30) Mar 19 2020 Yeah, Andrei has said that in retrospect, .save was a design mistake, he

Jonathan M Davis (14/32) Mar 20 2020 Ranges get copied all the time when passing them around. You're not goin...
H. S. Teoh (14/33) Mar 20 2020 Yes, that's right. Actually, for by-value ranges the act of passing

Ben Jones (4/20) Mar 20 2020 So range "copy" is really what a C++ person would call range

H. S. Teoh (10/21) Mar 20 2020 [...]
Jonathan M Davis (37/66) Mar 20 2020 You more or less have to view it that way, though no move is actually ta...
H. S. Teoh (8/13) Mar 22 2020 [...]

Paul Backus (6/18) Mar 22 2020 Technically, this is already legal--isInputRange accepts

H. S. Teoh (9/27) Mar 23 2020 I don't think it's wise to do this with the existing codebase. A change

Paul Backus (5/20) Mar 23 2020 Agreed. I was only talking about making pure input ranges

Jonathan M Davis (9/19) Mar 23 2020 A range that can't be copied is useless. It won't even work with foreach...

Steven Schveighoffer (10/28) Mar 23 2020 foreach(x; uncopyableRange.move)

Pezbi Mahn (6/31) Mar 23 2020 That=E2=80=99s dope homie

Dukc <ajieskola gmail.com> writes:

```
void main()
{	import std;
	auto chunks = [14, 16, 18, 20, 22, 20, 18, 16]
	.chunkBy!((a, b) => a / 10 == b / 10);
	
	//[[14, 16, 18], [20, 22, 20], [18, 16]]
	chunks
	.map!(chunk => chunk.array)
	.array
	.writeln;
	
	//[]
	chunks
	.map!(chunk => chunk.array)
	.array
	.writeln;
}
```

This is how chunkBy[1] currently behaves when copying. It 
essentially behaves like a reference range: it will only save 
it's state when `save` is explicitly called, not otherwise, even 
if the chunked source range is a forward range with value 
semantics.

As most Phobos ranges preserve the value semantics of their 
source ranges, this is an odd exception, especially as the 
documentation says nothing about it.

So the question is, how should it behave? I have looked at the 
implementation, it should be trivial to have it to use value 
semantics. On the oher hand, someone may rely on the present 
behaviour.

[1] https://dlang.org/phobos/std_algorithm_iteration.html#.chunkBy

Mar 18 2020

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 18 March 2020 at 14:57:41 UTC, Dukc wrote:
 So the question is, how should it behave? I have looked at the 
 implementation, it should be trivial to have it to use value 
 semantics. On the oher hand, someone may rely on the present 
 behaviour.

 [1] 
 https://dlang.org/phobos/std_algorithm_iteration.html#.chunkBy

As I understand it, the result of using a range after copying it 
is unspecified, and relying on any particular behavior in that 
case is a bug. So changing the behavior will only break code that 
is (in some sense) "already broken."

Of course, given the above, the question of how it "should" 
behave is rendered moot. If you're copying a range, you must 
either use .save, or only use the copy and not the original from 
that point forward. In fact, if you want to catch as many errors 
as possible at compile time, the strictest approach would be to 
mark the range as non-copyable ( disable this(this)) and require 
the use of either .save or move to copy it.

Mar 18 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 3/18/20 10:57 AM, Dukc wrote:
 ```
 void main()
 {    import std;
      auto chunks = [14, 16, 18, 20, 22, 20, 18, 16]
      .chunkBy!((a, b) => a / 10 == b / 10);
 
      //[[14, 16, 18], [20, 22, 20], [18, 16]]
      chunks
      .map!(chunk => chunk.array)
      .array
      .writeln;
 
      //[]
      chunks
      .map!(chunk => chunk.array)
      .array
      .writeln;
 }
 ```
 
 This is how chunkBy[1] currently behaves when copying. It essentially 
 behaves like a reference range: it will only save it's state when `save` 
 is explicitly called, not otherwise, even if the chunked source range is 
 a forward range with value semantics.
 
 As most Phobos ranges preserve the value semantics of their source 
 ranges, this is an odd exception, especially as the documentation says 
 nothing about it.
 
 So the question is, how should it behave? I have looked at the 
 implementation, it should be trivial to have it to use value semantics. 
 On the oher hand, someone may rely on the present behaviour.
 
 [1] https://dlang.org/phobos/std_algorithm_iteration.html#.chunkBy

According to the range spec, save must be called if you wish to preserve 
state.

In practice, this is seldom done, because most of the time, save just 
does `return this;`, so copying works just the same.

The short answer is, use save.

The long answer is, rangeBy uses reference counting internally to store 
the range data. I'm not sure why this is, as it seems possible to do 
without this mechanism. It seems there is some optimization surrounding 
pushing along the range without iterating it twice. But I don't know if 
that's a worthwhile optimization, especially if the allocation and 
reference counting are more expensive than the iteration.

I would think you could get away with simply storing the range twice, 
and using .save to make sure you don't have problems.

-Steve

Mar 18 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Mar 18, 2020 at 12:18:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
 The long answer is, rangeBy uses reference counting internally to
 store the range data. I'm not sure why this is,

Because Andrei demanded it when I submitted the PR. The main reason is
to address the issue of what happens when you iterate the outer range
while retaining a copy of an inner range. In my original implementation,
advancing the inner range also advances the outer range, and vice versa,
but only when it's an input range. When it's a forward range, it takes
advantage of .save to decouple iteration on an inner range from
iteration on the outer range.  But that meant that iterating over the
entire thing required traversing the original range twice: once in each
inner range, and once on the outer range.  Andrei didn't like this, so
we hashed out several solutions and eventually settled on the reference
counting one.


 as it seems possible to do without this mechanism. It seems there is
 some optimization surrounding pushing along the range without
 iterating it twice. But I don't know if that's a worthwhile
 optimization, especially if the allocation and reference counting are
 more expensive than the iteration.

That's up for debate, but yes, the whole point of the reference counting
thing is to avoid traversing a forward range twice when iterating over
subranges.  It really depends on what the original range does, IMO.  If
it's generating values on-the-fly with an expensive calculation, you
could be saving quite a bit of work. But for simple ranges, yeah,
reference-counting inner ranges is kinda overkill, probably with a
performance hit.


 I would think you could get away with simply storing the range twice,
 and using .save to make sure you don't have problems.

[...]

This is what the original implementation did, but Andrei did not like
it.


T

-- 
Старый друг лучше новых двух.

Mar 18 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 3/18/20 12:37 PM, H. S. Teoh wrote:
 On Wed, Mar 18, 2020 at 12:18:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:

 as it seems possible to do without this mechanism. It seems there is
 some optimization surrounding pushing along the range without
 iterating it twice. But I don't know if that's a worthwhile
 optimization, especially if the allocation and reference counting are
 more expensive than the iteration.

 
 That's up for debate, but yes, the whole point of the reference counting
 thing is to avoid traversing a forward range twice when iterating over
 subranges.  It really depends on what the original range does, IMO.  If
 it's generating values on-the-fly with an expensive calculation, you
 could be saving quite a bit of work. But for simple ranges, yeah,
 reference-counting inner ranges is kinda overkill, probably with a
 performance hit.

Ugh, I think this is way overkill in most cases, and depends heavily on 
where the performance hit is.

Not only that, but you are only seeing a benefit if you iterate a chunk 
completely (I think).

For instance, an array has nearly zero cost to iterate elements, but the 
predicate for checking the chunking is probably way more expensive. The 
algorithm would be nicer if you simply iterated the array until you 
found the boundary, and then returned a slice as the element. Only one 
iteration through everything necessary (you pre-calculate the "next" range).

In other cases, iterating the elements is going to be expensive, so you 
don't want to do that twice if possible.

I think a good solution might be to provide different versions of the 
function or an enum designating what mechanism to prefer (e.g. 
IterationPolicy.MinimizePopfront or IterationPolicy.MinimizePredicateEval).

And of course, the .save behavior sucks, as always.

-Steve

Mar 18 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Mar 18, 2020 at 01:06:02PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 3/18/20 12:37 PM, H. S. Teoh wrote:

[...]
 That's up for debate, but yes, the whole point of the reference
 counting thing is to avoid traversing a forward range twice when
 iterating over subranges.


[...]
 Ugh, I think this is way overkill in most cases, and depends heavily
 on where the performance hit is.
 
 Not only that, but you are only seeing a benefit if you iterate a
 chunk completely (I think).

Yeah probably.


 For instance, an array has nearly zero cost to iterate elements, but
 the predicate for checking the chunking is probably way more
 expensive. The algorithm would be nicer if you simply iterated the
 array until you found the boundary, and then returned a slice as the
 element. Only one iteration through everything necessary (you
 pre-calculate the "next" range).

We *could* just detect hasSlicing!R and switch to an alternative
implementation that doesn't need to jump through all of those
refcounting hoops.


 In other cases, iterating the elements is going to be expensive, so
 you don't want to do that twice if possible.
 
 I think a good solution might be to provide different versions of the
 function or an enum designating what mechanism to prefer (e.g.
 IterationPolicy.MinimizePopfront or
 IterationPolicy.MinimizePredicateEval).

[...]

It sounds like a good idea, but these days I'm wary of proliferating
these implementation-detail-based parameters.  They're a pain to write,
a pain to use, and a pain to maintain, and once you have the option
you're committed to supporting it even if one day you invent a better
algorithm that completely changes the implementation.

I much rather detect hasSlicing on the incoming range, and switching to
a better implementation.


T

-- 
INTEL = Only half of "intelligence".

Mar 18 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 3/18/20 2:30 PM, H. S. Teoh wrote:

 It sounds like a good idea, but these days I'm wary of proliferating
 these implementation-detail-based parameters.  They're a pain to write,
 a pain to use, and a pain to maintain, and once you have the option
 you're committed to supporting it even if one day you invent a better
 algorithm that completely changes the implementation.
 
 I much rather detect hasSlicing on the incoming range, and switching to
 a better implementation.

The problem is that you don't know what the bottleneck is, only the user 
does.

auto r = arr.map!(elem => someHorribleCalculation(elem));

static assert(hasSlicing(typeof(r))); // 2x horrible calculations

The thing I would do is provide the "implementation detail" mechanisms, 
but default to something reasonable. It could even change based on 
whether slicing is available (the default, that is).

-Steve

Mar 18 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Mar 18, 2020 at 02:55:35PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 3/18/20 2:30 PM, H. S. Teoh wrote:

[...]
 I much rather detect hasSlicing on the incoming range, and switching
 to a better implementation.

 
 The problem is that you don't know what the bottleneck is, only the
 user does.
 
 auto r = arr.map!(elem => someHorribleCalculation(elem));
 
 static assert(hasSlicing(typeof(r))); // 2x horrible calculations
 
 The thing I would do is provide the "implementation detail"
 mechanisms, but default to something reasonable. It could even change
 based on whether slicing is available (the default, that is).

[...]

Fair enough.

Still, I think we should totally detect hasSlicing, or at least built-in
arrays, and default that to the slicing implementation instead.


T

-- 
Bomb technician: If I'm running, try to keep up.

Mar 18 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Mar 18, 2020 at 02:57:41PM +0000, Dukc via Digitalmars-d wrote:
[...]
 This is how chunkBy[1] currently behaves when copying. It essentially
 behaves like a reference range: it will only save it's state when
 `save` is explicitly called, not otherwise, even if the chunked source
 range is a forward range with value semantics.

The current range API design, which Andrei himself admitted was not the
best design in retrospect, does not specify behaviour on copying a
range.  IOW, it's the application-level equivalent of undefined
behaviour.  Generally, this is not a problem because usually you're
working with your own range types which you already know the semantics
of.  But in generic code, assumptions of this sort are a no-no,
precisely because of breakages of this sort when different ranges behave
differently on copy.

tl;dr: never copy a range directly, always use .save.  Never assume a
range remains unchanged after iterating a copy; always assume the worst
and use .save whenever you wish the original range to remain unchanged
afterwards.


 As most Phobos ranges preserve the value semantics of their source
 ranges,

[...]

This is a dangerous assumption.  My personal advice is, if you expect a
range to retain its contents after iteration, call .save explicitly.
Don't assume anything not explicitly required by the range API.


T

-- 
Verbing weirds language. -- Calvin (& Hobbes)

Mar 18 2020

Dukc <ajieskola gmail.com> writes:

On Wednesday, 18 March 2020 at 16:29:53 UTC, H. S. Teoh wrote:
 This is a dangerous assumption.  My personal advice is, if you 
 expect a range to retain its contents after iteration, call 
 .save explicitly. Don't assume anything not explicitly required 
 by the range API.

This means that even if the api does thing X right now, I should 
not assume it won't change unless it's documented, right?

Mar 19 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 3/19/20 10:00 AM, Dukc wrote:
 On Wednesday, 18 March 2020 at 16:29:53 UTC, H. S. Teoh wrote:
 This is a dangerous assumption.  My personal advice is, if you expect 
 a range to retain its contents after iteration, call .save explicitly. 
 Don't assume anything not explicitly required by the range API.

 
 This means that even if the api does thing X right now, I should not 
 assume it won't change unless it's documented, right?

If you want to copy a forward range use save. Anything else is possible 
to break in an implementation detail later.

-Steve

Mar 19 2020

Dukc <ajieskola gmail.com> writes:

On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer 
wrote:
 If you want to copy a forward range use save. Anything else is 
 possible to break in an implementation detail later.

 -Steve

Ouch - my code has to be totally broken. And I don't think I'm 
the only one.

Mar 19 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 3/19/20 10:28 AM, Dukc wrote:
 On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:
 If you want to copy a forward range use save. Anything else is 
 possible to break in an implementation detail later.

 
 Ouch - my code has to be totally broken. And I don't think I'm the only 
 one.

You are not the only one. Almost all range functions are tested with and 
use arrays, which do exactly the same for .save or copying. The reason 
why `save` is such a bad solution is because there is generally no 
penalty for ignoring it (until there is).

The only way to "fix" this IMO is to migrate to a system where standard 
(non-forward) input ranges are not copyable, and deprecate save. It 
likely will never happen.

-Steve

Mar 19 2020

Dukc <ajieskola gmail.com> writes:

On Thursday, 19 March 2020 at 14:46:39 UTC, Steven Schveighoffer 
wrote:
 The only way to "fix" this IMO is to migrate to a system where 
 standard (non-forward) input ranges are not copyable, and 
 deprecate save. It likely will never happen.

 -Steve

Perhaps there is another way. Individual ranges can give 
guarantees about their own copy behaviour if they want. We could 
document the current copy behaviour of existing Phobos ranges and 
say that relying on it is thereafter fine.

But then the question with `chunkBy` is, should it take the 
opportunity to start to behave like most other ranges in this 
regard?

Mar 19 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Mar 19, 2020 at 03:35:28PM +0000, Dukc via Digitalmars-d wrote:
 On Thursday, 19 March 2020 at 14:46:39 UTC, Steven Schveighoffer wrote:
 The only way to "fix" this IMO is to migrate to a system where
 standard (non-forward) input ranges are not copyable, and deprecate
 save. It likely will never happen.


[...]
 Perhaps there is another way. Individual ranges can give guarantees
 about their own copy behaviour if they want. We could document the
 current copy behaviour of existing Phobos ranges and say that relying
 on it is thereafter fine.

The problem is, you *cannot* give any guarantees about copy behaviour,
because it depends on the behaviour of the incoming range. For example,
if you pass the output of .byChunk to another range algorithm, that
second algorithm cannot guarantee copy behaviour anymore.

In fact, all you have to do is to wrap a forward range in an
InputRangeObject (let's say you need to alternate between two different
range types (but with compatible elements) at runtime, then you'll need
to do this), and now you have a forward range with by-reference
semantics that requires the use of .save in order to retain its current
position.

This is (likely) part of the reason why the range API does not specify
the behaviour of a range once it's iterated over without using .save: it
depends on implementation details.


 But then the question with `chunkBy` is, should it take the
 opportunity to start to behave like most other ranges in this regard?

I don't see why this should be a compelling reason to change chunkBy --
since copy behaviour is not specified by the range API (for IMO good
reasons -- it's implementation-specific).  But I do agree that the
implementation of chunkBy could be improved for other reasons, like not
using the expensive ref-counting implementation when the underlying
range is, say, a native array that you could just slice over.


T

-- 
MAS = Mana Ada Sistem?

Mar 19 2020

Dukc <ajieskola gmail.com> writes:

On Thursday, 19 March 2020 at 16:01:39 UTC, H. S. Teoh wrote:
 The problem is, you *cannot* give any guarantees about copy 
 behaviour, because it depends on the behaviour of the incoming 
 range. For example, if you pass the output of .byChunk to 
 another range algorithm, that second algorithm cannot guarantee 
 copy behaviour anymore.

 In fact, all you have to do is to wrap a forward range in an 
 InputRangeObject (let's say you need to alternate between two 
 different range types (but with compatible elements) at 
 runtime, then you'll need to do this), and now you have a 
 forward range with by-reference semantics that requires the use 
 of .save in order to retain its current position.

The documentation would say that the copy behaviour of the range 
is the same as the source range has. A guarantee to depend on the 
source range is still a guarantee, and definitely better than the 
present situation.

Mar 19 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Mar 19, 2020 at 05:15:51PM +0000, Dukc via Digitalmars-d wrote:
 On Thursday, 19 March 2020 at 16:01:39 UTC, H. S. Teoh wrote:
 
 The problem is, you *cannot* give any guarantees about copy
 behaviour, because it depends on the behaviour of the incoming
 range. For example, if you pass the output of .byChunk to another
 range algorithm, that second algorithm cannot guarantee copy
 behaviour anymore.
 
 In fact, all you have to do is to wrap a forward range in an
 InputRangeObject (let's say you need to alternate between two
 different range types (but with compatible elements) at runtime,
 then you'll need to do this), and now you have a forward range with
 by-reference semantics that requires the use of .save in order to
 retain its current position.

 
 The documentation would say that the copy behaviour of the range is
 the same as the source range has. A guarantee to depend on the source
 range is still a guarantee, and definitely better than the present
 situation.

A guarantee about copy behaviour potentially binds the Phobos
maintainers to the specifics of the current implementation of the
algorithm.  I'm not sure that's what we want, since it may limit future
improvements.

The thing about the range API is that it's like a contract between user
code and Phobos; the way you use it should be according to the contract,
anything not stated by the contract should not be relied upon even if
currently it happens to work.  That's the principle of encapsulation.
Behaviours that arise from the specifics of how something is currently
implemented are implementation details that shouldn't leak into the
calling code, including assumptions about copy behaviour.  Arguably,
relying on a specific copy behaviour is an instance of leaky
abstraction, since outside code shouldn't know nor depend upon Phobos
internal implementation details.  Part of the power of encapsulation is
to leave certain non-essential things unspecified, so that implementors
can have greater freedom in how they implement the API.  Something that
isn't directly related to the particular function (e.g., the primary
function of byChunk) shouldn't be a part of the API, IMO, it should be
left unspecified with the understanding that relying on the behaviour of
something unspecified runs the risk of future breakage.  The specific
behaviour ought to be inside the "black box" of encapsulation, not
relied upon by user code, much less specified in library docs.


T

-- 
"Maybe" is a strange word.  When mom or dad says it it means "yes", but when my
big brothers say it it means "no"! -- PJ jr.

Mar 19 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Mar 19, 2020 at 10:46:39AM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 3/19/20 10:28 AM, Dukc wrote:
 On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:
 
 If you want to copy a forward range use save. Anything else is
 possible to break in an implementation detail later.
 

 
 Ouch - my code has to be totally broken. And I don't think I'm the
 only one.

 
 You are not the only one. Almost all range functions are tested with
 and use arrays, which do exactly the same for .save or copying. The
 reason why `save` is such a bad solution is because there is generally
 no penalty for ignoring it (until there is).

Yeah, Andrei has said that in retrospect, .save was a design mistake, he
should have made the input range vs. forward range distinction keyed on
copyability or ref/non-ref instead.  But to be fair, back in the day
when the range API was first designed, D didn't have the necessary
language facilities to be able to handle a better solution, unlike now
when we have  disable, copy ctors, and a bunch of other language
features that make such a solution more feasible.


 The only way to "fix" this IMO is to migrate to a system where
 standard (non-forward) input ranges are not copyable, and deprecate
 save. It likely will never happen.

[...]

It will happen if std.v2 happens... ;-)


T

-- 
"How are you doing?" "Doing what?"

Mar 19 2020

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 18, 2020 10:29:53 AM MDT H. S. Teoh via Digitalmars-d 
wrote:
 On Wed, Mar 18, 2020 at 02:57:41PM +0000, Dukc via Digitalmars-d wrote:
 [...]

 This is how chunkBy[1] currently behaves when copying. It essentially
 behaves like a reference range: it will only save it's state when
 `save` is explicitly called, not otherwise, even if the chunked source
 range is a forward range with value semantics.

 The current range API design, which Andrei himself admitted was not the
 best design in retrospect, does not specify behaviour on copying a
 range.  IOW, it's the application-level equivalent of undefined
 behaviour.  Generally, this is not a problem because usually you're
 working with your own range types which you already know the semantics
 of.  But in generic code, assumptions of this sort are a no-no,
 precisely because of breakages of this sort when different ranges behave
 differently on copy.

 tl;dr: never copy a range directly, always use .save.  Never assume a
 range remains unchanged after iterating a copy; always assume the worst
 and use .save whenever you wish the original range to remain unchanged
 afterwards.

Ranges get copied all the time when passing them around. You're not going to
avoid it (even using them with foreach copies them). The key thing is not
that a range shouldn't be copied without using save but that if a range is
ever copied, then the original shouldn't be used again, and from that point
on, only the copy should be used.

So, if you pass a range to foreach, a function, or do anything else which
would copy it, then don't use the original again, and if you want to use it
again, then instead of copying it directly, call save to get an independent
copy. Generic code should _never_ rely on the copying semantics of a range,
and even in non-generic code, depending on the semantics of copying a range
whose implementation you don't fully control is just begging for bugs.

- Jonathan M Davis

Mar 20 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via Digitalmars-d
wrote:
 On Wednesday, March 18, 2020 10:29:53 AM MDT H. S. Teoh via Digitalmars-d 
 wrote:

[...]
 tl;dr: never copy a range directly, always use .save.  Never assume
 a range remains unchanged after iterating a copy; always assume the
 worst and use .save whenever you wish the original range to remain
 unchanged afterwards.

 
 Ranges get copied all the time when passing them around. You're not
 going to avoid it (even using them with foreach copies them). The key
 thing is not that a range shouldn't be copied without using save but
 that if a range is ever copied, then the original shouldn't be used
 again, and from that point on, only the copy should be used.

Yes, that's right.  Actually, for by-value ranges the act of passing
them as an argument to a range function in the first place already
copies them.  The catch is really that once this happens, the caller or
whoever retains the original copy should no longer assume that the range
remains in the same place as before.  For some ranges this is true, but
for other ranges this assumption is invalid, and will lead to incorrect
results.


 So, if you pass a range to foreach, a function, or do anything else
 which would copy it, then don't use the original again, and if you
 want to use it again, then instead of copying it directly, call save
 to get an independent copy. Generic code should _never_ rely on the
 copying semantics of a range, and even in non-generic code, depending
 on the semantics of copying a range whose implementation you don't
 fully control is just begging for bugs.

[...]

+1.


T

-- 
Knowledge is that area of ignorance that we arrange and classify. -- Ambrose
Bierce

Mar 20 2020

Ben Jones <fake fake.fake> writes:

On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:
 On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via 
 Digitalmars-d wrote:
 [...]

 [...]
 [...]

 Yes, that's right.  Actually, for by-value ranges the act of 
 passing them as an argument to a range function in the first 
 place already copies them.  The catch is really that once this 
 happens, the caller or whoever retains the original copy should 
 no longer assume that the range remains in the same place as 
 before.  For some ranges this is true, but for other ranges 
 this assumption is invalid, and will lead to incorrect results.


 [...]

 [...]

 +1.


 T

So range "copy" is really what a C++ person would call range 
"move" ?  It might be a copy, or it might invalidate the 
original, depending on the type?

Mar 20 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Mar 20, 2020 at 10:15:18PM +0000, Ben Jones via Digitalmars-d wrote:
 On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:

[...]
 Yes, that's right.  Actually, for by-value ranges the act of passing
 them as an argument to a range function in the first place already
 copies them.  The catch is really that once this happens, the caller
 or whoever retains the original copy should no longer assume that
 the range remains in the same place as before.  For some ranges this
 is true, but for other ranges this assumption is invalid, and will
 lead to incorrect results.


[...]
 So range "copy" is really what a C++ person would call range "move" ?
 It might be a copy, or it might invalidate the original, depending on
 the type?

The way it's currently implemented, yes, pretty much.  Unless you're in
control of the exact range type, you should always assume the worst and
not rely on the state of the original range after passing it to a range
function.  If you need to retain the original state, use .save.


T

-- 
You are only young once, but you can stay immature indefinitely. -- azephrahel

Mar 20 2020

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Friday, March 20, 2020 4:15:18 PM MDT Ben Jones via Digitalmars-d wrote:
 On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:
 On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via

 Digitalmars-d wrote:
 [...]

 [...]

 [...]

 Yes, that's right.  Actually, for by-value ranges the act of
 passing them as an argument to a range function in the first
 place already copies them.  The catch is really that once this
 happens, the caller or whoever retains the original copy should
 no longer assume that the range remains in the same place as
 before.  For some ranges this is true, but for other ranges
 this assumption is invalid, and will lead to incorrect results.

 [...]

 [...]

 +1.


 T

 So range "copy" is really what a C++ person would call range
 "move" ?  It might be a copy, or it might invalidate the
 original, depending on the type?

You more or less have to view it that way, though no move is actually taking
place. The problem is that the semantics of what happens when a range is
copied are completely implementation-dependent, making it so that you cannot
depend on theem and thus basically have to consider the range to be in an
invalid state once it's been copied even if it's not technically in an
invalid state.

If a range has by-value copying semantics, then when you copy it, you get
the same as save. If it's a full-on reference type, then mutating the copy
mutates the original. And worst of all, if you have a pseudo-reference type,
then you can end up in a state where mutating the copy mutates only part of
the original, effectively putting it in an invalid state. But even if you
somehow never had to worry about pseudo-reference types, the mere fact that
some ranges have by-value copying semantics whereas others are full-on
reference types makes it so that you can't rely on what happens to a range
once it's been copied. And if code is not being at minimum tested with both
value-type ranges and reference-type ranges, the odds are _very_ high that
it won't handle ranges that aren't value types correctly.

Really, forward ranges should all have by-value copying (thus requiring that
classes be wrapped in structs if they're going to be forward ranges), and
save should be abolished, but that requires a major redesign and likely
would only happen if we did some sort of Phobos v2 (as has occasionally been
discussed). And exactly what should happen with basic input ranges is not
clear, because while ideally, you'd just require that they have full-on
reference semantics, that tends to mean that you're forcing them to be
allocated on the heap, which isn't really the sort of thing that we want to
force if we can avoid it. So, while it's clear what we should do with some
aspects of the range API if we have the opportunity to redesign it, there
are still issues that would have to be sorted out.

Regardless, as things stand, you can't rely on the semantics of copying a
range and basically have to consider that a range has become invalid once
it's been copied. Unfortunately, it's not something that seems to be well
understood and is often handled incorrectly in code. I've been pointing it
out for years (including in my talk at dconf 2015), but we haven't done a
good enough job in general messaging how the range API works, and this is
one of the details that seems to be very easily missed.

- Jonathan M Davis

Mar 20 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via Digitalmars-d
wrote:
[...]
 And exactly what should happen with basic input ranges is not clear,
 because while ideally, you'd just require that they have full-on
 reference semantics, that tends to mean that you're forcing them to be
 allocated on the heap, which isn't really the sort of thing that we
 want to force if we can avoid it.

[...]

You could just have input ranges be a struct with the copy ctor
 disable'd, perhaps?


T

-- 
Those who've learned LaTeX swear by it. Those who are learning LaTeX swear at
it. -- Pete Bleackley

Mar 22 2020

Paul Backus <snarwin gmail.com> writes:

On Sunday, 22 March 2020 at 15:43:45 UTC, H. S. Teoh wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via 
 Digitalmars-d wrote: [...]
 And exactly what should happen with basic input ranges is not 
 clear, because while ideally, you'd just require that they 
 have full-on reference semantics, that tends to mean that 
 you're forcing them to be allocated on the heap, which isn't 
 really the sort of thing that we want to force if we can avoid 
 it.

 [...]

 You could just have input ranges be a struct with the copy ctor 
  disable'd, perhaps?


 T

Technically, this is already legal--isInputRange accepts 
non-copyable structs.

Of course, if you accept non-copyable ranges as legitimate, 
someone has to go and fix all of std.algorithm and std.range to 
handle them correctly, which is a non-trivial amount of work.

Mar 22 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sun, Mar 22, 2020 at 04:23:27PM +0000, Paul Backus via Digitalmars-d wrote:
 On Sunday, 22 March 2020 at 15:43:45 UTC, H. S. Teoh wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via
 Digitalmars-d wrote: [...]
 And exactly what should happen with basic input ranges is not
 clear, because while ideally, you'd just require that they have
 full-on reference semantics, that tends to mean that you're
 forcing them to be allocated on the heap, which isn't really the
 sort of thing that we want to force if we can avoid it.

 [...]
 
 You could just have input ranges be a struct with the copy ctor
  disable'd, perhaps?


[...]
 Technically, this is already legal--isInputRange accepts non-copyable
 structs.
 
 Of course, if you accept non-copyable ranges as legitimate, someone
 has to go and fix all of std.algorithm and std.range to handle them
 correctly, which is a non-trivial amount of work.

I don't think it's wise to do this with the existing codebase. A change
as drastic as removing .save will likely require rewriting not just
large chunks of Phobos, but existing user code as well.  This is
probably best left to Phobos v2, if that ever happens.


T

-- 
An elephant: A mouse built to government specifications. -- Robert Heinlein

Mar 23 2020

Paul Backus <snarwin gmail.com> writes:

On Monday, 23 March 2020 at 17:37:16 UTC, H. S. Teoh wrote:
 On Sun, Mar 22, 2020 at 04:23:27PM +0000, Paul Backus via 
 Digitalmars-d wrote:
 Technically, this is already legal--isInputRange accepts 
 non-copyable structs.
 
 Of course, if you accept non-copyable ranges as legitimate, 
 someone has to go and fix all of std.algorithm and std.range 
 to handle them correctly, which is a non-trivial amount of 
 work.

 I don't think it's wise to do this with the existing codebase. 
 A change as drastic as removing .save will likely require 
 rewriting not just large chunks of Phobos, but existing user 
 code as well.  This is probably best left to Phobos v2, if that 
 ever happens.


 T

Agreed. I was only talking about making pure input ranges 
non-copyable (i.e., requiring all code that copies a range to use 
either .save or move), which is not (strictly speaking) a 
breaking change to the existing range interface.

Mar 23 2020

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, March 22, 2020 9:43:45 AM MDT H. S. Teoh via Digitalmars-d wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via
 Digitalmars-d wrote: [...]

 And exactly what should happen with basic input ranges is not clear,
 because while ideally, you'd just require that they have full-on
 reference semantics, that tends to mean that you're forcing them to be
 allocated on the heap, which isn't really the sort of thing that we
 want to force if we can avoid it.

 [...]

 You could just have input ranges be a struct with the copy ctor
  disable'd, perhaps?

A range that can't be copied is useless. It won't even work with foreach,
because anything you iterate over with foreach is copied when it's passed to
foreach. And of course, idiomatic range code passes ranges all over the
place, resulting in a lot of copying. And to wrap a range in another range
(which is required for most range-based algorithms), you have to copy it.
IMHO, it would make far more sense to just use opApply than to try to make a
range non-copyable.

- Jonathan M Davis

Mar 23 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 3/23/20 2:15 PM, Jonathan M Davis wrote:
 On Sunday, March 22, 2020 9:43:45 AM MDT H. S. Teoh via Digitalmars-d wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via
 Digitalmars-d wrote: [...]

 And exactly what should happen with basic input ranges is not clear,
 because while ideally, you'd just require that they have full-on
 reference semantics, that tends to mean that you're forcing them to be
 allocated on the heap, which isn't really the sort of thing that we
 want to force if we can avoid it.

 [...]

 You could just have input ranges be a struct with the copy ctor
  disable'd, perhaps?

 
 A range that can't be copied is useless. It won't even work with foreach,
 because anything you iterate over with foreach is copied when it's passed to
 foreach.

foreach(x; uncopyableRange.move)

foreach(x; ucr.refCounted)

foreach(x; returnsRvalue())

Not only that, but once you define "non-copyable" as a thing for ranges, 
then you can handle this without having to always tack-on ".move" or 
whatnot.

It would be possible. It wouldn't be as "pretty". But most ranges are at 
least forward ranges, so it wouldn't be a terrible problem.

-Steve

Mar 23 2020

Pezbi Mahn <pezbiworkemail gmail.com> writes:

That=E2=80=99s dope homie

On Mon, Mar 23, 2020 at 2:15 PM Jonathan M Davis via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 On Sunday, March 22, 2020 9:43:45 AM MDT H. S. Teoh via Digitalmars-d
 wrote:
 On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via
 Digitalmars-d wrote: [...]

 And exactly what should happen with basic input ranges is not clear,
 because while ideally, you'd just require that they have full-on
 reference semantics, that tends to mean that you're forcing them to b=



e
 allocated on the heap, which isn't really the sort of thing that we
 want to force if we can avoid it.

 [...]

 You could just have input ranges be a struct with the copy ctor
  disable'd, perhaps?

 A range that can't be copied is useless. It won't even work with foreach,
 because anything you iterate over with foreach is copied when it's passed
 to
 foreach. And of course, idiomatic range code passes ranges all over the
 place, resulting in a lot of copying. And to wrap a range in another rang=

e
 (which is required for most range-based algorithms), you have to copy it.
 IMHO, it would make far more sense to just use opApply than to try to mak=

e
 a
 range non-copyable.

 - Jonathan M Davis

Mar 23 2020

D Programming

C/C++ Programming

Other

digitalmars.D - How is chunkBy supposed to behave on copy