digitalmars.D - Range Redesign: Copy Semantics

Jonathan M Davis (238/238) Jan 20 2024 I've been thinking about this for a while now, but with the next version...

Per =?UTF-8?B?Tm9yZGzDtnc=?= (6/7) Jan 21 2024 Shall

Jonathan M Davis (23/30) Jan 21 2024 Maybe. It probably should be, though I don't know how many changes like ...

Alexandru Ermicioi (4/21) Jan 21 2024 Imho, it must not be added, as this might conflate with contents

Jonathan M Davis (13/36) Jan 21 2024 It doesn't conflate with Nullable!bool at all, because the alias this wa...

Richard (Rikki) Andrew Cattermole (4/9) Jan 21 2024 Umm, why would we bring back Nullable?

Jonathan M Davis (27/37) Jan 21 2024 Much as I know that you like the idea of sum types, AFAIK, it's never be...

Richard (Rikki) Andrew Cattermole (4/21) Jan 21 2024 Given that Walter has put together a DIP for them, I'm fairly confident

Jonathan M Davis (15/38) Jan 21 2024 Well, we'll see what we actually end up with, but I'd consider that prop...

Alexandru Ermicioi (14/46) Jan 21 2024 For all these cases there is `.save` method which seems to be

Jonathan M Davis (39/49) Jan 21 2024 And how on earth would you use a range if it can't be copied? They're co...

Nick Treleaven (6/14) Jan 21 2024 Maybe they could be moved instead of copied when the original is

Jonathan M Davis (45/59) Jan 21 2024 Do you really want to have to call move every time that you call a

Alexandru Ermicioi (19/87) Jan 21 2024 We could use java Iterator api in this case which has also `bool

Jonathan M Davis (49/82) Jan 21 2024 Well, the key change the API that would really matter IMHO would be to m...

Paul Backus (51/62) Jan 21 2024 I think maybe we're losing sight of the big picture in this

Jonathan M Davis (48/91) Jan 22 2024 In practice, basic input ranges don't work with the vast majority of

Paul Backus (43/62) Jan 22 2024 Here's a very rough approximation of the number of algorithms in

Jonathan M Davis (89/140) Jan 22 2024 I will admit that the number is higher than I would have expected, but m...
monkyyy (3/12) Jan 24 2024 Oh, I hope someone pays attension and makes arrays more important

Quirin Schroll (44/57) Jan 23 2024 Not necessarily. First, any algorithm requiring a forward range

Alexandru Ermicioi (9/20) Jan 22 2024 Well one advantage of `hasNext` is that you can check in advance

Jonathan M Davis (68/88) Jan 22 2024 true

Paul Backus (19/23) Jan 21 2024 With the proposed design, it would be possible to implement

Paul Backus (2/5) Jan 21 2024 Correction: fr should be passed by ref (or maybe auto ref).
Jonathan M Davis (9/32) Jan 21 2024 True, but then you have a function with the same copy semantics problem ...

Paul Backus (6/14) Jan 21 2024 [...]

Jonathan M Davis (24/39) Jan 21 2024 Well, even if the parameter is ref (which it would need to be), the core

Jonathan M Davis (18/41) Jan 22 2024 Actually. What occurs to me is that if we made this function take the ra...
Quirin Schroll (15/38) Jan 23 2024 It wouldn’t work. That’s the point. It would compile, probably,

Walter Bright (5/5) Jan 21 2024 Thanks for bringing this up. Improving range semantics is something we s...

Paul Backus (11/14) Jan 21 2024 Surely this would be better handled outside the range API.
Jonathan M Davis (31/36) Jan 21 2024 I will have to study that more to see what's needed there. If it's the k...

Sebastiaan Koppe (24/32) Jan 21 2024 Thanks for the write up.

Paul Backus (11/20) Jan 21 2024 If your concern is performance, an indirect function call is

Ben Jones (13/15) Jan 21 2024 I've been writing a fair amount of Kotlin (an a tiny bit of

Quirin Schroll (7/22) Jan 23 2024 This is off-topic, but I want to mention that the implicit `it`

Jonathan M Davis (54/88) Jan 21 2024 Well, arguably, that's really an implementation detail of the range, and...

Sebastiaan Koppe (30/88) Jan 22 2024 Yes exactly. My point is that if there is an explicit place to do

Steven Schveighoffer (11/40) Jan 21 2024 While I agree with most of what you wrote, there is one very big

Paul Backus (17/25) Jan 21 2024 One possibility is to change the forward range interface to

Jonathan M Davis (34/60) Jan 21 2024 Well, I think that the reality of the matter here is that if we want to ...

Jonathan M Davis (24/32) Jan 21 2024 Yeah, I was orignally hoping that this sort of thing could just be solve...

Atila Neves (9/19) Jan 22 2024 I don't think I've ever encountered a situation where reference

Richard (Rikki) Andrew Cattermole (15/35) Jan 22 2024 I did experiment with them when I first learned ranges.
H. S. Teoh (40/58) Jan 22 2024 It's useful in recursive-descent parsers where you expect the range to

Atila Neves (3/20) Jan 23 2024 Or moved.

H. S. Teoh (8/21) Jan 23 2024 [...]

Paul Backus (15/19) Jan 22 2024 This was my initial thought too, but it turns out it's not quite

Atila Neves (16/34) Jan 23 2024 I don't think we should that because algorithms that don't care
Steven Schveighoffer (21/40) Jan 23 2024 You are thinking about this the wrong way. Copyability becomes a

Paul Backus (27/40) Jan 23 2024 The only problem with this is that what you are asking the range

Nick Treleaven (5/11) Jan 24 2024 `isForwardRange` could require the range *type* to have `empty,

Paul Backus (10/21) Jan 24 2024 It could, but I don't think it's a great idea, because it would

Steven Schveighoffer (26/62) Jan 24 2024 Wouldn't that be on the person who wrapped the type? I wouldn't

Paul Backus (28/49) Jan 24 2024 Yes, it would--which is exactly what I'd like to avoid. I don't

Paul Backus (5/10) Jan 24 2024 Having thought about it some more, this is a bad argument, and I

Jonathan M Davis (14/24) Jan 24 2024 Yeah. We can enforce some stuff statically (e.g. we can require that a
Alexandru Ermicioi (3/5) Jan 24 2024 Isn't `isInputRange`, `isForwardRange`, and the rest, kind of

Jonathan M Davis (18/23) Jan 24 2024 Traits like that can test (and thus be used to enforce) _some_ of the

Alexandru Ermicioi (7/26) Jan 24 2024 That's what unit tests are for, while those traits act like

Jonathan M Davis (29/56) Jan 24 2024 Of course that's what unit tests are for, but that doesn't enforce anyth...

Quirin Schroll (6/9) Feb 01 2024 Even if you could, what would you even test? Especially for

Jonathan M Davis (53/73) Jan 22 2024 Once you disable the copy/postblit constructor, you can't generally pass...

victoroak (14/15) Jan 22 2024 I think one problem with just "next" as the api for an Input
monkyyy (17/20) Jan 22 2024 Can this be summarized as

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

I've been thinking about this for a while now, but with the next version of
Phobos which is in the early planning stages, we really should do some
redesigning of ranges. Most of their API is just fine how it is, but there
are some aspects of it which really should be changed if we want them to be
better (the most obvious one being the removal of auto-decoding). But what
I'd like to discuss specifically in this thread is fixing - and defining -
the semantics of copying and assigning to ranges. Specifically, the
semantics of stuff like

auto copy = orig;

and

range1 = range2;

should be defined (which then defines what happens when passing ranges to
functions and stuff like foreach).

Right now, the only semantics of copying a range which are defined by the
range API are what happens to the copy when you make a copy. Specifically,
the copy will have the exact same elements in the exact same order that the
original would have had if it had been iterated through prior to the copy
being made. If that weren't true, then we couldn't pass ranges around.
However, because ranges support a whole range of types with varying copy
semantics, you can't rely on a range's copy semantics beyond that
(particularly with regards to the state of the original range after the copy
has been made). Specifically, a range type can have four different types of
copy semantics:

1. The range is a value type, so copying it results in two completely
indepedent copies. Iterating through one has no effect on the other.

2. The range is a reference type, so copying it results in two completely
dependent copies. They both share all of their state, so iterating through
one iterates through the other, and they both always have the same state.

3. The range is a pseudo-reference type, and it has value semantics for its
iteration state but reference semantics for some portion of the rest of its
state. This means that if you iterate through a copy of the range, it has no
effect on the original, so the range is a value type in that respect, but
the rest of its state may or may not be copied by value (often - though not
always - this means that the iteration state is directly within the range
type, but the elements themselves are referred to via pointer, so mutating
the actual elements would affect all copies of that range, but each range
can be iterated independently). Dynamic arrays are the most common example
of this sort of range.

4. The range is a pseudo-reference type, but it does not have value
semantics with regards to its iteration state. So, when copying the range,
some portion of its state is copied by value, but you don't end up with
indendepent copies - and in fact, you don't get dependent copies either,
since some of the state is copied by value. A common example of this sort of
range which would be one which contains most of its state via a pointer or
reference but which caches its front as a member variable within the struct.
The result of this is that once you've mutated a copy of the range, you can
no longer use the original, because mutating the copy only partially mutates
the state of the original, leaving it in an inconsistent / invalid state.

Generic code must support ranges with any of these copy semantics, and since
these copy semantics are drastically different from one another, the result
is that in fully generic code, once you copy a range, you cannot ever use
the original again (since you can't depend on its state). And you can't even
assign it a new value and then use it, because the semantics of assignment
aren't defined for ranges either (though they're likely to be strongly tied
to the copy semantics). Once you copy a range, you have to assume that the
orginal range is in a potentially invalid state and can't do anything
further with it.

In addition to copying ranges to new variables and passing them to other
functions, this also means that you can't actually continue to use a range
once it's been passed to foreach, because the lowering that foreach uses
copies the range (and it has to or it would break the semantics of using
dynamic arrays with foreach). So, you can't do stuff like partially iterate
through a range, break out of the loop, and then continue to iterate through
it, because the state of the range is implementation-dependent and could be
invalid (on top of the fact that in the case of many forward ranges, copying
it is equivalent to save, so trying to continue the iteration on the
original won't work, whereas it would work with some basic input ranges, so
even without any ranges ending up in an invalid state, the semantics depend
on the type in a way that does not work with generic code). In fact, someone
necro-ed a thread in D.Learn on this issue with foreach just the other day,
since it's not obvious and can be pretty surprising.

Of course, in order to try to solve the problem with varying copy semantics,
the range API has save for forward ranges. This makes it possible to get an
independent copy of a forward range, and any generic code which needs an
independent copy needs to use it to get one. However, that does not help at
all with the inconsistent semantics of actually copying a range, and it's
very common for people to write code that works with dynamic arrays but then
doesn't work with other types of ranges, because other types of ranges have
differing copy semantics, and they didn't test their code with enough range
types to find where they accidentally relied on the copy semantics (or
assignment semantics) of dynamic arrays (or on the semantics of whatever
range type it was that they tested with).

If we really want to fix these classes of issues, we need ranges to have
consistent copy semantics, which means that we need to define what those
semantics are.

For forward ranges, this is quite straightfoward. We need to require that
copying a forward range results in two copies which can be independently
iterated - so essentially, the iteration state must be copied by value, or
another way to look at it is that copying a forward range must be equivalent
to calling save (though the situation is actually more nuanced than that).
We would be requiring that all forward ranges have the same copying
semantics as dynamic arrays with regards to their iteration state (but would
allow the elements themselves to be copied by value or by reference like we
do now). And of course, we would then get rid of save, since it would be
unnecessary.

What this would mean is that all forward ranges would be either dynamic
arrays or structs. Any code that then wanted a class or a pointer to a
struct to be a forward range would need to wrap it in a struct to give it
the correct copy semantics. A naive implementation would just duplicate the
class whenever the struct was copied, but a smarter implementation could do
something like use reference counting to keep track of how many ranges there
were pointing to that same class object and then duplicate it whenever a
function was called on it which would mutate its iteration state, and the
reference count was greater than 1 (and of course not bother to duplicate it
if the reference count was 1). It wouldn't need to do more memory management
than that (though it could), so that doesn't necessarily require anything
 system or  trusted as often gets discussed with reference counting. Of
course, allowing such an implementation does technically mean that copying a
forward range is not necessarily strictly equivalent to calling save, but
the overall iteration semantics would treat each copy as independent like
save is supposed to allow, so it wouldn't be a problem.

And with that change, we then have consistent copy semantics with forward
ranges. All code operating on forward ranges could treat them the same as
they do dynamic arrays when copying them and passing them around, and you
wouldn't have to worry about forgetting to call save. I think that we can
probably get general agreement on such a change, since it's been discussed
off-and-on for years now. However, the problem then is basic input ranges.

Basic input ranges cannot have the same copy semantics as dynamic arrays. If
they could, then they could be forward ranges. As things stand, they either
have reference semantics, or they have pseudo-reference semantics of the
variety where mutating a copy puts the original in an invalid state. The
only way that we could make their copy semantics consistent is if we then
require that they be reference types. The simplest way to do that would be
to require that they be classes or pointers to structs, though that would
then make it impossible to do anything like reference counting with basic
input ranges, which isn't ideal, but it would have the advantage of allowing
us to force reference semantics at compile time.

This would of course mean that basic input ranges and forward ranges would
then be defined to have different copy semantics, which would mean that
either generic code that relied on the copy semantics would have to work
with only one of them, or we would need to give basic input ranges a
different API. And if we allowed basic input ranges to be structs that
defined reference semantics internally (e.g. by implementing reference
counting on a pointer or class reference), then we would have to give basic
input ranges a different API, because we wouldn't have save to differentiate
them anymore, and if we allowed structs, then we couldn't differentiate them
based on a basic input range being a class reference or pointer to a struct,
whereas a forward range would be a struct or a dynamic array.

Ultimately, I'm inclined to argue that we should give basic input ranges a
new API - not just because it would allow them to use reference counting,
but also because the current input range API tends to force basic input
ranges to cache the value of front (which if anything, encourages basic
input ranges to be pseudo-reference types and could result in an extra layer
of indirection in some cases if they're forced to be reference types). It
would be annoying in some cases to require that different functions be used
for basic input ranges and forward ranges (though overloading could
obviously be used to avoid needing different names), but it's already often
the case that code isn't really designed to work with both, and overloading
on the category of range being used is already fairly common, since
different range capabilities allow for different implementations. So, given
that it would prevent whole classes of copying bugs as well as potentially
remove the requirement to cache front for basic input ranges, I think that a
separate API for basic input ranges is warranted. What I would propose for
that would be a single function

    auto next();

where next returns a nullable type where the value returned is the next
element in the range, with a null value being returned if the range is
empty. The return type would then need to emulate a pointer - specifically,
when casting it to bool, it would be true if it's non-null and false if it's
null, and dereferencing it would give you the actual value if it's non-null
(with it being undefined behavior if you dereference null). So, a basic
input range of ints might define next as

    int* next();

or alternatively, it could be something like

    Nullable!int next();

though that wouldn't work with Phobos' current Nullable type, since it
doesn't support either casting to bool or dereferencing (probably because it
originally used alias this). Either way, since we'd just require the
specific API for the return type rather than requirng a pointer, the range
type would have some flexibliity in what it used.

This would then mean that if you wanted to loop over a basic input range,
you'd do something like

    for(auto front = range.next; !front; front = range.next)
    {
        ...
    }

And if we go down this road, then we could also add this API to foreach,
allowing for code such as

    foreach(e; basicInputRange)
    {
        ...
    }

to work - and unlike now, you could rely on the copy semantics involved such
that you would know that you could then break out of the loop and continue
to use the range (whereas right now, you can't safely break out of a foreach
and then continue to use the range that you were iterating over). Of course,
for forward ranges, breaking out of a foreach still wouldn't let you
continue to iterate from where you were, since the foreach would be
iterating over a copy, but those would be the semantics for all forward
ranges, so you could rely on them.

In addition to these changes to the copy semantics, I propose that we then
require that assignment follow the same semantics as copying. So,

    range1 = range2;

on a forward range would result in two independent copies with the same the
same elements in the same order (regardless of what range1 pointed to
before), just like you'd get with

    auto copy = orig;

whereas for basic input ranges

    range1 = range2;

would result in range1 and range2 being fully dependent copies (regardless
of what range1 pointed to before) - just like what you'd expect when
assigning one pointer to another.

So, the assignment semantics would differ between basic input ranges and
forward ranges, but they would be consistent with their copy semantics, and
they would be consistent across all ranges of the same category (and of
course, assignment would only be expected to work if the two ranges were of
exactly the same type).

The result of all of this is that we could then have consistent, reliable
copy and assignment semantics for both basic input ranges and forward
ranges, and the two would be easily distinguishable so that you don't
accidentally use one as the other and end up with bugs due to the
differences in semantics between the two categories of ranges.

The other potential change related to this would be to rename basic input
ranges. Right now, forward ranges are treated as an extension of input
ranges, which makes sense if you're ignoring the issues surrounding copy
semantics, since a forward range is an input range with an extra function
that allows you to get an independent copy, but if we're giving them
different APIs so that the copy semantics can be cleanly defined and
separated, then arguably, it doesn't make sense to call them both ranges. As
such, we could switch to calling forward ranges simply ranges (which we
mostly do anyway) and give basic input ranges a new name. My initial
suggestion would be an input stream (or if we absolutely had to continue to
call them ranges, then maybe simple ranges), but there's probably a better
name to be had, and obviously, plenty of bikeshedding can be done on that
topic. Either way, the names used here aren't the primary concern.

What I'd like to get out of this thread is feedback on how much sense this
idea does or doesn't make and what problems I'm missing that someone else is
able to see. From what I can see, the main negative is simply that you can't
then write code that works on both a basic input range and a forward range
(though you can obviously still create function overloads so that the caller
can use either), but given the issues surrounding copy semantics, I think
that that's probably ultimately a good change (and the number of functions
that can operate on basic input ranges is already pretty limited anyway in
comparison to forward ranges - particularly with regards to generic
algorithms). It will also make it much easier to discuss the separation
between basic input ranges and forward ranges, which IMHO can be too easy to
lose or confuse as things stand.

- Jonathan M Davis

Jan 20 2024

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 I've been thinking about this for a while now, but with the ...

Shall

- `Nullable.opCast` to `bool`
- `Nullable.opUnary` `*`

be added to Phobos' `std.typecons`?

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 3:36:37 AM MST Per Nordlöw via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the ...

 Shall

 - `Nullable.opCast` to `bool`
 - `Nullable.opUnary` `*`

 be added to Phobos' `std.typecons`?

Maybe. It probably should be, though I don't know how many changes like that
we're going to do to the current version of Phobos.

Either way, if we're doing a new major version of Phobos, we'll be reworking
a variety of stuff anyway, and so Nullable would presumably be on the list
(e.g. I think that it was a big mistake to make it a range, and I'd remove
that functionality; slicing it to get a range is one thing, but the fact
that it itself is a range definitely causes problems in generic code). And
if the new API for basic input ranges uses a pointer API to access its
elements, then that's a very good reason to make Nullable support that. As I
said in my post, I suspect that the main reason that the current version
doesn't is because it originally used alias this - which we removed due to
all of the bugs that that caused, but giving Nullable an opCast to bool and
* for dereferencing would likely have been a terrible idea when Nullable
used alias this, whereas without that, I can't think of any reason why it
would be a bad idea - though I'd likely still choose to have get and isNull
for those who wanted to make it clear that a Nullable was being used. But I
think that it's better to use a pointer API for basic input ranges, because
then they can actually use pointers in those cases where that makes sense
instead of requiring a Nullable type, much as it also needs to work to use a
Nullable type.

- Jonathan M Davis

Jan 21 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Sunday, 21 January 2024 at 10:58:32 UTC, Jonathan M Davis 
wrote:
 On Sunday, January 21, 2024 3:36:37 AM MST Per Nordlöw via 
 Digitalmars-d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the 
 ...

 Shall

 - `Nullable.opCast` to `bool`
 - `Nullable.opUnary` `*`

 be added to Phobos' `std.typecons`?

 Maybe. It probably should be, though I don't know how many 
 changes like that we're going to do to the current version of 
 Phobos.

Imho, it must not be added, as this might conflate with contents 
of `Nullable!bool` types.

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 6:26:16 AM MST Alexandru Ermicioi via Digitalmars-
d wrote:
 On Sunday, 21 January 2024 at 10:58:32 UTC, Jonathan M Davis

 wrote:
 On Sunday, January 21, 2024 3:36:37 AM MST Per Nordlöw via

 Digitalmars-d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the
 ...

 Shall

 - `Nullable.opCast` to `bool`
 - `Nullable.opUnary` `*`

 be added to Phobos' `std.typecons`?

 Maybe. It probably should be, though I don't know how many
 changes like that we're going to do to the current version of
 Phobos.

 Imho, it must not be added, as this might conflate with contents
 of `Nullable!bool` types.

It doesn't conflate with Nullable!bool at all, because the alias this was
removed from Nullable. The only way to get the value out of Nullable is to
use its get member. If we added dereferencing to it, then dereferencing it
would then also work to get its value, but if we added an opCast to bool,
casting would never give the Nullable's value even if the type of its value
were bool. It would just be equivalent to negating the result of isNull.

I agree that if we still had alias this on Nullable, then adding opCast to
bool to it would be problematic - and that's likely why it doesn't have that
opCast, since its original API included alias this, but since the alias this
is now gone, that's no longer a problem. There is no ambiguity.

- Jonathan M Davis

Jan 21 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
 Either way, if we're doing a new major version of Phobos, we'll be reworking
 a variety of stuff anyway, and so Nullable would presumably be on the list
 (e.g. I think that it was a big mistake to make it a range, and I'd remove
 that functionality; slicing it to get a range is one thing, but the fact
 that it itself is a range definitely causes problems in generic code).

Umm, why would we bring back Nullable?

By then we'd have sum types which can represent it in the language far 
better.

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew Cattermole 
via Digitalmars-d wrote:
 On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
 Either way, if we're doing a new major version of Phobos, we'll be
 reworking a variety of stuff anyway, and so Nullable would presumably be
 on the list (e.g. I think that it was a big mistake to make it a range,
 and I'd remove that functionality; slicing it to get a range is one
 thing, but the fact that it itself is a range definitely causes problems
 in generic code).

 Umm, why would we bring back Nullable?

 By then we'd have sum types which can represent it in the language far
 better.

Much as I know that you like the idea of sum types, AFAIK, it's never been
officially agreed that we're going to have them in the language, and even if
we do get them, I don't know what that's actually going to look like. If
it's anything like std.sumtype, IMHO, it would be utterly terrible for when
you want a Nullable type. For that, you want to be able to ask whether the
object has a value, and if it does, you want to be able to fetch its value.
So, while you could use different function names for it, you basically want
the same kind of API that a pointer has. Something where you'd have to pass
delegates in just to get the value out would be incredibly annoying and
bloated in comparison (and personally, I hate std.sumtype in general because
of that design, but it makes a lot more sense when you have a variety of
possible types as opposed to just one which may or may not be there).

But regardless of whether sum types would make sense in this case (which I
very much question), we'd have to know that they were definitely being added
to the language and what exactly that would look like - and they would have
to function in a way that was clearly superior to have a dedicated Nullable
type, or it would make more sense to add a Nullable type and use that for
cases where you want a nullable object.

Obviously, if sum types get added to the language soon enough, and they
really do replace the need for Nullable, then Nullable probably won't be in
Phobos v3, but as thing stand, I would fully expect it to be there. Either
way, using the API of a pointer makes a lot of sense for the proposed basic
input range API, in which case, a range that wanted to return something
other than a pointer would need to return a type that used the same API.

- Jonathan M Davis

Jan 21 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 22/01/2024 3:08 PM, Jonathan M Davis wrote:
 On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew Cattermole
 via Digitalmars-d wrote:
 On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
 Either way, if we're doing a new major version of Phobos, we'll be
 reworking a variety of stuff anyway, and so Nullable would presumably be
 on the list (e.g. I think that it was a big mistake to make it a range,
 and I'd remove that functionality; slicing it to get a range is one
 thing, but the fact that it itself is a range definitely causes problems
 in generic code).

 Umm, why would we bring back Nullable?

 By then we'd have sum types which can represent it in the language far
 better.

 
 Much as I know that you like the idea of sum types, AFAIK, it's never been
 officially agreed that we're going to have them in the language, and even if
 we do get them, I don't know what that's actually going to look like.

Given that Walter has put together a DIP for them, I'm fairly confident 
it is a question of when not if.

https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 7:17:35 PM MST Richard (Rikki) Andrew Cattermole 
via Digitalmars-d wrote:
 On 22/01/2024 3:08 PM, Jonathan M Davis wrote:
 On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew
 Cattermole>
 via Digitalmars-d wrote:
 On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
 Either way, if we're doing a new major version of Phobos, we'll be
 reworking a variety of stuff anyway, and so Nullable would presumably be
 on the list (e.g. I think that it was a big mistake to make it a range,
 and I'd remove that functionality; slicing it to get a range is one
 thing, but the fact that it itself is a range definitely causes problems
 in generic code).

 Umm, why would we bring back Nullable?

 By then we'd have sum types which can represent it in the language far
 better.

 Much as I know that you like the idea of sum types, AFAIK, it's never been
 officially agreed that we're going to have them in the language, and even
 if we do get them, I don't know what that's actually going to look like.

 Given that Walter has put together a DIP for them, I'm fairly confident
 it is a question of when not if.

 https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md

Well, we'll see what we actually end up with, but I'd consider that proposal
to be terrible if all you want is a nullable type, and restrictions such as
"Members of sumtypes cannot have copy constructors, postblits, or
destructors" would be far too much for plenty of use cases such that I would
still be very much inclined to have an explicit Nullable type.

In any case, whether Phobos v3 has a Nullable type and the question of the
exact type returned by next in the proposed API don't really matter that
much right now. What matters is the basics of the API, and some of the
details can be ironed out later. IMHO, having next return a type that
emulates a pointer is the best choice (especially when we don't even have
sum types yet), but that can be revisited down the line, particularly since
it's not like we're releasing major changes to ranges right now.

- Jonathan M Davis

Jan 21 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 1. The range is a value type, so copying it results in two 
 completely indepedent copies. Iterating through one has no 
 effect on the other.

 2. The range is a reference type, so copying it results in two 
 completely dependent copies. They both share all of their 
 state, so iterating through one iterates through the other, and 
 they both always have the same state.

 3. The range is a pseudo-reference type, and it has value 
 semantics for its iteration state but reference semantics for 
 some portion of the rest of its state. This means that if you 
 iterate through a copy of the range, it has no effect on the 
 original, so the range is a value type in that respect, but the 
 rest of its state may or may not be copied by value (often - 
 though not always - this means that the iteration state is 
 directly within the range type, but the elements themselves are 
 referred to via pointer, so mutating the actual elements would 
 affect all copies of that range, but each range can be iterated 
 independently). Dynamic arrays are the most common example of 
 this sort of range.

 4. The range is a pseudo-reference type, but it does not have 
 value semantics with regards to its iteration state. So, when 
 copying the range, some portion of its state is copied by 
 value, but you don't end up with indendepent copies - and in 
 fact, you don't get dependent copies either, since some of the 
 state is copied by value. A common example of this sort of 
 range which would be one which contains most of its state via a 
 pointer or reference but which caches its front as a member 
 variable within the struct. The result of this is that once 
 you've mutated a copy of the range, you can no longer use the 
 original, because mutating the copy only partially mutates the 
 state of the original, leaving it in an inconsistent / invalid 
 state.

For all these cases there is `.save` method which seems to be 
ignored in most of the code and compiler, when it should not be. 
It explicitly defines that range is copied irrelevant of the type 
of range you've mntioned. Since `.save` is defined on forward 
range, this means that pure input range must never be copyable, 
i.e. copy and post constructors must be disabled.

If this rule is followed, issues with copying would dissappear.

Even more drastic measure would be to disable copying on all 
types of ranges, and allow copying only through `.save` method, 
i.e. do explicit copying when it is intended to.

Best regards,
Alexandru.

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 6:22:26 AM MST Alexandru Ermicioi via Digitalmars-
d wrote:
 For all these cases there is `.save` method which seems to be
 ignored in most of the code and compiler, when it should not be.
 It explicitly defines that range is copied irrelevant of the type
 of range you've mntioned. Since `.save` is defined on forward
 range, this means that pure input range must never be copyable,
 i.e. copy and post constructors must be disabled.

 If this rule is followed, issues with copying would dissappear.

 Even more drastic measure would be to disable copying on all
 types of ranges, and allow copying only through `.save` method,
 i.e. do explicit copying when it is intended to.

And how on earth would you use a range if it can't be copied? They're copied
all over the place and have to be to even be used. You copy them when you
pass them to any range-based function. You copy them when you return them
from a range-based function. You copy them when they get wrapped by another
range. You copy them when you pass them to foreach. Disabling copying on
ranges would make them borderline useless. And you also can't disable
copying on classes or dynamic arrays. We could disallow classes which are
ranges (even for basic input ranges) if we really wanted to make it so that
they couldn't be copied, but we couldn't do the same with dynamic arrays.
And really, ranges need to be copyable, because we need to be able to pass
them around around and store them. And while ref could theoretically be used
in some of those cases, in many of them, it could not (e.g. you have to copy
a range once you wrap it in another type, and the ability to do that is
crucial for most range-based algorithms).

Also, forward ranges are supposed to mimic dynamic arrays. C++ iterators are
meant to mimic pointers, and D's ranges are meant to mimic dynamic arrays.
The only reason that we even have save is to support ranges which are
classes. A lot of the problems with range-based code come from code that
assumes that a range behaves like a dynamic array when it doesn't. If we
required that all forward ranges have the same copy semantics as dynamic
arrays, then that problem is solved. Having save means treating forward
ranges differently than we treat dynamic arrays, and it's completely
unnecessary so long as we're willing to require that classes be wrapped in
structs to be forward ranges. The ideal situation here is for ranges in
general to have the same semantics as dynamic arrays with regards to
copying, assignment, and iteration.

So, we can't disable copying for ranges in general and have them work, and
even if we all agreed that disabling copying for ranges in general was a
good idea, we still couldn't do that for dynamic arrays. So, we'd continue
to have the problem that code would be written to be range-based but only
worked with arrays, because that was what it was tested with. By requiring
that forward ranges have the same semantics as dynamic arrays and treating
basic input ranges as a separate thing, it becomes possible to write code
that assumes the same semantics as dynamic arrays and then works with all
forward ranges, which would significantly reduce the number of bugs that we
get in range-based code.

- Jonathan M Davis

Jan 21 2024

Nick Treleaven <nick geany.org> writes:

On Sunday, 21 January 2024 at 13:48:56 UTC, Jonathan M Davis 
wrote:
 And how on earth would you use a range if it can't be copied? 
 They're copied all over the place and have to be to even be 
 used. You copy them when you pass them to any range-based 
 function. You copy them when you return them from a range-based 
 function. You copy them when they get wrapped by another range. 
 You copy them when you pass them to foreach. Disabling copying 
 on ranges would make them borderline useless.

Maybe they could be moved instead of copied when the original is 
not needed.

 And you also can't disable copying on classes or dynamic arrays.

Unless all ranges have to be structs, but then we probably need a 
function/syntax to ask for a range from a class or dynamic array.

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 11:13:02 AM MST Nick Treleaven via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 13:48:56 UTC, Jonathan M Davis

 wrote:
 And how on earth would you use a range if it can't be copied?
 They're copied all over the place and have to be to even be
 used. You copy them when you pass them to any range-based
 function. You copy them when you return them from a range-based
 function. You copy them when they get wrapped by another range.
 You copy them when you pass them to foreach. Disabling copying
 on ranges would make them borderline useless.

 Maybe they could be moved instead of copied when the original is
 not needed.

Do you really want to have to call move every time that you call a
range-based function or want to wrap one range in another? Immovable types
are incredibly un-user-friendly in general.

It also would mean that ranges wouldn't work with foreach with how foreach
currently works, since it copies the range. And if it doesn't copy the
range, then ranges other than dynamic arrays would have different semantics
when being iterated with foreach than dynamic arrays do, which would also be
a big problem.

 And you also can't disable copying on classes or dynamic arrays.

 Unless all ranges have to be structs, but then we probably need a
 function/syntax to ask for a range from a class or dynamic array.

Forward ranges are supposed to mimic dynamic arrays. Having to wrap dynamic
arrays to use them as ranges would be very annoying. And for what? To just
avoid copying? Yes, if copying ranges is illegal, then you've given them all
the same copy semantics, but the result is incredibly unwieldy, and it means
that you can't have dynamic arrays and forward ranges use the same code
without wrapping dynamic arrays. And are people really going to want to be
wrapping dynamic arrays in another type just so that they can make them
uncopyable and the be forced to call move all over the place instead?

The most friendly way for forward ranges to function is to make them more
properly emulate dynamic arrays as they were originally intended to do. The
reason that they don't already is so that we could support classes as
ranges, which while it should work, is not the typical use case, and we can
allow them to be forward ranges almost as easily with a wrapper struct as we
can by having a save function.

If forward ranges all have the same copy semantics as dynamic arrays, then
they become easy to use correctly without having bugs due to stuff like
forgetting to call save - and without a bunch of compiler errors, because
you forgot to use move all over the place. For forward ranges, it would be a
huge win, and it's exactly what we've talked about doing for years now. We
just haven't done it, because doing it in the current version of Phobos
would break existing code.

The bigger problem is then what to do with basic input ranges, because they
can't have the same copy semantics, but the fact that they can't implement
something like save already hamstrings them enough that arguably they should
be treated as something separate anyway. So, by just giving them a separate
API, we simplify the situation with forward ranges considerably, and we
avoid the classes of bugs that come with code being written to work with
both basic input ranges and forward ranges. Of course, there are some cases
where having code written to work with both works just fine, and having
separate APIs could be annoying in those cases, but basic input ranges are
limited enough that most code doesn't support them anyway.

Obviously, the tradeoffs are debatable, but I think that it's clear that
ideal situation for forward ranges is to have them all have the same copy
semantics as dynamic arrays.

- Jonathan M Davis

Jan 21 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 Ultimately, I'm inclined to argue that we should give basic 
 input ranges a new API - not just because it would allow them 
 to use reference counting, but also because the current input 
 range API tends to force basic input ranges to cache the value 
 of front (which if anything, encourages basic input ranges to 
 be pseudo-reference types and could result in an extra layer of 
 indirection in some cases if they're forced to be reference 
 types). It would be annoying in some cases to require that 
 different functions be used for basic input ranges and forward 
 ranges (though overloading could obviously be used to avoid 
 needing different names), but it's already often the case that 
 code isn't really designed to work with both, and overloading 
 on the category of range being used is already fairly common, 
 since different range capabilities allow for different 
 implementations. So, given that it would prevent whole classes 
 of copying bugs as well as potentially remove the requirement 
 to cache front for basic input ranges, I think that a separate 
 API for basic input ranges is warranted. What I would propose 
 for that would be a single function

     auto next();

We could use java Iterator api in this case which has also `bool 
hasNext()` function.
Then new input range api can also be propagated to other types of 
range such as forward range and further.

 where next returns a nullable type where the value returned is 
 the next element in the range, with a null value being returned 
 if the range is empty. The return type would then need to 
 emulate a pointer - specifically, when casting it to bool, it 
 would be true if it's non-null and false if it's null, and 
 dereferencing it would give you the actual value if it's 
 non-null (with it being undefined behavior if you dereference 
 null). So, a basic input range of ints might define next as

     int* next();

 or alternatively, it could be something like

     Nullable!int next();

The returned type should be a tagged union, that can allow 
storing of actual value inside, or pointer to it, while it's 
interface will hide the details, i.e. get would look like this: 
`ref T get()`.

 though that wouldn't work with Phobos' current Nullable type, 
 since it doesn't support either casting to bool or 
 dereferencing (probably because it originally used alias this). 
 Either way, since we'd just require the specific API for the 
 return type rather than requirng a pointer, the range type 
 would have some flexibliity in what it used.

 This would then mean that if you wanted to loop over a basic 
 input range, you'd do something like

     for(auto front = range.next; !front; front = range.next)
     {
         ...
     }

 And if we go down this road, then we could also add this API to 
 foreach, allowing for code such as

     foreach(e; basicInputRange)
     {
         ...
     }

 to work - and unlike now, you could rely on the copy semantics 
 involved such that you would know that you could then break out 
 of the loop and continue to use the range (whereas right now, 
 you can't safely break out of a foreach and then continue to 
 use the range that you were iterating over). Of course, for

Input ranges could just be disallowed in foreach statements, that 
would solve different semantics between them and forward ranges, 
just like how in Java it is done with Stream api.

 What I'd like to get out of this thread is feedback on how much 
 sense this idea does or doesn't make and what problems I'm 
 missing that someone else is able to see. From what I can see, 
 the main negative is simply that you can't then write code that 
 works on both a basic input range and a forward range (though 
 you can obviously still create function overloads so that the 
 caller can use either), but given the issues surrounding copy 
 semantics, I think that that's probably ultimately a good 
 change (and the number of functions that can operate on basic 
 input ranges is already pretty limited anyway in comparison to 
 forward ranges - particularly with regards to generic 
 algorithms). It will also make it much easier to discuss the 
 separation between basic input ranges and forward ranges, which 
 IMHO can be too easy to lose or confuse as things stand.

Imho, this proposal is complicated, and unnecessarily complicates 
construction of ranges, making them less appealing to implement 
in user code. I'd opt for restricting copying completely, and 
allow copying through `.save` only.

The `.next` method proposal is a good improvement though, with 
addition of `.hasNext` method at minimum.

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi via Digitalmars-
d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis
 wrote:
 We could use java Iterator api in this case which has also `bool
 hasNext()` function.

Well, the key change the API that would really matter IMHO would be to merge
front and popFront, since that would eliminate the need for basic input
ranges to cache front, which many of them are forced to do right now. I'd
have to think about it more to see whether having hasNext instead of
returning a nullable type from next would be an improvement, make it worse,
or be more or less equal. A lot of that depends on how it would affect the
typical implementation of a basic input range.

 Then new input range api can also be propagated to other types of
 range such as forward range and further.

Part of the point here is to _not_ do that, because they fundamentally have
different semantics.

 where next returns a nullable type where the value returned is
 the next element in the range, with a null value being returned
 if the range is empty. The return type would then need to
 emulate a pointer - specifically, when casting it to bool, it
 would be true if it's non-null and false if it's null, and
 dereferencing it would give you the actual value if it's
 non-null (with it being undefined behavior if you dereference
 null). So, a basic input range of ints might define next as

     int* next();

 or alternatively, it could be something like

     Nullable!int next();

 The returned type should be a tagged union, that can allow
 storing of actual value inside, or pointer to it, while it's
 interface will hide the details, i.e. get would look like this:
 `ref T get()`.

Requiring the pointer API already does that. It's just that checking whether
the value is null means casting to bool, and you dereference the return
value rather than calling get. The implementation would be quite free to
then use a Nullable or some other type which does whatever it wants
internally so long as casting to bool tells you whether the object contains
a value, and dereferencing it gives you the value.

 Input ranges could just be disallowed in foreach statements, that
 would solve different semantics between them and forward ranges,
 just like how in Java it is done with Stream api.

They could be, but that wouldn't be very user-friendly. And it doesn't solve
the different semantics of copying them.

 Imho, this proposal is complicated, and unnecessarily complicates
 construction of ranges, making them less appealing to implement
 in user code. I'd opt for restricting copying completely, and
 allow copying through `.save` only.

 The `.next` method proposal is a good improvement though, with
 addition of `.hasNext` method at minimum.

Restricting copying would make ranges borderline unusable. They have to be
able to be passed around and be wrappable, which means either copying them
or moving them, and moving them would be very un-user-friendly, since it
would require explicit calls to move calls all over the place. What we
really want to be able to do in general is treat ranges like dynamic arrays
(and part of the reason to give basic input ranges a different API is
because the can't have the same copy semantics as dynamic arrays).

The suggested API would actually simplify forward ranges considerably,
because then most of them wouldn't have to implement save any longer, and
those few that do would just implement the necessary semantics via their
copy constructor (or a more advanced implementation might use
reference-counting, which would be somewhat more complex, but the vast
majority of ranges would not be in that situation). And the result would be
that forward ranges in general could be treated just like dynamic arrays,
which would simplify the vast majority of range-based code and allow
range-based code to rely on the copy and assignment semantics instead of
having to carefully avoid using ranges which have been copied and avoid
assignment like they have to do now. So, as far as forward ranges go, this
seems to me like it's very much a simplification, not something complicated.
If this is coming across as complicated, then it's probably because of all
of the explanatory text I had to give as to why these changes should be
made. But the changes themselves simplify things for forward ranges.

The part that's arguably more complex is simply that the basic input ranges
then have to have a separate API, but since they fundamentally have
different cropy semantics and can't be used in most range-based functions
anyway, I don't think that it complicates things much. And it makes the
distinction and behavioral differences between basic input ranges and
forward ranges much clearer than they typically are now, which would make
understanding ranges easier.

- Jonathan M Davis

Jan 21 2024

Paul Backus <snarwin gmail.com> writes:

On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis 
wrote:
 On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi 
 via Digitalmars- d wrote:
 Then new input range api can also be propagated to other types 
 of range such as forward range and further.

 Part of the point here is to _not_ do that, because they 
 fundamentally have different semantics.

I think maybe we're losing sight of the big picture in this 
discussion.

The big-picture, overarching goal of the range API is to provide 
a common interface between data structures and algorithms. You 
have N data structures and M algorithms, and by using this common 
interface you can implement all N*M combinations in O(N+M) code. 
Each data structure implements the range API once, and each 
algorithm is implemented once using the range API.

If you split the API in two, by making input streams (basic input 
ranges) and forward ranges completely disjoint, you undermine 
this goal. Now, each data structure has to implement *two* APIs, 
and each algorithm has to be implemented *twice*, once for input 
streams and once for forward ranges.

In practice, what's going to happen is library authors will 
simply not bother to implement both, and we will end up with gaps 
where many (data structure, algorithm) pairs are not covered.

I appreciate the desire to provide consistent copying semantics, 
but if it comes to a choice between that and having a single 
unified API, I will choose the unified API, warts and all.

 Restricting copying would make ranges borderline unusable. They 
 have to be able to be passed around and be wrappable, which 
 means either copying them or moving them, and moving them would 
 be very un-user-friendly, since it would require explicit calls 
 to move calls all over the place.

Worth noting that generic range algorithms already have to 
account for the existence of non-copyable types, since even if a 
range itself is copyable, its elements may not be.

Also, as I pointed out in an earlier reply to Walter [1], there 
are legitimate use-cases for non-copyable input streams, where 
making them copyable would incur a significant performance 
overhead.

I think the correct solution to this is something like DIP 1040, 
which makes non-copyable types less of a pain to work with in 
general.

[1] 
https://forum.dlang.org/post/sullaynnrjufookbuijt forum.dlang.org

---

In light of the points above, my proposed copying semantics for 
input streams are:

1. Input streams MAY be non-copyable, but are not required to be.
2. If you copy an input stream and then call next() on the 
original, the behavior of both the original and the copy is 
unspecified.

That is, I think we should give up on having 100% consistent 
copying semantics in this one case, in order to keep the overall 
API unified (by supporting UFCS next()) and avoid unnecessary 
pessimization of range algorithms.

The good news is that with these semantics, if you write you 
generic code conservatively and treat all input streams as 
non-copyable, you'll get the right behavior in all cases. And if 
you don't, you'll get a compile-time error as soon as you 
actually try to pass in a non-copyable input stream, and you'll 
know exactly how to fix it. So this design is still an 
improvement on the status quo.

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 10:22:44 PM MST Paul Backus via Digitalmars-d 
wrote:
 On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis

 wrote:
 On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi

 via Digitalmars- d wrote:
 Then new input range api can also be propagated to other types
 of range such as forward range and further.

 Part of the point here is to _not_ do that, because they
 fundamentally have different semantics.


 If you split the API in two, by making input streams (basic input
 ranges) and forward ranges completely disjoint, you undermine
 this goal. Now, each data structure has to implement *two* APIs,
 and each algorithm has to be implemented *twice*, once for input
 streams and once for forward ranges.

 In practice, what's going to happen is library authors will
 simply not bother to implement both, and we will end up with gaps
 where many (data structure, algorithm) pairs are not covered.

In practice, basic input ranges don't work with the vast majority of
algorithms anyway. Some do (e.g. map), but pretty much anything that needs
to do more than a simple transformation on the elements ends up needing a
forward range. I really don't think that we're going to lose much by forcing
basic input ranges to be separate.

 Restricting copying would make ranges borderline unusable. They
 have to be able to be passed around and be wrappable, which
 means either copying them or moving them, and moving them would
 be very un-user-friendly, since it would require explicit calls
 to move calls all over the place.

 Worth noting that generic range algorithms already have to
 account for the existence of non-copyable types, since even if a
 range itself is copyable, its elements may not be.

Do they? Non-copyable types do not work with ranges right now. They don't
work with foreach right now. Non-copyable types have their uses to be sure,
but they're also fairly niche. Realistically, very little range-based code
is going to work with them because of how restricted they are to deal with,
and unless you're explicitly testing code with them, you're not going to end
up writing code that works with them. We could bend over backwards to try to
make them work in Phobos, but no one else is going to do that unless they're
using non-copyable types in ranges themselves (which almost no one will be
doing even if we support it), and trying to support non-copyable types will
actively make range-base code worse in many cases, because it will require
calling front multiple times, and many range-based algorithms do real work
in front rather than simply returning a value (e.g. map works that way).

Granted, range-based code in general doesn't tend to be very disciplined
about whether it calls front once or several times, but forcing it to be
several times in order to support non-copyable types will have a real cost
for code that is _way_ more common than non-copyable types.

Personally, I think that we should make it very explicit that non-copyable
types are not supported by ranges. They don't work with them now, and I
don't think that it's even vaguely worth it to try to make ranges work with
them. The cost is too high for too little benefit.

 In light of the points above, my proposed copying semantics for
 input streams are:

 1. Input streams MAY be non-copyable, but are not required to be.
 2. If you copy an input stream and then call next() on the
 original, the behavior of both the original and the copy is
 unspecified.

 That is, I think we should give up on having 100% consistent
 copying semantics in this one case, in order to keep the overall
 API unified (by supporting UFCS next()) and avoid unnecessary
 pessimization of range algorithms.

 The good news is that with these semantics, if you write you
 generic code conservatively and treat all input streams as
 non-copyable, you'll get the right behavior in all cases. And if
 you don't, you'll get a compile-time error as soon as you
 actually try to pass in a non-copyable input stream, and you'll
 know exactly how to fix it. So this design is still an
 improvement on the status quo.

I really think that trying to support non-copyable types is not worth it -
either for the ranges themselves or for their elements. It's not something
that very many people are even going to think about, let alone do. So, if it
were supported, realistically, it would just be with the algorithms in
Phobos and with the code that the few people using such types wrote for
themselves. And even in Phobos, even if we wanted to support it,
realistically, it wouldn't work a lot of the time until the few people that
cared complained about it, because very few programmers would even think
about testisg with non-copyable types. It would be _far_ worse than the
situation we have now with code needing to call save to work correctly and
yet failing to call save all over the place, because most ranges don't need
it. Even with Phobos, we have had many bugs over the years due to stuff like
forgetting to call save or reusing a range, because you have to test with
way too many range types to catch all of the edge cases. IMHO, we need to be
trying to reduce the number of edge cases, not increasing them.

And it's not like non-copyable types have ever worked properly with ranges -
so not supporting them is not removing capabiltiies. It's just not adding
capabilities that a select few would like to have. And while I feel for
them, I really think that it's far too niche to support given how costly it
is to support.

- Jonathan M Davis

Jan 22 2024

Paul Backus <snarwin gmail.com> writes:

On Monday, 22 January 2024 at 23:07:28 UTC, Jonathan M Davis 
wrote:
 In practice, basic input ranges don't work with the vast 
 majority of algorithms anyway. Some do (e.g. map), but pretty 
 much anything that needs to do more than a simple 
 transformation on the elements ends up needing a forward range. 
 I really don't think that we're going to lose much by forcing 
 basic input ranges to be separate.

Here's a very rough approximation of the number of algorithms in 
Phobos that work with basic input ranges:

     $ grep 'isInputRange' std/algorithm/*.d | wc -l
     153

For comparison, here are the other range types:

     $ grep 'isForwardRange' std/algorithm/*.d | wc -l
     141
     $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l
     42
     $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l
     62

So, I would say that your hypothesis is not supported by the 
data, in this case.

 Do they? Non-copyable types do not work with ranges right now. 
 They don't work with foreach right now. Non-copyable types have 
 their uses to be sure, but they're also fairly niche.

Work is being done to make ranges work with non-copyable types 
(e.g. [1]).

Foreach works if the elements are rvalues, or the elements are 
lvalues and you use ref for the loop variable. This is the 
correct, expected behavior.

I think a big part of the reason non-copyable types are not 
widely used is that a lot of existing library code was written 
without support for them in mind. Since we have an opportunity to 
start from scratch with Phobos v3, there is no reason to repeat 
this mistake.

[1] https://github.com/dlang/phobos/pull/8721

 trying to support non-copyable types will actively make 
 range-base code worse in many cases, because it will require 
 calling front multiple times, and many range-based algorithms 
 do real work in front rather than simply returning a value 
 (e.g. map works that way).

Have you ever actually written generic code that works with 
non-copyable types? I have, and it mostly involves (a) passing 
things by ref, (b) using 'auto ref' and core.lifetime.forward a 
lot, and (c) checking isCopyable!T before trying to copy things.

I have no idea why you think this would require calling front 
multiple times, or doing extra work. Perhaps you could try 
writing out some example code to clarify?

 Personally, I think that we should make it very explicit that 
 non-copyable types are not supported by ranges. They don't work 
 with them now, and I don't think that it's even vaguely worth 
 it to try to make ranges work with them. The cost is too high 
 for too little benefit.

First, as a general rule, I don't think that we, as 
standard-library authors, should be in the business of telling 
our users that their use-cases are illegitimate just because it 
makes our jobs more difficult.

Second...how do you know the benefit is too little? You thought 
earlier that "the vast majority of algorithms" didn't work with 
basic input ranges, even though the actual data shows that isn't 
true. Unless you have concrete evidence, I think it is better to 
avoid making assumptions about which use-cases are important and 
which aren't.

Jan 22 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, January 22, 2024 5:57:49 PM MST Paul Backus via Digitalmars-d 
wrote:
 On Monday, 22 January 2024 at 23:07:28 UTC, Jonathan M Davis

 wrote:
 In practice, basic input ranges don't work with the vast
 majority of algorithms anyway. Some do (e.g. map), but pretty
 much anything that needs to do more than a simple
 transformation on the elements ends up needing a forward range.
 I really don't think that we're going to lose much by forcing
 basic input ranges to be separate.

 Here's a very rough approximation of the number of algorithms in
 Phobos that work with basic input ranges:

      $ grep 'isInputRange' std/algorithm/*.d | wc -l
      153

 For comparison, here are the other range types:

      $ grep 'isForwardRange' std/algorithm/*.d | wc -l
      141
      $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l
      42
      $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l
      62

 So, I would say that your hypothesis is not supported by the
 data, in this case.

I will admit that the number is higher than I would have expected, but most
of those algorithms are on the simple side where the algorithm itself
doesn't care whether the range is consumed, whereas the caller very much
cares. They were written to accept basic input ranges, because the algorithm
itself will work with a basic input range, but that doesn't mean that it can
actually be used with basic input ranges to do anything useful. For
instance, you can use startsWith on a basic input range, but you pretty much
never would, because then you've consumed it - and actually, it's not even
safe to use the rest of the range in generic code after passing it to
startsWith, because it was copied, and the original could be in an invalid
state. The same goes with any function that takes an argument and gives you
a result that doesn't wrap the input range and return it or return a
partially iterated range.

In my experience, it's very rare that it's possible to make anything that
isn't incredibly simple work with a basic input range due to the fact that
looking at it consumes it and makes it impossible to then do anything useful
with it. And it's easy to write simple algorithms which will technically
work with a basic input range but which aren't actually useful with a basic
input range.

So, sure, a surprising number of the algorithms in std.algorithm work with
isInputRange, but a number of them really shouldn't be used with basic input
ranges. Now, a few of them might be more useful if you could rely on a basic
input range having reference semantics, but their composibility in general
is pretty poor.

Either way, I think that the gain from being able to require specific copy
semantics for both forward ranges and basic input ranges is worth it. And
since such a requirement couldn't actually be enforced at compile time
unless we made it impossible to have basic input ranges that were structs
(which would then make reference counting impossible), then a specific
algorithm could choose to add a function to forward ranges via UFCS and
support them if it didn't care about the copy semantics, but like with
 trusted code, I think that it should then be up to the programmer to make
sure that it does the right thing so that basic input ranges can be required
to have reference semantics.

 Foreach works if the elements are rvalues, or the elements are
 lvalues and you use ref for the loop variable. This is the
 correct, expected behavior.

Well, I couldn't get it to work when I tried, but either way, it means that
you can't use foreach on a range of non-copyable types in generic code,
because the way that you use foreach would have to be the same regardless of
the elment type. Of course, you could use static if and branch the code
based on whether the types are copyable or not, but that's yet another
complication to range implementations which are already often far too
complicated with too many static if branches.

 trying to support non-copyable types will actively make
 range-base code worse in many cases, because it will require
 calling front multiple times, and many range-based algorithms
 do real work in front rather than simply returning a value
 (e.g. map works that way).

 Have you ever actually written generic code that works with
 non-copyable types? I have, and it mostly involves (a) passing
 things by ref, (b) using 'auto ref' and core.lifetime.forward a
 lot, and (c) checking isCopyable!T before trying to copy things.

 I have no idea why you think this would require calling front
 multiple times, or doing extra work. Perhaps you could try
 writing out some example code to clarify?

If you can't rely on front returning something that you can copy, then
you're going to have to call front multiple times any time that you need to
access front multiple times - either that or call move on it, which you
can't necessarily do safely, because that would alter the range. Algorithms
that don't have to look at front more than once don't have that problem, but
plenty of algorithms need to do multiple things with front. And often, it's
much better to just store the result of front rather than call front
multiple times, because it's often the case that front does actual work that
you'd rather not do over and over again (and it's particularly bad when you
have stuff like the result of map where the transformation function it was
given allocates on the heap).

So the result of having to worry about non-copyable types is that algorithms
that need to access front multiple times are either going to end up calling
front multiple times, harming performance in many cases, or they're going to
have static if branches to change what they're doing based on the return
type of front, which is going to make them that much more complex to
implement.

 Personally, I think that we should make it very explicit that
 non-copyable types are not supported by ranges. They don't work
 with them now, and I don't think that it's even vaguely worth
 it to try to make ranges work with them. The cost is too high
 for too little benefit.

 First, as a general rule, I don't think that we, as
 standard-library authors, should be in the business of telling
 our users that their use-cases are illegitimate just because it
 makes our jobs more difficult.

 Second...how do you know the benefit is too little? You thought
 earlier that "the vast majority of algorithms" didn't work with
 basic input ranges, even though the actual data shows that isn't
 true. Unless you have concrete evidence, I think it is better to
 avoid making assumptions about which use-cases are important and
 which aren't.

IMHO, it's already way too complicated to write correct range-based code,
and adding non-copyable types into the mix is very much a step too far.
Obviously, there can be disagreement on that, but it's already very
difficult to write range-based code that actually works properly with all
types of ranges. You have to test it with a ridiculous number of types, and
most people just don't do that. Phobos often doesn't do that. And every
capability that we add to ranges means even more static if blocks and even
more variations that need to be tested. The only way to fix that is to
simplify things by putting greater restrictions and requirements on how
ranges work, whereas making them work with a wider variety of things is
going in the opposite direction of that.

Realistically, the best that even might happen here for folks who want
ranges to work with non-copyable types is that a subset of the algorithms in
Phobos will be made to work with them, but third-party libraries are not
going to go to that level of effort (and I don't think that it's all
reasonable to tell them that they should). And Phobos' range-based code will
become even more complicated and that much harder to maintain and make sure
that it works correctly. And we already have too much trouble making sure
that Phobos' range-based code works correctly.

So, my take on it is that supporting non-copyable types is simply not worth
the effort given that it's going to complicate the code and the tests that
much more, and they're already too complex.

And so, if it's my choice, then we simply won't support non-copyable types
with ranges. Obviously, I'm not the sole voice in this, and ultimately, the
choice is going to be up to Walter and Atila (probably Atila given that he's
supposed to be the one in charge of Phobos), so I could definitely be
overruled on this, but I very much want to see it become easier to write
correct range-based code, not have it become harder.

- Jonathan M Davis

Jan 22 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Tuesday, 23 January 2024 at 00:57:49 UTC, Paul Backus wrote:
 
     $ grep 'isForwardRange' std/algorithm/*.d | wc -l
     141
     $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l
     42
     $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l
     62

 So, I would say that your hypothesis is not supported by the 
 data, in this case.

Oh, I hope someone pays attension and makes arrays more important 
then doubly linked lists

Jan 24 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Monday, 22 January 2024 at 05:22:44 UTC, Paul Backus wrote:
 You have N data structures and M algorithms, and by using this 
 common interface you can implement all N*M combinations in 
 O(N+M) code. Each data structure implements the range API once, 
 and each algorithm is implemented once using the range API.

 If you split the API in two, by making input streams (basic 
 input ranges) and forward ranges completely disjoint, you 
 undermine this goal. Now, each data structure has to implement 
 *two* APIs, and each algorithm has to be implemented *twice*, 
 once for input streams and once for forward ranges.

Not necessarily. First, any algorithm requiring a forward range 
wouldn’t be implemented twice. And, most importantly, a data 
structure would not offer input range iteration if it can offer 
forward range. Why would it?

 In practice, what's going to happen is library authors will 
 simply not bother to implement both, and we will end up with 
 gaps where many (data structure, algorithm) pairs are not 
 covered.

The only realistic concern I see with the proposal is algorithms 
being implemented requiring forward ranges when they could (also) 
work with basic input ranges. Overconstraining input is not 
specific to ranges, though.

What Jonathan never mentioned in his proposal, but something 
that, as far as I see, is true: One can wrap any forward range to 
be an input range. Heck, the pointer to any forward range is an 
input range, assuming there is a global `next(ForwardRange*)` 
function utilizing the forward range primitives. If you pass 
`&forwardRange` to an input range algorithm, it may copy it 
around as it pleases, it will change the underlying range just 
like an input range should; and you’d have the `&` for an 
indication that it does.

A problem I see with this is composition. Requiring the lifting 
to a pointer requires lvalues, and composition lives on not 
having those. The issue will generally be negligible, as most 
range algorithms returning a range will for the most part want to 
preserve as many of the abilities of the original range as 
possible. The difference in the algorithm would be that, instead 
of:
```d
// implement input range
static if (/* base range is forward range */)
     // implement forward range
static if (/* base range is bidi range */
     // implement bidi range
// etc.
```
You would have:
```d
static if (/* base range is input range */)
     // implement input range
else:
static if (/* base range is forward range */)
     // implement forward range
static if (/* base range is bidi range */)
     // implement bidi range
// etc.
```

Jan 23 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis 
wrote:
 I'd have to think about it more to see whether having hasNext 
 instead of returning a nullable type from next would be an 
 improvement, make it worse, or be more or less equal. A lot of 
 that depends on how it would affect the typical implementation 
 of a basic input range.

Well one advantage of `hasNext` is that you can check in advance 
whether range is empty or not, without the need to pop front 
element.

 Part of the point here is to _not_ do that, because they 
 fundamentally have different semantics.

That would mean that you need to overload a method to accept both 
input, and forward ranges when foreach is not used, right?

 If this is coming across as complicated, then it's probably 
 because of all of the explanatory text I had to give as to why 
 these changes should be made. But the changes themselves 
 simplify things for forward ranges.

Could be the case :). Perhaps it would be nice to see it as a 
code example.

Jan 22 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, January 22, 2024 2:17:08 AM MST Alexandru Ermicioi via Digitalmars-
d wrote:
 On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis

 wrote:
 I'd have to think about it more to see whether having hasNext
 instead of returning a nullable type from next would be an
 improvement, make it worse, or be more or less equal. A lot of
 that depends on how it would affect the typical implementation
 of a basic input range.

 Well one advantage of `hasNext` is that you can check in advance
 whether range is empty or not, without the need to pop front
 element.

true

 Part of the point here is to _not_ do that, because they
 fundamentally have different semantics.

 That would mean that you need to overload a method to accept both
 input, and forward ranges when foreach is not used, right?

Yes, though in practice, most functions tend to require forward ranges
anyway. So, while the separation will be annoying in some cases, in general,
I wouldn't expect it to be a big problem, and if anything, it will help make
the distinction between basic input ranges and forward ranges clearer, which
will probably make it easier to reason about the code. Obviously, we'll have
to actually do it to see what the impact is, but realistically, about all
you can do with basic input ranges is iterate through them once and possibly
do transformations on their elements as you do it. The fact that they can be
iterated separately from a foreach loop is beneficial, and some range-based
functions are useful with them (e.g. map), but most range-based algorithms
need forward ranges.

 If this is coming across as complicated, then it's probably
 because of all of the explanatory text I had to give as to why
 these changes should be made. But the changes themselves
 simplify things for forward ranges.

 Could be the case :). Perhaps it would be nice to see it as a
 code example.

Well for forward ranges, you'd basically be writing what you write now
except that you wouldn't have to worry about calling save anywhere, which
would signifcantly reduce the number of bugs that crop into range-based
code. So, about all I could really show you would be normal range-based code
that just doesn't call save, because it doesn't need to.

It would also allow you to assign one range to another (assuming that
they're the same type), which you can't do right now without relying on the
specific copy semantics of a range. But for forward ranges, the main change
here would simply be that you could assume that copying a range was
equivalent to save, which is purely a simplification. If there's a
complication, it's only that ranges which are currently reference types
would have to do something smarter than saving in their copy constructor if
they wanted to be efficient. So, they'd work with something like

struct Range(T)
{
    T front() { return _range.front; }
    void popFront() { _range.popFront(); }
    bool empty() { return _range.empty; }

    // or copy constructor
    this(this)
    {
        _range = _range.save;
    }

    private static class RefRange { ... }

    private RefRange _range;
}

but if you wanted them to be more efficient, you'd need something like

struct Range(T)
{
    T front() { return _range.front; }

    void popFront()
    {
        if(_range._refCount != 1)
            _range = _range.save; // ref count now 1

        _range.popFront();
    }

    bool empty() { return _range.empty; }

    // or copy constructor
    this(this)
    {
        _range.incRef();
    }

    ~this()
    {
        _range.decRef();
    }

    private static class RefRange { ... }

    private RefRange _range;
}

So, there would be some extra complication for reference type forward
ranges, but most forward ranges already do the equivalent of calling save
when they're copied, which is a large part of why you frequently end up with
bugs when you try to use a range that needs you to call save explicitly.

- Jonathan M Davis

Jan 22 2024

Paul Backus <snarwin gmail.com> writes:

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 From what I can see, the main negative is simply that you can't 
 then write code that works on both a basic input range and a 
 forward range (though you can obviously still create function 
 overloads so that the caller can use either)

With the proposed design, it would be possible to implement 
next() for forward ranges as a UFCS function:

     auto next(FR)(FR fr)
         if (std.v2.range.isForwardRange!FR)
     {
         if (fr.empty)
         {
             return Nullable!(ElementType!FR).init;
         }
         else
         {
             scope(success) fr.popFront;
             return nullable(fr.front);
         }
     }

So, a function written to operate on basic input ranges would be 
able to accept any kind of range, same as today.

Jan 21 2024

Paul Backus <snarwin gmail.com> writes:

On Sunday, 21 January 2024 at 14:51:33 UTC, Paul Backus wrote:
 With the proposed design, it would be possible to implement 
 next() for forward ranges as a UFCS function:

     auto next(FR)(FR fr)

Correction: fr should be passed by ref (or maybe auto ref).

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 From what I can see, the main negative is simply that you can't
 then write code that works on both a basic input range and a
 forward range (though you can obviously still create function
 overloads so that the caller can use either)

 With the proposed design, it would be possible to implement
 next() for forward ranges as a UFCS function:

      auto next(FR)(FR fr)
          if (std.v2.range.isForwardRange!FR)
      {
          if (fr.empty)
          {
              return Nullable!(ElementType!FR).init;
          }
          else
          {
              scope(success) fr.popFront;
              return nullable(fr.front);
          }
      }

 So, a function written to operate on basic input ranges would be
 able to accept any kind of range, same as today.

True, but then you have a function with the same copy semantics problem that
we have now, since the forward range wouldn't have reference semantics. It
would need to be fully wrapped in another type which gave it reference
semantics to do that.

So, while someone could do something like this, I question that we should
encourage it or support it with anything in Phobos.

- Jonathan M Davis

Jan 21 2024

Paul Backus <snarwin gmail.com> writes:

On Monday, 22 January 2024 at 02:10:32 UTC, Jonathan M Davis 
wrote:
 On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via 
 Digitalmars-d wrote:
 With the proposed design, it would be possible to implement 
 next() for forward ranges as a UFCS function:


[...]
 True, but then you have a function with the same copy semantics 
 problem that we have now, since the forward range wouldn't have 
 reference semantics. It would need to be fully wrapped in 
 another type which gave it reference semantics to do that.

You caught me--I forgot to mark the parameter as 'ref'. Dangers 
of writing code directly into a message without testing it first, 
I suppose. :)

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 7:30:40 PM MST Paul Backus via Digitalmars-d 
wrote:
 On Monday, 22 January 2024 at 02:10:32 UTC, Jonathan M Davis

 wrote:
 On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via

 Digitalmars-d wrote:
 With the proposed design, it would be possible to implement


 next() for forward ranges as a UFCS function:


 [...]

 True, but then you have a function with the same copy semantics
 problem that we have now, since the forward range wouldn't have
 reference semantics. It would need to be fully wrapped in
 another type which gave it reference semantics to do that.

 You caught me--I forgot to mark the parameter as 'ref'. Dangers
 of writing code directly into a message without testing it first,
 I suppose. :)

Well, even if the parameter is ref (which it would need to be), the core
problem is that the range-based function then has to deal with passing
around an object that could have value semantics even though basic input
ranges would be required to have reference semantics with the proposed
changes. Obviously, range-based code _can_ work without caring about the
copy semantics of the range type, but the main thrust of the proposed
changes is to try to require specific copy semantics from different
categories of ranges so that we can avoid the bugs that come with different
range types having different copy semantics (as well as making it possible
to write code that takes advantage of the specific copy semantics of a
particular category of range; even simply being able to assign one range to
another in generic code would be a big help to some code).

Ideally, code would be able to rely on the copy and assignment semantics of
ranges - and that means not trying to treat basic input ranges and forward
ranges as the same thing without actually wrapping one in a way that forces
it to be the same thing (be it by turning a forward range into a reference
type with the basic input range API or by wrapping a basic input range in a
struct which caches the data such that it can implement the forward range
API). Simply slapping the functions for the other API on one of the two
types will result in ranges that won't behave properly, which will work with
some functions and fail miserably with others.

- Jonathan M Davis

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 From what I can see, the main negative is simply that you can't
 then write code that works on both a basic input range and a
 forward range (though you can obviously still create function
 overloads so that the caller can use either)

 With the proposed design, it would be possible to implement
 next() for forward ranges as a UFCS function:

      auto next(FR)(FR fr)
          if (std.v2.range.isForwardRange!FR)
      {
          if (fr.empty)
          {
              return Nullable!(ElementType!FR).init;
          }
          else
          {
              scope(success) fr.popFront;
              return nullable(fr.front);
          }
      }

 So, a function written to operate on basic input ranges would be
 able to accept any kind of range, same as today.

Actually. What occurs to me is that if we made this function take the range
by pointer, then it would be possible to give a forward range the correct
copy semantics. You'd then have to take its address (or put it on the heap)
to make it work, but then you'd get reference semantics even though it's a
forward range.

Obviously, that's less than ideal if you want a function to accept both
basic input ranges and forward ranges, but if it's a function that really
only needs basic input ranges and doesn't benefit from operating on a
forward range, then it would be simple enough to create an overload which
takes the address - and if you really want to be returning the range from
the function, then you wouldn't want it to be operating on both basic input
ranges and forward ranges anyway, since you'd usually be looking to wrap the
result, in which case, you'd need to do all of the annoying static ifs to
determine which functions from the range API you could slap on the wrapper
range based on the capabilities of the one being passed in.

- Jonathan M Davis

Jan 22 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Sunday, 21 January 2024 at 14:51:33 UTC, Paul Backus wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
 wrote:
 From what I can see, the main negative is simply that you 
 can't then write code that works on both a basic input range 
 and a forward range (though you can obviously still create 
 function overloads so that the caller can use either)

 With the proposed design, it would be possible to implement 
 next() for forward ranges as a UFCS function:

     auto next(FR)(FR fr)
         if (std.v2.range.isForwardRange!FR)
     {
         if (fr.empty)
         {
             return Nullable!(ElementType!FR).init;
         }
         else
         {
             scope(success) fr.popFront;
             return nullable(fr.front);
         }
     }

 So, a function written to operate on basic input ranges would 
 be able to accept any kind of range, same as today.

It wouldn’t work. That’s the point. It would compile, probably, 
but by the proposal, an input range would have guaranteed 
reference semantics, whereas a forward range has value semantics 
(as far as iteration state is concerned). That means, if you call 
`next` on a copy of an input range, *both* ranges, the old one 
and the copy, are on the following element, but doing that on a 
forward range, only the copy would be on the next element and the 
original would not. Both guaranteed.

I’d rather say that a forward range must have a ` disable`d 
`next`, i.e. `isForwardRange` should test if `next` is present 
and if it is, refuse to acknowledge the forward range. By the 
proposal, input and forward ranges are fundamentally 
incommensurate concepts on the usage side; granted, only when it 
comes to copying, but that’s enough.

Jan 23 2024

Walter Bright <newshound2 digitalmars.com> writes:

Thanks for bringing this up. Improving range semantics is something we should 
not pass up the opportunity for with Phobos3. (I consider Phobos1 the D1 
version, Phobos2 the one we have currently.)

Another thing that should be visited is the ability to lock/unlock a range. For 
example, a range that represents stdout needs to be lockable.

Jan 21 2024

Paul Backus <snarwin gmail.com> writes:

On Sunday, 21 January 2024 at 17:15:58 UTC, Walter Bright wrote:
 Another thing that should be visited is the ability to 
 lock/unlock a range. For example, a range that represents 
 stdout needs to be lockable.

Surely this would be better handled outside the range API.

E.g., you call a library function that locks stdout and returns a 
range, and then when you're done with the range, it unlocks 
stdout again in its destructor. But the functions you pass the 
range to (map, filter, whatever) don't need to know anything 
about locking.

This *would* be a good use-case for a non-copyable range, since 
you only want to unlock once (and reference counting would add 
significant overhead). So even if we don't *require* ranges to be 
non-copyable, we probably want to *allow* then to be.

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 10:15:58 AM MST Walter Bright via Digitalmars-d 
wrote:
 Thanks for bringing this up. Improving range semantics is something we
 should not pass up the opportunity for with Phobos3. (I consider Phobos1
 the D1 version, Phobos2 the one we have currently.)

 Another thing that should be visited is the ability to lock/unlock a range.
 For example, a range that represents stdout needs to be lockable.

I will have to study that more to see what's needed there. If it's the kind
of thing that requires higer level locking than simply locking within the
implementations of existing range-based functions (e.g. if it had to be
locked across multiple calls), then that likely either merits a non-standard
extension that the user has to worry about that's specific to stdio stuff,
or we'll need to come up with some sort of higher level addition to handle
it. I very much doubt that it's the sort of thing that's going to make much
sense to worry about within most range-based functions though. I will add it
to the list of things to look into though.

There are several other minor changes that I think are going to be needed to
the range API (e.g. requiring that random-access ranges implement opDollar
rather than letting it be optional, since it's useless if it's optional like
it is now), but the core point of this thread was to try to sort out the
issue of the copy and assignment semantics for ranges, and the best solution
for that seems to be to get rid of save (which would allow all forward
ranges to behave like dynamic arrays as far as their copy, assignment, and
iteration semantics go) and to make basic input ranges distinct from forward
ranges, since they fundamentally can't have the same semantics.

The issue of tail-const is going to be another big one, though that could
result in us having to switch to a head/tail model for forward ranges
instead of front/popFront - either that, or we're going to need a language
improvement of some kind to sort it out, since we need to be able to
generically get a tail-const copy of a forward range, whereas we currently
don't have a way to convert const(Range!Element) to Range!(const Element).
We can't even assume that const(Range!Element) is the same as
const(Range!(const Element)). So, there are definitely further issues that
will need to be hashed out beyond the copy semantics. I just decided to
start with that.

- Jonathan M Davis

Jan 21 2024

Sebastiaan Koppe <mail skoppe.eu> writes:

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 I've been thinking about this for a while now, but with the 
 next version of Phobos which is in the early planning stages, 
 we really should do some redesigning of ranges.

Thanks for the write up.

Another issue I have encountered is around priming. Sometimes it 
happens in `empty`, sometimes in a private helper function and 
sometimes even in the constructor.

If priming happens in `empty`, then it can't be `const`. Which is 
strange because you would expect `empty` not to mutate.

I have been thinking about having an explicit build and iteration 
phase, where priming happens when you switch from build to 
iteration.

The benefit is that implementers have a clear place where to 
prime the range.

 What I would propose for that would be a single function

    auto next();

 where next returns a nullable type where the value returned is 
 the next element in the range, with a null value being returned 
 if the range is empty.

What about:

```
     bool next(Callable)(Callable c) {
         if (empty)
             return false;
         c(front());
         popFront();
         return true;
     }
```

It has the benefit of not needing to unbox a Nullable/Pointer.

Jan 21 2024

Paul Backus <snarwin gmail.com> writes:

On Sunday, 21 January 2024 at 18:26:37 UTC, Sebastiaan Koppe 
wrote:
 What about:

     bool next(Callable)(Callable c) {
         if (empty)
             return false;
         c(front());
         popFront();
         return true;
     }

 It has the benefit of not needing to unbox a Nullable/Pointer.

If your concern is performance, an indirect function call is 
going to hurt a lot more than a pointer dereference.

If your concern is aesthetics, dereferencing a pointer or 
unboxing a Nullable is far less ugly than passing a callback.

     // With pointer
     while (auto p = r.next)
         doStuffWith(*p);

     // With callback
     while (r.next((e) { doStuffWith(e); })) {}

Jan 21 2024

Ben Jones <fake fake.fake> writes:

On Sunday, 21 January 2024 at 22:40:55 UTC, Paul Backus wrote:
 If your concern is aesthetics, dereferencing a pointer or 
 unboxing a Nullable is far less ugly than passing a callback.

I've been writing a fair amount of Kotlin (an a tiny bit of 
Swift) and the trailing lambda syntax is very nice for making 
these APIs look nice.  Bascially, if the last argument to a 
function is a function, you can put it's curly braces after the 
parentheses:  `range.map{ it*2}.filter{ it < 10}`

The sauce that Kotlin has that D doesn't is that 1-arg lambdas 
get an implicit parameter called `it` and explicit parameters 
come after the opening curly brace: `{ p1, p2 -> body }`.  I'm 
also not sure how it would work with templates.  Generally, I 
think it's much easier to think about lifetime guarantees with 
functional APIs, so something like that might make them more 
palatable syntactically.

Jan 21 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Monday, 22 January 2024 at 05:54:59 UTC, Ben Jones wrote:
 On Sunday, 21 January 2024 at 22:40:55 UTC, Paul Backus wrote:
 If your concern is aesthetics, dereferencing a pointer or 
 unboxing a Nullable is far less ugly than passing a callback.

 I've been writing a fair amount of Kotlin (an a tiny bit of 
 Swift) and the trailing lambda syntax is very nice for making 
 these APIs look nice.  Bascially, if the last argument to a 
 function is a function, you can put it's curly braces after the 
 parentheses:  `range.map{ it*2}.filter{ it < 10}`

 The sauce that Kotlin has that D doesn't is that 1-arg lambdas 
 get an implicit parameter called `it` and explicit parameters 
 come after the opening curly brace: `{ p1, p2 -> body }`.  I'm 
 also not sure how it would work with templates.  Generally, I 
 think it's much easier to think about lifetime guarantees with 
 functional APIs, so something like that might make them more 
 palatable syntactically.

This is off-topic, but I want to mention that the implicit `it` 
confused the hell out of me when I learned Kotlin. The feature 
literally saves measly 6 keystrokes for `it -> `, 4 if you ignore 
spaces. In D, `map!(it => it * 2)` IMO, is pretty good (e.g. 
compared to C++). D’s lambdas have their flaws, but lack of 
brevity isn’t one.

Jan 23 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 11:26:37 AM MST Sebastiaan Koppe via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the
 next version of Phobos which is in the early planning stages,
 we really should do some redesigning of ranges.

 Thanks for the write up.

 Another issue I have encountered is around priming. Sometimes it
 happens in `empty`, sometimes in a private helper function and
 sometimes even in the constructor.

 If priming happens in `empty`, then it can't be `const`. Which is
 strange because you would expect `empty` not to mutate.

 I have been thinking about having an explicit build and iteration
 phase, where priming happens when you switch from build to
 iteration.

 The benefit is that implementers have a clear place where to
 prime the range.

Well, arguably, that's really an implementation detail of the range, and it
would be pretty annoying if range-based code had to worry about explicitly
priming a range (you're basically getting into two phase construction at
that point, which tends to be problematic). Ideally, once you have a range,
it's ready to use. So, while I could see saying what best practice was on
that, I'm not sure that I'd want to require that it be done a specific way.
Part of the problem is that in my experience, it sometimes makes sense to do
it one place, whereas with other ranges, it makes more sense to do it
somewhere else. And whether empty mutates really shouldn't matter so long as
it's not actually const (it has to be logically const to be sure, but that
doesn't mean that it needs to actually be const), though obviously, it can
become annoying to wrap a range when you want to make your empty const, and
you can't rely on the range your wrapping having empty be const. For better
or worse though, the current range API makes no requirements about const
(and I'm not sure if we want to add such requirements given how restrictive
D's const is). If we did require const anywhere on the range API though,
empty and length would be the obvious places.

That being said, I'm not sure that I've ever written a range that primes
anything in empty. Usually, the question is between what goes in front and
and what goes in popFront and how much has to go in the constructor for that
to work cleanly. And the answer is not always the same.

We may also want to rework how front and popFront work for forward ranges,
which could simplify the priming issue depending on what we did. Regardless
though, the priming issue is certainly something that we should be thinking
about. Even if we don't add anything to the API for it, it will be good to
keep it in mind for how it would affect the other range API functions for
any reworking of them that we might do.

Regardless, the big issues that were the point of this thread were the copy
and assignment semantics for ranges and what we would need to do fix those.
There are definitely other issues that we're going to have to sort out (e.g.
the tail-const issue).

 What I would propose for that would be a single function

    auto next();

 where next returns a nullable type where the value returned is
 the next element in the range, with a null value being returned
 if the range is empty.

 What about:

 ```
      bool next(Callable)(Callable c) {
          if (empty)
              return false;
          c(front());
          popFront();
          return true;
      }
 ```

 It has the benefit of not needing to unbox a Nullable/Pointer.

Well, that would basically be another sort of opApply, which is an idiom
that tends to be pretty hard to understand in comparison to the
alternatives. Getting a nullable type of some kind back is far easier to
read and understand, and the only real downside I see to it is that you get
bugs if someone tries to access the value when there isn't one. But ranges
already have that in general with front and empty, and I don't think that
it's been much of an issue in practice.

We could also go with next and hasNext like Alexandru suggested, in which
case, we wouldn't be returning a pointer or nullable type if that's the
concern. I'm not sure if that's ultimately better or worse though. In some
respects, it would be better to be able to put all of the range logic in a
single function, but it would also be nice in some situations to be able to
ask whether a basic input range has any elements without having to grab the
first one if it's there.

Another consideration that comes to mind is that we might want to add a
function that explicitly skips elements (since in some cases, you can skip
work if you don't actually need the next element), and since next both pops
the element and returns it, we wouldn't have to worry about the state of
front in the process (whereas adding some kind of skip function to the
current range API would result in an invalid front).

- Jonathan M Davis

Jan 21 2024

Sebastiaan Koppe <mail skoppe.eu> writes:

On Monday, 22 January 2024 at 04:38:13 UTC, Jonathan M Davis
wrote:
On Sunday, January 21, 2024 11:26:37 AM MST Sebastiaan Koppe
I have been thinking about having an explicit build and
iteration phase, where priming happens when you switch from
build to iteration.

The benefit is that implementers have a clear place where to
prime the range.

Well, arguably, that's really an implementation detail of the
range, and it would be pretty annoying if range-based code had
to worry about explicitly priming a range (you're basically
getting into two phase construction at that point, which tends
to be problematic). Ideally, once you have a range, it's ready
to use. So, while I could see saying what best practice was on
that, I'm not sure that I'd want to require that it be done a
specific way.

Part of the problem is that in my experience, it sometimes
makes sense to do it one place, whereas with other ranges, it
makes more sense to do it somewhere else.

Yes exactly. My point is that if there is an explicit place to do
priming, that uncertainty would go away.

That being said, I'm not sure that I've ever written a range
that primes anything in empty. Usually, the question is between
what goes in front and and what goes in popFront and how much
has to go in the constructor for that to work cleanly. And the
answer is not always the same.

With 2-phase ranges it would always be the same. Anyway, here are
some Phobos examples:

- `filter` does its priming in empty,
https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1381

- `filterByDirectional` does a while loop in its constructor,
https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1581

- `chunkBy` calls empty in its constructor,
https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1931

- `substitute` calls empty and popFront in its constructor,
https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L6909

Taken together, pretty much anything can happen just by
constructing a range. You don't even need to iterate it!

Regardless, the big issues that were the point of this thread
were the copy and assignment semantics for ranges and what we
would need to do fix those. There are definitely other issues
that we're going to have to sort out (e.g. the tail-const
issue).

Ultimately it all comes together or it doesn't.

What about:

```
bool next(Callable)(Callable c) {
if (empty)
return false;
c(front());
popFront();
return true;
}
```

It has the benefit of not needing to unbox a Nullable/Pointer.

Well, that would basically be another sort of opApply, which is
an idiom that tends to be pretty hard to understand in
comparison to the alternatives.

Actual usual would just use `foreach` of course. But in cases
where you want to iterate manually you have to deal with the
added ugliness/complexity, fair.

I do think its easier to access the item by `ref` this way,
instead of having to do a pointer in some nullable wrapper.

Getting a nullable type of some kind back is far easier to read
and understand, and the only real downside I see to it is that
you get bugs if someone tries to access the value when there
isn't one. But ranges already have that in general with front
and empty, and I don't think that it's been much of an issue in
practice.

I don't particularly like APIs that have a requirement to call
methods in a particular order, I much rather have APIs that can't
be used wrong. Pit of success and all that.

In practice though, I messed up only a few times, and then added
a condition on `empty` and continued on.

We could also go with next and hasNext like Alexandru
suggested, in which case, we wouldn't be returning a pointer or
nullable type if that's the concern. I'm not sure if that's
ultimately better or worse though. In some respects, it would
be better to be able to put all of the range logic in a single
function, but it would also be nice in some situations to be
able to ask whether a basic input range has any elements
without having to grab the first one if it's there.

With some sources, determining whether there is a next item
actually involves *getting* the next item. Instead, like `empty`,
I think `hasNext` ought to be `const`.

Jan 22 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 Ultimately, I'm inclined to argue that we should give basic 
 input ranges a new API - not just because it would allow them 
 to use reference counting, but also because the current input 
 range API tends to force basic input ranges to cache the value 
 of front (which if anything, encourages basic input ranges to 
 be pseudo-reference types and could result in an extra layer of 
 indirection in some cases if they're forced to be reference 
 types). It would be annoying in some cases to require that 
 different functions be used for basic input ranges and forward 
 ranges (though overloading could obviously be used to avoid 
 needing different names), but it's already often the case that 
 code isn't really designed to work with both, and overloading 
 on the category of range being used is already fairly common, 
 since different range capabilities allow for different 
 implementations. So, given that it would prevent whole classes 
 of copying bugs as well as potentially remove the requirement 
 to cache front for basic input ranges, I think that a separate 
 API for basic input ranges is warranted. What I would propose 
 for that would be a single function

     auto next();

 where next returns a nullable type where the value returned is 
 the next element in the range, with a null value being returned 
 if the range is empty. The return type would then need to 
 emulate a pointer - specifically, when casting it to bool, it 
 would be true if it's non-null and false if it's null, and 
 dereferencing it would give you the actual value if it's 
 non-null (with it being undefined behavior if you dereference 
 null). So, a basic input range of ints might define next as

     int* next();

While I agree with most of what you wrote, there is one very big 
problem with this. Namely, current input ranges are suddenly 
considered to be forward ranges. Changing the semantics of what 
it means to only define `front`, `popFront`, and `empty` changes 
the semantics of existing code.

And this is not something you can solve with some kind of 
versioning. The only way I can think of is to alter the name of 
one or more primitives to prevent accidental usage.

-Steve

Jan 21 2024

Paul Backus <snarwin gmail.com> writes:

On Sunday, 21 January 2024 at 19:24:20 UTC, Steven Schveighoffer 
wrote:
 While I agree with most of what you wrote, there is one very 
 big problem with this. Namely, current input ranges are 
 suddenly considered to be forward ranges. Changing the 
 semantics of what it means to only define `front`, `popFront`, 
 and `empty` changes the semantics of existing code.

 And this is not something you can solve with some kind of 
 versioning. The only way I can think of is to alter the name of 
 one or more primitives to prevent accidental usage.

One possibility is to change the forward range interface to 
something like

     bool empty();
     Element head();
     typeof(this) tail();

In addition to distinguishing new-style forward ranges from 
old-style ones, this interface would also allow forward ranges to 
be `const` (although iterating a `const` forward range would only 
be possible with recursion).

`popFront` could then be implemented as a UFCS function:

     void popFront(R)(ref R r)
         if (std.v2.range.isForwardRange!R && isAssignable!R)
     {
         r = r.tail;
     }

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 3:46:06 PM MST Paul Backus via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 19:24:20 UTC, Steven Schveighoffer

 wrote:
 While I agree with most of what you wrote, there is one very
 big problem with this. Namely, current input ranges are
 suddenly considered to be forward ranges. Changing the
 semantics of what it means to only define `front`, `popFront`,
 and `empty` changes the semantics of existing code.

 And this is not something you can solve with some kind of
 versioning. The only way I can think of is to alter the name of
 one or more primitives to prevent accidental usage.

 One possibility is to change the forward range interface to
 something like

      bool empty();
      Element head();
      typeof(this) tail();

 In addition to distinguishing new-style forward ranges from
 old-style ones, this interface would also allow forward ranges to
 be `const` (although iterating a `const` forward range would only
 be possible with recursion).

 `popFront` could then be implemented as a UFCS function:

      void popFront(R)(ref R r)
          if (std.v2.range.isForwardRange!R && isAssignable!R)
      {
          r = r.tail;
      }

Well, I think that the reality of the matter here is that if we want to deal
with const and immutable properly (which is another issue that we need to be
tackling with the range API redesign), we're either going to need to switch
to a more functional-style API like this, or we're going to need to add some
sort of opTailConst to the language, or we're going to need to add some
other sort of language improvement that would allow use to convert
const(Range!Element) to Range!(const Element) generically. As things stand,
const and ranges really don't interact well together outside of fairly
limited situations (e.g. using dynamic arrays for everything), and
containers in particular get screwed by it.

So, we may very well decide that we need to move forward ranges to something
like head and tail, though I'm not terribly happy with the idea, and I'm not
sure what the performance implications are (though LDC at least could
probably make it not matter). It would also require another change to
foreach, but pretty much _anything_ we do here to fix the tail-const problem
with ranges is going to require some kind of language change.

Another problem with the head/tail approach is that we then have to come up
with names for bidirectional ranges, and all of that could get pretty
confusing. But ultimately, that can be solved.

Either way, the main things that I think needs to happen to fix the copy
semantics problems are to require that forward ranges have the same copy
semantics as dynamic arrays (at least with regards to their iteration
state), which means that save is no longer a thing, and we need basic input
ranges to be reference types, which means that they should probably have a
different API from forward ranges. I suppose that we could then change the
forward range API and leave the basic input range API alone, but given that
the current API requires caching front, which causes some problems for basic
input ranges, I'm inclined to give basic input ranges a different API. If we
then also need to change the API of forward ranges beyond removing save,
then so be it, though that will result in a lot more code having to be
changed when it's updated to use or be in Phobos v3.

- Jonathan M Davis

Jan 21 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, January 21, 2024 12:24:20 PM MST Steven Schveighoffer via 
Digitalmars-d wrote:
 While I agree with most of what you wrote, there is one very big
 problem with this. Namely, current input ranges are suddenly
 considered to be forward ranges. Changing the semantics of what
 it means to only define `front`, `popFront`, and `empty` changes
 the semantics of existing code.

 And this is not something you can solve with some kind of
 versioning. The only way I can think of is to alter the name of
 one or more primitives to prevent accidental usage.

Yeah, I was orignally hoping that this sort of thing could just be solved
with editions, but regardless of what we do with editions and Phobos, third
party code won't necessarily be updated to match, and it won't even
necessarily be obvious whether it has been. To an extent, that doesn't
really matter, and we could theoretically punt on the issue based on the
idea that any basic input ranges written to follow the Phobos v2 API would
just behave incorrectly with functions written to follow the Phobos v3 API
for forward ranges, so testing would catch it, but that's obviously not
ideal. It _could_ be caught if we required all basic input ranges to be
either class references or pointers to structs, while requiring that all
forward ranges be structs or dynamic arrays (and we will have to do the
latter), but that would arguably restrict the implementation of basic input
ranges too much (e.g. reference counting wouldn't be possible).

For better or worse, it's definitely an argument for switching to a more
functional-style API like Paul mentioned and like has occasionally been
discussed in the past (usually as a potentially solution to the tail-const
and tail-immutable issue). That would actually allow us to solve other
problems rather than simply being a minor API change just to make it so that
we could detect the difference, though I'm not sure what all of the other
consequences of such a change would be (particularly with regards to
performance).

- Jonathan M Davis

Jan 21 2024

Atila Neves <atila.neves gmail.com> writes:

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 I've been thinking about this for a while now, but with the 
 next version of Phobos which is in the early planning stages, 
 we really should do some redesigning of ranges. Most of their 
 API is just fine how it is, but there are some aspects of it 
 which really should be changed if we want them to be better 
 (the most obvious one being the removal of auto-decoding). But 
 what I'd like to discuss specifically in this thread is fixing 
 - and defining - the semantics of copying and assigning to 
 ranges. Specifically, the semantics of stuff like

 [...]

I don't think I've ever encountered a situation where reference 
ranges would have been desirable - I've never used one.

I think that `.save` was a historical mistake, and that ranges 
that can be copied are forward ranges. Something like a range 
reading from stdin or a socket would/should disable the 
copy/postblit constructors.

Has anyone here used the class-based ranges?

Jan 22 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 23/01/2024 4:41 AM, Atila Neves wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
 I've been thinking about this for a while now, but with the next 
 version of Phobos which is in the early planning stages, we really 
 should do some redesigning of ranges. Most of their API is just fine 
 how it is, but there are some aspects of it which really should be 
 changed if we want them to be better (the most obvious one being the 
 removal of auto-decoding). But what I'd like to discuss specifically 
 in this thread is fixing - and defining - the semantics of copying and 
 assigning to ranges. Specifically, the semantics of stuff like

 [...]

 
 I don't think I've ever encountered a situation where reference ranges 
 would have been desirable - I've never used one.
 
 I think that `.save` was a historical mistake, and that ranges that can 
 be copied are forward ranges. Something like a range reading from stdin 
 or a socket would/should disable the copy/postblit constructors.
 
 Has anyone here used the class-based ranges?

I did experiment with them when I first learned ranges.

Nothing is set up for them.

But they do two things extremely well:

1. Create confusion, they are not the definition of what a range is, nor 
are they used anywhere.
2. Of course since then I've also learned that they offer opportunity to 
prevent optimizations, so not a great solution to be tied to.

I want signatures. I've told Adam Wilson that if you want me involved 
with ranges for PhobosV3 at the very least I want the compile time 
aspect. The current situation with ranges not being properly documented 
in API is just not good enough.

Signatures do have a runtime aspect, they can represent ranges if you 
need a reference, we can also implement something like save via that 
(even if the implementation doesn't have it).

Jan 22 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via Digitalmars-d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
 I've been thinking about this for a while now, but with the next
 version of Phobos which is in the early planning stages, we really
 should do some redesigning of ranges. Most of their API is just fine
 how it is, but there are some aspects of it which really should be
 changed if we want them to be better (the most obvious one being the
 removal of auto-decoding). But what I'd like to discuss specifically
 in this thread is fixing - and defining - the semantics of copying
 and assigning to ranges. Specifically, the semantics of stuff like
 
 [...]

 
 I don't think I've ever encountered a situation where reference ranges
 would have been desirable - I've never used one.

It's useful in recursive-descent parsers where you expect the range to
have advanced past whatever a recursive call has consumed.


 I think that `.save` was a historical mistake, and that ranges that
 can be copied are forward ranges. Something like a range reading from
 stdin or a socket would/should disable the copy/postblit constructors.

Meaning they can only be passed by reference?


 Has anyone here used the class-based ranges?

Yes.  They are useful when you have a range whose type varies at
runtime. For example:

	auto myFunc(R)(R range, int type) {
		switch (type) {
			case 1:
				return range.map!(a => a*2);
			case 2:
				auto auxRange = helperFunc(range);
				return range.filter!(a => a % 2)
					.chain(auxRange);
			case 3:
				return range.chain(recurrence!(...));
		}
		assert(0);
	}

This code will not compile because there is no common return type. To
fix it, wrap each return in out of the class-based range objects:

	auto myFunc(R)(R range, int type) {
		switch (type) {
			case 1:
				return range.map!(a => a*2)
					.inputRangeObject;
			case 2:
				auto auxRange = helperFunc(range);
				return range.filter!(a => a % 2)
					.chain(auxRange)
					.inputRangeObject;
			case 3:
				return range.chain(recurrence!(...))
					.inputRangeObject;
		}
		assert(0);
	}


T

-- 
They say that "guns don't kill people, people kill people." Well I think the
gun helps. If you just stood there and yelled BANG, I don't think you'd kill
too many people. -- Eddie Izzard, Dressed to Kill

Jan 22 2024

Atila Neves <atila.neves gmail.com> writes:

On Monday, 22 January 2024 at 17:07:05 UTC, H. S. Teoh wrote:
 On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via 
 Digitalmars-d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
 wrote:
 [...]

 
 I don't think I've ever encountered a situation where 
 reference ranges would have been desirable - I've never used 
 one.

 It's useful in recursive-descent parsers where you expect the 
 range to have advanced past whatever a recursive call has 
 consumed.

Why can't this be done with value ranges?

 I think that `.save` was a historical mistake, and that ranges 
 that can be copied are forward ranges. Something like a range 
 reading from stdin or a socket would/should disable the 
 copy/postblit constructors.

 Meaning they can only be passed by reference?

Or moved.

Jan 23 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Tue, Jan 23, 2024 at 09:19:24AM +0000, Atila Neves via Digitalmars-d wrote:
 On Monday, 22 January 2024 at 17:07:05 UTC, H. S. Teoh wrote:
 On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via Digitalmars-d
 wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
 [...]

 
 I don't think I've ever encountered a situation where reference
 ranges would have been desirable - I've never used one.

 
 It's useful in recursive-descent parsers where you expect the range
 to have advanced past whatever a recursive call has consumed.

 
 Why can't this be done with value ranges?

[...]

Because the recursive call advances a value copy of the passed range,
and when it returns, the range in the caller is still in the old
position.


T

-- 
"Hi." "'Lo."

Jan 23 2024

Paul Backus <snarwin gmail.com> writes:

On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:
 I think that `.save` was a historical mistake, and that ranges 
 that can be copied are forward ranges. Something like a range 
 reading from stdin or a socket would/should disable the 
 copy/postblit constructors.

This was my initial thought too, but it turns out it's not quite 
that simple.

If we *require* basic input ranges to be non-copyable, then types 
which are inherently copyable (in particular, `T[]`) will be 
exclude from being basic input ranges. In order to use a `T[]` 
with an algorithm that expects a basic input range, you would 
have to wrap it in a `struct` that has copying disabled.

It's also a big problem for composability. If half the algorithms 
require your range to be copyable, and the other half require it 
to be non-copyable, you have to keep track of [what color each 
algorithm is][1] whenever you write a range pipeline, or write a 
new algorithm that calls an existing one internally.

[1]: 
https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

Jan 22 2024

Atila Neves <atila.neves gmail.com> writes:

On Monday, 22 January 2024 at 17:18:56 UTC, Paul Backus wrote:
 On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:
 I think that `.save` was a historical mistake, and that ranges 
 that can be copied are forward ranges. Something like a range 
 reading from stdin or a socket would/should disable the 
 copy/postblit constructors.

 This was my initial thought too, but it turns out it's not 
 quite that simple.

 If we *require* basic input ranges to be non-copyable,

I don't think we should that because algorithms that don't care 
if a range is copiable (i.e. forward) or not should be usable 
with any input range.

 then types which are inherently copyable (in particular, `T[]`) 
 will be exclude from being basic input ranges. In order to use 
 a `T[]` with an algorithm that expects a basic input range, you 
 would have to wrap it in a `struct` that has copying disabled.

That would indeed be a mistake, but the forward range 
concept/template constraint is a... err... "subtype" of input 
ranges, so it *should* work. My idea here is changing the concept 
of forward ranges from ones that can have `.save` called on them 
to ones that can be copied. If that works, of course.

The case against the idea is that right now, range authors have 
to opt-in to "forwardness", whereas copies have to be disabled to 
opt-out.

 It's also a big problem for composability. If half the 
 algorithms require your range to be copyable, and the other 
 half require it to be non-copyable, you have to keep track of 
 [what color each algorithm is][1] whenever you write a range 
 pipeline, or write a new algorithm that calls an existing one 
 internally.

This is the case already (sort of) with functions that expected 
isForwardRange!R vs isInputRange!R, the difference being a 
relationship of subtyping that makes it that all forward ranges 
are also input ranges.

Jan 23 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Monday, 22 January 2024 at 17:18:56 UTC, Paul Backus wrote:
 On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:
 I think that `.save` was a historical mistake, and that ranges 
 that can be copied are forward ranges. Something like a range 
 reading from stdin or a socket would/should disable the 
 copy/postblit constructors.

 This was my initial thought too, but it turns out it's not 
 quite that simple.

 If we *require* basic input ranges to be non-copyable, then 
 types which are inherently copyable (in particular, `T[]`) will 
 be exclude from being basic input ranges.

You are thinking about this the wrong way. Copyability becomes a 
trait of forward ranges, instead of input ranges. An input range 
would not be copyable, therefore code which takes input ranges 
would have to take them by reference, or the caller would have to 
ensure no other copies exist. But you can still take forward 
ranges this way.

It is not hard to do -- you just remove the requirement for 
copying from `isInputRange`, and add it to `isForwardRange`.


 In order to use a `T[]` with an algorithm that expects a basic 
 input range, you would have to wrap it in a `struct` that has 
 copying disabled.

A `T[]` can be used without copying it. So why can't you pass 
that into an algorithm that takes an input range?

 It's also a big problem for composability. If half the 
 algorithms require your range to be copyable, and the other 
 half require it to be non-copyable, you have to keep track of 
 [what color each algorithm is][1] whenever you write a range 
 pipeline, or write a new algorithm that calls an existing one 
 internally.

The assertion is that if the range is copyable, it's now a 
forward range and copying is allowed. It is *on you* as the range 
developer to ensure copying does not work if it's an input range. 
The algorithm that says `isInputRange` would have to avoid 
copying, but that also works for forward ranges as well. There 
would be no need for multiple colors.

Now, in *practical* terms, algorithms that are input-range 
compatible likely would need more careful attention to avoid 
copies than they currently do.

-Steve

Jan 23 2024

Paul Backus <snarwin gmail.com> writes:

On Tuesday, 23 January 2024 at 17:25:47 UTC, Steven Schveighoffer 
wrote:
 You are thinking about this the wrong way. Copyability becomes 
 a trait of forward ranges, instead of input ranges. An input 
 range would not be copyable, therefore code which takes input 
 ranges would have to take them by reference, or the caller 
 would have to ensure no other copies exist. But you can still 
 take forward ranges this way.

 It is not hard to do -- you just remove the requirement for 
 copying from `isInputRange`, and add it to `isForwardRange`.

 [...]

 The assertion is that if the range is copyable, it's now a 
 forward range and copying is allowed. It is *on you* as the 
 range developer to ensure copying does not work if it's an 
 input range.

The only problem with this is that what you are asking the range 
developer to do is impossible. :)

Consider what would happen if the user creates a pointer to a 
non-copyable input range.  The pointer (a) is inherently copyable 
and (b) implements the range interface (via implicit 
dereferencing), but (c) the copies are not independent.

(If your response to this is "ok, we'll just have isForwardRange 
explicitly reject pointers", keep in mind that this problem also 
applies to generic wrapper types that use `alias this` or 
`opDispatch`. Are we really going to tell our users that it's 
their fault if their code breaks because they put a range pointer 
inside one of these types?)

So, there has to be some other way of distinguishing forward 
ranges from input ranges, beyond just checking whether they can 
be copied. For example, by using `next()` for input ranges vs. 
`head()`/`tail()` for forward ranges. (As [you pointed out 
earlier][1], reusing the `empty`/`front`/`popFront` API would not 
be a great idea.)

Once you have this other method of distinguishing input and 
forward ranges, requiring forward ranges to be copyable no longer 
implies that pure input ranges MUST be non-copyable. So we can 
allow "pointer-to-input-range" to be an input range itself, even 
though it supports copying, and nothing will break.

[1]: 
https://forum.dlang.org/post/yppqryhpumcucqspkqsx forum.dlang.org

Jan 23 2024

Nick Treleaven <nick geany.org> writes:

On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:
 (If your response to this is "ok, we'll just have 
 isForwardRange explicitly reject pointers", keep in mind that 
 this problem also applies to generic wrapper types that use 
 `alias this` or `opDispatch`. Are we really going to tell our 
 users that it's their fault if their code breaks because they 
 put a range pointer inside one of these types?)

`isForwardRange` could require the range *type* to have `empty, 
front, popFront` defined. Not sure if that solves your `alias 
this`/`opDispatch` points. This would mean arrays would need the 
range methods built-in rather than using UFCS.

Jan 24 2024

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 24 January 2024 at 12:26:43 UTC, Nick Treleaven 
wrote:
 On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:
 (If your response to this is "ok, we'll just have 
 isForwardRange explicitly reject pointers", keep in mind that 
 this problem also applies to generic wrapper types that use 
 `alias this` or `opDispatch`. Are we really going to tell our 
 users that it's their fault if their code breaks because they 
 put a range pointer inside one of these types?)

 `isForwardRange` could require the range *type* to have `empty, 
 front, popFront` defined. Not sure if that solves your `alias 
 this`/`opDispatch` points. This would mean arrays would need 
 the range methods built-in rather than using UFCS.

It could, but I don't think it's a great idea, because it would 
make ranges a special case. Along with all of the usual reasons 
special cases are bad, ranges are one of the first things new D 
programmers are going to encounter, and are pretty much the 
canonical example of introspection and structural typing in D. If 
we make ranges behave differently, users who learn from them are 
going to have a much harder time understanding every *other* use 
of introspection.

Jan 24 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:
 On Tuesday, 23 January 2024 at 17:25:47 UTC, Steven 
 Schveighoffer wrote:
 You are thinking about this the wrong way. Copyability becomes 
 a trait of forward ranges, instead of input ranges. An input 
 range would not be copyable, therefore code which takes input 
 ranges would have to take them by reference, or the caller 
 would have to ensure no other copies exist. But you can still 
 take forward ranges this way.

 It is not hard to do -- you just remove the requirement for 
 copying from `isInputRange`, and add it to `isForwardRange`.

 [...]

 The assertion is that if the range is copyable, it's now a 
 forward range and copying is allowed. It is *on you* as the 
 range developer to ensure copying does not work if it's an 
 input range.

 The only problem with this is that what you are asking the 
 range developer to do is impossible. :)

 Consider what would happen if the user creates a pointer to a 
 non-copyable input range.  The pointer (a) is inherently 
 copyable and (b) implements the range interface (via implicit 
 dereferencing), but (c) the copies are not independent.

 (If your response to this is "ok, we'll just have 
 isForwardRange explicitly reject pointers", keep in mind that 
 this problem also applies to generic wrapper types that use 
 `alias this` or `opDispatch`. Are we really going to tell our 
 users that it's their fault if their code breaks because they 
 put a range pointer inside one of these types?)

Wouldn't that be on the person who wrapped the type? I wouldn't 
expect `RefCounted` to care that it's wrapping a range for 
instance.

We could explicitly forbid pointers and classes, because you 
can't obviously implement value semantics using those.

Indeed, since copying is defined by default, it's more onorous to 
properly create an input range. But so what? Can I tell you how 
many times people copy an input range without realizing the 
consequences today? At least if it's *explicit* that input ranges 
should not be copyable, then you will be more aware of the issue 
as the compiler complains about it.

 So, there has to be some other way of distinguishing forward 
 ranges from input ranges, beyond just checking whether they can 
 be copied. For example, by using `next()` for input ranges vs. 
 `head()`/`tail()` for forward ranges. (As [you pointed out 
 earlier][1], reusing the `empty`/`front`/`popFront` API would 
 not be a great idea.)

Yeah, I agree that if we change semantics, we have to change the 
naming, for the sake of all existing code. But the naming could 
be consistent between input/forward ranges as it is today. I'm 
not keen on the head/tail idea, as it seems way more clunky than 
I would like, and I'm not sold on the compiler making sure the `x 
= x.tail` operation is going to be efficient.

I don't know how bad it would be to deal with, it's not something 
I have put a lot of consideration into, but I had thought that 
this mechanism of preventing copying for things that shouldn't be 
used from multiple copies sounds like the proper way to prevent 
problems by default. Sometimes it's good to start with what is 
simple and correct, even if it's not what you want, and see if 
you can massage it into what makes sense.

-Steve

Jan 24 2024

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 24 January 2024 at 16:18:11 UTC, Steven 
Schveighoffer wrote:
 On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:
 (If your response to this is "ok, we'll just have 
 isForwardRange explicitly reject pointers", keep in mind that 
 this problem also applies to generic wrapper types that use 
 `alias this` or `opDispatch`. Are we really going to tell our 
 users that it's their fault if their code breaks because they 
 put a range pointer inside one of these types?)

 Wouldn't that be on the person who wrapped the type? I wouldn't 
 expect `RefCounted` to care that it's wrapping a range for 
 instance.

Yes, it would--which is exactly what I'd like to avoid. I don't 
want us to put ourselves in a position where we have to tell end 
users "you're holding it wrong" if they try to combine two 
generic library features like this.

In general, we should not make assumptions in our library code 
unless we are capable of enforcing them. Since we are, 
unfortunately, not capable of enforcing the assumption that all 
copyable ranges have forward-range semantics, we should not allow 
ourselves make it.

(Perhaps there's an opportunity for a language feature here? A 
`unique(T)` qualifier would help with this, for example.)

 Indeed, since copying is defined by default, it's more onorous 
 to properly create an input range. But so what? Can I tell you 
 how many times people copy an input range without realizing the 
 consequences today? At least if it's *explicit* that input 
 ranges should not be copyable, then you will be more aware of 
 the issue as the compiler complains about it.

I agree that permitting input ranges to be copyable puts more of 
a burden on the authors of range algorithms. However, given that 
there are edge cases where the burden of preventing copies *must* 
fall on someone other than the range's author, I would rather 
that person be the author of the generic algorithm than the end 
user.

My hope is that in practice, if Phobos itself contains some 
prominent examples of non-copyable ranges, these bugs will be 
found quickly during testing. That's obviously not as good as a 
reliable compile-time error, but it's not nothing.

 I don't know how bad it would be to deal with, it's not 
 something I have put a lot of consideration into, but I had 
 thought that this mechanism of preventing copying for things 
 that shouldn't be used from multiple copies sounds like the 
 proper way to prevent problems by default.

For what it's worth, I agree that in practice, most input ranges 
should be non-copyable. [That `readf` bug][1] we discussed a 
little while ago would never have existed if `LockingTextReader` 
had been non-copyable from the start, for example.

[1]: https://github.com/dlang/phobos/pull/8826

Jan 24 2024

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 24 January 2024 at 17:35:32 UTC, Paul Backus wrote:
 In general, we should not make assumptions in our library code 
 unless we are capable of enforcing them. Since we are, 
 unfortunately, not capable of enforcing the assumption that all 
 copyable ranges have forward-range semantics, we should not 
 allow ourselves make it.

Having thought about it some more, this is a bad argument, and I 
withdraw it.

The entire range API is based on convention to begin with. Of 
course there's no enforcement mechanism!

Jan 24 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, January 24, 2024 11:20:38 AM MST Paul Backus via Digitalmars-d 
wrote:
 On Wednesday, 24 January 2024 at 17:35:32 UTC, Paul Backus wrote:
 In general, we should not make assumptions in our library code
 unless we are capable of enforcing them. Since we are,
 unfortunately, not capable of enforcing the assumption that all
 copyable ranges have forward-range semantics, we should not
 allow ourselves make it.

 Having thought about it some more, this is a bad argument, and I
 withdraw it.

 The entire range API is based on convention to begin with. Of
 course there's no enforcement mechanism!

Yeah. We can enforce some stuff statically (e.g. we can require that a
forward range be a dynamic array or a struct, since if it's a class or a
pointer, we know for sure that it doesn't have the right copy semantics),
but we can't actually enforce the correct copy semantics statically - just
like we can't enforce the correct copy semantics with save right now. We
can't even enforce that popFront removes an element from the range or that
empty has anything to do with how many elements the range has. We can
improve the situation with the API (e.g. getting rid of the requirement for
save), and we can try to make it easier to use correctly and harder to use
incorrectly, but unfortunately, there's no way to not rely on convention on
some basis.

- Jonathan M Davis

Jan 24 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Wednesday, 24 January 2024 at 18:20:38 UTC, Paul Backus wrote:
 The entire range API is based on convention to begin with. Of 
 course there's no enforcement mechanism!

Isn't `isInputRange`, `isForwardRange`, and the rest, kind of 
enforcement machanism?

Jan 24 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, January 24, 2024 12:23:45 PM MST Alexandru Ermicioi via 
Digitalmars-d wrote:
 On Wednesday, 24 January 2024 at 18:20:38 UTC, Paul Backus wrote:
 The entire range API is based on convention to begin with. Of
 course there's no enforcement mechanism!

 Isn't `isInputRange`, `isForwardRange`, and the rest, kind of
 enforcement machanism?

Traits like that can test (and thus be used to enforce) _some_ of the
semantics, but there are plenty of things that can't be tested statically
(like what popFront actually does). They are able to test that certain code
will compile with the range API (or that certain code won't compile) using
the type that they're instantiated with, and they can test certain explicit
stuff about the type (e.g. enforcing that it's a struct if we want to do
that), but they can't actually test what the functions do.

So, while we can enforce that a forward range is copyable, we can't enforce
what its copy semantics are beyond disallowing stuff like classes or
pointers to structs, since those are clearly reference types. But the struct
itself could have a variety of copy semantics depending on its
implementation, and there's no way to determine that statically. Ultimately,
for a range to have the correct copy semantics, we have to rely on the
programmer to implement them correctly, just like we have to rely on them
implementing what front, popFront, and empty do correctly.

- Jonathan M Davis

Jan 24 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Wednesday, 24 January 2024 at 20:08:56 UTC, Jonathan M Davis 
wrote:

 Traits like that can test (and thus be used to enforce) _some_ 
 of the semantics, but there are plenty of things that can't be 
 tested statically (like what popFront actually does). They are 
 able to test that certain code will compile with the range API 
 (or that certain code won't compile) using the type that 
 they're instantiated with, and they can test certain explicit 
 stuff about the type (e.g. enforcing that it's a struct if we 
 want to do that), but they can't actually test what the 
 functions do.

 So, while we can enforce that a forward range is copyable, we 
 can't enforce what its copy semantics are beyond disallowing 
 stuff like classes or pointers to structs, since those are 
 clearly reference types. But the struct itself could have a 
 variety of copy semantics depending on its implementation, and 
 there's no way to determine that statically. Ultimately, for a 
 range to have the correct copy semantics, we have to rely on 
 the programmer to implement them correctly, just like we have 
 to rely on them implementing what front, popFront, and empty do 
 correctly.

That's what unit tests are for, while those traits act like 
pseudo interfaces. Maybe there could be a test suite provided in 
Phobos for user to run over his own range implementations, in his 
own unit tests. This would make easy to spot whether user's range 
does or does not follow proper logic.

Jan 24 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, January 24, 2024 2:31:19 PM MST Alexandru Ermicioi via 
Digitalmars-d wrote:
 On Wednesday, 24 January 2024 at 20:08:56 UTC, Jonathan M Davis

 wrote:
 Traits like that can test (and thus be used to enforce) _some_
 of the semantics, but there are plenty of things that can't be
 tested statically (like what popFront actually does). They are
 able to test that certain code will compile with the range API
 (or that certain code won't compile) using the type that
 they're instantiated with, and they can test certain explicit
 stuff about the type (e.g. enforcing that it's a struct if we
 want to do that), but they can't actually test what the
 functions do.

 So, while we can enforce that a forward range is copyable, we
 can't enforce what its copy semantics are beyond disallowing
 stuff like classes or pointers to structs, since those are
 clearly reference types. But the struct itself could have a
 variety of copy semantics depending on its implementation, and
 there's no way to determine that statically. Ultimately, for a
 range to have the correct copy semantics, we have to rely on
 the programmer to implement them correctly, just like we have
 to rely on them implementing what front, popFront, and empty do
 correctly.

 That's what unit tests are for, while those traits act like
 pseudo interfaces.

Of course that's what unit tests are for, but that doesn't enforce anything.
That's the programmers making sure that their code behaves the way that it
should - that their code follows the conventions required by the range API.
On the other hand, what Paul was talking about was actually statically
enforcing the semantics via template constraints rather than relying on
coventions, which is something that we can do with some portions of the
semantics of the range API but can't do for a good chunk of it.

 Maybe there could be a test suite provided in Phobos for user to run over
 his own range implementations, in his own unit tests. This would make easy
 to spot whether user's range does or does not follow proper logic.

Well, we can provide test ranges to use to test a function with, but not
much can be automated there, since the results would depend on what the
function actually did. However, if there were set of standard range types to
test with, then it would at least be easier to test a proper range of range
types. Phobos itself doesn't always do a good job with that though.

As for testing that a range behaves properly, that might be possible to
automate, but since it's usually the case that a range comes from a
function, that's not always straightforward. And while how the range should
behave is largely defined by the range API, not all ranges are generic with
regards to what they work with (e.g. what their element type is) or how
they're constructed, so writing a generic set of tests could be challenging.
Off the top of my head, I don't know how possible it would be, though I'm
sure that it could be made to work with a subset of ranges.

All in all, providing better testing tools is a good idea, but a lot of the
question here is what we can statically enforce, since the more that we can
statically enforce, the harder it will be to write range-based code that
does the wrong thing, and the harder it will be to pass types to a
range-based function which then won't behave properly even if the function
is written properly.

- Jonathan M Davis

Jan 24 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Wednesday, 24 January 2024 at 20:08:56 UTC, Jonathan M Davis 
wrote:
 Traits like that can test (and thus be used to enforce) _some_ 
 of the semantics, but there are plenty of things that can't be 
 tested statically (like what popFront actually does).

Even if you could, what would you even test? Especially for 
`popFront`, what does it mean to “pop front”? A no-op could be a 
reasonable `popFront` implementation for a range that represents 
a value infinitely repeated, i.e. something like `cycle(1)`.

Feb 01 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, January 22, 2024 8:41:35 AM MST Atila Neves via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the
 next version of Phobos which is in the early planning stages,
 we really should do some redesigning of ranges. Most of their
 API is just fine how it is, but there are some aspects of it
 which really should be changed if we want them to be better
 (the most obvious one being the removal of auto-decoding). But
 what I'd like to discuss specifically in this thread is fixing
 - and defining - the semantics of copying and assigning to
 ranges. Specifically, the semantics of stuff like

 [...]

 I don't think I've ever encountered a situation where reference
 ranges would have been desirable - I've never used one.

 I think that `.save` was a historical mistake, and that ranges
 that can be copied are forward ranges. Something like a range
 reading from stdin or a socket would/should disable the
 copy/postblit constructors.

Once you disable the copy/postblit constructor, you can't generally pass the
range around or wrap it in another range - at least not without explicit
calls to move. ref parameters and return values would solve some of that,
but it wouldn't solve all of it (wrapping in particular, which ranges do
heavily, won't work with ref). I would expect a range that can't be copied
to be too annoying to even bother with. And in fact, they can't even work
with foreach, because foreach copies the range that it iterates over (and
has to in order to have the semantics match what hapens with dynamic arrays,
though if we separate basic input ranges from forward ranges, then we can
change what happens with basic input ranges). IMHO, if we want to go the
route of trying to make them uncopyable, we pretty much might as well just
get rid of the concept of basic input ranges and require that they use
opApply (which isn't an entirely bad idea given how limited basic input
ranges are in practice anyway).

Basic input ranges are inherently either reference types or pseudo-reference
types, and they can be made to work cleanly if we just require that they be
reference types. Those are very different semantics from forward ranges,
which are value types with regards to their iteration state (assuming that
we require that copying them results in copies which can be independently
iterated rather than having save). And that's why I'm arguing that we should
split the two concepts rather than trying to treat forward ranges like an
extension of basic input ranges - especially since basic input ranges are
extremely hamstrung anyway, because you can't actually get an independent
copy of them, and in practice, most stuff needs that. And if we did go with
a solution for basic input ranges which made them non-copyable, then they
couldn't be used with the same code that uses forward ranges anyway, so we
might as well just give them a different API and stop treating them like the
same thing when they really aren't.

 Has anyone here used the class-based ranges?

Yes. Symmetry uses them. In general, it's best not to use them for forward
ranges, but when you're dealing with code where you can't determine the type
at compile time (like with an interpreter), then you need some some kind of
runtime polymorphism, and you really don't have much choice. That being
said, we can support that by wrapping such classes in structs and ensuring
that the structs have the correct copy semantics rather than allowing
forward ranges that are classes. So, it should be quite possible to get rid
of save without losing any functionality, and we'd get much cleaner copy
semantics in the process, making it so that forward ranges in general behave
more like dynamic arrays.

For basic input ranges, it should be perfectly fine for them to be classes -
desirable even - simply because you can't actually get independent copies of
them, and using a struct with pseudo-reference copy semantics is just
begging for bugs, because it means that mutating the copy puts the original
in an invalid state, whereas if we can require that all basic input ranges
be full-on reference types (which could include structs and pointers to
structs so long as they have the correct semantics), then we can actually
rely on sane, consistent copy semantics for basic input ranges. It's just
that they would be different from the copy semantics of forward ranges -
which would also be true if we were somehow able to make it work to make
basic input ranges noncopyable instead, though I really don't think that
that will work.

- Jonathan M Davis

Jan 22 2024

victoroak <victoroak victor.oak> writes:

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
  [...]

I think one problem with just "next" as the api for an Input 
Range is that currently I don't know any safe way to return a 
reference to the item. If "next" returns a pointer we can't be 
sure the backing data structure (a growable array like 
std::vector for example) did not freed that memory yet but I 
guess this is the same for any range right now.

Any way, it would be good to have an optimization for Nullable!T 
or a different type that when T is nullable (e.g. classes or 
pointers) it doesn't need a boolean flag but also loses the 
ability to store null inside it.

That's one of the optimizations of "Option" in Rust, although in 
Rust references can't be null so it's ok to do this.

Jan 22 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 wall-o-text

Can this be summarized as

```d
enum isRangeCopible(R)= is(
   (R r){
     R rdup=r;
     r=rdup;
   }
);
```

 What I'd like to get out of this thread is feedback on how much 
 sense this idea does

I think someone formal should collect all changes people want to 
make to ranges

see my thread for .index and feature sets
theres something about immutablity
I would also suggest arrays should be more important then doubly 
linked lists

Jan 22 2024

D Programming

C/C++ Programming

Other

digitalmars.D - Range Redesign: Copy Semantics