www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Input ranges

reply "Ulrich =?UTF-8?B?S8O8dHRsZXIi?= <kuettler gmail.com> writes:
Hi,

I tried to learn about input ranges and got some anonymous advice 
http://forum.dlang.org/thread/dezlxxygufocmafvleen forum.dlang.org 
, still the topic seems to warrant broader attention.

The question is how pure input ranges (as in non-forward) should 
be build. Algorithms like take and groupBy require reference 
semantics. I did not find that communicated. Now I am looking for 
the official ruling. As in: What the D rule book says.
Apr 20 2015
parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Monday, April 20, 2015 07:54:32 via Digitalmars-d wrote:
 Hi,

 I tried to learn about input ranges and got some anonymous advice
 http://forum.dlang.org/thread/dezlxxygufocmafvleen forum.dlang.org
 , still the topic seems to warrant broader attention.

 The question is how pure input ranges (as in non-forward) should
 be build. Algorithms like take and groupBy require reference
 semantics. I did not find that communicated. Now I am looking for
 the official ruling. As in: What the D rule book says.
Honestly, the cleanest behavior is that all ranges be reference types, and save be required to copy the range. However, there are no actual requirements that that be the case, and structs and arrays are what is most frequently used for ranges, and they tend to be half-reference types or value types. So, passing a range to a function usually ends up being equivalent to a call to save, which makes it so that many range-based functions don't work properly with class ranges, since they usually aren't tested with them. And since arrays certainly aren't going to be full reference types, it just isn't possible to require that a forward range be a full reference type, so we need to support value type ranges, half reference type ranges, and full reference type ranges when it comes to forward ranges, and we have to deal with the fact that copying them _can_ be equivalent to saving them without assuming that it is. :| So, as far as forward ranges go, you shouldn't rely on copying a range being equivalent to save (otherwise save wouldn't need to exist), but realistically, if you pass a range to a function, you can't simply use that range after. Many ranges will have saved themselves in the process, but not all of them will have. The range in the caller is still valid (fortunately), but whether it's in the same state that it was before calling the other function depends entirely on whether copying the range is equivalent to save or not, meaning that you just can't use a range after you've passed it to another function unless you called save on it (forcing copy semantics) or passed it by ref (forcing reference semantics). Unfortunately, there are almost certainly lingering bugs in Phobos due to functions which assume that passing range to another function implicitly saves it, simply because that's what happens with such a large percentage of ranges. With regard to pure input ranges, they're not forward ranges, because what they're iterating over cannot be saved. So, by their very nature, they have to be at least partially reference types and folks will often assume that they are full reference types. The problem is that they can also be half reference types in that they could have saved some state internally (e.g. front) but still not be able to actually have a proper save function which saves all of their state. So, if you pass a pure input range to a function, then just like forward ranges, you cannot assume that you can use it again without passing it by reference, even though by definition, their internal state will have changed at least partially due to whatever iteration is done within the function that it's passed to (since it has to be at least a partial reference type to be a pure input range). However, unlike with a forward range, the input range is not a full reference type, then it won't be left in a valid state, since it won't have be saved, just had some of its contents passed by reference and some of them by value. All in all, I think that input ranges are incredibly annoying to work with, and I'm frequently tempted to argue that they just shouldn't exist. They're just too restrictive and tend to have weird behaviors, but unfcortunately, there are cases where it's prohibitively expensive to have a forward range. :| In any case, all that is presently _required_ of ranges is what the traits isInputRange, isForwardRange, etc. test for. We can't actually require more than that, because the compiler can't check for more. What we _can_ do (and has been discussed but never fully sorted out) is write up some additional rules about what is expected by Phobos and encourage the community at large to follow them with their ranges, but those can't actually be enforced, and we've never been able to agree on those rules (particularly since, in most cases, they aren't necessary, meaning that it only comes up in the corner cases and hasn't been enough of a pain point to force change). For instance, many of us argued that it should not be legal for empty to do any work, but counter-examples were given where it was necessary. So, it becomes very difficult to place additional requirements beyond what the API itself says - even as guidelines with regards to what the standard library expects. Now, as to pure input ranges and reference semantics, allowing half reference type ranges is very dangerous, because any time that they're copied, and the copy is iterated at all, the original will be left in an invalid state (unlike with forward ranges). So, we need to either require that all pure input ranges be full reference types or require that pure input ranges never be used after they're copied (since whether the original is still valid or not will depend on whether it's a full reference type or not). I'd like to require that all pure input ranges be full reference types, but if a function is going to operate both on pure input ranges and on forward ranges, then you can't assume that the range is a reference type anyway, because the forward range probably won't be. That being the case, what we probably need to do is to publish the guideline that _no_ range ever be used after it has been copied (be it a forward range or a pure input range). Then, any function which wants reference semantics will need to pass by ref or use std.range.RefRange to wrap the range, and any function which doesn't want reference semantics, will need to restrict itself to forward ranges and use save explicitly. I don't think that we really have any other option. - Jonathan M Davis
Apr 20 2015