www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - I'm confused about ranges (input and forward in practice, in

reply =?UTF-8?B?Ikx1w61z?= Marques" <luis luismarques.eu> writes:
Despite using them all the time, I'm suddenly confused about 
ranges...

My understanding is that (for library-worth code) algorithms that 
consume ranges are supposed to use .save() to be compatible with 
both ref type and value type ranges. What about the reverse? Are 
algorithms supposed to try to avoid effectively .save()ing ranges 
(when they are value types) by not copying range variables? 
Furthermore, is it incorrect for an input range that is also not 
a forward range (or at least does not declare itself formally as 
a forward range by having a save()) to allow itself to be bitwise 
copied? (i.e., to effectively provide a save() behaviour).

To make it a more concrete, consider the following:

     struct ValInputRange
     {
         int v;
         int front() { return v; }
         void popFront() { ++v; }
         bool empty = false;
     }

     class RefInputRange
     {
         int v;
         int front() { return v; }
         void popFront() { ++v; }
         bool empty = false;
     }

     void main()
     {
         auto vl = ValInputRange();
         auto rf = new RefInputRange();

         assert(vl.startsWith([0, 1]) == true);
         assert(vl.startsWith([0, 1]) == true);

         assert(rf.startsWith([0, 1]) == true);
         assert(rf.startsWith([0, 1]) == false);

         writeln(vl.take(3)); // [0, 1, 2]
         writeln(rf.take(3)); // [1, 2, 3]
     }

- is startsWith() supposed to be transparently usable with both 
val-type ranges and val-type ranges? Currently it provides 
different semantics for these, arguably, as in the example.

- startsWith accepts an InputRange. What does this mean in 
practice? I'm used (perhaps incorrectly) to thinking about a 
proper InputRange (one that isn't also a ForwardRange, etc.) as 
one that provides a single pass. But many algorithms, such as 
startsWith, have multi-pass semantics: despite not save()ing the 
range explicitly, since startsWith() receives the range by value, 
it is copied, and therefore a subsequent pass is available.

- If the current (dual) behaviour is desired, shouldn't 
startsWith at least document how it mutates the inputed range? I 
mean, to me it wasn't obvious that the last element wouldn't be 
popped after the comparison (it's slightly extra work, but if I'm 
OK with dropping the other elements then I probably don't want 
the last element to remain either), just as an example.

- If not, shouldn't it accept the range by ref?

- IMO, there *really* should be a small reference page on ranges 
that included all of these more subtle points. The most 
reference-style page on ranges is perhaps the documentation for 
std.ranges.interfaces and it doesn't discuss many of these kinds 
of points). BTW, I thought that after the discussion on the 
subject (due to Walter's parser work) it had been decided that it 
was necessary to call empty() before first calling front(), but 
that page says "Calling r.front is allowed only if calling 
r.empty has, *or would have*, returned false." (emphasis added). 
What's the current consensus?

- shouldn't InputRanges be more explicit about their implicit 
ForwardRange'ness or lack thereof? For instance, is it 100% 
obvious to everybody which of these versions will be correct? For 
all acceptable implementations of File and .byLine()?

     void main()
     {
         import std.file : write;
         import std.stdio : File;
         import std.algorithm;

         write("/tmp/test.txt", "line one\nline two\nline three");
         auto lines = File("/tmp/test.txt").byLine;

         // guess which...

         version(one)
         {
             assert(lines.startsWith(["line one", "line two"]) == 
true);
             assert(lines.startsWith(["line one", "line two"]) == 
true);
         }

         version(two)
         {
             assert(lines.startsWith(["line one", "line two"]) == 
true);
             assert(lines.startsWith(["line two", "line three"]) 
== true);
         }
     }
Aug 13 2015
next sibling parent "Alex Parrill" <initrd.gz gmail.com> writes:
On Friday, 14 August 2015 at 00:33:30 UTC, Luís Marques wrote:
 ...
Yea, it would be nice if all ranges were reference types, and calling `save` made a duplicate of the cursor (or whatever the range is supposed to be). However, that would mean that they would have to be allocated and garbage collected, which a lot of overhead that D is working to avoid. Or perhaps there's another solution that I can't think of. Bitwise copying of a struct `rng` isn't necessarily the same as doing the same thing as `rng.save` though. Just because something is a struct, doesn't mean it has value semantics. `std.stdio.File` is technically a struct, but it's just a refcounted wrapper around a pointer to an internal file object, so it technically has reference semantics (hence why version(two) in your example is the output). This also applies to wrapper ranges like map and filter, if they wrap reference-semantic objects. Also consider a barebones range that contains only an `int` file descriptor. Right now, I mostly treat it as if passing a range through a non-`ref` parameter consumes the original range. I don't know if `startsWith` should take the range by reference, though. On one hand, it would allow you to use the range afterwards if it matches, but it would have weird, unexpected effects on code like this: string mystring = "Hello World"; if(mystring.startsWith("Hello")) writeln(mystring) // This would print " World" because startsWith modified mystring I do think `startsWith` not popping off the last character is a bit of a bug though.
Aug 13 2015
prev sibling next sibling parent reply "Gary Willoughby" <dev nomad.so> writes:
On Friday, 14 August 2015 at 00:33:30 UTC, Luís Marques wrote:
 Despite using them all the time, I'm suddenly confused about 
 ranges...

 My understanding is that (for library-worth code) algorithms 
 that consume ranges are supposed to use .save() to be 
 compatible with both ref type and value type ranges. What about 
 the reverse? Are algorithms supposed to try to avoid 
 effectively .save()ing ranges (when they are value types) by 
 not copying range variables? Furthermore, is it incorrect for 
 an input range that is also not a forward range (or at least 
 does not declare itself formally as a forward range by having a 
 save()) to allow itself to be bitwise copied? (i.e., to 
 effectively provide a save() behaviour).
I was confused a lot about ranges until I realised that a range is only a `view` onto the data. A range is not *the* data. For example, consuming a range does not consume the underlying data or modify it in any way. A forward range's save method is for creating a copy of the view, not a copy of the underlying data. This is to allow two ranges to be consumed independently without affecting the data. Usually, these ranges are returned in the form of a Result struct as shown here: https://github.com/D-Programming-Language/phobos/blob/master/std/uni.d#L6517
Aug 14 2015
parent Jacob Carlborg <doob me.com> writes:
On 2015-08-14 09:28, Gary Willoughby wrote:

 I was confused a lot about ranges until I realised that a range is only
 a `view` onto the data. A range is not *the* data.

 For example, consuming a range does not consume the underlying data or
 modify it in any way. A forward range's save method is for creating a
 copy of the view, not a copy of the underlying data. This is to allow
 two ranges to be consumed independently without affecting the data.

 Usually, these ranges are returned in the form of a Result struct as
 shown here:
 https://github.com/D-Programming-Language/phobos/blob/master/std/uni.d#L6517
Until you work with arrays, which is both data and a range at the same time. -- /Jacob Carlborg
Aug 14 2015
prev sibling parent "anonymous" <anonymous example.com> writes:
On Friday, 14 August 2015 at 00:33:30 UTC, Luís Marques wrote:
 Despite using them all the time, I'm suddenly confused about 
 ranges...

 My understanding is that (for library-worth code) algorithms 
 that consume ranges are supposed to use .save() to be 
 compatible with both ref type and value type ranges.
I don't think we have that requirement. When you don't want the range to be consumed, you should make the call on a `.save`d copy.
 What about the reverse? Are algorithms supposed to try to avoid 
 effectively .save()ing ranges (when they are value types) by 
 not copying range variables?
Algorithms cannot assume that copying is the same as `.save`ing. I think copying is ok, but once you've `.popFront`ed the original or the copy, you can't assume that the other one is still usable.
 Furthermore, is it incorrect for an input range that is also 
 not a forward range (or at least does not declare itself 
 formally as a forward range by having a save()) to allow itself 
 to be bitwise copied? (i.e., to effectively provide a save() 
 behaviour).
It's tempting. I don't how practical or enforceable it would be.
 To make it a more concrete, consider the following:
[...]
     void main()
     {
         auto vl = ValInputRange();
         auto rf = new RefInputRange();

         assert(vl.startsWith([0, 1]) == true);
         assert(vl.startsWith([0, 1]) == true);

         assert(rf.startsWith([0, 1]) == true);
         assert(rf.startsWith([0, 1]) == false);

         writeln(vl.take(3)); // [0, 1, 2]
         writeln(rf.take(3)); // [1, 2, 3]
     }

 - is startsWith() supposed to be transparently usable with both 
 val-type ranges and val-type ranges? Currently it provides 
 different semantics for these, arguably, as in the example.
From another point of view, startsWith does the same on both ranges, and it's the ranges that are behaving differently. That means, if you don't want startsWith to consume your range, .save it in the call. Otherwise, be prepared for startsWith to .popFront stuff away.
 - startsWith accepts an InputRange. What does this mean in 
 practice? I'm used (perhaps incorrectly) to thinking about a 
 proper InputRange (one that isn't also a ForwardRange, etc.) as 
 one that provides a single pass. But many algorithms, such as 
 startsWith, have multi-pass semantics: despite not save()ing 
 the range explicitly, since startsWith() receives the range by 
 value, it is copied, and therefore a subsequent pass is 
 available.
That's not a property of startsWith but of the range. startsWith could force reference semantics with a ref parameter. But then you couldn't pass rvalue ranges, which would be annoying.
 - If the current (dual) behaviour is desired, shouldn't 
 startsWith at least document how it mutates the inputed range?
Maybe. But being vague has value in that the exact behavior can change later on without breaking what's documented. If a user can't accept vague popping, .save is there. [...]
 - shouldn't InputRanges be more explicit about their implicit 
 ForwardRange'ness or lack thereof? For instance, is it 100% 
 obvious to everybody which of these versions will be correct? 
 For all acceptable implementations of File and .byLine()?

     void main()
     {
         import std.file : write;
         import std.stdio : File;
         import std.algorithm;

         write("/tmp/test.txt", "line one\nline two\nline 
 three");
         auto lines = File("/tmp/test.txt").byLine;

         // guess which...

         version(one)
         {
             assert(lines.startsWith(["line one", "line two"]) 
 == true);
             assert(lines.startsWith(["line one", "line two"]) 
 == true);
         }

         version(two)
         {
             assert(lines.startsWith(["line one", "line two"]) 
 == true);
             assert(lines.startsWith(["line two", "line three"]) 
 == true);
         }
     }
I don't think it's obvious, and I don't think you should rely on the actual behavior if it's not documented. Instead force things your way with .save or std.range.refRange: When you don't care if a range is popped or not: Pass it as it is. When you want it to be popped: refRange. Can pass as is when the range has proper reference semantics or when passing via a ref parameter. When you don't want it to be popped: .save. Can pass as is when the range has proper value semantics and it's not passed via a ref parameter. Applied to the example: ---- version(one) { assert(lines.save.startsWith(["line one", "line two"])); assert(lines.save.startsWith(["line one", "line two"])); } version(two) { import std.range: refRange; assert(refRange(&lines).startsWith(["line one", "line two"])); assert(refRange(&lines).startsWith(["line two", "line three"])); } ---- Version one doesn't compile, meaning you can't enforce value semantics. Version two compiles, but since startsWith is vague about its use of .popFront, you probably shouldn't rely on what's there after the first call. In the end, byLine and startsWith don't work together well, because the range is a mere input range, and the algorithm is not about popping things. Maybe you should write your own ifStartsWithThenPopIt (with a better name). tl;dr: use .save and std.range.refRange
Aug 14 2015