www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - output ranges: by ref or by value?

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Consider:

R2 copy(R1, R2)(R1 src, R2 tgt) {
    foreach (ref e; src) tgt.put(e);
    return tgt;
}

Currently output ranges are supposed to define only the put() method. 
However, you also want to copy using regular ranges as a target, hence 
the shim:

// pseudo-method
void put(R, E)(ref R tgt, E e) {
    tgt.front = e;
    tgt.popFront();
}

Now copying ranges is possible even when they don't define put().

An example of "pure" output range is ArrayAppender. It doesn't define 
anything interesting except put.

The question of this post is the following: should output ranges be 
passed by value or by reference? ArrayAppender uses an extra indirection 
to work properly when passed by value. But if we want to model built-in 
arrays' operator ~=, we'd need to request that all output ranges be 
passed by reference.

Any ideas - please share.


Andrei
Dec 31 2009
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 The question of this post is the following: should output ranges be 
 passed by value or by reference? ArrayAppender uses an extra 
 indirection to work properly when passed by value. But if we want to 
 model built-in arrays' operator ~=, we'd need to request that all 
 output ranges be passed by reference.

I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp. Beside, an extra indirection is wasteful when you don't need it. It's easier to add a new layer of indirection when you need one than the reverse, so the primitive shouldn't require any indirection.
 // pseudo-method
 void put(R, E)(ref R tgt, E e) {
     tgt.front = e;
     tgt.popFront();
 }

I like that because it works especially well with arrays. Here's what I'm thinking of: char[10] buffer; char[] remainingSpace = buffer[]; while (!remainingSpace.empty) remainingSpace.put(getc()); // now buffer is full writeln(buffer); -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 31 2009
next sibling parent reply Philippe Sigaud <philippe.sigaud gmail.com> writes:
--00151758f270e2fe25047c14f576
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com>wrote:

 On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <
 SeeWebsiteForEmail erdani.org> said:

  The question of this post is the following: should output ranges be passed
 by value or by reference? ArrayAppender uses an extra indirection to work
 properly when passed by value. But if we want to model built-in arrays'
 operator ~=, we'd need to request that all output ranges be passed by
 reference.

I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp.

I agree. And arrays may well be the most used range anyway. Beside, an extra indirection is wasteful when you don't need it. It's easier
 to add a new layer of indirection when you need one than the reverse, so the
 primitive shouldn't require any indirection.

So (squint through sleep-deprived eyes:) that makes it by ref, right?
  // pseudo-method
 void put(R, E)(ref R tgt, E e) {
    tgt.front = e;
    tgt.popFront();
 }


- this new put needs hasAssignableElements!R, right? What's in this case the difference between isOutputRange!R and hasAssignableElements!R? - should all higher-order ranges expose a put method if possible? (stride comes to mind, but also take or filter). - does that play nice with the new auto ref / ref template parameter from 2.038? It seems to me that this new feature will go hand in hand with this, but I may be mistaken. - your shim method will be used like this: put(r,e); whereas for an output range you use: r.put(e); and you cannot freely go from one form to the other, except for arrays, which are output ranges anyway [*]. Does that mean that you must disseminate static if ByRef/Assignable/Output/Whatever checks everywhere, to use either put(r,e) or r.put(e)? - what if R is a range of ranges (ie: if E is itself a range). Should put by invoked recursively? What if its a chunked range? - something I wanted to ask for a long time: does put really write to the range as written in the docs or to the underlying container for which the output range is but a 'writable' view? The container/range separation does not exist for arrays, but is important for other cases. Philippe [*] except if this transformation rule is implemented now? --00151758f270e2fe25047c14f576 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On Thu, Dec 31, 2009 at 16:47, Michel Fortin <sp= an dir=3D"ltr">&lt;<a href=3D"mailto:michel.fortin michelf.com">michel.fort= in michelf.com</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" s= tyle=3D"border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8e= x; padding-left: 1ex;"> <div class=3D"im">On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu &lt;<a = href=3D"mailto:SeeWebsiteForEmail erdani.org" target=3D"_blank">SeeWebsiteF= orEmail erdani.org</a>&gt; said:<br> <br> <blockquote class=3D"gmail_quote" style=3D"border-left: 1px solid rgb(204, = 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> The question of this post is the following: should output ranges be passed = by value or by reference? ArrayAppender uses an extra indirection to work p= roperly when passed by value. But if we want to model built-in arrays&#39; = operator ~=3D, we&#39;d need to request that all output ranges be passed by= reference.<br> </blockquote> <br></div> I think modeling built-in arrays is the way to go as it makes less things t= o learn. In fact, it makes it easier to learn ranges because you can begin = by learning arrays, then transpose this knowledge to ranges which are more = abstract and harder to grasp.<br> </blockquote><div><br>I agree. And arrays may well be the most used range a= nyway.<br><br></div><blockquote class=3D"gmail_quote" style=3D"border-left:= 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex= ;"> Beside, an extra indirection is wasteful when you don&#39;t need it. It&#39= ;s easier to add a new layer of indirection when you need one than the reve= rse, so the primitive shouldn&#39;t require any indirection.</blockquote> <div><br>So (squint through sleep-deprived eyes:) that makes it by ref, rig= ht?</div><div>=A0<br><br></div><blockquote class=3D"gmail_quote" style=3D"b= order-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; paddin= g-left: 1ex;"> <div> <br> <blockquote class=3D"gmail_quote" style=3D"border-left: 1px solid rgb(204, = 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> // pseudo-method<br> void put(R, E)(ref R tgt, E e) {<br> =A0 =A0tgt.front =3D e;<br> =A0 =A0tgt.popFront();<br> }<br></blockquote></div></blockquote><div><br>A few random comments, sorry = if they are not entirely coherent:<br><br>- this new put needs hasAssignabl= eElements!R, right? What&#39;s in this case the difference between isOutput= Range!R and hasAssignableElements!R?<br> <br>- should all higher-order ranges expose a put method if possible? (stri= de comes to mind, but also take or filter).<br><br>- does that play nice wi= th the new auto ref / ref template parameter from 2.038? It seems to me tha= t this new feature will go hand in hand with this, but I may be mistaken.<b= r> <br>- your shim method will be used like this:<br><br>put(r,e);<br><br>wher= eas for an output range you use:<br><br>r.put(e);<br><br>and you cannot fre= ely go from one form to the other, except for arrays, which are output rang= es anyway [*]. Does that mean that you must disseminate static if ByRef/Ass= ignable/Output/Whatever checks everywhere, to use either put(r,e) or r.put(= e)?<br> <br>- what if R is a range of ranges (ie: if E is itself a range). Should p= ut by invoked recursively? What if its a chunked range?<br><br> - something I wanted to ask for a long time: does put really write to the range as written in the docs or to the underlying container for which t= he output range is but a &#39;writable&#39; view? The container/range separation does not exist for = arrays, but is important for other cases.<br><br><br>=A0 Philippe<br><br>[*= ] except if this transformation rule is implemented now?<br></div></div> --00151758f270e2fe25047c14f576--
Dec 31 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com 
 <mailto:michel.fortin michelf.com>> wrote:
 
     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
     <SeeWebsiteForEmail erdani.org
     <mailto:SeeWebsiteForEmail erdani.org>> said:
 
         The question of this post is the following: should output ranges
         be passed by value or by reference? ArrayAppender uses an extra
         indirection to work properly when passed by value. But if we
         want to model built-in arrays' operator ~=, we'd need to request
         that all output ranges be passed by reference.
 
 
     I think modeling built-in arrays is the way to go as it makes less
     things to learn. In fact, it makes it easier to learn ranges because
     you can begin by learning arrays, then transpose this knowledge to
     ranges which are more abstract and harder to grasp.
 
 
 I agree. And arrays may well be the most used range anyway.

Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.
     Beside, an extra indirection is wasteful when you don't need it.
     It's easier to add a new layer of indirection when you need one than
     the reverse, so the primitive shouldn't require any indirection.
 
 
 So (squint through sleep-deprived eyes:) that makes it by ref, right?
  
 
 
         // pseudo-method
         void put(R, E)(ref R tgt, E e) {
            tgt.front = e;
            tgt.popFront();
         }

It doesn't. The ref in there is to pass tgt to the pseudo-method put, not to the function that invokes it.
 A few random comments, sorry if they are not entirely coherent:
 
 - this new put needs hasAssignableElements!R, right? What's in this case 
 the difference between isOutputRange!R and hasAssignableElements!R?

It's a good question. There are two possible designs: 1. In the current design, the difference is that hasAssignableElements!R does not imply the range may grow. Consider this: auto a = new int[10], b = new int[10]; copy(a, b); This should work. But this shouldn't: auto a = new int[10], b = new int[5]; copy(a, b); because copy does not grow the target. If you want to append to b, you write: copy(a, appender(&b)); 2. In the design sketched in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, iteration is separated from access. In that approach, you'd have a one-pass range for both input and output. I'm not sure which design is better. (Suggestions are welcome.) For a pure output range, it's awkward to define empty() (what should it return?) and it's also awkward to put elements by calling two functions front/popFront instead of one.
 - should all higher-order ranges expose a put method if possible? 
 (stride comes to mind, but also take or filter).

I don't think so. The generic put will take care of that.
 - does that play nice with the new auto ref / ref template parameter 
 from 2.038? It seems to me that this new feature will go hand in hand 
 with this, but I may be mistaken.

There's no obvious connection. The nice thing about auto ref is this: struct SomeAdapterRange(R) if (isWhateverRange!R) { private R src; property auto ref front() { return src.front; } } You don't want to see how that looks today. Actually: http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d Search the page for "mixin" :o}.
 - your shim method will be used like this:
 
 put(r,e);
 
 whereas for an output range you use:
 
 r.put(e);
 
 and you cannot freely go from one form to the other, except for arrays, 
 which are output ranges anyway [*]. Does that mean that you must 
 disseminate static if ByRef/Assignable/Output/Whatever checks 
 everywhere, to use either put(r,e) or r.put(e)?

The D language automatically rewrites the latter into the former.
 - what if R is a range of ranges (ie: if E is itself a range). Should 
 put by invoked recursively? What if its a chunked range?

I don't know.
 - something I wanted to ask for a long time: does put really write to 
 the range as written in the docs or to the underlying container for 
 which the output range is but a 'writable' view? The container/range 
 separation does not exist for arrays, but is important for other cases.

Depends on how the range is defined. Appender holds a pointer to an array and appends to it. But appender is a special-purpose range. A usual range cannot change the topology of the container it's under. Andrei
Jan 01 2010
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-01-01 09:47:58 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays 
 motivated by practical necessity. I don't want to propagate that quirk 
 into ranges. The best output range is one that works properly when 
 passed by value.

I agree and disagree. I wasn't proposing that ranges support ~=, and I don't thing it'd be a good idea, so I agree with you here. But I still believe output ranges should behave like arrays. I was proposing that you model output ranges after buffers. A stream then becomes a buffer of infinite length. Look at this example: char[10] buffer; char[] remainingSpace = buffer[]; while (!remainingSpace.empty) remainingSpace.put(getc()); // now buffer is full writeln(buffer); Now rename "remainingSpace" for "outputStream" and it works fine, except for two things: "empty" sounds strange, and an output stream is never empty of remaining space: it's infinite length. But conceptually, a buffer and a stream are almost the same, one having finite capacity while the other is infinite. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 01 2010
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu wrote:

 Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com
 <mailto:michel.fortin michelf.com>> wrote:
 
     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
     <SeeWebsiteForEmail erdani.org
     <mailto:SeeWebsiteForEmail erdani.org>> said:
 
         The question of this post is the following: should output ranges
         be passed by value or by reference? ArrayAppender uses an extra
         indirection to work properly when passed by value. But if we
         want to model built-in arrays' operator ~=, we'd need to request
         that all output ranges be passed by reference.
 
 
     I think modeling built-in arrays is the way to go as it makes less
     things to learn. In fact, it makes it easier to learn ranges because
     you can begin by learning arrays, then transpose this knowledge to
     ranges which are more abstract and harder to grasp.
 
 
 I agree. And arrays may well be the most used range anyway.

Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.

I worry about a growing level of convention with ranges. Another recent range thread discussed the need to call consume after a successful call to startsWith. If I violated convention and had a range class, things would fail miserably. There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.
Jan 01 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Andrei Alexandrescu wrote:
 
 Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com
 <mailto:michel.fortin michelf.com>> wrote:

     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
     <SeeWebsiteForEmail erdani.org
     <mailto:SeeWebsiteForEmail erdani.org>> said:

         The question of this post is the following: should output ranges
         be passed by value or by reference? ArrayAppender uses an extra
         indirection to work properly when passed by value. But if we
         want to model built-in arrays' operator ~=, we'd need to request
         that all output ranges be passed by reference.


     I think modeling built-in arrays is the way to go as it makes less
     things to learn. In fact, it makes it easier to learn ranges because
     you can begin by learning arrays, then transpose this knowledge to
     ranges which are more abstract and harder to grasp.


 I agree. And arrays may well be the most used range anyway.

motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.

I worry about a growing level of convention with ranges. Another recent range thread discussed the need to call consume after a successful call to startsWith. If I violated convention and had a range class, things would fail miserably. There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.

I am implementing right now a change in the range interface mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: add a function save() that saves the iteration state of a range. With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say auto copy = r.save(); and instead says: auto copy = r; the behavior will indeed be different for class ranges and struct ranges. Andrei
Jan 01 2010
parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Jason House wrote:
 Andrei Alexandrescu wrote:
 
 Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com
 <mailto:michel.fortin michelf.com>> wrote:

     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
     <SeeWebsiteForEmail erdani.org
     <mailto:SeeWebsiteForEmail erdani.org>> said:

         The question of this post is the following: should output ranges
         be passed by value or by reference? ArrayAppender uses an extra
         indirection to work properly when passed by value. But if we
         want to model built-in arrays' operator ~=, we'd need to request
         that all output ranges be passed by reference.


     I think modeling built-in arrays is the way to go as it makes less
     things to learn. In fact, it makes it easier to learn ranges because
     you can begin by learning arrays, then transpose this knowledge to
     ranges which are more abstract and harder to grasp.


 I agree. And arrays may well be the most used range anyway.

motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.

I worry about a growing level of convention with ranges. Another recent range thread discussed the need to call consume after a successful call to startsWith. If I violated convention and had a range class, things would fail miserably. There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.

I am implementing right now a change in the range interface mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: add a function save() that saves the iteration state of a range. With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say auto copy = r.save(); and instead says: auto copy = r; the behavior will indeed be different for class ranges and struct ranges.

Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?
 
 Andrei

Jan 01 2010
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house gmail.com> said:

 With save() in tow, class ranges and struct ranges can be used the same
 way. True, if someone forgets to say
 
 auto copy = r.save();
 
 and instead says:
 
 auto copy = r;
 
 the behavior will indeed be different for class ranges and struct ranges.

Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?

I don't think this will be a problem for the optimizer. But I say dump save(): it's more trouble than it's worth. I'm sure it will be a problem for most programmers, as most will learn and test algorithms with arrays and struct ranges and not classes. "save()" is a detail too easy to miss. The benefit of supporting classes by asking everyone to remember adding save() seems tiny considering you can easily create a struct wrapper if you really need your range to be a class. I know that the struct wrapper would probably force more copies of a range than necessary. But there are many ways this could be alleviated. For instance, the struct wrapper could use copy-on-write with a reference counter to detect when a copy is necessary. So while it'd be a little harder to design a range from a class, it'd be easier for everyone to use ranges correctly. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 01 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house gmail.com> 
 said:
 
 With save() in tow, class ranges and struct ranges can be used the same
 way. True, if someone forgets to say

 auto copy = r.save();

 and instead says:

 auto copy = r;

 the behavior will indeed be different for class ranges and struct 
 ranges.

Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?

I don't think this will be a problem for the optimizer. But I say dump save(): it's more trouble than it's worth. I'm sure it will be a problem for most programmers, as most will learn and test algorithms with arrays and struct ranges and not classes. "save()" is a detail too easy to miss. The benefit of supporting classes by asking everyone to remember adding save() seems tiny considering you can easily create a struct wrapper if you really need your range to be a class. I know that the struct wrapper would probably force more copies of a range than necessary. But there are many ways this could be alleviated. For instance, the struct wrapper could use copy-on-write with a reference counter to detect when a copy is necessary. So while it'd be a little harder to design a range from a class, it'd be easier for everyone to use ranges correctly.

save() is not only for classes. It also distinguishes input ranges from forward ranges. It's the primitive that STL didn't define but should have. Andrei
Jan 01 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 save() is not only for classes. It also distinguishes input ranges from 
 forward ranges. It's the primitive that STL didn't define but should 
 have.

Right. I still maintain that it's a bad approach. I've written a lot of algorithms in C++ that worked with iterators, always assuming assignment would copy the state. Fortunately I didn't had to use input iterators with them, most of the time. But I did once or twice, and the thing was working slightly off. What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 01 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 save() is not only for classes. It also distinguishes input ranges 
 from forward ranges. It's the primitive that STL didn't define but 
 should have.

Right. I still maintain that it's a bad approach. I've written a lot of algorithms in C++ that worked with iterators, always assuming assignment would copy the state. Fortunately I didn't had to use input iterators with them, most of the time. But I did once or twice, and the thing was working slightly off. What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.

I tried that, it makes input ranges next to unusable. save() is an imperfect but workable solution. Andrei
Jan 01 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 What I'd do instead is somehow make input ranges non-copyable. They 
 could be either passed by ref or moved, never copied. This way they 
 would still behave exactly like array slices, only not copyable, and 
 you get a compile-time error if you try to copy them which is 
 infinitely better than a subtle change in behavior.

I tried that, it makes input ranges next to unusable.

I think I can see why. You can't have ref member and local variables like in C++, so it's pretty hard to use references.
 save() is an imperfect but workable solution.

save() is an workable but error-prone solution. Perhaps we could mitigate this by making people more aware of the difference instead. Couldn't we rename "input range" for "input stream"? Currently you have ranges that behave one way and ranges that behave the other way, which is confusing. Having a different name for both would emphasize there is a difference. With different names, you're guarantied to get the "what's the difference?" question from newbies. And it's simple to explain: "You can often use ranges and streams interchangeably, but for that to work you must use save() when you need a copy of the current state. Also, not all streams support save(). It's good practice to always use save() so that algorithms work for both for ranges and streams." -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 02 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Michel Fortin wrote:
 What I'd do instead is somehow make input ranges non-copyable. They 
 could be either passed by ref or moved, never copied. This way they 
 would still behave exactly like array slices, only not copyable, and 
 you get a compile-time error if you try to copy them which is 
 infinitely better than a subtle change in behavior.

I tried that, it makes input ranges next to unusable.

I think I can see why. You can't have ref member and local variables like in C++, so it's pretty hard to use references.
 save() is an imperfect but workable solution.

save() is an workable but error-prone solution. Perhaps we could mitigate this by making people more aware of the difference instead. Couldn't we rename "input range" for "input stream"? Currently you have ranges that behave one way and ranges that behave the other way, which is confusing. Having a different name for both would emphasize there is a difference. With different names, you're guarantied to get the "what's the difference?" question from newbies. And it's simple to explain: "You can often use ranges and streams interchangeably, but for that to work you must use save() when you need a copy of the current state. Also, not all streams support save(). It's good practice to always use save() so that algorithms work for both for ranges and streams."

That's an idea, and names are powerful, but I think it's reasonable to not expect miracles from that name change. It has disadvantages too - "input range" vs. "forward range" clarifies there's a conceptual relationship between the two, whereas "streams" are different from anything else. Andrei
Jan 02 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-01-02 09:59:51 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 That's an idea, and names are powerful, but I think it's reasonable to 
 not expect miracles from that name change. It has disadvantages too - 
 "input range" vs. "forward range"  clarifies there's a conceptual 
 relationship between the two, whereas "streams" are different from 
 anything else.

I'm not expecting a miracle from it either, it'd just be much less confusing. You could say that assignment of an input stream might or might not save its state (depending on the stream type) so you must call save() to save the state when working with streams, but ranges are guarantied to save their state on assignment, thus behaving more predictably and just like arrays. So if you're working only with ranges, not streams, you never need to worry about save(). A similar option would be to have both input ranges and input streams: * input range: by value semantics, no need for save() * input stream: by reference semantics A pointer to an input range would thus automatically qualify as an input stream, so it's easy to give an input range to a function taking an input stream. Well, except for stack-allocated ranges in SafeD for which you can't create a pointer. This pretty much break the idea, I think. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 02 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Andrei Alexandrescu Wrote:
 
 Jason House wrote:
 Andrei Alexandrescu wrote:
 
 Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin
 <michel.fortin michelf.com 
 <mailto:michel.fortin michelf.com>> wrote:
 
 On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org 
 <mailto:SeeWebsiteForEmail erdani.org>> said:
 
 The question of this post is the following: should output
 ranges be passed by value or by reference? ArrayAppender uses
 an extra indirection to work properly when passed by value.
 But if we want to model built-in arrays' operator ~=, we'd
 need to request that all output ranges be passed by
 reference.
 
 
 I think modeling built-in arrays is the way to go as it makes
 less things to learn. In fact, it makes it easier to learn
 ranges because you can begin by learning arrays, then
 transpose this knowledge to ranges which are more abstract
 and harder to grasp.
 
 
 I agree. And arrays may well be the most used range anyway.

arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.

recent range thread discussed the need to call consume after a successful call to startsWith. If I violated convention and had a range class, things would fail miserably. There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.

mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: add a function save() that saves the iteration state of a range. With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say auto copy = r.save(); and instead says: auto copy = r; the behavior will indeed be different for class ranges and struct ranges.

Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?

It may be best to discuss this on an example: /** If $(D startsWith(r1, r2)), consume the corresponding elements off $(D r1) and return $(D true). Otherwise, leave $(D r1) unchanged and return $(D false). */ bool consume(R1, R2)(ref R1 r1, R2 r2) if (isForwardRange!R1 && isInputRange!R2) { auto r = r1.save(); while (!r2.empty && !r.empty && r.front == r2.front) { r.popFront(); r2.popFront(); } if (r2.empty) { r1 = r; return true; } return false; } For most structs, save() is very simple: auto save() { return this; } For classes, save() entails creating a new object: auto save() { return new typeof(this)(this); } If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that. Anyway, it's not entirely a convention. I'll change isForwardRange to require the existence of save(). Andrei
Jan 01 2010
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the situation is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to perform
 to spec. I don't know how to improve on that.

Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user. -- Rainer Deyke - rainerd eldwood.com
Jan 01 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the situation is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to perform
 to spec. I don't know how to improve on that.

Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library.

That's a good idea, but it doesn't work with covariant return types. Those are needed for the container hierarchy that I'm working on. Andrei
Jan 01 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the situation is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to perform
 to spec. I don't know how to improve on that.

Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.

Oh, besides it doesn't work for struct ranges that iterate one-pass streams. Andrei
Jan 01 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the 
 situation is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to 
 perform
 to spec. I don't know how to improve on that.

as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.

Oh, besides it doesn't work for struct ranges that iterate one-pass streams.

What does save do in those cases? -Steve

It provides a syntactic differentiation between input ranges and forward ranges. STL's input iterators are syntactically indistinguishable from forward iterators. Because of that, all STL algorithms that expect forward ranges will also compile and run with input ranges, although never with the expected result. This has been a lingering problem with C++98, and a key motivator for concepts. Since syntactic differences cannot be used to distinguish between input and forward iterators, the reasoning went, we need something else as a discriminator - and that would be a concept: someone who defines an input iterator declares it obeys the input iterator concept, whereas someone defining a forward iterator declares it obeys the forward iterator concept. During the meltdown of concepts, a number of people including Bjarne Stroustrup and myself have suggested that a simple workable solution would be to define an extra function a la "save" that is used in algorithms and only supported by forward iterators, but not by input iterators. Then, algorithms use save() and will correctly refuse to compile calls with input iterators. The remaining risk is that someone writes an algorithm and forgets to use save(). Andrei
Jan 02 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Sat, 02 Jan 2010 13:06:25 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the 
 situation is
 unpleasant albeit not catastrophic: for most struct ranges things 
 will
 continue to work, but for class ranges the function will fail to 
 perform
 to spec. I don't know how to improve on that.

range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.

Oh, besides it doesn't work for struct ranges that iterate one-pass streams.

-Steve

It provides a syntactic differentiation between input ranges and forward ranges. STL's input iterators are syntactically indistinguishable from forward iterators. Because of that, all STL algorithms that expect forward ranges will also compile and run with input ranges, although never with the expected result. This has been a lingering problem with C++98, and a key motivator for concepts. Since syntactic differences cannot be used to distinguish between input and forward iterators, the reasoning went, we need something else as a discriminator - and that would be a concept: someone who defines an input iterator declares it obeys the input iterator concept, whereas someone defining a forward iterator declares it obeys the forward iterator concept. During the meltdown of concepts, a number of people including Bjarne Stroustrup and myself have suggested that a simple workable solution would be to define an extra function a la "save" that is used in algorithms and only supported by forward iterators, but not by input iterators. Then, algorithms use save() and will correctly refuse to compile calls with input iterators. The remaining risk is that someone writes an algorithm and forgets to use save().

Would it not be as useful then to just define attributes on the type and save a bunch of useless/weird looking code? that is, have an enum value inside the type that defines its state can be saved. It seems to me like save is the equivalent of that anyways, since its possible to forget to use it, and still have your code compile.

If you have an enum that says a range's state can be saved, then you still need a function to effect that save :o). So you added, not removed, bloat.
 Basically, it appears to me that save either a) doesn't compile or b) is 
 the equivalent of assignment.  It doesn't seem to add much to the 
 functionality.

No, for class ranges save() is nontrivial (a sort of clone). I suspect even struct ranges for certain file-oriented stuff would do nontrivial work (e.g. open the file anew and fseek() on the current position etc.)
 This is all except for classes, which I think have no business being 
 ranges in the first place.

Well then how would the container hierarchy work? It does need range interfaces. How does dcollections deal with that? Andrei
Jan 02 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Sat, 02 Jan 2010 13:51:48 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
  Would it not be as useful then to just define attributes on the type 
 and save a bunch of useless/weird looking code?
  that is, have an enum value inside the type that defines its state 
 can be saved.  It seems to me like save is the equivalent of that 
 anyways, since its possible to forget to use it, and still have your 
 code compile.

If you have an enum that says a range's state can be saved, then you still need a function to effect that save :o). So you added, not removed, bloat.

The function already exists -- opAssign. My point is that save would just be the same as opAssign in most cases where you'd want to implement save. Cases where you'd want to implement save differently than opAssign are cases where you wouldn't want to use it in a generic fashion.

You might mean this(this). Either doesn't help because you'd still be unable to differentiate between input ranges and forward ranges. Much of the point of save() is to mark a syntactic difference between input ranges and forward ranges. Input ranges still need this(this) and opAssign - those just have different semantics.
 Basically, it appears to me that save either a) doesn't compile or b) 
 is the equivalent of assignment.  It doesn't seem to add much to the 
 functionality.

No, for class ranges save() is nontrivial (a sort of clone). I suspect even struct ranges for certain file-oriented stuff would do nontrivial work (e.g. open the file anew and fseek() on the current position etc.)

Classes should not be ranges. And I hope that any algorithm that uses savable ranges would not be used on a file range. For example, your consume function: bool consume(R1, R2)(ref R1 r1, R2 r2) if (isForwardRange!R1 && isInputRange!R2) { auto r = r1.save(); // open an extra file, and seek the file to the given point *just in case*?! while (!r2.empty && !r.empty && r.front == r2.front) { r.popFront(); r2.popFront(); } if (r2.empty) { r1 = r; // note that this unaliases r1 from any other copies (not save()'d copies) that were made of it. // also note that you now have opened more handles to the same file. Calling this many times could // consume quite a few handles. return true; } return false; } Here is a good exercise that would help clarify what we need: determine all the types of ranges you can think of that would need a special save function.

A range that defines save() is a forward range. save() creates an independent range from its source. The file etc. example was hypothetical but realistic.
 This is all except for classes, which I think have no business being 
 ranges in the first place.

Well then how would the container hierarchy work? It does need range interfaces. How does dcollections deal with that?

A container is not a range. A range of a container should not be a class. For dcollections, I was planning on ranges being a struct with a begin and end cursor defining the part of the container referenced by the range. Such a range can always be copied and modifying the copy through the range functions should not modify the original range.

Struct ranges won't work with a container hierarchy. If you define a container hierarchy (classes etc.) you'll also need a range hierarchy. Otherwise defining the inheritance relationship is impossible. Consider: abstract class Container(E) { // most general container property bool empty(); bool add(E element); E removeAny(); InputRange!E opSlice(); } That's what I'd expect any container worth its salt to implement: (a) test for empty; (b) add an element to the container; (c) remove some element from the container; (d) get a range that spans the entire container. (Probably removeAll() and a range-positioned remove() would make sense, too.) My point is, how do you inherit this guy? Well by taking advantage of covariant return types: abstract class Array(E) : Container!E { property bool empty(); bool add(E element); E removeAny(); RandomAccessRange!E opSlice(); ... more stuff ... } Ergo, RandomAccessRange!E must inherit InputRange!E for covariance to kick in. The resulting setup is quite interesting: you either know you work with an Array and therefore you get a random-access range, or you just use the generic Container and get an input range.
 I see a range as being useful for iteration or algorithms, but not for 
 general use.  A great example is AAs.  Would you say that an AA *is* a 
 range or should *provide* a range?  If it is a range, does that mean you 
 remove elements as you call popFront?  Does that make any sense?  If it 
 doesn't, then what happens if you add elements through another alias to 
 that AA?

An AA provides several ranges - among which byKey and byValue. Andrei
Jan 02 2010
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-01-02 21:46:57 -0500, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 That was the point of the enum.  It would be a synthetic difference, 
 but  IMO so is save.  If it turns out that the only usable times you 
 implement  save all look like:
 
 auto save() { return *this; }
 
 then save gives you nothing.  It's kind of a proof by induction (I think).
 
 You say that algorithm A requires the ability to save the state of a  
 range.  So I define a function save on a range which cannot be saved by 
 a  simple assignment (i.e. a file).  I use A on that range, and the 
 results  are not what I expect or kill performance or consume unneeded 
 resources,  I'd rather not use algorithm A on that range, negating the 
 need to define  a save function in the first place.
 
 I think that's what we're going to end up with.

This is making me think... First, opening files silently whenever an algorithm feels the need to save its state is just madness. What if the operating system decide to pop a security window when opening a restricted file? What if the file has been deleted or replaced in its directory while your program is still hanging on the old inode? One might want to save the position in a file in order to be able to return there later, but when you return there the last thing you actually want to do is to open the file a second time: what you might realistically want is to set the position to what it was before, calling fseek. So perhaps it would be more useful to have both save() to save the current state (the position in a file), and restore() which would restore the saved state (returning to the saved position in a file). For ranges with reference semantics -- please call them streams! -- save() and restore() would work just as fpos and fseek for files, but they might also not be available like in a TCP stream or in a communication channel between threads. For ranges such as array slices, save and restore would just copy the range in one or the other direction. The only reason to have them is so they can be used as streams. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 02 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2010-01-02 21:46:57 -0500, "Steven Schveighoffer" 
 <schveiguy yahoo.com> said:
 
 That was the point of the enum.  It would be a synthetic difference, 
 but  IMO so is save.  If it turns out that the only usable times you 
 implement  save all look like:

 auto save() { return *this; }

 then save gives you nothing.  It's kind of a proof by induction (I 
 think).

 You say that algorithm A requires the ability to save the state of a  
 range.  So I define a function save on a range which cannot be saved 
 by a  simple assignment (i.e. a file).  I use A on that range, and the 
 results  are not what I expect or kill performance or consume unneeded 
 resources,  I'd rather not use algorithm A on that range, negating the 
 need to define  a save function in the first place.

 I think that's what we're going to end up with.

This is making me think... First, opening files silently whenever an algorithm feels the need to save its state is just madness. What if the operating system decide to pop a security window when opening a restricted file? What if the file has been deleted or replaced in its directory while your program is still hanging on the old inode? One might want to save the position in a file in order to be able to return there later, but when you return there the last thing you actually want to do is to open the file a second time: what you might realistically want is to set the position to what it was before, calling fseek. So perhaps it would be more useful to have both save() to save the current state (the position in a file), and restore() which would restore the saved state (returning to the saved position in a file).

These are implementation matters. save() could lazily save information, duplicate the file handle (cheap on many OSs), etc.
 For ranges with reference semantics -- please call them streams! -- 
 save() and restore() would work just as fpos and fseek for files, but 
 they might also not be available like in a TCP stream or in a 
 communication channel between threads.
 
 For ranges such as array slices, save and restore would just copy the 
 range in one or the other direction. The only reason to have them is so 
 they can be used as streams.

This I don't understand. Array slices' save is: T[] save(T[] array) { return array; } Andrei
Jan 02 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-01-03 00:51:46 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 First, opening files silently whenever an algorithm feels the need to 
 save its state is just madness. What if the operating system decide to 
 pop a security window when opening a restricted file? What if the file 
 has been deleted or replaced in its directory while your program is 
 still hanging on the old inode?
 
 One might want to save the position in a file in order to be able to 
 return there later, but when you return there the last thing you 
 actually want to do is to open the file a second time: what you might 
 realistically want is to set the position to what it was before, 
 calling fseek. So perhaps it would be more useful to have both save() 
 to save the current state (the position in a file), and restore() which 
 would restore the saved state (returning to the saved position in a 
 file).

These are implementation matters. save() could lazily save information, duplicate the file handle (cheap on many OSs), etc.

 For ranges with reference semantics -- please call them streams! -- 
 save() and restore() would work just as fpos and fseek for files, but 
 they might also not be available like in a TCP stream or in a 
 communication channel between threads.
 
 For ranges such as array slices, save and restore would just copy the 
 range in one or the other direction. The only reason to have them is so 
 they can be used as streams.

This I don't understand. Array slices' save is: T[] save(T[] array) { return array; }

The idea is that you would use save and restore like this: auto state = range.save(); // ... range.restore(state); For array slices, save doesn't really have to change, just define restore: T[] save(T[] array) { return array; } void restore(ref T[] array, T[] state) { array = state; } But with file handles: size_t save(FILE handle) { return fpos(handle); } void restore(FILE handle, size_t state) { fseek(handle, state); } And for a range interface to be used in a class hierarchy: interface Range { Variant save(); void restore(Variant state); } This way you don't have to create a new range every time you need to start from a previously saved state, you can reuse the existing one which is much more efficient. Also, the saved state can be anything, of any type, and doesn't need to be a range. So when implementing a non-trivial range you don't need to implement lazy initialization logic and all sort of complicated stuff just in case someone might save the state. Note that with save defined like this you can't really duplicate the range in the general case. I'm not sure if this is good or bad. How many algorithms really require you to duplicate a range? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 03 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 My theory is, given this list of ranges, if you pair them with an 
 algorithm that requires save capability, you wouldn't want to use that 
 algorithm on it anyways (kinda like the consume example).

Why gratuitously limit the design? You're asking to replace this: R save() { return this; } with: enum thisIsAForwardRange = true; Is there a reason? The former leaves in flexibility. The latter doesn't, for no good reason.
 Struct ranges won't work with a container hierarchy. If you define a 
 container hierarchy (classes etc.) you'll also need a range hierarchy. 
 Otherwise defining the inheritance relationship is impossible. Consider:

 abstract class Container(E) { // most general container
       property bool empty();
      bool add(E element);
      E removeAny();
      InputRange!E opSlice();
 }

 That's what I'd expect any container worth its salt to implement: (a) 
 test for empty; (b) add an element to the container; (c) remove some 
 element from the container; (d) get a range that spans the entire 
 container. (Probably removeAll() and a range-positioned remove() would 
 make sense, too.)

The range interface is compile-time only, there is no need to define it in the container interface. opSlice does not need to be part of the interface, even if it is defined on all the container types. opApply makes for a much better interface-friendly iteration anyways.

I want container users to be able to save iteration state and also to open up containers to std.algorithm and other goodies, so I'm shying away from opApply.
 BTW, the primitives in dcollections are:
 
 clear(); // clear all elements
 remove(V v); // remove an element

Search and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?
 contains(V v); // returns whether an element is contained in the colleciton

I don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.
 length(); // the length of the collection

That's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.
 dup(); // duplicate the collection
 opApply(int delegate(ref V v) dg); // iterate the collection
 opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the 
 collection
 
 That means it covers only empty in your list of must-have functions (via 
 length() == 0).

How do you implement length() for a singly-linked list? Is empty() going to take O(n)?
 Add is not a primitive because the Map collections 
 shouldn't assign some random key to the new element.  removeAny is 
 defined only on sets and multisets, but I'm not sure that it couldn't be 
 moved to Collection, in fact, I'll probably do that.

add is a primitive that takes Tuple!(K, V) as the element type.
 Note that it's missing begin and end which are defined on every single 
 container type (i.e. the equivalent of the all-elements range).  This is 
 because those primitives return a struct that is different for every 
 container type.

So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.
 It also surpasses opSlice via opApply, since all an input range can do 
 anyways is iterate.  In fact, opApply is more powerful because you can 
 change elements and remove elements (via purging).  Plus it's more 
 efficient than a range-via-interface.

An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.
 I see a range as being useful for iteration or algorithms, but not 
 for general use.  A great example is AAs.  Would you say that an AA 
 *is* a range or should *provide* a range?  If it is a range, does 
 that mean you remove elements as you call popFront?  Does that make 
 any sense?  If it doesn't, then what happens if you add elements 
 through another alias to that AA?

An AA provides several ranges - among which byKey and byValue.

I misunderstood your statement "[a container hierarchy] does need range interfaces." I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad.

Yah, they'd offer it as a result of opSlice(). Covariant return types will ensure there's no virtual call when you know what container you operate on. Above all: the primitive set for a container must be a small set of functions that (a) can be implemented by all containers within reasonable efficiency bounds, and (b) are container-specific, not generic. IMHO any container design that defines a search(Element) as a primitive is misguided. Searching is not a container primitive - it's an algorithm. Only more specialized containers (e.g. trees, hashes etc.) can afford to define search(Element) as a primitive. Linear search is a generic algorithm that works the same for everyone. It does not belong as a method of any container. Andrei
Jan 02 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 My theory is, given this list of ranges, if you pair them with an 
 algorithm that requires save capability, you wouldn't want to use 
 that algorithm on it anyways (kinda like the consume example).

Why gratuitously limit the design? You're asking to replace this: R save() { return this; } with: enum thisIsAForwardRange = true; Is there a reason? The former leaves in flexibility. The latter doesn't, for no good reason.

Well, one thing you could do is: enum thisIsAnInputRange = true; and then no special implementation is needed for normal forward ranges. the other point is there is no special treatment needed inside algorithms -- the risk of forgetting to use save at the right points of the algorithm is higher than forgetting to say isForwardRange!(R) at the beginning of the function.

isForwardRange will be defined to yield true if and only if the range defines save. But I see you point - user code only asserts isForwardRange and then does not bother to use save(), just copies stuff around in confidence that copying does the right thing. Thanks for this insight. I don't see how to reconcile that with class ranges and covariance.
 Not having opSlice be part of the interface itself does not preclude it 
 from implementing opSlice, and does not preclude using ranges of it in 
 std.algorithm.  If I'm not mistaken, all functions in std.algorithm rely 
 on compile time interfaces.  opApply allows for full input range 
 functionality for things like copying and outputting where templates may 
 not be desired.

The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.
 BTW, the primitives in dcollections are:
  clear(); // clear all elements
 remove(V v); // remove an element

Search and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?

Search and positioned removal are also a primitives, but not defined on the interface. remove was a primitive on Tango's containers, and dcollections was originally meant to be a replacement for Tango's containers. I think the point is, if you have an interface reference, what would be the minimum functionality needed so that you could use a container without knowing its implementation.

Yes, and I think remove(V v) does not belong to the minimum functionality. It combines two functions (search and remove) and raises questions such as what happens to duplicate elements.
 contains(V v); // returns whether an element is contained in the 
 colleciton

I don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.

Again, it's part of the minimal usable interface. It's not a generic algorithm, because some containers can implement this more efficiently.

But this is exactly what I believe to be a mistake: you are abstracting away algorithmic complexity.
 Plus, to use the generic algorithms, you would need to use interfaces as 
 ranges which I think are completely useless.

Why?
 length(); // the length of the collection

That's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.

All current dcollection containers have O(1) length.

Some containers can't define O(1) length conveniently.
 dup(); // duplicate the collection
 opApply(int delegate(ref V v) dg); // iterate the collection
 opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the 
 collection
  That means it covers only empty in your list of must-have functions 
 (via length() == 0).

How do you implement length() for a singly-linked list? Is empty() going to take O(n)?

first, dcollections' list implementation is doubly linked because all collections are forward and backward iterable. second, even for singly linked lists, you can have either O(1) length or O(1) splicing (consuming a link list range into another linked list). Dcollections' default link list implementation uses O(1) length since I think splicing is a specialized requirement.

Right. The question is how much pressure Container is putting on the implementation. I'd rather leave it to the list implementation to decide to store the length or not.
 Add is not a primitive because the Map collections shouldn't assign 
 some random key to the new element.  removeAny is defined only on 
 sets and multisets, but I'm not sure that it couldn't be moved to 
 Collection, in fact, I'll probably do that.

add is a primitive that takes Tuple!(K, V) as the element type.

How do you define that on Container(V)? on Map(K, V), set(K k, V v) is an interface method.

Map!(K, V) has Tuple!(K, V) as its element type.
 what you can do is define Map(K, V) as inheriting Container(Tuple!(K, 
 V)), but then trying to use the container functions are very 
 cumbersome.  In dcollections, Map(K, V) inherits Collection(V).
 
 Note that it's missing begin and end which are defined on every 
 single container type (i.e. the equivalent of the all-elements 
 range).  This is because those primitives return a struct that is 
 different for every container type.

So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.

No, you can easily write container-independent iteration as long as you have the implementation.

In this context: container-independent = using the Container interface. This is the whole purpose of creating a container hierarchy. If the container design fosters knowing the implementation, maybe a class hierarchy is not the right choice in the first place.
 If you use interfaces you can write opApply wrappers to do different 
 things.  I'm not sure what you mean by composition.

For example, compose ranges a la retro or take.
 It also surpasses opSlice via opApply, since all an input range can 
 do anyways is iterate.  In fact, opApply is more powerful because you 
 can change elements and remove elements (via purging).  Plus it's 
 more efficient than a range-via-interface.

An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.

How is it more flexible? You can't replace data, and you can't remove data while iterating, both of which are possible with dcollection's primitives. If you have a Container(E) which defines InputRange!E opSlice, how do you get at the more defined range definition? casting?

You can replace data by assigning to range's elements. Removal is done via positioned remove (which I think needs to be a primitive).
 I see a range as being useful for iteration or algorithms, but not 
 for general use.  A great example is AAs.  Would you say that an AA 
 *is* a range or should *provide* a range?  If it is a range, does 
 that mean you remove elements as you call popFront?  Does that make 
 any sense?  If it doesn't, then what happens if you add elements 
 through another alias to that AA?

An AA provides several ranges - among which byKey and byValue.

range interfaces." I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad.

Yah, they'd offer it as a result of opSlice(). Covariant return types will ensure there's no virtual call when you know what container you operate on.

Not having opSlice on the interface guarantees you never have a virtual call for iteration :) opApply mitigates the virtual call on the interface.

And takes away the ability to compose ranges and to use algorithms with the container.
 Above all: the primitive set for a container must be a small set of 
 functions that (a) can be implemented by all containers within 
 reasonable efficiency bounds, and (b) are container-specific, not 
 generic. IMHO any container design that defines a search(Element) as a 
 primitive is misguided. Searching is not a container primitive - it's 
 an algorithm. Only more specialized containers (e.g. trees, hashes 
 etc.) can afford to define search(Element) as a primitive. Linear 
 search is a generic algorithm that works the same for everyone. It 
 does not belong as a method of any container.

If the minimal container design isn't usable without std.algorithm, then I don't think it's worth having interfaces.

Why? I think the other way: if the minimal container design is unusable with std.algorithm, the design took a wrong turn somewhere.
 If you need std.algorithm, 
 you need the full implementation of the container because it's a 
 compile-time interface.

How did you reach that conclusion? std.algorithm uses a syntactic interface that is obeyed by interfaces too. There's no problem.
 Interface ranges are something that should be 
 avoided, it's like having a programming language where everything has to 
 be a class.

I disagree. The negation of an implementation dogma can be just as limiting as the dogma. The way I see it, a design defines some desiderata. Then it does whatever it takes to fulfill them. If one desideratum is to use a class hierachy to write container-independent code, then interface ranges naturally follow. There's no ifs and buts about it.
 What you are saying seems completely incorrect to me: "since not all 
 containers can implement fast search, I'm going to guarantee that *all* 
 containers use a linear search via their interface.

This is a misunderstanding. In the STL linear containers don't define find(), but associative containers do. That is the correct design.
 *AND* I want to 
 make each loop in the search algorithm call 3 virtual functions!"

Not necessarily. This happens only if you use the Container interface to write container-independent code. It is the cost it takes for the design to fulfill its desiderata.
 How 
 is that better than a search function that guarantees linear performance 
 but gives the option of being as fast as possible with no loop virtual 
 calls?

It is better because it doesn't write off search complexity as an implementation detail. "Linear or better" is not a good guarantee. A good guarantee is "From this node of the hierarchy down, this primitive is defined to guarantee O(log n) search". Linear search is a generic algorithm and does not belong to any container. Andrei
Jan 03 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:

 Not having opSlice be part of the interface itself does not preclude 
 it from implementing opSlice, and does not preclude using ranges of 
 it in std.algorithm.  If I'm not mistaken, all functions in 
 std.algorithm rely on compile time interfaces.  opApply allows for 
 full input range functionality for things like copying and outputting 
 where templates may not be desired.

The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.

The answer is, you don't. Ranges via interfaces are slower than primitive functions defined on the interface, so use what's best for interfaces when you have an interface and what's best for compile-time optimization when you have the full implementation.

I understand we don't agree on this. To me, making containers work with algorithms is a drop-dead requirement so I will rest my case. Nevertheless, I think there's one point that got missed. It's a tad subtle, and I find it pretty cool because it's the first time I used covariant returns for something interesting. Consider: interface InputIter(T) { ... } interface ForwardIter(T) : InputIter!T { ... } class Container(T) { InputIter!T opSlice() { ... } } class Array(T) : Container(T) { class Iterator : ForwardIter!T { ... all final functions ... } Iterator opSlice(); } Now there are two use cases: a) The user uses Array specifically. auto a = new Array!int; sort(a[]); In this case, sort gets a range of type Array!(int).Iterator, which defines final primitives. Therefore, the compiler does not emit ONE virtual call throughout. I believe this point was lost. b) The user wants generality and OOP-style so uses Container throughout: Container!int a = new Array!int; auto found = find(a[], 5); This time find gets an InputIter!int, so it will use virtual calls for iteration. The beauty of this design is that without any code duplication it clearly spans the spectrum between static knowledge and dynamic flexibility - within one design and one implementation. This is the design I want to pursue.
 Yes, and I think remove(V v) does not belong to the minimum 
 functionality. It combines two functions (search and remove) and 
 raises questions such as what happens to duplicate elements.

the function removes one value if it is in the container, zero if it is not.

So you need a way to do searching and a way to do positioned remove. Why combine them into one, when you can use both separately to great effect?
 Another primitive interface Multi(V) adds a removeAll(V v) function.

Why do you need a separate interface for removeAll? Are there containers that can remove one value but not all?
 Again, it combines two functions that are awkward or difficult to 
 implement using ranges via interfaces.

I contend that point.
 contains(V v); // returns whether an element is contained in the 
 colleciton

I don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.

generic algorithm, because some containers can implement this more efficiently.

But this is exactly what I believe to be a mistake: you are abstracting away algorithmic complexity.

Not really, I say right in the docs that contains(V) can take O(n) time. There's no abstraction. I don't say that the algorithmic complexity is defined by the implementation. It's no different than std.algorithm's find.
 Plus, to use the generic algorithms, you would need to use interfaces 
 as ranges which I think are completely useless.

Why?

3 virtual calls per iteration, no possibility to inline, and reference copy semantics. The latter is the biggest problem for me. I want to be able to save ranges (iterators) for later use on containers, and having to call to the heap for each save is a deal killer.

Per my argument above, there may be zero virtual calls per iteration. (I want to make sure my point about covariance was understood even if it is not considered compelling.)
 However, you make it sound like there are so many containers that would 
 want to not store the length.  There is only one -- linked lists, and 
 only with special requirements (O(1) splicing)

I agree it's tenuous to leave length() out. It's a judgment call.
 Map!(K, V) has Tuple!(K, V) as its element type.

That makes things awkward. For example, to remove, you must remove the K-V pair. typically you only want to specify the K or the V. I realize it doesn't make things awkward for your container interface since you don't *define* a remove function, but your container interface isn't very useful as a general container.

It is. What is true is a converse of your statement - a general container isn't very useful as a map - which I'd agree with, and is sensible OO design. As long as a map is seen as a container, it _does_ store K-V pairs. If I know it's a map, great, it defines a different primitive for removing an element given its key.
 If you use interfaces you can write opApply wrappers to do different 
 things.  I'm not sure what you mean by composition.

For example, compose ranges a la retro or take.

how do you compose retro on an input range? that's all you got with your container interface. It's the same for opApply.

You need to be a bit down in the hierarchy to use retro.
 take is simple:
 
 class TakeIterator(V) : Iterator(V)
 {
    private Iterator!V src;
    private uint nelems;
    public this(Iterator!V src, uint nelems) { this.src = src; 
 this.nelems = nelems;}
    public int opApply(int delegate(ref V v) dg)
    {
       uint elems = 0;
       int result = 0;
       foreach(ref v; src)
       {
           if(elems++ > nelems || (result = dg(v)) != 0)
         break;
       }
       return result;
    }
    public uint length()
    {
       uint len = src.length();
       return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems 
 ? len : nelems;
    }
 }

And then how do I compose take with something else, or pass it to std.algorithm.find()? This whole thing doesn't hold water.
 You can replace data by assigning to range's elements. Removal is done 
 via positioned remove (which I think needs to be a primitive).

That is available in dcollections, just not part of the interface. Every container implements remove(cursor) function, but since cursors are implementation specific, it can't be part of the interface.

Well that's a problem with the design, no? Why then define a hierarchy? A simple set of unrelated types may be a better design.
 If I have to pull out std.algorithm to do simple things like remove a 
 specific element from a container, where std.algorithm may not do the 
 most efficient thing, then why the hell have an interface in the first 
 place?

Things can be arranged such that std.algorithm does do the best thing, and is also the most general and reusable approach. The STL has partly shown that. I plan to take that one step further - to show that container-independent code can be gainfully written with a hierarchy of containers (something that STL failed at).
 If I have an interface, I want to make it do all the things all 
 containers can do, not delegate it to some other part of the library.  I 
 trust that the interface knows how to do container functions in the best 
 way possible.

The interface must be expressive enough to let generic algorithms do their job.
 I think the other way: if the minimal container design is unusable 
 with std.algorithm, the design took a wrong turn somewhere.

If the OO *interface* does not support std.algorithm, then that's good by me, seeing as how you have to use non-inlinable virtual functions to do simple tasks on ranges that cannot be copied without allocating on the heap.

Did my covariance argument above convince you otherwise?
 I think absolutely containers should support std.algorithm 
 via the method std.algorithm is usable by any other range -- 
 compile-time interfaces.

I agree :o).
 I disagree. The negation of an implementation dogma can be just as 
 limiting as the dogma. The way I see it, a design defines some 
 desiderata. Then it does whatever it takes to fulfill them.

You're starting to get too smart for me here :) I have no idea what disrderata means.

Come on, the Internet is there. I meant "essential goal".
 If one desideratum is to use a class hierachy to write 
 container-independent code, then interface ranges naturally follow. 
 There's no ifs and buts about it.

The container hierarchy can support two *different* paradigms. First, the paradigm of runtime interfaces which may be useful for using containers to store pieces of data to go with an object. Such as a socket cache that maps ip addresses to sockets. This cache cares nothing about sorting or doing retro iteration on the socket cache. It cares not about the implementation of the map, just that the map does map-like things (i.e. assign this key to this value, remove this key, etc.). The other paradigm is to have access to std.algorithm in the most efficient way possible. Such access requires full implementation knowledge to make the most efficient implementation of algorithms. In fact, containers can *use* std.algorithm underneath their member functions if so desired.

Yah. Covariance takes care of all of the above. I'd almost agree with you as of a couple of months ago, when the whole covariance thing hadn't occurred to me.
 What you are saying seems completely incorrect to me: "since not all 
 containers can implement fast search, I'm going to guarantee that 
 *all* containers use a linear search via their interface.

This is a misunderstanding. In the STL linear containers don't define find(), but associative containers do. That is the correct design.

But STL isn't accessed without the full implementation, you are comparing apples to oranges here. You said that find is something that should be relegated to std.algorithm.

YES. If a design must express define linear search in more than one place, then that design is suboptimal. This is an important goal I want to follow.
 Isn't std.algorithm's find a linear search?

Yes.
 If I have a 
 Container!E, don't I need to use std.algorithm to search for an element 
 with it's returned range?

No. It's like in the STL - either the container defines its own searching primitives (e.g. set or map), or you can always grab the container's range and use the well-known linear search defined by std.algorithm.
 How is this not forcing a linear search when 
 using an interface to a container?

It isn't forcing a linear search. It is fostering awareness of the capabilities needed by the compiler. A design that defines a best-effort find() as a member is suboptimal. There's no guarantee it can provide. A putative user cannot tell anything about the complexity of their code that uses the find() member. A good design has the user say, hey, linear search is ok here; in this other place, I need a fast find so I can't use Container!T, I need AssociativeContainer!T.
 What is the answer to that, don't 
 search for an element in a container unless you have the full 
 implementation?  That works in dcollections too!

The shape and characteristic of the search is defined by the type of the container. A great library that got mentioned here in the past: http://www.gobosoft.com/eiffel/gobo/structure/container.html I strongly recommend studying that design; it is very close to optimality. They define DS_SEARCHABLE as a subclass of DS_CONTAINER - as they should.
 *AND* I want to make each loop in the search algorithm call 3 virtual 
 functions!"

Not necessarily. This happens only if you use the Container interface to write container-independent code. It is the cost it takes for the design to fulfill its desiderata.

Is that not the case we are discussing?

No, there are two cases as I discussed above in this post.
 How is that better than a search function that guarantees linear 
 performance but gives the option of being as fast as possible with no 
 loop virtual calls?

It is better because it doesn't write off search complexity as an implementation detail. "Linear or better" is not a good guarantee. A good guarantee is "From this node of the hierarchy down, this primitive is defined to guarantee O(log n) search". Linear search is a generic algorithm and does not belong to any container.

That is available via the full implementation. With the interface, it's the "best you can do", because that's all that is available. "linear or better" is a better guarantee than "linear with guaranteed 3 no-inlining virtual function calls per element."

I think one issue might be that you think "interface" and "implementation" with nothing in between. I'm talking "gradually more specialized interface", as the Gobo library above defines. Andrei
Jan 03 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
We are arguing in circles, so I will just stop :)

I'll address the one point I think we both feel is most important below

On Sun, 03 Jan 2010 17:19:52 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:

 Not having opSlice be part of the interface itself does not preclude  
 it from implementing opSlice, and does not preclude using ranges of  
 it in std.algorithm.  If I'm not mistaken, all functions in  
 std.algorithm rely on compile time interfaces.  opApply allows for  
 full input range functionality for things like copying and outputting  
 where templates may not be desired.

The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.

primitive functions defined on the interface, so use what's best for interfaces when you have an interface and what's best for compile-time optimization when you have the full implementation.

I understand we don't agree on this. To me, making containers work with algorithms is a drop-dead requirement so I will rest my case. Nevertheless, I think there's one point that got missed. It's a tad subtle, and I find it pretty cool because it's the first time I used covariant returns for something interesting. Consider: interface InputIter(T) { ... } interface ForwardIter(T) : InputIter!T { ... } class Container(T) { InputIter!T opSlice() { ... } } class Array(T) : Container(T) { class Iterator : ForwardIter!T { ... all final functions ... } Iterator opSlice(); } Now there are two use cases: a) The user uses Array specifically. auto a = new Array!int; sort(a[]); In this case, sort gets a range of type Array!(int).Iterator, which defines final primitives. Therefore, the compiler does not emit ONE virtual call throughout. I believe this point was lost. b) The user wants generality and OOP-style so uses Container throughout: Container!int a = new Array!int; auto found = find(a[], 5); This time find gets an InputIter!int, so it will use virtual calls for iteration. The beauty of this design is that without any code duplication it clearly spans the spectrum between static knowledge and dynamic flexibility - within one design and one implementation. This is the design I want to pursue.

I see why it is attractive to you. But to me, algorithms are not the main driver for containers. One thing we both agree on -- when you know the full implementation, algorithms from std.algorithm should be implemented as fast as possible. Where we disagree is what is desired when the full implementation is not known. I contend that the ability to link with std.algorithm isn't a requirement in that case, and is not worth complicating the whole world of ranges to do so (i.e. build std.algorithm just in case you have reference-type ranges with a "save" requirement). If you don't know the implementation, don't depend on std.algorithm to have all the answers, depend on the implementation which is abstracted correctly by the interface. What this means is there will be some overlap in functions that are defined on the interfaces and functions defined in std.algorithm. Most likely the overlapping functions both point to the same implementation (naturally, this should live in std.algorithm). This is for convenience to people who want to use containers in the interface form, so they do not need to concern themselves with the abilities of std.algorithm, they just want containers that help them get work done. There is still one flaw in your ideas for covariant types: even though final functions can be used throughout and the possibility for inlining exists, you *still* need to use the heap to make copies of ranges. That was and still is my biggest concern. I think the rest of this whole post is based on our disagreement of these design choices, and it really doesn't seem productive to continue all the subtle points. [rest of growing disagreement snipped] BTW, I use covariant return types freely in dcollections and I agree it kicks ass. It seems like black magic especially when you are returning possibly a class or interface which need to be handled differently. -Steve
Jan 03 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the situation  
 is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to  
 perform
 to spec. I don't know how to improve on that.

as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.

Oh, besides it doesn't work for struct ranges that iterate one-pass streams.

What does save do in those cases? -Steve
Jan 02 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Jan 2010 13:06:25 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the  
 situation is
 unpleasant albeit not catastrophic: for most struct ranges things  
 will
 continue to work, but for class ranges the function will fail to  
 perform
 to spec. I don't know how to improve on that.

range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.

Oh, besides it doesn't work for struct ranges that iterate one-pass streams.

-Steve

It provides a syntactic differentiation between input ranges and forward ranges. STL's input iterators are syntactically indistinguishable from forward iterators. Because of that, all STL algorithms that expect forward ranges will also compile and run with input ranges, although never with the expected result. This has been a lingering problem with C++98, and a key motivator for concepts. Since syntactic differences cannot be used to distinguish between input and forward iterators, the reasoning went, we need something else as a discriminator - and that would be a concept: someone who defines an input iterator declares it obeys the input iterator concept, whereas someone defining a forward iterator declares it obeys the forward iterator concept. During the meltdown of concepts, a number of people including Bjarne Stroustrup and myself have suggested that a simple workable solution would be to define an extra function a la "save" that is used in algorithms and only supported by forward iterators, but not by input iterators. Then, algorithms use save() and will correctly refuse to compile calls with input iterators. The remaining risk is that someone writes an algorithm and forgets to use save().

Would it not be as useful then to just define attributes on the type and save a bunch of useless/weird looking code? that is, have an enum value inside the type that defines its state can be saved. It seems to me like save is the equivalent of that anyways, since its possible to forget to use it, and still have your code compile. Basically, it appears to me that save either a) doesn't compile or b) is the equivalent of assignment. It doesn't seem to add much to the functionality. This is all except for classes, which I think have no business being ranges in the first place. -Steve
Jan 02 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Jan 2010 13:51:48 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
  Would it not be as useful then to just define attributes on the type  
 and save a bunch of useless/weird looking code?
  that is, have an enum value inside the type that defines its state can  
 be saved.  It seems to me like save is the equivalent of that anyways,  
 since its possible to forget to use it, and still have your code  
 compile.

If you have an enum that says a range's state can be saved, then you still need a function to effect that save :o). So you added, not removed, bloat.

The function already exists -- opAssign. My point is that save would just be the same as opAssign in most cases where you'd want to implement save. Cases where you'd want to implement save differently than opAssign are cases where you wouldn't want to use it in a generic fashion.
 Basically, it appears to me that save either a) doesn't compile or b)  
 is the equivalent of assignment.  It doesn't seem to add much to the  
 functionality.

No, for class ranges save() is nontrivial (a sort of clone). I suspect even struct ranges for certain file-oriented stuff would do nontrivial work (e.g. open the file anew and fseek() on the current position etc.)

Classes should not be ranges. And I hope that any algorithm that uses savable ranges would not be used on a file range. For example, your consume function: bool consume(R1, R2)(ref R1 r1, R2 r2) if (isForwardRange!R1 && isInputRange!R2) { auto r = r1.save(); // open an extra file, and seek the file to the given point *just in case*?! while (!r2.empty && !r.empty && r.front == r2.front) { r.popFront(); r2.popFront(); } if (r2.empty) { r1 = r; // note that this unaliases r1 from any other copies (not save()'d copies) that were made of it. // also note that you now have opened more handles to the same file. Calling this many times could // consume quite a few handles. return true; } return false; } Here is a good exercise that would help clarify what we need: determine all the types of ranges you can think of that would need a special save function.
 This is all except for classes, which I think have no business being  
 ranges in the first place.

Well then how would the container hierarchy work? It does need range interfaces. How does dcollections deal with that?

A container is not a range. A range of a container should not be a class. For dcollections, I was planning on ranges being a struct with a begin and end cursor defining the part of the container referenced by the range. Such a range can always be copied and modifying the copy through the range functions should not modify the original range. I see a range as being useful for iteration or algorithms, but not for general use. A great example is AAs. Would you say that an AA *is* a range or should *provide* a range? If it is a range, does that mean you remove elements as you call popFront? Does that make any sense? If it doesn't, then what happens if you add elements through another alias to that AA? -Steve
Jan 02 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Jan 2010 19:51:41 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
  The function already exists -- opAssign.  My point is that save would  
 just be the same as opAssign in most cases where you'd want to  
 implement save.  Cases where you'd want to implement save differently  
 than opAssign are cases where you wouldn't want to use it in a generic  
 fashion.

You might mean this(this).

Yes, that's what I meant :) I haven't gotten used to struct constructors.
 Either doesn't help because you'd still be unable to differentiate  
 between input ranges and forward ranges. Much of the point of save() is  
 to mark a syntactic difference between input ranges and forward ranges.  
 Input ranges still need this(this) and opAssign - those just have  
 different semantics.

That was the point of the enum. It would be a synthetic difference, but IMO so is save. If it turns out that the only usable times you implement save all look like: auto save() { return *this; } then save gives you nothing. It's kind of a proof by induction (I think). You say that algorithm A requires the ability to save the state of a range. So I define a function save on a range which cannot be saved by a simple assignment (i.e. a file). I use A on that range, and the results are not what I expect or kill performance or consume unneeded resources, I'd rather not use algorithm A on that range, negating the need to define a save function in the first place. I think that's what we're going to end up with.
 A range that defines save() is a forward range. save() creates an  
 independent range from its source. The file etc. example was  
 hypothetical but realistic.

I meant what actual types of ranges, not categories, I don't know how to say this better... i.e. the file range is one type, what are others? What I'm looking for is ranges that would otherwise be input ranges unless you define save on them. For example, a file range without save is an input range, cannot be a forward range. But if you define save, it is a forward range. An array is an example of a range that can be a forward range without the save function. My theory is, given this list of ranges, if you pair them with an algorithm that requires save capability, you wouldn't want to use that algorithm on it anyways (kinda like the consume example).
 Struct ranges won't work with a container hierarchy. If you define a  
 container hierarchy (classes etc.) you'll also need a range hierarchy.  
 Otherwise defining the inheritance relationship is impossible. Consider:

 abstract class Container(E) { // most general container
       property bool empty();
      bool add(E element);
      E removeAny();
      InputRange!E opSlice();
 }

 That's what I'd expect any container worth its salt to implement: (a)  
 test for empty; (b) add an element to the container; (c) remove some  
 element from the container; (d) get a range that spans the entire  
 container. (Probably removeAll() and a range-positioned remove() would  
 make sense, too.)

The range interface is compile-time only, there is no need to define it in the container interface. opSlice does not need to be part of the interface, even if it is defined on all the container types. opApply makes for a much better interface-friendly iteration anyways. BTW, the primitives in dcollections are: clear(); // clear all elements remove(V v); // remove an element contains(V v); // returns whether an element is contained in the colleciton length(); // the length of the collection dup(); // duplicate the collection opApply(int delegate(ref V v) dg); // iterate the collection opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the collection That means it covers only empty in your list of must-have functions (via length() == 0). Add is not a primitive because the Map collections shouldn't assign some random key to the new element. removeAny is defined only on sets and multisets, but I'm not sure that it couldn't be moved to Collection, in fact, I'll probably do that. Note that it's missing begin and end which are defined on every single container type (i.e. the equivalent of the all-elements range). This is because those primitives return a struct that is different for every container type. It also surpasses opSlice via opApply, since all an input range can do anyways is iterate. In fact, opApply is more powerful because you can change elements and remove elements (via purging). Plus it's more efficient than a range-via-interface.
 I see a range as being useful for iteration or algorithms, but not for  
 general use.  A great example is AAs.  Would you say that an AA *is* a  
 range or should *provide* a range?  If it is a range, does that mean  
 you remove elements as you call popFront?  Does that make any sense?   
 If it doesn't, then what happens if you add elements through another  
 alias to that AA?

An AA provides several ranges - among which byKey and byValue.

I misunderstood your statement "[a container hierarchy] does need range interfaces." I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad. -Steve
Jan 02 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 My theory is, given this list of ranges, if you pair them with an  
 algorithm that requires save capability, you wouldn't want to use that  
 algorithm on it anyways (kinda like the consume example).

Why gratuitously limit the design? You're asking to replace this: R save() { return this; } with: enum thisIsAForwardRange = true; Is there a reason? The former leaves in flexibility. The latter doesn't, for no good reason.

Well, one thing you could do is: enum thisIsAnInputRange = true; and then no special implementation is needed for normal forward ranges. the other point is there is no special treatment needed inside algorithms -- the risk of forgetting to use save at the right points of the algorithm is higher than forgetting to say isForwardRange!(R) at the beginning of the function.
 Struct ranges won't work with a container hierarchy. If you define a  
 container hierarchy (classes etc.) you'll also need a range hierarchy.  
 Otherwise defining the inheritance relationship is impossible.  
 Consider:

 abstract class Container(E) { // most general container
       property bool empty();
      bool add(E element);
      E removeAny();
      InputRange!E opSlice();
 }

 That's what I'd expect any container worth its salt to implement: (a)  
 test for empty; (b) add an element to the container; (c) remove some  
 element from the container; (d) get a range that spans the entire  
 container. (Probably removeAll() and a range-positioned remove() would  
 make sense, too.)

it in the container interface. opSlice does not need to be part of the interface, even if it is defined on all the container types. opApply makes for a much better interface-friendly iteration anyways.

I want container users to be able to save iteration state and also to open up containers to std.algorithm and other goodies, so I'm shying away from opApply.

Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm. If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces. opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.
 BTW, the primitives in dcollections are:
  clear(); // clear all elements
 remove(V v); // remove an element

Search and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?

Search and positioned removal are also a primitives, but not defined on the interface. remove was a primitive on Tango's containers, and dcollections was originally meant to be a replacement for Tango's containers. I think the point is, if you have an interface reference, what would be the minimum functionality needed so that you could use a container without knowing its implementation.
 contains(V v); // returns whether an element is contained in the  
 colleciton

I don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.

Again, it's part of the minimal usable interface. It's not a generic algorithm, because some containers can implement this more efficiently. Plus, to use the generic algorithms, you would need to use interfaces as ranges which I think are completely useless.
 length(); // the length of the collection

That's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.

All current dcollection containers have O(1) length.
 dup(); // duplicate the collection
 opApply(int delegate(ref V v) dg); // iterate the collection
 opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the  
 collection
  That means it covers only empty in your list of must-have functions  
 (via length() == 0).

How do you implement length() for a singly-linked list? Is empty() going to take O(n)?

first, dcollections' list implementation is doubly linked because all collections are forward and backward iterable. second, even for singly linked lists, you can have either O(1) length or O(1) splicing (consuming a link list range into another linked list). Dcollections' default link list implementation uses O(1) length since I think splicing is a specialized requirement.
 Add is not a primitive because the Map collections shouldn't assign  
 some random key to the new element.  removeAny is defined only on sets  
 and multisets, but I'm not sure that it couldn't be moved to  
 Collection, in fact, I'll probably do that.

add is a primitive that takes Tuple!(K, V) as the element type.

How do you define that on Container(V)? on Map(K, V), set(K k, V v) is an interface method. what you can do is define Map(K, V) as inheriting Container(Tuple!(K, V)), but then trying to use the container functions are very cumbersome. In dcollections, Map(K, V) inherits Collection(V).
 Note that it's missing begin and end which are defined on every single  
 container type (i.e. the equivalent of the all-elements range).  This  
 is because those primitives return a struct that is different for every  
 container type.

So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.

No, you can easily write container-independent iteration as long as you have the implementation. If you use interfaces you can write opApply wrappers to do different things. I'm not sure what you mean by composition.
 It also surpasses opSlice via opApply, since all an input range can do  
 anyways is iterate.  In fact, opApply is more powerful because you can  
 change elements and remove elements (via purging).  Plus it's more  
 efficient than a range-via-interface.

An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.

How is it more flexible? You can't replace data, and you can't remove data while iterating, both of which are possible with dcollection's primitives. If you have a Container(E) which defines InputRange!E opSlice, how do you get at the more defined range definition? casting?
 I see a range as being useful for iteration or algorithms, but not  
 for general use.  A great example is AAs.  Would you say that an AA  
 *is* a range or should *provide* a range?  If it is a range, does  
 that mean you remove elements as you call popFront?  Does that make  
 any sense?  If it doesn't, then what happens if you add elements  
 through another alias to that AA?

An AA provides several ranges - among which byKey and byValue.

range interfaces." I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad.

Yah, they'd offer it as a result of opSlice(). Covariant return types will ensure there's no virtual call when you know what container you operate on.

Not having opSlice on the interface guarantees you never have a virtual call for iteration :) opApply mitigates the virtual call on the interface.
 Above all: the primitive set for a container must be a small set of  
 functions that (a) can be implemented by all containers within  
 reasonable efficiency bounds, and (b) are container-specific, not  
 generic. IMHO any container design that defines a search(Element) as a  
 primitive is misguided. Searching is not a container primitive - it's an  
 algorithm. Only more specialized containers (e.g. trees, hashes etc.)  
 can afford to define search(Element) as a primitive. Linear search is a  
 generic algorithm that works the same for everyone. It does not belong  
 as a method of any container.

If the minimal container design isn't usable without std.algorithm, then I don't think it's worth having interfaces. If you need std.algorithm, you need the full implementation of the container because it's a compile-time interface. Interface ranges are something that should be avoided, it's like having a programming language where everything has to be a class. What you are saying seems completely incorrect to me: "since not all containers can implement fast search, I'm going to guarantee that *all* containers use a linear search via their interface. *AND* I want to make each loop in the search algorithm call 3 virtual functions!" How is that better than a search function that guarantees linear performance but gives the option of being as fast as possible with no loop virtual calls? In fact, my implementation is guaranteed to be faster than std.algorithm because of the lack of virtual calls. -Steve
Jan 03 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:

 Not having opSlice be part of the interface itself does not preclude it  
 from implementing opSlice, and does not preclude using ranges of it in  
 std.algorithm.  If I'm not mistaken, all functions in std.algorithm  
 rely on compile time interfaces.  opApply allows for full input range  
 functionality for things like copying and outputting where templates  
 may not be desired.

The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.

The answer is, you don't. Ranges via interfaces are slower than primitive functions defined on the interface, so use what's best for interfaces when you have an interface and what's best for compile-time optimization when you have the full implementation.
 BTW, the primitives in dcollections are:
  clear(); // clear all elements
 remove(V v); // remove an element

Search and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?

on the interface. remove was a primitive on Tango's containers, and dcollections was originally meant to be a replacement for Tango's containers. I think the point is, if you have an interface reference, what would be the minimum functionality needed so that you could use a container without knowing its implementation.

Yes, and I think remove(V v) does not belong to the minimum functionality. It combines two functions (search and remove) and raises questions such as what happens to duplicate elements.

the function removes one value if it is in the container, zero if it is not. Another primitive interface Multi(V) adds a removeAll(V v) function. Again, it combines two functions that are awkward or difficult to implement using ranges via interfaces.
 contains(V v); // returns whether an element is contained in the  
 colleciton

I don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.

algorithm, because some containers can implement this more efficiently.

But this is exactly what I believe to be a mistake: you are abstracting away algorithmic complexity.

Not really, I say right in the docs that contains(V) can take O(n) time. There's no abstraction. I don't say that the algorithmic complexity is defined by the implementation. It's no different than std.algorithm's find.
 Plus, to use the generic algorithms, you would need to use interfaces  
 as ranges which I think are completely useless.

Why?

3 virtual calls per iteration, no possibility to inline, and reference copy semantics. The latter is the biggest problem for me. I want to be able to save ranges (iterators) for later use on containers, and having to call to the heap for each save is a deal killer.
 length(); // the length of the collection

That's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.


Some containers can't define O(1) length conveniently.

length is actually defined to return ~0 if it cannot compute the length quickly :) Forgot to mention that detail. So such containers have an "out" so to say. In fact, length is part of a different primitive I defined: Iterator(V) Iterator(V) defines length and opApply and can be implemented by any class, not just containers. My original goal when I planned dcollections for tango was for things like LineInput file readers and such to implement the Iterator interface. That way you could do cool stuff like: container.add(new LineInput("file.txt")); // add all the lines from the file but it didn't come to fruition, so Iterator is only implemented on dcollections...
 dup(); // duplicate the collection
 opApply(int delegate(ref V v) dg); // iterate the collection
 opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the  
 collection
  That means it covers only empty in your list of must-have functions  
 (via length() == 0).

How do you implement length() for a singly-linked list? Is empty() going to take O(n)?

collections are forward and backward iterable. second, even for singly linked lists, you can have either O(1) length or O(1) splicing (consuming a link list range into another linked list). Dcollections' default link list implementation uses O(1) length since I think splicing is a specialized requirement.

Right. The question is how much pressure Container is putting on the implementation. I'd rather leave it to the list implementation to decide to store the length or not.

That is possible with my design. I realize I didn't fully disclose the nature of length before. But of course, there aren't any containers in dcollections that *don't* define length. However, you make it sound like there are so many containers that would want to not store the length. There is only one -- linked lists, and only with special requirements (O(1) splicing) I guess this all goes back to how empty is to be composed. In dcollections, all containers implement O(1) length, so that is not an issue. If you had a specialized linked list implementation, I'm not sure how you could compose it with my primitives. But I don't consider it to be a worthwhile deficiency to fix. most of the time, you care if something is empty because you are about to take an element, or iterate it. Just call the thing you want to do, and there are ways to check if it didn't return anything. In any case, empty could be defined as a primitive if necessary (in all dcollections it would be written return length == 0).
 Add is not a primitive because the Map collections shouldn't assign  
 some random key to the new element.  removeAny is defined only on  
 sets and multisets, but I'm not sure that it couldn't be moved to  
 Collection, in fact, I'll probably do that.

add is a primitive that takes Tuple!(K, V) as the element type.

is an interface method.

Map!(K, V) has Tuple!(K, V) as its element type.

That makes things awkward. For example, to remove, you must remove the K-V pair. typically you only want to specify the K or the V. I realize it doesn't make things awkward for your container interface since you don't *define* a remove function, but your container interface isn't very useful as a general container.
 Note that it's missing begin and end which are defined on every  
 single container type (i.e. the equivalent of the all-elements  
 range).  This is because those primitives return a struct that is  
 different for every container type.

So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.

you have the implementation.

In this context: container-independent = using the Container interface. This is the whole purpose of creating a container hierarchy. If the container design fosters knowing the implementation, maybe a class hierarchy is not the right choice in the first place.
 If you use interfaces you can write opApply wrappers to do different  
 things.  I'm not sure what you mean by composition.

For example, compose ranges a la retro or take.

how do you compose retro on an input range? that's all you got with your container interface. It's the same for opApply. take is simple: class TakeIterator(V) : Iterator(V) { private Iterator!V src; private uint nelems; public this(Iterator!V src, uint nelems) { this.src = src; this.nelems = nelems;} public int opApply(int delegate(ref V v) dg) { uint elems = 0; int result = 0; foreach(ref v; src) { if(elems++ > nelems || (result = dg(v)) != 0) break; } return result; } public uint length() { uint len = src.length(); return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems ? len : nelems; } }
 It also surpasses opSlice via opApply, since all an input range can  
 do anyways is iterate.  In fact, opApply is more powerful because you  
 can change elements and remove elements (via purging).  Plus it's  
 more efficient than a range-via-interface.

An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.

data while iterating, both of which are possible with dcollection's primitives. If you have a Container(E) which defines InputRange!E opSlice, how do you get at the more defined range definition? casting?

You can replace data by assigning to range's elements. Removal is done via positioned remove (which I think needs to be a primitive).

That is available in dcollections, just not part of the interface. Every container implements remove(cursor) function, but since cursors are implementation specific, it can't be part of the interface.
  Not having opSlice on the interface guarantees you never have a  
 virtual call for iteration :)  opApply mitigates the virtual call on  
 the interface.

And takes away the ability to compose ranges and to use algorithms with the container.

As long as you only have the interface and not the actual container. I have no problem with that.
 Above all: the primitive set for a container must be a small set of  
 functions that (a) can be implemented by all containers within  
 reasonable efficiency bounds, and (b) are container-specific, not  
 generic. IMHO any container design that defines a search(Element) as a  
 primitive is misguided. Searching is not a container primitive - it's  
 an algorithm. Only more specialized containers (e.g. trees, hashes  
 etc.) can afford to define search(Element) as a primitive. Linear  
 search is a generic algorithm that works the same for everyone. It  
 does not belong as a method of any container.

then I don't think it's worth having interfaces.

Why?

If I have to pull out std.algorithm to do simple things like remove a specific element from a container, where std.algorithm may not do the most efficient thing, then why the hell have an interface in the first place? If I have an interface, I want to make it do all the things all containers can do, not delegate it to some other part of the library. I trust that the interface knows how to do container functions in the best way possible.
 I think the other way: if the minimal container design is unusable with  
 std.algorithm, the design took a wrong turn somewhere.

If the OO *interface* does not support std.algorithm, then that's good by me, seeing as how you have to use non-inlinable virtual functions to do simple tasks on ranges that cannot be copied without allocating on the heap. I think absolutely containers should support std.algorithm via the method std.algorithm is usable by any other range -- compile-time interfaces.
 If you need std.algorithm, you need the full implementation of the  
 container because it's a compile-time interface.

How did you reach that conclusion? std.algorithm uses a syntactic interface that is obeyed by interfaces too. There's no problem.

It's a bastardized hack to use interfaces with std.algorithm. I think it's as useful as functional qsort. Yeah the big-O is the same, but the implementation sucks.
 Interface ranges are something that should be avoided, it's like having  
 a programming language where everything has to be a class.

I disagree. The negation of an implementation dogma can be just as limiting as the dogma. The way I see it, a design defines some desiderata. Then it does whatever it takes to fulfill them.

You're starting to get too smart for me here :) I have no idea what disrderata means.
 If one desideratum is to use a class hierachy to write  
 container-independent code, then interface ranges naturally follow.  
 There's no ifs and buts about it.

The container hierarchy can support two *different* paradigms. First, the paradigm of runtime interfaces which may be useful for using containers to store pieces of data to go with an object. Such as a socket cache that maps ip addresses to sockets. This cache cares nothing about sorting or doing retro iteration on the socket cache. It cares not about the implementation of the map, just that the map does map-like things (i.e. assign this key to this value, remove this key, etc.). The other paradigm is to have access to std.algorithm in the most efficient way possible. Such access requires full implementation knowledge to make the most efficient implementation of algorithms. In fact, containers can *use* std.algorithm underneath their member functions if so desired.
 What you are saying seems completely incorrect to me: "since not all  
 containers can implement fast search, I'm going to guarantee that *all*  
 containers use a linear search via their interface.

This is a misunderstanding. In the STL linear containers don't define find(), but associative containers do. That is the correct design.

But STL isn't accessed without the full implementation, you are comparing apples to oranges here. You said that find is something that should be relegated to std.algorithm. Isn't std.algorithm's find a linear search? If I have a Container!E, don't I need to use std.algorithm to search for an element with it's returned range? How is this not forcing a linear search when using an interface to a container? What is the answer to that, don't search for an element in a container unless you have the full implementation? That works in dcollections too!
 *AND* I want to make each loop in the search algorithm call 3 virtual  
 functions!"

Not necessarily. This happens only if you use the Container interface to write container-independent code. It is the cost it takes for the design to fulfill its desiderata.

Is that not the case we are discussing?
 How is that better than a search function that guarantees linear  
 performance but gives the option of being as fast as possible with no  
 loop virtual calls?

It is better because it doesn't write off search complexity as an implementation detail. "Linear or better" is not a good guarantee. A good guarantee is "From this node of the hierarchy down, this primitive is defined to guarantee O(log n) search". Linear search is a generic algorithm and does not belong to any container.

That is available via the full implementation. With the interface, it's the "best you can do", because that's all that is available. "linear or better" is a better guarantee than "linear with guaranteed 3 no-inlining virtual function calls per element." -Steve
Jan 03 2010