digitalmars.D - Safe Cursors and Ranges

Pillsy (22/22) Aug 26 2010 These thoughts were inspired by the recent thread, "Retrieving the

Steven Schveighoffer (43/81) Aug 26 2010 That's exactly how cursors perform in dcollections. An excerpt from

Pillsy (18/29) Aug 26 2010 [...]

Steven Schveighoffer (51/78) Aug 26 2010 A cursor that refers to one element is much more efficient to pass

Pillsy (33/74) Aug 26 2010 Ah, yes. If you just need the one element, I figure that you can use

Steven Schveighoffer (46/104) Aug 26 2010 I think there is a misunderstanding of how I went about defining

dsimcha (39/61) Aug 26 2010 operations to get a huge amount of benefit:

Andrei Alexandrescu (25/36) Aug 26 2010 I'm weary of this philosophy. I've seen a phenomenon happen time and

Jonathan M Davis (36/59) Aug 26 2010 I think that what it comes down to is that you need to be smart about wh...

Pillsy (29/52) Aug 27 2010 I really don't disagree. One of the reasons I'm suggesting that we

Andrei Alexandrescu (13/15) Aug 26 2010 (I wrote my last post before reading yours.) Excellent points. I do

Pillsy <pillsbury gmail.com> writes:

These thoughts were inspired by the recent thread, "Retrieving the 
traversed range". I am generally pretty sold on the range concept, 
but I think for some operations some sort of cursor is required. 
However, there are some unsafe cursor-related operations that are
best avoided. 

However, as far as I can see, the unsafe operations revolve around
synthesizing a new range out of two cursors. However, you can get 
a lot of the desired functionality associated with finding elements and
splitting ranges *without* allowing the synthesis of of ranges from
unrelated cursors. The key to doing this is to have each cursor be permanently
bound to the underlying range which it was derived from.

If you have this kind of cursor, you only really need it to support two
operations to get a huge amount of benefit:

     Range before ();

     Range after ();

The before () operation just returns a range of the appropriate type (Forward,
Bidirectional, RandomAccess) of all the elements that come before the cursor is
pointing to, while the after () operation returns a range of the appropriate
type with all the elements after the cursor. Given this concept, a cursor
really points at the boundary between elements of a range, if you will.

If find and related operations return this kind of cursor, all of the
operations involving splitting ranges become trivial. It's completely
consistent for either before () or after () to be empty, and since you can't
glue two of these cursors together, you are always guaranteed to get
well-defined ranges out. 

While I think the before () and after () operations are the only ones 
which are strictly necessary for the desiderata I've seen so far, people 
could write their own generic algorithms if cursors supported more of 
the iterator operations, for moving forward (and backward, where appropriate)
and inspecting the element immediately after the cursor (or before, where
appropriate), and if ranges supported operations to return cursors from
reasonable places (like from the front or back, or from a specific indexed
elemnt for RandomAccess ranges).

The key idea is that these cursors aren't a primitive part of a range; instead,
they take a range and add a position *inside* the range. They're a perfect fit
for all those "three-legged" operations out there because they actually have
three legs.

Cheers,
Pillsy

Aug 26 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury gmail.com> wrote:

 These thoughts were inspired by the recent thread, "Retrieving the
 traversed range". I am generally pretty sold on the range concept,
 but I think for some operations some sort of cursor is required.
 However, there are some unsafe cursor-related operations that are
 best avoided.

 However, as far as I can see, the unsafe operations revolve around
 synthesizing a new range out of two cursors. However, you can get
 a lot of the desired functionality associated with finding elements and
 splitting ranges *without* allowing the synthesis of of ranges from
 unrelated cursors. The key to doing this is to have each cursor be  
 permanently bound to the underlying range which it was derived from.

 If you have this kind of cursor, you only really need it to support two  
 operations to get a huge amount of benefit:

      Range before ();

      Range after ();

 The before () operation just returns a range of the appropriate type  
 (Forward, Bidirectional, RandomAccess) of all the elements that come  
 before the cursor is pointing to, while the after () operation returns a  
 range of the appropriate type with all the elements after the cursor.  
 Given this concept, a cursor really points at the boundary between  
 elements of a range, if you will.

 If find and related operations return this kind of cursor, all of the  
 operations involving splitting ranges become trivial. It's completely  
 consistent for either before () or after () to be empty, and since you  
 can't glue two of these cursors together, you are always guaranteed to  
 get well-defined ranges out.

 While I think the before () and after () operations are the only ones
 which are strictly necessary for the desiderata I've seen so far, people
 could write their own generic algorithms if cursors supported more of
 the iterator operations, for moving forward (and backward, where  
 appropriate) and inspecting the element immediately after the cursor (or  
 before, where appropriate), and if ranges supported operations to return  
 cursors from reasonable places (like from the front or back, or from a  
 specific indexed elemnt for RandomAccess ranges).

 The key idea is that these cursors aren't a primitive part of a range;  
 instead, they take a range and add a position *inside* the range.  
 They're a perfect fit for all those "three-legged" operations out there  
 because they actually have three legs.

That's exactly how cursors perform in dcollections.  An excerpt from  
dcollections' concepts document:

Cursors:

Cursors are special 1 or 0 element ranges.  They implement the forward  
range
interface.  The benefit to using cursors is that they always refer to  
exactly
one element.  A normal range uses two marker elements, a begin and an end
element.  Therefore, cursors are less susceptible to invalidation on  
removal of
elements.

Cursors support these additional features:

- Use a cursor as a reference point when dealing with a container.
- Use as an end point when slicing a container.  Note that some containers  
only
   support slicing with an arbitrary cursor and the beginning or end of the
   container.  Slicing is guaranteed to be a fast operation (lgn or better).

Determining cursor/range ownership:

All collections can identify whether a cursor/range belongs to them.  Each  
collection class has a 'belongs' method to determine ownership.  The  
ownership check is guaranteed to be a fast operation (O(lgN) or better).


-------------------

The 'three-legged' cursor as you propose was an early idea of mine too.   
Because dcollections containers are classes however, there is an easy  
point of ownership to establish -- the class reference.  So a dcollections  
cursor doesn't have to refer to its owner, but it can in order to fulfill  
the requirement.

My view is that a cursor should be separate from a range, and a range  
should either be able to tell if a cursor is part of it ( safe) or be told  
that a cursor is a part of it ( system or  trusted).  Inside a  
three-legged algorithm which accepts a range as input, and generates a  
cursor from the range for the third leg, it can force the range to assume  
the cursor is a part of it, knowing that it's true.  You can make the  
second operation  safe by wrapping a cursor + range in a struct that  
generates the cursor from the range (and therefore knows the cursor is a  
part of the range).

For two examples of ranges that can tell if a cursor is a part of it, an  
array can tell if a cursor is a part of it without any extra mechanisms  
because it is an O(1) check for an interior pointer.  A balanced tree can  
also perform an O(lgn) check that a cursor is part of it (though it's  
questionable whether a tree is a range).

-Steve

Aug 26 2010

Pillsy <pillsbury gmail.com> writes:

Steven Schveighoffer Wrote:

 On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury gmail.com> 
 wrote:

[...]
 The key idea is that these cursors aren't a primitive part of a
 range; instead, they take a range and add a position *inside* 
 the range. They're a perfect fit for all those "three-legged" 
 operations out there because they actually have three legs.


[...]
 The 'three-legged' cursor as you propose was an early idea of
 mine too. 

If you don't mind me asking, what made you give up on it? 

 My view is that a cursor should be separate from a range, and a
 range should either be able to tell if a cursor is part of it ( safe)
 or be told  that a cursor is a part of it ( system or  trusted).  

What's the advantage of not holding on to the range as part of the 
cursor? I'm more focused on the typical value-type ranges, but I'm 
not seeing a lot of need for cursors that have lives independent of 
underlying ranges. Is the concern just one of more frequent 
invalidation? 

IME, three-legged operations are fairly common, but four- and more-
legged operations which can't be easily decomposed into consecutive 
three-legged operations are very uncommon. 

What I'm thinking is that cursors aren't skinny ranges, they're ranges 
that are even fatter. You simply wouldn't use one where a range 
would suffice. Allowing questions about whether a range contains a 
cursor or not strikes me as just buying trouble.

Thanks for your thoughts,
Pillsy

Aug 26 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury gmail.com>
 wrote:

 [...]
 The key idea is that these cursors aren't a primitive part of a
 range; instead, they take a range and add a position *inside*
 the range. They're a perfect fit for all those "three-legged"
 operations out there because they actually have three legs.


 [...]
 The 'three-legged' cursor as you propose was an early idea of
 mine too.

 If you don't mind me asking, what made you give up on it?

A cursor that refers to one element is much more efficient to pass  
around/copy.  Plus the extra baggage was not needed in dcollections -- any  
operations you perform on a cursor besides changing the value require you  
to have the container also (i.e. a cursor is mainly a parameter to the  
container).  I don't know how this relates to three-legged algorithms and  
straight ranges+cursors, dcollections ranges+cursors would need some extra  
binding support to use them, since you need the container to create a  
range based on two cursors.

 My view is that a cursor should be separate from a range, and a
 range should either be able to tell if a cursor is part of it ( safe)
 or be told  that a cursor is a part of it ( system or  trusted).

 What's the advantage of not holding on to the range as part of the
 cursor? I'm more focused on the typical value-type ranges, but I'm
 not seeing a lot of need for cursors that have lives independent of
 underlying ranges. Is the concern just one of more frequent
 invalidation?

Because some cursors do not need the hard reference, it is trivial to tell  
whether such cursors belong to a range.  For instance, it's trivial to  
tell that a pointer is part of an array.  So a cursor for an array would  
not need to refer the range it belongs to.

For instance, if I have an array, and a cursor in that array, it's trivial  
for an allBefore operation to determine that the cursor is a member of the  
array.  Now, let's say we have two slices that contain the same cursor, in  
your scheme, in order to use one cursor against the other range, you need  
to recompose the cursor, which seems unnecessarily awkward.  Such a scheme  
is not possible on other ranges, but that just means those ranges don't  
support safe calls for allBefore and the like.  For those ranges, a  
3-legged type would be appropriate, but the 3-legged thing could be a  
combination of a cursor and a range, with the cursor proven to be part of  
the range -- an invariant of the 3-legged type.

Having a cursor point to exactly one element allows more flexibility,  
because you could use a cursor with multiple ranges.  It also decouples  
the cursor from a range that might be superfluous information.  For  
example, if I want to store a link list element cursor so I can change it  
later, why do I also have to carry around the baggage of the range it came  
from?

One of the drawbacks is the cursor cannot move to other elements or alter  
topology, it's strictly a reference-only.  Another reason to have the  
3-legged type.

 IME, three-legged operations are fairly common, but four- and more-
 legged operations which can't be easily decomposed into consecutive
 three-legged operations are very uncommon.

I have no experience with three-legged operations.  I just looked at  
cursors from the point of view of reference to a single element.  Such an  
ability is important for a container library.

I can see such an ability being important for simple operations as well,  
not just algorithms.

 What I'm thinking is that cursors aren't skinny ranges, they're ranges
 that are even fatter. You simply wouldn't use one where a range
 would suffice. Allowing questions about whether a range contains a
 cursor or not strikes me as just buying trouble.

But you don't always care if a cursor is a part of a range.  For example,  
to change or refer to a specific element of a range, why do you need to  
refer to the range also?  The composition of a range and cursor together  
can be guarded by  system or  trusted indicating the type system is taking  
your word for it, but in some cases, it can be proven without assumption  
(i.e. arrays).

I don't have a problem with defining such a three-legged type, but can we  
call it something besides a cursor?  Not that I have a trademark or  
anything, but it would be really confusing to have it mean two different  
things depending on context :)

As I said, I can see utility in having a three-legged beast that combines  
a cursor and a range, but I see value in having a simple one-element  
cursor as well.

-Steve

Aug 26 2010

Pillsy <pillsbury gmail.com> writes:

Steven Schveighoffer Wrote:

 On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury gmail.com> 
 wrote:

[...]
 If you don't mind me asking, what made you give up on it?


 A cursor that refers to one element is much more efficient to pass  
 around/copy.  

Ah, yes. If you just need the one element, I figure that you can use
a range. In most of the cases I'm familiar with, ranges are not 
distinctly less lightweight than array slices, and in D one tends to
blithely pass slices all over the place. Things could be different for 
iterating over trees---is that where you ran into trouble with the 
idea in dcollections?

 Plus the extra baggage was not needed in dcollections -- any  
 operations you perform on a cursor besides changing the value 
 require you  to have the container also (i.e. a cursor is mainly a 
 parameter to the  container).  I don't know how this relates to 
 three-legged algorithms and  straight ranges+cursors, dcollections 
 ranges+cursors would need some extra binding support to use 
 them, since you need the container to create a  
 range based on two cursors.

 What's the advantage of not holding on to the range as part of
 the cursor? I'm more focused on the typical value-type ranges, 
 but I'm not seeing a lot of need for cursors that have lives 
 independent of underlying ranges. Is the concern just one of 
 more frequent invalidation?


 Because some cursors do not need the hard reference, it is trivial 
 to tell whether such cursors belong to a range.  For instance, it's
 trivial to tell that a pointer is part of an array.  So a cursor for an
 array would not need to refer the range it belongs to.

Ah, I see. That would make things harder to use for the specific use
case I'm thinking of, where one has a range one would like to split
at a certain element (which you might have found any number of 
ways). Basically, you can just pass around the cursor, instead of 
needing the cursor and the range. 

This strikes me as likely to be very helpful for three-legged races^W
cases.

 For instance, if I have an array, and a cursor in that array, it's trivial  
 for an allBefore operation to determine that the cursor is a member
 of the array. Now, let's say we have two slices that contain the 
 same cursor, in  your scheme, in order to use one cursor against 
 the other range, you need  to recompose the cursor, which seems
 unnecessarily awkward. 

The way I see it, two different slices can't contain the same cursor; 
a cursor into a slice will always be associated with that slice and no 
other. Maybe there are important use cases I'm missing by defining 
things this way, but I don't know what they might be.
[...]
 For example, if I want to store a link list element cursor so I can 
 change it later, why do I also have to carry around the baggage of 
 the range it came from?

You may not, but why not just hang on to the range it came from 
in the first place instead of using the cursor? What's in the link-list 
range besides a pointer to a node anyway?
[...]
 What I'm thinking is that cursors aren't skinny ranges, they're 
 ranges that are even fatter. You simply wouldn't use one
 where a range would suffice. Allowing questions about whether
 a range contains a cursor or not strikes me as just buying 
 trouble.


 But you don't always care if a cursor is a part of a range.  For 
 example, to change or refer to a specific element of a range, 
 why do you need to refer to the range also? 

You don't, but what do you lose by referring to the range also? A 
few words on the stack (sticking with the idea that ranges are value
types)?
[...] 
 I don't have a problem with defining such a three-legged type,
 but can we call it something besides a cursor? 

Sure. How about Tripod? 8^)

I was thinking of calling them either Splitters, actually, 'cause that's
what they're there for: splitting ranges.

Cheers, 
Pillsy

Aug 26 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 26 Aug 2010 13:30:35 -0400, Pillsy <pillsbury gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury gmail.com>
 wrote:

 [...]
 If you don't mind me asking, what made you give up on it?


 A cursor that refers to one element is much more efficient to pass
 around/copy.

 Ah, yes. If you just need the one element, I figure that you can use
 a range. In most of the cases I'm familiar with, ranges are not
 distinctly less lightweight than array slices, and in D one tends to
 blithely pass slices all over the place. Things could be different for
 iterating over trees---is that where you ran into trouble with the
 idea in dcollections?

I think there is a misunderstanding of how I went about defining  
dcollections cursors.  I never actually tried the tripod idea, I just  
posted it to the newsgroup as a possible solution for dcollections cursors.

Originally in dcollections 1, cursors were iterators -- allowing movement  
of the cursor as well as dereferencing.  This was way way back, before  
ranges were a prevalent concept and I was trying to write a better  
collection library for tango.

When negotiating with Andrei on allowing me to have cursors (when I hoped  
dcollections could be a part of phobos), I went through several different  
ideas to try and make cursors safe.  I ended up trying to define a simple  
reference type that didn't allow iterating.  That is, a reference to a  
single item that cannot move, thereby removing the unsafe nature of  
iterators.  I ran into the problem that the "end" cursor was pointing to  
an invalid element, so you still had to rely on the coder not referencing  
it, just like in C++.  I decided I'd add a bool to the cursor to let it  
know it was invalid, and the idea hit me all of a sudden -- that's just a  
0 or 1-element range depending on the bool.

So the current incarnation was born.  And I have to say, it's quite  
usable.  Once I defined it that way, the code almost wrote itself.

The issue with using ranges to refer to single elements is, it's difficult  
to keep a node-based range intact, for later use.  For example, if you  
want to refer to a single element in a tree, and you use a range to do  
that (assuming it's a range with a start and end node), things can  
happen.  More elements could be inserted between them, the end marker of  
your range could get removed, etc.  Then when you pass the range to the  
tree's remove function, you're giving it an invalid range, or removing  
more elements than you wanted.  Cursors are not susceptible to these  
problems, since they only ever refer to one element.

So there *is* value in having a special cursor type that only refers to  
one element vs. a range, regardless of the copy performance.

 Because some cursors do not need the hard reference, it is trivial
 to tell whether such cursors belong to a range.  For instance, it's
 trivial to tell that a pointer is part of an array.  So a cursor for an
 array would not need to refer the range it belongs to.

 Ah, I see. That would make things harder to use for the specific use
 case I'm thinking of, where one has a range one would like to split
 at a certain element (which you might have found any number of
 ways). Basically, you can just pass around the cursor, instead of
 needing the cursor and the range.

 This strikes me as likely to be very helpful for three-legged races^W
 cases.

Sure, but you can build such a type and its corresponding splitter  
function (which actually could be a member of the tripod) in terms of the  
low-level function which are more powerful because they accept two  
different entities -- the range and the cursor.

 For instance, if I have an array, and a cursor in that array, it's  
 trivial
 for an allBefore operation to determine that the cursor is a member
 of the array. Now, let's say we have two slices that contain the
 same cursor, in  your scheme, in order to use one cursor against
 the other range, you need  to recompose the cursor, which seems
 unnecessarily awkward.

 The way I see it, two different slices can't contain the same cursor;
 a cursor into a slice will always be associated with that slice and no
 other. Maybe there are important use cases I'm missing by defining
 things this way, but I don't know what they might be.
 [...]

Not sure what the use cases would be, but defining the simplest thing  
possible and then building more complex concepts on top of them usually  
allows more possibilities.

 For example, if I want to store a link list element cursor so I can
 change it later, why do I also have to carry around the baggage of
 the range it came from?

 You may not, but why not just hang on to the range it came from
 in the first place instead of using the cursor? What's in the link-list
 range besides a pointer to a node anyway?

The pointer to the possibly invalid end node.  See my above description.

 [...]
 What I'm thinking is that cursors aren't skinny ranges, they're
 ranges that are even fatter. You simply wouldn't use one
 where a range would suffice. Allowing questions about whether
 a range contains a cursor or not strikes me as just buying
 trouble.


 But you don't always care if a cursor is a part of a range.  For
 example, to change or refer to a specific element of a range,
 why do you need to refer to the range also?

 You don't, but what do you lose by referring to the range also? A
 few words on the stack (sticking with the idea that ranges are value
 types)?

It becomes awkward when you are passing a range around but are saying  
"well, really, I'm only passing you a reference to the first element,  
ignore the rest of it".  You end up naming things like  
removeFirstElement(range r) instead of remove(cursor c).  This is from my  
experience with defining dcollections and a std.container redblacktree.

 [...]
 I don't have a problem with defining such a three-legged type,
 but can we call it something besides a cursor?

 Sure. How about Tripod? 8^)

I like it :)

-Steve

Aug 26 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from Pillsy (pillsbury gmail.com)'s article
 These thoughts were inspired by the recent thread, "Retrieving the
 traversed range". I am generally pretty sold on the range concept,
 but I think for some operations some sort of cursor is required.
 However, there are some unsafe cursor-related operations that are
 best avoided.
 However, as far as I can see, the unsafe operations revolve around
 synthesizing a new range out of two cursors. However, you can get
 a lot of the desired functionality associated with finding elements and
 splitting ranges *without* allowing the synthesis of of ranges from
 unrelated cursors. The key to doing this is to have each cursor be permanently

bound to the underlying range which it was derived from.
 If you have this kind of cursor, you only really need it to support two

operations to get a huge amount of benefit:
      Range before ();
      Range after ();
 The before () operation just returns a range of the appropriate type (Forward,

Bidirectional, RandomAccess) of all the elements that come before the cursor is
pointing to, while the after () operation returns a range of the appropriate
type
with all the elements after the cursor. Given this concept, a cursor really
points
at the boundary between elements of a range, if you will.
 If find and related operations return this kind of cursor, all of the
operations

involving splitting ranges become trivial. It's completely consistent for either
before () or after () to be empty, and since you can't glue two of these cursors
together, you are always guaranteed to get well-defined ranges out.
 While I think the before () and after () operations are the only ones
 which are strictly necessary for the desiderata I've seen so far, people
 could write their own generic algorithms if cursors supported more of
 the iterator operations, for moving forward (and backward, where appropriate)

and inspecting the element immediately after the cursor (or before, where
appropriate), and if ranges supported operations to return cursors from
reasonable
places (like from the front or back, or from a specific indexed elemnt for
RandomAccess ranges).
 The key idea is that these cursors aren't a primitive part of a range; instead,

they take a range and add a position *inside* the range. They're a perfect fit
for
all those "three-legged" operations out there because they actually have three
legs.
 Cheers,
 Pillsy

I'm starting to think that the whole concept of ranges is starting to get too
complicated.  We're already seeing it with this moveFront() stuff that's a
special
case to deal with structs that have expensive postblits.  We need a healthy dose
of "worse is better" to throw at ranges.  IMHO generic code should solve 90% of
the problem in a way that's robust, not terribly hard to implement and easy to
use
and understand.  Trying to solve 100% of the problem with generic code means
that
your solution will be absurdly complicated in both interface and implementation,
and therefore will be a buggy POS that nobody can figure out how to use.

Adding yet more complexity to ranges has the following negative consequences,
such
that I think it's justified to sacrifice completeness in favor of simplicity:

1.  Having yet more things to wrap makes writing correct higher order ranges
that
much more difficult and bug-prone.  The composability of ranges is what makes
them
such an elegant and useful abstraction in the first place.

2.  The combinatorial explosion of range types (isBidirectional, hasBefore,
hasLength, ...) makes testing a higher order range a nightmare.

3.  The more stuff we require someone to define to create a range (for example,
if
before() were simply required for bidirectional ranges), the higher the odds
that
we'll run into cases where one of these primitives can't be implemented
efficiently.

4.  I remember when ranges first came out, some people were pretty confused by
them.  Adding more stuff to solve the last 10% of cases will only add to this.
It's better to have ranges that can solve 90% of the problem for the most
competent 80% of programmers than ranges that can solve 100% of the problem, but
only for the 1% who are hardcore gurus.

Aug 26 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 8/26/10 9:18 PDT, dsimcha wrote:
 I'm starting to think that the whole concept of ranges is starting to get too
 complicated.  We're already seeing it with this moveFront() stuff that's a
special
 case to deal with structs that have expensive postblits.  We need a healthy
dose
 of "worse is better" to throw at ranges.  IMHO generic code should solve 90% of
 the problem in a way that's robust, not terribly hard to implement and easy to
use
 and understand.  Trying to solve 100% of the problem with generic code means
that
 your solution will be absurdly complicated in both interface and
implementation,
 and therefore will be a buggy POS that nobody can figure out how to use.

[...]
 It's better to have ranges that can solve 90% of the problem for the most
 competent 80% of programmers than ranges that can solve 100% of the problem,
but
 only for the 1% who are hardcore gurus.

I'm weary of this philosophy. I've seen a phenomenon happen time and 
again with myself and others: you hit a 10% case exactly when you (a) 
know what you want to do, (b) you know that the library could and should 
help you, and (c) you are working on a difficult enough problem to be 
wary of implementing it by hand.

Two examples from recent history: A poster used std.algorithm for the 
_first_ (sic!) time, and found a fundamental limitation: std.algorithm 
doesn't work with immutable arrays. Second, Peter Alexander wanted to 
implement nextPermutation (a nontrivial algorithm in which you need all 
the help you can get) as his _first_ (sic again!) artifact of a new 
library, and hit the bidirectional range composition limitation.

I don't think we want to aim std.algorithm and others for a 90% thing. 
That's what Visual Basic did. We should aim for making it a 100% thing 
without losing the other costs from sight (something that I believe we 
have a good track record on).

In light of that, it's worth looking at optional functionality of the 
kind I described (allBefore etc.) would add power without aggravating 
everyone.

Also, regarding moveFront and friends, I think it's a good time to start 
the following discussion: do we want to support types with arbitrarily 
expensive copying, or could we simply require reference counting and 
copy-on-write for such types? Let's start a new thread about this!


Andrei

Aug 26 2010

Jonathan M Davis <jmdavisprog gmail.com> writes:

On Thursday, August 26, 2010 10:51:02 Andrei Alexandrescu wrote:
 On 8/26/10 9:18 PDT, dsimcha wrote:
 I'm starting to think that the whole concept of ranges is starting to get
 too complicated.  We're already seeing it with this moveFront() stuff
 that's a special case to deal with structs that have expensive
 postblits.  We need a healthy dose of "worse is better" to throw at
 ranges.  IMHO generic code should solve 90% of the problem in a way
 that's robust, not terribly hard to implement and easy to use and
 understand.  Trying to solve 100% of the problem with generic code means
 that your solution will be absurdly complicated in both interface and
 implementation, and therefore will be a buggy POS that nobody can figure
 out how to use.

 
 [...]
 
 It's better to have ranges that can solve 90% of the problem for the most
 competent 80% of programmers than ranges that can solve 100% of the
 problem, but only for the 1% who are hardcore gurus.

 
 I'm weary of this philosophy. I've seen a phenomenon happen time and
 again with myself and others: you hit a 10% case exactly when you (a)
 know what you want to do, (b) you know that the library could and should
 help you, and (c) you are working on a difficult enough problem to be
 wary of implementing it by hand.

I think that what it comes down to is that you need to be smart about what 
functionality that you do and don't include. David has an excellent point that 
when APIs become too complicated, they don't get used. For instance, I know 
_very_ few people who even _consider_ using C++'s algorithm library. It's just 
too hard to use. It has too many quirks (like how remove() works) and the lack 
of lambdas functions makes it incredibly painful to do simple things with it. 
Any library with much complexity is going to run into such issues, but they
need 
to be minimized for a library to be properly useable. We can't afford to make 
ranges too complex. They need to be reasonably useable by your average 
programmer.

Any time that we need to add functionality to make them do more, we need to 
weigh the benefits and costs. If adding ability X makes it possible to do thing 
Y, but only a few people need to do thing Y, and adding feature X greatly 
increases the complexity, then it probably shouldn't be added. On the other 
hand, if adding feature X doesn't add much complexity and/or a lot of people 
need to do thing Y, then it would likely be a good thing to add feature X.

If we find something that a lot of people are trying to do and can't, then we 
need to really look at a reasonable way to make it possible. However, there are 
limits to what you can do without making Phobos' APIs unreasonably complex to 
understand and use. I do _not_ think that we should necessarily be hitting the 
100% case. The cost for that is likely too high. However, we probably should be 
going for more like the 98% or 99% case. (after all 90% indicates that 1 out of 
10 things that you try to do won't be feasible, and that's actually a lot, much 
as 90% sounds like a big number).

So, I do think that we should be making ranges (and the rest of Phobos) as 
powerful we reasonably can, but there _are_ limits to what they can reasonably 
do, and we should not necessarily try to make it so that they can do absolutely 
everything. And when we do make ranges more powerful, we need to strive to
limit 
the increase to complexity as much as possible. Ultimately, ranges really
should 
be powerful enough to do virtually everything that you would normally do with 
iterators (along with what they can do that iterators can't), but they should 
also be useable by the average programmer. Whatever we do needs to reasonably 
balance those goals. And I think that there is legitmate concern that ranges
are 
risking becoming too complex.

- Jonathan M Davis

Aug 26 2010

Pillsy <pillsbury gmail.com> writes:

Jonathan M Davis Wrote:

 On Thursday, August 26, 2010 10:51:02 Andrei Alexandrescu
 wrote:

[...]
 I'm weary of this philosophy. I've seen a phenomenon happen
 time and again with myself and others: you hit a 10% case 
 exactly when you (a) know what you want to do, (b) you 
 know that the library could and should help you, and (c) you
 are working on a difficult enough problem to be wary of
 implementing it by hand.


 I think that what it comes down to is that you need to be smart
 about what functionality that you do and don't include.

I really don't disagree. One of the reasons I'm suggesting that we
have a Tripod[1] concept to cover these three-legged operations is 
that I think that it's cohesive enough that fitting this functionality 
into a Range directly is unduly complicating. Having two concepts 
which each provide a reasonably orthogonal set of six basic 
operations is much less complicated than having one concept which
provides a dozen less orthogonal operations. 

[1] I actually think "Pivot" is the right name of the
range+iterator-in-the-middle abstraction.

 David has an excellent point that 
 when APIs become too complicated, they don't get used. For 
 instance, I know _very_ few people who even _consider_ using
 C++'s algorithm library. It's just too hard to use.

Huh. I use it with some frequency when I program C++, and I hardly
regard myself as a C++ wizard. Sure, it can be pretty damn awkward,
but welcome to C++.
[...]
 It has too many quirks (like how remove() works) and the lack 
 of lambdas functions makes it incredibly painful to do simple things
 with it. 

I think the lambda functions are a great example: they're a place
where C++98/03 doesn't provide *enough*, rather than providing 
too much. The library runs up against a fundamental limitation of 
the underlying language, rendering parts of it (like for_each) pretty 
useless.

There can be a severe cost to not providing enough.
[...]
 I do _not_ think that we should necessarily be hitting the 
 100% case. The cost for that is likely too high. However, we 
 probably should be going for more like the 98% or 99% case.
 (after all 90% indicates that 1 out of 10 things that you try to
 do won't be feasible, and that's actually a lot, much as 90% 
 sounds like a big number).

This is definitely agree with. Making something that legitimately 
does everything anyone could ever want is how you end up with
nightmares like Scheme's call-with-current-continuation, but letting
1 use case in 10 drop on the floor makes the cost/benefit ratio of
learning your library or language really dubious.
[...]
Cheers,
Pillsy

Aug 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 8/26/10 7:18 PDT, Pillsy wrote:
[snip]
       Range before ();

       Range after ();

(I wrote my last post before reading yours.) Excellent points. I do 
think we can get away without defining a new type. In addition to the 
allBefore(all, tail) primitive (which is easy to implement as a safe 
primitive by ranges that support it) we could also define allAfter(all, 
head) that works when head is superimposed over the beginning portion of 
all (is a prefix of it).

It would still be impossible to take a range somewhere in the middle of 
another and then get in constant time the portion before and after it, 
but (a) that is difficult to make safe in O(1) anyway, and (b) there is 
exceedingly rare need for it.

Andrei

Aug 26 2010

D Programming

C/C++ Programming

Other

digitalmars.D - Safe Cursors and Ranges