www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Safe Cursors and Ranges

reply Pillsy <pillsbury gmail.com> writes:
These thoughts were inspired by the recent thread, "Retrieving the 
traversed range". I am generally pretty sold on the range concept, 
but I think for some operations some sort of cursor is required. 
However, there are some unsafe cursor-related operations that are
best avoided. 

However, as far as I can see, the unsafe operations revolve around
synthesizing a new range out of two cursors. However, you can get 
a lot of the desired functionality associated with finding elements and
splitting ranges *without* allowing the synthesis of of ranges from
unrelated cursors. The key to doing this is to have each cursor be permanently
bound to the underlying range which it was derived from.

If you have this kind of cursor, you only really need it to support two
operations to get a huge amount of benefit:

     Range before ();

     Range after ();

The before () operation just returns a range of the appropriate type (Forward,
Bidirectional, RandomAccess) of all the elements that come before the cursor is
pointing to, while the after () operation returns a range of the appropriate
type with all the elements after the cursor. Given this concept, a cursor
really points at the boundary between elements of a range, if you will.

If find and related operations return this kind of cursor, all of the
operations involving splitting ranges become trivial. It's completely
consistent for either before () or after () to be empty, and since you can't
glue two of these cursors together, you are always guaranteed to get
well-defined ranges out. 

While I think the before () and after () operations are the only ones 
which are strictly necessary for the desiderata I've seen so far, people 
could write their own generic algorithms if cursors supported more of 
the iterator operations, for moving forward (and backward, where appropriate)
and inspecting the element immediately after the cursor (or before, where
appropriate), and if ranges supported operations to return cursors from
reasonable places (like from the front or back, or from a specific indexed
elemnt for RandomAccess ranges).

The key idea is that these cursors aren't a primitive part of a range; instead,
they take a range and add a position *inside* the range. They're a perfect fit
for all those "three-legged" operations out there because they actually have
three legs.

Cheers,
Pillsy
Aug 26 2010
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury gmail.com> wrote:

 These thoughts were inspired by the recent thread, "Retrieving the
 traversed range". I am generally pretty sold on the range concept,
 but I think for some operations some sort of cursor is required.
 However, there are some unsafe cursor-related operations that are
 best avoided.

 However, as far as I can see, the unsafe operations revolve around
 synthesizing a new range out of two cursors. However, you can get
 a lot of the desired functionality associated with finding elements and
 splitting ranges *without* allowing the synthesis of of ranges from
 unrelated cursors. The key to doing this is to have each cursor be  
 permanently bound to the underlying range which it was derived from.

 If you have this kind of cursor, you only really need it to support two  
 operations to get a huge amount of benefit:

      Range before ();

      Range after ();

 The before () operation just returns a range of the appropriate type  
 (Forward, Bidirectional, RandomAccess) of all the elements that come  
 before the cursor is pointing to, while the after () operation returns a  
 range of the appropriate type with all the elements after the cursor.  
 Given this concept, a cursor really points at the boundary between  
 elements of a range, if you will.

 If find and related operations return this kind of cursor, all of the  
 operations involving splitting ranges become trivial. It's completely  
 consistent for either before () or after () to be empty, and since you  
 can't glue two of these cursors together, you are always guaranteed to  
 get well-defined ranges out.

 While I think the before () and after () operations are the only ones
 which are strictly necessary for the desiderata I've seen so far, people
 could write their own generic algorithms if cursors supported more of
 the iterator operations, for moving forward (and backward, where  
 appropriate) and inspecting the element immediately after the cursor (or  
 before, where appropriate), and if ranges supported operations to return  
 cursors from reasonable places (like from the front or back, or from a  
 specific indexed elemnt for RandomAccess ranges).

 The key idea is that these cursors aren't a primitive part of a range;  
 instead, they take a range and add a position *inside* the range.  
 They're a perfect fit for all those "three-legged" operations out there  
 because they actually have three legs.

That's exactly how cursors perform in dcollections. An excerpt from dcollections' concepts document: Cursors: Cursors are special 1 or 0 element ranges. They implement the forward range interface. The benefit to using cursors is that they always refer to exactly one element. A normal range uses two marker elements, a begin and an end element. Therefore, cursors are less susceptible to invalidation on removal of elements. Cursors support these additional features: - Use a cursor as a reference point when dealing with a container. - Use as an end point when slicing a container. Note that some containers only support slicing with an arbitrary cursor and the beginning or end of the container. Slicing is guaranteed to be a fast operation (lgn or better). Determining cursor/range ownership: All collections can identify whether a cursor/range belongs to them. Each collection class has a 'belongs' method to determine ownership. The ownership check is guaranteed to be a fast operation (O(lgN) or better). ------------------- The 'three-legged' cursor as you propose was an early idea of mine too. Because dcollections containers are classes however, there is an easy point of ownership to establish -- the class reference. So a dcollections cursor doesn't have to refer to its owner, but it can in order to fulfill the requirement. My view is that a cursor should be separate from a range, and a range should either be able to tell if a cursor is part of it ( safe) or be told that a cursor is a part of it ( system or trusted). Inside a three-legged algorithm which accepts a range as input, and generates a cursor from the range for the third leg, it can force the range to assume the cursor is a part of it, knowing that it's true. You can make the second operation safe by wrapping a cursor + range in a struct that generates the cursor from the range (and therefore knows the cursor is a part of the range). For two examples of ranges that can tell if a cursor is a part of it, an array can tell if a cursor is a part of it without any extra mechanisms because it is an O(1) check for an interior pointer. A balanced tree can also perform an O(lgn) check that a cursor is part of it (though it's questionable whether a tree is a range). -Steve
Aug 26 2010
parent reply Pillsy <pillsbury gmail.com> writes:
Steven Schveighoffer Wrote:

 On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury gmail.com> 
 wrote:

 The key idea is that these cursors aren't a primitive part of a
 range; instead, they take a range and add a position *inside* 
 the range. They're a perfect fit for all those "three-legged" 
 operations out there because they actually have three legs.


 The 'three-legged' cursor as you propose was an early idea of
 mine too. 

If you don't mind me asking, what made you give up on it?
 My view is that a cursor should be separate from a range, and a
 range should either be able to tell if a cursor is part of it ( safe)
 or be told  that a cursor is a part of it ( system or  trusted).  

What's the advantage of not holding on to the range as part of the cursor? I'm more focused on the typical value-type ranges, but I'm not seeing a lot of need for cursors that have lives independent of underlying ranges. Is the concern just one of more frequent invalidation? IME, three-legged operations are fairly common, but four- and more- legged operations which can't be easily decomposed into consecutive three-legged operations are very uncommon. What I'm thinking is that cursors aren't skinny ranges, they're ranges that are even fatter. You simply wouldn't use one where a range would suffice. Allowing questions about whether a range contains a cursor or not strikes me as just buying trouble. Thanks for your thoughts, Pillsy
Aug 26 2010
parent Pillsy <pillsbury gmail.com> writes:
Steven Schveighoffer Wrote:

 On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury gmail.com> 
 wrote:

 If you don't mind me asking, what made you give up on it?


 A cursor that refers to one element is much more efficient to pass  
 around/copy.  

Ah, yes. If you just need the one element, I figure that you can use a range. In most of the cases I'm familiar with, ranges are not distinctly less lightweight than array slices, and in D one tends to blithely pass slices all over the place. Things could be different for iterating over trees---is that where you ran into trouble with the idea in dcollections?
 Plus the extra baggage was not needed in dcollections -- any  
 operations you perform on a cursor besides changing the value 
 require you  to have the container also (i.e. a cursor is mainly a 
 parameter to the  container).  I don't know how this relates to 
 three-legged algorithms and  straight ranges+cursors, dcollections 
 ranges+cursors would need some extra binding support to use 
 them, since you need the container to create a  
 range based on two cursors.

 What's the advantage of not holding on to the range as part of
 the cursor? I'm more focused on the typical value-type ranges, 
 but I'm not seeing a lot of need for cursors that have lives 
 independent of underlying ranges. Is the concern just one of 
 more frequent invalidation?


 Because some cursors do not need the hard reference, it is trivial 
 to tell whether such cursors belong to a range.  For instance, it's
 trivial to tell that a pointer is part of an array.  So a cursor for an
 array would not need to refer the range it belongs to.

Ah, I see. That would make things harder to use for the specific use case I'm thinking of, where one has a range one would like to split at a certain element (which you might have found any number of ways). Basically, you can just pass around the cursor, instead of needing the cursor and the range. This strikes me as likely to be very helpful for three-legged races^W cases.
 For instance, if I have an array, and a cursor in that array, it's trivial  
 for an allBefore operation to determine that the cursor is a member
 of the array. Now, let's say we have two slices that contain the 
 same cursor, in  your scheme, in order to use one cursor against 
 the other range, you need  to recompose the cursor, which seems
 unnecessarily awkward. 

The way I see it, two different slices can't contain the same cursor; a cursor into a slice will always be associated with that slice and no other. Maybe there are important use cases I'm missing by defining things this way, but I don't know what they might be. [...]
 For example, if I want to store a link list element cursor so I can 
 change it later, why do I also have to carry around the baggage of 
 the range it came from?

You may not, but why not just hang on to the range it came from in the first place instead of using the cursor? What's in the link-list range besides a pointer to a node anyway? [...]
 What I'm thinking is that cursors aren't skinny ranges, they're 
 ranges that are even fatter. You simply wouldn't use one
 where a range would suffice. Allowing questions about whether
 a range contains a cursor or not strikes me as just buying 
 trouble.


 But you don't always care if a cursor is a part of a range.  For 
 example, to change or refer to a specific element of a range, 
 why do you need to refer to the range also? 

You don't, but what do you lose by referring to the range also? A few words on the stack (sticking with the idea that ranges are value types)? [...]
 I don't have a problem with defining such a three-legged type,
 but can we call it something besides a cursor? 

Sure. How about Tripod? 8^) I was thinking of calling them either Splitters, actually, 'cause that's what they're there for: splitting ranges. Cheers, Pillsy
Aug 26 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury gmail.com>
 wrote:

 The key idea is that these cursors aren't a primitive part of a
 range; instead, they take a range and add a position *inside*
 the range. They're a perfect fit for all those "three-legged"
 operations out there because they actually have three legs.


 The 'three-legged' cursor as you propose was an early idea of
 mine too.

If you don't mind me asking, what made you give up on it?

A cursor that refers to one element is much more efficient to pass around/copy. Plus the extra baggage was not needed in dcollections -- any operations you perform on a cursor besides changing the value require you to have the container also (i.e. a cursor is mainly a parameter to the container). I don't know how this relates to three-legged algorithms and straight ranges+cursors, dcollections ranges+cursors would need some extra binding support to use them, since you need the container to create a range based on two cursors.
 My view is that a cursor should be separate from a range, and a
 range should either be able to tell if a cursor is part of it ( safe)
 or be told  that a cursor is a part of it ( system or  trusted).

What's the advantage of not holding on to the range as part of the cursor? I'm more focused on the typical value-type ranges, but I'm not seeing a lot of need for cursors that have lives independent of underlying ranges. Is the concern just one of more frequent invalidation?

Because some cursors do not need the hard reference, it is trivial to tell whether such cursors belong to a range. For instance, it's trivial to tell that a pointer is part of an array. So a cursor for an array would not need to refer the range it belongs to. For instance, if I have an array, and a cursor in that array, it's trivial for an allBefore operation to determine that the cursor is a member of the array. Now, let's say we have two slices that contain the same cursor, in your scheme, in order to use one cursor against the other range, you need to recompose the cursor, which seems unnecessarily awkward. Such a scheme is not possible on other ranges, but that just means those ranges don't support safe calls for allBefore and the like. For those ranges, a 3-legged type would be appropriate, but the 3-legged thing could be a combination of a cursor and a range, with the cursor proven to be part of the range -- an invariant of the 3-legged type. Having a cursor point to exactly one element allows more flexibility, because you could use a cursor with multiple ranges. It also decouples the cursor from a range that might be superfluous information. For example, if I want to store a link list element cursor so I can change it later, why do I also have to carry around the baggage of the range it came from? One of the drawbacks is the cursor cannot move to other elements or alter topology, it's strictly a reference-only. Another reason to have the 3-legged type.
 IME, three-legged operations are fairly common, but four- and more-
 legged operations which can't be easily decomposed into consecutive
 three-legged operations are very uncommon.

I have no experience with three-legged operations. I just looked at cursors from the point of view of reference to a single element. Such an ability is important for a container library. I can see such an ability being important for simple operations as well, not just algorithms.
 What I'm thinking is that cursors aren't skinny ranges, they're ranges
 that are even fatter. You simply wouldn't use one where a range
 would suffice. Allowing questions about whether a range contains a
 cursor or not strikes me as just buying trouble.

But you don't always care if a cursor is a part of a range. For example, to change or refer to a specific element of a range, why do you need to refer to the range also? The composition of a range and cursor together can be guarded by system or trusted indicating the type system is taking your word for it, but in some cases, it can be proven without assumption (i.e. arrays). I don't have a problem with defining such a three-legged type, but can we call it something besides a cursor? Not that I have a trademark or anything, but it would be really confusing to have it mean two different things depending on context :) As I said, I can see utility in having a three-legged beast that combines a cursor and a range, but I see value in having a simple one-element cursor as well. -Steve
Aug 26 2010
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Pillsy (pillsbury gmail.com)'s article
 These thoughts were inspired by the recent thread, "Retrieving the
 traversed range". I am generally pretty sold on the range concept,
 but I think for some operations some sort of cursor is required.
 However, there are some unsafe cursor-related operations that are
 best avoided.
 However, as far as I can see, the unsafe operations revolve around
 synthesizing a new range out of two cursors. However, you can get
 a lot of the desired functionality associated with finding elements and
 splitting ranges *without* allowing the synthesis of of ranges from
 unrelated cursors. The key to doing this is to have each cursor be permanently

 If you have this kind of cursor, you only really need it to support two

      Range before ();
      Range after ();
 The before () operation just returns a range of the appropriate type (Forward,

pointing to, while the after () operation returns a range of the appropriate type with all the elements after the cursor. Given this concept, a cursor really points at the boundary between elements of a range, if you will.
 If find and related operations return this kind of cursor, all of the
operations

before () or after () to be empty, and since you can't glue two of these cursors together, you are always guaranteed to get well-defined ranges out.
 While I think the before () and after () operations are the only ones
 which are strictly necessary for the desiderata I've seen so far, people
 could write their own generic algorithms if cursors supported more of
 the iterator operations, for moving forward (and backward, where appropriate)

appropriate), and if ranges supported operations to return cursors from reasonable places (like from the front or back, or from a specific indexed elemnt for RandomAccess ranges).
 The key idea is that these cursors aren't a primitive part of a range; instead,

all those "three-legged" operations out there because they actually have three legs.
 Cheers,
 Pillsy

I'm starting to think that the whole concept of ranges is starting to get too complicated. We're already seeing it with this moveFront() stuff that's a special case to deal with structs that have expensive postblits. We need a healthy dose of "worse is better" to throw at ranges. IMHO generic code should solve 90% of the problem in a way that's robust, not terribly hard to implement and easy to use and understand. Trying to solve 100% of the problem with generic code means that your solution will be absurdly complicated in both interface and implementation, and therefore will be a buggy POS that nobody can figure out how to use. Adding yet more complexity to ranges has the following negative consequences, such that I think it's justified to sacrifice completeness in favor of simplicity: 1. Having yet more things to wrap makes writing correct higher order ranges that much more difficult and bug-prone. The composability of ranges is what makes them such an elegant and useful abstraction in the first place. 2. The combinatorial explosion of range types (isBidirectional, hasBefore, hasLength, ...) makes testing a higher order range a nightmare. 3. The more stuff we require someone to define to create a range (for example, if before() were simply required for bidirectional ranges), the higher the odds that we'll run into cases where one of these primitives can't be implemented efficiently. 4. I remember when ranges first came out, some people were pretty confused by them. Adding more stuff to solve the last 10% of cases will only add to this. It's better to have ranges that can solve 90% of the problem for the most competent 80% of programmers than ranges that can solve 100% of the problem, but only for the 1% who are hardcore gurus.
Aug 26 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/26/10 9:18 PDT, dsimcha wrote:
 I'm starting to think that the whole concept of ranges is starting to get too
 complicated.  We're already seeing it with this moveFront() stuff that's a
special
 case to deal with structs that have expensive postblits.  We need a healthy
dose
 of "worse is better" to throw at ranges.  IMHO generic code should solve 90% of
 the problem in a way that's robust, not terribly hard to implement and easy to
use
 and understand.  Trying to solve 100% of the problem with generic code means
that
 your solution will be absurdly complicated in both interface and
implementation,
 and therefore will be a buggy POS that nobody can figure out how to use.

 It's better to have ranges that can solve 90% of the problem for the most
 competent 80% of programmers than ranges that can solve 100% of the problem,
but
 only for the 1% who are hardcore gurus.

I'm weary of this philosophy. I've seen a phenomenon happen time and again with myself and others: you hit a 10% case exactly when you (a) know what you want to do, (b) you know that the library could and should help you, and (c) you are working on a difficult enough problem to be wary of implementing it by hand. Two examples from recent history: A poster used std.algorithm for the _first_ (sic!) time, and found a fundamental limitation: std.algorithm doesn't work with immutable arrays. Second, Peter Alexander wanted to implement nextPermutation (a nontrivial algorithm in which you need all the help you can get) as his _first_ (sic again!) artifact of a new library, and hit the bidirectional range composition limitation. I don't think we want to aim std.algorithm and others for a 90% thing. That's what Visual Basic did. We should aim for making it a 100% thing without losing the other costs from sight (something that I believe we have a good track record on). In light of that, it's worth looking at optional functionality of the kind I described (allBefore etc.) would add power without aggravating everyone. Also, regarding moveFront and friends, I think it's a good time to start the following discussion: do we want to support types with arbitrarily expensive copying, or could we simply require reference counting and copy-on-write for such types? Let's start a new thread about this! Andrei
Aug 26 2010
parent Pillsy <pillsbury gmail.com> writes:
Jonathan M Davis Wrote:

 On Thursday, August 26, 2010 10:51:02 Andrei Alexandrescu
 wrote:

 I'm weary of this philosophy. I've seen a phenomenon happen
 time and again with myself and others: you hit a 10% case 
 exactly when you (a) know what you want to do, (b) you 
 know that the library could and should help you, and (c) you
 are working on a difficult enough problem to be wary of
 implementing it by hand.


 I think that what it comes down to is that you need to be smart
 about what functionality that you do and don't include.

I really don't disagree. One of the reasons I'm suggesting that we have a Tripod[1] concept to cover these three-legged operations is that I think that it's cohesive enough that fitting this functionality into a Range directly is unduly complicating. Having two concepts which each provide a reasonably orthogonal set of six basic operations is much less complicated than having one concept which provides a dozen less orthogonal operations. [1] I actually think "Pivot" is the right name of the range+iterator-in-the-middle abstraction.
 David has an excellent point that 
 when APIs become too complicated, they don't get used. For 
 instance, I know _very_ few people who even _consider_ using
 C++'s algorithm library. It's just too hard to use.

Huh. I use it with some frequency when I program C++, and I hardly regard myself as a C++ wizard. Sure, it can be pretty damn awkward, but welcome to C++. [...]
 It has too many quirks (like how remove() works) and the lack 
 of lambdas functions makes it incredibly painful to do simple things
 with it. 

I think the lambda functions are a great example: they're a place where C++98/03 doesn't provide *enough*, rather than providing too much. The library runs up against a fundamental limitation of the underlying language, rendering parts of it (like for_each) pretty useless. There can be a severe cost to not providing enough. [...]
 I do _not_ think that we should necessarily be hitting the 
 100% case. The cost for that is likely too high. However, we 
 probably should be going for more like the 98% or 99% case.
 (after all 90% indicates that 1 out of 10 things that you try to
 do won't be feasible, and that's actually a lot, much as 90% 
 sounds like a big number).

This is definitely agree with. Making something that legitimately does everything anyone could ever want is how you end up with nightmares like Scheme's call-with-current-continuation, but letting 1 use case in 10 drop on the floor makes the cost/benefit ratio of learning your library or language really dubious. [...] Cheers, Pillsy
Aug 27 2010
prev sibling parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Thursday, August 26, 2010 10:51:02 Andrei Alexandrescu wrote:
 On 8/26/10 9:18 PDT, dsimcha wrote:
 I'm starting to think that the whole concept of ranges is starting to get
 too complicated.  We're already seeing it with this moveFront() stuff
 that's a special case to deal with structs that have expensive
 postblits.  We need a healthy dose of "worse is better" to throw at
 ranges.  IMHO generic code should solve 90% of the problem in a way
 that's robust, not terribly hard to implement and easy to use and
 understand.  Trying to solve 100% of the problem with generic code means
 that your solution will be absurdly complicated in both interface and
 implementation, and therefore will be a buggy POS that nobody can figure
 out how to use.

[...]
 It's better to have ranges that can solve 90% of the problem for the most
 competent 80% of programmers than ranges that can solve 100% of the
 problem, but only for the 1% who are hardcore gurus.

I'm weary of this philosophy. I've seen a phenomenon happen time and again with myself and others: you hit a 10% case exactly when you (a) know what you want to do, (b) you know that the library could and should help you, and (c) you are working on a difficult enough problem to be wary of implementing it by hand.

I think that what it comes down to is that you need to be smart about what functionality that you do and don't include. David has an excellent point that when APIs become too complicated, they don't get used. For instance, I know _very_ few people who even _consider_ using C++'s algorithm library. It's just too hard to use. It has too many quirks (like how remove() works) and the lack of lambdas functions makes it incredibly painful to do simple things with it. Any library with much complexity is going to run into such issues, but they need to be minimized for a library to be properly useable. We can't afford to make ranges too complex. They need to be reasonably useable by your average programmer. Any time that we need to add functionality to make them do more, we need to weigh the benefits and costs. If adding ability X makes it possible to do thing Y, but only a few people need to do thing Y, and adding feature X greatly increases the complexity, then it probably shouldn't be added. On the other hand, if adding feature X doesn't add much complexity and/or a lot of people need to do thing Y, then it would likely be a good thing to add feature X. If we find something that a lot of people are trying to do and can't, then we need to really look at a reasonable way to make it possible. However, there are limits to what you can do without making Phobos' APIs unreasonably complex to understand and use. I do _not_ think that we should necessarily be hitting the 100% case. The cost for that is likely too high. However, we probably should be going for more like the 98% or 99% case. (after all 90% indicates that 1 out of 10 things that you try to do won't be feasible, and that's actually a lot, much as 90% sounds like a big number). So, I do think that we should be making ranges (and the rest of Phobos) as powerful we reasonably can, but there _are_ limits to what they can reasonably do, and we should not necessarily try to make it so that they can do absolutely everything. And when we do make ranges more powerful, we need to strive to limit the increase to complexity as much as possible. Ultimately, ranges really should be powerful enough to do virtually everything that you would normally do with iterators (along with what they can do that iterators can't), but they should also be useable by the average programmer. Whatever we do needs to reasonably balance those goals. And I think that there is legitmate concern that ranges are risking becoming too complex. - Jonathan M Davis
Aug 26 2010
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/26/10 7:18 PDT, Pillsy wrote:
[snip]
       Range before ();

       Range after ();

(I wrote my last post before reading yours.) Excellent points. I do think we can get away without defining a new type. In addition to the allBefore(all, tail) primitive (which is easy to implement as a safe primitive by ranges that support it) we could also define allAfter(all, head) that works when head is superimposed over the beginning portion of all (is a prefix of it). It would still be impossible to take a range somewhere in the middle of another and then get in constant time the portion before and after it, but (a) that is difficult to make safe in O(1) anyway, and (b) there is exceedingly rare need for it. Andrei
Aug 26 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 26 Aug 2010 13:30:35 -0400, Pillsy <pillsbury gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury gmail.com>
 wrote:

 If you don't mind me asking, what made you give up on it?


 A cursor that refers to one element is much more efficient to pass
 around/copy.

Ah, yes. If you just need the one element, I figure that you can use a range. In most of the cases I'm familiar with, ranges are not distinctly less lightweight than array slices, and in D one tends to blithely pass slices all over the place. Things could be different for iterating over trees---is that where you ran into trouble with the idea in dcollections?

I think there is a misunderstanding of how I went about defining dcollections cursors. I never actually tried the tripod idea, I just posted it to the newsgroup as a possible solution for dcollections cursors. Originally in dcollections 1, cursors were iterators -- allowing movement of the cursor as well as dereferencing. This was way way back, before ranges were a prevalent concept and I was trying to write a better collection library for tango. When negotiating with Andrei on allowing me to have cursors (when I hoped dcollections could be a part of phobos), I went through several different ideas to try and make cursors safe. I ended up trying to define a simple reference type that didn't allow iterating. That is, a reference to a single item that cannot move, thereby removing the unsafe nature of iterators. I ran into the problem that the "end" cursor was pointing to an invalid element, so you still had to rely on the coder not referencing it, just like in C++. I decided I'd add a bool to the cursor to let it know it was invalid, and the idea hit me all of a sudden -- that's just a 0 or 1-element range depending on the bool. So the current incarnation was born. And I have to say, it's quite usable. Once I defined it that way, the code almost wrote itself. The issue with using ranges to refer to single elements is, it's difficult to keep a node-based range intact, for later use. For example, if you want to refer to a single element in a tree, and you use a range to do that (assuming it's a range with a start and end node), things can happen. More elements could be inserted between them, the end marker of your range could get removed, etc. Then when you pass the range to the tree's remove function, you're giving it an invalid range, or removing more elements than you wanted. Cursors are not susceptible to these problems, since they only ever refer to one element. So there *is* value in having a special cursor type that only refers to one element vs. a range, regardless of the copy performance.
 Because some cursors do not need the hard reference, it is trivial
 to tell whether such cursors belong to a range.  For instance, it's
 trivial to tell that a pointer is part of an array.  So a cursor for an
 array would not need to refer the range it belongs to.

Ah, I see. That would make things harder to use for the specific use case I'm thinking of, where one has a range one would like to split at a certain element (which you might have found any number of ways). Basically, you can just pass around the cursor, instead of needing the cursor and the range. This strikes me as likely to be very helpful for three-legged races^W cases.

Sure, but you can build such a type and its corresponding splitter function (which actually could be a member of the tripod) in terms of the low-level function which are more powerful because they accept two different entities -- the range and the cursor.
 For instance, if I have an array, and a cursor in that array, it's  
 trivial
 for an allBefore operation to determine that the cursor is a member
 of the array. Now, let's say we have two slices that contain the
 same cursor, in  your scheme, in order to use one cursor against
 the other range, you need  to recompose the cursor, which seems
 unnecessarily awkward.

The way I see it, two different slices can't contain the same cursor; a cursor into a slice will always be associated with that slice and no other. Maybe there are important use cases I'm missing by defining things this way, but I don't know what they might be. [...]

Not sure what the use cases would be, but defining the simplest thing possible and then building more complex concepts on top of them usually allows more possibilities.
 For example, if I want to store a link list element cursor so I can
 change it later, why do I also have to carry around the baggage of
 the range it came from?

You may not, but why not just hang on to the range it came from in the first place instead of using the cursor? What's in the link-list range besides a pointer to a node anyway?

The pointer to the possibly invalid end node. See my above description.
 [...]
 What I'm thinking is that cursors aren't skinny ranges, they're
 ranges that are even fatter. You simply wouldn't use one
 where a range would suffice. Allowing questions about whether
 a range contains a cursor or not strikes me as just buying
 trouble.


 But you don't always care if a cursor is a part of a range.  For
 example, to change or refer to a specific element of a range,
 why do you need to refer to the range also?

You don't, but what do you lose by referring to the range also? A few words on the stack (sticking with the idea that ranges are value types)?

It becomes awkward when you are passing a range around but are saying "well, really, I'm only passing you a reference to the first element, ignore the rest of it". You end up naming things like removeFirstElement(range r) instead of remove(cursor c). This is from my experience with defining dcollections and a std.container redblacktree.
 [...]
 I don't have a problem with defining such a three-legged type,
 but can we call it something besides a cursor?

Sure. How about Tripod? 8^)

I like it :) -Steve
Aug 26 2010