www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Iterators Must Go

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
The slides from my keynote at BoostCon 2009 (www.boostcon.com) are now 
available from:

http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/08/iterators-must-go.pdf

The talk went so well, the urge to brag is too mighty to resist. I mean 
it's not everyday that several people come and tell they've literally 
lost sleep and focus on other talks because of thinking about ranges. 
Also, there was a definite feel of "before" and "after". In short, the 
talk has generated a great deal of interest in both D proper and in the 
Boost community rewriting the STL entirely in terms of ranges.


Andrei
May 07 2009
next sibling parent Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 The slides from my keynote at BoostCon 2009 (www.boostcon.com) are now 
 available from:
 
 http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/08/i
erators-must-go.pdf 
 
 
 The talk went so well, the urge to brag is too mighty to resist. I mean 
 it's not everyday that several people come and tell they've literally 
 lost sleep and focus on other talks because of thinking about ranges. 
 Also, there was a definite feel of "before" and "after". In short, the 
 talk has generated a great deal of interest in both D proper and in the 
 Boost community rewriting the STL entirely in terms of ranges.

Have you considered an article in the paper version of DrDobb's? And maybe some other publications, too. D could use this PR. (Maybe even CUJ.)
May 07 2009
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
http://www.reddit.com/r/programming/comments/8isiw/author_of_modern_c_design_stl_iterators_must_die/
May 07 2009
parent Georg Wrede <georg.wrede iki.fi> writes:
Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/8isiw/author_of_modern_c_design_stl
iterators_must_die/ 

The link reads like Andrei wants to mureder Stepanov..... Good thing I read the slides. :-)
May 08 2009
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Great paper!  The only quibble I have so far is that I think iterators 
can support sentinel-terminated containers, it just isn't terribly 
natural to do so.
May 08 2009
parent Sean Kelly <sean invisibleduck.org> writes:
Sean Kelly wrote:
 Great paper!  The only quibble I have so far is that I think iterators 
 can support sentinel-terminated containers, it just isn't terribly 
 natural to do so.

The things I particularly like: find_end() -- Reverse iterators are a pain in the ass. The simplicity of this is just fantastic. Chain, Zip, Stride, Radial -- 100% pure awesome. partial_sort() -- The definition of this is just ingenious. It derives naturally from the range design and yet I'd never have thought of it otherwise. Regarding iostream ranges, I'd have liked if there was a bit more about the difference in interface: put vs. push/pop. One "benefit" of stream iterators is that they use the same syntax as other iterators, so it would be good to hear why the change in syntax for ranges isn't actually a design flaw. Overall, the presentation makes a very compelling case for ranges. While it's /possible/ to do quite a lot with iterators, the truly interesting stuff is so difficult/messy that it isn't worth doing. Ranges are elegant and safe for even fancy things. Nice work.
May 08 2009
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 May 2009 23:01:20 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 The slides from my keynote at BoostCon 2009 (www.boostcon.com) are now  
 available from:

 http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/08/iterators-must-go.pdf

 The talk went so well, the urge to brag is too mighty to resist. I mean  
 it's not everyday that several people come and tell they've literally  
 lost sleep and focus on other talks because of thinking about ranges.  
 Also, there was a definite feel of "before" and "after". In short, the  
 talk has generated a great deal of interest in both D proper and in the  
 Boost community rewriting the STL entirely in terms of ranges.

A good paper. You still have not addressed the usage of iterators as general data structure pointers. As far as I can tell, ranges do not implement this. i.e. find surrounding elements of an element. With iterators: auto iter = container.find(elem); auto elembefore = iter - 1; auto elemafter = iter + 1; Assuming incrementing and decrementing an iterator is checked for out-of-bounds. -Steve
May 08 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 You still have not addressed the usage of iterators as general data 
 structure pointers.  As far as I can tell, ranges do not implement this.
 
 i.e. find surrounding elements of an element.
 
 With iterators:
 
 auto iter = container.find(elem);
 auto elembefore = iter - 1;
 auto elemafter = iter + 1;
 
 Assuming incrementing and decrementing an iterator is checked for 
 out-of-bounds.

The problem is that last statement - "Assuming". If the iterator is the first or the last, or if there's only 1 or 2 elements in the container, it's crash city. Iterators are *inherently* uncheckable. For finding the elemafter, it's trivial as find() returns a range from the found element to the end (and it's also trivially checkable!). For elembefore, there's a bit more work involved, probably defining a find() that returns a range backed up by one one.
May 08 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 You're assuming an iterator does not know its bounds.

That's right. That's the usual design, which is based on the pointer model. Pointers do not know their limits.
 Maybe I should 
 call it something other than iterator.  How about cursor?

Or range? <g>
 There are definite reasons to use containers in ways that don't involve 
 std.algorithm, where something that has the easy ability to move back 
 and forth N times without weird subrange operations.
 
 I'm thinking of a structure with either a pointer to the container for 
 bounds checking, or a range and pointer combined (where the invariant is 
 that the pointer is always within the range).
 
 I'm not saying ranges are not great, i think they are a HUGE step 
 forward, but the statement "Iterators must be eliminated" may be too 
 harsh.  Perhaps the unchecked iterator, yes (but you may want to allow 
 it in certain performance-critical code).

If you had an iterator that knew its beginning and end, then the whole paradigm of: for (iterator i = begin; i != end; i++) doesn't make much sense because of the redundancy.
May 08 2009
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Walter Bright wrote:
 
 If you had an iterator that knew its beginning and end, then the whole 
 paradigm of:
 
    for (iterator i = begin; i != end; i++)
 
 doesn't make much sense because of the redundancy.

Yup. And most of the "interesting" iterators fall into this category--the one returned from begin or rbegin is the real iterator and the one returned from end or rend is just used to tell the real cursor to check whether it's at the end or not. This is why I commented on Andrei's statement that it's impossible to make an iterator for a delimited range. It's possible, the design is just unnatural.
May 09 2009
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2009-05-09 18:06:48 +0200, Sean Kelly <sean invisibleduck.org> said:

 Walter Bright wrote:
 
 If you had an iterator that knew its beginning and end, then the whole 
 paradigm of:
 
    for (iterator i = begin; i != end; i++)
 
 doesn't make much sense because of the redundancy.

Yup. And most of the "interesting" iterators fall into this category--the one returned from begin or rbegin is the real iterator and the one returned from end or rend is just used to tell the real cursor to check whether it's at the end or not. This is why I commented on Andrei's statement that it's impossible to make an iterator for a delimited range. It's possible, the design is just unnatural.

Indeed I have always argued that the unbundling of the iterator end done by C++ was a bad idea. All other languages that have iterators or generators did not make this mistake, and luckily it seems that D2 did not either. I had to argue but finally ranges have ForwardRange and the possibility to check the head with another Range I would call that (which is in my opinion the most general and useful) simply Iterator because that is what it is, and iterator which knows its end (as generators in python, iterators in Aldor, and almost all languages that want to be reasonably safe). Anyway as long as it works and for_each works on them I don't care much about the name used... Fawzi
May 09 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
Although I like ranges, it looks to me like there are a couple of
operations that would difficult to implement without iterators or some
other way to specify a specific position in a range.

Finding and erasing an element:
  list.erase(find(list.begin(), list.end(), e));

Splitting a container on a position:
  iter = find(list.begin(), list.end(), e);
  do_something_with(list.begin(), iter);
  do_something_else_with(iter, list.end());

Inserting into a container at a position:
  iter = find(list.begin(), list.end(), e);
  list.insert(iter, array.begin(), array.end());

Constructing a range from two independent position:
  iter1 = find(list.begin(), list.end(), e1);
  iter2 = rfind(list.begin(), list.end(), e2);
  do_something_with(iter1, iter2);


-- 
Rainer Deyke - rainerd eldwood.com
May 09 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Rainer Deyke (rainerd eldwood.com)'s article
 Although I like ranges, it looks to me like there are a couple of
 operations that would difficult to implement without iterators or some
 other way to specify a specific position in a range.
 Finding and erasing an element:
   list.erase(find(list.begin(), list.end(), e));

What's wrong with this? Since list owns its range representation, it can know the implementation details. For a linked list, this will probably be just a pair of pointers under the hood anyhow. In other words, it's internally still an iterator, just prettier looking.
 Splitting a container on a position:
   iter = find(list.begin(), list.end(), e);
   do_something_with(list.begin(), iter);
   do_something_else_with(iter, list.end());

This one is legit, as far as I can tell. On the other hand, although it's awkward, you could do something like: Range myRange1; auto myRange2 = find(myRange1, e); struct PairOfRanges { Range myRange1, myRange2; auto front() { return myRange1.front; } bool empty() { return myRange1 == myRange2; } void popFront() { myRange1.popFront; } }
 Inserting into a container at a position:
   iter = find(list.begin(), list.end(), e);
   list.insert(iter, array.begin(), array.end());

Same as erasing.
 Constructing a range from two independent position:
   iter1 = find(list.begin(), list.end(), e1);
   iter2 = rfind(list.begin(), list.end(), e2);
   do_something_with(iter1, iter2);

Assuming find() works by popping elements off the front of the range until it finds what it's looking for, and then returning that, and rfind() does the same thing but from the back, just do something like: Range myRange = find(list, e1); myRange = rfind(myRange, e2); do_something_with(myRange);
May 09 2009
parent reply Rainer Deyke <rainerd eldwood.com> writes:
dsimcha wrote:
 == Quote from Rainer Deyke (rainerd eldwood.com)'s article
 Although I like ranges, it looks to me like there are a couple of
 operations that would difficult to implement without iterators or some
 other way to specify a specific position in a range.
 Finding and erasing an element:
   list.erase(find(list.begin(), list.end(), e));

What's wrong with this? Since list owns its range representation, it can know the implementation details. For a linked list, this will probably be just a pair of pointers under the hood anyhow. In other words, it's internally still an iterator, just prettier looking.

The following is semantically incorrect: r = find(list, e) list.erase(r) 'find' advances only the front of the range to the found element. Therefore 'r' is the range of all elements starting with the found element, but also including all elements after that. I would expect 'list.erase(range)' to erase all elements in the range. However, in this case I only want to erase a single element. This could still be expressed with ranges, but it would require a different function. For example: find(list, e).eraseFront() Or: list.eraseFrontOf(find(list, e)) Or: list.eraseOne(find(list, e)) Or: list.findAndErase(e) Or: list.erase(take(1, find(list, e)))
 Splitting a container on a position:
   iter = find(list.begin(), list.end(), e);
   do_something_with(list.begin(), iter);
   do_something_else_with(iter, list.end());

This one is legit, as far as I can tell. On the other hand, although it's awkward, you could do something like: Range myRange1; auto myRange2 = find(myRange1, e); struct PairOfRanges { Range myRange1, myRange2; auto front() { return myRange1.front; } bool empty() { return myRange1 == myRange2; } void popFront() { myRange1.popFront; } }

Unfortunately this falls apart if my 'do_something_with' in the above example is 'list.erase', unless 'list.erase' can look inside a 'PairOfRanges'.
 Inserting into a container at a position:
   iter = find(list.begin(), list.end(), e);
   list.insert(iter, array.begin(), array.end());

Same as erasing.
 Constructing a range from two independent position:
   iter1 = find(list.begin(), list.end(), e1);
   iter2 = rfind(list.begin(), list.end(), e2);
   do_something_with(iter1, iter2);

Assuming find() works by popping elements off the front of the range until it finds what it's looking for, and then returning that, and rfind() does the same thing but from the back, just do something like: Range myRange = find(list, e1); myRange = rfind(myRange, e2); do_something_with(myRange);

That works in this case, but what if the iterators (sorry, ranges) come from different sources? -- Rainer Deyke - rainerd eldwood.com
May 09 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Rainer Deyke (rainerd eldwood.com)'s article
 dsimcha wrote:
 == Quote from Rainer Deyke (rainerd eldwood.com)'s article
 Although I like ranges, it looks to me like there are a couple of
 operations that would difficult to implement without iterators or some
 other way to specify a specific position in a range.
 Finding and erasing an element:
   list.erase(find(list.begin(), list.end(), e));

What's wrong with this? Since list owns its range representation, it can know the implementation details. For a linked list, this will probably be just a pair of pointers under the hood anyhow. In other words, it's internally still an iterator, just prettier looking.

r = find(list, e) list.erase(r) 'find' advances only the front of the range to the found element. Therefore 'r' is the range of all elements starting with the found element, but also including all elements after that. I would expect 'list.erase(range)' to erase all elements in the range. However, in this case I only want to erase a single element. This could still be expressed with ranges, but it would require a different function. For example: find(list, e).eraseFront() Or: list.eraseFrontOf(find(list, e)) Or: list.eraseOne(find(list, e)) Or: list.findAndErase(e) Or: list.erase(take(1, find(list, e)))
 Splitting a container on a position:
   iter = find(list.begin(), list.end(), e);
   do_something_with(list.begin(), iter);
   do_something_else_with(iter, list.end());

This one is legit, as far as I can tell. On the other hand, although it's awkward, you could do something like: Range myRange1; auto myRange2 = find(myRange1, e); struct PairOfRanges { Range myRange1, myRange2; auto front() { return myRange1.front; } bool empty() { return myRange1 == myRange2; } void popFront() { myRange1.popFront; } }

example is 'list.erase', unless 'list.erase' can look inside a 'PairOfRanges'.
 Inserting into a container at a position:
   iter = find(list.begin(), list.end(), e);
   list.insert(iter, array.begin(), array.end());

Same as erasing.
 Constructing a range from two independent position:
   iter1 = find(list.begin(), list.end(), e1);
   iter2 = rfind(list.begin(), list.end(), e2);
   do_something_with(iter1, iter2);

Assuming find() works by popping elements off the front of the range until it finds what it's looking for, and then returning that, and rfind() does the same thing but from the back, just do something like: Range myRange = find(list, e1); myRange = rfind(myRange, e2); do_something_with(myRange);

from different sources?

You do make some good points here, but as counter points: Your examples rely on the fact that there are two iterators, a begin iterator and an end iterator. In C++, this means that testing for the end of a range/iterator/whatever must be reduced to some kind of comparison, thus exposing an implementation detail in the design. Thinking about some of the ranges I've written, I cannot picture how I would reduce testing for empty to a comparison operation without resorting to some pretty significant kludges.
May 09 2009
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-05-09 10:45:05 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 STL iterators can be used for more than just iteration.  They also 
 serve  as cursors, or pointers to specific elements.  If you add the 
 ability for  them to check their own bounds, then they become as safe 
 as ranges, and  can be used as general purpose pointers for things like 
 insertion,  deletion, bi-directional traversal, things that ranges can 
 do but are  clumsy at.
 
 You still have the interchangable-with-pointer concept burned into your 
  brain :)
 
 Think more like this:
 
 for(cursor i = begin; !i.end; i++)

So basically your cursor is a range (so it knows its bounds) with an added position pointer. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
May 09 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 08 May 2009 11:57:41 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Steven Schveighoffer wrote:
 You still have not addressed the usage of iterators as general data  
 structure pointers.  As far as I can tell, ranges do not implement this.
  i.e. find surrounding elements of an element.
  With iterators:
  auto iter = container.find(elem);
 auto elembefore = iter - 1;
 auto elemafter = iter + 1;
  Assuming incrementing and decrementing an iterator is checked for  
 out-of-bounds.

The problem is that last statement - "Assuming". If the iterator is the first or the last, or if there's only 1 or 2 elements in the container, it's crash city. Iterators are *inherently* uncheckable. For finding the elemafter, it's trivial as find() returns a range from the found element to the end (and it's also trivially checkable!). For elembefore, there's a bit more work involved, probably defining a find() that returns a range backed up by one one.

You're assuming an iterator does not know its bounds. Maybe I should call it something other than iterator. How about cursor? There are definite reasons to use containers in ways that don't involve std.algorithm, where something that has the easy ability to move back and forth N times without weird subrange operations. I'm thinking of a structure with either a pointer to the container for bounds checking, or a range and pointer combined (where the invariant is that the pointer is always within the range). I'm not saying ranges are not great, i think they are a HUGE step forward, but the statement "Iterators must be eliminated" may be too harsh. Perhaps the unchecked iterator, yes (but you may want to allow it in certain performance-critical code). -Steve
May 08 2009
prev sibling next sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
Congratulations, glad to hear it was so well received. These are very 
inspiring slides, helped me to understand the benefits of ranges much 
better.

You are a very good writer, looking forward to read more of your 
publications on D.
May 08 2009
prev sibling next sibling parent davidl <davidl nospam.org> writes:
在 Fri, 08 May 2009 11:01:20 +0800,Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> 写道:

 The slides from my keynote at BoostCon 2009 (www.boostcon.com) are now  
 available from:

 http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/08/iterators-must-go.pdf

 The talk went so well, the urge to brag is too mighty to resist. I mean  
 it's not everyday that several people come and tell they've literally  
 lost sleep and focus on other talks because of thinking about ranges.  
 Also, there was a definite feel of "before" and "after". In short, the  
 talk has generated a great deal of interest in both D proper and in the  
 Boost community rewriting the STL entirely in terms of ranges.


 Andrei

Yeah, range is the correct design. Iterators should be deprecated. -- 使用 Opera 革命性的电子邮件客户程序: http://www.opera.com/mail/
May 09 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 09 May 2009 00:15:19 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Steven Schveighoffer wrote:
 You're assuming an iterator does not know its bounds.

That's right. That's the usual design, which is based on the pointer model. Pointers do not know their limits.

Yes, that model is not as safe. That's not the model I'm referring to.
 Maybe I should call it something other than iterator.  How about cursor?

Or range? <g>

Nope, a range cannot go backwards or forwards at will. It can only do one or the other, shrinking one end.
 There are definite reasons to use containers in ways that don't involve  
 std.algorithm, where something that has the easy ability to move back  
 and forth N times without weird subrange operations.
  I'm thinking of a structure with either a pointer to the container for  
 bounds checking, or a range and pointer combined (where the invariant  
 is that the pointer is always within the range).
  I'm not saying ranges are not great, i think they are a HUGE step  
 forward, but the statement "Iterators must be eliminated" may be too  
 harsh.  Perhaps the unchecked iterator, yes (but you may want to allow  
 it in certain performance-critical code).

If you had an iterator that knew its beginning and end, then the whole paradigm of: for (iterator i = begin; i != end; i++) doesn't make much sense because of the redundancy.

STL iterators can be used for more than just iteration. They also serve as cursors, or pointers to specific elements. If you add the ability for them to check their own bounds, then they become as safe as ranges, and can be used as general purpose pointers for things like insertion, deletion, bi-directional traversal, things that ranges can do but are clumsy at. You still have the interchangable-with-pointer concept burned into your brain :) Think more like this: for(cursor i = begin; !i.end; i++) -Steve
May 09 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 09 May 2009 12:40:34 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2009-05-09 10:45:05 -0400, "Steven Schveighoffer"  
 <schveiguy yahoo.com> said:

 STL iterators can be used for more than just iteration.  They also  
 serve  as cursors, or pointers to specific elements.  If you add the  
 ability for  them to check their own bounds, then they become as safe  
 as ranges, and  can be used as general purpose pointers for things like  
 insertion,  deletion, bi-directional traversal, things that ranges can  
 do but are  clumsy at.
  You still have the interchangable-with-pointer concept burned into  
 your  brain :)
  Think more like this:
  for(cursor i = begin; !i.end; i++)

So basically your cursor is a range (so it knows its bounds) with an added position pointer.

Basically, yes. -Steve
May 11 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 09 May 2009 22:10:22 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 dsimcha wrote:
 == Quote from Rainer Deyke (rainerd eldwood.com)'s article
 Although I like ranges, it looks to me like there are a couple of
 operations that would difficult to implement without iterators or some
 other way to specify a specific position in a range.
 Finding and erasing an element:
   list.erase(find(list.begin(), list.end(), e));

What's wrong with this? Since list owns its range representation, it can know the implementation details. For a linked list, this will probably be just a pair of pointers under the hood anyhow. In other words, it's internally still an iterator, just prettier looking.

The following is semantically incorrect: r = find(list, e) list.erase(r) 'find' advances only the front of the range to the found element. Therefore 'r' is the range of all elements starting with the found element, but also including all elements after that. I would expect 'list.erase(range)' to erase all elements in the range. However, in this case I only want to erase a single element. This could still be expressed with ranges, but it would require a different function. For example: find(list, e).eraseFront() Or: list.eraseFrontOf(find(list, e)) Or: list.eraseOne(find(list, e)) Or: list.findAndErase(e) Or: list.erase(take(1, find(list, e)))

I think Andrei's plans are for this last one. You can srhink a range, so use the range primitives to srhink it to a 1-element range, then call erase.
 Splitting a container on a position:
   iter = find(list.begin(), list.end(), e);
   do_something_with(list.begin(), iter);
   do_something_else_with(iter, list.end());



From what I remember from the earlier discussions, Andrei's plan is for you to be able to subrange a range. So for example, you have 2 ranges that overlap, return a range that is the intersection or complement. I don't have huge issues with that solution, but I would point out that you now are depending on the developer to ensure the ranges do in fact overlap. Of course, I could be wrong with my assumption (about Andrei's intent), maybe he has a better solution. -Steve
May 11 2009