www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Iterators for D

reply Walter Bright <newshound digitalmars.com> writes:
It's becoming increasingly obvious that D needs iterators. While opApply 
  is far better for some kinds of iteration (such as recursively 
traversing a directory), iterators are more efficient in some cases, and 
allow for things that opApply makes difficult.

So hear are the design goals:

1) Works with dynamic and stat arrays
2) Doesn't need to work with associative arrays
3) Should be possible to achieve "pointer efficiency" with it
4) Needs to be able to restrict lvalue access (what C++ does with const 
iterators)
5) Needs to work seamlessly with foreach
6) Iterators need to be copyable

So here's one design:

.being property returns an iterator that starts at the beginning

.end returns an iterator that is at the end

Overloading * doesn't work with D. But, instead:

Overload opIndex for rvalue access
Overload opIndexAssign for lvalue access

Overloading opIndex also will work for random access

foreach loops will not be able to have a 'key' parameter.
Nov 06 2006
next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Walter Bright wrote:
 .being property returns an iterator that starts at the beginning

Make that .begin Also, overloading += will be used to advance the iterator.
Nov 06 2006
parent reply rm <roel.mathys gmail.com> writes:
Walter Bright wrote:
 Walter Bright wrote:
 .being property returns an iterator that starts at the beginning

Make that .begin Also, overloading += will be used to advance the iterator.

that would give something like this? import std.stdio; void main() { char[] s = "hello"; // += for advancing an iterator, why not ++? for (iterator i = s.begin; i != s.end; i += 1) writefln(s[i]); // ??? i is an iterator or a char ??? foreach(i; s) // extend writefln to take iterators ? // but that would mean that it must be possible // to determine a value from an iterator writefln(i); }
Nov 06 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
rm wrote:
 Walter Bright wrote:
 Walter Bright wrote:
 .being property returns an iterator that starts at the beginning

Make that .begin Also, overloading += will be used to advance the iterator.

that would give something like this? import std.stdio; void main() { char[] s = "hello"; // += for advancing an iterator, why not ++? for (iterator i = s.begin; i != s.end; i += 1) writefln(s[i]);

Oh right, that's what doesn't seem to work with the C++ model--such an iterator wouldn't work in foreach: foreach( e; s.begin() ) // nowhere to supply s.end() I think the Java style may be more appropriate: foreach( e; s.fwdIter() ) foreach( e; s.revIter() ) Sean
Nov 06 2006
parent reply rm <roel.mathys gmail.com> writes:
 I think the Java style may be more appropriate:
 
     foreach( e; s.fwdIter() )
     foreach( e; s.revIter() )
 

I didn't read the discussion, but wasn't there a topic about foreach_reverse? Should that not cover s.revIter()? Secondly, why must one give in the foreach statement the first iterator? I'd say that there the iterable must be given? roel
Nov 06 2006
parent Sean Kelly <sean f4.ca> writes:
rm wrote:
 
 I think the Java style may be more appropriate:

     foreach( e; s.fwdIter() )
     foreach( e; s.revIter() )

I didn't read the discussion, but wasn't there a topic about foreach_reverse? Should that not cover s.revIter()?

How? Say I do this: foreach_reverse( e; s.begin() ) I've passed an iterator pointing to the beginning of a sequence to an algorithm that wants to iterate backwards, so best case the iterator is bidirectional and we will visit exactly one element (the theoretical "last" in this case) before exiting the loop.
 Secondly, why must one give in the foreach statement the first iterator?
 I'd say that there the iterable must be given?

Something like that, yes. You need a unified object that knows when it's reached the end of the sequence. So assuming C++ style iterators you'd have to wrap them in an object that takes care of the comparisons and looping to work with foreach. Sean
Nov 06 2006
prev sibling parent reply Hasan Aljudy <hasan.aljudy gmail.com> writes:
rm wrote:
 Walter Bright wrote:
 Walter Bright wrote:
 .being property returns an iterator that starts at the beginning

Make that .begin Also, overloading += will be used to advance the iterator.

that would give something like this? import std.stdio; void main() { char[] s = "hello"; // += for advancing an iterator, why not ++? for (iterator i = s.begin; i != s.end; i += 1) writefln(s[i]); // ??? i is an iterator or a char ??? foreach(i; s) // extend writefln to take iterators ? // but that would mean that it must be possible // to determine a value from an iterator writefln(i); }

Why would I use
     for (iterator i = s.begin; i != s.end; i += 1)
         writefln(s[i]);

instead of:
     for (int i = 0; i < s.length; i++)
         writefln(s[i]);

?
Nov 06 2006
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Hasan Aljudy wrote:
 
 
 rm wrote:
 Walter Bright wrote:
 Walter Bright wrote:
 .being property returns an iterator that starts at the beginning

Make that .begin Also, overloading += will be used to advance the iterator.

that would give something like this? import std.stdio; void main() { char[] s = "hello"; // += for advancing an iterator, why not ++? for (iterator i = s.begin; i != s.end; i += 1) writefln(s[i]); // ??? i is an iterator or a char ??? foreach(i; s) // extend writefln to take iterators ? // but that would mean that it must be possible // to determine a value from an iterator writefln(i); }

Why would I use > for (iterator i = s.begin; i != s.end; i += 1) > writefln(s[i]); instead of: > for (int i = 0; i < s.length; i++) > writefln(s[i]); ?

The point is that you could replace char[] with SomeCollection and the code should still work. But the same mechanism should also work on the built-in types. --bb
Nov 06 2006
prev sibling next sibling parent reply Dave <Dave_member pathlink.com> writes:
Walter Bright wrote:
 It's becoming increasingly obvious that D needs iterators. While opApply 
  is far better for some kinds of iteration (such as recursively 
 traversing a directory), iterators are more efficient in some cases, and 
 allow for things that opApply makes difficult.
 
 So hear are the design goals:
 
 1) Works with dynamic and stat arrays
 2) Doesn't need to work with associative arrays
 3) Should be possible to achieve "pointer efficiency" with it
 4) Needs to be able to restrict lvalue access (what C++ does with const 
 iterators)
 5) Needs to work seamlessly with foreach
 6) Iterators need to be copyable
 
 So here's one design:
 
 .being property returns an iterator that starts at the beginning
 
 .end returns an iterator that is at the end
 
 Overloading * doesn't work with D. But, instead:
 
 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access
 
 Overloading opIndex also will work for random access
 
 foreach loops will not be able to have a 'key' parameter.

Would this mechanism work for, say, arrays of structs (not struct*[] 's) so that we could avoid the overhead of either a) copying the struct and/or b) the extra level of indirection in the loop? (In other words, could the iterator provide a reference to each element by default). Thanks, - Dave
Nov 06 2006
parent reply Dave <Dave_member pathlink.com> writes:
Dave wrote:
 Walter Bright wrote:
 It's becoming increasingly obvious that D needs iterators. While 
 opApply  is far better for some kinds of iteration (such as 
 recursively traversing a directory), iterators are more efficient in 
 some cases, and allow for things that opApply makes difficult.

 So hear are the design goals:

 1) Works with dynamic and stat arrays
 2) Doesn't need to work with associative arrays
 3) Should be possible to achieve "pointer efficiency" with it
 4) Needs to be able to restrict lvalue access (what C++ does with 
 const iterators)
 5) Needs to work seamlessly with foreach
 6) Iterators need to be copyable

 So here's one design:

 .being property returns an iterator that starts at the beginning

 .end returns an iterator that is at the end

 Overloading * doesn't work with D. But, instead:

 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access

 Overloading opIndex also will work for random access

 foreach loops will not be able to have a 'key' parameter.

Would this mechanism work for, say, arrays of structs (not struct*[] 's) so that we could avoid the overhead of either a) copying the struct and/or b) the extra level of indirection in the loop? (In other words, could the iterator provide a reference to each element by default). Thanks, - Dave

I should add - will native arrays of structs (say) provide this by default w/o having to add specialized operators? Thanks.
Nov 06 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Dave wrote:
 I should add - will native arrays of structs (say) provide this by 
 default w/o having to add specialized operators?

Yes. It would be no different than for native arrays of ints.
Nov 06 2006
parent Dave <Dave_member pathlink.com> writes:
Walter Bright wrote:
 Dave wrote:
 I should add - will native arrays of structs (say) provide this by 
 default w/o having to add specialized operators?

Yes. It would be no different than for native arrays of ints.

Cool!
Nov 06 2006
prev sibling next sibling parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access

Could you elaborate on how this would look like ? E.g. a usage example of iterators with the proposed design.
 foreach loops will not be able to have a 'key' parameter.

What's the particular reason for that restriction ? -- Tomasz Stachowiak
Nov 06 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Tom S wrote:
 Walter Bright wrote:
 foreach loops will not be able to have a 'key' parameter.


With an iterator, what is the key?
Nov 06 2006
next sibling parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 foreach loops will not be able to have a 'key' parameter.


With an iterator, what is the key?

You've got a point. I failed to see the big picture here.
Nov 06 2006
prev sibling next sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 foreach loops will not be able to have a 'key' parameter.


With an iterator, what is the key?

It could just be the count. Say I'm trying to linearize a non-linear datastructure into a pre-existing array: foreach(int i, val v; hierarchical_thing.iterator()) { linearized[i] = val; } Also having the count there would make it easier to do something like iterating lock step in funky ways over different objects: foreach(int i, val v; thing.randomized_iterator()) { other_thing[i] = val; } Obviously you could count things yourself if you had to. But I find for instance in Python that I use 'enumerate()' quite often. enumerate() takes an iterator and returns an iterator that spits out (count,value) tuples. --bb
Nov 06 2006
prev sibling parent rm <roel.mathys gmail.com> writes:
Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 foreach loops will not be able to have a 'key' parameter.


With an iterator, what is the key?

with an iterator, what is the "value"? Is it the key-value *pair*, or is it just the key?
Nov 06 2006
prev sibling next sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 It's becoming increasingly obvious that D needs iterators. While opApply 
  is far better for some kinds of iteration (such as recursively 
 traversing a directory), iterators are more efficient in some cases, and 
 allow for things that opApply makes difficult.
 
 So hear are the design goals:
 
 1) Works with dynamic and stat arrays
 2) Doesn't need to work with associative arrays
 3) Should be possible to achieve "pointer efficiency" with it
 4) Needs to be able to restrict lvalue access (what C++ does with const 
 iterators)
 5) Needs to work seamlessly with foreach
 6) Iterators need to be copyable
 
 So here's one design:
 
 ..being property returns an iterator that starts at the beginning
 
 ..end returns an iterator that is at the end

Is it worth discussing whether the pointer-style C++ iterators are the best approach? They certainly are if random access iterators are a requirement, but if we're just implementing forward and reverse iterators, the all-in-one Java approach may be better. In that vein, I assume alose that reverse iteration would work with reverse iterators and an .rbegin type method instead of .begin and foreach_reverse? Sean
Nov 06 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 Is it worth discussing whether the pointer-style C++ iterators are the 
 best approach?  They certainly are if random access iterators are a 
 requirement, but if we're just implementing forward and reverse 
 iterators, the all-in-one Java approach may be better.  In that vein, I 
  assume alose that reverse iteration would work with reverse iterators 
 and an .rbegin type method instead of .begin and foreach_reverse?

Hmm, or it could be done starting with .end and going to .begin.
Nov 06 2006
parent Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 Is it worth discussing whether the pointer-style C++ iterators are the 
 best approach?  They certainly are if random access iterators are a 
 requirement, but if we're just implementing forward and reverse 
 iterators, the all-in-one Java approach may be better.  In that vein, 
 I  assume alose that reverse iteration would work with reverse 
 iterators and an .rbegin type method instead of .begin and 
 foreach_reverse?

Hmm, or it could be done starting with .end and going to .begin.

Yup. About the only potentially weird thing there is that, assuming C++ style iterators, it would require reverse iterators and forward iterators to be comparable to one another. And, I suppose, for .end to return a reverse style iterator even for sequences that don't support reverse iteration? Sean
Nov 06 2006
prev sibling next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
Walter Bright wrote:
 It's becoming increasingly obvious that D needs iterators. While opApply 
  is far better for some kinds of iteration (such as recursively 
 traversing a directory), iterators are more efficient in some cases, and 
 allow for things that opApply makes difficult.
 
 So hear are the design goals:
 
 1) Works with dynamic and stat arrays

How about pointers (unknown length arrays)? int *some_numbers; Is it required that a pointer be a valid iterator? I.e. should it be possible to have this template function: print_all(some_numbers, some_numbers+theLength); Or will that case require a wrapper object that contains theLength? auto iter = PointerIterator(some_numbers, theLength); print_all(iter);
 2) Doesn't need to work with associative arrays

Why not? Do you mean it doesn't need to because you can just iterate over keys or values?
 3) Should be possible to achieve "pointer efficiency" with it
 4) Needs to be able to restrict lvalue access (what C++ does with const 
 iterators)
 5) Needs to work seamlessly with foreach
 6) Iterators need to be copyable

 So here's one design:
 
 .being property returns an iterator that starts at the beginning
 
 .end returns an iterator that is at the end

That's like C++'s way. Iterator is basically a generalized pointer. The other proposal is more like Java/Python/C#'s way, where the iterator is like a pointer that knows it's own limits. The Java/Python/C# way would definitely work in D. I'm not sure about the C++ way working for D because, in practice, references and dereferencing get used pretty heavily there. On the other hand Java/Python/C# were designed without real templates. One of the major design goals for C++ was that for a basic array a regular pointer should be a valid iterator. Thus the iterator cannot know where it's own end() is, because a pointer does not have such knowledge. But aside from cute demos on SGI's STL web page where algorithms are applied to plain arrays using pointers, I've never seen that sort of thing used in the real world. You almost always use a std::vector or std::list or something besides a raw array. So pointer equivalence seems to be pretty useless to me. Even moreso with D. In the rare case you need to work with raw pointers, you can always make an iterator wrapper. On the plus side for C++ style: * iterators are lightweight (just one size_t/pointer in most cases) * algorithms and methods all take begin/end pairs making things uniform. Minuses * Iterators are dumb and error prone and easily go out of bounds * algorithms all require begin/end pairs making things cumbersome I think the minuses outweigh the plusses. But maybe it's useful to take one concrete example: A generic mergesort or quicksort algorithm. In C++ recursive calls just pass the limits of the range to the children mergesort(begin, end) { ... mergesort(begin, (end-begin)/2); mergesort((end-begin)/2, end); } It's very efficient. In the iterator-is-object approach I'm not sure how that works. If all you've got is .next() or .hasNext() I'm not sure what you're supposed to do to create an iterator over half the range represented by the original iterator.
 foreach loops will not be able to have a 'key' parameter.

Don't understand that one. --bb
Nov 06 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Bill Baxter wrote:
 Walter Bright wrote:
 It's becoming increasingly obvious that D needs iterators. While 
 opApply  is far better for some kinds of iteration (such as 
 recursively traversing a directory), iterators are more efficient in 
 some cases, and allow for things that opApply makes difficult.

 So hear are the design goals:

 1) Works with dynamic and stat arrays

How about pointers (unknown length arrays)? int *some_numbers; Is it required that a pointer be a valid iterator? I.e. should it be possible to have this template function: print_all(some_numbers, some_numbers+theLength); Or will that case require a wrapper object that contains theLength? auto iter = PointerIterator(some_numbers, theLength); print_all(iter);

It's simpler than that. Pointers can be converted to arrays: auto array = some_numbers[0 .. theLength]; and then you can get the iterator.
 2) Doesn't need to work with associative arrays

over keys or values?

Because that falls into my big complaint with iterators - there's no way to efficiently 'linearize' a binary tree. Might as well just iterate over the keys or values.
 That's like C++'s way.  Iterator is basically a generalized pointer.

Yes.
 The other proposal is more like Java/Python/C#'s way, where the iterator 
  is like a pointer that knows it's own limits.
 The Java/Python/C# way would definitely work in D.   I'm not sure about 
 the C++ way working for D because, in practice, references and 
 dereferencing get used pretty heavily there.  On the other hand 
 Java/Python/C# were designed without real templates.

I think the C++ like way will be a lot more efficient, and I think it will work. Reference types are not needed.
 One of the major design goals for C++ was that for a basic array a 
 regular pointer should be a valid iterator.  Thus the iterator cannot 
 know where it's own end() is, because a pointer does not have such 
 knowledge.  But aside from cute demos on SGI's STL web page where 
 algorithms are applied to plain arrays using pointers,  I've never seen 
 that sort of thing used in the real world.  You almost always use a 
 std::vector or std::list or something besides a raw array.  So pointer 
 equivalence seems to be pretty useless to me.  Even moreso with D.  In 
 the rare case you need to work with raw pointers, you can always make an 
 iterator wrapper.

I think it'll work fine, since pointers can be easily converted to dynamic arrays (see above).
 On the plus side for C++ style:
 * iterators are lightweight (just one size_t/pointer in most cases)
 * algorithms and methods all take begin/end pairs making things uniform.
 Minuses
 * Iterators are dumb and error prone and easily go out of bounds
 * algorithms all require begin/end pairs making things cumbersome
 
 I think the minuses outweigh the plusses.
 But maybe it's useful to take one concrete example:  A generic mergesort 
 or quicksort algorithm.
 
 In C++ recursive calls just pass the limits of the range to the children
 mergesort(begin, end)
 {
     ...
     mergesort(begin, (end-begin)/2);
     mergesort((end-begin)/2, end);
 }
 It's very efficient.
 
 In the iterator-is-object approach I'm not sure how that works.
 If all you've got is .next() or .hasNext() I'm not sure what you're 
 supposed to do to create an iterator over half the range represented by 
 the original iterator.

Iterators can be any type, not just objects, that support [], ++, +=, and =.
 foreach loops will not be able to have a 'key' parameter.


What's the key for an unknown iterator type?
Nov 06 2006
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Bill Baxter wrote:
 Walter Bright wrote:
 It's becoming increasingly obvious that D needs iterators. While 
 opApply  is far better for some kinds of iteration (such as 
 recursively traversing a directory), iterators are more efficient in 
 some cases, and allow for things that opApply makes difficult.

 So hear are the design goals:

 1) Works with dynamic and stat arrays

How about pointers (unknown length arrays)? int *some_numbers; Is it required that a pointer be a valid iterator? I.e. should it be possible to have this template function: print_all(some_numbers, some_numbers+theLength); Or will that case require a wrapper object that contains theLength? auto iter = PointerIterator(some_numbers, theLength); print_all(iter);

It's simpler than that. Pointers can be converted to arrays: auto array = some_numbers[0 .. theLength]; and then you can get the iterator.

Nice! Didn't know you could do that.
 I think the C++ like way will be a lot more efficient, and I think it 
 will work. Reference types are not needed.

The C++ way really embodies two design choices: 1) iterators are defined by 'concept' --> Means generic iteration functions all need to be templates --> The other alternative is define them by interfaces. --> these two aren't necessarily mutually exclusive though 2) an iterator is a pointer -- begin() and end() are kept separate. --> The other alternative is 'an iterator is a range' Definitely 1) will be more efficient than using interfaces, but I don't thing pointer semantics, 2), are required for 1) to work. It may be possible for D iterators to have the speed of C++ while having the convenience and safety of Java's iterators that know their bounds.
 I think the minuses outweigh the plusses.
 But maybe it's useful to take one concrete example:  A generic 
 mergesort or quicksort algorithm.

 In C++ recursive calls just pass the limits of the range to the children
 mergesort(begin, end)
 {
     ...
     mergesort(begin, (end-begin)/2);
     mergesort((end-begin)/2, end);
 }
 It's very efficient.

 In the iterator-is-object approach I'm not sure how that works.
 If all you've got is .next() or .hasNext() I'm not sure what you're 
 supposed to do to create an iterator over half the range represented 
 by the original iterator.

Iterators can be any type, not just objects, that support [], ++, +=, and =.

I think the iterator-as-range idea is at least worth discussing. 99% of the time you use an iterator you also need to know where to stop. It's pretty similar to how arrays in D keep a length, because 99% of the time when you have an array you need to know how long it is too. So it makes sense to keep those two bits of information together. Similarly in C++, when you have an iterator, you almost always need the end() too. And any time you want to pass around iterators you end up having to pass the two bits of information around separately. --bb
Nov 06 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Bill Baxter wrote:
 I think the iterator-as-range idea is at least worth discussing.  99% of 
 the time you use an iterator you also need to know where to stop.  It's 
 pretty similar to how arrays in D keep a length, because 99% of the time 
 when you have an array you need to know how long it is too.  So it makes 
 sense to keep those two bits of information together.  Similarly in C++, 
 when you have an iterator, you almost always need the end() too.  And 
 any time you want to pass around iterators you end up having to pass the 
 two bits of information around separately.

That's a very good point. Got any ideas on that?
Nov 06 2006
parent reply "Craig Black" <cblack ara.com> writes:
When writing custom C++ iterators, I find that end() is not ever necessary.
If end() is not used, it means that a little more smarts have to be added to
the iterator itself so that the iterator knows when to stop.  In some cases
this means that the iterator needs a pointer to the collection/container
object in order to get that information.

-Craig

"Walter Bright" <newshound digitalmars.com> wrote in message
news:eioqvl$26t0$2 digitaldaemon.com...
 Bill Baxter wrote:
 I think the iterator-as-range idea is at least worth discussing.  99% of
 the time you use an iterator you also need to know where to stop.  It's
 pretty similar to how arrays in D keep a length, because 99% of the time
 when you have an array you need to know how long it is too.  So it makes
 sense to keep those two bits of information together.  Similarly in C++,
 when you have an iterator, you almost always need the end() too.  And
 any time you want to pass around iterators you end up having to pass the
 two bits of information around separately.

That's a very good point. Got any ideas on that?

Nov 07 2006
next sibling parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Craig Black wrote:
 When writing custom C++ iterators, I find that end() is not ever necessary.
 If end() is not used, it means that a little more smarts have to be added to
 the iterator itself so that the iterator knows when to stop.  In some cases
 this means that the iterator needs a pointer to the collection/container
 object in order to get that information.

Also in some cases, 'end()' is really unnatural to write. E.g. with cyclic containers.
Nov 07 2006
prev sibling parent "Kristian Kilpi" <kjkilpi gmail.com> writes:
On Tue, 07 Nov 2006 16:03:13 +0200, Craig Black <cblack ara.com> wrote:
 When writing custom C++ iterators, I find that end() is not ever  =

 necessary.
 If end() is not used, it means that a little more smarts have to be  =

 added to
 the iterator itself so that the iterator knows when to stop.  In some =

 cases
 this means that the iterator needs a pointer to the collection/contain=

 object in order to get that information.

A benefit of having a pointer to the iterated container is of course tha= t = boundary checks are always up to date, even if you change the container = = while iterating it. Well, I'm not saying that it would be the way to do = = the iterators, or that it's not the way to do them. Btw, I have used (in C++) iterators having the following functions: isEnd() //does the iterator point to the end position? (isNotEnd(= ) = etc provided for convenience) //there is also: isBegin(), isFirst(), isLast() toBegin() //to position -1 (which is outside the container) toFirst() //to position 0 toLast() //to position end - 1 toEnd() //the end position points outside the container toNext() toPrev() get() //returns the current item read() //=3D=3D get() + toNext() getNext() getPrev() readBackward() //=3D=3D get() + toPrev() ... I have named iterators capable of changing the container as 'Writer's = ('wr' + 'iter'). They have the following functions: set() write() //=3D=3D set() + toNext() ... Ok, there are a lot of functions... but my actual point is that the = iterator has two positions, 'begin' and 'end', that point *outside* the = = container. When writing (or setting) a value to 'end', the value is = appended to the end of the container (very natural operation for files).= = The same way "i.toBegin(); i.set(val);" inserts 'val' to the start of th= e = container (very unnatural operation for files, and actually causes an = error if tried for files).
Nov 07 2006
prev sibling parent Karen Lanrap <karen digitaldaemon.com> writes:
Walter Bright wrote:

 there's no way to efficiently 'linearize' a binary tree

In the general case the structure to be linearized is a directed graph. Every linearization has to at least implicitely build a spanning tree for that graph. A priori in spanning trees the number of childs of a node is limited only by the number of nodes of the structure. Therefore every linearizator must be able to handle trees of arbitrary degree. From tree grammars it is known, that top down attributions and bottom up attributions are incomparable to each other when only one run is allowed. Because the goal of any linearization is to compute some attribution of the structure every 'traverser' must assist the process outlined above, where in the general case more than one run is needed. Now: what is efficiency in this general case?
Nov 06 2006
prev sibling next sibling parent reply Mike Capp <mike.capp gmail.com> writes:
 Overload opIndex for rvalue access
 [...]
 Overloading opIndex also will work for random access

Wouldn't this make iterators incompatible with e.g. linked lists? Seems to defeat much of the purpose. (It might be possible to implement random access for a list, but it's not something you'd want to encourage.)
Nov 06 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Mike Capp wrote:
 Overload opIndex for rvalue access
 [...]
 Overloading opIndex also will work for random access

Wouldn't this make iterators incompatible with e.g. linked lists? Seems to defeat much of the purpose.

To work, all it has to do is support [0], and throw an exception if the index is anything but 0.
 (It might be possible to implement random access for a list, but it's not
 something you'd want to encourage.)

I agree.
Nov 06 2006
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Mike Capp wrote:
 Overload opIndex for rvalue access
 [...]
 Overloading opIndex also will work for random access

Wouldn't this make iterators incompatible with e.g. linked lists? Seems to defeat much of the purpose.

To work, all it has to do is support [0], and throw an exception if the index is anything but 0.
 (It might be possible to implement random access for a list, but it's not
 something you'd want to encourage.)

I agree.

This is one point where there's a huge divide between C++ iterators and Java iterators. In C++ there is no such thing as *the* iterator type or interface. What C++ (or STL, rather) has is collection of *iterator concepts*. What that means is that if opIndex doesn't make sense for the the collection in question, you just leave it out. A 'concept' is like an interface but for templates. Here are the concepts in the STL: Trivial Iterator - http://www.sgi.com/tech/stl/trivial.html Input Iterator - http://www.sgi.com/tech/stl/InputIterator.html Output Iterator - http://www.sgi.com/tech/stl/OutputIterator.html Forward Iterator - http://www.sgi.com/tech/stl/ForwardIterator.html Bidirectional Iterator - http://www.sgi.com/tech/stl/BidirectionalIterator.html Random Access Iterator http://www.sgi.com/tech/stl/RandomAccessIterator.html If you want your container to support forward iteration, then your iterator can be of any type you wish, but it must support x++, ++x, and *x operations. The Java/C# model is instead based on predefined interfaces which has pros and cons. * Main pro is that functions that take generic iterators can be ordinary functions rather than templates. * Main con is that such functions can't generally be as efficient as raw pointer manipulation, because you have to make a virtual method call through the interface pointer to get each next element. D is in a very interesting place in that it is high level enough that the C++ model doesn't quite make sense, but low level enough that the Java model doesn't quite make sense either. D may be able to have the best of both worlds. Define *both* standard concepts *and* standard interfaces that implement those concepts, and then libraries can write either templatized algorithms for speed, or regular function based algorithms for code-size, use with inheritance, etc. --bb
Nov 06 2006
parent Sean Kelly <sean f4.ca> writes:
Bill Baxter wrote:
 Walter Bright wrote:
 Mike Capp wrote:
 Overload opIndex for rvalue access
 [...]
 Overloading opIndex also will work for random access

Wouldn't this make iterators incompatible with e.g. linked lists? Seems to defeat much of the purpose.

To work, all it has to do is support [0], and throw an exception if the index is anything but 0.
 (It might be possible to implement random access for a list, but it's 
 not
 something you'd want to encourage.)

I agree.

This is one point where there's a huge divide between C++ iterators and Java iterators. In C++ there is no such thing as *the* iterator type or interface. What C++ (or STL, rather) has is collection of *iterator concepts*. What that means is that if opIndex doesn't make sense for the the collection in question, you just leave it out. A 'concept' is like an interface but for templates.

For what it's worth, C++98 style iterator traits won't work in D because of D's simpler template lookup rules. I think we'll have to go with something much closer to C++0x concept checking if we want to overload algorithms for different iterator categories.
 The Java/C# model is instead based on predefined interfaces which has 
 pros and cons.
 * Main pro is that functions that take generic iterators can be ordinary 
 functions rather than templates.
 * Main con is that such functions can't generally be as efficient as raw 
 pointer manipulation, because you have to make a virtual method call 
 through the interface pointer to get each next element.

Hrm... you could still use a Java design without the base class and have everyone just use templates though.
 D is in a very interesting place in that it is high level enough that 
 the C++ model doesn't quite make sense, but low level enough that the 
 Java model doesn't quite make sense either.
 
 D may be able to have the best of both worlds.  Define *both* standard 
 concepts *and* standard interfaces that implement those concepts, and 
 then libraries can write either templatized algorithms for speed, or 
 regular function based algorithms for code-size, use with inheritance, etc.

That's kind of what I was thinking, though if iterators all inherit from a common base class then the calls will be virtual regardless, unless the DMD optimizer gets a bit fancier. Sean
Nov 06 2006
prev sibling next sibling parent reply Kirk McDonald <kirklin.mcdonald gmail.com> writes:
Walter Bright wrote:
 It's becoming increasingly obvious that D needs iterators. While opApply 
  is far better for some kinds of iteration (such as recursively 
 traversing a directory), iterators are more efficient in some cases, and 
 allow for things that opApply makes difficult.
 
 So hear are the design goals:
 
 1) Works with dynamic and stat arrays
 2) Doesn't need to work with associative arrays
 3) Should be possible to achieve "pointer efficiency" with it
 4) Needs to be able to restrict lvalue access (what C++ does with const 
 iterators)
 5) Needs to work seamlessly with foreach
 6) Iterators need to be copyable
 
 So here's one design:
 
 .begin property returns an iterator that starts at the beginning
 
 .end returns an iterator that is at the end
 
 Overloading * doesn't work with D. But, instead:
 

What are the arguments against that, again? I do not recall, and it's worth at least reviewing why we don't have opDeref and opDerefAssign. As you know, Walter, a major benefit of C++-style iterators is that they are semantically as close as possible to pointers. Would [] be changed to dereference pointers? I sure hope not. If we go this route, I would suggest adding these unary-* overloads.
 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access
 

I think you mean opSlice and opSliceAssign. Bar bar; auto foo = bar.begin(); Bar baz = foo[]; foo[] = Bar.init;
 Overloading opIndex also will work for random access
 
 foreach loops will not be able to have a 'key' parameter.

So, I would assume these iterators would work something like this: If a class provides .begin and .end, and these both return the same type, and that type provides the requisite iterator methods, it is foreach-able. If a class provides .begin, .end, and opApply, one of the two iteration methods has to take precedence. I would hope it's opApply. For a type T, its associated iterator type should be available via T.iterator. This has to be standard. An iterable type T MUST provide these methods: I begin() I end() An iterator type I MUST provide the following overloads: E opSlice() I opPostInc() E is the type of an element in the collection. These are enough to describe a read-only forward iterator, which is adequate for foreach. In other words, given a type T that meets the requirements outlined above, the following is adequate to iterate through it: foreach(e; t) { // ... } 'e' would be of type E. We run into a curious problem involving pre-increment, which uses opAddAssign. Classes for which random-access is problematic should not be required to provide this, which is essentially a random-access method. There are two options here: 1) Only require that iterators support opPostInc. (This further removes them from actual pointers.) 2) Introduce a convention whereby a non-random-access iterator throws an exception if any value other than 1 is passed to its opAddAssign overload. Extending these rules for bidirectional iterators and random-access iterators, as well as read-write iterators, is left as an exercise to the reader. (And not a very difficult one.) All of these strange edge-cases and differences between pointers and possible iterator types convince me that C++-STYLE ITERATORS ARE NOT FOR D. D, unlike C++, does not supply enough operator overloads for their semantics to be identical to those of pointers, which was the whole point of C++-style iterators in the first place. Instead, we should look to other languages for inspiration. I suggest Java and Python, whose iterator semantics are similar. Other posters have already made suggestions on this front. I would only suggest the "opIter" method for getting an iterator, and that the iterator class itself be available via T.iterator (whether it is a nested class or an alias shouldn't matter). Classes can provide methods for returning alternate iterators, as well. (A class providing opIter should be iterated over as easily as an iterator itself.) -- Kirk McDonald Pyd: Wrapping Python with D http://pyd.dsource.org
Nov 06 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Kirk McDonald wrote:
 Walter Bright wrote:
 Overloading * doesn't work with D. But, instead:


Because D has many cases of implicit dereferencing, I thought it would cause more trouble than it's worth.
 I do not recall, and it's 
 worth at least reviewing why we don't have opDeref and opDerefAssign.
 
 As you know, Walter, a major benefit of C++-style iterators is that they 
 are semantically as close as possible to pointers. Would [] be changed 
 to dereference pointers?

You've always been able to: int* p; p[1] = ...
 I sure hope not. If we go this route, I would 
 suggest adding these unary-* overloads.
 
 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access

I think you mean opSlice and opSliceAssign.

No, I meant Index.
 Overloading opIndex also will work for random access

 foreach loops will not be able to have a 'key' parameter.

So, I would assume these iterators would work something like this: If a class provides .begin and .end, and these both return the same type, and that type provides the requisite iterator methods, it is foreach-able.

Yes, though they will return a value, not a type.
 If a class provides .begin, .end, and opApply, one of the two iteration 
 methods has to take precedence. I would hope it's opApply.

I was leaning towards the other way around.
 For a type T, its associated iterator type should be available via 
 T.iterator. This has to be standard.

It's not needed, as typeof(T.begin) will do it.
 An iterable type T MUST provide these methods:
 I begin()
 I end()

Yes.
 An iterator type I MUST provide the following overloads:
 E opSlice()

No. opIndex()
 I opPostInc()

Probably opAddAssign()
 E is the type of an element in the collection.

 These are enough to describe a read-only forward iterator, which is 
 adequate for foreach. In other words, given a type T that meets the 
 requirements outlined above, the following is adequate to iterate 
 through it:
 
 foreach(e; t) {
     // ...
 }
 
 'e' would be of type E.

Yes.
 We run into a curious problem involving pre-increment, which uses 
 opAddAssign. Classes for which random-access is problematic should not 
 be required to provide this, which is essentially a random-access 
 method. There are two options here:
 
 1) Only require that iterators support opPostInc. (This further removes 
 them from actual pointers.)
 
 2) Introduce a convention whereby a non-random-access iterator throws an 
 exception if any value other than 1 is passed to its opAddAssign overload.

I could go either way with that.
 Extending these rules for bidirectional iterators and random-access 
 iterators, as well as read-write iterators, is left as an exercise to 
 the reader. (And not a very difficult one.)
 
 All of these strange edge-cases and differences between pointers and 
 possible iterator types convince me that C++-STYLE ITERATORS ARE NOT FOR 
 D. D, unlike C++, does not supply enough operator overloads for their 
 semantics to be identical to those of pointers, which was the whole 
 point of C++-style iterators in the first place.

I think it does provide enough, in fact, it will be less wacky than in C++ (look at the wretched iterator/const_iterator dichotomy). It'll work with core arrays, and will not need those typedefs.
 Instead, we should look to other languages for inspiration. I suggest 
 Java and Python, whose iterator semantics are similar. Other posters 
 have already made suggestions on this front. I would only suggest the 
 "opIter" method for getting an iterator, and that the iterator class 
 itself be available via T.iterator (whether it is a nested class or an 
 alias shouldn't matter). Classes can provide methods for returning 
 alternate iterators, as well. (A class providing opIter should be 
 iterated over as easily as an iterator itself.)

Nov 06 2006
parent reply Kirk McDonald <kirklin.mcdonald gmail.com> writes:
Walter Bright wrote:
 Kirk McDonald wrote:
 Walter Bright wrote:
 Overloading * doesn't work with D. But, instead:


Because D has many cases of implicit dereferencing, I thought it would cause more trouble than it's worth.
 I do not recall, and it's worth at least reviewing why we don't have 
 opDeref and opDerefAssign.

 As you know, Walter, a major benefit of C++-style iterators is that 
 they are semantically as close as possible to pointers. Would [] be 
 changed to dereference pointers?

You've always been able to: int* p; p[1] = ...

Wouldn't that be p[0] = ...? Is that how you intend to use opIndex to "dereference" the iterator? That's /terrifying/. Although it is consistent with pointers, using [0] for everything is just asking for trouble with regards to non-random-access iterators. And though it's probably not totally meaningless to a random reader of the code, it comes pretty darned close.
 I sure hope not. If we go this route, I would suggest adding these 
 unary-* overloads.

 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access

I think you mean opSlice and opSliceAssign.

No, I meant Index.
 Overloading opIndex also will work for random access

 foreach loops will not be able to have a 'key' parameter.

So, I would assume these iterators would work something like this: If a class provides .begin and .end, and these both return the same type, and that type provides the requisite iterator methods, it is foreach-able.

Yes, though they will return a value, not a type.

s/these both return the same type/their return types are the same/
 If a class provides .begin, .end, and opApply, one of the two 
 iteration methods has to take precedence. I would hope it's opApply.

I was leaning towards the other way around.

It is slightly easier and clearer to explicitly say: foreach(e; t.begin) { } than foreach(e; &t.opApply) { } opApply is an operator overload. .begin would just be a special method. I would expect the operator to take precedence.
 For a type T, its associated iterator type should be available via 
 T.iterator. This has to be standard.

It's not needed, as typeof(T.begin) will do it.

Due to the way in which D's type inference from properties works, you'll need typeof(T.begin()). Otherwise, it will derive the type of the method itself. (Which isn't even a valid type, as such.)
 An iterable type T MUST provide these methods:
 I begin()
 I end()

Yes.
 An iterator type I MUST provide the following overloads:
 E opSlice()

No. opIndex()

I'm referring specifically to the no-index form of opSlice (meaning i[] would "dereference" the iterator). (A no-index opIndex doesn't work.) This of course is inconsistent with pointers, so saying i[0] is better on that count.
 
 I opPostInc()

Probably opAddAssign()

Oh, there's another one: int opEquals(I)
 E is the type of an element in the collection.

 These are enough to describe a read-only forward iterator, which is 
 adequate for foreach. In other words, given a type T that meets the 
 requirements outlined above, the following is adequate to iterate 
 through it:

 foreach(e; t) {
     // ...
 }

 'e' would be of type E.

Yes.
 We run into a curious problem involving pre-increment, which uses 
 opAddAssign. Classes for which random-access is problematic should not 
 be required to provide this, which is essentially a random-access 
 method. There are two options here:

 1) Only require that iterators support opPostInc. (This further 
 removes them from actual pointers.)

 2) Introduce a convention whereby a non-random-access iterator throws 
 an exception if any value other than 1 is passed to its opAddAssign 
 overload.

I could go either way with that.

Neither one is particularly elegant.
 Extending these rules for bidirectional iterators and random-access 
 iterators, as well as read-write iterators, is left as an exercise to 
 the reader. (And not a very difficult one.)

 All of these strange edge-cases and differences between pointers and 
 possible iterator types convince me that C++-STYLE ITERATORS ARE NOT 
 FOR D. D, unlike C++, does not supply enough operator overloads for 
 their semantics to be identical to those of pointers, which was the 
 whole point of C++-style iterators in the first place.

I think it does provide enough, in fact, it will be less wacky than in C++ (look at the wretched iterator/const_iterator dichotomy). It'll work with core arrays, and will not need those typedefs.

Without the ability to overload the dereference operator, any attempt to write a class that behaves like a pointer in D will be unwieldy, or at least ugly. Admittedly, this isn't much of an issue when just using foreach, but might be an issue with any kind of STL-like algorithms library. (Which is what you're inviting when using C++'s iterator semantics.)
 
 Instead, we should look to other languages for inspiration. I suggest 
 Java and Python, whose iterator semantics are similar. Other posters 
 have already made suggestions on this front. I would only suggest the 
 "opIter" method for getting an iterator, and that the iterator class 
 itself be available via T.iterator (whether it is a nested class or an 
 alias shouldn't matter). Classes can provide methods for returning 
 alternate iterators, as well. (A class providing opIter should be 
 iterated over as easily as an iterator itself.)


-- Kirk McDonald Pyd: Wrapping Python with D http://pyd.dsource.org
Nov 06 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Kirk McDonald wrote:
 Walter Bright wrote:
 You've always been able to:
     int* p;
     p[1] = ...


I must have misunderstood you. I thought you were asking if [n] worked on pointers. Yes, it does.
 Is that how you intend to use opIndex to "dereference" the iterator? 
 That's /terrifying/. Although it is consistent with pointers, using [0] 
 for everything is just asking for trouble with regards to 
 non-random-access iterators. And though it's probably not totally 
 meaningless to a random reader of the code, it comes pretty darned close.

I agree it might be startling at first because it's unusual. Once one is past that, however, is it that bad?
 If a class provides .begin, .end, and opApply, one of the two 
 iteration methods has to take precedence. I would hope it's opApply.

I was leaning towards the other way around.

It is slightly easier and clearer to explicitly say: foreach(e; t.begin) { } than foreach(e; &t.opApply) { } opApply is an operator overload. .begin would just be a special method. I would expect the operator to take precedence.

The: foreach(e; t.begin) { } would not be supported. It would be: foreach (e; t) { } If an aggregate had both, and iterators would be chosen by default, overriding with: foreach (e; &t.opApply) { } would not require any new conventions or syntax. So I think this is the right way to go.
 For a type T, its associated iterator type should be available via 
 T.iterator. This has to be standard.

It's not needed, as typeof(T.begin) will do it.

Due to the way in which D's type inference from properties works, you'll need typeof(T.begin()). Otherwise, it will derive the type of the method itself. (Which isn't even a valid type, as such.)

Yes, you're right.
 No. opIndex()

I'm referring specifically to the no-index form of opSlice (meaning i[] would "dereference" the iterator). (A no-index opIndex doesn't work.) This of course is inconsistent with pointers, so saying i[0] is better on that count.

Ok, I understand now where you were coming from with the slice.
 I opPostInc()

Probably opAddAssign()

Oh, there's another one: int opEquals(I)

Yes, I'd overlooked that.
 I could go either way with that.


True, but that inelegance is hidden away in the library, but the flexibility is probably worth it.
 I think it does provide enough, in fact, it will be less wacky than in 
 C++ (look at the wretched iterator/const_iterator dichotomy). It'll 
 work with core arrays, and will not need those typedefs.

write a class that behaves like a pointer in D will be unwieldy, or at least ugly. Admittedly, this isn't much of an issue when just using foreach, but might be an issue with any kind of STL-like algorithms library. (Which is what you're inviting when using C++'s iterator semantics.)

I still think it isn't necessary, but you might be thinking of a use case I haven't. Can you provide an example?
Nov 06 2006
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Kirk McDonald wrote:

 
 I think it does provide enough, in fact, it will be less wacky than 
 in C++ (look at the wretched iterator/const_iterator dichotomy). 
 It'll work with core arrays, and will not need those typedefs.

to write a class that behaves like a pointer in D will be unwieldy, or at least ugly. Admittedly, this isn't much of an issue when just using foreach, but might be an issue with any kind of STL-like algorithms library. (Which is what you're inviting when using C++'s iterator semantics.)

I still think it isn't necessary, but you might be thinking of a use case I haven't. Can you provide an example?

The only thing that comes to mind is if you get multiple indirections. For instance, in C++ I frequently end up creating things like vectors of lists or lists of vectors. Or vectors of pointers. Or sets of list iterators. Then to actually use the things I need two dereferences. But with built-in GC, you don't really need smart pointers so often. And with "dot-is-all-you-need" that also eliminates many cases for dereferences. But still I'm sure you could come up with situations where to get at the thing pointed to you'd need a string of [0]'s iteriteriter[0][0][0] Actually, in some ways it's an improvement over the prefix deref operator because it reads consistently from left to right. In c++ if you had iterator to vector of iterator and you wanted to deref it you'd need: *((*iterveciter)[i]) Or maybe the inner parens are unnecessary. I can't recall, which is another reason why using postfix for everything is better -- no question about precedence rules. So I'm sold on dereferencing being a postfix operation. But I would still like it better if there were some way to get a compile-time error if I try to do iter[2] // Error! This iterator isn't random access! Would introducing a special symbol be of any use? Like [*]? iter[*] It would just compile to opIndex(0), but it declares the intent "this is dereferencing, not arbitrary indexing". I guess it's kind of pointless if there's no compiler support for enforcing its use. I guess it would only be useful if there were an opDeref. Anyway, whatever you do with dereferencing -- I'm convinced you should absolutely not make it a unary prefix operator. It should be postfix. --bb
Nov 06 2006
prev sibling next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote in message 
news:eio6u5$1j6o$1 digitaldaemon.com...

 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access

Somehow.. foreach(i; foo) { auto x = i[0]; i[0] = x + 1; } You often say that operator overloading should be used only what it should be used for, and taking the index of an iterator means.. what exactly? Would it be nicer to just use a .value property instead? foreach(i; foo) { auto x = i.value; i.value = x + 1; }
Nov 06 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Jarrett Billingsley wrote:
 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:eio6u5$1j6o$1 digitaldaemon.com...
 
 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access

Somehow.. foreach(i; foo) { auto x = i[0]; i[0] = x + 1; } You often say that operator overloading should be used only what it should be used for, and taking the index of an iterator means.. what exactly? Would it be nicer to just use a .value property instead? foreach(i; foo) { auto x = i.value; i.value = x + 1; }

The problem is if the iterator is an actual pointer, there is no .value property for a pointer. Another possibility is to have: *x rewritten to: x[0] if x is not a pointer.
Nov 06 2006
next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote in message 
news:eiosro$28g6$1 digitaldaemon.com...

 The problem is if the iterator is an actual pointer, there is no .value 
 property for a pointer.

 Another possibility is to have:
 *x
 rewritten to:
 x[0]
 if x is not a pointer.

And yet another is to allow overriding of opDeref and opDerefAssign ;) Which would, of course, be the _optimal_ solution, but..
Nov 06 2006
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Jarrett Billingsley wrote:
 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:eio6u5$1j6o$1 digitaldaemon.com...

 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access

Somehow.. foreach(i; foo) { auto x = i[0]; i[0] = x + 1; } You often say that operator overloading should be used only what it should be used for, and taking the index of an iterator means.. what exactly? Would it be nicer to just use a .value property instead? foreach(i; foo) { auto x = i.value; i.value = x + 1; }

The problem is if the iterator is an actual pointer, there is no .value property for a pointer.

Is there really any reason to support pointers as iterators though? C++ libraries even seem to be moving away from that towards more robust and less error-prone iterator objects. Sean
Nov 06 2006
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Sean Kelly wrote:
 Would it be nicer to just use a .value property instead?

 foreach(i; foo)
 {
     auto x = i.value;
     i.value = x + 1;
 } 

The problem is if the iterator is an actual pointer, there is no .value property for a pointer.


Well, they could. It's up to Mr. Compiler Writer if a pointer has a value property or not.
 Is there really any reason to support pointers as iterators though?  C++ 
 libraries even seem to be moving away from that towards more robust and 
 less error-prone iterator objects.

Agreed. I don't think I've ever actually used a raw pointer with a fancy STL algorithm. I was actually really surprised the first time I saw it in the examples on the SGI STL page. Other than those examples, I've never seen it in real code. And that should go double for D, where you rarely even see pointers to begin with. --bb
Nov 06 2006
next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Bill Baxter wrote:
 Sean Kelly wrote:
 The problem is if the iterator is an actual pointer, there is no 
 .value property for a pointer.


Well, they could. It's up to Mr. Compiler Writer if a pointer has a value property or not.

Consider the following struct: struct Foo { int value; } Foo* p; p.value; Is p.value the entire contents of Foo (as it would be for a proposed .value property) or just Foo.value? "value" is a pretty common field name.
Nov 06 2006
next sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Bill Baxter wrote:
 Sean Kelly wrote:
 The problem is if the iterator is an actual pointer, there is no 
 .value property for a pointer.


Well, they could. It's up to Mr. Compiler Writer if a pointer has a value property or not.

Consider the following struct: struct Foo { int value; } Foo* p; p.value; Is p.value the entire contents of Foo (as it would be for a proposed .value property) or just Foo.value? "value" is a pretty common field name.

It's a name conflict. They happen. I'd have trouble if I wanted to have a sizeof member in my struct too. If you want to be sure to get the struct's member then use p[0].value. 'Value' is too common, though. If you were going to go that route it should definitely be something like .iterVal or some other word less likely to conflict. I'm not saying it's the right way to go, just that it could be made to work if needed. --bb
Nov 06 2006
prev sibling parent Charles D Hixson <charleshixsn earthlink.net> writes:
Walter Bright wrote:
 Bill Baxter wrote:
 Sean Kelly wrote:
 The problem is if the iterator is an actual pointer, there is no 
 .value property for a pointer.


Well, they could. It's up to Mr. Compiler Writer if a pointer has a value property or not.

Consider the following struct: struct Foo { int value; } Foo* p; p.value; Is p.value the entire contents of Foo (as it would be for a proposed .value property) or just Foo.value? "value" is a pretty common field name.

Maybe that's why Ada used "`" as a property marker rather than "."? Of course there's the Python approach...naming it __value__. A suitable solution might be to name it "valueof". That's a much less common field name. The problem is that this is something used quite frequently, so one would want it to be short. Perhaps "V"? Then one would write struct Foo { int value; } Foo* p; p.value; p.V; and have two distinct forms. (Yes, V is used occasionally as a variable name...but pretty rarely.)
Nov 07 2006
prev sibling parent Sean Kelly <sean f4.ca> writes:
Bill Baxter wrote:
 Sean Kelly wrote:
 Would it be nicer to just use a .value property instead?

 foreach(i; foo)
 {
     auto x = i.value;
     i.value = x + 1;
 } 

The problem is if the iterator is an actual pointer, there is no .value property for a pointer.


Well, they could. It's up to Mr. Compiler Writer if a pointer has a value property or not.
 Is there really any reason to support pointers as iterators though?  
 C++ libraries even seem to be moving away from that towards more 
 robust and less error-prone iterator objects.

Agreed. I don't think I've ever actually used a raw pointer with a fancy STL algorithm. I was actually really surprised the first time I saw it in the examples on the SGI STL page. Other than those examples, I've never seen it in real code.

I've used it a few times, but generally more when I'm throwing together a quick test than writing actual production code. But since D arrays aren't the same as C++ arrays and we have the option of getting a robust object back (via a.begin() or whatever), why not take it? Sean
Nov 07 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 Is there really any reason to support pointers as iterators though?  C++ 
 libraries even seem to be moving away from that towards more robust and 
 less error-prone iterator objects.

Can you be more specific about what the problems and solutions are?
Nov 06 2006
next sibling parent Georg Wrede <georg.wrede nospam.org> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 
 Is there really any reason to support pointers as iterators though?  
 C++ libraries even seem to be moving away from that towards more 
 robust and less error-prone iterator objects.

Can you be more specific about what the problems and solutions are?

For one thing, if pointers still are supported, then all algorithms have to be written so they can use them. If we skip "real pointer compatibility", then we can decide on smarter iterators. Example: With pointer-compatible iterators, it is impossible to traverse the tree. With smarter iterators one could.
Nov 07 2006
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 Is there really any reason to support pointers as iterators though?  
 C++ libraries even seem to be moving away from that towards more 
 robust and less error-prone iterator objects.

Can you be more specific about what the problems and solutions are?

A direct application of pointers as iterators couldn't be done exactly the C++ way because C++ uses traits templates to describe the operations an iterator supports. And D will not overload templates from different modules, so the user could not provide overloads for his own iterators. This isn't a huge problem however, as we could probably write a set of generic traits templates that apply to classes/structs implemented the 'right' way rather than requiring the user to provide his own. This would be roughly similar to how a user can derive from std::iterator<> to add the required typedefs to his object in C++. More generally though, I find the need to use two iterators to identify a range to be somewhat cumbersome and annoying. And there have been enough others who feel this way that there has been at least an informal proposal for "all in one" iterators for C++. I haven't bothered to check whether it was ever formalized though. Basically, I'm wondering whether it might be easier and more D-like to use iterators for stepping and opIndex for random access, instead of iterators for everything. Since D has slicing, the argument for using iterators to define the boundaries of a range of randomly accessible elements seems kind of small to me. ie. sort( a.begin, a.begin + 5 ); // sort the first 5 elements or sort( a[0 .. 5] ); I find the second to be cleaner to look at. But I'm undecided whether we'd lose any power or flexibility by not bothering with random access iterators. Sean
Nov 07 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 Since D has slicing, the argument for using 
 iterators to define the boundaries of a range of randomly accessible 
 elements seems kind of small to me.  ie.
 
     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
 or
     sort( a[0 .. 5] );
 
 I find the second to be cleaner to look at.  But I'm undecided whether 
 we'd lose any power or flexibility by not bothering with random access 
 iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.
Nov 07 2006
next sibling parent Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 Since D has slicing, the argument for using iterators to define the 
 boundaries of a range of randomly accessible elements seems kind of 
 small to me.  ie.

     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
 or
     sort( a[0 .. 5] );

 I find the second to be cleaner to look at.  But I'm undecided whether 
 we'd lose any power or flexibility by not bothering with random access 
 iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.

I went ahead and wrote everything up during my commute this morning. It's posted separately as "Iterator straw-man." And I apologize for it being a bit longer than expected :-) Sean
Nov 07 2006
prev sibling next sibling parent Benji Smith <dlanguage benjismith.net> writes:
Sean Kelly wrote:
 Since D has slicing, the argument for using iterators to define the 
 boundaries of a range of randomly accessible elements seems kind of 
 small to me.  ie.

     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
 or
     sort( a[0 .. 5] );

 I find the second to be cleaner to look at.  But I'm undecided whether 
 we'd lose any power or flexibility by not bothering with random access 
 iterators.

I agree completely. As someone with a background in Java, C#, and Python (among others), but without significant C++ experience, my concept of an iterator is: "an object with methods for getting the current element, determining whether a 'next' element exists, and for advancing the cursor to the next element". The C++ iterators look very weird (and very very wrong) to me. Search and sort routines should use opIndex, opIndexAssign, and opSlice. If an object doesn't implement opIndex or opIndexAssign, it's not sortable using generic sorting algorithms. Tough luck. Iteration, though, is completely distinct from sorting and searching, and should use a different mechanism. Walter Bright wrote:
 Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
 That definitely sounds like a good area to explore.

Perhaps an iterator always steps through an array (or an object implementing a List or Iterable interface), and in order for an object to be foreach-able, you'll need to first transform an object into an array (or construct a List or an Iterable wrapping the object). Arrays could still get their own special-case implementation of foreach, avoiding interfaces for maximum speed. And, if a class supports opIndex, then you could avoid making a virtual method call through the Iterable interface. But I think it's also desirable, in many many cases, to implement an Iterator class, without supporting opIndex. The only methods in the iterator would be hasNext() and next(). Foreach should support real arrays, classes with opIndex, and Java-style iterators. Searching and sorting should only use opIndex, opIndex, opCmp, and opEquals. Iterators should only be used for iteration. Anyhoo, those are my two cents. --Benji Smith
Nov 07 2006
prev sibling next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 Since D has slicing, the argument for using iterators to define the 
 boundaries of a range of randomly accessible elements seems kind of 
 small to me.  ie.

     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
 or
     sort( a[0 .. 5] );

 I find the second to be cleaner to look at.  But I'm undecided whether 
 we'd lose any power or flexibility by not bothering with random access 
 iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.

OOh, I like the direction this is heading. Sean I don't have time to read the straw-man right now but that definitely solves my recursive mergesort question. And Benji's comment about simple iterators and ranges being distinct things that shouldn't be lumped together also seems on target. If iterators don't have to be ranges then it's simple to see how to support iterators over infinite or circular containers. They just never stop spitting out 'next()'s But how do you handle generically pointing to an element then? Like you have with an iterator to a node in linked list. I guess you can have iterators with hasPrev() getPrev() (or --) type methods too. --bb
Nov 07 2006
parent Ary Manzana <ary esperanto.org.ar> writes:
Bill Baxter escribió:
 But how do you handle generically pointing to an element then?  Like you 
 have with an iterator to a node in linked list.  I guess you can have 
 iterators with hasPrev() getPrev() (or --) type methods too.
 
 --bb

In fact, Java has them: http://java.sun.com/j2se/1.5.0/docs/api/java/util/ListIterator.html
Nov 07 2006
prev sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 Since D has slicing, the argument for using iterators to define the
 boundaries of a range of randomly accessible elements seems kind of
 small to me.  ie.

     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
 or
     sort( a[0 .. 5] );

 I find the second to be cleaner to look at.  But I'm undecided whether
 we'd lose any power or flexibility by not bothering with random access
 iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*? Things that need random access override opIndex and opIndexAssign. Things which you can take a range of values from override opSlice and opSliceAssign. Things that require lock-step iteration override opApply, or supply a function that returns a delegate that "runs" a foreach loop. About the only thing this *doesn't* cover are bi-directional iterators (say, iterating back and forth over an infinite series). I know some people want to be able to stop iteration and resume it, but couldn't that be solved by making opApply return a generator like what Python does? Those things are written the same way as a current opApply (sans all the return value from the delegate stuff), and can be advanced manually. I know this can be done. I wrote the coroutine module for StackThreads a while back, and one of the *very first* things I did was implement generic generators. The base generator class essentially just implemented opApply switching over to the coroutine every time it needed the next value. Here's a port of Python's range() function (well, the one and zero argument versions :P):
 class
 range : Generator!(uint)
 {
     // These just let you use range() or range(n)
     static
     range
     opCall()
     {
         return range(uint.max);
     }

     static
     range
     opCall(uint limit)
     {
         return new range(limit);
     }

 protected:
     uint limit;

     // Or you can use new range(n)
     this(uint limit)
     {
         this.limit = limit;
         super();
     }

     // Where the magic happens.
     override
     uint
     run()
     {
         uint i = 0;
         while( i < limit )
             yield(i++);    // yield in D: *very* sexy

         StopGenerator();   // throws an exception, imitating what
                            // Python does to stop an iterator.
     }
 }

Now, from that, we can implement opApply AND next/current in the base class. Going with coroutines kills both the opApply and forward iterator birds with one stone. As for bi-directional iterators... I actually can't think of anywhere I'd use them. I just keep thinking "I could just use random access to the same end". Python's never suffered from not having them... Anyway, just some random thoughts. Haven't had coffee yet :3 -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Nov 07 2006
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Daniel Keep wrote:
 
 Walter Bright wrote:
 Sean Kelly wrote:
 Since D has slicing, the argument for using iterators to define the
 boundaries of a range of randomly accessible elements seems kind of
 small to me.  ie.

     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
 or
     sort( a[0 .. 5] );

 I find the second to be cleaner to look at.  But I'm undecided whether
 we'd lose any power or flexibility by not bothering with random access
 iterators.

That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?

Close, but right now we don't have a good way to iterate over multiple things at once. for(i,j; i.hasNext()&&j.hasNext(); i++,j++) { // do something with current values of i and j } Or as you say to stop iteration or a generic way to return a pointer to a particular spot in a container.
 About the only thing this *doesn't* cover are bi-directional iterators
 (say, iterating back and forth over an infinite series).

 [intersting stuff about generators and D]

If generators can handle the above cases *AND* do it with code that simple to create and use *AND* do all that as efficiently as for loops, then it sounds great to me. My impression was that any sort of opApply/coroutine thing in D was not going to be so efficient. --bb
Nov 07 2006
parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Bill Baxter wrote:
 Daniel Keep wrote:
 Walter Bright wrote:
 Sean Kelly wrote:
 Since D has slicing, the argument for using iterators to define the
 boundaries of a range of randomly accessible elements seems kind of
 small to me.  ie.

     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
 or
     sort( a[0 .. 5] );

 I find the second to be cleaner to look at.  But I'm undecided whether
 we'd lose any power or flexibility by not bothering with random access
 iterators.

That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?

Close, but right now we don't have a good way to iterate over multiple things at once. for(i,j; i.hasNext()&&j.hasNext(); i++,j++) { // do something with current values of i and j } Or as you say to stop iteration or a generic way to return a pointer to a particular spot in a container.
 About the only thing this *doesn't* cover are bi-directional iterators
 (say, iterating back and forth over an infinite series).

 [intersting stuff about generators and D]

If generators can handle the above cases *AND* do it with code that simple to create and use *AND* do all that as efficiently as for loops, then it sounds great to me. My impression was that any sort of opApply/coroutine thing in D was not going to be so efficient. --bb

Well, as I said, once you have your sequence expressed as a coroutine, writing hasNext() and getCurrent() is trivial. Once you have that, iterating multiple things in lockstep is also trivial:
 class zip : Generator!(Tuple!(T1,T2))
 {
     // ...
     Generator!(T1) seq1;
     Generator!(T2) seq2;
     // ...
     void run()
     {
         while( seq1.hasNext && seq2.hasNext )
             yield(new Tuple!(T1,T2)(seq1.next,seq2.next));
     }
     // ...
 }

Or something to that effect :P As for efficiency, I'm not sure how to go about testing that so that the results mean anything. I'm sure my current implementation could be made faster, but I don't know how. Just for loose comparison, the game EVE Online has the record for the most number of people simultaneously online on a single server at once. And it's servers are built using coroutines running in *Python*. I suppose the bottleneck (if it is one) is how fast you can do user-space context switches. Which is basically a function call, moving the base of the stack, and switching over stuff like the exception handling registers. Or somesuch... Mikola Lysenko wrote StackThreads; my attempt only mostly worked :P -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Nov 07 2006
parent Sean Kelly <sean f4.ca> writes:
Daniel Keep wrote:
 
 I suppose the bottleneck (if it is one) is how fast you can do
 user-space context switches.  Which is basically a function call, moving
 the base of the stack, and switching over stuff like the exception
 handling registers.  Or somesuch... Mikola Lysenko wrote StackThreads;
 my attempt only mostly worked :P

That's pretty much it. Saving the current register data, loading the other register data, and jumping to a new instruction address. There's no way this can be as efficient as an integer increment instruction (for loop iteration) but it's orders of magnitude faster than a kernel thread context switch. Sean
Nov 08 2006
prev sibling next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Daniel Keep wrote:
 Things that need random access override opIndex and opIndexAssign.
 
 Things which you can take a range of values from override opSlice and
 opSliceAssign.
 
 Things that require lock-step iteration override opApply, or supply a
 function that returns a delegate that "runs" a foreach loop.
 
 About the only thing this *doesn't* cover are bi-directional iterators
 (say, iterating back and forth over an infinite series).

What about containers where there are multiple valid orders of iteration? For a binary tree class, I might want to perform iteration using depth-first, breadth-first, in-order, or reverse-order traversals. How does the opApply/opApplyReverse solution address these needs? And isn't a delegate-call about as computationally expensive as a virtual method call (in the case of an Iterable iterface)? If the delegate solution is really better than an Iterable interface, why create special cases for opApply and opApplyReverse? Why not just say that foreach always requires an iteration delegate? --benji
Nov 08 2006
parent Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Benji Smith wrote:
 Daniel Keep wrote:
 
 Things that need random access override opIndex and opIndexAssign.

 Things which you can take a range of values from override opSlice and
 opSliceAssign.

 Things that require lock-step iteration override opApply, or supply a
 function that returns a delegate that "runs" a foreach loop.

 About the only thing this *doesn't* cover are bi-directional iterators
 (say, iterating back and forth over an infinite series).

What about containers where there are multiple valid orders of iteration? For a binary tree class, I might want to perform iteration using depth-first, breadth-first, in-order, or reverse-order traversals. How does the opApply/opApplyReverse solution address these needs? And isn't a delegate-call about as computationally expensive as a virtual method call (in the case of an Iterable iterface)?

It should be roughly equivelant in cost to explicitly calling the method, meaning if the method is virtual then you will get that cost, and if it is final then it is like any function call. Since Iterable methods will very likely almost always be virtual, there is at least a chance of delegates being a little cheaper /some of the time/ -- when they can be pointers to final methods.
 If the delegate solution is really better than an Iterable interface, 
 why create special cases for opApply and opApplyReverse? Why not just 
 say that foreach always requires an iteration delegate?
 
 --benji

Well, when foreach(;) was first added to D, I don't think anyone had thought about delegates-as-aggregates yet. In fact, it was just added a couple of versions back. So I suppose in a sense, opApply*Reverse is just legacy -- though it makes it straight-forward to iterate simple, linear-only collections. I suppose it would be safe enough to do away with opApply*Reverse now, but I don't think it hurts anything to leave them be, either. -- Chris Nicholson-Sauls
Nov 08 2006
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Daniel Keep wrote:
 
 Walter Bright wrote:
 Sean Kelly wrote:
 Since D has slicing, the argument for using iterators to define the
 boundaries of a range of randomly accessible elements seems kind of
 small to me.  ie.

     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
 or
     sort( a[0 .. 5] );

 I find the second to be cleaner to look at.  But I'm undecided whether
 we'd lose any power or flexibility by not bothering with random access
 iterators.

That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?

Essentially. But D has no formal description of an iterator type. I think it would be a useful convention to establish, and doing so requires determining what types of iterators should be supported, and how.
 Things that need random access override opIndex and opIndexAssign.
 
 Things which you can take a range of values from override opSlice and
 opSliceAssign.
 
 Things that require lock-step iteration override opApply, or supply a
 function that returns a delegate that "runs" a foreach loop.
 
 About the only thing this *doesn't* cover are bi-directional iterators
 (say, iterating back and forth over an infinite series).

This wouldn't be too difficult to do by adding a prev() method and atBegin() or some such. Or maybe the Java style of hasNext(), next(), hasPrev(), prev().
 I know some people want to be able to stop iteration and resume it, but
 couldn't that be solved by making opApply return a generator like what
 Python does?  Those things are written the same way as a current opApply
 (sans all the return value from the delegate stuff), and can be advanced
 manually.
 
 I know this can be done.  I wrote the coroutine module for StackThreads
 a while back, and one of the *very first* things I did was implement
 generic generators.  The base generator class essentially just
 implemented opApply switching over to the coroutine every time it needed
 the next value.  Here's a port of Python's range() function (well, the
 one and zero argument versions :P):
 
 class
 range : Generator!(uint)
 {
     // These just let you use range() or range(n)
     static
     range
     opCall()
     {
         return range(uint.max);
     }

     static
     range
     opCall(uint limit)
     {
         return new range(limit);
     }

 protected:
     uint limit;

     // Or you can use new range(n)
     this(uint limit)
     {
         this.limit = limit;
         super();
     }

     // Where the magic happens.
     override
     uint
     run()
     {
         uint i = 0;
         while( i < limit )
             yield(i++);    // yield in D: *very* sexy

         StopGenerator();   // throws an exception, imitating what
                            // Python does to stop an iterator.
     }
 }

Now, from that, we can implement opApply AND next/current in the base class. Going with coroutines kills both the opApply and forward iterator birds with one stone.

Interesting idea--I'll admit I don't have much experience with Python. But I think both approaches have value. Using coroutines just to step across two sequences simultaneously seems unnecessarily complex, and not terribly efficient compared to a simple pointer increment. But it does offer a tremendous amount of power for more complex cases.
 As for bi-directional iterators... I actually can't think of anywhere
 I'd use them.  I just keep thinking "I could just use random access to
 the same end".  Python's never suffered from not having them...

I don't think I've ever needed them either, but they can come in handy when implementing one container in terms of another. Say an iterable stack and queue (unlikely, I know) that are implemented via a doubly linked-list. Sean
Nov 08 2006
parent Bill Baxter <wbaxter gmail.com> writes:
Sean Kelly wrote:
 Daniel Keep wrote:
 
 Walter Bright wrote:

 Sean Kelly wrote:

 Since D has slicing, the argument for using iterators to define the
 boundaries of a range of randomly accessible elements seems kind of
 small to me.  ie.

     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
 or
     sort( a[0 .. 5] );

 I find the second to be cleaner to look at.  But I'm undecided whether
 we'd lose any power or flexibility by not bothering with random access
 iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?

Essentially. But D has no formal description of an iterator type. I think it would be a useful convention to establish, and doing so requires determining what types of iterators should be supported, and how.
 Things that need random access override opIndex and opIndexAssign.

 Things which you can take a range of values from override opSlice and
 opSliceAssign.

 Things that require lock-step iteration override opApply, or supply a
 function that returns a delegate that "runs" a foreach loop.

 About the only thing this *doesn't* cover are bi-directional iterators
 (say, iterating back and forth over an infinite series).

This wouldn't be too difficult to do by adding a prev() method and atBegin() or some such. Or maybe the Java style of hasNext(), next(), hasPrev(), prev().
 I know some people want to be able to stop iteration and resume it, but
 couldn't that be solved by making opApply return a generator like what
 Python does?  Those things are written the same way as a current opApply
 (sans all the return value from the delegate stuff), and can be advanced
 manually.

 I know this can be done.  I wrote the coroutine module for StackThreads
 a while back, and one of the *very first* things I did was implement
 generic generators.  The base generator class essentially just
 implemented opApply switching over to the coroutine every time it needed
 the next value.  Here's a port of Python's range() function (well, the
 one and zero argument versions :P):

 class
 range : Generator!(uint)
 {
     // These just let you use range() or range(n)
     static
     range
     opCall()
     {
         return range(uint.max);
     }

     static
     range
     opCall(uint limit)
     {
         return new range(limit);
     }

 protected:
     uint limit;

     // Or you can use new range(n)
     this(uint limit)
     {
         this.limit = limit;
         super();
     }

     // Where the magic happens.
     override
     uint
     run()
     {
         uint i = 0;
         while( i < limit )
             yield(i++);    // yield in D: *very* sexy

         StopGenerator();   // throws an exception, imitating what
                            // Python does to stop an iterator.
     }
 }

Now, from that, we can implement opApply AND next/current in the base class. Going with coroutines kills both the opApply and forward iterator birds with one stone.

Interesting idea--I'll admit I don't have much experience with Python. But I think both approaches have value. Using coroutines just to step across two sequences simultaneously seems unnecessarily complex, and not terribly efficient compared to a simple pointer increment. But it does offer a tremendous amount of power for more complex cases.
 As for bi-directional iterators... I actually can't think of anywhere
 I'd use them.  I just keep thinking "I could just use random access to
 the same end".  Python's never suffered from not having them...

I don't think I've ever needed them either, but they can come in handy when implementing one container in terms of another. Say an iterable stack and queue (unlikely, I know) that are implemented via a doubly linked-list.

I wrote some mesh (graph) manipulation algorithms in Python. I really needed a linked list for that and a way to point to e.g. an edge in a list of edges coming out of a vertex that wouldn't be invalidated by inserting edges here and there before or after it, and I needed both to find the previous edge and the following edge. In C++ std::list would have been the clear choice. Doing that with indexes into to a standard python list just isn't practical (I started out trying it that way because Python FAQ's seem to say "you don't need a linked list, just use a python list". But with a python list, every time you insert one new item you have to go find everyone out there that's holding onto an index into this list and get them to update their values. Just not practical. So sometimes bidirectional iterators are needed. Even in Python. --bb
Nov 08 2006
prev sibling next sibling parent reply =?iso-8859-1?q?Knud_S=F8rensen?= <12tkvvb02 sneakemail.com> writes:
On Mon, 06 Nov 2006 12:46:01 -0800, Walter Bright wrote:

 It's becoming increasingly obvious that D needs iterators. While opApply 
   is far better for some kinds of iteration (such as recursively 
 traversing a directory), iterators are more efficient in some cases, and 
 allow for things that opApply makes difficult.
 

What are those cases ? Maybe we can find a way to fix the problem with opApply.
Nov 06 2006
next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Knud Sørensen wrote:
 On Mon, 06 Nov 2006 12:46:01 -0800, Walter Bright wrote:
 
 It's becoming increasingly obvious that D needs iterators. While opApply 
   is far better for some kinds of iteration (such as recursively 
 traversing a directory), iterators are more efficient in some cases, and 
 allow for things that opApply makes difficult.

What are those cases ? Maybe we can find a way to fix the problem with opApply.

One such case is the usefulness of being able to provide an input iterator to a parsing function, which may itself pass the iterator off to other parsing functions.
Nov 06 2006
next sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Knud Sørensen wrote:
 On Mon, 06 Nov 2006 12:46:01 -0800, Walter Bright wrote:

 It's becoming increasingly obvious that D needs iterators. While 
 opApply   is far better for some kinds of iteration (such as 
 recursively traversing a directory), iterators are more efficient in 
 some cases, and allow for things that opApply makes difficult.

What are those cases ? Maybe we can find a way to fix the problem with opApply.

One such case is the usefulness of being able to provide an input iterator to a parsing function, which may itself pass the iterator off to other parsing functions.

Another case is the desire to iterate over multiple containers at the same time, say as you would do in a sorted list merge operation. opApply could handle this one if D had real coroutines (i.e. multiple opApply's could be running simultaneously) Another case is the basic desire to hold onto a pointer to a container for later use. For instance if you have an STL style linked list container where the implementation (the actual nodes) are hidden from the user. You still want the user to be able to keep something that functions like a pointer to a particular insertion point in the list (and be able to use that in generic sequence manipulation algorithms). --bb
Nov 06 2006
prev sibling parent KlausO <oberhofer users.sf.net> writes:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Walter Bright wrote:
 
 One such case is the usefulness of being able to provide an input 
 iterator to a parsing function, which may itself pass the iterator off 
 to other parsing functions.

Some questions: Is your data structure a graph or a tree ? What does your iterator iterating ? I ask because I currently investigate the visitor pattern to traverse graph like data structures (see attached code for an example).
Nov 07 2006
prev sibling parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Knud Sørensen wrote:
 On Mon, 06 Nov 2006 12:46:01 -0800, Walter Bright wrote:
 
 
It's becoming increasingly obvious that D needs iterators. While opApply 
  is far better for some kinds of iteration (such as recursively 
traversing a directory), iterators are more efficient in some cases, and 
allow for things that opApply makes difficult.

What are those cases ? Maybe we can find a way to fix the problem with opApply.

Mostly agreed. For example, one case I seem to find mentioned a few times is providing depth-first versus breadth-first iteration on a tree -- supposedly this is hard to do with foreach/opApply. While I do agree that opApply is not optimal (because you can't tell opApply which way you want to go) I thought this was part of the utility of the recent delegates-as-aggregates feature? The following is just off the very top of my head, so it might not be ideal/perfect, but I do believe it provides this ability, using current D: <code> struct Tree(int B = 2, T) { version (Tree_NoWarnings) {} else { static if (B == 0) { static assert(false, "A tree with zero branches is just useless."); } else static if (B == 1) { static assert(false, "A tree with only one branch may as well be an array."); } } Tree*[B] children ; T value ; int walkDepth (int delegate(inout Tree!(T)) dlg) { int result ; foreach (x; children) { if (x && result = x.walkDepth(dlg)) { goto end; } } if (result = dlg(this)) { goto end; } end: return result; } int walkBreadth (int delegate(inout Tree!(T)) dlg) { int result ; Tree!(T)*[] gen = [this] , next ; while (gen.length) { foreach (x; gen) { if (result = dlg(x.value)) { goto end; } foreach (y; x.children) { if (y) { next ~= y; } } } gen = next.dup; next.length = 0; } end: return result; } } Tree!(2, int) binary_int_tree ; // ... foreach (node; &binary_int_tree.walkDepth) { // ... } foreach (node; &binary_int_tree.walkBreadth) { // ... } </code> Unless I'm missing something entirely, I don't see any major utility in this case for having a seperate Iterator entity... -- Chris Nicholson-Sauls
Nov 07 2006
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Chris Nicholson-Sauls wrote:
 Knud Sørensen wrote:
 On Mon, 06 Nov 2006 12:46:01 -0800, Walter Bright wrote:


 It's becoming increasingly obvious that D needs iterators. While 
 opApply  is far better for some kinds of iteration (such as 
 recursively traversing a directory), iterators are more efficient in 
 some cases, and allow for things that opApply makes difficult.

What are those cases ? Maybe we can find a way to fix the problem with opApply.

Mostly agreed. For example, one case I seem to find mentioned a few times is providing depth-first versus breadth-first iteration on a tree -- supposedly this is hard to do with foreach/opApply.

Other way around. Tree traversal is harder to do with C++ style iterators, easy to do with opApply. I assume your code is correct but didn't look at it too closely. --bb
Nov 07 2006
prev sibling next sibling parent "Craig Black" <cblack ara.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote in message 
news:eio6u5$1j6o$1 digitaldaemon.com...
 It's becoming increasingly obvious that D needs iterators. While opApply 
 is far better for some kinds of iteration (such as recursively traversing 
 a directory), iterators are more efficient in some cases, and allow for 
 things that opApply makes difficult.

 So hear are the design goals:

 1) Works with dynamic and stat arrays
 2) Doesn't need to work with associative arrays
 3) Should be possible to achieve "pointer efficiency" with it
 4) Needs to be able to restrict lvalue access (what C++ does with const 
 iterators)
 5) Needs to work seamlessly with foreach
 6) Iterators need to be copyable

 So here's one design:

 .being property returns an iterator that starts at the beginning

 .end returns an iterator that is at the end

 Overloading * doesn't work with D. But, instead:

 Overload opIndex for rvalue access
 Overload opIndexAssign for lvalue access

 Overloading opIndex also will work for random access

 foreach loops will not be able to have a 'key' parameter.

It's not a huge problem, but passing an argument needlessly is both ugly and slightly inefficient. Perhaps opIndex and opIndexAssign could be overloaded without any parameters. -Craig
Nov 07 2006
prev sibling next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Walter Bright wrote:
 It's becoming increasingly obvious that D needs iterators. While opApply 
  is far better for some kinds of iteration (such as recursively 
 traversing a directory), iterators are more efficient in some cases, and 
 allow for things that opApply makes difficult.
 
 So hear are the design goals:
 
 1) Works with dynamic and stat arrays
 2) Doesn't need to work with associative arrays
 3) Should be possible to achieve "pointer efficiency" with it
 4) Needs to be able to restrict lvalue access (what C++ does with const 
 iterators)
 5) Needs to work seamlessly with foreach
 6) Iterators need to be copyable

I apologize for this long post. Writing too much has become a bad habit of mine lately it seems. This took a while writing, and in the meantime new posts appeared that say much of the same. If you don't bother reading this, read Sean's posts instead, as I agree 100 % with him on this. :) What I suggest is that we should not try to copy the C++ "iterator == pointer" idiom, but instead aim for an iterator design that is self-contained and simple. For me, the iterator concept is simple. An iterator is an entity that allows an iterative way of accessing data. An iterator needs: 1. a way to reference the data 2. a way to traverse the data 3. a way to signal the end of the data (In C++, many iterators lack #3 above, which leads to the necessity of a second iterator that just signals the end of the iterative range.) An iterator could also be a generator. Here are some examples (translated to D) of how iterators are implemented in some languages other than C++: Java ---- class Iterator(T) { bool hasNext(); // self explanatory T* next(); // steps and returns reference to next element } while(i.hasNext()) writefln(*i.next()); C# (Simplified, from David Gileadi) ----------------------- class Iterator(T) { bool MoveNext(); // false if no more elements T Current(); // property getter } while(i.MoveNext) writefln(i.Current); Python ------ class Iterator(T) { T next(); } while(true) try writefln(i.next); catch (StopIteration) break; Taking inspiration from the above and following Walter's design goals above, here is my suggestion: interface Iterator(T) { bool step(); // as C# MoveNext T value; // getter (rvalue access) T value(T); // setter (lvalue access) int opApply(...) // support foreach style iteration } An alternative is supplying a value property that returns a pointer to the element, but that would not allow limiting lvalue access for read only iterators and introduces a pointer, which might be bad by itself. Usage: // explicit iteration while(i.step) { writefln(i.value); i.value = 5; } // implicit iteration foreach(o; i) writefln(o); I will briefly explain my reasoning: As I said, I believe the iterator should be clean and simple. Saying that it is OK for the iterator implementations to be dirty and messy, but that the implementations will be hidden away in libraries and only be the headache of library writers will lead to just that - only library writers will write iterators. As Sean hints at, "random access iterators" are not pure iterators. The random access part is orthogonal to the iterative part. In D, a perfect way to implement a random access view is defining a .length property and an opIndex method. A random access view doesn't need to be iterable and an iterator doesn't need to provide random access. A D slice is the perfect example of a random access view. I see no reason to use operator overloading when the result isn't replaceable with a pointer anyway. Bill Baxter's points regarding iterator as a concept vs an interface/abstract class are very valid. And as Bill says, maybe both should be supported. One of them could always be converted to the other. Regarding efficiency, here is one sample implementation that a compiler should(*) be able to make just as efficient as for/foreach loops: struct Iterator(T) { T* ptr; T* end; bool step() { return ++ptr != end; } T value() { return *ptr; } T value(T x) { return *ptr = x; } // mixin OpApplyMixin; } Iterator!(T) iter(T)(T[] x) { Iterator!(T) ret;// = {x.ptr-1, x.ptr+x.length}; // ICE ret.ptr = x.ptr-1; ret.end = x.ptr+x.length; return ret; } void main() { int[] arr = [1,2,3,4,5]; auto i = arr.iter(); while(i.step) { writefln("%s",i.value); i.value = 5; } } /Oskar
Nov 07 2006
next sibling parent nazo <lovesyao gmail.com> writes:
interface Iterator(T) {
    bool step();     // as C# MoveNext
    T value;         // getter (rvalue access)
    T value(T);      // setter (lvalue access)
    int opApply(...) // support foreach style iteration
}

interface Iterator(T) { bool opNext(); // as C# MoveNext. // also support in pointer with opBack(). bool opNext(int); // speed hack T opValue; // getter (rvalue access). also support in pointer T opValue(T); // setter (lvalue access). also support in pointer int opApply(...) // support foreach style iteration } while(iter.next)//call opNext() writefln(*iter);//call opValue()
Nov 07 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Oskar Linde wrote:
 As Sean hints at, "random access iterators" are not pure iterators. The 
 random access part is orthogonal to the iterative part. In D, a perfect 
 way to implement a random access view is defining a .length property and 
 an opIndex method. A random access view doesn't need to be iterable and 
 an iterator doesn't need to provide random access. A D slice is the 
 perfect example of a random access view.

You two might be right.
 I see no reason to use operator overloading when the result isn't 
 replaceable with a pointer anyway.

Right.
 Bill Baxter's points regarding iterator as a concept vs an 
 interface/abstract class are very valid. And as Bill says, maybe both 
 should be supported. One of them could always be converted to the other.

Yes.
 Regarding efficiency, here is one sample implementation that a compiler 
 should(*) be able to make just as efficient as for/foreach loops:
 
 struct Iterator(T) {
     T* ptr;
     T* end;
 
     bool step() { return ++ptr != end; }
     T value() { return *ptr; }
     T value(T x) { return *ptr = x; }
     // mixin OpApplyMixin;
 }
 
 Iterator!(T) iter(T)(T[] x) {
     Iterator!(T) ret;// = {x.ptr-1, x.ptr+x.length}; // ICE
     ret.ptr = x.ptr-1;
     ret.end = x.ptr+x.length;
     return ret;
 }
 
 void main() {
     int[] arr = [1,2,3,4,5];
 
     auto i = arr.iter();
 
     while(i.step) {
         writefln("%s",i.value);
         i.value = 5;
     }
 }

It's good, but the pointer-before-the-start is problematic. This can cause problems on some architectures. Pointing past the end is explicitly supported, but before the beginning is not. I'd also prefer the step() get 'stuck' at the end, instead of going past it if it gets called again (this makes sense for input iterators). So I suggest instead: for (auto i = arr.iter(); !i.atEnd(); i.step()) ... in other words, separate out the 'end' test from the 'step' operation.
Nov 07 2006
parent Dave <Dave_member pathlink.com> writes:
Walter Bright wrote:
 struct Iterator(T) {
     T* ptr;
     T* end;

     bool step() { return ++ptr != end; }
     T value() { return *ptr; }
     T value(T x) { return *ptr = x; }
     // mixin OpApplyMixin;
 }

 Iterator!(T) iter(T)(T[] x) {
     Iterator!(T) ret;// = {x.ptr-1, x.ptr+x.length}; // ICE
     ret.ptr = x.ptr-1;
     ret.end = x.ptr+x.length;
     return ret;
 }

 void main() {
     int[] arr = [1,2,3,4,5];

     auto i = arr.iter();

     while(i.step) {
         writefln("%s",i.value);
         i.value = 5;
     }
 }

It's good, but the pointer-before-the-start is problematic. This can cause problems on some architectures. Pointing past the end is explicitly supported, but before the beginning is not. I'd also prefer the step() get 'stuck' at the end, instead of going past it if it gets called again (this makes sense for input iterators). So I suggest instead: for (auto i = arr.iter(); !i.atEnd(); i.step()) ... in other words, separate out the 'end' test from the 'step' operation.

For arrays, we already have arr.ptr, what if array.end was simply added? auto slice = array[50..100]; for(auto i = slice.ptr; i != slice.end; i++) { ... } Then for UDT custom iterators, ptr, end and op[Pre|Post][Inc|Dec] could be overloaded? Just a thought..
Nov 12 2006
prev sibling parent reply Burton Radons <burton-radons smocky.com> writes:
I don't like C++ iterators; they're a very old design made to fit a 
language that was unmodifiable. They can't work with destructive or 
unbound iterators (well, they can, but you just gotta hope to hell the 
user doesn't use the interface in an unusual way), they can't work with 
associative arrays or tree structures without allocation (or wasting 
space in the structure for parent node pointers), and they're VERY 
verbose and distributed. Plus they allow iterating to an illegal index 
(the end index), which requires an additional state value for some 
iterables.

Python's generators are a far better solution, like:

    void i_range (T) (T from, T to, T step = cast (T) 1, yield T index)
    {
        for (index = from; index < to; index += step)
            yield;
    }

    foreach (i; i_range (0, 100)) ...

Or:

    yield T i_range (T) (T from, T to, T step = cast (T) 1)
    {
        T index;

        for (index = from; index < to; index += step)
            yield index;
    }

You've been meaning to allow frame pointers for proper closures anyway, 
right? :P

The implementation in iterators using the simplest interface I can think 
of would be something like:

    struct IRange (T, T from, T to, T step = cast (T) 1)
    {
        T index = from;

        static IRange opCall (T value)
        {
            IRange result;

            result.index = value;
            return result;
        }

        T opThis ()
        {
            return index;
        }

        IRange opNext ()
        {
            assert (index <= end);
            return opCall (index + 1);
        }

        IRange opEnd ()
        {
            return opCall (to);
        }
    }

    foreach (i; IRange! (int, 0, 100)) ...

Notice how the SINGLE line indicating how the arguments are interpreted 
are hidden within a whole bunch of boilerplate.
Nov 10 2006
parent Walter Bright <newshound digitalmars.com> writes:
As nice as yield is for some types of problems, they aren't going to 
happen in D anytime soon, because they're a lot of tricky work to implement.
Nov 10 2006