www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Ranges and/versus iterators

reply clueless bystander <clueless bystander.com> writes:
Watching D evolve from the outside there seems to be a lot of ongoing discussion
on this newsgroup about the D range idiom which is somehow opposed to
conventional
thinking about iterators.

Can someone please explain in plain words just exactly what a range is and how
it differs from the iterator concept (if that's appropriate???) and what are
the benefits
from a data modeling and access perspective.

Sure, I'm clueless, though suspect many other bystanders would appreciate a
succinct heads-up.

Thanks,
clueless bystander
Mar 23 2010
next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
clueless bystander wrote:
 Watching D evolve from the outside there seems to be a lot of ongoing
discussion
 on this newsgroup about the D range idiom which is somehow opposed to
conventional
 thinking about iterators.
 
 Can someone please explain in plain words just exactly what a range is and how
 it differs from the iterator concept (if that's appropriate???) and what are
the benefits
 from a data modeling and access perspective.
 
 Sure, I'm clueless, though suspect many other bystanders would appreciate a
 succinct heads-up.
 
 Thanks,
 clueless bystander
I'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Mar 23 2010
next sibling parent clueless bystander <clueless bystander.com> writes:
Lars T. Kyllingstad Wrote:

 clueless bystander wrote:
 Watching D evolve from the outside there seems to be a lot of ongoing
discussion
 on this newsgroup about the D range idiom which is somehow opposed to
conventional
 thinking about iterators.
 
 Can someone please explain in plain words just exactly what a range is and how
 it differs from the iterator concept (if that's appropriate???) and what are
the benefits
 from a data modeling and access perspective.
 
 Sure, I'm clueless, though suspect many other bystanders would appreciate a
 succinct heads-up.
 
 Thanks,
 clueless bystander
I'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Thanks Lars. I'm not quite willing to accept that 15 pages is succinct but if that's what it takes to build up the background then that's what it takes. I'm up to page 3 now and, btw, like the way the author *does not* mince his words: "Such matters as a polynomial slowdown were too mundane to hinder the power of S-lists, so some functional programmers got imbued with an attitude of contempt toward arrays and associative arrays, data structures essential to many algorithms." c.b.
Mar 23 2010
prev sibling parent reply clueless bystander <clueless bystander.com> writes:
Lars T. Kyllingstad Wrote:

 clueless bystander wrote:
 Watching D evolve from the outside there seems to be a lot of ongoing
discussion
 on this newsgroup about the D range idiom which is somehow opposed to
conventional
 thinking about iterators.
 
 Can someone please explain in plain words just exactly what a range is and how
 it differs from the iterator concept (if that's appropriate???) and what are
the benefits
 from a data modeling and access perspective.
 
 Sure, I'm clueless, though suspect many other bystanders would appreciate a
 succinct heads-up.
 
 Thanks,
 clueless bystander
I'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Yes, well, thanks again. The first 7 pages seemed to have plausible arguments but the going get tough thereafter. Maybe the reason ranges are not popular is that they are hard to explain even though they might be simple and obvious in hindsight. Sigh, c.b.
Mar 23 2010
next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
clueless bystander wrote:
 Lars T. Kyllingstad Wrote:
 
 clueless bystander wrote:
 Watching D evolve from the outside there seems to be a lot of ongoing
discussion
 on this newsgroup about the D range idiom which is somehow opposed to
conventional
 thinking about iterators.

 Can someone please explain in plain words just exactly what a range is and how
 it differs from the iterator concept (if that's appropriate???) and what are
the benefits
 from a data modeling and access perspective.

 Sure, I'm clueless, though suspect many other bystanders would appreciate a
 succinct heads-up.

 Thanks,
 clueless bystander
I'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Yes, well, thanks again. The first 7 pages seemed to have plausible arguments but the going get tough thereafter. Maybe the reason ranges are not popular is that they are hard to explain even though they might be simple and obvious in hindsight. Sigh, c.b.
The reason ranges are not popular is because they haven't had time to become popular yet. The range concept itself is rather new, not much older than the article I referred you to, and AFAIK it's only been implemented in the D2 standard library. I'm sure there are several people on this forum that can give you a satisfactory (and succinct) answer. You may want to check back in a few hours, when activity picks up. FWIW, I don't think the concept of ranges is very hard to grasp. I just haven't used them that much, that's why I don't want to be the one to answer your question (and, like I said before, I have never used C++ iterators so I can't compare them either). -Lars
Mar 23 2010
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/23/2010 06:58 AM, clueless bystander wrote:
 Lars T. Kyllingstad Wrote:

 clueless bystander wrote:
 Watching D evolve from the outside there seems to be a lot of ongoing
discussion
 on this newsgroup about the D range idiom which is somehow opposed to
conventional
 thinking about iterators.

 Can someone please explain in plain words just exactly what a range is and how
 it differs from the iterator concept (if that's appropriate???) and what are
the benefits
 from a data modeling and access perspective.

 Sure, I'm clueless, though suspect many other bystanders would appreciate a
 succinct heads-up.

 Thanks,
 clueless bystander
I'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Yes, well, thanks again. The first 7 pages seemed to have plausible arguments but the going get tough thereafter. Maybe the reason ranges are not popular is that they are hard to explain even though they might be simple and obvious in hindsight. Sigh, c.b.
Thank you for your interest. The article (http://erdani.com/publications/on-iteration.html) is long because following my keynote talk at BoostCon 2009, I've received a flurry of emails asking me to flesh out the ranges design in greater detail and to better motivate them. That article not only describes the design, but it also explains the historical artifacts that led to today's imperfect state of affairs, motivates defining ranges with categories, and gives examples. The price for such thoroughness is - sorry - size. If you are familiar with STL iterators and GoF-style iterators (with hasmore/get/advance), ranges are so simple, they define themselves: a range is a GoF-style iterator that recognizes the necessity of STL's iterator categories (input, forward, bidirectional, and random). All the rest is aftermath. Andrei
Mar 23 2010
prev sibling next sibling parent Jesse Phillips <jessekphillips+D gmail.com> writes:
I think a good way to learn ranges is by looking at the interface to them,
compared to those used by other languages. For one thing, C++ iterators are way
more complicated than what you will find in many other languages; for example
Java's iterator interface[1] is similar to a D range[2].

In both languages the interface is 3 functions (there are many types of ranges
which require more, but only one kind of iterator).

Java:
bool hasNext()
E next()
void remove()

D:
bool empty()
E front()
void popFront()

These look almost identical but the semantics are very different. For example,
hasNext() requires a look-ahead, while empty() does not. This is important
since you may not be able to look ahead in a range.

next() performs the equivalent of calling front(); popFront(); And remove() has
nothing to do with the iterator as it performs on the underlining collection.

Removing the look-ahead is probably one of the biggest improvements over Java's
iterator.

1. http://java.sun.com/j2se/1.5.0/docs/api/java/util/Iterator.html
2. http://digitalmars.com/d/2.0/phobos/std_range.html#isInputRange

clueless bystander Wrote:

 Watching D evolve from the outside there seems to be a lot of ongoing
discussion
 on this newsgroup about the D range idiom which is somehow opposed to
conventional
 thinking about iterators.
 
 Can someone please explain in plain words just exactly what a range is and how
 it differs from the iterator concept (if that's appropriate???) and what are
the benefits
 from a data modeling and access perspective.
 
 Sure, I'm clueless, though suspect many other bystanders would appreciate a
 succinct heads-up.
 
 Thanks,
 clueless bystander
Mar 23 2010
prev sibling next sibling parent Igor Lesik <curoles yahoo.com> writes:
Can someone please explain in plain words just exactly what a range is and=
how=0A>it differs from the iterator concept (if that's appropriate???) and= what are the benefits=0A>from a data modeling and access perspective.=0A= =0ACheck Andrei's presentation "Iterators must go":=0Ahttp://www.slideshare= .net/rawwell/iteratorsmustgo=0A=0AI=A0am interested myself to know how rang= es=0Aare envisioned to be in D. Does the book talk=0Aabout ranges?=0A=0A=0A= =0A
Mar 23 2010
prev sibling next sibling parent reply Fawzi Mohamed <fawzi gmx.ch> writes:
Andrei, as the topic just came up a comment on the range interface.
Just for plain forward iterators iterators having

bool empty()
E front()
void popFront()

makes the interface non reentrant.
For that purpose having a single function is better.
I use

	bool popFront(ref T t)
	// returns true if there is a next element, and in that case returns  
it in t

this can be used by several consumers concurrently without problems  
and creating filters, combiners,... is simple.
Another advantage is that a single object can implement several  
iterators.
A disadvantage is that even if there is a single iterator D makes type  
inference cumbersome, i.e. you cannot simply use auto, as in a loop  
you have to declare the variable before using it as the loop is
T a;
while (it.popFront(a)){
//...
}
Mar 23 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/23/2010 02:45 PM, Fawzi Mohamed wrote:
 Andrei, as the topic just came up a comment on the range interface.
 Just for plain forward iterators iterators having

 bool empty()
 E front()
 void popFront()

 makes the interface non reentrant.
 For that purpose having a single function is better.
 I use

 bool popFront(ref T t)
 // returns true if there is a next element, and in that case returns it
 in t

 this can be used by several consumers concurrently without problems and
 creating filters, combiners,... is simple.
 Another advantage is that a single object can implement several iterators.
 A disadvantage is that even if there is a single iterator D makes type
 inference cumbersome, i.e. you cannot simply use auto, as in a loop you
 have to declare the variable before using it as the loop is
 T a;
 while (it.popFront(a)){
 //...
 }
We've discussed this extensively, and I lost sleep over this simple matter more than once. The main problem with bool popFront(ref E) is that it doesn't work meaningfully for containers that expose references to their elements. The interface with front() leaves it to the range to return E or ref E. An alternative is this: bool empty(); ref E getNext(); // ref or no ref I'm thinking seriously of defining input ranges that way. The underlying notion is that you always move forward - getting an element is simultaneous with moving to the next. Andrei
Mar 23 2010
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 23 Mar 2010 16:34:24 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 03/23/2010 02:45 PM, Fawzi Mohamed wrote:
 Andrei, as the topic just came up a comment on the range interface.
 Just for plain forward iterators iterators having

 bool empty()
 E front()
 void popFront()

 makes the interface non reentrant.
 For that purpose having a single function is better.
 I use

 bool popFront(ref T t)
 // returns true if there is a next element, and in that case returns it
 in t

 this can be used by several consumers concurrently without problems and
 creating filters, combiners,... is simple.
 Another advantage is that a single object can implement several  
 iterators.
 A disadvantage is that even if there is a single iterator D makes type
 inference cumbersome, i.e. you cannot simply use auto, as in a loop you
 have to declare the variable before using it as the loop is
 T a;
 while (it.popFront(a)){
 //...
 }
We've discussed this extensively, and I lost sleep over this simple matter more than once. The main problem with bool popFront(ref E) is that it doesn't work meaningfully for containers that expose references to their elements. The interface with front() leaves it to the range to return E or ref E. An alternative is this: bool empty(); ref E getNext(); // ref or no ref I'm thinking seriously of defining input ranges that way. The underlying notion is that you always move forward - getting an element is simultaneous with moving to the next.
A while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hinted that safeD may allow pointers, but disallow bad pointer operations. In light of this, can we reconsider this interface, or other alternatives using pointers? I've always felt that if we were to define ranges for streams in a non-awkward way, we would need an "all in one" operation, since not only does getting data from the range move the range, but checking for empty might also move the range (empty on a stream means you tried to read and got nothing). -Steve
Mar 23 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:
 On Tue, 23 Mar 2010 16:34:24 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 03/23/2010 02:45 PM, Fawzi Mohamed wrote:
 Andrei, as the topic just came up a comment on the range interface.
 Just for plain forward iterators iterators having

 bool empty()
 E front()
 void popFront()

 makes the interface non reentrant.
 For that purpose having a single function is better.
 I use

 bool popFront(ref T t)
 // returns true if there is a next element, and in that case returns it
 in t

 this can be used by several consumers concurrently without problems and
 creating filters, combiners,... is simple.
 Another advantage is that a single object can implement several
 iterators.
 A disadvantage is that even if there is a single iterator D makes type
 inference cumbersome, i.e. you cannot simply use auto, as in a loop you
 have to declare the variable before using it as the loop is
 T a;
 while (it.popFront(a)){
 //...
 }
We've discussed this extensively, and I lost sleep over this simple matter more than once. The main problem with bool popFront(ref E) is that it doesn't work meaningfully for containers that expose references to their elements. The interface with front() leaves it to the range to return E or ref E. An alternative is this: bool empty(); ref E getNext(); // ref or no ref I'm thinking seriously of defining input ranges that way. The underlying notion is that you always move forward - getting an element is simultaneous with moving to the next.
A while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hinted that safeD may allow pointers, but disallow bad pointer operations. In light of this, can we reconsider this interface, or other alternatives using pointers? I've always felt that if we were to define ranges for streams in a non-awkward way, we would need an "all in one" operation, since not only does getting data from the range move the range, but checking for empty might also move the range (empty on a stream means you tried to read and got nothing).
I'd gladly reconsider E* getNext(), and I like it a lot, but that doesn't accommodate ranges that want to return rvalues without storing them (e.g. a range using getchar() as a back-end, and generally streams that don't correspond to stuff stored in memory). If it's not in memory, there's no pointer to it. Andrei
Mar 23 2010
parent Fawzi Mohamed <fawzi gmx.ch> writes:
	charset=US-ASCII;
	format=flowed;
	delsp=yes
Content-Transfer-Encoding: 7bit


On 23-mar-10, at 21:51, Andrei Alexandrescu wrote:

 On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:
 On Tue, 23 Mar 2010 16:34:24 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 [...]
 A while back, you identified one of the best interfaces for input  
 ranges:

 E* getNext();

 Which allows for null returns when no data is left. The drawback is  
 that
 E must be either referenced or allocated on the heap (providing  
 storage
 to the function is an option). But the killer issue was that safeD  
 would
 not allow it. However, in recent times, you have hinted that safeD  
 may
 allow pointers, but disallow bad pointer operations. In light of  
 this,
 can we reconsider this interface, or other alternatives using  
 pointers?

 I've always felt that if we were to define ranges for streams in a
 non-awkward way, we would need an "all in one" operation, since not  
 only
 does getting data from the range move the range, but checking for  
 empty
 might also move the range (empty on a stream means you tried to  
 read and
 got nothing).
yes that also makes filters/combiners really nice to write. the basic thing is that you have to return two things, 1. if there is more and 2. if yes the element.
 I'd gladly reconsider E* getNext(), and I like it a lot, but that  
 doesn't accommodate ranges that want to return rvalues without  
 storing them (e.g. a range using getchar() as a back-end, and  
 generally streams that don't correspond to stuff stored in memory).  
 If it's not in memory, there's no pointer to it.
E* getNext would probably also be workable, at the cost of storing one element. But then as andrei correctly points out one still cannot expect the pointer to be valid after one iteration, so as soon as you don't have memory storage you loose thread safety... Now reentrancy/thread safety is not necessarily the most important thing for iterators, but I like that my queues, sources of work can have the same interface.
Mar 23 2010
prev sibling parent reply grauzone <none example.net> writes:
Steven Schveighoffer wrote:
 A while back, you identified one of the best interfaces for input ranges:
 
 E* getNext();
 
 Which allows for null returns when no data is left.  The drawback is 
 that E must be either referenced or allocated on the heap (providing 
 storage to the function is an option).  But the killer issue was that 
 safeD would not allow it.  However, in recent times, you have hinted 
Nullable!(E) getNext(); ?
Mar 23 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/23/2010 04:06 PM, grauzone wrote:
 Steven Schveighoffer wrote:
 A while back, you identified one of the best interfaces for input ranges:

 E* getNext();

 Which allows for null returns when no data is left. The drawback is
 that E must be either referenced or allocated on the heap (providing
 storage to the function is an option). But the killer issue was that
 safeD would not allow it. However, in recent times, you have hinted
Nullable!(E) getNext(); ?
And if returning a reference...? Andrei
Mar 23 2010
parent grauzone <none example.net> writes:
Andrei Alexandrescu wrote:
 On 03/23/2010 04:06 PM, grauzone wrote:
 Steven Schveighoffer wrote:
 A while back, you identified one of the best interfaces for input 
 ranges:

 E* getNext();

 Which allows for null returns when no data is left. The drawback is
 that E must be either referenced or allocated on the heap (providing
 storage to the function is an option). But the killer issue was that
 safeD would not allow it. However, in recent times, you have hinted
Nullable!(E) getNext(); ?
And if returning a reference...?
Extend auto ref to template parameters: struct Nullable(auto ref T) { ... } T would be actually a reference type if and only if you could return a reference to the variable the template parameter was inferred from from a SafeD function. Basically, the compiler would know that references to T can be passed around freely. (SafeD allows ref returns under circumstances.) Not a solution I would prefer, but in the spirit of the design of D2 and SafeD in general.
 Andrei
Mar 23 2010
prev sibling parent reply Fawzi Mohamed <fawzi gmx.ch> writes:
On 23-mar-10, at 21:34, Andrei Alexandrescu wrote:

 [...]
 We've discussed this extensively, and I lost sleep over this simple  
 matter more than once. The main problem with bool popFront(ref E) is  
 that it doesn't work meaningfully for containers that expose  
 references to their elements.
yes I agree that it is a difficult problem, the single function works well in the basic iterator case, but does not generalize well to modifiable values. In most cases I resorted to returning pointers. The templates that generate opApply (still D1.0 ;) from that is smart enough to remove the pointer when possible as the ref already gives that. Still not perfect, as always there are tradeoffs...
 The interface with front() leaves it to the range to return E or ref  
 E.

 An alternative is this:

 bool empty();
 ref E getNext(); // ref or no ref

 I'm thinking seriously of defining input ranges that way. The  
 underlying notion is that you always move forward - getting an  
 element is simultaneous with moving to the next.
already better (for basic iterators), but still not reentrant, if another thread executes between empty and getNext... anyway any choice has some drawbacks... I like bool next(ref T) because it works well also for streams... and somehow (using T* or not depending on the type) can accommodate all iteration needs. Not perfectly nice, but workable. Fawzi
Mar 23 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/23/2010 05:41 PM, Fawzi Mohamed wrote:
 On 23-mar-10, at 21:34, Andrei Alexandrescu wrote:

 [...]
 We've discussed this extensively, and I lost sleep over this simple
 matter more than once. The main problem with bool popFront(ref E) is
 that it doesn't work meaningfully for containers that expose
 references to their elements.
yes I agree that it is a difficult problem, the single function works well in the basic iterator case, but does not generalize well to modifiable values. In most cases I resorted to returning pointers. The templates that generate opApply (still D1.0 ;) from that is smart enough to remove the pointer when possible as the ref already gives that. Still not perfect, as always there are tradeoffs...
 The interface with front() leaves it to the range to return E or ref E.

 An alternative is this:

 bool empty();
 ref E getNext(); // ref or no ref

 I'm thinking seriously of defining input ranges that way. The
 underlying notion is that you always move forward - getting an element
 is simultaneous with moving to the next.
already better (for basic iterators), but still not reentrant, if another thread executes between empty and getNext...
It can't :o).
 anyway any choice has some drawbacks... I like bool next(ref T) because
 it works well also for streams... and somehow (using T* or not depending
 on the type) can accommodate all iteration needs.
 Not perfectly nice, but workable.
next(ref T) works well _only_ on streams. It works badly on containers. Andrei
Mar 23 2010
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
Andrei Alexandrescu Wrote:

 On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:
 A while back, you identified one of the best interfaces for input ranges:

 E* getNext();

 Which allows for null returns when no data is left. The drawback is that
 E must be either referenced or allocated on the heap (providing storage
 to the function is an option). But the killer issue was that safeD would
 not allow it. However, in recent times, you have hinted that safeD may
 allow pointers, but disallow bad pointer operations. In light of this,
 can we reconsider this interface, or other alternatives using pointers?

 I've always felt that if we were to define ranges for streams in a
 non-awkward way, we would need an "all in one" operation, since not only
 does getting data from the range move the range, but checking for empty
 might also move the range (empty on a stream means you tried to read and
 got nothing).
I'd gladly reconsider E* getNext(), and I like it a lot, but that doesn't accommodate ranges that want to return rvalues without storing them (e.g. a range using getchar() as a back-end, and generally streams that don't correspond to stuff stored in memory). If it's not in memory, there's no pointer to it.
First, a range backed by getchar is about as useful as functional qsort ;) Second, you *have* to read data into memory. Even with the ranges as they currently are, you have to read into memory. At least this is less awkward. Take for instance a line iterator. You have to read enough to see the line terminator, but you most likely do not read *exactly* to the line terminator, so you just read in chunks until you get a line, then return the pointer to the data. It works actually quite elegantly. Third, the memory could be supplied by the caller. For instance, if you wrote the function like this: E* getNext(E* buf = null); Then foreach could do something like this: foreach(e; streamrange) => E _e; while(auto e = streamrange.getNext(&_e)) To avoid heap allocation. Of course, heap allocation would be the default if buf is null. Tango does this sort of trick quite often, and it makes the I/O code extremely fast. Also, another thing to think about is we can generalize the return type to satisfying the condition: iff range is empty then cast(bool)range.getNext == false. This means as long as your range cannot return a null element for a non-empty return, it is OK not to use a pointer. For example, the line iterator again... it can be written like: const(char)[] getNext() because you will only ever return a null const(char)[] when there is no data left. I don't think we should give up on trying to make a stream range that is not awkward, I really dislike the way today's input ranges map to streams. -Steve
Mar 23 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
 Andrei Alexandrescu Wrote:

 On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:
 A while back, you identified one of the best interfaces for input ranges:

 E* getNext();

 Which allows for null returns when no data is left. The drawback is that
 E must be either referenced or allocated on the heap (providing storage
 to the function is an option). But the killer issue was that safeD would
 not allow it. However, in recent times, you have hinted that safeD may
 allow pointers, but disallow bad pointer operations. In light of this,
 can we reconsider this interface, or other alternatives using pointers?

 I've always felt that if we were to define ranges for streams in a
 non-awkward way, we would need an "all in one" operation, since not only
 does getting data from the range move the range, but checking for empty
 might also move the range (empty on a stream means you tried to read and
 got nothing).
I'd gladly reconsider E* getNext(), and I like it a lot, but that doesn't accommodate ranges that want to return rvalues without storing them (e.g. a range using getchar() as a back-end, and generally streams that don't correspond to stuff stored in memory). If it's not in memory, there's no pointer to it.
First, a range backed by getchar is about as useful as functional qsort ;)
Actually I need one. Think fscanf, i.e. unformat() for streams. Andrei
Mar 23 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
 Andrei Alexandrescu Wrote:
 I'd gladly reconsider E* getNext(), and I like it a lot, but that
 doesn't accommodate ranges that want to return rvalues without
 storing them (e.g. a range using getchar() as a back-end, and
 generally streams that don't correspond to stuff stored in memory).
 If it's not in memory, there's no pointer to it.
Second, you *have* to read data into memory. Even with the ranges as they currently are, you have to read into memory. At least this is less awkward.
I agree. But it's one thing to read and pass along, and a different thing to read and keep in a buffer inside the range.
 Take for instance a line iterator.  You have to read enough to see
 the line terminator, but you most likely do not read *exactly* to the
 line terminator, so you just read in chunks until you get a line,
 then return the pointer to the data.  It works actually quite
 elegantly.
I disagree about the elegance part. If the range arrogates the right to use its own buffering, then when you decide you're done with that range and try to read some more from the stream, you discover data has been lost. The Phobos file I/O functions all avoid doing any more buffering than the backing FILE* does. They achieve performance by locking the file once with flockfile/funlockfile and then using fgetc_unlocked(). This puts me in real trouble with the formatted reading functions (a la fscanf but generalized to all input ranges), which I'm gestating about. The problem with the current API is that if you call input.front(), it will call fgetc(). But then say I decide I'm done with the range, as is the case with e.g. reading an integer and stopping at the first non-digit. That non-digit character will be lost. So there's a need to say, hey, put this guy back because whoever reads after me will need to look at it. So I need a putBackFront() or something (which would call fungetc()). I wish things were simpler.
 Third, the memory could be supplied by the caller.  For instance, if
 you wrote the function like this:

 E* getNext(E* buf = null);

 Then foreach could do something like this:

 foreach(e; streamrange)

 =>

 E _e; while(auto e = streamrange.getNext(&_e))

 To avoid heap allocation.  Of course, heap allocation would be the
 default if buf is null.

 Tango does this sort of trick quite often, and it makes the I/O code
 extremely fast.
The problem is that that speed doesn't translate very well to in-memory containers. For containers it's preferable to pass null so you get a pointer to the actual element; for streams it's preferable to not pass null. So it's difficult to write code that works well for both.
 Also, another thing to think about is we can generalize the return
 type to satisfying the condition:

 iff range is empty then cast(bool)range.getNext == false.

 This means as long as your range cannot return a null element for a
 non-empty return, it is OK not to use a pointer.  For example, the
 line iterator again... it can be written like:

 const(char)[] getNext()

 because you will only ever return a null const(char)[] when there is
 no data left.
I see, but if I'm looking for ints? I'll have to return a pointer - or a nullable or something.
 I don't think we should give up on trying to make a stream range that
 is not awkward, I really dislike the way today's input ranges map to
 streams.
Me too. Let's keep on looking, I have the feeling something good is right behind the corner. But then I felt that way for a year :o). Andrei
Mar 23 2010
next sibling parent reply Fawzi Mohamed <fawzi gmx.ch> writes:
On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:

 The Phobos file I/O functions all avoid doing any more buffering  
 than the backing FILE* does. They achieve performance by locking the  
 file once with flockfile/funlockfile and then using fgetc_unlocked().

 This puts me in real trouble with the formatted reading functions (a  
 la fscanf but generalized to all input ranges), which I'm gestating  
 about. The problem with the current API is that if you call  
 input.front(), it will call fgetc(). But then say I decide I'm done  
 with the range, as is the case with e.g. reading an integer and  
 stopping at the first non-digit. That non-digit character will be  
 lost. So there's a need to say, hey, put this guy back because  
 whoever reads after me will need to look at it. So I need a  
 putBackFront() or something (which would call fungetc()). I wish  
 things were simpler.
I had a pushBack (I called that unget) in http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d , but I recently removed that in favor of a peek function that I think is much more flexible. What I did is to base most parsing on CharReaders (for example the char based ones from BasicIO): {{{ /// extent of a slice of a buffer enum SliceExtent{ Partial, Maximal, ToEnd } /// a delegate that reads in from a character source alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate) CharReader; /// a handler of CharReader, returns true if something was read alias bool delegate(CharReader)CharReaderHandler; }}} a char reader reads from the given buffer buf, and can either request more (by returning EOF), or eat some characters out of it. If it sets iterate to true it wants to iterate with the eaten buffer (useful to for example skip undefined amount of whitespace that might overflow the buffer). Once you have that you can easily create a Peeker structure that wraps a CharReader, and exposes a CharReaded that tries to match it, but always eats 0 characters, even if the match was successful. With it you can have a peek method that returns true if the CharReader that you pass in matches, false if it does not match, and what you want if the buffer is too small to resolve the issue. Most of these things are templates that work for any type T. Then I built buffered types that using a size_t delegate(T[]) give a Reader based interface. All this is not based on single elements anymore, but on arrays (ranges? :), but I think that is what is needed for efficient i/o.
 On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
 I don't think we should give up on trying to make a stream range that
 is not awkward, I really dislike the way today's input ranges map to
 streams.
Me too. Let's keep on looking, I have the feeling something good is right behind the corner. But then I felt that way for a year :o).
give a try to bool popFront(ref T) ( or next, or another name, or even just a delegate with that signature) I was surprised how well it works, not perfect but better than the other alternatives I had tried. loop on a T[] array: bool popFront(ref T* el); mapped to opApply(int delegate(ref T x) loopBody); loop on a source of elements T: bool popFront(ref T el); mapped to opApply(int delegate(ref T x) loopBody); (well the ref there does not make much sense, but that is how opApply works to avoid the explosion of opApply). All it takes is a check for pointers in the templates, and dereference the type of opApply. also direct loops often look reasonable thanks to with D automatically dereferencing with ".": while (it.popFront(el)){ el.doSomething; } yes assigning stuff directly to el, and not its components you need a T* iterator and you have to write *el=... and x= *el but that is not so terrible. filter applied on an iterator it is just bool popNext(ref T el){ while (it.popNext(el)){ if (acceptable(el)){ return true; } } return false; } combiners of iterators are likewise quite simple to write.
Mar 24 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/24/2010 09:00 AM, Fawzi Mohamed wrote:
 On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:

 The Phobos file I/O functions all avoid doing any more buffering than
 the backing FILE* does. They achieve performance by locking the file
 once with flockfile/funlockfile and then using fgetc_unlocked().

 This puts me in real trouble with the formatted reading functions (a
 la fscanf but generalized to all input ranges), which I'm gestating
 about. The problem with the current API is that if you call
 input.front(), it will call fgetc(). But then say I decide I'm done
 with the range, as is the case with e.g. reading an integer and
 stopping at the first non-digit. That non-digit character will be
 lost. So there's a need to say, hey, put this guy back because whoever
 reads after me will need to look at it. So I need a putBackFront() or
 something (which would call fungetc()). I wish things were simpler.
I had a pushBack (I called that unget) in http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d , but I recently removed that in favor of a peek function that I think is much more flexible.
Thanks for sharing your design with me. Yes, peek() is more flexible than get/unget, but I'm under the stdio tyranny. In fact I just realized something - I could call setvbuf(_handle, null, _IONBF, 0) whenever I bind a File to a FILE*. That way File can do its own buffering and can implement peek() etc. I wonder if we need to worry about sharing, because e.g. several threads would want to write to stdout.
 What I did is to base most parsing on CharReaders (for example the char
 based ones from BasicIO):
 {{{
 /// extent of a slice of a buffer
 enum SliceExtent{ Partial, Maximal, ToEnd }

 /// a delegate that reads in from a character source
 alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate)
 CharReader;

 /// a handler of CharReader, returns true if something was read
 alias bool delegate(CharReader)CharReaderHandler;
 }}}

 a char reader reads from the given buffer buf, and can either request
 more (by returning EOF), or eat some characters out of it. If it sets
 iterate to true it wants to iterate with the eaten buffer (useful to for
 example skip undefined amount of whitespace that might overflow the
 buffer).

 Once you have that you can easily create a Peeker structure that wraps a
 CharReader, and exposes a CharReaded that tries to match it, but always
 eats 0 characters, even if the match was successful.
 With it you can have a peek method that returns true if the CharReader
 that you pass in matches, false if it does not match, and what you want
 if the buffer is too small to resolve the issue.

 Most of these things are templates that work for any type T.
Wait, if you called it CharReader, how come it works with any type T? Or are you referring to T as the parsed type?
 Then I
 built buffered types that using a size_t delegate(T[]) give a Reader
 based interface.

 All this is not based on single elements anymore, but on arrays (ranges?
 :), but I think that is what is needed for efficient i/o.
Sounds good, but I wonder why you use delegates instead of classes. Is that for simplicity? I confess it's not 100% clear to me how the delegates are supposed to be used in concert, particularly why there's a need for both CharReader and CharReaderHandler.
 On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
 I don't think we should give up on trying to make a stream range that
 is not awkward, I really dislike the way today's input ranges map to
 streams.
Me too. Let's keep on looking, I have the feeling something good is right behind the corner. But then I felt that way for a year :o).
give a try to bool popFront(ref T) ( or next, or another name, or even just a delegate with that signature) I was surprised how well it works, not perfect but better than the other alternatives I had tried. loop on a T[] array: bool popFront(ref T* el);
So arrays have a different interface than streams. It looks like you can't write code that works uniformly for both, because for some you need the * and for some you don't. Did I understand that correctly? Andrei
Mar 24 2010
next sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 24-mar-10, at 23:29, Andrei Alexandrescu wrote:

 On 03/24/2010 09:00 AM, Fawzi Mohamed wrote:
 On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:

 The Phobos file I/O functions all avoid doing any more buffering  
 than
 the backing FILE* does. They achieve performance by locking the file
 once with flockfile/funlockfile and then using fgetc_unlocked().

 This puts me in real trouble with the formatted reading functions (a
 la fscanf but generalized to all input ranges), which I'm gestating
 about. The problem with the current API is that if you call
 input.front(), it will call fgetc(). But then say I decide I'm done
 with the range, as is the case with e.g. reading an integer and
 stopping at the first non-digit. That non-digit character will be
 lost. So there's a need to say, hey, put this guy back because  
 whoever
 reads after me will need to look at it. So I need a putBackFront()  
 or
 something (which would call fungetc()). I wish things were simpler.
I had a pushBack (I called that unget) in http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d , but I recently removed that in favor of a peek function that I think is much more flexible.
Thanks for sharing your design with me. Yes, peek() is more flexible than get/unget, but I'm under the stdio tyranny. In fact I just realized something - I could call setvbuf(_handle, null, _IONBF, 0) whenever I bind a File to a FILE*. That way File can do its own buffering and can implement peek() etc. I wonder if we need to worry about sharing, because e.g. several threads would want to write to stdout.
well for stdout by default I use locking to ensure writing writes chunks atomically. I would say that by default streams imply sequence, so can safely be non threadsafe. stdout, stderr and logging are exceptions, there at least chunks should be written atomically.
 What I did is to base most parsing on CharReaders (for example the  
 char
 based ones from BasicIO):
 {{{
 /// extent of a slice of a buffer
 enum SliceExtent{ Partial, Maximal, ToEnd }

 /// a delegate that reads in from a character source
 alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate)
 CharReader;

 /// a handler of CharReader, returns true if something was read
 alias bool delegate(CharReader)CharReaderHandler;
 }}}

 a char reader reads from the given buffer buf, and can either request
 more (by returning EOF), or eat some characters out of it. If it sets
 iterate to true it wants to iterate with the eaten buffer (useful  
 to for
 example skip undefined amount of whitespace that might overflow the
 buffer).

 Once you have that you can easily create a Peeker structure that  
 wraps a
 CharReader, and exposes a CharReaded that tries to match it, but  
 always
 eats 0 characters, even if the match was successful.
 With it you can have a peek method that returns true if the  
 CharReader
 that you pass in matches, false if it does not match, and what you  
 want
 if the buffer is too small to resolve the issue.

 Most of these things are templates that work for any type T.
Wait, if you called it CharReader, how come it works with any type T? Or are you referring to T as the parsed type?
Well I presented the CharReader for simplicity, and that is indeed only for chars, but most things can be generalized, and indeed if you look at the Reader(T) interface in http://github.com/fawzi/blip/blob/master/blip/io/BasicIO.d or blip.text.TextParser or similar they are templated with a generic type T. For TextParser I was thinking T=char,wchar or dchar, whereas others cases are even more generic.
 Then I
 built buffered types that using a size_t delegate(T[]) give a Reader
 based interface.

 All this is not based on single elements anymore, but on arrays  
 (ranges?
 :), but I think that is what is needed for efficient i/o.
Sounds good, but I wonder why you use delegates instead of classes. Is that for simplicity?
there are both, and both have their place. delegates are very simple and can be easily built on the fly, I like that very much, they reduce the code footprint of various things. More complex behaviour is better captured by classes, and indeed there are (also in BasicIO) the following interfaces: interface OutStreamI{ void rawWriteStr(char[]); void rawWriteStr(wchar[]); void rawWriteStr(dchar[]); void rawWrite(void[]); CharSink charSink(); BinSink binSink(); void flush(); void close(); } /// a reader of elements of type T interface Reader(T){ /// read some data into the given buffer size_t readSome(T[]); /// character reader handler bool handleReader(size_t delegate(T[], SliceExtent slice,out bool iterate) r); /// shutdown the input source void shutdownInput(); } /// one or more readers interface MultiReader{ enum Mode{ Binary=1, Char=2, Wchar=4, Dchar=8 } /// returns the modes this reader supports uint modes(); /// returns the native modes of this reader (less overhead) uint nativeModes(); Reader!(char) readerChar(); Reader!(wchar) readerWchar(); Reader!(dchar) readerDchar(); Reader!(void) readerBin(); void shutdownInput(); } there are classes that can create the more full fledged objects out of delegates.
 I confess it's not 100% clear to me how the delegates are supposed  
 to be used in concert, particularly why there's a need for both  
 CharReader and CharReaderHandler.
mainly one needs CharReader, which is a method that reads something. CharReaderHandler is there just for completeness, it is a delegate of a method that actually reads, but normally one simply uses that method, i.e. it uses a Reader!(T).handleReader method...
 On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
 I don't think we should give up on trying to make a stream range  
 that
 is not awkward, I really dislike the way today's input ranges map  
 to
 streams.
Me too. Let's keep on looking, I have the feeling something good is right behind the corner. But then I felt that way for a year :o).
give a try to bool popFront(ref T) ( or next, or another name, or even just a delegate with that signature) I was surprised how well it works, not perfect but better than the other alternatives I had tried. loop on a T[] array: bool popFront(ref T* el);
So arrays have a different interface than streams. It looks like you can't write code that works uniformly for both, because for some you need the * and for some you don't. Did I understand that correctly?
well the foreach loop is the same, but the iteration loop is indeed different in the sense that one uses a pointer to an element and the other the element itself. one can write code that removes the pointer that is there (dereferencing it, or doing and inline function with subsequent call which allows you to reuse the same variable name): void myF(ref x){ // code } myF(*x); (that is a nice trick that I used several times). But yes there *is* a difference and the difference is that with arrays you might modify the element, modifying the stored value, whereas with streams you can't. This conceptual difference and if reflected in the interface. One can then discuss if immutable arrays should be iterated with immutable pointers or with values (i.e. copying) just as streams are.
 Andrei
Mar 24 2010
prev sibling parent reply Fawzi Mohamed <fawzi gmx.ch> writes:
	charset=US-ASCII;
	format=flowed;
	delsp=yes
Content-Transfer-Encoding: 7bit


On 25-mar-10, at 00:09, Fawzi Mohamed wrote:

 On 24-mar-10, at 23:29, Andrei Alexandrescu wrote:

 [...]
 So arrays have a different interface than streams. It looks like  
 you can't write code that works uniformly for both, because for  
 some you need the * and for some you don't. Did I understand that  
 correctly?
well the foreach loop is the same, but the iteration loop is indeed different in the sense that one uses a pointer to an element and the other the element itself. one can write code that removes the pointer that is there (dereferencing it, or doing and inline function with subsequent call which allows you to reuse the same variable name): void myF(ref x){ // code } myF(*x); (that is a nice trick that I used several times). But yes there *is* a difference and the difference is that with arrays you might modify the element, modifying the stored value, whereas with streams you can't. This conceptual difference and if reflected in the interface. One can then discuss if immutable arrays should be iterated with immutable pointers or with values (i.e. copying) just as streams are.
thinking more about this, you are right something that returns a ref can be used exactly the same way as something that returns a value if one takes the value with auto val=returnRefOrVal; can be used as value exactly in the same way. Whereas something that returns a T or a T* , need an explicit conversion. That is easy to do, and one can even easily wrap the delegate in place with something that returns T instead of T*, but the conversion has to be explicit (before feeding it to the code), or explicitly tested for in the code. In practice I hadn't real problems due to this, but it is something that is uglier than ref return. On the other hand it is easier to know if you might modify the value that you received expecting to modify the underlying structure.
Mar 25 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/25/2010 05:32 AM, Fawzi Mohamed wrote:
 thinking more about this, you are right something that returns a ref can
 be used exactly the same way as something that returns a value if one
 takes the value with
 auto val=returnRefOrVal;
 can be used as value exactly in the same way.
 Whereas something that returns a T or a T* , need an explicit conversion.
 That is easy to do, and one can even easily wrap the delegate in place
 with something that returns T instead of T*, but the conversion has to
 be explicit (before feeding it to the code), or explicitly tested for in
 the code.
 In practice I hadn't real problems due to this, but it is something that
 is uglier than ref return.
 On the other hand it is easier to know if you might modify the value
 that you received expecting to modify the underlying structure.
I see. So what you're saying is that maybe the entire idea of iterating streams and arrays in a unified way may be problematic, because you can do different things with the elements of the two. While I partially agree with that, there are a lot of things that one can do the same way over a stream or a collection, and I wasn't able to find a way to do that that's reasonably efficient for both. Andrei
Mar 25 2010
prev sibling next sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 24-mar-10, at 15:00, Fawzi Mohamed wrote:

 [...]
 give a try to
 	bool popFront(ref T) ( or next, or another name, or even just a  
 delegate with that signature)
 I was surprised how well it works, not perfect but better than the  
 other alternatives I had tried.
I forgot to say, that one of the main pita with that approach is having to declare the arguments before using them, but should you decide that it is indeed a good alternative I have no doubt that you could find a good syntactic sugar that Walter could implement... :)
Mar 24 2010
prev sibling next sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 24-mar-10, at 15:11, Fawzi Mohamed wrote:

 On 24-mar-10, at 15:00, Fawzi Mohamed wrote:

 [...]
 give a try to
 	bool popFront(ref T) ( or next, or another name, or even just a  
 delegate with that signature)
 I was surprised how well it works, not perfect but better than the  
 other alternatives I had tried.
I forgot to say, that one of the main pita with that approach is having to declare the arguments before using them, but should you decide that it is indeed a good alternative I have no doubt that you could find a good syntactic sugar that Walter could implement... :)
if one would have methods bool f(ref T) as valid iterators then syntactic sugar replacing expr(f(auto a)); with static if(is(f S==function)){ S args; expr(f(args)); } else {static assert(0);} would be nice.
Mar 24 2010
prev sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 24-mar-10, at 15:11, Fawzi Mohamed wrote:

 On 24-mar-10, at 15:00, Fawzi Mohamed wrote:

 [...]
 give a try to
 	bool popFront(ref T) ( or next, or another name, or even just a  
 delegate with that signature)
 I was surprised how well it works, not perfect but better than the  
 other alternatives I had tried.
I forgot to say, that one of the main pita with that approach is having to declare the arguments before using them, but should you decide that it is indeed a good alternative I have no doubt that you could find a good syntactic sugar that Walter could implement... :)
if one would have methods bool f(ref T) as valid iterators then syntactic sugar replacing expr(f(auto a)); with static if(is(f S==function)){ S args; expr(f(args)); } else {static assert(0);} would be nice.
Mar 24 2010
prev sibling parent Neal Becker <ndbecker2 gmail.com> writes:
clueless bystander wrote:

 Watching D evolve from the outside there seems to be a lot of ongoing
 discussion on this newsgroup about the D range idiom which is somehow
 opposed to conventional thinking about iterators.
 
 Can someone please explain in plain words just exactly what a range is and
 how it differs from the iterator concept (if that's appropriate???) and
 what are the benefits from a data modeling and access perspective.
 
 Sure, I'm clueless, though suspect many other bystanders would appreciate
 a succinct heads-up.
 
 Thanks,
 clueless bystander
Wow, what a great email address!
Mar 27 2010