
digitalmars.D.announce - RFC on range design for D2

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,


Walter, Bartosz and myself have been hard at work trying to find the 
right abstraction for iteration. That abstraction would replace the 
infamous opApply and would allow for external iteration, thus paving the 
way to implementing real generic algorithms.

We considered an STL-style container/iterator design. Containers would 
use the newfangled value semantics to enforce ownership of their 
contents. Iterators would span containers in various ways.

The main problem with that approach was integrating built-in arrays into 
the design. STL's iterators are generalized pointers; D's built-in 
arrays are, however, not pointers, they are "pairs of pointers" that 
cover contiguous ranges in memory. Most people who've used D gained the 
intuition that slices are superior to pointers in many ways, such as 
easier checking for validity, higher-level compact primitives, 
streamlined and safe interface. However, if STL iterators are 
generalized pointers, what is the corresponding generalization of D's 
slices? Intuitively that generalization should also be superior to 
iterators.
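
To make the slice-versus-pointer contrast above concrete, here is a minimal sketch in present-day D. The helper name `middleOf` is invented for illustration. Note that a D slice is stored as a pointer plus a length rather than literally two pointers, but it describes the same contiguous span.

```d
import std.stdio : writeln;

// Hypothetical helper: a view of everything except the first and last
// element. No elements are copied; only the bounds change.
int[] middleOf(int[] a)
{
    return a.length < 2 ? a[0 .. 0] : a[1 .. $ - 1];
}

void main()
{
    int[] a = [10, 20, 30, 40, 50];
    int[] m = middleOf(a);

    assert(m.length == 3);
    assert(m.ptr == a.ptr + 1); // same memory, narrower bounds

    // Writes through the slice are visible through the original array.
    m[0] = 99;
    assert(a[1] == 99);

    // The slice carries its own end, so no separate end marker is needed.
    writeln(m); // prints [99, 30, 40]
}
```

A bare pointer into `a` would carry neither the length nor any way to check an index against it, which is the validity-checking advantage the paragraph refers to.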

In a related development, the Boost C++ library has defined ranges as 
pairs of two iterators and implemented a series of wrappers that accept 
ranges and forward their iterators to STL functions. The main outcome of 
Boost ranges has been to decrease the verboseness and perils of naked 
iterator manipulation (see 
http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a 
C++ application using Boost could avail itself of containers, ranges, 
and iterators. The Boost notion of range is very close to a 
generalization of D's slice.

We have considered that design too, but that raised a nagging question. 
In most slice-based D programming, using bare pointers is not necessary. 
Could then there be a way to use _only_ ranges and eliminate iterators 
altogether? A container/range design would be much simpler than one also 
exposing iterators.

All these questions aside, there are several other imperfections in the 
STL, many caused by the underlying language. For example STL is 
incapable of distinguishing between input/output iterators and forward 
iterators. This is because C++ cannot reasonably implement a type with 
destructive copy semantics, which is what would be needed to make said 
distinction. We wanted the Phobos design to provide appropriate answers 
to such questions, too. This would be useful particularly because it 
would allow implementation of true and efficient I/O integrated with 
iteration. STL has made an attempt at that, but istream_iterator and 
ostream_iterator are, with all due respect, a joke that builds on 
another joke, the iostreams.

After much thought and discussions among Walter, Bartosz and myself, I 
defined a range design and reimplemented all of std.algorithm and much 
of std.stdio in terms of ranges alone. This is quite a thorough test 
because the algorithms are diverse and stress-test the expressiveness 
and efficiency of the range design. Along the way I made the interesting 
realization that certain union/difference operations are needed as 
primitives for ranges. There are also a few bugs in the compiler and 
some needed language enhancements (e.g. returning a reference from a 
function); Walter is committed to implementing them.

I put together a short document for the range design. I definitely 
missed about a million things and have been imprecise about another 
million, so feedback would be highly appreciated. See:

http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html


Andrei
Sep 08 2008
"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Mon, Sep 8, 2008 at 5:50 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Hello,


 Walter, Bartosz and myself have been hard at work trying to find the right
 abstraction for iteration. That abstraction would replace the infamous
 opApply and would allow for external iteration, thus paving the way to
 implementing real generic algorithms.

 We considered an STL-style container/iterator design. Containers would use
 the newfangled value semantics to enforce ownership of their contents.
 Iterators would span containers in various ways.

 The main problem with that approach was integrating built-in arrays into the
 design. STL's iterators are generalized pointers; D's built-in arrays are,
 however, not pointers, they are "pairs of pointers" that cover contiguous
 ranges in memory. Most people who've used D gained the intuition that slices
 are superior to pointers in many ways, such as easier checking for validity,
 higher-level compact primitives, streamlined and safe interface. However, if
 STL iterators are generalized pointers, what is the corresponding
 generalization of D's slices? Intuitively that generalization should also be
 superior to iterators.

 In a related development, the Boost C++ library has defined ranges as pairs
 of two iterators and implemented a series of wrappers that accept ranges and
 forward their iterators to STL functions. The main outcome of Boost ranges
 has been to decrease the verboseness and perils of naked iterator manipulation
 (see http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a
 C++ application using Boost could avail itself of containers, ranges, and
 iterators. The Boost notion of range is very close to a generalization of
 D's slice.

 We have considered that design too, but that raised a nagging question. In
 most slice-based D programming, using bare pointers is not necessary. Could
 then there be a way to use _only_ ranges and eliminate iterators altogether?
 A container/range design would be much simpler than one also exposing
 iterators.

 All these questions aside, there are several other imperfections in the STL,
 many caused by the underlying language. For example STL is incapable of
 distinguishing between input/output iterators and forward iterators. This is
 because C++ cannot reasonably implement a type with destructive copy
 semantics, which is what would be needed to make said distinction. We wanted
 the Phobos design to provide appropriate answers to such questions, too.
 This would be useful particularly because it would allow implementation of
 true and efficient I/O integrated with iteration. STL has made an attempt at
 that, but istream_iterator and ostream_iterator are, with all due respect, a
 joke that builds on another joke, the iostreams.

 After much thought and discussions among Walter, Bartosz and myself, I
 defined a range design and reimplemented all of std.algorithm and much of
 std.stdio in terms of ranges alone. This is quite a thorough test because
 the algorithms are diverse and stress-test the expressiveness and efficiency
 of the range design. Along the way I made the interesting realization that
 certain union/difference operations are needed as primitives for ranges.
 There are also a few bugs in the compiler and some needed language
 enhancements (e.g. returning a reference from a function); Walter is
 committed to implement them.

 I put together a short document for the range design. I definitely missed
 about a million things and have been imprecise about another million, so
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html


 Andrei

I like!
Sep 08 2008
BCS <ao pathlink.com> writes:
Reply to Andrei,

 Hello,
 
 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the
 infamous opApply and would allow for external iteration, thus paving
 the way to implementing real generic algorithms.

First of all, I /Like/ opApply. I know there are issues with it so I'd rather see it supplemented rather than replaced.
Sep 08 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
BCS wrote:
 Reply to Andrei,
 
 Hello,

 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the
 infamous opApply and would allow for external iteration, thus paving
 the way to implementing real generic algorithms.

First of all, I /Like/ opApply. I know there are issues with it so I'd rather see it supplemented rather than replaced.

We all like the way it looks. Ranges will preserve its syntax within a much more efficient and expressive implementation. Andrei
Sep 08 2008
BCS <ao pathlink.com> writes:
Reply to Andrei,

 BCS wrote:
 
 Reply to Andrei,
 
 Hello,
 
 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the
 infamous opApply and would allow for external iteration, thus paving
 the way to implementing real generic algorithms.
 

I'd rather see it supplemented rather than replaced.

much more efficient and expressive implementation. Andrei

I was referring to the implementation as visible from the called code's side
Sep 08 2008
Walter Bright <newshound1 digitalmars.com> writes:
BCS wrote:
 I was referring to the implementation as visible from the called code's 
 side

opApply isn't going to go away, it will still be there as an alternative.
Sep 09 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 BCS wrote:
 I was referring to the implementation as visible from the called 
 code's side

opApply isn't going to go away, it will still be there as an alternative.

I disagree with that as I think that would be one perfect place to simplify the language thus fulfilling bearophile's and many others' wish. Say we refine ranges to work with foreach to perfection. Then we have:

1. A foreach that sucks
2. A foreach that rocks

The obvious question is, why keep the one that sucks?

Andrei
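
For readers comparing the two kinds of foreach being argued over, here is a minimal sketch of both. The type names `ApplyNumbers` and `RangeNumbers` are invented, and the range uses the primitive names of present-day Phobos (`empty`, `front`, `popFront`), which differ from the names in the draft under discussion (`isEmpty`, `getNext`).

```d
// Internal iteration: the container drives the loop and calls the loop body
// back as a delegate.
struct ApplyNumbers
{
    int[] data;

    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref x; data)
            if (auto result = dg(x))
                return result; // propagate break/return out of the body
        return 0;
    }
}

// External iteration: the caller drives the loop one step at a time.
struct RangeNumbers
{
    int[] data;

    @property bool empty() const { return data.length == 0; }
    @property ref int front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

int sumApply(int[] xs)
{
    int t;
    foreach (x; ApplyNumbers(xs)) t += x;
    return t;
}

int sumRange(int[] xs)
{
    int t;
    foreach (x; RangeNumbers(xs)) t += x;
    return t;
}

// Stepping two sequences in lockstep needs external iteration: each range is
// advanced explicitly by the caller.
bool equalRanges(RangeNumbers a, RangeNumbers b)
{
    for (; !a.empty && !b.empty; a.popFront(), b.popFront())
        if (a.front != b.front) return false;
    return a.empty && b.empty;
}

void main()
{
    int[] xs = [1, 2, 3];
    assert(sumApply(xs) == 6);
    assert(sumRange(xs) == 6);
    assert(equalRanges(RangeNumbers(xs), RangeNumbers([1, 2, 3])));
    assert(!equalRanges(RangeNumbers(xs), RangeNumbers([1, 2])));
}
```

Both forms look identical at the foreach call site. The difference shows up in `equalRanges`, which has no straightforward opApply equivalent because opApply does not hand control back between elements.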
Sep 09 2008
Extrawurst <spam extrawurst.org> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 BCS wrote:
 I was referring to the implementation as visible from the called 
 code's side

opApply isn't going to go away, it will still be there as an alternative.

I disagree with that as I think that would be one perfect place to simplify the language thus fulfilling bearophile's and many others' wish. Say we refine ranges to work with foreach to perfection. Then we have: 1. A foreach that sucks 2. A foreach that rocks The obvious question is, why keep the one that sucks?

I agree, but I am worried that won't happen. D gets more and more polluted by deprecated and/or ambiguous stuff:

- inout/ref
- opCall/struct-ctor

are some examples. I wish D would only provide unambiguous features. Especially since D2.0 is the experimental branch anyway, so why not clean up finally?
Sep 09 2008
Dejan Lekic <dejan.lekic tiscali.co.uk> writes:
 I disagree with that as I think that would be one perfect place to 
 simplify the language thus fulfilling bearophile's and many others' wish.

I agree with this. It is a good idea, and it is explained in a very good way. You have my support Mr. Alexandrescu.
Sep 09 2008
"Manfred_Nowak" <svv1999 hotmail.com> writes:
Dejan Lekic wrote:

 r.fromBegin(s) is really s.toEnd(r)

That might be true only if `s' equals `r'. Otherwise at least one seems to be undefined, because they cannot both be true subranges of each other. I used the conjunctive form because a formal definition of "subrange" is missing. Such a definition is needed because in a cyclic model `s' and `r' might be equal, but not identical, because they contain a whole cycle, but their start points differ.

-manfred

-- 
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Sep 09 2008
BCS <ao pathlink.com> writes:
Reply to Walter,

 BCS wrote:
 
 I was referring to the implementation as visible from the called
 code's side
 

alternative.

Rock on! Sorry I took so long to reply, I had to lobotomize my PC
Sep 11 2008
"Denis Koroskin" <2korden gmail.com> writes:
On Tue, 09 Sep 2008 23:46:27 +0400, Extrawurst <spam extrawurst.org> wrote:

 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 BCS wrote:
 I was referring to the implementation as visible from the called  
 code's side

opApply isn't going to go away, it will still be there as an alternative.

simplify the language thus fulfilling bearophile's and many others' wish. Say we refine ranges to work with foreach to perfection. Then we have: 1. A foreach that sucks 2. A foreach that rocks The obvious question is, why keep the one that sucks?

I agree, but I am worried that won't happen. D gets more and more polluted by deprecated and/or ambiguous stuff: - inout/ref - opCall/struct-ctor are some examples. I wish D would only provide unambiguous features. Especially since D2.0 is the experimental branch anyway, so why not clean up finally?

I would also add:

    invariant float pi1 = 3.1415926;
    const float pi2 = 3.1415926;
    enum pi3 = 3.1415926;
    ...
Sep 09 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Sep 10, 2008 at 12:46 PM, Manfred_Nowak <svv1999 hotmail.com> wrote:
 Dejan Lekic wrote:

 r.fromBegin(s) is really s.toEnd(r)

Might be true only, if `s' equals `r'. Otherwise at least one seems to be undefined, because not both can be true subranges of each other. I used the conjunctive form, because a formal definition of "subrange" is missing. Such is needed because in a cyclic model `s' and `r' might be equal, but not identical, because they contain a whole cycle, but their start points differ.

So I think you can put that in other words by saying that they could mean different things if there's more to a range's state than just a beginning indicator and an end indicator. For instance in your example it would be like putting a "#of cycles" member in the range itself as a third element, rather than associating it with the end marker. Interesting. I suppose that sort of thing is possible, but maybe such possibilities are annoying enough that they should be made illegal. In your example it seems simple enough to make the cycle count a property associated with the "end", and then the difference goes away. --bb
Sep 09 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Walter, Bartosz and myself have been hard at work trying to find the 
 right abstraction for iteration. That abstraction would replace the 
 infamous opApply and would allow for external iteration, thus paving the 
 way to implementing real generic algorithms.

opApply() wasn't my hero either. :) Your article really looks like something I'd expect to find in D. It only requires foreach support, and yeah, return by reference.
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:
 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

- next operation not mentioned in section 3, forward ranges.
- the union operations look... weird. Unobvious. I'm too sleepy now to propose anything better but I'll definitely give it a try. The rest of the interface seems very natural.
- what's a collection? How do you get a range out of there? Collection should be a range itself, like an array. But it shouldn't be destroyed after passing it to foreach(). How to achieve this if foreach() essentially uses getNext()?

I'd really love to have this design in D though.
Sep 08 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Walter, Bartosz and myself have been hard at work trying to find the 
 right abstraction for iteration. That abstraction would replace the 
 infamous opApply and would allow for external iteration, thus paving the 
 way to implementing real generic algorithms.

opApply() wasn't my hero either. :) Your article really looks like something I'd expect to find in D. It only requires foreach support, and yeah, return by reference.

Indeed. Both are in the works.
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

- next operation not mentioned in section 3, forward ranges.

Thanks! Fixed.
 - the union operations look... weird.  Unobvious.  I'm too sleepy now to 
 propose anything better but I'll definitely give it a try.  The rest of 
 the interface seems very natural.

I agree I hadn't known what primitives would be needed when I sat down. Clearly there was a need for some since individual iterators are not available anymore. New ideas would be great; I suggest you validate them by implementing some nontrivial algorithms in std.algorithm with your, um, computational basis of choice :o).
 - what's a collection?  How do you get a range out of there?  Collection 
 should be a range itself, like an array.  But it shouldn't be destroyed 
 after passing it to foreach().  How to achieve this if foreach() 
 essentially uses getNext()?

These are great questions, I'm glad you asked. The way I see things, D2 ranges can be of two kinds: owned and unowned. For example, D1's ranges are all unowned:

    auto a = new int[100];
    ...

This array is unowned because its going out of scope leaves the underlying array in place. Now consider:

    scope a = new int[100];

In this case the array is owned by the current scope. Scoped data is a very crude form of ownership that IMHO caused more trouble than it solved. It's a huge hole smack in the middle of everything, and we plan to revisit it as soon as we can.

A better design would be to define collections that own their contents. For example:

    Array!(int) a(100);

This time a does own the underlying array. You'd be able to get ranges all over it:

    int[] b = a.all;

So now we have two nice notions: Arrays own the data. Ranges walk over that data. An array can have many ranges crawling over it. But two arrays never overlap. The contents of the array will be destroyed (BUT NOT DELETED) when a goes out of scope.

What's the deal with destroyed but not deleted? Consider:

    int[] a;
    if (condition) {
        Array!(int) b;
        a = b.all;
    }
    writeln(a);

This is a misuse of the array in that a range crawling on its back has survived the array itself. What should happen now? Looking at other languages:

1) All Java objects are unowned, meaning the issue does not appear in the first place, which is an advantage. The disadvantage is that scarce resources must be carefully managed by hand in client code.

2) C++ makes the behavior undefined because it destroys data AND recycles memory as soon as the array goes out of scope. Mayhem ensues.

We'd like:

1.5) Allow object ownership but also make the behavior of incorrect code well-defined so it can be reasoned about, reproduced, and debugged.

That's why I think an Array going out of scope should invoke destructors for its data, and then obliterate contents with ElementType.init.
That way, an Array!(File) will properly close all files AND put them in a "closed" state. At the same time, the memory associated with the array will NOT be deallocated, so a range surviving the array will never crash unrelated code, but instead will see closed files all over.

In the case of int, there is no destructor so none of that happens. Surviving ranges will continue looking at the contents, which is now unowned. So there is a difference in the way data with destructors and data without destructors is handled. I don't like that, but this is the most reasonably effective design I came up with so far.

About the "collection should be a range itself" mantra, I've had a micro-epiphany. Since D's slices so nicely model at the same time arrays and their ranges, it is very seductive to think of carrying that to other collection types. But I got disabused of that notion as soon as I wanted to define a less simple data structure. Consider a matrix:

    auto a = BlockMatrix!(float, 3)(100, 200, 300);

defines a block contiguous matrix of three dimensions with the respective sizes. Now a should be the matrix AND its range at the same time. But what's "the range" of a matrix? Oops. As soon as you start to think of it, so many darn ranges come to mind.

* flat: all elements in one shot in an arbitrary order
* dimension-wise: iterate over a given dimension
* subspace: iterate over a "slice" of the matrix with fewer dimensions
* diagonal: scan the matrix from one corner to the opposite corner

I guess there are some more. So before long I realized that the most gainful design is this:

a) A matrix owns its stuff and is preoccupied with storage internals, allocation, and the such.
b) The matrix defines as many range types as it wants.
c) Users use the ranges.

For example:

    foreach (ref e; a.flat) e *= 1.1;
    foreach (row; a.dim(0)) row[0, 0] = 0;
    foreach (col; a.dim(1)) col[1, 1] *= 5;

and so on.
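
The a/b/c split above can be sketched in present-day D with a two-dimensional stand-in. Everything here is an assumption made for illustration: the `Matrix` type and its `flat`, `row` and `column` methods are invented and are not the BlockMatrix interface from the post. The only library call used is `std.range.stride`.

```d
import std.algorithm.comparison : equal;
import std.range : stride;

// Hypothetical 2-D stand-in for BlockMatrix: the struct owns the storage and
// each method hands out a different view of it.
struct Matrix
{
    private double[] store;
    private size_t rows, cols;

    this(size_t rows, size_t cols)
    {
        this.rows = rows;
        this.cols = cols;
        store = new double[rows * cols];
        store[] = 0.0; // double.init is NaN, so zero explicitly
    }

    // All elements in storage order.
    double[] flat() { return store; }

    // A row is contiguous, so a plain slice is enough.
    double[] row(size_t r) { return store[r * cols .. (r + 1) * cols]; }

    // A column is not contiguous; it needs a strided view.
    auto column(size_t c) { return store[c .. $].stride(cols); }
}

void main()
{
    auto m = Matrix(2, 3);
    foreach (i, ref e; m.flat()) e = i; // rows: 0 1 2 / 3 4 5

    auto r = m.row(1);
    r[] *= 10.0;                        // rows: 0 1 2 / 30 40 50

    assert(m.row(0) == [0.0, 1.0, 2.0]);
    assert(m.column(1).equal([1.0, 40.0]));
}
```

The row view and the column view have different types (`double[]` versus a strided wrapper), which illustrates why a single "the range" for the container is hard to pick.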
Inevitably naysayers will, well, naysay: D defined a built-in array, but it also needs Array, so built-in arrays turned out to be useless. So how is that better than C++, which has pointers and vector? Walter has long feared such naysaying and opposed the addition of user-defined array types to Phobos. But now I am fully prepared to un-naysay the naysayers: built-in slices are a superior building block to naked pointers. They in fact embody a powerful concept, that of a range. With ranges everything there is can be built efficiently and safely. Finally, garbage collection helps by ensuring object ownership while preserving well-definedness of incorrect code.

Andrei
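
The "destroyed but not deleted" behavior proposed earlier in this post can be sketched as follows. This is an illustration under stated assumptions, not the proposed Array: `OwningArray` and `Handle` are invented names, and the sketch relies on the built-in `destroy` function and `std.traits.hasElaborateDestructor` from present-day D.

```d
import std.traits : hasElaborateDestructor;

// Hypothetical owning container.
struct OwningArray(T)
{
    private T[] store;

    this(size_t n) { store = new T[n]; }

    @disable this(this); // exactly one owner, no copies

    ~this()
    {
        // Run element destructors and reset each element to T.init, but
        // leave the memory itself to the garbage collector.
        static if (hasElaborateDestructor!T)
            foreach (ref e; store)
                destroy(e);
    }

    T[] all() { return store; }
}

// Hypothetical resource type; `closed` counts destructor runs on open handles.
struct Handle
{
    int id;            // 0 means "closed"
    static int closed;
    ~this() { if (id != 0) ++closed; }
}

void main()
{
    Handle[] survivor;
    {
        auto owner = OwningArray!Handle(2);
        owner.all()[0].id = 7;
        survivor = owner.all();
    } // owner's destructor runs here

    assert(survivor.length == 2);  // the memory is still valid to read
    assert(survivor[0].id == 0);   // but every element has been reset
    assert(Handle.closed == 1);    // the one open handle was closed
}
```

With `int` elements the `static if` branch is compiled out, so a surviving slice keeps seeing the old values, matching the asymmetry between types with and without destructors that the post describes.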
Sep 08 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:24:27 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 About the "collection should be a range itself" mantra, I've had a 
 micro-epiphany. Since D's slices so nicely model at the same time 
 arrays and their ranges, it is very seductive to think of carrying 
 that to other collection types. But I got disabused of that notion as 
 soon as I wanted to define a less simple data structure. Consider a 
 matrix:

 auto a = BlockMatrix!(float, 3)(100, 200, 300);

 defines a block contiguous matrix of three dimensions with the 
 respective sizes. Now a should be the matrix AND its range at the same 
 time. But what's "the range" of a matrix? Oops. As soon as you start 
 to think of it, so many darn ranges come to mind.

 * flat: all elements in one shot in an arbitrary order

 * dimension-wise: iterate over a given dimension

 * subspace: iterate over a "slice" of the matrix with fewer dimensions

 * diagonal: scan the matrix from one corner to the opposite corner

 I guess there are some more. So before long I realized that the most 
 gainful design is this:

 a) A matrix owns its stuff and is preoccupied with storage internals, 
 allocation, and the such.

 b) The matrix defines as many range types as it wants.

 c) Users use the ranges.

 For example:

 foreach (ref e; a.flat) e *= 1.1;
 foreach (row; a.dim(0)) row[0, 0] = 0;
 foreach (col; a.dim(1)) col[1, 1] *= 5;

I'd recommend a more clear cut example. Three of the ranges are very well defined in array languages and libraries. Essentially a slice of a matrix is another matrix that may have less or more dimensions and therefore may be a collection in addition to a range.

There are two problems with the view that a slice of a matrix is also a matrix:

1. If ownership is desired, then the slice does not own anything in the matrix, so that does not put it on equal footing with the matrix it started with.
2. Slicing a block matrix on a hyperplane will result in a strided range. That is not a block matrix at all.

So again it is more useful to think of the block matrix as the store, and of various ranges crawling over it as ways to look at the matrix.

I agree that ranges could be shoehorned into working. But then all ownership is out of the window, and also it creates more confusion than it clears. Now for n possible ranges over a block matrix, you have a daunting task ahead. You'd need to be able to construct any from any other, otherwise you could get stuck with a range that's a sort of dead end. That could be solved in a number of ways (e.g. force all range2range conversions to go through some "central" range) but by that time you start to realize that that design is not quite gainful. Besides, it only takes care of matrices, but not of various other nonlinear structures.

The approach in which the container is preoccupied with storage and ownership, and it offers various ranges that view the container in various ways, sounds like the better design to me.
 The dimension-wise 
 range is the only operation which is more complex, due to the type and 
 dimensions of the returned array changing float[x,y,z] -> float[x,y][z]. 
 And the main argument is that a float[x,y][z] is large, slow to create 
 and unwanted, so a separate range/generator is better (Also note a 
 generator can provide implicit the head const, tail mutable nature of 
 the range). Even given this, it doesn't contratict the "collection 
 should be a range itself" mantra, since there is a very well defined 
 range which encompasses the data, its just that some ranges are more 
 optimal if they're only views, and not a root collection.

Again, I agree a range-only view can be shoehorned into working. I just think it would be a bad design. Andrei
Sep 08 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jarrett Billingsley wrote:
 On Mon, Sep 8, 2008 at 8:24 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the infamous
 opApply and would allow for external iteration, thus paving the way to
 implementing real generic algorithms.

something I'd expect to find in D. It only requires foreach support, and yeah, return by reference.


Quick question about this one -- how will iterators get foreach support? Are they classes or structs? If they're structs, how will the compiler know something is an iterator? Or will it be based on duck typing (if it looks like an iterator, it must be an iterator)? And if this support involves "blessing" certain types within the runtime, what will this mean for other runtime libraries?

Great question. We'll go with structs and duck typing (why the heck don't they call it structural typing...) but we'll add interfaces for the range types so that they can be used dynamically too. Code generation will take care of gluing implementations into classes (more on that later). As someone said in some thread on digitalmars.d, if you start with structs it's easy to move towards classes. The other way around is not as easy. Andrei
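
A minimal sketch of the struct-plus-duck-typing approach, and of wrapping a struct range in an interface for dynamic use. The `Countdown` type and the `sum` function are invented. The primitive names `empty`/`front`/`popFront` and the Phobos helpers `isInputRange`, `InputRange` and `inputRangeObject` are those of present-day D, not necessarily what the draft under discussion specified.

```d
import std.range : InputRange, inputRangeObject;
import std.range.primitives : isInputRange;

// Nothing marks this struct as a range except its shape.
struct Countdown
{
    int current;

    @property bool empty() const { return current <= 0; }
    @property int front() const { return current; }
    void popFront() { --current; }
}

// Checked structurally at compile time; no base class is involved.
static assert(isInputRange!Countdown);

// Generic over anything that has the right shape.
int sum(R)(R r) if (isInputRange!R)
{
    int total = 0;
    foreach (e; r) total += e;
    return total;
}

void main()
{
    // Static use: the struct is passed directly.
    assert(sum(Countdown(4)) == 10); // 4 + 3 + 2 + 1

    // Dynamic use: the same struct wrapped behind a class interface.
    InputRange!int dyn = inputRangeObject(Countdown(3));
    assert(sum(dyn) == 6);           // 3 + 2 + 1
}
```

Going from the struct to the interface is a one-line wrap, which is the "easy to move towards classes" direction mentioned above.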
Sep 08 2008
Alix Pexton <_a_l_i_x_._p_e_x_t_o_n_ _g_m_a_i_l_._c_o_m_> writes:
Andrei Alexandrescu wrote:
(why the heck don't they call it structural typing...) 
 
 Andrei

If it were the norm to call it "structural typing" and someone asked why it was called so, the one being asked would likely have to resort to diagrams and gesticulation in order to adequately convey the reasoning. With "duck typing" a simple and widespread saying* is all that is required. A... * "If it walks like a duck and it quacks like a duck, then it's a duck!"
Sep 09 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Alix Pexton wrote:
 Andrei Alexandrescu wrote:
 (why the heck don't they call it structural typing...)
 Andrei

If it were the norm to call it "structural typing" and someone asked why it was called so, the one being asked would likely have to resort to diagrams and gesticulation in order to adequately convey the reasoning. With "duck typing" a simple and widespread saying* is all that is required. A... * "If it walks like a duck and it quacks like a duck, then it's a duck!"

Yeah, I know. It's a good point. Yet I'm somehow wary of cultural references in formalisms. Andrei
Sep 09 2008
Walter Bright <newshound1 digitalmars.com> writes:
Brad Roberts wrote:
 Probably rhetorical, but I can't help myself:  If it walks like a duck
 and it talks like a duck, it must be a duck.

And if it floats and has a long nose, it's a witch!! Burn her!!!
Sep 09 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:ga6pq4$hqt$1 digitalmars.com...
 Brad Roberts wrote:
 Probably rhetorical, but I can't help myself:  If it walks like a duck
 and it talks like a duck, it must be a duck.

And if it floats and has a long nose, it's a witch!! Burn her!!!

And she's got a wart! :) -steve
Sep 09 2008
Brad Roberts <braddr puremagic.com> writes:
 Great question. We'll go with structs and duck typing (why the heck
 don't they call it structural typing...) but we'll add interfaces for
 the range types so that they can be used dynamically too. Code
 generation will take care of gluing implementations into classes (more
 on that later).

 Andrei

Probably rhetorical, but I can't help myself: If it walks like a duck and it talks like a duck, it must be a duck.
Sep 08 2008
Derek Parnell <derek nomail.afraid.org> writes:
On Mon, 08 Sep 2008 22:22:04 -0400, Robert Jacques wrote:
 Essentially a slice of a matrix is another matrix 

I see it a little differently. To me, a slice of a matrix is a set of data that can be used to construct a new matrix, among other uses of course. Because a matrix can be sliced and diced in very many ways, some of which are clearly not matrices, the best we can generalize about a matrix slice is just that it is a set of data values. How one uses those values is dependent, to a degree, on what sort of slice created the set.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
9/09/2008 6:30:46 PM
Sep 09 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 - what's a collection?  How do you get a range out of there?  Collection 
 should be a range itself, like an array.  But it shouldn't be destroyed 
 after passing it to foreach().  How to achieve this if foreach() 
 essentially uses getNext()?

These are great questions, I'm glad you asked. The way I see things, D2 ranges can be of two kinds: owned and unowned. For example D1's ranges are all unowned: [snip] A better design would be to define collections that own their contents. For example: Array!(int) a(100); This time a does own the underlying array. You'd be able to get ranges all over it: int[] b = a.all;

I really don't like to have basic language constructs implemented as templates. It's like Tuple!() which is sorta basic type but requires template trickery to really work with it.
 So now we have two nice notions: Arrays own the data. Ranges walk over 
 that data. An array can have many ranges crawling over it. But two 
 arrays never overlap. The contents of the array will be destroyed (BUT 
 NOT DELETED) when a goes out of scope.

This invalidates the idea of safe manipulations with data no matter where you've got that data from.
 About the "collection should be a range itself" mantra, I've had a 
 micro-epiphany. Since D's slices so nicely model at the same time arrays 
 and their ranges, it is very seductive to think of carrying that to 
 other collection types. But I got disabused of that notion as soon as I 
 wanted to define a less simple data structure. Consider a matrix:
 
 auto a = BlockMatrix!(float, 3)(100, 200, 300);
 
 defines a block contiguous matrix of three dimensions with the 
 respective sizes. Now a should be the matrix AND its range at the same 
 time. But what's "the range" of a matrix? Oops. As soon as you start to 
 think of it, so many darn ranges come to mind.

If you cannot think of a natural default range for your collection, well, it's your decision not to implement the range interface for it. But if it does have natural iteration semantics, it should be possible to implement:

auto a = new File("name");
auto b = new TreeSet!(char);

foreach(ch; a)
  b.insert(ch);

foreach(ch; b)
    writeln("unique char ", ch);

Here is the problem. First foreach() naturally and expectedly changes the state of an object a. Second foreach() naturally and expectedly does not make changes to object b.

Solution:

File is an Input range in your notion. It supports isEmpty() and getNext(), it is non-copyable (but, note, referenceable).

TreeSet is a Collection, which you don't discuss. It implements opSlice() without arguments, which is required and sufficient to define a collection. opSlice() must return a range that, at least, supports input range operations.

foreach() checks if a passed object implements opSlice() so that it can iterate non-destructively. If no opSlice() is provided, it falls back to getNext().
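A minimal sketch of the dispatch rule proposed in this post, written as a library function rather than as a change to foreach() itself. The names Countdown, Bag and forEach are hypothetical and exist only for illustration; isEmpty(), getNext() and opSlice() are the primitives named in the thread.

```d
// Hypothetical input range: iterating it consumes it.
struct Countdown
{
    int n;
    bool isEmpty() { return n == 0; }
    int getNext() { return n--; }
}

// Hypothetical collection: owns its items and hands out ranges via opSlice().
struct Bag
{
    int[] items;

    static struct Range
    {
        int[] rest;
        bool isEmpty() { return rest.length == 0; }
        int getNext()
        {
            auto front = rest[0];
            rest = rest[1 .. $];
            return front;
        }
    }

    Range opSlice() { return Range(items); }
}

// Stand-in for the check the post asks foreach() to perform.
void forEach(alias fun, C)(ref C c)
{
    static if (is(typeof(c[])))
    {
        // Collection: iterate a fresh range, leaving c untouched.
        auto r = c[];
        while (!r.isEmpty())
            fun(r.getNext());
    }
    else
    {
        // Plain input range: fall back to getNext(), consuming c.
        while (!c.isEmpty())
            fun(c.getNext());
    }
}
```

With these definitions a Bag can be traversed any number of times, while a Countdown is empty after one traversal, which is the File versus TreeSet distinction drawn above.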
 a) A matrix owns its stuff and is preoccupied with storage internals, 
 allocation, and the such.
 
 b) The matrix defines as many range types as it wants.
 
 c) Users use the ranges.

No problem. The matrix lives as long as a range refers to it. As expected.
 Inevitably naysayers will, well, naysay: D defined a built-in array, but 
 it also needs Array, so built-in arrays turned out to be useless. So how 
 is that better than C++ which has pointers and vector? Walter has long 
 feared such naysaying and opposed addition of user-defined array types 
 to Phobos. But now I am fully prepared to un-naysay the naysayers: 
 built-in slices are a superior building block to naked pointers. They 
 are in fact embodying a powerful concept, that of a range. With ranges 
 everything there is can be built efficiently and safely. Finally, 
 garbage collection helps by ensuring object ownership while preserving 
 well-definedness of incorrect code.

Slices need to implement random access range interface, that's all.
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 - what's a collection?  How do you get a range out of there?  Collection 
 should be a range itself, like an array.  But it shouldn't be destroyed 
 after passing it to foreach().  How to achieve this if foreach() 
 essentially uses getNext()?

ranges can be of two kinds: owned and unowned. For example D1's ranges are all unowned: [snip]

A better design would be to define collections that own their contents. For example:

Array!(int) a(100);

This time a does own the underlying array. You'd be able to get ranges all over it:

int[] b = a.all;

I really don't like to have basic language constructs implemented as templates. It's like Tuple!() which is sorta basic type but requires template trickery to really work with it.

Well I guess we disagree on a number of issues here. The problem with "sorta basic" types is that the list could go on forever. I'd rather use a language that allows creation of good types from a small core, instead of one that tries to supplant all sorta basic types it could think of.
 So now we have two nice notions: Arrays own the data. Ranges walk over 
 that data. An array can have many ranges crawling over it. But two 
 arrays never overlap. The contents of the array will be destroyed (BUT 
 NOT DELETED) when a goes out of scope.

This invalidates the idea of safe manipulations with data no matter where you've got that data from.

Manipulation remains typesafe. The problem is that sometimes we want to ensure timely termination of certain resources.
 About the "collection should be a range itself" mantra, I've had a 
 micro-epiphany. Since D's slices so nicely model at the same time arrays 
 and their ranges, it is very seductive to think of carrying that to 
 other collection types. But I got disabused of that notion as soon as I 
 wanted to define a less simple data structure. Consider a matrix:

 auto a = BlockMatrix!(float, 3)(100, 200, 300);

 defines a block contiguous matrix of three dimensions with the 
 respective sizes. Now a should be the matrix AND its range at the same 
 time. But what's "the range" of a matrix? Oops. As soon as you start to 
 think of it, so many darn ranges come to mind.

If you cannot think of a natural default range for your collection--- well, it's your decision not to implement range interface for it. But if it does have a natural iteration semantics, it should be possible to implement:

It's not that I can't think of one. It's that I think of too many.
 auto a = new File("name");
 auto b = new TreeSet!(char);
 
 foreach(ch; a)
   b.insert(ch);
 
 foreach(ch; b)
     writeln("unique char ", ch);
 
 Here is the problem.  First foreach() naturally and expectedly changes 
 the state of an object a.  Second foreach() naturally and expectedly 
 does not make changes to object b.
 
 Solution:
 
 File is an Input range in your notion.  It supports isEmpty() and 
 getNext(), it is non-copyable (but, note, referenceable).

You left a crucial detail out. What does getNext() return?

In the new std.stdio design, a File is preoccupied with opening, closing, and transferring data for the underlying file. On top of it several input ranges can be constructed - that read lines, blocks, parse text, format text, and so on. (One thing I want is to allow std.algorithm to work with I/O easily and naturally.)

I fully understand there can be so many design choices in handling all this stuff, it's not even funny. I can't get excited about an equivalent solution that to me has no obvious advantages. I can even less get excited about a solution that I have objections to. That doesn't mean someone else can't get excited over it, and probably rightly so.
 TreeSet is a Collection, which you don't discuss.  It implements opSlice
 () without arguments, which is required and sufficient to define a 
 collection.  opSlice() must return a range that, at least, supports 
 input range operations.
 
 foreach() checks if a passed object implements opSlice() so that it can 
 iterate non-destructively. If no opSlice() is provided, it falls back to 
 getNext().
 
 a) A matrix owns its stuff and is preoccupied with storage internals, 
 allocation, and the such.

 b) The matrix defines as many range types as it wants.

 c) Users use the ranges.

No problem. The matrix lives as long as a range refers to it. As expected.

I, too, wish reference counting were a solution to everything.

Andrei
Sep 09 2008
parent Sergey Gromov <snake.scaly gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 A better design would be to define collections that own their contents. 
 For example:

 Array!(int) a(100);

 This time a does own the underlying array. You'd be able to get ranges 
 all over it:

 int[] b = a.all;

I really don't like to have basic language constructs implemented as templates. It's like Tuple!() which is sorta basic type but requires template trickery to really work with it.

Well I guess we disagree on a number of issues here. The problem with "sorta basic" types is that the list could go on forever. I'd rather use a language that allows creation of good types from a small core, instead of one that tries to supplant all sorta basic types it could think of.

I think I've got your point here. D is not Python, it shouldn't do anything high-level in the core. The notion of range is sufficient to iterate through anything, core (namely foreach) doesn't need to be aware of the collections themselves.

Though I'm not fully convinced. It's always good to have good defaults. So that you could quickly throw things together, and attend to details later. So that you could write

string[string] dic;
foreach(k, v; dic) whatever;

Can I do this with your Array!()? Or should I always use all() even though the Array!() is plain linear and obvious?
 auto a = new File("name");
 auto b = new TreeSet!(char);
 
 foreach(ch; a)
   b.insert(ch);
 
 foreach(ch; b)
     writeln("unique char ", ch);
 
 Here is the problem.  First foreach() naturally and expectedly changes 
 the state of an object a.  Second foreach() naturally and expectedly 
 does not make changes to object b.
 
 Solution:
 
 File is an Input range in your notion.  It supports isEmpty() and 
 getNext(), it is non-copyable (but, note, referenceable).

You left a crucial detail out. What does getNext() return?

Something. Documented. I'd be happy with string, that is, line by line iteration. It's nice for text dumping, simple configurations, simple internet protocols like POP and FTP, user interaction. You see, some useful default. The File could also provide byte range bytes() and dchar range chars() and whatever the author considered feasible.

Note that I wasn't trying to convince you to change the stdio design. My choice of class names was bad. I should have used MyFile instead of File, meaning some user class with user functionality.
 In the new std.stdio design, a File is preoccupied with opening,
 closing, and transferring data for the underlying file. On top of it
 several input ranges can be constructed - that read lines, blocks, parse
 text, format text, and so on. (One thing I want is to allow
 std.algorithm to work with I/O easily and naturally.)

OK, you like this design, no problem. Better even. Your File is naturally iterable over its bytes. Any low-level file is, anybody knows that. I can see no reason to deny foreach() over a file.
 foreach() checks if a passed object implements opSlice() so that it can 
 iterate non-destructively. If no opSlice() is provided, it falls back to 
 getNext().


Sep 10 2008
prev sibling next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on September 8 at 19:24, you wrote to me:
 1.5) Allow object ownership but also make the behavior of incorrect code 
 well-defined so it can be reasoned about, reproduced, and debugged.
 
 That's why I think an Array going out of scope should invoke destructors for
 its data, and then obliterate contents with ElementType.init. That way, an
 Array!(File) will properly close all files AND put them in a "closed" state.
 At the same time, the memory associated with the array will NOT be deallocated,
 so a range surviving the array will never crash unrelated code, but instead will
 see closed files all over.

Why is it so bad that the program crashes if you do something wrong? For how long will the memory stay "alive"? Will it use "regular" GC semantics (i.e., until nobody points at it anymore)? In that case, wouldn't letting the programmer keep dangling pointers to data that should be "dead", without crashing, make it easier to introduce memory leaks?

--
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Did you know the originally a Danish guy invented the burglar-alarm
unfortunately, it got stolen
Sep 09 2008
parent reply superdan <super dan.org> writes:
Leandro Lucarella Wrote:

 Andrei Alexandrescu, on September 8 at 19:24, you wrote to me:
 1.5) Allow object ownership but also make the behavior of incorrect code 
 well-defined so it can be reasoned about, reproduced, and debugged.
 
 That's why I think an Array going out of scope should invoke destructors for
 its data, and then obliterate contents with ElementType.init. That way, an
 Array!(File) will properly close all files AND put them in a "closed" state.
 At the same time, the memory associated with the array will NOT be deallocated,
 so a range surviving the array will never crash unrelated code, but instead will
 see closed files all over.

Why is it so bad that the program crashes if you do something wrong?

it's not bad. it's good if it crashes. problem is when it don't crash and continues running on oil instead of gas if you see what i mean.
 For how
 long you will have the memory "alive", it will use "regular" GC semantics
 (i.e., when nobody points at it anymore)? In that case, letting the
 programmer leave dangling pointers to data that should be "dead" without
 crashing, wouldn't make easier to introduce memory leaks?

such is the peril of gc. clearly meshing scoping with gc ain't gonna be perfect. but i like the next best thing. scarce resources are deallocated quick. memory stays around for longer. no dangling pointers.
Sep 09 2008
parent Leandro Lucarella <llucax gmail.com> writes:
superdan, on September 9 at 10:12, you wrote to me:
 For how
 long you will have the memory "alive", it will use "regular" GC semantics
 (i.e., when nobody points at it anymore)? In that case, letting the
 programmer leave dangling pointers to data that should be "dead" without
 crashing, wouldn't make easier to introduce memory leaks?

such is the peril of gc. clearly meshing scoping with gc ain't gonna be perfect. but i like the next best thing. scarce resources are deallocated quick. memory stays around for longer. no dangling pointers.

I was talking about logical[1] memory leaks, which are possible even with GC.

int[] a;
if (condition) {
   Array!(int) b;
   a = b.all;
}

If you expect that b's memory is freed at the end of the scope, and a retains it, that is a logical memory leak (you probably forgot to null a before the scope ended). I think this kind of error should be detected as soon as possible, as opposed to letting a keep using that memory (or leak it).

[1] http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Benefits

--
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Since I was little I wanted to be a doctor
But then I got sick and became a musician
Sep 09 2008
prev sibling next sibling parent reply downs <default_357-line yahoo.de> writes:
Andrei Alexandrescu wrote:
 What's the deal with destroyed but not deleted? Consider:
 
 int[] a;
 if (condition) {
    Array!(int) b;
    a = b.all;
 }
 writeln(a);
 
 This is a misuse of the array in that a range crawling on its back has
 survived the array itself. What should happen now? Looking at other
 languages:
 
 1) All Java objects are unowned, meaning the issue does not appear in
 the first place, which is an advantage. The disadvantage is that scarce
 resources must be carefully managed by hand in client code.
 
 2) C++ makes the behavior undefined because it destroys data AND
 recycles memory as soon as the array goes out of scope. Mayhem ensues.
 
 We'd like:
 
 1.5) Allow object ownership but also make the behavior of incorrect code
 well-defined so it can be reasoned about, reproduced, and debugged.
 
 That's why I think an Array going out of scope should invoke destructors
 for its data, and then obliterate contents with ElementType.init. That
 way, an Array!(File) will properly close all files AND put them in a
 "closed" state. At the same time, the memory associated with the array
 will NOT be deallocated, so a range surviving the array will never crash
 unrelated code, but instead will see closed files all over.
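The "destroyed but not deleted" behaviour quoted above can be sketched roughly as follows. OwnedArray is a hypothetical name chosen to avoid suggesting this is the proposed Array!(T); the sketch assumes value-type elements and GC-allocated storage, and only illustrates the idea.

```d
// Hypothetical owning array: on scope exit the elements are destroyed and
// reset to T.init, but the memory is left for the garbage collector.
struct OwnedArray(T)
{
    private T[] data;

    this(size_t n) { data = new T[n]; }

    // No copies, so two owning arrays never overlap.
    @disable this(this);

    ~this()
    {
        // For struct elements destroy() runs the destructor and then resets
        // the value to T.init; for basic types it just resets to T.init.
        foreach (ref e; data)
            destroy(e);
        // Deliberately no deallocation: a surviving range stays valid and
        // merely sees "blank" elements.
    }

    // A range (here a plain slice) over the whole contents.
    T[] all() { return data; }
}
```

A slice obtained from all() that outlives the array keeps pointing at valid memory whose elements read as T.init, so the misuse is reproducible rather than undefined.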
 

I don't think this is a good thing, for reasons similar to the Error/Exception flaw - specifically, code that works in debug mode might end up failing in release mode.

To explain what I mean by Error/Exception flaw, consider the case of an array out of bounds error, wrapped carelessly in a try .. catch (Exception) block. This would work fine in debug mode, and presumably retry the operation until it succeeded. In release mode, however, the above would crash.

This is clearly undesirable, and arises directly from the fact that Error is derived from Exception, not the other way around or completely separate (as it clearly should be). After all, an Error ![is] an Exception, since Exceptions are clearly defined as recoverable errors, and the set of unrecoverable errors is obviously not a subset of the recoverable ones.

This leads to my actual point: I suggest an extension of .init: the .fail state, indicating data that should not be accessed. Any standard library function that encounters data that is intentionally in the .fail state should throw an Error. For instance, the .fail state for strings could be a deliberately invalid UTF8 sequence.

When this state could reasonably come up in normal operations, it is recommended to use values that will readily be visible in a debugger, such as the classical 0xDEADBEEF.

This is imnsho superior to using .init to fill this memory, which doesn't tell the debugging programmer much about what exactly happened, and furthermore, might cause the program to treat "invalid" memory the same as "fresh" memory, if only by accident.

--downs
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
downs wrote:
 Andrei Alexandrescu wrote:
 What's the deal with destroyed but not deleted? Consider:
 
 int[] a; if (condition) { Array!(int) b; a = b.all; } writeln(a);
 
 This is a misuse of the array in that a range crawling on its back
 has survived the array itself. What should happen now? Looking at
 other languages:
 
 1) All Java objects are unowned, meaning the issue does not appear
 in the first place, which is an advantage. The disadvantage is that
 scarce resources must be carefully managed by hand in client code.
 
 2) C++ makes the behavior undefined because it destroys data AND 
 recycles memory as soon as the array goes out of scope. Mayhem
 ensues.
 
 We'd like:
 
 1.5) Allow object ownership but also make the behavior of incorrect
 code well-defined so it can be reasoned about, reproduced, and
 debugged.
 
 That's why I think an Array going out of scope should invoke
 destructors for its data, and then obliterate contents with
 ElementType.init. That way, an Array!(File) will properly close all
 files AND put them in a "closed" state. At the same time, the
 memory associated with the array will NOT be deallocated, so a
 range surviving the array will never crash unrelated code, but
 instead will see closed files all over.
 

I don't think this is a good thing, for reasons similar to the Error/Exception flaw - specifically, code that works in debug mode might end up failing in release mode. To explain what I mean by Error/Exception flaw, consider the case of an array out of bounds error, wrapped carelessly in a try .. catch (Exception) block. This would work fine in debug mode, and presumably retry the operation until it succeeded. In release mode, however, the above would crash. This is clearly undesirable, and arises directly from the fact that Error is derived from Exception, not the other way around or completely separate (as it clearly should be). After all, an Error ![is] an Exception, since Exceptions are clearly defined as recoverable errors, and the set of unrecoverable errors is obviously not a subset of the recoverable ones. This leads to my actual point: I suggest an extension of .init: the .fail state, indicating data that should not be accessed. Any standard library function that encounters data that is intentionally in the .fail state should throw an Error. For instance, the .fail state for strings could be a deliberately invalid UTF8 sequence. When this state could reasonably come up in normal operations, it is recommended to use values that will readily be visible in a debugger, such as the classical 0xDEADBEEF. This is imnsho superior to using .init to fill this memory, which doesn't tell the debugging programmer much about what exactly happened, and furthermore, might cause the program to treat "invalid" memory the same as "fresh" memory, if only by accident.

I hear you. I brought up the same exact design briefly with Bartosz last week. We called it T.invalid. He argued in favor of it. I thought it brings more complication than it's worth and was willing to go with T.init for simplicity's sake. Why deal with two empty states instead of one?

One nagging question is, what is T.fail for integral types? For pointers fine, one could be found. For chars, fine too. But for integrals I'm not sure that e.g. T.min or T.max is a credible fail value.

Andrei
Sep 09 2008
next sibling parent reply downs <default_357-line yahoo.de> writes:
Andrei Alexandrescu wrote:
 downs wrote:
 Andrei Alexandrescu wrote:
 What's the deal with destroyed but not deleted? Consider:

 int[] a; if (condition) { Array!(int) b; a = b.all; } writeln(a);

 This is a misuse of the array in that a range crawling on its back
 has survived the array itself. What should happen now? Looking at
 other languages:

 1) All Java objects are unowned, meaning the issue does not appear
 in the first place, which is an advantage. The disadvantage is that
 scarce resources must be carefully managed by hand in client code.

 2) C++ makes the behavior undefined because it destroys data AND
 recycles memory as soon as the array goes out of scope. Mayhem
 ensues.

 We'd like:

 1.5) Allow object ownership but also make the behavior of incorrect
 code well-defined so it can be reasoned about, reproduced, and
 debugged.

 That's why I think an Array going out of scope should invoke
 destructors for its data, and then obliterate contents with
 ElementType.init. That way, an Array!(File) will properly close all
 files AND put them in a "closed" state. At the same time, the
 memory associated with the array will NOT be deallocated, so a
 range surviving the array will never crash unrelated code, but
 instead will see closed files all over.

I don't think this is a good thing, for reasons similar to the Error/Exception flaw - specifically, code that works in debug mode might end up failing in release mode. To explain what I mean by Error/Exception flaw, consider the case of an array out of bounds error, wrapped carelessly in a try .. catch (Exception) block. This would work fine in debug mode, and presumably retry the operation until it succeeded. In release mode, however, the above would crash. This is clearly undesirable, and arises directly from the fact that Error is derived from Exception, not the other way around or completely separate (as it clearly should be). After all, an Error ![is] an Exception, since Exceptions are clearly defined as recoverable errors, and the set of unrecoverable errors is obviously not a subset of the recoverable ones. This leads to my actual point: I suggest an extension of .init: the .fail state, indicating data that should not be accessed. Any standard library function that encounters data that is intentionally in the .fail state should throw an Error. For instance, the .fail state for strings could be a deliberately invalid UTF8 sequence. When this state could reasonably come up in normal operations, it is recommended to use values that will readily be visible in a debugger, such as the classical 0xDEADBEEF. This is imnsho superior to using .init to fill this memory, which doesn't tell the debugging programmer much about what exactly happened, and furthermore, might cause the program to treat "invalid" memory the same as "fresh" memory, if only by accident.

I hear you. I brought up the same exact design briefly with Bartosz last week. We called it T.invalid. He argued in favor of it. I thought it brings more complication than it's worth and was willing to go with T.init for simplicity's sake. Why deal with two empty states instead of one. One nagging question is, what is T.fail for integral types? For pointers fine, one could be found. For chars, fine too. But for integrals I'm not sure that e.g. T.min or T.max is a credible fail value. Andrei

For numbers, it should probably be "the same as .init". Not every error condition can be detected, sadly. It would also be nice if a .fail value could be provided as an extension to typedef somehow .. user defined types will probably have their own possible failure indicators.
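As a rough illustration of the .fail idea and of the integral-type problem raised in the reply, here is a sketch with hypothetical names failValue and looksFailed. It relies only on facts about D's built-in types: floating-point .init is NaN, and char.init is 0xFF, which is not a valid UTF-8 code unit.

```d
import std.math : isNaN;
import std.traits : isFloatingPoint;

// Hypothetical stand-in for the proposed T.fail property.
template failValue(T)
{
    static if (isFloatingPoint!T)
        enum T failValue = T.nan;   // NaN already marks "no valid value"
    else
        enum T failValue = T.init;  // chars: 0xFF; integers: 0, same as fresh data
}

// Reports whether a value is indistinguishable from the fail state.
bool looksFailed(T)(T value)
{
    static if (isFloatingPoint!T)
        return isNaN(value);        // NaN never compares equal, so test explicitly
    else
        return value == failValue!T;
}
```

For int the check cannot tell a failed element from a legitimate zero, which is the dilution of benefit discussed in the replies.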
Sep 09 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
downs wrote:
 For numbers, it should probably be "the same as .init". Not every
 error condition can be detected, sadly.

That further dilutes the benefits of T.fail.
 It would also be nice if a .fail value could be provided as an
 extension to typedef somehow .. user defined types will probably have
 their own possible failure indicators.

That further increases the cognitive cost of T.fail. Not putting you down. I think the notion is good, but I think we need to thoroughly understand its costs and benefits before even raising it to Walter's level of consciousness :o). Andrei
Sep 09 2008
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 I hear you. I brought up the same exact design briefly with Bartosz last 
 week. We called it T.invalid. He argued in favor of it. I thought it 
 brings more complication than it's worth and was willing to go with 
 T.init for simplicity's sake. Why deal with two empty states instead of 
 one.
 
 One nagging question is, what is T.fail for integral types? For pointers 
 fine, one could be found. For chars, fine too. But for integrals I'm not 
 sure that e.g. T.min or T.max is a credible fail value.

The T.init value should be that. That's why, for floats, float.init is a NaN. But for many types, there is no such thing as an invalid value, so it really doesn't work for generic code.
Sep 09 2008
next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 I hear you. I brought up the same exact design briefly with Bartosz 
 last week. We called it T.invalid. He argued in favor of it. I thought 
 it brings more complication than it's worth and was willing to go with 
 T.init for simplicity's sake. Why deal with two empty states instead 
 of one.

 One nagging question is, what is T.fail for integral types? For 
 pointers fine, one could be found. For chars, fine too. But for 
 integrals I'm not sure that e.g. T.min or T.max is a credible fail value.

The T.init value should be that. That's why, for floats, float.init is a NaN. But for many types, there is no such thing as an invalid value, so it really doesn't work for generic code.

I don't think values necessarily have to be initialized to an invalid value. You could certainly argue that NaN values are valid results of certain computations, and that they're valid in certain contexts. The important thing is that they're *uncommon*, and if you see them cropping up all over the place where they shouldn't, you know you have an initialization problem somewhere in your code.

The same thing could be true for integers, but zero is such a common value that it's tough to spot the origin of the error. If signed integers were initialized to min_value and unsigned integers were initialized to max_value, I think those initialization errors would be easier to track down. Not because the values are illegal, but because they're *uncommon*.

--benji
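D's per-field default initializers already allow a user-defined type to opt in to such uncommon defaults. A small sketch, with Sample being a hypothetical type used only for illustration:

```d
// Hypothetical record whose fields default to conspicuous values, so a
// forgotten initialization stands out in a debugger or in printed output.
struct Sample
{
    int  count = int.min;   // signed: most negative value
    uint index = uint.max;  // unsigned: largest value
}
```

This changes only the defaults of the fields that declare them; int.init and uint.init themselves remain 0.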
Sep 09 2008
parent JAnderson <ask me.com> writes:
Benji Smith wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 I hear you. I brought up the same exact design briefly with Bartosz 
 last week. We called it T.invalid. He argued in favor of it. I 
 thought it brings more complication than it's worth and was willing 
 to go with T.init for simplicity's sake. Why deal with two empty 
 states instead of one.

 One nagging question is, what is T.fail for integral types? For 
 pointers fine, one could be found. For chars, fine too. But for 
 integrals I'm not sure that e.g. T.min or T.max is a credible fail 
 value.

The T.init value should be that. That's why, for floats, float.init is a NaN. But for many types, there is no such thing as an invalid value, so it really doesn't work for generic code.

I don't think values necessarily have to be initialized to an invalid value. You could certainly argue that NaN values are valid results of certain computations, and that they're valid in certain contexts. The important thing is that they're *uncommon*, and if you see them cropping up all over the place where they shouldn't, you know you have an initialization problem somewhere in your code. The same thing could be true for integers, but zero is such a common value that it's tough to spot the origin of the error. If signed integers were initialized to min_value and unsigned integers were initialized to max_value, I think those initialization errors would be easier to track down. Not because the values are illegal, but because they're *uncommon*. --benji

I agree. I use the 0xcdcdcdcd and 0xfefefefe provided by MSVC a lot to track down errors. -Jowl
Sep 09 2008
prev sibling parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Walter Bright wrote:

 But for many types, there is no such thing as an invalid value

Why can one then define

| typedef int T=void; // T.init == void

-manfred
--
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Sep 10 2008
parent "Manfred_Nowak" <svv1999 hotmail.com> writes:
Brad Roberts wrote:

 That just means don't initialize

I know what the semantics of `T= void' is supposed to be, but your remark is only a result of Walter's overloading of meanings to keywords. It does not change the fact that for all types _one_ more possibility exists to (not) initialize it than it has legal values. If there is one more possibility, then there are many more, including the possibility that the initial value is in fact `void', i.e. illegal as an rvalue.

See http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.bugs&article_id=15041 for an example showing that D in fact uses `void' as an initialization value.

-manfred
--
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Sep 11 2008
prev sibling next sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 - the union operations look... weird.  Unobvious.  I'm too sleepy now to 
 propose anything better but I'll definitely give it a try.  The rest of 
 the interface seems very natural.

I agree I hadn't known what primitives would be needed when I sat down. Clearly there was a need for some since individual iterators are not available anymore. New ideas would be great; I suggest you validate them by implementing some nontrivial algorithms in std.algorithm with your, um, computational basis of choice :o).

r.before(s)
r.after(s)
r.begin
r.end

Here r.before(s) is everything from r's first element (inclusive) to s's first element (exclusive); r.after(s) is from s's last element (exclusive) to the last element of r (inclusive); r.begin is an empty range at the beginning of a parent range; and r.end is an empty range at the end of a parent range.

Therefore, according to your diagram:

r.toBegin(s) => r.before(s)
s.toEnd(r) => s.before(r.end)
s.fromEnd(r) => s.after(r)
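For built-in slices, the four primitives can be sketched as free functions called through member-style syntax. This is only an illustration under the assumption that both arguments are slices of the same underlying array; it uses raw pointer slicing, so nothing is bounds-checked beyond the two asserts.

```d
// From r's first element (inclusive) to s's first element (exclusive).
T[] before(T)(T[] r, T[] s)
{
    assert(r.ptr <= s.ptr, "s must not start before r");
    return r.ptr[0 .. cast(size_t)(s.ptr - r.ptr)];
}

// From s's last element (exclusive) to r's last element (inclusive).
T[] after(T)(T[] r, T[] s)
{
    auto start = s.ptr + s.length;
    auto stop = r.ptr + r.length;
    assert(start <= stop, "s must not end after r");
    return start[0 .. cast(size_t)(stop - start)];
}

// Empty range positioned at the beginning of r.
T[] begin(T)(T[] r) { return r[0 .. 0]; }

// Empty range positioned at the end of r.
T[] end(T)(T[] r) { return r[$ .. $]; }
```

With r = [1, 2, 3, 4, 5] and s = r[1 .. 3], r.before(s) covers the element 1, r.after(s) covers 4 and 5, and s.before(r.end) extends s to the end of r.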
Sep 10 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 - the union operations look... weird.  Unobvious.  I'm too sleepy now to 
 propose anything better but I'll definitely give it a try.  The rest of 
 the interface seems very natural.

Clearly there was a need for some since individual iterators are not available anymore. New ideas would be great; I suggest you validate them by implementing some nontrivial algorithms in std.algorithm with your, um, computational basis of choice :o).

r.before(s)
r.after(s)
r.begin
r.end

Here r.before(s) is everything from r's first element (inclusive) to s's first element (exclusive); r.after(s) is from s's last element (exclusive) to the last element of r (inclusive); r.begin is an empty range at the beginning of a parent range; and r.end is an empty range at the end of a parent range. Therefore, according to your diagram:

r.toBegin(s) => r.before(s)
s.toEnd(r) => s.before(r.end)
s.fromEnd(r) => s.after(r)

Cool! I was thinking of something along the same lines through the night, and actually with the same names before and after, but begin and end did not occur to me. As soon as I have another chunk of time, I'll make another pass through algorithm2 to see how these work. But you may as well take std.algorithm and make it work with your primitives.

Andrei
Sep 10 2008
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
Manfred_Nowak wrote:
 Walter Bright wrote:
 
 But for many types, there is no such thing as an invalid value

Why can one then define

| typedef int T=void; // T.init == void

-manfred

That just means don't initialize, leaving any instances with random values until first assignment. That doesn't mean that it contains an invalid value.

Later,
Brad
Sep 10 2008
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 09 Sep 2008 01:50:54 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Hello,


 Walter, Bartosz and myself have been hard at work trying to find the  
 right abstraction for iteration. That abstraction would replace the  
 infamous opApply and would allow for external iteration, thus paving the  
 way to implementing real generic algorithms.

 We considered an STL-style container/iterator design. Containers would  
 use the newfangled value semantics to enforce ownership of their  
 contents. Iterators would span containers in various ways.

 The main problem with that approach was integrating built-in arrays into  
 the design. STL's iterators are generalized pointers; D's built-in  
 arrays are, however, not pointers, they are "pairs of pointers" that  
 cover contiguous ranges in memory. Most people who've used D gained the  
 intuition that slices are superior to pointers in many ways, such as  
 easier checking for validity, higher-level compact primitives,  
 streamlined and safe interface. However, if STL iterators are  
 generalized pointers, what is the corresponding generalization of D's  
 slices? Intuitively that generalization should also be superior to  
 iterators.

 In a related development, the Boost C++ library has defined ranges as  
 pairs of two iterators and implemented a series of wrappers that accept  
 ranges and forward their iterators to STL functions. The main outcome of  
 Boost ranges has been to decrease the verboseness and perils of naked  
 iterator manipulation (see  
 http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a  
 C++ application using Boost could avail itself of containers, ranges,  
 and iterators. The Boost notion of range is very close to a  
 generalization of D's slice.

 We have considered that design too, but that raised a nagging question.  
 In most slice-based D programming, using bare pointers is not necessary.  
 Could then there be a way to use _only_ ranges and eliminate iterators  
 altogether? A container/range design would be much simpler than one also  
 exposing iterators.

 All these questions aside, there are several other imperfections in the  
 STL, many caused by the underlying language. For example STL is  
 incapable of distinguishing between input/output iterators and forward  
 iterators. This is because C++ cannot reasonably implement a type with  
 destructive copy semantics, which is what would be needed to make said  
 distinction. We wanted the Phobos design to provide appropriate answers  
 to such questions, too. This would be useful particularly because it  
 would allow implementation of true and efficient I/O integrated with  
 iteration. STL has made an attempt at that, but istream_iterator and  
 ostream_iterator are, with all due respect, a joke that builds on  
 another joke, the iostreams.

 After much thought and discussions among Walter, Bartosz and myself, I  
 defined a range design and reimplemented all of std.algorithm and much  
 of std.stdio in terms of ranges alone. This is quite a thorough test  
 because the algorithms are diverse and stress-test the expressiveness  
 and efficiency of the range design. Along the way I made the interesting  
 realization that certain union/difference operations are needed as  
 primitives for ranges. There are also a few bugs in the compiler and  
 some needed language enhancements (e.g. returning a reference from a  
 function); Walter is committed to implement them.

 I put together a short document for the range design. I definitely  
 missed about a million things and have been imprecise about another  
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html


 Andrei

1) There is a typo:

// Copies a range to another
void copy(R1, R2)(R1 src, R2 tgt)
{
    while (!src.isEmpty)
    {
        tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
    }
}

2) R.next and R.pop could have better names. They represent similar operations, yet the names are so different.

3) Walter mentioned that the built-in array could be re-implemented using a pair of pointers instead of ptr+length. Will it ever get a green light? It fits the range concept much better.

4) We need some way of supporting dollar notation in user containers. The hack of using __dollar is bad (although it works).

5) I don't quite like the names left and right! :) I think they should represent limits (pointers to begin and end, in the case of an array) rather than values. In this case, built-in arrays could be implemented as follows:

struct Array(T)
{
    T* left;
    T* right;
    size_t length() { return right-left; }
    ref T opIndex(size_t index) { return left[index]; }
    // etc
}

The rationale behind having access to range limits is to allow operations on them. For example,

R.left-=n;

could be used instead of

foreach(i; 0..n) {
    R.pop();
}

which is more efficient in many cases.

Other than that - great, I like it.
Sep 08 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 1) There is a typo:
 
 // Copies a range to another
 void copy(R1, R2)(R1 src, R2 tgt)
 {
     while (!src.isEmpty)
     {
         tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
     }
 }

Thanks! Fixed.
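For reference, a minimal sketch of the corrected copy in use. ArrayInput and ArrayOutput are hypothetical range types invented for this example; only the names isEmpty, getNext and putNext come from the draft. The output range writes into storage that already exists, so passing it by value still updates the destination array.

```d
import std.stdio;

// Hypothetical input range over an array, exposing isEmpty/getNext.
struct ArrayInput(T)
{
    T[] data;
    bool isEmpty() const { return data.length == 0; }
    T getNext()
    {
        auto x = data[0];
        data = data[1 .. $];
        return x;
    }
}

// Hypothetical output range that writes into preallocated storage.
struct ArrayOutput(T)
{
    T[] data;
    void putNext(T value)
    {
        data[0] = value;
        data = data[1 .. $];
    }
}

// Copies a range to another (corrected version from the post above).
void copy(R1, R2)(R1 src, R2 tgt)
{
    while (!src.isEmpty)
    {
        tgt.putNext(src.getNext);
    }
}

void main()
{
    int[] source = [1, 2, 3];
    auto dest = new int[](3);
    copy(ArrayInput!int(source), ArrayOutput!int(dest));
    writeln(dest);   // [1, 2, 3]
}
```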
 2) R.next and R.pop could have better names. I mean, they represent 
 similar operations yet names are so different.

I agree. Next was a natural choice. I stole pop from Perl. Any symmetric and short operation names would be welcome.
 3) Walter mentioned that built-in array could be re-implemented using a 
 pair of pointers instead of ptr+length. Will it ever get a green light? 
 It fits range concept much better.

Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.
 4) We need some way of supporting dollar notation in user containers. 
 The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that tells the library which argument position the dollar occurred in. Consider:

auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start would be to expand the above into:

auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];
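A minimal sketch of how a per-dimension dollar could look on a user-defined type. The Matrix struct is hypothetical, and the dimension is passed as a compile-time template argument (opDollar!0, opDollar!1) rather than the runtime uint argument suggested above; as far as I know, that template form is what current D compilers accept.

```d
import std.stdio;

// Hypothetical dense, row-major matrix; only what the example needs.
struct Matrix
{
    double[] data;
    size_t rows, cols;

    // One '$' per index position: dim 0 is the row count, dim 1 the column count.
    size_t opDollar(size_t dim)() const
    {
        static if (dim == 0) return rows;
        else static if (dim == 1) return cols;
        else static assert(0, "Matrix has only two dimensions");
    }

    ref double opIndex(size_t i, size_t j)
    {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
}

void main()
{
    auto m = Matrix(new double[](6), 2, 3);
    m.data[] = 0;

    // Each '$' is resolved against its own position:
    // m.opIndex(m.opDollar!0 - 1, m.opDollar!1 - 1)
    m[$ - 1, $ - 1] = 42;

    writeln(m[1, 2]);                           // 42
    writeln(m.opDollar!0, " ", m.opDollar!1);   // 2 3
}
```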
 5) I don't quite like names left and right! :) I think they should 
 represent limits (pointers to begin and end, in case of array) rather 
 than values. In this case, built-in arrays could be implemented as follows:
 
 struct Array(T)
 {
     T* left;
     T* right;
     size_t length() { return right-left; }
     ref T opIndex(size_t index) { return left[index]; }
     // etc
 }
 
 The rationale behind having access to range limits is to allow 
 operations on them. For example,
 R.left-=n;

I disagree. Defining operations on range limits opens a box that would make Pandora jealous:

1. What is the type of left in general? Um, let's define Iterator!(R) for each range R.

2. What are the primitives of an iterator? Well, -= sounds good. How do you check it for correctness? In fact, how do you check any operation of a naked iterator for correctness?

3. I want to play with some data. What should I use here, ranges or iterators? ...

Much of the smarts of the range design is that it gets away WITHOUT having to answer embarrassing questions such as the above. Ranges are rock-solid, and part of them being rock-solid is that they expose enough primitives to be complete, but at the same time do not expose dangerous internals.
 could be used instead of
 foreach(i; 0..n) {
     R.pop();
 }
 
 which is more efficient in many cases.

Stop right there. That's not a primitive. It is an algorithm that gets implemented in terms of a primitive. I disagree that such an algorithm is an operator and does not have a name such as popN.
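A minimal sketch of popN as a named algorithm layered on the pop primitive. Both range types and the dropRight member are hypothetical, invented for the example; the point is only that a range able to shrink in one step can offer that, while every other range falls back to n calls of pop.

```d
import std.stdio;

// Hypothetical range over an int array; pop shrinks it from the right,
// as in the draft. dropRight is an invented O(1) bulk operation.
struct SliceRange
{
    int[] data;
    bool isEmpty() const { return data.length == 0; }
    void pop() { data = data[0 .. $ - 1]; }
    void dropRight(size_t n) { data = data[0 .. $ - n]; }
}

// Hypothetical range of integers [lo, hi) with no bulk operation.
struct Interval
{
    int lo, hi;
    bool isEmpty() const { return lo >= hi; }
    void pop() { --hi; }
}

// Uses the bulk operation when the range offers one, and falls back
// to n calls of the pop primitive otherwise.
void popN(R)(ref R r, size_t n)
{
    static if (__traits(hasMember, R, "dropRight"))
        r.dropRight(n);
    else
        foreach (i; 0 .. n)
            r.pop();
}

void main()
{
    auto a = SliceRange([1, 2, 3, 4, 5]);
    a.popN(2);
    writeln(a.data);   // [1, 2, 3]

    auto b = Interval(0, 10);
    b.popN(4);
    writeln(b.hi);     // 6
}
```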
 Other than that - great, I like it.

Thanks for your comments. Andrei
Sep 08 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented using 
 a pair of pointers instead of ptr+length. Will it ever get a green 
 light? It fits range concept much better.

Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.

I'd warn that changing away from ptr+length would create logical inconsistencies between 1D arrays and 2D/3D/ND arrays.

How so?
 4) We need some way of supporting dollar notation in user containers. 
 The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that tells the library which argument position the dollar occurred in. Consider:

auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start would be to expand the above into:

auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

I'd also add that multiple dimension slicing should be supported, i.e.

auto x = matrix[2..5,0..$,3]

would become

auto x = matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(1)),3)

with

struct Slice (T) { T start; T end; }

Strided slices would also be nice, i.e.

matrix[0..$:10] // decimate the array

Multidimensional slicing can be implemented with staggered indexing: matrix[2..5][0..$][3] means: first, take a slice 2..5 that returns a matrix range one dimension smaller. Then, for that type take a slice from 0 to $. And so on. This works great for row-wise storage. I'm not sure how efficient it would be for other storage schemes. Note how nice the distinction between the container and its views works: there is only one matrix. But there are many ranges and subranges within it, bearing various relationships with one another. Andrei
Sep 08 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented 
 using a pair of pointers instead of ptr+length. Will it ever get a 
 green light? It fits range concept much better.

Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.

inconsistencies between 1D arrays and 2D/3D/ND arrays.

How so?

An ND array is typically defined as a fat pointer like so:

struct array(T,size_t N) {
    T* ptr;
    size_t[N] lengths;          // of each dimension
    ptrdiff_t[N] byte_strides;  // of each dimension
}

So a 1D array is

{
    T* ptr;
    size_t lengths;
    ptrdiff_t byte_strides = T.sizeof; // Currently a compile time constant in the built-in array
    size_t length() { return lengths; }
}

which is logically consistent with a general dense matrix and, aside from some name changes and the stride being a compile time constant, is identical to the current D arrays. However, { T* first; T* last } may not be logically extended to ND arrays, particularly sliced ND arrays, as T* last no longer has any meaning.

Hmmm, I see. That could become a problem if we wanted lower-dimensional matrices to be prefixes of higher-dimensional matrices. This is a worthy goal, but one that my matrices don't pursue.
 4) We need some way of supporting dollar notation in user 
 containers. The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that tells the library which argument position the dollar occurred in. Consider:

auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start would be to expand the above into:

auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

auto x = matrix[2..5,0..$,3]

would become

auto x = matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(1)),3)

with

struct Slice (T) { T start; T end; }

Strided slices would also be nice, i.e.

matrix[0..$:10] // decimate the array

Multidimensional slicing can be implemented with staggered indexing: matrix[2..5][0..$][3]

Yes, but doing so utilizes expression templates and is relatively slow:

matrix_row_slice temp1 = matrix.opSlice(2,5);
matrix_col_slice temp2 = temp1.opSlice(0,$);
matrix = temp2.opIndex(3);

And causes code bloat. Worse, matrix[2..5] by itself would be an unstable type. Either foo(matrix[2..5]) would not compile or it would generate code bloat and hard to find logic bugs. (Due to the fact that you've embedded the dimension of the slice operation into the type).

What is an unstable type? There is no use of expression templates, but indeed multiple slices are created. This isn't as bad as it seems because the desire was to access several elements, so the slice is supposed to be around for long enough to justify its construction cost. I agree it would be onerous to access a single element with e.g. matrix[1][1][2].
 means: first, take a slice 2..5 that returns a matrix range one 
 dimension smaller. Then, for that type take a slice from 0 to $. And 
 so on.

 This works great for row-wise storage. I'm not sure how efficient it 
 would be for other storage schemes.

No it doesn't. It works great for standard C arrays of arrays, but these are not matrices and have a large number of well documented performance issues when used as such. In general, multi-dimensional data structures are relatively common and should be cleanly supported.

[Citation needed]
 Note how nice the distinction between the container and its views 
 works: there is only one matrix. But there are many ranges and 
 subranges within it, bearing various relationships with one another.

Yes, Data+View (i.e. MVC for data structures) is a good thing(TM). But generally, matrices have been views into data and not the data themselves. (unless needed for memory management, etc)

Well, if better terminology comes along I'm all for it. I want to define the "matrix storage" as an owning container, and several "matrix ranges" that access the data stored by the storage.

Andrei
Sep 09 2008
prev sibling parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
First, let me add my support for the range proposal. It is in line with 
earlier suggestions but makes some crucial additions. I'm also very glad 
that one of the most influential forces behind D has returned to these 
newsgroups.

Andrei Alexandrescu wrote:
 Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 4) We need some way of supporting dollar notation in user 
 containers. The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that tells the library which argument position the dollar occurred in. Consider:

auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start would be to expand the above into:

auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

I'd also add that multiple dimension slicing should be supported, i.e.

auto x = matrix[2..5,0..$,3]

would become

auto x = matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(1)),3)

with

struct Slice (T) { T start; T end; }

Strided slices would also be nice, i.e.

matrix[0..$:10] // decimate the array

Multidimensional slicing can be implemented with staggered indexing: matrix[2..5][0..$][3] means: first, take a slice 2..5 that returns a matrix range one dimension smaller. Then, for that type take a slice from 0 to $. And so on.

Implementing multidimensional slicing in this way is quite irregular. One would expect m[2..5][1..2] to behave just like s[2..5][1..2] would for a regular array (e.g. consider s being of the type char[]).

I implemented most or all of this (multidimensional arrays) a little more than two years ago. (Being able to implement it required me to write the patch that made it into DMD 0.166 that made it possible for structs and classes to have template member functions/operators.) There was basically one Matrix type with rectangular, dense storage and a strided Slice type.

I believe the only thing that makes sense when it comes to multidimensional slicing is how I implemented it:

m[i_1, i_2, i_3, ...]

where i_x is either a range or a singular index, and x corresponds to the dimension (1,2,3,...). This results in a slice where each dimension indexed by a singular index is collapsed and each dimension indexed by a range is kept. The resulting syntax is:

m[range(2,5), all, 3, range($-1,$)]

even though the possibility to use $ like this wasn't discovered until much later. Of course, the optimal syntax would be something like:

m[2..5, 0..$, 3, $-1..$]

which would be easiest to implement by making a..b a value in itself, of an integral range type. Intriguingly, integral ranges would just be an implementation of the same range concept you have already presented.

--
Oskar
Sep 11 2008
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented using a  
 pair of pointers instead of ptr+length. Will it ever get a green light?  
 It fits range concept much better.

Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.

I'd warn that changing away from ptr+length would create logical inconsistencies between 1D arrays and 2D/3D/ND arrays.
 4) We need some way of supporting dollar notation in user containers.  
 The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that tells the library which argument position the dollar occurred in. Consider:

auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start would be to expand the above into:

auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

I'd also add that multiple dimension slicing should be supported, i.e.

auto x = matrix[2..5,0..$,3]

would become

auto x = matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(1)),3)

with

struct Slice (T) { T start; T end; }

Strided slices would also be nice, i.e.

matrix[0..$:10] // decimate the array
Sep 08 2008
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented using  
 a pair of pointers instead of ptr+length. Will it ever get a green  
 light? It fits range concept much better.

Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.

inconsistencies between 1D arrays and 2D/3D/ND arrays.

How so?

An ND array is typically defined as a fat pointer like so:

struct array(T,size_t N) {
    T* ptr;
    size_t[N] lengths;          // of each dimension
    ptrdiff_t[N] byte_strides;  // of each dimension
}

So a 1D array is

{
    T* ptr;
    size_t lengths;
    ptrdiff_t byte_strides = T.sizeof; // Currently a compile time constant in the built-in array
    size_t length() { return lengths; }
}

which is logically consistent with a general dense matrix and, aside from some name changes and the stride being a compile time constant, is identical to the current D arrays. However, { T* first; T* last } may not be logically extended to ND arrays, particularly sliced ND arrays, as T* last no longer has any meaning.
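A minimal sketch of such a fat pointer, to show why a sliced view has no single meaningful "last" pointer. NdView, at and slice are hypothetical names, and strides are counted in elements here rather than bytes to keep the arithmetic short.

```d
import std.stdio;

// Hypothetical N-dimensional view: pointer plus per-dimension lengths and strides.
struct NdView(T, size_t N)
{
    T* ptr;
    size_t[N] lengths;
    ptrdiff_t[N] strides;   // in elements, not bytes

    // Element access: one index per dimension.
    ref T at(size_t[N] idx...)
    {
        ptrdiff_t offset = 0;
        foreach (d; 0 .. N)
        {
            assert(idx[d] < lengths[d]);
            offset += cast(ptrdiff_t) idx[d] * strides[d];
        }
        return ptr[offset];
    }

    // Restrict dimension dim to [lo, hi); no data is copied.
    NdView slice(size_t dim, size_t lo, size_t hi)
    {
        assert(dim < N && lo <= hi && hi <= lengths[dim]);
        NdView v = this;
        v.ptr += cast(ptrdiff_t) lo * strides[dim];
        v.lengths[dim] = hi - lo;
        return v;
    }
}

void main()
{
    // 3x4 row-major matrix holding 0 .. 11.
    auto data = new int[](12);
    foreach (i, ref x; data) x = cast(int) i;

    auto m = NdView!(int, 2)(data.ptr, [3, 4], [4, 1]);
    writeln(m.at(1, 2));        // 6

    // Columns 1 .. 3 of every row: same storage, rows are no longer contiguous.
    auto cols = m.slice(1, 1, 3);
    writeln(cols.at(2, 0));     // 9
    writeln(cols.lengths);      // [3, 2]
}
```

The cols view covers elements 1, 2, 5, 6, 9, 10 of the storage, so a first/last pointer pair cannot describe it, while ptr plus lengths and strides can.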
 4) We need some way of supporting dollar notation in user containers.  
 The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that tells the library which argument position the dollar occurred in. Consider:

auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start would be to expand the above into:

auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

auto x = matrix[2..5,0..$,3]

would become

auto x = matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(1)),3)

with

struct Slice (T) { T start; T end; }

Strided slices would also be nice, i.e.

matrix[0..$:10] // decimate the array

Multidimensional slicing can be implemented with staggered indexing: matrix[2..5][0..$][3]

Yes, but doing so utilizes expression templates and is relatively slow:

matrix_row_slice temp1 = matrix.opSlice(2,5);
matrix_col_slice temp2 = temp1.opSlice(0,$);
matrix = temp2.opIndex(3);

And causes code bloat. Worse, matrix[2..5] by itself would be an unstable type. Either foo(matrix[2..5]) would not compile or it would generate code bloat and hard to find logic bugs. (Due to the fact that you've embedded the dimension of the slice operation into the type).
 means: first, take a slice 2..5 that returns a matrix range one  
 dimension smaller. Then, for that type take a slice from 0 to $. And so  
 on.

 This works great for row-wise storage. I'm not sure how efficient it  
 would be for other storage schemes.

No it doesn't. It works great for standard C arrays of arrays, but these are not matrices and have a large number of well documented performance issues when used as such. In general, multi-dimensional data structures are relatively common and should be cleanly supported.
 Note how nice the distinction between the container and its views works:  
 there is only one matrix. But there are many ranges and subranges within  
 it, bearing various relationships with one another.

Yes, Data+View (i.e. MVC for data structures) is a good thing(TM). But generally, matrices have been views into data and not the data themselves. (unless needed for memory management, etc)
Sep 08 2008
prev sibling next sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Denis Koroskin <2korden gmail.com> wrote:
 5) I don't quite like names left and right! :) I think they should  
 represent limits (pointers to begin and end, in case of array) rather than  
 values. In this case, built-in arrays could be implemented as follows:
 
 struct Array(T)
 {
      T* left;
      T* right;
      size_t length() { return right-left; }
      ref T opIndex(size_t index) { return left[index]; }
      // etc
 }
 
 The rationale behind having access to range limits is to allow operations  
 on them. For example,
 R.left-=n;
 
 could be used instead of
 foreach(i; 0..n) {
      R.pop();
 }

Now you stepped onto your own landmine. :) "R.left-=n" extends the range beyond its beginning with unpredictable consequences. That's why such operations shouldn't be easily accessible.
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Denis Koroskin <2korden gmail.com> wrote:
 5) I don't quite like names left and right! :) I think they should  
 represent limits (pointers to begin and end, in case of array) rather than  
 values. In this case, built-in arrays could be implemented as follows:

 struct Array(T)
 {
      T* left;
      T* right;
      size_t length() { return right-left; }
      ref T opIndex(size_t index) { return left[index]; }
      // etc
 }

 The rationale behind having access to range limits is to allow operations  
 on them. For example,
 R.left-=n;

 could be used instead of
 foreach(i; 0..n) {
      R.pop();
 }

Now you stepped onto your own landmine. :) "R.left-=n" extends the range beyond its beginning with unpredictable consequences. That's why such operations shouldn't be easily accessible.

Oh I thought it's R.right -= n. It has become clear to me that a range never increases. It always shrinks. It can increase if fused to another range (I'm thinking of relaxing the fusion operations to allow for overlapping/adjacent ranges, not only ranges that include one another). But without extra info from the container a range can never grow. Andrei
Sep 09 2008
parent Sergey Gromov <snake.scaly gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Denis Koroskin <2korden gmail.com> wrote:
 5) I don't quite like names left and right! :) I think they should  
 represent limits (pointers to begin and end, in case of array) rather than  
 values. In this case, built-in arrays could be implemented as follows:

 struct Array(T)
 {
      T* left;
      T* right;
      size_t length() { return right-left; }
      ref T opIndex(size_t index) { return left[index]; }
      // etc
 }

 The rationale behind having access to range limits is to allow operations  
 on them. For example,
 R.left-=n;

 could be used instead of
 foreach(i; 0..n) {
      R.pop();
 }

Now you stepped onto your own landmine. :) "R.left-=n" extends the range beyond its beginning with unpredictable consequences. That's why such operations shouldn't be easily accessible.

Oh I thought it's R.right -= n. It has become clear to me that a range never increases. It always shrinks. It can increase if fused to another range (I'm thinking of relaxing the fusion operations to allow for overlapping/adjacent ranges, not only ranges that include one another). But without extra info from the container a range can never grow.

It was obviously a typo, but a very dangerous typo indeed.
Sep 09 2008
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 09 Sep 2008 04:37:41 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 1) There is a typo:
  // Copies a range to another
 void copy(R1, R2)(R1 src, R2 tgt)
 {
     while (!src.isEmpty)
     {
         tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
     }
 }

Thanks! Fixed.
 2) R.next and R.pop could have better names. I mean, they represent  
 similar operations yet names are so different.

I agree. Next was a natural choice. I stole pop from Perl. Any symmetric and short operation names would be welcome.

1) R.left += n / R.right -= n
2) R.left.advance(n) / R.right.advance(n) (or move)
3) R.advanceLeft(n) / R.advanceRight(n) (or moveLeft/moveRight)
 3) Walter mentioned that built-in array could be re-implemented using a  
 pair of pointers instead of ptr+length. Will it ever get a green light?  
 It fits range concept much better.

Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.
 4) We need some way of supporting dollar notation in user containers.  
 The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that tells the library which argument position the dollar occurred in. Consider:

auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start would be to expand the above into:

auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];
 5) I don't quite like names left and right! :) I think they should  
 represent limits (pointers to begin and end, in case of array) rather  
 than values. In this case, built-in arrays could be implemented as  
 follows:
  struct Array(T)
 {
     T* left;
     T* right;
     size_t length() { return right-left; }
     ref T opIndex(size_t index) { return left[index]; }
     // etc
 }
  The rationale behind having access to range limits is to allow  
 operations on them. For example,
 R.left-=n;

I disagree. Defining operations on range limits opens a box that would make Pandora jealous: 1. What is the type of left in general? Um, let's define Iterator!(R) for each range R.

 2. What are the primitives of an iterator? Well, -= sounds good. How do  
 you check it for correctness? In fact, how do you check any operation of  
 a naked iterator for correctness?

First of all, left is a forward iterator and right is a backward iterator. That's why left might support ++, += and =, while right might support --, -= and =. Note that we can rename ++ to advance() to get uniform naming. In many cases it is desirable to store an iterator and set an iterator to the range bounds.
 3. I want to play with some data. What should I use here, ranges or  
 iterators? ...

I don't think that ranges replace iterators (yet?). For myself, I define a range as a pair of iterators. For example, what is the equivalent D code of the following:

for (iterator it = ..., end = list.end(); it != end; ) {
    if (predicate(*it)) {
        it = list.erase(it);
    } else {
        ++it;
    }
}

Range range = vector.all;
while (!range.isEmpty) {
    if (predicate(range.left)) {
        ???
    } else {
        range.next;
    }
}

An iterator solution would be as follows:

range.left = list.erase(range.left);

I took a list for the sake of simplicity, because erasing an element doesn't update the end. However, in general, both left and right should be updated. A good solution would be to return a new range, like this:

Range range = ...;
range = container.erase(???); // but what should be here?

maybe this:

container.erase(range); // erase the whole range, no need to return anything because it would be empty anyway
container = container.eraseFirstN(range, n); // erase first n elements from a range, and return the rest.
container = container.eraseLastN(range, n); // the same but for the other end

I don't say that iterators magically solve everything, I merely try to find problematic places.
 Much of the smarts of the range design is that it gets away WITHOUT  
 having to answer embarrassing questions such as the above. Ranges are  
 rock-solid, and part of them being rock-solid is that they expose enough  
 primitives to be complete, but at the same time do not expose dangerous  
 internals.

 could be used instead of
 foreach(i; 0..n) {
     R.pop();
 }
  which is more efficient in many cases.

Stop right there. That's not a primitive. It is an algorithm that gets implemented in terms of a primitive. I disagree that such an algorithm is an operator and does not have a name such as popN.
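A minimal sketch of that point, assuming the modern Phobos primitive names (empty, popFront) rather than the isEmpty/next/pop names used in this thread: popN below is a hypothetical free function written purely in terms of the primitives, with an O(1) shortcut when the range supports slicing. Phobos later shipped a comparable function under the name popFrontN.

```d
import std.range.primitives : empty, popFront, isInputRange, hasSlicing, hasLength;

// Hypothetical algorithm: advance r by up to n elements and return how
// many were actually skipped. It is not a primitive of the range itself.
size_t popN(R)(ref R r, size_t n)
if (isInputRange!R)
{
    static if (hasSlicing!R && hasLength!R)
    {
        // Sliceable ranges (e.g. arrays) can skip in constant time.
        immutable k = n < r.length ? n : r.length;
        r = r[k .. r.length];
        return k;
    }
    else
    {
        // Generic fallback: one primitive call per element.
        size_t k = 0;
        for (; k < n && !r.empty; ++k)
            r.popFront();
        return k;
    }
}
```

The efficiency gain asked for in the quoted text is still available, but it lives in the algorithm's specialization rather than in the range's interface.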

 Other than that - great, I like it.

Thanks for your comments. Andrei

Thank YOU!
Sep 09 2008
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 09 Sep 2008 07:06:55 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented  
 using a pair of pointers instead of ptr+length. Will it ever get a  
 green light? It fits range concept much better.

Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.

inconsistencies between 1D arrays and 2D/3D/ND arrays.

How so?

    struct array(T,size_t N) {
        T* ptr;
        size_t[N] lengths;          // of each dimension
        ptrdiff_t[N] byte_strides;  // of each dimension
    }

So a 1D array is

    {
        T* ptr;
        size_t lengths;
        ptrdiff_t byte_strides = T.sizeof; // Currently a compile time constant in the built-in array
        size_t length() { return lengths; }
    }

which is logically consistent with a general dense matrix and aside from some name change and the stride being a compile time constant, is identical to the current D arrays. However, { T* first; T* last } may not be logically extended to ND arrays, particularly sliced ND arrays, as T* last no longer has any meaning.
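To make the argument concrete, here is a small two-dimensional sketch of the ptr + lengths + strides layout. The names View2D, rowMajor, at and sub are invented for this illustration, and strides are counted in elements rather than bytes to keep the arithmetic short.

```d
// Hypothetical 2D view: a base pointer plus per-dimension length and stride.
struct View2D(T)
{
    T* ptr;
    size_t[2] lengths;
    ptrdiff_t[2] strides; // in elements here, not bytes

    ref T at(size_t i, size_t j)
    {
        assert(i < lengths[0] && j < lengths[1]);
        return ptr[i * strides[0] + j * strides[1]];
    }

    // Sub-block [r0, r1) x [c0, c1): same strides, shifted base pointer.
    View2D sub(size_t r0, size_t r1, size_t c0, size_t c1)
    {
        assert(r0 <= r1 && r1 <= lengths[0]);
        assert(c0 <= c1 && c1 <= lengths[1]);
        size_t[2] len = [r1 - r0, c1 - c0];
        return View2D(ptr + r0 * strides[0] + c0 * strides[1], len, strides);
    }
}

// Wrap a flat array as a rows x cols row-major view.
View2D!T rowMajor(T)(T[] data, size_t rows, size_t cols)
{
    assert(data.length == rows * cols);
    size_t[2] len = [rows, cols];
    ptrdiff_t[2] st = [cast(ptrdiff_t) cols, 1];
    return View2D!T(data.ptr, len, st);
}
```

The elements of a sub-block are not contiguous in memory, so a single { first, last } pointer pair cannot describe it, whereas the lengths and strides still can.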

Hmmm, I see. That could become a problem if we wanted lower-dimensional matrices to be prefixes of higher-dimensional matrices. This is a worthy goal, but one that my matrices don't pursue.
 4) We need some way of supporting dollar notation in user  
 containers. The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that tells the library in which argument position the dollar occurred. Consider:

    auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start would be to expand the above into:

    auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];
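As a sketch of how this can look in code, the D language as it exists today accepts an opDollar that takes the argument position as a compile-time parameter, which is a slight variation on the runtime opDollar(uint dim) suggested above. The Matrix type below is a toy written for this example only.

```d
// Toy row-major matrix; only what is needed to show per-dimension `$`.
struct Matrix
{
    double[] data;
    size_t rows, cols;

    // `$` in argument position dim is rewritten to opDollar!dim.
    size_t opDollar(size_t dim)() const
    {
        static if (dim == 0)
            return rows;
        else static if (dim == 1)
            return cols;
        else
            static assert(0, "Matrix has only two dimensions");
    }

    ref double opIndex(size_t i, size_t j)
    {
        return data[i * cols + j];
    }
}
```

With this in place, m[$-1, $-1] reads the last row and last column, each dollar resolving against its own dimension.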

i.e.

    auto x = matrix[2..5,0..$,3]

would become

    auto x = matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(1)),3)

with

    struct Slice (T) { T start; T end; }

Strided slices would also be nice, i.e.

    matrix[0..$:10] // decimate the array
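A minimal sketch of the single-call scheme, assuming the operator hooks D provides today (opSlice and opDollar templated on the argument position, with each a..b argument turned into a value that is then passed to opIndex). The Slice struct mirrors the one proposed above; Grid and its copying column accessor are invented for the example.

```d
struct Slice { size_t start, end; }

// Toy row-major 2D container.
struct Grid
{
    int[] data;
    size_t rows, cols;

    size_t opDollar(size_t dim)() const { return dim == 0 ? rows : cols; }

    // Each `a .. b` in position dim becomes opSlice!dim(a, b).
    Slice opSlice(size_t dim)(size_t a, size_t b) const { return Slice(a, b); }

    // Plain element access.
    int opIndex(size_t i, size_t j) const { return data[i * cols + j]; }

    // Row slice + fixed column: returns a copy of that column segment.
    int[] opIndex(Slice r, size_t j) const
    {
        int[] result;
        foreach (i; r.start .. r.end)
            result ~= data[i * cols + j];
        return result;
    }
}
```

All index and slice arguments arrive in one opIndex call, so no intermediate per-dimension slice type has to exist on its own.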

Multidimensional slicing can be implemented with staggered indexing: matrix[2..5][0..$][3]

    matrix_row_slice temp1 = matrix.opSlice(2,5);
    matrix_col_slice temp2 = temp1.opSlice(0,$);
    matrix = temp2.opIndex(3);

And causes code bloat. Worse, matrix[2..5] by itself would be an unstable type. Either foo(matrix[2..5]) would not compile or it would generate code bloat and hard-to-find logic bugs (due to the fact that you've embedded the dimension of the slice operation into the type).

What is an unstable type?

What I meant is that the type is fundamentally not designed to exist by itself. Therefore, if it is not paired with the other slices, you've put your program into a dangerous state.
 There is no use of expression templates,

So what would you call embedding an operation into a very temporary type that's not expected to last beyond the line of its return?
 but indeed multiple slices are created. This isn't as bad as it seems  
 because the desire was to access several elements, so the slice is  
 supposed to be around for long enough to justify its construction cost.

True, but it's still less efficient.
 I agree it would be onerous to access a single element with e.g.  
 matrix[1][1][2].

And what about the code bloat and compile time or runtime logic bugs that this design is prone to produce?
 means: first, take a slice 2..5 that returns a matrix range one  
 dimension smaller. Then, for that type take a slice from 0 to $. And  
 so on.

 This works great for row-wise storage. I'm not sure how efficient it  
 would be for other storage schemes.

these are not matrices and have a large number of well documented performance issues when used as such. In general, multi-dimensional data structures are relatively common and should be cleanly supported.

[Citation needed]

Imperfect C++: Practical Solutions for Real-Life Programming by Matthew Wilson has an entire chapter dedicated to the performance of matrices in C++, comparing Boost, the STL, plain arrays and a few structures of his own. Also, arrays of arrays don't support O(1) slicing, resize and creation, use more memory, etc. As for multi-dimensional data structures, they are used heavily by the fields of games, graphics, scientific computing, databases and I've probably forgotten some others.
Sep 09 2008
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
...
 
 After much thought and discussions among Walter, Bartosz and myself, I 
 defined a range design and reimplemented all of std.algorithm and much 
 of std.stdio in terms of ranges alone.

Yup. This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a reasonable approximation of ranges, and this seems far preferable to the iterator approach of C++. Glad to hear that this is what you've decided as well.
 This is quite a thorough test 
 because the algorithms are diverse and stress-test the expressiveness 
 and efficiency of the range design. Along the way I made the interesting 
 realization that certain union/difference operations are needed as 
 primitives for ranges. There are also a few bugs in the compiler and 
 some needed language enhancements (e.g. returning a reference from a 
 function); Walter is committed to implement them.

Very nice. The inability to return a reference has been a thorn in my side for ages.
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:
 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

It seems workable from a 1000' view. I'll have to try and apply the approach to some algorithms and see if anything comes up. So far, dealing with bidirectional ranges seems a bit weird, but that's likely more related to the syntax (ie. 'pop') than anything. Sean P.S. This decision has interesting implications for D2+, given the functional tendencies already present in the language :-)
Sep 08 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 ...
 After much thought and discussions among Walter, Bartosz and myself, I 
 defined a range design and reimplemented all of std.algorithm and much 
 of std.stdio in terms of ranges alone.

Yup. This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a reasonable approximation of ranges, and this seems far preferable to the iterator approach of C++. Glad to hear that this is what you've decided as well.

That's great to hear, but I should warn you that moving from arrays to "the lowest range that will do" is not quite easy. Think of std::rotate for example.
 This is quite a thorough test because the algorithms are diverse and 
 stress-test the expressiveness and efficiency of the range design. 
 Along the way I made the interesting realization that certain 
 union/difference operations are needed as primitives for ranges. There 
 are also a few bugs in the compiler and some needed language 
 enhancements (e.g. returning a reference from a function); Walter is 
 committed to implement them.

Very nice. The inability to return a reference has been a thorn in my side for ages.
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

It seems workable from a 1000' view. I'll have to try and apply the approach to some algorithms and see if anything comes up. So far, dealing with bidirectional ranges seems a bit weird, but that's likely more related to the syntax (ie. 'pop') than anything. Sean P.S. This decision has interesting implications for D2+, given the functional tendencies already present in the language :-)

Wait until you see the generators! An efficient generator of Fibonacci numbers in one line...

    auto fib = generate!("a[0] + a[1]")(1, 1);

I'm so excited I can hardly stand myself. :o)

Andrei
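For anyone who wants to try the idea, generate in the post above is the name from the prototype being described. A comparable facility in current Phobos is std.range.recurrence, which uses a slightly different indexing convention in the string (a[n-1] and a[n-2]). The wrapper name firstFibs is made up for this sketch.

```d
import std.array : array;
import std.range : recurrence, take;

// Returns the first n Fibonacci numbers, seeded with 1, 1.
// recurrence yields an infinite lazy range; take bounds it.
int[] firstFibs(size_t n)
{
    return recurrence!("a[n-1] + a[n-2]")(1, 1).take(n).array;
}
```

The range only keeps the last two values around, which is what makes the one-liner efficient.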
Sep 08 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 ...
 After much thought and discussions among Walter, Bartosz and myself, 
 I defined a range design and reimplemented all of std.algorithm and 
 much of std.stdio in terms of ranges alone.

Yup. This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a reasonable approximation of ranges, and this seems far preferable to the iterator approach of C++. Glad to hear that this is what you've decided as well.

That's great to hear, but I should warn you that moving from arrays to "the lowest range that will do" is not quite easy. Think of std::rotate for example.

I'll admit that I find some of the features <algorithm> provides to be pretty weird. Has anyone ever actually wanted to sort something other than a random-access range in C++? Or rotate one, for example? These operations are allowed, but to me they fall outside the realm of useful functionality. I suppose there may be some relation here to Stepanov's idea of a computational basis. Should an algorithm operate on a range if it cannot do so efficiently? And even if it does, will anyone actually use it? Sean
Sep 08 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote: ...
 
 After much thought and discussions among Walter, Bartosz and
 myself, I defined a range design and reimplemented all of
 std.algorithm and much of std.stdio in terms of ranges alone.

Yup. This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a reasonable approximation of ranges, and this seems far preferable to the iterator approach of C++. Glad to hear that this is what you've decided as well.

That's great to hear, but I should warn you that moving from arrays to "the lowest range that will do" is not quite easy. Think of std::rotate for example.

I'll admit that I find some of the features <algorithm> provides to be pretty weird. Has anyone ever actually wanted to sort something other than a random-access range in C++? Or rotate one, for example?

Great questions. I don't recall having needed to sort a list lately, but rotate is a great function that has an undeservedly odd name. What rotate does is to efficiently transform this: a, b, c, d, e, f, A, B, C, D into this: A, B, C, D, a, b, c, d, e, f I use that all the time because it's really a move-to-front operation. In fact my algorithm2 implementation does not call it rotate anymore, it calls it moveToFront and allows you to move any subrange of a range to the front of that range efficiently. It's a useful operation in a great deal of lookup strategies.
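The transformation described above can be tried with std.algorithm.mutation.bringToFront from current Phobos, which appears to play the role of the moveToFront mentioned in the post. The wrapper moveTailToFront and its split parameter are assumptions made for this sketch.

```d
import std.algorithm.mutation : bringToFront;

// Moves all[split .. $] in front of all[0 .. split], in place.
void moveTailToFront(T)(T[] all, size_t split)
{
    bringToFront(all[0 .. split], all[split .. $]);
}
```

Applied to a, b, c, d, e, f, A, B, C, D with split at 6, this yields A, B, C, D, a, b, c, d, e, f, matching the example in the text.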
 These operations are allowed, but to me they fall outside the realm
 of useful functionality.  I suppose there may be some relation here
 to Stepanov's idea of a computational basis.  Should an algorithm
 operate on a range if it cannot do so efficiently?  And even if it
 does, will anyone actually use it?

I think it all depends on what one's day-to-day work consists of. I was chatting to Walter about it and he confessed that, although he has a great deal of respect for std.algorithm, he's not using much of it. I told him back that I need 80% of std.algorithm on a daily basis. In fact that's why I wrote it - otherwise I wouldn't have had the time to put into it. This is because I make next to no money so I can afford to work on basic research, which is "important" in a long-ranging way. Today's computing is quite disorganized and great energy is expended on gluing together various pieces, protocols, and interfaces. I've worked in that environment quite a lot, and dealing with glue can easily become 90% of a day's work, leaving only little time to get occupied with a real problem, such as making a computer genuinely smarter or at least more helpful towards its user. All too often we put a few widgets on a window and the actual logic driving those buttons - the "smarts", the actual "work" gets drowned by details taking care of making that logic stick to the buttons. I mentioned in a talk once that any programmer should know how to multiply two matrices. Why? Because if you don't, you can't tackle a variety of problems that can be easily expressed in terms of matrix multiplication, even though they have nothing to do with algebra (rotating figures, machine learning, fractals, fast series...). A person in the audience said that she never actually needs to multiply two matrices, so why bother? I gave an evasive response, but the reality was that that was a career-limiting state of affairs for her. Andrei
Sep 08 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote: ...
 After much thought and discussions among Walter, Bartosz and
 myself, I defined a range design and reimplemented all of
 std.algorithm and much of std.stdio in terms of ranges alone.

Yup. This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a reasonable approximation of ranges, and this seems far preferable to the iterator approach of C++. Glad to hear that this is what you've decided as well.

That's great to hear, but I should warn you that moving from arrays to "the lowest range that will do" is not quite easy. Think of std::rotate for example.

I'll admit that I find some of the features <algorithm> provides to be pretty weird. Has anyone ever actually wanted to sort something other than a random-access range in C++? Or rotate one, for example?

Great questions. I don't recall having needed to sort a list lately, but rotate is a great function that has an undeservedly odd name. What rotate does is to efficiently transform this: a, b, c, d, e, f, A, B, C, D into this: A, B, C, D, a, b, c, d, e, f I use that all the time because it's really a move-to-front operation.

Ah, so it's a bit like partition and select. I use the two of those constantly, but haven't ever had a need for rotate. Odd, I suppose, since they're so similar.
 In fact my algorithm2 implementation does not call it rotate anymore, it 
 calls it moveToFront and allows you to move any subrange of a range to 
 the front of that range efficiently. It's a useful operation in a great 
 deal of lookup strategies.
 
 These operations are allowed, but to me they fall outside the realm
 of useful functionality.  I suppose there may be some relation here
 to Stepanov's idea of a computational basis.  Should an algorithm
 operate on a range if it cannot do so efficiently?  And even if it
 does, will anyone actually use it?

I think it all depends on what one's day-to-day work consists of. I was chatting to Walter about it and he confessed that, although he has a great deal of respect for std.algorithm, he's not using much of it. I told him back that I need 80% of std.algorithm on a daily basis. In fact that's why I wrote it - otherwise I wouldn't have had the time to put into it.

Exactly. I implemented Tango's Array module for the same reason. Other than rotate and stable_sort, I think the module has everything from <algorithm>, plus some added bits.
 This is because I make next to no money so I can afford to work on basic 
 research, which is "important" in a long-ranging way. Today's computing 
 is quite disorganized and great energy is expended on gluing together 
 various pieces, protocols, and interfaces. I've worked in that 
 environment quite a lot, and dealing with glue can easily become 90% of 
 a day's work, leaving only little time to get occupied with a real 
 problem, such as making a computer genuinely smarter or at least more 
 helpful towards its user. All too often we put a few widgets on a window 
 and the actual logic driving those buttons - the "smarts", the actual 
 "work" gets drowned by details taking care of making that logic stick to 
 the buttons.

I've never worked in that environment, but I would think that even such positions require the use of algorithms. If not, then I wouldn't consider them to be software engineering positions. As for research-- I'd say that's a fairly broad category. My first salaried position was in R&D for a switched long-distance carrier, for example, but that's applied research as opposed to academic research, which I believe you're describing. I think there are benefits to each, but the overlap is what truly interests me.
 I mentioned in a talk once that any programmer should know how to 
 multiply two matrices. Why? Because if you don't, you can't tackle a 
 variety of problems that can be easily expressed in terms of matrix 
 multiplication, even though they have nothing to do with algebra 
 (rotating figures, machine learning, fractals, fast series...). A person 
 in the audience said that she never actually needs to multiply two 
 matrices, so why bother? I gave an evasive response, but the reality was 
 that that was a career-limiting state of affairs for her.

Yup. This is a lot like the argument for Calculus in a CS curriculum. Entry-level software positions rarely require such things and yet I've been surprised at how many times they have proven useful over the years, simply for general problem-solving. And there's certainly no debate about Linear Algebra--they may as well rename it "math for computer programming." Sean
Sep 09 2008
prev sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Andrei Alexandrescu wrote:
 
 This is because I make next to no money so I can afford to work on basic 
 research, which is "important" in a long-ranging way. Today's computing 
 is quite disorganized and great energy is expended on gluing together 
 various pieces, protocols, and interfaces. I've worked in that 
 environment quite a lot, and dealing with glue can easily become 90% of 
 a day's work, leaving only little time to get occupied with a real 
 problem, such as making a computer genuinely smarter or at least more 
 helpful towards its user. All too often we put a few widgets on a window 
 and the actual logic driving those buttons - the "smarts", the actual 
 "work" gets drowned by details taking care of making that logic stick to 
 the buttons.
 

Well, didn't you find a "real problem" right there (and also a very interesting one), in trying to make code/libraries/methodologies/tools/whatever that reduce those 90% of work in boilerplate details? An example could be the years of investment and research in ORM frameworks (Hibernate/EJB3, Ruby on Rails, etc.): although ORM technology has existed for quite a few years, only recently has it reached a point where it's really easy and non-tedious to write an OO-DB persistence mapping. Another possible example, regarding GUI programming like you mentioned, is data binding. I haven't used it myself yet, but from what they describe, its purpose is indeed to reduce a lot of the complexity and tedium in writing code to synchronize the UI with the model/logic, and vice-versa. Learning and building these kinds of things is, IMO, the pinnacle of software engineering.

-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sep 25 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bruno Medeiros wrote:
 Andrei Alexandrescu wrote:
 This is because I make next to no money so I can afford to work on 
 basic research, which is "important" in a long-ranging way. Today's 
 computing is quite disorganized and great energy is expended on gluing 
 together various pieces, protocols, and interfaces. I've worked in 
 that environment quite a lot, and dealing with glue can easily become 
 90% of a day's work, leaving only little time to get occupied with a 
 real problem, such as making a computer genuinely smarter or at least 
 more helpful towards its user. All too often we put a few widgets on a 
 window and the actual logic driving those buttons - the "smarts", the 
 actual "work" gets drowned by details taking care of making that logic 
 stick to the buttons.

Well, didn't you find a "real problem" right there (and also a very interesting one), in trying to make code/libraries/methodologies/tools/whatever that reduce those 90% of work in boilerplate details? An example could be the years of investment and research in ORM frameworks (Hibernate/EJB3, Ruby on Rails, etc.): although ORM technology has existed for quite a few years, only recently has it reached a point where it's really easy and non-tedious to write an OO-DB persistence mapping. Another possible example, regarding GUI programming like you mentioned, is data binding. I haven't used it myself yet, but from what they describe, its purpose is indeed to reduce a lot of the complexity and tedium in writing code to synchronize the UI with the model/logic, and vice-versa. Learning and building these kinds of things is, IMO, the pinnacle of software engineering.

This hardly characterizes or answers my point. Of course wherever there's difficulty there's opportunity for automation, and research in software engineering is alive and well. My point was that much effort in the industry today is expended on dealing with effects instead of fighting the causes. Andrei
Sep 25 2008
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Andrei Alexandrescu wrote:
 Bruno Medeiros wrote:
 Andrei Alexandrescu wrote:
 This is because I make next to no money so I can afford to work on 
 basic research, which is "important" in a long-ranging way. Today's 
 computing is quite disorganized and great energy is expended on 
 gluing together various pieces, protocols, and interfaces. I've 
 worked in that environment quite a lot, and dealing with glue can 
 easily become 90% of a day's work, leaving only little time to get 
 occupied with a real problem, such as making a computer genuinely 
 smarter or at least more helpful towards its user. All too often we 
 put a few widgets on a window and the actual logic driving those 
 buttons - the "smarts", the actual "work" gets drowned by details 
 taking care of making that logic stick to the buttons.

Well, didn't you find a "real problem" right there (and also a very interesting one), in trying to make code/libraries/methodologies/tools/whatever that reduce those 90% of work in boilerplate details? An example could be the years of investment and research in ORM frameworks (Hibernate/EJB3, Ruby on Rails, etc.): although ORM technology has existed for quite a few years, only recently has it reached a point where it's really easy and non-tedious to write an OO-DB persistence mapping. Another possible example, regarding GUI programming like you mentioned, is data binding. I haven't used it myself yet, but from what they describe, its purpose is indeed to reduce a lot of the complexity and tedium in writing code to synchronize the UI with the model/logic, and vice-versa. Learning and building these kinds of things is, IMO, the pinnacle of software engineering.

This hardly characterizes or answers my point. Of course wherever there's difficulty there's opportunity for automation, and research in software engineering is alive and well.

I was just pointing out that things don't have to be the way you described.
 My point was that much effort in 
 the industry today is expended on dealing with effects instead of 
 fighting the causes.
 
 Andrei

But that's quite true nonetheless. :/ -- Bruno Medeiros - Software Developer, MSc. in CS/E graduate http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sep 25 2008
prev sibling next sibling parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

  feedback would be highly appreciated

1) Example in "4. Bidirectional range"

Reversing of ranges can be done in constant runtime, but the example exposes runtime linear in the number of elements. This might be a hint that a "6. Reversible Range" might be required, because a range reversible in constant time requires more space.

2) [left,right][Diff,Union]

Ranges are not sets; therefore I might not be the only one who has problems capturing the idea behind "difference" and "union" on ranges. Of course one can define whatever one wants, but I would prefer

    [sub,snip,cut,split,...][B,E][B,E] (r,s)

I.e. `subBB(r,s)' is the subrange of `r' starting at the beginning of `r' and ending at the beginning of `s' (including the beginning of `r', but not including the beginning of `s').

It may be of some worth to include the `B' or `E' as parameters to the chosen keyword(?) to enable algorithmic access:

| sub(B,B,r,s)

instead of `leftDiff( r, s)'

-manfred
-- 
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Sep 08 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:
 
  feedback would be highly appreciated

1) Example in "4. Bidirectional range" Reversing of ranges can be done in constant runtime, but the example exposes runtime linear in the number of elements. This might be a hint that a "6. Reversible Range" might be required, because a range reversible in constant time requires more space.

There are numerous collections and ranges to be defined, of course. The five-kinds taxonomy is time-tested and allows implementation of a great many algorithms. Beyond that, users can define many containers and ranges with additional operations or with improved properties of existing operations.
 2) [left,right][Diff,Union]
 
 Ranges are not sets; therefore not only me might have problems to 
 capture the idea behind "difference" and "union" on ranges.

I am open to better names. Bartosz talked me into the ones above. I used these: leftToLeft, leftToRight, rightToRight.
 Of course one can define whatever one wants, but I would prefer
 [sub,snip,cut,split,...][B,E][B,E] (r,s)
 
 I.e. `subBB(r,s)' is the subrange of `r' starting at the beginning of 
 `r' and ending at the beginning of `s' (including the beginning of `r', 
 but not including the beginning of `s').
 
 It may be of some worth to include the `B' or `E' as parameters to the  
 chosen keyword(?) to enable algorithmic access:
 
 | sub(B,B,r,s)
 
 instead of `leftDiff( r, s)'

I find these too cryptic, but to each their own. I predict that primitive names will become a bicycle shed. In the end we'll have to use enum :o). Andrei
Sep 08 2008
parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

 1)


You are right. A reversible range can be built with the designed primitives. But this also holds for the `Retro'-type.
 2)

leftToLeft

Much better, but did you notice that you used the directionless words "beginning" and "end" to describe their semantics? That's why I used "B" and "E". With directions in the names one might be forced to write wrappers to avoid being irritated by the directions in case they do not fit.

Ex.: Imagine a 4d-matrix in which a 3d-range is defined diagonally to three of the axes of the 4d-matrix. What is left, respectively right, of such a range?

3) casting

On second read I miss some words about explicit and implicit casting possibilities between the five types of ranges.

-manfred
-- 
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Sep 08 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:
 
 1)


You are right. A reversible range can be built with the designed primitives. But this also holds for the `Retro'-type.

Indeed Retro was provided as a mere example and has no special status. I plan to add some more widely useful ranges, such as a circular range. Contributions will be appreciated (of course after the design gets frozen).
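As an illustration of why such a wrapper needs no special status, here is a sketch of a reversing adaptor built only from the bidirectional primitives. It assumes the modern Phobos names (front, back, popFront, popBack) rather than the left/right vocabulary of the draft, and the names Reversed and reversed are invented for the example; Phobos today ships a comparable adaptor as std.range.retro.

```d
import std.range.primitives : isBidirectionalRange, empty, front, back,
                              popFront, popBack, save;

// Reverses a bidirectional range by swapping the roles of its two ends.
// Construction is O(1): nothing is copied or traversed.
struct Reversed(R)
if (isBidirectionalRange!R)
{
    R source;

    @property bool empty() { return source.empty; }
    @property auto ref front() { return source.back; }
    @property auto ref back() { return source.front; }
    void popFront() { source.popBack(); }
    void popBack() { source.popFront(); }
    @property Reversed save() { return Reversed(source.save); }
}

auto reversed(R)(R r)
if (isBidirectionalRange!R)
{
    return Reversed!R(r);
}
```

The adaptor stores nothing beyond the wrapped range, so reversal costs constant time and no extra space beyond the wrapper itself.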
 2)

leftToLeft

Much better, but did you notice that you used the directionless words "beginning" and "end" to describe their semantics? That's why I used "B" and "E". With directions in the names one might be forced to write wrappers to avoid being irritated by the directions in case they do not fit. Ex.: Imagine a 4d-matrix in which a 3d-range is defined diagonally to three of the axes of the 4d-matrix. What is left, respectively right, of such a range?

Good point!
 3) casting
 
 On second read I miss some words about explicit and implicit casting 
 possibilities between the five types of ranges. 

Yes, I also mentioned that in a different post. Andrei
Sep 08 2008
parent "Manfred_Nowak" <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

[...]

4) Operations on Ranges

In the discussion on the naming of `leftDiff' etc. the arguments of all 
participants implicitly declared "splitting" and "concatenating" 
operations on ranges. But there are none defined. Why?


Googling for "range algebra" gave a hit on

http://www.idealliance.org/papers/extreme/proceedings/xslfo-pdf/2002/Nicol01/EML2002Nicol01.pdf

which seems to express pretty much of the concept presented.

-manfred
-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)
Sep 09 2008
prev sibling next sibling parent "Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Mon, Sep 8, 2008 at 8:24 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the infamous
 opApply and would allow for external iteration, thus paving the way to
 implementing real generic algorithms.

opApply() wasn't my hero either. :) Your article really looks like something I'd expect to find in D. It only requires foreach support, and yeah, return by reference.

Indeed. Both are in the works.

Quick question about this one -- how will iterators get foreach support? Are they classes or structs? If they're structs, how will the compiler know something is an iterator? Or will it be based on duck typing (if it looks like an iterator, it must be an iterator)? And if this support involves "blessing" certain types within the runtime, what will this mean for other runtime libraries?
Sep 08 2008
prev sibling next sibling parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:ga46ok$2s77$1 digitalmars.com...
 I put together a short document for the range design. I definitely missed 
 about a million things and have been imprecise about another million, so 
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

This is just awesome. Thank you for tackling this issue.

I agree with others that some names are not so obvious. Left/right? How do Arabic speakers feel about this : ) Begin/end seems more intuitive.

Can you explain this please. From Input range:

    e=r.getNext Reads an element and moves to the next one. Returns the read element, which is of type ElementType!(R). The call is defined only right after r.isEmpty returned false.

That last part: The call is defined only right after r.isEmpty returned false.

First of all, is* functions always sound "const" to me, but the way you describe isEmpty it sounds like it actually changes something, advancing a pointer or something like that. What happens if isEmpty is called twice? Will it skip 1 element?

The input range behaves like C#'s IEnumerator, but at least the C# names are more obvious:

    while (i.MoveNext()) e = i.Current;

But isEmpty is common to all ranges, so I understand why it's the way it is. I just hope it could stay "const", not modifying the internal state. Perhaps add "next" to input ranges as well?

L.
Sep 08 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(This is an older message that somehow didn't make it to the group. 
Resending now.)

Lionello Lunesu wrote:
 
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:ga46ok$2s77$1 digitalmars.com...
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

This is just awesome. Thank you for tackling this issue. I agree with others that some names are not so obvious. Left/right? How do Arabic speakers feel about this : ) Begin/end seems more intuitive.

I don't know of that particular cultural sensitivity. Begin and end are bad choices because they'd confuse the heck out of STL refugees. c.left and c.right are actually STL's c.front() and c.back() or *c.begin() and c.end()[-1], if you allow the notational abuse. But I sure hope some good names will come along.
 Can you explain this please. From Input range:
 
 e=r.getNext Reads an element and moves to the next one. Returns the read 
 element, which is of type ElementType!(R). The call is defined only 
 right after r.isEmpty returned false.
 
 That last part: The call is defined only right after r.isEmpty returned 
 false.
 
 First of all, is* functions always sound "const" to me, but the way you 
 describe isEmpty it sounds like it actually changes something, advancing 
 a pointer or something like that. What happens if isEmpty is called 
 twice? Will it skip 1 element?

Excellent question! Gosh if I staged this it couldn't have gone any better.

Consider an input range getting ints separated by whitespace out of a FILE* - something that we'd expect our design to allow easily and neatly. So then how do I implement isEmpty()? Well, feof(f) is not really informative at all. Maybe the file has five more spaces and then it ends. So in order to TELL you that the range is not empty, I actually have to GO all the way and actually read the integer. Then I can tell you: yeah, there's stuff available, or no, I'm done, or even throw an exception if some error happened.

That makes isEmpty non-const. You check for r.isEmpty, it makes sure an int is buffered inside the range's state. You call r.isEmpty again, it doesn't do anything because an int is already buffered. You call r.getNext, the int gets moved off the range's state into your program, and the internal flag is set telling the range that the buffer's empty. You call r.getNext without having called r.isEmpty, then the range makes a heroic effort to fetch another int. If that fails, the range throws an exception.

So in essence the behavior is that you can use isEmpty to make sure that getNext won't blow up in your face (speaking of Pulp Fiction...). I think this all is very sensible, and even more sensible once I give more detail below.
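(A minimal sketch of the buffering behavior described above, under stated assumptions: the names IntSucker and drain are hypothetical, the range parses an in-memory string instead of a FILE* so the example stays self-contained, and only non-negative ints are handled. It is one possible reading of the isEmpty/getNext contract, not the actual implementation.)

```d
import std.ascii : isDigit, isWhite;

/// Hypothetical input range over whitespace-separated non-negative ints.
/// Reads from a string (an assumption made to keep the sketch standalone).
struct IntSucker
{
    private string src;      // unread input
    private int buffered;    // value fetched by isEmpty, if any
    private bool hasValue;   // true while `buffered` holds an unread int

    this(string s) { src = s; }

    /// Not const: it may have to consume input to find out the answer.
    bool isEmpty()
    {
        if (hasValue) return false;           // already buffered, do nothing
        while (src.length && isWhite(src[0])) src = src[1 .. $];
        if (src.length == 0) return true;     // only whitespace was left
        if (!isDigit(src[0])) throw new Exception("expected a digit");
        buffered = 0;
        while (src.length && isDigit(src[0]))
        {
            buffered = buffered * 10 + (src[0] - '0');
            src = src[1 .. $];
        }
        hasValue = true;
        return false;
    }

    /// Hands the buffered value to the caller and marks the buffer empty.
    /// If nothing is buffered yet, it tries to fetch a value first.
    int getNext()
    {
        if (isEmpty()) throw new Exception("no more data");
        hasValue = false;
        return buffered;
    }
}

/// Drains a sucker into an array; isEmpty is called twice per step on
/// purpose to show that the second call does not skip an element.
int[] drain(string input)
{
    int[] result;
    auto r = IntSucker(input);
    while (!r.isEmpty() && !r.isEmpty())
        result ~= r.getNext();
    return result;
}
```

Calling drain on an input with trailing whitespace yields every int exactly once, which is the property the question about calling isEmpty twice was probing.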
 The input range behaves like C#'s IEnumerator, but at least the C# names 
 are more obvious: while (i.MoveNext()) e = i.Current; But isEmpty is 
 common to all ranges, so I understand why it's the way it is. I just 
 hope it could stay "const", not modifying the internal state. Perhaps 
 add "next" to input ranges as well?

This is even better than the previous question. Why not this for an input iterator?

    for (; !r.isEmpty; r.next) use(r.getNext);

or even

    for (; !r.isEmpty; r.next) use(r.left);

thus making input ranges quite similar to forward ranges. In that design:

a) The constructor fetches the first int
b) isEmpty is const and just returns the "available" internal flag
c) next reads the next int off the file

First I'll eliminate the design with isEmpty/next/getNext as flawed for a subtle reason: cost of copying. Replace mentally the int above with something that needs dynamic allocation, such as BigInt. So the range reads one BigInt off the file, stores it in the internal buffer, and then the user calls:

    for (; !r.isEmpty; r.next)
    {
        BigInt my = r.getNext();
        ....
    }

Since in one iteration there's only one BigInt, not two, I'd need to do a destructive transfer in getNext() that "moves" the state of BigInt from the range to my, leaving r's state empty. (This feature will be available in D2 soon.) But then what if somebody calls r.getNext() again? Well I don't have the data anymore, so I need to issue a next(). So I discover next was not needed in the first place. Hope everything is clear so far.

Now let's discuss the isEmpty/next/left design. That design is also flawed, just in a different way. The range holds the BigInt inside, but r.left gives to the client a *reference* to it. This is cool! There is no more extra copying and everything works smoothly. In fact this design patents a lie. It lies about giving a reference to an element (the same way a "real" container does) but that element will be overwritten every time r.next is called, UNBEKNOWNST TO THE CLIENT. So consider the algorithm:

    void bump(R)(R r)
    {
        for (; !r.isEmpty(); r.next) ++r.left;
    }

You pass an input range to bump and it compiles and executes to something entirely nonsensical. This is an obvious misuse, but as I'm sure you know it gets real confusing real fast.

A possible argument is to have left return a ref const(BigInt), but then we lose the ability to transfer its value into our state.

What we want is a design that tells the truth. And a design that tells the truth is this:

r.isEmpty does whatever the hell it takes to make sure whether there's data available or not. It is not const and it could throw an exception.

v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that came through the wire, data that the client now owns as soon as getNext returned. There is no extra copy, no extra allocation, and the real thing has happened: data has been read from the outside and user code was made the only owner of it.

Andrei
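(To make the objection to isEmpty/next/left concrete, here is a small sketch under stated assumptions: RefSucker, staleRead and bumpLeavesSourceAlone are hypothetical names, and an int array stands in for data arriving from a file. It only illustrates how a reference into the range's single buffer goes stale; it is not code from the range proposal.)

```d
/// Hypothetical input range that exposes its one internal buffer by
/// reference, in the style of the isEmpty/next/left design.
struct RefSucker
{
    private int[] source;    // stands in for data arriving from a file
    private int buffer;      // the only slot the range owns
    private bool available;  // whether `buffer` currently holds a value

    this(int[] data) { source = data; next(); }

    bool isEmpty() const { return !available; }

    /// Overwrites the buffer with the next value from the source.
    void next()
    {
        available = source.length > 0;
        if (available)
        {
            buffer = source[0];
            source = source[1 .. $];
        }
    }

    /// Looks like access to an element, but is really the shared buffer.
    ref int left() { return buffer; }
}

/// The bump algorithm from the post: it compiles for an input range.
void bump(R)(R r)
{
    for (; !r.isEmpty(); r.next()) ++r.left();
}

/// Keeps a pointer to "the first element", then advances the range.
int staleRead()
{
    auto r = RefSucker([10, 20]);
    int* p = &r.left();   // the client believes this refers to 10
    r.next();             // the buffer is silently overwritten
    return *p;            // reads 20
}

/// Runs bump and returns the source data, which bump never touched.
int[] bumpLeavesSourceAlone()
{
    int[] data = [1, 2, 3];
    bump(RefSucker(data));
    return data;
}
```

In this sketch bump increments only the temporary buffer, so the increments are lost as soon as next runs, and the pointer taken in staleRead ends up observing a different value than the one it was taken for.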
Sep 09 2008
next sibling parent reply superdan <super dan.org> writes:
Andrei Alexandrescu Wrote:

 What we want is a design that tells the truth. And a design that tells
 the truth is this:
 
 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.
 
 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

this is really kool n the gang. there's a sore point tho. if i wanna read strings from a file no prob.

    for (auto r = stringSucker(stdin); !r.isEmpty(); )
    {
        string s = r.getNext();
        // play with s
    }

but a new string is allocated every line. that's safe but slow. so i want some more efficient stuff. i should use char[] because string don't deallocate.

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
    }

no improvement. same thing a new char[] is alloc each time. maybe i could do

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
        delete s;
    }

would this make stuff faster. maybe. maybe not. and it's not general. what i'd like is some way of telling the range, i'm done with this you can recycle and reuse it. it's a green green world.

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
        r.recycle(s);
    }

sig is recycle(ref ElementType!(R)).
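(A rough sketch of how such a recycle could behave, with loud assumptions: CharArraySucker here is a hypothetical line reader over an in-memory string rather than stdin, the allocations counter exists only for illustration, and countAllocations is a made-up helper. The thread does not specify any of these details.)

```d
/// Hypothetical line reader that can reuse a buffer handed back to it.
struct CharArraySucker
{
    private string src;       // unread input, standing in for stdin
    private char[] storage;   // backing buffer, kept at its largest size
    private bool lent;        // true while the client holds a slice of it
    size_t allocations;       // counts fresh buffers, for illustration only

    this(string s) { src = s; }

    bool isEmpty() const { return src.length == 0; }

    /// Returns the next line (without the newline) in a mutable buffer.
    char[] getNext()
    {
        size_t n = 0;
        while (n < src.length && src[n] != '\n') ++n;
        if (lent || storage.length < n)
        {
            // The old buffer is still in use by the client, or too small.
            storage = new char[n];
            ++allocations;
        }
        storage[0 .. n] = src[0 .. n];
        lent = true;
        src = (n < src.length) ? src[n + 1 .. $] : null;
        return storage[0 .. n];
    }

    /// The client promises not to touch `s` again; its slice is cleared.
    void recycle(ref char[] s)
    {
        s = null;
        lent = false;   // the next getNext may overwrite the storage
    }
}

/// Reads every line and reports how many buffers had to be allocated.
size_t countAllocations(string text, bool giveBack)
{
    auto r = CharArraySucker(text);
    while (!r.isEmpty())
    {
        char[] s = r.getNext();
        // play with s
        if (giveBack) r.recycle(s);
    }
    return r.allocations;
}
```

With this arrangement a client that never calls recycle still gets a fresh array per line, so forgetting the call costs speed rather than correctness.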
Sep 09 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
superdan wrote:
 Andrei Alexandrescu Wrote:
 
 What we want is a design that tells the truth. And a design that tells
 the truth is this:

 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

this is really kool n the gang. there's a sore point tho. if i wanna read strings from a file no prob.

    for (auto r = stringSucker(stdin); !r.isEmpty(); )
    {
        string s = r.getNext();
        // play with s
    }

but a new string is allocated every line. that's safe but slow. so i want some more efficient stuff. i should use char[] because string don't deallocate.

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
    }

no improvement. same thing a new char[] is alloc each time. maybe i could do

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
        delete s;
    }

would this make stuff faster. maybe. maybe not. and it's not general. what i'd like is some way of telling the range, i'm done with this you can recycle and reuse it. it's a green green world.

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
        r.recycle(s);
    }

sig is recycle(ref ElementType!(R)).

This is a great point. Unfortunately that won't quite work properly. Equally unfortunately, you just revealed an issue with my design.

One goal of range design is that algorithms written for "inferior" ranges should work seamlessly with "superior" ranges. For example, find should work fine with any range, so we should write (with your idea):

    R find(R, V)(R r, V v)
    {
        ElementType!(R) tmp;
        for (; !r.isEmpty; r.recycle(tmp))
        {
            tmp = r.getNext;
            if (tmp == v) break;
        }
        return r;
    }

This looks great but imagine we apply this to a collection of some costly ElementType!(R). Then getNext rightly returns a reference because there's no need to create a copy unless absolutely necessary. But then we realize that our loop forces a copy no matter what! The copy was good for the input range. But it's bad for the actual container that wouldn't need to copy anything.

It looks like I need to reconsider the design mentioned by Lionello, in which both input iterators and forward iterators expose separate isEmpty/next/first operations. Then an input range r caches the last read value internally and gives access to it through r.first.

I have an idea on how to disallow at least egregious errors. Consider:

    void fill(R, V)(R r, V v)
    {
        for (; !r.isEmpty; r.next) r.first = v;
    }

If "r.first = v;" compiles, then the following scandalous misuse goes unpunished:

    fill(charArraySucker, "bogus");

A way to prevent r.first = v from compiling is to define a one-argument function first:

    struct MyRange(T)
    {
        private void first(W)(W whatever); // not implemented
        ref T first();
    }

Expression r.first will resolve to the second function. Expression r.first = v will translate into r.first(v) so it will resolve to the first function, which will fail the protection test.

Then there remains the problem:

    void bump(R)(R r)
    {
        for (; !r.isEmpty; r.next) ++r.first;
    }

    bump(charArraySucker); // bogus

Sigh...

Andrei
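(For comparison, a sketch of find and fill written against the isEmpty/next/first interface and applied to a container-backed range, where first returns a reference and nothing is copied. ArrayRange and remainingAfterFind are hypothetical names introduced for this illustration; the sketch assumes a range that simply wraps a D slice.)

```d
/// Hypothetical range over an array, using the isEmpty/next/first names.
struct ArrayRange(T)
{
    private T[] data;

    this(T[] d) { data = d; }

    bool isEmpty() const { return data.length == 0; }
    void next() { data = data[1 .. $]; }           // shrink from the front
    ref T first() { return data[0]; }              // a real element, by reference
    size_t length() const { return data.length; }
}

/// find in the isEmpty/next/first style: elements are compared in place.
R find(R, V)(R r, V v)
{
    for (; !r.isEmpty(); r.next())
        if (r.first() == v) break;
    return r;
}

/// fill from the post: legitimate when the range views a real container.
void fill(R, V)(R r, V v)
{
    for (; !r.isEmpty(); r.next()) r.first() = v;
}

/// Returns how many elements remain from the first match onward.
size_t remainingAfterFind(int[] a, int v)
{
    return find(ArrayRange!int(a), v).length();
}
```

The same two algorithms would also compile against an input range that caches its last value behind first, which is exactly why the misuse cases in the post are hard to rule out by interface alone.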
Sep 09 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 
 Then there remains the problem:
 
 void bump(R)(R r)
 {
     for (; !r.isEmpty; r.next) ++r.first;
 }
 
 bump(charArraySucker); // bogus
 
 Sigh...

And I suppose you don't want to return a const reference from first() because the user may want to operate on the value, if only on a temporary basis? Let's say:

    P findFirst(P, R, C)( R r, C c )
    {
        for( ; !r.isEmpty; r.next )
        {
            // modify the temporary because a new element is expensive
            // to copy-construct
            if( auto p = c.contains( ++r.first ) )
                return p;
        }
        return P.init;
    }

Hm... must the implementation prevent stupid mistakes such as your example? Ideally, yes. But I don't see a way to do so and yet allow for routines like findFirst() above.

Sean
Sep 09 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Then there remains the problem:

 void bump(R)(R r)
 {
     for (; !r.isEmpty; r.next) ++r.first;
 }

 bump(charArraySucker); // bogus

 Sigh...

And I suppose you don't want to return a const reference from first() because the user may want to operate on the value, if only on a temporary basis? Let's say:

    P findFirst(P, R, C)( R r, C c )
    {
        for( ; !r.isEmpty; r.next )
        {
            // modify the temporary because a new element is expensive
            // to copy-construct
            if( auto p = c.contains( ++r.first ) )
                return p;
        }
        return P.init;
    }

Hm... must the implementation prevent stupid mistakes such as your example? Ideally, yes. But I don't see a way to do so and yet allow for routines like findFirst() above.

Yes, exactly. Also consider a user that inspects lines in a file, and occasionally takes ownership of the current line to put it into a hashtable or something. I think I'll resign myself to isEmpty/next/first for input ranges. The remaining difference between input ranges and forward ranges is that the former are uncopyable. Andrei
Sep 09 2008
prev sibling parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
"superdan" <super dan.org> wrote in message 
news:ga5vjs$snn$1 digitalmars.com...
 Andrei Alexandrescu Wrote:

 What we want is a design that tells the truth. And a design that tells
 the truth is this:

 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

this is really kool n the gang. there's a sore point tho. if i wanna read strings from a file no prob.

    for (auto r = stringSucker(stdin); !r.isEmpty(); )
    {
        string s = r.getNext();
        // play with s
    }

but a new string is allocated every line. that's safe but slow. so i want some more efficient stuff. i should use char[] because string don't deallocate.

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
    }

no improvement. same thing a new char[] is alloc each time. maybe i could do

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
        delete s;
    }

would this make stuff faster. maybe. maybe not. and it's not general. what i'd like is some way of telling the range, i'm done with this you can recycle and reuse it. it's a green green world.

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
        r.recycle(s);
    }

sig is recycle(ref ElementType!(R)).

Can't this be done by creating different ranges? I mean, trying to find a 'one size fits all' model is usually a lost cause. And now we have different consumers and different types. Perhaps one sucker is instantiated with its own internal buffer and another sucker allocates every new item. And possibly yet another sucker only returns invariant items. L.
Sep 09 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Lionello Lunesu wrote:
 
 "superdan" <super dan.org> wrote in message 
 news:ga5vjs$snn$1 digitalmars.com...
 Andrei Alexandrescu Wrote:

 What we want is a design that tells the truth. And a design that tells
 the truth is this:

 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

this is really kool n the gang. there's a sore point tho. if i wanna read strings from a file no prob.

    for (auto r = stringSucker(stdin); !r.isEmpty(); )
    {
        string s = r.getNext();
        // play with s
    }

but a new string is allocated every line. that's safe but slow. so i want some more efficient stuff. i should use char[] because string don't deallocate.

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
    }

no improvement. same thing a new char[] is alloc each time. maybe i could do

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
        delete s;
    }

would this make stuff faster. maybe. maybe not. and it's not general. what i'd like is some way of telling the range, i'm done with this you can recycle and reuse it. it's a green green world.

    for (auto r = charArraySucker(stdin); !r.isEmpty(); )
    {
        char[] s = r.getNext();
        // play with s
        r.recycle(s);
    }

sig is recycle(ref ElementType!(R)).

Can't this be done by creating different ranges? I mean, trying to find a 'one size fits all' model is usually a lost cause. And now we have different consumers and different types. Perhaps one sucker is instantiated with its own internal buffer and another sucker allocates every new item. And possibly yet another sucker only returns invariant items.

I don't mind implementing different suckers. :o) The problem is that the suckers won't have the same interface, so some of them will work badly or not at all with std.algorithm. So again: I think the design you suggested isEmpty/first/getNext is the better one even for input iterators. Andrei
Sep 09 2008
prev sibling next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el  9 de septiembre a las 07:47 me escribiste:
 Lionello Lunesu wrote:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:ga46ok$2s77$1 digitalmars.com...
I put together a short document for the range design. I definitely missed 
about a million things and have been imprecise about another million, so 
feedback would be highly appreciated. See:

http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

I agree with others that some names are not so obvious. Left/right? How do Arabic speakers feel about this : ) Begin/end seems more intuitive.

I don't know of that particular cultural sensitivity. Begin and end are bad choices because they'd confuse the heck out of STL refugees. c.left and c.right are actually STL's c.front() and c.back() or *c.begin() and c.end()[-1], if you allow the notational abuse. But I sure hope some good names will come along.

I think STL refugees can deal with it. I think there is no point in continuing to compromise D's readability because of C/C++ (you just mentioned enum, another really bad choice to keep C/C++ refugees happy).

I find left and right a little obscure too; it all depends on your mental image of a range. Using front/back or begin/end is much clearer.

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
MP: Qué tengo? B: 2 dedos de frente. MP: No, en mi mano, Bellini. B: Un acoplado! MP: No, escuche bien, eh... B: El pelo largo, Mario... Se usa en la espalda. MP: No! Es para cargar. B: Un hermano menor. MP: No, Bellini! Se llena con B: Un chancho, Mario... cualquier cosa. MP: No, Bellini, no y no! -- El Gran Bellini (Mario Podestá con una mochila)
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Andrei Alexandrescu, el  9 de septiembre a las 07:47 me escribiste:
 Lionello Lunesu wrote:
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:ga46ok$2s77$1 digitalmars.com...
 I put together a short document for the range design. I definitely missed 
 about a million things and have been imprecise about another million, so 
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

I agree with others that some names are not so obvious. Left/right? How do Arabic speakers feel about this : ) Begin/end seems more intuitive.

I don't know of that particular cultural sensitivity. Begin and end are bad choices because they'd confuse the heck out of STL refugees. c.left and c.right are actually STL's c.front() and c.back() or *c.begin() and c.end()[-1], if you allow the notational abuse. But I sure hope some good names will come along.

I think STL refugees can deal with it. I think there is no point in continuing to compromise D's readability because of C/C++ (you just mentioned enum, another really bad choice to keep C/C++ refugees happy).

I agree. I just don't think that choosing one name over a synonym name compromises much.
 I find left and right a little obscure too; it all depends on your mental
 image of a range. Using front/back or begin/end is much clearer.

I'd like to go with:

r.first
r.last
r.next
r.pop

Not convinced about r.toBegin(s), r.toEnd(s), and r.fromEnd(s) yet, in the wake of a realization. I've noticed that the r.fromBegin(s) operation is not needed if we make the appropriate relaxations. r.fromBegin(s) is really s.toEnd(r).

I've updated http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html with a drawing (right at the beginning) that illustrates what primitives we need. Maybe this will help us choose even better names.

Andrei
Sep 09 2008
next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el  9 de septiembre a las 10:30 me escribiste:
I think STL refugees can deal with it. I think there is no point in continuing
to compromise D's readability because of C/C++ (you just mentioned enum,
another really bad choice to keep C/C++ refugees happy).

I agree. I just don't think that choosing one name over a synonym name compromises much.
I find left and right a little obscure too; it all depends on your mental
image of a range. Using front/back or begin/end is much clearer.

I'd like to go with: r.first r.last

Much better, thank you! =)
Sep 09 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
Leandro Lucarella wrote:
 Andrei Alexandrescu, el  9 de septiembre a las 10:30 me escribiste:

 I'd like to go with:

 r.first
 r.last

Much better, thank you! =)

Maybe:

r.head
r.tail

Also, I'm not crazy about the "isEmpty" name. Given the use of "getNext", I think "hasNext" is a more natural choice.

--benji
Sep 09 2008
parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Benji Smith wrote:

 Given the use of "getNext", I think "hasNext" is a more natural
 choice 

clapping hands.

-manfred
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Manfred_Nowak wrote:
 Benji Smith wrote:
 
 Given the use of "getNext", I think "hasNext" is a more natural
 choice 

clapping hands.

Walter would love that.

    for (R r = getR(); r.hasNext; r.next) { ... }

Look ma, no negation! Oops, I just materialized one with the exclamation sign.

Andrei
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Andrei Alexandrescu wrote:
 Manfred_Nowak wrote:
 Benji Smith wrote:

 Given the use of "getNext", I think "hasNext" is a more natural
 choice 

clapping hands.

Walter would love that.

    for (R r = getR(); r.hasNext; r.next) { ... }

Look ma, no negation! Oops, I just materialized one with the exclamation sign.

I just discovered a problem with that. hasNext implies I'm supposed to call next to get the thingie. It should be hasFirst, which is less appealing. Andrei
Sep 09 2008
parent Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Andrei Alexandrescu wrote:
 Manfred_Nowak wrote:
 Benji Smith wrote:

 Given the use of "getNext", I think "hasNext" is a more natural
 choice 

clapping hands.

Walter would love that.

    for (R r = getR(); r.hasNext; r.next) { ... }

Look ma, no negation! Oops, I just materialized one with the exclamation sign.

I just discovered a problem with that. hasNext implies I'm supposed to call next to get the thingie. It should be hasFirst, which is less appealing. Andrei

I see where you're coming from, because the range shrinks itself element-by-element as it iterates, eventually disappearing into a zero-element range. But for me as a consumer of the container, it's a little weird.

When I visualize iteration (beware: silly metaphor ahead), it looks like a frog hopping from one lily pad to another. He might have a well-defined start and end point, but the lily pads don't evaporate after the frog hops away.

When I see "isEmpty" in the implementation of an iteration routine, it makes me think of a producer/consumer work queue. The list is only empty if the consumer has consumed all the work items from the queue.

I get what you're saying about the iteration-range being empty because it shrinks itself during each step of the iteration. But to me, iteration is an idempotent operation, so something non-empty before the iteration should not be empty after the iteration.

--benji
Sep 09 2008
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Tue, 09 Sep 2008 10:30:58 -0500, Andrei Alexandrescu wrote:


 I'd like to go with:
 
 r.first
 r.last
 r.next
 r.pop

LOL ... I was just thinking to myself ... "what's wrong with First and Last? I should suggest them." then I read this post.

"next" is fine, but "pop"? Isn't the pair of "next" called "prev(ious)" and the pair of "pop" called "push"? So please, either have next/prev or push/pop, and in that case push/pop looks quite silly.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 On Tue, 09 Sep 2008 10:30:58 -0500, Andrei Alexandrescu wrote:
 
 
 I'd like to go with:

 r.first
 r.last
 r.next
 r.pop

LOL ... I was just thinking to myself ... "what's wrong with First and Last? I should suggest them." then I read this post.

"next" is fine, but "pop"? Isn't the pair of "next" called "prev(ious)" and the pair of "pop" called "push"? So please, either have next/prev or push/pop, and in that case push/pop looks quite silly.

Previous is confusing as it suggests I'm moving back where I came from. In reality I shrink the range from the other end. So we need:

"Shrink the range from the left end"
"Shrink the range from the right end"

The first will be used much more often than the second.

Andrei
Sep 09 2008
next sibling parent reply Derek Parnell <derek nomail.afraid.org> writes:
On Tue, 09 Sep 2008 18:13:08 -0500, Andrei Alexandrescu wrote:


 Previous is confusing as it suggests I'm moving back where I came from. 

Ah... how confusing is this English language! ;-)
 In reality I shrink the range from the other end. So we need:
 
 "Shrink the range from the left end"
 "Shrink the range from the right end"

And I'm sure you really mean ...

 "Shrink the range from the front"
 "Shrink the range from the back"

because "left" does not always mean "front" etc ... but we've been over this.
 The first will be used much more often than the second.

If the concept and the implementation involves changing the size (shrinking), shouldn't the words used somehow invoke this idea?

trim? strip? slice? cut? ... just thinking out loud ... nothing too serious.

-- 
Derek (skype: derek.j.parnell)
Melbourne, Australia
10/09/2008 10:10:32 AM
Sep 09 2008
parent Benji Smith <dlanguage benjismith.net> writes:
Derek Parnell wrote:
 If the concept and the implementation involves changing the size
 (shrinking), shouldn't the words used somehow invoke this idea?
 
 trim? strip? slice? cut? ... just thinking out loud ... nothing too
 serious.

Consume?
Sep 09 2008
prev sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el  9 de septiembre a las 18:13 me escribiste:
 Derek Parnell wrote:
On Tue, 09 Sep 2008 10:30:58 -0500, Andrei Alexandrescu wrote:
I'd like to go with:

r.first
r.last
r.next
r.pop

LOL ... I was just thinking to myself ... "what's wrong with First and Last? I should suggest them." then I read this post. "next" is fine, but "pop"? Isn't the pair of "next" called "prev(ious)" and the pair of "pop" called "push"? So please, either have next/prev or push/pop, and in that case push/pop looks quite silly.

Previous is confusing as it suggests I'm moving back where I came from. In reality I shrink the range from the other end. So we need:

"Shrink the range from the left end"
"Shrink the range from the right end"

You mean from the beginning and the end, I guess ;)
 The first will be used much more often than the second.

shrink(int n = 1)? If n > 0, it shrinks from the beginning; if n < 0, it shrinks from the end (0 is a no-op). This way you can skip some elements too. Even though it could be a little cryptic, I think it plays well with slicing.

Another possibility is shrink(int begin = 1, end = 0), to shrink both ends at the same time. For example (calling the first proposal shrink1 and the second shrink2):

    r.pop == r.shrink1(-1) == r.shrink2(0, 1)
    r.pop; r.pop == r.shrink1(-2) == r.shrink2(0, 2)
    r.shrink1() == r.shrink2()
    r.shrink1(3) == r.shrink2(3)
    r.shrink1(3); r.shrink1(-2) == r.shrink2(3, 2)
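(The equivalences above can be checked with a small sketch on built-in slices. Assumptions: shrink1 and shrink2 are written here as free functions that return the shrunken slice instead of modifying the range in place, and the parameter types are guesses, since the post only gives the call shapes.)

```d
/// Hypothetical shrink1: a positive n drops elements from the beginning,
/// a negative n drops elements from the end, and 0 is a no-op.
T[] shrink1(T)(T[] r, int n = 1)
{
    if (n >= 0)
        return r[n .. $];
    return r[0 .. r.length - cast(size_t)(-n)];
}

/// Hypothetical shrink2: drops `begin` elements from the beginning and
/// `end` elements from the end in a single call.
T[] shrink2(T)(T[] r, size_t begin = 1, size_t end = 0)
{
    return r[begin .. r.length - end];
}

/// Applies the last equivalence from the post to a sample array:
/// shrink1(3) followed by shrink1(-2) versus shrink2(3, 2).
bool lastEquivalenceHolds()
{
    int[] r = [0, 1, 2, 3, 4, 5, 6, 7];
    return r.shrink1(3).shrink1(-2) == r.shrink2(3, 2);
}
```

Both functions go out of bounds if asked to drop more elements than the slice holds, which D's array bounds checking reports at run time in non-release builds.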
Sep 10 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Leandro Lucarella <llucax gmail.com> wrote:
 Andrei Alexandrescu, el  9 de septiembre a las 18:13 me escribiste:
 "Shrink the range from the left end"
 "Shrink the range from the right end"

 The first will be used much more often than the second.

shrink(int n = 1)? If n > 0, it shrinks from the beginning; if n < 0, it shrinks from the end (0 is a no-op). This way you can skip some elements too. Even though it could be a little cryptic, I think it plays well with slicing.

Another possibility is shrink(int begin = 1, end = 0), to shrink both ends at the same time. For example (calling the first proposal shrink1 and the second shrink2):

    r.pop == r.shrink1(-1) == r.shrink2(0, 1)
    r.pop; r.pop == r.shrink1(-2) == r.shrink2(0, 2)
    r.shrink1() == r.shrink2()
    r.shrink1(3) == r.shrink2(3)
    r.shrink1(3); r.shrink1(-2) == r.shrink2(3, 2)

I remember that shift() was a method to remove the first element from an array. Some Basic perhaps... So it could be push, pop, shift, er... unshift?

But now that I think about it... What's the use case for these operations? It's clear to me that getNext/putNext are a generator/constructor pair, in the broad sense. But push/whatever? Andrei has already said, if I remember correctly, that ranges are views of data, not manipulators. This means that they cannot be used to extend a collection. Therefore ranges cannot have any extension methods whatsoever.

If you draw a parallel between a range and a stack of paper, the shrink methods would probably be pop/snap... I'd also propose next() for moving the start and prev() for moving the end. It sounds a bit misleading but, on the other hand, it closely resembles forward and backward iteration with the opposite end of a range representing the iteration limit. Or maybe forward()/backward(), or fwd()/back()?
Sep 10 2008
parent reply David Gileadi <foo bar.com> writes:
Sergey Gromov wrote:
 If you draw a parallel between a range and a stack of paper, the shrink 
 methods would probably be pop/snap...  I'd also propose next() for 
 moving the start and prev() for moving the end.  It sounds a bit 
 misleading but, on the other hand, it closely resembles forward and 
 backward iteration with the opposite end of a range representing the 
 iteration limit.  Or maybe forward()/backward(), or fwd()/back()?

Perhaps reduce() instead of pop() for moving the end?
Sep 10 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
David Gileadi wrote:
 Sergey Gromov wrote:
 If you draw a parallel between a range and a stack of paper, the 
 shrink methods would probably be pop/snap...  I'd also propose next() 
 for moving the start and prev() for moving the end.  It sounds a bit 
 misleading but, on the other hand, it closely resembles forward and 
 backward iteration with the opposite end of a range representing the 
 iteration limit.  Or maybe forward()/backward(), or fwd()/back()?

Perhaps reduce() instead of pop() for moving the end?

I love reduce! Thought of it as well. Unfortunately the term is loaded with reduction of a binary operation over a range, as e.g. in std.algorithm. I think shrink() is reasonable. next() moves to the next thing. shrink() shrinks the set of dudes I can reach. Andrei
Sep 10 2008
parent Sergey Gromov <snake.scaly gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 David Gileadi wrote:
 Sergey Gromov wrote:
 If you draw a parallel between a range and a stack of paper, the 
 shrink methods would probably be pop/snap...  I'd also propose next() 
 for moving the start and prev() for moving the end.  It sounds a bit 
 misleading but, on the other hand, it closely resembles forward and 
 backward iteration with the opposite end of a range representing the 
 iteration limit.  Or maybe forward()/backward(), or fwd()/back()?

Perhaps reduce() instead of pop() for moving the end?

I love reduce! Thought of it as well. Unfortunately the term is loaded with reduction of a binary operation over a range, as e.g. in std.algorithm. I think shrink() is reasonable. next() moves to the next thing. shrink() shrinks the set of dudes I can reach.

I thought of next/shrink as well, but they look asymmetrical, and also next here is a form of shrink, too. Someone could think of "shrink" as chopping from both ends.
Sep 10 2008
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 12:30 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I've updated http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
 with a drawing (right at the beginning) that illustrates what primitives we
 need. Maybe this will help us choose even better names.

The text says that s should be a subrange of r, but the drawing shows s extending beyond r. Does it actually need to be a subrange?

I am considering relaxing the requirements. Andrei
Sep 09 2008
prev sibling parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:ga5r8i$h0v$1 digitalmars.com...
 So in essence the behavior is that you can use isEmpty to
 make sure that getNext won't blow in your face (speaking of Pulp
 Fiction...)

So isEmpty is optional for input ranges? This does not actually match your own documentation:
getNext: The call is defined only right after r.isEmpty returned false.

If you make isEmpty optional, its non-constness is less of a problem. What I have a problem with (overstatement) is having to call isEmpty to actually prepare the next element. If try { while (1) e = ir.getNext; } works, I'm sold : )
 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

Thank you for taking the time to explain all these details. This is all great stuff. L.
Sep 09 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Lionello Lunesu wrote:
 
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:ga5r8i$h0v$1 digitalmars.com...
 So in essence the behavior is that you can use isEmpty to
 make sure that getNext won't blow in your face (speaking of Pulp
 Fiction...)

So isEmpty is optional for input ranges? This does not actually match your own documentation:
 getNext: The call is defined only right after r.isEmpty returned false.

If you make isEmpty optional, its non-constness is less of a problem. What I have a problem with (overstatement) is having to call isEmpty to actually prepare the next element. If try { while (1) e = ir.getNext; } works, I'm sold : )

I think I'd want to make it nonoptional such that people wanting real fast iterators can define r.isEmpty to do a check and r.getNext (well, r.left) to go unchecked.
 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

Thank you for taking the time to explain all these details. This is all great stuff.

However, superdan destroyed me. (See my answer to him.) I think I need to concede to your design. Andrei
Sep 09 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Sep 10, 2008 at 12:30 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 I've updated http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
 with a drawing (right at the beginning) that illustrates what primitives we
 need. Maybe this will help us choose even better names.

The text says that s should be a subrange of r, but the drawing shows s extending beyond r. Does it actually need to be a subrange? --bb
Sep 09 2008
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 10 Sep 2008 01:56:40 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Andrei Alexandrescu wrote:
 Manfred_Nowak wrote:
 Benji Smith wrote:

 Given the use of "getNext", I think "hasNext" is a more natural
 choice

clapping hands.

for (R r = getR(); r.hasNext; r.next) { ... } Look ma, no negation! Oops, I just materialized one with the exclamation sign.

I just discovered a problem with that. hasNext implies I'm supposed to call next to get the thingie. It should be hasFirst, which is less appealing. Andrei

I usually implement my iterators as follows:

interface Iterator(T)
{
    bool isValid();
    T value();
    void moveNext();
}

Usage:

auto it = ...;
while (it.isValid()) {
    auto value = it.value();
    it.moveNext();
}

or

for (auto it = ...; it.isValid(); it.moveNext()) {
    auto value = it.value();
}
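A minimal sketch of how that three-method interface could be backed by a built-in array. ArrayIterator is a hypothetical adapter invented for this example; only the Iterator interface itself comes from the post.

```d
interface Iterator(T)
{
    bool isValid();
    T value();
    void moveNext();
}

// Hypothetical adapter (an assumption, not from the post): walks a slice front to back.
class ArrayIterator(T) : Iterator!T
{
    private T[] data;

    this(T[] data) { this.data = data; }

    bool isValid() { return data.length > 0; }   // true while elements remain
    T value() { return data[0]; }                // current element
    void moveNext() { data = data[1 .. $]; }     // drop the current element
}

void main()
{
    int sum = 0;
    for (Iterator!int it = new ArrayIterator!int([1, 2, 3]); it.isValid(); it.moveNext())
        sum += it.value();
    assert(sum == 6);
}
```

Here isValid only inspects state and value does not advance, which is the separation the hasNext/getNext naming discussion above is circling around.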
Sep 10 2008
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 08 Sep 2008 20:24:27 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 About the "collection should be a range itself" mantra, I've had a  
 micro-epiphany. Since D's slices so nicely model at the same time arrays  
 and their ranges, it is very seductive to think of carrying that to  
 other collection types. But I got disabused of that notion as soon as I  
 wanted to define a less simple data structure. Consider a matrix:

 auto a = BlockMatrix!(float, 3)(100, 200, 300);

 defines a block contiguous matrix of three dimensions with the  
 respective sizes. Now a should be the matrix AND its range at the same  
 time. But what's "the range" of a matrix? Oops. As soon as you start to  
 think of it, so many darn ranges come to mind.

 * flat: all elements in one shot in an arbitrary order

 * dimension-wise: iterate over a given dimension

 * subspace: iterate over a "slice" of the matrix with fewer dimensions

 * diagonal: scan the matrix from one corner to the opposite corner

 I guess there are some more. So before long I realized that the most  
 gainful design is this:

 a) A matrix owns its stuff and is preoccupied with storage internals,  
 allocation, and the such.

 b) The matrix defines as many range types as it wants.

 c) Users use the ranges.

 For example:

 foreach (ref e; a.flat) e *= 1.1;
 foreach (row; a.dim(0)) row[0, 0] = 0;
 foreach (col; a.dim(1)) col[1, 1] *= 5;

I'd recommend a more clear-cut example. Three of the ranges are very well defined in array languages and libraries. Essentially a slice of a matrix is another matrix that may have fewer or more dimensions and therefore may be a collection in addition to a range. The dimension-wise range is the only operation which is more complex, due to the type and dimensions of the returned array changing: float[x,y,z] -> float[x,y][z]. And the main argument is that a float[x,y][z] is large, slow to create and unwanted, so a separate range/generator is better (also note that a generator can implicitly provide the head-const, tail-mutable nature of the range). Even given this, it doesn't contradict the "collection should be a range itself" mantra, since there is a very well defined range which encompasses the data; it's just that some ranges are more optimal if they're only views, and not a root collection.
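The design under discussion (a container that owns its storage and hands out several range types as views) might look like this in miniature. This is only a sketch under assumptions: Matrix, flat and row are hypothetical names, the matrix is two-dimensional, and built-in slices stand in for the dedicated range types.

```d
// Hypothetical 2-D matrix; not the BlockMatrix from the quoted post.
struct Matrix
{
    private double[] data;      // storage owned by the matrix
    private size_t rows, cols;

    this(size_t rows, size_t cols)
    {
        this.rows = rows;
        this.cols = cols;
        data = new double[rows * cols];
        data[] = 1.0;           // double.init is NaN, so fill explicitly
    }

    // "flat" view: every element, in storage order
    double[] flat() { return data; }

    // "dimension-wise" view: one row, as a slice into the same storage
    double[] row(size_t i) { return data[i * cols .. (i + 1) * cols]; }
}

void main()
{
    auto a = Matrix(2, 3);
    foreach (ref e; a.flat()) e *= 2.0;   // mutate through the flat view
    a.row(1)[0] = 0.0;                    // mutate through a row view
    assert(a.flat() == [2.0, 2.0, 2.0, 0.0, 2.0, 2.0]);
}
```

Both views alias the matrix's storage, so neither copies elements; that is the "views, not root collections" distinction made above.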
Sep 08 2008
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-09-08 17:50:54 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 feedback would be highly appreciated. See:
 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

That looks great. I want to suggest renaming a few functions to make them more consistent and (hopefully) more expressive, as I see I'm not the only one frowning on them.

So right now you're defining this:

r.getNext
r.putNext
r.left
r.next
rightUnion(r, s)
rightDiff(r, s)
r.right
r.pop
leftUnion(r, s)
leftDiff(r, s)

Here's my alternate naming proposal:

r.headNext
r.putNext
r.head
r.next
r.nextUntil(s)
r.nextAfter(s)
r.rear
r.pull
r.pullUntil(s)
r.pullAfter(s)

Note that r.headNext is literally r.head followed by r.next when you have a forward iterator. You could also add "rearPull" to bidirectional ranges if you wanted. :-)

The syntax is a little different for binary functions (union, diff) as I changed them to members to make things easier to read and more in line with the regular next and pull.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Sep 08 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-09-08 17:50:54 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

That looks great. I want to suggest renaming a few functions to make them more consistent and (hopefully) more expressive, as I see I'm not the only one frowning on them.

So right now you're defining this:

r.getNext
r.putNext
r.left
r.next
rightUnion(r, s)
rightDiff(r, s)
r.right
r.pop
leftUnion(r, s)
leftDiff(r, s)

Here's my alternate naming proposal:

r.headNext
r.putNext
r.head
r.next
r.nextUntil(s)
r.nextAfter(s)
r.rear
r.pull
r.pullUntil(s)
r.pullAfter(s)

Note that r.headNext is literally r.head followed by r.next when you have a forward iterator. You could also add "rearPull" to bidirectional ranges if you wanted. :-)

The syntax is a little different for binary functions (union, diff) as I changed them to members to make things easier to read and more in line with the regular next and pull.

I like the alternate names quite a bit. One thing, however, is that head and rear are not near-antonyms (which ideally they should be). Maybe front and rear would be an improvement. (STL uses front and back). Also, I may be dirty-minded, but somehow headNext just sounds... bad :o).

I like the intersection functions as members because they clarify the relationship between the two ranges, which is asymmetric. I will definitely heed this suggestion. "Until" suggests iteration, however, which it shouldn't be (should be constant time) so maybe "nextTo" or something could be more suggestive.

This is going somewhere!

Andrei
Sep 08 2008
next sibling parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

 maybe "nextTo" or something could be more suggestive.

r.tillBeg(s), r.tillEnd(s), r.fromBeg(s), r.fromEnd(s) ? -manfred -- If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Sep 08 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:
 
 maybe "nextTo" or something could be more suggestive.

r.tillBeg(s), r.tillEnd(s), r.fromBeg(s), r.fromEnd(s) ?

Sounds good! Walter doesn't like abbreviations, so probably *Begin would please him more. Andrei
Sep 08 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 1:06 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:

 maybe "nextTo" or something could be more suggestive.

r.fromBeg(s), r.fromEnd(s) ?

please him more.

But till and until are synonyms. They both sound like iteration. Although it might be unavoidable since all prepositions that give a destination seem to imply going to that destination. till, until, toward, to, up to, etc. So might as well go with the shortest one, "to". r.toBegin(s), r.toEnd(s) r.fromBegin(s), r.fromEnd(s)

These are the names that I find most appealing at the moment. They required the fewest neurons to fire when glancing over them and mapping them to the needed operations. Andrei
Sep 09 2008
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
It is a nice idea to redesign the iterator and range.
I think that the proposal is not bad, but I have some notes about it, 
and some things that I would have done differently.

1) The simplest interface input (range) is just
bool isEmpty();
T next();
iterator(T) release();

The first two I fully agree on; the third one I suppose is there to 
allow resources to be released and possibly transfer the data to 
another iterator.. is it really needed?

Now I would see this simplest thing (let me call it iterator) as the 
basic objects for foreach looping.
*all* things on which foreach loops should be iterators.
If an object is not an iterator there should be a standard way to 
convert it to one (.iterator for example).
So if the compiler gets something that is not an iterator it tries to 
see if .iterator is implemented and if it is it calls it and iterates 
on it.
This let many objects have a "default" iterator. Obviously an object 
could have other methods that return an iterator.

2) All the methods with intersection of iterator in my opinion are 
difficult to memorize, and rarely used, I would scrap them.
Instead I would add the comparison operation 
.atSamePlace(iterator!(T)y) that would say if two iterators are at the 
same place. With it one gets back all the power of pointers, and with a 
syntax and use that are understandable.
I understand the idea of covering all possibilities, if one wants it 
with .atSamePlace a template can easily construct all possible 
intersection iterators. Clearly calling recursively such a template is 
inefficient, but I would say that one should then directly use a pair of 
iterators (in the worst case one could make a specialization that 
implements it more efficiently for the types that support it).

3) copying: I would let the user freely copy and duplicate iterators if needed.

4) input-output
I think that the operations proposed are sound, I like them

5) hierarchy of iterators
I would classify the iterator also along another axis: size
infinite (stream) - finite (but unknown size) - bounded (finite and known size)

The other classification:
forward iterator (what I called iterator until now)
bidirectional range: I understand this, these are basically two 
iterators one from the beginning and the other from the end that are 
coupled together. I find it a little bit strange, I would just expect 
to have a pair of iterators... but I see that it might be useful
bidirectional iterator: this is a doubly linked list, I think that this 
class of iterators cannot easily be described just as a range, it often 
needs three points (start,end,actual_pos), I think it has its place (and 
is not explicitly present in your list)
random_iterator: (this could be also called array type or linear indexed type).

So this is what "my" iterator/range would look like :)

Fawzi
Sep 09 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Fawzi Mohamed wrote:
 It is a nice idea to redesign the iterator and range.
 I think that the proposal is not bad, but I have some notes about it, 
 and some things that I would have done differently.
 
 1) The simplest interface input (range) is just
 bool isEmpty();
 T next();
 iterator(T) release();

Actually next is getNext, and release returns R (the range type).
 The first two I fully agree on; the third one I suppose is there to 
 allow resources to be released and possibly transfer the data to another 
 iterator.. is it really needed?

Yes. Consider findAdjacent that finds two equal adjacent elements in a collection:

Range findAdjacent(alias pred = "a == b", Range)(Range r)
{
    if (r.isEmpty) return r;
    auto ahead = r;
    ahead.next;
    for (; !ahead.isEmpty; r.next, ahead.next)
    {
        if (binaryFun!(pred)(r.first, ahead.first)) return r;
    }
    return ahead;
}

The whole implementation fundamentally rests on the notion that you can copy a range into another, and that you can iterate the collection independently with two distinct ranges. If that's not true, findAdjacent will execute yielding nonsensical results.

Input iterators are not copyable. With an input iterator "auto ahead = r;" will not compile. But they are movable. So you can relinquish control from one iterator to the other.
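To try findAdjacent, something has to supply isEmpty, first and next. The sketch below assumes a hypothetical SliceRange wrapper over a built-in array (not part of the proposal) and adds explicit call parentheses; std.functional.binaryFun is the real Phobos helper that turns the predicate string into a function.

```d
import std.functional : binaryFun;

// Hypothetical copyable forward range over a slice, using the thread's primitive names.
struct SliceRange(T)
{
    T[] data;
    bool isEmpty() { return data.length == 0; }
    T first() { return data[0]; }
    void next() { data = data[1 .. $]; }
}

Range findAdjacent(alias pred = "a == b", Range)(Range r)
{
    if (r.isEmpty()) return r;
    auto ahead = r;          // relies on the range being copyable
    ahead.next();            // ahead now leads r by one element
    for (; !ahead.isEmpty(); r.next(), ahead.next())
    {
        if (binaryFun!pred(r.first(), ahead.first())) return r;
    }
    return ahead;            // empty range: no adjacent pair found
}

void main()
{
    auto hit = findAdjacent(SliceRange!int([1, 2, 3, 3, 4]));
    assert(hit.data == [3, 3, 4]);   // range starting at the first pair member

    auto miss = findAdjacent(SliceRange!int([1, 2, 3]));
    assert(miss.isEmpty());
}
```

Copying SliceRange copies only the slice, so r and ahead advance independently over the same elements. An input range reading from a stream could not honor that copy, which is why it gets the move-only treatment.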
 Now I would see this simplest thing (let me call it iterator) as the 
 basic objects for foreach looping.
 *all* things on which foreach loops should be iterators.
 If an object is not an iterator there should be a standard way to convert 
 it to one (.iterator for example).
 So if the compiler gets something that is not an iterator it tries to see 
 if .iterator is implemented and if it is it calls it and iterates on it.
 This let many objects have a "default" iterator. Obviously an object 
 could have other methods that return an iterator.

Fine. So instead of saying:

foreach (e; c.all) { ... }

you can say

foreach (e; c) { ... }

I think that's some dubious savings.
 2) All the methods with intersection of iterator in my opinion are 
 difficult to memorize, and rarely used, I would scrap them.
 Instead I would add the comparison operation .atSamePlace(iterator!(T)y) 
 that would say if two iterators are at the same place. With it one gets 
 back all the power of pointers, and with a syntax and use that are 
 understandable.

But that comparison operation is not enough to implement anything of substance. Try your hand at a few classic algorithms and you'll see.
 I understand the idea of covering all possibilities, if one wants it 
 with .atSamePlace a template can easily construct all possible 
 intersection iterators. Clearly calling recursively such a template is 
 inefficient, but I would say the then one should use directly a pair of 
 iterators (in the worst case one could make a specialization that 
 implements it more efficiently for the types that support it).
 
 3) copying: I would let the user freely copy and duplicate iterators if 
 needed.

I like freedom too. But that kind of freedom is incorrect for input iterators.
 4) input-output
 I think that the operations proposed are sound, I like them

Then you got to accept the consequences :o).
 5) hierarchy of iterators
 I would classify the iterator also along another axis: size
 infinite (stream) - finite (but unknown size) - bounded (finite and 
 known size)

Distinguishing such things can be of advantage sometimes, and could be added as a refinement to the five categories if shown useful.
 The other classification:
 forward iterator (what I called iterator until now)
 bidirectional range: I understand this, these are basically two 
 iterators one from the beginning and the other from the end that are 
 coupled together. I find it a little bit strange, I would just expect to 
 have a pair of iterators... but I see that it might be useful
 bidirectional iterator: this is a doubly linked list, I think that this 
 class of iterators cannot easily be described just as a range, it often 
 needs three points (start,end,actual_pos), I think has its place (and is 
 not explicitly present in your list)
 random_iterator: (this could be also called array type or linear indexed 
 type).

I can't understand much of the above, sorry.
 So this is what "my" iterator/range would look like :)

I encourage you to realize your design. Before long you'll probably find even more issues with it than I mentioned above, but you'll gain from being better equipped to find proper solutions.

Andrei
Sep 09 2008
next sibling parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-09-09 18:09:28 +0200, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 It is a nice idea to redesign the iterator and range.
 I think that the proposal is not bad, but I have some notes about it, 
 and some things that I would have done differently.
 
 1) The simplest interface input (range) is just
 bool isEmpty();
 T next();
 iterator(T) release();

Actually next is getNext, and release returns R (the range type).
 The first two I fully agree on; the third one I suppose is there to 
 allow resources to be released and possibly transfer the data to 
 another iterator.. is it really needed?

Yes. Consider findAdjacent that finds two equal adjacent elements in a collection:

Range findAdjacent(alias pred = "a == b", Range)(Range r)
{
    if (r.isEmpty) return r;
    auto ahead = r;
    ahead.next;
    for (; !ahead.isEmpty; r.next, ahead.next)
    {
        if (binaryFun!(pred)(r.first, ahead.first)) return r;
    }
    return ahead;
}

The whole implementation fundamentally rests on the notion that you can copy a range into another, and that you can iterate the collection independently with two distinct ranges. If that's not true, findAdjacent will execute yielding nonsensical results.

Input iterators are not copyable. With an input iterator "auto ahead = r;" will not compile. But they are movable. So you can relinquish control from one iterator to the other.

ok I understand, indeed it is useful to have non copyable "unique" iterators, even if they are not the common iterators (actually I think it is potentially even more important for output iterators).
 Now I would see this simplest thing (let me call it iterator) as the 
 basic objects for foreach looping.
 *all* things on which foreach loops should be iterators.
 If an object is not an iterator there should be a standard way to 
 convert it to one (.iterator for example).
 So if the compiler gets something that is not an iterator it tries to 
 see if .iterator is implemented and if it is it calls it and iterates 
 on it.
 This let many objects have a "default" iterator. Obviously an object 
 could have other methods that return an iterator.

Fine. So instead of saying:

foreach (e; c.all) { ... }

you can say

foreach (e; c) { ... }

I think that's some dubious savings.

I think it is useful, but not absolutely necessary.
 2) All the methods with intersection of iterator in my opinion are 
 difficult to memorize, and rarely used, I would scrap them.
 Instead I would add the comparison operation 
 .atSamePlace(iterator!(T)y) that would say if two iterators are at the 
 same place. With it one gets back all the power of pointers, and with a 
 syntax and use that are understandable.

But that comparison operation is not enough to implement anything of substance. Try your hand at a few classic algorithms and you'll see.

are you sure? then a range is *exactly* equivalent to a STL iterator, only that it cannot go out of bounds:

// left1-left2:
while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
}
// left2-left1:
while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
  i2.next;
}
// union 1-2
while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
}
while(!i2.isEmpty){
  i2.next;
}
// union 2-1
...
// lower triangle
i1=c.all;
while(!i1.isEmpty){
  i2=c.all;
  while(!i2.isEmpty && !i2.atSamePlace(i1)){
    i2.next;
  }
  i1.next;
}

well these are the operations that you can do on basically all iterators (and with which you can define new iterators). The one you propose needs an underlying total order that can be efficiently checked; for example iterators on trees do not necessarily have this property, and then getting your kind of intersection can be difficult (and not faster than the operation using atSamePlace).
 I understand the idea of covering all possibilities, if one wants it 
 with .atSamePlace a template can easily construct all possible 
 intersection iterators. Clearly calling recursively such a template is 
 inefficient, but I would say the then one should use directly a pair of 
 iterators (in the worst case one could make a specialization that 
 implements it more efficiently for the types that support it).
 
 3) copying: I would let the user freely copy and duplicate iterators if needed.

I like freedom too. But that kind of freedom is incorrect for input iterators.

Now I realized it, thanks.
 4) input-output
 I think that the operations proposed are sound, I like them

Then you got to accept the consequences :o).

yes :)
 5) hierarchy of iterators
 I would classify the iterator also along another axis: size
 infinite (stream) - finite (but unknown size) - bounded (finite and known size)

Distinguishing such things can be of advantage sometimes, and could be added as a refinement to the five categories if shown useful.

well if an iterator knows its size, and you want to use it to initialize an array for example...
 The other classification:
 forward iterator (what I called iterator until now)


 bidirectional range: I understand this, these are basically two 
 iterators one from the beginning and the other from the end that are 
 coupled together. I find it a little bit strange, I would just expect 
 to have a pair of iterators... but I see that it might be useful
 bidirectional iterator: this is a doubly linked list, I think that this 
 class of iterators cannot easily be described just as a range, it often 
 needs three points (start,end,actual_pos), I think has its place (and 
 is not explicitly present in your list)
 random_iterator: (this could be also called array type or linear indexed type).

I can't understand much of the above, sorry.

the only new thing is bidirectional iterator: an iterator that can go in both directions as extra iterator type (your bidirectional range is something different).

I think it is useful and I don't see the need to shoehorn it into a range. For me an iterator is an object that can generate a sequence by itself, so a range is an example of iterator, but I don't see the need to make each iterator a range.

As I said before a range also has a total ordering of the objects that can be easily checked, this is a special kind of iterator for me, not the most general. Take two ranges of two linked lists, you cannot easily build your intersections because you don't know their relative order, and checking it is inefficient.
 
 So this is what "my" iterator/range would look like :)

I encourage you to realize your design. Before long you'll probably find even more issues with it than I mentioned above, but you'll gain from being better equipped to find proper solutions.

I hope we converge toward a good solution ;) Fawzi
 
 
 Andrei

Sep 09 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Fawzi Mohamed wrote:
 are you sure? then a range is *exactly* equivalent to a STL iterator, 
 only that it cannot go out of bounds:
 // left1-left2:
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 // left2-left1:
 while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
  i2.next;
 }
 // union 1-2
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 while(!i2.isEmpty){
  i2.next;
 }
 // union 2-1
 ...
 // lower triangle
 i1=c.all;
 while(!i1.isEmpty){
  i2=c.all;
  while(!i2.isEmpty && !i2.atSamePlace(i1)){
    i2.next;
  }
  i1.next;
 }
 well these are the operations that you can do on basically all iterators 
 (and with which you can define new iterators).
 The one you propose needs an underlying total order that can be 
 efficiently checked; for example iterators on trees do not 
 necessarily have this property, and then getting your kind of intersection 
 can be difficult (and not faster than the operation using atSamePlace).

I am getting seriously confused by this subthread. So are you saying that atSamePlace is your primitive and that you implement the other range operations all in linear time? If I did not misunderstand and that's your design, then you may want to revise that design right now. It will never work. I guarantee it.
 the only new thing is bidirectional iterator: an iterator that can go in 
 both directions as extra iterator type (your bidirectional range is 
 something different).

A bidirectional range is simply a range that you can shrink from either end.
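As an illustration of "shrink from either end", here is a small sketch that checks whether an array reads the same in both directions. It uses the primitive names from Phobos's std.range.primitives (front, back, popFront, popBack) on a built-in slice rather than the names being debated in this thread; isPalindrome is a hypothetical helper.

```d
import std.range.primitives : front, back, popFront, popBack;

// Hypothetical helper: compares the two ends, then shrinks the range from both sides.
bool isPalindrome(int[] r)
{
    while (r.length > 1)
    {
        if (r.front != r.back) return false;
        r.popFront();   // shrink from the left end
        r.popBack();    // shrink from the right end
    }
    return true;
}

void main()
{
    assert(isPalindrome([1, 2, 3, 2, 1]));
    assert(isPalindrome([4, 4]));
    assert(!isPalindrome([1, 2, 3]));
}
```

No position marker travels between the two ends: the range itself is the only state, and it is simply consumed from both sides until at most one element is left.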
 I think it is useful and I don't see the need to shoehorn it into a 
 range. For me an iterator is an object that can generate a sequence by 
 itself, so a range is an example of iterator, but I don't see the need 
 to make each iterator a range.

I have put forth reasons for doing away with iterators entirely in the range doc. What are your counter-reasons for bringing back iterators?
 As I said before a range also has a total ordering of the objects that 
 can be easily checked, this is a special kind of iterator for me, not 
 the most general. Take two ranges of two linked lists, you cannot easily 
 build your intersections because you don't know their relative order, 
 and checking it is inefficient.

Correct. The range intersection primitives are the Achilles' heel of the range-based design. Checking subrange reachability is O(n), so the range intersection primitives take the user on faith. But iterators have that Achilles' heel problem too, plus a few arrows in their back :o).

The document clarifies this disadvantage by saying that range intersection primitives are undefined if certain conditions are not met.

In short, this is an endemic problem of iteration based on either ranges or individual iterators. An objection to that should automatically come with a constructive proof, i.e. a better design.
 So this is what "my" iterator/range would look like :)

I encourage you to realize your design. Before long you'll probably find even more issues with it than I mentioned above, but you'll have gained by being better equipped to find proper solutions.

I hope we converge toward a good solution ;)

Well I haven't seen much code yet. Andrei
Sep 09 2008
parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-09-10 01:02:00 +0200, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 are you sure? then a range is *exactly* equivalent to a STL iterator, 
 only that it cannot go out of bounds:
 // left1-left2:
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 // left2-left1:
 while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
  i2.next;
 }
 // union 1-2
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 while(!i2.isEmpty){
  i2.next;
 }
 // union 2-1
 ...
 // lower triangle
 i1=c.all;
 while(!i1.isEmpty){
  i2=c.all;
  while(!i2.isEmpty && !i2.atSamePlace(i1)){
    i2.next;
   }
   i1.next;
 }
 well these are the operations that you can do on basically all 
 iterators (and with which you can define new iterators).
 The one you propose need an underlying total order that can be 
 efficiently checked, for example iterators on trees do not have 
 necessarily this property, and then getting your kind of intersection 
 can be difficult (and not faster than the operation using atSamePlace).

I am getting seriously confused by this subthread. So are you saying that atSamePlace is your primitive and that you implement the other range operations all in linear time? If I did not misunderstand and that's your design, then you may want to revise that design right now. It will never work. I guarantee it.

It doesn't seem too difficult to me; just look at the code. They are iterations on subranges of iterators i1 and i2; actually they are the only kind of range combination that can be performed safely on general iterators. The range combinations you propose are cumbersome, rarely used, and in general unsafe. I think it is a bad idea to add them to the object that is needed to get some foreach magic, and to the most generic iterator.

atSamePlace returns true if two iterators have .left (or however you call it) at the same place (in general this might not mean that they have the same address). It can be implemented for almost all iterators in constant time (at the moment I cannot think of a counterexample), and with it (as the code just above shows) you can define some subranges.

In the case in which you have an easily checkable total ordering between the elements, then yes, you do have all that is needed to have a real range, and for this range object your subrange operations are safe. I am not against them; just don't force them on every person who just wants to iterate on something.
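To make the atSamePlace idea concrete, here is a minimal sketch in D for a singly linked list. All names (Node, ListIter, sumUntil) are hypothetical and chosen only for illustration; the isEmpty/first/next spelling follows the thread, not any library. It shows that the position comparison is constant time while the "left1-left2" traversal itself is linear.

```d
// Hypothetical names throughout: Node, ListIter and sumUntil are illustrative only.
struct Node
{
    int value;
    Node* next;
}

// A self-contained forward iterator: it knows by itself when it is exhausted.
struct ListIter
{
    Node* cur;

    bool isEmpty() const { return cur is null; }
    int first() const { return cur.value; }
    void next() { cur = cur.next; }

    // Constant time: compares positions (node identity), not element values.
    bool atSamePlace(const ListIter other) const { return cur is other.cur; }
}

// Sums the elements of i1 that come before the position of i2,
// i.e. the "left1-left2" traversal from the post. Linear in the list length.
int sumUntil(ListIter i1, ListIter i2)
{
    int total = 0;
    while (!i1.isEmpty() && !i1.atSamePlace(i2))
    {
        total += i1.first();
        i1.next();
    }
    return total;
}
```

If i2 never lies ahead of i1, the loop simply runs to the end of the list, which is the sense in which this traversal stays safe without a total order.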
 the only new thing is bidirectional iterator: an iterator that can go 
 in both directions as extra iterator type (your bidirectional range is 
 something different).

A bidirectional range is simply a range that you can shrink from either end.

That is exactly what I said in the sentence before, but in this sentence I am speaking about a bidirectional *iterator*, which for me is an iterator that can move both back and forth.
 I think it is useful and I don't see the need to shoehorn it into a 
 range. For me an iterator is an object that can generate a sequence by 
 itself, so a range is an example of iterator, but I don't see the need 
 to make each iterator a range.

I have put forth reasons for doing away with iterators entirely in the range doc. What are your counter-reasons for bringing back iterators?

They are simpler and describe a larger range of useful constructs. Ranges on linked lists, as I said, are unsafe, which does not mean that ranges are not useful, just that there is also a place for simple iterators. Iterators can be perfectly safe; it is just the C++ idea of unbundling the end from the iterator that makes them unsafe (and also more cumbersome to use). If "iterator" is for you too connected with the C++ view of it, call them generators: a self-contained object that can generate a sequence. Bidirectional iterators are a natural step in the progression of iterators, and they too can be implemented safely.
 As I said before a range also has a total ordering of the objects that 
 can be easily checked, this is a special kind of iterator for me, not 
 the most general. Take two ranges of two linked lists, you cannot 
 easily build your intersections because you don't know their relative 
 order, and checking it is inefficient.

Correct. The range intersection primitives are the Achilles' heel of the range-based design. Checking subrange reachability is O(n), so the range intersection primitives take the user by faith. But iterators have that Achilles' heel problem too, plus a few arrows in their back :o). The document clarifies this disadvantage by saying that range intersection primitives are undefined if certain conditions are not met. In short, this is an endemic problem of an iteration based on either ranges or individual iterators. An objection to that should automatically come with a constructive proof, i.e. a better design.

Using atSamePlace you can do it safely on any kind of range. I think that only safe operations should be available, and they can be safe: in general by using atSamePlace, and if there is a quickly checkable total ordering (as in arrays for example) by never letting a range be larger than the union of two ranges. One can then discuss whether to segment it (the result would be an iterator but not a range) or to choose only one side (keeping it a range) if the two ranges are disjoint and have a hole between them.
 So this is what "my" iterator/range would look like :)

I encourage you to realize your design. Before long you'll probably find even more issues with it than I mentioned above, but you'll have gained by being better equipped to find proper solutions.

I hope we converge toward a good solution ;)

Well I haven't seen much code yet.

I have written quite some code in my multidimensional array library ( http://github.com/fawzi/narray ) and I have thought a lot about iterators, not only in D, but in the several languages that I know. Like everybody, I don't claim to really see all the implications of the interface, but I think that I understand them well enough to participate meaningfully in this discussion. At the moment I am really busy, but I am trying to say something because I think this discussion is important for the future of D, and I care about it. I will try to give some real code. I thought that my contribution was not so difficult to understand, but it is always like this: to yourself you always seem clear ;) Fawzi
Sep 10 2008
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

I like that you are bringing this point up, it is interesting. Note that my API never assumes or requires that there's an actual contiguous block of things underneath. Au contraire, in the I/O case, there's only "the current element" underneath. But a better example is generators.

Consider a function generate that takes a string expression using a[0], a[1],... a[k] (the state) and returns a[k+1]. The generate function also takes the initial state. Then generate returns a range that returns in turn each element of the series. Generate is easy to implement, but I don't want to get into that now, only into usage. Simplest use is:

    auto boring = generate!("a[0]")(42);

This guy will generate the series 42 42 42 42 42 42 42... forever and ever. Now to use it at all we'd have to temper it. So we use a function called "take", which accepts a maximum size. And then:

    writeln(take(10, boring));

This guy will print "42 42 42 42 42 42 42 42 42 42". Let's generate a more interesting series. How about an iota:

    writeln(take(4, generate!("a[0] + 2")(5)));

That guy prints "5 7 9 11". Or Newton's square root approximations:

    writeln(take(4, generate!("(a[0] + 2/a[0])/2")(1.0)));

which prints "1 1.5 1.4167 1.4142". All of these are ranges, some bounded and some unbounded, but do not have blocks of elements underneath them.
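The post deliberately leaves generate unimplemented. As a rough sketch only, and not the proposal's actual API, the idea could look like this in D. Assumptions: the step is passed as a lambda over the previous value rather than as a string over a[0]..a[k], and Generator, Take and toArray are hypothetical helper names.

```d
// Hypothetical sketch: Generator, generate, Take, take and toArray are
// illustrative names, not the API from the proposal.
struct Generator(alias fun, T)
{
    T current;

    enum bool isEmpty = false;               // unbounded: never runs out
    T first() { return current; }
    void next() { current = fun(current); }  // state is just the last value
}

auto generate(alias fun, T)(T seed)
{
    return Generator!(fun, T)(seed);
}

// Bounds any range to at most `remaining` elements.
struct Take(Range)
{
    Range source;
    size_t remaining;

    bool isEmpty() { return remaining == 0 || source.isEmpty; }
    auto first() { return source.first(); }
    void next() { source.next(); --remaining; }
}

auto take(Range)(size_t n, Range r)
{
    return Take!Range(r, n);
}

// Drains a bounded range into an array so the result can be inspected.
auto toArray(Range)(Range r)
{
    typeof(r.first())[] result;
    for (; !r.isEmpty; r.next())
        result ~= r.first();
    return result;
}
```

With these definitions, toArray(take(4, generate!(a => a + 2)(5))) yields [5, 7, 9, 11], and neither range has a block of elements underneath it: the generator holds only the current value.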
 well these are the operations that you can do on basically all iterators
 (and with which you can define new iterators).
 The one you propose need an underlying total order that can be efficiently
 checked, for example iterators on trees do not have necessarily this
 property, and then getting your kind of intersection can be difficult (and
 not faster than the operation using atSamePlace).

I don't think that's correct. Andrei's system does not need a total order any more than yours does. The unions and diffs just create new ranges by combining the components of existing ranges. They don't need to know anything about what happens in between those points or how you get from one to the other. Just take the "begin" of this guy and put it together with the "end" of that guy, for example. It doesn't require knowing how to get from anywhere to anywhere to create that new range.

Yes, that's exactly right. Andrei
Sep 10 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 10:07 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

API never assumes or requires that there's an actual contiguous block of things underneath. Au contraire, in the I/O case, there's only "the current element" underneath.

Yes, I see that and think it's great. But the point I've been trying to make is that the nomenclature you are using seems to emphasize the contiguous block interpretation, rather than the interpretation as a cursor plus a sentinel. The contiguous block terminology makes good sense for slices, but less for things like trees and unbounded generators and HMMs.

I disagree that isEmpty, first, and next suggest anything near contiguous block. It's just list terminology. Is the list empty? Give me the first element of the list. Advance to the next element in the list. Names for the before and after range operations are still in the air... Are you referring to the "range" name itself?
 And ok, I do think your incredible shrinking bidirectional range is
 borked.  But other than that, I'm just talking about terminology.

How is it borked? Andrei
Sep 10 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 11:57 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 10:07 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

my API never assumes or requires that there's an actual contiguous block of things underneath. Au contraire, in the I/O case, there's only "the current element" underneath.

to make is that the nomenclature you are using seems to emphasize the contiguous block interpretation, rather than the interpretation as a cursor plus a sentinel. The contiguous block terminology makes good sense for slices, but less for things like trees and unbounded generators and HMMs.

block. It's just list terminology. Is the list empty? Give me the first element of the list. Advance to the next element in the list.

However a range isn't, generally speaking, a list. It's a way to traverse or access data that may or may not be a list. For something like an unbounded generator, it is odd to speak of the "first". Such an object has a current value and a "next", but the value you can look at right now is only the "first" by a bit of a terminology stretch.

Agreed. The problem with "current" instead of "first" is that there's no clear correspondent for "the last that the current will be". First and last are obvious. Current and last are... well, not bad either :o).
 I think using list terminology unnecessarily confuses the iterating
 construct that does the accessing with the container being accessed.
 The range is not the container.  The range consists of a place where
 you are, and a termination condition.

No. A bidirectional range also knows the last place you'll ever be, and is able to manipulate it.
  The range is not "empty" or
 "full" because it does not actually contain elements.

It is because a range is a view. The view can reduce to nothing. In math, an interval can be "empty". That doesn't mean it made all real numbers disappear :o).
 Sure, if you're
 dead set on it, you can say that by "empty" we mean that the set of
 things you would get if you called .next repeatedly is empty, but why?
  The terminology is just encouraging one to think of a range as a
 container, when in fact it is not -- it is more like two goal posts.
 Call it atEnd() or similar and you'll naturally encourage people to
 think of ranges as references rather than containers.
 
 Similarly, using list terminology led you to "pop".  But pop on a
 range does not actually remove any content.  Pop just moves the goal
 post on one end.

Correct. Then how would you name'em?
 And then there's the various union/diff stuff, which everyone seems to
 find confusing.  I think much of that confusion and mental overhead
 just goes away if you think of a range as a good old iterator plus a
 stopping condition.

I like before and after. Besides, the challenge is for you to come up with something that's not confusing.
 Names for the before and after range operations are still in the air...

 Are you referring to the "range" name itself?

That could be part of the reason for this tendency to try to assign list-like names to the parts. If it were called a "bounded iterator" I think that would better describe the perspective I'm pushing, and naturally lead to choices like "atEnd" instead of "isEmpty".

Words are powerful. Phrases are less powerful. I'll never ever settle on anything longer than ONE word for the concept. Ranges came to mind because boost uses them with a similar meaning. Andrei
Sep 10 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 The other problem with empty is that it doesn't generalize to what I
 happen to think a bidirectional range should be, one with .next .prev,
 .hasNext and .hasPrev.

hasNext and hasPrev are not orthogonal and are unnecessarily complicated. Is there a range that has next but not prev, or vice versa? No, Sir. There is a "there's still meat on the plate" condition and that's all that's needed.
 Your bidir iterator in C++ parlance is a forward iterator and a
 reverse iterator operating on the same sequence.  I can't really think
 of any algorithms other than the one you showed that use such a pair.

 On the other hand my bidir is useful in all the places a C++ bidir
 iterator is useful.  Any time you need to scan a cursor back and
 forth.  It basically maps directly onto the operation a doubly-linked
 list is good at.  But could be used in traversing any tree-like data
 structure too, I think.

--it is easily done with range primitives if you've saved the initial position of it.
 Similarly, using list terminology led you to "pop".  But pop on a
 range does not actually remove any content.  Pop just moves the goal
 post on one end.



r.atEnd, r.value, r.next, r.moveTo(s), r.moveToEndOf(s), r.last, r.pop, r.moveEndToEndOf(s) / moveEndTo(s)

I see in another post: r.atStart, which I think is a design faux pas. Aside from that, things seem workable. But honestly I don't see how they bring a world of difference, nor had I a slap on my forehead moment when seeing the primitive names (as I did with before and after). Andrei
Sep 10 2008
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

It's good. I proved that constructively for std.algorithm, which of course doesn't stand. But I've also proved it theoretically informally to myself. Please imagine an algorithm that bidir iterators do and bidir ranges don't.
  Your bidir range may be useful (though I'm not really convinced that
 very many algorithms need what it provides) --  but I think one also
 needs an iterator that's good at what C++'s bidir iterators are good
 at, i.e. moving the active cursor backwards or forwards.  I would call
 your construct more of a "double-headed" range than a bidirectional
 one.

Oh, one more thing. If you study any algorithm that uses bidirectional iterators (such as reverse or Stepanov's partition), you'll notice that ALWAYS WITHOUT EXCEPTION there are two iterators involved. One moves up, the other moves down. This is absolutely essential because it tells that a bidirectional range models all a bidirectional iterator could ever do. If you can move some bidirectional iterator down, then definitely you know its former boundary so you can model that move with a bidirectional range.

This is fundamental. Ranges NEVER grow. They ALWAYS shrink. Why? Simple: because a range has no idea what's outside of itself. It starts life with information of its limits from the container, and knows nothing about what's outside those limits. Consequently it ALWAYS WITHOUT EXCEPTION shrinks. Andrei
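As an illustration of the "one moves up, the other moves down" observation, reverse can be written against a single range that shrinks from both ends. This is a sketch under assumptions: SliceRange and reverseInPlace are hypothetical names, the range is backed by a built-in array, and the primitive names (first, last, next, pop) follow the thread rather than any released library.

```d
// Hypothetical sketch: SliceRange and reverseInPlace are illustrative names.
struct SliceRange(T)
{
    T[] data;

    bool isEmpty() const { return data.length == 0; }
    ref T first() { return data[0]; }
    ref T last() { return data[$ - 1]; }
    void next() { data = data[1 .. $]; }     // shrink from the left
    void pop() { data = data[0 .. $ - 1]; }  // shrink from the right
}

// Reverses the underlying elements using only shrinking operations.
void reverseInPlace(Range)(Range r)
{
    while (!r.isEmpty())
    {
        // Swap the two ends (a no-op when one element is left).
        auto tmp = r.first();
        r.first() = r.last();
        r.last() = tmp;

        r.next();                 // the left end moves up
        if (r.isEmpty()) break;   // odd length: the middle element is done
        r.pop();                  // the right end moves down
    }
}
```

The range is passed by value, so the caller's view is untouched while the local copy shrinks to nothing; only the elements themselves are swapped.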
Sep 10 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

It's good. I proved that constructively for std.algorithm, which of course doesn't stand. But I've also proved it theoretically informally to myself. Please imagine an algorithm that bidir iterators do and bidir ranges don't.

Any iterative algorithm where the search might go up or down might be a candidate. Although I think you have a hard time showing one that needs strictly bidirectional iterators and not random access iterators. Perhaps a stream represented as a linked list? Imagine a video stream coming in, where the player buffers 10 seconds of data for decoding, and keeps 10 seconds of data buffered behind the current spot. If the user pauses the video, then wants to play backwards for 5 seconds, what kind of structure would you use to represent the 'current point in time'? A bidir range doesn't cut it, because it can only move one direction at a time.

You would need 2 bidir ranges, but since you can't 'grow' the ranges, you can't add stuff as it is consumed from the forward range to the backwards range, or am I wrong about that? So how do you continually update your backwards iterator? I suppose you could simply 'generate' the backwards iterator when needed by diff'ing with the all range, but it seems unnecessarily cumbersome. In fact, you'd need to regenerate both ranges as data is removed from the front and added to the back (because the ends are continually changing). Perhaps a meta-range which has 2 bidir ranges in it can be provided. It would be simple enough to implement using existing ranges, but might have unnecessary performance issues.

My belief is that ranges should be the form of input to algorithms, but iterators should be provided for using containers as general data structures. Similar to how strings are represented by arrays/slices, but iterators (pointers) exist if you need them. I'll probably move forward with this model in dcollections, I really like the range idea, and in general the view on how ranges are akin to slices. But I also like having access to iterators for other functions. -Steve
Sep 10 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

doesn't stand. But I've also proved it theoretically informally to myself. Please imagine an algorithm that bidir iterators do and bidir ranges don't.

Any iterative algorithm where the search might go up or down might be a candidate. Although I think you have a hard time showing one that needs strictly bidirectional iterators and not random access iterators. Perhaps a stream represented as a linked list? Imagine a video stream coming in, where the player buffers 10 seconds of data for decoding, and keeps 10 seconds of data buffered behind the current spot. If the user pauses the video, then wants to play backwards for 5 seconds, what kind of structure would you use to represent the 'current point in time'? A bidir range doesn't cut it, because it can only move one direction at a time.

Of course it does. You just remember the leftmost point in time you need to remember. Then you use range primitives to get to where you want. Maybe a better abstraction for all that is a sliding window though.
 You would 
 need 2 bidir ranges, but since you can't 'grow' the ranges, you can't add 
 stuff as it is consumed from the forward range to the backwards range, or am 
 I wrong about that?  So how do you continually update your backwards 
 iterator?  I suppose you could simply 'generate' the backwards iterator when 
 needed by diff'ing with the all range, but it seems unnecessarily 
 cumbersome.  In fact, you'd need to regenerate both ranges as data is 
 removed from the front and added to the back (because the ends are 
 continually changing).  Perhaps a meta-range which has 2 bidir ranges in it 
 can be provided.  It would be simple enough to implement using existing 
 ranges, but might have unnecessary performance issues.

You don't need a meta range, though it's a good idea to have it as a convenience structure. All you need is to store the two ranges and do range operations on them. Notice that "a range can't grow" is different from "a range can't be assigned from a larger range". In particular, a range operation can return a range larger than both input ranges. But not larger than their union :o).
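One way to picture "store the two ranges and do range operations on them" is the following sketch, in which built-in slices stand in for ranges. Window and its members are hypothetical names, and slicing by index is an assumption that only holds for arrays; for other containers the reassignment would have to go through the before/after style primitives discussed in the thread.

```d
// Hypothetical sketch: Window is an illustrative name; built-in slices
// stand in for ranges over the buffered data.
struct Window(T)
{
    T[] all;      // everything currently buffered
    T[] behind;   // already played: all[0 .. pos]
    T[] ahead;    // still to play:  all[pos .. $]

    this(T[] buffer)
    {
        all = buffer;
        behind = buffer[0 .. 0];
        ahead = buffer;
    }

    // Play one element forward: `ahead` shrinks, while `behind` is
    // reassigned to a larger range that still lies within `all`.
    void forward()
    {
        assert(ahead.length > 0);
        behind = all[0 .. behind.length + 1];
        ahead = ahead[1 .. $];
    }

    // Step one element back: the mirror image of forward().
    void backward()
    {
        assert(behind.length > 0);
        ahead = all[behind.length - 1 .. $];
        behind = behind[0 .. $ - 1];
    }
}
```

No individual range ever grows by itself here; each step assigns a freshly computed range that is never larger than the union of the stored ones.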
 My belief is that ranges should be the form of input to algorithms, but 
 iterators should be provided for using containers as general data 
 structures.  Similar to how strings are represented by arrays/slices, but 
 iterators (pointers) exist if you need them.

If we agree it's better without iterators if not needed, we'd need a strong case to add them. Right now I have a strong case against them.
 I'll probably move forward with this model in dcollections, I really like 
 the range idea, and in general the view on how ranges are akin to slices. 
 But I also like having access to iterators for other functions.

Which functions? Andrei
Sep 10 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Thu, Sep 11, 2008 at 2:44 AM, Andrei Alexandrescu
 Cognitive load...
 What if I want to write a nice standalone function that takes a
 pointer to where we are and manipulates it?  I have to pass that
 function two iterators I suppose?

A function only needing one iterator is a chimera. It can't move it in any direction. To such a function you pass a pointer or reference to the object you want to manipulate directly. What's there to not like about it?
 One is (begin,current) the other
 (current,end), and as I iterate I have to move both the second of the
 first and the first of second?  All just to do something that should
 be trivial with a linked list.
 
 I agree that your pinch range is needed, but I also see a need for
 something that maps more directly onto the features of a doubly linked
 list.

I think you get a lot more insight by actually sitting down and rewriting a part of std.algorithm, and/or writing some more meaningful algorithms with your abstraction of choice. When I started doing so I had no idea of what range primitives I needed. And just like you now, I kept on hypothesizing in the dark on whether I need this and whether I need that. When you hypothesize in the dark the number of primitive things you need really grows unbounded, because there's always some unrealized imaginary need you want to satisfy.

To carry the discussion on equal footing you need to do some of that work. Otherwise you will keep on coming up with hypothetical situations of unverifiable likelihood, and I will have little meaningful retort to put forth.

Speaking of which, a great merit of Stepanov is that he showed what a great host of algorithms can be implemented with a precise and narrow interface. We all knew how to rotate elements in an array. He showed how to rotate elements in a singly-linked list. Andrei
Sep 10 2008
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

course doesn't stand. But I've also proved it theoretically informally to myself. Please imagine an algorithm that bidir iterators do and bidir ranges don't.

Any iterative algorithm where the search might go up or down might be a candidate. Although I think you have a hard time showing one that needs strictly bidirectional iterators and not random access iterators. Perhaps a stream represented as a linked list? Imagine a video stream coming in, where the player buffers 10 seconds of data for decoding, and keeps 10 seconds of data buffered behind the current spot. If the user pauses the video, then wants to play backwards for 5 seconds, what kind of structure would you use to represent the 'current point in time'? A bidir range doesn't cut it, because it can only move one direction at a time.

Of course it does. You just remember the leftmost point in time you need to remember. Then you use range primitives to get to where you want. Maybe a better abstraction for all that is a sliding window though.

Not sure. I'd have to see how messy the 'use range primitives' looks :)
 You would need 2 bidir ranges, but since you can't 'grow' the ranges, you 
 can't add stuff as it is consumed from the forward range to the backwards 
 range, or am I wrong about that?  So how do you continually update your 
 backwards iterator?  I suppose you could simply 'generate' the backwards 
 iterator when needed by diff'ing with the all range, but it seems 
 unnecessarily cumbersome.  In fact, you'd need to regenerate both ranges 
 as data is removed from the front and added to the back (because the ends 
 are continually changing).  Perhaps a meta-range which has 2 bidir ranges 
 in it can be provided.  It would be simple enough to implement using 
 existing ranges, but might have unnecessary performance issues.

You don't need a meta range, though it's a good idea to have it as a convenience structure. All you need is to store the two ranges and do range operations on them.

Perhaps not, I haven't used the ranges as you have implemented them, nor have I used them from boost. I agree with the general idea that ranges are safer and simpler to use when a range is needed. It makes perfect sense to pass a single range type rather than 2 iterators, and this is the majority of usages for iterators anyways. I 100% agree that ranges are the way to go instead of passing begin() and end() all the time to algorithm templates.
 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can return 
 a range larger than both input ranges. But not larger than their union 
 :o).

Yes, so every time you add an element you have to update your forward range from the 'all' range so it includes the new element at the end. Every time you remove an element, you have to update your reverse range from the 'all' range so it excludes the element at the beginning. Failure to do this results in invalid ranges, which seems to me like a lot more work than simply not doing anything (in the case of an iterator). The pitfalls of using ranges for dynamically changing containers might outweigh the advantages that they provide in certain cases.
 My belief is that ranges should be the form of input to algorithms, but 
 iterators should be provided for using containers as general data 
 structures.  Similar to how strings are represented by arrays/slices, but 
 iterators (pointers) exist if you need them.

If we agree it's better without iterators if not needed, we'd need a strong case to add them. Right now I have a strong case against them.

I don't need to worry about whether you have them or not, I can always implement them on my own ;) Really, range/iterator support doesn't require direct support from the compiler (except for builtin arrays), and any improvements made to the compiler to support ranges (such as reference returns, etc) can be applied to iterators as well. I think ranges are an excellent representation when a range of elements is needed. I think a cursor or iterator is an excellent representation when an individual position is needed.
 I'll probably move forward with this model in dcollections, I really like 
 the range idea, and in general the view on how ranges are akin to slices. 
 But I also like having access to iterators for other functions.

Which functions?

Functions which take or return a single position. Such as 'erase the element at this position' or 'find the position of element x'. -Steve
Sep 10 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can return 
 a range larger than both input ranges. But not larger than their union 
 :o).

Yes, so every time you add an element you have to update your forward range from the 'all' range so it includes the new element at the end. Every time you remove an element, you have to update your reverse range from the 'all' range so it excludes the element at the beginning. Failure to do this results in invalid ranges, which seems to me like a lot more work than simply not doing anything (in the case of an iterator). The pitfalls of using ranges for dynamically changing containers might outweigh the advantages that they provide in certain cases.

No, this is incorrect. I don't "have to" at all. I could define the behavior of range as you mention, or I could render them undefined. Iterators invalidate anyway at the drop of a hat, so they're none the wiser. You can't transform a lack of an advantage into a disadvantage. "Look at this pineapple. It's fresher than the other, and bigger too." "No, it's about as big. That pineapple sucks."
 My belief is that ranges should be the form of input to algorithms, but 
 iterators should be provided for using containers as general data 
 structures.  Similar to how strings are represented by arrays/slices, but 
 iterators (pointers) exist if you need them.

strong case to add them. Right now I have a strong case against them.

I don't need to worry about whether you have them or not, I can always implement them on my own ;) Really, range/iterator support doesn't require direct support from the compiler (except for builtin arrays), and any improvements made to the compiler to support ranges (such as reference returns, etc) can be applied to iterators as well. I think ranges are an excellent representation when a range of elements is needed. I think a cursor or iterator is an excellent representation when an individual position is needed.
 I'll probably move forward with this model in dcollections, I really like 
 the range idea, and in general the view on how ranges are akin to slices. 
 But I also like having access to iterators for other functions.


Functions which take or return a single position. Such as 'erase the element at this position' or 'find the position of element x'.

I agree. In fact I agreed in my original document, which I quote:

``Coding with ranges also has disadvantages. Some algorithms work naturally with individual iterators in the "middle" of a range. A range-based implementation would have to maintain a redundant range spanning e.g. from the beginning of the container to that middle.''

However, I could meaningfully rewrite std.algorithm to work on ranges alone. The disadvantage does exist but is minor. For example, find does not return an iterator. It simply shrinks its input range until the element is found, or until it is empty. That way you can nicely use the result of find iteratively.

Range find(alias pred = "a == b", Range, E)(Range haystack, E needle)
{
    alias binaryFun!(pred) test;
    for (; !haystack.isEmpty; haystack.next)
    {
        if (test(haystack.first, needle)) break;
    }
    return haystack;
}

This is quite a few bytes smaller than the previous version, which is now to be found in std.algorithm:

Iterator!(Range) find(alias pred = "a == b", Range, E)(Range haystack, E needle)
{
    alias binaryFun!(pred) test;
    for (auto i = begin(haystack); i != end(haystack); ++i)
    {
        if (test(*i, needle)) return i;
    }
    return end(haystack);
}

It has two fewer variables, and VERY importantly, one less type to deal with. Arguments aired against primitive ranges systematically omit this important simplification they bring. When you don't weigh in the advantages, of course all there is to be seen are the disadvantages.

Better yet, when find does return, client code's in better shape because it doesn't need to compare the result against end(myrange). It can just test whether it's empty and be done with.

So a newcomer to D2 would have to have an understanding of containers and ranges. Containers own data. They offer various traversals to crawl them in the form of ranges. Ranges are generalized slices.

If iterators are thrown into the mix, things get more complex because iterators are a lower-level primitive, a generalized pointer. So the newcomer would have to ALSO understand iterators and deal with functions that require or return either. They'd have to learn how to pair iterators from ranges and how to extract iterators from ranges (more primitives). They'd also have to understand when it's better to hold on to a range (most of the time) versus a naked iterator (seldom and for a dubious convenience).

I /understand/ there are advantages to iterators. Just don't forget the cost when discussing them.

I am also sure that if I sat down long enough contemplating my navel I could come up with more examples of iterators=good/ranges=bad. I am also sure that if I continued doing so I could also figure cases where a doubly linked-list iterator that "knows" whether it's atBegin or atEnd could come in handy. In fact how about this imaginary discussion between Stepanov and his imaginary colleague Tikhonov:

Stepanov: "Here, I have these cool iterators. I can express a great deal of stuff with them. It's amazing."

Tikhonov: "Ok, I have a doubly-linked list. An iterator is a node in the list, right?"

S: "Da. Those are bidirectional iterators because they can move in two directions in the list."

T: "Ok, my first element has prev == NULL and last element has next == NULL. Does your iterator know when it's at the begin and at the end of the list?"

S: "No. You see, you'd have to compare two iterators to figure that out. Just pass around an iterator to the beginning and end of the list fragment you're interested in, as you find fit."

T: "That sucks! I want an iterator to move up and down and tell me when it's at the beginning and at the end, without another stinkin' iterator."

S: "I implemented a great deal of algorithms without needing that. What algorithms of yours can't work with a comparison instead of atBegin and atEnd?"

T: "Well, I need to think of it. Maybe some video buffer or something."

S: "That works. You just save the iterator at the beginning of the sliding window. Then you compare against it."

T: "But I don't like that. I want you to define atBegin and atEnd so I don't need to carry an extra laggard!"

S: "Then what algorithms would fundamentally rest on that particular feature?"

T: "No idea. Here, let me look at my navel."

S: "While you do that, let me ask you this. Whatever those algorithms are, they can't work on a circular list, right?"

T: "I guess not. atBegin is never true for a circular list, unless, damn, you keep another iterator around to compare with."

S: "So those algorithms of yours would work on a doubly-linked list but not on a circular list. Are you sure you care for that distinction and that loss in generality?"

T: "Gee, look at the time. It's Friday evening! Let's go have a beer."

Andrei
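As a rough illustration of using the result of find iteratively, here is a minimal sketch. It is written against the names that later shipped in Phobos (`empty`, `popFront`, `std.algorithm.searching.find`) rather than the `isEmpty`/`next`/`first` names used in the post, and the helper `countOccurrences` is hypothetical, made up only for this example.

```d
import std.algorithm.searching : find;
import std.range.primitives : empty, popFront;

// Hypothetical helper: counts matches by calling find repeatedly on the
// range that the previous call returned.
size_t countOccurrences(int[] haystack, int needle)
{
    size_t hits = 0;
    auto rest = find(haystack, needle);   // shrunk range starting at the first match
    while (!rest.empty)                   // no comparison against an end iterator
    {
        ++hits;
        rest.popFront();                  // step past the match just counted
        rest = find(rest, needle);        // keep searching in what is left
    }
    return hits;
}

void main()
{
    assert(countOccurrences([3, 1, 4, 1, 5, 9, 1], 1) == 3);
    assert(countOccurrences([2, 7], 1) == 0);
}
```

The loop only ever holds one range value; whether that is preferable to an iterator pair is exactly what the thread is arguing about.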
Sep 10 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can 
 return a range larger than both input ranges. But not larger than their 
 union :o).

Yes, so every time you add an element you have to update your forward range from the 'all' range so it includes the new element at the end. Every time you remove an element, you have to update your reverse range from the 'all' range so it excludes the element at the beginning. Failure to do this results in invalid ranges, which seems to me like a lot more work than simply not doing anything (in the case of an iterator). The pitfalls of using ranges for dynamically changing containers might outweigh the advantages that they provide in certain cases.

No, this is incorrect. I don't "have to" at all. I could define the behavior of range as you mention, or I could render them undefined. Iterators invalidate anyway at the drop of a hat, so they're none the wiser. You can't transform a lack of an advantage into a disadvantage.

A range or iterator that becomes undefined when adding an element to a linked list or removing an element from a linked list (provided you don't remove the element in question) makes it useless for this type of purpose. What I want is a cursor that saves the position of an element, not the end and beginning.

Here is what I'm assuming a range consists of, and granted this is an assumption since I didn't look at any of your implementation, and a list object which uses ranges doesn't even exist AFAIK. Assume that integers below are individual elements:

all:           0 1 2 3 4 5 6 7 8 9 E
reverse range: 0 1 2 3 4
forward range:           5 6 7 8 9 E

Now I remove an element from the front:

all:             1 2 3 4 5 6 7 8 9 E
reverse range: ? 1 2 3 4
forward range:           5 6 7 8 9 E

I've now lost my reverse iterator because it's no longer valid, but I can reconstruct it by diffing the forward iterator and list.all. If I add to the end I get a similar situation: I can reconstruct my forward iterator by diffing list.all and the reverse iterator.

Yes, it can be done, but it seems like more work than it's worth for this case. The problem is, not only do I have to pay attention to what the end and beginning of the list are (as I would with an iterator), but I also have to pay attention to the same pieces in the ranges. So ranges (in this implementation) have given me more work to do, and they're still not safe because I could mistakenly use an invalid range.
 "Look at this pineapple. It's fresher than the other, and bigger too."

 "No, it's about as big. That pineapple sucks."

???
 My belief is that ranges should be the form of input to algorithms, but 
 iterators should be provided for using containers as general data 
 structures.  Similar to how strings are represented by arrays/slices, 
 but iterators (pointers) exist if you need them.

strong case to add them. Right now I have a strong case against them.

I don't need to worry about whether you have them or not, I can always implement them on my own ;) Really, range/iterator support doesn't require direct support from the compiler (except for builtin arrays), and any improvements made to the compiler to support ranges (such as reference returns, etc) can be applied to iterators as well. I think ranges are an excellent representation when a range of elements is needed. I think a cursor or iterator is an excellent representation when an individual position is needed.
 I'll probably move forward with this model in dcollections, I really 
 like the range idea, and in general the view on how ranges are akin to 
 slices. But I also like having access to iterators for other functions.


Functions which take or return a single position. Such as 'erase the element at this position' or 'find the position of element x'.

I agree. In fact I agreed in my original document, which I quote: ``Coding with ranges also has disadvatages. Some algorithms work naturally with individual iterators in the "middle" of a range. A range-based implementation would have to maintain a redundant range spanning e.g. from the beginning of the container to that middle.'' However, I could meanigfully rewrite std.algorithm to work on ranges alone. The disadvantage does exist but is minor, For example, find does not return an iterator. It simply shrinks its input range until the element is found, or until it is empty. That way you can nicely use the result of find iteratively.

I totally agree with you that ranges are the way to go for std.algorithm. I am not debating that. But you have no example of how iterators and ranges compare for using non-array containers in situations BESIDES running std.algorithm. I'm showing you an example, which happens to model after code I actually wrote and use, where iterators seem to be more suited for the task.
 Range find(alias pred = "a == b", Range, E)(Range haystack, E needle)
 {
     alias binaryFun!(pred) test;
     for (; !haystack.isEmpty; haystack.next)
     {
         if (test(haystack.first, needle)) break;
     }
     return haystack;
 }

 This is quite a few bites smaller than the previous version, which is
 now to be found in std.algorithm:

 Iterator!(Range) find(alias pred = "a == b", Range, E)(Range haystack, E
 needle)
 {
     alias binaryFun!(pred) test;
     for (auto i = begin(haystack); i != end(haystack); ++i)
     {
         if (test(*i, needle)) return i;
     }
     return end(haystack);
 }

 It has two less variables, and VERY importantly, one less type to deal
 with. Arguments aired against primitive ranges systematically omit this
 important simplification they bring. When you don't weigh in the
 advantages, of course all there is to be seen are the disadvantages.

 Better yet, when find does return, client code's in better shape because
 it doesn't need to compare the result against end(myrange). It can just
 test whether it's empty and be done with.

Unless myrange has changed since you called find. In which case you have to run find again to get the range?
 So a newcomer to D2 would have to have an understanding of containers
 and ranges. Containers own data. They offer various traversals to crawl
 them in the form of ranges. Ranges are generalized slices.

 If iterators are thrown into the mix, things get more complex because
 iterators are a lower-level primitive, a generalized pointer. So the
 newcomer would have to ALSO understand iterators and deal with functions
 that require or return either. They'd have to learn how to pair
 iterators from ranges and how to extract iterators from ranges (more
 primitives). They'd also have to understand when it's better to hold on
 to a range (most of the time) versus a naked iterator (seldom and for a
 dubious convenience).

 I /understand/ there are advantages to iterators. Just don't forget the
 cost when discussing them.

I don't forget the cost. I absolutely *100%* agree that ranges are a much better representation for std.algorithm, i.e. when a RANGE OF VALUES is required. When you want references to SINGLE ELEMENTS that persist across container changes, I think the best implementation is a cursor/iterator. I think they can both exist. I think there is value to having a pointer to a single element without storing the boundaries with that pointer.

This is just like the const debate that I continue to have with you and Walter. You want const for different reasons than for what I want const. I want const for contracts, and you want it for pure functions. You seem to dismiss anything that isn't in your realm of requirements as 'dubious' and 'seldom used'. Other people have requirements that are different than yours, and are just as valid.
 I am also sure that if I sat down long enough contemplating my navel I
 could come with more examples of iterators=good/ranges=bad.
 <snip>

Now you're just being rude :) Please note that I'm not attacking you personally. All I'm pointing out is that your solution solves certain problems VERY well, but leaves other problems not solved. I think allowing iterators/cursors would solve all the problems. I might be proven wrong, but certainly I don't think you've done that so far. I'd love to be proven wrong, since I agree that iterators are generally unsafe. -Steve
Sep 10 2008
next sibling parent Fawzi Mohamed <fmohamed mac.com> writes:
I am sorry I hadn't the time to fully follow the discussion, but I took 
some time to actually define how I think a basic iterator should be, in 
this I think I am quite close to Bill and Steven from what I could see.

Again I am not against ranges, ranges are nice, but iterators are more 
general, and in my opinion they should be what foreach targets.
Then ranges can basically trivially be an iterator that has more 
structure (compareLevel= FullOrdering) and more

basic idea:
an iterator has a position in a sequence+ possibility to move into it


Basic Iterator

// return element and go to next
// (reasons: most common use, only one function call (good if not
// inlined), also ok for iterators on data that is not stored)
// throw an exception if you iterate past end
T next();
void transferTo(ref R it2); // transfer this iterator to it2 (un-copyable iterators)
void stop(); // stop the iteration (might release resources)
size_t nElNext(); // number of elements, constant time, 0 if empty
ComparePos comparePos(R it2); // comparison of position, has to be in constant time
static const CompareLevel compareLevel; // compare level (for compile time choices)
static const SizeLevel sizeLevel; // size level (for compile time choices)

Copyable Generator: Iterator
T value; // return the actual value
opAssign(R); // copies the iterator

A range is obviously also a Basic iterator, but has more structure

* Bidirectional Iterator
size_t nElPrev; // number of elements, constant time, 0 if empty
T prev; // goes to previous element

// constants
enum ComparePos:int {
    Uncomparable, // might be bigger smaller or incompatible (Same only if compareLevel==CompareLevel.None)
    Incompatible, // ranges of two different sequences
    Same, // at the same position
    Growing, // in growing order
    Descending // in decreasing order
}
enum CompareLevel:int {
    None, // no comparison
    Equal, // can decide if they are at the same position in constant time
    FullOrdering // can compare all iterators in constant time
}
enum SizeLevel:int{
    Bounded, // finite and known size
    Finite, // finite but possibly unknown size
    MaybeFinite, // maybe finite, maybe infinite
    Infinite // infinite
}
const INFINITE_SIZE=size_t.max;
const MAYBE_INFINITE=size_t.max-1;
const UNKNOWN_FINITE=size_t.max-2;
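
For illustration only, the first primitives of the basic iterator above might look like this over a built-in array. `ArrayIterator` is a hypothetical name, only `next` and `nElNext` from the list are modelled, and using `enforce` for the "throw an exception if you iterate past end" rule is an assumption about how that rule could be implemented.

```d
import std.exception : enforce;

// Hypothetical array-backed iterator with the "return element and go to
// next" primitive described in the post.
struct ArrayIterator(T)
{
    private T[] rest;   // elements not yet visited

    // Returns the current element and advances; throws past the end.
    T next()
    {
        enforce(rest.length > 0, "iterated past the end");
        T value = rest[0];
        rest = rest[1 .. $];
        return value;
    }

    // Number of elements still to come, in constant time, 0 if empty.
    size_t nElNext() const
    {
        return rest.length;
    }
}

void main()
{
    auto it = ArrayIterator!int([10, 20, 30]);
    int sum = 0;
    while (it.nElNext > 0)
        sum += it.next();
    assert(sum == 60);
}
```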
Sep 10 2008
prev sibling next sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Steven Schveighoffer <schveiguy yahoo.com> wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can 
 return a range larger than both input ranges. But not larger than their 
 union :o).

Yes, so every time you add an element you have to update your forward range from the 'all' range so it includes the new element at the end. Every time you remove an element, you have to update your reverse range from the 'all' range so it excludes the element at the beginning. Failure to do this results in invalid ranges, which seems to me like a lot more work than simply not doing anything (in the case of an iterator). The pitfalls of using ranges for dynamically changing containers might outweigh the advantages that they provide in certain cases.

No, this is incorrect. I don't "have to" at all. I could define the behavior of range as you mention, or I could render them undefined. Iterators invalidate anyway at the drop of a hat, so they're none the wiser. You can't transform a lack of an advantage into a disadvantage.

A range or iterator that becomes undefined when adding an element to a linked list or removing an element from a linked list (provided you don't remove the element in question) makes it useless for this type of purpose. What I want is a cursor that saves the position of an element, not the end and beginning. Here is what I'm assuming a range consists of, and granted this is an assumption since I didn't look at any of your implementation, and a list object which uses ranges doesn't even exist AFAIK. Assume that integers below are individual elements all: 0 1 2 3 4 5 6 7 8 9 E reverse range: 0 1 2 3 4 forward range: 5 6 7 8 9 E Now I remove an element from the front: all: 1 2 3 4 5 6 7 8 9 E reverse range: ? 1 2 3 4 forward range: 5 6 7 8 9 E I've now lost my reverse iterator because it's no longer valid, but I can reconstruct it by diffing the forward iterator and list.all. If I add to the end I got a similar situation, I can reconstruct my forward iterator by diffing list.all and the reverse iterator.

You don't mention here which iterator usage pattern you are trying to model with ranges. I can think of at least two.

1. You use a single bidirectional 'center' iterator, center == 5. As one would naturally do with iterators. Note then that whenever you use your center for, say, backward iteration, you reconstruct the actual range by calling list.begin. You do it on each iteration. No wonder it stays valid even if you remove the first element in the meantime: you're constructing your range from scratch anyway. If you want to model this pattern with ranges---no problem, keep an empty 'center' range, center == (5,5), and reconstruct backward iteration range,

reverse = all.before(center);

whenever you need to iterate, then

center = reverse.end;

This 'center' range, being slightly less efficient, stays valid and becomes invalid in exactly the same conditions as your classical iterator.

2. You use 3 iterators, one for the list start, one for the center, and one for the end. In this case the 'start' iterator becomes invalid after removing the first element exactly like 'reverse' range becomes invalid in your example, with exactly the same consequences.
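
The empty 'center' range idea can be sketched with built-in array slices standing in for a list. This is only an illustration under assumptions: `before` and `after` are hypothetical free functions written for this example (the thread does not define them), and the pointer arithmetic works here only because arrays are contiguous; a real linked list would need its own implementation.

```d
// Hypothetical helper: everything in `all` that lies before `center`.
// `center` must be a slice of `all`.
T[] before(T)(T[] all, T[] center)
{
    assert(center.ptr >= all.ptr && center.ptr <= all.ptr + all.length);
    return all[0 .. center.ptr - all.ptr];
}

// Hypothetical helper: everything in `all` from the end of `center` onward.
T[] after(T)(T[] all, T[] center)
{
    auto endOfCenter = center.ptr + center.length;
    return all[endOfCenter - all.ptr .. $];
}

void main()
{
    int[] all = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    int[] center = all[5 .. 5];             // empty range used as a cursor

    assert(before(all, center) == [0, 1, 2, 3, 4]);
    assert(after(all, center) == [5, 6, 7, 8, 9]);

    all = all[1 .. $];                      // "remove" the first element
    assert(before(all, center) == [1, 2, 3, 4]);   // rebuilt from scratch

    auto rev = before(all, center);         // move the cursor back by one
    if (rev.length > 0)
        center = rev[$ - 1 .. $ - 1];
    assert(after(all, center)[0] == 4);
}
```

The cursor stays usable after the front element goes away because the backward range is recomputed from `all` each time, which is the point made in item 1.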
Sep 10 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Steven Schveighoffer <schveiguy yahoo.com> wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can 
 return a range larger than both input ranges. But not larger than their 
 union :o).

range from the 'all' range so it includes the new element at the end. Every time you remove an element, you have to update your reverse range from the 'all' range so it excludes the element at the beginning. Failure to do this results in invalid ranges, which seems to me like a lot more work than simply not doing anything (in the case of an iterator). The pitfalls of using ranges for dynamically changing containers might outweigh the advantages that they provide in certain cases.

behavior of range as you mention, or I could render them undefined. Iterators invalidate anyway at the drop of a hat, so they're none the wiser. You can't transform a lack of an advantage into a disadvantage.

linked list or removing an element from a linked list (provided you don't remove the element in question) makes it useless for this type of purpose. What I want is a cursor that saves the position of an element, not the end and beginning. Here is what I'm assuming a range consists of, and granted this is an assumption since I didn't look at any of your implementation, and a list object which uses ranges doesn't even exist AFAIK. Assume that integers below are individual elements all: 0 1 2 3 4 5 6 7 8 9 E reverse range: 0 1 2 3 4 forward range: 5 6 7 8 9 E Now I remove an element from the front: all: 1 2 3 4 5 6 7 8 9 E reverse range: ? 1 2 3 4 forward range: 5 6 7 8 9 E I've now lost my reverse iterator because it's no longer valid, but I can reconstruct it by diffing the forward iterator and list.all. If I add to the end I got a similar situation, I can reconstruct my forward iterator by diffing list.all and the reverse iterator.

You don't mention here which iterator usage pattern you are trying to model with ranges. I can think of at least two. 1. You use a single bidirectional 'center' iterator, center == 5. As one would naturally do with iterators. Note then that whenever you use your center for, say, backward iteration, you reconstruct the actual range by calling list.begin. You do it on each iteration. No wonder it stays valid even if you remove the first element in the meantime: you're constructing your range from scratch anyway. If you want to model this pattern with ranges---no problem, keep an empty 'center' range, center == (5,5), and reconstruct backward iteration range, reverse = all.before(center); whenever you need to iterate, then center = reverse.end; This 'center' range, being slightly less efficient, stays valid and becomes invalid in exactly the same conditions as your classical iterator. 2. You use 3 iterators, one for the list start, one for the center, and one for the end. In this case the 'start' iterator becomes invalid after removing the first element exactly like 'reverse' range becomes invalid in your example, with exactly the same consequences.

I'm acquiring the nagging feeling that Sergey understands ranges better than I do. I could understand how to address Steven's point only after reading the post above. Thanks, Sergey. Andrei
Sep 10 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 I'm acquiring the nagging feeling that Sergey understands ranges better 
 than I do. I could understand how to address Steven's point only after 
 reading the post above. Thanks, Sergey.

Thank you, and welcome! ;) P.S. I really love the looks of "reverse = all.before(center);" It's like writing program in plain English.
Sep 10 2008
parent Derek Parnell <derek psych.ward> writes:
On Thu, 11 Sep 2008 02:20:32 +0400, Sergey Gromov wrote:

 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 I'm acquiring the nagging feeling that Sergey understands ranges better 
 than I do. I could understand how to address Steven's point only after 
 reading the post above. Thanks, Sergey.

Thank you, and welcome! ;) P.S. I really love the looks of "reverse = all.before(center);" It's like writing program in plain English.

Oh boy! We must put an end to that otherwise we might all be out of a job ;-) -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Sep 10 2008
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Sergey Gromov" wrote
 You don't mention here which iterator usage pattern you are trying to
 model with ranges.  I can think of at least two.

 1.  You use a single bidirectional 'center' iterator, center == 5.  As
 one would naturally do with iterators.  Note then that whenever you use
 your center for, say, backward iteration, you reconstruct the actual
 range by calling list.begin.  You do it on each iteration.  No wonder it
 stays valid even if you remove the first element in the meantime: you're
 constructing your range from scratch anyway.  If you want to model this
 pattern with ranges---no problem, keep an empty 'center' range, center
 == (5,5), and reconstruct backward iteration range,

 reverse = all.before(center);

 whenever you need to iterate, then

 center = reverse.end;

 This 'center' range, being slightly less efficient, stays valid and
 becomes invalid in exactly the same conditions as your classical
 iterator.

This is exactly the pattern I use. I agree that your example would solve the problem. I hadn't thought of an empty range as a cursor, that is clever! The only missing piece to your solution is that I must construct the range after the center range in order to access the value to see where I need to go.

What I see as the biggest downside is the cumbersome and verbose code of moving the 'iterator' around, as every time I want to move forward, I construct a new range, and every time I want to move backwards I construct a new range (and construct a new 'center' afterwards).

So a 'move back one' looks like:

auto before = all.before(center);
if(!before.isEmpty)
    center = before.pop.end;

And to move forward it's:

auto after = all.after(center);
if(!after.isEmpty)
    center = after.next.begin;

To get the value there, I have to do:

all.after(center).left // or whatever gets decided as the 'get first value of range' member

or if opStar is used:

*all.after(center);

I much prefer:

forward:
if(center != list.end)
    ++center;

reverse:
if(center != list.begin)
    --center;

get value:
*center;

Especially without all the extra overhead. I see both methods as being just as open to mistakes, the first more so, and more difficult to comprehend (at least for me).

-Steve
Sep 10 2008
next sibling parent Benji Smith <dlanguage benjismith.net> writes:
Bill Baxter wrote:
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.

 ...

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

I dunno about that. I can think of lots of examples where the "range" metaphor is an awkward interloper between the container and the iteration logic: maps, sets, bags, markov models, graphs, trees (especially in a breadth-first traversal).

The word "range" and the idea of the range "moving", "shrinking", or being "empty" only matches my concept of "iteration" if I think strictly in terms of sequential containers (arrays, slices, lists, etc).

I think the range metaphor is a very cool way of connecting sequential containers with algorithms (especially divide-and-conquer algorithms, which seem particularly well-suited to the range metaphor). But if I want to visit each <p> node in a DOM tree, I have a hard time visualizing how a "range" fits into that process.

Maybe it's just terminology. I'm not sure yet.

--benji
Sep 10 2008
prev sibling next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Bill Baxter wrote:
 Ok, but I have yet to hear an actual use case that demands blazing
 fast iteration both forwards and backwards.  In your shuffling video
 there's no way moving the iterator back and forth is going to be the
 bottleneck.  In my undo/redo stack example it is also far from being
 on the critical path.    I think it goes back to the fact that going
 back and forth randomly isn't a property of many algorithms.  In all
 the examples I can think of it's more a property of how humans
 interact with data.  And humans are slow compared to how long it takes
 to update a few extra values.

Oh!! I thought of one!! Parsers & regex engines move both forward and backward, as they try to match characters to a pattern. Really, anything that uses an NFA or DFA to define patterns would benefit from fast bidirectional iteration... --benji
Sep 10 2008
next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Bill Baxter wrote:
 On Thu, Sep 11, 2008 at 1:31 PM, Benji Smith <dlanguage benjismith.net> wrote:
 Parsers & regex engines move both forward and backward, as they try to match
 characters to a pattern.

 Really, anything that uses an NFA or DFA to define patterns would benefit
 from fast bidirectional iteration...

Good call. I was about to post something mentioning that Turing machines but that seemed too academic. Same class of thing as NFA/DFA/FSM. The question is, though, would you really implement those things using a linked list? I would expect most of those suckers work on arrays, and so can take advantage of the bidirectional nature of random access ranges.

Actually, Perl 6 will (assuming they ever finish it) finally allow regex matching against input streams:

http://dev.perl.org/perl6/doc/design/syn/S05.html

It's a big document. Search for the text "Matching against non-strings".

This kind of thing was one of the main arguments I made in my "Why Strings as Classes" thread, that got everyone's panties in a bunch and that no one else agreed with. In that thread, I argued that Strings should be objects so that they can implement a CharSequence interface (or something like it). And then all the standard text-processing stuff could be written against that interface, allowing regex engines and parsers to be agnostic about whether they read from an actual in-memory string or from a (file|database|socket) input stream.

With the range proposal on the table, I'd be just as happy if all the D text-processing stuff in the standard libs was implemented against a Range!(T), where T is one of the char types. Especially if ranges can be infinite.

Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

That's what I was saying earlier. I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird. --benji
Sep 11 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

That's what I was saying earlier. I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird.

It seems to me like a misuse of ranges. Do you really want to iterate over a state machine? FSM is a mailbox with a 'message' hole. You put messages into it and it does things. How do you iterate over a mailbox?
Sep 11 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
Sergey Gromov wrote:
 Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird.

It seems to me like a misuse of ranges. Do you really want to iterate over a state machine? FSM is a mailbox with a 'message' hole. You put messages into it and it does things. How do you iterate over a mailbox?

Well, no. Not when you put it like that. The example I posted earlier went something like this:

MarkovModel<ApplicationState> model = ...;
for (ApplicationState state : model) {
    state.doStuff();
}

It's not a bad abstraction. The model handles all of the semantics of calculating the transition probabilities and selecting the next state, so that the foreach loop doesn't have to muss with those details.

Yeah, it's a total misuse of the "range" metaphor, and that's exactly what I'm saying. In Java, where I implemented this project, an "iterator" is a tiny bit of logic for returning objects in a (potentially endless, potentially reversible) sequence, primarily to support looping constructs. Just because there's no underlying range of objects doesn't mean they're not iterable.

Of course, Java iterators are *much* more limited constructs than these new D ranges. But I still think the concept has merit. And, like you said, calling them ranges makes them seem stupid. Because they're not ranges. --benji
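
For readers who want something concrete: the Java loop above could look roughly like the following in D. This is only a sketch, not code from the thread. It assumes the range primitives as they later shipped in Phobos (`empty`, `front`, `popFront`) rather than the `done`/`next` names discussed here, and `State`, `MarkovWalk`, the fixed step budget and the 0.3 switch probability are all hypothetical stand-ins for a real model.

```d
import std.random : Random, uniform01;

/// Hypothetical two-state application model (names are illustrative).
enum State { idle, busy }

/// Input range over the successive states of a small Markov chain.
/// No container sits underneath: `empty` is only a stopping predicate.
struct MarkovWalk
{
    State current;     // the "current value"
    size_t stepsLeft;  // assumed stopping criterion: a fixed step budget
    Random rng;        // source of randomness for transitions

    @property bool empty() const { return stepsLeft == 0; }
    @property State front() const { return current; }

    void popFront()
    {
        // Assumed transition rule: switch state with probability 0.3.
        if (uniform01(rng) < 0.3)
            current = (current == State.idle) ? State.busy : State.idle;
        --stepsLeft;
    }
}

unittest
{
    size_t visited = 0;
    foreach (state; MarkovWalk(State.idle, 5, Random(42)))
        ++visited;          // a real loop body would act on `state`
    assert(visited == 5);
}
```

The loop never sees transition probabilities, which matches the point made above: the "range" here is really a current value plus a stopping criterion.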
Sep 11 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Benji Smith <dlanguage benjismith.net> wrote:
 Sergey Gromov wrote:
 Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird.

It seems to me like a misuse of ranges. Do you really want to iterate over a state machine? FSM is a mailbox with a 'message' hole. You put messages into it and it does things. How do you iterate over a mailbox?

Well, no. Not when you put it like that. The example I posted earlier went something like this: MarkovModel<ApplicationState> model = ...; for (ApplicationState state : model) { state.doStuff(); } It's not a bad abstraction. The model handles all of the semantics of calculating the transition probabilities and selecting the next state, so that the foreach loop doesn't have to muss with those details. Yeah, it's a total misuse of the "range" metaphor, and that's exactly what I'm saying. In Java, where I implemented this project, an "iterator" is a tiny bit of logic for returning objects in a (potentially endless, potentially reversible) sequence, primarily to support looping constructs. Just because there's no underlying range of objects doesn't mean they're not iterable. Of course, Java iterators are *much* more limited constructs than these new D ranges. But I still think the concept has merit. And, like you said, calling them ranges makes them seem stupid. Because they're not ranges.

Well, if you get an object out of there on every step, and that object source can exhaust at some point, then the abstraction is correct. I agree that an input range is actually an arbitrary bounded iterator. But you also must agree that a random access iterator in C++ is actually an unbounded array. You always can invent a better name for any particular case. But C++ keeps calling them iterators to proclaim genericity and emphasize interchangeability to some degree. You may not notice that calling a string pointer an iterator is a bit awkward and misleading, because you got used to it and learned to think that way. There's no difference with ranges. Some of them are actual ranges, some are not, some are plain abstractions. You need to learn to think in ranges to use them naturally. This is true for any new language you're learning, programming or human, if you want to use them efficiently.
Sep 11 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Benji Smith <dlanguage benjismith.net> wrote:
 Sergey Gromov wrote:
 Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird.

It seems to me like a misuse of ranges. Do you really want to iterate over a state machine? FSM is a mailbox with a 'message' hole. You put messages into it and it does things. How do you iterate over a mailbox?

The example I posted earlier went something like this: MarkovModel<ApplicationState> model = ...; for (ApplicationState state : model) { state.doStuff(); } It's not a bad abstraction. The model handles all of the semantics of calculating the transition probabilities and selecting the next state, so that the foreach loop doesn't have to muss with those details. Yeah, it's a total misuse of the "range" metaphor, and that's exactly what I'm saying. In Java, where I implemented this project, an "iterator" is a tiny bit of logic for returning objects in a (potentially endless, potentially reversible) sequence, primarily to support looping constructs. Just because there's no underlying range of objects doesn't mean they're not iterable. Of course, Java iterators are *much* more limited constructs than these new D ranges. But I still think the concept has merit. And, like you said, calling them ranges makes them seem stupid. Because they're not ranges.

Well, if you get an object out of there on every step, and that object source can exhaust at some point, then the abstraction is correct. I agree that an input range is actually an arbitrary bounded iterator. But you also must agree that a random access iterator in C++ is actually an unbounded array. You always can invent a better name for any particular case. But C++ keeps calling them iterators to proclaim genericity and emphasize interchangeability to some degree. You may not notice that calling a string pointer an iterator is a bit awkward and misleading, because you got used to it and learned to think that way. There's no difference with ranges. Some of them are actual ranges, some are not, some are plain abstractions. You need to learn to think in ranges to use them naturally. This is true for any new language you're learning, programming or human, if you want to use them efficiently.

I agree 100%, and also with Sergey's other post that some abstractions simply don't fit the range charter, or don't fit it naturally, or are not expressive enough for some rich iteration abstraction.

Maybe ranges are lousy for an HMM, but then does it look like I want to run a host of generic algorithms against an HMM? That IS the question. Probably not. HMMs have their own algorithms, and I wouldn't think of finding/sorting/partitioning an HMM just as I wouldn't think of applying Viterbi to an array.

What I wanted was to make sure ranges are appropriate as higher-level abstractions that can replace STL-like iterators. My experience shows that they can. Not on 100% of occasions have they been a superior replacement, but I'm looking at a solid 80s at least. Add to that the advantage of better generators (which iterators make unpalatable because of the unsightly dummy end() requirement). When I also add the safety advantage of sinks (no more buffer overruns!!!), I feel we have a huge winner.

Of course, that doesn't mean ranges should be the be-all end-all of iteration. This discussion reminds me of the "iterator craze" around 2000. People were discovering STL iterators and were trying to define and use the weirdest iterators. I remember distinctly a one-page ad on Dr. Dobb's Journal. They were looking for article writers. They mentioned the upcoming themes (e.g. networking, security, patterns, C++...). There was a _specific_ note: "Not interested in yet another iterator article".

That being said, damn I wish I had the time to make RegEx faster AND operating on input ranges... Andrei
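
To illustrate the generator point, here is a small sketch (not from the thread) of an infinite range that needs no dummy end object. It assumes the primitives as they later appeared in Phobos (`empty`, `front`, `popFront`); `Fibonacci` and `firstN` are hypothetical names chosen for the example.

```d
import std.range.primitives : ElementType, isInputRange;

/// Infinite generator: `empty` is a compile-time constant `false`,
/// so no sentinel "end" object is ever constructed.
struct Fibonacci
{
    ulong a = 0, b = 1;

    enum bool empty = false;
    @property ulong front() const { return a; }

    void popFront()
    {
        immutable next = a + b;
        a = b;
        b = next;
    }
}

/// Copies at most `n` elements from any input range into an array.
ElementType!R[] firstN(R)(R r, size_t n)
    if (isInputRange!R)
{
    ElementType!R[] result;
    while (n > 0 && !r.empty)
    {
        result ~= r.front;
        r.popFront();
        --n;
    }
    return result;
}

unittest
{
    assert(firstN(Fibonacci(), 6) == [0UL, 1, 1, 2, 3, 5]);
}
```

With an iterator pair the caller would have to manufacture some `end()` value to compare against; here the stopping decision lives entirely in `firstN`.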
Sep 11 2008
parent Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 What I wanted was to make sure ranges are appropriate as higher-level 
 abstractions that can replace STL-like iterators. My experience shows 
 that they can. Not on 100% of occasions have they been a superior 
 replacement, but I'm looking at a solid 80s at least. Add to that the 
 advantage of better generators (which iterators make unpalatable because 
 of the unsightly dummy end() requirement). When I also add the safety 
 advantage of sinks (no more buffer overruns!!!), I feel we have a huge 
 winner.

I agree. My quibble with the name "range" is pretty minor, and I don't have any qualm with the semantics. And "range" is certainly a better name for an iteration metaphor than "opApply". :-) --benji
Sep 11 2008
prev sibling parent Russell Lewis <webmaster villagersonline.com> writes:
Benji Smith wrote:
 Bill Baxter wrote:
 Ok, but I have yet to hear an actual use case that demands blazing
 fast iteration both forwards and backwards.  In your shuffling video
 there's no way moving the iterator back and forth is going to be the
 bottleneck.  In my undo/redo stack example it is also far from being
 on the critical path.    I think it goes back to the fact that going
 back and forth randomly isn't a property of many algorithms.  In all
 the examples I can think of it's more a property of how humans
 interact with data.  And humans are slow compared to how long it takes
 to update a few extra values.

Oh!! I thought of one!! Parsers & regex engines move both forward and backward, as they try to match characters to a pattern.

They do backtracking, which is different than iterating backward. I would suggest that a parser should use a stack of forward iterators instead. That's my $.02, at least.
 Really, anything that uses an NFA or DFA to define patterns would 
 benefit from fast bidirectional iteration...

DFAs can't backtrack, so they don't require backward movement through the input. NFAs might, depending on the implementation (are you going to use guess-and-backtrack, or parallel execution?) but I would again suggest that a stack (or "TODO list") of forward iterators might work better than backtracking.
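
A rough sketch of the "stack (or TODO list) of forward positions" idea, not taken from the thread: a glob-style matcher that only ever advances through its input and backtracks by popping a saved position. Built-in slices stand in for forward ranges (copying a slice saves the position); `Todo` and `globMatch` are hypothetical names, and the tiny pattern language ('*' plus literals) is an assumption made to keep the example short.

```d
/// One pending alternative: where we are in the input and in the pattern.
struct Todo
{
    const(char)[] input;    // remaining input (a saved forward position)
    const(char)[] pattern;  // remaining pattern
}

/// Glob-style match: '*' matches any run of characters,
/// every other pattern character matches itself.
bool globMatch(const(char)[] input, const(char)[] pattern)
{
    Todo[] todo = [Todo(input, pattern)];   // stack of saved positions

    while (todo.length)
    {
        auto t = todo[$ - 1];
        todo = todo[0 .. $ - 1];            // pop

        if (t.pattern.length == 0)
        {
            if (t.input.length == 0)
                return true;                // both exhausted: match
            continue;                       // dead end, try next alternative
        }

        if (t.pattern[0] == '*')
        {
            // Alternative 1: '*' consumes one more input character.
            if (t.input.length)
                todo ~= Todo(t.input[1 .. $], t.pattern);
            // Alternative 2: '*' matches the empty run (tried first).
            todo ~= Todo(t.input, t.pattern[1 .. $]);
        }
        else if (t.input.length && t.input[0] == t.pattern[0])
        {
            todo ~= Todo(t.input[1 .. $], t.pattern[1 .. $]);
        }
    }
    return false;
}

unittest
{
    assert(globMatch("hello", "h*o"));
    assert(!globMatch("hello", "h*x"));
}
```

Nothing here ever steps the input backward; "backtracking" is just resuming from an earlier saved forward position.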
Sep 11 2008
prev sibling next sibling parent Sergey Gromov <snake.scaly gmail.com> writes:
Steven Schveighoffer <schveiguy yahoo.com> wrote:
 What I see as the biggest downside is the cumbersome and verbose code of 
 moving the 'iterator' around, as every time I want to move forward, I 
 construct a new range, and every time I want to move backwards I construct a 
 new range (and construct a new 'center' afterwards).  So a 'move back one' 
 looks like:
 
 auto before = all.before(center);
 if(!before.isEmpty)
   center = before.pop.end;
 
 And to move forward it's:
 auto after = all.after(center);
 if(!after.isEmpty)
   center = after.next.begin;
 
 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first value 
 of range' member
 
 or if opStar is used:
 
 *all.after(center);
 
 I much prefer:
 
 forward:
 if(center != list.end)
     ++center;
 
 reverse:
 if(center != list.begin)
    --center;
 
 get value:
 *center;
 
 Especially without all the extra overhead
 
 I see both methods as being just as open to mistakes, the first more-so, and 
 more difficult to comprehend (at least for me).

Yes, these are valid points, I completely agree. But there are also other points. Let me voice some of them.

1. Probably most important. You say here that ranges suck at incremental bidirectional iteration over a linked list, and Bill also agrees. This seems true. But this sort of iteration is not a goal in itself. It's just an idiomatic *iterator* solution for a range of real-world problems. I can't think of any such problem off the top of my head but it's probably a matter of my education. Bill proposed one already.

I want to say that I believe that for any such real-world problem there is a range solution that's probably better than a direct mapping of an existing iterator solution. The analogy would be trying to write in C++ as if you were using Python, or Haskell, and then declaring that C++ sucks because it requires bulky, inefficient and error-prone code to implement simple functional idioms. Different languages require different idioms and ranges are a different language from iterators.

For instance, Bill's undo/redo stack consisted of two entities: a list of operations, and a cursor. It was OK and natural with iterators. It sucks with ranges. Okay, I'm also going to use two entities: an undo stack and a redo stack. Undo => pop from one, push to another. New operation => push undo, trash redo. Doesn't it look simpler and safer? Well, it doesn't use ranges, at least from the user perspective, so what?

I also believe that a regular expression engine would benefit from using ranges rather than suffer.

2. There was a special case that a center marker couldn't have been dereferenced. Let's imagine you really needed it. OK, no problem. Let's create a special kind of range, Cursor. Cursor always contains one element. Its begin points to that element and its end is calculated so that it always points after that element no matter what. This is as valid as having an iterator pointing to that same element because you must guarantee in both cases that an iterator is dereferenceable. This is ad-hoc, yes, but an EOF iterator in C++ is no less ad-hoc.
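
The two-stack undo/redo from point 1 can be sketched as follows. This is an illustration only; `History` and its member names are hypothetical, and plain D dynamic arrays are assumed as the stacks.

```d
/// Minimal undo/redo history built from two stacks (D dynamic arrays).
struct History(T)
{
    private T[] undoStack;
    private T[] redoStack;

    /// New operation => push undo, trash redo.
    void record(T op)
    {
        undoStack ~= op;
        redoStack = null;
    }

    /// Undo => pop from the undo stack, push to the redo stack.
    bool undo() { return transfer(undoStack, redoStack); }

    /// Redo => pop from the redo stack, push back to the undo stack.
    bool redo() { return transfer(redoStack, undoStack); }

    @property size_t undoCount() const { return undoStack.length; }
    @property size_t redoCount() const { return redoStack.length; }

    /// Moves the top element of `from` onto `to`; false if `from` is empty.
    private static bool transfer(ref T[] from, ref T[] to)
    {
        if (from.length == 0)
            return false;
        to ~= from[$ - 1];
        from = from[0 .. $ - 1];
        return true;
    }
}

unittest
{
    History!string h;
    h.record("type a");
    h.record("type b");
    assert(h.undo() && h.undoCount == 1 && h.redoCount == 1);
    h.record("type c");            // a new operation discards redo history
    assert(h.redoCount == 0);
}
```

No cursor into the middle of a list is needed, so there is no position that could dangle or run past either end.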
Sep 11 2008
prev sibling next sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Steven Schveighoffer <schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 I think one thing to consider is what it will take to make a new
 container support and "play nice" with the regime proposed.  This
 touches on Andrei's point about being hard pressed to think of generic
 algorithms to run on an HMM, too.

 The first question is who do you want to "play nice" with?  If you're
 going to be writing functions specifically for that container, then
 you don't really have to play nice with anyone.  Your container just
 needs to have the operations necessary to support those functions.

Bill, thanks so much for explaining it like this, I really agree with what you say. My concern is that iterator is going to become a 'bad word' and considered a flawed design. But you are right, there is no need for iterators to be allowed for std.algorithm, I totally agree with that, I just assumed Andrei meant iterators would be discouraged for everything, including general use as pointers into container objects. If that is not the case, then I wholeheartedly agree that algorithms should be restricted to ranges, and iterators should be used only in container operations.

If you ask me, I think iterators AKA pointers into containers should be discouraged from SafeD. If you don't care about SafeD you may use whatever you like. Most library interfaces want to be SafeD to make user's life easier but few care about the library internals as long as they work.
Sep 12 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Steven Schveighoffer <schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 I think one thing to consider is what it will take to make a new
 container support and "play nice" with the regime proposed.  This
 touches on Andrei's point about being hard pressed to think of generic
 algorithms to run on an HMM, too.

 The first question is who do you want to "play nice" with?  If you're
 going to be writing functions specifically for that container, then
 you don't really have to play nice with anyone.  Your container just
 needs to have the operations necessary to support those functions.

Bill, thanks so much for explaining it like this, I really agree with what you say. My concern is that iterator is going to become a 'bad word' and considered a flawed design. But you are right, there is no need for iterators to be allowed for std.algorithm, I totally agree with that, I just assumed Andrei meant iterators would be discouraged for everything, including general use as pointers into container objects. If that is not the case, then I wholeheartedly agree that algorithms should be restricted to ranges, and iterators should be used only in container operations.

If you ask me, I think iterators AKA pointers into containers should be discouraged from SafeD. If you don't care about SafeD you may use whatever you like. Most library interfaces want to be SafeD to make user's life easier but few care about the library internals as long as they work.

That's also a reason why std.stdio must wrap FILE* into a safe struct. Manipulating FILE* objects directly is unsafe even if you disable pointer arithmetic. Andrei
Sep 12 2008
prev sibling parent reply Fawzi Mohamed <fmohamed mac.com> writes:
I like the new proposal much more than the first.

I believe you will be able to use it successfully in std.algorithm.
I still would have preferred an operation like a sameHead or 
compareHeadPosition (that might or might not return the order, but at 
least tests for equality) so that upon request (-debug flag?) one would 
be able to make all range operation safe (with overhead) in a generic 
way, but it is up to you.

What I really care about is the following:
I want foreach magic on all objects that support .done and .next, even 
if they are not ranges.
foreach is about iteration, iteration needs only .done and .next (a 
generator, iterator whatever), and it should work with that.
Do not force the range idea on foreach iteration.
foreach is a language construct, not a library one and should allow for 
maximum flexibility.

As extra nicety as each generator/iterator/range returns just one 
object I would like to be able to do:

// i counts starting from 1, j iterates on iterJ and in parallel k 
iterates on a.all
foreach(i,j,k;1..$,iterJ,a.all){
	//...
}

and have it expanded to

Range!(int) r1=1..$;
alias iterJ r2;
typeof(a.all) r3=a.all;
while(!(r1.done || r2.done || r3.done)){
	typeof(r1.next) i=r1.next;
	typeof(r2.next) j=r2.next;
	typeof(r3.next) k=r3.next;
	//...
}

Fawzi
Sep 12 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Fawzi Mohamed wrote:
 I like the new proposal much more than the first.
 
 I believe you will be able to use it successfully in std.algorithm.
 I still would have preferred an operation like a sameHead or 
 compareHeadPosition (that might or might not return the order, but at 
 least tests for equality) so that upon request (-debug flag?) one would 
 be able to make all range operation safe (with overhead) in a generic 
 way, but it is up to you.

Comparing for equality of heads is very important. For now you can obtain it as a non-primitive by invoking:

auto sameHead = r.before(s).done;

The above also shows how "done" is not always very expressive. Also you can compare whether two ranges have the same end by invoking:

auto sameRange = r is s;
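
For built-in slices specifically, the same two checks can be sketched without the proposed `before`/`done` primitives. This is an illustration for arrays only, not the generic range API under discussion; `sameHeadSketch` is a hypothetical helper name.

```d
/// True when both slices begin at the same element (same head position).
bool sameHeadSketch(T)(const(T)[] r, const(T)[] s)
{
    return r.ptr is s.ptr;
}

unittest
{
    int[] all = [1, 2, 3, 4];
    assert(sameHeadSketch(all, all[0 .. 2]));   // same start, different end
    assert(!sameHeadSketch(all, all[1 .. $]));  // different start
    assert(all[1 .. 3] is all[1 .. 3]);         // `is` compares start and length
    assert(all[0 .. 2] !is all[0 .. 3]);        // same start, different length
}
```

For slices, `is` is true only when both the start and the length agree, so it tests identity of the whole range rather than of one endpoint.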
 What I really care about is the following:
 I want foreach magic on all objects that support .done and .next, even 
 if they are not ranges.
 foreach is about iteration, iteration needs only .done and .next (a 
 generator, iterator whatever), and it should work with that.
 Do not force the range idea on foreach iteration.
 foreach is a language construct, not a library one and should allow for 
 maximum flexibility.

Yes. Walter asked me to send him the syntactic transformation that foreach and foreach_reverse need to do. Duck typing will be used so as long as you define the proper names you're in good shape.
 As extra nicety as each generator/iterator/range returns just one object 
 I would like to be able to do:
 
 // i counts starting from 1, j iterates on iterJ and in parallel k 
 iterates on a.all
 foreach(i,j,k;1..$,iterJ,a.all){
     //...
 }
 
 and have it expanded to
 
 Range!(int) r1=1..$;
 alias iterJ r2;
 typeof(a.all) r3=a.all;
 while(!(r1.done || r2.done || r3.done)){
     typeof(r1.next) i=r1.next;
     typeof(r2.next) j=r2.next;
     typeof(r3.next) k=r3.next;
     //...
 }

Walter and I were discussing ranges exposing key, key1, ... keyn. In that case foreach with multiple arguments would work, and would bind each of the extra arguments to key, key1 etc. respectively. Andrei
Sep 12 2008
parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-09-12 17:48:02 +0200, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 I like the new proposal much more than the first.
 
 I believe you will be able to use it successfully in std.algorithm.
 I still would have preferred an operation like a sameHead or 
 compareHeadPosition (that might or might not return the order, but at 
 least tests for equality) so that upon request (-debug flag?) one would 
 be able to make all range operation safe (with overhead) in a generic 
 way, but it is up to you.

Comparing for equality of heads is very important. For now you can obtain it as a non-primitive by invoking: auto sameHead = r.before(s).done;

nice I hadn't thought about this
 The above also shows how "done" is not always very expressive. Also you 
 can compare whether two ranges have the same end by invoking:
 
 auto sameRange = r is s;

I suppose that you mean that "is" compares both the start and the end...
 What I really care about is the following:
 I want foreach magic on all objects that support .done and .next, even 
 if they are not ranges.
 foreach is about iteration, iteration needs only .done and .next (a 
 generator, iterator whatever), and it should work with that.
 Do not force the range idea on foreach iteration.
 foreach is a language construct, not a library one and should allow for 
 maximum flexibility.

Yes. Walter asked me to send him the syntactic transformation that foreach and foreach_reverse need to do. Duck typing will be used so as long as you define the proper names you're in good shape.

very nice, this is important because, generic algorithms aside, you might want to loop over all sorts of things.
 As extra nicety as each generator/iterator/range returns just one 
 object I would like to be able to do:
 
 // i counts starting from 1, j iterates on iterJ and in parallel k 
 iterates on a.all
 foreach(i,j,k;1..$,iterJ,a.all){
     //...
 }
 
 and have it expanded to
 
 Range!(int) r1=1..$;
 alias iterJ r2;
 typeof(a.all) r3=a.all;
 while(!(r1.done || r2.done || r3.done)){
     typeof(r1.next) i=r1.next;
     typeof(r2.next) j=r2.next;
     typeof(r3.next) k=r3.next;
     //...
 }

Walter and I were discussing ranges exposing key, key1, ... keyn. In that case foreach with multiple arguments would work, and would bind each of the extra arguments to key, key1 etc. respectively.

I like the possibility of giving several iterators at once to foreach, so that you never have to define two opApply (one with index, one without), but can easily add a counter if you want to. You can also solve this by having a "combiner" of iterators, but in my opinion it is uglier.

If you allow an iterator to return several objects, and also allow several iterators to be advanced together, you should use a syntax other than the one I proposed, something like

foreach(i;1..$; j; iterJ; k,l; multiIter){ }

otherwise matching iteration variables with iterators becomes a mess. Fawzi
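
The parallel expansion proposed earlier in this post (advance all ranges together, stop when any one is done) can be written as a library function today. The sketch below is for two ranges only and assumes the primitives as they later shipped in Phobos (`empty`, `front`, `popFront`) instead of `done`/`next`; `eachPair` is a hypothetical name.

```d
import std.range : empty, front, popFront, isInputRange, iota;

/// Calls `fun(a, b)` for elements drawn in parallel from both ranges,
/// stopping as soon as either range is exhausted.
void eachPair(alias fun, R1, R2)(R1 r1, R2 r2)
    if (isInputRange!R1 && isInputRange!R2)
{
    while (!(r1.empty || r2.empty))
    {
        fun(r1.front, r2.front);
        r1.popFront();
        r2.popFront();
    }
}

unittest
{
    int sum = 0;
    // i counts from 1; x walks the array in parallel.
    eachPair!((i, x) { sum += i * x; })(iota(1, int.max), [10, 20, 30]);
    assert(sum == 1 * 10 + 2 * 20 + 3 * 30);
}
```

This is the "combiner" flavour rather than new foreach syntax, but the loop body is the same while-loop as in the proposed expansion.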
Sep 12 2008
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 I am also sure that if I sat down long enough contemplating my navel I
 could come with more examples of iterators=good/ranges=bad.
 <snip>

Now you're just being rude :) Please note that I'm not attacking you personally. All I'm pointing out is that your solution solves certain problems VERY well, but leaves other problems not solved. I think allowing iterators/cursors would solve all the problems. I might be proven wrong, but certainly I don't think you've done that so far. I'd love to be proven wrong, since I agree that iterators are generally unsafe.

Didn't mean to. You are making great points, and I hope (without being sure) they can be addressed. The "contemplating navel" thing is a fave quote of mine from Bjarne's book on C++. Andrei
Sep 10 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 I am also sure that if I sat down long enough contemplating my navel I
 could come with more examples of iterators=good/ranges=bad.
 <snip>

Now you're just being rude :) Please note that I'm not attacking you personally. All I'm pointing out is that your solution solves certain problems VERY well, but leaves other problems not solved. I think allowing iterators/cursors would solve all the problems. I might be proven wrong, but certainly I don't think you've done that so far. I'd love to be proven wrong, since I agree that iterators are generally unsafe.

Didn't mean to. You are making great points, and I hope (without being sure) they can be addressed. The "contemplating navel" thing is a fave quote of mine from Bjarne's book on C++.

Didn't know that :) Sometimes when someone is not aware of a quote/joke, it seems more personally motivated. I agree that our discussion is not bringing either of us to the other's side. I'm also hopeful the points can be addressed with ranges. -Steve
Sep 10 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 1:31 PM, Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Ok, but I have yet to hear an actual use case that demands blazing
 fast iteration both forwards and backwards.  In your shuffling video
 there's no way moving the iterator back and forth is going to be the
 bottleneck.  In my undo/redo stack example it is also far from being
 on the critical path.    I think it goes back to the fact that going
 back and forth randomly isn't a property of many algorithms.  In all
 the examples I can think of it's more a property of how humans
 interact with data.  And humans are slow compared to how long it takes
 to update a few extra values.

Oh!! I thought of one!! Parsers & regex engines move both forward and backward, as they try to match characters to a pattern. Really, anything that uses an NFA or DFA to define patterns would benefit from fast bidirectional iteration...

Good call. I was about to post something mentioning Turing machines, but that seemed too academic. Same class of thing as NFA/DFA/FSM. The question is, though, would you really implement those things using a linked list? I would expect most of those suckers work on arrays, and so can take advantage of the bidirectional nature of random access ranges. Hmm, for FSMs you can't really define a good end state. There may not be any particular end state. ... ah, but wait I forgot. That's the beauty of a range -- the end "state" doesn't have to be a "state" per se. It can be any predicate you want it to be. "Range" is misleading in this case. This is one of those cases where you just have to remember "range" means "current value plus stopping criterion". --bb
Sep 10 2008
prev sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Fri, Sep 12, 2008 at 1:24 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 Thus one can implement iterators on top of ranges, but I'd argue that
 ranges
 are much easier to implement on top of iterators.

Ranges are safer and easier to work with in most cases so it's worth it, or so the argument goes. You don't buy it?

I can define an iterator and it doesn't mean that it makes ranges any less safe. Just give me the choice, if I think iterators are a better fit, I might choose iterators. But having to shoehorn ranges into an iterator form so that I do not wince at the ugliness of my code seems like unnecessary baggage.

I think one thing to consider is what it will take to make a new container support and "play nice" with the regime proposed. This touches on Andrei's point about being hard pressed to think of generic algorithms to run on an HMM, too.

The first question is who do you want to "play nice" with? If you're going to be writing functions specifically for that container, then you don't really have to play nice with anyone. Your container just needs to have the operations necessary to support those functions.

The question of "playing nice" with everyone is not an issue at all until you start wanting to have one algorithm that works for lots of different containers that can do kinda similar sorts of things. And that's exactly what std.algorithm is for: supporting those kinds of operations that apply equally to a lot of different kinds of containers. So if you agree that ranges are good enough for std.algorithm, then you should agree that a generic iterator concept is not really necessary, since the places left where you really need an iterator are those places where a generic algorithm is not really useful. If they were generic algorithms then they would be in std.algorithm.

The next thing that's worth thinking about is how much do you have to work to play nice with std.algorithm? The easier it is to implement the interface expected by std.algorithm the better. So for one thing that means that you really want the std.algorithm concepts to nest, for one to build on the next. That way if you implement the most generic level, then you've automatically implemented all the more restricted levels. This leads us to want to have names that make sense all the way up the hierarchy. Like isEmpty(). It pretty much makes sense no matter which direction or how many degrees of freedom you have. That's better than something like atEnd() for that reason. It is unbiased. 
You wouldn't want to have to provide an "atEnd" to work with forward ranges, and then an "isEmpty" to work with random access even though they mean the same thing. So the levels of iterators and naming of their parts should nest as much as possible. Which seems to be pretty much the case with the current proposal. I think .value will be a better name for "the current thing" than .left. (Using the operator * may be better still.) But other than that, the direction the naming is taking here on the NG seems good. I say .value in part because if users like you implement their own iterator types, then .value is a reasonable name for getting the thing referred to. So you could then write a function that takes a range or your iterator and uses the distinguished value referred to. In that sense * would be even better because it would let you pass in a pointer too. So in the end, really what I'm saying is that I think you are right. Iterators are useful sometimes and it would be nice to design ranges in such a way that the range terminology makes sense for iterators too. An iterator would probably support the .next property just like a range, for instance. That's a good name that will work with either. Maybe it's worth codifying what the iterator concepts should be even if std.algorithm won't use them.
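
The nesting of concepts can be checked mechanically. The sketch below is an illustration, not part of the proposal: it uses the trait and primitive names that Phobos later settled on (`isInputRange`, `empty`, `popFront` and so on) rather than the `isEmpty`/`next`/`value` names being debated here, and `countElements` is a hypothetical function.

```d
import std.range.primitives : empty, front, popFront,
    isInputRange, isForwardRange, isBidirectionalRange, isRandomAccessRange;

// The concepts nest: a built-in slice models the most capable one
// and therefore every less capable one as well.
static assert(isRandomAccessRange!(int[]));
static assert(isBidirectionalRange!(int[]));
static assert(isForwardRange!(int[]));
static assert(isInputRange!(int[]));

/// Written against the least demanding concept, so it accepts any range.
size_t countElements(R)(R r)
    if (isInputRange!R)
{
    size_t n = 0;
    for (; !r.empty; r.popFront())
        ++n;
    return n;
}

unittest
{
    assert(countElements([1, 2, 3]) == 3);
}
```

Because one unbiased name (`empty`) serves every level, the same function body works whether the caller hands in a single-pass stream or a random access array.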
 But I want to be able to
 construct ranges from pointers.

If iterators are up to you then you will be able to do this. But std.algorithm will only care about the ranges you construct, not the iterators.
 I want to save pointers.  I want to use
 pointers to refer to elements in a collection.  I want to use pointers to
 move one-at-a-time along a node-based container.  I don't want to 'emulate'
 pointers using ranges.  I don't want the library to resist me doing what I
 find natural.

I don't think it will as long as you provide those my-iterator-to-range functions.
 Anyways, I'm going to leave the discussion, I think I've said all I can
 about my views.  I'm not really good at explaining things anyways.  But I
 will update dcollections with what I think is the best compromise.  Then I
 might have a better understanding of how ranges fit into a collection
 package.  The good news is I don't have to worry about the language not
 providing iterators, everything is going to be library based, so we can
 easily try out both ways and see which is easier to use.

I think your comments have made a valuable addition to the conversation, and have at least helped me get my thoughts together. So thanks! I'll be interested to see how the work on your lib turns out. --bb
Sep 11 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
 I think one thing to consider is what it will take to make a new
 container support and "play nice" with the regime proposed.  This
 touches on Andrei's point about being hard pressed to think of generic
 algorithms to run on an HMM, too.

 The first question is who do you want to "play nice" with?  If you're
 going to be writing functions specifically for that container, then
 you don't really have to play nice with anyone.  Your container just
 needs to have the operations necessary to support those functions.

Bill, thanks so much for explaining it like this, I really agree with what you say. My concern is that iterator is going to become a 'bad word' and considered a flawed design. But you are right, there is no need for iterators to be allowed for std.algorithm, I totally agree with that, I just assumed Andrei meant iterators would be discouraged for everything, including general use as pointers into container objects. If that is not the case, then I wholeheartedly agree that algorithms should be restricted to ranges, and iterators should be used only in container operations. Cheers! -Steve
Sep 12 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Bill Baxter" wrote
 I think one thing to consider is what it will take to make a new
 container support and "play nice" with the regime proposed.  This
 touches on Andrei's point about being hard pressed to think of generic
 algorithms to run on an HMM, too.

 The first question is who do you want to "play nice" with?  If you're
 going to be writing functions specifically for that container, then
 you don't really have to play nice with anyone.  Your container just
 needs to have the operations necessary to support those functions.

Bill, thanks so much for explaining it like this, I really agree with what you say. My concern is that iterator is going to become a 'bad word' and considered a flawed design. But you are right, there is no need for iterators to be allowed for std.algorithm, I totally agree with that, I just assumed Andrei meant iterators would be discouraged for everything, including general use as pointers into container objects. If that is not the case, then I wholeheartedly agree that algorithms should be restricted to ranges, and iterators should be used only in container operations.

You are right. Iterators can definitely be handy in many situations, and it took me some hair pulling to figure out how to do moveToFront with ranges alone. (Then admittedly it's a pretty wicked algorithm no matter what.) I don't want to discourage defining iterators, but rather not force you to define them when you define a new range, and also not force people who want to use std.algorithm to learn them in addition to ranges. Andrei
Sep 12 2008
prev sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 10:35 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 On Thu, Sep 11, 2008 at 9:32 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first
 value
 of range' member
 or if opStar is used:

 *all.after(center);

Why is all that necessary? Can't you just do a *center?

Oh, I get it. It's empty. Duh. Ok, so you can have third cursor function in the std lib: T cursorValue(R,T)(R all, R center) { return all.after(center).left; } ... plus the cursorAdvance and cursorRetreat.

That is all fine and dandy in the world of "I don't care how well my iterators perform or how much code bloat is added because of them," but I usually work in a different world ;)

Ok, but I have yet to hear an actual use case that demands blazing fast iteration both forwards and backwards. In your shuffling video there's no way moving the iterator back and forth is going to be the bottleneck. In my undo/redo stack example it is also far from being on the critical path. I think it goes back to the fact that going back and forth randomly isn't a property of many algorithms. In all the examples I can think of it's more a property of how humans interact with data. And humans are slow compared to how long it takes to update a few extra values. Certainly one-way iteration needs to be as fast as possible, for all kinds of algorithms. But does bidirectional iteration really need to be super-fast?
 But if I were forced not to use an iterator model (which isn't the case,
 iterators should be very possible without compiler help), I would actually
 implement this as a wrapper struct:

 struct Cursor(containerType)
 {
   private Range!(containerType) _cur;
   private containerType owner;

   Cursor  moveLeft() {...}
   Cursor moveRight() {...}
   bool hasLeft() {...}
   etc.
 }

That would work for me too. Just put it in the standard lib so I don't have to scratch my head wondering why such a basic thing is so hard to do! Of course once you do that, you have to wonder why this one's interface isn't branded a range concept but the others are. (I know I know... it's not Stepanov "basic"), but if it's there and people want to use it, I see no value in refusing to recognize it on purist grounds.
 Thus one can implement iterators on top of ranges, but I'd argue that ranges
 are much easier to implement on top of iterators.

Ranges are safer and easier to work with in most cases so it's worth it, or so the argument goes. You don't buy it? I think things like infinite generators make more sense as a range because they are difficult to express succinctly as two iterators. Or perhaps you don't mean to imply that every range would have a begin() and an end() iterator you could access?
 In any case, I think there are benefits to having a range type that is not
 necessarily defined as two iterators.

But how to do it without a large increase in the number of fundamental concepts you have to keep track of -- that's the issue. --bb
Sep 10 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
 Thus one can implement iterators on top of ranges, but I'd argue that 
 ranges
 are much easier to implement on top of iterators.

Ranges are safer and easier to work with in most cases so it's worth it, or so the argument goes. You don't buy it?

I can define an iterator and it doesn't mean that it makes ranges any less safe. Just give me the choice, if I think iterators are a better fit, I might choose iterators. But having to shoehorn ranges into an iterator form so that I do not wince at the ugliness of my code seems like unnecessary baggage.

I believe that when you are actually using a range of values, a range form is a much better, safer fit. When you want just a pointer to a single value, then a pointer-form is a better fit.

But I want to be able to construct ranges from pointers. I want to save pointers. I want to use pointers to refer to elements in a collection. I want to use pointers to move one-at-a-time along a node-based container. I don't want to 'emulate' pointers using ranges. I don't want the library to resist me doing what I find natural.

This goes back to a lot of the points I've brought up about 'safety' issues in D. D is a systems language, I like the safety by default, but when I can gain something by breaking the safety, I want to be able to do it efficiently, and without resistance from the compiler. Like logical const. I've proven it's possible to emulate, but at a performance disadvantage. This is no different, you can emulate iterators, but at a performance (and code bloat) disadvantage. Granted the disadvantage isn't as big for this as it is for logical const, but the question still remains - if I can do it already, why is it so bad if it's supported natively?

Anyways, I'm going to leave the discussion, I think I've said all I can about my views. I'm not really good at explaining things anyways. But I will update dcollections with what I think is the best compromise. Then I might have a better understanding of how ranges fit into a collection package. The good news is I don't have to worry about the language not providing iterators, everything is going to be library based, so we can easily try out both ways and see which is easier to use. -Steve
Sep 11 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 9:41 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.

 ...

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

I dunno about that. I can think of lots of examples where the "range" metaphor is an awkward interloper between the container and the iteration logic: maps, sets, bags, markov models, graphs, trees (especially in a breadth-first traversal)

Iterators for maps, sets, bags, graphs, trees are usually either for pointing to a found element or for iterating over the whole thing. With ranges the former just becomes a degenerate range where only one end is actually important. The other end would probably be the end() of the container in the STL sense. The latter is no problem if you just want to forward iterate over everything.
 The word "range" and the idea of the range "moving", "shrinking", or being
 "empty" only matches my concept of "iteration" if I think strictly in terms
 of sequential containers (arrays, slices, lists, etc).

For the one-way ranges, the range is equivalent to a forward iterator plus an end(). You can do exactly the same things with it. The end() may very happily not actually exist, though, if not needed, or if it depends on some dynamic condition, like for your HMM example.
 I think the range metaphor is a very cool way of connecting sequential
 containers with algorithms (especially divide-and-conquer algorithms, which
 seem particularly well-suited to the range metaphor).

 But if I want to visit each <p> node in a DOM tree, I have a hard time
 visualizing how a "range" fits into that process.

For that you just use a forward range, which is just forward iterator plus a stopping criterion, that's all.
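As an illustration of "forward iterator plus a stopping criterion", here is a rough D sketch, not taken from the thread. `Element`, `TagRange` and the `id` field are made-up names for the example, and the isEmpty/value/next members follow the naming discussed here rather than a real library. The range keeps a stack of pending nodes and reports empty once the stack runs out, so no end() position ever has to exist.

```d
// Hypothetical, minimal DOM-like node.
class Element
{
    string tag;
    int id;
    Element[] children;

    this(string tag, int id, Element[] children = null)
    {
        this.tag = tag;
        this.id = id;
        this.children = children;
    }
}

// Forward range visiting, in document order, every element with a given tag.
struct TagRange
{
    private Element[] pending; // nodes still to visit; the next one is last
    private string tag;

    this(Element root, string tag)
    {
        this.tag = tag;
        pending = [root];
        skipToMatch();
    }

    // Stopping criterion: nothing left to visit.
    bool isEmpty() const { return pending.length == 0; }
    Element value() { return pending[$ - 1]; }
    void next() { expandTop(); skipToMatch(); }

    // Replace the top node by its children (first child ends up on top).
    private void expandTop()
    {
        auto top = pending[$ - 1];
        pending = pending[0 .. $ - 1];
        foreach_reverse (child; top.children)
            pending ~= child;
    }

    private void skipToMatch()
    {
        while (pending.length != 0 && pending[$ - 1].tag != tag)
            expandTop();
    }
}
```

A caller would loop with `for (auto r = TagRange(doc, "p"); !r.isEmpty(); r.next())` and read `r.value()` inside the body.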
 Maybe it's just terminology. I'm not sure yet.

Maybe. There is one thing so far that we can point to and say "ranges aren't so great for this". That's the case where you want to scan forward *and* backward over your data. But I now believe that std lib functions can cover that usage case in a non-burdensome way, too. --bb
Sep 10 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 10:46 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.
 Note that "move back and forth according to some user input" is
 clearly not an "algorithm" that would be in std.algorithm.  But it
 does come up often enough in applications.  I don't think the fact
 that it's not strictly an Algorithm-with-a-capital-A makes it any less
 important.

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

Any structure that might change topology doesn't lend itself well to persistent ranges.

But they often don't lend themselves to iterators either.
 Ranges are fine for iterating over a constant version of
 the container.  i.e., if you want to implement a search function, where you
 are assuming that during the search, the container doesn't change, that
 should take a range as an argument.  But storing references to individual
 elements for later use (such as O(1) lookup or quick removal), and modifying
 the container in between getting the reference and using the reference makes
 it difficult to guarantee the behavior.

Lots of algorithms on containers using iterators have this property too.
 The only range type that seems like
 it would be immune to such changes would be the empty range where both ends
 point to the same element.  In fact, this can be reduced to a single
 reference, just copied for the sake of calling it a 'range'.

Or a here-to-end range where one end points to a special "end" sentinel. Like with a linked list. You can't change the sentinel by mutating the contents so it remains valid. I think this would be a more common/useful form than the empty range. Because you can dereference the range from here-to-end, but you can't dereference the range from here to here. Probably the cursor idiom should also use "all" and "here-to-end" rather than "all" and "here-to-here". Then the all.after(center).value ugliness isn't needed, just center.value. (I prefer ".value" to ".first")
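A small D sketch of the here-to-end idea for a singly linked list follows. It is only an illustration under assumptions: `Node`, `HereToEnd` and `collect` are invented names, and a null next pointer stands in for the "end" sentinel. Because the range stores only its starting node, appending to the list later does not invalidate it, and the current element can be read directly.

```d
struct Node
{
    int value;
    Node* next; // null marks the end of the list
}

// Range from a given node to the end of the list.
struct HereToEnd
{
    Node* here;

    bool isEmpty() const { return here is null; }
    ref int value() { return here.value; } // no all.after(center) detour needed
    void next() { here = here.next; }
}

// Helper for the example: gather the remaining values into an array.
int[] collect(HereToEnd r)
{
    int[] result;
    for (; !r.isEmpty(); r.next())
        result ~= r.value();
    return result;
}

unittest
{
    auto n3 = new Node(3, null);
    auto n2 = new Node(2, n3);
    auto r = HereToEnd(n2);      // cursor at the element 2
    n3.next = new Node(9, null); // the list grows after the range was taken
    assert(r.value() == 2);
    assert(collect(r) == [2, 3, 9]);
}
```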
 Arrays are really a special case where the ranges unequivocally work because
 once you get a range, all of it is guaranteed not to disappear or change
 topology.  i.e. a slice always contains valid data, no matter what you do to
 the original array.  I think this is the model Andrei is trying to achieve
 for all containers/iterables, and I think it's just not the same.  I think
 passing the range around as one entity is a very helpful thing, especially
 for algorithms which generally take ranges in the form of 2 iterators, but I
 don't think it solves all problems.

Well it solves them, but is it worth the tradeoffs you have to make? You say no. I say I now have a reasonable way to handle the only example I could think of that seemed really cumbersome. Lacking further evidence it seems the main problem remaining is just having to carry around an extra value when you just want to refer to one value. However, those pointers to "single values" can also move, so for safety's sake why not keep the fence post with it? --bb
Sep 10 2008
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 Part of my argument here is that it's more natural and requires less
 cognitive load to think of things in terms of moving a cursor back and
 forth.  So you won't convince me by constructing clever range unions
 and differences to achieve the same thing as a simple ++ and -- can
 do. :-)

I agree, and I agreed in the draft on ranges, that code using ranges can on occasion be more awkward than code using iterators. I think their advantages do outweigh this disadvantage.
 This is fundamental. Ranges NEVER grow. They ALWAYS shrink. Why? Simple:
 because a range has no idea what's outside of itself. It starts life with
 information of its limits from the container, and knows nothing about what's
 outside those limits. Consequently it ALWAYS WITHOUT EXCEPTION shrinks.

Doesn't seem to be quite so absolute from my perusal of std::algorithm.

Code using iterators will naturally avail itself of all of their advantages. Code using ranges will do the same. From my experience with rewriting std.algorithm, the working style is a bit different. On occasion iterators are indeed more flexible. But overall my code has reduced in size and become safer because ranges are a higher-level abstraction. Also often code using ranges is easier to follow because there are fewer variables with more apparent meaning, and the progress of the algorithm is easier to follow by tracking range shrinking. Andrei
Sep 10 2008
prev sibling next sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Bill Baxter <wbaxter gmail.com> wrote:
 Here's one from DinkumWare's <algorithm>:
 
 template<class _BidIt1,
 	class _BidIt2,
 	class _BidIt3> inline
 	_BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
 		_BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest, _Range_checked_iterator_tag)
 	{	// merge backwards to _Dest, using operator<
 	for (; ; )
 		if (_First1 == _Last1)
 			return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
 		else if (_First2 == _Last2)
 			return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
 		else if (_DEBUG_LT(*--_Last2, *--_Last1))
 			*--_Dest = *_Last1, ++_Last2;
 		else
 			*--_Dest = *_Last2, ++_Last1;
 	}
 
 
 You can probably work around it some way, but basically it's using the
 ability to ++ and -- on the same end as a sort of "peek next".

They're using the ability to ++ and -- to avoid post-decrement at any cost. Otherwise it'd be just
 		else if (_DEBUG_LT(*_Last2, *_Last1))
 			*--_Dest = *_Last1--;
 		else
 			*--_Dest = *_Last2--;

Now the same algorithm in ranges:
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];
         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

If there were shrink-on-read and (eureka!) shrink-on-write operations, it would be even shorter:
         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());

where both getBack() and putBack() shrink the range from the end side.
Sep 10 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Bill Baxter <wbaxter gmail.com> wrote:
 Here's one from DinkumWare's <algorithm>:

 template<class _BidIt1,
 	class _BidIt2,
 	class _BidIt3> inline
 	_BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
 		_BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest, _Range_checked_iterator_tag)
 	{	// merge backwards to _Dest, using operator<
 	for (; ; )
 		if (_First1 == _Last1)
 			return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
 		else if (_First2 == _Last2)
 			return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
 		else if (_DEBUG_LT(*--_Last2, *--_Last1))
 			*--_Dest = *_Last1, ++_Last2;
 		else
 			*--_Dest = *_Last2, ++_Last1;
 	}


 You can probably work around it some way, but basically it's using the
 ability to ++ and -- on the same end as a sort of "peek next".

They're using the ability to ++ and -- to avoid post-decrement at any cost. Otherwise it'd be just
 		else if (_DEBUG_LT(*_Last2, *_Last1))
 			*--_Dest = *_Last1--;
 		else
 			*--_Dest = *_Last2--;

Now the same algorithm in ranges:
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];
         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

If there were shrink-on-read and (eureka!) shrink-on-write operations, it would be even shorter:
         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());

where both getBack() and putBack() shrink the range from the end side.

Got to say I'm pretty much in awe :o). But (without thinking much about it) I think the assignments dst[] = s1[] and dst[] = s2[] should be replaced with calls to copy(retro(sx), retro(dst)). No? Andrei
Sep 10 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Bill Baxter <wbaxter gmail.com> wrote:
 Here's one from DinkumWare's <algorithm>:

 template<class _BidIt1,
 	class _BidIt2,
 	class _BidIt3> inline
 	_BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
 		_BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest, _Range_checked_iterator_tag)
 	{	// merge backwards to _Dest, using operator<
 	for (; ; )
 		if (_First1 == _Last1)
 			return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
 		else if (_First2 == _Last2)
 			return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
 		else if (_DEBUG_LT(*--_Last2, *--_Last1))
 			*--_Dest = *_Last1, ++_Last2;
 		else
 			*--_Dest = *_Last2, ++_Last1;
 	}


 You can probably work around it some way, but basically it's using the
 ability to ++ and -- on the same end as a sort of "peek next".

They're using the ability to ++ and -- to avoid post-decrement at any cost. Otherwise it'd be just
 		else if (_DEBUG_LT(*_Last2, *_Last1))
 			*--_Dest = *_Last1--;
 		else
 			*--_Dest = *_Last2--;

Now the same algorithm in ranges:
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];
         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

If there were shrink-on-read and (eureka!) shrink-on-write operations, it would be even shorter:
         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());

where both getBack() and putBack() shrink the range from the end side.

Got to say I'm pretty much in awe :o). But (without thinking much about it) I think the assignments dst[] = s1[] and dst[] = s2[] should be replaced with calls to copy(retro(sx), retro(dst)). No?

They originally use backward copying because they don't know where the destination range starts, long live buffer overrun. In case of ranges the destination range is well defined and there are no overlaps---that is, I believe this algorithm doesn't support using the same buffer as source and destination. So slice copying should be OK.
Sep 10 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Bill Baxter <wbaxter gmail.com> wrote:
 Here's one from DinkumWare's <algorithm>:

 template<class _BidIt1,
 	class _BidIt2,
 	class _BidIt3> inline
 	_BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
 		_BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest, _Range_checked_iterator_tag)
 	{	// merge backwards to _Dest, using operator<
 	for (; ; )
 		if (_First1 == _Last1)
 			return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
 		else if (_First2 == _Last2)
 			return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
 		else if (_DEBUG_LT(*--_Last2, *--_Last1))
 			*--_Dest = *_Last1, ++_Last2;
 		else
 			*--_Dest = *_Last2, ++_Last1;
 	}


 You can probably work around it some way, but basically it's using the
 ability to ++ and -- on the same end as a sort of "peek next".

They're using the ability to ++ and -- to avoid post-decrement at any cost. Otherwise it'd be just
 		else if (_DEBUG_LT(*_Last2, *_Last1))
 			*--_Dest = *_Last1--;
 		else
 			*--_Dest = *_Last2--;

Now the same algorithm in ranges:
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];
         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

If there were shrink-on-read and (eureka!) shrink-on-write operations, it would be even shorter:
         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());


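where both getBack() and putBack() shrink the range from the end side.

Got to say I'm pretty much in awe :o). But (without thinking much about it) I think the assignments dst[] = s1[] and dst[] = s2[] should be replaced with calls to copy(retro(sx), retro(dst)). No?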

They originally use backward copying because they don't know where the destination range starts, long live buffer overrun. In case of ranges the destination range is well defined and there are no overlaps---that is, I believe this algorithm doesn't support using the same buffer as source and destination. So slice copying should be OK.

One up for ranges then. Whew. I was due for it :o). Andrei
Sep 10 2008
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Sergey Gromov wrote:
 
 Now the same algorithm in ranges:
 
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];


I'm not sure the above is correct. It should return after the copy is performed, and the code also assumes that the size of dst is equal to the size of s2 and s1, respectively.
         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

If there were shrink-on-read and (eureka!) shrink-on-write operations, it would be even shorter:
         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());

where both getBack() and putBack() shrink the range from the end side.

Very slick. Sean
Sep 10 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Sean Kelly <sean invisibleduck.org> wrote:
 Sergey Gromov wrote:
 
 Now the same algorithm in ranges:
 
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];


I'm not sure the above is correct. It should return after the copy is performed, and the code also assumes that the size of dst is equal to the size of s2 and s1, respectively.

Of course there should be return statements, thank you. I've never tested this code (obviously), I just threw it together, so there are bound to be stupid mistakes like this.

As to the destination size: this is the merge step of a merge sort. The size of the destination buffer must be the sum of the sizes of the source buffers. As soon as one of the source buffers is empty, i.e. completely moved to the destination, there must be room for exactly what is left in the other source buffer. If this condition doesn't hold then the arguments weren't correct in the first place.
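With the return statements added and the size precondition made explicit, the merge can be sketched in runnable D as below. This is an illustration, not the code from the post: it is restricted to slices, it uses the array primitives empty, back and popBack from std.range.primitives instead of the isEmpty/last/shrink names used in this thread, and `mergeBackward` is a made-up name. It also writes the larger of the two last elements, as the Dinkumware original does; the comparison in the posted pseudocode appears to pick the smaller one.

```d
import std.range.primitives : back, empty, popBack;

// Merge two ascending slices into dst, filling dst from the back.
void mergeBackward(T)(T[] s1, T[] s2, T[] dst)
{
    // Precondition from the discussion: dst holds exactly both sources.
    assert(dst.length == s1.length + s2.length);

    for (;;)
    {
        if (s1.empty)
        {
            dst[] = s2[]; // lengths match because of the precondition
            return;
        }
        if (s2.empty)
        {
            dst[] = s1[];
            return;
        }

        if (s2.back < s1.back)
        {
            dst.back = s1.back; // the larger element goes to the end
            s1.popBack();
        }
        else
        {
            dst.back = s2.back;
            s2.popBack();
        }
        dst.popBack();
    }
}
```

The slices are passed by value, so shrinking them inside the function does not affect the caller's view of the buffers.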
Sep 10 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Sean Kelly <sean invisibleduck.org> wrote:
 Sergey Gromov wrote:
 Now the same algorithm in ranges:

 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];


I'm not sure the above is correct. It should return after the copy is performed, and the code also assumes that the size of dst is equal to the size of s2 and s1, respectively.

Of course there should be return statements, thank you. I've never tested this code (obviously), just've thrown it together, so there ought to be stupid mistakes like this. As to the destination size. This is merge sort. The size of destination buffer must be the sum of the sizes of source buffers. As soon as one of the source buffers is empty, i.e. completely moved to the destination, there must be place exactly for what left in another source buffer. If this condition doesn't hold then the arguments weren't correct in the first place.

Speaking of copying, C++'s std::copy and friends have been under increasing scrutiny lately because of their inability to modularly protect data against overruns. STL's three-argument functions that copy out are often a kiss of death for inexperienced STL users. I'm glad ranges cut that Gordian knot. Andrei
Sep 10 2008
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
Sergey Gromov wrote:
 Sean Kelly <sean invisibleduck.org> wrote:
 Sergey Gromov wrote:
 Now the same algorithm in ranges:

 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];


I'm not sure the above is correct. It should return after the copy is performed, and the code also assumes that the size of dst is equal to the size of s2 and s1, respectively.

Of course there should be return statements, thank you. I've never tested this code (obviously), just've thrown it together, so there ought to be stupid mistakes like this. As to the destination size. This is merge sort. The size of destination buffer must be the sum of the sizes of source buffers. As soon as one of the source buffers is empty, i.e. completely moved to the destination, there must be place exactly for what left in another source buffer. If this condition doesn't hold then the arguments weren't correct in the first place.

Oops, of course. Sean
Sep 10 2008
prev sibling next sibling parent Sergey Gromov <snake.scaly gmail.com> writes:
Bill Baxter <wbaxter gmail.com> wrote:
 Here's another, an insertion sort:
 
 template<class _BidIt,
 	class _Ty> inline
 	void _Insertion_sort1(_BidIt _First, _BidIt _Last, _Ty *)
 	{	// insertion sort [_First, _Last), using operator<
 	if (_First != _Last)
 		for (_BidIt _Next = _First; ++_Next != _Last; )
 			{	// order next element
 			_BidIt _Next1 = _Next;
 			_Ty _Val = *_Next;
 
 			if (_DEBUG_LT(_Val, *_First))
 				{	// found new earliest element, move to front
 				_STDEXT unchecked_copy_backward(_First, _Next, ++_Next1);
 				*_First = _Val;
 				}
 			else
 				{	// look for insertion point after first
 				for (_BidIt _First1 = _Next1;
 					_DEBUG_LT(_Val, *--_First1);
 					_Next1 = _First1)
 					*_Next1 = *_First1;	// move hole down
 				*_Next1 = _Val;	// insert element in hole
 				}
 			}
 	}

This is a bit more complex. If only basic operations on ranges are allowed, it looks like this:
 void Insertion_sort1(R, T)(R r)
 {
     if (!r.isEmpty())
     {
         R tail = r;
         do
         {
             tail.next();
             R head = r.before(tail);
             T _Val = head.last;
             if (_Val < head.first)
             {
                 R from, to;
                 from = to = head;
                 from.shrink();
                 to.next();
                 copy(retro(from), retro(to));
                 head.first = _Val;
             }
             else
             {
                 R head1 = head;
                 head1.shrink();
                 for (; _Val < head1.last; head.shrink(), head1.shrink())
                     head.last = head1.last;
                 head.last = _Val;
             }
         }
         while (!tail.isEmpty());
     }
 }

Though it starts to look much better if we employ shrink-on-read/shrink-on-write AND copy-on-shrink, AKA incremental slicing:
 void Insertion_sort1(R, T)(R r)
 {
     if (!r.isEmpty())
     {
         R tail = r;
         do
         {
             tail.next();
             R head = r.before(tail);
             T _Val = head.last;
             if (_Val < head.first)
             {
                 copy(retro(head[0..$-1]), retro(head[1..$]));
                 head.first = _Val;
             }
             else
             {
                 for (R head1 = head[0..$-1]; _Val < head1.last;)
                     head.putBack(head1.getBack());
                 head.last = _Val;
             }
         }
         while (!tail.isEmpty());
     }
 }
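
For comparison, here is a minimal runnable sketch of the same "incremental slicing" idea on a plain built-in slice. It assumes present-day D slice syntax rather than the `before`/`putBack`/`getBack` primitives discussed above, and the name `insertionSort` is made up for illustration:

```d
/// Insertion sort on a built-in slice, written in the "incremental slicing" style.
void insertionSort(T)(T[] r)
{
    foreach (i; 1 .. r.length)
    {
        T[] head = r[0 .. i + 1];   // sorted prefix plus the element to place
        T val = head[$ - 1];        // the element being inserted

        // Move the hole down: shrink `head` from the back while the
        // element just before the hole is greater than `val`.
        while (head.length > 1 && val < head[$ - 2])
        {
            head[$ - 1] = head[$ - 2];
            head = head[0 .. $ - 1];
        }
        head[$ - 1] = val;          // drop the element into the hole
    }
}
```

Only slices can do this so cheaply; for a general bidirectional range the `head` view would have to come from a primitive such as the proposed `r.before(tail)`.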

Sep 10 2008
prev sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 9:32 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first value
 of range' member
 or if opStar is used:

 *all.after(center);

Why is all that necessary? Can't you just do a *center?

Oh, I get it. It's empty. Duh. Ok, so you can have a third cursor function in the std lib:

    T cursorValue(R,T)(R all, R center) { return all.after(center).left; }

... plus the cursorAdvance and cursorRetreat. --bb
Sep 10 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
 On Thu, Sep 11, 2008 at 9:32 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first 
 value
 of range' member
 or if opStar is used:

 *all.after(center);

Why is all that necessary? Can't you just do a *center?

Oh, I get it. It's empty. Duh. Ok, so you can have a third cursor function in the std lib:

    T cursorValue(R,T)(R all, R center) { return all.after(center).left; }

... plus the cursorAdvance and cursorRetreat.

That is all fine and dandy in the world of "I don't care how well my iterators perform or how much code bloat is added because of them," but I usually work in a different world ;)

But if I were forced not to use an iterator model (which isn't the case, iterators should be very possible without compiler help), I would actually implement this as a wrapper struct:

    struct Cursor(containerType)
    {
        private Range!(containerType) _cur;
        private containerType owner;
        Cursor moveLeft() {...}
        Cursor moveRight() {...}
        bool hasLeft() {...}
        etc.
    }

Thus one can implement iterators on top of ranges, but I'd argue that ranges are much easier to implement on top of iterators. In any case, I think there are benefits to having a range type that is not necessarily defined as two iterators.

-Steve
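
A minimal sketch of such a cursor wrapper, specialised to built-in slices so that it actually compiles. The field layout (an owner plus a "current position to end" range) follows the outline above; `hasRight` and `value` are assumed names that do not appear in the post:

```d
/// Hypothetical cursor over a slice: it remembers the whole sequence
/// (`owner`) and the sub-range that starts at the current position (`cur`).
struct Cursor(T)
{
    private T[] owner;  // the whole sequence
    private T[] cur;    // from the current element to the end

    this(T[] all) { owner = all; cur = all; }

    bool hasLeft() const  { return cur.length < owner.length; }
    bool hasRight() const { return cur.length > 1; }

    ref T value()
    {
        assert(cur.length > 0, "cursor is past the end");
        return cur[0];
    }

    void moveRight()
    {
        assert(hasRight);
        cur = cur[1 .. $];
    }

    void moveLeft()
    {
        assert(hasLeft);
        // Re-slice the owner one element earlier than the current start.
        cur = owner[$ - cur.length - 1 .. $];
    }
}
```

Moving left is only possible because the cursor keeps the owner around; a lone range cannot grow back over elements it has already dropped.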
Sep 10 2008
parent "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Sep 13, 2008 at 3:21 AM, Denis Koroskin <2korden gmail.com> wrote:
 On Fri, 12 Sep 2008 20:10:28 +0400, Fawzi Mohamed <fmohamed mac.com> wrote:

 On 2008-09-12 17:48:02 +0200, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 foreach(i,j,k;1..$,iterJ,a.all){
    //...
 }



Foreach over multiple ranges in parallel is great, but it is quite hard to match key/value to the ranges in your example, because they are far from each other, especially if the ranges are evaluated in some (possibly long) expressions. I prefer the following syntax:

    foreach (key0, value0 : range0; value1 : range1; ... ) {
        // or something like this
    }

This way key/value and range are close to each other and you don't need to move your eyes back and forth to understand which range a value corresponds to.

Err, you just repeated exactly what he said. --bb
Sep 12 2008
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Fri, 12 Sep 2008 20:10:28 +0400, Fawzi Mohamed <fmohamed mac.com> wrote:

 On 2008-09-12 17:48:02 +0200, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 foreach(i,j,k;1..$,iterJ,a.all){
     //...
 }



Foreach over multiple ranges in parallel is great, but it is quite hard to match key/value to the ranges in your example, because they are far from each other, especially if the ranges are evaluated in some (possibly long) expressions. I prefer the following syntax:

    foreach (key0, value0 : range0; value1 : range1; ... ) {
        // or something like this
    }

This way key/value and range are close to each other and you don't need to move your eyes back and forth to understand which range a value corresponds to.
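
For reference, parallel iteration can also be done without new syntax. The sketch below uses `std.range.zip` from present-day Phobos, which postdates this thread, so take it as one possible library-level answer rather than the syntax proposed here; the function name `dot` is made up:

```d
import std.range : zip;

/// Dot product of two slices, walking both in lockstep.
int dot(int[] xs, int[] ys)
{
    int sum = 0;
    // Each step yields one element from each range; by default
    // iteration stops as soon as the shorter range is exhausted.
    foreach (x, y; zip(xs, ys))
        sum += x * y;
    return sum;
}
```

The variables and the ranges still sit in two separate lists here, which is exactly the readability concern raised above.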
Sep 12 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 3:33 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Thu, Sep 11, 2008 at 2:44 AM, Andrei Alexandrescu
 Cognitive load...
 What if I want to write a nice standalone function that takes a
 pointer to where we are and manipulates it?  I have to pass that
 function two iterators I suppose?

A function only needing one iterator is a chimera. It can't move it in any direction. To such a function you pass a pointer or reference to the object you want to manipulate directly. What's there to not like about it?

Oops. I meant two ranges not two iterators there. There are no iterators in this world. What I was after in the above is a function that somehow gets a) where we are, b) how far back we can go c) how far forward we can go. With ranges that seems cumbersome. With iterators it's exactly those 3 iterators.
 I think you get a lot more insight by actually sitting down and rewriting a
 part of std.algorithm, and/or write some more meaningful algorithms with
 your abstraction of choice. When I started doing so I had no idea of what
 range primitives I need. And just like you now, I kept on hypothesizing in
 the dark on whether I need this and whether I need that. When you
 hypothesize in the dark the number of primitive things you need really grows
 unbounded, because there's always some unrealized imaginary need you want to
 satisfy. To carry the discussion on equal footing you need to do some of
 that work. Otherwise you will keep on coming with hypothetical situations of
 unverifiable likelihood, and I will have little meaningful retort to put
 forth.

Ok. There's the ultimatum. I'll shut up and go to bed now. :-) --bb
Sep 10 2008
prev sibling next sibling parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-09-10 14:35:29 +0200, "Bill Baxter" <wbaxter gmail.com> said:

 On Wed, Sep 10, 2008 at 7:47 AM, Fawzi Mohamed <fmohamed mac.com> wrote:
 
 2) All the methods with intersection of iterator in my opinion are
 difficult to memorize, and rarely used, I would scrap them.
 Instead I would add the comparison operation .atSamePlace(iterator!(T)y)
 that would say if two iterators are at the same place. With it one gets back
 all the power of pointers, and with a syntax and use that are
 understandable.

But that comparison operation is not enough to implement anything of substance. Try your hand at a few classic algorithms and you'll see.

are you sure? then a range is *exactly* equivalent to an STL iterator, only that it cannot go out of bounds:

    // left1-left2:
    while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
        i1.next;
    }
    // left2-left1:
    while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
        i2.next;
    }
    // union 1-2
    while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
        i1.next;
    }
    while(!i2.isEmpty){
        i2.next;
    }
    // union 2-1 ...
    // lower triangle
    i1=c.all;
    while(!i1.isEmpty){
        i2=c.all;
        while(!i2.isEmpty && !i2.atSamePlace(i1)){
            i2.next;
        }
        i1.next;
    }

Your code shows that you can successfully iterate over the same elements described by Andrei's various unions and differences, but it does not show how you would, say, pass that new range to another function to do that job, such as you would want to do in, say, a recursive sort. Since in this design you can't set or access the individual iterator-like components of a range directly, being able to copy the begin or end iterator from one range over to another is necessary, I think.

yes you are right this operation on the simplest iterators cannot be performed recursively without overhead (you can do it once, but then you need to store i1 & i2 in the new iterator, and doing it again will add more and more overhead). Range union... can be used efficiently and safely only if the iterator has an order that can be easily checked; this is a useful abstraction, but not the basic one.
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.
 
 well these are the operations that you can do on basically all iterators
 (and with which you can define new iterators).
 The one you propose needs an underlying total order that can be efficiently
 checked; for example, iterators on trees do not necessarily have this
 property, and then getting your kind of intersection can be difficult (and
 not faster than the operation using atSamePlace).

I don't think that's correct. Andrei's system does not need a total order any more than yours does. The unions and diffs just create new ranges by combining the components of existing ranges. They don't need to know anything about what happens in between those points or how you get from one to the other. Just take the "begin" of this guy and put it together with the "end" of that guy, for example. It doesn't require knowing how to get from anywhere to anywhere to create that new range.

well if you don't have a total order that you can easily check then this might be very unsafe: think of i1.begin..i2.begin when i2.begin<i1.begin, you might miss that it is empty, and iterate forever... Fawzi
Sep 10 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 2:44 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

It's good. I proved that constructively for std.algorithm, which of course doesn't stand. But I've also proved it theoretically informally to myself. Please imagine an algorithm that bidir iterators do and bidir ranges don't.

Any iterative algorithm where the search might go up or down might be a candidate. Although I think you have a hard time showing one that needs strictly bidirectional iterators and not random access iterators. Perhaps a stream represented as a linked list? Imagine a video stream coming in, where the player buffers 10 seconds of data for decoding, and keeps 10 seconds of data buffered behind the current spot. If the user pauses the video, then wants to play backwards for 5 seconds, what kind of structure would you use to represent the 'current point in time'? A bidir range doesn't cut it, because it can only move one direction at a time.

Of course it does. You just remember the leftmost point in time you need to remember. Then you use range primitives to get to where you want. Maybe a better abstraction for all that is a sliding window though.

Cognitive load... What if I want to write a nice standalone function that takes a pointer to where we are and manipulates it? I have to pass that function two iterators I suppose? One is (begin,current) the other (current,end), and as I iterate I have to move both the second of the first and the first of second? All just to do something that should be trivial with a linked list. I agree that your pinch range is needed, but I also see a need for something that maps more directly onto the features of a doubly linked list. --bb
Sep 10 2008
prev sibling parent reply JAnderson <ask me.com> writes:
Hi Andrei,

I like the idea behind ranges.  I don't like C++'s / stl's long winded 
syntax at all.  Its so large that it generally uses up several lines 
along with several typedefs etc...  All that work just to iterate over 
some data.  The longer things get the more error prone they get... how 
many times have I put an begin when I meant to put end *sigh*.

However I currently disagree on this point.

Andrei Alexandrescu wrote:
 Fine. So instead of saying:

 foreach (e; c.all) { ... }

 you can say

 foreach (e; c) { ... }

 I think that's some dubious savings.

I think it's useful to have the implicit range conversion. Consider writing generic/template code. Of course built-in arrays could provide the .all but then consider passing around ranges. That would also mean all ranges would also have a .all (could we go .all.all.all for instance?). I'm all for compile time checking; however, I think that implicit .all (with of course an explicit option) will make it easy to change a function that once took an object to take a simple range. Also it would make it easy to change from one way of getting at a range to another. What about matrices? They don't implement a default .all; they would provide something like .col and .row.
 Andrei

Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
JAnderson wrote:
 
 Hi Andrei,
 
 I like the idea behind ranges.  I don't like C++'s / stl's long winded 
 syntax at all.  Its so large that it generally uses up several lines 
 along with several typedefs etc...  All that work just to iterate over 
 some data.  The longer things get the more error prone they get... how 
 many times have I put an begin when I meant to put end *sigh*.
 
 However I currently disagree on this point.
 
 Andrei Alexandrescu wrote:
  >
  > Fine. So instead of saying:
  >
  > foreach (e; c.all) { ... }
  >
  > you can say
  >
  > foreach (e; c) { ... }
  >
  > I think that's some dubious savings.
 
 
 I think its useful to have the implicit range conversion.  Consider 
 writing generic/template code.  Of course built in arrays could provide 
 the .all but then consider passing around ranges.  That would also mean 
 all ranges would also have a .all (could we go .all.all.all for 
 instance?).

There's no regression. There are containers and ranges. Containers have .all. Ranges don't. I think you guys are making a good point; I'm undecided on what would be better. One not-so-cool part about implicit conversion to range is that all of a sudden all range operations spill into the container. So people try to call c.pop and it doesn't compile. (Why?) They get confused.
 I'm all for compile time checking however I think that 
 implicit .all (with of course an explicit option) will make it easy to 
 change a function that once took an object to take a simple range  Also 
 it would make it easy to change from one way of getting at a range to 
 another.
 
 What about matrices?  They don't implement default .all, they would 
 provide like .col and .row.

Bidimensional ones that is :o). Andrei
Sep 10 2008
next sibling parent JAnderson <ask me.com> writes:
Andrei Alexandrescu wrote:
 JAnderson wrote:
 Hi Andrei,

 I like the idea behind ranges.  I don't like C++'s / stl's long winded 
 syntax at all.  Its so large that it generally uses up several lines 
 along with several typedefs etc...  All that work just to iterate over 
 some data.  The longer things get the more error prone they get... how 
 many times have I put an begin when I meant to put end *sigh*.

 However I currently disagree on this point.

 Andrei Alexandrescu wrote:
  >
  > Fine. So instead of saying:
  >
  > foreach (e; c.all) { ... }
  >
  > you can say
  >
  > foreach (e; c) { ... }
  >
  > I think that's some dubious savings.


 I think its useful to have the implicit range conversion.  Consider 
 writing generic/template code.  Of course built in arrays could 
 provide the .all but then consider passing around ranges.  That would 
 also mean all ranges would also have a .all (could we go .all.all.all 
 for instance?).

There's no regression. There are containers and ranges. Containers have .all. Ranges don't.

Just to be clear then. Say you write something that works on arrays and objects. Then you write:

    void Foo(T)(T t)
    {
        ...
        foreach (i; t.all)
        {
        }
        ...
    }

Now I realize I want to use that function with a range as well as an object (it's a template after all). Well if .all isn't regressive then I can't. Of course if .all was implicit then I might have written:

    void Foo(T)(T t)
    {
        ...
        foreach (i; t)
        {
        }
        ...
    }

But then again, .all is still available so there's still a chance a coder might not realize that it's better to use the implicit value. I'm beginning to think regressive would be useful either way.

Note of course generic code does not just apply to templates. It also applies when I want to change a variable to a different type. If .all is required (and non-regressive) then I have to go to all the places that value is used and change it. It's the same reason auto is so awesome.

Of course .all adds an extra function you'd need to implement for custom ranges, but it could always be in the "range" mixin.
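
A small sketch of what a "regressive" `.all` could look like in library code. It relies on uniform function call syntax and `std.range.primitives.isInputRange` from present-day D, neither of which was available when this was written; `Bag` and `total` are hypothetical names used only for illustration:

```d
import std.range.primitives : isInputRange;

/// For anything that already is a range, `all` is simply the identity,
/// so r.all, r.all.all, ... all denote the same range.
auto all(R)(R r) if (isInputRange!R)
{
    return r;
}

/// Hypothetical container that exposes its contents through `.all`.
struct Bag
{
    int[] items;
    int[] all() { return items; }
}

/// Generic code written once against `.all`; it accepts both
/// containers (member function) and ranges (free function above).
int total(T)(T t)
{
    int sum = 0;
    foreach (e; t.all())
        sum += e;
    return sum;
}
```

With this in place, changing an argument from a container to a range does not require touching the body of `total`, which is the refactoring concern raised above.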
 I think you guys are making a good point; I'm undecided on what would be 
 better. One not-so-cool part about implicit conversion to range is that 
 all of a sudden all range operations spill into the container. So people 
 try to call c.pop and it doesn't compile. (Why?) They get confused.
 
 I'm all for compile time checking however I think that implicit .all 
 (with of course an explicit option) will make it easy to change a 
 function that once took an object to take a simple range  Also it 
 would make it easy to change from one way of getting at a range to 
 another.

 What about matrices?  They don't implement default .all, they would 
 provide like .col and .row.

Bidimensional ones that is :o).

Of course :) Being games programmers, we only speak of one matrix type. Just kidding.
 
 
 Andrei

Sep 10 2008
prev sibling parent JAnderson <ask me.com> writes:
Andrei Alexandrescu wrote:
 JAnderson wrote:
 Hi Andrei,

 I like the idea behind ranges.  I don't like C++'s / stl's long winded 
 syntax at all.  Its so large that it generally uses up several lines 
 along with several typedefs etc...  All that work just to iterate over 
 some data.  The longer things get the more error prone they get... how 
 many times have I put an begin when I meant to put end *sigh*.

 However I currently disagree on this point.

 Andrei Alexandrescu wrote:
  >
  > Fine. So instead of saying:
  >
  > foreach (e; c.all) { ... }
  >
  > you can say
  >
  > foreach (e; c) { ... }
  >
  > I think that's some dubious savings.


 I think its useful to have the implicit range conversion.  Consider 
 writing generic/template code.  Of course built in arrays could 
 provide the .all but then consider passing around ranges.  That would 
 also mean all ranges would also have a .all (could we go .all.all.all 
 for instance?).

There's no regression. There are containers and ranges. Containers have .all. Ranges don't. I think you guys are making a good point; I'm undecided on what would be better. One not-so-cool part about implicit conversion to range is that all of a sudden all range operations spill into the container. So people try to call c.pop and it doesn't compile. (Why?) They get confused.

I'm not sure that range operations need to spill over. I was thinking that foreach would be kinda like a template. The foreach would do the implicit conversion, i.e. something like (pseudo):

    foreach(I,T)(I i, T t, delegate d)
    {
        foreach (I i; t.all)
        {
            d();
        }
    }

In fact anything that takes a range would implicitly convert (for they too can be used inside generic code). Of course that would require compiler support, probably.
 
 I'm all for compile time checking however I think that implicit .all 
 (with of course an explicit option) will make it easy to change a 
 function that once took an object to take a simple range  Also it 
 would make it easy to change from one way of getting at a range to 
 another.

 What about matrices?  They don't implement default .all, they would 
 provide like .col and .row.

Bidimensional ones that is :o). Andrei

Sep 10 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 1:30 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 However a range isn't, generally speaking, a list.  It's a way to
 traverse or access data that may or may not be a list.  For something
 like an unbounded generator, it is odd to speak of the "first".  Such
 an object has a current value and a "next", but the value you can look
 at right now is only the "first" by a bit of a terminology stretch.

Agreed. The problem with "current" instead of "first" is that there's no clear correspondent for "the last that the current will be". First and last are obvious. Current and last are... well, not bad either :o).
 I think using list terminology unnecessarily confuses the iterating
 construct that does the accessing with the container being accessed.
 The range is not the container.  The range consists of a place where
 you are, and a termination condition.

No. A bidirectional range also knows the last place you'll ever be, and is able to manipulate it.

That's just a mutable termination condition. Still fits my description.
  The range is not "empty" or
 "full" because it does not actually contain elements.

It is because a range is a view. The view can reduce to nothing. In math, an interval can be "empty". That doesn't mean it made all real numbers disappear :o).

The other problem with empty is that it doesn't generalize to what I happen to think a bidirectional range should be: one with .next, .prev, .hasNext and .hasPrev. Your bidir iterator in C++ parlance is a forward iterator and a reverse iterator operating on the same sequence. I can't really think of any algorithms other than the one you showed that use such a pair. On the other hand my bidir is useful in all the places a C++ bidir iterator is useful: any time you need to scan a cursor back and forth. It basically maps directly onto the operation a doubly-linked list is good at, but could be used in traversing any tree-like data structure too, I think.
 Similarly, using list terminology led you to "pop".  But pop on a
 range does not actually remove any content.  Pop just moves the goal
 post on one end.

Correct. Then how would you name'em?

I made one proposal on digitalmars.D and I'm still waiting for comments.
 And then there's the various union/diff stuff, which everyone seems to
 find confusing.  I think much of that confusion and mental overhead
 just goes away if you think of a range as a good old iterator plus a
 stopping condition.

I like before and after. Besides, the challenge is that you come with something that's not confusing.

Yeh, before and after aren't too bad.
 Names for the before and after range operations are still in the air...

 Are you referring to the "range" name itself?

That could be part of the reason for this tendency to try to assign list-like names to the parts. If it were called a "bounded iterator" I think that would better describe the perspective I'm pushing, and naturally lead to choices like "atEnd" instead of "isEmpty".

Words are powerful. Phrases are less powerful. I'll never ever settle on anything longer than ONE word for the concept. Ranges came to mind because boost uses them with a similar meaning.

Yeh, I don't really have a problem with calling them ranges, as long as people keep in mind they're really bounded iterators. :-) --bb
Sep 10 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 1:45 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

It's good. I proved that constructively for std.algorithm, which of course doesn't stand. But I've also proved it theoretically informally to myself. Please imagine an algorithm that bidir iterators do and bidir ranges don't.

Here's one from DinkumWare's <algorithm>:

    template<class _BidIt1, class _BidIt2, class _BidIt3> inline
        _BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
            _BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest,
            _Range_checked_iterator_tag)
        {   // merge backwards to _Dest, using operator<
        for (; ; )
            if (_First1 == _Last1)
                return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
            else if (_First2 == _Last2)
                return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
            else if (_DEBUG_LT(*--_Last2, *--_Last1))
                *--_Dest = *_Last1, ++_Last2;
            else
                *--_Dest = *_Last2, ++_Last1;
        }

You can probably work around it some way, but basically it's using the ability to ++ and -- on the same end as a sort of "peek next".

Here's another, an insertion sort:

    template<class _BidIt, class _Ty> inline
        void _Insertion_sort1(_BidIt _First, _BidIt _Last, _Ty *)
        {   // insertion sort [_First, _Last), using operator<
        if (_First != _Last)
            for (_BidIt _Next = _First; ++_Next != _Last; )
                {   // order next element
                _BidIt _Next1 = _Next;
                _Ty _Val = *_Next;

                if (_DEBUG_LT(_Val, *_First))
                    {   // found new earliest element, move to front
                    _STDEXT unchecked_copy_backward(_First, _Next, ++_Next1);
                    *_First = _Val;
                    }
                else
                    {   // look for insertion point after first
                    for (_BidIt _First1 = _Next1;
                        _DEBUG_LT(_Val, *--_First1);
                        _Next1 = _First1)
                        *_Next1 = *_First1; // move hole down
                    *_Next1 = _Val; // insert element in hole
                    }
                }
        }

I /think/ that's taking advantage of going both ways on the same iterator (or at least copies of the same iterator), but the code is a little hard to read.

Part of my argument here is that it's more natural and requires less cognitive load to think of things in terms of moving a cursor back and forth. So you won't convince me by constructing clever range unions and differences to achieve the same thing as a simple ++ and -- can do. :-)

Also a cursor that can go forward and backwards in between two limits is exactly what is easy to do with a doubly linked list. If you know how to use a doubly linked list you know how to use my version of bidir ranges. That's true in all cases where you are using a doubly-linked list. For yours you have to think about how to map what you want to do onto the operations that are actually available. To me that's clearly a greater cognitive load.

Another example is a function that is supposed to put a value back into its proper sorted place. Say you had a sorted list and now the value of one node has been modified. Write the function that puts that value back in its rightful place. The natural way to do it is with a range that has a cursor pointing to the modified node that can be moved either back or forward.

Also I see the function "std::advance" is used quite a lot in this implementation of std::algorithm. That moves the cursor forwards or backwards N steps depending on the sign of N.
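
As a point of comparison for the first example, here is a rough sketch of a backward merge written against range primitives instead of iterators. It uses the `back`/`popBack`/`empty` names from present-day Phobos, not the names under discussion in this thread, and is restricted to slices of a non-character type such as `int` to keep it short. `mergeBackward` is an assumed name:

```d
import std.range.primitives : back, empty, popBack;

/// Merge sorted `a` and `b` into `dest`, filling `dest` from the back.
/// `dest` must have room for exactly a.length + b.length elements.
void mergeBackward(T)(T[] a, T[] b, T[] dest)
{
    assert(dest.length == a.length + b.length);

    while (!a.empty && !b.empty)
    {
        // `back` only looks at the last element, so no "--" followed
        // by a compensating "++" is needed to peek.
        if (b.back < a.back)
        {
            dest.back = a.back;
            a.popBack();
        }
        else
        {
            dest.back = b.back;
            b.popBack();
        }
        dest.popBack();
    }

    // One input is exhausted; copy what is left of the other.
    while (!a.empty)
    {
        dest.back = a.back;
        a.popBack();
        dest.popBack();
    }
    while (!b.empty)
    {
        dest.back = b.back;
        b.popBack();
        dest.popBack();
    }
}
```

This covers only the "peek" use of ++ and --; it says nothing about the cursor-style use cases argued for above.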
  Your bidir range may be useful (though I'm not really convinced that
 very many algorithms need what it provides) --  but I think one also
 needs an iterator that's good at what C++'s bidir iterators are good
 at, i.e. moving the active cursor backwards or forwards.  I would call
 your construct more of a "double-headed" range than a bidirectional
 one.

Oh, one more thing. If you study any algorithm that uses bidirectional iterators (such as reverse or Stepanov's partition), you'll notice that ALWAYS WITHOUT EXCEPTION there's two iterators involved. One moves up, the other moves down. This is absolutely essential because it tells that a bidirectional range models all a bidirectional iterator could ever do. If you can move some bidirectional iterator down, then definitely you know its former boundary so you can model that move with a bidirectional range.
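
The "one moves up, the other moves down" pattern can be sketched with a single bidirectional range that shrinks from both ends. The code below assumes the `front`/`back`/`popFront`/`popBack` primitives and the `isBidirectionalRange`/`hasSwappableElements` traits of present-day Phobos; `reverseInPlace` is a made-up name:

```d
import std.algorithm.mutation : swap;
import std.range.primitives;

/// Reverse the elements of a bidirectional range in place.
void reverseInPlace(R)(R r)
if (isBidirectionalRange!R && hasSwappableElements!R)
{
    while (!r.empty)
    {
        swap(r.front, r.back);  // exchange the two ends
        r.popFront();           // the lower end moves up
        if (r.empty) break;     // odd length: the middle element stays put
        r.popBack();            // the upper end moves down
    }
}
```

The range only ever shrinks here, which matches the claim that no knowledge of what lies outside its limits is required.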

This does seem to be true of many of the algorithms that use bidirs in std::algorithm, which did surprise me. It actually seems to me that these types of algorithms are only using bidirectional iterators for a technicality -- because you can't compare a forward iterator and a reverse iterator. The bidirectionality of the iterator is not really material. One only needs the ++ op for one and the -- op for the other. That says to me the name of the range that does these two things should be something other than "bidirectional", because bidirectionality is not really the key property. "Two-headed range" or "squeeze range" or "pinch range" might be good names. But anyway, I am convinced that your shrinking range type is useful.
 This is fundamental. Ranges NEVER grow. They ALWAYS shrink. Why? Simple:
 because a range has no idea what's outside of itself. It starts life with
 information of its limits from the container, and knows nothing about what's
 outside those limits. Consequently it ALWAYS WITHOUT EXCEPTION shrinks.

Doesn't seem to be quite so absolute from my perusal of std::algorithm. --bb
Sep 10 2008
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-09-08 23:57:49 -0400, "Manfred_Nowak" <svv1999 hotmail.com> said:

 Andrei Alexandrescu wrote:
 
 maybe "nextTo" or something could be more suggestive.

r.tillBeg(s), r.tillEnd(s), r.fromBeg(s), r.fromEnd(s) ? -manfred

I'm not sure I like this because you have to be careful when reversing the iterating direction. With my previous proposal, you only had to change "next" for "pull" everywhere. With yours, it's "till" to "from" *and* "Beg" to "End", as the relationship is somewhat interleaved:

    r.nextUntil(s)  =>  r.tillBeg(s)
    r.nextAfter(s)  =>  r.tillEnd(s)
    r.pullUntil(s)  =>  r.fromEnd(s)
    r.pullAfter(s)  =>  r.fromBeg(s)

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Sep 08 2008
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-09-08 23:43:11 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I like the alternate names quite some. One thing, however, is that head 
 and rear are not near-antonyms (which ideally they should be). Maybe 
 front and rear would be an improvement. (STL uses front and back). 
 Also, I may be dirty-minded, but somehow headNext just sounds... bad 
 :o).

Yeah, perhaps. I mostly wanted a verb, not "frontNext" which seems wrong, and "head" is both a noun and a verb so I kept it.
 I like the intersection functions as members because they clarify the 
 relationship between the two ranges, which is asymmetric. I will 
 definitely heed this suggestion. "Until" suggests iteration, however, 
 which it shouldn't be (should be constant time) so maybe "nextTo" or 
 something could be more suggestive.

Well, initially I thought about nextTo, but then it struck me as also meaning "the thing just after", which is not really it. I also thought about nextUpTo, but that's many capitals to type and many small words, and I preferred nextUntil even with the downside of sounding like we're iterating.

But perhaps we could get rid of next and replace it with a verb. What about this terminology?

    r.frontShift    // conceptually r.front; r.shift
    r.putShift(e)   // conceptually r.front = e; r.shift

    r.front
    r.shift
    r.shiftTo(s)
    r.shiftAfter(s)

    r.back
    r.pull
    r.pullTo(s)
    r.pullAfter(s)

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Sep 08 2008
prev sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el  8 de septiembre a las 22:43 me escribiste:
 I like the alternate names quite some. One thing, however, is that head
 and rear are not near-antonyms (which ideally they should be). Maybe
 front and rear would be an improvement. (STL uses front and back). Also,

What about head/tail? You certainly won't confuse STL refugees and functional guys will be at home ;) -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- PROTESTA EN PLAZA DE MAYO: MUSICO SE COSIO LA BOCA -- Crónica TV
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Leandro Lucarella wrote:
 Andrei Alexandrescu, on September 8 at 22:43, you wrote:
 I like the alternate names quite some. One thing, however, is that head
 and rear are not near-antonyms (which ideally they should be). Maybe
 front and rear would be an improvement. (STL uses front and back). Also,

What about head/tail? You certainly won't confuse STL refugees and functional guys will be at home ;)

You'll sure confuse the latter. To them, tail is everything except the head, e.g. a[1 .. $]. Andrei
Sep 09 2008
parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on September 9 at 09:50, you wrote:
 Leandro Lucarella wrote:
 Andrei Alexandrescu, on September 8 at 22:43, you wrote:
 I like the alternate names quite some. One thing, however, is that head
 and rear are not near-antonyms (which ideally they should be). Maybe
 front and rear would be an improvement. (STL uses front and back). Also,

functional guys will be at home ;)

You'll sure confuse the latter. To them, tail is everything except the head, e.g. a[1 .. $].

You are right =/ Anyway, I think it's better to confuse people coming from some other language than to compromise D's readability... -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- Karma police arrest this man, he talks in maths, he buzzes like a fridge, he's like a detuned radio.
Sep 09 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Tue, Sep 9, 2008 at 1:06 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:

 maybe "nextTo" or something could be more suggestive.

r.tillBeg(s), r.tillEnd(s), r.fromBeg(s), r.fromEnd(s) ?

Sounds good! Walter doesn't like abbreviations, so probably *Begin would please him more.

But till and until are synonyms. They both sound like iteration. Although it might be unavoidable since all prepositions that give a destination seem to imply going to that destination. till, until, toward, to, up to, etc. So might as well go with the shortest one, "to".

    r.toBegin(s), r.toEnd(s)
    r.fromBegin(s), r.fromEnd(s)

--bb
Sep 08 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Tue, Sep 9, 2008 at 12:57 PM, Manfred_Nowak <svv1999 hotmail.com> wrote:
 Andrei Alexandrescu wrote:

 maybe "nextTo" or something could be more suggestive.

r.tillBeg(s), r.tillEnd(s), r.fromBeg(s), r.fromEnd(s) ?

Another idea might be to go back to Intro to Algebra with the "FOIL" method for first, outer, inner, last. Really you're trying to form the different elements of the cartesian product of (rb,re) and (sb,se), so the "FOIL method" (a mnemonic for multiplying binomials) tells you the resulting monomials are:

    First: (rb, sb)
    Outer: (rb, se)
    Inner: (re, sb)
    Last:  (re, se)

So you could have functions like:

    fromFirsts(r,s) aka leftDiff(r,s)
    fromLasts(s,r)  aka rightDiff(r,s) --- note the order reversal!
    fromInner(r,s)  -- nonsense for ranges but would be "end of r to beginning of s"
    fromOuter(r,s)  aka leftUnion(r,s) aka rightUnion(s,r)

To me, this way of decomposing the names makes it easier to visualize what the things are doing. I get no picture whatsoever from "pullNext" and I think it's going to be really hard for me to remember exactly what that does. And leftUnion is also tough because it's actually not a union of the two ranges, it's more like a union followed by intersection with complement. Thinking in terms of which components you're plucking out to make your new iterator makes it easy for me to visualize. But maybe not everyone had the FOIL method drilled into their heads so thoroughly at a young age like I did.

Anyway, I think it does suggest that maybe left and right union can just be a single union op that goes from the beginning of the first arg to the end of the second. Maybe something like "span" would be a better name then. And the precondition is that either r contains s, or vice versa.

--bb
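To make the endpoint-picking idea concrete, here is a small sketch that models a range as a pair of indices into one shared array. IndexRange and the three functions are hypothetical stand-ins named after the suggestions above; a real range would carry iterator-like endpoints rather than integers.

```d
import std.stdio : writeln;

// Hypothetical: a half-open interval [begin, end) into a shared array.
struct IndexRange
{
    size_t begin, end;
}

// Each function only picks one endpoint from each argument.
IndexRange fromFirsts(IndexRange a, IndexRange b) { return IndexRange(a.begin, b.begin); }
IndexRange fromLasts(IndexRange a, IndexRange b)  { return IndexRange(a.end, b.end); }
IndexRange fromOuter(IndexRange a, IndexRange b)  { return IndexRange(a.begin, b.end); }

void main()
{
    int[] data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    auto r = IndexRange(2, 9);   // outer range
    auto s = IndexRange(4, 6);   // s lies inside r

    auto left  = fromFirsts(r, s);  // leftDiff(r, s): part of r before s
    auto right = fromLasts(s, r);   // rightDiff(r, s): part of r after s
    auto span  = fromOuter(r, s);   // from the start of r to the end of s

    assert(data[left.begin .. left.end]   == [2, 3]);
    assert(data[right.begin .. right.end] == [6, 7, 8]);
    assert(data[span.begin .. span.end]   == [2, 3, 4, 5]);

    writeln(data[span.begin .. span.end]);
}
```

Note the argument order in fromLasts(s, r): the result starts where the first argument ends, which is the "order reversal" mentioned above.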
Sep 08 2008
prev sibling next sibling parent "Lionello Lunesu" <lionello lunesu.remove.com> writes:
 r.rear

I think 'tail' would be better as the opposite of 'head'. L.
Sep 08 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Sep 10, 2008 at 7:47 AM, Fawzi Mohamed <fmohamed mac.com> wrote:

 2) All the methods with intersection of iterator in my opinion are
 difficult to memorize, and rarely used, I would scrap them.
 Instead I would add the comparison operation .atSamePlace(iterator!(T)y)
 that would say if two iterators are at the same place. With it one gets back
 all the power of pointers, and with a syntax and use that are
 understandable.

But that comparison operation is not enough to implement anything of substance. Try your hand at a few classic algorithms and you'll see.

are you sure? then a range is *exactly* equivalent to a STL iterator, only that it cannot go out of bounds:

    // left1-left2:
    while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
      i1.next;
    }
    // left2-left1:
    while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
      i2.next;
    }
    // union 1-2
    while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
      i1.next;
    }
    while(!i2.isEmpty){
      i2.next;
    }
    // union 2-1
    ...
    // lower triangle
    i1=c.all;
    while(!i1.isEmpty){
      i2=c.all;
      while(!i2.isEmpty && !i2.atSamePlace(i1)){
        i2.next;
      }
      i1.next;
    }

Your code shows that you can successfully iterate over the same elements described by Andrei's various unions and differences, but it does not show how you would, say, pass that new range to another function to do that job, as you would want to do in, say, a recursive sort. Since in this design you can't set or access the individual iterator-like components of a range directly, being able to copy the begin or end iterator from one range over to another is necessary, I think. But I think you and I are in agreement that it would be easier and more natural to think of ranges as iterators augmented with information about bounds, as opposed to a contiguous block of things from A to B.
 well these are the operations that you can do on basically all iterators
 (and with which you can define new iterators).
 The one you propose needs an underlying total order that can be efficiently
 checked; for example, iterators on trees do not necessarily have this
 property, and then getting your kind of intersection can be difficult (and
 not faster than the operation using atSamePlace).

I don't think that's correct. Andrei's system does not need a total order any more than yours does. The unions and diffs just create new ranges by combining the components of existing ranges. They don't need to know anything about what happens in between those points or how you get from one to the other. Just take the "begin" of this guy and put it together with the "end" of that guy, for example. It doesn't require knowing how to get from anywhere to anywhere to create that new range. --bb
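A rough sketch of that point on a singly linked list, where no ordering test between positions is available. Node, ListRange, leftDiff and sum are all hypothetical names for this illustration; the only claim is that the new range is built in constant time by copying two endpoints, and can then be handed to another function.

```d
struct Node
{
    int value;
    Node* next;
}

// Hypothetical forward range over the nodes in [begin, end).
// A null end means "up to the end of the list".
struct ListRange
{
    Node* begin;
    Node* end;

    bool isEmpty() const { return begin is end; }
    int first() const { return begin.value; }
    void next() { begin = begin.next; }
}

// The part of r that comes before s starts: take r's begin and s's begin.
// No traversal and no comparison of positions is involved.
ListRange leftDiff(ListRange r, ListRange s)
{
    return ListRange(r.begin, s.begin);
}

// Any function that takes a range can now work on the new sub-range.
int sum(ListRange r)
{
    int total = 0;
    for (; !r.isEmpty(); r.next())
        total += r.first();
    return total;
}

void main()
{
    // Build the list 1 -> 2 -> 3 -> 4.
    auto n4 = new Node(4, null);
    auto n3 = new Node(3, n4);
    auto n2 = new Node(2, n3);
    auto n1 = new Node(1, n2);

    auto all  = ListRange(n1, null);
    auto tail = ListRange(n3, null);   // starts at the node holding 3

    auto prefix = leftDiff(all, tail); // covers the nodes holding 1 and 2
    assert(sum(prefix) == 3);
    assert(sum(all) == 10);
}
```

The precondition (s really starts somewhere inside r) is the caller's responsibility here, just as it would be for the corresponding slice expression.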
Sep 10 2008
prev sibling next sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 I put together a short document for the range design. I definitely missed
 about a million things and have been imprecise about another million, so
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

Small typo: "which opens forward ranges to much more many algorithms" --bb
Sep 08 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I put together a short document for the range design. I definitely missed
 about a million things and have been imprecise about another million, so
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

Small typo: "which opens forward ranges to much more many algorithms"

How do I fix it? Andrei
Sep 09 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 7:53 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I put together a short document for the range design. I definitely missed
 about a million things and have been imprecise about another million, so
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

"which opens forward ranges to much more many algorithms"


Just make it "to many more algorithms" instead of "to much more many algorithms".

Thanks, fixed. Andrei
Sep 09 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Tue, Sep 9, 2008 at 7:53 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I put together a short document for the range design. I definitely missed
 about a million things and have been imprecise about another million, so
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

Small typo: "which opens forward ranges to much more many algorithms"

How do I fix it?

Just make it "to many more algorithms" instead of "to much more many algorithms". --bb
Sep 09 2008
prev sibling next sibling parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Andrei Alexandrescu wrote:
 Hello,
 
 
 Walter, Bartosz and myself have been hard at work trying to find the 
 right abstraction for iteration. That abstraction would replace the 
 infamous opApply and would allow for external iteration, thus paving the 
 way to implementing real generic algorithms.
 ...
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
 
 
 Andrei

It looks very nice, though I have a few questions:

- Is std.range's source code somewhere? Or is it just the documentation, and the implementation will follow? Because...
- How do you create a range? In the documentation it says that "Built-in slices T[] are a direct implementation of random-access ranges", so I guess a built-in slice is already a range. But if that is true...
- How is "void next(T)(ref T[] range)" implemented? If I pass a built-in slice to it, how does the template store the state of where in the range we are? Or maybe you'd need to do Range!(...) to create a range?
- What do I do to make a collection implement a range? Do I need to implement the templates in std.range using template conditions?

Sorry if my questions are silly, I don't know much about templated code.
Sep 09 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Ary Borenszweig wrote:
 Andrei Alexandrescu wrote:
 Hello,


 Walter, Bartosz and myself have been hard at work trying to find the 
 right abstraction for iteration. That abstraction would replace the 
 infamous opApply and would allow for external iteration, thus paving 
 the way to implementing real generic algorithms.
 ...
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html


 Andrei

It looks very nice, though I have a few questions: - Is std.range's source code somewhere? Or it's just the documentation and then the implementation will follow?

There is an implementation that does not compile :o|.
 Because...
 - How do you create a range? In the documentation it says that "Built-in 
 slices T[] are a direct implementation of random-access ranges", so I 
 guess a built-in slice is already a range.

A slice is a range alright without any extra adaptation. It has some extra functions, e.g. ~=, that are not defined for ranges.
 But if that is true...
 - How is "void next(T)(ref T[] range)" implemented? If I pass a built-in 
 slice to it, how does the template store the state of where in the range 
 are we? Or maybe you'd need to do Range!(...) to create a range?

void next(T)(ref T[] range) { range = range[1 .. $]; }
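To spell out where the state lives: the slice variable itself is the state, and next simply rebinds it to a shorter view. A small sketch using the old names from this thread; isEmpty and left are written here as free functions purely for illustration.

```d
// The one-liner from the reply above.
void next(T)(ref T[] range) { range = range[1 .. $]; }

// Illustrative companions, using the "old names".
bool isEmpty(T)(T[] range) { return range.length == 0; }
ref T left(T)(T[] range) { return range[0]; }

void main()
{
    int[] original = [10, 20, 30];
    int[] r = original;   // copies pointer and length, not the elements

    int total = 0;
    for (; !r.isEmpty; r.next())
        total += r.left;

    assert(total == 60);
    assert(r.length == 0);         // r has been consumed...
    assert(original.length == 3);  // ...but the original view is untouched
}
```

Because next takes its argument by ref, only the caller's own slice variable shrinks; the array's contents and every other slice of it stay as they were.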
 - What do I do to make a collection implement a range? Do I need to 
 implement the templates in std.range using template conditions?

Oh, much simpler. You don't need to use templates at all if you know the type in advance.

    // assume a collection of ints
    // using old names
    struct Collection
    {
        struct Range
        {
            bool isEmpty() { ... }
            ref int left() { ... }
            void next() { ... }
        }
        Range all() { ... }
    }

    Collection c;
    for (auto r = c.all; !r.isEmpty; r.next)
    {
        writeln(r.left);
    }

The advantage of the above is not that it offers you looping over your collection. The advantage is that your collection now can use many of the algorithms in std.algorithm, and others written to use ranges. Collection.Range is in intimate connection with Collection because it understands the mechanism of walking the collection. Your code won't currently compile because returning ref int from left() is not allowed.

Andrei
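For reference, a filled-in version of that outline, assuming a present-day D2 compiler where ref returns are accepted. Backing the collection with an int[] and the generic sum function are assumptions added for this sketch; they stand in for whatever storage and algorithms a real collection would use.

```d
import std.stdio : writeln;

// A collection of ints, using the old names from this thread.
struct Collection
{
    private int[] items;   // assumed storage, for illustration

    struct Range
    {
        private int[] rest;

        bool isEmpty() const { return rest.length == 0; }
        ref int left() { return rest[0]; }
        void next() { rest = rest[1 .. $]; }
    }

    Range all() { return Range(items); }
}

// A tiny "algorithm" written only against isEmpty/left/next.
// It works for any range type offering those three names.
int sum(R)(R r)
{
    int total = 0;
    for (; !r.isEmpty(); r.next())
        total += r.left();
    return total;
}

void main()
{
    auto c = Collection([1, 2, 3]);

    for (auto r = c.all(); !r.isEmpty(); r.next())
        writeln(r.left());

    assert(sum(c.all()) == 6);

    // left() returns by reference, so elements can be updated in place.
    auto r = c.all();
    r.left() = 100;
    assert(sum(c.all()) == 105);
}
```

The generic function never mentions Collection, which is the point made above: once the range exists, code written against the range interface can use it.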
Sep 09 2008
next sibling parent reply Lars Ivar Igesund <larsivar igesund.net> writes:
Andrei Alexandrescu wrote:

 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

Aren't slices const/readonly/whatnot and thus ~= not possible without copying/allocation? -- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the Tango
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Lars Ivar Igesund wrote:
 Andrei Alexandrescu wrote:
 
 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

Aren't slices const/readonly/whatnot and thus ~= not possible without copying/allocation?

Well there's no change in semantics of slices (meaning T[]) between D1 and D2, so slices mean business as usual. Maybe you are referring to strings, aka invariant(char)[]?

Anyhow, today's ~= behaves really really erratically. I'd get rid of it if I could. Take a look at this:

    import std.stdio;

    void main(string args[]) {
        auto a = new int[10];
        a[] = 10;
        auto b = a;
        writeln(b);
        a = a[1 .. 5];
        a ~= [ 34, 345, 4324 ];
        writeln(b);
    }

The program will print all 10s two times. But if we change a[1 .. 5] with a[0 .. 5] the behavior will be very different! a will grow "over" b, thus stomping over its content.

This is really bad because the behavior of a simple operation ~= depends on the history of the slice on the left hand side, something often extremely difficult to track, and actually impossible if the slice was received as an argument to a function.

IMHO such a faux pas is inadmissible for a modern language.

Andrei
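One way to sidestep the history dependence, sketched under the assumption that an extra allocation is acceptable: use the binary ~ operator, which always builds a fresh array, instead of ~=. The variable names follow the example above.

```d
void main()
{
    auto a = new int[10];
    a[] = 10;
    auto b = a;

    a = a[0 .. 5];              // the problematic case from the example
    a = a ~ [34, 345, 4324];    // binary ~ allocates a new array

    // b still sees ten 10s, no matter how a was sliced beforehand.
    assert(b == [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]);
    assert(a == [10, 10, 10, 10, 10, 34, 345, 4324]);
    assert(a.ptr !is b.ptr);    // a now lives in its own memory
}
```

The cost is a copy on every append, so this is a safety measure for code that received a slice from elsewhere rather than a general replacement for appending in a loop.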
Sep 09 2008
next sibling parent Lars Ivar Igesund <larsivar igesund.net> writes:
Andrei Alexandrescu wrote:

 Lars Ivar Igesund wrote:
 Andrei Alexandrescu wrote:
 
 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

Aren't slices const/readonly/whatnot and thus ~= not possible without copying/allocation?

Well there's no change in semantics of slices (meaning T[]) between D1 and D2, so slices mean business as usual. Maybe you are referring to strings, aka invariant(char)[]?

No, I actually referred to what you say below. My point was that ~= is an unsafe operation on slices (not impossible as I apparently said), and thus you need copy-on-write to be safe from erratic behaviour.
 
 Anyhow, today's ~= behaves really really erratically. I'd get rid of it
 if I could. Take a look at this:
 
 import std.stdio;
 
 void main(string args[]) {
      auto a = new int[10];
      a[] = 10;
      auto b = a;
      writeln(b);
      a = a[1 .. 5];
      a ~= [ 34, 345, 4324 ];
      writeln(b);
 }
 
 The program will print all 10s two times. But if we change a[1 .. 5]
 with a[0 .. 5] the behavior will be very different! a will grow "over"
 b, thus stomping over its content.
 
 This is really bad because the behavior of a simple operation ~= depends
 on the history of the slice on the left hand side, something often
 extremely difficult to track, and actually impossible if the slice was
 received as an argument to a function.
 
 IMHO such a faux pas is inadmissible for a modern language.
 
 
 Andrei

-- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the Tango
Sep 09 2008
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 Lars Ivar Igesund wrote:
 Andrei Alexandrescu wrote:

 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

Aren't slices const/readonly/whatnot and thus ~= not possible without copying/allocation?

Well there's no change in semantics of slices (meaning T[]) between D1 and D2, so slices mean business as usual. Maybe you are referring to strings, aka invariant(char)[]?

I do think it's a fair point that ~= could be considered an operation that constructs a new container (an array, in this case) using a range (slice) as an initializer. The weird issue right now is that there is effectively no difference between a slice and an array insofar as the language or code representation are concerned. In many instances this is an advantage, but it leads to some issues, like the one you describe below.
 Anyhow, today's ~= behaves really really erratically. I'd get rid of it 
 if I could. Take a look at this:
 
 import std.stdio;
 
 void main(string args[]) {
     auto a = new int[10];
     a[] = 10;
     auto b = a;
     writeln(b);
     a = a[1 .. 5];
     a ~= [ 34, 345, 4324 ];
     writeln(b);
 }
 
 The program will print all 10s two times. But if we change a[1 .. 5] 
 with a[0 .. 5] the behavior will be very different! a will grow "over" 
 b, thus stomping over its content.
 
 This is really bad because the behavior of a simple operation ~= depends 
 on the history of the slice on the left hand side, something often 
 extremely difficult to track, and actually impossible if the slice was 
 received as an argument to a function.
 
 IMHO such a faux pas is inadmissible for a modern language.

This would be easy to fix by making arrays / slices fatter (by adding a capacity field, for example), but I'm still not convinced that's the right thing to do. However, it may well be preferable to eliminating appending completely. The obvious alternative would be either to resurrect head const (not gonna happen) or to make append always reallocate (not at all ideal). Sean
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 This would be easy to fix by making arrays / slices fatter (by adding a 
 capacity field, for example), but I'm still not convinced that's the 
 right thing to do.  However, it may well be preferable to eliminating 
 appending completely.  The obvious alternative would be either to 
 resurrect head const (not gonna happen) or to make append always 
 reallocate (not at all ideal).

I couldn't imagine it put any better. Maybe the time has come to start looking into a good solution for this problem. The way things are now, ~= muddles the clean territory that slices cover. Consider that we define "unowned" arrays as arrays allocated by new T[n]. They are "unowned" because no entity controls them except the garbage collector, which by definition recycles them when it's sure you couldn't tell. An "owned" array would be something like a scope variable or an up-and-coming BlockArray!(T) with a destructor. Slices are beautiful for iterating owned and unowned arrays just as well. You can have the slice refer to any range of any array no problem. Calling a ~ b creates a new, unowned array containing their concatenation. Assigning a = new T[n]; binds a to a fresh unowned array. And so on. And all of a sudden we step on a kaka in this beautiful garden. Under very special, undetectable circumstances, a range becomes Hitler, annexes an adjacent range, and starts walking all over it. Sigh. Andrei
Sep 09 2008
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 This would be easy to fix by making arrays / slices fatter (by adding 
 a capacity field, for example), but I'm still not convinced that's the 
 right thing to do.  However, it may well be preferable to eliminating 
 appending completely.  The obvious alternative would be either to 
 resurrect head const (not gonna happen) or to make append always 
 reallocate (not at all ideal).

I couldn't imagine it put any better. Maybe the time has come to start looking into a good solution for this problem. The way things are now, ~= muddles the clean territory that slices cover. Consider that we define "unowned" arrays as arrays allocated by new T[n]. They are "unowned" because no entity controls them except the garbage collector, which by definition recycles them when it's sure you couldn't tell. An "owned" array would be something like a scope variable or an up-and-coming BlockArray!(T) with a destructor. Slices are beautiful for iterating owned and unowned arrays just as well. You can have the slice refer to any range of any array no problem. Calling a ~ b creates a new, unowned array containing their concatenation. Assigning a = new T[n]; binds a to a fresh unowned array. And so on. And all of a sudden we step on a kaka in this beautiful garden. Under very special, undetectable circumstances, a range becomes Hitler, annexes an adjacent range, and starts walking all over it. Sigh.

I'm very glad you share my thoughts on ~=. The current D T[] is a tool that has been stuffed with too many concepts. Arrays and slices are two fundamentally different concepts that T[] actually manages to capture impressively well, but unfortunately not fully. And it is the last bit that makes the puzzle complicated. One of the biggest differences between an array and a slice lies in the ownership of the data. And as far as I see it, arrays are conceptually better implemented as reference types, while slices are a natural value type. So by removing ~= from T[], T[] becomes a pure slice type. This is all the old T[new] discussion once again, but with the gained insight that instead of T[new] one could just as well use a pure library type. -- Oskar
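A very rough sketch of what such a split could look like: a library array with reference semantics that owns its storage and is the only thing that can grow, handing out plain T[] views. GrowableArray, all and view are hypothetical names invented here; this is not the Array!() type mentioned elsewhere in the thread.

```d
// Hypothetical library array; the name and API are illustrative only.
final class GrowableArray(T)
{
    private T[] storage;   // owned buffer, possibly with spare room
    private size_t used;   // number of elements actually in use

    // Appending is an operation on the array, never on a slice.
    void opOpAssign(string op : "~")(T value)
    {
        if (used == storage.length)
        {
            // Out of room: move to a bigger buffer.
            auto bigger = new T[storage.length ? storage.length * 2 : 4];
            bigger[0 .. used] = storage[0 .. used];
            storage = bigger;
        }
        storage[used++] = value;
    }

    size_t length() const { return used; }

    // Slices are plain views of the elements in use.
    T[] all() { return storage[0 .. used]; }

    T[] view(size_t lo, size_t hi)
    {
        assert(lo <= hi && hi <= used);
        return storage[lo .. hi];
    }
}

void main()
{
    auto arr = new GrowableArray!int;
    auto same = arr;          // reference semantics: both names, one array

    arr ~= 1;
    arr ~= 2;
    arr ~= 3;
    assert(same.length() == 3);

    int[] v = arr.view(0, 2); // a slice: fixed extent, no growing
    same ~= 4;                // growing the array never stomps on v
    assert(v == [1, 2]);
    assert(arr.all() == [1, 2, 3, 4]);
}
```

With this division of labour the spare capacity is only ever reachable through the owning object, so a view can go stale after a reallocation but can never be overwritten by someone else's append.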
Sep 11 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Oskar Linde (and Andrei Alexandrescu):
 So by removing ~= from T[], T[] becomes a pure slice type.

Appending to the built-in dynamic arrays is a fundamental operation (I use it hundreds of times in my code) so if the purpose is just to avoid problems when extending slices, a different solution can be invented. For example, by adding a third (capacity) field to the dyn array struct, the last bit of the capacity field can be used to tell apart slices from true whole arrays. So at runtime the code knows how to extend/append the array/slice correctly. This slows down the appending itself a little, but it's better than having to use an ugly ArrayBuilder everywhere. Bye, bearophile
Sep 11 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
bearophile wrote:
 Oskar Linde (and Andrei Alexandrescu):
 So by removing ~= from T[], T[] becomes a pure slice type.

Appending to the built-in dynamic arrays is a fundamental operation (I use it hundreds of times in my code) so if the purpose is just to avoid problems when extending slices, a different solution can be invented. For example, by adding a third (capacity) field to the dyn array struct, the last bit of the capacity field can be used to tell apart slices from true whole arrays. So at runtime the code knows how to extend/append the array/slice correctly. This slows down the appending itself a little, but it's better than having to use an ugly ArrayBuilder everywhere.

I'd think that adding a capacity field should actually speed up append operations, since the GC wouldn't have to be queried to determine this info. And as in another thread, the capacity of all slices should either be zero or the size of the slice, thus forcing a realloc for any append op. Sean
Sep 11 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Sean Kelly:
 I'd think that adding a capacity field should actually speed up append 
 operations, since the GC wouldn't have to be queried to determine this 
 info.

Yes, but I meant slower than just adding the capacity field without adding such extra bit flag to tell apart slices from arrays.
 And as in another thread, the capacity of all slices should 
 either be zero or the size of the slice, thus forcing a realloc for any 
 append op.

Oh, right, there's no need for a separate tagging bit then; the value capacity=0 is itself the tag. Do D designers like this (small) change in the language? :-) Bye, bearophile
Sep 11 2008
parent Sergey Gromov <snake.scaly gmail.com> writes:
bearophile <bearophileHUGS lycos.com> wrote:
 Sean Kelly:
 I'd think that adding a capacity field should actually speed up append 
 operations, since the GC wouldn't have to be queried to determine this 
 info.

Yes, but I meant slower than just adding the capacity field without adding such extra bit flag to tell apart slices from arrays.
 And as in another thread, the capacity of all slices should 
 either be zero or the size of the slice, thus forcing a realloc for any 
 append op.

Oh, right, there's no need for a separate tagging bit then; the value capacity=0 is itself the tag. Do D designers like this (small) change in the language? :-)

It just doesn't work. Arrays are structs passed by value. If you pass a capacity-array into a function, the function appends, and then you append to your old copy, you overwrite what the function appended. If you force slice-on-copy semantics, then arrays become elusive, tending to implicitly turn into slices whenever you toss them around and then reallocate on append.

When I was reading the D specs on slicing and appending for the first time I got a strong feeling of hackery. A slice is a view of another slice, but you cannot count on that. A slice can be appended to, but you can erase parts of other slices in the process. I've had my share of guessing when I was writing a sort of refactoring tool, when I checked carefully whether a slice would survive and dupped much more than I wish I had.

I'd personally love to have something as simple and natural to use as current built-in arrays but with better defined semantics. I don't like Andrei's Array!() thingy but it seems better than anything proposed before.
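The overwrite scenario can be reproduced with a toy model of a by-value array that carries its own capacity. CapArray and callee are hypothetical; the struct only imitates the proposed pointer/length/capacity layout and is not how any D runtime implements arrays.

```d
// Toy model of a "capacity array" passed by value.
struct CapArray
{
    int[] block;     // the whole allocated block (block.length is the capacity)
    size_t length;   // elements in use, as seen by *this copy*

    void append(int value)
    {
        if (length == block.length)                // out of room: reallocate
            block = block ~ new int[block.length + 4];
        block[length++] = value;                   // otherwise write in place
    }

    int[] items() { return block[0 .. length]; }
}

// Receives its own copy of length, but shares the block.
CapArray callee(CapArray arr)
{
    arr.append(111);
    return arr;
}

void main()
{
    auto mine = CapArray(new int[8], 0);
    mine.append(1);
    mine.append(2);

    auto theirs = callee(mine);
    assert(theirs.items() == [1, 2, 111]);
    assert(mine.items() == [1, 2]);        // my copy never saw the append

    mine.append(3);                        // lands in the same slot...
    assert(theirs.items() == [1, 2, 3]);   // ...and overwrites the callee's 111
}
```

Neither copy can detect the conflict, because each one only knows its own length.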
Sep 11 2008
prev sibling parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Sean Kelly wrote:
 bearophile wrote:
 Oskar Linde (and Andrei Alexandrescu):
 So by removing ~= from T[], T[] becomes a pure slice type.

Appending to the built-in dynamic arrays is a fundamental operation (I use it hundreds of times in my code) so if the purpose is just to avoid problems when extending slices, a different solution can be invented.


I agree that it is a fundamental operation, and my code contains hundreds of uses too. But the number of uses is actually smaller than I thought. One project of mine has only 157 uses of ~= in a total of 18000 lines of code, and the cases are by their nature quite easily identified. Arbitrary code doesn't usually append to arbitrary slices.
 For example adding the third (capacity) field to the dyn array struct, 
 the last bit of the capacity field can be used to tell apart slices 
 from true whole arrays. So at runtime the code knows how to 
 extend/append the array/slice correctly. This slows down the appending 
 itself a little, but it's better than having to use an ugly 
 ArrayBuilder everywhere.

I'd think that adding a capacity field should actually speed up append operations, since the GC wouldn't have to be queried to determine this info. And as in another thread, the capacity of all slices should either be zero or the size of the slice, thus forcing a realloc for any append op. Sean

capacity = the size of the slice won't work, since then you could transform a slice into a resizable array by mistake:

    s = a[5..7]; // s.capacity = 2
    t = s;
    s.length = s.length - 1;
    s ~= x;

so that basically means that capacity has to be = 0 for slices, and != 0 for resizable arrays. Without considering whether arrays would gain from having the capacity readily accessible, the advantage from this would be to have a run-time way to separate the slice from the array at the cost of 50 % increased storage. But even though this information would only be accessible at run-time, it is fully deducible at compile time. So you lose all compile time gains from separating the two concepts. -- Oskar
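The four-line example can be simulated with a toy struct in which a fresh slice gets capacity equal to its size. Fat and sliceOf are hypothetical names for this sketch, and the struct merely imitates the proposed rule.

```d
// Toy model: a slice value that carries its own capacity.
struct Fat
{
    int[] room;      // memory this value believes it may use (room.length is the capacity)
    size_t length;   // elements currently in use

    int[] items() { return room[0 .. length]; }

    void append(int x)
    {
        if (length < room.length)
            room[length] = x;        // spare capacity: append in place
        else
            room = items() ~ x;      // no room: reallocate
        length++;
    }
}

// The rule under test: a new slice gets capacity == its size.
Fat sliceOf(int[] a, size_t lo, size_t hi)
{
    return Fat(a[lo .. hi], hi - lo);
}

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

    auto s = sliceOf(a, 5, 7);   // s.capacity = 2
    auto t = s;
    s.length = s.length - 1;     // shrinking leaves a spare slot
    s.append(99);                // ...which the append happily reuses

    assert(a[6] == 99);          // the original array was overwritten
    assert(t.items() == [5, 99]);
}
```

With capacity forced to 0 for slices (here, a zero-length room), the same append would always take the reallocating branch and a would stay intact.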
Sep 11 2008
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Andrei Alexandrescu wrote:
 Lars Ivar Igesund wrote:
 Andrei Alexandrescu wrote:

 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

Aren't slices const/readonly/whatnot and thus ~= not possible without copying/allocation?

Well there's no change in semantics of slices (meaning T[]) between D1 and D2, so slices mean business as usual. Maybe you are referring to strings, aka invariant(char)[]?

Anyhow, today's ~= behaves really really erratically. I'd get rid of it if I could. Take a look at this:

    import std.stdio;

    void main(string args[]) {
        auto a = new int[10];
        a[] = 10;
        auto b = a;
        writeln(b);
        a = a[1 .. 5];
        a ~= [ 34, 345, 4324 ];
        writeln(b);
    }

The program will print all 10s two times. But if we change a[1 .. 5] with a[0 .. 5] the behavior will be very different! a will grow "over" b, thus stomping over its content.

This is really bad because the behavior of a simple operation ~= depends on the history of the slice on the left hand side, something often extremely difficult to track, and actually impossible if the slice was received as an argument to a function.

IMHO such a faux pas is inadmissible for a modern language.

Andrei

Cool, good to see this is going to be taken care of, it is a horrible wart. -- Bruno Medeiros - Software Developer, MSc. in CS/E graduate http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sep 25 2008
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Tue, 09 Sep 2008 10:45:53 -0500, Andrei Alexandrescu wrote:


          ref int left() { ... }

Is "left" a "movement in a specific direction" as in "go left at the next lights" or the "amount of stuff left over"? It is a bit ambiguous. Even if it is a direction, is it moving towards the first or the last item? It is not self-evident. As a user of the Latin alphabet I'd assume it was going towards the first item, but Hebrew or Arabic users might assume it was heading towards the end. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Sep 09 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Derek Parnell" wrote
 On Tue, 09 Sep 2008 10:45:53 -0500, Andrei Alexandrescu wrote:


          ref int left() { ... }

Is "left" a "movement in a specific direction" as in "go left at the next lights" or the "amount of stuff left over"? It is a bit ambiguous. Even if it is a direction, is it moving towards the first or the last item? It is not self-evident. As a user of the Latin alphabet I'd assume it was going towards the first item, but Hebrew or Arabic users might assume it was heading towards the end.

It means 'left-most element in the range'. It gets you the first element in the range (i.e. the next element to iterate) without modifying the range. I agree that it is very misleading, but I think Andrei is exploring other possibilities (see other threads). -Steve
Sep 09 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Derek Parnell" wrote
 On Tue, 09 Sep 2008 10:45:53 -0500, Andrei Alexandrescu wrote:


          ref int left() { ... }

Is "left" a "movement in a specific direction" as in "go left at the next lights" or the "amount of stuff left over"? It is a bit ambiguous. Even if it is a direction, is it moving towards the first or the last item? It is not self-evident. As a user of the Latin alphabet I'd assume it was going towards the first item but Hebrew or Arabic users might assume it was heading towards the end.

It means 'left-most element in the range'. It gets you the first element in the range (i.e. the next element to iterate) without modifying the range. I agree that it is very misleading, but I think Andrei is exploring other possibilities (see other threads).

Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't think they'd be offended. This is not writing. Left is left and right is right in math. But yes... first and last are in I guess. I'd also like *r as a shortcut for r.first, as it will be no doubt used very intensively. Andrei
Sep 09 2008
next sibling parent Derek Parnell <derek nomail.afraid.org> writes:
On Tue, 09 Sep 2008 18:04:12 -0500, Andrei Alexandrescu wrote:

 But yes... first and last are in I guess.

of choosing the "right" (or is that "correct") word; one that promotes least synaptic double-takes.
 Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't 
 think they'd be offended.

I wasn't thinking about offense, just cultural assumptions.
 This is not writing. Left is left and right is right in math.

Well, I beg to differ. I believe that programming languages are closer to prose than they are to maths. Even if 'left' and 'right' have specific meanings in maths, people reading someone else's code and seeing r.left might not know that they are reading "maths". I'm sure most readers (at least while learning) will apply their cultural bias when interpreting the written text, and all I'm saying is that r.left may very well mean different things to different people. -- Derek (skype: derek.j.parnell) Melbourne, Australia 10/09/2008 9:58:38 AM
Sep 09 2008
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 8:04 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't think
 they'd be offended. This is not writing. Left is left and right is right in
 math.

Also the direction in which D code is written does not depend on the language of the speaker. It's always left to right. So I think there's no real argument on linguistic grounds. On the other hand, a quick google for "left right confusion" turns up a fair number of relevant hits.

Yep. I needn't google any farther than my wife :o). Andrei
Sep 09 2008
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Andrei Alexandrescu wrote:
I'd also like *r as a shortcut for r.first, 

Agh, yuck! :( -- Bruno Medeiros - Software Developer, MSc. in CS/E graduate http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sep 25 2008
prev sibling parent Derek Parnell <derek nomail.afraid.org> writes:
On Tue, 9 Sep 2008 18:28:34 -0400, Steven Schveighoffer wrote:

 It means 'left-most element in the range'.  It gets you the first element in 
 the range (i.e. the next element to iterate) without modifying the range.
 
 I agree that it is very misleading, but I think Andrei is exploring other 
 possibilities (see other threads).

Thanks. I was playing at "devil's advocate" as my real point was that "left" is way too overloaded with different meanings and is thus not a suitable choice. We already have this problem a few times in D (eg. 'static') -- Derek (skype: derek.j.parnell) Melbourne, Australia 10/09/2008 9:56:09 AM
Sep 09 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Sep 10, 2008 at 8:04 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't think
 they'd be offended. This is not writing. Left is left and right is right in
 math.

Also the direction in which D code is written does not depend on the language of the speaker. It's always left to right. So I think there's no real argument on linguistic grounds. On the other hand, a quick google for "left right confusion" turns up a fair number of relevant hits. There's enough people out there who have trouble keeping those directions straight for it to get discussed. Searches for "begin end confusion", "front back confusion", "first last confusion" predictably turned up no relevant hits I could find.
 But yes... first and last are in I guess. I'd also like *r as a shortcut for
 r.first, as it will be no doubt used very intensively.

Recognizing that the typical usage for these things will be that "first" is the current value and "last" is actually a bogus sentinel, I guess I would rather see something like .value or .item for the current value. I can understand the pull to try to make the names symmetric, but in fact the things they represent are not really symmetric, so I don't see it as a requirement that the names be symmetric.

And opStar is hard to search for so I'd rather not see that at all. Note also that if you declare that * is an alias for .first in the ranges interface, that means that every implementor of a range will have to remember to include that alias.

--bb
Sep 09 2008
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
<snip>

Excellent ideas!  I think the best part is about how you almost never need 
individual iterators, only ever ranges.  Perfectly explained!

One issue that you might not have considered is using a container as a data 
structure, and not using it for algorithms.  For example, how would one 
specify that one wants to remove a specific element, not a range of 
elements.  Having a construct that points to a specific element but not any 
specific end element might be a requirement for non-algorithmic reasons.

Also, some ranges might become invalid later on whereas the iterators would 
not.  Take for example a Hash container.  If you have to rehash the table, a 
range now might not make any sense, as the 'end' element may have moved to 
be before the 'begin' element.  But an iterator that points to a given 
element will still be valid (and could be used for removal).  In fact, I 
don't think ranges make a lot of sense for things like Hash where there 
isn't any defined order.  But you still should have a 'pointer' type to 
support O(1) removal.

One doesn't see any of these problems with arrays, because with arrays, you 
are guaranteed to have contiguous memory.

What I'm trying to say is there may be a reason to have pointers for certain 
containers, even though they might be unsafe.

My personal pet peeve of many container implementations is not being able to 
remove elements from a container while iterating.  For example, I have a 
linked list of open file descriptors, iterate over the list, closing and 
removing those that are done (which should be O(1) per removal).  In many 
container implementations, iteration implies immutable, which means you have 
to add references to the elements to remove to another list to then remove 
afterwards (sometimes at the cost of O(n) per removal.  grrrr.)  I hope 
ranges will support removal while traversing.

That's all I have for now, I have to go think about how this will impact 
dcollections :)

-Steve 
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 <snip>
 
 Excellent ideas!  I think the best part is about how you almost never need 
 individual iterators, only ever ranges.  Perfectly explained!
 
 One issue that you might not have considered is using a container as a data 
 structure, and not using it for algorithms.  For example, how would one 
 specify that one wants to remove a specific element, not a range of 
 elements.  Having a construct that points to a specific element but not any 
 specific end element might be a requirement for non-algorithmic reasons.

I'm sure you know and imply this, but just to clarify for everybody: Modifying the topology of the container is a task carried by the primitives of the container. Ranges can "look" at the topology and change elements sitting in it, but not alter the topology. Much like in STL, there's a container primitive for removing a range. It will return a range too, namely the range starting at the deleted position. Removing an element is really removing a range of one element - just a particular case.
 Also, some ranges might become invalid later on whereas the iterators would 
 not.  Take for example a Hash container.  If you have to rehash the table, a 
 range now might not make any sense, as the 'end' element may have moved to 
 be before the 'begin' element.  But an iterator that points to a given 
 element will still be valid (and could be used for removal).  In fact, I 
 don't think ranges make a lot of sense for things like Hash where there 
 isn't any defined order.  But you still should have a 'pointer' type to 
 support O(1) removal.

Iterators can be easily defined over hashtables, but indeed they are easily invalidated if implemented efficiently.
 One doesn't see any of these problems with arrays, because with arrays, you 
 are guaranteed to have contiguous memory.
 
 What I'm trying to say is there may be a reason to have pointers for certain 
 containers, even though they might be unsafe.

A pointer to an element can be taken as &(r.first). The range may or may not allow that, it's up to it.
 My personal pet peeve of many container implementations is not being able to 
 remove elements from a container while iterating.  For example, I have a 
 linked list of open file descriptors, iterate over the list, closing and 
 removing those that are done (which should be O(1) per removal).  In many 
 container implementations, iteration implies immutable, which means you have 
 to add references to the elements to remove to another list to then remove 
 afterwards (sometimes at the cost of O(n) per removal.  grrrr.)  I hope 
 ranges will support removal while traversing.

In STL removing from a list while iterating is easy and efficient, albeit verbose as always:

    list<Filedesc> lst;
    for (list<Filedesc>::iterator i = lst.begin(); i != lst.end(); )
    {
        if (should_remove) i = lst.erase(i);
        else ++i;
    }

In Phobos things will be something like:

    List!(Filedesc) lst;
    for (auto r = lst.all; !r.isEmpty; )
    {
        if (should_remove) r = lst.erase(take(1, r));
        else r.next;
    }

Andrei
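The removal loop above can be tried out against a toy container. This is only a sketch under stated assumptions: List, Range, takeOne, erase and removeEvens below are hypothetical names written for this illustration, not the Phobos API, and the range primitives (isEmpty, first, next) simply follow the names used in this thread. A range here is a pair of node pointers, and erase returns the range that starts right after the removed span.

```d
import std.stdio;

// Hypothetical minimal doubly linked list; not the Phobos container API.
struct List(T)
{
    private static struct Node
    {
        T value;
        Node* prev, next;
    }

    private Node* head, tail;

    // A range is the half-open span of nodes [cur, stop).
    static struct Range
    {
        private Node* cur, stop;

        bool isEmpty() const { return cur is stop; }
        ref T first() { return cur.value; }
        void next() { cur = cur.next; }
    }

    void append(T value)
    {
        auto n = new Node(value, tail, null);
        if (tail) tail.next = n; else head = n;
        tail = n;
    }

    // The whole list as a range.
    Range all() { return Range(head, null); }

    // A range covering only the first element of r (r must not be empty).
    static Range takeOne(Range r) { return Range(r.cur, r.cur.next); }

    // Unlinks the nodes in r; returns the rest of the list after them.
    Range erase(Range r)
    {
        if (r.isEmpty) return Range(r.stop, null);
        Node* before = r.cur.prev;
        Node* after = r.stop; // null when r reaches the end of the list
        if (before) before.next = after; else head = after;
        if (after) after.prev = before; else tail = before;
        return Range(after, null);
    }
}

// Removes even numbers while traversing, then collects what is left.
int[] removeEvens(int[] input)
{
    List!int lst;
    foreach (x; input) lst.append(x);

    for (auto r = lst.all; !r.isEmpty; )
    {
        if (r.first % 2 == 0)
            r = lst.erase(lst.takeOne(r)); // O(1) unlink, continue after it
        else
            r.next();
    }

    int[] result;
    for (auto r = lst.all; !r.isEmpty; r.next())
        result ~= r.first;
    return result;
}

void main()
{
    writeln(removeEvens([1, 2, 4, 5, 6]));
}
```

The container, not the range, changes the topology; the loop only ever holds a range and swaps it for the one erase hands back.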
Sep 09 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 <snip>

 Excellent ideas!  I think the best part is about how you almost never 
 need individual iterators, only ever ranges.  Perfectly explained!

 One issue that you might not have considered is using a container as a 
 data structure, and not using it for algorithms.  For example, how would 
 one specify that one wants to remove a specific element, not a range of 
 elements.  Having a construct that points to a specific element but not 
 any specific end element might be a requirement for non-algorithmic 
 reasons.

I'm sure you know and imply this, but just to clarify for everybody: Modifying the topology of the container is a task carried by the primitives of the container. Ranges can "look" at the topology and change elements sitting in it, but not alter the topology.

I agree. There are cases where just an iterator is necessary. For example, a linked-list implementation where the length is calculated instead of stored. But that is the exception, the rule should be that you always ask the container to alter the topology.
 Much like in STL, there's a container primitive for removing a range. It 
 will return a range too, namely the range starting at the deleted 
 position. Removing an element is really removing a range of one element - 
 just a particular case.

Yes, but the problem I see is how do you specify a range of one element. In the case of an array, it is easy because you always know that no matter what happens to a container, the end of '1' element is a pointer to the next element in memory. In the case of other containers, which could have changed since you obtained the range, you cannot be sure the 'range of 1' hasn't changed. For instance, I would assume that a linked list range has two pointers, one to the first element, and one to the element just past the last element in the range. But what if an element is inserted in between? Then your range suddenly got bigger.

What I'm trying to say is, maybe it would be desirable to have a pointer to exactly one element, instead of a range that could possibly change. Operations on that pointer type would be supported just like the operations on the ranges, but it would be more specific.
 Also, some ranges might become invalid later on whereas the iterators 
 would not.  Take for example a Hash container.  If you have to rehash the 
 table, a range now might not make any sense, as the 'end' element may 
 have moved to be before the 'begin' element.  But an iterator that points 
 to a given element will still be valid (and could be used for removal). 
 In fact, I don't think ranges make a lot of sense for things like Hash 
 where there isn't any defined order.  But you still should have a 
 'pointer' type to support O(1) removal.

Iterators can be easily defined over hashtables, but indeed they are easily invalidated if implemented efficiently.

Iterators don't have to be invalidated, but ranges would.
 One doesn't see any of these problems with arrays, because with arrays, 
 you are guaranteed to have contiguous memory.

 What I'm trying to say is there may be a reason to have pointers for 
 certain containers, even though they might be unsafe.

A pointer to an element can be taken as &(r.first). The range may or may not allow that, it's up to it.

I thought you stated that 'pointers' shouldn't be allowed, only ranges? In general, I agree with that, but I think the ability to use a pointer type instead of ranges has advantages in some cases.
 My personal pet peeve of many container implementations is not being able 
 to remove elements from a container while iterating.  For example, I have 
 a linked list of open file descriptors, iterate over the list, closing 
 and removing those that are done (which should be O(1) per removal).  In 
 many container implementations, iteration implies immutable, which means 
 you have to add references to the elements to remove to another list to 
 then remove afterwards (sometimes at the cost of O(n) per removal. 
 grrrr.)  I hope ranges will support removal while traversing.

 In STL removing from a list while iterating is easy and efficient, albeit
 verbose as always:

 list<Filedesc> lst;
 for (list<Filedesc>::iterator i = lst.begin(); i != lst.end(); )
 {
     if (should_remove) i = lst.erase(i);
     else ++i;
 }

 In Phobos things will be something like:

 List!(Filedesc) lst;
 for (auto r = lst.all; !r.isEmpty; )
 {
     if (should_remove) r = lst.erase(take(1, r));
     else r.next;
 }

I prefer the dcollections syntax, but I did write it, so that's to be expected :) :

    foreach(ref doPurge, fd; lst.purger)
        doPurge = shouldIRemove(fd);

Or if you prefer iterator-style syntax:

    for(auto i = lst.begin; i != lst.end;)
    {
        if (shouldIRemove(i.value))
            i = lst.remove(i);
        else
            ++i;
    }

But yes, I see that ranges are used for removal, and that should be supported in ordered containers. But the notion of storing a reference to a single element as a 'range of 1' in certain containers is troublesome I think.

-Steve
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 I thought you stated that 'pointers' shouldn't be allowed, only ranges?  In 
 general, I agree with that, but I think the ability to use a pointer type 
 instead of ranges has advantages in some cases.

I think there's a little confusion. There's three things:

1. Ranges
2. Iterators
3. Pointers, e.g. the exact address where the object sits in memory

My address uses 1 and drops 2. You still have access to 3 if you so need.

    void showAddresses(R)(R r)
    {
        for (size_t i = 0; !r.isEmpty; r.next, ++i)
        {
            writeln("Element ", i, " is sitting at address: ", &(r.first));
        }
    }

Andrei
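For anyone who wants to run the idea, here is a small sketch with a range type to feed it. SliceRange and addressesOf are hypothetical names made up for this note; the primitives isEmpty, first and next are the ones used in the thread. One detail is an assumption about current compilers: first is an ordinary method here, so the sketch calls it explicitly before applying &, to be sure the address taken is the element's and not the method's.

```d
import std.stdio;

// Hypothetical range over a built-in slice, using the thread's primitive names.
struct SliceRange(T)
{
    T[] data;

    bool isEmpty() const { return data.length == 0; }
    ref T first() { return data[0]; }      // returns the element by reference
    void next() { data = data[1 .. $]; }   // shrink from the left
}

// Collects the address of every element the range visits.
size_t[] addressesOf(R)(R r)
{
    size_t[] result;
    for (; !r.isEmpty; r.next())
        result ~= cast(size_t) &r.first(); // explicit call, then address-of
    return result;
}

void main()
{
    auto a = [1, 2, 3];
    foreach (i, addr; addressesOf(SliceRange!int(a)))
        writefln("Element %s is sitting at address: 0x%x", i, addr);
}
```

The range is passed by value, so walking it inside addressesOf leaves the caller's view of the slice untouched.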
Sep 09 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 I thought you stated that 'pointers' shouldn't be allowed, only ranges? 
 In general, I agree with that, but I think the ability to use a pointer 
 type instead of ranges has advantages in some cases.

I think there's a little confusion. There's three things: 1. Ranges 2. Iterators 3. Pointers, e.g. the exact address where the object sits in memory

Yes, I have been using the terms iterator and pointer interchangeably, my bad :) I look at pointers as a specialized type of iterator, ones for which only 'dereference' is defined (and on contiguous memory types such as arrays, increment and decrement).
 My address uses 1 and drops 2. You still have access to 3 if you so need.

 void showAddresses(R)(R r)
 {
     for (size_t i = 0; !r.isEmpty; r.next, ++i)
     {
         writeln("Element ", i, " is sitting at address: ", &(r.first));
     }
 }

Let me explain by example:

    HashMap!(uint, myResource) resources;

    ....

    // returns something that allows me to later remove the element
    auto r = resources.find(key);

    useResource(r);

    resources[newkey] = new myResource;

    resources.erase(r);

Now, assuming that adding the new resource rehashes the hash map, what is in r such that it ONLY points to the single resource? A marker saying 'only one element'? Perhaps you just deleted a range you didn't mean to delete, when you only wanted to delete a single resource. Perhaps r is now considered 'invalid'. Granted, this example can be fixed by reordering the lines of code, and perhaps you don't care about the penalty of looking up the key again, but what if I want to save the iterator to the resource somewhere and delete it later in another function? And what if the cost of lookup for removal is not as quick?

I think with a range being the only available 'iterator' type for certain containers may make life difficult for stuff like this. I really don't think iterator is the right term for what I think is needed, what I think is needed is a dumbed down pointer. Something that has one operation -- opStar. No increment, no decrement, just 'here is a reference to this element' that can be passed into the container to represent a pointer to a specific element.

-Steve
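D's built-in associative arrays already hand out something close to the 'dumbed down pointer' described above, so a short sketch may help make the idea concrete. The function name bumpThroughPointer and the key are made up for illustration. The in operator yields a plain pointer to the stored value (or null): it can be dereferenced, but not incremented, and the built-in AA offers no removal through it, which is exactly the gap the post is pointing at. Whether such a pointer stays valid after later insertions is not something this sketch relies on.

```d
import std.stdio;

// Hypothetical helper: looks a key up once and updates the value
// through the pointer that `in` returns.
int bumpThroughPointer()
{
    int[string] hits = ["index.html": 1];

    int* p = "index.html" in hits; // null if the key is absent
    assert(p !is null);

    *p += 41;                      // dereference is the only operation used
    return hits["index.html"];
}

void main()
{
    writeln(bumpThroughPointer());
}
```

Removal still has to go back through the key, which costs a second lookup; a container-level handle type would be needed to avoid that.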
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 I thought you stated that 'pointers' shouldn't be allowed, only ranges? 
 In general, I agree with that, but I think the ability to use a pointer 
 type instead of ranges has advantages in some cases.

I think there's a little confusion. There's three things: 1. Ranges 2. Iterators 3. Pointers, e.g. the exact address where the object sits in memory

Yes, I have been using the terms iterator and pointer interchangeably, my bad :) I look at pointers as a specialized type of iterator, ones for which only 'dereference' is defined (and on contiguous memory types such as arrays, increment and decrement).
 My address uses 1 and drops 2. You still have access to 3 if you so need.

 void showAddresses(R)(R r)
 {
     for (size_t i = 0; !r.isEmpty; r.next, ++i)
     {
         writeln("Element ", i, " is sitting at address: ", &(r.first));
     }
 }

 Let me explain by example:

 HashMap!(uint, myResource) resources;

 ....

 // returns something that allows me to later remove the element
 auto r = resources.find(key);

 useResource(r);

 resources[newkey] = new myResource;

 resources.erase(r);

 Now, assuming that adding the new resource rehashes the hash map, what is
 in r such that it ONLY points to the single resource?  A marker saying
 'only one element'?  Perhaps you just deleted a range you didn't mean to
 delete, when you only wanted to delete a single resource.  Perhaps r is
 now considered 'invalid'.  Granted, this example can be fixed by
 reordering the lines of code, and perhaps you don't care about the
 penalty of looking up the key again, but what if I want to save the
 iterator to the resource somewhere and delete it later in another
 function?  And what if the cost of lookup for removal is not as quick?

 I think with a range being the only available 'iterator' type for certain
 containers may make life difficult for stuff like this.  I really don't
 think iterator is the right term for what I think is needed, what I think
 is needed is a dumbed down pointer.  Something that has one operation --
 opStar.  No increment, no decrement, just 'here is a reference to this
 element'  that can be passed into the container to represent a pointer to
 a specific element.

I understand. My design predicates that you can't model such non-iterable iterators. Either you can use it to move along, in which case ranges will do just fine, or you can't, in which case my design doesn't support it. Note that the STL does not have non-iterable iterators. I think the cases constructed to make them look sensible are tenuous. Andrei
Sep 09 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 Let me explain by example:

 HashMap!(uint, myResource) resources;

 ....

 // returns something that allows me to later remove the element
 auto r = resources.find(key);

 useResource(r);

 resources[newkey] = new myResource;

 resources.erase(r);

 Now, assuming that adding the new resource rehashes the hash map, what is 
 in r such that it ONLY points to the single resource?  A marker saying 
 'only one element'?  Perhaps you just deleted a range you didn't mean to 
 delete, when you only wanted to delete a single resource.  Perhaps r is 
 now considered 'invalid'.  Granted, this example can be fixed by 
 reordering the lines of code, and perhaps you don't care about the 
 penalty of looking up the key again, but what if I want to save the 
 iterator to the resource somewhere and delete it later in another 
 function?  And what if the cost of lookup for removal is not as quick?

 I think with a range being the only available 'iterator' type for certain 
 containers may make life difficult for stuff like this.  I really don't 
 think iterator is the right term for what I think is needed, what I think 
 is needed is a dumbed down pointer.  Something that has one operation --  
 opStar.  No increment, no decrement, just 'here is a reference to this 
 element'  that can be passed into the container to represent a pointer to 
 a specific element.

I understand. My design predicates that you can't model such non-iterable iterators. Either you can use it to move along, in which case ranges will do just fine, or you can't, in which case my design doesn't support it. Note that the STL does not have non-iterable iterators. I think the cases constructed to make them look sensible are tenuous.

Well, STL happens to use iterators to specify elements. It doesn't mean that the iterators are used as iterators in that context, it's just that it's easier to specify one type that does iteration AND represents position :)

For example, std::list defines multiple erase functions:

    iterator erase(iterator first, iterator last);
    iterator erase(iterator position);

In the second case, the iterator need not support incrementing or decrementing (to the user anyway), just referencing. They just used iterator because it's already there :)

But in your proposed scenario, I can't have the second function, only the first. My example shows a case where I'd want the second function. What I basically want is a range type where the upper limit is specified as 'always null', so that iterating the range once always results in an empty range, even if the container has changed topology.

-Steve
Sep 09 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 For example, std::list defines multiple erase functions:
 
 iterator erase(iterator first, iterator last);
 iterator erase(iterator position);
 
 In the second case, the iterator need not support incrementing or 
 decrementing (to the user anyway), just referencing.  They just used 
 iterator because it's already there :)
 
 But in your proposed scenario, I can't have the second function, only the 
 first.  My example shows a case where I'd want the second function.  What I 
 basically want is a range type where the upper limit is specified as 'always 
 null', so that iterating the range once always results in an empty range, 
 even if the container has changed topology.

I understand. That can't be had in my design. You'd have:

    List.Range List.erase(Range toErase);

and you'd model erasure of one element through a range of size one. I understand how that can be annoying on occasion, but I consider that a minor annoyance and do not plan to allow bare iterators for such cases. I think the absence of naked iterators has huge cognitive and safety advantages.

Andrei
Sep 09 2008
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 In Phobos things will be something like:

 List!(Filedesc) lst;
 for (auto r = lst.all; !r.isEmpty; )
 {
     if (should_remove) r = lst.erase(take(1, r));
     else r.next;
 }

It may be better to invent and add some sugar to that, and foreach helps, maybe something like:

    List!(Filedesc) lst;
    foreach (ref r; lst.all) {
        if (predicate(lst.item(r)))
            r = lst.erase(r);
        else
            r = r.next();
    }

I think that code of mine isn't good yet :-)

Bye,
bearophile
Sep 09 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 In Phobos things will be something like:

 List!(Filedesc) lst;
 for (auto r = lst.all; !r.isEmpty; )
 {
     if (should_remove) r = lst.erase(take(1, r));
     else r.next;
 }

 It may be better to invent and add some sugar to that, and foreach helps,
 maybe something like:

 List!(Filedesc) lst;
 foreach (ref r; lst.all) {
     if (predicate(lst.item(r)))
         r = lst.erase(r);
     else
         r = r.next();
 }

 I think that code of mine isn't good yet :-)

Wow, that's risky. foreach bumps r under the hood, so you'll skip some elements. Andrei
Sep 09 2008
prev sibling next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:
 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

Just thinking off the top of my head... How well would the proposal support a producer/consumer work queue, or a signal/slot implementation? A work-queue consumer would view the queue as an infinite range with no end, but the producer would view that same queue as an infinite range with no beginning. And, conceivably, you could have "conduits" between the producer and consumer that would view that same queue as having neither a beginning nor an end. I'm making no judgment about whether the proposal supports or doesn't support that kind of model. I'm just putting the idea out there for consideration. --benji
Sep 09 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

Just thinking off the top of my head... How well would the proposal support a producer/consumer work queue, or a signal/slot implementation? A work-queue consumer would view the queue as an infinite range with no end, but the producer would view that same queue as an infinite range with no beginning. And, conceivably, you could have "conduits" between the producer and consumer that would view that same queue as having neither a beginning nor an end. I'm making no judgment about whether the proposal supports or doesn't support that kind of model. I'm just putting the idea out there for consideration.

I think it's great to bring the idea up for discussion. The current design does not cater to such uses yet. Andrei
Sep 09 2008
prev sibling next sibling parent reply Derek Parnell <derek nomail.afraid.org> writes:
On Mon, 08 Sep 2008 16:50:54 -0500, Andrei Alexandrescu wrote:

 Hello,

By the way, I meant to say this earlier, but I'm very glad that you have presented something for us to discuss with you. I really appreciate this. -- Derek (skype: derek.j.parnell) Melbourne, Australia 10/09/2008 10:18:23 AM
Sep 09 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 On Mon, 08 Sep 2008 16:50:54 -0500, Andrei Alexandrescu wrote:
 
 Hello,

By the way, I meant to say this earlier, but I'm very glad that you have presented something for us to discuss with you. I really appreciate this.

Thanks. chop? Andrei
Sep 09 2008
prev sibling next sibling parent reply Don <nospam nospam.com.au> writes:
Andrei Alexandrescu wrote:
 In most slice-based D programming, using bare pointers is not necessary. 
 Could then there be a way to use _only_ ranges and eliminate iterators 
 altogether? A container/range design would be much simpler than one also 
 exposing iterators.

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

I like this a lot. You've mentioned safety and simplicity, but it also seems to be a more powerful abstraction than STL-style iterators. Consider a depth-first-search over a tree. You have a start point, an end point, and some internal state (in this case, some kind of stack). The interesting thing is that the required internal state _may depend on the values of the start & end points_. STL iterators don't model this very well, since they require a symmetry between iterators. Which creates the difficulty of where the internal state should be stored. You can get away with independent iterators when the relationship between start and end is, "if you perform ++ on start enough times, you reach end". A simple array-style range formalizes this relationship, but the range concept also allows more complex relationships to be expressed. So I think the value of this approach improves for more complicated iterators than the simple ones used by the STL.
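The depth-first case can be sketched as a range that carries its own stack. Everything named here (Node, DepthFirst, preorder) is hypothetical and written for this note; only the primitive names isEmpty, first and next come from the thread. The point of the sketch is that the traversal state lives inside the one range object, so there is no question of which of two iterators should own it.

```d
import std.stdio;

// Hypothetical binary tree node.
struct Node
{
    int value;
    Node* left, right;
}

// A forward range over a tree in pre-order; the stack is the
// internal state that a begin/end iterator pair has no natural home for.
struct DepthFirst
{
    private Node*[] stack;

    this(Node* root) { if (root) stack ~= root; }

    bool isEmpty() const { return stack.length == 0; }
    ref int first() { return stack[$ - 1].value; }

    void next()
    {
        Node* n = stack[$ - 1];
        stack = stack[0 .. $ - 1];       // pop the current node
        if (n.right) stack ~= n.right;   // visited after the left subtree
        if (n.left) stack ~= n.left;     // visited next
    }
}

// Collects the values in the order the range produces them.
int[] preorder(Node* root)
{
    int[] result;
    for (auto r = DepthFirst(root); !r.isEmpty; r.next())
        result ~= r.first;
    return result;
}

void main()
{
    auto tree = new Node(1,
        new Node(2, new Node(4, null, null), null),
        new Node(3, null, null));
    writeln(preorder(tree));
}
```

The stopping criterion is simply an empty stack, which fits the view of a range as a current point plus a condition for when to stop.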
Sep 10 2008
next sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Sep 10, 2008 at 4:52 PM, Don <nospam nospam.com.au> wrote:
 Andrei Alexandrescu wrote:
 In most slice-based D programming, using bare pointers is not necessary.
 Could then there be a way to use _only_ ranges and eliminate iterators
 altogether? A container/range design would be much simpler than one also
 exposing iterators.

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

I like this a lot. You've mentioned safety and simplicity, but it also seems to be a more powerful abstraction than STL-style iterators. Consider a depth-first-search over a tree. You have a start point, an end point, and some internal state (in this case, some kind of stack). The interesting thing is that the required internal state _may depend on the values of the start & end points_.

Or you can think of it as a current point and a stopping criterion.
 STL iterators don't model this very well, since they require a symmetry
 between iterators. Which creates the difficulty of where the internal state
 should be stored.

This is also why I argued in my other post on digitalmars.D that we shouldn't be trying to force the start and end parts of a range to be named with complete symmetry. They have different purposes. There will generally be one current point that is relatively active and one stopping criterion that is relatively fixed.

I would like to take back one thing, though. In another post I said I didn't think using * for getting the current value was a good idea because it would be too hard to grep for. I hadn't been considering the RandomAccess ranges at that time. I can't imagine *not* using operators for the random access ranges -- it just makes too much sense. So if it's ok for random access then it should be ok for forward and bidir to use operators too. BUT, just as with the random access ranges, I don't think there should be any synonyms. Just use * as the only way to get the current element of a range.

I think the desire to have a special "*" shortcut is as clear an indication as any that Andrei in his heart of hearts agrees that the two parts of a range are not really symmetric and should not be treated as such.

--bb
Sep 10 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 4:52 PM, Don <nospam nospam.com.au> wrote:
 Andrei Alexandrescu wrote:
 In most slice-based D programming, using bare pointers is not necessary.
 Could then there be a way to use _only_ ranges and eliminate iterators
 altogether? A container/range design would be much simpler than one also
 exposing iterators.
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

I like this a lot. You've mentioned safety and simplicity, but it also seems to be a more powerful abstraction than STL-style iterators. Consider a depth-first-search over a tree. You have a start point, an end point, and some internal state (in this case, some kind of stack). The interesting thing is that the required internal state _may depend on the values of the start & end points_.

Or you can think of it as a current point and a stopping criterion.

My design intentionally supports forward iteration with a sentinel (e.g. a singly-linked list iterator that only has one node pointer and knows it's done when it hits null) and also forward iteration that holds both limits (e.g. a singly-linked list iterator that holds TWO node pointers and knows it's done when they are equal). That's why forward iterators never support a subrange "up to the beginning of some other range": that would rule out sentinel-terminated iterators.

It intentionally does not support things like zero-terminated strings as random iterators. Why? Because there's no safe way of implementing indexing. I am glad you noticed all this. It's quite subtle.
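
The two flavors of forward iteration described above can be sketched as follows. This is a hedged illustration, not code from the proposal: `Cell`, `ToNull` and `UpTo` are made-up names, and the primitives use the names that later shipped in Phobos (`empty`, `front`, `popFront`).

```d
import std.algorithm.comparison : equal;

// Hypothetical singly-linked list node for illustration.
struct Cell
{
    int value;
    Cell* next;
}

// Sentinel-terminated: a single pointer, done when it reaches null.
struct ToNull
{
    Cell* cur;

    @property bool empty() const { return cur is null; }
    @property int front() const { return cur.value; }
    void popFront() { cur = cur.next; }
}

// Two-limit: holds both ends, done when the two pointers meet.
struct UpTo
{
    Cell* cur, stop;

    @property bool empty() const { return cur is stop; }
    @property int front() const { return cur.value; }
    void popFront() { cur = cur.next; }
}

unittest
{
    auto c = new Cell(3);
    auto b = new Cell(2, c);
    auto a = new Cell(1, b);

    assert(equal(ToNull(a), [1, 2, 3]));
    assert(equal(UpTo(a, c), [1, 2]));   // stops before the cell holding 3
}
```

Only `UpTo` can express "everything up to the beginning of some other range", because `ToNull` has nowhere to store a second limit. That is the asymmetry the paragraph above refers to.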
 STL iterators don't model this very well, since they require a symmetry
 between iterators. Which creates the difficulty of where the internal state
 should be stored.

This is also why I argued in my other post on digitalmars.D that we shouldn't be trying to force the start and end parts of a range to be named with complete symmetry. They have different purposes. There will generally be one current point that is relatively active and one stopping criterion that is relatively fixed.

They do have different purposes and they are asymmetric. But as far as I could tell in reimplementing std.algorithm, that asymmetry does not need to spill into the interface. There is one imperfection: there are forward iterators that can implement subranges "up to the beginning of some other range". They are not categorized in my design.
 I would like to take back one thing, though.  In another post I said I
 didn't think using * for getting the current value was a good idea
 because it would be too hard to grep for.  I hadn't been considering
 the RandomAccess ranges at that time.  I can't imagine *not* using
 operators for the random access ranges -- it just makes too much
 sense.  So if it's ok for random access then it should be ok for
 forward and bidir to use operators too.   BUT, just as with the random
 access ranges, I don't think there should be any synonyms.  Just use *
 as the only way to get the current element of a range.
 
 I think the desire to have a special "*" shortcut is as clear an
 indication as any that Andrei in his heart of hearts agrees that the
 two parts of a range are not really symmetric and should not be
 treated as such.

Walter doesn't like "*" :o(.

Andrei
Sep 10 2008
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 In most slice-based D programming, using bare pointers is not 
 necessary. Could then there be a way to use _only_ ranges and 
 eliminate iterators altogether? A container/range design would be much 
 simpler than one also exposing iterators.

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

I like this a lot. You've mentioned safety and simplicity, but it also seems to be a more powerful abstraction than STL-style iterators. Consider a depth-first-search over a tree. You have a start point, an end point, and some internal state (in this case, some kind of stack). The interesting thing is that the required internal state _may depend on the values of the start & end points_. STL iterators don't model this very well, since they require a symmetry between iterators. Which creates the difficulty of where the internal state should be stored. You can get away with independent iterators when the relationship between start and end is, "if you perform ++ on start enough times, you reach end". A simple array-style range formalizes this relationship, but the range concept also allows more complex relationships to be expressed. So I think the value of this approach improves for more complicated iterators than the simple ones used by the STL.

That's a great insight. Hadn't thought of it!

Andrei
Sep 10 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Sep 10, 2008 at 10:07 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

I like that you are bringing this point up, it is interesting. Note that my API never assumes or requires that there's an actual contiguous block of things underneath. Au contraire, in the I/O case, there's only "the current element" underneath.

Yes, I see that and think it's great. But the point I've been trying to make is that the nomenclature you are using seems to emphasize the contiguous block interpretation, rather than the interpretation as a cursor plus a sentinel. The contiguous block terminology makes good sense for slices, but less for things like trees and unbounded generators and HMMs.

And ok, I do think your incredible shrinking bidirectional range is borked. But other than that, I'm just talking about terminology.

Did you read my posts over on digitalmars.D? I'm not into the "massive thread on d.announce" thing -- makes it too hard to find sub-threads later -- so I started some new sub-threads over there.

--bb
Sep 10 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Sep 10, 2008 at 11:57 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 10:07 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

I like that you are bringing this point up, it is interesting. Note that my API never assumes or requires that there's an actual contiguous block of things underneath. Au contraire, in the I/O case, there's only "the current element" underneath.

Yes, I see that and think it's great. But the point I've been trying to make is that the nomenclature you are using seems to emphasize the contiguous block interpretation, rather than the interpretation as a cursor plus a sentinel. The contiguous block terminology makes good sense for slices, but less for things like trees and unbounded generators and HMMs.

I disagree that isEmpty, first, and next suggest anything near contiguous block. It's just list terminology. Is the list empty? Give me the first element of the list. Advance to the next element in the list.

However, a range isn't, generally speaking, a list. It's a way to traverse or access data that may or may not be a list. For something like an unbounded generator, it is odd to speak of the "first". Such an object has a current value and a "next", but the value you can look at right now is only the "first" by a bit of a terminology stretch.

I think using list terminology unnecessarily confuses the iterating construct that does the accessing with the container being accessed. The range is not the container. The range consists of a place where you are, and a termination condition. The range is not "empty" or "full" because it does not actually contain elements. Sure, if you're dead set on it, you can say that by "empty" we mean that the set of things you would get if you called .next repeatedly is empty, but why? The terminology is just encouraging one to think of a range as a container, when in fact it is not -- it is more like two goal posts. Call it atEnd() or similar and you'll naturally encourage people to think of ranges as references rather than containers.

Similarly, using list terminology led you to "pop". But pop on a range does not actually remove any content. Pop just moves the goal post on one end.

And then there's the various union/diff stuff, which everyone seems to find confusing. I think much of that confusion and mental overhead just goes away if you think of a range as a good old iterator plus a stopping condition.
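
As an illustration of the unbounded-generator case, here is a small sketch of a range that has a current value and a way to advance but no end of its own. The `PowersOfTwo` type is hypothetical; `take` is the real `std.range.take` from the Phobos that eventually shipped, and the primitive names are the shipped ones, not the draft names debated here.

```d
import std.algorithm.comparison : equal;
import std.range : take;

// Hypothetical generator of successive powers of two. It holds a
// current value and knows how to advance, but it never ends.
struct PowersOfTwo
{
    ulong current = 1;

    enum bool empty = false;   // never exhausted: an infinite range
    @property ulong front() const { return current; }
    void popFront() { current *= 2; }   // wraps around after 64 steps
}

unittest
{
    // The stopping condition is supplied from outside, by take.
    assert(equal(PowersOfTwo().take(5), [1, 2, 4, 8, 16]));
}
```

In this sketch the "place where you are" is the `current` field, and the termination condition is attached separately, which fits the reading of a range as an iterator plus a stopping condition.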
 Names for the before and after range operations are still in the air...

 Are you referring to the "range" name itself?

That could be part of the reason for this tendency to try to assign list-like names to the parts. If it were called a "bounded iterator" I think that would better describe the perspective I'm pushing, and naturally lead to choices like "atEnd" instead of "isEmpty".
 And ok, I do think your incredible shrinking bidirectional range is
 borked.  But other than that, I'm just talking about terminology.

How is it borked?

See my post to Digitalmars.D. But upon further reflection I think it may be that it's just not what I would call a bidirectional range. By that I mean it's not good at solving the problems that a bidirectional iterator in C++ is good for. Your bidir range may be useful (though I'm not really convinced that very many algorithms need what it provides) -- but I think one also needs an iterator that's good at what C++'s bidir iterators are good at, i.e. moving the active cursor backwards or forwards. I would call your construct more of a "double-headed" range than a bidirectional one. --bb
Sep 10 2008
prev sibling next sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Sergey Gromov" wrote
 You don't mention here which iterator usage pattern you are trying to
 model with ranges.  I can think of at least two.

 1.  You use a single bidirectional 'center' iterator, center == 5.  As
 one would naturally do with iterators.  Note then that whenever you use
 your center for, say, backward iteration, you reconstruct the actual
 range by calling list.begin.  You do it on each iteration.  No wonder it
 stays valid even if you remove the first element in the meantime: you're
 constructing your range from scratch anyway.  If you want to model this
 pattern with ranges---no problem, keep an empty 'center' range, center
 == (5,5), and reconstruct backward iteration range,

 reverse = all.before(center);

 whenever you need to iterate, then

 center = reverse.end;

 This 'center' range, being slightly less efficient, stays valid and
 becomes invalid in exactly the same conditions as your classical
 iterator.

This is exactly the pattern I use. I agree that your example would solve the problem, I hadn't thought of an empty range to be a cursor, that is clever! The only missing piece to your solution is that I must construct the range after the center range in order to access the value to see where I need to go.

What I see as the biggest downside is the cumbersome and verbose code of moving the 'iterator' around, as every time I want to move forward, I construct a new range, and every time I want to move backwards I construct a new range (and construct a new 'center' afterwards). So a 'move back one' looks like:

    auto before = all.before(center);
    if(!before.isEmpty)
        center = before.pop.end;

And to move forward it's:

    auto after = all.after(center);
    if(!after.isEmpty)
        center = after.next.begin;

To get the value there, I have to do:

    all.after(center).left // or whatever gets decided as the 'get first value of range' member

or if opStar is used:

    *all.after(center);

I much prefer:

    forward:
    if(center != list.end)
        ++center;

    reverse:
    if(center != list.begin)
        --center;

    get value:
    *center;

Especially without all the extra overhead.

I see both methods as being just as open to mistakes, the first more-so, and more difficult to comprehend (at least for me).

Well put. I was trying to come up with a comparison like this last night. But at 3am I was just too tired to pull it off. Great example of the kind of cognitive overload that comes from this kind of scenario.

I really believe the point that ranges are good for std.algorithm is fine. But when people use iterators in code they are often used like the above. This whole shifting back and forth over a linked list was seeming very familiar to me last night and I recalled this morning that I had written some code to implement undo which worked in this very way. The linked list was the undo stack. And undo() moved the current iterator one direction, redo() moved it the other.

So far though we don't seem to be able to come up with a good example of where ranges are weak other than traversing a list back and forth. Note that "move back and forth according to some user input" is clearly not an "algorithm" that would be in std.algorithm. But it does come up often enough in applications. I don't think the fact that it's not strictly an Algorithm-with-a-capital-A makes it any less important.

But it is a little fishy that we can't come up with any other example besides sliding a bead on a wire back and forth.

--bb
Sep 10 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.
 Note that "move back and forth according to some user input" is
 clearly not an "algorithm" that would be in std.algorithm.  But it
 does come up often enough in applications.  I don't think the fact
 that it's not strictly an Algorithm-with-a-capital-A makes it any less
 important.

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

Any structure that might change topology doesn't lend itself well to persistent ranges. Ranges are fine for iterating over a constant version of the container. i.e., if you want to implement a search function, where you are assuming that during the search, the container doesn't change, that should take a range as an argument. But storing references to individual elements for later use (such as O(1) lookup or quick removal), and modifying the container in between getting the reference and using the reference makes it difficult to guarantee the behavior. The only range type that seems like it would be immune to such changes would be the empty range where both ends point to the same element. In fact, this can be reduced to a single reference, just copied for the sake of calling it a 'range'.

Arrays are really a special case where the ranges unequivocally work because once you get a range, all of it is guaranteed not to disappear or change topology. i.e. a slice always contains valid data, no matter what you do to the original array. I think this is the model Andrei is trying to achieve for all containers/iterables, and I think it's just not the same. I think passing the range around as one entity is a very helpful thing, especially for algorithms which generally take ranges in the form of 2 iterators, but I don't think it solves all problems.

-Steve
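
The array special case can be checked with a few lines of D. This is a minimal sketch with a made-up function name, `survivingSlice`; it relies only on built-in slice and garbage-collection behavior.

```d
// Returns a slice taken from an array that is subsequently grown and
// then dropped. The slice keeps its elements alive on its own.
int[] survivingSlice()
{
    int[] original = [1, 2, 3, 4];
    int[] window = original[1 .. 3];   // a range over the elements 2 and 3

    original ~= 5;      // growing the original may or may not reallocate it
    original = null;    // even dropping the original reference is fine

    return window;      // still refers to valid, GC-managed memory
}

unittest
{
    assert(survivingSlice() == [2, 3]);
}
```

What is guaranteed here is that the memory stays valid and the slice keeps its length. Element values can still be overwritten through another slice that shares the same memory, so "valid" does not mean "unchanged". A linked container that frees or relinks nodes gives no comparable guarantee, which is the contrast being drawn above.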
Sep 10 2008
parent Jason House <jason.james.house gmail.com> writes:
Steven Schveighoffer Wrote:

 "Bill Baxter" wrote
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.
 Note that "move back and forth according to some user input" is
 clearly not an "algorithm" that would be in std.algorithm.  But it
 does come up often enough in applications.  I don't think the fact
 that it's not strictly an Algorithm-with-a-capital-A makes it any less
 important.

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

Any structure that might change topology doesn't lend itself well to persistent ranges.

Who says all ranges have to be persistent? Ranges "from here to the end" can be dynamic similar to an iterator. In my mind, the important discussion is what "end" means and to compare when ranges and iterators get invalidated. I also wonder a bit about mixing ranges with non-iterable cursors. Of course, their limited value may not merit the complexity.
 Ranges are fine for iterating over a constant version of 
 the container.  i.e., if you want to implement a search function, where you 
 are assuming that during the search, the container doesn't change, that 
 should take a range as an argument.  But storing references to individual 
 elements for later use (such as O(1) lookup or quick removal), and modifying 
 the container in between getting the reference and using the reference makes 
 it difficult to guarantee the behavior.  The only range type that seems like 
 it would be immune to such changes would be the empty range where both ends 
 point to the same element.  In fact, this can be reduced to a single 
 reference, just copied for the sake of calling it a 'range'.
 
 Arrays are really a special case where the ranges unequivocally work because 
 once you get a range, all of it is guaranteed not to disappear or change 
 topology.  i.e. a slice always contains valid data, no matter what you do to 
 the original array.  I think this is the model Andrei is trying to achieve 
 for all containers/iterables, and I think it's just not the same.  I think 
 passing the range around as one entity is a very helpful thing, especially 
 for algorithms which generally take ranges in the form of 2 iterators, but I 
 don't think it solves all problems.
 
 -Steve 
 
 

Sep 10 2008
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Sergey Gromov" wrote
 You don't mention here which iterator usage pattern you are trying to
 model with ranges.  I can think of at least two.

 1.  You use a single bidirectional 'center' iterator, center == 5.  As
 one would naturally do with iterators.  Note then that whenever you use
 your center for, say, backward iteration, you reconstruct the actual
 range by calling list.begin.  You do it on each iteration.  No wonder it
 stays valid even if you remove the first element in the meantime: you're
 constructing your range from scratch anyway.  If you want to model this
 pattern with ranges---no problem, keep an empty 'center' range, center
 == (5,5), and reconstruct backward iteration range,

 reverse = all.before(center);

 whenever you need to iterate, then

 center = reverse.end;

 This 'center' range, being slightly less efficient, stays valid and
 becomes invalid in exactly the same conditions as your classical
 iterator.

This is exactly the pattern I use. I agree that your example would solve the problem, I hadn't thought of an empty range to be a cursor, that is clever! The only missing piece to your solution is that I must construct the range after the center range in order to access the value to see where I need to go.

What I see as the biggest downside is the cumbersome and verbose code of moving the 'iterator' around, as every time I want to move forward, I construct a new range, and every time I want to move backwards I construct a new range (and construct a new 'center' afterwards). So a 'move back one' looks like:

    auto before = all.before(center);
    if(!before.isEmpty)
        center = before.pop.end;

And to move forward it's:

    auto after = all.after(center);
    if(!after.isEmpty)
        center = after.next.begin;

Maybe all we need to neatly support this sliding cursor idiom is just some functions in the std lib:

    bool cursorRetreat(R)(R all, ref R center) {
        auto before = all.before(center);
        if(!before.isEmpty) {
            center = before.pop.end;
            return true;
        }
        return false;
    }

    bool cursorAdvance(R)(R all, ref R center) {
        auto after = all.after(center);
        if(!after.isEmpty) {
            center = after.next.begin;
            return true;
        }
        return false;
    }
 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first value
 of range' member
 or if opStar is used:

 *all.after(center);

Why is all that necessary? Can't you just do a *center?
 I much prefer:

 forward:
 if(center != list.end)
    ++center;

 reverse:
 if(center != list.begin)
   --center;

 get value:
 *center;

With the functions it becomes

    forward:
    cursorAdvance(list,center);

    reverse:
    cursorRetreat(list,center);

    get value:
    *center -- this works doesn't it?
 Especially without all the extra overhead

Since we haven't really come up with any examples where the speed with which you can slide back and forth would make a whit of difference, perhaps the extra overhead is a non-issue.
 I see both methods as being just as open to mistakes, the first more-so, and
 more difficult to comprehend (at least for me).

I'm optimistic that this use case can also be covered by some well chosen std library functions, similar to the above. --bb
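
For readers who want to try the sliding-cursor idiom, here is a self-contained sketch. Everything in it is an assumption made for illustration: `Span` is an array-backed stand-in for the list ranges under discussion, and its `before`, `after`, `isEmpty` and `first` members only imitate the draft API from this thread. None of these names come from a real library.

```d
// Hypothetical array-backed stand-in for the range API sketched in
// this thread; a real list range would hold node pointers instead.
struct Span
{
    int[] store;     // underlying elements
    size_t lo, hi;   // half-open window [lo, hi) into store

    bool isEmpty() const { return lo == hi; }
    int first() const { return store[lo]; }

    // The part of this span that lies before / after the cursor c.
    Span before(Span c) { return Span(store, lo, c.lo); }
    Span after(Span c) { return Span(store, c.hi, hi); }
}

// Move the empty cursor range one element towards the end of all.
bool cursorAdvance(Span all, ref Span center)
{
    auto rest = all.after(center);
    if (rest.isEmpty) return false;
    center = Span(all.store, rest.lo + 1, rest.lo + 1);
    return true;
}

// Move the empty cursor range one element towards the start of all.
bool cursorRetreat(Span all, ref Span center)
{
    auto prior = all.before(center);
    if (prior.isEmpty) return false;
    center = Span(all.store, prior.hi - 1, prior.hi - 1);
    return true;
}

unittest
{
    auto all = Span([10, 20, 30], 0, 3);
    auto center = Span(all.store, 0, 0);    // cursor in front of the 10

    assert(all.after(center).first == 10);
    assert(cursorAdvance(all, center));
    assert(all.after(center).first == 20);
    assert(cursorRetreat(all, center));
    assert(!cursorRetreat(all, center));    // already at the start
}
```

In this sketch the cursor is an empty range, so there is nothing to read from it directly; the value "at" the cursor is obtained through `all.after(center).first`. Whether a real design would let `*center` work is exactly the open question in the exchange above.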
Sep 10 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Sep 13, 2008 at 7:28 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Sat, Sep 13, 2008 at 3:21 AM, Denis Koroskin <2korden gmail.com> wrote:
 On Fri, 12 Sep 2008 20:10:28 +0400, Fawzi Mohamed <fmohamed mac.com> wrote:

 On 2008-09-12 17:48:02 +0200, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 foreach(i,j,k;1..$,iterJ,a.all){
    //...
 }



Foreach over multiple ranges in parallel is great, but it is quite hard to match key/value to the ranges in your example, because they are far from each other, especially if ranges are evaluated in some (possibly long) expressions. I prefer the following syntax more:

    foreach (key0, value0 : range0; value1 : range1; ... ) {
        // or something like this
    }

This way key/value and range are close to each other and you don't need to move your eyes back and forth to understand which range a given value corresponds to.

Err, you just repeated exactly what he said.

Ok sorry I do see a difference now, but you quoted the wrong one of Fawzi's, you should have quoted this one:

    foreach(i;1..$; j; iterJ; k,l; multiIter){
    }

Which I think falls into your "or something like this" category.

--bb
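
None of the foreach syntaxes proposed above can be tried as written. As a point of comparison, the Phobos that later shipped offers parallel iteration as a library function, `std.range.lockstep`, rather than as new syntax. The sketch below uses it; the `dot` function is a made-up example.

```d
import std.range : lockstep;

// Dot product computed by walking two arrays in parallel.
int dot(int[] xs, int[] ys)
{
    int sum = 0;
    foreach (x, y; lockstep(xs, ys))
        sum += x * y;
    return sum;
}

unittest
{
    assert(dot([1, 2, 3], [4, 5, 6]) == 32);   // 4 + 10 + 18
}
```

Here the loop variables are still listed apart from the ranges they draw from, so it does not address the readability concern about keeping each value next to its range.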
Sep 12 2008