digitalmars.D.announce - RFC on range design for D2

Andrei Alexandrescu (57/57) Sep 08 2008 Hello,

Jarrett Billingsley (3/57) Sep 08 2008 I like!
BCS (3/9) Sep 08 2008 First of all, I /Like/ opApply. I know there are issues with it so I'd r...

Andrei Alexandrescu (4/15) Sep 08 2008 We all like the way it looks. Ranges will preserve its syntax within a

BCS (2/21) Sep 08 2008 I was referring to the implementation as visible from the called code's ...

Walter Bright (2/4) Sep 09 2008 opApply isn't going to go away, it will still be there as an alternative...

Andrei Alexandrescu (8/13) Sep 09 2008 I disagree with that as I think that would be one perfect place to

Extrawurst (8/25) Sep 09 2008 I agree but i am worried that wont happen. D gets more and more polluted...

Denis Koroskin (6/28) Sep 09 2008 I would also add:

Dejan Lekic (2/4) Sep 09 2008 I agree with this. It is a good idea, and it is explained in a very good...

Manfred_Nowak (12/13) Sep 09 2008 Might be true only, if `s' equals `r'.

Bill Baxter (12/21) Sep 09 2008 So I think you can put that in other words by saying that they could

BCS (3/11) Sep 11 2008 Rock on!

Sergey Gromov (13/22) Sep 08 2008 opApply() wasn't my hero either. :) Your article really looks like

Andrei Alexandrescu (96/119) Sep 08 2008 Thanks! Fixed.

Jarrett Billingsley (8/21) Sep 08 2008 Quick question about this one -- how will iterators get foreach

Andrei Alexandrescu (10/30) Sep 08 2008 Great question. We'll go with structs and duck typing (why the heck

Brad Roberts (2/8) Sep 08 2008 Probably rhetorical, but I can't help myself: If it walks like a duck

Walter Bright (2/4) Sep 09 2008 And if it floats and has a long nose, it's a witch!! Burn her!!!

Steven Schveighoffer (4/8) Sep 09 2008 And she's got a wart! :)

Alix Pexton (7/10) Sep 09 2008 If it were the norm to call it "structural typing" and someone asked why...

Andrei Alexandrescu (4/19) Sep 09 2008 Yeah, I know. It's a good point. Yet I'm somehow weary of cultural

Robert Jacques (15/39) Sep 08 2008 I'd recommend a more clear cut example. Three of the ranges are very wel...

Andrei Alexandrescu (25/80) Sep 08 2008 There are two problems with the view that a slice of a matrix is also a
Derek Parnell (12/13) Sep 09 2008 I see it a little differently. To me, a slice of a matrix is a set of da...

Sergey Gromov (32/85) Sep 09 2008 I really don't like to have basic language constructs implemented as

Andrei Alexandrescu (21/105) Sep 09 2008 Well I guess we disagree on a number of issues here. The problem with

Sergey Gromov (23/70) Sep 10 2008 I think I've got your point here. D is not Python, it shouldn't do

Leandro Lucarella (13/22) Sep 09 2008 Why is so bad that the program crashes if you do something wrong? For ho...

superdan (3/20) Sep 09 2008 such is the peril of gc. clearly meshing scoping with gc ain't gonna be ...

Leandro Lucarella (21/28) Sep 09 2008 I was talking about logical[1] memory leaks, wich are possible even with

downs (13/45) Sep 09 2008 I don't think this is a good thing, for reasons similar to the Error/Exc...

Andrei Alexandrescu (10/80) Sep 09 2008 I hear you. I brought up the same exact design briefly with Bartosz last...

downs (3/87) Sep 09 2008 For numbers, it should probably be "the same as .init". Not every error ...

Andrei Alexandrescu (7/12) Sep 09 2008 That further increases the cognitive cost of T.fail.

Walter Bright (4/13) Sep 09 2008 The T.init value should be that. That's why, for floats, float.init is a...

Benji Smith (14/28) Sep 09 2008 I don't think values necessarily have to be initialized to an invalid

JAnderson (4/38) Sep 09 2008 I agree. I use the 0xcdcdcdcd and 0xfefefefe provided by MSVC a lot to

Manfred_Nowak (7/8) Sep 10 2008 Why can one then define

Brad Roberts (6/15) Sep 10 2008 That just means don't initialize, leaving any instances with random

Manfred_Nowak (16/17) Sep 11 2008 I know what the semantics of `T= void' is supposed to be, but your

Sergey Gromov (13/23) Sep 10 2008 r.before(s)

Andrei Alexandrescu (8/33) Sep 10 2008 Cool! I was thinking of something along the same lines through the

Denis Koroskin (38/95) Sep 08 2008 1) There is a typo:

Andrei Alexandrescu (33/73) Sep 08 2008 I agree. Next was a natural choice. I stole pop from Perl. Any symmetric...

Robert Jacques (13/29) Sep 08 2008 I'd warn that changing away from ptr+length would create logical

Andrei Alexandrescu (12/48) Sep 08 2008 Multidimensional slicing can be implemented with staggered indexing:

Robert Jacques (36/82) Sep 08 2008 An ND array is typically defined as a fat pointer like so:

Andrei Alexandrescu (16/110) Sep 09 2008 Hmmm, I see. That could become a problem if we wanted lower-dimensional

Robert Jacques (18/113) Sep 09 2008 What I meant, is that the type is fundamentally not designed to exist by...

Oskar Linde (31/66) Sep 11 2008 First, let me add my support for the range proposal. It is in line with

Denis Koroskin (47/117) Sep 09 2008 1) R.left += n / R.right -= n

Sergey Gromov (4/25) Sep 09 2008 Now you stepped onto your own landmine. :) "R.left-=n" extends the

Andrei Alexandrescu (8/34) Sep 09 2008 Oh I thought it's R.right -= n.

Sergey Gromov (2/37) Sep 09 2008 It was obviously a typo, but a very dangerous typo indeed.

Sean Kelly (15/31) Sep 08 2008 Yup. This is why I implemented all of Tango's algorithms specifically

Andrei Alexandrescu (9/47) Sep 08 2008 That's great to hear, but I should warn you that moving from arrays to

Sean Kelly (10/26) Sep 08 2008 I'll admit that I find some of the features provides to be

Andrei Alexandrescu (38/64) Sep 08 2008 Great questions. I don't recall having needed to sort a list lately, but...

Sean Kelly (22/92) Sep 09 2008 Ah, so it's a bit like partition and select. I use the two of those
Bruno Medeiros (20/33) Sep 25 2008 Well, didn't you find a "real problem" right there (and also a very

Andrei Alexandrescu (7/38) Sep 25 2008 This hardly characterizes or answers my point. Of course wherever

Bruno Medeiros (6/47) Sep 25 2008 But that's quite true nonetheless. :/

Manfred_Nowak (22/23) Sep 08 2008 1) Example in "4. Bidirectional range"

Andrei Alexandrescu (15/43) Sep 08 2008 There are numerous collections and ranges to be defined, of course. The

Manfred_Nowak (18/23) Sep 08 2008 You are right. A reversable range can be built with the designed

Andrei Alexandrescu (7/31) Sep 08 2008 Indeed Retro was provided as a mere example and has no special status. I...

Manfred_Nowak (14/14) Sep 09 2008 Andrei Alexandrescu wrote:

Lionello Lunesu (21/25) Sep 08 2008 This is just awesome. Thank you for tackling this issue.

Andrei Alexandrescu (82/113) Sep 09 2008 (This is an older message that somehow didn't make it to the group.

superdan (28/39) Sep 09 2008 this is really kool n the gang. there's a sore point tho. if i wanna rea...

Andrei Alexandrescu (52/101) Sep 09 2008 This is a great point. Unfortunately that won't quite work properly.

Sean Kelly (19/30) Sep 09 2008 And I suppose you don't want to return a const reference from first()

Andrei Alexandrescu (8/40) Sep 09 2008 Yes, exactly. Also consider a user that inspects lines in a file, and

Lionello Lunesu (9/54) Sep 09 2008 Can't this be done by creating different ranges? I mean, trying to find ...

Andrei Alexandrescu (7/73) Sep 09 2008 I don't mind implementing different suckers. :o) The problem is that the

Leandro Lucarella (20/37) Sep 09 2008 I think STL refugees can deal with it. I think there is no point on keep

Andrei Alexandrescu (17/40) Sep 09 2008 I agree. I just don't think that choosing one name over a synonym name

Leandro Lucarella (9/23) Sep 09 2008 Much better, thank you! =)

Benji Smith (7/15) Sep 09 2008 Maybe:

Manfred_Nowak (6/8) Sep 09 2008 clapping hands.

Andrei Alexandrescu (6/12) Sep 09 2008 Walter would love that.

Andrei Alexandrescu (5/19) Sep 09 2008 I just discovered a problem with that. hasNext implies I'm supposed to

Benji Smith (17/38) Sep 09 2008 I see where you're coming from, because the range shrinks itself
Denis Koroskin (18/34) Sep 10 2008 I usually implement my iterators as follows:

Bill Baxter (5/8) Sep 09 2008 The text says that s should be a subrange of r, but the drawing shows

Andrei Alexandrescu (3/11) Sep 09 2008 I am considering relaxing the requirements.

Derek Parnell (10/16) Sep 09 2008 LOL ... I was just thinking to myself ... "what's wrong with First and

Andrei Alexandrescu (7/23) Sep 09 2008 Previous is confusing as it suggest I'm moving back where I came from.

Derek Parnell (17/23) Sep 09 2008 And I'm sure you really mean ...

Benji Smith (2/7) Sep 09 2008 Consume?

Leandro Lucarella (22/42) Sep 10 2008 shrink(int n = 1)?

Sergey Gromov (17/36) Sep 10 2008 I remember that shift() was a method to remove first element from an

David Gileadi (2/8) Sep 10 2008 Perhaps reduce() instead of pop() for moving the end?

Andrei Alexandrescu (6/15) Sep 10 2008 I love reduce! Thought of it as well. Unfortunately the term is loaded

Sergey Gromov (4/20) Sep 10 2008 I thought of next/shrink as well, but they look asymmetrical, and also

Lionello Lunesu (11/22) Sep 09 2008 So isEmpty is optional for input ranges? This does not actually match yo...

Andrei Alexandrescu (7/34) Sep 09 2008 I think I'd want to make it nonoptional such that people wanting real

Michel Fortin (37/40) Sep 08 2008 That looks great. I want to suggest renaming a few functions to make

Andrei Alexandrescu (12/60) Sep 08 2008 I like the alternate names quite some. One thing, however, is that head

Manfred_Nowak (7/8) Sep 08 2008 r.tillBeg(s), r.tillEnd(s),

Andrei Alexandrescu (4/10) Sep 08 2008 Sounds good! Walter doesn't like abbreviations, so probably *Begin would...

Bill Baxter (10/20) Sep 08 2008 But till and until are synonyms. They both sound like iteration.

Andrei Alexandrescu (5/24) Sep 09 2008 These are the names that I find most appealing at the moment. They

Fawzi Mohamed (51/51) Sep 09 2008 It is a nice idea to redesign the iterator and range.

Andrei Alexandrescu (38/92) Sep 09 2008 Yes. Consider findAdjacent that finds two equal adjacent elements in a

Fawzi Mohamed (56/165) Sep 09 2008 ok I understand, indeed it is useful to have non copyable "unique"

Andrei Alexandrescu (21/72) Sep 09 2008 I am getting seriously confused by this subthread. So are you saying

Fawzi Mohamed (55/131) Sep 10 2008 It desn't seem to difficult to me, just look at the code, they are

Bill Baxter (22/63) Sep 10 2008 Your code shows that you can successfully iterate over the same

Andrei Alexandrescu (26/45) Sep 10 2008 I like that you are bringing this point up, it is interesting. Note that...

Bill Baxter (14/24) Sep 10 2008 Yes, I see that and think it's great. But the point I've been trying

Andrei Alexandrescu (8/28) Sep 10 2008 I disagree that isEmpty, first, and next suggest anything near

Bill Baxter (40/73) Sep 10 2008 However a range isn't, generally speaking, a list. It's a way to

Andrei Alexandrescu (16/76) Sep 10 2008 Agreed. The problem with "current" instead of "first" is that there's no...

Bill Baxter (19/61) Sep 10 2008 That's just a mutable termination condition. Still fits my description.

Andrei Alexandrescu (22/37) Sep 10 2008 hasNext and hasPrev are not orthogonal and add unnecessarily

Andrei Alexandrescu (19/28) Sep 10 2008 It's good. I proved that constructively for std.algorithm, which of

Steven Schveighoffer (28/36) Sep 10 2008 Any iterative algorithm where the search might go up or down might be a

Andrei Alexandrescu (15/52) Sep 10 2008 Of course it does. You just remember the leftmost point in time you need...

Bill Baxter (13/40) Sep 10 2008 Cognitive load...

Andrei Alexandrescu (20/33) Sep 10 2008 A function only needing one iterator is a chymera. It can't move it any

Bill Baxter (9/30) Sep 10 2008 Oops. I meant two ranges not two iterators there. There are no

Steven Schveighoffer (27/78) Sep 10 2008 Perhaps not, I haven't used the ranges as you have implemented them, nor...

Andrei Alexandrescu (98/136) Sep 10 2008 No, this is incorrect. I don't "have to" at all. I could define the

Steven Schveighoffer (55/155) Sep 10 2008 A range or iterator that becomes undefined when adding an element to a

Fawzi Mohamed (54/54) Sep 10 2008 I am sorry I hadn't the time t fully follow the discussion, but I took
Sergey Gromov (21/70) Sep 10 2008 You don't mention here which iterator usage pattern you are trying to

Andrei Alexandrescu (5/79) Sep 10 2008 I'm acquiring the nagging feeling that Sergey understands ranges better

Sergey Gromov (4/7) Sep 10 2008 Thank you, and welcome! ;)

Derek Parnell (7/16) Sep 10 2008 Oh boy! We must put an end to that otherwise we might all be out of a jo...

Steven Schveighoffer (37/53) Sep 10 2008 This is exactly the pattern I use. I agree that your example would solv...

Bill Baxter (23/80) Sep 10 2008 Well put. I was trying to come up with a comparison like this last

Benji Smith (16/21) Sep 10 2008 I dunno about that.

Bill Baxter (19/42) Sep 10 2008 Iterators for maps, sets, bags, graphs, trees are usually either for

Steven Schveighoffer (21/30) Sep 10 2008 Any structure that might change topology doesn't lend itself well to

Bill Baxter (22/54) Sep 10 2008 But they often don't lend themselves to iterators either.
Jason House (5/42) Sep 10 2008 Who says all ranges have to be persistant? Ranges "from here to the end"...

Bill Baxter (36/93) Sep 10 2008 Maybe all we need to neatly support this sliding cursor idiom is just
Bill Baxter (10/18) Sep 10 2008 Oh, I get it. It's empty. Duh.

Steven Schveighoffer (21/40) Sep 10 2008 That is all fine and dandy in the world of "I don't care how well my

Bill Baxter (30/72) Sep 10 2008 Ok, but I have yet to hear an actual use case that demands blazing

Benji Smith (7/16) Sep 10 2008 Oh!! I thought of one!!

Bill Baxter (15/31) Sep 10 2008 Good call. I was about to post something mentioning that Turing

Benji Smith (25/46) Sep 11 2008 Actually, Perl 6 will (assuming they ever finish it) finally allow regex...

Sergey Gromov (4/18) Sep 11 2008 It seems to me like a misuse of ranges. Do you really want to iterate

Benji Smith (21/39) Sep 11 2008 Well, no. Not when you put it like that.

Sergey Gromov (14/58) Sep 11 2008 Well, if you get an object out of there on every step, and that object

Andrei Alexandrescu (28/87) Sep 11 2008 I agree 100%, and also with Sergey's other post that some abstractions

Benji Smith (8/16) Sep 11 2008 I agree.

Russell Lewis (9/26) Sep 11 2008 They do backtracking, which is different than iterating backward. I

Steven Schveighoffer (31/36) Sep 11 2008 I can define an iterator and it doesn't mean that it makes ranges any le...

Bill Baxter (63/89) Sep 11 2008 I think one thing to consider is what it will take to make a new

Steven Schveighoffer (12/20) Sep 12 2008 Bill, thanks so much for explaining it like this, I really agree with wh...

Sergey Gromov (6/27) Sep 12 2008 If you ask me, I think iterators AKA pointers into containers should be

Andrei Alexandrescu (5/32) Sep 12 2008 That's also a reason why std.stdio must wrap FILE* into a safe struct.

Andrei Alexandrescu (9/30) Sep 12 2008 You are right. Iterators can definitely be handy in many situations, and...
Fawzi Mohamed (33/33) Sep 12 2008 I like the new proposal much more than the first.

Andrei Alexandrescu (14/50) Sep 12 2008 Comparing for equality of heads is very important. For now you can

Fawzi Mohamed (18/72) Sep 12 2008 nice I hadn't thought about this

Denis Koroskin (12/18) Sep 12 2008 Foreach over multiple ranges in paraller is great, but it is quite hard ...

Bill Baxter (3/23) Sep 12 2008 Err, you just repeated exactly what he said.
Bill Baxter (7/34) Sep 12 2008 Ok sorry I do see a difference now, but you quoted the wrong one of

Sergey Gromov (33/73) Sep 11 2008 Yes, these are valid points, I completely agree. But there are also

Andrei Alexandrescu (5/15) Sep 10 2008 Didn't mean to. You are making great points, and I hope (without being

Steven Schveighoffer (6/21) Sep 10 2008 Didn't know that :) Sometimes when someone is not aware of a quote/joke...

Bill Baxter (83/108) Sep 10 2008 Here's one from DinkumWare's :

Andrei Alexandrescu (13/24) Sep 10 2008 I agree, and I agreed in the draft on ranges, that code using ranges can...
Sergey Gromov (7/58) Sep 10 2008 They're using the ability to ++ and -- to avoid post-decrement at any

Andrei Alexandrescu (5/70) Sep 10 2008 Got to say I'm pretty much in awe :o). But (without thinking much about

Sergey Gromov (6/76) Sep 10 2008 They originally use backward copying because they don't know where the

Andrei Alexandrescu (3/75) Sep 10 2008 One up for ranges then. Whew. I was due for it :o).

Sean Kelly (6/40) Sep 10 2008 I'm not sure the above is correct. It should return after the copy is

Sergey Gromov (10/26) Sep 10 2008 Of course there should be return statements, thank you. I've never

Andrei Alexandrescu (7/33) Sep 10 2008 Speaking of copying, C++'s std::copy and friends have been under
Sean Kelly (3/29) Sep 10 2008 Oops, of course.

Sergey Gromov (5/88) Sep 10 2008 This is a bit more complex. If only basic operations on ranges are

Fawzi Mohamed (13/80) Sep 10 2008 yes you are right this operation on the simplest iterators cannot be

JAnderson (19/25) Sep 09 2008 Hi Andrei,

Andrei Alexandrescu (9/46) Sep 10 2008 There's no regression. There are containers and ranges. Containers have

JAnderson (34/86) Sep 10 2008 Just to be clear then. Say you write something that works on arrays and...
JAnderson (14/66) Sep 10 2008 I'm not sure that range operations need to spill over. I was thinking

Michel Fortin (13/21) Sep 08 2008 I'm not sure I like this because you have to be careful when reversing
Bill Baxter (31/35) Sep 08 2008 Another idea might be go back to Intro to Algebra with the "FOIL"

Michel Fortin (25/35) Sep 08 2008 Yeah, pehaps. I mostly wanted a verb, not "frontNext" which seems
Leandro Lucarella (10/13) Sep 09 2008 What about head/tail? You certainly won't confuse STL refugees and

Andrei Alexandrescu (4/11) Sep 09 2008 You'll sure confuse the latter. To them, tail is everything except the

Leandro Lucarella (14/24) Sep 09 2008 You are right =/

Lionello Lunesu (2/3) Sep 08 2008 I think 'tail' would be better as the opposite of 'head'.

Jason House (13/82) Sep 08 2008 Left and right Union and diff seem awkward. The kind of thing a rare use...

Andrei Alexandrescu (4/20) Sep 09 2008 Wow. I could never type as much on a cell. Thanks for the suggestion. I

Jason House (2/4) Sep 09 2008 Is cute a bad thing? If no better suggestions were made, I hoped it mig...

Bill Baxter (5/9) Sep 08 2008 Small typo:

Andrei Alexandrescu (3/14) Sep 09 2008 How do I fix it?

Bill Baxter (5/20) Sep 09 2008 Just make it "to many more algorithms" instead of "to much more many

Andrei Alexandrescu (3/20) Sep 09 2008 Thanks, fixed.

bearophile (17/18) Sep 09 2008 As you see it's a matter of balance.

Andrei Alexandrescu (76/156) Sep 09 2008 Stop right there. This is just a presupposition. I think I could say I

superdan (2/7) Sep 09 2008 u could've stopped readin'. general comments without experience are oxpo...

Walter Bright (3/5) Sep 09 2008 The thing about iterators and collections is that they look so simple,

Walter Bright (3/9) Sep 09 2008 And when a correct design is devised, the mark of genius is everyone

Benji Smith (8/18) Sep 09 2008 Also: everyone will measure the quality of that design using a different...

Benji Smith (45/49) Sep 09 2008 I'd also add this:

Andrei Alexandrescu (4/73) Sep 09 2008 Hmm, HMMs :o). If you could do it with Java's hasNext and next, you can

Benji Smith (13/18) Sep 09 2008 Oh. Okay. Good to know :)

Benji Smith (10/11) Sep 09 2008 Along these same lines, while D is still young, the documentation is

Andrei Alexandrescu (5/24) Sep 09 2008 This is a valid concern. The sample ranges I have coded so far look
Andrei Alexandrescu (64/83) Sep 09 2008 Speaking of examples and readability, and to tie this with the

bearophile (5/12) Sep 10 2008 That module has about 380 lines of code of benchmarks (after few hundred...

bearophile (90/90) Sep 10 2008 Few benchmarks, appending ints, note this is a worst-case situation (oth...

Andrei Alexandrescu (22/33) Sep 10 2008 That's odd. The array appender should never, by definition, do

bearophile (8/12) Sep 10 2008 I have kept an eye on such things too. Note that benchmarks are generall...

Andrei Alexandrescu (16/34) Sep 10 2008 But it's not abstract, and besides my benchmarks do support my

superdan (38/79) Sep 10 2008 arrayappender is simple as dumb. compare and contrast with the ginormous...

Andrei Alexandrescu (9/96) Sep 10 2008 Thanks! Can't hurt. Guess I'll integrate your code if you don't mind.

bearophile (5/9) Sep 10 2008 You can't compare benchmarks of two different compilers.

bearophile (4/5) Sep 10 2008 Sorry, ignore what I have written, I'm a little nervous...

Andrei Alexandrescu (10/14) Sep 10 2008 I think I've unnecessarily overstated my case, which has put both of us

Sean Kelly (4/19) Sep 10 2008 For the record, in-place growth has been in D1 for as long as it's been

Sean Kelly (5/12) Sep 10 2008 Arrays larger than 4k grow logarithmically, smaller than 4k they grow

Andrei Alexandrescu (74/86) Sep 10 2008 Yes, but with in-place expansion, effective growth stays exponential

dsimcha (4/87) Jan 08 2009 One definite problem that I've just realized is that there's no putNext(...

bearophile (5/8) Jan 08 2009 Just add a simple method overload for that purpose, it's easy enough to ...
jq (6/17) Jan 08 2009 I have thoughts:

Andrei Alexandrescu (7/27) Jan 09 2009 Length sounds good. The other two I'm more hesitant about because

Andrei Alexandrescu (4/8) Jan 09 2009 My current codebase has that. It's about time I commit.

Ary Borenszweig (13/25) Sep 09 2008 It looks very nice, though I have a few questions:

Andrei Alexandrescu (32/59) Sep 09 2008 A slice is a range alright without any extra adaptation. It has some

Lars Ivar Igesund (8/10) Sep 09 2008 Aren't slices const/readonly/whatnot and thus ~= not possible without

Andrei Alexandrescu (25/32) Sep 09 2008 Well there's no change in semantics of slices (meaning T[]) between D1

Lars Ivar Igesund (9/50) Sep 09 2008 No, I actually referred to what you say below. My point was that ~= is a...
Sean Kelly (15/52) Sep 09 2008 I do think it's a fair point that ~= could be considered an operation

Andrei Alexandrescu (19/25) Sep 09 2008 I couldn't imagine it put any better. Maybe time has come for starting

Oskar Linde (16/45) Sep 11 2008 I'm very glad you share my thoughts on ~=. The current D T[] is a tool

bearophile (5/6) Sep 11 2008 Appending to the built-in dynamic arrays is a fundamental operation (I u...

Sean Kelly (7/12) Sep 11 2008 I'd think that adding a capacity field should actually speed up append

bearophile (6/12) Sep 11 2008 Oh, right, no need to a separate bit for tagging then, is the value capa...

Sergey Gromov (18/32) Sep 11 2008 It just doesn't work. Arrays are structs passed by value. If you pass

Oskar Linde (23/45) Sep 11 2008 I agree that it is a fundamental operation, and my code contains

Bruno Medeiros (5/46) Sep 25 2008 Cool, good to see this is going to be taken care of, it is a horrible wa...

Derek Parnell (11/12) Sep 09 2008 Is "left" a "movement in a specific direction" as in "go left at the nex...

Steven Schveighoffer (6/14) Sep 09 2008 It means 'left-most element in the range'. It gets you the first elemen...

Andrei Alexandrescu (7/24) Sep 09 2008 Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't

Derek Parnell (17/21) Sep 09 2008 Yes, I understand this. I am just raking old coals to stress the importa...
Bill Baxter (23/28) Sep 09 2008 Also the direction in which D code is written does not depend on the

Andrei Alexandrescu (3/16) Sep 09 2008 Yep. I needn't google any farther than my wife :o).

Bruno Medeiros (5/6) Sep 25 2008 Agh, yuck! :(

Derek Parnell (10/15) Sep 09 2008 Thanks. I was playing at "devil's advocate" as my real point was that

bearophile (19/28) Sep 09 2008 I know that the STL is a highly refined piece of technology; after readi...

Andrei Alexandrescu (38/129) Sep 09 2008 Thanks for your continued comments.

Steven Schveighoffer (32/32) Sep 09 2008 "Andrei Alexandrescu" wrote

Andrei Alexandrescu (29/61) Sep 09 2008 I'm sure you know and imply this, but just to clarify for everybody:

Steven Schveighoffer (37/98) Sep 09 2008 I agree. There are cases where just an iterator is necessary. For exam...

Andrei Alexandrescu (14/18) Sep 09 2008 I think there's a little confusion. There's three things:

Steven Schveighoffer (30/47) Sep 09 2008 Yes, I have been using the terms iterator and pointer interchangably, my...

Andrei Alexandrescu (8/67) Sep 09 2008 I understand. My design predicates that you can't model such

Steven Schveighoffer (17/55) Sep 09 2008 Well, STL happens to use iterators to specify elements. It doesn't mean...

Andrei Alexandrescu (9/24) Sep 09 2008 I understand. That can't be had in my design. You'd have:

bearophile (19/20) Sep 09 2008 List!(Filedesc) lst;

Andrei Alexandrescu (4/26) Sep 09 2008 Wow, that's risky. foreach bumps r under the hood, so you'll skip some

Benji Smith (13/18) Sep 09 2008 Just thinking off the top of my head...

Andrei Alexandrescu (4/25) Sep 09 2008 I think it's great to bring the idea up for discussion. The current

bearophile (19/27) Sep 09 2008 For all I know, much of Walter's claim to fame with D is that he proved ...

Andrei Alexandrescu (4/7) Sep 09 2008 Not at all. C++ has much gratuitous complexity and just elementary
Walter Bright (27/35) Sep 10 2008 Back when I worked for Boeing, I had a discussion with some of the

JAnderson (25/35) Sep 11 2008 IMHO

Derek Parnell (8/9) Sep 09 2008 By the way, I meant to say this earlier, but I'm very glad that you have

Andrei Alexandrescu (4/10) Sep 09 2008 Thanks.

Don (16/21) Sep 10 2008 I like this a lot. You've mentioned safety and simplicity, but it also

Bill Baxter (21/37) Sep 10 2008 This is also why I argued in my other post on digitalmars.D that we

Andrei Alexandrescu (19/58) Sep 10 2008 My design intently supports forward iteration with sentinel (e.g. a

Andrei Alexandrescu (3/29) Sep 10 2008 That's a great insight. Hadn't thought of it!

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Hello,


Walter, Bartosz and myself have been hard at work trying to find the 
right abstraction for iteration. That abstraction would replace the 
infamous opApply and would allow for external iteration, thus paving the 
way to implementing real generic algorithms.

We considered an STL-style container/iterator design. Containers would 
use the newfangled value semantics to enforce ownership of their 
contents. Iterators would span containers in various ways.

The main problem with that approach was integrating built-in arrays into 
the design. STL's iterators are generalized pointers; D's built-in 
arrays are, however, not pointers, they are "pairs of pointers" that 
cover contiguous ranges in memory. Most people who've used D gained the 
intuition that slices are superior to pointers in many ways, such as 
easier checking for validity, higher-level compact primitives, and a 
streamlined and safe interface. However, if STL iterators are 
generalized pointers, what is the corresponding generalization of D's 
slices? Intuitively that generalization should also be superior to 
iterators.
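
To make the "pair of pointers" idea concrete, here is a minimal C++ sketch of a slice-like type. The names `Slice`, `sub`, and `sum_of` are made up for illustration and are not part of D, Phobos, or the proposal; the point is only that a value carrying both ends can check its own validity, which a bare pointer cannot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: a slice modeled as a pair of pointers.
template <typename T>
struct Slice {
    T* begin_;  // first element
    T* end_;    // one past the last element

    std::size_t length() const { return static_cast<std::size_t>(end_ - begin_); }
    bool empty() const { return begin_ == end_; }

    // Unlike a bare pointer, the slice knows its bounds and can check them.
    T& operator[](std::size_t i) const {
        assert(i < length() && "index out of bounds");
        return begin_[i];
    }

    // Sub-slice [lo, hi), analogous in spirit to D's a[lo .. hi].
    Slice sub(std::size_t lo, std::size_t hi) const {
        assert(lo <= hi && hi <= length() && "invalid sub-slice");
        return Slice{begin_ + lo, begin_ + hi};
    }
};

// Sums a slice without handling a separate length or end pointer.
inline int sum_of(Slice<int> s) {
    int total = 0;
    for (std::size_t i = 0; i < s.length(); ++i) total += s[i];
    return total;
}
```

A caller passes one `Slice<int>` where pointer-based code would pass a pointer plus a length, and every access goes through the bounds check.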

In a related development, the Boost C++ library has defined ranges as 
pairs of two iterators and implemented a series of wrappers that accept 
ranges and forward their iterators to STL functions. The main outcome of 
Boost ranges has been to decrease the verbosity and perils of naked 
iterator manipulation (see 
http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a 
C++ application using Boost could avail itself of containers, ranges, 
and iterators. The Boost notion of range is very close to a 
generalization of D's slice.
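
The wrapper approach described above can be sketched in a few lines of C++. This is not Boost's actual API: `IteratorRange`, `whole`, `sort_range`, and `find_in` are hypothetical names chosen for the example. It only shows the shape of the idea, one range argument in and two iterators forwarded to the STL.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a Boost-style range: just two iterators.
template <typename It>
struct IteratorRange {
    It first;
    It last;
};

// Builds a range spanning a whole container.
template <typename Container>
IteratorRange<typename Container::iterator> whole(Container& c) {
    return IteratorRange<typename Container::iterator>{c.begin(), c.end()};
}

// Wrapper: accepts one range and forwards its two iterators to std::sort.
template <typename It>
void sort_range(IteratorRange<It> r) {
    std::sort(r.first, r.last);
}

// Wrapper around std::find. Note that the result is still an iterator.
template <typename It, typename T>
It find_in(IteratorRange<It> r, const T& value) {
    return std::find(r.first, r.last, value);
}
```

Calling `sort_range(whole(v))` removes the chance of pairing `v.begin()` with the end of a different container. The `find_in` wrapper still returns an iterator, so a program written this way deals with containers, ranges, and iterators at once.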

We have considered that design too, but that raised a nagging question. 
In most slice-based D programming, using bare pointers is not necessary. 
Could there then be a way to use _only_ ranges and eliminate iterators 
altogether? A container/range design would be much simpler than one also 
exposing iterators.
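
As a rough picture of what a ranges-only interface might look like, the C++ sketch below writes a search purely in terms of a range. The operation names `empty`, `front`, and `pop_front`, and the helpers `find_value` and `walk_length`, are assumptions made for this example; the names and exact primitives in the linked document may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical range over an int array. Its whole interface is three
// operations; no iterator type is exposed to callers.
class IntRange {
public:
    IntRange(const int* data, std::size_t n) : cur_(data), end_(data + n) {}

    bool empty() const { return cur_ == end_; }
    int front() const { assert(!empty()); return *cur_; }
    void pop_front() { assert(!empty()); ++cur_; }

private:
    const int* cur_;
    const int* end_;
};

// Range-only find: shrinks the range from the front until it starts with
// `value`, and returns what remains. An empty result means "not found".
inline IntRange find_value(IntRange r, int value) {
    while (!r.empty() && r.front() != value) r.pop_front();
    return r;
}

// Counts elements using only the three range operations.
inline std::size_t walk_length(IntRange r) {
    std::size_t n = 0;
    for (; !r.empty(); r.pop_front()) ++n;
    return n;
}
```

Here the result of a search is itself a range, so the caller never compares a position against an end marker; it only asks whether the returned range is empty.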

All these questions aside, there are several other imperfections in the 
STL, many caused by the underlying language. For example, the STL is 
incapable of distinguishing between input/output iterators and forward 
iterators. This is because C++ cannot reasonably implement a type with 
destructive copy semantics, which is what would be needed to make said 
distinction. We wanted the Phobos design to provide appropriate answers 
to such questions, too. This would be useful particularly because it 
would allow implementation of true and efficient I/O integrated with 
iteration. STL has made an attempt at that, but istream_iterator and 
ostream_iterator are, with all due respect, a joke that builds on 
another joke, the iostreams.
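
The input/forward problem can be observed with `std::istream_iterator`. In the C++ sketch below (the helper `read_through_copies` and the struct `TwoReads` are made up for illustration), an input iterator is copied, which the language permits without complaint, and then both copies are advanced. Both copies pull from the same stream, so they do not advance independently the way copies of a forward iterator would.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// What each copy of the iterator refers to after one increment apiece.
struct TwoReads {
    int via_a;
    int via_b;
};

inline TwoReads read_through_copies(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<int> a(in);  // reads the first value
    std::istream_iterator<int> b = a;  // copying compiles without complaint

    ++a;  // consumes the second value from the shared stream
    ++b;  // consumes the third value: b never sees the second one

    return TwoReads{*a, *b};
}
```

With the input `"10 20 30"`, mainstream standard library implementations leave `a` on 20 and `b` on 30. A type whose copy invalidated the source would turn this silent skip into an error the compiler or runtime could catch.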

After much thought and discussions among Walter, Bartosz and myself, I 
defined a range design and reimplemented all of std.algorithm and much 
of std.stdio in terms of ranges alone. This is quite a thorough test 
because the algorithms are diverse and stress-test the expressiveness 
and efficiency of the range design. Along the way I made the interesting 
realization that certain union/difference operations are needed as 
primitives for ranges. There are also a few bugs in the compiler and 
some needed language enhancements (e.g. returning a reference from a 
function); Walter is committed to addressing them.
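
One way to see why union/difference operations become necessary: with iterators, a search returns a position and the caller can form "everything before the match" from two iterators. With ranges alone, that prefix has to be computed from two ranges. The C++ sketch below assumes contiguous storage, and `Span`, `before`, and `span_of` are hypothetical names for this example rather than the primitives defined in the linked document.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical contiguous range over ints, stored as a pair of pointers.
struct Span {
    const int* lo;
    const int* hi;

    bool empty() const { return lo == hi; }
    std::size_t length() const { return static_cast<std::size_t>(hi - lo); }
};

// Returns the tail of r that starts at the first occurrence of value
// (empty if value is absent).
inline Span find_value(Span r, int value) {
    while (!r.empty() && *r.lo != value) ++r.lo;
    return r;
}

// Difference-style primitive: the part of r that lies before s.
// Precondition (assumed): s is a subrange of r.
inline Span before(Span r, Span s) {
    assert(r.lo <= s.lo && s.hi <= r.hi);
    return Span{r.lo, s.lo};
}

// Union-style primitive: the smallest range covering both a and b.
// Precondition (assumed): a starts no later than b, within one array.
inline Span span_of(Span a, Span b) {
    assert(a.lo <= b.lo && a.hi <= b.hi);
    return Span{a.lo, b.hi};
}
```

Given `tail = find_value(all, x)`, the expression `before(all, tail)` yields the elements preceding the match, and `span_of(before(all, tail), tail)` reconstructs the original range, all without naming an iterator.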

I put together a short document for the range design. I definitely 
missed about a million things and have been imprecise about another 
million, so feedback would be highly appreciated. See:

http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html


Andrei

Sep 08 2008

"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:

I like!

Sep 08 2008

BCS <ao pathlink.com> writes:

Reply to Andrei,

 Hello,
 
 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the
 infamous opApply and would allow for external iteration, thus paving
 the way to implementing real generic algorithms.

First of all, I /Like/ opApply. I know there are issues with it so I'd rather 
see it supplemented rather than replaced.

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

BCS wrote:
 Reply to Andrei,
 
 Hello,

 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the
 infamous opApply and would allow for external iteration, thus paving
 the way to implementing real generic algorithms.

 
 First of all, I /Like/ opApply. I know there are issues with it so I'd 
 rather see it supplemented rather than replaced.

We all like the way it looks. Ranges will preserve its syntax within a 
much more efficient and expressive implementation.

Andrei

Sep 08 2008

BCS <ao pathlink.com> writes:

Reply to Andrei,

 BCS wrote:
 
 Reply to Andrei,
 
 Hello,
 
 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the
 infamous opApply and would allow for external iteration, thus paving
 the way to implementing real generic algorithms.
 

 First of all, I /Like/ opApply. I know there are issues with it so
 I'd rather see it supplemented rather than replaced.
 

 We all like the way it looks. Ranges will preserve its syntax within a
 much more efficient and expressive implementation.
 
 Andrei
 

I was referring to the implementation as visible from the called code's side

Sep 08 2008

Walter Bright <newshound1 digitalmars.com> writes:

BCS wrote:
 I was referring to the implementation as visible from the called code's 
 side

opApply isn't going to go away, it will still be there as an alternative.

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 BCS wrote:
 I was referring to the implementation as visible from the called 
 code's side

 
 opApply isn't going to go away, it will still be there as an alternative.

I disagree with that as I think that would be one perfect place to 
simplify the language thus fulfilling bearophile's and many others' wish.

Say we refine ranges to work with foreach to perfection. Then we have:

1. A foreach that sucks

2. A foreach that rocks

The obvious question is, why keep the one that sucks?


Andrei

Sep 09 2008

Extrawurst <spam extrawurst.org> writes:

Andrei Alexandrescu wrote:
 Walter Bright wrote:
 BCS wrote:
 I was referring to the implementation as visible from the called 
 code's side

 opApply isn't going to go away, it will still be there as an alternative.

 
 I disagree with that as I think that would be one perfect place to 
 simplify the language thus fulfilling bearophile's and many others' wish.
 
 Say we refine ranges to work with foreach to perfection. Then we have:
 
 1. A foreach that sucks
 
 2. A foreach that rocks
 
 The obvious question is, why keep the one that sucks?

I agree, but I am worried that won't happen. D gets more and more polluted 
by deprecated and/or ambiguous stuff:

- inout/ref

- opCall/struct-ctor

are some examples. I wish D would only provide unambiguous features. 
Especially since D2.0 is the experimental branch anyway, so why not 
clean up finally?

Sep 09 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 09 Sep 2008 23:46:27 +0400, Extrawurst <spam extrawurst.org> wrote:

 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 BCS wrote:
 I was referring to the implementation as visible from the called  
 code's side

 opApply isn't going to go away, it will still be there as an  
 alternative.

  I disagree with that as I think that would be one perfect place to  
 simplify the language thus fulfilling bearophile's and many others'  
 wish.
  Say we refine ranges to work with foreach to perfection. Then we have:
  1. A foreach that sucks
  2. A foreach that rocks
  The obvious question is, why keep the one that sucks?

 I agree, but I am worried that won't happen. D gets more and more polluted  
 by deprecated and/or ambiguous stuff:

 - inout/ref

 - opCall/struct-ctor

 are some examples. I wish D would only provide unambiguous features.  
 Especially since D2.0 is the experimental branch anyway, so why not  
 clean up finally?

I would also add:

invariant float pi1 = 3.1415926;
const float pi2 = 3.1415926;
enum pi3 = 3.1415926;
...

Sep 09 2008

Dejan Lekic <dejan.lekic tiscali.co.uk> writes:

 I disagree with that as I think that would be one perfect place to 
 simplify the language thus fulfilling bearophile's and many others' wish.

I agree with this. It is a good idea, and it is explained in a very good 
way. You have my support Mr. Alexandrescu.

Sep 09 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Dejan Lekic wrote:

 r.fromBegin(s) is really s.toEnd(r)

Might be true only, if `s' equals `r'.

Otherwise at least one seems to be undefined, because not both can be 
true subranges of each other. 

I used the conjunctive form, because a formal definition of "subrange" 
is missing. Such is needed because in a cyclic model `s' and `r' might 
be equal, but not identical, because they contain a whole cycle, but 
their start points differ.

-manfred

-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)

Sep 09 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Sep 10, 2008 at 12:46 PM, Manfred_Nowak <svv1999 hotmail.com> wrote:
 Dejan Lekic wrote:

 r.fromBegin(s) is really s.toEnd(r)

 Might be true only, if `s' equals `r'.

 Otherwise at least one seems to be undefined, because not both can be
 true subranges of each other.

 I used the conjunctive form, because a formal definition of "subrange"
 is missing. Such is needed because in a cyclic model `s' and `r' might
 be equal, but not identical, because they contain a whole cycle, but
 their start points differ.

So I think you can put that in other words by saying that they could
mean different things if there's more to a range's state than just a
beginning indicator and an end indicator.

For instance in your example it would be like putting a "#of cycles"
member in the range itself as a third element, rather than associating
it with the end marker.

Interesting.  I suppose that sort of thing is possible, but maybe such
possibilities are annoying enough that they should be made illegal.
In your example it seems simple enough to make the cycle count a
property associated with the "end", and then the difference goes away.

--bb

Sep 09 2008

BCS <ao pathlink.com> writes:

Reply to Walter,

 BCS wrote:
 
 I was referring to the implementation as visible from the called
 code's side
 

 opApply isn't going to go away, it will still be there as an
 alternative.
 

Rock on!

Sorry I took so long to reply, I had to lobotomize my PC

Sep 11 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Walter, Bartosz and myself have been hard at work trying to find the 
 right abstraction for iteration. That abstraction would replace the 
 infamous opApply and would allow for external iteration, thus paving the 
 way to implementing real generic algorithms.

opApply() wasn't my hero either. :)  Your article really looks like 
something I'd expect to find in D.  It only requires foreach support, 
and yeah, return by reference.

 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:
 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

- next operation not mentioned in section 3, forward ranges.

- the union operations look... weird.  Unobvious.  I'm too sleepy now to 
propose anything better but I'll definitely give it a try.  The rest of 
the interface seems very natural.

- what's a collection?  How do you get a range out of there?  Collection 
should be a range itself, like an array.  But it shouldn't be destroyed 
after passing it to foreach().  How to achieve this if foreach() 
essentially uses getNext()?

I'd really love to have this design in D though.

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Walter, Bartosz and myself have been hard at work trying to find the 
 right abstraction for iteration. That abstraction would replace the 
 infamous opApply and would allow for external iteration, thus paving the 
 way to implementing real generic algorithms.

 
 opApply() wasn't my hero either. :)  Your article really looks like 
 something I'd expect to find in D.  It only requires foreach support, 
 and yeah, return by reference.

Indeed. Both are in the works.

 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 
 - next operation not mentioned in section 3, forward ranges.

Thanks! Fixed.

 - the union operations look... weird.  Unobvious.  I'm too sleepy now to 
 propose anything better but I'll definitely give it a try.  The rest of 
 the interface seems very natural.

I agree I hadn't known what primitives would be needed when I sat down. 
Clearly there was a need for some since individual iterators are not 
available anymore. New ideas would be great; I suggest you validate them 
by implementing some nontrivial algorithms in std.algorithm with your, 
um, computational basis of choice :o).

 - what's a collection?  How do you get a range out of there?  Collection 
 should be a range itself, like an array.  But it shouldn't be destroyed 
 after passing it to foreach().  How to achieve this if foreach() 
 essentially uses getNext()?

These are great questions, I'm glad you asked. The way I see things, D2 
ranges can be of two kinds: owned and unowned. For example D1's ranges 
are all unowned:

auto a = new int[100];
...

This array is unowned because it going out of scope leaves the 
underlying array in place. Now consider:

scope a = new int[100];

In this case the array is owned by the current scope. Scoped data is a 
very crude form of ownership that IMHO brought more trouble than it 
solved. It's a huge hole smack in the middle of everything, and we plan 
to revisit it as soon as we can.

A better design would be to define collections that own their contents. 
For example:

Array!(int) a(100);

This time a does own the underlying array. You'd be able to get ranges 
all over it:

int[] b = a.all;

So now we have two nice notions: Arrays own the data. Ranges walk over 
that data. An array can have many ranges crawling over it. But two 
arrays never overlap. The contents of the array will be destroyed (BUT 
NOT DELETED) when a goes out of scope.

What's the deal with destroyed but not deleted? Consider:

int[] a;
if (condition) {
    Array!(int) b;
    a = b.all;
}
writeln(a);

This is a misuse of the array in that a range crawling on its back has 
survived the array itself. What should happen now? Looking at other 
languages:

1) All Java objects are unowned, meaning the issue does not appear in 
the first place, which is an advantage. The disadvantage is that scarce 
resources must be carefully managed by hand in client code.

2) C++ makes the behavior undefined because it destroys data AND 
recycles memory as soon as the array goes out of scope. Mayhem ensues.

We'd like:

1.5) Allow object ownership but also make the behavior of incorrect code 
well-defined so it can be reasoned about, reproduced, and debugged.

That's why I think an Array going out of scope should invoke destructors 
for its data, and then obliterate contents with ElementType.init. That 
way, an Array!(File) will properly close all files AND put them in a 
"closed" state. At the same time, the memory associated with the array 
will NOT be deallocated, so a range surviving the array will never crash 
unrelated code, but instead will see closed files all over.

In the case of int, there is no destructor so none of that happens. 
Surviving ranges will continue looking at the contents, which is now 
unowned.

So there is a difference in the way data with destructors and data 
without destructors is handled. I don't like that, but this is the most 
reasonably effective design I came up with so far.
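
A minimal sketch of the "destroyed but not deleted" behavior described above, under stated assumptions: `Handle`, `OwningArray` and `escapeRange` are hypothetical names made up for illustration, not the proposed `Array` or any Phobos type. The container's destructor runs each element's destructor and resets it to `T.init`, while the memory is left to the garbage collector, so a range that outlives the container reads well-defined "closed" values.

```d
// Hypothetical element type standing in for a file handle.
struct Handle
{
    bool open;
    ~this() { open = false; }   // "closing" the handle
}

// Hypothetical owning container; not the Array from the post.
struct OwningArray(T)
{
    private T[] data;

    this(size_t n) { data = new T[n]; }
    @disable this(this);        // copying disabled: exactly one owner

    ~this()
    {
        // Run each element's destructor and reset it to T.init,
        // but leave the memory itself to the garbage collector.
        foreach (ref e; data)
            destroy(e);
    }

    T[] all() { return data; }  // an unowned range over the contents
}

// Returns a range that outlives the array it was taken from.
Handle[] escapeRange()
{
    auto a = OwningArray!Handle(3);
    foreach (ref h; a.all)
        h.open = true;
    return a.all;               // misuse: the range escapes the owner
}

void main()
{
    auto survivor = escapeRange();
    assert(survivor.length == 3);
    foreach (ref h; survivor)
        assert(!h.open);        // closed, yet still safe to read
}
```

With an element type that has no destructor, such as int, the same container could skip the loop entirely, which is the asymmetry mentioned above.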

About the "collection should be a range itself" mantra, I've had a 
micro-epiphany. Since D's slices so nicely model at the same time arrays 
and their ranges, it is very seductive to think of carrying that to 
other collection types. But I got disabused of that notion as soon as I 
wanted to define a less simple data structure. Consider a matrix:

auto a = BlockMatrix!(float, 3)(100, 200, 300);

defines a block contiguous matrix of three dimensions with the 
respective sizes. Now a should be the matrix AND its range at the same 
time. But what's "the range" of a matrix? Oops. As soon as you start to 
think of it, so many darn ranges come to mind.

* flat: all elements in one shot in an arbitrary order

* dimension-wise: iterate over a given dimension

* subspace: iterate over a "slice" of the matrix with fewer dimensions

* diagonal: scan the matrix from one corner to the opposite corner

I guess there are some more. So before long I realized that the most 
gainful design is this:

a) A matrix owns its stuff and is preoccupied with storage internals, 
allocation, and the such.

b) The matrix defines as many range types as it wants.

c) Users use the ranges.

For example:

foreach (ref e; a.flat) e *= 1.1;
foreach (row; a.dim(0)) row[0, 0] = 0;
foreach (col; a.dim(1)) col[1, 1] *= 5;

and so on.
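
The split between a storage-owning container and several range types can be sketched as follows. This is a two-dimensional illustration under assumptions, not the `BlockMatrix` above: `Grid`, `row` and `col` are hypothetical names, and the column view borrows `stride` from today's `std.range`.

```d
import std.algorithm.comparison : equal;
import std.range : stride;

// Hypothetical 2-D stand-in for the BlockMatrix in the post.
struct Grid
{
    private double[] data;      // row-major storage, owned by the grid
    private size_t rows, cols;

    this(size_t rows, size_t cols)
    {
        this.rows = rows;
        this.cols = cols;
        data = new double[rows * cols];
        data[] = 0.0;           // double.init is NaN, so zero explicitly
    }

    // flat: every element in storage order
    double[] flat() { return data; }

    // one row: a contiguous slice
    double[] row(size_t i) { return data[i * cols .. (i + 1) * cols]; }

    // one column: a strided view, not a contiguous block
    auto col(size_t j) { return stride(data[j .. $], cols); }
}

void main()
{
    auto g = Grid(2, 3);
    foreach (i, ref e; g.flat)
        e = i;                  // contents: 0 1 2 / 3 4 5

    assert(equal(g.row(1), [3.0, 4.0, 5.0]));
    assert(equal(g.col(1), [1.0, 4.0]));

    foreach (ref e; g.row(0))
        e *= 10;                // writes through to the grid's storage
    assert(equal(g.flat, [0.0, 10.0, 20.0, 3.0, 4.0, 5.0]));
}
```

Note that `row` and `col` return different types: the container stays a single store, and each view picks whatever representation suits its traversal.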

Inevitably naysayers will, well, naysay: D defined a built-in array, but 
it also needs Array, so built-in arrays turned out to be useless. So how 
is that better than C++ which has pointers and vector? Walter has long 
feared such naysaying and opposed addition of user-defined array types 
to Phobos. But now I am fully prepared to un-naysay the naysayers: 
built-in slices are a superior building block to naked pointers. They 
are in fact embodying a powerful concept, that of a range. With ranges 
everything there is can be built efficiently and safely. Finally, 
garbage collection helps by ensuring object ownership while preserving 
well-definedness of incorrect code.


Andrei

Sep 08 2008

"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:

On Mon, Sep 8, 2008 at 8:24 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the infamous
 opApply and would allow for external iteration, thus paving the way to
 implementing real generic algorithms.

 opApply() wasn't my hero either. :)  Your article really looks like
 something I'd expect to find in D.  It only requires foreach support, and
 yeah, return by reference.

 Indeed. Both are in the works.

Quick question about this one -- how will iterators get foreach
support?  Are they classes or structs?  If they're structs, how will
the compiler know something is an iterator?  Or will it be based on
duck typing (if it looks like an iterator, it must be an iterator)?

And if this support involves "blessing" certain types within the
runtime, what will this mean for other runtime libraries?

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jarrett Billingsley wrote:
 On Mon, Sep 8, 2008 at 8:24 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Walter, Bartosz and myself have been hard at work trying to find the
 right abstraction for iteration. That abstraction would replace the infamous
 opApply and would allow for external iteration, thus paving the way to
 implementing real generic algorithms.

 opApply() wasn't my hero either. :)  Your article really looks like
 something I'd expect to find in D.  It only requires foreach support, and
 yeah, return by reference.

 Indeed. Both are in the works.

 
 Quick question about this one -- how will iterators get foreach
 support?  Are they classes or structs?  If they're structs, how will
 the compiler know something is an iterator?  Or will it be based on
 duck typing (if it looks like an iterator, it must be an iterator)?
 
 And if this support involves "blessing" certain types within the
 runtime, what will this mean for other runtime libraries?

Great question. We'll go with structs and duck typing (why the heck 
don't they call it structural typing...) but we'll add interfaces for 
the range types so that they can be used dynamically too. Code 
generation will take care of gluing implementations into classes (more 
on that later).

As someone said in some thread on digitalmars.d, if you start with 
structs it's easy to move towards classes. The other way around is not 
as easy.


Andrei
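
A rough sketch of "structs and duck typing, plus interfaces for dynamic use". Everything here is an assumption for illustration: `quacksLikeRange`, `Counter`, `IntSource` and `Boxed` are hypothetical names, and the primitive names `empty`/`front`/`popFront` are those of present-day D rather than necessarily the draft's.

```d
// Hypothetical structural check: does R offer empty, front and popFront?
enum bool quacksLikeRange(R) = is(typeof({
    R r = R.init;
    if (r.empty) {}
    auto e = r.front;
    r.popFront();
}));

// A plain struct; it declares no interface and inherits from nothing.
struct Counter
{
    int cur, end;
    bool empty() const { return cur >= end; }
    int front() const { return cur; }
    void popFront() { ++cur; }
}

// Generic algorithm, constrained only by the structural check.
int total(R)(R r) if (quacksLikeRange!R)
{
    int sum;
    for (; !r.empty; r.popFront())
        sum += r.front;
    return sum;
}

// Dynamic counterpart: an interface plus a generated wrapper class.
interface IntSource
{
    bool empty();
    int front();
    void popFront();
}

final class Boxed(R) : IntSource
{
    private R r;
    this(R r) { this.r = r; }
    bool empty() { return r.empty; }
    int front() { return r.front; }
    void popFront() { r.popFront(); }
}

void main()
{
    static assert(quacksLikeRange!Counter);
    static assert(!quacksLikeRange!int);

    assert(total(Counter(0, 4)) == 6);   // static dispatch on the struct

    IntSource dyn = new Boxed!Counter(Counter(0, 4));
    assert(total(dyn) == 6);             // same algorithm, virtual calls
}
```

The struct never mentions the interface; the wrapper class is generated from it, which is the "easy direction" from structs towards classes.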

Sep 08 2008

Brad Roberts <braddr puremagic.com> writes:

 Great question. We'll go with structs and duck typing (why the heck
 don't they call it structural typing...) but we'll add interfaces for
 the range types so that they can be used dynamically too. Code
 generation will take care of gluing implementations into classes (more
 on that later).

 Andrei

Probably rhetorical, but I can't help myself:  If it walks like a duck
and it talks like a duck, it must be a duck.

Sep 08 2008

Walter Bright <newshound1 digitalmars.com> writes:

Brad Roberts wrote:
 Probably rhetorical, but I can't help myself:  If it walks like a duck
 and it talks like a duck, it must be a duck.

And if it floats and has a long nose, it's a witch!! Burn her!!!

Sep 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:ga6pq4$hqt$1 digitalmars.com...
 Brad Roberts wrote:
 Probably rhetorical, but I can't help myself:  If it walks like a duck
 and it talks like a duck, it must be a duck.

 And if it floats and has a long nose, it's a witch!! Burn her!!!

And she's got a wart! :)

-steve

Sep 09 2008

Alix Pexton <_a_l_i_x_._p_e_x_t_o_n_ _g_m_a_i_l_._c_o_m_> writes:

Andrei Alexandrescu wrote:
(why the heck don't they call it structural typing...) 
 
 Andrei

If it were the norm to call it "structural typing" and someone asked why it was
called so, the one being asked would likely have to resort to diagrams and
gesticulation in order to adequately convey the reasoning.

With "duck typing" a simple and widespread saying* is all that is required.

A...

* "If it walks like a duck and it quacks like a duck, then it's a duck!"

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Alix Pexton wrote:
 Andrei Alexandrescu wrote:
 (why the heck don't they call it structural typing...)
 Andrei

 
 If it were the norm to call it "structural typing" and someone asked why 
 it was called so, the one being asked would likely have to resort to 
 diagrams and gesticulation in order to adequately convey the reasoning.
 
 With "duck typing" a simple and widespread saying* is all that is required.
 
 A...
 
 * "If it walks like a duck and it quacks like a duck, then it's a duck!"

Yeah, I know. It's a good point. Yet I'm somehow wary of cultural 
references in formalisms.

Andrei

Sep 09 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 08 Sep 2008 20:24:27 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 About the "collection should be a range itself" mantra, I've had a  
 micro-epiphany. Since D's slices so nicely model at the same time arrays  
 and their ranges, it is very seductive to think of carrying that to  
 other collection types. But I got disabused of that notion as soon as I  
 wanted to define a less simple data structure. Consider a matrix:

 auto a = BlockMatrix!(float, 3)(100, 200, 300);

 defines a block contiguous matrix of three dimensions with the  
 respective sizes. Now a should be the matrix AND its range at the same  
 time. But what's "the range" of a matrix? Oops. As soon as you start to  
 think of it, so many darn ranges come to mind.

 * flat: all elements in one shot in an arbitrary order

 * dimension-wise: iterate over a given dimension

 * subspace: iterate over a "slice" of the matrix with fewer dimensions

 * diagonal: scan the matrix from one corner to the opposite corner

 I guess there are some more. So before long I realized that the most  
 gainful design is this:

 a) A matrix owns its stuff and is preoccupied with storage internals,  
 allocation, and the such.

 b) The matrix defines as many range types as it wants.

 c) Users use the ranges.

 For example:

 foreach (ref e; a.flat) e *= 1.1;
 foreach (row; a.dim(0)) row[0, 0] = 0;
 foreach (col; a.dim(1)) col[1, 1] *= 5;

I'd recommend a more clear-cut example. Three of the ranges are very well  
defined in array languages and libraries. Essentially a slice of a matrix  
is another matrix that may have fewer or more dimensions and therefore may  
be a collection in addition to a range. The dimension-wise range is the  
only operation which is more complex, due to the type and dimensions of  
the returned array changing float[x,y,z] -> float[x,y][z]. And the main  
argument is that a float[x,y][z] is large, slow to create and unwanted, so  
a separate range/generator is better (Also note a generator can implicitly  
provide the head const, tail mutable nature of the range). Even given  
this, it doesn't contradict the "collection should be a range itself"  
mantra, since there is a very well defined range which encompasses the  
data; it's just that some ranges are more optimal if they're only views,  
and not a root collection.

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:24:27 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 About the "collection should be a range itself" mantra, I've had a 
 micro-epiphany. Since D's slices so nicely model at the same time 
 arrays and their ranges, it is very seductive to think of carrying 
 that to other collection types. But I got disabused of that notion as 
 soon as I wanted to define a less simple data structure. Consider a 
 matrix:

 auto a = BlockMatrix!(float, 3)(100, 200, 300);

 defines a block contiguous matrix of three dimensions with the 
 respective sizes. Now a should be the matrix AND its range at the same 
 time. But what's "the range" of a matrix? Oops. As soon as you start 
 to think of it, so many darn ranges come to mind.

 * flat: all elements in one shot in an arbitrary order

 * dimension-wise: iterate over a given dimension

 * subspace: iterate over a "slice" of the matrix with fewer dimensions

 * diagonal: scan the matrix from one corner to the opposite corner

 I guess there are some more. So before long I realized that the most 
 gainful design is this:

 a) A matrix owns its stuff and is preoccupied with storage internals, 
 allocation, and the such.

 b) The matrix defines as many range types as it wants.

 c) Users use the ranges.

 For example:

 foreach (ref e; a.flat) e *= 1.1;
 foreach (row; a.dim(0)) row[0, 0] = 0;
 foreach (col; a.dim(1)) col[1, 1] *= 5;

 
 I'd recommend a more clear-cut example. Three of the ranges are very 
 well defined in array languages and libraries. Essentially a slice of a 
 matrix is another matrix that may have fewer or more dimensions and 
 therefore may be a collection in addition to a range.

There are two problems with the view that a slice of a matrix is also a 
matrix:

1. If ownership is desired, then the slice does not own anything in the 
matrix, so that does not put it on equal footing with the matrix it 
started with.

2. Slicing a block matrix on a hyperplane will result in a strided 
range. That is not a block matrix at all. So again it is more useful to 
think of the block matrix as the store, and of various ranges crawling 
over it as ways to look at the matrix.

I agree that ranges could be shoehorned into working. But then all 
ownership is out of the window, and also it creates more confusion than 
it clears. Now for n possible ranges over a block matrix, you have a 
daunting task ahead. You'd need to be able to construct any from any 
other, otherwise you could get stuck with a range that's a sort of dead end.

That could be solved in a number of ways (e.g. force all range2range 
conversions to go through some "central" range) but by that time you 
start to realize that that design is not quite gainful. Besides, it only 
takes care of matrices, but not of various other nonlinear structures.

The approach in which the container is preoccupied with storage and 
ownership, and it offers various ranges that view the container in 
various ways, sounds like the better design to me.

 The dimension-wise 
 range is the only operation which is more complex, due to the type and 
 dimensions of the returned array changing float[x,y,z] -> float[x,y][z]. 
 And the main argument is that a float[x,y][z] is large, slow to create 
 and unwanted, so a separate range/generator is better (Also note a 
 generator can implicitly provide the head const, tail mutable nature of 
 the range). Even given this, it doesn't contradict the "collection 
 should be a range itself" mantra, since there is a very well defined 
 range which encompasses the data; it's just that some ranges are more 
 optimal if they're only views, and not a root collection.

Again, I agree a range-only view can be shoehorned into working. I just 
think it would be a bad design.


Andrei

Sep 08 2008

Derek Parnell <derek nomail.afraid.org> writes:

On Mon, 08 Sep 2008 22:22:04 -0400, Robert Jacques wrote:
 Essentially a slice of a matrix is another matrix 

I see it a little differently. To me, a slice of a matrix is a set of data
that can be used to construct a new matrix, among other uses of course.
Because a matrix can be sliced and diced in very many ways, some of which
are clearly not matrices, the best we can generalize about a matrix slice
is just that it is a set of data values. How one uses those values is
dependent, to a degree, on what sort of slice created the set.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
9/09/2008 6:30:46 PM

Sep 09 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 - what's a collection?  How do you get a range out of there?  Collection 
 should be a range itself, like an array.  But it shouldn't be destroyed 
 after passing it to foreach().  How to achieve this if foreach() 
 essentially uses getNext()?

 
 These are great questions, I'm glad you asked. The way I see things, D2 
 ranges can be of two kinds: owned and unowned. For example D1's ranges 
 are all unowned:
 
 [snip]
 
 A better design would be to define collections that own their contents. 
 For example:
 
 Array!(int) a(100);
 
 This time a does own the underlying array. You'd be able to get ranges 
 all over it:
 
 int[] b = a.all;

I really don't like to have basic language constructs implemented as 
templates.  It's like Tuple!() which is sorta basic type but requires 
template trickery to really work with it.

 So now we have two nice notions: Arrays own the data. Ranges walk over 
 that data. An array can have many ranges crawling over it. But two 
 arrays never overlap. The contents of the array will be destroyed (BUT 
 NOT DELETED) when a goes out of scope.

This invalidates the idea of safe manipulations with data no matter 
where you've got that data from.

 About the "collection should be a range itself" mantra, I've had a 
 micro-epiphany. Since D's slices so nicely model at the same time arrays 
 and their ranges, it is very seductive to think of carrying that to 
 other collection types. But I got disabused of that notion as soon as I 
 wanted to define a less simple data structure. Consider a matrix:
 
 auto a = BlockMatrix!(float, 3)(100, 200, 300);
 
 defines a block contiguous matrix of three dimensions with the 
 respective sizes. Now a should be the matrix AND its range at the same 
 time. But what's "the range" of a matrix? Oops. As soon as you start to 
 think of it, so many darn ranges come to mind.

If you cannot think of a natural default range for your collection---
well, it's your decision not to implement range interface for it.  But 
if it does have a natural iteration semantics, it should be possible to 
implement:

auto a = new File("name");
auto b = new TreeSet!(char);

foreach (ch; a)
    b.insert(ch);

foreach (ch; b)
    writeln("unique char ", ch);

Here is the problem.  First foreach() naturally and expectedly changes 
the state of an object a.  Second foreach() naturally and expectedly 
does not make changes to object b.

Solution:

File is an Input range in your notion.  It supports isEmpty() and 
getNext(), it is non-copyable (but, note, referenceable).

TreeSet is a Collection, which you don't discuss.  It implements 
opSlice() without arguments, which is required and sufficient to define 
a collection.  opSlice() must return a range that, at least, supports 
input range operations.

foreach() checks if a passed object implements opSlice() so that it can 
iterate non-destructively. If no opSlice() is provided, it falls back to 
getNext().
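
The lookup order proposed above can be sketched in library code. The generic function `sum` below stands in for the foreach lowering, and `Countdown` and `Bag` are hypothetical stand-ins for File and TreeSet; `isEmpty` and `getNext` follow the draft's vocabulary and are assumptions about its exact spelling.

```d
// One-pass source, in the draft's vocabulary (isEmpty/getNext).
struct Countdown
{
    int n;
    bool isEmpty() const { return n == 0; }
    int getNext() { return n--; }
}

// Collection: owns its items and hands out a range through opSlice().
struct Bag
{
    private int[] items;
    int[] opSlice() { return items; }
}

// Stand-in for the proposed foreach lowering.
int sum(T)(ref T source)
{
    int total;
    static if (is(typeof(source[])))
    {
        foreach (e; source[])       // collection: walk a fresh range
            total += e;
    }
    else
    {
        while (!source.isEmpty)     // range: consume it
            total += source.getNext();
    }
    return total;
}

void main()
{
    auto bag = Bag([1, 2, 3]);
    assert(sum(bag) == 6);
    assert(sum(bag) == 6);          // untouched by the first pass

    auto cd = Countdown(3);
    assert(sum(cd) == 6);
    assert(sum(cd) == 0);           // already drained
}
```

The collection can be traversed any number of times because each pass starts from a new range, while the one-pass source is visibly consumed.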

 a) A matrix owns its stuff and is preoccupied with storage internals, 
 allocation, and the such.
 
 b) The matrix defines as many range types as it wants.
 
 c) Users use the ranges.

No problem.  The matrix lives as long as a range refers to it.  As 
expected.

 Inevitably naysayers will, well, naysay: D defined a built-in array, but 
 it also needs Array, so built-in arrays turned out to be useless. So how 
 is that better than C++ which has pointers and vector? Walter has long 
 feared such naysaying and opposed addition of user-defined array types 
 to Phobos. But now I am fully prepared to un-naysay the naysayers: 
 built-in slices are a superior building block to naked pointers. They 
 are in fact embodying a powerful concept, that of a range. With ranges 
 everything there is can be built efficiently and safely. Finally, 
 garbage collection helps by ensuring object ownership while preserving 
 well-definedness of incorrect code.

Slices need to implement random access range interface, that's all.

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 - what's a collection?  How do you get a range out of there?  Collection 
 should be a range itself, like an array.  But it shouldn't be destroyed 
 after passing it to foreach().  How to achieve this if foreach() 
 essentially uses getNext()?

 These are great questions, I'm glad you asked. The way I see things, D2 
 ranges can be of two kinds: owned and unowned. For example D1's ranges 
 are all unowned:

 [snip]

 A better design would be to define collections that own their contents. 
 For example:

 Array!(int) a(100);

 This time a does own the underlying array. You'd be able to get ranges 
 all over it:

 int[] b = a.all;

 
 I really don't like to have basic language constructs implemented as 
 templates.  It's like Tuple!() which is sorta basic type but requires 
 template trickery to really work with it.

Well I guess we disagree on a number of issues here. The problem with
"sorta basic" types is that the list could go on forever. I'd rather use
a language that allows creation of good types from a small core, instead
of one that tries to supplant all sorta basic types it could think of.

 So now we have two nice notions: Arrays own the data. Ranges walk over 
 that data. An array can have many ranges crawling over it. But two 
 arrays never overlap. The contents of the array will be destroyed (BUT 
 NOT DELETED) when a goes out of scope.

 
 This invalidates the idea of safe manipulations with data no matter 
 where you've got that data from.

Manipulation remains typesafe. The problem is that sometimes we want to
ensure timely termination of certain resources.

 About the "collection should be a range itself" mantra, I've had a 
 micro-epiphany. Since D's slices so nicely model at the same time arrays 
 and their ranges, it is very seductive to think of carrying that to 
 other collection types. But I got disabused of that notion as soon as I 
 wanted to define a less simple data structure. Consider a matrix:

 auto a = BlockMatrix!(float, 3)(100, 200, 300);

 defines a block contiguous matrix of three dimensions with the 
 respective sizes. Now a should be the matrix AND its range at the same 
 time. But what's "the range" of a matrix? Oops. As soon as you start to 
 think of it, so many darn ranges come to mind.

 
 If you cannot think of a natural default range for your collection---
 well, it's your decision not to implement range interface for it.  But 
 if it does have a natural iteration semantics, it should be possible to 
 implement:

It's not that I can't think of one. It's that I think of too many.

 auto a = new File("name");
 auto b = new TreeSet!(char);
 
 foreach(ch; a)
   b.insert(ch);
 
 foreach(ch; b)
     writeln("unique char ", ch);
 
 Here is the problem.  First foreach() naturally and expectedly changes 
 the state of an object a.  Second foreach() naturally and expectedly 
 does not make changes to object b.
 
 Solution:
 
 File is an Input range in your notion.  It supports isEmpty() and 
 getNext(), it is non-copyable (but, note, referenceable).

You left a crucial detail out. What does getNext() return?

In the new std.stdio design, a File is preoccupied with opening,
closing, and transferring data for the underlying file. On top of it
several input ranges can be constructed - that read lines, blocks, parse
text, format text, and so on. (One thing I want is to allow
std.algorithm to work with I/O easily and naturally.)

I fully understand there can be so many design choices in handling all
this stuff, it's not even funny. I can't get excited about an equivalent
solution that to me has no obvious advantages. I can even less get
excited about a solution that I have objections with. That doesn't mean
someone else can't get excited over it, and probably rightly so.

 TreeSet is a Collection, which you don't discuss.  It implements 
 opSlice() without arguments, which is required and sufficient to define 
 a collection.  opSlice() must return a range that, at least, supports 
 input range operations.
 
 foreach() checks if a passed object implements opSlice() so that it can 
 iterate non-destructively. If no opSlice() is provided, it falls back to 
 getNext().
 
 a) A matrix owns its stuff and is preoccupied with storage internals, 
 allocation, and the such.

 b) The matrix defines as many range types as it wants.

 c) Users use the ranges.

 
 No problem.  The matrix lives as long as a range refers to it.  As 
 expected.

I, too, wish reference counting were a solution to everything.


Andrei

Sep 09 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 A better design would be to define collections that own their contents. 
 For example:

 Array!(int) a(100);

 This time a does own the underlying array. You'd be able to get ranges 
 all over it:

 int[] b = a.all;

 
 I really don't like to have basic language constructs implemented as 
 templates.  It's like Tuple!() which is sorta basic type but requires 
 template trickery to really work with it.

 
 Well I guess we disagree on a number of issues here. The problem with
 "sorta basic" types is that the list could go on forever. I'd rather use
 a language that allows creation of good types from a small core, instead
 of one that tries to supplant all sorta basic types it could think of.

I think I've got your point here.  D is not Python, it shouldn't do 
anything high-level in the core.  The notion of range is sufficient to 
iterate through anything, core (namely foreach) doesn't need to be aware 
of the collections themselves.

Though I'm not fully convinced.  It's always good to have good defaults.  
So that you could quickly throw things together, and attend to details 
later.  So that you could write

string[string] dic;
foreach(k, v; dic) whatever;

Can I do this with your Array!()?  Or should I always use all() even 
though the Array!() is plain linear and obvious?

 auto a = new File("name");
 auto b = new TreeSet!(char);
 
 foreach(ch; a)
   b.insert(ch);
 
 foreach(ch; b)
     writeln("unique char ", ch);
 
 Here is the problem.  First foreach() naturally and expectedly changes 
 the state of an object a.  Second foreach() naturally and expectedly 
 does not make changes to object b.
 
 Solution:
 
 File is an Input range in your notion.  It supports isEmpty() and 
 getNext(), it is non-copyable (but, note, referenceable).

 
 You left a crucial detail out. What does getNext() return?

Something.  Documented.  I'd be happy with string, that is, line by line 
iteration.  It's nice for text dumping, simple configurations, simple 
internet protocols like POP and FTP, user interaction.  You see, some 
useful default.  The File could also provide byte range bytes() and 
dchar range chars() and whatever the author considered feasible.

Note that I wasn't convincing you to change stdio design.  My choice of 
class names was bad.  I should have used MyFile instead of File, meaning 
some user class with user functionality.

 In the new std.stdio design, a File is preoccupied with opening,
 closing, and transferring data for the underlying file. On top of it
 several input ranges can be constructed - that read lines, blocks, parse
 text, format text, and so on. (One thing I want is to allow
 std.algorithm to work with I/O easily and naturally.)

OK, you like this design, no problem.  Better even.  Your File is 
naturally iterable over its bytes.  Any low-level file is, anybody knows 
that.  I can see no reason to deny foreach() over a file.

 foreach() checks if a passed object implements opSlice() so that it can 
 iterate non-destructively. If no opSlice() is provided, it falls back to 
 getNext().

Sep 10 2008

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, on September 8 at 19:24, you wrote to me:
 1.5) Allow object ownership but also make the behavior of incorrect code 
 well-defined so it can be reasoned about, reproduced, and debugged.
 
 That's why I think an Array going out of scope should invoke destructors for
 its data, and then obliterate contents with ElementType.init. That way, an
 Array!(File) will properly close all files AND put them in a "closed" state.
 At the same time, the memory associated with the array will NOT be deallocated,
 so a range surviving the array will never crash unrelated code, but instead will
 see closed files all over.

Why is it so bad that the program crashes if you do something wrong? For how
long will the memory stay "alive"? Will it use "regular" GC semantics
(i.e., until nobody points at it anymore)? In that case, wouldn't letting the
programmer leave dangling pointers to data that should be "dead", without
crashing, make it easier to introduce memory leaks?

-- 
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Did you know that a Danish guy originally invented the burglar alarm?
Unfortunately, it got stolen.

Sep 09 2008

superdan <super dan.org> writes:

Leandro Lucarella Wrote:

 Andrei Alexandrescu, on September 8 at 19:24, you wrote to me:
 1.5) Allow object ownership but also make the behavior of incorrect code 
 well-defined so it can be reasoned about, reproduced, and debugged.
 
 That's why I think an Array going out of scope should invoke destructors for
 its data, and then obliterate contents with ElementType.init. That way, an
 Array!(File) will properly close all files AND put them in a "closed" state.
 At the same time, the memory associated with the array will NOT be deallocated,
 so a range surviving the array will never crash unrelated code, but instead will
 see closed files all over.

 
 Why is it so bad that the program crashes if you do something wrong?

it's not bad. it's good if it crashes. problem is when it don't crash and
continues running on oil instead of gas if you see what i mean.

 For how
 long will the memory stay "alive"? Will it use "regular" GC semantics
 (i.e., until nobody points at it anymore)? In that case, wouldn't letting the
 programmer leave dangling pointers to data that should be "dead", without
 crashing, make it easier to introduce memory leaks?

such is the peril of gc. clearly meshing scoping with gc ain't gonna be
perfect. but i like the next best thing. scarce resources are deallocated
quick. memory stays around for longer. no dangling pointers.
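
A minimal sketch of the "destroyed but not deleted" behaviour under discussion, assuming a hypothetical OwnedArray type (the Array!(T) proposed in the thread does not exist in this form). The destructor resets every element to T.init but never frees the memory, so a slice that outlives the owner reads init values instead of dangling.

```d
// Hypothetical owning array: elements are destroyed when the owner goes
// out of scope, but the memory itself is left to the garbage collector.
struct OwnedArray(T)
{
    private T[] data;

    this(size_t n, T fill)
    {
        data = new T[n];
        data[] = fill;
    }

    @disable this(this); // exactly one owner, no copies

    // A range (here simply a slice) over the owned elements.
    T[] all() { return data; }

    ~this()
    {
        // destroy() runs the element's destructor, if it has one, and
        // resets the element to T.init. No memory is released here.
        foreach (ref e; data)
            destroy(e);
        data = null;
    }
}

// The misuse from the thread: a range survives its array.
int[] survivingRange()
{
    int[] a;
    {
        auto b = OwnedArray!int(3, 42);
        a = b.all;
    } // b's destructor runs here
    return a; // still points at valid memory, now holding int.init
}
```

With a File-like element type in place of int, the same destructor would close every file on scope exit, which is the timely release of scarce resources described above.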

Sep 09 2008

Leandro Lucarella <llucax gmail.com> writes:

superdan, on September 9 at 10:12, you wrote to me:
 For how
 long will the memory stay "alive"? Will it use "regular" GC semantics
 (i.e., until nobody points at it anymore)? In that case, wouldn't letting the
 programmer leave dangling pointers to data that should be "dead", without
 crashing, make it easier to introduce memory leaks?

 
 such is the peril of gc. clearly meshing scoping with gc ain't gonna be
 perfect. but i like the next best thing. scarce resources are deallocated
 quick. memory stays around for longer. no dangling pointers.

I was talking about logical[1] memory leaks, which are possible even with
GC.

int[] a;
if (condition) {
    Array!(int) b;
    a = b.all;
}

If you expect b's memory to be freed at the end of the scope, and
a retains it, that is a logical memory leak (you probably forgot to null
a before the scope ended). I think this kind of error should be detected
as soon as possible, as opposed to letting a keep using that memory (or leak
it).

[1] http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Benefits

-- 
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Since I was little I wanted to be a doctor
But then I got sick and became a musician

Sep 09 2008

downs <default_357-line yahoo.de> writes:

Andrei Alexandrescu wrote:
 What's the deal with destroyed but not deleted? Consider:
 
 int[] a;
 if (condition) {
    Array!(int) b;
    a = b.all;
 }
 writeln(a);
 
 This is a misuse of the array in that a range crawling on its back has
 survived the array itself. What should happen now? Looking at other
 languages:
 
 1) All Java objects are unowned, meaning the issue does not appear in
 the first place, which is an advantage. The disadvantage is that scarce
 resources must be carefully managed by hand in client code.
 
 2) C++ makes the behavior undefined because it destroys data AND
 recycles memory as soon as the array goes out of scope. Mayhem ensues.
 
 We'd like:
 
 1.5) Allow object ownership but also make the behavior of incorrect code
 well-defined so it can be reasoned about, reproduced, and debugged.
 
 That's why I think an Array going out of scope should invoke destructors
 for its data, and then obliterate contents with ElementType.init. That
 way, an Array!(File) will properly close all files AND put them in a
 "closed" state. At the same time, the memory associated with the array
 will NOT be deallocated, so a range surviving the array will never crash
 unrelated code, but instead will see closed files all over.
 

I don't think this is a good thing, for reasons similar to the Error/Exception
flaw - specifically, code that works in debug mode might end up failing in
release mode.

To explain what I mean by Error/Exception flaw, consider the case of an array
out of bounds error, wrapped carelessly in a try .. catch (Exception) block.

This would work fine in debug mode, and presumably retry the operation until it
succeeded.

In release mode, however, the above would crash.

This is clearly undesirable, and arises directly from the fact that Error is
derived from Exception, not the other way around or completely separate (as it
clearly should be).

After all, an Error ![is] an Exception, since Exceptions are clearly defined as
recoverable errors, and the set of unrecoverable errors is obviously not a
subset of the recoverable ones.

This leads to my actual point: I suggest an extension of .init: the .fail
state, indicating data that should not be accessed.

Any standard library function that encounters data that is intentionally in the
.fail state should throw an Error.

For instance, the .fail state for strings could be a deliberately invalid UTF8
sequence.

When this state could reasonably come up in normal operations, it is
recommended to use values that will readily be visible in a debugger, such as
the classical 0xDEADBEEF.

This is imnsho superior to using .init to fill this memory, which doesn't tell
the debugging programmer much about what exactly happened, and furthermore,
might cause the program to treat "invalid" memory the same as "fresh" memory,
if only by accident.

 --downs

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

downs wrote:
 Andrei Alexandrescu wrote:
 What's the deal with destroyed but not deleted? Consider:
 
 int[] a;
 if (condition) {
     Array!(int) b;
     a = b.all;
 }
 writeln(a);
 
 This is a misuse of the array in that a range crawling on its back
 has survived the array itself. What should happen now? Looking at
 other languages:
 
 1) All Java objects are unowned, meaning the issue does not appear
 in the first place, which is an advantage. The disadvantage is that
 scarce resources must be carefully managed by hand in client code.
 
 2) C++ makes the behavior undefined because it destroys data AND 
 recycles memory as soon as the array goes out of scope. Mayhem
 ensues.
 
 We'd like:
 
 1.5) Allow object ownership but also make the behavior of incorrect
 code well-defined so it can be reasoned about, reproduced, and
 debugged.
 
 That's why I think an Array going out of scope should invoke
 destructors for its data, and then obliterate contents with
 ElementType.init. That way, an Array!(File) will properly close all
 files AND put them in a "closed" state. At the same time, the
 memory associated with the array will NOT be deallocated, so a
 range surviving the array will never crash unrelated code, but
 instead will see closed files all over.
 

 
 I don't think this is a good thing, for reasons similar to the
 Error/Exception flaw - specifically, code that works in debug mode
 might end up failing in release mode.
 
 To explain what I mean by Error/Exception flaw, consider the case of
 an array out of bounds error, wrapped carelessly in a try .. catch
 (Exception) block.
 
 This would work fine in debug mode, and presumably retry the
 operation until it succeeded.
 
 In release mode, however, the above would crash.
 
 This is clearly undesirable, and arises directly from the fact that
 Error is derived from Exception, not the other way around or
 completely separate (as it clearly should be).
 
 After all, an Error ![is] an Exception, since Exceptions are clearly
 defined as recoverable errors, and the set of unrecoverable errors is
 obviously not a subset of the recoverable ones.
 
 This leads to my actual point: I suggest an extension of .init: the
 .fail state, indicating data that should not be accessed.
 
 Any standard library function that encounters data that is
 intentionally in the .fail state should throw an Error.
 
 For instance, the .fail state for strings could be a deliberately
 invalid UTF8 sequence.
 
 When this state could reasonably come up in normal operations, it is
 recommended to use values that will readily be visible in a debugger,
 such as the classical 0xDEADBEEF.
 
 This is imnsho superior to using .init to fill this memory, which
 doesn't tell the debugging programmer much about what exactly
 happened, and furthermore, might cause the program to treat "invalid"
 memory the same as "fresh" memory, if only by accident.

I hear you. I brought up the same exact design briefly with Bartosz last 
week. We called it T.invalid. He argued in favor of it. I thought it 
brings more complication than it's worth and was willing to go with 
T.init for simplicity's sake. Why deal with two empty states instead of 
one.

One nagging question is, what is T.fail for integral types? For pointers 
fine, one could be found. For chars, fine too. But for integrals I'm not 
sure that e.g. T.min or T.max is a credible fail value.


Andrei

Sep 09 2008

downs <default_357-line yahoo.de> writes:

Andrei Alexandrescu wrote:
 downs wrote:
 Andrei Alexandrescu wrote:
 What's the deal with destroyed but not deleted? Consider:

 int[] a;
 if (condition) {
     Array!(int) b;
     a = b.all;
 }
 writeln(a);

 This is a misuse of the array in that a range crawling on its back
 has survived the array itself. What should happen now? Looking at
 other languages:

 1) All Java objects are unowned, meaning the issue does not appear
 in the first place, which is an advantage. The disadvantage is that
 scarce resources must be carefully managed by hand in client code.

 2) C++ makes the behavior undefined because it destroys data AND
 recycles memory as soon as the array goes out of scope. Mayhem
 ensues.

 We'd like:

 1.5) Allow object ownership but also make the behavior of incorrect
 code well-defined so it can be reasoned about, reproduced, and
 debugged.

 That's why I think an Array going out of scope should invoke
 destructors for its data, and then obliterate contents with
 ElementType.init. That way, an Array!(File) will properly close all
 files AND put them in a "closed" state. At the same time, the
 memory associated with the array will NOT be deallocated, so a
 range surviving the array will never crash unrelated code, but
 instead will see closed files all over.

 I don't think this is a good thing, for reasons similar to the
 Error/Exception flaw - specifically, code that works in debug mode
 might end up failing in release mode.

 To explain what I mean by Error/Exception flaw, consider the case of
 an array out of bounds error, wrapped carelessly in a try .. catch
 (Exception) block.

 This would work fine in debug mode, and presumably retry the
 operation until it succeeded.

 In release mode, however, the above would crash.

 This is clearly undesirable, and arises directly from the fact that
 Error is derived from Exception, not the other way around or
 completely separate (as it clearly should be).

 After all, an Error ![is] an Exception, since Exceptions are clearly
 defined as recoverable errors, and the set of unrecoverable errors is
 obviously not a subset of the recoverable ones.

 This leads to my actual point: I suggest an extension of .init: the
 .fail state, indicating data that should not be accessed.

 Any standard library function that encounters data that is
 intentionally in the .fail state should throw an Error.

 For instance, the .fail state for strings could be a deliberately
 invalid UTF8 sequence.

 When this state could reasonably come up in normal operations, it is
 recommended to use values that will readily be visible in a debugger,
 such as the classical 0xDEADBEEF.

 This is imnsho superior to using .init to fill this memory, which
 doesn't tell the debugging programmer much about what exactly
 happened, and furthermore, might cause the program to treat "invalid"
 memory the same as "fresh" memory, if only by accident.

 
 I hear you. I brought up the same exact design briefly with Bartosz last
 week. We called it T.invalid. He argued in favor of it. I thought it
 brings more complication than it's worth and was willing to go with
 T.init for simplicity's sake. Why deal with two empty states instead of
 one.
 
 One nagging question is, what is T.fail for integral types? For pointers
 fine, one could be found. For chars, fine too. But for integrals I'm not
 sure that e.g. T.min or T.max is a credible fail value.
 
 
 Andrei

For numbers, it should probably be "the same as .init". Not every error
condition can be detected, sadly.

It would also be nice if a .fail value could be provided as an extension to
typedef somehow .. user defined types will probably have their own possible
failure indicators.

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

downs wrote:
 For numbers, it should probably be "the same as .init". Not every
 error condition can be detected, sadly.

That further dilutes the benefits of T.fail.

 It would also be nice if a .fail value could be provided as an
 extension to typedef somehow .. user defined types will probably have
 their own possible failure indicators.

That further increases the cognitive cost of T.fail.

Not putting you down. I think the notion is good, but I think we need to 
thoroughly understand its costs and benefits before even raising it to 
Walter's level of consciousness :o).


Andrei

Sep 09 2008

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 I hear you. I brought up the same exact design briefly with Bartosz last 
 week. We called it T.invalid. He argued in favor of it. I thought it 
 brings more complication than it's worth and was willing to go with 
 T.init for simplicity's sake. Why deal with two empty states instead of 
 one.
 
 One nagging question is, what is T.fail for integral types? For pointers 
 fine, one could be found. For chars, fine too. But for integrals I'm not 
 sure that e.g. T.min or T.max is a credible fail value.

The T.init value should be that. That's why, for floats, float.init is a 
NaN. But for many types, there is no such thing as an invalid value, so 
it really doesn't work for generic code.
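
For reference, the defaults mentioned above can be checked directly. This is only a small sketch; defaultsAsDescribed is a made-up helper name, and it relies on variables declared without an initializer receiving T.init.

```d
import std.math : isNaN;

// Returns true when the built-in defaults behave as described above.
bool defaultsAsDescribed()
{
    float f; // float.init is NaN: an "invalid" value exists
    char c;  // char.init is 0xFF, which is not a valid UTF-8 code unit
    int i;   // int.init is 0: a perfectly ordinary value
    int* p;  // pointers default to null

    return isNaN(f) && c == 0xFF && i == 0 && p is null;
}
```

The integer case is the one at issue: every bit pattern is a legal int, so there is no spare value to play the role that NaN plays for floats.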

Sep 09 2008

Benji Smith <dlanguage benjismith.net> writes:

Walter Bright wrote:
 Andrei Alexandrescu wrote:
 I hear you. I brought up the same exact design briefly with Bartosz 
 last week. We called it T.invalid. He argued in favor of it. I thought 
 it brings more complication than it's worth and was willing to go with 
 T.init for simplicity's sake. Why deal with two empty states instead 
 of one.

 One nagging question is, what is T.fail for integral types? For 
 pointers fine, one could be found. For chars, fine too. But for 
 integrals I'm not sure that e.g. T.min or T.max is a credible fail value.

 
 The T.init value should be that. That's why, for floats, float.init is a 
 NaN. But for many types, there is no such thing as an invalid value, so 
 it really doesn't work for generic code.

I don't think values necessarily have to be initialized to an invalid 
value. You could certainly argue that NaN values are valid results of 
certain computations, and that they're valid in certain contexts.

The important thing is that they're *uncommon*, and if you see them 
cropping up all over the place where they shouldn't, you know you have 
an initialization problem somewhere in your code.

The same thing could be true for integers, but zero is such a common 
value that it's tough to spot the origin of the error.

If signed integers were initialized to min_value and unsigned integers 
were initialized to max_value, I think those initialization errors would 
be easier to track down. Not because the values are illegal, but because 
they're *uncommon*.
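
A rough sketch of that idea as a library template rather than a language change. uncommonInit is a hypothetical name, not an existing Phobos or compiler feature; it simply picks T.min for signed and T.max for unsigned integral types.

```d
import std.traits : isIntegral, isSigned;

// Hypothetical helper: an "uncommon" starting value for integral types.
template uncommonInit(T)
if (isIntegral!T)
{
    static if (isSigned!T)
        enum T uncommonInit = T.min; // e.g. int.min
    else
        enum T uncommonInit = T.max; // e.g. uint.max
}
```

A declaration such as `int counter = uncommonInit!int;` would then make a forgotten assignment stand out in a debugger, much as NaN does for floats.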

--benji

Sep 09 2008

JAnderson <ask me.com> writes:

Benji Smith wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 I hear you. I brought up the same exact design briefly with Bartosz 
 last week. We called it T.invalid. He argued in favor of it. I 
 thought it brings more complication than it's worth and was willing 
 to go with T.init for simplicity's sake. Why deal with two empty 
 states instead of one.

 One nagging question is, what is T.fail for integral types? For 
 pointers fine, one could be found. For chars, fine too. But for 
 integrals I'm not sure that e.g. T.min or T.max is a credible fail 
 value.

 The T.init value should be that. That's why, for floats, float.init is 
 a NaN. But for many types, there is no such thing as an invalid value, 
 so it really doesn't work for generic code.

 
 I don't think values necessarily have to be initialized to an invalid 
 value. You could certainly argue that NaN values are valid results of 
 certain computations, and that they're valid in certain contexts.
 
 The important thing is that they're *uncommon*, and if you see them 
 cropping up all over the place where they shouldn't, you know you have 
 an initialization problem somewhere in your code.
 
 The same thing could be true for integers, but zero is such a common 
 value that it's tough to spot the origin of the error.
 
 If signed integers were initialized to min_value and unsigned integers 
 were initialized to max_value, I think those initialization errors would 
 be easier to track down. Not because the values are illegal, but because 
 they're *uncommon*.
 
 --benji

I agree.  I use the 0xcdcdcdcd and 0xfefefefe provided by MSVC a lot to 
track down errors.

-Jowl

Sep 09 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Walter Bright wrote:

 But for many types, there is no such thing as an invalid value

Why can one then define

|   typedef int T=void; // T.init == void

-manfred

-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)

Sep 10 2008

Brad Roberts <braddr puremagic.com> writes:

Manfred_Nowak wrote:
 Walter Bright wrote:
 
 But for many types, there is no such thing as an invalid value

 
 Why can one then define
 
 |   typedef int T=void; // T.init == void
 
 -manfred

That just means don't initialize, leaving any instances with random
values until first assignment.  That doesn't mean that it contains an
invalid value.

Later,
Brad

Sep 10 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Brad Roberts wrote:

 That just means don't initialize

I know what the semantics of `T= void' is supposed to be, but your 
remark is only a result of Walter's overloading of meanings to keywords.

It does not change the fact, that for all types _one_ more possibility 
exists to (not)initialize it, than it has legal values. 

If there is one more possibility, then there are many more; including 
the possibility, that the initial value is in fact `void', i.e. illegal 
as an rvalue.

See
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.bugs&article_id=15041
for an example showing that D in fact uses `void' as an initialization value.

-manfred

-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)

Sep 11 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 - the union operations look... weird.  Unobvious.  I'm too sleepy now to 
 propose anything better but I'll definitely give it a try.  The rest of 
 the interface seems very natural.

 
 I agree I hadn't known what primitives would be needed when I sat down. 
 Clearly there was a need for some since individual iterators are not 
 available anymore. New ideas would be great; I suggest you validate them 
 by implementing some nontrivial algorithms in std.algorithm with your, 
 um, computational basis of choice :o).

r.before(s)
r.after(s)
r.begin
r.end

Here r.before(s) is everything from the r's first element (inclusive) to 
the first s's element (exclusive); r.after(s) is from last s's element 
(exclusive) to the last element of r (inclusive); r.begin is an empty 
range at the beginning of a parent range; and r.end is an empty range at 
the end of a parent range.  Therefore, according to your diagram: 

r.toBegin(s) => r.before(s)
s.toEnd(r) => s.before(r.end)
s.fromEnd(r) => s.after(r)
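
For built-in slices, these four primitives could be sketched roughly as below. This is an illustration only: the free functions are hypothetical, they assume that s refers to memory inside r, and they use unchecked pointer arithmetic, so they are not safe code.

```d
// Everything in r before the first element of s (exclusive).
T[] before(T)(T[] r, T[] s)
{
    return r.ptr[0 .. s.ptr - r.ptr];
}

// Everything in r after the last element of s (exclusive).
T[] after(T)(T[] r, T[] s)
{
    T* sEnd = s.ptr + s.length;
    T* rEnd = r.ptr + r.length;
    return sEnd[0 .. rEnd - sEnd];
}

// Empty range positioned at the beginning of r.
T[] begin(T)(T[] r) { return r[0 .. 0]; }

// Empty range positioned at the end of r.
T[] end(T)(T[] r) { return r[$ .. $]; }
```

Because these are module-level functions, the member-style spellings from the list above (r.before(s), s.before(r.end), s.after(r)) work on slices through uniform function call syntax.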

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 - the union operations look... weird.  Unobvious.  I'm too sleepy now to 
 propose anything better but I'll definitely give it a try.  The rest of 
 the interface seems very natural.

 I agree I hadn't known what primitives would be needed when I sat down. 
 Clearly there was a need for some since individual iterators are not 
 available anymore. New ideas would be great; I suggest you validate them 
 by implementing some nontrivial algorithms in std.algorithm with your, 
 um, computational basis of choice :o).

 
 r.before(s)
 r.after(s)
 r.begin
 r.end
 
 Here r.before(s) is everything from the r's first element (inclusive) to 
 the first s's element (exclusive); r.after(s) is from last s's element 
 (exclusive) to the last element of r (inclusive); r.begin is an empty 
 range at the beginning of a parent range; and r.end is an empty range at 
 the end of a parent range.  Therefore, according to your diagram: 
 
 r.toBegin(s) => r.before(s)
 s.toEnd(r) => s.before(r.end)
 s.fromEnd(r) => s.after(r)

Cool! I was thinking of something along the same lines through the 
night, and actually with the same names before and after, but the begin 
and end did not occur to me. As soon as I have another chunk of time, 
I'll make another pass through algorithm2 to see how these work. But you 
may as well take std.algorithm and make it work with your 
primitives.


Andrei

Sep 10 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 09 Sep 2008 01:50:54 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Hello,


 Walter, Bartosz and myself have been hard at work trying to find the  
 right abstraction for iteration. That abstraction would replace the  
 infamous opApply and would allow for external iteration, thus paving the  
 way to implementing real generic algorithms.

 We considered an STL-style container/iterator design. Containers would  
 use the newfangled value semantics to enforce ownership of their  
 contents. Iterators would span containers in various ways.

 The main problem with that approach was integrating built-in arrays into  
 the design. STL's iterators are generalized pointers; D's built-in  
 arrays are, however, not pointers, they are "pairs of pointers" that  
 cover contiguous ranges in memory. Most people who've used D gained the  
 intuition that slices are superior to pointers in many ways, such as  
 easier checking for validity, higher-level compact primitives,  
 streamlined and safe interface. However, if STL iterators are  
 generalized pointers, what is the corresponding generalization of D's  
 slices? Intuitively that generalization should also be superior to  
 iterators.

 In a related development, the Boost C++ library has defined ranges as  
 pairs of two iterators and implemented a series of wrappers that accept  
 ranges and forward their iterators to STL functions. The main outcome of  
 Boost ranges has been to decrease the verboseness and perils of naked  
 iterator manipulation (see  
 http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a  
 C++ application using Boost could avail itself of containers, ranges,  
 and iterators. The Boost notion of range is very close to a  
 generalization of D's slice.

 We have considered that design too, but that raised a nagging question.  
 In most slice-based D programming, using bare pointers is not necessary.  
 Could then there be a way to use _only_ ranges and eliminate iterators  
 altogether? A container/range design would be much simpler than one also  
 exposing iterators.

 All these questions aside, there are several other imperfections in the  
 STL, many caused by the underlying language. For example STL is  
 incapable of distinguishing between input/output iterators and forward  
 iterators. This is because C++ cannot reasonably implement a type with  
 destructive copy semantics, which is what would be needed to make said  
 distinction. We wanted the Phobos design to provide appropriate answers  
 to such questions, too. This would be useful particularly because it  
 would allow implementation of true and efficient I/O integrated with  
 iteration. STL has made an attempt at that, but istream_iterator and  
 ostream_iterator are, with all due respect, a joke that builds on  
 another joke, the iostreams.

 After much thought and discussions among Walter, Bartosz and myself, I  
 defined a range design and reimplemented all of std.algorithm and much  
 of std.stdio in terms of ranges alone. This is quite a thorough test  
 because the algorithms are diverse and stress-test the expressiveness  
 and efficiency of the range design. Along the way I made the interesting  
 realization that certain union/difference operations are needed as  
 primitives for ranges. There are also a few bugs in the compiler and  
 some needed language enhancements (e.g. returning a reference from a  
 function); Walter is committed to implement them.

 I put together a short document for the range design. I definitely  
 missed about a million things and have been imprecise about another  
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html


 Andrei

1) There is a typo:

// Copies a range to another
void copy(R1, R2)(R1 src, R2 tgt)
{
     while (!src.isEmpty)
     {
         tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
     }
}

2) R.next and R.pop could have better names. I mean, they represent  
similar operations yet names are so different.

3) Walter mentioned that built-in array could be re-implemented using a  
pair of pointers instead of ptr+length. Will it ever get a green light? It  
fits range concept much better.

4) We need some way of supporting dollar notation in user containers. The  
hack of using __dollar is bad (although it works).

5) I don't quite like names left and right! :) I think they should  
represent limits (pointers to begin and end, in case of array) rather than  
values. In this case, built-in arrays could be implemented as follows:

struct Array(T)
{
     T* left;
     T* right;
     size_t length() { return right-left; }
     ref T opIndex(size_t index) { return left[index]; }
     // etc
}

The rationale behind having access to range limits is to allow operations  
on them. For example,
R.left-=n;

could be used instead of
foreach(i; 0..n) {
     R.pop();
}

which is more efficient in many cases.

Other than that - great, I like it.

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 1) There is a typo:
 
 // Copies a range to another
 void copy(R1, R2)(R1 src, R2 tgt)
 {
     while (!src.isEmpty)
     {
         tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
     }
 }

Thanks! Fixed.

 2) R.next and R.pop could have better names. I mean, they represent 
 similar operations yet names are so different.

I agree. Next was a natural choice. I stole pop from Perl. Any symmetric 
and short operation names would be welcome.

 3) Walter mentioned that built-in array could be re-implemented using a 
 pair of pointers instead of ptr+length. Will it ever get a green light? 
 It fits range concept much better.

Walter told me to first implement my design, and if it works, he'll do 
the change. Yes, it does fit ranges much better because the often-used 
next and, um, pop will only touch one word instead of two.

 4) We need some way of supporting dollar notation in user containers. 
 The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an 
opDollar(uint dim) that gives the library information on which argument 
count it occurred in. Consider:

auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start 
would be to expand the above into:

auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];
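
A minimal sketch of a per-dimension dollar hook, under one stated assumption: current D spells the hook as a template, opDollar(size_t dim)(), rather than the run-time opDollar(uint dim) suggested above. The Matrix type here is hypothetical and exists only to show the lowering.

```d
// Hypothetical two-dimensional matrix with row-major storage.
struct Matrix
{
    double[] data;
    size_t rows, cols;

    this(size_t rows, size_t cols)
    {
        this.rows = rows;
        this.cols = cols;
        data = new double[rows * cols];
        data[] = 0;
    }

    // A $ in argument position dim resolves to the extent of that dimension.
    size_t opDollar(size_t dim)() const
    {
        static assert(dim < 2, "Matrix has two dimensions");
        return dim == 0 ? rows : cols;
    }

    // Element access; returning by ref also allows m[i, j] = value.
    ref double opIndex(size_t i, size_t j)
    {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
}
```

With this in place, m[$ - 1, $ - 1] on a 2x3 matrix reads element (1, 2): the first dollar becomes opDollar!0 and the second becomes opDollar!1.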

 5) I don't quite like names left and right! :) I think they should 
 represent limits (pointers to begin and end, in case of array) rather 
 than values. In this case, built-in arrays could be implemented as follows:
 
 struct Array(T)
 {
     T* left;
     T* right;
     size_t length() { return right-left; }
     ref T opIndex(size_t index) { return left[index]; }
     // etc
 }
 
 The rationale behind having access to range limits is to allow 
 operations on them. For example,
 R.left-=n;

I disagree. Defining operations on range limits opens a box that would 
make Pandora jealous:

1. What is the type of left in general? Um, let's define Iterator!(R) 
for each range R.

2. What are the primitives of an iterator? Well, -= sounds good. How do 
you check it for correctness? In fact, how do you check any operation of 
a naked iterator for correctness?

3. I want to play with some data. What should I use here, ranges or 
iterators? ...

Much of the smarts of the range design is that it gets away WITHOUT 
having to answer embarrassing questions such as the above. Ranges are 
rock-solid, and part of them being rock-solid is that they expose enough 
primitives to be complete, but at the same time do not expose dangerous 
internals.

 could be used instead of
 foreach(i; 0..n) {
     R.pop();
 }
 
 which is more efficient in many cases.

Stop right there. That's not a primitive. It is an algorithm that gets 
implemented in terms of a primitive. I disagree that such an algorithm 
is an operator and does not have a name such as popN.
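
A minimal sketch of what such a named algorithm could look like, assuming pop removes one element from the right end of a range. Countdown is a hypothetical toy range, and this popN is an illustration, not the Phobos implementation.

```d
// Hypothetical toy range over the integers left, left+1, ..., right-1.
struct Countdown
{
    int left, right;
    bool isEmpty() const { return left >= right; }
    void pop() { assert(!isEmpty); --right; }   // drop the rightmost element
}

// popN expressed in terms of the pop primitive, with an O(1) shortcut
// for built-in slices. The range itself never exposes its limits.
void popN(R)(ref R r, size_t n)
{
    static if (is(R : T[], T))
    {
        assert(n <= r.length);
        r = r[0 .. $ - n];
    }
    else
    {
        foreach (_; 0 .. n)
            r.pop();
    }
}
```

The efficiency argument for R.left -= n is preserved by the slice shortcut, while callers only ever see a named, checkable operation.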

 Other than that - great, I like it.

Thanks for your comments.


Andrei

Sep 08 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented using a  
 pair of pointers instead of ptr+length. Will it ever get a green light?  
 It fits range concept much better.

 Walter told me to first implement my design, and if it works, he'll do  
 the change. Yes, it does fit ranges much better because the often-used  
 next and, um, pop will only touch one word instead of two.

I'd warn that changing away from ptr+length would create logical  
inconsistencies between 1D arrays and 2D/3D/ND arrays.

 4) We need some way of supporting dollar notation in user containers.  
 The hack of using __dollar is bad (although it works).

 It doesn't work for multiple dimensions. There should be an  
 opDollar(uint dim) that gives the library information on which argument  
 count it occurred in. Consider:

 auto x = matrix[$-1, $-1];

 Here the dollar's occurrences have different meanings. A good start  
 would be to expand the above into:

 auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

I'd also add that multiple dimension slicing should be supported. i.e.
auto x = matrix[2..5,0..$,3]
would become
auto x =  
matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3)
with
struct Slice (T) { T start; T end; }
Strided slices would also be nice. i.e. matrix[0..$:10] // decimate the  
array

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented using 
 a pair of pointers instead of ptr+length. Will it ever get a green 
 light? It fits range concept much better.

 Walter told me to first implement my design, and if it works, he'll do 
 the change. Yes, it does fit ranges much better because the often-used 
 next and, um, pop will only touch one word instead of two.

 
 I'd warn that changing away from ptr+length would create logical 
 inconsistencies between 1D arrays and 2D/3D/ND arrays.

How so?

 4) We need some way of supporting dollar notation in user containers. 
 The hack of using __dollar is bad (although it works).

 It doesn't work for multiple dimensions. There should be an 
 opDollar(uint dim) that gives the library information on which 
 argument count it occurred in. Consider:

 auto x = matrix[$-1, $-1];

 Here the dollar's occurrences have different meanings. A good start 
 would be to expand the above into:

 auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

 
 I'd also add that multiple dimension slicing should be supported. i.e.
 auto x = matrix[2..5,0..$,3]
 would become
 auto x = 
 matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3)
 with
 struct Slice (T) { T start; T end; }
 Strided slices would also be nice. i.e. matrix[0..$:10] // decimate the 
 array

Multidimensional slicing can be implemented with staggered indexing:

matrix[2..5][0..$][3]

means: first, take a slice 2..5 that returns a matrix range one 
dimension smaller. Then, for that type take a slice from 0 to $. And so on.

This works great for row-wise storage. I'm not sure how efficient it 
would be for other storage schemes.

Note how nice the distinction between the container and its views works: 
there is only one matrix. But there are many ranges and subranges within 
it, bearing various relationships with one another.


Andrei

Sep 08 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented using  
 a pair of pointers instead of ptr+length. Will it ever get a green  
 light? It fits range concept much better.

 Walter told me to first implement my design, and if it works, he'll do  
 the change. Yes, it does fit ranges much better because the often-used  
 next and, um, pop will only touch one word instead of two.

  I'd warn that changing away from ptr+length would create logical  
 inconsistencies between 1D arrays and 2D/3D/ND arrays.

 How so?

An ND array is typically defined as a fat pointer like so:
struct array(T,size_t N) {
     T* ptr;
     size_t[N]    lengths;      // of each dimension
     ptrdiff_t[N] byte_strides; // of each dimension
}
So a 1D array is
{
     T* ptr;
     size_t lengths;
     ptrdiff_t byte_strides = T.sizeof; //Currently a compile time constant  
in the built-in array
     size_t length() { return lengths; }
}
which is logically consistent with a general dense matrix and aside from  
some name change and the stride being a compile time constant, is  
identical to the current D arrays.
However, { T* first; T* last } may not be logically extended to ND arrays,  
particularly sliced ND arrays, as T* last no longer has any meaning.
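
A minimal sketch of the ptr + lengths + strides layout described above, to show why slicing along any dimension stays O(1). NdView is a hypothetical type, and it counts strides in elements rather than bytes to keep the pointer arithmetic simple; that is an assumption that differs from the byte_strides field in the description.

```d
// Hypothetical strided view over N-dimensional data.
struct NdView(T, size_t N)
{
    T* ptr;
    size_t[N] lengths;      // extent of each dimension
    ptrdiff_t[N] strides;   // step between neighbors, in elements

    // v[i, j, ...]: one multiply-add per dimension.
    ref T opIndex(size_t[N] idx...)
    {
        ptrdiff_t offset = 0;
        foreach (d; 0 .. N)
        {
            assert(idx[d] < lengths[d]);
            offset += cast(ptrdiff_t) idx[d] * strides[d];
        }
        return ptr[offset];
    }

    // Slice lo .. hi along dimension d: only ptr and one length change.
    NdView slice(size_t d, size_t lo, size_t hi)
    {
        assert(d < N && lo <= hi && hi <= lengths[d]);
        NdView r = this;
        r.ptr = ptr + cast(ptrdiff_t) lo * strides[d];
        r.lengths[d] = hi - lo;
        return r;
    }
}
```

Note that a column slice of a row-major view is no longer contiguous, so there is no single "last" pointer that describes it; the lengths and strides carry that information instead.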

 4) We need some way of supporting dollar notation in user containers.  
 The hack of using __dollar is bad (although it works).

 It doesn't work for multiple dimensions. There should be an  
 opDollar(uint dim) that gives the library information on which  
 argument count it occurred in. Consider:

 auto x = matrix[$-1, $-1];

 Here the dollar's occurrences have different meanings. A good start  
 would be to expand the above into:

 auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

  I'd also add that multiple dimension slicing should be supported. i.e.
 auto x = matrix[2..5,0..$,3]
 would become
 auto x =  
 matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3)
 with
 struct Slice (T) { T start; T end; }
 Strided slices would also be nice. i.e. matrix[0..$:10] // decimate the  
 array

 Multidimensional slicing can be implemented with staggered indexing:

 matrix[2..5][0..$][3]

Yes, but doing so utilizes expression templates and is relatively slow:
matrix_row_slice temp1 = matrix.opSlice(2,5);
matrix_col_slice temp2 = temp1.opSlice(0,$);
matrix                 = temp2.opIndex(3);

And causes code bloat. Worse, matrix[2..5] by itself would be an unstable  
type. Either foo(matrix[2..5]) would not compile or it would generate code  
bloat and hard to find logic bugs. (Due to the fact that you've embedded  
the dimension of the slice operation into the type).

 means: first, take a slice 2..5 that returns a matrix range one  
 dimension smaller. Then, for that type take a slice from 0 to $. And so  
 on.

 This works great for row-wise storage. I'm not sure how efficient it  
 would be for other storage schemes.

No it doesn't. It works great for standard C arrays of arrays, but these  
are not matrices and have a large number of well documented performance  
issues when used as such. In general, multi-dimensional data structures  
are relatively common and should be cleanly supported.

 Note how nice the distinction between the container and its views works:  
 there is only one matrix. But there are many ranges and subranges within  
 it, bearing various relationships with one another.

Yes, Data+View (i.e MVC for data structures) is a good thing(TM). But  
generally, matrices have been views into data and not the data themselves.  
(unless needed for memory management, etc)

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Jacques wrote:
 On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented 
 using a pair of pointers instead of ptr+length. Will it ever get a 
 green light? It fits range concept much better.

 Walter told me to first implement my design, and if it works, he'll 
 do the change. Yes, it does fit ranges much better because the 
 often-used next and, um, pop will only touch one word instead of two.

  I'd warn that changing away from ptr+length would create logical 
 inconsistencies between 1D arrays and 2D/3D/ND arrays.

 How so?

 
 An ND array is typically defined as a fat pointer like so:
 struct array(T,size_t N) {
     T* ptr;
     size_t[N]    lengths;      // of each dimension
     ptrdiff_t[N] byte_strides; // of each dimension
 }
 So a 1D array is
 {
     T* ptr;
     size_t lengths;
     ptrdiff_t byte_strides = T.sizeof; //Currently a compile time 
 constant in the built-in array
     size_t length() { return lengths; }
 }
 which is logically consistent with a general dense matrix and aside from 
 some name change and the stride being a compile time constant, is 
 identical to the current D arrays.
 However, { T* first; T* last } may not be logically extended to ND 
 arrays, particularly sliced ND arrays, as T* last no longer has any 
 meaning.

Hmmm, I see. That could become a problem if we wanted lower-dimensional 
matrices to be prefixes of higher-dimensional matrices. This is a worthy 
goal, but one that my matrices don't pursue.

 4) We need some way of supporting dollar notation in user 
 containers. The hack of using __dollar is bad (although it works).

 It doesn't work for multiple dimensions. There should be an 
 opDollar(uint dim) that gives the library information on which 
 argument count it occurred in. Consider:

 auto x = matrix[$-1, $-1];

 Here the dollar's occurrences have different meanings. A good start 
 would be to expand the above into:

 auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

  I'd also add that multiple dimension slicing should be supported. i.e.
 auto x = matrix[2..5,0..$,3]
 would become
 auto x = 
 matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3) 

 with
 struct Slice (T) { T start; T end; }
 Strided slices would also be nice. i.e. matrix[0..$:10] // decimate 
 the array

 Multidimensional slicing can be implemented with staggered indexing:

 matrix[2..5][0..$][3]

 
 Yes, but doing so utilizes expression templates and is relatively slow:
 matrix_row_slice temp1 = matrix.opSlice(2,5);
 matrix_col_slice temp2 = temp1.opSlice(0,$);
 matrix                 = temp2.opIndex(3);
 
 And causes code bloat. Worse, matrix[2..5] by itself would be an unstable 
 type. Either foo(matrix[2..5]) would not compile or it would generate 
 code bloat and hard to find logic bugs. (Due to the fact that you've 
 embedded the dimension of the slice operation into the type).

What is an unstable type?

There is no use of expression templates, but indeed multiple slices are 
created. This isn't as bad as it seems because the desire was to access 
several elements, so the slice is supposed to be around for long enough 
to justify its construction cost.

I agree it would be onerous to access a single element with e.g. 
matrix[1][1][2].

 means: first, take a slice 2..5 that returns a matrix range one 
 dimension smaller. Then, for that type take a slice from 0 to $. And 
 so on.

 This works great for row-wise storage. I'm not sure how efficient it 
 would be for other storage schemes.

 
 No it doesn't. It works great for standard C arrays of arrays, but these 
 are not matrices and have a large number of well documented performance 
 issues when used as such. In general, multi-dimensional data structures 
 are relatively common and should be cleanly supported.

[Citation needed]

 Note how nice the distinction between the container and its views 
 works: there is only one matrix. But there are many ranges and 
 subranges within it, bearing various relationships with one another.

 
 Yes, Data+View (i.e MVC for data structures) is a good thing(TM). But 
 generally, matrices have been views into data and not the data 
 themselves. (unless needed for memory management, etc)

Well if better terminology comes along I'm all for it. I want to define 
the "matrix storage" as an owning container, and several "matrix ranges" 
that access the data stored by the storage.


Andrei

Sep 09 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 09 Sep 2008 07:06:55 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 3) Walter mentioned that built-in array could be re-implemented  
 using a pair of pointers instead of ptr+length. Will it ever get a  
 green light? It fits range concept much better.

 Walter told me to first implement my design, and if it works, he'll  
 do the change. Yes, it does fit ranges much better because the  
 often-used next and, um, pop will only touch one word instead of two.

  I'd warn that changing away from ptr+length would create logical  
 inconsistencies between 1D arrays and 2D/3D/ND arrays.

 How so?

  An ND array is typically defined as a fat pointer like so:
 struct array(T,size_t N) {
     T* ptr;
     size_t[N]    lengths;      // of each dimension
     ptrdiff_t[N] byte_strides; // of each dimension
 }
 So a 1D array is
 {
     T* ptr;
     size_t lengths;
     ptrdiff_t byte_strides = T.sizeof; //Currently a compile time  
 constant in the built-in array
     size_t length() { return lengths; }
 }
 which is logically consistent with a general dense matrix and aside  
 from some name change and the stride being a compile time constant, is  
 identical to the current D arrays.
 However, { T* first; T* last } may not be logically extended to ND  
 arrays, particularly sliced ND arrays, as T* last no longer has any  
 meaning.

 Hmmm, I see. That could become a problem if we wanted lower-dimensional  
 matrices to be prefixes of higher-dimensional matrices. This is a worthy  
 goal, but one that my matrices don't pursue.

 4) We need some way of supporting dollar notation in user  
 containers. The hack of using __dollar is bad (although it works).

 It doesn't work for multiple dimensions. There should be an  
 opDollar(uint dim) that gives the library information on which  
 argument count it occurred in. Consider:

 auto x = matrix[$-1, $-1];

 Here the dollar's occurrences have different meanings. A good start  
 would be to expand the above into:

 auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

  I'd also add that multiple dimension slicing should be supported.  
 i.e.
 auto x = matrix[2..5,0..$,3]
 would become
 auto x =  
 matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3)  
 with
 struct Slice (T) { T start; T end; }
 Strided slices would also be nice. i.e. matrix[0..$:10] // decimate  
 the array

 Multidimensional slicing can be implemented with staggered indexing:

 matrix[2..5][0..$][3]

  Yes, but doing so utilizes expression templates and is relatively slow:
 matrix_row_slice temp1 = matrix.opSlice(2,5);
 matrix_col_slice temp2 = temp1.opSlice(0,$);
 matrix                 = temp2.opIndex(3);
  And causes code bloat. Worse, matrix[2..5] by itself would be an  
 unstable type. Either foo(matrix[2..5]) would not compile or it would  
 generate code bloat and hard to find logic bugs. (Due to the fact that  
 you've embedded the dimension of the slice operation into the type).

 What is an unstable type?

What I meant is that the type is fundamentally not designed to exist by  
itself. Therefore, if it is not paired with the other slices, you've put your  
program into a dangerous state.

 There is no use of expression templates,

So what would you call embedding an operation into a very temporary type  
that's not expected to last beyond the line of its return?

 but indeed multiple slices are created. This isn't as bad as it seems  
 because the desire was to access several elements, so the slice is  
 supposed to be around for long enough to justify its construction cost.

True, but it's still less efficient.

 I agree it would be onerous to access a single element with e.g.  
 matrix[1][1][2].

And what about the code bloat and compile time or runtime logic bugs that  
this design is prone to produce?

 means: first, take a slice 2..5 that returns a matrix range one  
 dimension smaller. Then, for that type take a slice from 0 to $. And  
 so on.

 This works great for row-wise storage. I'm not sure how efficient it  
 would be for other storage schemes.

  No it doesn't. It works great for standard C arrays of arrays, but  
 these are not matrices and have a large number of well documented  
 performance issues when used as such. In general, multi-dimensional  
 data structures are relatively common and should be cleanly supported.

 [Citation needed]

Imperfect C++: Practical Solutions for Real-Life Programming by Matthew  
Wilson has an entire chapter dedicated to the performance of matrices in  
C++, comparing Boost, the STL, plain arrays and a few structures of his  
own. Also, arrays of arrays don't support O(1) slicing, resize and  
creation, use more memory, etc.

As for multi-dimensional data structures, they are used heavily by the  
fields of games, graphics, scientific computing, databases and I've  
probably forgotten some others.

Sep 09 2008

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

First, let me add my support for the range proposal. It is in line with 
earlier suggestions but makes some crucial additions. I'm also very glad 
that one of the most influential forces behind D has returned to these 
newsgroups.

Andrei Alexandrescu wrote:
 Robert Jacques wrote:
 On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Denis Koroskin wrote:
 4) We need some way of supporting dollar notation in user 
 containers. The hack of using __dollar is bad (although it works).

 It doesn't work for multiple dimensions. There should be an 
 opDollar(uint dim) that gives the library information on which 
 argument count it occurred in. Consider:

 auto x = matrix[$-1, $-1];

 Here the dollar's occurrences have different meanings. A good start 
 would be to expand the above into:

 auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

 I'd also add that multiple dimension slicing should be supported. i.e.
 auto x = matrix[2..5,0..$,3]
 would become
 auto x = 
 matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3) 

 with
 struct Slice (T) { T start; T end; }
 Strided slices would also be nice. i.e. matrix[0..$:10] // decimate 
 the array

 
 Multidimensional slicing can be implemented with staggered indexing:
 
 matrix[2..5][0..$][3]
 
 means: first, take a slice 2..5 that returns a matrix range one 
 dimension smaller. Then, for that type take a slice from 0 to $. And so on.

Implementing multidimensional slicing in this way is quite irregular. 
One would expect m[2..5][1..2] to behave just like s[2..5][1..2] would 
for a regular array. (e.g. consider s being of the type char[]).

I implemented most, if not all, of this (multidimensional arrays) a little 
more than two years ago. (Being able to implement it required me to 
write the patch that made it into DMD 0.166 that made it possible for 
structs and classes to have template member functions/operators.)

There was basically one Matrix type with rectangular, dense storage and 
a strided Slice type. I believe the only thing that makes sense when it 
comes to multidimensional slicing is how I implemented it:

m[i_1, i_2, i_3, ...],

where i_x is either a range or a singular index, and x corresponds to 
the dimension (1,2,3,...) results in a slice where each dimension 
indexed by a singular index is collapsed and dimensions indexed by a 
range are kept.

The resulting syntax is:

m[range(2,5), all, 3, range($-1,$)]

even though the possibility to use $ like this wasn't discovered until 
much later.

Of course, the optimal syntax would be something like:

m[2..5, 0..$, 3, $-1..$],

which would be easiest to implement by making a..b a value in itself of 
an integral range type. Intriguingly, integral ranges would just be an 
implementation of the same range concept you have already presented.

-- 
Oskar

Sep 11 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 09 Sep 2008 04:37:41 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 1) There is a typo:
  // Copies a range to another
 void copy(R1, R2)(R1 src, R2 tgt)
 {
     while (!src.isEmpty)
     {
         tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
     }
 }

 Thanks! Fixed.

 2) R.next and R.pop could have better names. I mean, they represent  
 similar operations yet names are so different.

 I agree. Next was a natural choice. I stole pop from Perl. Any symmetric  
 and short operation names would be welcome.

1) R.left += n / R.right -= n
2) R.left.advance(n) / R.right.advance(n) (or move)
3) R.advanceLeft(n)/R.advanceRight(n) (or moveLeft/moveRight)

 3) Walter mentioned that built-in array could be re-implemented using a  
 pair of pointers instead of ptr+length. Will it ever get a green light?  
 It fits range concept much better.

 Walter told me to first implement my design, and if it works, he'll do  
 the change. Yes, it does fit ranges much better because the often-used  
 next and, um, pop will only touch one word instead of two.

 4) We need some way of supporting dollar notation in user containers.  
 The hack of using __dollar is bad (although it works).

 It doesn't work for multiple dimensions. There should be an  
 opDollar(uint dim) that gives the library information on which argument  
 count it occurred in. Consider:

 auto x = matrix[$-1, $-1];

 Here the dollar's occurrences have different meanings. A good start  
 would be to expand the above into:

 auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

 5) I don't quite like names left and right! :) I think they should  
 represent limits (pointers to begin and end, in case of array) rather  
 than values. In this case, built-in arrays could be implemented as  
 follows:
  struct Array(T)
 {
     T* left;
     T* right;
     size_t length() { return right-left; }
     ref T opIndex(size_t index) { return left[index]; }
     // etc
 }
  The rationale behind having access to range limits is to allow  
 operations on them. For example,
 R.left-=n;

 I disagree. Defining operations on range limits opens a box that would  
 make Pandora jealous:

 1. What is the type of left in general? Um, let's define Iterator!(R)  
 for each range R.

Yes, it should be iterator.

 2. What are the primitives of an iterator? Well, -= sounds good. How do  
 you check it for correctness? In fact, how do you check any operation of  
 a naked iterator for correctness?

First of all, left is a forward iterator and right is a backward iterator.
That's why left might support ++, += and =, while right supports --, -= and  
=.
Note that we can rename ++ to advance() to get uniform naming.

In many cases it is desirable to store an iterator and set an iterator to  
the range bounds.

 3. I want to play with some data. What should I use here, ranges or  
 iterators? ...

I don't think that ranges replace iterators (yet?). For myself, I define a  
range as a pair of iterators.

For example, what is the equivalent D code of the following:

for (iterator it = ..., end = list.end(); it != end; ) {
     if (predicate(*it)) {
         it = list.erase(it);
     } else {
         ++it;
     }
}

Range range = vector.all;
while (!range.isEmpty) {
     if (predicate(range.left)) {
         ???
     } else {
         range.next;
     }
}

Iterator solution would be as follows:
range.left = list.erase(range.left);

I took a list for the sake of simplicity, because erasing an element
doesn't update the end. However, in general, both left and right should be
updated. A good solution would be to return a new range, like this:

Range range = ...;
range = container.erase(???); // but what should be here?

maybe this:
container.erase(range); // erase the whole range; no need to return
                        // anything because it would be empty anyway
range = container.eraseFirstN(range, n); // erase the first n elements
                                         // of a range and return the rest
range = container.eraseLastN(range, n);  // the same, but from the other end

I'm not saying that iterators magically solve everything, I'm merely trying
to find problematic places.
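
For the array case, one range-style answer to the erase-while-iterating loop is to hand the predicate to an algorithm and rebind the result. The sketch below uses remove from std.algorithm.mutation in present-day Phobos, which is an assumption relative to this thread: it postdates the discussion and does not by itself answer the linked-list question.

```d
import std.algorithm.mutation : remove;

void main()
{
    int[] data = [1, 2, 3, 4, 5, 6];

    // remove!pred moves the surviving elements to the front, keeps their
    // order, and returns the shortened range. The caller rebinds `data`,
    // so no iterator into the erased part is left around.
    data = data.remove!(x => x % 2 == 0);

    assert(data == [1, 3, 5]);
}
```

The point is the same one made with container.erase(range): the operation returns what is left, so the caller never has to patch range limits by hand.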

 Much of the smarts of the range design is that it gets away WITHOUT  
 having to answer embarrassing questions such as the above. Ranges are  
 rock-solid, and part of them being rock-solid is that they expose enough  
 primitives to be complete, but at the same time do not expose dangerous  
 internals.

 could be used instead of
 foreach(i; 0..n) {
     R.pop();
 }

 which is more efficient in many cases.

 Stop right there. That's not a primitive. It is an algorithm that gets  
 implemented in terms of a primitive. I disagree that such an algorithm  
 is an operator and does not have a name such as popN.

Okay, but you didn't mention popN() in the paper :)
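
A minimal sketch of popN as a named algorithm layered on a primitive. It assumes the primitive names Phobos eventually adopted (empty, popFront) instead of the isEmpty/next/pop spellings used in this thread, and the constant-time slicing shortcut is an illustrative choice, not something taken from the paper.

```d
import std.range.primitives;

// Pops up to n elements from the front of r; returns how many were popped.
size_t popN(R)(ref R r, size_t n)
    if (isInputRange!R)
{
    static if (hasSlicing!R && hasLength!R)
    {
        // Random-access case: one slice instead of n single steps.
        immutable k = n < r.length ? n : r.length;
        r = r[k .. r.length];
        return k;
    }
    else
    {
        // Generic case: repeat the primitive, stopping at the end.
        size_t k = 0;
        for (; k < n && !r.empty; ++k)
            r.popFront();
        return k;
    }
}

void main()
{
    int[] a = [1, 2, 3, 4, 5];
    assert(popN(a, 2) == 2);
    assert(a == [3, 4, 5]);
    assert(popN(a, 10) == 3); // never walks past the end
    assert(a.empty);

    string s = "hello";       // narrow strings take the generic path
    assert(popN(s, 2) == 2);
    assert(s == "llo");
}
```

Unlike a raw R.left -= n, the helper cannot move a limit outside the range, because it only ever shrinks it.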

 Other than that - great, I like it.

 Thanks for your comments.


 Andrei

Thank YOU!

Sep 09 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Denis Koroskin <2korden gmail.com> wrote:
 5) I don't quite like names left and right! :) I think they should  
 represent limits (pointers to begin and end, in case of array) rather than  
 values. In this case, built-in arrays could be implemented as follows:
 
 struct Array(T)
 {
      T* left;
      T* right;
      size_t length() { return right-left; }
      ref T opIndex(size_t index) { return left[index]; }
      // etc
 }
 
 The rationale behind having access to range limits is to allow operations  
 on them. For example,
 R.left-=n;
 
 could be used instead of
 foreach(i; 0..n) {
      R.pop();
 }

Now you stepped onto your own landmine.  :)  "R.left-=n" extends the 
range beyond its beginning with unpredictable consequences.  That's why 
such operations shouldn't be easily accessible.

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Denis Koroskin <2korden gmail.com> wrote:
 5) I don't quite like names left and right! :) I think they should  
 represent limits (pointers to begin and end, in case of array) rather than  
 values. In this case, built-in arrays could be implemented as follows:

 struct Array(T)
 {
      T* left;
      T* right;
      size_t length() { return right-left; }
      ref T opIndex(size_t index) { return left[index]; }
      // etc
 }

 The rationale behind having access to range limits is to allow operations  
 on them. For example,
 R.left-=n;

 could be used instead of
 foreach(i; 0..n) {
      R.pop();
 }

 
 Now you stepped onto your own landmine.  :)  "R.left-=n" extends the 
 range beyond its beginning with unpredictable consequences.  That's why 
 such operations shouldn't be easily accessible.

Oh I thought it's R.right -= n.

It has become clear to me that a range never increases. It always 
shrinks. It can increase if fused to another range (I'm thinking of 
relaxing the fusion operations to allow for overlapping/adjacent ranges, 
not only ranges that include one another). But without extra info from 
the container a range can never grow.
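
The "never grows" rule can be sketched as a small wrapper that keeps both limits private and exposes only shrinking operations. Slice and its member names are hypothetical, and the primitives use the empty/front/popFront spellings of later Phobos, not the names from the draft under discussion.

```d
// Sketch only: a bidirectional view over an array whose limits are
// private, so client code can shrink it but never extend it.
struct Slice(T)
{
    private T* b, e; // half-open interval [b, e)

    this(T[] a) { b = a.ptr; e = a.ptr + a.length; }

    bool empty() const { return b == e; }
    size_t length() const { return e - b; }

    ref T front() { assert(!empty); return *b; }
    ref T back()  { assert(!empty); return *(e - 1); }

    // The only mutators: each one moves a limit inward, after a check.
    void popFront() { assert(!empty); ++b; }
    void popBack()  { assert(!empty); --e; }
}

void main()
{
    auto s = Slice!int([10, 20, 30, 40]);
    s.popFront();
    s.popBack();
    assert(s.length == 2);
    assert(s.front == 20 && s.back == 30);
}
```

With no public access to b and e, the typo discussed above (left -= n) has nothing to compile against.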


Andrei

Sep 09 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Denis Koroskin <2korden gmail.com> wrote:
 5) I don't quite like names left and right! :) I think they should  
 represent limits (pointers to begin and end, in case of array) rather than  
 values. In this case, built-in arrays could be implemented as follows:

 struct Array(T)
 {
      T* left;
      T* right;
      size_t length() { return right-left; }
      ref T opIndex(size_t index) { return left[index]; }
      // etc
 }

 The rationale behind having access to range limits is to allow operations  
 on them. For example,
 R.left-=n;

 could be used instead of
 foreach(i; 0..n) {
      R.pop();
 }

 
 Now you stepped onto your own landmine.  :)  "R.left-=n" extends the 
 range beyond its beginning with unpredictable consequences.  That's why 
 such operations shouldn't be easily accessible.

 
 Oh I thought it's R.right -= n.
 
 It has become clear to me that a range never increases. It always 
 shrinks. It can increase if fused to another range (I'm thinking of 
 relaxing the fusion operations to allow for overlapping/adjacent ranges, 
 not only ranges that include one another). But without extra info from 
 the container a range can never grow.

It was obviously a typo, but a very dangerous typo indeed.

Sep 09 2008

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
...
 
 After much thought and discussions among Walter, Bartosz and myself, I 
 defined a range design and reimplemented all of std.algorithm and much 
 of std.stdio in terms of ranges alone.

Yup.  This is why I implemented all of Tango's algorithms specifically 
for arrays from the start--slices represent a reasonable approximation 
of ranges, and this seems far preferable to the iterator approach of 
C++.  Glad to hear that this is what you've decided as well.

 This is quite a thorough test 
 because the algorithms are diverse and stress-test the expressiveness 
 and efficiency of the range design. Along the way I made the interesting 
 realization that certain union/difference operations are needed as 
 primitives for ranges. There are also a few bugs in the compiler and 
 some needed language enhancements (e.g. returning a reference from a 
 function); Walter is committed to implement them.

Very nice.  The inability to return a reference has been a thorn in my 
side for ages.

 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:
 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

It seems workable from a 1000' view.  I'll have to try and apply the 
approach to some algorithms and see if anything comes up.  So far, 
dealing with bidirectional ranges seems a bit weird, but that's likely 
more related to the syntax (ie. 'pop') than anything.


Sean

P.S. This decision has interesting implications for D2+, given the 
functional tendencies already present in the language :-)

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 ...
 After much thought and discussions among Walter, Bartosz and myself, I 
 defined a range design and reimplemented all of std.algorithm and much 
 of std.stdio in terms of ranges alone.

 
 Yup.  This is why I implemented all of Tango's algorithms specifically 
 for arrays from the start--slices represent a reasonable approximation 
 of ranges, and this seems far preferable to the iterator approach of 
 C++.  Glad to hear that this is what you've decided as well.

That's great to hear, but I should warn you that moving from arrays to 
"the lowest range that will do" is not quite easy. Think of std::rotate 
for example.

 This is quite a thorough test because the algorithms are diverse and 
 stress-test the expressiveness and efficiency of the range design. 
 Along the way I made the interesting realization that certain 
 union/difference operations are needed as primitives for ranges. There 
 are also a few bugs in the compiler and some needed language 
 enhancements (e.g. returning a reference from a function); Walter is 
 committed to implement them.

 
 Very nice.  The inability to return a reference has been a thorn in my 
 side for ages.
 
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 
 It seems workable from a 1000' view.  I'll have to try and apply the 
 approach to some algorithms and see if anything comes up.  So far, 
 dealing with bidirectional ranges seems a bit weird, but that's likely 
 more related to the syntax (ie. 'pop') than anything.
 
 
 Sean
 
 P.S. This decision has interesting implications for D2+, given the 
 functional tendencies already present in the language :-)

Wait until you see the generators! An efficient generator of Fibonacci 
numbers in one line...

auto fib = generate!("a[0] + a[1]")(1, 1);

I'm so excited I can hardly stand myself. :o)
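
For readers who want to try the idea with present-day Phobos: the closest facility is recurrence in std.range, which takes the recurrence as a string over a[n-1], a[n-2] and so on. Treating it as the counterpart of the generate call above is an assumption; the spelling in this post is the draft's, not the shipped API.

```d
import std.algorithm.comparison : equal;
import std.range : recurrence, take;

void main()
{
    // Lazy, infinite Fibonacci sequence seeded with 1, 1.
    auto fib = recurrence!("a[n-1] + a[n-2]")(1, 1);

    assert(fib.take(10).equal([1, 1, 2, 3, 5, 8, 13, 21, 34, 55]));
}
```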


Andrei

Sep 08 2008

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 ...
 After much thought and discussions among Walter, Bartosz and myself, 
 I defined a range design and reimplemented all of std.algorithm and 
 much of std.stdio in terms of ranges alone.

 Yup.  This is why I implemented all of Tango's algorithms specifically 
 for arrays from the start--slices represent a reasonable approximation 
 of ranges, and this seems far preferable to the iterator approach of 
 C++.  Glad to hear that this is what you've decided as well.

 
 That's great to hear, but I should warn you that moving from arrays to 
 "the lowest range that will do" is not quite easy. Think of std::rotate 
 for example.

I'll admit that I find some of the features <algorithm> provides to be 
pretty weird.  Has anyone ever actually wanted to sort something other 
than a random-access range in C++?  Or rotate one, for example?  These 
operations are allowed, but to me they fall outside the realm of useful 
functionality.  I suppose there may be some relation here to Stepanov's 
idea of a computational basis.  Should an algorithm operate on a range 
if it cannot do so efficiently?  And even if it does, will anyone 
actually use it?


Sean

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote: ...
 
 After much thought and discussions among Walter, Bartosz and
 myself, I defined a range design and reimplemented all of
 std.algorithm and much of std.stdio in terms of ranges alone.

 
 Yup.  This is why I implemented all of Tango's algorithms 
 specifically for arrays from the start--slices represent a
 reasonable approximation of ranges, and this seems far preferable
 to the iterator approach of C++.  Glad to hear that this is what
 you've decided as well.

 
 That's great to hear, but I should warn you that moving from arrays
 to "the lowest range that will do" is not quite easy. Think of 
 std::rotate for example.

 
 I'll admit that I find some of the features <algorithm> provides to
 be pretty weird.  Has anyone ever actually wanted to sort something
 other than a random-access range in C++?  Or rotate one, for example?

Great questions. I don't recall having needed to sort a list lately, but 
rotate is a great function that has an undeservedly odd name. What 
rotate does is to efficiently transform this:

a, b, c, d, e, f, A, B, C, D

into this:

A, B, C, D, a, b, c, d, e, f

I use that all the time because it's really a move-to-front operation. 
In fact my algorithm2 implementation does not call it rotate anymore, it 
calls it moveToFront and allows you to move any subrange of a range to 
the front of that range efficiently. It's a useful operation in a great 
deal of lookup strategies.
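
The transformation above can be tried with bringToFront from std.algorithm.mutation in present-day Phobos. Whether that function is the direct descendant of the moveToFront mentioned here is an assumption; the sketch only shows the same move-to-front effect on an array, with numbers standing in for the letters.

```d
import std.algorithm.mutation : bringToFront;

void main()
{
    //        a  b  c  d  e  f   A   B   C   D
    auto a = [1, 2, 3, 4, 5, 6, 10, 20, 30, 40];

    // Move the second subrange in front of the first, in place.
    immutable moved = bringToFront(a[0 .. 6], a[6 .. $]);

    assert(moved == 4); // number of elements brought to the front
    assert(a == [10, 20, 30, 40, 1, 2, 3, 4, 5, 6]);
}
```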

 These operations are allowed, but to me they fall outside the realm
 of useful functionality.  I suppose there may be some relation here
 to Stepanov's idea of a computational basis.  Should an algorithm
 operate on a range if it cannot do so efficiently?  And even if it
 does, will anyone actually use it?

I think it all depends on what one's day-to-day work consists of. I was 
chatting to Walter about it and he confessed that, although he has a 
great deal of respect for std.algorithm, he's not using much of it. I 
told him back that I need 80% of std.algorithm on a daily basis. In fact 
that's why I wrote it - otherwise I wouldn't have had the time to put 
into it.

This is because I make next to no money so I can afford to work on basic 
research, which is "important" in a long-ranging way. Today's computing 
is quite disorganized and great energy is expended on gluing together 
various pieces, protocols, and interfaces. I've worked in that 
environment quite a lot, and dealing with glue can easily become 90% of 
a day's work, leaving only little time to get occupied with a real 
problem, such as making a computer genuinely smarter or at least more 
helpful towards its user. All too often we put a few widgets on a window 
and the actual logic driving those buttons - the "smarts", the actual 
"work" gets drowned by details taking care of making that logic stick to 
the buttons.

I mentioned in a talk once that any programmer should know how to 
multiply two matrices. Why? Because if you don't, you can't tackle a 
variety of problems that can be easily expressed in terms of matrix 
multiplication, even though they have nothing to do with algebra 
(rotating figures, machine learning, fractals, fast series...). A person 
in the audience said that she never actually needs to multiply two 
matrices, so why bother? I gave an evasive response, but the reality was 
that that was a career-limiting state of affairs for her.
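
As one illustration of the "fast series" item, here is a sketch that computes Fibonacci numbers by raising a 2x2 matrix to the n-th power with repeated squaring, which takes O(log n) matrix multiplications. The function names are illustrative, and results are exact only while they fit in a ulong (n up to 93).

```d
// Product of two 2x2 matrices.
ulong[2][2] mul(const ulong[2][2] a, const ulong[2][2] b)
{
    ulong[2][2] c; // elements start at zero
    foreach (i; 0 .. 2)
        foreach (j; 0 .. 2)
            foreach (k; 0 .. 2)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// [[1,1],[1,0]]^n == [[F(n+1), F(n)], [F(n), F(n-1)]]
ulong fib(uint n)
{
    ulong[2][2] result = [[1, 0], [0, 1]]; // identity
    ulong[2][2] base = [[1, 1], [1, 0]];
    for (; n > 0; n >>= 1)
    {
        if (n & 1)
            result = mul(result, base);
        base = mul(base, base); // square for the next bit of n
    }
    return result[0][1];
}

void main()
{
    assert(fib(0) == 0);
    assert(fib(1) == 1);
    assert(fib(10) == 55);
    assert(fib(50) == 12_586_269_025UL);
}
```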


Andrei

Sep 08 2008

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote: ...
 After much thought and discussions among Walter, Bartosz and
 myself, I defined a range design and reimplemented all of
 std.algorithm and much of std.stdio in terms of ranges alone.

 Yup.  This is why I implemented all of Tango's algorithms 
 specifically for arrays from the start--slices represent a
 reasonable approximation of ranges, and this seems far preferable
 to the iterator approach of C++.  Glad to hear that this is what
 you've decided as well.

 That's great to hear, but I should warn you that moving from arrays
 to "the lowest range that will do" is not quite easy. Think of 
 std::rotate for example.

 I'll admit that I find some of the features <algorithm> provides to
 be pretty weird.  Has anyone ever actually wanted to sort something
 other than a random-access range in C++?  Or rotate one, for example?

 
 Great questions. I don't recall having needed to sort a list lately, but 
 rotate is a great function that has an undeservedly odd name. What 
 rotate does is to efficiently transform this:
 
 a, b, c, d, e, f, A, B, C, D
 
 into this:
 
 A, B, C, D, a, b, c, d, e, f
 
 I use that all the time because it's really a move-to-front operation.

Ah, so it's a bit like partition and select.  I use the two of those 
constantly, but haven't ever had a need for rotate.  Odd, I suppose, 
since they're so similar.

 In fact my algorithm2 implementation does not call it rotate anymore, it 
 calls it moveToFront and allows you to move any subrange of a range to 
 the front of that range efficiently. It's a useful operation in a great 
 deal of lookup strategies.
 
 These operations are allowed, but to me they fall outside the realm
 of useful functionality.  I suppose there may be some relation here
 to Stepanov's idea of a computational basis.  Should an algorithm
 operate on a range if it cannot do so efficiently?  And even if it
 does, will anyone actually use it?

 
 I think it all depends on what one's day-to-day work consists of. I was 
 chatting to Walter about it and he confessed that, although he has a 
 great deal of respect for std.algorithm, he's not using much of it. I 
 told him back that I need 80% of std.algorithm on a daily basis. In fact 
 that's why I wrote it - otherwise I wouldn't have had the time to put 
 into it.

Exactly.  I implemented Tango's Array module for the same reason.  Other 
than rotate and stable_sort, I think the module has everything from 
<algorithm>, plus some added bits.

 This is because I make next to no money so I can afford to work on basic 
 research, which is "important" in a long-ranging way. Today's computing 
 is quite disorganized and great energy is expended on gluing together 
 various pieces, protocols, and interfaces. I've worked in that 
 environment quite a lot, and dealing with glue can easily become 90% of 
 a day's work, leaving only little time to get occupied with a real 
 problem, such as making a computer genuinely smarter or at least more 
 helpful towards its user. All too often we put a few widgets on a window 
 and the actual logic driving those buttons - the "smarts", the actual 
 "work" gets drowned by details taking care of making that logic stick to 
 the buttons.

I've never worked in that environment, but I would think that even such 
positions require the use of algorithms.  If not, then I wouldn't 
consider them to be software engineering positions.  As for research-- 
I'd say that's a fairly broad category.  My first salaried position was 
in R&D for a switched long-distance carrier, for example, but that's 
applied research as opposed to academic research, which I believe you're 
describing.  I think there are benefits to each, but the overlap is what 
truly interests me.

 I mentioned in a talk once that any programmer should know how to 
 multiply two matrices. Why? Because if you don't, you can't tackle a 
 variety of problems that can be easily expressed in terms of matrix 
 multiplication, even though they have nothing to do with algebra 
 (rotating figures, machine learning, fractals, fast series...). A person 
 in the audience said that she never actually needs to multiply two 
 matrices, so why bother? I gave an evasive response, but the reality was 
 that that was a career-limiting state of affairs for her.

Yup.  This is a lot like the argument for Calculus in a CS curriculum. 
Entry-level software positions rarely require such things and yet I've 
been surprised at how many times they have proven useful over the years, 
simply for general problem-solving.  And there's certainly no debate 
about Linear Algebra--they may as well rename it "math for computer 
programming."


Sean

Sep 09 2008

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Andrei Alexandrescu wrote:
 
 This is because I make next to no money so I can afford to work on basic 
 research, which is "important" in a long-ranging way. Today's computing 
 is quite disorganized and great energy is expended on gluing together 
 various pieces, protocols, and interfaces. I've worked in that 
 environment quite a lot, and dealing with glue can easily become 90% of 
 a day's work, leaving only little time to get occupied with a real 
 problem, such as making a computer genuinely smarter or at least more 
 helpful towards its user. All too often we put a few widgets on a window 
 and the actual logic driving those buttons - the "smarts", the actual 
 "work" gets drowned by details taking care of making that logic stick to 
 the buttons.
 

Well, didn't you find a "real problem" right there (and also a very 
interesting one), in trying to make 
code/libraries/methodologies/tools/whatever that reduce those 90% of 
work in boilerplate details?
An example could be the years of investment and research in ORM frameworks 
(Hibernate/EJB3, Ruby on Rails, etc.), which despite ORM technology 
having existed for quite many years, only recently has it reached a 
point where it's really easy and non-tedious to write an OO-DB 
persistence mapping.
Another possible example, regarding GUI programming like you mentioned, 
is data binding. I haven't used it myself yet, but from what they 
describe, its purpose is indeed to reduce a lot of the complexity and 
tedium in writing code to synchronize the UI with the model/logic, and 
vice-versa.
Learning and building these kinds of stuff is, IMO, the pinnacle of 
software engineering.

-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Sep 25 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bruno Medeiros wrote:
 Andrei Alexandrescu wrote:
 This is because I make next to no money so I can afford to work on 
 basic research, which is "important" in a long-ranging way. Today's 
 computing is quite disorganized and great energy is expended on gluing 
 together various pieces, protocols, and interfaces. I've worked in 
 that environment quite a lot, and dealing with glue can easily become 
 90% of a day's work, leaving only little time to get occupied with a 
 real problem, such as making a computer genuinely smarter or at least 
 more helpful towards its user. All too often we put a few widgets on a 
 window and the actual logic driving those buttons - the "smarts", the 
 actual "work" gets drowned by details taking care of making that logic 
 stick to the buttons.

 
 Well, didn't you find a "real problem" right there (and also a very 
 interesting one), in trying to make 
 code/libraries/methodologies/tools/whatever that reduce those 90% of 
 work in boilerplate details?
 An example could be the years of investment and research in ORM frameworks 
 (Hibernate/EJB3, Ruby on Rails, etc.), which despite ORM technology 
 having existed for quite many years, only recently has it reached a 
 point where it's really easy and non-tedious to write an OO-DB 
 persistence mapping.
 Another possible example, regarding GUI programming like you mentioned, 
 is data binding. I haven't used it myself yet, but from what they 
 describe, its purpose is indeed to reduce a lot of the complexity and 
 tedium in writing code to synchronize the UI with the model/logic, and 
 vice-versa.
 Learning and building these kinds of stuff is, IMO, the pinnacle of 
 software engineering.

This hardly characterizes or answers my point. Of course wherever 
there's difficulty there's opportunity for automation, and research in 
software engineering is alive and well. My point was that much effort in 
the industry today is expended on dealing with effects instead of 
fighting the causes.

Andrei

Sep 25 2008

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Andrei Alexandrescu wrote:
 Bruno Medeiros wrote:
 Andrei Alexandrescu wrote:
 This is because I make next to no money so I can afford to work on 
 basic research, which is "important" in a long-ranging way. Today's 
 computing is quite disorganized and great energy is expended on 
 gluing together various pieces, protocols, and interfaces. I've 
 worked in that environment quite a lot, and dealing with glue can 
 easily become 90% of a day's work, leaving only little time to get 
 occupied with a real problem, such as making a computer genuinely 
 smarter or at least more helpful towards its user. All too often we 
 put a few widgets on a window and the actual logic driving those 
 buttons - the "smarts", the actual "work" gets drowned by details 
 taking care of making that logic stick to the buttons.

 Well, didn't you find a "real problem" right there (and also a very 
 interesting one), in trying to make 
 code/libraries/methodologies/tools/whatever that reduce those 90% of 
 work in boilerplate details?
 An example could be the years of investment and research in ORM 
 frameworks (Hibernate/EJB3, Ruby on Rails, etc.), which despite ORM 
 technology having existed for quite many years, only recently has it 
 reached a point where it's really easy and non-tedious to write an 
 OO-DB persistence mapping.
 Another possible example, regarding GUI programming like you 
 mentioned, is data binding. I haven't used it myself yet, but from what 
 they describe, its purpose is indeed to reduce a lot of the 
 complexity and tedium in writing code to synchronize the UI with the 
 model/logic, and vice-versa.
 Learning and building these kinds of stuff is, IMO, the pinnacle of 
 software engineering.

 
 This hardly characterizes or answers my point. Of course wherever 
 there's difficulty there's opportunity for automation, and research in 
 software engineering is alive and well.

I was just pointing out that things don't have to be the way you described.

 My point was that much effort in 
 the industry today is expended on dealing with effects instead of 
 fighting the causes.
 
 Andrei

But that's quite true nonetheless. :/


-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Sep 25 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Andrei Alexandrescu wrote:

  feedback would be highly appreciated

1) Example in "4. Bidirectional range"

Reversing of ranges can be done in constant runtime, but the example 
exhibits runtime linear in the number of elements.

This might be a hint that a "6. Reversible Range" might be required, 
because a range reversible in constant time requires more space.


2) [left,right][Diff,Union]

Ranges are not sets; therefore I might not be the only one who has trouble 
grasping the idea behind "difference" and "union" on ranges.

Of course one can define whatever one wants, but I would prefer
[sub,snip,cut,split,...][B,E][B,E] (r,s)

I.e. `subBB(r,s)' is the subrange of `r' starting at the beginning of 
`r' and ending at the beginning of `s' (including the beginning of `r', 
but not including the beginning of `s').

It may be worthwhile to pass `B' or `E' as parameters to the  
chosen keyword(?) to enable algorithmic access:

| sub(B,B,r,s)

instead of `leftDiff( r, s)'
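
A minimal sketch of the proposed naming on built-in slices, reading the first letter as the starting end taken from r and the second as the stopping end taken from s. That reading of subBE, and the pointer arithmetic used to locate s inside r, are assumptions for illustration; they only make sense when s is a subrange of r.

```d
// subBB(r, s): from the beginning of r up to (not including) the beginning of s.
T[] subBB(T)(T[] r, T[] s)
{
    assert(r.ptr <= s.ptr && s.ptr <= r.ptr + r.length,
           "s must start inside r");
    return r[0 .. s.ptr - r.ptr];
}

// subBE(r, s): from the beginning of r up to the end of s (assumed reading).
T[] subBE(T)(T[] r, T[] s)
{
    auto endOfS = s.ptr + s.length;
    assert(r.ptr <= endOfS && endOfS <= r.ptr + r.length,
           "s must end inside r");
    return r[0 .. endOfS - r.ptr];
}

void main()
{
    int[] r = [0, 1, 2, 3, 4, 5];
    int[] s = r[2 .. 4];

    assert(subBB(r, s) == [0, 1]);
    assert(subBE(r, s) == [0, 1, 2, 3]);
}
```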

-manfred
-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:
 
  feedback would be highly appreciated

 
 1) Example in "4. Bidirectional range"
 
 Reversing of ranges can be done in constant runtime, but the example 
 exhibits runtime linear in the number of elements.
 
 This might be a hint that a "6. Reversible Range" might be required, 
 because a range reversible in constant time requires more space.

There are numerous collections and ranges to be defined, of course. The 
five-kinds taxonomy is time-tested and allows implementation of a great 
many algorithms. Beyond that, users can define many containers and 
ranges with additional operations or with improved properties of 
existing operations.

 2) [left,right][Diff,Union]
 
 Ranges are not sets; therefore I might not be the only one who has trouble 
 grasping the idea behind "difference" and "union" on ranges.

I am open to better names. Bartosz talked me into the ones above. I 
used these:

leftToLeft
leftToRight
rightToRight

 Of course one can define whatever one wants, but I would prefer
 [sub,snip,cut,split,...][B,E][B,E] (r,s)
 
 I.e. `subBB(r,s)' is the subrange of `r' starting at the beginning of 
 `r' and ending at the beginning of `s' (including the beginning of `r', 
 but not including the beginning of `s').
 
 It may be worthwhile to pass `B' or `E' as parameters to the  
 chosen keyword(?) to enable algorithmic access:
 
 | sub(B,B,r,s)
 
 instead of `leftDiff( r, s)'

I find these too cryptic, but to each their own. I predict that 
primitive names will become a bicycle shed. In the end we'll have to use 
enum :o).


Andrei

Sep 08 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Andrei Alexandrescu wrote:

 1)

 There are numerous collections and ranges to be defined

You are right. A reversible range can be built with the designed 
primitives. But this also holds for the `Retro'-type.

 2)

 I used these:
 leftToLeft

Much better, but did you notice, that you used the directionless words 
"beginning" and "end" to describe their semantics? That's why I used 
"B" and "E". With directions in the names one might be forced to make 
wrappers for not being irritated by the directions in case they do not 
fit.

Ex.: Imagine a 4d-matrix in which a 3d-range is defined diagonally to 
three of the axes of the 4d-matrix. What is left, respectively right, of 
such a range?  

3) casting

On second read I miss some words about explicit and implicit casting 
possibilities between the five types of ranges. 

-manfred


-- 
If life is going to exist in this Universe, then the one thing it cannot 
afford to have is a sense of proportion. (Douglas Adams)

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:
 
 1)

 There are numerous collections and ranges to be defined

 
 You are right. A reversible range can be built with the designed 
 primitives. But this also holds for the `Retro'-type.

Indeed Retro was provided as a mere example and has no special status. I 
plan to add some more widely useful ranges, such as a circular range. 
Contributions will be appreciated (of course after the design gets frozen).

 2)

 I used these:
 leftToLeft

 
 Much better, but did you notice, that you used the directionless words 
 "beginning" and "end" to describe their semantics? That's why I used 
 "B" and "E". With directions in the names one might be forced to make 
 wrappers for not being irritated by the directions in case they do not 
 fit.
 
 Ex.: Imagine a 4d-matrix in which a 3d-range is defined diagonally to 
 three of the axes of the 4d-matrix. What is left, respectively right, of 
 such a range?  

Good point!

 3) casting
 
 On second read I miss some words about explicit and implicit casting 
 possibilities between the five types of ranges. 

Yes, I also mentioned that in a different post.


Andrei

Sep 08 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Andrei Alexandrescu wrote:

[...]

4) Operations on Ranges

In the discussion on the naming of `leftDiff' etc. the arguments of all 
participants implicitly declared "splitting" and "concatenating" 
operations on ranges. But there are none defined. Why?


Googling for "range algebra" gave a hit on

http://www.idealliance.org/papers/extreme/proceedings/xslfo-pdf/2002/Nicol01/EML2002Nicol01.pdf

which seems to express pretty much of the concept presented.

-manfred
-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)

Sep 09 2008

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:ga46ok$2s77$1 digitalmars.com...
 I put together a short document for the range design. I definitely missed 
 about a million things and have been imprecise about another million, so 
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

This is just awesome. Thank you for tackling this issue.

I agree with others that some names are not so obvious. Left/right? How do 
Arabic speakers feel about this : ) Begin/end seems more intuitive.

Can you explain this please. From Input range:

e=r.getNext Reads an element and moves to the next one. Returns the read 
element, which is of type ElementType!(R). The call is defined only right 
after r.isEmpty returned false.

That last part: The call is defined only right after r.isEmpty returned 
false.

First of all, is* functions always sound "const" to me, but the way you 
describe isEmpty it sounds like it actually changes something, advancing a 
pointer or something like that. What happens if isEmpty is called twice? 
Will it skip 1 element?


more obvious: while (i.MoveNext()) e = i.Current; But isEmpty is common to 
all ranges, so I understand why it's the way it is. I just hope it could 
stay "const", not modifying the internal state. Perhaps add "next" to input 
ranges as well?

L.

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

(This is an older message that somehow didn't make it to the group. 
Resending now.)

Lionello Lunesu wrote:
 
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:ga46ok$2s77$1 digitalmars.com...
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 
 This is just awesome. Thank you for tackling this issue.
 
 I agree with others that some names are not so obvious. Left/right? How 
 do Arabic speakers feel about this : ) Begin/end seems more intuitive.

I don't know of that particular cultural sensitivity. Begin and end are
bad choices because they'd confuse the heck out of STL refugees. c.left
and c.right are actually STL's c.front() and c.back() or *c.begin() and
c.end()[-1], if you allow the notational abuse. But I sure hope some
good names will come along.

 Can you explain this please. From Input range:
 
 e=r.getNext Reads an element and moves to the next one. Returns the read 
 element, which is of type ElementType!(R). The call is defined only 
 right after r.isEmpty returned false.
 
 That last part: The call is defined only right after r.isEmpty returned 
 false.
 
 First of all, is* functions always sound "const" to me, but the way you 
 describe isEmpty it sounds like it actually changes something, advancing 
 a pointer or something like that. What happens if isEmpty is called 
 twice? Will it skip 1 element?

Excellent question! Gosh if I staged this it couldn't have gone any better.

Consider an input range getting ints separated by whitespace out of a
FILE* - something that we'd expect our design to allow easily and
neatly. So then how do I implement isEmpty()? Well, feof(f) is not
really informative at all. Maybe the file has five more spaces and then
it ends. So in order to TELL you that the range is not empty, I actually
have to GO all the way and actually read the integer. Then I can tell
you: yeah, there's stuff available, or no, I'm done, or even throw an
exception if some error happened.

That makes isEmpty non-const. You check for r.isEmpty, it makes sure an
int is buffered inside the range's state. You call r.isEmpty again, it
doesn't do anything because an int is already buffered. You call
r.getNext, the int gets moved off the range's state into your program,
and the internal flag is set telling the range that the buffer's empty.

You call r.getNext without having called r.isEmpty, then the range makes
a heroic effort to fetch another int. If that fails, the range throws an
exception. So in essence the behavior is that you can use isEmpty to
make sure that getNext won't blow in your face (speaking of Pulp
Fiction...). I think this is all very sensible, and even more so once I
give more detail below.
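
The behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the name IntSucker is hypothetical, an in-memory string stands in for the FILE*, and only non-negative ints are parsed.

```d
import std.ascii : isDigit, isWhite;
import std.exception : enforce;

// Hypothetical input range yielding whitespace-separated non-negative ints.
// An in-memory string stands in for the FILE* mentioned in the post.
struct IntSucker
{
    private string src;     // unread input
    private int buffered;   // value read ahead by isEmpty
    private bool hasValue;  // true while `buffered` holds an unconsumed int

    this(string s) { src = s; }

    // Not const: it may have to read ahead to learn whether data exists.
    bool isEmpty()
    {
        if (hasValue) return false;        // already buffered: do nothing
        while (src.length && isWhite(src[0])) src = src[1 .. $];
        if (src.length == 0) return true;  // only whitespace was left
        enforce(isDigit(src[0]), "expected a digit");
        int v = 0;
        while (src.length && isDigit(src[0]))
        {
            v = v * 10 + (src[0] - '0');
            src = src[1 .. $];
        }
        buffered = v;
        hasValue = true;
        return false;
    }

    // Hands the buffered value to the caller and marks the buffer empty.
    // Without a prior isEmpty it tries to fetch, and throws if nothing is left.
    int getNext()
    {
        enforce(!isEmpty(), "no more data");
        hasValue = false;
        return buffered;
    }
}
```

Calling isEmpty twice in a row does not skip an element here, because the second call only finds the value that is already buffered.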


 are more obvious: while (i.MoveNext()) e = i.Current; But isEmpty is 
 common to all ranges, so I understand why it's the way it is. I just 
 hope it could stay "const", not modifying the internal state. Perhaps 
 add "next" to input ranges as well?

This is even better than the previous question. Why not this for an
input iterator?

for (; !r.isEmpty; r.next) use(r.getNext);

or even

for (; !r.isEmpty; r.next) use(r.left);

thus making input ranges quite similar to forward ranges.

In that design:

a) The constructor fetches the first int

b) isEmpty is const and just returns the "available" internal flag

c) next reads the next int off the file

First I'll eliminate the design with isEmpty/next/getNext as flawed for
a subtle reason: cost of copying. Replace mentally the int above with
something that needs dynamic allocation, such as BigInt. So the range
reads one BigInt off the file, stores it in the internal buffer, and
then the user calls:

for (; !r.isEmpty; r.next)
{
     BigInt my = r.getNext();
     ....
}

Since in one iteration there's only one BigInt, not two, I'd need to do
a destructive transfer in getNext() that "moves" the state of BigInt
from the range to my, leaving r's state empty. (This feature will be
available in D2 soon.) But then what if somebody calls r.getNext()
again? Well I don't have the data anymore, so I need to issue a next().
So I discover next was not needed in the first place.

Hope everything is clear so far. Now let's discuss the isEmpty/next/left
design.

That design is also flawed, just in a different way. The range holds the
BigInt inside, but r.left gives to the client a *reference* to it. This
is cool! There is no more extra copying and everything works smoothly.

In fact this design patents a lie. It lies about giving a reference to
an element (the same way a "real" container does) but that element will
be overwritten every time r.next is called, UNBEKNOWNST TO THE CLIENT.
So consider the algorithm:

void bump(R)(R r)
{
     for (; !r.isEmpty(); r.next) ++r.left;
}

You pass an input range to bump and it compiles and executes to
something entirely nonsensical. This is an obvious misuse, but as I'm
sure you know it gets real confusing real fast.
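
A small sketch of why the reference is misleading, assuming a hypothetical range named Countdown that keeps exactly one cached element and hands out a reference to it:

```d
// Hypothetical input range that caches one element and exposes it by reference.
struct Countdown
{
    private int cache;      // the single internal buffer
    private int remaining;  // how many elements are still to come

    this(int n) { remaining = n; cache = n; }

    bool isEmpty() const { return remaining == 0; }

    // Returns a reference into the range itself, not into a container.
    ref int left() return { return cache; }

    // Advancing overwrites the buffer the client may still be pointing at.
    void next() { --remaining; cache = remaining; }
}
```

A client that keeps the address of r.left() across a call to r.next() sees the value change underneath it, which a real container element would not do.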

A possible argument is to have left return a ref const(BigInt), but then
we lose the ability to transfer its value into our state.

What we want is a design that tells the truth. And a design that tells
the truth is this:

r.isEmpty does whatever the hell it takes to make sure whether there's
data available or not. It is not const and it could throw an exception.

v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
came through the wire, data that the client now owns as soon as getNext
returned. There is no extra copy, no extra allocation, and the real
thing has happened: data has been read from the outside and user code
was made the only owner of it.


Andrei

Sep 09 2008

superdan <super dan.org> writes:

Andrei Alexandrescu Wrote:

 What we want is a design that tells the truth. And a design that tells
 the truth is this:
 
 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.
 
 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

this is really kool n the gang. there's a sore point tho. if i wanna read
strings from a file no prob.

for (auto r = stringSucker(stdin); !r.isEmpty(); )
{
    string s = r.getNext();
    // play with s
}

but a new string is allocated every line. that's safe but slow. so i want some
more efficient stuff. i should use char[] because string don't deallocate.

for (auto r = charArraySucker(stdin); !r.isEmpty(); )
{
    char[] s = r.getNext();
    // play with s
}

no improvement. same thing a new char[] is alloc each time. maybe i could do

for (auto r = charArraySucker(stdin); !r.isEmpty(); )
{
    char[] s = r.getNext();
    // play with s
    delete s;
}

would this make stuff faster. maybe. maybe not. and it's not general. what i'd
like is some way of telling the range, i'm done with this you can recycle and
reuse it. it's a green green world.

for (auto r = charArraySucker(stdin); !r.isEmpty(); )
{
    char[] s = r.getNext();
    // play with s
    r.recycle(s);
}

sig is recycle(ref ElementType!(R)).
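
One possible shape for such a range, as a rough sketch only. The struct below is hypothetical: a string[] stands in for the file, and a recycled buffer is reused only when it is already large enough.

```d
// Hypothetical line source; `lines` stands in for the file being read.
struct CharArraySucker
{
    private string[] lines;  // pending input
    private char[] spare;    // buffer handed back through recycle()

    this(string[] input) { lines = input; }

    bool isEmpty() const { return lines.length == 0; }

    // Returns the next line in a buffer that the caller owns.
    char[] getNext()
    {
        auto line = lines[0];
        lines = lines[1 .. $];
        char[] buf = spare;
        spare = null;                        // ownership moves to the caller
        if (buf.length >= line.length)
            buf = buf[0 .. line.length];     // fits: reuse the recycled memory
        else
            buf = new char[](line.length);   // too small: allocate a fresh one
        buf[] = line[];
        return buf;
    }

    // The caller gives the buffer back; the caller's reference is cleared.
    void recycle(ref char[] s)
    {
        spare = s;
        s = null;
    }
}
```

Clearing the caller's slice in recycle is a design choice of this sketch, so that the client cannot keep using memory the range is about to overwrite.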

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

superdan wrote:
 Andrei Alexandrescu Wrote:
 
 What we want is a design that tells the truth. And a design that tells
 the truth is this:

 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

 
 this is really kool n the gang. there's a sore point tho. if i wanna read
strings from a file no prob.
 
 for (auto r = stringSucker(stdin); !r.isEmpty(); )
 {
     string s = r.getNext();
     // play with s
 }
 
 but a new string is allocated every line. that's safe but slow. so i want some
more efficient stuff. i should use char[] because string don't deallocate.
 
 for (auto r = charArraySucker(stdin); !r.isEmpty(); )
 {
     char[] s = r.getNext();
     // play with s
 }
 
 no improvement. same thing a new char[] is alloc each time. maybe i could do
 
 for (auto r = charArraySucker(stdin); !r.isEmpty(); )
 {
     char[] s = r.getNext();
     // play with s
     delete s;
 }
 
 would this make stuff faster. maybe. maybe not. and it's not general. what i'd
like is some way of telling the range, i'm done with this you can recycle and
reuse it. it's a green green world.
 
 for (auto r = charArraySucker(stdin); !r.isEmpty(); )
 {
     char[] s = r.getNext();
     // play with s
     r.recycle(s);
 }
 
 sig is recycle(ref ElementType!(R)).

This is a great point. Unfortunately that won't quite work properly. 
Equally unfortunately, you just revealed an issue with my design.

One goal of range design is that algorithms written for "inferior" 
ranges should work seamlessly with "superior" ranges. For example, find 
should work fine with any range, so we should write (with your idea):

R find(R, V)(R r, V v)
{
     ElementType!(R) tmp;
     for (; !r.isEmpty; r.recycle(tmp))
     {
         tmp = r.getNext;
         if (tmp == v) break;
     }
     return r;
}

This looks great but imagine we apply this to a collection of some 
costly ElementType!(R). Then getNext rightly returns a reference because 
there's no need to create a copy unless absolutely necessary. But then 
we realize that our loop forces a copy no matter what! The copy was good 
for the input range. But it's bad for the actual container that wouldn't 
need to copy anything.

It looks like I need to reconsider the design mentioned by Lionello, in 
which both input iterators and forward iterators expose separate 
isEmpty/next/first operations. Then an input range r caches the last 
read value internally and gives access to it through r.first.

I have an idea on how to disallow at least egregious errors. Consider:

void fill(R, V)(R r, V v)
{
     for (; !r.isEmpty; r.next) r.first = v;
}

If "r.first = v;" compiles, then the following scandalous misuse goes 
unpunished:

fill(charArraySucker, "bogus");

A way to prevent r.first = v from compiling is to define a one-argument 
function first:

struct MyRange(T)
{
     private void first(W)(W whatever); // not implemented
     ref T first();
}

Expression r.first will resolve to the second function. Expression 
r.first = v will translate into r.first(v) so it will resolve to the 
first function, which will fail the protection test.

Then there remains the problem:

void bump(R)(R r)
{
     for (; !r.isEmpty; r.next) ++r.first;
}

bump(charArraySucker); // bogus

Sigh...


Andrei

Sep 09 2008

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 
 Then there remains the problem:
 
 void bump(R)(R r)
 {
     for (; !r.isEmpty; r.next) ++r.first;
 }
 
 bump(charArraySucker); // bogus
 
 Sigh...

And I suppose you don't want to return a const reference from first() 
because the user may want to operate on the value, if only on a 
temporary basis?  Let's say:

P findFirst(P, R, C)( R r, C c )
{
     for( ; !r.isEmpty; r.next )
     {
         // modify the temporary because a new element is expensive
         // to copy-construct
         if( auto p = c.contains( ++r.first ) )
             return p;
     }
     return P.init;
}

Hm... must the implementation prevent stupid mistakes such as your 
example?  Ideally, yes.  But I don't see a way to do so and yet allow 
for routines like findFirst() above.


Sean

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Then there remains the problem:

 void bump(R)(R r)
 {
     for (; !r.isEmpty; r.next) ++r.first;
 }

 bump(charArraySucker); // bogus

 Sigh...

 
 And I suppose you don't want to return a const reference from first() 
 because the user may want to operate on the value, if only on a 
 temporary basis?  Let's say:
 
 P findFirst(P, R, C)( R r, C c )
 {
     for( ; !r.isEmpty; r.next )
     {
         // modify the temporary because a new element is expensive
         // to copy-construct
         if( auto p = c.contains( ++r.first ) )
             return p;
     }
     return P.init;
 }
 
 Hm... must the implementation prevent stupid mistakes such as your 
 example?  Ideally, yes.  But I don't see a way to do so and yet allow 
 for routines like findFirst() above.

Yes, exactly. Also consider a user that inspects lines in a file, and 
occasionally takes ownership of the current line to put it into a 
hashtable or something.

I think I'll resign myself to isEmpty/next/first for input ranges. The 
remaining difference between input ranges and forward ranges is that the 
former are uncopyable.
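
A minimal sketch of what an isEmpty/next/first input range that cannot be copied might look like. The struct name Lines and the string[] source are assumptions made for illustration; `@disable this(this)` is the D mechanism used here to reject copies.

```d
// Hypothetical uncopyable input range over pre-split lines.
struct Lines
{
    private string[] pending;  // input not yet read
    private string current;    // the cached element exposed by first()
    private bool empty_;

    @disable this(this);       // copying the range is rejected at compile time

    this(string[] src) { pending = src; next(); }  // fetch the first element

    bool isEmpty() const { return empty_; }

    // Access to the cached element; it is overwritten by the next call to next().
    ref string first() return { return current; }

    void next()
    {
        if (pending.length == 0) { empty_ = true; current = null; return; }
        current = pending[0];
        pending = pending[1 .. $];
    }
}
```

A client that wants to keep an element (for example to put it into a hashtable) copies r.first out before calling r.next.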


Andrei

Sep 09 2008

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

"superdan" <super dan.org> wrote in message 
news:ga5vjs$snn$1 digitalmars.com...
 Andrei Alexandrescu Wrote:

 What we want is a design that tells the truth. And a design that tells
 the truth is this:

 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

 this is really kool n the gang. there's a sore point tho. if i wanna read 
 strings from a file no prob.

 for (auto r = stringSucker(stdin); !r.isEmpty(); )
 {
    string s = r.getNext();
    // play with s
 }

 but a new string is allocated every line. that's safe but slow. so i want 
 some more efficient stuff. i should use char[] because string don't 
 deallocate.

 for (auto r = charArraySucker(stdin); !r.isEmpty(); )
 {
    char[] s = r.getNext();
    // play with s
 }

 no improvement. same thing a new char[] is alloc each time. maybe i could 
 do

 for (auto r = charArraySucker(stdin); !r.isEmpty(); )
 {
    char[] s = r.getNext();
    // play with s
    delete s;
 }

 would this make stuff faster. maybe. maybe not. and it's not general. what 
 i'd like is some way of telling the range, i'm done with this you can 
 recycle and reuse it. it's a green green world.

 for (auto r = charArraySucker(stdin); !r.isEmpty(); )
 {
    char[] s = r.getNext();
    // play with s
    r.recycle(s);
 }

 sig is recycle(ref ElementType!(R)).

Can't this be done by creating different ranges? I mean, trying to find a 
'one size fits all' model is usually a lost cause. And now we have different 
consumers and different types.

Perhaps one sucker is instantiated with its own internal buffer and another 
sucker allocates every new item. And possibly yet another sucker only 
returns invariant items.

L.

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Lionello Lunesu wrote:
 
 "superdan" <super dan.org> wrote in message 
 news:ga5vjs$snn$1 digitalmars.com...
 Andrei Alexandrescu Wrote:

 What we want is a design that tells the truth. And a design that tells
 the truth is this:

 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

 this is really kool n the gang. there's a sore point tho. if i wanna 
 read strings from a file no prob.

 for (auto r = stringSucker(stdin); !r.isEmpty(); )
 {
    string s = r.getNext();
    // play with s
 }

 but a new string is allocated every line. that's safe but slow. so i 
 want some more efficient stuff. i should use char[] because string 
 don't deallocate.

 for (auto r = charArraySucker(stdin); !r.isEmpty(); )
 {
    char[] s = r.getNext();
    // play with s
 }

 no improvement. same thing a new char[] is alloc each time. maybe i 
 could do

 for (auto r = charArraySucker(stdin); !r.isEmpty(); )
 {
    char[] s = r.getNext();
    // play with s
    delete s;
 }

 would this make stuff faster. maybe. maybe not. and it's not general. 
 what i'd like is some way of telling the range, i'm done with this you 
 can recycle and reuse it. it's a green green world.

 for (auto r = charArraySucker(stdin); !r.isEmpty(); )
 {
    char[] s = r.getNext();
    // play with s
    r.recycle(s);
 }

 sig is recycle(ref ElementType!(R)).

 
 Can't this be done by creating different ranges? I mean, trying to find 
 a 'one size fits all' model is usually a lost cause. And now we have 
 different consumers and different types.
 
 Perhaps one sucker is instantiated with its own internal buffer and 
 another sucker allocates every new item. And possibly yet another sucker 
 only returns invariant items.

I don't mind implementing different suckers. :o) The problem is that the
suckers won't have the same interface, so some of them will work badly
or not at all with std.algorithm.

So again: I think the design you suggested, isEmpty/next/first, is the
better one even for input iterators.

Andrei

Sep 09 2008

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu wrote on September 9 at 07:47:
 Lionello Lunesu wrote:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:ga46ok$2s77$1 digitalmars.com...
I put together a short document for the range design. I definitely missed 
about a million things and have been imprecise about another million, so 
feedback would be highly appreciated. See:

http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

This is just awesome. Thank you for tackling this issue.
I agree with others that some names are not so obvious. Left/right? How do 
Arabic speakers feel about this : ) Begin/end seems more intuitive.

 
 I don't know of that particular cultural sensitivity. Begin and end are
 bad choices because they'd confuse the heck out of STL refugees. c.left
 and c.right are actually STL's c.front() and c.back() or *c.begin() and
 c.end()[-1], if you allow the notational abuse. But I sure hope some
 good names will come along.

I think STL refugees can deal with it. I think there is no point in continuing
to compromise D's readability because of C/C++ (you just mentioned enum,
another really bad choice to keep C/C++ refugees happy).

I find left and right a little obscure too; it all depends on your mental
image of a range. Using front/back or begin/end is much clearer.

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Leandro Lucarella wrote:
 Andrei Alexandrescu wrote on September 9 at 07:47:
 Lionello Lunesu wrote:
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:ga46ok$2s77$1 digitalmars.com...
 I put together a short document for the range design. I definitely missed 
 about a million things and have been imprecise about another million, so 
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 This is just awesome. Thank you for tackling this issue.
 I agree with others that some names are not so obvious. Left/right? How do 
 Arabic speakers feel about this : ) Begin/end seems more intuitive.

 I don't know of that particular cultural sensitivity. Begin and end are
 bad choices because they'd confuse the heck out of STL refugees. c.left
 and c.right are actually STL's c.front() and c.back() or *c.begin() and
 c.end()[-1], if you allow the notational abuse. But I sure hope some
 good names will come along.

 
 I think STL refugees can deal with it. I think there is no point in continuing
 to compromise D's readability because of C/C++ (you just mentioned enum,
 another really bad choice to keep C/C++ refugees happy).

I agree. I just don't think that choosing one name over a synonym name 
compromises much.

 I find left and right a little obscure too; it all depends on your mental
 image of a range. Using front/back or begin/end is much clearer.

I'd like to go with:

r.first
r.last
r.next
r.pop

Not convinced about r.toBegin(s), r.toEnd(s), and r.fromEnd(s) yet, in 
the wake of a realization. I've noticed that the r.fromBegin(s) operation is 
not needed if we make the appropriate relaxations. r.fromBegin(s) is 
really s.toEnd(r).

I've updated 
http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html with a 
drawing (right at the beginning) that illustrates what primitives we 
need. Maybe this will help us choose even better names.


Andrei

Sep 09 2008

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu wrote on September 9 at 10:30:
I think STL refugees can deal with it. I think there is no point in continuing
to compromise D's readability because of C/C++ (you just mentioned enum,
another really bad choice to keep C/C++ refugees happy).

 
 I agree. I just don't think that choosing one name over a synonym name 
 compromises much.
 
I find left and right a little obscure too; it all depends on your mental
image of a range. Using front/back or begin/end is much clearer.

 
 I'd like to go with:
 
 r.first
 r.last

Much better, thank you! =)

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/

Sep 09 2008

Benji Smith <dlanguage benjismith.net> writes:

Leandro Lucarella wrote:
 Andrei Alexandrescu wrote on September 9 at 10:30:

 I'd like to go with:

 r.first
 r.last

 
 Much better, thank you! =)
 

Maybe:

r.head
r.tail

Also, I'm not crazy about the "isEmpty" name. Given the use of 
"getNext", I think "hasNext" is a more natural choice.

--benji

Sep 09 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Benji Smith wrote:

 Given the use of "getNext", I think "hasNext" is a more natural
 choice 

clapping hands.

-manfred

-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Manfred_Nowak wrote:
 Benji Smith wrote:
 
 Given the use of "getNext", I think "hasNext" is a more natural
 choice 

 
 clapping hands.

Walter would love that.

for (R r = getR(); r.hasNext; r.next) { ... }

Look ma, no negation! Oops, I just materialized one with the exclamation 
sign.


Andrei

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Andrei Alexandrescu wrote:
 Manfred_Nowak wrote:
 Benji Smith wrote:

 Given the use of "getNext", I think "hasNext" is a more natural
 choice 

 clapping hands.

 
 Walter would love that.
 
 for (R r = getR(); r.hasNext; r.next) { ... }
 
 Look ma, no negation! Oops, I just materialized one with the exclamation 
 sign.

I just discovered a problem with that. hasNext implies I'm supposed to 
call next to get the thingie. It should be hasFirst, which is less 
appealing.

Andrei

Sep 09 2008

Benji Smith <dlanguage benjismith.net> writes:

Andrei Alexandrescu wrote:
 Andrei Alexandrescu wrote:
 Manfred_Nowak wrote:
 Benji Smith wrote:

 Given the use of "getNext", I think "hasNext" is a more natural
 choice 

 clapping hands.

 Walter would love that.

 for (R r = getR(); r.hasNext; r.next) { ... }

 Look ma, no negation! Oops, I just materialized one with the 
 exclamation sign.

 
 I just discovered a problem with that. hasNext implies I'm supposed to 
 call next to get the thingie. It should be hasFirst, which is less 
 appealing.
 
 Andrei

I see where you're coming from, because the range shrinks itself 
element-by-element as it iterates, eventually disappearing into a 
zero-element range.

But for me as a consumer of the container, it's a little weird.

When I visualize iteration (beware: silly metaphor ahead), it looks like 
a frog hopping from one lily pad to another. He might have a 
well-defined start and end point, but the lily pads don't evaporate 
after the frog hops away.

When I see "isEmpty" in the implementation of an iteration routine, it 
makes me think of a producer/consumer work queue. The list is only empty 
if the consumer has consumed all the work items from the queue.

I get what you're saying about the iteration-range being empty because 
it shrinks itself during each step of the iteration. But to me, 
iteration is an idempotent operation, so something non-empty before the 
iteration should not be empty after the iteration.

--benji

Sep 09 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Wed, 10 Sep 2008 01:56:40 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Andrei Alexandrescu wrote:
 Manfred_Nowak wrote:
 Benji Smith wrote:

 Given the use of "getNext", I think "hasNext" is a more natural
 choice

 clapping hands.

  Walter would love that.
  for (R r = getR(); r.hasNext; r.next) { ... }
  Look ma, no negation! Oops, I just materialized one with the  
 exclamation sign.

 I just discovered a problem with that. hasNext implies I'm supposed to  
 call next to get the thingie. It should be hasFirst, which is less  
 appealing.

 Andrei

I usually implement my iterators as follows:

interface Iterator(T) {
     bool isValid();
     T value();
     void moveNext();
}

Usage:

auto it = ...;
while (it.isValid()) {
     auto value = it.value();
     it.moveNext();
}

or

for (auto it = ...; it.isValid(); it.moveNext()) {
     auto value = it.value();
}
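
For comparison with the range primitives discussed in this thread, here is a hedged sketch of that interface with one concrete implementation. ArrayIterator and sum are hypothetical names introduced only for this example.

```d
interface Iterator(T)
{
    bool isValid();
    T value();
    void moveNext();
}

// Hypothetical implementation that walks a slice from front to back.
class ArrayIterator(T) : Iterator!T
{
    private T[] items;

    this(T[] items) { this.items = items; }

    bool isValid() { return items.length > 0; }   // plays the role of !isEmpty
    T value() { return items[0]; }                // plays the role of first
    void moveNext() { items = items[1 .. $]; }    // plays the role of next
}

// Generic consumer written only against the interface.
int sum(Iterator!int it)
{
    int total = 0;
    for (; it.isValid(); it.moveNext())
        total += it.value();
    return total;
}
```

The three members map one-to-one onto isEmpty/first/next, with the emptiness test inverted.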

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Sep 10, 2008 at 12:30 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 I've updated http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
 with a drawing (right at the beginning) that illustrates what primitives we
 need. Maybe this will help us choose even better names.

The text says that s should be a subrange of r, but the drawing shows
s extending beyond r.  Does it actually need to be a subrange?

--bb

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 12:30 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I've updated http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
 with a drawing (right at the beginning) that illustrates what primitives we
 need. Maybe this will help us choose even better names.

 
 The text says that s should be a subrange of r, but the drawing shows
 s extending beyond r.  Does it actually need to be a subrange?

I am considering relaxing the requirements.

Andrei

Sep 09 2008

Derek Parnell <derek psych.ward> writes:

On Tue, 09 Sep 2008 10:30:58 -0500, Andrei Alexandrescu wrote:


 I'd like to go with:
 
 r.first
 r.last
 r.next
 r.pop

LOL ... I was just thinking to myself ... "what's wrong with First and
Last? I should suggest them." then I read this post.

"next" is fine, but "pop"? Isn't the pair of "next" called "prev(ious)" and
the pair of "pop" called "push"? So please, either have next/prev or
push/pop, and in that case push/pop looks quite silly.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Derek Parnell wrote:
 On Tue, 09 Sep 2008 10:30:58 -0500, Andrei Alexandrescu wrote:
 
 
 I'd like to go with:

 r.first
 r.last
 r.next
 r.pop

 
 LOL ... I was just thinking to myself ... "what's wrong with First and
 Last? I should suggest them." then I read this post.
 
 "next" is fine, but "pop"? Isn't the pair of "next" called "prev(ious)" and
 the pair of "pop" called "push"? So please, either have next/prev or
 push/pop, and in that case push/pop looks quite silly.

Previous is confusing as it suggests I'm moving back where I came from. 
In reality I shrink the range from the other end. So we need:

"Shrink the range from the left end"
"Shrink the range from the right end"

The first will be used much more often than the second.


Andrei

Sep 09 2008

Derek Parnell <derek nomail.afraid.org> writes:

On Tue, 09 Sep 2008 18:13:08 -0500, Andrei Alexandrescu wrote:


 Previous is confusing as it suggests I'm moving back where I came from. 

Ah... how confusing is this English language! ;-)

 In reality I shrink the range from the other end. So we need:
 
 "Shrink the range from the left end"
 "Shrink the range from the right end"

And I'm sure you really mean ...

 "Shrink the range from the front"
 "Shrink the range from the back"
 
because "left" does not always mean "front" etc ... but we've been over
this.

 The first will be used much more often than the second.

If the concept and the implementation involves changing the size
(shrinking), shouldn't the words used somehow invoke this idea?

trim? strip? slice? cut? ... just thinking out loud ... nothing too
serious.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
10/09/2008 10:10:32 AM

Sep 09 2008

Benji Smith <dlanguage benjismith.net> writes:

Derek Parnell wrote:
 If the concept and the implementation involves changing the size
 (shrinking), shouldn't the words used somehow invoke this idea?
 
 trim? strip? slice? cut? ... just thinking out loud ... nothing too
 serious.

Consume?

Sep 09 2008

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu wrote on September 9 at 18:13:
 Derek Parnell wrote:
On Tue, 09 Sep 2008 10:30:58 -0500, Andrei Alexandrescu wrote:
I'd like to go with:

r.first
r.last
r.next
r.pop

LOL ... I was just thinking to myself ... "what's wrong with First and
Last? I should suggest them." then I read this post.
"next" is fine, but "pop"? Isn't the pair of "next" called "prev(ious)" and
the pair of "pop" called "push"? So please, either have next/prev or
push/pop, and in that case push/pop looks quite silly.

 
 Previous is confusing as it suggests I'm moving back where I came from. In 
 reality I shrink the range from the other end. So we need:
 
 "Shrink the range from the left end"
 "Shrink the range from the right end"

You mean from the beginning and the end, I guess ;)

 The first will be used much more often than the second.

shrink(int n = 1)?
If n > 0, it shrinks from the beginning; if n < 0, it shrinks from the
end (0 is a no-op). This way you can skip some elements too. Even though it
could be a little cryptic, I think it plays well with slicing.

Another possibility is shrink(int begin = 1, int end = 0), to shrink both ends
at the same time. For example (calling the first proposal shrink1 and
the second shrink2):
r.pop == r.shrink1(-1) == r.shrink2(0, 1)
r.pop; r.pop == r.shrink1(-2) == r.shrink2(0, 2)
r.shrink1() == r.shrink2()
r.shrink1(3) == r.shrink2(3)
r.shrink1(3); r.shrink1(-2) == r.shrink2(3, 2)
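
The two proposals above could be sketched on a built-in slice roughly as follows. This is only an illustration: shrink1 and shrink2 are the hypothetical names used in this post, written here as free functions, and the sketch assumes the caller never shrinks by more elements than the range holds.

```d
// Hypothetical helpers sketching the two proposals on a built-in slice.
// Assumption: the caller never asks to drop more elements than r holds.

/// shrink1: n > 0 drops n elements from the beginning,
/// n < 0 drops -n elements from the end, n == 0 is a no-op.
void shrink1(T)(ref T[] r, int n = 1)
{
    if (n > 0)
    {
        r = r[cast(size_t) n .. $];
    }
    else if (n < 0)
    {
        const k = cast(size_t)(-n);
        r = r[0 .. $ - k];
    }
}

/// shrink2: drops `begin` elements from the beginning and `end` from the end.
void shrink2(T)(ref T[] r, size_t begin = 1, size_t end = 0)
{
    r = r[begin .. $ - end];
}

unittest
{
    auto a = [1, 2, 3, 4, 5, 6];
    auto b = a;              // a second view of the same data
    a.shrink1(3);
    a.shrink1(-2);
    b.shrink2(3, 2);
    assert(a == [4]);
    assert(b == [4]);
}
```

Both helpers only rebind the slice, so they run in constant time and never touch the underlying data.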


-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
<o_O> parakenotengobarraespaciadora
<o_O> aver
<o_O> estoyarreglandolabarraporkeserompiounapatita

Sep 10 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Leandro Lucarella <llucax gmail.com> wrote:
 Andrei Alexandrescu, on September 9 at 18:13, you wrote me:
 "Shrink the range from the left end"
 "Shrink the range from the right end"

 The first will be used much more often than the second.

 
 shrink(int n = 1)?
 If n > 0, it shrinks from the beginning; if n < 0, it shrinks from the
 end (0 is a no-op). This way you can skip some elements too. Even though it
 could be a little cryptic, I think it plays well with slicing.
 
 Another possibility is shrink(int begin = 1, int end = 0), to shrink both ends
 at the same time. For example (calling the first proposal shrink1 and
 the second shrink2):
 r.pop == r.shrink1(-1) == r.shrink2(0, 1)
 r.pop; r.pop == r.shrink1(-2) == r.shrink2(0, 2)
 r.shrink1() == r.shrink2()
 r.shrink1(3) == r.shrink2(3)
 r.shrink1(3); r.shrink1(-2) == r.shrink2(3, 2)

I remember that shift() was a method to remove first element from an 
array.  Some Basic perhaps...  So it could be push, pop, shift, er... 
unshift?

But now that I think about it...  What's the use case for these 
operations?  It's clear to me that getNext/putNext are a 
generator/constructor pair, in the broad sense.  But push/whatever?

Andrei has already said, if I remember correctly, that ranges are views 
of data, not manipulators.  This means that they cannot be used to 
extend a collection.  Therefore ranges cannot have any extension methods 
whatsoever.

If you draw a parallel between a range and a stack of paper, the shrink 
methods would probably be pop/snap...  I'd also propose next() for 
moving the start and prev() for moving the end.  It sounds a bit 
misleading but, on the other hand, it closely resembles forward and 
backward iteration with the opposite end of a range representing the 
iteration limit.  Or maybe forward()/backward(), or fwd()/back()?

Sep 10 2008

David Gileadi <foo bar.com> writes:

Sergey Gromov wrote:
 If you draw a parallel between a range and a stack of paper, the shrink 
 methods would probably be pop/snap...  I'd also propose next() for 
 moving the start and prev() for moving the end.  It sounds a bit 
 misleading but, on the other hand, it closely resembles forward and 
 backward iteration with the opposite end of a range representing the 
 iteration limit.  Or maybe forward()/backward(), or fwd()/back()?

Perhaps reduce() instead of pop() for moving the end?

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

David Gileadi wrote:
 Sergey Gromov wrote:
 If you draw a parallel between a range and a stack of paper, the 
 shrink methods would probably be pop/snap...  I'd also propose next() 
 for moving the start and prev() for moving the end.  It sounds a bit 
 misleading but, on the other hand, it closely resembles forward and 
 backward iteration with the opposite end of a range representing the 
 iteration limit.  Or maybe forward()/backward(), or fwd()/back()?

 
 Perhaps reduce() instead of pop() for moving the end?

I love reduce! Thought of it as well. Unfortunately the term is loaded 
with reduction of a binary operation over a range, as e.g. in std.algorithm.

I think shrink() is reasonable. next() moves to the next thing. shrink() 
shrinks the set of dudes I can reach.


Andrei

Sep 10 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 David Gileadi wrote:
 Sergey Gromov wrote:
 If you draw a parallel between a range and a stack of paper, the 
 shrink methods would probably be pop/snap...  I'd also propose next() 
 for moving the start and prev() for moving the end.  It sounds a bit 
 misleading but, on the other hand, it closely resembles forward and 
 backward iteration with the opposite end of a range representing the 
 iteration limit.  Or maybe forward()/backward(), or fwd()/back()?

 
 Perhaps reduce() instead of pop() for moving the end?

 
 I love reduce! Thought of it as well. Unfortunately the term is loaded 
 with reduction of a binary operation over a range, as e.g. in std.algorithm.
 
 I think shrink() is reasonable. next() moves to the next thing. shrink() 
 shrinks the set of dudes I can reach.

I thought of next/shrink as well, but they look asymmetrical, and also 
next here is a form of shrink, too.  Someone could think of "shrink" as 
chopping from both ends.

Sep 10 2008

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:ga5r8i$h0v$1 digitalmars.com...
 So in essence the behavior is that you can use isEmpty to
 make sure that getNext won't blow in your face (speaking of Pulp
 Fiction...)

So isEmpty is optional for input ranges? This does not actually match your 
own documentation:

getNext: The call is defined only right after r.isEmpty returned false.

If you make isEmpty optional, its non-constness is less of a problem. What I 
have a problem with (overstatement) is having to call isEmpty to actually 
prepare the next element. If try { while (1) e = ir.getNext; } works, I'm 
sold : )

 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

Thank you for taking the time to explain all these details. This is all 
great stuff.

L.

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Lionello Lunesu wrote:
 
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:ga5r8i$h0v$1 digitalmars.com...
 So in essence the behavior is that you can use isEmpty to
 make sure that getNext won't blow in your face (speaking of Pulp
 Fiction...)

 
 So isEmpty is optional for input ranges? This does not actually match 
 your own documentation:
 
 getNext: The call is defined only right after r.isEmpty returned false.

 
 If you make isEmpty optional, its non-constness is less of a problem. 
 What I have a problem with (overstatement) is having to call isEmpty to 
 actually prepare the next element. If try { while (1) e = ir.getNext; } 
 works, I'm sold : )

I think I'd want to make it nonoptional such that people wanting real 
fast iterators can define r.isEmpty to do a check and r.getNext (well, 
r.left) to go unchecked.
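
A minimal sketch of that split, assuming a made-up WireRange type whose data source is just an array standing in for "the wire": isEmpty does all the checking and fetching, and getNext hands over the fetched element without any check of its own. The type and its fields are hypothetical, not part of the proposal.

```d
// A hypothetical input range over a stand-in "wire" (here just an array).
struct WireRange
{
    private int[] wire;     // stand-in for an external data source
    private int buffered;   // element fetched by the last isEmpty call
    private bool ready;     // true while `buffered` is not yet consumed

    this(int[] data) { wire = data; }

    /// Not const: it may fetch the next element from the source.
    bool isEmpty()
    {
        if (ready) return false;
        if (wire.length == 0) return true;
        buffered = wire[0];
        wire = wire[1 .. $];
        ready = true;
        return false;
    }

    /// Defined only right after isEmpty returned false; performs no check.
    int getNext()
    {
        ready = false;
        return buffered;
    }
}

unittest
{
    auto r = WireRange([1, 2, 3]);
    int sum;
    while (!r.isEmpty())
        sum += r.getNext();
    assert(sum == 6);
}
```

With this shape, calling getNext without a preceding isEmpty is simply outside the contract, which is what lets the fast path stay unchecked.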

 r.isEmpty does whatever the hell it takes to make sure whether there's
 data available or not. It is not const and it could throw an exception.

 v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
 came through the wire, data that the client now owns as soon as getNext
 returned. There is no extra copy, no extra allocation, and the real
 thing has happened: data has been read from the outside and user code
 was made the only owner of it.

 
 Thank you for taking the time to explain all these details. This is all 
 great stuff.

However, superdan destroyed me. (See my answer to him.) I think I need 
to concede to your design.

Andrei

Sep 09 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-09-08 17:50:54 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 feedback would be highly appreciated. See:
 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

That looks great. I want to suggest renaming a few functions to make 
them more consistent and (hopefully) more expressive, as I see I'm not 
the only one frowning on them.

So right now you're defining this:

r.getNext
r.putNext

r.left
r.next
rightUnion(r, s)
rightDiff(r, s)

r.right
r.pop
leftUnion(r, s)
leftDiff(r, s)

Here's my alternate naming proposal:

r.headNext
r.putNext

r.head
r.next
r.nextUntil(s)
r.nextAfter(s)

r.rear
r.pull
r.pullUntil(s)
r.pullAfter(s)

Note that r.headNext is literally r.head followed by r.next when you 
have a forward iterator.  You could also add "rearPull" to 
bidirectional ranges if you wanted. :-)

The syntax is a little different for binary functions (union, diff) as 
I changed them to members to make things easier to read and more in 
line with the regular next and pull.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2008-09-08 17:50:54 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 
 That looks great. I want to suggest renaming a few functions to make 
 them more consistent and (hopefully) more expressive, as I see I'm not 
 the only one frowning on them.
 
 So right now you're defining this:
 
 r.getNext
 r.putNext
 
 r.left
 r.next
 rightUnion(r, s)
 rightDiff(r, s)
 
 r.right
 r.pop
 leftUnion(r, s)
 leftDiff(r, s)
 
 Here's my alternate naming proposal:
 
 r.headNext
 r.putNext
 
 r.head
 r.next
 r.nextUntil(s)
 r.nextAfter(s)
 
 r.rear
 r.pull
 r.pullUntil(s)
 r.pullAfter(s)
 
 Note that r.headNext is literally r.head followed by r.next when you 
 have a forward iterator.  You could also add "rearPull" to bidirectional 
 ranges if you wanted. :-)
 
 The syntax is a little different for binary functions (union, diff) as I 
 changed them to members to make things easier to read and more in line 
 with the regular next and pull.

I like the alternate names quite a bit. One thing, however, is that head 
and rear are not near-antonyms (which ideally they should be). Maybe 
front and rear would be an improvement. (STL uses front and back). Also, 
I may be dirty-minded, but somehow headNext just sounds... bad :o).

I like the intersection functions as members because they clarify the 
relationship between the two ranges, which is asymmetric. I will 
definitely heed this suggestion. "Until" suggests iteration, however, 
which it shouldn't be (should be constant time) so maybe "nextTo" or 
something could be more suggestive.

This is going somewhere!


Andrei

Sep 08 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Andrei Alexandrescu wrote:

 maybe "nextTo" or something could be more suggestive.

r.tillBeg(s), r.tillEnd(s),
r.fromBeg(s), r.fromEnd(s) ?

-manfred 
-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:
 
 maybe "nextTo" or something could be more suggestive.

 
 r.tillBeg(s), r.tillEnd(s),
 r.fromBeg(s), r.fromEnd(s) ?

Sounds good! Walter doesn't like abbreviations, so probably *Begin would 
please him more.

Andrei

Sep 08 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Tue, Sep 9, 2008 at 1:06 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:

 maybe "nextTo" or something could be more suggestive.

 r.tillBeg(s), r.tillEnd(s),
 r.fromBeg(s), r.fromEnd(s) ?

 Sounds good! Walter doesn't like abbreviations, so probably *Begin would
 please him more.

But till and until are synonyms.  They both sound like iteration.
Although it might be unavoidable since all prepositions that give a
destination seem to imply going to that destination.  till, until,
toward, to, up to, etc.  So might as well go with the shortest one,
"to".

r.toBegin(s), r.toEnd(s)
r.fromBegin(s), r.fromEnd(s)

--bb

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 1:06 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:

 maybe "nextTo" or something could be more suggestive.

 r.tillBeg(s), r.tillEnd(s),
 r.fromBeg(s), r.fromEnd(s) ?

 Sounds good! Walter doesn't like abbreviations, so probably *Begin would
 please him more.

 
 But till and until are synonyms.  They both sound like iteration.
 Although it might be unavoidable since all prepositions that give a
 destination seem to imply going to that destination.  till, until,
 toward, to, up to, etc.  So might as well go with the shortest one,
 "to".
 
 r.toBegin(s), r.toEnd(s)
 r.fromBegin(s), r.fromEnd(s)

These are the names that I find most appealing at the moment. They 
required the fewest neurons to fire when glancing over them and mapping 
them to the needed operations.
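
For readers mapping these names to operations, here is a rough sketch on built-in slices. The semantics are an assumption read off the names (r.toBegin(s) spans from the start of r to the start of s, and so on), and s is assumed to be a subrange of r; nothing in the sketch verifies that.

```d
// Hypothetical free functions on slices. Assumption: `s` lies inside `r`
// (same underlying array). The result is undefined otherwise.

/// Offset of the start of s within r.
private size_t offsetIn(T)(T[] r, T[] s)
{
    return cast(size_t)(s.ptr - r.ptr);
}

/// From the beginning of r up to the beginning of s.
T[] toBegin(T)(T[] r, T[] s)   { return r[0 .. offsetIn(r, s)]; }

/// From the beginning of r up to the end of s.
T[] toEnd(T)(T[] r, T[] s)     { return r[0 .. offsetIn(r, s) + s.length]; }

/// From the beginning of s to the end of r.
T[] fromBegin(T)(T[] r, T[] s) { return r[offsetIn(r, s) .. $]; }

/// From the end of s to the end of r.
T[] fromEnd(T)(T[] r, T[] s)   { return r[offsetIn(r, s) + s.length .. $]; }

unittest
{
    auto r = [0, 1, 2, 3, 4, 5];
    auto s = r[2 .. 4];                      // [2, 3]
    assert(r.toBegin(s)   == [0, 1]);
    assert(r.toEnd(s)     == [0, 1, 2, 3]);
    assert(r.fromBegin(s) == [2, 3, 4, 5]);
    assert(r.fromEnd(s)   == [4, 5]);
}
```

All four run in constant time on slices, which matches the concern that the names should not suggest iteration.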

Andrei

Sep 09 2008

Fawzi Mohamed <fmohamed mac.com> writes:

It is a nice idea to redesign the iterator and range.
I think that the proposal is not bad, but I have some notes about it, 
and some things that I would have done differently.

1) The simplest interface input (range) is just
bool isEmpty();
T next();
iterator(T) release();

The first two I fully agree on; the third one, I suppose, is there to 
allow resources to be released and possibly transfer the data to 
another iterator... is it really needed?

Now I would see this simplest thing (let me call it iterator) as the 
basic objects for foreach looping.
*all* things on which foreach loops should be iterators.
If an object is not an iterator there should be a standard way to 
convert it to one (.iterator for example).
So if the compiler gets something that is not an iterator it tries to 
see if .iterator is implemented and if it is it calls it and iterates 
on it.
This lets many objects have a "default" iterator. Obviously an object 
could have other methods that return an iterator.

2) All the methods with intersection of iterator in my opinion are 
difficult to memorize, and rarely used, I would scrap them.
Instead I would add the comparison operation 
.atSamePlace(iterator!(T)y) that would say if two iterators are at the 
same place. With it one gets back all the power of pointers, and with a 
syntax and use that are understandable.
I understand the idea of covering all possibilities, if one wants it 
with .atSamePlace a template can easily construct all possible 
intersection iterators. Clearly calling recursively such a template is 
inefficient, but I would say that one should then directly use a pair of 
iterators (in the worst case one could make a specialization that 
implements it more efficiently for the types that support it).

3) copying: I would let the user freely copy and duplicate iterators if needed.

4) input-output
I think that the operations proposed are sound, I like them

5) hierarchy of iterators
I would classify the iterator also along another axis: size
infinite (stream) - finite (but unknown size) - bounded (finite and known size)

The other classification:
forward iterator (what I called iterator until now)
bidirectional range: I understand this, these are basically two 
iterators one from the beginning and the other from the end that are 
coupled together. I find it a little bit strange, I would just expect 
to have a pair of iterators... but I see that it might be useful
bidirectional iterator: this is a doubly linked list, I think that this 
class of iterators cannot easily be described just as a range, it often 
needs three points (start,end,actual_pos), I think has its place (and 
is not explicitly present in your list)
random_iterator: (this could be also called array type or linear indexed type).

So this is what "my" iterator/range would look like :)

Fawzi

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Fawzi Mohamed wrote:
 It is a nice idea to redesign the iterator and range.
 I think that the proposal is not bad, but I have some notes about it, 
 and some things that I would have done differently.
 
 1) The simplest interface input (range) is just
 bool isEmpty();
 T next();
 iterator(T) release();

Actually next is getNext, and release returns R (the range type).

 The first two I fully agree on; the third one, I suppose, is there to 
 allow resources to be released and possibly transfer the data to another 
 iterator... is it really needed?

Yes. Consider findAdjacent that finds two equal adjacent elements in a 
collection:

Range findAdjacent(alias pred = "a == b", Range)(Range r)
{
     if (r.isEmpty) return r;
     auto ahead = r;
     ahead.next;
     for (; !ahead.isEmpty; r.next, ahead.next) {
         if (binaryFun!(pred)(r.first, ahead.first)) return r;
     }
     return ahead;
}

The whole implementation fundamentally rests on the notion that you can 
copy a range into another, and that you can iterate the collection 
independently with two distinct ranges. If that's not true, findAdjacent 
will execute yielding nonsensical results.

Input iterators are not copyable. With an input iterator "auto ahead = 
r;" will not compile. But they are movable. So you can relinquish 
control from one iterator to the other.
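
As a point of comparison, the same algorithm can be sketched against the range primitives that Phobos later shipped in std.range.primitives (empty, front, popFront, save) rather than the isEmpty/first/next names used in this thread. The function name findAdjacentSketch is made up for this example; the explicit .save call and the isForwardRange constraint are where the "must be copyable" requirement shows up.

```d
import std.functional : binaryFun;
import std.range.primitives;

/// Returns the range positioned at the first of two adjacent elements that
/// satisfy pred, or an empty range if there is no such pair.
Range findAdjacentSketch(alias pred = "a == b", Range)(Range r)
    if (isForwardRange!Range)
{
    if (r.empty) return r;
    auto ahead = r.save;        // independent copy: needs a forward range
    ahead.popFront();
    for (; !ahead.empty; r.popFront(), ahead.popFront())
    {
        if (binaryFun!pred(r.front, ahead.front)) return r;
    }
    return ahead;               // empty here: no adjacent pair was found
}

unittest
{
    assert(findAdjacentSketch([1, 2, 2, 3]) == [2, 2, 3]);
    assert(findAdjacentSketch([1, 2, 3]).empty);
}
```

A pure input range fails the isForwardRange constraint, so the call is rejected at compile time instead of yielding nonsensical results.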

 Now I would see this simplest thing (let me call it iterator) as the 
 basic objects for foreach looping.
 *all* things on which foreach loops should be iterators.
 If an object is not a iterator there should be a standard way to convert 
 it to one (.iterator for example).
 So if the compiler gets something that is not a iterator it tries to see 
 if .iterator is implemented and if it is it calls it and iterates on it.
 This let many objects have a "default" iterator. Obviously an object 
 could have other methods that return an iterator.

Fine. So instead of saying:

foreach (e; c.all) { ... }

you can say

foreach (e; c) { ... }

I think that's some dubious savings.

 2) All the methods with intersection of iterator in my opinion are 
 difficult to memorize, and rarely used, I would scrap them.
 Instead I would add the comparison operation .atSamePlace(iterator!(T)y) 
 that would say if two iterators are at the same place. With it one gets 
 back all the power of pointers, and with a syntax and use that are 
 understandable.

But that comparison operation is not enough to implement anything of 
substance. Try your hand at a few classic algorithms and you'll see.

 I understand the idea of covering all possibilities, if one wants it 
 with .atSamePlace a template can easily construct all possible 
 intersection iterators. Clearly calling recursively such a template is 
 inefficient, but I would say the then one should use directly a pair of 
 iterators (in the worst case one could make a specialization that 
 implements it more efficiently for the types that support it).
 
 3) copying: I would let the user freely copy and duplicate iterators if 
 needed.

I like freedom too. But that kind of freedom is incorrect for input 
iterators.

 4) input-output
 I think that the operations proposed are sound, I like them

Then you got to accept the consequences :o).

 5) hierarchy of iterators
 I would classify the iterator also along another axis: size
 infinite (stream) - finite (but unknown size) - bounded (finite and 
 known size)

Distinguishing such things can be of advantage sometimes, and could be 
added as a refinement to the five categories if shown useful.

 The other classification:
 forward iterator (what I called iterator until now)
 bidirectional range: I understand this, these are basically two 
 iterators one from the beginning and the other from the end that are 
 coupled together. I find it a little bit strange, I would just expect to 
 have a pair of iterators... but I see that it might be useful
 bidirectional iterator: this is a doubly linked list, I think that this 
 class of iterators cannot easily be described just as a range, it often 
 needs three points (start,end,actual_pos), I think has its place (and is 
 not explicitly present in your list)
 random_iterator: (this could be also called array type or linear indexed 
 type).

I can't understand much of the above, sorry.

 So this is what "my" iterator/range would look like :)

I encourage you to realize your design. Before long you'll probably find 
even more issues with it than I mentioned above, but you'll gain by 
being better equipped to find proper solutions.


Andrei

Sep 09 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-09-09 18:09:28 +0200, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 It is a nice idea to redesign the iterator and range.
 I think that the proposal is not bad, but I have some notes about it, 
 and some things that I would have done differently.
 
 1) The simplest interface input (range) is just
 bool isEmpty();
 T next();
 iterator(T) release();

 
 Actually next is getNext, and release returns R (the range type).
 
 The first two I fully agree on; the third one, I suppose, is there to 
 allow resources to be released and possibly transfer the data to 
 another iterator... is it really needed?

 
 Yes. Consider findAdjacent that finds two equal adjacent elements in a 
 collection:
 
 Range findAdjacent(alias pred = "a == b", Range)(Range r)
 {
      if (r.isEmpty) return r;
      auto ahead = r;
      ahead.next;
      for (; !ahead.isEmpty; r.next, ahead.next) {
          if (binaryFun!(pred)(r.first, ahead.first)) return r;
      }
      return ahead;
 }
 
 The whole implementation fundamentally rests on the notion that you can 
 copy a range into another, and that you can iterate the collection 
 independently with two distinct ranges. If that's not true, 
 findAdjacent will execute yielding nonsensical results.
 
 Input iterators are not copyable. With an input iterator "auto ahead = 
 r;" will not compile. But they are movable. So you can relinquish 
 control from one iterator to the other.

ok I understand, indeed it is useful to have non copyable "unique" 
iterators, even if they are not the common iterators (actually I think 
it is potentially even more important for output iterators).

 Now I would see this simplest thing (let me call it iterator) as the 
 basic objects for foreach looping.
 *all* things on which foreach loops should be iterators.
 If an object is not a iterator there should be a standard way to 
 convert it to one (.iterator for example).
 So if the compiler gets something that is not a iterator it tries to 
 see if .iterator is implemented and if it is it calls it and iterates 
 on it.
 This let many objects have a "default" iterator. Obviously an object 
 could have other methods that return an iterator.

 
 Fine. So instead of saying:
 
 foreach (e; c.all) { ... }
 
 you can say
 
 foreach (e; c) { ... }
 
 I think that's some dubious savings.

I think it is useful, but not absolutely necessary.

 2) All the methods with intersection of iterator in my opinion are 
 difficult to memorize, and rarely used, I would scrap them.
 Instead I would add the comparison operation 
 .atSamePlace(iterator!(T)y) that would say if two iterators are at the 
 same place. With it one gets back all the power of pointers, and with a 
 syntax and use that are understandable.

 
 But that comparison operation is not enough to implement anything of 
 substance. Try your hand at a few classic algorithms and you'll see.

are you sure? then a range is *exactly* equivalent to a STL iterator, 
only that it cannot go out of bounds:
// left1-left2:
while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
}
// left2-left1:
while((!i2.isEmpty) && (!i2.atSamePlace(i1))){
  i2.next;
}
// union 1-2
while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
}
while(!i2.isEmpty){
  i2.next;
}
// union 2-1
...
// lower triangle
i1=c.all;
while(!i1.isEmpty){
  i2=c.all;
  while(!i2.isEmpty && !i2.atSamePlace(i1)){
    i2.next;
  }
  i1.next;
}
well these are the operations that you can do on basically all 
iterators (and with which you can define new iterators).
The ones you propose need an underlying total order that can be 
efficiently checked; for example, iterators on trees do not 
necessarily have this property, and then getting your kind of intersection 
can be difficult (and not faster than the operation using atSamePlace).
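
To make the idea concrete, here is a small sketch of atSamePlace on a singly linked list. Node, ListRange and distanceTo are hypothetical names invented for this example; the member names isEmpty, first and next follow the thread. The comparison itself is constant time, while anything built on top of it by walking is linear.

```d
// Hypothetical singly linked list and a range over it.
struct Node { int value; Node* next; }

struct ListRange
{
    Node* head;                       // null means the range is empty

    bool isEmpty() const { return head is null; }
    int first() const { return head.value; }
    void next() { head = head.next; }

    /// True if both ranges start at the same node; constant time.
    bool atSamePlace(const ListRange other) const
    {
        return head is other.head;
    }
}

/// Counts the elements of i1 that come before the start of i2 by walking
/// i1 forward. Linear time; if i2 is not reachable from i1, it simply
/// walks to the end of i1.
size_t distanceTo(ListRange i1, ListRange i2)
{
    size_t n;
    while (!i1.isEmpty() && !i1.atSamePlace(i2))
    {
        i1.next();
        ++n;
    }
    return n;
}

unittest
{
    auto c = new Node(3, null);
    auto b = new Node(2, c);
    auto a = new Node(1, b);
    auto all = ListRange(a);
    auto tail = ListRange(c);
    assert(distanceTo(all, tail) == 2);
    assert(all.first() == 1);         // the caller's copy is untouched
}
```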

 I understand the idea of covering all possibilities, if one wants it 
 with .atSamePlace a template can easily construct all possible 
 intersection iterators. Clearly calling recursively such a template is 
 inefficient, but I would say the then one should use directly a pair of 
 iterators (in the worst case one could make a specialization that 
 implements it more efficiently for the types that support it).
 
 3) copying: I would let the user freely copy and duplicate iterators if needed.

 
 I like freedom too. But that kind of freedom is incorrect for input iterators.

Now I realized it, thanks.

 4) input-output
 I think that the operations proposed are sound, I like them

 
 Then you got to accept the consequences :o).

yes :)

 5) hierarchy of iterators
 I would classify the iterator also along another axis: size
 infinite (stream) - finite (but unknown size) - bounded (finite and known size)

 
 Distinguishing such things can be of advantage sometimes, and could be 
 added as a refinement to the five categories if shown useful.

well if an iterator knows its size, and you want to use it to 
initialize an array for example...
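
A rough sketch of how that size axis can be exploited, using the traits that Phobos later provided in std.range.primitives (hasLength, isInfinite). The helper name collect is made up for this example: it preallocates when the length is known, grows as it goes when it is not, and refuses infinite ranges at compile time.

```d
import std.range.primitives : ElementType, hasLength, isInfinite, isInputRange;

/// Hypothetical helper: copy a finite range into a new array.
ElementType!R[] collect(R)(R r)
    if (isInputRange!R && !isInfinite!R)
{
    ElementType!R[] result;
    static if (hasLength!R)
        result.reserve(r.length);   // bounded: size known up front
    foreach (e; r)
        result ~= e;                // unknown size: the array grows as needed
    return result;
}

unittest
{
    import std.algorithm.iteration : filter;
    import std.range : iota, repeat;

    auto bounded = iota(5);
    auto odds = iota(5).filter!(a => a % 2 == 1);

    static assert(hasLength!(typeof(bounded)));     // bounded
    static assert(!hasLength!(typeof(odds)));       // finite, size unknown
    static assert(isInfinite!(typeof(repeat(1))));  // infinite

    assert(collect(bounded) == [0, 1, 2, 3, 4]);
    assert(collect(odds) == [1, 3]);
    static assert(!__traits(compiles, collect(repeat(1))));
}
```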

 The other classification:
 forward iterator (what I called iterator until now)


 bidirectional range: I understand this, these are basically two 
 iterators one from the beginning and the other from the end that are 
 coupled together. I find it a little bit strange, I would just expect 
 to have a pair of iterators... but I see that it might be useful
 bidirectional iterator: this is a doubly linked list, I think that this 
 class of iterators cannot easily be described just as a range, it often 
 needs three points (start,end,actual_pos), I think has its place (and 
 is not explicitly present in your list)
 random_iterator: (this could be also called array type or linear indexed type).

 
 I can't understand much of the above, sorry.

the only new thing is bidirectional iterator: an iterator that can go 
in both directions as extra iterator type (your bidirectional range is 
something different).
I think it is useful and I don't see the need to shoehorn it into a 
range. For me an iterator is an object that can generate a sequence by 
itself, so a range is an example of iterator, but I don't see the need 
to make each iterator a range.
As I said before a range also has a total ordering of the objects that 
can be easily checked, this is a special kind of iterator for me, not 
the most general. Take two ranges of two linked lists, you cannot 
easily build your intersections because you don't know their relative 
order, and checking it is inefficient.

 
 So this is what "my" iterator/range would look like :)

 
 I encourage you to realize your design. Before long you'll find 
 probably even more issues with it than I mentioned above, but you'll be 
 gained in being better equipped to find proper solutions.

I hope we converge toward a good solution ;)

Fawzi
 
 
 Andrei

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Fawzi Mohamed wrote:
 are you sure? then a range is *exactly* equivalent to a STL iterator, 
 only that it cannot go out of bounds:
 // left1-left2:
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 // left2-left1:
 while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 // union 1-2
 while((!i1.isEmpty) && (!(i1.atSamePlace(i2))){
  i1.next;
 }
 while(!i2.isEmpty){
  i2.next;
 }
 // union 2-1
 ...
 // lower triangle
 i1=c.all;
 while(!i1.isEmpty){
  i2=c.all;
  while(!i2.isEmpty && !i2.atSamePlace(i1)){
    i2.next;
  }
 well these are the operations that you can do on basically all iterators 
 (and with wich you can define new iterators).
 The one you propose need an underlying total order that can be 
 efficiently checked, for example iterators on trees do not have 
 necessarily this property, and then getting your kind of intersection 
 can be difficult (and not faster than the operation using atSamePlace.

I am getting seriously confused by this subthread. So are you saying 
that atSamePlace is your primitive and that you implement the other 
range operations all in linear time? If I did not misunderstand and 
that's your design, then you may want to revise that design right now. 
It will never work. I guarantee it.

 the only new thing is bidirectional iterator: an iterator that can go in 
 both directions as extra iterator type (your bidirectional range is 
 something different).

A bidirectional range is simply a range that you can shrink from either 
end.
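
A minimal illustration of "shrink from either end", written with the primitive names Phobos later adopted (front, back, popFront, popBack) rather than the names debated in this thread. The function isPalindrome is just an example chosen for this sketch.

```d
import std.range.primitives;

/// Palindrome test that shrinks the range from both ends until at most
/// one element is left.
bool isPalindrome(R)(R r)
    if (isBidirectionalRange!R)
{
    while (!r.empty)
    {
        auto a = r.front;
        r.popFront();               // shrink from the front
        if (r.empty) break;         // odd length: the middle element remains
        if (a != r.back) return false;
        r.popBack();                // shrink from the back
    }
    return true;
}

unittest
{
    assert(isPalindrome([1, 2, 3, 2, 1]));
    assert(isPalindrome([1, 2, 2, 1]));
    assert(!isPalindrome([1, 2, 3]));
}
```

There is no separate "current position" here: the range itself is the only state, and each step makes it smaller.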

 I think it is useful and I don't see the need to shoehorn it into a 
 range. For me an iterator is an object that can generate a sequence by 
 itself, so a range is an example of iterator, but I don't see the need 
 to make each iterator a range.

I have put forth reasons for doing away with iterators entirely in the 
range doc. What are your counter-reasons for bringing back iterators?

 As I said before a range also has a total ordering of the objects that 
 can be easily checked, this is a special kind of iterator for me, not 
 the most general. Take two ranges of two linked lists, you cannot easily 
 build your intersections because you don't know their relative order, 
 and checking it is inefficient.

Correct. The range intersection primitives are the Achilles' heel of the 
range-based design. Checking subrange reachability is O(n), so the range 
intersection primitives take the user on faith. But iterators have that 
Achilles' heel problem too, plus a few arrows in their back :o). The 
document clarifies this disadvantage by saying that range intersection 
primitives are undefined if certain conditions are not met.

In short, this is an endemic problem of an iteration based on either 
ranges or individual iterators. An objection to that should 
automatically come with a constructive proof, i.e. a better design.

 So this is what "my" iterator/range would look like :)

 I encourage you to realize your design. Before long you'll find 
 probably even more issues with it than I mentioned above, but you'll 
 be gained in being better equipped to find proper solutions.

 
 I hope we converge toward a good solution ;)

Well I haven't seen much code yet.


Andrei

Sep 09 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-09-10 01:02:00 +0200, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 are you sure? then a range is *exactly* equivalent to a STL iterator, 
 only that it cannot go out of bounds:
 // left1-left2:
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 // left2-left1:
 while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
  i2.next;
 }
 // union 1-2
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 while(!i2.isEmpty){
  i2.next;
 }
 // union 2-1
 ...
 // lower triangle
 i1=c.all;
 while(!i1.isEmpty){
  i2=c.all;
  while(!i2.isEmpty && !i2.atSamePlace(i1)){
    i2.next;
  }
  i1.next;
 }
 well these are the operations that you can do on basically all
 iterators (and with which you can define new iterators).
 The ones you propose need an underlying total order that can be
 efficiently checked; for example, iterators on trees do not necessarily
 have this property, and then getting your kind of intersection can be
 difficult (and not faster than the operation using atSamePlace).

 
 I am getting seriously confused by this subthread. So are you saying 
 that atSamePlace is your primitive and that you implement the other 
 range operations all in linear time? If I did not misunderstand and 
 that's your design, then you may want to revise that design right now. 
 It will never work. I guarantee it.

It doesn't seem too difficult to me. Just look at the code: these are
iterations on subranges of iterators i1 and i2. Actually they are the
only kind of range combination that can be performed safely on general
iterators.
The range combinations you propose are cumbersome, rarely used, and in
general unsafe. I think it is a bad idea to add them to the object
that is needed to get some foreach magic and that is the most generic
iterator.

atSamePlace returns true if two iterators have .left (or however you
call it) at the same place (in general this might not mean that they
have the same address). It can be implemented for almost all iterators
in constant time (at the moment I cannot think of a counterexample),
and with it (as the code just above shows) you can define some
subranges.
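
As a rough illustration of the atSamePlace idea on a singly linked list, here is a minimal sketch. It assumes the isEmpty/first/next primitives discussed in this thread; Node, ListRange and valuesBefore are hypothetical names invented for the example and do not come from any posted design.

```d
// Hypothetical sketch: Node, ListRange and valuesBefore are illustrative
// names, not part of any proposal or library.
struct Node
{
    int value;
    Node* next;
}

struct ListRange
{
    Node* head; // current position; null means the range is exhausted

    bool isEmpty() const { return head is null; }
    int first() const { return head.value; }
    void next() { head = head.next; }

    // Constant time: true when both ranges currently sit on the same node.
    bool atSamePlace(const ListRange other) const
    {
        return head is other.head;
    }
}

// Collects the elements of i1 that come before the position of i2
// (the "left1-left2" loop), stopping early if i1 runs out.
int[] valuesBefore(ListRange i1, ListRange i2)
{
    int[] result;
    while (!i1.isEmpty && !i1.atSamePlace(i2))
    {
        result ~= i1.first;
        i1.next();
    }
    return result;
}

unittest
{
    // Build the list 1 -> 2 -> 3.
    auto n3 = new Node; n3.value = 3;
    auto n2 = new Node; n2.value = 2; n2.next = n3;
    auto n1 = new Node; n1.value = 1; n1.next = n2;

    assert(valuesBefore(ListRange(n1), ListRange(n3)) == [1, 2]);
    // If i2 is never reached, the loop simply walks to the end.
    assert(valuesBefore(ListRange(n1), ListRange(null)) == [1, 2, 3]);
}
```

The comparison itself is O(1), but note that the loop is O(n) and visits the elements rather than producing a new range object that could be handed to another function.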

In the case in which you have an easily checkable total ordering between
the elements, then yes, you do have all that is needed to have
a real range, and for this range object your subrange operations are
safe. I am not against them, just don't force them on every person
that just wants to iterate on something.

 the only new thing is bidirectional iterator: an iterator that can go 
 in both directions as extra iterator type (your bidirectional range is 
 something different).

 
 A bidirectional range is simply a range that you can shrink from either end.

That is exactly what I said in the sentence before, but in this
sentence I am speaking about a bidirectional *iterator*, which for me is
an iterator that can move both back and forth.

 I think it is useful and I don't see the need to shoehorn it into a 
 range. For me an iterator is an object that can generate a sequence by 
 itself, so a range is an example of iterator, but I don't see the need 
 to make each iterator a range.

 
 I have put forth reasons for doing away with iterators entirely in the 
 range doc. What are your counter-reasons for bringing back iterators?

They are simpler and describe a larger range of useful constructs.
Ranges on linked lists, as I said, are unsafe, which does not mean that
ranges are not useful, just that there is also a place for simple
iterators.

Iterators can be perfectly safe; it is just the C++ idea of un-bundling
the end from the iterator that makes them unsafe (and also more cumbersome
to use).
If "iterator" is for you too connected with the C++ view of it, call
them generators: a self-contained object that can generate a sequence.

Bidirectional iterators are a natural step in the progression of
iterators; they can also be implemented safely.

 As I said before a range also has a total ordering of the objects that
 can be easily checked, this is a special kind of iterator for me, not
 the most general. Take two ranges of two linked lists, you cannot
 easily build your intersections because you don't know their relative
 order, and checking it is inefficient.

 
 Correct. The range intersection primitives are the Achilles' heel of the
 range-based design. Checking subrange reachability is O(n), so the
 range intersection primitives take the user's word on faith. But iterators
 have that Achilles' heel problem too, plus a few arrows in their back
 :o). The document clarifies this disadvantage by saying that range
 intersection primitives are undefined if certain conditions are not met.
 
 In short, this is an endemic problem of an iteration based on either 
 ranges or individual iterators. An objection to that should 
 automatically come with a constructive proof, i.e. a better design.

Using atSamePlace you can do it safely on any kind of range. I think
that only safe operations should be available, and they can be made
safe: in general by using atSamePlace, and, if there is a quickly
checkable total ordering (as in arrays, for example), by never letting a
range be larger than the union of two ranges.
One can then discuss, for two disjoint ranges with a hole between them,
whether to segment the result (which would then be an iterator but not a
range) or to choose only one side (keeping it a range).

 So this is what "my" iterator/range would look like :)

 
 I encourage you to realize your design. Before long you'll probably find
 even more issues with it than I mentioned above, but you'll gain by
 being better equipped to find proper solutions.

 
 I hope we converge toward a good solution ;)

 
 Well I haven't seen much code yet.

I have written quite some code in my multidimensional array library (
http://github.com/fawzi/narray ) and I have thought a lot about iterators,
not only in D, but in the several languages that I know.
Like everybody, I don't think I really see all implications of the
interface, but I think that I understand them enough to participate
meaningfully in this discussion.
Actually at the moment I am really busy, but I am trying to say something
because I think this discussion is important for the future of D, and I
care about it.
I will try to give some real code. I thought that my contribution
was not so difficult to understand, but it is always like this: to
yourself you always seem clear ;)

Fawzi

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Sep 10, 2008 at 7:47 AM, Fawzi Mohamed <fmohamed mac.com> wrote:

 2) All the methods with intersection of iterator in my opinion are
 difficult to memorize, and rarely used, I would scrap them.
 Instead I would add the comparison operation .atSamePlace(iterator!(T)y)
 that would say if two iterators are at the same place. With it one gets back
 all the power of pointers, and with a syntax and use that are
 understandable.

 But that comparison operation is not enough to implement anything of
 substance. Try your hand at a few classic algorithms and you'll see.

 are you sure? then a range is *exactly* equivalent to a STL iterator, only
 that it cannot go out of bounds:
 // left1-left2:
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 // left2-left1:
 while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
  i2.next;
 }
 // union 1-2
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
  i1.next;
 }
 while(!i2.isEmpty){
  i2.next;
 }
 // union 2-1
 ...
 // lower triangle
 i1=c.all;
 while(!i1.isEmpty){
  i2=c.all;
  while(!i2.isEmpty && !i2.atSamePlace(i1)){
   i2.next;
  }
  i1.next;
 }

Your code shows that you can successfully iterate over the same
elements described by Andrei's various unions and differences, but
it does not show how you would, say, pass that new range to another
function to do that job, as you would want to do in, say, a
recursive sort.  Since in this design you can't set or access the
individual iterator-like components of a range directly, being able to
copy the begin or end iterator from one range over to another is
necessary, I think.

But I think you and I are in agreement that it would be easier and
more natural to think of ranges as iterators augmented with
information about bounds, as opposed to a contiguous block of things
from A to B.

 well these are the operations that you can do on basically all iterators
 (and with which you can define new iterators).
 The ones you propose need an underlying total order that can be efficiently
 checked; for example, iterators on trees do not necessarily have this
 property, and then getting your kind of intersection can be difficult (and
 not faster than the operation using atSamePlace).

I don't think that's correct.  Andrei's system does not need a total
order any more than yours does.  The unions and diffs just create new
ranges by combining the components of existing ranges.  They don't
need to know anything about what happens in between those points or
how you get from one to the other.  Just take the "begin" of this guy
and put it together with the "end" of that guy, for example.  It
doesn't require knowing how to get from anywhere to anywhere to create
that new range.
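
To make the "begin of this guy plus end of that guy" point concrete, here is a small sketch on plain D slices. The names before and after are the ones floated in this thread, but the signatures and the slice-based implementation are assumptions made for illustration, not the proposed API.

```d
// Hypothetical sketch on D slices; before and after follow the names
// floated in this thread, but the signatures here are assumptions.

// Part of `all` that precedes `sub`. Undefined if `sub` is not inside `all`.
T[] before(T)(T[] all, T[] sub)
{
    return all[0 .. sub.ptr - all.ptr];
}

// Part of `all` that follows `sub`. Undefined if `sub` is not inside `all`.
T[] after(T)(T[] all, T[] sub)
{
    return all[(sub.ptr - all.ptr) + sub.length .. $];
}

unittest
{
    int[] all = [10, 20, 30, 40, 50];
    auto mid = all[1 .. 3];              // views 20, 30

    assert(before(all, mid) == [10]);
    assert(after(all, mid) == [40, 50]);
}
```

Neither function walks the elements or compares them: each just recombines endpoints. The results are ordinary ranges that can be passed on, for example to the recursive calls of a sort. The price is that nothing checks whether sub really lies inside all.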

--bb

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

I like that you are bringing this point up, it is interesting. Note that 
my API never assumes or requires that there's an actual contiguous block 
of things underneath. Au contraire, in the I/O case, there's only "the 
current element" underneath.

But a better example is generators. Consider a function generate that 
takes a string expression using a[0], a[1],... a[k] (the state) and 
returns a[k+1]. The generate function also takes the initial state. Then 
generate returns a range that returns in turn each element of the series.

Generate is easy to implement, but I don't want to get into that now, 
only into usage. Simplest use is:

auto boring = generate!("a[0]")(42);

This guy will generate the series 42 42 42 42 42 42 42... forever and
ever. Now to use it at all we'd have to temper it. So we use a function
called "take", which accepts a maximum size. And then:

writeln(take(10, boring));

This guy will print "42 42 42 42 42 42 42 42 42 42".

Let's generate a more interesting series. How about an iota:

writeln(take(4, generate!("a[0] + 2")(5)));

That guy prints "5 7 9 11". Or Newton's square root approximations:

writeln(take(4, generate!("(a[0] + 2/a[0])/2")(1.0)));

which prints "1 1.5 1.4167 1.4142". All of these are ranges, some 
bounded and some unbounded, but do not have blocks of elements 
underneath them.
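
For readers who want something to compile, here is a minimal hand-rolled sketch of the same idea. It is not the generate/take from the post: Generator and takeN are hypothetical names, the step is passed as a function pointer rather than a string expression, and only a single element of state is kept.

```d
// Hypothetical sketch: Generator and takeN are illustrative stand-ins
// for the generate/take pair described above, not their implementation.
struct Generator(T)
{
    T current;          // the state: the element `first` returns
    T function(T) step; // computes the next element from the current one

    enum bool isEmpty = false; // an unbounded series is never empty

    T first() const { return current; }
    void next() { current = step(current); }
}

// Copies at most n elements out of the (unbounded) generator.
T[] takeN(T)(size_t n, Generator!T g)
{
    T[] result;
    foreach (i; 0 .. n)
    {
        result ~= g.first;
        g.next();
    }
    return result;
}

unittest
{
    // The constant series.
    auto boring = Generator!int(42, (int x) => x);
    assert(takeN(3, boring) == [42, 42, 42]);

    // The iota-like series.
    assert(takeN(4, Generator!int(5, (int x) => x + 2)) == [5, 7, 9, 11]);

    // Newton's approximations of the square root of 2.
    auto r = takeN(4, Generator!double(1.0, (double x) => (x + 2 / x) / 2));
    assert(r[0] == 1.0 && r[1] == 1.5);
    assert(r[3] > 1.4142 && r[3] < 1.4143);
}
```

The generator holds nothing but its current element, which is the point being made: there is no block of elements underneath.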

 well these are the operations that you can do on basically all iterators
 (and with which you can define new iterators).
 The ones you propose need an underlying total order that can be efficiently
 checked; for example, iterators on trees do not necessarily have this
 property, and then getting your kind of intersection can be difficult (and
 not faster than the operation using atSamePlace).

 
 I don't think that's correct.  Andrei's system does not need a total
 order any more than yours does.  The unions and diffs just create new
 ranges by combining the components of existing ranges.  They don't
 need to know anything about what happens in between those points or
 how you get from one to the other.  Just take the "begin" of this guy
 and put it together with the "end" of that guy, for example.  It
 doesn't require knowing how to get from anywhere to anywhere to create
 that new range.

Yes, that's exactly right.

Andrei

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Sep 10, 2008 at 10:07 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

 I like that you are bringing this point up, it is interesting. Note that my
 API never assumes or requires that there's an actual contiguous block of
 things underneath. Au contraire, in the I/O case, there's only "the current
 element" underneath.

Yes, I see that and think it's great.  But the point I've been trying
to make is that the nomenclature you are using seems to emphasize the
contiguous block interpretation, rather than the interpretation as a
cursor plus a sentinel.  The contiguous block terminology makes good
sense for slices, but less for things like trees and unbounded
generators and HMMs.

And ok, I do think your incredible shrinking bidirectional range is
borked.  But other than that, I'm just talking about terminology.

Did you read my posts over on digitalmars.D?  I'm not into the
"massive thread on d.announce" thing -- makes it too hard to find
sub-threads later -- so I started some new sub-threads over there.

--bb

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 10:07 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

 I like that you are bringing this point up, it is interesting. Note that my
 API never assumes or requires that there's an actual contiguous block of
 things underneath. Au contraire, in the I/O case, there's only "the current
 element" underneath.

 
 Yes, I see that and think it's great.  But the point I've been trying
 to make is that the nomenclature you are using seems to emphasize the
 contiguous block interpretation, rather than the interpretation as a
 cursor plus a sentinel.  The contiguous block terminology makes good
 sense for slices, but less for things like trees and unbounded
 generators and HMMs.

I disagree that isEmpty, first, and next suggest anything near 
contiguous block. It's just list terminology. Is the list empty? Give me 
the first element of the list. Advance to the next element in the list.

Names for the before and after range operations are still in the air...

Are you referring to the "range" name itself?

 And ok, I do think your incredible shrinking bidirectional range is
 borked.  But other than that, I'm just talking about terminology.

How is it borked?


Andrei

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Sep 10, 2008 at 11:57 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 10:07 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

 I like that you are bringing this point up, it is interesting. Note that
 my
 API never assumes or requires that there's an actual contiguous block of
 things underneath. Au contraire, in the I/O case, there's only "the
 current
 element" underneath.

 Yes, I see that and think it's great.  But the point I've been trying
 to make is that the nomenclature you are using seems to emphasize the
 contiguous block interpretation, rather than the interpretation as a
 cursor plus a sentinel.  The contiguous block terminology makes good
 sense for slices, but less for things like trees and unbounded
 generators and HMMs.

 I disagree that isEmpty, first, and next suggest anything near contiguous
 block. It's just list terminology. Is the list empty? Give me the first
 element of the list. Advance to the next element in the list.

However a range isn't, generally speaking, a list.  It's a way to
traverse or access data that may or may not be a list.  For something
like an unbounded generator, it is odd to speak of the "first".  Such
an object has a current value and a "next", but the value you can look
at right now is only the "first" by a bit of a terminology stretch.

I think using list terminology unnecessarily confuses the iterating
construct that does the accessing with the container being accessed.
The range is not the container.  The range consists of a place where
you are, and a termination condition.  The range is not "empty" or
"full" because it does not actually contain elements.  Sure, if you're
dead set on it, you can say that by "empty" we mean that the set of
things you would get if you called .next repeatedly is empty, but why?
 The terminology is just encouraging one to think of a range as a
container, when in fact it is not -- it is more like two goal posts.
Call it atEnd() or similar and you'll naturally encourage people to
think of ranges as references rather than containers.

Similarly, using list terminology led you to "pop".  But pop on a
range does not actually remove any content.  Pop just moves the goal
post on one end.

And then there's the various union/diff stuff, which everyone seems to
find confusing.  I think much of that confusion and mental overhead
just goes away if you think of a range as a good old iterator plus a
stopping condition.

 Names for the before and after range operations are still in the air...

 Are you referring to the "range" name itself?

That could be part of the reason for this tendency to try to assign
list-like names to the parts.  If it were called a "bounded iterator"
I think that would better describe the perspective I'm pushing, and
naturally lead to choices like "atEnd" instead of "isEmpty".

 And ok, I do think your incredible shrinking bidirectional range is
 borked.  But other than that, I'm just talking about terminology.

 How is it borked?

See my post to Digitalmars.D.

But upon further reflection I think it may be that it's just not what
I would call a bidirectional range.  By that I mean it's not good at
solving the problems that a bidirectional iterator in C++ is good for.
 Your bidir range may be useful (though I'm not really convinced that
very many algorithms need what it provides) --  but I think one also
needs an iterator that's good at what C++'s bidir iterators are good
at, i.e. moving the active cursor backwards or forwards.  I would call
your construct more of a "double-headed" range than a bidirectional
one.

--bb

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 11:57 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 10:07 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.

 I like that you are bringing this point up, it is interesting. Note that
 my
 API never assumes or requires that there's an actual contiguous block of
 things underneath. Au contraire, in the I/O case, there's only "the
 current
 element" underneath.

 Yes, I see that and think it's great.  But the point I've been trying
 to make is that the nomenclature you are using seems to emphasize the
 contiguous block interpretation, rather than the interpretation as a
 cursor plus a sentinel.  The contiguous block terminology makes good
 sense for slices, but less for things like trees and unbounded
 generators and HMMs.

 I disagree that isEmpty, first, and next suggest anything near contiguous
 block. It's just list terminology. Is the list empty? Give me the first
 element of the list. Advance to the next element in the list.

 
 However a range isn't, generally speaking, a list.  It's a way to
 traverse or access data that may or may not be a list.  For something
 like an unbounded generator, it is odd to speak of the "first".  Such
 an object has a current value and a "next", but the value you can look
 at right now is only the "first" by a bit of a terminology stretch.

Agreed. The problem with "current" instead of "first" is that there's no 
clear correspondent for "the last that the current will be". First and 
last are obvious. Current and last are... well, not bad either :o).

 I think using list terminology unnecessarily confuses the iterating
 construct that does the accessing with the container being accessed.
 The range is not the container.  The range consists of a place where
 you are, and a termination condition.

No. A bidirectional range also knows the last place you'll ever be, and 
is able to manipulate it.

  The range is not "empty" or
 "full" because it does not actually contain elements.

It is because a range is a view. The view can reduce to nothing. In 
math, an interval can be "empty". That doesn't mean it made all real 
numbers disappear :o).

 Sure, if you're
 dead set on it, you can say that by "empty" we mean that the set of
 things you would get if you called .next repeatedly is empty, but why?
  The terminology is just encouraging one to think of a range as a
 container, when in fact it is not -- it is more like two goal posts.
 Call it atEnd() or similar and you'll naturally encourage people to
 think of ranges as references rather than containers.
 
 Similarly, using list terminology led you to "pop".  But pop on a
 range does not actually remove any content.  Pop just moves the goal
 post on one end.

Correct. Then how would you name'em?

 And then there's the various union/diff stuff, which everyone seems to
 find confusing.  I think much of that confusion and mental overhead
 just goes away if you think of a range as a good old iterator plus a
 stopping condition.

I like before and after. Besides, the challenge is that you come up with
something that's not confusing.

 Names for the before and after range operations are still in the air...

 Are you referring to the "range" name itself?

 
 That could be part of the reason for this tendency to try to assign
 list-like names to the parts.  If it were called a "bounded iterator"
 I think that would better describe the perspective I'm pushing, and
 naturally lead to choices like "atEnd" instead of "isEmpty".

Words are powerful. Phrases are less powerful. I'll never ever settle on 
anything longer than ONE word for the concept. Ranges came to mind 
because boost uses them with a similar meaning.


Andrei

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 1:30 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 However a range isn't, generally speaking, a list.  It's a way to
 traverse or access data that may or may not be a list.  For something
 like an unbounded generator, it is odd to speak of the "first".  Such
 an object has a current value and a "next", but the value you can look
 at right now is only the "first" by a bit of a terminology stretch.

 Agreed. The problem with "current" instead of "first" is that there's no
 clear correspondent for "the last that the current will be". First and last
 are obvious. Current and last are... well, not bad either :o).

 I think using list terminology unnecessarily confuses the iterating
 construct that does the accessing with the container being accessed.
 The range is not the container.  The range consists of a place where
 you are, and a termination condition.

 No. A bidirectional range also knows the last place you'll ever be, and is
 able to manipulate it.

That's just a mutable termination condition.  Still fits my description.

  The range is not "empty" or
 "full" because it does not actually contain elements.

 It is because a range is a view. The view can reduce to nothing. In math, an
 interval can be "empty". That doesn't mean it made all real numbers
 disappear :o).

The other problem with empty is that it doesn't generalize to what I
happen to think a bidirectional range should be, one with .next, .prev,
.hasNext and .hasPrev.

Your bidir iterator in C++ parlance is a forward iterator and a
reverse iterator operating on the same sequence.  I can't really think
of any algorithms other than the one you showed that use such a pair.

On the other hand my bidir is useful in all the places a C++ bidir
iterator is useful.  Any time you need to scan a cursor back and
forth.  It basically maps directly onto the operation a doubly-linked
list is good at.  But could be used in traversing any tree-like data
structure too, I think.

 Similarly, using list terminology led you to "pop".  But pop on a
 range does not actually remove any content.  Pop just moves the goal
 post on one end.

 Correct. Then how would you name'em?

I made one proposal on digitalmars.D and I'm still waiting for comments.

 And then there's the various union/diff stuff, which everyone seems to
 find confusing.  I think much of that confusion and mental overhead
 just goes away if you think of a range as a good old iterator plus a
 stopping condition.

 I like before and after. Besides, the challenge is that you come up with
 something that's not confusing.

Yeh, before and after aren't too bad.

 Names for the before and after range operations are still in the air...

 Are you referring to the "range" name itself?

 That could be part of the reason for this tendency to try to assign
 list-like names to the parts.  If it were called a "bounded iterator"
 I think that would better describe the perspective I'm pushing, and
 naturally lead to choices like "atEnd" instead of "isEmpty".

 Words are powerful. Phrases are less powerful. I'll never ever settle on
 anything longer than ONE word for the concept. Ranges came to mind because
 boost uses them with a similar meaning.

Yeh, I don't really have a problem with calling them ranges, as long
as people keep in mind they're really bounded iterators.  :-)

--bb

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 The other problem with empty is that it doesn't generalize to what I
 happen to think a bidirectional range should be, one with .next, .prev,
 .hasNext and .hasPrev.

hasNext and hasPrev are not orthogonal and add unnecessary
complication. Is there a range that has next but not prev, or vice versa?
No, Sir. There is a "there's still meat on the plate" condition and
that's all that's needed.

 Your bidir iterator in C++ parlance is a forward iterator and a
 reverse iterator operating on the same sequence.  I can't really think
 of any algorithms other than the one you showed that use such a pair.

 On the other hand my bidir is useful in all the places a C++ bidir
 iterator is useful.  Any time you need to scan a cursor back and
 forth.  It basically maps directly onto the operation a doubly-linked
 list is good at.  But could be used in traversing any tree-like data
 structure too, I think.

--it is easily done with range primitives if you've saved the initial 
position of it.

 Similarly, using list terminology led you to "pop".  But pop on a
 range does not actually remove any content.  Pop just moves the goal
 post on one end.

 Correct. Then how would you name'em?


r.atEnd
r.value
r.next
r.moveTo(s)
r.moveToEndOf(s)
r.last
r.pop
r.moveEndToEndOf(s) / moveEndTo(s)

I see in another post:

r.atStart

which I think is a design faux pas. Aside from that, things seem 
workable. But honestly I don't see how they bring a world of difference, 
nor had I a slap on my forehead moment when seeing the primitive names 
(as I did with before and after).


Andrei

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

It's good. I proved that constructively for std.algorithm, which of 
course doesn't stand. But I've also proved it theoretically informally 
to myself. Please imagine an algorithm that bidir iterators do and bidir 
ranges don't.

  Your bidir range may be useful (though I'm not really convinced that
 very many algorithms need what it provides) --  but I think one also
 needs an iterator that's good at what C++'s bidir iterators are good
 at, i.e. moving the active cursor backwards or forwards.  I would call
 your construct more of a "double-headed" range than a bidirectional
 one.

Oh, one more thing. If you study any algorithm that uses bidirectional
iterators (such as reverse or Stepanov's partition), you'll notice that
ALWAYS WITHOUT EXCEPTION there are two iterators involved. One moves up,
the other moves down. This is absolutely essential because it shows that
a bidirectional range models all a bidirectional iterator could ever do.
If you can move some bidirectional iterator down, then you definitely
know its former boundary, so you can model that move with a bidirectional
range.

This is fundamental. Ranges NEVER grow. They ALWAYS shrink. Why? Simple: 
because a range has no idea what's outside of itself. It starts life 
with information of its limits from the container, and knows nothing 
about what's outside those limits. Consequently it ALWAYS WITHOUT 
EXCEPTION shrinks.
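
The reverse case mentioned above can be sketched with a D slice standing in for a bidirectional range. reverseInPlace is a hypothetical name, and using a slice instead of the proposed range primitives is an assumption made to keep the example short.

```d
// Hypothetical sketch: a slice plays the role of a bidirectional range
// that can only shrink, from either end.
void reverseInPlace(T)(T[] r)
{
    while (r.length > 1)
    {
        // Swap the elements at the two ends of the current view...
        auto tmp = r[0];
        r[0] = r[$ - 1];
        r[$ - 1] = tmp;
        // ...then shrink the view from both ends; it never grows.
        r = r[1 .. $ - 1];
    }
}

unittest
{
    int[] odd = [1, 2, 3, 4, 5];
    reverseInPlace(odd);
    assert(odd == [5, 4, 3, 2, 1]);

    int[] even = [1, 2, 3, 4];
    reverseInPlace(even);
    assert(even == [4, 3, 2, 1]);
}
```

The two iterators of the classic formulation, one moving up and one moving down, are simply the two ends of the one shrinking range.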


Andrei

Sep 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

 It's good. I proved that constructively for std.algorithm, which of course 
 doesn't stand. But I've also proved it theoretically informally to myself. 
 Please imagine an algorithm that bidir iterators do and bidir ranges 
 don't.

Any iterative algorithm where the search might go up or down might be a 
candidate.  Although I think you have a hard time showing one that needs 
strictly bidirectional iterators and not random access iterators.  Perhaps a 
stream represented as a linked list?  Imagine a video stream coming in, 
where the player buffers 10 seconds of data for decoding, and keeps 10 
seconds of data buffered behind the current spot.  If the user pauses the 
video, then wants to play backwards for 5 seconds, what kind of structure 
would you use to represent the 'current point in time'?  A bidir range 
doesn't cut it, because it can only move one direction at a time.  You would 
need 2 bidir ranges, but since you can't 'grow' the ranges, you can't add 
stuff as it is consumed from the forward range to the backwards range, or am 
I wrong about that?  So how do you continually update your backwards 
iterator?  I suppose you could simply 'generate' the backwards iterator when 
needed by diff'ing with the all range, but it seems unnecessarily 
cumbersome.  In fact, you'd need to regenerate both ranges as data is 
removed from the front and added to the back (because the ends are 
continually changing).  Perhaps a meta-range which has 2 bidir ranges in it 
can be provided.  It would be simple enough to implement using existing 
ranges, but might have unnecessary performance issues.

My belief is that ranges should be the form of input to algorithms, but 
iterators should be provided for using containers as general data 
structures.  Similar to how strings are represented by arrays/slices, but 
iterators (pointers) exist if you need them.

I'll probably move forward with this model in dcollections, I really like 
the range idea, and in general the view on how ranges are akin to slices. 
But I also like having access to iterators for other functions.

-Steve

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

 It's good. I proved that constructively for std.algorithm, which of course 
 doesn't stand. But I've also proved it theoretically informally to myself. 
 Please imagine an algorithm that bidir iterators do and bidir ranges 
 don't.

 
 Any iterative algorithm where the search might go up or down might be a 
 candidate.  Although I think you have a hard time showing one that needs 
 strictly bidirectional iterators and not random access iterators.  Perhaps a 
 stream represented as a linked list?  Imagine a video stream coming in, 
 where the player buffers 10 seconds of data for decoding, and keeps 10 
 seconds of data buffered behind the current spot.  If the user pauses the 
 video, then wants to play backwards for 5 seconds, what kind of structure 
 would you use to represent the 'current point in time'?  A bidir range 
 doesn't cut it, because it can only move one direction at a time.

Of course it does. You just remember the leftmost point in time you need 
to remember. Then you use range primitives to get to where you want. 
Maybe a better abstraction for all that is a sliding window though.

 You would 
 need 2 bidir ranges, but since you can't 'grow' the ranges, you can't add 
 stuff as it is consumed from the forward range to the backwards range, or am 
 I wrong about that?  So how do you continually update your backwards 
 iterator?  I suppose you could simply 'generate' the backwards iterator when 
 needed by diff'ing with the all range, but it seems unnecessarily 
 cumbersome.  In fact, you'd need to regenerate both ranges as data is 
 removed from the front and added to the back (because the ends are 
 continually changing).  Perhaps a meta-range which has 2 bidir ranges in it 
 can be provided.  It would be simple enough to implement using existing 
 ranges, but might have unnecessary performance issues.

You don't need a meta range, though it's a good idea to have it as a 
convenience structure. All you need is to store the two ranges and do range 
operations on them.

Notice that "a range can't grow" is different from "a range can't be 
assigned from a larger range". In particular, a range operation can 
return a range larger than both input ranges. But not larger than their 
union :o).
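
A minimal sketch of such an operation, assuming built-in array slices stand in for ranges. The helper name `span` is hypothetical and not part of any proposal in this thread; it relies on raw pointer arithmetic, so it only makes sense when both arguments are slices of the same array.

```d
// Hypothetical helper: given two slices of the SAME array, where `a` starts
// no later than `b` ends, return the slice running from the start of `a`
// to the end of `b`. The result can be larger than either input, but it
// never reaches outside the array both inputs were cut from.
T[] span(T)(T[] a, T[] b)
{
    auto end = b.ptr + b.length;
    assert(a.ptr <= end, "a must not start after b ends");
    return a.ptr[0 .. cast(size_t)(end - a.ptr)];
}

unittest
{
    int[] all = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    auto r = span(all[2 .. 4], all[6 .. 8]);
    assert(r == [2, 3, 4, 5, 6, 7]); // larger than both inputs
}
```

Neither input slice grows; the caller simply assigns the larger result to whichever variable it wants to update.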

 My belief is that ranges should be the form of input to algorithms, but 
 iterators should be provided for using containers as general data 
 structures.  Similar to how strings are represented by arrays/slices, but 
 iterators (pointers) exist if you need them.

If we agree it's better without iterators if not needed, we'd need a 
strong case to add them. Right now I have a strong case against them.

 I'll probably move forward with this model in dcollections, I really like 
 the range idea, and in general the view on how ranges are akin to slices. 
 But I also like having access to iterators for other functions.

Which functions?


Andrei

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 2:44 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

 It's good. I proved that constructively for std.algorithm, which of
 course doesn't stand. But I've also proved it theoretically informally to
 myself. Please imagine an algorithm that bidir iterators do and bidir ranges
 don't.

 Any iterative algorithm where the search might go up or down might be a
 candidate.  Although I think you have a hard time showing one that needs
 strictly bidirectional iterators and not random access iterators.  Perhaps a
 stream represented as a linked list?  Imagine a video stream coming in,
 where the player buffers 10 seconds of data for decoding, and keeps 10
 seconds of data buffered behind the current spot.  If the user pauses the
 video, then wants to play backwards for 5 seconds, what kind of structure
 would you use to represent the 'current point in time'?  A bidir range
 doesn't cut it, because it can only move one direction at a time.

 Of course it does. You just remember the leftmost point in time you need to
 remember. Then you use range primitives to get to where you want. Maybe a
 better abstraction for all that is a sliding window though.

Cognitive load...
What if I want to write a nice standalone function that takes a
pointer to where we are and manipulates it?  I have to pass that
function two iterators I suppose?  One is (begin,current) the other
(current,end), and as I iterate I have to move both the second of the
first and the first of second?  All just to do something that should
be trivial with a linked list.

I agree that your pinch range is needed, but I also see a need for
something that maps more directly onto the features of a doubly linked
list.

--bb

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Thu, Sep 11, 2008 at 2:44 AM, Andrei Alexandrescu
 Cognitive load...
 What if I want to write a nice standalone function that takes a
 pointer to where we are and manipulates it?  I have to pass that
 function two iterators I suppose?

A function only needing one iterator is a chimera. It can't move it in any 
direction. To such a function you pass a pointer or reference to the 
object you want to manipulate directly. What's there to not like about it.

 One is (begin,current) the other
 (current,end), and as I iterate I have to move both the second of the
 first and the first of second?  All just to do something that should
 be trivial with a linked list.
 
 I agree that your pinch range is needed, but I also see a need for
 something that maps more directly onto the features of a doubly linked
 list.

I think you get a lot more insight by actually sitting down and 
rewriting a part of std.algorithm, and/or write some more meaningful 
algorithms with your abstraction of choice. When I started doing so I 
had no idea of what range primitives I need. And just like you now, I 
kept on hypothesizing in the dark on whether I need this and whether I 
need that. When you hypothesize in the dark the number of primitive 
things you need really grows unbounded, because there's always some 
unrealized imaginary need you want to satisfy. To carry the discussion 
on equal footing you need to do some of that work. Otherwise you will 
keep on coming up with hypothetical situations of unverifiable likelihood, 
and I will have little meaningful retort to put forth.

Speaking of which, a great merit of Stepanov is that he showed what a 
great host of algorithms can be implemented with a precise and narrow 
interface. We all knew how to rotate elements in an array. He showed how 
to rotate elements in a singly-linked list.


Andrei

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 3:33 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Thu, Sep 11, 2008 at 2:44 AM, Andrei Alexandrescu
 Cognitive load...
 What if I want to write a nice standalone function that takes a
 pointer to where we are and manipulates it?  I have to pass that
 function two iterators I suppose?

 A function only needing one iterator is a chimera. It can't move it in any
 direction. To such a function you pass a pointer or reference to the object
 you want to manipulate directly. What's there to not like about it.

Oops.  I meant two ranges not two iterators there.  There are no
iterators in this world.  What I was after in the above is a function
that somehow gets a) where we are, b) how far back we can go c) how
far forward we can go.    With ranges that seems cumbersome.  With
iterators it's exactly those 3 iterators.

 I think you get a lot more insight by actually sitting down and rewriting a
 part of std.algorithm, and/or write some more meaningful algorithms with
 your abstraction of choice. When I started doing so I had no idea of what
 range primitives I need. And just like you now, I kept on hypothesizing in
 the dark on whether I need this and whether I need that. When you
 hypothesize in the dark the number of primitive things you need really grows
 unbounded, because there's always some unrealized imaginary need you want to
 satisfy. To carry the discussion on equal footing you need to do some of
 that work. Otherwise you will keep on coming up with hypothetical situations of
 unverifiable likelihood, and I will have little meaningful retort to put
 forth.

Ok.  There's the ultimatum.  I'll shut up and go to bed now.  :-)

--bb

Sep 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

 It's good. I proved that constructively for std.algorithm, which of 
 course doesn't stand. But I've also proved it theoretically informally 
 to myself. Please imagine an algorithm that bidir iterators do and bidir 
 ranges don't.

 Any iterative algorithm where the search might go up or down might be a 
 candidate.  Although I think you have a hard time showing one that needs 
 strictly bidirectional iterators and not random access iterators. 
 Perhaps a stream represented as a linked list?  Imagine a video stream 
 coming in, where the player buffers 10 seconds of data for decoding, and 
 keeps 10 seconds of data buffered behind the current spot.  If the user 
 pauses the video, then wants to play backwards for 5 seconds, what kind 
 of structure would you use to represent the 'current point in time'?  A 
 bidir range doesn't cut it, because it can only move one direction at a 
 time.

 Of course it does. You just remember the leftmost point in time you need 
 to remember. Then you use range primitives to get to where you want. Maybe 
 a better abstraction for all that is a sliding window though.

Not sure.  I'd have to see how messy the 'use range primitives' looks :)

 You would need 2 bidir ranges, but since you can't 'grow' the ranges, you 
 can't add stuff as it is consumed from the forward range to the backwards 
 range, or am I wrong about that?  So how do you continually update your 
 backwards iterator?  I suppose you could simply 'generate' the backwards 
 iterator when needed by diff'ing with the all range, but it seems 
 unnecessarily cumbersome.  In fact, you'd need to regenerate both ranges 
 as data is removed from the front and added to the back (because the ends 
 are continually changing).  Perhaps a meta-range which has 2 bidir ranges 
 in it can be provided.  It would be simple enough to implement using 
 existing ranges, but might have unnecessary performance issues.

 You don't need a meta range, though it's a good idea to have it as a 
 convenience structure. All you need is to store the two ranges and do range 
 operations on them.

Perhaps not, I haven't used the ranges as you have implemented them, nor 
have I used them from boost.  I agree with the general idea that ranges are 
safer and simpler to use when a range is needed.  It makes perfect sense to 
pass a single range type rather than 2 iterators, and this is the majority 
of usages for iterators anyways.  I 100% agree that ranges are the way to go 
instead of passing begin() and end() all the time to algorithm templates.

 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can return 
 a range larger than both input ranges. But not larger than their union 
 :o).

Yes, so every time you add an element you have to update your forward range 
from the 'all' range so it includes the new element at the end.  Every time 
you remove an element, you have to update your reverse range from the 'all' 
range so it excludes the element at the beginning.  Failure to do this 
results in invalid ranges, which seems to me like a lot more work than 
simply not doing anything  (in the case of an iterator).  The pitfalls of 
using ranges for dynamically changing containers might outweigh the 
advantages that they provide in certain cases.

 My belief is that ranges should be the form of input to algorithms, but 
 iterators should be provided for using containers as general data 
 structures.  Similar to how strings are represented by arrays/slices, but 
 iterators (pointers) exist if you need them.

 If we agree it's better without iterators if not needed, we'd need a 
 strong case to add them. Right now I have a strong case against them.

I don't need to worry about whether you have them or not, I can always 
implement them on my own ;)  Really, range/iterator support doesn't require 
direct support from the compiler (except for builtin arrays), and any 
improvements made to the compiler to support ranges (such as reference 
returns, etc) can be applied to iterators as well.

I think ranges are an excellent representation when a range of elements is 
needed.  I think a cursor or iterator is an excellent representation when an 
individual position is needed.

 I'll probably move forward with this model in dcollections, I really like 
 the range idea, and in general the view on how ranges are akin to slices. 
 But I also like having access to iterators for other functions.

 Which functions?

Functions which take or return a single position.  Such as 'erase the 
element at this position' or 'find the position of element x'.

-Steve

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can return 
 a range larger than both input ranges. But not larger than their union 
 :o).

 
 Yes, so every time you add an element you have to update your forward range 
 from the 'all' range so it includes the new element at the end.  Every time 
 you remove an element, you have to update your reverse range from the 'all' 
 range so it excludes the element at the beginning.  Failure to do this 
 results in invalid ranges, which seems to me like a lot more work than 
 simply not doing anything  (in the case of an iterator).  The pitfalls of 
 using ranges for dynamically changing containers might outweigh the 
 advantages that they provide in certain cases.

No, this is incorrect. I don't "have to" at all. I could define the
behavior of range as you mention, or I could render them undefined.
Iterators invalidate anyway at the drop of a hat, so they're none the
wiser. You can't transform a lack of an advantage into a disadvantage.

"Look at this pineapple. It's fresher than the other, and bigger too."

"No, it's about as big. That pineapple sucks."

 My belief is that ranges should be the form of input to algorithms, but 
 iterators should be provided for using containers as general data 
 structures.  Similar to how strings are represented by arrays/slices, but 
 iterators (pointers) exist if you need them.

 If we agree it's better without iterators if not needed, we'd need a 
 strong case to add them. Right now I have a strong case against them.

 
 I don't need to worry about whether you have them or not, I can always 
 implement them on my own ;)  Really, range/iterator support doesn't require 
 direct support from the compiler (except for builtin arrays), and any 
 improvements made to the compiler to support ranges (such as reference 
 returns, etc) can be applied to iterators as well.
 
 I think ranges are an excellent representation when a range of elements is 
 needed.  I think a cursor or iterator is an excellent representation when an 
 individual position is needed.
 
 I'll probably move forward with this model in dcollections, I really like 
 the range idea, and in general the view on how ranges are akin to slices. 
 But I also like having access to iterators for other functions.

 Which functions?

 
 Functions which take or return a single position.  Such as 'erase the 
 element at this position' or 'find the position of element x'.

I agree. In fact I agreed in my original document, which I quote:
``Coding with ranges also has disadvantages. Some algorithms work
naturally with individual iterators in the "middle" of a range. A
range-based implementation would have to maintain a redundant range
spanning e.g. from the beginning of the container to that middle.''

However, I could meaningfully rewrite std.algorithm to work on ranges
alone. The disadvantage does exist but is minor. For example, find does
not return an iterator. It simply shrinks its input range until the
element is found, or until it is empty. That way you can nicely use the
result of find iteratively.

Range find(alias pred = "a == b", Range, E)(Range haystack, E needle)
{
     alias binaryFun!(pred) test;
     for (; !haystack.isEmpty; haystack.next)
     {
         if (test(haystack.first, needle)) break;
     }
     return haystack;
}

This is quite a few bytes smaller than the previous version, which is
now to be found in std.algorithm:

Iterator!(Range) find(alias pred = "a == b", Range, E)(Range haystack, E
needle)
{
     alias binaryFun!(pred) test;
     for (auto i = begin(haystack); i != end(haystack); ++i)
     {
         if (test(*i, needle)) return i;
     }
     return end(haystack);
}

It has two less variables, and VERY importantly, one less type to deal
with. Arguments aired against primitive ranges systematically omit this
important simplification they bring. When you don't weigh in the
advantages, of course all there is to be seen are the disadvantages.

Better yet, when find does return, client code's in better shape because
it doesn't need to compare the result against end(myrange). It can just
test whether it's empty and be done with.
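
A minimal sketch of that iterative use follows. It assumes the primitive names that present-day Phobos uses (`empty`, `popFront`) rather than the `isEmpty` and `next` spelling in the code above, and the helper `countOccurrences` is hypothetical, made up only for illustration.

```d
import std.algorithm.searching : find;
import std.range.primitives : empty, popFront;

// Hypothetical helper: counts matches by calling find repeatedly on the
// shrinking range it returns. The result is never compared against an
// end() sentinel; the loop only asks whether the range is empty.
size_t countOccurrences(int[] haystack, int needle)
{
    size_t n = 0;
    for (auto r = find(haystack, needle); !r.empty; r = find(r, needle))
    {
        ++n;
        r.popFront(); // step past the match before searching again
    }
    return n;
}

unittest
{
    assert(countOccurrences([1, 2, 1, 3, 1], 1) == 3);
    assert(countOccurrences([4, 5], 1) == 0);
}
```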

So a newcomer to D2 would have to have an understanding of containers
and ranges. Containers own data. They offer various traversals to crawl
them in the form of ranges. Ranges are generalized slices.

If iterators are thrown into the mix, things get more complex because
iterators are a lower-level primitive, a generalized pointer. So the
newcomer would have to ALSO understand iterators and deal with functions
that require or return either. They'd have to learn how to pair
iterators from ranges and how to extract iterators from ranges (more
primitives). They'd also have to understand when it's better to hold on
to a range (most of the time) versus a naked iterator (seldom and for a
dubious convenience).

I /understand/ there are advantages to iterators. Just don't forget the
cost when discussing them.

I am also sure that if I sat down long enough contemplating my navel I
could come up with more examples of iterators=good/ranges=bad. I am also
sure that if I continued doing so I could also figure cases where a
doubly linked-list iterator that "knows" whether it's atBegin or atEnd
could come in handy. In fact how about this imaginary discussion
between Stepanov and his imaginary colleague Tikhonov:

Stepanov: "Here, I have these cool iterators. I can express a great deal
of stuff with them. It's amazing."

Tikhonov: "Ok, I have a doubly-linked list. An iterator is a node in the
list, right?"

S: "Da. Those are bidirectional iterators because they can move in two
directions in the list."

T: "Ok, my first element has prev == NULL and last element has next ==
NULL. Does your iterator know when it's at the begin and at the end of
the list?"

S: "No. You see, you'd have to compare two iterators to figure that out.
Just pass around an iterator to the beginning and end of the list
fragment you're interested in, as you find fit."

T: "That sucks! I want an iterator to move up and down and tell me when
it's at the beginning and at the end, without another stinkin' iterator."

S: "I implemented a great deal of algorithms without needing that. What
algorithms of yours can't work with a comparison instead of atBegin and
atEnd?"

T: "Well, I need to think of it. Maybe some video buffer or something."

S: "That works. You just save the iterator at the beginning of the
sliding window. Then you compare against it."

T: "But I don't like that. I want you to define atBegin and atEnd so I
don't need to carry an extra laggard!"

S: "Then what algorithms would fundamentally rest on that particular
feature?"

T: "No idea. Here, let me look at my navel."

S: "While you do that, let me ask you this. Whatever those algorithms
are, they can't work on a circular list, right?"

T: "I guess not. atBegin is never true for a circular list, unless,
damn, you keep another iterator around to compare with."

S: "So those algorithms of yours would work on a doubly-linked list but
not on a circular list. Are you sure you care for that distinction and
that loss in generality?"

T: "Gee, look at the time. It's Friday evening! Let's go have a beer."


Andrei

Sep 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can 
 return a range larger than both input ranges. But not larger than their 
 union :o).

 Yes, so every time you add an element you have to update your forward 
 range from the 'all' range so it includes the new element at the end. 
 Every time you remove an element, you have to update your reverse range 
 from the 'all' range so it excludes the element at the beginning. 
 Failure to do this results in invalid ranges, which seems to me like a 
 lot more work than simply not doing anything  (in the case of an 
 iterator).  The pitfalls of using ranges for dynamically changing 
 containers might outweigh the advantages that they provide in certain 
 cases.

 No, this is incorrect. I don't "have to" at all. I could define the
 behavior of range as you mention, or I could render them undefined.
 Iterators invalidate anyway at the drop of a hat, so they're none the
 wiser. You can't transform a lack of an advantage into a disadvantage.

A range or iterator that becomes undefined when adding an element to a 
linked list or removing an element from a linked list (provided you don't 
remove the element in question) makes it useless for this type of purpose. 
What I want is a cursor that saves the position of an element, not the end 
and beginning.

Here is what I'm assuming a range consists of, and granted this is an 
assumption since I didn't look at any of your implementation, and a list 
object which uses ranges doesn't even exist AFAIK.  Assume that integers 
below are individual elements

all: 0 1 2 3 4 5 6 7 8 9 E
reverse range: 0 1 2 3 4
forward range: 5 6 7 8 9 E

Now I remove an element from the front:

all: 1 2 3 4 5 6 7 8 9 E
reverse range: ? 1 2 3 4
forward range: 5 6 7 8 9 E

I've now lost my reverse iterator because it's no longer valid, but I can 
reconstruct it by diffing the forward iterator and list.all.

If I add to the end I get a similar situation: I can reconstruct my forward 
iterator by diffing list.all and the reverse iterator.

Yes, it can be done, but it seems like more work than it's worth for this 
case.  The problem is, not only do I have to pay attention to what the end 
and beginning of the list are (as I would with an iterator), but I also have 
to pay attention to the same pieces in the ranges.  So ranges (in this 
implementation) have given me more work to do, and they're still not safe 
because I could mistakenly use an invalid range.
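
The bookkeeping in the diagrams above can be sketched roughly as follows. This is only an illustration under loud assumptions: a built-in array stands in for the linked list, and the `Pinch` struct with its member names is hypothetical, not taken from any existing library or from the ranges proposal.

```d
// Hypothetical pair of ranges around a current position. `behind` is the
// reverse range of the diagrams and `ahead` is the forward range; both are
// recomputed from `all` and `pos`, which is the "diffing" step.
struct Pinch(T)
{
    T[] all;     // the whole buffer
    size_t pos;  // index of the current element

    T[] behind() { return all[0 .. pos]; } // can be replayed backwards
    T[] ahead()  { return all[pos .. $]; } // still to be played

    // Removing the oldest element means adjusting the window AND the
    // position, otherwise `behind` would refer to a stale element.
    void dropOldest()
    {
        assert(pos > 0, "cannot drop the current element");
        all = all[1 .. $];
        --pos;
    }
}

unittest
{
    auto p = Pinch!int([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 5);
    assert(p.behind() == [0, 1, 2, 3, 4]);
    p.dropOldest();
    assert(p.behind() == [1, 2, 3, 4]);
    assert(p.ahead() == [5, 6, 7, 8, 9]);
}
```

With an array the update is one index adjustment; whether a node-based list can do it as cheaply is exactly the open question here.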

 "Look at this pineapple. It's fresher than the other, and bigger too."

 "No, it's about as big. That pineapple sucks."

???

 My belief is that ranges should be the form of input to algorithms, but 
 iterators should be provided for using containers as general data 
 structures.  Similar to how strings are represented by arrays/slices, 
 but iterators (pointers) exist if you need them.

 If we agree it's better without iterators if not needed, we'd need a 
 strong case to add them. Right now I have a strong case against them.

 I don't need to worry about whether you have them or not, I can always 
 implement them on my own ;)  Really, range/iterator support doesn't 
 require direct support from the compiler (except for builtin arrays), and 
 any improvements made to the compiler to support ranges (such as 
 reference returns, etc) can be applied to iterators as well.

 I think ranges are an excellent representation when a range of elements 
 is needed.  I think a cursor or iterator is an excellent representation 
 when an individual position is needed.

 I'll probably move forward with this model in dcollections, I really 
 like the range idea, and in general the view on how ranges are akin to 
 slices. But I also like having access to iterators for other functions.

 Which functions?

 Functions which take or return a single position.  Such as 'erase the 
 element at this position' or 'find the position of element x'.

 I agree. In fact I agreed in my original document, which I quote:
 ``Coding with ranges also has disadvantages. Some algorithms work
 naturally with individual iterators in the "middle" of a range. A
 range-based implementation would have to maintain a redundant range
 spanning e.g. from the beginning of the container to that middle.''

 However, I could meaningfully rewrite std.algorithm to work on ranges
 alone. The disadvantage does exist but is minor. For example, find does
 not return an iterator. It simply shrinks its input range until the
 element is found, or until it is empty. That way you can nicely use the
 result of find iteratively.

I totally agree with you that ranges are the way to go for std.algorithm.  I 
am not debating that.

But you have no example of how iterators and ranges compare for using 
non-array containers in situations BESIDES running std.algorithm.  I'm 
showing you an example, which happens to model after code I actually wrote 
and use, where iterators seem to be more suited for the task.

 Range find(alias pred = "a == b", Range, E)(Range haystack, E needle)
 {
     alias binaryFun!(pred) test;
     for (; !haystack.isEmpty; haystack.next)
     {
         if (test(haystack.first, needle)) break;
     }
     return haystack;
 }

 This is quite a few bytes smaller than the previous version, which is
 now to be found in std.algorithm:

 Iterator!(Range) find(alias pred = "a == b", Range, E)(Range haystack, E
 needle)
 {
     alias binaryFun!(pred) test;
     for (auto i = begin(haystack); i != end(haystack); ++i)
     {
         if (test(*i, needle)) return i;
     }
     return end(haystack);
 }

 It has two less variables, and VERY importantly, one less type to deal
 with. Arguments aired against primitive ranges systematically omit this
 important simplification they bring. When you don't weigh in the
 advantages, of course all there is to be seen are the disadvantages.

 Better yet, when find does return, client code's in better shape because
 it doesn't need to compare the result against end(myrange). It can just
 test whether it's empty and be done with.

Unless myrange has changed since you called find.  In which case you have to 
run find again to get the range?

 So a newcomer to D2 would have to have an understanding of containers
 and ranges. Containers own data. They offer various traversals to crawl
 them in the form of ranges. Ranges are generalized slices.

 If iterators are thrown into the mix, things get more complex because
 iterators are a lower-level primitive, a generalized pointer. So the
 newcomer would have to ALSO understand iterators and deal with functions
 that require or return either. They'd have to learn how to pair
 iterators from ranges and how to extract iterators from ranges (more
 primitives). They'd also have to understand when it's better to hold on
 to a range (most of the time) versus a naked iterator (seldom and for a
 dubious convenience).

 I /understand/ there are advantages to iterators. Just don't forget the
 cost when discussing them.

I don't forget the cost.  I absolutely *100%* agree that ranges are a much 
better representation for std.algorithm.  i.e. when a RANGE OF VALUES is 
required.  When you want references to SINGLE ELEMENTS that persist across 
container changes, I think the best implementation is a cursor/iterator.  I 
think they can both exist.  I think there is value to having a pointer to a 
single element without storing the boundaries with that pointer.


This is just like the const debate that I continue to have with you and 
Walter.  You want const for different reasons than for what I want const.  I 
want const for contracts, and you want it for pure functions.  You seem to 
dismiss anything that isn't in your realm of requirements as 'dubious' and 
'seldom used'.  Other people have requirements that are different than 
yours, and are just as valid.

 I am also sure that if I sat down long enough contemplating my navel I
 could come up with more examples of iterators=good/ranges=bad.
 <snip>

Now you're just being rude :)  Please note that I'm not attacking you 
personally.  All I'm pointing out is that your solution solves certain 
problems VERY well, but leaves other problems not solved.  I think allowing 
iterators/cursors would solve all the problems.  I might be proven wrong, 
but certainly I don't think you've done that so far.  I'd love to be proven 
wrong, since I agree that iterators are generally unsafe.

-Steve

Sep 10 2008

Fawzi Mohamed <fmohamed mac.com> writes:

I am sorry I hadn't the time to fully follow the discussion, but I took 
some time to actually define how I think a basic iterator should be, in 
this I think I am quite close to Bill and Steven from what I could see.

Again I am not against ranges, ranges are nice, but iterators are more 
general, and in my opinion they should be what foreach targets.
Then ranges can basically trivially be an iterator that has more 
structure (compareLevel= FullOrdering) and more

basic idea:
an iterator has a position in a sequence+ possibility to move into it


Basic Iterator

// return element and go to next
// (reasons: most common use, only one function call (good if not
// inlined), also ok for iterators on data that is not stored)
// throw an exception if you iterate past end
T next();
// transfer this iterator to it2 (un-copyable iterators)
void transferTo(ref R it2);
void stop(); // stop the iteration (might release resources)
size_t nElNext(); // number of elements, constant time, 0 if empty
// comparison of position, has to be in constant time
ComparePos comparePos(R it2);
// compare level (for compile time choices)
static const CompareLevel compareLevel;
static const SizeLevel sizeLevel; // size level (for compile time choices)

Copyable Generator: Iterator
T value; // return the actual value
opAssign(R); // copies the iterator

A range is obviously also a Basic iterator, but has more structure

* Bidirectional Iterator
size_t nElPrev; // number of elements, constant time, 0 if empty
T prev; // goes to previous element

// constants
enum ComparePos:int {
    // might be bigger, smaller or incompatible
    // (Same only if compareLevel==CompareLevel.None)
    Uncomparable,
    Incompatible, // ranges of two different sequences
    Same, // at the same position
    Growing, // in growing order
    Descending // in decreasing order
}
enum CompareLevel:int {
    None, // no comparison
    Equal, // can decide if they are at the same position in constant time
    FullOrdering // can compare all iterators in constant time
}
enum SizeLevel:int{
    Bounded, // finite and known size
    Finite, // finite but possibly unknown size
    MaybeFinite, // maybe finite, maybe infinite
    Infinite // infinite
}
const INFINITE_SIZE=size_t.max;
const MAYBE_INFINITE=size_t.max-1;
const UNKNOWN_FINITE=size_t.max-2;

Sep 10 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Steven Schveighoffer <schveiguy yahoo.com> wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can 
 return a range larger than both input ranges. But not larger than their 
 union :o).

 Yes, so every time you add an element you have to update your forward 
 range from the 'all' range so it includes the new element at the end. 
 Every time you remove an element, you have to update your reverse range 
 from the 'all' range so it excludes the element at the beginning. 
 Failure to do this results in invalid ranges, which seems to me like a 
 lot more work than simply not doing anything  (in the case of an 
 iterator).  The pitfalls of using ranges for dynamically changing 
 containers might outweigh the advantages that they provide in certain 
 cases.

 No, this is incorrect. I don't "have to" at all. I could define the
 behavior of range as you mention, or I could render them undefined.
 Iterators invalidate anyway at the drop of a hat, so they're none the
 wiser. You can't transform a lack of an advantage into a disadvantage.

 
 A range or iterator that becomes undefined when adding an element to a 
 linked list or removing an element from a linked list (provided you don't 
 remove the element in question) makes it useless for this type of purpose. 
 What I want is a cursor that saves the position of an element, not the end 
 and beginning.
 
 Here is what I'm assuming a range consists of, and granted this is an 
 assumption since I didn't look at any of your implementation, and a list 
 object which uses ranges doesn't even exist AFAIK.  Assume that integers 
 below are individual elements
 
 all: 0 1 2 3 4 5 6 7 8 9 E
 reverse range: 0 1 2 3 4
 forward range: 5 6 7 8 9 E
 
 Now I remove an element from the front:
 
 all: 1 2 3 4 5 6 7 8 9 E
 reverse range: ? 1 2 3 4
 forward range: 5 6 7 8 9 E
 
 I've now lost my reverse iterator because it's no longer valid, but I can 
 reconstruct it by diffing the forward iterator and list.all.
 
 If I add to the end I got a similar situation, I can reconstruct my forward 
 iterator by diffing list.all and the reverse iterator.

You don't mention here which iterator usage pattern you are trying to 
model with ranges.  I can think of at least two.

1.  You use a single bidirectional 'center' iterator, center == 5.  As 
one would naturally do with iterators.  Note then that whenever you use 
your center for, say, backward iteration, you reconstruct the actual 
range by calling list.begin.  You do it on each iteration.  No wonder it 
stays valid even if you remove the first element in the meantime: you're 
constructing your range from scratch anyway.  If you want to model this 
pattern with ranges---no problem, keep an empty 'center' range, center 
== (5,5), and reconstruct backward iteration range,

reverse = all.before(center);

whenever you need to iterate, then

center = reverse.end;

This 'center' range, being slightly less efficient, stays valid and 
becomes invalid in exactly the same conditions as your classical 
iterator.

2.  You use 3 iterators, one for the list start, one for the center, and 
one for the end.  In this case the 'start' iterator becomes invalid 
after removing the first element exactly like 'reverse' range becomes 
invalid in your example, with exactly the same consequences.
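
Pattern 1 can be sketched in D roughly as follows. This is only an illustration under assumptions: Range here is a hypothetical index-pair view over an int[], and before, after, end and isEmpty are stand-ins modelled on the names used in this thread, not a settled API. Removing the first element is simulated by shrinking 'all'.

```d
import std.stdio;

// Hypothetical index-pair range over an int[]; not the API proposed in the thread.
struct Range
{
    int[] data;   // underlying storage
    size_t lo;    // index of the first element
    size_t hi;    // one past the last element

    bool isEmpty() const { return lo == hi; }

    // Everything in this range that lies before the cursor c.
    Range before(Range c) { return Range(data, lo, c.lo); }

    // Everything in this range from the cursor c onward.
    Range after(Range c) { return Range(data, c.hi, hi); }

    // Empty range positioned at the end of this one.
    Range end() { return Range(data, hi, hi); }

    int[] elements() { return data[lo .. hi]; }
}

void main()
{
    auto storage = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    auto all = Range(storage, 0, storage.length);
    auto center = Range(storage, 5, 5);      // empty range used as a cursor

    auto reverse = all.before(center);       // rebuilt on demand
    writeln(reverse.elements());

    // Simulate removing the first element; center is untouched.
    all.lo = 1;
    reverse = all.before(center);            // rebuilt again, still valid
    writeln(reverse.elements());
    writeln(all.after(center).elements());
    center = reverse.end();                  // same position, expressed as a range
}
```

The point of the sketch is that only 'all' and the empty 'center' are stored; 'reverse' is derived each time, so it cannot go stale.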

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Steven Schveighoffer <schveiguy yahoo.com> wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Notice that "a range can't grow" is different from "a range can't be 
 assigned from a larger range". In particular, a range operation can 
 return a range larger than both input ranges. But not larger than their 
 union :o).

 Yes, so every time you add an element you have to update your forward 
 range from the 'all' range so it includes the new element at the end. 
 Every time you remove an element, you have to update your reverse range 
 from the 'all' range so it excludes the element at the beginning. 
 Failure to do this results in invalid ranges, which seems to me like a 
 lot more work than simply not doing anything  (in the case of an 
 iterator).  The pitfalls of using ranges for dynamically changing 
 containers might outweigh the advantages that they provide in certain 
 cases.

 No, this is incorrect. I don't "have to" at all. I could define the
 behavior of range as you mention, or I could render them undefined.
 Iterators invalidate anyway at the drop of a hat, so they're none the
 wiser. You can't transform a lack of an advantage into a disadvantage.

 A range or iterator that becomes undefined when adding an element to a 
 linked list or removing an element from a linked list (provided you don't 
 remove the element in question) makes it useless for this type of purpose. 
 What I want is a cursor that saves the position of an element, not the end 
 and beginning.

 Here is what I'm assuming a range consists of, and granted this is an 
 assumption since I didn't look at any of your implementation, and a list 
 object which uses ranges doesn't even exist AFAIK.  Assume that integers 
 below are individual elements

 all: 0 1 2 3 4 5 6 7 8 9 E
 reverse range: 0 1 2 3 4
 forward range: 5 6 7 8 9 E

 Now I remove an element from the front:

 all: 1 2 3 4 5 6 7 8 9 E
 reverse range: ? 1 2 3 4
 forward range: 5 6 7 8 9 E

 I've now lost my reverse iterator because it's no longer valid, but I can 
 reconstruct it by diffing the forward iterator and list.all.

 If I add to the end I got a similar situation, I can reconstruct my forward 
 iterator by diffing list.all and the reverse iterator.

 
 You don't mention here which iterator usage pattern you are trying to 
 model with ranges.  I can think of at least two.
 
 1.  You use a single bidirectional 'center' iterator, center == 5.  As 
 one would naturally do with iterators.  Note then that whenever you use 
 your center for, say, backward iteration, you reconstruct the actual 
 range by calling list.begin.  You do it on each iteration.  No wonder it 
 stays valid even if you remove the first element in the meantime: you're 
 constructing your range from scratch anyway.  If you want to model this 
 pattern with ranges---no problem, keep an empty 'center' range, center 
 == (5,5), and reconstruct backward iteration range,
 
 reverse = all.before(center);
 
 whenever you need to iterate, then
 
 center = reverse.end;
 
 This 'center' range, being slightly less efficient, stays valid and 
 becomes invalid in exactly the same conditions as your classical 
 iterator.
 
 2.  You use 3 iterators, one for the list start, one for the center, and 
 one for the end.  In this case the 'start' iterator becomes invalid 
 after removing the first element exactly like 'reverse' range becomes 
 invalid in your example, with exactly the same consequences.

I'm acquiring the nagging feeling that Sergey understands ranges better 
than I do. I could understand how to address Steven's point only after 
reading the post above. Thanks, Sergey.

Andrei

Sep 10 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 I'm acquiring the nagging feeling that Sergey understands ranges better 
 than I do. I could understand how to address Steven's point only after 
 reading the post above. Thanks, Sergey.

Thank you, and welcome! ;)

P.S. I really love the looks of "reverse = all.before(center);" It's 
like writing a program in plain English.

Sep 10 2008

Derek Parnell <derek psych.ward> writes:

On Thu, 11 Sep 2008 02:20:32 +0400, Sergey Gromov wrote:

 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 I'm acquiring the nagging feeling that Sergey understands ranges better 
 than I do. I could understand how to address Steven's point only after 
 reading the post above. Thanks, Sergey.

 
 Thank you, and welcome! ;)
 
 P.S. I really love the looks of "reverse = all.before(center);" It's 
 like writing a program in plain English.

Oh boy! We must put an end to that otherwise we might all be out of a job
;-)
-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Sep 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Sergey Gromov" wrote
 You don't mention here which iterator usage pattern you are trying to
 model with ranges.  I can think of at least two.

 1.  You use a single bidirectional 'center' iterator, center == 5.  As
 one would naturally do with iterators.  Note then that whenever you use
 your center for, say, backward iteration, you reconstruct the actual
 range by calling list.begin.  You do it on each iteration.  No wonder it
 stays valid even if you remove the first element in the meantime: you're
 constructing your range from scratch anyway.  If you want to model this
 pattern with ranges---no problem, keep an empty 'center' range, center
 == (5,5), and reconstruct backward iteration range,

 reverse = all.before(center);

 whenever you need to iterate, then

 center = reverse.end;

 This 'center' range, being slightly less efficient, stays valid and
 becomes invalid in exactly the same conditions as your classical
 iterator.

This is exactly the pattern I use.  I agree that your example would solve 
the problem, I hadn't thought of an empty range to be a cursor, that is 
clever!

The only missing piece to your solution is that I must construct the range 
after the center range in order to access the value to see where I need to 
go.

What I see as the biggest downside is the cumbersome and verbose code of 
moving the 'iterator' around, as every time I want to move forward, I 
construct a new range, and every time I want to move backwards I construct a 
new range (and construct a new 'center' afterwards).  So a 'move back one' 
looks like:

auto before = all.before(center);
if(!before.isEmpty)
  center = before.pop.end;

And to move forward it's:
auto after = all.after(center);
if(!after.isEmpty)
  center = after.next.begin;

To get the value there, I have to do:
all.after(center).left // or whatever gets decided as the 'get first value 
of range' member

or if opStar is used:

*all.after(center);

I much prefer:

forward:
if(center != list.end)
    ++center;

reverse:
if(center != list.begin)
   --center;

get value:
*center;

Especially without all the extra overhead

I see both methods as being just as open to mistakes, the first more-so, and 
more difficult to comprehend (at least for me).

-Steve

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Sergey Gromov" wrote
 You don't mention here which iterator usage pattern you are trying to
 model with ranges.  I can think of at least two.

 1.  You use a single bidirectional 'center' iterator, center == 5.  As
 one would naturally do with iterators.  Note then that whenever you use
 your center for, say, backward iteration, you reconstruct the actual
 range by calling list.begin.  You do it on each iteration.  No wonder it
 stays valid even if you remove the first element in the meantime: you're
 constructing your range from scratch anyway.  If you want to model this
 pattern with ranges---no problem, keep an empty 'center' range, center
 == (5,5), and reconstruct backward iteration range,

 reverse = all.before(center);

 whenever you need to iterate, then

 center = reverse.end;

 This 'center' range, being slightly less efficient, stays valid and
 becomes invalid in exactly the same conditions as your classical
 iterator.

 This is exactly the pattern I use.  I agree that your example would solve
 the problem, I hadn't thought of an empty range to be a cursor, that is
 clever!

 The only missing piece to your solution is that I must construct the range
 after the center range in order to access the value to see where I need to
 go.

 What I see as the biggest downside is the cumbersome and verbose code of
 moving the 'iterator' around, as every time I want to move forward, I
 construct a new range, and every time I want to move backwards I construct a
 new range (and construct a new 'center' afterwards).  So a 'move back one'
 looks like:

 auto before = all.before(center);
 if(!before.isEmpty)
  center = before.pop.end;

 And to move forward it's:
 auto after = all.after(center);
 if(!after.isEmpty)
  center = after.next.begin;

 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first value
 of range' member

 or if opStar is used:

 *all.after(center);

 I much prefer:

 forward:
 if(center != list.end)
    ++center;

 reverse:
 if(center != list.begin)
   --center;

 get value:
 *center;

 Especially without all the extra overhead

 I see both methods as being just as open to mistakes, the first more-so, and
 more difficult to comprehend (at least for me).

Well put.  I was trying to come up with a comparison like this last
night.  But at 3am I was just too tired to pull it off.  Great example
of the kind of cognitive overload that comes from this kind of
scenario.

I really believe the point that ranges are good for std.algorithm is
fine.  But when people use iterators in code they are often used like
the above.  This whole shifting back and forth over a linked list was
seeming very familiar to me last night and I recalled this morning
that I had written some code to implement undo which worked in this
very way.  The linked list was the undo stack.  And undo() moved the
current iterator one direction, redo() moved it the other.

So far though we don't seem to be able to come up with a good example
of where ranges are weak other than traversing a list back and forth.
Note that "move back and forth according to some user input" is
clearly not an "algorithm" that would be in std.algorithm.  But it
does come up often enough in applications.  I don't think the fact
that it's not strictly an Algorithm-with-a-capital-A makes it any less
important.

But it is a little fishy that we can't come up with any other example
besides sliding a bead on a wire back and forth.

--bb

Sep 10 2008

Benji Smith <dlanguage benjismith.net> writes:

Bill Baxter wrote:
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.

 ...

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

I dunno about that.

I can think of lots of examples where the "range" metaphor is an awkward 
interloper between the container and the iteration logic:

maps, sets, bags, markov models, graphs, trees (especially in a 
breadth-first traversal)

The word "range" and the idea of the range "moving", "shrinking", or 
being "empty" only matches my concept of "iteration" if I think strictly 
in terms of sequential containers (arrays, slices, lists, etc).

I think the range metaphor is a very cool way of connecting sequential 
containers with algorithms (especially divide-and-conquer algorithms, 
which seem particularly well-suited to the range metaphor).

But if I want to visit each <p> node in a DOM tree, I have a hard time 
visualizing how a "range" fits into that process.

Maybe it's just terminology. I'm not sure yet.

--benji

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 9:41 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.

 ...

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

 I dunno about that.

 I can think of lots of examples where the "range" metaphor is an awkward
 interloper between the container and the iteration logic:

 maps, sets, bags, markov models, graphs, trees (especially in a
 breadth-first traversal)

Iterators for maps, sets, bags, graphs, trees are usually either for
pointing to a found element or for iterating over the whole thing.
With ranges the former just becomes a degenerate range where only one
end is actually important.  The other end would probably be the end()
of the container in the STL sense.
The latter is no problem if you just want to forward iterate over everything.

 The word "range" and the idea of the range "moving", "shrinking", or being
 "empty" only matches my concept of "iteration" if I think strictly in terms
 of sequential containers (arrays, slices, lists, etc).

For the one-way ranges, the range is equivalent to a forward iterator
plus an end().  You can do exactly the same things with it.  The end()
may very happily not actually exist, though, if not needed, or if it
depends on some dynamic condition, like for your HMM example.

 I think the range metaphor is a very cool way of connecting sequential
 containers with algorithms (especially divide-and-conquer algorithms, which
 seem particularly well-suited to the range metaphor).

 But if I want to visit each <p> node in a DOM tree, I have a hard time
 visualizing how a "range" fits into that process.

For that you just use a forward range, which is just forward iterator
plus a stopping criterion, that's all.
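
A rough D sketch of such a forward range over a tree, assuming a hypothetical minimal Node class and the empty/front/popFront primitive names that Phobos uses today (the thread itself was still debating names such as isEmpty and next). The stopping criterion is simply an empty stack of nodes left to visit.

```d
import std.stdio;

// Hypothetical minimal DOM node, for illustration only.
class Node
{
    string tag;
    Node[] children;

    this(string tag, Node[] children = null)
    {
        this.tag = tag;
        this.children = children;
    }
}

// Forward range that walks the tree depth-first and yields only the
// nodes with the requested tag.
struct TagRange
{
    private Node[] stack;   // nodes still to visit, top at the end
    private string tag;

    this(Node root, string tag)
    {
        this.tag = tag;
        stack = [root];
        skip();
    }

    bool empty() const { return stack.length == 0; }
    Node front() { return stack[$ - 1]; }

    void popFront()
    {
        expandTop();
        skip();
    }

    // Replace the top node with its children, pushed in reverse so that
    // the first child is visited first.
    private void expandTop()
    {
        auto n = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        foreach_reverse (c; n.children)
            stack ~= c;
    }

    // Advance until the top of the stack matches the tag or nothing is left.
    private void skip()
    {
        while (stack.length != 0 && stack[$ - 1].tag != tag)
            expandTop();
    }
}

void main()
{
    auto doc = new Node("body", [
        new Node("p"),
        new Node("div", [new Node("p"), new Node("span")]),
        new Node("p"),
    ]);

    size_t count = 0;
    foreach (p; TagRange(doc, "p"))
        ++count;
    writeln(count);
}
```

Nothing here needs an explicit end position inside the tree; the range carries its own notion of "done".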

 Maybe it's just terminology. I'm not sure yet.

Maybe.

There is one thing so far that we can point to and say "ranges aren't
so great for this".  That's the case where you want to scan forward
*and* backward over your data.  But I now believe that std lib
functions can cover that usage case in a non-burdensome way, too.

--bb

Sep 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Bill Baxter" wrote
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.
 Note that "move back and forth according to some user input" is
 clearly not an "algorithm" that would be in std.algorithm.  But it
 does come up often enough in applications.  I don't think the fact
 that it's not strictly an Algorithm-with-a-capital-A makes it any less
 important.

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

Any structure that might change topology doesn't lend itself well to 
persistent ranges.  Ranges are fine for iterating over a constant version of 
the container.  i.e., if you want to implement a search function, where you 
are assuming that during the search, the container doesn't change, that 
should take a range as an argument.  But storing references to individual 
elements for later use (such as O(1) lookup or quick removal), and modifying 
the container in between getting the reference and using the reference makes 
it difficult to guarantee the behavior.  The only range type that seems like 
it would be immune to such changes would be the empty range where both ends 
point to the same element.  In fact, this can be reduced to a single 
reference, just copied for the sake of calling it a 'range'.

Arrays are really a special case where the ranges unequivocally work because 
once you get a range, all of it is guaranteed not to disappear or change 
topology.  i.e. a slice always contains valid data, no matter what you do to 
the original array.  I think this is the model Andrei is trying to achieve 
for all containers/iterables, and I think it's just not the same.  I think 
passing the range around as one entity is a very helpful thing, especially 
for algorithms which generally take ranges in the form of 2 iterators, but I 
don't think it solves all problems.

-Steve

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 10:46 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.
 Note that "move back and forth according to some user input" is
 clearly not an "algorithm" that would be in std.algorithm.  But it
 does come up often enough in applications.  I don't think the fact
 that it's not strictly an Algorithm-with-a-capital-A makes it any less
 important.

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

 Any structure that might change topology doesn't lend itself well to
 persistent ranges.

But they often don't lend themselves to iterators either.

 Ranges are fine for iterating over a constant version of
 the container.  i.e., if you want to implement a search function, where you
 are assuming that during the search, the container doesn't change, that
 should take a range as an argument.  But storing references to individual
 elements for later use (such as O(1) lookup or quick removal), and modifying
 the container in between getting the reference and using the reference makes
 it difficult to guarantee the behavior.

Lots of algorithms on containers using iterators have this property too.

 The only range type that seems like
 it would be immune to such changes would be the empty range where both ends
 point to the same element.  In fact, this can be reduced to a single
 reference, just copied for the sake of calling it a 'range'.

Or a here-to-end range where one end points to a special "end"
sentinel.  Like with a linked list.
You can't change the sentinel by mutating the contents so it remains
valid.  I think this would be a more common/useful form than the empty
range.  Because you can dereference the range from here-to-end, but
you can't dereference the range from here to here.

Probably the cursor idiom should also use "all" and "here-to-end"
rather than "all" and "here-to-here".  Then the
all.after(center).value ugliness isn't needed, just center.value.  (I
prefer ".value" to ".first")
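
A minimal D sketch of the here-to-end cursor, assuming a hypothetical hand-rolled doubly linked list in which null plays the role of the "end" sentinel. The names HereToEnd, advance and retreat are made up for the illustration.

```d
import std.stdio;

// Hypothetical doubly linked node; null stands in for the "end" sentinel.
class Node
{
    int value;
    Node prev, next;

    this(int value) { this.value = value; }
}

// Here-to-end range: only the near end is stored, the far end is the sentinel.
struct HereToEnd
{
    Node here;

    bool isEmpty() const { return here is null; }

    // Dereference directly, without rebuilding a range first.
    @property int value() { return here.value; }

    void advance() { here = here.next; }
    void retreat() { if (here.prev !is null) here = here.prev; }
}

// Build the list 0, 1, ..., n-1 and return its head.
Node makeList(int n)
{
    Node head, tail;
    foreach (i; 0 .. n)
    {
        auto node = new Node(i);
        node.prev = tail;
        if (tail !is null)
            tail.next = node;
        else
            head = node;
        tail = node;
    }
    return head;
}

void main()
{
    auto head = makeList(5);                   // 0 1 2 3 4
    auto center = HereToEnd(head.next.next);   // positioned at 2

    // Unlink the first element; the cursor does not refer to it.
    head = head.next;
    head.prev = null;

    writeln(center.value);
    center.retreat();
    writeln(center.value);
}
```

Because the far end is a sentinel rather than a real element, mutating the list elsewhere leaves the cursor usable, and center.value needs no helper range.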

 Arrays are really a special case where the ranges unequivocally work because
 once you get a range, all of it is guaranteed not to disappear or change
 topology.  i.e. a slice always contains valid data, no matter what you do to
 the original array.  I think this is the model Andrei is trying to achieve
 for all containers/iterables, and I think it's just not the same.  I think
 passing the range around as one entity is a very helpful thing, especially
 for algorithms which generally take ranges in the form of 2 iterators, but I
 don't think it solves all problems.

Well it solves them, but is it worth the tradeoffs you have to make?
You say no.  I say I now have a reasonable way to handle the only
example I could think of that seemed really cumbersome.  Lacking
further evidence it seems the main problem remaining is just having to
carry around an extra value when you just want to refer to one value.

However, those pointers to "single values" can also move, so for
safety's sake why not keep the fence post with it?

--bb

Sep 10 2008

Jason House <jason.james.house gmail.com> writes:

Steven Schveighoffer Wrote:

 "Bill Baxter" wrote
 So far though we don't seem to be able to come up with a good example
 of where ranges are weak other than traversing a list back and forth.
 Note that "move back and forth according to some user input" is
 clearly not an "algorithm" that would be in std.algorithm.  But it
 does come up often enough in applications.  I don't think the fact
 that it's not strictly an Algorithm-with-a-capital-A makes it any less
 important.

 But it is a little fishy that we can't come up with any other example
 besides sliding a bead on a wire back and forth.

 
 Any structure that might change topology doesn't lend itself well to 
 persistent ranges.  

Who says all ranges have to be persistent? Ranges "from here to the end" can be
dynamic, similar to an iterator.

In my mind, the important discussion is what "end" means, and comparing when
ranges and iterators get invalidated.
I also wonder a bit about mixing ranges with non-iterable cursors. Of course,
their limited value may not merit the complexity.

 Ranges are fine for iterating over a constant version of 
 the container.  i.e., if you want to implement a search function, where you 
 are assuming that during the search, the container doesn't change, that 
 should take a range as an argument.  But storing references to individual 
 elements for later use (such as O(1) lookup or quick removal), and modifying 
 the container in between getting the reference and using the reference makes 
 it difficult to guarantee the behavior.  The only range type that seems like 
 it would be immune to such changes would be the empty range where both ends 
 point to the same element.  In fact, this can be reduced to a single 
 reference, just copied for the sake of calling it a 'range'.
 
 Arrays are really a special case where the ranges unequivocally work because 
 once you get a range, all of it is guaranteed not to disappear or change 
 topology.  i.e. a slice always contains valid data, no matter what you do to 
 the original array.  I think this is the model Andrei is trying to achieve 
 for all containers/iterables, and I think it's just not the same.  I think 
 passing the range around as one entity is a very helpful thing, especially 
 for algorithms which generally take ranges in the form of 2 iterators, but I 
 don't think it solves all problems.
 
 -Steve

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Sergey Gromov" wrote
 You don't mention here which iterator usage pattern you are trying to
 model with ranges.  I can think of at least two.

 1.  You use a single bidirectional 'center' iterator, center == 5.  As
 one would naturally do with iterators.  Note then that whenever you use
 your center for, say, backward iteration, you reconstruct the actual
 range by calling list.begin.  You do it on each iteration.  No wonder it
 stays valid even if you remove the first element in the meantime: you're
 constructing your range from scratch anyway.  If you want to model this
 pattern with ranges---no problem, keep an empty 'center' range, center
 == (5,5), and reconstruct backward iteration range,

 reverse = all.before(center);

 whenever you need to iterate, then

 center = reverse.end;

 This 'center' range, being slightly less efficient, stays valid and
 becomes invalid in exactly the same conditions as your classical
 iterator.

 This is exactly the pattern I use.  I agree that your example would solve
 the problem, I hadn't thought of an empty range to be a cursor, that is
 clever!

 The only missing piece to your solution is that I must construct the range
 after the center range in order to access the value to see where I need to
 go.

 What I see as the biggest downside is the cumbersome and verbose code of
 moving the 'iterator' around, as every time I want to move forward, I
 construct a new range, and every time I want to move backwards I construct a
 new range (and construct a new 'center' afterwards).  So a 'move back one'
 looks like:

 auto before = all.before(center);
 if(!before.isEmpty)
  center = before.pop.end;

 And to move forward it's:
 auto after = all.after(center);
 if(!after.isEmpty)
  center = after.next.begin;

Maybe all we need to neatly support this sliding cursor idiom is just
some functions in the std lib:

bool cursorRetreat(R)(R all, ref R center)
{
  auto before = all.before(center);
  if(!before.isEmpty) {
    center = before.pop.end;
    return true;
  }
  return false;
}

bool cursorAdvance(R)(R all, ref R center)
{
  auto after = all.after(center);
  if(!after.isEmpty) {
   center = after.next.begin;
   return true;
  }
  return false;
}

 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first value
 of range' member
 or if opStar is used:

 *all.after(center);

Why is all that necessary?  Can't you just do a  *center?

 I much prefer:

 forward:
 if(center != list.end)
    ++center;

 reverse:
 if(center != list.begin)
   --center;

 get value:
 *center;

With the functions it becomes

forward:
cursorAdvance(list,center);

reverse:
cursorRetreat(list,center);

get value:
 *center  -- this works doesn't it?

 Especially without all the extra overhead

Since we haven't really come up with any examples where the speed with
which you can slide back and forth would make a whit of difference,
perhaps the extra overhead is a non-issue.

 I see both methods as being just as open to mistakes, the first more-so, and
 more difficult to comprehend (at least for me).

I'm optimistic that this use case can also be covered by some well
chosen std library functions, similar to the above.

--bb

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 9:32 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first value
 of range' member
 or if opStar is used:

 *all.after(center);

 Why is all that necessary?  Can't you just do a  *center?

Oh, I get it.  It's empty.  Duh.

Ok, so you can have a third cursor function in the std lib:

T cursorValue(R,T)(R all, R center)
{
   return all.after(center).left;
}
... plus the
cursorAdvance and cursorRetreat.

--bb

Sep 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Bill Baxter" wrote
 On Thu, Sep 11, 2008 at 9:32 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first 
 value
 of range' member
 or if opStar is used:

 *all.after(center);

 Why is all that necessary?  Can't you just do a  *center?

 Oh, I get it.  It's empty.  Duh.

 Ok, so you can have a third cursor function in the std lib:

 T cursorValue(R,T)(R all, R center)
 {
   return all.after(center).left;
 }
 ... plus the
 cursorAdvance and cursorRetreat.

That is all fine and dandy in the world of "I don't care how well my 
iterators perform or how much code bloat is added because of them," but I 
usually work in a different world ;)

But if I were forced not to use an iterator model (which isn't the case, 
iterators should be very possible without compiler help), I would actually 
implement this as a wrapper struct:

struct Cursor(containerType)
{
   private Range!(containerType) _cur;
   private containerType owner;

   Cursor  moveLeft() {...}
   Cursor moveRight() {...}
   bool hasLeft() {...}
   etc.
}

Thus one can implement iterators on top of ranges, but I'd argue that ranges 
are much easier to implement on top of iterators.
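
For what it's worth, such a wrapper might look roughly like this in D. This is a sketch under assumptions: a plain slice T[] stands in for both the container and Range!(containerType), and moveLeft, moveRight, hasLeft, hasRight and value are filled in with guessed semantics, since the outline above leaves them open.

```d
import std.stdio;

// Hypothetical cursor built on top of a range; a slice plays the range here.
struct Cursor(T)
{
    private T[] owner;   // the whole container
    private T[] cur;     // here-to-end view into owner

    this(T[] owner, size_t pos)
    {
        this.owner = owner;
        this.cur = owner[pos .. $];
    }

    // Current offset from the start of owner.
    private size_t pos() const { return owner.length - cur.length; }

    bool hasLeft() const { return pos() > 0; }
    bool hasRight() const { return cur.length > 1; }

    // Element under the cursor.
    ref T value() { return cur[0]; }

    // Moving left has to go back to owner to rebuild the view.
    void moveLeft()
    {
        assert(hasLeft());
        cur = owner[pos() - 1 .. $];
    }

    // Moving right only shrinks the current view.
    void moveRight()
    {
        assert(hasRight());
        cur = cur[1 .. $];
    }
}

void main()
{
    auto c = Cursor!int([10, 20, 30], 1);
    writeln(c.value());
    c.moveLeft();
    writeln(c.value());
    c.moveRight();
    c.moveRight();
    writeln(c.value());
}
```

Note the asymmetry: moveRight works on the range alone, while moveLeft needs the owner, which is the extra baggage being argued about.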

In any case, I think there are benefits to having a range type that is not 
necessarily defined as two iterators.

-Steve

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 10:35 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 On Thu, Sep 11, 2008 at 9:32 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first
 value
 of range' member
 or if opStar is used:

 *all.after(center);

 Why is all that necessary?  Can't you just do a  *center?

 Oh, I get it.  It's empty.  Duh.

 Ok, so you can have a third cursor function in the std lib:

 T cursorValue(R,T)(R all, R center)
 {
   return all.after(center).left;
 }
 ... plus the
 cursorAdvance and cursorRetreat.

 That is all fine and dandy in the world of "I don't care how well my
 iterators perform or how much code bloat is added because of them," but I
 usually work in a different world ;)

Ok, but I have yet to hear an actual use case that demands blazing
fast iteration both forwards and backwards.  In your shuffling video
there's no way moving the iterator back and forth is going to be the
bottleneck.  In my undo/redo stack example it is also far from being
on the critical path.    I think it goes back to the fact that going
back and forth randomly isn't a property of many algorithms.  In all
the examples I can think of it's more a property of how humans
interact with data.  And humans are slow compared to how long it takes
to update a few extra values.

Certainly one-way iteration needs to be as fast as possible, for all
kinds of algorithms.  But does bidirectional iteration really need to be
super-fast?

 But if I were forced not to use an iterator model (which isn't the case,
 iterators should be very possible without compiler help), I would actually
 implement this as a wrapper struct:

 struct Cursor(containerType)
 {
   private Range!(containerType) _cur;
   private containerType owner;

   Cursor  moveLeft() {...}
   Cursor moveRight() {...}
   bool hasLeft() {...}
   etc.
 }

That would work for me too.  Just put it in the standard lib so I
don't have to scratch my head wondering why such a basic thing is so
hard to do!  Of course once you do that, you have to wonder why this
one's interface isn't branded a range concept but the others are.  (I
know I know... it's not Stepanov "basic"), but if it's there and
people want to use it, I see no value in refusing to recognize it on
purist grounds.

 Thus one can implement iterators on top of ranges, but I'd argue that ranges
 are much easier to implement on top of iterators.

Ranges are safer and easier to work with in most cases so it's worth
it, or so the argument goes.  You don't buy it?
I think things like infinite generators make more sense as a range
because it's difficult to express succinctly as two iterators.  Or
perhaps you don't mean to imply that every range would have a begin()
and an end() iterator you could access?

 In any case, I think there are benefits to having a range type that is not
 necessarily defined as two iterators.

But how to do it without a large increase in the number of fundamental
concepts you have to keep track of -- that's the issue.

--bb

Sep 10 2008

Benji Smith <dlanguage benjismith.net> writes:

Bill Baxter wrote:
 Ok, but I have yet to hear an actual use case that demands blazing
 fast iteration both forwards and backwards.  In your shuffling video
 there's no way moving the iterator back and forth is going to be the
 bottleneck.  In my undo/redo stack example it is also far from being
 on the critical path.    I think it goes back to the fact that going
 back and forth randomly isn't a property of many algorithms.  In all
 the examples I can think of it's more a property of how humans
 interact with data.  And humans are slow compared to how long it takes
 to update a few extra values.

Oh!! I thought of one!!

Parsers & regex engines move both forward and backward, as they try to 
match characters to a pattern.

Really, anything that uses an NFA or DFA to define patterns would 
benefit from fast bidirectional iteration...

--benji

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 1:31 PM, Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Ok, but I have yet to hear an actual use case that demands blazing
 fast iteration both forwards and backwards.  In your shuffling video
 there's no way moving the iterator back and forth is going to be the
 bottleneck.  In my undo/redo stack example it is also far from being
 on the critical path.    I think it goes back to the fact that going
 back and forth randomly isn't a property of many algorithms.  In all
 the examples I can think of it's more a property of how humans
 interact with data.  And humans are slow compared to how long it takes
 to update a few extra values.

 Oh!! I thought of one!!

 Parsers & regex engines move both forward and backward, as they try to match
 characters to a pattern.

 Really, anything that uses an NFA or DFA to define patterns would benefit
 from fast bidirectional iteration...

Good call.  I was about to post something mentioning Turing
machines, but that seemed too academic.  Same class of thing as
NFA/DFA/FSM.

The question is, though, would you really implement those things using
a linked list?  I would expect most of those suckers work on arrays,
and so can take advantage of the bidirectional nature of random access
ranges.

Hmm, for FSMs you can't really define a good end state.  There may not
be any particular end state. ... ah, but wait I forgot.  That's the
beauty of a range -- the end "state" doesn't have to be a "state" per
se.  It can be any predicate you want it to be.  "Range" is misleading
in this case.  This is one of those cases where you just have to
remember "range" means "current value plus stopping criterion".
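
To make "current value plus stopping criterion" concrete, here is a small sketch of a range with no container behind it at all. It assumes the primitive names that Phobos eventually settled on (empty, front, popFront), not the names being debated in this thread, and the Collatz struct itself is just an illustration.

```d
import std.algorithm.comparison : equal;
import std.range.primitives : isInputRange;

// A range that is only a current value plus a predicate saying when to stop.
struct Collatz
{
    ulong current;

    // Stopping criterion: a test on the current value, not an end position.
    @property bool empty() const { return current == 1; }
    @property ulong front() const { return current; }

    void popFront()
    {
        current = (current % 2 == 0) ? current / 2 : 3 * current + 1;
    }
}

static assert(isInputRange!Collatz);

unittest
{
    // Usable by generic code and by foreach like any other input range.
    assert(equal(Collatz(6), [6, 3, 10, 5, 16, 8, 4, 2]));
}
```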

--bb

Sep 10 2008

Benji Smith <dlanguage benjismith.net> writes:

Bill Baxter wrote:
 On Thu, Sep 11, 2008 at 1:31 PM, Benji Smith <dlanguage benjismith.net> wrote:
 Parsers & regex engines move both forward and backward, as they try to match
 characters to a pattern.

 Really, anything that uses an NFA or DFA to define patterns would benefit
 from fast bidirectional iteration...

 
 Good call.  I was about to post something mentioning Turing
 machines, but that seemed too academic.  Same class of thing as
 NFA/DFA/FSM.
 
 The question is, though, would you really implement those things using
 a linked list?  I would expect most of those suckers work on arrays,
 and so can take advantage of the bidirectional nature of random access
 ranges.

Actually, Perl 6 will (assuming they ever finish it) finally allow regex 
matching against input streams:

http://dev.perl.org/perl6/doc/design/syn/S05.html

It's a big document. Search for the text "Matching against non-strings"

This kind of thing was one of the main arguments I made in my "Why 
Strings as Classes" thread, that got everyone's panties in a bunch and 
that no one else agreed with.

In that thread, I argued that Strings should be objects so that they can 
implement a CharSequence interface (or something like it). And then all 
the standard text-processing stuff could be written against that 
interface, allowing regex engines and parsers to be agnostic about 
whether they read from an actual in-memory string or from a 
(file|database|socket) input stream.

With the range proposal on the table, I'd be just as happy if all the D 
text-processing stuff in the standard libs was implemented against a 
Range!(T), where T is one of the char types. Especially if ranges can be 
infinite.
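
A rough sketch of what "written against a range of chars" might mean in practice, using present-day Phobos traits rather than anything specified in the proposal. countWords is a hypothetical helper; the point is only that the same code accepts an in-memory string and a lazily produced character sequence (joiner stands in for a stream here).

```d
import std.algorithm.iteration : joiner;
import std.range.primitives : ElementType, isInputRange, empty, front, popFront;
import std.traits : isSomeChar;
import std.uni : isWhite;

// Counts whitespace-separated words in any input range of characters.
size_t countWords(R)(R input)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t words = 0;
    bool inWord = false;
    for (; !input.empty; input.popFront())
    {
        immutable bool space = isWhite(input.front);
        if (!space && !inWord)
            ++words;          // first character of a new word
        inWord = !space;
    }
    return words;
}

unittest
{
    assert(countWords("hello range  world") == 3);
    // Chunks glued together lazily, as a stand-in for streamed input.
    assert(countWords(joiner(["one tw", "o three"])) == 3);
}
```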

Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

That's what I was saying earlier.

I think the mechanics are good. And for contiguous, sequential 
containers, the word "range" is great. For other types of containers, or 
other selection/iteration scenarios, you can shoehorn your mental model 
into the "range" metaphor. But it's weird.

--benji

Sep 11 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

 
 That's what I was saying earlier.
 
 I think the mechanics are good. And for contiguous, sequential 
 containers, the word "range" is great. For other types of containers, or 
 other selection/iteration scenarios, you can shoehorn your mental model 
 into the "range" metaphor. But it's weird.

It seems to me like a misuse of ranges.  Do you really want to iterate 
over a state machine?  FSM is a mailbox with a 'message' hole.  You put 
messages into it and it does things.  How do you iterate over a mailbox?

Sep 11 2008

Benji Smith <dlanguage benjismith.net> writes:

Sergey Gromov wrote:
 Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

 That's what I was saying earlier.

 I think the mechanics are good. And for contiguous, sequential 
 containers, the word "range" is great. For other types of containers, or 
 other selection/iteration scenarios, you can shoehorn your mental model 
 into the "range" metaphor. But it's weird.

 
 It seems to me like a misuse of ranges.  Do you really want to iterate 
 over a state machine?  FSM is a mailbox with a 'message' hole.  You put 
 messages into it and it does things.  How do you iterate over a mailbox?

Well, no. Not when you put it like that.

The example I posted earlier went something like this:

    MarkovModel<ApplicationState> model = ...;
    for (ApplicationState state : model) {
       state.doStuff();
    }

It's not a bad abstraction. The model handles all of the semantics of 
calculating the transition probabilities and selecting the next state, 
so that the foreach loop doesn't have to muss with those details.

Yeah, it's a total misuse of the "range" metaphor, and that's exactly 
what I'm saying. In Java, where I implemented this project, an 
"iterator" is a tiny bit of logic for returning objects in a 
(potentially endless, potentially reversible) sequence, primarily to 
support looping constructs. Just because there's no underlying range of 
objects doesn't mean they're not iterable.

Of course, Java iterators are *much* more limited constructs than these 
new D ranges. But I still think the concept has merit. And, like you 
said, calling them ranges makes them seem stupid. Because they're not 
ranges.

--benji

Sep 11 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Benji Smith <dlanguage benjismith.net> wrote:
 Sergey Gromov wrote:
 Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

 That's what I was saying earlier.

 I think the mechanics are good. And for contiguous, sequential 
 containers, the word "range" is great. For other types of containers, or 
 other selection/iteration scenarios, you can shoehorn your mental model 
 into the "range" metaphor. But it's weird.

 
 It seems to me like a misuse of ranges.  Do you really want to iterate 
 over a state machine?  FSM is a mailbox with a 'message' hole.  You put 
 messages into it and it does things.  How do you iterate over a mailbox?

 
 Well, no. Not when you put it like that.
 
 The example I posted earlier went something like this:
 
     MarkovModel<ApplicationState> model = ...;
     for (ApplicationState state : model) {
        state.doStuff();
     }
 
 It's not a bad abstraction. The model handles all of the semantics of 
 calculating the transition probabilities and selecting the next state, 
 so that the foreach loop doesn't have to muss with those details.
 
 Yeah, it's a total misuse of the "range" metaphor, and that's exactly 
 what I'm saying. In Java, where I implemented this project, an 
 "iterator" is a tiny bit of logic for returning objects in a 
 (potentially endless, potentially reversible) sequence, primarily to 
 support looping constructs. Just because there's no underlying range of 
 objects doesn't mean they're not iterable.
 
 Of course, Java iterators are *much* more limited constructs than these 
 new D ranges. But I still think the concept has merit. And, like you 
 said, calling them ranges makes them seem stupid. Because they're not 
 ranges.

Well, if you get an object out of there on every step, and that object 
source can exhaust at some point, then the abstraction is correct.

I agree that an input range is actually an arbitrary bounded iterator.  
But you also must agree that a random access iterator in C++ is actually 
an unbounded array.  You always can invent a better name for any 
particular case.  But C++ keeps calling them iterators to stress
genericity and emphasize interchangeability to some degree.  You may not
notice that calling a string pointer an iterator is a bit awkward and 
misleading, because you get used to it and learned to think that way.

There's no difference with ranges.  Some of them are actual ranges, some 
are not, some are plain abstractions.  You need to learn to think in 
ranges to use them naturally.  This is true for any new language you're 
learning, programming or human, if you want to use them efficiently.

Sep 11 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Benji Smith <dlanguage benjismith.net> wrote:
 Sergey Gromov wrote:
 Benji Smith <dlanguage benjismith.net> wrote:
 Bill Baxter wrote:
 Hmm, for FSMs you can't really define a good end state.  There may not
 be any particular end state. ... ah, but wait I forgot.  That's the
 beauty of a range -- the end "state" doesn't have to be a "state" per
 se.  It can be any predicate you want it to be.  "Range" is misleading
 in this case.  This is one of those cases where you just have to
 remember "range" means "current value plus stopping criterion".

 That's what I was saying earlier.

 I think the mechanics are good. And for contiguous, sequential 
 containers, the word "range" is great. For other types of containers, or 
 other selection/iteration scenarios, you can shoehorn your mental model 
 into the "range" metaphor. But it's weird.

 It seems to me like a misuse of ranges.  Do you really want to iterate 
 over a state machine?  FSM is a mailbox with a 'message' hole.  You put 
 messages into it and it does things.  How do you iterate over a mailbox?

 Well, no. Not when you put it like that.

 The example I posted earlier went something like this:

     MarkovModel<ApplicationState> model = ...;
     for (ApplicationState state : model) {
        state.doStuff();
     }

 It's not a bad abstraction. The model handles all of the semantics of 
 calculating the transition probabilities and selecting the next state, 
 so that the foreach loop doesn't have to muss with those details.

 Yeah, it's a total misuse of the "range" metaphor, and that's exactly 
 what I'm saying. In Java, where I implemented this project, an 
 "iterator" is a tiny bit of logic for returning objects in a 
 (potentially endless, potentially reversible) sequence, primarily to 
 support looping constructs. Just because there's no underlying range of 
 objects doesn't mean they're not iterable.

 Of course, Java iterators are *much* more limited constructs than these 
 new D ranges. But I still think the concept has merit. And, like you 
 said, calling them ranges makes them seem stupid. Because they're not 
 ranges.

 
 Well, if you get an object out of there on every step, and that object 
 source can exhaust at some point, then the abstraction is correct.
 
 I agree that an input range is actually an arbitrary bounded iterator.  
 But you also must agree that a random access iterator in C++ is actually 
 an unbounded array.  You always can invent a better name for any 
 particular case.  But C++ keeps calling them iterators to stress
 genericity and emphasize interchangeability to some degree.  You may not
 notice that calling a string pointer an iterator is a bit awkward and 
 misleading, because you get used to it and learned to think that way.
 
 There's no difference with ranges.  Some of them are actual ranges, some 
 are not, some are plain abstractions.  You need to learn to think in 
 ranges to use them naturally.  This is true for any new language you're 
 learning, programming or human, if you want to use them efficiently.

I agree 100%, and also with Sergey's other post that some abstractions 
simply don't fit the range charter, or don't fit it naturally, or are 
not expressive enough for some rich iteration abstraction.

Maybe ranges are lousy for an HMM, but then does it look like I want to run
a host of generic algorithms against an HMM? That IS the question. 
Probably not. HMMs have their own algorithms, and I wouldn't think of 
finding/sorting/partitioning an HMM just as I wouldn't think of applying 
Viterbi to an array.

What I wanted was to make sure ranges are appropriate as higher-level 
abstractions that can replace STL-like iterators. My experience shows 
that they can. Not on 100% of occasions have they been a superior 
replacement, but I'm looking at a solid 80s at least. Add to that the 
advantage of better generators (which iterators make unpalatable because 
of the unsightly dummy end() requirement). When I also add the safety 
advantage of sinks (no more buffer overruns!!!), I feel we have a huge 
winner.
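
As a sketch of the sink idea only (the proposal's actual sink interface may differ), here is a hypothetical BoundedSink that knows how much room it has left. A writer pushing too much data gets a refusal rather than a write past the end of the buffer.

```d
// Hypothetical sink over a fixed buffer; put() refuses once the buffer is full.
struct BoundedSink(T)
{
    T[] buffer;      // remaining free space
    size_t written;  // number of elements accepted so far

    bool put(T value)
    {
        if (buffer.length == 0)
            return false;        // full: refuse instead of overrunning
        buffer[0] = value;
        buffer = buffer[1 .. $]; // shrink the free space from the left
        ++written;
        return true;
    }
}

unittest
{
    int[3] storage;
    auto sink = BoundedSink!int(storage[]);
    foreach (v; [1, 2, 3, 4, 5])
        sink.put(v);             // the last two calls return false
    assert(sink.written == 3);
    assert(storage == [1, 2, 3]);
}
```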

Of course, that doesn't mean ranges should be the be-all end-all of 
iteration.

This discussion reminds me of the "iterator craze" around 2000. People 
were discovering STL iterators and were trying to define and use the 
weirdest iterators. I remember distinctly a one-page ad on Dr. Dobb's 
Journal. They were looking for article writers. They mentioned the 
upcoming themes (e.g. networking, security, patterns, C++...). There was 
a _specific_ note: "Not interested in yet another iterator article".

That being said, damn I wish I had the time to make RegEx faster AND 
operating on input ranges...


Andrei

Sep 11 2008

Benji Smith <dlanguage benjismith.net> writes:

Andrei Alexandrescu wrote:
 What I wanted was to make sure ranges are appropriate as higher-level 
 abstractions that can replace STL-like iterators. My experience shows 
 that they can. Not on 100% of occasions have they been a superior 
 replacement, but I'm looking at a solid 80s at least. Add to that the 
 advantage of better generators (which iterators make unpalatable because 
 of the unsightly dummy end() requirement). When I also add the safety 
 advantage of sinks (no more buffer overruns!!!), I feel we have a huge 
 winner.

I agree.

My quibble with the name "range" is pretty minor, and I don't have any 
qualm with the semantics.

And "range" is certainly a better name for an iteration metaphor than 
"opApply".

:-)

--benji

Sep 11 2008

Russell Lewis <webmaster villagersonline.com> writes:

Benji Smith wrote:
 Bill Baxter wrote:
 Ok, but I have yet to hear an actual use case that demands blazing
 fast iteration both forwards and backwards.  In your shuffling video
 there's no way moving the iterator back and forth is going to be the
 bottleneck.  In my undo/redo stack example it is also far from being
 on the critical path.    I think it goes back to the fact that going
 back and forth randomly isn't a property of many algorithms.  In all
 the examples I can think of it's more a property of how humans
 interact with data.  And humans are slow compared to how long it takes
 to update a few extra values.

 
 Oh!! I thought of one!!
 
 Parsers & regex engines move both forward and backward, as they try to 
 match characters to a pattern.

They do backtracking, which is different than iterating backward.  I 
would suggest that a parser should use a stack of forward iterators 
instead.  That's my $.02, at least.

 Really, anything that uses an NFA or DFA to define patterns would 
 benefit from fast bidirectional iteration...

DFAs can't backtrack, so they don't require backward movement through 
the input.  NFAs might, depending on the implementation (are you going 
to use guess-and-backtrack, or parallel execution?) but I would again 
suggest that a stack (or "TODO list") of forward iterators might work 
better than backtracking.
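
A small sketch of the "TODO list of forward positions" idea, written against forward ranges as they exist in today's Phobos (save makes a copy of the current position). globMatch is a hypothetical toy matcher supporting only '*' and '?', not a regex engine; it never steps backward through the input, it only resumes from saved positions.

```d
import std.range.primitives;

// Glob-style matching using a stack of saved forward positions.
bool globMatch(P, T)(P pattern, T text)
    if (isForwardRange!P && isForwardRange!T)
{
    static struct State { P pattern; T text; }
    State[] todo = [State(pattern.save, text.save)];

    while (todo.length)
    {
        auto s = todo[$ - 1];        // pop the most recent alternative
        todo = todo[0 .. $ - 1];

        while (true)
        {
            if (s.pattern.empty)
            {
                if (s.text.empty) return true;
                break;               // leftover text: this alternative fails
            }
            if (s.pattern.front == '*')
            {
                // Alternative kept for later: '*' swallows one more character.
                if (!s.text.empty)
                {
                    auto t = s.text.save;
                    t.popFront();
                    todo ~= State(s.pattern.save, t);
                }
                s.pattern.popFront(); // tried now: '*' matches nothing further
                continue;
            }
            if (s.text.empty) break;
            if (s.pattern.front == '?' || s.pattern.front == s.text.front)
            {
                s.pattern.popFront();
                s.text.popFront();
                continue;
            }
            break;                   // mismatch: fall back to the TODO list
        }
    }
    return false;
}

unittest
{
    assert(globMatch("a*c", "abbc"));
    assert(!globMatch("a*c", "abd"));
}
```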

Sep 11 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Bill Baxter" wrote
 Thus one can implement iterators on top of ranges, but I'd argue that 
 ranges
 are much easier to implement on top of iterators.

 Ranges are safer and easier to work with in most cases so it's worth
 it, or so the argument goes.  You don't buy it?

I can define an iterator and it doesn't mean that it makes ranges any less 
safe.  Just give me the choice, if I think iterators are a better fit, I 
might choose iterators.  But having to shoehorn ranges into an iterator form 
so that I do not wince at the ugliness of my code seems like unnecessary 
baggage.

I believe that when you are actually using a range of values, a range form 
is a much better, safer fit.  When you want just a pointer to a single 
value, then a pointer-form is a better fit.  But I want to be able to 
construct ranges from pointers.  I want to save pointers.  I want to use 
pointers to refer to elements in a collection.  I want to use pointers to 
move one-at-a-time along a node-based container.  I don't want to 'emulate' 
pointers using ranges.  I don't want the library to resist me doing what I 
find natural.

This goes back to a lot of the points I've brought up about 'safety' issues 
in D.  D is a systems language, I like the safety by default, but when I can 
gain something by breaking the safety, I want to be able to do it 
efficiently, and without resistance from the compiler.  Like logical const. 
I've proven it's possible to emulate, but at a performance disadvantage. 
This is no different, you can emulate iterators, but at a performance (and 
code bloat) disadvantage.  Granted the disadvantage isn't as big for this as 
it is for logical const, but the question still remains - if I can do it 
already, why is it so bad if it's supported natively?

Anyways, I'm going to leave the discussion, I think I've said all I can 
about my views.  I'm not really good at explaining things anyways.  But I 
will update dcollections with what I think is the best compromise.  Then I 
might have a better understanding of how ranges fit into a collection 
package.  The good news is I don't have to worry about the language not 
providing iterators, everything is going to be library based, so we can 
easily try out both ways and see which is easier to use.

-Steve

Sep 11 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Fri, Sep 12, 2008 at 1:24 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 Thus one can implement iterators on top of ranges, but I'd argue that
 ranges
 are much easier to implement on top of iterators.

 Ranges are safer and easier to work with in most cases so it's worth
 it, or so the argument goes.  You don't buy it?

 I can define an iterator and it doesn't mean that it makes ranges any less
 safe.  Just give me the choice, if I think iterators are a better fit, I
 might choose iterators.  But having to shoehorn ranges into an iterator form
 so that I do not wince at the ugliness of my code seems like unnecessary
 baggage.

I think one thing to consider is what it will take to make a new
container support and "play nice" with the regime proposed.  This
touches on Andrei's point about being hard pressed to think of generic
algorithms to run on an HMM, too.

The first question is who do you want to "play nice" with?  If you're
going to be writing functions specifically for that container, then
you don't really have to play nice with anyone.  Your container just
needs to have the operations necessary to support those functions.

The question of "playing nice" with everyone is not an issue at all
until you start wanting to have one algorithm that works for lots of
different containers that can do kinda similar sorts of things.

And that's exactly what std.algorithm is for.  Supporting those kinds
of operations that apply equally to a lot of different kinds of
containers.

So if you agree that ranges are good enough for std.algorithm, then
you should agree that a generic iterator concept is not really
necessary, since the places left where you really need an iterator are
those places where a generic algorithm is not really useful.  If they
were generic algorithms then they would be in std.algorithm.

The next thing that's worth thinking about, is how much do you have to
work to play nice with std.algorithm?  The easier it is to implement
that interface expected by std.algorithm the better.

So for one thing that means that you really want the std.algorithm
concepts to nest, for one to build on the next.  That way if you
implement the most generic level, then you've automatically
implemented all the more restricted levels.  This leads us to want to
have names that make sense all the way up the hierarchy.  Like
isEmpty().  It pretty much makes sense no matter which direction or
how many degrees of freedom you have.  That's better than something
like atEnd() for that reason.  It is unbiased.   You wouldn't want to
have to provide an "atEnd" to work with forward ranges, and then an
"isEmpty" to work with random access even though they mean the same
thing.
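
For what it's worth, this nesting can be sketched with the concept checks that Phobos ended up with (the names below are today's, not the ones under discussion here). A function written against the weakest concept automatically accepts anything that models a stronger one; walk is a hypothetical helper.

```d
import std.range.primitives;

// Written against the weakest concept: needs only empty and popFront.
size_t walk(R)(R r)
    if (isInputRange!R)
{
    size_t n = 0;
    for (; !r.empty; r.popFront())
        ++n;
    return n;
}

// A slice models every level of the hierarchy at once, under the same names.
static assert(isInputRange!(int[]));
static assert(isForwardRange!(int[]));
static assert(isBidirectionalRange!(int[]));
static assert(isRandomAccessRange!(int[]));

unittest
{
    assert(walk([1, 2, 3]) == 3);
}
```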

So the levels of iterators and naming of their parts should nest as
much as possible.  Which seems to be pretty much the case with the
current proposal.  I think .value will be a better name for "the
current thing" than .left.  (Using the operator * may be better
still.)  But other than that, the direction the naming is taking here
on the NG seems good.  I say .value in part because if users like you
implement their own iterator types, then .value is a reasonable name
for getting the thing referred to.  So you could then write a function
that takes a range or your iterator and uses the distinguished value
referred to.  In that sense * would be even better because it would
let you pass in a pointer too.


So in the end, really what I'm saying is that I think you are right.
Iterators are useful sometimes and it would be nice to design ranges
in such a way that the range terminology makes sense for iterators
too.  An iterator would probably support the .next property just like
a range, for instance.  That's a good name that will work with either.
  Maybe it's worth codifying what the iterator concepts should be even
if std.algorithm won't use them.

 But I want to be able to
 construct ranges from pointers.

If iterators are up to you then you will be able to do this.  But
std.algorithm will only care about the ranges you construct, not the
iterators.

 I want to save pointers.  I want to use
 pointers to refer to elements in a collection.  I want to use pointers to
 move one-at-a-time along a node-based container.  I don't want to 'emulate'
 pointers using ranges.  I don't want the library to resist me doing what I
 find natural.

I don't think it will as long as you provide those
my-iterator-to-range functions.
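
One possible shape for such a function, assuming the "iterators" are plain pointers into the same array. rangeBetween is a hypothetical name, and the pointer slicing it does is unchecked, so this is exactly the kind of code that lives on the unsafe side of the fence while the resulting slice is what generic algorithms see.

```d
import std.algorithm.searching : find;

// Builds a range (a slice) from two pointers into the same array.
// Unchecked: the caller must guarantee both point into one allocation.
T[] rangeBetween(T)(T* first, T* last)
{
    assert(first <= last, "iterators out of order");
    return first[0 .. last - first];
}

unittest
{
    int[] data = [1, 2, 3, 4, 5];
    int* b = &data[1];
    int* e = &data[4];

    auto r = rangeBetween(b, e);
    assert(r == [2, 3, 4]);
    // From here on, generic algorithms only ever see the range.
    assert(r.find(3) == [3, 4]);
}
```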

 Anyways, I'm going to leave the discussion, I think I've said all I can
 about my views.  I'm not really good at explaining things anyways.  But I
 will update dcollections with what I think is the best compromise.  Then I
 might have a better understanding of how ranges fit into a collection
 package.  The good news is I don't have to worry about the language not
 providing iterators, everything is going to be library based, so we can
 easily try out both ways and see which is easier to use.

I think your comments have made a valuable addition to the
conversation, and have at least helped me get my thoughts together.
So thanks!  I'll be interested to see how the work on your lib turns
out.

--bb

Sep 11 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Bill Baxter" wrote
 I think one thing to consider is what it will take to make a new
 container support and "play nice" with the regime proposed.  This
 touches on Andrei's point about being hard pressed to think of generic
 algorithms to run on an HMM, too.

 The first question is who do you want to "play nice" with?  If you're
 going to be writing functions specifically for that container, then
 you don't really have to play nice with anyone.  Your container just
 needs to have the operations necessary to support those functions.

Bill, thanks so much for explaining it like this, I really agree with what 
you say.  My concern is that iterator is going to become a 'bad word' and 
considered a flawed design.

But you are right, there is no need for iterators to be allowed for 
std.algorithm, I totally agree with that, I just assumed Andrei meant 
iterators would be discouraged for everything, including general use as 
pointers into container objects.  If that is not the case, then I 
wholeheartedly agree that algorithms should be restricted to ranges, and 
iterators should be used only in container operations.

Cheers!

-Steve

Sep 12 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Steven Schveighoffer <schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 I think one thing to consider is what it will take to make a new
 container support and "play nice" with the regime proposed.  This
 touches on Andrei's point about being hard pressed to think of generic
 algorithms to run on an HMM, too.

 The first question is who do you want to "play nice" with?  If you're
 going to be writing functions specifically for that container, then
 you don't really have to play nice with anyone.  Your container just
 needs to have the operations necessary to support those functions.

 
 Bill, thanks so much for explaining it like this, I really agree with what 
 you say.  My concern is that iterator is going to become a 'bad word' and 
 considered a flawed design.
 
 But you are right, there is no need for iterators to be allowed for 
 std.algorithm, I totally agree with that, I just assumed Andrei meant 
 iterators would be discouraged for everything, including general use as 
 pointers into container objects.  If that is not the case, then I 
 wholeheartedly agree that algorithms should be restricted to ranges, and 
 iterators should be used only in container operations.

If you ask me, I think iterators AKA pointers into containers should be 
discouraged from SafeD.  If you don't care about SafeD you may use 
whatever you like.  Most library interfaces want to be SafeD to make 
user's life easier but few care about the library internals as long as 
they work.

Sep 12 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Steven Schveighoffer <schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 I think one thing to consider is what it will take to make a new
 container support and "play nice" with the regime proposed.  This
 touches on Andrei's point about being hard pressed to think of generic
 algorithms to run on an HMM, too.

 The first question is who do you want to "play nice" with?  If you're
 going to be writing functions specifically for that container, then
 you don't really have to play nice with anyone.  Your container just
 needs to have the operations necessary to support those functions.

 Bill, thanks so much for explaining it like this, I really agree with what 
 you say.  My concern is that iterator is going to become a 'bad word' and 
 considered a flawed design.

 But you are right, there is no need for iterators to be allowed for 
 std.algorithm, I totally agree with that, I just assumed Andrei meant 
 iterators would be discouraged for everything, including general use as 
 pointers into container objects.  If that is not the case, then I 
 wholeheartedly agree that algorithms should be restricted to ranges, and 
 iterators should be used only in container operations.

 
 If you ask me, I think iterators AKA pointers into containers should be 
 discouraged from SafeD.  If you don't care about SafeD you may use 
 whatever you like.  Most library interfaces want to be SafeD to make 
 users' lives easier but few care about the library internals as long as 
 they work.

That's also a reason why std.stdio must wrap FILE* into a safe struct. 
Manipulating FILE* objects directly is unsafe even if you disable 
pointer arithmetic.
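
A minimal sketch of the wrapping idea, written in C++ rather than D. The class name SafeFile and its members are hypothetical and are not the std.stdio design; the point is only that callers never see the raw FILE*, so they cannot close it twice or write through a stale handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical wrapper (SafeFile is not a real library type): it owns a
// FILE* and never hands the raw pointer to callers.
class SafeFile {
public:
    SafeFile(const std::string& path, const char* mode)
        : handle_(std::fopen(path.c_str(), mode)) {}

    ~SafeFile() { close(); }

    // Copying would lead to a double fclose, so it is disabled.
    SafeFile(const SafeFile&) = delete;
    SafeFile& operator=(const SafeFile&) = delete;

    // Moving transfers ownership and leaves the source closed.
    SafeFile(SafeFile&& other) noexcept : handle_(other.handle_) {
        other.handle_ = nullptr;
    }

    bool isOpen() const { return handle_ != nullptr; }

    // Returns false instead of touching a null or closed handle.
    bool write(const std::string& text) {
        if (handle_ == nullptr) return false;
        return std::fwrite(text.data(), 1, text.size(), handle_) == text.size();
    }

    // Safe to call more than once.
    void close() {
        if (handle_ != nullptr) {
            std::fclose(handle_);
            handle_ = nullptr;
        }
    }

private:
    std::FILE* handle_;
};
```

Every operation checks the handle first, which is the kind of guarantee a bare FILE* cannot give regardless of pointer arithmetic.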

Andrei

Sep 12 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Bill Baxter" wrote
 I think one thing to consider is what it will take to make a new
 container support and "play nice" with the regime proposed.  This
 touches on Andrei's point about being hard pressed to think of generic
 algorithms to run on an HMM, too.

 The first question is who do you want to "play nice" with?  If you're
 going to be writing functions specifically for that container, then
 you don't really have to play nice with anyone.  Your container just
 needs to have the operations necessary to support those functions.

 
 Bill, thanks so much for explaining it like this, I really agree with what 
 you say.  My concern is that iterator is going to become a 'bad word' and 
 considered a flawed design.
 
 But you are right, there is no need for iterators to be allowed for 
 std.algorithm, I totally agree with that, I just assumed Andrei meant 
 iterators would be discouraged for everything, including general use as 
 pointers into container objects.  If that is not the case, then I 
 wholeheartedly agree that algorithms should be restricted to ranges, and 
 iterators should be used only in container operations.

You are right. Iterators can definitely be handy in many situations, and 
it took me some hair pulling to figure out how to do moveToFront with 
ranges alone. (Then admittedly it's a pretty wicked algorithm no matter 
what.)

I don't want to discourage defining iterators; rather, I want to avoid 
forcing you to define them when you define a new range, and to avoid forcing 
people who want to use std.algorithm to learn them in addition to ranges.


Andrei

Sep 12 2008

Fawzi Mohamed <fmohamed mac.com> writes:

I like the new proposal much more than the first.

I believe you will be able to use it successfully in std.algorithm.
I still would have preferred an operation like a sameHead or 
compareHeadPosition (that might or might not return the order, but at 
least tests for equality) so that upon request (-debug flag?) one would 
be able to make all range operation safe (with overhead) in a generic 
way, but it is up to you.

What I really care about is the following:
I want foreach magic on all objects that support .done and .next, even 
if they are not ranges.
foreach is about iteration, iteration needs only .done and .next (a 
generator, iterator whatever), and it should work with that.
Do not force the range idea on foreach iteration.
foreach is a language construct, not a library one and should allow for 
maximum flexibility.

As extra nicety as each generator/iterator/range returns just one 
object I would like to be able to do:

// i counts starting from 1, j iterates on iterJ and in parallel k
// iterates on a.all
foreach(i,j,k;1..$,iterJ,a.all){
	//...
}

and have it expanded to

Range!(int) r1=1..$;
alias iterJ r2;
typeof(a.all) r3=a.all;
while(!(r1.done || r2.done || r3.done)){
	typeof(r1.next) i=r1.next;
	typeof(r2.next) j=r2.next;
	typeof(r3.next) k=r3.next;
	//...
}
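
The expansion above can be tried out today as a library function. What follows is a rough C++ sketch, not the proposed D syntax: Counter, VectorIter and forEachLockstep are hypothetical names, and the only interface assumed is the done/next pair discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generator that counts upward forever from a start value.
struct Counter {
    int current;
    bool done() const { return false; }
    int next() { return current++; }
};

// Hypothetical iterator over a vector exposing the same two members.
template <typename T>
struct VectorIter {
    const std::vector<T>* data;
    std::size_t pos;
    bool done() const { return pos == data->size(); }
    const T& next() { return (*data)[pos++]; }
};

// Lock-step loop: stops as soon as any of the three sources is exhausted,
// mirroring while(!(r1.done || r2.done || r3.done)).
template <typename R1, typename R2, typename R3, typename Body>
void forEachLockstep(R1 r1, R2 r2, R3 r3, Body body) {
    while (!(r1.done() || r2.done() || r3.done())) {
        auto i = r1.next();
        auto j = r2.next();
        auto k = r3.next();
        body(i, j, k);
    }
}
```

Note that nothing here requires the sources to be ranges; Counter has no end at all, yet the loop terminates because the shortest source decides.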

Fawzi

Sep 12 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Fawzi Mohamed wrote:
 I like the new proposal much more than the first.
 
 I believe you will be able to use it successfully in std.algorithm.
 I still would have preferred an operation like a sameHead or 
 compareHeadPosition (that might or might not return the order, but at 
 least tests for equality) so that upon request (-debug flag?) one would 
 be able to make all range operation safe (with overhead) in a generic 
 way, but it is up to you.

Comparing for equality of heads is very important. For now you can 
obtain it as a non-primitive by invoking:

auto sameHead = r.before(s).done;

The above also shows how "done" is not always very expressive. Also you 
can compare whether two ranges have the same end by invoking:

auto sameRange = r is s;
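
To make the two tests concrete, here is a small C++ model over a contiguous buffer. IntRange is a hypothetical stand-in for a D range, before() is assumed to return the part of r that precedes the start of s, and "is" is assumed to compare both boundaries.

```cpp
#include <cassert>

// Hypothetical range over a contiguous int buffer: [first, last).
struct IntRange {
    const int* first;
    const int* last;

    bool done() const { return first == last; }

    // The part of *this that lies before the start of `other`.
    // Assumes `other` starts somewhere inside [first, last].
    IntRange before(const IntRange& other) const {
        return IntRange{first, other.first};
    }
};

// r.before(s) is empty exactly when both ranges start at the same place.
inline bool sameHead(const IntRange& r, const IntRange& s) {
    return r.before(s).done();
}

// Assumed meaning of "r is s": identical start and identical end.
inline bool sameRange(const IntRange& r, const IntRange& s) {
    return r.first == s.first && r.last == s.last;
}
```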

 What I really care about is the following:
 I want foreach magic on all objects that support .done and .next, even 
 if they are not ranges.
 foreach is about iteration, iteration needs only .done and .next (a 
 generator, iterator whatever), and it should work with that.
 Do not force the range idea on foreach iteration.
 foreach is a language construct, not a library one and should allow for 
 maximum flexibility.

Yes. Walter asked me to send him the syntactic transformation that 
foreach and foreach_reverse need to do. Duck typing will be used so as 
long as you define the proper names you're in good shape.

 As extra nicety as each generator/iterator/range returns just one object 
 I would like to be able to do:
 
 // i counts starting from 1, j iterates on iterJ and in parallel k
 // iterates on a.all
 foreach(i,j,k;1..$,iterJ,a.all){
     //...
 }
 
 and have it expanded to
 
 Range!(int) r1=1..$;
 alias iterJ r2;
 typeof(a.all) r3=a.all;
 while(!(r1.done || r2.done || r3.done)){
     typeof(r1.next) i=r1.next;
     typeof(r2.next) j=r2.next;
     typeof(r3.next) k=r3.next;
     //...
 }

Walter and I were discussing ranges exposing key, key1, ... keyn. 
In that case foreach with multiple arguments would work, and would bind 
each of the extra arguments to key, key1 etc. respectively.


Andrei

Sep 12 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-09-12 17:48:02 +0200, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 I like the new proposal much more than the first.
 
 I believe you will be able to use it successfully in std.algorithm.
 I still would have preferred an operation like a sameHead or 
 compareHeadPosition (that might or might not return the order, but at 
 least tests for equality) so that upon request (-debug flag?) one would 
 be able to make all range operation safe (with overhead) in a generic 
 way, but it is up to you.

 
 Comparing for equality of heads is very important. For now you can 
 obtain it as a non-primitive by invoking:
 
 auto sameHead = r.before(s).done;

nice I hadn't thought about this

 The above also shows how "done" is not always very expressive. Also you 
 can compare whether two ranges have the same end by invoking:
 
 auto sameRange = r is s;

I suppose that you mean that "is" compares both the start and the end...

 What I really care about is the following:
 I want foreach magic on all objects that support .done and .next, even 
 if they are not ranges.
 foreach is about iteration, iteration needs only .done and .next (a 
 generator, iterator whatever), and it should work with that.
 Do not force the range idea on foreach iteration.
 foreach is a language construct, not a library one and should allow for 
 maximum flexibility.

 
 Yes. Walter asked me to send him the syntactic transformation that 
 foreach and foreach_reverse need to do. Duck typing will be used so as 
 long as you define the proper names you're in good shape.

very nice, this is important because, generic algorithms aside, you might 
want to loop on all sorts of things.

 As extra nicety as each generator/iterator/range returns just one 
 object I would like to be able to do:
 
 // i counts starting from 1, j iterates on iterJ and in parallel k
 // iterates on a.all
 foreach(i,j,k;1..$,iterJ,a.all){
     //...
 }
 
 and have it expanded to
 
 Range!(int) r1=1..$;
 alias iterJ r2;
 typeof(a.all) r3=a.all;
 while(!(r1.done || r2.done || r3.done)){
     typeof(r1.next) i=r1.next;
     typeof(r2.next) j=r2.next;
     typeof(r3.next) k=r3.next;
     //...
 }

 
 Walter and I were discussing about ranges exposing key, key1, ... keyn. 
 In that case foreach with multiple arguments would work, and would bind 
 each of the extra argument to key, key1 etc. respectively.

I like the possibility to give several iterators at once to foreach, so 
that you never have to define two opApply (one with index, one 
without), but you can easily add a counter if you want to.

You can also solve this by having a "combiner" of iterators, but in my 
opinion it is uglier.

If you allow an iterator to return several objects and also allow 
several iterators to be advanced together, you should use a different syntax 
from the one I proposed, something like

foreach(i;1..$; j; iterJ; k,l; multiIter){

}

otherwise matching iteration variables with iterators gets a mess.

Fawzi

Sep 12 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Fri, 12 Sep 2008 20:10:28 +0400, Fawzi Mohamed <fmohamed mac.com> wrote:

 On 2008-09-12 17:48:02 +0200, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 foreach(i,j,k;1..$,iterJ,a.all){
     //...
 }



Foreach over multiple ranges in parallel is great, but it is quite hard to  
match key/value to the ranges in your example, because they are far from  
each other, especially if ranges are evaluated in some (possibly long)  
expressions.

I prefer the following syntax more:

foreach (key0, value0 : range0; value1 : range1; ... ) { // or something like this
}

This way key/value and range are close to each other and you don't need to  
look back and forth to understand which range a given value  
corresponds to.

Sep 12 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Sat, Sep 13, 2008 at 3:21 AM, Denis Koroskin <2korden gmail.com> wrote:
 On Fri, 12 Sep 2008 20:10:28 +0400, Fawzi Mohamed <fmohamed mac.com> wrote:

 On 2008-09-12 17:48:02 +0200, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 foreach(i,j,k;1..$,iterJ,a.all){
    //...
 }



 Foreach over multiple ranges in parallel is great, but it is quite hard to
 match key/value to the ranges in your example, because they are far from
 each other, especially if ranges are evaluated in some (possibly long)
 expressions.

 I prefer the following syntax more:

 foreach (key0, value0 : range0; value1 : range1; ... ) { // or something like this
 }

 This way key/value and range are close to each other and you don't need to
 look back and forth to understand which range a given value
 corresponds to.

Err, you just repeated exactly what he said.

--bb

Sep 12 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Sat, Sep 13, 2008 at 7:28 AM, Bill Baxter <wbaxter gmail.com> wrote:
 On Sat, Sep 13, 2008 at 3:21 AM, Denis Koroskin <2korden gmail.com> wrote:
 On Fri, 12 Sep 2008 20:10:28 +0400, Fawzi Mohamed <fmohamed mac.com> wrote:

 On 2008-09-12 17:48:02 +0200, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 Fawzi Mohamed wrote:
 foreach(i,j,k;1..$,iterJ,a.all){
    //...
 }



 Foreach over multiple ranges in parallel is great, but it is quite hard to
 match key/value to the ranges in your example, because they are far from
 each other, especially if ranges are evaluated in some (possibly long)
 expressions.

 I prefer the following syntax more:

 foreach (key0, value0 : range0; value1 : range1; ... ) { // or something like this
 }

 This way key/value and range are close to each other and you don't need to
 look back and forth to understand which range a given value
 corresponds to.

 Err, you just repeated exactly what he said.

Ok sorry I do see a difference now, but you quoted the wrong one of
Fawzi's,  you should have quoted this one:

foreach(i;1..$; j; iterJ; k,l; multiIter){

}

Which I think falls into your "or something like this" category.

--bb

Sep 12 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Steven Schveighoffer <schveiguy yahoo.com> wrote:
 What I see as the biggest downside is the cumbersome and verbose code of 
 moving the 'iterator' around, as every time I want to move forward, I 
 construct a new range, and every time I want to move backwards I construct a 
 new range (and construct a new 'center' afterwards).  So a 'move back one' 
 looks like:
 
 auto before = all.before(center);
 if(!before.isEmpty)
   center = before.pop.end;
 
 And to move forward it's:
 auto after = all.after(center);
 if(!after.isEmpty)
   center = after.next.begin;
 
 To get the value there, I have to do:
 all.after(center).left // or whatever gets decided as the 'get first value 
 of range' member
 
 or if opStar is used:
 
 *all.after(center);
 
 I much prefer:
 
 forward:
 if(center != list.end)
     ++center;
 
 reverse:
 if(center != list.begin)
    --center;
 
 get value:
 *center;
 
 Especially without all the extra overhead
 
 I see both methods as being just as open to mistakes, the first more-so, and 
 more difficult to comprehend (at least for me).

Yes, these are valid points, I completely agree.  But there are also 
other points.  Let me voice some of them.

1.  Probably most important.  You say here that ranges suck at 
incremental bidirectional iteration over a linked list, and Bill also 
agrees.  This seems true.  But this sort of iteration is not a goal 
in itself.  It's just an idiomatic *iterator* solution for a range of 
real-world problems.  I can't think of any such problem off the top of 
my head but it's probably a matter of my education.  Bill proposed one 
already.

I want to say that I believe that for any such real-world problem there 
is a range solution that's probably better than a direct mapping of an 
existing iterator solution.  The analogy would be trying to write in C++ 
as if you were using Python, or Haskell, and then declare that C++ sucks 
because it requires bulky, inefficient and error-prone code to implement 
simple functional idioms.  Different languages require different idioms 
and ranges are a different language from iterators.

For instance, Bill's undo/redo stack consisted of two entities: a list 
of operations, and a cursor. It was OK and natural with iterators.  It 
sucks with ranges.  Okay, I'm also going to use two entities: an undo stack 
and a redo stack. Undo => pop from one, push to another.  New operation 
=> push undo, trash redo.  Doesn't it look simpler and safer?  Well, it 
doesn't use ranges, at least from the user perspective, so what?
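
The two-stack scheme is short enough to write out. This is a C++ sketch under the assumption that an operation can be represented by a string label; the History class and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical undo/redo history built from two stacks instead of a
// list plus a cursor.
class History {
public:
    // New operation => push undo, trash redo.
    void record(const std::string& op) {
        undo_.push_back(op);
        redo_.clear();
    }

    // Undo => pop from the undo stack, push to the redo stack.
    bool undo() {
        if (undo_.empty()) return false;
        redo_.push_back(undo_.back());
        undo_.pop_back();
        return true;
    }

    // Redo is the mirror image of undo.
    bool redo() {
        if (redo_.empty()) return false;
        undo_.push_back(redo_.back());
        redo_.pop_back();
        return true;
    }

    std::size_t undoDepth() const { return undo_.size(); }
    std::size_t redoDepth() const { return redo_.size(); }

private:
    std::vector<std::string> undo_;
    std::vector<std::string> redo_;
};
```

There is no cursor that could end up pointing at a stale or trashed node; each stack only ever changes at its top.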

I also believe that a regular expression engine would benefit from using 
ranges rather than suffer.

2.  There was a special case that a center marker couldn't have been 
dereferenced.  Let's imagine you really needed it.  OK, no problem.  
Let's create a special kind of ranges, Cursor.  Cursor always contains 
one element.  Its begin points to that element and its end is calculated 
so that it always points after that element no matter what.  This is as 
valid as having an iterator pointing to that same element because you 
must guarantee in both cases that an iterator is dereferenceable.  This 
is ad-hoc, yes, but an EOF iterator in C++ is no less ad-hoc.
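
A rough C++ rendering of such a Cursor over a doubly linked list. The class is hypothetical; the moveForward and moveBack members are an addition for illustration and take the owning list as a parameter because, as with ranges in general, the cursor itself knows nothing about what lies outside it.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical single-element range: begin() points at the element and
// end() is always computed as "one past begin()".
template <typename T>
class Cursor {
public:
    typedef typename std::list<T>::iterator Iter;

    // Precondition: pos is dereferenceable (not the list's end()).
    explicit Cursor(Iter pos) : pos_(pos) {}

    Iter begin() const { return pos_; }
    Iter end() const { return std::next(pos_); }
    T& value() const { return *pos_; }

    // Illustrative extras: step within the owning list, refusing to
    // leave it so the cursor always stays dereferenceable.
    bool moveForward(std::list<T>& owner) {
        Iter following = std::next(pos_);
        if (following == owner.end()) return false;
        pos_ = following;
        return true;
    }

    bool moveBack(std::list<T>& owner) {
        if (pos_ == owner.begin()) return false;
        --pos_;
        return true;
    }

private:
    Iter pos_;
};
```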

Sep 11 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 I am also sure that if I sat down long enough contemplating my navel I
 could come with more examples of iterators=good/ranges=bad.
 <snip>

 
 Now you're just being rude :)  Please note that I'm not attacking you 
 personally.  All I'm pointing out is that your solution solves certain 
 problems VERY well, but leaves other problems not solved.  I think allowing 
 iterators/cursors would solve all the problems.  I might be proven wrong, 
 but certainly I don't think you've done that so far.  I'd love to be proven 
 wrong, since I agree that iterators are generally unsafe.

Didn't mean to. You are making great points, and I hope (without being 
sure) they can be addressed. The "contemplating navel" thing is a fave 
quote of mine from Bjarne's book on C++.

Andrei

Sep 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 I am also sure that if I sat down long enough contemplating my navel I
 could come with more examples of iterators=good/ranges=bad.
 <snip>

 Now you're just being rude :)  Please note that I'm not attacking you 
 personally.  All I'm pointing out is that your solution solves certain 
 problems VERY well, but leaves other problems not solved.  I think 
 allowing iterators/cursors would solve all the problems.  I might be 
 proven wrong, but certainly I don't think you've done that so far.  I'd 
 love to be proven wrong, since I agree that iterators are generally 
 unsafe.

 Didn't mean to. You are making great points, and I hope (without being 
 sure) they can be addressed. The "contemplating navel" thing is a fave 
 quote of mine from Bjarne's book on C++.

Didn't know that :)  Sometimes when someone is not aware of a quote/joke, it 
seems more personally motivated.  I agree that our discussion is not 
bringing either of us to the other's side.  I'm also hopeful the points can 
be addressed with ranges.

-Steve

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Sep 11, 2008 at 1:45 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 But upon further reflection I think it may be that it's just not what
 I would call a bidirectional range.  By that I mean it's not good at
 solving the problems that a bidirectional iterator in C++ is good for.

 It's good. I proved that constructively for std.algorithm, which of course
 doesn't stand. But I've also proved it theoretically informally to myself.
 Please imagine an algorithm that bidir iterators do and bidir ranges don't.

Here's one from DinkumWare's <algorithm>:

template<class _BidIt1,
	class _BidIt2,
	class _BidIt3> inline
	_BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
		_BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest, _Range_checked_iterator_tag)
	{	// merge backwards to _Dest, using operator<
	for (; ; )
		if (_First1 == _Last1)
			return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
		else if (_First2 == _Last2)
			return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
		else if (_DEBUG_LT(*--_Last2, *--_Last1))
			*--_Dest = *_Last1, ++_Last2;
		else
			*--_Dest = *_Last2, ++_Last1;
	}


You can probably work around it some way, but basically it's using the
ability to ++ and -- on the same end as a sort of "peek next".

Here's another, an insertion sort:

template<class _BidIt,
	class _Ty> inline
	void _Insertion_sort1(_BidIt _First, _BidIt _Last, _Ty *)
	{	// insertion sort [_First, _Last), using operator<
	if (_First != _Last)
		for (_BidIt _Next = _First; ++_Next != _Last; )
			{	// order next element
			_BidIt _Next1 = _Next;
			_Ty _Val = *_Next;

			if (_DEBUG_LT(_Val, *_First))
				{	// found new earliest element, move to front
				_STDEXT unchecked_copy_backward(_First, _Next, ++_Next1);
				*_First = _Val;
				}
			else
				{	// look for insertion point after first
				for (_BidIt _First1 = _Next1;
					_DEBUG_LT(_Val, *--_First1);
					_Next1 = _First1)
					*_Next1 = *_First1;	// move hole down
				*_Next1 = _Val;	// insert element in hole
				}
			}
	}

I /think/ that's taking advantage of going both ways on the same
iterator (or at least copies of the same iterator), but the code is a
little hard to read.

Part of my argument here is that it's more natural and requires less
cognitive load to think of things in terms of moving a cursor back and
forth.  So you won't convince me by constructing clever range unions
and differences to achieve the same thing as a simple ++ and -- can
do. :-)

Also a cursor that can go forward and backwards in between two limits
is exactly what is easy to do with a doubly linked list.  If you know
how to use a doubly linked list you know how to use my version of
bidir ranges.  That's true in all cases where you are using a
doubly-linked list.  For yours you have to think about how to map what
you want to do onto the operations that are actually available.  To me
that's clearly a greater cognitive load.

Another example is a function that is supposed to put a value back
into its proper sorted place.  Say you had a sorted list and now the
value of one node has been modified.  Write the function that puts
that value back in its rightful place.  The natural way to do it is
with a range that has a cursor pointing to the modified node that can
be moved either back or forward.
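
In iterator terms that function might look like the following C++ sketch. The name restoreOrder is hypothetical, and the list is assumed to be sorted ascending everywhere except at the modified node.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Moves the node at `pos` to where it belongs in a list that is sorted
// except for that one node. The cursor walks back or forward as needed.
inline void restoreOrder(std::list<int>& xs, std::list<int>::iterator pos) {
    std::list<int>::iterator target = pos;

    // Walk backward while the predecessor is larger than the value.
    while (target != xs.begin() && *std::prev(target) > *pos) {
        --target;
    }

    if (target == pos) {
        // Nothing larger before it, so walk forward past smaller values.
        target = std::next(pos);
        while (target != xs.end() && *target < *pos) {
            ++target;
        }
    }

    // Relink the node in front of `target`; no elements are copied and
    // `pos` stays valid. This is a no-op if the node is already in place.
    xs.splice(target, xs, pos);
}
```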

Also I see the function "std::advance" is used quite a lot in this
implementation of std::algorithm.  That moves the cursor forwards or
backwards N steps depending on the sign of N.
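
For reference, a tiny C++ example of that behaviour. The helper peekRelative is hypothetical; std::advance itself is standard, and a negative step count is only valid for bidirectional (or stronger) iterators.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Returns the element n steps away from pos; n may be negative because
// std::list iterators are bidirectional. The caller must keep the
// result inside the list.
inline int peekRelative(std::list<int>::iterator pos, int n) {
    std::advance(pos, n);  // moves the local copy forward or backward
    return *pos;
}
```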

  Your bidir range may be useful (though I'm not really convinced that
 very many algorithms need what it provides) --  but I think one also
 needs an iterator that's good at what C++'s bidir iterators are good
 at, i.e. moving the active cursor backwards or forwards.  I would call
 your construct more of a "double-headed" range than a bidirectional
 one.

 Oh, one more thing. If you study any algorithm that uses bidirectional
 iterators (such as reverse or Stepanov's partition), you'll notice that
 ALWAYS WITHOUT EXCEPTION there's two iterators involved. One moves up, the
 other moves down. This is absolutely essential because it tells that a
 bidirectional range models all a bidirectional iterator could ever do. If
 you can move some bidirectional iterator down, then definitely you know its
 former boundary so you can model that move with a bidirectional range.

This does seem to be true of many of the algorithms that use bidirs
in std::algorithm, which did surprise me.  Actually seems to me that
these types of algorithms are only using bidirectional iterators for a
technicality -- because you can't compare a forward iterator and a
reverse iterator.  The bidirectionality of the iterator is not really
material.  One only needs the ++ op for one and the -- op for the
other.  That says to me the name of the range that does these two
things should be something other than "bidirectional", because
bidirectionality is not really the key property.  "Two-headed range"
or "squeeze range" or "pinch range" might be good names.
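
The reverse algorithm is a compact illustration of that "two-headed" use. A C++ sketch follows, with TwoHeaded as a hypothetical range type and member names that are assumptions rather than the names in the draft.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical two-headed range over contiguous storage: [lo, hi).
// One head only ever moves up, the other only ever moves down.
template <typename T>
struct TwoHeaded {
    T* lo;
    T* hi;

    bool empty() const { return lo == hi; }
    T& front() const { return *lo; }
    T& back() const { return *(hi - 1); }
    void popFront() { ++lo; }
    void popBack() { --hi; }
};

// Swap the outermost pair, then shrink from both sides until the heads
// meet. The range never grows and neither head reverses direction.
template <typename T>
void reverseInPlace(TwoHeaded<T> r) {
    while (!r.empty()) {
        if (&r.front() == &r.back()) break;  // single element left
        std::swap(r.front(), r.back());
        r.popFront();
        r.popBack();
    }
}
```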

But anyway, I am convinced that your shrinking range type is useful.

 This is fundamental. Ranges NEVER grow. They ALWAYS shrink. Why? Simple:
 because a range has no idea what's outside of itself. It starts life with
 information of its limits from the container, and knows nothing about what's
 outside those limits. Consequently it ALWAYS WITHOUT EXCEPTION shrinks.

Doesn't seem to be quite so absolute from my perusal of std::algorithm.

--bb

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 Part of my argument here is that it's more natural and requires less
 cognitive load to think of things in terms of moving a cursor back and
 forth.  So you won't convince me by constructing clever range unions
 and differences to achieve the same thing as a simple ++ and -- can
 do. :-)

I agree, and I agreed in the draft on ranges, that code using ranges can 
on occasion be more awkward than code using iterators. I think their 
advantages do outweigh this disadvantage.

 This is fundamental. Ranges NEVER grow. They ALWAYS shrink. Why? Simple:
 because a range has no idea what's outside of itself. It starts life with
 information of its limits from the container, and knows nothing about what's
 outside those limits. Consequently it ALWAYS WITHOUT EXCEPTION shrinks.

 
 Doesn't seem to be quite so absolute from my perusal of std::algorithm.

Code using iterators will naturally avail itself of all of their 
advantages. Code using ranges will do the same. From my experience with 
rewriting std.algorithm, the working style is a bit different. On 
occasion iterators are indeed more flexible. But overall my code has 
shrunk in size and become safer because ranges are a higher-level 
abstraction. Also often code using ranges is easier to follow because 
there are fewer variables with more apparent meaning, and the progress 
of the algorithm is easier to follow by tracking range shrinking.


Andrei

Sep 10 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Bill Baxter <wbaxter gmail.com> wrote:
 Here's one from DinkumWare's <algorithm>:
 
 template<class _BidIt1,
 	class _BidIt2,
 	class _BidIt3> inline
 	_BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
 		_BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest, _Range_checked_iterator_tag)
 	{	// merge backwards to _Dest, using operator<
 	for (; ; )
 		if (_First1 == _Last1)
 			return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
 		else if (_First2 == _Last2)
 			return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
 		else if (_DEBUG_LT(*--_Last2, *--_Last1))
 			*--_Dest = *_Last1, ++_Last2;
 		else
 			*--_Dest = *_Last2, ++_Last1;
 	}
 
 
 You can probably work around it some way, but basically it's using the
 ability to ++ and -- on the same end as a sort of "peek next".

They're using the ability to ++ and -- to avoid post-decrement at any 
cost.  Otherwise it'd be just

 		else if (_DEBUG_LT(*_Last2, *_Last1))
 			*--_Dest = *_Last1--;
 		else
 			*--_Dest = *_Last2--;

Now the same algorithm in ranges:

 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];
         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

If there were shrink-on-read and (eureka!) shrink-on-write operations, 
it would be even shorter:

         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());

where both getBack() and putBack() shrink the range from the end side.
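
For anyone who wants to run it, here is a C++ sketch of the getBack/putBack variant. IntSpan is a hypothetical stand-in for the range type. Two details differ from the pseudocode above and are my reading rather than anything settled in the thread: the loop returns once the sources are drained, and the larger of the two last elements is the one written, which matches the _DEBUG_LT(*--_Last2, *--_Last1) test in the DinkumWare version.

```cpp
#include <cassert>
#include <vector>

// Hypothetical range over contiguous ints: [lo, hi).
struct IntSpan {
    int* lo;
    int* hi;

    bool isEmpty() const { return lo == hi; }
    int& last() const { return *(hi - 1); }

    // Shrink-on-read: return the last element and drop it.
    int getBack() { return *--hi; }

    // Shrink-on-write: store into the last slot and drop it.
    void putBack(int value) { *--hi = value; }
};

// Merges two ascending ranges into dst, filling dst from its back.
// Preconditions: dst holds exactly as many elements as s1 and s2
// together, and dst does not overlap either source.
inline void mergeBackward(IntSpan s1, IntSpan s2, IntSpan dst) {
    while (!s1.isEmpty() && !s2.isEmpty()) {
        if (s2.last() < s1.last())
            dst.putBack(s1.getBack());  // s1 holds the larger tail
        else
            dst.putBack(s2.getBack());
    }
    // At most one of these loops runs; it copies the leftover source.
    while (!s1.isEmpty()) dst.putBack(s1.getBack());
    while (!s2.isEmpty()) dst.putBack(s2.getBack());
}
```

All three ranges only shrink, and because dst carries its own bounds there is no way to write before its start.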

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Bill Baxter <wbaxter gmail.com> wrote:
 Here's one from DinkumWare's <algorithm>:

 template<class _BidIt1,
 	class _BidIt2,
 	class _BidIt3> inline
 	_BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
 		_BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest, _Range_checked_iterator_tag)
 	{	// merge backwards to _Dest, using operator<
 	for (; ; )
 		if (_First1 == _Last1)
 			return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
 		else if (_First2 == _Last2)
 			return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
 		else if (_DEBUG_LT(*--_Last2, *--_Last1))
 			*--_Dest = *_Last1, ++_Last2;
 		else
 			*--_Dest = *_Last2, ++_Last1;
 	}


 You can probably work around it some way, but basically it's using the
 ability to ++ and -- on the same end as a sort of "peek next".

 
 They're using the ability to ++ and -- to avoid post-decrement at any 
 cost.  Otherwise it'd be just
 
 		else if (_DEBUG_LT(*_Last2, *_Last1))
 			*--_Dest = *_Last1--;
 		else
 			*--_Dest = *_Last2--;

 
 Now the same algorithm in ranges:
 
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];
         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

 
 If there were shrink-on-read and (eureka!) shrink-on-write operations, 
 it would be even shorter:
 
         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());

 
 where both getBack() and putBack() shrink the range from the end side.

Got to say I'm pretty much in awe :o). But (without thinking much about 
it) I think the assignments dst[] = s1[] and dst[] = s2[] should be 
replaced with calls to copy(retro(sx), retro(dst)). No?

Andrei

Sep 10 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Bill Baxter <wbaxter gmail.com> wrote:
 Here's one from DinkumWare's <algorithm>:

 template<class _BidIt1,
 	class _BidIt2,
 	class _BidIt3> inline
 	_BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
 		_BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest, _Range_checked_iterator_tag)
 	{	// merge backwards to _Dest, using operator<
 	for (; ; )
 		if (_First1 == _Last1)
 			return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
 		else if (_First2 == _Last2)
 			return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
 		else if (_DEBUG_LT(*--_Last2, *--_Last1))
 			*--_Dest = *_Last1, ++_Last2;
 		else
 			*--_Dest = *_Last2, ++_Last1;
 	}


 You can probably work around it some way, but basically it's using the
 ability to ++ and -- on the same end as a sort of "peek next".

 
 They're using the ability to ++ and -- to avoid post-decrement at any 
 cost.  Otherwise it'd be just
 
 		else if (_DEBUG_LT(*_Last2, *_Last1))
 			*--_Dest = *_Last1--;
 		else
 			*--_Dest = *_Last2--;

 
 Now the same algorithm in ranges:
 
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];
         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

 
 If there were shrink-on-read and (eureka!) shrink-on-write operations, 
 it would be even shorter:
 
         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());

 
 where both getBack() and putBack() shrink the range from the end side.

 
 Got to say I'm pretty much in awe :o). But (without thinking much about 
 it) I think the assignments dst[] = s1[] and dst[] = s2[] should be 
 replaced with calls to copy(retro(sx), retro(dst)). No?

They originally use backward copying because they don't know where the 
destination range starts, long live buffer overrun.  In case of ranges 
the destination range is well defined and there are no overlaps---that 
is, I believe this algorithm doesn't support using the same buffer as 
source and destination.  So slice copying should be OK.

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Sergey Gromov wrote:
 Bill Baxter <wbaxter gmail.com> wrote:
 Here's one from DinkumWare's <algorithm>:

 template<class _BidIt1,
 	class _BidIt2,
 	class _BidIt3> inline
 	_BidIt3 _Merge_backward(_BidIt1 _First1, _BidIt1 _Last1,
 		_BidIt2 _First2, _BidIt2 _Last2, _BidIt3 _Dest, _Range_checked_iterator_tag)
 	{	// merge backwards to _Dest, using operator<
 	for (; ; )
 		if (_First1 == _Last1)
 			return (_STDEXT unchecked_copy_backward(_First2, _Last2, _Dest));
 		else if (_First2 == _Last2)
 			return (_STDEXT unchecked_copy_backward(_First1, _Last1, _Dest));
 		else if (_DEBUG_LT(*--_Last2, *--_Last1))
 			*--_Dest = *_Last1, ++_Last2;
 		else
 			*--_Dest = *_Last2, ++_Last1;
 	}


 You can probably work around it some way, but basically it's using the
 ability to ++ and -- on the same end as a sort of "peek next".

 They're using the ability to ++ and -- to avoid post-decrement at any 
 cost.  Otherwise it'd be just

 		else if (_DEBUG_LT(*_Last2, *_Last1))
 			*--_Dest = *_Last1--;
 		else
 			*--_Dest = *_Last2--;

 Now the same algorithm in ranges:

 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];
         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

 If there were shrink-on-read and (eureka!) shrink-on-write operations, 
 it would be even shorter:

         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());

 where both getBack() and putBack() shrink the range from the end side.

 Got to say I'm pretty much in awe :o). But (without thinking much about 
 it) I think the assignments dst[] = s1[] and dst[] = s2[] should be 
 replaced with calls to copy(retro(sx), retro(dst)). No?

 
 They originally use backward copying because they don't know where the 
 destination range starts, long live buffer overrun.  In case of ranges 
 the destination range is well defined and there are no overlaps---that 
 is, I believe this algorithm doesn't support using the same buffer as 
 source and destination.  So slice copying should be OK.

One up for ranges then. Whew. I was due for it :o).

Andrei

Sep 10 2008

Sean Kelly <sean invisibleduck.org> writes:

Sergey Gromov wrote:
 
 Now the same algorithm in ranges:
 
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];


I'm not sure the above is correct.  It should return after the copy is 
performed, and the code also assumes that the size of dst is equal to 
the size of s2 and s1, respectively.

         else if (s1.last < s2.last)
         {
             dst.last = s1.last;
             s1.shrink();
         }
         else
         {
             dst.last = s2.last;
             s2.shrink();
         }
         dst.shrink();
     }
 }

 
 If there were shrink-on-read and (eureka!) shrink-on-write operations, 
 it would be even shorter:
 
         else if (s1.last < s2.last)
             dst.putBack(s1.getBack());
         else
             dst.putBack(s2.getBack());

 
 where both getBack() and putBack() shrink the range from the end side.

Very slick.


Sean

Sep 10 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Sean Kelly <sean invisibleduck.org> wrote:
 Sergey Gromov wrote:
 
 Now the same algorithm in ranges:
 
 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];


 
 I'm not sure the above is correct.  It should return after the copy is 
 performed, and the code also assumes that the size of dst is equal to 
 the size of s2 and s1, respectively.

Of course there should be return statements, thank you.  I never tested 
this code (obviously), I just threw it together, so there are bound to 
be silly mistakes like this.

As to the destination size: this is the merge step of merge sort.  The 
size of the destination buffer must be the sum of the sizes of the source 
buffers.  As soon as one of the source buffers is empty, i.e. completely 
moved to the destination, there must be room for exactly what is left in 
the other source buffer.  If this condition doesn't hold, then the 
arguments weren't correct in the first place.
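
A minimal sketch of the merge with the return statements and the size 
precondition spelled out.  It is written in C++ so that it compiles 
today; the Span type and its isEmpty/last/shrink members are hypothetical 
stand-ins for the range primitives discussed in this thread, not an 
existing library API.  As in the Dinkumware version, it is the larger of 
the two last elements that gets written to the back of dst.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the proposed range: a pair of pointers
// exposing only the primitives used in this thread.
template <class T>
struct Span {
    T* b;   // first element
    T* e;   // one past the last element
    bool isEmpty() const { return b == e; }
    std::size_t length() const { return static_cast<std::size_t>(e - b); }
    T& last() const { assert(!isEmpty()); return e[-1]; }
    void shrink() { assert(!isEmpty()); --e; }  // drop the last element
};

// Copies src onto dst back to front; both must have the same length.
template <class T>
void copyBackward(Span<T> src, Span<T> dst) {
    assert(src.length() == dst.length());
    while (!src.isEmpty()) {
        dst.last() = src.last();
        src.shrink();
        dst.shrink();
    }
}

// Merges two sorted ranges into dst, filling dst from its back.
template <class T>
void mergeBackward(Span<T> s1, Span<T> s2, Span<T> dst) {
    // Precondition from the discussion: dst holds exactly s1 + s2.
    assert(dst.length() == s1.length() + s2.length());
    for (;;) {
        if (s1.isEmpty()) { copyBackward(s2, dst); return; }
        if (s2.isEmpty()) { copyBackward(s1, dst); return; }
        // The larger of the two back elements goes to the back of dst.
        if (s2.last() < s1.last()) {
            dst.last() = s1.last();
            s1.shrink();
        } else {
            dst.last() = s2.last();
            s2.shrink();
        }
        dst.shrink();
    }
}
```

Because the length check sits at the top, a caller that passes a 
destination of the wrong size fails immediately instead of overrunning 
the buffer.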

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 Sean Kelly <sean invisibleduck.org> wrote:
 Sergey Gromov wrote:
 Now the same algorithm in ranges:

 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];


 I'm not sure the above is correct.  It should return after the copy is 
 performed, and the code also assumes that the size of dst is equal to 
 the size of s2 and s1, respectively.

 
 Of course there should be return statements, thank you.  I've never 
 tested this code (obviously), just've thrown it together, so there ought 
 to be stupid mistakes like this.
 
 As to the destination size.  This is merge sort.  The size of 
 destination buffer must be the sum of the sizes of source buffers.  As 
 soon as one of the source buffers is empty, i.e. completely moved to the 
 destination, there must be place exactly for what left in another source 
 buffer.  If this condition doesn't hold then the arguments weren't 
 correct in the first place.

Speaking of copying, C++'s std::copy and friends have been under 
increasing scrutiny lately because of their inability to modularly 
protect data against overruns. STL's three-argument functions that copy 
out are often a kiss of death for inexperienced STL users. I'm glad 
ranges cut that Gordian knot.

Andrei

Sep 10 2008

Sean Kelly <sean invisibleduck.org> writes:

Sergey Gromov wrote:
 Sean Kelly <sean invisibleduck.org> wrote:
 Sergey Gromov wrote:
 Now the same algorithm in ranges:

 Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
 {
     for (;;)
     {
         if (s1.isEmpty())
             dst[] = s2[];
         else if (s2.isEmpty())
             dst[] = s1[];


 I'm not sure the above is correct.  It should return after the copy is 
 performed, and the code also assumes that the size of dst is equal to 
 the size of s2 and s1, respectively.

 
 Of course there should be return statements, thank you.  I've never 
 tested this code (obviously), just've thrown it together, so there ought 
 to be stupid mistakes like this.
 
 As to the destination size.  This is merge sort.  The size of 
 destination buffer must be the sum of the sizes of source buffers.  As 
 soon as one of the source buffers is empty, i.e. completely moved to the 
 destination, there must be place exactly for what left in another source 
 buffer.  If this condition doesn't hold then the arguments weren't 
 correct in the first place.

Oops, of course.


Sean

Sep 10 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Bill Baxter <wbaxter gmail.com> wrote:
 Here's another, an insertion sort:
 
 template<class _BidIt,
 	class _Ty> inline
 	void _Insertion_sort1(_BidIt _First, _BidIt _Last, _Ty *)
 	{	// insertion sort [_First, _Last), using operator<
 	if (_First != _Last)
 		for (_BidIt _Next = _First; ++_Next != _Last; )
 			{	// order next element
 			_BidIt _Next1 = _Next;
 			_Ty _Val = *_Next;
 
 			if (_DEBUG_LT(_Val, *_First))
 				{	// found new earliest element, move to front
 				_STDEXT unchecked_copy_backward(_First, _Next, ++_Next1);
 				*_First = _Val;
 				}
 			else
 				{	// look for insertion point after first
 				for (_BidIt _First1 = _Next1;
 					_DEBUG_LT(_Val, *--_First1);
 					_Next1 = _First1)
 					*_Next1 = *_First1;	// move hole down
 				*_Next1 = _Val;	// insert element in hole
 				}
 			}
 	}

This is a bit more complex.  If only basic operations on ranges are 
allowed, it looks like this:

 void Insertion_sort1(R, T)(R r)
 {
     if (!r.isEmpty())
     {
         R tail = r;
         do
         {
             tail.next();
             R head = r.before(tail);
             T _Val = head.last;
             if (_Val < head.first)
             {
                 R from, to;
                 from = to = head;
                 from.shrink();
                 to.next();
                 copy(retro(from), retro(to));
                 head.first = _Val;
             }
             else
             {
                 R head1 = head;
                 head1.shrink();
                 for (; _Val < head1.last; head.shrink(), head1.shrink())
                     head.last = head1.last;
                 head.last = _Val;
             }
         }
         while (!tail.isEmpty());
     }
 }

Though it starts to look much better if we employ 
shrink-on-read/shrink-on-write AND copy-on-shrink, AKA incremental slicing:

 void Insertion_sort1(R, T)(R r)
 {
     if (!r.isEmpty())
     {
         R tail = r;
         do
         {
             tail.next();
             R head = r.before(tail);
             T _Val = head.last;
             if (_Val < head.first)
             {
                 copy(retro(head[0..$-1]), retro(head[1..$]));
                 head.first = _Val;
             }
             else
             {
                 for (R head1 = head[0..$-1]; _Val < head1.last;)
                     head.putBack(head1.getBack());
                 head.last = _Val;
             }
         }
         while (!tail.isEmpty());
     }
 }
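
For comparison, here is a hedged sketch of the same idea in compilable 
C++.  The Span type and its first/last/next/shrink/before members are 
hypothetical stand-ins for the proposed primitives.  It is also 
simplified: instead of special-casing "new earliest element", the inner 
loop checks for an empty prefix before reading its last element, which 
keeps it from reading past the front.

```cpp
#include <cassert>

// Hypothetical pointer-pair range with the primitives used above.
template <class T>
struct Span {
    T* b;   // first element
    T* e;   // one past the last element
    bool isEmpty() const { return b == e; }
    T& first() const { assert(!isEmpty()); return *b; }
    T& last() const { assert(!isEmpty()); return e[-1]; }
    void next() { assert(!isEmpty()); ++b; }     // drop the first element
    void shrink() { assert(!isEmpty()); --e; }   // drop the last element
    // The part of *this that lies before the start of `tail`.
    Span before(const Span& tail) const { return Span{b, tail.b}; }
};

template <class T>
void insertionSort(Span<T> r) {
    if (r.isEmpty()) return;
    Span<T> tail = r;
    tail.next();                       // a one-element prefix is already sorted
    while (!tail.isEmpty()) {
        tail.next();
        Span<T> head = r.before(tail); // sorted prefix plus one new element
        T val = head.last();
        Span<T> hole = head;           // hole.last() is the free slot
        Span<T> prev = head;
        prev.shrink();                 // elements to the left of the slot
        // Testing isEmpty() first avoids reading past the front.
        while (!prev.isEmpty() && val < prev.last()) {
            hole.last() = prev.last(); // move the hole down
            hole.shrink();
            prev.shrink();
        }
        hole.last() = val;             // insert element in hole
    }
}
```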

Sep 10 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-09-10 14:35:29 +0200, "Bill Baxter" <wbaxter gmail.com> said:

 On Wed, Sep 10, 2008 at 7:47 AM, Fawzi Mohamed <fmohamed mac.com> wrote:
 
 2) All the methods with intersection of iterator in my opinion are
 difficult to memorize, and rarely used, I would scrap them.
 Instead I would add the comparison operation .atSamePlace(iterator!(T)y)
 that would say if two iterators are at the same place. With it one gets back
 all the power of pointers, and with a syntax and use that are
 understandable.

 
 But that comparison operation is not enough to implement anything of
 substance. Try your hand at a few classic algorithms and you'll see.

 
 are you sure? then a range is *exactly* equivalent to a STL iterator, only
 that it cannot go out of bounds:
 // left1-left2:
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
 i1.next;
 }
 // left2-left1:
 while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
 i2.next;
 }
 // union 1-2
 while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
 i1.next;
 }
 while(!i2.isEmpty){
 i2.next;
 }
 // union 2-1
 ...
 // lower triangle
 i1=c.all;
 while(!i1.isEmpty){
 i2=c.all;
 while(!i2.isEmpty && !i2.atSamePlace(i1)){
 i2.next;
 }

 
 Your code shows that you can successfully iterate over the same
 elements described by Andrei's various unions and differences, but
 they do not show how you would, say, pass that new range another
 function to do that job.  Such as you would want to do in say, a
 recursive sort.  Since in this design you can't set or access the
 individual iterator-like components of a range directly, being able to
 copy the begin or end iterator from one range over to another is
 necessary, I think.

Yes, you are right: on the simplest iterators this operation cannot be 
performed recursively without overhead (you can do it once, but then 
you need to store i1 and i2 in the new iterator; doing it again adds 
more and more overhead).
A range union can be used efficiently and safely only if the iterator 
has an order that can be easily checked.  This is a useful abstraction, 
but not the basic one.

 But I think you and I are in agreement that it would be easier and
 more natural to think of ranges as iterators augmented with
 information about bounds, as opposed to a contiguous block of things
 from A to B.
 
 well these are the operations that you can do on basically all iterators
 (and with wich you can define new iterators).
 The one you propose need an underlying total order that can be efficiently
 checked, for example iterators on trees do not have necessarily this
 property, and then getting your kind of intersection can be difficult (and
 not faster than the operation using atSamePlace.

 
 I don't think that's correct.  Andrei's system does not need a total
 order any more than yours does.  The unions and diffs just create new
 ranges by combining the components of existing ranges.  They don't
 need to know anything about what happens in between those points or
 how you get from one to the other.  Just take the "begin" of this guy
 and put it together with the "end" of that guy, for example.  It
 doesn't require knowing how to get from anywhere to anywhere to create
 that new range.

Well, if you don't have a total order that you can easily check, then 
this might be very unsafe.
Think of i1.begin..i2.begin: if i2.begin<i1.begin, you might miss that 
the range is empty and iterate forever...

Fawzi

Sep 10 2008

JAnderson <ask me.com> writes:

Hi Andrei,

I like the idea behind ranges.  I don't like C++'s / STL's long-winded 
syntax at all.  It's so large that it generally uses up several lines 
along with several typedefs etc...  All that work just to iterate over 
some data.  The longer things get the more error prone they get... how 
many times have I put a begin when I meant to put end *sigh*.

However I currently disagree on this point.

Andrei Alexandrescu wrote:
 Fine. So instead of saying:

 foreach (e; c.all) { ... }

 you can say

 foreach (e; c) { ... }

 I think that's some dubious savings.


I think it's useful to have the implicit range conversion.  Consider 
writing generic/template code.  Of course built-in arrays could provide 
.all, but then consider passing around ranges.  That would also mean 
all ranges would have a .all too (could we go .all.all.all, for 
instance?).  I'm all for compile-time checking; however, I think that 
an implicit .all (with, of course, an explicit option) will make it easy 
to change a function that once took an object to take a simple range.  
It would also make it easy to change from one way of getting at a range 
to another.

What about matrices?  They don't implement a default .all; they would 
provide something like .col and .row.

 Andrei

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

JAnderson wrote:

 Hi Andrei,

 I like the idea behind ranges.  I don't like C++'s / stl's long winded 
 syntax at all.  Its so large that it generally uses up several lines 
 along with several typedefs etc...  All that work just to iterate over 
 some data.  The longer things get the more error prone they get... how 
 many times have I put an begin when I meant to put end *sigh*.

 However I currently disagree on this point.

 Andrei Alexandrescu wrote:
  >
  > Fine. So instead of saying:
  >
  > foreach (e; c.all) { ... }
  >
  > you can say
  >
  > foreach (e; c) { ... }
  >
  > I think that's some dubious savings.

 I think its useful to have the implicit range conversion.  Consider 
 writing generic/template code.  Of course built in arrays could provide 
 the .all but then consider passing around ranges.  That would also mean 
 all ranges would also have a .all (could we go .all.all.all for 
 instance?).

There's no regression. There are containers and ranges. Containers have 
.all. Ranges don't.

I think you guys are making a good point; I'm undecided on what would be 
better. One not-so-cool part about implicit conversion to range is that 
all of a sudden all range operations spill into the container. So people 
try to call c.pop and it doesn't compile. (Why?) They get confused.

 I'm all for compile time checking however I think that 
 implicit .all (with of course an explicit option) will make it easy to 
 change a function that once took an object to take a simple range  Also 
 it would make it easy to change from one way of getting at a range to 
 another.

 What about matrices?  They don't implement default .all, they would 
 provide like .col and .row.

Bidimensional ones that is :o).

Andrei

Sep 10 2008

JAnderson <ask me.com> writes:

Andrei Alexandrescu wrote:
 JAnderson wrote:
 Hi Andrei,

 I like the idea behind ranges.  I don't like C++'s / stl's long winded 
 syntax at all.  Its so large that it generally uses up several lines 
 along with several typedefs etc...  All that work just to iterate over 
 some data.  The longer things get the more error prone they get... how 
 many times have I put an begin when I meant to put end *sigh*.

 However I currently disagree on this point.

 Andrei Alexandrescu wrote:
  >
  > Fine. So instead of saying:
  >
  > foreach (e; c.all) { ... }
  >
  > you can say
  >
  > foreach (e; c) { ... }
  >
  > I think that's some dubious savings.

 I think its useful to have the implicit range conversion.  Consider 
 writing generic/template code.  Of course built in arrays could 
 provide the .all but then consider passing around ranges.  That would 
 also mean all ranges would also have a .all (could we go .all.all.all 
 for instance?).

 There's no regression. There are containers and ranges. Containers have 
 .all. Ranges don't.

Just to be clear then.  Say you write something that works on arrays and 
objects.  Then you write:

void Foo(T)(T t)
{
	...
	foreach (i; t.all)
	{

	}
	...
}

Now I realize I want to use that function with a range as well as an 
object (it's a template after all).  Well, if .all isn't regressive then 
I can't.  Of course, if .all was implicit then I might have written:

void Foo(T)(T t)
{
	...
	foreach (i; t)
	{

	}
	...
}

But then again, .all is still available, so there's still a chance a 
coder might not realize that it's better to use the implicit value.  I'm 
beginning to think regressive would be useful either way.

Note of course generic code does not just apply to templates.  It also 
applies when I want to change a variable to a different type.  If .all 
is required (and non-regressive) then I have to go to all the places 
that value is used and change it.  It's the same reason auto is so awesome.

Of course .all adds an extra function you'd need to implement for custom 
ranges, but it could always be in the "range" mixin.
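
To make the "regressive" .all concrete, here is a rough sketch in 
compilable C++ (chosen only so it can be checked; the same shape would 
apply in D).  IntRange, IntBag and sum are hypothetical names invented 
for the illustration.  The point is that a range's all() returns the 
range itself, so generic code written against all() accepts containers 
and ranges alike.

```cpp
#include <cassert>
#include <vector>

// Hypothetical range over ints: a pair of pointers.
struct IntRange {
    const int* b;
    const int* e;
    bool isEmpty() const { return b == e; }
    int first() const { return *b; }
    void next() { ++b; }
    // A range's all() is the range itself, so r.all().all() is harmless.
    IntRange all() const { return *this; }
};

// Hypothetical container: owns its data and hands out a range via all().
struct IntBag {
    std::vector<int> items;
    IntRange all() const {
        return IntRange{items.data(), items.data() + items.size()};
    }
};

// Generic code written once against all(); works for both kinds of argument.
template <class T>
int sum(const T& t) {
    int total = 0;
    for (IntRange r = t.all(); !r.isEmpty(); r.next())
        total += r.first();
    return total;
}
```

With this, sum(bag) and sum(bag.all()) go through the same code path, 
which is the property I'm after when changing a variable from a 
container to a range.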

 I think you guys are making a good point; I'm undecided on what would be 
 better. One not-so-cool part about implicit conversion to range is that 
 all of a sudden all range operations spill into the container. So people 
 try to call c.pop and it doesn't compile. (Why?) They get confused.

 I'm all for compile time checking however I think that implicit .all 
 (with of course an explicit option) will make it easy to change a 
 function that once took an object to take a simple range  Also it 
 would make it easy to change from one way of getting at a range to 
 another.

 What about matrices?  They don't implement default .all, they would 
 provide like .col and .row.

 Bidimensional ones that is :o).

Of course :) being a games programmer, I only ever speak of one 
matrix type.

Just kidding.

 Andrei

Sep 10 2008

JAnderson <ask me.com> writes:

Andrei Alexandrescu wrote:
 JAnderson wrote:
 Hi Andrei,

 I like the idea behind ranges.  I don't like C++'s / stl's long winded 
 syntax at all.  Its so large that it generally uses up several lines 
 along with several typedefs etc...  All that work just to iterate over 
 some data.  The longer things get the more error prone they get... how 
 many times have I put an begin when I meant to put end *sigh*.

 However I currently disagree on this point.

 Andrei Alexandrescu wrote:
  >
  > Fine. So instead of saying:
  >
  > foreach (e; c.all) { ... }
  >
  > you can say
  >
  > foreach (e; c) { ... }
  >
  > I think that's some dubious savings.

 I think its useful to have the implicit range conversion.  Consider 
 writing generic/template code.  Of course built in arrays could 
 provide the .all but then consider passing around ranges.  That would 
 also mean all ranges would also have a .all (could we go .all.all.all 
 for instance?).

 There's no regression. There are containers and ranges. Containers have 
 .all. Ranges don't.

 I think you guys are making a good point; I'm undecided on what would be 
 better. One not-so-cool part about implicit conversion to range is that 
 all of a sudden all range operations spill into the container. So people 
 try to call c.pop and it doesn't compile. (Why?) They get confused.

I'm not sure that range operations need to spill over.  I was thinking
that foreach would be kinda like a template.  The foreach would do the
implicit conversion, i.e. something like (pseudo):

foreach(I,T)(I i, T t, delegate d)
{
	foreach (I i; t.all)
	{
		d();
	}
}

In fact anything that takes a range would implicitly convert (for they
too can be used inside generic code).  Of course, that would probably
require compiler support.

 I'm all for compile time checking however I think that implicit .all 
 (with of course an explicit option) will make it easy to change a 
 function that once took an object to take a simple range  Also it 
 would make it easy to change from one way of getting at a range to 
 another.

 What about matrices?  They don't implement default .all, they would 
 provide like .col and .row.

 Bidimensional ones that is :o).

 Andrei

Sep 10 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-09-08 23:57:49 -0400, "Manfred_Nowak" <svv1999 hotmail.com> said:

 Andrei Alexandrescu wrote:
 
 maybe "nextTo" or something could be more suggestive.

 
 r.tillBeg(s), r.tillEnd(s),
 r.fromBeg(s), r.fromEnd(s) ?
 
 -manfred

I'm not sure I like this because you have to be careful when reversing 
the iterating direction. With my previous proposal, you only had to 
change "next" for "pull" everywhere. With yours, it's "till" to "from" 
*and* "Beg" to "End", as the relationship is somewhat interleaved:

r.nextUntil(s) => r.tillBeg(s)
r.nextAfter(s) => r.tillEnd(s)

r.pullUntil(s) => r.fromEnd(s)
r.pullAfter(s) => r.fromBeg(s)

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Sep 08 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Tue, Sep 9, 2008 at 12:57 PM, Manfred_Nowak <svv1999 hotmail.com> wrote:
 Andrei Alexandrescu wrote:

 maybe "nextTo" or something could be more suggestive.

 r.tillBeg(s), r.tillEnd(s),
 r.fromBeg(s), r.fromEnd(s) ?

Another idea might be to go back to Intro to Algebra with the "FOIL"
method for first, outer, inner, last.

Really you're trying to form the different elements of the cartesian
product of (rb,re) and (sb,se), so the "FOIL method" (a mnemonic for
multiplying binomials) tells you the resulting monomials are:

First:  (rb, sb)
Outer: (rb, se)
Inner: (re, sb)
Last: (re, se)

So you could have functions like:

fromFirsts(r,s)    aka  leftDiff(r,s)
fromLasts(s,r)    aka  rightDiff(r,s)  --- note the order reversal!
fromInner(r,s)     -- nonsense for ranges but would be "end of r to
beginning of s"
fromOuter(r,s)    aka leftUnion(r,s) aka rightUnion(s,r)

To me I think this way of decomposing the names makes it easier to
visualize what the things are doing.  I get no picture whatsoever from
"pullNext" and I think it's going to be really hard for me to remember
exactly what that does.  And leftUnion is also tough because it's
actually not a union of the two ranges, it's more like a union
followed by intersection with complement.

Thinking in terms of which components you're plucking out to make your
new iterator makes it easy for me to visualize.  But maybe not
everyone had the FOIL method drilled into their heads so thoroughly at
a young age like me.

Anyway I think it does suggest that maybe left and right union can
just be a single union op that goes from beginning of first arg to end
of second.  Maybe something like "span" would be a better name then.
And the precondition is that either r contains s, or vice versa.
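
A small sketch of the four combinations, in C++ only so that it can be 
compiled and checked.  Range here is a hypothetical pointer-pair type, 
and the function names are the ones suggested above rather than any 
existing API.  Each function just picks one endpoint from each argument; 
nothing iterates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical range: a pair of pointers into one underlying array.
struct Range {
    const int* b;   // first element
    const int* e;   // one past the last element
    std::size_t length() const { return static_cast<std::size_t>(e - b); }
};

// Each function picks one endpoint from r and one from s.
// All of them assume r and s refer to the same underlying sequence
// and that the chosen endpoints are in order.
Range fromFirsts(Range r, Range s) { return Range{r.b, s.b}; } // (rb, sb)
Range fromOuter(Range r, Range s)  { return Range{r.b, s.e}; } // (rb, se)
Range fromInner(Range r, Range s)  { return Range{r.e, s.b}; } // (re, sb)
Range fromLasts(Range r, Range s)  { return Range{r.e, s.e}; } // (re, se)
```

With r covering a whole array and s a sub-range of it, fromFirsts(r, s) 
is everything before s, fromLasts(s, r) is everything after s, and 
fromOuter(r, s) runs from the start of r to the end of s.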

--bb

Sep 08 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-09-08 23:43:11 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I like the alternate names quite some. One thing, however, is that head 
 and rear are not near-antonyms (which ideally they should be). Maybe 
 front and rear would be an improvement. (STL uses front and back). 
 Also, I may be dirty-minded, but somehow headNext just sounds... bad 
 :o).

Yeah, perhaps. I mostly wanted a verb, not "frontNext" which seems 
wrong, and "head" is both a noun and a verb so I kept it.

 I like the intersection functions as members because they clarify the 
 relationship between the two ranges, which is asymmetric. I will 
 definitely heed this suggestion. "Until" suggests iteration, however, 
 which it shouldn't be (should be constant time) so maybe "nextTo" or 
 something could be more suggestive.

Well, initially I thought about nextTo, but then it struck me as also 
meaning "the thing just after", which is not really it. I also thought 
about nextUpTo, but that's many capitals to type and many small words, 
and I preferred nextUntil even with the downside of sounding like we're 
iterating.

But perhaps we could get rid of next and replace it with a verb.

What about this terminology?

r.frontShift      // conceptually r.front; r.shift
r.putShift(e)     // conceptually r.front = e; r.shift

r.front
r.shift
r.shiftTo(s)
r.shiftAfter(s)

r.back
r.pull
r.pullTo(s)
r.pullAfter(s)
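
To show how the single-element names would read in use, here is a rough 
sketch in C++ (only so it can be compiled).  Range is a hypothetical 
pointer-pair type over ints; shiftTo, shiftAfter, pullTo and pullAfter 
are left out because they need a second range.

```cpp
#include <cassert>

// Hypothetical range over ints using the names proposed above.
struct Range {
    int* b;   // first element
    int* e;   // one past the last element
    bool isEmpty() const { return b == e; }

    int& front() const { assert(!isEmpty()); return *b; }
    int& back() const { assert(!isEmpty()); return e[-1]; }
    void shift() { assert(!isEmpty()); ++b; }  // drop the front element
    void pull() { assert(!isEmpty()); --e; }   // drop the back element

    // Read the front element, then shift past it.
    int frontShift() { int v = front(); shift(); return v; }
    // Write to the front element, then shift past it.
    void putShift(int v) { front() = v; shift(); }
};

// Copies src into dst front to back; dst must be at least as long as src.
inline void copyForward(Range src, Range dst) {
    while (!src.isEmpty())
        dst.putShift(src.frontShift());
}
```

The copy loop is one line, which is the appeal of having the combined 
read-and-shift and write-and-shift operations.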

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Sep 08 2008

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, el  8 de septiembre a las 22:43 me escribiste:
 I like the alternate names quite some. One thing, however, is that head
 and rear are not near-antonyms (which ideally they should be). Maybe
 front and rear would be an improvement. (STL uses front and back). Also,

What about head/tail? You certainly won't confuse STL refugees and
functional guys will be at home ;)

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
PROTESTA EN PLAZA DE MAYO: MUSICO SE COSIO LA BOCA
	-- Crónica TV

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Leandro Lucarella wrote:
 Andrei Alexandrescu, el  8 de septiembre a las 22:43 me escribiste:
 I like the alternate names quite some. One thing, however, is that head
 and rear are not near-antonyms (which ideally they should be). Maybe
 front and rear would be an improvement. (STL uses front and back). Also,

 
 What about head/tail? You certainly won't confuse STL refugees and
 functional guys will be at home ;)

You'll sure confuse the latter. To them, tail is everything except the 
head, e.g. a[1 .. $].

Andrei

Sep 09 2008

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, el  9 de septiembre a las 09:50 me escribiste:
 Leandro Lucarella wrote:
Andrei Alexandrescu, el  8 de septiembre a las 22:43 me escribiste:
I like the alternate names quite some. One thing, however, is that head
and rear are not near-antonyms (which ideally they should be). Maybe
front and rear would be an improvement. (STL uses front and back). Also,

What about head/tail? You certainly won't confuse STL refugees and
functional guys will be at home ;)

 
 You'll sure confuse the latter. To them, tail is everything except the head, 
 e.g. a[1 .. $].

You are right =/

Anyway, I think it's better to confuse some people coming from other
languages than to compromise D's readability...

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Karma police
arrest this man,
he talks in maths,
he buzzes like a fridge,
he's like a detuned radio.

Sep 09 2008

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

 r.rear

I think 'tail' would be better as the opposite of 'head'.

L.

Sep 08 2008

Jason House <jason.james.house gmail.com> writes:

Left and right Union and diff seem awkward. The kind of thing a rare user would
look up *every* time they use it or read code with it. I will make one
observation: requiring one range inside of the other leads to three logical
ranges:
* All elements "left" of the inner range
* All elements "right" of the inner range
* The inner range

In my mind that corresponds to "less than", "greater than", and "equal to".

Using this thought, here's how I think of the four awkward operations:
LeftDiff: <
LeftUnion: <=
RightUnion: >=
RightDiff: >

Maybe this could be done with some magic like r.range!(">=")(s)...

I hope this is clear. Long posts are tough to type on my cellphone...

Andrei Alexandrescu Wrote:

 Hello,
 
 
 Walter, Bartosz and myself have been hard at work trying to find the 
 right abstraction for iteration. That abstraction would replace the 
 infamous opApply and would allow for external iteration, thus paving the 
 way to implementing real generic algorithms.
 
 We considered an STL-style container/iterator design. Containers would 
 use the newfangled value semantics to enforce ownership of their 
 contents. Iterators would span containers in various ways.
 
 The main problem with that approach was integrating built-in arrays into 
 the design. STL's iterators are generalized pointers; D's built-in 
 arrays are, however, not pointers, they are "pairs of pointers" that 
 cover contiguous ranges in memory. Most people who've used D gained the 
 intuition that slices are superior to pointers in many ways, such as 
 easier checking for validity, higher-level compact primitives, 
 streamlined and safe interface. However, if STL iterators are 
 generalized pointers, what is the corresponding generalization of D's 
 slices? Intuitively that generalization should also be superior to 
 iterators.
 
 In a related development, the Boost C++ library has defined ranges as 
 pairs of two iterators and implemented a series of wrappers that accept 
 ranges and forward their iterators to STL functions. The main outcome of 
 Boost ranges has been to decrease the verboseness and perils of naked 
 iterator manipulation (see 
 http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a 
 C++ application using Boost could avail itself of containers, ranges, 
 and iterators. The Boost notion of range is very close to a 
 generalization of D's slice.
 
 We have considered that design too, but that raised a nagging question. 
 In most slice-based D programming, using bare pointers is not necessary. 
 Could then there be a way to use _only_ ranges and eliminate iterators 
 altogether? A container/range design would be much simpler than one also 
 exposing iterators.
 
 All these questions aside, there are several other imperfections in the 
 STL, many caused by the underlying language. For example STL is 
 incapable of distinguishing between input/output iterators and forward 
 iterators. This is because C++ cannot reasonably implement a type with 
 destructive copy semantics, which is what would be needed to make said 
 distinction. We wanted the Phobos design to provide appropriate answers 
 to such questions, too. This would be useful particularly because it 
 would allow implementation of true and efficient I/O integrated with 
 iteration. STL has made an attempt at that, but istream_iterator and 
 ostream_iterator are, with all due respect, a joke that builds on 
 another joke, the iostreams.
 
 After much thought and discussions among Walter, Bartosz and myself, I 
 defined a range design and reimplemented all of std.algorithm and much 
 of std.stdio in terms of ranges alone. This is quite a thorough test 
 because the algorithms are diverse and stress-test the expressiveness 
 and efficiency of the range design. Along the way I made the interesting 
 realization that certain union/difference operations are needed as 
 primitives for ranges. There are also a few bugs in the compiler and 
 some needed language enhancements (e.g. returning a reference from a 
 function); Walter is committed to implement them.
 
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:
 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
 
 
 Andrei

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jason House wrote:
 Left and right Union and diff seem awkward. The kind of thing a rare user
would look up *every* time they use it or read code with it. I will make one
observation: requiring one range inside of the other leads to three logical
ranges:
 - All elements "left" of the inner range
 - All elements "right" of the inner range
 - The inner range
 
 In my mind that corresponds to "less than", "greater than", and "equal to".
 
 Using this thought, here's how I think of the four awkward operations:
 LeftDiff: <
 LeftUnion: <=
 RightUnion: >=
 RightDiff: >
 
 Maybe this could be done with some magic like r.range!(">=")(s)...
 
 I hope this is clear. Long posts are tough to type on my cellphone...

Wow. I could never type as much on a cell. Thanks for the suggestion. I 
personally find it a bit cute, but interesting.
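
A minimal sketch of the operator-string idea quoted above, assuming plain D slices stand in for ranges and that s is a sub-slice of r. The free function range and its string parameter are hypothetical names taken from the suggestion, not part of the range document or of Phobos.

```d
// Hypothetical helper: r is the outer slice, s must be a sub-slice of r.
T[] range(string op, T)(T[] r, T[] s)
{
    immutable size_t lo = s.ptr - r.ptr;   // where s starts inside r
    immutable size_t hi = lo + s.length;   // where s ends inside r
    static if (op == "<")       return r[0 .. lo];   // LeftDiff
    else static if (op == "<=") return r[0 .. hi];   // LeftUnion
    else static if (op == ">=") return r[lo .. $];   // RightUnion
    else static if (op == ">")  return r[hi .. $];   // RightDiff
    else static assert(0, "unsupported operator: " ~ op);
}

void main()
{
    int[] r = [1, 2, 3, 4, 5];
    auto s = r[1 .. 3];                    // the inner range [2, 3]
    assert(r.range!("<")(s)  == [1]);
    assert(r.range!("<=")(s) == [1, 2, 3]);
    assert(r.range!(">=")(s) == [2, 3, 4, 5]);
    assert(r.range!(">")(s)  == [4, 5]);
}
```

The operator is resolved at compile time, so an unsupported string is rejected by the static assert rather than at run time.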

Andrei

Sep 09 2008

Jason House <jason.james.house gmail.com> writes:

Andrei Alexandrescu Wrote:

 Thanks for the suggestion. I 
 personally find it a bit cute, but interesting.

Is cute a bad thing?  If no better suggestions were made, I hoped it might help
find better names for leftDiff and friends. Of course, a better suggestion
was made: r.toBegin(s), r.toEnd(s), ...

Sep 09 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 I put together a short document for the range design. I definitely missed
 about a million things and have been imprecise about another million, so
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

Small typo:

"which opens forward ranges to much more many algorithms"

--bb

Sep 08 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I put together a short document for the range design. I definitely missed
 about a million things and have been imprecise about another million, so
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 
 Small typo:
 
 "which opens forward ranges to much more many algorithms"

How do I fix it?

Andrei

Sep 09 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Tue, Sep 9, 2008 at 7:53 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I put together a short document for the range design. I definitely missed
 about a million things and have been imprecise about another million, so
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 Small typo:

 "which opens forward ranges to much more many algorithms"

 How do I fix it?

Just make it "to many more algorithms" instead of "to much more many
algorithms".

--bb

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 7:53 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I put together a short document for the range design. I definitely missed
 about a million things and have been imprecise about another million, so
 feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 Small typo:

 "which opens forward ranges to much more many algorithms"

 How do I fix it?

 
 Just make it "to many more algorithms" instead of "to much more many
 algorithms".

Thanks, fixed.

Andrei

Sep 09 2008

bearophile <bearophileHUGS lycos.com> writes:

I don't have enough experience to have a clear view on the whole subject, so I
write only some general comments:

1) In time I have seen that everyone that likes/loves language X wants D to
become like X. This is true for Andrei too, you clearly like C++ a lot. The
worst fault of C++ is excessive complexity, that's the single fault that's
killing it, that has pushed people to invent Java, and so on. Walter has
clearly tried hard to make D a less complex language. This means that D is
sometimes less powerful than C++, and every smart programmer can see that it's
less powerful, but having 15% less power is a good deal if you have 1/3 of the
complexity. As you may guess I don't like C++ (even if I like its efficiency
I'm not going to use it for real coding), so I don't like D to become just a
resyntaxed C++. Note that originally D was more like a statically compiled
Java, and while that's not perfect, that's probably better than a resyntaxed
C++. So I suggest less power, if this decreases a lot what the programmer has
to keep in the mind while programming. If you don't believe me take a look at
what the blurb of D says:

"D is a systems programming language. Its focus is on combining the power and
high performance of C and C++ with the programmer productivity of modern
languages like Ruby and Python. Special attention is given to the needs of
quality assurance, documentation, management, portability and reliability."

As you see it's a matter of balance.

2) Keep in mind that not all D programmers are lovers of C++ and they don't all
come from C++, some of them are actually lovers of other languages, like Java,



3) opApply has some problems, but it isn't infamous.

4) Syntax matters, and function/method/attribute names too. So they have to
be chosen with care. I suggest using single words when possible, and
considering "alllowercase" names too (I know this contrasts with the D style guide).

4b) Copying function/method/attribute names from C++ isn't bad if people think
such names are good. Inventing different names just to be different is not good.

5) The source code of the current algorithm module of D2 is already very
complex to follow, it smells of over-generalization here and there. Sometimes
it's better to reduce the generality of things, even if that reduces their
power a little, to reduce complexity, etc. Tango code too isn't perfect, but it
often looks more human. While you have created the algorithm module I too have
created something similar, but based on different grounds.

6) Built-in data types are important, they aren't meant to replace a good
standard library, where you can find more specialized and more efficient data
structures. The built-in data types are meant to:
- offer a very handy syntax, easy to use and remember, short to type too.
- They have to be efficient in a very wide variety of situations, so they must
avoid having really bad performance in a large number of situations, while it's
okay for them not to be the fastest in any one situation.
- They are useful for example when you have little data, in 90% of the code
where max performance isn't so important. In the other situations you are
supposed to be able to import things like an IntrusiveRedBlackHashMap from the std
lib.

7) Take a look at the lazy "views" of keys/values/items of Python3, how they
fit into your view of such ranges. Java too has something similar. (I can give
a summary if you want. But in a few words, if you have an associative array (dict)
d.keys() doesn't return an array, but a "view", a lazy iterable that's an
object that can be sliced lazily, iterated, etc. This is way more efficient
than the .keys/.values of the current D implementation).

8) Lots of functions/thingies in my mostly-functional library are designed to
be lazy, that is they generate items on the fly. At the moment D lacks a *lot*
of such a view of the world. Again, take a good look at how Python 3 has shifted
from eager to lazy in a lot of its style. How do such things fit in your range
view? I think they can fit well, but the D language has to shift its philosophy
a little to support such lazy generators/iterators more often and commonly in
the built-ins too.

9) I had a long discussion in the main D newsgroup, a partial conclusion was
that a struct of 2 items may not be enough for the current implementation of
dynamic arrays, because without a capacity field, the append is dead-slow. I
presume this is orthogonal to your range proposal, but I have mentioned it here
because you keep talking about dynamic arrays as 2-pointer structs, and that
may be changed in the future.

Bye,
bearophile

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 I don't have enough experience to have a clear view on the whole
 subject, so I write only some general comments:

Uh-oh. That doesn't quite bode well :o).

 1) In time I have seen that everyone that likes/loves language X
 wants D to become like X. This is true for Andrei too, you clearly
 like C++ a lot.

Stop right there. This is just a presupposition. I think I could say I 
*know* C++ a lot, which is quite different. But then I know a few other 
languages quite well, along with the advantages and disadvantages that 
made them famous.

Maybe I could say I do like the STL. Not the quirks it takes to 
implement it in C++, but for the fact that it brings clarity and 
organization in defining fundamental data structures and algorithms. For 
my money, other collection/algorithms designs don't hold a candle to STL's.

 The worst fault of C++ is excessive complexity,
 that's the single fault that's killing it, that has pushed people to
 invent Java, and so on. Walter has clearly tried hard to make D a
 less complex language. This means that D is sometimes less powerful
 than C++, and every smart programmer can see that it's less powerful,
 but having 15% less power is a good deal if you have 1/3 of the
 complexity. As you may guess I don't like C++ (even if I like its
 efficiency I'm not going to use it for real coding), so I don't like
 D to become just a resyntaxed C++. Note that originally D was more
 like a statically compiled Java, and while that's not perfect, that's
 probably better than a resyntaxed C++. So I suggest less power, if
 this decreases a lot what the programmer has to keep in the mind
 while programming. If you don't believe me take a look at what the
 blurb of D says:
 
 "D is a systems programming language. Its focus is on combining the
 power and high performance of C and C++ with the programmer
 productivity of modern languages like Ruby and Python. Special
 attention is given to the needs of quality assurance,
 documentation, management, portability and reliability."

 
 As you see it's a matter of balance.

I know it's a matter of balance. I am not sure what gave you the idea 
that I didn't.

 2) Keep in mind that not all D programmers are lovers of C++ and they
 don't all come from C++, some of them are actually lovers of other



I like many things in all of the languages above. Don't forget it was me 
who opened Walter to Lisp :o).

 3) opApply has some problems, but it isn't infamous.

opApply has two problems:

1. It can't save the state of the iteration for later resumption.

2. It uses repeated function calls through a pointer, which is 
measurably slow.

Both disadvantages are major. The first fosters container design without 
true iteration. That's just bad design. (Look at built-in hashes.) The 
second is even worse because it makes foreach a toy useful when you read 
lines from the console, but not in inner loops. Which is kind of ironic 
because they are inner *loops*.

I think many would agree that foreach minus the disadvantages would be 
better.
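
A minimal sketch contrasting the two iteration styles, assuming the primitive names that later Phobos settled on (empty, front, popFront) rather than the isEmpty/next names used in this thread. Numbers and NumberRange are hypothetical types written for illustration only.

```d
// Hypothetical container exposing opApply: the loop body arrives as a delegate.
struct Numbers
{
    int[] data;

    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref x; data)
        {
            // One indirect call per element; the iteration cannot be
            // suspended here and picked up again later.
            if (auto result = dg(x)) return result;
        }
        return 0;
    }
}

// Hypothetical range over the same data: the iteration state is the struct.
struct NumberRange
{
    int[] data;

    bool empty() const { return data.length == 0; }
    ref int front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

void main()
{
    int sum;
    foreach (x; Numbers([1, 2, 3, 4])) sum += x;   // driven by opApply
    assert(sum == 10);

    auto r = NumberRange([1, 2, 3, 4]);
    r.popFront();                  // consume one element...
    auto saved = r;                // ...and keep the position for later
    r.popFront();
    assert(r.front == 3);
    assert(saved.front == 2);      // the saved copy resumes where it was taken
}
```

Copying the range struct is what saves the iteration state, which is the capability problem 1 above says opApply lacks.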

 4) Syntax matters, and function/method/attribute names too. So they
 have to be chosen with care. I suggest using single words when
 possible, and considering "alllowercase" names too (I know this
 contrasts with the D style guide).

It's very easy to give very general advice (which to be honest your post 
is replete with). It would help if you followed with a few concrete points.

 4b) Copying function/method/attribute names from C++ isn't bad if
 people think such names are good. Inventing different names just to
 be different is not good.
 
 5) The source code of the current algorithm module of D2 is already
 very complex to follow, it smells of over-generalization here and
 there. Sometimes it's better to reduce the generality of things, even
 if that reduces their power a little, to reduce complexity, etc.
 Tango code too isn't perfect, but it often looks more human. While
 you have created the algorithm module I too have created something
 similar, but based on different grounds.

I am sure you like your library more than mine, because it's your design 
and your realization.

The code in std.algorithm is complex because it implements algorithms 
that are complex. I know if someone would look over partition, rotate, 
topN, or sort, without knowing how they work, they wouldn't have an easy 
job picking it up. That is fine. The purpose of std.algorithm's 
implementation is not to be a tutorial on algorithms.

On occasion std.algorithm does a few flip-flops, but always for a good 
reason. For example it takes some effort to allow multiple functions in 
reduce. Consider:

double[] a = [ 3.0, 4, 7, 11, 3, 2, 5 ];
// Compute minimum and maximum in one pass
auto r = reduce!(min, max)(double.max, -double.max, a);
// The type of r is Tuple!(double, double)
assert(r._0 == 2);  // minimum
assert(r._1 == 11); // maximum

A simpler reduce would have only allowed one function, so I would have 
had to write:

auto m = reduce!(min)(double.max, a);
auto M = reduce!(max)(-double.max, a);

On the face of it, this looks reasonable. After all, come on, why can't 
one write two lines instead of one? However, min and max are such simple 
functions that the cost of computing them is drowned by the cost of 
looping alone. On a large array, traversing twice becomes onerous, so I 
would have had to write the loop by hand.
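
For comparison, a sketch of the same computation against the reduce that ships in present-day std.algorithm, which is an assumption relative to the API shown in this post: there the seeds for several functions are passed as one tuple and the results are read by index.

```d
import std.algorithm : max, min, reduce;
import std.typecons : tuple;

void main()
{
    double[] a = [3.0, 4, 7, 11, 3, 2, 5];

    // One traversal, two accumulators, seeded as in the example above.
    auto r = reduce!(min, max)(tuple(double.max, -double.max), a);
    assert(r[0] == 2);    // minimum
    assert(r[1] == 11);   // maximum

    // Two traversals give the same answers but walk the array twice.
    auto m = reduce!(min)(double.max, a);
    auto M = reduce!(max)(-double.max, a);
    assert(m == r[0] && M == r[1]);
}
```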

How do I know that? Because I measured. Why did I measure? Because I 
care. Why do I care? Because the typical runtime of my programs is measured 
in hours of sheer computation, and because things like reduce and others 
in std.algorithm are in the core loops. And I am not alone.

If I can help it, I wouldn't want to write an algorithms library for a 
systems-level programming language with fundamental limitations that 
make it unsuitable for efficient computation. Costly abstractions are 
a dime a dozen. It's efficient abstractions that are harder to come across.

If you believe I am wasting time with cutesies in std.algorithm, please 
let me know of the places and of the suggested improvements.

 6) Built-in data types are important, they aren't meant to replace a
 good standard library, where you can find more specialized and more
 efficient data structures. The built-in data types are meant to: -
 offer a very handy syntax, easy to use and remember, short to type
 too. - They have to be efficient in a very wide variety of
 situations, so they must avoid having really bad performance in a
 large number of situations, while it's okay for them not to be the
 fastest in any one situation. - They are useful for example when you
 have little data, in 90% of the code where max performance isn't so
 important. In the other situations you are supposed to be able to
 import things like an IntrusiveRedBlackHashMap from the std lib.

I agree.

 7) Take a look at the lazy "views" of keys/values/items of Python3,
 how they fit into your view of such ranges. Java too has something
 similar. (I can give a summary if you want. But in a few words, if you
 have an associative array (dict) d.keys() doesn't return an array,
 but a "view", a lazy iterable that's an object that can be sliced
 lazily, iterated, etc. This is way more efficient than the
 .keys/.values of the current D implementation).

To quote a classic, Lisp has had them all for 50 years. Well-defined 
ranges are the perfect stepping stone towards lazy computation. In 
particular input ranges are really generators. I plan to implement 
things like lazyMap and lazyReduce, and also generators that are so 
popular in functional languages.
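
A minimal sketch of an input range acting as a generator, combined with a lazy map. It assumes the range primitives and the map and take functions of present-day Phobos; the lazyMap and lazyReduce mentioned above are plans at this point, and Fibonacci is a hypothetical type written for illustration.

```d
import std.algorithm : equal, map;
import std.range : take;

// Hypothetical generator: an endless input range of Fibonacci numbers.
struct Fibonacci
{
    ulong current = 0;
    ulong following = 1;

    enum bool empty = false;               // the sequence never runs out
    ulong front() const { return current; }
    void popFront()
    {
        immutable sum = current + following;
        current = following;
        following = sum;
    }
}

void main()
{
    // Nothing is computed until elements are actually requested.
    auto squares = Fibonacci().map!(x => x * x).take(6);
    assert(equal(squares, [0UL, 1, 1, 4, 9, 25]));
}
```

Because each stage pulls one element at a time, mapping over an infinite generator is safe as long as a later stage such as take bounds the demand.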

 8) Lots of functions/thingies in my mostly-functional library are
 designed to be lazy, that is they generate items on the fly. At the
 moment D lacks a *lot* of such a view of the world. Again, take a good
 look at how Python 3 has shifted from eager to lazy in a lot of its
 style. How do such things fit in your range view? I think they can
 fit well, but the D language has to shift its philosophy a little to
 support such lazy generators/iterators more often and commonly in the
 built-ins too.

I agree that D could use more lazy iteration instead of eager computation.

 9) I had a long discussion in the main D newsgroup, a partial
 conclusion was that a struct of 2 items may not be enough for the
 current implementation of dynamic arrays, because without a capacity
 field, the append is dead-slow. I presume this is orthogonal to your
 range proposal, but I have mentioned it here because you keep
 talking about dynamic arrays as 2-pointer structs, and that may be
 changed in the future.

I saw that discussion. In my opinion that discussion clarifies why 
slices shouldn't be conflated with full-fledged containers. Handling 
storage strategy should not be the job of a slice. That's why there's a 
need for true containers (including straight block arrays as a 
particular case). Ranges are perfect for their internal implementation 
and for iterating them.


Andrei

Sep 09 2008

superdan <super dan.org> writes:

Andrei Alexandrescu Wrote:

 bearophile wrote:
 I don't have enough experience to have a clear view on the whole
 subject, so I write only some general comments:

 
 Uh-oh. That doesn't quite bode well :o).

u could've stopped readin'. general comments without experience are oxpoop. my
mailman could give'em.

Sep 09 2008

Walter Bright <newshound1 digitalmars.com> writes:

superdan wrote:
 u could've stopped readin'. general comments without experience are
 oxpoop. my mailman could give'em.

The thing about iterators and collections is that they look so simple, 
but getting the right design is fiendishly difficult.

Sep 09 2008

Walter Bright <newshound1 digitalmars.com> writes:

Walter Bright wrote:
 superdan wrote:
 u could've stopped readin'. general comments without experience are
 oxpoop. my mailman could give'em.

 
 The thing about iterators and collections is that they look so simple, 
 but getting the right design is fiendishly difficult.

And when a correct design is devised, the mark of genius is everyone 
will think it in retrospect to be simple and obvious!

Sep 09 2008

Benji Smith <dlanguage benjismith.net> writes:

Walter Bright wrote:
 Walter Bright wrote:
 superdan wrote:
 u could've stopped readin'. general comments without experience are
 oxpoop. my mailman could give'em.

 The thing about iterators and collections is that they look so simple, 
 but getting the right design is fiendishly difficult.

 
 And when a correct design is devised, the mark of genius is everyone 
 will think it in retrospect to be simple and obvious!

Also: everyone will measure the quality of that design using a different 
yardstick.

For my money, the best collection design I've ever worked with is the C5 
library for .Net, developed at the IT University of Copenhagen:

http://www.itu.dk/research/c5/
http://www.ddj.com/windows/199902700

--benji

Sep 09 2008

Benji Smith <dlanguage benjismith.net> writes:

bearophile wrote:
 6) Built-in data types are important, they aren't meant to replace a good
standard library, where you can find more specialized and more efficient data
structures. The built-in data types are meant to:
 - offer a very handy syntax, easy to use and remember, short to type too.
 - They have to be efficient in a very wide variety of situations, so they must
avoid having really bad performance in a large number of situations, while it's
okay for them not to be the fastest in any one situation.
 - They are useful for example when you have little data, in 90% of the code
where max performance isn't so important. In the other situations you are
supposed to be able to import things like an IntrusiveRedBlackHashMap from the std
lib.

I'd also add this:

A built-in data-type can't implement an interface, so there should be 
one and only one obvious implementation of its interface. For example, 
it would be a mistake to have a "File" as a built-in type, because a 
"File" really ought to implement an "InputStream" interface, so that 
people can write stream-consumer code generically.

It's one of the reasons I think dynamic arrays and associative arrays 
belong in the standard library (as List and Map interfaces) rather than 
as built-in types.

On a separate note...

I'm also a little skeptical about the range proposal (though I'll have 
to give it more thought and maybe play with an implementation before I 
can come to any firm conclusion). The proposal seems to cover the 
standard cases of bounded and unbounded forward iteration, reverse 
iteration, etc.

But I can think of a few examples where the concept of "iteration" needs 
to be even more generalized.

For example, I worked on a project last year (in Java) that used a 
hidden Markov model to navigate through the nodes of a graph, driving a 
Monte Carlo simulation.

I designed the MarkovModel<T> class to implement the Collection<T> 
interface so that I could use foreach iteration to crawl through the 
nodes of the graph. It basically looked like this:

     MarkovModel<SimulationState> model = ...;

     for (SimulationState state : model) {
        state.doStuff();
     }

The cool thing about this is that the simulation would continue running 
until it reached a natural termination point (a simulation state with no 
outbound state transition probabilities).

Depending upon the particulars of the model, it might never end. And 
although the ordering of the nodes was significant, it was never 
deterministic. Iterating through the states of a Markov model is 
essentially a probabilistically guided random walk through the elements 
of the collection.

For me, Java iterators made a very natural design choice, since the 
iterator is such a dirt-simple interface (with just a "hasNext" and 
"next" method).

How would the D range proposal address the sorts of problems that 
require non-deterministic iteration? To me, the metaphor of a "range" is 
nice, but it doesn't cover all the cases I'd like to see in a 
general-purpose iteration metaphor.

Thanks!

--benji

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Benji Smith wrote:
 bearophile wrote:
 6) Built-in data types are important, they aren't meant to replace a 
 good standard library, where you can find more specialized and more 
 efficient data structures. The built-in data types are meant to:
 - offer a very handy syntax, easy to use and remember, short to type too.
 - They have to be efficient in a very wide variety of situations, so 
 they must avoid having really bad performance in a large number of 
 situations, while it's okay for them not to be the fastest in any one 
 situation.
 - They are useful for example when you have little data, in 90% of the 
 code where max performance isn't so important. In the other situations 
 you are supposed to be able to import things like an 
 IntrusiveRedBlackHashMap from the std lib.

 
 I'd also add this:
 
 A built-in data-type can't implement an interface, so there should be 
 one and only one obvious implementation of its interface. For example, 
 it would be a mistake to have a "File" as a built-in type, because a 
 "File" really ought to implement an "InputStream" interface, so that 
 people can write stream-consumer code generically.
 
 It's one of the reasons I think dynamic arrays and associative arrays 
 belong in the standard library (as List and Map interfaces) rather than 
 as built-in types.
 
 On a separate note...
 
 I'm also a little skeptical about the range proposal (though I'll have 
 to give it more thought and maybe play with an implementation before I 
 can come to any firm conclusion). The proposal seems to cover the 
 standard cases, of bounded and unbounded forward iteration, reverse 
 iteration, etc.
 
 But I can think of a few examples where the concept of "iteration" needs 
 to be even more generalized.
 
 For example, I worked on a project last year (in Java) that used a 
 hidden Markov model to navigate through the nodes of a graph, driving a 
 Monte Carlo simulation.
 
 I designed the MarkovModel<T> class to implement the Collection<T> 
 interface so that I could use foreach iteration to crawl through the 
 nodes of the graph. It basically looked like this:
 
     MarkovModel<SimulationState> model = ...;
 
     for (SimulationState state : model) {
        state.doStuff();
     }
 
 The cool thing about this is that the simulation would continue running 
 until it reached a natural termination point (a simulation state with no 
 outbound state transition probabilities).
 
 Depending upon the particulars of the model, it might never end. And 
 although the ordering of the nodes was significant, it was never 
 deterministic. Iterating through the states of a Markov model is 
 essentially a probabilistically guided random walk through the elements 
 of the collection.
 
 For me, Java iterators made a very natural design choice, since the 
 iterator is such a dirt-simple interface (with just a "hasNext" and 
 "next" method).
 
 How would the D range proposal address the sorts of problems that 
 require non-deterministic iteration? To me, the metaphor of a "range" is 
 nice, but it doesn't cover all the cases I'd like to see in a 
 general-purpose iteration metaphor.

Hmm, HMMs :o). If you could do it with Java's hasNext and next, you can 
do it with D's isEmpty and next. There's no difference.
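
A minimal sketch of that equivalence, assuming the primitive names of present-day Phobos (empty, front, popFront) in place of the isEmpty/next pair named above. RandomWalk and its transition table are hypothetical and only illustrate non-deterministic iteration that stops at a state with no outbound transitions.

```d
import std.random : Random, uniform;

// Hypothetical random walk: successors[s] lists the states reachable from s;
// an empty list marks a terminal state.
struct RandomWalk
{
    int[][] successors;
    int state;
    Random rng;
    bool done;

    bool empty() const { return done; }
    int front() const { return state; }
    void popFront()
    {
        auto choices = successors[state];
        if (choices.length == 0) { done = true; return; }
        state = choices[uniform(0, choices.length, rng)];
    }
}

void main()
{
    // 0 -> {1, 2}, 1 -> {2}, 2 -> {3}, 3 is terminal.
    int[][] graph = [[1, 2], [2], [3], []];
    auto walk = RandomWalk(graph, 0, Random(42));

    int steps;
    int last = -1;
    foreach (s; walk)
    {
        last = s;
        ++steps;
    }
    assert(last == 3);                  // every walk ends at the terminal state
    assert(steps == 3 || steps == 4);   // path 0-2-3 or 0-1-2-3
}
```

The consumer is an ordinary foreach loop; which states it visits depends on the random generator, and the walk simply reports empty once no transition is left.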


Andrei

Sep 09 2008

Benji Smith <dlanguage benjismith.net> writes:

Andrei Alexandrescu wrote:
 Hmm, HMMs :o). If you could do it with Java's hasNext and next, you can 
 do it with D's isEmpty and next. There's no difference.
 
 
 Andrei

Oh. Okay. Good to know :)

I guess all the talk about "ranges" has me visualizing contiguous items 
and sequential ordering and determinism. It's a good word for a 
well-defined set of items with a true beginning and end, but I wonder 
whether "cursor" might be a better word than "range", especially for 
input consumption and non-deterministic iteration.

One thing I definitely like better about the new range proposal is the 
notion that the container is not responsible for providing iteration 
logic or, more importantly, maintaining the state of any iteration. I 
think it'll be a welcome change.

Nice work, and I'm looking forward to working with the new stuff :)

--benji

Sep 09 2008

Benji Smith <dlanguage benjismith.net> writes:

bearophile wrote:
 5) The source code of the current algorithm module of D2 is already very
complex to follow, it smells of over-generalization here and there. Sometimes
it's better to reduce the generality of things, even if that reduces their
power a little, to reduce complexity, etc. Tango code too isn't perfect, but it
often looks more human. While you have created the algorithm module I too have
created something similar, but based on different grounds.

Along these same lines, while D is still young, the documentation is 
often thin, and code examples are scarce.

I've been doing a lot of programming lately with Tango, and despite the 
growing documentation, I've had to refer directly to the Tango source 
code on more occasions than I can count. Luckily, the Tango sources are 
well-written and pretty easy to read and understand, and I've had very 
few problems figuring out how to use the library.

I hope the range implementation makes readability a high priority.

--benji

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Benji Smith wrote:
 bearophile wrote:
 5) The source code of the current algorithm module of D2 is already 
 very complex to follow, it smells of over-generalization here and 
 there. Sometimes it's better to reduce the generality of things, even 
 if that reduces their power a little, to reduce complexity, etc. Tango 
 code too isn't perfect, but it often looks more human. While you have 
 created the algorithm module I too have created something similar, but 
 based on different grounds.

 
 Along these same lines, while D is still young, the documentation is 
 often thin, and code examples are scarce.
 
 I've been doing a lot of programming lately with Tango, and despite the 
 growing documentation, I've had to refer directly to the Tango source 
 code on more occasions than I can count. Luckily, the Tango sources are 
 well-written and pretty easy to read and understand, and I've had very 
 few problems figuring out how to use the library.
 
 I hope the range implementation makes readability a high priority.

This is a valid concern. The sample ranges I have coded so far look 
deceptively simple, and that's a good sign. It only takes a dozen lines 
to write an interesting range. (This is very unlike STL iterators.)

Andrei

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Benji Smith wrote:
 bearophile wrote:
 5) The source code of the current algorithm module of D2 is already 
 very complex to follow, it smells of over-generalization here and 
 there. Sometimes it's better to reduce the generality of things, even 
 if that reduces their power a little, to reduce complexity, etc. Tango 
 code too isn't perfect, but it often looks more human. While you have 
 created the algorithm module I too have created something similar, but 
 based on different grounds.

 
 Along these same lines, while D is still young, the documentation is 
 often thin, and code examples are scarce.
 
 I've been doing a lot of programming lately with Tango, and despite the 
 growing documentation, I've had to refer directly to the Tango source 
 code on more occasions than I can count. Luckily, the Tango sources are 
 well-written and pretty easy to read and understand, and I've had very 
 few problems figuring out how to use the library.
 
 I hope the range implementation makes readability a high priority.

Speaking of examples and readability, and to tie this with the 
discussion on array reallocation, I was curious about an array appender 
built on the output range interface. I've seen a quite complicated array 
builder in digitalmars.d. I wanted a simpler appender that should do no 
worse than the built-in ~= and that works with algorithm2 whenever data 
is written out.

It turned out quite simple and it improved performance of a large-scale 
data preprocessing task of mine (which involved reading and parsing 
about 1M lines of integers) by 15%. I'd be curious how it fares with 
other tests that you guys may have.

The idea is very simple: just use D's native append operation, but cache 
the capacity to avoid too many lookups (I understand that that's the 
bottleneck).

I paste the code below; I'd be indebted if you guys grabbed it and 
tested it.

Andrei

struct ArrayAppender(T)
{
     private T* _buffer;
     private size_t _length;
     private size_t _capacity;

     this(T[] array)
     {
         _buffer = array.ptr;
         _length = array.length;
         if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof;
     }

     size_t capacity() const { return _capacity; }

     void putNext(T item)
     {
         invariant desiredLength = _length + 1;
         if (desiredLength <= _capacity)
         {
             // Should do in-place construction here
             _buffer[_length] = item;
         }
         else
         {
             // Time to reallocate, do it and cache capacity
             auto tmp = _buffer[0 .. _length];
             tmp ~= item;
             _buffer = tmp.ptr;
             _capacity = .capacity(_buffer) / T.sizeof;
         }
         _length = desiredLength;
     }

     T[] release()
     {
         // Shrink-to-fit
         auto result = cast(T[]) realloc(_buffer, _length * T.sizeof);
         // Relinquish state
         _buffer = null;
         _length = _capacity = 0;
         return result;
     }
}

unittest
{
     auto app = arrayAppender(new char[0]);
     string b = "abcdefg";
     foreach (char c; b) app.putNext(c);
     assert(app.release == "abcdefg");
}
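
For anyone who wants to try the comparison on a current D2 compiler, here is a minimal sketch. It is not the code above: it assumes Phobos's std.array.appender (not part of the original post), and buildWithAppend / buildWithAppender are names made up for illustration. It only checks that both routes build the same array; timing is left to whoever runs it.

```d
import std.array : appender;

// Baseline: plain built-in append
int[] buildWithAppend(size_t n)
{
    int[] a;
    foreach (i; 0 .. n) a ~= cast(int) i;
    return a;
}

// Same data, built through Phobos's appender (an assumption of this sketch)
int[] buildWithAppender(size_t n)
{
    auto app = appender!(int[])();
    foreach (i; 0 .. n) app.put(cast(int) i);
    return app.data;
}

void main()
{
    // Both routes must yield identical contents
    assert(buildWithAppend(1000) == buildWithAppender(1000));
}
```

Wrapping each function in a timer of your choice gives a benchmark with the same shape as the one discussed in this thread.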

Sep 09 2008

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 Speaking of examples and readability, and to tie this with the 
 discussion on array reallocation, I was curious on an array appender 
 built on the output range interface. I've seen a quite complicated array 
 builder in digitalmars.d. I wanted a simpler appender that should not do 
 worse than the built-in ~= and that works with algorithm2 whenever data 
 is written out.

That builder was probably mine, designed for D1. For the latest version take a
look at the 'builder.d' module in the libs I have shown you. It's not overly
complex: its API is simple and its internals are as complex as they have to be
to be efficient. (I'm still slowly improving it; I'm now trying to make it more
flexible, so that its extending functionality works with other kinds of iterables
too.)


 I'd be curious how it fares with other tests that you guys may have.

That module has about 380 lines of benchmark code (after a few hundred
lines of unit tests). I think you can add a few lines to them to benchmark your
implementation, but I presume mine is more efficient :-) I may run a few
benchmarks later...

Bye,
bearophile

Sep 10 2008

bearophile <bearophileHUGS lycos.com> writes:

A few benchmarks, appending ints. Note that this is a worst-case situation (other
benchmarks are less dramatic). Just many appends, followed by the "release":

benchmark 10, N=10_000_000:
  Array append:  0.813 s
  ArrayBuilder:  0.159 s
  ArrayAppender: 1.056 s

benchmark 10, N=100_000_000:
  Array append:  10.887 s
  ArrayBuilder:   1.477 s
  ArrayAppender: 13.305 s

-----------------

Chunk of the code I have used:

struct ArrayAppender(T)
{
     private T* _buffer;
     private size_t _length;
     private size_t _capacity;

     void build(T[] array)
     {
         _buffer = array.ptr;
         _length = array.length;
         //if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof;
     }

     size_t capacity() /* const */ { return _capacity; }

     void putNext(T item)
     {
         /* invariant */ int desiredLength = _length + 1;
         if (desiredLength <= _capacity)
         {
             // Should do in-place construction here
             _buffer[_length] = item;
         }
         else
         {
             // Time to reallocate, do it and cache capacity
             auto tmp = _buffer[0 .. _length];
             tmp ~= item;
             _buffer = tmp.ptr;
             _capacity = this.capacity() / T.sizeof;
         }
         _length = desiredLength;
     }

     T[] release()
     {
         // Shrink-to-fit
         //auto result = cast(T[]) realloc(_buffer, _length * T.sizeof);
         auto result = cast(T[]) gcrealloc(_buffer, _length * T.sizeof);
         // Relinquish state
         _buffer = null;
         _length = _capacity = 0;
         return result;
     }
}

unittest
{
     auto app = arrayAppender(new char[0]);
     string b = "abcdefg";
     foreach (char c; b) app.putNext(c);
     assert(app.release == "abcdefg");
}


    void benchmark10(int ntest, int N) {
        putr("\nbenchmark 10, N=", thousands(N), ":");

        if (ntest == 0) {
            auto t0 = clock();
            int[] a1;
            for (int i; i < N; i++)
                a1 ~= i;
            auto last1 = a1[$ - 1];
            auto t2 = clock() - t0;
            putr("  Array append: %.3f", t2, " s   ", last1);
        } else if (ntest == 1) {
            auto t0 = clock();
            ArrayBuilder!(int) a2;
            for (int i; i < N; i++)
                a2 ~= i;
            auto last2 = a2.toarray[$ - 1];
            auto t2 = clock() - t0;
            putr("  ArrayBuilder: %.3f", t2, " s   ", last2);
        } else {
            auto t0 = clock();
            ArrayAppender!(int) a3;
            a3.build(null);
            for (int i; i < N; i++)
                a3.putNext(i);
            auto last3 = a3.release()[$ - 1];
            auto t2 = clock() - t0;
            putr("  ArrayAppender: %.3f", t2, " s   ", last3);
        }
    }

Bye,
bearophile

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Few benchmarks, appending ints, note this is a worst-case situation (other
 benchmarks are less dramatic). Just many appends, followed by the "release":
 
 benchmark 10, N=10_000_000:
   Array append:  0.813 s
   ArrayBuilder:  0.159 s
   ArrayAppender: 1.056 s
 
 benchmark 10, N=100_000_000:
   Array append:  10.887 s
   ArrayBuilder:   1.477 s
   ArrayAppender: 13.305 s

That's odd. The array appender should never, by definition, do 
significantly worse than the straight array append. I think some other 
factor intervened (e.g. swapping). Also don't forget to compile with -O 
-release -inline and to test several times after a warmup run. I adapted 
your code obtaining the numbers below:

benchmark 10, N=10000000:
   Array append: 0.69 s
   ArrayAppender: 0.19 s

benchmark 10, N=25000000:
   Array append: 2.06 s
   ArrayAppender: 0.82 s

benchmark 10, N=50000000:
   Array append: 4.28 s
   ArrayAppender: 1.75 s

benchmark 10, N=75000000:
   Array append: 9.62 s
   ArrayAppender: 5.8 s

benchmark 10, N=100000000:
   Array append: 11.35 s
   ArrayAppender: 6.20 s


Andrei

Sep 10 2008

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 That's odd. The array appender should never, by definition, do 
 significantly worse than the straight array append.

But the reality of your code and benchmarks may differ from the abstract
definition :-)


 I think some other factor intervened (e.g. swapping). Also don't forget to
 compile with -O -release -inline and to test several times after a warmup run.

I have kept an eye on such things too. Note that benchmarks are generally
tricky.

I am using DMD v1.035 on a 2 GHz Core Duo with 2 GB RAM, on Windows; the code
doesn't cause disk swapping, and timings are warm. My timings are repeatable
within 0.02-0.03 seconds on my PC.
If you want I can give you the whole testing code, of course. But it's just the
builders.d module of my libs plus the added testing code I have shown you.

Anyway, in the end it doesn't matter much, I think.

Bye,
bearophile

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu:
 That's odd. The array appender should never, by definition, do 
 significantly worse than the straight array append.

 
 But the reality of your code and benchmarks may differ from the
 abstract definition :-)

But it's not abstract, and besides my benchmarks do support my
hypothesis. On the common path my code does the minimum amount that any
append would do: test, assign at index, bump. On the less common path
(invoked an amortized constant number of times) my code does an actual
built-in append plus a call to capacity to cache it. Assuming the
built-in array does a screaming fast append, my code should be just
about as fast, probably insignificantly slower because of the extra
straggler operations.

 I think some other factor intervened (e.g. swapping). Also don't
 forget to compile with -O -release -inline and to test several
 times after a warmup run.

 
 I have kept an eye on such things too. Note that benchmarks are
 generally tricky.
 
 I am using DMD v1.035, on a Core Duo 2 GHz, 2 GB RAM, on Win, the
 code doesn't make HD swap, and timings are warm. My timings are
 repeatable within 0.02-0.03 seconds on my PC. If you want I can give
 you the whole testing code, of course. But it's just the builders.d
 module of my libs plus the added testing code I have shown you.

But I copied your test code from your post. Then I adapted it to Phobos
(replaced putr with writefln, clock with ctime and the such, which
shouldn't matter). Then I ran it under Phobos. For the built-in array
append we get comparable numbers. So there must be something that makes
your numbers for ArrayAppender skewed. I'm saying yours and not mine
because mine are in line with expectations and yours aren't.
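
One thing worth checking, judging only from the adapted code as posted earlier in the thread: its build() has the capacity lookup commented out, and its putNext refreshes _capacity from this.capacity(), which returns the cached field itself. If that is what was actually benchmarked, _capacity would stay at 0 and every putNext would take the reallocation branch. The sketch below reproduces that shape in current D2 syntax, with a made-up slowPathHits counter. It illustrates one reading of the posted code, not a confirmed explanation of the measured numbers.

```d
struct AdaptedAppender(T)
{
    private T[] _data;
    private size_t _capacity;   // never seeded, mirroring the adapted build()
    size_t slowPathHits;        // hypothetical counter, for illustration only

    size_t capacity() { return _capacity; }

    void putNext(T item)
    {
        if (_data.length + 1 <= _capacity)
        {
            // Fast path: write into space already known to be available
            _data = _data.ptr[0 .. _data.length + 1];
            _data[$ - 1] = item;
        }
        else
        {
            // Slow path: built-in append
            _data ~= item;
            // Reads the cached field (still 0), not the allocator's capacity
            _capacity = this.capacity() / T.sizeof;
            ++slowPathHits;
        }
    }
}

size_t countSlowPaths(size_t n)
{
    AdaptedAppender!int app;
    foreach (i; 0 .. n) app.putNext(cast(int) i);
    return app.slowPathHits;
}

void main()
{
    // With the cache stuck at zero, every call goes through the slow path
    assert(countSlowPaths(1000) == 1000);
}
```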


Andrei

Sep 10 2008

superdan <super dan.org> writes:

Andrei Alexandrescu Wrote:

 bearophile wrote:
 Few benchmarks, appending ints, note this is a worst-case situation (other
 benchmarks are less dramatic). Just many appends, followed by the "release":
 
 benchmark 10, N=10_000_000:
   Array append:  0.813 s
   ArrayBuilder:  0.159 s
   ArrayAppender: 1.056 s
 
 benchmark 10, N=100_000_000:
   Array append:  10.887 s
   ArrayBuilder:   1.477 s
   ArrayAppender: 13.305 s

 
 That's odd. The array appender should never, by definition, do 
 significantly worse than the straight array append. I think some other 
 factor intervened (e.g. swapping). Also don't forget to compile with -O 
 -release -inline and to test several times after a warmup run. I adapted 
 your code obtaining the numbers below:
 
 benchmark 10, N=10000000:
    Array append: 0.69 s
    ArrayAppender: 0.19 s
 
 benchmark 10, N=25000000:
    Array append: 2.06 s
    ArrayAppender: 0.82 s
 
 benchmark 10, N=50000000:
    Array append: 4.28 s
    ArrayAppender: 1.75 s
 
 benchmark 10, N=75000000:
    Array append: 9.62 s
    ArrayAppender: 5.8 s
 
 benchmark 10, N=100000000:
    Array append: 11.35 s
    ArrayAppender: 6.20 s
 
 
 Andrei

arrayappender is simple as dumb. compare and contrast with the ginormous
arraybuilder. yet works great. why? because it is on the right thing. the
common case. on the uncommon case it does whatever to get the job done. don't
matter if it's rare.

not sure anybody saw the irony. it was bearophile advocating simplicity eh.
allegedly andre's the complexity guy. code seems to tell nother story.

btw doods i figured indexed access is slower than access via pointer. so i
recoded arrayappender to only use pointers. there's some half a second savings
for the large case. small arrays don't feel it.

struct ArrayAppender(T)
{
    private T* _begin;
    private T* _end;
    private T* _eos;

    this(T[] array)
    {
        _begin = array.ptr;
        _end = _begin + array.length;
        if (_begin) _eos = _begin + .capacity(_begin) / T.sizeof;
    }

    size_t capacity() const { return _eos - _begin; }

    void putNext(T item)
    {
        if (_end < _eos)
        {
            *_end++ = item;
        }
        else
        {
            auto tmp = _begin[0 .. _end - _begin];
            tmp ~= item;
            _begin = tmp.ptr;
            _end = _begin + tmp.length;
            _eos = _begin + .capacity(_begin) / T.sizeof;
        }
    }

    T[] releaseArray()
    {
        auto result = _begin[0 .. _end - _begin];
        _begin = _end = _eos = null;
        return result;
    }
}

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

superdan wrote:
 Andrei Alexandrescu Wrote:
 
 bearophile wrote:
 Few benchmarks, appending ints, note this is a worst-case situation (other
 benchmarks are less dramatic). Just many appends, followed by the "release":

 benchmark 10, N=10_000_000:
   Array append:  0.813 s
   ArrayBuilder:  0.159 s
   ArrayAppender: 1.056 s

 benchmark 10, N=100_000_000:
   Array append:  10.887 s
   ArrayBuilder:   1.477 s
   ArrayAppender: 13.305 s

 That's odd. The array appender should never, by definition, do 
 significantly worse than the straight array append. I think some other 
 factor intervened (e.g. swapping). Also don't forget to compile with -O 
 -release -inline and to test several times after a warmup run. I adapted 
 your code obtaining the numbers below:

 benchmark 10, N=10000000:
    Array append: 0.69 s
    ArrayAppender: 0.19 s

 benchmark 10, N=25000000:
    Array append: 2.06 s
    ArrayAppender: 0.82 s

 benchmark 10, N=50000000:
    Array append: 4.28 s
    ArrayAppender: 1.75 s

 benchmark 10, N=75000000:
    Array append: 9.62 s
    ArrayAppender: 5.8 s

 benchmark 10, N=100000000:
    Array append: 11.35 s
    ArrayAppender: 6.20 s


 Andrei

 
 arrayappender is simple as dumb. compare and contrast with the ginormous
 arraybuilder. yet works great. why? because it is on the right thing. the
 common case. on the uncommon case it does whatever to get the job done. don't
 matter if it's rare.
 
 not sure anybody saw the irony. it was bearophile advocating simplicity eh.
 allegedly andre's the complexity guy. code seems to tell nother story.
 
 btw doods i figured indexed access is slower than access via pointer. so i
 recoded arrayappender to only use pointers. there's some half a second savings
 for the large case. small arrays don't feel it.
 
 struct ArrayAppender(T)
 {
     private T* _begin;
     private T* _end;
     private T* _eos;
 
     this(T[] array)
     {
         _begin = array.ptr;
         _end = _begin + array.length;
         if (_begin) _eos = _begin + .capacity(_begin) / T.sizeof;
     }
 
     size_t capacity() const { return _eos - _begin; }
 
     void putNext(T item)
     {
         if (_end < _eos)
         {
             *_end++ = item;
         }
         else
         {
             auto tmp = _begin[0 .. _end - _begin];
             tmp ~= item;
             _begin = tmp.ptr;
             _end = _begin + tmp.length;
             _eos = _begin + .capacity(_begin) / T.sizeof;
         }
     }
 
     T[] releaseArray()
     {
         auto result = _begin[0 .. _end - _begin];
         _begin = _end = _eos = null;
         return result;
     }
 }

Thanks! Can't hurt. Guess I'll integrate your code if you don't mind.

I gave it some more thought and I have a theory for the root of the 
issue. My implementation assumes there's exponential (multiplicative) 
increase of capacity in the built-in ~=. I hope Walter wouldn't do 
anything else. If there are differences in growth strategies between D1 
and D2, that could explain the difference between bearophile's 
benchmarks and mine.


Andrei

Sep 10 2008

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 I hope Walter wouldn't do 
 anything else. If there are differences in growth strategies between D1 
 and D2, that could explain the difference between bearophile's 
 benchmarks and mine.

You can't compare benchmarks of two different compilers.

(Your code doesn't work (on D1) if T is a static array, and using the ~= looks
like a better interface for the append. Your code doesn't append arrays, so you
have to call it many times if you want to append strings (in D1) that's a very
common case.)

Bye,
bearophile

Sep 10 2008

bearophile <bearophileHUGS lycos.com> writes:

bearophile:
 (Your code doesn't work (on D1) if T is a static array,

Sorry, ignore what I have written, I'm a little nervous...

Bye,
bearophile

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 bearophile:
 (Your code doesn't work (on D1) if T is a static array,

 
 Sorry, ignore what I have written, I'm a little nervous...

I think I've unnecessarily overstated my case, which has put both of us 
on the defensive.

You are very right that tests on D1 and D2 are not comparable. And 
Walter has at one point made crucial changes in the allocator at my 
behest. Specifically, he introduced in-place growth whenever possible 
and added the expand() primitive to the gc. These are present in D2 but 
I don't know whether and when he has backported those to D1. (And btw why 
wouldn't you try it :o).)


Andrei

Sep 10 2008

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 bearophile wrote:
 bearophile:
 (Your code doesn't work (on D1) if T is a static array,

 Sorry, ignore what I have written, I'm a little nervous...

 
 I think I've unnecessarily overstated my case, which has put both of us 
 in defensive.
 
 You are very right that tests on D1 and D2 are not comparable. And 
 Walter has at one point made crucial changes in the allocator at my 
 behest. Specifically, he introduced in-place growth whenever possible 
 and added the expand() primitive to the gc. These are present in D2 but 
 I don't know whether and when he has regressed those to D1. (And btw why 
 wouldn't you try it :o).)

For the record, in-place growth has been in D1 for as long as it's been 
in D2.


Sean

Sep 10 2008

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 
 I gave it some more thought and I have a theory for the root of the 
 issue. My implementation assumes there's exponential (multiplicative) 
 increase of capacity in the built-in ~=. I hope Walter wouldn't do 
 anything else. If there are differences in growth strategies between D1 
 and D2, that could explain the difference between bearophile's 
 benchmarks and mine.

Arrays larger than 4k grow logarithmically, smaller than 4k they grow 
exponentially.  This is certainly true of D1 and Tango, and I'd assume 
D2 is no different.


Sean

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 I gave it some more thought and I have a theory for the root of the 
 issue. My implementation assumes there's exponential (multiplicative) 
 increase of capacity in the built-in ~=. I hope Walter wouldn't do 
 anything else. If there are differences in growth strategies between 
 D1 and D2, that could explain the difference between bearophile's 
 benchmarks and mine.

 
 Arrays larger than 4k grow logarithmically, smaller than 4k they grow 
 exponentially.  This is certainly true of D1 and Tango, and I'd assume 
 D2 is no different.

Yes, but with in-place expansion, effective growth stays exponential 
even beyond 4k. So I'm not that worried that it could become quadratic, 
unless there are some really wicked workloads.

I modified my putNext to print what's happening:

     void putNext(T item)
     {
         if (_end < _eos)
         {
             // Should do in-place construction here
             *_end++ = item;
         }
         else
         {
             // Time to reallocate, do it and cache capacity
             auto tmp = _begin[0 .. _end - _begin];
             tmp ~= item;
             if (_begin != tmp.ptr)
             {
                 _begin = tmp.ptr;
                 _end = _begin + tmp.length;
                 writeln(_end - _begin);
             }
             else
             {
                 ++_end;
             }
             _eos = _begin + .capacity(_begin) / T.sizeof;
         }
     }

Notice the writeln. On my system the console reads:

benchmark 10, N=75000000:
1
5
9
17
33
65
129
257
513
253953
1155073
4743169
18505729
68861953

That's exponential alright. On the other hand, if you move the writeln 
after the if, indeed:

1
5
9
17
33
65
129
257
513
1025
2049
3073
4097
5121
6145
7169
8193
9217
10241
11265
12289

But it's the former column that matters, because moving chunks is where 
real work is being done. In-place block expansion should take constant time.

With this behavior in place, my code amortizes calls to capacity() by a 
factor of 1024, and keeps amortized append complexity constant.
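
The difference between the two traces (relocations versus any capacity change) can be reproduced with a small sketch on a current D2 compiler. It assumes the built-in capacity property on slices, which the code above does not use; traceGrowth and GrowthTrace are made-up names. The counts depend on the runtime and allocator, so none are quoted here.

```d
import std.stdio;

struct GrowthTrace
{
    size_t moves;    // times the data ended up at a different address
    size_t growths;  // times the reported capacity changed at all
}

GrowthTrace traceGrowth(size_t n)
{
    GrowthTrace t;
    int[] a;
    int* lastPtr = null;
    size_t lastCap = 0;
    foreach (i; 0 .. n)
    {
        a ~= cast(int) i;
        // A pointer change means the block was moved (the expensive case)
        if (a.ptr !is lastPtr) { ++t.moves; lastPtr = a.ptr; }
        // A capacity change without a move is an in-place expansion
        if (a.capacity != lastCap) { ++t.growths; lastCap = a.capacity; }
    }
    return t;
}

void main()
{
    auto t = traceGrowth(1_000_000);
    writeln("relocations: ", t.moves, ", capacity changes: ", t.growths);
}
```

If in-place expansion is doing its job, the first number should be much smaller than the second for large n.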


Andrei

Sep 10 2008

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Benji Smith wrote:
 bearophile wrote:
 5) The source code of the current algorithm module of D2 is already
 very complex to follow, it smells of over-generalization here and
 there. Sometimes it's better to reduce the generality of things, even
 if that reduces their power a little, to reduce complexity, etc. Tango
 code too isn't perfect, but it often looks more human. While you have
 created the algorithm module I too have created something similar, but
 based on different grounds.

 Along these same lines, while D is still young, the documentation is
 often thin, and code examples are scarce.

 I've been doing a lot of programming lately with Tango, and despite the
 growing documentation, I've had to refer directly to the Tango source
 code on more occasions than I can count. Luckily, the Tango sources are
 well-written and pretty easy to read and understand, and I've had very
 few problems figuring out how to use the library.

 I hope the range implementation makes readability a high priority.

 Speaking of examples and readability, and to tie this with the
 discussion on array reallocation, I was curious on an array appender
 built on the output range interface. I've seen a quite complicated array
 builder in digitalmars.d. I wanted a simpler appender that should not do
 worse than the built-in ~= and that works with algorithm2 whenever data
 is written out.
 It turned out quite simple and it improved performance of a large-scale
 data preprocessing task of mine (which involved reading and parsing
 about 1M lines of integers) by 15%. I'd be curious how it fares with
 other tests that you guys may have.
 The idea is very simple: just use D's native append operation, but cache
 the capacity to avoid too many lookups (I understand that that's the
 bottleneck).
 I paste the code below, I'd be indebted if you guys grabbed it and
 tested it.
 Andrei
 struct ArrayAppender(T)
 {
      private T* _buffer;
      private size_t _length;
      private size_t _capacity;
      this(T[] array)
      {
          _buffer = array.ptr;
          _length = array.length;
          if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof;
      }
      size_t capacity() const { return _capacity; }
      void putNext(T item)
      {
          invariant desiredLength = _length + 1;
          if (desiredLength <= _capacity)
          {
              // Should do in-place construction here
              _buffer[_length] = item;
          }
          else
          {
              // Time to reallocate, do it and cache capacity
              auto tmp = _buffer[0 .. _length];
              tmp ~= item;
              _buffer = tmp.ptr;
              _capacity = .capacity(_buffer) / T.sizeof;
          }
          _length = desiredLength;
      }
      T[] release()
      {
          // Shrink-to-fit
          auto result = cast(T[]) realloc(_buffer, _length * T.sizeof);
          // Relinquish state
          _buffer = null;
          _length = _capacity = 0;
          return result;
      }
 }
 unittest
 {
      auto app = arrayAppender(new char[0]);
      string b = "abcdefg";
      foreach (char c; b) app.putNext(c);
      assert(app.release == "abcdefg");
 }

One definite problem that I've just realized is that there's no putNext(T[]).
What if you need to append another array to your ArrayAppender, not just a
single element?

Jan 08 2009

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:
 One definite problem that I've just realized is that there's no putNext(T[]).
 What if you need to append another array to your ArrayAppender, not just a
 single element?

Just add a simple method overload for that purpose, it's easy enough to do.
(the ArrayBuilder of my dlibs has this already, of course).
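
A rough sketch of what such an overload could look like, in current D2 syntax. SimpleAppender is a made-up stand-in that skips the capacity caching entirely and just forwards to the built-in ~=; the const(T)[] parameter type and the length accessor are assumptions of this sketch, not the ArrayAppender or ArrayBuilder code.

```d
struct SimpleAppender(T)
{
    private T[] _data;

    // Single-element append, as in the original interface
    void putNext(T item) { _data ~= item; }

    // Bulk append: the overload under discussion (signature is an assumption)
    void putNext(const(T)[] items) { _data ~= items; }

    size_t length() const { return _data.length; }

    // Hand the array over and reset the appender
    T[] release()
    {
        auto result = _data;
        _data = null;
        return result;
    }
}

void main()
{
    SimpleAppender!char app;
    app.putNext('a');      // picks the single-element overload
    app.putNext("bcd");    // picks the array overload
    assert(app.length == 4);
    assert(app.release() == "abcd");
}
```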

Bye,
bearophile

Jan 08 2009

jq <jlquinn optonline.net> writes:

dsimcha Wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article

 The idea is very simple: just use D's native append operation, but cache
 the capacity to avoid too many lookups (I understand that that's the
 bottleneck).
 I paste the code below, I'd be indebted if you guys grabbed it and
 tested it.
 Andrei
 struct ArrayAppender(T)
 {
      size_t capacity() const { return _capacity; }
      void putNext(T item)


I have thoughts:

1) There should probably be a length/size call
2) How about add() or append() for a shorter name
3) What about using ~=?  Maybe this is too short...

Jerry

Jan 08 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

jq wrote:
 dsimcha Wrote:
 
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article

 
 The idea is very simple: just use D's native append operation, but cache
 the capacity to avoid too many lookups (I understand that that's the
 bottleneck).
 I paste the code below, I'd be indebted if you guys grabbed it and
 tested it.
 Andrei
 struct ArrayAppender(T)
 {
      size_t capacity() const { return _capacity; }
      void putNext(T item)


 
 I have thoughts:
 
 1) There should probably be a length/size call
 2) How about add() or append() for a shorter name
 3) What about using ~=?  Maybe this is too short...

Length sounds good. The other two I'm more hesitant about because 
ArrayAppender supports the interface of an output range. The output 
range only allows putting one element and making a step simultaneously, 
hence putNext. The ~= is also a bit of an unfortunate choice because 
it's odd to define a type with ~= but no meaningful/desirable binary ~.
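
To make the output-range argument concrete, here is a small sketch in current D2 syntax. ArraySink, SummingSink and writeSquares are made-up names; the only thing taken from the discussion is the putNext method. The generic function depends on nothing but sink.putNext, so any type providing it can be plugged in.

```d
// Hypothetical sinks; any type with putNext(int) would do
struct ArraySink
{
    int[] data;
    void putNext(int x) { data ~= x; }
}

struct SummingSink
{
    long total;
    void putNext(int x) { total += x; }
}

// Generic producer: relies only on sink.putNext
void writeSquares(Sink)(ref Sink sink, int n)
{
    foreach (i; 0 .. n) sink.putNext(i * i);
}

void main()
{
    ArraySink a;
    writeSquares(a, 4);
    assert(a.data == [0, 1, 4, 9]);

    SummingSink s;
    writeSquares(s, 4);
    assert(s.total == 14);
}
```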

Andrei

Jan 09 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article

[snip]
 One definite problem that I've just realized is that there's no putNext(T[]).
 What if you need to append another array to your ArrayAppender, not just a
 single element?

My current codebase has that. It's about time I commit.

Andrei

Jan 09 2009

Ary Borenszweig <ary esperanto.org.ar> writes:

Andrei Alexandrescu wrote:
 Hello,
 
 
 Walter, Bartosz and myself have been hard at work trying to find the 
 right abstraction for iteration. That abstraction would replace the 
 infamous opApply and would allow for external iteration, thus paving the 
 way to implementing real generic algorithms.
 ...
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
 
 
 Andrei

It looks very nice, though I have a few questions:
- Is std.range's source code somewhere? Or it's just the documentation 
and then the implementation will follow? Because...
- How do you create a range? In the documentation it says that "Built-in 
slices T[] are a direct implementation of random-access ranges", so I 
guess a built-in slice is already a range. But if that is true...
- How is "void next(T)(ref T[] range)" implemented? If I pass a built-in 
slice to it, how does the template store the state of where in the range 
are we? Or maybe you'd need to do Range!(...) to create a range?
- What do I do to make a collection implement a range? Do I need to 
implement the templates in std.range using template conditions?

Sorry if my questions are silly, I don't know much about templated code.

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Ary Borenszweig wrote:
 Andrei Alexandrescu wrote:
 Hello,


 Walter, Bartosz and myself have been hard at work trying to find the 
 right abstraction for iteration. That abstraction would replace the 
 infamous opApply and would allow for external iteration, thus paving 
 the way to implementing real generic algorithms.
 ...
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html


 Andrei

 
 It looks very nice, though I have a few questions:
 - Is std.range's source code somewhere? Or it's just the documentation 
 and then the implementation will follow?

There is an implementation that does not compile :o|.

 Because...
 - How do you create a range? In the documentation it says that "Built-in 
 slices T[] are a direct implementation of random-access ranges", so I 
 guess a built-in slice is already a range.

A slice is a range alright without any extra adaptation. It has some 
extra functions, e.g. ~=, that are not defined for ranges.

 But if that is true...
 - How is "void next(T)(ref T[] range)" implemented? If I pass a built-in 
 slice to it, how does the template store the state of where in the range 
 are we? Or maybe you'd need to do Range!(...) to create a range?

void next(T)(ref T[] range) { range = range[1 .. $]; }
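
Spelled out a bit more, as a sketch in current D2 syntax: the slice itself is the iteration state, and next just shrinks it from the left. The isEmpty and left helpers below follow the old names used in this thread and are assumptions of the sketch, as is the sum function.

```d
// Free functions giving a built-in slice the range interface (old names)
bool isEmpty(T)(T[] range) { return range.length == 0; }
ref T left(T)(T[] range) { return range[0]; }
void next(T)(ref T[] range) { range = range[1 .. $]; }

int sum(int[] r)
{
    int total = 0;
    // r is a local copy of the slice; shrinking it does not touch the caller's
    for (; !isEmpty(r); next(r)) total += left(r);
    return total;
}

void main()
{
    int[] data = [1, 2, 3, 4];
    assert(sum(data) == 10);
    assert(data.length == 4); // the caller's slice is unchanged
}
```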

 - What do I do to make a collection implement a range? Do I need to 
 implement the templates in std.range using template conditions?

Oh, much simpler. You don't need to use templates at all if you know the 
type in advance.

// assume a collection of ints
// using old names
struct Collection
{
     struct Range
     {
         bool isEmpty() { ... }
         ref int left() { ... }
         void next() { ... }
     }
     Range all() { ... }
}

Collection c;
for (auto r = c.all; !r.isEmpty; r.next)
{
     writeln(r.left);
}

The advantage of the above is not that it offers you looping over your 
collection. The advantage is that your collection now can use many of 
the algorithms in std.algorithm, and others written to use ranges.

Collection.Range is in intimate connection with Collection because it 
understands the mechanism of walking the collection.

Your code won't currently compile because returning ref int from left() 
is not allowed.
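
For illustration, here is the skeleton above with the bodies filled in, assuming a compiler that does accept ref returns. The int[] backing store and the total function are assumptions made for this sketch; a real collection would have its own storage and its Range would know how to walk it.

```d
struct Collection
{
    int[] items;   // backing store, an assumption made for this sketch

    struct Range
    {
        private int[] _rest;   // elements not yet visited
        bool isEmpty() const { return _rest.length == 0; }
        ref int left() { return _rest[0]; }
        void next() { _rest = _rest[1 .. $]; }
    }

    Range all() { return Range(items); }
}

int total(ref Collection c)
{
    int sum = 0;
    for (auto r = c.all(); !r.isEmpty(); r.next())
        sum += r.left();
    return sum;
}

void main()
{
    auto c = Collection([1, 2, 3]);
    assert(total(c) == 6);

    // left() returns by reference, so elements can be modified in place
    auto r = c.all();
    r.left() = 10;
    assert(c.items[0] == 10);
}
```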


Andrei

Sep 09 2008

Lars Ivar Igesund <larsivar igesund.net> writes:

Andrei Alexandrescu wrote:

 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

Aren't slices const/readonly/whatnot and thus ~= not possible without
copying/allocation?

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Lars Ivar Igesund wrote:
 Andrei Alexandrescu wrote:
 
 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

 
 Aren't slices const/readonly/whatnot and thus ~= not possible without
 copying/allocation?

Well there's no change in semantics of slices (meaning T[]) between D1 
and D2, so slices mean business as usual. Maybe you are referring to 
strings, aka invariant(char)[]?

Anyhow, today's ~= behaves really really erratically. I'd get rid of it 
if I could. Take a look at this:

import std.stdio;

void main(string args[]) {
     auto a = new int[10];
     a[] = 10;
     auto b = a;
     writeln(b);
     a = a[1 .. 5];
     a ~= [ 34, 345, 4324 ];
     writeln(b);
}

The program will print all 10s two times. But if we change a[1 .. 5] 
with a[0 .. 5] the behavior will be very different! a will grow "over" 
b, thus stomping over its content.

This is really bad because the behavior of a simple operation ~= depends 
on the history of the slice on the left hand side, something often 
extremely difficult to track, and actually impossible if the slice was 
received as an argument to a function.

IMHO such a faux pas is inadmissible for a modern language.
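
One defensive pattern that does not depend on the slice's history is to copy before appending. A minimal sketch in current D2 syntax; appendCopy is a made-up helper, and the price is an allocation and copy on every call.

```d
// Hypothetical helper: append without ever writing into the caller's block
int[] appendCopy(const(int)[] slice, const(int)[] extra)
{
    auto result = slice.dup;  // fresh allocation, mutable int[]
    result ~= extra;
    return result;
}

void main()
{
    auto a = new int[10];
    a[] = 10;
    auto b = a;
    auto c = appendCopy(a[0 .. 5], [34, 345, 4324]);
    assert(c == [10, 10, 10, 10, 10, 34, 345, 4324]);
    assert(b[5] == 10); // b is untouched whatever the runtime's append policy
}
```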


Andrei

Sep 09 2008

Lars Ivar Igesund <larsivar igesund.net> writes:

Andrei Alexandrescu wrote:

 Lars Ivar Igesund wrote:
 Andrei Alexandrescu wrote:
 
 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

 
 Aren't slices const/readonly/whatnot and thus ~= not possible without
 copying/allocation?

 
 Well there's no change in semantics of slices (meaning T[]) between D1
 and D2, so slices mean business as usual. Maybe you are referring to
 strings, aka invariant(char)[]?

No, I actually referred to what you say below. My point was that ~= is an
unsafe operation on slices (not impossible as I apparently said), and thus
you need copy-on-write to be safe from erratic behaviour.

 
 Anyhow, today's ~= behaves really really erratically. I'd get rid of it
 if I could. Take a look at this:
 
 import std.stdio;
 
 void main(string args[]) {
      auto a = new int[10];
      a[] = 10;
      auto b = a;
      writeln(b);
      a = a[1 .. 5];
      a ~= [ 34, 345, 4324 ];
      writeln(b);
 }
 
 The program will print all 10s two times. But if we change a[1 .. 5]
 with a[0 .. 5] the behavior will be very different! a will grow "over"
 b, thus stomping over its content.
 
 This is really bad because the behavior of a simple operation ~= depends
 on the history of the slice on the left hand side, something often
 extremely difficult to track, and actually impossible if the slice was
 received as an argument to a function.
 
 IMHO such a faux pas is inadmissible for a modern language.
 
 
 Andrei

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango

Sep 09 2008

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 Lars Ivar Igesund wrote:
 Andrei Alexandrescu wrote:

 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

 Aren't slices const/readonly/whatnot and thus ~= not possible without
 copying/allocation?

 
 Well there's no change in semantics of slices (meaning T[]) between D1 
 and D2, so slices mean business as usual. Maybe you are referring to 
 strings, aka invariant(char)[]?

I do think it's a fair point that ~= could be considered an operation 
that constructs a new container (an array, in this case) using a range 
(slice) as an initializer.  The weird issue right now is that there is 
effectively no difference between a slice and an array insofar as the 
language or code representation are concerned.  In many instances this 
is an advantage, but it leads to some issues, like the one you describe 
below.

 Anyhow, today's ~= behaves really really erratically. I'd get rid of it 
 if I could. Take a look at this:
 
 import std.stdio;
 
 void main(string args[]) {
     auto a = new int[10];
     a[] = 10;
     auto b = a;
     writeln(b);
     a = a[1 .. 5];
     a ~= [ 34, 345, 4324 ];
     writeln(b);
 }
 
 The program will print all 10s two times. But if we change a[1 .. 5] 
 with a[0 .. 5] the behavior will be very different! a will grow "over" 
 b, thus stomping over its content.
 
 This is really bad because the behavior of a simple operation ~= depends 
 on the history of the slice on the left hand side, something often 
 extremely difficult to track, and actually impossible if the slice was 
 received as an argument to a function.
 
 IMHO such a faux pas is inadmissible for a modern language.

This would be easy to fix by making arrays / slices fatter (by adding a 
capacity field, for example), but I'm still not convinced that's the 
right thing to do.  However, it may well be preferable to eliminating 
appending completely.  The obvious alternatives would be either to 
resurrect head const (not gonna happen) or to make append always 
reallocate (not at all ideal).


Sean

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 This would be easy to fix by making arrays / slices fatter (by adding a 
 capacity field, for example), but I'm still not convinced that's the 
 right thing to do.  However, it may well be preferable to eliminating 
 appending completely.  The obvious alternative would be either to 
 resurrect head const (not gonna happen) or to make append always 
 reallocation (not at all ideal).

I couldn't imagine it put any better. Maybe time has come for starting 
to look into a good solution for this problem. The way things are now, 
~= muddles the clean territory that slices cover.

Suppose we define "unowned" arrays as arrays allocated by new T[n]. 
They are "unowned" because no entity controls them except the garbage 
collector, which by definition recycles them when it's sure you couldn't 
tell.

An "owned" array would be something like a scope variable or an 
up-and-coming BlockArray!(T) with a destructor.

Slices are beautiful for iterating owned and unowned arrays just as 
well. You can have the slice refer to any range of any array no problem. 
Calling a ~ b creates a new, unowned array containing their 
concatenation. Assigning a = new T[n]; binds a to a fresh unowned array. 
And so on.

And all of a sudden we step on a kaka in this beautiful garden. Under 
very special, undetectable circumstances, a range becomes Hitler, 
annexes an adjacent range, and starts walking all over it. Sigh.


Andrei

Sep 09 2008

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 This would be easy to fix by making arrays / slices fatter (by adding 
 a capacity field, for example), but I'm still not convinced that's the 
 right thing to do.  However, it may well be preferable to eliminating 
 appending completely.  The obvious alternative would be either to 
 resurrect head const (not gonna happen) or to make append always 
 reallocation (not at all ideal).

 
 I couldn't imagine it put any better. Maybe time has come for starting 
 to look into a good solution for this problem. The way things are now, 
 ~= muddles the clean territory that slices cover.
 
 Consider we define "unowned" arrays are arrays as allocated by new T[n]. 
 They are "unowned" because no entity controls them except the garbage 
 collector, which by definition recycles them when it's sure you couldn't 
 tell.
 
 An "owned" array would be something like a scope variable or an 
 up-and-coming BlockArray!(T) with a destructor.
 
 Slices are beautiful for iterating owned and unowned arrays just as 
 well. You can have the slice refer to any range of any array no problem. 
 Calling a ~ b creates a new, unowned array containing their 
 concatenation. Assigning a = new T[n]; binds a to a fresh unowned array. 
 And so on.
 
 And all of a sudden we step on a kaka in this beautiful garden. Under 
 very special, undetectable circumstances, a range becomes Hitler, 
 annexes an adjacent range, and starts walking all over it. Sigh.

I'm very glad you share my thoughts on ~=. The current D T[] is a tool 
that has been stuffed with too many concepts.

Arrays and slices are two fundamentally different concepts that T[] 
actually manages to capture impressively well, but unfortunately not 
fully. And it is the last bit that makes the puzzle complicated.

One of the biggest differences between an array and a slice lies in the 
ownership of the data. And as far as I see it, arrays are conceptually 
better implemented as reference types, while slices are a natural value 
type.

So by removing ~= from T[], T[] becomes a pure slice type.

This is all the old T[new] discussion once again, but with the gained 
insight that instead of T[new] one could just as well use a pure library 
type.

-- 
Oskar

Sep 11 2008

bearophile <bearophileHUGS lycos.com> writes:

Oskar Linde (and Andrei Alexandrescu):
 So by removing ~= from T[], T[] becomes a pure slice type.

Appending to the built-in dynamic arrays is a fundamental operation (I use it
hundreds of times in my code) so if the purpose is just to avoid problems when
extending slices, a different solution can be invented.
For example, if a third (capacity) field is added to the dyn array struct, the
last bit of the capacity field can be used to tell apart slices from true whole
arrays. So at runtime the code knows how to extend/append the array/slice
correctly. This slows down the appending itself a little, but it's better than
having to use an ugly ArrayBuilder everywhere.

Bye,
bearophile

Sep 11 2008

Sean Kelly <sean invisibleduck.org> writes:

bearophile wrote:
 Oskar Linde (and Andrei Alexandrescu):
 So by removing ~= from T[], T[] becomes a pure slice type.

 
 Appending to the built-in dynamic arrays is a fundamental operation (I use it
hundred of times in my code) so if the purpose is just to avoid problems when
extending slices, a different solution can be invented.
 For example adding the third (capacity) field to the dyn array struct, the
last bit of the capacity field can be used to tell apart slices from true whole
arrays. So at runtime the code knows how to extend/append the array/slice
correctly. This slows down the appending itself a little, but it's better than
having to use an ugly ArrayBuilder everywhere.

I'd think that adding a capacity field should actually speed up append 
operations, since the GC wouldn't have to be queried to determine this 
info.  And as in another thread, the capacity of all slices should 
either be zero or the size of the slice, thus forcing a realloc for any 
append op.
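
A minimal sketch of that rule, assuming a hypothetical three-field handle (FatArray, make, slice, append and view are made-up names for illustration, not anything in D or Phobos). A slice gets cap == 0, so its first append always copies, while a whole array may still grow in place:

```d
// Hypothetical three-field array handle; not an existing D or Phobos type.
struct FatArray(T)
{
    T* ptr;       // first visible element
    size_t len;   // number of visible elements
    size_t cap;   // room available starting at ptr; 0 marks a slice

    // Allocate a whole array of n elements with room for `reserve`.
    static FatArray make(size_t n, size_t reserve)
    {
        assert(n <= reserve);
        auto block = new T[reserve];
        return FatArray(block.ptr, n, reserve);
    }

    // Slices never inherit capacity, so they cannot grow in place.
    FatArray slice(size_t lo, size_t hi)
    {
        assert(lo <= hi && hi <= len);
        return FatArray(ptr + lo, hi - lo, 0);
    }

    void append(T x)
    {
        if (len < cap)
        {
            ptr[len++] = x;          // room left: grow in place
            return;
        }
        // No room (always the case for a slice): copy to a fresh block.
        immutable newCap = len * 2 + 1;
        auto block = new T[newCap];
        block[0 .. len] = ptr[0 .. len];
        block[len] = x;
        ptr = block.ptr;
        ++len;
        cap = newCap;
    }

    T[] view() { return ptr[0 .. len]; }
}

unittest
{
    auto a = FatArray!int.make(4, 8);
    auto v = a.view();
    v[] = 10;
    auto s = a.slice(0, 2);
    s.append(99);                         // cap == 0, so this copies
    assert(a.view() == [10, 10, 10, 10]); // a is untouched
    assert(s.view() == [10, 10, 99]);
}
```

After the copy the former slice owns its new block, so later appends to it can be cheap again.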


Sean

Sep 11 2008

bearophile <bearophileHUGS lycos.com> writes:

Sean Kelly:
 I'd think that adding a capacity field should actually speed up append 
 operations, since the GC wouldn't have to be queried to determine this 
 info.

Yes, but I meant slower than just adding the capacity field without adding such
an extra bit flag to tell apart slices from arrays.


 And as in another thread, the capacity of all slices should 
 either be zero or the size of the slice, thus forcing a realloc for any 
 append op.

Oh, right, no need for a separate bit for tagging then; the value capacity=0
is the tag.
Do the D designers like this (small) change in the language? :-)

Bye,
bearophile

Sep 11 2008

Sergey Gromov <snake.scaly gmail.com> writes:

bearophile <bearophileHUGS lycos.com> wrote:
 Sean Kelly:
 I'd think that adding a capacity field should actually speed up append 
 operations, since the GC wouldn't have to be queried to determine this 
 info.

 
 Yes, but I meant slower than just adding the capacity field without adding
such extra bit flag to tell apart slices from arrays.
 
 
 And as in another thread, the capacity of all slices should 
 either be zero or the size of the slice, thus forcing a realloc for any 
 append op.

 
 Oh, right, no need to a separate bit for tagging then, is the value capacity=0
that's the tag.
 Do D designers like this (small) change in the language? :-)

It just doesn't work.  Arrays are structs passed by value.  If you pass 
a capacity-array into a function, the function appends, and then you 
append to your old copy, you overwrite what the function appended.  If 
you force slice-on-copy semantics, then arrays become elusive, tending 
to implicitly turn into slices whenever you toss them around and then 
reallocate on append.
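
To make the first failure concrete, here is a rough sketch with a hypothetical by-value handle (CapArray, calleeAppends and stompDemo are made-up names, and only the in-place append case is modelled):

```d
// Hypothetical by-value array handle with a capacity field.
struct CapArray
{
    int* ptr;
    size_t len;
    size_t cap;

    // Only the in-place case is modelled; there is no reallocation.
    void append(int x)
    {
        assert(len < cap, "sketch covers in-place appends only");
        ptr[len++] = x;
    }
}

// The parameter is a copy of the caller's handle that shares its memory.
CapArray calleeAppends(CapArray arr)
{
    arr.append(111);
    return arr;
}

// Returns what the callee's appended element holds after the caller appends.
int stompDemo()
{
    auto block = new int[8];
    auto mine = CapArray(block.ptr, 2, 8);
    auto theirs = calleeAppends(mine); // writes 111 at index 2
    mine.append(222);                  // mine.len is still 2: index 2 again
    return theirs.ptr[2];
}
```

In this model stompDemo() yields 222, not 111: the caller's stale length makes its append land on the element the callee had just added.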

When I was reading the D specs on slicing and appending for the first 
time I got a strong feeling of hackery.  A slice is a view of another 
slice, but you cannot count on that.  A slice can be appended to, but you 
can erase parts of other slices in the process.  I've had my share of 
guessing when I was writing a sort of refactoring tool, when I checked 
carefully whether a slice would survive and dupped much more than I wish 
I had.

I'd personally love to have something as simple and natural to use as 
current built-in arrays but with better defined semantics.  I don't like 
Andrei's Array!() thingy but it seems better than anything proposed 
before.

Sep 11 2008

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Sean Kelly wrote:
 bearophile wrote:
 Oskar Linde (and Andrei Alexandrescu):
 So by removing ~= from T[], T[] becomes a pure slice type.

 Appending to the built-in dynamic arrays is a fundamental operation (I 
 use it hundred of times in my code) so if the purpose is just to avoid 
 problems when extending slices, a different solution can be invented.


I agree that it is a fundamental operation, and my code contains 
hundreds of uses too. But the number of uses is actually lower than I 
thought. One project of mine has only 157 ~= out of a total of 18000 
lines of code, and the cases are by their nature quite easily 
identified. Arbitrary code doesn't usually append to arbitrary slices.

 For example adding the third (capacity) field to the dyn array struct, 
 the last bit of the capacity field can be used to tell apart slices 
 from true whole arrays. So at runtime the code knows how to 
 extend/append the array/slice correctly. This slows down the appending 
 itself a little, but it's better than having to use an ugly 
 ArrayBuilder everywhere.

 
 I'd think that adding a capacity field should actually speed up append 
 operations, since the GC wouldn't have to be queried to determine this 
 info.  And as in another thread, the capacity of all slices should 
 either be zero or the size of the slice, thus forcing a realloc for any 
 append op.
 
 
 Sean

capacity = the size of the slice won't work, since then you could 
transform a slice into a resizable array by mistake:

s = a[5..7];
// s.capacity = 2
t = s;                     // t views the same two elements
s.length = s.length - 1;   // s.capacity is still 2
s ~= x;                    // fits in the capacity, so x lands on t[1]

so that basically means that capacity has to be = 0 for slices, and != 0 
for resizable arrays.

Without considering whether arrays would gain from having the capacity 
readily accessible, the advantage from this would be to have a run-time 
way to separate the slice from the array at the cost of 50 % increased 
storage. But even though this information would only be accessible at 
run-time, it is fully deducible at compile time. So you lose all compile 
time gains from separating the two concepts.

-- 
Oskar

Sep 11 2008

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Andrei Alexandrescu wrote:
 Lars Ivar Igesund wrote:
 Andrei Alexandrescu wrote:

 A slice is a range alright without any extra adaptation. It has some
 extra functions, e.g. ~=, that are not defined for ranges.

 Aren't slices const/readonly/whatnot and thus ~= not possible without
 copying/allocation?

 
 Well there's no change in semantics of slices (meaning T[]) between D1 
 and D2, so slices mean business as usual. Maybe you are referring to 
 strings, aka invariant(char)[]?
 
 Anyhow, today's ~= behaves really really erratically. I'd get rid of it 
 if I could. Take a look at this:
 
 import std.stdio;
 
 void main(string args[]) {
     auto a = new int[10];
     a[] = 10;
     auto b = a;
     writeln(b);
     a = a[1 .. 5];
     a ~= [ 34, 345, 4324 ];
     writeln(b);
 }
 
 The program will print all 10s two times. But if we change a[1 .. 5] 
 with a[0 .. 5] the behavior will be very different! a will grow "over" 
 b, thus stomping over its content.
 
 This is really bad because the behavior of a simple operation ~= depends 
 on the history of the slice on the left hand side, something often 
 extremely difficult to track, and actually impossible if the slice was 
 received as an argument to a function.
 
 IMHO such a faux pas is inadmissible for a modern language.
 
 
 Andrei

Cool, good to see this is going to be taken care of, it is a horrible wart.


-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Sep 25 2008

Derek Parnell <derek psych.ward> writes:

On Tue, 09 Sep 2008 10:45:53 -0500, Andrei Alexandrescu wrote:


          ref int left() { ... }

Is "left" a "movement in a specific direction" as in "go left at the next
lights" or the "amount of stuff left over"? It is a bit ambiguous. Even if
it is a direction, is it moving towards the first or the last item? It is
not self-evident. As a user of the Latin alphabet I'd assume it was going
towards the first item but a Hebrew or Arabic user might assume it was
heading towards the end.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Sep 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Derek Parnell" wrote
 On Tue, 09 Sep 2008 10:45:53 -0500, Andrei Alexandrescu wrote:


          ref int left() { ... }

 Is "left" a "movement in a specific direction" as in "go left at the next
 lights" or the "amount of stuff left over"? It is a bit ambiguous. Even if
 it is a direction, is it moving towards the first or the last item? It is
 not self-evident. As a user of the Latin alphabet I'd assume it was going
 towards the first item but a Hebrew or Arabic users might assume it was
 heading towards the end.

It means 'left-most element in the range'.  It gets you the first element in 
the range (i.e. the next element to iterate) without modifying the range.

I agree that it is very misleading, but I think Andrei is exploring other 
possibilities (see other threads).

-Steve

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Derek Parnell" wrote
 On Tue, 09 Sep 2008 10:45:53 -0500, Andrei Alexandrescu wrote:


          ref int left() { ... }

 Is "left" a "movement in a specific direction" as in "go left at the next
 lights" or the "amount of stuff left over"? It is a bit ambiguous. Even if
 it is a direction, is it moving towards the first or the last item? It is
 not self-evident. As a user of the Latin alphabet I'd assume it was going
 towards the first item but a Hebrew or Arabic users might assume it was
 heading towards the end.

 
 It means 'left-most element in the range'.  It gets you the first element in 
 the range (i.e. the next element to iterate) without modifying the range.
 
 I agree that it is very misleading, but I think Andrei is exploring other 
 possibilities (see other threads).

Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't 
think they'd be offended. This is not writing. Left is left and right is 
right in math.

But yes... first and last are in I guess. I'd also like *r as a shortcut 
for r.first, as it will be no doubt used very intensively.
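
Roughly what I have in mind, as a sketch only: IntRange and advance are placeholder names, and I'm assuming unary * can be overloaded through an opUnary template, which is how current D spells it:

```d
// Placeholder range over an int[]; the names are illustrative.
struct IntRange
{
    int[] data;

    bool empty() const { return data.length == 0; }

    // Peek at either end without modifying the range.
    ref int first() { return data[0]; }
    ref int last()  { return data[$ - 1]; }

    // Drop the first element (placeholder name).
    void advance() { data = data[1 .. $]; }

    // *r as a shortcut for r.first
    ref int opUnary(string op)() if (op == "*")
    {
        return first();
    }
}

unittest
{
    auto r = IntRange([1, 2, 3]);
    assert(*r == 1 && r.last == 3);
    *r = 42;                 // writes through the ref to data[0]
    assert(r.first == 42);
    r.advance();
    assert(*r == 2);
}
```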


Andrei

Sep 09 2008

Derek Parnell <derek nomail.afraid.org> writes:

On Tue, 09 Sep 2008 18:04:12 -0500, Andrei Alexandrescu wrote:

 But yes... first and last are in I guess.

Yes, I understand this. I am just raking old coals to stress the importance
of choosing the "right" (or is that "correct") word; one that promotes
least synaptic double-takes.

 Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't 
 think they'd be offended.

I wasn't thinking about offense, just cultural assumptions.

 This is not writing. Left is left and right is right in math.

Well, I beg to differ. I believe that programming languages are closer to
prose than they are to maths. Even if 'left' and 'right' have specific
meanings in maths, people reading someone else's code and seeing r.left
might not know that they are reading "maths". I'm sure most readers (at
least while learning) will apply their cultural bias when interpreting the
written text, and all I'm saying is that r.left may very well mean
different things to different people.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
10/09/2008 9:58:38 AM

Sep 09 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Sep 10, 2008 at 8:04 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't think
 they'd be offended. This is not writing. Left is left and right is right in
 math.

Also the direction in which D code is written does not depend on the
language of the speaker.  It's always left to right.
So I think there's no real argument on linguistic grounds.

On the other hand, a quick google for "left right confusion" turns up
a fair number of relevant hits.  There are enough people out there who
have trouble keeping those directions straight for it to get
discussed.  Searches for "begin end confusion", "front back
confusion", "first last confusion" predictably turned up no relevant
hits I could find.

 But yes... first and last are in I guess. I'd also like *r as a shortcut for
 r.first, as it will be no doubt used very intensively.

Recognizing that the typical usage for these things will be that
"first" is the current value and "last" is actually a bogus sentinel,
I guess I would rather see something like .value or .item for the
current value.  I can understand the pull to try to make the names
symmetric, but in fact the things they represent are not really
symmetric, so I don't see it as a requirement that the names be
symmetric.

And opStar is hard to search for so I'd rather not see that at all.
Note also that if you declare that * is an alias for .first in the
ranges interface that means that every implementor of a range will
have to remember to include that alias.

--bb

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 8:04 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't think
 they'd be offended. This is not writing. Left is left and right is right in
 math.

 
 Also the direction in which D code is written does not depend on the
 language of the speaker.  It's always left to right.
 So I think there's no real argument on linguistic grounds.
 
 On the other hand, a quick google for "left right confusion" turns up
 a fair number of relevant hits.

Yep. I needn't google any farther than my wife :o).

Andrei

Sep 09 2008

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Andrei Alexandrescu wrote:
I'd also like *r as a shortcut for r.first, 

Agh, yuck! :(


-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Sep 25 2008

Derek Parnell <derek nomail.afraid.org> writes:

On Tue, 9 Sep 2008 18:28:34 -0400, Steven Schveighoffer wrote:

 It means 'left-most element in the range'.  It gets you the first element in 
 the range (i.e. the next element to iterate) without modifying the range.
 
 I agree that it is very misleading, but I think Andrei is exploring other 
 possibilities (see other threads).

Thanks. I was playing at "devil's advocate" as my real point was that
"left" is way too overloaded with different meanings and is thus not a
suitable choice. We already have this problem a few times in D (eg.
'static')

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
10/09/2008 9:56:09 AM

Sep 09 2008

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

For my money, other collection/algorithms designs don't hold a candle to STL's.<

I know that the STL is a highly refined piece of technology; after reading a
lot of things written by Alexander Stepanov I was impressed.

Still, other languages don't care about the STL because they really want to be
simpler than C++ (I think you probably need a significant part of the
complexity of the C++ language to implement a good STL), so the other languages
(and their std libs) may look worse to you, but for a lot of people those
languages and std libs aren't worse; they trade some power (that you can find
in the STL) for a simpler language that more programmers may want to use. The
practical result is that today there may be two programmers using the Java std
lib for each C++ STL user.

become closer to C++.

I know it's a matter of balance. I am not sure what gave you the idea that I
didn't.<

Probably your balance leans toward more complexity than I like to have in a
language. I presume Walter will follow your advice, but an overly complex
language (even if more powerful and more efficient than the current D) may not
be what most people look for. We'll see if I'm right.
I'm not good enough yet to design a language, so I have to leave the work to
you :-)

I think many would agree that foreach minus the disadvantages would be better.<

Only if the increase in complexity for this specific thing is perceived by most
D programmers as less important than the improvement in performance and
flexibility, of course :-)

How do I know that? Because I measured. Why did I measure? Because I care. Why
do I care? Because the typical runtime of my programs measure in hours of sheer
computation, and because things like reduce and others in std.algorithm are in
the core loops. And I am not alone.<

I can see there's a large difference in purposes here. While I too care, and I
too have written a few thousand lines of code to benchmark most things I have
written, the point of my functional stuff lib is mostly to replace code that's
not in core loops; it's designed to improve coding for the 80-90% of the code
where performance isn't so important.

Costly abstractions are a dime a dozen. It's efficient abstractions that are
harder to come across.<

I have understood this only after reading long texts by Alexander Stepanov
regarding the STL :-) That's where I have gained respect for the STL and the
C++ language, and their designers.

But elsewhere I have also realized that in a very large percentage of lines of
code in a program (80%?) you don't need max performance, and in such parts of
the code there are other things that deserve to be maximized: succinctness of
code, flexibility, usage simplicity, adaptability to a large number of
situations, anti-bug-prone coding, etc. About the same qualities a built-in
data structure like the associative array is useful for. Paying the price that
comes from code that strives for max performance in the whole program demands a
lot of brainpower to write the code, leads to very complex bugs, etc.

If you believe I am wasting time with cutesies in std.algorithm, please let me
know of the places and of the suggested improvements.<

There are a few things I don't like there, but nothing major :-)
Some time ago I posted something in the main D newsgroup; I think a few things
have changed in the meantime (I think Walter has improved max/min):

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=67113

I agree.<

Oh good, then you may agree with me that the language too may benefit from
having some ability to "scale down" :-)

To quote a classic, Lisp has had them all for 50 years.<

But Python syntax is a few light-years ahead of CLisp ;-)

I plan to implement things like lazyMap and lazyReduce,<

I presume the large body of (refined) code I have already written on this in my
libs is useless to you, because it's based on opApply... But maybe you can find
use in a few names, some argument lists, etc (In the next days I have some of
that code to show to Walter, but it becomes even more useless for Walter now).

Thank you for the patience and your kind answers,
bye,
bearophile

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
Andrei Alexandrescu:

For my money, other collection/algorithms designs don't hold a
candle to STL's.<

I know that the STL is a highly refined piece of technology; after
reading lot of things written by Alexander Stepanov I was impressed.

Still, other languages don't care of the STL because they really want
to be simpler than C++ (I think you probably need a significant part
of the complexity of C++ language to implement a good STL), so the
other languages (and their std libs) may look worse to you, but for a
lot of people those languages and std libs aren't worse, they trade
some power (that you can find in STL) with a simpler language that
more programmers may want to use. The practical result is that today
there may be two programmers using the Java std lib for each C++ STL

it's trying to become closer to C++.

Thanks for your continued comments.

I agree with most of what you say, so probably most differences are in
the nuances. First nuance is that you seem to expose a false dichotomy:
"simple" versus "allows power"/"allows STL".

For all I know, much of Walter's claim to fame with D is that he proved
that power/efficiency and ease of use are not an either-or choice as
most people thought.

Also for all I know, D already is more powerful than C++. There are a
few remaining stragglers such as return by reference. Given that D is
already more powerful than C++ and also much simpler, we could take one
of two routes:

1. Implement STL in D, which, as std.algorithm shows, is compellingly
simpler and better than its C++ counterpart.

2. Implement an inferior containers/algorithms design from a less
powerful language.

I'd say you'd have to bring a very strong case for (2) to make it stick.

There's one more nuance to bring up here. D is meant to be a
systems-level programming language, and as such one clear goal of it is
to obviate a need for C++. In the inner circles of C++ there's often
talk about what a replacement for C++ would look like, and to quote a classic
who reflects the general opinion: "Whatever replaces C++ should be at
least as good as C++ at whatever C++ is good at, and better than C++ at
whatever C++ is bad at."

I know it's a matter of balance. I am not sure what gave you the
idea that I didn't.<

Probably your balance seems toward more complexity than I like to
have in a language. I presume Walter will follow your advice, but a
too much complex language (even if more powerful and more efficient
than the current D) may not be what most people look for. We'll see
if I'm right. I'm not good enough yet to design a language, so I have
to leave you the work :-)

As the thread on T.fail shows, I'm fighting tooth and nail against
adding complexity to the language whenever it can be helped.

I think many would agree that foreach minus the disadvantages would
be better.<

Only if the increase in complexity for this specific thing is
perceived by most D programmers as less important than the
improvement in performance and flexibility, of course :-)

How do I know that? Because I measured. Why did I measure? Because
I care. Why do I care? Because the typical runtime of my programs
measure in hours of sheer computation, and because things like
reduce and others in std.algorithm are in the core loops. And I am
not alone.<

I can see there's a large difference in purposes here, while I too
care and I too have written few thousands of lines of code to
benchmark most things I have written, the point of my functional
stuff lib is mostly to replace code that's not in core loops, it's
designed to improve coding for the 80-90% of the code where
performance isn't so important.

And what would you rather have in the standard library? And what
advantages are you showing in exchange for the steep penalty in
efficiency? (I looked over your library a while ago and it seems to use
delegates everywhere.)

Costly abstractions are a dime a dozen. It's efficient abstractions
that are harder to come across.<

I have understood this only after reading long texts by Alexander
Stepanov regarding the STL :-) That's where I have gained respect for
the STL and the C++ language, and their designers.

But elsewhere I have also realized that in a very large percentage of
lines of code in a program (80%?) you don't need max performance, and
in such parts of the code there are other things that deserve to be
maximized: succinctness of code, flexibility, usage simplicity,
adaptability to a large number of situations, anti-bug-prone coding,
etc. About the same qualities a built-in data structure like the
Associative arrays is useful for. Paying the price that comes from
code that strives for max performance in the whole program leads to
lot of brainpower required to write the code, very complex bugs, etc.

Again you offer a false dichotomy. Do I want a beautiful wife or a smart
one? D code can enjoy the properties you mention without needing to pay
an arm and a leg for them.

If you believe I am wasting time with cutesies in std.algorithm,
please let me know of the places and of the suggested
improvements.<

There are few things I don't like there, but nothing major :-) Time
ago I have posted something in the main D newsgroup, I think few
things are changed in the meantime (I think Walter has improved
max/min):

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=67113

Thanks. I'll heed some of the comments, particularly those regarding
laziness. In fact I'm thinking that map should be lazy by default.
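
For the sake of discussion, here is a minimal sketch of what a lazy map could look like. Everything in it is an assumption on my part: LazyMap is a made-up helper restricted to arrays, the empty/front/popFront member names are the ones current D's foreach recognizes rather than anything settled in this thread, and the calls counter exists only to make the laziness observable.

```d
// Counts invocations so laziness can be observed.
int calls;

int twice(int x)
{
    ++calls;
    return 2 * x;
}

// Hypothetical lazy map over an array: fun runs only when front is read.
struct LazyMap(alias fun, T)
{
    T[] source;

    bool empty() { return source.length == 0; }
    auto front() { return fun(source[0]); }
    void popFront() { source = source[1 .. $]; }
}

// Helper so the element type is deduced from the argument.
auto lazyMap(alias fun, T)(T[] source)
{
    return LazyMap!(fun, T)(source);
}

unittest
{
    calls = 0;
    auto m = lazyMap!twice([1, 2, 3]);
    assert(calls == 0);          // building the range computes nothing
    int sum = 0;
    foreach (x; m) sum += x;     // elements are computed one at a time here
    assert(sum == 12);
}
```

The trade-off is that front recomputes fun on every read, so a caller that reads the same element repeatedly pays for it each time.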

I agree.<

Oh good, then you may agree with me that the langage too may enjoy to
have some ability to "scale down" :-)

To quote a classic, Lisp has had them all for 50 years.<

But Python syntax is few lightyears ahead compared to CLisp ;-)

I plan to implement things like lazyMap and lazyReduce,<

I presume the large body of (refined) code I have already written on
this in my libs is useless to you, because it's based on opApply...
But maybe you can find use in few names, some argument lists, etc (In
the next days I have some of that code to show to Walter, but it
becomes even more useless for Walter now).

If there are no licensing issues (which Walter is wary of) I'd be
grateful to benefit from your code or designs. Do you have a link handy?

Andrei

Sep 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
<snip>

Excellent ideas!  I think the best part is about how you almost never need 
individual iterators, only ever ranges.  Perfectly explained!

One issue that you might not have considered is using a container as a data 
structure, and not using it for algorithms.  For example, how would one 
specify that one wants to remove a specific element, not a range of 
elements.  Having a construct that points to a specific element but not any 
specific end element might be a requirement for non-algorithmic reasons.

Also, some ranges might become invalid later on whereas the iterators would 
not.  Take for example a Hash container.  If you have to rehash the table, a 
range now might not make any sense, as the 'end' element may have moved to 
be before the 'begin' element.  But an iterator that points to a given 
element will still be valid (and could be used for removal).  In fact, I 
don't think ranges make a lot of sense for things like Hash where there 
isn't any defined order.  But you still should have a 'pointer' type to 
support O(1) removal.

One doesn't see any of these problems with arrays, because with arrays, you 
are guaranteed to have contiguous memory.

What I'm trying to say is there may be a reason to have pointers for certain 
containers, even though they might be unsafe.

My personal pet peeve of many container implementations is not being able to 
remove elements from a container while iterating.  For example, I have a 
linked list of open file descriptors, iterate over the list, closing and 
removing those that are done (which should be O(1) per removal).  In many 
container implementations, iteration implies immutable, which means you have 
to add references to the elements to remove to another list to then remove 
afterwards (sometimes at the cost of O(n) per removal.  grrrr.)  I hope 
ranges will support removal while traversing.

That's all I have for now, I have to go think about how this will impact 
dcollections :)

-Steve

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 <snip>
 
 Excellent ideas!  I think the best part is about how you almost never need 
 individual iterators, only ever ranges.  Perfectly explained!
 
 One issue that you might not have considered is using a container as a data 
 structure, and not using it for algorithms.  For example, how would one 
 specify that one wants to remove a specific element, not a range of 
 elements.  Having a construct that points to a specific element but not any 
 specific end element might be a requirement for non-algorithmic reasons.

I'm sure you know and imply this, but just to clarify for everybody: 
Modifying the topology of the container is a task carried by the 
primitives of the container. Ranges can "look" at the topology and 
change elements sitting in it, but not alter the topology.

Much like in STL, there's a container primitive for removing a range. It 
will return a range too, namely the range starting at the deleted 
position. Removing an element is really removing a range of one element 
- just a particular case.

 Also, some ranges might become invalid later on whereas the iterators would 
 not.  Take for example a Hash container.  If you have to rehash the table, a 
 range now might not make any sense, as the 'end' element may have moved to 
 be before the 'begin' element.  But an iterator that points to a given 
 element will still be valid (and could be used for removal).  In fact, I 
 don't think ranges make a lot of sense for things like Hash where there 
 isn't any defined order.  But you still should have a 'pointer' type to 
 support O(1) removal.

Iterators can be easily defined over hashtables, but indeed they are 
easily invalidated if implemented efficiently.

 One doesn't see any of these problems with arrays, because with arrays, you 
 are guaranteed to have contiguous memory.
 
 What I'm trying to say is there may be a reason to have pointers for certain 
 containers, even though they might be unsafe.

A pointer to an element can be taken as &(r.first). The range may or may 
not allow that, it's up to it.

 My personal pet peeve of many container implementations is not being able to 
 remove elements from a container while iterating.  For example, I have a 
 linked list of open file descriptors, iterate over the list, closing and 
 removing those that are done (which should be O(1) per removal).  In many 
 container implementations, iteration implies immutable, which means you have 
 to add references to the elements to remove to another list to then remove 
 afterwards (sometimes at the cost of O(n) per removal.  grrrr.)  I hope 
 ranges will support removal while traversing.

In STL removing from a list while iterating is easy and efficient, 
albeit verbose as always:

list<Filedesc> lst;
for (list<Filedesc>::iterator i = lst.begin(); i != lst.end(); )
{
     if (should_remove) i = lst.erase(i);
     else ++i;
}

In Phobos things will be something like:

List!(Filedesc) lst;
for (auto r = lst.all; !r.isEmpty; )
{
     if (should_remove) r = lst.erase(take(1, r));
     else r.next;
}
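
For reference, the STL loop above can be written out as a self-contained C++ function. This is only a sketch: `purge` and its predicate parameter are names invented here for illustration, not part of the STL or of the range proposal.

```cpp
#include <cstddef>
#include <list>

// Removes every element for which should_remove(element) is true, in a
// single pass. std::list::erase returns the iterator that follows the
// erased element, so the loop never uses an invalidated iterator.
template <class T, class Pred>
std::size_t purge(std::list<T>& lst, Pred should_remove)
{
    std::size_t removed = 0;
    for (typename std::list<T>::iterator i = lst.begin(); i != lst.end(); )
    {
        if (should_remove(*i))
        {
            i = lst.erase(i);   // O(1) unlink; i now points at the next element
            ++removed;
        }
        else
        {
            ++i;                // nothing erased, advance normally
        }
    }
    return removed;
}
```

The key point is that the loop header has no increment: the body alone decides how the cursor moves.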


Andrei

Sep 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 <snip>

 Excellent ideas!  I think the best part is about how you almost never 
 need individual iterators, only ever ranges.  Perfectly explained!

 One issue that you might not have considered is using a container as a 
 data structure, and not using it for algorithms.  For example, how would 
 one specify that one wants to remove a specific element, not a range of 
 elements.  Having a construct that points to a specific element but not 
 any specific end element might be a requirement for non-algorithmic 
 reasons.

 I'm sure you know and imply this, but just to clarify for everybody: 
 Modifying the topology of the container is a task carried by the 
 primitives of the container. Ranges can "look" at the topology and change 
 elements sitting in it, but not alter the topology.

I agree.  There are cases where just an iterator is necessary.  For example, 
a linked-list implementation where the length is calculated instead of 
stored.  But that is the exception, the rule should be that you always ask 
the container to alter the topology.

 Much like in STL, there's a container primitive for removing a range. It 
 will return a range too, namely the range starting at the deleted 
 position. Removing an element is really removing a range of one element - 
 just a particular case.

Yes, but the problem I see is how do you specify a range of one element.  In 
the case of an array, it is easy because you always know that no matter what 
happens to a container, the end of '1' element is a pointer to the next 
element in memory.  In the case of other containers, which could have 
changed since you obtained the range, you cannot be sure the 'range of 1' 
hasn't changed.  For instance, I would assume that a linked list range has 
two pointers, one to the first element, and one to the element just past the 
last element in the range.  But what if an element is inserted in between? 
Then your range suddenly got bigger.

What I'm trying to say is, maybe it would be desirable to have a pointer to 
exactly one element, instead of a range that could possibly change. 
Operations on that pointer type would be supported just like the operations 
on the ranges, but is more specific.
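
The "range of one that got bigger" concern can be reproduced with a small C++ sketch. It assumes a range is stored as a pair of list iterators, which is a guess at one possible implementation and not something the proposal specifies; `ListRange` and `rangeOfOne` are hypothetical names.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical two-iterator range over a std::list<int>: [first, last).
struct ListRange
{
    std::list<int>::iterator first;
    std::list<int>::iterator last;

    // Number of elements currently between first and last (O(n) walk).
    std::size_t length() const
    {
        return static_cast<std::size_t>(std::distance(first, last));
    }
};

// Builds a range intended to cover exactly the element at pos.
inline ListRange rangeOfOne(std::list<int>::iterator pos)
{
    ListRange r = { pos, std::next(pos) };
    return r;
}
```

With a list holding 10, 20, 30, `rangeOfOne(lst.begin())` initially has length 1. After `lst.insert(r.last, 15)` the same stored range reports length 2, because the new node was linked in before its end marker.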

 Also, some ranges might become invalid later on whereas the iterators 
 would not.  Take for example a Hash container.  If you have to rehash the 
 table, a range now might not make any sense, as the 'end' element may 
 have moved to be before the 'begin' element.  But an iterator that points 
 to a given element will still be valid (and could be used for removal). 
 In fact, I don't think ranges make a lot of sense for things like Hash 
 where there isn't any defined order.  But you still should have a 
 'pointer' type to support O(1) removal.

 Iterators can be easily defined over hashtables, but indeed they are 
 easily invalidated if implemented efficiently.

Iterators don't have to be invalidated, but ranges would.

 One doesn't see any of these problems with arrays, because with arrays, 
 you are guaranteed to have contiguous memory.

 What I'm trying to say is there may be a reason to have pointers for 
 certain containers, even though they might be unsafe.

 A pointer to an element can be taken as &(r.first). The range may or may 
 not allow that, it's up to it.

I thought you stated that 'pointers' shouldn't be allowed, only ranges?  In 
general, I agree with that, but I think the ability to use a pointer type 
instead of ranges has advantages in some cases.

 My personal pet peeve of many container implementations is not being able 
 to remove elements from a container while iterating.  For example, I have 
 a linked list of open file descriptors, iterate over the list, closing 
 and removing those that are done (which should be O(1) per removal).  In 
 many container implementations, iteration implies immutable, which means 
 you have to add references to the elements to remove to another list to 
 then remove afterwards (sometimes at the cost of O(n) per removal. 
 grrrr.)  I hope ranges will support removal while traversing.

 In STL removing from a list while iterating is easy and efficient, albeit 
 verbose as always:

 list<Filedesc> lst;
 for (list<Filedesc>::iterator i = lst.begin(); i != lst.end(); )
 {
     if (should_remove) i = lst.erase(i);
     else ++i;
 }

 In Phobos things will be something like:

 List!(Filedesc) lst;
 for (auto r = lst.all; !r.isEmpty; )
 {
     if (should_remove) r = lst.erase(take(1, r));
     else r.next;
 }

I prefer the dcollections syntax, but I did write it, so that's to be 
expected :) :

foreach(ref doPurge, fd; lst.purger)
   doPurge = shouldIRemove(fd);

Or if you prefer iterator-style syntax:

for(auto i = lst.begin; i != lst.end;)
{
    if (shouldIRemove(i.value)) i = lst.remove(i);
    else ++i;
}

But yes, I see that ranges are used for removal, and that should be 
supported in ordered containers.  But the notion of storing a reference to a 
single element as a 'range of 1' in certain containers is troublesome I 
think.

-Steve

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 I thought you stated that 'pointers' shouldn't be allowed, only ranges?  In 
 general, I agree with that, but I think the ability to use a pointer type 
 instead of ranges has advantages in some cases.

I think there's a little confusion. There's three things:

1. Ranges
2. Iterators
3. Pointers, e.g. the exact address where the object sits in memory

My address uses 1 and drops 2. You still have access to 3 if you so need.

void showAddresses(R)(R r)
{
     for (size_t i = 0; !r.isEmpty; r.next, ++i)
     {
         writeln("Element ", i, " is sitting at address: ", &(r.first));
     }
}


Andrei

Sep 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 I thought you stated that 'pointers' shouldn't be allowed, only ranges? 
 In general, I agree with that, but I think the ability to use a pointer 
 type instead of ranges has advantages in some cases.

 I think there's a little confusion. There's three things:

 1. Ranges
 2. Iterators
 3. Pointers, e.g. the exact address where the object sits in memory

Yes, I have been using the terms iterator and pointer interchangeably, my bad 
:)  I look at pointers as a specialized type of iterator, ones for which 
only 'dereference' is defined (and on contiguous memory types such as 
arrays, increment and decrement).

 My address uses 1 and drops 2. You still have access to 3 if you so need.

 void showAddresses(R)(R r)
 {
     for (size_t i = 0; !r.isEmpty; r.next, ++i)
     {
         writeln("Element ", i, " is sitting at address: ", &(r.first));
     }
 }

Let me explain by example:

HashMap!(uint, myResource) resources;

....

// returns something that allows me to later remove the element
auto r = resources.find(key);

useResource(r);

resources[newkey] = new myResource;

resources.erase(r);

Now, assuming that adding the new resource rehashes the hash map, what is in 
r such that it ONLY points to the single resource?  A marker saying 'only 
one element'?  Perhaps you just deleted a range you didn't mean to delete, 
when you only wanted to delete a single resource.  Perhaps r is now 
considered 'invalid'.  Granted, this example can be fixed by reordering the 
lines of code, and perhaps you don't care about the penalty of looking up 
the key again, but what if I want to save the iterator to the resource 
somewhere and delete it later in another function?  And what if the cost of 
lookup for removal is not as quick?

I think with a range being the only available 'iterator' type for certain 
containers may make life difficult for stuff like this.  I really don't 
think iterator is the right term for what I think is needed, what I think is 
needed is a dumbed down pointer.  Something that has one operation --  
opStar.  No increment, no decrement, just 'here is a reference to this 
element'  that can be passed into the container to represent a pointer to a 
specific element.
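
For comparison, here is how the "dumbed down pointer" idea looks in C++ with std::unordered_map, where a rehash invalidates iterators but leaves pointers and references to elements valid. This is a sketch under assumptions: `Resources`, `Cursor`, `findCursor` and `eraseAt` are names made up for illustration, and the resource type is replaced by a string.

```cpp
#include <string>
#include <unordered_map>

typedef std::unordered_map<unsigned, std::string> Resources;

// Hypothetical cursor: a plain pointer to one key/value pair.
// It can be dereferenced, but offers no way to move to a neighbour.
typedef Resources::value_type* Cursor;

// Returns a cursor to the element with the given key, or null if absent.
inline Cursor findCursor(Resources& m, unsigned key)
{
    Resources::iterator it = m.find(key);
    return it == m.end() ? nullptr : &*it;
}

// Erases the element the cursor refers to. The key is copied first so the
// erase call never reads from the node it is destroying.
inline bool eraseAt(Resources& m, Cursor c)
{
    if (c == nullptr) return false;
    const unsigned key = c->first;
    return m.erase(key) == 1;
}
```

Note that `eraseAt` still hashes the key again, so it is average constant time rather than lookup-free; a container that wanted truly lookup-free removal would need to accept the cursor directly.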

-Steve

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 I thought you stated that 'pointers' shouldn't be allowed, only ranges? 
 In general, I agree with that, but I think the ability to use a pointer 
 type instead of ranges has advantages in some cases.

 I think there's a little confusion. There's three things:

 1. Ranges
 2. Iterators
 3. Pointers, e.g. the exact address where the object sits in memory

 
 Yes, I have been using the terms iterator and pointer interchangeably, my bad 
 :)  I look at pointers as a specialized type of iterator, ones for which 
 only 'dereference' is defined (and on contiguous memory types such as 
 arrays, increment and decrement).
 
 My address uses 1 and drops 2. You still have access to 3 if you so need.

 void showAddresses(R)(R r)
 {
     for (size_t i = 0; !r.isEmpty; r.next, ++i)
     {
         writeln("Element ", i, " is sitting at address: ", &(r.first));
     }
 }

 
 Let me explain by example:
 
 HashMap!(uint, myResource) resources;
 
 ....
 
 // returns something that allows me to later remove the element
 auto r = resources.find(key);
 
 useResource(r);
 
 resources[newkey] = new myResource;
 
 resources.erase(r);
 
 Now, assuming that adding the new resource rehashes the hash map, what is in 
 r such that it ONLY points to the single resource?  A marker saying 'only 
 one element'?  Perhaps you just deleted a range you didn't mean to delete, 
 when you only wanted to delete a single resource.  Perhaps r is now 
 considered 'invalid'.  Granted, this example can be fixed by reordering the 
 lines of code, and perhaps you don't care about the penalty of looking up 
 the key again, but what if I want to save the iterator to the resource 
 somewhere and delete it later in another function?  And what if the cost of 
 lookup for removal is not as quick?
 
 I think with a range being the only available 'iterator' type for certain 
 containers may make life difficult for stuff like this.  I really don't 
 think iterator is the right term for what I think is needed, what I think is 
 needed is a dumbed down pointer.  Something that has one operation --  
 opStar.  No increment, no decrement, just 'here is a reference to this 
 element'  that can be passed into the container to represent a pointer to a 
 specific element.

I understand. My design predicates that you can't model such 
non-iterable iterators. Either you can use it to move along, in which 
case ranges will do just fine, or you can't, in which case my design 
doesn't support it.

Note that the STL does not have non-iterable iterators. I think 
the cases one can construct where they make sense are tenuous.


Andrei

Sep 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 Let me explain by example:

 HashMap!(uint, myResource) resources;

 ....

 // returns something that allows me to later remove the element
 auto r = resources.find(key);

 useResource(r);

 resources[newkey] = new myResource;

 resources.erase(r);

 Now, assuming that adding the new resource rehashes the hash map, what is 
 in r such that it ONLY points to the single resource?  A marker saying 
 'only one element'?  Perhaps you just deleted a range you didn't mean to 
 delete, when you only wanted to delete a single resource.  Perhaps r is 
 now considered 'invalid'.  Granted, this example can be fixed by 
 reordering the lines of code, and perhaps you don't care about the 
 penalty of looking up the key again, but what if I want to save the 
 iterator to the resource somewhere and delete it later in another 
 function?  And what if the cost of lookup for removal is not as quick?

 I think with a range being the only available 'iterator' type for certain 
 containers may make life difficult for stuff like this.  I really don't 
 think iterator is the right term for what I think is needed, what I think 
 is needed is a dumbed down pointer.  Something that has one operation --  
 opStar.  No increment, no decrement, just 'here is a reference to this 
 element'  that can be passed into the container to represent a pointer to 
 a specific element.

 I understand. My design predicates that you can't model such non-iterable 
 iterators. Either you can use it to move along, in which case ranges will 
 do just fine, or you can't, in which case my design doesn't support it.

 Note that the STL does not have non-iterable iterators. I think 
 the cases one can construct where they make sense are tenuous.

Well, STL happens to use iterators to specify elements.  It doesn't mean 
that the iterators are used as iterators in that context, it's just that 
it's easier to specify one type that does iteration AND represents position 
:)

For example, std::list defines multiple erase functions:

iterator erase(iterator first, iterator last);
iterator erase(iterator position);

In the second case, the iterator need not support incrementing or 
decrementing (to the user anyway), just referencing.  They just used 
iterator because it's already there :)

But in your proposed scenario, I can't have the second function, only the 
first.  My example shows a case where I'd want the second function.  What I 
basically want is a range type where the upper limit is specified as 'always 
null', so that iterating the range once always results in an empty range, 
even if the container has changed topology.

-Steve

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 For example, std::list defines multiple erase functions:
 
 iterator erase(iterator first, iterator last);
 iterator erase(iterator position);
 
 In the second case, the iterator need not support incrementing or 
 decrementing (to the user anyway), just referencing.  They just used 
 iterator because it's already there :)
 
 But in your proposed scenario, I can't have the second function, only the 
 first.  My example shows a case where I'd want the second function.  What I 
 basically want is a range type where the upper limit is specified as 'always 
 null', so that iterating the range once always results in an empty range, 
 even if the container has changed topology.

I understand. That can't be had in my design. You'd have:

List.Range List.erase(Range toErase);

and you'd model erasure of one element through a range of size one. I 
understand how that can be annoying on occasion, but I consider that a 
minor annoyance and do not plan to allow bare iterators for such cases. 
I think the absence of naked iterators has huge cognitive and safety 
advantages.
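
A rough C++ model of a range-only erase, to make the shape of the idea concrete. Everything here is an assumption made for illustration: the `Range` struct, the free functions `all`, `takeOne` and `erase`, and in particular the choice that the returned range runs from the erase position to the end of the list.

```cpp
#include <iterator>
#include <list>

// Hypothetical range over std::list<int>, borrowing the member names used
// in this thread (isEmpty, first, next).
struct Range
{
    std::list<int>::iterator b, e;

    bool isEmpty() const { return b == e; }
    int& first() const { return *b; }
    void next() { ++b; }
};

// Range spanning the whole list.
inline Range all(std::list<int>& lst)
{
    Range r = { lst.begin(), lst.end() };
    return r;
}

// Stand-in for take(1, r): a sub-range covering at most the first element.
inline Range takeOne(Range r)
{
    if (!r.isEmpty()) r.e = std::next(r.b);
    return r;
}

// The only erase primitive: takes a range, returns the range that starts
// where the erased elements used to be and runs to the end of the list.
inline Range erase(std::list<int>& lst, Range toErase)
{
    Range rest = { lst.erase(toErase.b, toErase.e), lst.end() };
    return rest;
}
```

Removing one element then reads `r = erase(lst, takeOne(r))`, which mirrors the loop shown earlier in the thread.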


Andrei

Sep 09 2008

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 In Phobos things will be something like:

List!(Filedesc) lst;
for (auto r = lst.all; !r.isEmpty; ) {
     if (should_remove)
          r = lst.erase(take(1, r));
     else
          r.next;
}

It may be better to invent and add some sugar to that, and foreach helps, maybe
something like:

List!(Filedesc) lst;
foreach (ref r; lst.all) {
     if (predicate(lst.item(r)))
          r = lst.erase(r);
     else
          r = r.next();
}

I think that code of mine isn't good yet :-)

Bye,
bearophile

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu:
 In Phobos things will be something like:

 
 List!(Filedesc) lst;
 for (auto r = lst.all; !r.isEmpty; ) {
      if (should_remove)
           r = lst.erase(take(1, r));
      else
           r.next;
 }
 
 It may be better to invent and add some sugar to that, and foreach helps,
 maybe something like:
 
 List!(Filedesc) lst;
 foreach (ref r; lst.all) {
      if (predicate(lst.item(r)))
           r = lst.erase(r);
      else
           r = r.next();
 }
 
 I think that code of mine isn't good yet :-)

Wow, that's risky. foreach bumps r under the hood, so you'll skip some 
elements.
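
The skipping can be simulated in C++ by writing out the hidden increment explicitly. This is a sketch of the failure mode only, with a made-up function name; it does not model how foreach over ranges is actually lowered.

```cpp
#include <list>
#include <vector>

// Walks a copy of the list the way the loop above would: the body moves
// the cursor itself (erase or advance), and then the loop construct bumps
// the cursor once more. Returns the elements the body actually saw.
inline std::vector<int> visitedWithDoubleAdvance(std::list<int> lst)
{
    std::vector<int> visited;
    std::list<int>::iterator i = lst.begin();
    while (i != lst.end())
    {
        visited.push_back(*i);

        // Loop body: erase even values, otherwise step forward.
        if (*i % 2 == 0) i = lst.erase(i);
        else ++i;

        // Hidden bump performed by the loop construct itself.
        if (i != lst.end()) ++i;
    }
    return visited;
}
```

On the list 1, 2, 3, 4, 5, 6 the body only ever sees 1, 3 and 5, so none of the even values is examined, let alone removed.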

Andrei

Sep 09 2008

Benji Smith <dlanguage benjismith.net> writes:

Andrei Alexandrescu wrote:
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:
 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

Just thinking off the top of my head...

How well would the proposal support a producer/consumer work queue, or a 
signal/slot implementation?

A work-queue consumer would view the queue as an infinite range with no 
end, but the producer would view that same queue as an infinite range 
with no beginning. And, conceivably, you could have "conduits" between 
the producer and consumer that would view that same queue as having 
neither a beginning nor an end.

I'm making no judgment about whether the proposal supports or doesn't 
support that kind of model. I'm just putting the idea out there for 
consideration.

--benji

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Benji Smith wrote:
 Andrei Alexandrescu wrote:
 I put together a short document for the range design. I definitely 
 missed about a million things and have been imprecise about another 
 million, so feedback would be highly appreciated. See:

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 
 Just thinking off the top of my head...
 
 How well would the proposal support a producer/consumer work queue, or a 
 signal/slot implementation?
 
 A work-queue consumer would view the queue as an infinite range with no 
 end, but the producer would view that same queue as an infinite range 
 with no beginning. And, conceivably, you could have "conduits" between 
 the producer and consumer that would view that same queue as having 
 neither a beginning nor an end.
 
 I'm making no judgment about whether the proposal supports or doesn't 
 support that kind of model. I'm just putting the idea out there for 
 consideration.

I think it's great to bring the idea up for discussion. The current 
design does not cater to such uses yet.

Andrei

Sep 09 2008

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

you seem to expose a false dichotomy: "simple" versus "allows power"/"allows
STL".

For all I know, much of Walter's claim to fame with D is that he proved that
power/efficiency and ease of use are not an either-or choice as most people
thought.<

I have given a hand developing a Python => C++ compiler, "ShedSkin", that
contains a complex type inferencer. It compiles a large subset of Python code
into quite efficient code. That project has shown me for the first time that a
very clean language that's rather efficient too can exist.

So I know that in theory it's often a false dichotomy, but in practice if you
look at languages around, you generally have to pay for increased efficiency
with uglier syntax, more complex language, more bug-prone language, etc.


D is meant to be a systems-level programming language, and as such one clear
goal of it is to obviate a need for C++.<

C and BitC too are system languages, but they don't try to be as C++. There are
many projects today that use C instead of C++, for example Perl/Ruby/Python
source code is in C, etc.

I like D to be a system language but I don't think it's wise for D to try to
become a full replacement for C++. I think of D like a C/Java cross, with OOP,
generic programming, more sugar and built-in safeties :-)


In the inner circles of C++ there's often talk about how a replacement for C++
looks like, and to quote a classic who reflects the general opinion: "Whatever
replaces C++ should be at least as good as C++ at whatever C++ is good at, and
better than C++ at whatever C++ is bad at."<

That idea of yours scares me a little: I believe that if you want to create a
language able to do *everything* better than C++ you may end creating a
language almost like C++. I was hoping for D to become less powerful (and quite
less complex) than C++. Many things in D1 are designed to be less powerful than
C++ ones. So now I'd like to know what Walter thinks about this subject (and
other people), because there's a large difference in what I think D wants to be
and what you say to me.

This also shows that general discussions are sometimes as important as
discussing details :-)


And what would you rather have in the standard library? And what advantages are
you showing in exchange for the steep penalty in efficiency?<

You are right. Note that my libs aren't meant as replacement for the std lib.


(I looked over your library a while ago and it seems to use delegates
everywhere.)<

Yes, the situation is the same, I have just refined things, added more things,
etc. (Two other people have told me that they don't like this.)


Again you offer a false dichotomy. Do I want a beautiful wife or a smart one? D
code can enjoy the properties you mention without needing to pay an arm and a
leg for them.<

From the little I have seen your women aren't much nice looking ;-)


In fact I'm thinking that map should be lazy by default.<

I have both map() and xmap() in my libs (and map2, unfortunately), in my lib
map() == array(xmap()).
Python 2.5 has map() built in and xmap (named imap) in a std lib, and Python
3.0 has just a xmap() as built-in (named map).
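
The relation map() == array(xmap()) can be sketched in C++ as follows. The names `LazyMap`, `lazyMap`, `toArray` and `eagerMap` are invented for this illustration and do not come from the libs mentioned above or from Phobos; the element type is fixed to int to keep it short.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical lazy map over a vector<int>: f runs only when first() is
// called, never at construction time.
template <class F>
struct LazyMap
{
    const std::vector<int>* src;
    std::size_t pos;
    F f;

    bool isEmpty() const { return pos == src->size(); }
    int first() const { return f((*src)[pos]); }
    void next() { ++pos; }
};

// Plays the role of xmap: builds the lazy view, does no work yet.
template <class F>
LazyMap<F> lazyMap(F f, const std::vector<int>& v)
{
    LazyMap<F> r = { &v, 0, f };
    return r;
}

// Plays the role of array(): walks any such range and stores the results.
template <class R>
std::vector<int> toArray(R r)
{
    std::vector<int> out;
    for (; !r.isEmpty(); r.next()) out.push_back(r.first());
    return out;
}

// Eager map defined purely as "materialize the lazy one".
template <class F>
std::vector<int> eagerMap(F f, const std::vector<int>& v)
{
    return toArray(lazyMap(f, v));
}
```

Making the lazy form the primitive costs nothing for callers who want an array, while callers who stop early never pay for the elements they skip.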


If there are no licensing issues (which Walter is wary of) I'd be grateful to
benefit from your code or designs. Do you have a link handy?<

The code is Python License, I'll probably change it to the license used by
Phobos:
http://www.fantascienza.net/leonardo/so/libs_d.zip
Most of the functional stuff is in 'func.d', while lot of templates are in
'templates.d' :-)
I can't see how that code of mine can be adapted to yours, because they are
based on quite different principles.

Bye,
bearophile

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 That idea of yours scares me a little: I believe that if you want to
 create a language able to do *everything* better than C++ you may end
 creating a language almost like C++.

Not at all. C++ has much gratuitous complexity and just elementary 
language design mistakes.

Andrei

Sep 09 2008

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 That idea of yours scares me a little: I believe that if you want to
 create a language able to do *everything* better than C++ you may end
 creating a language almost like C++. I was hoping for D to become
 less powerful (and quite less complex) than C++. Many things in D1
 are designed to be less powerful than C++ ones. So now I'd like to
 know what Walter thinks about this subject (and other people),
 because there's a large difference in what I think D wants to be and
 what you say to me.

Back when I worked for Boeing, I had a discussion with some of the 
cockpit engineers about the jet engine controls (which were not made by 
Boeing, but by the engine company). Managing the fuel flow in a jet 
engine is a pretty complex business. Early engines relied on the pilots 
to tweak parameters, and if they got it wrong the engine would flame out 
probably at the worst possible moment.

Every effort was poured into automating it. Over time, they came up with 
a marvelous, and very complex, mechanical computer that bolted on the 
side of the engine, all of which was commanded by a single lever in the 
cockpit that the pilot pushed forward and he got more power.

The pilot (our "programmer") had all this *power* at his command at the 
simple, and intuitive, push of a lever. The pilot wasn't giving up a 
thing for this elegance.

I believe that at least half of the complexity of C++ is unrelated to 
its power (it's more related to the language's history). I see no 
inherent reason why a powerful language must be complicated.

Sure, there's complexity in D, too, but I regard those aspects as a 
failure in the design department.

Look at it like vinyl vs CD. There are people who insist that vinyl does 
better sonic reproduction. If you look at how the sound is reproduced, 
that just doesn't make any sense. What is really happening is they 
*like* the peculiar imperfections of vinyl; they like the hiss, rumble, 
pops and clicks. I know people who like the wacky imperfections in C++. 
They revel in mastering the arcana. But I don't see the point in that. I 
see beauty in elegance, which is not the same thing as simplicity 
achieved by stripping out the power.

Sep 10 2008

JAnderson <ask me.com> writes:

Walter Bright wrote:
 bearophile wrote:
 That idea of yours scares me a little: I believe that if you want to
 create a language able to do *everything* better than C++ you may end
 creating a language almost like C++. I was hoping for D to become
 less powerful (and quite less complex) than C++. Many things in D1
 are designed to be less powerful than C++ ones. So now I'd like to
 know what Walter thinks about this subject (and other people),
 because there's a large difference in what I think D wants to be and
 what you say to me.

 

IMHO

I think complexity is relative to the problem being tackled.  At the 
point at which something gets complex it probably requires its own 
abstraction.

Case in point, ranges.  We could create and manage ranges ourselves 
using standard C-style for loops; however, to get it up to the level ranges 
provide we would have to add a load of asserts to validate that it's 
correct.  Ranges abstract so that we don't have to think about it all 
the time.  Yes, ranges are a more complex beast, but only the internals, 
not the externals when you're using them in a foreach.  So to begin with we 
are writing more code, but it's abstracting away details that you don't 
want to think about on a day-to-day basis, not to mention reuse.

Maybe D will end up with many more complex features than C++; however, if 
they are used by library writers (in particular the standards) to make 
libraries easier to use then I'd say it's a win.  If a feature enables us 
to optimize code just by using a few different terms then it's a win.  If 
a feature reduces the complexity (size) of our own code then it's a win. 
If it reduces time spent in the debugger then it's a huge win.

Looking at something like a car, these things are more complex than ever; 
however, it's their complexity that makes them simple to use.  Personally, 
when I'm coding I like to pull as few strings as possible to get a 
job done.  It takes some time to get code in a state where that is 
possible; however, that's marginal compared to the time saved.

-Joel

Sep 11 2008

Derek Parnell <derek nomail.afraid.org> writes:

On Mon, 08 Sep 2008 16:50:54 -0500, Andrei Alexandrescu wrote:

 Hello,

By the way, I meant to say this earlier, but I'm very glad that you have
presented something for us to discuss with you. I really appreciate this.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
10/09/2008 10:18:23 AM

Sep 09 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Derek Parnell wrote:
 On Mon, 08 Sep 2008 16:50:54 -0500, Andrei Alexandrescu wrote:
 
 Hello,

 
 By the way, I meant to say this earlier, but I'm very glad that you have
 presented something for us to discuss with you. I really appreciate this.

Thanks.

chop?


Andrei

Sep 09 2008

Don <nospam nospam.com.au> writes:

Andrei Alexandrescu wrote:
 In most slice-based D programming, using bare pointers is not necessary. 
 Could then there be a way to use _only_ ranges and eliminate iterators 
 altogether? A container/range design would be much simpler than one also 
 exposing iterators.

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

I like this a lot. You've mentioned safety and simplicity, but it also 
seems to be a more powerful abstraction than STL-style iterators.

Consider a depth-first-search over a tree. You have a start point, an 
end point, and some internal state (in this case, some kind of stack). 
The interesting thing is that the required internal state _may depend on 
the values of the start & end points_.

STL iterators don't model this very well, since they require a symmetry 
between iterators, which raises the question of where the internal 
state should be stored.
You can get away with independent iterators when the relationship 
between start and end is, "if you perform ++ on start enough times, you 
reach end". A simple array-style range formalizes this relationship, but 
the range concept also allows more complex relationships to be expressed.

So I think the value of this approach improves for more complicated 
iterators than the simple ones used by the STL.
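
To make the tree case concrete, here is a rough sketch of a depth-first traversal written as a range, assuming the empty/front/popFront primitive names from current D rather than whatever the draft settles on. Node, DepthFirst and preorder are hypothetical names for illustration only.

```d
import std.stdio;

// Hypothetical binary tree node, for illustration only.
class Node
{
    int value;
    Node left, right;

    this(int value, Node left = null, Node right = null)
    {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

// Pre-order depth-first traversal as a forward-only range. The stack of
// pending nodes is the internal state; it lives in the range itself, so
// there is no separate "end" object that would need a matching copy.
struct DepthFirst
{
    Node[] stack;

    this(Node root)
    {
        if (root !is null)
            stack ~= root;
    }

    bool empty() const { return stack.length == 0; }

    int front() const
    {
        assert(!empty, "front called on an empty range");
        return stack[$ - 1].value;
    }

    void popFront()
    {
        assert(!empty, "popFront called on an empty range");
        Node n = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        // Push right first so that the left subtree is visited first.
        if (n.right !is null) stack ~= n.right;
        if (n.left !is null) stack ~= n.left;
    }
}

// Collects the visited values in traversal order.
int[] preorder(Node root)
{
    int[] visited;
    foreach (v; DepthFirst(root))
        visited ~= v;
    return visited;
}

void main()
{
    //       1
    //      / \
    //     2   5
    //    / \
    //   3   4
    auto root = new Node(1, new Node(2, new Node(3), new Node(4)), new Node(5));
    assert(preorder(root) == [1, 2, 3, 4, 5]);
    writeln(preorder(root));
}
```

With an STL-style begin/end pair, both iterators would have to agree on who owns that stack; here the question does not arise.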

Sep 10 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Sep 10, 2008 at 4:52 PM, Don <nospam nospam.com.au> wrote:
 Andrei Alexandrescu wrote:
 In most slice-based D programming, using bare pointers is not necessary.
 Could then there be a way to use _only_ ranges and eliminate iterators
 altogether? A container/range design would be much simpler than one also
 exposing iterators.

 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 I like this a lot. You've mentioned safety and simplicity, but it also seems
 to be a more powerful abstraction than STL-style iterators.

 Consider a depth-first-search over a tree. You have a start point, an end
 point, and some internal state (in this case, some kind of stack). The
 interesting thing is that the required internal state _may depend on the
 values of the start & end points_.

Or you can think of it as a current point and a stopping criterion.

 STL iterators don't model this very well, since they require a symmetry
 between iterators. Which creates the difficulty of where the internal state
 should be stored.

This is also why I argued in my other post on digitalmars.D that we
shouldn't be trying to force the start and end parts of a range to be
named with complete symmetry.  They have different purposes.  There
will generally be one current point that is relatively active and one
stopping criterion that is relatively fixed.

I would like to take back one thing, though.  In another post I said I
didn't think using * for getting the current value was a good idea
because it would be too hard to grep for.  I hadn't been considering
the RandomAccess ranges at that time.  I can't imagine *not* using
operators for the random access ranges -- it just makes too much
sense.  So if it's ok for random access then it should be ok for
forward and bidir to use operators too.   BUT, just as with the random
access ranges, I don't think there should be any synonyms.  Just use *
as the only way to get the current element of a range.

I think the desire to have a special "*" shortcut is as clear an
indication as any that Andrei in his heart of hearts agrees that the
two parts of a range are not really symmetric and should not be
treated as such.

--bb

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Wed, Sep 10, 2008 at 4:52 PM, Don <nospam nospam.com.au> wrote:
 Andrei Alexandrescu wrote:
 In most slice-based D programming, using bare pointers is not necessary.
 Could then there be a way to use _only_ ranges and eliminate iterators
 altogether? A container/range design would be much simpler than one also
 exposing iterators.
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 I like this a lot. You've mentioned safety and simplicity, but it also seems
 to be a more powerful abstraction than STL-style iterators.

 Consider a depth-first-search over a tree. You have a start point, an end
 point, and some internal state (in this case, some kind of stack). The
 interesting thing is that the required internal state _may depend on the
 values of the start & end points_.

 
 Or you can think of it as a current point and a stopping criterion.

My design intentionally supports forward iteration with sentinel (e.g. a 
singly-linked list iterator that only has one node pointer and knows 
it's done when it hits null) and also forward iteration that holds both 
limits (e.g. a singly-linked list iterator that holds TWO node pointers 
and knows it's done when they are equal). That's why forward iterators 
never support a subrange "up to the beginning of some other range" 
because that would rule out sentinel-terminated iterators.
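
The two flavors can be sketched roughly as follows. This assumes the empty/front/popFront names from current D, not necessarily the draft's, and ListNode, ToNull, UpTo, makeList and collect are hypothetical names for illustration.

```d
import std.stdio;

// Hypothetical singly-linked list node.
struct ListNode
{
    int value;
    ListNode* next;
}

// Sentinel form: one pointer, done when it reaches null.
struct ToNull
{
    ListNode* cur;

    bool empty() const { return cur is null; }
    int front() const { assert(!empty); return cur.value; }
    void popFront() { assert(!empty); cur = cur.next; }
}

// Two-limit form: done when the current node meets the stored limit.
struct UpTo
{
    ListNode* cur;
    ListNode* limit;

    bool empty() const { return cur is limit; }
    int front() const { assert(!empty); return cur.value; }
    void popFront() { assert(!empty); cur = cur.next; }
}

// Builds a heap-allocated list holding the given values in order.
ListNode* makeList(const int[] values)
{
    ListNode* head = null;
    foreach_reverse (v; values)
    {
        auto n = new ListNode;
        n.value = v;
        n.next = head;
        head = n;
    }
    return head;
}

// Walks any range with the three primitives and gathers its elements.
int[] collect(R)(R r)
{
    int[] result;
    foreach (v; r)
        result ~= v;
    return result;
}

void main()
{
    auto head = makeList([1, 2, 3]);
    auto third = head.next.next;

    assert(collect(ToNull(head)) == [1, 2, 3]);
    // Only the two-pointer form can stop "at the beginning of" another range.
    assert(collect(UpTo(head, third)) == [1, 2]);
    writeln(collect(UpTo(head, third)));
}
```

ToNull has nowhere to record a stopping node, so a generic "up to the beginning of that other range" operation cannot be required of every forward range without excluding it.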

It intentionally does not support things like zero-terminated strings as 
random-access iterators. Why? Because there's no safe way of implementing 
indexing.

I am glad you noticed all this. It's quite subtle.

 STL iterators don't model this very well, since they require a symmetry
 between iterators. Which creates the difficulty of where the internal state
 should be stored.

 
 This is also why I argued in my other post on digitalmars.D that we
 shouldn't be trying to force the start and end parts of a range to be
 named with complete symmetry.  They have different purposes.  There
 will generally be one current point that is relatively active and one
 stopping criterion that is relatively fixed.

They do have different purposes and they are asymmetric. But as far as I 
could tell in reimplementing std.algorithm that asymmetry does not need 
to spill into the interface.

There is one imperfection: there are forward iterators that can 
implement subranges "up to the beginning of some other range". They are 
not categorized in my design.

 I would like to take back one thing, though.  In another post I said I
 didn't think using * for getting the current value was a good idea
 because it would be too hard to grep for.  I hadn't been considering
 the RandomAccess ranges at that time.  I can't imagine *not* using
 operators for the random access ranges -- it just makes too much
 sense.  So if it's ok for random access then it should be ok for
 forward and bidir to use operators too.   BUT, just as with the random
 access ranges, I don't think there should be any synonyms.  Just use *
 as the only way to get the current element of a range.
 
 I think the desire to have a special "*" shortcut is as clear an
 indication as any that Andrei in his heart of hearts agrees that the
 two parts of a range are not really symmetric and should not be
 treated as such.

Walter doesn't like "*" :o(.


Andrei

Sep 10 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 Andrei Alexandrescu wrote:
 In most slice-based D programming, using bare pointers is not 
 necessary. Could then there be a way to use _only_ ranges and 
 eliminate iterators altogether? A container/range design would be much 
 simpler than one also exposing iterators.

 
 http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

 
 I like this a lot. You've mentioned safety and simplicity, but it also 
 seems to be a more powerful abstraction than STL-style iterators.
 
 Consider a depth-first-search over a tree. You have a start point, an 
 end point, and some internal state (in this case, some kind of stack). 
 The interesting thing is that the required internal state _may depend on 
 the values of the start & end points_.
 
 STL iterators don't model this very well, since they require a symmetry 
 between iterators. Which creates the difficulty of where the internal 
 state should be stored.
 You can get away with independent iterators when the relationship 
 between start and end is, "if you perform ++ on start enough times, you 
 reach end". A simple array-style range formalizes this relationship, but 
 the range concept also allows more complex relationships to be expressed.
 
 So I think the value of this approach improves for more complicated 
 iterators than the simple ones used by the STL.

That's a great insight. Hadn't thought of it!

Andrei

Sep 10 2008
