
digitalmars.D - Why Strings as Classes?

reply Benji Smith <dlanguage benjismith.net> writes:
In another thread (about array append performance) I mentioned that 
Strings ought to be implemented as classes rather than as simple builtin
arrays. Superdan asked why. Here's my response...

I'll start with a few of the softball, easy reasons.

For starters, with strings implemented as character arrays, writing 
library code that accepts and operates on strings is a bit of a pain in 
the neck, since you always have to write templates and template code is 
slightly less readable than non-template code. You can't distribute your 
code as a DLL or a shared object, because the template instantiations 
won't be included (unless you create wrapper functions with explicit 
template instantiations, bloating your code size, but more importantly 
tripling the number of functions in your API).

Another good low-hanging argument is that strings are frequently used as 
keys in associative arrays. Every insertion and retrieval in an 
associative array requires a hashcode computation. And since D strings 
are just dumb arrays, they have no way of memoizing their hashcodes. 
We've already observed that D assoc arrays are less performant than even 
Python maps, so the extra cost of lookup operations is unwelcome.
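To make the memoization point concrete, here's a sketch of what a hash-caching wrapper could look like. This HashedString class is purely hypothetical, not anything in Phobos or Tango:

------------------------------------------------------------------
// Hypothetical sketch: a string wrapper that computes its hashcode
// once and caches it, so repeated associative-array insertions and
// lookups don't re-hash the same bytes over and over.
class HashedString {
    private char[] data;
    private hash_t cachedHash;
    private bool hashed = false;

    this(char[] data) {
        this.data = data;
    }

    hash_t toHash() {
        if (!hashed) {
            foreach (char c; data)
                cachedHash = cachedHash * 11 + c;
            hashed = true;
        }
        return cachedHash;
    }
}
------------------------------------------------------------------

A plain char[] key pays the full hash computation on every insertion and lookup; a class like this pays it once per string.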

But much more important than either of those reasons is the lack of 
polymorphism on character arrays. Arrays can't have subclasses, and they 
can't implement interfaces.

A good example of what I'm talking about can be seen in the Phobos and 
Tango regular expression engines. At least the Tango implementation 
matches against all string types (the Phobos one only works with char[] 
strings).

But what if I want to consume a 100 MB logfile, counting all lines that 
match a pattern?

Right now, to use either regex engine, I have to read the entire 
logfile into an enormous array before invoking the regex search function.

Instead, what if there was a CharacterStream interface? And what if all 
the text-handling code in Phobos & Tango was written to consume and 
return instances of that interface?

A regex engine accepting a CharacterStream interface could process text 
from string literals, file input streams, socket input streams, database 
records, etc, etc, etc... without having to pollute the API with a bunch 
of casts, copies, and conversions. And my logfile processing application 
would consume only a tiny fraction of the memory needed by the character 
array implementation.

Most importantly, the contract between the regex engine and its 
consumers would provide a well-defined interface for processing text, 
regardless of the source or representation of that text.
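To sketch what I mean, the interface might look something like this (the names and methods are illustrative assumptions, not an existing Phobos or Tango API):

------------------------------------------------------------------
// Hypothetical CharacterStream interface: text-handling code written
// against it never sees the backing representation.
interface CharacterStream {
    bool hasNext();   // more characters available?
    dchar next();     // decode and return the next character
}

// One possible implementor: an in-memory string source. A file- or
// socket-backed implementor would expose the same two methods.
class StringCharacterStream : CharacterStream {
    private dchar[] data;
    private size_t pos = 0;

    this(dchar[] data) {
        this.data = data;
    }

    bool hasNext() {
        return pos < data.length;
    }

    dchar next() {
        return data[pos++];
    }
}
------------------------------------------------------------------

A regex engine written against CharacterStream would see only hasNext() and next(), regardless of whether the characters come from memory, a file, or a socket.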

Along a similar vein, I've worked on a lot of parsers over the past few 
years, for domain specific languages and templating engines, and stuff 
like that. Sometimes it'd be very handy to define a "Token" class that 
behaves exactly like a String, but with some additional behavior. 
Ideally, I'd like to implement that Token class as an implementor of the 
CharacterStream interface, so that it can be passed directly into other 
text-handling functions.

But, in D, with no polymorphic text handling, I can't do that.

As one final thought... I suspect that mutable/const/invariant string 
handling would be much more conveniently implemented with a 
MutableCharacterStream interface (as an extended interface of 
CharacterStream).

Any function written to accept a CharacterStream would automatically 
accept a MutableCharacterStream, thanks to interface polymorphism, 
without any casts, conversions, or copies. And various implementors of 
the interface could provide buffered implementations operating on 
in-memory strings, file data, or network data.

Coding against the CharacterStream interface, library authors wouldn't 
need to worry about const-correctness, since the interface wouldn't 
provide any mutator methods.

But then again, I haven't used any of the const functionality in D2, so 
I can't actually comment on relative usability of compiler-enforced 
immutability versus interface-enforced immutability.

Anyhow, those are some of my thoughts... I think there are a lot of 
compelling reasons for de-coupling the specification of string handling 
functionality from the implementation of that functionality, primarily 
for enabling polymorphic text-processing.

But memoized hashcodes would be cool too :-)

--benji
Aug 25 2008
next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Oh, man, I forgot one other thing... And it's a biggie...

The D _Arrays_ page says that "A string is an array of characters. 
String literals are just an easy way to write character arrays."

http://digitalmars.com/d/1.0/arrays.html

In my previous post, I also use the "character array" terminology.

Unfortunately, though, it's just not true.

A char[] is actually an array of UTF-8 encoded octets, where each 
character may consume one or more consecutive elements of the array. 
Retrieving the str.length property may or may not tell you how many 
characters are in the string. And pretty much any code that tries to 
iterate character-by-character through the array elements is 
fundamentally broken.

Take a look at this code, for example:

------------------------------------------------------------------
import tango.io.Stdout;

void main() {

    // Create a string with UTF-8 content
    char[] str = "mötley crüe";
    Stdout.formatln("full string value: {}", str);

    Stdout.formatln("len: {}", str.length);
    // --> "len: 13" ... but there are only 11 characters!

    Stdout.formatln("2nd char: '{}'", str[1]);
    // --> "2nd char: ''" ... where'd my character go?

    Stdout.formatln("first 3 chars: '{}'", str[0..3]);
    // --> "first 3 chars: 'mö'" ... why only 2?

    char o_umlat = 'ö';
    Stdout.formatln("char value: '{}'", o_umlat);
    // --> "char value: ''" ... where's my char?

}
------------------------------------------------------------------

So you can't actually iterate the char elements of a char[] without 
risking that you'll turn your string data into garbage. And you can't 
trust that the length property tells you how many characters there are. 
And you can't trust that an index or a slice will return valid data.

Also: take a look at the Phobos string "find" functions:

   int find(char[] s, dchar c);
   int ifind(char[] s, dchar c);
   int rfind(char[] s, dchar c);
   int irfind(char[] s, dchar c);

Huh?

To find a character in a char[] array, you have to use a dchar?

To me, that's like looking for a long within an int[] array.

So.. If a char[] actually consists of dchar elements, does that mean I 
can append a dchar to a char[] array?

   dchar u_umlat = 'ü';
   char[] newString = "mötley crüe" ~ u_umlat;

No. Of course not. The compiler complains that you can't concatenate a 
dchar to a char[] array. Even though the "find" functions indicate that 
the array is truly a collection of dchar elements.

Now, don't get me wrong. I understand why the string is encoded as 
UTF-8. And I understand that the encoding prevents accurate element 
iteration, indexing, slicing, and all the other nice array goodies.

The existing D string implementation is exactly what I'd expect to see 
inside the guts of a string class, because encodings are important and 
efficiency is important. But those implementation details shouldn't be 
exposed through a public API.

To claim that D strings are actually usable as character arrays is more 
than a little spurious, since direct access of the array elements can 
return fragmented garbage bytes.

If accurate string manipulation is impossible without a set of 
special-purpose functions, then I'll argue that the implementation is 
already equivalent to that of a class, but without any of the niceties 
of encapsulation and polymorphism.

--benji
Aug 25 2008
parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 Oh, man, I forgot one other thing... And it's a biggie...
 
 The D _Arrays_ page says that "A string is an array of characters. 
 String literals are just an easy way to write character arrays."
 
 http://digitalmars.com/d/1.0/arrays.html
 
 In my previous post, I also use the "character array" terminology.
 
 Unfortunately, though, it's just not true.
 
 A char[] is actually an array of UTF-8 encoded octets, where each 
 character may consume one or more consecutive elements of the array. 
 Retrieving the str.length property may or may not tell you how many 
 characters are in the string. And pretty much any code that tries to 
 iterate character-by-character through the array elements is 
 fundamentally broken.

try this: foreach (dchar c; str) { process c }
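spelled out as a complete program (same Tango import as the example below it), that looks like:

------------------------------------------------------------------
import tango.io.Stdout;

void main() {
    char[] str = "mötley crüe";
    // a dchar loop variable makes foreach decode the UTF-8 as it goes:
    // 11 whole characters come out, though the array holds 13 octets.
    foreach (dchar c; str)
        Stdout.formatln("char: {}", c);
}
------------------------------------------------------------------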
 Take a look at this code, for example:
 
 ------------------------------------------------------------------
 import tango.io.Stdout;
 
 void main() {
 
     // Create a string with UTF-8 content
     char[] str = "mötley crüe";
     Stdout.formatln("full string value: {}", str);
 
     Stdout.formatln("len: {}", str.length);
     // --> "len: 13" ... but there are only 11 characters!
 
     Stdout.formatln("2nd char: '{}'", str[1]);
     // --> "2nd char: ''" ... where'd my character go?
 
     Stdout.formatln("first 3 chars: '{}'", str[0..3]);
     // --> "first 3 chars: 'mö'" ... why only 2?
 
     char o_umlat = 'ö';
     Stdout.formatln("char value: '{}'", o_umlat);
     // --> "char value: ''" ... where's my char?
 
 }
 ------------------------------------------------------------------
 
 So you can't actually iterate the the char elements of a char[] without 
 risking that you'll turn your string data into garbage. And you can't 
 trust that the length property tells you how many characters there are. 
 And you can't trust that an index or a slice will return valid data.

you can iterate with foreach or lib functions. an index or slice won't return valid data indeed, but it couldn't anyway. there's no o(1) indexing into a string unless it's utf32.
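The UTF-32 trade-off is available off the shelf: transcode once with Phobos's std.utf.toUTF32 and indexing is O(1) afterwards, at the cost of four bytes per character. A minimal sketch:

------------------------------------------------------------------
import std.utf;

void main() {
    char[] s = "mötley crüe";    // UTF-8: 13 octets for 11 characters
    dchar[] d = toUTF32(s);      // UTF-32: one array element per character
    assert(d.length == 11);
    assert(d[1] == 'ö');         // O(1) indexing by character position
}
------------------------------------------------------------------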
 Also: take a look at the Phobos string "find" functions:
 
    int find(char[] s, dchar c);
    int ifind(char[] s, dchar c);
    int rfind(char[] s, dchar c);
    int irfind(char[] s, dchar c);
 
 Huh?
 
 To find a character in a char[] array, you have to use a dchar?
 
 To me, that's like looking for a long within an int[] array.

because you're wrong. you look for a dchar which can represent all characters in an array of a given encoding. the comparison is off.
 So.. If a char[] actually consists of dchar elements, does that mean I 
 can append a dchar to a char[] array?
 
    dchar u_umlat = 'ü';
    char[] newString = "mötley crüe" ~ u_umlat;
 
 No. Of course not. The compiler complains that you can't concatenate a 
 dchar to a char[] array. Even though the "find" functions indicate that 
 the array is truly a collection of dchar elements.

that's a bug in the compiler. report it.
 Now, don't get me wrong. I understand why the string is encoded as 
 UTF-8. And I understand that the encoding prevents accurate element 
 iteration, indexing, slicing, and all the other nice array goodies.

i know you understand. you should also understand
 The existing D string implementation is exactly what I'd expect to see 
 inside the guts of a string class, because encodings are important and 
 efficiency is important. But those implementation details shouldn't be 
 exposed through a public API.

exactly at this point your argument kinda explodes. yes, you should see that stuff inside the guts of a string. which means builtin strings should be just arrays that you build larger stuff from. but wait. that's exactly what happens right now.
 To claim that D strings are actually usable as character arrays is more 
 than a little spurious, since direct access of the array elements can 
 return fragmented garbage bytes.

agreed.
 If accurate string manipulation is impossible without a set of 
 special-purpose functions, then I'll argue that the implementation is 
 already equivalent to that of a class, but without any of the niceties 
 of encapsulation and polymorphism.

and without the disadvantages.
Aug 25 2008
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to superdan,

 The existing D string implementation is exactly what I'd expect to see
 inside the guts of a string class, because encodings are important and
 efficiency is important. But those implementation details shouldn't be
 exposed through a public API.

exactly at this point your argument kinda explodes. yes, you should see that stuff inside the guts of a string. which means builtin strings should be just arrays that you build larger stuff from. but wait. that's exactly what happens right now.

Ditto. D is a *systems language*. It's *supposed* to have access to the lowest level representation and build stuff on top of that
Aug 25 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
BCS wrote:
 Ditto, D is a *systems language* It's *supposed* to have access to the 
 lowest level representation and build stuff on top of that

But in this "systems language", it's an O(n) operation to get the nth 
character from a string, to slice a string based on character offsets, 
or to determine the number of characters in the string.

I'd gladly pay the price of a single interface vtable lookup to turn 
all of those into O(1) operations.

--benji
Aug 25 2008
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to Benji,

 BCS wrote:
 
 Ditto, D is a *systems language* It's *supposed* to have access to
 the lowest level representation and build stuff on top of that
 

But in this "systems language", it's an O(n) operation to get the nth 
character from a string, to slice a string based on character offsets, 
or to determine the number of characters in the string. I'd gladly pay 
the price of a single interface vtable lookup to turn all of those into 
O(1) operations. --benji

Then borrow, buy, steal or build a class that does that /on top of the D arrays/ No one has said that this should not be available, just that it should not /replace/ what is available
Aug 25 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
BCS wrote:
 Reply to Benji,
 
 BCS wrote:

 Ditto, D is a *systems language* It's *supposed* to have access to
 the lowest level representation and build stuff on top of that

But in this "systems language", it's an O(n) operation to get the nth 
character from a string, to slice a string based on character offsets, 
or to determine the number of characters in the string. I'd gladly pay 
the price of a single interface vtable lookup to turn all of those into 
O(1) operations. --benji

Then borrow, buy, steal or build a class that does that /on top of the D arrays/ No one has said that this should not be available, just that it should not /replace/ what is available

The point is that the new string class would be incompatible with the *hundreds* of existing functions that process character arrays. Why don't strings qualify for polymorphism? Am I the only one who thinks the existing tradeoff is a fool's bargain? --benji
Aug 25 2008
next sibling parent BCS <ao pathlink.com> writes:
Reply to Benji,

 BCS wrote:
 
 Reply to Benji,
 
 BCS wrote:
 
 Ditto, D is a *systems language* It's *supposed* to have access to
 the lowest level representation and build stuff on top of that
 

But in this "systems language", it's an O(n) operation to get the nth 
character from a string, to slice a string based on character offsets, 
or to determine the number of characters in the string. I'd gladly pay 
the price of a single interface vtable lookup to turn all of those into 
O(1) operations. --benji

the D arrays/ No one has said that this should not be available, just that it should not /replace/ what is available

*hundreds* of existing functions that process character arrays.

That is an issue with (and *only* with) "the *hundreds* of existing functions that process character arrays".
Aug 25 2008
prev sibling parent "Chris R. Miller" <lordSaurontheGreat gmail.com> writes:

Benji Smith wrote:
 BCS wrote:
 Reply to Benji,

 BCS wrote:

 Ditto, D is a *systems language* It's *supposed* to have access to
 the lowest level representation and build stuff on top of that

 But in this "systems language", it's an O(n) operation to get the nth
 character from a string, to slice a string based on character offsets,
 or to determine the number of characters in the string.

 I'd gladly pay the price of a single interface vtable lookup to turn
 all of those into O(1) operations.

 --benji

Then borrow, buy, steal or build a class that does that /on top of the
D arrays/

 No one has said that this should not be available, just that it should
 not /replace/ what is available

The point is that the new string class would be incompatible with the 
*hundreds* of existing functions that process character arrays.

Why don't strings qualify for polymorphism?

-------------------------------------------
wchar[] foo = "text"w;

int indexOf(char[] str, char ch) {
    foreach (int idx, char c; str)
        if (c == ch)
            return idx;
    return -1;
}

void main() {
    assert(indexOf(foo, 'x') == 2);
}
-------------------------------------------

If that does compile, it shouldn't. The best way to get that to work is 
to use a template. Templates can be annoying. A String class could 
simplify the different kinds of String inherent in D. The String class 
would (should) internally know what kind of String it is (wchar, char, 
dchar) and know how to mitigate those differences when operations are 
called on it.

Benji: If you want a String class, why don't you write one? It's a 
fairly simple task; even high-school CS students do it quite routinely 
in C++ (which is a lot more unwieldy for OOP than D is). A very 
successful instance of Strings-as-objects is present in Java. I'd 
suggest trying to duplicate that functionality. Then you could easily 
write wrappers on existing libraries to use the new String object.
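The template version alluded to above might look like this; it's a sketch, instantiated once per string type instead of forcing everything through char[]:

------------------------------------------------------------------
// Hypothetical templated indexOf: works for char[], wchar[], and
// dchar[] alike. The foreach decodes each element to a dchar, so the
// comparison is always between whole characters.
int indexOf(T)(T[] str, dchar ch) {
    foreach (int idx, dchar c; str)
        if (c == ch)
            return idx;   // idx is the code-unit index where c starts
    return -1;
}

void main() {
    wchar[] w = "text"w;
    char[]  c = "text";
    assert(indexOf(w, 'x') == 2);
    assert(indexOf(c, 'x') == 2);
}
------------------------------------------------------------------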
Aug 25 2008
prev sibling next sibling parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 BCS wrote:
 Ditto, D is a *systems language* It's *supposed* to have access to the 
 lowest level representation and build stuff on top of that

But in this "systems language", it's an O(n) operation to get the nth 
character from a string, to slice a string based on character offsets, 
or to determine the number of characters in the string. I'd gladly pay 
the price of a single interface vtable lookup to turn all of those into 
O(1) operations.

dood. i dunno where to start. allow me to answer from multiple angles.

1. when was the last time looking up one char in a string or computing length was your bottleneck.

2. you talk as if o(1) happens by magic that d currently disallows.

3. maybe i don't want to blow the size of my string by a factor of 4 if i'm just interested in some occasional character search.

4. implement all that nice stuff you wanna. nobody put a gun to yer head not to. understand you can't put a gun to my head to pay the price.
Aug 25 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
superdan wrote:
 Benji Smith Wrote:
 
 BCS wrote:
 Ditto, D is a *systems language* It's *supposed* to have access to the 
 lowest level representation and build stuff on top of that

But in this "systems language", it's an O(n) operation to get the nth 
character from a string, to slice a string based on character offsets, 
or to determine the number of characters in the string. I'd gladly pay 
the price of a single interface vtable lookup to turn all of those into 
O(1) operations.

dood. i dunno where to start. allow me to answer from multiple angles. 1. when was the last time looking up one char in a string or computing length was your bottleneck. 2. you talk as if o(1) happens by magic that d currently disallows. 3. maybe i don't want to blow the size of my string by a factor of 4 if i'm just interested in some occasional character search. 4. implement all that nice stuff you wanna. nobody put a gun to yer head not to. understand you can't put a gun to my head to pay the price.

Geez, man, you just keep missing the point, over and over again.

Let me make one point, blisteringly clear: I don't give a shit about the 
data format. You want the fastest strings in the universe, implemented 
with zero-byte magic beans and burned into the local ROM. Fantastic! I'm 
completely in favor of it.

Presumably, people will be so into those strings that they'll write a 
shitload of functionality for them. Parsing, searching, sorting, 
indexing... the motherlode.

One day, I come along, and I'd like to perform some text processing. But 
all of my string data comes from non-magic-beans data sources. I'd like 
to implement a new kind of string class that supports my data. I'm not 
going to push my super-slow string class on anybody else, because I know 
how concerned with performance you are.

But check this out... you can have your fast class, and I can have my 
slow class, and they can both implement the same interface. Like this:

interface CharSequence {
   int find(CharSequence needle);
   int rfind(CharSequence needle);
   // ...
}

class ZeroByteFastMagicString : CharSequence {
   // ...
}

class SuperSlowStoneTabletString : CharSequence {
   // ...
}

Now we can both use the same string functions. Just by implementing an 
interface, I can use the same text-processing as your 
hyper-compiler-optimized builtin arrays.

But only if the interface exists. And only if library authors write 
their text-processing code against that interface. That's the point.

A good API allows multiple implementations to make use of the same 
algorithms. Application authors can choose their own tradeoffs between 
speed, memory consumption, and functionality. A rigid builtin 
implementation, with no interface definition, locks everybody into the 
same choices.

--benji
Aug 25 2008
next sibling parent BCS <ao pathlink.com> writes:
Reply to Benji,

 But check this out... you can have your fast class, and I can have my
 slow class, and they can both implement the same interface. Like this:
 

No, you can't. The overhead needed to implement that is EXACTLY what we 
are unwilling to use.

I want an indexed array load

   x = arr[i];

to be:

-- load arr.ptr into a reg
-- add i to that reg
-- indirect load through that reg into x

3 ASM ops. If you can get that and what you want, go get a PhD. You 
will have earned it.
Aug 25 2008
prev sibling next sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Benji Smith wrote:
 superdan wrote:
 Benji Smith Wrote:

 BCS wrote:
 Ditto, D is a *systems language* It's *supposed* to have access to 
 the lowest level representation and build stuff on top of that

But in this "systems language", it's an O(n) operation to get the nth 
character from a string, to slice a string based on character offsets, 
or to determine the number of characters in the string. I'd gladly pay 
the price of a single interface vtable lookup to turn all of those into 
O(1) operations.

dood. i dunno where to start. allow me to answer from multiple angles. 1. when was the last time looking up one char in a string or computing length was your bottleneck. 2. you talk as if o(1) happens by magic that d currently disallows. 3. maybe i don't want to blow the size of my string by a factor of 4 if i'm just interested in some occasional character search. 4. implement all that nice stuff you wanna. nobody put a gun to yer head not to. understand you can't put a gun to my head to pay the price.

Geez, man, you just keep missing the point, over and over again. Let me make one point, blisteringly clear: I don't give a shit about the data format. You want the fastest strings in the universe, implemented with zero-byte magic beans and burned into the local ROM. Fantastic! I'm completely in favor of it. Presumably. people will be so into those strings that they'll write a shitload of functionality for them. Parsing, searching, sorting, indexing... the motherload. One day, I come along, and I'd like to perform some text processing. But all of my string data comes from non-magic-beans data sources. I'd like to implement a new kind of string class that supports my data. I'm not going to push my super-slow string class on anybody else, because I know how concerned with performance you are. But check this out... you can have your fast class, and I can have my slow class, and they can both implement the same interface. Like this: interface CharSequence { int find(CharSequence needle); int rfind(CharSequence needle); // ... } class ZeroByteFastMagicString : CharSequence { // ... } class SuperSlowStoneTabletString : CharSequence { // ... } Now we can both use the same string functions. Just by implementing an interface, I can use the same text-processing as your hyper-compiler-optimized builtin arrays. But only if the interface exists. And only if library authors write their text-processing code against that interface. That's the point. A good API allows multiple implementations to make use of the same algorithms. Application authors can choose their own tradeoffs between speed, memory consumption, and functionality. A rigid builtin implementation, with no interface definition, locks everybody into the same choices. --benji

Superdan is confusing the issues here. The main argument against your proposal (besides backwards compatibility, of course) is that every access would require a virtual call, which can be fairly slow.
Aug 25 2008
parent superdan <super dan.org> writes:
Robert Fraser Wrote:

 Benji Smith wrote:
 superdan wrote:
 Benji Smith Wrote:

 BCS wrote:
 Ditto, D is a *systems language* It's *supposed* to have access to 
 the lowest level representation and build stuff on top of that

But in this "systems language", it's an O(n) operation to get the nth 
character from a string, to slice a string based on character offsets, 
or to determine the number of characters in the string. I'd gladly pay 
the price of a single interface vtable lookup to turn all of those into 
O(1) operations.

dood. i dunno where to start. allow me to answer from multiple angles. 1. when was the last time looking up one char in a string or computing length was your bottleneck. 2. you talk as if o(1) happens by magic that d currently disallows. 3. maybe i don't want to blow the size of my string by a factor of 4 if i'm just interested in some occasional character search. 4. implement all that nice stuff you wanna. nobody put a gun to yer head not to. understand you can't put a gun to my head to pay the price.

Geez, man, you just keep missing the point, over and over again. Let me make one point, blisteringly clear: I don't give a shit about the data format. You want the fastest strings in the universe, implemented with zero-byte magic beans and burned into the local ROM. Fantastic! I'm completely in favor of it. Presumably. people will be so into those strings that they'll write a shitload of functionality for them. Parsing, searching, sorting, indexing... the motherload. One day, I come along, and I'd like to perform some text processing. But all of my string data comes from non-magic-beans data sources. I'd like to implement a new kind of string class that supports my data. I'm not going to push my super-slow string class on anybody else, because I know how concerned with performance you are. But check this out... you can have your fast class, and I can have my slow class, and they can both implement the same interface. Like this: interface CharSequence { int find(CharSequence needle); int rfind(CharSequence needle); // ... } class ZeroByteFastMagicString : CharSequence { // ... } class SuperSlowStoneTabletString : CharSequence { // ... } Now we can both use the same string functions. Just by implementing an interface, I can use the same text-processing as your hyper-compiler-optimized builtin arrays. But only if the interface exists. And only if library authors write their text-processing code against that interface. That's the point. A good API allows multiple implementations to make use of the same algorithms. Application authors can choose their own tradeoffs between speed, memory consumption, and functionality. A rigid builtin implementation, with no interface definition, locks everybody into the same choices. --benji

Superdan is confusing the issues here. The main argument against your proposal (besides backwards compatibility, of course) is that every access would require a virtual call, which can be fairly slow.

i'm not confusin'. mentioned the efficiency thing a number of times, didn't seem to faze him a bit. so i tried some more viewpoints.
Aug 25 2008
prev sibling parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 superdan wrote:
 Benji Smith Wrote:
 
 BCS wrote:
 Ditto, D is a *systems language* It's *supposed* to have access to the 
 lowest level representation and build stuff on top of that

But in this "systems language", it's an O(n) operation to get the nth 
character from a string, to slice a string based on character offsets, 
or to determine the number of characters in the string. I'd gladly pay 
the price of a single interface vtable lookup to turn all of those into 
O(1) operations.

dood. i dunno where to start. allow me to answer from multiple angles. 1. when was the last time looking up one char in a string or computing length was your bottleneck. 2. you talk as if o(1) happens by magic that d currently disallows. 3. maybe i don't want to blow the size of my string by a factor of 4 if i'm just interested in some occasional character search. 4. implement all that nice stuff you wanna. nobody put a gun to yer head not to. understand you can't put a gun to my head to pay the price.

Geez, man, you just keep missing the point, over and over again.

relax. believe me i'm tryin', maybe you could put it a better way and meet me in the middle.
 Let me make one point, blisteringly clear: I don't give a shit about the 
    data format. You want the fastest strings in the universe, 
 implemented with zero-byte magic beans and burned into the local ROM. 
 Fantastic! I'm completely in favor of it.

so far so good.
 Presumably. people will be so into those strings that they'll write a 
 shitload of functionality for them. Parsing, searching, sorting, 
 indexing... the motherload.

cool.
 One day, I come along, and I'd like to perform some text processing. But 
 all of my string data comes from non-magic-beans data sources. I'd like 
 to implement a new kind of string class that supports my data. I'm not 
 going to push my super-slow string class on anybody else, because I know 
 how concerned with performance you are.

i'm in nirvana.
 But check this out... you can have your fast class, and I can have my 
 slow class, and they can both implement the same interface. Like this:
 
 interface CharSequence {
    int find(CharSequence needle);
    int rfind(CharSequence needle);
    // ...
 }
 
 class ZeroByteFastMagicString : CharSequence {
    // ...
 }
 
 class SuperSlowStoneTabletString : CharSequence {
    // ...
 }
 
 Now we can both use the same string functions. Just by implementing an 
 interface, I can use the same text-processing as your 
 hyper-compiler-optimized builtin arrays.

but maestro. the interface call is already what's costing.
 But only if the interface exists.
 
 And only if library authors write their text-processing code against 
 that interface.
 
 That's the point.

then there was none. sorry.
 A good API allows multiple implementations to make use of the same 
 algorithms. Application authors can choose their own tradeoffs between 
 speed, memory consumption, and functionality.
 
 A rigid builtin implementation, with no interface definition, locks 
 everybody into the same choices.

no. this is just wrong. perfectly backwards in fact. a low-level builtin allows unbounded architectures with control over efficiency.
Aug 25 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
superdan wrote:
 relax. believe me i'm tryin', maybe you could put it a better way and meet me
in the middle.

Okay. I'll try :)

Think about a collection API. The container classes are all written to 
satisfy a few basic primitive operations: you can get an item at a 
particular index, you can iterate in sequence (either forward or in 
reverse). You can insert items into a hashtable or retrieve them by 
key. And so on.

Someone else comes along and writes a library of algorithms. The 
algorithms can operate on any container that implements the necessary 
operations.

When someone clever comes along and writes a new sorting algorithm, I 
can plug my new container class right into it, and get the algorithm 
for free. Likewise for the guy with the clever new collection class.

We don't bat an eye at the idea of containers & algorithms connecting 
to one another using a reciprocal set of interfaces. In most cases, you 
get a performance **benefit** because you can mix and match the 
container and algorithm implementations that most suit your needs. You 
can design your own performance solution, rather than being stuck with 
a single "low level" implementation that might be good for the general 
case but isn't ideal for your problem.

Over in another message BCS said he wants an array index to compile to 
3 ASM ops. Cool. I'm all for it.

I don't know a whole lot about the STL, but my understanding is that 
most C++ compilers are smart enough that they can produce the same ASM 
from an iterator moving over a vector as from incrementing a pointer 
over an array. So the default implementation is damn fast.

But if someone else, with special design constraints, needs to 
implement a custom container template, it's no problem. As long as the 
container provides a function for getting iterators to the container 
elements, it can consume any of the STL algorithms too, even if the 
performance isn't as good as the performance for a vector.

There's no good reason the same technique couldn't provide both speed 
and API flexibility for text processing.

--benji
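In D, the STL story translates into templates rather than runtime interfaces. A sketch of that compile-time flavor (hypothetical names, assuming only indexing and length access):

------------------------------------------------------------------
// Hypothetical duck-typed algorithm: any type usable with [] and
// .length will instantiate. The calls bind at compile time, so for a
// plain array this boils down to ordinary array indexing -- no vtable.
T maxElement(T, C)(C container) {
    T best = container[0];
    for (size_t i = 1; i < container.length; i++)
        if (container[i] > best)
            best = container[i];
    return best;
}

void main() {
    int[] arr = [3, 1, 4, 1, 5];
    assert(maxElement!(int, int[])(arr) == 5);
}
------------------------------------------------------------------

This compile-time version and a runtime CharSequence interface aren't mutually exclusive: the template can be instantiated with a class that does interface dispatch just as easily as with a raw array that doesn't.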
Aug 25 2008
next sibling parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 superdan wrote:
 relax. believe me i'm tryin', maybe you could put it a better way and meet me
in the middle.

Okay. I'll try :)

'preciate that.
 Think about a collection API.

okay.
 The container classes are all written to satisfy a few basic primitive 
 operations: you can get an item at a particular index, you can iterate 
 in sequence (either forward or in reverse). You can insert items into a 
 hashtable or retrieve them by key. And so on.

how do you implement getting an item at a particular index for a linked list? how do you make a hashtable, an array, and a linked list obey the same interface? guess hashtable has stuff that others don't support? these are serious questions. not in jest not rhetorical and not trick.
 Someone else comes along and writes a library of algorithms. The 
 algorithms can operate on any container that implements the necessary 
 operations.

hm. things are starting to screech a bit. but let's see your answers to your questions above.
 When someone clever comes along and writes a new sorting algorithm, I 
 can plug my new container class right into it, and get the algorithm for 
 free. Likewise for the guy with the clever new collection class.

things ain't that simple. saw this flick "the devil wears prada", an ok movie but one funny remark stayed with me. "you are in desperate need of chanel." i'll paraphrase. "you are in desperate need of stl." you need to learn stl and then you'll figure why you can't plug a new sorting algorithm into a container. you need more guarantees. and you need iterators.
 We don't bat an eye at the idea of containers & algorithms connecting to 
 one another using a reciprocal set of interfaces.

i do. it's completely wrong. you need iterators that broker between containers and algos. and iterators must give complexity guarantees.
 In most cases, you get 
 a performance **benefit** because you can mix and match the container 
 and algorithm implementations that most suit your needs. You can design 
 your own performance solution, rather than being stuck a single "low 
 level" implementation that might be good for the general case but isn't 
 ideal for your problem.

assuming there are iterators in the picture, sure. there is a performance benefit. even more so when said mixing and matching is done during compilation.
 Over in another message BCS said he wants an array index to compile to 3 
 ASM ops. Cool I'm all for it.

great. but then you must be all for the consequences of it.
 I don't know a whole lot about the STL, but my understanding is that 
 most C++ compilers are smart enough that they can produce the same ASM 
 from an iterator moving over a vector as incrementing a pointer over an 
 array.

they are because stl is designed in a specific way. that specific way is lightyears away from the design you outline above.
 So the default implementation is damn fast.

not sure what you mean by default here, but playing along.
 But if someone else, with special design constraints, needs to implement 
 a custom container template, it's no problem. As long as the container 
 provides a function for getting iterators to the container elements, it 
 can consume any of the STL algorithms too, even if the performance isn't 
 as good as the performance for a vector.
 
 There's no good reason the same technique couldn't provide both speed 
 and API flexibility for text processing.

you see here's the problem. you systematically forget to factor in the cost of reaching through a binary interface. and if that's not there, congrats. you just discovered perpetual motion. stl is fast for two main reasons. one. it uses compile-time interfaces and not run-time interfaces as you want. two. it defines and strictly uses a compile-time hierarchy of iterators with stringent complexity guarantees. your container design can't be fast because it uses runtime interfaces. let alone that you don't mention complexity guarantees. but let's say those can be provided. but the fundamental problem is that you want runtime interfaces for a very low level data structure. fast that can't be. please understand.
Aug 25 2008
parent reply Christopher Wright <dhasenan gmail.com> writes:
superdan wrote:
 Benji Smith Wrote:
 
 superdan wrote:
 relax. believe me i'm tryin', maybe you could put it a better way and meet me
in the middle.


'preciate that.
 Think about a collection API.

okay.
 The container classes are all written to satisfy a few basic primitive 
 operations: you can get an item at a particular index, you can iterate 
 in sequence (either forward or in reverse). You can insert items into a 
 hashtable or retrieve them by key. And so on.

how do you implement getting an item at a particular index for a linked list?

class Node(T)
{
	Node!(T) next;
	T value;
}

class LinkedList(T)
{
	Node!(T) head;

	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
	the list. Time complexity: O(N) for a list of length N. This operation
	is provided for completeness and not recommended for frequent use in
	large lists. */
	T opIndex(int i)
	{
		auto current = head;
		while (i)
		{
			current = current.next;
			i--;
		}
		return current.value;
	}
}
 how do you make a hashtable, an array, and a linked list obey the same
interface? guess hashtable has stuff that others don't support?

You have an interface for collections (you can add, remove, and get the length, maybe a couple other things). You have an interface for lists (they're collections, and you can index them). Then you can use all the collection-oriented stuff with lists, and you can do special list-type things with them if you want.
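As a rough C++ sketch of that layering (`Collection`, `List`, and `ArrayList` here are hypothetical names, and this deliberately uses the runtime interfaces being argued about):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Base interface: any collection can report its size and accept items.
struct Collection
{
    virtual ~Collection() = default;
    virtual void add(int x) = 0;
    virtual std::size_t length() const = 0;
};

// Lists are collections that can additionally be indexed.
struct List : Collection
{
    virtual int get(std::size_t i) const = 0;
};

// An array-backed list satisfying both interfaces.
class ArrayList : public List
{
    std::vector<int> items_;
public:
    void add(int x) override { items_.push_back(x); }
    std::size_t length() const override { return items_.size(); }
    int get(std::size_t i) const override { return items_[i]; }
};

// Code written against Collection works with any implementation;
// code that needs indexing asks for a List.
int second_item()
{
    ArrayList a;
    a.add(4);
    a.add(7);
    const List& l = a;
    return l.get(1);
}
```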
 these are serious questions. not in jest not rhetorical and not trick.

Yes, but they're solved problems.
 Someone else comes along and writes a library of algorithms. The 
 algorithms can operate on any container that implements the necessary 
 operations.

hm. things are starting to screech a bit. but let's see your answers to your questions above.
 When someone clever comes along and writes a new sorting algorithm, I 
 can plug my new container class right into it, and get the algorithm for 
 free. Likewise for the guy with the clever new collection class.

things ain't that simple.

Collection-oriented library code will care sufficiently about performance that this mix-and-match stuff is not feasible. Almost anything else doesn't care enough to take only an AssociativeArrayHashSet and not a TreeSet or a LinkedList or a primitive array.
 saw this flick "the devil wears prada", an ok movie but one funny remark
stayed with me. "you are in desperate need of chanel." 
 
 i'll paraphrase. "you are in desperate need of stl." you need to learn stl and
then you'll figure why you can't plug a new sorting algorithm into a container.
you need more guarantees. and you need iterators.
 
 We don't bat an eye at the idea of containers & algorithms connecting to 
 one another using a reciprocal set of interfaces.

i do. it's completely wrong. you need iterators that broker between containers and algos. and iterators must give complexity guarantees.

I don't. If I'm just going to iterate through the items of a collection, I only care about the opApply. If I need to index stuff, I don't care if I get a primitive array or Bob Dole's brand-new ArrayList class.
Aug 25 2008
parent reply superdan <super dan.org> writes:
Christopher Wright Wrote:

 superdan wrote:
 Benji Smith Wrote:
 
 superdan wrote:
 relax. believe me i'm tryin', maybe you could put it a better way and meet me
in the middle.


'preciate that.
 Think about a collection API.

okay.
 The container classes are all written to satisfy a few basic primitive 
 operations: you can get an item at a particular index, you can iterate 
 in sequence (either forward or in reverse). You can insert items into a 
 hashtable or retrieve them by key. And so on.

how do you implement getting an item at a particular index for a linked list?

class Node(T) { Node!(T) next; T value; }

so far so good.
 class LinkedList(T)
 {
 	Node!(T) head;
 
 	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of 
 the list. Time complexity: O(N) for a list of length N. This operation 
 is provided for completeness and not recommended for frequent use in 
 large lists. */
 	T opIndex(int i)
 	{
 		auto current = head;
 		while (i)
 		{
 			current = current.next;
 			i--;
 		}
 		return current.value;
 	}
 }

oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that. this design is a stillborn. what needs done is to allow different kinds of containers implement different interfaces. in fact a better way to factor things is via iterators, as stl has shown.
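For reference, the STL encodes exactly this constraint: std::sort demands random-access iterators, so handing it std::list iterators fails to compile instead of silently going quadratic, while std::list carries its own O(n log n) node-splicing sort. A small C++ sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

std::vector<int> sort_vector()
{
    std::vector<int> v = {3, 1, 2};
    std::sort(v.begin(), v.end()); // OK: vector iterators are random-access
    return v;
}

std::list<int> sort_list()
{
    std::list<int> l = {3, 1, 2};
    // std::sort(l.begin(), l.end()); // would NOT compile: list iterators
    //                                // are only bidirectional
    l.sort(); // member sort: O(n log n) by relinking nodes, no indexing
    return l;
}
```

The mismatch is caught at compile time by the iterator category, not discovered at runtime as a performance surprise.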
 how do you make a hashtable, an array, and a linked list obey the same
interface? guess hashtable has stuff that others don't support?

You have an interface for collections (you can add, remove, and get the length, maybe a couple other things).

this is an incomplete response. what do you add for a vector!(T)? A T. what do you add for a hash!(T, U)? you tell me. and you tell me how you make that signature consistent across vector and hash.
 You have an interface for lists (they're collections, and you can index 
 them).

wrong. don't ever mention a linear-time indexing operator in an interview. you will fail it right then. you can always define linear-time indexing as a named function. but never masquerade it as an index operator.
 Then you can use all the collection-oriented stuff with lists, and you 
 can do special list-type things with them if you want.
 
 these are serious questions. not in jest not rhetorical and not trick.

Yes, but they're solved problems.

apparently not since you failed at'em.
 Someone else comes along and writes a library of algorithms. The 
 algorithms can operate on any container that implements the necessary 
 operations.

hm. things are starting to screech a bit. but let's see your answers to your questions above.
 When someone clever comes along and writes a new sorting algorithm, I 
 can plug my new container class right into it, and get the algorithm for 
 free. Likewise for the guy with the clever new collection class.

things ain't that simple.

Collection-oriented library code will care sufficiently about performance that this mix-and-match stuff is not feasible.

what's that supposed to mean? you sayin' stl don't exist?
 Almost 
 anything else doesn't care enough to take only an 
 AssociativeArrayHashSet and not a TreeSet or a LinkedList or a primitive 
 array.

wrong for the reasons above.
 saw this flick "the devil wears prada", an ok movie but one funny remark
stayed with me. "you are in desperate need of chanel." 
 
 i'll paraphrase. "you are in desperate need of stl." you need to learn stl and
then you'll figure why you can't plug a new sorting algorithm into a container.
you need more guarantees. and you need iterators.
 
 We don't bat an eye at the idea of containers & algorithms connecting to 
 one another using a reciprocal set of interfaces.

i do. it's completely wrong. you need iterators that broker between containers and algos. and iterators must give complexity guarantees.

I don't. If I'm just going to iterate through the items of a collection, I only care about the opApply. If I need to index stuff, I don't care if I get a primitive array or Bob Dole's brand-new ArrayList class.

you too are in desperate need for stl.
Aug 25 2008
parent reply Christopher Wright <dhasenan gmail.com> writes:
superdan wrote:
 class LinkedList(T)
 {
 	Node!(T) head;

 	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of 
 the list. Time complexity: O(N) for a list of length N. This operation 
 is provided for completeness and not recommended for frequent use in 
 large lists. */
 	T opIndex(int i)
 	{
 		auto current = head;
 		while (i)
 		{
 			current = current.next;
 			i--;
 		}
 		return current.value;
 	}
 }

oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.

WRONG! Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.
 this design is a stillborn.
 
 what needs done is to allow different kinds of containers implement different
interfaces. in fact a better way to factor things is via iterators, as stl has
shown.

You didn't ask for an O(1) opIndex on a linked list. You asked for a correct opIndex on a linked list. Any sorting algorithm you name that would work on an array would also work on this linked list. Admittedly, insertion sort would be much faster than qsort, but if your library provides that, you, knowing that you are using a linked list, would choose the insertion sort algorithm.
 how do you make a hashtable, an array, and a linked list obey the same
interface? guess hashtable has stuff that others don't support?

length, maybe a couple other things).

this is an incomplete response. what do you add for a vector!(T)? A T. what do you add for a hash!(T, U)? you tell me. and you tell me how you make that signature consistent across vector and hash.

interface List(T) : Collection!(T) {}
class Vector(T) : List!(T) {}
class HashMap(T, U) : Collection!(KeyValuePair!(T, U)) {}

Have a look at C#'s collection classes. They solved this problem.
 You have an interface for lists (they're collections, and you can index 
 them).

wrong. don't ever mention a linear-time indexing operator in an interview. you will fail it right then. you can always define linear-time indexing as a named function. but never masquerade it as an index operator.

If you create a linked list with O(1) indexing, that might suffice to get you a PhD. If you claim that you can do so in an interview, you should be required to show proof; and should you fail to do so, you will probably be shown the door. Even if you did prove it in the interview, they would probably consider you overqualified, unless the company's focus was data structures.
 Then you can use all the collection-oriented stuff with lists, and you 
 can do special list-type things with them if you want.

 these are serious questions. not in jest not rhetorical and not trick.


apparently not since you failed at'em.

You claimed that a problem with an inefficient solution has no solution, and then you execrate me for providing an inefficient solution. Why?
 Someone else comes along and writes a library of algorithms. The 
 algorithms can operate on any container that implements the necessary 
 operations.

 When someone clever comes along and writes a new sorting algorithm, I 
 can plug my new container class right into it, and get the algorithm for 
 free. Likewise for the guy with the clever new collection class.


performance that this mix-and-match stuff is not feasible.

what's that supposed to mean? you sayin' stl don't exist?

No, I'm saying that for efficiency, you need to know about the internals of a data structure to implement a number of collection-oriented algorithms. Just like the linked list example.
 Almost 
 anything else doesn't care enough to take only an 
 AssociativeArrayHashSet and not a TreeSet or a LinkedList or a primitive 
 array.

wrong for the reasons above.

They expose very similar interfaces. You might care about which you choose because of the efficiency of various operations, but most of your code won't care which type it gets; it would still be correct. Well, sets have a property that no element appears in them twice, so that is an algorithmic consideration sometimes.
 saw this flick "the devil wears prada", an ok movie but one funny remark
stayed with me. "you are in desperate need of chanel." 

 i'll paraphrase. "you are in desperate need of stl." you need to learn stl and
then you'll figure why you can't plug a new sorting algorithm into a container.
you need more guarantees. and you need iterators.

 We don't bat an eye at the idea of containers & algorithms connecting to 
 one another using a reciprocal set of interfaces.


I only care about the opApply. If I need to index stuff, I don't care if I get a primitive array or Bob Dole's brand-new ArrayList class.

you too are in desperate need for stl.

You are in desperate need of System.Collections.Generic. Or tango.util.container.
Aug 26 2008
next sibling parent reply superdan <super dan.org> writes:
Christopher Wright Wrote:

 superdan wrote:
 class LinkedList(T)
 {
 	Node!(T) head;

 	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of 
 the list. Time complexity: O(N) for a list of length N. This operation 
 is provided for completeness and not recommended for frequent use in 
 large lists. */
 	T opIndex(int i)
 	{
 		auto current = head;
 		while (i)
 		{
 			current = current.next;
 			i--;
 		}
 		return current.value;
 	}
 }

oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.

WRONG! Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.

you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?
 this design is a stillborn.
 
 what needs done is to allow different kinds of containers implement different
interfaces. in fact a better way to factor things is via iterators, as stl has
shown.

You didn't ask for an O(1) opIndex on a linked list. You asked for a correct opIndex on a linked list.

the correct opIndex runs in o(1).
 Any sorting algorithm you name that 
 would work on an array would also work on this linked list. Admittedly, 
 insertion sort would be much faster than qsort, but if your library 
 provides that, you, knowing that you are using a linked list, would 
 choose the insertion sort algorithm.

no. the initial idea was for a design that allows that cool mixing and matching gig thru interfaces without knowing what is where. but ur design leads to a lot of unworkable mixing and matching. it is a stillborn design.
 how do you make a hashtable, an array, and a linked list obey the same
interface? guess hashtable has stuff that others don't support?

length, maybe a couple other things).

this is an incomplete response. what do you add for a vector!(T)? A T. what do you add for a hash!(T, U)? you tell me. and you tell me how you make that signature consistent across vector and hash.

interface List(T) : Collection!(T) {}
class Vector(T) : List!(T) {}
class HashMap(T, U) : Collection!(KeyValuePair!(T, U)) {}

Have a look at C#'s collection classes. They solved this problem.

dood. i know how the problem is solved. stl solved that before c#. my point was that making vector!(string) and hash!(int, string) offer the same interface is a tenuous proposition.
 You have an interface for lists (they're collections, and you can index 
 them).

wrong. don't ever mention a linear-time indexing operator in an interview. you will fail it right then. you can always define linear-time indexing as a named function. but never masquerade it as an index operator.

If you create a linked list with O(1) indexing, that might suffice to get you a PhD. If you claim that you can do so in an interview, you should be required to show proof; and should you fail to do so, you will probably be shown the door. Even if you did prove it in the interview, they would probably consider you overqualified, unless the company's focus was data structures.

my point was opIndex should not be written for a list to begin with.
 Then you can use all the collection-oriented stuff with lists, and you 
 can do special list-type things with them if you want.

 these are serious questions. not in jest not rhetorical and not trick.


apparently not since you failed at'em.

You claimed that a problem with an inefficient solution has no solution, and then you execrate me for providing an inefficient solution. Why?

because the correct answer was: a list cannot implement opIndex. it must be in a different hierarchy branch than a vector. which reveals one of the wrongs in the post i answered.
 Someone else comes along and writes a library of algorithms. The 
 algorithms can operate on any container that implements the necessary 
 operations.

 When someone clever comes along and writes a new sorting algorithm, I 
 can plug my new container class right into it, and get the algorithm for 
 free. Likewise for the guy with the clever new collection class.


performance that this mix-and-match stuff is not feasible.

what's that supposed to mean? you sayin' stl don't exist?

No, I'm saying that for efficiency, you need to know about the internals of a data structure to implement a number of collection-oriented algorithms. Just like the linked list example.

wrong. you only need to define your abstract types appropriately. e.g. stl defines forward and random iterators. a forward iterators has ++ but no []. random iterator has both. so a random iterator can be substituted for a forward iterator. but not the other way. bottom line, sort won't compile on a forward iterator. your design allowed it to compile. which makes the design wrong.
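A minimal C++ sketch of that compile-time dispatch, in the style of std::advance (`step` and `advance_by` are made-up names; the real library uses the same tag technique):

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Advance an iterator n steps, picking the O(1) or O(n) strategy
// at compile time from the iterator's category tag.
template <typename Iter>
void step(Iter& it, int n, std::random_access_iterator_tag)
{
    it += n; // constant time: the category guarantees operator+=
}

template <typename Iter>
void step(Iter& it, int n, std::forward_iterator_tag)
{
    while (n--) ++it; // linear time: ++ is all we may assume
}

template <typename Iter>
void advance_by(Iter& it, int n)
{
    step(it, n, typename std::iterator_traits<Iter>::iterator_category());
}

int third_of_vector()
{
    std::vector<int> v = {10, 20, 30, 40};
    auto it = v.begin();
    advance_by(it, 2); // resolves to the random-access overload
    return *it;
}

int fourth_of_list()
{
    std::list<int> l = {10, 20, 30, 40};
    auto it = l.begin();
    advance_by(it, 3); // bidirectional tag derives from forward: O(n) walk
    return *it;
}
```

An algorithm that requires `random_access_iterator_tag` simply has no overload to match a forward iterator, so the misuse fails to compile.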
 Almost 
 anything else doesn't care enough to take only an 
 AssociativeArrayHashSet and not a TreeSet or a LinkedList or a primitive 
 array.

wrong for the reasons above.

They expose very similar interfaces. You might care about which you choose because of the efficiency of various operations, but most of your code won't care which type it gets; it would still be correct.

no. you got the wrong notion of correctness.
 Well, sets have a property that no element appears in them twice, so 
 that is an algorithmic consideration sometimes.

finally a good point. true. thing is, that can't be told with types. indexing can.
 saw this flick "the devil wears prada", an ok movie but one funny remark
stayed with me. "you are in desperate need of chanel." 

 i'll paraphrase. "you are in desperate need of stl." you need to learn stl and
then you'll figure why you can't plug a new sorting algorithm into a container.
you need more guarantees. and you need iterators.

 We don't bat an eye at the idea of containers & algorithms connecting to 
 one another using a reciprocal set of interfaces.


I only care about the opApply. If I need to index stuff, I don't care if I get a primitive array or Bob Dole's brand-new ArrayList class.

you too are in desperate need for stl.

You are in desperate need of System.Collections.Generic. Or tango.util.container.

guess my advice fell on deaf ears eh. btw my respect for tango improved when i found no opIndex in their list container.
Aug 26 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"superdan" wrote
 Christopher Wright Wrote:

 superdan wrote:
 class LinkedList(T)
 {
 Node!(T) head;

 /** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
 the list. Time complexity: O(N) for a list of length N. This operation
 is provided for completeness and not recommended for frequent use in
 large lists. */
 T opIndex(int i)
 {
 auto current = head;
 while (i)
 {
 current = current.next;
 i--;
 }
 return current.value;
 }
 }

oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.

WRONG! Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.

you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?

O(n^2 log n) is still considered solved. Anything that is not exponential is usually considered to be in P, meaning it can be solved in polynomial time. The really hard problems are the NP-complete ones, where no polynomial-time algorithm is known. Non-polynomial usually means n is in one of the exponents, e.g.:

O(2^n).

That being said, it doesn't take a genius to figure out that a standard sorting algorithm on a linked list while trying to use random access is going to run longer than the same sorting algorithm on a random-access list. But there are ways around this. For instance, you can sort a linked list in O(n log n) time with (in pseudocode):

vector v = list; // copy all elements to v, O(n)
v.sort; // O(n lgn)
list.replaceAll(v); // O(n)

So the total is O(2n + n lgn), and we all know you always take the most significant part of the polynomial, so it then becomes:

O(n lgn)

Can I have my PhD now? :P

In all seriousness though, with the way you can call functions with arrays as the first argument like member functions, it almost seems like they are already classes. One thing I have against having a string class be the default is that you can use substring on a string in D without any heap allocation, and it is super-fast. And I think substring (slicing) is one of the best features that D has.

FWIW, you can have both a string class and an array representing a string, and you can define the string class to use an array as its backing storage. I do this in dcollections (ArrayList). If you want the interface, wrap the array; if you want the speed of an array, it is accessible as a member. This allows you to decide whichever one you want to use. You can even use algorithms on the array (like sort) by using the member, because you are accessing the actual storage of the ArrayList.

-Steve
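The pseudocode above translates directly to C++ (`sort_via_vector` and `demo` are made-up names for this sketch):

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Sort a linked list in O(n log n): O(n) copy out, O(n log n) sort,
// O(n) copy back -- the log-linear term dominates.
void sort_via_vector(std::list<int>& l)
{
    std::vector<int> v(l.begin(), l.end());   // O(n) copy into random-access buffer
    std::sort(v.begin(), v.end());            // O(n log n)
    std::copy(v.begin(), v.end(), l.begin()); // O(n) write-back, same length
}

std::list<int> demo()
{
    std::list<int> l = {5, 3, 4, 1, 2};
    sort_via_vector(l);
    return l;
}
```

The caller still has to know it is holding a list to choose this strategy, which is the point being argued back and forth here.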
Aug 26 2008
parent reply superdan <super dan.org> writes:
Steven Schveighoffer Wrote:

 "superdan" wrote
 Christopher Wright Wrote:

 superdan wrote:
 class LinkedList(T)
 {
 Node!(T) head;

 /** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
 the list. Time complexity: O(N) for a list of length N. This operation
 is provided for completeness and not recommended for frequent use in
 large lists. */
 T opIndex(int i)
 {
 auto current = head;
 while (i)
 {
 current = current.next;
 i--;
 }
 return current.value;
 }
 }

oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.

WRONG! Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.

you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?

O(n^2 log n) is still considered solved.

of course. for traveling salesman that is. just not sorting. o(n^2 log n) sorting algo, that's a no-go. notice also i didn't say solved/unsolved. i said scalable. stuff beyond n log n don't scale.
  Anything that is not exponential 
 is usually considered to be in P, meaning it can be solved in polynomial 
 time.  The really hard problems are the NP-complete ones, where no 
 polynomial-time algorithm is known.  Non-polynomial usually means n is in 
 one of the exponents, e.g.:
 
 O(2^n).

sure thing.
 That being said, it doesn't take a genius to figure out that a standard 
 sorting algorithm on a linked list while trying to use random access is 
 going to run longer than the same sorting algorithm on a random-access list. 
 But there are ways around this.  For instance, you can sort a linked list in 
 O(n log n) time with (in pseudocode):
 
 vector v = list; // copy all elements to v, O(n)
 v.sort; // O(n lgn)
 list.replaceAll(v); // O(n)

sure thing. problem is, you must know it's a list. otherwise you wouldn't make a copy. don't forget how this all started when answering tiny bits of my post. it started with a dood claiming vector and list both have opIndex and then sort works with both without knowing the details. it don't work with both.
 So the total is O(2n + n lgn), and we all know you always take the most 
 significant part of the polynomial, so it then becomes:
 
 O(n lgn)
 
 Can I have my PhD now? :P

sure. i must have seen an email with an offer somewhere ;)
 In all seriousness though, with the way you can call functions with arrays 
 as the first argument like member functions, it almost seems like they are 
 already classes.  One thing I have against having a string class be the 
 default, is that you can use substring on a string in D without any heap 
 allocation, and it is super-fast.  And I think substring (slicing) is one of 
 the best features that D has.
 
 FWIW, you can have both a string class and an array representing a string, 
 and you can define the string class to use an array as it's backing storage. 
 I do this in dcollections (ArrayList).  If you want the interface, wrap the 
 array, if you want the speed of an array, it is accessible as a member. 
 This allows you to decide whichever one you want to use.  You can even use 
 algorithms on the array like sort by using the member because you are 
 accessing the actual storage of the ArrayList.

that sounds better.
Aug 26 2008
parent reply "Chris R. Miller" <lordSaurontheGreat gmail.com> writes:

superdan wrote:
 Steven Schveighoffer Wrote:
 
 "superdan" wrote
 Christopher Wright Wrote:
 
 superdan wrote:
 class LinkedList(T)
 {
 Node!(T) head;
 
 /** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
 the list. Time complexity: O(N) for a list of length N. This operation
 is provided for completeness and not recommended for frequent use in
 large lists. */
 T opIndex(int i)
 {
 auto current = head;
 while (i)
 {
 current = current.next;
 i--;
 }
 return current.value;
 }
 }
 
 oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.
 
 WRONG! Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.
 
 you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?
 
 O(n^2 log n) is still considered solved.
 
 of course. for traveling salesman that is. just not sorting. o(n^2 log n) sorting algo, that's a no-go. notice also i didn't say solved/unsolved. i said scalable. stuff beyond n log n don't scale.

Why are people unable to understand that they're using data structures in a suboptimal manner?

Furthermore, you're faulting linked lists as having a bad opIndex. Why not implement a cursor (a Java LinkedList-style iterator) in the opIndex function? You could retain a reference to the last indexed location and start from it, instead of the root node, on the next call to opIndex. Granted, whenever the contents of the list are modified that reference would have to be considered invalid (start from the root node again), but it would give O(1) efficiency for sequential accesses from 0 to length. True, it adds another pointer to a node in memory, as well as an integer recording the position of that node reference.
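A C++ sketch of that cursor idea (`CursorList` is a hypothetical type; mutation conservatively resets the cached position, as described above):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// List wrapper memoizing the last indexed position, so sequential
// indexing (0, 1, 2, ...) costs O(1) amortized per access.
class CursorList
{
    std::list<int> data_;
    std::list<int>::iterator cursor_;
    std::size_t cursor_pos_ = 0;

public:
    explicit CursorList(std::list<int> d)
        : data_(std::move(d)), cursor_(data_.begin()) {}

    int at(std::size_t i)
    {
        if (i < cursor_pos_) // going backwards: restart from the head
        {
            cursor_ = data_.begin();
            cursor_pos_ = 0;
        }
        while (cursor_pos_ < i) // walk forward from the cached node
        {
            ++cursor_;
            ++cursor_pos_;
        }
        return *cursor_;
    }

    void push_back(int x)
    {
        data_.push_back(x);
        cursor_ = data_.begin(); // mutation invalidates the cached cursor
        cursor_pos_ = 0;
    }
};

int nth_via_cursor(std::size_t i)
{
    CursorList c(std::list<int>{7, 8, 9, 10});
    return c.at(i);
}
```

Worst case (random access) is still O(n), so this doesn't answer the complexity-guarantee objection; it only makes the common sequential pattern cheap.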
 sure thing. problem is, you must know it's a list. otherwise you wouldn't make
a copy.

 don't forget how this all started when answering tiny bits of my post. it
started with a dood claiming vector and list both have opIndex and then sort
works with both without knowing the details. it don't work with both.

Wrong. It works. That it's not precisely what the spec for sort dictates (which is probably in error, since no spec can guarantee a precise efficiency if it doesn't know the precise container type). You are also misinterpreting the spec. It is saying that it uses a specific efficiency of algorithm, not that you can arbitrarily expect a certain efficiency out of it regardless of how dumb you might be with the choice of container you use.
 So the total is O(2n + n lgn), and we all know you always take the most


 significant part of the polynomial, so it then becomes:

 O(n lgn)

 Can I have my PhD now? :P

sure. i must have seen an email with an offer somewhere ;)

A Ph.D from superdan... gee, I'd value that just above my MSDN membership. Remember: I value nothing less than my MSDN membership.
Aug 26 2008
next sibling parent reply superdan <super dan.org> writes:
Chris R. Miller Wrote:

 superdan wrote:
 Steven Schveighoffer Wrote:
 
 "superdan" wrote
 Christopher Wright Wrote:

 superdan wrote:
 class LinkedList(T)
 {
 Node!(T) head;

 /** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
 the list. Time complexity: O(N) for a list of length N. This operation
 is provided for completeness and not recommended for frequent use in
 large lists. */
 T opIndex(int i)
 {
 auto current = head;
 while (i--)
 {
 current = current.next;
 }
 return current.value;
 }
 }

sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.

Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.

from a scalable algo. whacha gonna say next. bubble sort is viable?!?


of course. for traveling salesman that is. just not sorting. o(n^2 log n) sorting algo, that's a no-go. notice also i didn't say solved/unsolved. i said scalable. stuff beyond n log n don't scale.

Why are people unable to have the brains to understand that they're using data structures in a suboptimal manner?

coz they wants to do generic programming. they can't know what structures are using. so mos def structures must define expressive interfaces that describe their capabilities.
 Furthermore, you're faulting linked lists as having a bad opIndex.  Why
 not implement a cursor (Java LinkedList-like iterator) in the opIndex
 function?  Thus you could retain the reference to the last indexed
 location, and simply use that instead of the root node when calling
 opIndex.  Granted that whenever the contents of the list are modified
 that reference would have to be considered invalid (start from the root
 node again), but it'd work with an O(1) efficiency for sequential
 accesses from 0 to length.  True, it'll add another pointer to a node in
 memory, as well as an integer representing the position of that node
 reference.

you: "this scent will make skunk farts stink less." me: "let's kick the gorram skunk outta here!"
 sure thing. problem is, you must know it's a list. otherwise you wouldn't make
a copy. 
 
 don't forget how this all started when answering tiny bits of my post. it
started with a dood claiming vector and list both have opIndex and then sort
works with both without knowing the details. it don't work with both.

Wrong. It works. That it's not precisely what the spec for sort dictates (which is probably in error, since no spec can guarantee a precise efficiency if it doesn't know the precise container type).

sure it can. in big oh.
  You
 are also misinterpreting the spec.  It is saying that it uses a specific
 efficiency of algorithm, not that you can arbitrarily expect a certain
 efficiency out of it regardless of how dumb you might be with the choice
 of container you use.

in stl the spec says as i say. in d the spec is not precise. it should.
 So the total is O(2n + n lgn), and we all know you always take the most 
 significant part of the polynomial, so it then becomes:

 O(n lgn)

 Can I have my PhD now? :P

sure. i must have seen an email with an offer somewhere ;)

A Ph.D from superdan... gee, I'd value that just above my MSDN membership. Remember: I value nothing less than my MSDN membership.

humor is a sign of intelligence. but let me explain it. i was referring to the spam emails advertising phds from non-accredited universities.
Aug 27 2008
parent reply "Chris R. Miller" <lordSaurontheGreat gmail.com> writes:

superdan wrote:
 Chris R. Miller Wrote:
 Furthermore, you're faulting linked lists as having a bad opIndex.  Why
 not implement a cursor (Java LinkedList-like iterator) in the opIndex
 function?  Thus you could retain the reference to the last indexed
 location, and simply use that instead of the root node when calling
 opIndex.  Granted that whenever the contents of the list are modified
 that reference would have to be considered invalid (start from the root
 node again), but it'd work with an O(1) efficiency for sequential
 accesses from 0 to length.  True, it'll add another pointer to a node in
 memory, as well as an integer representing the position of that node
 reference.

you: "this scent will make skunk farts stink less." me: "let's kick the gorram skunk outta here!"

I would imagine that you'll have a hard time convincing others that linked-lists are evil when you apparently have two broken shift keys.
 Wrong.  It works.  That it's not precisely what the spec for sort
 dictates (which is probably in error, since no spec can guarantee a
 precise efficiency if it doesn't know the precise container type).

sure it can. in big oh.

Which is simply identifying the algorithm used by its efficiency. If you're not familiar with the types of algorithms, it tells you the proximate efficiency of the algorithm used. If you are familiar with algorithms, then you can identify the type of algorithm used so you can better leverage it to do what you want.
  You
 are also misinterpreting the spec.  It is saying that it uses a specific
 efficiency of algorithm, not that you can arbitrarily expect a certain
 efficiency out of it regardless of how dumb you might be with the choice
 of container you use.

in stl the spec says as i say. in d the spec is not precise. it should.

Yes, it probably should explicitly say that "sort uses the xxxxx algorithm, which gives a proximate efficiency of O(n log n) when used with optimal data structures." You honestly cannot write a spec for generic programming and expect uniform performance.

Trying to move back on topic: yes, I believe it is important that such a degree of ambiguity be avoided with something so simple as string handling. So no strings-as-objects. But writing a String class and using that wherever possible is advantageous, especially because it does not remove the ability of the language to support the simpler string implementation.
 A Ph.D from superdan... gee, I'd value that just above my MSDN
 membership.  Remember: I value nothing less than my MSDN membership.

humor is a sign of intelligence. but let me explain it. i was referring to the spam emails advertising phds from non-accredited universities.
You get different spam than I do then. I just get junk about cheap Canadian pharmaceuticals and dead South African oil moguls who have left large amounts of money in my name.
Aug 27 2008
parent reply Dee Girl <deegirl noreply.com> writes:
Chris R. Miller Wrote:

 superdan wrote:
 Chris R. Miller Wrote:
 Furthermore, you're faulting linked lists as having a bad opIndex.  Why
 not implement a cursor (Java LinkedList-like iterator) in the opIndex
 function?  Thus you could retain the reference to the last indexed
 location, and simply use that instead of the root node when calling
 opIndex.  Granted that whenever the contents of the list are modified
 that reference would have to be considered invalid (start from the root
 node again), but it'd work with an O(1) efficiency for sequential
 accesses from 0 to length.  True, it'll add another pointer to a node in
 memory, as well as an integer representing the position of that node
 reference.

you: "this scent will make skunk farts stink less." me: "let's kick the gorram skunk outta here!"

I would imagine that you'll have a hard time convincing others that linked-lists are evil when you apparently have two broken shift keys.
 Wrong.  It works.  That it's not precisely what the spec for sort
 dictates (which is probably in error, since no spec can guarantee a
 precise efficiency if it doesn't know the precise container type).

sure it can. in big oh.

Which is simply identifying the algorithm used by its efficiency. If you're not familiar with the types of algorithms, it tells you the proximate efficiency of the algorithm used. If you are familiar with algorithms, then you can identify the type of algorithm used so you can better leverage it to do what you want.
  You
 are also misinterpreting the spec.  It is saying that it uses a specific
 efficiency of algorithm, not that you can arbitrarily expect a certain
 efficiency out of it regardless of how dumb you might be with the choice
 of container you use.

in stl the spec says as i say. in d the spec is not precise. it should.

Yes, it probably should explicitly say that "sort uses the xxxxx algorithm, which gives a proximate efficiency of O(n log n) when used with optimal data structures." You honestly cannot write a spec for generic programming and expect uniform performance.

But this is what STL did. Sorry, Dee Girl
Aug 27 2008
parent reply "Chris R. Miller" <lordSaurontheGreat gmail.com> writes:

Dee Girl wrote:
 Chris R. Miller Wrote:
 You honestly cannot write a spec for generic programming and expect
 uniform performance.

But this is what STL did. Sorry, Dee Girl

Reading back through the STL intro, it seems that all this STL power comes from the iterator. Supposing I wrote a horrid iterator (sort of like the annoying O(n) opIndex previously discussed) I don't see why STL is "immune" to the same weakness of a slower data structure.

I can see how STL is more powerful in that you can pick and choose the algorithm to use, but at this point I think we're discussing changing the nature of the sort property in D at a fundamental level. I still just don't see the (apparently obvious) advantage of STL.

Disclaimer: I do not /know/ STL that well at all. I came from Java with a ___brief___ dabbling in C/C++. So I'm not trying to be annoying, stupid, or blind - I'm just ignorant of what you see that I don't.
Aug 27 2008
parent Don <nospam nospam.com.au> writes:
Chris R. Miller wrote:
 Dee Girl wrote:
 Chris R. Miller Wrote:
 You honestly cannot write a spec for generic programming and expect
 uniform performance.


Reading back through the STL intro, it seems that all this STL power comes from the iterator. Supposing I wrote a horrid iterator (sort of like the annoying O(n) opIndex previously discussed) I don't see why STL is "immune" to the same weakness of a slower data structure. I can see how STL is more powerful in that you can pick and choose the algorithm to use, but at this point I think we're discussing changing the nature of the sort property in D at a fundamental level. I still just don't see the (apparently obvious) advantage of STL.

Read Alexander Stepanov's notes. They are fantastic. http://www.stepanovpapers.com/notes.pdf
Aug 28 2008
prev sibling parent Dee Girl <deegirl noreply.com> writes:
Chris R. Miller Wrote:

 superdan wrote:
 Steven Schveighoffer Wrote:
 
 "superdan" wrote
 Christopher Wright Wrote:

 superdan wrote:
 class LinkedList(T)
 {
 Node!(T) head;

 /** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
 the list. Time complexity: O(N) for a list of length N. This operation
 is provided for completeness and not recommended for frequent use in
 large lists. */
 T opIndex(int i)
 {
 auto current = head;
 while (i--)
 {
 current = current.next;
 }
 return current.value;
 }
 }

sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.

Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.

from a scalable algo. whacha gonna say next. bubble sort is viable?!?


of course. for traveling salesman that is. just not sorting. o(n^2 log n) sorting algo, that's a no-go. notice also i didn't say solved/unsolved. i said scalable. stuff beyond n log n don't scale.

Why are people unable to have the brains to understand that they're using data structures in a suboptimal manner? Furthermore, you're faulting linked lists as having a bad opIndex. Why not implement a cursor (Java LinkedList-like iterator) in the opIndex function? Thus you could retain the reference to the last indexed location, and simply use that instead of the root node when calling opIndex. Granted that whenever the contents of the list are modified that reference would have to be considered invalid (start from the root node again), but it'd work with an O(1) efficiency for sequential accesses from 0 to length. True, it'll add another pointer to a node in memory, as well as an integer representing the position of that node reference.

I am sorry to enter discussion. But I have some thing to say. Please do not scare me ^_^.

I think Super Dan choose wrong example sort. Because sort is O(n log n) even for list. But good example is find. Taken collection that gives length and opIndex abstraction. Then I write find easy with index. It is slow O(n*n) for list. But with optimization from Chris it is fast again. But if I want to write findLast. Find last element equal to some thing. Then I go back. But going back the optimization never works. I am again to O(n*n)!

This is important because abstraction. You want to write find abstract. Also you want write container abstract. And you want both to work together well. If you choose algorithm manually "you cheat". You break abstraction. Because you want abstract algorithm work on abstract container. Not concrete algorithm on concrete container.

Also is not only detail. When call findLast on container I expect better or worse depending on optimization of library. But I expect proportional with number of elements. If I know is O(n*n) maybe I want redesign. O(n*n) is really bad. 1000 elements is not many. But 1000000 operations is many. I took two data structures classes. What each structure gives fast is essential. Not detail.

I am not sure is clear what I say. Structures are special for certain operations. For example there is suffix tree. It is for fast common substring. Suffix tree must not have same interface as O(n*n) search. Because algorithm should not accept both. If you say list has random access it is naive I think (sorry!). Everybody in class could laugh. To find index in list is linear search. A similar example an array string[] can define overload a["abc"] to do linear search for "abc". But search is not indexing. It must be name search find or linearSearch.
Aug 27 2008
prev sibling next sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
superdan wrote:
 Christopher Wright Wrote:
 
 superdan wrote:
 class LinkedList(T)
 {
 	Node!(T) head;

 	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of 
 the list. Time complexity: O(N) for a list of length N. This operation 
 is provided for completeness and not recommended for frequent use in 
 large lists. */
 	T opIndex(int i)
 	{
 		auto current = head;
 		while (i--)
 		{
 			current = current.next;
 		}
 		return current.value;
 	}
 }


Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.

you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?

No, YOU got the wrong definition of correct. "Correct" and "scalabale" are different words. As are "correct" and "viable". In Java, I've been known to index into linked lists... usually ones with ~5 elements, but I've done it.
 this design is a stillborn.

 what needs done is to allow different kinds of containers implement different
interfaces. in fact a better way to factor things is via iterators, as stl has
shown.

correct opIndex on a linked list.

the correct opIndex runs in o(1).

No, the SCALABLE opIndex runs in O(1). The CORRECT opIndex can run in O(n^n) and still be correct.
 Any sorting algorithm you name that 
 would work on an array would also work on this linked list. Admittedly, 
 insertion sort would be much faster than qsort, but if your library 
 provides that, you, knowing that you are using a linked list, would 
 choose the insertion sort algorithm.

no. the initial idea was for a design that allows that cool mixing and matching gig thru interfaces without knowing what is where. but ur design leads to a lot of unworkable mixing and matching.

Again, WORKABLE, just not SCALABLE. You should wrap your head around this concept, since it's been around for about 30 years now.
 it is a stillborn design.

Tell that to Java, the world's most used programming language for new projects. Or C#, the world's fastest growing programming language. Or Tango, one of D's standard libraries.
 my point was opIndex should not be written for a list to begin with.

Yes it should be. Here's a fairly good example: Say you have a GUI control that displays a list and allows the user to insert or remove items from the list. It also allows the user to double-click on an item at a given position. Looking up what position maps to what item is an opIndex. Would this problem be better solved using an array (Vector)? Maybe. Luckily, if you used a List interface throughout your code, you can change one line, and it'll work either way.
 wrong. you only need to define your abstract types appropriately. e.g. stl
defines forward and random iterators. a forward iterators has ++ but no [].
random iterator has both. so a random iterator can be substituted for a forward
iterator. but not the other way. bottom line, sort won't compile on a forward
iterator. your design allowed it to compile. which makes the design wrong.

STL happens to be one design and one world-view. It's a good one, but it's not the only one. My main problem with the STL is that it takes longer to learn than the Java/.NET standard libraries -- and thus the cost of a programmer who knows it is higher. But there are language considerations in there too, and this is a topic for another day.
 btw my respect for tango improved when i found no opIndex in their list
container.

http://www.dsource.org/projects/tango/browser/trunk/tango/util/collection/LinkSeq.d#L176
Aug 26 2008
next sibling parent reply Lars Ivar Igesund <larsivar igesund.net> writes:
Robert Fraser wrote:

 superdan wrote:

 btw my respect for tango improved when i found no opIndex in their list
 container.


This particular collection package is deprecated. -- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the Tango
Aug 26 2008
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Lars Ivar Igesund wrote:
 Robert Fraser wrote:
 
 superdan wrote:

 btw my respect for tango improved when i found no opIndex in their list
 container.


This particular collection package is deprecated.

The new package has this feature too: http://www.dsource.org/projects/tango/browser/trunk/tango/util/container/Slink.d#L248 It's a good feature to have (I wouldn't consider a list class complete without it), it just shouldn't be abused.
Aug 26 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Robert Fraser" <fraserofthenight gmail.com> wrote in message 
news:g91fva$1if4$1 digitalmars.com...
 Lars Ivar Igesund wrote:
 Robert Fraser wrote:

 superdan wrote:

 btw my respect for tango improved when i found no opIndex in their list
 container.


This particular collection package is deprecated.

The new package has this feature too: http://www.dsource.org/projects/tango/browser/trunk/tango/util/container/Slink.d#L248 It's a good feature to have (I wouldn't consider a list class complete without it), it just shouldn't be abused.

First, Slink is not really the public interface, it is the unit that LinkedList (and other containers) use to build linked lists. Second, LinkedList implements a 'lookup by index', through the get function, but note that it is not implemented as an opIndex function. an opIndex implies fast lookup (at least < O(n)). I don't think these functions were intended to be used in sorting routines. -Steve
Aug 26 2008
prev sibling parent reply superdan <super dan.org> writes:
Robert Fraser Wrote:

 superdan wrote:
 Christopher Wright Wrote:
 
 superdan wrote:
 class LinkedList(T)
 {
 	Node!(T) head;

 	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of 
 the list. Time complexity: O(N) for a list of length N. This operation 
 is provided for completeness and not recommended for frequent use in 
 large lists. */
 	T opIndex(int i)
 	{
 		auto current = head;
 		while (i--)
 		{
 			current = current.next;
 		}
 		return current.value;
 	}
 }


Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.

you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?

No, YOU got the wrong definition of correct. "Correct" and "scalabale" are different words. As are "correct" and "viable". In Java, I've been known to index into linked lists... usually ones with ~5 elements, but I've done it.

yeah. for starters my dictionary fails to list "scalabale".
 this design is a stillborn.

 what needs done is to allow different kinds of containers implement different
interfaces. in fact a better way to factor things is via iterators, as stl has
shown.

correct opIndex on a linked list.

the correct opIndex runs in o(1).

No, the SCALABLE opIndex runs in O(1). The CORRECT opIndex can run in O(n^n) and still be correct.

stepanov has shown that for composable operations the complexity must be part of the specification. otherwise composition easily leads to high-order polynomials that fail to terminate in reasonable time. opIndex is an indexing operator expected to run in constant time, and algorithms rely on that. so no. opIndex running in o(n^n) is incorrect because it fails its spec.
 Any sorting algorithm you name that 
 would work on an array would also work on this linked list. Admittedly, 
 insertion sort would be much faster than qsort, but if your library 
 provides that, you, knowing that you are using a linked list, would 
 choose the insertion sort algorithm.

no. the initial idea was for a design that allows that cool mixing and matching gig thru interfaces without knowing what is where. but ur design leads to a lot of unworkable mixing and matching.

Again, WORKABLE, just not SCALABLE. You should wrap your head around this concept, since it's been around for about 30 years now.

guess would do you good to entertain just for a second the idea that i know what i'm talking about and you don't get me.
 it is a stillborn design.

Tell that to Java, the world's most used programming language for new projects. Or C#, the world's fastest growing programming language. Or Tango, one of D's standard libraries.

here you hint you don't understand what i'm talking about indeed. neither of java, c#, or tango define a[n] to run in o(n). they define named functions, which i'm perfectly fine with.
 my point was opIndex should not be written for a list to begin with.

Yes it should be. Here's a fairly good example: Say you have a GUI control that displays a list and allows the user to insert or remove items from the list. It also allows the user to double-click on an item at a given position. Looking up what position maps to what item is an opIndex. Would this problem be better solved using an array (Vector)? Maybe. Luckily, if you used a List interface throughout your code, you can change one line, and it'll work either way.

funny you should mention that. window manager in windows 3.1 worked exactly like that. users noticed that the more windows they opened, the longer it took to open a new window. with new systems and more memory people would have many windows. before long this became a big issue. windows 95 fixed that. never misunderestimate scalability.
 wrong. you only need to define your abstract types appropriately. e.g. stl
defines forward and random iterators. a forward iterators has ++ but no [].
random iterator has both. so a random iterator can be substituted for a forward
iterator. but not the other way. bottom line, sort won't compile on a forward
iterator. your design allowed it to compile. which makes the design wrong.

STL happens to be one design and one world-view. It's a good one, but it's not the only one. My main problem with the STL is that it takes longer to learn than the Java/.NET standard libraries -- and thus the cost of a programmer who knows it is higher. But there are language considerations in there too, and this is a topic for another day.

cool. don't see how this all relates to the problem at hand.
 btw my respect for tango improved when i found no opIndex in their list
container.

http://www.dsource.org/projects/tango/browser/trunk/tango/util/collection/LinkSeq.d#L176

here you gently provide irrefutable proof you don't get what i'm sayin'. schveiguy did. the page fails to list opIndex. there is a function called 'get'. better yet. the new package lists a function 'nth' suggesting a linear walk to the nth element. good job tango fellas.
Aug 26 2008
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
superdan wrote:
 No, the SCALABLE opIndex runs in O(1). The CORRECT opIndex can run in 
 O(n^n) and still be correct.

stepanov has shown that for composable operations the complexity must be part of the specification. otherwise composition easily leads to high-order polynomials that fail to terminate in reasonable time. opIndex is an indexing operator expected to run in constant time, and algorithms rely on that. so no. opIndex running in o(n^n) is incorrect because it fails its spec.

Um... how would one "show" that? I'm not talking theoretical bullshit here, I'm talking real-world requirements. Some specs of operations (composable or not) list their time/memory complexity. Most do not. They're still usable. I agree that a standard library sort routine is one that *should* list its time complexity. My internal function for enumerating registry keys doesn't need to.
 here you hint you don't understand what i'm talking about indeed. neither of
java, c#, or tango define a[n] to run in o(n). they define named functions,
which i'm perfectly fine with.

I guess I didn't understand what you were saying because you _never mentioned_ you were talking only about opIndex and not other functions. I don't see the difference between a[n] and a.get(n); the former is just a shorter syntax. The D spec certainly doesn't make any guarantees about the time/memory complexity of opIndex; it's up to the implementing class to do so. In fact, the D spec makes no time/memory complexity guarantees about sort for arbitrary user-defined types, either, so maybe you shouldn't use that.
 funny you should mention that. window manager in windows 3.1 worked exactly
like that. users noticed that the more windows they opened, the longer it took
to open a new window. with new systems and more memory people would have many
windows. before long this became a big issue. windows 95 fixed that.
 
 never misunderestimate scalability.

I don't know enough about GUI programming to say for sure, but that suggests a window manager shouldn't be written using linked lists. It doesn't suggest that getting a value from an arbitrary index in a linked list is useless (in fact, it shows the opposite -- that it works fine -- it just shows it's not scalable).
Aug 26 2008
parent reply superdan <super dan.org> writes:
Robert Fraser Wrote:

 superdan wrote:
 No, the SCALABLE opIndex runs in O(1). The CORRECT opIndex can run in 
 O(n^n) and still be correct.

stepanov has shown that for composable operations the complexity must be part of the specification. otherwise composition easily leads to high-order polynomials that fail to terminate in reasonable time. opIndex is an indexing operator expected to run in constant time, and algorithms rely on that. so no. opIndex running in o(n^n) is incorrect because it fails its spec.

Um... how would one "show" that? I'm not talking theoretical bullshit here, I'm talking real-world requirements.

hey. hey. watch'em manners :) he's shown it by putting stl together.
 Some specs of operations 
 (composable or not) list their time/memory complexity. Most do not. 
 They're still usable. I agree that a standard library sort routine is 
 one that *should* list its time complexity. My internal function for 
 enumerating registry keys doesn't need to.

sure thing.
 here you hint you don't understand what i'm talking about indeed. neither of
java, c#, or tango define a[n] to run in o(n). they define named functions,
which i'm perfectly fine with.

I guess I didn't understand what you were saying because you _never mentioned_ you were talking only about opIndex and not other functions.

well then allow me to quote myself: "oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that."
 I don't see the difference between a[n] and a.get(n); the former is just 
 a shorter syntax.

wrong. the former is used by sort. the latter ain't.
 The D spec certainly doesn't make any guarantees about 
 the time/memory complexity of opIndex; it's up to the implementing class 
 to do so.

it don't indeed. it should. that's a problem with the spec.
 In fact, the D spec makes no time/memory complexity guarantees 
 about sort for arbitrary user-defined types, either, so maybe you 
 shouldn't use that.

makes guarantees in terms of the primitive operations used.
 funny you should mention that. window manager in windows 3.1 worked exactly
like that. users noticed that the more windows they opened, the longer it took
to open a new window. with new systems and more memory people would have many
windows. before long this became a big issue. windows 95 fixed that.
 
 never misunderestimate scalability.

I don't know enough about GUI programming to say for sure, but that suggests a window manager shouldn't be written using linked lists. It doesn't suggest that getting a value from an arbitrary index in a linked list is useless (in fact, it shows the opposite -- that it works fine -- it just shows it's not scalable).

problem's too many people talk without knowin' enuff 'bout stuff. there's only a handful of subjects i know any about. and i try to not stray. when it come about any stuff i know i'm amazed readin' here at how many just fudge their way around.
Aug 26 2008
next sibling parent Benji Smith <dlanguage benjismith.net> writes:
Denis Koroskin wrote:
 I agree. You can't rely on function invocation, i.e. the following
 might be slow as death:

 auto n = collection.at(i);
 auto len = collection.length();

 but index operations and properties getters should be real-time and 
 have O(1) complexity by design.

 auto n = collection[i];
 auto len = collection.length;

The same goes for assignment, casts, comparisons, shifts, i.e. everything that doesn't have a function invocation syntax.

This is the main reason I dislike D's optional parentheses for function invocations:

something.dup;   // looks cheap
something.dup(); // looks expensive

Since any zero-arg function can have its parens omitted, it's harder to read code and see where the expensive operations are.

--benji
Aug 26 2008
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Denis Koroskin" wrote
 On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
 [snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death:

auto n = collection.at(i);
auto len = collection.length();

but index operations and properties getters should be real-time and have O(1) complexity by design.

auto n = collection[i];
auto len = collection.length;

less than O(n) complexity please :) Think of tree map complexity which is usually O(lg n) for lookups. And the opIndex syntax is sooo nice for maps :) In general, opIndex just shouldn't imply 'linear search', as its roots come from array lookup, which is always O(1). The perception is that x[n] should be fast. Otherwise you have coders using x[n] all over the place thinking they are doing quick lookups, and wondering why their code is so damned slow. -Steve
Aug 26 2008
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Denis Koroskin" <2korden gmail.com> wrote in message 
news:op.ugie28dxo7cclz proton.creatstudio.intranet...
 On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
 [snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death:

auto n = collection.at(i);
auto len = collection.length();

but index operations and properties getters should be real-time and have O(1) complexity by design.

auto n = collection[i];
auto len = collection.length;

I disagree. That strategy strikes me as a very clear example of breaking encapsulation by having implementation details dictate certain aspects of the API. At the very least, that will make the API overly rigid, hindering future changes that could otherwise have been non-breaking, behind-the-scenes changes. For realtime code, I can see the benefit to what you're saying. Although in many cases only part of a program needs to be realtime, and for the rest of the program's code I'd hate to have to sacrifice the encapsulation benefits.
Aug 26 2008
parent reply superdan <super dan.org> writes:
Nick Sabalausky Wrote:

 "Denis Koroskin" <2korden gmail.com> wrote in message 
 news:op.ugie28dxo7cclz proton.creatstudio.intranet...
 On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
 [snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death:

auto n = collection.at(i);
auto len = collection.length();

but index operations and properties getters should be real-time and have O(1) complexity by design.

auto n = collection[i];
auto len = collection.length;

I disagree. That strategy strikes me as a very clear example of breaking encapsulation by having implementation details dictate certain aspects of the API. At the very least, that will make the API overly rigid, hindering future changes that could otherwise have been non-breaking, behind-the-scenes changes.

take this:

auto c = new Customer;
c.loadCustomerInfo("123-12-1234");

that's all cool. there's no performance guarantee other than some best effort kinda thing `we won't sabotage things'. if you come with faster or slower methods to load customers, no problem. coz noone assumes any.

now take sort. sort says. my input is a range that supports indexing and swapping independent of the range size. if you don't have that just let me know and i'll use a totally different method. just don't pretend.

with that indexing and swapping complexity ain't implementation detail. they're part of the spec. guess stepanov's main contribution was to clarify that.
 For realtime code, I can see the benefit to what you're saying. Although in 
 many cases only part of a program needs to be realtime, and for the rest of 
 the program's code I'd hate to have to sacrifice the encapsulation benefits.

realtime has nothin' to do with it. encapsulation ain't broken by making complexity part of the reqs. any more than any req ain't breakin' encapsulation. if it looks like a problem then encapsulation was misdesigned and needs change. case in point. all containers should provide 'nth' say is it's o(n) or better. then there's a subclass of container that is indexed_container. that provides opIndex and says it's o(log n) or better. it also provides 'nth' by just forwarding to opIndex. faster than o(n) ain't a problem. but forcing a list to blurt something for opIndex - that's just bad design.
Aug 26 2008
next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
superdan wrote:
 now take sort. sort says. my input is a range that supports indexing and
swapping independent of the range size. if you don't have that just let me know
and i'll use a totally different method. just don't pretend.
 
 with that indexing and swapping complexity ain't implementation detail.
they're part of the spec. guess stepanov's main contribution was to clarify
that.

The other variable cost operation of a sort is the element comparison. Even if indexing and swapping are O(1), the cost of a comparison between two elements might be O(m), where m is proportional to the size of the elements themselves. And since a typical sort algorithm will perform n log n comparisons, the cost of the comparison has to be factored into the total cost. The performance of sorting...say, an array of strings based on a locale-specific collation...could be an expensive operation, if the strings themselves are really long. But that wouldn't make the implementation incorrect, and I'm always glad when a sorting implementation provides a way of passing a custom comparison delegate into the sort routine. Not a counterargument to what you're saying about performance guarantees for indexing and swapping. Just something else to think about. --benji
Aug 26 2008
parent superdan <super dan.org> writes:
Benji Smith Wrote:

 superdan wrote:
 now take sort. sort says. my input is a range that supports indexing and
swapping independent of the range size. if you don't have that just let me know
and i'll use a totally different method. just don't pretend.
 
 with that indexing and swapping complexity ain't implementation detail.
they're part of the spec. guess stepanov's main contribution was to clarify
that.

The other variable cost operation of a sort is the element comparison. Even if indexing and swapping are O(1), the cost of a comparison between two elements might be O(m), where m is proportional to the size of the elements themselves. And since a typical sort algorithm will perform n log n comparisons, the cost of the comparison has to be factored into the total cost. The performance of sorting...say, an array of strings based on a locale-specific collation...could be an expensive operation, if the strings themselves are really long. But that wouldn't make the implementation incorrect, and I'm always glad when a sorting implementation provides a way of passing a custom comparison delegate into the sort routine.

good points. i only know of one trick to save on comparisons. it's that -1/0/1 comparison. you compare once and get info on less/equal/greater. that cuts comparisons in half. too bad std.algorithm don't use it. then i moseyed 'round std.algorithm and saw all that schwartz xform business. not sure i grokked it. does it have to do with saving on comparisons?
Aug 26 2008
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"superdan" <super dan.org> wrote in message 
news:g921nb$2qqq$1 digitalmars.com...
 Nick Sabalausky Wrote:

 "Denis Koroskin" <2korden gmail.com> wrote in message
 news:op.ugie28dxo7cclz proton.creatstudio.intranet...
 On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
 [snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing 
 class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death:

auto n = collection.at(i);
auto len = collection.length();

but index operations and properties getters should be real-time and have O(1) complexity by design.

auto n = collection[i];
auto len = collection.length;

I disagree. That strategy strikes me as a very clear example of breaking encapsulation by having implementation details dictate certain aspects of the API. At the very least, that will make the API overly rigid, hindering future changes that could otherwise have been non-breaking, behind-the-scenes changes.

take this:

auto c = new Customer;
c.loadCustomerInfo("123-12-1234");

that's all cool. there's no performance guarantee other than some best effort kinda thing `we won't sabotage things'. if you come with faster or slower methods to load customers, no problem. coz noone assumes any.

now take sort. sort says. my input is a range that supports indexing and swapping independent of the range size. if you don't have that just let me know and i'll use a totally different method. just don't pretend.

Choosing a sort method is a separate task from the actual sorting. Any sort that expects to be able to index will still work correctly if given a collection that has O(n) or worse indexing, just as it will still work correctly if given an O(n) or worse comparison delegate. It's up to the caller of the sort function to know that an O(n log n) sort (for instance) is only O(n log n) if the indexing and comparison are both O(1). And then it's up to them to decide if they still want to send it a linked list or a complex comparison. The sort shouldn't crap out at compile-time (or runtime) just because some novice might not know that doing a generalized bubble sort on a linked list scales poorly. If you want automatic choosing of an appropriate sort algorithm (which is certainly a good thing to have), then that can be done at a separate, optional, level of abstraction using function overloading, template specialization, RTTI, etc. That way you're not imposing arbitrary restrictions on anyone.
 with that indexing and swapping complexity ain't implementation detail. 
 they're part of the spec. guess stepanov's main contribution was to 
 clarify that.

When I called indexing an implementation detail, I was referring to the collection itself. The method of indexing *is* an implementation detail of the collection. It should not be considered an implementation detail of the sort algorithm since it's encapsulated in the collection and thus hidden away from the sort algorithm. If you make a rule that collections with cheap indexing are indexed via opIndex and collections with expensive indexing are indexed via a function, then you've just defined the API in terms of the collection's implementation (and introduced an unnecessary inconsistency into the API). If a sort function is desired that only accepts collections with O(1) indexing, then that can be accomplished at a higher level of abstraction (using function overloading, RTTI, etc.) without getting in the way when such a guarantee is not needed.
 For realtime code, I can see the benefit to what you're saying. Although 
 in
 many cases only part of a program needs to be realtime, and for the rest 
 of
 the program's code I'd hate to have to sacrifice the encapsulation 
 benefits.

realtime has nothin' to do with it.

For code that needs to run in realtime, I agree with Denis Koroskin that it could be helpful to be able to look at a piece of code and have some sort of guarantee that there is no behind-the-scenes overloading going on that is any more complex than the operators' default behaviors. But for code that doesn't need to finish within a maximum amount of time, that becomes less important and the encapsulation/syntactic-consistency gained from the use of such things becomes a more worthy pursuit. That's what I was saying about realtime.
 encapsulation ain't broken by making complexity part of the reqs. any more 
 than any req ain't breakin' encapsulation. if it looks like a problem then 
 encapsulation was misdesigned and needs change.

 case in point. all containers should provide 'nth' say is it's o(n) or 
 better. then there's a subclass of container that is indexed_container. 
 that provides opIndex and says it's o(log n) or better. it also provides 
 'nth' by just forwarding to opIndex. faster than o(n) ain't a problem.

 but forcing a list to blurt something for opIndex - that's just bad 
 design.

I agree that not all collections should implement an opIndex. Anything without a natural sequence or mapping should lack opIndex (such as a tree or graph). But forcing the user of a collection that *does* have a natural sequence (like a linked list) to use function-call-syntax instead of standard indexing-syntax just because the collection is implemented in a way that causes indexing to be less scalable than other collections - that's bad design. The way I see it, "group[n]" means "Get the nth element of group". Not "Get the element at location group.ptr + (n * sizeof(group_base_type)) or something else that's just as scalable." In plain C, those are one and the same. But when you start talking about generic collections, encapsulation and interface versus implementation, they are very different: the former is interface, the latter is implementation.
Aug 26 2008
parent reply superdan <super dan.org> writes:
Nick Sabalausky Wrote:

 "superdan" <super dan.org> wrote in message 
 news:g921nb$2qqq$1 digitalmars.com...
 Nick Sabalausky Wrote:

 "Denis Koroskin" <2korden gmail.com> wrote in message
 news:op.ugie28dxo7cclz proton.creatstudio.intranet...
 On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
 [snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing 
 class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death:

auto n = collection.at(i);
auto len = collection.length();

but index operations and properties getters should be real-time and have O(1) complexity by design.

auto n = collection[i];
auto len = collection.length;

I disagree. That strategy strikes me as a very clear example of breaking encapsulation by having implementation details dictate certain aspects of the API. At the very least, that will make the API overly rigid, hindering future changes that could otherwise have been non-breaking, behind-the-scenes changes.

take this:

auto c = new Customer;
c.loadCustomerInfo("123-12-1234");

that's all cool. there's no performance guarantee other than some best effort kinda thing `we won't sabotage things'. if you come with faster or slower methods to load customers, no problem. coz noone assumes any.

now take sort. sort says. my input is a range that supports indexing and swapping independent of the range size. if you don't have that just let me know and i'll use a totally different method. just don't pretend.

Choosing a sort method is a separate task from the actual sorting.

thot you were the one big on abstraction and encapsulation and all those good things. as a user i want to sort stuff. i let the library choose what's best for the collection at hand.

sort(stuff)
{
    1) figure out best algo for stuff
    2) have at it
}

i don't want to make that decision outside. for example i like one sort routine. not quicksort, heapsort, quicksort_with_median_of_5, or god forbid bubblesort.
 Any sort 
 that expects to be able to index will still work correctly if given a 
 collection that has O(n) or worse indexing, just as it will still work 
 correctly if given an O(n) or worse comparison delegate.

i disagree but now that christopher gave me a black eye guess i have to shut up.
 It's up to the 
 caller of the sort function to know that an O(n log n) sort (for instance) 
 is only O(n log n) if the indexing and comparison are both O(1). And then 
 it's up to them to decide if they still want to send it a linked list or a 
 complex comparison. The sort shouldn't crap out at compile-time (or runtime) 
 just because some novice might not know that doing a generalized bubble sort 
 on a linked list scales poorly.

it should coz there's an obvious good choice. there's no good tradeoff in that. there never will be a case to call the bad sort on the bad range.
 If you want automatic choosing of an appropriate sort algorithm (which is 
 certainly a good thing to have), then that can be done at a separate, 
 optional, level of abstraction using function overloading, template 
 specialization, RTTI, etc. That way you're not imposing arbitrary 
 restrictions on anyone.

i think u missed a lil point i was making.

All collections: implement nth()
Indexable collections: implement opIndex

is all. there is no restriction. just use nth.
 with that indexing and swapping complexity ain't implementation detail. 
 they're part of the spec. guess stepanov's main contribution was to 
 clarify that.

When I called indexing an implementation detail, I was referring to the collection itself. The method of indexing *is* an implementation detail of the collection.

not when it gets composed in higher level ops.
 It should not be considered an implementation detail of the 
 sort algorithm since it's encapsulated in the collection and thus hidden 
 away from the sort algorithm. If you make a rule that collections with cheap 
 indexing are indexed via opIndex and collections with expensive indexing are 
 indexed via a function, then you've just defined the API in terms of the 
 collection's implementation (and introduced an unnecessary inconsistency 
 into the API).

no. there is consistency. nth() is consistent across - o(n) or better indexing. i think u have it wrong when u think of "cheap" as if it were "fewer machine instructions". no. it's about asymptotic complexity and that does matter.
 If a sort function is desired that only accepts collections with O(1) 
 indexing, then that can be accomplished at a higher level of abstraction 
 (using function overloading, RTTI, etc.) without getting in the way when 
 such a guarantee is not needed.

exactly. nth() and opIndex() fit the bill. what's there not to love?
 For realtime code, I can see the benefit to what you're saying. Although 
 in
 many cases only part of a program needs to be realtime, and for the rest 
 of
 the program's code I'd hate to have to sacrifice the encapsulation 
 benefits.

realtime has nothin' to do with it.

For code that needs to run in realtime, I agree with Denis Koroskin that it could be helpful to be able to look at a piece of code and have some sort of guarantee that there is no behind-the-scenes overloading going on that is any more complex than the operators' default behaviors. But for code that doesn't need to finish within a maximum amount of time, that becomes less important and the encapsulation/syntactic-consistency gained from the use of such things becomes a more worthy pursuit. That's what I was saying about realtime.

i disagree but am in a rush now. guess i can't convince u.
 encapsulation ain't broken by making complexity part of the reqs. any more 
 than any req ain't breakin' encapsulation. if it looks like a problem then 
 encapsulation was misdesigned and needs change.

 case in point. all containers should provide 'nth' say is it's o(n) or 
 better. then there's a subclass of container that is indexed_container. 
 that provides opIndex and says it's o(log n) or better. it also provides 
 'nth' by just forwarding to opIndex. faster than o(n) ain't a problem.

 but forcing a list to blurt something for opIndex - that's just bad 
 design.

I agree that not all collections should implement an opIndex. Anything without a natural sequence or mapping should lack opIndex (such as a tree or graph). But forcing the user of a collection that *does* have a natural sequence (like a linked list) to use function-call-syntax instead of standard indexing-syntax just because the collection is implemented in a way that causes indexing to be less scalable than other collections - that's bad design.

no. it's great design. because it's not lyin'. you want o(1) indexing you say a[n]. you are ok with o(n) indexing you say a.nth(n). this is how generic code works, with consistent notation. not with lyin'.
 The way I see it, "group[n]" means "Get the nth element of group". Not "Get 
 the element at location group.ptr + (n * sizeof(group_base_type)) or 
 something else that's just as scalable."

no need to get that low. just say o(1) and understand o(1) has nothing to do with the count of assembler ops.
 In plain C, those are one and the 
 same. But when you start talking about generic collections, encapsulation 
 and interface versus implementation, they are very different: the former is 
 interface, the latter is implementation. 

so now would u say stl has a poor design? because it's all about stuff that you consider badly designed.
Aug 26 2008
parent "Nick Sabalausky" <a a.a> writes:
"superdan" <super dan.org> wrote in message 
news:g92drj$h0u$1 digitalmars.com...
 Nick Sabalausky Wrote:

 "superdan" <super dan.org> wrote in message
 news:g921nb$2qqq$1 digitalmars.com...
 Nick Sabalausky Wrote:

 "Denis Koroskin" <2korden gmail.com> wrote in message
 news:op.ugie28dxo7cclz proton.creatstudio.intranet...
 On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
 [snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing
 class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death:

auto n = collection.at(i);
auto len = collection.length();

but index operations and properties getters should be real-time and have O(1) complexity by design.

auto n = collection[i];
auto len = collection.length;

I disagree. That strategy strikes me as a very clear example of breaking encapsulation by having implementation details dictate certain aspects of the API. At the very least, that will make the API overly rigid, hindering future changes that could otherwise have been non-breaking, behind-the-scenes changes.

take this:

auto c = new Customer;
c.loadCustomerInfo("123-12-1234");

that's all cool. there's no performance guarantee other than some best effort kinda thing `we won't sabotage things'. if you come with faster or slower methods to load customers, no problem. coz noone assumes any.

now take sort. sort says. my input is a range that supports indexing and swapping independent of the range size. if you don't have that just let me know and i'll use a totally different method. just don't pretend.

Choosing a sort method is a separate task from the actual sorting.

thot you were the one big on abstraction and encapsulation and all those good things. as a user i want to sort stuff. i let the library choose what's best for the collection at hand.

sort(stuff)
{
    1) figure out best algo for stuff
    2) have at it
}

i don't want to make that decision outside. for example i like one sort routine. not quicksort, heapsort, quicksort_with_median_of_5, or god forbid bubblesort.

I never said that shouldn't be available. In fact, I did say it should be there. But just not forced.
 Any sort
 that expects to be able to index will still work correctly if given a
 collection that has O(n) or worse indexing, just as it will still work
 correctly if given an O(n) or worse comparison delegate.

i disagree but now that christopher gave me a black eye guess i have to shut up.

That's not a matter of agreeing or disagreeing, it's a verifiable fact. Grab a working sort function that operates on collection classes that implement indexing-syntax and a length property, feed it an unsorted linked list that has opIndex overloaded to return the nth node and a proper length property, and when it returns, the list will be sorted. Or are you maybe talking about a sort function that's parameterized specifically to take an "array" instead of "a collection that implements opIndex and a length property"? Because that might make a difference depending on the language (not sure about D offhand).
 It's up to the
 caller of the sort function to know that an O(n log n) sort (for 
 instance)
 is only O(n log n) if the indexing and comparison are both O(1). And then
 it's up to them to decide if they still want to send it a linked list or 
 a
 complex comparison. The sort shouldn't crap out at compile-time (or 
 runtime)
 just because some novice might not know that doing a generalized bubble 
 sort
 on a linked list scales poorly.

it should coz there's an obvious good choice. there's no good tradeoff in that. there never will be a case to call the bad sort on the bad range.

No matter what type of collection you're using, the "best sort" is still going to vary depending on factors like the number of elements to be sorted, whether duplicates might exist, how close the collection is to either perfectly sorted, perfectly backwards or totally random, how likely it is to be random/sorted/backwards at any given time, etc. And then there can be different variations of the same basic algorithm that can be better or worse for certain scenarios. And then there's the issue of how does the algorithm-choosing sort handle user-created collections, if at all.
 If you want automatic choosing of an appropriate sort algorithm (which is
 certainly a good thing to have), then that can be done at a separate,
 optional, level of abstraction using function overloading, template
 specialization, RTTI, etc. That way you're not imposing arbitrary
 restrictions on anyone.

i think u missed a lil point i was making.

All collections: implement nth()
Indexable collections: implement opIndex

is all. there is no restriction. just use nth.

If you want opIndex to be reserved for highly scalable indexing, then I can see how that would lead to what you describe here. But I'm in the camp that feels opIndex means "indexing", not "cheap/scalable indexing", in which case it becomes unnecessary to also expose the separate "nth()" function.
 with that indexing and swapping complexity ain't implementation detail.
 they're part of the spec. guess stepanov's main contribution was to
 clarify that.

When I called indexing an implementation detail, I was referring to the collection itself. The method of indexing *is* an implementation detail of the collection.

not when it gets composed in higher level ops.

It's not? If a collection's indexing isn't implemented by the collection's own class (or the equivalent functions in non-OO), then where is it implemented? Don't tell me it's the sort function, because I know that I'm not calling a sort function every time I say "collection[i]". The method of indexing is implemented by the collection class, therefore, it's an implementation detail of that method/class, not the functions that call it. Claiming otherwise is like saying that all of the inner working of printf() are implementation details of main().
 It should not be considered an implementation detail of the
 sort algorithm since it's encapsulated in the collection and thus hidden
 away from the sort algorithm. If you make a rule that collections with 
 cheap
 indexing are indexed via opIndex and collections with expensive indexing 
 are
 indexed via a function, then you've just defined the API in terms of the
 collection's implementation (and introduced an unnecessary inconsistency
 into the API).

no. there is consistency. nth() is consistent across - o(n) or better indexing. i think u have it wrong when u think of "cheap" as if it were "fewer machine instructions". no. it's about asymptotic complexity and that does matter.

Since we're talking about algorithmic complexity, I figured "cheap" and "expensive" would be understood as being intended in the same sense. So yes, I'm well aware of that.
 If a sort function is desired that only accepts collections with O(1)
 indexing, then that can be accomplished at a higher level of abstraction
 (using function overloading, RTTI, etc.) without getting in the way when
 such a guarantee is not needed.

exactly. nth() and opIndex() fit the bill. what's there not to love?

The "nth()" deviates from standard indexing syntax. I consider "collection[i]" to mean "indexing", not "low-complexity indexing".
 For realtime code, I can see the benefit to what you're saying. 
 Although
 in
 many cases only part of a program needs to be realtime, and for the 
 rest
 of
 the program's code I'd hate to have to sacrifice the encapsulation
 benefits.

realtime has nothin' to do with it.

For code that needs to run in realtime, I agree with Denis Koroskin that it could be helpful to be able to look at a piece of code and have some sort of guarantee that there is no behind-the-scenes overloading going on that is any more complex than the operators' default behaviors. But for code that doesn't need to finish within a maximum amount of time, that becomes less important and the encapsulation/syntactic-consistency gained from the use of such things becomes a more worthy pursuit. That's what I was saying about realtime.

i disagree but am in a rush now. guess i can't convince u.
 encapsulation ain't broken by making complexity part of the reqs. any 
 more
 than any req ain't breakin' encapsulation. if it looks like a problem 
 then
 encapsulation was misdesigned and needs change.

 case in point. all containers should provide 'nth' say is it's o(n) or
 better. then there's a subclass of container that is indexed_container.
 that provides opIndex and says it's o(log n) or better. it also 
 provides
 'nth' by just forwarding to opIndex. faster than o(n) ain't a problem.

 but forcing a list to blurt something for opIndex - that's just bad
 design.

I agree that not all collections should implement an opIndex. Anything without a natural sequence or mapping should lack opIndex (such as a tree or graph). But forcing the user of a collection that *does* have a natural sequence (like a linked list) to use function-call-syntax instead of standard indexing-syntax just because the collection is implemented in a way that causes indexing to be less scalable than other collections - that's bad design.

no. it's great design. because it's not lyin'. you want o(1) indexing you say a[n]. you are ok with o(n) indexing you say a.nth(n). this is how generic code works, with consistent notation. not with lyin'.

Number of different ways to index a collection:

One way: Consistent
Two ways: Not consistent
 The way I see it, "group[n]" means "Get the nth element of group". Not 
 "Get
 the element at location group.ptr + (n * sizeof(group_base_type)) or
 something else that's just as scalable."

no need to get that low. just say o(1) and understand o(1) has nothing to do with the count of assembler ops.

Here: "group.ptr + (n * sizeof(group_base_type))..." One multiplication, one addition, one memory read, no loops, no recursion: O(1). "...or something else that's just as scalable" Something that's just as scalable as O(1) must be O(1). So yes, that's what I said.
 In plain C, those are one and the same. But when you start talking about generic collections, encapsulation and interface versus implementation, they are very different: the former is interface, the latter is implementation.

so now would u say stl has a poor design? because it's all about stuff that you consider badly designed.

I abandoned C++ back when STL was still fairly new, so I can't honestly say. I seem to remember C++ having some trouble with certain newer language concepts; it might be that STL is the best that can be reasonably done given the drawbacks of C++. Or there might be room for improvement.
Aug 26 2008
prev sibling parent reply Michiel Helvensteijn <nomail please.com> writes:
superdan wrote:

 my point was opIndex should not be written for a list to begin with.

Ok. So you are not opposed to the random access operation on a list, as long as it doesn't use opIndex but a named function, correct? You are saying that there is a rule somewhere (either written or unwritten) that guarantees a time-complexity of O(1) for opIndex, wherever it appears.

This of course means that a linked list cannot define opIndex, since a random access operation on it will take O(n) (there are tricks that can make it faster in most practical cases, but I digress). That, in turn, means that a linked list and a dynamic array cannot share a common interface that includes opIndex. Aren't you making things difficult for yourself with this rule?

A list and an array are very similar data-structures and it is natural for them to share a common interface. The main differences are:
* A list takes more memory.
* A list has slower random access.
* A list has faster insertions and growth.

But the interface shouldn't necessarily make any complexity guarantees. The implementations should. And any programmer worth his salt will be able to use this wisely and choose the right sorting algorithm for the right data-structure. There are other algorithms, I'm sure, that work equally well on either. Of course, any algorithm should give its time-complexity in terms of the complexity of the operations it uses.

I do understand your point, however. And I believe my argument would be stronger if there were some sort of automatic complexity analysis tool. This could either warn a programmer in case he makes the wrong choice, or even take the choice out of the programmer's hands and automatically choose the right sorting algorithm for the job. That's a bit ambitious. I guess a profiler is the next best thing.

-- 
Michiel
Aug 27 2008
parent reply superdan <super dan.org> writes:
Michiel Helvensteijn Wrote:

 superdan wrote:
 
 my point was opIndex should not be written for a list to begin with.

Ok. So you are not opposed to the random access operation on a list, as long as it doesn't use opIndex but a named function, correct?

correctamundo.
 You are saying that there is a rule somewhere (either written or unwritten)
 that guarantees a time-complexity of O(1) for opIndex, wherever it appears.

yeppers. amend that to o(log n). in d, that rule is a social contract derived from the built-in vector and hash indexing syntax.
 This of course means that a linked list cannot define opIndex, since a
 random access operation on it will take O(n) (there are tricks that can
 make it faster in most practical cases, but I digress).

it oughtn't. & you digress in the wrong direction. you can't prove a majority of "practical cases" will not suffer a performance hit. the right direction is to define the right abstraction for forward iteration. i mean opIndex optimization is making a shitty design run better. y not make a good design to start with?
 That, in turn, means that a linked list and a dynamic array can not share a
 common interface that includes opIndex.

so what. they can share a common interface that includes nth(). what exactly is yer problem with that.
 Aren't you making things difficult for yourself with this rule?

not all. i want o(n) index access, i use nth() and i know it's gonna take me o(n) and i'll design my higher-level algorithm accordingly. if random access helps my particular algo then is(typeof(a[n])) tells me that a supports random access. if i can't live without a[n] my algo won't compile. every1's happy.
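The is(typeof(a[n])) check described here has a direct C++ analogue, sketched below for concreteness (the trait has_index and the function element_at are hypothetical names, not from the post):

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Detect at compile time whether a container supports c[i] -- the C++
// counterpart of D's is(typeof(a[n])).
template <typename C, typename = void>
struct has_index : std::false_type {};
template <typename C>
struct has_index<C,
    std::void_t<decltype(std::declval<const C&>()[std::size_t{0}])>>
    : std::true_type {};

// Uses c[i] only when the container really offers it; otherwise it walks,
// so the O(n) cost is visible in the code instead of hidden behind [].
template <typename C>
auto element_at(const C& c, std::size_t i) {
    if constexpr (has_index<C>::value) {
        return c[i];                  // indexed path
    } else {
        auto it = std::begin(c);      // honest linear walk
        std::advance(it, i);
        return *it;
    }
}
```

An algorithm that cannot live without a[n] can simply static_assert on the trait and refuse to compile for a list, exactly as described above.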
 A list and an array are very similar data-structures and it is natural for
 them to share a common interface.

sure. both r sequence containers. opIndex ain't part of a sequence container interface.
 The main differences are:
 * A list takes more memory.
 * A list has slower random access.

nooonononono. o(n) vs. o(1) to be precise. that's not "slower". that's sayin' "list don't have random access, you can as well get up your lazy ass & do a linear search by calling nth()". turn that on its head. if i give u a container and say "it has random access" u'd rightly expect better than a linear search. a deque has slower random access than a vector, but the same complexity.
 * A list has faster insertions and growth.

o(1) insertion if u have the insertion point. both list and vector have o(1) growth. list also has o(1) splicing. that's important. we get to the point where we realize the two r fundamentally different structures built around fundamentally different tradeoffs. they do satisfy the same interface. just ain't the vector interface. it's a sequential-access interface. not a random-access interface.
 But the interface shouldn't necessarily make any complexity guarantees.

problem is many said so til stl came and said enuff is enuff. for fundamental data structures & algos complexity /must/ be part of the spec & design. otherwise all u get is a mishmash of crap & u simply can't do generic stuff w/ a mishmash of crap. as other cont/algo libs've copiously shown. that approach's impressive coz it challenged some stupid taboos & proved them worthless. it was contrarian & to great effect. for that alone stl puts to shame previous container/algo libs. i know i'd used half a dozen and wrote a couple. thot the whole container/algo design is old hat. when stl came along i was like, holy effin' guacamole. that's why i say. even if u don't use c++ for its many faults. understand stl coz it's d shiznit.
 The
 implementations should. And any programmer worth his salt will be able to
 use this wisely and choose the right sorting algorithm for the right
 data-structure.

here's where the thing blows apart. i agree with choosing manually if i didn't want to do generic programming. if u wanna do generic programming u want help from the compiler in mixing n matching stuff. it's not about the saltworthiness. it's about writing generic code.
 There are other algorithms, I'm sure, that work equally
 well on either. Of course, any algorithm should give its time-complexity in
 terms of the complexity of the operations it uses.
 
 I do understand your point, however. And I believe my argument would be
 stronger if there were some sort of automatic complexity analysis tool.

stl makes-do without an automatic tool.
 This could either warn a programmer in case he makes the wrong choice, or
 even take the choice out of the programmer's hands and automatically choose
 the right sorting algorithm for the job. That's a bit ambitious. I guess a
 profiler is the next best thing.

i have no doubt stl has had big ambitions. for what i can tell it fulfilled them tho c++ makes higher-level algorithms look arcane. so i'm happy with the lambdas in std.algorithm. & can't figure why containers don't come along. walt?
Aug 27 2008
next sibling parent reply Michiel Helvensteijn <nomail please.com> writes:
superdan wrote:

 This of course means that a linked list cannot define opIndex, since a
 random access operation on it will take O(n) (there are tricks that can
 make it faster in most practical cases, but I digress).

it oughtn't. & you digress in the wrong direction. you can't prove a majority of "practical cases" will not suffer a performance hit.

Perhaps. It's been a while since I've worked with data-structures on this level, but I seem to remember there are ways.

What if you maintain a linked list of small arrays? Say each node in the list contains around log(n) of the elements in the entire list. Wouldn't that bring random access down to O(log n)? Of course, this would also bring insertions up to O(log n).

And what if you save an index/pointer pair after each access. Then with each new access request, you can choose from three locations to start walking:
* The start of the list.
* The end of the list.
* The last access-point of the list.

In a lot of practical cases a new access is close to the last access. Of course, the general case would still be O(n).
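The second trick (remembering the last access point) could be sketched like this; CursorList and its at() method are hypothetical names, and the wrapper assumes a non-empty doubly-linked list:

```cpp
#include <cstddef>
#include <initializer_list>
#include <iterator>
#include <list>

// A list wrapper that caches the last (index, iterator) pair, so a new
// access walks from whichever is closest: the front, the back, or the
// previous access point. Worst case is still O(n); sequential access
// patterns cost O(1) per step.
template <typename T>
class CursorList {
    std::list<T> data;
    mutable typename std::list<T>::const_iterator last_it;
    mutable std::size_t last_idx = 0;
public:
    CursorList(std::initializer_list<T> init)
        : data(init), last_it(data.begin()) {}

    const T& at(std::size_t i) const {
        std::size_t n = data.size();
        // distances from the three candidate starting points
        std::size_t from_front = i;
        std::size_t from_back  = n - 1 - i;
        std::size_t from_last  = i > last_idx ? i - last_idx : last_idx - i;
        auto it = last_it;
        if (from_front <= from_back && from_front <= from_last) {
            it = data.begin();
            std::advance(it, static_cast<long>(i));
        } else if (from_back <= from_last) {
            it = data.end();
            std::advance(it, -static_cast<long>(n - i));
        } else {
            std::advance(it, static_cast<long>(i) - static_cast<long>(last_idx));
        }
        last_it = it;   // remember this access point for next time
        last_idx = i;
        return *it;
    }
};
```

Whether this deserves to be called a list any more is exactly the point under dispute below.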
 That, in turn, means that a linked list and a dynamic array can not share
 a common interface that includes opIndex.

so what. they can share a common interface that includes nth(). what exactly is yer problem with that.

That's simple. a[i] looks much nicer than a.nth(i). By the way, I suspect that if opIndex is available only on arrays and nth() is available on all sequence types, algorithm writers will forget about opIndex and use nth(), to make their algorithm more widely compatible. And I wouldn't blame them, though I guess you would.
 * A list has faster insertions and growth.

o(1) insertion if u have the insertion point. both list and vector have o(1) growth.

Yeah, but dynamic arrays have to re-allocate once in a while. Lists don't.
 we get to the point where we realize the two r fundamentally different
 structures built around fundamentally different tradeoffs. they do satisfy
 the same interface. just ain't the vector interface. it's a
 sequential-access interface. not a random-access interface.

I believe we agree in principle, but are just confused about each other's definitions. If the "random-access interface" guarantees O(1) for nth/opIndex/whatever, of course you are right. But if time-complexity is not taken into consideration, the sequential-access interface and the random-access interface are equivalent, no?

I'm not opposed to complexity guarantees in public contracts. Far from it, in fact. Just introduce both interfaces and let the algorithm writers choose which one to accept. But give both interfaces opIndex, since it's just good syntax.

I do think it's a good idea for algorithms to support the interface with the weakest constraints (sequential-access). As long as they specify their time-complexity in terms of the complexities of the interface operations, not in absolute terms. Then, when a programmer writes 'somealgorithm(someinterfaceobject)', a hypothetical analysis tool could tell him the time-complexity of the resulting operation. The programmer might even assert a worst-case complexity at that point and the compiler could bail out if it doesn't match.
 even if u don't use c++ for its many faults. understand stl coz it's d
 shiznit.

I use C++. I use STL. I love both. But that doesn't mean there is no room for improvement. The STL is quite complex, and maybe it doesn't have to be. -- Michiel
Aug 27 2008
parent reply Dee Girl <deegirl noreply.com> writes:
Michiel Helvensteijn Wrote:

 superdan wrote:
 
 This of course means that a linked list cannot define opIndex, since a
 random access operation on it will take O(n) (there are tricks that can
 make it faster in most practical cases, but I digress).

it oughtn't. & you digress in the wrong direction. you can't prove a majority of "practical cases" will not suffer a performance hit.

Perhaps. It's been a while since I've worked with data-structures on this level, but I seem to remember there are ways.

I write example with findLast a little time ago. But there are many examples. For example move to front algorithm. It is linear but with "trick" opIndex it is O(n*n) even with optimization. Bad!
 What if you maintain a linked list of small arrays? Say each node in the
 list contains around log(n) of the elements in the entire list. Wouldn't
 that bring random access down to O(log n)? Of course, this would also bring
 insertions up to O(log n).
 
 And what if you save an index/pointer pair after each access. Then with each
 new access request, you can choose from three locations to start walking:
 * The start of the list.
 * The end of the list.
 * The last access-point of the list.
 
 In a lot of practical cases a new access is close to the last access. Of
 course, the general case would still be O(n).

Michiel-san, this is new data structure very different from list! If I want list I never use this structure. It is like joking. Because when you write this you agree that list is not vector.
 That, in turn, means that a linked list and a dynamic array can not share
 a common interface that includes opIndex.

so what. they can share a common interface that includes nth(). what exactly is yer problem with that.

That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code.

    foreach (i; 0 .. a.length) {
        a[i] += 1;
    }

For array works nice. But for list it is terrible! Many operations for incrementing only small list.
 By the way, I suspect that if opIndex is available only on arrays and nth()
 is available on all sequence types, algorithm writers will forget about
 opIndex and use nth(), to make their algorithm more widely compatible. And
 I wouldn't blame them, though I guess you would.

I do not agree with this. I am sorry! I think nobody should write find() that uses nth().
 * A list has faster insertions and growth.

o(1) insertion if u have the insertion point. both list and vector have o(1) growth.

Yeah, but dynamic arrays have to re-allocate once in a while. Lists don't.

Lists allocate memory each insert. Array allocate memory some time. With doubling cost of allocation+copy converge to zero.
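The doubling argument can be checked with a small simulation (the function name copies_for is made up for the sketch): count the element copies that reallocation causes while appending n elements to a doubling array. The total stays below 2*n, so the amortized copy cost per append is O(1).

```cpp
#include <cstddef>

// Simulate appending n elements to a dynamic array that doubles its
// capacity when full, counting how many element copies reallocation costs.
std::size_t copies_for(std::size_t n) {
    std::size_t size = 0, capacity = 1, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {  // grow: copy every element to new storage
            copies += size;
            capacity *= 2;
        }
        ++size;                  // append the new element in place
    }
    return copies;               // total is 1 + 2 + 4 + ... < 2 * n
}
```

For a million appends the copy total is under two million, i.e. fewer than two copies per element on average, which is the "converge to zero" extra cost per append.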
 we get to the point where we realize the two r fundamentally different
 structures built around fundamentally different tradeoffs. they do satisfy
 the same interface. just ain't the vector interface. it's a
 sequential-access interface. not a random-access interface.

I believe we agree in principle, but are just confused about each others definitions. If the "random-access interface" guarantees O(1) for nth/opIndex/whatever, of course you are right. But if time-complexity is not taken into consideration, the sequential-access interface and the random-access interface are equivalent, no?

I think it is mistake to not taken into consideration time complexity. For basic data structures specially. I do not think you can call them equivalent.
 I'm not opposed to complexity guarantees in public contracts. Far from it,
 in fact. Just introduce both interfaces and let the algorithm writers
 choose which one to accept. But give both interfaces opIndex, since it's
 just good syntax.

I think is convenient syntax. Maybe too convenient ^_^.
 I do think it's a good idea for algorithms to support the interface with the
 weakest constraints (sequential-access). As long as they specify their
 time-complexity in terms of the complexities of the interface operations,
 not in absolute terms.
 
 Then, when a programmer writes 'somealgorithm(someinterfaceobject)', a
 hypothetical analysis tool could tell him the time-complexity of the
 resulting operation. The programmer might even assert a worst-case
 complexity at that point and the compiler could bail out if it doesn't
 match.

The specification I think is with types. If that works tool is the compiler.
 even if u don't use c++ for its many faults. understand stl coz it's d
 shiznit.

I use C++. I use STL. I love both. But that doesn't mean there is no room for improvement. The STL is quite complex, and maybe it doesn't have to be.

Many things in STL can be better with D. But iterators and complexity is beautiful in STL.
Aug 27 2008
next sibling parent reply Michiel Helvensteijn <nomail please.com> writes:
Dee Girl wrote:

 What if you maintain a linked list of small arrays? Say each node in the
 list contains around log(n) of the elements in the entire list. Wouldn't
 that bring random access down to O(log n)? Of course, this would also
 bring insertions up to O(log n).
 
 And what if you save an index/pointer pair after each access. Then with
 each new access request, you can choose from three locations to start
 walking: * The start of the list.
 * The end of the list.
 * The last access-point of the list.
 
 In a lot of practical cases a new access is close to the last access. Of
 course, the general case would still be O(n).

Michiel-san, this is new data structure very different from list! If I want list I never use this structure. It is like joking. Because when you write this you agree that list is not vector.

Yes, the first 'trick' makes it a different datastructure. The second does not. Would you still be opposed to using opIndex if its time-complexity is O(log n)?
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code.

    foreach (i; 0 .. a.length) {
        a[i] += 1;
    }

For array works nice. But for list it is terrible! Many operations for incrementing only small list.

With that second trick the loop would have the same complexity for lists. But putting that aside for the moment, are you saying you would allow yourself to be deceived by a syntax detail? No, mentally attaching O(1) to the *subscripting operator* is simply a legacy from C, where it is syntactic sugar for pointer arithmetic.
 By the way, I suspect that if opIndex is available only on arrays and
 nth() is available on all sequence types, algorithm writers will forget
 about opIndex and use nth(), to make their algorithm more widely
 compatible. And I wouldn't blame them, though I guess you would.

I do not agree with this. I am sorry! I think nobody should write find() that uses nth().

Of course not. Find should be written with an iterator, which has optimal complexity for both data-structures. My point is that an algorithm should be generic first and foremost. Then you use the operations that have the lowest complexity over all targeted data-structures if possible.
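An iterator-based find, written as described here, serves both structures with the same linear pass; the wrapper name contains is assumed for the sketch:

```cpp
#include <algorithm>
#include <list>
#include <vector>

// find() against iterators: one forward pass, never an index, so the
// same O(n) algorithm is optimal for vector and list alike.
template <typename Range, typename T>
bool contains(const Range& r, const T& value) {
    return std::find(r.begin(), r.end(), value) != r.end();
}
```

This is the generic-first design: the algorithm asks only for the weakest operation it needs, and every sequence container qualifies honestly.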
 * A list has faster insertions and growth.

o(1) insertion if u have the insertion point. both list and vector have o(1) growth.

Yeah, but dynamic arrays have to re-allocate once in a while. Lists don't.

Lists allocate memory each insert. Array allocate memory some time. With doubling cost of allocation+copy converge to zero.

Lists allocate memory for bare nodes, but never have to copy their elements. Arrays have to move their whole content to a larger memory location each time they are outgrown. For more complex data-types that means potentially very expensive copies.
 But if time-complexity is not taken into consideration, the
 sequential-access interface and the random-access interface are
 equivalent, no?

I think it is mistake to not taken into consideration time complexity. For basic data structures specially. I do not think you can call them equivalent.

I did say 'if'. You have to agree that if you disregard complexity issues (for the sake of argument), the two ARE equivalent.
 I do think it's a good idea for algorithms to support the interface with
 the weakest constraints (sequential-access). As long as they specify
 their time-complexity in terms of the complexities of the interface
 operations, not in absolute terms.
 
 Then, when a programmer writes 'somealgorithm(someinterfaceobject)', a
 hypothetical analysis tool could tell him the time-complexity of the
 resulting operation. The programmer might even assert a worst-case
 complexity at that point and the compiler could bail out if it doesn't
 match.

The specification I think is with types. If that works tool is the compiler.

But don't you understand that if this tool did exist, and the language had a standard notation for time/space-complexity, I could simply write:

    sequence<T> s;
    /* fill sequence */
    sort(s);

And the compiler (in cooperation with this 'tool') could automatically find the most effective combination of data-structure and algorithm. The code would be more readable and efficient.

-- 
Michiel
Aug 27 2008
parent reply Dee Girl <deegirl noreply.com> writes:
Michiel Helvensteijn Wrote:

 Dee Girl wrote:
 
 What if you maintain a linked list of small arrays? Say each node in the
 list contains around log(n) of the elements in the entire list. Wouldn't
 that bring random access down to O(log n)? Of course, this would also
 bring insertions up to O(log n).
 
 And what if you save an index/pointer pair after each access. Then with
 each new access request, you can choose from three locations to start
 walking: * The start of the list.
 * The end of the list.
 * The last access-point of the list.
 
 In a lot of practical cases a new access is close to the last access. Of
 course, the general case would still be O(n).

Michiel-san, this is new data structure very different from list! If I want list I never use this structure. It is like joking. Because when you write this you agree that list is not vector.

Yes, the first 'trick' makes it a different datastructure. The second does not. Would you still be opposed to using opIndex if its time-complexity is O(log n)?

This is different question. And tricks are not answer for problem. Problem is list has other access method than array.
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code.

    foreach (i; 0 .. a.length) {
        a[i] += 1;
    }

For array works nice. But for list it is terrible! Many operations for incrementing only small list.

With that second trick the loop would have the same complexity for lists.

Not for singly linked lists. I think name "trick" is very good. It is trick like prank to a friend. It does not do real thing. It only fools for few cases.
 But putting that aside for the moment, are you saying you would allow
 yourself to be deceived by a syntax detail? No, mentally attaching O(1) to
 the *subscripting operator* is simply a legacy from C, where it is
 syntactic sugar for pointer arithmetic.

I do not think so. I am sorry. If a[n] is not allowed then other array access primitive is allowed. Give index(a, n) as example. If language say index(a, n) is array access then it is big mistake for list to also define index(a, n). List maybe should define findAt(a, n). Then array also can define findAt(a, n). It is not mistake.
 By the way, I suspect that if opIndex is available only on arrays and
 nth() is available on all sequence types, algorithm writers will forget
 about opIndex and use nth(), to make their algorithm more widely
 compatible. And I wouldn't blame them, though I guess you would.

I do not agree with this. I am sorry! I think nobody should write find() that uses nth().

Of course not. Find should be written with an iterator, which has optimal complexity for both data-structures. My point is that an algorithm should be generic first and foremost. Then you use the operations that have the lowest complexity over all targeted data-structures if possible.

Maybe I think "generic" word different than you. For me generic is that algorithm asks minimum from structure to do its work. For example find ask only one forward pass. Input iterator does one forward pass. It is mistake if find ask for index. It is also mistake if structure makes an algorithm think it has index as primitive operation.
 * A list has faster insertions and growth.

o(1) insertion if u have the insertion point. both list and vector have o(1) growth.

Yeah, but dynamic arrays have to re-allocate once in a while. Lists don't.

Lists allocate memory each insert. Array allocate memory some time. With doubling cost of allocation+copy converge to zero.

Lists allocate memory for bare nodes, but never have to copy their elements. Arrays have to move their whole content to a larger memory location each time they are outgrown. For more complex data-types that means potentially very expensive copies.

I think this is mistake. I think you should google "amortized complexity". Maybe that can help much.
 But if time-complexity is not taken into consideration, the
 sequential-access interface and the random-access interface are
 equivalent, no?

I think it is mistake to not taken into consideration time complexity. For basic data structures specially. I do not think you can call them equivalent.

I did say 'if'. You have to agree that if you disregard complexity issues (for the sake of argument), the two ARE equivalent.

But it is useless comparison. Comparison can not forget important aspect. If we ignore the fractional part, floating point is integer. If organism is not alive it is mostly water.
 I do think it's a good idea for algorithms to support the interface with
 the weakest constraints (sequential-access). As long as they specify
 their time-complexity in terms of the complexities of the interface
 operations, not in absolute terms.
 
 Then, when a programmer writes 'somealgorithm(someinterfaceobject)', a
 hypothetical analysis tool could tell him the time-complexity of the
 resulting operation. The programmer might even assert a worst-case
 complexity at that point and the compiler could bail out if it doesn't
 match.

The specification I think is with types. If that works tool is the compiler.

But don't you understand that if this tool did exist, and the language had a standard notation for time/space-complexity, I could simply write:

    sequence<T> s;
    /* fill sequence */
    sort(s);

And the compiler (in cooperation with this 'tool') could automatically find the most effective combination of data-structure and algorithm. The code would be more readable and efficient.

Michiel-san, STL does that. Or I misunderstand you?
Aug 27 2008
next sibling parent reply Michiel Helvensteijn <nomail please.com> writes:
Dee Girl wrote:

 Yes, the first 'trick' makes it a different datastructure. The second
 does not. Would you still be opposed to using opIndex if its
 time-complexity is O(log n)?

This is different question. And tricks are not answer for problem. Problem is list has other access method than array.

And what's the answer?
 With that second trick the loop would have the same complexity for lists.

Not for singly linked lists.

Yeah, also for singly linked lists.
 But putting that aside for the moment, are you saying you would allow
 yourself to be deceived by a syntax detail? No, mentally attaching O(1)
 to the *subscripting operator* is simply a legacy from C, where it is
 syntactic sugar for pointer arithmetic.

I do not think so. I am sorry. If a[n] is not allowed then other array access primitive is allowed. Give index(a, n) as example. If language say index(a, n) is array access then it is big mistake for list to also define index(a, n). List maybe should define findAt(a, n). Then array also can define findAt(a, n). It is not mistake.

Yes, I agree for "array access". That term implies O(1), since it uses the word "array". But I was arguing against the subscripting operator being forced into O(1).
 Lists allocate memory for bare nodes, but never have to copy their
 elements. Arrays have to move their whole content to a larger memory
 location each time they are outgrown. For more complex data-types that
 means potentially very expensive copies.

I think this is mistake. I think you should google "amortized complexity". Maybe that can help much.

Amortized complexity has nothing to do with it. Dynamic arrays have to copy their elements and lists do not. It's as simple as that.
 But don't you understand that if this tool did exist, and the language
 had a standard notation for time/space-complexity, I could simply write:
 
 sequence<T> s;
 /* fill sequence */
 sort(s);
 
 And the compiler (in cooperation with this 'tool') could automatically
 find the most effective combination of data-structure and algorithm. The
 code would be more readable and efficient.

Michiel-san, STL does that. Or I misunderstand you?

STL will choose the right sorting algorithm, given a specific data-structure. But I am saying it may be possible also for the data-structure to be automatically chosen, based on what the programmer does with it. -- Michiel
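How STL pairs algorithm with structure can be made concrete: std::sort demands random-access iterators, so it works on a vector but does not even compile for a list; list instead ships its own member sort() built on splicing. The tiny dispatcher sort_any below is a hypothetical name illustrating the pairing:

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Dispatch the way STL's requirements force you to: std::sort for
// random-access containers, the container's own member sort otherwise.
template <typename T>
void sort_any(std::vector<T>& v) {
    std::sort(v.begin(), v.end());  // requires random-access iterators
}

template <typename T>
void sort_any(std::list<T>& l) {
    l.sort();  // list's O(n log n) merge sort, based on splicing
}
```

Choosing the data-structure itself automatically, as suggested above, would be the step beyond what STL does.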
Aug 27 2008
parent reply Dee Girl <deegirl noreply.com> writes:
Michiel Helvensteijn Wrote:

 Dee Girl wrote:
 
 Yes, the first 'trick' makes it a different datastructure. The second
 does not. Would you still be opposed to using opIndex if its
 time-complexity is O(log n)?

This is different question. And tricks are not answer for problem. Problem is list has other access method than array.

And what's the answer?

I accept logarithm complexity with []. Logarithm grows slow.
 With that second trick the loop would have the same complexity for lists.

Not for singly linked lists.

Yeah, also for singly linked lists.

May be it is not interesting discuss trick more. I am sure many tricks can be done. And many serious things. Can be done and have been done. They make list not a list any more.
 But putting that aside for the moment, are you saying you would allow
 yourself to be deceived by a syntax detail? No, mentally attaching O(1)
 to the *subscripting operator* is simply a legacy from C, where it is
 syntactic sugar for pointer arithmetic.

I do not think so. I am sorry. If a[n] is not allowed then other array access primitive is allowed. Give index(a, n) as example. If language say index(a, n) is array access then it is big mistake for list to also define index(a, n). List maybe should define findAt(a, n). Then array also can define findAt(a, n). It is not mistake.

Yes, I agree for "array access". That term implies O(1), since it uses the word "array". But I was arguing against the subscripting operator being forced into O(1).

I am sorry. I do not understand your logic. My logic was this. Language has a[n] for array index. My opinion was then a[n] should not be linear search. I said also you can replace a[n] with index(a, n) and my reason is the same. How are you arguing? I did not want to get in this discussion. I see how it is confusing fast ^_^.
 Lists allocate memory for bare nodes, but never have to copy their
 elements. Arrays have to move their whole content to a larger memory
 location each time they are outgrown. For more complex data-types that
 means potentially very expensive copies.

I think this is mistake. I think you should google "amortized complexity". Maybe that can help much.

Amortized complexity has nothing to do with it. Dynamic arrays have to copy their elements and lists do not. It's as simple as that.

No, it is not. I am sorry! In STL there is copy. In D there is std.move. I think it only copies data by bits and clears source. And amortized complexity shows that there is o(1) bit copy on many append.
 But don't you understand that if this tool did exist, and the language
 had a standard notation for time/space-complexity, I could simply write:
 
 sequence<T> s;
 /* fill sequence */
 sort(s);
 
 And the compiler (in cooperation with this 'tool') could automatically
 find the most effective combination of data-structure and algorithm. The
 code would be more readable and efficient.

Michiel-san, STL does that. Or I misunderstand you?

STL will choose the right sorting algorithm, given a specific data-structure. But I am saying it may be possible also for the data-structure to be automatically chosen, based on what the programmer does with it.

I think this is interesting. Then why arguing for bad container design? I do not understand. Thank you, Dee Girl.
Aug 27 2008
parent reply Michiel Helvensteijn <nomail please.com> writes:
Dee Girl wrote:

 But putting that aside for the moment, are you saying you would allow
 yourself to be deceived by a syntax detail? No, mentally attaching
 O(1) to the *subscripting operator* is simply a legacy from C, where
 it is syntactic sugar for pointer arithmetic.

I do not think so. I am sorry. If a[n] is not allowed then other array access primitive is allowed. Give index(a, n) as example. If language say index(a, n) is array access then it is big mistake for list to also define index(a, n). List maybe should define findAt(a, n). Then array also can define findAt(a, n). It is not mistake.

Yes, I agree for "array access". That term implies O(1), since it uses the word "array". But I was arguing against the subscripting operator being forced into O(1).

I am sorry. I do not understand your logic. My logic was this. Language has a[n] for array index. My opinion was then a[n] should not be linear search. I said also you can replace a[n] with index(a, n) and my same reason is the same. How are you arguing?

Let me try again. I agree that you may impose complexity-restrictions in function contracts. If you write a function called index(a, n), you may impose an O(log n) restriction, for all I care. But the a[n] syntax is so convenient that I would hate for it to be likewise restricted. I would like to use it for lists and associative containers, where the complexity may be O(n). The programmer should just be careful.
 I did not want to get in this discussion. I see how it is confusing fast
 ^_^.

I find much of this subthread confusing. (Starting with the discussion between Benji and superdan.) It looks to me like 80% of the discussion is based on misunderstandings.
 Amortized complexity has nothing to do with it. Dynamic arrays have to
 copy their elements and lists do not. It's as simple as that.

No, it is not. I am sorry! In STL there is copy. In D there is std.move. I think it only copies data by bits and clears source. And amortized complexity shows that there is o(1) bit copy on many append.

Yes, a bit-copy would be ok. I was thinking of executing the potentially more expensive copy constructor. It's nice that D doesn't have to do this.
 STL will choose the right sorting algorithm, given a specific
 data-structure. But I am saying it may be possible also for the
 data-structure to be automatically chosen, based on what the programmer
 does with it.

I think this is interesting. Then why argueing for bad container design? I do not understand. Thank you, Dee Girl.

Where am I arguing for bad design? All I've been arguing for is looser restrictions for the subscripting operator. You should be able to use it on a list, even though the complexity is O(n). But if it is used often enough (in a deeply nested loop), the compiler will probably automatically use an array instead. -- Michiel
Aug 27 2008
parent reply Dee Girl <deegirl noreply.com> writes:
Michiel Helvensteijn Wrote:

 Dee Girl wrote:
 
 But putting that aside for the moment, are you saying you would allow
 yourself to be deceived by a syntax detail? No, mentally attaching
 O(1) to the *subscripting operator* is simply a legacy from C, where
 it is syntactic sugar for pointer arithmetic.

I do not think so. I am sorry. If a[n] is not allowed then other array access primitive is allowed. Give index(a, n) as example. If language say index(a, n) is array access then it is big mistake for list to also define index(a, n). List maybe should define findAt(a, n). Then array also can define findAt(a, n). It is not mistake.

Yes, I agree for "array access". That term implies O(1), since it uses the word "array". But I was arguing against the subscripting operator being forced into O(1).

I am sorry. I do not understand your logic. My logic was this. Language has a[n] for array index. My opinion was then a[n] should not be linear search. I said also you can replace a[n] with index(a, n) and my same reason is the same. How are you arguing?

Let me try again. I agree that you may impose complexity-restrictions in function contracts. If you write a function called index(a, n), you may impose an O(log n) restriction, for all I care. But the a[n] syntax is so convenient that I would hate for it to be likewise restricted. I would like to use it for lists and associative containers, where the complexity may be O(n). The programmer should just be careful.

Thank you for trying again. Thank you! I understand. Yes, a[n] is very convenient! And I would have agree 100% with you if a[n] was not build in language for array access. But because of that I 100% disagree ^_^. I also think there is objective mistake. In concrete code programmer can be careful. But in generic code programmer can not be careful. I think this must to be explained better. But I am not sure I can.
 I did not want to get in this discussion. I see how it is confusing fast
 ^_^.

I find much of this subthread confusing. (Starting with the discussion between Benji and superdan.) It looks to me like 80% of the discussion is based on misunderstandings.
 Amortized complexity has nothing to do with it. Dynamic arrays have to
 copy their elements and lists do not. It's as simple as that.

No, it is not. I am sorry! In STL there is copy. In D there is std.move. I think it only copies data by bits and clears source. And amortized complexity shows that there is o(1) bit copy on many append.

Yes, a bit-copy would be ok. I was thinking of executing the potentially more expensive copy constructor. It's nice that D doesn't have to do this.
 STL will choose the right sorting algorithm, given a specific
 data-structure. But I am saying it may be possible also for the
 data-structure to be automatically chosen, based on what the programmer
 does with it.

I think this is interesting. Then why arguing for bad container design? I do not understand. Thank you, Dee Girl.

Where am I arguing for bad design? All I've been arguing for is looser restrictions for the subscripting operator. You should be able to use it on a list, even though the complexity is O(n). But if it is used often enough (in a deeply nested loop), the compiler will probably automatically use an array instead.

The idea is nice. But I think it can not be done. Tool is not mind reader. If I make some insert and some index. They want different structure. How does the tool know what I want fast? I say you want bad design because tool does not exist. So we do not have the tool. But we can make good library with what we have. I am sure you can write library with a[n] in O(n). And it works. But I say is more inferior design than STL. Because your library allows things that should not work and does not warn programmer.
Aug 27 2008
parent reply Michiel Helvensteijn <nomail please.com> writes:
Dee Girl wrote:

 Where am I arguing for bad design? All I've been arguing for is looser
 restrictions for the subscripting operator. You should be able to use it
 on a list, even though the complexity is O(n). But if it is used often
 enough (in a deeply nested loop), the compiler will probably
 automatically use an array instead.

The idea is nice. But I think it can not be done. Tool is not mind reader. If I make some insert and some index. They want different structure. How does the tool know what I want fast?

In the future it may be possible to do such analysis. If the indexing is in a deeper loop, it may weigh more than the insertions you are doing. But failing that, the programmer might give the compiler 'hints' on which functions he/she wants faster. -- Michiel
Aug 27 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Michiel Helvensteijn:
 In the future it may be possible to do such analysis. If the indexing is in
 a deeper loop, it may weigh more than the insertions you are doing. But
 failing that, the programmer might give the compiler 'hints' on which
 functions he/she wants faster.

For example you can write a Deque data structure made of a doubly linked list of small arrays. At run time it collects a few simple statistics of its usage, and it can grow or shrink the length of the arrays according to the cache line length and the patterns of its usage.

There's a boolean constant that at compile time can switch off such collection of statistics, to make the data structure a bit faster but not adaptive. You may want the data structure non-adaptive if you know very well what its future usage in the program will be, or in programs that run for a few minutes/seconds. In programs that run for hours or days you may prefer a more adaptive data structure.

You can create similar data structures in statically compiled languages, but those operations are a better fit when there's a virtual machine (HotSpot, for example, compiles and de-optimizes code dynamically). LLVM looks like it can be used in both situations :-)

Bye, bearophile
Aug 27 2008
prev sibling parent superdan <super dan.org> writes:
Dee Girl Wrote:

 Michiel Helvensteijn Wrote:
 
 Dee Girl wrote:
 
 What if you maintain a linked list of small arrays? Say each node in the
 list contains around log(n) of the elements in the entire list. Wouldn't
 that bring random access down to O(log n)? Of course, this would also
 bring insertions up to O(log n).
 
 And what if you save an index/pointer pair after each access. Then with
 each new access request, you can choose from three locations to start
 walking: * The start of the list.
 * The end of the list.
 * The last access-point of the list.
 
 In a lot of practical cases a new access is close to the last access. Of
 course, the general case would still be O(n).

Michiel-san, this is new data structure very different from list! If I want list I never use this structure. It is like joking. Because when you write this you agree that list is not vector.

Yes, the first 'trick' makes it a different data-structure. The second does not. Would you still be opposed to using opIndex if its time-complexity is O(log n)?

This is different question. And tricks are not answer for problem. Problem is list has other access method than array.
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code. foreach (i; 0 .. a.length) { a[i] += 1; } For array works nice. But for list it is terrible! Many operations for incrementing only small list.

With that second trick the loop would have the same complexity for lists.

Not for singly linked lists. I think name "trick" is very good. It is trick like prank to a friend. It does not do real thing. It only fools for few cases.

guess i'll risk telling which. forward iteration. backward iteration. accessing first k. accessing last k. that's pretty much it. and first/last k are already available in standard list. all else is linear time. so forget about using that as an index table. a naive design at best.
 But putting that aside for the moment, are you saying you would allow
 yourself to be deceived by a syntax detail? No, mentally attaching O(1) to
 the *subscripting operator* is simply a legacy from C, where it is
 syntactic sugar for pointer arithmetic.

I do not think so. I am sorry. If a[n] is not allowed then other array access primitive is allowed. Give index(a, n) as example. If language say index(a, n) is array access then it is big mistake for list to also define index(a, n). List maybe should define findAt(a, n). Then array also can define findAt(a, n). It is not mistake.

boils down to what's primitive access vs. what's actual algorithm. indexing in array is primitive. indexing in list is same algorithm as finding nth element anywhere - singly, doubly, file, you name it. so can't claim indexing is primitive for list.
 By the way, I suspect that if opIndex is available only on arrays and
 nth() is available on all sequence types, algorithm writers will forget
 about opIndex and use nth(), to make their algorithm more widely
 compatible. And I wouldn't blame them, though I guess you would.

I do not agree with this. I am sorry! I think nobody should write find() that uses nth().

Of course not. Find should be written with an iterator, which has optimal complexity for both data-structures. My point is that an algorithm should be generic first and foremost. Then you use the operations that have the lowest complexity over all targeted data-structures if possible.

Maybe I think "generic" word different than you. For me generic is that algorithm asks minimum from structure to do its work. For example find ask only one forward pass. Input iterator does one forward pass. It is mistake if find ask for index. It is also mistake if structure makes an algorithm think it has index as primitive operation.
 * A list has faster insertions and growth.

o(1) insertion if u have the insertion point. both list and vector have o(1) growth.

Yeah, but dynamic arrays have to re-allocate once in a while. Lists don't.

Lists allocate memory each insert. Array allocate memory some time. With doubling cost of allocation+copy converge to zero.

Lists allocate memory for bare nodes, but never have to copy their elements. Arrays have to move their whole content to a larger memory location each time they are outgrown. For more complex data-types that means potentially very expensive copies.

I think this is mistake. I think you should google "amortized complexity". Maybe that can help much.

to expand: array append is o(1) averaged over many appends if you double the capacity each time you need. interesting bit: if you only add k, complexity jumps to quadratic.
 But if time-complexity is not taken into consideration, the
 sequential-access interface and the random-access interface are
 equivalent, no?

I think it is mistake to not taken into consideration time complexity. For basic data structures specially. I do not think you can call them equivalent.

I did say 'if'. You have to agree that if you disregard complexity issues (for the sake of argument), the two ARE equivalent.

But it is useless comparison. Comparison can not forget important aspect. If we ignore fractionary floating point is integer. If organism is not alive it is mostly water.

pwned if u ask me :D
Aug 27 2008
prev sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Dee Girl wrote:
 Michiel Helvensteijn Wrote:
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code. foreach (i; 0 .. a.length) { a[i] += 1; } For array works nice. But for list it is terrible! Many operations for incrementing only small list.

Well, that's what you get with operator overloading. The same thing could be said for "+" or "-". They're inherently deceiving, because they look like builtin operations on primitive data types.

For expensive operations (like performing division on an unlimited-precision decimal object), should the author of the code use "opDiv" or should he implement a separate "divide" function?

Forget opIndex for a moment, and ask the more general question about all overloaded operators. Should they imply any sort of asymptotic complexity guarantee?

Personally, I don't think so. I don't like "nth". I'd rather use the opIndex. And if I'm using a linked list, I'll be aware of the fact that it'll exhibit linear-time indexing, and I'll be cautious about which algorithms to use.

--benji
Aug 27 2008
parent reply Dee Girl <deegirl noreply.com> writes:
Benji Smith Wrote:

 Dee Girl wrote:
 Michiel Helvensteijn Wrote:
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code. foreach (i; 0 .. a.length) { a[i] += 1; } For array works nice. But for list it is terrible! Many operations for incrementing only small list.

Well, that's what you get with operator overloading.

I am sorry. I disagree. I think that is what you get with bad design.
 The same thing could be said for "+" or "-". They're inherently 
 deceiving, because they look like builtin operations on primitive data 
 types.
 
 For expensive operations (like performing division on an 
 unlimited-precision decimal object), should the author of the code use 
 "opDiv" or should he implement a separate "divide" function?

The cost of + and - is proportional to digits in number. For small number of digits computer does fast in hardware. For many digits the cost grows. The number of digits is log n. I think + and - are fine for big integer. I am not surprise.
 Forget opIndex for a moment, and ask the more general question about all 
 overloaded operators. Should they imply any sort of asymptotic 
 complexity guarantee?

I think depends on good design. For example I think ++ or -- for iterator. If it is O(n) it is bad design. Bad design make people say like you "This is what you get with operator overloading".
 Personally, I don't think so.
 
 I don't like "nth".
 
 I'd rather use the opIndex. And if I'm using a linked list, I'll be 
 aware of the fact that it'll exhibit linear-time indexing, and I'll be 
 cautious about which algorithms to use.

But inside algorithm you do not know if you use a linked list or a vector. You lost that information in bad abstraction. Also abstraction is bad because if you change data structure you have concept errors that still compile. And run until tomorrow ^_^. I also like or do not like things. But good reason can convince me? Thank you, Dee Girl.
Aug 27 2008
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Dee Girl" wrote
 I think depends on good design. For example I think ++ or -- for iterator. 
 If it is O(n) it is bad design. Bad design make people say like you "This 
 is what you get with operator overloading".

Slightly off topic: when I was developing dcollections, I was a bit annoyed that there was no opInc or opDec; instead you have to use opAddAssign and opSubAssign. What this means is that for a list iterator, if you want to allow the syntax:

iterator it = list.find(x);
(++it).value = 5;

or such, you have to define the operator opAddAssign. This makes it possible to do:

it += 10;

Which I don't like for the same reason we are arguing about this: it suggests this is a simple operation, when in fact it is O(n). But there's no way around it, as you can't define ++it without defining +=.

Of course, I could throw an exception, but I decided against that. Instead, I just warn the user in the docs to only ever use the ++x version. Annoying...

-Steve
Aug 27 2008
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Dee Girl" <deegirl noreply.com> wrote in message 
news:g943oi$11f4$1 digitalmars.com...
 Benji Smith Wrote:

 Dee Girl wrote:
 Michiel Helvensteijn Wrote:
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code. foreach (i; 0 .. a.length) { a[i] += 1; } For array works nice. But for list it is terrible! Many operations for incrementing only small list.

Well, that's what you get with operator overloading.

I am sorry. I disagree. I think that is what you get with bad design.
 The same thing could be said for "+" or "-". They're inherently
 deceiving, because they look like builtin operations on primitive data
 types.

 For expensive operations (like performing division on an
 unlimited-precision decimal object), should the author of the code use
 "opDiv" or should he implement a separate "divide" function?

The cost of + and - is proportional to digits in number. For small number of digits computer does fast in hardware. For many digits the cost grows. The number of digits is log n. I think + and - are fine for big integer. I am not surprise.
 Forget opIndex for a moment, and ask the more general question about all
 overloaded operators. Should they imply any sort of asymptotic
 complexity guarantee?

I think depends on good design. For example I think ++ or -- for iterator. If it is O(n) it is bad design. Bad design make people say like you "This is what you get with operator overloading".
 Personally, I don't think so.

 I don't like "nth".

 I'd rather use the opIndex. And if I'm using a linked list, I'll be
 aware of the fact that it'll exhibit linear-time indexing, and I'll be
 cautious about which algorithms to use.

But inside algorithm you do not know if you use a linked list or a vector. You lost that information in bad abstraction. Also abstraction is bad because if you change data structure you have concept errors that still compile. And run until tomorrow ^_^.

A generic algorithm has absolutely no business caring about the complexity of the collection it's operating on. If it does, then you've created a concrete algorithm, not a generic one. If an algorithm uses [] and doesn't know the complexity of the []...good! It shouldn't know, and it shouldn't care. It's the code that sends the collection to the algorithm that knows and cares.

Why? Because "what algorithm is best?" depends on far more than just what type of collection is used. It depends on "Will the collection ever be larger than X elements?". It depends on "Is it a standard textbook list, or does it use trick 1 and/or trick 2?". It depends on "Is it usually mostly sorted or mostly random?". It depends on "What do I do with it most often? Sort, append, search, insert or delete?". And it depends on other things, too.

Using "[]" versus "nth()" can't tell the algorithm *any* of those things. But those things *must* be known in order to make an accurate decision of "Is this the right algorithm or not?" Therefore, a generic algorithm *cannot* ever know for certain if it's the right algorithm, *even* if you say "[]" means "O(log n) or better". Therefore, the algorithm should not be designed to only work with certain types of collections. The code that sends the collection to the algorithm is the *only* code that knows the answers to all of the questions above, therefore it is the only code that should ever decide "I should use this algorithm, I shouldn't use that algorithm."
 I also like or do not like things. But good reason can convince me? Thank 
 you, Dee Girl.

Aug 27 2008
next sibling parent reply superdan <super dan.org> writes:
Nick Sabalausky Wrote:

 "Dee Girl" <deegirl noreply.com> wrote in message 
 news:g943oi$11f4$1 digitalmars.com...
 Benji Smith Wrote:

 Dee Girl wrote:
 Michiel Helvensteijn Wrote:
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code. foreach (i; 0 .. a.length) { a[i] += 1; } For array works nice. But for list it is terrible! Many operations for incrementing only small list.

Well, that's what you get with operator overloading.

I am sorry. I disagree. I think that is what you get with bad design.
 The same thing could be said for "+" or "-". They're inherently
 deceiving, because they look like builtin operations on primitive data
 types.

 For expensive operations (like performing division on an
 unlimited-precision decimal object), should the author of the code use
 "opDiv" or should he implement a separate "divide" function?

The cost of + and - is proportional to digits in number. For small number of digits computer does fast in hardware. For many digits the cost grows. The number of digits is log n. I think + and - are fine for big integer. I am not surprise.
 Forget opIndex for a moment, and ask the more general question about all
 overloaded operators. Should they imply any sort of asymptotic
 complexity guarantee?

I think depends on good design. For example I think ++ or -- for iterator. If it is O(n) it is bad design. Bad design make people say like you "This is what you get with operator overloading".
 Personally, I don't think so.

 I don't like "nth".

 I'd rather use the opIndex. And if I'm using a linked list, I'll be
 aware of the fact that it'll exhibit linear-time indexing, and I'll be
 cautious about which algorithms to use.

But inside algorithm you do not know if you use a linked list or a vector. You lost that information in bad abstraction. Also abstraction is bad because if you change data structure you have concept errors that still compile. And run until tomorrow ^_^.

 A generic algorithm has absolutely no business caring about the complexity of the collection it's operating on.

absolutely. desperate. need. of. chanel.
 If it does, then you've created a concrete 
 algorithm, not a generic one.

sure you don't know what you're talking about. it is generic insofar as it abstracts away the primitives it needs from its iterator. run don't walk and get a dose of stl.
 If an algorithm uses [] and doesn't know the
 complexity of the []...good! It shouldn't know, and it shouldn't care.

nonsense. so wrong i won't even address it.
 It's 
 the code that sends the collection to the algorithm that knows and cares. 
 Why? Because "what algorithm is best?" depends on far more than just what 
 type of collection is used. It depends on "Will the collection ever be 
 larger than X elements?". It depends on "Is it a standard textbook list, or 
 does it use trick 1 and/or trick 2?". It depends on "Is it usually mostly 
 sorted or mostly random?". It depends on "What do I do with it most often? 
 Sort, append, search, insert or delete?". And it depends on other things, 
 too.

sure it does. problem is you have it backwards. types and algos tell you the theoretical properties. then you in knowledge of what's goin' on use the algorithm that does it for you in knowledge that the complexity would work for your situation. or you encode your own specialized algorithm.

thing is stl encoded the most general linear search. you can use it for linear searching everything. moreover it exactly said what's needed for a linear search: a one-pass forward iterator aka input iterator.

now to tie it with what u said: you know in your situation whether linear find cuts the mustard. that don't change the nature of that fundamental algorithm. so you use it or use another. your choice. but find remains universal so long as it has access to the basics of one-pass iteration.
 Using "[]" versus "nth()" can't tell the algorithm *any* of those things.

doesn't have to.
 But 
 those things *must* be known in order to make an accurate decision of "Is 
 this the right algorithm or not?"

sure. you make them decision on the call site.
 Therefore, a generic algorithm *cannot* ever
 know for certain if it's the right algoritm *even* if you say "[]" means 
 "O(log n) or better".

utterly wrong. poppycock. gobbledygook. nonsense. this is so far off i don't have time to even address. if you want to learn stuff go learn stl. then we talk. if you want to teach me i guess we're done.
 Therefore, the algorithm should not be designed to 
 only work with certain types of collections.

it should be designed to work with certain iterator categories.
 The code that sends the 
 collection to the algorithm is the *only* code that knows the answers to all
 of the questions above, therefore it is the only code that should ever 
 decide "I should use this algorithm, I shouldn't use that algorithm."

correct. you just have all of your other hypotheses jumbled. sorry dood don't be hatin' but there's so much you don't know i ain't gonna continue this. last word is yours. call me a pompous prick if you want. go ahead.
Aug 27 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"superdan" <super dan.org> wrote in message 
news:g94g3e$20e9$1 digitalmars.com...
 Nick Sabalausky Wrote:

 "Dee Girl" <deegirl noreply.com> wrote in message
 news:g943oi$11f4$1 digitalmars.com...
 Benji Smith Wrote:

 Dee Girl wrote:
 Michiel Helvensteijn Wrote:
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code. foreach (i; 0 .. a.length) { a[i] += 1; } For array works nice. But for list it is terrible! Many operations for incrementing only small list.

Well, that's what you get with operator overloading.

I am sorry. I disagree. I think that is what you get with bad design.
 The same thing could be said for "+" or "-". They're inherently
 deceiving, because they look like builtin operations on primitive data
 types.

 For expensive operations (like performing division on an
 unlimited-precision decimal object), should the author of the code use
 "opDiv" or should he implement a separate "divide" function?

The cost of + and - is proportional to digits in number. For small number of digits computer does fast in hardware. For many digits the cost grows. The number of digits is log n. I think + and - are fine for big integer. I am not surprise.
 Forget opIndex for a moment, and ask the more general question about 
 all
 overloaded operators. Should they imply any sort of asymptotic
 complexity guarantee?

I think depends on good design. For example I think ++ or -- for iterator. If it is O(n) it is bad design. Bad design make people say like you "This is what you get with operator overloading".
 Personally, I don't think so.

 I don't like "nth".

 I'd rather use the opIndex. And if I'm using a linked list, I'll be
 aware of the fact that it'll exhibit linear-time indexing, and I'll be
 cautious about which algorithms to use.

But inside algorithm you do not know if you use a linked list or a vector. You lost that information in bad abstraction. Also abstraction is bad because if you change data structure you have concept errors that still compile. And run until tomorrow ^_^.

A generic algorithm has absolutely no business caring about the complexity of the collection it's operating on.

absolutely. desperate. need. of. chanel.
 If it does, then you've created a concrete
 algorithm, not a generic one.

sure you don't know what you're talking about. it is generic insofar as it abstracts away the primitives it needs from its iterator. run don't walk and get a dose of stl.
 If an algorithm uses [] and doesn't know the
 complexity of the []...good! It shouldn't know, and it shouldn't care.

nonsense. so wrong i won't even address it.
 It's
 the code that sends the collection to the algorithm that knows and cares.
 Why? Because "what algorithm is best?" depends on far more than just what
 type of collection is used. It depends on "Will the collection ever be
 larger than X elements?". It depends on "Is it a standard textbook list, 
 or
 does it use trick 1 and/or trick 2?". It depends on "Is it usually mostly
 sorted or mostly random?". It depends on "What do I do with it most 
 often?
 Sort, append, search, insert or delete?". And it depends on other things,
 too.

sure it does. problem is you have it backwards. types and algos tell you the theoretical properties. then you in knowledge of what's goin' on use the algorithm that does it for you in knowledge that the complexity would work for your situation. or you encode your own specialized algorithm. thing is stl encoded the most general linear search. you can use it for linear searching everything. moreover it exactly said what's needed for a linear search: a one-pass forward iterator aka input iterator. now to tie it with what u said: you know in your situation whether linear find cuts the mustard. that don't change the nature of that fundamental algorithm. so you use it or use another. your choice. but find remains universal so long as it has access to the basics of one-pass iteration.
 Using "[]" versus "nth()" can't tell the algorithm *any* of those things.

doesn't have to.
 But
 those things *must* be known in order to make an accurate decision of "Is
 this the right algorithm or not?"

sure. you make them decision on the call site.
 Therefore, a generic algorithm *cannot* ever
 know for certain if it's the right algoritm *even* if you say "[]" means
 "O(log n) or better".

utterly wrong. poppycock. gobbledygook. nonsense. this is so far off i don't have time to even address. if you want to learn stuff go learn stl. then we talk. if you want to teach me i guess we're done.
 Therefore, the algorithm should not be designed to
 only work with certain types of collections.

it should be designed to work with certain iterator categories.
 The code that sends the collection to the algorithm is the *only* code that
 knows the answers to all of the questions above, therefore it is the only
 code that should ever decide "I should use this algorithm, I shouldn't use
 that algorithm."

correct. you just have all of your other hypotheses jumbled. sorry dood don't be hatin' but there's so much you don't know i ain't gonna continue this. last word is yours. call me a pompous prick if you want. go ahead.

I'll agree to drop this issue. There's little point in debating with someone whose arguments frequently consist of things like "You are wrong", "I'm not going to explain my point", and "dood don't be hatin'".
Aug 27 2008
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-08-27 23:27:52 +0200, "Nick Sabalausky" <a a.a> said:

 "superdan" <super dan.org> wrote in message
 news:g94g3e$20e9$1 digitalmars.com...
 Nick Sabalausky Wrote:
 
 "Dee Girl" <deegirl noreply.com> wrote in message
 news:g943oi$11f4$1 digitalmars.com...
 Benji Smith Wrote:
 
 Dee Girl wrote:
 Michiel Helvensteijn Wrote:
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code. foreach (i; 0 .. a.length) { a[i] += 1; } For array works nice. But for list it is terrible! Many operations for incrementing only small list.

Well, that's what you get with operator overloading.

I am sorry. I disagree. I think that is what you get with bad design.
 The same thing could be said for "+" or "-". They're inherently
 deceiving, because they look like builtin operations on primitive data
 types.
 
 For expensive operations (like performing division on an
 unlimited-precision decimal object), should the author of the code use
 "opDiv" or should he implement a separate "divide" function?

The cost of + and - is proportional to digits in number. For small number of digits computer does fast in hardware. For many digits the cost grows. The number of digits is log n. I think + and - are fine for big integer. I am not surprise.
 Forget opIndex for a moment, and ask the more general question about all
 overloaded operators. Should they imply any sort of asymptotic
 complexity guarantee?

I think depends on good design. For example I think ++ or -- for iterator. If it is O(n) it is bad design. Bad design make people say like you "This is what you get with operator overloading".
 Personally, I don't think so.
 
 I don't like "nth".
 
 I'd rather use the opIndex. And if I'm using a linked list, I'll be
 aware of the fact that it'll exhibit linear-time indexing, and I'll be
 cautious about which algorithms to use.

But inside algorithm you do not know if you use a linked list or a vector. You lost that information in bad abstraction. Also abstraction is bad because if you change data structure you have concept errors that still compile. And run until tomorrow ^_^.

A generic algorithm has absolutely no business caring about the complexity of the collection it's operating on.

absolutely. desperate. need. of. chanel.
 If it does, then you've created a concrete algorithm, not a generic one.

sure you don't know what you're talking about. it is generic insofar as it abstracts away the primitives it needs from its iterator. run don't walk and get a dose of stl.
 If an algorithm uses [] and doesn't know the
 complexity of the []...good! It shouldn't know, and it shouldn't care.

nonsense. so wrong i won't even address it.
 It's the code that sends the collection to the algorithm that knows and
 cares. Why? Because "what algorithm is best?" depends on far more than just
 what type of collection is used. It depends on "Will the collection ever be
 larger than X elements?". It depends on "Is it a standard textbook list, or
 does it use trick 1 and/or trick 2?". It depends on "Is it usually mostly
 sorted or mostly random?". It depends on "What do I do with it most often?
 Sort, append, search, insert or delete?". And it depends on other things, too.

sure it does. problem is you have it backwards. types and algos tell you the theoretical properties. then you in knowledge of what's goin' on use the algorithm that does it for you in knowledge that the complexity would work for your situation. or you encode your own specialized algorithm. thing is stl encoded the most general linear search. you can use it for linear searching everything. moreover it exactly said what's needed for a linear search: a one-pass forward iterator aka input iterator. now to tie it with what u said: you know in your situation whether linear find cuts the mustard. that don't change the nature of that fundamental algorithm. so you use it or use another. your choice. but find remains universal so long as it has access to the basics of one-pass iteration.
 Using "[]" versus "nth()" can't tell the algorithm *any* of those things.

doesn't have to.
 But those things *must* be known in order to make an accurate decision of
 "Is this the right algorithm or not?"

sure. you make them decision on the call site.
 Therefore, a generic algorithm *cannot* ever know for certain if it's the
 right algorithm *even* if you say "[]" means "O(log n) or better".

utterly wrong. poppycock. gobbledygook. nonsense. this is so far off i don't have time to even address. if you want to learn stuff go learn stl. then we talk. if you want to teach me i guess we're done.
 Therefore, the algorithm should not be designed to
 only work with certain types of collections.

it should be designed to work with certain iterator categories.
 The code that sends the collection to the algorithm is the *only* code that
 knows the answers to all of the questions above, therefore it is the only
 code that should ever decide "I should use this algorithm, I shouldn't use
 that algorithm."

correct. you just have all of your other hypotheses jumbled. sorry dood don't be hatin' but there's so much you don't know i ain't gonna continue this. last word is yours. call me a pompous prick if you want. go ahead.

I'll agree to drop this issue. There's little point in debating with someone whose arguments frequently consist of things like "You are wrong", "I'm not going to explain my point", and "dood don't be hatin'".

I am with dan dee_girl & co on this issue, the problem is that a generic algorithm "knows" the types it is working on and can easily check the operations they have, and based on this decide the strategy to use. This choice works well if the presence of a given operation is also connected with some performance guarantee. Concepts (or better, categories (Aldor concepts, not C++'s), that are interfaces for types, but interfaces that have to be explicitly assigned to a type) might relax this situation a little, but the need for some guarantees will remain. Fawzi
Aug 27 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"Fawzi Mohamed" <fmohamed mac.com> wrote in message 
news:g94k2b$2a1e$1 digitalmars.com...
 On 2008-08-27 23:27:52 +0200, "Nick Sabalausky" <a a.a> said:

 "superdan" <super dan.org> wrote in message
 news:g94g3e$20e9$1 digitalmars.com...
 Nick Sabalausky Wrote:

 "Dee Girl" <deegirl noreply.com> wrote in message
 news:g943oi$11f4$1 digitalmars.com...
 Benji Smith Wrote:

 Dee Girl wrote:
 Michiel Helvensteijn Wrote:
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code. foreach (i; 0 .. a.length) { a[i] += 1; } For array works nice. But for list it is terrible! Many operations for incrementing only small list.

Well, that's what you get with operator overloading.

I am sorry. I disagree. I think that is what you get with bad design.
 The same thing could be said for "+" or "-". They're inherently
 deceiving, because they look like builtin operations on primitive data
 types.

 For expensive operations (like performing division on an
 unlimited-precision decimal object), should the author of the code use
 "opDiv" or should he implement a separate "divide" function?

The cost of + and - is proportional to digits in number. For small number of digits computer does fast in hardware. For many digits the cost grows. The number of digits is log n. I think + and - are fine for big integer. I am not surprise.
 Forget opIndex for a moment, and ask the more general question about all
 overloaded operators. Should they imply any sort of asymptotic
 complexity guarantee?

I think depends on good design. For example I think ++ or -- for iterator. If it is O(n) it is bad design. Bad design make people say like you "This is what you get with operator overloading".
 Personally, I don't think so.

 I don't like "nth".

 I'd rather use the opIndex. And if I'm using a linked list, I'll be
 aware of the fact that it'll exhibit linear-time indexing, and I'll be
 cautious about which algorithms to use.

But inside algorithm you do not know if you use a linked list or a vector. You lost that information in bad abstraction. Also abstraction is bad because if you change data structure you have concept errors that still compile. And run until tomorrow ^_^.

A generic algorithm has absolutely no business caring about the complexity of the collection it's operating on.

absolutely. desperate. need. of. chanel.
 If it does, then you've created a concrete algorithm, not a generic one.

sure you don't know what you're talking about. it is generic insofar as it abstracts away the primitives it needs from its iterator. run don't walk and get a dose of stl.
 If an algorithm uses [] and doesn't know the
 complexity of the []...good! It shouldn't know, and it shouldn't care.

nonsense. so wrong i won't even address it.
 It's the code that sends the collection to the algorithm that knows and
 cares. Why? Because "what algorithm is best?" depends on far more than just
 what type of collection is used. It depends on "Will the collection ever be
 larger than X elements?". It depends on "Is it a standard textbook list, or
 does it use trick 1 and/or trick 2?". It depends on "Is it usually mostly
 sorted or mostly random?". It depends on "What do I do with it most often?
 Sort, append, search, insert or delete?". And it depends on other things, too.

sure it does. problem is you have it backwards. types and algos tell you the theoretical properties. then you in knowledge of what's goin' on use the algorithm that does it for you in knowledge that the complexity would work for your situation. or you encode your own specialized algorithm. thing is stl encoded the most general linear search. you can use it for linear searching everything. moreover it exactly said what's needed for a linear search: a one-pass forward iterator aka input iterator. now to tie it with what u said: you know in your situation whether linear find cuts the mustard. that don't change the nature of that fundamental algorithm. so you use it or use another. your choice. but find remains universal so long as it has access to the basics of one-pass iteration.
 Using "[]" versus "nth()" can't tell the algorithm *any* of those things.

doesn't have to.
 But those things *must* be known in order to make an accurate decision of
 "Is this the right algorithm or not?"

sure. you make them decision on the call site.
 Therefore, a generic algorithm *cannot* ever know for certain if it's the
 right algorithm *even* if you say "[]" means "O(log n) or better".

utterly wrong. poppycock. gobbledygook. nonsense. this is so far off i don't have time to even address. if you want to learn stuff go learn stl. then we talk. if you want to teach me i guess we're done.
 Therefore, the algorithm should not be designed to
 only work with certain types of collections.

it should be designed to work with certain iterator categories.
 The code that sends the collection to the algorithm is the *only* code that
 knows the answers to all of the questions above, therefore it is the only
 code that should ever decide "I should use this algorithm, I shouldn't use
 that algorithm."

correct. you just have all of your other hypotheses jumbled. sorry dood don't be hatin' but there's so much you don't know i ain't gonna continue this. last word is yours. call me a pompous prick if you want. go ahead.

I'll agree to drop this issue. There's little point in debating with someone whose arguments frequently consist of things like "You are wrong", "I'm not going to explain my point", and "dood don't be hatin'".

I am with dan dee_girl & co on this issue, the problem is that a generic algorithm "knows" the types it is working on and can easily check the operations they have, and based on this decide the strategy to use. This choice works well if the presence of a given operation is also connected with some performance guarantee.

IMO, a better way to do that would be via C#-style attributes or equivalent named interfaces. I'm not sure if this is what you're referring to below or not.
 Concepts (or better, categories (Aldor concepts, not C++'s), that are 
 interfaces for types, but interfaces that have to be explicitly assigned 
 to a type) might relax this situation a little, but the need for some 
 guarantees will remain.

If this "guarantee" (or mechanism for checking the types of operations that a collection supports) takes the form of a style guideline that says "don't implement opIndex for a collection if it would be O(n) or worse", then that, frankly, is absolutely no guarantee at all. If you *really* need that sort of guarantee (and I can imagine it may be useful in some cases), then the implementation of the guarantee does *not* belong in the realm of "implements vs doesn't-implement a particular operator overload". Doing so is an abuse of operator overloading, since operator overloading is there for defining syntactic sugar, not for acting as a makeshift contract. The correct mechanism for such guarantees is with named interfaces or C#-style attributes, as I mentioned above. True, that can still be abused if the collection author wants to, but they have to actually try (ie, they have to lie and say "implements IndexingInConstantTime" in addition to implementing opIndex). If you instead try to implement that guarantee with the "don't implement opIndex for a collection if it would be O(n) or worse" style-guideline, then it's far too easy for a collection to come along that is ignorant of that "pseudo-contract" and accidentally breaks it. Proper use of interfaces/attributes instead of relying on the existence or absence of an overloaded operator fixes that problem.
Aug 27 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Nick Sabalausky" wrote
 Concepts (or better categories (aldor concept not C++), that are 
 interfaces for types, but interfaces that have to be explicitly assigned 
 to a type) might relax this situation a little, but the need for some 
 guarantees will remain.

If this "guarantee" (or mechanism for checking the types of operations that a collection supports) takes the form of a style guideline that says "don't implement opIndex for a collection if it would be O(n) or worse", then that, frankly, is absolutely no guarantee at all.

The guarantee is not enforced, but the expectation and convention is implicit. When someone sees an index operator the first thought is that it is a quick lookup. You can force yourself to think differently, but the reality is that most people think that because of the universal usage of square brackets (except for VB, and I feel pity for anyone who needs to use VB) to mean 'lookup by key', and usually this is only useful on objects where the lookup is quick ( < O(n) ). Although there is no requirement, nor enforcement, the 'quick' contract is expected by the user, no matter how much docs you throw at them. Look, for instance, at Tango's now-deprecated LinkMap, which uses a linked list of key/value pairs (copied from Doug Lea's implementation). Nobody in their right mind would use LinkMap because lookups are O(n), and it's just as easy to use a TreeMap or HashMap. Would you ever use it?
 If you *really* need that sort of guarantee (and I can imagine it may be 
 useful in some cases), then the implementation of the guarantee does *not* 
 belong in the realm of "implements vs doesn't-implement a particular 
 operator overload". Doing so is an abuse of operator overloading, since 
 operator overloading is there for defining syntactic sugar, not for acting 
 as a makeshift contract.

I don't think anybody is asking for a guarantee from the compiler or any specific tool. I think what we are saying is that violating the 'opIndex is fast' notion is bad design because you end up with users thinking they are doing something that's quick. You end up with people posting benchmarks on your containers saying 'why does python beat the pants off your list implementation?'. You can say 'hey, it's not meant to be used that way', but then why can the user use it that way? A better design is to nudge the user into using a correct container for the job by only supporting operations that make sense on the collections. And as far as operator semantic meaning, D's operators are purposely named after what they are supposed to do. Notice that the operator for + is opAdd, not opPlus. This is because opAdd is supposed to mean you are performing an addition operation. Assigning a different semantic meaning is not disallowed by the compiler, but is considered bad design. opIndex is supposed to be an index function, not a linear search. It's not called opSearch for a reason. Sure you can redefine it however you want semantically, but it's considered bad design. That's all we're saying. -Steve
Aug 27 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message 
news:g95b89$1jii$1 digitalmars.com...
 When someone sees an index operator the first thought is that it is a 
 quick lookup.

This seems to be a big part of the disagreement. Personally, I think it's insane to look at [] and just assume it's a cheap lookup. That was true in pure C where, as someone else said, it was nothing more than a shorthand for a specific pointer arithmetic operation, but carrying that idea over to more modern languages (especially one that also uses [] for associative arrays) is a big mistake. The '+' operator means "add". Addition is typically O(1). But vectors can be added, and that's an O(n) operation. Should opAdd never be used for vectors?
 You can force yourself to think differently, but the reality is that most 
 people think that because of the universal usage of square brackets 
 (except for VB, and I feel pity for anyone who needs to use VB) to mean 
 'lookup by key', and usually this is only useful on objects where the 
 lookup is quick ( < O(n) ).  Although there is no requirement, nor 
 enforcement, the 'quick' contract is expected by the user, no matter how 
 much docs you throw at them.

 Look, for instance, at Tango's now-deprecated LinkMap, which uses a 
 linked-list of key/value pairs (copied from Doug Lea's implementation). 
 Nobody in their right mind would use link map because lookups are O(n), 
 and it's just as easy to use a TreeMap or HashMap.  Would you ever use it?

 If you *really* need that sort of guarantee (and I can imagine it may be 
 useful in some cases), then the implementation of the guarantee does 
 *not* belong in the realm of "implements vs doesn't-implement a 
 particular operator overload". Doing so is an abuse of operator 
 overloading, since operator overloading is there for defining syntactic 
 sugar, not for acting as a makeshift contract.

I don't think anybody is asking for a guarantee from the compiler or any specific tool. I think what we are saying is that violating the 'opIndex is fast' notion is bad design because you end up with users thinking they are doing something that's quick. You end up with people posting benchmarks on your containers saying 'why does python beat the pants off your list implementation?'. You can say 'hey, it's not meant to be used that way', but then why can the user use it that way? A better design is to nudge the user into using a correct container for the job by only supporting operations that make sense on the collections. And as far as operator semantic meaning, D's operators are purposely named after what they are supposed to do. Notice that the operator for + is opAdd, not opPlus. This is because opAdd is supposed to mean you are performing an addition operation. Assigning a different semantic meaning is not disallowed by the compiler, but is considered bad design. opIndex is supposed to be an index function, not a linear search. It's not called opSearch for a reason. Sure you can redefine it however you want semantically, but it's considered bad design. That's all we're saying.

Nobody is suggesting using [] to invoke a search (Although we have talked about using [] *in the search function's implementation*). Search means you want to get the position of a given element, or in other words, "element" -> search -> "key/index". What we're talking about is the reverse: getting the element at a given position, ie, "key/index" -> [] -> "element". It doesn't matter if it's an array, a linked list, or a super-duper-collection-from-2092: that's still indexing, not searching.
Aug 27 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Nick Sabalausky" wrote
 "Steven Schveighoffer" wrote in message 
 news:g95b89$1jii$1 digitalmars.com...
 When someone sees an index operator the first thought is that it is a 
 quick lookup.

This seems to be a big part of the disagreement. Personally, I think it's insane to look at [] and just assume it's a cheap lookup. That was true in pure C where, as someone else said, it was nothing more than a shorthand for a specific pointer arithmetic operation, but carrying that idea over to more modern languages (especially one that also uses [] for associative arrays) is a big mistake.

Perhaps it is a mistake to assume it, but it is a common mistake. And the expectation is intuitive. You don't see people making light switches that look like outlets, even though it's possible. You might perhaps make a library where opIndex is a linear search in your list, but I would expect that people would not use that indexing feature correctly. Just as if I plug my lamp into the light switch that looks like an outlet, I'd expect it to get power, and be confused when it doesn't. Except the opIndex mistake is more subtle because I *do* get what I actually want, but I just am not realizing the cost of it.
 The '+' operator means "add". Addition is typically O(1). But vectors can 
 be added, and that's an O(n) operation. Should opAdd never be used for 
 vectors?

As long as it's addition, I have no problem with an O(n) operation (and it depends on your view of n). Indexing implies speed; look at all other cases where indexing is used. For addition to be proportional to the size of the element is natural and expected.
 You can force yourself to think differently, but the reality is that most 
 people think that because of the universal usage of square brackets 
 (except for VB, and I feel pity for anyone who needs to use VB) to mean 
 'lookup by key', and usually this is only useful on objects where the 
 lookup is quick ( < O(n) ).  Although there is no requirement, nor 
 enforcement, the 'quick' contract is expected by the user, no matter how 
 much docs you throw at them.

 Look, for instance, at Tango's now-deprecated LinkMap, which uses a 
 linked-list of key/value pairs (copied from Doug Lea's implementation). 
 Nobody in their right mind would use link map because lookups are O(n), 
 and it's just as easy to use a TreeMap or HashMap.  Would you ever use 
 it?

 If you *really* need that sort of guarantee (and I can imagine it may be 
 useful in some cases), then the implementation of the guarantee does 
 *not* belong in the realm of "implements vs doesn't-implement a 
 particular operator overload". Doing so is an abuse of operator 
 overloading, since operator overloading is there for defining syntactic 
 sugar, not for acting as a makeshift contract.

I don't think anybody is asking for a guarantee from the compiler or any specific tool. I think what we are saying is that violating the 'opIndex is fast' notion is bad design because you end up with users thinking they are doing something that's quick. You end up with people posting benchmarks on your containers saying 'why does python beat the pants off your list implementation?'. You can say 'hey, it's not meant to be used that way', but then why can the user use it that way? A better design is to nudge the user into using a correct container for the job by only supporting operations that make sense on the collections. And as far as operator semantic meaning, D's operators are purposely named after what they are supposed to do. Notice that the operator for + is opAdd, not opPlus. This is because opAdd is supposed to mean you are performing an addition operation. Assigning a different semantic meaning is not disallowed by the compiler, but is considered bad design. opIndex is supposed to be an index function, not a linear search. It's not called opSearch for a reason. Sure you can redefine it however you want semantically, but it's considered bad design. That's all we're saying.

Nobody is suggesting using [] to invoke a search (Although we have talked about using [] *in the search function's implementation*). Search means you want to get the position of a given element, or in other words, "element" -> search -> "key/index". What we're talking about is the reverse: getting the element at a given position, ie, "key/index" -> [] -> "element". It doesn't matter if it's an array, a linked list, or a super-duper-collection-from-2092: that's still indexing, not searching.

It is a search: you are searching for the element whose key matches. When the key can be used to look up the element efficiently, we call that indexing. Indexing is a more constrained form of searching. -Steve
Aug 28 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message 
news:g9650h$cp9$1 digitalmars.com...
 "Nick Sabalausky" wrote
 "Steven Schveighoffer" wrote in message 
 news:g95b89$1jii$1 digitalmars.com...
 When someone sees an index operator the first thought is that it is a 
 quick lookup.

This seems to be a big part of the disagreement. Personally, I think it's insane to look at [] and just assume it's a cheap lookup. That was true in pure C where, as someone else said, it was nothing more than a shorthand for a specific pointer arithmetic operation, but carrying that idea over to more modern languages (especially one that also uses [] for associative arrays) is a big mistake.

Perhaps it is a mistake to assume it, but it is a common mistake. And the expectation is intuitive.

Why in the world would any halfway competent programmer ever look at a *linked list* and assume that the linked list's [] (if implemented) is O(1)? As for the risk that could create of accidentally sending a linked list to a "search" (ie, a "search for an element which contains data X") that uses [] internally instead of iterators (but then, why wouldn't it just use iterators anyway?): I'll agree that in a case like this there should be some mechanism for automatic selection of an algorithm, but that mechanism should be at a separate level of abstraction. There would be a function "search" that, through either RTTI or template constraints or something else, says "does collection 'c' implement ConstantTimeForwardDirectionIndexing?" or better yet IMO "does the collection have attribute ForwardDirectionIndexingComplexity that is set equal to Complexity.Constant?", and based on that passes control to either IndexingSearch or IteratorSearch.
  You don't see people making light switches that look like outlets, even 
 though it's possible.  You might perhaps make a library where opIndex is a 
 linear search in your list, but I would expect that people would not use 
 that indexing feature correctly.  Just as if I plug my lamp into the light 
 switch that looks like an outlet, I'd expect it to get power, and be 
 confused when it doesn't.  Except the opIndex mistake is more subtle 
 because I *do* get what I actually want, but I just am not realizing the 
 cost of it.

 The '+' operator means "add". Addition is typically O(1). But vectors can 
 be added, and that's an O(n) operation. Should opAdd never be used for 
 vectors?

As long as it's addition, I have no problem with an O(n) operation (and it depends on your view of n). Indexing implies speed; look at all other cases where indexing is used. For addition to be proportional to the size of the element is natural and expected.

When people look at '+' they typically think "integer/float addition". Why would, for example, the risk of mistaking an O(n) "big int" addition for an O(1) integer/float addition be any worse than the risk of mistaking an O(n) linked list "get element at index" for an O(1) array "get element at index"?
 You can force yourself to think differently, but the reality is that 
 most people think that because of the universal usage of square brackets 
 (except for VB, and I feel pity for anyone who needs to use VB) to mean 
 'lookup by key', and usually this is only useful on objects where the 
 lookup is quick ( < O(n) ).  Although there is no requirement, nor 
 enforcement, the 'quick' contract is expected by the user, no matter how 
 much docs you throw at them.

 Look, for instance, at Tango's now-deprecated LinkMap, which uses a 
 linked-list of key/value pairs (copied from Doug Lea's implementation). 
 Nobody in their right mind would use link map because lookups are O(n), 
 and it's just as easy to use a TreeMap or HashMap.  Would you ever use 
 it?

 If you *really* need that sort of guarantee (and I can imagine it may 
 be useful in some cases), then the implementation of the guarantee does 
 *not* belong in the realm of "implements vs doesn't-implement a 
 particular operator overload". Doing so is an abuse of operator 
 overloading, since operator overloading is there for defining syntactic 
 sugar, not for acting as a makeshift contract.

I don't think anybody is asking for a guarantee from the compiler or any specific tool. I think what we are saying is that violating the 'opIndex is fast' notion is bad design because you end up with users thinking they are doing something that's quick. You end up with people posting benchmarks on your containers saying 'why does python beat the pants off your list implementation?'. You can say 'hey, it's not meant to be used that way', but then why can the user use it that way? A better design is to nudge the user into using a correct container for the job by only supporting operations that make sense on the collections. And as far as operator semantic meaning, D's operators are purposely named after what they are supposed to do. Notice that the operator for + is opAdd, not opPlus. This is because opAdd is supposed to mean you are performing an addition operation. Assigning a different semantic meaning is not disallowed by the compiler, but is considered bad design. opIndex is supposed to be an index function, not a linear search. It's not called opSearch for a reason. Sure you can redefine it however you want semantically, but it's considered bad design. That's all we're saying.

Nobody is suggesting using [] to invoke a search (Although we have talked about using [] *in the search function's implementation*). Search means you want to get the position of a given element, or in other words, "element" -> search -> "key/index". What we're talking about is the reverse: getting the element at a given position, ie, "key/index" -> [] -> "element". It doesn't matter if it's an array, a linked list, or a super-duper-collection-from-2092: that's still indexing, not searching.

It is a search: you are searching for the element whose key matches. When the key can be used to look up the element efficiently, we call that indexing. Indexing is a more constrained form of searching.

If you've got a linked list, and you want to get element N, are you *really* going to go reaching for a function named "search"? How often do you really see a generic function named "search" or "find" that takes a numeric index as the "to be found" parameter instead of something to be matched against the element's value? I would argue that that would be confusing for most people. Like I said in a different post farther down, the implementation of a "getAtIndex()" is obviously going to work like a search, but from "outside the box", what you're asking for is not the same.
Aug 28 2008
next sibling parent Dee Girl <deegirl noreply.com> writes:
Nick Sabalausky Wrote:

 "Steven Schveighoffer" <schveiguy yahoo.com> wrote in message 
 news:g9650h$cp9$1 digitalmars.com...
 "Nick Sabalausky" wrote
 "Steven Schveighoffer" wrote in message 
 news:g95b89$1jii$1 digitalmars.com...
 When someone sees an index operator the first thought is that it is a 
 quick lookup.

This seems to be a big part of the disagreement. Personally, I think it's insane to look at [] and just assume it's a cheap lookup. That was true in pure C where, as someone else said, it was nothing more than shorthand for a specific pointer arithmetic operation, but carrying that idea over to more modern languages (especially one that also uses [] for associative arrays) is a big mistake.

Perhaps it is a mistake to assume it, but it is a common mistake. And the expectation is intuitive.

Why in the world would any halfway competent programmer ever look at a *linked list* and assume that the linked list's [] (if implemented) is O(1)?

A programmer may be looking at generic code. Generic code does not know whether it is working on a linked list or on something else. I think it really helps to think in terms of generic code, because it is generic code that makes the problem interesting.
 As for the risk that this could create of accidentally sending a linked list to 
 a "search" (i.e., a "search for an element which contains data X") that uses 
 [] internally instead of iterators (but then, why wouldn't it just use 
 iterators anyway?): I'll agree that in a case like this there should be some 
 mechanism for automatic choosing of an algorithm, but that mechanism should 
 be at a separate level of abstraction. There would be a function "search" 
 that, through either RTTI or template constraints or something else, says 
 "does collection 'c' implement ConstantTimeForwardDirectionIndexing?" or 
 better yet IMO "does the collection have attribute 
 ForwardDirectionIndexingComplexity that is set equal to 
 Complexity.Constant?", and based on that passes control to either 
 IndexingSearch or IteratorSearch.

I think this is an extremely complicated design. What is the advantage of this design over STL?
  You don't see people making light switches that look like outlets, even 
 though it's possible.  You might perhaps make a library where opIndex is a 
 linear search in your list, but I would expect that people would not use 
 that indexing feature correctly.  Just as if I plug my lamp into the light 
 switch that looks like an outlet, I'd expect it to get power, and be 
 confused when it doesn't.  Except the opIndex mistake is more subtle 
 because I *do* get what I actually want, but I just am not realizing the 
 cost of it.

 The '+' operator means "add". Addition is typically O(1). But vectors can 
 be added, and that's an O(n) operation. Should opAdd never be used for 
 vectors?

As long as it's addition, I have no problem with an O(n) operation (and it depends on your view of n). Indexing implies speed; look at all other cases where indexing is used. For addition to be proportional to the size of the element is natural and expected.

When people look at '+' they typically think "integer/float addition". Why would, for example, the risk of mistaking an O(n) "big int" addition for an O(1) integer/float addition be any worse than the risk of mistaking an O(n) linked list "get element at index" for an O(1) array "get element at index"?

This is again wrong, for two reasons. I am sorry! One small thing: I think big int "+" is O(log n), not O(n). But the real problem is that people look at a[] = b[] + c[] and see the operands. It is evident from the operands that the cost is proportional to the input size. If it were any shorter it would be a miracle, because it would mean some elements are not even looked at. You are comparing the wrong situations; I mean a different situation. And whether it is an operator or a function is not important. I said that if array access is index(a, n), and everybody always thinks of it that way, then index(a, n) should not do a linear search.
 You can force yourself to think differently, but the reality is that 
 most people think that because of the universal usage of square brackets 
 (except for VB, and I feel pity for anyone who needs to use VB) to mean 
 'lookup by key', and usually this is only useful on objects where the 
 lookup is quick ( < O(n) ).  Although there is no requirement, nor 
 enforcement, the 'quick' contract is expected by the user, no matter how 
 much docs you throw at them.

 Look, for instance, at Tango's now-deprecated LinkMap, which uses a 
 linked-list of key/value pairs (copied from Doug Lea's implementation). 
 Nobody in their right mind would use link map because lookups are O(n), 
 and it's just as easy to use a TreeMap or HashMap.  Would you ever use 
 it?

 If you *really* need that sort of guarantee (and I can imagine it may 
 be useful in some cases), then the implementation of the guarantee does 
 *not* belong in the realm of "implements vs doesn't-implement a 
 particular operator overload". Doing so is an abuse of operator 
 overloading, since operator overloading is there for defining syntactic 
 sugar, not for acting as a makeshift contract.

I don't think anybody is asking for a guarantee from the compiler or any specific tool. I think what we are saying is that violating the 'opIndex is fast' notion is bad design, because you end up with users thinking they are doing something that's quick. You end up with people posting benchmarks on your containers saying 'why does Python beat the pants off your list implementation?'. You can say 'hey, it's not meant to be used that way', but then why can the user use it that way? A better design is to nudge the user into using the correct container for the job by only supporting operations that make sense on the collections.

As far as operator semantics go, D's operators are purposely named after what they are supposed to do. Notice that the operator for + is opAdd, not opPlus. This is because opAdd is supposed to mean you are performing an addition operation. Assigning a different semantic meaning is not disallowed by the compiler, but it is considered bad design. opIndex is supposed to be an index function, not a linear search. It's not called opSearch for a reason. Sure, you can redefine it however you want semantically, but it's considered bad design. That's all we're saying.

Nobody is suggesting using [] to invoke a search (although we have talked about using [] *in the search function's implementation*). Search means you want to get the position of a given element, or in other words, "element" -> search -> "key/index". What we're talking about is the reverse: getting the element at a given position, i.e., "key/index" -> [] -> "element". It doesn't matter if it's an array, a linked list, or a super-duper-collection-from-2092: that's still indexing, not searching.

It is a search: you are searching for the element whose key matches. When the key can be used to look up the element efficiently, we call that indexing. Indexing is a more constrained form of searching.

If you've got a linked list, and you want to get element N, are you *really* going to go reaching for a function named "search"?

Yes. This is exactly what STL does, and there is nothing wrong with it. Again, I think the STL book by Josuttis is very helpful. Also Stepanov's notes online are very interesting! Thank you Don.
 How often do you really 
 see a generic function named "search" or "find" that takes a numeric index 
 as the "to be found" parameter instead of something to be matched against 
 the element's value? I would argue that that would be confusing for most 
 people.

I think you lose this argument. Experience with STL shows that it is not confusing. STL is the most successful library for C++, even though C++ is now old and has problems.
Aug 28 2008
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Nick Sabalausky" wrote
 "Steven Schveighoffer" wrote in message 
 news:g9650h$cp9$1 digitalmars.com...
 "Nick Sabalausky" wrote
 "Steven Schveighoffer" wrote in message 
 news:g95b89$1jii$1 digitalmars.com...
 When someone sees an index operator the first thought is that it is a 
 quick lookup.

This seems to be a big part of the disagreement. Personally, I think it's insane to look at [] and just assume it's a cheap lookup. That was true in pure C where, as someone else said, it was nothing more than shorthand for a specific pointer arithmetic operation, but carrying that idea over to more modern languages (especially one that also uses [] for associative arrays) is a big mistake.

Perhaps it is a mistake to assume it, but it is a common mistake. And the expectation is intuitive.

Why in the world would any halfway competent programmer ever look at a *linked list* and assume that the linked list's [] (if implemented) is O(1)?

You are writing this function:

void foo(IOrderedContainer cont)
{
   ....
}

IOrderedContainer implements opIndex(uint). The problem is that you can't tell whether the object itself is a list or not, so you are powerless to make the decision as to whether the container has fast indexing. In that case, your only choice (if speed is an issue) is to not use opIndex.
 As for the risk that this could create of accidentally sending a linked list 
 to a "search" (i.e., a "search for an element which contains data X") that 
 uses [] internally instead of iterators (but then, why wouldn't it just 
 use iterators anyway?): I'll agree that in a case like this there should 
 be some mechanism for automatic choosing of an algorithm, but that 
 mechanism should be at a separate level of abstraction. There would be a 
 function "search" that, through either RTTI or template constraints or 
 something else, says "does collection 'c' implement 
 ConstantTimeForwardDirectionIndexing?" or better yet IMO "does the 
 collection have attribute ForwardDirectionIndexingComplexity that is set 
 equal to Complexity.Constant?", and based on that passes control to either 
 IndexingSearch or IteratorSearch.

To me, this is a bad design. It's my opinion, but one that is shared among many people. You can do stuff this way, but it is not intuitive. I'd much rather reserve opIndex to only quick lookups, and avoid the possibility of accidentally using it incorrectly. In general, I'd say if you are using lists and frequently looking up the nth value in the list, you have chosen the wrong container for the job.
  You don't see people making light switches that look like outlets, even 
 though it's possible.  You might perhaps make a library where opIndex is 
 a linear search in your list, but I would expect that people would not 
 use that indexing feature correctly.  Just as if I plug my lamp into the 
 light switch that looks like an outlet, I'd expect it to get power, and 
 be confused when it doesn't.  Except the opIndex mistake is more subtle 
 because I *do* get what I actually want, but I just am not realizing the 
 cost of it.

 The '+' operator means "add". Addition is typically O(1). But vectors 
 can be added, and that's an O(n) operation. Should opAdd never be used 
 for vectors?

As long as it's addition, I have no problem with an O(n) operation (and it depends on your view of n). Indexing implies speed; look at all other cases where indexing is used. For addition to be proportional to the size of the element is natural and expected.

When people look at '+' they typically think "integer/float addition". Why would, for example, the risk of mistaking an O(n) "big int" addition for an O(1) integer/float addition be any worse than the risk of mistaking an O(n) linked list "get element at index" for an O(1) array "get element at index"?

What good are integers that can't be added? In this case, it is not possible to have quick addition, no matter how you implement your arbitrary-precision integer. I think the time penalty is understood and accepted. With opIndex, the time penalty is not expected. Like it or not, this is how many users look at it.
 You can force yourself to think differently, but the reality is that 
 most people think that because of the universal usage of square 
 brackets (except for VB, and I feel pity for anyone who needs to use 
 VB) to mean 'lookup by key', and usually this is only useful on objects 
 where the lookup is quick ( < O(n) ).  Although there is no 
 requirement, nor enforcement, the 'quick' contract is expected by the 
 user, no matter how much docs you throw at them.

 Look, for instance, at Tango's now-deprecated LinkMap, which uses a 
 linked-list of key/value pairs (copied from Doug Lea's implementation). 
 Nobody in their right mind would use link map because lookups are O(n), 
 and it's just as easy to use a TreeMap or HashMap.  Would you ever use 
 it?

 If you *really* need that sort of guarantee (and I can imagine it may 
 be useful in some cases), then the implementation of the guarantee 
 does *not* belong in the realm of "implements vs doesn't-implement a 
 particular operator overload". Doing so is an abuse of operator 
 overloading, since operator overloading is there for defining 
 syntactic sugar, not for acting as a makeshift contract.

I don't think anybody is asking for a guarantee from the compiler or any specific tool. I think what we are saying is that violating the 'opIndex is fast' notion is bad design, because you end up with users thinking they are doing something that's quick. You end up with people posting benchmarks on your containers saying 'why does Python beat the pants off your list implementation?'. You can say 'hey, it's not meant to be used that way', but then why can the user use it that way? A better design is to nudge the user into using the correct container for the job by only supporting operations that make sense on the collections.

As far as operator semantics go, D's operators are purposely named after what they are supposed to do. Notice that the operator for + is opAdd, not opPlus. This is because opAdd is supposed to mean you are performing an addition operation. Assigning a different semantic meaning is not disallowed by the compiler, but it is considered bad design. opIndex is supposed to be an index function, not a linear search. It's not called opSearch for a reason. Sure, you can redefine it however you want semantically, but it's considered bad design. That's all we're saying.

Nobody is suggesting using [] to invoke a search (although we have talked about using [] *in the search function's implementation*). Search means you want to get the position of a given element, or in other words, "element" -> search -> "key/index". What we're talking about is the reverse: getting the element at a given position, i.e., "key/index" -> [] -> "element". It doesn't matter if it's an array, a linked list, or a super-duper-collection-from-2092: that's still indexing, not searching.

It is a search: you are searching for the element whose key matches. When the key can be used to look up the element efficiently, we call that indexing. Indexing is a more constrained form of searching.

If you've got a linked list, and you want to get element N, are you *really* going to go reaching for a function named "search"? How often do you really see a generic function named "search" or "find" that takes a numeric index as the "to be found" parameter instead of something to be matched against the element's value? I would argue that that would be confusing for most people. Like I said in a different post farther down, the implementation of a "getAtIndex()" is obviously going to work like a search, but from "outside the box", what you're asking for is not the same.

If you are indexing into a tree, it is considered a binary search; if you are indexing into a hash, it is a search at some point, to deal with collisions. People don't think about indexing as being a search, but in reality it is. A really fast search. And I don't think search would be the name of the member function; it should be something like 'getNth', which returns a cursor that points to the element.

-Steve
Aug 28 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message 
news:g96vmq$3cc$1 digitalmars.com...
 "Nick Sabalausky" wrote
 "Steven Schveighoffer" wrote in message 
 news:g9650h$cp9$1 digitalmars.com...
 Perhaps it is a mistake to assume it, but it is a common mistake. And 
 the expectation is intuitive.

Why in the world would any halfway competent programmer ever look at a *linked list* and assume that the linked list's [] (if implemented) is O(1)?

You are writing this function:

void foo(IOrderedContainer cont)
{
   ....
}

IOrderedContainer implements opIndex(uint). The problem is that you can't tell whether the object itself is a list or not, so you are powerless to make the decision as to whether the container has fast indexing. In that case, your only choice (if speed is an issue) is to not use opIndex.

Ok, so you want foo() to be able to tell if the collection has fast or slow indexing. What are you suggesting that foo() does when the collection does have slow indexing?

1. Should it fail to compile because foo's implementation uses [] and the slow-indexing collection doesn't implement []? Well then how does foo know that it's the most important, most frequent thing being done on the collection? Suppose foo is something that needs to access elements in a somewhat random order, i.e., the kind of thing that lists are poorly suited for. Further suppose that collection C is some set of data that *usually* just gets insertions and deletions at nodes that the code already has direct references to. Further suppose that foo does *need* to be run on the collection, *but* very infrequently. So, should I *really* be forced to make C a collection that trades good insertion/deletion complexity for good indexing complexity, just because the occasionally-run foo() doesn't like it? And what if I want to run benchmarks to test what collection works best, in real-world use, for C? Should foo's intolerance of slow-indexing collections really be able to force me to exclude testing of such collections?

2. Should foo revert to an alternate branch of code that doesn't use []? This behavior can be implemented via interfaces like I described. The benefit of that is that [] can still serve as the shorthand it's intended for (see below) and you never need to introduce the inconsistency of "Gee, how do I get the Nth element of a collection?" "Well, on some collections it's getNth(), and on other collections it's []."

3. Something else?
 As for the risk that this could create of accidentally sending a linked list 
 to a "search" (i.e., a "search for an element which contains data X") that 
 uses [] internally instead of iterators (but then, why wouldn't it just 
 use iterators anyway?): I'll agree that in a case like this there should 
 be some mechanism for automatic choosing of an algorithm, but that 
 mechanism should be at a separate level of abstraction. There would be a 
 function "search" that, through either RTTI or template constraints or 
 something else, says "does collection 'c' implement 
 ConstantTimeForwardDirectionIndexing?" or better yet IMO "does the 
 collection have attribute ForwardDirectionIndexingComplexity that is 
 equal to Complexity.Constant?", and based on that passes control to 
 either IndexingSearch or IteratorSearch.

To me, this is a bad design. It's my opinion, but one that is shared among many people. You can do stuff this way, but it is not intuitive. I'd much rather reserve opIndex to only quick lookups, and avoid the possibility of accidentally using it incorrectly.

Preventing a collection from ever being used in a function that would typically perform poorly on that collection just smacks of premature optimization. How do you, as the collection author, know that the collection will never be used in a way such that *occasional* use in a certain specific sub-optimal manner might actually be necessary and/or acceptable? If you omit [] then you've burnt the bridge (so to speak) and your only recourse is to add a standardized "getNth()" to every single collection, which clutters the interface, hinders integration with third-party collections and algorithms, and is likely to still suffer from idiots who think that "get Nth element" is always better than O(n) (see below).
 In general, I'd say if you are using lists and frequently looking up the 
 nth value in the list, you have chosen the wrong container for the job.

If you're frequently looking up random elements in a list, then yes, you're probably using the wrong container. But that's beside the point. Even if you only do it once: If you have a collection with a natural order, and you want to get the nth element, you should be able to use the standard "get element at index X" notation, []. I don't care how many people go around using [] and thinking they're guaranteed to get a cheap computation from it. In a language that supports overloading of [], the [] means "get the element at key/index X". Especially in a language like D where using [] on an associative array can trigger an unbounded allocation and GC run. Using [] in D (and various other languages) can be expensive, period, even in the standard lib (assoc array). So looking at a [] and thinking "guaranteed cheap", is incorrect, period. If most people think 2+2=5, you're not going to redesign arithmetic to work around that mistaken assumption.
 When people look at '+' they typically think "integer/float addition". 
 Why would, for example, the risk of mistaking an O(n) "big int" addition 
 for an O(1) integer/float addition be any worse than the risk of 
 mistaking an O(n) linked list "get element at index" for an O(1) array 
 "get element at index"?

What good are integers that can't be added? In this case, it is not possible to have quick addition, no matter how you implement your arbitrary-precision integer. I think the time penalty is understood and accepted. With opIndex, the time penalty is not expected. Like it or not, this is how many users look at it.
 If you've got a linked list, and you want to get element N, are you 
 *really* going to go reaching for a function named "search"? How often do 
 you really see a generic function named "search" or "find" that takes a 
 numeric index as the "to be found" parameter instead of something to be 
 matched against the element's value? I would argue that that would be 
 confusing for most people. Like I said in a different post farther down, 
 the implementation of a "getAtIndex()" is obviously going to work like a 
 search, but from "outside the box", what you're asking for is not the 
 same.

If you are indexing into a tree, it is considered a binary search; if you are indexing into a hash, it is a search at some point, to deal with collisions. People don't think about indexing as being a search, but in reality it is. A really fast search.

It's implemented as a search, but I'd argue that the input/output specifications are different. And yes, I suppose that does put it into a bit of a grey area. But I wouldn't go so far as to say that, to the caller, it's the same thing, because there are differences. If you want to get an element based on its position in the collection, you call one function. If you want to get an element based on its content instead of its position, that's another function. If you want to get the position of an element based on its content or its identity, that's one or two more functions (depending, of course, on whether the element is a value type or a reference type, respectively).
 And I don't think search would be the name of the member function, it 
 should be something like 'getNth', which returns a cursor that points to 
 the element.

Right, and outside of pure C, [] is the shorthand and the standardized name for "getNth". If someone automatically assumes [] to be a simple lookup, chances are they're going to make the same assumption about anything named along the lines of "getNth". After all, that's what [] does: it gets the Nth.
Aug 28 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Nick Sabalausky" wrote
 Ok, so you want foo() to be able to tell if the collection has fast or 
 slow indexing. What are you suggesting that foo() does when the collection 
 does have slow indexing?

No, I don't want to be able to tell. I don't want to HAVE to be able to tell. In my ideal world, the collection does not implement opIndex unless it is fast, so there is no issue. I.e., you cannot call foo with a linked list.

I'm really tired of this argument: you understand my point of view, and I understand yours. To you, the syntax sugar is more important than the complexity guarantees. To me, what the syntax intuitively means should be what it does. So I'll develop my collections library and you develop yours, fair enough? I don't think either of us is right or wrong in the strict sense of the terms.

To be fair, I'll answer your other points as you took the time to write them. And then I'm done. I can't really be any clearer as to what I believe is the best design.
 1. Should it fail to compile because foo's implementation uses [] and the 
 slow-indexing collection doesn't implement []?

No, foo will always compile because opIndex should always be fast, and then I can specify the complexity of foo without worry. Using an O(n) lookup operation should be more painful because it requires more time. It makes users use it less.
 2. Should foo revert to an alternate branch of code that doesn't use []?

 This behavior can be implemented via interfaces like I described. The 
 benefit of that is that [] can still serve as the shorthand it's intended 
 for (see below) and you never need to introduce the inconsistency of "Gee, 
 how do I get the Nth element of a collection?" "Well, on some collections 
 it's getNth(), and on other collections it's []."

I believe that you shouldn't really ever be calling getNth on a linked list, and if you are, it should be a red flag, like a cast. Furthermore, [] isn't always equivalent to getNth, see below.
 As for the risk that this could create of accidentally sending a linked list 
 to a "search" (i.e., a "search for an element which contains data X") that 
 uses [] internally instead of iterators (but then, why wouldn't it just 
 use iterators anyway?): I'll agree that in a case like this there should 
 be some mechanism for automatic choosing of an algorithm, but that 
 mechanism should be at a separate level of abstraction. There would be a 
 function "search" that, through either RTTI or template constraints or 
 something else, says "does collection 'c' implement 
 ConstantTimeForwardDirectionIndexing?" or better yet IMO "does the 
 collection have attribute ForwardDirectionIndexingComplexity that is 
 set equal to Complexity.Constant?", and based on that passes control to 
 either IndexingSearch or IteratorSearch.

To me, this is a bad design. It's my opinion, but one that is shared among many people. You can do stuff this way, but it is not intuitive. I'd much rather reserve opIndex to only quick lookups, and avoid the possibility of accidentally using it incorrectly.

Preventing a collection from ever being used in a function that would typically perform poorly on that collection just smacks of premature optimization. How do you, as the collection author, know that the collection will never be used in a way such that *occasional* use in a certain specific sub-optimal manner might actually be necessary and/or acceptable?

It's not premature optimization; it's not offering a feature that has little or no use. It's like any contract for any object: you only want to define the interface for which your object is designed. A linked list should not have an opIndex because it's not designed to be indexed. If I designed a new car with which you could steer each front wheel independently, would that make you buy it? It's another feature that the car has that other cars don't. Who cares if it's useful, it's another *feature*! Sometimes a good design is not that a feature is included but that a feature is *not* included.
 If you omit [] then you've burnt the bridge (so to speak) and your only 
 recourse is to add a standardized "getNth()" to every single collection 
 which clutters the interface, hinders integration with third-party 
 collections and algorithms, and is likely to still suffer from idiots who 
 think that "get Nth element" is always better than O(n) (see below).

I'd reserve getNth for linked lists only, if I implemented it at all. It is a useless feature. The only common feature for all containers should be iteration, because 'iterate next element' is always an O(1) operation (amortized in the case of trees).
 In general, I'd say if you are using lists and frequently looking up the 
 nth value in the list, you have chosen the wrong container for the job.

If you're frequently looking up random elements in a list, then yes, you're probably using the wrong container. But that's beside the point. Even if you only do it once: If you have a collection with a natural order, and you want to get the nth element, you should be able to use the standard "get element at index X" notation, [].

I respectfully disagree. For the reasons I've stated above.
 I don't care how many people go around using [] and thinking they're 
 guaranteed to get a cheap computation from it. In a language that supports 
 overloading of [], the [] means "get the element at key/index X". 
 Especially in a language like D where using [] on an associative array can 
 trigger an unbounded allocation and GC run. Using [] in D (and various 
 other languages) can be expensive, period, even in the standard lib (assoc 
 array). So looking at a [] and thinking "guaranteed cheap", is incorrect, 
 period. If most people think 2+2=5, you're not going to redesign 
 arithmetic to work around that mistaken assumption.

Your assumption is that 'get the Nth element' is the only expectation for the opIndex interface. My assumption is that 'get an element efficiently' is an important part of what opIndex implies. We obviously disagree, and as I said above, neither of us is right or wrong, strictly speaking. It's a matter of what is intuitive to you.

Part of the problem I see with many bad designs is that the author thinks they see a fit for an interface, but it's not quite there. They are so excited about fitting into an interface that they forget the importance of leaving out elements of the interface that don't make sense. To me this is one of them. An interface is a fit IMO if it fits exactly. If you have to do things like implement functions that throw exceptions because they don't belong, or break the contract that the interface specifies, then either the interface is too specific, or you are not implementing the correct interface.
 If you've got a linked list, and you want to get element N, are you 
 *really* going to go reaching for a function named "search"? How often 
 do you really see a generic function named "search" or "find" that takes 
 a numeric index as a the "to be found" parameter instead of something to 
 be matched against the element's value? I would argue that that would be 
 confusing for most people. Like I said in a different post farther down, 
 the implementation of a "getAtIndex()" is obviously going to work like a 
 search, but from "outside the box", what you're asking for is not the 
 same.

If you are indexing into a tree, it is considered a binary search; if you are indexing into a hash, it is still a search at some point, to deal with collisions. People don't think about indexing as being a search, but in reality it is. A really fast search.

It's implemented as a search, but I'd argue that the input/output specifications are different. And yes, I suppose that does put it into a bit of a grey area. But I wouldn't go so far as to say that, to the caller, it's the same thing, because there are differences. If you want to get an element based on its position in the collection, you call one function. If you want to get an element based on its content instead of its position, that's another function. If you want to get the position of an element based on its content or its identity, that's one or two more functions (depending, of course, on whether the element is a value type or a reference type, respectively).

I disagree. I view the numeric index of an ordered container as a 'key' into the container. A keyed container has the ability to look up elements quickly with the key. Take a quick look at dcollections' ArrayList. It implements the Keyed interface, with uint as the key. I have no key for LinkList, because I don't see a useful key.
 And I don't think search would be the name of the member function, it 
 should be something like 'getNth', which returns a cursor that points to 
 the element.

Right, and outside of pure C, [] is the shorthand for and the standardized name for "getNth". If someone automatically assumes [] to be a simple lookup, chances are they're going to make the same assumption about anything named along the lines of "getNth". After all, that's what [] does, it gets the Nth.

I view [] as "getByIndex", index being a value that offers quick access to elements. There is no implied 'get the nth element'. Look at an associative array. If I had a string[string] array, what would you expect to get if you passed an integer as the index? So good luck with your linked-list-should-be-indexed battle. I shall not be posting on this again. -Steve
Aug 28 2008
parent "Nick Sabalausky" <a a.a> writes:
 "Steven Schveighoffer" <schveiguy yahoo.com> wrote in message 
news:g983f5$2ns2$1 digitalmars.com...
 "Nick Sabalausky" wrote
 Ok, so you want foo() to be able to tell if the collection has fast or 
 slow indexing. What are you suggesting that foo() does when the 
 collection does have slow indexing?

No, I don't want to be able to tell. I don't want to HAVE to be able to tell.

You're missing the point. Since, as you say below, you want foo to not be callable with the collection since it doesn't implement opIndex, your answer is clearly "#1, The program should fail to compile because foo's implementation uses [] and the slow-indexing collection doesn't implement []".
 In my ideal world, the collection does not implement opIndex unless it is 
 fast, so there is no issue.  i.e. you cannot call foo with a linked list.

 I'm really tired of this argument, you understand my point of view, I 
 understand yours.

..(line split for clarity)..
 To you, the syntax sugar is more important than the complexity guarantees.

Not at all. And to that effect, I've already presented a way that we can have both syntactic sugar and, when desired, complexity guarantees. In fact, the method I presented actually provides more protection against poor complexity than your method (Since the guarantee doesn't break when faced with code from people with my viewpoint on [], which as you admit below is neither more right nor more wrong than your viewpoint on []). Just because I don't agree with your method of implementing complexity guarantees, doesn't mean I don't think they can be valuable.
 To me, what the syntax intuitively means should be what it does.

I absolutely agree that "What the syntax intuitively means should be what it does". Where we disagree is on "what the [] syntax intuitively means".
 So I'll develop my collections library and you develop yours, fair enough? 
 I don't think either of us is right or wrong in the strict sense of the 
 terms.

 To be fair, I'll answer your other points as you took the time to write 
 them.  And then I'm done.  I can't really be any clearer as to what I 
 believe is the best design.

 1. Should it fail to compile because foo's implementation uses [] and the 
 slow-indexing collection doesn't implement []?

No, foo will always compile, because opIndex should always be fast, and then I can specify the complexity of foo without worry. Using an O(n) lookup operation should be more painful to reach for, because it costs more time; that makes users use it less.
 2. Should foo revert to an alternate branch of code that doesn't use []?

 This behavior can be implemented via interfaces like I described. The 
 benefit of that is that [] can still serve as the shorthand it's intended 
 for (see below) and you never need to introduce the inconsistency of 
 "Gee, how do I get the Nth element of a collection?" "Well, on some 
 collections it's getNth(), and on other collections it's []."

I believe that you shouldn't really ever be calling getNth on a linked list, and if you are, it should be a red flag, like a cast. Furthermore [] isn't always equivalent to getNth, see below.

Addressed below...
 As for the risk that could create of accidentally sending a linked 
 list to a "search" (ie, a "search for an element which contains data 
 X") that uses [] internally instead of iterators (but then, why 
 wouldn't it just use iterators anyway?): I'll agree that in a case like 
 this there should be some mechanism for automatic choosing of an 
 algorithm, but that mechanism should be at a separate level of 
 abstraction. There would be a function "search" that, through either 
 RTTI or template constraints or something else, says "does collection 
 'c' implement ConstantTimeForwardDirectionIndexing?" or better yet IMO 
 "does the collection have attribute ForwardDirectionIndexingComplexity 
 that is set equal to Complexity.Constant?", and based on that passes 
 control to either IndexingSearch or IteratorSearch.

To me, this is a bad design. It's my opinion, but one that is shared among many people. You can do stuff this way, but it is not intuitive. I'd much rather reserve opIndex to only quick lookups, and avoid the possibility of accidentally using it incorrectly.

Preventing a collection from ever being used in a function that would typically perform poorly on that collection just smacks of premature optimization. How do you, as the collection author, know that the collection will never be used in a way such that *occasional* use in a certain specific sub-optimal manner might actually be necessary and/or acceptable?

It's not premature optimization; it's declining to offer a feature that has little or no use. It's like any contract for any object: you only want to define the interface for which your object is designed. A linked list should not have an opIndex because it's not designed to be indexed.

Addressed below...
 If I designed a new car with which you could steer each front wheel 
 independently, would that make you buy it?  It's another feature that the 
 car has that other cars don't.  Who cares if it's useful, its another 
 *feature*!  Sometimes a good design is not that a feature is included but 
 that a feature is *not* included.

So, in other words, it sounds like you're saying that in my scenario above, you think that a linked list should not be usable, even if it is faster in the greater context (Without actually saying so directly). Or do you claim that the scenario can never happen?
 If you omit [] then you've burnt the bridge (so to speak) and your only 
 recourse is to add a standardized "getNth()" to every single collection 
 which clutters the interface, hinders integration with third-party 
 collections and algorithms, and is likely to still suffer from idiots who 
 think that "get Nth element" is always better than O(n) (see below).

I'd reserve getNth for linked lists only, if I implemented it at all. It is a useless feature. The only common feature for all containers should be iteration, because 'iterate next element' is always an O(1) operation (amortized in the case of trees).
 In general, I'd say if you are using lists and frequently looking up the 
 nth value in the list, you have chosen the wrong container for the job.

If you're frequently looking up random elements in a list, then yes, you're probably using the wrong container. But that's beside the point. Even if you only do it once: If you have a collection with a natural order, and you want to get the nth element, you should be able to use the standard "get element at index X" notation, [].

I respectfully disagree. For the reasons I've stated above.
 I don't care how many people go around using [] and thinking they're 
 guaranteed to get a cheap computation from it. In a language that 
 supports overloading of [], the [] means "get the element at key/index 
 X". Especially in a language like D where using [] on an associative 
 array can trigger an unbounded allocation and GC run. Using [] in D (and 
 various other languages) can be expensive, period, even in the standard 
 lib (assoc array). So looking at a [] and thinking "guaranteed cheap", is 
 incorrect, period. If most people think 2+2=5, you're not going to 
 redesign arithmetic to work around that mistaken assumption.

Your assumption is that 'get the Nth element' is the only expectation for opIndex interface. My assumption is that opIndex implies 'get an element efficiently' is an important part of the interface. We obviously disagree, and as I said above, neither of us is right or wrong, strictly speaking. It's a matter of what is intuitive to you. Part of the problems I see with many bad designs is the author thinks they see a fit for an interface, but it's not quite there. They are so excited about fitting into an interface that they forget the importance of leaving out elements of the interface that don't make sense. To me this is one of them. An interface is a fit IMO if it fits exactly. If you have to do things like implement functions that throw exceptions because they don't belong, or break the contract that the interface specifies, then either the interface is too specific, or you are not implementing the correct interface.

(From the above "Addressed below..."'s) I fully agree that leaving the wrong things out of an interface is just as important as putting the right things in. But I don't think that's applicable here. An array can do anything a linked list can do (even insert). A linked list can do anything an array can do (even sort). They are both capable of the same exact set of basic operations: insert, delete, get at position, get position of, append, iterate, etc. The only thing that ever differs is how well each type of collection scales on each of those basic operations. The *whole point* of having both arrays and linked lists is that they provide different performance tradeoffs, not that they "implement different interfaces", because obviously they're all capable of doing the same things. It's the performance tradeoffs that are the whole point of "array vs linked list".

But it's rarely as simple as just looking at the basic operations individually... It's rare that a collection would ever be used for just one basic operation. What's the point of sorting a collection if you're never going to insert anything into it? What's the point of inserting data if you're never going to retrieve any? In most cases, you're going to be doing multiple types of operations on the collection, so the choice of collection becomes "Which set of tradeoffs is the most worthwhile for my overall usage patterns?" You can speculate and analyze all you want about the usage patterns and the appropriate tradeoffs, and that's good; you certainly should. But it ultimately comes down to the real-world tests: profiling. And if you're profiling, you're going to want to compare the performance of different types of collections.

And if you're going to do that, why should you prevent yourself from making it a one-line change ("Vector myBunchOfStuff" <-> "List myBunchOfStuff"), just because the fear of someone using an array for an insert-intensive purpose, or a list for a random-access-intensive purpose, drove you to design your code in a way that forces a single change of type to (in many cases) be an all-out refactoring? And it'll be the type of refactoring that no automatic refactoring tool is going to do for you. And suppose you do successfully find that optimal container, through your method or mine. Then a program feature/requirement is changed/added/removed, and all of a sudden, the usage patterns have changed! Now you get to do it all again! Major refactor then profile, or change a line then profile? You're looking at guaranteeing the performance of very narrow slices of a program. I'll agree that can be useful in some cases (hence, my proposal for how to implement performance guarantees). But in many cases, that's effectively a "taken out of context" fallacy and can lead to trouble.
 If you've got a linked list, and you want to get element N, are you 
 *really* going to go reaching for a function named "search"? How often 
 do you really see a generic function named "search" or "find" that 
 takes a numeric index as the "to be found" parameter instead of 
 something to be matched against the element's value? I would argue that 
 that would be confusing for most people. Like I said in a different 
 post farther down, the implementation of a "getAtIndex()" is obviously 
 going to work like a search, but from "outside the box", what you're 
 asking for is not the same.

If you are indexing into a tree, it is considered a binary search; if you are indexing into a hash, it is still a search at some point, to deal with collisions. People don't think about indexing as being a search, but in reality it is. A really fast search.

It's implemented as a search, but I'd argue that the input/output specifications are different. And yes, I suppose that does put it into a bit of a grey area. But I wouldn't go so far as to say that, to the caller, it's the same thing, because there are differences. If you want to get an element based on its position in the collection, you call one function. If you want to get an element based on its content instead of its position, that's another function. If you want to get the position of an element based on its content or its identity, that's one or two more functions (depending, of course, on whether the element is a value type or a reference type, respectively).

I disagree. I view the numeric index of an ordered container as a 'key' into the container. A keyed container has the ability to look up elements quickly with the key. Take a quick look at dcollections' ArrayList. It implements the Keyed interface, with uint as the key. I have no key for LinkList, because I don't see a useful key.
 And I don't think search would be the name of the member function, it 
 should be something like 'getNth', which returns a cursor that points to 
 the element.

Right, and outside of pure C, [] is the shorthand for and the standardized name for "getNth". If someone automatically assumes [] to be a simple lookup, chances are they're going to make the same assumption about anything named along the lines of "getNth". After all, that's what [] does, it gets the Nth.

I view [] as "getByIndex", index being a value that offers quick access to elements. There is no implied 'get the nth element'. Look at an associative array. If I had a string[string] array, what would you expect to get if you passed an integer as the index?

You misunderstand. I'm well aware of the sequentially-indexed array vs associative array issues. I was just using "sequentially-indexed array" terminology to avoid cluttering the explanations with more general terms that would have distracted from bigger points. By "getNth", what I was getting at was "getByPosition". Maybe I should have been saying "getByPosition" from the start, my mistake. As you can see, I still consider the key of an associative array to be its position. I'll explain why:

An associative array is the dynamic/runtime equivalent of a static/compile-time named variable (after all, in many dynamic languages, like PHP (not that I like PHP), named variables literally are keys into an implicit associative array). In a typical static or dynamic language, all variables are essentially made up of two parts: the raw data and a label. The label, obviously, is what's used to refer to the data. The label can be one of two things: an identifier, or (in a non-sandboxed language) a dereferenced memory address. So, borrowing the usual pointer metaphor of "memory as a series of labeled boxes", we can have the data "7" in the 0xA04D6'th "box", which is also labeled with the identifier "myInt". The memory address, obviously, is the position of the data. The identifier is another way to refer to the same position. "CPU: Where should I put this 7?" "High-level Code: In the location labeled with the identifier myInt". The data of a variable corresponds to an element of any collection (array, assoc array, list). The memory addresses not only correspond to, but literally are, sequential indices into the array of addressable memory (ie, the key/position in a sequentially-indexed array). The identifier corresponds to the key of an associative array or other such collection. "CPU: Where, within the assoc array, should I put this 7?" "High-level Code: In the assoc array's box/element labeled myInt". (With a linked list, of course, there's nothing that corresponds to the key of an assoc array, but it does have a natural sequential order.)

Maybe I can explain the distinction I see a little bit better with our terminology hopefully now in closer sync: For any collection, each element has a concept of position (index/key/nth/whatever) and a concept of data. A collection is a series of "boxes". On the outside of each box is a label (position/index/key/nth/whatever). On the inside of each box is data. If the collection's base type is a reference type, then this "inside data" is, of course, a pointer/reference to more data somewhere else. There are two basic conceptual operations: "outside label -> inside data", and "inside data -> outside label". The "inside data -> outside label" is always a search (although if the inside data contains a cached copy of its outside label, then that's somewhat of a grey area. Personally, I would count it as a "cached search": usable just like a search, but faster). The "outside label -> inside data" is, of course, our disputed "getAtPosition". In a linked list, it's a grey area similar to what I called a "cached search" above. It's usable like an ordinary "getAtPosition", but slower. Sure, the implementation is done via a search algorithm, but if you call it a search, that means that for a linked list, "getAtPosition" and search are the same thing (for whatever that implies; I don't have time to go any further on that ATM, so take it as you will).

I do understand, though, that you're defining "index" and "search" essentially as "fast" and "slow" versions (respectively) of "X" -> "Y", regardless of which of X or Y is "outside label" and which is "inside data". Personally, I find that awkward and somewhat less useful, since that means "index" and "search" each have multiple "input vs. output" behaviors (ie, there's still the question of "Am I giving the outside position and getting the inside data, or vice versa?").
Aug 29 2008
prev sibling parent Dee Girl <deegirl noreply.com> writes:
Nick Sabalausky Wrote:

 "Steven Schveighoffer" <schveiguy yahoo.com> wrote in message 
 news:g95b89$1jii$1 digitalmars.com...
 When someone sees an index operator the first thought is that it is a 
 quick lookup.

This seems to be a big part of the disagreement. Personally, I think it's insane to look at [] and just assume it's a cheap lookup. That was true in pure C where, as someone else said, it was nothing more than a shorthand for a specific pointer arithmetic operation, but carrying that idea over to more modern languages (especially one that also uses [] for associative arrays) is a big mistake. The '+' operator means "add". Addition is typically O(1). But vectors can be added, and that's an O(n) operation. Should opAdd never be used for vectors?

This is a big, big mistake! I think I do not know how to explain it. I could not so far because we talk from one mistake to another. If you add vectors a[] = b[] + c[] then that is one operation that is repeated for each element of the vector. The complexity is linear in the cost of one operation. It would be a mistake if each + took O(a.length). Otherwise it is simply, as expected, proportional to the length of the input.
 You can force yourself to think differently, but the reality is that most 
 people think that because of the universal usage of square brackets 
 (except for VB, and I feel pity for anyone who needs to use VB) to mean 
 'lookup by key', and usually this is only useful on objects where the 
 lookup is quick ( < O(n) ).  Although there is no requirement, nor 
 enforcement, the 'quick' contract is expected by the user, no matter how 
 much docs you throw at them.

 Look, for instance, at Tango's now-deprecated LinkMap, which uses a 
 linked-list of key/value pairs (copied from Doug Lea's implementation). 
 Nobody in their right mind would use link map because lookups are O(n), 
 and it's just as easy to use a TreeMap or HashMap.  Would you ever use it?

 If you *really* need that sort of guarantee (and I can imagine it may be 
 useful in some cases), then the implementation of the guarantee does 
 *not* belong in the realm of "implements vs doesn't-implement a 
 particular operator overload". Doing so is an abuse of operator 
 overloading, since operator overloading is there for defining syntactic 
 sugar, not for acting as a makeshift contract.

I don't think anybody is asking for a guarantee from the compiler or any specific tool. I think what we are saying is that violating the 'opIndex is fast' notion is bad design, because you end up with users thinking they are doing something that's quick. You end up with people posting benchmarks on your containers saying 'why does python beat the pants off your list implementation?'. You can say 'hey, it's not meant to be used that way', but then why can the user use it that way? A better design is to nudge the user into using a correct container for the job by only supporting operations that make sense on the collections.

And as far as operator semantic meaning, D's operators are purposely named after what they are supposed to do. Notice that the operator for + is opAdd, not opPlus. This is because opAdd is supposed to mean you are performing an addition operation. Assigning a different semantic meaning is not disallowed by the compiler, but is considered bad design. opIndex is supposed to be an index function, not a linear search. It's not called opSearch for a reason. Sure you can redefine it however you want semantically, but it's considered bad design. That's all we're saying.

Nobody is suggesting using [] to invoke a search (Although we have talked about using [] *in the search function's implementation*). Search means you want to get the position of a given element, or in other words, "element" -> search -> "key/index". What we're talking about is the reverse: getting the element at a given position, ie, "key/index" -> [] -> "element". It doesn't matter if it's an array, a linked list, or a super-duper-collection-from-2092: that's still indexing, not searching.

I think this is also a big mistake. I am so sorry! I can not explain it. But it looks like you think if you call it something different it behaves differently. Sorry, Dee Girl
Aug 28 2008
prev sibling parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-08-28 06:06:11 +0200, "Nick Sabalausky" <a a.a> said:

 "Fawzi Mohamed" <fmohamed mac.com> wrote in message
 news:g94k2b$2a1e$1 digitalmars.com...
 
 I am with dan dee_girl & co on this issue, the problem is that a generic
 algorithm "knows" the types he is working on and can easily check the
 operations they have, and based on this decide the strategy to use. This
 choice works well if the presence of a given operation is also connected
 with some performance guarantee.
 

IMO, a better way to do that would be via C#-style attributes or equivalent named interfaces. I'm not sure if this is what you're referring to below or not.

Yes, categories are basically named interfaces for types. Unlike a constraint (which checks whether something is implemented), one has to explicitly say "T implements Interface" (and obviously T has to have all the requested functions/methods). This should be available for all types, and you should be able to request also the existence of free functions, not only of methods. Attributes are a simplified version of this (basically no checks for a given interface). The important thing is that the presence or absence of attributes on a given type is not automatically inferred from the presence of given functions.
 Concepts (or better categories (aldor concept not C++), that are
 interfaces for types, but interfaces that have to be explicitly assigned
 to a type) might relax this situation a little, but the need for some
 guarantees will remain.
 

If this "guarantee" (or mechanism for checking the types of operations that a collection supports) takes the form of a style guideline that says "don't implement opIndex for a collection if it would be O(n) or worse", then that, frankly, is absolutely no guarantee at all.

Well, if it is in the spec and everybody knows it, then breaking it and getting bad behavior is your own fault.
 If you *really* need that sort of guarantee (and I can imagine it may be
 useful in some cases), then the implementation of the guarantee does *not*
 belong in the realm of "implements vs doesn't-implement a particular
 operator overload". Doing so is an abuse of operator overloading, since
 operator overloading is there for defining syntactic sugar, not for acting
 as a makeshift contract.
 
 The correct mechanism for such guarantees is with named interfaces or
 C#-style attributes, as I mentioned above. True, that can still be abused if
 the collection author wants to, but they have to actually try (ie, they have
 to lie and say "implements IndexingInConstantTime" in addition to
 implementing opIndex). If you instead try to implement that guarantee with
 the "don't implement opIndex for a collection if it would be O(n) or worse"
 style-guideline, then it's far too easy for a collection to come along that
 is ignorant of that "pseudo-contract" and accidentally breaks it. Proper
 use of interfaces/attributes instead of relying on the existence or absence
 of an overloaded operator fixes that problem.

I fully agree that with interfaces (or categories or attributes) the correct thing is to use them to enforce extra constraints, so that overloading (or naming the functions) is really just syntactic sugar. But also in this case, it can make reading code (and writing reasonably fast code from the beginning), and understanding the complexity (speed) of code you are reading, easier if some social contract about the speed of operations is respected.

Please note that these "guarantees" are not such that one cannot break them; their purpose is to make the life of those who know and enforce them easier, and also that of the whole community if it chooses to adopt them. As Steve just argued much more fully than me, the STL does it, and I think that D should do it too. Should D get categories or attributes, these things could be relaxed a little, but I think there will still be cases where expecting a given function to have a given complexity will be a good thing. It just makes thinking about the code easier, and simpler to stay at a high level without surprises that you have to find out by looking at the code in detail, and so it makes a programmer more productive. Fawzi
Aug 28 2008
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 27 Aug 2008 16:33:24 -0400, Nick Sabalausky wrote:

 A generic algorithm has absolutely no business caring about the complexity of 
 the collection it's operating on.

I also believe this to be true. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Aug 27 2008
parent Dee Girl <deegirl noreply.com> writes:
Derek Parnell Wrote:

 On Wed, 27 Aug 2008 16:33:24 -0400, Nick Sabalausky wrote:
 
 A generic algorithm has absolutely no business caring about the complexity of 
 the collection it's operating on.

I also believe this to be true.

It is true. But only if you take it out of context. An algorithm does not need to know the complexity of the collection. But it must have a minimal guarantee of what iterator the collection has. Is it forward only, bidirectional, or random access? This is a small interface. And very easy to implement. Inside, the iterator can do what it needs to access the collection. The algorithm must not know it! Only ++, *, [] and comparison. That is why STL algorithms are so general. Because they work with a very small (narrow) interface. Thank you, Dee Girl
Aug 27 2008
prev sibling parent reply Dee Girl <deegirl noreply.com> writes:
Nick Sabalausky Wrote:

 "Dee Girl" <deegirl noreply.com> wrote in message 
 news:g943oi$11f4$1 digitalmars.com...
 Benji Smith Wrote:

 Dee Girl wrote:
 Michiel Helvensteijn Wrote:
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving. If you look at the code it looks like array code. foreach (i; 0 .. a.length) { a[i] += 1; } For an array this works nicely. But for a list it is terrible! Many operations just to increment a small list.

Well, that's what you get with operator overloading.

I am sorry. I disagree. I think that is what you get with bad design.
 The same thing could be said for "+" or "-". They're inherently
 deceiving, because they look like builtin operations on primitive data
 types.

 For expensive operations (like performing division on an
 unlimited-precision decimal object), should the author of the code use
 "opDiv" or should he implement a separate "divide" function?

The cost of + and - is proportional to the digits in the number. For a small number of digits the computer does it fast in hardware. For many digits the cost grows. The number of digits is log n. I think + and - are fine for big integers. I am not surprised.
 Forget opIndex for a moment, and ask the more general question about all
 overloaded operators. Should they imply any sort of asymptotic
 complexity guarantee?

I think it depends on good design. For example, think of ++ or -- for an iterator. If it is O(n) it is bad design. Bad design makes people say, like you, "This is what you get with operator overloading".
 Personally, I don't think so.

 I don't like "nth".

 I'd rather use the opIndex. And if I'm using a linked list, I'll be
 aware of the fact that it'll exhibit linear-time indexing, and I'll be
 cautious about which algorithms to use.

But inside the algorithm you do not know if you use a linked list or a vector. You lost that information in a bad abstraction. Also the abstraction is bad because if you change the data structure you have concept errors that still compile. And run until tomorrow ^_^.

A generic algorithm has absolutely no business caring about the complexity of the collection it's operating on. If it does, then you've created a concrete algorithm, not a generic one.

I appreciate your view point. Please allow me to explain. The view point is in opposition with the STL. In the STL, each algorithm defines what kind of iterator it operates with. And it requires what iterator complexity. I agree that another design can be made. But the STL has that design. In my opinion it is a large part of what makes the STL so successful. I disagree that an algorithm that knows the complexity of its iterator is concrete. I think exactly the contrary. Maybe it is good that you read the book about the STL by Josuttis. STL algorithms are the most generic I ever find in any language. I hope std.algorithm in D will be better. But right now std.algorithm works only with arrays.
 If an algorithm uses [] and doesn't know the 
 complexity of the []...good! It shouldn't know, and it shouldn't care. It's 
 the code that sends the collection to the algorithm that knows and cares.

I think this is a mistake. The algorithm should know. Otherwise "linear find" is not "linear find"! It is "quadratic find". If you want to define something called linear find then you must know the iterator complexity.
 Why? Because "what algorithm is best?" depends on far more than just what 
 type of collection is used. It depends on "Will the collection ever be 
 larger than X elements?". It depends on "Is it a standard textbook list, or 
 does it use trick 1 and/or trick 2?". It depends on "Is it usually mostly 
 sorted or mostly random?". It depends on "What do I do with it most often? 
 Sort, append, search, insert or delete?". And it depends on other things, 
 too.

I agree it depends on many things. But such practical matters do not change the nature of a generic algorithm. Linear find is the same on 5, 50, or 5 million objects. I have to say I also think you have inverted some ideas. The algorithm is the same. You use it the way you want.
 Using "[]" versus "nth()" can't tell the algorithm *any* of those things.

This is an interface convention. Like any other interface convention! Nobody says that IStack.Push() puts something on the stack. It is described in the documentation. If a concrete stack is wrong it can do anything. The only special thing about [] is that the built-in array has []. So I do not think a list should want to look like an array.
 But 
 those things *must* be known in order to make an accurate decision of "Is 
 this the right algorithm or not?" Therefore, a generic algorithm *cannot* ever 
 know for certain if it's the right algorithm, *even* if you say "[]" means 
 "O(log n) or better". Therefore, the algorithm should not be designed to 
 only work with certain types of collections. The code that sends the 
 collection to the algorithm is the *only* code that knows the answers to all 
 of the questions above, therefore it is the only code that should ever 
 decide "I should use this algorithm, I shouldn't use that algorithm."

I respectfully disagree. For example, binary_search in the STL should never compile on a list, because it would simply be wrong to use it with a list. It makes no sense. So I am happy that the STL does not allow that. I think you can easily build a structure-and-algorithm library that allows wrong combinations. In programming you can do anything ^_^. But then I think I would say: your library is inferior and the STL is superior. I am sorry, Dee Girl
Aug 27 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"Dee Girl" <deegirl noreply.com> wrote in message 
news:g94j7a$2875$1 digitalmars.com...

I think this is a mistake. The algorithm should know. Otherwise "linear find" is not "linear find"! It is "quadratic find". If you want to define something called linear find then you must know the iterator complexity.

If a generic algorithm describes itself as "linear find" then I know damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n).

A question about STL: If I create a collection that, internally, is like a linked list, but starts each indexing operation from the position of the last indexing operation (so that a "find first" would run in O(n) instead of O(n*n)), is it possible to send that collection to STL's generic "linear find first"? I would argue that it should somehow be possible *even* if the STL's generic "linear find first" guarantees a *total* performance of O(n) (since, in this case, it would still be O(n) anyway). Because otherwise, the STL wouldn't be very extensible, which would be a bad thing for a library of "generic" algorithms.

Another STL question: Is it possible to use STL to do a "linear find" using a custom comparison? If so, is it possible to make STL's "linear find" function use a comparison that just happens to be O(n)? If so, doesn't that violate the linear-time guarantee, too? If not, how does it know that the custom comparison is O(n) instead of O(1) or O(log n)?
Aug 27 2008
next sibling parent reply Don <nospam nospam.com.au> writes:
Nick Sabalausky wrote:

If a generic algorithm describes itself as "linear find" then I know damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n).

A question about STL: If I create a collection that, internally, is like a linked list, but starts each indexing operation from the position of the last indexing operation (so that a "find first" would run in O(n) instead of O(n*n)), is it possible to send that collection to STL's generic "linear find first"? I would argue that it should somehow be possible *even* if the STL's generic "linear find first" guarantees a *total* performance of O(n) (since, in this case, it would still be O(n) anyway). Because otherwise, the STL wouldn't be very extensible, which would be a bad thing for a library of "generic" algorithms.

Yes, it will work.
 Another STL question: It is possible to use STL to do a "linear find" using 
 a custom comparison? If so, it is possible to make STL's "linear find" 
 function use a comparison that just happens to be O(n)? If so, doesn't that 
 violate the linear-time guarantee, too? If not, how does it know that the 
 custom comparison is O(n) instead of O(1) or O(log n)?

This will work too. IF you follow the conventions THEN the STL gives you the guarantees.
Aug 28 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"Don" <nospam nospam.com.au> wrote in message 
news:g95ks5$2aon$1 digitalmars.com...
 Another STL question: It is possible to use STL to do a "linear find" 
 using a custom comparison? If so, it is possible to make STL's "linear 
 find" function use a comparison that just happens to be O(n)? If so, 
 doesn't that violate the linear-time guarantee, too? If not, how does it 
 know that the custom comparison is O(n) instead of O(1) or O(log n)?

This will work too. IF you follow the conventions THEN the STL gives you the guarantees.

I'm not sure that's really a "guarantee" per se, but that's splitting hairs. In any case, it sounds like we're all arguing more or less the same point. Setting aside the issue of "should opIndex be used and when?", suppose I have the following collection interface and find function (roughly, in D):

    interface ICollection(T)
    {
        T getElement(int index);
        int getSize();
    }

    int find(T)(ICollection!(T) c, T elem)
    {
        for(int i=0; i<c.getSize(); i++)
        {
            if(c.getElement(i) == elem)
                return i;
        }
        return -1;
    }

It sounds like STL's approach is to do something roughly like that and say: "find()'s parameter 'c' should be an ICollection for which getElement() is O(1), in which case find() is guaranteed to be O(n)."

What I've been advocating is, again, doing something like the code above and saying: "find()'s complexity is dependent on the complexity of the ICollection's getElement(). If getElement()'s complexity is O(m), then find()'s complexity is guaranteed to be O(m * n). Of course, this means that the only way to get ideal complexity from find() is to use an ICollection for which getElement() is O(1)."

But, you see, those two statements are effectively equivalent.
Aug 28 2008
parent reply Don <nospam nospam.com.au> writes:
Nick Sabalausky wrote:
But, you see, those two statements are effectively equivalent.

They are. But... if you don't adhere to the conventions, your code gets really hard to reason about. "This class has an opIndex which is in O(n). Is that OK?" Well, that depends on what it's being used for. So you have to look at all of the places where it is used. It's much simpler to use the convention that opIndex _must_ be fast; this way the performance requirements for containers and algorithms are completely decoupled from each other. It's about good design.
Aug 28 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"Don" <nospam nospam.com.au> wrote in message 
news:g95td3$2tu0$1 digitalmars.com...

They are. But... if you don't adhere to the conventions, your code gets really hard to reason about. "This class has an opIndex which is in O(n). Is that OK?" Well, that depends on what it's being used for. So you have to look at all of the places where it is used. It's much simpler to use the convention that opIndex _must_ be fast; this way the performance requirements for containers and algorithms are completely decoupled from each other. It's about good design.

Taking a slight detour, let me ask you this... Which of the following strategies do you consider to be better:

    //-- A --
    value = 0;
    for(int i=1; i<=10; i++)
    {
        value += i*2;
    }

    //-- B --
    value = sum(map(1..10, {n * 2}));

Both strategies compute the sum of the first 10 multiples of 2. Strategy A makes the low-level implementation details very clear, but IMO it comes at the expense of high-level clarity, because the code intermixes the high-level "what do I want to accomplish?" with the low-level details. Strategy B much more closely resembles the high-level desired result, and thus makes the high-level intent more clear. But this comes at the cost of hiding the low-level details behind a layer of abstraction.

I may very well be wrong on this, but from what you've said it sounds like you (as well as the other people who prefer [] to never be O(n)) are the type of coder who would prefer "Strategy A". In that case, I can completely understand your viewpoint on opIndex, even though I don't agree with it (I'm a "Strategy B" kind of person). Of course, if I'm wrong on that assumption, then we're back to square one ;)
Aug 28 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Nick Sabalausky" wrote

Taking a slight detour, let me ask you this... Which of the following strategies do you consider to be better:

    //-- A --
    value = 0;
    for(int i=1; i<=10; i++)
    {
        value += i*2;
    }

    //-- B --
    value = sum(map(1..10, {n * 2}));

Both strategies compute the sum of the first 10 multiples of 2. Strategy A makes the low-level implementation details very clear, but IMO it comes at the expense of high-level clarity, because the code intermixes the high-level "what do I want to accomplish?" with the low-level details. Strategy B much more closely resembles the high-level desired result, and thus makes the high-level intent more clear. But this comes at the cost of hiding the low-level details behind a layer of abstraction.

I may very well be wrong on this, but from what you've said it sounds like you (as well as the other people who prefer [] to never be O(n)) are the type of coder who would prefer "Strategy A". In that case, I can completely understand your viewpoint on opIndex, even though I don't agree with it (I'm a "Strategy B" kind of person). Of course, if I'm wrong on that assumption, then we're back to square one ;)

For me at least, you are wrong :) In fact, I view it the other way: you shouldn't have to care about the underlying implementation, as long as the runtime is well defined. If you tell me strategy B may or may not take up to O(n^2) to compute, then you bet your ass I'm not going to even touch option B, 'cause I can always get O(n) time with option A :) Your solution FORCES me to care about the details; it's not so much that I want to care about them.

-Steve
Aug 28 2008
parent reply Don <nospam nospam.com.au> writes:
Steven Schveighoffer wrote:
 "Nick Sabalausky" wrote
 "Don" <nospam nospam.com.au> wrote in message 
 news:g95td3$2tu0$1 digitalmars.com...
 Nick Sabalausky wrote:
 "Don" <nospam nospam.com.au> wrote in message 
 news:g95ks5$2aon$1 digitalmars.com...
 Nick Sabalausky wrote:
 "Dee Girl" <deegirl noreply.com> wrote in message 
 news:g94j7a$2875$1 digitalmars.com...
 I appreciate your view point. Please allow me explain. The view point 
 is in opposition with STL. In STL each algorithm defines what kind of 
 iterator it operates with. And it requires what iterator complexity.

 I agree that other design can be made. But STL has that design. In my 
 opinion is much part of what make STL so successful.

 I disagree that algorithm that knows complexity of iterator is 
 concrete. I think exactly contrary. Maybe it is good that you read 
 book about STL by Josuttis. STL algorithms are the most generic I 
 ever find in any language. I hope std.algorithm in D will be better. 
 But right now std.algorithm works only with array.

 If an algoritm uses [] and doesn't know the
 complexity of the []...good! It shouldn't know, and it shouldn't 
 care. It's
 the code that sends the collection to the algoritm that knows and 
 cares.

Otherwise "linear find" is not "linear find"! It is "cuadratic find" (spell?). If you want to define something called linear find then you must know iterator complexity.

damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n). A question about STL: If I create a collection that, internally, is like a linked list, but starts each indexing operation from the position of the last indexing operation (so that a "find first" would run in O(n) instead of O(n*n)), is it possible to send that collection to STL's generic "linear find first"? I would argue that it should somehow be possible *even* if the STL's generic "linear find first" guarantees a *total* performance of O(n) (Since, in this case, it would still be O(n) anyway). Because otherwise, the STL wouldn't be very extendable, which would be a bad thing for a library of "generic" algorithms.

 Another STL question: It is possible to use STL to do a "linear find" 
 using a custom comparison? If so, it is possible to make STL's "linear 
 find" function use a comparison that just happens to be O(n)? If so, 
 doesn't that violate the linear-time guarantee, too? If not, how does 
 it know that the custom comparison is O(n) instead of O(1) or O(log 
 n)?

IF you follow the conventions THEN the STL gives you the guarantees.

hairs. In any case, it sounds like we're all arguing more or less the same point. Setting aside the issue of "should opIndex be used and when?", suppose I have the following collection interface and find function (not guaranteed to compile):

interface ICollection(T)
{
    T getElement(index);
    int getSize();
}

int find(T)(ICollection(T) c, T elem)
{
    for(int i=0; i<c.size(); i++)
    {
        if(c.getElement(i) == elem)
            return i;
    }
}

It sounds like STL's approach is to do something roughly like that and say: "find()'s parameter 'c' should be an ICollection for which getElement() is O(1), in which case find() is guaranteed to be O(n)". What I've been advocating is, again, doing something like the code above and saying: "find()'s complexity is dependent on the complexity of the ICollection's getElement(). If getElement()'s complexity is O(m), then find()'s complexity is guaranteed to be O(m * n). Of course, this means that the only way to get ideal complexity from find() is to use an ICollection for which getElement() is O(1)". But, you see, those two statements are effectively equivalent.
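Nick's sketch above can be made concrete. Below is a hedged C++ rendering (the names `find_index`, `VectorCollection`, and `ListCollection` are illustrative, not taken from any post): the same generic find performs n element accesses either way, so its total cost is O(n * cost(get)), which is exactly the point under debate.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// A minimal "ICollection" analogue: any type with size() and get(i).
// find_index itself makes n calls to get(); total cost is O(n * cost(get)).
template <typename C, typename T>
std::ptrdiff_t find_index(const C& c, const T& elem)
{
    for (std::size_t i = 0; i < c.size(); ++i)
        if (c.get(i) == elem)
            return static_cast<std::ptrdiff_t>(i);
    return -1;
}

// O(1) get(): find_index is O(n).
struct VectorCollection {
    std::vector<int> data;
    std::size_t size() const { return data.size(); }
    int get(std::size_t i) const { return data[i]; }
};

// O(i) get(): the very same find_index silently becomes O(n^2).
struct ListCollection {
    std::list<int> data;
    std::size_t size() const { return data.size(); }
    int get(std::size_t i) const {
        auto it = data.begin();
        std::advance(it, i);   // walks i links, one per call
        return *it;
    }
};
```

Both collections give identical answers; only the hidden constant-versus-linear cost of `get` differs, which the call site cannot see.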

if you don't adhere to the conventions, your code gets really hard to reason about. "This class has an opIndex which is in O(n). Is that OK?" Well, that depends on what it's being used for. So you have to look at all of the places where it is used. It's much simpler to use the convention that opIndex _must_ be fast; this way the performance requirements for containers and algorithms are completely decoupled from each other. It's about good design.
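For reference, the convention Don describes is precisely what the STL encodes in its iterator categories: every iterator type advertises a category tag, and algorithms (and users) can dispatch on it. A minimal sketch using only standard library facilities:

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Each iterator advertises its complexity class via a tag type.
// std::advance, for instance, is O(1) for random-access iterators
// and O(n) otherwise -- and the type system records which is which.
template <typename It>
using category_of = typename std::iterator_traits<It>::iterator_category;

static_assert(std::is_same<category_of<std::vector<int>::iterator>,
                           std::random_access_iterator_tag>::value,
              "vector iterators promise O(1) jumps");
static_assert(!std::is_same<category_of<std::list<int>::iterator>,
                            std::random_access_iterator_tag>::value,
              "list iterators make no such promise");
```

So the performance contract is decoupled exactly as Don says: containers declare a category, algorithms state which categories they accept.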

Which of the following strategies do you consider to be better:

//-- A --
value = 0;
for(int i=1; i<=10; i++)
{
    value += i*2;
}

//-- B --
value = sum(map(1..10, {n * 2}));

Both strategies compute the sum of the first 10 multiples of 2. Strategy A makes the low-level implementation details very clear, but IMO, it comes at the expense of high-level clarity, because the code intermixes the high-level "what do I want to accomplish?" with the low-level details. Strategy B much more closely resembles the high-level desired result, and thus makes the high-level intent clearer. But this comes at the cost of hiding the low-level details behind a layer of abstraction. I may very well be wrong on this, but from what you've said it sounds like you (as well as the other people who prefer [] to never be O(n)) are the type of coder who would prefer Strategy A. In that case, I can completely understand your viewpoint on opIndex, even though I don't agree with it (I'm a "Strategy B" kind of person). Of course, if I'm wrong on that assumption, then we're back to square one ;)

For me at least, you are wrong :) In fact, I view it the other way, you shouldn't have to care about the underlying implementation, as long as the runtime is well defined. If you tell me strategy B may or may not take up to O(n^2) to compute, then you bet your ass I'm not going to even touch option B, 'cause I can always get O(n) time with option A :) Your solution FORCES me to care about the details, it's not so much that I want to care about them.

I agree. It's about _which_ details do you want to abstract away. I don't care about the internals. But I _do_ care about the complexity of them.
Aug 29 2008
parent Christopher Wright <dhasenan gmail.com> writes:
Don wrote:
 Steven Schveighoffer wrote:
 "Nick Sabalausky" wrote
 "Don" <nospam nospam.com.au> wrote in message 
 news:g95td3$2tu0$1 digitalmars.com...
 Nick Sabalausky wrote:
 "Don" <nospam nospam.com.au> wrote in message 
 news:g95ks5$2aon$1 digitalmars.com...
 Nick Sabalausky wrote:
 "Dee Girl" <deegirl noreply.com> wrote in message 
 news:g94j7a$2875$1 digitalmars.com...
 I appreciate your view point. Please allow me explain. The view 
 point is in opposition with STL. In STL each algorithm defines 
 what kind of iterator it operates with. And it requires what 
 iterator complexity.

 I agree that other design can be made. But STL has that design. 
 In my opinion is much part of what make STL so successful.

 I disagree that algorithm that knows complexity of iterator is 
 concrete. I think exactly contrary. Maybe it is good that you 
 read book about STL by Josuttis. STL algorithms are the most 
 generic I ever find in any language. I hope std.algorithm in D 
 will be better. But right now std.algorithm works only with array.

 If an algoritm uses [] and doesn't know the
 complexity of the []...good! It shouldn't know, and it 
 shouldn't care. It's
 the code that sends the collection to the algoritm that knows 
 and cares.

"linear find" is not "linear find"! It is "cuadratic find" (spell?). If you want to define something called linear find then you must know iterator complexity.

know damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n). A question about STL: If I create a collection that, internally, is like a linked list, but starts each indexing operation from the position of the last indexing operation (so that a "find first" would run in O(n) instead of O(n*n)), is it possible to send that collection to STL's generic "linear find first"? I would argue that it should somehow be possible *even* if the STL's generic "linear find first" guarantees a *total* performance of O(n) (Since, in this case, it would still be O(n) anyway). Because otherwise, the STL wouldn't be very extendable, which would be a bad thing for a library of "generic" algorithms.

 Another STL question: It is possible to use STL to do a "linear 
 find" using a custom comparison? If so, it is possible to make 
 STL's "linear find" function use a comparison that just happens 
 to be O(n)? If so, doesn't that violate the linear-time 
 guarantee, too? If not, how does it know that the custom 
 comparison is O(n) instead of O(1) or O(log n)?

IF you follow the conventions THEN the STL gives you the guarantees.

splitting hairs. In any case, it sounds like we're all arguing more or less the same point. Setting aside the issue of "should opIndex be used and when?", suppose I have the following collection interface and find function (not guaranteed to compile):

interface ICollection(T)
{
    T getElement(index);
    int getSize();
}

int find(T)(ICollection(T) c, T elem)
{
    for(int i=0; i<c.size(); i++)
    {
        if(c.getElement(i) == elem)
            return i;
    }
}

It sounds like STL's approach is to do something roughly like that and say: "find()'s parameter 'c' should be an ICollection for which getElement() is O(1), in which case find() is guaranteed to be O(n)". What I've been advocating is, again, doing something like the code above and saying: "find()'s complexity is dependent on the complexity of the ICollection's getElement(). If getElement()'s complexity is O(m), then find()'s complexity is guaranteed to be O(m * n). Of course, this means that the only way to get ideal complexity from find() is to use an ICollection for which getElement() is O(1)". But, you see, those two statements are effectively equivalent.

if you don't adhere to the conventions, your code gets really hard to reason about. "This class has an opIndex which is in O(n). Is that OK?" Well, that depends on what it's being used for. So you have to look at all of the places where it is used. It's much simpler to use the convention that opIndex _must_ be fast; this way the performance requirements for containers and algorithms are completely decoupled from each other. It's about good design.

Which of the following strategies do you consider to be better:

//-- A --
value = 0;
for(int i=1; i<=10; i++)
{
    value += i*2;
}

//-- B --
value = sum(map(1..10, {n * 2}));

Both strategies compute the sum of the first 10 multiples of 2. Strategy A makes the low-level implementation details very clear, but IMO, it comes at the expense of high-level clarity, because the code intermixes the high-level "what do I want to accomplish?" with the low-level details. Strategy B much more closely resembles the high-level desired result, and thus makes the high-level intent clearer. But this comes at the cost of hiding the low-level details behind a layer of abstraction. I may very well be wrong on this, but from what you've said it sounds like you (as well as the other people who prefer [] to never be O(n)) are the type of coder who would prefer Strategy A. In that case, I can completely understand your viewpoint on opIndex, even though I don't agree with it (I'm a "Strategy B" kind of person). Of course, if I'm wrong on that assumption, then we're back to square one ;)

For me at least, you are wrong :) In fact, I view it the other way, you shouldn't have to care about the underlying implementation, as long as the runtime is well defined. If you tell me strategy B may or may not take up to O(n^2) to compute, then you bet your ass I'm not going to even touch option B, 'cause I can always get O(n) time with option A :) Your solution FORCES me to care about the details, it's not so much that I want to care about them.

I agree. It's about _which_ details do you want to abstract away. I don't care about the internals. But I _do_ care about the complexity of them.

We all agree about this. What we disagree about is how to find out about the complexity of an operation -- by whether it overloads an operator or by some metadata. In terms of code, the difference is:

/* Operator overloading */
void foo(T)(T collection)
{
    static if (is (typeof (T[0])))
    {
        ...
    }
}

/* Metadata */
void foo(T)(ICollection!(T) collection)
{
    if ((cast(FastIndexedCollection)collection) !is null)
    {
        ...
    }
}

You do need a metadata solution, whichever you choose. Otherwise you can't differentiate at runtime.
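A hedged C++ rendering of Christopher's second, runtime-metadata strategy (the `FastIndexed` marker type is illustrative, playing the role of D's hypothetical FastIndexedCollection): the null-cast check is spelled `dynamic_cast` in C++.

```cpp
// Marker interface advertising "my at() is O(1)". Implementing it is
// pure metadata; it adds no operations. (All names are illustrative.)
struct Collection {
    virtual ~Collection() {}
    virtual int at(int i) const = 0;
};
struct FastIndexed {
    virtual ~FastIndexed() {}
};

struct Array : Collection, FastIndexed {
    int data[3] = {1, 2, 3};
    int at(int i) const override { return data[i]; }
};

// The runtime query: dynamic_cast is the C++ spelling of D's
// (cast(FastIndexedCollection)collection) !is null.
bool has_fast_indexing(const Collection& c)
{
    return dynamic_cast<const FastIndexed*>(&c) != nullptr;
}
```

An algorithm handed a `Collection&` can branch on `has_fast_indexing` to pick an index-based or iteration-based strategy at runtime.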
Aug 29 2008
prev sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Nick Sabalausky wrote:
 Taking a slight detour, let me ask you this... Which of the following 
 strategies do you consider to be better:
 
 //-- A --
 value = 0;
 for(int i=1; i<=10; i++)
 {
     value += i*2;
 }
 
 //-- B --
 value = sum(map(1..10, {n * 2}));
 
 Both strategies compute the sum of the first 10 multiples of 2.
 
 Strategy A makes the low-level implementation details very clear, but IMO, 
 it comes at the expense of high-level clarity. This is because the code 
 intermixes the high-level "what I want to accomplish?" with the low-level 
 details.
 
 Strategy B much more closely resembles the high-level desired result, and 
 thus makes the high-level intent more clear. But this comes at the cost of 
 hiding the low-level details behind a layer of abstraction.

Didn't read the rest of the discussion, but I disagree here... Most programmers learn iterative languages first, and anyone who's taken Computer Science 101 can figure out what's going on in A. B takes a second to think about. I'm not into the zen of FP for sure, and that probably makes me a worse programmer... but I bet you that if you take a random candidate for a development position, she'll be more likely to figure out (and write) A than B. [That may be projection; I haven't seen/done any studies]

The big problem IMO is the number of primitive things you need to understand. In A, you need to understand variables, looping and arithmetic operations. In B, you need to understand and think about closures/scoping, lists, the "map" function, aggregate functions, function composition, and arithmetic operations. What hit me when first looking at it was "where the **** did n come from?"

I'm not saying the functional style isn't perfect for a lot of things; I'm just saying that this is not one of them.
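For comparison, the two strategies under discussion can be written in C++ with only standard algorithms (an editor's sketch, not code from the thread):

```cpp
#include <numeric>
#include <vector>

// Strategy A: explicit loop; every low-level step is visible.
int sum_doubles_loop()
{
    int value = 0;
    for (int i = 1; i <= 10; ++i)
        value += i * 2;
    return value;
}

// Strategy B: compose library algorithms; the intent ("sum of doubled
// 1..10") is up front, the mechanics are hidden in iota/accumulate.
int sum_doubles_algo()
{
    std::vector<int> v(10);
    std::iota(v.begin(), v.end(), 1);   // fill with 1..10
    return std::accumulate(v.begin(), v.end(), 0,
                           [](int acc, int n) { return acc + n * 2; });
}
```

Both return 110; the disagreement in the thread is only about which form a reader decodes faster.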
Aug 28 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Robert Fraser wrote:
 The big problem IMO is the number of primitive things you need to 
 understand. In A, you need to understand variables, looping and 
 arithmetic operations. In B, you need to understand and think about 
 closures/scoping, lists, the "map" function, aggregate functions, 
 function compositions, and arithmetic operations. What hit me when first 
 looking at it "where the **** did n come from?"

I think B should be clearer and more intuitive, it's just that I'm not used to B at all whereas A style has worn a very deep groove in my brain.
Aug 29 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

I think B should be clearer and more intuitive, it's just that I'm not used to
B at all whereas A style has worn a very deep groove in my brain.<

Well, if you use D 2 you write it this way:

value = 0;
foreach (i; 1 .. 11)
    value += i * 2;

Using my libs you can write:

auto value = sum(map((int i){return i * 2;}, range(1, 11)));

But that creates two intermediate lists, so you may want to go all lazy instead:

auto value = sum(xmap((int i){return i * 2;}, xrange(1, 11)));

That's short and fast and uses very little (a constant amount of) memory, but you have to count the open and closed brackets to be sure the expression is correct... So for me the most clear solution is the Python (lazy) one:

value = sum(i * 2 for i in xrange(1, 11))

That's why I suggested a similar syntax for D too ;-)

Bye,
bearophile
Aug 30 2008
prev sibling parent Dee Girl <deegirl noreply.com> writes:
Nick Sabalausky Wrote:

 "Dee Girl" <deegirl noreply.com> wrote in message 
 news:g94j7a$2875$1 digitalmars.com...
 Nick Sabalausky Wrote:

 "Dee Girl" <deegirl noreply.com> wrote in message
 news:g943oi$11f4$1 digitalmars.com...
 Benji Smith Wrote:

 Dee Girl wrote:
 Michiel Helvensteijn Wrote:
 That's simple. a[i] looks much nicer than a.nth(i).

It is not nicer. It is more deceiving (correct spell?). If you look at code it looks like array code.

foreach (i; 0 .. a.length)
{
    a[i] += 1;
}

For array works nice. But for list it is terrible! Many operations for incrementing only small list.

Well, that's what you get with operator overloading.

I am sorry. I disagree. I think that is what you get with bad design.
 The same thing could be said for "+" or "-". They're inherently
 deceiving, because they look like builtin operations on primitive data
 types.

 For expensive operations (like performing division on an
 unlimited-precision decimal object), should the author of the code use
 "opDiv" or should he implement a separate "divide" function?

The cost of + and - is proportional to digits in number. For small number of digits computer does fast in hardware. For many digits the cost grows. The number of digits is log n. I think + and - are fine for big integer. I am not surprise.
 Forget opIndex for a moment, and ask the more general question about 
 all
 overloaded operators. Should they imply any sort of asymptotic
 complexity guarantee?

I think depends on good design. For example I think ++ or -- for iterator. If it is O(n) it is bad design. Bad design make people say like you "This is what you get with operator overloading".
 Personally, I don't think so.

 I don't like "nth".

 I'd rather use the opIndex. And if I'm using a linked list, I'll be
 aware of the fact that it'll exhibit linear-time indexing, and I'll be
 cautious about which algorithms to use.

But inside algorithm you do not know if you use a linked list or a vector. You lost that information in bad abstraction. Also abstraction is bad because if you change data structure you have concept errors that still compile. And run until tomorrow ^_^.

A generic algoritm has absolutely no business caring about the complexity of the collection it's operating on. If it does, then you've created a concrete algoritm, not a generic one.

I appreciate your view point. Please allow me explain. The view point is in opposition with STL. In STL each algorithm defines what kind of iterator it operates with. And it requires what iterator complexity. I agree that other design can be made. But STL has that design. In my opinion is much part of what make STL so successful. I disagree that algorithm that knows complexity of iterator is concrete. I think exactly contrary. Maybe it is good that you read book about STL by Josuttis. STL algorithms are the most generic I ever find in any language. I hope std.algorithm in D will be better. But right now std.algorithm works only with array.
 If an algoritm uses [] and doesn't know the
 complexity of the []...good! It shouldn't know, and it shouldn't care. 
 It's
 the code that sends the collection to the algoritm that knows and cares.

I think this is mistake. Algorithm should know. Otherwise "linear find" is not "linear find"! It is "cuadratic find" (spell?). If you want to define something called linear find then you must know iterator complexity.

If a generic algorithm describes itself as "linear find" then I know damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n).

I think this is wrong. (Maybe I wake up moody! ^_^) Linear find that use another linear find each iteration is not linear find.
 A question about STL: If I create a collection that, internally, is like a 
 linked list, but starts each indexing operation from the position of the 
 last indexing operation (so that a "find first" would run in O(n) instead of 
 O(n*n)), is it possible to send that collection to STL's generic "linear 
 find first"? I would argue that it should somehow be possible *even* if the 
 STL's generic "linear find first" guarantees a *total* performance of O(n) 
 (Since, in this case, it would still be O(n) anyway). Because otherwise, the 
 STL wouldn't be very extendable, which would be a bad thing for a library of 
 "generic" algorithms.

Of course you can design bad collection and bad iterator. Let me ask this.

interface IUnknown
{
    void AddRef();
    void Release();
    int QueryInterface(IID*, void**);
}

Now I come and ask you. If I implement functions bad to do wrong things, can I use my class with COM? Maybe but I have leaks and other bad things. Compiler or STL can not enforce meaning of words. It only can give you a framework to express meanings correctly. Framework can be better or bad. You hide that nth element costs O(n) as detail. Then I can not write find or binary_search with your framework. Then I say STL better than your framework.
 Another STL question: It is possible to use STL to do a "linear find" using 
 a custom comparison? If so, it is possible to make STL's "linear find" 
 function use a comparison that just happens to be O(n)? If so, doesn't that 
 violate the linear-time guarantee, too? If not, how does it know that the 
 custom comparison is O(n) instead of O(1) or O(log n)?

An element of array does not have easy access to all array. But if you really want it can store it as a member or use a global array. So you can make find do O(n*n) or even more bad. But I think it is same mistake. If you can do something bad it does not mean framework is bad. The test is if you can do good thing easy. STL allows you to do good thing easy. Your framework makes doing good thing impossible.

Thank you, Dee Girl
Aug 28 2008
prev sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
superdan wrote:
 yeppers. amend that to o(log n). in d, that rule is a social contract derived
from the built-in vector and hash indexing syntax.

I see what you did thar -- you made up a rule you like and called it a "social contract". Whether it _should be_ a rule or not is debatable, but it is neither a written nor an unwritten rule in use right now, so what you said there is a lie.

First, a hash access is already time-unbounded. hash["hello"] where "hello" is not already in the hash will create a hash entry for "hello". This requires heap allocation, which can take arbitrarily long. So having unbounded opIndex is in the language already!

Second, opIndex can be used for things other than data structures. For example, if I had a handle to a folder that had a file "foo.txt" in it, folder["foo.txt"] seems a natural syntax to create a handle to that file (which allocates memory = time unbounded). I can see the opIndex syntax being used for things like properties that may require searching through a parse tree. Maybe this is sort of stretching it, but I wouldn't mind having the opIndex syntax as a shorthand for executing database queries, i.e. `auto result = db["SELECT * FROM posts WHERE from = 'superdan'"];`.

It's a shorthand syntax that makes no guarantees as far as complexity, nor should it.
Aug 27 2008
next sibling parent superdan <super dan.org> writes:
Robert Fraser Wrote:

 superdan wrote:
 yeppers. amend that to o(log n). in d, that rule is a social contract derived
from the built-in vector and hash indexing syntax.

I see what you did thar -- you made up a rule you like and called it a "social contract". Whether it _should be_ a rule or not is debatable, but it is neither a written nor unwritten rule in use right now, so what you said there is a lie.

well i'm exposed. good goin' johnny drama. in c++ it's written. in d it's not yet. lookin' at std.algorithm i have no doubt it will. so my lie is really a prophecy :D
 First, a hash access is already time unbounded. hash["hello"] where 
 "hello" is not already in the hash will create a hash entry for hello. 
 This requires heap allocation, which can take arbitrarily long. So 
 having unbounded opIndex is in the language already!

hash was always an oddball. it is acceptable because it offers constant time [] on average.
 Second, opIndex can be used for things other than data structures. For 
 example, if I had a handle to a folder that had a file "foo.txt" in it, 
 folder["foo.txt"] seems a natural syntax to create a handle to that file 
 (which allocates memory = time unbounded).

guess i wouldn't be crazy about it. but yeah it works no problem. s'pose there's a misunderstanding s'mewhere. i'm not against opIndex usage in various data structs. no problem! i am only against opIndex masquerading as random access in a collection. that would allow algos thinkin' they do some effin' good iteration. when in fact they do linear search each time they make a pass. completely throws the shit towards the fan.
 I can see the opIndex syntax 
 being used for things like properties that may require searching through 
 a parse tree. Maybe this is sort of stretching it, but I wouldn't mind 
 having the opIndex syntax as a shorthand for executing database queries, 
 i.e. `auto result = db["SELECT * FROM posts WHERE from = 'superdan']";`.
 
 It's a shorthand syntax that makes no guarantees as far as complexity 
 nor should it.

kinda cute, but 100% agree.
Aug 27 2008
prev sibling parent Sergey Gromov <snake.scaly gmail.com> writes:
Robert Fraser <fraserofthenight gmail.com> wrote:
 First, a hash access is already time unbounded. hash["hello"] where 
 "hello" is not already in the hash will create a hash entry for hello. 
 This requires heap allocation, which can take arbitrarily long. So 
 having unbounded opIndex is in the language already!

Hash's opIndex() throws an ArrayBoundsError if given an unknown key. It's opIndexAssign() which allocates. -- SnakE
Aug 28 2008
prev sibling next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
== Quote from Christopher Wright (dhasenan gmail.com)'s article
 WRONG!
 Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
 for this linked list.

My mistake. Merge sort, qsort, and heap sort are all O(n log n) for any list type that allows for efficient iteration (O(n) to go through a list of n elements, or for heap sort, O(n log n)) and O(1) appending (or, for heap sort, O(log n)). So even for a linked list, those three algorithms, which are probably the most common sorting algorithms used, will still be efficient. Unless the person who wrote them was braindead and used indexing to iterate rather than the class's defined opApply.
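The standard library itself illustrates Christopher's point: std::sort demands random-access iterators, so std::list ships its own sort(), a node-relinking merge sort that is O(n log n) with no indexing at all. A minimal demonstration:

```cpp
#include <list>

// std::list provides a member sort() precisely because the generic
// std::sort cannot accept its bidirectional iterators. The member
// version merge-sorts by splicing nodes: O(n log n), zero indexing.
std::list<int> sorted(std::list<int> l)
{
    l.sort();
    return l;
}
```

So a sensibly written merge sort stays efficient on a linked list, exactly as claimed above; only an index-per-element implementation degrades.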
Aug 26 2008
parent reply superdan <super dan.org> writes:
Christopher Wright Wrote:

 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 WRONG!
 Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
 for this linked list.

My mistake. Merge sort, qsort, and heap sort are all O(n log n) for any list type that allows for efficient iteration (O(n) to go through a list of n elements, or for heap sort, O(n log n)) and O(1) appending (or, for heap sort, O(log n)). So even for a linked list, those three algorithms, which are probably the most common sorting algorithms used, will still be efficient. Unless the person who wrote them was braindead and used indexing to iterate rather than the class's defined opApply.

sigh. your mistake indeed. just not where you thot. quicksort needs random access fer the pivot. not fer iterating. quicksort can't guarantee good runtime if pivot is first element. actually any of first k elements. on a forward iterator quicksort does quadratic time if already sorted or almost sorted.
Aug 26 2008
parent reply Christopher Wright <dhasenan gmail.com> writes:
superdan wrote:
 Christopher Wright Wrote:
 
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 WRONG!
 Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
 for this linked list.

that allows for efficient iteration (O(n) to go through a list of n elements, or for heap sort, O(n log n)) and O(1) appending (or, for heap sort, O(log n)). So even for a linked list, those three algorithms, which are probably the most common sorting algorithms used, will still be efficient. Unless the person who wrote them was braindead and used indexing to iterate rather than the class's defined opApply.

sigh. your mistake indeed. just not where you thot. quicksort needs random access fer the pivot. not fer iterating. quicksort can't guarantee good runtime if pivot is first element. actually any of first k elements. on a forward iterator quicksort does quadratic time if already sorted or almost sorted.

You need to pick a random pivot in order to guarantee that runtime, in fact. And you can do that in linear time, and you're doing a linear scan through the elements anyway, so you get the same asymptotic time. It's going to double your runtime at worst, if you chose a poor datastructure for the task.
Aug 26 2008
parent superdan <super dan.org> writes:
Christopher Wright Wrote:

 superdan wrote:
 Christopher Wright Wrote:
 
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 WRONG!
 Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
 for this linked list.

that allows for efficient iteration (O(n) to go through a list of n elements, or for heap sort, O(n log n)) and O(1) appending (or, for heap sort, O(log n)). So even for a linked list, those three algorithms, which are probably the most common sorting algorithms used, will still be efficient. Unless the person who wrote them was braindead and used indexing to iterate rather than the class's defined opApply.

sigh. your mistake indeed. just not where you thot. quicksort needs random access fer the pivot. not fer iterating. quicksort can't guarantee good runtime if pivot is first element. actually any of first k elements. on a forward iterator quicksort does quadratic time if already sorted or almost sorted.

You need to pick a random pivot in order to guarantee that runtime, in fact. And you can do that in linear time, and you're doing a linear scan through the elements anyway, so you get the same asymptotic time. It's going to double your runtime at worst, if you chose a poor datastructure for the task.

damn man you're right. yeah it's still o(n log n). i was wrong. 'pologies.
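A sketch of the fix the two posters converge on, in C++ (illustrative code, not from the thread): pick the pivot at random, partition, and recurse. The pivot choice costs at most one linear scan per level (O(1) here on a vector; O(n) if you had to walk a forward iterator), so the expected O(n log n) bound survives even on presorted input.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Quicksort with a random pivot. The three-way split via two
// std::partition calls keeps the middle (== pivot) band out of the
// recursion, which also guarantees termination on duplicate keys.
void quicksort(std::vector<int>& v, std::size_t lo, std::size_t hi)
{
    if (hi - lo < 2) return;
    int pivot = v[lo + std::rand() % (hi - lo)];   // random pivot choice
    auto mid1 = std::partition(v.begin() + lo, v.begin() + hi,
                               [=](int x) { return x < pivot; });
    auto mid2 = std::partition(mid1, v.begin() + hi,
                               [=](int x) { return x == pivot; });
    quicksort(v, lo, static_cast<std::size_t>(mid1 - v.begin()));
    quicksort(v, static_cast<std::size_t>(mid2 - v.begin()), hi);
}
```

With a fixed first-element pivot the same code would degrade to quadratic time on sorted input, which is the failure mode superdan originally described.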
Aug 26 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Aug 27, 2008 at 9:49 AM, Michiel Helvensteijn <nomail please.com> wrote:
 Dee Girl wrote:

 Yes, the first 'trick' makes it a different datastructure. The second
 does not. Would you still be opposed to using opIndex if its
 time-complexity is O(log n)?

This is different question. And tricks are not answer for problem. Problem is list has other access method than array.

And what's the answer?

The complexity of STL's std::map indexing operator is O(lg N). So it is not the case even in the STL that [] *always* means O(1). Plus, if the element is not found in the std::map when using [], it triggers an insertion, which can mean an allocation, which means the upper bound for time required for an index operation is whatever the upper bound for 'new' is on your system.

But std::map is kind of an oddball case. I think a lot of people are surprised to find that merely accessing an element can trigger allocation. Not a great design in my opinion, precisely because it fails to have the behavior one would expect out of an [] operator.

--bb
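Bill's observation is easy to demonstrate (an editor's sketch): a plain read through std::map's operator[] changes the map's size when the key is absent.

```cpp
#include <cstddef>
#include <map>
#include <string>

// map::operator[] is O(lg N) *and* default-inserts when the key is
// missing, so a "mere access" can allocate. (map::at(), by contrast,
// only looks up, and throws on a missing key.)
std::size_t size_after_bracket_read(std::map<std::string, int> m,
                                    const std::string& key)
{
    (void)m[key];      // read-looking access; silently inserts if absent
    return m.size();
}
```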
Aug 27 2008
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-08-25 21:56:18 -0400, Benji Smith <dlanguage benjismith.net> said:

 But if someone else, with special design constraints, needs to 
 implement a custom container template, it's no problem. As long as the 
 container provides a function for getting iterators to the container 
 elements, it can consume any of the STL algorithms too, even if the 
 performance isn't as good as the performance for a vector.

Indeed. But notice that the Standard Template Library containers don't use inheritance, but templates. You can create your own version of std::string by creating a different class and implementing the same functions, but then every function accepting a std::string would have to be a template capable of accepting either one as input, or changed to use your new string class. That's why std::find and std::for_each, like many others, are template functions: those would work with your custom string class.

The situation is no different in D: you can create your own string class or struct, but only functions taking your string class or struct, or template functions where the string type is a template argument, will be able to use it.

If your argument is that string functions in Phobos should be template functions accepting any kind of string as input, then that sounds reasonable to me. But that's not exactly what you said you wanted.
 There's no good reason the same technique couldn't provide both speed 
 and API flexibility for text processing.

This is absolutely right... but unfortunately, virtual dispatch (which interfaces in D imply) isn't the same technique as in the STL at all. Template algorithms parametrized on the container and iterator type are what the STL is all about, and from there comes its speed. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Aug 25 2008
parent reply superdan <super dan.org> writes:
Michel Fortin Wrote:

 On 2008-08-25 21:56:18 -0400, Benji Smith <dlanguage benjismith.net> said:
 
 But if someone else, with special design constraints, needs to 
 implement a custom container template, it's no problem. As long as the 
 container provides a function for getting iterators to the container 
 elements, it can consume any of the STL algorithms too, even if the 
 performance isn't as good as the performance for a vector.

Indeed. But notice that the Standard Template Library containers don't use inheritance, but templates. You can create your own version of std::string by creating a different class and implementing the same functions, but then every function accepting a std::string would have to be a template capable of accepting either one as input, or changed to use your new string class. That's why std::find and std::for_each, like many others, are template functions: those would work with your custom string class. The situation is no different in D: you can create your own string class or struct, but only functions taking your string class or struct, or template functions where the string type is a template argument, will be able to use it. If your argument is that string functions in Phobos should be template functions accepting any kind of string as input, then that sounds reasonable to me. But that's not exactly what you said you wanted.

perfect answer. u da man. for example look at this fn from std.string.

int cmp(C1, C2)(in C1[] s1, in C2[] s2);

so it looks like cmp accepts arrays of any character type. that is cool but the [] limits the thing to builtin arrays. the correct sig is

int cmp(S1, S2)(in S1 s1, in S2 s2)
    if (isSortaString!(S1) && isSortaString!(S2));

correct?
Aug 25 2008
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-08-25 22:52:52 -0400, superdan <super dan.org> said:

 int cmp(S1, S2)(in S1 s1, in S2 s2)
     if (isSortaString!(S1) && isSortaString!(S2));
 
 correct?

That's sorta what I had in mind. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Aug 25 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
superdan wrote:
 for example look at this fn from std.string.
 
 int cmp(C1, C2)(in C1[] s1, in C2[] s2);
 
 so it looks like cmp accepts arrays of any character type. that is
 cool but the [] limits the thing to builtin arrays. the correct sig
 is
 
 int cmp(S1, S2)(in S1 s1, in S2 s2) if (isSortaString!(S1) &&
 isSortaString!(S2));
 
 correct?

Yes. It's just that template constraints came along later than std.string.cmp :-)
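A sketch of how that constrained signature might be fleshed out, using a hypothetical isSortaString trait built from Phobos range primitives (the in qualifiers are dropped here so the ranges can be consumed):

```d
import std.range;   // isInputRange, ElementType, empty, front, popFront
import std.traits : isSomeChar;

// Hypothetical trait: anything iterable that yields characters.
enum isSortaString(S) = isInputRange!S && isSomeChar!(ElementType!S);

int cmp(S1, S2)(S1 s1, S2 s2)
    if (isSortaString!S1 && isSortaString!S2)
{
    while (!s1.empty && !s2.empty)
    {
        if (s1.front != s2.front)
            return s1.front < s2.front ? -1 : 1;
        s1.popFront();
        s2.popFront();
    }
    // On a common prefix, the shorter string sorts first.
    if (s1.empty && s2.empty) return 0;
    return s1.empty ? -1 : 1;
}

void main()
{
    assert(cmp("abc", "abd"w) < 0);  // mixed encodings are fine
    assert(cmp("abc"d, "abc") == 0);
    assert(cmp("abcd", "abc") > 0);
}
```

Since narrow strings auto-decode to dchar through the range primitives, char[], wchar[], and dchar[] arguments all compare character by character.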
Aug 26 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Benji Smith wrote:
 I don't know a whole lot about the STL,

STL is a piece of brilliance in C++ (and one can reasonably argue that STL saved C++). The design of STL solves the problems you are talking about. Andrei has been hard at work getting equivalent functionality into the D library (see http://www.digitalmars.com/d/2.0/phobos/std_algorithm.html).
Aug 26 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Benji Smith wrote:
 But in this "systems language", it's a O(n) operation to get the nth 
 character from a string, to slice a string based on character offsets, 
 or to determine the number of characters in the string.
 
 I'd gladly pay the price of a single interface vtable lookup to turn all 
 of those into O(1) operations.

I've written internationalized applications that dealt with multibyte utf strings. It looks like one would regularly need all those operations, but interestingly it just doesn't come up. It turns out that one needs to slice with the byte offset, or get the byte length, or get the nth byte. In the very rare case where one wants to do it with characters, one seems to already have the right offsets at hand. If you choose to use dchar's instead, there is a 1:1 mapping between characters and indices, and it doesn't cost you any class overhead. It's also a simple conversion from UTF-8 <==> UTF-32. I can't think of a scenario where using classes would produce any performance advantage.
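The dchar route fits in a few lines: to!dstring transcodes the UTF-8 data into one dchar per character, after which indexing and slicing by character offset are O(1). A minimal sketch:

```d
import std.conv : to;

void main()
{
    string s = "naïve";          // UTF-8: the 'ï' takes two bytes
    assert(s.length == 6);       // length counts code units, not characters

    dstring d = to!dstring(s);   // one dchar per character
    assert(d.length == 5);
    assert(d[2] == 'ï');         // O(1) indexing, 1:1 mapping
    assert(d[0 .. 3] == "naï"d); // slicing by character offset
}
```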
Aug 26 2008
prev sibling next sibling parent "Lionello Lunesu" <lionello lunesu.remove.com> writes:
"superdan" <super dan.org> wrote in message 
news:g8vh9b$fko$1 digitalmars.com...
 Benji Smith Wrote:
 No. Of course not. The compiler complains that you can't concatenate a
 dchar to a char[] array. Even though the "find" functions indicate that
 the array is truly a collection of dchar elements.

that's a bug in the compiler. report it.

I did, a long time ago. #111 if I'm not mistaken. L.
Aug 25 2008
prev sibling next sibling parent Benji Smith <dlanguage benjismith.net> writes:
superdan wrote:
 Benji Smith Wrote:
 A char[] is actually an array of UTF-8 encoded octets, where each 
 character may consume one or more consecutive elements of the array. 
 Retrieving the str.length property may or may not tell you how many 
 characters are in the string. And pretty much any code that tries to 
 iterate character-by-character through the array elements is 
 fundamentally broken.

try this: foreach (dchar c; str) { process c }

Cool. I had no idea that was possible. I was doing this:

void myFunction(T)(T[] array) {
    foreach (T c; array) {
        doStuff(c);
    }
}

--benji
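The difference between the two loops is worth spelling out: foreach over char walks the raw UTF-8 code units, while foreach (dchar c; str) decodes whole characters on the fly. A minimal sketch:

```d
void main()
{
    string s = "héllo";          // 5 characters, 6 UTF-8 code units

    // foreach over char walks raw code units...
    size_t units;
    foreach (char c; s) ++units;
    assert(units == 6);

    // ...while foreach over dchar decodes whole characters.
    size_t chars;
    foreach (dchar c; s) ++chars;
    assert(chars == 5);
}
```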
Aug 26 2008
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
[snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death:

auto n = collection.at(i);
auto len = collection.length();

but index operations and property getters should be real-time and have O(1) complexity by design:

auto n = collection[i];
auto len = collection.length;
Aug 26 2008
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 26 Aug 2008 23:58:10 +0400, Denis Koroskin <2korden gmail.com>  
wrote:

 On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
 [snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing  
 class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death: auto n = collection.at(i); auto len = collection.length(); but index operations and property getters should be real-time and have O(1) complexity by design. auto n = collection[i]; auto len = collection.length;

The same goes for assignment, casts, comparisons, shifts, i.e. everything that doesn't have a function invocation syntax. BTW, that's one of the main C++ criticisms: you can't say how much time a given line may take. It is predictable in C because it lacks operator overloading.
Aug 26 2008
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 27 Aug 2008 00:30:07 +0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 "Denis Koroskin" wrote
 On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
 [snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing  
 class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death: auto n = collection.at(i); auto len = collection.length(); but index operations and property getters should be real-time and have O(1) complexity by design. auto n = collection[i]; auto len = collection.length;

less than O(n) complexity please :) Think of tree map complexity which is usually O(lg n) for lookups. And the opIndex syntax is sooo nice for maps :) In general, opIndex just shouldn't imply 'linear search', as its roots come from array lookup, which is always O(1). The perception is that x[n] should be fast. Otherwise you have coders using x[n] all over the place thinking they are doing quick lookups, and wondering why their code is so damned slow. -Steve
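One way to honor that perception in user code: a container whose lookup is linear can expose a named method instead of opIndex, so the O(n) walk is visible at the call site. A sketch with a hypothetical singly linked list:

```d
// Hypothetical singly linked list that deliberately offers
// nth() instead of opIndex, so the O(n) walk is explicit.
struct SList(T)
{
    static struct Node { T value; Node* next; }
    Node* head;

    void prepend(T v) { head = new Node(v, head); }

    T nth(size_t i)                  // O(n): walks the list
    {
        auto n = head;
        foreach (_; 0 .. i) n = n.next;
        return n.value;
    }
}

void main()
{
    SList!int list;
    foreach (v; [3, 2, 1]) list.prepend(v);   // list is now 1 -> 2 -> 3
    assert(list.nth(0) == 1);
    assert(list.nth(2) == 3);
}
```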

Yes, that was a rash statement.
Aug 26 2008
parent superdan <super dan.org> writes:
Denis Koroskin Wrote:

 On Wed, 27 Aug 2008 00:30:07 +0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 
 "Denis Koroskin" wrote
 On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super dan.org> wrote:
 [snip]
 The D spec certainly doesn't make any guarantees about
 the time/memory complexity of opIndex; it's up to the implementing  
 class
 to do so.

it don't indeed. it should. that's a problem with the spec.

I agree. You can't rely on function invocation, i.e. the following might be slow as death: auto n = collection.at(i); auto len = collection.length(); but index operations and property getters should be real-time and have O(1) complexity by design. auto n = collection[i]; auto len = collection.length;

less than O(n) complexity please :) Think of tree map complexity which is usually O(lg n) for lookups. And the opIndex syntax is sooo nice for maps :) In general, opIndex just shouldn't imply 'linear search', as its roots come from array lookup, which is always O(1). The perception is that x[n] should be fast. Otherwise you have coders using x[n] all over the place thinking they are doing quick lookups, and wondering why their code is so damned slow. -Steve

Yes, that was a rash statement.

i'm kool & the gang with log n too. that's like proportional 2 the count of digits in n. undecided about sublinear. like o(n^.5). guess that would be pushin' it. but they come by rarely so why bother makin' a decision :)
Aug 26 2008
prev sibling parent reply "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
Michiel Helvensteijn <nomail please.com> wrote:

 a[i] looks much nicer than a.nth(i).

To me, this is one of the most important points here. I want a language that seems to make sense, more than I want a language that is by default very fast. When I write a short example program, I want to write a[i] not a.getElementAtPosition(i). D is known as a language that does the safe thing by default, and you have to jump through some hoops to do the fast, unsafe thing. I will claim that a[i] is the default, as it is what we're used to, and looks better. a.nth(i), a.getElementAtPosition(i) and whatever other ways one might come up with, is jumping through hoops. Just my 0.02 kr. -- Simen
Aug 27 2008
parent superdan <super dan.org> writes:
Simen Kjaeraas Wrote:

 Michiel Helvensteijn <nomail please.com> wrote:
 
 a[i] looks much nicer than a.nth(i).

To me, this is one of the most important points here. I want a language that seems to make sense, more than I want a language that is by default very fast. When I write a short example program, I want to write a[i] not a.getElementAtPosition(i).

sure. you do so with arrays. i think you confuse "optimized-fast" with "complexity-fast".
 D is known as a language that does the safe thing by default,
 and you have to jump through some hoops to do the fast, unsafe
 thing. I will claim that a[i] is the default, as it is what
 we're used to, and looks better. a.nth(i),
 a.getElementAtPosition(i) and whatever other ways one might
 come up with, is jumping through hoops.

guess i missed the lesson teachin' array indexing was unsafe. you are looking at them wrong tradeoffs. it's not about slow and safe vs. fast and unsafe. safety's nothin' to do with all this. you're lookin' at bad design vs. good design of algos and data structs.
Aug 27 2008
prev sibling next sibling parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 In another thread (about array append performance) I mentioned that 
 Strings ought to be implemented as classes rather than as simple builtin
 arrays. Superdan asked why. Here's my response...

well then allow me to retort.
 I'll start with a few of the softball, easy reasons.
 
 For starters, with strings implemented as character arrays, writing 
 library code that accepts and operates on strings is a bit of a pain in 
 the neck, since you always have to write templates and template code is 
 slightly less readable than non-template code. You can't distribute your 
 code as a DLL or a shared object, because the template instantiations 
 won't be included (unless you create wrapper functions with explicit 
 template instantiations, bloating your code size, but more importantly 
 tripling the number of functions in your API).

so u mean with a class the encoding char/wchar/dchar won't be an issue anymore. that would be hidden behind the wraps. cool. problem is that means there's an indirection cost for every character access. oops. so then apps that decided to use some particular encoding consistently must pay a price for stuff they don't use. but if u have strings like today it's a no-brainer to define a class that does all that stuff. u can then use that class whenever you feel. it would be madness to put that class in the language definition. at best it's a candidate for the stdlib. so that low-hangin' argument of yers ain't that low-hangin' after all. unless u call a hanged deadman low-hangin'.
 Another good low-hanging argument is that strings are frequently used as 
 keys in associative arrays. Every insertion and retrieval in an 
 associative array requires a hashcode computation. And since D strings 
 are just dumb arrays, they have no way of memoizing their hashcodes. 
 We've already observed that D assoc arrays are less performant than even 
 Python maps, so the extra cost of lookup operations is unwelcome.

again you want to build larger components from smaller components. you can build a string with memoized hashcode from a string without memoized hashcode. but you can't build a string without memoized hashcode from a string with memoized hashcode. but wait there's more. the extra field is paid for regardless. so what numbers do you have to back up your assertion that it's worth paying that cost for everything except hashtables.
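Building the memoizing variant on top of the builtin string is indeed a few lines of user code. A sketch with a hypothetical HashedString type (a real AA key would additionally need a const toHash, so this only shows the caching idea):

```d
// Hypothetical wrapper that memoizes the builtin string's hash.
struct HashedString
{
    string data;
    private size_t cachedHash;
    private bool hashed;

    size_t toHash()
    {
        if (!hashed)                 // compute once, reuse afterwards
        {
            cachedHash = typeid(string).getHash(&data);
            hashed = true;
        }
        return cachedHash;
    }

    bool opEquals(const HashedString rhs) const
    {
        return data == rhs.data;
    }
}

void main()
{
    auto s = HashedString("hello");
    immutable h1 = s.toHash();       // computed here
    immutable h2 = s.toHash();       // served from the cache
    assert(h1 == h2);
    assert(s == HashedString("hello"));
}
```

Note the trade-off superdan raises is visible in the layout: every HashedString carries the extra hash and flag fields whether or not it ever lands in a hashtable.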
 But much more important than either of those reasons is the lack of 
 polymorphism on character arrays. Arrays can't have subclasses, and they 
 can't implement interfaces.

that's why you can always define a class that does all those good things. by the same arg why isn't int a class. the point is you can always create class Int that does what an int does, slower but more flexible. if all you had was class Int you'd be in slowland.
 A good example of what I'm talking about can be seen in the Phobos and 
 Tango regular expression engines. At least the Tango implementation 
 matches against all string types (the Phobos one only works with char[] 
 strings).
 
 But what if I want to consume a 100 MB logfile, counting all lines that 
 match a pattern?

 Right now, to use the either regex engine, I have to read the entire 
 logfile into an enormous array before invoking the regex search function.
 
 Instead, what if there was a CharacterStream interface? And what if all 
 the text-handling code in Phobos & Tango was written to consume and 
 return instances of that interface?

what exactly is the problem there aside from a library issue.
 A regex engine accepting a CharacterStream interface could process text 
 from string literals, file input streams, socket input streams, database 
 records, etc, etc, etc... without having to pollute the API with a bunch 
 of casts, copies, and conversions. And my logfile processing application 
 would consume only a tiny fraction of the memory needed by the character 
 array implementation.

library problem. or maybe you want to build character stream into the language too.
 Most importantly, the contract between the regex engine and its 
 consumers would provide a well-defined interface for processing text, 
 regardless of the source or representation of that text.
 
 Along a similar vein, I've worked on a lot of parsers over the past few 
 years, for domain specific languages and templating engines, and stuff 
 like that. Sometimes it'd be very handy to define a "Token" class that 
 behaves exactly like a String, but with some additional behavior. 
 Ideally, I'd like to implement that Token class as an implementor of the 
 CharacterStream interface, so that it can be passed directly into other 
 text-handling functions.
 
 But, in D, with no polymorphic text handling, I can't do that.

of course you can. you just don't want to for the sake of building a fragile argument.
 As one final thought... I suspect that mutable/const/invariant string 
 handling would be much more conveniently implemented with a 
 MutableCharacterStream interface (as an extended interface of 
 CharacterStream).
 
 Any function written to accept a CharacterStream would automatically 
 accept a MutableCharacterStream, thanks to interface polymorphism, 
 without any casts, conversions, or copies. And various implementors of 
 the interface could provide buffered implementations operating on 
 in-memory strings, file data, or network data.
 
 Coding against the CharacterStream interface, library authors wouldn't 
 need to worry about const-correctness, since the interface wouldn't 
 provide any mutator methods.

sounds great. so then go ahead and make the characterstream thingie. the language gives u everything u need to make it clean and fast.
 But then again, I haven't used any of the const functionality in D2, so 
 I can't actually comment on relative usability of compiler-enforced 
 immutability versus interface-enforced immutability.
 
 Anyhow, those are some of my thoughts... I think there are a lot of 
 compelling reasons for de-coupling the specification of string handling 
 functionality from the implementation of that functionality, primarily 
 for enabling polymorphic text-processing.
 
 But memoized hashcodes would be cool too :-)

sorry dood each and every argument talks straight against your case. if i had any doubts, you just convinced me that a builtin string class would be a mistake.
Aug 25 2008
next sibling parent BCS <ao pathlink.com> writes:
Reply to superdan,

 sorry dood each and every argument talks straight against your case.
 if i had any doubts, you just convinced me that a builtin string class
 would be a mistake.
 

OTOH as standard lib string class...
Aug 25 2008
prev sibling next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Sorry, dan. You're wrong.

superdan wrote:
 again you want to build larger components from smaller components.

Good APIs define public interfaces and hide implementation details, usually providing a default general-purpose implementation while allowing third-parties to define special-purpose implementations to suit their needs.

In D, the text handling is defined by the data format (unicode byte sequences) rather than by the interface, while providing no polymorphic mechanism for alternate implementations.

It's the opposite of good API design.

The regular expression engine accepts character arrays. And that's it. You want to match on a regex pattern, character-by-character, from an input stream? Tough nuts. It's not possible.

The new JSON parser in the Tango library operates on templated string arrays. If I want to read from a file or a socket, I have to first slurp the whole thing into a character array, even though the character-streaming would be more practical.

Parsers, formatters, console IO, socket IO... Anything that provides an iterable sequence of characters ought to comply with an interface facilitating polymorphic text processing. In some cases, there might be a slight memory/speed tradeoff. But in many more cases, iterating over the transient characters in a stream would be much faster and require a tiny fraction of the memory of the character array. There are performance benefits to be found on both sides of the coin.

Anyhow, I totally agree with you when you say that "larger components" should be built from "smaller components". But the "small components" are the *interfaces*, not the implementation details.

--benji
Aug 25 2008
next sibling parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 Sorry, dan. You're wrong.

well just sayin' it ain't makin' it. hope there's some in the way of a proof henceforth.
 superdan wrote:
 again you want to build larger components from smaller components.

Good APIs define public interfaces and hide implementation details, usually providing a default general-purpose implementation while allowing third-parties to define special-purpose implementations to suit their needs.

sure thing. all this gave me a warm fuzzy feelin' right there.
 In D, the text handling is defined by the data format (unicode byte 
 sequences) rather than by the interface, while providing no polymorphic 
 mechanism for alternate implementations.
 
 It's the opposite of good API design.

wait a minute. you got terms all mixed up. and that's a recurring problem that mashes your argument badly. first you say `in d'. so then i assume there's a problem in the language. but then you go on with what looks like a library thing. sure enuff you end with `api' which is 100% a library thingy. so if you have any beef say `in phobos' or `in tango'. `in d' must refer to the language proper.

the core lang is supposed to give you the necessary stuff to do that nice api design that gave me half an erection in ur first paragraph. if the language gives me the classes and apis and stuff then things will be slow for everyone and no viagra and no herbal supplement ain't gonna make stuff hard around here.

i think d strings are the cleverest thing yet. not too low level like pointers. not too high level like classes and stuff. just the right size. pun not intended.
 The regular expression engine accepts character arrays. And that's it. 
 You want to match on a regex pattern, character-by-character, from an 
 input stream? Tough nuts. It's not possible.

there are multiple problems with this argument of yours. first it's that you swap horses in the midstream. that, according to the cowboy proverb, is unrecommended. you switch from strings to streams. don't. about streams. same in perl. it's a classic problem. you must first read as much text as you know could match. then you match. and nobody complains. what you say is possible. it hasn't been written that way 'cause it's damn hard. motherintercoursin' hard. regexes must backtrack and when they do they need pronto access to the stuff behind. if that's in a file, yeah `tough nuts' eh.
 The new JSON parser in the Tango library operates on templated string 
 arrays. If I want to read from a file or a socket, I have to first slurp 
 the whole thing into a character array, even though the 
 character-streaming would be more practical.

non sequitur. see stuff above.
 Parsers, formatters, console IO, socket IO... Anything that provides an 
 iterable sequence of characters ought to comply with an interface 
 facilitating polymorphic text processing. In some cases, there might be 
 a slight memory/speed tradeoff. But in many more cases, the benefit of 
 iterating over the transient characters in a stream would be much faster 
 and require a tiny fraction of the memory of the character array.

so what u want is a streaming interface and an implementation of it that spans a string. for the mother of god i can't figure what this has to do with the language, and what stops you from writing it.
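For the record, the interface-plus-implementation being asked for fits in a page of user code, with no language change. A sketch with hypothetical CharacterStream / StringStream names:

```d
// Hypothetical streaming interface, plus one implementation
// that merely spans an in-memory string.
interface CharacterStream
{
    bool empty();
    dchar front();
    void popFront();
}

class StringStream : CharacterStream
{
    private dstring data;   // dstring keeps front() O(1)
    private size_t pos;

    this(dstring s) { data = s; }

    bool empty() { return pos >= data.length; }
    dchar front() { return data[pos]; }
    void popFront() { ++pos; }
}

// A consumer coded against the interface works with any source.
size_t countMatching(CharacterStream s, dchar target)
{
    size_t n;
    for (; !s.empty; s.popFront())
        if (s.front == target) ++n;
    return n;
}

void main()
{
    assert(countMatching(new StringStream("banana"d), 'a') == 3);
}
```

A file-backed or socket-backed class implementing the same interface could be dropped into countMatching unchanged; that virtual dispatch per character is exactly the cost the rest of the thread argues about.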
 There are performance benefits to be found on both sides of the coin.

no. all benefits only of my side of the coin. my side of the coin includes your side of the coin.
 Anyhow, I totally agree with you when you say that "larger components" 
 should be built from "smaller components".

cool.
 But the "small components" are the *interfaces*, not the implementation 
 details.

quite when i thought i drove a point home... dood we need to talk. you have all core language, primitives, libraries, and app code confused.
Aug 25 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
superdan wrote:
 But the "small components" are the *interfaces*, not the implementation 
 details.

quite when i thought i drove a point home... dood we need to talk. you have all core language, primitives, libraries, and app code confused.

The standard libraries are in a grey area between the language spec and application code. There are all sorts of implicit "interfaces" exposed by the builtin types (and there's also plenty of core language functionality implemented in the standard lib... take the GC, for example). You act as if there's no such thing as an interface for a builtin language feature. With strings implemented as raw arrays, they take on the array API...

slicing: broken
indexing: busted
iterating: fucked
length: you guessed it

I don't think the internals of the string representation should be any different. UTF-8 arrays? Fine by me. Just don't make me look at the malformed, mis-sliced bytes. Provide an API (yes, implemented in the standard lib, but specified by the language spec) that actually makes sense for text data.

(Incidentally, this is the same reason I think the builtin dynamic arrays should be classes implementing a standard List interface, and the associative arrays should be classes implementing a Map interface. The language implementations are nice, but they're not polymorphic, and that makes it a pain in the ass to extend them.)

--benji
Aug 25 2008
parent reply BCS <ao pathlink.com> writes:
Reply to Benji,

 slicing: broken

works as defined
 indexing: busted

works as defined
 iterating: fucked

works as defined and with foreach(dchar) as you want
 length: you guessed it

works as defined
 Provide an API (yes, implemented in the
 standard lib, but specified by the language spec) that actually makes
 sense for text data.

BTW everything in phobos under std.* is /not part of the D language spec/.
 
 (Incidentally, this is the same reason I think the builtin dynamic
 arrays should be classes implementing a standard List interface, and
 the associative arrays should be classes implementing a Map interface.
 The language implementations are nice, but they're not polymorphic,
 and that makes it a pain in the ass to extend them.)
 

A system language MUST have arrays that are not classes or anything near that thick. If you must have that sort of interface, pick a different language, because D isn't intended to work that way.
 --benji
 

Aug 25 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
BCS:
 If you must have that sort of interface, pick a different language, 
 because D isn't intended to work that way.

I suggest Benji try C# 3+; despite all the problems it has and the borg-like nature of such software, etc, it will be used way more than D, and it has all the nice things Benji asks for. Bye, bearophile
Aug 25 2008
next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
bearophile wrote:
 BCS:
 If you must have that sort of interface, pick a different language, 
 because D isn't intended to work that way.

I suggest Benji to try C# 3+, despite all the problems it has and the borg-like nature of such software, etc, it will be used way more than D, and it has all the nice things Benji asks for. Bye, bearophile

Yep, I like C# a lot. I think it's very well-designed, with the language and libraries dovetailing nicely together. I'm using D on my current project because I need to distribute libraries on both windows and linux, with C-linkage. And D is a helluva lot more pleasant than C/C++, even if there is a lot about D that I find lacking. --benji
Aug 25 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Benji Smith:
 Yep, I like C# a lot. I think it's very well-designed, with the language 
 and libraries dovetailing nicely together.

In the past I have said that C# 3.5/4 has some small ideas that D may enjoy copying. But probably having a complex coherent OOP structure from the bottom up isn't one of them. You must understand that D is lower level than C#, which means it's designed for people that like to suffer more :-) D is designed mostly for people coming from C and C++, and it must be fit to be used procedurally/functionally without any OOP too. So D isn't C#, and this means what you ask isn't a great fit for it. Note that the situation isn't set in stone: some time ago, for example, there was a person who wanted to program like in Python on the dot net platform and was unhappy with C#. He created the Boo language. It's not widespread, and it has a few small design mistakes, but overall it's not a bad language; it's quite usable for its purposes. So you can create your own language fit for your purposes... Do you know the Vala language? It looks like C#, but compiles to C... it's probably still in beta, but it may be closer to your dream language. Another approach you may follow is to reinvent just the standard library/runtime of D to make it look more like the C# you like :-) Seen from outside, Tango already seems closer to the Java std lib than Phobos is (but I may be wrong). I like Python, so I am writing a large lib that no one else uses that has partially the purpose of making D look like Python :-) Bye, bearophile
Aug 25 2008
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:g8vmda$sd4$1 digitalmars.com...
 BCS:
 If you must have that sort of interface, pick a different language,
 because D isn't intended to work that way.

I suggest Benji to try C# 3+, despite all the problems it has and the borg-like nature of such software, etc, it will be used way more than D, and it has all the nice things Benji asks for.

(pet peeve) As much as there is that I like about C#, the lack of an IArithmetic or operator constraints tends to gimp its template system in a number of cases.
Aug 26 2008
parent reply BCS <ao pathlink.com> writes:
Reply to Nick,

 "bearophile" <bearophileHUGS lycos.com> wrote in message
 news:g8vmda$sd4$1 digitalmars.com...
 
 BCS:
 
 If you must have that sort of interface, pick a different language,
 because D isn't intended to work that way.
 

borg-like nature of such software, etc, it will be used way more than D, and it has all the nice things Benji asks for.

IArithmetic or operator constraints tends to gimp its template system in a number of cases.

C# generics are *Crippled*. They more or less do nothing but map types around.
Aug 26 2008
parent Christopher Wright <dhasenan gmail.com> writes:
BCS wrote:
 Reply to Nick,
 
 "bearophile" <bearophileHUGS lycos.com> wrote in message
 news:g8vmda$sd4$1 digitalmars.com...

 BCS:

 If you must have that sort of interface, pick a different language,
 because D isn't intended to work that way.

borg-like nature of such software, etc, it will be used way more than D, and it has all the nice things Benji asks for.

IArithmetic or operator constraints tends to gimp its template system in a number of cases.

C# generics are *Crippled*. They more or less do nothing but map types around.

Yes, but oh, the syntax!
Aug 26 2008
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to Benji,


 The new JSON parser in the Tango library operates on templated string
 arrays. If I want to read from a file or a socket, I have to first
 slurp the whole thing into a character array, even though the
 character-streaming would be more practical.
 

Unless you are only going to parse the start of the file, or are going to be throwing away most of it *while you parse it, not after*. The best way to parse a file is to load it all in one OS system call and then run a slicing parser (like the Tango XML parser) on that. One memory allocation and one load or an mmap, and then only the meta structures get allocated later.
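The key property of a slicing parser is that its tokens are views into the one loaded buffer, not copies. A minimal sketch, with an in-memory string standing in for the slurped file contents:

```d
void main()
{
    string text = "key=value";     // stand-in for the loaded file buffer

    // A slicing "parser": the tokens are slices, so no copies are made.
    string key = text[0 .. 3];
    string val = text[4 .. $];

    assert(key == "key");
    assert(val == "value");
    assert(key.ptr == text.ptr);   // same underlying buffer, zero extra allocation
}
```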
Aug 25 2008
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
BCS wrote:
 Reply to Benji,
 
 
 The new JSON parser in the Tango library operates on templated string
 arrays. If I want to read from a file or a socket, I have to first
 slurp the whole thing into a character array, even though the
 character-streaming would be more practical.

Unless you are only going to parse the start of the file, or are going to be throwing away most of it *while you parse it, not after*, the best way to parse a file is to load it all in one OS system call and then run a slicing parser (like the Tango XML parser) on that. One memory allocation and one load or a mmap, and then only the meta structures get allocated later.

There are cases where you might want to parse an XML file that won't fit easily in main memory. I think a stream-processing SAX parser would be a good addition to (though perhaps not a replacement for) the existing one.
Aug 25 2008
parent reply BCS <ao pathlink.com> writes:
Reply to Robert,

 BCS wrote:
 
 Reply to Benji,
 
 The new JSON parser in the Tango library operates on templated
 string arrays. If I want to read from a file or a socket, I have to
 first slurp the whole thing into a character array, even though the
 character-streaming would be more practical.
 

to be throwing away most of it *while you parse it, not after* The best way to parse a file is to load it all in one OS system call and then run a slicing parser (like the Tango XML parser) on that. One memory allocation and one load or a mmap, and then only the meta structures get allocated later.

fit easily in main memory. I think a stream-processing SAX parser would be a good addition to (though perhaps not a replacement for) the existing one.

If you can't fit the data file in memory, then I find it hard to believe you will be able to hold the parsed file in memory. If you can program the parser to dump unneeded data on the fly, or to process and discard the data, that might make a difference.
Aug 26 2008
next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
BCS wrote:
 Reply to Robert,
 
 BCS wrote:

 Reply to Benji,

 The new JSON parser in the Tango library operates on templated
 string arrays. If I want to read from a file or a socket, I have to
 first slurp the whole thing into a character array, even though the
 character-streaming would be more practical.

to be throwing away most of it *while you parse it, not after* The best way to parse a file is to load it all in one OS system call and then run a slicing parser (like the Tango XML parser) on that. One memory allocation and one load or a mmap, and then only the meta structures get allocated later.

fit easily in main memory. I think a stream-processing SAX parser would be a good addition to (though perhaps not a replacement for) the existing one.

If you can't fit the data file in memory, then I find it hard to believe you will be able to hold the parsed file in memory. If you can program the parser to dump unneeded data on the fly, or to process and discard the data, that might make a difference.

Well, for something like a DOM parser, it's pretty much impossible to parse a file that won't fit into memory. But a SAX parser doesn't actually create any objects. It just calls events, while processing XML data from a stream. A good SAX parser can operate without ever allocating anything on the heap, leaving the consumer to create any necessary objects from the parse process. --benji
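A sketch of what Benji is describing: a SAX-style event interface plus a toy tag scanner. All names here are made up for illustration (this is not a real XML parser; it skips attributes, entities, and error handling). The driver hands callbacks slices of its input and allocates nothing on the heap itself.

```d
// Hypothetical SAX-style event interface; the consumer decides what,
// if anything, to allocate.
interface SaxHandler
{
    void onStartElement(const(char)[] name);
    void onEndElement(const(char)[] name);
    void onText(const(char)[] text);
}

// Toy driver: fires events while streaming over <a>hi</a>-style input.
void parse(const(char)[] input, SaxHandler h)
{
    size_t i;
    while (i < input.length)
    {
        if (input[i] == '<')
        {
            bool close = i + 1 < input.length && input[i + 1] == '/';
            size_t start = close ? i + 2 : i + 1;
            size_t j = start;
            while (j < input.length && input[j] != '>') ++j;
            if (close) h.onEndElement(input[start .. j]);
            else       h.onStartElement(input[start .. j]);
            i = j + 1;
        }
        else
        {
            size_t j = i;
            while (j < input.length && input[j] != '<') ++j;
            h.onText(input[i .. j]);
            i = j;
        }
    }
}

// Example consumer: counts opening tags, creates no objects per event.
class TagCounter : SaxHandler
{
    int tags;
    void onStartElement(const(char)[] name) { ++tags; }
    void onEndElement(const(char)[] name) {}
    void onText(const(char)[] text) {}
}

void main()
{
    auto h = new TagCounter;
    parse("<a>hi<b>x</b></a>", h);
    assert(h.tags == 2);
}
```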
Aug 26 2008
parent reply BCS <ao pathlink.com> writes:
Reply to Benji,

 BCS wrote:
 
 Reply to Robert,
 
 BCS wrote:
 
 Reply to Benji,
 
 The new JSON parser in the Tango library operates on templated
 string arrays. If I want to read from a file or a socket, I have
 to first slurp the whole thing into a character array, even though
 the character-streaming would be more practical.
 

going to be throwing away most of it *while you parse it, not after* The best way to parse a file is to load it all in one OS system call and then run a slicing parser (like the Tango XML parser) on that. One memory allocation and one load or a mmap, and then only the meta structures get allocated later.

fit easily in main memory. I think a stream-processing SAX parser would be a good addition to (though perhaps not a replacement for) the existing one.

believe you will be able to hold the parsed file in memory. If you can program the parser to dump unneeded data on the fly or process and discard the data, that might make a difference.

parse a file that won't fit into memory. But a SAX parser doesn't actually create any objects. It just calls events, while processing XML data from a stream. A good SAX parser can operate without ever allocating anything on the heap, leaving the consumer to create any necessary objects from the parse process. --benji

Interesting, I've worked with parsers* that function something like that but never thought of them in that way. OTOH I can think of only a very limited domain where this would be useful. If I needed to process that much data I'd load it into a database and go from there. *In fact my parser generator could be used that way.
Aug 26 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
BCS wrote:
 Reply to Benji,
 Well, for something like a DOM parser, it's pretty much impossible to
 parse a file that won't fit into memory. But a SAX parser doesn't
 actually create any objects. It just calls events, while processing
 XML data from a stream. A good SAX parser can operate without ever
 allocating anything on the heap, leaving the consumer to create any
 necessary objects from the parse process.

 --benji

Interesting, I've worked with parsers* that function something like that but never thought of them in that way. OTOH I can think of only a very limited domain where this would be useful. If I needed to process that much data I'd load it into a database and go from there. *In fact my parser generator could be used that way.

In fact, that's one of the places where I've used this kind of parsing technique before. I wrote a streaming CSV parser (which takes discipline to do correctly, since a double-quote-enclosed field can legally contain arbitrary newline characters, and quotes are escaped by doubling). It provides a field callback and a record callback, so it's very handy for performing ETL tasks. If I had to load whole CSV files into memory before parsing, it wouldn't work, because sometimes they can be hundreds of megabytes. But the streaming parser takes up almost no memory at all. --benji
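A streaming CSV parser of that shape can be sketched as a small state machine. This is an illustration, not Benji's actual parser: the only state carried along is a three-value enum plus the current field buffer, quoted fields may span newlines, and "" inside quotes yields a literal quote. (It ignores '\r' and assumes the input arrives as one chunk.)

```d
enum State { Unquoted, Quoted, QuoteInQuoted }

// Callback-driven CSV scanner: onField fires per field, onRecord per row.
void parseCsv(const(char)[] chunk,
              void delegate(const(char)[]) onField,
              void delegate() onRecord)
{
    State s = State.Unquoted;
    char[] field;
    void flushField() { onField(field); field.length = 0; }

    foreach (c; chunk)
    {
        final switch (s)
        {
        case State.Unquoted:
            if (c == '"')       s = State.Quoted;
            else if (c == ',')  flushField();
            else if (c == '\n') { flushField(); onRecord(); }
            else                field ~= c;
            break;
        case State.Quoted:
            if (c == '"') s = State.QuoteInQuoted;
            else          field ~= c; // newlines are legal in here
            break;
        case State.QuoteInQuoted:
            if (c == '"') { field ~= '"'; s = State.Quoted; } // "" escape
            else if (c == ',')  { flushField(); s = State.Unquoted; }
            else if (c == '\n') { flushField(); onRecord(); s = State.Unquoted; }
            break;
        }
    }
    if (field.length) { flushField(); onRecord(); } // no trailing newline
}

void main()
{
    const(char)[][] fields;
    int records;
    parseCsv("1,\"he said \"\"hi\"\",\nok\",2\n",
             (const(char)[] f) { fields ~= f.dup; },
             () { ++records; });
    assert(fields.length == 3 && records == 1);
    assert(fields[1] == "he said \"hi\",\nok");
}
```

Note the callback dups the slice it is handed, since the parser reuses its field buffer; a consumer that only inspects fields could skip even that allocation.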
Aug 26 2008
parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 BCS wrote:
 Reply to Benji,
 Well, for something like a DOM parser, it's pretty much impossible to
 parse a file that won't fit into memory. But a SAX parser doesn't
 actually create any objects. It just calls events, while processing
 XML data from a stream. A good SAX parser can operate without ever
 allocating anything on the heap, leaving the consumer to create any
 necessary objects from the parse process.

 --benji

Interesting, I've worked with parsers* that function something like that but never thought of them in that way. OTOH I can think of only a very limited domain where this would be useful. If I needed to process that much data I'd load it into a database and go from there. *In fact my parser generator could be used that way.

In fact, that's one of the places where I've used this kind of parsing technique before. I wrote a streaming CSV parser (which takes discipline to do correctly, since a double-quote-enclosed field can legally contain arbitrary newline characters, and quotes are escaped by doubling). It provides a field callback and a record callback, so it's very handy for performing ETL tasks. If I had to load whole CSV files into memory before parsing, it wouldn't work, because sometimes they can be hundreds of megabytes. But the streaming parser takes up almost no memory at all. --benji

sure it takes very little memory. i'll tell u how much memory u need in fact. it's the finite state needed by the fsa. u could do that because csv only needs finite state for parsing. soon as you need to backtrack stream parsing becomes very difficult.
Aug 26 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
superdan wrote:
 Benji Smith Wrote:
 I wrote a streaming CSV parser (which takes discipline to do correctly, 
 since a double-quote enclosed field can legally contain arbitrary 
 newline characters, and quotes are escaped by doubling). It provides a 
 field callback and a record callback, so it's very handy for performing 
 ETL tasks.

 If I had to load the whole CSV files into memory before parsing, it 
 wouldn't work, because sometimes they can be hundreds of megabytes. But 
 the streaming parser takes up almost no memory at all.

 --benji

sure it takes very little memory. i'll tell u how much memory u need in fact. it's the finite state needed by the fsa. u could do that because csv only needs finite state for parsing. soon as you need to backtrack stream parsing becomes very difficult.

Noooooooobody uses backtracking to parse. Most of the time LL(k) token lookahead solves the problem. Sometimes you need a syntactic predicate or (rarely) a semantic predicate. I've never even heard of a parser generator framework that supported backtracking. --benji
Aug 26 2008
next sibling parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 superdan wrote:
 Benji Smith Wrote:
 I wrote a streaming CSV parser (which takes discipline to do correctly, 
 since a double-quote enclosed field can legally contain arbitrary 
 newline characters, and quotes are escaped by doubling). It provides a 
 field callback and a record callback, so it's very handy for performing 
 ETL tasks.

 If I had to load the whole CSV files into memory before parsing, it 
 wouldn't work, because sometimes they can be hundreds of megabytes. But 
 the streaming parser takes up almost no memory at all.

 --benji

sure it takes very little memory. i'll tell u how much memory u need in fact. it's the finite state needed by the fsa. u could do that because csv only needs finite state for parsing. soon as you need to backtrack stream parsing becomes very difficult.

Noooooooobody uses backtracking to parse.

guess that makes perl regexes et al noooooooobody.
 Most of the time LL(k) token lookahead solves the problem. Sometimes you 
 need a syntactic predicate or (rarely) a semantic predicate.

 I've never even heard of a parser generator framework that supported 
 backtracking.

live & learn. keep lookin'. hint: try antlr.
Aug 26 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
superdan wrote:
 Noooooooobody uses backtracking to parse.

guess that makes perl regexes et al noooooooobody.

I suppose it depends on your definition of "parse".
 Most of the time LL(k) token lookahead solves the problem. Sometimes you 
 need a syntactic predicate or (rarely) a semantic predicate.

 I've never even heard of a parser generator framework that supported 
 backtracking.

live & learn. keep lookin'. hint: try antlr.

I've used ANTLR a few times. It's nice. I didn't realize it supported backtracking, though. (In my experience writing parsers, backtracking is one of those things you work overtime to eliminate, because it usually destroys performance.) It's funny you should mention ANTLR, actually, in this discussion. A year or so ago, I was considering porting the ANTLR runtime to D. The original runtime is written in Java, and makes full use of the robust string handling capabilities of the Java standard library. Based on the available text processing functionality in D at that time, I quickly gave up on the project as being not worth the effort. --benji
Aug 26 2008
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to Benji,

 I've used ANTLR a few times. It's nice.
 

I've used it. If you gave me the choice of sitting in a small cardboard box all day or using it again, I'd sit in the cardboard box, because I fit in that box better.
Aug 26 2008
next sibling parent reply superdan <super dan.org> writes:
BCS Wrote:

 Reply to Benji,
 
 I've used ANTLR a few times. It's nice.
 

I've used it. If you gave me the choice of sitting in a small cardboard box all day or using it again, I'd sit in the cardboard box, because I fit in that box better.

i've used it too eh. u gotta be talking about a pretty charmin' cozy box there. the effin' mcmansion of boxes in fact. coz antlr is one of the best if not the best period.
Aug 26 2008
parent BCS <ao pathlink.com> writes:
Reply to superdan,

 BCS Wrote:
 
 Reply to Benji,
 
 I've used ANTLR a few times. It's nice.
 

small box all day or using it again, I'll sit in the cardboard box because I fit in that box better.

box there. the effin' mcmansion of boxes in fact. coz antlr is one of the best if not the best period.

The above is intended as a pun: "I don't fit in the ANTLR box". It's like MS Word: as long as you do things the way they're intended to be done, it's clear sailing; as soon as you try something else, rocks and shoals. The other main issue I've had with ANTLR is that the documentation is ABSOLUTELY HORRIBLE! It took me three weeks of working with it to even figure out that it was intended to be used differently than I expected. I was hard pressed to find critical information. Stuff that is, IMHO, only a quarter step less important than the fact that ANTLR is a parser generator; stuff I'd expect to be looking straight at after hitting Google's "I'm feeling lucky" button for ANTLR, no scrolling needed.
Aug 26 2008
prev sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
BCS wrote:
 Reply to Benji,
 
 I've used ANTLR a few times. It's nice.

I've used it. If you gave me the choice of sitting in a small cardboard box all day or using it again, I'd sit in the cardboard box, because I fit in that box better.

I've always been impressed by the capabilities of ANTLR. The ANTLRWorks IDE is a very cool way to develop and debug grammars, and Terence Parr is one of those people who pushes the research into interesting new areas (he wrote something a few months ago about simplifying the deeply-recursive Expression grammar common in most languages that I found very insightful).

The architecture is pretty cool too. Text input is consumed and ASTs are constructed using token grammars, which are then transformed using tree-grammars, and code-generation is performed by output grammars. It's a very elegant system, and I've seen some example projects that used a sequence of those grammars to translate code between different programming languages. It's cool stuff.

So I appreciate ANTLR from that perspective. I think the theory behind the project is top-notch. But the syntax sucks. Badly. The learning curve is waaaay too steep for me, so I've always had to keep the documentation close by. And once the grammars are written, they're hard to read and maintain.

Also, there's a strong bias in the ANTLR community toward ASTs. I prefer to construct a somewhat higher-level parse tree. For example: given the expression "1 + 2", I'd like the parser to construct a BinaryOperator node, with two Expression node children and an enum "operator" field of "PLUS". I'd like it to use a set of pre-defined "parse model" classes that I've written to represent the language elements. It's hard to do that kind of thing in ANTLR, which usually just creates a "+" node with children of "1" and "2".

The majority of my parser-generator experience has been with JavaCC, which leaves model-generation to the user, which works better for me.

--benji
Aug 26 2008
parent BCS <ao pathlink.com> writes:
Reply to Benji,

 BCS wrote:
 
 Reply to Benji,
 
 I've used ANTLR a few times. It's nice.
 

small box all day or using it again, I'll sit in the cardboard box because I fit in that box better.

ANTLRWorks IDE is a very cool way to develop and debug grammars, and Terence Parr is one of those people that pushes the research into interesting new areas (he wrote something a few months ago about simplifying the deeply-recursive Expression grammar common in most languages that I found very insightful). The architecture is pretty cool too. Text input is consumed and ASTs are constructed using token grammars, which are then transformed using tree-grammars, and code-generation is performed by output grammars. It's a very elegant system, and I've seen some example projects that used a sequence of those grammars to translate code between different programming languages. It's cool stuff. So I appreciate ANTLR from that perspective. I think the theory behind the project is top-notch. But the syntax sucks. Badly. The learning curve is waaaay too steep for me, so I've always had to keep the documentation close by. And once the grammars are written, they're hard to read and maintain. Also, there's a strong bias in the ANTLR community toward ASTs. I prefer to construct a somewhat higher-level parse tree. For example: given the expression "1 + 2", I'd like the parser to construct a BinaryOperator node, with two Expression node children and an enum "operator" field of "PLUS". I'd like it to use a set of pre-defined "parse model" classes that I've written to represent the language elements. It's hard to do that kind of thing in ANTLR, which usually just creates a "+" node with children of "1" and "2". The majority of my parser-generator experience has been with JavaCC, which leaves model-generation to the user, which works better for me. --benji

My feeling's exactly (or near enough)
Aug 26 2008
prev sibling parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 superdan wrote:
 Noooooooobody uses backtracking to parse.

guess that makes perl regexes et al noooooooobody.

I suppose it depends on your definition of "parse".

well since you was gloating about handling a csv file as "parsing" i thot i'd lower my definition accordingly :) p.s. sorry benji. you are cool n all (tho to be brutally honest listenin' more an' talkin' less always helps) but you keep on raisin' those easy balls fer me. what can i do? i keep on dunkin'em ;)
Aug 26 2008
next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
superdan wrote:
 tho to be brutally honest listenin' more an' talkin' less always helps

LOL Coming from you, dan, that's my favorite ironic quote of the day :) --benji
Aug 26 2008
parent reply superdan <super dan.org> writes:
Benji Smith Wrote:

 superdan wrote:
 tho to be brutally honest listenin' more an' talkin' less always helps

LOL Coming from you, dan, that's my favorite ironic quote of the day :)

meh. whacha sayin'? i ain't talking much.
Aug 26 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"superdan" <super dan.org> wrote in message 
news:g91uku$2l93$1 digitalmars.com...
 Benji Smith Wrote:

 superdan wrote:
 tho to be brutally honest listenin' more an' talkin' less always helps

LOL Coming from you, dan, that's my favorite ironic quote of the day :)

meh. whacha sayin'? i ain't talking much.

missing capitalization that do nothing but hide any kernels of relevance that may or may not exist, yes.
Aug 26 2008
parent superdan <super dan.org> writes:
Nick Sabalausky Wrote:

 "superdan" <super dan.org> wrote in message 
 news:g91uku$2l93$1 digitalmars.com...
 Benji Smith Wrote:

 superdan wrote:
 tho to be brutally honest listenin' more an' talkin' less always helps

LOL Coming from you, dan, that's my favorite ironic quote of the day :)

meh. whacha sayin'? i ain't talking much.

missing capitalization that do nothing but hide any kernels of relevance that may or may not exist, yes.

don't be hat'n' :)
Aug 26 2008
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to superdan,

 Benji Smith Wrote:
 
 superdan wrote:
 
 Noooooooobody uses backtracking to parse.
 



thot i'd lower my definition accordingly :) p.s. sorry benji. you are cool n all (tho to be brutally honest listenin' more an' talkin' less always helps) but you keep on raisin' those easy balls fer me. what can i do? i keep on dunkin'em ;)

A CSV parser can be interesting if you have high enough performance demands (e.g. a total memory footprint smaller than a single field might be)
Aug 26 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
 superdan wrote:

thot i'd lower my definition accordingly :)


I don't know about "gloating". I mentioned it because it was relevant to the conversation about places where streaming parsers are useful. But I can't see how it was gloating. Geez. Why is everything a challenge to you? Why can't you just have a conversation without getting all argumentative? BCS wrote:
 A CVS parser can be interesting if you have high enough performance 
 demands (e.g. total memory footprint smaller than a single field might be)

It's also interesting from the perspective that you can write a basic parser, using a dirt-simple grammar, that performs no backtracking. In the world of parsers, it's about as simple and braindead as you get, but it's damn handy nevertheless. It's possible to do the same thing with a regular expression, but it's very tricky to correctly handle all the weird newline issues, and it's even harder to avoid backtracking. I've done it both ways, and the regex solution sucks compared to using a real parser generator. --benji
Aug 26 2008
parent superdan <super dan.org> writes:
Benji Smith Wrote:

 superdan wrote:

thot i'd lower my definition accordingly :)


I don't know about "gloating".

was jesting. 'twas too good a comeback after u switched the definition of parsing on me. twice :)
 I mentioned it, because it was relevant 
 to the conversation about places where streaming parsers are useful. But 
 I can't see how it was gloating. Geez.

 Why is everything a challenge to you? Why can't you just have a 
 conversation, without getting all argumentative?

conversatin's cool. but if you says something wrong and i happen to knows how it is i'll say how it is.
Aug 26 2008
prev sibling parent BCS <ao pathlink.com> writes:
Reply to Benji,

 superdan wrote:
 
 Benji Smith Wrote:
 
 I wrote a streaming CSV parser (which takes discipline to do
 correctly, since a double-quote enclosed field can legally contain
 arbitrary newline characters, and quotes are escaped by doubling).
 It provides a field callback and a record callback, so it's very
 handy for performing ETL tasks.
 
 If I had to load the whole CSV files into memory before parsing, it
 wouldn't work, because sometimes they can be hundreds of megabytes.
 But the streaming parser takes up almost no memory at all.
 
 --benji
 

in fact. it's the finite state needed by the fsa. u could do that because csv only needs finite state for parsing. soon as you need to backtrack stream parsing becomes very difficult.

Most of the time LL(k) token lookahead solves the problem. Sometimes you need a syntactic predicate or (rarely) a semantic predicate. I've never even heard of a parser generator framework that supported backtracking. --benji

Antlr, dparse and (IIRC) eniki all do
Aug 26 2008
prev sibling parent Robert Fraser <fraserofthenight gmail.com> writes:
BCS wrote:
 Reply to Robert,
 
 BCS wrote:

 Reply to Benji,

 The new JSON parser in the Tango library operates on templated
 string arrays. If I want to read from a file or a socket, I have to
 first slurp the whole thing into a character array, even though the
 character-streaming would be more practical.

to be throwing away most of it *while you parse it, not after* The best way to parse a file is to load it all in one OS system call and then run a slicing parser (like the Tango XML parser) on that. One memory allocation and one load or a mmap, and then only the meta structures get allocated later.

fit easily in main memory. I think a stream-processing SAX parser would be a good addition to (though perhaps not a replacement for) the existing one.

If you can't fit the data file in memory, then I find it hard to believe you will be able to hold the parsed file in memory. If you can program the parser to dump unneeded data on the fly, or to process and discard the data, that might make a difference.

I think that's one of the reasons to use a streaming parser -- so you can dump data on the fly.
Aug 26 2008
prev sibling next sibling parent Christopher Wright <dhasenan gmail.com> writes:
superdan wrote:
 but if u have strings like today it's a no-brainer to define a class that does
all that stuff. u can then use that class whenever you feel. it would be
madness to put that class in the language definition. at best it's a candidate
for the stdlib.

Instead, the runtime has to know how to convert between utf8, utf16, and utf32. Encodings are not a trivial matter, either.
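For example, Phobos exposes those conversions directly through std.utf (shown here assuming a reasonably current compiler; Tango's equivalent lives in tango.text.convert.Utf):

```d
import std.utf : toUTF16, toUTF32;

void main()
{
    const(char)[] u8 = "héllo"; // 6 bytes of UTF-8 for 5 code points
    auto u16 = toUTF16(u8);     // wchar-based, 5 UTF-16 code units
    auto u32 = toUTF32(u8);     // dchar-based, 5 code points

    assert(u8.length == 6);     // length counts code units, not characters
    assert(u32.length == 5);
}
```

The length mismatch is exactly the kind of detail the runtime (and any string-handling code) has to get right.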
Aug 25 2008
prev sibling next sibling parent Jesse Phillips <jessekphillips gmail.com> writes:
On Mon, 25 Aug 2008 20:52:04 -0400, Benji Smith wrote:

 superdan wrote:
 But the "small components" are the *interfaces*, not the
 implementation details.

quite when i thought i drove a point home... dood we need to talk. you have all core language, primitives, libraries, and app code confused.

The standard libraries are in a grey area between the language spec and application code. There are all sorts of implicit "interfaces" exposed by the builtin types (and there's also plenty of core language functionality implemented in the standard lib... take the GC, for example). You act like there's no such thing as an interface for a builtin language feature. With strings implemented as raw arrays, they take on the array API...

slicing: broken
indexing: busted
iterating: fucked
length: you guessed it

I don't think the internals of the string representation should be any different. UTF-8 arrays? Fine by me. Just don't make me look at the malformed, mis-sliced bytes. Provide an API (yes, implemented in the standard lib, but specified by the language spec) that actually makes sense for text data.

(Incidentally, this is the same reason I think the builtin dynamic arrays should be classes implementing a standard List interface, and the associative arrays should be classes implementing a Map interface. The language implementations are nice, but they're not polymorphic, and that makes it a pain in the ass to extend them.)

--benji
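For concreteness, the kind of polymorphic text API being argued for might look like this. Everything here is hypothetical (neither Phobos nor Tango defines these names, which are modeled on Java's CharSequence); the point is only that arrays can't implement such an interface, while classes can:

```d
// Hypothetical polymorphic text interface.
interface CharSequence
{
    size_t length();
    dchar charAt(size_t i); // a decoded code point, never a raw byte
    CharSequence slice(size_t lo, size_t hi);
}

// One possible backing store; a UTF-8-backed class, a rope, or a
// memory-mapped file could implement the same interface.
class DcharString : CharSequence
{
    private const(dchar)[] data;
    this(const(dchar)[] d) { data = d; }
    size_t length() { return data.length; }
    dchar charAt(size_t i) { return data[i]; }
    CharSequence slice(size_t lo, size_t hi)
    {
        return new DcharString(data[lo .. hi]);
    }
}

void main()
{
    CharSequence s = new DcharString("hello"d);
    assert(s.length == 5 && s.charAt(1) == 'e');
    assert(s.slice(1, 4).length == 3);
}
```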

On the language spec vs. the standard library: while the GC is implemented in the standard library, I do not believe the spec says it has to be (though I don't think it is possible otherwise). So the spec could state that strings should be implemented your way, but it shouldn't. On another note, I must say this has been quite a turnaround. There have been many posts in the past with people arguing over having a String class; I think those people have been staying out of this one. But nonetheless, it is nothing new.
Aug 25 2008
prev sibling parent reply Benji Smith <dlanguage benjismith.net> writes:

superdan wrote:
 For starters, with strings implemented as character arrays, writing
 library code that accepts and operates on strings is a bit of a pain in
 the neck, since you always have to write templates and template code is
 slightly less readable than non-template code. You can't distribute 


 code as a DLL or a shared object, because the template instantiations
 won't be included (unless you create wrapper functions with explicit
 template instantiations, bloating your code size, but more importantly
 tripling the number of functions in your API).

so u mean with a class the encoding char/wchar/dchar won't be an

 problem is that means there's an indirection cost for every character 

consistently must pay a price for stuff they don't use.

So, I was thinking about the actual costs involved with the String class and CharSequence interface design that I'd like to see (and that exists in languages like Java and C#). There's the cost of the class wrapper itself, the cost of internally representing and converting between encodings, and the cost of routing all method calls through an interface vtable. Characters, if always represented using two bytes, would consume twice the memory. And returning characters from method calls has got to be slower than accessing them directly from arrays. Right?

So I wrote some tests, in Java and in D/Tango. The source code files are attached. Both of the tests perform a common set of string operations (searching, splitting, concatenating, and character-iterating). I tried to make the functionality as identical as possible, though I wasn't sure which technique to use for splitting text in Tango, so I used both the "Util.split" and "Util.delimit" functions.

I ran both tests using a 5MB text file, "The Complete Works of William Shakespeare", from the Project Gutenberg website:

http://www.gutenberg.org/dirs/etext94/shaks12.txt

You can grab it for yourself, or you can just run the code against your favorite large text file.

I compiled and ran the Java code in the 1.6.0_06 JDK, with the "-server" flag. The D code was compiled with DMD 1.034 and Tango 0.99.7, using the "-O -release -inline" flags. My test machine is an AMD Turion 64 X2 dual-core laptop, with 2GB of RAM and running WinXP SP3. I ran the tests eight times each, using fine-resolution timers. These are the median results:

LOADING THE FILE INTO A STRING: D/Tango wins, by 428%
  D/Tango: 0.02960 seconds
  Java:    0.12675 seconds

ITERATING OVER CHARS IN A STRING: Java wins, by 280%
  D/Tango: 0.10093 seconds
  Java:    0.03599 seconds

SEARCHING FOR A SUBSTRING: D/Tango wins, by 218%
  D/Tango: 0.02251 seconds
  Java:    0.04915 seconds

SEARCH & REPLACE INTO A NEW STRING: D/Tango wins, by 226%
  D/Tango: 0.17685 seconds
  Java:    0.39996 seconds

SPLIT A STRING ON WHITESPACE: Java wins, by 681% (against tango.text.Util.delimit()) and by 313% (against tango.text.Util.split())
  D/Tango (delimit): 8.28195 seconds
  D/Tango (split):   3.80465 seconds
  Java (split):      1.21477 seconds

CONCATENATING STRINGS: Java wins, by 884%
  D/Tango (array concat, no pre-alloc): 4.07929 seconds
  Java (StringBuilder, no pre-alloc):   0.46150 seconds

SORT STRINGS (CASE-INSENSITIVE): D/Tango wins, by 226%
  D/Tango: 1.62227 seconds
  Java:    3.66389 seconds

It looks like D mostly falls down when it has to allocate a lot of memory, even if it's just allocating slices. The D performance for string splitting really surprised me.

I was interested to see, though, that Java was so much faster at iterating through the characters in a string, since I used the charAt(i) method of the CharSequence interface, rather than directly iterating through a char[] array, or even calling the charAt method on the String instance. And yet, character iteration is almost 3 times as fast as in D.

Down with premature optimization! Design the best interfaces possible, to enable the most pleasant and flexible programming idioms. The performance problems can be solved. :-P

--benji
Aug 26 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Benji Smith:
 It looks like D mostly falls down when it has to allocate a lot of 
 memory, even if it's just allocating slices. The D performance for 
 string splitting really surprised me.

String splitting requires a lot of work from the GC. HotSpot's GC is light years ahead of the current D GC. You can see that by measuring just the time the D GC takes to deallocate a large array of the split substrings. I have posted several benchmarks here about this topic.
 I was interested to see, though, that Java was so much faster at 
 iterating through the characters in a string, since I used the charAt(i) 
 method of the CharSequence interface, rather than directly iterating 
 through a char[] array, or even calling the charAt method on the String 
 instance.

HotSpot is able to inline a lot of virtual methods too; D can't do those things.
 Down with premature optimization! Design the best interfaces possible, 
 to enable the most pleasant and flexible programing idioms. The 
 performance problems can be solved. :-P

They currently can't be solved by the backends of DMD and GDC; only HotSpot (and maybe the .NET compiler on Windows) is able to do that. I don't know if LLVM will be able to perform some of those things. Bye, bearophile
Aug 26 2008
prev sibling next sibling parent JAnderson <ask me.com> writes:
Benji Smith wrote:
 In another thread (about array append performance) I mentioned that 
 Strings ought to be implemented as classes rather than as simple builtin
 arrays. Superdan asked why. Here's my response...
 
 I'll start with a few of the softball, easy reasons.
 

 polymorphism on character arrays. Arrays can't have subclasses, and they 
 can't implement interfaces.

I don't think polymorphic strings are right for D. This is the sort of thing a library could implement, but D should (and does) provide the basic components from which to build more complex ones. You can already extend D strings by using a string as a component, if necessary. I don't want all this extra overhead in the primitive array type. It seems to me a classic case of feature creep: pretty soon we have something that has been designed for everything but its original purpose.

I'm ok with having features like hash caching as long as they can be implemented without changing the core mechanics of the primitive.

To me it's not even correct design to inherit from a concrete class (there are quite a few books on that: Effective C++ talks about it a bit, and so does Sutter and Alexandrescu's "C++ Coding Standards" with its 101 rules). I think there are *much better* ways to handle this sort of thing. Personally I don't want to encourage that sort of design.

-Joel
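The composition approach Joel alludes to ("using strings as a component") can be sketched quickly; this is a hypothetical illustration in Python (the Token name and fields are invented for the example), not code from the thread:

```python
# Hypothetical sketch: instead of subclassing the string type, a Token
# holds the string as a component and forwards to it, adding only the
# extra behaviour a parser needs (here: source position metadata).
class Token:
    def __init__(self, text, line, column):
        self.text = text      # the wrapped string component
        self.line = line      # extra parser metadata
        self.column = column

    # Forward the string-like operations we actually need.
    def __len__(self):
        return len(self.text)

    def __str__(self):
        return self.text

    def startswith(self, prefix):
        return self.text.startswith(prefix)

tok = Token("while", line=3, column=10)
assert len(tok) == 5 and tok.startswith("wh")
```

The same pattern works in D with a struct or class holding a `char[]` member, without any change to the built-in array type.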
Aug 26 2008
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Benji Smith wrote:
 For starters, with strings implemented as character arrays, writing 
 library code that accepts and operates on strings is a bit of a pain in 
 the neck, since you always have to write templates and template code is 
 slightly less readable than non-template code.
 You can't distribute your 
 code as a DLL or a shared object, because the template instantiations 
 won't be included (unless you create wrapper functions with explicit 
 template instantiations, bloating your code size, but more importantly 
 tripling the number of functions in your API).

Is the problem you're referring to the fact that there are 3 character types?
 Another good low-hanging argument is that strings are frequently used as 
 keys in associative arrays. Every insertion and retrieval in an 
 associative array requires a hashcode computation. And since D strings 
 are just dumb arrays, they have no way of memoizing their hashcodes.

True, but I've written a lot of string processing programs (compilers are just one example of such). This has never been an issue, because the AA itself memoizes the hash, and from then on the dictionary handle is used.
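Hashcode memoization, whether done by the AA's own nodes as Walter describes or by a string wrapper as Benji proposed, amounts to computing the hash once and reusing it. A minimal sketch in Python (class and counter are invented for illustration):

```python
class MemoizedString:
    """Illustrative wrapper: computes its hashcode once, then reuses it."""
    hash_computations = 0  # counter, only to demonstrate the memoization

    def __init__(self, text):
        self.text = text
        self._hash = None   # not yet computed

    def __hash__(self):
        if self._hash is None:
            MemoizedString.hash_computations += 1
            self._hash = hash(self.text)
        return self._hash

    def __eq__(self, other):
        return isinstance(other, MemoizedString) and self.text == other.text

s = MemoizedString("some dictionary key")
d = {s: 1}                   # first insertion computes the hash
for _ in range(100):
    _ = d[s]                 # repeated lookups reuse the stored hash
assert MemoizedString.hash_computations == 1
```

Java's String does exactly this with its cached `hash` field; Walter's point is that when the AA stores the hash in its own nodes, the value type doesn't need to.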
 We've already observed that D assoc arrays are less performant than even 
 Python maps, so the extra cost of lookup operations is unwelcome.

Every one of those benchmarks that purported to show that D AA's were relatively slow turned out to be, on closer examination, D running the garbage collector more often than Python does. It had NOTHING to do with the AA's.
 But much more important than either of those reasons is the lack of 
 polymorphism on character arrays. Arrays can't have subclasses, and they 
 can't implement interfaces.
 
 A good example of what I'm talking about can be seen in the Phobos and 
 Tango regular expression engines. At least the Tango implementation 
 matches against all string types (the Phobos one only works with char[] 
 strings).
 
 But what if I want to consume a 100 MB logfile, counting all lines that 
 match a pattern?
 
 Right now, to use the either regex engine, I have to read the entire 
 logfile into an enormous array before invoking the regex search function.
 
 Instead, what if there was a CharacterStream interface? And what if all 
 the text-handling code in Phobos & Tango was written to consume and 
 return instances of that interface?
 
 A regex engine accepting a CharacterStream interface could process text 
 from string literals, file input streams, socket input streams, database 
 records, etc, etc, etc... without having to pollute the API with a bunch 
 of casts, copies, and conversions. And my logfile processing application 
 would consume only a tiny fraction of the memory needed by the character 
 array implementation.
 
 Most importantly, the contract between the regex engine and its 
 consumers would provide a well-defined interface for processing text, 
 regardless of the source or representation of that text.

I think a better solution is for regexp to accept an Iterator as its source. That doesn't require polymorphic behavior via inheritance, it can do polymorphism by value (which is what templates do).
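Walter's iterator idea can be sketched with duck typing, which plays the same role in Python that template (by-value) polymorphism plays in D; the function name here is invented for illustration. The same routine consumes an in-memory list or a file-like stream, which is Benji's 100 MB logfile scenario without the enormous array:

```python
import io
import re

def count_matches(lines, pattern):
    """Count lines matching `pattern` from ANY iterable of strings:
    a list, a file object, a socket wrapper... No inheritance needed."""
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(line))

# Works on an in-memory list...
assert count_matches(["GET /a", "POST /b", "GET /c"], r"^GET") == 2

# ...and, unchanged, on a file-like stream read line by line,
# never holding the whole "logfile" in one big array.
log = io.StringIO("GET /a\nPOST /b\nGET /c\n")
assert count_matches(log, r"^GET") == 2
```

In D the equivalent would be a function templated on its source type, instantiated once per concrete iterator, with no virtual dispatch at the call site.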
 
 Along a similar vein, I've worked on a lot of parsers over the past few 
 years, for domain specific languages and templating engines, and stuff 
 like that. Sometimes it'd be very handy to define a "Token" class that 
 behaves exactly like a String, but with some additional behavior. 
 Ideally, I'd like to implement that Token class as an implementor of the 
 CharacterStream interface, so that it can be passed directly into other 
 text-handling functions.
 
 But, in D, with no polymorphic text handling, I can't do that.

Templates are the ideal solution to that, and the more specific idiom is to use iterators.
 But then again, I haven't used any of the const functionality in D2, so 
 I can't actually comment on relative usability of compiler-enforced 
 immutability versus interface-enforced immutability.

From my own experience, I didn't 'get' invariant strings until I'd used them for a while.
Aug 26 2008
next sibling parent Benji Smith <dlanguage benjismith.net> writes:
Walter Bright wrote:
 Benji Smith wrote:
 You can't distribute your code as a DLL or a shared object, because 
 the template instantiations won't be included (unless you create 
 wrapper functions with explicit template instantiations, bloating your 
 code size, but more importantly tripling the number of functions in 
 your API).

Is the problem you're referring to the fact that there are 3 character types?

Basically, yeah. With three different character types, and two different array types (static & dynamic), and in D2 with const, invariant, and mutable types (and soon with shared and unshared), the number of ways of representing a "string" in the type system is overwhelming.

This afternoon I was writing some string-processing code that I intend to distribute in a library, and I couldn't help thinking to myself, "This code is probably broken for anything but the most squeaky-clean ASCII text."

I don't mind that there are different character types, or that there are different character encodings. But I want to deal with those issues in exactly *one* place: in my string constructor (and, very rarely, during IO). But 99% of the time, I want to just think of the object as a String, with all the ugly details abstracted away.
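The "one place" Benji wants can be sketched as a constructor that normalizes every representation on the way in; this is a hypothetical illustration (the class name and fields are invented), not a proposal from the thread:

```python
class MyString:
    """Hypothetical sketch: all decoding/validation happens here, in the
    constructor; the rest of the program sees one uniform text type."""
    def __init__(self, data, encoding="utf-8"):
        if isinstance(data, bytes):
            # The only place raw encodings are touched.
            self.text = data.decode(encoding)
        else:
            self.text = str(data)

    def upper(self):
        return self.text.upper()

# The caller never worries about the representation again.
a = MyString(b"caf\xc3\xa9")        # raw UTF-8 bytes
b = MyString("café")                # already-decoded text
assert a.text == b.text == "café"
```

The D analogue would be a struct whose constructor accepts char[], wchar[], or dchar[] and transcodes to one internal representation.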
 Another good low-hanging argument is that strings are frequently used 
 as keys in associative arrays. Every insertion and retrieval in an 
 associative array requires a hashcode computation. And since D strings 
 are just dumb arrays, they have no way of memoizing their hashcodes.

True, but I've written a lot of string processing programs (compilers are just one example of such). This has never been an issue, because the AA itself memoizes the hash, and from then on the dictionary handle is used.

Cool. The hashcode-memoization thing was really just a catalyst to get me thinking. It's really at the periphery of my concerns with Strings.
 We've already observed that D assoc arrays are less performant than 
 even Python maps, so the extra cost of lookup operations is unwelcome.

Every one of those benchmarks that purported to show that D AA's were relatively slow turned out to be, on closer examination, D running the garbage collector more often than Python does. It had NOTHING to do with the AA's.

Ah. Good point. Thanks for clarifying. I didn't remember all the follow-up details.
 Most importantly, the contract between the regex engine and its 
 consumers would provide a well-defined interface for processing text, 
 regardless of the source or representation of that text.

I think a better solution is for regexp to accept an Iterator as its source. That doesn't require polymorphic behavior via inheritance, it can do polymorphism by value (which is what templates do).

That's a great idea. I should clarify that my referring to an "interface" was in the informal sense. (Though I think actual interfaces would be a reasonable solution.) But any sort of contract between text-data-structures and text-processing-routines would fit the bill nicely.
 But then again, I haven't used any of the const functionality in D2, 
 so I can't actually comment on relative usability of compiler-enforced 
 immutability versus interface-enforced immutability.

From my own experience, I didn't 'get' invariant strings until I'd used them for a while.

I actually kind of think I'm on the other side of the issue. I've been primarily a Java programmer (8 years) and secondarily a C# programmer (3 years), so immutable Strings are the only thing I've ever used. Lots of the other JDK classes are like that too. So, from my perspective, it seems like the ideal, low-impact way of enforcing immutability is to have the classes enforce it on themselves. I've never felt the need for compiler-enforced const semantics in any of the work I've done.

Thanks for your replies! I always appreciate hearing from you.

--benji
Aug 26 2008
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 We've already observed that D assoc arrays are less performant than even 
 Python maps, so the extra cost of lookup operations is unwelcome.

Every one of those benchmarks that purported to show that D AA's were relatively slow turned out to be, on closer examination, D running the garbage collector more often than Python does. It had NOTHING to do with the AA's.

Really? I must have missed those conclusions then, despite reading all the posts on the subject. What solutions do you propose for the problem then? I recall that disabling the GC didn't improve the situation much. So the problem now becomes how to improve the D GC?

On my site I am keeping a gallery of tiny benchmarks where D code (with DMD) is 10 or more times slower than very similar Python, C, or Java code (I have about 12 programs so far, very different from each other; there's a benchmark regarding associative arrays too). Hopefully it will become useful once people start tuning D implementations.

Bye,
bearophile
Aug 26 2008
next sibling parent reply Tomas Lindquist Olsen <tomas famolsen.dk> writes:
bearophile wrote:
 Walter Bright:
 We've already observed that D assoc arrays are less performant than even 
 Python maps, so the extra cost of lookup operations is unwelcome.

Every one of those benchmarks that purported to show that D AA's were relatively slow turned out to be, on closer examination, D running the garbage collector more often than Python does. It had NOTHING to do with the AA's.

Really? I must have missed those conclusions then, despite reading all the posts on the subject. What solutions do you propose for the problem then? I recall that disabling the GC didn't improve the situation much. So the problem now becomes how to improve the D GC? In my site I am keeping a gallery of tiny benchmarks where D code (with DMD) is 10 or more times slower than very equivalent Python, C, Java code (I have about 12 programs so far, very different from each other. There's a benchmark regarding the associative arrays too). Hopefully it will become useful once people will start tuning D implementations. Bye, bearophile

Might I ask where that site is? I'd like to compare them against LLVMDC if possible Tomas
Aug 26 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Tomas Lindquist Olsen:
 Might I ask where that site is?

I have sent you an email with information and more things, etc. Bye, bearophile
Aug 26 2008
parent Tomas Lindquist Olsen <tomas famolsen.dk> writes:
bearophile wrote:
 Tomas Lindquist Olsen:
 Might I ask where that site is?

I have sent you an email with information and more things, etc. Bye, bearophile

Got it. Thanx :) I'll give it a go over the weekend :) Tomas
Aug 28 2008
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Walter Bright:
 We've already observed that D assoc arrays are less performant
 than even Python maps, so the extra cost of lookup operations is
 unwelcome.

Every one of those benchmarks that purported to show that D AA's were relatively slow turned out to be, on closer examination, D running the garbage collector more often than Python does. It had NOTHING to do with the AA's.

Really? I must have missed those conclusions then, despite reading all the posts on the subject. What solutions do you propose for the problem then? I recall that disabling the GC didn't improve the situation much. So the problem now becomes how to improve the D GC?

In my experience with such programs, disabling the collection cycles brought the speed up to par.
Aug 26 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

In my experience with such programs, disabling the collection cycles brought
the speed up to par.<

In my experience there's some difference still. The usual disclaimer: benchmarks are tricky things, so anyone is invited to spot problems in my code.

A very simple benchmark:

// D without GC
import std.gc: disable;
void main() {
    int[int] d;
    disable();
    for (int i; i < 10_000_000; ++i)
        d[i] = 0;
}

# Python+Psyco without GC
from gc import disable
def main():
    d = {}
    disable()
    for i in xrange(10000000):
        d[i] = 0
import psyco; psyco.full()
main()

hash without GC, n = 10_000_000:
    D:     9.12 s
    Psyco: 1.45 s

hash2 with GC, n = 10_000_000:
    D:     9.80 s
    Psyco: 1.46 s

If Psyco isn't used, the Python version without GC requires 2.02 seconds. This means 2.02 - 1.45 = 0.57 s are needed by the Python virtual machine just to run those 10_000_000 loops :-)

Warm tests, best of 3, performed with Python 2.5.2, Psyco 1.6, on Win XP, and the latest DMD with -O -release -inline.

Python integers are objects, rather bigger than 4 bytes, and they can grow "naturally" to become multi-precision integers:
>>> a = 2147483647
>>> a
2147483647
>>> a + 1
2147483648L
>>> type(a)
<type 'int'>
>>> type(a + 1)
<type 'long'>
>>> type(7 ** 5)
<type 'int'>
>>> type(7 ** 55)
<type 'long'>
Bye, bearophile
Aug 26 2008
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Walter Bright:
 
 In my experience with such programs, disabling the collection
 cycles brought the speed up to par.<

In my experience there's some difference still. The usual disclaimer: benchmarks are tricky things, so anyone is invited to spot problems in my code.

I invite you to look at the code in internal/aaA.d and do some testing!
Aug 26 2008
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-08-28 13:33:47 +0200, "Manfred_Nowak" <svv1999 hotmail.com> said:

 Walter Bright wrote:
 
 I invite you to look at the code in internal/aaA.d and do some
 testing!

This invitation is a red herring without the offer to change the language, because there exists no implementation for AA covering all possible use cases. The bare minimum to get anything out of fiddling with the implementations for AA is the possibility to use the results of adaptations without considerable overhead, especially for the declarations. However, currently I do not see any elegant solution, because the types of the implementations of maps are given implicitly. At least something like prototyping seems to be necessary:

|   int[char[]] map;
|   Prototype = typeof(map);
|   Prototype.implementation = MyAA;

where MyAA is some class type implementing the interface required for AA. Are you willing to do something in this direction?

I think that the invitation should be read as the possibility to experiment with some changes to the AA, see their effect, and if worthwhile contribute them back, so that they can be applied to the "official" version.

Making the standard version changeable seems just horrible from the standpoint of portability, maintainability and clarity of the code: if the standard version is not ok for your use you should explicitly use another one, otherwise mixing code that uses two different standard versions becomes a nightmare.

On the other hand, if you think that you can improve the standard version for everybody, changing internal/aaA.d is what you should do...

Fawzi
 
 -manfred

Aug 28 2008
next sibling parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Fawzi Mohamed wrote:

 I think

Sorry. Although I cancelled my posting within seconds, you grabbed it even faster. -manfred -- If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Aug 28 2008
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-08-28 19:33:52 +0200, "Manfred_Nowak" <svv1999 hotmail.com> said:

 Fawzi Mohamed wrote:
 
 I think

Sorry. Although I cancelled my posting within seconds, you grabbed it even faster. -manfred

Well, out of curiosity, how do you cancel a post? (That way I could have removed mine also...)

Fawzi
Aug 29 2008
parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Fawzi Mohamed wrote:

 how do you cancel a post?

By using a news-client, that has this feature. -manfred -- If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Aug 29 2008
parent Michiel Helvensteijn <nomail please.com> writes:
Manfred_Nowak wrote:

 how do you cancel a post?

By using a news-client, that has this feature.

But the news-server also needs to have this feature, and not all do. (Does this one?) -- Michiel
Aug 29 2008
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Fawzi Mohamed wrote:
 I think that the invitation should be read as the possibility to 
 experiment with some changes for AA, see their effect, and if worthwhile 
 provide them back, so that they can be applied to the "official" version.
 
 making the standard version changeable seems just horrible form the 
 portability and maintainability and clarity of the code: if the standard 
 version is not ok for your use you should explicitly use another one, 
 otherwise mixing codes that use two different standard versions becomes 
 a nightmare.

I agree.
 On the other hand if you think that you can improve the standard version 
 for everybody, changing internal/aaA.d is what you should do...

Right.
Aug 29 2008
parent "Manfred_Nowak" <svv1999 hotmail.com> writes:
Walter Bright wrote:

  use two different standard versions becomes a nightmare.
 I agree.

I retracted my posting immediately because it wasn't well thought out. However, the last thing I wanted was to have "several" "standard" versions. So we all agree on this. But even when I read my retracted posting again, I cannot imagine how one could come to the conclusion that I wanted to have several.
 
 On the other hand if you think that you can improve the standard
 version for everybody, changing internal/aaA.d is what you should
 do... 


I wrote about that some years ago and got no answer: what is an improvement for everybody, i.e. what is the general usage? Without an agreed definition of that, every change will make someone else cry. -manfred -- If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Aug 29 2008
prev sibling parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
bearophile wrote:

 anyone is invited to spot problems

Biggest of all: using the wrong tool, i.e. using a hash map for a maximally populated key range. -manfred -- Maybe some knowledge of some types of disagreeing and their relation can turn out to be useful: http://blog.createdebate.com/2008/04/07/writing-strong-arguments/
Aug 27 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Manfred_Nowak:
 Biggest of all: using the wrong tool. I.e. using a hash map for a 
 maximal populated key range.

But if the hash machinery is good it must work well in this common situation too.

Anyway, let's see how I can write a benchmark that you may like. I can use a very fast random generator to create random integer keys. This will probably put the Python version at a disadvantage, because such a language isn't fit for doing integer operations as fast as a compiled language (a sum of two integers may be 100 times slower in Python). This problem can be solved by pre-computing the numbers first and putting them into the associative array later. Is this enough to satisfy you?

Note that in the meantime I have created another associative array benchmark; this one is string-based, and this time Python+Psyco comes out only about 2-2.5 times faster. I'll show it when I can...

Bye,
bearophile
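The methodology bearophile describes (pre-compute the random keys, then time only the insertions) can be sketched like this; this is an illustrative harness in modern Python 3 rather than the Python 2 + Psyco setup used in the thread, and the constants are invented for the example:

```python
import random
import time

random.seed(1)
N = 100_000

# Step 1: pre-compute the random integer keys, so that key generation
# is not mixed into the measured time.
keys = [random.randrange(10**9) for _ in range(N)]

# Step 2: time only the associative-array insertions.
t0 = time.perf_counter()
d = {}
for k in keys:
    d[k] = 0
elapsed = time.perf_counter() - t0

# len(d) can be slightly below N if the random keys collide.
assert len(d) <= N
print(f"{len(d)} unique keys inserted in {elapsed:.4f} s")
```

Separating generation from insertion this way keeps the interpreter's integer-arithmetic overhead out of the comparison, which is the fairness concern raised above.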
Aug 27 2008
parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
bearophile wrote:

 Biggest of all: using the wrong tool. I.e. using a hash map for a
 maximal populated key range.

But if the hash machinery is good it must work well in this common situation too.

This statement seems to be as true as the statement: "But if a house is good it must perform well in the common situation of an offshore speed boat race."

The problem with your approach is that you have close to no idea whether your candidates are speed boats or houses. The only thing you seem to know is that in both candidates some humans can live for some time. Your first design seems to have placed both candidates offshore. Now you have introduced some randomness in the location. However, you might only be designing some overly complicated tool for computing the percentage of landmass not covered by water on some random planet.

-manfred
Aug 28 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Manfred_Nowak:
 This statement seems to be as true as the statement:<

No, it's a different statement. I'll try to post another, more realistic-looking benchmark later, anyway. In the meantime I'll keep using Python for many of my purposes where I need hash maps and sets (and regular expressions, etc.) instead of D, because in tons of real-world tests I have seen that Python dicts are quite a bit faster.

Bye,
bearophile
Aug 28 2008
parent "Manfred_Nowak" <svv1999 hotmail.com> writes:
bearophile wrote:

 This statement seems to be as true as the statement:


Yes. It is a different statement. So what? If you do not recognize how well the accompanying elaborations tied it to your statement, then there is no value in going any further. Just convince Walter and the case is settled.
 I'll try to post another more real-looking benchmark later

Different people might perceive reality different, don't they? -manfred -- If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Aug 28 2008
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
This is another small associative array benchmark, with strings. Please spot
any problem/bug in it.

You can generate the data set with this Python script:


from array import array
from random import randrange, seed
def generate(filename, N):
    fout = file(filename, "w")
    seed(1)
    a = array("B", " ") * 10
    for i in xrange(N):
        n = randrange(3, 11)
        for j in xrange(n):
            a[j] = randrange(97, 123)
        print >>fout, a.tostring()[:n]
import psyco; psyco.full()
generate("words.txt", 1600000)


It generates a text file of about 13.3 MB; each line contains a random word.
Such a dataset isn't exactly like a real one, because real words aren't random:
they contain a lot of redundancy, which may slightly worsen the performance of
a hash function. So this is probably a favourable situation for an associative
array.


The D code:

import std.stream, std.stdio, std.gc;
void main() {
    //disable();
    int[string] d;
    foreach (string line; new BufferedFile("words.txt"))
        d[line.dup] = 0;
}

Note that this program tests the I/O performance too. If you want to avoid that
you can read all the file lines up-front and time just the AA creation. (SEE
BELOW).

This is a first Python+Psyco version; it's not exactly equivalent, because the
Python developers have chosen to keep the newline at the end of each word:

def main():
    d = {}
    for line in file("words.txt"):
        d[line] = 0
import psyco; psyco.full()
main()


This second, slower Python version strips the newline from each line; rstrip()
is similar to std.string.stripr():

def main():
    d = {}
    for line in file("words.txt"):
        d[line.rstrip()] = 0

import psyco; psyco.full()
main()


Few timings:

N = 800_000:
	Psyco:          0.69 s
	Psyco stripped: 0.77 s
	D:              1.26 s
	D no GC:        0.96 s

N = 1_600_000:
	Psyco:          1.19 s
	Psyco stripped: 1.35 s
	D:              2.80 s
	D no GC:        2.08 s

Note that disabling the GC in those two Python programs has no effect on their
running time.

-------------------------------------

To be sure not to compare apples with oranges, I have written two "empty"
benchmarks that measure the time needed just to read the lines:

D code:

import std.stream;
void main() {
    foreach (string line; new BufferedFile("words.txt"))
        line.dup;
}


Python code:

def main():
    for line in file("words.txt"):
        pass
import psyco; psyco.full()
main()


The D version contains a dup to make the comparison more accurate, because the
"line" variable contains an actual copy and not just a slice.

Line reading timings, N = 1_600_000:
  D:     0.58 s
  Psyco: 0.30 s

So the I/O of Phobos is slower and may enjoy some tuning, so my timings of the
hash benchmarks are off.

Removing 0.58 - 0.30 = 0.28 seconds from the timings of the D associative array
benchmarks you have:

N = 1_600_000:
	Psyco:          1.19 s
	Psyco stripped: 1.35 s
	D:              2.80 s
	D no GC:        2.08 s
	D, I/O c:       2.80 - 0.28 = 2.52 s
	D no GC, I/O c: 2.08 - 0.28 = 1.80 s

------------------------------------

To hopefully create a more meaningful benchmark I have then written code that
loads all the lines before creating the hash.

The Python+Psyco code:

from timeit import default_timer as clock
def main():
    words = []
    for line in open("words.txt"):
        words.append(line.rstrip())
    t = clock()
    d = {}
    for line in words:
        d[line] = 0
    print round(clock() - t, 2), "s"
import psyco; psyco.bind(main)
main()


The D code:

import std.stream, std.stdio, std.c.time, std.gc;
void main() {
    string[] words;
    foreach (string line; new BufferedFile("words.txt"))
        words ~= line.dup;
    //disable();
    auto t = clock();
    int[string] d;
    foreach (word; words)
        d[word] = 0;
    writefln((cast(double)(clock()-t))/CLOCKS_PER_SEC, " s");
}


Timings, N = 1_600_000:
	Psyco:          0.61 s  (total running time: 1.36 s)
	D:              1.42 s  (total running time: 2.46 s)
	D no GC:        1.22 s  (total running time: 2.28 s)

Bye,
bearophile
Aug 28 2008
prev sibling parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-08-26 14:15:28 +0200, bearophile <bearophileHUGS lycos.com> said:

 [...]
 In my site I am keeping a gallery of tiny benchmarks where D code (with 
 DMD) is 10 or more times slower than very equivalent Python, C, Java 
 code (I have about 12 programs so far, very different from each other. 
 There's a benchmark regarding the associative arrays too). Hopefully it 
 will become useful once people will start tuning D implementations.
 
 Bye,
 bearophile

You know, I have the impression that you have a naive view of data structures: each time you find a performance problem, you ask for the data structure to be improved. One cannot expect a single data structure to accommodate all uses; just because something is a container and supports a given operation does not mean that it supports it efficiently.

*If* something is slow for a given purpose, what I do is sit down and think a little about which data structure is optimal for my problem, and then switch to it (maybe taking it from the Tango containers).

Don't get me wrong, it is useful to know which usage patterns give performance problems with the default data structures, and if associative arrays used a tree or some sorted structure for small sizes (avoiding the cost of hashing) I would not complain, but I do not think (for example) that arrays should necessarily be heavily optimized for appending...

Fawzi
Aug 26 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Fawzi Mohamed:
You know I have got the impression that you have a naive view of
datastructures,<

I think you are wrong, see below.
One cannot expect to have a single data structure accomodate all uses,<

My views on the topic are:

- Each data structure (DS) is a compromise: it supports some operations with a certain performance while giving you a different performance on other operations, so it gives you a performance profile. Generally you can't have a DS with the best performance for every operation.

- Sometimes you may want to choose a DS with worse performance just because its implementation is simpler, to reduce development time, bug count, etc.

- The standard library of a modern language has to contain most of the most common DSs, to avoid lots of problems, speed up programming, etc.

- If the standard library doesn't contain a certain DS, or the DS required is very uncommon, or the performance profile you need for a time-critical part of your code is very sharp, then your language is supposed to allow you to write your own DS (some scripting languages may require you to drop to a lower-level language to do this).

- A modern language is supposed to have some built-in DSs. Which ones to choose? This is a tricky question, but the answer I give is that a built-in data structure has to be very flexible, so it has to be efficient enough in a large variety of situations without being optimal for any one of them. This allows programmers to use it in most situations, where maximum performance isn't required, so the programmer has to reach for DSs from the standard library (or even his/her own) only once in a while. Creating a very flexible DS is not easy: it requires a lot of tuning and tons of benchmarks done on real code, and such a DS often needs some extra memory to be flexible (you can even add subsystems to such a DS that collect usage statistics at runtime to adapt it to its specific usage in the code).
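The "performance profile" trade-off in the first point can be made concrete with a small illustrative comparison (in Python, since the thread already uses it for benchmarks): two containers hold the same data, but front-insertion is O(n) on a dynamic array and O(1) on a deque, at the price of slower random access on the deque.

```python
from collections import deque
import time

N = 20_000

# A dynamic array: O(1) amortized append at the back,
# but O(n) insertion at the front.
t0 = time.perf_counter()
lst = []
for i in range(N):
    lst.insert(0, i)          # shifts every existing element
list_time = time.perf_counter() - t0

# A deque: O(1) at both ends, at the price of slower random access.
t0 = time.perf_counter()
dq = deque()
for i in range(N):
    dq.appendleft(i)
deque_time = time.perf_counter() - t0

assert list(dq) == lst        # same contents, different cost profile
print(f"list.insert(0,..): {list_time:.4f} s, "
      f"deque.appendleft: {deque_time:.4f} s")
```

Neither container is "wrong"; each is optimal for a different usage pattern, which is exactly the compromise argued for above.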
Don't get me wrong, it is useful to know which usage patterns give performance
problems with the default data structures,<

If you want to write efficient programs such knowledge is very important, even in scripting languages.
and if associative arrays would use a tree or some sorted structure for small
sizes (avoiding the cost of hashing) I would not complain,<

Python's hashes (dicts) are optimized for small sizes too, and they don't use a tree.
but I do not think (for example) that arrays should necessarily be very
optimized for appending...<

From the long thread it seems that allowing fast slices, mutability, and fast append all at once isn't easy. I think all three features are important, so some compromise has to be found, because appending is a common enough operation and at the moment it is slow, or really slow, far too slow. Now you say that the built-in arrays don't need to be very optimized for appending. In the Python world they solve this kind of problem by mining real programs to collect real usage statistics, so they try to learn whether append is actually used in real programs, and how often, where, etc.

Bye,
bearophile
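The standard way to make append amortized O(1) is geometric capacity growth, which is (roughly) what Python lists and Java's StringBuilder do internally; here is a minimal illustrative sketch of the idea (the class is invented for the example, not a proposal for D's runtime):

```python
class GrowableArray:
    """Minimal sketch of amortized O(1) append via capacity doubling."""
    def __init__(self):
        self.capacity = 4
        self.length = 0
        self.storage = [None] * self.capacity
        self.reallocations = 0    # counter, just to show how rare growth is

    def append(self, value):
        if self.length == self.capacity:
            # Grow geometrically: copying happens rarely enough that the
            # total copy cost stays linear in the number of appends.
            self.capacity *= 2
            new_storage = [None] * self.capacity
            new_storage[:self.length] = self.storage[:self.length]
            self.storage = new_storage
            self.reallocations += 1
        self.storage[self.length] = value
        self.length += 1

a = GrowableArray()
for i in range(1000):
    a.append(i)
# 1000 appends need only 8 reallocations: capacity 4 -> 8 -> ... -> 1024.
assert a.length == 1000 and a.reallocations == 8
```

The tension mentioned above is that once other arrays may hold slices into the same buffer, reallocating (or appending in place past a slice) is no longer safe without extra bookkeeping, which is what makes combining fast slices with fast append hard.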
Aug 27 2008
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-08-27 13:21:10 +0200, bearophile <bearophileHUGS lycos.com> said:

 Fawzi Mohamed:
 You know I have got the impression that you have a naive view of 
 datastructures,<

I think you are wrong, see below.

good :)
 One cannot expect to have a single data structure accomodate all uses,<

My views on the topic are:
- Each data structure (DS) is a compromise: it lets you do some operations with a certain performance while giving you a different performance on other operations, so it gives you a performance profile. Generally you can't have a DS with the best performance for every operation.
- Sometimes you may want to choose a DS with worse performance just because its implementation is simpler, to reduce development time, bug count, etc.
- The standard library of a modern language has to contain most of the most common DSs, to avoid lots of problems, speed up programming, etc.
- If the standard library doesn't contain a certain DS, or if the DS required is very uncommon, or the performance profile you need for a time-critical part of your code is very sharp, then your language is supposed to allow you to write your own DS (some scripting languages may require you to drop down to a lower-level language to do this).
- A modern language is supposed to have some built-in DSs. Which ones to choose? This is a tricky question, but the answer I give is that a built-in data structure has to be very flexible, so it has to be efficient enough in a large variety of situations without being optimal for any of them. This allows programmers to use it in most situations, where max performance isn't required, so the programmer has to use DSs from the standard library (or even his/her own) only once in a while. Creating a very flexible DS is not easy: it requires lots of tuning and tons of benchmarks done on real code, and your DS often needs some extra memory to be flexible (you can even add subsystems to such a DS that collect its usage statistics at runtime so it adapts itself to the specific usage in the code).
 Don't get me wrong, it is useful to know which usage patterns give 
 performance problems with the default data structures,<

If you want to write efficient programs, such knowledge is very important, even in scripting languages.

on this we agree
 and if associative arrays would use a tree or some sorted structure for 
 small sizes (avoiding the cost of hashing) I would not complain,<

Python hashes are optimized for small sizes too, and they don't use a tree.
 but I do not think (for example) that arrays should necessarily be very 
 optimized for appending...<

From the long thread it seems that allowing fast slices, mutability, and fast append all at once isn't easy. I think all three features are important, so some compromise has to be found, because appending is a common enough operation and at the moment it is slow or really slow, far too slow. Now you say that the built-in arrays don't need to be very optimized for appending. In the Python world they solve this kind of problem by mining real programs to collect real usage statistics. So they try to learn whether append is actually used in real programs, and how often, where, etc.

You know, as you say it depends on the programs, but it also depends on the language: if in a language it is clear that using the default structure you are not supposed to append often, and that if you really have to you use a special method, then the programs written in that language will use that. On the other hand, when you translate code from one language to another you might encounter problems. The standard array as I see it embodies the philosophy of the C array:
- minimal memory overhead (it is ok to have lots of them; D does have some overhead vs C)
- normal memory layout (usable with low-level routines that pass memory around, and with C)
- you can check bounds (C can't)
- you can slice it
- appending to it is difficult
D tries various things to mitigate the fact that appending is difficult; maybe it could do more, but it will *never* be as efficient as a structure that gives up the fact that an array has to be just a chunk of contiguous memory. Now I find the choice of having the basic array be just a chunk of contiguous memory, with minimal structure overhead, very reasonable for a system programming language that has to interact with C, so I also find it ok that appending is not as fast as with other structures. Clearly someone coming from Lisp, any functional language, or even Python might disagree about what is expected from the "default" array container. It isn't that one is right and the other wrong; it is just a question of the priorities of the language, the feel of it, its style... This does not mean that improvements that don't compromise too much shouldn't be made, just that failing some benchmarks might be ok :) Fawzi
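The "chunk of contiguous memory" philosophy above can be sketched as a pointer-plus-length slice. This is a C++ toy model for illustration, not the actual D ABI: slicing is just pointer arithmetic, but append has no reserved space to use, so it must reallocate and copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Toy model of a D-style slice: just a pointer and a length.
struct Slice {
    int*   ptr;
    size_t length;

    // O(1): a sub-slice is just arithmetic on ptr/length, no copy.
    Slice slice(size_t lo, size_t hi) const {
        assert(lo <= hi && hi <= length);
        return Slice{ptr + lo, hi - lo};
    }
};

// Appending has no reserved space to grow into, so it must copy
// everything into a fresh chunk of contiguous memory.
Slice append(Slice s, int value) {
    int* p = static_cast<int*>(std::malloc((s.length + 1) * sizeof(int)));
    std::memcpy(p, s.ptr, s.length * sizeof(int));
    p[s.length] = value;
    return Slice{p, s.length + 1};  // old chunk's ownership is unclear,
                                    // which is exactly where a GC helps
}
```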
 
 Bye,
 bearophile

Aug 27 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Fawzi Mohamed:
 it also depend on 
 the language, if in a language it is clear that using the default 
 structure you are not supposed to append often, and if you really have 
 to do it you use a special method if you do, then the programs written 
 in that language will use that.

I agree. But then the D specs have to be updated to say that appending to built-in D arrays is a slow or very slow operation (not amortized O(1)), so people coming from the C++ STL, Python, Ruby, Tcl, Lua, Lisp, Clean, Oz, etc. won't be bitten by it.
 D tries to do different things to mitigate the fact that appending is 
 difficult, maybe it could do more, but it will *never* be as efficient 
 as a structure that gives up that fact that an array has to be just a 
 chunk of contiguous memory.

I agree, a deque will probably always be faster at appending than a dynamic array. But I think D may do more here :-)
 Now I find the choice of having the basic array being just a chunk of 
 contiguous memory, and that the overhead of the structure should be 
 minimal very reasonable for a system programming language that has to 
 interact with C

Note that both the C++ STL vector and Python's "list" are generally (or always) implemented as a chunk of contiguous memory.
 Clearly someone coming from lisp, any functional language or even 
 python, might disagree about what is requested from the "default" array 
 container.
 It isn't that one is right and the other wrong, it is just a question 
 of priorities of the language, the feel of it, its style...

I agree, that's why I have suggested collecting statistics from real D code, and not from Lisp programs, to see what a good performance profile compromise for D built-in dynamic arrays is :-) This means that I'll stop caring about a fast array append in D if most D programmers don't need fast appends much.
 This does not mean that if improvements can be done without 
 compromising too much it shouldn't be done, just that failing some 
 benchmarks might be ok :)

Well, in my opinion that's okay, but there's a limit. So I think a built-in data structure has to be optimized for flexibility: while not being very good at anything, it has to be not terrible at any commonly performed operation. On the other hand, I can see what you say: in a low-level language an overly flexible data structure may not fit well as a built-in, while a simpler and less flexible one may fit better. You may be right and I may be wrong on this point :-) Bye, bearophile
Aug 27 2008
parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-08-27 16:16:49 +0200, bearophile <bearophileHUGS lycos.com> said:

 Fawzi Mohamed:
 it also depend on
 the language, if in a language it is clear that using the default
 structure you are not supposed to append often, and if you really have
 to do it you use a special method if you do, then the programs written
 in that language will use that.

I agree. But then the D specs have to be updated to say that appending to built-in D arrays is a slow or very slow operation (not amortized O(1)), so people coming from the C++ STL, Python, Ruby, Tcl, Lua, Lisp, Clean, Oz, etc. won't be bitten by it.
 D tries to do different things to mitigate the fact that appending is
 difficult, maybe it could do more, but it will *never* be as efficient
 as a structure that gives up that fact that an array has to be just a
 chunk of contiguous memory.

I agree, a deque will probably always be faster at appending than a dynamic array. But I think D may do more here :-)
 Now I find the choice of having the basic array being just a chunk of
 contiguous memory, and that the overhead of the structure should be
 minimal very reasonable for a system programming language that has to
 interact with C

Note that both the C++ STL vector and Python's "list" are generally (or always) implemented as a chunk of contiguous memory.
 Clearly someone coming from lisp, any functional language or even
 python, might disagree about what is requested from the "default" array
 container.
 It isn't that one is right and the other wrong, it is just a question
 of priorities of the language, the feel of it, its style...

I agree, that's why I have suggested collecting statistics from real D code, and not from Lisp programs, to see what a good performance profile compromise for D built-in dynamic arrays is :-) This means that I'll stop caring about a fast array append in D if most D programmers don't need fast appends much.
 This does not mean that if improvements can be done without
 compromising too much it shouldn't be done, just that failing some
 benchmarks might be ok :)

Well, in my opinion that's okay, but there's a limit. So I think a built-in data structure has to be optimized for flexibility: while not being very good at anything, it has to be not terrible at any commonly performed operation. On the other hand, I can see what you say: in a low-level language an overly flexible data structure may not fit well as a built-in, while a simpler and less flexible one may fit better. You may be right and I may be wrong on this point :-)

Well, it is funny, because now I am not so sure anymore that adding an extra field is such a bad idea (pointing to the end of the reserved data if the actual array is the "owner" of it, and to before the start if it isn't and one should reallocate, i.e. for slices). I started out with the idea of "slightly improved C characteristics", and so for me it was clear that appending would be bad, and I was actually surprised (using it) to see that it was less bad than I supposed. The way it is has the advantage of allowing bound checks with very little overhead, but it has no concept of "extra grow space". So for me it was clear that if I wanted to insert something I had to make space first and then insert it (keeping in mind the old size). The vector approach (knowing about the capacity) can be useful in high-level code where one wants a.length to always be the length of the array (not maybe longer because of some reserved memory). So maybe it is worthwhile, even if it will for sure add some bloat, I do not know... For most of my code it will just add bloat, but probably a tolerable one. Fawzi
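A minimal sketch of the extra-field idea (hypothetical layout and names, in C++ for illustration): the extra pointer marks the end of the reserved block when this reference owns the allocation, and is null for slices, which must reallocate before growing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical array header: 'end_of_reserved' points past the reserved
// block when this reference owns the allocation, and is null for slices,
// which have no grow space and must reallocate before growing.
struct Arr {
    char*  ptr;
    size_t length;
    char*  end_of_reserved;  // null => not the owner (e.g. a slice)

    bool owner() const { return end_of_reserved != nullptr; }

    void append(char c) {
        if (owner() && ptr + length < end_of_reserved) {
            ptr[length++] = c;   // grow in place, O(1)
            return;
        }
        // Slice, or reserve exhausted: reallocate with doubled reserve.
        size_t cap = length ? length * 2 : 8;
        char* p = static_cast<char*>(std::malloc(cap));
        std::memcpy(p, ptr, length);
        p[length] = c;
        ptr = p;
        end_of_reserved = p + cap;
        ++length;
    }
};
```

The cost Fawzi mentions is visible in the struct itself: one extra pointer per array reference, whether or not the code ever appends.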
Aug 27 2008
prev sibling parent reply Yigal Chripun <yigal100 gmail.com> writes:
Benji suggested run-time inheritance and at least from a design
perspective I like some of his thoughts.
I've got a few questions though:

a) people here said that a virtual call will make it slow. How much
slow? how much of an overhead is it on modern hardware considering also
that this is a place where hardware manufacturers spend time on
optimizations?

b) can't a string class use implicit casts and maybe some sugar to
pretend to be a regular array in such a way to avoid that virtual call
and still be useful?
you already can do array.func(...) instead of func(array, ...) so this
can be used with a string class that implicitly converts to a char array.

C) compile-time interfaces? (aka concepts)
Aug 26 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Yigal Chripun wrote:
 a) people here said that a virtual call will make it slow. How much
 slow? how much of an overhead is it on modern hardware considering also
 that this is a place where hardware manufacturers spend time on
 optimizations?

Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.
Aug 26 2008
next sibling parent reply "Jb" <jb nowhere.com> writes:
"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:g90iia$2jc4$3 digitalmars.com...
 Yigal Chripun wrote:
 a) people here said that a virtual call will make it slow. How much
 slow? how much of an overhead is it on modern hardware considering also
 that this is a place where hardware manufacturers spend time on
 optimizations?

Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

Modern x86 branch prediction treats indirect calls the same as conditional branches. They get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted it's only a couple of cycles more costly than a direct call. See the thread "Feature Request: nontrivial functions and vtable optimizations" about 2 weeks ago. I cited the technical docs and a few doubters ran benchmarks, which proved that virtual methods are not as evil as many people think. In fact they are no more evil than a conditional branch.
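Setting the cycle counts aside, the difference under discussion is just this (a C++ sketch): a direct call has its target encoded in the instruction stream, while a virtual call first loads the target from a table, and that loaded address is what the branch target buffer has to predict.

```cpp
#include <cassert>

int addOne(int x) { return x + 1; }
int addTwo(int x) { return x + 2; }

// Direct call: the target address is fixed in the instruction stream,
// so the front end can fetch ahead without any prediction at all.
int callDirect(int x) { return addOne(x); }

// Indirect call: the target is loaded from memory first (like a vtable
// slot); the CPU must predict the destination to keep the pipeline full.
using Fn = int (*)(int);
Fn table[2] = {addOne, addTwo};

int callIndirect(int which, int x) { return table[which](x); }
```

When `which` is the same value call after call, the predictor's "same place as last time" guess is exactly right, which is why the benchmarks in that thread came out so close.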
Aug 26 2008
next sibling parent superdan <super dan.org> writes:
Jb Wrote:

 
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g90iia$2jc4$3 digitalmars.com...
 Yigal Chripun wrote:
 a) people here said that a virtual call will make it slow. How much
 slow? how much of an overhead is it on modern hardware considering also
 that this is a place where hardware manufacturers spend time on
 optimizations?

Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

Modern x86 branch prediction treats indirect calls the same as conditional branches. They get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted it's only a couple of cycles more costly than a direct call. See the thread "Feature Request: nontrivial functions and vtable optimizations" about 2 weeks ago. I cited the technical docs and a few doubters ran benchmarks, which proved that virtual methods are not as evil as many people think. In fact they are no more evil than a conditional branch.

you're right. but direct calls don't speculate. they don't need speculation because they're direct jumps. so they are loaded straight into the pipeline. so walter was right but used the wrong term.
Aug 26 2008
prev sibling next sibling parent reply superdan <super dan.org> writes:
Jb Wrote:

 
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g90iia$2jc4$3 digitalmars.com...
 Yigal Chripun wrote:
 a) people here said that a virtual call will make it slow. How much
 slow? how much of an overhead is it on modern hardware considering also
 that this is a place where hardware manufacturers spend time on
 optimizations?

Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

Modern x86 branch prediction treats indirect calls the same as conditional branches. They get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted it's only a couple of cycles more costly than a direct call. See the thread "Feature Request: nontrivial functions and vtable optimizations" about 2 weeks ago. I cited the technical docs and a few doubters ran benchmarks, which proved that virtual methods are not as evil as many people think. In fact they are no more evil than a conditional branch.

you're right. but direct calls don't speculate. they don't need speculation because they're direct jumps. so they are loaded straight into the pipeline. walt was right but used the wrong term.
Aug 26 2008
parent reply "Jb" <jb nowhere.com> writes:
"superdan" <super dan.org> wrote in message 
news:g912vh$mbe$1 digitalmars.com...
 Jb Wrote:

 "Walter Bright" <newshound1 digitalmars.com> wrote in message
 news:g90iia$2jc4$3 digitalmars.com...
 Yigal Chripun wrote:
 a) people here said that a virtual call will make it slow. How much
 slow? how much of an overhead is it on modern hardware considering 
 also
 that this is a place where hardware manufacturers spend time on
 optimizations?

Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

Modern x86 branch prediction treats indirect calls the same as conditional branches. They get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted it's only a couple of cycles more costly than a direct call. See the thread "Feature Request: nontrivial functions and vtable optimizations" about 2 weeks ago. I cited the technical docs and a few doubters ran benchmarks, which proved that virtual methods are not as evil as many people think. In fact they are no more evil than a conditional branch.

you're right. but direct calls don't speculate. they don't need speculation because they're direct jumps. so they are loaded straight into the pipeline. walt was right but used the wrong term.

Walter said "the hardware cannot predict where a virtual call will go". It does in fact predict them, and speculatively execute them, and as pretty much any benchmark will show it gets it right the vast majority of the time. (On x86 anyway.) That's what I was saying.
Aug 26 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jb wrote:
 Walter said "the hardware cannot predict where a virtual call will go".
 
 It does in fact predict them, and speculatively execute them, and as pretty 
 much any benchmark will show it gets it right the vast majority of the time. 
 (On x86 anyway.)
 
 That's what I was saying. 

Looks like I keep falling behind on what modern CPUs are doing :-( In any case, throughout all the revolutions in how CPUs work, there have been a few invariants that hold true well enough as an optimization guide:
1. fewer instructions ==> faster execution
2. fewer memory accesses ==> faster execution
3. fewer conditional branches ==> faster execution
Aug 26 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 Looks like I keep falling behind on what modern CPUs are doing :-(

The 5 good PDF files on this page are probably enough to put you back in shape: http://www.agner.org/optimize/ (especially the ones regarding CPUs and microarchitecture; the first document is the simplest one). Bye, bearophile
Aug 26 2008
parent "Jb" <jb nowhere.com> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:g91vf8$2mmk$1 digitalmars.com...
 Walter Bright:
 Looks like I keep falling behind on what modern CPUs are doing :-(

The 5 good PDF files on this page are probably enough to put you back in shape: http://www.agner.org/optimize/ (especially the ones regarding CPUs and microarchitecture; the first document is the simplest one).

Agner Fog's guides are the best optimization info you can get. They're actually a lot better than Intel's and AMD's own optimization guides imo.
Aug 26 2008
prev sibling next sibling parent "Jb" <jb nowhere.com> writes:
"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:g91kah$1rvb$1 digitalmars.com...
 Jb wrote:
 Walter said "the hardware cannot predict where a virtual call will go".

 It does in fact predict them, and speculatively execute them, and as 
 pretty much any benchmark will show it gets it right the vast majority of 
 the time. (On x86 anyway.)

 That's what I was saying.

Looks like I keep falling behind on what modern CPUs are doing :-( In any case, throughout all the revolutions in how CPUs work, there have been a few invariants that hold true well enough as an optimization guide:
1. fewer instructions ==> faster execution
2. fewer memory accesses ==> faster execution
3. fewer conditional branches ==> faster execution

True. I'd add this to the list as well:
4. shorter dependency chains ==> faster execution
Although it's more relevant for floating point, where most ops have at least a few cycles of latency.
Aug 26 2008
prev sibling parent JAnderson <ask me.com> writes:
Walter Bright wrote:
 Jb wrote:
 Walter said "the hardware cannot predict where a virtual call will go".

 It does in fact predict them, and speculatively execute them, and as 
 pretty much any benchmark will show it gets it right the vast majority 
 of the time. (On x86 anyway.)

 That's what I was saying. 

Looks like I keep falling behind on what modern CPUs are doing :-( In any case, throughout all the revolutions in how CPUs work, there have been a few invariants that hold true well enough as an optimization guide:
1. fewer instructions ==> faster execution
2. fewer memory accesses ==> faster execution
3. fewer conditional branches ==> faster execution

Also you can't inline virtual calls (well, a smart compiler could, but that's another discussion). That means the compiler can't optimize as well by removing unnecessary operations. -Joel
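A small C++ sketch of the inlining point (hypothetical class names): through a base pointer the target comes from the vtable, so the compiler generally can't inline it; when the concrete type is statically known, here forced with a qualified call, the vtable is bypassed and inlining becomes possible.

```cpp
#include <cassert>

struct Shape {
    virtual int area() const { return 0; }
    virtual ~Shape() = default;
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

// Virtual dispatch: the target is fetched from the vtable at runtime,
// so the compiler generally cannot inline the body here.
int areaVirtual(const Shape* s) { return s->area(); }

// Qualified call: the concrete type is known statically, the vtable is
// bypassed entirely, and the compiler is free to inline side * side.
int areaDevirtualized(const Square& s) { return s.Square::area(); }
```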
Aug 26 2008
prev sibling next sibling parent JAnderson <ask me.com> writes:
Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g90iia$2jc4$3 digitalmars.com...
 Yigal Chripun wrote:
 a) people here said that a virtual call will make it slow. How much
 slow? how much of an overhead is it on modern hardware considering also
 that this is a place where hardware manufacturers spend time on
 optimizations?

Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

Modern x86 branch prediction treats indirect calls the same as conditional branches. They get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted it's only a couple of cycles more costly than a direct call. See the thread "Feature Request: nontrivial functions and vtable optimizations" about 2 weeks ago. I cited the technical docs and a few doubters ran benchmarks, which proved that virtual methods are not as evil as many people think. In fact they are no more evil than a conditional branch.

That's x86 hardware. Try something like the PS3. That system has little or no cache. It has to jump to the vtable, which is in a totally different location from the class. Note I'm not in the camp that thinks they should never be used on these systems; however, I think you should use them smartly and profile, profile, profile. One technique I've used in C++ to help improve things a little is to switch the vtable with one that's in the same location as, or close to, the class. The wrapper function looked something like this:

class A {...}
A a = new LocalVirtualTable<A>(); // i.e. LocalVirtualTable is a bolt-in template

However, performance only improved in cases where I could flush the cache. In many cases it was slightly worse on x86, so you had to try it, profile, and see if it had a positive or negative effect in each case. I imagine when you've got hundreds of these classes it's simply more memory to process, so on a system with a big cache it can be counterproductive. I never tried it on a PS3, so it might be more effective there. -Joel
Aug 26 2008
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Jb" <jb nowhere.com> wrote in message news:g90mm6$2tk9$1 digitalmars.com...
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g90iia$2jc4$3 digitalmars.com...
 Yigal Chripun wrote:
 a) people here said that a virtual call will make it slow. How much
 slow? how much of an overhead is it on modern hardware considering also
 that this is a place where hardware manufacturers spend time on
 optimizations?

Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

Modern x86 branch prediction treats indirect calls the same as conditional branches. They get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted it's only a couple of cycles more costly than a direct call.

Just curious: how "modern" do you mean by "modern" here?
Aug 26 2008
parent "Jb" <jb nowhere.com> writes:
"Nick Sabalausky" <a a.a> wrote in message 
news:g91m3t$222a$1 digitalmars.com...
 "Jb" <jb nowhere.com> wrote in message 
 news:g90mm6$2tk9$1 digitalmars.com...
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g90iia$2jc4$3 digitalmars.com...
 Yigal Chripun wrote:
 a) people here said that a virtual call will make it slow. How much
 slow? how much of an overhead is it on modern hardware considering also
 that this is a place where hardware manufacturers spend time on
 optimizations?

Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

Modern x86 branch prediction treats indirect calls the same as conditional branches. They get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted it's only a couple of cycles more costly direct calls.

 Just curious: how "modern" do you mean by "modern" here?

Well, I thought it was the Pentium II, but according to Agner Fog, it's been there since the PMMX. So pretty much all Pentiums. Although that's "predict it goes the same place it did last time"; more recent ones do remember multiple targets and recognize some patterns.
Aug 26 2008
prev sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Walter Bright wrote:
 Yigal Chripun wrote:
 a) people here said that a virtual call will make it slow. How much
 slow? how much of an overhead is it on modern hardware considering also
 that this is a place where hardware manufacturers spend time on
 optimizations?

Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

What about for software optimization?

I seem to remember reading something about the Objective-C compiler maybe six or eight months ago, talking about some of its optimization techniques. Obj-C uses a message-passing idiom, and all messages use dynamic dispatch, since the list of messages an object can receive is not fixed at compile-time.

If I remember correctly, the article said that the dynamic-dispatch expense only had to be incurred once, upon the first invocation of each message type. After that, the address of the appropriate function was rewritten in memory, so that it pointed directly to the correct code. No more dynamic dispatch. Although the message handlers aren't resolved until runtime, once invoked, they'll always use the same target.

Or something like that. It was an interesting read. I'll see if I can find it.

--benji
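The technique described here sounds like an inline cache: pay the dynamic lookup once, then patch the call site so later calls jump straight to the resolved target. A toy C++ version, with a function pointer standing in for the patched call site (all names hypothetical, not the actual Objective-C runtime):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Handler = int (*)(int);

int doubler(int x) { return 2 * x; }

// Stand-in for the slow, dynamic message lookup.
std::map<std::string, Handler>& registry() {
    static std::map<std::string, Handler> r{{"double", doubler}};
    return r;
}

// Each call site owns a cached target. The first call resolves the
// message and patches the cache; later calls skip the lookup entirely.
struct CallSite {
    std::string selector;
    Handler cached;
    int lookups;  // for illustration: counts slow-path hits

    explicit CallSite(std::string sel)
        : selector(std::move(sel)), cached(nullptr), lookups(0) {}

    int invoke(int arg) {
        if (!cached) {                      // slow path, taken once
            ++lookups;
            cached = registry().at(selector);
        }
        return cached(arg);                 // fast path thereafter
    }
};
```

The real runtime patches machine-level call sites rather than a struct field, but the shape of the optimization is the same.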
Aug 26 2008
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-08-26 13:25:40 -0400, Benji Smith <dlanguage benjismith.net> said:

 I seem to remember reading something about the Objective-C compiler 
 maybe six or eight months ago talking about some of its optimization 
 techniques.
 
 Obj-C uses a message-passing idiom, and all messages use dynamic 
 dispatch, since the list of messages an object can receive is not fixed 
 at compile-time.
 
 If I remember correctly, this article said that the dynamic dispatch 
 expense only had to be incurred once, upon the first invocation of each 
 message type. After that, the address of the appropriate function was 
 re-written in memory, so that it pointed directly to the correct code. 
 No more dynamic dispatch. Although the message handlers aren't resolved 
 until runtime, once invoked, they'll always use the same target.
 
 Or some thing like that.
 
 It was an interesting read. I'll see if I can find it.

Hum, I believe you're talking about the cache for method calls. What Objective-C does is that it caches methods by selector in a lookup table. There is one such table for each class, and it gets populated as methods are called on that class. Once a method is in the cache, it's very efficient to find where to branch: you take the selector's pointer and apply the mask to get value n, then branch on the method pointer from the nth bucket in the table. All messages are passed by calling the objc_msgSend function. Here's how you can implement some of that in D:

id objc_msgSend(id, SEL, ...)
{
    auto n = cast(uint)SEL & id.isa.cache.mask;
    auto func = cast(id function(id, SEL, ...))id.isa.cache.buckets[n];
    if (func != null)
    {
        <set instruction pointer to func>
        // never returns, the function pointed by func returns instead
    }
    <find func pointer by other means, fill cache, etc.>
}

I've read somewhere that it's almost as fast as virtual functions. While I haven't verified that, it's much more flexible: you can add functions at runtime to any class. That's how Objective-C allows you to add methods to classes you do not control, and you can still override them in derived classes.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Aug 26 2008