www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - AA implementation algo should be mentioned

reply Davidl 126.com writes:
without looking into phobos or join irc channel people would
never know what algo is used by AA. and that would make people
feel AA is not reliable, why not mention the algo used by AA
and the complexity on the page? i think that would make the
doc looks better
Mar 17 2007
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Davidl 126.com wrote:
 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

I personally love the C++ spec in this respect. Everything in the library is treated as an algorithm and complexity information is defined for all operations. However, I'm not sure that specifics about the algorithm should be put into the D spec for AAs beyond that they must be O(1) for lookups, O(N) for iteration via foreach, and possibly some constraints on insert/remove performance (possibly amortized O(1) for inserts and maybe O(N) for removals?). Sean
Mar 17 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sean Kelly wrote:
 Davidl 126.com wrote:
 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

I personally love the C++ spec in this respect. Everything in the library is treated as an algorithm and complexity information is defined for all operations. However, I'm not sure that specifics about the algorithm should be put into the D spec for AAs beyond that they must be O(1) for lookups, O(N) for iteration via foreach, and possibly some constraints on insert/remove performance (possibly amortized O(1) for inserts and maybe O(N) for removals?).

I agree it would be nice if this were specified. However, I don't think inserts are currently amortized O(1), are they? IIRC the Phobos implementation uses a hash table with binary trees for collision handling, which would mean /average/ O(1) for "good" toHash() implementations, but worst-case O(log N) (for "bad" toHash()s). Also, that would mean removals are currently O(log N) as well, but specifying O(N) would allow other standard library implementations to use linked lists instead of binary trees so perhaps that's for the best.
Mar 17 2007
next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Frits van Bommel wrote:
[about inserts]
 IIRC the Phobos implementation uses a hash table with binary trees for 
 collision handling, which would mean /average/ O(1) for "good" toHash() 
 implementations, but worst-case O(log N) (for "bad" toHash()s).

Oh sorry, forgot about resizing. What does that make it, average amortized O(1)? :P
Mar 17 2007
parent Sean Kelly <sean f4.ca> writes:
Frits van Bommel wrote:
 Frits van Bommel wrote:
 [about inserts]
 IIRC the Phobos implementation uses a hash table with binary trees for 
 collision handling, which would mean /average/ O(1) for "good" 
 toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).

Oh sorry, forgot about resizing. What does that make it, average amortized O(1)? :P

I think so. Sean
Mar 17 2007
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Frits van Bommel wrote:
 Sean Kelly wrote:
 Davidl 126.com wrote:
 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

I personally love the C++ spec in this respect. Everything in the library is treated as an algorithm and complexity information is defined for all operations. However, I'm not sure that specifics about the algorithm should be put into the D spec for AAs beyond that they must be O(1) for lookups, O(N) for iteration via foreach, and possibly some constraints on insert/remove performance (possibly amortized O(1) for inserts and maybe O(N) for removals?).

I agree it would be nice if this were specified. However, I don't think inserts are currently amortized O(1), are they?

Oops, you're right. They're typically O(N) with a best case (omega?) of 1. Double hashing is O(1) for inserts I think, but removals tend to be somewhat messy.
 IIRC the Phobos implementation uses a hash table with binary trees for 
 collision handling, which would mean /average/ O(1) for "good" toHash() 
 implementations, but worst-case O(log N) (for "bad" toHash()s).
 Also, that would mean removals are currently O(log N) as well, but 
 specifying O(N) would allow other standard library implementations to 
 use linked lists instead of binary trees so perhaps that's for the best.

Yup, which is correct IMO. There should be no restriction on implementation other than that some form of hashing is desired :-) Sean
Mar 17 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sean Kelly wrote:
 Double hashing is O(1) for inserts I think, but removals tend to be 
 somewhat messy.

I don't think so. AFAIK, it's just a technique that attempts to avoid the clustering effect that linear probing suffers from. However, in a full table it can still take O(N) probes before an empty slot is found. Wikipedia seems to back me up on this: "Like all other forms of open addressing, double hashing becomes linear as the hash table approaches maximum capacity" (http://en.wikipedia.org/wiki/Double_hashing)
 Frits van Bommel wrote:
 IIRC the Phobos implementation uses a hash table with binary trees for 
 collision handling, which would mean /average/ O(1) for "good" 
 toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).
 Also, that would mean removals are currently O(log N) as well, but 
 specifying O(N) would allow other standard library implementations to 
 use linked lists instead of binary trees so perhaps that's for the best.

Yup, which is correct IMO. There should be no restriction on implementation other than that some form of hashing is desired :-)

Right, I just wanted to mention that. (IIRC that's also the philosophy the C++ standard follows on this point)
Mar 17 2007
parent Sean Kelly <sean f4.ca> writes:
Frits van Bommel wrote:
 Sean Kelly wrote:
 Double hashing is O(1) for inserts I think, but removals tend to be 
 somewhat messy.

I don't think so. AFAIK, it's just a technique that attempts to avoid the clustering effect that linear probing suffers from. However, in a full table it can still take O(N) probes before an empty slot is found. Wikipedia seems to back me up on this: "Like all other forms of open addressing, double hashing becomes linear as the hash table approaches maximum capacity" (http://en.wikipedia.org/wiki/Double_hashing)

Could be, I suppose. Though I'd have to tell a Data Structures professor I know that his proof is incorrect. I suppose I could be mis-remembering the conclusion as well though. Hm... it may have been based on particular growth conditions, like never letting the table get more than half full before a rehash. Sean
Mar 17 2007
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sat, 17 Mar 2007 23:29:18 +0800, Davidl 126.com wrote:

 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

Isn't the alogrithm used an implementation detail of the compiler and not the language? I would see that any description on the AA alogrithm should be placed in the docs for the compiler and not mandated into the language. -- Derek Parnell Melbourne, Australia "Justice for David Hicks!" skype: derek.j.parnell
Mar 17 2007
parent Tom <tom nospam.com> writes:
Derek Parnell escribió:
 On Sat, 17 Mar 2007 23:29:18 +0800, Davidl 126.com wrote:
 
 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

Isn't the alogrithm used an implementation detail of the compiler and not the language? I would see that any description on the AA alogrithm should be placed in the docs for the compiler and not mandated into the language.

I agree, but I think that time/space complexity IS mandatory in the language specification. -- Tom;
Mar 17 2007