digitalmars.D - AA with complex keytype?
- Manfred Nowak (4/4) Feb 09 2007 For complex numbers comparisons other than == and != are not defined.
- Andrei Alexandrescu (See Website For Email) (4/9) Feb 09 2007 Asociative arrays are unordered. All they need is a hash function, which...
- Oskar Linde (3/12) Feb 09 2007 Not the DMD associative arrays. They require full ordering.
- Lionello Lunesu (3/15) Feb 09 2007 Yes, ordering (opCmp) is used in case of hash collisions.
- Frits van Bommel (18/21) Feb 09 2007 Also correct.
- Brian Byrne (3/9) Feb 09 2007 Transitivity! :)
- Frits van Bommel (3/13) Feb 09 2007 "that sort of thing" == "and stuff like that", i.e. not *just* that.
- Jarrett Billingsley (3/5) Feb 09 2007 Woah, woah, woah! Calm down there, killer!
- Frits van Bommel (5/11) Feb 09 2007 I think that came over wrong. Maybe I should have put in a smiley or
- Manfred Nowak (11/13) Feb 09 2007 Thank you. This answers my question.
- Frits van Bommel (10/26) Feb 09 2007 By "that sort of thing", I meant "and similar properties". For instance,...
- Manfred Nowak (6/7) Feb 09 2007 In fact those requirements are necessary only for binary tress built in
- Frits van Bommel (4/10) Feb 09 2007 That might not be a great solution, especially when you need to consider...
- Stewart Gordon (10/18) Feb 09 2007 Apparently because Walter doesn't like the idea of providing an alternat...
- Manfred Nowak (19/25) Feb 09 2007 Following code shows under 1.005 that no compile time error is given.
- Frits van Bommel (7/13) Feb 09 2007 If they'd be put into a linear list you wouldn't even *need* opCmp, so
- Manfred Nowak (14/16) Feb 09 2007 Currently implemented AA's need to have for a good runtime beahviour at
- Frits van Bommel (17/34) Feb 09 2007 I haven't inspected the current AA implementation in detail. Are you
- Manfred Nowak (6/11) Feb 09 2007 Such an opCmp requires the absence of rebalancing operations. And there
- Frits van Bommel (6/13) Feb 09 2007 Doesn't it also require that objects are always passed in the same
- Manfred Nowak (7/8) Feb 10 2007 No. Because if you are at a node in the tree where a==b, you are done.
- Frits van Bommel (13/22) Feb 10 2007 Let's say a!=b. If compare(a,b) returns a!=b, as you proposed, that
- Manfred Nowak (7/9) Feb 11 2007 That's right. The value stemming from the collision bucket has to be at
- Stewart Gordon (4/10) Feb 10 2007 Balancing the trees on the fly would be inefficient. But rehash ought t...
- Frits van Bommel (6/16) Feb 10 2007 There are several well-known methods of efficiently keeping binary trees...
- Bill Baxter (11/29) Feb 10 2007 So much for O(N lg N) performance in the case of a really bad hash
- Jarrett Billingsley (16/19) Feb 10 2007 Lua tables use (according to the source) "a mix of chained scatter table...
- Charles D Hixson (11/20) Feb 13 2007 Actually, there are many cases where modular arithmetic to
- Joel C. Salomon (8/17) Feb 14 2007 Mathematically speaking, a group like ℤ₃ is unordered, precisely for...
- Manfred Nowak (3/8) Feb 14 2007 It is faster to not impose an ordering on an unordered set.
- Joel Salomon (2/9) Feb 14 2007 Do you know where D insists that “some” ordering be applied to a typ...
- Manfred Nowak (17/19) Feb 14 2007 I have given an example that an ordering _operator_ is required by D in
- Frits van Bommel (10/19) Feb 14 2007 Associative arrays. http://www.digitalmars.com/d/arrays.html#associative
- Nicolai Waniek (7/11) Feb 14 2007 What's about this:
- Manfred Nowak (12/19) Feb 14 2007 It depends on how one defines "increment".
For complex numbers comparisons other than == and != are not defined. AA's require classes to have comparisons defined. Why are AA's of complex numbers allowed? -manfred
Feb 09 2007
Manfred Nowak wrote:For complex numbers comparisons other than == and != are not defined. AA's require classes to have comparisons defined. Why are AA's of complex numbers allowed?Asociative arrays are unordered. All they need is a hash function, which can be defined over any data type. Andrei
Feb 09 2007
Andrei Alexandrescu (See Website For Email) wrote:Manfred Nowak wrote:Not the DMD associative arrays. They require full ordering. /OskarFor complex numbers comparisons other than == and != are not defined. AA's require classes to have comparisons defined. Why are AA's of complex numbers allowed?Asociative arrays are unordered. All they need is a hash function, which can be defined over any data type.
Feb 09 2007
Oskar Linde wrote:Andrei Alexandrescu (See Website For Email) wrote:Yes, ordering (opCmp) is used in case of hash collisions. L.Manfred Nowak wrote:Not the DMD associative arrays. They require full ordering.For complex numbers comparisons other than == and != are not defined. AA's require classes to have comparisons defined. Why are AA's of complex numbers allowed?Asociative arrays are unordered. All they need is a hash function, which can be defined over any data type.
Feb 09 2007
Manfred Nowak wrote:For complex numbers comparisons other than == and != are not defined.Indeed.AA's require classes to have comparisons defined.Also correct.Why are AA's of complex numbers allowed?For one thing, complex numbers aren't classes :P. Seriously: AA's use the TypeInfo.compare() method to compare. For complex types, this is implemented as lexicographical ordering (compare the real parts, and return the result unless it's equal; in that case, return the result of comparing the imaginary parts). The reason "<" and friends aren't defined for complex types is that there's no definition that makes any mathematical sense (at least, not more than other definitions). Fortunately, AA's don't require the ordering to make any mathematical sense at all, they just need some way to decide in what order to put them in a binary three if a hash collision should occur. As long as the order is consistent[1] (a < b && b < c => a < c, that sort of thing) they're happy. [1]: I'm sure there's a mathematical term for this that I can't currently remember.
Feb 09 2007
Frits van Bommel Wrote:[1] (a < b && b < c => a < c, that sort of thing) ... [1]: I'm sure there's a mathematical term for this that I can't currently remember.Transitivity! :) Brian Byrne
Feb 09 2007
Brian Byrne wrote:Frits van Bommel Wrote:"that sort of thing" == "and stuff like that", i.e. not *just* that. If I'd meant transitivity, I'd have said that (that term I know).[1] (a < b && b < c => a < c, that sort of thing) ... [1]: I'm sure there's a mathematical term for this that I can't currently remember.Transitivity! :)
Feb 09 2007
"Frits van Bommel" <fvbommel REMwOVExCAPSs.nl> wrote in message news:eqi7ph$io7$2 digitaldaemon.com..."that sort of thing" == "and stuff like that", i.e. not *just* that. If I'd meant transitivity, I'd have said that (that term I know).Woah, woah, woah! Calm down there, killer!
Feb 09 2007
Jarrett Billingsley wrote:"Frits van Bommel" <fvbommel REMwOVExCAPSs.nl> wrote in message news:eqi7ph$io7$2 digitaldaemon.com...I think that came over wrong. Maybe I should have put in a smiley or something. I just wanted to clarify that point. So hard to convey tone and stuff over a text-only medium..."that sort of thing" == "and stuff like that", i.e. not *just* that. If I'd meant transitivity, I'd have said that (that term I know).Woah, woah, woah! Calm down there, killer!
Feb 09 2007
Frits van Bommel wroteAs long as the order is consistent[1] (a < b && b < c => a < c, that sort of thing) they're happy.Thank you. This answers my question. But immediately another arises: Why are classes required to implement opCmp( Object) for AA's to function properly? Currently AA's misbehave if opCmp(Object) is not overwritten for a class, but no error is thrown. For the transitivity of an ordering it suffices for opCmp(Object) to always give "==" if the objects are equal, and give "<" if they are not. It should be easy to implement this as the default opCmp(Object).
Feb 09 2007
Manfred Nowak wrote:Frits van Bommel wroteBecause the TypeInfo for objects uses Object.opCmp (if neither is null)As long as the order is consistent[1] (a < b && b < c => a < c, that sort of thing) they're happy.Thank you. This answers my question. But immediately another arises: Why are classes required to implement opCmp( Object) for AA's to function properly?Currently AA's misbehave if opCmp(Object) is not overwritten for a class, but no error is thrown. For the transitivity of an ordering it suffices for opCmp(Object) to always give "==" if the objects are equal, and give "<" if they are not.By "that sort of thing", I meant "and similar properties". For instance, the order also needs to be non-symmetric: a < b => !(b < a)[1]. Your proposed default implementation doesn't satisfy that. IIRC Object used to compare addresses by default, but that was problematic because that behavior disallows moving garbage collectors.It should be easy to implement this as the default opCmp(Object).But as explained above it wouldn't work correctly... [1]: Actually, since the compare return value can have three meanings (less than, equal to, greater than), that would be: a < b => b > a
Feb 09 2007
Frits van Bommel wroteYour proposed default implementation doesn't satisfy that.In fact those requirements are necessary only for binary tress built in the collision buckets. If one can live with linear lists in the collsions baskets, then opCmp(Object) is allowed to degenerate to !opEquals(Object). -manfred
Feb 09 2007
Manfred Nowak wrote:Frits van Bommel wroteYes, but the current implementation *does* use binary trees, IIRC.Your proposed default implementation doesn't satisfy that.In fact those requirements are necessary only for binary tress built in the collision buckets.If one can live with linear lists in the collsions baskets, then opCmp(Object) is allowed to degenerate to !opEquals(Object).That might not be a great solution, especially when you need to consider that user-defined hash functions may not be very good...
Feb 09 2007
Manfred Nowak Wrote: <snip>Why are classes required to implement opCmp( Object) for AA's to function properly?Apparently because Walter doesn't like the idea of providing an alternative implementation of AAs for unordered classes, but hasn't to my knowledge given a truly convincing reason. My library implementation http://pr.stewartsplace.org.uk/d/sutil/ uses only toHash and opEquals, thus works on unordered types.Currently AA's misbehave if opCmp(Object) is not overwritten for a class, but no error is thrown.Currently? Not under DMD 1.005 as I try it. Please supply a code example that demonstrates this behaviour.For the transitivity of an ordering it suffices for opCmp(Object) to always give "==" if the objects are equal, and give "<" if they are not.So if x != y, then both x < y and y < x? That wouldn't make sense at all.It should be easy to implement this as the default opCmp(Object).OUAT, Object.opCmp was set up to compare the memory addresses, but this behaviour was removed to prepare for copying/compacting GC, and probably partly to eliminate the confusing behaviour it created. Stewart.
Feb 09 2007
Stewart Gordon wroteFollowing code shows under 1.005 that no compile time error is given. class C{ hash_t toHash() { return 0; } } void main(){ bool[C] map; map[ new C]= true; map[ new C]= false; } A runtime error shows up only, when a comparison is actually needed. This might be far too late. In this example the runtime error is forced by implementing a worthless toHash.Currently AA's misbehave if opCmp(Object) is not overwritten for a class, but no error is thrown.Currently? Not under DMD 1.005 as I try it. Please supply a code example that demonstrates this behaviour.So if x != y, then both x < y and y < x? That wouldn't make sense at all.Enough sense for an AA: all colliding elements are put into a linear list and searched in sequence on retrieval. Without a good toHash this would natrally lead to a bad runtime behaviour. -manfred
Feb 09 2007
Manfred Nowak wrote:If they'd be put into a linear list you wouldn't even *need* opCmp, so no opCmp would be "enough sence" as well ;). But the fact is they are put into a binary tree to keep acceptable behavior (i.e. O(log N) instead of O(N)) when the hash function sucks. Since hash functions can be user-defined, and not all users are experts at hashing, you need to consider that use case.So if x != y, then both x < y and y < x? That wouldn't make sense at all.Enough sense for an AA: all colliding elements are put into a linear list and searched in sequence on retrieval. Without a good toHash this would natrally lead to a bad runtime behaviour.
Feb 09 2007
Frits van Bommel wroteSince hash functions can be user-defined, and not all users are experts at hashing, you need to consider that use case.Currently implemented AA's need to have for a good runtime beahviour at least one of 1) or 2) where: (1a) toHash implements a good distribution and (1b) opCmp==!opEquals or better or (2a) toHash worse than described by (1a) and (2b) opCmp implements an ordering suitable for use in unbalanced binary trees There is no evidence, that users that are no experts at hashing will be experts at implementing a suitable ordering, especially if the elements do not have a natural ordering. -manfred
Feb 09 2007
Manfred Nowak wrote:Frits van Bommel wroteI haven't inspected the current AA implementation in detail. Are you sure that an opCmp that never returns "greater" (or, equivalently, never returns "less") will always work? It may well be that the current implementation's binary trees degenerate into a linked list in that case, but it may also very well be that some elements become irretrievable.Since hash functions can be user-defined, and not all users are experts at hashing, you need to consider that use case.Currently implemented AA's need to have for a good runtime beahviour at least one of 1) or 2) where: (1a) toHash implements a good distribution and (1b) opCmp==!opEquals or betteror (2a) toHash worse than described by (1a) and (2b) opCmp implements an ordering suitable for use in unbalanced binary trees There is no evidence, that users that are no experts at hashing will be experts at implementing a suitable ordering, especially if the elements do not have a natural ordering.Orderings are easier to get right than hash functions, I think. Pretty much everybody knows what an ordering looks like, but hash functions need to have a good distribution which can be a rather complicated topic. Of course, since you specified an ordering suitable for an _unbalanced_ binary tree that complicates things a bit. But I don't think good behavior in there depends much on the comparison function. It's more related to in which order you insert the elements. Which orders are good and bad _are_ of course dependent on the comparison... (do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)
Feb 09 2007
Frits van Bommel wroteAre you sure that an opCmp that never returns "greater" (or, equivalently, never returns "less") will always work?Such an opCmp requires the absence of rebalancing operations. And there have to go some kudos to Walter for that being true.(do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)They are unbalanced. Look at phobos/internal/aaA.d:278 and following lines. -manfred
Feb 09 2007
Manfred Nowak wrote:Frits van Bommel wroteDoesn't it also require that objects are always passed in the same parameter order to the comparison function? So when comparing two concrete objects a and b, you must always do compare(a, b), never compare(b, a). No matter which is the one you're looking for and which is the element in the tree.Are you sure that an opCmp that never returns "greater" (or, equivalently, never returns "less") will always work?Such an opCmp requires the absence of rebalancing operations. And there have to go some kudos to Walter for that being true.
Feb 09 2007
Frits van Bommel wroteyou must always do compare(a, b), never compare(b, a)No. Because if you are at a node in the tree where a==b, you are done. And if you are at a node in the tree where a!=b, you will go into the direction dictated by opCmp, which is by construction of opCmp the same for all permutations. This holds for all operations on the tree. -manfred
Feb 10 2007
Manfred Nowak wrote:Frits van Bommel wroteLet's say a!=b. If compare(a,b) returns a!=b, as you proposed, that evaluates to true, which converts to integer 1, representing "greater". So it returns that a>b holds. So if b is the node in the tree, you would then continue searching in the *right* branch. If on the other hand compare(b,a) is called, by the same process it returns 1 indicating b>a holds. If b is still the tree node, you would then continue your search in the *left* branch So the implementation needs to be careful about parameter order, and I don't want to go dig into the source to find out if it is. Nor do I want to check every Phobos/Tango release I use to find out if it _still_ is. Since the API doesn't specify this, I think it would be a mistake to rely on it even if it _currently_ holds.you must always do compare(a, b), never compare(b, a)No. Because if you are at a node in the tree where a==b, you are done. And if you are at a node in the tree where a!=b, you will go into the direction dictated by opCmp, which is by construction of opCmp the same for all permutations. This holds for all operations on the tree.
Feb 10 2007
Frits van Bommel wroteSo the implementation needs to be careful about parameter order,That's right. The value stemming from the collision bucket has to be at a fixed parameter position on all calls of opCmp. But this can be easily verified by inserting and retrieving two elements under a worthless toHash.Since the API doesn't specify thisThat's also right. But this also can be fixed easily. -manfred
Feb 11 2007
Manfred Nowak Wrote:Frits van Bommel wrote<snip>Balancing the trees on the fly would be inefficient. But rehash ought to balance them while it's at it. Stewart.(do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)They are unbalanced. Look at phobos/internal/aaA.d:278 and following lines.
Feb 10 2007
Stewart Gordon wrote:Manfred Nowak Wrote:There are several well-known methods of efficiently keeping binary trees (reasonably) balanced. Off the top of my head there are AVL trees and red-black trees, but those definitely aren't the only ones. For implementing a hash-table bucket this is probably not necessary, though. There typically shouldn't be very much elements in a bucket.Frits van Bommel wrote<snip>Balancing the trees on the fly would be inefficient. But rehash ought to balance them while it's at it.(do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)They are unbalanced. Look at phobos/internal/aaA.d:278 and following lines.
Feb 10 2007
Frits van Bommel wrote:Stewart Gordon wrote:Manfred Nowak Wrote:Frits van Bommel wrote<snip>Balancing the trees on the fly would be inefficient. But rehash ought to balance them while it's at it.(do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)They are unbalanced. Look at phobos/internal/aaA.d:278 and following lines.There are several well-known methods of efficiently keeping binary trees (reasonably) balanced. Off the top of my head there are AVL trees and red-black trees, but those definitely aren't the only ones. For implementing a hash-table bucket this is probably not necessary, though. There typically shouldn't be very much elements in a bucket.So much for O(N lg N) performance in the case of a really bad hash function, though. You'll get O(N) if you happen to insert your values such that their hashes are strictly increasing or strictly decreasing. In theory anyway. ;-) I wouldn't be surprised though if in practice the simplicity of leaving out balancing outweighs the theoretical advantage of balanced trees in 95% of the cases. Someone who has too much time on their hands should take a look at what the code for Python dicts and Lua tables do. Both of those are reputed to be heavily tweaked for super-duper performance. --bb
Feb 10 2007
"Bill Baxter" <dnewsgroup billbaxter.com> wrote in message news:eqkmg1$hnb$1 digitaldaemon.com...Someone who has too much time on their hands should take a look at what the code for Python dicts and Lua tables do. Both of those are reputed to be heavily tweaked for super-duper performance.Lua tables use (according to the source) "a mix of chained scatter table with Brent's variation." The insert method comment reads: /* ** inserts a new key into a hash table; first, check whether key's main ** position is free. If not, check whether colliding node is in its main ** position or not: if it is not, move colliding node to an empty place and ** put new key in its main position; otherwise (colliding node is in its main ** position), new key goes to an empty position. */ It's basically using a variation on separate chaining. Additionally, Lua 5 tables will attempt to keep integer keys in a true array rather than in the hash, which improves performance when using tables as arrays, but that's not really necessary in a language that has true arrays like D ;)
Feb 10 2007
Stewart Gordon wrote:<snip> So if x != y, then both x < y and y < x? That wouldn't make sense at all.Actually, there are many cases where modular arithmetic to various bases is the desired behavior, and in such a system "both x < y and y < x" makes perfect sense. You can reach ANY number by repeatedly incrementing and also by repeatedly decrementing. OTOH, I'm not certain that these are worth building into a language. Ada thought so, but no other language appears to have followed their lead. (Possibly it was also done in Algol68 ... I never used it, but it contained all sorts of experimental things.)It should be easy to implement this as the default opCmp(Object).OUAT, Object.opCmp was set up to compare the memory addresses, but this behaviour was removed to prepare for copying/compacting GC, and probably partly to eliminate the confusing behaviour it created. Stewart.
Feb 13 2007
Charles D Hixson wrote:Stewart Gordon wrote:Mathematically speaking, a group like ℤ₃ is unordered, precisely for the reason given. If we need to impose an ordering for array purposes (I don’t understand that bit, just (mis)repeating something I°ve seen said on the subject), then give the ordering lexicographic properties. (x < y) && (y < x) should not ever return true. --JoelSo if x != y, then both x < y and y < x? That wouldn't make sense at all.Actually, there are many cases where modular arithmetic to various bases is the desired behavior, and in such a system "both x < y and y < x" makes perfect sense. You can reach ANY number by repeatedly incrementing and also by repeatedly decrementing.
Feb 14 2007
Joel C. Salomon wroteMathematically speaking, a group like ℤ₃ is unordered, precisely for the reason given. If we need to impose an ordering for array purposes (I don’t understand that bit, just (mis)repeating something I°ve seen said on the subject), then give the ordering lexicographic properties.It is faster to not impose an ordering on an unordered set. -manfred
Feb 14 2007
Do you know where D insists that “some” ordering be applied to a type, and why?Mathematically speaking, a group like ℤ₃ is unordered, precisely for the reason given. If we need to impose an ordering for array purposes (I don’t understand that bit, just (mis)repeating something I’ve seen said on the subject), then give the ordering lexicographic properties.It is faster to not impose an ordering on an unordered set.
Feb 14 2007
Joel Salomon wroteDo you know where D insists that “some” ordering be applied to a type, and why?I have given an example that an ordering _operator_ is required by D in another branch of this thread: class C{ hash_t toHash() { return 0; } } void main(){ bool[C] map; map[ new C]= true; map[ new C]= false; } And this branch of the thread discusses Stewarts remark, that an ordering operator, which doesn't implement an ordering is senseless in his opinion. -manfred
Feb 14 2007
Joel Salomon wrote:Associative arrays. http://www.digitalmars.com/d/arrays.html#associative They're implemented as a hash table with a binary tree per bucket. Binary trees require some ordering between the elements. This isn't the only implementation option of course, for example it could also be a linked list instead of a binary tree. Or it could not use a hash table at all and just use one big binary tree[1]. (But AFAICT the only reasonably efficient option without requiring ordering would be a hash table with a linked list per bucket) [1]: Presumably some self-balancing tree like red-black or AVL.Do you know where D insists that “some” ordering be applied to a type, and why?Mathematically speaking, a group like ℤ₃ is unordered, precisely for the reason given. If we need to impose an ordering for array purposes (I don’t understand that bit, just (mis)repeating something I’ve seen said on the subject), then give the ordering lexicographic properties.It is faster to not impose an ordering on an unordered set.
Feb 14 2007
Charles D Hixson wrote:Actually, there are many cases where modular arithmetic to various bases is the desired behavior, and in such a system "both x < y and y < x" makes perfect sense. You can reach ANY number by repeatedly incrementing and also by repeatedly decrementing.What's about this: a = 1+0j b = 0+1j how would you increment a to finnally get to b? how do you want to compare two complex numbers? this is mathematically impossible. Nicolai
Feb 14 2007
Nicolai Waniek wroteWhat's about this: a = 1+0j b = 0+1j how would you increment a to finnally get to b? how do you want to compare two complex numbers? this is mathematically impossible.It depends on how one defines "increment". Because Charles only mentions one operation, "increment" can be defined to be the normal complex multiplication. Then "increment( a, b)= b" holds in your case above. If one wants to define an ordering for all points on the unit-circle around the origin Charles is also righteous to define: (a != b) => (a < b) & (b < a) Because you can travel on the circumference of the circle in each direction until "a != b" turns out to be false. -manfred
Feb 14 2007