digitalmars.D.learn - What does it mean for opCmp and opEquals to be consistent?
- dnspies (12/12) Apr 03 2014 To make a struct a valid key type, do I need to implement both
- Jonathan M Davis (36/49) Apr 03 2014 _Any_ type which overloads both opEquals and opCmp and does not make the...
- monarch_dodra (22/29) Apr 03 2014 Yes.
- w0rp (4/7) Apr 03 2014 I would add to that, "try to use opCmp if it is available." It
- dnspies (12/42) Apr 03 2014 I would agree. Formally, it seems natural that a comparison
- Steven Schveighoffer (9/25) Apr 03 2014 Then you shouldn't use opCmp.
- monarch_dodra (8/41) Apr 03 2014 RBTree make no claim to test for equality, but only equivalence,
- Steven Schveighoffer (15/27) Apr 03 2014 I think a user of your 2d point may object to both (0,1) and (1,0) not
To make a struct a valid key type, do I need to implement both opCmp and opEquals or just one or the other? It says on the page about associative arrays: "The implementation may use either opEquals or opCmp or both." Does that mean it uses whichever one is user-defined (or both if they're both user-defined)? Or does it mean the user is responsible for defining both? Immediately after that it says: "Care should be taken so that the results of opEquals and opCmp are consistent with each other when the struct/union objects are the same or not.", certainly this means that if a.opEquals(b), then a.opCmp(b) should be 0, but does the converse have to be true?
Apr 03 2014
On Thursday, April 03, 2014 07:10:06 dnspies wrote:To make a struct a valid key type, do I need to implement both opCmp and opEquals or just one or the other? It says on the page about associative arrays: "The implementation may use either opEquals or opCmp or both." Does that mean it uses whichever one is user-defined (or both if they're both user-defined)? Or does it mean the user is responsible for defining both? Immediately after that it says: "Care should be taken so that the results of opEquals and opCmp are consistent with each other when the struct/union objects are the same or not.", certainly this means that if a.opEquals(b), then a.opCmp(b) should be 0, but does the converse have to be true?_Any_ type which overloads both opEquals and opCmp and does not make them match exactly is just plain broken. If a.opEquals(b) is true, then a.opCmp(b) must be 0. If a.opEquals(b) is false, then a.opCmp(b) must be non-zero. If a.opCmp(b) is 0, then a.opEquals(b) must be true. If a.opCmp(b) is non-zero, then a.opEquals(b) must be false. Think of how integers work with ==, <=, <, >, >=, and !=. If a user-defined type does not follow the same logic for how those operators are related (e.g. == implies that <= and >= are true and that <, >, and != are false), then it's broken and will likely fail to work properly with generic code. Don't get cute and try and make them mean something different than they mean with integers. In general, D's operator overloading was designed with the idea that you wouldn't try to overload the operators to mean anything different from what they mean with the built-in types, and you're just going to run into trouble if you try (unlike C++, which is far less restrictive about it and lets you do pretty much whatever you want with overloaded operators). There _are_ types where it makes sense to define opEquals but not opCmp, because they're comparable but not sortable. So, just because you define opEquals does not mean that you need to define opCmp. Also, if the default- generated opEquals does what you need, then you'd only need to define opCmp and not opEquals - but you'd still need to make it so that the opCmp implementation matched the default-generated opEquals, or you'd be in trouble. Now, as to AAs specifically, if you try and do struct S { int i; } void main() { int[S] aa; } you get an error message complaining about opCmp not being defined, so apparently the compiler expects the AA implementation to use opCmp (though I have no idea if it currently does), and you're going to need to define it. - Jonathan M Davis
Apr 03 2014
On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis wrote:_Any_ type which overloads both opEquals and opCmp and does not make them match exactly is just plain broken.I disagree:If a.opEquals(b) is true, then a.opCmp(b) must be 0. If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.Yes.If a.opEquals(b) is false, then a.opCmp(b) must be non-zero. If a.opCmp(b) is 0, then a.opEquals(b) must be true.I disagree. You could have types with full comparison, but only *partial ordering*, depending on the "norm" you used to define their weight. A trivial example would be a 2D coordinate object, where opEquals is what you'd expect, and opCmp is some sort of norm (say, Euclidian). In this context, (1, 0) and (0, 1) would not be equal, but they'd have the same "weight" in terms of ordering. EG: They are not "equal", but they are "equivalent" (to quote C++ terminology) Or for example, a "Person" object. You'd have strict equality. But for ordering, you'd use height. So (John, 182cm) and (David, 182cm) would not be equal, but they'd be equivalent. -------- A correctly implemented AA would use opCmp to store objects in each bucket in cases of hash collisions, but still use opEqual in case of equivalence.
Apr 03 2014
On Thursday, 3 April 2014 at 10:42:33 UTC, monarch_dodra wrote:A correctly implemented AA would use opCmp to store objects in each bucket in cases of hash collisions, but still use opEqual in case of equivalence.I would add to that, "try to use opCmp if it is available." It should be possible to hash something if it has toHash and opEquals, and not require opCmp for hashing.
Apr 03 2014
On Thursday, 3 April 2014 at 10:42:33 UTC, monarch_dodra wrote:On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis wrote:I would agree. Formally, it seems natural that a comparison operator could expect the equivalence classes of some equivalence relation to be totally ordered, while the elements within each class are unordered (this is more strict than just having a partial ordering, but less strict than requiring the elements themselves to be totally ordered). But the documentation together with the feedback I've gotten here seems to imply that the elements do have to be totally ordered if I want to use them as AA keys (which is fine, it just means that after sorting by height, you should use names as a tiebreaker so (David, 182cm) < (John, 182cm) )._Any_ type which overloads both opEquals and opCmp and does not make them match exactly is just plain broken.I disagree:If a.opEquals(b) is true, then a.opCmp(b) must be 0. If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.Yes.If a.opEquals(b) is false, then a.opCmp(b) must be non-zero. If a.opCmp(b) is 0, then a.opEquals(b) must be true.I disagree. You could have types with full comparison, but only *partial ordering*, depending on the "norm" you used to define their weight. A trivial example would be a 2D coordinate object, where opEquals is what you'd expect, and opCmp is some sort of norm (say, Euclidian). In this context, (1, 0) and (0, 1) would not be equal, but they'd have the same "weight" in terms of ordering. EG: They are not "equal", but they are "equivalent" (to quote C++ terminology) Or for example, a "Person" object. You'd have strict equality. But for ordering, you'd use height. So (John, 182cm) and (David, 182cm) would not be equal, but they'd be equivalent. -------- A correctly implemented AA would use opCmp to store objects in each bucket in cases of hash collisions, but still use opEqual in case of equivalence.
Apr 03 2014
On Thu, 03 Apr 2014 06:42:32 -0400, monarch_dodra <monarchdodra gmail.com> wrote:On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis wrote:Then you shouldn't use opCmp._Any_ type which overloads both opEquals and opCmp and does not make them match exactly is just plain broken.I disagree:If a.opEquals(b) is true, then a.opCmp(b) must be 0. If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.Yes.If a.opEquals(b) is false, then a.opCmp(b) must be non-zero. If a.opCmp(b) is 0, then a.opEquals(b) must be true.I disagree. You could have types with full comparison, but only *partial ordering*, depending on the "norm" you used to define their weight.A correctly implemented AA would use opCmp to store objects in each bucket in cases of hash collisions, but still use opEqual in case of equivalence.This can lead to false positives if opCmp(x, y) == 0 is assumed to mean equal. An example: if you used RBTree to store 2d points, and defined opCmp like you did, then you wouldn't be able to store both (1, 0) and (0, 1) in the same RBTree without duplicates. -Steve
Apr 03 2014
On Thursday, 3 April 2014 at 16:47:05 UTC, Steven Schveighoffer wrote:On Thu, 03 Apr 2014 06:42:32 -0400, monarch_dodra <monarchdodra gmail.com> wrote:RBTree make no claim to test for equality, but only equivalence, and I'd find it perfectly normal for RBTree to not store both (0, 1) / (1, 0). An AA, on the other hand, really shouldn't care about equivalence, only equality. That's how it works in C++ set/unordered_set anyways.On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis wrote:Then you shouldn't use opCmp._Any_ type which overloads both opEquals and opCmp and does not make them match exactly is just plain broken.I disagree:If a.opEquals(b) is true, then a.opCmp(b) must be 0. If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.Yes.If a.opEquals(b) is false, then a.opCmp(b) must be non-zero. If a.opCmp(b) is 0, then a.opEquals(b) must be true.I disagree. You could have types with full comparison, but only *partial ordering*, depending on the "norm" you used to define their weight.A correctly implemented AA would use opCmp to store objects in each bucket in cases of hash collisions, but still use opEqual in case of equivalence.This can lead to false positives if opCmp(x, y) == 0 is assumed to mean equal. An example: if you used RBTree to store 2d points, and defined opCmp like you did, then you wouldn't be able to store both (1, 0) and (0, 1) in the same RBTree without duplicates. -Steve
Apr 03 2014
On Thu, 03 Apr 2014 14:45:40 -0400, monarch_dodra <monarchdodra gmail.com> wrote:On Thursday, 3 April 2014 at 16:47:05 UTC, Steven Schveighoffer wrote:I think a user of your 2d point may object to both (0,1) and (1,0) not being stored.This can lead to false positives if opCmp(x, y) == 0 is assumed to mean equal. An example: if you used RBTree to store 2d points, and defined opCmp like you did, then you wouldn't be able to store both (1, 0) and (0, 1) in the same RBTree without duplicates.RBTree make no claim to test for equality, but only equivalence, and I'd find it perfectly normal for RBTree to not store both (0, 1) / (1, 0).An AA, on the other hand, really shouldn't care about equivalence, only equality.This means, an AA which uses opCmp for collisions may get to a point where opCmp returns 0, then has to additionally call opEquals defensively. I don't think this is a good situation. Note, I don't agree with using opCmp for collisions at all, but to say opCmp should be inconsistent with opEquals is not valid. In your example, I would say a *property* of the point can have partial ordering, and in that case, opCmp is not the right fit. With D, it's all to easy to simply use a property for ordering, if that is what you wish. But if I wanted to define opCmp on a 2d point, I would first compare x, then y. -Steve
Apr 03 2014