www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - What does it mean for opCmp and opEquals to be consistent?

reply "dnspies" <dspies ualberta.ca> writes:
To make a struct a valid key type, do I need to implement both
opCmp and opEquals or just one or the other?  It says on the page
about associative arrays: "The implementation may use either
opEquals or opCmp or both."  Does that mean it uses whichever one
is user-defined (or both if they're both user-defined)?  Or does
it mean the user is responsible for defining both?

Immediately after that it says:
"Care should be taken so that the results of
opEquals and opCmp are consistent with each other when the
struct/union objects are the same or not.", certainly this means
that if a.opEquals(b), then a.opCmp(b) should be 0, but does the
converse have to be true?
Apr 03 2014
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, April 03, 2014 07:10:06 dnspies wrote:
 To make a struct a valid key type, do I need to implement both
 opCmp and opEquals or just one or the other?  It says on the page
 about associative arrays: "The implementation may use either
 opEquals or opCmp or both."  Does that mean it uses whichever one
 is user-defined (or both if they're both user-defined)?  Or does
 it mean the user is responsible for defining both?
 
 Immediately after that it says:
 "Care should be taken so that the results of
 opEquals and opCmp are consistent with each other when the
 struct/union objects are the same or not.", certainly this means
 that if a.opEquals(b), then a.opCmp(b) should be 0, but does the
 converse have to be true?
_Any_ type which overloads both opEquals and opCmp and does not make them match exactly is just plain broken. If a.opEquals(b) is true, then a.opCmp(b) must be 0. If a.opEquals(b) is false, then a.opCmp(b) must be non-zero. If a.opCmp(b) is 0, then a.opEquals(b) must be true. If a.opCmp(b) is non-zero, then a.opEquals(b) must be false. Think of how integers work with ==, <=, <, >, >=, and !=. If a user-defined type does not follow the same logic for how those operators are related (e.g. == implies that <= and >= are true and that <, >, and != are false), then it's broken and will likely fail to work properly with generic code. Don't get cute and try and make them mean something different than they mean with integers. In general, D's operator overloading was designed with the idea that you wouldn't try to overload the operators to mean anything different from what they mean with the built-in types, and you're just going to run into trouble if you try (unlike C++, which is far less restrictive about it and lets you do pretty much whatever you want with overloaded operators). There _are_ types where it makes sense to define opEquals but not opCmp, because they're comparable but not sortable. So, just because you define opEquals does not mean that you need to define opCmp. Also, if the default- generated opEquals does what you need, then you'd only need to define opCmp and not opEquals - but you'd still need to make it so that the opCmp implementation matched the default-generated opEquals, or you'd be in trouble. Now, as to AAs specifically, if you try and do struct S { int i; } void main() { int[S] aa; } you get an error message complaining about opCmp not being defined, so apparently the compiler expects the AA implementation to use opCmp (though I have no idea if it currently does), and you're going to need to define it. - Jonathan M Davis
Apr 03 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis wrote:
 _Any_ type which overloads both opEquals and opCmp and does not 
 make them
 match exactly is just plain broken.
I disagree:
 If a.opEquals(b) is true, then a.opCmp(b) must be 0.
 If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.
Yes.
 If a.opEquals(b) is false, then a.opCmp(b) must be non-zero.
 If a.opCmp(b) is 0, then a.opEquals(b) must be true.
I disagree. You could have types with full comparison, but only *partial ordering*, depending on the "norm" you used to define their weight. A trivial example would be a 2D coordinate object, where opEquals is what you'd expect, and opCmp is some sort of norm (say, Euclidian). In this context, (1, 0) and (0, 1) would not be equal, but they'd have the same "weight" in terms of ordering. EG: They are not "equal", but they are "equivalent" (to quote C++ terminology) Or for example, a "Person" object. You'd have strict equality. But for ordering, you'd use height. So (John, 182cm) and (David, 182cm) would not be equal, but they'd be equivalent. -------- A correctly implemented AA would use opCmp to store objects in each bucket in cases of hash collisions, but still use opEqual in case of equivalence.
Apr 03 2014
next sibling parent "w0rp" <devw0rp gmail.com> writes:
On Thursday, 3 April 2014 at 10:42:33 UTC, monarch_dodra wrote:
 A correctly implemented AA would use opCmp to store objects in 
 each bucket in cases of hash collisions, but still use opEqual 
 in case of equivalence.
I would add to that, "try to use opCmp if it is available." It should be possible to hash something if it has toHash and opEquals, and not require opCmp for hashing.
Apr 03 2014
prev sibling next sibling parent "dnspies" <dspies ualberta.ca> writes:
On Thursday, 3 April 2014 at 10:42:33 UTC, monarch_dodra wrote:
 On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis 
 wrote:
 _Any_ type which overloads both opEquals and opCmp and does 
 not make them
 match exactly is just plain broken.
I disagree:
 If a.opEquals(b) is true, then a.opCmp(b) must be 0.
 If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.
Yes.
 If a.opEquals(b) is false, then a.opCmp(b) must be non-zero.
 If a.opCmp(b) is 0, then a.opEquals(b) must be true.
I disagree. You could have types with full comparison, but only *partial ordering*, depending on the "norm" you used to define their weight. A trivial example would be a 2D coordinate object, where opEquals is what you'd expect, and opCmp is some sort of norm (say, Euclidian). In this context, (1, 0) and (0, 1) would not be equal, but they'd have the same "weight" in terms of ordering. EG: They are not "equal", but they are "equivalent" (to quote C++ terminology) Or for example, a "Person" object. You'd have strict equality. But for ordering, you'd use height. So (John, 182cm) and (David, 182cm) would not be equal, but they'd be equivalent. -------- A correctly implemented AA would use opCmp to store objects in each bucket in cases of hash collisions, but still use opEqual in case of equivalence.
I would agree. Formally, it seems natural that a comparison operator could expect the equivalence classes of some equivalence relation to be totally ordered, while the elements within each class are unordered (this is more strict than just having a partial ordering, but less strict than requiring the elements themselves to be totally ordered). But the documentation together with the feedback I've gotten here seems to imply that the elements do have to be totally ordered if I want to use them as AA keys (which is fine, it just means that after sorting by height, you should use names as a tiebreaker so (David, 182cm) < (John, 182cm) ).
Apr 03 2014
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Apr 2014 06:42:32 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis wrote:
 _Any_ type which overloads both opEquals and opCmp and does not make  
 them
 match exactly is just plain broken.
I disagree:
 If a.opEquals(b) is true, then a.opCmp(b) must be 0.
 If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.
Yes.
 If a.opEquals(b) is false, then a.opCmp(b) must be non-zero.
 If a.opCmp(b) is 0, then a.opEquals(b) must be true.
I disagree. You could have types with full comparison, but only *partial ordering*, depending on the "norm" you used to define their weight.
Then you shouldn't use opCmp.
 A correctly implemented AA would use opCmp to store objects in each  
 bucket in cases of hash collisions, but still use opEqual in case of  
 equivalence.
This can lead to false positives if opCmp(x, y) == 0 is assumed to mean equal. An example: if you used RBTree to store 2d points, and defined opCmp like you did, then you wouldn't be able to store both (1, 0) and (0, 1) in the same RBTree without duplicates. -Steve
Apr 03 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 3 April 2014 at 16:47:05 UTC, Steven Schveighoffer 
wrote:
 On Thu, 03 Apr 2014 06:42:32 -0400, monarch_dodra 
 <monarchdodra gmail.com> wrote:

 On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis 
 wrote:
 _Any_ type which overloads both opEquals and opCmp and does 
 not make them
 match exactly is just plain broken.
I disagree:
 If a.opEquals(b) is true, then a.opCmp(b) must be 0.
 If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.
Yes.
 If a.opEquals(b) is false, then a.opCmp(b) must be non-zero.
 If a.opCmp(b) is 0, then a.opEquals(b) must be true.
I disagree. You could have types with full comparison, but only *partial ordering*, depending on the "norm" you used to define their weight.
Then you shouldn't use opCmp.
 A correctly implemented AA would use opCmp to store objects in 
 each bucket in cases of hash collisions, but still use opEqual 
 in case of equivalence.
This can lead to false positives if opCmp(x, y) == 0 is assumed to mean equal. An example: if you used RBTree to store 2d points, and defined opCmp like you did, then you wouldn't be able to store both (1, 0) and (0, 1) in the same RBTree without duplicates. -Steve
RBTree make no claim to test for equality, but only equivalence, and I'd find it perfectly normal for RBTree to not store both (0, 1) / (1, 0). An AA, on the other hand, really shouldn't care about equivalence, only equality. That's how it works in C++ set/unordered_set anyways.
Apr 03 2014
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Apr 2014 14:45:40 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Thursday, 3 April 2014 at 16:47:05 UTC, Steven Schveighoffer wrote:
 This can lead to false positives if opCmp(x, y) == 0 is assumed to mean  
 equal.

 An example: if you used RBTree to store 2d points, and defined opCmp  
 like you did, then you wouldn't be able to store both (1, 0) and (0, 1)  
 in the same RBTree without duplicates.
RBTree make no claim to test for equality, but only equivalence, and I'd find it perfectly normal for RBTree to not store both (0, 1) / (1, 0).
I think a user of your 2d point may object to both (0,1) and (1,0) not being stored.
 An AA, on the other hand, really shouldn't care about equivalence, only  
 equality.
This means, an AA which uses opCmp for collisions may get to a point where opCmp returns 0, then has to additionally call opEquals defensively. I don't think this is a good situation. Note, I don't agree with using opCmp for collisions at all, but to say opCmp should be inconsistent with opEquals is not valid. In your example, I would say a *property* of the point can have partial ordering, and in that case, opCmp is not the right fit. With D, it's all to easy to simply use a property for ordering, if that is what you wish. But if I wanted to define opCmp on a 2d point, I would first compare x, then y. -Steve
Apr 03 2014