digitalmars.D.learn - What does it mean for opCmp and opEquals to be consistent?

dnspies (12/12) Apr 03 2014 To make a struct a valid key type, do I need to implement both

Jonathan M Davis (36/49) Apr 03 2014 _Any_ type which overloads both opEquals and opCmp and does not make the...

monarch_dodra (22/29) Apr 03 2014 Yes.

w0rp (4/7) Apr 03 2014 I would add to that, "try to use opCmp if it is available." It
dnspies (12/42) Apr 03 2014 I would agree. Formally, it seems natural that a comparison
Steven Schveighoffer (9/25) Apr 03 2014 Then you shouldn't use opCmp.

monarch_dodra (8/41) Apr 03 2014 RBTree make no claim to test for equality, but only equivalence,

Steven Schveighoffer (15/27) Apr 03 2014 I think a user of your 2d point may object to both (0,1) and (1,0) not

"dnspies" <dspies ualberta.ca> writes:

To make a struct a valid key type, do I need to implement both
opCmp and opEquals or just one or the other?  It says on the page
about associative arrays: "The implementation may use either
opEquals or opCmp or both."  Does that mean it uses whichever one
is user-defined (or both if they're both user-defined)?  Or does
it mean the user is responsible for defining both?

Immediately after that it says:
"Care should be taken so that the results of
opEquals and opCmp are consistent with each other when the
struct/union objects are the same or not.", certainly this means
that if a.opEquals(b), then a.opCmp(b) should be 0, but does the
converse have to be true?

Apr 03 2014

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, April 03, 2014 07:10:06 dnspies wrote:
 To make a struct a valid key type, do I need to implement both
 opCmp and opEquals or just one or the other?  It says on the page
 about associative arrays: "The implementation may use either
 opEquals or opCmp or both."  Does that mean it uses whichever one
 is user-defined (or both if they're both user-defined)?  Or does
 it mean the user is responsible for defining both?
 
 Immediately after that it says:
 "Care should be taken so that the results of
 opEquals and opCmp are consistent with each other when the
 struct/union objects are the same or not.", certainly this means
 that if a.opEquals(b), then a.opCmp(b) should be 0, but does the
 converse have to be true?

_Any_ type which overloads both opEquals and opCmp and does not make them 
match exactly is just plain broken.

If a.opEquals(b) is true, then a.opCmp(b) must be 0.

If a.opEquals(b) is false, then a.opCmp(b) must be non-zero.

If a.opCmp(b) is 0, then a.opEquals(b) must be true.

If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.

Think of how integers work with ==, <=, <, >, >=, and !=. If a user-defined 
type does not follow the same logic for how those operators are related (e.g. 
== implies that <= and >= are true and that <, >, and != are false), then it's 
broken and will likely fail to work properly with generic code. Don't get cute 
and try and make them mean something different than they mean with integers. 
In general, D's operator overloading was designed with the idea that you 
wouldn't try to overload the operators to mean anything different from what 
they mean with the built-in types, and you're just going to run into trouble 
if you try (unlike C++, which is far less restrictive about it and lets you do 
pretty much whatever you want with overloaded operators).

There _are_ types where it makes sense to define opEquals but not opCmp, 
because they're comparable but not sortable. So, just because you define 
opEquals does not mean that you need to define opCmp. Also, if the default-
generated opEquals does what you need, then you'd only need to define opCmp 
and not opEquals - but you'd still need to make it so that the opCmp 
implementation matched the default-generated opEquals, or you'd be in trouble.

Now, as to AAs specifically, if you try and do

struct S
{
    int i;
}

void main()
{
    int[S] aa;
}

you get an error message complaining about opCmp not being defined, so 
apparently the compiler expects the AA implementation to use opCmp (though I 
have no idea if it currently does), and you're going to need to define it.

- Jonathan M Davis

Apr 03 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis wrote:
 _Any_ type which overloads both opEquals and opCmp and does not 
 make them
 match exactly is just plain broken.

I disagree:

 If a.opEquals(b) is true, then a.opCmp(b) must be 0.
 If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.

Yes.

 If a.opEquals(b) is false, then a.opCmp(b) must be non-zero.
 If a.opCmp(b) is 0, then a.opEquals(b) must be true.

I disagree.

You could have types with full comparison, but only *partial 
ordering*, depending on the "norm" you used to define their 
weight.

A trivial example would be a 2D coordinate object, where opEquals 
is what you'd expect, and opCmp is some sort of norm (say, 
Euclidian).

In this context, (1, 0) and (0, 1) would not be equal, but they'd 
have the same "weight" in terms of ordering.

EG: They are not "equal", but they are "equivalent" (to quote C++ 
terminology)

Or for example, a "Person" object. You'd have strict equality. 
But for ordering, you'd use height.

So (John, 182cm) and (David, 182cm) would not be equal, but 
they'd be equivalent.

--------

A correctly implemented AA would use opCmp to store objects in 
each bucket in cases of hash collisions, but still use opEqual in 
case of equivalence.

Apr 03 2014

"w0rp" <devw0rp gmail.com> writes:

On Thursday, 3 April 2014 at 10:42:33 UTC, monarch_dodra wrote:
 A correctly implemented AA would use opCmp to store objects in 
 each bucket in cases of hash collisions, but still use opEqual 
 in case of equivalence.

I would add to that, "try to use opCmp if it is available." It 
should be possible to hash something if it has toHash and 
opEquals, and not require opCmp for hashing.

Apr 03 2014

"dnspies" <dspies ualberta.ca> writes:

On Thursday, 3 April 2014 at 10:42:33 UTC, monarch_dodra wrote:
 On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis 
 wrote:
 _Any_ type which overloads both opEquals and opCmp and does 
 not make them
 match exactly is just plain broken.

 I disagree:

 If a.opEquals(b) is true, then a.opCmp(b) must be 0.
 If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.

 Yes.

 If a.opEquals(b) is false, then a.opCmp(b) must be non-zero.
 If a.opCmp(b) is 0, then a.opEquals(b) must be true.

 I disagree.

 You could have types with full comparison, but only *partial 
 ordering*, depending on the "norm" you used to define their 
 weight.

 A trivial example would be a 2D coordinate object, where 
 opEquals is what you'd expect, and opCmp is some sort of norm 
 (say, Euclidian).

 In this context, (1, 0) and (0, 1) would not be equal, but 
 they'd have the same "weight" in terms of ordering.

 EG: They are not "equal", but they are "equivalent" (to quote 
 C++ terminology)

 Or for example, a "Person" object. You'd have strict equality. 
 But for ordering, you'd use height.

 So (John, 182cm) and (David, 182cm) would not be equal, but 
 they'd be equivalent.

 --------

 A correctly implemented AA would use opCmp to store objects in 
 each bucket in cases of hash collisions, but still use opEqual 
 in case of equivalence.


I would agree.  Formally, it seems natural that a comparison 
operator could expect the equivalence classes of some equivalence 
relation to be totally ordered, while the elements within each 
class are unordered (this is more strict than just having a 
partial ordering, but less strict than requiring the elements 
themselves to be totally ordered).

But the documentation together with the feedback I've gotten here 
seems to imply that the elements do have to be totally ordered if 
I want to use them as AA keys (which is fine, it just means that 
after sorting by height, you should use names as a tiebreaker so 
(David, 182cm) < (John, 182cm) ).

Apr 03 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 03 Apr 2014 06:42:32 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis wrote:
 _Any_ type which overloads both opEquals and opCmp and does not make  
 them
 match exactly is just plain broken.

 I disagree:

 If a.opEquals(b) is true, then a.opCmp(b) must be 0.
 If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.

 Yes.

 If a.opEquals(b) is false, then a.opCmp(b) must be non-zero.
 If a.opCmp(b) is 0, then a.opEquals(b) must be true.

 I disagree.

 You could have types with full comparison, but only *partial ordering*,  
 depending on the "norm" you used to define their weight.

Then you shouldn't use opCmp.

 A correctly implemented AA would use opCmp to store objects in each  
 bucket in cases of hash collisions, but still use opEqual in case of  
 equivalence.

This can lead to false positives if opCmp(x, y) == 0 is assumed to mean  
equal.

An example: if you used RBTree to store 2d points, and defined opCmp like  
you did, then you wouldn't be able to store both (1, 0) and (0, 1) in the  
same RBTree without duplicates.

-Steve

Apr 03 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 3 April 2014 at 16:47:05 UTC, Steven Schveighoffer 
wrote:
 On Thu, 03 Apr 2014 06:42:32 -0400, monarch_dodra 
 <monarchdodra gmail.com> wrote:

 On Thursday, 3 April 2014 at 10:15:46 UTC, Jonathan M Davis 
 wrote:
 _Any_ type which overloads both opEquals and opCmp and does 
 not make them
 match exactly is just plain broken.

 I disagree:

 If a.opEquals(b) is true, then a.opCmp(b) must be 0.
 If a.opCmp(b) is non-zero, then a.opEquals(b) must be false.

 Yes.

 If a.opEquals(b) is false, then a.opCmp(b) must be non-zero.
 If a.opCmp(b) is 0, then a.opEquals(b) must be true.

 I disagree.

 You could have types with full comparison, but only *partial 
 ordering*, depending on the "norm" you used to define their 
 weight.

 Then you shouldn't use opCmp.

 A correctly implemented AA would use opCmp to store objects in 
 each bucket in cases of hash collisions, but still use opEqual 
 in case of equivalence.

 This can lead to false positives if opCmp(x, y) == 0 is assumed 
 to mean equal.

 An example: if you used RBTree to store 2d points, and defined 
 opCmp like you did, then you wouldn't be able to store both (1, 
 0) and (0, 1) in the same RBTree without duplicates.

 -Steve

RBTree make no claim to test for equality, but only equivalence, 
and I'd find it perfectly normal for RBTree to not store both (0, 
1) / (1, 0).

An AA, on the other hand, really shouldn't care about 
equivalence, only equality.

That's how it works in C++ set/unordered_set anyways.

Apr 03 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 03 Apr 2014 14:45:40 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Thursday, 3 April 2014 at 16:47:05 UTC, Steven Schveighoffer wrote:
 This can lead to false positives if opCmp(x, y) == 0 is assumed to mean  
 equal.

 An example: if you used RBTree to store 2d points, and defined opCmp  
 like you did, then you wouldn't be able to store both (1, 0) and (0, 1)  
 in the same RBTree without duplicates.

 RBTree make no claim to test for equality, but only equivalence, and I'd  
 find it perfectly normal for RBTree to not store both (0, 1) / (1, 0).

I think a user of your 2d point may object to both (0,1) and (1,0) not  
being stored.

 An AA, on the other hand, really shouldn't care about equivalence, only  
 equality.

This means, an AA which uses opCmp for collisions may get to a point where  
opCmp returns 0, then has to additionally call opEquals defensively. I  
don't think this is a good situation.

Note, I don't agree with using opCmp for collisions at all, but to say  
opCmp should be inconsistent with opEquals is not valid.

In your example, I would say a *property* of the point can have partial  
ordering, and in that case, opCmp is not the right fit. With D, it's all  
to easy to simply use a property for ordering, if that is what you wish.

But if I wanted to define opCmp on a 2d point, I would first compare x,  
then y.

-Steve

Apr 03 2014

D Programming

C/C++ Programming

Other

digitalmars.D.learn - What does it mean for opCmp and opEquals to be consistent?