www.digitalmars.com         C & C++   DMDScript  

D - [Bug?] Associative array with class key doesn't work

reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Using DMD 0.79, Windows 98SE.

In the program below, c1 and c2 are equal in just about every respect. 
So is c3 by the equality definition in the class.  Yet it doesn't 
consider them to be the same key in aa.  Hence the output is

4
No such element
No such element

Surely it should be looking up the key by actually comparing objects, 
not just memory addresses?

The same happens if I comment out opEquals and/or toHash, or throw a 
rehash statement in before the lookups.

Is this a bug, a gotcha, or something else?

Stewart.

----------
import std.c.stdio;
import std.ctype;

class IChar {
	char data;

	this(char d) { data = d; }

	bit opEquals(IChar i) {
		return toupper(data) == toupper(i.data);
	}

	uint toHash() {
		return toupper(data);
	}

	unittest {
		IChar c = new IChar('c');
		IChar d = new IChar('c');
		IChar C = new IChar('C');
		IChar e = new IChar('e');
		assert (c == d);
		assert (c == C);
		assert (c != e);
	}
}

int main() {
	int[IChar] aa;

	IChar c1 = new IChar('c');
	IChar c2 = new IChar('c');
	IChar c3 = new IChar('C');

	aa[c1] = 4;

	if (c1 in aa) {
		printf("%d\n", aa[c1]);
	} else {
		puts("No such element");
	}

	if (c2 in aa) {
		printf("%d\n", aa[c2]);
	} else {
		puts("No such element");
	}

	if (c3 in aa) {
		printf("%d\n", aa[c3]);
	} else {
		puts("No such element");
	}

	return 0;
}

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.
Feb 11 2004
next sibling parent Carlos Santander B. <Carlos_member pathlink.com> writes:
In article <c0d66a$26hj$1 digitaldaemon.com>, Stewart Gordon says...
Using DMD 0.79, Windows 98SE.

In the program below, c1 and c2 are equal in just about every respect. 
So is c3 by the equality definition in the class.  Yet it doesn't 
consider them to be the same key in aa.  Hence the output is

4
No such element
No such element

Surely it should be looking up the key by actually comparing objects, 
not just memory addresses?

The same happens if I comment out opEquals and/or toHash, or throw a 
rehash statement in before the lookups.

Is this a bug, a gotcha, or something else?

Stewart.

----------
import std.c.stdio;
import std.ctype;

class IChar {
	char data;

	this(char d) { data = d; }

	bit opEquals(IChar i) {
		return toupper(data) == toupper(i.data);
	}

	uint toHash() {
		return toupper(data);
	}

	unittest {
		IChar c = new IChar('c');
		IChar d = new IChar('c');
		IChar C = new IChar('C');
		IChar e = new IChar('e');
		assert (c == d);
		assert (c == C);
		assert (c != e);
	}
}

int main() {
	int[IChar] aa;

	IChar c1 = new IChar('c');
	IChar c2 = new IChar('c');
	IChar c3 = new IChar('C');

	aa[c1] = 4;

	if (c1 in aa) {
		printf("%d\n", aa[c1]);
	} else {
		puts("No such element");
	}

	if (c2 in aa) {
		printf("%d\n", aa[c2]);
	} else {
		puts("No such element");
	}

	if (c3 in aa) {
		printf("%d\n", aa[c3]);
	} else {
		puts("No such element");
	}

	return 0;
}

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Don't take it as a final word, but maybe it has to do with c1, c2 and c3 being different references. ------------------- Carlos Santander B.
Feb 11 2004
prev sibling next sibling parent reply "Ivan Senji" <ivan.senji public.srce.hr> writes:
It looks to me too that when the key of the associative array is class type
only references are compared and not the objects using the operator==

I had the same problem when trying to implement a set container! First i
used associative array because it was very simple to use but had to give it
up
becouse of this problem!

"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:c0d66a$26hj$1 digitaldaemon.com...
 Using DMD 0.79, Windows 98SE.

 In the program below, c1 and c2 are equal in just about every respect.
 So is c3 by the equality definition in the class.  Yet it doesn't
 consider them to be the same key in aa.  Hence the output is

 4
 No such element
 No such element

 Surely it should be looking up the key by actually comparing objects,
 not just memory addresses?

 The same happens if I comment out opEquals and/or toHash, or throw a
 rehash statement in before the lookups.

 Is this a bug, a gotcha, or something else?

 Stewart.

 ----------
 import std.c.stdio;
 import std.ctype;

 class IChar {
 char data;

 this(char d) { data = d; }

 bit opEquals(IChar i) {
 return toupper(data) == toupper(i.data);
 }

 uint toHash() {
 return toupper(data);
 }

 unittest {
 IChar c = new IChar('c');
 IChar d = new IChar('c');
 IChar C = new IChar('C');
 IChar e = new IChar('e');
 assert (c == d);
 assert (c == C);
 assert (c != e);
 }
 }

 int main() {
 int[IChar] aa;

 IChar c1 = new IChar('c');
 IChar c2 = new IChar('c');
 IChar c3 = new IChar('C');

 aa[c1] = 4;

 if (c1 in aa) {
 printf("%d\n", aa[c1]);
 } else {
 puts("No such element");
 }

 if (c2 in aa) {
 printf("%d\n", aa[c2]);
 } else {
 puts("No such element");
 }

 if (c3 in aa) {
 printf("%d\n", aa[c3]);
 } else {
 puts("No such element");
 }

 return 0;
 }

 --
 My e-mail is valid but not my primary mailbox, aside from its being the
 unfortunate victim of intensive mail-bombing at the moment.  Please keep
 replies on the 'group where everyone may benefit.

Feb 12 2004
parent reply Sean Kelly <sean ffwd.cx> writes:
Ivan Senji wrote:

 It looks to me too that when the key of the associative array is class type
 only references are compared and not the objects using the operator==
 
 I had the same problem when trying to implement a set container! First i
 used associative array because it was very simple to use but had to give it
 up becouse of this problem!

Associative containers typically care about equivalence, not equality. ie. in C++ containers the default equivalence rule is less-than, so two objects are considered equivalent if neither is less than the other. Have you tried supplying the other comparison operators? Sean
Feb 12 2004
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
While it was 2/12/04 4:45 PM throughout the UK, Sean Kelly sprinkled 
little black dots on a white screen, and they fell thus:

<snip>
 Associative containers typically care about equivalence, not equality. 

But if D's allowing a class as a key type is supposed to be useful, surely it should care about equality?
 ie. in C++ containers the default equivalence rule is less-than, 

Define "C++ container".
 so two objects are considered equivalent if neither is less than the other. 

That would make sense if the associative array were implemented as a binary search tree. But it shouldn't be trying to do this anyway if it can't be sure first that a comparator is defined. And anyway, I thought the reason for D having both opCmp and opEquals is so that opEquals can be more efficient for that purpose. So why should it go out of its way to use opCmp when it should be using opEquals?
 Have you tried supplying the other comparison operators?

Not yet. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Feb 12 2004
parent reply Sean Kelly <sean ffwd.cx> writes:
Stewart Gordon wrote:
 ie. in C++ containers the default equivalence rule is less-than, 

Define "C++ container".

The associative containers I mentioned previously: map, set, etc.
 so two objects are considered equivalent if neither is less than the 
 other. 

That would make sense if the associative array were implemented as a binary search tree. But it shouldn't be trying to do this anyway if it can't be sure first that a comparator is defined.

But the same holds for equality comparison, unless references don't support any other comparison method?
 And anyway, I thought the reason for D having both opCmp and opEquals is 
 so that opEquals can be more efficient for that purpose.  So why should 
 it go out of its way to use opCmp when it should be using opEquals?

I was making assumptions. I figured the back-end would be implemented as some sort of tree structure for efficiency, which wouldn't be compatible with equality comparisons. Sean
Feb 12 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
While it was 2/12/04 8:52 PM throughout the UK, Sean Kelly sprinkled
little black dots on a white screen, and they fell thus:

<snip>
 That would make sense if the associative array were implemented as 
 a binary search tree.  But it shouldn't be trying to do this anyway 
 if it can't be sure first that a comparator is defined.

But the same holds for equality comparison, unless references don't support any other comparison method?

I'm not quite sure what you mean by that. <snip>
 I was making assumptions.  I figured the back-end would be 
 implemented as some sort of tree structure for efficiency, which 
 wouldn't be compatible with equality comparisons.

I'd expect the typical implementation to be a hash table, if it isn't going to require the key type to be ordered. Of course, you could have a hash tree, but I'm not sure if there'd be much point in that.... Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Feb 13 2004
parent reply Sean Kelly <sean ffwd.cx> writes:
Stewart Gordon wrote:
 While it was 2/12/04 8:52 PM throughout the UK, Sean Kelly sprinkled
 little black dots on a white screen, and they fell thus:

 But the same holds for equality comparison, unless references don't 
 support any other comparison method?

I'm not quite sure what you mean by that.

If a reference is roughly equivalent to a pointer then it's possible it could be compared using all the normal numeric methods. However I think some languages restrict comparisons to equality only as the pointers may reference separate memory spaces.
 I'd expect the typical implementation to be a hash table, if it isn't
 going to require the key type to be ordered.

True. And in that case it may indeed use equality. Sean
Feb 13 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
While it was 2/13/04 4:50 PM throughout the UK, Sean Kelly sprinkled 
little black dots on a white screen, and they fell thus:

<snip>
 If a reference is roughly equivalent to a pointer then it's possible it 
 could be compared using all the normal numeric methods.  However I think 
 some languages restrict comparisons to equality only as the pointers may 
 reference separate memory spaces.

I don't know much about this, but C and C++ seem to support inequality operations on pointers. Not sure what would constitute "separate memory spaces". But that doesn't affect the fact, that surely it should be comparing the objects, not the pointers?
 I'd expect the typical implementation to be a hash table, if it isn't
 going to require the key type to be ordered.

True. And in that case it may indeed use equality.

Indeed, should use equality. I can't see any reason that Walter would've copied the eccentric behaviour of STL containers just for the sake of it. And indeed, I tried defining opCmp but it made no difference. What would be the point of having an equality operator if you're not going to use it? Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Feb 13 2004
parent Sean Kelly <sean ffwd.cx> writes:
Stewart Gordon wrote:
 I don't know much about this, but C and C++ seem to support inequality 
 operations on pointers.  Not sure what would constitute "separate memory 
 spaces".

By that I meant a pointer to a memory-mapped file vs. a pointer to main memory. In such cases, a less-than comparison is meaningless.
 But that doesn't affect the fact, that surely it should be comparing the 
 objects, not the pointers?

Yes it should, but apparently that isn't how it's currently working.
 I'd expect the typical implementation to be a hash table, if it isn't
 going to require the key type to be ordered.

True. And in that case it may indeed use equality.


 Indeed, should use equality.  I can't see any reason that Walter 
 would've copied the eccentric behaviour of STL containers just for the 
 sake of it.

There are some hashlist implementations that use equivalence. The Dinkumware version is an adaptor on top of a doubly-linked list IIRC.
 And indeed, I tried defining opCmp but it made no 
 difference.  What would be the point of having an equality operator if 
 you're not going to use it?

Agreed.
Feb 13 2004
prev sibling parent reply "Ivan Senji" <ivan.senji public.srce.hr> writes:
"Sean Kelly" <sean ffwd.cx> wrote in message
news:c0gaj5$185s$1 digitaldaemon.com...
 Ivan Senji wrote:

 It looks to me too that when the key of the associative array is class


 only references are compared and not the objects using the operator==

 I had the same problem when trying to implement a set container! First i
 used associative array because it was very simple to use but had to give


 up becouse of this problem!

Associative containers typically care about equivalence, not equality. ie. in C++ containers the default equivalence rule is less-than, so two objects are considered equivalent if neither is less than the other

this makes sense.
 Have you tried supplying the other comparison operators?

I tried writing opCmp for a class that i put in the associative array but ist slill doesnt work the way i think it should and the compiler still compares references!??
 Sean

Feb 12 2004
parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Ivan Senji wrote:
 I tried writing opCmp for a class that i put in the associative array
 but ist slill doesnt work the way i think it should and the compiler
 still compares references!??

I think it's simply a bug since toHash should be enough. -eye
Feb 12 2004
next sibling parent Manfred Nowak <svv1999 hotmail.com> writes:
Ilya Minkov wrote:

 I think it's simply a bug since toHash should be enough.

Agreed. Hashing does not require a (partial) order on the elements. So long.
Feb 12 2004
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Ilya Minkov" <minkov cs.tum.edu> wrote in message
news:c0gq82$221f$1 digitaldaemon.com...
 I think it's simply a bug since toHash should be enough.

It isn't because hashes can collide.
Feb 13 2004
prev sibling next sibling parent Vathix <vathix dprogramming.com> writes:
Stewart Gordon wrote:

 Surely it should be looking up the key by actually comparing objects, 
 not just memory addresses?
 
 The same happens if I comment out opEquals and/or toHash, or throw a 
 rehash statement in before the lookups.
 

It might be that it's always using getHash from the TypeInfo. I got structs to properly work with associative arrays by making my own TypeInfo. So it looks like a bug. -- Christopher E. Miller www.dprogramming.com irc.dprogramming.com #D
Feb 12 2004
prev sibling next sibling parent reply "Walter" <walter digitalmars.com> writes:
The associative array code for class objects uses toHash() and opCmp(). See
std\typeinfo\ti_C.d. You'll need to overload opCmp() for your example.
Feb 13 2004
next sibling parent reply "Ivan Senji" <ivan.senji public.srce.hr> writes:
"Walter" <walter digitalmars.com> wrote in message
news:c0kf1m$1psq$3 digitaldaemon.com...
 The associative array code for class objects uses toHash() and opCmp().

 std\typeinfo\ti_C.d. You'll need to overload opCmp() for your example.

i tried: class ABC { this(int x,int y, int z) { a = x; b = y; c = z; } override uint toHash() { return a+b+c; } int opCmp ( ABC x) { return this.toHash() - x.toHash(); } int a,b,c; } ... an leter: int[ABC] AA; ABC a111 = new ABC(1,1,1); ABC b222 = new ABC(2,2,2); ABC c111 = new ABC(1,1,1); assert(a111<b222); assert(!(a111<c111)); AA[a111] = 111; AA[b222] = 222; assert(AA.length == 2); AA[c111] = 111; assert(AA.length == 2); // this asser fails because there are 3 elements in associative //array but there should be only 2! Am i doing something wrong?
Feb 14 2004
parent reply "Walter" <walter digitalmars.com> writes:
"Ivan Senji" <ivan.senji public.srce.hr> wrote in message
news:c0l1ij$2p85$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:c0kf1m$1psq$3 digitaldaemon.com...
 The associative array code for class objects uses toHash() and opCmp().

 std\typeinfo\ti_C.d. You'll need to overload opCmp() for your example.

i tried: class ABC { this(int x,int y, int z) { a = x; b = y; c = z; } override uint toHash() { return a+b+c; } int opCmp ( ABC x) { return this.toHash() - x.toHash(); } int a,b,c; } ... an leter: int[ABC] AA; ABC a111 = new ABC(1,1,1); ABC b222 = new ABC(2,2,2); ABC c111 = new ABC(1,1,1); assert(a111<b222); assert(!(a111<c111)); AA[a111] = 111; AA[b222] = 222; assert(AA.length == 2); AA[c111] = 111; assert(AA.length == 2); // this asser fails because there are 3 elements

 associative
 //array but there should be only 2!

 Am i doing something wrong?

There are 3 because the objects are distinct.
Feb 20 2004
parent Sean Kelly <sean ffwd.cx> writes:
Walter wrote:
 
 There are 3 because the objects are distinct.

So assuming two objects hash to the same value and compare as equivalent, then object addresses are compared? Or does this comparison occur earlier. I'm wondering if I will have to keep the original key object around to retrieve stored values or if I can somehow pass an equivalent one as a key. Sean
Feb 20 2004
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
While it was 2/14/04 6:25 AM throughout the UK, Walter sprinkled little 
black dots on a white screen, and they fell thus:
 The associative array code for class objects uses toHash() and opCmp().

Why, exactly? Do you implement it as a hash table of binary trees? Surely associative arrays shouldn't be restricted to classes that have an ordering? Further, is there any rule that if opCmp is defined, it has to be consistent with opEquals? I haven't found any clarification of this in the spec, and moreover, the Java API at least has a few counterexamples.
 See std\typeinfo\ti_C.d. You'll need to overload opCmp() for your example.

Tried already - please see my dialogue with Sean Kelly (D/23817 et seq). FWIR, I defined int opCmp(IChar c) { return toupper(d) - toupper(c.d); } Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Feb 16 2004
parent reply Sean Kelly <sean ffwd.cx> writes:
Stewart Gordon wrote:

 While it was 2/14/04 6:25 AM throughout the UK, Walter sprinkled little 
 black dots on a white screen, and they fell thus:
 
 The associative array code for class objects uses toHash() and opCmp().

Why, exactly? Do you implement it as a hash table of binary trees? Surely associative arrays shouldn't be restricted to classes that have an ordering?

The Dinkumware STL implements hash sets as a hash wrapper around an ordered deque. I suppose this is one possibility.
 Further, is there any rule that if opCmp is defined, it has to be 
 consistent with opEquals?  I haven't found any clarification of this in 
 the spec, and moreover, the Java API at least has a few counterexamples.

Nope, not required. In fact 95% of the time when I define equality and comparison in classes they give different results, as I generally want to order classes on a subset of what actually contitutes equality.
 See std\typeinfo\ti_C.d. You'll need to overload opCmp() for your 
 example.

Tried already - please see my dialogue with Sean Kelly (D/23817 et seq).

I'll have to try this out myself. Still haven't gotten around to it. Sean
Feb 17 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
While it was 2/17/04 4:37 PM throughout the UK, Sean Kelly sprinkled 
little black dots on a white screen, and they fell thus:

 Stewart Gordon wrote:

 Why, exactly?  Do you implement it as a hash table of binary trees? 
 Surely associative arrays shouldn't be restricted to classes that have 
 an ordering?

The Dinkumware STL implements hash sets as a hash wrapper around an ordered deque. I suppose this is one possibility.

Ah, I never imagined that the associative array implementation in DMD was a wrapper around a third-party library that isn't fully compatible with the objective. If that's the case, I suppose this fact is the problem that needs fixing. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Feb 17 2004
parent reply Sean Kelly <sean ffwd.cx> writes:
Stewart Gordon wrote:
 Ah, I never imagined that the associative array implementation in DMD 
 was a wrapper around a third-party library that isn't fully compatible 
 with the objective.  If that's the case, I suppose this fact is the 
 problem that needs fixing.

I don't think it is. I was merely pointing out that some hash set implementation may use compare operations rather than equality. Sean
Feb 17 2004
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
While it was 2/17/04 8:30 PM throughout the UK, Sean Kelly sprinkled 
little black dots on a white screen, and they fell thus:
 Stewart Gordon wrote:
 
 
 Ah, I never imagined that the associative array implementation in DMD 
 was a wrapper around a third-party library that isn't fully compatible 
 with the objective.  If that's the case, I suppose this fact is the 
 problem that needs fixing.

I don't think it is. I was merely pointing out that some hash set implementation may use compare operations rather than equality.

Exactly - *some* hash set implementation. Repeatedly citing third-party examples says nothing about the DMD implementation, and even less about its rationale. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Feb 18 2004
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Walter wrote:

 The associative array code for class objects uses toHash() and opCmp(). See
 std\typeinfo\ti_C.d. You'll need to overload opCmp() for your example.

Simply instrumenting the calls to `toHash', etc. shows that `opCmp' is not called, maybe because a null-pointer is compared: toHash c starting the requests toHash c toHash c 4 toHash c No such element toHash C No such element It is not clear, why `toHash' is called twice for the evaluation, that an element indeed is in the associative array. So long.
Feb 17 2004
parent reply "Walter" <walter digitalmars.com> writes:
"Manfred Nowak" <svv1999 hotmail.com> wrote in message
news:c0uct3$2glo$1 digitaldaemon.com...
 Walter wrote:

 The associative array code for class objects uses toHash() and opCmp().


 std\typeinfo\ti_C.d. You'll need to overload opCmp() for your example.

Simply instrumenting the calls to `toHash', etc. shows that `opCmp' is not called, maybe because a null-pointer is compared: toHash c starting the requests toHash c toHash c 4 toHash c No such element toHash C No such element It is not clear, why `toHash' is called twice for the evaluation, that an element indeed is in the associative array.

opCmp is only called if two objects hash to the same value.
Feb 20 2004
parent Manfred Nowak <svv1999 hotmail.com> writes:
Walter wrote:

 opCmp is only called if two objects hash to the same value.

Because multiple objects can hash to the same value how does the algorithm find out without comparing, that an object found is indeed the object searched for? So long.
Feb 21 2004
prev sibling next sibling parent Dr.Dizel <Dr.Dizel_member pathlink.com> writes:
Let:

uint[bit[]] avec;
static bit[] bvec = [0b1, 0b0];
++avec[bvec]; // <-- error

:-\
Feb 14 2004
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
While we're still pondering over the issue, I've just written an 
associative array implementation that works on classes.

http://smjg.port5.com/pr/d/hashmap.d
http://smjg.port5.com/pr/d/hashmaptest.d

Hopefully it's self-explanatory.  I haven't yet tested it on a struct.

Of course, the sooner this workaround becomes unnecessary, the better 
for the D programming community as a whole.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.
Feb 18 2004