
digitalmars.D - object.d and hash_t confusion?

kris <foo bar.com> writes:
In object.d, there's an alias declaration for hash_t like so:

------------
alias size_t hash_t;
-----------

This indicates that the hash_t type will be 32bit on a 32bit system, and 
64bit on a 64bit system; yes? Is this so that a pointer can be directly 
returned as a hash value?
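
For illustration, here's my reading of the intent (a sketch of mine, not 
actual object.d code):

------------
alias size_t hash_t;

// size_t matches the pointer width: 32 bits on a 32-bit target and
// 64 bits on a 64-bit one, so an address fits without truncation.
hash_t hashOfAddress(void* p)
{
    return cast(hash_t) p;
}
------------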

Then, also in object.d, we have the decl for class Object:

-----------
class Object
{
     void print();
     char[] toString();
     uint toHash();
     int opCmp(Object o);
     int opEquals(Object o);
}
-----------

Notice that the toHash() method returns a uint? Is that supposed to be 
hash_t instead?

For the moment, let's suppose it is meant to be hash_t. The rest of this 
post is based upon that notion, so if I'm wrong here, no harm done :)

Using hash_t as the return type would mean the toHash() method returns a 
different type depending upon which platform it's compiled upon. This 
may have some ramifications, so let's explore what they might be:

1) because an alias is used, type-safety does not come into play. Thus, 
when someone overrides Object.toHash like so:

------------
override uint toHash() {...}
------------

a 32bit compiler will be unlikely to complain (remember, hash_t is an 
alias).

When this code is compiled in 64bit land, luckily, the compiler will 
probably complain about the uint/ulong mismatch. However, because the 
keyword "override" is not mandatory, most programmers will do this 
instead (in a class):

-----------
uint toHash() {....}
-----------

the result will perhaps be a good compile but a bogus override? Or will 
the compiler flag this as not being covariant? Either way, shouldn't 
this be handled in a more suitable manner?
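
To make the concern concrete, here's a sketch (the class name is invented) 
of what a 64-bit compile might run into:

-----------
// On a 64-bit target, hash_t aliases ulong, so Object.toHash
// returns a 64-bit value there.
class Key
{
    // With "override", the 64-bit compiler rejects the mismatch
    // outright:
    //     override uint toHash() { return 0; }  // error: uint vs ulong

    // Without "override", one would hope the compiler still flags
    // this as not covariant with Object.toHash:
    uint toHash() { return 0; }
}
-----------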

I suppose one way to ensure consistency is to use a typedef instead of 
an alias ... but will that cause errors when the result is used in an 
arithmetic expression? In this situation, is typedef too type-safe and 
alias not sufficient?


2) It's generally not a great idea to change the signature/types of 
overridable methods when moving platforms. You have to ensure there's 
absolute consistency in the types used, otherwise the vaguely brittle 
nature of the override mechanism can be tripped.

So the question here is "why does toHash() need to change across 
platforms?". Isn't 32bits sufficient?

If the answer to that indicates a 64bit value being more applicable 
(even for avoiding type-conversion warnings), then it would seem to 
indicate a new integral-type is required? One that has type-safety (a la 
typedef) but can be used in arithmetic expression without warnings or 
errors? This new type would be equivalent to size_t vis-a-vis byte size.

I know D is supposed to have fixed-size basic integer types across 
platforms, and for good reason. Yet here's a situation where, it *seems* 
that the most fundamental class in the runtime is perhaps flouting 
that? Perhaps there's a few other corners where similar concerns may 
crop up?

I will note a vague distaste for the gazillion C++ style meta-types 
anyway; D does the right thing in making almost all of them entirely 
redundant. But, if there is indeed a problem with toHash(), then I 
suspect we need a more robust solution. What say you?
Jun 21 2006
James Pelcis <jpelcis gmail.com> writes:
kris wrote:
 Notice that the toHash() method returns a uint? Is that supposed to be 
 hash_t instead?
Yes. In the internal\object.d file, it is hash_t. This is now Bugzilla 225.
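
For reference, the corrected declaration would read:

-----------
class Object
{
    void print();
    char[] toString();
    hash_t toHash();   // was uint above; hash_t is the correct type
    int opCmp(Object o);
    int opEquals(Object o);
}
-----------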
 1) because an alias is used, type-safety does not come into play. Thus, 
 when someone overrides Object.toHash like so:
 
 ------------
 override uint toHash() {...}
 ------------
 
 a 32bit compiler will be unlikely to complain (remember, hash_t is an 
 alias).
The compiler would be right, too. It is the same type (for 32 bits).
 
 When this code is compiled in 64bit land, luckily, the compiler will 
 probably complain about the uint/ulong mismatch. However, because the 
 keyword "override" is not mandatory, most programmers will do this 
 instead (in a class):
 
 -----------
 uint toHash() {....}
 -----------
 
 the result will perhaps be a good compile but a bogus override? Or will 
 the compiler flag this as not being covariant? Either way, shouldn't 
 this be handled in a more suitable manner?
This is a programmer error, not a language error. Fortunately, it would be marked as not being covariant.
 I suppose one way to ensure consistency is to use a typedef instead of 
 an alias ... but will that cause errors when the result is used in an 
 arithmetic expression? In this situation, is typedef too type-safe and 
 alias not sufficient?
If a typedef were used, hash_t could still be used in expressions, but the result would need to be cast to go back to hash_t.
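
For example (a sketch with an invented helper, just to show where the cast lands):

------------
typedef size_t hash_t;

hash_t combine(hash_t h, uint x)
{
    // hash_t converts implicitly *to* size_t inside the expression,
    // but the size_t result must be cast back to hash_t.
    return cast(hash_t)(h * 31 + x);
}
------------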
 
 2) It's generally not a great idea to change the signature/types of 
 overridable methods when moving platforms. You have to ensure there's 
 absolute consistency in the types used, otherwise the vaguely brittle 
 nature of the override mechanism can be tripped.
 
 So the question here is "why does toHash() need to change across 
 platforms?". Isn't 32bits sufficient?
toHash definitely needs to change across platforms: the current implementation hashes the object's address, so the result needs to be big enough to hold a pointer, and 32 bits won't always do that (ignoring the fact that the function won't currently work on 64-bit anyway, since it is marked as having a bug, although for a different reason).
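
The implementation in internal\object.d is essentially the following (quoted from memory, so treat it as a sketch rather than the verbatim source):

-----------
hash_t toHash()
{
    // The default hash is simply the object's address, hence the
    // pointer-sized return type.
    return cast(hash_t)cast(void *)this;
}
-----------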
 If the answer to that indicates a 64bit value being more applicable 
 (even for avoiding type-conversion warnings), then it would seem to 
 indicate a new integral-type is required? One that has type-safety (a la 
 typedef) but can be used in arithmetic expression without warnings or 
 errors? This new type would be equivalent to size_t vis-a-vis byte size.
On some platforms and at some time, even 64-bits won't be enough to handle toHash.
 I know D is supposed to have fixed-size basic integer types across 
 platforms, and for good reason. Yet here's a situation where, it *seems* 
 that the most fundamental class in the runtime is perhaps flouting 
 that? Perhaps there's a few other corners where similar concerns may 
 crop up?
 
 I will note a vague distaste for the gazillion C++ style meta-types 
 anyway; D does the right thing in making almost all of them entirely 
 redundant. But, if there is indeed a problem with toHash(), then I 
 suspect we need a more robust solution. What say you?
Since the only non-bug problem I noticed here was a programmer error (using uint instead of hash_t), why should it be changed? If a change does need to be made though, the alias could be changed into a typedef. That would check for the problem regardless of the platform.
Jun 26 2006
kris <foo bar.com> writes:
James Pelcis wrote:
 Since the only non-bug problem I noticed here was a programmer error 
 (using uint instead of hash_t), why should it be changed?
Well, the hope was that such an easy-to-make 'mistake' would be caught by the compiler :)
 If a change does need to be made though, the alias could be changed into 
 a typedef.  That would check for the problem regardless of the platform.
Yep, but probably requires casting. Walter has noted on a number of occasions that a cast is not exactly intended for general purposes. I just wonder if this should be considered a special case or not.
Jun 26 2006
James Pelcis <jpelcis gmail.com> writes:
kris wrote:
 James Pelcis wrote:
 Since the only non-bug problem I noticed here was a programmer error 
 (using uint instead of hash_t), why should it be changed?
Well, the hope was that such an easy-to-make 'mistake' would be caught by the compiler :)
Alas, no. It's similar to (for example) using ubyte instead of GLubyte. Both are legal. In fact, we don't normally even want the compiler to complain about it.
 If a change does need to be made though, the alias could be changed 
 into a typedef.  That would check for the problem regardless of the 
 platform.
Yep, but probably requires casting. Walter has noted on a number of occasions that a cast is not exactly intended for general purposes. I just wonder if this should be considered a special case or not.
Casting wouldn't be necessary when simply using a typedef'ed version of hash_t, but it would still be needed whenever the result of an expression is assigned back to a hash_t variable. Personally, I don't think it's necessary, and it definitely isn't desirable to need casting in the Object class. I vote to leave it as is (with the bug fixed).
Jun 26 2006
xs0 <xs0 xs0.com> writes:
 On some platforms and at some time, even 64-bits won't be enough to 
 handle toHash.
Don't you think a hash of 64 (or even 32) bits should always be enough? If your hashing function is bad, no amount of bits will help, and if it's good, 32 bits is enough for almost everything, and 64 is definitely enough for anything at all. xs0
Jun 27 2006
Lionello Lunesu <lio lunesu.remove.com> writes:
xs0 wrote:
 
 On some platforms and at some time, even 64-bits won't be enough to 
 handle toHash.
Don't you think a hash of 64 (or even 32) bits should always be enough? If your hashing function is bad, no amount of bits will help, and if it's good, 32 bits is enough for almost everything, and 64 is definitely enough for anything at all.
In fact, I think that a 32-bit hash should indeed be enough for anything. Even a 64-bit pointer should be hashable to 32 bits by using some logical operations (hi ^ lo?).
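
Something like this, say (the function name is my own, purely for illustration):

-----------
// Fold a 64-bit address into a 32-bit hash by XORing the high
// and low halves.
uint foldPointer(ulong addr)
{
    return cast(uint)(addr >> 32) ^ cast(uint)addr;
}
-----------

L.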
Jun 27 2006