digitalmars.D.learn - Struct hash issues with string fields

Andrej Mitrovic (21/21) May 26 2012 I don't understand this:

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (38/60) May 26 2012 I don't know all aspects of this issue. I know that there has been

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (5/8) May 26 2012 Correction: opEquals() is only for potential optimizations. What is

Andrej Mitrovic (2/9) May 27 2012 Thanks! Is this documented somewhere on dlang.org?

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (9/20) May 27 2012 Yes. It is under the "Using Structs or Unions as the KeyType" section of...

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

I don't understand this:

import std.stdio;

struct Symbol { string val; }

void main()
{
    int[string] hash1;
    hash1["1".idup] = 1;
    hash1["1".idup] = 2;
    writeln(hash1);  // writes "["1":2]"

    int[Symbol] hash2;
    Symbol sym1 = Symbol("1".idup);
    Symbol sym2 = Symbol("1".idup);
    hash2[sym1] = 1;
    hash2[sym2] = 1;
    writeln(hash2);  // writes "[Symbol("1"):1, Symbol("1"):1]"
}

Why are sym1 and sym2 unique keys in hash2? Because the hash
implementation checks the array pointer instead of its contents? But
then why does hash1 not have the same issue?

I can't override toHash() in a struct, so what am I supposed to do in
order to make "sym1" and "sym2" be stored into the same hash key?

May 26 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 05/26/2012 12:53 PM, Andrej Mitrovic wrote:
 I don't understand this:

 import std.stdio;

 struct Symbol { string val; }

 void main()
 {
      int[string] hash1;
      hash1["1".idup] = 1;
      hash1["1".idup] = 2;
      writeln(hash1);  // writes "["1":2]"

      int[Symbol] hash2;
      Symbol sym1 = Symbol("1".idup);
      Symbol sym2 = Symbol("1".idup);
      hash2[sym1] = 1;
      hash2[sym2] = 1;
      writeln(hash2);  // writes "[Symbol("1"):1, Symbol("1"):1]"
 }

 Why are sym1 and sym2 unique keys in hash2? Because the hash
 implementation checks the array pointer instead of its contents? But
 then why does hash1 not have the same issue?

 I can't override toHash() in a struct,

But you can define it.

 so what am I supposed to do in
 order to make "sym1" and "sym2" be stored into the same hash key?

I don't know all aspects of this issue. I know that there has been 
recent discussions about the hash functions. You may be seeing some 
shortcomings.

What I am pretty sure about is though, once you define toHash(), you 
must also define opCmp() and opEquals() that are all consistent with 
each other.

When I do that, the following code works as you expect:

import std.stdio;

struct Symbol
{
     string val;

     hash_t toHash() const nothrow  safe
     {
         return typeid(val).getHash(&val);
     }

     bool opEquals(const ref Symbol that) const
     {
         return val == that.val;
     }

     int opCmp(const ref Symbol that) const
     {
         return val < that.val ? -1 : (val > that.val ? 1 : 0);
     }
}

void main()
{
     int[Symbol] hash2;
     Symbol sym1 = Symbol("1".idup);
     Symbol sym2 = Symbol("1".idup);
     hash2[sym1] = 1;
     hash2[sym2] = 1;
     writeln(hash2);             // writes [Symbol("1"):1]
}

Ali

-- 
D Programming Language Tutorial: http://ddili.org/ders/d.en/index.html

May 26 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 05/26/2012 05:13 PM, Ali Çehreli wrote:

 once you define toHash(), you
 must also define opCmp() and opEquals() that are all consistent with
 each other.

Correction: opEquals() is only for potential optimizations. What is 
needed alongside toHash() is just opCmp(), and the reason is to be able 
to resolve hash collisions.

Ali

May 26 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 5/27/12, Ali =C7ehreli <acehreli yahoo.com> wrote:
 On 05/26/2012 05:13 PM, Ali =C7ehreli wrote:

  > once you define toHash(), you
  > must also define opCmp() and opEquals() that are all consistent with
  > each other.

 Correction: opEquals() is only for potential optimizations. What is
 needed alongside toHash() is just opCmp(), and the reason is to be able
 to resolve hash collisions.

Thanks! Is this documented somewhere on dlang.org?

May 27 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 05/27/2012 04:22 AM, Andrej Mitrovic wrote:
 On 5/27/12, Ali Çehreli<acehreli yahoo.com>  wrote:
 On 05/26/2012 05:13 PM, Ali Çehreli wrote:

   >  once you define toHash(), you
   >  must also define opCmp() and opEquals() that are all consistent with
   >  each other.

 Correction: opEquals() is only for potential optimizations. What is
 needed alongside toHash() is just opCmp(), and the reason is to be able
 to resolve hash collisions.

 Thanks! Is this documented somewhere on dlang.org?

Yes. It is under the "Using Structs or Unions as the KeyType" section of 
the Associative Arrays documentation:

   http://dlang.org/hash-map.html

It actually says "The implementation may use either opEquals or opCmp or 
both". So I guess opEquals() is still necessary for portability.

Ali

-- 
D Programming Language Tutorial: http://ddili.org/ders/d.en/index.html

May 27 2012

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Struct hash issues with string fields