www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Struct hash issues with string fields

reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I don't understand this:

import std.stdio;

struct Symbol { string val; }

void main()
{
    int[string] hash1;
    hash1["1".idup] = 1;
    hash1["1".idup] = 2;
    writeln(hash1);  // writes "["1":2]"

    int[Symbol] hash2;
    Symbol sym1 = Symbol("1".idup);
    Symbol sym2 = Symbol("1".idup);
    hash2[sym1] = 1;
    hash2[sym2] = 1;
    writeln(hash2);  // writes "[Symbol("1"):1, Symbol("1"):1]"
}

Why are sym1 and sym2 unique keys in hash2? Because the hash
implementation checks the array pointer instead of its contents? But
then why does hash1 not have the same issue?

I can't override toHash() in a struct, so what am I supposed to do in
order to make "sym1" and "sym2" be stored into the same hash key?
May 26 2012
next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 05/26/2012 12:53 PM, Andrej Mitrovic wrote:
 I don't understand this:

 import std.stdio;

 struct Symbol { string val; }

 void main()
 {
      int[string] hash1;
      hash1["1".idup] = 1;
      hash1["1".idup] = 2;
      writeln(hash1);  // writes "["1":2]"

      int[Symbol] hash2;
      Symbol sym1 = Symbol("1".idup);
      Symbol sym2 = Symbol("1".idup);
      hash2[sym1] = 1;
      hash2[sym2] = 1;
      writeln(hash2);  // writes "[Symbol("1"):1, Symbol("1"):1]"
 }

 Why are sym1 and sym2 unique keys in hash2? Because the hash
 implementation checks the array pointer instead of its contents? But
 then why does hash1 not have the same issue?

 I can't override toHash() in a struct,

But you can define it.
 so what am I supposed to do in
 order to make "sym1" and "sym2" be stored into the same hash key?

I don't know all aspects of this issue. I know that there has been recent discussions about the hash functions. You may be seeing some shortcomings. What I am pretty sure about is though, once you define toHash(), you must also define opCmp() and opEquals() that are all consistent with each other. When I do that, the following code works as you expect: import std.stdio; struct Symbol { string val; hash_t toHash() const nothrow safe { return typeid(val).getHash(&val); } bool opEquals(const ref Symbol that) const { return val == that.val; } int opCmp(const ref Symbol that) const { return val < that.val ? -1 : (val > that.val ? 1 : 0); } } void main() { int[Symbol] hash2; Symbol sym1 = Symbol("1".idup); Symbol sym2 = Symbol("1".idup); hash2[sym1] = 1; hash2[sym2] = 1; writeln(hash2); // writes [Symbol("1"):1] } Ali -- D Programming Language Tutorial: http://ddili.org/ders/d.en/index.html
May 26 2012
parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 05/26/2012 05:13 PM, Ali Çehreli wrote:

 once you define toHash(), you
 must also define opCmp() and opEquals() that are all consistent with
 each other.

Correction: opEquals() is only for potential optimizations. What is needed alongside toHash() is just opCmp(), and the reason is to be able to resolve hash collisions. Ali
May 26 2012
parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 05/27/2012 04:22 AM, Andrej Mitrovic wrote:
 On 5/27/12, Ali Çehreli<acehreli yahoo.com>  wrote:
 On 05/26/2012 05:13 PM, Ali Çehreli wrote:

   >  once you define toHash(), you
   >  must also define opCmp() and opEquals() that are all consistent with
   >  each other.

 Correction: opEquals() is only for potential optimizations. What is
 needed alongside toHash() is just opCmp(), and the reason is to be able
 to resolve hash collisions.

Thanks! Is this documented somewhere on dlang.org?

Yes. It is under the "Using Structs or Unions as the KeyType" section of the As­so­cia­tive Ar­rays documentation: http://dlang.org/hash-map.html It actually says "The implementation may use either opEquals or opCmp or both". So I guess opEquals() is still necessary for portability. Ali -- D Programming Language Tutorial: http://ddili.org/ders/d.en/index.html
May 27 2012
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 5/27/12, Ali =C7ehreli <acehreli yahoo.com> wrote:
 On 05/26/2012 05:13 PM, Ali =C7ehreli wrote:

  > once you define toHash(), you
  > must also define opCmp() and opEquals() that are all consistent with
  > each other.

 Correction: opEquals() is only for potential optimizations. What is
 needed alongside toHash() is just opCmp(), and the reason is to be able
 to resolve hash collisions.

Thanks! Is this documented somewhere on dlang.org?
May 27 2012