www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - built-in string hash ?

reply Kevin Bailey <noone nowhere.com> writes:
So I have a class containing two strings:

class Foo
{
	string s1, s2;

	...

and I'd like to add a toHash() member, but I can't find
the built-in string hash function. These don't work:

s1.toHash()
s1.toHash
s1.hash
s1.hash()
hash(s1)

yet strings can clearly be the key in a map.

I see that xml.d writes its own string hash function
but that just doesn't seem right.

Is there a way to get to the built-in ?
Aug 28 2010
parent reply torhu <no spam.invalid> writes:
On 28.08.2010 16:37, Kevin Bailey wrote:
 So I have a class containing two strings:

 class Foo
 {
 	string s1, s2;

 	...

 and I'd like to add a toHash() member, but I can't find
 the built-in string hash function. These don't work:

 s1.toHash()
 s1.toHash
 s1.hash
 s1.hash()
 hash(s1)

 yet strings can clearly be the key in a map.

 I see that xml.d writes its own string hash function
 but that just doesn't seem right.

 Is there a way to get to the built-in ?

You can get to it through TypeInfo: http://www.digitalmars.com/d/2.0/phobos/object.html#getHash string a = "abc"; auto hash = typeid(a).getHash(&a);
Aug 28 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
torhu:
 string a = "abc";
 auto hash = typeid(a).getHash(&a);

If higher performance is necessary, you may pre-compute part of that: void main() { string a = "abc"; auto hash1 = typeid(a).getHash(&a); auto stringHash = &(typeid(a).getHash); auto hash2 = stringHash(&a); assert(hash1 == hash2); } Bye, bearophile
Aug 28 2010
parent reply Pelle <pelle.mansson gmail.com> writes:
On 08/28/2010 10:25 PM, bearophile wrote:
 torhu:
 string a = "abc";
 auto hash = typeid(a).getHash(&a);

If higher performance is necessary, you may pre-compute part of that: void main() { string a = "abc"; auto hash1 = typeid(a).getHash(&a); auto stringHash =&(typeid(a).getHash); auto hash2 = stringHash(&a); assert(hash1 == hash2); } Bye, bearophile

I doubt that gives any performance gains. typeid(a).getHash should be a constant expression anyway, and I don't see any gains in my tiny benchmark test. Perhaps it works better if a was an Object, since typeid for objects does more.
Aug 28 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Pelle:
 I doubt that gives any performance gains. typeid(a).getHash should be a 
 constant expression anyway, and I don't see any gains in my tiny 
 benchmark test.

My code shows a limited time difference: import std.stdio: writeln; import std.date: getUTCtime, ticksPerSecond; void main() { enum int NLOOP = 30_000_000; string s = "and I'd like to add a toHash() member, but I can't find the built-in string hash function."; { auto t0 = getUTCtime(); auto stringHash =&(typeid(s).getHash); hash_t tot; for (int i; i < NLOOP; i++) tot += stringHash(&s); auto t1 = getUTCtime(); writeln(tot, " ", (t1 - t0) / cast(double)ticksPerSecond); } { auto t0 = getUTCtime(); hash_t tot; for (int i; i < NLOOP; i++) tot += typeid(s).getHash(&s); auto t1 = getUTCtime(); writeln(tot, " ", (t1 - t0) / cast(double)ticksPerSecond); } } And the asm shows the first loop to contain one less instruction (dmd 2.048, -O -release -inline), so the difference is small: L42: lea ECX,01Ch[ESP] mov EAX,02Ch[ESP] push ECX call EDI add ESI,EAX inc EBX cmp EBX,01C9C380h jb L42 LA7: lea EDX,01Ch[ESP] mov ECX,_D12TypeInfo_Aya6__initZ mov EAX,offset FLAT:_D12TypeInfo_Aya6__initZ push EDX call dword ptr 018h[ECX] add EDI,EAX inc EBX cmp EBX,01C9C380h jb LA7 Bye, bearophile
Aug 28 2010
parent BCS <none anon.com> writes:
Hello bearophile,

 Pelle:
 
 I doubt that gives any performance gains. typeid(a).getHash should be
 a constant expression anyway, and I don't see any gains in my tiny
 benchmark test.
 

import std.stdio: writeln; import std.date: getUTCtime, ticksPerSecond; void main() { enum int NLOOP = 30_000_000; string s = "and I'd like to add a toHash() member, but I can't find the built-in string hash function."; { auto t0 = getUTCtime(); auto stringHash =&(typeid(s).getHash); hash_t tot; for (int i; i < NLOOP; i++) tot += stringHash(&s); auto t1 = getUTCtime(); writeln(tot, " ", (t1 - t0) / cast(double)ticksPerSecond); } { auto t0 = getUTCtime(); hash_t tot; for (int i; i < NLOOP; i++) tot += typeid(s).getHash(&s); auto t1 = getUTCtime(); writeln(tot, " ", (t1 - t0) / cast(double)ticksPerSecond); } } And the asm shows the first loop to contain one less instruction (dmd 2.048, -O -release -inline), so the difference is small: L42: lea ECX,01Ch[ESP] mov EAX,02Ch[ESP] push ECX call EDI add ESI,EAX inc EBX cmp EBX,01C9C380h jb L42 LA7: lea EDX,01Ch[ESP] mov ECX,_D12TypeInfo_Aya6__initZ

DMD seems to do a bad job of dead assignment removal at the ASM level. I've seen this befor where half the asm in large blocks was dead.
 mov EAX,offset FLAT:_D12TypeInfo_Aya6__initZ
 push    EDX
 call    dword ptr 018h[ECX]
 add EDI,EAX
 inc EBX
 cmp EBX,01C9C380h
 jb  LA7
 Bye,
 bearophile

... <IXOYE><
Aug 28 2010