www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - how hash_t toHash() works?

reply "gedaiu" <szabobogdan yahoo.com> writes:
hi,

I have a class which I want to use as key in an assoc array like
this:

string["KeyString"] myArray;

What i want is to preserve the order in the array. I want always
to have "1" before "2" if the string is a numeric value.

Can anyone help me to understand how const hash_t toHash() should
work?

Here is the KeyString class:

struct KeyString {
	string str;
	
	KeyString opAssign(string val) {
		str = val;
		return this;
	}
	
	const hash_t toHash() {
		try {
			return to!long(str);
		} catch(Exception e) {
			hash_t hash; foreach (char c; str) hash = (hash * 9) + c;
			return hash;
		}
	}
	
	const int opCmp(ref const KeyString s) {
		//return std.string.cmp(this.str, s.str);
		
		try {
			auto val1 = to!long(this.str);
			auto val2 = to!long(s.str);
			
			if(val1 < val2) {
				return -1;
			}
			
			if(val1 == val2) {
				return 0;
			}
			
		} catch(Exception e) {
			return std.string.cmp(str, s.str);
		}
		
		return 1;
	}
}


Thanks,
Bogdan
Apr 28 2013
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
Hi,

 I have a class which I want to use as key in an assoc array like
 this:

 string["KeyString"] myArray;

 What i want is to preserve the order in the array. I want always
 to have "1" before "2" if the string is a numeric value.

 Can anyone help me to understand how const hash_t toHash() 
 should
 work?
Associative arrays in D are implemented as hash tables. Hashing sacrifices ordering capabilities for greater speed. Creating a hash table will most certainly take your toHash value modulo some integer you can not control (small for small tables, large for larger ones). It would be crippling, if at all possible, to have foreach on an associative array always produce a sorted result. I'd suggest using std.container.RedBlackTree instead. It doesn't provide syntactic sugar (like "container[key] = value") but preserves the ordering. Also note that sorting items as strings should be separated from sorting as integers. For example, a value convertible to integer should be always less (or always greater) than a non-convertible value. Otherwise, it would quickly lead to inconsistent comparisons, as in "22" <s "22a" <s "4" <n "22". Here, "<s" compares values as strings and "<n" as integers. Below is an example of using RedBlackTree close to what you might need: ----- import std.container; import std.conv; import std.stdio; bool doConv(out long val, string s) { try { val = to!long(s); return true; } catch(ConvException) { return false; } } struct KeyString { string str; KeyString opAssign(string val) { str = val; return this; } const int opCmp(ref const KeyString s) { long v1, v2; auto b1 = doConv(v1, str); auto b2 = doConv(v2, s.str); if (b1 != b2) return b2 - b1; if (b1 && b2) return (v1 > v2) - (v1 < v2); return std.string.cmp(str, s.str); } } struct MyRecord { KeyString key; int val; this(string nkey, int nval) { key = nkey; val = nval; } } void main () { auto cont = redBlackTree !("a.key < b.key", true, MyRecord) (); cont.insert(MyRecord("1", 1)); cont.insert(MyRecord("111", 2)); cont.insert(MyRecord("25", 3)); cont.insert(MyRecord("53", 4)); cont.insert(MyRecord("a", 5)); cont.insert(MyRecord(" ", 6)); cont.insert(MyRecord("1234567890123456789", 7)); cont.insert(MyRecord("12345678901234567890", 8)); foreach (ref const cur; cont) { writefln("cont[\"%s\"] = %s", cur.key.str, cur.val); } } ----- The example prints: ----- cont["1"] = 1 cont["25"] = 3 cont["53"] = 4 cont["111"] = 2 cont["1234567890123456789"] = 7 cont[" "] = 6 cont["12345678901234567890"] = 8 cont["a"] = 5 ----- The last three entries are strings not convertible to a long, so they are ordered lexicographically. Ivan Kazmenko.
Apr 28 2013
parent reply "gedaiu" <szabobogdan yahoo.com> writes:
On Sunday, 28 April 2013 at 13:18:04 UTC, Ivan Kazmenko wrote:
 Hi,

 I have a class which I want to use as key in an assoc array 
 like
 this:

 string["KeyString"] myArray;

 What i want is to preserve the order in the array. I want 
 always
 to have "1" before "2" if the string is a numeric value.

 Can anyone help me to understand how const hash_t toHash() 
 should
 work?
Associative arrays in D are implemented as hash tables. Hashing sacrifices ordering capabilities for greater speed. Creating a hash table will most certainly take your toHash value modulo some integer you can not control (small for small tables, large for larger ones). It would be crippling, if at all possible, to have foreach on an associative array always produce a sorted result. I'd suggest using std.container.RedBlackTree instead. It doesn't provide syntactic sugar (like "container[key] = value") but preserves the ordering. Also note that sorting items as strings should be separated from sorting as integers. For example, a value convertible to integer should be always less (or always greater) than a non-convertible value. Otherwise, it would quickly lead to inconsistent comparisons, as in "22" <s "22a" <s "4" <n "22". Here, "<s" compares values as strings and "<n" as integers. Below is an example of using RedBlackTree close to what you might need: ----- import std.container; import std.conv; import std.stdio; bool doConv(out long val, string s) { try { val = to!long(s); return true; } catch(ConvException) { return false; } } struct KeyString { string str; KeyString opAssign(string val) { str = val; return this; } const int opCmp(ref const KeyString s) { long v1, v2; auto b1 = doConv(v1, str); auto b2 = doConv(v2, s.str); if (b1 != b2) return b2 - b1; if (b1 && b2) return (v1 > v2) - (v1 < v2); return std.string.cmp(str, s.str); } } struct MyRecord { KeyString key; int val; this(string nkey, int nval) { key = nkey; val = nval; } } void main () { auto cont = redBlackTree !("a.key < b.key", true, MyRecord) (); cont.insert(MyRecord("1", 1)); cont.insert(MyRecord("111", 2)); cont.insert(MyRecord("25", 3)); cont.insert(MyRecord("53", 4)); cont.insert(MyRecord("a", 5)); cont.insert(MyRecord(" ", 6)); cont.insert(MyRecord("1234567890123456789", 7)); cont.insert(MyRecord("12345678901234567890", 8)); foreach (ref const cur; cont) { writefln("cont[\"%s\"] = %s", cur.key.str, cur.val); } } ----- The example prints: ----- cont["1"] = 1 cont["25"] = 3 cont["53"] = 4 cont["111"] = 2 cont["1234567890123456789"] = 7 cont[" "] = 6 cont["12345678901234567890"] = 8 cont["a"] = 5 ----- The last three entries are strings not convertible to a long, so they are ordered lexicographically. Ivan Kazmenko.
Thanks! it looks great! one more questionWhat is the type of cont? auto cont = redBlackTree !("a.key < b.key", true, MyRecord) (); I want to use this as a property in a class and i can't use there auto keyword... I tried different types but it did not work. Bogdan
Apr 29 2013
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
 one more question
 What is the type of cont?

 auto cont = redBlackTree !("a.key < b.key", true, MyRecord) ();

 I want to use this as a property in a class and i can't use 
 there auto keyword... I tried different types but it did not 
 work.
For me, the following declaration works: ----- import std.functional; ... RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) cont; ... cont = redBlackTree !("a.key < b.key", true, MyRecord) (); ----- I admit it could have been easier to figure out. For example, "writeln (typeof (cont).stringof);" just prints "RedBlackTree" which is not enough for a proper declaration. I've inferred the right declaration from the following lines in container.d: ----- /++ Ditto +/ auto redBlackTree(alias less, bool allowDuplicates, E)(E[] elems...) if(is(typeof(binaryFun!less(E.init, E.init)))) { //We shouldn't need to instantiate less here, but for some reason, //dmd can't handle it if we don't (even though the template which //takes less but not allowDuplicates works just fine). return new RedBlackTree!(E, binaryFun!less, allowDuplicates)(elems); } -----
Apr 29 2013
parent reply "gedaiu" <szabobogdan yahoo.com> writes:
On Monday, 29 April 2013 at 16:01:15 UTC, Ivan Kazmenko wrote:
 one more question
 What is the type of cont?

 auto cont = redBlackTree !("a.key < b.key", true, MyRecord) ();

 I want to use this as a property in a class and i can't use 
 there auto keyword... I tried different types but it did not 
 work.
For me, the following declaration works: ----- import std.functional; ... RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) cont; ... cont = redBlackTree !("a.key < b.key", true, MyRecord) (); ----- I admit it could have been easier to figure out. For example, "writeln (typeof (cont).stringof);" just prints "RedBlackTree" which is not enough for a proper declaration. I've inferred the right declaration from the following lines in container.d: ----- /++ Ditto +/ auto redBlackTree(alias less, bool allowDuplicates, E)(E[] elems...) if(is(typeof(binaryFun!less(E.init, E.init)))) { //We shouldn't need to instantiate less here, but for some reason, //dmd can't handle it if we don't (even though the template which //takes less but not allowDuplicates works just fine). return new RedBlackTree!(E, binaryFun!less, allowDuplicates)(elems); } -----
Error: template instance RedBlackTree!(ValueRecord, binaryFun, true) RedBlackTree!(ValueRecord, binaryFun, true) does not match template declaration RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false) if (is(typeof(binaryFun!(less)(T.init, T.init)))) Error: RedBlackTree!(ValueRecord, binaryFun, true) is used as a type Do you know what this error means? Thank, Bogdan
Apr 30 2013
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
 -----
 import std.functional;
 ...
 	RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) 
 cont;
 ...
 	cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
 -----
 Error: template instance RedBlackTree!(ValueRecord, binaryFun, 
 true) RedBlackTree!(ValueRecord, binaryFun, true) does not 
 match template declaration RedBlackTree(T, alias less = "a < 
 b", bool allowDuplicates = false) if 
 (is(typeof(binaryFun!(less)(T.init, T.init))))
 Error: RedBlackTree!(ValueRecord, binaryFun, true) is used as a 
 type
I am able to reproduce it if I write RedBlackTree !(MyRecord, binaryFun, true) instead of RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) If you are using a plain regular function instead of "a.key < b.key" there, consider the following form: ----- bool less (T) (auto ref T a, auto ref T b) { return a.key < b.key; } ... RedBlackTree !(MyRecord, less, true) cont; ... cont = redBlackTree !(less, true, MyRecord) (); ----- Note that the straightforward notation does *not* work yet in 2.062 if you want ref parameters: ----- bool less (ref MyRecord a, ref MyRecord b) { return a.key < b.key; } ----- The current condition of binaryFun is too tight. So, for now, we have to create a non-ref version too to pass it. See (and perhaps comment) the following issue in BugZilla: http://d.puremagic.com/issues/show_bug.cgi?id=9513 Ivan Kazmenko.
Apr 30 2013
parent reply "gedaiu" <szabobogdan yahoo.com> writes:
On Tuesday, 30 April 2013 at 17:48:08 UTC, Ivan Kazmenko wrote:
 -----
 import std.functional;
 ...
 	RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) 
 cont;
 ...
 	cont = redBlackTree !("a.key < b.key", true, MyRecord) ();
 -----
 Error: template instance RedBlackTree!(ValueRecord, binaryFun, 
 true) RedBlackTree!(ValueRecord, binaryFun, true) does not 
 match template declaration RedBlackTree(T, alias less = "a < 
 b", bool allowDuplicates = false) if 
 (is(typeof(binaryFun!(less)(T.init, T.init))))
 Error: RedBlackTree!(ValueRecord, binaryFun, true) is used as 
 a type
I am able to reproduce it if I write RedBlackTree !(MyRecord, binaryFun, true) instead of RedBlackTree !(MyRecord, binaryFun!"a.key < b.key", true) If you are using a plain regular function instead of "a.key < b.key" there, consider the following form: ----- bool less (T) (auto ref T a, auto ref T b) { return a.key < b.key; } ... RedBlackTree !(MyRecord, less, true) cont; ... cont = redBlackTree !(less, true, MyRecord) (); ----- Note that the straightforward notation does *not* work yet in 2.062 if you want ref parameters: ----- bool less (ref MyRecord a, ref MyRecord b) { return a.key < b.key; } ----- The current condition of binaryFun is too tight. So, for now, we have to create a non-ref version too to pass it. See (and perhaps comment) the following issue in BugZilla: http://d.puremagic.com/issues/show_bug.cgi?id=9513 Ivan Kazmenko.
One more question... why associative arrays in D can't be implemented like here? http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_hash.h?view=markup it seems that php arrays uses hash tables too but they preserve orders. Thanks, Bogdan
May 04 2013
parent "Ivan Kazmenko" <gassa mail.ru> writes:
 One more question... why associative arrays in D can't be 
 implemented like here?

 http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_hash.h?view=markup

 it seems that php arrays uses hash tables too but they preserve 
 orders.
From what I see there, this code traverses a hash table in the order in which the elements were inserted, not any other particular order (such as lexicographical). Much like Java's LinkedHashSet: http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashSet.html This however has a cost of maintaining a linked list which is at least one pointer per entry in size. If that was the implemented behavior, the performance would suffer for users who don't need any particular order. Slightly off topic: by the way, polynomial hashing modulo a power of two (as in the link you gave) is fast but has a rather general corner case regardless of the multiplier. And multiplying by small integers such as 33 gives another easy way to produce plenty of strings with identical hashes, this time regardless of the modulo. So, hashing like "hash(i) = hash(i-1) * 33 + str[i]" is a piece of cake for an adversary who wants your running time to be quadratic instead of linear. Ivan Kazmenko.
May 04 2013