www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Python dictionary inspired hash table implementation

reply Moritz Warning <moritzwarning web.de> writes:
Content-Type: text/plain

I like to share a hash table implementation inspired by Pythons Dictionary.

It's a so called closed hashing table, that is, keys and elements are stored in
the table.
Pointers are avoided when possible.
The table size is 2**n and the used load factor is 0.75.

It works for most key types (also static arrays).
All kind of values types should be supported.

For comparison, I did a simple/stupid benchmark,
adding 10.000.000 uint pairs and looking them up
in consecutive order (see attached file).

The build-in AA uint[uint]:
inserts:  1901966/s (5.26s) lookups: 8065673/s (1.24s)

tango.util.container.HashMap!(uint, uint, ..):
inserts:  6346664/s (1.58s) lookups: 23780047/s (0.42s)

The above implementation, Dict!(uint, uint):
inserts:  8199181/s (1.22s) lookups: 34506306/s (0.29s)

This implementation is far from perfect and has room for improvements.
E.g., it is twice as slow for lookups of ulong keys compared to Tango HashMap
(3 times slower for insertion). It's also slower for arrays.
But that's likely to change with some more tweaking.

The attached code is placed into Public Domain.
Jul 02 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Moritz Warning Wrote:
 E.g., it is twice as slow for lookups of ulong keys compared to Tango HashMap
 (3 times slower for insertion). It's also slower for arrays.
 But that's likely to change with some more tweaking.

I think it may need more than just tweakings: the advantage of not using pointers may get lost when the data is too much long. So you may switch to using pointers when the data requires more than 4 or 8 bytes. How much work does it need to be ported to Phobos? Instead of this: import tango.stdc.string; //for memset You probably want to write something like: import tango.stdc.string: memset; I suggest you to compact the code vertically some, to improve readability. This may enjoy a better name: template Helper(T) In my libs I have named it DeconstArrayType. If you have a Python, like a 2.5, you may try it with similar code, to compare the speeds. The simpler code you may try are (in pairs): def main(): d = {} for i in xrange(500000): d["key_" + str(i)] = 0 import psyco; psyco.bind(main) main() import std.string: format; void main() { int[string] d; for (int i = 0; i < 500_000; ++i) d["key_" ~ format(i)] = 0; } def main(): d = dict.fromkeys(xrange(10000000), 0) main() void main() { int[int] d; for (int i; i < 10_000_000; ++i) d[i] = 0; } Bye, bearophile
Jul 02 2008
next sibling parent reply Moritz Warning <moritzwarning web.de> writes:
On Wed, 02 Jul 2008 17:11:43 -0400, bearophile wrote:

 Moritz Warning Wrote:
 E.g., it is twice as slow for lookups of ulong keys compared to Tango
 HashMap (3 times slower for insertion). It's also slower for arrays.
 But that's likely to change with some more tweaking.

I think it may need more than just tweakings: the advantage of not using pointers may get lost when the data is too much long. So you may switch to using pointers when the data requires more than 4 or 8 bytes.

True, but these changes can be made easily by changing one of the wrappers for the targeted key type or by adding a new wrapper. I haven't tested much which approach is most efficient for each data type so far.
 
 How much work does it need to be ported to Phobos?

The only thing that would need to be changed in the implementation itself is the traits stuff.
 
 This may enjoy a better name:
 template Helper(T)
 In my libs I have named it DeconstArrayType.

arises (until now :P).
 
 If you have a Python, like a 2.5, you may try it with similar code, to
 compare the speeds. The simpler code you may try are (in pairs):

 def main():
     d = {}
     for i in xrange(500000):
         d["key_" + str(i)] = 0
 import psyco; psyco.bind(main)
 main()
 
 
 import std.string: format;
 void main() {
     int[string] d;
     for (int i = 0; i < 500_000; ++i)
         d["key_" ~ format(i)] = 0;
 }

since both string formating function may overrule the computation done by the dictionary. Can you provide a Python example that pre-computes an array with strings in a separate function?
 
 
 def main():
     d = dict.fromkeys(xrange(10000000), 0)
 main()

cProfile tells me that Python spend 2.497s (best from 5 tries) with this build-in function. That's about 6680027 insertions per second.
Jul 02 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Moritz Warning:
 Can you provide a Python example that pre-computes an array with strings 
 in a separate function?

Experience tells me that creating benchmarks is a very tricky thing, it's really easy to do something wrong. Here I have created a new pair for you, but I am ready to fix them if someone spots a problem. If you need help just ask me. You can create two more versions for your data structure and for the hash map of Tango, if you want. Your timings seem quite good. Python 2.6b may be a bit faster, but 2.5.2 is plenty enough. ------------------ from timeit import default_timer as clock def main(): strings = ["key_" + str(i) for i in xrange(1000000)] t = clock() d = dict.fromkeys(strings, 0) print round(clock() - t, 2) if len(d) < 200: print d import psyco; psyco.bind(main) main() ------------------ import std.stdio: putr = writefln; import std.string: format; version (Win32) { import std.c.windows.windows: GetTickCount; double clock() { return cast(double)GetTickCount() * 1.0e-03; } } version (linux) { import std.c.linux.linux: time; double clock() { return cast(double)time(null); } } void main() { const uint N = 1_000_000; // this doesn't save much time //auto strings = (cast(string*)malloc(string.sizeof * N))[0 .. N]; auto strings = new string[N]; for (uint i; i < N; ++i) strings[i] = "key_" ~ format(i); auto t = clock(); int[string] d; foreach(s; strings) d[s] = 0; putr(clock() - t); if (N < 200) putr(d); } One of the things I miss in Python is the ability to write numbers as 1_000_000. Bye, bearophile
Jul 02 2008
next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:g4hbj1$1jjj$1 digitalmars.com...

 One of the things I miss in Python is the ability to write numbers as 
 1_000_000.

Uh, that's in D too.
Jul 02 2008
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
news:g4hdkd$1pbv$1 digitalmars.com...
 "bearophile" <bearophileHUGS lycos.com> wrote in message 
 news:g4hbj1$1jjj$1 digitalmars.com...

 One of the things I miss in Python is the ability to write numbers as 
 1_000_000.

Uh, that's in D too.

Nevermind, I cannot read.
Jul 02 2008
prev sibling parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
bearophile wrote:

 creating benchmarks is a very tricky thing

Yes. And as one may recognize: Moritz designed the benchmarks in such a way, - that collisions at least aren't obvious - that the locality of two consecutive inserts/acesses seems to be close to maximum In addition: the space effciency of as low as 25% which Moritz' attempt allows may be inacceptable. Whithout some agreed scenario, i.e. specifications - about space effciency - about static properties of the key spaces - about dynamic properties of the actually used set of keys and its insert/access sequences every user of a dictionary will be able to claim, that his preferred implementation is better than any other. -manfred
Jul 02 2008
parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Moritz Warning wrote:

 Don't expect me to begin testing with an iso certificated benchmark 
 scenario from start up. ;-)

Meanwhile you might want to know of an already existing generic one: http://www.digitalmars.com/d/archives/digitalmars/D/30773.html#N30849 -manfred
Jul 04 2008
parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Moritz Warning wrote:

 The tests for the build-in AA had to be started ten times since
 the test loop causes out of memory problems. The effect on the
 test result should be negligible.

No. You should reduce the number of operations to 7_500_000. Sorrily the PyDict seems to be bad suited for adresses out of the 2G- range. My tests show, that the builtin AA only needs 45% of the time of PyDict. To verify this on your machine set ttotal= 512*1024*1024; // 512 MB and replace "rand()" by "rand() << 2", which effectively gives 4 byte alligned adresses out of the 2G range. -manfred
Jul 04 2008
parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Moritz Warning wrote:

 PyDict decreases in performance for this case.

This is a hint, that PyDict may be well suited only, when the actual distribution of keys over the keyspace is dense. -manfred -- Maybe some knowledge of some types of disagreeing and their relation can turn out to be useful: http://blog.createdebate.com/2008/04/07/writing-strong-arguments/
Jul 09 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Manfred_Nowak:
 Moritz Warning wrote:
 PyDict decreases in performance for this case.

This is a hint, that PyDict may be well suited only, when the actual distribution of keys over the keyspace is dense.

Python dicts contains only python objects, they are always references, so they contain only pointers, this means Python dicts are probably optimized only for keys/values 4 (and probably 8 too) bytes long each. So the AA implementation may have to contain two versions, one for large keys and one for smaller (4-8 bytes) ones. Builtin AAs are used very often (if you are used to program in scripting languages), so this may be worth. Bye, bearophile
Jul 09 2008
prev sibling next sibling parent Moritz Warning <moritzwarning web.de> writes:
On Thu, 03 Jul 2008 06:49:03 +0000, Manfred_Nowak wrote:

 bearophile wrote:
 
 creating benchmarks is a very tricky thing

Yes. And as one may recognize: Moritz designed the benchmarks in such a way, - that collisions at least aren't obvious - that the locality of two consecutive inserts/acesses seems to be close to maximum

There was no intention in designing the "benchmark" to get high speed for this particular implementation. It's just the simplest approach to test for errors most people would come up to. It's main aim is to detect bigger errors (and it already did).
 In addition: the space effciency of as low as 25% which Moritz' attempt
 allows may be inacceptable.

The build-in AA runs out of space on my machine (1GB) when other implementations keep on working (GC was enabled, 20m uint pairs). Apart from that, the Dictionary implementation is open for improvement.
 
 Whithout some agreed scenario, i.e. specifications
 
 - about space effciency
 - about static properties of the key spaces - about dynamic properties
 of the actually used set of keys and its insert/access sequences

That's a todo. Don't expect me to begin testing with an iso certificated benchmark scenario from start up. ;-)
 every user of a dictionary will be able to claim, that his preferred
 implementation is better than any other.

As already was said and most people know, designing hash tables is a tricky thing. You can mostly/always design them to be the fastest in the world for a particular testing method. The main goal is to perform well on most everyday use cases. Since the implementation is based on Pythons Dictionary (which is aimed for a general use), There is justified hope that it doesn't perform much worse for other use cases as well. But that's up to testing and and further improvements! - mwarning
Jul 03 2008
prev sibling next sibling parent Moritz Warning <moritzwarning web.de> writes:
On Wed, 02 Jul 2008 21:55:45 -0400, bearophile wrote:

[..]
 from timeit import default_timer as clock
 
 def main():
     strings = ["key_" + str(i) for i in xrange(1000000)]
 
     t = clock()
     d = dict.fromkeys(strings, 0)
     print round(clock() - t, 2)
     if len(d) < 200: print d
 
 import psyco; psyco.bind(main)
 main()

Outputs 0.68 on average. I ported your Phobos code to Tango, these are the results: Dict!(char[], uint) 10 x 1000000 iterations inserts: 1292566/s (0.77s) lookups: 3492603/s (0.29s) HashMap!(char[], uint, ..) 10 x 1000000 iterations inserts: 1673673/s (0.60s) lookups: 6483940/s (0.15s) uint[char[]] 10 x 1000000 iterations inserts: 2088894/s (0.48s) lookups: 3886346/s (0.26s) There might be some bottleneck in Dict. Array hashing for Dict only accounts for 0.03s. btw.: There is a bug in the array hash method. It doesn't iterate up to the last array element. Here is the fixed version used for the test. //hash function ubyte[] a = cast(ubyte[]) data; int len = a.length; ubyte* p = cast(ubyte *) a.ptr; hash = *p << 7; while (--len >= 0) { hash = (1000003 * hash) ^ *p++; } hash ^= a.length;
Jul 03 2008
prev sibling next sibling parent Moritz Warning <moritzwarning web.de> writes:
On Fri, 04 Jul 2008 07:15:02 +0000, Manfred_Nowak wrote:

 Moritz Warning wrote:
 
 Don't expect me to begin testing with an iso certificated benchmark
 scenario from start up. ;-)

Meanwhile you might want to know of an already existing generic one: http://www.digitalmars.com/d/archives/digitalmars/D/30773.html#N30849 -manfred

Good idea - I ported the test to Tango. Here are the results for uint=>uint maps on average (ten runs): (GC disabled) PyDict: 6.73s Tango HashMap: 9.27s build-in AA: 12.20s (GC enabled) PyDict: 6.85s Tango HashMap: 9.43s build-in AA: 70.xys (I've got lazy) The tests for the build-in AA had to be started ten times since the test loop causes out of memory problems. The effect on the test result should be negligible.
Jul 04 2008
prev sibling parent Moritz Warning <moritzwarning web.de> writes:
On Fri, 04 Jul 2008 19:45:47 +0000, Manfred_Nowak wrote:

 Moritz Warning wrote:
 
 The tests for the build-in AA had to be started ten times since the
 test loop causes out of memory problems. The effect on the test result
 should be negligible.

No. You should reduce the number of operations to 7_500_000. Sorrily the PyDict seems to be bad suited for adresses out of the 2G- range. My tests show, that the builtin AA only needs 45% of the time of PyDict. To verify this on your machine set ttotal= 512*1024*1024; // 512 MB and replace "rand()" by "rand() << 2", which effectively gives 4 byte alligned adresses out of the 2G range. -manfred

With these changes, I get these results: PyDict:7.73s AA: 9.76s HashMap: 7.36s PyDict decreases in performance for this case. But I can't see any significant speedup for AA. Btw.: I had the 4x table grow for small tables changed to 2x in PyDict for all previous tests, since there wasn't any speed decrease to measure.
Jul 04 2008