www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - AA dsource project

reply dsimcha <dsimcha yahoo.com> writes:
A few weeks ago I mentioned that I was going to create some kind of forum for
people to post candidate associative array implementations to replace the
current, much-despised implementation as the new "builtin" AA for D.  It's now
up at http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access should
work for all registered dsource users.  If you have an associative array
implementation that you feel is a significant improvement over the current
builtin and are able/willing to license it under the Boost license, please
post it for comment.

Also, if you have a good benchmark for AAs, please create a benchmarks/
directory and submit it.
Nov 04 2009
next sibling parent reply Moritz Warning <moritzwarning web.de> writes:
On Wed, 04 Nov 2009 20:53:20 +0000, dsimcha wrote:

 A few weeks ago I mentioned that I was going to create some kind of
 forum for people to post candidate associative array implementations to
 replace the current, much-despised implementation as the new "builtin"
 AA for D.  It's now up at http://dsource.org/projects/aa/wiki/WikiStart
 .  SVN write access should work for all registered dsource users.  If
 you have an associative array implementation that you feel is a
 significant improvement over the current builtin and are able/willing to
 license it under the Boost license, please post it for comment.
 
 Also, if you have a good benchmark for AAs, please create a benchmarks/
 directory and submit it.

I've committed a port of Pythons dictionary implementation. Here is my announcement along with some benchmark figures: http://digitalmars.com/d/archives/digitalmars/D/ Python_dictionary_inspired_hash_table_implementation_73176.html
Nov 04 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Moritz Warning (moritzwarning web.de)'s article
 On Wed, 04 Nov 2009 20:53:20 +0000, dsimcha wrote:
 A few weeks ago I mentioned that I was going to create some kind of
 forum for people to post candidate associative array implementations to
 replace the current, much-despised implementation as the new "builtin"
 AA for D.  It's now up at http://dsource.org/projects/aa/wiki/WikiStart
 .  SVN write access should work for all registered dsource users.  If
 you have an associative array implementation that you feel is a
 significant improvement over the current builtin and are able/willing to
 license it under the Boost license, please post it for comment.

 Also, if you have a good benchmark for AAs, please create a benchmarks/
 directory and submit it.

Here is my announcement along with some benchmark figures: http://digitalmars.com/d/archives/digitalmars/D/ Python_dictionary_inspired_hash_table_implementation_73176.html

Nice job. I ported your code to D2, (quick and dirty, doesn't work right with const and all, just enough to be able to test it), and uploaded that version for D2 users. I had in the back of my mind that this was to be D2-centric, but I guess D1 could easily get better AAs, too, since it wouldn't affect the spec. I couldn't find your benchmarks, but here are my results: Builtin: Start Benchmark. 1 x 10000000 iterations inserts: 394601/s (25.342000s) lookups: 9813542/s (1.019000s) PyDict: Start Benchmark. 1 x 10000000 iterations inserts: 2044153/s (4.892000s) lookups: 3401360/s (2.940000s) This pretty much confirms my suspicion that the builtin AAs are optimized for lookup speed at the expense of insertion speed/GC overhead to an absurd degree. Also, regarding the license for Python's dictionary, apparently the Python License doesn't allow removing of copyright notices from stuff distributed in binary form. This might prove problematic if this were to be integrated into druntime.
Nov 04 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 I couldn't find your benchmarks, but here are my results:
 
 Builtin:
 Start Benchmark.
 1 x 10000000 iterations
 inserts:  394601/s (25.342000s)
 lookups: 9813542/s (1.019000s)
 
 PyDict:
 Start Benchmark.
 1 x 10000000 iterations
 inserts:  2044153/s (4.892000s)
 lookups: 3401360/s (2.940000s)
 
 This pretty much confirms my suspicion that the builtin AAs are optimized for
 lookup speed at the expense of insertion speed/GC overhead to an absurd degree.

Perhaps some combination would work. I'd be reluctant to go with a 3x slower lookup speed.
 Also, regarding the license for Python's dictionary, apparently the Python
License
 doesn't allow removing of copyright notices from stuff distributed in binary
form.
  This might prove problematic if this were to be integrated into druntime.

That's kind of an insurmountable problem. The core stuff needs to be free of licensing problems.
Nov 04 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 dsimcha wrote:
 I couldn't find your benchmarks, but here are my results:

 Builtin:
 Start Benchmark.
 1 x 10000000 iterations
 inserts:  394601/s (25.342000s)
 lookups: 9813542/s (1.019000s)

 PyDict:
 Start Benchmark.
 1 x 10000000 iterations
 inserts:  2044153/s (4.892000s)
 lookups: 3401360/s (2.940000s)

 This pretty much confirms my suspicion that the builtin AAs are optimized for
 lookup speed at the expense of insertion speed/GC overhead to an absurd degree.

slower lookup speed.

I've also just added my RandAA candidate, which uses parallel arrays and linear congruential randomized probing. It gives insertion speed comparable to the PyDict, with lookup speed only about 30-40% slower instead of 150-200%. For more benchmark details, see the wiki http://dsource.org/projects/aa/wiki/WikiStart, which I've just updated.
Nov 04 2009
prev sibling next sibling parent Moritz Warning <moritzwarning web.de> writes:
On Thu, 05 Nov 2009 03:52:22 +0000, dsimcha wrote:

 == Quote from Moritz Warning (moritzwarning web.de)'s article
 On Wed, 04 Nov 2009 20:53:20 +0000, dsimcha wrote:
 A few weeks ago I mentioned that I was going to create some kind of
 forum for people to post candidate associative array implementations
 to replace the current, much-despised implementation as the new
 "builtin" AA for D.  It's now up at
 http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access
 should work for all registered dsource users.  If you have an
 associative array implementation that you feel is a significant
 improvement over the current builtin and are able/willing to license
 it under the Boost license, please post it for comment.

 Also, if you have a good benchmark for AAs, please create a
 benchmarks/ directory and submit it.

announcement along with some benchmark figures: http://digitalmars.com/d/archives/digitalmars/D/ Python_dictionary_inspired_hash_table_implementation_73176.html

Nice job. I ported your code to D2, (quick and dirty, doesn't work right with const and all, just enough to be able to test it), and uploaded that version for D2 users. I had in the back of my mind that this was to be D2-centric, but I guess D1 could easily get better AAs, too, since it wouldn't affect the spec. I couldn't find your benchmarks, but here are my results: Builtin: Start Benchmark. 1 x 10000000 iterations inserts: 394601/s (25.342000s) lookups: 9813542/s (1.019000s) PyDict: Start Benchmark. 1 x 10000000 iterations inserts: 2044153/s (4.892000s) lookups: 3401360/s (2.940000s) This pretty much confirms my suspicion that the builtin AAs are optimized for lookup speed at the expense of insertion speed/GC overhead to an absurd degree. Also, regarding the license for Python's dictionary, apparently the Python License doesn't allow removing of copyright notices from stuff distributed in binary form. This might prove problematic if this were to be integrated into druntime.

I've committed an initial test bench (test.d). dmd2 test.d pydict/pyDictD2.d randAA/RandAA.d -O -release Linear test, GC enabled, 10 x 5_000_000 inserts/lookups: RandAA(K,V,bool storeHash = shouldStoreHash!(K)): 10 x 1000000 iterations inserts: 677460/s (1.476100s) lookups: 3549875/s (0.281700s) BuildIn: 10 x 1000000 iterations inserts: 218627/s (4.574000s) lookups: 4766444/s (0.209800s) PyDict(K,V): 10 x 1000000 iterations inserts: 1621271/s (0.616800s) lookups: 3834355/s (0.260800s)
Nov 05 2009
prev sibling parent Moritz Warning <moritzwarning web.de> writes:
On Thu, 05 Nov 2009 15:44:49 +0000, Moritz Warning wrote:

 On Thu, 05 Nov 2009 03:52:22 +0000, dsimcha wrote:
 
 == Quote from Moritz Warning (moritzwarning web.de)'s article
 On Wed, 04 Nov 2009 20:53:20 +0000, dsimcha wrote:
 A few weeks ago I mentioned that I was going to create some kind of
 forum for people to post candidate associative array implementations
 to replace the current, much-despised implementation as the new
 "builtin" AA for D.  It's now up at
 http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access
 should work for all registered dsource users.  If you have an
 associative array implementation that you feel is a significant
 improvement over the current builtin and are able/willing to license
 it under the Boost license, please post it for comment.

 Also, if you have a good benchmark for AAs, please create a
 benchmarks/ directory and submit it.

announcement along with some benchmark figures: http://digitalmars.com/d/archives/digitalmars/D/ Python_dictionary_inspired_hash_table_implementation_73176.html

Nice job. I ported your code to D2, (quick and dirty, doesn't work right with const and all, just enough to be able to test it), and uploaded that version for D2 users. I had in the back of my mind that this was to be D2-centric, but I guess D1 could easily get better AAs, too, since it wouldn't affect the spec. I couldn't find your benchmarks, but here are my results: Builtin: Start Benchmark. 1 x 10000000 iterations inserts: 394601/s (25.342000s) lookups: 9813542/s (1.019000s) PyDict: Start Benchmark. 1 x 10000000 iterations inserts: 2044153/s (4.892000s) lookups: 3401360/s (2.940000s) This pretty much confirms my suspicion that the builtin AAs are optimized for lookup speed at the expense of insertion speed/GC overhead to an absurd degree. Also, regarding the license for Python's dictionary, apparently the Python License doesn't allow removing of copyright notices from stuff distributed in binary form. This might prove problematic if this were to be integrated into druntime.

I've committed an initial test bench (test.d). dmd2 test.d pydict/pyDictD2.d randAA/RandAA.d -O -release

Could you check the test bench and see how to integrate the buildin for nicely? (isn't there some alias this that can be used?) I wasn't able to reproduce your results. Could you retry with the test.d please? About the license issue. I have spoken to some python folks about it, but I need to search my logs first.
Nov 05 2009
prev sibling next sibling parent "Saaa" <empty needmail.com> writes:
"dsimcha" <dsimcha yahoo.com> wrote in message 
news:hcspk0$pk5$1 digitalmars.com...
A few weeks ago I mentioned that I was going to create some kind of forum 
for
 people to post candidate associative array implementations to replace the
 current, much-despised implementation as the new "builtin" AA for D.  It's 
 now
 up at http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access 
 should
 work for all registered dsource users.  If you have an associative array
 implementation that you feel is a significant improvement over the current
 builtin and are able/willing to license it under the Boost license, please
 post it for comment.

 Also, if you have a good benchmark for AAs, please create a benchmarks/
 directory and submit it.

Slightly related, I recently looked at the typecons code and was a bit surprised that the string to enum works by checking the string against all possible enum elements. Couldn't a compile time perfect hashed aa fix this? :) private template enumParserImpl(string name, bool first, T...) { static if (first) { enum string enumParserImpl = "bool enumFromString(string s, ref " ~name~" v) {\n" ~enumParserImpl!(name, false, T) ~"return false;\n}\n"; } else { static if (T.length) enum string enumParserImpl = "if (s == `"~T[0]~"`) return (v = "~name~"."~T[0]~"), true;\n" ~enumParserImpl!(name, false, T[1 .. $]); else enum string enumParserImpl = ""; } }
Nov 05 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 Also, if you have a good benchmark for AAs, please create a benchmarks/
 directory and submit it.

You may try this too (C code, the "khash.h" file, plus few demos), the original code is not mine, with MIT License: http://www.fantascienza.net/leonardo/js/khash.zip I have found it to be quick for some of my usages. Bye, bearophile
Nov 05 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 dsimcha:
 Also, if you have a good benchmark for AAs, please create a benchmarks/
 directory and submit it.


 http://www.fantascienza.net/leonardo/js/khash.zip
 I have found it to be quick for some of my usages.
 Bye,
 bearophile

For the purposes of this project, I only accept D code. If you want to submit it, port it to D. Also, if you want to submit it, please submit it so we can browse the source code online instead of linking to a zip file that we then have to unzip, etc.
Nov 05 2009