
digitalmars.D - Associative Arrays need cleanout method or property to help garbage

reply Michael Rynn <michaelrynn optusnet.com.au> writes:
The use of the built-in Associative Array can have deleterious effects on 
Garbage Collection.


D by default uses garbage collection for memory management.

This means that any blocks of memory that contain random bits can be 
mistaken as containing pointers to valid data, and can prevent complete 
memory block recycling. This leaves increasing amounts of memory garbage 
lying around to progressively worsen allocation performance.

It's easy to observe a slowdown of general performance after using a 
builtin associative array to hold a large number of randomly generated 
keys and values.

To better observe and understand how the builtin runtime AA behaves, and 
to see how it might be improved or better used, I created a module that 
implements the same algorithm as the builtin AA.

This has been added to the Dsource aa project as trunk/hash/aatree.d.

The actual AA is implemented as a class. To mimic the builtin AA, a 
template structure uses a reference to the class to carry out the builtin 
AA functions. The code was a direct adaptation of the code found in the 
current version 2.041 of aaA.d.

This is a useful class because it is possible to experiment with the AA 
functionality without affecting or needing to rebuild the D runtime 
library.
The aatree module exhibited the same slowdown behaviour as the builtin AA 
using the random key-value test. This slowdown can be abolished by 
forcing the AA to clean up its nodes before the AA goes out of scope.

The structure of the builtin AA is simple. The main structure is a 
Node, which holds the key and its hash value, the associated data value, 
and two pointers, to a left and a right Node. The pointers make up a tree 
of key-value pairs. All the nodes in a tree are related by having the same 
value of (hash % tablesize), which gives the index into the main Node 
pointer array. The algorithm allocates a node for every association.
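Roughly, the node and bucket indexing described above look like this (a 
sketch only; the field names are mine, not the actual aaA.d identifiers):

```d
// Hypothetical sketch of the builtin AA's node, as described above.
// Field names are illustrative, not the actual aaA.d identifiers.
struct Node(K, V)
{
    Node* left;   // collisions within a bucket form a binary tree
    Node* right;
    hash_t hash;  // cached hash of the key
    K key;
    V value;
}

// All nodes in one tree share the same bucket index:
size_t bucketIndex(hash_t hash, size_t tableLength)
{
    return hash % tableLength;
}
```

For example, with an 8-entry table, keys hashing to 17 and 25 both land 
in bucket 1 and so end up in the same tree.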

The other feature of the builtin AA is that it uses the TypeInfo object 
for the key to implement the hash and compare, and this is also done in 
the aatree module.

The AATree class holds the table of node pointers. As the number of 
nodes increases with insertions, once it exceeds 4 times the 
table length, the table size is increased and a rehash operation is done 
to redistribute the nodes.
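The growth trigger can be sketched as follows (the 4x trigger ratio is 
from the description above; the x4 growth step is an assumption, and the 
real table may well use a sequence of prime sizes instead):

```d
// Sketch of the growth policy described above: grow once the node count
// exceeds 4x the table length. The x4 growth step is an assumption.
bool needsGrow(size_t nodeCount, size_t tableLength)
{
    return nodeCount > 4 * tableLength;
}

size_t grownLength(size_t tableLength)
{
    return tableLength * 4;
}
```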

The test main module is testhash.d. For comparison with a different 
algorithm, the pydict hash implementation is also tested. Since both are 
template library modules, the relative effectiveness of the algorithms 
can be more equitably compared.


These are the results of running testhash on a 32-bit Windows XP OS 
running as a virtual guest on Linux. From the command line, the number of 
runs, the number of insertions, and a test selection can be chosen.

50 repetitions of 500000,   -release -O   (optimisation does improve 
performance)

uint[uint]		best/average
-------------		--------------
PyDict		0.12/0.13 seconds

AATree		0.36/1.13 seconds
		Best was the first run; times were noted to be increasing 
		asymptotically towards 1.5 seconds.

Builtin (after recompile)
		0.33/**  The number of seconds increased with every run 
		by about 1 second, so run 27 took 34.5 seconds, after 
		which I did a Ctrl-C.

uint[string]		best/average
-------------		--------------
PyDict		0.22 insertions
		0.17 lookups

AATree		0.43/1.27 insertions
		0.16/0.21 lookups

Builtin		0.43/**  insertions
		0.16/**  lookups
		Insertion time tends to worsen with increasing run number 
		(noisy); run 30 took 25 seconds.




As another experiment, I modified aatree to use the Tango Container 
module to allocate nodes, and coded things to ensure that this container 
and the aatree table_ were wiped out after each run. This is controlled by 
compiling with version=TangyHeap. tangy.util.container.Container is a 
D2-adjusted version of the same Tango module. I seem to have done a useful 
bit of D2 Tango adaptation so far, and I wonder how many others have 
secret Tango D2 adaptations.

This ensures that the nodes are not directly allocated by the GC, but 
belong to an array. This simplifies the clear() method implementation.

The pydict is a good implementation in that it combines node allocation 
and table in one, and so needs no special node allocation heap.


The figures for AATree for this version were

uint[uint]	0.23/0.23	seconds


uint[string]	0.33/0.34 insertions
		0.16/0.16 lookups


There is no slowdown, and this is the best performance yet from the 
AATree. The figures might be improved by tweaking or customising the 
Container allocation. The AATree has slightly better lookups than the 
pydict implementation.

So the slowdown in performance of the builtin arrays is due to the 
garbage collector's failure to delete unused nodes, and also to the cost 
of allocating a lot of nodes directly from the GC, possibly with a 
component of heap fragmentation. The builtin AA does free individual 
nodes on remove, but otherwise has no optimal built-in clear operation.



Without using tangy.util.container.Container, but still properly 
clearing out the node table_ and nodes, the following AATree results 
were obtained.

uint[uint]	0.36/0.37	seconds


uint[string]	0.45/0.48 insertions
		0.20/0.21 lookups

Node allocation and cleanup take a bit more work without the special 
heap.

My conclusion is that when AA implementations are scoped or tied to 
classes with known lifetimes, every effort should be made to help out the 
garbage collector. A clear method/property should be provided to help 
achieve this. "clear" may not be the best method name: that name was 
already accepted by the builtin AA code. It compiled, apparently as a 
generic Object method, but it did nothing useful for memory management.

Otherwise some node data may point into the table memory, and then no 
nodes can be freed.

Michael Rynn.
Mar 15 2010
next sibling parent dsimcha <dsimcha yahoo.com> writes:
I've been arguing for a long time that the interaction between a 
conservative GC without thread-local allocators and the builtin AA is 
horrible. Keep this in mind: you have to take the GC lock for **EVERY 
SINGLE INSERTION** into a builtin AA. This makes them absolutely 
worthless in multithreaded environments. Even in single-threaded mode, 
they create ridiculous amounts of false pointers and heap fragmentation.

I even went as far as to create my own AA implementation, called RandAA, 
specifically designed for conservative GC. It uses parallel key and value 
arrays (to save on alignment overhead and allow only keys or only values 
to be scanned by the GC) and randomized probing. In real-world programs 
where false pointers were eating me alive with the builtin, RandAA worked 
fine. Unfortunately, it's succumbed to bit rot a little, but if there's 
interest in it again, I'll fix the issues and post a link.
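The parallel-array idea can be sketched like this (an assumed layout, not 
RandAA's actual code): an array whose element type contains no pointers 
can be allocated with NO_SCAN, so the GC never scans it for false 
pointers even when the companion array must be scanned.

```d
import core.memory : GC;
import std.traits : hasIndirections;

// Sketch of the parallel key/value array idea: each array is allocated
// NO_SCAN when its element type holds no pointers, e.g. a uint value
// array is never scanned even if the string keys must be.
struct ParallelStorage(K, V)
{
    K* keys;
    V* vals;
    size_t length;

    void allocate(size_t n)
    {
        enum keyAttr = hasIndirections!K ? GC.BlkAttr.NONE : GC.BlkAttr.NO_SCAN;
        enum valAttr = hasIndirections!V ? GC.BlkAttr.NONE : GC.BlkAttr.NO_SCAN;
        keys = cast(K*) GC.malloc(K.sizeof * n, keyAttr);
        vals = cast(V*) GC.malloc(V.sizeof * n, valAttr);
        length = n;
    }
}
```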
Mar 16 2010
prev sibling parent reply Moritz Warning <moritzwarning web.de> writes:
On Tue, 16 Mar 2010 04:41:55 +0000, Michael Rynn wrote:

 The use of built in Associative Array can have deleterious effects on
 Garbage Collection.

Fwiw, here is the original Python Dictionary implementation:
http://svn.python.org/projects/python/trunk/Objects/dictobject.c

This is also a nice read:
http://svn.python.org/projects/python/trunk/Objects/dictnotes.txt

As for differences to the D impl.: the decision logic for table resizing 
is a bit different (Python uses a load factor of 2/3, and table doubling 
for more than 50K entries, *4 else). Python also wraps everything in 
Objects and attaches the hash when computed.

As for your modifications to the PyDict code: why have you changed the 
length() implementation? As far as I remember, it's wrong to just return 
"used".

Also, can you add your timing to the aa project homepage?
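That resize policy might look like this in D (constants are taken from 
the description above, not checked against dictobject.c):

```d
// Resize when the table is more than 2/3 full; grow to 4x the used
// count for small dicts and 2x once past 50K entries.
bool needsResize(size_t used, size_t tableSize)
{
    return 3 * used >= 2 * tableSize;   // load factor at or above 2/3
}

size_t grownSize(size_t used)
{
    return used > 50_000 ? used * 2 : used * 4;
}
```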
Mar 16 2010
next sibling parent Michael Rynn <michaelrynn optusnet.com.au> writes:
On Tue, 16 Mar 2010 16:12:25 +0000, Moritz Warning wrote:

 On Tue, 16 Mar 2010 04:41:55 +0000, Michael Rynn wrote:
 
 The use of built in Associative Array can have deleterious effects on
 Garbage Collection.

Fwiw, here is the original Python Dictionary implementation:
http://svn.python.org/projects/python/trunk/Objects/dictobject.c

This is also a nice read:
http://svn.python.org/projects/python/trunk/Objects/dictnotes.txt

As for differences to the D impl.: the decision logic for table resizing 
is a bit different (Python uses a load factor of 2/3, and table doubling 
for more than 50K entries, *4 else). Python also wraps everything in 
Objects and attaches the hash when computed.

I am beginning to get a headache playing around with this. There must be 
better things to do.

I have refined the test program a bit more and added some missing bits to 
aaht by calling the implementation class. For a better formatted 
statistics summary, see
http://www.dsource.org/projects/aa/wiki/HashTreeBenchmark

The following summary statistics were obtained. The no-cleanup averages 
represent more or less the value of the middle run number, due to 
slowdown, except for aapy. Even so, doing a cleanup improved the aapy 
results, for which insertion times are very impressive. I attribute this 
to not needing to allocate memory for each insertion. Lookups for aapy 
are on average about 7-8% slower.

The bits I find interesting are: when doing cleanups, the aaht library 
version of the builtin AA is about 33% faster on insertions than the 
builtin AA, although a few percent slower on lookups. Using a Tangy heap 
Container for node allocation increases performance to 60% faster than 
builtin, and brings the lookups to par. It also allows a super fast 
cleanup.

Since the builtin has no built-in cleanup to call, I used the following 
template to do the task inefficiently. The aaht version with a custom 
clear did the task 1.5 to 2.5 times faster (no TangyHeap), or really 
super fast (TangyHeap).

void clear(K,V)(ref V[K] aa)
{
    K[] keyset = aa.keys;
    foreach(ks; keyset)
    {
        aa.remove(ks);
    }
}

A call to aa.clear then cleans up in reasonable time.
summary average 10 runs of size 500_000

no-cleanup uint[uint]	insert	lookup
builtin(K,V)		4.395	0.0388
aaht(K,V)		4.5906	0.0426
aapy(K,V)		0.139	0.0423

cleaned-up uint[uint]	insert	lookup	cleanup
builtin(K,V)		0.4601	0.0376	0.1252
aaht(K,V)		0.3021	0.0399	0.081
aapy(K,V)		0.0957	0.0438	0.0002

no-cleanup uint[string]	insert	lookup
builtin(K,V)		3.4727	0.1596
aaht(K,V)		3.6453	0.1646
aapy(K,V)		0.9753	0.1735

cleaned-up uint[string]	insert	lookup	cleanup
builtin(K,V)		0.668	0.1658	0.2609
aaht(K,V)		0.4396	0.1621	0.1001
aapy(K,V)		0.2286	0.1717	0.0002

After recompile with aaht version=TangyHeap:

uint[uint]	aaht(K,V)	0.188	0.0387	0.0004
uint[string]	aaht(K,V)	0.3391	0.1609	0.0001

I have no empirical or logical justification for when to resize tables. I 
just copied the previous code and hope I got it right. There must have 
been some previous research on the number of times a lookup or insert 
needs to check another slot because of hash collision, causing slowdown. 
But whether the ratio is 3/5 (60%), 2/3 (66%), or 3/4 (75%) has not yet 
made an observable difference for me in pydict.
 As for your modifications to the PyDict code: Why have you changed the
 length() implementation? As far as I remember, it's wrong to just return
 "used".

Formerly the 'used' field was the number of active nodes in the table 
itself. The relationship is seen mathematically in the size() property:

	length = used + is_dummy + is_unused;

where is_dummy and is_unused can be 1 or 0. I have rewritten this in an 
equivalent form:

	used_entries = length_ - is_dummy - is_unused;

I have now changed 'used' to become length_, and added a comment to make 
it plain. This length_ variable now includes the count of the usage of 
the 2 extra nodes for unused_hash(0) and dummy_hash(1). The length_ count 
will get bumped up or down whenever these nodes are set or unset 
respectively, as if they were normal nodes. 'used_entries' is only 
calculated just before readjusting the table size. Computationally this 
makes trivially no difference; I preferred it because I thought length_ 
would be accessed more often than 'used_entries', giving it the 
possibility of a no-calculation property access.

I want to add that the problems of AAs with garbage collection are not 
particular to the implementation of the built-in AA. The number of nodes 
that the builtin AA creates will increase GC scanning, but the AA 
slowdowns also occur using the pyDict implementation with string keys.

The original PyDictD2 slowed down for both uint and string keys, unless I 
set NO_SCAN calloc. Then I changed the implementation to use new Entry[] 
instead of a calloc. This helped the uint[uint] test, but not the 
uint[string] test. For uint[string] the pyDict still slows progressively 
after each run, though not nearly as much as the builtin AA. The only way 
I can prevent this is to forcefully clean up the AA after each use, for 
whatever implementation. The testhash.d command line 3rd option for the 
test select shows this:

option 5  - aapy uint[string] - no cleanup -- slows down
option 15 - aapy uint[string] - cleanup    -- no slow down

Ditto for aabuiltin - 1, 11 (except that it always slows down, because no 
cleanup is possible) and aaht - 3, 13.
For immutable(char)[] keys and uint values, the AA needs to be scanned 
because the AA node holds a reference to the key. If the key is not 
reachable and referenced elsewhere, the key array would be deleted by the 
GC, and the AA key would then reference freed memory. The testhash 
program is holding references to the original string keys in a dynamic 
array.

I also tested a version that makes sure all the AA values are 0, to help 
the garbage collector, but it still slows down if no cleanup is 
requested. Another version flag calls GC.collect before each run. It's as 
if bits in the string keys are being misinterpreted as pointers.

With the current D garbage collector, using any AA implementation, 
builtin or otherwise, requires the ability to enforce cleanup. I do not 
expect any harm from using enforced AA memory management.

From my reading of the Hans Boehm garbage collector, it is possible to 
provide it with hints as to what to interpret as pointers in a structure 
or array. It would be nice to be able to do this from TypeInfo in D. I do 
not know if it is done to any extent at all in D. I suppose there is 
always both a development and a performance penalty for this sort of 
thing. There also appears to be a penalty for not doing it.

While the GC cannot be fixed real soon, or replaced with something more 
discriminating or enhanced with TypeInfo hints, for many applications D 
and its libraries must interweave manual memory management with garbage 
collection, as in this little aa project. Applications are able to delete 
and clear things with some degree of safety encapsulation and stack 
scoping, and still let the garbage collector worry about what it can cope 
with. Mono dev is trying to wean off Hans Boehm and use something more 
exact and generational.

Given the failings of garbage collection for most AA implementations, all 
current D AA implementations need a manual cleanup method.
Other alternatives include being able to specify a non-scanned memory 
implementation where the programmer deems it safe. Having a cleanup 
method for the AA makes memory management more flexible and accessible, 
and when used will nearly always improve overall performance.

Improve the AA or fix the GC: which is easier? The aaht version with the 
heap node allocation and fast cleanup seems to be a better AA than the 
built-in. The aapy has by far the best insertion results. But do not take 
this very restricted and rather extreme sampling of possible test 
parameters to be a reliable guide to relative performance when used in 
other applications with differing environments.

---
Michael Rynn.
Mar 16 2010
prev sibling next sibling parent reply Moritz Warning <moritzwarning web.de> writes:
On Wed, 17 Mar 2010 03:07:57 +0000, Michael Rynn wrote:

 On Tue, 16 Mar 2010 16:12:25 +0000, Moritz Warning wrote:
 
 On Tue, 16 Mar 2010 04:41:55 +0000, Michael Rynn wrote:
 


 
 But do not take this very restricted and rather extreme sampling of
 possible test parameters to be a reliable guide to relative performance
 when used in other applications with differing environments. ---
 Michael Rynn.

Real world usage is hard to test. The Python docs say that common usage 
is mostly about very small aas with frequent accesses but rare changes.

Anyway, thanks for updating the dsource.org/projects/aa wiki, adding 
tests and code.
Mar 20 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Moritz Warning:
 The Python docs say that common usage is mostly about very small aas
 with frequent accesses but rare changes.

D and Python will probably have different usage patterns for AAs. In 
Python, dicts (AAs) are used to pass arguments to functions (that's why 
recursive functions are so slow in Python) and to represent most 
namespaces (there are a few exceptions, like __slots__, but they are 
uncommon), so in a big percentage of cases they contain few string-value 
pairs (there is an optimization for string-only keys); that's why Python 
dicts have an optimization for fewer than about 8 key-value pairs.

In D, AAs are easy to use, so people will probably use smaller AAs 
compared to the AAs used in C++/Java, so I think a small-AA optimization 
can be useful in D too. But little AAs are not used for all those Python 
purposes, so probably on average they are going to be larger.

A way to know the usage patterns of D AAs is to add instrumentation in 
nonrelease mode; they can save a few statistical data once in a while in 
the install directory of D, and then the user can give this little txt 
file to Walter to tune the language :-)

Bye,
bearophile
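The small-AA idea could be sketched as a hypothetical inline-storage 
wrapper (the names are mine, and the threshold of 8 is borrowed from the 
CPython note above, purely for illustration): keep the first few pairs in 
flat fixed-size arrays, and only fall back to a real hash table on 
overflow.

```d
// Hypothetical small-AA wrapper: up to N pairs live in inline arrays
// (linear search, no allocation); beyond N everything spills into a
// builtin AA. Names and threshold are illustrative.
struct SmallAA(K, V, size_t N = 8)
{
    K[N] smallKeys;
    V[N] smallVals;
    size_t count;
    V[K] big;   // fallback builtin AA once we overflow

    void insert(K k, V v)
    {
        if (big !is null) { big[k] = v; return; }
        foreach (i; 0 .. count)
            if (smallKeys[i] == k) { smallVals[i] = v; return; }
        if (count < N)
        {
            smallKeys[count] = k;
            smallVals[count] = v;
            ++count;
        }
        else
        {
            // spill the inline pairs into the hash table, then insert
            foreach (i; 0 .. count) big[smallKeys[i]] = smallVals[i];
            big[k] = v;
        }
    }

    V* lookup(K k)
    {
        if (big !is null) return k in big;
        foreach (i; 0 .. count)
            if (smallKeys[i] == k) return &smallVals[i];
        return null;
    }
}
```

For a handful of entries the linear scan stays in one cache line and 
allocates nothing, which is the whole point of the optimization.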
Mar 20 2010
next sibling parent Rainer Deyke <rainerd eldwood.com> writes:
On 3/20/2010 13:21, bearophile wrote:
 Moritz Warning:
 The Python docs say that common usage is mostly about very small
 aas with frequent accesses but rare changes.

D and Python will probably have different usage patterns for AAs. In Python dicts (AAs) are used to give arguments to functions (that's why recursive functions are so slow in Python) and to represent most name spaces (there are few exceptions like the __slots__, but they are uncommon), so in a big percentage of cases they contain few string-value pairs (they have an optimization for string-only keys), that's why python dicts have an optimization for less than about 8 key-value pairs.

I don't know about Java or D, but my usage pattern for 
boost::unordered_map in C++ is rather similar to that of Python dicts:

- Most keys are strings (either std::string or my own interned string 
  class).
- The amount of read accesses far exceeds the amount of write accesses. 
  Many AAs are write-once, read-often.
- Small AAs are common, although probably not as common as in Python.
- I even use boost::unordered_map as the dictionary type in my own 
  Python-like scripting language. This accounts for around 10% of my 
  total usage of boost::unordered_map.

-- 
Rainer Deyke - rainerd eldwood.com
Mar 20 2010
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 A way to know the usage patterns of D AAs is to add instrumentation in
 nonrelease mode, they can save few statistical data once in a while in the
 install directory of D, and then the user can give this little txt file to
 Walter to tune the language :-)

There's no need to tune the language. The implementation and data structure of the AAs is completely opaque to the language. The implementation is in aaA.d. Feel free to try different implementations in it!
Mar 22 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Michael Rynn wrote:
 On Mon, 22 Mar 2010 00:52:24 -0700, Walter Bright wrote:
 
 bearophile wrote:
 A way to know the usage patterns of D AAs is to add instrumentation in
 nonrelease mode, they can save few statistical data once in a while in
 the install directory of D, and then the user can give this little txt
 file to Walter to tune the language :-)

There's no need to tune the language. The implementation and data 
structure of the AAs is completely opaque to the language. The 
implementation is in aaA.d. Feel free to try different implementations in 
it!

Well, I have.

See the code here:
http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d

See the benchmarks and comments here:
http://www.dsource.org/projects/aa/wiki/DrunTimeAA

The result squeezes some extra performance for integer or smaller sized 
keys (about 20% faster for lookups). Insertions and cleanups are faster 
too.

It's a nice implementation. There are some problems with it, though:

1. Deleted entries cannot be returned to the garbage collector pool 
unless the entire hash table is removed.

2. All the keys are stored in a single array, as are all the values. This 
requires that a contiguous chunk of memory can be found for the arrays 
when rehashing to a larger hashmap. This could make it very difficult for 
this to work when hash tables grow to be of significant size relative to 
all available memory. Note that the current implementation has a bucket 
array with as many entries as the number of elements in the hashtable, 
but the element size is only a pointer, not the arbitrary size of a key 
or value.

3. Rehashing (required as the hashmap grows) means that the key/value 
arrays must be reallocated and copied. That's fine for the keys, but for 
the values this is a huge problem because the user may have dangling 
pointers into the values. (This is why Andrei started the thread entitled 
"An important potential change to the language: transitory ref".) I think 
this is currently the killer problem for randAA's approach.
Mar 23 2010
parent Walter Bright <newshound1 digitalmars.com> writes:
Michael Rynn wrote:
 I do not think this is true of current builtin AA,

Correct, it is not.
 and therefore not true 
 of this enhanced implementation. I note the current builtin _aaGet will 
 locate a new node by pointer, maybe do a rehash, and then return the node 
 it previously relocated.  Relax, this AA has no fear of nodes moving 
 around on rehash. They are just relinked. All done with pointers. 

I was looking at:
http://www.dsource.org/projects/aa/browser/trunk/randAA/RandAA.d

which contains the data structures:

K* _keys;
V* vals;
ubyte* flags;

i.e. 3 parallel arrays. I apologize, since you are apparently not 
referring to that, but to this:
http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d

but that uses the data structure:

aaNode *left_;
aaNode *right_;
static if (keyBig) hash_t hash_;

so I don't see where the linear congruent random probe thingy is <g>. You 
*could* get rid of the left and right pointers and use linear congruent 
probing, and that might be worth a try.
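The probe sequence being suggested might look like this (a sketch only; 
the constants follow the CPython-style recurrence j = 5*j + 1 + perturb, 
and a power-of-two table size is assumed, since the core 5*j + 1 
recurrence has full period modulo a power of two):

```d
// Sketch of a linear congruential probe sequence over a power-of-two
// table. Once perturb shifts down to zero, the pure 5*j + 1 recurrence
// cycles through every slot exactly once.
size_t[] probeSequence(hash_t hash, size_t tableSize, size_t steps)
{
    size_t[] seq;
    size_t perturb = hash;
    size_t j = hash & (tableSize - 1);
    foreach (_; 0 .. steps)
    {
        seq ~= j;
        perturb >>= 5;
        j = (5 * j + 1 + perturb) & (tableSize - 1);
    }
    return seq;
}
```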
Mar 23 2010
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 17 Mar 2010 02:57:34 -0300, Michael Rynn  
<michaelrynn optusnet.com.au> wrote:
[snip]
 I want to add that the problems of AA with garbage collection are not
 particular to the implementation of built-in AA. The number of nodes that
 builtin AA creates will increase GC scanning, but the AA slowdowns also
 occurs using the pyDict implementation with string keys.

Actually, you can design the AA to 'play nice' with D's GC. See RandAA.
Mar 20 2010
prev sibling next sibling parent Michael Rynn <michaelrynn optusnet.com.au> writes:
On Sat, 20 Mar 2010 16:55:15 -0300, Robert Jacques wrote:

 On Wed, 17 Mar 2010 02:57:34 -0300, Michael Rynn
 <michaelrynn optusnet.com.au> wrote:
 [snip]
 I want to add that the problems of AA with garbage collection are not
 particular to the implementation of built-in AA. The number of nodes
 that builtin AA creates will increase GC scanning, but the AA slowdowns
 also occurs using the pyDict implementation with string keys.

Actually, you can design the AA to 'play nice' with D's GC. See RandAA.

I have done a lot of work comparing and testing the various AA 
implementations with uint and string keys. The results are on
http://www.dsource.org/projects/aa/wiki/HashTreeBenchmark
The adaptation sources and benchmark program are in the 
http://www.dsource.org/projects/aa source tree.

I included 2 variants of the RandomAA in the test results: one using the 
current pseudo-random sequence, and the other using a similar slot 
sequence algorithm to the pydict implementation. I called the 2nd class 
the parallel-py AA (still in the RandAA file as the default template 
option). The parallel-rd sequence using pseudo-random lookup was a clear 
performance loser, possibly because the keys are well randomized already. 
Parallel-py outperforms the pydict on uint keys, but lagged on string 
keys. So the idea of using separate arrays to hide from unnecessary GC 
scans may be a help.

The best performer all round was the Tango-based hashmap. Builtin AA has 
a few issues with really big datasets as noted before, and was not so 
fast with insertions and integer-sized keys. The big dataset GC problem 
is hopefully and probably not an issue for smaller datasets (< 50,000), 
unless maybe the data has a lot of unlucky significant pointers, 
something very hard to control.

There is now a template code version of the builtin AA class 
(hash/hashtree.d) with added optimizations of node heap management (from 
the Tango Container), no hash necessary for integer-sized keys, and 
enforceable cleanup. This optimized hashtree performed as well on these 
benchmarks as the Tango hashmap. I also added some enforceable cleanup 
for the Tango-based hashmap, so that GC failure issues did not affect 
timings on big datasets.

---
taf
Michael Rynn
Mar 21 2010
prev sibling next sibling parent Michael Rynn <michaelrynn optusnet.com.au> writes:
On Mon, 22 Mar 2010 00:52:24 -0700, Walter Bright wrote:

 bearophile wrote:
 A way to know the usage patterns of D AAs is to add instrumentation in
 nonrelease mode, they can save few statistical data once in a while in
 the install directory of D, and then the user can give this little txt
 file to Walter to tune the language :-)

There's no need to tune the language. The implementation and data structure of the AAs is completely opaque to the language. The implementation is in aaA.d. Feel free to try different implementations in it!

Well, I have.

See the code here:
http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d

See the benchmarks and comments here:
http://www.dsource.org/projects/aa/wiki/DrunTimeAA

The result squeezes some extra performance for integer or smaller sized 
keys (about 20% faster for lookups). Insertions and cleanups are faster 
too.

--
taf Michael Rynn.
Mar 23 2010
prev sibling next sibling parent Michael Rynn <michaelrynn optusnet.com.au> writes:
On Tue, 23 Mar 2010 13:41:56 -0700, Walter Bright wrote:

 Michael Rynn wrote:
 On Mon, 22 Mar 2010 00:52:24 -0700, Walter Bright wrote:
 
 bearophile wrote:
 A way to know the usage patterns of D AAs is to add instrumentation
 in nonrelease mode, they can save few statistical data once in a
 while in the install directory of D, and then the user can give this
 little txt file to Walter to tune the language :-)

There's no need to tune the language. The implementation and data 
structure of the AAs is completely opaque to the language. The 
implementation is in aaA.d. Feel free to try different implementations in 
it!

Well, I have.

See the code here:
http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d

See the benchmarks and comments here:
http://www.dsource.org/projects/aa/wiki/DrunTimeAA

The result squeezes some extra performance for integer or smaller sized 
keys (about 20% faster for lookups). Insertions and cleanups are faster 
too.

It's a nice implementation. There are some problems with it, though:

1. Deleted entries cannot be returned to the garbage collector pool 
unless the entire hash table is removed.

2. All the keys are stored in a single array, as are all the values. This 
requires that a contiguous chunk of memory can be found for the arrays 
when rehashing to a larger hashmap. This could make it very difficult for 
this to work when hash tables grow to be of significant size relative to 
all available memory. Note that the current implementation has a bucket 
array with as many entries as the number of elements in the hashtable, 
but the element size is only a pointer, not the arbitrary size of a key 
or value.

The file aaA.d is not using the randomAA implementation. It's your very 
own original AA with modifications (added mag wheels, turbo-charged 
memory, specializations). So assertion 2 is not quite true, as this is 
still the same underlying algorithm as for the original AA.

I actually benchmarked and tried to give the random AA a fair chance out 
of politeness, for comparison interest, to demonstrate that its 
performance was not as good as the others, and to demonstrate that the 
builtin AA had further potential. I also compared the Tango hashmap, 
which is very good.

Your hashtree approach is to have a separate node allocated for each 
key-value association. So all the keys and values are not stored 
algorithmically in a single array, and do not actually require a 
contiguous large chunk of memory. There is however an intermediate heap 
between the AA and the garbage collector, which allocates 4K blocks of 
memory and, obviously, is unlikely to free any of those blocks until all 
nodes within are released from use, unless there are specially favourable 
insertion and deletion patterns.

Now the memory chunker could be chucked out, leaving just the 
specialization for the integer-sized keys, and that would settle 
assertion 1. This is a change of only a few lines of code, and could 
actually be versioned. What would also be lost is some degree of improved 
insertion and cleanup performance. And also some degree of that benchmark 
performance, which cheats, by the way, by looking up keys in the very 
same order that they were inserted, so that nodes stored in insertion 
order are likely to be grouped in memory together and so benefit from the 
CPU memory cache. I have not yet got around to changing it to a 
thoroughly stirred and mixed-up sequence for the lookups, which should 
put the figures up a bit.
 3. Rehashing (required as the hashmap grows) means that the key/value
 arrays must be reallocated and copied. That's fine for the keys, but for
 the values this is a huge problem because the user may have dangling
 pointers into the values. (This is why Andrei started the thread
 entitled "An important potential change to the language: transitory
 ref".) I think this is currently the killer problem for randAA's
 approach.

I do not think this is true of the current builtin AA, and therefore not 
true of this enhanced implementation. I note the current builtin _aaGet 
will locate a new node by pointer, maybe do a rehash, and then return the 
node it previously located. Relax: this AA has no fear of nodes moving 
around on rehash. They are just relinked. All done with pointers.

---
Michael Rynn
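The relink-on-rehash point can be sketched like this (names are assumed; 
the real code also compares keys, not just hashes, when walking a bucket 
tree): existing nodes are re-threaded into the new bucket array, so no 
node is copied and pointers held by callers stay valid.

```d
// Sketch of rehash-by-relinking: nodes keep their addresses; only the
// left/right links and the bucket array change.
struct Node(K, V)
{
    Node* left, right;
    hash_t hash;
    K key;
    V value;
}

void relinkInto(K, V)(Node!(K, V)* n, Node!(K, V)*[] newTable)
{
    if (n is null) return;
    auto l = n.left, r = n.right;
    n.left = n.right = null;

    // reinsert n at its new bucket; the node itself never moves in memory
    auto idx = n.hash % newTable.length;
    auto p = &newTable[idx];
    while (*p !is null)
        p = (n.hash < (*p).hash) ? &(*p).left : &(*p).right;
    *p = n;

    relinkInto(l, newTable);
    relinkInto(r, newTable);
}
```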
Mar 23 2010
prev sibling parent Michael Rynn <michaelrynn optusnet.com.au> writes:
 
 so I don't see where the linear congruent random probe thingy is <g>.
 You *could* get rid of the left and right pointers and use linear
 congruent probing, and that might be worth a try.

Thank you for the suggestion, I will give it a try.

---
Michael Rynn
Mar 24 2010