www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - GC.calloc with random bits causes slowdown, also seen in built in

reply Michael Rynn <michaelrynn optusnet.com.au> writes:
Looked at the Associative Arrays project on http://www.dsource.org/
projects/aa, with the idea of testing and maybe uploading a trie class 
(radix tree)..

Ran the test program for PyDict noted the average result for consecutive 
runs of the one test it did, for  uint[unit]  associative arrays.

 This was very irritating, because of long wait for command line output. 
The test.d default sizes and number of runs were very big and only put 
out the average result at the end.

Changed test.d to output each individual run time.
The time for each individual run got progressively longer.

This was fixable by making the calloc block attribute be 
GC.BlkAttr.NO_SCAN.  So maybe the randomized uint bits confuse the 
garbage collector as it scans through.

In the PyDictD1 test code there was a call to disable the garbage 
collector. Thats cheating.

I looked at the code for PyDictD2.d  and I decided that the table of 
struct Entry,  holding the key and value, for which calloc is used , 
could be replaced by a more D - like (its a template anyway) call to new 
Entry[size].  The size is always a power of 2. 

Is it better to use new Entry[] rather than calloc,  since the GC has a 
map of which values are pointers and which are not?  Hopefully just like 
real D code.

Ran the test again after removing the calloc, and the speed improved and 
the progressive run time increase went away.

For comparison, I also put in a wrapper class template for the built in D 
AA type, and did the same tests. For uint[unit] this also had a 
progressive slow down in run-time.  Same issue found for the D1 builtin 
AA.

So the built-in associative array type of D suffers from this disease.
(under very stressful array size parameters of course). It might affect 
long running programs with big builtin AA tables. 


The mystery is  why was the slowdown progressive.  After each run the AA 
variable is deleted, and the GC gets told to delete the calloc array 
directly. If the previous block of memory is properly gone, and a new 
cleared block created, shouldn't the run results be the same each time? 

So now the PyDict does not use calloc. and is faster than the built-in. I 
added a few missing properties (length, opIndex) 

The tests for string[uint] were re-instated.  The built in AA had some 
progressive run slowdown tendency for this as well, although not as 
marked.

The trie class (which has been submitted) had made was miserably slow in 
comparison to the PyDict for inserts.  Lookups were much faster, but 
still not in the same ballpark as AA. 
 
My prior excitement about the lookup speed of the radix tree was actually 
due to an embarrassment of having code in an assert statement removed on 
the release compile.

The trie class however, at least can do prefix key completion well.

PyDictD1 is updated with the same change. 
The code is updated in SVN, so anyone can check out the changes. (and 
even update with any Dsource login).

--tof--

Michael Rynn
Mar 10 2010
next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 10 Mar 2010 20:33:32 -0500, Michael Rynn  
<michaelrynn optusnet.com.au> wrote:
 Looked at the Associative Arrays project on http://www.dsource.org/
 projects/aa, with the idea of testing and maybe uploading a trie class
 (radix tree)..

 Ran the test program for PyDict noted the average result for consecutive
 runs of the one test it did, for  uint[unit]  associative arrays.

  This was very irritating, because of long wait for command line output.
 The test.d default sizes and number of runs were very big and only put
 out the average result at the end.

 Changed test.d to output each individual run time.
 The time for each individual run got progressively longer.

 This was fixable by making the calloc block attribute be
 GC.BlkAttr.NO_SCAN.  So maybe the randomized uint bits confuse the
 garbage collector as it scans through.

 In the PyDictD1 test code there was a call to disable the garbage
 collector. Thats cheating.

 I looked at the code for PyDictD2.d  and I decided that the table of
 struct Entry,  holding the key and value, for which calloc is used ,
 could be replaced by a more D - like (its a template anyway) call to new
 Entry[size].  The size is always a power of 2.

 Is it better to use new Entry[] rather than calloc,  since the GC has a
 map of which values are pointers and which are not?  Hopefully just like
 real D code.

 Ran the test again after removing the calloc, and the speed improved and
 the progressive run time increase went away.

 For comparison, I also put in a wrapper class template for the built in D
 AA type, and did the same tests. For uint[unit] this also had a
 progressive slow down in run-time.  Same issue found for the D1 builtin
 AA.

 So the built-in associative array type of D suffers from this disease.
 (under very stressful array size parameters of course). It might affect
 long running programs with big builtin AA tables.


 The mystery is  why was the slowdown progressive.  After each run the AA
 variable is deleted, and the GC gets told to delete the calloc array
 directly. If the previous block of memory is properly gone, and a new
 cleared block created, shouldn't the run results be the same each time?

 So now the PyDict does not use calloc. and is faster than the built-in. I
 added a few missing properties (length, opIndex)

 The tests for string[uint] were re-instated.  The built in AA had some
 progressive run slowdown tendency for this as well, although not as
 marked.

 The trie class (which has been submitted) had made was miserably slow in
 comparison to the PyDict for inserts.  Lookups were much faster, but
 still not in the same ballpark as AA.
 My prior excitement about the lookup speed of the radix tree was actually
 due to an embarrassment of having code in an assert statement removed on
 the release compile.

 The trie class however, at least can do prefix key completion well.

 PyDictD1 is updated with the same change.
 The code is updated in SVN, so anyone can check out the changes. (and
 even update with any Dsource login).

 --tof--

 Michael Rynn

This sounds like a false sharing problem. The current GC isn't precise, so because D's AA nodes use pointers, the entire node is scanned for pointers, including the hash. As for the progressive nature, consider: during the first run many nodes are created and a few don't get deleted due to normal false sharing (stack, etc). Since these nodes exhibit false sharing themselves, they pin more nodes, etc, in a self perpetuating way. Not sure what was up with calloc though.
Mar 10 2010
prev sibling parent reply Moritz Warning <moritzwarning web.de> writes:
Hi,

the reason for using calloc in PyDict was because GC slows down 
allocation way too much.
Maybe it is/was a bug in the GC. I used manual memory management
to cut this problem and I got a huge speed improvement.


On Thu, 11 Mar 2010 01:33:32 +0000, Michael Rynn wrote:

 Looked at the Associative Arrays project on http://www.dsource.org/
 projects/aa, with the idea of testing and maybe uploading a trie class
 (radix tree)..

 In the PyDictD1 test code there was a call to disable the garbage
 collector. Thats cheating.

 I looked at the code for PyDictD2.d  and I decided that the table of
 struct Entry,  holding the key and value, for which calloc is used ,
 could be replaced by a more D - like (its a template anyway) call to new
 Entry[size].  The size is always a power of 2.

to runtime functions.
 Ran the test again after removing the calloc, and the speed improved and
 the progressive run time increase went away.

Maybe there was a bug fixed?
 So now the PyDict does not use calloc. and is faster than the built-in.
 I added a few missing properties (length, opIndex)

Also, opIndex might crash if the key is not present. I don't write D2 code., but shouldn't opIndex return a pointer to the entry?
Mar 11 2010
next sibling parent Michael Rynn <michaelrynn optusnet.com.au> writes:
 
 So now the PyDict does not use calloc. and is faster than the built-in.
 I added a few missing properties (length, opIndex)

might crash if the key is not present. I don't write D2 code., but shouldn't opIndex return a pointer to the entry?

Thank you, I needed to improve my understanding the finer details of this implementation. // call opIndex, which must return the value type // may throw exception ValueType v = aa[key]; // only works for V opIndex(..) ValueType* v = key in aa; // only works for V* opIn_r(..) I should probably continue on this and support all the properties mentioned in - http://www.digitalmars.com/d/2.0/arrays.html#associative This includes having a .length property. PyDictD2 is now updated with a .get, and the length() returns the same as size(). Also if the aim is to make a drop in replacement for the AA, the AA should have an implementation available as a struct, (even have the same .sizeof) , so it can be placed directly in other structs, local variables and classes just like the built-in AA, and no other code will have to change. I am creating a new module based on PyDictD2, to this effect, hope to have available ASAP, since it will be mainly refactoring. If I do not finish it tonight, it will have to be next week. When there is such a drop in replacement, it will be easier for others to try it out in existing code and uncover quirks and limitations. Then there might be benefits. -- taf Michael
Mar 11 2010
prev sibling next sibling parent Michael Rynn <michaelrynn optusnet.com.au> writes:
 When there is such a drop in replacement, it will be easier for others
 to try it out in existing code and uncover quirks and limitations.  Then
 there might be benefits.

Committed a struct AA wrapper and a modified PyDict class in hash\pydict.d as builtin AA behave-alike, to the DSource - aa project. Actually the builtin is initially about 10% bit faster in uint[uint] than the PyDict. Based on the first few runs that is, before progressive slowdown of builtin takes affect. Perhaps some crafty optimization could narrow the gap a bit. There appears to be no significant performance effect of wrapping the modified PyDict class as an implementation reference inside the struct. Back next week --=taf Michael.
Mar 12 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 12 Mar 2010 07:46:50 -0500, Michael Rynn  
<michaelrynn optusnet.com.au> wrote:

 When there is such a drop in replacement, it will be easier for others
 to try it out in existing code and uncover quirks and limitations.  Then
 there might be benefits.

Committed a struct AA wrapper and a modified PyDict class in hash\pydict.d as builtin AA behave-alike, to the DSource - aa project. Actually the builtin is initially about 10% bit faster in uint[uint] than the PyDict. Based on the first few runs that is, before progressive slowdown of builtin takes affect. Perhaps some crafty optimization could narrow the gap a bit. There appears to be no significant performance effect of wrapping the modified PyDict class as an implementation reference inside the struct. Tried with dmd2 2.041, but the built-in AA crashed sometime during a what looked like a rehash, probably triggered by a big resize. This happened in both debug and release versions. This is strange, because a winmerge comparison of the D-runtime source aaA.d, showed no significant difference apart from the removal of a AA_equals function. Back next week

Oops, sorry about that. The array allocation and appending went through a major overhaul, and I introduced a nasty bug that causes this. Please apply the fix identified here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=17841 Sorry again. I hope this can be fixed soon. -Steve
Mar 12 2010
prev sibling next sibling parent Michael Rynn <michaelrynn optusnet.com.au> writes:
 a major overhaul, and I introduced a nasty bug that causes this.  Please
 apply the fix identified here:
 http://www.digitalmars.com/webnews/newsgroups.php?

 
 Sorry again.  I hope this can be fixed soon.
 
 -Steve

This particular progressive slowdown of built-in AA is not due to the bug introduced with 2-040. It was present in 2-039, and is also observable in the dmd, druntime (rev 265 for lifetime.d), phobos, built from latest SVN for the fixed 2-041 The pydict performance still 25% worse than fastest builtin AA, but does not slow anything down. Once the builtin AA tests have been run, everything slows down, including the pydict. Michael ----
Mar 14 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 14 Mar 2010 21:03:38 -0400, Michael Rynn  
<michaelrynn optusnet.com.au> wrote:

 a major overhaul, and I introduced a nasty bug that causes this.  Please
 apply the fix identified here:
 http://www.digitalmars.com/webnews/newsgroups.php?

 Sorry again.  I hope this can be fixed soon.

 -Steve

This particular progressive slowdown of built-in AA is not due to the bug introduced with 2-040. It was present in 2-039, and is also observable in the dmd, druntime (rev 265 for lifetime.d), phobos, built from latest SVN for the fixed 2-041 The pydict performance still 25% worse than fastest builtin AA, but does not slow anything down. Once the builtin AA tests have been run, everything slows down, including the pydict.

Sorry, I meant the crashing bug was introduced :) I can't take credit for the original issue you were chasing. -Steve
Mar 15 2010