digitalmars.D - GC.calloc with random bits causes slowdown, also seen in built in
- Michael Rynn (53/53) Mar 10 2010 Looked at the Associative Arrays project on http://www.dsource.org/
- Robert Jacques (9/61) Mar 10 2010 This sounds like a false sharing problem. The current GC isn't precise, ...
- Moritz Warning (16/29) Mar 11 2010 Hi,
- Michael Rynn (27/33) Mar 11 2010
- Michael Rynn (11/14) Mar 12 2010 Committed a struct AA wrapper and a modified PyDict class in hash\pydict...
- Steven Schveighoffer (8/26) Mar 12 2010 Oops, sorry about that. The array allocation and appending went through...
- Michael Rynn (10/17) Mar 14 2010 This particular progressive slowdown of built-in AA is not due to the bu...
- Steven Schveighoffer (5/20) Mar 15 2010 Sorry, I meant the crashing bug was introduced :) I can't take credit f...
Looked at the Associative Arrays project on http://www.dsource.org/projects/aa, with the idea of testing and maybe uploading a trie class (radix tree). Ran the test program for PyDict and noted the average result for consecutive runs of the one test it did, for uint[uint] associative arrays. This was very irritating because of the long wait for command-line output: the test.d default sizes and number of runs were very big, and only the average result was printed at the end.

Changed test.d to output each individual run time. The time for each individual run got progressively longer. This was fixable by making the calloc block attribute GC.BlkAttr.NO_SCAN. So maybe the randomized uint bits confuse the garbage collector as it scans through.

In the PyDictD1 test code there was a call to disable the garbage collector. That's cheating.

I looked at the code for PyDictD2.d and decided that the table of struct Entry, holding the key and value, for which calloc is used, could be replaced by a more D-like (it's a template anyway) call to new Entry[size]. The size is always a power of 2. Is it better to use new Entry[] rather than calloc, since the GC has a map of which values are pointers and which are not? Hopefully just like real D code. Ran the test again after removing the calloc, and the speed improved and the progressive run-time increase went away.

For comparison, I also put in a wrapper class template for the built-in D AA type and ran the same tests. For uint[uint] this also had a progressive slowdown in run time. The same issue was found for the D1 built-in AA. So the built-in associative array type of D suffers from this disease (under very stressful array-size parameters, of course). It might affect long-running programs with big built-in AA tables.

The mystery is why the slowdown was progressive. After each run the AA variable is deleted, and the GC is told to delete the calloc array directly.
If the previous block of memory is properly gone, and a new cleared block created, shouldn't the run results be the same each time?

So now the PyDict does not use calloc, and is faster than the built-in. I added a few missing properties (length, opIndex). The tests for string[uint] were re-instated. The built-in AA had some progressive run-slowdown tendency for this as well, although not as marked.

The trie class (which has been submitted) was miserably slow in comparison to the PyDict for inserts. Lookups were much faster, but still not in the same ballpark as the AA. My prior excitement about the lookup speed of the radix tree was actually due to the embarrassment of having code in an assert statement removed on the release compile. The trie class, however, can at least do prefix key completion well.

PyDictD1 is updated with the same change. The code is updated in SVN, so anyone can check out the changes (and even update with any DSource login).

--tof--
Michael Rynn
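The two allocation strategies compared above can be sketched as follows. This is a minimal illustration, not the actual PyDict source; the Entry layout here is assumed from the description (hash plus uint key and value, no pointer members).

```d
import core.memory : GC;

struct Entry
{
    size_t hash;
    uint key;
    uint value;
}

// Strategy 1: a raw calloc'd block. Without NO_SCAN the conservative GC
// treats the random key/value bits as potential pointers and scans them;
// passing GC.BlkAttr.NO_SCAN tells it to skip the block entirely.
Entry* tableCalloc(size_t n)
{
    return cast(Entry*) GC.calloc(n * Entry.sizeof, GC.BlkAttr.NO_SCAN);
}

// Strategy 2: a typed array allocation. The runtime knows Entry contains
// no pointers, so the block is marked no-scan automatically.
Entry[] tableNew(size_t n)
{
    return new Entry[n]; // size is always a power of 2 in PyDict
}
```

Either form avoids the conservative scan of random bits; the typed `new Entry[size]` version simply lets the GC's type information do the work that the explicit NO_SCAN flag does by hand.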
Mar 10 2010
On Wed, 10 Mar 2010 20:33:32 -0500, Michael Rynn <michaelrynn optusnet.com.au> wrote:

> [full original post quoted]
> The mystery is why the slowdown was progressive.

This sounds like a false sharing problem. The current GC isn't precise, so because D's AA nodes use pointers, the entire node is scanned for pointers, including the hash. As for the progressive nature, consider: during the first run many nodes are created and a few don't get deleted due to normal false sharing (stack, etc). Since these nodes exhibit false sharing themselves, they pin more nodes, etc, in a self-perpetuating way. Not sure what was up with calloc though.
Mar 10 2010
Hi,

the reason for using calloc in PyDict was that the GC slows down allocation way too much. Maybe it is/was a bug in the GC. I used manual memory management to cut this problem and I got a huge speed improvement.

On Thu, 11 Mar 2010 01:33:32 +0000, Michael Rynn wrote:

> Looked at the Associative Arrays project on http://www.dsource.org/projects/aa, with the idea of testing and maybe uploading a trie class (radix tree).

Nice. :-)

> In the PyDictD1 test code there was a call to disable the garbage collector. That's cheating.

Looks like it was forgotten.

> I looked at the code for PyDictD2.d and decided that the table of struct Entry, holding the key and value, for which calloc is used, could be replaced by a more D-like (it's a template anyway) call to new Entry[size]. The size is always a power of 2.

The aim was to simplify the code and to prevent hidden function calls to runtime functions.

> Ran the test again after removing the calloc, and the speed improved and the progressive run-time increase went away.

Well, I got a slowdown instead the last time I tested this. Maybe a bug was fixed?

> So now the PyDict does not use calloc, and is faster than the built-in. I added a few missing properties (length, opIndex).

"public size_t size()" is the length you were looking for. Also, opIndex might crash if the key is not present. I don't write D2 code, but shouldn't opIndex return a pointer to the entry?
Mar 11 2010
> "public size_t size()" is the length you were looking for. Also, opIndex might crash if the key is not present. I don't write D2 code, but shouldn't opIndex return a pointer to the entry?

Thank you, I needed to improve my understanding of the finer details of this implementation.

// call opIndex, which must return the value type
// may throw an exception
ValueType v = aa[key];    // only works for V opIndex(..)
ValueType* v = key in aa; // only works for V* opIn_r(..)

I should probably continue on this and support all the properties mentioned in http://www.digitalmars.com/d/2.0/arrays.html#associative. This includes having a .length property. PyDictD2 is now updated with a .get, and length() returns the same as size().

Also, if the aim is to make a drop-in replacement for the AA, the AA should have an implementation available as a struct (even with the same .sizeof), so it can be placed directly in other structs, local variables and classes just like the built-in AA, and no other code will have to change. I am creating a new module based on PyDictD2 to this effect, and hope to have it available ASAP, since it will be mainly refactoring. If I do not finish it tonight, it will have to be next week.

When there is such a drop-in replacement, it will be easier for others to try it out in existing code and uncover quirks and limitations. Then there might be benefits.

-- taf
Michael
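The two lookup styles contrasted above behave quite differently on a missing key, which is the crash Moritz warned about. A small self-contained sketch against the built-in AA (illustrative only, not the PyDict API):

```d
import std.stdio;

void main()
{
    uint[uint] aa;
    aa[1] = 10;

    // opIndex returns the value directly and throws a range error
    // when the key is absent.
    uint v = aa[1];

    // The `in` operator (opIn_r in D1 terms) returns a pointer to the
    // value, or null when the key is absent, so presence can be tested
    // without triggering an exception.
    uint* p = 1 in aa;
    if (p !is null)
        writeln("found: ", *p);

    assert((2 in aa) is null); // no entry for key 2: null, no throw
}
```

This is why a struct wrapper aiming for drop-in compatibility needs both entry points: a throwing opIndex and a pointer-returning `in`.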
Mar 11 2010
> When there is such a drop-in replacement, it will be easier for others to try it out in existing code and uncover quirks and limitations. Then there might be benefits.

Committed a struct AA wrapper and a modified PyDict class in hash\pydict.d, as a built-in AA behave-alike, to the DSource aa project.

Actually the built-in is initially about 10% faster in uint[uint] than the PyDict, based on the first few runs, before the progressive slowdown of the built-in takes effect. Perhaps some crafty optimization could narrow the gap a bit. There appears to be no significant performance effect of wrapping the modified PyDict class as an implementation reference inside the struct.

Back next week

--=taf
Michael.
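The struct-wrapper idea can be sketched like this: the struct holds a single class reference, so its .sizeof matches the built-in AA (one pointer) and it can substitute without other code changing. HashImpl below is a deliberately naive stand-in, not the committed pydict.d implementation.

```d
// Naive backing implementation, for illustration only.
class HashImpl(K, V)
{
    private V[K] data;

    void set(K key, V value) { data[key] = value; }
    V get(K key) { return data[key]; }
    size_t size() { return data.length; }
}

// The drop-in wrapper: one class reference, same footprint as a
// built-in AA variable.
struct PyDictLike(K, V)
{
    private HashImpl!(K, V) impl;

    void opIndexAssign(V value, K key)
    {
        if (impl is null) impl = new HashImpl!(K, V); // lazy init, like built-in AA
        impl.set(key, value);
    }

    V opIndex(K key) { return impl.get(key); }

    @property size_t length() { return impl is null ? 0 : impl.size(); }
}

// Both are a single reference, so sizes agree.
static assert(PyDictLike!(uint, uint).sizeof == (uint[uint]).sizeof);
```

The static assert captures the ".sizeof" requirement from the post: as long as the wrapper is exactly one reference wide, it can be embedded in structs and classes wherever a built-in AA was declared.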
Mar 12 2010
On Fri, 12 Mar 2010 07:46:50 -0500, Michael Rynn <michaelrynn optusnet.com.au> wrote:

> [...]
> Tried with dmd2 2.041, but the built-in AA crashed sometime during what looked like a rehash, probably triggered by a big resize. This happened in both debug and release versions. This is strange, because a winmerge comparison of the D-runtime source aaA.d showed no significant difference apart from the removal of an AA_equals function.

Oops, sorry about that. The array allocation and appending went through a major overhaul, and I introduced a nasty bug that causes this. Please apply the fix identified here:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=17841

Sorry again. I hope this can be fixed soon.

-Steve
Mar 12 2010
> a major overhaul, and I introduced a nasty bug that causes this. Please apply the fix identified here:
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=17841
>
> Sorry again. I hope this can be fixed soon.
>
> -Steve

This particular progressive slowdown of the built-in AA is not due to the bug introduced with 2.040. It was present in 2.039, and is also observable in the dmd, druntime (rev 265 for lifetime.d) and phobos built from the latest SVN for the fixed 2.041.

The pydict performance is still 25% worse than the fastest built-in AA, but does not slow anything down. Once the built-in AA tests have been run, everything slows down, including the pydict.

Michael
----
Mar 14 2010
On Sun, 14 Mar 2010 21:03:38 -0400, Michael Rynn <michaelrynn optusnet.com.au> wrote:

> This particular progressive slowdown of the built-in AA is not due to the bug introduced with 2.040. It was present in 2.039, and is also observable in the dmd, druntime (rev 265 for lifetime.d) and phobos built from the latest SVN for the fixed 2.041.

Sorry, I meant the crashing bug was introduced :) I can't take credit for the original issue you were chasing.

-Steve
Mar 15 2010