
digitalmars.D - Introducing Sampling to the GC

reply Etienne Cimon <etcimon gmail.com> writes:
I ran some benchmarks and found that each (costly) GC collection 
frees only about 0.7% of the application's used memory, measured as 
the contents of the GC page bins.

I made some tools to come up with those statistics, available with a 
patched druntime:

https://github.com/D-Programming-Language/druntime/pull/803

My proposal is to implement pointer sampling in the GC (using hypothesis 
testing - hypergeometric or Poisson distributions) to tune this 
collection efficiency. The idea would be to be able to specify what 
percentage we'd like the GC to sweep on average in each cycle, so that 
these cycles run less frequently.
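
To make the hypothesis-testing idea concrete, here is a minimal sketch in Python rather than druntime code. It assumes the GC could cheaply classify a small random sample of blocks as dead or live, which is the hard part and is not shown. The names `hypergeom_pmf`, `tail_prob`, `should_collect`, `target` and `alpha` are all hypothetical.

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    """P(exactly k dead blocks in a sample of n drawn without replacement
    from N blocks, of which K are dead)."""
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def tail_prob(N, K, n, k):
    """P(at least k dead blocks in the sample)."""
    return sum(hypergeom_pmf(N, K, n, i) for i in range(k, min(n, K) + 1))

def should_collect(total_blocks, sample_size, dead_in_sample,
                   target=0.05, alpha=0.05):
    """Reject H0 ("at most `target` of the heap is dead") when the observed
    number of dead blocks in the sample would be unlikely under H0."""
    k0 = int(target * total_blocks)  # dead blocks if exactly `target` were dead
    p = tail_prob(total_blocks, k0, sample_size, dead_in_sample)
    return p < alpha
```

With 1000 blocks and a 5% target, finding 10 dead blocks in a sample of 50 would trigger a collection under this sketch, while finding only 2 would not.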

I'm still looking to challenge this idea with someone who is 
knowledgeable about probability and statistics and/or quality assurance. 
Does anyone think my time would be wasted if I added it? Would this 
conflict with a semi-precise GC?
May 23 2014
next sibling parent reply "safety0ff" <safety0ff.dev gmail.com> writes:
On Friday, 23 May 2014 at 21:14:38 UTC, Etienne Cimon wrote:
 My proposal is to implement pointer sampling in the GC (using 
 hypothesis testing - hypergeometric or Poisson distributions) 
 to tune this collection efficiency. The idea would be to be 
 able to specify what percentage we'd like the GC to sweep on 
 average in each cycle, so that these cycles run less frequently.

Now I understand what you mean. I think this is an interesting idea. I've used the idea of reducing collection frequency to trade off running time for peak memory usage before. It would be interesting to have these "knobs" available to tune application performance. I think we should do something similar to CDGC for this: use environment variables to configure the settings at initialization time.
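
As a rough illustration of reading such knobs once at startup, here is a sketch in Python. The variable name `EXAMPLE_GC_OPTS`, its `key=value:key=value` format, and the knob names `sweep_target` and `sample_size` are all made up for this example; they are not CDGC's or druntime's actual options.

```python
import os

# Hypothetical knobs and their defaults.
DEFAULTS = {"sweep_target": 0.05, "sample_size": 64.0}

def read_gc_knobs(environ=os.environ, var="EXAMPLE_GC_OPTS"):
    """Parse 'key=value:key=value' from the environment once at init time.
    Unknown keys and malformed values are ignored, keeping the defaults."""
    knobs = dict(DEFAULTS)
    raw = environ.get(var, "")
    for item in raw.split(":"):
        if "=" not in item:
            continue
        key, _, value = item.partition("=")
        key = key.strip()
        if key in knobs:
            try:
                knobs[key] = float(value)
            except ValueError:
                pass  # keep the default for this knob
    return knobs
```

Running a program with `EXAMPLE_GC_OPTS=sweep_target=0.1` would then raise only that one knob and leave the others at their defaults.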
May 24 2014
parent Etienne Cimon <etcimon gmail.com> writes:
On 2014-05-25 02:17, safety0ff wrote:
 Now I understand what you mean.

 I think this is an interesting idea. I've used the idea of reducing
 collection frequency to trade off running time for peak memory usage
 before.

 It would be interesting to have these "knobs" available to tune
 application performance.

 I think we should do something similar to CDGC for this: use environment
 variables to set the settings at initialization time.

Yes, exactly! This would be great for choosing between keeping memory tight and saving CPU cycles, since a set of samples could be verified very quickly, even when allocations are served from the freelist and no collection would otherwise be considered.
May 25 2014
prev sibling parent "Martin Nowak" <code dawg.eu> writes:
On Friday, 23 May 2014 at 21:14:38 UTC, Etienne Cimon wrote:
 My proposal is to implement pointer sampling in the GC (using 
 hypothesis testing - hypergeometric or Poisson distributions) 
 to tune this collection efficiency. The idea would be to be 
 able to specify what percentage we'd like the GC to sweep on 
 average in each cycle, so that these cycles run less frequently.

I still think an adaptive threshold for when to trigger a collection would be much simpler and equally effective. When a collection reclaims only a little memory, you raise the threshold so that the next collection is delayed.
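
One way such an adaptive threshold could look, sketched in Python rather than as druntime code. The class name, the 10% `min_reclaim` cutoff, the growth factor of 2 and the 1 MiB floor are arbitrary assumptions for illustration. The rule of lowering the threshold again after a productive collection is also an addition of this sketch, not part of the suggestion above.

```python
class AdaptiveThreshold:
    """Delays the next collection when the last one reclaimed little memory."""

    def __init__(self, threshold_bytes, min_reclaim=0.10, growth=2.0,
                 floor_bytes=1 << 20):
        self.threshold_bytes = threshold_bytes  # allocate this much before collecting
        self.min_reclaim = min_reclaim          # reclaimed fraction considered worthwhile
        self.growth = growth
        self.floor_bytes = floor_bytes

    def should_collect(self, allocated_since_last):
        return allocated_since_last >= self.threshold_bytes

    def after_collection(self, used_before, freed):
        """Update the threshold from the last collection's yield; returns the ratio."""
        ratio = freed / used_before if used_before else 0.0
        if ratio < self.min_reclaim:
            # Poor yield: raise the threshold so the next collection comes later.
            self.threshold_bytes = int(self.threshold_bytes * self.growth)
        else:
            # Good yield: move back toward collecting more often (assumption).
            self.threshold_bytes = max(self.floor_bytes,
                                       int(self.threshold_bytes / self.growth))
        return ratio
```

With a yield like the 0.7% reported at the top of the thread, every collection would double the threshold until the reclaimed fraction climbs above the cutoff.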
May 25 2014