
digitalmars.D - "pause" garbage collection in parallel code

reply "Stephan Schiffels" <stephan_schiffels mac.com> writes:
Dear all,

I have a parallel program using std.parallelism (awesome!), but 
I recently noticed that I achieve very poor performance on many 
CPUs, and I identified the garbage collector as the main cause. 
Because my program has quite heavy memory usage, the garbage 
collector interrupts all multithreading while it runs, which 
increases the total runtime of my program dramatically. This is 
so bad that I actually get poorer performance running on 20 
cores than on 4 cores.

I see several ways to improve my code:
1.) Is there a way to tell the GC the maximum heap size allowed 
before it initiates a collection cycle? Cranking that up would 
cause fewer collection cycles and hence more time spent in my 
multithreaded code.
2.) Is there a way to "pause" the GC collection for the parallel 
part of my program, deliberately accepting higher memory usage?
3.) Most of the memory is used in one huge array; perhaps I 
should simply use malloc and free for that particular array to 
avoid the GC running so often.

Certainly, options 1 and 2 are "noninvasive" and therefore 
preferred. Are there other ways?
I am a bit surprised that there is no command line option for dmd 
to control GC maximum heap size. Who determines how often the GC 
is run? For example, in Haskell I can simply set the maximum heap 
size to 10Mb in GHC using -A10m, which I used in the past to help 
exactly the same problem and dramatically reduce the frequency of 
GC collection cycles.

Thanks for the help!
Stephan
Dec 15 2014
next sibling parent "Artem Tarasov" <lomereiter gmail.com> writes:
I had the same situation, and ended up with the malloc/free 
option. It's also often possible to get rid of allocations in a 
loop by pre-allocating thread-local buffers and reusing them 
throughout (see std.parallelism.TaskPool.workerLocalStorage).
Dec 15 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Stephan Schiffels:

 2.) Is there a way to "pause" the GC collection for the 
 parallel part of my program, deliberately accepting higher 
 memory usage?
There are GC.disable and GC.enable.
 3.) Most of the memory is used in one huge array; perhaps I 
 should simply use malloc and free for that particular array to 
 avoid the GC running so often.
This is sometimes acceptable. For some usages there is also Array in Phobos.
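For example, a minimal sketch (the size is only illustrative; Array keeps its payload in malloc'd memory, so for element types without pointers the GC never has to scan it):

    import std.container : Array;
    import std.range : repeat;

    void main()
    {
        // A large buffer that lives outside the GC heap.
        auto buf = Array!double(repeat(0.0, 1_000_000));
        buf[0] = 3.14;
        // The memory is released deterministically when the last copy of buf
        // goes out of scope, not by a collection cycle.
    }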
 I am a bit surprised that there is no command line option for 
 dmd to control GC maximum heap size.
Adding such option could be a good idea.
 Who determines how often the GC is run?
The D GC is run when you allocate memory. If you don't allocate, it's run only once at program start, to initialize itself.

Bye,
bearophile
Dec 15 2014
parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Dec 15, 2014 at 11:26:26AM +0000, bearophile via Digitalmars-d wrote:
 Stephan Schiffels:
 
2.) Is there a way to "pause" the GC collection for the parallel part
of my program, deliberately accepting higher memory usage?
There are GC.disable and GC.enable.
[...]

Recently in one of my projects I achieved dramatic performance boosts by calling core.memory.GC.disable() at the beginning of the program and manually calling GC.collect() at a reduced rate (to keep total memory usage down). If you're having performance trouble with the GC, this could be a good way to deal with the situation. I almost halved my running times (== doubled performance) just by limiting the rate of GC collection cycles.

I also achieved significant boosts by using a profiler to track down hotspots, which turned out to be some obscure piece of code I had overlooked that was overly eager in allocating memory, thereby incurring a lot of needless GC pressure. Just a couple of simple fixes in this area improved my performance by about 10-20%.

In one case I manually called GC.free() to release dead memory that I knew for sure had no other references to it -- this also reduces GC pressure and allows you to reduce the frequency of GC.collect() calls. (There's a caveat, though: if you call GC.free() on memory that's still being referenced, you might introduce segfaults and memory corruption into your program. Generally, I'd advise doing this only in simple cases where it's easy to prove that something is unreferenced. Anything more than that, and you end up reimplementing the GC, and you might as well just call GC.collect() instead. :-P)

T

--
Spaghetti code may be tangly, but lasagna code is just cheesy.
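P.S. In case it helps, the pattern I'm describing looks roughly like this -- the work loop and the collection interval are just placeholders, and you'd tune the interval against your memory budget:

    import core.memory : GC;

    void main()
    {
        GC.disable();   // suppress automatic collections (the runtime may
                        // still collect in a genuine out-of-memory situation)

        foreach (i; 0 .. 1_000)       // stand-in for the program's main work loop
        {
            // ... allocate and process data here ...

            if (i % 100 == 99)        // made-up interval; raise or lower it to
                GC.collect();         // trade memory use against GC overhead
        }

        GC.enable();                  // restore normal collection at the end
    }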
Dec 15 2014
"Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Stephan Schiffels"  wrote in message 
news:wjeozpnitvhtxrkhulaz forum.dlang.org...

 I see several ways to improve my code:
 1.) Is there a way to tell the GC the maximum heap size allowed before it 
 initiates a collection cycle? Cranking that up would cause fewer 
 collection cycles and hence more time spent in my multithreaded code.
Yes, sort of. You can make the GC grab a big chunk of memory, and collections won't run until that is exhausted. As it is only allocating virtual memory, it should be more or less equivalent to setting the max heap size.
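For example, something along these lines should work in principle (the size here is purely illustrative; GC.reserve returns the number of bytes actually obtained, or zero on failure):

    import core.memory : GC;
    import std.stdio : writefln;

    void main()
    {
        // Ask the runtime to pre-allocate a large pool before the parallel
        // work starts, in the hope that collections are deferred until that
        // pool is used up.
        immutable got = GC.reserve(1_000_000_000);
        writefln("reserved %s bytes up front", got);

        // ... run the memory-hungry parallel code here ...
    }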
 2.) Is there a way to "pause" the GC collection for the parallel part of 
 my program, deliberately accepting higher memory usage?
Yes, with GC.enable and GC.disable.
 3.) Most of the memory is used in one huge array; perhaps I should simply 
 use malloc and free for that particular array to avoid the GC running 
 so often.
Yes, this may work if it's a reasonable design for your application. Other options like re-using one GC buffer per thread might work too.
 I am a bit surprised that there is no command line option for dmd to 
 control GC maximum heap size. Who determines how often the GC is run? For 
 example, in Haskell I can simply set the maximum heap size to 10Mb in GHC 
 using -A10m, which I used in the past to help exactly the same problem and 
 dramatically reduce the frequency of GC collection cycles.
Being able to configure the GC at run-time is something that's currently being worked on. I assume this will be possible rather soon, probably in the next release.
Dec 15 2014
"Stephan Schiffels" <stephan_schiffels mac.com> writes:
On Monday, 15 December 2014 at 11:54:44 UTC, Daniel Murphy wrote:
 "Stephan Schiffels"  wrote in message 
 news:wjeozpnitvhtxrkhulaz forum.dlang.org...

 I see several ways to improve my code:
 1.) Is there a way to tell the GC the maximum heap size 
 allowed before it initiates a collection cycle? Cranking that 
 up would cause fewer collection cycles and hence more time 
 spent in my multithreaded code.
Yes, sort of. You can make the GC grab a big chunk of memory, and collections won't run until that is exhausted. As it is only allocating virtual memory, it should be more or less equivalent to setting the max heap size.
 2.) Is there a way to "pause" the GC collection for the 
 parallel part of my program, deliberately accepting higher 
 memory usage?
Yes, with GC.enable and GC.disable.
 3.) Most of the memory is used in one huge array; perhaps I 
 should simply use malloc and free for that particular array to 
 avoid the GC running so often.
Yes, this may work if it's a reasonable design for your application. Other options like re-using one GC buffer per thread might work too.
 I am a bit surprised that there is no command line option for 
 dmd to control GC maximum heap size. Who determines how often 
 the GC is run? For example, in Haskell I can simply set the 
 maximum heap size to 10Mb in GHC using -A10m, which I used in 
 the past to help exactly the same problem and dramatically 
 reduce the frequency of GC collection cycles.
Being able to configure the GC at run-time is something that's currently being worked on. I assume this will be possible rather soon, probably in the next release.
Excellent, thanks everyone, problem solved. I use GC.disable and GC.enable for now, and it works like a charm. I knew about these functions, but I thought they prevent GC allocation, not just collection.

Using ThreadLocal storage with std.parallelism is also interesting. I won't need it now, as the memory is still within manageable bounds, but it's certainly an option to reduce the memory footprint... nice!

And yes, I saw GC.reserve just after I wrote this thread; it seems to do what I wanted.

Stephan
Dec 15 2014
parent "Gary Willoughby" <dev nomad.so> writes:
On Monday, 15 December 2014 at 12:22:32 UTC, Stephan Schiffels 
wrote:
 Excellent, thanks everyone, problem solved. I use GC.disable 
 and GC.enable for now, and it works like a charm. I knew about 
 these
 functions, but I thought they prevent GC-allocation, not just 
 collection.

 Using ThreadLocal storage with std.parallelism is also 
 interesting, I won't need it now, as the memory is still within 
 manageable bounds, but certainly an option to reduce the memory 
 footprint... nice!
 And yes, I saw GC.reserve just after I wrote this thread, it 
 seems to do what I wanted.

 Stephan
Nice. I don't know if it would help you in the future, but gcarena.d looks quite useful here: https://bitbucket.org/infognition/dstuff/src/
Dec 15 2014
"Stephan Schiffels" <stephan_schiffels mac.com> writes:
On Monday, 15 December 2014 at 11:54:44 UTC, Daniel Murphy wrote:
 "Stephan Schiffels"  wrote in message 
 news:wjeozpnitvhtxrkhulaz forum.dlang.org...

 I see several ways to improve my code:
 1.) Is there a way to tell the GC the maximum heap size 
 allowed before it initiates a collection cycle? Cranking that 
 up would cause fewer collection cycles and hence more time 
 spent in my multithreaded code.
Yes, sort of. You can make the GC grab a big chunk of memory, and collections won't run until that is exhausted. As it is only allocating virtual memory, it should be more or less equivalent to setting the max heap size.
This doesn't work for me, for some reason. I reserve via GC.reserve(4_000_000_000) and ensured that it does return at least that amount, but the garbage collector will still collect like crazy, long before that reserved memory is exhausted...
Dec 15 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Stephan Schiffels"  wrote in message 
news:mzlhbfypcihvyattaukr forum.dlang.org...

 This doesn't work for me, for some reason. I reserve via 
 GC.reserve(4_000_000_000), ensured that it does return at least that 
 amount, but the Garbage collector will still collect like crazy, long 
 before that reserved memory is exhausted...
That's strange, I must be missing something. Hopefully one of the GC experts can explain why this happens.
Dec 15 2014
parent "Sean Kelly" <sean invisibleduck.org> writes:
On Monday, 15 December 2014 at 14:39:50 UTC, Daniel Murphy wrote:
 "Stephan Schiffels"  wrote in message 
 news:mzlhbfypcihvyattaukr forum.dlang.org...

 This doesn't work for me, for some reason. I reserve via 
 GC.reserve(4_000_000_000), ensured that it does return at 
 least that amount, but the Garbage collector will still 
 collect like crazy, long before that reserved memory is 
 exhausted...
That's strange, I must be missing something. Hopefully one of the GC experts can explain why this happens.
Perhaps the reserve pool is empty when the next collection runs and ends up being returned to the OS?
Dec 15 2014
"ponce" <contact gam3sfrommars.fr> writes:
On Monday, 15 December 2014 at 11:12:55 UTC, Stephan Schiffels 
wrote:
 3.) Most of the memory is used in one huge array; perhaps I 
 should simply use malloc and free for that particular array to 
 avoid the GC running so often.
Yes. That way you know with certainty the GC won't scan it.
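For example, a minimal sketch (sizes are illustrative; since double contains no pointers there is nothing here for the GC to trace -- if the element type did hold pointers into GC-owned memory, you would also need core.memory.GC.addRange so the collector could still see those references):

    import core.stdc.stdlib : free, malloc;

    void main()
    {
        enum size_t n = 10_000_000;                  // illustrative element count
        auto p = cast(double*) malloc(n * double.sizeof);
        assert(p !is null, "out of memory");
        scope (exit) free(p);                        // freed deterministically

        double[] data = p[0 .. n];                   // slice over the C heap
        data[] = 0.0;                                // the GC never scans this
        // ... heavy parallel work on data ...
    }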
Dec 15 2014