digitalmars.D - Thread-Local GC as an Allocator?

dsimcha <dsimcha yahoo.com> writes:
I've been lurking a little on the recent discussions about thread-local
garbage collection and my general opinion is that implicitly thread-local GC
makes it too easy to shoot oneself in the foot if using non-SafeD constructs
like casting to shared/immutable or using std.parallelism or core.thread, and
that SafeD concurrency is so limited that it's not reasonable to expect people
to use only that.

However, what if we kept the shared GC as the default for everything **but
made thread-local GCs an allocator in the allocator interface that's being
worked on**?  This probably wouldn't be hard to implement.  All you'd need is
multiple instances of the GCX struct, one for each thread, and a way to
register the thread-local GCs with the shared GC so that pointers from the
thread-local GCs to shared memory get scanned properly, and to get rid of a
little locking/world stopping for the thread-local instances.  Using the thread-local
GC allocator would be an explicit assertion that you will **not** be casting
the memory to shared/immutable and sharing it, using
core.thread/std.parallelism, etc.  This would also keep with D's design
philosophy that the simple, safe way should be the default but the more
complicated, less safe way should be available for performance-critical code.
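
A minimal sketch of the registration idea above, in D. It is not druntime code: `LocalHeap` is a hypothetical stand-in for a per-thread instance of the GCX struct, and `registry`, `registryLock`, and `allocate` are names invented here for illustration. The placeholder `allocate` still draws from the shared GC. The point is only to show one instance per thread, created in a thread-local module constructor and registered in a list that a shared collector could walk.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical stand-in for a per-thread GC instance (not druntime's struct).
struct LocalHeap
{
    void[][] blocks; // blocks handed out to the owning thread

    void[] allocate(size_t n)
    {
        void[] b = new ubyte[n]; // placeholder: still backed by the shared GC
        blocks ~= b;
        return b;
    }
}

LocalHeap* localHeap;            // module-level variables are thread-local in D
__gshared LocalHeap*[] registry; // one entry per thread, visible to all threads
__gshared Mutex registryLock;

shared static this() // runs once per process, before thread-local constructors
{
    registryLock = new Mutex;
}

static this() // runs once in every thread, including the main thread
{
    localHeap = new LocalHeap;
    synchronized (registryLock)
        registry ~= localHeap; // a shared collector could scan these heaps
}

void main()
{
    auto workers = [new Thread({ localHeap.allocate(64); }),
                    new Thread({ localHeap.allocate(128); })];
    foreach (t; workers) t.start();
    foreach (t; workers) t.join();

    assert(registry.length == 3);         // main thread plus two workers
    assert(localHeap.blocks.length == 0); // main's heap saw none of their allocations
}
```

Each worker only ever touches its own `localHeap`, so the allocation path needs no lock; the mutex is taken once per thread, at registration.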

Thoughts?
Oct 06 2011
deadalnix <deadalnix gmail.com> writes:
The problem with the global GC is that it will stop all threads during the whole collection. Having a TL GC is interesting only if you have mostly TL garbage, so it would become necessary to use the allocator everywhere, which isn't very practical either.

IMO it should be done the other way around: if you want to cast to immutable or shared, or to use those libs, you have to specify it at allocation, because you have to make sure anyway that the data is actually immutable/shared-compliant. That cannot be done without knowing what you're doing.
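
To make the "specify it at allocation" idea concrete, here is a small D sketch. It only uses existing syntax (`new shared(int)` and a cast). Today's runtime does not route memory by qualifier, so the comments about which heap an allocation would land on are assumptions about a hypothetical thread-local design, not current behavior.

```d
void main()
{
    // Qualifier known at allocation time: a hypothetical allocator could
    // place this on the shared heap from the start.
    shared(int)* counter = new shared(int);

    // Qualifier added later by a cast: the memory was requested as plain
    // thread-local data, so a thread-local GC could not know it escaped.
    int[] scratch = new int[3];
    auto frozen = cast(immutable(int)[]) scratch;

    assert(counter !is null);
    // Both slices refer to the same block; only the static type changed.
    assert(cast(const(void)*) frozen.ptr == cast(const(void)*) scratch.ptr);
}
```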
Oct 06 2011
dsimcha <dsimcha yahoo.com> writes:
== Quote from deadalnix (deadalnix gmail.com)'s article
 The problem with the global GC is that it will stop all threads during
 the whole collection. Having a TL GC is interesting only if you have
 mostly TL garbage.
 So it would become necessary to use the allocator everywhere. Which
 isn't very practical either.

Right, but usually (at least in my experience) only a small portion of your code both allocates frequently and allocates at times when avoiding stopping the world is important. Therefore, with a multithreaded GC allocator you only have to worry about these issues in that small, performance-critical portion, rather than this being a cross-cutting issue. This is similar to how I use RegionAllocator: I use it to get rid of a few frequent GC allocations in performance-critical parts of my code, and it works quite well.

BTW, I just realized that breakage from thread-local GCs by default would leak even into SafeD. For example, they would prevent the return value of a strongly pure function from being implicitly converted to immutable as it is now, at least without some additional caveats. IMHO this is an unacceptable cross-cutting headache. If something's only benefit is performance and it causes a lot of nasty cross-cutting issues, it's almost always better for the optimization to be explicit than to have everyone deal with the cross-cutting issues implicitly.
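
The implicit conversion mentioned above looks like this in current D. `makeSquares` is a made-up example function. Because it is `pure` and takes only a value-type parameter, the compiler can treat its result as unique and lets it convert to `immutable` without a cast. With thread-local GCs by default, that memory would presumably already sit on the calling thread's heap by the time it becomes immutable and therefore shareable, which seems to be the kind of caveat meant here.

```d
// Strongly pure: no mutable indirections in the parameters, so the
// returned array cannot alias anything the caller already holds.
int[] makeSquares(size_t n) pure
{
    auto result = new int[n];
    foreach (i, ref x; result)
        x = cast(int)(i * i);
    return result;
}

void main()
{
    // No cast needed: the unique result converts implicitly to immutable.
    immutable int[] squares = makeSquares(5);
    assert(squares == [0, 1, 4, 9, 16]);
}
```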
Oct 06 2011
"Robert Jacques" <sandford jhu.edu> writes:
Well, aren't heap allocations from strongly pure arrays eventually going to be deep-duped anyways? One of the major GC advantages of strongly pure functions is that you don't have to mark-sweep everything they allocate: anything not pointed to by the return value can be straightforwardly trashed.
Oct 06 2011