www.digitalmars.com

digitalmars.D - Explicit Thread Local Heaps

dsimcha <dsimcha yahoo.com> writes:
There was some discussion around here a while back about the possibility of
using thread-local heaps in the standard GC.  This was rejected largely
because of the complexity it would add when casting to shared/immutable.

I'm wondering if it would be a good idea to allow memory to be explicitly
allocated as thread-local through a separate GC.  Such a GC would be designed
from the ground up to assume thread-local data and would never be used to
allocate in standard Phobos or Druntime functions.  It would simply be a
Phobos module, something like std.localgc.  The only way to use it would be to
explicitly call something like ThreadLocal.malloc, or pass it as a parameter
to something that needs an allocator.
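To make the opt-in API concrete, here is a minimal C sketch of the allocation fast path only. It assumes a per-thread bump allocator whose chunks come from the C heap (core.stdc.malloc is D's binding to the same malloc). `tl_malloc`, `tl_release_all` and the 64 KiB chunk size are hypothetical stand-ins for something like ThreadLocal.malloc. There is no collection here, and requests larger than one chunk are simply refused.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ThreadLocal.malloc: a per-thread bump allocator.
   Chunks come from the C heap and are carved up without any locking,
   because tl_current is private to the calling thread. No GC is done. */

#define TL_CHUNK_SIZE (64 * 1024)

typedef struct tl_chunk {
    struct tl_chunk *prev;  /* previously filled chunk, kept for bulk release */
    size_t used;            /* bytes already handed out from data[] */
    max_align_t data[];     /* payload; max_align_t keeps it suitably aligned */
} tl_chunk;

static _Thread_local tl_chunk *tl_current = NULL;

void *tl_malloc(size_t size) {
    const size_t align = alignof(max_align_t);
    if (size == 0 || size > TL_CHUNK_SIZE)
        return NULL;                                 /* sketch: no large-object path */
    size = (size + align - 1) / align * align;       /* keep every block max-aligned */
    if (tl_current == NULL || TL_CHUNK_SIZE - tl_current->used < size) {
        tl_chunk *c = malloc(sizeof(tl_chunk) + TL_CHUNK_SIZE);
        if (c == NULL)
            return NULL;
        c->prev = tl_current;
        c->used = 0;
        tl_current = c;                              /* no lock: thread-local state */
    }
    void *p = (char *)tl_current->data + tl_current->used;
    tl_current->used += size;
    return p;
}

/* Release every chunk owned by the calling thread. */
void tl_release_all(void) {
    while (tl_current != NULL) {
        tl_chunk *prev = tl_current->prev;
        free(tl_current);
        tl_current = prev;
    }
}
```

Because `tl_current` is `_Thread_local`, two threads calling `tl_malloc` never touch the same chunk, so the fast path needs no lock. The catch is that nothing stops another thread from holding a pointer into those chunks, which is exactly what the proposed collector would have to assume never happens.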

The collector would (unsafely) assume that you always maintain at least one
pointer to every thread-locally allocated object in the relevant thread's
stack, the thread-local heap, or thread-local storage.  The global heap,
__gshared storage and other threads' stacks would not be scanned.
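The root rule above can be illustrated with a toy mark-and-sweep in C. This is a sketch under two loud assumptions: the thread's stack and TLS are modeled as an explicit array of root slots (`lh_add_root`) rather than being scanned conservatively, and each object has at most two pointer fields. All names (`local_heap`, `lh_alloc`, `lh_collect`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of the local collector's root rule. Real stack/TLS scanning is
   replaced by an explicit array of root slots (assumption of this sketch). */

#define MAX_ROOTS 16

typedef struct obj {
    struct obj *next_alloc;  /* intrusive list of every object in this local heap */
    struct obj *child[2];    /* outgoing pointers, traced within the local heap */
    bool marked;
    int value;
} obj;

typedef struct {
    obj *all;                /* every object allocated from this heap */
    obj **roots[MAX_ROOTS];  /* addresses of root slots: stand-in for stack + TLS */
    size_t nroots;
    size_t live;
} local_heap;

obj *lh_alloc(local_heap *h, int value) {
    obj *o = calloc(1, sizeof *o);  /* zeroed: no children, unmarked */
    if (o == NULL)
        return NULL;
    o->value = value;
    o->next_alloc = h->all;
    h->all = o;
    h->live++;
    return o;
}

void lh_add_root(local_heap *h, obj **slot) {
    if (h->nroots < MAX_ROOTS)
        h->roots[h->nroots++] = slot;
}

static void mark(obj *o) {
    /* recursion is fine for a toy; a real collector would use an explicit stack */
    if (o == NULL || o->marked)
        return;
    o->marked = true;
    mark(o->child[0]);
    mark(o->child[1]);
}

/* Mark from this thread's roots only, then free everything unmarked.
   Pointers held in global memory or by other threads are never consulted. */
size_t lh_collect(local_heap *h) {
    for (size_t i = 0; i < h->nroots; i++)
        mark(*h->roots[i]);
    size_t freed = 0;
    obj **link = &h->all;
    while (*link != NULL) {
        obj *o = *link;
        if (o->marked) {
            o->marked = false;      /* reset for the next cycle */
            link = &o->next_alloc;
        } else {
            *link = o->next_alloc;  /* unlink and free unreachable object */
            free(o);
            freed++;
        }
    }
    h->live -= freed;
    return freed;
}
```

An object reachable only through a pointer stored in global memory is not marked and gets freed by `lh_collect`, leaving that global pointer dangling; that is precisely the unsafe assumption the proposal asks the programmer to uphold.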

A major issue I see is interfacing such a GC with the regular GC such that
pointers from the thread-local memory to shared memory are dealt with
properly, without being excessively conservative.  The thread-local GC would
likely use core.stdc.malloc() to allocate large blocks of memory, and would
need a way to signal to the shared GC which blocks might contain pointers
without synchronizing on every update.
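One way to avoid synchronizing on every update is to synchronize per block instead: the thread-local allocator registers each large block with the shared collector once, when it obtains the block from the C heap, and the shared collector scans all registered blocks as extra roots. The C sketch below assumes that scheme; the registry and all its functions are hypothetical (in D, druntime's `core.memory.GC.addRange`/`GC.removeRange` play a similar role for malloc'd memory). It is deliberately coarse: every byte of a registered block is treated as possibly holding a pointer, which is the "excessively conservative" cost mentioned above.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical registry through which a thread-local allocator tells the
   shared collector which C-heap blocks may hold pointers into shared memory.
   The lock is taken once per block (create/release), never per pointer store. */

#define MAX_RANGES 256

typedef struct { const void *base; size_t len; } range;

static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static range reg_ranges[MAX_RANGES];
static size_t reg_count = 0;

static int shared_gc_add_range(const void *base, size_t len) {
    pthread_mutex_lock(&reg_lock);
    int ok = reg_count < MAX_RANGES;
    if (ok)
        reg_ranges[reg_count++] = (range){ base, len };
    pthread_mutex_unlock(&reg_lock);
    return ok;
}

static void shared_gc_remove_range(const void *base) {
    pthread_mutex_lock(&reg_lock);
    for (size_t i = 0; i < reg_count; i++) {
        if (reg_ranges[i].base == base) {
            reg_ranges[i] = reg_ranges[--reg_count];  /* swap-remove; order is irrelevant */
            break;
        }
    }
    pthread_mutex_unlock(&reg_lock);
}

/* Shared-collector side: visit every registered range as additional roots.
   Returns the total number of bytes that would be scanned. */
size_t shared_gc_scan_ranges(void (*visit)(const void *base, size_t len)) {
    size_t bytes = 0;
    pthread_mutex_lock(&reg_lock);
    for (size_t i = 0; i < reg_count; i++) {
        visit(reg_ranges[i].base, reg_ranges[i].len);
        bytes += reg_ranges[i].len;
    }
    pthread_mutex_unlock(&reg_lock);
    return bytes;
}

/* Local-allocator side: one registration per big block from the C heap. */
void *local_block_new(size_t len) {
    void *b = calloc(1, len);  /* zeroed so stale bytes are not mistaken for pointers */
    if (b != NULL && !shared_gc_add_range(b, len)) {
        free(b);
        b = NULL;
    }
    return b;
}

void local_block_free(void *b) {
    if (b == NULL)
        return;
    shared_gc_remove_range(b);  /* unregister before the memory goes away */
    free(b);
}
```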

If this sounds like a good idea, maybe I'll start prototyping it.  Overall,
the idea is that thread-local heaps are an optimization that should be done
explicitly when/if you need it, not something that needs to be built deep into
the language runtime.
Nov 12 2010
Fawzi Mohamed <fawzi gmx.ch> writes:
On 12-nov-10, at 16:36, dsimcha wrote:

 [...]

In my code the lock during allocation is more of an issue than GC scanning. Having thread-local (or, better, NUMA-node-local) pools for allocation, each with its own lock, would solve the main bottleneck. I have always disliked extra memory hierarchies; I feel that their benefit/complexity ratio is too small, but I might be wrong. The problem you identified, pointers to "global" memory, is difficult to solve in a way that really gives the local GC an advantage over a good GC implementation that uses several pools, without burdening the programmer. Still, I imagine that a localgc library implementation could be useful to some. I suspect that using it for general types that might allocate memory on their own would be difficult, but as it would only be used in special cases, that probably isn't an issue.
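The per-pool locking suggested above can be sketched in C, assuming a fixed number of pools (`NPOOLS`; one per core or NUMA node would be the natural choice, but the sketch does not query the topology), a single fixed block size, and round-robin assignment of threads to pools. All names are hypothetical. A thread only contends with the other threads mapped to its pool, and a block freed from another thread still goes back to the pool that owns it, which is why each pool keeps its own lock instead of being purely thread-local.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define NPOOLS 4        /* assumption: e.g. one per core or NUMA node */
#define BLOCK_SIZE 64   /* sketch handles a single size class only */

typedef struct block {
    struct block *next;  /* free-list link (meaningful only while free) */
    size_t pool;         /* index of the owning pool */
    max_align_t payload[(BLOCK_SIZE + sizeof(max_align_t) - 1) / sizeof(max_align_t)];
} block;

typedef struct {
    pthread_mutex_t lock;  /* contended only by threads mapped to this pool */
    block *free_list;
} pool;

static pool pools[NPOOLS];
static atomic_size_t next_pool = 0;
static _Thread_local size_t my_pool = (size_t)-1;  /* unassigned until first use */

void pools_init(void) {
    for (size_t i = 0; i < NPOOLS; i++) {
        pthread_mutex_init(&pools[i].lock, NULL);
        pools[i].free_list = NULL;
    }
}

static size_t pool_index(void) {
    if (my_pool == (size_t)-1)
        my_pool = atomic_fetch_add(&next_pool, 1) % NPOOLS;  /* round-robin */
    return my_pool;
}

void *pool_alloc(void) {
    size_t i = pool_index();
    pool *p = &pools[i];
    pthread_mutex_lock(&p->lock);  /* per-pool lock, not a global one */
    block *b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    pthread_mutex_unlock(&p->lock);
    if (b == NULL) {
        b = malloc(sizeof *b);     /* refill path: fall back to the C heap */
        if (b == NULL)
            return NULL;
    }
    b->pool = i;
    return b->payload;
}

void pool_free(void *ptr) {
    if (ptr == NULL)
        return;
    block *b = (block *)((char *)ptr - offsetof(block, payload));
    pool *p = &pools[b->pool];     /* back to the owner, even from another thread */
    pthread_mutex_lock(&p->lock);
    b->next = p->free_list;
    p->free_list = b;
    pthread_mutex_unlock(&p->lock);
}
```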
Nov 12 2010