www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Moving GC

reply dsimcha <dsimcha yahoo.com> writes:
I've been playing around with custom memory allocation schemes for some
programs, which are based on regions.  These work great, and are significantly
faster for the use cases I'm using them for, than the general allocation
scheme.  However, everything would break if it were used with reference types
and a moving GC because, as far as the GC knows, the regions I'm allocating
from it are untyped.

What are the odds that some D implementation gets a moving GC in the
foreseeable future?  Are they significant enough that I should avoid relying
on the GC being non-moving, or is worrying about such things just overdoing
trying to be future-proof?
Dec 12 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
dsimcha wrote:
 
 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?

Minuscule. The GC isn't provided nearly enough type information to safely move memory right now, and I don't see that changing any time soon. Sean
Dec 12 2008
parent Don <nospam nospam.com> writes:
Sean Kelly wrote:
 dsimcha wrote:
 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?

Minuscule. The GC isn't provided nearly enough type information to safely move memory right now, and I don't see that changing any time soon. Sean

An related case that could probably be done is pure functions. Inside a pure function, all allocations could be made on a 'pure heap' instead of the normal heap. When returning from a pure function back into a non-pure function, copy the return value into the normal heap (moving it in the process). Then discard the entire pure heap. (This is a *huge* win if the return value from the pure function is a POD, and the pure function does extensive memory allocation). Of course, that's more like a memory pool rather a gc.
Dec 13 2008
prev sibling next sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Fri, 12 Dec 2008 15:05:46 +0000 (UTC), dsimcha wrote:

 I've been playing around with custom memory allocation schemes for some
 programs, which are based on regions.  These work great, and are significantly
 faster for the use cases I'm using them for, than the general allocation
 scheme.  However, everything would break if it were used with reference types
 and a moving GC because, as far as the GC knows, the regions I'm allocating
 from it are untyped.
 
 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?  Are they significant enough that I should avoid relying
 on the GC being non-moving, or is worrying about such things just overdoing
 trying to be future-proof?

Is this at all possible for a conservative GC to be moving? It's one thing when some unused memory is kept allocated due to false pointers, and another when some data resembling pointers changes unpredictably. And GC is to stay conservative as long as unions are supported.
Dec 12 2008
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Sergey Gromov (snake.scaly gmail.com)'s article
 Fri, 12 Dec 2008 15:05:46 +0000 (UTC), dsimcha wrote:
 I've been playing around with custom memory allocation schemes for some
 programs, which are based on regions.  These work great, and are significantly
 faster for the use cases I'm using them for, than the general allocation
 scheme.  However, everything would break if it were used with reference types
 and a moving GC because, as far as the GC knows, the regions I'm allocating
 from it are untyped.

 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?  Are they significant enough that I should avoid relying
 on the GC being non-moving, or is worrying about such things just overdoing
 trying to be future-proof?

thing when some unused memory is kept allocated due to false pointers, and another when some data resembling pointers changes unpredictably. And GC is to stay conservative as long as unions are supported.

There are different levels of conservatism, though. For example, IIRC, the Mono GC scans the stack conservatively and pins all objects that have apparent pointers from the stack. However, it scans the heap precisely and moves objects that only have pointers from the heap. In theory (not saying it will or even should actually happen) D could implement a mostly precise GC that makes the conservative assumption only in the case of unions, and pins objects that may be pointed to by pointers contained in unions. However, after thinking about it, I doubt D will go the route of moving/more precise GC. I know I've advocated it in the past, but I've realized that false pointers aren't generally that big a deal if you delete a few huge (>1MB) objects manually. Furthermore, I think it's necessary in a systems programming language to be able to allocate a big chunk of untyped memory. Technically, this could still be done with a moving GC by using the C heap, but it would make things really, really complicated.
Dec 12 2008
next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
dsimcha wrote:
 However, after thinking about it, I doubt D will go the route of moving/more
 precise GC.  I know I've advocated it in the past, but I've realized that false
 pointers aren't generally that big a deal if you delete a few huge (>1MB)
objects
 manually.  Furthermore, I think it's necessary in a systems programming
language
 to be able to allocate a big chunk of untyped memory.  Technically, this could
 still be done with a moving GC by using the C heap, but it would make things
 really, really complicated.

You'd have to do more bookkeeping: - this memory is referenced by something that I know is a pointer - this memory is referenced by something that might be a pointer (or some random bit pattern) - this memory is not referenced This isn't unreasonable, and it really just depends on how complete and how fast D's runtime reflection is.
Dec 12 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Christopher Wright (dhasenan gmail.com)'s article
 This isn't unreasonable, and it really just depends on how complete and
 how fast D's runtime reflection is.

This would incur more memory overhead as well. Right now, we make do with one bit per block that indicates whether the block should be scanned for pointers, while this may theoretically require a pointer to TypeInfo per block in order to perform precise scanning. Sean
Dec 12 2008
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 This isn't unreasonable, and it really just depends on how complete and
 how fast D's runtime reflection is.

do with one bit per block that indicates whether the block should be scanned for pointers, while this may theoretically require a pointer to TypeInfo per block in order to perform precise scanning. Sean

Under what use cases would this extra few bytes really matter? What kinds of programs allocate so many objects that this overhead would be significant in practice? Note that I'm not advocating a moving GC, because in a systems language like D, where working with pointers and untyped memory blocks is supposed to be feasible, I think it would create more problems/limitations than it solves. For example, now it's ok to store a pointer to GC allocated memory in an area not scanned by the GC, such as the C heap or a memory block marked as NO_SCAN, as long as you also have another pointer to it somewhere else. With a moving GC, this would become problematic. However, a more precise non-moving GC (scans the whole heap precisely, only imprecise on the stack) would be nice if it can be reasonably implemented.
Dec 12 2008
parent reply Christopher Wright <dhasenan gmail.com> writes:
dsimcha wrote:
 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 This isn't unreasonable, and it really just depends on how complete and
 how fast D's runtime reflection is.

do with one bit per block that indicates whether the block should be scanned for pointers, while this may theoretically require a pointer to TypeInfo per block in order to perform precise scanning. Sean

Under what use cases would this extra few bytes really matter? What kinds of programs allocate so many objects that this overhead would be significant in practice?

It means that you can't use one block for objects of multiple types. That's a limitation on how you implement a GC, and it could possibly increase your overhead a lot more than converting a bit to a word. (Unless you use some tricks, that bit will take at least one byte, maybe more depending on alignment.)
Dec 12 2008
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Christopher Wright (dhasenan gmail.com)'s article
 It means that you can't use one block for objects of multiple types.

Sure you can, without being any worse off than under the current scheme. Mark the contents of the memory as an array of void*s. This would have the same effect as marking it for GC scanning under the current scheme, basically making the GC scan the block conservatively. Furthermore, if you're storing value types or pointers to reference types that also have a pointer stored in a GC-scanned block, set the typeinfo to byte or something, and you have the equivalent of setting the NO_SCAN bit.
Dec 12 2008
next sibling parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-13 04:24:54 +0100, dsimcha <dsimcha yahoo.com> said:

 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 It means that you can't use one block for objects of multiple types.

Sure you can, without being any worse off than under the current scheme. Mark the contents of the memory as an array of void*s. This would have the same effect as marking it for GC scanning under the current scheme, basically making the GC scan the block conservatively. Furthermore, if you're storing value types or pointers to reference types that also have a pointer stored in a GC-scanned block, set the typeinfo to byte or something, and you have the equivalent of setting the NO_SCAN bit.

it should be an array of void (or union pointer, non pointer), they should be pinned. Also all pointers given to C (and potentially pointers reachable from them) should be pinned. This makes system programming difficult. If one has a closed system with just D then it is easier. Fawzi
Dec 13 2008
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
dsimcha wrote:
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 It means that you can't use one block for objects of multiple types.

Sure you can, without being any worse off than under the current scheme. Mark the contents of the memory as an array of void*s. This would have the same effect as marking it for GC scanning under the current scheme, basically making the GC scan the block conservatively. Furthermore, if you're storing value types or pointers to reference types that also have a pointer stored in a GC-scanned block, set the typeinfo to byte or something, and you have the equivalent of setting the NO_SCAN bit.

If you regularly do that, you won't have a moving garbage collector. If you do that only when you're low on memory, that might be acceptable.
Dec 13 2008
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Sean Kelly" wrote
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 This isn't unreasonable, and it really just depends on how complete and
 how fast D's runtime reflection is.

This would incur more memory overhead as well. Right now, we make do with one bit per block that indicates whether the block should be scanned for pointers, while this may theoretically require a pointer to TypeInfo per block in order to perform precise scanning.

You need a bitfield indicating which ptr-sized blocks are valid pointers. For example, a 16-byte block (on 32-bit system) only requires 4 bits. For 16, 32, 64, 128 byte blocks, you can probably get away with just a bitfield per block. For larger ones, you may have to have a pointer to a bitfield. But the bitfield should also always contain a bit that says whether the block is an array. That way, you only need one bitfield per element (i.e. the bit pattern repeats). Perhaps, you have one bit that says whether the block is an array, one bit that says whether the 'bitfield' is really a pointer to a static bitfield (for larger elements). These bitfields can easily be stored with the TypeInfo by the compiler. -Steve
Dec 13 2008
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
Christopher Wright wrote:
 dsimcha wrote:
 However, after thinking about it, I doubt D will go the route of
 moving/more
 precise GC.  I know I've advocated it in the past, but I've realized
 that false
 pointers aren't generally that big a deal if you delete a few huge
 (>1MB) objects
 manually.  Furthermore, I think it's necessary in a systems
 programming language
 to be able to allocate a big chunk of untyped memory.  Technically,
 this could
 still be done with a moving GC by using the C heap, but it would make
 things
 really, really complicated.

You'd have to do more bookkeeping: - this memory is referenced by something that I know is a pointer - this memory is referenced by something that might be a pointer (or some random bit pattern) - this memory is not referenced This isn't unreasonable, and it really just depends on how complete and how fast D's runtime reflection is.

Maybe isn't good enough. You have to definitively be able to identify every pointer to an object to be able to move the object since every pointer must be adjusted. The only alternative to that is double indirection for movable objects. With double indirection you point to a pointer that the GC owns and can update when the objects are moved. There's HUGE benefits to a moving gc. The gc in java 6 is _extremely_ clever (and not all that hard to write either, at least most of it is.. there's a lot of details in the heuristics parts), but 100% relies on moving objects. Later, Brad
Dec 12 2008
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
dsimcha wrote:
<snip>
 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?

I think the underlying question is: Has anybody yet foreseen http://d.puremagic.com/issues/show_bug.cgi?id=679 being fixed?
 Are they significant enough that I should avoid relying on the GC 
 being non-moving, or is worrying about such things just overdoing 
 trying to be future-proof?

Depends on whether you're writing an application or a library, I guess. Stewart.
Dec 25 2008