digitalmars.D - Moving GC

dsimcha (10/10) Dec 12 2008 I've been playing around with custom memory allocation schemes for some

Sean Kelly (4/7) Dec 12 2008 Minuscule. The GC isn't provided nearly enough type information to

Don (9/19) Dec 13 2008 An related case that could probably be done is pure functions. Inside a

Sergey Gromov (5/16) Dec 12 2008 Is this at all possible for a conservative GC to be moving? It's one

dsimcha (15/31) Dec 12 2008 There are different levels of conservatism, though. For example, IIRC, ...

Christopher Wright (8/15) Dec 12 2008 You'd have to do more bookkeeping:

Sean Kelly (6/8) Dec 12 2008 This would incur more memory overhead as well. Right now, we make

dsimcha (12/21) Dec 12 2008 Under what use cases would this extra few bytes really matter? What kin...

Christopher Wright (6/18) Dec 12 2008 It means that you can't use one block for objects of multiple types.

dsimcha (7/8) Dec 12 2008 Sure you can, without being any worse off than under the current scheme....

Fawzi Mohamed (8/23) Dec 13 2008 it should be an array of void (or union pointer, non pointer), they
Christopher Wright (3/12) Dec 13 2008 If you regularly do that, you won't have a moving garbage collector. If

Steven Schveighoffer (14/22) Dec 13 2008 You need a bitfield indicating which ptr-sized blocks are valid pointers...

Brad Roberts (12/35) Dec 12 2008 Maybe isn't good enough. You have to definitively be able to identify

Stewart Gordon (7/12) Dec 25 2008 I think the underlying question is: Has anybody yet foreseen

dsimcha <dsimcha yahoo.com> writes:

I've been playing around with custom memory allocation schemes for some
programs, which are based on regions.  These work great, and are significantly
faster for the use cases I'm using them for, than the general allocation
scheme.  However, everything would break if it were used with reference types
and a moving GC because, as far as the GC knows, the regions I'm allocating
from it are untyped.

What are the odds that some D implementation gets a moving GC in the
foreseeable future?  Are they significant enough that I should avoid relying
on the GC being non-moving, or is worrying about such things just overdoing
trying to be future-proof?

Dec 12 2008

Sean Kelly <sean invisibleduck.org> writes:

dsimcha wrote:
 
 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?

Minuscule.  The GC isn't provided nearly enough type information to 
safely move memory right now, and I don't see that changing any time soon.


Sean

Dec 12 2008

Don <nospam nospam.com> writes:

Sean Kelly wrote:
 dsimcha wrote:
 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?

 
 Minuscule.  The GC isn't provided nearly enough type information to 
 safely move memory right now, and I don't see that changing any time soon.
 
 
 Sean

An related case that could probably be done is pure functions. Inside a 
pure function, all allocations could be made on a 'pure heap' instead of 
the normal heap. When returning from a pure function back into a 
non-pure function, copy the return value into the normal heap (moving it 
in the process). Then discard the entire pure heap.
(This is a *huge* win if the return value from the pure function is a 
POD, and the pure function does extensive memory allocation).

Of course, that's more like a memory pool rather a gc.

Dec 13 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Fri, 12 Dec 2008 15:05:46 +0000 (UTC), dsimcha wrote:

 I've been playing around with custom memory allocation schemes for some
 programs, which are based on regions.  These work great, and are significantly
 faster for the use cases I'm using them for, than the general allocation
 scheme.  However, everything would break if it were used with reference types
 and a moving GC because, as far as the GC knows, the regions I'm allocating
 from it are untyped.
 
 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?  Are they significant enough that I should avoid relying
 on the GC being non-moving, or is worrying about such things just overdoing
 trying to be future-proof?

Is this at all possible for a conservative GC to be moving?  It's one
thing when some unused memory is kept allocated due to false pointers,
and another when some data resembling pointers changes unpredictably.

And GC is to stay conservative as long as unions are supported.

Dec 12 2008

dsimcha <dsimcha yahoo.com> writes:

== Quote from Sergey Gromov (snake.scaly gmail.com)'s article
 Fri, 12 Dec 2008 15:05:46 +0000 (UTC), dsimcha wrote:
 I've been playing around with custom memory allocation schemes for some
 programs, which are based on regions.  These work great, and are significantly
 faster for the use cases I'm using them for, than the general allocation
 scheme.  However, everything would break if it were used with reference types
 and a moving GC because, as far as the GC knows, the regions I'm allocating
 from it are untyped.

 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?  Are they significant enough that I should avoid relying
 on the GC being non-moving, or is worrying about such things just overdoing
 trying to be future-proof?

 Is this at all possible for a conservative GC to be moving?  It's one
 thing when some unused memory is kept allocated due to false pointers,
 and another when some data resembling pointers changes unpredictably.
 And GC is to stay conservative as long as unions are supported.

There are different levels of conservatism, though.  For example, IIRC, the Mono
GC scans the stack conservatively and pins all objects that have apparent
pointers
from the stack.  However, it scans the heap precisely and moves objects that
only
have pointers from the heap.  In theory (not saying it will or even should
actually happen) D could implement a mostly precise GC that makes the
conservative
assumption only in the case of unions, and pins objects that may be pointed to
by
pointers contained in unions.

However, after thinking about it, I doubt D will go the route of moving/more
precise GC.  I know I've advocated it in the past, but I've realized that false
pointers aren't generally that big a deal if you delete a few huge (>1MB)
objects
manually.  Furthermore, I think it's necessary in a systems programming language
to be able to allocate a big chunk of untyped memory.  Technically, this could
still be done with a moving GC by using the C heap, but it would make things
really, really complicated.

Dec 12 2008

Christopher Wright <dhasenan gmail.com> writes:

dsimcha wrote:
 However, after thinking about it, I doubt D will go the route of moving/more
 precise GC.  I know I've advocated it in the past, but I've realized that false
 pointers aren't generally that big a deal if you delete a few huge (>1MB)
objects
 manually.  Furthermore, I think it's necessary in a systems programming
language
 to be able to allocate a big chunk of untyped memory.  Technically, this could
 still be done with a moving GC by using the C heap, but it would make things
 really, really complicated.

You'd have to do more bookkeeping:
- this memory is referenced by something that I know is a pointer
- this memory is referenced by something that might be a pointer (or 
some random bit pattern)
- this memory is not referenced

This isn't unreasonable, and it really just depends on how complete and 
how fast D's runtime reflection is.

Dec 12 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Christopher Wright (dhasenan gmail.com)'s article
 This isn't unreasonable, and it really just depends on how complete and
 how fast D's runtime reflection is.

This would incur more memory overhead as well.  Right now, we make
do with one bit per block that indicates whether the block should be
scanned for pointers, while this may theoretically require a pointer
to TypeInfo per block in order to perform precise scanning.


Sean

Dec 12 2008

dsimcha <dsimcha yahoo.com> writes:

== Quote from Sean Kelly (sean invisibleduck.org)'s article
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 This isn't unreasonable, and it really just depends on how complete and
 how fast D's runtime reflection is.

 This would incur more memory overhead as well.  Right now, we make
 do with one bit per block that indicates whether the block should be
 scanned for pointers, while this may theoretically require a pointer
 to TypeInfo per block in order to perform precise scanning.
 Sean

Under what use cases would this extra few bytes really matter?  What kinds of
programs allocate so many objects that this overhead would be significant in
practice?

Note that I'm not advocating a moving GC, because in a systems language like D,
where working with pointers and untyped memory blocks is supposed to be
feasible,
I think it would create more problems/limitations than it solves.  For example,
now it's ok to store a pointer to GC allocated memory in an area not scanned by
the GC, such as the C heap or a memory block marked as NO_SCAN, as long as you
also have another pointer to it somewhere else.  With a moving GC, this would
become problematic.

However, a more precise non-moving GC (scans the whole heap precisely, only
imprecise on the stack) would be nice if it can be reasonably implemented.

Dec 12 2008

Christopher Wright <dhasenan gmail.com> writes:

dsimcha wrote:
 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 This isn't unreasonable, and it really just depends on how complete and
 how fast D's runtime reflection is.

 This would incur more memory overhead as well.  Right now, we make
 do with one bit per block that indicates whether the block should be
 scanned for pointers, while this may theoretically require a pointer
 to TypeInfo per block in order to perform precise scanning.
 Sean

 
 Under what use cases would this extra few bytes really matter?  What kinds of
 programs allocate so many objects that this overhead would be significant in
practice?

It means that you can't use one block for objects of multiple types. 
That's a limitation on how you implement a GC, and it could possibly 
increase your overhead a lot more than converting a bit to a word. 
(Unless you use some tricks, that bit will take at least one byte, maybe 
more depending on alignment.)

Dec 12 2008

dsimcha <dsimcha yahoo.com> writes:

== Quote from Christopher Wright (dhasenan gmail.com)'s article
 It means that you can't use one block for objects of multiple types.

Sure you can, without being any worse off than under the current scheme.  Mark
the
contents of the memory as an array of void*s.  This would have the same effect
as
marking it for GC scanning under the current scheme, basically making the GC
scan
the block conservatively.  Furthermore, if you're storing value types or
pointers
to reference types that also have a pointer stored in a GC-scanned block, set
the
typeinfo to byte or something, and you have the equivalent of setting the
NO_SCAN bit.

Dec 12 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-13 04:24:54 +0100, dsimcha <dsimcha yahoo.com> said:

 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 It means that you can't use one block for objects of multiple types.

 
 Sure you can, without being any worse off than under the current 
 scheme.  Mark the
 contents of the memory as an array of void*s.  This would have the same 
 effect as
 marking it for GC scanning under the current scheme, basically making 
 the GC scan
 the block conservatively.  Furthermore, if you're storing value types 
 or pointers
 to reference types that also have a pointer stored in a GC-scanned 
 block, set the
 typeinfo to byte or something, and you have the equivalent of setting 
 the NO_SCAN bit.

it should be an array of void (or union pointer, non pointer), they 
should be pinned.
Also all pointers given to C (and potentially pointers reachable from 
them) should be pinned.
This makes system programming difficult.
If one has a closed system with just D then it is easier.

Fawzi

Dec 13 2008

Christopher Wright <dhasenan gmail.com> writes:

dsimcha wrote:
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 It means that you can't use one block for objects of multiple types.

 
 Sure you can, without being any worse off than under the current scheme.  Mark
the
 contents of the memory as an array of void*s.  This would have the same effect
as
 marking it for GC scanning under the current scheme, basically making the GC
scan
 the block conservatively.  Furthermore, if you're storing value types or
pointers
 to reference types that also have a pointer stored in a GC-scanned block, set
the
 typeinfo to byte or something, and you have the equivalent of setting the
NO_SCAN bit.

If you regularly do that, you won't have a moving garbage collector. If 
you do that only when you're low on memory, that might be acceptable.

Dec 13 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Sean Kelly" wrote
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 This isn't unreasonable, and it really just depends on how complete and
 how fast D's runtime reflection is.

 This would incur more memory overhead as well.  Right now, we make
 do with one bit per block that indicates whether the block should be
 scanned for pointers, while this may theoretically require a pointer
 to TypeInfo per block in order to perform precise scanning.

You need a bitfield indicating which ptr-sized blocks are valid pointers.

For example, a 16-byte block (on 32-bit system) only requires 4 bits.

For 16, 32, 64, 128 byte blocks, you can probably get away with just a 
bitfield per block.

For larger ones, you may have to have a pointer to a bitfield.  But the 
bitfield should also always contain a bit that says whether the block is an 
array.  That way, you only need one bitfield per element (i.e. the bit 
pattern repeats).

Perhaps, you have one bit that says whether the block is an array, one bit 
that says whether the 'bitfield' is really a pointer to a static bitfield 
(for larger elements).

These bitfields can easily be stored with the TypeInfo by the compiler.

-Steve

Dec 13 2008

Brad Roberts <braddr puremagic.com> writes:

Christopher Wright wrote:
 dsimcha wrote:
 However, after thinking about it, I doubt D will go the route of
 moving/more
 precise GC.  I know I've advocated it in the past, but I've realized
 that false
 pointers aren't generally that big a deal if you delete a few huge
 (>1MB) objects
 manually.  Furthermore, I think it's necessary in a systems
 programming language
 to be able to allocate a big chunk of untyped memory.  Technically,
 this could
 still be done with a moving GC by using the C heap, but it would make
 things
 really, really complicated.

 
 You'd have to do more bookkeeping:
 - this memory is referenced by something that I know is a pointer
 - this memory is referenced by something that might be a pointer (or
 some random bit pattern)
 - this memory is not referenced
 
 This isn't unreasonable, and it really just depends on how complete and
 how fast D's runtime reflection is.

Maybe isn't good enough. You have to definitively be able to identify
every pointer to an object to be able to move the object since every
pointer must be adjusted.  The only alternative to that is double
indirection for movable objects.  With double indirection you point to a
pointer that the GC owns and can update when the objects are moved.

There's HUGE benefits to a moving gc.  The gc in java 6 is _extremely_
clever (and not all that hard to write either, at least most of it is..
there's a lot of details in the heuristics parts), but 100% relies on
moving objects.

Later,
Brad

Dec 12 2008

Stewart Gordon <smjg_1998 yahoo.com> writes:

dsimcha wrote:
<snip>
 What are the odds that some D implementation gets a moving GC in the
 foreseeable future?

I think the underlying question is: Has anybody yet foreseen
http://d.puremagic.com/issues/show_bug.cgi?id=679
being fixed?

 Are they significant enough that I should avoid relying on the GC 
 being non-moving, or is worrying about such things just overdoing 
 trying to be future-proof?

Depends on whether you're writing an application or a library, I guess.

Stewart.

Dec 25 2008

D Programming

C/C++ Programming

Other

digitalmars.D - Moving GC