
digitalmars.D - GC Precision

dsimcha <dsimcha yahoo.com> writes:
I just realized last night that D's templates are probably powerful enough now
to generate bit masks that can be used for precise GC heap scanning.  I'm
halfway (emphasis on halfway) thinking of using this to try to hack the GC and
make heap scanning fully precise except for the corner case of unions.
However, this ties into several things that others in the D community are
doing, so I want to gauge people's responses and make sure I'm not wasting
effort on something that will be useless in 6 months.

1.  Sean, Leonardo, whoever else may be working on GC implementations, have
you by any chance broken ground on precise heap scanning already?

2.  Andrei, Walter, how close are we to actually eliminating new from the
language?  If all allocations were done by either calling GC.malloc() or using
templates that call GC.malloc(), then things would get a lot simpler than if I
were to have to hack the compiler to make new pass type info to the GC.

3.  I'm thinking bit masks could be stored as follows:

When getBitMask!(T) is instantiated, it generates an immutable size_t[N].
Element 0 is the size of the array (to allow for storing only the ptr in the
GC), element 1 is the size of one instance of the object, in bytes.  The size
of the memory block must be a multiple of this.  Elements 2..$ are all of the
offsets that should be scanned for pointers.  For example:

struct Foo {
    uint bar;
    void* baz;
}

getBitMask!(Foo);  // [3, 8, 4].
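For concreteness, here's a rough, untested sketch of how a getBitMask!(T) along
those lines could be built with compile-time reflection.  The helper names
(computeMask, mayContainPointers) are made up for illustration, it only looks at
top-level fields (a real version would recurse into nested structs and decide
what to do about unions), and it assumes a reasonably recent D2 compiler:

// Sketch only: generates [length, T.sizeof, offsets...] at compile time.
import std.traits : isPointer, isDynamicArray, isAssociativeArray;

// Hypothetical helper: does a field type need to be scanned for pointers?
template mayContainPointers(T)
{
    enum mayContainPointers = is(T == class) || is(T == interface) ||
        is(T == delegate) || isPointer!T || isDynamicArray!T ||
        isAssociativeArray!T;
}

size_t[] computeMask(T)()
{
    size_t[] offsets;

    static if (is(T == struct))
    {
        T dummy;  // only used to walk the fields
        foreach (i, field; dummy.tupleof)
        {
            // NOTE: no recursion into nested structs and no union handling;
            // both would be needed in a real implementation.
            static if (mayContainPointers!(typeof(field)))
                offsets ~= dummy.tupleof[i].offsetof;
        }
    }
    else static if (mayContainPointers!T)
    {
        offsets ~= 0;  // the value itself is a pointer-like word
    }

    // Element 0: total length, element 1: instance size, rest: offsets.
    return [offsets.length + 2, T.sizeof] ~ offsets;
}

template getBitMask(T)
{
    immutable size_t[] getBitMask = computeMask!T();
}

// For the Foo above this yields [3, 8, 4] on 32-bit and [3, 16, 8] on 64-bit.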

That leaves the problem of where/how to store the pointers to this information
in the GC efficiently.  I haven't gotten that far yet, but I remember some
concerns have been raised in the past about storing 4 bytes per GC object for
a pointer to the bitmask.  For my use cases, where I tend to allocate a
relatively small number of relatively large objects, this isn't a problem.
However, in a heavily OO environment, where people allocate tons of tiny
objects, it might be.
Oct 26 2009
Sean Kelly <sean invisibleduck.org> writes:
dsimcha Wrote:

 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.
 
 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
I've thought about it, but not done anything about it. The compiler doesn't provide this information, so precise scanning would require a user-level call. You'll also have to deal with arrays of structs, by the way.
Oct 26 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

Arrays of structs are easy:  Generate a bitmask for one element, and keep
reusing that bitmask until the end of the block.  Am I missing something?
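For what it's worth, the "reuse one element's mask across the block" idea would
look roughly like this inside a mark phase.  This is not actual druntime code;
scanArrayBlock, markWord and the parameter names are placeholders, and the mask
follows the [length, elementSize, offsets...] layout proposed at the start of
the thread:

// Sketch only: scan a block holding an array of structs by reusing one
// element's mask.
void scanArrayBlock(void* base, size_t blockSize, const(size_t)[] mask,
                    scope void delegate(void* candidate) markWord)
{
    immutable elemSize = mask[1];
    const offsets = mask[2 .. $];

    // Step through the block one element at a time; trailing slack from
    // bin-size rounding is simply skipped.
    for (size_t elem = 0; elem + elemSize <= blockSize; elem += elemSize)
    {
        foreach (off; offsets)
        {
            auto candidate = *cast(void**)(cast(ubyte*) base + elem + off);
            markWord(candidate);  // whatever the real mark phase does
        }
    }
}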
Oct 26 2009
Sean Kelly <sean invisibleduck.org> writes:
dsimcha Wrote:

 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?
Nope.
Oct 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 dsimcha Wrote:
 
 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?

 Nope.

One question is, is there enough information for stack variables? My
understanding from a while ago was that heap data could be reasonably analyzed,
but stack data has no info associated with it.

Andrei
Oct 26 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 dsimcha Wrote:

 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?

 Nope.

 One question is, is there enough information for stack variables? My
 understanding from a while ago was that heap data could be reasonably
 analyzed, but stack data has no info associated with it.

 Andrei

That's why I said precise *heap* scanning.  This would solve probably 90+% of
the problem w/ false pointers without requiring any changes with major ripple
effects, i.e. only druntime, not the compiler, would need to be hacked.
Admittedly, though, unless you came up w/ some pinning scheme for stack
variables and unions, it still wouldn't allow a moving GC.

I personally am much more interested in a decent solution to our GC woes now
than a perfect one at some point indefinitely far into the future.  Right now,
when working with programs that use more than maybe 100-200 MB of memory, false
pointers become such a problem that the GC is almost useless, yet all kinds of
library code still uses the GC heap, which is why I resisted the idea of
removing GC.free() so strongly.

As I see it, the biggest problem is false pointers, with the fact that every
allocation requires a lock in a close second.  These are the low-hanging fruit.
A moving GC, one that doesn't stop the world on collection, and one that's
fully precise including stack would be nice, but they're several orders of
magnitude less important and would also have more ripple effects.
Oct 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 dsimcha Wrote:

 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?

 Nope.

 One question is, is there enough information for stack variables? My
 understanding from a while ago was that heap data could be reasonably
 analyzed, but stack data has no info associated with it.

 That's why I said precise *heap* scanning.  This would solve probably 90+% of
 the problem w/ false pointers without requiring any changes with major ripple
 effects, i.e. only druntime, not the compiler, would need to be hacked.
 Admittedly, though, unless you came up w/ some pinning scheme for stack
 variables and unions, it still wouldn't allow a moving GC.

 I personally am much more interested in a decent solution to our GC woes now
 than a perfect one at some point indefinitely far into the future.  Right now,
 when working with programs that use more than maybe 100-200 MB of memory,
 false pointers become such a problem that the GC is almost useless, yet all
 kinds of library code still uses the GC heap, which is why I resisted the idea
 of removing GC.free() so strongly.

 As I see it, the biggest problem is false pointers, with the fact that every
 allocation requires a lock in a close second.  These are the low-hanging
 fruit.  A moving GC, one that doesn't stop the world on collection, and one
 that's fully precise including stack would be nice, but they're several orders
 of magnitude less important and would also have more ripple effects.

Absolutely!  I think that's great work.  Thanks for clarifying things for me.

Andrei
Oct 26 2009
dsimcha <dsimcha yahoo.com> writes:
I've spent some free brain cycles today thinking about this and here's the
scheme I have in mind.  If anyone thinks this could be improved in a way that
would not have substantial ripple effects throughout the compiler/language
(because then it might never actually get implemented) let me know.

1.  GC.malloc's signature changes to GC.malloc(size_t size, uint ba = 0,
size_t* bitmask = null).  A null bitmask means use the old-school behavior and
either scan everything or don't scan anything based on the NO_SCAN bit.  IMHO
plain old conservative scanning must be supported to allow for untyped memory
blocks to be allocated, because requiring every memory block to have a type
associated with it is an unacceptable limitation in a systems language.

For now, the only way to get precise scanning would be to call malloc directly
or use a template that does.  Eventually new would either be fixed to
instantiate the bitMask!(T) template or eliminated entirely.  Since using the
precise scanning by writing a template that calls GC.malloc() is a lot easier
than hacking the compiler, this may be a catalyst for getting rid of new.
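To illustrate point 1, here's a hedged sketch of what such a template could look
like from user code.  A GC.malloc overload taking a bitmask does not exist
today, so the sketch funnels through a hypothetical preciseMalloc() that
currently just forwards to the real GC.malloc; newStruct and preciseMalloc are
made-up names, and getBitMask is the template sketched earlier in the thread:

import core.memory : GC;
import std.traits : hasIndirections;

// Hypothetical shim: with the proposed overload this would become
// GC.malloc(size, ba, mask.ptr); today it ignores the mask so that the
// sketch still compiles against the existing runtime.
void* preciseMalloc(size_t size, uint ba, const(size_t)[] mask)
{
    return GC.malloc(size, ba);
}

// A "new"-free typed allocator for structs.
T* newStruct(T, Args...)(Args args) if (is(T == struct))
{
    void* mem;
    static if (hasIndirections!T)
        mem = preciseMalloc(T.sizeof, 0, getBitMask!T);
    else
        mem = GC.malloc(T.sizeof, GC.BlkAttr.NO_SCAN);

    auto p = cast(T*) mem;
    *p = T(args);  // field-wise construction; ignores postblits etc.
    return p;
}

// Usage: auto f = newStruct!Foo(42u, null);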

2.  Some concern has been expressed in the past about the possibility of using
4 bytes per block in overhead to store pointers to bitmasks.  IMHO this concern
is misplaced because in any program that takes up enough memory for space
efficiency to matter, false pointers waste more space.  Furthermore, if you're
programming for a toaster oven, elevator controller, etc., you probably aren't
going to use the standard out-of-the-box GC anyhow.  However, I've thought of
ways to mitigate this and have come up with the following:

    A.  Store a pointer to the bitmask iff the NO_SCAN bit is not set.
        This is a no-brainer and will prevent any overhead on,
        for example, arrays of floats.

    B.  Add another attributes bit to the GC called something like
        NEEDS_BITMASK.  This bit would be set iff an object mixes
        pointers and non-pointers.  If it's not set, no bitmask
        pointer would be stored.  However, the overhead of an
        extra bit may or may not be worth it.

3.  The bitmask pointer would be stored at the end of every GC-allocated block
for which a bitmask pointer is stored.  The reason for using the end of the
block instead of the beginning is just implementation simplicity:  That way,
finding the beginning of a block would work the same whether or not we have a
bitmask pointer.

4.  The bitmask would be a size_t[] created at compile time by a template and
stored in the static data segment.  Its layout would be [length of array,
T.sizeof, offsets that need to be scanned].  For example, if you have something
like:

struct Foo {
    uint bar;
    void* ptr;
}

On a 32-bit machine, bitMask!Foo would be [3, 8, 4].  On a 64-bit, it would be
[3, 16, 8].  The reason the size of the array is stored in the array is so that
we can get away with storing a single ptr in each memory block instead of a
pointer and a length.

5.  To store information about pinning, we simply use the high-order bits of the
pointer offsets.  1 means pinned, 0 means not pinned.  This means that, for any
type T, T.sizeof can't be bigger than size_t.max / 2.  I think this is a fairly
minor limitation.
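To make points 3 and 5 concrete, here is roughly what the encoding could look
like.  PIN_BIT, encodeOffset, decodeOffset and maskOf are illustrative names
only, not existing runtime code:

// The pin flag rides in the high-order bit of each stored offset (point 5).
enum size_t PIN_BIT = (cast(size_t) 1) << (size_t.sizeof * 8 - 1);

size_t encodeOffset(size_t byteOffset, bool pinned)
{
    // Offsets must stay below the pin bit, hence the size_t.max / 2 limit.
    assert(byteOffset < PIN_BIT);
    return pinned ? (byteOffset | PIN_BIT) : byteOffset;
}

void decodeOffset(size_t encoded, out size_t byteOffset, out bool pinned)
{
    pinned     = (encoded & PIN_BIT) != 0;
    byteOffset = encoded & ~PIN_BIT;
}

// With the mask pointer stored in the last word of the block (point 3),
// the scanner would recover it roughly like this:
const(size_t)* maskOf(void* blockStart, size_t blockSize)
{
    return *cast(const(size_t)**)
        (cast(ubyte*) blockStart + blockSize - (void*).sizeof);
}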
Oct 26 2009
Leandro Lucarella <llucax gmail.com> writes:
dsimcha, on October 26 at 23:05 you wrote:
 I've spent some free brain cycles today thinking about this and here's the
 scheme I have in mind.  If anyone thinks this could be improved in a way that
 would not have substantial ripple effects throughout the compiler/language
 (because then it might never actually get implemented) let me know.

 1.  GC.malloc's signature changes to GC.malloc(size_t size, uint ba = 0,
 size_t* bitmask = null).  A null bitmask means use the old-school behavior and
 either scan everything or don't scan anything based on the NO_SCAN bit.  IMHO
 plain old conservative scanning must be supported to allow for untyped memory
 blocks to be allocated, because requiring every memory block to have a type
 associated with it is an unacceptable limitation in a systems language.

 For now, the only way to get precise scanning would be to call malloc directly
 or use a template that does.  Eventually new would either be fixed to
 instantiate the bitMask!(T) template or eliminated entirely.  Since using the
 precise scanning by writing a template that calls GC.malloc() is a lot easier
 than hacking the compiler, this may be a catalyst for getting rid of new.
Did you read the thread I posted?  What do you think about Frits's idea on how
to encode the pointer information?  I'm not very familiar with TypeInfo, but
wouldn't it be more natural to pass the TypeInfo to GC.malloc() directly, if it
can get the pointer information itself, instead of translating that
information?  I think if that's possible it will keep the GC interface simpler.
 2.  Some concern has been expressed in the past about the possibility of using
 4 bytes per block in overhead to store pointers to bitmasks.  IMHO this
 concern is misplaced because in any program that takes up enough memory for
 space efficiency to matter, false pointers waste more space.  Furthermore, if
 you're programming for a toaster oven, elevator controller, etc., you probably
 aren't going to use the standard out-of-the-box GC anyhow.
This seems reasonable, but I don't see why we can't use a more efficient way to
store this information in the GC.  Implementation simplicity (to have a better
GC now instead of a perfect GC at some point in a far future) is a good enough
reason :)  I'm just curious whether you found any flaws in the scheme proposed
by Frits or you just want a simpler implementation.
 However, I've thought of ways to mitigate this and have come up with the
 following:
 
     A.  Store a pointer to the bitmask iff the NO_SCAN bit is not set.
         This is a no-brainer and will prevent any overhead on,
         for example, arrays of floats.
Good idea.
     B.  Add another attributes bit to the GC called something like
         NEEDS_BITMASK.  This bit would be set iff an object mixes
         pointers and non-pointers.  If it's not set, no bitmask
         pointer would be stored.  However, the overhead of an
         extra bit may or may not be worth it.
You mean for the special case where all the fields of an object are pointers?
I think this should be rare enough that I doubt its utility, but maybe I'm
missing some common case.
 3.  The bitmask pointer would be stored at the end of every GC-allocated block
 for which a bitmask pointer is stored.  The reason for using the end of the
 block instead of the beginning is just implementation simplicity:  That way,
 finding the beginning of a block would work the same whether or not we have a
 bitmask pointer.
Did you consider storing this information outside the memory block (like the
flags)?  I think storing it in the memory block can be annoying if it isn't
always stored, because then your fixed-size blocks are not fixed anymore.  It
might be very easy to overcome this, but maybe thinking about the other option
is worthwhile.
 4.  The bitmask would be a size_t[] created at compile time by a template and
 stored in the static data segment.  Its layout would be [length of array,
 T.sizeof, offsets that need to be scanned].  For example, if you have
something like:
Again, I wonder if this information couldn't still be obtained from the
TypeInfo.
 struct Foo {
     uint bar;
     void* ptr;
 }
 
 On a 32-bit machine, bitMask!Foo would be [3, 8, 4].  On a 64-bit, it would be
 [3, 16, 8].  The reason the size of the array is stored in the array is so
 that we can get away with storing a single ptr in each memory block instead of
 a pointer and a length.
 
 5.  To store information about pinning, we simply use the high-order bits of
 the pointer offsets.  1 means pinned, 0 means not pinned.  This means that,
 for any type T, T.sizeof can't be bigger than size_t.max / 2.  I think this is
 a fairly minor limitation.
One thing I like about Frits's proposal is that information about weak pointers
was added too; that might fix another big hole in the memory management area in
D.

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Oct 26 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Leandro Lucarella (llucax gmail.com)'s article
 dsimcha, on October 26 at 23:05 you wrote:
 I've spent some free brain cycles today thinking about this and here's the
 scheme I have in mind.  If anyone thinks this could be improved in a way that
 would not have substantial ripple effects throughout the compiler/language
 (because then it might never actually get implemented) let me know.

 1.  GC.malloc's signature changes to GC.malloc(size_t size, uint ba = 0,
 size_t* bitmask = null).  A null bitmask means use the old-school behavior and
 either scan everything or don't scan anything based on the NO_SCAN bit.  IMHO
 plain old conservative scanning must be supported to allow for untyped memory
 blocks to be allocated, because requiring every memory block to have a type
 associated with it is an unacceptable limitation in a systems language.

 For now, the only way to get precise scanning would be to call malloc directly
 or use a template that does.  Eventually new would either be fixed to
 instantiate the bitMask!(T) template or eliminated entirely.  Since using the
 precise scanning by writing a template that calls GC.malloc() is a lot easier
 than hacking the compiler, this may be a catalyst for getting rid of new.

 Did you read the thread I posted?  What do you think about Frits's idea on how
 to encode the pointer information?  I'm not very familiar with TypeInfo, but
 wouldn't it be more natural to pass the TypeInfo to GC.malloc() directly, if
 it can get the pointer information itself, instead of translating that
 information?  I think if that's possible it will keep the GC interface
 simpler.
As far as I can tell, RTTI doesn't have all the stuff that's needed yet for structs and adding it would require hacking the compiler. Frankly, I want this to be simple enough to actually get implemented as opposed to just being talked about and deadlocking on everyone waiting on everyone else to do something.
 2.  Some concern has been expressed in the past about the possibility of using
 4 bytes per block in overhead to store pointers to bitmasks.  IMHO this
 concern is misplaced because in any program that takes up enough memory for
 space efficiency to matter, false pointers waste more space.  Furthermore, if
 you're programming for a toaster oven, elevator controller, etc., you probably
 aren't going to use the standard out-of-the-box GC anyhow.

 This seems reasonable, but I don't see why we can't use a more efficient way
 to store this information in the GC.  Implementation simplicity (to have a
 better GC now instead of a perfect GC at some point in a far future) is a good
 enough reason :)  I'm just curious whether you found any flaws in the scheme
 proposed by Frits or you just want a simpler implementation.
Two things:  Implementation simplicity is one.  As I've been alluding to, worse
but working in practice is better than better but only existing on paper.  The
other is:  in Frits's approach, how do you store the size of the object so you
know how many bits to interpret as the mask?
 However, I've thought of ways to mitigate this and have come up with the
 following:

     A.  Store a pointer to the bitmask iff the NO_SCAN bit is not set.
         This is a no-brainer and will prevent any overhead on,
         for example, arrays of floats.

 Good idea.

     B.  Add another attributes bit to the GC called something like
         NEEDS_BITMASK.  This bit would be set iff an object mixes
         pointers and non-pointers.  If it's not set, no bitmask
         pointer would be stored.  However, the overhead of an
         extra bit may or may not be worth it.

 You mean for the special case where all the fields of an object are pointers?
 I think this should be rare enough that I doubt its utility, but maybe I'm
 missing some common case.
You're probably right. The only common one I can think of is arrays of classes. For an array, the 4 bytes of overhead is usually negligible.
 3.  The bitmask pointer would be stored at the end of every GC-allocated block
 for which a bitmask pointer is stored.  The reason for using the end of the
 block instead of the beginning is just implementation simplicity:  That way,
 finding the beginning of a block would work the same whether or not we have a
 bitmask pointer.

 Did you consider storing this information outside the memory block (like the
 flags)?  I think storing it in the memory block can be annoying if it isn't
 always stored, because then your fixed-size blocks are not fixed anymore.  It
 might be very easy to overcome this, but maybe thinking about the other option
 is worthwhile.
Apparently the flags in the GC are designed to store single bits.  Also, as far
as weak refs go, we could use another high-order bit for that.  I don't think
anyone cares if (on 32-bit) T.sizeof is limited to about a gigabyte.  The
question is, how many high-order bits do we reserve?  BTW, I'm not going to
actually implement weak references unless I see an easy way to do it, we're
only talking about reserving bits here.

I guess the bottom line is that IMHO precise heap scanning is really
low-hanging fruit where the bang for the buck would be huge.  Improving the GC
in other ways raises some nasty issues, but in this case I really think the
perfect is the enemy of the good.  I'm not guaranteeing anything, but I think I
might be able to have precise heap scanning up and running in a release or two
if I really keep it simple and stupid.
Oct 26 2009
Leandro Lucarella <llucax gmail.com> writes:
dsimcha, on October 27 at 01:34 you wrote:
 2.  Some concern has been expressed in the past about the possibility of using
 4 bytes per block in overhead to store pointers to bitmasks.  IMHO this
 concern is misplaced because in any program that takes up enough memory for
 space efficiency to matter, false pointers waste more space.  Furthermore, if
 you're programming for a toaster oven, elevator controller, etc., you probably
 aren't going to use the standard out-of-the-box GC anyhow.

 This seems reasonable, but I don't see why we can't use a more efficient way
 to store this information in the GC.  Implementation simplicity (to have a
 better GC now instead of a perfect GC at some point in a far future) is a good
 enough reason :)  I'm just curious whether you found any flaws in the scheme
 proposed by Frits or you just want a simpler implementation.
 Two things:  Implementation simplicity is one.  As I've been alluding to,
 worse but working in practice is better than better but only existing on
 paper.

Fair enough.

 The other is:  in Frits's approach, how do you store the size of the object
 so you know how many bits to interpret as the mask?

I guess you don't.  You have a bin size, so you use the bin size.  I guess you
just mark unused words as being non-pointers, so they won't get scanned.

... I guess that doesn't work for arrays though.  I remember some discussion
about how to handle arrays, but I can't find it now...
 3.  The bitmask pointer would be stored at the end of every GC-allocated block
 for which a bitmask pointer is stored.  The reason for using the end of the
 block instead of the beginning is just implementation simplicity:  That way,
 finding the beginning of a block would work the same whether or not we have a
 bitmask pointer.

 Did you consider storing this information outside the memory block (like the
 flags)?  I think storing it in the memory block can be annoying if it isn't
 always stored, because then your fixed-size blocks are not fixed anymore.  It
 might be very easy to overcome this, but maybe thinking about the other option
 is worthwhile.

 Apparently the flags in the GC are designed to store single bits.  Also, as
 far as weak refs go, we could use another high-order bit for that.  I don't
 think anyone cares if (on 32-bit) T.sizeof is limited to about a gigabyte.
At first sight it sounds reasonable, but I'm a little afraid of it; it can
sound like "640K ought to be enough for anybody" ;)  I know it's almost
impossible to have a T.sizeof larger than 1 GiB, but what if you hit that?
Since you won't be creating more than one or two objects of that size (3 at
most), I guess it's pretty reasonable to say "manage that memory yourself".
 The question is, how many high-order bits do we reserve?  BTW, I'm not
 going to actually implement weak references unless I see an easy way to
 do it, we're only talking about reserving bits here.
I wasn't talking about bitmasks, I was talking more generally.  Even if you
want to store a pointer to the array describing the offsets of the pointers,
you could store those pointers in a separate memory block (not in the block
reserved for the object itself).  Again, I haven't thought it through, it was
just something that popped into my mind.  I guess it's a little irrelevant just
now; it's something that should be moderately easy to change in the future if
storing it elsewhere is proven to be better.
 I guess the bottom line is that IMHO precise heap scanning is really
 low-hanging fruit where the bang for the buck would be huge.  Improving the GC
 in other ways raises some nasty issues, but in this case I really think the
 perfect is the enemy of the good.  I'm not guaranteeing anything, but I think
 I might be able to have precise heap scanning up and running in a release or
 two if I really keep it simple and stupid.
Agreed.  I would very much like to know how things go.

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Oct 27 2009
Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on October 26 at 11:01 you wrote:
 Sean Kelly wrote:
 dsimcha Wrote:

 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?

 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?

 Nope.

 One question is, is there enough information for stack variables? My
 understanding from a while ago was that heap data could be reasonably
 analyzed, but stack data has no info associated with it.

There is some discussion about a precise stack in the thread I cited too.  The
stack can also be made precise by adding a shadow stack (like a TypeInfo for
the stack), but it's costly and it needs compiler support (LLVM has some
mechanisms to generate this stack information AFAIK, but I never played with
it).  The problem with the stack is that you still have to interact with C,
which has no stack information, so it's a little more controversial how much
the extra complexity will pay off.  I agree with David that it's much more
reasonable to make the heap precise first and then see how things are going
from that.

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Oct 26 2009
Leandro Lucarella <llucax gmail.com> writes:
dsimcha, on October 26 at 13:08 you wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.
 
 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
Maybe you're talking about me.  I haven't had the chance to play with this yet.
My main goal is to make the collector run concurrently with the mutator, but
I've been a little busy lately, so I haven't made many advances yet.  I would
like to play with adding precision to the GC too if I have the time.
 2.  Andrei, Walter, how close are we to actually eliminating new from the
 language?  If all allocations were done by either calling GC.malloc() or using
 templates that call GC.malloc(), then things would get a lot simpler than if I
 were to have to hack the compiler to make new pass type info to the GC.
The runtime is already receiving the type information on the allocated object
when new is used AFAIK, but this information is not propagated to gc_malloc().
So it shouldn't be too hard to add type information to the GC.  There was some
interesting discussion about this some time ago:

http://www.digitalmars.com/d/archives/digitalmars/D/Std_Phobos_2_and_logging_library_87794.html#N87831
 3.  I'm thinking bit masks could be stored as follows:
 
 When getBitMask!(T) is instantiated, it generates an immutable size_t[N].
 Element 0 is the size of the array (to allow for storing only the ptr in the
 GC), element 1 is the size of one instance of the object, in bytes.  The size
 of the memory block must be a multiple of this.  Elements 2..$ are all of the
 offsets that should be scanned for pointers.  For example:
 
 struct Foo {
     uint bar;
     void* baz;
 }
 
 getBitMask!(Foo);  // [3, 8, 4].
 
 That leaves the problem of where/how to store the pointers to this information
 in the GC efficiently.  I haven't gotten that far yet, but I remember some
 concerns have been raised in the past about storing 4 bytes per GC object for
 a pointer to the bitmask.  For my use cases, where I tend to allocate a
 relatively small number of relatively large objects, this isn't a problem.
 However, in a heavily OO environment, where people allocate tons of tiny
 objects, it might be.
In the discussion I mentioned, Frits van Bommel proposed a reasonable way to
encode the information efficiently:

http://www.digitalmars.com/d/archives/digitalmars/D/Std_Phobos_2_and_logging_library_87794.html#N87968

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Oct 26 2009
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 A moving GC, one that doesn't stop the world on collection,
 and one that's fully precise including stack would be nice, but they're several
 orders of magnitude less important and would also have more ripple effects.
I agree that doing something simple now is better than doing nothing, or than
doing something very refined at an unknown point in the future.  And things may
be improved later.

In D, objects are always managed by reference, and I think that most programs
don't alter or cast such references to something else (objects allocated in
memory specified by the programmer, and scoped objects allocated on the stack,
may be excluded from this).  So I think that it can be safe to move objects, to
compact the heap.  So you may have 5 memory zones:

- C heap.  (The type system of the D compiler may see the C-heap pointers and
D-heap pointers as two different types, as I have proposed in the past.  So you
need casts if you want to mix them, and the compiler can use the D moving heap
in a safer way.)

- Pinned D heap for everything that can't be moved, like structs managed by
pointers (eventually a D compiler can infer at compile time that some structs
too may be moved around, because their pointer is used only in clean ways; you
don't need to introduce struct references for this).  I think SafeD modules
will not use this heap a lot (but they can use unsafe modules that may use
pinned objects.  Is D safety transitive?  I think it is not, so from a SafeD
module you can call and use an unsafe module).

- Old object generation managed with compaction (I think there's no need for a
permanent-objects zone in D).

- Two "from" and "to" zones for the young generation, that don't use a true
compaction strategy; young objects bounce between them.

- New-generation Eden, where new object allocations happen, managed as a memory
arena.

All this has the disadvantage of requiring a more complex GC, and probably
requiring 2-4 times more RAM at runtime.  It hopefully has the advantage of
allowing new programmers, who have learnt Java at university, to program almost
like in Java.  (I have found a non-synthetic Java benchmark program that,
converted to D, is something like 18 times slower on LDC.  I'll put the code on
my site in the following days.)

Bye,
bearophile
Oct 29 2009
Jacob Carlborg <doob me.com> writes:
On 10/29/09 11:47, bearophile wrote:
 dsimcha:

 A moving GC, one that doesn't stop the world on collection,
 and one that's fully precise including stack would be nice, but they're several
 orders of magnitude less important and would also have more ripple effects.
 I agree that doing something simple now is better than doing nothing, or than
 doing something very refined at an unknown point in the future.  And things
 may be improved later.

 In D, objects are always managed by reference, and I think that most programs
 don't alter or cast such references to something else (objects allocated in
 memory specified by the programmer, and scoped objects allocated on the stack,
 may be excluded from this).  So I think that it can be safe to move objects,
 to compact the heap.
The current implementation of toHash in Object does that:

    return cast(hash_t)cast(void*)this;
Oct 29 2009
bearophile <bearophileHUGS lycos.com> writes:
Jacob Carlborg:

 The current implementation of toHash in Object does that: return 
 cast(hash_t)cast(void*)this;
I agree, such things will have to change when D wants a moving GC.

Bye,
bearophile
Oct 29 2009