
digitalmars.D - GC Precision

dsimcha <dsimcha yahoo.com> writes:
I just realized last night that D's templates are probably powerful enough now
to generate bit masks that can be used for precise GC heap scanning.  I'm
halfway (emphasis on halfway) thinking of using this to try to hack the GC and
make heap scanning fully precise except for the corner case of unions.
However, this ties into several things that others in the D community are
doing, so I want to gauge people's responses and make sure I'm not wasting
effort on something that will be useless in 6 months.

1.  Sean, Leonardo, whoever else may be working on GC implementations, have
you by any chance broken ground on precise heap scanning already?

2.  Andrei, Walter, how close are we to actually eliminating new from the
language?  If all allocations were done by either calling GC.malloc() or using
templates that call GC.malloc(), then things would get a lot simpler than if I
were to have to hack the compiler to make new pass type info to the GC.

3.  I'm thinking bit masks could be stored as follows:

When getBitMask!(T) is instantiated, it generates an immutable size_t[N].
Element 0 is the size of the array (to allow for storing only the ptr in the
GC), element 1 is the size of one instance of the object, in bytes.  The size
of the memory block must be a multiple of this.  Elements 2..$ are all of the
offsets that should be scanned for pointers.  For example:

struct Foo {
    uint bar;
    void* baz;
}

getBitMask!(Foo);  // [3, 8, 4].
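For concreteness, here's a rough, untested sketch of how a getBitMask!(T) along
those lines could be built with compile-time reflection.  The helper names
(computeMask, mayContainPointers) are made up for illustration, it only looks at
top-level fields (a real version would recurse into nested structs and decide
what to do about unions), and it assumes a reasonably recent D2 compiler:

// Sketch only: generates [length, T.sizeof, offsets...] at compile time.
import std.traits : isPointer, isDynamicArray, isAssociativeArray;

// Hypothetical helper: does a field type need to be scanned for pointers?
template mayContainPointers(T)
{
    enum mayContainPointers = is(T == class) || is(T == interface) ||
        is(T == delegate) || isPointer!T || isDynamicArray!T ||
        isAssociativeArray!T;
}

size_t[] computeMask(T)()
{
    size_t[] offsets;

    static if (is(T == struct))
    {
        T dummy;  // only used to walk the fields
        foreach (i, field; dummy.tupleof)
        {
            // NOTE: no recursion into nested structs and no union handling;
            // both would be needed in a real implementation.
            static if (mayContainPointers!(typeof(field)))
                offsets ~= dummy.tupleof[i].offsetof;
        }
    }
    else static if (mayContainPointers!T)
    {
        offsets ~= 0;  // the value itself is a pointer-like word
    }

    // Element 0: total length, element 1: instance size, rest: offsets.
    return [offsets.length + 2, T.sizeof] ~ offsets;
}

template getBitMask(T)
{
    immutable size_t[] getBitMask = computeMask!T();
}

// For the Foo above this yields [3, 8, 4] on 32-bit and [3, 16, 8] on 64-bit.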

That leaves the problem of where/how to store the pointers to this information
in the GC efficiently.  I haven't gotten that far yet, but I remember some
concerns have been raised in the past about storing 4 bytes per GC object for
a pointer to the bitmask.  For my use cases, where I tend to allocate a
relatively small number of relatively large objects, this isn't a problem.
However, in a heavily OO environment, where people allocate tons of tiny
objects, it might be.
Oct 26 2009
Sean Kelly <sean invisibleduck.org> writes:
dsimcha Wrote:

 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.
 
 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
I've thought about it, but not done anything about it. The compiler doesn't provide this information, so precise scanning would require a user-level call. You'll also have to deal with arrays of structs, by the way.
Oct 26 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

Arrays of structs are easy:  Generate a bitmask for one element, and keep
reusing that bitmask until the end of the block.  Am I missing something?
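For what it's worth, the "reuse one element's mask across the block" idea would
look roughly like this inside a mark phase.  This is not actual druntime code;
scanArrayBlock, markWord and the parameter names are placeholders, and the mask
follows the [length, elementSize, offsets...] layout proposed at the start of
the thread:

// Sketch only: scan a block holding an array of structs by reusing one
// element's mask.
void scanArrayBlock(void* base, size_t blockSize, const(size_t)[] mask,
                    scope void delegate(void* candidate) markWord)
{
    immutable elemSize = mask[1];
    const offsets = mask[2 .. $];

    // Step through the block one element at a time; trailing slack from
    // bin-size rounding is simply skipped.
    for (size_t elem = 0; elem + elemSize <= blockSize; elem += elemSize)
    {
        foreach (off; offsets)
        {
            auto candidate = *cast(void**)(cast(ubyte*) base + elem + off);
            markWord(candidate);  // whatever the real mark phase does
        }
    }
}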
Oct 26 2009
Sean Kelly <sean invisibleduck.org> writes:
dsimcha Wrote:

 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?
Nope.
Oct 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 dsimcha Wrote:
 
 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?

 Nope.

One question is, is there enough information for stack variables? My
understanding from a while ago was that heap data could be reasonably analyzed,
but stack data has no info associated with it.

Andrei
Oct 26 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 dsimcha Wrote:

 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?

 Nope.

 One question is, is there enough information for stack variables? My
 understanding from a while ago was that heap data could be reasonably
 analyzed, but stack data has no info associated with it.

 Andrei

That's why I said precise *heap* scanning.  This would solve probably 90+% of
the problem w/ false pointers without requiring any changes with major ripple
effects, i.e. only druntime, not the compiler, would need to be hacked.
Admittedly, though, unless you came up w/ some pinning scheme for stack
variables and unions, it still wouldn't allow a moving GC.

I personally am much more interested in a decent solution to our GC woes now
than a perfect one at some point indefinitely far into the future.  Right now,
when working with programs that use more than maybe 100-200 MB of memory, false
pointers become such a problem that the GC is almost useless, yet all kinds of
library code still uses the GC heap, which is why I resisted the idea of
removing GC.free() so strongly.

As I see it, the biggest problem is false pointers, with the fact that every
allocation requires a lock in a close second.  These are the low-hanging fruit.
A moving GC, one that doesn't stop the world on collection, and one that's
fully precise including stack would be nice, but they're several orders of
magnitude less important and would also have more ripple effects.
Oct 26 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 dsimcha Wrote:

 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?

 Nope.

 One question is, is there enough information for stack variables? My
 understanding from a while ago was that heap data could be reasonably
 analyzed, but stack data has no info associated with it.

 That's why I said precise *heap* scanning.  This would solve probably 90+% of
 the problem w/ false pointers without requiring any changes with major ripple
 effects, i.e. only druntime, not the compiler, would need to be hacked.
 Admittedly, though, unless you came up w/ some pinning scheme for stack
 variables and unions, it still wouldn't allow a moving GC.

 I personally am much more interested in a decent solution to our GC woes now
 than a perfect one at some point indefinitely far into the future.  Right now,
 when working with programs that use more than maybe 100-200 MB of memory,
 false pointers become such a problem that the GC is almost useless, yet all
 kinds of library code still uses the GC heap, which is why I resisted the idea
 of removing GC.free() so strongly.

 As I see it, the biggest problem is false pointers, with the fact that every
 allocation requires a lock in a close second.  These are the low-hanging
 fruit.  A moving GC, one that doesn't stop the world on collection, and one
 that's fully precise including stack would be nice, but they're several orders
 of magnitude less important and would also have more ripple effects.

Absolutely!  I think that's great work.  Thanks for clarifying things for me.

Andrei
Oct 26 2009
dsimcha <dsimcha yahoo.com> writes:
I've spent some free brain cycles today thinking about this and here's the
scheme I have in mind.  If anyone thinks this could be improved in a way that
would not have substantial ripple effects throughout the compiler/language
(because then it might never actually get implemented) let me know.

1.  GC.malloc's signature changes to GC.malloc(size_t size, uint ba = 0,
size_t* bitmask = null).  A null bitmask means use the old-school behavior and
either scan everything or don't scan anything based on the NO_SCAN bit.  IMHO
plain old conservative scanning must be supported to allow for untyped memory
blocks to be allocated, because requiring every memory block to have a type
associated with it is an unacceptable limitation in a systems language.

For now, the only way to get precise scanning would be to call malloc directly
or use a template that does.  Eventually new would either be fixed to
instantiate the bitMask!(T) template or eliminated entirely.  Since using the
precise scanning by writing a template that calls GC.malloc() is a lot easier
than hacking the compiler, this may be a catalyst for getting rid of new.
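To illustrate point 1, here's a hedged sketch of what such a template could look
like from user code.  A GC.malloc overload taking a bitmask does not exist
today, so the sketch funnels through a hypothetical preciseMalloc() that
currently just forwards to the real GC.malloc; newStruct and preciseMalloc are
made-up names, and getBitMask is the template sketched earlier in the thread:

import core.memory : GC;
import std.traits : hasIndirections;

// Hypothetical shim: with the proposed overload this would become
// GC.malloc(size, ba, mask.ptr); today it ignores the mask so that the
// sketch still compiles against the existing runtime.
void* preciseMalloc(size_t size, uint ba, const(size_t)[] mask)
{
    return GC.malloc(size, ba);
}

// A "new"-free typed allocator for structs.
T* newStruct(T, Args...)(Args args) if (is(T == struct))
{
    void* mem;
    static if (hasIndirections!T)
        mem = preciseMalloc(T.sizeof, 0, getBitMask!T);
    else
        mem = GC.malloc(T.sizeof, GC.BlkAttr.NO_SCAN);

    auto p = cast(T*) mem;
    *p = T(args);  // field-wise construction; ignores postblits etc.
    return p;
}

// Usage: auto f = newStruct!Foo(42u, null);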

2.  Some concern has been expressed in the past about the possibility of using
4 bytes per block in overhead to store pointers to bitmasks.  IMHO this concern
is misplaced because in any program that takes up enough memory for space
efficiency to matter, false pointers waste more space.  Furthermore, if you're
programming for a toaster oven, elevator controller, etc., you probably aren't
going to use the standard out-of-the-box GC anyhow.  However, I've thought of
ways to mitigate this and have come up with the following:

    A.  Store a pointer to the bitmask iff the NO_SCAN bit is not set.
        This is a no-brainer and will prevent any overhead on,
        for example, arrays of floats.

    B.  Add another attributes bit to the GC called something like
        NEEDS_BITMASK.  This bit would be set iff an object mixes
        pointers and non-pointers.  If it's not set, no bitmask
        pointer would be stored.  However, the overhead of an
        extra bit may or may not be worth it.

3.  The bitmask pointer would be stored at the end of every GC-allocated block
for which a bitmask pointer is stored.  The reason for using the end of the
block instead of the beginning is just implementation simplicity:  That way,
finding the beginning of a block would work the same whether or not we have a
bitmask pointer.

4.  The bitmask would be a size_t[] created at compile time by a template and
stored in the static data segment.  Its layout would be [length of array,
T.sizeof, offsets that need to be scanned].  For example, if you have something
like:

struct Foo {
    uint bar;
    void* ptr;
}

On a 32-bit machine, bitMask!Foo would be [3, 8, 4].  On a 64-bit, it would be
[3, 16, 8].  The reason the size of the array is stored in the array is so that
we can get away with storing a single ptr in each memory block instead of a
pointer and a length.

5.  To store information about pinning, we simply use the high-order bits of the
pointer offsets.  1 means pinned, 0 means not pinned.  This means that, for any
type T, T.sizeof can't be bigger than size_t.max / 2.  I think this is a fairly
minor limitation.
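To make points 3 and 5 concrete, here is roughly what the encoding could look
like.  PIN_BIT, encodeOffset, decodeOffset and maskOf are illustrative names
only, not existing runtime code:

// The pin flag rides in the high-order bit of each stored offset (point 5).
enum size_t PIN_BIT = (cast(size_t) 1) << (size_t.sizeof * 8 - 1);

size_t encodeOffset(size_t byteOffset, bool pinned)
{
    // Offsets must stay below the pin bit, hence the size_t.max / 2 limit.
    assert(byteOffset < PIN_BIT);
    return pinned ? (byteOffset | PIN_BIT) : byteOffset;
}

void decodeOffset(size_t encoded, out size_t byteOffset, out bool pinned)
{
    pinned     = (encoded & PIN_BIT) != 0;
    byteOffset = encoded & ~PIN_BIT;
}

// With the mask pointer stored in the last word of the block (point 3),
// the scanner would recover it roughly like this:
const(size_t)* maskOf(void* blockStart, size_t blockSize)
{
    return *cast(const(size_t)**)
        (cast(ubyte*) blockStart + blockSize - (void*).sizeof);
}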
Oct 26 2009
Leandro Lucarella <llucax gmail.com> writes:
dsimcha, on October 26 at 23:05 you wrote:
 I've spent some free brain cycles today thinking about this and here's the
 scheme I have in mind.  If anyone thinks this could be improved in a way that
 would not have substantial ripple effects throughout the compiler/language
 (because then it might never actually get implemented) let me know.

 1.  GC.malloc's signature changes to GC.malloc(size_t size, uint ba = 0,
 size_t* bitmask = null).  A null bitmask means use the old-school behavior and
 either scan everything or don't scan anything based on the NO_SCAN bit.  IMHO
 plain old conservative scanning must be supported to allow for untyped memory
 blocks to be allocated, because requiring every memory block to have a type
 associated with it is an unacceptable limitation in a systems language.

 For now, the only way to get precise scanning would be to call malloc directly
 or use a template that does.  Eventually new would either be fixed to
 instantiate the bitMask!(T) template or eliminated entirely.  Since using the
 precise scanning by writing a template that calls GC.malloc() is a lot easier
 than hacking the compiler, this may be a catalyst for getting rid of new.
Did you read the thread I posted?  What do you think about Frits's idea on how
to encode the pointer information?  I'm not very familiar with TypeInfo, but
wouldn't it be more natural to pass the TypeInfo to GC.malloc() directly, if it
can get the pointer information itself, instead of translating that
information?  I think if that's possible it will keep the GC interface simpler.
 2.  Some concern has been expressed in the past about the possibility of using
 4 bytes per block in overhead to store pointers to bitmasks.  IMHO this
 concern is misplaced because in any program that takes up enough memory for
 space efficiency to matter, false pointers waste more space.  Furthermore, if
 you're programming for a toaster oven, elevator controller, etc., you probably
 aren't going to use the standard out-of-the-box GC anyhow.
This seems reasonable, but I don't see why we can't use a more efficient way to
store this information in the GC.  Implementation simplicity (to have a better
GC now instead of a perfect GC at some point in a far future) is a good enough
reason :)  I'm just curious whether you found any flaws in the scheme proposed
by Frits or you just want a simpler implementation.
 However, I've thought of ways to mitigate this and have come up with the
 following:
 
     A.  Store a pointer to the bitmask iff the NO_SCAN bit is not set.
         This is a no-brainer and will prevent any overhead on,
         for example, arrays of floats.
Good idea.
     B.  Add another attributes bit to the GC called something like
         NEEDS_BITMASK.  This bit would be set iff an object mixes
         pointers and non-pointers.  If it's not set, no bitmask
         pointer would be stored.  However, the overhead of an
         extra bit may or may not be worth it.
You mean for the special case where all the fields of an object are pointers?
I think this should be rare enough that I doubt its utility, but maybe I'm
missing some common case.
 3.  The bitmask pointer would be stored at the end of every GC-allocated block
 for which a bitmask pointer is stored.  The reason for using the end of the
 block instead of the beginning is just implementation simplicity:  That way,
 finding the beginning of a block would work the same whether or not we have a
 bitmask pointer.
Did you consider storing this information outside the memory block (like the
flags)?  I think storing it in the memory block can be annoying if it isn't
always stored, because then your fixed-size blocks are not fixed anymore.  It
might be very easy to overcome this, but maybe thinking about the other option
is worthwhile.
 4.  The bitmask would be a size_t[] created at compile time by a template and
 stored in the static data segment.  Its layout would be [length of array,
 T.sizeof, offsets that need to be scanned].  For example, if you have
something like:
Again, I wonder if this information couldn't still be obtained from the
TypeInfo.
 struct Foo {
     uint bar;
     void* ptr;
 }
 
 On a 32-bit machine, bitMask!Foo would be [3, 8, 4].  On a 64-bit, it would be
 [3, 16, 8].  The reason the size of the array is stored in the array is so
 that we can get away with storing a single ptr in each memory block instead of
 a pointer and a length.
 
 5.  To store information about pinning, we simply use the high-order bits of
 the pointer offsets.  1 means pinned, 0 means not pinned.  This means that,
 for any type T, T.sizeof can't be bigger than size_t.max / 2.  I think this is
 a fairly minor limitation.
One thing I like about Frits's proposal is that information about weak pointers
was added too; that might fix another big hole in the memory management area in
D.

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Oct 26 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Leandro Lucarella (llucax gmail.com)'s article
 dsimcha, on October 26 at 23:05 you wrote:
 I've spent some free brain cycles today thinking about this and here's the
 scheme I have in mind.  If anyone thinks this could be improved in a way that
 would not have substantial ripple effects throughout the compiler/language
 (because then it might never actually get implemented) let me know.

 1.  GC.malloc's signature changes to GC.malloc(size_t size, uint ba = 0,
 size_t* bitmask = null).  A null bitmask means use the old-school behavior and
 either scan everything or don't scan anything based on the NO_SCAN bit.  IMHO
 plain old conservative scanning must be supported to allow for untyped memory
 blocks to be allocated, because requiring every memory block to have a type
 associated with it is an unacceptable limitation in a systems language.

 For now, the only way to get precise scanning would be to call malloc directly
 or use a template that does.  Eventually new would either be fixed to
 instantiate the bitMask!(T) template or eliminated entirely.  Since using the
 precise scanning by writing a template that calls GC.malloc() is a lot easier
 than hacking the compiler, this may be a catalyst for getting rid of new.

 Did you read the thread I posted?  What do you think about Frits's idea on how
 to encode the pointer information?  I'm not very familiar with TypeInfo, but
 wouldn't it be more natural to pass the TypeInfo to GC.malloc() directly, if
 it can get the pointer information itself, instead of translating that
 information?  I think if that's possible it will keep the GC interface
 simpler.
As far as I can tell, RTTI doesn't have all the stuff that's needed yet for structs and adding it would require hacking the compiler. Frankly, I want this to be simple enough to actually get implemented as opposed to just being talked about and deadlocking on everyone waiting on everyone else to do something.
 2.  Some concern has been expressed in the past about the possibility of using
 4 bytes per block in overhead to store pointers to bitmasks.  IMHO this
 concern is misplaced because in any program that takes up enough memory for
 space efficiency to matter, false pointers waste more space.  Furthermore, if
 you're programming for a toaster oven, elevator controller, etc., you probably
 aren't going to use the standard out-of-the-box GC anyhow.

 This seems reasonable, but I don't see why we can't use a more efficient way
 to store this information in the GC.  Implementation simplicity (to have a
 better GC now instead of a perfect GC at some point in a far future) is a good
 enough reason :)  I'm just curious whether you found any flaws in the scheme
 proposed by Frits or you just want a simpler implementation.
Two things:  Implementation simplicity is one.  As I've been alluding to, worse
but working in practice is better than better but only existing on paper.  The
other is:  in Frits's approach, how do you store the size of the object so you
know how many bits to interpret as the mask?
 However, I've thought of ways to mitigate this and have come up with the
 following:

     A.  Store a pointer to the bitmask iff the NO_SCAN bit is not set.
         This is a no-brainer and will prevent any overhead on,
         for example, arrays of floats.

 Good idea.

     B.  Add another attributes bit to the GC called something like
         NEEDS_BITMASK.  This bit would be set iff an object mixes
         pointers and non-pointers.  If it's not set, no bitmask
         pointer would be stored.  However, the overhead of an
         extra bit may or may not be worth it.

 You mean for the special case where all the fields of an object are pointers?
 I think this should be rare enough that I doubt its utility, but maybe I'm
 missing some common case.
You're probably right. The only common one I can think of is arrays of classes. For an array, the 4 bytes of overhead is usually negligible.
 3.  The bitmask pointer would be stored at the end of every GC-allocated block
 for which a bitmask pointer is stored.  The reason for using the end of the
 block instead of the beginning is just implementation simplicity:  That way,
 finding the beginning of a block would work the same whether or not we have a
 bitmask pointer.

 Did you consider storing this information outside the memory block (like the
 flags)?  I think storing it in the memory block can be annoying if it isn't
 always stored, because then your fixed-size blocks are not fixed anymore.  It
 might be very easy to overcome this, but maybe thinking about the other option
 is worthwhile.
Apparently the flags in the GC are designed to store single bits.  Also, as far
as weak refs go, we could use another high-order bit for that.  I don't think
anyone cares if (on 32-bit) T.sizeof is limited to about a gigabyte.  The
question is, how many high-order bits do we reserve?  BTW, I'm not going to
actually implement weak references unless I see an easy way to do it, we're
only talking about reserving bits here.

I guess the bottom line is that IMHO precise heap scanning is really
low-hanging fruit where the bang for the buck would be huge.  Improving the GC
in other ways raises some nasty issues, but in this case I really think the
perfect is the enemy of the good.  I'm not guaranteeing anything, but I think I
might be able to have precise heap scanning up and running in a release or two
if I really keep it simple and stupid.
Oct 26 2009
Leandro Lucarella <llucax gmail.com> writes:
dsimcha, on October 27 at 01:34 you wrote:
 2.  Some concern has been expressed in the past about the possibility of using
 4 bytes per block in overhead to store pointers to bitmasks.  IMHO this
 concern is misplaced because in any program that takes up enough memory for
 space efficiency to matter, false pointers waste more space.  Furthermore, if
 you're programming for a toaster oven, elevator controller, etc., you probably
 aren't going to use the standard out-of-the-box GC anyhow.

 This seems reasonable, but I don't see why we can't use a more efficient way
 to store this information in the GC.  Implementation simplicity (to have a
 better GC now instead of a perfect GC at some point in a far future) is a good
 enough reason :)  I'm just curious whether you found any flaws in the scheme
 proposed by Frits or you just want a simpler implementation.
 Two things:  Implementation simplicity is one.  As I've been alluding to,
 worse but working in practice is better than better but only existing on
 paper.

Fair enough.

 The other is:  in Frits's approach, how do you store the size of the object
 so you know how many bits to interpret as the mask?

I guess you don't.  You have a bin size, so you use the bin size.  I guess you
just mark unused words as being non-pointers, so they won't get scanned.

... I guess that doesn't work for arrays though.  I remember some discussion
about how to handle arrays, but I can't find it now...
 3.  The bitmask pointer would be stored at the end of every GC-allocated block
 for which a bitmask pointer is stored.  The reason for using the end of the
 block instead of the beginning is just implementation simplicity:  That way,
 finding the beginning of a block would work the same whether or not we have a
 bitmask pointer.

 Did you consider storing this information outside the memory block (like the
 flags)?  I think storing it in the memory block can be annoying if it isn't
 always stored, because then your fixed-size blocks are not fixed anymore.  It
 might be very easy to overcome this, but maybe thinking about the other option
 is worthwhile.

 Apparently the flags in the GC are designed to store single bits.  Also, as
 far as weak refs go, we could use another high-order bit for that.  I don't
 think anyone cares if (on 32-bit) T.sizeof is limited to about a gigabyte.
At first sight it sounds reasonable, but I'm a little afraid of it; it can
sound like "640K ought to be enough for anybody" ;)  I know it's almost
impossible to have a T.sizeof larger than 1 GiB, but what if you hit that?
Since you won't be creating more than one or two objects of that size (3 at
most), I guess it's pretty reasonable to say "manage that memory yourself".
 The question is, how many high-order bits do we reserve?  BTW, I'm not
 going to actually implement weak references unless I see an easy way to
 do it, we're only talking about reserving bits here.
I wasn't talking about bitmasks, I was talking more generally.  Even if you
want to store a pointer to the array describing the offsets of the pointers,
you could store those pointers in a separate memory block (not in the block
reserved for the object itself).  Again, I haven't thought it through, it was
just something that popped into my mind.  I guess it's a little irrelevant just
now; it's something that should be moderately easy to change in the future if
storing it elsewhere is proven to be better.
 I guess the bottom line is that IMHO precise heap scanning is really
 low-hanging fruit where the bang for the buck would be huge.  Improving the GC
 in other ways raises some nasty issues, but in this case I really think the
 perfect is the enemy of the good.  I'm not guaranteeing anything, but I think
 I might be able to have precise heap scanning up and running in a release or
 two if I really keep it simple and stupid.
Agreed.  I would very much like to know how things go.

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Oct 27 2009
Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, on October 26 at 11:01 you wrote:
 Sean Kelly wrote:
 dsimcha Wrote:

 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 dsimcha Wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.

 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?

 I've thought about it, but not done anything about it. The compiler doesn't
 provide this information, so precise scanning would require a user-level call.
 You'll also have to deal with arrays of structs, by the way.

 Arrays of structs are easy:  Generate a bitmask for one element, and keep
 reusing that bitmask until the end of the block.  Am I missing something?

 Nope.

 One question is, is there enough information for stack variables? My
 understanding from a while ago was that heap data could be reasonably
 analyzed, but stack data has no info associated with it.

There is some discussion about a precise stack in the thread I cited too.  The
stack can also be made precise by adding a shadow stack (like a TypeInfo for
the stack), but it's costly and it needs compiler support (LLVM has some
mechanisms to generate this stack information AFAIK, but I never played with
it).  The problem with the stack is that you still have to interact with C,
which has no stack information, so it's a little more controversial how much
the extra complexity will pay off.  I agree with David that it's much more
reasonable to make the heap precise first and then see how things are going
from that.

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Oct 26 2009
Leandro Lucarella <llucax gmail.com> writes:
dsimcha, on October 26 at 13:08 you wrote:
 I just realized last night that D's templates are probably powerful enough now
 to generate bit masks that can be used for precise GC heap scanning.  I'm
 halfway (emphasis on halfway) thinking of using this to try to hack the GC and
 make heap scanning fully precise except for the corner case of unions.
 However, this ties into several things that others in the D community are
 doing, so I want to gauge people's responses and make sure I'm not wasting
 effort on something that will be useless in 6 months.
 
 1.  Sean, Leonardo, whoever else may be working on GC implementations, have
 you by any chance broken ground on precise heap scanning already?
Maybe you're talking about me.  I haven't had the chance to play with this yet.
My main goal is to make the collector run concurrently with the mutator, but
I've been a little busy lately, so I haven't made many advances yet.  I would
like to play with adding precision to the GC too if I have the time.
 2.  Andrei, Walter, how close are we to actually eliminating new from the
 language?  If all allocations were done by either calling GC.malloc() or using
 templates that call GC.malloc(), then things would get a lot simpler than if I
 were to have to hack the compiler to make new pass type info to the GC.
The runtime is already receiving the type information on the allocated object
when new is used AFAIK, but this information is not propagated to gc_malloc().
So it shouldn't be too hard to add type information to the GC.  There was some
interesting discussion about this some time ago:

http://www.digitalmars.com/d/archives/digitalmars/D/Std_Phobos_2_and_logging_library_87794.html#N87831
 3.  I'm thinking bit masks could be stored as follows:
 
 When getBitMask!(T) is instantiated, it generates an immutable size_t[N].
 Element 0 is the size of the array (to allow for storing only the ptr in the
 GC), element 1 is the size of one instance of the object, in bytes.  The size
 of the memory block must be a multiple of this.  Elements 2..$ are all of the
 offsets that should be scanned for pointers.  For example:
 
 struct Foo {
     uint bar;
     void* baz;
 }
 
 getBitMask!(Foo);  // [3, 8, 4].
 
 That leaves the problem of where/how to store the pointers to this information
 in the GC efficiently.  I haven't gotten that far yet, but I remember some
 concerns have been raised in the past about storing 4 bytes per GC object for
 a pointer to the bitmask.  For my use cases, where I tend to allocate a
 relatively small number of relatively large objects, this isn't a problem.
 However, in a heavily OO environment, where people allocate tons of tiny
 objects, it might be.
In the discussion I mentioned, Frits van Bommel proposed a reasonable way to
encode the information efficiently:

http://www.digitalmars.com/d/archives/digitalmars/D/Std_Phobos_2_and_logging_library_87794.html#N87968

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Oct 26 2009
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 A moving GC, one that doesn't stop the world on collection,
 and one that's fully precise including stack would be nice, but they're several
 orders of magnitude less important and would also have more ripple effects.
I agree that doing something simple now is better than doing nothing, or than
doing something very refined at an unknown point in the future.  And things may
be improved later.

In D, objects are always managed by reference, and I think that most programs
don't alter or cast such references to something else (objects allocated in
memory specified by the programmer, and scoped objects allocated on the stack,
may be excluded from this).  So I think that it can be safe to move objects, to
compact the heap.  So you may have 5 memory zones:

- C heap.  (The type system of the D compiler may see the C-heap pointers and
D-heap pointers as two different types, as I have proposed in the past.  So you
need casts if you want to mix them, and the compiler can use the D moving heap
in a safer way.)

- Pinned D heap for everything that can't be moved, like structs managed by
pointers (eventually a D compiler can infer at compile time that some structs
too may be moved around, because their pointer is used only in clean ways; you
don't need to introduce struct references for this).  I think SafeD modules
will not use this heap a lot (but they can use unsafe modules that may use
pinned objects.  Is D safety transitive?  I think it is not, so from a SafeD
module you can call and use an unsafe module).

- Old object generation managed with compaction (I think there's no need for a
permanent-objects zone in D).

- Two "from" and "to" zones for the young generation, that don't use a true
compaction strategy; young objects bounce between them.

- New-generation Eden, where new object allocations happen, managed as a memory
arena.

All this has the disadvantage of requiring a more complex GC, and probably
requiring 2-4 times more RAM at runtime.  It hopefully has the advantage of
allowing new programmers, who have learnt Java at university, to program almost
like in Java.  (I have found a non-synthetic Java benchmark program that,
converted to D, is something like 18 times slower on LDC.  I'll put the code on
my site in the following days.)

Bye,
bearophile
Oct 29 2009
Jacob Carlborg <doob me.com> writes:
On 10/29/09 11:47, bearophile wrote:
 dsimcha:

 A moving GC, one that doesn't stop the world on collection,
 and one that's fully precise including stack would be nice, but they're several
 orders of magnitude less important and would also have more ripple effects.
 I agree that doing something simple now is better than doing nothing, or than
 doing something very refined at an unknown point in the future.  And things
 may be improved later.

 In D, objects are always managed by reference, and I think that most programs
 don't alter or cast such references to something else (objects allocated in
 memory specified by the programmer, and scoped objects allocated on the stack,
 may be excluded from this).  So I think that it can be safe to move objects,
 to compact the heap.
The current implementation of toHash in Object does that:

    return cast(hash_t)cast(void*)this;
Oct 29 2009
bearophile <bearophileHUGS lycos.com> writes:
Jacob Carlborg:

 The current implementation of toHash in Object does that: return 
 cast(hash_t)cast(void*)this;
I agree, such things will have to change when D wants a moving GC.

Bye,
bearophile
Oct 29 2009