www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - More precise GC

reply bearophile <bearophileHUGS lycos.com> writes:
This is just a small invention.
Enums can decrease the precision of the GC, because fields can be pointers or
values, an example:


enum NodeType { value, sum, product }

struct Node {
    NodeType type;

    union {
        double val;
        struct {
            Node* left, right;
        }
    }
}

(Node can be used to build a simple expression tree, that can contain things
like: 2.5 * 3.0 + 2.0).

In this case the GC has to use the safe strategy of always following the left
and right pointers, even when they are a double.

At runtime the program usually knows what's inside the enum, if the enum truly
contains pointers. This information can be inside a tag as in this example,
inside a tagged pointer that points to the struct/enum, or at worst in the code
that uses the struct/enum. But generally the compiler is not able to use this
information, because a compiler usually doesn't understand the program
semantics.

So I can think of a new optional standard class/struct/enum method, for example
gcfollow() that the garbage collector recognizes when it is present and useful.
It gets called with the attribute name and returns true at runtime if the the
GC has to follow a pointer/reference:


struct Node {
    NodeType type;

    union {
        double val;
        struct {
            Node* left, right;
        }
    }

    bool gcfollow(string field)() {
        return type != NodeType.value;
    }
}


Note that the GC will call gcfollow() (here a template for more efficiency)
only with the strings of "left" or "right", because "type" and "val" attributes
can't contain pointers/references.

So this gcfollow() will be instantiated only with "left" or "right", and in
both cases it returns true if the node isn't a value, so the GC will not
mistake the "val" double in the leaves of this expression tree as a couple of
pointers. The GC will follow the left and right pointers only for the internal
nodes of the tree. This slows down the GC a little but increases its precision
(if gcfollow() has no bugs).

If gcfollow() is not present the GC acts as it does now, assuming the pointers
are always present.

If a struct contains no pointers/references the gcfollow() is never used. If a
struct contains another struct, the GC will call gcfollow() of the inner struct
too if it contains pointers/references, etc.

Bye,
bearophile
Mar 28 2010
next sibling parent reply William T. Fnk <wtfmail somewheredomain.net> writes:
bearophile Wrote:

 This is just a small invention.
 Enums can decrease the precision of the GC, because fields can be pointers or
values, an example:
 
 
 enum NodeType { value, sum, product }
 
 struct Node {
     NodeType type;
 
     union {
         double val;
         struct {
             Node* left, right;
         }
     }
 }
 
 (Node can be used to build a simple expression tree, that can contain things
like: 2.5 * 3.0 + 2.0).
 
 In this case the GC has to use the safe strategy of always following the left
and right pointers, even when they are a double.
 
 At runtime the program usually knows what's inside the enum, if the enum truly
contains pointers. This information can be inside a tag as in this example,
inside a tagged pointer that points to the struct/enum, or at worst in the code
that uses the struct/enum. But generally the compiler is not able to use this
information, because a compiler usually doesn't understand the program
semantics.
 
 So I can think of a new optional standard class/struct/enum method, for
example gcfollow() that the garbage collector recognizes when it is present and
useful. It gets called with the attribute name and returns true at runtime if
the the GC has to follow a pointer/reference:
 
 
 struct Node {
     NodeType type;
 
     union {
         double val;
         struct {
             Node* left, right;
         }
     }
 
     bool gcfollow(string field)() {
         return type != NodeType.value;
     }
 }
 
 
 Note that the GC will call gcfollow() (here a template for more efficiency)
only with the strings of "left" or "right", because "type" and "val" attributes
can't contain pointers/references.
 
 So this gcfollow() will be instantiated only with "left" or "right", and in
both cases it returns true if the node isn't a value, so the GC will not
mistake the "val" double in the leaves of this expression tree as a couple of
pointers. The GC will follow the left and right pointers only for the internal
nodes of the tree. This slows down the GC a little but increases its precision
(if gcfollow() has no bugs).
 
 If gcfollow() is not present the GC acts as it does now, assuming the pointers
are always present.
 
 If a struct contains no pointers/references the gcfollow() is never used. If a
struct contains another struct, the GC will call gcfollow() of the inner struct
too if it contains pointers/references, etc.
 
 Bye,
 bearophile

This is a rather ridiculous way of emulating algebraic data types and value-oriented programming. But then again, that kind of features might fit perfectly to D or at least provide some food for thought for further bikeshedding.
Mar 28 2010
parent bearophile <bearophileHUGS lycos.com> writes:
William T. Fnk:
This is a rather ridiculous way of emulating algebraic data types<

I think algebraic data types in Haskell don't allow you to use for example enums with no tags, where the tag is stored in the less significant bit of the pointer that points to the enum. And I think algebraic data types will not be added to D, while normal not-GC-precise enums will be kept in D, so...
and value-oriented programming.<

I don't know what this is. Bye, bearophile
Mar 28 2010
prev sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 28 Mar 2010 09:40:10 -0300, bearophile <bearophileHUGS lycos.com>  
wrote:
 This is just a small invention.
 Enums can decrease the precision of the GC, because fields can be  
 pointers or values, an example:


 enum NodeType { value, sum, product }

 struct Node {
     NodeType type;

     union {
         double val;
         struct {
             Node* left, right;
         }
     }
 }

 (Node can be used to build a simple expression tree, that can contain  
 things like: 2.5 * 3.0 + 2.0).

 In this case the GC has to use the safe strategy of always following the  
 left and right pointers, even when they are a double.

 At runtime the program usually knows what's inside the enum, if the enum  
 truly contains pointers. This information can be inside a tag as in this  
 example, inside a tagged pointer that points to the struct/enum, or at  
 worst in the code that uses the struct/enum. But generally the compiler  
 is not able to use this information, because a compiler usually doesn't  
 understand the program semantics.

 So I can think of a new optional standard class/struct/enum method, for  
 example gcfollow() that the garbage collector recognizes when it is  
 present and useful. It gets called with the attribute name and returns  
 true at runtime if the the GC has to follow a pointer/reference:


 struct Node {
     NodeType type;

     union {
         double val;
         struct {
             Node* left, right;
         }
     }

     bool gcfollow(string field)() {
         return type != NodeType.value;
     }
 }


 Note that the GC will call gcfollow() (here a template for more  
 efficiency) only with the strings of "left" or "right", because "type"  
 and "val" attributes can't contain pointers/references.

 So this gcfollow() will be instantiated only with "left" or "right", and  
 in both cases it returns true if the node isn't a value, so the GC will  
 not mistake the "val" double in the leaves of this expression tree as a  
 couple of pointers. The GC will follow the left and right pointers only  
 for the internal nodes of the tree. This slows down the GC a little but  
 increases its precision (if gcfollow() has no bugs).

 If gcfollow() is not present the GC acts as it does now, assuming the  
 pointers are always present.

 If a struct contains no pointers/references the gcfollow() is never  
 used. If a struct contains another struct, the GC will call gcfollow()  
 of the inner struct too if it contains pointers/references, etc.

 Bye,
 bearophile

This would require structs/arrays/etc to contain a header with a vtable, so the GC could know what to do. You'd probably save memory (the headers cost 16 bytes) and would definitely save collection time simply breaking those unions into things with pointers and things without pointers. In your example, doing that would cost some extra memory, as you'd go from a 16-byte block to a 32-byte block. However, remember, the GC allocates on 16-byte boundaries so each Node* actually has 4-bits (8 total) in which to hide an enum.
Mar 28 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Jacques:

 This would require structs/arrays/etc to contain a header with a vtable,  
 so the GC could know what to do.

Do you mean a vtable pointer? Can you explain me why this is necessary?
 remember, the GC allocates on  
 16-byte boundaries so each Node* actually has 4-bits (8 total) in which to  
 hide an enum.

They can't be used, the D specs say that pointers to memory managed by the GC can't be used to store flags (so I too was wrong in an answer to another person), probably because they are used by the garbage to color the graph in two or three colors. Bye, bearophile
Mar 28 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Robert Jacques:

What I think you're forgetting is that all compile-time type info is lost at
runtime. [... etc]<

Thank you very much for all your explanations, I didn't know that the situation is so terrible. I suddenly like not-GC languages more :-) I think the compilation of D code must build a data structure that will be used at runtime by the GC to know the type of all variables and pointers, otherwise there's no hope in a GC that works well in long-running programs. What you have explained me means that my gcfollow is useless (it has another minor problem: sometimes the information regarding the contents of the union is not inside the struct fields, so the gcfollow can have troubles in finding such information far away).
 No, what you can't do is hide flags in high order bits or use tricks like  
 XOR to store two pointers in a single field. The 4 low order bits are fair  
 game:

This page: http://www.digitalmars.com/d/2.0/garbage.html Says: Do not take advantage of alignment of pointers to store bit flags in the low order bits: p = cast(void*)(cast(int)p | 1); // error: undefined behavior Bye, bearophile
Mar 28 2010
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 28 Mar 2010 12:32:22 -0300, bearophile <bearophileHUGS lycos.com>  
wrote:
 Robert Jacques:

 This would require structs/arrays/etc to contain a header with a vtable,
 so the GC could know what to do.

Do you mean a vtable pointer? Can you explain me why this is necessary?

Yes. What I think you're forgetting is that all compile-time type info is lost at runtime. All the GC knows about Node is that it's 16-bytes, it doesn't have an object finalizer and it contains pointers somewhere. Hopefully, in the future, it'll even know where those pointers are. But that's it. It doesn't know that that 16-byte memory chunk is a Node or that a gcfollow method exists. This is one reason struct destructors don't run when you allocate them on the heap: the GC simply doesn't know about any function to run. Naturally, this also applies to arrays and value types. So if you want to add type specific GC stuff, you have to add type-specific headers to each memory chuck.
 remember, the GC allocates on
 16-byte boundaries so each Node* actually has 4-bits (8 total) in which  
 to
 hide an enum.

They can't be used, the D specs say that pointers to memory managed by the GC can't be used to store flags (so I too was wrong in an answer to another person), probably because they are used by the garbage to color the graph in two or three colors. Bye, bearophile

No, what you can't do is hide flags in high order bits or use tricks like XOR to store two pointers in a single field. The 4 low order bits are fair game: adding 0-15 to the base pointer with still cause the struct to be properly marked. As for using those bits for the mark colors; A) D supports pointers to bytes, so the GC has to leave those bits alone. B) D's GC isn't precise and has no clue what are pointers and what aren't, so it can't use pointers in this fashion. C) The color bits apply to the object pointed to, not the pointer pointing to it.
Mar 28 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 28 Mar 2010 12:50:20 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Robert Jacques:

 What I think you're forgetting is that all compile-time type info is  
 lost at runtime. [... etc]<

Thank you very much for all your explanations, I didn't know that the situation is so terrible. I suddenly like not-GC languages more :-) I think the compilation of D code must build a data structure that will be used at runtime by the GC to know the type of all variables and pointers, otherwise there's no hope in a GC that works well in long-running programs.

The current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary. -Steve
Mar 28 2010
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 28 Mar 2010 13:50:20 -0300, bearophile <bearophileHUGS lycos.com>  
wrote:
 Robert Jacques:
 No, what you can't do is hide flags in high order bits or use tricks  
 like
 XOR to store two pointers in a single field. The 4 low order bits are  
 fair
 game:

This page: http://www.digitalmars.com/d/2.0/garbage.html Says: Do not take advantage of alignment of pointers to store bit flags in the low order bits: p = cast(void*)(cast(int)p | 1); // error: undefined behavior Bye, bearophile

Yes, hiding bit flags in pointers is generally a bad idea because it makes your code dependent on a particular GC/CPU system. Though I do think storing a single bit is going to work on anything greater than a 8-bit system.
Mar 28 2010
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 28 Mar 2010 16:16:41 -0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Sun, 28 Mar 2010 12:50:20 -0400, bearophile  
 <bearophileHUGS lycos.com> wrote:

 Robert Jacques:

 What I think you're forgetting is that all compile-time type info is  
 lost at runtime. [... etc]<

Thank you very much for all your explanations, I didn't know that the situation is so terrible. I suddenly like not-GC languages more :-) I think the compilation of D code must build a data structure that will be used at runtime by the GC to know the type of all variables and pointers, otherwise there's no hope in a GC that works well in long-running programs.

The current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary. -Steve

Also, don't forget that classes have a bunch of runtime type info.
Mar 28 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 28 Mar 2010 23:30:32 -0400, Robert Jacques <sandford jhu.edu>  
wrote:

 On Sun, 28 Mar 2010 16:16:41 -0300, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 The current GC has a simple "type info" if you will -- contains  
 pointers or doesn't contain pointers.  It doesn't mean we cannot add to  
 that.  In fact, I think dsimcha has provided a way to have precise  
 scanning for heap-allocated types.  I don't think a reasonably precise  
 GC is out of the question.  However, it may be too much to require the  
 GC to do semantic analysis of enums for unions.  Not impossible, but  
 probably not worth the effort and restrictions necessary.

Also, don't forget that classes have a bunch of runtime type info.

But the GC doesn't use/need this info (except to call the destructors). At least the mark/sweep portion doesn't. -Steve
Mar 29 2010
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 29 Mar 2010 08:09:09 -0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Sun, 28 Mar 2010 23:30:32 -0400, Robert Jacques <sandford jhu.edu>  
 wrote:

 On Sun, 28 Mar 2010 16:16:41 -0300, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 The current GC has a simple "type info" if you will -- contains  
 pointers or doesn't contain pointers.  It doesn't mean we cannot add  
 to that.  In fact, I think dsimcha has provided a way to have precise  
 scanning for heap-allocated types.  I don't think a reasonably precise  
 GC is out of the question.  However, it may be too much to require the  
 GC to do semantic analysis of enums for unions.  Not impossible, but  
 probably not worth the effort and restrictions necessary.

Also, don't forget that classes have a bunch of runtime type info.

But the GC doesn't use/need this info (except to call the destructors). At least the mark/sweep portion doesn't. -Steve

Sorry, my comment was more for a D in general than the GC. GC's in general don't know or need anything beyond a pointer mask and whether to finalize or not.
Mar 29 2010