digitalmars.D - More precise GC
- bearophile <bearophileHUGS lycos.com> Mar 28 2010
- William T. Fnk <wtfmail somewheredomain.net> Mar 28 2010
- bearophile <bearophileHUGS lycos.com> Mar 28 2010
- "Robert Jacques" <sandford jhu.edu> Mar 28 2010
- bearophile <bearophileHUGS lycos.com> Mar 28 2010
- bearophile <bearophileHUGS lycos.com> Mar 28 2010
- "Robert Jacques" <sandford jhu.edu> Mar 28 2010
- "Steven Schveighoffer" <schveiguy yahoo.com> Mar 28 2010
- "Robert Jacques" <sandford jhu.edu> Mar 28 2010
- "Robert Jacques" <sandford jhu.edu> Mar 28 2010
- "Steven Schveighoffer" <schveiguy yahoo.com> Mar 29 2010
- "Robert Jacques" <sandford jhu.edu> Mar 29 2010
This is just a small invention.
Enums can decrease the precision of the GC, because fields can be pointers or
values, an example:
enum NodeType { value, sum, product }
struct Node {
NodeType type;
union {
double val;
struct {
Node* left, right;
}
}
}
(Node can be used to build a simple expression tree, that can contain things
like: 2.5 * 3.0 + 2.0).
In this case the GC has to use the safe strategy of always following the left
and right pointers, even when they are a double.
At runtime the program usually knows what's inside the enum, if the enum truly
contains pointers. This information can be inside a tag as in this example,
inside a tagged pointer that points to the struct/enum, or at worst in the code
that uses the struct/enum. But generally the compiler is not able to use this
information, because a compiler usually doesn't understand the program
semantics.
So I can think of a new optional standard class/struct/enum method, for example
gcfollow() that the garbage collector recognizes when it is present and useful.
It gets called with the attribute name and returns true at runtime if the the
GC has to follow a pointer/reference:
struct Node {
NodeType type;
union {
double val;
struct {
Node* left, right;
}
}
bool gcfollow(string field)() {
return type != NodeType.value;
}
}
Note that the GC will call gcfollow() (here a template for more efficiency)
only with the strings of "left" or "right", because "type" and "val" attributes
can't contain pointers/references.
So this gcfollow() will be instantiated only with "left" or "right", and in
both cases it returns true if the node isn't a value, so the GC will not
mistake the "val" double in the leaves of this expression tree as a couple of
pointers. The GC will follow the left and right pointers only for the internal
nodes of the tree. This slows down the GC a little but increases its precision
(if gcfollow() has no bugs).
If gcfollow() is not present the GC acts as it does now, assuming the pointers
are always present.
If a struct contains no pointers/references the gcfollow() is never used. If a
struct contains another struct, the GC will call gcfollow() of the inner struct
too if it contains pointers/references, etc.
Bye,
bearophile
Mar 28 2010
bearophile Wrote:This is just a small invention. Enums can decrease the precision of the GC, because fields can be pointers or values, an example: enum NodeType { value, sum, product } struct Node { NodeType type; union { double val; struct { Node* left, right; } } } (Node can be used to build a simple expression tree, that can contain things like: 2.5 * 3.0 + 2.0). In this case the GC has to use the safe strategy of always following the left and right pointers, even when they are a double. At runtime the program usually knows what's inside the enum, if the enum truly contains pointers. This information can be inside a tag as in this example, inside a tagged pointer that points to the struct/enum, or at worst in the code that uses the struct/enum. But generally the compiler is not able to use this information, because a compiler usually doesn't understand the program semantics. So I can think of a new optional standard class/struct/enum method, for example gcfollow() that the garbage collector recognizes when it is present and useful. It gets called with the attribute name and returns true at runtime if the the GC has to follow a pointer/reference: struct Node { NodeType type; union { double val; struct { Node* left, right; } } bool gcfollow(string field)() { return type != NodeType.value; } } Note that the GC will call gcfollow() (here a template for more efficiency) only with the strings of "left" or "right", because "type" and "val" attributes can't contain pointers/references. So this gcfollow() will be instantiated only with "left" or "right", and in both cases it returns true if the node isn't a value, so the GC will not mistake the "val" double in the leaves of this expression tree as a couple of pointers. The GC will follow the left and right pointers only for the internal nodes of the tree. This slows down the GC a little but increases its precision (if gcfollow() has no bugs). If gcfollow() is not present the GC acts as it does now, assuming the pointers are always present. If a struct contains no pointers/references the gcfollow() is never used. If a struct contains another struct, the GC will call gcfollow() of the inner struct too if it contains pointers/references, etc. Bye, bearophile
This is a rather ridiculous way of emulating algebraic data types and value-oriented programming. But then again, that kind of features might fit perfectly to D or at least provide some food for thought for further bikeshedding.
Mar 28 2010
William T. Fnk:This is a rather ridiculous way of emulating algebraic data types<
I think algebraic data types in Haskell don't allow you to use for example enums with no tags, where the tag is stored in the less significant bit of the pointer that points to the enum. And I think algebraic data types will not be added to D, while normal not-GC-precise enums will be kept in D, so...and value-oriented programming.<
I don't know what this is. Bye, bearophile
Mar 28 2010
On Sun, 28 Mar 2010 09:40:10 -0300, bearophile <bearophileHUGS lycos.com> wrote:This is just a small invention. Enums can decrease the precision of the GC, because fields can be pointers or values, an example: enum NodeType { value, sum, product } struct Node { NodeType type; union { double val; struct { Node* left, right; } } } (Node can be used to build a simple expression tree, that can contain things like: 2.5 * 3.0 + 2.0). In this case the GC has to use the safe strategy of always following the left and right pointers, even when they are a double. At runtime the program usually knows what's inside the enum, if the enum truly contains pointers. This information can be inside a tag as in this example, inside a tagged pointer that points to the struct/enum, or at worst in the code that uses the struct/enum. But generally the compiler is not able to use this information, because a compiler usually doesn't understand the program semantics. So I can think of a new optional standard class/struct/enum method, for example gcfollow() that the garbage collector recognizes when it is present and useful. It gets called with the attribute name and returns true at runtime if the the GC has to follow a pointer/reference: struct Node { NodeType type; union { double val; struct { Node* left, right; } } bool gcfollow(string field)() { return type != NodeType.value; } } Note that the GC will call gcfollow() (here a template for more efficiency) only with the strings of "left" or "right", because "type" and "val" attributes can't contain pointers/references. So this gcfollow() will be instantiated only with "left" or "right", and in both cases it returns true if the node isn't a value, so the GC will not mistake the "val" double in the leaves of this expression tree as a couple of pointers. The GC will follow the left and right pointers only for the internal nodes of the tree. This slows down the GC a little but increases its precision (if gcfollow() has no bugs). If gcfollow() is not present the GC acts as it does now, assuming the pointers are always present. If a struct contains no pointers/references the gcfollow() is never used. If a struct contains another struct, the GC will call gcfollow() of the inner struct too if it contains pointers/references, etc. Bye, bearophile
This would require structs/arrays/etc to contain a header with a vtable, so the GC could know what to do. You'd probably save memory (the headers cost 16 bytes) and would definitely save collection time simply breaking those unions into things with pointers and things without pointers. In your example, doing that would cost some extra memory, as you'd go from a 16-byte block to a 32-byte block. However, remember, the GC allocates on 16-byte boundaries so each Node* actually has 4-bits (8 total) in which to hide an enum.
Mar 28 2010
Robert Jacques:This would require structs/arrays/etc to contain a header with a vtable, so the GC could know what to do.
Do you mean a vtable pointer? Can you explain me why this is necessary?remember, the GC allocates on 16-byte boundaries so each Node* actually has 4-bits (8 total) in which to hide an enum.
They can't be used, the D specs say that pointers to memory managed by the GC can't be used to store flags (so I too was wrong in an answer to another person), probably because they are used by the garbage to color the graph in two or three colors. Bye, bearophile
Mar 28 2010
Robert Jacques:What I think you're forgetting is that all compile-time type info is lost at runtime. [... etc]<
Thank you very much for all your explanations, I didn't know that the situation is so terrible. I suddenly like not-GC languages more :-) I think the compilation of D code must build a data structure that will be used at runtime by the GC to know the type of all variables and pointers, otherwise there's no hope in a GC that works well in long-running programs. What you have explained me means that my gcfollow is useless (it has another minor problem: sometimes the information regarding the contents of the union is not inside the struct fields, so the gcfollow can have troubles in finding such information far away).No, what you can't do is hide flags in high order bits or use tricks like XOR to store two pointers in a single field. The 4 low order bits are fair game:
This page: http://www.digitalmars.com/d/2.0/garbage.html Says: Do not take advantage of alignment of pointers to store bit flags in the low order bits: p = cast(void*)(cast(int)p | 1); // error: undefined behavior Bye, bearophile
Mar 28 2010
On Sun, 28 Mar 2010 12:32:22 -0300, bearophile <bearophileHUGS lycos.com> wrote:Robert Jacques:This would require structs/arrays/etc to contain a header with a vtable, so the GC could know what to do.
Do you mean a vtable pointer? Can you explain me why this is necessary?
Yes. What I think you're forgetting is that all compile-time type info is lost at runtime. All the GC knows about Node is that it's 16-bytes, it doesn't have an object finalizer and it contains pointers somewhere. Hopefully, in the future, it'll even know where those pointers are. But that's it. It doesn't know that that 16-byte memory chunk is a Node or that a gcfollow method exists. This is one reason struct destructors don't run when you allocate them on the heap: the GC simply doesn't know about any function to run. Naturally, this also applies to arrays and value types. So if you want to add type specific GC stuff, you have to add type-specific headers to each memory chuck.remember, the GC allocates on 16-byte boundaries so each Node* actually has 4-bits (8 total) in which to hide an enum.
They can't be used, the D specs say that pointers to memory managed by the GC can't be used to store flags (so I too was wrong in an answer to another person), probably because they are used by the garbage to color the graph in two or three colors. Bye, bearophile
No, what you can't do is hide flags in high order bits or use tricks like XOR to store two pointers in a single field. The 4 low order bits are fair game: adding 0-15 to the base pointer with still cause the struct to be properly marked. As for using those bits for the mark colors; A) D supports pointers to bytes, so the GC has to leave those bits alone. B) D's GC isn't precise and has no clue what are pointers and what aren't, so it can't use pointers in this fashion. C) The color bits apply to the object pointed to, not the pointer pointing to it.
Mar 28 2010
On Sun, 28 Mar 2010 12:50:20 -0400, bearophile <bearophileHUGS lycos.com> wrote:Robert Jacques:What I think you're forgetting is that all compile-time type info is lost at runtime. [... etc]<
Thank you very much for all your explanations, I didn't know that the situation is so terrible. I suddenly like not-GC languages more :-) I think the compilation of D code must build a data structure that will be used at runtime by the GC to know the type of all variables and pointers, otherwise there's no hope in a GC that works well in long-running programs.
The current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary. -Steve
Mar 28 2010
On Sun, 28 Mar 2010 13:50:20 -0300, bearophile <bearophileHUGS lycos.com> wrote:Robert Jacques:No, what you can't do is hide flags in high order bits or use tricks like XOR to store two pointers in a single field. The 4 low order bits are fair game:
This page: http://www.digitalmars.com/d/2.0/garbage.html Says: Do not take advantage of alignment of pointers to store bit flags in the low order bits: p = cast(void*)(cast(int)p | 1); // error: undefined behavior Bye, bearophile
Yes, hiding bit flags in pointers is generally a bad idea because it makes your code dependent on a particular GC/CPU system. Though I do think storing a single bit is going to work on anything greater than a 8-bit system.
Mar 28 2010
On Sun, 28 Mar 2010 16:16:41 -0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Sun, 28 Mar 2010 12:50:20 -0400, bearophile <bearophileHUGS lycos.com> wrote:Robert Jacques:What I think you're forgetting is that all compile-time type info is lost at runtime. [... etc]<
Thank you very much for all your explanations, I didn't know that the situation is so terrible. I suddenly like not-GC languages more :-) I think the compilation of D code must build a data structure that will be used at runtime by the GC to know the type of all variables and pointers, otherwise there's no hope in a GC that works well in long-running programs.
The current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary. -Steve
Also, don't forget that classes have a bunch of runtime type info.
Mar 28 2010
On Sun, 28 Mar 2010 23:30:32 -0400, Robert Jacques <sandford jhu.edu> wrote:On Sun, 28 Mar 2010 16:16:41 -0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:The current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary.
Also, don't forget that classes have a bunch of runtime type info.
But the GC doesn't use/need this info (except to call the destructors). At least the mark/sweep portion doesn't. -Steve
Mar 29 2010
On Mon, 29 Mar 2010 08:09:09 -0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Sun, 28 Mar 2010 23:30:32 -0400, Robert Jacques <sandford jhu.edu> wrote:On Sun, 28 Mar 2010 16:16:41 -0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:The current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary.
Also, don't forget that classes have a bunch of runtime type info.
But the GC doesn't use/need this info (except to call the destructors). At least the mark/sweep portion doesn't. -Steve
Sorry, my comment was more for a D in general than the GC. GC's in general don't know or need anything beyond a pointer mask and whether to finalize or not.
Mar 29 2010









bearophile <bearophileHUGS lycos.com> 