www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - A question/suggestion regarding the GC

reply bearophile <bearophileHUGS lycos.com> writes:
In \phobos\internal\gc\gc.d  there's this function:

void setTypeInfo(TypeInfo ti, void* p) {
  if (ti.flags() & 1)
    hasNoPointers(p);
  else
    hasPointers(p);
}

At first sight the code of that function looks wrong, because if you call it
like this:

struct S { int* pi, int i }
setTypeInfo(typeid(S), void* p);

then the GC will look for pointers in the 'i' field too. So setTypeInfo()
becomes nearly useless to me and I can use my HasReferences!() template and
call hasNoPointers/hasPointers manually.

TypeInfo knows at runtime the name of the type, so at runtime it can use an
associative array to find the name mangling of the given type, and with that it
can determine if each field of S is a pointer/reference or not (and it can even
store such information into a bitarray for future needs, so the name demangling
is done only once, and only for types that receive a typeid() call on them).
This isn't perfect because it may mark as reference pointers to not-GC-managed
memory, but it look better than the current situation anyway.

I think this isn't going to slow down D code. And can make the GC a bit more
precise. I think this may help speed up the deallocation/management of
associative arrays too.

I am surely missing many details and I can be wrong, so if you want you can
tell me what I am wrong about, so I can learn a bit more.

Bye and thank you,
bearophile
Sep 20 2008
next sibling parent reply "Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Sat, Sep 20, 2008 at 3:04 PM, bearophile <bearophileHUGS lycos.com> wrote:
 In \phobos\internal\gc\gc.d  there's this function:

 void setTypeInfo(TypeInfo ti, void* p) {
  if (ti.flags() & 1)
    hasNoPointers(p);
  else
    hasPointers(p);
 }

 At first sight the code of that function looks wrong, because if you call it
like this:

 struct S { int* pi, int i }
 setTypeInfo(typeid(S), void* p);

 then the GC will look for pointers in the 'i' field too. So setTypeInfo()
becomes nearly useless to me and I can use my HasReferences!() template and
call hasNoPointers/hasPointers manually.

Congratulations, you've figured out how conservative GCs work.
 TypeInfo knows at runtime the name of the type, so at runtime it can use an
associative array to find the name mangling of the given type, and with that it
can determine if each field of S is a pointer/reference or not (and it can even
store such information into a bitarray for future needs, so the name demangling
is done only once, and only for types that receive a typeid() call on them).
This isn't perfect because it may mark as reference pointers to not-GC-managed
memory, but it look better than the current situation anyway.

This information would be embedded in the typeinfo at compile time, not figured out at runtime. Geez, if you're looking for performance... But the number of pointers to non-GC'ed memory is so minimal that it's most likely not a problem.
 I think this isn't going to slow down D code. And can make the GC a bit more
precise. I think this may help speed up the deallocation/management of
associative arrays too.

D can never have a precise GC because of unions, sadly. But SafeD could have a precise GC. The performance effects, however, are a bit more complicated than "I think it's not going to be a problem." You have a way with being unconvincing in your proposals.
Sep 20 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Jarrett Billingsley:

Congratulations, you've figured out how conservative GCs work.<

Thank you. I don't know if you are trying to be sarcastic, but for me this stuff isn't obvious as it is for most of you here. Anyway, I'm sure that with time I'll understand those topics.
This information would be embedded in the typeinfo at compile time, not figured
out at runtime.<

But isn't this impossible? There are dynamic casts too.
Geez, if you're looking for performance...<

If you are willing, you may explain me why doing it at runtime may significantly damage the general performance of general D code.
But the number of pointers to non-GC'ed memory is so minimal that it's most
likely not a problem.<

Okay (in some of my programs I use mostly the C heap, so I presume my code is not common).
D can never have a precise GC because of unions, sadly.<

But what I have tried to express is that there can be levels of precision, and even if some sources of approximation can't be avoided (or are too much difficult to avoid) there can be ways to remove some other of them. A more precise GC may be better than a less precise one, even if both of them aren't fully precise :-)
But SafeD could have a precise GC.<

I hope to see this, but I think there are some people that like D that think that having two different GCs running at the same time is too much complext, etc.
The performance effects, however, are a bit more complicated than "I think it's
not going to be a problem."<

Sure. I have just started understanding this topic.
You have a way with being unconvincing in your proposals.<

I presume it's the way I am. On the other hand I don't like people that are both bold and ignorant, etc. So in the D community there are people way worse than me ;-) Thank you, bearophile
Sep 20 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jarrett Billingsley:

I thought you already knew how conservative GCs worked ;)<

Me too. In my mind there are dead zones of theory that I have to transform into small living nets of practical knowledge :-)
Your code is, from what I've seen, uncommon.<

I write normal code too, but mostly in other languages.
Let me guess: in that sentence, "some people" means "me" (you).<

No, when I mean me I say me. I'd like to see a built-in Safe D (maybe like you tag #unsafe parts in C#?).
It would all be covered by one GC, though.<

I see, good.
If your _entire_ program were in SafeD, however, you could come up with a new
GC that could do heap compaction and the like.<

From what I have read the compacting is very important, without it the heap may become quite fragmented so the program may become very slow. http://deepheap.blogspot.com/2008/09/ghost-in-java-virtual-machine.html This may mean that currently D may be not much fit to write a program that has to run no-stop for days, because I think there are no ways to avoid high fragmentation of the heap. Bye and thank you, bearophile
Sep 20 2008
parent Sean Kelly <sean invisibleduck.org> writes:
bearophile wrote:
 Jarrett Billingsley: 
 
 If your _entire_ program were in SafeD, however, you could come up with a new
GC that could do heap compaction and the like.<

From what I have read the compacting is very important, without it the heap may become quite fragmented so the program may become very slow. http://deepheap.blogspot.com/2008/09/ghost-in-java-virtual-machine.html This may mean that currently D may be not much fit to write a program that has to run no-stop for days, because I think there are no ways to avoid high fragmentation of the heap.

The Phobos/Tango GC uses pools of fixed-size memory blocks that increase in powers of two, so 16, 32, 64, etc, bytes. Fragmentation is basically impossible with this design. More generally, there are papers in print showing that fragmentation in allocators is a solved problem, so in general I wouldn't worry about it these days anyway. If there are Java VMs with fragmentation issues then I'd say their allocators are junk. That said, compaction can also reduce the number of page faults in some apps as well as reduce the process memory footprint, so there are definitely advantages to it even when fragmentation is not a concern. Sean
Sep 20 2008
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Jarrett Billingsley wrote:
 On Sat, Sep 20, 2008 at 4:58 PM, bearophile <bearophileHUGS lycos.com> wrote:
 I hope to see this, but I think there are some people that like D that think
that having two different GCs running at the same time is too much complext,
etc.

Let me guess: in that sentence, "some people" means "me" (you). Actually it wouldn't require having 2 GCs at all, as long as you forewent the possibility of having a moving GC. More or less SafeD would be memory-safe, and so this theoretical "almost-precise" GC _would_ be precise for SafeD code, and would only be imprecise when dealing with non-SafeD code. It would all be covered by one GC, though. If your _entire_ program were in SafeD, however, you could come up with a new GC that could do heap compaction and the like.

I would prefer to avoid multiple GCs in one application. For one thing, fullCollect would either have to scan twice or only collect stuff from one GC. Probably the former; due to the use of SIGUSR1/SIGUSR2 on Unix, it'd be difficult to have two GCs run separately in one process without some sort of controlling GC. Anyway, two GCs running at once is guaranteed to be inefficient. You could usefully come up with a pluggable GC that does much of the scanning and so forth, maybe. But I wouldn't hold my breath. Still, heap compaction doesn't require SafeD. It requires compiler support, but not language support. The only thing that you can't do without language support is a precise GC that handles unions. (Maybe you could do that as well, if the compiler inserted a discriminator value.) You could accomplish this with C or C++ or D.
Sep 21 2008
prev sibling parent "Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Sat, Sep 20, 2008 at 4:58 PM, bearophile <bearophileHUGS lycos.com> wrote:

 Thank you. I don't know if you are trying to be sarcastic, but for me this
stuff isn't obvious as it is for most of you here. Anyway, I'm sure that with
time I'll understand those topics.

It was sarcasm, but only because I thought you already knew how conservative GCs worked ;)
 But isn't this impossible? There are dynamic casts too.

Dynamic casts are only made possible by objects carrying around their type information -- which is determined at compile time -- around with them. But even then, there's no need for that to come into consideration in the GC. I suppose what you're saying is that if you have a class Base and a class Derived : Base, and then something else has a Base reference that's really a Derived instance, that issues would arise. Not so -- any object.classinfo will point to the typeinfo that that object _really_ is, so any instance of Derived, even if it's stored in a Base reference, will have typeid(Derived) as its classinfo. So as long as the typeinfo/classinfo defines some method or bitmask or whatever for determining what to scan, dynamic casts don't come into it.
 If you are willing, you may explain me why doing it at runtime may
significantly damage the general performance of general D code.

It's pointless to do something at runtime if it can be done perfectly fine at compile time. That's the _entire reason that statically-checkable languages exist_.
 Okay (in some of my programs I use mostly the C heap, so I presume my code is
not common).

Your code is, from what I've seen, uncommon.
 But what I have tried to express is that there can be levels of precision, and
even if some sources of approximation can't be avoided (or are too much
difficult to avoid) there can be ways to remove some other of them. A more
precise GC may be better than a less precise one, even if both of them aren't
fully precise :-)

True :)
 I hope to see this, but I think there are some people that like D that think
that having two different GCs running at the same time is too much complext,
etc.

Let me guess: in that sentence, "some people" means "me" (you). Actually it wouldn't require having 2 GCs at all, as long as you forewent the possibility of having a moving GC. More or less SafeD would be memory-safe, and so this theoretical "almost-precise" GC _would_ be precise for SafeD code, and would only be imprecise when dealing with non-SafeD code. It would all be covered by one GC, though. If your _entire_ program were in SafeD, however, you could come up with a new GC that could do heap compaction and the like.
Sep 20 2008