digitalmars.D.learn - A question/suggestion regarding the GC

bearophile (16/16) Sep 20 2008 In \phobos\internal\gc\gc.d there's this function:

Jarrett Billingsley (12/25) Sep 20 2008 This information would be embedded in the typeinfo at compile time,

bearophile (11/19) Sep 20 2008 But isn't this impossible? There are dynamic casts too.

Jarrett Billingsley (29/35) Sep 20 2008 It was sarcasm, but only because I thought you already knew how

bearophile (11/16) Sep 20 2008 I write normal code too, but mostly in other languages.

Sean Kelly (12/19) Sep 20 2008 The Phobos/Tango GC uses pools of fixed-size memory blocks that increase...

Christopher Wright (14/28) Sep 21 2008 I would prefer to avoid multiple GCs in one application. For one thing,

bearophile <bearophileHUGS lycos.com> writes:

In \phobos\internal\gc\gc.d  there's this function:

void setTypeInfo(TypeInfo ti, void* p) {
  if (ti.flags() & 1)
    hasNoPointers(p);
  else
    hasPointers(p);
}

At first sight the code of that function looks wrong, because if you call it
like this:

struct S { int* pi, int i }
setTypeInfo(typeid(S), void* p);

then the GC will look for pointers in the 'i' field too. So setTypeInfo()
becomes nearly useless to me and I can use my HasReferences!() template and
call hasNoPointers/hasPointers manually.

TypeInfo knows at runtime the name of the type, so at runtime it can use an
associative array to find the name mangling of the given type, and with that it
can determine if each field of S is a pointer/reference or not (and it can even
store such information into a bitarray for future needs, so the name demangling
is done only once, and only for types that receive a typeid() call on them).
This isn't perfect because it may mark as reference pointers to not-GC-managed
memory, but it look better than the current situation anyway.

I think this isn't going to slow down D code. And can make the GC a bit more
precise. I think this may help speed up the deallocation/management of
associative arrays too.

I am surely missing many details and I can be wrong, so if you want you can
tell me what I am wrong about, so I can learn a bit more.

Bye and thank you,
bearophile

Sep 20 2008

"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:

On Sat, Sep 20, 2008 at 3:04 PM, bearophile <bearophileHUGS lycos.com> wrote:
 In \phobos\internal\gc\gc.d  there's this function:

 void setTypeInfo(TypeInfo ti, void* p) {
  if (ti.flags() & 1)
    hasNoPointers(p);
  else
    hasPointers(p);
 }

 At first sight the code of that function looks wrong, because if you call it
like this:

 struct S { int* pi, int i }
 setTypeInfo(typeid(S), void* p);

 then the GC will look for pointers in the 'i' field too. So setTypeInfo()
becomes nearly useless to me and I can use my HasReferences!() template and
call hasNoPointers/hasPointers manually.

Congratulations, you've figured out how conservative GCs work.

 TypeInfo knows at runtime the name of the type, so at runtime it can use an
associative array to find the name mangling of the given type, and with that it
can determine if each field of S is a pointer/reference or not (and it can even
store such information into a bitarray for future needs, so the name demangling
is done only once, and only for types that receive a typeid() call on them).
This isn't perfect because it may mark as reference pointers to not-GC-managed
memory, but it look better than the current situation anyway.

This information would be embedded in the typeinfo at compile time,
not figured out at runtime.  Geez, if you're looking for
performance...

But the number of pointers to non-GC'ed memory is so minimal that it's
most likely not a problem.

 I think this isn't going to slow down D code. And can make the GC a bit more
precise. I think this may help speed up the deallocation/management of
associative arrays too.

D can never have a precise GC because of unions, sadly.  But SafeD
could have a precise GC.

The performance effects, however, are a bit more complicated than "I
think it's not going to be a problem."  You have a way with being
unconvincing in your proposals.

Sep 20 2008

bearophile <bearophileHUGS lycos.com> writes:

Jarrett Billingsley:

Congratulations, you've figured out how conservative GCs work.<

Thank you. I don't know if you are trying to be sarcastic, but for me this
stuff isn't obvious as it is for most of you here. Anyway, I'm sure that with
time I'll understand those topics.


This information would be embedded in the typeinfo at compile time, not figured
out at runtime.<

But isn't this impossible? There are dynamic casts too.


Geez, if you're looking for performance...<

If you are willing, you may explain me why doing it at runtime may
significantly damage the general performance of general D code.


But the number of pointers to non-GC'ed memory is so minimal that it's most
likely not a problem.<

Okay (in some of my programs I use mostly the C heap, so I presume my code is
not common).


D can never have a precise GC because of unions, sadly.<

But what I have tried to express is that there can be levels of precision, and
even if some sources of approximation can't be avoided (or are too much
difficult to avoid) there can be ways to remove some other of them. A more
precise GC may be better than a less precise one, even if both of them aren't
fully precise :-)


But SafeD could have a precise GC.<

I hope to see this, but I think there are some people that like D that think
that having two different GCs running at the same time is too much complext,
etc.


The performance effects, however, are a bit more complicated than "I think it's
not going to be a problem."<

Sure. I have just started understanding this topic.


You have a way with being unconvincing in your proposals.<

I presume it's the way I am. On the other hand I don't like people that are
both bold and ignorant, etc. So in the D community there are people way worse
than me ;-)

Thank you,
bearophile

Sep 20 2008

"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:

On Sat, Sep 20, 2008 at 4:58 PM, bearophile <bearophileHUGS lycos.com> wrote:

 Thank you. I don't know if you are trying to be sarcastic, but for me this
stuff isn't obvious as it is for most of you here. Anyway, I'm sure that with
time I'll understand those topics.

It was sarcasm, but only because I thought you already knew how
conservative GCs worked ;)

 But isn't this impossible? There are dynamic casts too.

Dynamic casts are only made possible by objects carrying around their
type information -- which is determined at compile time -- around with
them.  But even then, there's no need for that to come into
consideration in the GC.  I suppose what you're saying is that if you
have a class Base and a class Derived : Base, and then something else
has a Base reference that's really a Derived instance, that issues
would arise.  Not so -- any object.classinfo will point to the
typeinfo that that object _really_ is, so any instance of Derived,
even if it's stored in a Base reference, will have typeid(Derived) as
its classinfo.  So as long as the typeinfo/classinfo defines some
method or bitmask or whatever for determining what to scan, dynamic
casts don't come into it.

 If you are willing, you may explain me why doing it at runtime may
significantly damage the general performance of general D code.

It's pointless to do something at runtime if it can be done perfectly
fine at compile time.  That's the _entire reason that
statically-checkable languages exist_.

 Okay (in some of my programs I use mostly the C heap, so I presume my code is
not common).

Your code is, from what I've seen, uncommon.

 But what I have tried to express is that there can be levels of precision, and
even if some sources of approximation can't be avoided (or are too much
difficult to avoid) there can be ways to remove some other of them. A more
precise GC may be better than a less precise one, even if both of them aren't
fully precise :-)

True :)

 I hope to see this, but I think there are some people that like D that think
that having two different GCs running at the same time is too much complext,
etc.

Let me guess: in that sentence, "some people" means "me" (you).

Actually it wouldn't require having 2 GCs at all, as long as you
forewent the possibility of having a moving GC.  More or less SafeD
would be memory-safe, and so this theoretical "almost-precise" GC
_would_ be precise for SafeD code, and would only be imprecise when
dealing with non-SafeD code.  It would all be covered by one GC,
though.

If your _entire_ program were in SafeD, however, you could come up
with a new GC that could do heap compaction and the like.

Sep 20 2008

bearophile <bearophileHUGS lycos.com> writes:

Jarrett Billingsley:

I thought you already knew how conservative GCs worked ;)<

Me too. In my mind there are dead zones of theory that I have to transform into
small living nets of practical knowledge :-)


Your code is, from what I've seen, uncommon.<

I write normal code too, but mostly in other languages.


Let me guess: in that sentence, "some people" means "me" (you).<

No, when I mean me I say me.
I'd like to see a built-in Safe D (maybe like you tag #unsafe


It would all be covered by one GC, though.<

I see, good.


If your _entire_ program were in SafeD, however, you could come up with a new
GC that could do heap compaction and the like.<

From what I have read the compacting is very important, without it the heap may
become quite fragmented so the program may become very slow.
http://deepheap.blogspot.com/2008/09/ghost-in-java-virtual-machine.html
This may mean that currently D may be not much fit to write a program that has
to run no-stop for days, because I think there are no ways to avoid high
fragmentation of the heap.

Bye and thank you,
bearophile

Sep 20 2008

Sean Kelly <sean invisibleduck.org> writes:

bearophile wrote:
 Jarrett Billingsley: 
 
 If your _entire_ program were in SafeD, however, you could come up with a new
GC that could do heap compaction and the like.<

 
 From what I have read the compacting is very important, without it the heap
may become quite fragmented so the program may become very slow.
 http://deepheap.blogspot.com/2008/09/ghost-in-java-virtual-machine.html
 This may mean that currently D may be not much fit to write a program that has
to run no-stop for days, because I think there are no ways to avoid high
fragmentation of the heap.

The Phobos/Tango GC uses pools of fixed-size memory blocks that increase 
in powers of two, so 16, 32, 64, etc, bytes.  Fragmentation is basically 
impossible with this design.

More generally, there are papers in print showing that fragmentation in 
allocators is a solved problem, so in general I wouldn't worry about it 
these days anyway.  If there are Java VMs with fragmentation issues then 
I'd say their allocators are junk.

That said, compaction can also reduce the number of page faults in some 
apps as well as reduce the process memory footprint, so there are 
definitely advantages to it even when fragmentation is not a concern.


Sean

Sep 20 2008

Christopher Wright <dhasenan gmail.com> writes:

Jarrett Billingsley wrote:
 On Sat, Sep 20, 2008 at 4:58 PM, bearophile <bearophileHUGS lycos.com> wrote:
 I hope to see this, but I think there are some people that like D that think
that having two different GCs running at the same time is too much complext,
etc.

 
 Let me guess: in that sentence, "some people" means "me" (you).
 
 Actually it wouldn't require having 2 GCs at all, as long as you
 forewent the possibility of having a moving GC.  More or less SafeD
 would be memory-safe, and so this theoretical "almost-precise" GC
 _would_ be precise for SafeD code, and would only be imprecise when
 dealing with non-SafeD code.  It would all be covered by one GC,
 though.
 
 If your _entire_ program were in SafeD, however, you could come up
 with a new GC that could do heap compaction and the like.

I would prefer to avoid multiple GCs in one application. For one thing, 
fullCollect would either have to scan twice or only collect stuff from 
one GC. Probably the former; due to the use of SIGUSR1/SIGUSR2 on Unix, 
it'd be difficult to have two GCs run separately in one process without 
some sort of controlling GC.

Anyway, two GCs running at once is guaranteed to be inefficient.

You could usefully come up with a pluggable GC that does much of the 
scanning and so forth, maybe. But I wouldn't hold my breath.

Still, heap compaction doesn't require SafeD. It requires compiler 
support, but not language support. The only thing that you can't do 
without language support is a precise GC that handles unions. (Maybe you 
could do that as well, if the compiler inserted a discriminator value.) 
You could accomplish this with C or C++ or D.

Sep 21 2008

D Programming

C/C++ Programming

Other

digitalmars.D.learn - A question/suggestion regarding the GC