digitalmars.D - Changes in the D2 design to help the GC?

bearophile <bearophileHUGS lycos.com> writes:
In Java the GC is able to collect garbage very quickly, so Java programmers
allocate many small objects quite often.
In functional-style languages like Scala, Clojure, F#, etc., most data is
immutable, so again the GC is under a lot of pressure, allocating and freeing
many small structures all the time.

D2 syntax allows both styles of programming (you can program in D almost as in
Java, if you want), but if you follow either of those two styles you will see
that the current D GC is much less efficient than the Java/F# ones, and this
leads to low performance. (Scoped classes are not enough.)

I am not a GC expert yet, but I'm certain there are ways to improve the
current situation. Besides improving the GC itself, there may be ways to modify
the current design of D2 a bit to help the design of a more efficient GC. Do
you have ideas?

Some time ago I suggested splitting D pointers into two types: the GC-managed
ones and the ones that point into the C heap, which the GC never touches. The
type system can ensure they never get mixed by mistake. Now I think (just an
idea) the GC-managed pointers can themselves be split into two types: the ones
that are fully managed by a moving GC (see below) and the ones managed by a
conservative GC, whose memory is pinned, so the GC doesn't move it around. The
type system will ensure these three groups don't mix unless the programmer is
really determined to mix them :-)

A simple idea of mine to improve the GC (without changing the D2 language yet)
is to split the D GC into two parts. One is a moving collector that acts like a
Java-style GC and is especially useful in SafeD code. It would become the one
used in OOP/functional-style code, and is probably the GC that will be used in
most of the code of most D programs. The second part acts in a conservative
way, like the current GC; it's safer. It manages "pinned" blocks of memory that
can't be moved; such memory is usually the kind managed in lower-level D
modules, by user-written collections, etc. The performance of this second part
will be lower (like the current one), but most data will not be managed by it
anyway.
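
To make the two-part idea concrete, here is a toy model in Python (not D
runtime code, and not how any real collector is implemented). The names
TwoPartHeap, alloc, collect and the handle table are all hypothetical, and the
list of roots passed to collect stands in for real tracing: objects here never
reference each other. It only shows that the moving region is compacted while
the pinned region keeps its addresses.

```python
class TwoPartHeap:
    """Toy model: a compacting 'moving' region plus a non-moving 'pinned' region."""

    def __init__(self):
        self.moving = []    # slots of the moving region, compacted on collect()
        self.pinned = []    # slots of the pinned region, never relocated
        self.handles = {}   # handle -> (region name, slot index)
        self._next_handle = 0

    def alloc(self, value, pinned=False):
        """Allocate `value` in one of the two regions and return a handle."""
        name = "pinned" if pinned else "moving"
        region = self.pinned if pinned else self.moving
        region.append(value)
        handle = self._next_handle
        self._next_handle += 1
        self.handles[handle] = (name, len(region) - 1)
        return handle

    def address(self, handle):
        """Current (region, index) of an object; changes only for moving objects."""
        return self.handles[handle]

    def read(self, handle):
        name, index = self.handles[handle]
        return getattr(self, name)[index]

    def collect(self, roots):
        """Free everything not listed in `roots`; compact only the moving region."""
        live = set(roots)
        compacted = []
        for handle, (name, index) in list(self.handles.items()):
            if handle not in live:
                if name == "pinned":
                    self.pinned[index] = None   # freed in place, nothing moves
                del self.handles[handle]
            elif name == "moving":
                compacted.append(self.moving[index])   # slide live object down
                self.handles[handle] = ("moving", len(compacted) - 1)
        self.moving = compacted


heap = TwoPartHeap()
temp = heap.alloc("short-lived temporary")
node = heap.alloc("list node")
buf = heap.alloc("buffer handed to C code", pinned=True)
heap.collect(roots=[node, buf])
print(heap.address(node))   # ('moving', 0): relocated from index 1
print(heap.address(buf))    # ('pinned', 0): unchanged
```

In this model the cost of the pinned region is that freed slots leave holes,
which is the kind of lower performance the second part of the GC is expected
to have.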

When you use LDC, the slow GC is one of the few parts of the D language that
still have low performance (the other two are that D currently isn't able to
inline closures and virtual methods. Such things too will eventually need to be
addressed if D wants to become high-performance. I can leave such topics to
other posts/threads).

Bye,
bearophile
Jul 15 2009
KennyTM~ <kennytm gmail.com> writes:
bearophile wrote:
<snip>
 Some time ago I suggested splitting D pointers into two types: the GC-managed
 ones and the ones that point into the C heap, which the GC never touches. The
 type system can ensure they never get mixed by mistake. Now I think (just an
 idea) the GC-managed pointers can themselves be split into two types: the ones
 that are fully managed by a moving GC (see below) and the ones managed by a
 conservative GC, whose memory is pinned, so the GC doesn't move it around. The
 type system will ensure these three groups don't mix unless the programmer is
 really determined to mix them :-)
 

No way, 3 kinds of constness is confusing enough.

Jul 15 2009
Lutger <lutger.blijdestijn gmail.com> writes:
I'm worried too about this, but haven't a clue as to what is needed to 
overcome the performance gap. I don't think extending the type system in a 
major way for some extra performance is worth it. Still, there may be some 
ways to make less drastic adjustments so that a (more) precise GC can be 
built. Or to put it another way: to not make a high performance GC for D 
impossible in the future.
Jul 15 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
bearophile wrote:
<snip>
 Some time ago I suggested splitting D pointers into two types, the
 GC-managed ones and the ones that point into the C heap, which the GC
 never touches. The type system can ensure they never get mixed by
 mistake.

I can imagine this making interfacing with external APIs a pain in the rear end....

 Now I think (just an idea) the type of GC-managed pointers
 can be split into two types: the ones that are fully managed by a
 moving GC (see below) and the ones managed by a conservative GC, whose
 memory is pinned, so the GC doesn't move it around. The type system
 will ensure these three groups don't mix unless the programmer is
 really determined to mix them :-)

I'm not sure that having two separate, independent GCs will work. But having
two GC heaps along these lines might.

One way I can see is having an "immovable" type modifier in line with const
and invariant. Anything that isn't allocated as immovable, the GC may move
around if it's clever enough. But an immovable reference could just as well be
implicitly convertible to a non-immovable reference - the GC'll know which heap
it points into.

Immovable might be useful for interfacing external APIs. We could also spec
that only immovable pointer/reference types may be used in a union.

BTW even D1 needs some work in the area of moving GC:
http://d.puremagic.com/issues/show_bug.cgi?id=679

Stewart.
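
The one-way conversion described above can be sketched as a small Python
model. This is an illustration only, not proposed D syntax: the classes Ref
and ImmovableRef, the two address ranges, and the functions may_move and
call_external_api are all hypothetical. Subclassing stands in for the implicit
immovable-to-ordinary conversion, and the collector looks only at the address
to decide which heap a reference points into.

```python
# Hypothetical address ranges for the two GC heaps.
MOVABLE_HEAP = range(0x1000, 0x2000)
IMMOVABLE_HEAP = range(0x2000, 0x3000)


class Ref:
    """Ordinary reference: may point into either heap."""

    def __init__(self, address):
        self.address = address


class ImmovableRef(Ref):
    """Reference known to point into the immovable heap.

    Being a subclass of Ref, it is accepted wherever a Ref is expected,
    which models the implicit immovable -> ordinary conversion.
    """

    def __init__(self, address):
        if address not in IMMOVABLE_HEAP:
            raise ValueError("an immovable reference must point into the immovable heap")
        super().__init__(address)


def may_move(ref):
    """The collector decides from the address alone, not from the reference type."""
    return ref.address in MOVABLE_HEAP


def call_external_api(buffer):
    """Stand-in for a C function that keeps the raw address it is given."""
    if not isinstance(buffer, ImmovableRef):
        raise TypeError("external APIs need an immovable reference")
    return buffer.address


buf = ImmovableRef(0x2010)
plain = buf                       # widening to an ordinary reference is always fine
print(may_move(plain))            # False: the address is in the immovable heap
print(may_move(Ref(0x1010)))      # True: the collector is free to relocate it
print(hex(call_external_api(buf)))
```

The reverse direction is rejected: passing a plain Ref to call_external_api
raises TypeError, which is the check a type system would do at compile time.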
Jul 16 2009
bearophile <bearophileHUGS lycos.com> writes:
Can D steal the future GC of Mono?
http://mono-project.com/Compacting_GC
http://www.go-mono.com/meeting06/mono-sgen.pdf
It manages pinned objects too, but it will be tuned for only a few of them,
while in D they are probably a bit more common.

Bye,
bearophile
Jul 17 2009
Iivari Mokelainen <iivari mokelainen.com> writes:
bearophile wrote:
<snip>
 A simple idea of mine to improve the GC (without changing the D2 language yet)
 is to split the D GC into two parts. One is a moving collector that acts like a
 Java-style GC and is especially useful in SafeD code. It would become the one
 used in OOP/functional-style code, and is probably the GC that will be used in
 most of the code of most D programs. The second part acts in a conservative
 way, like the current GC; it's safer. It manages "pinned" blocks of memory that
 can't be moved; such memory is usually the kind managed in lower-level D
 modules, by user-written collections, etc. The performance of this second part
 will be lower (like the current one), but most data will not be managed by it
 anyway.
 

C# has a 'fixed' keyword, which ensures that the variable in the fixed scope won't be moved by the GC. Such variables can be used as native pointers for interop with the OS or for working fast with arrays (e.g. generating bitmaps in memory). But I don't think that is a viable solution - it's too big. Two GCs? No-no.
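
The scoped pinning that 'fixed' provides can be modeled with a Python context
manager. This is a toy sketch, not C# and not a real collector: the class
MovingHeap and its methods alloc, fixed and compact are hypothetical names, and
"moving" an object just means giving it a fresh address. It shows only the
scoping rule: an object cannot be relocated while a fixed scope on it is open,
and becomes movable again as soon as the scope ends.

```python
from contextlib import contextmanager


class MovingHeap:
    """Toy heap whose compact() relocates every object that is not pinned."""

    def __init__(self):
        self.slots = {}        # address -> stored value
        self.pin_counts = {}   # address -> number of open fixed scopes
        self._next_address = 0

    def alloc(self, value):
        address = self._next_address
        self._next_address += 1
        self.slots[address] = value
        return address

    @contextmanager
    def fixed(self, address):
        """Pin `address` for the duration of the with-block."""
        self.pin_counts[address] = self.pin_counts.get(address, 0) + 1
        try:
            yield address
        finally:
            self.pin_counts[address] -= 1
            if self.pin_counts[address] == 0:
                del self.pin_counts[address]

    def compact(self):
        """Relocate unpinned objects; return a map of old address -> new address."""
        moved = {}
        for address in list(self.slots):
            if address in self.pin_counts:
                continue   # pinned: must stay where it is
            new_address = self._next_address
            self._next_address += 1
            self.slots[new_address] = self.slots.pop(address)
            moved[address] = new_address
        return moved


heap = MovingHeap()
bitmap = heap.alloc(bytearray(16))
with heap.fixed(bitmap) as raw_address:
    # Inside the scope the raw address stays valid across a collection.
    print(heap.compact())   # {}
print(heap.compact())       # {0: 1}: movable again once the scope has ended
```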
Jul 21 2009