
digitalmars.D - gc question

reply Ben Hinkle <bhinkle4 juno.com> writes:
I have a question about garbage that I'm curious if anyone can help out
with. I'm wondering what percentage of memory allocations contain pointers.
A string, for example, doesn't contain pointers. An array of pointers does
contain pointers. What's the typical breakdown? I'm curious because I'm
experimenting with the GC and if the "new" operator can tell if the thing
it is allocating doesn't have any pointers then it can allocate it from a
region of the GC that doesn't get scanned for pointers - hence speed up the
marking phase of a collection. I'm guessing some paper exists out there
that has studied this but I can't find anything useful.

thanks,
Ben
Oct 24 2004
next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <clho8q$pnt$1 digitaldaemon.com>, Ben Hinkle says...
I have a question about garbage that I'm curious if anyone can help out
with. I'm wondering what percentage of memory allocations contain pointers.
A string, for example, doesn't contain pointers. An array of pointers does
contain pointers. What's the typical breakdown? I'm curious because I'm
experimenting with the GC and if the "new" operator can tell if the thing
it is allocating doesn't have any pointers then it can allocate it from a
region of the GC that doesn't get scanned for pointers - hence speed up the
marking phase of a collection. I'm guessing some paper exists out there
that has studied this but I can't find anything useful.

I'm not sure if this link contains a paper that directly applies, but it does have a lot of good information: http://www.cs.utexas.edu/users/oops/papers.html

Sean
Oct 25 2004
parent Ben Hinkle <bhinkle4 juno.com> writes:
Sean Kelly wrote:

 In article <clho8q$pnt$1 digitaldaemon.com>, Ben Hinkle says...
I have a question about garbage that I'm curious if anyone can help out
with. I'm wondering what percentage of memory allocations contain
pointers. A string, for example, doesn't contain pointers. An array of
pointers does contain pointers. What's the typical breakdown? I'm curious
because I'm experimenting with the GC and if the "new" operator can tell
if the thing it is allocating doesn't have any pointers then it can
allocate it from a region of the GC that doesn't get scanned for pointers
- hence speed up the marking phase of a collection. I'm guessing some
paper exists out there that has studied this but I can't find anything
useful.

I'm not sure if this link contains a paper that directly applies, but it does have a lot of good information: http://www.cs.utexas.edu/users/oops/papers.html

Sean

Nice link. Looking at the Boehm collector, it has an API, malloc_atomic, that is used for memory that doesn't have any pointers in it. I'll try poking around with GDC to see if it can be used (if it isn't used already). I'd also like to try some of the other hooks like incremental GC, parallel marking, etc.

-Ben
Oct 25 2004
prev sibling next sibling parent reply Dave <Dave_member pathlink.com> writes:
Ben Hinkle wrote:

 I have a question about garbage that I'm curious if anyone can help out
 with. I'm wondering what percentage of memory allocations contain
 pointers. A string, for example, doesn't contain pointers. An array of
 pointers does contain pointers. What's the typical breakdown? I'm curious
 because I'm experimenting with the GC and if the "new" operator can tell
 if the thing it is allocating doesn't have any pointers then it can
 allocate it from a region of the GC that doesn't get scanned for pointers
 - hence speed up the marking phase of a collection. I'm guessing some
 paper exists out there that has studied this but I can't find anything
 useful.
 
 thanks,
 Ben

I can't help directly with the question, but I wanted to mention that I think you are definitely on the right track as far as improving D through improving the GC marking phase.

I've been benchmarking some stuff and have noticed that the DMD compiler generally puts out some very competitive code for operations that don't involve a great deal of de/allocation - sometimes a bit better than Intel C/++ (in my book that's outstanding) for operations on primitive types, conditional jumps, vectors, etc. GDC is competitive also - it seems to be pretty close to DMD overall, which I think really validates the work Walter is doing on both the language design and the compiler, as well as David Friedman's work on GDC.

All that to underscore that the one area where D seems to lag right now is in GC allocation, and that seems to be strongly tied to the collection that kicks in before the heap is expanded, in gcx.d/malloc():

# ...
#   if (!gcx.fullcollectshell())   // collect to find a new page
#   {
#       //gcx.newPool(1);
#   }
# ...

If you drill into that further, most of the time spent during the collection is during the sweep phase (obviously). If some way could be found to improve that, then I think that would really help D v1.0 get out of the gate well. An improvement here would obviously affect a lot of important things - string/array concatenation, object instantiation, AA performance, stream operations, etc.

- Dave
Oct 27 2004
next sibling parent Dave <Dave_member pathlink.com> writes:
In article <clpedn$l7p$1 digitaldaemon.com>, Dave says...

[snip]
I can't help directly with the question, but I wanted to mention that I
think you are definately on the right track as far as improving D through
improving the GC marking phase.

[snip]
If you drill into that further, most of that time spent during the
collection is during the sweep phase (obviously).

That should have been "...during the mark phase...".
Oct 28 2004
prev sibling parent reply pragma <pragma_member pathlink.com> writes:
In article <clpedn$l7p$1 digitaldaemon.com>, Dave says...
[snip]

All that to underscore that the one area where D seems to lag right now is
in GC allocation, and that seems to be strongly tied to the collection that
kicks in before the heap is expanded:

I suppose you could always disable/enable the GC around the more sensitive and critical sections of your code to side-step this. Of course, it just puts off the inevitable, so it's really a band-aid approach at best.

Off the top of my head, the only optimization I can think of here is to opt for an incremental collector design (provided that D isn't one already). At least then your overhead per collection phase will be /closer/ to constant than it is at present.

BTW, has anyone done anything to make the GC more pluggable? Anyone out there done any GC hacking on D already? I'm curious to see what has been, and what could be, done in this area.

The reason why I bring this up is that any performance concern for something as fundamental as the GC is going to have a penalty for a certain category of applications. Optional GC types are one way to satisfy the wide range of memory management strategies needed by a language platform such as D. If one could compile in a different flavor of GC (stop-and-collect, incremental, compacting, generational/copying, lazy-as-hell, none-at-all?) you could at least pick the set of consequences that you're stuck with.

EricAnderton at --yahoo--
Oct 28 2004
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <clrn6d$flm$1 digitaldaemon.com>, pragma says...
BTW, has anyone done anything to make the GC more pluggable?  Anyone out there
done any GC hacking on D already?  I'm curious to see what has been, and what
could be, done in this area.

Not really, though I've been gutting Phobos in a more general sense. I would like to modularize all the internals, but haven't quite gotten that far yet--I'm still working on separating out a minimal D runtime from the rest of Phobos. Follow up in the Ares forum or here if you're so inclined.

Sean
Oct 28 2004
prev sibling parent Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
pragma wrote:
 In article <clpedn$l7p$1 digitaldaemon.com>, Dave says...
 
[snip]

All that to underscore that the one area where D seems to lag right now is
in GC allocation, and that seems to be strongly tied to the collection that
kicks in before the heap is expanded:

I suppose you could always disable/enable the more sensitive and critical sections of your code to side-step this. Of course, it just puts off the inevitable, so it's really a band-aid approach at best.

I'd like to quibble a bit here. The cost of performing a GC sweep (given our current, mark-and-sweep collector) is roughly proportional to the current *active* set. So if you put off the GC run for a while, and generate a lot more garbage in the meantime (but don't increase the active set of your program), then running a GC sweep at some later date is no more costly than now. That is, by delaying the GC sweep a while, you aren't necessarily making the next GC sweep any longer.

Of course, as garbage piles up, that means that you have more and more destructors to run, so there is some increased cost, but the total cost here (summed over the lifetime of the program) is proportional to the number of garbage objects you create, and has nothing to do with the frequency with which the GC runs.

What this all means is that, IMHO, you should try to limit the number of times the GC runs, and have it run when you have a small working set. In other words, if you have an algorithm that creates lots and lots of objects, and then releases most of them, you might run the GC before and after the algorithm, but leave it disabled during the algorithm.
Oct 28 2004
prev sibling parent Kevin Bealer <Kevin_member pathlink.com> writes:
I think you could get this optimization with something like:

:import std.c.stdlib;
:
:class MemHandle {
:  byte*  data;
:  uint   length;
:  this(uint x) { data = cast(byte*) malloc(length = x); }
:  ~this() { free(data); }
:}

I think the malloc'd area should be free of GC interactions (it won't be
scanned) unless the GC's "addRange" is called.  This object will ensure the
malloc'd memory is eventually freed, when the handle itself is collected
(so keep a pointer to it while the data is in use).

Maybe it would be cleaner to directly support objects that have "in" pointers
but not "out" pointers (char arrays for example).

Kevin

In article <clho8q$pnt$1 digitaldaemon.com>, Ben Hinkle says...
I have a question about garbage that I'm curious if anyone can help out
with. I'm wondering what percentage of memory allocations contain pointers.
A string, for example, doesn't contain pointers. An array of pointers does
contain pointers. What's the typical breakdown? I'm curious because I'm
experimenting with the GC and if the "new" operator can tell if the thing
it is allocating doesn't have any pointers then it can allocate it from a
region of the GC that doesn't get scanned for pointers - hence speed up the
marking phase of a collection. I'm guessing some paper exists out there
that has studied this but I can't find anything useful.

thanks,
Ben

Oct 28 2004