
digitalmars.D - gc question

reply Ben Hinkle <bhinkle4 juno.com> writes:
I have a question about garbage that I'm curious if anyone can help out
with. I'm wondering what percentage of memory allocations contain pointers.
A string, for example, doesn't contain pointers. An array of pointers does
contain pointers. What's the typical breakdown? I'm curious because I'm
experimenting with the GC and if the "new" operator can tell if the thing
it is allocating doesn't have any pointers then it can allocate it from a
region of the GC that doesn't get scanned for pointers - hence speed up the
marking phase of a collection. I'm guessing some paper exists out there
that has studied this but I can't find anything useful.

thanks,
Ben
Oct 24 2004
next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <clho8q$pnt$1 digitaldaemon.com>, Ben Hinkle says...
I have a question about garbage that I'm curious if anyone can help out
with. I'm wondering what percentage of memory allocations contain pointers.
A string, for example, doesn't contain pointers. An array of pointers does
contain pointers. What's the typical breakdown? I'm curious because I'm
experimenting with the GC and if the "new" operator can tell if the thing
it is allocating doesn't have any pointers then it can allocate it from a
region of the GC that doesn't get scanned for pointers - hence speed up the
marking phase of a collection. I'm guessing some paper exists out there
that has studied this but I can't find anything useful.

I'm not sure if this link contains a paper that directly applies, but it does have a lot of good information: http://www.cs.utexas.edu/users/oops/papers.html

Sean
Oct 25 2004
parent Ben Hinkle <bhinkle4 juno.com> writes:
Sean Kelly wrote:

 In article <clho8q$pnt$1 digitaldaemon.com>, Ben Hinkle says...
I have a question about garbage that I'm curious if anyone can help out
with. I'm wondering what percentage of memory allocations contain
pointers. A string, for example, doesn't contain pointers. An array of
pointers does contain pointers. What's the typical breakdown? I'm curious
because I'm experimenting with the GC and if the "new" operator can tell
if the thing it is allocating doesn't have any pointers then it can
allocate it from a region of the GC that doesn't get scanned for pointers
- hence speed up the marking phase of a collection. I'm guessing some
paper exists out there that has studied this but I can't find anything
useful.

I'm not sure if this link contains a paper that directly applies, but it does have a lot of good information: http://www.cs.utexas.edu/users/oops/papers.html

Sean

Nice link. Looking at the Boehm collector, it has an API, malloc_atomic, that is used for memory that doesn't have any pointers in it. I'll try poking around with GDC to see if it can be used (if it isn't used already). I'd also like to try some of the other hooks like incremental GC, parallel marking, etc.

-Ben
Oct 25 2004
prev sibling next sibling parent reply Dave <Dave_member pathlink.com> writes:
Ben Hinkle wrote:

 I have a question about garbage that I'm curious if anyone can help out
 with. I'm wondering what percentage of memory allocations contain
 pointers. A string, for example, doesn't contain pointers. An array of
 pointers does contain pointers. What's the typical breakdown? I'm curious
 because I'm experimenting with the GC and if the "new" operator can tell
 if the thing it is allocating doesn't have any pointers then it can
 allocate it from a region of the GC that doesn't get scanned for pointers
 - hence speed up the marking phase of a collection. I'm guessing some
 paper exists out there that has studied this but I can't find anything
 useful.
 
 thanks,
 Ben

I can't help directly with the question, but I wanted to mention that I think you are definitely on the right track as far as improving D through improving the GC marking phase.

I've been benchmarking some stuff and have noticed that the DMD compiler generally puts out some very competitive code for operations that don't involve a great deal of de/allocation - sometimes a bit better than Intel C/++ (in my book that's outstanding) for operations on primitive types, conditional jumps, vectors, etc. GDC is competitive also - it seems to be pretty close to DMD overall, which I think really validates the work Walter is doing on both the language design and the compiler, as well as David Friedman's work on GDC.

All that to underscore that the one area where D seems to lag right now is in GC allocation, and that seems to be strongly tied to the collection that kicks in before the heap is expanded, in gcx.d/malloc():

# ...
#   if (!gcx.fullcollectshell())   // collect to find a new page
#   {
#       //gcx.newPool(1);
#   }
# ...

If you drill into that further, most of the time spent during the collection is during the sweep phase (obviously). If some way could be found to improve that, then I think that would really help D v1.0 get out of the gate well. An improvement here would obviously affect a lot of important things - string/array concatenation, object instantiation, AA performance, stream operations, etc.

- Dave
Oct 27 2004
next sibling parent Dave <Dave_member pathlink.com> writes:
In article <clpedn$l7p$1 digitaldaemon.com>, Dave says...

[snip]
I can't help directly with the question, but I wanted to mention that I
think you are definately on the right track as far as improving D through
improving the GC marking phase.

[snip]
If you drill into that further, most of that time spent during the
collection is during the sweep phase (obviously).

That should have been "...during the mark phase...".
Oct 28 2004
prev sibling parent reply pragma <pragma_member pathlink.com> writes:
In article <clpedn$l7p$1 digitaldaemon.com>, Dave says...
[snip]

All that to underscore that the one area where D seems to lag right now is
in GC allocation, and that seems to be strongly tied to the collection that
kicks in before the heap is expanded:

I suppose you could always disable/enable the GC around the more sensitive and critical sections of your code to side-step this. Of course, it just puts off the inevitable, so it's really a band-aid approach at best.

Off the top of my head, the only optimization I can think of here is to opt for an incremental collector design (provided that D isn't one already). At least then your overhead per collection phase will be /closer/ to constant than it is at present.

BTW, has anyone done anything to make the GC more pluggable? Anyone out there done any GC hacking on D already? I'm curious to see what has been, and what could be, done in this area.

The reason why I bring this up is that any performance concern for something as fundamental as the GC is going to have a penalty for a certain category of applications. Optional GC types are one way to satisfy the wide range of memory management strategies needed by a language platform such as D. If one could compile in a different flavor of GC (stop-and-collect, incremental, compacting, generational/copying, lazy-as-hell, none-at-all?) you could at least pick the set of consequences that you're stuck with.

EricAnderton at --yahoo--
Oct 28 2004
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <clrn6d$flm$1 digitaldaemon.com>, pragma says...
BTW, has anyone done anything to make the GC more pluggable?  Anyone out there
done any GC hacking on D already?  I'm curious to see what has been, and what
could be, done in this area.

Not really, though I've been gutting Phobos in a more general sense. I would like to modularize all the internals, but haven't quite gotten that far yet--I'm still working on separating out a minimal D runtime from the rest of Phobos. Follow up in the Ares forum or here if you're so inclined.

Sean
Oct 28 2004
prev sibling parent Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
pragma wrote:
 In article <clpedn$l7p$1 digitaldaemon.com>, Dave says...
 
[snip]

All that to underscore that the one area where D seems to lag right now is
in GC allocation, and that seems to be strongly tied to the collection that
kicks in before the heap is expanded:

I suppose you could always disable/enable the more sensitive and critical sections of your code to side-step this. Of course, it just puts off the inevitable, so it's really a band-aid approach at best.

I'd like to quibble a bit here. The cost of performing a GC sweep (given our current, mark-and-sweep collector) is roughly proportional to the current *active* set. So if you put off the GC run for a while, and generate a lot more garbage in the meantime (but don't increase the active set of your program), then running a GC sweep at some later date is no more costly than now. That is, by delaying the GC sweep a while, you aren't necessarily making the next GC sweep any longer.

Of course, as garbage piles up, that means that you have more and more destructors to run, so there is some increased cost, but the total cost here (summed over the lifetime of the program) is proportional to the number of garbage objects you create, and has nothing to do with the frequency with which the GC runs.

What this all means is that, IMHO, you should try to limit the number of times the GC runs, and have it run when you have a small working set. In other words, if you have an algorithm that creates lots and lots of objects, and then releases most of them, you might run the GC before and after the algorithm, but leave it disabled during the algorithm.
Oct 28 2004
prev sibling parent Kevin Bealer <Kevin_member pathlink.com> writes:
I think you could get this optimization with something like:

:import std.c.stdlib;
:
:class MemHandle {
:  byte*  data;
:  uint   length;
:  this(uint x) { data = cast(byte*) malloc(length = x); }
:  ~this() { free(data); }
:}

I think the malloc'd area should be free of GC interactions (it won't be
scanned) unless the GC's "addRange" is called.  This object will ensure the
malloc'd memory is eventually freed, when the handle itself is collected
(so keep a pointer to it while the data is in use).

Maybe it would be cleaner to directly support objects that have "in" pointers
but not "out" pointers (char arrays for example).

Kevin

In article <clho8q$pnt$1 digitaldaemon.com>, Ben Hinkle says...
I have a question about garbage that I'm curious if anyone can help out
with. I'm wondering what percentage of memory allocations contain pointers.
A string, for example, doesn't contain pointers. An array of pointers does
contain pointers. What's the typical breakdown? I'm curious because I'm
experimenting with the GC and if the "new" operator can tell if the thing
it is allocating doesn't have any pointers then it can allocate it from a
region of the GC that doesn't get scanned for pointers - hence speed up the
marking phase of a collection. I'm guessing some paper exists out there
that has studied this but I can't find anything useful.

thanks,
Ben

Oct 28 2004