digitalmars.D - State of and plans for the garbage collector

Jonathan M Davis (20/20) Jul 15 2010 Okay. I really don't know much about garbage collectors, how they work, ...

Vladimir Panteleev (20/29) Jul 15 2010 IMO the D GC isn't bad, it's just mediocre. (It could have been way wors...

Robert Jacques (3/30) Jul 15 2010 Wouldn't that mean it can't handle cycles?

Leandro Lucarella (18/39) Jul 15 2010 I'm working on a concurrent GC, things are going really slow, but I plan
Bane (2/25) Jul 15 2010

Bane (1/5) Jul 15 2010

dsimcha (20/25) Jul 15 2010 one GC type to chose when compiling would be very interesting thing, if ...

Leandro Lucarella (14/40) Jul 15 2010 Not completely, if you have a multi-core, suddenly your program becomes

dsimcha (3/33) Jul 15 2010 In theory. I thought that in practice concurrent GC's (I'm not talking ...

Vladimir Panteleev (10/14) Jul 15 2010 In D1, the garbage collector is actually compiled to a stand-alone
Leandro Lucarella (15/24) Jul 15 2010 The GC I'm working on is configurable at startup (runtime) via

Michael Rynn (42/44) Jul 18 2010 Me too.

Jonathan M Davis <jmdavisprog gmail.com> writes:

Okay. I really don't know much about garbage collectors, how they work, or what 
makes one particularly good or bad (other than the fact that it needs to be 
efficient execution-wise and manage memory wisely so that you don't use too
much 
of it or do anything else that would be an overall negative for performance). 
However, from the comments here - both recent and in the past - it's pretty 
clear that D's garbage collector is fairly outdated. I would assume that that 
would be negative for performance - certainly it would mean that significant 
improvements could be made.

So, my question is this: what are the plans for the garbage collector? Is the 
intention to continue to improve it bit by bit, to give it a major overhaul at 
some point, to outright replace it at a later date, or something else entirely?

If D is going to compete with C and C++, it needs to be highly efficient, and
if 
the garbage collector isn't up to snuff, that's going to be a big problem. I'm 
not looking to complain about the current garbage collector - I really don't 
know how good or bad it is - but if it is rather poor (as I've gotten the 
impresison that it is - at least in some respects - from various discussions on 
it here), then I'd assume that it needs a major overhaul or replacement at some 
point. So, are there any specific plans with regards to that, or is that just 
something that may be considered in the future?

- Jonathan M Davis

Jul 15 2010

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Thu, 15 Jul 2010 10:18:38 +0300, Jonathan M Davis  
<jmdavisprog gmail.com> wrote:

 Okay. I really don't know much about garbage collectors, how they work,  
 or what makes one particularly good or bad (other than the fact that it  
 needs to be efficient execution-wise and manage memory wisely so that  
 you don't use too much of it or do anything else that would be an  
 overall negative for performance). However, from the comments here -  
 both recent and in the past - it's pretty clear that D's garbage  
 collector is fairly outdated. I would assume that that would be negative  
 for performance - certainly it would mean that significant improvements  
 could be made.

IMO the D GC isn't bad, it's just mediocre. (It could have been way worse.)

I would like to use this opportunity to bring up a GC implementation  
written by Jeremie Pelletier, which seems to have gone mostly unnoticed  
when it was posted as a reply to D.announce:
http://pastebin.com/f7a3b4c4a

Aside from multiple optimizations across the board, this GC employs an  
interesting different strategy. The gist of it is that it iteratively  
destroys only objects that have no immediate references. In the case of  
long linked lists, this trades destruction complexity with scan  
complexity, which is a very good change - most times deeply-nested  
structures such as linked lists survive multiple generational cycles.

Jeremie, if you're reading this: how goes your D2 runtime project?

(I also have an unfinished generational GC lying around, which is still  
unknown if it's viable performance-wise - I should really try to finish it  
one day.)

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Jul 15 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Thu, 15 Jul 2010 04:28:43 -0400, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 On Thu, 15 Jul 2010 10:18:38 +0300, Jonathan M Davis  
 <jmdavisprog gmail.com> wrote:

 Okay. I really don't know much about garbage collectors, how they work,  
 or what makes one particularly good or bad (other than the fact that it  
 needs to be efficient execution-wise and manage memory wisely so that  
 you don't use too much of it or do anything else that would be an  
 overall negative for performance). However, from the comments here -  
 both recent and in the past - it's pretty clear that D's garbage  
 collector is fairly outdated. I would assume that that would be  
 negative for performance - certainly it would mean that significant  
 improvements could be made.

 IMO the D GC isn't bad, it's just mediocre. (It could have been way  
 worse.)

 I would like to use this opportunity to bring up a GC implementation  
 written by Jeremie Pelletier, which seems to have gone mostly unnoticed  
 when it was posted as a reply to D.announce:
 http://pastebin.com/f7a3b4c4a

 Aside from multiple optimizations across the board, this GC employs an  
 interesting different strategy. The gist of it is that it iteratively  
 destroys only objects that have no immediate references. In the case of  
 long linked lists, this trades destruction complexity with scan  
 complexity, which is a very good change - most times deeply-nested  
 structures such as linked lists survive multiple generational cycles.

Wouldn't that mean it can't handle cycles?

 Jeremie, if you're reading this: how goes your D2 runtime project?

 (I also have an unfinished generational GC lying around, which is still  
 unknown if it's viable performance-wise - I should really try to finish  
 it one day.)

Jul 15 2010

Leandro Lucarella <luca llucax.com.ar> writes:

Jonathan M Davis, el 15 de julio a las 00:18 me escribiste:
 Okay. I really don't know much about garbage collectors, how they work, or
what 
 makes one particularly good or bad (other than the fact that it needs to be 
 efficient execution-wise and manage memory wisely so that you don't use too
much 
 of it or do anything else that would be an overall negative for performance). 
 However, from the comments here - both recent and in the past - it's pretty 
 clear that D's garbage collector is fairly outdated. I would assume that that 
 would be negative for performance - certainly it would mean that significant 
 improvements could be made.
 
 So, my question is this: what are the plans for the garbage collector? Is the 
 intention to continue to improve it bit by bit, to give it a major overhaul at 
 some point, to outright replace it at a later date, or something else entirely?
 
 If D is going to compete with C and C++, it needs to be highly efficient, and
if 
 the garbage collector isn't up to snuff, that's going to be a big problem. I'm 
 not looking to complain about the current garbage collector - I really don't 
 know how good or bad it is - but if it is rather poor (as I've gotten the 
 impresison that it is - at least in some respects - from various discussions
on 
 it here), then I'd assume that it needs a major overhaul or replacement at
some 
 point. So, are there any specific plans with regards to that, or is that just 
 something that may be considered in the future?

I'm working on a concurrent GC, things are going really slow, but I plan
to give it more attention this months, and I hope it will be finished
(finished in my own terms, as it is my thesis) by the end of the year.
I would like to explore merging the precise scanning patch and some
other optimizations, collection strategies, but I'm not sure I will have
the time (probably not).

I'm not sure how it would turn out, even when I'm doing regular
benchmarks to ensure, at least, that the current performance is not
degraded (I didn't started doing the collection concurrently).

One more note: I'm working with D1, but using the Tango runtime, so
I guess it should be not to hard to port to D2.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Every day 21 new born babies will be given to the wrong parents

Jul 15 2010

Bane <branimir.milosavljevic gmail.com> writes:

If I had to chose one topic with most bitchin' on this newsgroup I have
impression it would be the one about GC. They usually goes from 'GC managed
programs are slow, D ain't good enough', to 'language X has better GC than D',
to ' GC that D has is bad at Z'.

Why not make D summer of code - write your own GC optimized for special case of
XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is
only way to really compare what is best for special case.

 Okay. I really don't know much about garbage collectors, how they work, or
what 
 makes one particularly good or bad (other than the fact that it needs to be 
 efficient execution-wise and manage memory wisely so that you don't use too
much 
 of it or do anything else that would be an overall negative for performance). 
 However, from the comments here - both recent and in the past - it's pretty 
 clear that D's garbage collector is fairly outdated. I would assume that that 
 would be negative for performance - certainly it would mean that significant 
 improvements could be made.
 
 So, my question is this: what are the plans for the garbage collector? Is the 
 intention to continue to improve it bit by bit, to give it a major overhaul at 
 some point, to outright replace it at a later date, or something else entirely?
 
 If D is going to compete with C and C++, it needs to be highly efficient, and
if 
 the garbage collector isn't up to snuff, that's going to be a big problem. I'm 
 not looking to complain about the current garbage collector - I really don't 
 know how good or bad it is - but if it is rather poor (as I've gotten the 
 impresison that it is - at least in some respects - from various discussions
on 
 it here), then I'd assume that it needs a major overhaul or replacement at
some 
 point. So, are there any specific plans with regards to that, or is that just 
 something that may be considered in the future?
 
 - Jonathan M Davis

Jul 15 2010

Bane <branimir.milosavljevic gmail.com> writes:

Anyway, I'm here bitching myself :) Just want to say that idea to have more
than one GC type to chose when compiling would be very interesting thing, if
single implementation can't be good for all cases.

 If I had to chose one topic with most bitchin' on this newsgroup I have
impression it would be the one about GC. They usually goes from 'GC managed
programs are slow, D ain't good enough', to 'language X has better GC than D',
to ' GC that D has is bad at Z'.
 
 Why not make D summer of code - write your own GC optimized for special case
of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is
only way to really compare what is best for special case.

Jul 15 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from Bane (branimir.milosavljevic gmail.com)'s article
 Anyway, I'm here bitching myself :) Just want to say that idea to have more
than

one GC type to chose when compiling would be very interesting thing, if single
implementation can't be good for all cases.
 If I had to chose one topic with most bitchin' on this newsgroup I have


impression it would be the one about GC. They usually goes from 'GC managed
programs are slow, D ain't good enough', to 'language X has better GC than D',
to
' GC that D has is bad at Z'.
 Why not make D summer of code - write your own GC optimized for special case


of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is
only way to really compare what is best for special case.


If/when we have enough manpower to write/maintain multiple GC's, here are some
GC
designs that can be either optimizations or pessimizations depending on use
case:

1.  Precise heap scanning (i.e. in the absence of copying/heap compaction).  If
you allocate mostly small objects, you're probably very seldom bitten by false
pointers and the O(1) overhead per block needed to store type info may be
significant.  If you allocate mostly large objects, you've probably been eaten
alive by false pointers and the O(1) per block overhead is negligible to you.

2.  Concurrent collection.  If you use threads for concurreny/latency reasons,
this can be a boon.  If you use threads for parallelism/throughput reasons, or
write single-threaded code, this is a complete waste.

3.  Thread local allocators.  Quite simply, it's a space-speed tradeoff, and it
depends which is more important to you.

Jul 15 2010

Leandro Lucarella <luca llucax.com.ar> writes:

dsimcha, el 15 de julio a las 19:23 me escribiste:
 == Quote from Bane (branimir.milosavljevic gmail.com)'s article
 Anyway, I'm here bitching myself :) Just want to say that idea to have more
than

 one GC type to chose when compiling would be very interesting thing, if single
 implementation can't be good for all cases.
 If I had to chose one topic with most bitchin' on this newsgroup I have


 impression it would be the one about GC. They usually goes from 'GC managed
 programs are slow, D ain't good enough', to 'language X has better GC than D',
to
 ' GC that D has is bad at Z'.
 Why not make D summer of code - write your own GC optimized for special case


 of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is
 only way to really compare what is best for special case.


 
 If/when we have enough manpower to write/maintain multiple GC's, here are some
GC
 designs that can be either optimizations or pessimizations depending on use
case:
 
 1.  Precise heap scanning (i.e. in the absence of copying/heap compaction).  If
 you allocate mostly small objects, you're probably very seldom bitten by false
 pointers and the O(1) overhead per block needed to store type info may be
 significant.  If you allocate mostly large objects, you've probably been eaten
 alive by false pointers and the O(1) per block overhead is negligible to you.
 
 2.  Concurrent collection.  If you use threads for concurreny/latency reasons,
 this can be a boon.  If you use threads for parallelism/throughput reasons, or
 write single-threaded code, this is a complete waste.

Not completely, if you have a multi-core, suddenly your program becomes
multi-threaded and uses more than 1 core. In that case, a concurrent GC
could improve the overall performance of the application.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
- Tata Dios lo creó a usté solamente pa despertar al pueblo y fecundar
  las gayinas.
- Otro constrasentido divino... Quieren que yo salga de joda con las
  hembras y después quieren que madrugue.
	-- Inodoro Pereyra y un gallo

Jul 15 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from Leandro Lucarella (luca llucax.com.ar)'s article
 dsimcha, el 15 de julio a las 19:23 me escribiste:
 == Quote from Bane (branimir.milosavljevic gmail.com)'s article
 Anyway, I'm here bitching myself :) Just want to say that idea to have more
than

 one GC type to chose when compiling would be very interesting thing, if single
 implementation can't be good for all cases.
 If I had to chose one topic with most bitchin' on this newsgroup I have


 impression it would be the one about GC. They usually goes from 'GC managed
 programs are slow, D ain't good enough', to 'language X has better GC than D',
to
 ' GC that D has is bad at Z'.
 Why not make D summer of code - write your own GC optimized for special case


 of XYZ, send it, bundle all up in D with compiler switch '--useGC=XYZ'. That is
 only way to really compare what is best for special case.


 If/when we have enough manpower to write/maintain multiple GC's, here are some
GC
 designs that can be either optimizations or pessimizations depending on use
case:

 1.  Precise heap scanning (i.e. in the absence of copying/heap compaction).  If
 you allocate mostly small objects, you're probably very seldom bitten by false
 pointers and the O(1) overhead per block needed to store type info may be
 significant.  If you allocate mostly large objects, you've probably been eaten
 alive by false pointers and the O(1) per block overhead is negligible to you.

 2.  Concurrent collection.  If you use threads for concurreny/latency reasons,
 this can be a boon.  If you use threads for parallelism/throughput reasons, or
 write single-threaded code, this is a complete waste.

 Not completely, if you have a multi-core, suddenly your program becomes
 multi-threaded and uses more than 1 core. In that case, a concurrent GC
 could improve the overall performance of the application.

In theory.  I thought that in practice concurrent GC's (I'm not talking about
the
case of thread-local heaps) are always slower throughput-wise than
stop-the-world.

Jul 15 2010

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Thu, 15 Jul 2010 21:34:36 +0300, Bane  
<branimir.milosavljevic gmail.com> wrote:

 Why not make D summer of code - write your own GC optimized for special  
 case of XYZ, send it, bundle all up in D with compiler switch  
 '--useGC=XYZ'. That is only way to really compare what is best for  
 special case.

In D1, the garbage collector is actually compiled to a stand-alone  
library, but then it's statically linked into Phobos. If you edit the  
Phobos makefile a bit, you should then be able to specify the GC .lib to  
link to on the compiler/linker command-line.

In D2 it looks like the GC is clumped together with the runtime.

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Jul 15 2010

Leandro Lucarella <luca llucax.com.ar> writes:

Bane, el 15 de julio a las 14:34 me escribiste:
 If I had to chose one topic with most bitchin' on this newsgroup
 I have impression it would be the one about GC. They usually goes from
 'GC managed programs are slow, D ain't good enough', to 'language
 X has better GC than D', to ' GC that D has is bad at Z'.
 
 Why not make D summer of code - write your own GC optimized for
 special case of XYZ, send it, bundle all up in D with compiler switch
 '--useGC=XYZ'. That is only way to really compare what is best for
 special case.

The GC I'm working on is configurable at startup (runtime) via
environment variables. The idea is to be able to tune the GC for
different programs without even recompiling.

I already made the actual compile time options for memory stomping and
sentinel configurable at runtime and it works great, there is no
noticeable performance penalty when you don't use them either.

I think this is really the way to go.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Los pobres buscan su destino. Acá está; ¿no lo ven?
	-- Emilio Vaporeso. Marzo de 1914

Jul 15 2010

Michael Rynn <michaelrynn optusnet.com.au> writes:

On Thu, 15 Jul 2010 00:18:38 -0700, Jonathan M Davis wrote:

 Okay. I really don't know much about garbage collectors, how they work,
 or what makes one particularly good or bad

Me too.

But its a central part of the D langauge, its design and runtime.
Even though D programmmers can replace allocators and do manual memory 
management, I would like to know if anyone has tried to do this in a big 
way.

Without GC, programming with the DPL would be a different beast.

It would seem to me that type information, GC design, memory allocation 
schemes are by dependence necessarily integrated together to try to get 
efficient performance. There must be loads of academic research projects 
done on this already, that would show what the trade-offs are.

The first thing I notice is that memory blocks allocated by GC are
either marked as either needing to be checked or are not checked at all.  
As far as I know there is no futher refining meta data, for instance a 
list of 32/64 bit offsets which can be excluded or included for being 
pointers to data.  Such a list would be of the form

[offset-array length, exclusive/inclusive], offset1, offset2, offsetn.


Its slower code to walk through such list than the method of the current 
GC, which from my remembrance of an earlier newsgroup post, just bit-ands 
every value to see if looks like a memory aligned address or not.

Having the option of a more careful check may resolve the residual cases.

There is a trade off between the amount of information available to the 
GC, and how effective it will be.  More information also means a cost of 
attaching another pointer for the information to each memory block type,
with implications on how the memory blocks might be pooled.

I read that the .NET garbage collector uses much more complete metadata 
stored in the assembly to know what part of objects contain addresses.
I am sure its a state of the art GC, but I know nothing of its guts.

So the GC depends on how the D compiler may compile in a accessible 
memory descriptor  it to each typeinfo, type allocation pool, and 
integrate this into the runtime allocation design and memory block 
scanning.

This gives the GC an option of resolving the harder cases of false 
pointers for objects not released by the conservative scan which does not 
use type information. 

Without the type information, the GC and memory allocation experimenter 
is limited to a reduced subset of models and algorithms, and so the 
possibility of having a choice of something better than the Boehm 
conservative collector (I think it actually can use some form of meta 
data), is restricted.

-->= -->=

Jul 18 2010

D Programming

C/C++ Programming

Other

digitalmars.D - State of and plans for the garbage collector