www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - Antti-Ville Tuuainen Passes GSoC Final Evaluation

reply "dsimcha" <dsimcha yahoo.com> writes:
Congratulations, Antti-Ville!  This project creates a better 
implementation of precise GC heap scanning than anything that's 
been created so far for D.  The goal is to eventually integrate 
it into standard D distributions.  Any volunteers for beta 
testing?

Code:

https://github.com/Tuna-Fish/druntime/tree/gc_poolwise_bitmap

The code for this project is a fork of druntime.  The master 
branch was a failed (or less successful) experiment.  The version 
we're going with for integration is the gc_poolwise_bitmap branch.
Aug 21 2012
next sibling parent "Bernard Helyer" <b.helyer gmail.com> writes:
Congrats!
Aug 21 2012
prev sibling next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Tue, 21 Aug 2012 15:15:48 +0200, dsimcha <dsimcha yahoo.com> wrote:

 Congratulations, Antti-Ville!  This project creates a better  
 implementation of precise GC heap scanning than anything that's been  
 created so far for D.  The goal is to eventually integrate it into  
 standard D distributions.  Any volunteers for beta testing?

 Code:

 https://github.com/Tuna-Fish/druntime/tree/gc_poolwise_bitmap

 The code for this project is a fork of druntime.  The master branch was  
 a failed (or less successful) experiment.  The version we're going with  
 for integration is the gc_poolwise_bitmap branch.
Congratulations! Any numbers on how this is better than the status quo? -- Simen
Aug 21 2012
prev sibling parent reply "David Nadlinger" <see klickverbot.at> writes:
On Tuesday, 21 August 2012 at 13:15:49 UTC, dsimcha wrote:
 Congratulations, Antti-Ville!  This project creates a better 
 implementation of precise GC heap scanning than anything that's 
 been created so far for D.  The goal is to eventually integrate 
 it into standard D distributions.
Great! I've been pretty much out of touch with the GSoC projects this summers, as I had a rather intense exam session at university – are there any progress reports/blog posts with actual numbers in them? David
Aug 22 2012
parent reply "dsimcha" <dsimcha yahoo.com> writes:
On Wednesday, 22 August 2012 at 20:04:12 UTC, David Nadlinger 
wrote:
 On Tuesday, 21 August 2012 at 13:15:49 UTC, dsimcha wrote:
 Congratulations, Antti-Ville!  This project creates a better 
 implementation of precise GC heap scanning than anything 
 that's been created so far for D.  The goal is to eventually 
 integrate it into standard D distributions.
Great! I've been pretty much out of touch with the GSoC projects this summers, as I had a rather intense exam session at university – are there any progress reports/blog posts with actual numbers in them? David
Unfortunately not yet. Antti-ville ran into a lot if difficulty getting things to work and spent the first half of GSoC on an approach that turned out to be the wrong one in hindsight. His project was originally supposed to be improving concurrency support in the GC, but when rtinfo was added to TypeInfo, precise heap scanning seemed like too good an opportunity to pass up. To do this project he had to learn the D template system from scratch. Therefore we didn't have the luxury of the luxury of doing extensive benchmarking and documentation. I plan to work on that with him before he creates a pull request to integrate his new and improved GC upstream.
Aug 22 2012
parent reply Chad J <chadjoan __spam.is.bad__gmail.com> writes:
On 08/22/2012 05:41 PM, dsimcha wrote:
 On Wednesday, 22 August 2012 at 20:04:12 UTC, David Nadlinger wrote:
 On Tuesday, 21 August 2012 at 13:15:49 UTC, dsimcha wrote:
 Congratulations, Antti-Ville! This project creates a better
 implementation of precise GC heap scanning than anything that's been
 created so far for D. The goal is to eventually integrate it into
 standard D distributions.
Great! I've been pretty much out of touch with the GSoC projects this summers, as I had a rather intense exam session at university – are there any progress reports/blog posts with actual numbers in them? David
Unfortunately not yet. Antti-ville ran into a lot if difficulty getting things to work and spent the first half of GSoC on an approach that turned out to be the wrong one in hindsight. His project was originally supposed to be improving concurrency support in the GC, but when rtinfo was added to TypeInfo, precise heap scanning seemed like too good an opportunity to pass up. To do this project he had to learn the D template system from scratch. Therefore we didn't have the luxury of the luxury of doing extensive benchmarking and documentation. I plan to work on that with him before he creates a pull request to integrate his new and improved GC upstream.
Poolwise bitmap... what an interesting name. I'll look forward to learning about the concepts behind it!
Aug 22 2012
parent reply Rory McGuire <rjmcguire gmail.com> writes:
On Thu, Aug 23, 2012 at 4:01 AM, Chad J
<chadjoan __spam.is.bad__gmail.com>wrote:

 Poolwise bitmap... what an interesting name.  I'll look forward to
 learning about the concepts behind it!
+1
Aug 23 2012
parent reply "dsimcha" <dsimcha yahoo.com> writes:
On Thursday, 23 August 2012 at 11:40:22 UTC, Rory McGuire wrote:
 On Thu, Aug 23, 2012 at 4:01 AM, Chad J
 <chadjoan __spam.is.bad__gmail.com>wrote:

 Poolwise bitmap... what an interesting name.  I'll look 
 forward to
 learning about the concepts behind it!
+1
Basically, the idea is to store information about what is and isn't a pointer at the pool level instead of at the block level. My attempt from a long time ago at precise heap scanning, and Antti-Ville's first attempt, stored meta-data at the end of every allocated block. This worked well for large arrays, but was terribly inefficient for smaller allocations and made the GC code even messier than it already is. The overhead was a fixed (void*).sizeof bits per block. Now, each pool has a bit array that contains one bit for every possible aligned pointer. The overhead is always 1 bit for every (void*).sizeof bytes no matter how large or small the block is.
Aug 23 2012
parent reply Rory McGuire <rjmcguire gmail.com> writes:
On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha yahoo.com> wrote:

 Basically, the idea is to store information about what is and isn't a
 pointer at the pool level instead of at the block level.  My attempt from a
 long time ago at precise heap scanning, and Antti-Ville's first attempt,
 stored meta-data at the end of every allocated block.  This worked well for
 large arrays, but was terribly inefficient for smaller allocations and made
 the GC code even messier than it already is.  The overhead was a fixed
 (void*).sizeof bits per block.  Now, each pool has a bit array that
 contains one bit for every possible aligned pointer.  The overhead is
 always 1 bit for every (void*).sizeof bytes no matter how large or small
 the block is.
Am I correct in thinking that this is still single threaded stop the world? Any chance of the code being documented extensively in the hopes that it would encourage participation/experimentation? Thanks for all the work you guys have put in.
Aug 23 2012
parent reply =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:
On 23-08-2012 15:21, Rory McGuire wrote:
 On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha yahoo.com
 <mailto:dsimcha yahoo.com>> wrote:

     Basically, the idea is to store information about what is and isn't
     a pointer at the pool level instead of at the block level.  My
     attempt from a long time ago at precise heap scanning, and
     Antti-Ville's first attempt, stored meta-data at the end of every
     allocated block.  This worked well for large arrays, but was
     terribly inefficient for smaller allocations and made the GC code
     even messier than it already is.  The overhead was a fixed
     (void*).sizeof bits per block.  Now, each pool has a bit array that
     contains one bit for every possible aligned pointer.  The overhead
     is always 1 bit for every (void*).sizeof bytes no matter how large
     or small the block is.


 Am I correct in thinking that this is still single threaded stop the world?
Yes, but parallelization of the mark phase is fairly trivial, and something we should probably look into. The GC will probably always be STW unless we get compiler support for inserting GC barriers.
 Any chance of the code being documented extensively in the hopes that it
 would encourage participation/experimentation?

 Thanks for all the work you guys have put in.
-- Alex Rønne Petersen alex lycus.org http://lycus.org
Aug 23 2012
next sibling parent reply "dsimcha" <dsimcha yahoo.com> writes:
On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen 
wrote:
 Yes, but parallelization of the mark phase is fairly trivial, 
 and something we should probably look into.
Ironically, Antti-ville's original proposal involved parallelization. This was scrapped because after rtinfo was added, we agreed that precise heap scanning was more important and looked newly feasible.
Aug 23 2012
parent reply =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:
On 23-08-2012 16:47, dsimcha wrote:
 On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen wrote:
 Yes, but parallelization of the mark phase is fairly trivial, and
 something we should probably look into.
Ironically, Antti-ville's original proposal involved parallelization. This was scrapped because after rtinfo was added, we agreed that precise heap scanning was more important and looked newly feasible.
Oh, I agree it's more important. The GC is reasonably fast as-is, but has severe issues with false pointers. Parallel marking is just in the "nice to have" category. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Aug 23 2012
next sibling parent "renoX" <renozyx gmail.com> writes:
On Thursday, 23 August 2012 at 14:49:11 UTC, Alex Rønne Petersen 
wrote:
 On 23-08-2012 16:47, dsimcha wrote:
 On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne 
 Petersen wrote:
 Yes, but parallelization of the mark phase is fairly trivial, 
 and
 something we should probably look into.
Ironically, Antti-ville's original proposal involved parallelization. This was scrapped because after rtinfo was added, we agreed that precise heap scanning was more important and looked newly feasible.
Oh, I agree it's more important. The GC is reasonably fast as-is, but has severe issues with false pointers. Parallel marking is just in the "nice to have" category.
Funny, I've just read an interview from Narihiro Nakamura who is working on parallel marking in Ruby: http://rubysource.com/narihiro-nakamura-rubys-gc-innovator/ BR, renoX
Aug 23 2012
prev sibling parent "renoX" <renozyx gmail.com> writes:
On Thursday, 23 August 2012 at 14:49:11 UTC, Alex Rønne Petersen 
wrote:
 On 23-08-2012 16:47, dsimcha wrote:
 On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne 
 Petersen wrote:
 Yes, but parallelization of the mark phase is fairly trivial, 
 and
 something we should probably look into.
Ironically, Antti-ville's original proposal involved parallelization. This was scrapped because after rtinfo was added, we agreed that precise heap scanning was more important and looked newly feasible.
Oh, I agree it's more important. The GC is reasonably fast as-is, but has severe issues with false pointers. Parallel marking is just in the "nice to have" category.
Funny, I've just read an interview from Narihiro Nakamura who is working on parallel marking in Ruby: http://rubysource.com/narihiro-nakamura-rubys-gc-innovator/ BR, renoX
Aug 24 2012
prev sibling next sibling parent Leandro Lucarella <luca llucax.com.ar> writes:
Alex Rønne Petersen, el 23 de August a las 16:38 me escribiste:
 On 23-08-2012 15:21, Rory McGuire wrote:
On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha yahoo.com
<mailto:dsimcha yahoo.com>> wrote:

    Basically, the idea is to store information about what is and isn't
    a pointer at the pool level instead of at the block level.  My
    attempt from a long time ago at precise heap scanning, and
    Antti-Ville's first attempt, stored meta-data at the end of every
    allocated block.  This worked well for large arrays, but was
    terribly inefficient for smaller allocations and made the GC code
    even messier than it already is.  The overhead was a fixed
    (void*).sizeof bits per block.  Now, each pool has a bit array that
    contains one bit for every possible aligned pointer.  The overhead
    is always 1 bit for every (void*).sizeof bytes no matter how large
    or small the block is.


Am I correct in thinking that this is still single threaded stop the world?
Yes, but parallelization of the mark phase is fairly trivial, and something we should probably look into. The GC will probably always be STW unless we get compiler support for inserting GC barriers.
Feel free to use my work[1] to have a cloning collector which is not STW in the classical sense (it just STW for the time it takes to pause the threads and fork() the process), at least on *nixes. Works fairly well[2], even in production code. There's is only one issue[3], but a nasty one, because of a glibc internal lock shared between fork() and other glibc functions like malloc and the GC using signals to interrupt the threads. I hope I can fix this problem eventually (the ideal for me would be to find another way to pause the threads). [1] http://www.dsource.org/projects/tango/browser/trunk/tango/core/rt/gc/cdgc [2] http://www.llucax.com.ar/blog/blog/post/-1a4bdfba [3] http://www.dsource.org/projects/tango/ticket/2087 -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Si por el chancho fuera, se autocomería con chimichurri Worshestershire!
Aug 23 2012
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-08-23 16:38, Alex Rønne Petersen wrote:

 Yes, but parallelization of the mark phase is fairly trivial, and
 something we should probably look into.

 The GC will probably always be STW unless we get compiler support for
 inserting GC barriers.
Would a thread local GC be possible, and desirable? To my understanding, which is not much, that means the GC will only run in one thread (or multiple) and only needs to stop that/those thread(s). That also means it only need to search for dead objects in the heap/storage area for that particular thread (and where these point to). If I understand this correctly this would be perfect for D since everything is thread local by default. There's also a global heap for global objects or objects shared between threads. -- /Jacob Carlborg
Aug 23 2012
parent reply "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Fri, 24 Aug 2012 08:27:09 +0200, Jacob Carlborg <doob me.com> wrote:

 On 2012-08-23 16:38, Alex R=C3=B8nne Petersen wrote:

 Yes, but parallelization of the mark phase is fairly trivial, and
 something we should probably look into.

 The GC will probably always be STW unless we get compiler support for=
 inserting GC barriers.
Would a thread local GC be possible, and desirable? To my understandin=
g, =
 which is not much, that means the GC will only run in one thread (or  =
 multiple) and only needs to stop that/those thread(s). That also means=
=
 it only need to search for dead objects in the heap/storage area for  =
 that particular thread (and where these point to).

 If I understand this correctly this would be perfect for D since  =
 everything is thread local by default.

 There's also a global heap for global objects or objects shared betwee=
n =
 threads.
It certainly would be possible. But I believe it requires thread-local heaps, and some way of keeping track of when an object changes owning thread. Perhaps a mark phase could run on each thread's heap, and unreferenced objects be moved to a global list of potentially dead objects. If after all threads have run a mark, none have claimed the objects, they're collected. Then again, I'm hardly a GC architect. -- = Simen
Aug 24 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-08-24 09:32, Simen Kjaeraas wrote:

 It certainly would be possible. But I believe it requires thread-local
 heaps, and some way of keeping track of when an object changes owning
 thread.
Yeah, I was think about that. Are thread local heaps a problem?
 Perhaps a mark phase could run on each thread's heap, and unreferenced
 objects be moved to a global list of potentially dead objects. If after
 all threads have run a mark, none have claimed the objects, they're
 collected.

 Then again, I'm hardly a GC architect.
Me neither. -- /Jacob Carlborg
Aug 24 2012
prev sibling parent "renoX" <renozyx gmail.com> writes:
On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen 
wrote:
 On 23-08-2012 15:21, Rory McGuire wrote:
 On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha yahoo.com
 <mailto:dsimcha yahoo.com>> wrote:

    Basically, the idea is to store information about what is 
 and isn't
    a pointer at the pool level instead of at the block level.  
 My
    attempt from a long time ago at precise heap scanning, and
    Antti-Ville's first attempt, stored meta-data at the end of 
 every
    allocated block.  This worked well for large arrays, but was
    terribly inefficient for smaller allocations and made the 
 GC code
    even messier than it already is.  The overhead was a fixed
    (void*).sizeof bits per block.  Now, each pool has a bit 
 array that
    contains one bit for every possible aligned pointer.  The 
 overhead
    is always 1 bit for every (void*).sizeof bytes no matter 
 how large
    or small the block is.


 Am I correct in thinking that this is still single threaded 
 stop the world?
Yes, but parallelization of the mark phase is fairly trivial, and something we should probably look into.
Funny coincidence, I've just read an interview from Narihiro Nakamura who is working on parallel marking in Ruby: http://rubysource.com/narihiro-nakamura-rubys-gc-innovator/ BR, renoX
Aug 24 2012