digitalmars.D.announce - Antti-Ville Tuuainen Passes GSoC Final Evaluation

dsimcha (10/10) Aug 21 2012 Congratulations, Antti-Ville! This project creates a better

Bernard Helyer (1/1) Aug 21 2012 Congrats!
Simen Kjaeraas (5/14) Aug 21 2012 Congratulations!
David Nadlinger (6/10) Aug 22 2012 Great! I've been pretty much out of touch with the GSoC projects

dsimcha (13/23) Aug 22 2012 Unfortunately not yet. Antti-ville ran into a lot if difficulty

Chad J (3/25) Aug 22 2012 Poolwise bitmap... what an interesting name. I'll look forward to

Rory McGuire (3/5) Aug 23 2012 +1

dsimcha (12/20) Aug 23 2012 Basically, the idea is to store information about what is and

Rory McGuire (5/15) Aug 23 2012 Am I correct in thinking that this is still single threaded stop the wor...

=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (9/26) Aug 23 2012 Yes, but parallelization of the mark phase is fairly trivial, and

dsimcha (6/8) Aug 23 2012 Ironically, Antti-ville's original proposal involved

=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (8/14) Aug 23 2012 Oh, I agree it's more important. The GC is reasonably fast as-is, but

renoX (7/22) Aug 23 2012 Funny, I've just read an interview from Narihiro Nakamura who is
renoX (7/22) Aug 24 2012 Funny, I've just read an interview from Narihiro Nakamura who is

Leandro Lucarella (18/40) Aug 23 2012 Feel free to use my work[1] to have a cloning collector which is not STW
Jacob Carlborg (12/16) Aug 23 2012 Would a thread local GC be possible, and desirable? To my understanding,...

Simen Kjaeraas (14/29) Aug 24 2012 =

Jacob Carlborg (5/13) Aug 24 2012 Me neither.

renoX (7/35) Aug 24 2012 Funny coincidence, I've just read an interview from Narihiro

"dsimcha" <dsimcha yahoo.com> writes:

Congratulations, Antti-Ville!  This project creates a better 
implementation of precise GC heap scanning than anything that's 
been created so far for D.  The goal is to eventually integrate 
it into standard D distributions.  Any volunteers for beta 
testing?

Code:

https://github.com/Tuna-Fish/druntime/tree/gc_poolwise_bitmap

The code for this project is a fork of druntime.  The master 
branch was a failed (or less successful) experiment.  The version 
we're going with for integration is the gc_poolwise_bitmap branch.

Aug 21 2012

"Bernard Helyer" <b.helyer gmail.com> writes:

Congrats!

Aug 21 2012

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On Tue, 21 Aug 2012 15:15:48 +0200, dsimcha <dsimcha yahoo.com> wrote:

 Congratulations, Antti-Ville!  This project creates a better  
 implementation of precise GC heap scanning than anything that's been  
 created so far for D.  The goal is to eventually integrate it into  
 standard D distributions.  Any volunteers for beta testing?

 Code:

 https://github.com/Tuna-Fish/druntime/tree/gc_poolwise_bitmap

 The code for this project is a fork of druntime.  The master branch was  
 a failed (or less successful) experiment.  The version we're going with  
 for integration is the gc_poolwise_bitmap branch.

Congratulations!

Any numbers on how this is better than the status quo?

-- 
Simen

Aug 21 2012

"David Nadlinger" <see klickverbot.at> writes:

On Tuesday, 21 August 2012 at 13:15:49 UTC, dsimcha wrote:
 Congratulations, Antti-Ville!  This project creates a better 
 implementation of precise GC heap scanning than anything that's 
 been created so far for D.  The goal is to eventually integrate 
 it into standard D distributions.

Great! I've been pretty much out of touch with the GSoC projects 
this summers, as I had a rather intense exam session at 
university – are there any progress reports/blog posts with 
actual numbers in them?

David

Aug 22 2012

"dsimcha" <dsimcha yahoo.com> writes:

On Wednesday, 22 August 2012 at 20:04:12 UTC, David Nadlinger 
wrote:
 On Tuesday, 21 August 2012 at 13:15:49 UTC, dsimcha wrote:
 Congratulations, Antti-Ville!  This project creates a better 
 implementation of precise GC heap scanning than anything 
 that's been created so far for D.  The goal is to eventually 
 integrate it into standard D distributions.

 Great! I've been pretty much out of touch with the GSoC 
 projects this summers, as I had a rather intense exam session 
 at university – are there any progress reports/blog posts 
 with actual numbers in them?

 David

Unfortunately not yet.  Antti-ville ran into a lot if difficulty 
getting things to work and spent the first half of GSoC on an 
approach that turned out to be the wrong one in hindsight.  His 
project was originally supposed to be improving concurrency 
support in the GC, but when rtinfo was added to TypeInfo, precise 
heap scanning seemed like too good an opportunity to pass up.  To 
do this project he had to learn the D template system from 
scratch.  Therefore we didn't have the luxury of the luxury of 
doing extensive benchmarking and documentation.  I plan to work 
on that with him before he creates a pull request to integrate 
his new and improved GC upstream.

Aug 22 2012

Chad J <chadjoan __spam.is.bad__gmail.com> writes:

On 08/22/2012 05:41 PM, dsimcha wrote:
 On Wednesday, 22 August 2012 at 20:04:12 UTC, David Nadlinger wrote:
 On Tuesday, 21 August 2012 at 13:15:49 UTC, dsimcha wrote:
 Congratulations, Antti-Ville! This project creates a better
 implementation of precise GC heap scanning than anything that's been
 created so far for D. The goal is to eventually integrate it into
 standard D distributions.

 Great! I've been pretty much out of touch with the GSoC projects this
 summers, as I had a rather intense exam session at university – are
 there any progress reports/blog posts with actual numbers in them?

 David

 Unfortunately not yet. Antti-ville ran into a lot if difficulty getting
 things to work and spent the first half of GSoC on an approach that
 turned out to be the wrong one in hindsight. His project was originally
 supposed to be improving concurrency support in the GC, but when rtinfo
 was added to TypeInfo, precise heap scanning seemed like too good an
 opportunity to pass up. To do this project he had to learn the D
 template system from scratch. Therefore we didn't have the luxury of the
 luxury of doing extensive benchmarking and documentation. I plan to work
 on that with him before he creates a pull request to integrate his new
 and improved GC upstream.

Poolwise bitmap... what an interesting name.  I'll look forward to 
learning about the concepts behind it!

Aug 22 2012

Rory McGuire <rjmcguire gmail.com> writes:

On Thu, Aug 23, 2012 at 4:01 AM, Chad J
<chadjoan __spam.is.bad__gmail.com>wrote:

 Poolwise bitmap... what an interesting name.  I'll look forward to
 learning about the concepts behind it!

+1

Aug 23 2012

"dsimcha" <dsimcha yahoo.com> writes:

On Thursday, 23 August 2012 at 11:40:22 UTC, Rory McGuire wrote:
 On Thu, Aug 23, 2012 at 4:01 AM, Chad J
 <chadjoan __spam.is.bad__gmail.com>wrote:

 Poolwise bitmap... what an interesting name.  I'll look 
 forward to
 learning about the concepts behind it!

 +1

Basically, the idea is to store information about what is and 
isn't a pointer at the pool level instead of at the block level.  
My attempt from a long time ago at precise heap scanning, and 
Antti-Ville's first attempt, stored meta-data at the end of every 
allocated block.  This worked well for large arrays, but was 
terribly inefficient for smaller allocations and made the GC code 
even messier than it already is.  The overhead was a fixed 
(void*).sizeof bits per block.  Now, each pool has a bit array 
that contains one bit for every possible aligned pointer.  The 
overhead is always 1 bit for every (void*).sizeof bytes no matter 
how large or small the block is.

Aug 23 2012

Rory McGuire <rjmcguire gmail.com> writes:

On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha yahoo.com> wrote:

 Basically, the idea is to store information about what is and isn't a
 pointer at the pool level instead of at the block level.  My attempt from a
 long time ago at precise heap scanning, and Antti-Ville's first attempt,
 stored meta-data at the end of every allocated block.  This worked well for
 large arrays, but was terribly inefficient for smaller allocations and made
 the GC code even messier than it already is.  The overhead was a fixed
 (void*).sizeof bits per block.  Now, each pool has a bit array that
 contains one bit for every possible aligned pointer.  The overhead is
 always 1 bit for every (void*).sizeof bytes no matter how large or small
 the block is.

Am I correct in thinking that this is still single threaded stop the world?

Any chance of the code being documented extensively in the hopes that it
would encourage participation/experimentation?

Thanks for all the work you guys have put in.

Aug 23 2012

=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:

On 23-08-2012 15:21, Rory McGuire wrote:
 On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha yahoo.com
 <mailto:dsimcha yahoo.com>> wrote:

     Basically, the idea is to store information about what is and isn't
     a pointer at the pool level instead of at the block level.  My
     attempt from a long time ago at precise heap scanning, and
     Antti-Ville's first attempt, stored meta-data at the end of every
     allocated block.  This worked well for large arrays, but was
     terribly inefficient for smaller allocations and made the GC code
     even messier than it already is.  The overhead was a fixed
     (void*).sizeof bits per block.  Now, each pool has a bit array that
     contains one bit for every possible aligned pointer.  The overhead
     is always 1 bit for every (void*).sizeof bytes no matter how large
     or small the block is.


 Am I correct in thinking that this is still single threaded stop the world?

Yes, but parallelization of the mark phase is fairly trivial, and 
something we should probably look into.

The GC will probably always be STW unless we get compiler support for 
inserting GC barriers.

 Any chance of the code being documented extensively in the hopes that it
 would encourage participation/experimentation?

 Thanks for all the work you guys have put in.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Aug 23 2012

"dsimcha" <dsimcha yahoo.com> writes:

On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen 
wrote:
 Yes, but parallelization of the mark phase is fairly trivial, 
 and something we should probably look into.

Ironically, Antti-ville's original proposal involved 
parallelization.  This was scrapped because after rtinfo was 
added, we agreed that precise heap scanning was more important 
and looked newly feasible.

Aug 23 2012

=?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:

On 23-08-2012 16:47, dsimcha wrote:
 On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen wrote:
 Yes, but parallelization of the mark phase is fairly trivial, and
 something we should probably look into.

 Ironically, Antti-ville's original proposal involved parallelization.
 This was scrapped because after rtinfo was added, we agreed that precise
 heap scanning was more important and looked newly feasible.

Oh, I agree it's more important. The GC is reasonably fast as-is, but 
has severe issues with false pointers. Parallel marking is just in the 
"nice to have" category.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Aug 23 2012

"renoX" <renozyx gmail.com> writes:

On Thursday, 23 August 2012 at 14:49:11 UTC, Alex Rønne Petersen 
wrote:
 On 23-08-2012 16:47, dsimcha wrote:
 On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne 
 Petersen wrote:
 Yes, but parallelization of the mark phase is fairly trivial, 
 and
 something we should probably look into.

 Ironically, Antti-ville's original proposal involved 
 parallelization.
 This was scrapped because after rtinfo was added, we agreed 
 that precise
 heap scanning was more important and looked newly feasible.

 Oh, I agree it's more important. The GC is reasonably fast 
 as-is, but has severe issues with false pointers. Parallel 
 marking is just in the "nice to have" category.

Funny, I've just read an interview from Narihiro Nakamura who is 
working on parallel marking in Ruby: 
http://rubysource.com/narihiro-nakamura-rubys-gc-innovator/

BR,
renoX

Aug 23 2012

"renoX" <renozyx gmail.com> writes:

On Thursday, 23 August 2012 at 14:49:11 UTC, Alex Rønne Petersen 
wrote:
 On 23-08-2012 16:47, dsimcha wrote:
 On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne 
 Petersen wrote:
 Yes, but parallelization of the mark phase is fairly trivial, 
 and
 something we should probably look into.

 Ironically, Antti-ville's original proposal involved 
 parallelization.
 This was scrapped because after rtinfo was added, we agreed 
 that precise
 heap scanning was more important and looked newly feasible.

 Oh, I agree it's more important. The GC is reasonably fast 
 as-is, but has severe issues with false pointers. Parallel 
 marking is just in the "nice to have" category.

Funny, I've just read an interview from Narihiro Nakamura who is 
working on parallel marking in Ruby:
http://rubysource.com/narihiro-nakamura-rubys-gc-innovator/

BR,
renoX

Aug 24 2012

Leandro Lucarella <luca llucax.com.ar> writes:

Alex Rønne Petersen, el 23 de August a las 16:38 me escribiste:
 On 23-08-2012 15:21, Rory McGuire wrote:
On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha yahoo.com
<mailto:dsimcha yahoo.com>> wrote:

    Basically, the idea is to store information about what is and isn't
    a pointer at the pool level instead of at the block level.  My
    attempt from a long time ago at precise heap scanning, and
    Antti-Ville's first attempt, stored meta-data at the end of every
    allocated block.  This worked well for large arrays, but was
    terribly inefficient for smaller allocations and made the GC code
    even messier than it already is.  The overhead was a fixed
    (void*).sizeof bits per block.  Now, each pool has a bit array that
    contains one bit for every possible aligned pointer.  The overhead
    is always 1 bit for every (void*).sizeof bytes no matter how large
    or small the block is.


Am I correct in thinking that this is still single threaded stop the world?

 Yes, but parallelization of the mark phase is fairly trivial, and
 something we should probably look into.

 The GC will probably always be STW unless we get compiler support
 for inserting GC barriers.

Feel free to use my work[1] to have a cloning collector which is not STW
in the classical sense (it just STW for the time it takes to pause the
threads and fork() the process), at least on *nixes. Works fairly
well[2], even in production code. There's is only one issue[3], but
a nasty one, because of a glibc internal lock shared between fork() and
other glibc functions like malloc and the GC using signals to interrupt
the threads. I hope I can fix this problem eventually (the ideal for me
would be to find another way to pause the threads).

[1] http://www.dsource.org/projects/tango/browser/trunk/tango/core/rt/gc/cdgc
[2] http://www.llucax.com.ar/blog/blog/post/-1a4bdfba
[3] http://www.dsource.org/projects/tango/ticket/2087

--
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Si por el chancho fuera, se autocomería con chimichurri Worshestershire!

Aug 23 2012

Jacob Carlborg <doob me.com> writes:

On 2012-08-23 16:38, Alex Rønne Petersen wrote:

 Yes, but parallelization of the mark phase is fairly trivial, and
 something we should probably look into.

 The GC will probably always be STW unless we get compiler support for
 inserting GC barriers.

Would a thread local GC be possible, and desirable? To my understanding, 
which is not much, that means the GC will only run in one thread (or 
multiple) and only needs to stop that/those thread(s). That also means 
it only need to search for dead objects in the heap/storage area for 
that particular thread (and where these point to).

If I understand this correctly this would be perfect for D since 
everything is thread local by default.

There's also a global heap for global objects or objects shared between 
threads.

-- 
/Jacob Carlborg

Aug 23 2012

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On Fri, 24 Aug 2012 08:27:09 +0200, Jacob Carlborg <doob me.com> wrote:

 On 2012-08-23 16:38, Alex R=C3=B8nne Petersen wrote:

 Yes, but parallelization of the mark phase is fairly trivial, and
 something we should probably look into.

 The GC will probably always be STW unless we get compiler support for=


 inserting GC barriers.

 Would a thread local GC be possible, and desirable? To my understandin=

g,  =

 which is not much, that means the GC will only run in one thread (or  =

 multiple) and only needs to stop that/those thread(s). That also means=

  =

 it only need to search for dead objects in the heap/storage area for  =

 that particular thread (and where these point to).

 If I understand this correctly this would be perfect for D since  =

 everything is thread local by default.

 There's also a global heap for global objects or objects shared betwee=

n  =

 threads.

It certainly would be possible. But I believe it requires thread-local
heaps, and some way of keeping track of when an object changes owning
thread.

Perhaps a mark phase could run on each thread's heap, and unreferenced
objects be moved to a global list of potentially dead objects. If after
all threads have run a mark, none have claimed the objects, they're
collected.

Then again, I'm hardly a GC architect.

-- =

Simen

Aug 24 2012

Jacob Carlborg <doob me.com> writes:

On 2012-08-24 09:32, Simen Kjaeraas wrote:

 It certainly would be possible. But I believe it requires thread-local
 heaps, and some way of keeping track of when an object changes owning
 thread.

Yeah, I was think about that. Are thread local heaps a problem?

 Perhaps a mark phase could run on each thread's heap, and unreferenced
 objects be moved to a global list of potentially dead objects. If after
 all threads have run a mark, none have claimed the objects, they're
 collected.

 Then again, I'm hardly a GC architect.

Me neither.

-- 
/Jacob Carlborg

Aug 24 2012

"renoX" <renozyx gmail.com> writes:

On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen 
wrote:
 On 23-08-2012 15:21, Rory McGuire wrote:
 On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha yahoo.com
 <mailto:dsimcha yahoo.com>> wrote:

    Basically, the idea is to store information about what is 
 and isn't
    a pointer at the pool level instead of at the block level.  
 My
    attempt from a long time ago at precise heap scanning, and
    Antti-Ville's first attempt, stored meta-data at the end of 
 every
    allocated block.  This worked well for large arrays, but was
    terribly inefficient for smaller allocations and made the 
 GC code
    even messier than it already is.  The overhead was a fixed
    (void*).sizeof bits per block.  Now, each pool has a bit 
 array that
    contains one bit for every possible aligned pointer.  The 
 overhead
    is always 1 bit for every (void*).sizeof bytes no matter 
 how large
    or small the block is.


 Am I correct in thinking that this is still single threaded 
 stop the world?

 Yes, but parallelization of the mark phase is fairly trivial, 
 and something we should probably look into.

Funny coincidence, I've just read an interview from Narihiro 
Nakamura who is working on parallel marking in Ruby: 
http://rubysource.com/narihiro-nakamura-rubys-gc-innovator/

BR,
renoX

Aug 24 2012

D Programming

C/C++ Programming

Other

digitalmars.D.announce - Antti-Ville Tuuainen Passes GSoC Final Evaluation