www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Discusssion on the Discussion of the Design for a new GC

reply "Orvid King" <blah38621 gmail.com> writes:
So, in order to get the ball rolling on the new GC I intend to 
implement for D, I want to facilitate a lively discussion of the 
design of it, so that it can be designed to be both robust and 
free of design flaws. To keep the discussion from getting 
derailed, I want to lay out a few guidelines, but want to get 
feedback on those guidelines before I actually implement them. My 
current draft of them is as follows:

First we’ll start with a brief overview of the development 
process:
A PR will be created for DMD, DRuntime, and, although it may stay 
empty, Phobos. A new commit will be created for each update of 
the implementation, this includes bug fixes, and continuing work 
on the implementation, in as many iterations as are required. 
This is done to allow progressive review of the code rather than 
trying to review the PRs as a whole, because, as it is likely to 
include several thousand lines of changes to the code, it would 
be impractical to review all at once.
No force push should ever be done to the PRs except to fix a typo 
in or clarify a detail of the commit message for the newest 
commit. If there is a typo in a commit message, or it is not very 
clear on what was actually done, and another commit has already 
been pushed, the typo or un-clear message shall remain for all 
eternity. The suggested remedy in this case is to make a note of 
the typo or clarify the commit message with a comment on the 
commit.
PRs to the PRs are welcome, it is however encouraged to 
coordinate any work you do with the others actively working on 
the GC. The primary outlet for this should be the IRC, however, 
should the need arise, the mailing list is a valid venue for this.
Github should be used as the primary outlet for discussion of 
actual code, due to the ease of referencing code, as well as the 
ability to tell if a comment is about a piece of code that was 
already changed.
The mailing list should be used exclusively for discussion of the 
design. It should not be used for discussing snippets of code in 
the actual implementation. It can, and should be, used to discuss 
snippets of code that may demonstrate a flaw, weakness, or 
strength in the design.
The IRC should be used for rapid-fire Q&A, or bringing someone 
up-to-date with the discussion and progression of the GC so far. 
Discussion about inconsistencies in the coding style of the 
implementation (whitespaces, newlines, etc.) should reside 
exclusively on the IRC, as they are things that a future reader 
of the discussions doesn’t really care about. If a discussion of 
the overall code style used in the implementation is required, a 
thread should be created on the mailing list.
The IRC should not be used to facilitate a design discussion. The 
reason for this is twofold, firstly the IRC has a limited 
audience, thus limited feedback, and secondly, I want the 
discussion of the design to stand as documentation for why the GC 
is designed the way it is.

Now, on to the guidelines for the design discussion.
ARC does not exist. We are implementing a GC, however, if the 
opportunity arises to allow an efficient implementation of 
interfacing with an external ARC platform, such as what is used 
in Objective C, discussion of that interfacing mechanism is 
permitted.
If DMD support is needed, it exists. This means that if the GC 
needs DMD to be capable of something such as scope analysis in 
order to make a particular optimization, then DMD should be 
assumed to be capable of doing that.
While language additions may be proposed, the design must still 
be able to function should the additions not be done, as the 
additions should only be to allow for additional optimization 
opportunities. For instance, re-introducing scoped class locals.



After all of that, I intend to include a base draft of the design 
of the GC, along with opening the PRs and committing the starting 
API. So, is there something I’m missing? Am I overlooking the 
obvious? Is there a more practical way to produce the same 
results?
Apr 23 2014
next sibling parent "Joakim" <dlang joakim.airpost.net> writes:
On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 So, in order to get the ball rolling on the new GC I intend to 
 implement for D, I want to facilitate a lively discussion of 
 the design of it, so that it can be designed to be both robust 
 and free of design flaws. To keep the discussion from getting 
 derailed, I want to lay out a few guidelines, but want to get 
 feedback on those guidelines before I actually implement them. 
 My current draft of them is as follows:

 First we’ll start with a brief overview of the development 
 process:
 A PR will be created for DMD, DRuntime, and, although it may 
 stay empty, Phobos. A new commit will be created for each 
 update of the implementation, this includes bug fixes, and 
 continuing work on the implementation, in as many iterations as 
 are required. This is done to allow progressive review of the 
 code rather than trying to review the PRs as a whole, because, 
 as it is likely to include several thousand lines of changes to 
 the code, it would be impractical to review all at once.
 No force push should ever be done to the PRs except to fix a 
 typo in or clarify a detail of the commit message for the 
 newest commit. If there is a typo in a commit message, or it is 
 not very clear on what was actually done, and another commit 
 has already been pushed, the typo or un-clear message shall 
 remain for all eternity. The suggested remedy in this case is 
 to make a note of the typo or clarify the commit message with a 
 comment on the commit.
 PRs to the PRs are welcome, it is however encouraged to 
 coordinate any work you do with the others actively working on 
 the GC. The primary outlet for this should be the IRC, however, 
 should the need arise, the mailing list is a valid venue for 
 this.
 Github should be used as the primary outlet for discussion of 
 actual code, due to the ease of referencing code, as well as 
 the ability to tell if a comment is about a piece of code that 
 was already changed.
 The mailing list should be used exclusively for discussion of 
 the design. It should not be used for discussing snippets of 
 code in the actual implementation. It can, and should be, used 
 to discuss snippets of code that may demonstrate a flaw, 
 weakness, or strength in the design.
 The IRC should be used for rapid-fire Q&A, or bringing someone 
 up-to-date with the discussion and progression of the GC so 
 far. Discussion about inconsistencies in the coding style of 
 the implementation (whitespaces, newlines, etc.) should reside 
 exclusively on the IRC, as they are things that a future reader 
 of the discussions doesn’t really care about. If a discussion 
 of the overall code style used in the implementation is 
 required, a thread should be created on the mailing list.
 The IRC should not be used to facilitate a design discussion. 
 The reason for this is twofold, firstly the IRC has a limited 
 audience, thus limited feedback, and secondly, I want the 
 discussion of the design to stand as documentation for why the 
 GC is designed the way it is.

 Now, on to the guidelines for the design discussion.
 ARC does not exist. We are implementing a GC, however, if the 
 opportunity arises to allow an efficient implementation of 
 interfacing with an external ARC platform, such as what is used 
 in Objective C, discussion of that interfacing mechanism is 
 permitted.
 If DMD support is needed, it exists. This means that if the GC 
 needs DMD to be capable of something such as scope analysis in 
 order to make a particular optimization, then DMD should be 
 assumed to be capable of doing that.
 While language additions may be proposed, the design must still 
 be able to function should the additions not be done, as the 
 additions should only be to allow for additional optimization 
 opportunities. For instance, re-introducing scoped class locals.



 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing the 
 starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

Wow, this takes the D forums tradition of "all talk, no code," or as the original saying goes, "all hat, no cattle," to a new peak. ;) I think most would agree with you on most of these guidelines, better to get onto the actual design. It might help if you put forth a tentative proposal, that the D goons can then proceed to destroy... I mean, critically evaluate.
Apr 23 2014
prev sibling next sibling parent reply "Messenger" <dont shoot.me> writes:
On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing the 
 starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

What is the state of Rainer Schütze's precise gc? Duplication of effort and all that.
Apr 23 2014
parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 23.04.2014 20:35, Messenger wrote:
 On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 After all of that, I intend to include a base draft of the design of
 the GC, along with opening the PRs and committing the starting API.
 So, is there something I’m missing? Am I overlooking the obvious? Is
 there a more practical way to produce the same results?

What is the state of Rainer Schütze's precise gc? Duplication of effort and all that.

The implementation relies on correct RTInfo generation, but that still doesn't work as this pull request is sitting there for 8 months now without getting much review: https://github.com/D-Programming-Language/dmd/pull/2480 Coincidentally, I updated the PR just a couple of days ago. The precise GC changes for druntime are here: https://github.com/rainers/druntime/tree/gcx_precise2
Apr 23 2014
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Wed, 23 Apr 2014 18:35:25 +0000
schrieb "Messenger" <dont shoot.me>:

 What is the state of Rainer Sch=C3=BCtze's precise gc? Duplication of=20
 effort and all that.

+1. And I hope you know what you are up to :D. Some people may expect a magic pill to emerge from your efforts that makes the GC approx. as fast as manual memory management for typical uses or at least as good as the one in Java. We must not forget that Java has just-in-time compilation and no raw pointer access. They might have found clever ways to make use of features/restrictions in Java, that are not available to D. Memory compaction is one from the top of my head. I only know for sure that you are looking into using some ideas from TCMalloc. Other than that, what are the exact problems you are trying to solve? That would be good to know, since different goals might require different implementations. E.g. a precise GC is generally slowed down by checking data types, but it doesn't keep memory alive because some float variable happens to look like a pointer to it. What are the limitations of garbage collection? As an example: If someone loads some million items graph structure into memory, can they still make any assumption about the run time of GC.alloc()? Can generational collection be implemented? --=20 Marco
Apr 23 2014
prev sibling next sibling parent "Mike" <none none.com> writes:
On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing the 
 starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

What specific problems do you hope to solve with the new design? Is it only to improve the speed at which the GC runs to completion on a single pass? What about premptability so the GC could run as a low priority task, and be interrupted at any time to run something at a higher priority? Is that out of scope? Mike
Apr 24 2014
prev sibling next sibling parent Andrej Mitrovic via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 4/23/14, Joakim via Digitalmars-d <digitalmars-d puremagic.com> wrote:
  It might help if you put
 forth a tentative proposal, that the D goons can then proceed to
 destroy... I mean, critically evaluate.

Btw guys, what's stopping us from simply porting over Leandro's CDGC to D2? I've looked at the codebase and it doesn't seem to be that large (the following link is the most recent version of the GC I could find): http://www.dsource.org/projects/tango/browser/trunk/tango/core/rt/gc/cdgc I know there was some talk about a missing Windows implementation, but if we had a Posix implementation and it turned out to be really good I don't think it would take a long time before we step up and make a Windows implementation.
Apr 24 2014
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Wednesday, 23 April 2014 at 18:35:26 UTC, Messenger wrote:
 On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing 
 the starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

What is the state of Rainer Schütze's precise gc? Duplication of effort and all that.

Also CDGC http://www.llucax.com.ar/proj/dgc/
Apr 24 2014
prev sibling next sibling parent "Orvid King" <blah38621 gmail.com> writes:
On Wednesday, 23 April 2014 at 18:35:26 UTC, Messenger wrote:
 On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing 
 the starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

What is the state of Rainer Schütze's precise gc? Duplication of effort and all that.

While I do admire the work that has gone into it, there are multiple limitations in his implementation that I believe will prevent his implementation to being the best for D. The single biggest thing is that it was created before a preliminary std.allocators was published, and doesn't take them into account in it's design. I believe it is possible to make these a huge strength of the GC, because it is possible to use them, with minor extensions to the design, to even implement a generational compacting allocator, along with the allocation speed that it comes with. I believe it's design also forgoes the possibility of async collection, which would fail to address the existence of realtime and server applications. I do however intend to have a heap-precise allocator as the default. The extensibility of my design would even allow for DMD style allocation, where it simply never deallocates anything. I'll post a full design soon, but I have to type it all out first, and make sure that it doesn't just end up as some incoherent babbling, and also so that it will make sense to everyone else who's reading it.
Apr 24 2014
prev sibling next sibling parent "Orvid King" <blah38621 gmail.com> writes:
On Thursday, 24 April 2014 at 10:14:07 UTC, Kagamin wrote:
 On Wednesday, 23 April 2014 at 18:35:26 UTC, Messenger wrote:
 On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing 
 the starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

What is the state of Rainer Schütze's precise gc? Duplication of effort and all that.

Also CDGC http://www.llucax.com.ar/proj/dgc/

Ah, I hadn't realized he had actually implemented the concurrent GC he gave a talk on, I will make sure to look over the code before writing out the design. I was also planning on watching the full talk before posting the design, to make sure that it doesn't prevent any aspects of the design from working.
Apr 24 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Thursday, 24 April 2014 at 14:15:45 UTC, Orvid King wrote:
 On Thursday, 24 April 2014 at 10:14:07 UTC, Kagamin wrote:
 On Wednesday, 23 April 2014 at 18:35:26 UTC, Messenger wrote:
 On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing 
 the starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

What is the state of Rainer Schütze's precise gc? Duplication of effort and all that.

Also CDGC http://www.llucax.com.ar/proj/dgc/

Ah, I hadn't realized he had actually implemented the concurrent GC he gave a talk on, I will make sure to look over the code before writing out the design. I was also planning on watching the full talk before posting the design, to make sure that it doesn't prevent any aspects of the design from working.

Leandro's CDGC is used as default GC in Sociomantic applications, so it is not only implemented but also extensively production-tested :) I have pinged him to give some feedback about issues he has encountered when trying to port it to D2.
Apr 24 2014
prev sibling next sibling parent "Orvid King" <blah38621 gmail.com> writes:
On Thursday, 24 April 2014 at 07:09:27 UTC, Mike wrote:
 On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing 
 the starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

What specific problems do you hope to solve with the new design? Is it only to improve the speed at which the GC runs to completion on a single pass? What about premptability so the GC could run as a low priority task, and be interrupted at any time to run something at a higher priority? Is that out of scope? Mike

The goals of my design are to first off reduce the impact of the GC on D as a whole, I would love it if I could get it down to the same, or less impact as the MS.net GC. Secondly I want to make extending, or customizing the GC for each particular use-case significantly easier. Reducing the number of false positives would also be a good outcome. Overall I intend to significantly improve on D's current GC.
Apr 24 2014
prev sibling next sibling parent "Orvid King" <blah38621 gmail.com> writes:
On Thursday, 24 April 2014 at 14:22:17 UTC, Dicebot wrote:
 On Thursday, 24 April 2014 at 14:15:45 UTC, Orvid King wrote:
 Ah, I hadn't realized he had actually implemented the 
 concurrent GC he gave a talk on, I will make sure to look over 
 the code before writing out the design. I was also planning on 
 watching the full talk before posting the design, to make sure 
 that it doesn't prevent any aspects of the design from working.

Leandro's CDGC is used as default GC in Sociomantic applications, so it is not only implemented but also extensively production-tested :) I have pinged him to give some feedback about issues he has encountered when trying to port it to D2.

In that case I'll have to take a much deeper look at it than I was originally planning, I will however definitely attempt to include a concurrent collection mode with my GC design, especially as there is already a well-tested version.
Apr 24 2014
prev sibling next sibling parent "Leandro Lucarella" <leandro.lucarella sociomantic.com> writes:
On Thursday, 24 April 2014 at 14:22:17 UTC, Dicebot wrote:
 On Thursday, 24 April 2014 at 14:15:45 UTC, Orvid King wrote:
 On Thursday, 24 April 2014 at 10:14:07 UTC, Kagamin wrote:
 On Wednesday, 23 April 2014 at 18:35:26 UTC, Messenger wrote:
 On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King 
 wrote:
 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing 
 the starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

What is the state of Rainer Schütze's precise gc? Duplication of effort and all that.

Also CDGC http://www.llucax.com.ar/proj/dgc/

Ah, I hadn't realized he had actually implemented the concurrent GC he gave a talk on, I will make sure to look over the code before writing out the design. I was also planning on watching the full talk before posting the design, to make sure that it doesn't prevent any aspects of the design from working.

Leandro's CDGC is used as default GC in Sociomantic applications, so it is not only implemented but also extensively production-tested :) I have pinged him to give some feedback about issues he has encountered when trying to port it to D2.

I didn't find any particular issues with the porting, I just didn't get far enough with the porting. My idea was to add patches on top of the current implementation instead of rolling out a complete new implementation from the scratch, since there were a few changes to the GC in D2, the porting should be done with a lot of care. There is still also the problem with threading, see https://www.dsource.org/projects/tango/ticket/2087 for details. BTW, if someone wants to take over the porting, and also do it "patch by patch" I recommend consulting this repo: http://git.llucax.com.ar/w/software/dgc/cdgc.git (here is where the complete history of the changes live, not in tango). And of course, feel free to contact me for any questions advice you might need!
Apr 24 2014
prev sibling next sibling parent "Leandro Lucarella" <leandro.lucarella sociomantic.com> writes:
On Thursday, 24 April 2014 at 14:15:45 UTC, Orvid King wrote:
 On Thursday, 24 April 2014 at 10:14:07 UTC, Kagamin wrote:
 On Wednesday, 23 April 2014 at 18:35:26 UTC, Messenger wrote:
 On Wednesday, 23 April 2014 at 15:33:36 UTC, Orvid King wrote:
 After all of that, I intend to include a base draft of the 
 design of the GC, along with opening the PRs and committing 
 the starting API. So, is there something I’m missing? Am I 
 overlooking the obvious? Is there a more practical way to 
 produce the same results?

What is the state of Rainer Schütze's precise gc? Duplication of effort and all that.

Also CDGC http://www.llucax.com.ar/proj/dgc/

Ah, I hadn't realized he had actually implemented the concurrent GC he gave a talk on

Haha, you thought I was just inventing the numbers?!? :D
Apr 24 2014
prev sibling next sibling parent "Brian Rogoff" <brogoff gmail.com> writes:
On Thursday, 24 April 2014 at 06:10:00 UTC, Rainer Schuetze wrote:
 On 23.04.2014 20:35, Messenger wrote:
 What is the state of Rainer Schütze's precise gc? Duplication 
 of effort
 and all that.

The implementation relies on correct RTInfo generation, but that still doesn't work as this pull request is sitting there for 8 months now without getting much review: https://github.com/D-Programming-Language/dmd/pull/2480

So this PR is the main blocker to a working heap-precise GC? Given all of the talk and energy spent around the GC issue I'd have thought this would be on some super duper high priority fast path for inclusion. Nothing wrong with other GC projects but this seems like the lowest hanging fruit.
 Coincidentally, I updated the PR just a couple of days ago.

 The precise GC changes for druntime are here: 
 https://github.com/rainers/druntime/tree/gcx_precise2

Apr 24 2014
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Thursday, 24 April 2014 at 17:10:20 UTC, Brian Rogoff wrote:
 The implementation relies on correct RTInfo generation, but 
 that still doesn't work as this pull request is sitting there 
 for 8 months now without getting much review: 
 https://github.com/D-Programming-Language/dmd/pull/2480

So this PR is the main blocker to a working heap-precise GC? Given all of the talk and energy spent around the GC issue I'd have thought this would be on some super duper high priority fast path for inclusion. Nothing wrong with other GC projects but this seems like the lowest hanging fruit.

Another issue is that in 64-bit address space precise GC gives no advantage as false pointers have low probability, so precise GC seems to be not worth the effort.
Apr 24 2014
prev sibling next sibling parent "safety0ff" <safety0ff.dev gmail.com> writes:
On Thursday, 24 April 2014 at 17:33:08 UTC, Kagamin wrote:
 Another issue is that in 64-bit address space precise GC gives 
 no advantage as false pointers have low probability, so precise 
 GC seems to be not worth the effort.

I disagree, in the current state of things there is benefit for 64 bit systems: Currently there are many ways to accidentally allocate without the NO_SCAN flag for data without pointers. The overhead of scanning this non-pointer data can often be significant enough to warrant hunting for mis-flagged allocations.
Apr 24 2014
prev sibling next sibling parent "Messenger" <dont shoot.me> writes:
On Thursday, 24 April 2014 at 17:33:08 UTC, Kagamin wrote:
 Another issue is that in 64-bit address space precise GC gives 
 no advantage as false pointers have low probability, so precise 
 GC seems to be not worth the effort.

Except a precise gc is type-aware, no? And you could basically ask it to "please print out everything that you have currently allocated so I can debug what's allocating".
Apr 24 2014
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Thursday, 24 April 2014 at 18:39:34 UTC, Messenger wrote:
 Except a precise gc is type-aware, no? And you could basically 
 ask it to "please print out everything that you have currently 
 allocated so I can debug what's allocating".

Depends on the implementation. One would want to not store data which doesn't help with collection - a pointers bitmask is enough for scanning, but not enough to know the type.
Apr 24 2014
prev sibling next sibling parent "Orvid King" <blah38621 gmail.com> writes:
Just thought of a minor addition to the guidelines.

While discussion of the naming of the public API should occur on 
the newsgroup, discussion of the names of locals, or non-public 
APIs should occur on Github.

Any objections/concerns/improvements to this?
Apr 30 2014
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Wed, 30 Apr 2014 19:10:04 +0000
schrieb "Orvid King" <blah38621 gmail.com>:

 Just thought of a minor addition to the guidelines.
 
 While discussion of the naming of the public API should occur on 
 the newsgroup, discussion of the names of locals, or non-public 
 APIs should occur on Github.
 
 Any objections/concerns/improvements to this?

No and I think it comes natural that we don't need to discuss internal variable names here. :) -- Marco
May 04 2014