digitalmars.D - Thought on limiting scope of GC

reply Jerry <jlquinn optonline.net> writes:
Hi all,

I just had the following thought on limiting the gc in regions.  I don't
know if this would address some of Manu's concerns, but here goes:

My thought is to have something like the following:

GC.track();
auto obj = allocateStuff();
GC.cleanup(obj);

The idea here is that track() tells GC to explicitly track all objects
created from that point until the cleanup call.  The cleanup() call
tells gc to limit its collection to those objects allocated since the
track() call.  The obj parameter tells gc to consider obj live.

This way, you can avoid tracking everything that may get created, but
you can limit how much work gets done.
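
To make the intended semantics concrete, here is a toy model in D. It is only a sketch under loud assumptions: `GC.track` and `GC.cleanup` are the names from this proposal and do not exist in `core.memory`, and `ToyHeap`, its index-based "pointers", and the `freed` flag are invented purely for illustration. It shows a collection that starts from `obj` and never looks at anything allocated before `track()`.

```d
// Toy model only: nothing here touches the real D garbage collector.
struct ToyHeap
{
    static struct Obj
    {
        size_t[] refs;  // indices of the objects this object points to
        bool freed;
    }

    Obj[] objects;
    size_t watermark;   // objects at or past this index are "tracked"

    /// Allocate an object that refers to the given objects; returns its index.
    size_t alloc(size_t[] refs...)
    {
        objects ~= Obj(refs.dup);
        return objects.length - 1;
    }

    /// Stand-in for the proposed GC.track(): later allocations form the region.
    void track() { watermark = objects.length; }

    /// Stand-in for the proposed GC.cleanup(obj): collect only tracked objects,
    /// treating `live` as the sole root. Returns the number of objects freed.
    size_t cleanup(size_t live)
    {
        auto marked = new bool[objects.length];
        size_t[] work = [live];
        while (work.length)
        {
            const i = work[$ - 1];
            work = work[0 .. $ - 1];
            if (i < watermark || marked[i]) continue; // never trace into the old heap
            marked[i] = true;
            work ~= objects[i].refs;
        }
        size_t freedCount;
        foreach (i; watermark .. objects.length)
            if (!marked[i]) { objects[i].freed = true; ++freedCount; }
        watermark = objects.length;
        return freedCount;
    }
}

unittest
{
    ToyHeap heap;
    const older = heap.alloc();     // allocated before the region, never examined
    heap.track();
    const leaf = heap.alloc();
    const obj = heap.alloc(leaf);   // the object that should survive
    heap.alloc();                   // temporary garbage
    heap.alloc(older);              // temporary garbage that points at old data
    assert(heap.cleanup(obj) == 2);
    assert(!heap.objects[obj].freed && !heap.objects[leaf].freed);
    assert(!heap.objects[older].freed);
}
```

The work done by `cleanup` is proportional to the size of the region, not to the size of the whole heap, which is the point of the proposal. The model deliberately ignores references stored into old objects.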

Comments? Slams?

Jerry
Feb 13 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/13/14, 8:41 PM, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.  I don't
 know if this would address some of Manu's concerns, but here goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all objects
 created from that point until the cleanup call.  The cleanup() call
 tells gc to limit its collection to those objects allocated since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

Yah, it's a classic (with the names "track" -> "mark" and "cleanup" -> "sweep"). Allocators support that already, and installing a global GC should do as well.

Andrei
Feb 13 2014
next sibling parent reply Jerry <jlquinn optonline.net> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 On 2/13/14, 8:41 PM, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.  I don't
 know if this would address some of Manu's concerns, but here goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all objects
 created from that point until the cleanup call.  The cleanup() call
 tells gc to limit its collection to those objects allocated since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

Yah, it's a classic (with the names "track" -> "mark" and "cleanup" -> "sweep"). Allocators support that already, and installing a global GC should do as well.

I don't follow the global GC comment. Let's say you're using global GC in general but want to control more tightly what it's doing at a particular region of the code. Mark looks at all things that have been allocated and possibly live. Track says keep track of objects allocated after the track call, and cleanup only looks at those objects that were recently allocated, ignoring the rest of the heap. If you're saying that allocators will provide the means of doing this, then that's fine.
Feb 14 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/14/14, 3:28 AM, Jerry wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
 Yah, it's a classic (with the names "track" -> "mark" and "cleanup" ->
 "sweep"). Allocators support that already, and installing a global GC should
 do as well.

I don't follow the global GC comment. Let's say you're using global GC in general but want to control more tightly what it's doing at a particular region of the code. Mark looks at all things that have been allocated and possibly live.

Oh, I think "mark/sweep" in the "mark/sweep idiom" means something different from the "mark & sweep garbage collector". I looked for evidence that the idiom exists under that name, but apparently I was wrong. Anyhow, I guess track/cleanup is less confusing.

 Track says keep track of objects allocated after the track call, and
 cleanup only looks at those objects that were recently allocated,
 ignoring the rest of the heap.

 If you're saying that allocators will provide the means of doing this,
 then that's fine.

I'm thinking of something like:

MyAllocator alloc = ...;
alloc.installGlobally();
...
alloc.deallocateAll();
alloc.uninstallGlobally();

Andrei
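
For a rough idea of what the `deallocateAll()` part looks like with code that exists today, here is a sketch using `Region` from `std.experimental.allocator`, which entered Phobos after this thread. The `installGlobally()` and `uninstallGlobally()` calls above are hypothetical, and nothing in this sketch redirects `new`; the function name and the sizes are made up for illustration.

```d
import std.experimental.allocator.building_blocks.region : Region;
import std.experimental.allocator.mallocator : Mallocator;

/// Allocates two scratch buffers from a malloc-backed region, releases the
/// whole region in one call, and reports whether the next allocation reuses
/// the memory the first buffer occupied.
bool regionIsReusedAfterReset()
{
    auto region = Region!Mallocator(4096);  // one 4 KiB chunk obtained up front
    void[] first = region.allocate(256);
    void[] second = region.allocate(512);
    assert(first.length == 256 && second.length == 512);

    const firstPtr = first.ptr;
    region.deallocateAll();                 // frees everything at once, no scanning
    void[] again = region.allocate(256);
    return again.ptr is firstPtr;
}

unittest
{
    assert(regionIsReusedAfterReset());
}
```

Releasing the region costs the same no matter how many objects were carved out of it, but every object in it dies together.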
Feb 14 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/14/14, 8:26 AM, Jerry wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 On 2/14/14, 3:28 AM, Jerry wrote:
 Track says keep track of objects allocated after the track call, and
 cleanup only looks at those objects that were recently allocated,
 ignoring the rest of the heap.

 If you're saying that allocators will provide the means of doing this,
 then that's fine.

I'm thinking of something like:

MyAllocator alloc = ...;
alloc.installGlobally();
...
alloc.deallocateAll();
alloc.uninstallGlobally();

The difference is that I'd like the ability for some objects to live after the region ends. I.e. it's reducing the scope of the GC, not temporarily replacing it with a completely separate heap.

Then I guess you'd need to use two allocators. Andrei
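
One way to read the "two allocators" suggestion, sketched under assumptions: temporaries go into a scratch region, and whatever must outlive the region is copied to the GC heap before the region is reset. The function `squaresOfEvens` and the sizing of the region are invented for this example; `Region` and `Mallocator` come from `std.experimental.allocator`, which is newer than this thread.

```d
import std.experimental.allocator.building_blocks.region : Region;
import std.experimental.allocator.mallocator : Mallocator;

/// Uses a scratch region for the working buffer and the GC heap for the result.
int[] squaresOfEvens(const int[] input)
{
    // Allocator 1: a malloc-backed region for temporaries (with some slack
    // for alignment rounding).
    auto scratch = Region!Mallocator(input.length * int.sizeof + 64);
    auto tmp = cast(int[]) scratch.allocate(input.length * int.sizeof);
    assert(tmp.length == input.length);

    size_t n;
    foreach (x; input)
        if (x % 2 == 0) tmp[n++] = x * x;

    // Allocator 2: the GC heap. Only the survivor is copied there.
    int[] result = tmp[0 .. n].dup;

    scratch.deallocateAll();    // every temporary goes away in one step
    return result;
}

unittest
{
    assert(squaresOfEvens([1, 2, 3, 4]) == [4, 16]);
}
```

The cost is the explicit copy: the programmer has to know which objects survive, which is exactly the bookkeeping the original proposal wanted `cleanup(obj)` to infer.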
Feb 14 2014
prev sibling next sibling parent "Francesco Cattoglio" <francesco.cattoglio gmail.com> writes:
On Friday, 14 February 2014 at 11:28:11 UTC, Jerry wrote:
 Track says keep track of objects allocated after the track 
 call, and
 cleanup only looks at those objects that were recently 
 allocated,
 ignoring the rest of the heap.

Track cannot guarantee that no reference escapes, so cleaning up an object could be a huge error. It would make sense, however, inside pure functions, for example.
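
A small sketch of why pure functions are the friendlier case. In D, a `pure` function may not read or write mutable global state, and when its parameters carry no mutable indirections (a "strongly pure" function) its allocations can become reachable by the caller only through the return value. The function names below are made up for illustration.

```d
/// Strongly pure: the only parameter is a plain value, and `pure` forbids
/// touching mutable globals, so the temporary below cannot leak anywhere.
int[] firstSquares(size_t n) pure
{
    auto scratch = new int[n];          // temporary, unreachable after return
    foreach (i, ref s; scratch)
        s = cast(int) (i * i);
    auto result = scratch.dup;          // the one allocation that escapes
    return result;
}

int[] lastResult;   // mutable thread-local global

// The compiler rejects this version, because a pure function cannot
// access mutable static data:
//
// int[] leaky(size_t n) pure
// {
//     lastResult = new int[n];
//     return lastResult;
// }

unittest
{
    assert(firstSquares(4) == [0, 1, 4, 9]);
}
```

Inside such a function, everything allocated between a hypothetical `track()` and `cleanup(result)` that is not reachable from `result` really is garbage. Functions that take mutable pointers or references are only weakly pure and can still stash a new object in their arguments.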
Feb 14 2014
prev sibling parent Jerry <jlquinn optonline.net> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 On 2/14/14, 3:28 AM, Jerry wrote:
 Track says keep track of objects allocated after the track call, and
 cleanup only looks at those objects that were recently allocated,
 ignoring the rest of the heap.

 If you're saying that allocators will provide the means of doing this,
 then that's fine.

I'm thinking of something like:

MyAllocator alloc = ...;
alloc.installGlobally();
...
alloc.deallocateAll();
alloc.uninstallGlobally();

The difference is that I'd like the ability for some objects to live after the region ends. I.e. it's reducing the scope of the GC, not temporarily replacing it with a completely separate heap.
Feb 14 2014
prev sibling next sibling parent "thedeemon" <dlang thedeemon.com> writes:
On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

What if allocateStuff() writes the address of some newly allocated object into a field of an old object that existed before GC.track()? You can't scan only the objects created after GC.track(): that might leave dangling references in the "old generation".
Feb 13 2014
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.
  I don't
 know if this would address some of Manu's concerns, but here 
 goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get 
 created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

How do you imagine it working in multi-core programs? Does it only track thread-local allocations?
Feb 13 2014
prev sibling next sibling parent "Namespace" <rswhite4 googlemail.com> writes:
On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.
  I don't
 know if this would address some of Manu's concerns, but here 
 goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get 
 created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

Looks like DIP 46: http://wiki.dlang.org/DIP46

I like the idea.
Feb 14 2014
prev sibling next sibling parent reply "tcak" <tcak pcak.com> writes:
On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.
  I don't
 know if this would address some of Manu's concerns, but here 
 goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get 
 created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

A programmer's job is to tell the computer what to do; the GC's purpose is to help prevent problems. By default, AFAIK, the GC scans every part of memory in case it holds references. If scanning all of memory is what takes the time, the programmer could tell the GC, when he/she is confident it is correct, not to scan certain parts of memory, which limits the scanning area. For example, if I create a char array of 10,000 items, why would I want the GC to scan it? I certainly won't put any object references in it.
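
For the char array case specifically, `core.memory` already offers a per-block switch: the `GC.BlkAttr.NO_SCAN` attribute. The sketch below uses only calls from `core.memory`; the helper names `allocUnscanned` and `isNoScan` are made up for this example.

```d
import core.memory : GC;

/// Allocates `n` bytes that the collector keeps alive but never scans for
/// pointers. GC.malloc does not initialize the memory.
char[] allocUnscanned(size_t n)
{
    auto p = cast(char*) GC.malloc(n, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}

/// True if the GC block containing `p` carries the NO_SCAN attribute.
bool isNoScan(const void* p)
{
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) != 0;
}

unittest
{
    auto buf = allocUnscanned(10_000);
    assert(buf.length == 10_000);
    assert(isNoScan(buf.ptr));
}
```

As far as I know, the runtime sets this attribute on its own for arrays whose element type contains no pointers, so a plain `new char[10_000]` should not be scanned either. The manual route matters mostly for untyped buffers.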
Feb 14 2014
parent Paulo Pinto <pjmlp progtools.org> writes:
Am 14.02.2014 16:46, schrieb tcak:
 A programmer's aim is to tell computer what to do. Purpose of GC is
 to help him to prevent problems. In default, AFAIK, GC considers
 every part of memory in case there are references in them. Well, if
 the time taking process is scanning all memory, programmer could tell
 to GC, if he/she trusts about correctness, not to scan some parts of
 memory to limit scanning area. Example, if I create a char array of
 10,000 items, why would I want GC to scan it. I won't put any object
 references in it for sure.

This only works when you are the only guy on the team and have a small codebase you can keep in your head. The moment a mid-size team comes into play, it is chaos. There is a reason why languages with manual memory management have lost their place in the enterprise. -- Paulo

Many people want to disable the GC to improve performance (other reasons, if any, are not considered here). If memory problems start after new code is added, just remove the GC-disabling parts (like the one in my 10,000-item array example). The errors will disappear and performance may drop a little. Then the code can be fixed to bring performance back up. I think enabling the GC only for some parts of the code is the wrong way round. It should be disabled for some parts of the code. That way, if the programmer loses control of memory, he/she can remove the GC-disabling code, and tada, everything works correctly without any other changes.

Again, this example only works when you are the only guy working on the code. For example, projects the size of the Linux kernel are only viable in languages like C because there are guys validating every single line of code that gets added to the kernel. In most projects that is far from the truth; everyone just checks in whatever they feel like. Then, when the thing blows up at the customer and high-escalation meetings are going on, a few poor souls, usually senior developers, go over the commit history and use tools like Insure++ to track down the issue. Sometimes it takes a whole week to track down such culprits. I don't miss those days. -- Paulo
Feb 14 2014
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Friday, 14 February 2014 at 09:01:09 UTC, tcak wrote:
 On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.
 I don't
 know if this would address some of Manu's concerns, but here 
 goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get 
 created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

A programmer's job is to tell the computer what to do; the GC's purpose is to help prevent problems. By default, AFAIK, the GC scans every part of memory in case it holds references. If scanning all of memory is what takes the time, the programmer could tell the GC, when he/she is confident it is correct, not to scan certain parts of memory, which limits the scanning area. For example, if I create a char array of 10,000 items, why would I want the GC to scan it? I certainly won't put any object references in it.

This only works when you are the only guy on the team and have a small codebase you can keep in your head. The moment a mid-size team comes into play, it is chaos. There is a reason why languages with manual memory management have lost their place in the enterprise. -- Paulo
Feb 14 2014
prev sibling next sibling parent "tcak" <tcak pcak.com> writes:
 A programmer's aim is to tell computer what to do. Purpose of 
 GC is to help him to prevent problems. In default, AFAIK, GC 
 considers every part of memory in case there are references in 
 them. Well, if the time taking process is scanning all memory, 
 programmer could tell to GC, if he/she trusts about 
 correctness, not to scan some parts of memory to limit 
 scanning area. Example, if I create a char array of 10,000 
 items, why would I want GC to scan it. I won't put any object 
 references in it for sure.

This only works when you are the only guy on the team and have a small codebase you can keep in your head. The moment a mid-size team comes into play, it is chaos. There is a reason why languages with manual memory management have lost their place in the enterprise. -- Paulo

Many people want to disable the GC to improve performance (other reasons, if any, are not considered here). If memory problems start after new code is added, just remove the GC-disabling parts (like the one in my 10,000-item array example). The errors will disappear and performance may drop a little. Then the code can be fixed to bring performance back up. I think enabling the GC only for some parts of the code is the wrong way round. It should be disabled for some parts of the code. That way, if the programmer loses control of memory, he/she can remove the GC-disabling code, and tada, everything works correctly without any other changes.
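
The "disable it for some parts of the code" pattern can be written with `core.memory` as it stands. A minimal sketch, with `buildTable` as a made-up example function:

```d
import core.memory : GC;

/// Builds a table with automatic collections suppressed for the duration.
int[] buildTable(size_t n)
{
    GC.disable();               // calls nest; each needs a matching enable()
    scope (exit) GC.enable();   // re-enable even if an exception is thrown

    int[] table;
    foreach (i; 0 .. n)
        table ~= cast(int) (i * 2);   // still GC-allocated, just not collected now
    return table;
}

unittest
{
    assert(buildTable(3) == [0, 2, 4]);
}
```

Deleting the two `GC` lines restores the default behavior without any other change, which is the property argued for above. Note that `GC.disable()` only suppresses automatic collections; the runtime may still collect when it has to, for example when memory runs out.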
Feb 14 2014
prev sibling next sibling parent Jerry <jlquinn optonline.net> writes:
"Paulo Pinto" <pjmlp progtools.org> writes:

 On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all objects
 created from that point until the cleanup call.  The cleanup() call
 tells gc to limit its collection to those objects allocated since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get created, but
 you can limit how much work gets done.

How do you imagine it working in multi-core programs? Does it only track thread-local allocations?

I think this can be handled by storing the thread that requests tracking, and then each allocation is tracked if it's done from the same thread that requested tracking. Then cleanup just considers the objects that were tracked.
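
A sketch of the per-thread bookkeeping, assuming a hypothetical allocation hook. None of these functions exist in druntime; `track`, `noteAllocation`, and `cleanup` are stand-ins invented for this example. It relies on the fact that module-level variables in D are thread-local by default, so the state does not even need to store a thread identity.

```d
import core.thread : Thread;

// Module-level variables are thread-local by default in D, so every thread
// gets its own copy of this tracking state.
private bool tracking;
private size_t trackedCount;

/// Stand-in for the proposed GC.track(): start counting this thread's allocations.
void track() { tracking = true; trackedCount = 0; }

/// Hypothetical allocation hook: records the allocation only if the calling
/// thread asked for tracking.
void noteAllocation() { if (tracking) ++trackedCount; }

/// Stand-in for the proposed GC.cleanup(): stop tracking and report how many
/// allocations this thread recorded.
size_t cleanup() { tracking = false; return trackedCount; }

private void worker()
{
    // This thread never called track(), so its copy of `tracking` is false.
    noteAllocation();
    noteAllocation();
    noteAllocation();
}

size_t trackedOnThisThread()
{
    track();
    noteAllocation();
    noteAllocation();
    auto other = new Thread(&worker);
    other.start();
    other.join();
    return cleanup();
}

unittest
{
    assert(trackedOnThisThread() == 2);
}
```

This only settles which allocations are counted. It does not address another thread holding a reference to a tracked object when cleanup runs.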
Feb 14 2014
prev sibling next sibling parent Jerry <jlquinn optonline.net> writes:
"tcak" <tcak pcak.com> writes:

 Many people wants to disable GC to improve performance (if there are other
 reasons, it is not included here.). If after adding new codes, memory problems
 start, just disable the GC-disabled-code-parts (as I exampled with that 10,000
 item array). This way, errors will disappear and performance may decrease a
 little. Then fixing can be done to increase performance again.

 I think enabling GC for only some parts of code is wrong. It should be
 disabling it for some parts of code. This way, if programmer loses control of
 memory, he/she can remove GC-disabling codes, and tada everything works
 correctly without doing any other changes.

My proposal was to leave GC enabled for the whole program. The track and cleanup call pair is intended to narrow the scope of GC in some regions of the code.
Feb 14 2014
prev sibling parent Jerry <jlquinn optonline.net> writes:
"thedeemon" <dlang thedeemon.com> writes:

 On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all objects
 created from that point until the cleanup call.  The cleanup() call
 tells gc to limit its collection to those objects allocated since the
 track() call.  The obj parameter tells gc to consider obj live.

What if allocateStuff() writes the address of some newly allocated object into a field of an old object that existed before GC.track()? You can't scan only the objects created after GC.track(): that might leave dangling references in the "old generation".

This is a concern. Rather than passing a single object into the cleanup, a list of objects to consider live can be passed in. That would cover at least some of these situations, but not all. Would it still be useful given this limitation? Would it give someone looking for tighter control over GC the tools they need?
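
Both the problem and the partial fix can be seen in a toy model. Everything here is hypothetical: objects are array indices, `regionSweep` is an invented stand-in for a `cleanup` that takes a list of live objects, and no real GC is involved.

```d
/// Toy region sweep. `refs[i]` lists the objects that object `i` points to;
/// objects with index >= `watermark` were allocated after the hypothetical
/// track(). Returns, for each object, whether the sweep freed it.
bool[] regionSweep(const size_t[][] refs, size_t watermark, const size_t[] live)
{
    auto marked = new bool[refs.length];
    size_t[] work = live.dup;
    while (work.length)
    {
        const i = work[$ - 1];
        work = work[0 .. $ - 1];
        if (marked[i]) continue;
        marked[i] = true;
        // A listed old object is scanned like a root, but tracing only
        // ever continues into objects allocated inside the region.
        foreach (r; refs[i])
            if (r >= watermark) work ~= r;
    }
    auto freed = new bool[refs.length];
    foreach (i; watermark .. refs.length)
        freed[i] = !marked[i];
    return freed;
}

unittest
{
    // 0: old cache object   1: new result
    // 2: new object stored into the cache (0 -> 2)   3: new garbage
    auto refs = new size_t[][](4);
    refs[0] = [size_t(2)];

    // Only the result is listed: object 2 is freed while the cache still
    // points at it, which is the dangling reference described above.
    assert(regionSweep(refs, 1, [1]) == [false, false, true, true]);

    // Listing the old cache object as live as well keeps object 2 alive.
    assert(regionSweep(refs, 1, [1, 0]) == [false, false, false, true]);
}
```

The second call only works because the caller knew which old object had been written to. Generational collectors get the same information automatically from write barriers and remembered sets, which is roughly what would be needed to make the scheme safe in general.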
Feb 14 2014