digitalmars.D - large objects and GC

Fawzi Mohamed <fmohamed mac.com> writes:
There was recently a discussion on large array and GC.

The main conclusion was that, because the garbage collector is not 
exact, large arrays don't get collected. On Tango this seems less 
problematic, but it can still be an issue.

I am writing something that needs large arrays, and one obvious 
solution is to manually allocate the memory.
This works, but then one has to use some kind of memory management, for 
example having a single owner, or using reference counting (with 
synchronization or atomic operations).
The problem is that if the owner objects that do the reference counting 
are managed by the gc, they might stay uncollected for quite a long time 
because they are probably small, so one really needs to do everything 
manually, use scope,...
Obviously this is efficient and one should do it with large objects, 
but it would be nice if the thing could transition more gracefully to 
an automatically managed model.

A large object should get a region to itself when allocated with the 
gc, so there could be another approach.
One could add a flag to the garbage collector. This flag would tell 
the gc to ignore inner pointers into a region when deciding if the 
region should be collected (but the pointers should be updated when the 
region is moved).

To have internal pointers one should also keep a pointer to the base object.
Basically one has automatic reference counting, where the references 
are pointers to the base object.
The advantage is that it is automatic, and that memory can be relocated.

If others think it is a good idea, I am willing to invest some time to 
explore it.

Fawzi
May 16 2008
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sat, 17 May 2008 00:32:26 +0300, Fawzi Mohamed <fmohamed mac.com> wrote:

 The problem is that if the owner objects that do the reference counting are
 managed by the gc, they might stay uncollected for quite a long time because
 they are probably small, so one really needs to do everything manually, use
 scope,...

I don't understand your logic here. The GC does not prioritize objects based on their size. Smaller objects are much less likely to leak because of the proportionally smaller chance of a bogus pointer keeping it "referenced".
 One could add a flag to the garbage collector. This flag would tell the
 gc to ignore inner pointers into a region when deciding if the region should
 be collected.
 To have internal pointers one should also keep a pointer to the base object.

This will be effectively the same as having a "wrapper" class for manually allocated memory. The class destructor, which will be called by the GC, should deallocate the external memory. The wrapper class should have only one field and thus be very small, so its chance of leaking will be almost equivalent to that of the method you describe. I see no necessity for reference counting either: you just pass around the reference to the wrapper object.
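
A rough sketch of the wrapper idea, written in Python only as a stand-in for D. `ExternalHeap` and `BigArray` are hypothetical names made up for this illustration; nothing here is taken from Phobos or Tango. The finalizer plays the role of the class destructor. The sketch assumes CPython, which runs the finalizer as soon as the last reference goes away; D's gc gives no such timing guarantee.

```python
import weakref


class ExternalHeap:
    """Hypothetical stand-in for manual allocation outside the gc heap."""

    def __init__(self):
        self.live = {}          # handle -> buffer
        self.next_handle = 1

    def alloc(self, size):
        handle = self.next_handle
        self.next_handle += 1
        self.live[handle] = bytearray(size)
        return handle

    def free(self, handle):
        del self.live[handle]

    def bytes_in_use(self):
        return sum(len(buf) for buf in self.live.values())


class BigArray:
    """Small wrapper whose single field is the handle to the external memory."""

    def __init__(self, heap, size):
        self.handle = heap.alloc(size)
        # Acts like the destructor: frees the external buffer when the
        # wrapper itself is collected. The callback does not reference self.
        weakref.finalize(self, heap.free, self.handle)


heap = ExternalHeap()
result = BigArray(heap, 1_000_000)
temporary = BigArray(heap, 4_000_000)
in_use_before = heap.bytes_in_use()

del temporary                   # last reference gone, finalizer frees the buffer
in_use_after = heap.bytes_in_use()
print(in_use_before, in_use_after)
```

Only the wrapper reference is passed around, so no explicit reference counting appears in user code.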
 (but the pointers should be updated when the region is moved)

A moving garbage collector must also be an exact garbage collector.

--
Best regards,
Vladimir mailto:thecybershadow gmail.com
May 16 2008
BCS <ao pathlink.com> writes:
Reply to Vladimir,

 This will be effectively the same as having a "wrapper" class for
 manually allocated memory. The class destructor, which will be called
 by the GC, should deallocate the external memory. The wrapper class
 should only have one field, and thus be very small and it will have
 the chance of leaking almost equivalent to the method you describe. I
 see no necessity for reference counting either, you just pass around
 the reference to the wrapper object.

I think this is based on the assumption that lots of systems only ever use the base pointer, so why not let the programmer leverage that?
May 16 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-05-17 03:00:48 +0200, BCS <ao pathlink.com> said:

 Reply to Vladimir,
 
 Changing this would
 break many things,

For those cases, you don't opt in. The default would stay the same as now.

Exactly. Maybe I should have been clearer: this approach is opt-in.
May 17 2008
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sat, 17 May 2008 00:58:41 +0300, BCS <ao pathlink.com> wrote:

 I think this is based on the assumption that lots of systems only ever use the
 base pointer, so why not let the programmer leverage that?

Yes, but in many cases that assumption does not hold. Changing this would break many things, while it can be worked around with wrapper objects.

--
Best regards,
Vladimir mailto:thecybershadow gmail.com
May 16 2008
BCS <ao pathlink.com> writes:
Reply to Vladimir,

 Changing this would
 break many things,

For those cases, you don't opt in. The default would stay the same as now.
May 16 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-05-16 23:54:49 +0200, "Vladimir Panteleev" 
<thecybershadow gmail.com> said:

 On Sat, 17 May 2008 00:32:26 +0300, Fawzi Mohamed <fmohamed mac.com> wrote:
 
 The problem is that if the owner objects that do the reference counting 
 are managed by the gc, they might stay uncollected for quite a long time 
 because they are probably small, so one really needs to do everything 
 manually, use scope,...

I don't understand your logic here. The GC does not prioritize objects based on their size. Smaller objects are much less likely to leak because of the proportionally smaller chance of a bogus pointer keeping it "referenced".

If I understood the current gc correctly (I looked at Tango's, but it seems to be just a slightly improved version of the Phobos gc), it doesn't know anything about single objects; it just works with pools. This way the number of objects it has to handle stays manageable. If an object is big it gets its own pool, whereas if it is small it goes into a pool with other objects. Now the pool will stay around as long as any object in it has references. When the pool goes away, all the finalizers are called and then the memory is released (not necessarily to the system, but at least to the gc).

This behavior is OK as long as the size of the object is the one the gc sees: if the pools have a reasonable size, the memory loss stays reasonable. But now look at the typical use of an array initialized from other arrays through calculations that need temporary arrays. With small wrappers, both the result and the temporary arrays are likely to be in the same pool. So as long as the result is kept around, all the temporaries used to create it stay around. It is clear that if the temporaries actually hold a large amount of manually allocated memory, this is a waste of resources. The result is that you have to manually manage not just the big memory allocations, but also the wrappers.

I know that with big objects manual management is probably a good idea, but I would like a system that can work reasonably well with more relaxed management. I think that my proposal achieves this with a small change.

 One could add a flag to the garbage collector. This flag would tell 
 the gc to ignore inner pointers into a region when deciding if the 
 region should be collected.
 To have internal pointers one should also keep a pointer to the base object.

This will be effectively the same as having a "wrapper" class for manually allocated memory. The class destructor, which will be called by the GC, should deallocate the external memory. The wrapper class should only have one field, and thus be very small and it will have the chance of leaking almost equivalent to the method you describe. I see no necessity for reference counting either, you just pass around the reference to the wrapper object.

see the previous point.
 (but the pointers should be updated when the region is moved)

A moving garbage collector must also be an exact garbage collector.

I know, it was just in case...

Fawzi
May 17 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-05-17 10:42:21 +0200, "Vladimir Panteleev" 
<thecybershadow gmail.com> said:

 On Sat, 17 May 2008 10:54:06 +0300, Fawzi Mohamed <fmohamed mac.com> wrote:
 
 Now the pool will stay around as long as any object in it has references.

Heh, no, this is not the case. The GC will track references individually for every object inside the memory pool. The code for freeing sub-pool-size objects is in gcx.d, lines 2056 to 2075.

Thanks. If that is the case, then wrapper objects are OK.

gcx.d, lines 2056 to 2075 of which codebase? Where can I get that gcx? The Tango version has something else at those lines...

Fawzi
May 17 2008
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Fawzi Mohamed (fmohamed mac.com)'s article
 On 2008-05-17 10:42:21 +0200, "Vladimir Panteleev"
 <thecybershadow gmail.com> said:
 On Sat, 17 May 2008 10:54:06 +0300, Fawzi Mohamed <fmohamed mac.com> wrote:

 Now the pool will stay around as long as any object in it has references.

Heh, no, this is not the case. The GC will track references individually for every object inside the memory pool. The code for freeing sub-pool-size objects is in gcx.d, lines 2056 to 2075.

gcx.d, lines 2056 to 2075 of which codebase? Where can I get that gcx? The Tango version has something else at those lines...

In Tango: /trunk/lib/gc/basic/gcx.d
In Phobos: /internal/gc/gcx.d

For what you describe, the easiest thing would be to add a new bitfield and have an "allow interior pointers" bit per block, then check this bit during scanning before flagging a block as reachable.

Sean
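
The per-block bit can be sketched with a toy conservative mark phase. This is Python used as a stand-in and is not the gcx.d code: `Block`, `allow_interior`, `find_block`, `mark` and `sweep` are all names invented for the illustration, and the addresses are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    base: int                     # start address of the block
    size: int                     # size in bytes
    allow_interior: bool = True   # hypothetical per-block bit; True is the current behavior
    words: list = field(default_factory=list)   # contents, scanned conservatively
    marked: bool = False


def find_block(blocks, word):
    """Return the block whose address range contains `word`, or None."""
    for block in blocks:
        if block.base <= word < block.base + block.size:
            return block
    return None


def mark(blocks, roots):
    """Conservative mark phase: every word is treated as a potential pointer."""
    for block in blocks:
        block.marked = False
    pending = list(roots)
    while pending:
        word = pending.pop()
        block = find_block(blocks, word)
        if block is None or block.marked:
            continue
        # The extra check: an interior pointer keeps the block alive
        # only if the block allows interior pointers.
        if word != block.base and not block.allow_interior:
            continue
        block.marked = True
        pending.extend(block.words)


def sweep(blocks):
    """Return the blocks that survive the collection."""
    return [block for block in blocks if block.marked]


small = Block(base=0x1000, size=16)
big_default = Block(base=0x100000, size=0x400000)
big_opt_in = Block(base=0x800000, size=0x400000, allow_interior=False)
blocks = [small, big_default, big_opt_in]

# One real base pointer and two bogus words that land inside the big blocks.
roots = [0x1000, 0x123456, 0x923456]
mark(blocks, roots)
survivors = [hex(block.base) for block in sweep(blocks)]
print(survivors)
```

In this toy model the bogus word pins the default block but not the opted-in one, while a pointer to the base of the opted-in block still keeps it alive.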
May 17 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-05-17 21:35:07 +0200, Sean Kelly <sean invisibleduck.org> said:

 == Quote from Fawzi Mohamed (fmohamed mac.com)'s article
 On 2008-05-17 10:42:21 +0200, "Vladimir Panteleev"
 <thecybershadow gmail.com> said:
 On Sat, 17 May 2008 10:54:06 +0300, Fawzi Mohamed <fmohamed mac.com> wrote:
 
 Now the pool will stay around as long as any object in it has references.

Heh, no, this is not the case. The GC will track references individually for every object inside the memory pool. The code for freeing sub-pool-size objects is in gcx.d, lines 2056 to 2075.

gcx.d, lines 2056 to 2075 of which codebase? Where can I get that gcx? The Tango version has something else at those lines...

In Tango: /trunk/lib/gc/basic/gcx.d
In Phobos: /internal/gc/gcx.d

For what you describe, the easiest thing would be to add a new bitfield and have an "allow interior pointers" bit per block, then check this bit during scanning before flagging a block as reachable.

Sean

Thank you for the explanation. I had badly interpreted "the gc does not know anything about the objects", and I didn't actually try to test my understanding with a program. For what I want to do, a wrapper object that does manual memory allocation fits the bill nicely.

My proposal could still be useful to switch a large existing object to this memory management mode a posteriori, and could avoid the need to delete it manually (as requested, for example, in http://d.puremagic.com/issues/show_bug.cgi?id=2105 ). I will look into the possibility of introducing this in the gc, together with a function that, given a pointer, checks whether the object has a full pool to itself (with no space left for other objects) and, if so, sets the "no internal pointers" flag. But as I don't need it, don't expect anything :)

Fawzi
May 17 2008
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sat, 17 May 2008 10:54:06 +0300, Fawzi Mohamed <fmohamed mac.com> wrote:

 Now the pool will stay around as long as any object in it has references.

Heh, no, this is not the case. The GC will track references individually for every object inside the memory pool. The code for freeing sub-pool-size objects is in gcx.d, lines 2056 to 2075.

--
Best regards,
Vladimir mailto:thecybershadow gmail.com
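
To make the per-object tracking concrete, here is a toy pool with one mark bit per slot rather than one per pool. It is a Python sketch under invented names (`Pool`, `slot_of`, `collect`) and made-up addresses; it does not reproduce the gcx.d lines mentioned above.

```python
class Pool:
    """Toy pool of fixed-size slots with one mark bit per slot."""

    def __init__(self, base, slot_size, slot_count):
        self.base = base
        self.slot_size = slot_size
        self.used = [False] * slot_count
        self.marked = [False] * slot_count

    def alloc(self):
        index = self.used.index(False)    # first free slot; ValueError if the pool is full
        self.used[index] = True
        return self.base + index * self.slot_size

    def slot_of(self, address):
        """Map an address to its slot index, or None if it lies outside the pool."""
        offset = address - self.base
        if 0 <= offset < self.slot_size * len(self.used):
            return offset // self.slot_size
        return None

    def collect(self, roots):
        """Mark slots referenced from `roots`, then free each unmarked slot on its own."""
        self.marked = [False] * len(self.used)
        for address in roots:
            index = self.slot_of(address)
            if index is not None and self.used[index]:
                self.marked[index] = True
        freed = []
        for index, in_use in enumerate(self.used):
            if in_use and not self.marked[index]:
                self.used[index] = False
                freed.append(self.base + index * self.slot_size)
        return freed


pool = Pool(base=0x2000, slot_size=16, slot_count=8)
result = pool.alloc()     # wrapper for the result array
temp_a = pool.alloc()     # wrappers for two temporaries
temp_b = pool.alloc()

# Only the result is still referenced when the collection runs.
freed = pool.collect(roots=[result])
print([hex(address) for address in freed])
```

The two temporary wrappers are freed even though the result wrapper in the same pool is still alive, so the pool as a whole does not pin them.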
May 17 2008
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sat, 17 May 2008 12:06:45 +0300, Fawzi Mohamed <fmohamed mac.com> wrote:

 gcx.d, lines 2056 to 2075 of which codebase? Where can I get that gcx?
 The Tango version has something else at those lines...

It's Phobos 1.x, sorry. The file is in \dmd\src\phobos\internal\gc\gcx.d

--
Best regards,
Vladimir mailto:thecybershadow gmail.com
May 17 2008
BCS <ao pathlink.com> writes:
Reply to Fawzi,

 To have internal pointers one should also keep a pointer to the base
 object.

So in effect, only a pointer to the start of the block counts. Anything else is just ignored. Just making sure I'm understanding you correctly. (I think this is worth exploring, but I don't know how good it will be.)
May 16 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-05-16 23:55:08 +0200, BCS <ao pathlink.com> said:

 Reply to Fawzi,
 
 To have internal pointers one should also keep a pointer to the base
 object.

So in effect, only a pointer to the start of the block counts. Anything else is just ignored.

Exactly, that's the gist of the idea, and it would let one use almost normal gc-allocated memory instead of manually managed memory. This behavior would kick in only if the gc decides that the object should get a pool to itself (obviously objects of that kind should *always* be treated as if they had a pool to themselves, and would need a pointer to their base).
 Just making sure I'm understanding you correctly. (I think this is 
 worth exploring, but I don't know how good it will be.)

Good. I am just posting it here to see whether I missed something basic or whether it is worth exploring (and maybe someone more knowledgeable about the gc will do it for me ;).

Fawzi
May 17 2008