www.digitalmars.com         C & C++   DMDScript  

D - D Garbage collector. "weak" pointers.

reply Ilya Minkov <midiclub 8ung.at> writes:
Q: does D GC have "weak" pointers?

I didn't find it being mentioned in the docs.

"Weak" pointers are pointers, which are similar to NULL pointers, that 
no legal access can be done with them. They point to real objects, which 
however can be collected by a garbage collector. In the case a 
pointed-to object is collected, a "weak" pointer is turned into NULL 
pointer.

This is all for the sake of caching.

For implementation to be useful, it should be possible to turn a normal 
pointer into weak pointer at runtime and back. Dereferencing a weak 
pointer is to be an error, they have to be turned into "normal" for any 
operations, and afterwards back into weak if desired.

I have some more detailed implementation ideas, but i don't have time 
now and will share them later. Together with possible usage examples.

-i.
Jan 06 2003
parent reply "Mike Wynn" <mike.wynn l8night.co.uk> writes:
if, (big if) the GC finalise/destructor behaviour was a little different
then java style weak, soft, phantom refs could all be done with a simple
template for your classes.
I personally hate the finalise behaviour, and think you should get two calls
first an "unreachable" which informs the object that it is no longer
reachable at which time the code can use any means to "revive" the object
(it may contain a ref to a live object or can access a static)
this is called EVERY time the object is found to be unreachable at the end
of a GC cycle (even concurrent collectors have cycles AFAIK).
finally the destructor is called when the object's memory is finally and
permanatly released
(in debug builds it should be released a cycle later to make sure there is
no way it can referenced).
this also allows per class caches of objects (when its unreachable you can
put it into a class static list if its not full)

the upshot of this is (I believe) unfortunatly write barriers during GC
(like concurrent collectors use) to detect object becoming live by putting a
ref into an already walked object (if its not been walked yet then thats
o.k. and the unreachable func might be called twice). this check is only
needed within the destructor on a pausing collector.

afaik, D uses a very simple GC, things are live or not, when the destructor
is called its too late to save the object
you can simulate weak refs your self by using refcounts and indexs.
your class is actually just a index into a static array, when you create a
new one you put the data into the static (which will be forever live) and
mark that index with a count  of 1, you can either create copy refs of just
pass that "ref" about having N refs to it is fine, its destructor decrements
the count, and if 0 releases the data from the array
for weak refs you do need either another array of id's or have the data
contain an incrementing id (globally unique) so you can tell if the object
it the "right" one
the weak refs is just an index, and the id (but does not effect the ref
count).
I'm sure many of you are thinking that you can use the address of the object
^somevalue for the ID, but alas if the memory manager caches similar sized
blocks, then it is very possible that the same memory block may become the
same class again (may not be true of D now, but I've worked on Java VM's
which did this).

you can also (if you don't mind poiner mangleing to stop the GC knowing you
have a ptr to an object) do this:
your class has a list of weak refs that refer to it, its destructor notifies
these that its been collected
each weak ref contains a mangled ptr to the object and its destructor
removes it from the objects list of weak refs.
this only allows caching between GC cycles, and not the more often required
(I want to be live for N cycles or until memory is realy tight).

I'm sure if I though about it long enought you can do it by constantly
objects that are created unreachable to detect the GC cycles.

Mike.

"Ilya Minkov" <midiclub 8ung.at> wrote in message
news:avclh0$o5r$1 digitaldaemon.com...
 Q: does D GC have "weak" pointers?

 I didn't find it being mentioned in the docs.

 "Weak" pointers are pointers, which are similar to NULL pointers, that
 no legal access can be done with them. They point to real objects, which
 however can be collected by a garbage collector. In the case a
 pointed-to object is collected, a "weak" pointer is turned into NULL
 pointer.

 This is all for the sake of caching.

 For implementation to be useful, it should be possible to turn a normal
 pointer into weak pointer at runtime and back. Dereferencing a weak
 pointer is to be an error, they have to be turned into "normal" for any
 operations, and afterwards back into weak if desired.

 I have some more detailed implementation ideas, but i don't have time
 now and will share them later. Together with possible usage examples.

 -i.
Jan 06 2003
next sibling parent reply Evan McClanahan <evan dontSPAMaltarinteractive.com> writes:
Mike Wynn wrote:
 the upshot of this is (I believe) unfortunatly write barriers during GC
 (like concurrent collectors use) to detect object becoming live by putting a
 ref into an already walked object (if its not been walked yet then thats
 o.k. and the unreachable func might be called twice). this check is only
 needed within the destructor on a pausing collector.
From the research that I've done, any concurrent collector would need read and write barriers. Maybe there's a way to do it without them, but all of the signs that I've seen point to know. They're easy and fast in hardware, but they aren't in any current and common hardware, and would require OS support.
 afaik, D uses a very simple GC, things are live or not, when the destructor
 is called its too late to save the object
 you can simulate weak refs your self by using refcounts and indexs.
 your class is actually just a index into a static array, when you create a
 new one you put the data into the static (which will be forever live) and
 mark that index with a count  of 1, you can either create copy refs of just
 pass that "ref" about having N refs to it is fine, its destructor decrements
 the count, and if 0 releases the data from the array
 for weak refs you do need either another array of id's or have the data
 contain an incrementing id (globally unique) so you can tell if the object
 it the "right" one
 the weak refs is just an index, and the id (but does not effect the ref
 count).
 I'm sure many of you are thinking that you can use the address of the object
 ^somevalue for the ID, but alas if the memory manager caches similar sized
 blocks, then it is very possible that the same memory block may become the
 same class again (may not be true of D now, but I've worked on Java VM's
 which did this).
Yeah, the current D gc is pretty simple. It should be easy enough to tear it out and add in a new one, but I think that you would need some language support to do something like software write and read barriers. I might be wrong, but I think that at the moment it isn't possible to do some of the more interesting and up to date collectors with D. This could change, of course, but even I'm not sure that having read and write barriers are the Right Things to Do. It would be great to have a language that could freely either use GC or manual and had a good selection of drop in collectors that could be used according to your needs. I'm not sure that D is there yet, but there's been some progress, and I think that there will be more in the future. Evan
Jan 07 2003
parent "Mike Wynn" <mike.wynn l8night.co.uk> writes:
  From the research that I've done, any concurrent collector would need
 read and write barriers.  Maybe there's a way to do it without them, but
   all of the signs that I've seen point to know.  They're easy and fast
 in hardware, but they aren't in any current and common hardware, and
 would require OS support.
The only GC I've been heavily exposed to was the GC in the Insignia embedded Java VM (EVM). this is a Fully concurrent accurate mark-sweep non copying collector. it does not have read barriers it does have write and return barriers when the GC is active. and several software patents, most I think where to do with dealing with Java's GC semantics. I believe read barriers are only needed on true concurrent collectors (that is where the collector runs independantly from the CPU either multi cpu (some only running code, some always running GC may have to walk the hot end of the stack at the same time as the code is running) or hardware GC. the EVM however run fine on dual Pentium machines without read barriers, I think this is becasue even the working end of the stack can not be walked by the GC if that thread is active. there is also a question of who marks, the thread itself or the GC.
Jan 07 2003
prev sibling parent reply Ilya Minkov <midiclub 8ung.at> writes:
Mike Wynn wrote:
 if, (big if) the GC finalise/destructor behaviour was a little different
 then java style weak, soft, phantom refs could all be done with a simple
 template for your classes.
I don't know much about java. The implementation of OCaml is, you can create a repository which can hold weak pointers. Access is extremely awkward, as you actually have to go and pull this repository around so that it doesn't get lost. :) Really, since there's no concept of pointer in ocaml, you can't turn anything into "weak" or "back to normal", so you create an object "weak pointer table", which behaves like an array of objects, members of which magically disappear from time to time. Can you imagine anything worse?
 I personally hate the finalise behaviour, and think you should get two calls
 first an "unreachable" which informs the object that it is no longer
 reachable at which time the code can use any means to "revive" the object
 (it may contain a ref to a live object or can access a static)
Ref to a live object? So what? If it's requiered by noone else, then it may be collected too. If some living object references it, it'll stay alive. Global access? If it mixes around with globals, they can be fixed up during "normal" destruction. Or do you have some example where you need a 2-pass destructor? I thought it was the sense of the weak references to allow collection, not to prevent it. If you want to prevent it, you still have normal references. :) I somehow don't understand, why an object which is not requiered by anyone or anything should try to save himself?
 this is called EVERY time the object is found to be unreachable at the end
 of a GC cycle (even concurrent collectors have cycles AFAIK).
 finally the destructor is called when the object's memory is finally and
 permanatly released
 (in debug builds it should be released a cycle later to make sure there is
 no way it can referenced).
Why relese later? Isn't it better to place a trap then to hide a bug?
 this also allows per class caches of objects (when its unreachable you can
 put it into a class static list if its not full)
 
 the upshot of this is (I believe) unfortunatly write barriers during GC
 (like concurrent collectors use) to detect object becoming live by putting a
 ref into an already walked object (if its not been walked yet then thats
 o.k. and the unreachable func might be called twice). this check is only
 needed within the destructor on a pausing collector.
 
 afaik, D uses a very simple GC, things are live or not, when the destructor
 is called its too late to save the object
OK. But imagine you have a tree or network of objects, and you decide you don't need the ones which are further then, say, 10 steps away from you. You can always restore them, but it involves calculations or loading from disk. Which might both be better than if it would get swapped away. But in the case you've got a lot of memory, you might come later to the point where you want these objects again. You've "let them go" but the destruction has not onset yet. Would you prefer to try to re-use them, or re-create them, spend double memory and time, and wait until the GC collects the unused copy? But mind, the object doesn't restore itself, it gets restored by the others.
 you can simulate weak refs your self by using refcounts and indexs.
 your class is actually just a index into a static array, when you create a
 new one you put the data into the static (which will be forever live) and
 mark that index with a count  of 1, you can either create copy refs of just
 pass that "ref" about having N refs to it is fine, its destructor decrements
 the count, and if 0 releases the data from the array
I've thought if that. It is applicable in some cases without major loss of efficiency. But not as a rule. Usually an application needs a kind of a central repository. For example, to identify, whether an object has to be created, or already exists, if it's, say, a representation of a file or something similar. So, the constructor of an object would look in a repository for a similar one of itself, and if found increment the counter and return the found, else create a new one. A destructor would decrement the counter in the repository. Thes schema is only limited to certain kinds of applications, since the objects have to be "signed" with their source or id, managing such ids can become a problem when objects are copied to be changed. It only makes sense if the reuse grade of the same objects is intended to be very high. Just i thought, the function of the repository would be drastically simplified if there was native support for "weak" pointers.
 for weak refs you do need either another array of id's or have the data
 contain an incrementing id (globally unique) so you can tell if the object
 it the "right" one
 the weak refs is just an index, and the id (but does not effect the ref
 count).
 I'm sure many of you are thinking that you can use the address of the object
 ^somevalue for the ID, but alas if the memory manager caches similar sized
 blocks, then it is very possible that the same memory block may become the
 same class again (may not be true of D now, but I've worked on Java VM's
 which did this).
Another thing: what if an object has moved? Imgine you have a number of objects maintaining weak references to some object, which is also maintained somewhere as a strong one. Then, for efficiency reasons the GC would compact the heap and move it somewhere. There you are. The object is not yet lost, but the weak references don't turn back into normal, and the object could get recreated. Now you have another useless copy (or multiple copies) of the same thing. A *real* waste, since there's no way they could all be cleaned up.
 
 you can also (if you don't mind poiner mangleing to stop the GC knowing you
 have a ptr to an object) do this:
 your class has a list of weak refs that refer to it, its destructor notifies
 these that its been collected
 each weak ref contains a mangled ptr to the object and its destructor
 removes it from the objects list of weak refs.
 this only allows caching between GC cycles, and not the more often required
 (I want to be live for N cycles or until memory is realy tight).
Rather each "weak reference" contains a cell number in a central reposirory of mangled pointers, which gets NULLed by an object destructor. Object must know its cell number. Then as i said, a "weak reference" must be turned into strong before use, the fuction like StrengthenPtr(void *) would only have to look up a pointer in a central repository and de-mangle it into the reference to be made strong. And the object would need to update one value on destruction and need not maintain any lists. But still, it doesn't solve the problem of heap compaction. The objects have to be notified somehow if the heap has been compacted. They could detect such things if they had refrences to "strong" references to them! But i bet it'd be too much of a complication and a slowdown. Maybe it's easier to make a change into the GC than to try to fake some workaround for it? :>
 
 I'm sure if I though about it long enought you can do it by constantly
 objects that are created unreachable to detect the GC cycles.
:/ Reliability?
 
 Mike.
 
 "Ilya Minkov" <midiclub 8ung.at> wrote in message
 news:avclh0$o5r$1 digitaldaemon.com...
 
Q: does D GC have "weak" pointers?

I didn't find it being mentioned in the docs.

"Weak" pointers are pointers, which are similar to NULL pointers, that
no legal access can be done with them. They point to real objects, which
however can be collected by a garbage collector. In the case a
pointed-to object is collected, a "weak" pointer is turned into NULL
pointer.

This is all for the sake of caching.

For implementation to be useful, it should be possible to turn a normal
pointer into weak pointer at runtime and back. Dereferencing a weak
pointer is to be an error, they have to be turned into "normal" for any
operations, and afterwards back into weak if desired.

I have some more detailed implementation ideas, but i don't have time
now and will share them later. Together with possible usage examples.

-i.
Jan 07 2003
parent reply "Mike Wynn" <mike.wynn l8night.co.uk> writes:
 I personally hate the finalise behaviour, and think you should get two
calls
 first an "unreachable" which informs the object that it is no longer
 reachable at which time the code can use any means to "revive" the
object
 (it may contain a ref to a live object or can access a static)
Ref to a live object? So what? If it's requiered by noone else, then it may be collected too. If some living object references it, it'll stay
alive.
 Global access? If it mixes around with globals, they can be fixed up
 during "normal" destruction.
an unreachable object can contain a ref to a live object. what is there to stop the destructor putting a ref to the dying object onto that live object which it has a ref to (or a static which anyone can ref).
Jan 07 2003
parent reply Ilya Minkov <midiclub 8ung.at> writes:
Mike Wynn wrote:
I personally hate the finalise behaviour, and think you should get two
calls
first an "unreachable" which informs the object that it is no longer
reachable at which time the code can use any means to "revive" the
object
(it may contain a ref to a live object or can access a static)
Ref to a live object? So what? If it's requiered by noone else, then it may be collected too. If some living object references it, it'll stay
alive.
Global access? If it mixes around with globals, they can be fixed up
during "normal" destruction.
an unreachable object can contain a ref to a live object. what is there to stop the destructor putting a ref to the dying object onto that live object which it has a ref to (or a static which anyone can ref).
Sure. BUT WHY BLOODY WHY DO YOU WANT THAT? Is there an example where it makes sense?
Jan 07 2003
parent reply "Mike Wynn" <mike.wynn l8night.co.uk> writes:
 an unreachable object can contain a ref to a live object.
 what is there to stop the destructor putting a ref to the dying object
onto
 that live object which it has a ref to
 (or a static which anyone can ref).
Sure. BUT WHY BLOODY WHY DO YOU WANT THAT? Is there an example where it makes sense?
its not why but that you can, and becuase you can the GC MUST cope. perhaps something like .... (not real code) class MyCachedObj { ParentObj parent; Time created; static MyCachedObj[] saved; this( ParentObj p0 ) { parent =p0; created = Time.now(); } // stuff ! this~() { /// this is unreachable at this point if ( (Time.Now()- created) < Time.TenMinutes ) { saved ~= this; // revived via a static (this is now reachable) // or parent.lastToDie = this; // revived via a ref to a reachable obj (this is now reachable if parent is reachable) } } }
Jan 07 2003
parent reply Burton Radons <loth users.sourceforge.net> writes:
Mike Wynn wrote:
an unreachable object can contain a ref to a live object.
what is there to stop the destructor putting a ref to the dying object
onto
that live object which it has a ref to
(or a static which anyone can ref).
Sure. BUT WHY BLOODY WHY DO YOU WANT THAT? Is there an example where it makes sense?
its not why but that you can, and becuase you can the GC MUST cope.
We have no control over what order destructors are called, so if you have exclusive ownership of a sub-object then it might already be destroyed and its pointer invalid, so telling the runtime to keep yourself alive is pretty pointless. In your example's case, created would fit that description. Workarounds for this that I can perceive are all more difficult for the user than keeping the object alive in normal ways. The GC doesn't have to cope with anything the language hasn't alloted to it. I would find it very surprising if D implementations have to support revival, as that would make the reference implementation incorrect.
 perhaps something like .... (not real code)
 
 class MyCachedObj
 {
     ParentObj parent;
     Time created;
     static MyCachedObj[] saved;
 
     this( ParentObj p0 ) { parent =p0; created = Time.now(); }
 
         //  stuff !
 
     this~()
     {
             /// this is unreachable at this point
         if ( (Time.Now()- created) < Time.TenMinutes )
         {
             saved ~= this;    // revived via a static (this is now
 reachable)
             // or
             parent.lastToDie = this; // revived via a ref to a reachable obj
 (this is now reachable if parent is reachable)
         }
     }
 }
Jan 08 2003
parent Ilya Minkov <midiclub 8ung.at> writes:
Burton Radons wrote:
 We have no control over what order destructors are called, so if you
  have exclusive ownership of a sub-object then it might already be 
 destroyed and its pointer invalid, so telling the runtime to keep 
 yourself alive is pretty pointless.  In your example's case, created
  would fit that description.  Workarounds for this that I can
 perceive are all more difficult for the user than keeping the object
 alive in normal ways.
Right, I agree. There's no sense saving a dying object since it may be its time to die, besides it would imply a lot of additional memory analysis and probably an amount of slowdown. My original idea were the "weak pointers" which would be pointers/references, which do not affect GC's decision upon cleaning up an object or not. A function is called over a normal reference to turn it into unusable state. Another function, when called, re-creates a reference, makes sure the object is intact, and then assignes it back to the formerly mangled reference. It started a long discussion with tricking methods, which are all pretty bad and useless. Although i think i know now of a situation, in which an object can try to save itself. Usually GC collects objects youngest-first, but for caches the opposite might be true. Any such tricking around would either effectively shut down GC, or severely worsen the situation, likely both. It would be cool if objects could optionally have designated fields filled in by GC at each scan, containing GC-obtained statistics, such as a number of users and such. Although these values won't be always current, they might somehow simpify the job for the central repository, and would be "for free". Maybe some clean solution would be good, such as a special type of pointers, not considered for to be a "user" of an object, not affecting the decision to destroy them or not, but being updated if the object moves or something. They need not be bound to the "real" destruction of the object, it may be enough if it is NULLed out as soon a GC finds no other users. Even simpler: let objects opionally have a designated method invoked by a GC giving some handle to it, allowing method to explore current GC statistics relating to this object. For example, if there are no users besides the "repository" object, it might NULL out the pointer to itself in the repository. A limitation needs probably to be imposed, so that it doesn't try to revive itself by spending pointers. Not NULLing out its pointer in the repository would be its way of revival and would be considered legal by the GC, at the contrary pushing some late pointers around would not, as all ot these changes may only affect next scanning pass. Pushing around late pointers would create a dangling pointer, which is a real danger the GC is actually born to cope with. I think GC is a great improvement over all other methods of memory management and it should not be curcumvented too much, and in no case it makes sense to make some ref-counting and alike when the similar information is collected by GC, it would be wise to use it to extent possible. -i.
 The GC doesn't have to cope with anything the language hasn't alloted
  to it.  I would find it very surprising if D implementations have to
  support revival, as that would make the reference implementation 
 incorrect.
Yup.
 
 perhaps something like .... (not real code)
 
 class MyCachedObj { ParentObj parent; Time created; static 
 MyCachedObj[] saved;
 
 this( ParentObj p0 ) { parent =p0; created = Time.now(); }
 
 //  stuff !
 
 this~() { /// this is unreachable at this point if ( (Time.Now()- 
 created) < Time.TenMinutes ) { saved ~= this;    // revived via a 
 static (this is now reachable) // or parent.lastToDie = this; // 
 revived via a ref to a reachable obj (this is now reachable if 
 parent is reachable) } } }
Jan 09 2003