D - D Garbage collector. "weak" pointers.

Ilya Minkov (15/15) Jan 06 2003 Q: does D GC have "weak" pointers?

Mike Wynn (51/66) Jan 06 2003 if, (big if) the GC finalise/destructor behaviour was a little different

Evan McClanahan (18/41) Jan 07 2003 From the research that I've done, any concurrent collector would need

Mike Wynn (15/20) Jan 07 2003 The only GC I've been heavily exposed to was the GC in the Insignia embe...

Ilya Minkov (68/151) Jan 07 2003 I don't know much about java. The implementation of OCaml is, you can

Mike Wynn (7/15) Jan 07 2003 alive.

Ilya Minkov (3/28) Jan 07 2003 Sure. BUT WHY BLOODY WHY DO YOU WANT THAT? Is there an example where it

Mike Wynn (23/30) Jan 07 2003 its not why but that you can, and becuase you can the GC MUST cope.

Burton Radons (11/51) Jan 08 2003 We have no control over what order destructors are called, so if you

Ilya Minkov (44/73) Jan 09 2003 Right, I agree. There's no sense saving a dying object since it may be

Ilya Minkov <midiclub 8ung.at> writes:

Q: does D GC have "weak" pointers?

I didn't find it being mentioned in the docs.

"Weak" pointers are pointers, which are similar to NULL pointers, that 
no legal access can be done with them. They point to real objects, which 
however can be collected by a garbage collector. In the case a 
pointed-to object is collected, a "weak" pointer is turned into NULL 
pointer.

This is all for the sake of caching.

For implementation to be useful, it should be possible to turn a normal 
pointer into weak pointer at runtime and back. Dereferencing a weak 
pointer is to be an error, they have to be turned into "normal" for any 
operations, and afterwards back into weak if desired.

I have some more detailed implementation ideas, but i don't have time 
now and will share them later. Together with possible usage examples.

-i.

Jan 06 2003

"Mike Wynn" <mike.wynn l8night.co.uk> writes:

if, (big if) the GC finalise/destructor behaviour was a little different
then java style weak, soft, phantom refs could all be done with a simple
template for your classes.
I personally hate the finalise behaviour, and think you should get two calls
first an "unreachable" which informs the object that it is no longer
reachable at which time the code can use any means to "revive" the object
(it may contain a ref to a live object or can access a static)
this is called EVERY time the object is found to be unreachable at the end
of a GC cycle (even concurrent collectors have cycles AFAIK).
finally the destructor is called when the object's memory is finally and
permanatly released
(in debug builds it should be released a cycle later to make sure there is
no way it can referenced).
this also allows per class caches of objects (when its unreachable you can
put it into a class static list if its not full)

the upshot of this is (I believe) unfortunatly write barriers during GC
(like concurrent collectors use) to detect object becoming live by putting a
ref into an already walked object (if its not been walked yet then thats
o.k. and the unreachable func might be called twice). this check is only
needed within the destructor on a pausing collector.

afaik, D uses a very simple GC, things are live or not, when the destructor
is called its too late to save the object
you can simulate weak refs your self by using refcounts and indexs.
your class is actually just a index into a static array, when you create a
new one you put the data into the static (which will be forever live) and
mark that index with a count  of 1, you can either create copy refs of just
pass that "ref" about having N refs to it is fine, its destructor decrements
the count, and if 0 releases the data from the array
for weak refs you do need either another array of id's or have the data
contain an incrementing id (globally unique) so you can tell if the object
it the "right" one
the weak refs is just an index, and the id (but does not effect the ref
count).
I'm sure many of you are thinking that you can use the address of the object
^somevalue for the ID, but alas if the memory manager caches similar sized
blocks, then it is very possible that the same memory block may become the
same class again (may not be true of D now, but I've worked on Java VM's
which did this).

you can also (if you don't mind poiner mangleing to stop the GC knowing you
have a ptr to an object) do this:
your class has a list of weak refs that refer to it, its destructor notifies
these that its been collected
each weak ref contains a mangled ptr to the object and its destructor
removes it from the objects list of weak refs.
this only allows caching between GC cycles, and not the more often required
(I want to be live for N cycles or until memory is realy tight).

I'm sure if I though about it long enought you can do it by constantly
objects that are created unreachable to detect the GC cycles.

Mike.

"Ilya Minkov" <midiclub 8ung.at> wrote in message
news:avclh0$o5r$1 digitaldaemon.com...
 Q: does D GC have "weak" pointers?

 I didn't find it being mentioned in the docs.

 "Weak" pointers are pointers, which are similar to NULL pointers, that
 no legal access can be done with them. They point to real objects, which
 however can be collected by a garbage collector. In the case a
 pointed-to object is collected, a "weak" pointer is turned into NULL
 pointer.

 This is all for the sake of caching.

 For implementation to be useful, it should be possible to turn a normal
 pointer into weak pointer at runtime and back. Dereferencing a weak
 pointer is to be an error, they have to be turned into "normal" for any
 operations, and afterwards back into weak if desired.

 I have some more detailed implementation ideas, but i don't have time
 now and will share them later. Together with possible usage examples.

 -i.

Jan 06 2003

Evan McClanahan <evan dontSPAMaltarinteractive.com> writes:

Mike Wynn wrote:
 the upshot of this is (I believe) unfortunatly write barriers during GC
 (like concurrent collectors use) to detect object becoming live by putting a
 ref into an already walked object (if its not been walked yet then thats
 o.k. and the unreachable func might be called twice). this check is only
 needed within the destructor on a pausing collector.

 From the research that I've done, any concurrent collector would need 
read and write barriers.  Maybe there's a way to do it without them, but 
  all of the signs that I've seen point to know.  They're easy and fast 
in hardware, but they aren't in any current and common hardware, and 
would require OS support.

 afaik, D uses a very simple GC, things are live or not, when the destructor
 is called its too late to save the object
 you can simulate weak refs your self by using refcounts and indexs.
 your class is actually just a index into a static array, when you create a
 new one you put the data into the static (which will be forever live) and
 mark that index with a count  of 1, you can either create copy refs of just
 pass that "ref" about having N refs to it is fine, its destructor decrements
 the count, and if 0 releases the data from the array
 for weak refs you do need either another array of id's or have the data
 contain an incrementing id (globally unique) so you can tell if the object
 it the "right" one
 the weak refs is just an index, and the id (but does not effect the ref
 count).
 I'm sure many of you are thinking that you can use the address of the object
 ^somevalue for the ID, but alas if the memory manager caches similar sized
 blocks, then it is very possible that the same memory block may become the
 same class again (may not be true of D now, but I've worked on Java VM's
 which did this).

Yeah, the current D gc is pretty simple.  It should be easy enough to 
tear it out and add in a new one, but I think that you would need some 
language support to do something like software write and read barriers. 
I might be wrong, but I think that at the moment it isn't possible to do 
some of the more interesting and up to date collectors with D.  This 
could change, of course, but even I'm not sure that having read and 
write barriers are the Right Things to Do.  It would be great to have a 
language that could freely either use GC or manual and had a good 
selection of drop in collectors that could be used according to your 
needs.  I'm not sure that D is there yet, but there's been some 
progress, and I think that there will be more in the future.


Evan

Jan 07 2003

"Mike Wynn" <mike.wynn l8night.co.uk> writes:

  From the research that I've done, any concurrent collector would need
 read and write barriers.  Maybe there's a way to do it without them, but
   all of the signs that I've seen point to know.  They're easy and fast
 in hardware, but they aren't in any current and common hardware, and
 would require OS support.

The only GC I've been heavily exposed to was the GC in the Insignia embedded
Java VM (EVM).
this is a Fully concurrent accurate mark-sweep non copying collector.
it does not have read barriers
it does have write and return barriers when the GC is active.
and several software patents, most I think where to do with dealing with
Java's GC semantics.
I believe read barriers are only needed on true concurrent collectors (that
is where the collector runs independantly from the CPU either multi cpu
(some only running code, some always running GC may have to walk the hot end
of the stack at the same time as the code is running) or hardware GC.
the EVM however run fine on dual Pentium machines without read barriers, I
think this is becasue even the working end of the stack can not be walked by
the GC if that thread is active.

there is also a question of who marks, the thread itself or the GC.

Jan 07 2003

Ilya Minkov <midiclub 8ung.at> writes:

Mike Wynn wrote:
 if, (big if) the GC finalise/destructor behaviour was a little different
 then java style weak, soft, phantom refs could all be done with a simple
 template for your classes.

I don't know much about java. The implementation of OCaml is, you can 
create a repository which can hold weak pointers. Access is extremely 
awkward, as you actually have to go and pull this repository around so 
that it doesn't get lost. :) Really, since there's no concept of pointer 
in ocaml, you can't turn anything into "weak" or "back to normal", so 
you create an object "weak pointer table", which behaves like an array 
of objects, members of which magically disappear from time to time. Can 
you imagine anything worse?

 I personally hate the finalise behaviour, and think you should get two calls
 first an "unreachable" which informs the object that it is no longer
 reachable at which time the code can use any means to "revive" the object
 (it may contain a ref to a live object or can access a static)

Ref to a live object? So what? If it's requiered by noone else, then it 
may be collected too. If some living object references it, it'll stay alive.
Global access? If it mixes around with globals, they can be fixed up 
during "normal" destruction.
Or do you have some example where you need a 2-pass destructor?
I thought it was the sense of the weak references to allow collection, 
not to prevent it. If you want to prevent it, you still have normal 
references. :)

I somehow don't understand, why an object which is not requiered by 
anyone or anything should try to save himself?
 this is called EVERY time the object is found to be unreachable at the end
 of a GC cycle (even concurrent collectors have cycles AFAIK).
 finally the destructor is called when the object's memory is finally and
 permanatly released
 (in debug builds it should be released a cycle later to make sure there is
 no way it can referenced).

Why relese later? Isn't it better to place a trap then to hide a bug?
 this also allows per class caches of objects (when its unreachable you can
 put it into a class static list if its not full)
 
 the upshot of this is (I believe) unfortunatly write barriers during GC
 (like concurrent collectors use) to detect object becoming live by putting a
 ref into an already walked object (if its not been walked yet then thats
 o.k. and the unreachable func might be called twice). this check is only
 needed within the destructor on a pausing collector.
 
 afaik, D uses a very simple GC, things are live or not, when the destructor
 is called its too late to save the object

OK. But imagine you have a tree or network of objects, and you decide 
you don't need the ones which are further then, say, 10 steps away from 
you. You can always restore them, but it involves calculations or 
loading from disk. Which might both be better than if it would get 
swapped away. But in the case you've got a lot of memory, you might come 
later to the point where you want these objects again. You've "let them 
go" but the destruction has not onset yet. Would you prefer to try to 
re-use them, or re-create them, spend double memory and time, and wait 
until the GC collects the unused copy? But mind, the object doesn't 
restore itself, it gets restored by the others.

 you can simulate weak refs your self by using refcounts and indexs.
 your class is actually just a index into a static array, when you create a
 new one you put the data into the static (which will be forever live) and
 mark that index with a count  of 1, you can either create copy refs of just
 pass that "ref" about having N refs to it is fine, its destructor decrements
 the count, and if 0 releases the data from the array

I've thought if that. It is applicable in some cases without major loss 
of efficiency. But not as a rule.

Usually an application needs a kind of a central repository. For 
example, to identify, whether an object has to be created, or already 
exists, if it's, say, a representation of a file or something similar.

So, the constructor of an object would look in a repository for a 
similar one of itself, and if found increment the counter and return the 
found, else create a new one. A destructor would decrement the counter 
in the repository. Thes schema is only limited to certain kinds of 
applications, since the objects have to be "signed" with their source or 
id, managing such ids can become a problem when objects are copied to be 
changed. It only makes sense if the reuse grade of the same objects is 
intended to be very high.

Just i thought, the function of the repository would be drastically 
simplified if there was native support for "weak" pointers.

 for weak refs you do need either another array of id's or have the data
 contain an incrementing id (globally unique) so you can tell if the object
 it the "right" one
 the weak refs is just an index, and the id (but does not effect the ref
 count).

 I'm sure many of you are thinking that you can use the address of the object
 ^somevalue for the ID, but alas if the memory manager caches similar sized
 blocks, then it is very possible that the same memory block may become the
 same class again (may not be true of D now, but I've worked on Java VM's
 which did this).

Another thing: what if an object has moved? Imgine you have a number of 
objects maintaining weak references to some object, which is also 
maintained somewhere as a strong one. Then, for efficiency reasons the 
GC would compact the heap and move it somewhere. There you are. The 
object is not yet lost, but the weak references don't turn back into 
normal, and the object could get recreated. Now you have another useless 
copy (or multiple copies) of the same thing. A *real* waste, since 
there's no way they could all be cleaned up.

 
 you can also (if you don't mind poiner mangleing to stop the GC knowing you
 have a ptr to an object) do this:
 your class has a list of weak refs that refer to it, its destructor notifies
 these that its been collected
 each weak ref contains a mangled ptr to the object and its destructor
 removes it from the objects list of weak refs.
 this only allows caching between GC cycles, and not the more often required
 (I want to be live for N cycles or until memory is realy tight).

Rather each "weak reference" contains a cell number in a central 
reposirory of mangled pointers, which gets NULLed by an object 
destructor. Object must know its cell number. Then as i said, a "weak 
reference" must be turned into strong before use, the fuction like 
StrengthenPtr(void *) would only have to look up a pointer in a central 
repository and de-mangle it into the reference to be made strong. And 
the object would need to update one value on destruction and need not 
maintain any lists. But still, it doesn't solve the problem of heap 
compaction. The objects have to be notified somehow if the heap has been 
compacted. They could detect such things if they had refrences to 
"strong" references to them! But i bet it'd be too much of a 
complication and a slowdown. Maybe it's easier to make a change into the 
GC than to try to fake some workaround for it? :>

 
 I'm sure if I though about it long enought you can do it by constantly
 objects that are created unreachable to detect the GC cycles.

:/
Reliability?

 
 Mike.
 
 "Ilya Minkov" <midiclub 8ung.at> wrote in message
 news:avclh0$o5r$1 digitaldaemon.com...
 
Q: does D GC have "weak" pointers?

I didn't find it being mentioned in the docs.

"Weak" pointers are pointers, which are similar to NULL pointers, that
no legal access can be done with them. They point to real objects, which
however can be collected by a garbage collector. In the case a
pointed-to object is collected, a "weak" pointer is turned into NULL
pointer.

This is all for the sake of caching.

For implementation to be useful, it should be possible to turn a normal
pointer into weak pointer at runtime and back. Dereferencing a weak
pointer is to be an error, they have to be turned into "normal" for any
operations, and afterwards back into weak if desired.

I have some more detailed implementation ideas, but i don't have time
now and will share them later. Together with possible usage examples.

-i.

Jan 07 2003

"Mike Wynn" <mike.wynn l8night.co.uk> writes:

 I personally hate the finalise behaviour, and think you should get two


calls
 first an "unreachable" which informs the object that it is no longer
 reachable at which time the code can use any means to "revive" the


object
 (it may contain a ref to a live object or can access a static)

 Ref to a live object? So what? If it's requiered by noone else, then it
 may be collected too. If some living object references it, it'll stay

alive.
 Global access? If it mixes around with globals, they can be fixed up
 during "normal" destruction.

an unreachable object can contain a ref to a live object.
what is there to stop the destructor putting a ref to the dying object onto
that live object which it has a ref to
(or a static which anyone can ref).

Jan 07 2003

Ilya Minkov <midiclub 8ung.at> writes:

Mike Wynn wrote:
I personally hate the finalise behaviour, and think you should get two


 
 calls
 
first an "unreachable" which informs the object that it is no longer
reachable at which time the code can use any means to "revive" the


 
 object
 
(it may contain a ref to a live object or can access a static)

Ref to a live object? So what? If it's requiered by noone else, then it
may be collected too. If some living object references it, it'll stay

 
 alive.
 
Global access? If it mixes around with globals, they can be fixed up
during "normal" destruction.

 
 
 an unreachable object can contain a ref to a live object.
 what is there to stop the destructor putting a ref to the dying object onto
 that live object which it has a ref to
 (or a static which anyone can ref).
 

Sure. BUT WHY BLOODY WHY DO YOU WANT THAT? Is there an example where it 
makes sense?

Jan 07 2003

"Mike Wynn" <mike.wynn l8night.co.uk> writes:

 an unreachable object can contain a ref to a live object.
 what is there to stop the destructor putting a ref to the dying object


onto
 that live object which it has a ref to
 (or a static which anyone can ref).

 Sure. BUT WHY BLOODY WHY DO YOU WANT THAT? Is there an example where it
 makes sense?

its not why but that you can, and becuase you can the GC MUST cope.

perhaps something like .... (not real code)

class MyCachedObj
{
    ParentObj parent;
    Time created;
    static MyCachedObj[] saved;

    this( ParentObj p0 ) { parent =p0; created = Time.now(); }

        //  stuff !

    this~()
    {
            /// this is unreachable at this point
        if ( (Time.Now()- created) < Time.TenMinutes )
        {
            saved ~= this;    // revived via a static (this is now
reachable)
            // or
            parent.lastToDie = this; // revived via a ref to a reachable obj
(this is now reachable if parent is reachable)
        }
    }
}

Jan 07 2003

Burton Radons <loth users.sourceforge.net> writes:

Mike Wynn wrote:
an unreachable object can contain a ref to a live object.
what is there to stop the destructor putting a ref to the dying object


 
 onto
 
that live object which it has a ref to
(or a static which anyone can ref).

Sure. BUT WHY BLOODY WHY DO YOU WANT THAT? Is there an example where it
makes sense?

 
 
 its not why but that you can, and becuase you can the GC MUST cope.

We have no control over what order destructors are called, so if you
have exclusive ownership of a sub-object then it might already be
destroyed and its pointer invalid, so telling the runtime to keep
yourself alive is pretty pointless.  In your example's case, created
would fit that description.  Workarounds for this that I can perceive
are all more difficult for the user than keeping the object alive in
normal ways.

The GC doesn't have to cope with anything the language hasn't alloted to
it.  I would find it very surprising if D implementations have to
support revival, as that would make the reference implementation incorrect.

 perhaps something like .... (not real code)
 
 class MyCachedObj
 {
     ParentObj parent;
     Time created;
     static MyCachedObj[] saved;
 
     this( ParentObj p0 ) { parent =p0; created = Time.now(); }
 
         //  stuff !
 
     this~()
     {
             /// this is unreachable at this point
         if ( (Time.Now()- created) < Time.TenMinutes )
         {
             saved ~= this;    // revived via a static (this is now
 reachable)
             // or
             parent.lastToDie = this; // revived via a ref to a reachable obj
 (this is now reachable if parent is reachable)
         }
     }
 }

Jan 08 2003

Ilya Minkov <midiclub 8ung.at> writes:

Burton Radons wrote:
 We have no control over what order destructors are called, so if you
  have exclusive ownership of a sub-object then it might already be 
 destroyed and its pointer invalid, so telling the runtime to keep 
 yourself alive is pretty pointless.  In your example's case, created
  would fit that description.  Workarounds for this that I can
 perceive are all more difficult for the user than keeping the object
 alive in normal ways.

Right, I agree. There's no sense saving a dying object since it may be 
its time to die, besides it would imply a lot of additional memory 
analysis and probably an amount of slowdown. My original idea were the 
"weak pointers" which would be pointers/references, which do not affect 
GC's decision upon cleaning up an object or not. A function is called 
over a normal reference to turn it into unusable state. Another 
function, when called, re-creates a reference, makes sure the object is 
intact, and then assignes it back to the formerly mangled reference. It 
started a long discussion with tricking methods, which are all pretty 
bad and useless.

Although i think i know now of a situation, in which an object can try 
to save itself. Usually GC collects objects youngest-first, but for 
caches the opposite might be true.

Any such tricking around would either effectively shut down GC, or 
severely worsen the situation, likely both. It would be cool if objects 
could optionally have designated fields filled in by GC at each scan, 
containing GC-obtained statistics, such as a number of users and such. 
Although these values won't be always current, they might somehow 
simpify the job for the central repository, and would be "for free". 
Maybe some clean solution would be good, such as a special type of 
pointers, not considered for to be a "user" of an object, not affecting 
the decision to destroy them or not, but being updated if the object 
moves or something. They need not be bound to the "real" destruction of 
the object, it may be enough if it is NULLed out as soon a GC finds no 
other users.

Even simpler: let objects opionally have a designated method invoked by 
a GC giving some handle to it, allowing method to explore current GC 
statistics relating to this object. For example, if there are no users 
besides the "repository" object, it might NULL out the pointer to itself 
in the repository. A limitation needs probably to be imposed, so that it 
doesn't try to revive itself by spending pointers. Not NULLing out its 
pointer in the repository would be its way of revival and would be 
considered legal by the GC, at the contrary pushing some late pointers 
around would not, as all ot these changes may only affect next scanning 
pass. Pushing around late pointers would create a dangling pointer, 
which is a real danger the GC is actually born to cope with.

I think GC is a great improvement over all other methods of memory 
management and it should not be curcumvented too much, and in no case it 
makes sense to make some ref-counting and alike when the similar 
information is collected by GC, it would be wise to use it to extent 
possible.

-i.

 The GC doesn't have to cope with anything the language hasn't alloted
  to it.  I would find it very surprising if D implementations have to
  support revival, as that would make the reference implementation 
 incorrect.

Yup.

 
 perhaps something like .... (not real code)
 
 class MyCachedObj { ParentObj parent; Time created; static 
 MyCachedObj[] saved;
 
 this( ParentObj p0 ) { parent =p0; created = Time.now(); }
 
 //  stuff !
 
 this~() { /// this is unreachable at this point if ( (Time.Now()- 
 created) < Time.TenMinutes ) { saved ~= this;    // revived via a 
 static (this is now reachable) // or parent.lastToDie = this; // 
 revived via a ref to a reachable obj (this is now reachable if 
 parent is reachable) } } }

Jan 09 2003

D Programming

C/C++ Programming

Other

D - D Garbage collector. "weak" pointers.