digitalmars.D - gc and Object.toHash()

Brian Hammond (13/13) May 17 2004 From http://www.digitalmars.com/d/garbage.html

Stewart Gordon (13/16) May 18 2004

Brian Hammond (17/30) May 18 2004 In my case no.. I was simply using Object's toHash() as a convenient way...

Norbert Nemec (8/13) May 18 2004 B.t.w: is this "moving around" described more clearly anywhere in the

=?iso-8859-1?Q?Robert_M._M=FCnch?= (12/21) May 18 2004 The current GC implementation doesn't support it.
Walter (6/14) May 23 2004 Yes.

Norbert Nemec (10/29) May 23 2004 What's the core idea behind it? I don't doubt that it can be done someho...

Ben Hinkle (9/40) May 23 2004 Here's my guess:

Brian Hammond (9/49) May 23 2004 So Walter, if I understand correctly, the current Object.toHash() implem...

Walter (9/17) May 25 2004 implementation

Norbert Nemec (13/21) May 24 2004 I don't agree: the question should not be what kind of data you move, bu...

Ben Hinkle (8/32) May 24 2004 Agreed. I was thinking any ambiguous object reference from outside the

Kevin Bealer (22/54) May 24 2004 Removing or adding such a pointer is simple, if the list is changed to a...

Norbert Nemec (14/20) May 24 2004 That might really be an idea! Object references on the stack usually eit...

Kevin Bealer (26/46) May 25 2004 Other ideas:

Walter (8/19) May 23 2004 collector

Brian Hammond <d at brianhammond dot com> <Brian_member pathlink.com> writes:

From http://www.digitalmars.com/d/garbage.html 

"Using pointer values to compute a hash function. A copying garbage collector
can arbitrarily move objects around in memory, thus invalidating the computed
hash value."

Why then does Object's toHash() method default implementation do exactly that?

uint toHash()
{
return (uint)(void *)this;
}

I was relying on the default toHash() for the value of a key in an associative
array.  I guess I should provide my own hash function then?

Thanks!
Brian

May 17 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Brian Hammond <d at brianhammond dot com> wrote:

<snip>
 Why then does Object's toHash() method default implementation do exactly that?

<snip>
 I was relying on the default toHash() for the value of a key in an associative
 array.  I guess I should provide my own hash function then?

Always a good idea.  Especially if your class has a sense of equality 
over and above that of being the very same object.

But you're right, Object.toHash is being a bit naughty there.  But is 
there a better idea?

Maybe it's only meant to be a makeshift, but still....

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

May 18 2004

Brian Hammond <d at brianhammond dot com> <Brian_member pathlink.com> writes:

 I was relying on the default toHash() for the value of a key in 
 an associative array.  I guess I should provide my own hash 
 function then?

Always a good idea.  Especially if your class has a sense of 
equality over and above that of being the very same object.

In my case no.. I was simply using Object's toHash() as a convenient way to have
fast lookups (via an associative array) for collections of basic object types
like a delegate.  

A contrived example: a class that allows delegates to be registered/unregistered
as observers.  

alias void delegate(subject) obs_dg;

class subject
{
void register(obs_dg dg) { observers_[dg.toHash()] = dg; }
void unregister(obs_dg dg) { delete observers_[dg.toHash()]; }
//...
private obs_dg[uint] observers_;
}

The key can change due to the GC however so I'll have to use something else like
a straight array and have O(n) lookups.  For small n, who cares really.

Thanks,
Brian

But you're right, Object.toHash is being a bit naughty there.  But is 
there a better idea?

Maybe it's only meant to be a makeshift, but still....

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

May 18 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Brian Hammond wrote:

 From http://www.digitalmars.com/d/garbage.html
 
 "Using pointer values to compute a hash function. A copying garbage
 collector can arbitrarily move objects around in memory, thus invalidating
 the computed hash value."

B.t.w: is this "moving around" described more clearly anywhere in the
documentation? What exactly happens if the GC moves an object that was
allocated by a call to "new"? Does it adjust all references pointing
anywhere into that object?

I would assume that this takes quite a bit of intelligence, definitely more
than the Boehm GC has. Is there actually any GC out there, that does move
objects for defragmentation or some other purpose?

May 18 2004

"=?iso-8859-1?Q?Robert_M._M=FCnch?=" <robert.muench robertmuench.de> writes:

On Tue, 18 May 2004 16:53:43 +0200, Norbert Nemec <Norbert.Nemec gmx.de>  
wrote:

 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation?

The current GC implementation doesn't support it.

 What exactly happens if the GC moves an object that was
 allocated by a call to "new"? Does it adjust all references pointing
 anywhere into that object?

Yes, that's how it works.

 I would assume that this takes quite a bit of intelligence, definitely  
 more than the Boehm GC has. Is there actually any GC out there, that  
 does move
 objects for defragmentation or some other purpose?

Walter, once mentioned that he had written such a GC. I don't know how he  
did it, maybe using something like a dictornary, or a special reference  
memory pool, so you know where to start the scan to adjust addresses.  
Maybe the MMU could be used to help too. Just wild guessing...

-- 
Robert M. M�nch
Management & IT Freelancer
http://www.robertmuench.de

May 18 2004

"Walter" <newshound digitalmars.com> writes:

"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c8d81o$12m4$1 digitaldaemon.com...
 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation? What exactly happens if the GC moves an object that was
 allocated by a call to "new"?
 Does it adjust all references pointing
 anywhere into that object?

Yes.

 I would assume that this takes quite a bit of intelligence, definitely

more
 than the Boehm GC has. Is there actually any GC out there, that does move
 objects for defragmentation or some other purpose?

I've seen a lot of material saying this cannot be done with a language that
manipulates pointers like C, C++ and D do, but there is a way to do it.

May 23 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Walter wrote:

 
 "Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
 news:c8d81o$12m4$1 digitaldaemon.com...
 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation? What exactly happens if the GC moves an object that was
 allocated by a call to "new"?
 Does it adjust all references pointing
 anywhere into that object?

 
 Yes.
 
 I would assume that this takes quite a bit of intelligence, definitely

 more
 than the Boehm GC has. Is there actually any GC out there, that does move
 objects for defragmentation or some other purpose?

 
 I've seen a lot of material saying this cannot be done with a language
 that manipulates pointers like C, C++ and D do, but there is a way to do
 it.

What's the core idea behind it? I don't doubt that it can be done somehow,
but I wonder how it efficient it would be. Of course, if the GC knows
exactly what data is a pointer, it can go through the complete memory and
readjust every pointer referencing the memory that has to be moved. Going
through the whole memory should not even be the problem in efficiency, but
how should the GC know what is a pointer and what is some other data that
by chance has the same binary value as a valid pointer? For a plain GC,
this is no problem, since misinterpreting random data as pointers will at
worst keep alive some data that actually is dead.

May 23 2004

Ben Hinkle <bhinkle4 juno.com> writes:

Norbert Nemec wrote:

 Walter wrote:
 
 
 "Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
 news:c8d81o$12m4$1 digitaldaemon.com...
 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation? What exactly happens if the GC moves an object that was
 allocated by a call to "new"?
 Does it adjust all references pointing
 anywhere into that object?

 
 Yes.
 
 I would assume that this takes quite a bit of intelligence, definitely

 more
 than the Boehm GC has. Is there actually any GC out there, that does
 move objects for defragmentation or some other purpose?

 
 I've seen a lot of material saying this cannot be done with a language
 that manipulates pointers like C, C++ and D do, but there is a way to do
 it.

 
 What's the core idea behind it? I don't doubt that it can be done somehow,
 but I wonder how it efficient it would be. Of course, if the GC knows
 exactly what data is a pointer, it can go through the complete memory and
 readjust every pointer referencing the memory that has to be moved. Going
 through the whole memory should not even be the problem in efficiency, but
 how should the GC know what is a pointer and what is some other data that
 by chance has the same binary value as a valid pointer? For a plain GC,
 this is no problem, since misinterpreting random data as pointers will at
 worst keep alive some data that actually is dead.

Here's my guess:
if the GC can tell what is an object reference (for example by allocating
from special heaps that store only objects) then it can get at the class
info and the class info can store a mask that tells the GC which fields can
be updated if the referant is moved.

That would mean dynamic arrays don't get moved and pointers to structs
(well, actually, pointers in general) don't get moved.

-Ben

May 23 2004

Brian Hammond <d at brianhammond dot com> <Brian_member pathlink.com> writes:

So Walter, if I understand correctly, the current Object.toHash() implementation
is valid with the current garbage collector but would be broken using a new,
copying collector.  If and when the GC impl. is updated, will Object.toHash() be
updated as well?  I'd imagine yes, Object.toHash() will always work correctly
else it should be abstract :)

Just wondering if I need to rewrite code now, when the GC impl. changes, or
never.  Thanks.

Brian

In article <c8r874$2tvd$1 digitaldaemon.com>, Ben Hinkle says...
Norbert Nemec wrote:

 Walter wrote:
 
 
 "Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
 news:c8d81o$12m4$1 digitaldaemon.com...
 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation? What exactly happens if the GC moves an object that was
 allocated by a call to "new"?
 Does it adjust all references pointing
 anywhere into that object?

 
 Yes.
 
 I would assume that this takes quite a bit of intelligence, definitely

 more
 than the Boehm GC has. Is there actually any GC out there, that does
 move objects for defragmentation or some other purpose?

 
 I've seen a lot of material saying this cannot be done with a language
 that manipulates pointers like C, C++ and D do, but there is a way to do
 it.

 
 What's the core idea behind it? I don't doubt that it can be done somehow,
 but I wonder how it efficient it would be. Of course, if the GC knows
 exactly what data is a pointer, it can go through the complete memory and
 readjust every pointer referencing the memory that has to be moved. Going
 through the whole memory should not even be the problem in efficiency, but
 how should the GC know what is a pointer and what is some other data that
 by chance has the same binary value as a valid pointer? For a plain GC,
 this is no problem, since misinterpreting random data as pointers will at
 worst keep alive some data that actually is dead.

Here's my guess:
if the GC can tell what is an object reference (for example by allocating
from special heaps that store only objects) then it can get at the class
info and the class info can store a mask that tells the GC which fields can
be updated if the referant is moved.

That would mean dynamic arrays don't get moved and pointers to structs
(well, actually, pointers in general) don't get moved.

-Ben

May 23 2004

"Walter" <newshound digitalmars.com> writes:

"Brian Hammond" <d at brianhammond dot comBrian_member pathlink.com> wrote
in message news:c8s0hv$v1n$1 digitaldaemon.com...
 So Walter, if I understand correctly, the current Object.toHash()

implementation
 is valid with the current garbage collector but would be broken using a

new,
 copying collector.

Yes, that's right.

  If and when the GC impl. is updated, will Object.toHash() be
 updated as well?  I'd imagine yes, Object.toHash() will always work

correctly
 else it should be abstract :)

 Just wondering if I need to rewrite code now, when the GC impl. changes,

or
 never.  Thanks.

To be safe, right now I'd implement your own toHash() for your derived
types.

May 25 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Ben Hinkle wrote:

 Here's my guess:
 if the GC can tell what is an object reference (for example by allocating
 from special heaps that store only objects) then it can get at the class
 info and the class info can store a mask that tells the GC which fields
 can be updated if the referant is moved.
 
 That would mean dynamic arrays don't get moved and pointers to structs
 (well, actually, pointers in general) don't get moved.

I don't agree: the question should not be what kind of data you move, but
where the reference is stored.

References in object are easy to be found, but what about references on the
stack or inside of structures/arrays? If the GC wants to move just a single
object, it has to be sure to know about *all* references to this object, so
it needs to know exactly what piece of data on the stack is a reference or
not. It would need to do a complete backtrace and interpret each stackframe
according to the corresponding function.

I really would like to know what idea Walter has in mind about that. I doubt
that it can be done with reasonable effort, in which case it would really
make sense to drop that concept from the specs and rather guarantee that
references stay fixed.

May 24 2004

Ben Hinkle <bhinkle4 juno.com> writes:

Norbert Nemec wrote:

 Ben Hinkle wrote:
 
 Here's my guess:
 if the GC can tell what is an object reference (for example by allocating
 from special heaps that store only objects) then it can get at the class
 info and the class info can store a mask that tells the GC which fields
 can be updated if the referant is moved.
 
 That would mean dynamic arrays don't get moved and pointers to structs
 (well, actually, pointers in general) don't get moved.

 
 I don't agree: the question should not be what kind of data you move, but
 where the reference is stored.
 
 References in object are easy to be found, but what about references on
 the stack or inside of structures/arrays? If the GC wants to move just a
 single object, it has to be sure to know about *all* references to this
 object, so it needs to know exactly what piece of data on the stack is a
 reference or not. It would need to do a complete backtrace and interpret
 each stackframe according to the corresponding function.

Agreed. I was thinking any ambiguous object reference from outside the
object heap would make the object unmovable but I never considered that
only garbage would be moveable then :-)
Maybe pure object references on the stack (or maybe everywhere... who knows)
can be stored in a giant list. That would make the size of a reference
bigger since it stores a pointer to the next reference but it would make
traversing the set of references easy.

 I really would like to know what idea Walter has in mind about that. I
 doubt that it can be done with reasonable effort, in which case it would
 really make sense to drop that concept from the specs and rather guarantee
 that references stay fixed.

May 24 2004

Kevin Bealer <Kevin_member pathlink.com> writes:

In article <c8sri0$28a7$1 digitaldaemon.com>, Ben Hinkle says...
Norbert Nemec wrote:

 Ben Hinkle wrote:
 
 Here's my guess:
 if the GC can tell what is an object reference (for example by allocating
 from special heaps that store only objects) then it can get at the class
 info and the class info can store a mask that tells the GC which fields
 can be updated if the referant is moved.
 
 That would mean dynamic arrays don't get moved and pointers to structs
 (well, actually, pointers in general) don't get moved.

 
 I don't agree: the question should not be what kind of data you move, but
 where the reference is stored.
 
 References in object are easy to be found, but what about references on
 the stack or inside of structures/arrays? If the GC wants to move just a
 single object, it has to be sure to know about *all* references to this
 object, so it needs to know exactly what piece of data on the stack is a
 reference or not. It would need to do a complete backtrace and interpret
 each stackframe according to the corresponding function.

Agreed. I was thinking any ambiguous object reference from outside the
object heap would make the object unmovable but I never considered that
only garbage would be moveable then :-)
Maybe pure object references on the stack (or maybe everywhere... who knows)
can be stored in a giant list. That would make the size of a reference
bigger since it stores a pointer to the next reference but it would make
traversing the set of references easy.

Removing or adding such a pointer is simple, if the list is changed to a doubly
linked list.. but the overhead is annoying for single threaded, and much worse
for multithreaded, programs.  You could also use a bitmap of stack frames.

I would guess the solution involves sectioning memory into parts that are "fully
understood" and "partially understood".  Fully understood memory is D objects
where the pointers and non-pointers are clearly known from the code.

During a sweep, objects are classified as dead, moveable, or fixed.  Dead
objects are removed because no pointer is found to them.  Moveable objects have
pointers, but only from "fully understood" memory.  "Fixed" objects are all
objects in the partial heap, plus "fully understood" objects that have a pointer
or *possible* pointer, from partially understood memory.

Since most code will not use pointers.  In practice, it doesn't hurt much to
leave these objects where they are and move other objects around them.  This is
similar to the "defrag" of a FAT filesystem.  You can't move files that are
open, so you leave them in place.  You get 99% of the benefit.

This technique is partly based on a cultural distinction.  In C, programmers
expect that they know at least as much as the language does about what memory
layout is.  The language helps out, but is only a servant.  D programmers will
have different expectations; the language runtime controls the heap and knows
enough to do its own accounting.  It's like an armed shopkeeper.

Kevin

 I really would like to know what idea Walter has in mind about that. I
 doubt that it can be done with reasonable effort, in which case it would
 really make sense to drop that concept from the specs and rather guarantee
 that references stay fixed.

May 24 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Kevin Bealer wrote:

 I would guess the solution involves sectioning memory into parts that are
 "fully
 understood" and "partially understood".  Fully understood memory is D
 objects where the pointers and non-pointers are clearly known from the
 code.
 ...

That might really be an idea! Object references on the stack usually either
point to huge objects, short-time objects or to the top of object
hierarchies. In those cases where memory-defragmentation really would
matter, references are mostly between objects that are saved on the heap
for a long time.

Furthermore, defragmentation really is only necessary for long-running
programs like interactive programs or daemons. These should be fairly easy
to optimize for defragmentation, but just trying not to keep too many
references in variables on the stack during the top loop.

Additionally, the GC could simply offer a way to mark/unmark variables on
the stack as references, so their content may be moved as well. (This
should not happen automatically, since it would mean quite some overhead,
but doing it manually in important cases would not be a problem.

May 24 2004

Kevin Bealer <Kevin_member pathlink.com> writes:

In article <c8t1s1$2hv7$1 digitaldaemon.com>, Norbert Nemec says...
Kevin Bealer wrote:

 I would guess the solution involves sectioning memory into parts that are
 "fully
 understood" and "partially understood".  Fully understood memory is D
 objects where the pointers and non-pointers are clearly known from the
 code.
 ...

That might really be an idea! Object references on the stack usually either
point to huge objects, short-time objects or to the top of object
hierarchies. In those cases where memory-defragmentation really would
matter, references are mostly between objects that are saved on the heap
for a long time.

Furthermore, defragmentation really is only necessary for long-running
programs like interactive programs or daemons. These should be fairly easy
to optimize for defragmentation, but just trying not to keep too many
references in variables on the stack during the top loop.

Additionally, the GC could simply offer a way to mark/unmark variables on
the stack as references, so their content may be moved as well. (This
should not happen automatically, since it would mean quite some overhead,
but doing it manually in important cases would not be a problem.

Other ideas:

The GC could prefer to collect when the stack is "shallow" relative to recent
activity.  This would cause collections at the end of "natural breaks" in
processing.

The stack itself could be "well understood" as follows: each function/method has
a stack frame with a certain layout.  A bitmap of that layout, i.e. "1" bits
mean this-is-a-reference, and "0" bits mean "this is piece-of-data", would be
created at compile time.  When the GC "took over" it would determine which
bitmap applied to each frame.

If you think about it, a struct is like a stack frame, is like an object's data
section.  Programs have code and data elements; you can move "local variables"
in an object into and out of the object body, to make different parts of it more
or less persistent.

So, everything that can be done with an object can be done with a stack frame
and vice versa.

Languages like ruby even allow "closures" where the stack frame stays around
after the program exits because a nested function still has references to
variables in it.  This (potentially) requires garbage collection of the stack
frame, which is not fast.  It also means (depending on implementation) there
really is no stack: just a lot of floating objects that serve as contexts for
each other.

There are some neat writeups on this.  I don't remember where, but it was on the
page for "parrot", the perl "assembler language".  Slightly mind blowing, if you
ask me.

Kevin

May 25 2004

"Walter" <newshound digitalmars.com> writes:

"Brian Hammond" <d at brianhammond dot comBrian_member pathlink.com> wrote
in message news:c8b56i$pro$1 digitaldaemon.com...
 From http://www.digitalmars.com/d/garbage.html

 "Using pointer values to compute a hash function. A copying garbage

collector
 can arbitrarily move objects around in memory, thus invalidating the

computed
 hash value."

 Why then does Object's toHash() method default implementation do exactly

that?
 uint toHash()
 {
 return (uint)(void *)this;
 }

 I was relying on the default toHash() for the value of a key in an

associative
 array.  I guess I should provide my own hash function then?

You're right, Object.toHash() is fundamentally broken, and this break will
show up if and when a D implementation gets a copying garbage collector.

May 23 2004

D Programming

C/C++ Programming

Other

digitalmars.D - gc and Object.toHash()