www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - gc and Object.toHash()

reply Brian Hammond <d at brianhammond dot com> <Brian_member pathlink.com> writes:
From http://www.digitalmars.com/d/garbage.html 

"Using pointer values to compute a hash function. A copying garbage collector
can arbitrarily move objects around in memory, thus invalidating the computed
hash value."

Why then does Object's toHash() method default implementation do exactly that?

uint toHash()
{
return (uint)(void *)this;
}

I was relying on the default toHash() for the value of a key in an associative
array.  I guess I should provide my own hash function then?

Thanks!
Brian
May 17 2004
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Brian Hammond <d at brianhammond dot com> wrote:

<snip>
 Why then does Object's toHash() method default implementation do exactly that?

 I was relying on the default toHash() for the value of a key in an associative
 array.  I guess I should provide my own hash function then?

Always a good idea. Especially if your class has a sense of equality over and above that of being the very same object. But you're right, Object.toHash is being a bit naughty there. But is there a better idea? Maybe it's only meant to be a makeshift, but still.... Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 18 2004
parent Brian Hammond <d at brianhammond dot com> <Brian_member pathlink.com> writes:
 I was relying on the default toHash() for the value of a key in 
 an associative array.  I guess I should provide my own hash 
 function then?

Always a good idea. Especially if your class has a sense of equality over and above that of being the very same object.

In my case no.. I was simply using Object's toHash() as a convenient way to have fast lookups (via an associative array) for collections of basic object types like a delegate. A contrived example: a class that allows delegates to be registered/unregistered as observers. alias void delegate(subject) obs_dg; class subject { void register(obs_dg dg) { observers_[dg.toHash()] = dg; } void unregister(obs_dg dg) { delete observers_[dg.toHash()]; } //... private obs_dg[uint] observers_; } The key can change due to the GC however so I'll have to use something else like a straight array and have O(n) lookups. For small n, who cares really. Thanks, Brian
But you're right, Object.toHash is being a bit naughty there.  But is 
there a better idea?

Maybe it's only meant to be a makeshift, but still....

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

May 18 2004
prev sibling next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Brian Hammond wrote:

 From http://www.digitalmars.com/d/garbage.html
 
 "Using pointer values to compute a hash function. A copying garbage
 collector can arbitrarily move objects around in memory, thus invalidating
 the computed hash value."

B.t.w: is this "moving around" described more clearly anywhere in the documentation? What exactly happens if the GC moves an object that was allocated by a call to "new"? Does it adjust all references pointing anywhere into that object? I would assume that this takes quite a bit of intelligence, definitely more than the Boehm GC has. Is there actually any GC out there, that does move objects for defragmentation or some other purpose?
May 18 2004
next sibling parent "=?iso-8859-1?Q?Robert_M._M=FCnch?=" <robert.muench robertmuench.de> writes:
On Tue, 18 May 2004 16:53:43 +0200, Norbert Nemec <Norbert.Nemec gmx.de>  
wrote:

 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation?

The current GC implementation doesn't support it.
 What exactly happens if the GC moves an object that was
 allocated by a call to "new"? Does it adjust all references pointing
 anywhere into that object?

Yes, that's how it works.
 I would assume that this takes quite a bit of intelligence, definitely  
 more than the Boehm GC has. Is there actually any GC out there, that  
 does move
 objects for defragmentation or some other purpose?

Walter, once mentioned that he had written such a GC. I don't know how he did it, maybe using something like a dictornary, or a special reference memory pool, so you know where to start the scan to adjust addresses. Maybe the MMU could be used to help too. Just wild guessing... -- Robert M. Münch Management & IT Freelancer http://www.robertmuench.de
May 18 2004
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c8d81o$12m4$1 digitaldaemon.com...
 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation? What exactly happens if the GC moves an object that was
 allocated by a call to "new"?
 Does it adjust all references pointing
 anywhere into that object?

Yes.
 I would assume that this takes quite a bit of intelligence, definitely

 than the Boehm GC has. Is there actually any GC out there, that does move
 objects for defragmentation or some other purpose?

I've seen a lot of material saying this cannot be done with a language that manipulates pointers like C, C++ and D do, but there is a way to do it.
May 23 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Walter wrote:

 
 "Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
 news:c8d81o$12m4$1 digitaldaemon.com...
 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation? What exactly happens if the GC moves an object that was
 allocated by a call to "new"?
 Does it adjust all references pointing
 anywhere into that object?

Yes.
 I would assume that this takes quite a bit of intelligence, definitely

 than the Boehm GC has. Is there actually any GC out there, that does move
 objects for defragmentation or some other purpose?

I've seen a lot of material saying this cannot be done with a language that manipulates pointers like C, C++ and D do, but there is a way to do it.

What's the core idea behind it? I don't doubt that it can be done somehow, but I wonder how it efficient it would be. Of course, if the GC knows exactly what data is a pointer, it can go through the complete memory and readjust every pointer referencing the memory that has to be moved. Going through the whole memory should not even be the problem in efficiency, but how should the GC know what is a pointer and what is some other data that by chance has the same binary value as a valid pointer? For a plain GC, this is no problem, since misinterpreting random data as pointers will at worst keep alive some data that actually is dead.
May 23 2004
parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Norbert Nemec wrote:

 Walter wrote:
 
 
 "Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
 news:c8d81o$12m4$1 digitaldaemon.com...
 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation? What exactly happens if the GC moves an object that was
 allocated by a call to "new"?
 Does it adjust all references pointing
 anywhere into that object?

Yes.
 I would assume that this takes quite a bit of intelligence, definitely

 than the Boehm GC has. Is there actually any GC out there, that does
 move objects for defragmentation or some other purpose?

I've seen a lot of material saying this cannot be done with a language that manipulates pointers like C, C++ and D do, but there is a way to do it.

What's the core idea behind it? I don't doubt that it can be done somehow, but I wonder how it efficient it would be. Of course, if the GC knows exactly what data is a pointer, it can go through the complete memory and readjust every pointer referencing the memory that has to be moved. Going through the whole memory should not even be the problem in efficiency, but how should the GC know what is a pointer and what is some other data that by chance has the same binary value as a valid pointer? For a plain GC, this is no problem, since misinterpreting random data as pointers will at worst keep alive some data that actually is dead.

Here's my guess: if the GC can tell what is an object reference (for example by allocating from special heaps that store only objects) then it can get at the class info and the class info can store a mask that tells the GC which fields can be updated if the referant is moved. That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved. -Ben
May 23 2004
next sibling parent reply Brian Hammond <d at brianhammond dot com> <Brian_member pathlink.com> writes:
So Walter, if I understand correctly, the current Object.toHash() implementation
is valid with the current garbage collector but would be broken using a new,
copying collector.  If and when the GC impl. is updated, will Object.toHash() be
updated as well?  I'd imagine yes, Object.toHash() will always work correctly
else it should be abstract :)

Just wondering if I need to rewrite code now, when the GC impl. changes, or
never.  Thanks.

Brian

In article <c8r874$2tvd$1 digitaldaemon.com>, Ben Hinkle says...
Norbert Nemec wrote:

 Walter wrote:
 
 
 "Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
 news:c8d81o$12m4$1 digitaldaemon.com...
 B.t.w: is this "moving around" described more clearly anywhere in the
 documentation? What exactly happens if the GC moves an object that was
 allocated by a call to "new"?
 Does it adjust all references pointing
 anywhere into that object?

Yes.
 I would assume that this takes quite a bit of intelligence, definitely

 than the Boehm GC has. Is there actually any GC out there, that does
 move objects for defragmentation or some other purpose?

I've seen a lot of material saying this cannot be done with a language that manipulates pointers like C, C++ and D do, but there is a way to do it.

What's the core idea behind it? I don't doubt that it can be done somehow, but I wonder how it efficient it would be. Of course, if the GC knows exactly what data is a pointer, it can go through the complete memory and readjust every pointer referencing the memory that has to be moved. Going through the whole memory should not even be the problem in efficiency, but how should the GC know what is a pointer and what is some other data that by chance has the same binary value as a valid pointer? For a plain GC, this is no problem, since misinterpreting random data as pointers will at worst keep alive some data that actually is dead.

Here's my guess: if the GC can tell what is an object reference (for example by allocating from special heaps that store only objects) then it can get at the class info and the class info can store a mask that tells the GC which fields can be updated if the referant is moved. That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved. -Ben

May 23 2004
parent "Walter" <newshound digitalmars.com> writes:
"Brian Hammond" <d at brianhammond dot comBrian_member pathlink.com> wrote
in message news:c8s0hv$v1n$1 digitaldaemon.com...
 So Walter, if I understand correctly, the current Object.toHash()

 is valid with the current garbage collector but would be broken using a

 copying collector.

Yes, that's right.
  If and when the GC impl. is updated, will Object.toHash() be
 updated as well?  I'd imagine yes, Object.toHash() will always work

 else it should be abstract :)

 Just wondering if I need to rewrite code now, when the GC impl. changes,

 never.  Thanks.

To be safe, right now I'd implement your own toHash() for your derived types.
May 25 2004
prev sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Ben Hinkle wrote:

 Here's my guess:
 if the GC can tell what is an object reference (for example by allocating
 from special heaps that store only objects) then it can get at the class
 info and the class info can store a mask that tells the GC which fields
 can be updated if the referant is moved.
 
 That would mean dynamic arrays don't get moved and pointers to structs
 (well, actually, pointers in general) don't get moved.

I don't agree: the question should not be what kind of data you move, but where the reference is stored. References in object are easy to be found, but what about references on the stack or inside of structures/arrays? If the GC wants to move just a single object, it has to be sure to know about *all* references to this object, so it needs to know exactly what piece of data on the stack is a reference or not. It would need to do a complete backtrace and interpret each stackframe according to the corresponding function. I really would like to know what idea Walter has in mind about that. I doubt that it can be done with reasonable effort, in which case it would really make sense to drop that concept from the specs and rather guarantee that references stay fixed.
May 24 2004
parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Norbert Nemec wrote:

 Ben Hinkle wrote:
 
 Here's my guess:
 if the GC can tell what is an object reference (for example by allocating
 from special heaps that store only objects) then it can get at the class
 info and the class info can store a mask that tells the GC which fields
 can be updated if the referant is moved.
 
 That would mean dynamic arrays don't get moved and pointers to structs
 (well, actually, pointers in general) don't get moved.

I don't agree: the question should not be what kind of data you move, but where the reference is stored. References in object are easy to be found, but what about references on the stack or inside of structures/arrays? If the GC wants to move just a single object, it has to be sure to know about *all* references to this object, so it needs to know exactly what piece of data on the stack is a reference or not. It would need to do a complete backtrace and interpret each stackframe according to the corresponding function.

Agreed. I was thinking any ambiguous object reference from outside the object heap would make the object unmovable but I never considered that only garbage would be moveable then :-) Maybe pure object references on the stack (or maybe everywhere... who knows) can be stored in a giant list. That would make the size of a reference bigger since it stores a pointer to the next reference but it would make traversing the set of references easy.
 I really would like to know what idea Walter has in mind about that. I
 doubt that it can be done with reasonable effort, in which case it would
 really make sense to drop that concept from the specs and rather guarantee
 that references stay fixed.

May 24 2004
parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c8sri0$28a7$1 digitaldaemon.com>, Ben Hinkle says...
Norbert Nemec wrote:

 Ben Hinkle wrote:
 
 Here's my guess:
 if the GC can tell what is an object reference (for example by allocating
 from special heaps that store only objects) then it can get at the class
 info and the class info can store a mask that tells the GC which fields
 can be updated if the referant is moved.
 
 That would mean dynamic arrays don't get moved and pointers to structs
 (well, actually, pointers in general) don't get moved.

I don't agree: the question should not be what kind of data you move, but where the reference is stored. References in object are easy to be found, but what about references on the stack or inside of structures/arrays? If the GC wants to move just a single object, it has to be sure to know about *all* references to this object, so it needs to know exactly what piece of data on the stack is a reference or not. It would need to do a complete backtrace and interpret each stackframe according to the corresponding function.

Agreed. I was thinking any ambiguous object reference from outside the object heap would make the object unmovable but I never considered that only garbage would be moveable then :-) Maybe pure object references on the stack (or maybe everywhere... who knows) can be stored in a giant list. That would make the size of a reference bigger since it stores a pointer to the next reference but it would make traversing the set of references easy.

Removing or adding such a pointer is simple, if the list is changed to a doubly linked list.. but the overhead is annoying for single threaded, and much worse for multithreaded, programs. You could also use a bitmap of stack frames. I would guess the solution involves sectioning memory into parts that are "fully understood" and "partially understood". Fully understood memory is D objects where the pointers and non-pointers are clearly known from the code. During a sweep, objects are classified as dead, moveable, or fixed. Dead objects are removed because no pointer is found to them. Moveable objects have pointers, but only from "fully understood" memory. "Fixed" objects are all objects in the partial heap, plus "fully understood" objects that have a pointer or *possible* pointer, from partially understood memory. Since most code will not use pointers. In practice, it doesn't hurt much to leave these objects where they are and move other objects around them. This is similar to the "defrag" of a FAT filesystem. You can't move files that are open, so you leave them in place. You get 99% of the benefit. This technique is partly based on a cultural distinction. In C, programmers expect that they know at least as much as the language does about what memory layout is. The language helps out, but is only a servant. D programmers will have different expectations; the language runtime controls the heap and knows enough to do its own accounting. It's like an armed shopkeeper. Kevin
 I really would like to know what idea Walter has in mind about that. I
 doubt that it can be done with reasonable effort, in which case it would
 really make sense to drop that concept from the specs and rather guarantee
 that references stay fixed.


May 24 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Kevin Bealer wrote:

 I would guess the solution involves sectioning memory into parts that are
 "fully
 understood" and "partially understood".  Fully understood memory is D
 objects where the pointers and non-pointers are clearly known from the
 code.
 ...

That might really be an idea! Object references on the stack usually either point to huge objects, short-time objects or to the top of object hierarchies. In those cases where memory-defragmentation really would matter, references are mostly between objects that are saved on the heap for a long time. Furthermore, defragmentation really is only necessary for long-running programs like interactive programs or daemons. These should be fairly easy to optimize for defragmentation, but just trying not to keep too many references in variables on the stack during the top loop. Additionally, the GC could simply offer a way to mark/unmark variables on the stack as references, so their content may be moved as well. (This should not happen automatically, since it would mean quite some overhead, but doing it manually in important cases would not be a problem.
May 24 2004
parent Kevin Bealer <Kevin_member pathlink.com> writes:
In article <c8t1s1$2hv7$1 digitaldaemon.com>, Norbert Nemec says...
Kevin Bealer wrote:

 I would guess the solution involves sectioning memory into parts that are
 "fully
 understood" and "partially understood".  Fully understood memory is D
 objects where the pointers and non-pointers are clearly known from the
 code.
 ...

That might really be an idea! Object references on the stack usually either point to huge objects, short-time objects or to the top of object hierarchies. In those cases where memory-defragmentation really would matter, references are mostly between objects that are saved on the heap for a long time. Furthermore, defragmentation really is only necessary for long-running programs like interactive programs or daemons. These should be fairly easy to optimize for defragmentation, but just trying not to keep too many references in variables on the stack during the top loop. Additionally, the GC could simply offer a way to mark/unmark variables on the stack as references, so their content may be moved as well. (This should not happen automatically, since it would mean quite some overhead, but doing it manually in important cases would not be a problem.

Other ideas: The GC could prefer to collect when the stack is "shallow" relative to recent activity. This would cause collections at the end of "natural breaks" in processing. The stack itself could be "well understood" as follows: each function/method has a stack frame with a certain layout. A bitmap of that layout, i.e. "1" bits mean this-is-a-reference, and "0" bits mean "this is piece-of-data", would be created at compile time. When the GC "took over" it would determine which bitmap applied to each frame. If you think about it, a struct is like a stack frame, is like an object's data section. Programs have code and data elements; you can move "local variables" in an object into and out of the object body, to make different parts of it more or less persistent. So, everything that can be done with an object can be done with a stack frame and vice versa. Languages like ruby even allow "closures" where the stack frame stays around after the program exits because a nested function still has references to variables in it. This (potentially) requires garbage collection of the stack frame, which is not fast. It also means (depending on implementation) there really is no stack: just a lot of floating objects that serve as contexts for each other. There are some neat writeups on this. I don't remember where, but it was on the page for "parrot", the perl "assembler language". Slightly mind blowing, if you ask me. Kevin
May 25 2004
prev sibling parent "Walter" <newshound digitalmars.com> writes:
"Brian Hammond" <d at brianhammond dot comBrian_member pathlink.com> wrote
in message news:c8b56i$pro$1 digitaldaemon.com...
 From http://www.digitalmars.com/d/garbage.html

 "Using pointer values to compute a hash function. A copying garbage

 can arbitrarily move objects around in memory, thus invalidating the

 hash value."

 Why then does Object's toHash() method default implementation do exactly

 uint toHash()
 {
 return (uint)(void *)this;
 }

 I was relying on the default toHash() for the value of a key in an

 array.  I guess I should provide my own hash function then?

You're right, Object.toHash() is fundamentally broken, and this break will show up if and when a D implementation gets a copying garbage collector.
May 23 2004