digitalmars.D - Dlls and object collection

pragma <pragma_member pathlink.com> writes:

This really isn't so much of a request for help, as an example of what can go
wrong with Dlls and GC in D (at present); it's something to look out for that I
didn't expect at first.  Perhaps this will help some struggling noobs out there,
but I hope it'll raise some eyebrows with the more seasoned developers among us.

BTW, I'm open to suggestions on how to best tackle this problem.

Basically, with the new 'hookable GC' that Walter gave us with 0.112, things
have improved.  One no longer needs to worry about having a dll return a string
or int[], only to wonder where that memory will go once the dll is unloaded.
However, there are some 'gotchas' still present in the architecture.

// mydll.d
// (assuming that winmain is configured and the proper GC hooks are in place)

class Foobar {}
static this() { new Foobar(); }

// test.d
// (assuming a Library class that loads a library and hooks/unhooks the GC)
void main() {
    Library lib = new Library("mydll.dll"); // load and hook
    lib.unload(); // unhook and unload
}

Note that there is no communication between main and the dll other than calling
hook and unhook.

Since the GC is lazy, any collection pass can leave objects outstanding for a
variety of reasons (even after a full collect).  The code above is *likely* to
work, but can fail if the 'Foobar' object created in the mydll.d module is still
outstanding after the call to unload().  In that case, the 'Foobar' becomes more
of a 'Fubar' object since the gc dutifully tries to call the object's
destructor.  Said destructor doesn't exist anymore since the object's v-table
points to where the dll used to be.

So this is a problem that GC-hooking doesn't solve; objects are *very*
tightly bound to their 'home' dll.

I for one used to think that the objects that cross the dll/application
barrier were the only ones requiring discrete tracking.  I now understand that
this is not the case; every object created within a dll needs the same
consideration one way or another.

I think GC-managed libraries are the way to go, but such a technique would
require being able to set the entire dll-space in memory as a GC root.  Does
anyone out there have an idea how to gather that information on win32?  How
about Posix?

- EricAnderton at yahoo

Feb 06 2005

Kris <Kris_member pathlink.com> writes:

Aye;

This is partly what I was getting at in an earlier thread; like you, I feel the
DLL unloading needs to be managed by the GC (via a DLL wrapper class). However,
that can lead to deadlock when the GC halts all threads while it collects. 

One way around the deadlock issue is to construct a non-blocking, non-spinning,
mechanism whereby the DLL-wrapper may be marked for subsequent removal via its
destructor (synch won't work, because another thread could be holding the
wrapper-mutex while it is asking for the DLL to be loaded - that thread will be
paused() by the GC during a collect, which is when the wrapper-destructor could
be invoked).
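
A rough sketch of the "mark now, unload later" idea, written against present-day D (core.atomic) rather than the 2005 runtime. Everything named here (pendingHandles, markForUnload, drainPending, the table size) is made up for illustration. The wrapper's destructor would only drop its OS handle into a lock-free table; an ordinary thread drains the table later, outside any collection.

```d
import core.atomic : atomicLoad, cas;

enum maxPending = 64;                      // hypothetical capacity
shared size_t[maxPending] pendingHandles;  // 0 marks an empty slot

/// Meant to be called from the wrapper's destructor.
/// Takes no lock and makes a single bounded pass, so it cannot deadlock.
bool markForUnload(size_t handle) nothrow @nogc
{
    foreach (ref slot; pendingHandles)
        if (cas(&slot, size_t(0), handle))
            return true;
    return false; // table full: the caller has to leave the DLL loaded
}

/// Meant to be called later from a normal thread, never during a collect.
/// `unload` stands in for whatever actually releases the DLL.
size_t drainPending(scope void delegate(size_t handle) unload)
{
    size_t released;
    foreach (ref slot; pendingHandles)
    {
        size_t handle = atomicLoad(slot);
        if (handle != 0 && cas(&slot, handle, size_t(0)))
        {
            unload(handle);
            ++released;
        }
    }
    return released;
}

unittest
{
    assert(markForUnload(0x1000));
    assert(markForUnload(0x2000));

    size_t[] seen;
    assert(drainPending((size_t h) { seen ~= h; }) == 2);
    assert(seen == [0x1000, 0x2000]);
    assert(drainPending((size_t h) { seen ~= h; }) == 0); // nothing left
}
```

The fixed-size table is only there to keep the destructor free of allocation and locking; a full table means the DLL simply stays loaded, which is the safe failure mode.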

A further issue is where the DLL creates a thread of its own. The GC will not
know about such threads, and therefore will not pause() them during a collect.
This exposes the potential for heap-corruption, so fair warning :-)

but such a technique would
require being able to set the entire dll-space in memory as a GC root.  Does
anyone out there have an idea how to gather that information on win32?  How
about Posix?

I believe the GC adds the DLL static-data-area as a GC root. But I don't follow
as to why the entire DLL would need to be mapped. Can you elaborate, Eric?

- Kris


Feb 06 2005

pragma <pragma_member pathlink.com> writes:

In article <cu64pm$25h4$1 digitaldaemon.com>, Kris says...
This is partly what I was getting at in an earlier thread; like you, I feel the
DLL unloading needs to be managed by the GC (via a DLL wrapper class). However,
that can lead to deadlock when the GC halts all threads while it collects. 

Gah. I keep forgetting about that.  Thank you for setting me straight once
again. :)

I believe the GC adds the DLL static-data-area as a GC root. But I don't follow
as to why the entire DLL would need to be mapped. Can you elaborate, Eric?

That's easy to explain.  The key issue here is that while the GC does a great
job of tracking pointer-to-data dependencies, it fails on pointer-to-code with
respect to dlls.  So without the *code* space of the dll being mapped,
delegates, function-pointers and object v-tables all slip through the cracks.

It doesn't have to be all-or-nothing for mapping the dll as a root. I figured
that it would probably be easier than trying to find the dll's code segment(s)
and add them via gc.addRange().  Either way, it's the missing magic needed to
make this work transparently.
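
For reference, here is roughly what registering a range does, sketched with today's core.memory.GC.addRange (an assumption; the call in this thread is the older gc.addRange()). Payload, makeScannedSlot and releaseSlot are hypothetical names. The sketch does not locate a dll's code segment; it only shows that a registered range is scanned for pointers into the GC heap, so an object referenced only from that range survives a collect. Registering a range does not, by itself, make the collector track pointers *into* that range.

```d
import core.memory : GC;
import core.stdc.stdlib : calloc, free;

class Payload
{
    static int finalized;          // counts destructor runs
    ~this() { ++finalized; }
}

/// Stores the only reference to a Payload in C-heap memory and
/// registers that memory with the collector.
void** makeScannedSlot()
{
    auto slot = cast(void**) calloc(1, (void*).sizeof);
    assert(slot !is null);
    GC.addRange(slot, (void*).sizeof);   // the GC now scans this slot
    *slot = cast(void*) new Payload;
    return slot;
}

/// Drops the reference and unregisters the memory again.
void releaseSlot(void** slot)
{
    *slot = null;
    GC.removeRange(slot);
    free(slot);
}

unittest
{
    auto slot = makeScannedSlot();
    GC.collect();
    assert(Payload.finalized == 0); // still reachable through the range
    releaseSlot(slot);
}
```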

- EricAnderton at yahoo.com

Feb 06 2005

Kris <Kris_member pathlink.com> writes:

In article <cu6ak5$2hoj$1 digitaldaemon.com>, pragma says...
That's easy to explain.  The key issue here is that while the GC does a great
job of tracking pointer-to-data dependencies, it fails on pointer-to-code with
respect to dlls.  So without the *code* space of the dll being mapped,
delegates, function-pointers and object v-tables all slip through the cracks.

Ah; gotcha. Walter will probably flip over that notion (perhaps rightly so)
since there's now a raft of machine-code to be scanned (in addition to data
segments), some of which will probably look like valid pointers into the heap. 

:-)

One might deal with this via an extended delegate, which also has a reference to
the DLL wrapper. Could be done with a class/struct. Perhaps better to use an
Interface in the first place, since the implementation could hold a reference to
the DLL wrapper, thus steering the GC in the right direction.
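
A minimal sketch of the extended-delegate idea, in present-day D. LibraryHandle is a hypothetical stand-in for the DLL wrapper (it loads nothing), and BoundCallback and invoke are made-up names. The point is only that the callback carries a data reference to the wrapper, so the GC sees the wrapper as reachable for as long as the callback is.

```d
/// Hypothetical stand-in for the DLL wrapper class.
class LibraryHandle
{
    string path;
    bool loaded = true;

    this(string path) { this.path = path; }

    /// A real wrapper would release the DLL here.
    void unload() { loaded = false; }
}

/// A delegate bundled with the wrapper that owns its code.
struct BoundCallback
{
    LibraryHandle owner;     // keeps the wrapper reachable via this callback
    int delegate(int) fn;

    int invoke(int x)
    {
        assert(owner !is null && owner.loaded, "callback used after unload");
        return fn(x);
    }
}

unittest
{
    auto lib = new LibraryHandle("mydll.dll");
    int factor = 2;
    auto cb = BoundCallback(lib, (int x) => x * factor);
    assert(cb.invoke(21) == 42);
    assert(cb.owner is lib);
}
```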

p.s. The 'flip' reference is with respect to Walter's arguments against
disabling stack-array initialization (from months ago), since the array content
would likely contain relics of heap references - even random data can look like
a valid heap reference. We did, at least, get a partial resolution to that one.

Feb 06 2005

pragma <pragma_member pathlink.com> writes:

In article <cu6cnf$2kv4$1 digitaldaemon.com>, Kris says...
In article <cu6ak5$2hoj$1 digitaldaemon.com>, pragma says...
That's easy to explain.  The key issue here is that while the GC does a great
job of tracking pointer-to-data dependencies, it fails on pointer-to-code with
respect to dlls.  So without the *code* space of the dll being mapped,
delegates, function-pointers and object v-tables all slip through the cracks.

Ah; gotcha. Walter will probably flip over that notion (perhaps rightly so)
since there's now a raft of machine-code to be scanned (in addition to data
segments), some of which will probably look like valid pointers into the heap. 

Right, it's not the best solution, it's just one of several things that might
work. :)

There are other issues that seem to be in-between the cracks with regards to
using dlls.  

I still can't confirm that I'm doing this right, but it looks like casting an
object to an interface, when the object is passed from a dll, throws out the
object's v-table (and yields methods that do *nothing* as a result).  This flows
into yet another problem: calling 'delete' on said object from outside the dll
also creates a fault. 

 ITester tester = cast(ITester)mydll.newTestObject();  // get a new object
 tester.foo(); // does absolutely nothing, not even a fault.
 delete tester; // faults

Now if I change the code to use a base class instead of an interface, the odd
behavior goes away.  Go figure.

- EricAnderton at yahoo

Feb 06 2005

Kris <Kris_member pathlink.com> writes:

In article <cu6fcl$2q4h$1 digitaldaemon.com>, pragma says...
There are other issues that seem to be in-between the cracks with regards to
using dlls.  

Too true :-)

 ITester tester = cast(ITester)mydll.newTestObject();  // get a new object
 tester.foo(); // does absolutely nothing, not even a fault.
 delete tester; // faults


I'm lost on this one, Eric - why would one need to cast to the Interface if
mydll.newTestObject() returns one? Oh; is that the error? Should that really be
a mydll.newTestInterface() instead, where the concrete DLL class implements
ITester? Forgive me if I'm stating the obvious!

Delete does work correctly on an Interface, under normal circumstances (since
DMD 0.81 I think)

Feb 06 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"pragma" <pragma_member pathlink.com> wrote in message 
news:cu6ak5$2hoj$1 digitaldaemon.com...
 In article <cu64pm$25h4$1 digitaldaemon.com>, Kris says...
This is partly what I was getting at in an earlier thread; like you, I feel the
DLL unloading needs to be managed by the GC (via a DLL wrapper class). However,
that can lead to deadlock when the GC halts all threads while it collects.

 Gah. I keep forgetting about that.  Thank you for setting me straight once
 again. :)

I believe the GC adds the DLL static-data-area as a GC root. But I don't follow
as to why the entire DLL would need to be mapped. Can you elaborate, Eric?

 That's easy to explain.  The key issue here is that while the GC does a great
 job of tracking pointer-to-data dependencies, it fails on pointer-to-code with
 respect to dlls.  So without the *code* space of the dll being mapped,
 delegates, function-pointers and object v-tables all slip through the cracks.

Which brings me back to my "overly complicated" solution we discussed 
last month.

Keeping code loaded is a sine qua non of component based programming. 
Glad that there's at least a few others interested. :-)

Feb 06 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

 Since the GC is lazy, any collection pass can leave objects outstanding for a
 variety of reasons (even after a full collect).  The code above is *likely* to
 work, but can fail if the 'Foobar' object created in the mydll.d module is still
 outstanding after the call to unload().  In that case, the 'Foobar' becomes more
 of a 'Fubar' object since the gc dutifully tries to call the object's
 destructor.  Said destructor doesn't exist anymore since the object's v-table
 points to where the dll used to be.

I guess that's why Java doesn't let you unload classes explicitly and C#
only lets you unload "AppDomains". I don't know much about AppDomains but
they look like a way of separating an application into distinct parts.

I agree with Kris's observation that current behavior has a problem that the 
dll's thread list is not merged with the GC's thread list.

Feb 06 2005

"Walter" <newshound digitalmars.com> writes:

"Ben Hinkle" <ben.hinkle gmail.com> wrote in message
news:cu66em$2976$1 digitaldaemon.com...
 I agree with Kris's observation that current behavior has a problem that the
 dll's thread list is not merged with the GC's thread list.

True, that is a bug. The thread management code needs to be single instanced
like the gc is. At the moment, the DLL shouldn't create any threads.

Feb 10 2005

"Walter" <newshound digitalmars.com> writes:

"pragma" <pragma_member pathlink.com> wrote in message
news:cu62oh$215u$1 digitaldaemon.com...
 Note that there is no communication between main and the dll other than calling
 hook and unhook.

 Since the GC is lazy, any collection pass can leave objects outstanding for a
 variety of reasons (even after a full collect).  The code above is *likely* to
 work, but can fail if the 'Foobar' object created in the mydll.d module is still
 outstanding after the call to unload().  In that case, the 'Foobar' becomes more
 of a 'Fubar' object since the gc dutifully tries to call the object's
 destructor.  Said destructor doesn't exist anymore since the object's v-table
 points to where the dll used to be.

 So this is a problem that GC-hooking doesn't solve; objects are *very*
 tightly bound to their 'home' dll.

That's why, in the example DLL, there's a call to MyDll_Terminate() which
will run the DLL's static destructors before unloading. Furthermore, the
static data area of the DLL is removed from the gc's list of roots to scan,
so the gc in the EXE is not going to be scanning the DLL's static data after
it is unloaded.

The only real problem is if an object is left hanging around that has a
destructor that resides in the now unloaded DLL. The destructor's code will
have been unloaded. (This same problem would happen with C++.) The solutions
are:

1) Don't explicitly unload the DLL, just let the OS unload it for you when
the application exits
2) Do not leave around long lived objects that have destructors
3) Keep a list of the objects the DLL creates that have destructors, and
delete them explicitly when the DLL is unloaded
4) Make the unloading of the DLL the responsibility of a destructor in an
object. In each DLL allocated object with a destructor, have a pointer to
that DLL object. Then, the DLL won't get unloaded until after the other
objects are no longer referred to themselves.
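
Option 3 could look something like the sketch below. It is written in present-day D, so it uses destroy() where code of this era would use delete, and DllObjectRegistry, track and destroyAll are hypothetical names. The host records each destructor-bearing object the DLL hands out and runs the destructors itself while the DLL's code is still mapped.

```d
/// Hypothetical per-DLL registry kept by the host.
final class DllObjectRegistry
{
    private Object[] tracked;

    /// Record an object the DLL handed out so it can be finalized later.
    T track(T : Object)(T obj)
    {
        tracked ~= obj;
        return obj;
    }

    /// Run every tracked destructor; call this before unloading the DLL.
    size_t destroyAll()
    {
        size_t count = tracked.length;
        foreach (obj; tracked)
            destroy(obj);      // runs the destructor of the dynamic type
        tracked = null;
        return count;
    }
}

/// Stand-in for a class whose destructor lives in the DLL.
class Foobar
{
    static int destroyed;
    ~this() { ++destroyed; }
}

unittest
{
    auto registry = new DllObjectRegistry;
    registry.track(new Foobar);
    registry.track(new Foobar);
    assert(registry.destroyAll() == 2);
    assert(Foobar.destroyed == 2);
    assert(registry.destroyAll() == 0);   // nothing left to finalize
}
```

An object that has been through destroy() is not finalized a second time when the collector eventually frees its memory, so nothing should call into the DLL after the unload.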

Feb 10 2005
