
digitalmars.D.learn - D may disappoint in the presence of an alien Garbage Collector?

"Carl Sturtivant" <sturtivant gmail.com> writes:
Suppose I want to use D as a system programming language to work 
with a library of functions written in another language, 
operating on dynamically typed data that has its own garbage 
collector, such as an algebra system or the virtual machine of a 
dynamically typed scripting language viewed as a library of 
operations on its own data type. For concreteness, suppose the 
library is written in C. (More generally, the data need not be 
restricted to the kind above, but for concreteness, make that 
supposition.)

Data in such a system is usually a (possibly elaborate) tagged 
union, that is, essentially a struct consisting of (say) two 
words, the first indicating the type and perhaps containing some 
bits that indicate other attributes, and the second containing 
the data, which may be held directly or indirectly. Call this a 
Descriptor.
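For concreteness, a two-word Descriptor might look like this in D. The tag values, `ListObject`, and the field names are hypothetical; the real layout is whatever the C library defines.

```d
// Hypothetical dynamic-type tags; a real library defines its own encoding,
// often packing attribute bits into the same word.
enum Tag : size_t { integer, list }

// Hypothetical object that lives on the alien GC's heap.
struct ListObject { size_t length; }

// Two machine words: a tag word, then a payload word that holds the data
// directly (an integer) or indirectly (a pointer into the alien heap).
struct Descriptor
{
    Tag tag;
    union
    {
        ptrdiff_t intValue;   // held directly: copies are independent
        ListObject* list;     // held indirectly: copies share the object
    }
}

static assert(Descriptor.sizeof == 2 * size_t.sizeof);
```

Copying such a struct is a plain bit copy, so an integer Descriptor behaves as a value while a list Descriptor aliases the same alien-heap object.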

Descriptors are small, so it's natural to want them held by value 
and not allocated on the heap (either D's or the library's) 
unless they are a part of a bigger structure that naturally 
resides there. And it's natural to want them to behave like 
values when passed as parameters or assigned. This usually fits 
in with the sort of heterogeneous copy semantics of such a 
library, where some of the dynamic types are implicitly reference 
types and others are not.

The trouble is that the library's alien GC needs to be made aware 
of each Descriptor when it appears and when it disappears, so 
that a call of a library function that allocates storage doesn't 
trigger a garbage collection that vacuums up library-allocated 
storage that a D Descriptor points to, fails to adjust a pointer 
inside a D Descriptor when it moves the corresponding data, or, 
worse, follows a garbage pointer from an invalid D Descriptor 
that's gone out of scope. This requirement applies to local 
variables, parameters and temporaries, as well as to other 
situations, like D-heap-allocated arrays of Descriptors. Ignore 
the latter kind of occasion for now.

Abstract the process of informing the GC of a Descriptor's 
existence as a Protect operation, and of informing it that the 
Descriptor is about to go out of scope as an Unprotect operation. 
Protect and Unprotect naturally need the address of the storage 
holding the relevant Descriptor.
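A toy model of Protect and Unprotect shows why they need the storage address rather than the Descriptor's value: a moving collector must rewrite the pointer inside the Descriptor in place. Everything here (`protect`, `unprotect`, `mockCollect`, `Obj`) is a hypothetical stand-in, not any real library's API.

```d
import std.algorithm.mutation : remove;
import std.algorithm.searching : countUntil;

// Hypothetical object on the alien heap.
struct Obj { int value; }

// Hypothetical two-word Descriptor: type word plus pointer into the alien heap.
struct Descriptor { size_t tag; Obj* ptr; }

// The alien GC's root set: addresses of storage holding live Descriptors.
Descriptor*[] roots;

// Protect: the storage at d now holds a Descriptor the GC must scan.
void protect(Descriptor* d) { roots ~= d; }

// Unprotect: the storage at d is about to stop holding a Descriptor.
void unprotect(Descriptor* d)
{
    auto i = roots.countUntil(d);
    assert(i >= 0, "unprotect of an unregistered Descriptor");
    roots = roots.remove(i);
}

// Toy "moving collection": relocates every object reachable from a root
// and rewrites the pointer inside the registered Descriptor in place.
void mockCollect()
{
    foreach (d; roots)
    {
        if (d.ptr is null) continue;
        auto moved = new Obj;
        moved.value = d.ptr.value;
        d.ptr = moved; // only possible because we know where the Descriptor lives
    }
}
```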

In a nutshell, the natural requirement when interfacing to such a 
library is to add Descriptor as a new value type in D along the 
lines described above, with a definition such that Protect and 
Unprotect operations are compiled to be performed automatically 
at the appropriate junctures so that the user of the library can 
forget about garbage collection to the usual extent.
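A minimal sketch of the obvious D attempt, assuming a hypothetical `protect`/`unprotect` root registry in place of the library's real calls: the constructor and postblit register the struct's address and the destructor deregisters it, so locals, parameters and temporaries are covered by the compiler's usual lifetime rules.

```d
import std.algorithm.mutation : remove;
import std.algorithm.searching : countUntil;

// Hypothetical root registry standing in for the library's Protect/Unprotect.
void*[] roots;

void protect(void* p) { roots ~= p; }

// Tolerant Unprotect: a .init value, or a value the compiler moved without
// running the postblit, may not be registered at its current address.
void unprotect(void* p)
{
    auto i = roots.countUntil(p);
    if (i >= 0) roots = roots.remove(i);
}

struct Descriptor
{
    size_t tag;
    void* data;

    // Creation: register this storage with the alien GC.
    this(size_t tag, void* data)
    {
        this.tag = tag;
        this.data = data;
        protect(&this);
    }

    // Copy: the blitted copy lives at a new address, so register it too.
    this(this) { protect(&this); }

    // Destruction: deregister before the storage goes away.
    ~this() { unprotect(&this); }
}
```

The catch is that the D specification lets the implementation move structs by bit copy without running the postblit (`std.algorithm.move` is one example), so the address the alien GC was given can go stale; the destructor above tolerates unregistered addresses, but a moved Descriptor would simply be invisible to the collector.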

How can this requirement be fulfilled?
Jul 28 2014
"Anton" <Anton nowhere.de> writes:
On Monday, 28 July 2014 at 19:57:38 UTC, Carl Sturtivant wrote:
 How can this requirement be fulfilled?
Suppose I want to do system programming... Would I choose the option with a GC? Just get off. The GC is just such a nuisance. People are smart enough to manage memory.
Jul 28 2014
"Carl Sturtivant" <sturtivant gmail.com> writes:
On Monday, 28 July 2014 at 20:52:01 UTC, Anton wrote:
 Suppose I want to do system programming... Would I choose the option with a GC? Just get off. The GC is just such a nuisance. People are smart enough to manage memory.
It's the library I need to interface to that has its own GC, not my code. I just need to use D's system programming capabilities to work around the library's nasty GC so that the data used by my calls into that library isn't trashed, and to do that efficiently and transparently. A system programming language should be able to interface efficiently to anything, right?
Jul 29 2014
"Rene Zwanenburg" <renezwanenburg gmail.com> writes:
On Monday, 28 July 2014 at 19:57:38 UTC, Carl Sturtivant wrote:
 How can this requirement be fulfilled?
If I understand you correctly, an easy way is to use RefCounted with a simple wrapper. Something like this:

import std.typecons : RefCounted;

// Descriptor defined by the external library
struct DescriptorImpl
{
  size_t type;
  void* data;
}

// Tiny wrapper telling the alien GC of the existence of this reference
private struct DescriptorWrapper
{
  DescriptorImpl descriptor;
  alias descriptor this;

  @disable this();

  this(DescriptorImpl desc)
  {
    descriptor = desc;
    // Make alien GC aware of this reference
  }

  ~this()
  {
    // Make alien GC aware this reference is no longer valid
  }
}

// This is the type you will be working with on the D side
alias Descriptor = RefCounted!DescriptorWrapper;
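A runnable version of this wrapper, with a hypothetical `liveRoots` counter standing in for the alien GC's Protect and Unprotect, shows the trade-off: copies share one heap-allocated payload, so the library sees a single registration per payload. The sketch opts out of automatic initialization (`RefCountedAutoInitialize.no`) because auto-initialization needs the default constructor that `DescriptorWrapper` disables.

```d
import std.typecons : RefCounted, RefCountedAutoInitialize;

// Descriptor layout assumed, as in the post above.
struct DescriptorImpl
{
    size_t type;
    void* data;
}

// Hypothetical counter standing in for the alien GC's root registrations.
int liveRoots;

struct DescriptorWrapper
{
    DescriptorImpl descriptor;
    alias descriptor this;

    @disable this();

    this(DescriptorImpl desc)
    {
        descriptor = desc;
        ++liveRoots; // stand-in for Protect
    }

    ~this()
    {
        --liveRoots; // stand-in for Unprotect
    }
}

// The payload lives on the heap; copies of a Descriptor share it.
alias Descriptor = RefCounted!(DescriptorWrapper, RefCountedAutoInitialize.no);
```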
Jul 28 2014
"Carl Sturtivant" <sturtivant gmail.com> writes:
On Monday, 28 July 2014 at 21:33:54 UTC, Rene Zwanenburg wrote:
 If I understand you correctly, an easy way is to use RefCounted 
 with a simple wrapper.
I just read the RefCounted definition here, http://dlang.org/phobos/std_typecons.html#.RefCounted, and it heap-allocates its object, so your response above does not stack-allocate the basic type that you call DescriptorWrapper, and is not a solution to the problem as stated. If there were no alien GC but everything else were the same, heap allocation of something containing a DescriptorImpl would be unnecessary. Now achieve the same with the alien GC present, without an extra layer of indirection and heap allocation: this is the essence of my question.
Jul 29 2014
"Kagamin" <spam here.lot> writes:
Registering a descriptor with a moving GC is not enough; you should 
also fix (pin) the pointer so that the data it points to isn't moved.
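Pinning can be sketched with a toy moving collector that leaves pinned objects in place. The `pinCount` field, `mockMove` and `Pin` are hypothetical; a real library would expose its own pinning calls, if it has any.

```d
// Hypothetical alien-heap object with a pin count the collector honours.
struct Obj { int value; int pinCount; }

// Toy moving step: relocates an unpinned object and returns its new
// address; a pinned object stays where it is.
Obj* mockMove(Obj* o)
{
    if (o.pinCount > 0) return o;
    auto moved = new Obj;
    moved.value = o.value;
    return moved;
}

// RAII pin: the object cannot move while a Pin for it is alive.
struct Pin
{
    Obj* target;

    @disable this(this); // exactly one unpin per pin

    this(Obj* o)
    {
        target = o;
        ++o.pinCount;
    }

    ~this()
    {
        if (target !is null) --target.pinCount;
    }
}
```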
Jul 29 2014
"Kagamin" <spam here.lot> writes:
The better way would be to interact through a COM interface, 
which would abstract away the tricks of the library code. Advanced 
environments are usually able to generate such an interface.
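The shape of such a boundary can be sketched as a plain D interface with COM-style reference counting. This is not real COM (which on Windows would derive from `IUnknown`), and `IValue` and `LibraryInteger` are hypothetical; the point is that D code only holds counted interface references, while the library side alone deals with its GC.

```d
// COM-style interface: D code sees only these operations.
interface IValue
{
    uint addRef();
    uint release();
    long asInteger();
}

// Hypothetical library-side implementation that keeps the object alive
// (for its own GC) while the reference count is nonzero.
final class LibraryInteger : IValue
{
    private uint refs = 1;
    private long value;
    static int live; // objects the "library" is currently keeping alive

    this(long v)
    {
        value = v;
        ++live;
    }

    uint addRef() { return ++refs; }

    uint release()
    {
        if (--refs == 0) --live; // a real library would let its GC reclaim it here
        return refs;
    }

    long asInteger() { return value; }
}
```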
Jul 29 2014