digitalmars.D.learn - Wrapping a C library with its own GC + classes vs refcounted structs
- aldanor (54/54) Jan 09 2015 Hi all,
- Laeeth Isharc (37/37) Jan 10 2015 Hi Aldanor.
- aldanor (16/55) Jan 10 2015 Thanks for the reply. Yes, this concerns my HDF5 wrapper project;
- Laeeth Isharc (13/19) Jan 12 2015 An easy way is to just use scope(exit) to either close the HDF5
Hi all,

I was wondering what's the most D-idiomatic way of dealing with a C library (or rather, writing wrappers for a C library) that does its own GC via reference counting. The objects are identified and passed around by integer ids only; most functions like "find me an object foo in object bar" return an id and increase a refcount internally; in rare cases, a borrowed reference is returned. Whenever the refcount drops to zero, the id becomes invalid and the memory (and possibly the id as well) eventually gets reused. Some C functions may explicitly or implicitly release ids, so there's also the problem of tracking whether a live D object refers to a live C object.

Since the main concern here is wrapping the C library, the only data stored in D objects is the object id, so the objects are really lightweight in that sense. However, there's a logical hierarchy of objects that it would make sense to reflect in D types, either via inheritance or struct aliasing.

The main question is whether it's most appropriate in this situation to use D classes and cross our fingers, relying on D's GC to trigger C's GC (i.e., ~this() explicitly decreases the refcount in the C library), or to use refcounted structs (or something else?). I think I understand how RefCounted works, but I can't see how exactly it is applicable in cases like this or what the consequences of using it are.

My initial naive guess was to use classes in D to encapsulate objects (to be able to use inheritance), so the code for the base class looks along the lines of:

    class ID {
        protected int id;
        private static shared Registry registry;

        this(int id) {
            // assume that refcount was already increased in C
            this.id = id;
            registry.store(this); // store weakref to track zombie objects
        }

        ~this() @nogc {
            if (c_is_valid(id) && c_refcount(id) > 0)
                c_decref(id);
            registry.remove(this);
        }
    }

    class ConcreteTypeA : ID { ... }
    class ConcreteTypeB : ID { ... }

where the weak static registry is required to keep track of live D objects that may refer to dead C objects, and has to be traversed once in a while. However, there's something sketchy about doing it this way, since the lifetimes of objects are not directly controlled. There are also situations where a temporary object is only required to exist in a function's scope and is naturally expected to be released upon exit from that scope.

A related thread: http://forum.dlang.org/thread/lmneclktewajznvfdawu
Jan 09 2015
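One way the refcounted-struct option from the question could look is sketched below. It is only an illustration: the C library is replaced by a small in-module mock, and `c_create` and `c_incref` are hypothetical names (the post only mentions `c_is_valid`, `c_refcount` and `c_decref`). The struct uses a postblit, which newer compilers also allow to be written as a copy constructor.

```d
// Mock of the C library: one refcount per id. All c_* functions here are
// stand-ins for illustration, not a real API.
private int[] refcounts;

int c_create() { refcounts ~= 1; return cast(int) refcounts.length - 1; }
bool c_is_valid(int id)
{
    return id >= 0 && cast(size_t) id < refcounts.length && refcounts[id] > 0;
}
int c_refcount(int id) { return refcounts[id]; }
void c_incref(int id) { ++refcounts[id]; }
void c_decref(int id) { --refcounts[id]; }

struct Handle
{
    private int id = -1;

    // Takes over a reference that the C side has already counted.
    this(int ownedId) { id = ownedId; }

    // Copying the struct shares the C object, so bump its refcount.
    this(this)
    {
        if (id >= 0) c_incref(id);
    }

    // Runs deterministically when the struct leaves scope.
    ~this()
    {
        if (id >= 0 && c_is_valid(id)) c_decref(id);
        id = -1;
    }
}

unittest
{
    int raw = c_create();
    {
        auto a = Handle(raw);          // refcount stays at 1
        {
            auto b = a;                // postblit: refcount 2
            assert(c_refcount(raw) == 2);
        }                              // b destroyed: refcount 1
        assert(c_refcount(raw) == 1);
    }                                  // a destroyed: refcount 0
    assert(!c_is_valid(raw));
}
```

The trade-off compared with classes is that inheritance is not available; a hierarchy would have to be expressed through composition or `alias this`.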
Hi Aldanor.

I wrote a slightly longer reply, but mislaid the file somewhere.

I guess your question might relate to wrapping the HDF5 library - something that I have already done in a basic way, although I welcome your project, as no doubt we will get to a higher quality eventual solution that way.

One question about accurately representing the HDF5 object hierarchy. Are you sure you wish to do this rather than present a flattened approach oriented to what makes sense to make things easy for the user in the way that is done by h5py and pytables?

In terms of the actual garbage generated by this library - there are lots of small objects. The little ones are things like a file access attribute, or a schema for a dataset. But really the total size taken up by the small ones is unlikely to amount to much for scientific computing or for quant finance if you have a small number of users and are not building some kind of public web server. I think it should be satisfactory for the little objects just to wrap the C functions with a D wrapper and rely on the object destructor calling the C function to free memory. On the rare occasions when not, it will be pretty obvious to the user and he can always call destroy directly.

For the big ones, maybe reference counting brings enough value to be useful - I don't know. But mostly you are either passing data to HDF5 to write, or you are receiving data from it. In the former case you pass it a pointer to the data, and I don't think it keeps it around. In the latter, you know how big the buffer needs to be, and you can just allocate something from the heap of the right size (and if using reflection, type) and use destroy on it when done.

So I don't have enough experience yet with either D or HDF5 to be confident in my view, but my inclination is to think that one doesn't need to worry about reference counting. Since objects are small and there are not that many of them, relying on the destructor to be run (manually if need be) seems likely to be fine, as I understand it. I may well be wrong on this, and would like to understand the reasons if so.

Laeeth.
Jan 10 2015
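The "rely on the destructor, call destroy when it matters" approach could be sketched as follows. `Wrapper`, `c_close` and `closeCalls` are hypothetical names used only for this illustration; the C call is mocked by a counter so the effect can be observed.

```d
// Mock of a C "close" function; it only counts how often it was called.
int closeCalls;
void c_close(int id) { ++closeCalls; }

class Wrapper
{
    private int id;
    private bool closed;

    this(int id) { this.id = id; }

    // Idempotent release, callable directly or through the destructor.
    void close()
    {
        if (!closed)
        {
            c_close(id);
            closed = true;
        }
    }

    // Only touches plain ints, so it is safe to run from the GC as well.
    ~this() { close(); }
}

unittest
{
    int before = closeCalls;
    auto w = new Wrapper(42);
    destroy(w);   // runs ~this() right now instead of waiting for the GC
    assert(closeCalls == before + 1);
}
```

Without the explicit `destroy`, the destructor runs whenever the GC gets around to collecting the object, which may be late or, at program exit, not at a predictable point.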
On Saturday, 10 January 2015 at 20:55:05 UTC, Laeeth Isharc wrote:
> [...] my inclination is to think that one doesn't need to worry about reference counting. Since objects are small and there are not that many of them, relying on the destructor to be run (manually if need be) seems likely to be fine, as I understand it. [...]

Thanks for the reply. Yes, this concerns my HDF5 wrapper project; the main concern is not the memory consumption, of course, but rather explicitly controlling the lifetimes of the objects (especially objects like files -- so you can be sure there are no zombie handles floating around). Most of the time when you're doing some operations on an HDF5 file, you want all handles to be closed by the time you're done (i.e. by the time you leave the scope), which feels natural (e.g. close groups, links etc). Some operations in HDF5, particularly those related to linking/unlinking/closing, may behave differently if an object has any child objects with open handles.

In addition to that, the C HDF5 library retains the right to reuse both the memory and the id once the refcount drops to zero, so it's best to be precise about that and keep a registry of weak references to all C ids that D knows about (sort of the same way as h5py does in Python).
Jan 10 2015
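A very reduced version of such a registry is sketched below. Note the substitution: it counts D-side wrappers per id in a plain associative array rather than holding real weak references, and the validity check is passed in as a delegate standing in for whatever the C library offers. `Registry`, `store`, `release` and `zombies` are hypothetical names.

```d
struct Registry
{
    private int[int] wrappers;   // id -> number of live D wrappers

    // Record that one more D wrapper refers to this id.
    void store(int id)
    {
        wrappers[id] = wrappers.get(id, 0) + 1;
    }

    // Record that a D wrapper for this id has gone away.
    void release(int id)
    {
        if (auto count = id in wrappers)
        {
            if (--(*count) == 0)
                wrappers.remove(id);
        }
    }

    // Ids that D still refers to but that the C side reports as dead.
    int[] zombies(bool delegate(int) isValid)
    {
        int[] dead;
        foreach (id; wrappers.byKey)
        {
            if (!isValid(id))
                dead ~= id;
        }
        return dead;
    }
}

unittest
{
    Registry reg;
    reg.store(1);
    reg.store(2);
    // Pretend the C library released id 2 behind our back.
    assert(reg.zombies(id => id != 2) == [2]);
    reg.release(2);
    assert(reg.zombies(id => id != 2).length == 0);
}
```

Because ids can be reused by the C library, a zombie found this way should be invalidated on the D side before the same id is handed out again.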
> Thanks for the reply. Yes, this concerns my HDF5 wrapper project; the main concern is not the memory consumption, of course, but rather explicitly controlling the lifetimes of the objects (especially objects like files -- so you can be sure there are no zombie handles floating around).

An easy way is to just use scope(exit) to either close the HDF5 object directly, or indirectly call destroy on the wrapper. If you want to make it 'idiot proof', maybe ref counting structs will get you there (at the possible cost of a small overhead). I personally don't tend to forget to close a file or dataset; it's much easier to forget to close a data type or data space descriptor. But struct vs class depends somewhat on how you want to represent the object hierarchy in D, no?

Incidentally, there are some nice things one can do using compile-time code to map D structs to HDF5 types (I have implemented a simple version of this in my wrapper). It is a bit more work the other way around, if you don't know what's in the file beforehand.
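For illustration, the scope(exit) approach might look like this. The open/close pair is mocked with a counter; `c_open`, `c_close`, `useHandle` and `failingUse` are hypothetical names, not HDF5 calls.

```d
// Mock open/close pair; openHandles tracks how many handles are still open.
int openHandles;
int c_open() { return ++openHandles; }
void c_close(int id) { --openHandles; }

int useHandle()
{
    int id = c_open();
    scope(exit) c_close(id);   // runs on every exit path from this scope
    return id * 2;             // stand-in for real work with the handle
}

void failingUse()
{
    int id = c_open();
    scope(exit) c_close(id);   // also runs while the exception propagates
    throw new Exception("simulated failure");
}

unittest
{
    useHandle();
    assert(openHandles == 0);

    try failingUse(); catch (Exception) {}
    assert(openHandles == 0);
}
```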
Jan 12 2015
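The compile-time mapping mentioned in the last post is not shown in the thread. A minimal sketch of the general idea, assuming nothing about the actual wrapper, is to walk a struct's fields with `std.traits` and collect name, offset, size and type for each; a real wrapper would then translate each entry into the corresponding HDF5 compound-type member. `FieldInfo`, `describe` and `Tick` are hypothetical names.

```d
import std.traits : FieldNameTuple, Fields;

// One entry per struct field: enough to describe a compound type to a C API.
struct FieldInfo
{
    string name;
    size_t offset;
    size_t size;
    string typeName;
}

// Builds the field table for T using compile-time reflection.
FieldInfo[] describe(T)() if (is(T == struct))
{
    FieldInfo[] result;
    alias Types = Fields!T;
    foreach (i, name; FieldNameTuple!T)
    {
        result ~= FieldInfo(name, T.tupleof[i].offsetof,
                            Types[i].sizeof, Types[i].stringof);
    }
    return result;
}

// Example record type, purely illustrative.
struct Tick
{
    long time;
    double price;
    int volume;
}

unittest
{
    enum fields = describe!Tick();   // evaluated at compile time
    static assert(fields.length == 3);
    static assert(fields[1].name == "price" && fields[1].offset == 8);
    assert(fields[2].typeName == "int");
}
```

Going the other way, from an unknown file layout to D types, cannot be done at compile time, which matches the remark that it takes more work.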