
digitalmars.D - D and clusters

Kramer <Kramer_member pathlink.com> writes:
Has anyone tried running a D program over a cluster of any kind?  I'm wondering
how it would be handled with the GC statically built into the executable.  I'm
guessing it would be fine.  I've just been reading articles lately about
bioinformatics and how a lot of lower-level libraries are built with C and then
used by Python or Java and often distributed over a cluster.  D might be a good
fit in this arena: as a library workhorse or as a pipeline driver front-end.

Kramer
Jan 18 2005
Kris <Kris_member pathlink.com> writes:
In article <csjsuu$2db2$1 digitaldaemon.com>, Kramer says...
> Has anyone tried running a D program over a cluster of any kind?  I'm wondering
> how it would be handled with the GC statically built into the executable.  I'm
> guessing it would be fine.  I've just been reading articles lately about
> bioinformatics and how a lot of lower-level libraries are built with C and then
> used by Python or Java and often distributed over a cluster.  D might be a good
> fit in this arena: as a library workhorse or as a pipeline driver front-end.

Mango (dsource.org) has a reasonably extensive clustering package: it distributes queue & cache content as D classes, supports optimized cache coherency, and can squirt behaviour around the network as mobile tasks. It's also rather easy to use.

However, the GC really needs to support DLLs properly to make the latter operate in a robust, truly dynamic manner. That is, if the mobile-code functionality you need can be defined statically (per cluster node), then it will currently operate just fine within a cluster. If you need dynamic, Java-style loading of classes via a DLL distribution mechanism, then you may run into the memory-management problems that plague D & DLLs: each DLL will end up with its own GC, which can wreak havoc if you're not rather careful.

Mango.cluster is designed to handle both scenarios, but you currently have to be aware of the multiple-GC issues within the dynamic scenario. There are a number of past topics where people are lamenting the lack of useful GC support vis-a-vis DLLs. Any dynamic system will run into these problems: imagine having to reconfigure & reboot an entire site just to install a new servlet ...

Walter prefers everything to be statically linked, and has previously indicated that he does not like DLLs at all (due to potential versioning problems). This is likely the reason why the multiple-GC problem apparently has rather minimal priority.

Needless to say, many of us believe D would benefit greatly from some attention in this arena.

Two things need to happen:

1) The GC has to be isolated into a DLL itself (so there's only one instance)

2) As I recall, the static-data extents of each DLL have to be registered with the GC (in the same manner as executables). This could theoretically be done manually, but should be done by the compiler instead.

Sean has been working on #1 (as part of the 'Ares' project), while #2 really needs support from Walter himself. I can only suggest that more people encourage Walter to assist. Failing that, Sean's work could get picked up by GDC and a language fork could occur.

- Kris
Jan 19 2005
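Kris's second point, registering each DLL's static-data extents with the one collector, can be illustrated with a minimal C sketch. It is not taken from Mango, Ares, or the D runtime: `gc_add_root_range`, `gc_remove_root_range`, and `gc_is_rooted` are hypothetical names, and the "scan" is a toy conservative check for a single address rather than a real mark phase. Each module reports the address range of its static data when it loads, the collector treats every pointer-sized word in those ranges as a possible reference, and the module withdraws its range before it unloads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical root registry owned by the single, shared collector. */
#define MAX_ROOT_RANGES 16

typedef struct {
    const void *begin;   /* first byte of a module's static data */
    const void *end;     /* one past the last byte */
} RootRange;

static RootRange root_ranges[MAX_ROOT_RANGES];
static size_t root_count = 0;

/* Called by each module (exe or DLL) at load time.
   Returns 0 on success, -1 if the registry is full. */
int gc_add_root_range(const void *begin, const void *end)
{
    assert((const char *)begin <= (const char *)end);
    if (root_count == MAX_ROOT_RANGES)
        return -1;
    root_ranges[root_count].begin = begin;
    root_ranges[root_count].end = end;
    root_count++;
    return 0;
}

/* Called at unload so the collector stops scanning memory that
   is about to disappear. Order of ranges is not preserved. */
void gc_remove_root_range(const void *begin)
{
    for (size_t i = 0; i < root_count; i++) {
        if (root_ranges[i].begin == begin) {
            root_ranges[i] = root_ranges[--root_count]; /* swap-remove */
            return;
        }
    }
}

/* Toy conservative scan: returns 1 if any pointer-sized word in a
   registered range equals obj (so obj must be kept alive), else 0.
   Assumes every range is pointer-aligned. */
int gc_is_rooted(const void *obj)
{
    for (size_t i = 0; i < root_count; i++) {
        const void *const *word = (const void *const *)root_ranges[i].begin;
        const void *const *stop = (const void *const *)root_ranges[i].end;
        for (; word < stop; word++)
            if (*word == obj)
                return 1;
    }
    return 0;
}
```

In a test, two small arrays of `void *` can stand in for the executable's and a DLL's static data: an object referenced only from the DLL's array counts as rooted after that range is registered and stops counting once it is removed. A DLL that never registers its range would leave such objects looking unreachable, so a shared collector could free them while the DLL still uses them.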
Kramer <Kramer_member pathlink.com> writes:
In article <csmcjh$5ro$1 digitaldaemon.com>, Kris says...

[snip]
> Mango.cluster is designed to handle both scenarios, but you currently have to be
> aware of the multiple-GC issues within the dynamic scenario. There are a number
> of past topics where people are lamenting the lack of useful GC support
> vis-a-vis DLLs. Any dynamic system will run into these problems: imagine having
> to reconfigure & reboot an entire site just to install a new servlet ...
>
> Walter prefers everything to be statically linked, and has previously indicated
> that he does not like DLLs at all (due to potential versioning problems). This
> is likely the reason why the multiple-GC problem apparently has rather minimal
> priority.
>
> Needless to say, many of us believe D would benefit greatly from some attention
> in this arena.
>
> Two things need to happen:
>
> 1) The GC has to be isolated into a DLL itself (so there's only one instance)
>
> 2) As I recall, the static-data extents of each DLL have to be registered with
> the GC (in the same manner as executables). This could theoretically be done
> manually, but should be done by the compiler instead.
>
> Sean has been working on #1 (as part of the 'Ares' project), while #2 really
> needs support from Walter himself. I can only suggest that more people encourage
> Walter to assist. Failing that, Sean's work could get picked up by GDC and a
> language fork could occur.
>
> - Kris

- Kris

There definitely were some posts mentioning DLLs and the GC as important issues when the MIID thread was active:

digitalmars.D/10456
digitalmars.D/9166
digitalmars.D/10555

-Kramer
Jan 20 2005
"Matthew" <admin.hat stlsoft.dot.org> writes:
"Kris" <Kris_member pathlink.com> wrote in message
news:csmcjh$5ro$1 digitaldaemon.com...
> Mango (dsource.org) has a reasonably extensive clustering package: it
> distributes queue & cache content as D classes, supports optimized cache
> coherency, and can squirt behaviour around the network as mobile tasks. It's
> also rather easy to use.
>
> However, the GC really needs to support DLLs properly to make the latter
> operate in a robust, truly dynamic manner. That is, if the mobile-code
> functionality you need can be defined statically (per cluster node), then it
> will currently operate just fine within a cluster. If you need dynamic,
> Java-style loading of classes via a DLL distribution mechanism, then you may run
> into the memory-management problems that plague D & DLLs: each DLL will end up
> with its own GC, which can wreak havoc if you're not rather careful.
>
> Mango.cluster is designed to handle both scenarios, but you currently have to
> be aware of the multiple-GC issues within the dynamic scenario. There are a
> number of past topics where people are lamenting the lack of useful GC support
> vis-a-vis DLLs. Any dynamic system will run into these problems: imagine having
> to reconfigure & reboot an entire site just to install a new servlet ...
>
> Walter prefers everything to be statically linked, and has previously indicated
> that he does not like DLLs at all (due to potential versioning problems). This
> is likely the reason why the multiple-GC problem apparently has rather minimal
> priority.
>
> Needless to say, many of us believe D would benefit greatly from some attention
> in this arena.
>
> Two things need to happen:
>
> 1) The GC has to be isolated into a DLL itself (so there's only one instance)
>
> 2) As I recall, the static-data extents of each DLL have to be registered with
> the GC (in the same manner as executables). This could theoretically be done
> manually, but should be done by the compiler instead.
>
> Sean has been working on #1 (as part of the 'Ares' project), while #2 really
> needs support from Walter himself. I can only suggest that more people encourage
> Walter to assist. Failing that, Sean's work could get picked up by GDC and a
> language fork could occur.

I'm not convinced that the GC *has* to be in a DLL/.so, but I completely agree that this issue needs to be sorted. To be frank, I'm surprised it's not received any input in the months I've been away. I'll certainly help lend a voice and, sometime later next month, technical input to the cause.

But I would say now that I believe that D should support the following scenarios, all correctly functional:

1. Compilation of an exe, statically linked, without any non-system runtime dynamic library dependencies
2. Compilation of an exe, dynamically linked to use "The D DLL" (let's call it DGC.DLL)
3. Compilation of an exe, statically linked, that can load a statically linked D DLL
4. Compilation of an exe, statically linked, that can load a dynamically linked D DLL (i.e. the DLL uses DGC.DLL)
5. Compilation of an exe, dynamically linked to DGC.DLL, that can load a statically linked D DLL
6. Compilation of an exe, dynamically linked to DGC.DLL, that can load a dynamically linked D DLL (i.e. the DLL uses DGC.DLL)

If it fails to do any of these, it's stillborn, IMO, since it will fail to be better than C++ and/or Java/.NET in their respective areas of weakness. AFAIK the current state of play is that only 1 is supported, and possibly 3. If we say that 1, 3, 5 & 4 are not needed, then it's easy to do, but D becomes another VM/DLL-hell white elephant joke like Java and .NET, suitable only for large-scale, highly proactively managed projects whose installations have to be nursed by experts.

I think I proposed many moons ago that the GC objects inside the exe *and* inside any DLLs must, at the epoch of their initialisation, work out who is in first, and defer to that. Naturally, there are some complications, since one might load two D DLLs from a non-D program. In such a case, were the second D DLL to defer all its GC to the first, and the first to be unloaded, the second one might snuff it in an unseemly fashion.

Methinks the better way would be to associate the GC with the _process_, rather than the _module_, so that each D-GC-using module either creates the GC or attaches to it if it already exists. Naturally, the single per-process GC would have to operate some kind of reference counting.

The other complication, of course, is where the GC code resides. If it's not in the process, but rather in a DLL, the second and subsequent D DLLs would themselves have to take module references (a la LoadLibrary(GetModuleFileName())) on the first, so as to ensure that its code stays in the process. I'm not sure of all the subtleties involved here off the top of my head, but at worst it might mean that the first D DLL would remain in memory for the lifetime of the process.
Jan 21 2005
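Matthew's per-process idea can likewise be sketched in C as a reference-counted collector handle: the first D module to initialise creates it, later modules attach to it, and it is destroyed only when the last module detaches. This is an illustration under stated assumptions, not a proposed runtime API: `ProcessGC`, `gc_attach`, `gc_detach`, and `gc_attached_modules` are hypothetical, one static variable stands in for whatever process-wide location real modules would have to share, and locking for multi-threaded loading is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-process collector state. A real GC would also hold
   heap pools, root ranges, locks, and so on. */
typedef struct {
    int attach_count;   /* number of D modules currently using this GC */
} ProcessGC;

/* ASSUMPTION: a plain static stands in for a process-wide slot that
   every module can find (e.g. inside a shared DLL). Separate modules
   with separate statics are exactly the multiple-GC problem. */
static ProcessGC *process_gc = NULL;

/* Called by each D module (exe or DLL) during initialisation:
   the first caller creates the GC, later callers attach to it.
   Returns NULL only if allocation fails. */
ProcessGC *gc_attach(void)
{
    if (process_gc == NULL) {
        process_gc = malloc(sizeof *process_gc);
        if (process_gc == NULL)
            return NULL;
        process_gc->attach_count = 0;
    }
    process_gc->attach_count++;
    return process_gc;
}

/* Called by each module during unload. The GC survives until the
   last module detaches, whatever order the modules unload in. */
void gc_detach(void)
{
    assert(process_gc != NULL && process_gc->attach_count > 0);
    if (--process_gc->attach_count == 0) {
        free(process_gc);
        process_gc = NULL;
    }
}

/* Number of modules currently attached (0 if no GC exists). */
int gc_attached_modules(void)
{
    return process_gc != NULL ? process_gc->attach_count : 0;
}
```

With this scheme, unloading the module that happened to create the GC leaves it intact as long as another module is still attached, which is the failure case of the "first one in" approach. It only covers the collector's data; keeping the collector's code mapped when it lives in the first DLL is the separate problem that the LoadLibrary(GetModuleFileName()) reference addresses.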