digitalmars.D - std.concurrency wrapper over MPI?

dsimcha <dsimcha yahoo.com> writes:
I've finally bitten the bullet and learned MPI 
(http://en.wikipedia.org/wiki/Message_passing_interface) for an ultra 
computationally intensive research project I've been working on lately. 
  I wrote all the MPI-calling code in D against the C API, using a very 
quick-and-dirty (i.e. not releasable) translation of the parts of the 
header I needed.

I'm halfway-thinking of writing a std.concurrency-like interface on top 
of MPI in D.  A few questions:

1.  Is anyone besides me interested in this?

2.  Is anyone already working on something similar?

3.  Would this be Phobos material even though it would depend on MPI, or 
would it better be kept as a 3rd party library?

4.  std.concurrency in its current incarnation doesn't allow objects 
with mutable indirection to be passed as messages.   This makes sense 
when passing messages between threads in the same address space. 
However, for passing between MPI processes, the object is going to be 
copied anyhow.  Should the restriction be kept (for consistency) or 
removed (because it doesn't serve much of a purpose in the MPI context)?

5.  For passing complex object graphs, serialization would obviously be 
necessary.  What's the current state of the art in serialization in D? 
I want something that's efficient and general first and foremost.  I 
really don't care about human readability or standards compliance (in 
other words, no XML or JSON or anything like that).
Aug 05 2011
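An editorial sketch for question 4, assuming that std.concurrency's thread-to-thread restriction is roughly what `std.traits.hasUnsharedAliasing` tests (its internal check is similar but not necessarily identical). It contrasts that rule with the stricter one a raw-byte MPI transport would need, `std.traits.hasIndirections`: once data leaves the address space, even a pointer to immutable data is meaningless. The template and struct names are hypothetical.

```d
import std.traits : hasIndirections, hasUnsharedAliasing;

/// Approximates std.concurrency's rule for messages between threads:
/// no references to unshared mutable data (hypothetical name).
enum bool sendableBetweenThreads(T) = !hasUnsharedAliasing!T;

/// A raw-byte copy between MPI processes is only safe without any
/// pointers at all, mutable or immutable (hypothetical name).
enum bool sendableAsRawBytes(T) = !hasIndirections!T;

struct Point { double x, y; }     // plain data, no indirection
struct Tagged { string label; }   // immutable(char)[]: immutable indirection
struct Buffer { int[] data; }     // mutable indirection

static assert(sendableBetweenThreads!Point && sendableAsRawBytes!Point);
static assert(sendableBetweenThreads!Tagged && !sendableAsRawBytes!Tagged);
static assert(!sendableBetweenThreads!Buffer && !sendableAsRawBytes!Buffer);
```

Under these assumptions, relaxing the restriction for MPI does not make a type like `Buffer` sendable for free; it would still need serialization rather than a raw copy.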
Russel Winder <russel russel.org.uk> writes:

On Fri, 2011-08-05 at 23:51 -0400, dsimcha wrote:
[ . . . ]
 1.  Is anyone besides me interested in this?

MPI may be ancient, it may be a bit daft in terms of its treatment of marshalling, unmarshalling and serializing, and it may be only a Fortran and C thing bolted into C++ (quite well), but it is the de facto standard for HPC. OK, so HPC is about 10% of world-wide computing, probably less than that in terms of spend despite the enormous per-installation price, but it is about 90% of political marketing. Any short-term parallelism strategy must include MPI -- and work with OpenMPI and MPICH2. So I don't think it is just a matter of interest for D: I would say that if D is to stand with C++, C and Fortran, then there has to be an MPI API. Even though MPI should be banned going forward.
 2.  Is anyone already working on something similar.
 3.  Would this be Phobos material even though it would depend on MPI, or
 would it better be kept as a 3rd party library?

Given that it requires a transitive dependency then either Phobos goes forward with optional dependencies or the MPI API is a separate thing. Given my personal opinion that actor model, dataflow model, agents, etc. should be the application level concurrency and parallelism model, I would be quite happy with an MPI API not being in Phobos. Keep Phobos for that which every D installation will need. MPI is a niche market in that respect. Optional dependencies sort of work but are sort of a real pain in the Java/Maven milieu.
 4.  std.concurrency in its current incarnation doesn't allow objects
 with mutable indirection to be passed as messages.   This makes sense
 when passing messages between threads in the same address space.
 However, for passing between MPI processes, the object is going to be
 copied anyhow.  Should the restriction be kept (for consistency) or
 removed (because it doesn't serve much of a purpose in the MPI context)?

At the root of this issue is local thread-based parallelism in a shared-memory context vs. cluster parallelism. MPI is a cluster solution, even though it can be used in multicore shared-memory situations. The point about enforced copying vs. potential sharing is obviously core to this. It has to be handled with absolute top-notch performance in mind. It is arguably a situation where programming language semantics and purity have to be sacrificed at the altar of performance. There are already far too many MPI applications written with far too much comms code in the application simply to ensure performance, because the MPI infrastructure cannot be trusted to do things fast enough if you use anything other than the bottom-most layer.
 5.  For passing complex object graphs, serialization would obviously be
 necessary.  What's the current state of the art in serialization in D?
 I want something that's efficient and general first and foremost.  I
 really don't care about human readability or standards compliance (in
 other words, no XML or JSON or anything like that).

Again, performance is everything, so nothing must get in the way of making it as fast as it can possibly be. The main problem here is going to be that when anything gets released, performance will be the only yardstick by which things are measured. Simplicity of code, ease of evolution of code, all the things professional developers value, will go out of the window. It's HPC after all :-)

I still think D needs a dataflow, CSP and data parallelism strategy, cf. Go, GPars, Akka, even Haskell. Having actors is good, but having only actors is not good, cf. Scala and Akka.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel russel.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Aug 05 2011
dsimcha <dsimcha yahoo.com> writes:
On 8/6/2011 2:57 AM, Russel Winder wrote:
 The main problem here is going to be that when anything gets released
 performance will be the only yardstick by which things are measured.
 Simplicity of code, ease of evolution of code, all the things
 professional developers value, will go out of the window.  It's HPC
 after all :-)

This is why, even though I do stuff that's arguably HPC, I can't stand the HPC community. Of course performance is important, but nothing should be so sacred as to be completely immune to tradeoffs. The thing that drew me to D is that you can get pretty good performance out of it without sacrificing that much ease of use compared to dynamic languages. Besides, you can always provide a high-level but not-that-efficient API for most cases and a lower-level API for when more control is needed.

Anyhow, D has one key advantage that makes it more tolerant of communication overhead than most languages: std.parallelism. At least the way things are set up on the cluster here at Johns Hopkins, each node has 8 cores. The "traditional" MPI way of doing things is apparently to allocate 8 MPI processes per node in this case, one per core. Instead, I'm allocating one process per node, using MPI only for very coarse-grained parallelism and using std.parallelism for more fine-grained parallelism to keep all 8 cores occupied with one MPI process.
Aug 06 2011
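A minimal sketch of the intra-node half of the hybrid scheme just described, using the real `std.parallelism` API (`parallel` and `taskPool.reduce`). The MPI half is left out; `expensiveWork`, `processChunk` and `partialSum` are hypothetical names, and `chunk` stands for whatever slice of the data MPI hands to this process.

```d
import std.algorithm : map;
import std.math : sqrt;
import std.parallelism : parallel, taskPool;

/// Hypothetical stand-in for the expensive per-item work one MPI process does.
double expensiveWork(double x)
{
    double acc = 0;
    foreach (i; 1 .. 1_000)
        acc += sqrt(x * i);
    return acc;
}

/// Processes this process's share of the data on every core of the node.
/// In the hybrid scheme, `chunk` is the slice MPI assigned to this rank.
double[] processChunk(const(double)[] chunk)
{
    auto results = new double[chunk.length];
    // Fine-grained parallelism: iterations are spread over taskPool's workers;
    // each iteration writes a distinct index, so no locking is needed.
    foreach (i, x; parallel(chunk))
        results[i] = expensiveWork(x);
    return results;
}

/// A parallel reduction over the same chunk, e.g. a partial sum that would
/// then be combined across nodes with a single coarse-grained MPI reduce.
double partialSum(const(double)[] chunk)
{
    return taskPool.reduce!"a + b"(0.0, chunk.map!expensiveWork);
}
```

By default `taskPool` has `totalCPUs - 1` worker threads and the calling thread also takes part in the loop, so one process per node can keep all cores busy.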
dsimcha <dsimcha yahoo.com> writes:
On 8/6/2011 2:57 AM, Russel Winder wrote:
 The main problem here is going to be that when anything gets released
 performance will be the only yardstick by which things are measured.
 Simplicity of code, ease of evolution of code, all the things
 professional developers value, will go out of the window.  It's HPC
 after all :-)

Now that I think of it, there's also the option of porting boost::mpi to D and then possibly writing a std.concurrency-like wrapper on top of that (or not).
Aug 06 2011
Jacob Carlborg <doob me.com> writes:
On 2011-08-06 05:51, dsimcha wrote:
 5. For passing complex object graphs, serialization would obviously be
 necessary. What's the current state of the art in serialization in D? I
 want something that's efficient and general first and foremost. I really
 don't care about human readability or standards compliance (in other
 words, no XML or JSON or anything like that).

My rewrite of Orange is almost finished. It can currently only serialize to XML, but it's possible to create new archive types for other formats. I have no idea about the performance; I'm mostly focusing on being able to serialize as many types as possible.

http://dsource.org/projects/orange/

-- 
/Jacob Carlborg
Aug 06 2011
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 1.  Is anyone besides me interested in this?

Other people are interested.
 3.  Would this be Phobos material even though it would depend on MPI, or 
 would it better be kept as a 3rd party library?

I'd like one or more Phobos modules built on top of the basic MPI, so I think it's better to have MPI too in Phobos.
 Should the restriction be kept (for consistency) or 
 removed (because it doesn't serve much of a purpose in the MPI context)?

This is not easy for me to say now.

Bye,
bearophile
Aug 06 2011
Russel Winder <russel russel.org.uk> writes:

On Sat, 2011-08-06 at 10:09 -0400, dsimcha wrote:
[ . . . ]
 Anyhow, D has one key advantage that makes it more tolerant of
 communication overhead than most languages:  std.parallelism.  At least
 the way things are set up on the cluster here at Johns Hopkins, each
 node has 8 cores.  The "traditional" MPI way of doing things is
 apparently to allocate 8 MPI processes per node in this case, one per
 core.  Instead, I'm allocating one process per node, using MPI only for
 very coarse grained parallelism and using std.parallelism for more
 fine-grained parallelism to keep all 8 cores occupied with one MPI process.

I think increasingly the idiom in the Fortran/C/C++ HPC community is to use MPI on a per-address-space basis, rather than a per-ALU basis, and to use OpenMP to handle the thread control within a given address space, handling the multicores. (OpenMP being something totally different to OpenMPI.) In the C++ arena, though, there is Threading Building Blocks (TBB), which has elements of arcane-ness but is a whole lot better than OpenMP. As you point out, there are much better, generally higher-level, abstractions that would make HPC code faster as well as much, much easier to maintain. However, even with Intel's high-budget marketing of some of the alternatives, the HPC community seems steadfast in its support of MPI and OpenMP. Of course they also have codes from the 1970s and 1980s that are in continued use because no-one is prepared to rewrite them.

-- 
Russel.
Aug 06 2011
Sean Kelly <sean invisibleduck.org> writes:
I'd love to be able to send classes between processes, but first we need a good serialization/deserialization mechanism.

Sent from my iPhone

Aug 06 2011
Jacob Carlborg <doob me.com> writes:
On 2011-08-06 18:32, Sean Kelly wrote:
 I'd love to be able to send classes between processes, but first we need a
good serialization/deserialization mechanism.

Have a look at Orange. I don't know if it's considered good, but it works for almost all types available in D; the only available archive is currently XML.

http://dsource.org/projects/orange/

-- 
/Jacob Carlborg
Aug 06 2011
Jacob Carlborg <doob me.com> writes:
On 2011-08-07 02:24, Sean Kelly wrote:
 Is the archive formatter dynamically pluggable?

I'm not exactly sure what you mean but you can create new archive types and use them with the existing serializer. When creating a new serializer it takes an archive (as an interface) as a parameter.
-- /Jacob Carlborg
Aug 07 2011
Jacob Carlborg <doob me.com> writes:
On 2011-08-07 18:15, Sean Kelly wrote:
 I was mostly wondering if the serializer was all template code or if the
 archive portion used some form of polymorphism. Sounds like it's the latter.

The serializer uses template methods, the archive uses interfaces and virtual methods.
-- /Jacob Carlborg
Aug 07 2011
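To make that split concrete, here is a sketch of the design Jacob describes: a template-based serializer that introspects the static type at compile time and hands each value to an archive through an interface, so the archive format can be chosen at run time. `Archive`, `TextArchive`, `Serializer` and their methods are hypothetical names for illustration, not Orange's actual API. The top-level object gets a serializer-chosen numeric key and members use their field names, loosely following Jacob's description of keys later in the thread.

```d
import std.conv : to;

/// Runtime-pluggable back end (hypothetical; not Orange's real interface).
interface Archive
{
    void beginObject(string typeName, string key);
    void archiveValue(string key, string value);
    void endObject();
    string data();
}

/// A toy text format; a binary archive would implement the same interface.
class TextArchive : Archive
{
    private string buf;
    void beginObject(string typeName, string key) { buf ~= key ~ ":" ~ typeName ~ "{"; }
    void archiveValue(string key, string value) { buf ~= key ~ "=" ~ value ~ ";"; }
    void endObject() { buf ~= "}"; }
    string data() { return buf; }
}

/// Compile-time front end: templates walk the static type's fields and
/// forward each value through virtual calls on whichever archive was passed in.
class Serializer
{
    private Archive archive;
    private size_t nextKey; // serializer-chosen keys for top-level values

    this(Archive archive) { this.archive = archive; }

    string serialize(T)(T value) if (is(T == struct))
    {
        archive.beginObject(T.stringof, to!string(nextKey++));
        foreach (i, field; value.tupleof)
            archive.archiveValue(__traits(identifier, T.tupleof[i]), to!string(field));
        archive.endObject();
        return archive.data();
    }
}

struct Point { int x; int y; }
```

Passing every value as a string keeps the sketch short; a real binary archive would more likely want typed overloads (`int`, `double`, arrays) on the interface.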
dsimcha <dsimcha yahoo.com> writes:
On 8/6/2011 12:32 PM, Sean Kelly wrote:
 I'd love to be able to send classes between processes, but first we need a
good serialization/deserialization mechanism.

The more I think about it, the more I think that std.concurrency isn't quite the right interface for cluster parallelism. I'm thinking instead of doing something loosely based on, but not a translation of, boost::mpi. The following differences between std.concurrency and what makes sense for MPI bother me:

1.  shared/immutable isn't needed when you're copying the data anyhow.

2.  spawn() is taken care of by the MPI runtime.

3.  std.concurrency doesn't support broadcasting.
Aug 07 2011
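Points 2 and 3 can be illustrated with std.concurrency itself. It only has point-to-point `send`, so a broadcast has to be a loop over `Tid`s, and the program has to spawn its own workers, whereas an MPI runtime starts all ranks itself. The sketch uses the real `spawn`, `send`, `receiveOnly` and `ownerTid`; `worker`, `broadcast` and `runBroadcastDemo` are hypothetical helpers.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn, Tid;

/// Worker thread: waits for the broadcast value, then replies to its owner.
void worker(int rank)
{
    auto factor = receiveOnly!int();
    send(ownerTid, rank * factor);
}

/// std.concurrency only offers point-to-point send, so a "broadcast" is
/// a loop over every recipient (hypothetical helper).
void broadcast(T)(Tid[] tids, T message)
{
    foreach (tid; tids)
        send(tid, message);
}

/// Spawns n workers, broadcasts `factor` to all of them and sums the replies.
int runBroadcastDemo(int n, int factor)
{
    Tid[] tids;
    foreach (rank; 0 .. n)
        tids ~= spawn(&worker, rank); // under MPI, the runtime starts the ranks instead
    broadcast(tids, factor);

    int total = 0;
    foreach (_; 0 .. n)
        total += receiveOnly!int();
    return total;
}
```

An MPI-style API would instead expose broadcast as one collective call, which the MPI implementation is free to route as a tree rather than as N separate sends.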
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 05 August 2011 23:51:24 dsimcha wrote:
 I've finally bitten the bullet and learned MPI
 (http://en.wikipedia.org/wiki/Message_passing_interface) for an ultra
 computationally intensive research project I've been working on lately.
   I wrote all the MPI-calling code in D against the C API, using a very
 quick-and-dirty (i.e. not releasable) translation of the parts of the
 header I needed.
 
 I'm halfway-thinking of writing a std.concurrency-like interface on top
 of MPI in D.  A few questions:
 
 1.  Is anyone besides me interested in this?

Personally, I've never heard of MPI and have no interest in it whatsoever, but I don't do much with concurrent programming. Others will probably be far more interested though.
 3.  Would this be Phobos material even though it would depend on MPI, or
 would it better be kept as a 3rd party library?

If MPI is something which can be found by default on your typical OS install, then it may be okay to have it in Phobos. But in general, I would think that if you need to install 3rd party libraries to use it, it should probably be a 3rd party library itself. It may be a bit of a grey area though. If we have many libraries such as that, then we may want to create an official (or at least pseudo-official) project which contains the major ones to make them easy to find. Regardless, we _don't_ want Phobos to require extra dependencies which aren't normally found on your typical OS install, such that you have to install them even if you don't use the functionality that they're needed for.

- Jonathan M Davis
Aug 06 2011
jdrewsen <jdrewsen nospam.com> writes:
On 06-08-2011 05:51, dsimcha wrote:
 I've finally bitten the bullet and learned MPI
 (http://en.wikipedia.org/wiki/Message_passing_interface) for an ultra
 computationally intensive research project I've been working on lately.
 I wrote all the MPI-calling code in D against the C API, using a very
 quick-and-dirty (i.e. not releasable) translation of the parts of the
 header I needed.

 I'm halfway-thinking of writing a std.concurrency-like interface on top
 of MPI in D. A few questions:

 1. Is anyone besides me interested in this?

 2. Is anyone already working on something similar.

 3. Would this be Phobos material even though it would depend on MPI, or
 would it better be kept as a 3rd party library?

I think std.concurrency needs to define a new interface for passing messages out-of-process, i.e. to another process or host. The implementation itself should probably be 3rd party, since there are many serialized representations and protocols out there to pick from.
 4. std.concurrency in its current incarnation doesn't allow objects with
 mutable indirection to be passed as messages. This makes sense when
 passing messages between threads in the same address space. However, for
 passing between MPI processes, the object is going to be copied anyhow.
 Should the restriction be kept (for consistency) or removed (because it
 doesn't serve much of a purpose in the MPI context)?

 5. For passing complex object graphs, serialization would obviously be
 necessary. What's the current state of the art in serialization in D? I
 want something that's efficient and general first and foremost. I really
 don't care about human readability or standards compliance (in other
 words, no XML or JSON or anything like that).

AFAIK David Nadlinger is handling serialization in the GSoC Thrift project he is currently working on.

/Jonas
Aug 06 2011
dsimcha <dsimcha yahoo.com> writes:
On 8/6/2011 5:38 PM, jdrewsen wrote:
 AFAIK David Nadlinger is handling serialization in his GSOC Thrift
 project that he is working on currently.

 /Jonas

Good to know, but what flavor? As I see it, there is a three-way tradeoff in serialization. In order of importance for distributed parallelism, the qualities are:

1.  Efficiency. How much does it cost to serialize/unserialize something and how much space overhead is there?

2.  Flexibility w.r.t. types: How many types can be serialized? How faithfully are they reproduced on the other end w.r.t. things like pointer/reference/slice aliasing?

3.  Standardization: How universally understood is the format? Can it be used to send data across different CPU architectures? Across languages? Is it human readable? Is it based on some meta-format like XML?

For enterprisey use cases, I think this ordering would probably be completely reversed. For example, in a typical MPI cluster all nodes are of the same architecture, so it's usually perfectly reasonable to send arrays of primitives as just raw bits. I imagine this is a terrible idea in other contexts that I know less about.
Aug 06 2011
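A sketch of the "raw bits" approach for arrays of primitives, assuming sender and receiver agree on endianness, type sizes and floating-point layout, as on a homogeneous cluster. `asRawBytes` and `fromRawBytes` are hypothetical names; the template constraints use `std.traits.hasIndirections` to refuse types containing pointers, since their bits would be meaningless in another process.

```d
import std.exception : enforce;
import std.traits : hasIndirections;

/// Views an array of plain values as bytes, without copying. Only meaningful
/// if the receiver has the same endianness and type layout as the sender.
const(ubyte)[] asRawBytes(T)(const(T)[] values) if (!hasIndirections!T)
{
    return cast(const(ubyte)[]) values;
}

/// Rebuilds an array of T from received bytes, copying into a freshly
/// allocated (and therefore correctly aligned) buffer.
T[] fromRawBytes(T)(const(ubyte)[] bytes) if (!hasIndirections!T)
{
    enforce(bytes.length % T.sizeof == 0, "byte count is not a multiple of T.sizeof");
    auto result = new T[bytes.length / T.sizeof];
    (cast(ubyte[]) result)[] = bytes[];
    return result;
}
```

The send side costs nothing beyond the transfer itself; the receive side copies once, which is the price of not trusting the alignment of the incoming buffer.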
Jacob Carlborg <doob me.com> writes:
On 2011-08-07 00:09, dsimcha wrote:
 On 8/6/2011 5:38 PM, jdrewsen wrote:
 AFAIK David Nadlinger is handling serialization in his GSOC Thrift
 project that he is working on currently.

 /Jonas

 Good to know, but what flavor? As I see it there is a three-way tradeoff in serialization. In order of importance for distributed parallelism, the qualities are:

I can answer these tradeoffs for the Orange serialization library, http://dsource.org/projects/orange/.
 1. Efficiency. How much does it cost to serialize/unserialize something
 and how much space overhead is there?

I haven't done any measurements, but I would guess it depends on which archive type is used. The actual serializer tries to do quite a lot at compile time, where possible. But it also stores a reference for every serialized value, in case a pointer points to the value.
 2. Flexibility w.r.t. types: How many types can be serialized? How
 faithfully are they reproduced on the other end w.r.t. things like
 pointer/reference/slice aliasing?

If I haven't missed something Orange can serialize almost all types, except unions, function pointers, void pointers and delegates.
 3. Standardization: How universally understood is the format? Can it be
 used to send data across different CPU architectures? Across languages?
 Is it human readable? Is it based on some meta-format like XML?

Currently, the only available format is XML.

-- 
/Jacob Carlborg
Aug 07 2011
dsimcha <dsimcha yahoo.com> writes:
On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

Ok, I'll look into writing a binary archiver that assumes that the CPU architecture on the deserializing end is the same as that on the serializing end. If it works, maybe Orange is a good choice.
Aug 07 2011
Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:
dsimcha wrote:

 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU architecture on the deserializing end is the same as that on the serializing end. If it works, maybe Orange is a good choice.

Just in case you missed it, the messagepack protocol has a D implementation which seems to be what you're looking for: http://msgpack.org/ The last commit on bitbucket reveals it should be compatible with 2.054. Perhaps it can be adapted as an archiver for Orange.
Aug 07 2011
Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:
link for the D implementation: https://bitbucket.org/repeatedly/msgpack4d/
Aug 07 2011
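For a sense of how compact MessagePack is, here is a sketch of how its specification encodes non-negative integers: a positive fixint for values up to 0x7f, otherwise a type byte (0xcc, 0xcd, 0xce, 0xcf for uint 8/16/32/64) followed by the value in big-endian order. It is written from the format spec for illustration and is not msgpack4d's API; `packUint` is a hypothetical name.

```d
import std.bitmanip : nativeToBigEndian;

/// Encodes an unsigned integer using the smallest MessagePack format.
ubyte[] packUint(ulong v)
{
    ubyte[] buf;
    if (v <= 0x7f)
    {
        buf ~= cast(ubyte) v;                    // positive fixint: 0xxxxxxx
    }
    else if (v <= ubyte.max)
    {
        buf ~= 0xcc;                             // uint 8
        buf ~= cast(ubyte) v;
    }
    else if (v <= ushort.max)
    {
        buf ~= 0xcd;                             // uint 16, big-endian
        auto be = nativeToBigEndian(cast(ushort) v);
        buf ~= be[];
    }
    else if (v <= uint.max)
    {
        buf ~= 0xce;                             // uint 32, big-endian
        auto be = nativeToBigEndian(cast(uint) v);
        buf ~= be[];
    }
    else
    {
        buf ~= 0xcf;                             // uint 64, big-endian
        auto be = nativeToBigEndian(v);
        buf ~= be[];
    }
    return buf;
}
```

Small values cost a single byte, and unlike the raw-bits approach the big-endian encoding is portable across architectures, at the price of a branch and a byte swap per value.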
dsimcha <dsimcha yahoo.com> writes:
On 8/7/2011 12:01 PM, Lutger Blijdestijn wrote:
 dsimcha wrote:

 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU architecture on the deserializing end is the same as that on the serializing end. If it works, maybe Orange is a good choice.

 Just in case you missed it, the messagepack protocol has a D implementation which seems to be what you're looking for: http://msgpack.org/ The last commit on bitbucket reveals it should be compatible with 2.054. Perhaps it can be adapted as an archiver for Orange.

Ok, this sounds great. Again, though, it would be great to get serialization into Phobos. (I don't know whether messagepack is suitable in its current form, because I haven't looked in detail.) I was vaguely aware of a messagepack implementation for D, but I didn't realize it was still maintained and didn't know where it was hosted.
Aug 07 2011
Jacob Carlborg <doob me.com> writes:
On 2011-08-07 18:01, Lutger Blijdestijn wrote:
 dsimcha wrote:

 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU architecture on the deserializing end is the same as that on the serializing end. If it works, maybe Orange is a good choice.

 Just in case you missed it, the messagepack protocol has a D implementation which seems to be what you're looking for: http://msgpack.org/ The last commit on bitbucket reveals it should be compatible with 2.054. Perhaps it can be adapted as an archiver for Orange.

I think it should be possible.

-- 
/Jacob Carlborg
Aug 07 2011
Jacob Carlborg <doob me.com> writes:
On 2011-08-07 17:45, dsimcha wrote:
 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU architecture on the deserializing end is the same as that on the serializing end. If it works, maybe Orange is a good choice.

Sounds good. I just hope that the current design allows for a binary archive. Currently the serializer in Orange assumes that an archive can deserialize a value based on a key, which could be basically anywhere in the serialized data. This at least allows implementing archives which store the serialized data in a structured format, e.g. XML, JSON, YAML. I don't know if that's possible with a binary format; I'm not familiar with how to implement one.

-- 
/Jacob Carlborg
Aug 07 2011
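On whether a binary archive can support key-based lookup: one possible approach (an assumption, not how Orange works) is to write each value as a length-prefixed `(key, payload)` record and, when reading, scan the buffer once to build an index from key to payload. Lengths are stored as native-endian `uint`s, so this also assumes same-architecture peers; `putRecord` and `indexRecords` are hypothetical names.

```d
import std.exception : enforce;

/// Appends one record: [key length][key bytes][payload length][payload bytes].
void putRecord(ref ubyte[] buf, string key, const(ubyte)[] payload)
{
    putLength(buf, key.length);
    buf ~= cast(const(ubyte)[]) key;
    putLength(buf, payload.length);
    buf ~= payload;
}

private void putLength(ref ubyte[] buf, size_t n)
{
    enforce(n <= uint.max, "record too large");
    uint len = cast(uint) n;
    buf ~= (cast(ubyte*) &len)[0 .. uint.sizeof]; // native byte order
}

private size_t takeLength(ref const(ubyte)[] data)
{
    enforce(data.length >= uint.sizeof, "truncated length");
    uint len;
    (cast(ubyte*) &len)[0 .. uint.sizeof] = data[0 .. uint.sizeof];
    data = data[uint.sizeof .. $];
    return len;
}

/// Scans the buffer once and indexes every payload by its key, so values
/// can then be fetched in any order, as a key-based archive expects.
const(ubyte)[][string] indexRecords(const(ubyte)[] data)
{
    const(ubyte)[][string] index;
    while (data.length)
    {
        auto keyLen = takeLength(data);
        enforce(data.length >= keyLen, "truncated key");
        auto key = cast(string) data[0 .. keyLen].idup;
        data = data[keyLen .. $];
        auto payloadLen = takeLength(data);
        enforce(data.length >= payloadLen, "truncated payload");
        index[key] = data[0 .. payloadLen]; // slice into the buffer, no copy
        data = data[payloadLen .. $];
    }
    return index;
}
```

The overhead compared with a purely positional format is storing the keys plus one linear scan on load; after that, lookups by key are hash-table lookups.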
dsimcha <dsimcha yahoo.com> writes:
On 8/7/2011 2:27 PM, Jacob Carlborg wrote:
 On 2011-08-07 17:45, dsimcha wrote:
 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU architecture on the deserializing end is the same as that on the serializing end. If it works, maybe Orange is a good choice.

 Sounds good. I just hope that the current design allows for a binary archive. Currently the serializer in Orange assumes that an archive can deserialize a value based on a key which could be basically anywhere in the serialized data. This allows at least to implement archives which store the serialized data in a structured format, e.g. XML, JSON, YAML. I don't know if that's possible with a binary format, I'm not familiar with how to implement a binary format.

Yeah, I was trying to wrap my head around the whole "key" concept. I wasn't very successful. I also tried out Orange and filed a few bug reports. It may be that Orange isn't the right tool for the job for MPI, though modulo some bug fixing and polishing it could be extremely useful in different cases with different sets of tradeoffs.

In addition to the bug reports I filed, why is it necessary to write any serialization code to serialize through the base class? What's wrong with just doing something like:

class Base {}
class Derived : Base {}

void main() {
    auto serializer = new Serializer(new XMLArchive!());

    // Introspect Derived and figure out all the details automatically.
    serializer.register!(Derived);
}
Aug 07 2011
Jacob Carlborg <doob me.com> writes:
On 2011-08-07 21:28, dsimcha wrote:
 Yeah, I was trying to wrap my head around the whole "key" concept. I
 wasn't very successful. I also tried out Orange and filed a few bug
 reports. It may be that Orange isn't the right tool for the job for MPI,
 though modulo some bug fixing and polishing it could be extremely useful
 in different cases with different sets of tradeoffs.

Every serialized value has an associated key. The key should be unique in its context but doesn't have to be unique in the whole document. A key can be explicitly chosen, in which case that key will be used, or the serializer can create a key (just a number that is incremented). Example:

class Foo { int bar; }
auto foo = new Foo;

When serializing "foo", it will get the key "0", chosen by the serializer. When "bar" is serialized it will use the explicit key "bar". This way the serialization process won't depend on the order of instance variables or struct members. In addition to keys, all values have an associated id which is unique across the whole document. This is used for pointers and similar things which reference other variables.
 In addition to the bug reports I filed, why is it necessary to write any
 serialization code to serialize through the base class? What's wrong
 with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
 auto serializer = new Serializer(new XMLArchive!());

 // Introspect Derived and figure out all the details automatically.
 serializer.register!(Derived);
 }

I haven't thought about that; it seems it would work, and it would shorten the code a lot. This is a part that has not gone through the rewrite. Note that all documentation on the wiki pages is outdated; it only refers to the first version, 0.0.1. The unit tests can be used as documentation to see how to use the new version and how it behaves.

-- 
/Jacob Carlborg
Aug 07 2011
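A sketch of the id mechanism Jacob describes for preserving aliasing: the first time an object is written it gets a document-wide id, and later occurrences of the same instance are written as back-references to that id. The text format, `Node` and `serializeRefs` are hypothetical; object identity is tracked by address in an associative array.

```d
import std.conv : to;

/// Hypothetical node type; several list entries may refer to the same instance.
class Node
{
    int value;
    this(int value) { this.value = value; }
}

/// Writes each node once with a document-wide id ("obj:<id>=<value>;") and
/// every later occurrence of the same instance as a back-reference ("ref:<id>;").
string serializeRefs(Node[] nodes)
{
    size_t[void*] ids; // object address -> id assigned when first written
    string result;
    foreach (node; nodes)
    {
        auto addr = cast(void*) node;
        if (auto seen = addr in ids)
        {
            result ~= "ref:" ~ to!string(*seen) ~ ";";
        }
        else
        {
            immutable fresh = ids.length;
            ids[addr] = fresh;
            result ~= "obj:" ~ to!string(fresh) ~ "=" ~ to!string(node.value) ~ ";";
        }
    }
    return result;
}
```

A deserializer would keep the reverse map, from id to reconstructed object, so that `ref:0` resolves to the same instance rather than to a copy.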
Jacob Carlborg <doob me.com> writes:
On 2011-08-07 21:28, dsimcha wrote:
 In addition to the bug reports I filed, why is it necessary to write any
 serialization code to serialize through the base class? What's wrong
 with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
 auto serializer = new Serializer(new XMLArchive!());

 // Introspect Derived and figure out all the details automatically.
 serializer.register!(Derived);
 }

I've been thinking about this and currently I don't see how this would be possible. When serializing through a base class reference, the static type would be that of the base class. But what I need is the static type of the subclass, to be able to loop through the tuple returned by tupleof. The only information I can get about the subclass is basically the fully qualified name. What I would need is some kind of associative array that maps strings to types, but as far as I know that's not possible, especially since the strings would be runtime values.

-- 
/Jacob Carlborg
Aug 11 2011
dsimcha <dsimcha yahoo.com> writes:
On 8/11/2011 4:14 AM, Jacob Carlborg wrote:
 On 2011-08-07 21:28, dsimcha wrote:
 In addition to the bug reports I filed, why is it necessary to write any
 serialization code to serialize through the base class? What's wrong
 with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
 auto serializer = new Serializer(new XMLArchive!());

 // Introspect Derived and figure out all the details automatically.
 serializer.register!(Derived);
 }

 I've been thinking about this and currently I don't see how this would be possible. When serializing through a base class reference the static type would be of the base class. But what I need is the static type of the subclass, to be able to loop through the tuple returned by tupleof. The only information I can get about the subclass is basically the fully qualified name. What I would need is some kind of associative array that maps strings to types, but as far as I know that's not possible, specially since the strings would be runtime values.

You have classinfo as a key, as you point out. You also already have a template that's capable of serializing a class given that its static type is exactly its dynamic type. I was thinking something like:

import std.exception : enforce;

class Serializer {
    string delegate(Object)[TypeInfo_Class] registered;

    void register(T)() {
        registered[T.classinfo] = &downcastSerialize!(T);
    }

    string serialize(T)(T value) if(is(T : Object)) {
        if(value.classinfo is T.classinfo) {
            // Then the static type is exactly the runtime type.
            // Serialize it the same way you do now and return the result.
        } else {
            enforce(value.classinfo in registered,
                "Cannot serialize a " ~ value.classinfo.name ~
                " because it has not been registered.");
            return registered[value.classinfo](value);
        }
    }

    string downcastSerialize(T)(Object value) if(is(T : Object)) {
        auto casted = cast(T) value;
        assert(casted);
        assert(value.classinfo is T.classinfo);
        return serialize(casted);
    }
}
Aug 11 2011
next sibling parent dsimcha <dsimcha yahoo.com> writes:
On 8/11/2011 7:07 AM, dsimcha wrote:
 [ . . . ]

One small correction:

    string downcastSerialize(T)(Object value) if(is(T : Object)) {
        auto casted = cast(T) value;
        assert(casted);
        assert(casted.classinfo is T.classinfo);
        return serialize(casted);
    }
Aug 11 2011
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-08-11 13:07, dsimcha wrote:
 [ . . . ]

Cool, very clever. I didn't think of delegates, and especially not of creating a delegate out of a template method. Thanks. -- /Jacob Carlborg
Aug 11 2011
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Good to know, but what flavor? As I see it there is a three-way tradeoff
 in serialization. In order of importance for distributed parallelism,
 the qualities are:

I can answer these tradeoffs for the Orange serialization library, http://dsource.org/projects/orange/.

BTW, I know this has been discussed in the past, but I'll bring it up again. Since serialization is pretty fundamental to a lot of things and I want to avoid dependency hell, what are the prospects for getting Orange into Phobos?
Aug 07 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-08-07 17:58, dsimcha wrote:
 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Good to know, but what flavor? As I see it there is a three-way tradeoff
 in serialization. In order of importance for distributed parallelism,
 the qualities are:

I can answer these tradeoffs for the Orange serialization library, http://dsource.org/projects/orange/.

BTW, I know this has been discussed in the past, but I'll bring it up again. Since serialization is pretty fundamental to a lot of things and I want to avoid dependency hell, what are the prospects for getting Orange into Phobos?

To get Orange into Phobos, at least the following must be done:

* Actually finish the rewrite (I'm almost done; the basic stuff works)
* Add more unit tests
* Add documentation
* Rip out all D1- and Tango-related code
* Make some minor changes to follow the Phobos style guide; I have not followed the 80-120 column limit
* The XML module in Phobos needs some minor updates
* I've used my own kind of mini unit test framework; I don't know if people like that, but it should be easy to remove

I think that's all. -- /Jacob Carlborg
Aug 07 2011
prev sibling parent David Nadlinger <see klickverbot.at> writes:
On 8/7/11 12:09 AM, dsimcha wrote:
 On 8/6/2011 5:38 PM, jdrewsen wrote:
 AFAIK David Nadlinger is handling serialization in his GSOC Thrift
 project that he is working on currently.


The most important thing to note, and the reason it might not be appropriate for what you want to do, is that the intended main use case for Thrift is to define an interface that can easily be used from several programming languages, feeling »native« in each of them (similar to what protobuf does). As a consequence, Thrift by design only supports value types, so it is not possible to, e.g., serialize a tree or a DAG without »flattening« it first.

Another important feature of Thrift is protocol versioning: you can have required and optional fields, and the order of struct fields on the wire is not defined. While the schemata themselves are never serialized, the serialized data includes type tags and field ids for this purpose.

For the actual serialization format, there are several choices available; the most popular ones are currently implemented for D: Binary, which basically just dumps the raw bytes to the stream (all numbers are written in network byte order, though), Compact, which is a space-optimized binary protocol (zigzag varints, merging of some bytes where you know you don't need all bits, …), and a »rich« JSON format.

These features obviously come at a (manageable) performance cost, but apart from that, the code is quite heavily optimized for reading/writing performance. For example, while the protocols and transports (serialized data sources/sinks) are pluggable at runtime, it is possible to specialize all the serialization/RPC code for the actual implementations used. This eliminates all virtual calls and allows, e.g., the serialization code for a struct to be inlined into a single function without any control flow, and the reading code into a single switch statement (on the field ids) inside a loop.

But as said above, the second item from your list, flexibility with regard to the types serialized, is a non-goal for Thrift, so it probably isn't the best fit for your application.

David
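The zigzag varints mentioned for the Compact protocol are easy to illustrate. The following is a generic sketch of the technique, assuming 64-bit integers and the common layout of 7 payload bits per byte with the high bit as a continuation flag; the function names are hypothetical and this is not Thrift's D API.

```d
// Zigzag maps signed integers to unsigned ones so that small magnitudes,
// positive or negative, get small codes: 0->0, -1->1, 1->2, -2->3, ...
ulong zigzagEncode(long n) pure nothrow @safe @nogc
{
    return (cast(ulong) n << 1) ^ cast(ulong)(n >> 63); // >> is arithmetic for long
}

long zigzagDecode(ulong z) pure nothrow @safe @nogc
{
    return cast(long)(z >> 1) ^ -cast(long)(z & 1);
}

// Unsigned varint: 7 payload bits per byte, least significant group first,
// high bit set on every byte except the last.
ubyte[] writeVarint(ulong v) pure nothrow @safe
{
    ubyte[] buf;
    while (v >= 0x80)
    {
        buf ~= cast(ubyte)((v & 0x7F) | 0x80);
        v >>= 7;
    }
    buf ~= cast(ubyte) v;
    return buf;
}

// Reads one varint from the front of `data` and advances `data` past it.
// Assumes well-formed input: there is no check for overlong encodings,
// and truncated input only trips D's array bounds checking.
ulong readVarint(ref const(ubyte)[] data) @safe
{
    ulong result;
    uint shift;
    while (true)
    {
        ubyte b = data[0];
        data = data[1 .. $];
        result |= cast(ulong)(b & 0x7F) << shift;
        if ((b & 0x80) == 0)
            break;
        shift += 7;
    }
    return result;
}
```

Small negative numbers such as -64 therefore take a single byte, whereas a plain two's-complement varint of -64 would need the maximum length.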
Aug 09 2011
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
I'm hoping to simply extend the existing API. The crucial portion will be the addition of a Node (base) type.

Sent from my iPhone

On Aug 6, 2011, at 2:38 PM, jdrewsen <jdrewsen nospam.com> wrote:

 Den 06-08-2011 05:51, dsimcha skrev:
 I've finally bitten the bullet and learned MPI
 (http://en.wikipedia.org/wiki/Message_passing_interface) for an ultra
 computationally intensive research project I've been working on lately.
 I wrote all the MPI-calling code in D against the C API, using a very
 quick-and-dirty (i.e. not releasable) translation of the parts of the
 header I needed.

 I'm halfway-thinking of writing a std.concurrency-like interface on top
 of MPI in D. A few questions:

 1. Is anyone besides me interested in this?

 2. Is anyone already working on something similar.

 3. Would this be Phobos material even though it would depend on MPI, or
 would it better be kept as a 3rd party library?

I think std.concurrency needs to define a new interface for passing messag[...]

[...]d probably be 3rd party since there are many serialized representations and protocols out there to pick from.

 4. std.concurrency in its current incarnation doesn't allow objects with
 mutable indirection to be passed as messages. This makes sense when
 passing messages between threads in the same address space. However, for
 passing between MPI processes, the object is going to be copied anyhow.
 Should the restriction be kept (for consistency) or removed (because it
 doesn't serve much of a purpose in the MPI context)?

 5. For passing complex object graphs, serialization would obviously be
 necessary. What's the current state of the art in serialization in D? I
 want something that's efficient and general first and foremost. I really
 don't care about human readability or standards compliance (in other
 words, no XML or JSON or anything like that).

AFAIK David Nadlinger is handling serialization in his GSOC Thrift project that he is working on currently.

 /Jonas

Aug 06 2011
parent dsimcha <dsimcha yahoo.com> writes:
On 8/6/2011 8:26 PM, Sean Kelly wrote:
 I'm hoping to simply extend the existing API. The crucial portion will be the
addition of a Node (base) type.

So Node would be the equivalent of Tid in the current API?
Aug 06 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Nope. It would represent an external destination and define the protocol.


Sent from my iPhone

On Aug 6, 2011, at 6:57 PM, dsimcha <dsimcha yahoo.com> wrote:

 On 8/6/2011 8:26 PM, Sean Kelly wrote:
 I'm hoping to simply extend the existing API. The crucial portion will be the addition of a Node (base) type.

 So Node would be the equivalent of Tid in the current API?

Aug 07 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
This would probably work with the protobuf format.

Sent from my iPhone

On Aug 7, 2011, at 12:28 PM, dsimcha <dsimcha yahoo.com> wrote:

 On 8/7/2011 2:27 PM, Jacob Carlborg wrote:
 On 2011-08-07 17:45, dsimcha wrote:
 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

Ok, I'll look into writing a binary archiver that assumes that the CPU architecture on the deserializing end is the same as that on the serializing end. If it works, maybe Orange is a good choice.
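For plain data, an archiver that assumes the same CPU architecture on both ends can simply copy bytes. A minimal sketch, assuming types without indirections (no pointers, slices or class references) and identical endianness, alignment and struct layout on both machines; rawSerialize, rawDeserialize and Particle are hypothetical names, not part of Orange. Anything with indirections (object graphs, strings, arrays) would still need real serialization.

```d
import std.traits : hasIndirections;

// Hypothetical example payload: plain data only.
struct Particle
{
    double[3] position;
    double mass;
}

// Copies the in-memory bytes of a value that has no indirections.
// Padding bytes, if any, are copied as-is. Only meaningful if the reader
// uses the same endianness, alignment and struct layout.
ubyte[] rawSerialize(T)(const T value) if (!hasIndirections!T)
{
    return (cast(const(ubyte)*) &value)[0 .. T.sizeof].dup;
}

// Rebuilds a value from bytes produced by rawSerialize!T on a compatible machine.
T rawDeserialize(T)(const(ubyte)[] data) if (!hasIndirections!T)
{
    assert(data.length == T.sizeof, "byte count does not match T.sizeof");
    T result;
    (cast(ubyte*) &result)[0 .. T.sizeof] = data[];
    return result;
}
```

The `hasIndirections` constraint also mirrors the std.concurrency question raised earlier: types with mutable indirection are exactly the ones a raw copy cannot handle.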

Sounds good. I just hope that the current design allows for a binary archive. Currently the serializer in Orange assumes that an archive can deserialize a value based on a key, which could be basically anywhere in the serialized data. This at least allows implementing archives that store the serialized data in a structured format, e.g. XML, JSON, YAML. I don't know if that's possible with a binary format; I'm not familiar with how to implement one.
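Key-based lookup does not by itself rule out a binary format. One common approach, sketched here under stated assumptions, is to write each value next to its key with explicit length prefixes and have the reader build a key-to-slice index in a single pass, after which values can be fetched in any order. The layout and the function names are hypothetical, not Orange's archive interface, and the sketch ignores versioning and nested structure.

```d
import std.exception : enforce;

// Hypothetical keyed binary layout, repeated per entry:
//   [keyLen: 4 bytes LE][key bytes][valueLen: 4 bytes LE][value bytes]

void putLength(ref ubyte[] buf, size_t len)
{
    enforce(len <= uint.max, "entry too large");
    foreach (i; 0 .. 4)
        buf ~= cast(ubyte)(len >> (8 * i));
}

uint takeLength(ref const(ubyte)[] data)
{
    enforce(data.length >= 4, "truncated length field");
    uint len = 0;
    foreach (i; 0 .. 4)
        len |= cast(uint) data[i] << (8 * i);
    data = data[4 .. $];
    return len;
}

// Appends one keyed entry to the archive buffer.
void putEntry(ref ubyte[] buf, string key, const(ubyte)[] value)
{
    putLength(buf, key.length);
    buf ~= cast(const(ubyte)[]) key;
    putLength(buf, value.length);
    buf ~= value;
}

// Scans the buffer once and maps each key to a slice of its value bytes.
const(ubyte)[][string] buildIndex(const(ubyte)[] data)
{
    const(ubyte)[][string] index;
    while (data.length > 0)
    {
        immutable keyLen = takeLength(data);
        enforce(data.length >= keyLen, "truncated key");
        string key = (cast(const(char)[]) data[0 .. keyLen]).idup;
        data = data[keyLen .. $];

        immutable valueLen = takeLength(data);
        enforce(data.length >= valueLen, "truncated value");
        index[key] = data[0 .. valueLen]; // slice into the buffer, no copy
        data = data[valueLen .. $];
    }
    return index;
}
```

The cost compared to a purely sequential binary stream is the stored keys and length prefixes; the benefit is that a key-driven deserializer can ask for fields in whatever order it likes.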

Yeah, I was trying to wrap my head around the whole "key" concept. I wasn[...]

[...]t may be that Orange isn't the right tool for the job for MPI, though modulo some bug fixing and polishing it could be extremely useful in different cases with different sets of tradeoffs.

 In addition to the bug reports I filed, why is it necessary to write any serialization code to serialize through the base class? What's wrong with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
    auto serializer = new Serializer(new XMLArchive!());

    // Introspect Derived and figure out all the details automatically.
    serializer.register!(Derived);
 }

Aug 07 2011
prev sibling parent "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Mon, 08 Aug 2011 01:08:39 +0900, dsimcha <dsimcha yahoo.com> wrote:

 On 8/7/2011 12:01 PM, Lutger Blijdestijn wrote:
 dsimcha wrote:

 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

Ok, I'll look into writing a binary archiver that assumes that the CPU architecture on the deserializing end is the same as that on the serializing end. If it works, maybe Orange is a good choice.

Just in case you missed it, the messagepack protocol has a D implementation which seems to be what you're looking for: http://msgpack.org/ The last commit on bitbucket reveals it should be compatible with 2.054. Perhaps it can be adapted as an archiver for Orange.

Ok, this sounds great. Again, though, it would be great to get serialization into Phobos. (I don't know whether messagepack is suitable in its current form, because I haven't looked in detail.) I was vaguely aware of a messagepack implementation for D, but I didn't realize it was still maintained and didn't know where it was hosted.

I maintain MessagePack for D and use this library as an internal tool at my job. I will move it from Bitbucket to GitHub, since D programmers mainly use git and GitHub is more useful than Bitbucket. Masahiro
Aug 08 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Is the archive formatter dynamically pluggable?

Sent from my iPhone

On Aug 6, 2011, at 11:51 AM, Jacob Carlborg <doob me.com> wrote:

 On 2011-08-06 18:32, Sean Kelly wrote:
 I'd love to be able to send classes between processes, but first we need a[...]

 Have a look at Orange, I don't know if it's considered good but it works f[...]ML. http://dsource.org/projects/orange/

 --
 /Jacob Carlborg

Aug 06 2011
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
I was mostly wondering if the serializer was all template code or if the archive portion used some form of polymorphism. Sounds like it's the latter.


Sent from my iPhone

On Aug 7, 2011, at 8:19 AM, Jacob Carlborg <doob me.com> wrote:

 On 2011-08-07 02:24, Sean Kelly wrote:
 Is the archive formatter dynamically pluggable?

I'm not exactly sure what you mean but you can create new archive types an[...]

[...]akes an archive (as an interface) as a parameter.

 Sent from my iPhone

 On Aug 6, 2011, at 11:51 AM, Jacob Carlborg <doob me.com> wrote:

 On 2011-08-06 18:32, Sean Kelly wrote:
 I'd love to be able to send classes between processes, but first we nee[...]

 Have a look at Orange, I don't know if it's considered good but it works[...]y XML. http://dsource.org/projects/orange/

 --
 /Jacob Carlborg

-- /Jacob Carlborg

Aug 07 2011