digitalmars.D - std.concurrency wrapper over MPI?

dsimcha (23/23) Aug 05 2011 I've finally bitten the bullet and learned MPI

Russel Winder (55/71) Aug 05 2011 MPI may be ancient, it may be a bit daft in terms of its treatment of

dsimcha (16/21) Aug 06 2011 This is why, even though I do stuff that's arguably HPC, I can't stand

Russel Winder (30/38) Aug 06 2011

dsimcha (4/9) Aug 06 2011 Now that I think of it, there's also the option of porting boost::mpi to...

Jacob Carlborg (8/13) Aug 06 2011 My rewrite of Orange is almost finished. It can currently only serialize...
bearophile (6/11) Aug 06 2011 I'd like one or more Phobos modules built on top of the basic MPI, so I ...
Sean Kelly (19/32) Aug 06 2011 I'd love to be able to send classes between processes, but first we need...

Jacob Carlborg (6/7) Aug 06 2011 Have a look at Orange, I don't know if it's considered good but it works...

Sean Kelly (6/13) Aug 06 2011 Is the archive formatter dynamically pluggable?

Jacob Carlborg (6/16) Aug 07 2011 I'm not exactly sure what you mean but you can create new archive types

Sean Kelly (9/29) Aug 07 2011 I was mostly wondering if the serialized was all template code or if the...

Jacob Carlborg (5/28) Aug 07 2011 The serializer uses template methods, the archive uses interfaces and

dsimcha (9/10) Aug 07 2011 The more I think about it, the more I think that std.concurrency isn't

Jonathan M Davis (14/27) Aug 06 2011 Personally, I've never heard of MPI and have no interest in it whatsoeve...
jdrewsen (8/31) Aug 06 2011 I think std.concurrency needs to define a new interface for passing

dsimcha (17/20) Aug 06 2011 Good to know, but what flavor? As I see it there is a three-way

Jacob Carlborg (12/28) Aug 07 2011 I can answer these tradeoff questions for the Orange serialization library,

dsimcha (4/5) Aug 07 2011 Ok, I'll look into writing a binary archiver that assumes that the CPU

Lutger Blijdestijn (5/11) Aug 07 2011 Just in case you missed it, the messagepack protocol has a D implementat...

Lutger Blijdestijn (1/1) Aug 07 2011 link for the D implementation: https://bitbucket.org/repeatedly/msgpack4...
dsimcha (6/17) Aug 07 2011 Ok, this sounds great. Again, though, it would be great to get

Masahiro Nakagawa (6/27) Aug 08 2011 I maintain MessagePack for D and use this library as internal tool of my...

Jacob Carlborg (4/15) Aug 07 2011 I think it should be possible.

Jacob Carlborg (10/15) Aug 07 2011 Sounds good. I just hope that the current design allows for a binary

dsimcha (16/30) Aug 07 2011 Yeah, I was trying to wrap my head around the whole "key" concept. I

Sean Kelly (9/41) Aug 07 2011 This would probably work with the protobuf format.
Jacob Carlborg (24/39) Aug 07 2011 Every serialized value has a associated key. The key should be unique in...
Jacob Carlborg (12/22) Aug 11 2011 I've been thinking about this and currently I don't see how this would

dsimcha (28/52) Aug 11 2011 You have classinfo as a key, as you point out. You also already have a

dsimcha (8/63) Aug 11 2011 One small correction:
Jacob Carlborg (5/60) Aug 11 2011 Cool, very clever. I didn't think of delegates and specially not

dsimcha (5/10) Aug 07 2011 BTW, I know this has been discussed in the past, but I'll bring it up

Jacob Carlborg (14/25) Aug 07 2011 To get Orange into Phobos, at least this must be done:

David Nadlinger (31/35) Aug 09 2011 The most important thing to note, and the reason it could not be

Sean Kelly (8/44) Aug 06 2011 I'm hoping to simply extend the existing API. The crucial portion will b...

dsimcha (2/3) Aug 06 2011 So Node would be the equivalent of Tid in the current API?

Sean Kelly (4/8) Aug 07 2011 Nope. It would represent an external destination and defines the protoco...

dsimcha <dsimcha yahoo.com> writes:

I've finally bitten the bullet and learned MPI
(http://en.wikipedia.org/wiki/Message_passing_interface) for an ultra
computationally intensive research project I've been working on lately.
I wrote all the MPI-calling code in D against the C API, using a very
quick-and-dirty (i.e. not releasable) translation of the parts of the
header I needed.

I'm halfway-thinking of writing a std.concurrency-like interface on top 
of MPI in D.  A few questions:

1.  Is anyone besides me interested in this?

2.  Is anyone already working on something similar?

3.  Would this be Phobos material even though it would depend on MPI, or 
would it better be kept as a 3rd party library?

4.  std.concurrency in its current incarnation doesn't allow objects 
with mutable indirection to be passed as messages.   This makes sense 
when passing messages between threads in the same address space. 
However, for passing between MPI processes, the object is going to be 
copied anyhow.  Should the restriction be kept (for consistency) or 
removed (because it doesn't serve much of a purpose in the MPI context)?

5.  For passing complex object graphs, serialization would obviously be 
necessary.  What's the current state of the art in serialization in D? 
I want something that's efficient and general first and foremost.  I 
really don't care about human readability or standards compliance (in 
other words, no XML or JSON or anything like that).
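
For concreteness, the restriction in question 4 looks roughly like this in practice. This is a minimal sketch, assuming current std.concurrency and std.traits; `worker` and `sumInWorker` are names made up for the example. An `int[]` carries mutable indirection, so it has to be turned into an `immutable(int)[]` (here via `.idup`) before it can be sent to another thread.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;
import std.traits : hasUnsharedAliasing;

// int[] aliases mutable thread-local data; immutable(int)[] does not.
// std.concurrency's send() rejects message types of the first kind.
static assert(hasUnsharedAliasing!(int[]));
static assert(!hasUnsharedAliasing!(immutable(int)[]));

// Hypothetical worker: receives one immutable array and replies with its sum.
void worker()
{
    auto data = receiveOnly!(immutable(int)[])();
    int total = 0;
    foreach (x; data)
        total += x;
    ownerTid.send(total);
}

// Spawns a worker thread, hands it the data, and waits for the reply.
int sumInWorker(immutable(int)[] data)
{
    auto tid = spawn(&worker);
    tid.send(data);
    return receiveOnly!int();
}

unittest
{
    int[] local = [1, 2, 3];
    // tid.send(local) would not compile; an immutable copy is fine.
    assert(sumInWorker(local.idup) == 6);
}
```

Between MPI processes the copy happens anyway, so the `.idup` step is the part that would be pure overhead if the restriction were kept.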

Aug 05 2011

Russel Winder <russel russel.org.uk> writes:

On Fri, 2011-08-05 at 23:51 -0400, dsimcha wrote:
[ . . . ]
 1.  Is anyone besides me interested in this?

MPI may be ancient, it may be a bit daft in terms of its treatment of
marshalling, unmarshalling and serializing, it may be only a Fortran and
C thing bolted into C++ (quite well) but it is the de facto standard for
HPC.  OK so HPC is about 10% of world-wide computing, probably less than
that of spend despite the enormous per installation price, but it is
about 90% of political marketing.  Any short term parallelism strategy
must include MPI -- and work with OpenMPI and MPICH2.

So I don't think it is a matter of just interest for D, I would say that
if D is to stand with C++, C and Fortran then there has to be an MPI
API.  Even though MPI should be banned going forward.

 2.  Is anyone already working on something similar.

 3.  Would this be Phobos material even though it would depend on MPI, or
 would it better be kept as a 3rd party library?

Given that it requires a transitive dependency then either Phobos goes
forward with optional dependencies or the MPI API is a separate thing.
Given my personal opinion that actor model, dataflow model, agents, etc.
should be the application level concurrency and parallelism model, I
would be quite happy with an MPI API not being in Phobos.  Keep Phobos
for that which every D installation will need.  MPI is a niche market in
that respect.

Optional dependencies sort of work but are sort of a real pain in the
Java/Maven milieu.

 4.  std.concurrency in its current incarnation doesn't allow objects
 with mutable indirection to be passed as messages.   This makes sense
 when passing messages between threads in the same address space.
 However, for passing between MPI processes, the object is going to be
 copied anyhow.  Should the restriction be kept (for consistency) or
 removed (because it doesn't serve much of a purpose in the MPI context)?

At the root of this issue is local thread-based parallelism in a shared
memory context, vs cluster parallelism.  MPI is a cluster solution --
even though it can be used in a multicore shared memory situation.  The
point about enforced copying vs. potential sharing is core to this
obviously.  This has to be handled with absolute top notch performance
in mind.  It is arguably a situation where programming language
semantics and purity have to be sacrificed at the altar of performance.
There are already far too many MPI applications that are written with
far too much comms code in the application simply to ensure performance
-- because the MPI infrastructure cannot be trusted to do things fast
enough if you use anything other than the bottom most layer.

 5.  For passing complex object graphs, serialization would obviously be
 necessary.  What's the current state of the art in serialization in D?
 I want something that's efficient and general first and foremost.  I
 really don't care about human readability or standards compliance (in
 other words, no XML or JSON or anything like that).

Again performance is everything, so nothing must get in the way of
having something that cannot be made faster.

The main problem here is going to be that when anything gets released
performance will be the only yardstick by which things are measured.
Simplicity of code, ease of evolution of code, all the things
professional developers value, will go out of the window.  It's HPC
after all :-)

I still think D needs a dataflow, CSP and data parallelism strategy, cf.
Go, GPars, Akka, even Haskell.  Having actors is good, but having only
actors is not good, cf. Scala and Akka.

-- 
Russel.

Aug 05 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/6/2011 2:57 AM, Russel Winder wrote:
 The main problem here is going to be that when anything gets released
 performance will be the only yardstick by which things are measured.
 Simplicity of code, ease of evolution of code, all the things
 professional developers value, will go out of the window.  It's HPC
 after all :-)

This is why, even though I do stuff that's arguably HPC, I can't stand
the HPC community.  Of course performance is important, but nothing
should be so sacred as to be completely immune to tradeoffs.  The thing
that drew me to D is that you can get pretty good performance out of it
without sacrificing that much ease of use compared to dynamic languages.
Besides, you can always provide a high-level but not-that-efficient
API for most cases and a lower-level API for when more control is needed.

Anyhow, D has one key advantage that makes it more tolerant of 
communication overhead than most languages:  std.parallelism.  At least 
the way things are set up on the cluster here at Johns Hopkins, each 
node has 8 cores.  The "traditional" MPI way of doing things is 
apparently to allocate 8 MPI processes per node in this case, one per 
core.  Instead, I'm allocating one process per node, using MPI only for 
very coarse grained parallelism and using std.parallelism for more 
fine-grained parallelism to keep all 8 cores occupied with one MPI process.
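
As a rough illustration of that split, the sketch below has each node take a strided slice of the index space and then uses std.parallelism to spread its slice over the local cores. The MPI part is left out: `rank` and `nodes` stand in for what MPI_Comm_rank and MPI_Comm_size would report, and `nodePartialSum` is a made-up name. Combining the per-node results (e.g. with MPI_Reduce) is likewise not shown.

```d
import std.algorithm : map;
import std.parallelism : taskPool;
import std.range : iota;

// Sum of squares over the indices owned by one node.
// rank/nodes are placeholders for the MPI rank and communicator size.
ulong nodePartialSum(ulong rank, ulong nodes, ulong n)
{
    // Coarse-grained split: node `rank` owns rank, rank + nodes, ...
    auto mine = iota(rank, n, nodes);
    // Fine-grained split: the default task pool spreads the reduction
    // over the cores of this node.
    return taskPool.reduce!"a + b"(0UL, map!"a * a"(mine));
}

unittest
{
    // Simulate 4 nodes in one process; the partial sums cover 0 .. 1000.
    ulong total = 0;
    foreach (ulong rank; 0 .. 4)
        total += nodePartialSum(rank, 4, 1000);
    assert(total == 332_833_500UL);
}
```

With one process per node, only the final per-node value has to cross the network, which is why communication overhead matters less in this setup.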

Aug 06 2011

Russel Winder <russel russel.org.uk> writes:

On Sat, 2011-08-06 at 10:09 -0400, dsimcha wrote:
[ . . . ]
 Anyhow, D has one key advantage that makes it more tolerant of
 communication overhead than most languages:  std.parallelism.  At least
 the way things are set up on the cluster here at Johns Hopkins, each
 node has 8 cores.  The "traditional" MPI way of doing things is
 apparently to allocate 8 MPI processes per node in this case, one per
 core.  Instead, I'm allocating one process per node, using MPI only for
 very coarse grained parallelism and using std.parallelism for more
 fine-grained parallelism to keep all 8 cores occupied with one MPI process.

I think increasingly the idiom in the Fortran/C/C++ HPC community is to
use MPI on a per address space basis, rather than a per ALU basis, and
to use OpenMP to handle the thread control in a given address space
handling the multicores.  (OpenMP being something totally different to
OpenMPI.)

In the C++ arena, though, there is Threading Building Blocks (TBB), which
has an element of arcaneness but is a whole lot better than OpenMP.

As you point out there are much better, generally higher-level,
abstractions that would make HPC code faster as well as much, much
easier to maintain.  However even with Intel's high budget marketing of
some of the alternatives, the HPC community seem steadfast in their
support of MPI and OpenMP.  Of course they also have codes from the
1970s and 1980s that are in continued use because no one is prepared to
rewrite them.

-- 
Russel.

Aug 06 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/6/2011 2:57 AM, Russel Winder wrote:
 The main problem here is going to be that when anything gets released
 performance will be the only yardstick by which things are measured.
 Simplicity of code, ease of evolution of code, all the things
 professional developers value, will go out of the window.  It's HPC
 after all :-)

Now that I think of it, there's also the option of porting boost::mpi to 
D and then possibly writing a std.concurrency-like wrapper on top of 
that (or not).

Aug 06 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-06 05:51, dsimcha wrote:
 5. For passing complex object graphs, serialization would obviously be
 necessary. What's the current state of the art in serialization in D? I
 want something that's efficient and general first and foremost. I really
 don't care about human readability or standards compliance (in other
 words, no XML or JSON or anything like that).

My rewrite of Orange is almost finished. It can currently only serialize 
to XML, but it's possible to create new archive types for other formats. 
I have no idea about the performance; I'm mostly focusing on being able to
serialize as many types as possible.

http://dsource.org/projects/orange/

-- 
/Jacob Carlborg

Aug 06 2011

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:

 1.  Is anyone besides me interested in this?

Other people are interested.


 3.  Would this be Phobos material even though it would depend on MPI, or 
 would it better be kept as a 3rd party library?

I'd like one or more Phobos modules built on top of the basic MPI, so I think
it's better to have MPI too in Phobos.


 Should the restriction be kept (for consistency) or 
 removed (because it doesn't serve much of a purpose in the MPI context)?

This is not easy for me to say right now.

Bye,
bearophile

Aug 06 2011

Sean Kelly <sean invisibleduck.org> writes:

I'd love to be able to send classes between processes, but first we need a
good serialization/deserialization mechanism.

Aug 06 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-06 18:32, Sean Kelly wrote:
 I'd love to be able to send classes between processes, but first we need a
good serialization/deserialization mechanism.

Have a look at Orange. I don't know if it's considered good, but it works
for almost all types available in D. The only available archive is
currently XML. http://dsource.org/projects/orange/

-- 
/Jacob Carlborg

Aug 06 2011

Sean Kelly <sean invisibleduck.org> writes:

Is the archive formatter dynamically pluggable?

Aug 06 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-07 02:24, Sean Kelly wrote:
 Is the archive formatter dynamically pluggable?

I'm not exactly sure what you mean but you can create new archive types 
and use them with the existing serializer. When creating a new 
serializer it takes an archive (as an interface) as a parameter.

-- 
/Jacob Carlborg

Aug 07 2011

Sean Kelly <sean invisibleduck.org> writes:

I was mostly wondering if the serializer was all template code or if the
archived portion used some form of polymorphism. Sounds like it's the latter.

Aug 07 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-07 18:15, Sean Kelly wrote:
 I was mostly wondering if the serializer was all template code or if the
archived portion used some form of polymorphism. Sounds like it's the latter.

The serializer uses template methods, the archive uses interfaces and 
virtual methods.
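
That split can be pictured with a small sketch. The names below (`Archive`, `KeyValueArchive`, `Serializer`, `put`, `data`) are invented for illustration and are not Orange's actual API. The point is only that the serializer's per-type logic is a template resolved at compile time, while the output format sits behind an interface and can be chosen at run time.

```d
import std.conv : to;

// Hypothetical archive interface: the format-specific part, reached
// through virtual calls.
interface Archive
{
    void put(string key, string value);
    string data();
}

// One possible format: a flat "key=value;" text stream.
class KeyValueArchive : Archive
{
    private string buffer;

    void put(string key, string value)
    {
        buffer ~= key ~ "=" ~ value ~ ";";
    }

    string data()
    {
        return buffer;
    }
}

// Hypothetical serializer: walks types at compile time and forwards
// leaf values to whatever archive it was constructed with.
class Serializer
{
    private Archive archive;

    this(Archive archive)
    {
        this.archive = archive;
    }

    // Template method, instantiated once per serialized type.
    void serialize(T)(T value, string key)
    {
        static if (is(T == struct))
        {
            // Unrolled at compile time, one iteration per field.
            foreach (i, field; value.tupleof)
                serialize(field, key ~ "." ~ __traits(identifier, T.tupleof[i]));
        }
        else
        {
            archive.put(key, value.to!string);
        }
    }
}

unittest
{
    static struct Point { int x; int y; }
    Archive archive = new KeyValueArchive; // format picked at run time
    auto serializer = new Serializer(archive);
    serializer.serialize(Point(1, 2), "p");
    assert(archive.data() == "p.x=1;p.y=2;");
}
```

A binary archive would, in this picture, be another class implementing the same interface, with no change to the template side.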

-- 
/Jacob Carlborg

Aug 07 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/6/2011 12:32 PM, Sean Kelly wrote:
 I'd love to be able to send classes between processes, but first we need a
good serialization/deserialization mechanism.

The more I think about it, the more I think that std.concurrency isn't 
quite the right interface for cluster parallelism.  I'm thinking instead 
of doing something loosely based on, but not a translation of, 
boost::mpi.  The following differences between std.concurrency and what 
makes sense for MPI bother me:

1.  shared/immutable isn't needed when you're copying the data anyhow.

2.  spawn() is taken care of by the MPI runtime.

3.  std.concurrency doesn't support broadcasting.
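
On point 3: with the current std.concurrency API, a broadcast has to be emulated by looping over point-to-point sends, which is what the sketch below does. `broadcast`, `doubler` and `broadcastAndCollect` are hypothetical helpers, not part of Phobos, and this is not a true collective operation in the MPI_Bcast sense, since the sender still issues one message per receiver.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn, Tid;

// Emulated broadcast: one point-to-point send per receiver.
void broadcast(T)(Tid[] receivers, T message)
{
    foreach (tid; receivers)
        tid.send(message);
}

// Hypothetical worker: doubles whatever value was broadcast to it.
void doubler()
{
    ownerTid.send(receiveOnly!int() * 2);
}

// Spawns `count` workers, broadcasts `value`, and sums the replies.
int broadcastAndCollect(int count, int value)
{
    Tid[] workers;
    foreach (i; 0 .. count)
        workers ~= spawn(&doubler);
    broadcast(workers, value);
    int total = 0;
    foreach (i; 0 .. count)
        total += receiveOnly!int();
    return total;
}

unittest
{
    assert(broadcastAndCollect(3, 21) == 126);
}
```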

Aug 07 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday 05 August 2011 23:51:24 dsimcha wrote:
 I've finally bitten the bullet and learned MPI
 (http://en.wikipedia.org/wiki/Message_passing_interface) for an ultra
 computationally intensive research project I've been working on lately.
   I wrote all the MPI-calling code in D against the C API, using a very
 quick-and-dirty (i.e. not releasable) translation of the parts of the
 header I needed.
 
 I'm halfway-thinking of writing a std.concurrency-like interface on top
 of MPI in D.  A few questions:
 
 1.  Is anyone besides me interested in this?

Personally, I've never heard of MPI and have no interest in it whatsoever, but 
I don't do much with concurrent programming. Others will probably be far more 
interested though.

 3.  Would this be Phobos material even though it would depend on MPI, or
 would it better be kept as a 3rd party library?

If MPI is something which can be found by default on your typical OS install,
then it may be okay to have it in Phobos. But in general, I would think that
if you need to install 3rd party libraries to use it, it should probably be a
3rd party library itself. It may be a bit of a grey area though. If we have
many libraries such as that, then we may want to create an official (or at
least pseudo-official) project which contains the major ones to make them easy
to find. Regardless, we _don't_ want Phobos to require extra dependencies which
aren't normally found on your typical OS install such that you have to install
them even if you don't use the functionality that they're needed for.

- Jonathan M Davis

Aug 06 2011

jdrewsen <jdrewsen nospam.com> writes:

On 06-08-2011 05:51, dsimcha wrote:
 I've finally bitten the bullet and learned MPI
 (http://en.wikipedia.org/wiki/Message_passing_interface) for an ultra
 computationally intensive research project I've been working on lately.
 I wrote all the MPI-calling code in D against the C API, using a very
 quick-and-dirty (i.e. not releasable) translation of the parts of the
 header I needed.

 I'm halfway-thinking of writing a std.concurrency-like interface on top
 of MPI in D. A few questions:

 1. Is anyone besides me interested in this?

 2. Is anyone already working on something similar.

 3. Would this be Phobos material even though it would depend on MPI, or
 would it better be kept as a 3rd party library?

I think std.concurrency needs to define a new interface for passing
messages out of process, i.e. to another process or host. The implementation
itself should probably be 3rd party, since there are many serialized
representations and protocols out there to pick from.

 4. std.concurrency in its current incarnation doesn't allow objects with
 mutable indirection to be passed as messages. This makes sense when
 passing messages between threads in the same address space. However, for
 passing between MPI processes, the object is going to be copied anyhow.
 Should the restriction be kept (for consistency) or removed (because it
 doesn't serve much of a purpose in the MPI context)?

 5. For passing complex object graphs, serialization would obviously be
 necessary. What's the current state of the art in serialization in D? I
 want something that's efficient and general first and foremost. I really
 don't care about human readability or standards compliance (in other
 words, no XML or JSON or anything like that).

AFAIK David Nadlinger is handling serialization in his GSOC Thrift 
project that he is working on currently.

/Jonas

Aug 06 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/6/2011 5:38 PM, jdrewsen wrote:
 AFAIK David Nadlinger is handling serialization in his GSOC Thrift
 project that he is working on currently.

 /Jonas

Good to know, but what flavor?  As I see it there is a three-way 
tradeoff in serialization.  In order of importance for distributed 
parallelism, the qualities are:

1.  Efficiency.  How much does it cost to serialize/unserialize 
something and how much space overhead is there?

2.  Flexibility w.r.t. types:  How many types can be serialized?  How 
faithfully are they reproduced on the other end w.r.t. things like 
pointer/reference/slice aliasing?

3.  Standardization:  How universally understood is the format?  Can it 
be used to send data across different CPU architectures?  Across 
languages?  Is it human readable?  Is it based on some meta-format like XML?

For enterprisey use cases, I think this ordering would probably be 
completely reversed.  For example, in a typical MPI cluster all nodes 
are of the same architecture, so it's usually perfectly reasonable to 
send arrays of primitives as just raw bits.  I imagine this is a 
terrible idea in other contexts that I know less about.
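
The "raw bits" approach is easy to sketch in D. The helpers below (`toRawBytes` and `fromRawBytes`, both made-up names) reinterpret an array of primitives as bytes and back without any per-element encoding. This assumes, as stated above, that sender and receiver share the same architecture (endianness and type sizes). The resulting byte slice is the kind of buffer one could hand to MPI_Send with the MPI_BYTE datatype, though no MPI call is made here.

```d
import std.traits : isScalarType;

// View an array of primitives as its underlying bytes (no copy).
const(ubyte)[] toRawBytes(T)(const(T)[] values)
if (isScalarType!T)
{
    return cast(const(ubyte)[]) values;
}

// Rebuild a typed array from bytes produced on a same-architecture peer.
T[] fromRawBytes(T)(const(ubyte)[] bytes)
if (isScalarType!T)
{
    assert(bytes.length % T.sizeof == 0, "truncated buffer");
    // Copy first so the result owns properly aligned, mutable memory.
    return cast(T[]) bytes.dup;
}

unittest
{
    double[] xs = [1.5, -2.0, 3.25];
    auto wire = toRawBytes(xs);
    assert(wire.length == xs.length * double.sizeof);
    assert(fromRawBytes!double(wire) == xs);
}
```

There is no type tag or length header in this scheme, so both ends have to agree on the element type out of band; that is the flexibility and standardization being traded away for efficiency.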

Aug 06 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-07 00:09, dsimcha wrote:
 On 8/6/2011 5:38 PM, jdrewsen wrote:
 AFAIK David Nadlinger is handling serialization in his GSOC Thrift
 project that he is working on currently.

 /Jonas

 Good to know, but what flavor? As I see it there is a three-way tradeoff
 in serialization. In order of importance for distributed parallelism,
 the qualities are:

I can answer these tradeoff questions for the Orange serialization library,
http://dsource.org/projects/orange/.

 1. Efficiency. How much does it cost to serialize/unserialize something
 and how much space overhead is there?

I haven't done any measurements but I would guess it depends on which
archive type is used. The actual serializer tries to do quite a lot,
where possible, at compile time. But it also stores a reference for
every serialized value, in case a pointer points to the value.

 2. Flexibility w.r.t. types: How many types can be serialized? How
 faithfully are they reproduced on the other end w.r.t. things like
 pointer/reference/slice aliasing?

If I haven't missed something Orange can serialize almost all types, 
except unions, function pointers, void pointers and delegates.

 3. Standardization: How universally understood is the format? Can it be
 used to send data across different CPU architectures? Across languages?
 Is it human readable? Is it based on some meta-format like XML?

Currently, the only available format is XML.

-- 
/Jacob Carlborg

Aug 07 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

Ok, I'll look into writing a binary archiver that assumes that the CPU 
architecture on the deserializing end is the same as that on the 
serializing end.  If it works, maybe Orange is a good choice.

Aug 07 2011

Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:

dsimcha wrote:

 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 
 Ok, I'll look into writing a binary archiver that assumes that the CPU
 architecture on the deserializing end is the same as that on the
 serializing end.  If it works, maybe Orange is a good choice.

Just in case you missed it, the messagepack protocol has a D implementation  
which seems to be what you're looking for: http://msgpack.org/ The last 
commit on bitbucket reveals it should be compatible with 2.054. Perhaps it 
can be adapted as an archiver for Orange.

Aug 07 2011

Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:

link for the D implementation: https://bitbucket.org/repeatedly/msgpack4d/

Aug 07 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/7/2011 12:01 PM, Lutger Blijdestijn wrote:
 dsimcha wrote:

 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU
 architecture on the deserializing end is the same as that on the
 serializing end.  If it works, maybe Orange is a good choice.

 Just in case you missed it, the messagepack protocol has a D implementation
 which seems to be what you're looking for: http://msgpack.org/ The last
 commit on bitbucket reveals it should be compatible with 2.054. Perhaps it
 can be adapted as an archiver for Orange.

Ok, this sounds great.  Again, though, it would be great to get 
serialization into Phobos.  (I don't know whether messagepack is 
suitable in its current form, because I haven't looked in detail.)  I 
was vaguely aware of a messagepack implementation for D, but I didn't 
realize it was still maintained and didn't know where it was hosted.

Aug 07 2011

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Mon, 08 Aug 2011 01:08:39 +0900, dsimcha <dsimcha yahoo.com> wrote:

 On 8/7/2011 12:01 PM, Lutger Blijdestijn wrote:
 dsimcha wrote:

 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU
 architecture on the deserializing end is the same as that on the
 serializing end.  If it works, maybe Orange is a good choice.

 Just in case you missed it, the messagepack protocol has a D implementation
 which seems to be what you're looking for: http://msgpack.org/ The last
 commit on bitbucket reveals it should be compatible with 2.054. Perhaps it
 can be adapted as an archiver for Orange.

 Ok, this sounds great.  Again, though, it would be great to get  
 serialization into Phobos.  (I don't know whether messagepack is  
 suitable in its current form, because I haven't looked in detail.)  I  
 was vaguely aware of a messagepack implementation for D, but I didn't  
 realize it was still maintained and didn't know where it was hosted.

I maintain MessagePack for D and use this library as an internal tool at my
job.

I will move it from Bitbucket to GitHub.
D programmers mainly use git, and GitHub is more useful than Bitbucket.


Masahiro

Aug 08 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-07 18:01, Lutger Blijdestijn wrote:
 dsimcha wrote:

 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU
 architecture on the deserializing end is the same as that on the
 serializing end.  If it works, maybe Orange is a good choice.

 Just in case you missed it, the messagepack protocol has a D implementation
 which seems to be what you're looking for: http://msgpack.org/ The last
 commit on bitbucket reveals it should be compatible with 2.054. Perhaps it
 can be adapted as an archiver for Orange.

I think it should be possible.

-- 
/Jacob Carlborg

Aug 07 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-07 17:45, dsimcha wrote:
 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU
 architecture on the deserializing end is the same as that on the
 serializing end. If it works, maybe Orange is a good choice.

Sounds good. I just hope that the current design allows for a binary 
archive. Currently the serializer in Orange assumes that an archive can 
deserialize a value based on a key which could be basically anywhere in 
the serialized data. This at least allows implementing archives which 
store the serialized data in a structured format, e.g. XML, JSON, YAML. 
I don't know if that's possible with a binary format; I'm not familiar 
with how to implement a binary format.
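
As a rough illustration of how a binary layout could still support lookup by key, here is a minimal sketch. It is not Orange's archive interface: the record layout (length-prefixed key followed by length-prefixed value), the little-endian 32-bit lengths, and the names putRecord and findRecord are all assumptions made up for this example. Lookup is a plain linear scan over the records.

```d
// Hypothetical record layout: [key length][key bytes][value length][value bytes].
// Lengths are 32-bit, little-endian (an arbitrary choice for this sketch).

void putLength(ref ubyte[] buf, size_t n) {
    assert(n <= uint.max);
    foreach (shift; [0, 8, 16, 24])
        buf ~= cast(ubyte)((n >> shift) & 0xFF);
}

size_t getLength(const(ubyte)[] buf, size_t pos) {
    size_t n = 0;
    foreach (i; 0 .. 4)
        n |= cast(size_t) buf[pos + i] << (8 * i);
    return n;
}

// Appends one keyed record to the buffer.
void putRecord(ref ubyte[] buf, string key, const(ubyte)[] value) {
    putLength(buf, key.length);
    buf ~= cast(const(ubyte)[]) key;
    putLength(buf, value.length);
    buf ~= value;
}

// Scans the records in order and returns the value stored under key,
// or null if no record has that key.
const(ubyte)[] findRecord(const(ubyte)[] buf, string key) {
    size_t pos = 0;
    while (pos < buf.length) {
        auto keyLen = getLength(buf, pos);
        pos += 4;
        auto k = cast(const(char)[]) buf[pos .. pos + keyLen];
        pos += keyLen;
        auto valLen = getLength(buf, pos);
        pos += 4;
        auto v = buf[pos .. pos + valLen];
        pos += valLen;
        if (k == key)
            return v;
    }
    return null;
}

unittest {
    ubyte[] buf;
    ubyte[] payload = [1, 2, 3];
    putRecord(buf, "bar", payload);
    putRecord(buf, "name", cast(const(ubyte)[]) "foo");
    // Records can be fetched in any order, regardless of where they sit.
    assert(findRecord(buf, "name") == cast(const(ubyte)[]) "foo");
    assert(findRecord(buf, "bar") == payload);
}
```

The scan is linear in the size of the data; a real archive would more likely build an index of keys once and reuse it.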

-- 
/Jacob Carlborg

Aug 07 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/7/2011 2:27 PM, Jacob Carlborg wrote:
 On 2011-08-07 17:45, dsimcha wrote:
 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU
 architecture on the deserializing end is the same as that on the
 serializing end. If it works, maybe Orange is a good choice.

 Sounds good. I just hope that the current design allows for a binary
 archive. Currently the serializer in Orange assumes that an archive can
 deserialize a value based on a key which could be basically anywhere in
 the serialized data. This allows at least to implement archives which
 store the serialized data in a structured format, e.g. XML, JSON, YAML.
 I don't know if that's possible with a binary format, I'm not familiar
 with how to implement a binary format.

Yeah, I was trying to wrap my head around the whole "key" concept.  I 
wasn't very successful.  I also tried out Orange and filed a few bug 
reports.  It may be that Orange isn't the right tool for the job for 
MPI, though modulo some bug fixing and polishing it could be extremely 
useful in different cases with different sets of tradeoffs.

In addition to the bug reports I filed, why is it necessary to write any 
serialization code to serialize through the base class?  What's wrong 
with just doing something like:

class Base {}
class Derived : Base {}

void main() {
     auto serializer = new Serializer(new XMLArchive!());

     // Introspect Derived and figure out all the details automatically.
     serializer.register!(Derived);
}

Aug 07 2011

Sean Kelly <sean invisibleduck.org> writes:

This would probably work with the protobuf format.

Sent from my iPhone

On Aug 7, 2011, at 12:28 PM, dsimcha <dsimcha yahoo.com> wrote:

 On 8/7/2011 2:27 PM, Jacob Carlborg wrote:
 On 2011-08-07 17:45, dsimcha wrote:
 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Currently, the only available format is XML.

 Ok, I'll look into writing a binary archiver that assumes that the CPU
 architecture on the deserializing end is the same as that on the
 serializing end. If it works, maybe Orange is a good choice.

 Sounds good. I just hope that the current design allows for a binary
 archive. Currently the serializer in Orange assumes that an archive can
 deserialize a value based on a key which could be basically anywhere in
 the serialized data. This allows at least to implement archives which
 store the serialized data in a structured format, e.g. XML, JSON, YAML.
 I don't know if that's possible with a binary format, I'm not familiar
 with how to implement a binary format.

 Yeah, I was trying to wrap my head around the whole "key" concept.  I
 wasn't very successful.  I also tried out Orange and filed a few bug
 reports.  It may be that Orange isn't the right tool for the job for
 MPI, though modulo some bug fixing and polishing it could be extremely
 useful in different cases with different sets of tradeoffs.

 In addition to the bug reports I filed, why is it necessary to write any
 serialization code to serialize through the base class?  What's wrong
 with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
     auto serializer = new Serializer(new XMLArchive!());

     // Introspect Derived and figure out all the details automatically.
     serializer.register!(Derived);
 }

Aug 07 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-07 21:28, dsimcha wrote:
 Yeah, I was trying to wrap my head around the whole "key" concept. I
 wasn't very successful. I also tried out Orange and filed a few bug
 reports. It may be that Orange isn't the right tool for the job for MPI,
 though modulo some bug fixing and polishing it could be extremely useful
 in different cases with different sets of tradeoffs.

Every serialized value has an associated key. The key should be unique in 
its context but doesn't have to be unique in the whole document. A key 
can be explicitly chosen, in which case that key will be used, or the 
serializer can create a key (just a number that is incremented). Example:

class Foo
{
     int bar;
}

auto foo = new Foo;

When serializing "foo", it will get the key "0", chosen by the 
serializer. When "bar" is serialized it will use the explicit key "bar". 
This way the serialization process won't depend on the order of instance 
variables or struct members.

In addition to keys, all values have an associated id which is unique 
across the whole document. This is used for pointers and similar which 
reference other variables.
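
To make the order-independence point concrete, here is a minimal sketch. It does not use Orange's API: toKeyed, fromKeyed and the string[string] "archive" are hypothetical stand-ins that store each member under its own name as the key, and the V1/V2 structs are invented for the example. Ids for pointers are not modelled.

```d
import std.conv : to;

// Stores every member of a struct under its own name as the key.
string[string] toKeyed(T)(T value) if (is(T == struct)) {
    string[string] archive;
    foreach (i, field; value.tupleof)
        archive[__traits(identifier, T.tupleof[i])] = to!string(field);
    return archive;
}

// Rebuilds a struct by looking each member up by name, so the order in
// which the members are declared does not matter.
T fromKeyed(T)(string[string] archive) if (is(T == struct)) {
    T result;
    foreach (i, Member; typeof(T.tupleof))
        result.tupleof[i] =
            to!Member(archive[__traits(identifier, T.tupleof[i])]);
    return result;
}

struct V1 { int bar; string name; }
struct V2 { string name; int bar; } // same members, different order

unittest {
    auto archive = toKeyed(V1(42, "foo"));
    auto v2 = fromKeyed!V2(archive);
    assert(v2.bar == 42);
    assert(v2.name == "foo");
}
```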

 In addition to the bug reports I filed, why is it necessary to write any
 serialization code to serialize through the base class? What's wrong
 with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
 auto serializer = new Serializer(new XMLArchive!());

 // Introspect Derived and figure out all the details automatically.
 serializer.register!(Derived);
 }

I haven't thought about that, seems it would work. That will shorten the 
code a lot. This is a part that has not gone through the rewrite.

Note that all documentation on the wiki pages is outdated; it only 
refers to the first version, 0.0.1. The unit tests can be used as 
documentation to see how to use the new version and how it behaves.

-- 
/Jacob Carlborg

Aug 07 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-07 21:28, dsimcha wrote:
 In addition to the bug reports I filed, why is it necessary to write any
 serialization code to serialize through the base class? What's wrong
 with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
 auto serializer = new Serializer(new XMLArchive!());

 // Introspect Derived and figure out all the details automatically.
 serializer.register!(Derived);
 }

I've been thinking about this and currently I don't see how this would 
be possible. When serializing through a base class reference the static 
type would be of the base class. But what I need is the static type of 
the subclass, to be able to loop through the tuple returned by tupleof. 
The only information I can get about the subclass is basically the fully 
qualified name.

What I would need is some kind of associative array that maps strings to 
types, but as far as I know that's not possible, especially since the 
strings would be runtime values.

-- 
/Jacob Carlborg

Aug 11 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/11/2011 4:14 AM, Jacob Carlborg wrote:
 On 2011-08-07 21:28, dsimcha wrote:
 In addition to the bug reports I filed, why is it necessary to write any
 serialization code to serialize through the base class? What's wrong
 with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
 auto serializer = new Serializer(new XMLArchive!());

 // Introspect Derived and figure out all the details automatically.
 serializer.register!(Derived);
 }

 I've been thinking about this and currently I don't see how this would
 be possible. When serializing through a base class reference the static
 type would be of the base class. But what I need is the static type of
 the subclass, to be able to loop through the tuple returned by tupleof.
 The only information I can get about the subclass is basically the fully
 qualified name.

 What I would need is some kind of associative array that maps strings to
 types, but as far as I know that's not possible, specially since the
 strings would be runtime values.

You have classinfo as a key, as you point out.  You also already have a 
template that's capable of serializing a class given that its static 
type is exactly its dynamic type.

I was thinking something like:

import std.exception : enforce;

class Serializer {
     // Maps each registered runtime class to a delegate that serializes it.
     string delegate(Object)[TypeInfo_Class] registered;

     void register(T)() {
         registered[T.classinfo] = &downcastSerialize!(T);
     }

     // Returns string (not void) so the delegate's result reaches the caller.
     string serialize(T)(T value) if(is(T : Object)) {
         if(value.classinfo is T.classinfo) {
             // Then the static type is exactly the runtime type.
             // Serialize it the same way you do now.
             assert(0, "Placeholder for the existing serialization code.");
         } else {
             enforce(value.classinfo in registered,
                 "Cannot serialize a " ~ value.classinfo.name  ~
                 " because it has not been registered.");

             return registered[value.classinfo](value);
         }
     }

     string downcastSerialize(T)(Object value) if(is(T : Object)) {
         auto casted = cast(T) value;
         assert(casted);
         assert(value.classinfo is T.classinfo);

         return serialize(casted);
     }
}

Aug 11 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/11/2011 7:07 AM, dsimcha wrote:
 On 8/11/2011 4:14 AM, Jacob Carlborg wrote:
 On 2011-08-07 21:28, dsimcha wrote:
 In addition to the bug reports I filed, why is it necessary to write any
 serialization code to serialize through the base class? What's wrong
 with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
 auto serializer = new Serializer(new XMLArchive!());

 // Introspect Derived and figure out all the details automatically.
 serializer.register!(Derived);
 }

 I've been thinking about this and currently I don't see how this would
 be possible. When serializing through a base class reference the static
 type would be of the base class. But what I need is the static type of
 the subclass, to be able to loop through the tuple returned by tupleof.
 The only information I can get about the subclass is basically the fully
 qualified name.

 What I would need is some kind of associative array that maps strings to
 types, but as far as I know that's not possible, specially since the
 strings would be runtime values.

 You have classinfo as a key, as you point out. You also already have a
 template that's capable of serializing a class given that its static
 type is exactly its dynamic type.

 I was thinking something like:

 class Serializer {
 string delegate(Object)[TypeInfo_Class] registered;

 void register(T)() {
 registered[T.classinfo] = &downcastSerialize!(T);
 }

 void serialize(T)(T value) if(is(T : Object)) {
 if(value.classinfo is T.classinfo) {
 // Then the static type is exactly the runtime type.
 // Serialize it the same way you do now.
 } else {
 enforce(value.classinfo in registered,
 "Cannot serialize a " ~ value.classinfo.name ~
 " because it has not been registered.");

 return registered[value.classinfo](value);
 }
 }

 string downcastSerialize(T)(Object value) if(is(T : Object)) {
 auto casted = cast(T) value;
 assert(casted);
 assert(value.classinfo is T.classinfo);

 return serialize(casted);
 }
 }

One small correction:


string downcastSerialize(T)(Object value) if(is(T : Object)) {
     auto casted = cast(T) value;
     assert(casted);
     assert(casted.classinfo is T.classinfo);

     return serialize(casted);
}
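
For reference, the idea can be put together into something that compiles and runs. This is only a sketch: serializeExact is a hypothetical stand-in for the existing "static type equals dynamic type" path (here it just prints the fields declared directly in the class), the class is named RegistrySerializer to avoid suggesting it is Orange's Serializer, and Base, Derived and Unregistered are invented test classes.

```d
import std.conv : to;
import std.exception : assertThrown, enforce;

class Base { int a = 1; }
class Derived : Base { int b = 2; }
class Unregistered : Base {}

class RegistrySerializer {
    // Maps a runtime class to a delegate that knows its static type.
    private string delegate(Object)[TypeInfo_Class] registered;

    void register(T)() if (is(T : Object)) {
        registered[typeid(T)] = &this.downcastSerialize!T;
    }

    string serialize(T)(T value) if (is(T : Object)) {
        if (typeid(value) is typeid(T)) {
            // The static type is exactly the runtime type.
            return serializeExact(value);
        }
        auto handler = typeid(value) in registered;
        enforce(handler !is null,
            "Cannot serialize a " ~ typeid(value).name ~
            " because it has not been registered.");
        return (*handler)(value);
    }

    private string downcastSerialize(T)(Object value) {
        auto casted = cast(T) value;
        assert(casted !is null);
        return serializeExact(casted);
    }

    // Stand-in for the real work: lists only the fields declared in T
    // itself, because tupleof does not include base class fields.
    private string serializeExact(T)(T value) {
        string result = T.stringof ~ "(";
        foreach (i, field; value.tupleof) {
            if (i > 0) result ~= ", ";
            result ~= __traits(identifier, T.tupleof[i]) ~ "="
                ~ to!string(field);
        }
        return result ~ ")";
    }
}

unittest {
    auto s = new RegistrySerializer;
    s.register!Derived();

    Base viaBase = new Derived;          // static type Base, runtime Derived
    assert(s.serialize(viaBase) == "Derived(b=2)");
    assert(s.serialize(new Base) == "Base(a=1)");

    Base unknown = new Unregistered;     // never registered
    assertThrown(s.serialize(unknown));
}
```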

Aug 11 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-11 13:07, dsimcha wrote:
 On 8/11/2011 4:14 AM, Jacob Carlborg wrote:
 On 2011-08-07 21:28, dsimcha wrote:
 In addition to the bug reports I filed, why is it necessary to write any
 serialization code to serialize through the base class? What's wrong
 with just doing something like:

 class Base {}
 class Derived : Base {}

 void main() {
 auto serializer = new Serializer(new XMLArchive!());

 // Introspect Derived and figure out all the details automatically.
 serializer.register!(Derived);
 }

 I've been thinking about this and currently I don't see how this would
 be possible. When serializing through a base class reference the static
 type would be of the base class. But what I need is the static type of
 the subclass, to be able to loop through the tuple returned by tupleof.
 The only information I can get about the subclass is basically the fully
 qualified name.

 What I would need is some kind of associative array that maps strings to
 types, but as far as I know that's not possible, specially since the
 strings would be runtime values.

 You have classinfo as a key, as you point out. You also already have a
 template that's capable of serializing a class given that its static
 type is exactly its dynamic type.

 I was thinking something like:

 class Serializer {
 string delegate(Object)[TypeInfo_Class] registered;

 void register(T)() {
 registered[T.classinfo] = &downcastSerialize!(T);
 }

 void serialize(T)(T value) if(is(T : Object)) {
 if(value.classinfo is T.classinfo) {
 // Then the static type is exactly the runtime type.
 // Serialize it the same way you do now.
 } else {
 enforce(value.classinfo in registered,
 "Cannot serialize a " ~ value.classinfo.name ~
 " because it has not been registered.");

 return registered[value.classinfo](value);
 }
 }

 string downcastSerialize(T)(Object value) if(is(T : Object)) {
 auto casted = cast(T) value;
 assert(casted);
 assert(value.classinfo is T.classinfo);

 return serialize(casted);
 }
 }

Cool, very clever. I didn't think of delegates, and especially not of 
creating a delegate out of a template method. Thanks.

-- 
/Jacob Carlborg

Aug 11 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Good to know, but what flavor? As I see it there is a three-way tradeoff
 in serialization. In order of importance for distributed parallelism,
 the qualities are:

 I can answer these tradeoff for the Orange serialization library,
 http://dsource.org/projects/orange/.


BTW, I know this has been discussed in the past, but I'll bring it up 
again.  Since serialization is pretty fundamental to a lot of things and 
I want to avoid dependency hell, what are the prospects for getting 
Orange into Phobos?

Aug 07 2011

Jacob Carlborg <doob me.com> writes:

On 2011-08-07 17:58, dsimcha wrote:
 On 8/7/2011 11:36 AM, Jacob Carlborg wrote:
 Good to know, but what flavor? As I see it there is a three-way tradeoff
 in serialization. In order of importance for distributed parallelism,
 the qualities are:

 I can answer these tradeoff for the Orange serialization library,
 http://dsource.org/projects/orange/.


 BTW, I know this has been discussed in the past, but I'll bring it up
 again. Since serialization is pretty fundamental to a lot of things and
 I want to avoid dependency hell, what are the prospects for getting
 Orange into Phobos?

To get Orange into Phobos, at least this must be done:

* Actually finishing the rewrite (I'm almost done, the basic stuff works)
* Add more unit tests
* Add documentation
* Rip out all D1 and Tango related code
* Some minor changes to follow the Phobos style guide, I have not 
followed the 80-120 column limit
* The XML module in Phobos needs some minor updates
* I've used my own kind of mini unit test framework, don't know if 
people like that, should be easy to remove

I think that's all.

-- 
/Jacob Carlborg

Aug 07 2011

David Nadlinger <see klickverbot.at> writes:

On 8/7/11 12:09 AM, dsimcha wrote:
 On 8/6/2011 5:38 PM, jdrewsen wrote:
 AFAIK David Nadlinger is handling serialization in his GSOC Thrift
 project that he is working on currently.

 Good to know, but what flavor?

The most important thing to note, and the reason it might not be 
appropriate for what you want to do, is that the intended main use case 
for Thrift is to define an interface that can easily be used from 
several programming languages, feeling »native« for each of them 
(similar to what protobuf does). As a consequence, Thrift by design only 
supports value types, so it is not possible to e.g. serialize a tree or 
a DAG without »flattening« it first.

Another important feature of Thrift is protocol versioning – you can 
have required and optional fields, and the order of struct fields on the 
wire is not defined. While the schemata themselves are never serialized, 
the serialized data includes type tags and field ids for this purpose.

For the actual serialization format, there are several choices 
available; the most popular ones are currently implemented for D: 
Binary, which basically just dumps the raw bytes to the stream (all 
numbers are written in network byte order, though), Compact, which is a 
space-optimized binary protocol (zigzag varints, merging of some bytes 
where you know you don't need all bits, …), and a »rich« JSON format.
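
For readers unfamiliar with the "zigzag varints" mentioned for the Compact protocol, here is a minimal sketch of the general technique. It is not taken from the Thrift D code: the function names are made up for this example, and only the encoding scheme itself (zigzag mapping of signed values, then 7 bits per byte with a continuation bit) is the standard one.

```d
// ZigZag maps signed to unsigned so that values of small magnitude,
// positive or negative, get small codes: 0->0, -1->1, 1->2, -2->3, ...
ulong zigzagEncode(long n) pure nothrow @safe {
    return (cast(ulong) n << 1) ^ cast(ulong)(n >> 63);
}

long zigzagDecode(ulong z) pure nothrow @safe {
    return cast(long)(z >> 1) ^ -cast(long)(z & 1);
}

// Varint: 7 payload bits per byte, least significant group first;
// the high bit is set on every byte except the last.
ubyte[] writeVarint(ulong value) pure nothrow @safe {
    ubyte[] bytes;
    while (value >= 0x80) {
        bytes ~= cast(ubyte)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    bytes ~= cast(ubyte) value;
    return bytes;
}

ulong readVarint(const(ubyte)[] bytes) pure @safe {
    ulong result = 0;
    uint shift = 0;
    foreach (b; bytes) {
        if (shift > 63)
            throw new Exception("varint too long");
        result |= cast(ulong)(b & 0x7F) << shift;
        if ((b & 0x80) == 0)
            return result;
        shift += 7;
    }
    throw new Exception("truncated varint");
}

unittest {
    assert(zigzagEncode(-1) == 1);
    assert(zigzagEncode(1) == 2);

    ubyte[] expected = [0xAC, 0x02];
    assert(writeVarint(300) == expected);

    // Round trip, including the extremes of the long range.
    foreach (n; [0L, -1L, 1L, long.min, long.max])
        assert(zigzagDecode(readVarint(writeVarint(zigzagEncode(n)))) == n);
}
```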

These features obviously come at a (manageable) performance cost, but 
except for that, the code is quite heavily optimized for reading/writing 
performance. For example, while the protocols and transports (serialized 
data sources/sinks) are pluggable at runtime, it is possible to 
specialize all the serialization/RPC code for the actual implementations 
used, thus eliminating all virtual calls and allowing e.g. the 
serialization code for a struct to be inlined into a single function 
without any control flow resp. the reading code into a single switch 
statement (for the field ids) inside a loop.

But as said above, the second item from your list, flexibility with 
regard to the types serialized, is a non-goal for Thrift, so it probably 
isn't the best fit for your application.

David

Aug 09 2011

Sean Kelly <sean invisibleduck.org> writes:

I'm hoping to simply extend the existing API. The crucial portion will be the
addition of a Node (base) type.

Sent from my iPhone

On Aug 6, 2011, at 2:38 PM, jdrewsen <jdrewsen nospam.com> wrote:

 Den 06-08-2011 05:51, dsimcha skrev:
 I've finally bitten the bullet and learned MPI
 (http://en.wikipedia.org/wiki/Message_passing_interface) for an ultra
 computationally intensive research project I've been working on lately.
 I wrote all the MPI-calling code in D against the C API, using a very
 quick-and-dirty (i.e. not releasable) translation of the parts of the
 header I needed.

 I'm halfway-thinking of writing a std.concurrency-like interface on top
 of MPI in D. A few questions:

 1. Is anyone besides me interested in this?

 2. Is anyone already working on something similar.

 3. Would this be Phobos material even though it would depend on MPI, or
 would it better be kept as a 3rd party library?

 I think std.concurrency needs to define a new interface for passing
 messages out-of-process ie. other process or host. The implementation
 itself should probably be 3rd party since there are many serialized
 representations and protocols out there to pick from.

 4. std.concurrency in its current incarnation doesn't allow objects with
 mutable indirection to be passed as messages. This makes sense when
 passing messages between threads in the same address space. However, for
 passing between MPI processes, the object is going to be copied anyhow.
 Should the restriction be kept (for consistency) or removed (because it
 doesn't serve much of a purpose in the MPI context)?

 5. For passing complex object graphs, serialization would obviously be
 necessary. What's the current state of the art in serialization in D? I
 want something that's efficient and general first and foremost. I really
 don't care about human readability or standards compliance (in other
 words, no XML or JSON or anything like that).

 AFAIK David Nadlinger is handling serialization in his GSOC Thrift project
 that he is working on currently.

 /Jonas

Aug 06 2011

dsimcha <dsimcha yahoo.com> writes:

On 8/6/2011 8:26 PM, Sean Kelly wrote:
 I'm hoping to simply extend the existing API. The crucial portion will be the
 addition of a Node (base) type.

So Node would be the equivalent of Tid in the current API?

Aug 06 2011

Sean Kelly <sean invisibleduck.org> writes:

Nope. It would represent an external destination and defines the protocol.

Sent from my iPhone

On Aug 6, 2011, at 6:57 PM, dsimcha <dsimcha yahoo.com> wrote:

 On 8/6/2011 8:26 PM, Sean Kelly wrote:
 I'm hoping to simply extend the existing API. The crucial portion will be
 the addition of a Node (base) type.

 So Node would be the equivalent of Tid in the current API?

Aug 07 2011
