digitalmars.D.announce - MessagePack for D released

"Masahiro Nakagawa" <repeatedly gmail.com> writes:
I have released a serialization library for Phobos (D2).

Project repository: http://www.bitbucket.org/repeatedly/msgpack4d

MessagePack is a binary serialization format.
See the official site for details: http://msgpack.sourceforge.net/
Some applications replace JSON with MessagePack to improve performance.

msgpack4d ver 0.1.0 has the same features as the reference implementation:
  * Zero-copy serialization / deserialization
  * Stream deserializer
  * Support for some D features (Range, Tuple)

Currently, Phobos doesn't have a real serialization module (std.json lacks
some features).
I hope Phobos adopts this library for serialization (as std.msgpack or
std.serialization?).
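
To make "binary serialization" concrete, here is a minimal sketch of how MessagePack stores unsigned integers: values below 0x80 are a single byte, and larger values get a one-byte marker (0xcc, 0xcd, 0xce) followed by a big-endian payload. The function name packUint is hypothetical and is not msgpack4d's API; only the marker bytes come from the MessagePack format.

```d
import std.array : appender;
import std.bitmanip : nativeToBigEndian;

// Hypothetical helper (not msgpack4d's API): encodes an unsigned integer in
// the smallest MessagePack integer format that can hold it.
ubyte[] packUint(uint value)
{
    auto buf = appender!(ubyte[])();
    if (value < 0x80)
    {
        // positive fixint: the value is its own single byte
        buf.put(cast(ubyte) value);
    }
    else if (value <= ubyte.max)
    {
        buf.put(cast(ubyte) 0xcc);      // uint 8 marker
        buf.put(cast(ubyte) value);
    }
    else if (value <= ushort.max)
    {
        buf.put(cast(ubyte) 0xcd);      // uint 16 marker, big-endian payload
        ubyte[2] be = nativeToBigEndian(cast(ushort) value);
        buf.put(be[]);
    }
    else
    {
        buf.put(cast(ubyte) 0xce);      // uint 32 marker, big-endian payload
        ubyte[4] be = nativeToBigEndian(value);
        buf.put(be[]);
    }
    return buf.data;
}

unittest
{
    ubyte[] small = [0x05];
    ubyte[] medium = [0xcd, 0x01, 0x2c];
    assert(packUint(5) == small);       // one byte, no marker
    assert(packUint(300) == medium);    // marker + two payload bytes
}
```

The unittest block runs with "dmd -unittest -main"; a real packer would also cover signed integers, strings, arrays and maps.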
Apr 25 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
This is great. Code looks very good and it's very generous of you to offer to contribute it to Phobos.

There are a few details that could be changed to minimize repetition. For example, SimpleBuffer looks a lot like Appender!(ubyte[]). The DeflateBuffer and the FileBuffer look like great starting points for output range artifacts, though e.g. for FileBuffer we should call it something like BinaryFileWriter (as opposed to TextFileWriter) etc.

I suggest we hold a community review in this group and then most likely integrate this functionality into Phobos. All - let us know what you think!

Andrei
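
The Appender suggestion can be sketched as follows. This is only an illustration, assuming SimpleBuffer needs nothing beyond what an output range of ubyte provides; writeFrame is a hypothetical helper, not something taken from msgpack4d.

```d
import std.array : Appender;
import std.range.primitives : isOutputRange, put;

// SimpleBuffer expressed as an alias instead of a hand-written struct.
alias SimpleBuffer = Appender!(ubyte[]);

// Appender already satisfies the output range interface for ubyte.
static assert(isOutputRange!(SimpleBuffer, ubyte));

// Hypothetical helper: writes a one-byte length prefix followed by the
// payload to any output range of ubyte, not just SimpleBuffer.
void writeFrame(Sink)(ref Sink sink, const(ubyte)[] payload)
    if (isOutputRange!(Sink, ubyte))
{
    assert(payload.length <= ubyte.max, "payload too long for this sketch");
    put(sink, cast(ubyte) payload.length);
    put(sink, payload);
}

unittest
{
    SimpleBuffer buf;
    ubyte[] payload = [1, 2, 3];
    ubyte[] expected = [3, 1, 2, 3];
    writeFrame(buf, payload);
    assert(buf.data == expected);
}
```

Because writeFrame is constrained on isOutputRange rather than on a concrete type, the same code would work with a file-backed or compressing sink that exposes put.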
Apr 25 2010
"Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Sun, 25 Apr 2010 22:26:14 +0900, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 This is great. Code looks very good and it's very generous of you to
 offer to contribute it to Phobos.
Thanks!
 There are a few details that could be changed to minimize repetition.
 For example, SimpleBuffer looks a lot like Appender!(ubyte[]). The
I forgot about Appender. I removed the SimpleBuffer struct and made SimpleBuffer an alias for Appender!(ubyte[]). (I think SimpleBuffer is a better name than Appender!(ubyte[]).)
 DeflateBuffer and the FileBuffer look like great starting points for
 output range artifacts, though e.g. for FileBuffer we should call it
 something like BinaryFileWriter (as opposed to TextFileWriter) etc.
Oh, you are right. FileBuffer was not a buffer. Thank you very much for your code review. Recent changes are here: http://www.bitbucket.org/repeatedly/msgpack4d/changesets
 I suggest we hold a community review in this group and then most likely
 integrate this functionality into Phobos. All - let us know what you  
 think!
OK, I will wait for more reviews.
Apr 25 2010
Michel Fortin <michel.fortin michelf.com> writes:
Looks well done. There's one thing I'd suggest though. I'm pretty sure you could make it even faster by skipping the mp_Object intermediary representation and using templates. I know it's possible since I've done it for a surprisingly similar serialization library I'm working on.

The trick is to reuse the same pattern in the unpacker as you're already using in the packer. For instance, the packer has this function:

    ref Packer pack(T)(in T value) if (is(Unqual!T == long))

so the unpacker could have this function (just changed 'in' to 'out'):

    ref Unpacker unpack(T)(out T value) if (is(Unqual!T == long))

My library works by unserializing everything directly at the right place in a data structure while it parses the stream. Looks like this:

    MyStruct original;
    Archiver archiver;
    archiver.encode(original);
    immutable(byte)[] data = archiver.output;

    MyStruct copy;
    Unarchiver unarchiver;
    unarchiver.input = data;
    unarchiver.decode(copy);

This is unlike mp_Object which is in itself an intermediary representation that sits between the serialized data and the data structure you actually want to rebuild. I still have something similar to mp_Object as a convenience for types that prefer to implement a custom unserialization process in an order not dictated by the input stream, but this is less efficient:

    void decode(ref KeyUnarchiver archive) {
        archive.decode("var1", var1);
        archive.decode("var2", var2);
    }

What I'm trying to put to work now is a way to deal with multiple references to the same object. I'd also like a nice way to deal with Variant, but I'm under the impression this won't be possible without adding serialization support directly into Variant, or into TypeInfo.

Masahiro, sorry: this started as a useful commentary on your unserializer's approach and I ended up instead promoting what I am doing. Your library seems targeted at making a MessagePack serializer, with an emphasis on having a simple and portable serialization format, which is great when you want to communicate in this format. But on my side, I care more about being able to recreate object graphs and reinstantiate objects of the correct class when unserializing. That does not seem possible with your library, and MessagePack doesn't support this so it doesn't seem likely it can be added easily, am I right?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
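
The symmetric pack/unpack pattern described above can be sketched like this. It is an illustration only: the two signatures are the ones quoted in the post, but the struct bodies are assumptions, and the wire format here is a bare 8-byte big-endian integer rather than real MessagePack.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian;
import std.traits : Unqual;

// Minimal stand-in for a packer; the body is an assumption.
struct Packer
{
    ubyte[] data;

    ref Packer pack(T)(in T value) if (is(Unqual!T == long))
    {
        ubyte[8] be = nativeToBigEndian!long(value);
        data ~= be[];
        return this;
    }
}

// Mirror image of Packer: same constraint, 'out' instead of 'in'.
// The value is written straight into the caller's variable, with no
// intermediate object tree.
struct Unpacker
{
    const(ubyte)[] input;

    ref Unpacker unpack(T)(out T value) if (is(Unqual!T == long))
    {
        assert(input.length >= 8, "not enough bytes");
        ubyte[8] be = input[0 .. 8];
        value = bigEndianToNative!long(be);
        input = input[8 .. $];
        return this;
    }
}

unittest
{
    Packer packer;
    packer.pack(42L).pack(-7L);

    long a, b;
    auto unpacker = Unpacker(packer.data);
    unpacker.unpack(a).unpack(b);
    assert(a == 42 && b == -7);
}
```

The trade-off discussed in the thread still applies: this style requires the reader to know the expected types up front, whereas an intermediate representation lets the data be inspected first and converted later.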
Apr 25 2010
"Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Mon, 26 Apr 2010 01:29:07 +0900, Michel Fortin  
<michel.fortin michelf.com> wrote:

 Looks well done. There's one thing I'd suggest though. I'm pretty sure  
 you could make it even faster by skipping the mp_Object intermediary  
 representation and using templates. I know it's possible since I've done  
 it for a surprisingly similar serialization library I'm working on.

 The trick is to reuse the same pattern in the unpacker as you're already  
 using in the packer. For instance, the packer has this function:

     ref Packer pack(T)(in T value) if (is(Unqual!T == long))

 so the unpacker could have this function (just changed 'in' by 'out'):

     ref Unpacker unpack(T)(out T value) if (is(Unqual!T == long))

 My library works by unserializing everything directly at the right place  
 in a data structure while it parses the stream. Looks like this:

     MyStruct original;
     Archiver archiver;
     archiver.encode(original);
     immutable(byte)[] data = archiver.output;

     MyStruct copy;
     Unarchiver unarchiver;
     unarchiver.input = data;
     unarchiver.decode(copy);

 This is unlike mp_Object which is in itself an intermediary  
 representation that sits between the serialized data and the data  
 structure you actually want to rebuild. I still have something similar  
 to mp_Object as a convenience for types that prefer to implement a  
 custom unserialization process in an order not dictated by the input  
 stream, but this is less efficient:

 	void decode(ref KeyUnarchiver archive) {
 		archive.decode("var1", var1);
 		archive.decode("var2", var2);
 	}
Yeah, I know your approach (Protocol Buffers supports a similar approach using a schema). The current implementation deliberately separates deserialization from conversion. D has Tuple and variadic templates, so I can rewrite the execute method relatively easily if needed.
 What I'm trying to put to work now is a way to deal with multiple  
 references to the same object.
MessagePack doesn't consider multiple references to the same object. Supporting them would slow down serialization / deserialization and make the stream deserializer difficult to implement. In practice, this limitation is rarely a problem in production use.
 I'd also like a nice way to deal with Variant, but I'm under the  
 impression this won't be possible without adding serialization support  
 directly into Variant, or into TypeInfo.
I agree with this point. This problem might be solved if D had ADL (but I don't like ADL).
 Masahiro, sorry: this started as a useful commentary on your  
 unserializer's approach and I ended up instead promoting what I am  
 doing. Your library seems targeted at making a MessagePack serializer,  
 with an emphasis on having a simple and portable serialization format,  
 which is great when you want to communicate in this format.
Your understanding is correct. MessagePack is designed for efficient, small, and fast serialization (and is thus not perfect). I think this simplicity suits Phobos.
 But on my
 side, I care more about being able of recreating object graphs and  
 reinstantiating objects of the correct class when unserializing.
Perfect serialization needs environment support. However, such serialization is very difficult even where it is supported (Cocoa, Java, etc.; Ruby's Marshal is slow). I don't know of any library-based perfect serialization.
 That does not seem possible with your library, and MessagePack doesn't  
 support this so it doesn't seem likely it can be added easily, am I  
 right?
You are right. The author of MessagePack considered supporting class and reference extensions, but his conclusion was that there is little merit in supporting them.
Apr 25 2010
davidl <davidl nospam.com> writes:
I hope it can create a dynamic object when unpacking a serialized object (for when the client doesn't know the exact type of the original object, or no declaration is available at compile time).

For example:

a plugin DLL

class Plugin{
    int plugin_state;
} // serialize this object and then send to host

Host // has no idea of the plugin's declaration

DynamicObject obj = unpack(dll_serialized_buffer);

obj.plugin_state  // it can be done through opDispatch
Apr 26 2010
"Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Mon, 26 Apr 2010 20:17:56 +0900, davidl <davidl nospam.com> wrote:

 I hope it can create Dynamic Object when unpack the serialized  
 object(once the client doesn't know the exact type of the original  
 object or no declaration is available when the compilation is done).

 For example:

 a plugin DLL

 class Plugin{
    int plugin_state;

 } // serialize this object and then send to host


 Host // has no idea of the plugin's declaration

 DynamicObject obj = unpack(dll_serialized_buffer);

 obj.plugin_state  // it can be done through opDispatch
Umm... I can't imagine this situation. On the host, how would you know the name 'plugin_state'?

I think DynamicObject is difficult. My first idea uses Variant:

-----
struct DynamicObject
{
    Variant[string] props;

    // cut

    Variant opDispatch(string name)()
    {
        return props[name];
    }
}
-----

but this code doesn't work. See http://d.puremagic.com/issues/show_bug.cgi?id=2451

Any ideas?
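
For reference, a self-contained version of the Variant idea is sketched below. It assumes a compiler and Phobos in which Variant can be used as an associative array value, which is exactly what the bug report above says was not the case at the time of this thread. The @property attribute and the explicit Variant(...) construction are additions for this sketch.

```d
import std.variant : Variant;

// Sketch only: assumes Variant[string] works on the compiler in use.
struct DynamicObject
{
    Variant[string] props;

    // Turns obj.someName into a run-time lookup of props["someName"].
    // A missing key raises a range error, as with any AA lookup.
    @property Variant opDispatch(string name)()
    {
        return props[name];
    }
}

unittest
{
    DynamicObject obj;
    obj.props["plugin_state"] = Variant(42);

    // The member name is resolved at compile time into a string key,
    // but the value and its type are only known at run time.
    assert(obj.plugin_state.get!int == 42);
}
```

This does not answer the question raised above: the host still has to know the name plugin_state when it writes obj.plugin_state.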
Apr 26 2010
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Phobos should definitely have a serialisation module, and this seems to me like a good candidate. I haven't tried the code myself, but it looks very clean and the examples are nice.

I consider it a huge plus that it uses an existing format that already has APIs for a bunch of programming languages.

Two questions:

The packer can write to an arbitrary ubyte output range, but it seems the unpacker is limited to ubyte[] arrays. Would it be possible to unpack from an arbitrary input range?

It doesn't seem to support the real type. Is this a limitation of the MessagePack format or just an oversight? Even if the format doesn't directly support 80-bit floats, it should be possible to wrap them somehow.

-Lars
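
As an illustration of the first question, here is a sketch of decoding from an arbitrary input range of ubyte instead of a ubyte[] array. The function readUint32 is hypothetical; it only shows that a fixed-width field can be consumed through empty/front/popFront without indexing or slicing.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

// Hypothetical helper: reads a big-endian 32-bit unsigned integer from any
// input range whose elements convert to ubyte, consuming four elements.
uint readUint32(R)(ref R input)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    uint result = 0;
    foreach (i; 0 .. 4)
    {
        assert(!input.empty, "unexpected end of input");
        result = (result << 8) | input.front;
        input.popFront();
    }
    return result;
}

unittest
{
    import std.range : chain;

    ubyte[] head = [0x00, 0x01];
    ubyte[] tail = [0x11, 0x70, 0xff];
    auto stream = chain(head, tail);    // a lazy range, not an array

    assert(readUint32(stream) == 70_000);
    assert(stream.front == 0xff);       // the fifth byte is still unread
}
```

What this gives up compared with a ubyte[] unpacker is zero-copy slicing: raw payloads would have to be copied element by element out of a generic range.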
Apr 27 2010
tama <repeatedly gmail.com> writes:
On Tue, 27 Apr 2010 21:27:35 +0900, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 Phobos should definitely have a serialisation module, and this seems to  
 me like a good candidate.  I haven't tried the code myself, but it looks  
 very clean and the examples are nice.
Thanks!
 I consider it a huge plus that it uses an existing format that already  
 has APIs for a bunch of programming languages.
I agree. First, I looked at several serialization libraries. All things considered, I selected MessagePack. Other libraries and specs seem large and complex.
 Two questions:

 The packer can write to an arbitrary ubyte output range, but it seems  
 the unpacker is limited to ubyte[] arrays.  Would it be possible to  
 unpack from an arbitrary input range?
No. The unpacker could do so if D had an integrated stream abstraction. For your information, other implementations (e.g. Java) use InputStream.
 It doesn't seem to support the real type.  Is this a limitation of the  
 MessagePack format or just an oversight?  Even if the format doesn't  
 directly support 80-bit floats, it should be possible to wrap them  
 somehow.
Good point. MessagePack doesn't define a real type (80-bit float) because some languages don't have one. But MessagePack has unused space in its format, so an implementation can support a language-specific format. The real type is environment-dependent, which can cause loss of precision. The unpacker can't be responsible for this problem. OK? I will try to implement real support.
Apr 27 2010
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
For systems where real is 64 bits it may be possible to use an 80-bit std.numeric.CustomFloat as an intermediate step between the stored data and the 64-bit type.

-Lars
Apr 27 2010
"Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Wed, 28 Apr 2010 02:20:58 +0900, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 For systems where real is 64 bits it may be possible to use an 80-bit  
 std.numeric.CustomFloat as an intermediate step between the stored data  
 and the 64-bit type.
Thanks for your advice!
Apr 27 2010
"Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Wed, 28 Apr 2010 02:20:58 +0900, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 For systems where real is 64 bits it may be possible to use an 80-bit  
 std.numeric.CustomFloat as an intermediate step between the stored data  
 and the 64-bit type.
I tried std.numeric.CustomFloat:

-----
alias CustomFloat!(1, 64, 15) RealRep;
-----

But this code doesn't compile...
Apr 27 2010
"Robert Jacques" <sandford jhu.edu> writes:
CustomFloat is currently broken. There are some partial patches (see http://d.puremagic.com/issues/show_bug.cgi?id=3520), but I don't think they will address this issue.
Apr 27 2010
"Masahiro Nakagawa" <repeatedly gmail.com> writes:
Yeah. CustomFloat uses std.bitmanip.bitfields for its internal state, but bitfields can't handle sizes that large :(
Apr 27 2010
"Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Tue, 27 Apr 2010 21:27:35 +0900, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 It doesn't seem to support the real type.  Is this a limitation of the  
 MessagePack format or just an oversight?  Even if the format doesn't  
 directly support 80-bit floats, it should be possible to wrap them  
 somehow.
Finished.
http://bitbucket.org/repeatedly/msgpack4d/changeset/87954e46548e

If the real type is the same as the double type:
 - Packer uses the double type for the real type.
 - Unpacker raises UnpackException when it receives a real type.

The reason is that using the real type means precision is important.

How does it look?
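
The rule above can be sketched as a compile-time platform check. Everything except the name UnpackException is hypothetical here: the Tag enum, tagForReal and checkUnpackable are stand-ins for illustration and are not taken from the linked changeset, and the tags are not real MessagePack type codes.

```d
// True on platforms where real carries no more precision than double.
enum bool realIsDouble = real.mant_dig == double.mant_dig;

class UnpackException : Exception
{
    this(string msg) { super(msg); }
}

// Hypothetical wire tags for this sketch only.
enum Tag : ubyte { float64, extended }

// Packer side: choose the representation used for a real value.
Tag tagForReal()
{
    static if (realIsDouble)
        return Tag.float64;     // real fits in a double, so pack it as one
    else
        return Tag.extended;    // keep the extra precision
}

// Unpacker side: refuse extended-precision data that this platform
// could only load by silently dropping precision.
void checkUnpackable(Tag tag)
{
    static if (realIsDouble)
    {
        if (tag == Tag.extended)
            throw new UnpackException("real data cannot be represented exactly");
    }
}

unittest
{
    import std.exception : assertThrown;

    static if (realIsDouble)
    {
        assert(tagForReal() == Tag.float64);
        assertThrown!UnpackException(checkUnpackable(Tag.extended));
    }
    else
    {
        assert(tagForReal() == Tag.extended);
        checkUnpackable(Tag.extended);  // accepted without throwing
    }
}
```

Plain double data is accepted on every platform in this sketch; only the extended form is rejected where it cannot be held exactly.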
Apr 27 2010