digitalmars.D.learn - Reflections on Serialization APIs in D

Nordlöw (17/17) Nov 17 2013 In the road to develop a new kind of search engine that caches

Jacek Furmankiewicz (5/5) Nov 17 2013 I have not used it in D, but we use Thrift in Java a lot and I've
Orvid King (17/34) Nov 17 2013 I would suspect that the biggest reason is the limitations that that

Per Nordlöw <per.nordlow gmail.com> (8/59) Nov 18 2013 Is JSONSerialization somehow related to the upcoming

Jacob Carlborg (6/10) Nov 18 2013 The idea is that std.serialization can support many different archive

Per Nordlöw <per.nordlow gmail.com> (3/16) Nov 18 2013 Ok. That is great.

Orvid King (9/70) Nov 18 2013 Yep, my goal with it is to be a possible contender for the place of

Atila Neves (8/14) Nov 18 2013 I'm not sure that's actually true. I've been working on my own

Jacek Furmankiewicz (8/8) Nov 18 2013 The reason I like Thrift is that it is backwards and forwards
Orvid King (9/23) Nov 18 2013 I am curious as to how exactly that would work, does it determine the

Atila Neves (2/15) Nov 18 2013 I promise to announce and explain it soon. Maybe even today. I

Atila Neves (2/18) Nov 18 2013

Jacob Carlborg (4/9) Nov 18 2013 Is that only for custom serialization or a general requirement?

Atila Neves (2/12) Nov 18 2013 See my previous answer.

"Nordlöw" <per.nordlow gmail.com> writes:

On the road to developing a new kind of search engine that caches 
types, statistics, etc. about files and directories, I'm currently 
trying to implement persistent caching of my internal directory 
tree using `msgpack-d`.

Why don't `msgpack-d` and, from what I can see, also 
`std.serialization` (Orange) support implementing *both* packing 
and unpacking through one common template (member) function 
overload, like **Boost.Serialization** does? For example, 
containers can be handled using this concise and elegant syntax 
in C++11:

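         friend class boost::serialization::access;
         // Called for both saving and loading: `ar & e` writes e to an
         // output archive and reads into e from an input archive.
         template<class Ar> void serialize(Ar& ar, const unsigned int version) {
             for (auto& e : *this) { ar & e; }
         }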

This halves the code size as well as removing the risk of the 
`pack` and `unpack` functions going out of sync.
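For readers unfamiliar with the Boost idiom, here is a minimal, self-contained C++17 sketch of the single-function approach. It does not use Boost: `OutArchive`, `InArchive` and `Samples` are hypothetical names, and the archives simply copy trivially copyable values byte for byte (no endianness, versioning or error handling). The point is only that `operator&` saves in one archive type and loads in the other, so `serialize` is written once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical saving archive: `ar & x` appends the raw bytes of x.
struct OutArchive {
    std::vector<unsigned char> bytes;

    template <class T>
    OutArchive& operator&(const T& v) {
        static_assert(std::is_trivially_copyable_v<T>, "sketch handles plain values only");
        const auto* p = reinterpret_cast<const unsigned char*>(&v);
        bytes.insert(bytes.end(), p, p + sizeof(T));
        return *this;
    }
};

// Hypothetical loading archive: `ar & x` reads the next sizeof(x) bytes into x.
struct InArchive {
    const std::vector<unsigned char>& bytes;
    std::size_t pos = 0;

    template <class T>
    InArchive& operator&(T& v) {
        static_assert(std::is_trivially_copyable_v<T>, "sketch handles plain values only");
        assert(pos + sizeof(T) <= bytes.size());  // no real error handling here
        std::memcpy(&v, bytes.data() + pos, sizeof(T));
        pos += sizeof(T);
        return *this;
    }
};

// A container-like type with ONE serialize function for both directions.
struct Samples {
    std::vector<std::int32_t> values;

    template <class Ar>
    void serialize(Ar& ar) {
        auto n = static_cast<std::uint32_t>(values.size());
        ar & n;            // saving: writes the size; loading: overwrites n
        values.resize(n);  // no-op when saving, sizes the vector when loading
        for (auto& e : values) ar & e;
    }
};
```

Note that the element count is serialized before the elements so the loading side knows how many to read; with a single function that ordering lives in one place instead of being duplicated across separate `pack` and `unpack` routines.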

Nov 17 2013

"Jacek Furmankiewicz" <jacek99 gmail.com> writes:

I have not used it in D, but we use Thrift in Java a lot and I've 
been very happy with it on many levels. It works really well in 
production.

Since Thrift recently gained D bindings, it may be worth your 
time to investigate.

Nov 17 2013

Orvid King <blah38621 gmail.com> writes:

On 11/17/13, "Nordlöw" <per.nordlow gmail.com> wrote:
 In the road to develop a new kind of search engine that caches
 types, statistics, etc about files and directories I'm currently
 trying to implement persistent caching of my internal directory
 tree using `msgpack-d`:

 Why doesn't `msgpack-d` and, from what I can see also,
 `std.serialization` (Orange) support implementing *both* packing
 and unpacking through one common template (member) function
 overload like **Boost.Serialization** does?. For example
 containers can be handled using this concise and elegant syntax
 in C++11:

          friend class boost::serialization::access;
          template<class Ar> void serialize(Ar& ar, const uint version) {
              for (const auto& e : *this) { ar & e; }
          }

 This halves the code size aswell as removes the risk of making
 the `pack` and `unpack` go out of sync.

I would suspect that the biggest reason is the limitations this
imposes on the underlying serialization implementation: it would
require the underlying format to support a minimum set of types.

I have something similar(ish) in my serialization framework
(https://github.com/Orvid/JSONSerialization) that allows you to
implement a custom format for each type, but I implement it as a pair
of methods, toString and parse, which allows the underlying format to
support only strings if it really wants to. Also, my framework
currently only supports JSON, but it's designed so that it would be
insanely easy to add support for another format. It's also fast, very
fast, mostly because I have managed to implement the JSON
serialization methods with no allocation at all. I'm able to serialize
100k objects in about 90 ms on an i5 running at 1.6 GHz.
Deserialization is currently a bit slower, 420 ms to deserialize those
same objects, but that's almost exclusively allocation time.
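A rough C++ analogue of the toString/parse design described above, offered as a sketch only: JSONSerialization itself is written in D and its real API may look quite different. `Rgb`, `writeJsonValue` and `readJsonValue` are hypothetical names. The user type converts itself to and from a string, and the format layer only ever handles strings (JSON escaping is omitted).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical user type with a custom textual form ("#rrggbb").
// It knows nothing about JSON; it only converts to and from a string.
struct Rgb {
    std::uint8_t r = 0, g = 0, b = 0;

    std::string toString() const {
        char buf[8];  // '#' + 6 hex digits + terminating NUL
        std::snprintf(buf, sizeof buf, "#%02x%02x%02x", r, g, b);
        return buf;
    }

    static Rgb parse(const std::string& s) {
        assert(s.size() == 7 && s[0] == '#');
        auto byteAt = [&](std::size_t i) {
            return static_cast<std::uint8_t>(std::stoi(s.substr(i, 2), nullptr, 16));
        };
        return Rgb{byteAt(1), byteAt(3), byteAt(5)};
    }
};

// Hypothetical JSON layer: it only handles strings, so any type with
// toString()/parse() can be stored as a JSON string value.
template <class T>
std::string writeJsonValue(const T& v) {
    return "\"" + v.toString() + "\"";  // escaping omitted in this sketch
}

template <class T>
T readJsonValue(const std::string& json) {
    assert(json.size() >= 2 && json.front() == '"' && json.back() == '"');
    return T::parse(json.substr(1, json.size() - 2));
}
```

For example, `writeJsonValue(Rgb{255, 16, 0})` yields the JSON string `"#ff1000"`. The trade-off is that `toString` hands back a freshly built `std::string`, so this path allocates even if the surrounding writer does not.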

Nov 17 2013

"Per Nordlöw" <per.nordlow gmail.com> writes:

Is JSONSerialization somehow related to the upcoming 
std.serialization?
I feel that there is a big need for standardizing serialization 
in D. There are too many alternatives: dproto, msgpack, JSON, 
XML, etc. Shouldn't they all be made backends of the same 
frontend, named std.serialization?

/Per

Nov 18 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-18 12:25, Per Nordlöw <per.nordlow gmail.com> wrote:

 Is JSONSerialization somehow related to the upcoming std.serialization?

No.

 I feel that there is a big need for standardizing serialization in D.
 There are too many alternatives: dproto, msgpack, JSON, xml, etc should
 be made backends to the same frontend named std.serialization right?

The idea is that std.serialization can support many different archive 
types (backends).

-- 
/Jacob Carlborg
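To illustrate the frontend/backend (archive) split in general terms, here is a hypothetical C++ sketch; it is not std.serialization's actual API. A type describes its fields once, by name, and any archive type providing matching `field` overloads can consume that description. `Person`, `JsonArchive`, `KeyValueArchive` and `serializeTo` are made-up names, and only the saving direction is shown.

```cpp
#include <cassert>
#include <string>

// Hypothetical frontend: a type describes its fields once, by name,
// without knowing which archive (backend) will consume them.
struct Person {
    std::string name;
    int age = 0;

    template <class Archive>
    void describe(Archive& ar) const {
        ar.field("name", name);
        ar.field("age", age);
    }
};

// Backend 1 (hypothetical): JSON-like object text, no string escaping.
struct JsonArchive {
    std::string out = "{";
    bool first = true;

    void comma() { if (!first) out += ","; first = false; }
    void field(const char* key, const std::string& v) {
        comma();
        out += "\"" + std::string(key) + "\":\"" + v + "\"";
    }
    void field(const char* key, int v) {
        comma();
        out += "\"" + std::string(key) + "\":" + std::to_string(v);
    }
    std::string finish() const { return out + "}"; }
};

// Backend 2 (hypothetical): one key=value pair per line.
struct KeyValueArchive {
    std::string out;

    void field(const char* key, const std::string& v) {
        out += std::string(key) + "=" + v + "\n";
    }
    void field(const char* key, int v) {
        out += std::string(key) + "=" + std::to_string(v) + "\n";
    }
};

// Generic entry point: works with any archive that provides field().
template <class Archive, class T>
void serializeTo(Archive& ar, const T& value) {
    value.describe(ar);
}
```

Adding, say, a msgpack or XML backend would mean writing one more archive type with the same `field` overloads; `Person` itself would not change.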

Nov 18 2013

"Per Nordlöw" <per.nordlow gmail.com> writes:

Ok. That is great.

Thx.

Nov 18 2013

Orvid King <blah38621 gmail.com> writes:

On 11/18/13, Per Nordlöw <per.nordlow gmail.com> wrote:
 Is JSONSerialization somehow related to the upcoming
 std.serialization?
 I feel that there is a big need for standardizing serialization
 in D. There are too many alternatives: dproto, msgpack, JSON,
 xml, etc should be made backends to the same frontend named
 std.serialization right?

 /Per

Yep, my goal with it is to be a possible contender for the place of
std.serialization. I've designed it from the start to be fast but also
easily usable, which is why toJSON and fromJSON exist: they provide a
very large usability improvement while still allowing it to be fast.
It is also based on an abstracted API that allows you to interact with
the serialization format in a way that is independent of what the
actual format is.

Nov 18 2013

"Atila Neves" <atila.neves gmail.com> writes:

 I would suspect that the biggest reason is the limitations that that
 imposes on the underlying serialization implementation, as it would
 require that the underlying format support a minimum set of types.

I'm not sure that's actually true. I've been working on my own 
serialisation library in D that I plan to unleash on the announce 
forum soon and it does it in a manner described by the original 
poster. Even with custom serialisations, client code need only 
define one function for both directions.

The only reason I haven't announced it yet is that I wanted to 
be sure it's polished enough, but maybe I shouldn't wait.

Atila

Nov 18 2013

"Jacek Furmankiewicz" <jacek99 gmail.com> writes:

The reason I like Thrift is that it is backwards and forwards 
compatible.

As long as you keep defining new fields in your schema as 
"optional", old clients can read data from new producers, and 
new clients can read data from old producers.

Not many binary serialization formats offer this kind of 
flexibility to evolve your schema over time.
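A toy C++17 sketch of why tagged optional fields give this kind of compatibility. The encoding here is a made-up list of (field id, value) pairs, not Thrift's actual wire protocol, and `TaskV1`/`TaskV2` are hypothetical schema versions. Readers skip field ids they don't recognize, and fields missing from older data simply stay unset.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Made-up wire format (NOT Thrift's real encoding): a flat list of
// (field id, value) pairs. A field id is never reused once assigned.
struct TaggedField {
    std::uint16_t id;
    std::int64_t value;
};
using Message = std::vector<TaggedField>;

// Schema version 1: only field 1.
struct TaskV1 {
    std::int64_t id = 0;

    Message write() const { return {{1, id}}; }
    static TaskV1 read(const Message& m) {
        TaskV1 t;
        for (const auto& f : m)
            if (f.id == 1) t.id = f.value;  // unknown ids (e.g. 2) are skipped
        return t;
    }
};

// Schema version 2: adds optional field 2 (priority).
struct TaskV2 {
    std::int64_t id = 0;
    std::optional<std::int64_t> priority;  // absent in data from V1 writers

    Message write() const {
        Message m{{1, id}};
        if (priority) m.push_back({2, *priority});
        return m;
    }
    static TaskV2 read(const Message& m) {
        TaskV2 t;
        for (const auto& f : m) {
            if (f.id == 1) t.id = f.value;
            else if (f.id == 2) t.priority = f.value;
        }
        return t;
    }
};
```

Real Thrift protocols also record each field's type next to its id, so a reader can skip an unknown field whose size it cannot otherwise know; this sketch sidesteps that by making every value an `int64_t`.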

Nov 18 2013

Orvid King <blah38621 gmail.com> writes:

On 11/18/13, Atila Neves <atila.neves gmail.com> wrote:
 I'm not sure that's actually true. I've been working on my own
 serialisation library in D that I plan to unleash on the announce
 forum soon and it does it in a manner described by the original
 poster. Even with custom serialisations, client code need only
 define one function for both directions.

 The only reason I haven't announced it yet is because I wanted to
 be pure it's polished enough, but maybe I shouldn't wait.

 Atila

I am curious as to how exactly that would work. Does it determine the
output format at compile time or at runtime? Does it specify the way
it's serialized, or its serialized representation? I'd also be curious
about the performance impact it brings, if any. Depending on its exact
function, it's perfectly possible that it could actually be faster
than my toString/parse combo, because mine requires the string to be
allocated by toString, due to the lack of knowledge of the underlying
format.
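The allocation cost mentioned here can be sketched in C++ as two styles for the same hypothetical `Ratio` type: a `toString()` that returns a new `std::string`, versus a `writeTo(sink)` template that appends directly into whatever output the format provides. `FixedSink` is an assumed example sink that writes into caller-owned storage, so the second path needs no heap allocation for this type. This illustrates the trade-off only; it is not how either library discussed in the thread actually works.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sink that writes into caller-owned storage (no heap use).
struct FixedSink {
    char* buf;
    std::size_t cap;
    std::size_t len = 0;

    void put(std::string_view s) {
        assert(len + s.size() <= cap);  // sketch: overflow is treated as a bug
        for (char c : s) buf[len++] = c;
    }
    std::string_view view() const { return {buf, len}; }
};

// Hypothetical value type; assumes non-negative fields for brevity.
struct Ratio {
    int num = 0;
    int den = 1;

    // Style 1: builds and returns a new std::string, which may allocate.
    std::string toString() const {
        return std::to_string(num) + "/" + std::to_string(den);
    }

    // Style 2: writes straight into the sink supplied by the format.
    template <class Sink>
    void writeTo(Sink& sink) const {
        writeDigits(sink, num);
        sink.put("/");
        writeDigits(sink, den);
    }

private:
    template <class Sink>
    static void writeDigits(Sink& sink, int v) {
        char digits[12];  // enough for any non-negative 32-bit int
        int n = 0;
        do {
            digits[n++] = static_cast<char>('0' + v % 10);
            v /= 10;
        } while (v > 0);
        // Digits were produced least significant first; emit them reversed.
        while (n > 0) sink.put(std::string_view(&digits[--n], 1));
    }
};
```

Both styles produce the same text (`"12/305"` for `Ratio{12, 305}`); the difference is only in who owns the output buffer.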

Nov 18 2013

"Atila Neves" <atila.neves gmail.com> writes:

 I am curious as to how exactly that would work, does it determine the
 output format at compile-time or runtime? Does it specify the way it's
 serialized, or it's serialized representation? I'd also be curious
 about the performance impact it brings, if any. Depending on it's
 exact function it's perfectly possible that it could actually be
 faster than my toString/parse combo because mine requires the string
 be allocated from toString, due to the lack of knowledge of the
 underlying format.

I promise to announce and explain it soon. Maybe even today. I 
just have to fix one small detail.

Nov 18 2013

"Atila Neves" <atila.neves gmail.com> writes:

http://forum.dlang.org/thread/sctofkitoaxftosxtspw forum.dlang.org#post-sctofkitoaxftosxtspw:40forum.dlang.org

Nov 18 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-18 16:11, Atila Neves wrote:

 I'm not sure that's actually true. I've been working on my own
 serialisation library in D that I plan to unleash on the announce forum
 soon and it does it in a manner described by the original poster. Even
 with custom serialisations, client code need only define one function
 for both directions.

Is that only for custom serialization or a general requirement?

-- 
/Jacob Carlborg

Nov 18 2013

"Atila Neves" <atila.neves gmail.com> writes:

On Monday, 18 November 2013 at 15:26:28 UTC, Jacob Carlborg wrote:
 On 2013-11-18 16:11, Atila Neves wrote:

 I'm not sure that's actually true. I've been working on my own
 serialisation library in D that I plan to unleash on the announce forum
 soon and it does it in a manner described by the original poster. Even
 with custom serialisations, client code need only define one function
 for both directions.

 Is that only for custom serialization or a general requirement?

See my previous answer.

Nov 18 2013
