
digitalmars.D - Request for review - std.serialization (orange)

reply Jacob Carlborg <doob me.com> writes:
std.serialization (orange) is now ready to be reviewed.

A couple of notes for the review:

* The most important packages are: orange.serialization and 
orange.serialization.archives

* The unit tests are located in their own package. I'm not very happy 
about putting the unit tests in the same module as the rest of the code, 
i.e. the serialization module. What are the options? These tests are 
quite high level: they test the whole Serializer class and not 
individual functions.

* I'm using some utility functions located in the "util" and "core" 
packages, what should we do about those, where to put them?

* Trailing whitespace and tabs will be fixed when/if the package gets 
accepted

* If this gets accepted, should I do a subtree merge (or whatever it's 
called) to keep the history intact?

Changes since last time:

* I've removed any Tango and D1 related code
* I've removed all unused functions (hopefully)

For usage examples, see the github wiki pages: 
https://github.com/jacob-carlborg/orange/wiki/_pages

For more extended usage examples, see the unit tests: 
https://github.com/jacob-carlborg/orange/tree/master/tests

Sources: https://github.com/jacob-carlborg/orange
Documentation: https://dl.dropbox.com/u/18386187/orange_docs/Serializer.html
Run unit tests: execute the unittest.sh shell script

(Don't forget to click the "Package" tab in the top corner to see the 
documentation for the rest of the modules.)

-- 
/Jacob Carlborg
Mar 24 2013
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/24/13, Jacob Carlborg <doob me.com> wrote:
 For usage examples, see the github wiki pages:
 https://github.com/jacob-carlborg/orange/wiki/_pages

A small example actually writing the XML file to disk and then reading it back would be beneficial.

Btw the library doesn't build with the -w switch:

orange\xml\PhobosXml.d(2536): Error: switch case fallthrough - use 'goto case;' if intended
Mar 24 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-24 22:41, Andrej Mitrovic wrote:

 A small example actually writing the xml file to disk and the reading
 back from it would be beneficial.

Ok, so just adding write and read to disk to the usage example on the github page?
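Something along these lines, perhaps (a minimal sketch only; the XmlArchive!(char) constructor and the archive's data property are my guesses at the API, and the disk round-trip is plain std.file):

```d
import orange.serialization.Serializer;
import orange.serialization.archives.XmlArchive;
import std.file : read, write;

class Foo { int a; }

void main ()
{
    auto archive = new XmlArchive!(char); // assumption: char-parameterized XML archive
    auto serializer = new Serializer(archive);

    auto foo = new Foo;
    foo.a = 3;

    // serialize and write the XML to disk
    serializer.serialize(foo);
    write("foo.xml", archive.data); // assumption: "data" exposes the produced XML

    // read it back from disk and deserialize
    auto xml = cast(string) read("foo.xml");
    auto foo2 = serializer.deserialize!(Foo)(xml);
    assert(foo2.a == 3);
}
```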
 Btw the library doesn't build with the -w switch:

 orange\xml\PhobosXml.d(2536): Error: switch case fallthrough - use
 'goto case;' if intended

Good catch. -- /Jacob Carlborg
Mar 25 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-24 22:41, Andrej Mitrovic wrote:

 orange\xml\PhobosXml.d(2536): Error: switch case fallthrough - use
 'goto case;' if intended

PhobosXml is a local copy of std.xml with a few small modifications. If accepted I'll make the changes to std.xml and remove PhobosXml. -- /Jacob Carlborg
Mar 27 2013
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:

Just at a glance, a few things strike me...

Phobos doesn't typically use classes, seems to prefer flat functions. Are
we happy with classes in this instance?
Use of caps in the filenames/functions is not very phobos like.

Can I have a post-de-serialise callback to recalculate transient data?

Why register serialisers, and structures that can be operated on? (I'm not
a big fan of registrations of this sort personally, if they can be avoided)

Is there a mechanism to deal with pointers, or do you just serialise
through the pointer? Some sort of reference system so objects pointing at
the same object instance will deserialise pointing at the same object
instance (or a new copy thereof)?

Is it fast? I see in your custom deserialise example, you deserialise
members by string name... does it need to FIND those in the stream by name,
or does it just use that to validate the sequence?
I have a serialiser that serialises in realtime (60fps), a good fair few
megabytes of data per frame... will orange handle this?

Documentation, what attributes are available? How to use them?

You only seem to provide an XML backend. What about JSON? Binary (with
endian awareness)?

Writing an Archiver looks a lot more involved than I would have imagined.
XmlArchive.d is huge, mostly just 'ditto'.
Should unarchiveXXX() not rather be unarchive!(XXX)(), allowing to minimise
most of those function definitions?


On 25 March 2013 07:41, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:

 On 3/24/13, Jacob Carlborg <doob me.com> wrote:
 For usage examples, see the github wiki pages:
 https://github.com/jacob-carlborg/orange/wiki/_pages

A small example actually writing the xml file to disk and the reading back from it would be beneficial. Btw the library doesn't build with the -w switch: orange\xml\PhobosXml.d(2536): Error: switch case fallthrough - use 'goto case;' if intended

Mar 24 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-25 02:16, Manu wrote:
 Just at a glance, a few things strike me...

 Phobos doesn't typically use classes, seems to prefer flat functions.

It's necessary to have a class or struct to pass around. The serializer is passed to methods/functions doing custom serialization. I could create a free function that encapsulates the classes for the common use cases.
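Such a convenience wrapper might look roughly like this (a sketch under stated assumptions: serializeToXml is hypothetical, not an existing Orange function, and the archive's data property is my guess at how to get the output):

```d
import orange.serialization.Serializer;
import orange.serialization.archives.XmlArchive;

// Hypothetical free function hiding the Serializer/XmlArchive classes
// for the common "just give me the XML" use case.
string serializeToXml (T) (T value)
{
    auto archive = new XmlArchive!(char); // assumption: char-parameterized archive
    auto serializer = new Serializer(archive);
    serializer.serialize(value);
    return archive.data; // assumption: exposes the produced XML as a string
}
```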
 Are we happy with classes in this instance?
 Use of caps in the filenames/functions is not very phobos like.

Yeah, that will be fixed if accepted. As you see, it's still a separate library and not included into Phobos.
 Can I have a post-de-serialise callback to recalculate transient data?

Yes. There are three ways to customize the serialization process:

1. Take complete control of the process (for the type) by adding toData/fromData to your types: https://github.com/jacob-carlborg/orange/wiki/Custom-Serialization

2. Take complete control of the process (for the type) by registering a function pointer/delegate as a serializer for a given type. Useful for serializing third party types: https://github.com/jacob-carlborg/orange/wiki/Non-Intrusive-Serialization

3. Add the onDeserialized attribute to a method in the type being serialized: https://github.com/jacob-carlborg/orange/blob/master/tests/Events.d#L75 https://dl.dropbox.com/u/18386187/orange_docs/Events.html

I noticed that the documentation for the attributes doesn't look so good.
 Why register serialiser's, and structures that can be operated on? (I'm
 not a big fan of registrations of this sort personally, if they can be
 avoided)

The only time when registering a serializer is really necessary is when serializing through a base class reference. Otherwise the use cases are when customizing the serialization process.
 Is there a mechanism to deal with pointers, or do you just serialise
 through the pointer? Some sort of reference system so objects pointing
 at the same object instance will deserialise pointing at the same object
 instance (or a new copy thereof)?

Yes. All reference types (including pointers) are only serialized once. If a pointer that is serialized points to data not being serialized, it serializes what it's pointing to as well. If you're curious about the internals I suggest you serialize some class/struct hierarchy and look at the XML data. It should be readable.
 Is it fast? I see in your custom deserialise example, you deserialise
 members by string name... does it need to FIND those in the stream by
 name, or does it just use that to validate the sequence?

That's up to the archive how it's implemented. But the idea is that it should be able to find fields by name in the serialized data. That is kind of an implicit contract between the archive and the serializer.
 I have a serialiser that serialises in realtime (60fps), a good fair few
 megabytes of data per frame... will orange handle this?

Probably not. I think it mostly depends on the archive used. The XML module in Phobos is really, REALLY slow. Serializing the same data with Tango (D1) is at least twice as fast.

I have started to work on an archive type that just tries to be as fast as possible. That:

* Breaks the implicit contract with the serializer
* Doesn't care about endianness
* Doesn't care if the fields have changed
* May not handle slices correctly
* And some other things
 Documentation, what attributes are available? How to use them?

https://dl.dropbox.com/u/18386187/orange_docs/Events.html
https://dl.dropbox.com/u/18386187/orange_docs/Serializable.html

Is this clear enough?
 You only seem to provide an XML backend. What about JSON? Binary (with
 endian awareness)?

Yeah, that is not implemented yet. Is it necessary before adding it to Phobos?
 Writing an Archiver looks a lot more involved than I would have
 imagined. XmlArchive.d is huge, mostly just 'ditto'.
 Should unarchiveXXX() not rather be unarchive!(XXX)(), allowing to
 minimise most of those function definitions?

Yeah, it has kind of a big API. The reason is to be able to use interfaces. Serializer contains a reference to an archive, typed as the interface Archive. If you're using custom serialization I don't think it would be good to lock yourself to a specific archive type. BTW, unarchiveXXX is forwarded to a private unarchive!(XXX)() in XmlArchive.

With classes and interfaces:

class Serializer
interface Archive
class XmlArchive : Archive

Archive archive = new XmlArchive;
auto serializer = new Serializer(archive);

struct Foo
{
    void toData (Serializer serializer, Serializer.Data key);
}

With templates:

class Serializer (T)
class XmlArchive

auto archive = new XmlArchive;
auto serializer = new Serializer!(XmlArchive)(archive);

struct Foo
{
    void toData (Serializer!(XmlArchive) serializer, Serializer.Data key);
}

Foo is now locked to the XmlArchive. Or:

class Bar
{
    void toData (T) (Serializer!(T) serializer, Serializer.Data key);
}

toData cannot be virtual.

-- 
/Jacob Carlborg
Mar 25 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-31 12:40, Kagamin wrote:

 http://dpaste.dzfl.pl/0f7d8219

So instead Serializer gets a huge API? -- /Jacob Carlborg
Mar 31 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-31 21:02, Kagamin wrote:

 Are a lot of serializers expected to be written? Archives are.

Hmm, maybe it could work. -- /Jacob Carlborg
Mar 31 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-31 23:40, Kagamin wrote:
 A "MyArchive" example can be useful too. The basic idea is to write a
 minimal archive class with basic test code. All methods
 assert(false,"method archive(dchar) not implemented"); the example
 compiles and runs, but asserts. So people take the example and fill
 methods with their own implementations, thus incrementally building
 their archive class.

Yes, if the API is changed to what you're suggesting. -- /Jacob Carlborg
Apr 01 2013
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-24 22:03, Jacob Carlborg wrote:

 std.serialization (orange) is now ready to be reviewed.

Just so there is no confusion. If it gets accepted I will replace tabs with spaces, fix the column limit and change all filenames to lowercase. -- /Jacob Carlborg
Mar 25 2013
prev sibling next sibling parent reply "Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
Hello Jacob,

These comments are based on looking into adding Protocol Buffer 
as an archive. First some details on the PB format.
https://developers.google.com/protocol-buffers/docs/overview

1) It is a binary format
2) Not all D types can be serialized
3) Serialization is done by message (struct) and not by primitives
4) It defines options which can affect (de)serializing.

I am looking at using Serializer to drive (de)serialization even 
if that meant just jamming it in there where Orange could only 
read PB data it has written. Keep in mind I'm not saying these 
are requirements or that I know what I'm talking about, only my 
thoughts.

My first thought was at a minimum I could just use a function 
which does the complete (de)serialization of the type. Which 
would be great since the pbcompiler I'm using/modifying already 
does this.

Because of the way custom serialization works, I'm stopped by point 3. I 
didn't realize that at first, so I also looked at implementing an 
Archive. What I notice here is:

* Information is lost, specifically the attributes (more 
important with UDA).
* I am required to implement conversions I have no implementation 
for.

This leaves me concluding that I'd need to implement my own 
Serializer, which seems to me I'm practically reimplementing most 
of Orange to use Orange with PB.

Does having Orange support things like PB make sense?

I think some work could be done for the Archive API as it doesn't 
feel like D2. Maybe we could look at locking down custom 
Archive/Serializer classes while the internals are worked out 
(would mean XML (de)serialization is available in Phobos).
Mar 30 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-30 21:02, Jesse Phillips wrote:
 Hello Jacob,

 These comments are based on looking into adding Protocol Buffer as an
 archive. First some details on the PB format.
 https://developers.google.com/protocol-buffers/docs/overview

 1) It is a binary format

That shouldn't be a problem. Preferably it should support some kind of identity map and be able to deserialize fields in any order.
 2) Not all D types can be serialized

Any data format that supports some kind of key-value mapping should be able to serialize all D types. Although, possibly in a format that is not idiomatic for that data format. XML doesn't have any types and the XML archive can serialize any D type.
 3) Serialization is done by message (struct) and not by primitives

I'm not sure I understand this.
 4) It defines options which can affect (de)serializing.

While Orange doesn't support the options Protocol Buffer seems to use directly, it should be possible by customizing the serialization of a type. See:

https://github.com/jacob-carlborg/orange/wiki/Custom-Serialization
https://github.com/jacob-carlborg/orange/wiki/Non-Intrusive-Serialization

Alternatively they are useful enough to have direct support in the serializer.
 I am looking at using Serializer to drive (de)serialization even if that
 meant just jamming it in there where Orange could only read PB data it
 has written. Keep in mind I'm not saying these are requirements or that
 I know what I'm talking about, only my thoughts.

That should be possible. I've been working on a binary archive that tries to be as fast as possible, breaking rules to the left and right; it doesn't conform to the implicit contract between the serializer and archive, and so on.
 My first thought was at a minimum I could just use a function which does
 the complete (de)serialization of the type. Which would be great since
 the pbcompiler I'm using/modifying already does this.

 Because of the way custom serialization I'm stopped by point 3. I didn't
 realize that at first so I also looked at implementing an Archive. What
 I notice here is

 * Information is lost, specifically the attributes (more important with
 UDA).

Do you want UDAs passed to the archive for a given type or field? I don't know how easy that would be to implement. It would probably require a template method in the archive, which I would like to avoid, since it wouldn't be possible to use via an interface.
 * I am required to implement conversions I have no implementation for.

Just implement an empty method for any method you don't have use for. If it needs to return a value, you can usually return typeof(return).init.
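For instance, a stub might look like this (a sketch; the method name and signature here are illustrative, not copied from Orange's actual Archive interface):

```d
// Illustrative stub for an archive method you have no use for.
// The name and signature are hypothetical.
wchar unarchiveWchar (string key)
{
    // This format has no wchar support: return the type's default value.
    return typeof(return).init;
}
```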
 This leaves me concluding that I'd need to implement my own Serializer,
 which seems to me I'm practically reimplementing most of Orange to use
 Orange with PB.

That doesn't sound good.
 Does having Orange support things like PB make sense?

I think so.
 I think some work could be done for the Archive API as it doesn't feel
 like D2.

It started for D1.
 Maybe we could look at locking down custom Archive/Serializer
 classes while the internals are worked out (would mean XML
 (de)serialization is available in Phobos).

-- /Jacob Carlborg
Mar 30 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-31 13:23, Kagamin wrote:

 PB does serialize by primitives and Archive has archiveStruct method
 which is called to serialize struct, I believe. At first sight orange
 serializes using built-in grammar (in EXI terms), and since PB uses
 schema-informed grammar, you have to provide schema to the archiver:
 either keep it in the archiver or store globally.

The actual struct is never passed to the archive in Orange. It basically lets the archive know: "the next primitives belong to a struct". -- /Jacob Carlborg
Mar 31 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-31 21:06, Kagamin wrote:

 Knowing the struct type name one may select the matching schema. In the
 case of PB the schema collection is just int[string][string] - maps type
 names to field maps.

Ok. I'm not familiar with Protocol Buffers. -- /Jacob Carlborg
Mar 31 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-04-01 07:15, Kagamin wrote:
 It's a pull parser? Hmm... how reordered fields are supposed to be
 handled? When the archiver is requested for a field, it will probably
 need to look ahead for the field in the entire message. Also arrays can
 be discontinuous both in xml and in pb. Also if the archiver is
 requested for a missing field, it may be a bad idea to return
 typeof(return).init as it will overwrite the default value for the field
 in the structure. Though, this may be a minor issue: field usually is
 missing because it's obsolete, but the serializer will spend time
 requesting missing fields.

Optional fields are possible to implement by writing a custom serializer for a given type. The look ahead is not needed for the entire message, only for the length of a class/struct. But since fields of a class can consist of other classes, it might not make a difference.
 As a schema-informed serialization, PB works better with specialized
 code, so it's better to provide a means for specialized serialization,
 where components will be tightly coupled, and the archiver will have
 full access to the serialized type and will be able to infer schema.
 Isn't serialization simpler when you have access to the type?

Yes, it would probably be simpler if the archive had access to the type. The idea behind Orange is that Serializer tries to do as much as possible of the implementation and leaves the data dependent parts to the archive. Also, the archive only needs to know how to serialize primitive types. -- /Jacob Carlborg
Apr 01 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-31 23:57, Kagamin wrote:

 Well, the basic idea of EXI and similar standards is that you can have 2
 types of serialization: built-in when you keep schema in the serialized
 message - which value belongs to which field (this way you can read and
 write any data structure) or schema-informed when the serializer knows
 what data it works with, so it omits schema from the message and e.g.
 writes two int fields as just consecutive 8 bytes - it knows that these
 8 bytes are 2 ints and which field each belongs to; the drawback is that
 you can't read the message without schema, the advantage is smaller
 message size and faster serialization.

I see. -- /Jacob Carlborg
Apr 01 2013
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-04-01 01:39, Jesse Phillips wrote:

 I'm not well versed in PB or Orange so I'd need to play around more with
 both, but I'm pretty sure Orange would need changes made to be able to
 make the claim PB is supported. It should be possible to create a binary
 format based on PB.

Isn't PB binary? Or it actually seems it can be both. -- /Jacob Carlborg
Apr 01 2013
next sibling parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 01:13 PM, Jesse Phillips wrote:
 On Monday, 1 April 2013 at 08:53:51 UTC, Jacob Carlborg wrote:
 On 2013-04-01 01:39, Jesse Phillips wrote:

 I'm not well versed in PB or Orange so I'd need to play around more with
 both, but I'm pretty sure Orange would need changes made to be able to
 make the claim PB is supported. It should be possible to create a binary
 format based on PB.

Isn't PB binary? Or it actually seems it can be both.

Let me see if I can describe this.

PB does encoding to binary by type. However it also has a schema in a .proto file. My first concern is that this file provides the ID to use for each field; while arbitrary, the ID must be what is specified.

The second one I'm concerned with is the option to pack repeated fields. I'm not sure of the specifics of this encoding, but I imagine some compression.

This is why I think I'd have to implement my own Serializer to be able to support PB, but also believe we could have a binary format based on PB (which maybe it would be possible to create a schema of Orange generated data, but it would be hard to generate data for a specific schema).

From what I got from the examples, repeated fields are done roughly as follows:

auto msg = fields.map!(a=>a.serialize())().reduce!((a,b)=>a~b)();
return ((id<<3)|2) ~ msg.length.toVarint() ~ msg;

Where msg is a ubyte[].

-Matt Soucy
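For concreteness, the wire-format pieces used in that snippet can be sketched in D; this toVarint is a minimal version of the idea, not the pbcompiler's actual implementation, and packRepeated is a hypothetical helper:

```d
import std.array : appender;

// Protocol Buffers varint: 7 payload bits per byte, high bit set
// on every byte except the last.
ubyte[] toVarint (ulong value)
{
    auto buf = appender!(ubyte[])();
    do
    {
        auto b = cast(ubyte) (value & 0x7F);
        value >>= 7;
        if (value)
            b |= 0x80; // continuation bit
        buf.put(b);
    } while (value);
    return buf.data;
}

// A length-delimited field: one key ((id << 3) | wire type 2),
// then the payload length, then the concatenated elements.
ubyte[] packRepeated (uint id, ulong[] values)
{
    ubyte[] payload;
    foreach (v; values)
        payload ~= toVarint(v);
    return toVarint((id << 3) | 2) ~ toVarint(cast(ulong) payload.length) ~ payload;
}

unittest
{
    assert(toVarint(1) == [0x01]);
    assert(toVarint(300) == [0xAC, 0x02]); // the example from the PB encoding docs
}
```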
Apr 01 2013
next sibling parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 02:37 PM, Kagamin wrote:
 AFAIK, it's opposite: an array serialized in chunks, and they are
 concatenated on deserialization. Useful if you don't know how many
 elements you're sending, so you send them in finite chunks as the data
 becomes available. Client can also close connection, so you don't have
 to see the end of the sequence.

https://developers.google.com/protocol-buffers/docs/encoding#optional

The "packed repeated fields" section explains it and breaks it down with an example. If the client can close like that, you probably don't want to use packed.

-Soucy
Apr 01 2013
parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 03:11 PM, Kagamin wrote:
 On Monday, 1 April 2013 at 18:41:57 UTC, Matt Soucy wrote:
 The "packed repeated fields" section explains it and breaks it down
 with an example. If the client can close like that, you probably don't
 want to use packed.

Why not? If you transfer a result of google search, the client will be able to peek only N first results and close or not. Though I agree it's strange that you can only transfer primitive types this way.

It's not really strange, because of how it actually does the serialization. A message is recorded as length + serialized members. Members can happen in any order. Packed repeated messages would look like... what? How do you know when one message ends and another begins? If you try to denote it, you'd just end up with what you already have.

In your example, you'd want to send each individual result as a distinct message, so they could be read one at a time. You wouldn't want to pack, as packing is for sending a whole data set at once.
Apr 01 2013
parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 04:54 PM, Kagamin wrote:
 On Monday, 1 April 2013 at 19:37:12 UTC, Matt Soucy wrote:
 It's not really strange, because of how it actually does the
 serialization. A message is recorded as length+serialized members.
 Members can happen in any order. Packed repeated messages would look
 like...what? How do you know when one message ends and another begins?
 If you try and denote it, you'd just end up with what you already have.

Well, messages can be just repeated, not packed. Packing is for really small elements, I guess, - namely numbers.

And therefore, it supports arrays just fine (as repeated fields). Yes. That last sentence was poorly-worded, and should have said "you'd just end up with the un'packed' data with an extra header."
 In your example, you'd want to send each individual result as a
 distinct message, so they could be read one at a time. You wouldn't
 want to pack, as packing is for sending a whole data set at once.

So you suggest to send 1 message per TCP packet?

Unfortunately, I'm not particularly knowledgeable about networking, but that's not quite what I meant. I meant that the use case itself would result in sending individual Result messages one at a time, since packing (even if it were valid) wouldn't be useful and would require getting all of the Results at once. You would just leave off the "packed" attribute.
Apr 01 2013
parent Matt Soucy <msoucy csh.rit.edu> writes:
On 04/02/2013 12:38 AM, Kagamin wrote:
 On Monday, 1 April 2013 at 21:11:57 UTC, Matt Soucy wrote:
 And therefore, it supports arrays just fine (as repeated fields). Yes.
 That last sentence was poorly-worded, and should have said "you'd just
 end up with the un'packed' data with an extra header."

It says repeated messages should be merged which results in one message, not an array of messages. So from several repeated messages you get one as if they formed contiguous soup of fields which got parsed as one message: e.g. scalar fields of the resulting message take their last seen values.

They're merged if the field is optional and gets multiple inputs with that id. If a field is marked as repeated, then each block of data denoted with that field is treated as a new item in the array.
 Unfortunately, I'm not particularly knowledgeable about networking,
 but that's not quite what I meant. I meant that the use case itself
 would result in sending individual Result messages one at a time,
 since packing (even if it were valid) wouldn't be useful and would
 require getting all of the Results at once. You would just leave off
 the "packed" attribute.

As you said, there's no way to tell where one message ends and next begins. If you send them one or two at a time, they end up as a contiguous stream of bytes. If one is to delimit messages, he should define a container format as an extension on top of PB with additional semantics for representation of arrays, which results in another protocol. And even if you define such protocol, there's still no way to have array fields in PB messages (arrays of non-trivial types). For example if you want to update students and departments with one method, the obvious choice is to pass it a dictionary of key-value pairs of new values for the object's attributes. How to do that?

I said that that only applies to the incorrect idea of packing complex messages. With regular (non-packed) repeated messages, each new repeated message is added to the array when deserialized.

While yes, protocol buffers do not create a way to denote uppermost-level messages, that isn't really relevant to the situation that you're trying to claim. If messages are supposed to be treated separately, there are numerous ways to handle that which CAN be done inside of protocol buffers. In this example, one way could be to define messages like so:

message Updates {
    message StudentUpdate {
        required string studentName = 1;
        required uint32 departmentNumber = 2;
    }
    repeated StudentUpdate updates = 1;
}

Then you would iterate over Updates.updates, which you'd be adding to upon deserialization of more of the messages.
Apr 01 2013
prev sibling parent Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 04:38 PM, Jesse Phillips wrote:
 On Monday, 1 April 2013 at 17:24:05 UTC, Matt Soucy wrote:
 From what I got from the examples, Repeated fields are done roughly as
 following:
 auto msg = fields.map!(a=>a.serialize())().reduce!(a,b=>a~b)();
 return ((id<<3)|2) ~ msg.length.toVarint() ~ msg;
 Where msg is a ubyte[].
 -Matt Soucy

I think that would fall under some form of compression, namely compressing the ID :P BTW, I love how easy that is to read.

So, I looked at the code I'm currently working on to handle these...and it's literally that, except "raw" instead of "fields". UFCS is kind of wonderful in places like this. You're right, that does count as compression. It wouldn't be hard to number the fields during serialization, especially if you don't need to worry about the structure changing. -Matt Soucy
Apr 01 2013
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-04-01 19:13, Jesse Phillips wrote:

 Let me see if I can describe this.

 PB does encoding to binary by type. However it also has a schema in a
 .proto file. My first concern is that this file provides the ID to use
 for each field, while arbitrary the ID must be what is specified.

 The second one I'm concerned with is option to pack repeated fields. I'm
 not sure the specifics for this encoding, but I imagine some compression.

 This is why I think I'd have to implement my own Serializer to be able
 to support PB, but also believe we could have a binary format based on
 PB (which maybe it would be possible to create a schema of Orange
 generated data, but it would be hard to generate data for a specific
 schema).

As I understand it there's a "schema definition", that is the .proto file. You compile this schema to produce D/C++/Java/whatever code that contains structs/classes with methods/fields that match this schema. If you need to change the schema, besides adding optional fields, you need to recompile the schema to produce new code, right?

If you have a D class/struct that matches this schema (regardless of whether it's auto generated from the schema or manually created), with actual instance variables for the fields, I think it would be possible to (de)serialize into the binary PB format using Orange.

Then there's the issue of the options supported by PB, like optional fields and packed repeated fields (which I don't know the meaning of). It seems PB is dependent on the order of the fields, so that won't be a problem. Just disregard the "key" that is passed to the archive and deserialize the next type that is expected. Maybe you could use the schema to do some extra validations. Although, I don't know how PB handles multiple references to the same value.

Looking at this: https://developers.google.com/protocol-buffers/docs/overview

Below "Why not just use XML?", they mention both a text format (not to be confused with the schema, .proto) and a binary format. Although the text format seems to be mostly for debugging.

-- 
/Jacob Carlborg
Apr 02 2013
parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/02/2013 03:21 AM, Jacob Carlborg wrote:
 On 2013-04-01 19:13, Jesse Phillips wrote:

 Let me see if I can describe this.

 PB does encoding to binary by type. However it also has a schema in a
 .proto file. My first concern is that this file provides the ID to use
 for each field; while the ID is arbitrary, it must be what is specified.

 The second one I'm concerned with is the option to pack repeated fields.
 I'm not sure of the specifics of this encoding, but I imagine some compression.

 This is why I think I'd have to implement my own Serializer to be able
 to support PB, but also believe we could have a binary format based on
 PB (which maybe it would be possible to create a schema of Orange
 generated data, but it would be hard to generate data for a specific
 schema).

As I understand it there's a "schema definition", that is the .proto file. You compile this schema to produce D/C++/Java/whatever code that contains structs/classes with methods/fields that match this schema. If you need to change the schema, beyond adding optional fields, you need to recompile the schema to produce new code, right?

If you have a D class/struct that matches this schema (regardless of whether it's auto-generated from the schema or manually created), with actual instance variables for the fields, I think it would be possible to (de)serialize into the binary PB format using Orange.

Then there's the issue of the options supported by PB, like optional fields and packing repeated fields (which I don't know what it means). It seems PB is dependent on the order of the fields, so that won't be a problem. Just disregard the "key" that is passed to the archive and deserialize the next type that is expected. Maybe you could use the schema to do some extra validations. Although, I don't know how PB handles multiple references to the same value.

Looking at this: https://developers.google.com/protocol-buffers/docs/overview

Below "Why not just use XML?", they mention both a text format (not to be confused with the schema, .proto) and a binary format. Although the text format seems to be mostly for debugging.

Unfortunately, only partially correct. Optional isn't an "option", it's a way of saying that a field may be specified 0 or 1 times. If two messages with the same ID are read and the ID is considered optional in the schema, then they are merged. Packed IS an "option", which can only be done to primitives. It changes serialization from:
 return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~
 a.writeProto!BufferType())().reduce!((a,b)=>a~b)();

to
 auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
 return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;

(Actual snippets from my partially-complete protocol buffer library)

If you had a struct that matches that schema (PB messages have value semantics) then yes, in theory you could do something to serialize the struct based on the schema, but you'd have to maintain both separately.

PB is NOT dependent on the order of the fields during serialization; they can be sent/received in any order. You could use the schema like you mentioned above to tie member names to ids, though.

PB uses value semantics, so multiple references to the same thing isn't really an issue that is covered.

I hadn't actually noticed that TextFormat stuff before... interesting. I might take a look at that later when I have time.

-Matt Soucy
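The tag and varint arithmetic in those snippets can be sketched outside of D. The following is an illustrative Python sketch (not taken from either library); it encodes a repeated int field both unpacked (the tag repeated per element) and packed (one tag, length-prefixed payload). The values 3, 270, 86942 are the packed example from Google's encoding documentation.

```python
def to_varint(n):
    # Protocol Buffers base-128 varint: 7 bits per byte, MSB = "more bytes follow".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def unpacked_repeated(field_id, values):
    # Wire type 0 (varint); the tag is repeated for every element.
    tag = to_varint((field_id << 3) | 0)
    return b"".join(tag + to_varint(v) for v in values)

def packed_repeated(field_id, values):
    # Wire type 2 (length-delimited): one tag, then the payload length, then the payload.
    payload = b"".join(to_varint(v) for v in values)
    return to_varint((field_id << 3) | 2) + to_varint(len(payload)) + payload

print(unpacked_repeated(4, [3, 270, 86942]).hex())  # 2003208e02209ea705
print(packed_repeated(4, [3, 270, 86942]).hex())    # 2206038e029ea705
```

So packing saves one tag byte per element here, which matches Matt's point that it is only worthwhile for primitives.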
Apr 02 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-04-02 15:38, Matt Soucy wrote:

 Unfortunately, only partially correct. Optional isn't an "option", it's
 a way of saying that a field may be specified 0 or 1 times. If two
 messages with the same ID are read and the ID is considered optional in
 the schema, then they are merged.

With "option", I mean you don't have to use it in the schema. But the (de)serializer of course needs to support this to be fully compliant with the spec.
 Packed IS an "option", which can only be done to primitives. It changes
 serialization from:
  > return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~
 a.writeProto!BufferType())().reduce!((a,b)=>a~b)();
 to
  > auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
  > return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;

 (Actual snippets from my partially-complete protocol buffer library)

 If you had a struct that matches that schema (PB messages have value
 semantics) then yes, in theory you could do something to serialize the
 struct based on the schema, but you'd have to maintain both separately.

Just compile the schema to a struct with the necessary fields. Perhaps not how it's usually done.
 PB is NOT dependent on the order of the fields during serialization,
 they can be sent/received in any order. You could use the schema like
 you mentioned above to tie member names to ids, though.

So if you have a schema like this:

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
}

1, 2 and 3 will be the ids of the fields, and also the order in which they are (de)serialized? Then you could have the archive read the schema, map names to ids and archive the ids instead of the names.
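Mapping names to ids in the archive, as suggested, could be sketched like this (Python, purely illustrative; the PERSON table and archive_field are hypothetical stand-ins for what a compiled schema and an archive method would provide):

```python
def to_varint(n):
    # Base-128 varint used by the Protocol Buffers wire format.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

# Hand-written stand-in for the Person schema:
# field name -> (id, wire type); strings are length-delimited (2), int32 is varint (0).
PERSON = {"name": (1, 2), "id": (2, 0), "email": (3, 2)}

def archive_field(name, value):
    # The serializer hands the archive a field name; the archive maps it to the PB id.
    fid, wire = PERSON[name]
    tag = to_varint((fid << 3) | wire)
    if wire == 2:
        data = value.encode()
        return tag + to_varint(len(data)) + data
    return tag + to_varint(value)

print(archive_field("id", 150).hex())          # 109601
print(archive_field("name", "testing").hex())  # 0a0774657374696e67
```

Since the id is embedded in each tag, the encoded fields carry their identity and can appear in any order, which is why the reverse name[id] lookup matters on the way back in.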
 PB uses value semantics, so multiple references to the same thing isn't
 really an issue that is covered.

I see, that kind of sucks, in my opinion. -- /Jacob Carlborg
Apr 02 2013
parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/02/2013 10:52 AM, Jacob Carlborg wrote:
 On 2013-04-02 15:38, Matt Soucy wrote:

 Unfortunately, only partially correct. Optional isn't an "option", it's
 a way of saying that a field may be specified 0 or 1 times. If two
 messages with the same ID are read and the ID is considered optional in
 the schema, then they are merged.

With "option", I mean you don't have to use it in the schema. But the (de)serializer of course needs to support this to be fully compliant with the spec.

OK, I see what you mean. PB uses the term "option" for language constructs, hence my confusion.
 Packed IS an "option", which can only be done to primitives. It changes
 serialization from:
  > return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~
 a.writeProto!BufferType())().reduce!((a,b)=>a~b)();
 to
  > auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
  > return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;

 (Actual snippets from my partially-complete protocol buffer library)

 If you had a struct that matches that schema (PB messages have value
 semantics) then yes, in theory you could do something to serialize the
 struct based on the schema, but you'd have to maintain both separately.

Just compile the schema to a struct with the necessary fields. Perhaps not how it's usually done.

Again, my misunderstanding. I assumed you were talking about taking a pre-existing struct, not one generated from the .proto
 PB is NOT dependent on the order of the fields during serialization,
 they can be sent/received in any order. You could use the schema like
 you mentioned above to tie member names to ids, though.

So if you have a schema like this:

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
}

1, 2 and 3 will be the ids of the fields, and also the order in which they are (de)serialized? Then you could have the archive read the schema, map names to ids and archive the ids instead of the names.

You could easily receive 3,1,2 or 2,1,3 or any other such combination, and it would still be valid. That doesn't stop you from doing what you suggest, however, as long as you can lookup id[name] and name[id].
 PB uses value semantics, so multiple references to the same thing isn't
 really an issue that is covered.

I see, that kind of sucks, in my opinion.

Eh. I personally think that it makes sense, and don't have much of a problem with it.
Apr 02 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-04-02 18:24, Matt Soucy wrote:

 Again, my misunderstanding. I assumed you were talking about taking a
 pre-existing struct, not one generated from the .proto

It doesn't really matter where the struct comes from.
 You could easily receive 3,1,2 or 2,1,3 or any other such combination,
 and it would still be valid. That doesn't stop you from doing what you
 suggest, however, as long as you can lookup id[name] and name[id].

Right. The archive gets the names; it's then up to the archive how to map names to PB ids. If the archive gets "foo", "bar" and the serialized data contains "bar", "foo", can it handle that? What I mean is that the serializer decides which field should be (de)serialized, not the archive.
 Eh. I personally think that it makes sense, and don't have much of a
 problem with it.

It probably makes sense if one sends the data over the network and the data is mostly value based. I usually have an object hierarchy with many reference types and objects passed around. -- /Jacob Carlborg
Apr 02 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-06-17 15:31, Francois Chabot wrote:

 I think that's kinda the sticking point with Orange for me. Integrating
 it in my current project implied bringing in a huge amount of code when
 my needs are super-straightforward.

Why is that a problem?
 I actually ended up writing myself a super-light range-based serializer
 that'll handle any combination of struct/Array/AA thrown at it. (for
 reference:
 https://github.com/Chabsf/flyb/blob/SerializableHistory/serialize.d ).
 It also does not copy data around, and instead just casts the data as
 arrays of bytes. It's under 300 lines of D code and does everything I
 need from Orange, and does it very fast. Orange is just so very overkill
 for my needs.

It's a lot easier to create a serializer which doesn't handle the full set of D. When you start to bring in reference types, pointers, arrays and slices, it starts to get more complicated.
 My point is that serializing POD-based structures in D is so simple that
 using a one-size-fits-all serializer made to handle absolutely everything
 feels very wasteful.

You don't have to use it. -- /Jacob Carlborg
Jun 17 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-06-17 16:39, Francois Chabot wrote:

 I will grant that making Orange part of Phobos will alleviate the
 project bloat issue, which is a huge deal. But as it stands, to me, it
 only handles a subset of what std.serialization should.

So what would you like it to handle? I assume you want a binary archive and you want faster serialization? You are free to add enhancement requests to the github project and comment in the official review thread.

The thing with Orange is that it's possible to add new archive types. If Orange gets accepted as std.serialization we could later add a binary archive.

Do you want to stop std.serialization just because it doesn't have a binary archive? Not having a binary archive doesn't make the XML archive useless.

-- 
/Jacob Carlborg
Jun 17 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-06-18 00:06, Francois Chabot wrote:

 Well, the main thing that I want on my end is an O(1) memory footprint
 when dealing with hierarchical data with no cross-references.

That's kind of hard since it creates data. But if you mean except for the data, then it stores some metadata in order to support reference types, that is, not serializing the same reference more than once. Theoretically a template parameter could be added to avoid this, but that defeats the purpose of a class/interface design. It could be a runtime parameter; I don't know how well the compiler can optimize that out.
 Even better would be for serialization to be a lazy input range that can be
 easily piped into something like Walter's std.compress. I guess I could
 log an enhancement request to that effect, but I kinda felt that this
 was beyond the scope of Orange. It has a clear serialize, then deal with
 the data design. What I need really goes against the grain here.

I'm no expert on ranges but I'm pretty sure it can work with std.compress. It returns an array, which is a random access range. Although it won't be lazy. The deserialization would be a bit harder since it depends on std.xml which is not range based.
 Once again, the sticking point is not the serialization format, but the
 delivery method of said data.

Ok.
 No! no no no. I just feel that Orange handles a big piece of the
 serialization puzzle, but that there's a lot more to it.

Ok, I see. But I need to know about the rest, what you're missing to be able to improve it. I have made a couple of comments here. -- /Jacob Carlborg
Jun 18 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Monday, 25 March 2013 at 08:53:32 UTC, Jacob Carlborg wrote:
 With templates:

 class Serializer (T)
 class XmlArchive

 auto archive = new XmlArchive;
 auto serializer = new Serializer!(XmlArchive)(archive);

 struct Foo
 {
     void toData (Serializer!(XmlArchive) serializer, 
 Serializer.Data key);
 }

 Foo is now locked to the XmlArchive. Or:

 class Bar
 {
     void toData (T) (Serializer!(T) serializer, Serializer.Data 
 key);
 }

 toData cannot be virtual.

http://dpaste.dzfl.pl/0f7d8219
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Saturday, 30 March 2013 at 20:02:48 UTC, Jesse Phillips wrote:
 3) Serialization is done by message (struct) and not by 
 primitives

PB does serialize by primitives, and Archive has an archiveStruct method which is called to serialize a struct, I believe. At first sight orange serializes using a built-in grammar (in EXI terms), and since PB uses a schema-informed grammar, you have to provide the schema to the archiver: either keep it in the archiver or store it globally.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 17:33:28 UTC, Jacob Carlborg wrote:
 On 2013-03-31 12:40, Kagamin wrote:

 http://dpaste.dzfl.pl/0f7d8219

So instead Serializer gets a huge API?

Are a lot of serializers expected to be written? Archives are.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 17:36:12 UTC, Jacob Carlborg wrote:
 The actual struct is never passed to the archive in Orange. 
 It basically lets the archive know, "the next primitives 
 belong to a struct".

Knowing the struct type name one may select the matching schema. In the case of PB the schema collection is just int[string][string] - maps type names to field maps.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 17:33:28 UTC, Jacob Carlborg wrote:
 On 2013-03-31 12:40, Kagamin wrote:

 http://dpaste.dzfl.pl/0f7d8219

So instead Serializer gets a huge API?

The point is to have Serializer as an encapsulation point, not the archiver. The API can remain the same; it's just that calls to the archiver are routed through Serializer's virtual methods, not the archiver's.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 17:33:28 UTC, Jacob Carlborg wrote:
 On 2013-03-31 12:40, Kagamin wrote:

 http://dpaste.dzfl.pl/0f7d8219

So instead Serializer gets a huge API?

As an alternative routing can be handled in the archiver by a template mixin, which defines virtual methods and routes them to a templated method written by the implementer. So it will be the implementer's choice whether to use mixin helper or implement methods manually.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
Manual implementation is probably better for PB.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
A "MyArchive" example can be useful too. The basic idea is to 
write a minimal archive class with basic test code. All methods 
assert(false,"method archive(dchar) not implemented"); the 
example compiles and runs, but asserts. So people take the 
example and fill methods with their own implementations, thus 
incrementally building their archive class.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 21:25:48 UTC, Jacob Carlborg wrote:
 Ok. I'm not familiar with Protocol Buffers.

Well, the basic idea of EXI and similar standards is that you can have two types of serialization: built-in, where you keep the schema in the serialized message - which value belongs to which field (this way you can read and write any data structure) - and schema-informed, where the serializer knows what data it works with, so it omits the schema from the message and e.g. writes two int fields as just 8 consecutive bytes - it knows that these 8 bytes are 2 ints and which field each belongs to. The drawback is that you can't read the message without the schema; the advantage is smaller message size and faster serialization.
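The size difference can be made concrete with a small sketch (Python, illustrative only; real EXI and PB grammars are richer than this). The built-in encoding tags every value with its field id, while the schema-informed one writes the same values back to back and relies on both sides knowing the order, saving exactly the tag bytes here:

```python
def to_varint(n):
    # Base-128 varint encoding, 7 payload bits per byte.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def built_in(fields):
    # Schema kept in the message: every value is preceded by a (field id) tag.
    return b"".join(to_varint((fid << 3) | 0) + to_varint(v) for fid, v in fields)

def schema_informed(fields):
    # Both sides know the schema: values are written consecutively, no tags.
    return b"".join(to_varint(v) for _, v in fields)

fields = [(1, 300), (2, 70000)]
# With single-byte tags, the schema-informed form is smaller by one byte per field.
assert len(built_in(fields)) == len(schema_informed(fields)) + len(fields)
```

The schema-informed output is also unreadable without the schema, which is exactly the trade-off described above.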
Mar 31 2013
prev sibling next sibling parent "Jesse Phillips" <Jessekphillips+d gmail.com> writes:
On Sunday, 31 March 2013 at 11:23:27 UTC, Kagamin wrote:
 On Saturday, 30 March 2013 at 20:02:48 UTC, Jesse Phillips 
 wrote:
 3) Serialization is done by message (struct) and not by 
 primitives

PB does serialize by primitives and Archive has archiveStruct method which is called to serialize struct, I believe. At first sight orange serializes using built-in grammar (in EXI terms), and since PB uses schema-informed grammar, you have to provide schema to the archiver: either keep it in the archiver or store globally.

Thank you, you've described it much better. When I said "by message" I was referring to what you have more accurately stated as requiring a schema. I'm not well versed in PB or Orange so I'd need to play around more with both, but I'm pretty sure Orange would need changes made to be able to claim PB is supported. It should be possible to create a binary format based on PB.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
It's a pull parser? Hmm... how are reordered fields supposed to 
be handled? When the archiver is requested for a field, it will 
probably need to look ahead for the field in the entire message. 
Also arrays can be discontinuous both in xml and in pb. Also if 
the archiver is requested for a missing field, it may be a bad 
idea to return typeof(return).init as it will overwrite the 
default value for the field in the structure. Though, this may be 
a minor issue: a field is usually missing because it's obsolete, 
but the serializer will spend time requesting missing fields.

As a schema-informed serialization, PB works better with 
specialized code, so it's better to provide a means for 
specialized serialization, where components will be tightly 
coupled, and the archiver will have full access to the serialized 
type and will be able to infer schema. Isn't serialization 
simpler when you have access to the type?
Mar 31 2013
prev sibling next sibling parent "Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Monday, 1 April 2013 at 08:53:51 UTC, Jacob Carlborg wrote:
 On 2013-04-01 01:39, Jesse Phillips wrote:

 I'm not well versed in PB or Orange so I'd need to play around 
 more with
 both, but I'm pretty sure Orange would need changes made to be 
 able to
 make the claim PB is supported. It should be possible to 
 create a binary
 format based on PB.

Isn't PB binary? Or it actually seems it can be both.

Let me see if I can describe this.

PB does encoding to binary by type. However it also has a schema in a .proto file. My first concern is that this file provides the ID to use for each field; while the ID is arbitrary, it must be what is specified.

The second one I'm concerned with is the option to pack repeated fields. I'm not sure of the specifics of this encoding, but I imagine some compression.

This is why I think I'd have to implement my own Serializer to be able to support PB, but also believe we could have a binary format based on PB (meaning it would maybe be possible to create a schema of Orange-generated data, but it would be hard to generate data for a specific schema).
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
AFAIK, it's the opposite: an array is serialized in chunks, and 
they are concatenated on deserialization. Useful if you don't 
know how many elements you're sending, so you send them in finite 
chunks as the data becomes available. The client can also close 
the connection, so you don't have to see the end of the sequence.
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
Oh, wait, it looks like a (possibly infinite) range rather than 
an array.
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Monday, 1 April 2013 at 18:41:57 UTC, Matt Soucy wrote:
 The "packed repeated fields" section explains it and breaks it 
 down with an example. If the client can close like that, you 
 probably don't want to use packed.

Why not? If you transfer the results of a Google search, the client will be able to peek at only the first N results and then close, or not. Though I agree it's strange that you can only transfer primitive types this way.
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
So as we see PB doesn't support arrays (and ranges). Hmm... 
that's unfortunate.
Apr 01 2013
prev sibling next sibling parent "Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Monday, 1 April 2013 at 17:24:05 UTC, Matt Soucy wrote:
 From what I got from the examples, Repeated fields are done 
 roughly as following:
 auto msg = fields.map!(a=>a.serialize())().reduce!((a,b)=>a~b)();
 return ((id<<3)|2) ~ msg.length.toVarint() ~ msg;
 Where msg is a ubyte[].
 -Matt Soucy

I think that would fall under some form of compression, namely compressing the ID :P BTW, I love how easy that is to read.
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Monday, 1 April 2013 at 19:37:12 UTC, Matt Soucy wrote:
 It's not really strange, because of how it actually does the 
 serialization. A message is recorded as length+serialized 
 members. Members can happen in any order. Packed repeated 
 messages would look like...what? How do you know when one 
 message ends and another begins? If you try and denote it, 
 you'd just end up with what you already have.

Well, messages can be just repeated, not packed. Packing is for really small elements, I guess - namely numbers.
 In your example, you'd want to send each individual result as a 
 distinct message, so they could be read one at a time. You 
 wouldn't want to pack, as packing is for sending a whole data 
 set at once.

So you suggest sending one message per TCP packet?
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Monday, 1 April 2013 at 21:11:57 UTC, Matt Soucy wrote:
 And therefore, it supports arrays just fine (as repeated 
 fields). Yes. That last sentence was poorly-worded, and should 
 have said "you'd just end up with the un'packed' data with an 
 extra header."

It says repeated messages should be merged, which results in one message, not an array of messages. So from several repeated messages you get one, as if they formed a contiguous soup of fields which got parsed as one message: e.g. scalar fields of the resulting message take their last seen values.
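That merge rule is easy to demonstrate with a toy decoder (Python sketch, varint fields only, not a general PB parser): concatenating two encodings and parsing the result as one message makes scalar fields take the last value seen.

```python
def to_varint(n):
    # Base-128 varint encoding.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def read_varint(buf, i):
    # Decode one varint starting at index i; return (value, next index).
    shift = result = 0
    while True:
        b = buf[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def encode(fields):
    # Varint (wire type 0) fields only, enough for the demonstration.
    return b"".join(to_varint((fid << 3) | 0) + to_varint(v) for fid, v in fields)

def decode(buf):
    # Later occurrences overwrite earlier ones: scalar fields merge by "last wins".
    fields, i = {}, 0
    while i < len(buf):
        tag, i = read_varint(buf, i)
        value, i = read_varint(buf, i)
        fields[tag >> 3] = value
    return fields

merged = decode(encode([(1, 10)]) + encode([(1, 99), (2, 7)]))
print(merged)  # {1: 99, 2: 7}
```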
 Unfortunately, I'm not particularly knowledgeable about 
 networking, but that's not quite what I meant. I meant that the 
 use case itself would result in sending individual Result 
 messages one at a time, since packing (even if it were valid) 
 wouldn't be useful and would require getting all of the Results 
 at once. You would just leave off the "packed" attribute.

As you said, there's no way to tell where one message ends and the next begins. If you send them one or two at a time, they end up as a contiguous stream of bytes. If one is to delimit messages, he should define a container format as an extension on top of PB, with additional semantics for the representation of arrays, which results in another protocol. And even if you define such a protocol, there's still no way to have array fields in PB messages (arrays of non-trivial types).

For example, if you want to update students and departments with one method, the obvious choice is to pass it a dictionary of key-value pairs of new values for the objects' attributes. How to do that?
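For what it's worth, the usual answer to "where does one message end" is a thin framing layer on top of PB: each encoded message gets a varint length prefix (the Java API exposes this as writeDelimitedTo). A Python sketch of that idea (illustrative, not from any PB library):

```python
def to_varint(n):
    # Base-128 varint encoding.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def read_varint(buf, i):
    # Decode one varint starting at index i; return (value, next index).
    shift = result = 0
    while True:
        b = buf[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def write_delimited(messages):
    # Frame each already-encoded message with a varint length prefix.
    return b"".join(to_varint(len(m)) + m for m in messages)

def read_delimited(stream):
    # Split the stream back into messages by reading prefix after prefix.
    out, i = [], 0
    while i < len(stream):
        n, i = read_varint(stream, i)
        out.append(stream[i:i + n])
        i += n
    return out

msgs = [b"\x08\x01", b"\x08\x02\x10\x03"]
assert read_delimited(write_delimited(msgs)) == msgs
```

As noted above, this is a container format on top of PB rather than part of PB itself, which is exactly Kagamin's point about it amounting to another protocol.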
Apr 01 2013
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-24 22:03, Jacob Carlborg wrote:
 std.serialization (orange) is now ready to be reviewed.

I've been working on a binary archive with the following format:

FileFormat := CompoundArrayOffset Data CompoundArray
CompoundArrayOffset := AbsOffset # Offset of the compound array
AbsOffset := 4B # Absolute offset from the beginning of FileFormat
CompoundArray := Compound* # An array of Compound
CompoundOffset := 4B # Offset into CompoundArray
Data := Type*
Type := String | Array | Compound | AssociativeArray | Pointer | Enum | Primitive
Compound := ClassData | StructData
String := Length 4B* | 2B* | 1B*
Array := Length Type*
Class := CompoundOffset
Struct := CompoundOffset
ClassData := String Field*
StructData := Field*
Field := Type
Length := 4B
Primitive := Bool | Byte | Cdouble | Cfloat | Char | Creal | Dchar | Double | Float | Idouble | Ifloat | Int | Ireal | Long | Real | Short | Ubyte | Uint | Ulong | Ushort | Wchar
Bool := 1B
Byte := 1B
Cdouble := 8B 8B
Cfloat := 8B
Char := 1B
Creal := 8B 8B 8B 8B
Dchar := 4B
Double := 8B
Float := 4B
Idouble := 8B
Ifloat := 4B
Int := 4B
Ireal := 8B 8B
Long := 8B
Real := 8B 8B 8B 8B
Short := 2B
Ubyte := 1B
Uint := 4B
Ulong := 8B
Ushort := 2B
Wchar := 2B

1B := 1 Byte
2B := 2 Bytes
4B := 4 Bytes
8B := 8 Bytes

How does this look?

-- 
/Jacob Carlborg
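To check that the offsets work out, here is a minimal sketch of the proposed layout (Python; byte order is not specified in the grammar, so little-endian is an assumption) for a file holding a single Int and an empty CompoundArray:

```python
import struct

def write_file(value):
    # Data := Int := 4B; CompoundArrayOffset := AbsOffset := 4B from file start.
    data = struct.pack("<i", value)
    compound_array_offset = 4 + len(data)  # the (empty) CompoundArray follows Data
    return struct.pack("<I", compound_array_offset) + data

def read_file(buf):
    # Read back the header offset and the single Int payload.
    (offset,) = struct.unpack_from("<I", buf, 0)
    (value,) = struct.unpack_from("<i", buf, 4)
    return offset, value

buf = write_file(-42)
assert len(buf) == 8
assert read_file(buf) == (8, -42)
```

The fixed 4-byte absolute offset up front means a reader can jump straight to the compound array without scanning Data first.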
Apr 02 2013
prev sibling next sibling parent reply "Baz" <burg.basile yahoo.com> writes:
On Sunday, 24 March 2013 at 21:03:59 UTC, Jacob Carlborg wrote:
 std.serialization (orange) is now ready to be reviewed.

I'm fundamentally against the integration of Orange into the std lib. The basic problem is that there is no flag in the D object model for serialization (e.g. the "published" attribute in Pascal). In the same order of ideas, the documentation about RTI is nonexistent. In fact there's not even any RTI useful for an "academic" serialization system. No no no.
Jun 15 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-06-15 20:54, Baz wrote:

 I'm fundamentally against the integration of Orange into the std lib.
 The basic problem is that there is no flag in the D object model for
 serialization (e.g. the "published" attribute in Pascal).

Why does that matter? -- /Jacob Carlborg
Jun 15 2013
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Saturday, 15 June 2013 at 18:54:35 UTC, Baz wrote:
 The basic problem is that there is no flag in the D object 
 model for the serialization

 In the same order of ideas, the documentation about RTI is 
 nonexistent. In fact there's not even any RTI useful for an 
 "academic" serialization system.

RTI (RTTI) is hardly relevant here; as far as I can see, std.serialization does static reflection.
Jun 15 2013
prev sibling next sibling parent "Francois Chabot" <francois chabs.ca> writes:
On Tuesday, 2 April 2013 at 18:50:04 UTC, Jacob Carlborg wrote:
 It probably makes sense if one sends the data over the network 
 and the data is mostly value based. I usually have an object 
 hierarchy with many reference types and objects passed around.

I think that's kinda the sticking point with Orange for me. Integrating it in my current project implied bringing in a huge amount of code when my needs are super-straightforward.

I actually ended up writing myself a super-light range-based serializer that'll handle any combination of struct/Array/AA thrown at it (for reference: https://github.com/Chabsf/flyb/blob/SerializableHistory/serialize.d). It also does not copy data around, and instead just casts the data as arrays of bytes. It's under 300 lines of D code and does everything I need from Orange, and does it very fast. Orange is just so very overkill for my needs.

My point is that serializing POD-based structures in D is so simple that using a one-size-fits-all serializer made to handle absolutely everything feels very wasteful.
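The "just cast the POD to bytes" approach has no direct Python equivalent, but its flavor can be sketched with the struct module (the POD layout below is a hypothetical example; the real code is D and reinterprets the struct's memory as ubyte[] without copying):

```python
import struct

# Hypothetical POD: an int, a float and a fixed-size name buffer.
POD = struct.Struct("<if8s")

def serialize(i, f, name):
    # Fixed layout, no per-field metadata: the record's bytes ARE the format.
    return POD.pack(i, f, name.ljust(8, b"\0"))

def deserialize(buf):
    # Unpack the same fixed layout and strip the name padding.
    i, f, name = POD.unpack(buf)
    return i, f, name.rstrip(b"\0")

assert deserialize(serialize(7, 2.5, b"abc")) == (7, 2.5, b"abc")
assert POD.size == 16  # 4 + 4 + 8 bytes, like a packed D struct
```

With no references or indirection to track, (de)serialization is a single fixed-size copy, which is why a full serializer feels like overkill for this case.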
Jun 17 2013
prev sibling next sibling parent "Francois Chabot" <francois chabs.ca> writes:
 My point is that serializing POD-based structures in D is so 
 simple that
 using a one-size-fits-all serializer made to handle absolutely 
 everything
 feels very wasteful.

You don't have to use it.

But I do use it, quite a bit. Whenever I have any form of polymorphic serialization, Orange is excellent. I find myself effectively choosing between two serialization libraries based on my needs right now, which is actually a very good thing.

The issue here is not whether I should be using Orange or not in a given project, but whether it should become std.serialization. If that happens, then I will find myself "forced" to use it, the same way I should be "forced" to use std::vector in C++ unless I am dealing with a truly exceptional situation. But serializing data to send on a socket efficiently should NOT be treated as an exceptional case. If anything, I think it should be the base case. If/When std.serialization exists, it will effectively become the go-to serialization mechanism for any D newcomer, and it really should handle that case efficiently and elegantly IMHO.

I will grant that making Orange part of Phobos will alleviate the project bloat issue, which is a huge deal. But as it stands, to me, it only handles a subset of what std.serialization should.
Jun 17 2013
prev sibling parent "Francois Chabot" <francois chabs.ca> writes:
On Monday, 17 June 2013 at 20:59:31 UTC, Jacob Carlborg wrote:
 On 2013-06-17 16:39, Francois Chabot wrote:

 I will grant that making Orange part of Phobos will alleviate 
 the
 project bloat issue, which is a huge deal. But as it stands, 
 to me, it
 only handles a subset of what std.serialization should.

So what would you like it to handle? I assume you want a binary archive and you want faster serialization? You are free to add enhancement requests to the github project and comment in the official review thread.

Well, the main thing that I want on my end is an O(1) memory footprint when dealing with hierarchical data with no cross-references. Even better would be for serialization to be a lazy input range that can easily be piped into something like Walter's std.compress. I guess I could log an enhancement request to that effect, but I kinda felt that this was beyond the scope of Orange. It has a clear "serialize, then deal with the data" design. What I need really goes against the grain here.
 The thing is with Orange is that it's possible to add new 
 archive types. If Orange gets accepted as std.serialization we 
 could later add a binary archive.

Once again, the sticking point is not the serialization format, but the delivery method of said data.

Hmmm, come to think of it, running Orange in a fiber with a special archive type and wrapping the whole thing in a range could work. However, I'm not familiar enough with how aggressively Orange caches potential references to know if it would work. Also, due to the polymorphic nature of archives, archive types with partial serialization support would be a pain, as they would generate runtime errors, not compile-time ones.
 Do you want to stop std.serialization just because of it not 
 having a binary archive? Not having a binary archive doesn't 
 make the XML archive useless.

No! no no no. I just feel that Orange handles a big piece of the serialization puzzle, but that there's a lot more to it.
Jun 17 2013