
digitalmars.D - Request for review - std.serialization (orange)

reply Jacob Carlborg <doob me.com> writes:
std.serialization (orange) is now ready to be reviewed.

A couple of notes for the review:

* The most important packages are: orange.serialization and 
orange.serialization.archives

* The unit tests are located in their own package. I'm not very happy 
about putting the unit tests in the same module as the rest of the code, 
i.e. the serialization module. What are the options? These tests are 
quite high level: they test the whole Serializer class and not 
individual functions.

* I'm using some utility functions located in the "util" and "core" 
packages, what should we do about those, where to put them?

* Trailing whitespace and tabs will be fixed when/if the package gets 
accepted

* If this gets accepted, should I do a subtree merge (or whatever it's 
called) to keep the history intact?

Changes since last time:

* I've removed any Tango and D1 related code
* I've removed all unused functions (hopefully)

For usage examples, see the github wiki pages: 
https://github.com/jacob-carlborg/orange/wiki/_pages

For more extended usage examples, see the unit tests: 
https://github.com/jacob-carlborg/orange/tree/master/tests

Sources: https://github.com/jacob-carlborg/orange
Documentation: https://dl.dropbox.com/u/18386187/orange_docs/Serializer.html
Run unit tests: execute the unittest.sh shell script

(Don't forget to click the "Package" tab in the top corner to see the 
documentation for the rest of the modules.)

-- 
/Jacob Carlborg
Mar 24 2013
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/24/13, Jacob Carlborg <doob me.com> wrote:
 For usage examples, see the github wiki pages:
 https://github.com/jacob-carlborg/orange/wiki/_pages

A small example actually writing the XML file to disk and then reading it back would be beneficial.

Btw the library doesn't build with the -w switch:

orange\xml\PhobosXml.d(2536): Error: switch case fallthrough - use 'goto case;' if intended
Mar 24 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-24 22:41, Andrej Mitrovic wrote:

 A small example actually writing the xml file to disk and the reading
 back from it would be beneficial.

Ok, so just adding write and read to disk to the usage example on the github page?
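Something along these lines, perhaps (a minimal sketch only; the XmlArchive!(char) constructor and the archive's data property are my guesses at the API, and the disk round-trip is plain std.file):

```d
import orange.serialization.Serializer;
import orange.serialization.archives.XmlArchive;
import std.file : read, write;

class Foo { int a; }

void main ()
{
    auto archive = new XmlArchive!(char); // assumption: char-parameterized XML archive
    auto serializer = new Serializer(archive);

    auto foo = new Foo;
    foo.a = 3;

    // serialize and write the XML to disk
    serializer.serialize(foo);
    write("foo.xml", archive.data); // assumption: "data" exposes the produced XML

    // read it back from disk and deserialize
    auto xml = cast(string) read("foo.xml");
    auto foo2 = serializer.deserialize!(Foo)(xml);
    assert(foo2.a == 3);
}
```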
 Btw the library doesn't build with the -w switch:

 orange\xml\PhobosXml.d(2536): Error: switch case fallthrough - use
 'goto case;' if intended

Good catch. -- /Jacob Carlborg
Mar 25 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-24 22:41, Andrej Mitrovic wrote:

 orange\xml\PhobosXml.d(2536): Error: switch case fallthrough - use
 'goto case;' if intended

PhobosXml is a local copy of std.xml with a few small modifications. If accepted I'll make the changes to std.xml and remove PhobosXml. -- /Jacob Carlborg
Mar 27 2013
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:

Just at a glance, a few things strike me...

Phobos doesn't typically use classes, seems to prefer flat functions. Are
we happy with classes in this instance?
Use of caps in the filenames/functions is not very phobos like.

Can I have a post-de-serialise callback to recalculate transient data?

Why register serialisers, and structures that can be operated on? (I'm not
a big fan of registrations of this sort personally, if they can be avoided)

Is there a mechanism to deal with pointers, or do you just serialise
through the pointer? Some sort of reference system so objects pointing at
the same object instance will deserialise pointing at the same object
instance (or a new copy thereof)?

Is it fast? I see in your custom deserialise example, you deserialise
members by string name... does it need to FIND those in the stream by name,
or does it just use that to validate the sequence?
I have a serialiser that serialises in realtime (60fps), a good fair few
megabytes of data per frame... will orange handle this?

Documentation, what attributes are available? How to use them?

You only seem to provide an XML backend. What about JSON? Binary (with
endian awareness)?

Writing an Archiver looks a lot more involved than I would have imagined.
XmlArchive.d is huge, mostly just 'ditto'.
Should unarchiveXXX() not rather be unarchive!(XXX)(), allowing to minimise
most of those function definitions?


On 25 March 2013 07:41, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:

 On 3/24/13, Jacob Carlborg <doob me.com> wrote:
 For usage examples, see the github wiki pages:
 https://github.com/jacob-carlborg/orange/wiki/_pages

A small example actually writing the xml file to disk and the reading back from it would be beneficial. Btw the library doesn't build with the -w switch: orange\xml\PhobosXml.d(2536): Error: switch case fallthrough - use 'goto case;' if intended

Mar 24 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-25 02:16, Manu wrote:
 Just at a glance, a few things strike me...

 Phobos doesn't typically use classes, seems to prefer flat functions.

It's necessary to have a class or struct to pass around. The serializer is passed to methods/functions doing custom serialization. I could create a free function that encapsulates the classes for the common use cases.
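Such a convenience wrapper might look roughly like this (a sketch under stated assumptions: serializeToXml is hypothetical, not an existing Orange function, and the archive's data property is my guess at how to get the output):

```d
import orange.serialization.Serializer;
import orange.serialization.archives.XmlArchive;

// Hypothetical free function hiding the Serializer/XmlArchive classes
// for the common "just give me the XML" use case.
string serializeToXml (T) (T value)
{
    auto archive = new XmlArchive!(char); // assumption: char-parameterized archive
    auto serializer = new Serializer(archive);
    serializer.serialize(value);
    return archive.data; // assumption: exposes the produced XML as a string
}
```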
 Are we happy with classes in this instance?
 Use of caps in the filenames/functions is not very phobos like.

Yeah, that will be fixed if accepted. As you see, it's still a separate library and not included into Phobos.
 Can I have a post-de-serialise callback to recalculate transient data?

Yes. There are three ways to customize the serialization process:

1. Take complete control of the process (for the type) by adding toData/fromData to your types: https://github.com/jacob-carlborg/orange/wiki/Custom-Serialization

2. Take complete control of the process (for the type) by registering a function pointer/delegate as a serializer for a given type. Useful for serializing third party types: https://github.com/jacob-carlborg/orange/wiki/Non-Intrusive-Serialization

3. Add the onDeserialized attribute to a method in the type being serialized: https://github.com/jacob-carlborg/orange/blob/master/tests/Events.d#L75 https://dl.dropbox.com/u/18386187/orange_docs/Events.html

I noticed that the documentation for the attributes doesn't look so good.
 Why register serialiser's, and structures that can be operated on? (I'm
 not a big fan of registrations of this sort personally, if they can be
 avoided)

The only time when registering a serializer is really necessary is when serializing through a base class reference. Otherwise the use cases are when customizing the serialization process.
 Is there a mechanism to deal with pointers, or do you just serialise
 through the pointer? Some sort of reference system so objects pointing
 at the same object instance will deserialise pointing at the same object
 instance (or a new copy thereof)?

Yes. All reference types (including pointers) are only serialized once. If a pointer that is serialized points to data not being serialized, it serializes what it's pointing to as well. If you're curious about the internals I suggest you serialize some class/struct hierarchy and look at the XML data. It should be readable.
 Is it fast? I see in your custom deserialise example, you deserialise
 members by string name... does it need to FIND those in the stream by
 name, or does it just use that to validate the sequence?

That's up to the archive how it's implemented. But the idea is that it should be able to find fields by name in the serialized data. That is kind of an implicit contract between the archive and the serializer.
 I have a serialiser that serialises in realtime (60fps), a good fair few
 megabytes of data per frame... will orange handle this?

Probably not. I think it mostly depends on the archive used. The XML module in Phobos is really, REALLY slow. Serializing the same data with Tango (D1) is at least twice as fast.

I have started to work on an archive type that just tries to be as fast as possible. That:

* Breaks the implicit contract with the serializer
* Doesn't care about endianness
* Doesn't care if the fields have changed
* May not handle slices correctly
* And some other things
 Documentation, what attributes are available? How to use them?

https://dl.dropbox.com/u/18386187/orange_docs/Events.html
https://dl.dropbox.com/u/18386187/orange_docs/Serializable.html

Is this clear enough?
 You only seem to provide an XML backend. What about JSON? Binary (with
 endian awareness)?

Yeah, that is not implemented yet. Is it necessary before adding it to Phobos?
 Writing an Archiver looks a lot more involved than I would have
 imagined. XmlArchive.d is huge, mostly just 'ditto'.
 Should unarchiveXXX() not rather be unarchive!(XXX)(), allowing to
 minimise most of those function definitions?

Yeah, it has kind of a big API. The reason is to be able to use interfaces. Serializer contains a reference to an archive, typed as the interface Archive. If you're using custom serialization I don't think it would be good to lock yourself to a specific archive type. BTW, unarchiveXXX is forwarded to a private unarchive!(XXX)() in XmlArchive.

With classes and interfaces:

class Serializer
interface Archive
class XmlArchive : Archive

Archive archive = new XmlArchive;
auto serializer = new Serializer(archive);

struct Foo
{
    void toData (Serializer serializer, Serializer.Data key);
}

With templates:

class Serializer (T)
class XmlArchive

auto archive = new XmlArchive;
auto serializer = new Serializer!(XmlArchive)(archive);

struct Foo
{
    void toData (Serializer!(XmlArchive) serializer, Serializer.Data key);
}

Foo is now locked to the XmlArchive. Or:

class Bar
{
    void toData (T) (Serializer!(T) serializer, Serializer.Data key);
}

toData cannot be virtual.

-- 
/Jacob Carlborg
Mar 25 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-31 12:40, Kagamin wrote:

 http://dpaste.dzfl.pl/0f7d8219

So instead Serializer gets a huge API? -- /Jacob Carlborg
Mar 31 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-31 21:02, Kagamin wrote:

 Are a lot of serializers expected to be written? Archives are.

Hmm, maybe it could work. -- /Jacob Carlborg
Mar 31 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-31 23:40, Kagamin wrote:
 A "MyArchive" example can be useful too. The basic idea is to write a
 minimal archive class with basic test code. All methods
 assert(false,"method archive(dchar) not implemented"); the example
 compiles and runs, but asserts. So people take the example and fill
 methods with their own implementations, thus incrementally building
 their archive class.

Yes, if the API is changed to what you're suggesting. -- /Jacob Carlborg
Apr 01 2013
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-24 22:03, Jacob Carlborg wrote:

 std.serialization (orange) is now ready to be reviewed.

Just so there is no confusion. If it gets accepted I will replace tabs with spaces, fix the column limit and change all filenames to lowercase. -- /Jacob Carlborg
Mar 25 2013
prev sibling next sibling parent reply "Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
Hello Jacob,

These comments are based on looking into adding Protocol Buffer 
as an archive. First some details on the PB format.
https://developers.google.com/protocol-buffers/docs/overview

1) It is a binary format
2) Not all D types can be serialized
3) Serialization is done by message (struct) and not by primitives
4) It defines options which can affect (de)serializing.

I am looking at using Serializer to drive (de)serialization even 
if that meant just jamming it in there where Orange could only 
read PB data it has written. Keep in mind I'm not saying these 
are requirements or that I know what I'm talking about, only my 
thoughts.

My first thought was at a minimum I could just use a function 
which does the complete (de)serialization of the type. Which 
would be great since the pbcompiler I'm using/modifying already 
does this.

Because of the way custom serialization works, I'm stopped by point 3. I 
didn't realize that at first, so I also looked at implementing an 
Archive. What I notice here is:

* Information is lost, specifically the attributes (more 
important with UDA).
* I am required to implement conversions I have no implementation 
for.

This leaves me concluding that I'd need to implement my own 
Serializer, which seems to me I'm practically reimplementing most 
of Orange to use Orange with PB.

Does having Orange support things like PB make sense?

I think some work could be done for the Archive API as it doesn't 
feel like D2. Maybe we could look at locking down custom 
Archive/Serializer classes while the internals are worked out 
(would mean XML (de)serialization is available in Phobos).
Mar 30 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-30 21:02, Jesse Phillips wrote:
 Hello Jacob,

 These comments are based on looking into adding Protocol Buffer as an
 archive. First some details on the PB format.
 https://developers.google.com/protocol-buffers/docs/overview

 1) It is a binary format

That shouldn't be a problem. Preferably it should support some kind of identity map and be able to deserialize fields in any order.
 2) Not all D types can be serialized

Any data format that supports some kind of key-value mapping should be able to serialize all D types. Although, possibly in a format that is not idiomatic for that data format. XML doesn't have any types and the XML archive can serialize any D type.
 3) Serialization is done by message (struct) and not by primitives

I'm not sure I understand this.
 4) It defines options which can affect (de)serializing.

While Orange doesn't support the options Protocol Buffer seems to use directly, it should be possible by customizing the serialization of a type. See:

https://github.com/jacob-carlborg/orange/wiki/Custom-Serialization
https://github.com/jacob-carlborg/orange/wiki/Non-Intrusive-Serialization

Alternatively they are useful enough to have direct support in the serializer.
 I am looking at using Serializer to drive (de)serialization even if that
 meant just jamming it in there where Orange could only read PB data it
 has written. Keep in mind I'm not saying these are requirements or that
 I know what I'm talking about, only my thoughts.

That should be possible. I've been working on a binary archive that tries to be as fast as possible, breaking rules to the left and right; it doesn't conform to the implicit contract between the serializer and archive, and so on.
 My first thought was at a minimum I could just use a function which does
 the complete (de)serialization of the type. Which would be great since
 the pbcompiler I'm using/modifying already does this.

 Because of the way custom serialization I'm stopped by point 3. I didn't
 realize that at first so I also looked at implementing an Archive. What
 I notice here is

 * Information is lost, specifically the attributes (more important with
 UDA).

Do you want UDAs passed to the archive for a given type or field? I don't know how easy that would be to implement. It would probably require a template method in the archive, which I would like to avoid, since it wouldn't be possible to use via an interface.
 * I am required to implement conversions I have no implementation for.

Just implement an empty method for any method you don't have use for. If it needs to return a value, you can usually return typeof(return).init.
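For instance, a stub might look like this (a sketch; the method name and signature here are illustrative, not copied from Orange's actual Archive interface):

```d
// Illustrative stub for an archive method you have no use for.
// The name and signature are hypothetical.
wchar unarchiveWchar (string key)
{
    // This format has no wchar support: return the type's default value.
    return typeof(return).init;
}
```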
 This leaves me concluding that I'd need to implement my own Serializer,
 which seems to me I'm practically reimplementing most of Orange to use
 Orange with PB.

That doesn't sound good.
 Does having Orange support things like PB make sense?

I think so.
 I think some work could be done for the Archive API as it doesn't feel
 like D2.

It started for D1.
 Maybe we could look at locking down custom Archive/Serializer
 classes while the internals are worked out (would mean XML
 (de)serialization is available in Phobos).

-- /Jacob Carlborg
Mar 30 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-31 13:23, Kagamin wrote:

 PB does serialize by primitives and Archive has archiveStruct method
 which is called to serialize struct, I believe. At first sight orange
 serializes using built-in grammar (in EXI terms), and since PB uses
 schema-informed grammar, you have to provide schema to the archiver:
 either keep it in the archiver or store globally.

The actual struct is never passed to the archive in Orange. It basically lets the archive know: "the next primitives belong to a struct". -- /Jacob Carlborg
Mar 31 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-03-31 21:06, Kagamin wrote:

 Knowing the struct type name one may select the matching schema. In the
 case of PB the schema collection is just int[string][string] - maps type
 names to field maps.

Ok. I'm not familiar with Protocol Buffers. -- /Jacob Carlborg
Mar 31 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-04-01 07:15, Kagamin wrote:
 It's a pull parser? Hmm... how reordered fields are supposed to be
 handled? When the archiver is requested for a field, it will probably
 need to look ahead for the field in the entire message. Also arrays can
 be discontinuous both in xml and in pb. Also if the archiver is
 requested for a missing field, it may be a bad idea to return
 typeof(return).init as it will overwrite the default value for the field
 in the structure. Though, this may be a minor issue: field usually is
 missing because it's obsolete, but the serializer will spend time
 requesting missing fields.

Optional fields are possible to implement by writing a custom serializer for a given type. The look ahead is not needed for the entire message, only for the length of a class/struct. But since fields of a class can consist of other classes, it might not make a difference.
 As a schema-informed serialization, PB works better with specialized
 code, so it's better to provide a means for specialized serialization,
 where components will be tightly coupled, and the archiver will have
 full access to the serialized type and will be able to infer schema.
 Isn't serialization simpler when you have access to the type?

Yes, it would probably be simpler if the archive had access to the type. The idea behind Orange is that Serializer tries to do as much as possible of the implementation and leaves the data dependent parts to the archive. Also, the archive only needs to know how to serialize primitive types. -- /Jacob Carlborg
Apr 01 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-31 23:57, Kagamin wrote:

 Well, the basic idea of EXI and similar standards is that you can have 2
 types of serialization: built-in when you keep schema in the serialized
 message - which value belongs to which field (this way you can read and
 write any data structure) or schema-informed when the serializer knows
 what data it works with, so it omits schema from the message and e.g.
 writes two int fields as just consecutive 8 bytes - it knows that these
 8 bytes are 2 ints and which field each belongs to; the drawback is that
 you can't read the message without schema, the advantage is smaller
 message size and faster serialization.

I see. -- /Jacob Carlborg
Apr 01 2013
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-04-01 01:39, Jesse Phillips wrote:

 I'm not well versed in PB or Orange so I'd need to play around more with
 both, but I'm pretty sure Orange would need changes made to be able to
 make the claim PB is supported. It should be possible to create a binary
 format based on PB.

Isn't PB binary? Or it actually seems it can be both. -- /Jacob Carlborg
Apr 01 2013
next sibling parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 01:13 PM, Jesse Phillips wrote:
 On Monday, 1 April 2013 at 08:53:51 UTC, Jacob Carlborg wrote:
 On 2013-04-01 01:39, Jesse Phillips wrote:

 I'm not well versed in PB or Orange so I'd need to play around more with
 both, but I'm pretty sure Orange would need changes made to be able to
 make the claim PB is supported. It should be possible to create a binary
 format based on PB.

Isn't PB binary? Or it actually seems it can be both.

Let me see if I can describe this.

PB does encoding to binary by type. However it also has a schema in a .proto file. My first concern is that this file provides the ID to use for each field; while arbitrary, the ID must be what is specified.

The second one I'm concerned with is the option to pack repeated fields. I'm not sure of the specifics of this encoding, but I imagine some compression.

This is why I think I'd have to implement my own Serializer to be able to support PB, but also believe we could have a binary format based on PB (which maybe it would be possible to create a schema of Orange generated data, but it would be hard to generate data for a specific schema).

From what I got from the examples, repeated fields are done roughly as follows:

auto msg = fields.map!(a=>a.serialize())().reduce!((a,b)=>a~b)();
return ((id<<3)|2) ~ msg.length.toVarint() ~ msg;

Where msg is a ubyte[].

-Matt Soucy
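For concreteness, the wire-format pieces used in that snippet can be sketched in D; this toVarint is a minimal version of the idea, not the pbcompiler's actual implementation, and packRepeated is a hypothetical helper:

```d
import std.array : appender;

// Protocol Buffers varint: 7 payload bits per byte, high bit set
// on every byte except the last.
ubyte[] toVarint (ulong value)
{
    auto buf = appender!(ubyte[])();
    do
    {
        auto b = cast(ubyte) (value & 0x7F);
        value >>= 7;
        if (value)
            b |= 0x80; // continuation bit
        buf.put(b);
    } while (value);
    return buf.data;
}

// A length-delimited field: one key ((id << 3) | wire type 2),
// then the payload length, then the concatenated elements.
ubyte[] packRepeated (uint id, ulong[] values)
{
    ubyte[] payload;
    foreach (v; values)
        payload ~= toVarint(v);
    return toVarint((id << 3) | 2) ~ toVarint(cast(ulong) payload.length) ~ payload;
}

unittest
{
    assert(toVarint(1) == [0x01]);
    assert(toVarint(300) == [0xAC, 0x02]); // the example from the PB encoding docs
}
```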
Apr 01 2013
next sibling parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 02:37 PM, Kagamin wrote:
 AFAIK, it's opposite: an array serialized in chunks, and they are
 concatenated on deserialization. Useful if you don't know how many
 elements you're sending, so you send them in finite chunks as the data
 becomes available. Client can also close connection, so you don't have
 to see the end of the sequence.

https://developers.google.com/protocol-buffers/docs/encoding#optional

The "packed repeated fields" section explains it and breaks it down with an example. If the client can close like that, you probably don't want to use packed.

-Soucy
Apr 01 2013
parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 03:11 PM, Kagamin wrote:
 On Monday, 1 April 2013 at 18:41:57 UTC, Matt Soucy wrote:
 The "packed repeated fields" section explains it and breaks it down
 with an example. If the client can close like that, you probably don't
 want to use packed.

Why not? If you transfer a result of google search, the client will be able to peek only N first results and close or not. Though I agree it's strange that you can only transfer primitive types this way.

It's not really strange, because of how it actually does the serialization. A message is recorded as length + serialized members. Members can happen in any order. Packed repeated messages would look like... what? How do you know when one message ends and another begins? If you try to denote it, you'd just end up with what you already have.

In your example, you'd want to send each individual result as a distinct message, so they could be read one at a time. You wouldn't want to pack, as packing is for sending a whole data set at once.
Apr 01 2013
parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 04:54 PM, Kagamin wrote:
 On Monday, 1 April 2013 at 19:37:12 UTC, Matt Soucy wrote:
 It's not really strange, because of how it actually does the
 serialization. A message is recorded as length+serialized members.
 Members can happen in any order. Packed repeated messages would look
 like...what? How do you know when one message ends and another begins?
 If you try and denote it, you'd just end up with what you already have.

Well, messages can be just repeated, not packed. Packing is for really small elements, I guess, - namely numbers.

And therefore, it supports arrays just fine (as repeated fields). Yes. That last sentence was poorly-worded, and should have said "you'd just end up with the un'packed' data with an extra header."
 In your example, you'd want to send each individual result as a
 distinct message, so they could be read one at a time. You wouldn't
 want to pack, as packing is for sending a whole data set at once.

So you suggest to send 1 message per TCP packet?

Unfortunately, I'm not particularly knowledgeable about networking, but that's not quite what I meant. I meant that the use case itself would result in sending individual Result messages one at a time, since packing (even if it were valid) wouldn't be useful and would require getting all of the Results at once. You would just leave off the "packed" attribute.
Apr 01 2013
parent Matt Soucy <msoucy csh.rit.edu> writes:
On 04/02/2013 12:38 AM, Kagamin wrote:
 On Monday, 1 April 2013 at 21:11:57 UTC, Matt Soucy wrote:
 And therefore, it supports arrays just fine (as repeated fields). Yes.
 That last sentence was poorly-worded, and should have said "you'd just
 end up with the un'packed' data with an extra header."

It says repeated messages should be merged which results in one message, not an array of messages. So from several repeated messages you get one as if they formed contiguous soup of fields which got parsed as one message: e.g. scalar fields of the resulting message take their last seen values.

They're merged if the field is optional and gets multiple inputs with that id. If a field is marked as repeated, then each block of data denoted with that field is treated as a new item in the array.
 Unfortunately, I'm not particularly knowledgeable about networking,
 but that's not quite what I meant. I meant that the use case itself
 would result in sending individual Result messages one at a time,
 since packing (even if it were valid) wouldn't be useful and would
 require getting all of the Results at once. You would just leave off
 the "packed" attribute.

As you said, there's no way to tell where one message ends and next begins. If you send them one or two at a time, they end up as a contiguous stream of bytes. If one is to delimit messages, he should define a container format as an extension on top of PB with additional semantics for representation of arrays, which results in another protocol. And even if you define such protocol, there's still no way to have array fields in PB messages (arrays of non-trivial types). For example if you want to update students and departments with one method, the obvious choice is to pass it a dictionary of key-value pairs of new values for the object's attributes. How to do that?

I said that that only applies to the incorrect idea of packing complex messages. With regular (non-packed) repeated messages, each new repeated message is added to the array when deserialized.

While yes, protocol buffers do not create a way to denote uppermost-level messages, that isn't really relevant to the situation that you're trying to claim. If messages are supposed to be treated separately, there are numerous ways to handle that which CAN be done inside of protocol buffers. In this example, one way could be to define messages like so:

message Updates {
    message StudentUpdate {
        required string studentName = 1;
        required uint32 departmentNumber = 2;
    }
    repeated StudentUpdate updates = 1;
}

Then you would iterate over Updates.updates, which you'd be adding to upon deserialization of more of the messages.
Apr 01 2013
prev sibling parent Matt Soucy <msoucy csh.rit.edu> writes:
On 04/01/2013 04:38 PM, Jesse Phillips wrote:
 On Monday, 1 April 2013 at 17:24:05 UTC, Matt Soucy wrote:
 From what I got from the examples, Repeated fields are done roughly as
 following:
 auto msg = fields.map!(a=>a.serialize())().reduce!(a,b=>a~b)();
 return ((id<<3)|2) ~ msg.length.toVarint() ~ msg;
 Where msg is a ubyte[].
 -Matt Soucy

I think that would fall under some form of compression, namely compressing the ID :P BTW, I love how easy that is to read.

So, I looked at the code I'm currently working on to handle these...and it's literally that, except "raw" instead of "fields". UFCS is kind of wonderful in places like this. You're right, that does count as compression. It wouldn't be hard to number the fields during serialization, especially if you don't need to worry about the structure changing. -Matt Soucy
Apr 01 2013
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-04-01 19:13, Jesse Phillips wrote:

 Let me see if I can describe this.

 PB does encoding to binary by type. However it also has a schema in a
 .proto file. My first concern is that this file provides the ID to use
 for each field, while arbitrary the ID must be what is specified.

 The second one I'm concerned with is option to pack repeated fields. I'm
 not sure the specifics for this encoding, but I imagine some compression.

 This is why I think I'd have to implement my own Serializer to be able
 to support PB, but also believe we could have a binary format based on
 PB (which maybe it would be possible to create a schema of Orange
 generated data, but it would be hard to generate data for a specific
 schema).

As I understand it there's a "schema definition", that is the .proto file. You compile this schema to produce D/C++/Java/whatever code that contains structs/classes with methods/fields that match this schema. If you need to change the schema, besides adding optional fields, you need to recompile the schema to produce new code, right?

If you have a D class/struct that matches this schema (regardless of whether it's auto generated from the schema or manually created), with actual instance variables for the fields, I think it would be possible to (de)serialize into the binary PB format using Orange.

Then there's the issue of the options supported by PB, like optional fields and packed repeated fields (which I don't know the meaning of). It seems PB is dependent on the order of the fields, so that won't be a problem. Just disregard the "key" that is passed to the archive and deserialize the next type that is expected. Maybe you could use the schema to do some extra validations. Although, I don't know how PB handles multiple references to the same value.

Looking at this: https://developers.google.com/protocol-buffers/docs/overview

Below "Why not just use XML?", they mention both a text format (not to be confused with the schema, .proto) and a binary format. Although the text format seems to be mostly for debugging.

-- 
/Jacob Carlborg
Apr 02 2013
parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/02/2013 03:21 AM, Jacob Carlborg wrote:
 On 2013-04-01 19:13, Jesse Phillips wrote:

 Let me see if I can describe this.

 PB does encoding to binary by type. However it also has a schema in a
 .proto file. My first concern is that this file provides the ID to use
 for each field; while the ID is arbitrary, it must be what is specified.

 The second one I'm concerned with is the option to pack repeated fields.
 I'm not sure of the specifics of this encoding, but I imagine some compression.

 This is why I think I'd have to implement my own Serializer to be able
 to support PB, but also believe we could have a binary format based on
 PB (which maybe it would be possible to create a schema of Orange
 generated data, but it would be hard to generate data for a specific
 schema).

As I understand it there's a "schema definition", that is the .proto file. You compile this schema to produce D/C++/Java/whatever code that contains structs/classes with methods/fields that match this schema. If you need to change the schema, beyond adding optional fields, you need to recompile the schema to produce new code, right?

If you have a D class/struct that matches this schema (regardless of whether it's auto-generated from the schema or manually created), with actual instance variables for the fields, I think it would be possible to (de)serialize into the binary PB format using Orange.

Then there's the issue of the options supported by PB, like optional fields and packing repeated fields (which I don't know what it means). It seems PB is dependent on the order of the fields, so that won't be a problem. Just disregard the "key" that is passed to the archive and deserialize the next type that is expected. Maybe you could use the schema to do some extra validations. Although, I don't know how PB handles multiple references to the same value.

Looking at this: https://developers.google.com/protocol-buffers/docs/overview

Below "Why not just use XML?", they mention both a text format (not to be confused with the schema, .proto) and a binary format. Although the text format seems to be mostly for debugging.

Unfortunately, only partially correct. Optional isn't an "option", it's a way of saying that a field may be specified 0 or 1 times. If two messages with the same ID are read and the ID is considered optional in the schema, then they are merged. Packed IS an "option", which can only be done to primitives. It changes serialization from:
 return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~
 a.writeProto!BufferType())().reduce!((a,b)=>a~b)();

to
 auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
 return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;

(Actual snippets from my partially-complete protocol buffer library)

If you had a struct that matches that schema (PB messages have value semantics) then yes, in theory you could do something to serialize the struct based on the schema, but you'd have to maintain both separately.

PB is NOT dependent on the order of the fields during serialization; they can be sent/received in any order. You could use the schema like you mentioned above to tie member names to ids, though.

PB uses value semantics, so multiple references to the same thing isn't really an issue that is covered.

I hadn't actually noticed that TextFormat stuff before... interesting. I might take a look at that later when I have time.

-Matt Soucy
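The tag and varint arithmetic in those snippets can be sketched outside of D. The following is an illustrative Python sketch (not taken from either library); it encodes a repeated int field both unpacked (the tag repeated per element) and packed (one tag, length-prefixed payload). The values 3, 270, 86942 are the packed example from Google's encoding documentation.

```python
def to_varint(n):
    # Protocol Buffers base-128 varint: 7 bits per byte, MSB = "more bytes follow".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def unpacked_repeated(field_id, values):
    # Wire type 0 (varint); the tag is repeated for every element.
    tag = to_varint((field_id << 3) | 0)
    return b"".join(tag + to_varint(v) for v in values)

def packed_repeated(field_id, values):
    # Wire type 2 (length-delimited): one tag, then the payload length, then the payload.
    payload = b"".join(to_varint(v) for v in values)
    return to_varint((field_id << 3) | 2) + to_varint(len(payload)) + payload

print(unpacked_repeated(4, [3, 270, 86942]).hex())  # 2003208e02209ea705
print(packed_repeated(4, [3, 270, 86942]).hex())    # 2206038e029ea705
```

So packing saves one tag byte per element here, which matches Matt's point that it is only worthwhile for primitives.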
Apr 02 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-04-02 15:38, Matt Soucy wrote:

 Unfortunately, only partially correct. Optional isn't an "option", it's
 a way of saying that a field may be specified 0 or 1 times. If two
 messages with the same ID are read and the ID is considered optional in
 the schema, then they are merged.

With "option", I mean you don't have to use it in the schema. But the (de)serializer of course needs to support this to be fully compliant with the spec.
 Packed IS an "option", which can only be done to primitives. It changes
 serialization from:
  > return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~
 a.writeProto!BufferType())().reduce!((a,b)=>a~b)();
 to
  > auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
  > return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;

 (Actual snippets from my partially-complete protocol buffer library)

 If you had a struct that matches that schema (PB messages have value
 semantics) then yes, in theory you could do something to serialize the
 struct based on the schema, but you'd have to maintain both separately.

Just compile the schema to a struct with the necessary fields. Perhaps not how it's usually done.
 PB is NOT dependent on the order of the fields during serialization,
 they can be sent/received in any order. You could use the schema like
 you mentioned above to tie member names to ids, though.

So if you have a schema like this:

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
}

1, 2 and 3 will be the ids of the fields, and also the order in which they are (de)serialized? Then you could have the archive read the schema, map names to ids and archive the ids instead of the names.
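Mapping names to ids in the archive, as suggested, could be sketched like this (Python, purely illustrative; the PERSON table and archive_field are hypothetical stand-ins for what a compiled schema and an archive method would provide):

```python
def to_varint(n):
    # Base-128 varint used by the Protocol Buffers wire format.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

# Hand-written stand-in for the Person schema:
# field name -> (id, wire type); strings are length-delimited (2), int32 is varint (0).
PERSON = {"name": (1, 2), "id": (2, 0), "email": (3, 2)}

def archive_field(name, value):
    # The serializer hands the archive a field name; the archive maps it to the PB id.
    fid, wire = PERSON[name]
    tag = to_varint((fid << 3) | wire)
    if wire == 2:
        data = value.encode()
        return tag + to_varint(len(data)) + data
    return tag + to_varint(value)

print(archive_field("id", 150).hex())          # 109601
print(archive_field("name", "testing").hex())  # 0a0774657374696e67
```

Since the id is embedded in each tag, the encoded fields carry their identity and can appear in any order, which is why the reverse name[id] lookup matters on the way back in.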
 PB uses value semantics, so multiple references to the same thing isn't
 really an issue that is covered.

I see, that kind of sucks, in my opinion. -- /Jacob Carlborg
Apr 02 2013
parent reply Matt Soucy <msoucy csh.rit.edu> writes:
On 04/02/2013 10:52 AM, Jacob Carlborg wrote:
 On 2013-04-02 15:38, Matt Soucy wrote:

 Unfortunately, only partially correct. Optional isn't an "option", it's
 a way of saying that a field may be specified 0 or 1 times. If two
 messages with the same ID are read and the ID is considered optional in
 the schema, then they are merged.

With "option", I mean you don't have to use it in the schema. But the (de)serializer of course needs to support this to be fully compliant with the spec.

OK, I see what you mean. PB uses the term "option" for language constructs, hence my confusion.
 Packed IS an "option", which can only be done to primitives. It changes
 serialization from:
  > return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~
 a.writeProto!BufferType())().reduce!((a,b)=>a~b)();
 to
  > auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
  > return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;

 (Actual snippets from my partially-complete protocol buffer library)

 If you had a struct that matches that schema (PB messages have value
 semantics) then yes, in theory you could do something to serialize the
 struct based on the schema, but you'd have to maintain both separately.

Just compile the schema to a struct with the necessary fields. Perhaps not how it's usually done.

Again, my misunderstanding. I assumed you were talking about taking a pre-existing struct, not one generated from the .proto
 PB is NOT dependent on the order of the fields during serialization,
 they can be sent/received in any order. You could use the schema like
 you mentioned above to tie member names to ids, though.

So if you have a schema like this:

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
}

1, 2 and 3 will be the ids of the fields, and also the order in which they are (de)serialized? Then you could have the archive read the schema, map names to ids and archive the ids instead of the names.

You could easily receive 3,1,2 or 2,1,3 or any other such combination, and it would still be valid. That doesn't stop you from doing what you suggest, however, as long as you can lookup id[name] and name[id].
 PB uses value semantics, so multiple references to the same thing isn't
 really an issue that is covered.

I see, that kind of sucks, in my opinion.

Eh. I personally think that it makes sense, and don't have much of a problem with it.
Apr 02 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-04-02 18:24, Matt Soucy wrote:

 Again, my misunderstanding. I assumed you were talking about taking a
 pre-existing struct, not one generated from the .proto

It doesn't really matter where the struct comes from.
 You could easily receive 3,1,2 or 2,1,3 or any other such combination,
 and it would still be valid. That doesn't stop you from doing what you
 suggest, however, as long as you can lookup id[name] and name[id].

Right. The archive gets the names; it's then up to the archive how to map names to PB ids. If the archive gets "foo", "bar" and the serialized data contains "bar", "foo", can it handle that? What I mean is that the serializer decides which field should be (de)serialized, not the archive.
 Eh. I personally think that it makes sense, and don't have much of a
 problem with it.

It probably makes sense if one sends the data over the network and the data is mostly value based. I usually have an object hierarchy with many reference types and objects passed around. -- /Jacob Carlborg
Apr 02 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-06-17 15:31, Francois Chabot wrote:

 I think that's kinda the sticking point with Orange for me. Integrating
 it in my current project implied bringing in a huge amount of code when
 my needs are super-straightforward.

Why is that a problem?
 I actually ended up writing myself a super-light range-based serializer
 that'll handle any combination of struct/Array/AA thrown at it. (for
 reference:
 https://github.com/Chabsf/flyb/blob/SerializableHistory/serialize.d ).
 It also does not copy data around, and instead just casts the data as
 arrays of bytes. It's under 300 lines of D code and does everything I
 need from Orange, and does it very fast. Orange is just so very overkill
 for my needs.

It's a lot easier to create a serializer which doesn't handle the full set of D. When you start to bring in reference types, pointers, arrays and slices, it starts to get more complicated.
 My point is that serializing POD-based structures in D is so simple that
 using a one-size-fits-all serializer made to handle absolutely everything
 feels very wasteful.

You don't have to use it. -- /Jacob Carlborg
Jun 17 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-06-17 16:39, Francois Chabot wrote:

 I will grant that making Orange part of Phobos will alleviate the
 project bloat issue, which is a huge deal. But as it stands, to me, it
 only handles a subset of what std.serialization should.

So what would you like it to handle? I assume you want a binary archive and you want faster serialization? You are free to add enhancement requests to the github project and comment in the official review thread.

The thing with Orange is that it's possible to add new archive types. If Orange gets accepted as std.serialization we could later add a binary archive.

Do you want to stop std.serialization just because it doesn't have a binary archive? Not having a binary archive doesn't make the XML archive useless.

-- 
/Jacob Carlborg
Jun 17 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-06-18 00:06, Francois Chabot wrote:

 Well, the main thing that I want on my end is an O(1) memory footprint
 when dealing with hierarchical data with no cross-references.

That's kind of hard since it creates data. But if you mean except for the data, then it stores some metadata in order to support reference types, that is, not serializing the same reference more than once. Theoretically a template parameter could be added to avoid this, but that defeats the purpose of a class/interface design. It could be a runtime parameter; I don't know how well the compiler can optimize that out.
 Even better would be for serialization to be a lazy input range that can be
 easily piped into something like Walter's std.compress. I guess I could
 log an enhancement request to that effect, but I kinda felt that this
 was beyond the scope of Orange. It has a clear serialize, then deal with
 the data design. What I need really goes against the grain here.

I'm no expert on ranges but I'm pretty sure it can work with std.compress. It returns an array, which is a random access range. Although it won't be lazy. The deserialization would be a bit harder since it depends on std.xml which is not range based.
 Once again, the sticking point is not the serialization format, but the
 delivery method of said data.

Ok.
 No! no no no. I just feel that Orange handles a big piece of the
 serialization puzzle, but that there's a lot more to it.

Ok, I see. But I need to know about the rest, what you're missing to be able to improve it. I have made a couple of comments here. -- /Jacob Carlborg
Jun 18 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Monday, 25 March 2013 at 08:53:32 UTC, Jacob Carlborg wrote:
 With templates:

 class Serializer (T)
 class XmlArchive

 auto archive = new XmlArchive;
 auto serializer = new Serializer!(XmlArchive)(archive);

 struct Foo
 {
     void toData (Serializer!(XmlArchive) serializer, 
 Serializer.Data key);
 }

 Foo is now locked to the XmlArchive. Or:

 class Bar
 {
     void toData (T) (Serializer!(T) serializer, Serializer.Data 
 key);
 }

 toData cannot be virtual.

http://dpaste.dzfl.pl/0f7d8219
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Saturday, 30 March 2013 at 20:02:48 UTC, Jesse Phillips wrote:
 3) Serialization is done by message (struct) and not by 
 primitives

PB does serialize by primitives, and Archive has an archiveStruct method which is called to serialize a struct, I believe. At first sight orange serializes using a built-in grammar (in EXI terms), and since PB uses a schema-informed grammar, you have to provide the schema to the archiver: either keep it in the archiver or store it globally.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 17:33:28 UTC, Jacob Carlborg wrote:
 On 2013-03-31 12:40, Kagamin wrote:

 http://dpaste.dzfl.pl/0f7d8219

So instead Serializer gets a huge API?

Are a lot of serializers expected to be written? Archives are.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 17:36:12 UTC, Jacob Carlborg wrote:
 The actual struct is never passed to the archive in Orange. 
 It basically lets the archive know, "the next primitives 
 belong to a struct".

Knowing the struct type name one may select the matching schema. In the case of PB the schema collection is just int[string][string] - maps type names to field maps.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 17:33:28 UTC, Jacob Carlborg wrote:
 On 2013-03-31 12:40, Kagamin wrote:

 http://dpaste.dzfl.pl/0f7d8219

So instead Serializer gets a huge API?

The point is to have Serializer as an encapsulation point, not the archiver. The API can remain the same; it's just that calls to the archiver are routed through Serializer's virtual methods, not the archiver's.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 17:33:28 UTC, Jacob Carlborg wrote:
 On 2013-03-31 12:40, Kagamin wrote:

 http://dpaste.dzfl.pl/0f7d8219

So instead Serializer gets a huge API?

As an alternative routing can be handled in the archiver by a template mixin, which defines virtual methods and routes them to a templated method written by the implementer. So it will be the implementer's choice whether to use mixin helper or implement methods manually.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
Manual implementation is probably better for PB.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
A "MyArchive" example can be useful too. The basic idea is to 
write a minimal archive class with basic test code. All methods 
assert(false,"method archive(dchar) not implemented"); the 
example compiles and runs, but asserts. So people take the 
example and fill methods with their own implementations, thus 
incrementally building their archive class.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 31 March 2013 at 21:25:48 UTC, Jacob Carlborg wrote:
 Ok. I'm not familiar with Protocol Buffers.

Well, the basic idea of EXI and similar standards is that you can have two types of serialization: built-in, where you keep the schema in the serialized message - which value belongs to which field (this way you can read and write any data structure) - and schema-informed, where the serializer knows what data it works with, so it omits the schema from the message and e.g. writes two int fields as just 8 consecutive bytes - it knows that these 8 bytes are 2 ints and which field each belongs to. The drawback is that you can't read the message without the schema; the advantage is smaller message size and faster serialization.
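The size difference can be made concrete with a small sketch (Python, illustrative only; real EXI and PB grammars are richer than this). The built-in encoding tags every value with its field id, while the schema-informed one writes the same values back to back and relies on both sides knowing the order, saving exactly the tag bytes here:

```python
def to_varint(n):
    # Base-128 varint encoding, 7 payload bits per byte.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def built_in(fields):
    # Schema kept in the message: every value is preceded by a (field id) tag.
    return b"".join(to_varint((fid << 3) | 0) + to_varint(v) for fid, v in fields)

def schema_informed(fields):
    # Both sides know the schema: values are written consecutively, no tags.
    return b"".join(to_varint(v) for _, v in fields)

fields = [(1, 300), (2, 70000)]
# With single-byte tags, the schema-informed form is smaller by one byte per field.
assert len(built_in(fields)) == len(schema_informed(fields)) + len(fields)
```

The schema-informed output is also unreadable without the schema, which is exactly the trade-off described above.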
Mar 31 2013
prev sibling next sibling parent "Jesse Phillips" <Jessekphillips+d gmail.com> writes:
On Sunday, 31 March 2013 at 11:23:27 UTC, Kagamin wrote:
 On Saturday, 30 March 2013 at 20:02:48 UTC, Jesse Phillips 
 wrote:
 3) Serialization is done by message (struct) and not by 
 primitives

PB does serialize by primitives and Archive has archiveStruct method which is called to serialize struct, I believe. At first sight orange serializes using built-in grammar (in EXI terms), and since PB uses schema-informed grammar, you have to provide schema to the archiver: either keep it in the archiver or store globally.

Thank you, you've described it much better. When I said "by message" I was referring to what you have more accurately stated as requiring a schema. I'm not well versed in PB or Orange so I'd need to play around more with both, but I'm pretty sure Orange would need changes made to be able to claim PB is supported. It should be possible to create a binary format based on PB.
Mar 31 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
It's a pull parser? Hmm... how are reordered fields supposed to 
be handled? When the archiver is requested for a field, it will 
probably need to look ahead for the field in the entire message. 
Also arrays can be discontinuous both in xml and in pb. Also if 
the archiver is requested for a missing field, it may be a bad 
idea to return typeof(return).init as it will overwrite the 
default value for the field in the structure. Though, this may be 
a minor issue: a field is usually missing because it's obsolete, 
but the serializer will spend time requesting missing fields.

As a schema-informed serialization, PB works better with 
specialized code, so it's better to provide a means for 
specialized serialization, where components will be tightly 
coupled, and the archiver will have full access to the serialized 
type and will be able to infer schema. Isn't serialization 
simpler when you have access to the type?
Mar 31 2013
prev sibling next sibling parent "Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Monday, 1 April 2013 at 08:53:51 UTC, Jacob Carlborg wrote:
 On 2013-04-01 01:39, Jesse Phillips wrote:

 I'm not well versed in PB or Orange so I'd need to play around 
 more with
 both, but I'm pretty sure Orange would need changes made to be 
 able to
 make the claim PB is supported. It should be possible to 
 create a binary
 format based on PB.

Isn't PB binary? Or it actually seems it can be both.

Let me see if I can describe this.

PB does encoding to binary by type. However it also has a schema in a .proto file. My first concern is that this file provides the ID to use for each field; while the ID is arbitrary, it must be what is specified.

The second one I'm concerned with is the option to pack repeated fields. I'm not sure of the specifics of this encoding, but I imagine some compression.

This is why I think I'd have to implement my own Serializer to be able to support PB, but also believe we could have a binary format based on PB (meaning it would maybe be possible to create a schema of Orange-generated data, but it would be hard to generate data for a specific schema).
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
AFAIK, it's the opposite: an array is serialized in chunks, and 
they are concatenated on deserialization. Useful if you don't 
know how many elements you're sending, so you send them in finite 
chunks as the data becomes available. The client can also close 
the connection, so you don't have to see the end of the sequence.
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
Oh, wait, it looks like a (possibly infinite) range rather than 
an array.
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Monday, 1 April 2013 at 18:41:57 UTC, Matt Soucy wrote:
 The "packed repeated fields" section explains it and breaks it 
 down with an example. If the client can close like that, you 
 probably don't want to use packed.

Why not? If you transfer the results of a Google search, the client will be able to peek at only the first N results and then close, or not. Though I agree it's strange that you can only transfer primitive types this way.
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
So as we see PB doesn't support arrays (and ranges). Hmm... 
that's unfortunate.
Apr 01 2013
prev sibling next sibling parent "Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Monday, 1 April 2013 at 17:24:05 UTC, Matt Soucy wrote:
 From what I got from the examples, Repeated fields are done 
 roughly as following:
 auto msg = fields.map!(a=>a.serialize())().reduce!((a,b)=>a~b)();
 return ((id<<3)|2) ~ msg.length.toVarint() ~ msg;
 Where msg is a ubyte[].
 -Matt Soucy

I think that would fall under some form of compression, namely compressing the ID :P BTW, I love how easy that is to read.
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Monday, 1 April 2013 at 19:37:12 UTC, Matt Soucy wrote:
 It's not really strange, because of how it actually does the 
 serialization. A message is recorded as length+serialized 
 members. Members can happen in any order. Packed repeated 
 messages would look like...what? How do you know when one 
 message ends and another begins? If you try and denote it, 
 you'd just end up with what you already have.

Well, messages can be just repeated, not packed. Packing is for really small elements, I guess - namely numbers.
 In your example, you'd want to send each individual result as a 
 distinct message, so they could be read one at a time. You 
 wouldn't want to pack, as packing is for sending a whole data 
 set at once.

So you suggest sending one message per TCP packet?
Apr 01 2013
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Monday, 1 April 2013 at 21:11:57 UTC, Matt Soucy wrote:
 And therefore, it supports arrays just fine (as repeated 
 fields). Yes. That last sentence was poorly-worded, and should 
 have said "you'd just end up with the un'packed' data with an 
 extra header."

It says repeated messages should be merged, which results in one message, not an array of messages. So from several repeated messages you get one, as if they formed a contiguous soup of fields which got parsed as one message: e.g. scalar fields of the resulting message take their last seen values.
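That merge rule is easy to demonstrate with a toy decoder (Python sketch, varint fields only, not a general PB parser): concatenating two encodings and parsing the result as one message makes scalar fields take the last value seen.

```python
def to_varint(n):
    # Base-128 varint encoding.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def read_varint(buf, i):
    # Decode one varint starting at index i; return (value, next index).
    shift = result = 0
    while True:
        b = buf[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def encode(fields):
    # Varint (wire type 0) fields only, enough for the demonstration.
    return b"".join(to_varint((fid << 3) | 0) + to_varint(v) for fid, v in fields)

def decode(buf):
    # Later occurrences overwrite earlier ones: scalar fields merge by "last wins".
    fields, i = {}, 0
    while i < len(buf):
        tag, i = read_varint(buf, i)
        value, i = read_varint(buf, i)
        fields[tag >> 3] = value
    return fields

merged = decode(encode([(1, 10)]) + encode([(1, 99), (2, 7)]))
print(merged)  # {1: 99, 2: 7}
```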
 Unfortunately, I'm not particularly knowledgeable about 
 networking, but that's not quite what I meant. I meant that the 
 use case itself would result in sending individual Result 
 messages one at a time, since packing (even if it were valid) 
 wouldn't be useful and would require getting all of the Results 
 at once. You would just leave off the "packed" attribute.

As you said, there's no way to tell where one message ends and the next begins. If you send them one or two at a time, they end up as a contiguous stream of bytes. If one is to delimit messages, he should define a container format as an extension on top of PB, with additional semantics for the representation of arrays, which results in another protocol. And even if you define such a protocol, there's still no way to have array fields in PB messages (arrays of non-trivial types).

For example, if you want to update students and departments with one method, the obvious choice is to pass it a dictionary of key-value pairs of new values for the objects' attributes. How to do that?
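For what it's worth, the usual answer to "where does one message end" is a thin framing layer on top of PB: each encoded message gets a varint length prefix (the Java API exposes this as writeDelimitedTo). A Python sketch of that idea (illustrative, not from any PB library):

```python
def to_varint(n):
    # Base-128 varint encoding.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def read_varint(buf, i):
    # Decode one varint starting at index i; return (value, next index).
    shift = result = 0
    while True:
        b = buf[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def write_delimited(messages):
    # Frame each already-encoded message with a varint length prefix.
    return b"".join(to_varint(len(m)) + m for m in messages)

def read_delimited(stream):
    # Split the stream back into messages by reading prefix after prefix.
    out, i = [], 0
    while i < len(stream):
        n, i = read_varint(stream, i)
        out.append(stream[i:i + n])
        i += n
    return out

msgs = [b"\x08\x01", b"\x08\x02\x10\x03"]
assert read_delimited(write_delimited(msgs)) == msgs
```

As noted above, this is a container format on top of PB rather than part of PB itself, which is exactly Kagamin's point about it amounting to another protocol.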
Apr 01 2013
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-03-24 22:03, Jacob Carlborg wrote:
 std.serialization (orange) is now ready to be reviewed.

I've been working on a binary archive with the following format:

FileFormat := CompoundArrayOffset Data CompoundArray
CompoundArrayOffset := AbsOffset # Offset of the compound array
AbsOffset := 4B # Absolute offset from the beginning of FileFormat
CompoundArray := Compound* # An array of Compound
CompoundOffset := 4B # Offset into CompoundArray
Data := Type*
Type := String | Array | Compound | AssociativeArray | Pointer | Enum | Primitive
Compound := ClassData | StructData
String := Length 4B* | 2B* | 1B*
Array := Length Type*
Class := CompoundOffset
Struct := CompoundOffset
ClassData := String Field*
StructData := Field*
Field := Type
Length := 4B
Primitive := Bool | Byte | Cdouble | Cfloat | Char | Creal | Dchar | Double | Float | Idouble | Ifloat | Int | Ireal | Long | Real | Short | Ubyte | Uint | Ulong | Ushort | Wchar
Bool := 1B
Byte := 1B
Cdouble := 8B 8B
Cfloat := 8B
Char := 1B
Creal := 8B 8B 8B 8B
Dchar := 4B
Double := 8B
Float := 4B
Idouble := 8B
Ifloat := 4B
Int := 4B
Ireal := 8B 8B
Long := 8B
Real := 8B 8B 8B 8B
Short := 2B
Ubyte := 1B
Uint := 4B
Ulong := 8B
Ushort := 2B
Wchar := 2B

1B := 1 Byte
2B := 2 Bytes
4B := 4 Bytes
8B := 8 Bytes

How does this look?

-- 
/Jacob Carlborg
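To check that the offsets work out, here is a minimal sketch of the proposed layout (Python; byte order is not specified in the grammar, so little-endian is an assumption) for a file holding a single Int and an empty CompoundArray:

```python
import struct

def write_file(value):
    # Data := Int := 4B; CompoundArrayOffset := AbsOffset := 4B from file start.
    data = struct.pack("<i", value)
    compound_array_offset = 4 + len(data)  # the (empty) CompoundArray follows Data
    return struct.pack("<I", compound_array_offset) + data

def read_file(buf):
    # Read back the header offset and the single Int payload.
    (offset,) = struct.unpack_from("<I", buf, 0)
    (value,) = struct.unpack_from("<i", buf, 4)
    return offset, value

buf = write_file(-42)
assert len(buf) == 8
assert read_file(buf) == (8, -42)
```

The fixed 4-byte absolute offset up front means a reader can jump straight to the compound array without scanning Data first.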
Apr 02 2013
prev sibling next sibling parent reply "Baz" <burg.basile yahoo.com> writes:
On Sunday, 24 March 2013 at 21:03:59 UTC, Jacob Carlborg wrote:
 std.serialization (orange) is now ready to be reviewed.

I'm fundamentally against the integration of Orange into the std lib. The basic problem is that there is no flag in the D object model for serialization (e.g. the "published" attribute in Pascal). In the same order of ideas, the documentation about RTI is nonexistent. In fact there's not even any RTI useful for an "academic" serialization system. No no no.
Jun 15 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-06-15 20:54, Baz wrote:

 I'm fundamentally against the integration of Orange into the std lib.
 The basic problem is that there is no flag in the D object model for
 serialization (e.g. the "published" attribute in Pascal).

Why does that matter? -- /Jacob Carlborg
Jun 15 2013
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Saturday, 15 June 2013 at 18:54:35 UTC, Baz wrote:
 The basic problem is that there is no flag in the D object 
 model for the serialization

 In the same order of ideas, the documentation about RTI is 
 nonexistent. In fact there's not even any RTI useful for an 
 "academic" serialization system.

RTI (RTTI) is hardly relevant here; as far as I can see, std.serialization does static reflection.
Jun 15 2013
prev sibling next sibling parent "Francois Chabot" <francois chabs.ca> writes:
On Tuesday, 2 April 2013 at 18:50:04 UTC, Jacob Carlborg wrote:
 It probably makes sense if one sends the data over the network 
 and the data is mostly value based. I usually have an object 
 hierarchy with many reference types and objects passed around.

I think that's kinda the sticking point with Orange for me. Integrating it in my current project implied bringing in a huge amount of code when my needs are super-straightforward.

I actually ended up writing myself a super-light range-based serializer that'll handle any combination of struct/Array/AA thrown at it (for reference: https://github.com/Chabsf/flyb/blob/SerializableHistory/serialize.d). It also does not copy data around, and instead just casts the data as arrays of bytes. It's under 300 lines of D code and does everything I need from Orange, and does it very fast. Orange is just so very overkill for my needs.

My point is that serializing POD-based structures in D is so simple that using a one-size-fits-all serializer made to handle absolutely everything feels very wasteful.
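The "just cast the POD to bytes" approach has no direct Python equivalent, but its flavor can be sketched with the struct module (the POD layout below is a hypothetical example; the real code is D and reinterprets the struct's memory as ubyte[] without copying):

```python
import struct

# Hypothetical POD: an int, a float and a fixed-size name buffer.
POD = struct.Struct("<if8s")

def serialize(i, f, name):
    # Fixed layout, no per-field metadata: the record's bytes ARE the format.
    return POD.pack(i, f, name.ljust(8, b"\0"))

def deserialize(buf):
    # Unpack the same fixed layout and strip the name padding.
    i, f, name = POD.unpack(buf)
    return i, f, name.rstrip(b"\0")

assert deserialize(serialize(7, 2.5, b"abc")) == (7, 2.5, b"abc")
assert POD.size == 16  # 4 + 4 + 8 bytes, like a packed D struct
```

With no references or indirection to track, (de)serialization is a single fixed-size copy, which is why a full serializer feels like overkill for this case.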
Jun 17 2013
prev sibling next sibling parent "Francois Chabot" <francois chabs.ca> writes:
 My point is that serializing POD-based structures in D is so 
 simple that
 using a one-size-fits-all serializer made to handle absolutely 
 everything
 feels very wasteful.

You don't have to use it.

But I do use it, quite a bit. Whenever I have any form of polymorphic serialization, Orange is excellent. I find myself effectively choosing between two serialization libraries based on my needs right now, which is actually a very good thing.

The issue here is not whether I should be using Orange or not in a given project, but whether it should become std.serialization. If that happens, then I will find myself "forced" to use it, the same way I should be "forced" to use std::vector in C++ unless I am dealing with a truly exceptional situation. But serializing data to send on a socket efficiently should NOT be treated as an exceptional case. If anything, I think it should be the base case. If/When std.serialization exists, it will effectively become the go-to serialization mechanism for any D newcomer, and it really should handle that case efficiently and elegantly IMHO.

I will grant that making Orange part of Phobos will alleviate the project bloat issue, which is a huge deal. But as it stands, to me, it only handles a subset of what std.serialization should.
Jun 17 2013
prev sibling parent "Francois Chabot" <francois chabs.ca> writes:
On Monday, 17 June 2013 at 20:59:31 UTC, Jacob Carlborg wrote:
 On 2013-06-17 16:39, Francois Chabot wrote:

 I will grant that making Orange part of Phobos will alleviate 
 the
 project bloat issue, which is a huge deal. But as it stands, 
 to me, it
 only handles a subset of what std.serialization should.

So what would you like it to handle? I assume you want a binary archive and you want faster serialization? You are free to add enhancement requests to the github project and comment in the official review thread.

Well, the main thing that I want on my end is an O(1) memory footprint when dealing with hierarchical data with no cross-references. Even better would be for serialization to be a lazy input range that can easily be piped into something like Walter's std.compress. I guess I could log an enhancement request to that effect, but I kinda felt that this was beyond the scope of Orange. It has a clear "serialize, then deal with the data" design. What I need really goes against the grain here.
 The thing is with Orange is that it's possible to add new 
 archive types. If Orange gets accepted as std.serialization we 
 could later add a binary archive.

Once again, the sticking point is not the serialization format, but the delivery method of said data.

Hmmm, come to think of it, running Orange in a fiber with a special archive type and wrapping the whole thing in a range could work. However, I'm not familiar enough with how aggressively Orange caches potential references to know if it would work. Also, due to the polymorphic nature of archives, archive types with partial serialization support would be a pain, as they would generate runtime errors, not compile-time ones.
 Do you want to stop std.serialization just because of it not 
 having a binary archive? Not having a binary archive doesn't 
 make the XML archive useless.

No! no no no. I just feel that Orange handles a big piece of the serialization puzzle, but that there's a lot more to it.
Jun 17 2013