
digitalmars.D - std.jgrandson

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
We need a better json library at Facebook. I'd discussed with Sönke the 
possibility of taking vibe.d's json to std but he said it needs some 
more work. So I took std.jgrandson to proof of concept state and hence 
ready for destruction:

http://erdani.com/d/jgrandson.d
http://erdani.com/d/phobos-prerelease/std_jgrandson.html

Here are a few differences compared to vibe.d's library. I think these 
are desirable to have in that library as well:

* Parsing strings is decoupled into tokenization (which is lazy and only 
needs an input range) and parsing proper. Tokenization is lazy, which 
allows users to create their own advanced (e.g. partial/lazy) parsing if 
needed. The parser itself is eager.

* There's no decoding of strings.

* The representation is built on Algebraic, with the advantages that it 
benefits from all of its primitives. Implementation is also very compact 
because Algebraic obviates a bunch of boilerplate. Subsequent 
improvements to Algebraic will also reflect themselves into improvements 
to std.jgrandson.

* The JSON value (called std.jgrandson.Value) has no named member 
variables or methods except for __payload. This is so there's no clash 
between dynamic properties exposed via opDispatch.
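A minimal sketch of why the no-named-members rule matters, assuming a simplified design: it wraps std.json's `JSONValue` rather than an `Algebraic` so it stays self-contained, and the names `Value`, `asString` and `asLong` are illustrative, not std.jgrandson's API. Because `opDispatch` turns every member access into a key lookup, any method on the struct would hide a JSON key of the same name.

```d
import std.json : JSONValue, parseJSON;

// Sketch of a JSON value whose only named member is __payload.
// (std.json's JSONValue stands in for an Algebraic payload here.)
struct Value
{
    JSONValue __payload;

    // Every other member access (v.type, v.length, v.name, ...) is routed
    // here and treated as an object key, so no method can shadow a key.
    Value opDispatch(string key)()
    {
        return Value(__payload[key]);
    }
}

// Accessors live outside the struct. Call them as asString(v), not
// v.asString: member lookup (including opDispatch) wins over UFCS,
// so v.asString would look up a JSON key named "asString".
string asString(Value v) { return v.__payload.str; }
long asLong(Value v) { return v.__payload.integer; }
```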

Well that's about it. What would it take for this to become a Phobos 
proposal? Destroy.


Andrei
Aug 03 2014
Johannes Pfau <nospam example.com> writes:
On Sun, 03 Aug 2014 00:16:04 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 We need a better json library at Facebook. I'd discussed with Sönke
 the possibility of taking vibe.d's json to std but he said it needs
 some more work. So I took std.jgrandson to proof of concept state and
 hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html

 Here are a few differences compared to vibe.d's library. I think
 these are desirable to have in that library as well:

 * Parsing strings is decoupled into tokenization (which is lazy and
 only needs an input range) and parsing proper. Tokenization is lazy,
 which allows users to create their own advanced (e.g. partial/lazy)
 parsing if needed. The parser itself is eager.

 * There's no decoding of strings.

 * The representation is built on Algebraic, with the advantages that
 it benefits from all of its primitives. Implementation is also very
 compact because Algebraic obviates a bunch of boilerplate. Subsequent
 improvements to Algebraic will also reflect themselves into
 improvements to std.jgrandson.

 * The JSON value (called std.jgrandson.Value) has no named member
 variables or methods except for __payload. This is so there's no
 clash between dynamic properties exposed via opDispatch.

 Well that's about it. What would it take for this to become a Phobos
 proposal? Destroy.


 Andrei
API looks great, but I'd like to see some simple serialize/deserialize functions as in vibed:
http://vibed.org/api/vibe.data.json/deserializeJson
http://vibed.org/api/vibe.data.json/serializeToJson

vibe uses UDAs to customize the serialization output. That's actually not json specific and therefore shouldn't be part of this module. But a simple deserializeJson which simply fills in all fields of a struct given a TokenStream is very useful and can be done without allocations (so it's much faster than going through the DOM).

Nitpicks:

* I'd make Token only store strings, then convert to double/number only when requested. If a user is simply skipping some tokens, these conversions are unnecessary overhead.

* parseString really shouldn't use appender. Make it somehow possible to supply a buffer to TokenStream and use that. (This way there's no memory allocation. If a user wants to keep the string, he has to .dup it.) A BufferedRange concept might even be better, because you can read in blocks and reuse buffers.
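The lazy tokenization described in the original post, combined with the suggestion that tokens carry only text, might look roughly like the sketch below. It is a hypothetical illustration (`Kind`, `Token` and `Tokenizer` are invented names, not std.jgrandson's `TokenStream`), it works on a `string` only, and its number and keyword scanning is deliberately loose rather than RFC-exact.

```d
import std.algorithm.searching : canFind, startsWith;
import std.ascii : isDigit, isWhite;
import std.exception : enforce;

enum Kind { objectStart, objectEnd, arrayStart, arrayEnd, colon, comma,
            string_, number, true_, false_, null_ }

// A token keeps only its kind and a slice of the input: numbers are not
// converted and string escapes are not decoded at this stage.
struct Token
{
    Kind kind;
    string raw;
}

// Lazy tokenizer: an input range that scans one token per popFront, so a
// consumer can stop early or skip tokens without paying for the rest.
struct Tokenizer
{
    private string input;
    private Token current;
    private bool done;

    this(string input)
    {
        this.input = input;
        popFront(); // prime the first token
    }

    bool empty() const { return done; }
    Token front() const { return current; }

    void popFront()
    {
        while (input.length && isWhite(input[0]))
            input = input[1 .. $];
        if (input.length == 0) { done = true; return; }

        immutable c = input[0];
        Kind kind;
        string raw;
        size_t consumed = 1;

        if (c == '{') kind = Kind.objectStart;
        else if (c == '}') kind = Kind.objectEnd;
        else if (c == '[') kind = Kind.arrayStart;
        else if (c == ']') kind = Kind.arrayEnd;
        else if (c == ':') kind = Kind.colon;
        else if (c == ',') kind = Kind.comma;
        else if (c == '"')
        {
            // Find the closing quote, stepping over escapes without decoding.
            size_t i = 1;
            while (i < input.length && input[i] != '"')
                i += input[i] == '\\' ? 2 : 1;
            enforce(i < input.length, "unterminated string");
            kind = Kind.string_;
            raw = input[1 .. i];
            consumed = i + 1;
        }
        else if (input.startsWith("true"))  { kind = Kind.true_;  consumed = 4; }
        else if (input.startsWith("false")) { kind = Kind.false_; consumed = 5; }
        else if (input.startsWith("null"))  { kind = Kind.null_;  consumed = 4; }
        else
        {
            // Number: take the run of characters that may appear in one.
            size_t i = 0;
            while (i < input.length && (isDigit(input[i]) || "+-.eE".canFind(input[i])))
                ++i;
            enforce(i > 0, "unexpected character");
            kind = Kind.number;
            consumed = i;
        }
        if (kind != Kind.string_)
            raw = input[0 .. consumed]; // literal text of punctuation, keyword or number
        current = Token(kind, raw);
        input = input[consumed .. $];
    }
}
```

A consumer that only needs a few fields can walk this range and call `to!double` (or similar) on `raw` just for the values it keeps; skipped tokens cost no conversion.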
Aug 03 2014
"ponce" <contact gam3sfrommars.fr> writes:
 API looks great but I'd like to see some simple 
 serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson

 vibe uses UDAs to customize the serialization output. That's 
 actually
 not json specific and therefore shouldn't be part of this 
 module. But a
 simple deserializeJson which simply fills in all fields of a 
 struct
 given a TokenStream is very useful and can be done without 
 allocations
 (so it's much faster than going through the DOM).
That's what https://github.com/Orvid/JSONSerialization does. Also msgpack-d (https://github.com/msgpack/msgpack-d), whose defaults need no UDAs. That makes the typical use case very fast to write.
Aug 03 2014
Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 10:25, ponce wrote:
 API looks great but I'd like to see some simple serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson

 vibe uses UDAs to customize the serialization output. That's actually
 not json specific and therefore shouldn't be part of this module. But a
 simple deserializeJson which simply fills in all fields of a struct
 given a TokenStream is very useful and can be done without allocations
 (so it's much faster than going through the DOM).
That's what https://github.com/Orvid/JSONSerialization does. Also msgpack-d (https://github.com/msgpack/msgpack-d), whose defaults need no UDAs. That makes the typical use case very fast to write.
The default mode for vibe.data.serialization also doesn't need any UDAs, but it's still often useful to be able to make customizations.
Aug 03 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 1:02 AM, Johannes Pfau wrote:
 API looks great but I'd like to see some simple serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson
Agreed.
 vibe uses UDAs to customize the serialization output. That's actually
 not json specific and therefore shouldn't be part of this module. But a
 simple deserializeJson which simply fills in all fields of a struct
 given a TokenStream is very useful and can be done without allocations
 (so it's much faster than going through the DOM).
Nice.
 Nitpicks:

 * I'd make Token only store strings, then convert to double/number only
    when requested. If a user is simply skipping some tokens these
    conversions are unnecessary overhead.
Well... this is tricky. If the input has immutable characters, they can be stored because it can be assumed they'll live forever. If they're mutable or const, that assumption doesn't hold so every number must allocate. At that point it's probably cheaper to just convert to double. One thing is I didn't treat integers specially, but I did notice some json parsers do make that distinction.
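A sketch of the immutable-versus-mutable distinction described above, assuming a hypothetical `keepText` helper: the choice is made at compile time from the input's character type, so immutable input is sliced for free while mutable or const input pays for a copy.

```d
// Hypothetical helper: turn a token's text into a `string` that may outlive
// the input. Immutable input lives forever, so the slice is returned as is
// (no allocation). Mutable or const input may be overwritten later by the
// caller, so the text has to be copied.
string keepText(C)(C[] slice)
    if (is(immutable(C) == immutable(char)))
{
    static if (is(C == immutable(char)))
        return slice;
    else
        return slice.idup;
}
```

The alternative mentioned above, converting every number to `double` right away, avoids this copy at the cost of always paying for the conversion.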
 * parseString really shouldn't use appender. Make it somehow possible
    to supply a buffer to TokenStream and use that. (This way there's no
    memory allocation. If a user wants to keep the string he has to .dup
    it). A BufferedRange concept might even be better, because you can
    read in blocks and reuse buffers.
Good suggestion, thanks. Andrei
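One way to read the buffer suggestion, as a hedged sketch: a hypothetical `unescapeInto` that decodes a string token's escapes into a caller-owned buffer instead of an `appender`. It ignores `\uXXXX` escapes, and the returned slice is only valid until the buffer is reused, which is the `.dup`-to-keep contract proposed above.

```d
import std.exception : enforce;

// Hypothetical parseString variant: decodes the escapes of a raw JSON string
// body (the text between the quotes) into a caller-supplied buffer that is
// reused across calls. The returned slice aliases `buf` and is only valid
// until the next call; callers who want to keep it must .dup/.idup it.
// \uXXXX escapes are out of scope for this sketch.
char[] unescapeInto(const(char)[] raw, ref char[] buf)
{
    // Without \u escapes the output is never longer than the input,
    // so one length check up front covers the whole string.
    if (buf.length < raw.length)
        buf.length = raw.length;

    size_t n = 0;
    for (size_t i = 0; i < raw.length; ++i)
    {
        char c = raw[i];
        if (c == '\\')
        {
            ++i;
            enforce(i < raw.length, "dangling backslash");
            switch (raw[i])
            {
                case '"':  c = '"';  break;
                case '\\': c = '\\'; break;
                case '/':  c = '/';  break;
                case 'b':  c = '\b'; break;
                case 'f':  c = '\f'; break;
                case 'n':  c = '\n'; break;
                case 'r':  c = '\r'; break;
                case 't':  c = '\t'; break;
                default:
                    throw new Exception("unsupported escape in this sketch");
            }
        }
        buf[n++] = c;
    }
    return buf[0 .. n];
}
```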
Aug 03 2014
"Dicebot" <public dicebot.lv> writes:
On Sunday, 3 August 2014 at 08:04:40 UTC, Johannes Pfau wrote:
 API looks great but I'd like to see some simple 
 serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson
Before going this route, one needs to have a good vision of how it may interact with a hypothetical std.serialization, to avoid later deprecation. At the same time, I have recently started to think that a dedicated serialization module that decouples aggregate iteration from data storage format is in most cases impractical for performance reasons: different serialization methods imply very different efficient iteration strategies. Probably it is better to define serialization compile-time traits instead and require each `std.data.*` provider to implement those on its own in the most effective fashion.
Aug 03 2014
Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 20:44, Dicebot wrote:
 On Sunday, 3 August 2014 at 08:04:40 UTC, Johannes Pfau wrote:
 API looks great but I'd like to see some simple serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson
Before going this route one needs to have a good vision how it may interact with imaginary std.serialization to avoid later deprecation. At the same time I have recently started to think that dedicated serialization module that decouples aggregate iteration from data storage format is in most cases impractical for performance reasons - different serialization methods imply very different efficient iteration strategies. Probably it is better to define serialization compile-time traits instead and require each `std.data.*` provider to implement those on its own in the most effective fashion.
Do you have a specific case in mind where the data format doesn't fit the process used by vibe.data.serialization? The data format iteration part *is* abstracted away there in basically a kind of traits structure (the "Serializer"). When serializing, the data always gets written in the order defined by the input value, while during deserialization the serializer defines how aggregates are iterated. This seems to fit all of the data formats that I had in mind.
Aug 03 2014
"Dicebot" <public dicebot.lv> writes:
On Sunday, 3 August 2014 at 19:36:43 UTC, Sönke Ludwig wrote:
 Do you have a specific case in mind where the data format 
 doesn't fit the process used by vibe.data.serialization? The 
 data format iteration part *is* abstracted away there in 
 basically a kind of traits structure (the "Serializer"). When 
 serializing, the data always gets written in the order defined 
 by the input value, while during deserialization the serializer 
 defines how aggregates are iterated. This seems to fit all of 
 the data formats that I had in mind.
For example, we use a special binary serialization format for structs where the serialized content is actually a valid D struct: after updating internal array pointers, one can simply do `cast(S*) buffer.ptr` and work with it normally. Doing this efficiently requires breadth-first traversal and keeping track of one upper level to update the pointers. This does not fit very well with the classical depth-first recursive traversal usually required by JSON-structure formats.
Aug 03 2014
"Jacob Carlborg" <doob me.com> writes:
On Sunday, 3 August 2014 at 18:44:37 UTC, Dicebot wrote:

 Before going this route one needs to have a good vision how it 
 may interact with imaginary std.serialization to avoid later 
 deprecation.
I suggest only providing functions for serializing primitive types. A separate serialization module/package with a JSON archive type would use this module as its backend.
 At the same time I have recently started to think that 
 dedicated serialization module that decouples aggregate 
 iteration from data storage format is in most cases impractical 
 for performance reasons - different serialization methods imply 
 very different efficient iteration strategies. Probably it is 
 better to define serialization compile-time traits instead and 
 require each `std.data.*` provider to implement those on its 
 own in the most effective fashion.
I'm not sure I agree with that. In my work on std.serialization I have not seen this to be a problem. What problems have you found? -- /Jacob Carlborg
Aug 04 2014
"Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message 
news:bjecckhwlmkwkeqegwqa forum.dlang.org...

 I suggest only providing functions for serializing primitive types.
This is exactly what I need in most projects. Basic types, arrays, AAs, and structs are usually enough.
Aug 04 2014
"Jacob Carlborg" <doob me.com> writes:
On Monday, 4 August 2014 at 09:10:46 UTC, Daniel Murphy wrote:

 This is exactly what I need in most projects.  Basic types, 
 arrays, AAs, and structs are usually enough.
I was more thinking of only types that cannot be broken down into smaller pieces, i.e. integer, floating point, bool and string. The serializer would break down the other types into smaller pieces. -- /Jacob Carlborg
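A sketch of this split, under the assumption that the JSON module exposes only a primitive writer (`writePrimitive` here, a hypothetical name) and that a separate generic layer (`serialize`, also hypothetical) decomposes structs and arrays into those primitives. String escaping is omitted for brevity.

```d
import std.array : Appender, appender;
import std.conv : to;
import std.traits : isFloatingPoint, isIntegral, isSomeString;

// Backend layer (hypothetical): knows only how to write values that cannot
// be broken down further.
void writePrimitive(T)(ref Appender!string o, T v)
{
    static if (is(T == bool))
        o.put(v ? "true" : "false");
    else static if (isIntegral!T || isFloatingPoint!T)
        o.put(v.to!string);
    else static if (isSomeString!T)
    {
        o.put('"');
        o.put(v);
        o.put('"');
    }
    else
        static assert(0, T.stringof ~ " is not a primitive");
}

// Generic layer (hypothetical): breaks structs and arrays into smaller
// pieces and hands every leaf to the backend.
void serialize(T)(ref Appender!string o, T value)
{
    static if (is(T == struct))
    {
        o.put('{');
        foreach (i, ref field; value.tupleof)
        {
            if (i) o.put(',');
            writePrimitive(o, __traits(identifier, T.tupleof[i]));
            o.put(':');
            serialize(o, field);
        }
        o.put('}');
    }
    else static if (is(T : E[], E) && !isSomeString!T)
    {
        o.put('[');
        foreach (i, e; value)
        {
            if (i) o.put(',');
            serialize(o, e);
        }
        o.put(']');
    }
    else
        writePrimitive(o, value);
}
```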
Aug 04 2014
"Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message 
news:kvuaxyxjwmpqrorlozrz forum.dlang.org...

 This is exactly what I need in most projects.  Basic types, arrays, AAs, 
 and structs are usually enough.
I was more thinking only types that cannot be broken down into smaller pieces, i.e. integer, floating point, bool and string. The serializer would break down the other types into smaller pieces.
I guess I meant types that have an obvious mapping to json types.

int/long -> json integer
bool -> json bool
string -> json string
float/real -> json float (close enough)
T[] -> json array
T[string] -> json object
struct -> json object

This is usually enough for config and data files. Being able to do this is just awesome:

struct AppConfig
{
    string somePath;
    bool someOption;
    string[] someList;
    string[string] someMap;
}

void main()
{
    auto config = "config.json".readText().parseJSON().fromJson!AppConfig();
}

Being able to serialize whole graphs into json is something I need much less often.
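Note that `fromJson` in the AppConfig example is not an existing std.json function. A minimal sketch of what it could look like on top of std.json's `parseJSON`/`JSONValue`, covering exactly the mapping listed (struct fields missing from the JSON keep their default values, and std.json's getters throw when a JSON type does not match):

```d
import std.conv : to;
import std.exception : enforce;
import std.json : JSONType, JSONValue, parseJSON;
import std.traits : isFloatingPoint, isIntegral, isSomeString;

// Hypothetical fromJson (not part of std.json): maps a parsed JSONValue onto
// bool, numbers, strings, T[], T[string] and structs, recursively.
T fromJson(T)(JSONValue j)
{
    static if (is(T == bool))
    {
        enforce(j.type == JSONType.true_ || j.type == JSONType.false_, "expected a bool");
        return j.type == JSONType.true_;
    }
    else static if (isIntegral!T)
        return j.integer.to!T; // to! throws if the value does not fit T
    else static if (isFloatingPoint!T)
        return j.type == JSONType.float_ ? cast(T) j.floating : cast(T) j.integer;
    else static if (isSomeString!T)
        return j.str.to!T;
    else static if (is(T : V[string], V))
    {
        T result;
        foreach (key, value; j.object)
            result[key] = fromJson!V(value);
        return result;
    }
    else static if (is(T : E[], E))
    {
        T result;
        foreach (e; j.array)
            result ~= fromJson!E(e);
        return result;
    }
    else static if (is(T == struct))
    {
        T result;
        foreach (i, ref field; result.tupleof)
        {
            enum name = __traits(identifier, T.tupleof[i]);
            if (auto p = name in j.object)
                field = fromJson!(typeof(field))(*p);
        }
        return result;
    }
    else
        static assert(0, "unsupported type " ~ T.stringof);
}
```

With this in place, the one-liner above would work as written once `readText` is imported from std.file.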
Aug 05 2014
"Andrea Fontana" <nospam example.com> writes:
On Tuesday, 5 August 2014 at 12:40:25 UTC, Daniel Murphy wrote:
 "Jacob Carlborg"  wrote in message 
 news:kvuaxyxjwmpqrorlozrz forum.dlang.org...

 This is exactly what I need in most projects.  Basic types, 
 arrays, AAs, and structs are usually enough.
I was more thinking only types that cannot be broken down into smaller pieces, i.e. integer, floating point, bool and string. The serializer would break down the other types into smaller pieces.
I guess I meant types that have an obvious mapping to json types.

int/long -> json integer
bool -> json bool
string -> json string
float/real -> json float (close enough)
T[] -> json array
T[string] -> json object
struct -> json object

This is usually enough for config and data files. Being able to do this is just awesome:

struct AppConfig
{
    string somePath;
    bool someOption;
    string[] someList;
    string[string] someMap;
}

void main()
{
    auto config = "config.json".readText().parseJSON().fromJson!AppConfig();
}

Being able to serialize whole graphs into json is something I need much less often.
If I'm right, json has just one numeric type. No difference between integers / float and no limits.

So probably the mapping is:

float/double/real/int/long => number
Aug 05 2014
"Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Andrea Fontana"  wrote in message 
news:takluoqmlmmooxlovqya forum.dlang.org...

 If I'm right, json has just one numeric type. No difference between 
 integers / float and no limits.

 So probably the mapping is:

 float/double/real/int/long => number
Maybe, but std.json has three numeric types.
Aug 05 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/5/14, 8:23 AM, Daniel Murphy wrote:
 "Andrea Fontana"  wrote in message
 news:takluoqmlmmooxlovqya forum.dlang.org...

 If I'm right, json has just one numeric type. No difference between
 integers / float and no limits.

 So probably the mapping is:

 float/double/real/int/long => number
Maybe, but std.json has three numeric types.
I searched around a bit and it seems different libraries have different takes to this numeric matter. A simple reading of the spec suggests that floating point data is the only numeric type. However, many implementations choose to distinguish between floating point and integrals. Andrei
Aug 05 2014
"Dicebot" <public dicebot.lv> writes:
On Tuesday, 5 August 2014 at 17:17:56 UTC, Andrei Alexandrescu 
wrote:
 On 8/5/14, 8:23 AM, Daniel Murphy wrote:
 "Andrea Fontana"  wrote in message
 news:takluoqmlmmooxlovqya forum.dlang.org...

 If I'm right, json has just one numeric type. No difference 
 between
 integers / float and no limits.

 So probably the mapping is:

 float/double/real/int/long => number
Maybe, but std.json has three numeric types.
I searched around a bit and it seems different libraries have different takes to this numeric matter. A simple reading of the spec suggests that floating point data is the only numeric type. However, many implementations choose to distinguish between floating point and integrals.
There is a certain benefit in using the same primitive types for JSON as the ones defined by the BSON spec.
Aug 05 2014
"Sean Kelly" <sean invisibleduck.org> writes:
On Tuesday, 5 August 2014 at 17:17:56 UTC, Andrei Alexandrescu
wrote:
 I searched around a bit and it seems different libraries have 
 different takes to this numeric matter. A simple reading of the 
 spec suggests that floating point data is the only numeric 
 type. However, many implementations choose to distinguish 
 between floating point and integrals.
The original point of JSON was that it auto-converts to Javascript data. And since Javascript only has one numeric type, of course JSON does too. But I think it's important that a JSON package for a language maps naturally to the types available in that language. D provides both floating point and integer types, each with their own costs and benefits, and so the JSON package should as well. It ends up being a lot easier to deal with than remembering to round from JSON.number or whatever when assigning to an int.

In fact, JSON doesn't even impose any precision restrictions on its numeric type, so one could argue that we should be using BigInt and BigFloat. But this would stink most of the time, so...

On an unrelated note, while the default encoding for strings is UTF-8, the RFC absolutely allows for UTF-16 surrogate pairs, and this must be supported. Any strings you get from Internet Explorer will be encoded as UTF-16 surrogate pairs regardless of content, presumably since Windows uses 16-bit wide chars for Unicode.
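In JSON text, a character above U+FFFF written as escapes appears as two consecutive `\uXXXX` escapes forming a UTF-16 surrogate pair, which a decoder has to recombine into one code point. A hedged sketch (the function names are hypothetical, and it handles only a run of `\u` escapes, not mixed text):

```d
import std.conv : to;
import std.exception : enforce;
import std.utf : encode;

// Hypothetical helper: decodes a run of JSON \uXXXX escapes into UTF-8.
// Code points above U+FFFF arrive as a high surrogate (D800-DBFF)
// followed by a low surrogate (DC00-DFFF).
string decodeUnicodeEscapes(string s)
{
    string result;
    while (s.length)
    {
        uint cp = readEscape(s);
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            immutable lo = readEscape(s);
            enforce(lo >= 0xDC00 && lo <= 0xDFFF, "unpaired high surrogate");
            // Each surrogate carries 10 bits of (code point - 0x10000).
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
        else
            enforce(cp < 0xDC00 || cp > 0xDFFF, "unpaired low surrogate");

        char[4] buf;
        result ~= buf[0 .. encode(buf, cast(dchar) cp)];
    }
    return result;
}

// Reads one \uXXXX escape from the front of s and advances past it.
private uint readEscape(ref string s)
{
    enforce(s.length >= 6 && s[0 .. 2] == `\u`, "expected a \\uXXXX escape");
    immutable v = s[2 .. 6].to!uint(16);
    s = s[6 .. $];
    return v;
}
```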
Aug 05 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/5/14, 10:48 AM, Sean Kelly wrote:
 On Tuesday, 5 August 2014 at 17:17:56 UTC, Andrei Alexandrescu
 wrote:
 I searched around a bit and it seems different libraries have
 different takes to this numeric matter. A simple reading of the spec
 suggests that floating point data is the only numeric type. However,
 many implementations choose to distinguish between floating point and
 integrals.
The original point of JSON was that it auto-converts to Javascript data. And since Javascript only has one numeric type, of course JSON does too. But I think it's important that a JSON package for a language maps naturally to the types available in that language. D provides both floating point and integer types, each with their own costs and benefits, and so the JSON package should as well. It ends up being a lot easier to deal with than remembering to round from JSON.number or whatever when assigning to an int. In fact, JSON doesn't even impose any precision restrictions on its numeric type, so one could argue that we should be using BigInt and BigFloat. But this would stink most of the time, so... On an unrelated note, while the default encoding for strings is UTF-8, the RFC absolutely allows for UTF-16 surrogate pairs, and this must be supported. Any strings you get from Internet Explorer will be encoded as UTF-16 surrogate pairs regardless of content, presumably since Windows uses 16 bit wide chars for unicode.
All good points. Proceed with implementation! :o) -- Andrei
Aug 05 2014
"Dicebot" <public dicebot.lv> writes:
On Tuesday, 5 August 2014 at 17:58:08 UTC, Andrei Alexandrescu 
wrote:
 All good points. Proceed with implementation! :o) -- Andrei
Any news about std.allocator ? ;)
Aug 05 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/5/14, 10:58 AM, Dicebot wrote:
 On Tuesday, 5 August 2014 at 17:58:08 UTC, Andrei Alexandrescu wrote:
 All good points. Proceed with implementation! :o) -- Andrei
Any news about std.allocator ? ;)
It looks like I need to go all out and write a garbage collector, design and implementation and all. Andrei
Aug 05 2014
"Marc Schütz" <schuetzm gmx.net> writes:
On Tuesday, 5 August 2014 at 18:12:54 UTC, Andrei Alexandrescu 
wrote:
 On 8/5/14, 10:58 AM, Dicebot wrote:
 On Tuesday, 5 August 2014 at 17:58:08 UTC, Andrei Alexandrescu 
 wrote:
 All good points. Proceed with implementation! :o) -- Andrei
Any news about std.allocator ? ;)
It looks like I need to go all out and write a garbage collector, design and implementation and all.
A few months ago, you posted a video of a talk where you presented code from a garbage collector (it used templated mark functions to get precise tracing). I remember you said that this code was in use somewhere (I guess at FB?). Can this be used as a basis?
Aug 05 2014
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Aug 05, 2014 at 10:58:08AM -0700, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 8/5/14, 10:48 AM, Sean Kelly wrote:
[...]
The original point of JSON was that it auto-converts to
Javascript data.  And since Javascript only has one numeric type,
of course JSON does too.  But I think it's important that a JSON
package for a language maps naturally to the types available in
that language.  D provides both floating point and integer types,
each with their own costs and benefits, and so the JSON package
should as well.  It ends up being a lot easier to deal with than
remembering to round from JSON.number or whatever when assigning
to an int.

In fact, JSON doesn't even impose any precision restrictions on
its numeric type, so one could argue that we should be using
BigInt and BigFloat.  But this would stink most of the time, so...
Would it make sense to wrap a JSON number in an opaque type that implicitly casts to the target built-in type?
On an unrelated note, while the default encoding for strings is
UTF-8, the RFC absolutely allows for UTF-16 surrogate pairs, and
this must be supported.  Any strings you get from Internet
Explorer will be encoded as UTF-16 surrogate pairs regardless of
content, presumably since Windows uses 16 bit wide chars for
unicode.
[...] Wait, I thought surrogate pairs only apply to characters past U+FFFF? Is it even possible to encode BMP characters with surrogate pairs?? Or do you mean UTF-16? T -- Music critic: "That's an imitation fugue!"
Aug 05 2014
"Andrea Fontana" <nospam example.com> writes:
On Tuesday, 5 August 2014 at 18:11:21 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 On Tue, Aug 05, 2014 at 10:58:08AM -0700, Andrei Alexandrescu 
 via Digitalmars-d wrote:
 On 8/5/14, 10:48 AM, Sean Kelly wrote:
[...]
 Would it make sense to wrap a JSON number in an opaque type that
 implicitly casts to the target built-in type?
IMO we should store the original json number value as a string and then try to convert it to whatever the user asks for. As said, it could be a big int, or a big floating point value without any limit.
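A sketch of that idea, with a hypothetical `JsonNumber` type: the raw text is kept and converted only on request via `std.conv.to`, which throws when the value does not fit the requested type. Since D's `alias this` supports only one implicit conversion target, the sketch uses an explicit `get!T` rather than implicit casts to every built-in type.

```d
import std.algorithm.searching : any;
import std.conv : to;

// Hypothetical number wrapper: keeps the original JSON text and converts
// only when the caller names a target type, so nothing is lost up front.
struct JsonNumber
{
    string text;

    // Throws ConvException (ConvOverflowException if out of range).
    T get(T)() const
    {
        return text.to!T;
    }

    // True when the text has no fraction or exponent part.
    bool looksIntegral() const
    {
        return !text.any!(c => c == '.' || c == 'e' || c == 'E');
    }
}
```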
Aug 06 2014
Jacob Carlborg <doob me.com> writes:
On 2014-08-05 14:40, Daniel Murphy wrote:

 I guess I meant types that have an obvious mapping to json types.

 int/long -> json integer
 bool -> json bool
 string -> json string
 float/real -> json float (close enough)
 T[] -> json array
 T[string] -> json object
 struct -> json object

 This is usually enough for config and data files.  Being able to do this
 is just awesome:

 struct AppConfig
 {
     string somePath;
     bool someOption;
     string[] someList;
     string[string] someMap;
 }

 void main()
 {
     auto config =
 "config.json".readText().parseJSON().fromJson!AppConfig();
 }
I'm not saying that is a bad idea or that I don't want to be able to do this. I just prefer this to be handled by a generic serialization module. Which can of course handle the simple cases, like above, as well. -- /Jacob Carlborg
Aug 05 2014
"Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message news:lrqvfa$2has$1 digitalmars.com...

 I'm not saying that is a bad idea or that I don't want to be able to do 
 this. I just prefer this to be handled by a generic serialization module. 
 Which can of course handle the simple cases, like above, as well.
I know, but I don't really care if it's part of a generic serialization library or not. I just want it there. Chances are tying it to a future generic serialization library is going to make it take longer.
Aug 05 2014
Jacob Carlborg <doob me.com> writes:
On 2014-08-05 18:42, Daniel Murphy wrote:

 Chances are tying it to a future
 generic serialization library is going to make it take longer.
Yeah, that's the problem. But where do you draw the line? Should arrays of structs be supported? -- /Jacob Carlborg
Aug 06 2014
"Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message news:lrsrek$19mf$1 digitalmars.com...

 Chances are tying it to a future
 generic serialization library is going to make it take longer.
Yeah, that's the problem. But where do you draw the line. Should arrays of structs be supported?
Yes. Allow T, where T is any of

int, float, long, etc
bool
struct { T... }
T[string]
T[]

Sure, you _can_ make a struct containing an array that contains itself, but you probably won't.
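The rule above can be written as a compile-time check. A hedged sketch, with `isJsonMappable` and `fieldIsMappable` as hypothetical names: it accepts bool, numbers, strings, `T[]`, `T[string]` and structs whose fields all qualify, and rejects pointers and classes.

```d
import std.meta : allSatisfy;
import std.traits : FieldTypeTuple, isFloatingPoint, isIntegral, isSomeString;

// Hypothetical trait for the rule above.
template isJsonMappable(T)
{
    static if (is(T == bool) || isIntegral!T || isFloatingPoint!T || isSomeString!T)
        enum isJsonMappable = true;
    else static if (is(T : V[string], V))
        enum isJsonMappable = isJsonMappable!V;
    else static if (is(T : E[], E))
        enum isJsonMappable = isJsonMappable!E;
    else static if (is(T == struct))
        enum isJsonMappable = allSatisfy!(fieldIsMappable, FieldTypeTuple!T);
    else
        enum isJsonMappable = false;
}

// Module-level helper so the trait can be handed to allSatisfy by name.
enum bool fieldIsMappable(T) = isJsonMappable!T;
```

A struct that contains an array of itself would make this trait recurse into its own instantiation and fail to compile, which is the case expected to be rare in practice.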
Aug 06 2014
Jacob Carlborg <doob me.com> writes:
On 2014-08-06 13:36, Daniel Murphy wrote:

 Yes.  Allow T, where T is any of

 int, float, long, etc
 bool
 struct { T... }
 T[string]
 T[]
BTW, why not classes? It's basically the same implementation as for structs. -- /Jacob Carlborg
Aug 06 2014
"Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message news:lrtf8l$22d3$1 digitalmars.com...

 BTW, why not classes? It's basically the same implementation as for 
 structs.
I guess I've just never needed to do it with classes. A lot of the time when I use classes I use inheritance, and this simple translation doesn't work out so well then...
Aug 06 2014
"Sean Kelly" <sean invisibleduck.org> writes:
On Wednesday, 6 August 2014 at 15:28:06 UTC, Daniel Murphy wrote:
 "Jacob Carlborg"  wrote in message 
 news:lrtf8l$22d3$1 digitalmars.com...

 BTW, why not classes? It's basically the same implementation 
 as for structs.
I guess I've just never needed to do it with classes. A lot of the time when I use classes I use inheritance, and this simple translation doesn't work out so will then...
We could do something like Jackson. I wouldn't want it as the primary interface for a JSON package, but for serializing classes it's a pretty easy design to work with from a user perspective.
Aug 06 2014
"Dicebot" <public dicebot.lv> writes:
On Monday, 4 August 2014 at 07:34:19 UTC, Jacob Carlborg wrote:
 On Sunday, 3 August 2014 at 18:44:37 UTC, Dicebot wrote:

 Before going this route one needs to have a good vision how it 
 may interact with imaginary std.serialization to avoid later 
 deprecation.
I suggest only providing functions for serializing primitive types. A separate serialization module/package with a JSON archive type would use this module as its backend.
case for JSON conversion.
 At the same time I have recently started to think that 
 dedicated serialization module that decouples aggregate 
 iteration from data storage format is in most cases 
 impractical for performance reasons - different serialization 
 methods imply very different efficient iteration strategies. 
 Probably it is better to define serialization compile-time 
 traits instead and require each `std.data.*` provider to 
 implement those on its own in the most effective fashion.
I'm not sure I agree with that. In my work on std.serialization I have not seen this to be a problem. What problems have you found?
http://forum.dlang.org/post/mzweposldwqdtmqoltiy forum.dlang.org
Aug 04 2014
"Jacob Carlborg" <doob me.com> writes:
On Monday, 4 August 2014 at 14:02:22 UTC, Dicebot wrote:


 use case for JSON conversion.
No, only types that cannot be broken down into smaller pieces, i.e. integral, floating points, bool and strings.
 http://forum.dlang.org/post/mzweposldwqdtmqoltiy forum.dlang.org
I don't understand exactly how that binary serialization works. I think I would need a code example. -- /Jacob Carlborg
Aug 04 2014
"Dicebot" <public dicebot.lv> writes:
On Monday, 4 August 2014 at 14:18:41 UTC, Jacob Carlborg wrote:
 On Monday, 4 August 2014 at 14:02:22 UTC, Dicebot wrote:


 use case for JSON conversion.
No, only types that cannot be broken down into smaller pieces, i.e. integral, floating points, bool and strings.
That is exactly the problem: if `structToJson` isn't provided, complaints are inevitable; it is too basic a feature to wait for std.serialization :(
 http://forum.dlang.org/post/mzweposldwqdtmqoltiy forum.dlang.org
I don't understand exactly how that binary serialization works. I think I would need a code example.
Simplified serialization algorithm:

1) write (cast(void*) &struct)[0..struct.sizeof] to target buffer
2) write the contents of any arrays to the same buffer after the struct
3.1) if an array contains structs, recurse
3.2) go back to the buffer[0..struct.sizeof] slice and update array fields to store an index in the same buffer instead of the actual ptr

Simplified deserialization algorithm:

1) recursively traverse the struct and replace array index offsets with real slices to the buffer

(I don't want to bother with getting copyright permissions to publish actual code)

I am pretty sure that this is not the only optimized serialization approach out there that does not fit in a content-insensitive primitive-based traversal scheme. And we want Phobos stuff to be blazingly fast, which can lead to a situation where a new data module will circumvent the std.serialization API to get more performance.
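Since a code example was requested, here is a hedged sketch of the steps above for the simplest case: one struct with a single array of plain values, so step 3.1 (recursing into arrays of structs) is left out. `Record`, `packRecord` and `unpackRecord` are illustrative names, not the unpublished code described above, and the format assumes both sides share the same architecture and struct layout.

```d
import std.exception : enforce;

// Hypothetical record: plain fields plus one array of plain values.
struct Record
{
    int id;
    double[] samples;
}

// Serialize: copy the struct's bytes, append the array contents after them,
// then patch the copied slice so its pointer holds an offset into the buffer.
ubyte[] packRecord(const Record r)
{
    ubyte[] buf = (cast(const(ubyte)*) &r)[0 .. Record.sizeof].dup;
    immutable offset = buf.length;
    buf ~= cast(const(ubyte)[]) r.samples;

    auto copy = cast(Record*) buf.ptr; // taken after ~=, which may reallocate
    copy.samples = (cast(double*) offset)[0 .. r.samples.length];
    return buf;
}

// Deserialize in place: reinterpret the buffer as a Record and turn the
// stored offset back into a real slice of the same buffer. Nothing is copied.
Record* unpackRecord(ubyte[] buf)
{
    enforce(buf.length >= Record.sizeof, "buffer too short");
    auto rec = cast(Record*) buf.ptr;
    immutable offset = cast(size_t) rec.samples.ptr;
    immutable len = rec.samples.length;
    enforce(offset <= buf.length && len <= (buf.length - offset) / double.sizeof,
            "offset out of range");
    rec.samples = (cast(double*) (buf.ptr + offset))[0 .. len];
    return rec;
}
```

Deserialization allocates nothing: it only rewrites the pointer fields in place, and the resulting struct points straight into the buffer.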
Aug 04 2014
Jacob Carlborg <doob me.com> writes:
On 2014-08-04 16:55, Dicebot wrote:

 That is exactly the problem - if `structToJson` won't be provided,
 complaints are inevitable, it is too basic feature to wait for
 std.serialization :(
Hmm, yeah, that's a problem.
 Simplified serialization algorithm:

 1) write (cast(void*) &struct)[0..struct.sizeof] to target buffer
 2) write any of array content to the same buffer after the struct
 3.1) if array contains structs, recursion
 3.2) go back to buffer[0..struct.sizeof] slice and update array fields
 to store an index in the same buffer instead of actual ptr

 Simplified deserialization algorithm:

 1) recursively traverse the struct and replace array index offsets with
 real slices to the buffer
I see. I need to think a bit about this.
 (I don't want to bother with getting copyright permissions to publish
 actual code)
Fair enough. The above was quite descriptive.
 I am pretty sure that this is not the only optimized serialization
 approach out there that does not fit in a content-insensitive
 primitive-based traversal scheme. And we want Phobos stuff to be
 blazingly fast which can lead to situation where new data module will
 circumvent the std.serialization API to get more performance.
I don't like the idea of having to reimplement serialization for each data type that can be generalized. -- /Jacob Carlborg
Aug 04 2014
Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 04.08.2014 20:38, Jacob Carlborg wrote:
 On 2014-08-04 16:55, Dicebot wrote:

 That is exactly the problem - if `structToJson` won't be provided,
 complaints are inevitable, it is too basic feature to wait for
 std.serialization :(
Hmm, yeah, that's a problem.
On the other hand, a simplistic solution will inevitably result in people needing more. And when at some point a serialization module is in Phobos, there will be duplicate functionality in the library.
 I am pretty sure that this is not the only optimized serialization
 approach out there that does not fit in a content-insensitive
 primitive-based traversal scheme. And we want Phobos stuff to be
 blazingly fast which can lead to situation where new data module will
 circumvent the std.serialization API to get more performance.
I don't like the idea of having to reimplement serialization for each data type that can be generalized.
I think we could also simply keep the generic default recursive descent behavior, but allow serializers to customize the process using some kind of trait. This could even be added later in a backwards-compatible fashion if necessary.

BTW, how is the progress for Orange w.r.t. the conversion to a more template+allocation-less approach? Is a new std proposal within the next DMD release cycle realistic? I quite like most of how vibe.data.serialization turned out, but it can't do any alias detection/deduplication (and I have no concrete plans to add support for that), which is why I currently wouldn't consider it a potential Phobos candidate.
Aug 05 2014
"Dicebot" <public dicebot.lv> writes:
On Tuesday, 5 August 2014 at 09:54:42 UTC, Sönke Ludwig wrote:
 I think we could also simply keep the generic default recursive 
 descent behavior, but allow serializers to customize the 
 process using some kind of trait. This could even be added 
 later in a backwards compatible fashion if necessary.
A simple option is to define the required serializer traits and make both the std.serialization default and any custom data-specific serializers conform to them.
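The suggestion can be sketched as a compile-time trait that both a default serializer and a format-specific one satisfy, so generic code is constrained on the trait rather than on a concrete type. `isSerializer`, the two serializer structs, and `dump` are hypothetical names for illustration; they are not part of Orange, vibe.d, or Phobos.

```d
import std.conv : to;

/// Hypothetical trait: a type counts as a serializer if it can turn an
/// int and a string into text via a `serialize` member.
enum isSerializer(S) = is(typeof((S s) {
    string a = s.serialize(1);
    string b = s.serialize("x");
}));

/// Stand-in for a generic default serializer.
struct DefaultSerializer
{
    string serialize(T)(T value) { return value.to!string; }
}

/// Stand-in for a custom, data-format-specific serializer.
struct QuotingSerializer
{
    string serialize(T)(T value) { return `"` ~ value.to!string ~ `"`; }
}

/// Generic code depends only on the trait, not on a concrete serializer.
string dump(S, T)(S s, T value) if (isSerializer!S)
{
    return s.serialize(value);
}

static assert(isSerializer!DefaultSerializer);
static assert(isSerializer!QuotingSerializer);
static assert(!isSerializer!int);
```

Under this design, an optimized scheme like the buffer-patching one above would live inside a type that merely has to pass the trait.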
Aug 05 2014
Jacob Carlborg <doob me.com> writes:
On 2014-08-05 11:54, Sönke Ludwig wrote:

 I think we could also simply keep the generic default recursive descent
 behavior, but allow serializers to customize the process using some kind
 of trait. This could even be added later in a backwards compatible
 fashion if necessary.
I have a very flexible trait-like system in place. It allows configuring the serializer based on the given archiver and user customizations, to avoid having the serializer do unnecessary work that the archiver cannot handle.
 BTW, how is the progress for Orange w.r.t. to the conversion to a more
 template+allocation-less approach
Slowly. I think the range support in the serializer is basically complete, but the deserializer isn't done yet. I would also like to provide at least one additional archiver type besides XML. BTW, std.xml doesn't make it any easier to rangify the serializer.

I've been focusing on D/Objective-C lately, which I think is in a more complete state than std.serialization. I would really like to get it done and create a pull request so I can get back to std.serialization, but I always get stuck after a merge when something breaks. With the summer and vacations I haven't been able to work on D much at all.

 , is a new std proposal within the next
 DMD release cycle realistic?
Probably not.
 I quite like most of how vibe.data.serialization turned out, but it
 can't do any alias detection/deduplication (and I have no concrete plans
 to add support for that), which is why I currently wouldn't consider it
 as a potential Phobos candidate.
I'm quite satisfied with the feature support and flexibility of Orange/std.serialization. With the new trait-like system it will be even more flexible.

--
/Jacob Carlborg
Aug 05 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 * The representation is built on Algebraic,
Good. But here I'd like a little more readable type:

    alias Payload = std.variant.VariantN!(16LU, typeof(null), bool, double, string, Value[], Value[string]).VariantN;

Like:

    alias Payload = std.variant.Algebraic!(typeof(null), bool, double, string, Value[], Value[string]);

Bye,
bearophile
Aug 03 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 1:19 AM, bearophile wrote:
 Andrei Alexandrescu:

 * The representation is built on Algebraic,
 Good. But here I'd like a little more readable type:

     alias Payload = std.variant.VariantN!(16LU, typeof(null), bool, double, string, Value[], Value[string]).VariantN;

 Like:

     alias Payload = std.variant.Algebraic!(typeof(null), bool, double, string, Value[], Value[string]);

Yah, the latter is in the code. It's a ddoc problem.

--
Andrei
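For readers unfamiliar with `std.variant.Algebraic`, here is a minimal sketch of a JSON-like payload along the lines of the alias above. It uses `std.variant`'s `This` placeholder for the recursive array and object cases instead of a wrapping `Value` struct; that substitution, and the `JsonPayload` and `countLeaves` names, are assumptions made to keep the example short, not how std.jgrandson is written.

```d
import std.variant : Algebraic, This;

/// JSON payload: null, boolean, number, string, array, or object.
/// `This` is replaced by the resulting Algebraic type itself.
alias JsonPayload = Algebraic!(typeof(null), bool, double, string,
                               This[], This[string]);

/// Counts the leaf (non-array, non-object) values in a payload.
size_t countLeaves(JsonPayload v)
{
    if (auto arr = v.peek!(JsonPayload[]))
    {
        size_t n;
        foreach (e; *arr) n += countLeaves(e);
        return n;
    }
    if (auto obj = v.peek!(JsonPayload[string]))
    {
        size_t n;
        foreach (e; *obj) n += countLeaves(e); // iterates the values
        return n;
    }
    return 1;
}
```

`peek!T` returns a pointer to the stored value, or `null` if the payload currently holds a different type, which is what the recursion dispatches on.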
Aug 03 2014
Sönke Ludwig <sludwig rejectedsoftware.com> writes:
A few thoughts based on my experience with vibe.data.json:

1. No decoding of strings appears to mean that "Value" also always 
contains encoded strings. This seems to be a leaky and error-prone
abstraction. For the token stream, performance should be top
priority, so it's okay to not decode there, but "Value" is a high level 
abstraction of a JSON value, so it should really hide all implementation 
details of the storage format.

2. Algebraic is a good choice for its generic handling of operations on 
the contained types (which isn't exposed here, though). However, a 
tagged union type, in my experience, has quite a few advantages for
usability. Since adding a type tag possibly affects the interface in a
non-backwards-compatible way, this should be evaluated early on.

2.b) I'm currently working on a generic tagged union type that also 
enables operations between values in a natural generic way. This has the 
big advantage of not having to manually define operators like in 
"Value", which is error prone and often limited (I've had to make many 
fixes and additions in this part of the code over time).

3. Use of "opDispatch" for an open set of members has been criticized 
for vibe.data.json before and I agree with that criticism. The only 
advantage is saving a few keystrokes (json.key instead of json["key"]), 
but I came to the conclusion that the right approach to work with JSON 
values in D is to always directly deserialize when/if possible anyway, 
which mostly makes this a moot point.

This approach has a lot of advantages, e.g. reduction of allocations, 
performance of field access and avoiding typos when accessing fields. 
Especially the last point is interesting, because opDispatch based field 
access gives the false impression that a static field is accessed.

The decision to minimize the number of static fields within "Value" 
reduces the chance of accidentally accessing a static field instead of 
hitting opDispatch, but there are still *some* static fields/methods and 
any later addition of a symbol must now be considered a breaking change.

3.b) Bad interaction of UFCS and opDispatch: Functions like "remove" and 
"assume" certainly look like they could be used with UFCS, but 
opDispatch destroys that possibility.

4. I know the stance on this is often "The D module system has enough 
facilities to disambiguate" (which is not really a valid argument, but 
rather just the lack of a counter argument, IMO), but I highly dislike 
the choice to leave off any mention of "JSON" or "Json" in the global 
symbol names. Using the module either requires always using a renamed
import or a manual alias, or the resulting source code will always leave
the reader wondering what kind of data is actually handled. Handling 
multiple "value" types in a single piece of code, which is not uncommon 
(e.g. JSON + BSON/ini value/...) would always require explicit 
disambiguation. I'd certainly include the "JSON" or "Json" part in the 
names.

5. Whatever happens, *please* let's aim for a module name of 
std.data.json (similar to std.digest.*), so that any data formats added 
later are nicely organized. All existing data format support (XML + CSV) 
doesn't follow contemporary Phobos style, so they will need to be 
deprecated at some point anyway, freeing the way for a clean and
non-breaking transition to a more organized module hierarchy.

6. (Possibly compile time optional) support for keeping track of 
line/column numbers is often important for better error messages, so 
that would be good to have included as part of the parser and in the 
"Token" type.

Sönke
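One cheap way to approach point 6 without updating a position on every token is to keep only a byte offset and compute the line and column when an error is actually reported. The sketch below assumes that design; `Location` and `locate` are hypothetical names, and columns count UTF-8 code units, not decoded characters.

```d
/// 1-based source position.
struct Location { size_t line = 1, column = 1; }

/// Computes the line/column of a byte offset by rescanning the input.
/// Only needed on the error path, so the hot tokenizing loop stays cheap.
Location locate(const(char)[] input, size_t offset)
{
    Location loc;
    foreach (c; input[0 .. offset]) // code units, no UTF decoding
    {
        if (c == '\n')
        {
            ++loc.line;
            loc.column = 1;
        }
        else
            ++loc.column;
    }
    return loc;
}
```

Tracking line and column incrementally inside the tokenizer, as vibe.data.json does for its tokens, is the alternative when every token must carry its position.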
Aug 03 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 A few thoughts based on my experience with vibe.data.json:

 1. No decoding of strings appears to mean that "Value" also always
 contains encoded strings. This seems to be a leaky and error-prone
 abstraction. For the token stream, performance should be top
 priority, so it's okay to not decode there, but "Value" is a high level
 abstraction of a JSON value, so it should really hide all implementation
 details of the storage format.
Nonono. I think there's a confusion. The input strings are not UTF-decoded for the simple reason that there's no need (all tokenization decisions are made on the basis of ASCII characters/code units). The backslash-prefixed characters are indeed decoded. An optimization I haven't implemented yet is to use slices of the input wherever possible (when the input is string, immutable(byte)[], or immutable(ubyte)[]). That will reduce allocations considerably.
 2. Algebraic is a good choice for its generic handling of operations on
 the contained types (which isn't exposed here, though). However, a
 tagged union type in my experience has quite some advantages for
 usability. Since adding a type tag possibly affects the interface in a
 non-backwards compatible way, this should be evaluated early on.
There's a public opCast(Payload) that gives the end user access to the Payload inside a Value. I forgot to add documentation to it.

What advantages does a tagged union have? (FWIW: to me Algebraic and Variant are also tagged unions, just that the tags are not 0, 1, ..., n. That can be easily fixed for Algebraic by defining operations to access the index of the currently stored type.)
 2.b) I'm currently working on a generic tagged union type that also
 enables operations between values in a natural generic way. This has the
 big advantage of not having to manually define operators like in
 "Value", which is error prone and often limited (I've had to make many
 fixes and additions in this part of the code over time).
I did notice that vibe.json has quite a repetitive implementation, so reducing it would be great.

The way I see it, good work on tagged unions must be integrated into std.variant (either by modifying Variant/Algebraic or by adding new types to it). I am very strongly opposed to adding a tagged union type only for JSON purposes, which I'd consider essentially a usability bug in std.variant, the opposite of dogfooding, etc.
 3. Use of "opDispatch" for an open set of members has been criticized
 for vibe.data.json before and I agree with that criticism. The only
 advantage is saving a few keystrokes (json.key instead of json["key"]),
 but I came to the conclusion that the right approach to work with JSON
 values in D is to always directly deserialize when/if possible anyway,
 which mostly makes this a moot point.
Interesting. Well if experience with opDispatch is negative then it should probably not be used here, or only offered on an opt-in basis.
 This approach has a lot of advantages, e.g. reduction of allocations,
 performance of field access and avoiding typos when accessing fields.
 Especially the last point is interesting, because opDispatch based field
 access gives the false impression that a static field is accessed.
Good point.
 The decision to minimize the number of static fields within "Value"
 reduces the chance of accidentally accessing a static field instead of
 hitting opDispatch, but there are still *some* static fields/methods and
 any later addition of a symbol must now be considered a breaking change.
Right now the idea is that the only named member is __payload. Well then there's opXxxx as well. The idea is/was to add all other functionality as free functions.
 3.b) Bad interaction of UFCS and opDispatch: Functions like "remove" and
 "assume" certainly look like they could be used with UFCS, but
 opDispatch destroys that possibility.
Yah, agreed. The bummer is people coming from Python won't be able to continue using the same style without opDispatch.
 4. I know the stance on this is often "The D module system has enough
 facilities to disambiguate" (which is not really a valid argument, but
 rather just the lack of a counter argument, IMO), but I highly dislike
 the choice to leave off any mention of "JSON" or "Json" in the global
 symbol names. Using the module either requires always using a renamed
 import or a manual alias, or the resulting source code will always leave
 the reader wondering what kind of data is actually handled. Handling
 multiple "value" types in a single piece of code, which is not uncommon
 (e.g. JSON + BSON/ini value/...) would always require explicit
 disambiguation. I'd certainly include the "JSON" or "Json" part in the
 names.
Good point, I agree.
 5. Whatever happens, *please* let's aim for a module name of
 std.data.json (similar to std.digest.*), so that any data formats added
 later are nicely organized. All existing data format support (XML + CSV)
 doesn't follow contemporary Phobos style, so they will need to be
 deprecated at some point anyway, freeing the way for a clean and
 non-breaking transition to a more organized module hierarchy.
I agree.
 6. (Possibly compile time optional) support for keeping track of
 line/column numbers is often important for better error messages, so
 that would be good to have included as part of the parser and in the
 "Token" type.
Yah, saw that in vibe.d but forgot about it.

Thanks,

Andrei
Aug 03 2014
"Dicebot" <public dicebot.lv> writes:
On Sunday, 3 August 2014 at 15:14:43 UTC, Andrei Alexandrescu 
wrote:
 3. Use of "opDispatch" for an open set of members has been 
 criticized
 for vibe.data.json before and I agree with that criticism. The 
 only
 advantage is saving a few keystrokes (json.key instead of 
 json["key"]),
 but I came to the conclusion that the right approach to work 
 with JSON
 values in D is to always directly deserialize when/if possible 
 anyway,
 which mostly makes this a moot point.
Interesting. Well if experience with opDispatch is negative then it should probably not be used here, or only offered on an opt-in basis.
I support this opinion. opDispatch looks cool with JSON objects when you implement it, but it results in many subtle quirks once you consider something like range traits, for example, and those are the most annoying to encounter and debug. It is not worth the gain.
Aug 03 2014
Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 17:14, Andrei Alexandrescu wrote:
 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 A few thoughts based on my experience with vibe.data.json:

 1. No decoding of strings appears to mean that "Value" also always
 contains encoded strings. This seems to be a leaky and error-prone
 abstraction. For the token stream, performance should be top
 priority, so it's okay to not decode there, but "Value" is a high level
 abstraction of a JSON value, so it should really hide all implementation
 details of the storage format.
Nonono. I think there's a confusion. The input strings are not UTF-decoded for the simple reason that there's no need (all tokenization decisions are made on the basis of ASCII characters/code units). The backslash-prefixed characters are indeed decoded. An optimization I haven't implemented yet is to use slices of the input wherever possible (when the input is string, immutable(byte)[], or immutable(ubyte)[]). That will reduce allocations considerably.
Ah okay, *phew* ;) But in that case I'd actually think about leaving off the backslash decoding in the low-level parser, so that slices could be used for immutable inputs in all cases, maybe with a name of "rawString" for the stored data and an additional "string" property that decodes on the fly. This may come in handy when the first comparative benchmarks against rapidjson and the like are done.
 2. Algebraic is a good choice for its generic handling of operations on
 the contained types (which isn't exposed here, though). However, a
 tagged union type in my experience has quite some advantages for
 usability. Since adding a type tag possibly affects the interface in a
 non-backwards compatible way, this should be evaluated early on.
There's a public opCast(Payload) that gives the end user access to the Payload inside a Value. I forgot to add documentation to it.
I see. Suppose that opDispatch would be dropped, would anything speak against "alias this"ing _payload to avoid the need for the manually defined operators?
 What advantages does a tagged union have? (FWIW: to me Algebraic and
 Variant are also tagged unions, just that the tags are not 0, 1, ..., n.
 That can be easily fixed for Algebraic by defining operations to access
 the index of the currently-stored type.)
The two major points are probably that it's possible to use "final switch" on the type tag if it's an enum, and the type id can be easily stored in both integer and string form (which is not as conveniently possible with a TypeInfo).
 (...)

 The way I see it, good work on tagged unions must be either integrated
 within std.variant (either by modifying Variant/Algebraic or by adding
 new types to it). I am very strongly opposed to adding a tagged union
 type only for JSON purposes, which I'd consider essentially a usability
 bug in std.variant, the opposite of dogfooding, etc.
Definitely agree there. An enum-based tagged union design also currently has the unfortunate property that the order of enum values and that of the accepted types must be defined consistently, or bad things will happen. Supporting UDAs on enum values would be a possible direction to fix this:

    enum JsonType {
         variantType!string string,
         variantType!(JsonValue[]) array,
         variantType!(JsonValue[string]) object
    }
    alias JsonValue = TaggedUnion!JsonType;

But then there are obviously still issues with cyclic type references. So, anyway, this is something that still requires some thought. It could also be designed in a way that is backwards compatible with a pure "Algebraic", so it shouldn't be a blocker for the current design.
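The "final switch" advantage can be illustrated with a hand-written tagged union whose tag is an enum: the compiler rejects a `final switch` that forgets a case, and the tag converts to both an integer and a string. `JsonType`, `JsonScalar`, and `describe` are hypothetical names, and only scalar JSON types are covered to keep the sketch short.

```d
import std.conv : to;

/// Enum tag; the order of members is the integer value of each tag.
enum JsonType { null_, boolean, number, string_ }

/// Hand-written tagged union over the scalar JSON types.
struct JsonScalar
{
    JsonType type;
    union
    {
        bool boolean;
        double number;
        string str;
    }

    this(typeof(null)) { type = JsonType.null_; }
    this(bool b) { type = JsonType.boolean; boolean = b; }
    this(double d) { type = JsonType.number; number = d; }
    this(string s) { type = JsonType.string_; str = s; }
}

string describe(JsonScalar v)
{
    string result;
    // `final switch` fails to compile if any JsonType member is unhandled.
    final switch (v.type)
    {
        case JsonType.null_:   result = "null"; break;
        case JsonType.boolean: result = v.boolean ? "true" : "false"; break;
        case JsonType.number:  result = v.number.to!string; break;
        case JsonType.string_: result = `"` ~ v.str ~ `"`; break;
    }
    return result;
}
```

Adding a member to `JsonType` without a matching `case` turns every such `final switch` into a compile error, which is the safety net a function-pointer tag does not provide directly.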
Aug 03 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 11:03 AM, Sönke Ludwig wrote:
 On 03.08.2014 17:14, Andrei Alexandrescu wrote:
[snip]
 Ah okay, *phew* ;) But in that case I'd actually think about leaving off
 the backslash decoding in the low level parser, so that slices could be
 used for immutable inputs in all cases - maybe with a name of
 "rawString" for the stored data and an additional "string" property that
 decodes on the fly. This may come in handy when the first comparative
 benchmarks together with rapidjson and the like are done.
Yah, that's awesome.
 There's a public opCast(Payload) that gives the end user access to the
 Payload inside a Value. I forgot to add documentation to it.
I see. Suppose that opDispatch would be dropped, would anything speak against "alias this"ing _payload to avoid the need for the manually defined operators?
Correct. In fact the conversion was there but I removed it for the sake of opDispatch.
 What advantages does a tagged union have? (FWIW: to me Algebraic and
 Variant are also tagged unions, just that the tags are not 0, 1, ..., n.
 That can be easily fixed for Algebraic by defining operations to access
 the index of the currently-stored type.)
The two major points are probably that it's possible to use "final switch" on the type tag if it's an enum,
So I just tried this: http://dpaste.dzfl.pl/eeadac68fac0. Sadly, the cast doesn't take. Without the cast the enum does compile, but not the switch. I submitted https://issues.dlang.org/show_bug.cgi?id=13247.
 and the type id can be easily stored in both integer and string form
 (which is not as conveniently possible with a TypeInfo).
I think here pointers to functions "win" because getting a string (or anything else for that matter) is an indirect call away.

std.variant was among the first artifacts I wrote for D. It's a topic I've been dabbling in for a long time in a C++ context (http://goo.gl/zqUwFx), with results that were always almost satisfactory. I told myself that if I got to implement things properly in D, then this language would have good potential. Replacing the integral tag I'd always used with a pointer to function is, I think, net progress. Things turned out fine, save for the switch matter.
 An enum based tagged union design also currently has the unfortunate
 property that the order of enum values and that of the accepted types
 must be defined consistently, or bad things will happen. Supporting UDAs
 on enum values would be a possible direction to fix this:

      enum JsonType {
           variantType!string string,
           variantType!(JsonValue[]) array,
           variantType!(JsonValue[string]) object
      }
      alias JsonValue = TaggedUnion!JsonType;

 But then there are obviously still issues with cyclic type references.
 So, anyway, this is something that still requires some thought. It could
 also be designed in a way that is backwards compatible with a pure
 "Algebraic", so it shouldn't be a blocker for the current design.
I think something can be designed along these lines if necessary.

Andrei
Aug 03 2014
"Wyatt" <wyatt.epp gmail.com> writes:
On Sunday, 3 August 2014 at 15:14:43 UTC, Andrei Alexandrescu 
wrote:
 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 3. Use of "opDispatch" for an open set of members has been
 criticized for vibe.data.json before and I agree with that
 criticism. The only advantage is saving a few keystrokes
 (json.key instead of json["key"]), but I came to the conclusion
 that the right approach to work with JSON values in D is to
 always directly deserialize when/if possible anyway, which
 mostly makes this a moot point.
Interesting. Well if experience with opDispatch is negative then it should probably not be used here, or only offered on an opt-in basis.
I suspect that depends on the circumstances. I've been using this style (with Adam's jsvar), and I find it quite nice for decomposing my TOML parse trees into Variant-like structures that go several levels deep. It makes reading (and, consequently, reasoning about) them much easier for me.

That said, I think the ideal would be for nested Variant[] to work predictably, so that users can just write a one-line opDispatch if they want it to behave that way.

-Wyatt
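The opt-in approach, keeping opDispatch out of the core value type and letting users add it in a small wrapper, might look like the following. `Json` and `Dyn` are hypothetical names; the payload reuses `std.variant.Algebraic` with its `This` placeholder.

```d
import std.variant : Algebraic, This;

/// Minimal JSON-ish payload: numbers, strings, and nested objects.
alias Json = Algebraic!(double, string, This[string]);

/// Opt-in wrapper: `d.a.b` means `d["a"]["b"]`. The one-line opDispatch
/// only kicks in for names that are not real members of Dyn.
struct Dyn
{
    Json value;
    Dyn opDispatch(string name)() { return Dyn(value.get!(Json[string])[name]); }
}
```

This also shows the trade-off raised earlier in the thread: a JSON key that collides with a real member (here `value`) is no longer reachable through dot syntax, because member lookup wins over opDispatch.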
Aug 04 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
[snip]

We need to address the matter of std.jgrandson competing with
vibe.data.json. Clearly at a point only one proposal will have to be
accepted so the other would be wasted work.

Following our email exchange I decided to work on this because (a) you
mentioned more work is needed and your schedule was unclear, (b) we need
this at FB sooner rather than later, (c) there were a few things I
thought can be improved in vibe.data.json. I hope that taking
std.jgrandson to proof spurs things into action.

Would you want to merge some of std.jgrandson's deltas into a new
proposal std.data.json based on vibe.data.json? Here's a few things that
I consider necessary:

1. Commit to a schedule. I can't put everything on hold waiting for the perfect
design that may or may not come someday.

2. Avoid UTF decoding.

3. Offer a lazy token stream as a basis for a non-lazy parser. A lazy 
general parser would be considerably more difficult to write and would 
only serve a small niche. On the other hand, a lazy tokenizer is easy to 
write and make efficient, and serve as a basis for user-defined 
specialized lazy parsers if the user wants so.

4. Avoid string allocation. String allocation can be replaced with 
slices of the input when these two conditions are true: (a) input type 
is string, immutable(byte)[], or immutable(ubyte)[]; (b) there are no 
backslash-encoded sequences in the string, i.e. the input string and the 
actual string are the same.

5. Build on std.variant through and through. Again, anything that 
doesn't work is a usability bug in std.variant, which was designed for 
exactly this kind of stuff. Exposing the representation such that user 
code benefits from Algebraic's primitives may be desirable.

6. Address w0rp's issue with undefined. In fact std.variant.Algebraic does have
an uninitialized state :o).
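Point 4 can be illustrated with a decoder that returns a slice of its immutable input whenever the string contains no backslash escapes, and only allocates otherwise. `decodeJsonString` is a hypothetical helper, not the std.jgrandson code; it assumes well-formed input (quotes already stripped) and does not combine UTF-16 surrogate pairs in `\uXXXX` escapes.

```d
import std.array : appender;
import std.conv : to;

/// Decodes the contents of a JSON string token (quotes already stripped).
/// Returns `raw` itself, without allocating, if there is nothing to decode.
string decodeJsonString(string raw)
{
    bool hasEscape = false;
    foreach (char c; raw) // code units; no UTF decoding needed
    {
        if (c == '\\')
        {
            hasEscape = true;
            break;
        }
    }
    if (!hasEscape)
        return raw; // a slice of the input: nothing allocated

    auto result = appender!string();
    for (size_t i = 0; i < raw.length; ++i)
    {
        if (raw[i] != '\\')
        {
            result.put(raw[i]);
            continue;
        }
        ++i; // skip the backslash; assumes well-formed input
        switch (raw[i])
        {
            case '"':  result.put('"');  break;
            case '\\': result.put('\\'); break;
            case '/':  result.put('/');  break;
            case 'b':  result.put('\b'); break;
            case 'f':  result.put('\f'); break;
            case 'n':  result.put('\n'); break;
            case 'r':  result.put('\r'); break;
            case 't':  result.put('\t'); break;
            case 'u':
                // Four hex digits, re-encoded as UTF-8 by the appender.
                result.put(cast(dchar) raw[i + 1 .. i + 5].to!uint(16));
                i += 4;
                break;
            default:
                throw new Exception("invalid escape sequence");
        }
    }
    return result.data;
}
```

The no-escape fast path is only safe because the input is immutable; for a reused buffer the slice would have to be copied.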

Sönke, what do you think?


Andrei
Aug 03 2014
Johannes Pfau <nospam example.com> writes:
On Sun, 03 Aug 2014 08:34:20 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 [snip]

 We need to address the matter of std.jgrandson competing with
 vibe.data.json. Clearly at a point only one proposal will have to be
 accepted so the other would be wasted work.

 [...]

 4. Avoid string allocation. String allocation can be replaced with
 slices of the input when these two conditions are true: (a) input
 type is string, immutable(byte)[], or immutable(ubyte)[]; (b) there
 are no backslash-encoded sequences in the string, i.e. the input
 string and the actual string are the same.
I think for the lowest-level interface we could avoid allocation completely: the tokenizer could always return slices of the raw string, even if a string contains backslash-encoded sequences or if the token is a number. Simply expose that as token.rawValue. Then add functions, Token.decodeString() and token.decodeNumber(), to actually decode the values. decodeString could additionally support decoding into a buffer. If the input is not sliceable, read the input into an internal buffer first and slice that buffer.

The main use case for this is if you simply stream lots of data and you only want to parse very little of it and skip over most content. Then you don't need to decode the strings. This is also true if you only write a JSON formatter: no need to decode and encode the strings.

 5. Build on std.variant through and through. Again, anything that
 doesn't work is a usability bug in std.variant, which was designed
 for exactly this kind of stuff. Exposing the representation such that
 user code benefits from Algebraic's primitives may be desirable.

Variant uses TypeInfo internally, right? I think as long as it uses TypeInfo it can't replace all use cases for a standard tagged union.
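The slice-only tokenizer suggested above can be sketched as a lazy input range that scans one token per `popFront` and leaves all decoding to the caller. Everything here (`TokenKind`, `Token`, `JsonTokenizer`, the `rawValue` field) is hypothetical; unlike the design described for std.jgrandson it accepts only a `string` rather than an arbitrary input range, and it does not validate numbers or literals. All decisions are made on ASCII code units, so no UTF decoding takes place.

```d
import std.range.primitives : isInputRange;

enum TokenKind
{
    objectStart, objectEnd, arrayStart, arrayEnd, colon, comma,
    string_, number, literal
}

struct Token
{
    TokenKind kind;
    string rawValue; // slice of the input; strings keep quotes and escapes
}

/// Lazy tokenizer: each popFront scans exactly one more token.
struct JsonTokenizer
{
    private string input;
    private Token current;
    private bool done;

    this(string input)
    {
        this.input = input;
        popFront();
    }

    @property bool empty() const { return done; }
    @property Token front() const { return current; }

    void popFront()
    {
        static bool isJsonWhite(char c)
        {
            return c == ' ' || c == '\t' || c == '\n' || c == '\r';
        }

        size_t i = 0;
        while (i < input.length && isJsonWhite(input[i])) ++i;
        input = input[i .. $];
        if (input.length == 0) { done = true; return; }

        size_t len = 1;
        TokenKind kind;
        switch (input[0])
        {
            case '{': kind = TokenKind.objectStart; break;
            case '}': kind = TokenKind.objectEnd; break;
            case '[': kind = TokenKind.arrayStart; break;
            case ']': kind = TokenKind.arrayEnd; break;
            case ':': kind = TokenKind.colon; break;
            case ',': kind = TokenKind.comma; break;
            case '"':
                kind = TokenKind.string_;
                // Find the closing quote, stepping over backslash escapes.
                while (len < input.length && input[len] != '"')
                    len += input[len] == '\\' ? 2 : 1;
                if (len >= input.length)
                    throw new Exception("unterminated string");
                ++len; // include the closing quote
                break;
            default:
                // Numbers and true/false/null run until a delimiter.
                kind = (input[0] == '-' || (input[0] >= '0' && input[0] <= '9'))
                    ? TokenKind.number : TokenKind.literal;
                while (len < input.length && !isJsonWhite(input[len])
                       && input[len] != ',' && input[len] != ']'
                       && input[len] != '}' && input[len] != ':')
                    ++len;
                break;
        }
        current = Token(kind, input[0 .. len]);
        input = input[len .. $];
    }
}

static assert(isInputRange!JsonTokenizer);
```

Because every token aliases the input, this only works when the input outlives the tokens, as with immutable strings; a reused buffer such as File.byLine's would need copying first.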
Aug 03 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 8:51 AM, Johannes Pfau wrote:
 On Sun, 03 Aug 2014 08:34:20 -0700,
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 [snip]

 We need to address the matter of std.jgrandson competing with
 vibe.data.json. Clearly at a point only one proposal will have to be
 accepted so the other would be wasted work.

 [...]

 4. Avoid string allocation. String allocation can be replaced with
 slices of the input when these two conditions are true: (a) input
 type is string, immutable(byte)[], or immutable(ubyte)[]; (b) there
 are no backslash-encoded sequences in the string, i.e. the input
 string and the actual string are the same.
I think for the lowest-level interface we could avoid allocation completely: the tokenizer could always return slices of the raw string, even if a string contains backslash-encoded sequences or if the token is a number. Simply expose that as token.rawValue. Then add functions, Token.decodeString() and token.decodeNumber(), to actually decode the values. decodeString could additionally support decoding into a buffer.
That works but not e.g. for File.byLine which reuses its internal buffer. But it's a neat idea for arrays of immutable bytes.
 If the input is not sliceable, read the input into an internal buffer
 first and slice that buffer.
At that point the cost of decoding becomes negligible.
 The main usecase for this is if you simply stream lots of data and you
 only want to parse very little of it and skip over most content. Then
 you don't need to decode the strings.
Awesome.
 This is also true if you only
 write a JSON formatter: No need to decode and encode the strings.
But wouldn't that still need to encode \n, \r, \t, \v?
 5. Build on std.variant through and through. Again, anything that
 doesn't work is a usability bug in std.variant, which was designed
 for exactly this kind of stuff. Exposing the representation such that
 user code benefits from Algebraic's primitives may be desirable.
Variant uses TypeInfo internally, right?
No.

Andrei
Aug 03 2014
Johannes Pfau <nospam example.com> writes:
On Sun, 03 Aug 2014 09:17:57 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 8/3/14, 8:51 AM, Johannes Pfau wrote:
 Variant uses TypeInfo internally, right?
No.
https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L210
https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L371
https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L696

Also, the handler function concept will always have more overhead than a simple tagged union. It is certainly useful if you want to store any type, but if you only want a limited set of types there are more efficient implementations.
Aug 03 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 11:08 AM, Johannes Pfau wrote:
 On Sun, 03 Aug 2014 09:17:57 -0700,
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 8/3/14, 8:51 AM, Johannes Pfau wrote:
 Variant uses TypeInfo internally, right?
No.
https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L210
That's a query for the TypeInfo.
 https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L371
That could be translated to a comparison of pointers to functions.
 https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L696
That, too, could be translated to a comparison of pointers to functions.

It's a confusion; let me clarify this. What Variant does is use pointers to functions instead of integers. The space overhead (one word) is generally the same due to alignment issues.
 Also the handler function concept will always have more overhead than a
 simple tagged union. It is certainly useful if you want to store any
 type, but if you only want a limited set of types there are more
 efficient implementations.
I'm not sure at all, actually. The way I see it, a pointer to a function offers almost everything an integer does, plus universal functionality by actually calling the function. What it doesn't offer is ordering of small integers, but that can be easily arranged at a small cost.

Andrei
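The idea of a pointer to a per-type function serving as the tag, and also providing behavior when called, can be sketched as a toy. This is hypothetical (`FnTagged`, `handlerFor`, `Handler`), not `std.variant`'s actual implementation; the storage is fixed at two pointers' worth and only trivially copyable types are intended.

```d
/// The tag type: one function per stored type.
alias Handler = string function();

/// One instantiation, and thus one distinct address, per type T.
string handlerFor(T)() { return T.stringof; }

struct FnTagged
{
    enum maxSize = 2 * size_t.sizeof;

    Handler tag;       // identifies the stored type
    void*[2] storage;  // pointer-typed so the GC scans it

    this(T)(T value) if (T.sizeof <= maxSize)
    {
        tag = &handlerFor!T;
        *cast(T*) storage.ptr = value;
    }

    /// Type check by comparing function pointers instead of integers.
    bool holds(T)() const
    {
        Handler expected = &handlerFor!T;
        return tag is expected;
    }

    T get(T)() const
    {
        assert(holds!T, "wrong type requested");
        return *cast(const(T)*) storage.ptr;
    }

    /// Extra functionality is "an indirect call away".
    string typeName() const { return tag(); }
}
```

A real implementation would route copying, destruction, and comparison through the same pointer; ordering tags like small integers, as noted above, would take extra work.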
Aug 03 2014
Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 17:34, Andrei Alexandrescu wrote:
 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 [snip]

 We need to address the matter of std.jgrandson competing with
 vibe.data.json. Clearly at a point only one proposal will have to be
 accepted so the other would be wasted work.

 Following our email exchange I decided to work on this because (a) you
 mentioned more work is needed and your schedule was unclear, (b) we need
 this at FB sooner rather than later, (c) there were a few things I
 thought can be improved in vibe.data.json. I hope that taking
 std.jgrandson to proof spurs things into action.

 Would you want to merge some of std.jgrandson's deltas into a new
 proposal std.data.json based on vibe.data.json? Here's a few things that
 I consider necessary:

 1. Commit to a schedule. I can't abandon stuff in wait for the perfect
 design that may or may not come someday.
This may be the crux w.r.t. the vibe.data.json implementation. My schedule will be very crowded this month, so I could only really start to work on it at the beginning of September. But apart from the mentioned points, I think your implementation is already the closest thing to what I have in mind, so I'm all for going the clean slate route (I'll have to do a lot in terms of deprecation work in vibe.d anyway).
 2. Avoid UTF decoding.

 3. Offer a lazy token stream as a basis for a non-lazy parser. A lazy
 general parser would be considerably more difficult to write and would
 only serve a small niche. On the other hand, a lazy tokenizer is easy to
 write and make efficient, and serve as a basis for user-defined
 specialized lazy parsers if the user wants so.

 4. Avoid string allocation. String allocation can be replaced with
 slices of the input when these two conditions are true: (a) input type
 is string, immutable(byte)[], or immutable(ubyte)[]; (b) there are no
 backslash-encoded sequences in the string, i.e. the input string and the
 actual string are the same.

 5. Build on std.variant through and through. Again, anything that
 doesn't work is a usability bug in std.variant, which was designed for
 exactly this kind of stuff. Exposing the representation such that user
 code benefits of the Algebraic's primitives may be desirable.

 6. Address w0rp's issue with undefined. In fact std.Algebraic does have
 an uninitialized state :o).

 Sönke, what do you think?
My requirements would be the same, except for 6. The "undefined" state in the vibe.d version was necessary due to early API decisions and it's more or less a prominent part of it (specifically because the API was designed to behave similar to JavaScript). In hindsight, I'd definitely avoid that. However, I don't think its existence (also in the form of Algebraic.init) is an issue per se, as long as such values are properly handled when converting the runtime value back to a JSON string (i.e. skipped or treated as null values).
Aug 03 2014
"w0rp" <devw0rp gmail.com> writes:
On Sunday, 3 August 2014 at 18:37:48 UTC, Sönke Ludwig wrote:
 On 03.08.2014 17:34, Andrei Alexandrescu wrote:
 6. Address w0rp's issue with undefined. In fact std.Algebraic 
 does have
 an uninitialized state :o).
My requirements would be the same, except for 6. The "undefined" state in the vibe.d version was necessary due to early API decisions and it's more or less a prominent part of it (specifically because the API was designed to behave similar to JavaScript). In hindsight, I'd definitely avoid that. However, I don't think its existence (also in the form of Algebraic.init) is an issue per se, as long as such values are properly handled when converting the runtime value back to a JSON string (i.e. skipped or treated as null values).
My issue with it is that if you ask for a key in an object which doesn't exist, you get an 'undefined' value back, just like JavaScript. I'd rather that be propagated as a RangeError, which is more consistent with associative arrays in the language and probably more correct. A minor issue is being able to create a Json object which isn't a valid Json object by itself. I'd rather the initial value was just 'null', which would match how pointers and class instances behave in the language.
Aug 03 2014
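A JSON value whose default state is JSON null (rather than an "undefined" state) can be had by making the null kind the first enum member, so that the .init value of the tag is null. The following is a minimal sketch under that assumption. The Json struct and its members are hypothetical, not std.jgrandson's Value or vibe.d's Json.

```d
/// Hypothetical JSON value whose .init state is JSON null.
/// Not std.jgrandson's Value; arrays and objects are omitted for brevity.
struct Json
{
    // null_ is listed first, so Kind.init == Kind.null_.
    enum Kind { null_, boolean, number, string_ }
    Kind kind;  // defaults to Kind.null_
    union { bool b; double n; string s; }

    this(bool v)   { kind = Kind.boolean; b = v; }
    this(double v) { kind = Kind.number;  n = v; }
    this(string v) { kind = Kind.string_; s = v; }

    bool isNull() const { return kind == Kind.null_; }

    string toJSON() const
    {
        import std.conv : to;
        switch (kind)
        {
            case Kind.boolean: return b ? "true" : "false";
            case Kind.number:  return n.to!string;
            case Kind.string_: return `"` ~ s ~ `"`;  // escaping omitted in this sketch
            default:           return "null";          // Kind.null_, i.e. Json.init
        }
    }
}
```

With this layout a default-constructed value serializes as null, which is one of the two treatments suggested for Algebraic.init.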
Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 20:57, w0rp wrote:
 On Sunday, 3 August 2014 at 18:37:48 UTC, Sönke Ludwig wrote:
 The "undefined" state in the vibe.d version was necessary due to early
 API decisions and it's more or less a prominent part of it
 (specifically because the API was designed to behave similar to
 JavaScript). In hindsight, I'd definitely avoid that. However, I don't
 think its existence (also in the form of Algebraic.init) is an issue
 per se, as long as such values are properly handled when converting
 the runtime value back to a JSON string (i.e. skipped or treated as
 null values).
My issue with it is that if you ask for a key in an object which doesn't exist, you get an 'undefined' value back, just like JavaScript. I'd rather that be propagated as a RangeError, which is more consistent with associative arrays in the language and probably more correct.
Yes, this is what I meant with the JavaScript part of API. In addition to opIndex(), there should of course also be a .get(key, default_value) style accessor and the "in" operator.
 A minor
 issue is being able to create a Json object which isn't a valid Json
 object by itself. I'd rather the initial value was just 'null', which
 would match how pointers and class instances behave in the language.
This is what I meant with not being an issue by itself. But having such a special value of course has its pros and cons, and I could personally definitely also live with JSON values being initialized to JSON "null", if somebody hacks Algebraic to support that kind of use case.
Aug 03 2014
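The three accessors mentioned above can be sketched as a thin wrapper over a built-in associative array so that they behave exactly like AAs: opIndex throws RangeError on a missing key, get returns a default, and "in" yields a pointer or null. JsonObject and its members are hypothetical names for illustration. This is not the API of std.jgrandson or vibe.data.json.

```d
import core.exception : RangeError;

/// Hypothetical JSON-object wrapper mirroring built-in AA semantics.
struct JsonObject(V)
{
    V[string] fields;

    /// Missing key: throw RangeError, like a built-in associative array.
    ref V opIndex(string key)
    {
        if (auto p = key in fields) return *p;
        throw new RangeError();
    }

    /// Missing key: fall back to the supplied default.
    V get(string key, lazy V defaultValue)
    {
        return fields.get(key, defaultValue);
    }

    /// `key in obj` yields a pointer to the value, or null.
    inout(V)* opBinaryRight(string op : "in")(string key) inout
    {
        return key in fields;
    }
}
```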
"Marc Schütz" <schuetzm gmx.net> writes:
On Sunday, 3 August 2014 at 19:54:12 UTC, Sönke Ludwig wrote:
 On 03.08.2014 20:57, w0rp wrote:
 My issue with it is that if you ask for a key in an object 
 which doesn't
 exist, you get an 'undefined' value back, just like 
 JavaScript. I'd
 rather that be propagated as a RangeError, which is more 
 consistent with
 associative arrays in the language and probably more correct.
Yes, this is what I meant with the JavaScript part of API. In addition to opIndex(), there should of course also be a .get(key, default_value) style accessor and the "in" operator.
There is a parallel discussion about the concept of associative ranges: http://forum.dlang.org/thread/jheurakujksdlrjaoncs forum.dlang.org Maybe you could also have a look there, because JSON seems to be a good candidate for an associative range.
Aug 04 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 11:37 AM, Sönke Ludwig wrote:
 On 03.08.2014 17:34, Andrei Alexandrescu wrote:
 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 [snip]

 We need to address the matter of std.jgrandson competing with
 vibe.data.json. Clearly at a point only one proposal will have to be
 accepted so the other would be wasted work.

 Following our email exchange I decided to work on this because (a) you
 mentioned more work is needed and your schedule was unclear, (b) we need
 this at FB sooner rather than later, (c) there were a few things I
 thought can be improved in vibe.data.json. I hope that taking
 std.jgrandson to proof spurs things into action.

 Would you want to merge some of std.jgrandson's deltas into a new
 proposal std.data.json based on vibe.data.json? Here's a few things that
 I consider necessary:

 1. Commit to a schedule. I can't abandon stuff in wait for the perfect
 design that may or may not come someday.
This may be the crux w.r.t. the vibe.data.json implementation. My schedule will be very crowded this month, so I could only really start to work on it at the beginning of September. But apart from the mentioned points, I think your implementation is already the closest thing to what I have in mind, so I'm all for going the clean slate route (I'll have to do a lot in terms of deprecation work in vibe.d anyway).
What would be your estimated time of finishing? Would anyone want to take vibe.data.json and std.jgrandson, put them in a crucible, and have std.data.json emerge from it in a timely manner? My understanding is that everyone involved would be cool with that. Andrei
Aug 03 2014
Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 21:53, Andrei Alexandrescu wrote:
 What would be your estimated time of finishing?
My rough estimate would be that about two weeks of calendar time should suffice for a first candidate, since the functionality and the design are already mostly there. However, it seems that VariantN will need some work, too (currently, using opAdd results in an error for an Algebraic defined for JSON usage).
Aug 05 2014
"w0rp" <devw0rp gmail.com> writes:
On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu 
wrote:
 We need a better json library at Facebook. I'd discussed with 
 Sönke the possibility of taking vibe.d's json to std but he 
 said it needs some more work. So I took std.jgrandson to proof 
 of concept state and hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html

 Here are a few differences compared to vibe.d's library. I 
 think these are desirable to have in that library as well:

 * Parsing strings is decoupled into tokenization (which is lazy 
 and only needs an input range) and parsing proper. Tokenization 
 is lazy, which allows users to create their own advanced (e.g. 
 partial/lazy) parsing if needed. The parser itself is eager.

 * There's no decoding of strings.

 * The representation is built on Algebraic, with the advantages 
 that it benefits from all of its primitives. Implementation is 
 also very compact because Algebraic obviates a bunch of 
 boilerplate. Subsequent improvements to Algebraic will also 
 reflect themselves into improvements to std.jgrandson.

 * The JSON value (called std.jgrandson.Value) has no named 
 member variables or methods except for __payload. This is so 
 there's no clash between dynamic properties exposed via 
 opDispatch.

 Well that's about it. What would it take for this to become a 
 Phobos proposal? Destroy.


 Andrei
I like it. Here's what I think about it.

* When I wrote my JSON library, the thing I wanted most was constructors and opAssign functions for creating JSON values easily, e.g. JSON x = "some string"; You have this, so it's great.
* You didn't include an 'undefined' value like vibe.d has. That's a very minor detail, but it's something I dislike, so this is good.
* I'd just name Value either 'JSON' or 'JSONValue', so you can just import the module without using aliases.
* opDispatch is kind of "meh" for JSON objects. It works until you hit a name clash with a UFCS function. I don't mind typing the extra three characters.

That's all I could think of really.
Aug 03 2014
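The constructor/opAssign convenience w0rp describes can be sketched as follows, assuming a value type built on std.variant's Algebraic restricted to three scalar kinds. JValue is a hypothetical name (not std.jgrandson's Value and not std.json's JSONValue). D rewrites an initialization such as `JValue x = "some string";` into a constructor call, while later assignments go through the opAssign overloads.

```d
/// Hypothetical JSON scalar wrapper illustrating constructor + opAssign
/// convenience. Arrays and objects are omitted in this sketch.
struct JValue
{
    import std.variant : Algebraic;
    Algebraic!(bool, double, string) payload;

    // Constructors let `JValue x = value;` initialize directly.
    this(bool v)   { payload = v; }
    this(double v) { payload = v; }
    this(string v) { payload = v; }

    // opAssign lets an existing value be re-bound from a plain D value.
    void opAssign(bool v)   { payload = v; }
    void opAssign(double v) { payload = v; }
    void opAssign(string v) { payload = v; }
}
```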
Daniel Gibson <metalcaedes gmail.com> writes:
On 03.08.2014 09:16, Andrei Alexandrescu wrote:
 We need a better json library at Facebook. I'd discussed with Sönke the
 possibility of taking vibe.d's json to std but he said it needs some
 more work. So I took std.jgrandson to proof of concept state and hence
 ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html
Is the name supposed to stay, or is it just a working title? "std.j*grandson*" (being the successor of "std.j*son*") is of course a funny play on words, but it's not really obvious at first sight what it does. I.e. if someone skims the std. modules in the documentation, looking for json, he'd probably not think that this is the new json module. std.json2 or something like that would be more obvious. Cheers, Daniel
Aug 03 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 9:49 AM, Daniel Gibson wrote:
 On 03.08.2014 09:16, Andrei Alexandrescu wrote:
 We need a better json library at Facebook. I'd discussed with Sönke the
 possibility of taking vibe.d's json to std but he said it needs some
 more work. So I took std.jgrandson to proof of concept state and hence
 ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html
Is the name supposed to stay or just a working title?
Just a working title, but of course if it were wildly successful... but then again it's not. -- Andrei
Aug 03 2014
"Sean Kelly" <sean invisibleduck.org> writes:
I don't want to pay for anything I don't use.  No allocations 
should occur within the parser and it should simply slice up the 
input.  So the lowest layer should allow me to iterate across 
symbols in some way.  When I've done this in the past it was 
SAX-style (i.e. a callback per type) but with the range interface 
that shouldn't be necessary.

The parser shouldn't decode or convert anything unless I ask it 
to.  Most of the time I only care about specific values, and 
paying for conversions on everything is wasted process time.

I suggest splitting number into float and integer types.  In a 
language like D where these are distinct internal types, it can 
be valuable to know this up front.

Is there support for output?  I see the makeArray and makeObject 
routines...  Ideally, there should be a way to serialize JSON 
against an OutputRange with optional formatting.
Aug 03 2014
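The lowest piece of such OutputRange-based output is writing an escaped JSON string. Here is a minimal sketch that assumes nothing beyond Phobos' put and formattedWrite. writeJsonString is a hypothetical helper, not part of std.jgrandson. It escapes what RFC 8259 requires (quote, backslash, and control characters below U+0020). Because it iterates over code units, multi-byte UTF-8 sequences pass through untouched, with no decoding.

```d
import std.format : formattedWrite;
import std.range.primitives : isOutputRange, put;

/// Hypothetical helper: writes `s` as a JSON string literal into any
/// output range of char, without building an intermediate string.
void writeJsonString(R)(ref R sink, const(char)[] s)
    if (isOutputRange!(R, char))
{
    put(sink, '"');
    foreach (char c; s)  // code units: no UTF decoding needed
    {
        switch (c)
        {
            case '"':  put(sink, `\"`); break;
            case '\\': put(sink, `\\`); break;
            case '\n': put(sink, `\n`); break;
            case '\r': put(sink, `\r`); break;
            case '\t': put(sink, `\t`); break;
            default:
                if (c < 0x20)  // other control characters
                    formattedWrite(sink, "\\u%04x", cast(uint) c);
                else
                    put(sink, c);
        }
    }
    put(sink, '"');
}
```

Optional pretty-printing would sit one layer above this, deciding where to emit newlines and indentation between tokens.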
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 10:19 AM, Sean Kelly wrote:
 I don't want to pay for anything I don't use.  No allocations should
 occur within the parser and it should simply slice up the input.
What to do about arrays and objects, which would naturally allocate arrays and associative arrays respectively? What about strings with backslash-encoded characters? No allocation works for tokenization, but parsing is a whole different matter.
 So the
 lowest layer should allow me to iterate across symbols in some way.
Yah, that would be the tokenizer.
 When I've done this in the past it was SAX-style (ie. a callback per
 type) but with the range interface that shouldn't be necessary.

 The parser shouldn't decode or convert anything unless I ask it to.
 Most of the time I only care about specific values, and paying for
 conversions on everything is wasted process time.
That's tricky. Once you scan for 2 specific characters you may as well scan for a couple more, the added cost is negligible. In contrast, scanning once for finding termination and then again for decoding purposes will definitely be a lot more expensive.
 I suggest splitting number into float and integer types.  In a language
 like D where these are distinct internal
 types, it can be valuable to
 know this up front.
Yah, that kept on sticking out like a sore thumb throughout.
 Is there support for output?  I see the makeArray and makeObject
 routines...  Ideally, there should be a way to serialize JSON against an
 OutputRange with optional formatting.
Not yet, and yah those should be in. Andrei
Aug 03 2014
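The single-scan argument, combined with the earlier "avoid string allocation" point, can be sketched as follows, assuming the input is a string positioned just after an opening quote. scanJsonString and decodeEscaped are hypothetical names, not std.jgrandson's code. One pass looks for both '"' and '\\'. If no escape appears, the result is a slice of the input and nothing is allocated. Otherwise, decoding starts from where the scan stopped, so the prefix is not rescanned. To keep the sketch short, \u escapes are handled only for BMP characters and surrogate pairs are not combined.

```d
import std.array : appender;
import std.conv : to;

/// Hypothetical sketch. `input` starts just after an opening '"'; returns
/// the string contents and advances `input` past the closing quote.
/// `allocated` reports whether the slow (copying) path was taken.
string scanJsonString(ref string input, out bool allocated)
{
    foreach (i, char c; input)
    {
        if (c == '"')
        {
            auto result = input[0 .. i];  // zero-copy slice
            input = input[i + 1 .. $];
            return result;
        }
        if (c == '\\')
        {
            allocated = true;
            return decodeEscaped(input, i);
        }
    }
    throw new Exception("unterminated JSON string");
}

// Slow path: copy the already-scanned prefix, then decode the rest.
private string decodeEscaped(ref string input, size_t i)
{
    auto buf = appender!string();
    buf.put(input[0 .. i]);
    while (i < input.length)
    {
        immutable c = input[i];
        if (c == '"') { input = input[i + 1 .. $]; return buf.data; }
        if (c != '\\') { buf.put(c); ++i; continue; }
        if (i + 1 >= input.length) break;
        immutable e = input[i + 1];
        i += 2;
        switch (e)
        {
            case '"':  buf.put('"');  break;
            case '\\': buf.put('\\'); break;
            case '/':  buf.put('/');  break;
            case 'b':  buf.put('\b'); break;
            case 'f':  buf.put('\f'); break;
            case 'n':  buf.put('\n'); break;
            case 'r':  buf.put('\r'); break;
            case 't':  buf.put('\t'); break;
            case 'u':  // BMP only in this sketch
                if (i + 4 > input.length) throw new Exception("bad \\u escape");
                buf.put(cast(dchar) input[i .. i + 4].to!uint(16));
                i += 4;
                break;
            default: throw new Exception("bad escape sequence");
        }
    }
    throw new Exception("unterminated JSON string");
}
```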
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
03-Aug-2014 21:40, Andrei Alexandrescu wrote:
 On 8/3/14, 10:19 AM, Sean Kelly wrote:
 I don't want to pay for anything I don't use.  No allocations should
 occur within the parser and it should simply slice up the input.
What to do about arrays and objects, which would naturally allocate arrays and associative arrays respectively? What about strings with backslash-encoded characters?
SAX-style would imply that an array is "parsed" by calling 6 user-defined callbacks inside of a parser: startArray, endArray, startObject, endObject, id and value. A simplified pseudo-code of a JSON parser's inner loop is then:

    if(cur == '[')
        startArray();
    else if(cur == '{')
        startObject();
    else if(cur == '}')
        endObject();
    else if(cur == ']')
        endArray();
    else{
        if(expectObjectKey){
            id(parseAsIdentifier());
        }
        else
            value(parseAsValue());
    }

This is as barebones as it can get and is very fast in practice, especially in the context of searching/extracting/matching specific sub-trees of JSON documents. -- Dmitry Olshansky
Aug 03 2014
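A runnable version of that six-callback loop might look like the sketch below. It assumes a handler type that provides startObject, endObject, startArray, endArray, id and value. saxParse and Recorder are hypothetical names, not std.jgrandson's API. The sketch skips validation, does not allow escaped quotes inside strings, and passes scalars to value as raw slices of the input, so the driver itself allocates nothing per token.

```d
import std.ascii : isWhite;

/// Hypothetical SAX-style driver following the six-callback loop above.
void saxParse(H)(string s, ref H h)
{
    bool[] stack;            // true = inside an object, false = inside an array
    bool expectKey = false;  // next string token is an object key
    size_t i = 0;
    while (i < s.length)
    {
        immutable c = s[i];
        switch (c)
        {
            case '{': h.startObject(); stack ~= true;  expectKey = true;  ++i; break;
            case '[': h.startArray();  stack ~= false; expectKey = false; ++i; break;
            case '}': h.endObject();   stack = stack[0 .. $ - 1]; ++i; break;
            case ']': h.endArray();    stack = stack[0 .. $ - 1]; ++i; break;
            case ',': expectKey = stack.length > 0 && stack[$ - 1]; ++i; break;
            case ':': expectKey = false; ++i; break;
            default:
            {
                if (isWhite(c)) { ++i; break; }
                size_t j = i + 1;
                if (c == '"')
                {
                    while (j < s.length && s[j] != '"') ++j;
                    if (j == s.length) throw new Exception("unterminated string");
                    ++j;  // include the closing quote
                }
                else
                {
                    while (j < s.length && !isWhite(s[j]) && s[j] != ','
                           && s[j] != '}' && s[j] != ']')
                        ++j;
                }
                auto tok = s[i .. j];  // slice of the input: no allocation
                if (expectKey) h.id(tok[1 .. $ - 1]);  // strip the quotes
                else h.value(tok);
                i = j;
            }
        }
    }
}

/// Example handler that just records the events it receives.
struct Recorder
{
    string[] log;
    void startObject() { log ~= "{"; }
    void endObject()   { log ~= "}"; }
    void startArray()  { log ~= "["; }
    void endArray()    { log ~= "]"; }
    void id(string k)    { log ~= "id:" ~ k; }
    void value(string v) { log ~= "val:" ~ v; }
}
```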
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
03-Aug-2014 23:54, Dmitry Olshansky wrote:
 03-Aug-2014 21:40, Andrei Alexandrescu wrote:
 A simplified pseudo-code of JSON-parser inner loop is then:

 if(cur == '[')
         startArray();
 else if(cur == '{'){
Aw. Stray brace.. -- Dmitry Olshansky
Aug 03 2014
"Sean Kelly" <sean invisibleduck.org> writes:
On Sunday, 3 August 2014 at 17:40:48 UTC, Andrei Alexandrescu 
wrote:
 On 8/3/14, 10:19 AM, Sean Kelly wrote:
 I don't want to pay for anything I don't use.  No allocations 
 should
 occur within the parser and it should simply slice up the 
 input.
What to do about arrays and objects, which would naturally allocate arrays and associative arrays respectively? What about strings with backslash-encoded characters?
This is tricky with a range. With an event-based parser I'd have events for object and array begin / end, but with a range you end up having an element that's a token, which is pretty weird. For encoded characters (and you need to make sure you handle surrogate pairs in your decoder) I'd still provide some means of decoding on demand. If nothing else, decode lazily when the user asks for the string value. That way the user isn't paying to decode strings he isn't interested in.
 No allocation works for tokenization, but parsing is a whole 
 different matter.

 So the
 lowest layer should allow me to iterate across symbols in some 
 way.
Yah, that would be the tokenizer.
But that will halt on comma and colon and such, correct? That's a tad lower than I'd want, though I guess it would be easy enough to build a parser on top of it.
 When I've done this in the past it was SAX-style (ie. a 
 callback per
 type) but with the range interface that shouldn't be necessary.

 The parser shouldn't decode or convert anything unless I ask 
 it to.
 Most of the time I only care about specific values, and paying 
 for
 conversions on everything is wasted process time.
That's tricky. Once you scan for 2 specific characters you may as well scan for a couple more, the added cost is negligible. In contrast, scanning once for finding termination and then again for decoding purposes will definitely be a lot more expensive.
I think I'm getting a bit confused. For the JSON parser I wrote, the parser performs full validation but leaves the content as-is, then provides a routine to decode values from their string representation if the user wishes to. I'm not sure where scanning figures in here.
 Andrei
Aug 03 2014
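For the surrogate-pair point: JSON can only express characters outside the BMP as two \u escapes (for example \uD83D\uDE00), and a decoder has to combine them into one code point before encoding it as UTF-8. Here is a minimal sketch of that step using the standard UTF-16 formula and std.utf.encode. combineSurrogates and toUtf8 are hypothetical helper names.

```d
import std.utf : encode;

/// Hypothetical helper: combines a UTF-16 surrogate pair, as produced by
/// a JSON escape such as \uD83D\uDE00, into a single code point.
dchar combineSurrogates(uint hi, uint lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF, "not a high surrogate");
    assert(lo >= 0xDC00 && lo <= 0xDFFF, "not a low surrogate");
    return cast(dchar)(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
}

/// Encodes one code point as UTF-8 (this is where a lazy decoder would allocate).
string toUtf8(dchar cp)
{
    char[4] buf;
    immutable n = encode(buf, cp);
    return buf[0 .. n].idup;
}
```

A decoder that runs lazily, only when the user asks for a string's value, would call helpers like these on demand instead of for every string in the document.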
"Jacob Carlborg" <doob me.com> writes:
On Sunday, 3 August 2014 at 20:40:47 UTC, Sean Kelly wrote:

 This is tricky with a range. With an event-based parser I'd 
 have events for object and array begin / end, but with a range 
 you end up having an element that's a token, which is pretty 
 weird.
Have a look at Token.Kind at the top of the module [1]. The enum has objectStart, objectEnd, arrayStart and arrayEnd. Just by looking at that, it seems it already works very similarly to an event parser, but with a range API. This is exactly like the XML pull parser in Tango. [1] http://erdani.com/d/jgrandson.d -- /Jacob Carlborg
Aug 04 2014
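A pull parser of this kind can be sketched as a lazy input range whose tokens carry begin/end kinds for objects and arrays. The enum below borrows the objectStart/objectEnd/arrayStart/arrayEnd names mentioned above, but the rest of the type is hypothetical and not std.jgrandson's code. Token text is always a slice of the input, and anything that is not punctuation, a string or a keyword is treated as a number without validation.

```d
import std.ascii : isWhite;

/// Hypothetical lazy JSON tokenizer exposed as an input range (pull parser).
struct TokenRange
{
    enum Kind { objectStart, objectEnd, arrayStart, arrayEnd, comma, colon,
                string_, number, true_, false_, null_ }
    struct Token { Kind kind; string text; }  // text slices the input

    private string input;
    private Token current;
    private bool done;

    this(string s) { input = s; popFront(); }

    bool empty() const { return done; }
    Token front() { return current; }

    void popFront()
    {
        while (input.length && isWhite(input[0])) input = input[1 .. $];
        if (input.length == 0) { done = true; return; }
        size_t n = 1;
        Kind k;
        switch (input[0])
        {
            case '{': k = Kind.objectStart; break;
            case '}': k = Kind.objectEnd; break;
            case '[': k = Kind.arrayStart; break;
            case ']': k = Kind.arrayEnd; break;
            case ',': k = Kind.comma; break;
            case ':': k = Kind.colon; break;
            case '"':
                k = Kind.string_;
                while (n < input.length && input[n] != '"')
                    n += input[n] == '\\' ? 2 : 1;  // step over escapes
                if (n >= input.length) throw new Exception("unterminated string");
                ++n;  // include the closing quote
                break;
            default:
            {
                while (n < input.length && !isWhite(input[n]) && input[n] != ','
                       && input[n] != ']' && input[n] != '}')
                    ++n;
                immutable word = input[0 .. n];
                k = word == "true" ? Kind.true_
                  : word == "false" ? Kind.false_
                  : word == "null" ? Kind.null_ : Kind.number;
            }
        }
        current = Token(k, input[0 .. n]);
        input = input[n .. $];
    }
}
```

Because it is an ordinary input range, std.algorithm works on it directly, e.g. filtering for objectStart tokens, and tokens are produced only as the consumer pulls them.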
"Jacob Carlborg" <doob me.com> writes:
On Sunday, 3 August 2014 at 17:19:04 UTC, Sean Kelly wrote:

 Is there support for output?  I see the makeArray and 
 makeObject routines...  Ideally, there should be a way to 
 serialize JSON against an OutputRange with optional formatting.
I think it should only provide very primitive functions to serialize basic data types. Then Phobos should provide a separate module/package for generic serialization where JSON is an archive type using this module as its backend. -- /Jacob Carlborg
Aug 04 2014
Orvid King <blah38621 gmail.com> writes:
On 8/3/2014 2:16 AM, Andrei Alexandrescu wrote:
 We need a better json library at Facebook. I'd discussed with Sönke the
 possibility of taking vibe.d's json to std but he said it needs some
 more work. So I took std.jgrandson to proof of concept state and hence
 ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html

 Here are a few differences compared to vibe.d's library. I think these
 are desirable to have in that library as well:

 * Parsing strings is decoupled into tokenization (which is lazy and only
 needs an input range) and parsing proper. Tokenization is lazy, which
 allows users to create their own advanced (e.g. partial/lazy) parsing if
 needed. The parser itself is eager.

 * There's no decoding of strings.

 * The representation is built on Algebraic, with the advantages that it
 benefits from all of its primitives. Implementation is also very compact
 because Algebraic obviates a bunch of boilerplate. Subsequent
 improvements to Algebraic will also reflect themselves into improvements
 to std.jgrandson.

 * The JSON value (called std.jgrandson.Value) has no named member
 variables or methods except for __payload. This is so there's no clash
 between dynamic properties exposed via opDispatch.

 Well that's about it. What would it take for this to become a Phobos
 proposal? Destroy.


 Andrei
If you're looking for serialization from statically known type layouts, then I believe my JSON (de)serialization code (https://github.com/Orvid/JSONSerialization) might actually be of interest to you, as it uses no intermediate representation, nor does it allocate when it converts an object to JSON. As far as I know, even when only compiled with DMD, it's among the fastest JSON (de)serialization libraries. The one exception to the no-allocation rule is converting a floating point number to a string: at the moment it just converts it to a normal string that gets written to the output range, though I suppose you could certainly use a local buffer to write to instead. It also supports (de)serializing what I called, at the time, dynamic types. std.variant in particular isn't actually supported, though, because that code is only there because I needed it for something else, and I wasn't using std.variant at the time.
Aug 03 2014
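Serializing from a statically known layout without an intermediate representation is possible in D through compile-time reflection over a struct's fields. The following is a minimal sketch of that technique. toJson, Point and Record are hypothetical names, and this is not the API of Orvid's library. As in the caveat above, numbers go through to!string, which allocates a small temporary, and string values are written without escaping to keep the sketch short.

```d
import std.conv : to;
import std.range.primitives : isOutputRange, put;
import std.traits : isNumeric, isSomeString, Unqual;

/// Hypothetical sketch: writes a struct's fields straight into an output
/// range via compile-time reflection, with no intermediate JSON tree.
void toJson(R, T)(ref R sink, const T obj)
    if (is(T == struct) && isOutputRange!(R, char))
{
    put(sink, '{');
    foreach (i, field; obj.tupleof)  // unrolled at compile time
    {
        if (i) put(sink, ',');
        put(sink, '"');
        put(sink, __traits(identifier, T.tupleof[i]));
        put(sink, `":`);
        alias F = Unqual!(typeof(field));
        static if (is(F == bool))
            put(sink, field ? "true" : "false");
        else static if (isNumeric!F)
            put(sink, field.to!string);  // allocates a small temporary
        else static if (isSomeString!F)
        {
            put(sink, '"'); put(sink, field); put(sink, '"');  // no escaping here
        }
        else static if (is(F == struct))
            toJson(sink, field);  // nested struct: recurse
        else
            static assert(0, "unsupported field type " ~ F.stringof);
    }
    put(sink, '}');
}

// Example types for the usage check.
struct Point { int x; double y; }
struct Record { string name; bool ok; Point p; }
```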
"Andrea Fontana" <nospam example.com> writes:
On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu 
wrote:
 We need a better json library at Facebook. I'd discussed with 
 Sönke the possibility of taking vibe.d's json to std but he 
 said it needs some more work. So I took std.jgrandson to proof 
 of concept state and hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html

 Here are a few differences compared to vibe.d's library. I 
 think these are desirable to have in that library as well:

 * Parsing strings is decoupled into tokenization (which is lazy 
 and only needs an input range) and parsing proper. Tokenization 
 is lazy, which allows users to create their own advanced (e.g. 
 partial/lazy) parsing if needed. The parser itself is eager.

 * There's no decoding of strings.

 * The representation is built on Algebraic, with the advantages 
 that it benefits from all of its primitives. Implementation is 
 also very compact because Algebraic obviates a bunch of 
 boilerplate. Subsequent improvements to Algebraic will also 
 reflect themselves into improvements to std.jgrandson.

 * The JSON value (called std.jgrandson.Value) has no named 
 member variables or methods except for __payload. This is so 
 there's no clash between dynamic properties exposed via 
 opDispatch.

 Well that's about it. What would it take for this to become a 
 Phobos proposal? Destroy.


 Andrei
On my bson library I found it very useful to have some methods to know if a field exists or not, and to get a "defaulted" value. Something like:

auto assume(T)(Value v, T default = T.init);

Another good method could be something like xpath to get a deep value:

Value v = value["/path/to/sub/object"];

Moreover, in my library I actually have three different methods to read a value:

T get(T)()  // Exception if value is not a T or not valid or value doesn't exist
T to(T)()   // Try to convert value to T using to!string. Exception if it doesn't exist or is not valid

BsonField!T as(T)(lazy T default = T.init)  // Always return a value

BsonField!T is an "alias this"-ed struct with two fields: T value and bool error(). T value is the aliased field, and error() tells you if value is defaulted (because of an error: the field doesn't exist or can't be converted to T).

So I can write something like this:

int myvalue = json["/that/deep/property"].as!int;

or

auto myvalue = json["/that/deep/property"].as!int(10);

if (myvalue.error) writeln("Property doesn't exist, I'm using default value");

writeln("Property value: ", myvalue);

I hope this can be useful...
Aug 04 2014
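The xpath-like lookup and the never-throwing as!T accessor can be sketched together as below, assuming a simple node type whose objects are string-keyed maps and whose leaves hold ints. Node, Field and as are hypothetical names, not the bson library's or std.jgrandson's API. The path is split on '/', each segment is looked up with "in", and any miss yields the default with the error flag set.

```d
import std.algorithm.iteration : splitter;

/// Hypothetical nested node: an object (string-keyed map) or an int leaf.
class Node
{
    Node[string] fields;  // used when this node is an object
    int leaf;             // used when this node is a leaf
    bool isLeaf;
    this(int v) { leaf = v; isLeaf = true; }
    this(Node[string] f) { fields = f; }
}

/// Result in the spirit of BsonField!T: the value plus a "defaulted" flag.
struct Field(T)
{
    T value;
    bool error;
    alias value this;
}

/// Walks an xpath-like "/a/b/c" path; never throws, falls back to `def`.
Field!int as(Node root, string path, int def = 0)
{
    Node cur = root;
    foreach (key; path.splitter('/'))
    {
        if (key.length == 0) continue;  // leading slash
        auto p = key in cur.fields;
        if (p is null) return Field!int(def, true);
        cur = *p;
    }
    if (!cur.isLeaf) return Field!int(def, true);
    return Field!int(cur.leaf, false);
}
```

Splitting a single string keeps path construction easy when the path is generated at runtime, at the cost of not supporting keys that themselves contain '/'.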
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/4/14, 12:47 AM, Andrea Fontana wrote:
 On my bson library I found very useful to have some methods to know if a
 field exists or not, and to get a "defaulted" value. Something like:

 auto assume(T)(Value v, T default = T.init);
Nice. Probably "get" would be better, to be in keeping with built-in hashtables.
 Another good method could be something like xpath to get a deep value:

 Value v = value["/path/to/sub/object"];
Cool. Is it unlikely that a value contains an actual slash? If so, would value["path"]["to"]["sub"]["object"] be more precise?
 Moreover in my library I actually have three different methods to read a
 value:

 T get(T)() // Exception if value is not a T or not valid or value
 doesn't exist
 T to(T)()  // Try to convert value to T using to!string. Exception if
 doesn't exists or not valid

 BsonField!T as(T)(lazy T default = T.init)  // Always return a value

 BsonField!T is an "alias this"-ed struct with two fields: T value and
 bool error(). T value is the aliased field, and error() tells you if
 value is defaulted (because of an error: field not exists or can't
 convert to T)

 So I can write something like this:

 int myvalue = json["/that/deep/property"].as!int;

 or

 auto myvalue = json["/that/deep/property"].as!int(10);

 if (myvalue.error) writeln("Property doesn't exists, I'm using default
 value);

 writeln("Property value: ", myvalue);

 I hope this can be useful...
Sure is, thanks. Listen, would you want to volunteer a std.data.json proposal? Andrei
Aug 04 2014
"Andrea Fontana" <nospam example.com> writes:
On Monday, 4 August 2014 at 16:58:12 UTC, Andrei Alexandrescu 
wrote:
 On 8/4/14, 12:47 AM, Andrea Fontana wrote:
 On my bson library I found very useful to have some methods to 
 know if a
 field exists or not, and to get a "defaulted" value. Something 
 like:

 auto assume(T)(Value v, T default = T.init);
Nice. Probably "get" would be better to be in keep with built-in hashtables.
I wrote assume just to use proposed syntax :)
 Another good method could be something like xpath to get a 
 deep value:

 Value v = value["/path/to/sub/object"];
Cool. Is it unlikely that a value contains an actual slash? If so would be value["path"]["to"]["sub"]["object"] more precise?
A key with a slash (or dot?) inside is not common at all; I've never seen one in JSON data. In many languages there are libraries to bind JSON to structs or objects, so usually people don't use strange characters inside keys. If needed, you can still use the good old method to read a single field. value["path"]["to"]["object"] was my first choice, but I didn't like it. First, it creates a lot of temporary objects. Second, it is easier to implement using a single string (also on assignment). I gave value["path", "to", "index"] a try, but it's not comfortable if you need to generate your path from code.
 Moreover in my library I actually have three different methods 
 to read a
 value:

 T get(T)() // Exception if value is not a T or not valid or 
 value
 doesn't exist
 T to(T)()  // Try to convert value to T using to!string. 
 Exception if
 doesn't exists or not valid

 BsonField!T as(T)(lazy T default = T.init)  // Always return a 
 value

 BsonField!T is an "alias this"-ed struct with two fields: T 
 value and
 bool error(). T value is the aliased field, and error() tells 
 you if
 value is defaulted (because of an error: field not exists or 
 can't
 convert to T)

 So I can write something like this:

 int myvalue = json["/that/deep/property"].as!int;

 or

 auto myvalue = json["/that/deep/property"].as!int(10);

 if (myvalue.error) writeln("Property doesn't exists, I'm using 
 default
 value);

 writeln("Property value: ", myvalue);

 I hope this can be useful...
Sure is, thanks. Listen, would you want to volunteer a std.data.json proposal?
What does it mean? :)
 Andrei
Aug 05 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/5/14, 2:08 AM, Andrea Fontana wrote:
 Sure is, thanks. Listen, would you want to volunteer a std.data.json
 proposal?
What does it mean? :)
On one side enters vibe.data.json with the deltas prompted by std.jgrandson plus your talent and determination, and on the other side comes std.data.json with code and documentation that passes the Phobos review process. -- Andrei
Aug 05 2014
"Jacob Carlborg" <doob me.com> writes:
On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu 
wrote:
 We need a better json library at Facebook. I'd discussed with 
 Sönke the possibility of taking vibe.d's json to std but he 
 said it needs some more work. So I took std.jgrandson to proof 
 of concept state and hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html
* Could you please put it on Github to get syntax highlighting and all the other advantages?
* It doesn't completely follow the Phobos naming conventions
* The indentation is off in some places
* The unit tests are a bit lacking for the separate parsing functions
* There are methods for getting the strings and numbers, what about booleans?
* Shouldn't it be called TokenRange?
* Shouldn't this be built using the lexer generator you so strongly have been pushing for?
* The unit tests for TokenStream are very dense. I would prefer empty newlines for grouping "assert" and calls to "popFront" belonging together

-- /Jacob Carlborg
Aug 04 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/4/14, 12:56 AM, Jacob Carlborg wrote:
 On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu wrote:
 We need a better json library at Facebook. I'd discussed with Sönke
 the possibility of taking vibe.d's json to std but he said it needs
 some more work. So I took std.jgrandson to proof of concept state and
 hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html
Thanks for your comments! A few responses within:
 * Could you please put it on Github to get syntax highlighting and all
 the other advantages
Quick workaround: http://dpaste.dzfl.pl/65f4dcc36ab8
 * It doesn't completely follow the Phobos naming conventions
What would be the places?
 * The indentation is off in some places
Xamarin/Mono-D is at fault here :o).
 * The unit tests is a bit lacking for the separate parsing functions
Agreed.
 * There are methods for getting the strings and numbers, what about
 booleans?
You mean for Token? Good point. Numbers and strings are somewhat special because they have a payload associated. In contrast Booleans are represented by two distinct tokens. Would be good to add a convenience method.
 * Shouldn't it be called TokenRange?
Yah.
 * Shouldn't this be built using the lexer generator you so strongly have
 been pushing for?
Of course, and in the beginning I've actually pasted some code from it. Then I regressed to minimizing dependencies.
 * The unit tests for TokenStream is very dense. I would prefer empty
 newlines for grouping "assert" and calls to "popFront" belonging together
De gustibus et the coloribus non est disputandum :o). Andrei
Aug 04 2014
Jacob Carlborg <doob me.com> writes:
On 2014-08-04 18:55, Andrei Alexandrescu wrote:

 What would be the places?
That's why it's easier with Github ;) I can comment directly on a line. I just had a quick look, but "_true", "_false" and "_null" in Token.Kind stand out. If I recall correctly, we add an underscore as a suffix for symbols with the same name as keywords.
 You mean for Token? Good point.
Yes, in Token.
 Numbers and strings are somewhat special
 because they have a payload associated. In contrast Booleans are
 represented by two distinct tokens. Would be good to add a convenience
 method.
Right.
 De gustibus et the coloribus non est disputandum :o).
Please avoid these Latin sentences, I have no idea what they mean. This is an international community, please don't make it more complicated than it already is with language barriers. -- /Jacob Carlborg
Aug 04 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/4/14, 11:46 AM, Jacob Carlborg wrote:
 De gustibus et the coloribus non est disputandum :o).
Please avoid these Latin sentences, I have no idea what they mean. This is an international community, please don't make it more complicated than it already is with language barriers.
"Favorite foods and colors are not to be disputed." 51,300 results on google... and please let's end this before it becomes another Epic Debate. -- Andrei
Aug 04 2014