
digitalmars.D - std.jgrandson

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
We need a better JSON library at Facebook. I'd discussed with Sönke the 
possibility of taking vibe.d's json to std, but he said it needs some 
more work. So I took std.jgrandson to proof-of-concept state, and hence 
it's ready for destruction:

http://erdani.com/d/jgrandson.d
http://erdani.com/d/phobos-prerelease/std_jgrandson.html

Here are a few differences compared to vibe.d's library. I think these 
are desirable to have in that library as well:

* Parsing strings is decoupled into tokenization (which is lazy and only 
needs an input range) and parsing proper. Tokenization is lazy, which 
allows users to create their own advanced (e.g. partial/lazy) parsing if 
needed. The parser itself is eager.

* There's no decoding of strings.

* The representation is built on Algebraic, with the advantages that it 
benefits from all of its primitives. Implementation is also very compact 
because Algebraic obviates a bunch of boilerplate. Subsequent 
improvements to Algebraic will also reflect themselves into improvements 
to std.jgrandson.

* The JSON value (called std.jgrandson.Value) has no named member 
variables or methods except for __payload. This is so there's no clash 
between dynamic properties exposed via opDispatch.
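A hypothetical usage sketch of the tokenize/parse split described above; the names tokenize, parse and the token type are guesses based on this post, not the module's actual API:

```d
// Hypothetical sketch only: tokenize/parse are assumed names.
import std.range : take;
import std.stdio : writeln;

void main()
{
    auto text = `{"a": [1, 2], "b": "hi"}`;

    // Tokenization is lazy and works on any input range: pulling only
    // the first few tokens never touches the rest of the input.
    auto tokens = tokenize(text);
    foreach (tok; tokens.take(3))
        writeln(tok);

    // Parsing proper is eager: it drains the token stream and builds
    // the Algebraic-backed Value.
    auto value = parse(tokenize(text));
}
```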

Well that's about it. What would it take for this to become a Phobos 
proposal? Destroy.


Andrei
Aug 03 2014
next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Sun, 03 Aug 2014 00:16:04 -0700
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 We need a better json library at Facebook. I'd discussed with Sönke
 the possibility of taking vibe.d's json to std but he said it needs
 some more work. So I took std.jgrandson to proof of concept state and
 hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html

 Here are a few differences compared to vibe.d's library. I think
 these are desirable to have in that library as well:

 * Parsing strings is decoupled into tokenization (which is lazy and
 only needs an input range) and parsing proper. Tokenization is lazy,
 which allows users to create their own advanced (e.g. partial/lazy)
 parsing if needed. The parser itself is eager.

 * There's no decoding of strings.

 * The representation is built on Algebraic, with the advantages that
 it benefits from all of its primitives. Implementation is also very
 compact because Algebraic obviates a bunch of boilerplate. Subsequent
 improvements to Algebraic will also reflect themselves into
 improvements to std.jgrandson.

 * The JSON value (called std.jgrandson.Value) has no named member
 variables or methods except for __payload. This is so there's no
 clash between dynamic properties exposed via opDispatch.

 Well that's about it. What would it take for this to become a Phobos
 proposal? Destroy.


 Andrei
API looks great but I'd like to see some simple serialize/deserialize 
functions as in vibe.d:
http://vibed.org/api/vibe.data.json/deserializeJson
http://vibed.org/api/vibe.data.json/serializeToJson

vibe.d uses UDAs to customize the serialization output. That's actually 
not json specific and therefore shouldn't be part of this module. But a 
simple deserializeJson which simply fills in all fields of a struct 
given a TokenStream is very useful and can be done without allocations 
(so it's much faster than going through the DOM).

Nitpicks:

* I'd make Token only store strings, then convert to double/number only 
when requested. If a user is simply skipping some tokens these 
conversions are unnecessary overhead.

* parseString really shouldn't use appender. Make it somehow possible 
to supply a buffer to TokenStream and use that. (This way there's no 
memory allocation. If a user wants to keep the string he has to .dup 
it.) A BufferedRange concept might even be better, because you can read 
in blocks and reuse buffers.
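A hedged sketch of the kind of deserializeJson suggested here: fill in a struct's fields straight from a token stream, with no DOM in between. The TokenStream interface used below (front/popFront, .kind, .text, TokenKind) is an assumption for illustration, not the actual jgrandson API:

```d
// Sketch under assumed token-stream API; TokenKind is hypothetical.
import std.conv : to;

struct Point { int x; int y; }

T deserializeJson(T, TokenStream)(ref TokenStream tokens)
    if (is(T == struct))
{
    T result;
    tokens.popFront();                          // skip the '{' token
    while (tokens.front.kind != TokenKind.objectEnd)
    {
        auto key = tokens.front.text;           // field-name token
        tokens.popFront();
        foreach (i, ref field; result.tupleof)  // compile-time field loop
        {
            if (key == __traits(identifier, T.tupleof[i]))
                field = tokens.front.text.to!(typeof(field));
        }
        tokens.popFront();                      // consume the value token
    }
    tokens.popFront();                          // skip the '}' token
    return result;
}
```

Because the field loop is unrolled at compile time and values are converted in place, no intermediate DOM nodes are allocated.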
Aug 03 2014
next sibling parent reply "ponce" <contact gam3sfrommars.fr> writes:
 API looks great but I'd like to see some simple 
 serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson

 vibe uses UDAs to customize the serialization output. That's 
 actually
 not json specific and therefore shouldn't be part of this 
 module. But a
 simple deserializeJson which simply fills in all fields of a 
 struct
 given a TokenStream is very useful and can be done without 
 allocations
 (so it's much faster than going through the DOM).
That's what https://github.com/Orvid/JSONSerialization does.

Also msgpack-d https://github.com/msgpack/msgpack-d whose defaults need 
no UDAs. That makes the typical use case very fast to write.
Aug 03 2014
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 03.08.2014 10:25, schrieb ponce:
 API looks great but I'd like to see some simple serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson

 vibe uses UDAs to customize the serialization output. That's actually
 not json specific and therefore shouldn't be part of this module. But a
 simple deserializeJson which simply fills in all fields of a struct
 given a TokenStream is very useful and can be done without allocations
 (so it's much faster than going through the DOM).
 That's what https://github.com/Orvid/JSONSerialization does.

 Also msgpack-d https://github.com/msgpack/msgpack-d whose defaults
 need no UDAs. That makes the typical use case very fast to write.
The default mode for vibe.data.serialization also doesn't need any UDAs, but it's still often useful to be able to make customizations.
Aug 03 2014
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 1:02 AM, Johannes Pfau wrote:
 API looks great but I'd like to see some simple serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson
Agreed.
 vibe uses UDAs to customize the serialization output. That's actually
 not json specific and therefore shouldn't be part of this module. But a
 simple deserializeJson which simply fills in all fields of a struct
 given a TokenStream is very useful and can be done without allocations
 (so it's much faster than going through the DOM).
Nice.
 Nitpicks:

 * I'd make Token only store strings, then convert to double/number only
    when requested. If a user is simply skipping some tokens these
    conversions are unnecessary overhead.
Well... this is tricky. If the input has immutable characters, they can 
be stored because it can be assumed they'll live forever. If they're 
mutable or const, that assumption doesn't hold, so every number must 
allocate. At that point it's probably cheaper to just convert to double.

One thing: I didn't treat integers specially, but I did notice some 
json parsers do make that distinction.
 * parseString really shouldn't use appender. Make it somehow possible
    to supply a buffer to TokenStream and use that. (This way there's no
    memory allocation. If a user want to keep the string he has to .dup
    it). A BufferedRange concept might even be better, because you can
    read in blocks and reuse buffers.
Good suggestion, thanks. Andrei
Aug 03 2014
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Sunday, 3 August 2014 at 08:04:40 UTC, Johannes Pfau wrote:
 API looks great but I'd like to see some simple 
 serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson
Before going this route one needs to have a good vision of how it may 
interact with an imaginary std.serialization to avoid later deprecation.

At the same time I have recently started to think that a dedicated 
serialization module that decouples aggregate iteration from data 
storage format is in most cases impractical for performance reasons - 
different serialization methods imply very different efficient 
iteration strategies. Probably it is better to define serialization 
compile-time traits instead and require each `std.data.*` provider to 
implement those on its own in the most effective fashion.
Aug 03 2014
next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 03.08.2014 20:44, schrieb Dicebot:
 On Sunday, 3 August 2014 at 08:04:40 UTC, Johannes Pfau wrote:
 API looks great but I'd like to see some simple serialize/deserialize
 functions as in vibed:
 http://vibed.org/api/vibe.data.json/deserializeJson
 http://vibed.org/api/vibe.data.json/serializeToJson
 Before going this route one needs to have a good vision how it may
 interact with imaginary std.serialization to avoid later deprecation.

 At the same time I have recently started to think that dedicated
 serialization module that decouples aggregate iteration from data
 storage format is in most cases impractical for performance reasons -
 different serialization methods imply very different efficient
 iteration strategies. Probably it is better to define serialization
 compile-time traits instead and require each `std.data.*` provider to
 implement those on its own in the most effective fashion.
Do you have a specific case in mind where the data format doesn't fit 
the process used by vibe.data.serialization? The data format iteration 
part *is* abstracted away there in basically a kind of traits structure 
(the "Serializer").

When serializing, the data always gets written in the order defined by 
the input value, while during deserialization the serializer defines 
how aggregates are iterated. This seems to fit all of the data formats 
that I had in mind.
Aug 03 2014
parent "Dicebot" <public dicebot.lv> writes:
On Sunday, 3 August 2014 at 19:36:43 UTC, Sönke Ludwig wrote:
 Do you have a specific case in mind where the data format 
 doesn't fit the process used by vibe.data.serialization? The 
 data format iteration part *is* abstracted away there in 
 basically a kind of traits structure (the "Serializer"). When 
 serializing, the data always gets written in the order defined 
 by the input value, while during deserialization the serializer 
 defines how aggregates are iterated. This seems to fit all of 
 the data formats that I had in mind.
For example, we use a special binary serialization format for structs 
where the serialized content is actually a valid D struct - after 
updating internal array pointers one can simply do `cast(S*) 
buffer.ptr` and work with it normally. Doing this efficiently requires 
breadth-first traversal and keeping track of one upper level to update 
the pointers. This does not fit very well with the classical 
depth-first recursive traversal usually required by JSON-structure 
formats.
Aug 03 2014
prev sibling parent reply "Jacob Carlborg" <doob me.com> writes:
On Sunday, 3 August 2014 at 18:44:37 UTC, Dicebot wrote:

 Before going this route one needs to have a good vision how it 
 may interact with imaginary std.serialization to avoid later 
 deprecation.
I suggest only providing functions for serializing primitive types. A 
separate serialization module/package with a JSON archive type would 
use this module as its backend.
 At the same time I have recently started to think that 
 dedicated serialization module that decouples aggregate 
 iteration from data storage format is in most cases impractical 
 for performance reasons - different serialization methods imply 
 very different efficient iteration strategies. Probably it is 
 better to define serialization compile-time traits instead and 
 require each `std.data.*` provider to implement those on its 
 own in the most effective fashion.
I'm not sure I agree with that. In my work on std.serialization I have 
not seen this to be a problem. What problems have you found?

--
/Jacob Carlborg
Aug 04 2014
next sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message 
news:bjecckhwlmkwkeqegwqa forum.dlang.org...

 I suggest only provide functions for serializing primitive types.
This is exactly what I need in most projects. Basic types, arrays, AAs, and structs are usually enough.
Aug 04 2014
parent reply "Jacob Carlborg" <doob me.com> writes:
On Monday, 4 August 2014 at 09:10:46 UTC, Daniel Murphy wrote:

 This is exactly what I need in most projects.  Basic types, 
 arrays, AAs, and structs are usually enough.
I was thinking more of only types that cannot be broken down into 
smaller pieces, i.e. integer, floating point, bool and string. The 
serializer would break down the other types into smaller pieces.

--
/Jacob Carlborg
Aug 04 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message 
news:kvuaxyxjwmpqrorlozrz forum.dlang.org...

 This is exactly what I need in most projects.  Basic types, arrays, AAs, 
 and structs are usually enough.
I was more thinking only types that cannot be broken down in to smaller pieces, i.e. integer, floating point, bool and string. The serializer would break down the other types in to smaller pieces.
I guess I meant types that have an obvious mapping to json types.

int/long -> json integer
bool -> json bool
string -> json string
float/real -> json float (close enough)
T[] -> json array
T[string] -> json object
struct -> json object

This is usually enough for config and data files. Being able to do 
this is just awesome:

struct AppConfig
{
    string somePath;
    bool someOption;
    string[] someList;
    string[string] someMap;
}

void main()
{
    auto config = "config.json".readText().parseJSON().fromJson!AppConfig();
}

Being able to serialize whole graphs into json is something I need 
much less often.
Aug 05 2014
next sibling parent reply "Andrea Fontana" <nospam example.com> writes:
On Tuesday, 5 August 2014 at 12:40:25 UTC, Daniel Murphy wrote:
 "Jacob Carlborg"  wrote in message 
 news:kvuaxyxjwmpqrorlozrz forum.dlang.org...

 This is exactly what I need in most projects.  Basic types, 
 arrays, AAs, and structs are usually enough.
I was more thinking only types that cannot be broken down in to smaller pieces, i.e. integer, floating point, bool and string. The serializer would break down the other types in to smaller pieces.
 I guess I meant types that have an obvious mapping to json types.

 int/long -> json integer
 bool -> json bool
 string -> json string
 float/real -> json float (close enough)
 T[] -> json array
 T[string] -> json object
 struct -> json object

 This is usually enough for config and data files. Being able to do
 this is just awesome:

 struct AppConfig
 {
     string somePath;
     bool someOption;
     string[] someList;
     string[string] someMap;
 }

 void main()
 {
     auto config = "config.json".readText().parseJSON().fromJson!AppConfig();
 }

 Being able to serialize whole graphs into json is something I need
 much less often.
If I'm right, json has just one numeric type. No difference between 
integers / floats and no limits. So probably the mapping is:

float/double/real/int/long => number
Aug 05 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Andrea Fontana"  wrote in message 
news:takluoqmlmmooxlovqya forum.dlang.org...

 If I'm right, json has just one numeric type. No difference between 
 integers / float and no limits.

 So probably the mapping is:

 float/double/real/int/long => number
Maybe, but std.json has three numeric types.
Aug 05 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/5/14, 8:23 AM, Daniel Murphy wrote:
 "Andrea Fontana"  wrote in message
 news:takluoqmlmmooxlovqya forum.dlang.org...

 If I'm right, json has just one numeric type. No difference between
 integers / float and no limits.

 So probably the mapping is:

 float/double/real/int/long => number
Maybe, but std.json has three numeric types.
I searched around a bit and it seems different libraries have different 
takes on this numeric matter. A simple reading of the spec suggests 
that floating point data is the only numeric type. However, many 
implementations choose to distinguish between floating point and 
integrals.

Andrei
Aug 05 2014
next sibling parent "Dicebot" <public dicebot.lv> writes:
On Tuesday, 5 August 2014 at 17:17:56 UTC, Andrei Alexandrescu 
wrote:
 On 8/5/14, 8:23 AM, Daniel Murphy wrote:
 "Andrea Fontana"  wrote in message
 news:takluoqmlmmooxlovqya forum.dlang.org...

 If I'm right, json has just one numeric type. No difference 
 between
 integers / float and no limits.

 So probably the mapping is:

 float/double/real/int/long => number
Maybe, but std.json has three numeric types.
I searched around a bit and it seems different libraries have different takes to this numeric matter. A simple reading of the spec suggests that floating point data is the only numeric type. However, many implementations choose to distinguish between floating point and integrals.
There is a certain benefit in using the same primitive types for JSON 
as the ones defined by the BSON spec.
Aug 05 2014
prev sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Tuesday, 5 August 2014 at 17:17:56 UTC, Andrei Alexandrescu
wrote:
 I searched around a bit and it seems different libraries have 
 different takes to this numeric matter. A simple reading of the 
 spec suggests that floating point data is the only numeric 
 type. However, many implementations choose to distinguish 
 between floating point and integrals.
The original point of JSON was that it auto-converts to Javascript 
data. And since Javascript only has one numeric type, of course JSON 
does too. But I think it's important that a JSON package for a language 
maps naturally to the types available in that language. D provides both 
floating point and integer types, each with their own costs and 
benefits, and so the JSON package should as well. It ends up being a 
lot easier to deal with than remembering to round from JSON.number or 
whatever when assigning to an int.

In fact, JSON doesn't even impose any precision restrictions on its 
numeric type, so one could argue that we should be using BigInt and 
BigFloat. But this would stink most of the time, so...

On an unrelated note, while the default encoding for strings is UTF-8, 
the RFC absolutely allows for UTF-16 surrogate pairs, and this must be 
supported. Any strings you get from Internet Explorer will be encoded 
as UTF-16 surrogate pairs regardless of content, presumably since 
Windows uses 16 bit wide chars for unicode.
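A sketch of the surrogate-pair handling mentioned here: JSON's \uXXXX escapes are UTF-16 code units, so code points beyond U+FFFF arrive as two consecutive escapes that must be combined. The helper name is mine, not part of any existing module:

```d
// Combine a UTF-16 surrogate pair (e.g. from two \uXXXX escapes in a
// JSON string) into the code point it encodes.
dchar combineSurrogates(wchar hi, wchar lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF, "not a high surrogate");
    assert(lo >= 0xDC00 && lo <= 0xDFFF, "not a low surrogate");
    return 0x10000 + ((dchar(hi) - 0xD800) << 10) + (dchar(lo) - 0xDC00);
}

unittest
{
    // "\uD83D\uDE00" in a JSON string is U+1F600 (grinning face).
    assert(combineSurrogates(0xD83D, 0xDE00) == 0x1F600);
}
```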
Aug 05 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/5/14, 10:48 AM, Sean Kelly wrote:
 On Tuesday, 5 August 2014 at 17:17:56 UTC, Andrei Alexandrescu
 wrote:
 I searched around a bit and it seems different libraries have
 different takes to this numeric matter. A simple reading of the spec
 suggests that floating point data is the only numeric type. However,
 many implementations choose to distinguish between floating point and
 integrals.
 The original point of JSON was that it auto-converts to Javascript
 data. And since Javascript only has one numeric type, of course JSON
 does too. But I think it's important that a JSON package for a
 language maps naturally to the types available in that language. D
 provides both floating point and integer types, each with their own
 costs and benefits, and so the JSON package should as well. It ends
 up being a lot easier to deal with than remembering to round from
 JSON.number or whatever when assigning to an int.

 In fact, JSON doesn't even impose any precision restrictions on its
 numeric type, so one could argue that we should be using BigInt and
 BigFloat. But this would stink most of the time, so...

 On an unrelated note, while the default encoding for strings is
 UTF-8, the RFC absolutely allows for UTF-16 surrogate pairs, and this
 must be supported. Any strings you get from Internet Explorer will be
 encoded as UTF-16 surrogate pairs regardless of content, presumably
 since Windows uses 16 bit wide chars for unicode.
All good points. Proceed with implementation! :o) -- Andrei
Aug 05 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Tuesday, 5 August 2014 at 17:58:08 UTC, Andrei Alexandrescu 
wrote:
 All good points. Proceed with implementation! :o) -- Andrei
Any news about std.allocator ? ;)
Aug 05 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/5/14, 10:58 AM, Dicebot wrote:
 On Tuesday, 5 August 2014 at 17:58:08 UTC, Andrei Alexandrescu wrote:
 All good points. Proceed with implementation! :o) -- Andrei
Any news about std.allocator ? ;)
It looks like I need to go all out and write a garbage collector, design and implementation and all. Andrei
Aug 05 2014
parent "Marc Schütz" <schuetzm gmx.net> writes:
On Tuesday, 5 August 2014 at 18:12:54 UTC, Andrei Alexandrescu 
wrote:
 On 8/5/14, 10:58 AM, Dicebot wrote:
 On Tuesday, 5 August 2014 at 17:58:08 UTC, Andrei Alexandrescu 
 wrote:
 All good points. Proceed with implementation! :o) -- Andrei
Any news about std.allocator ? ;)
It looks like I need to go all out and write a garbage collector, design and implementation and all.
A few months ago, you posted a video of a talk where you presented code from a garbage collector (it used templated mark functions to get precise tracing). I remember you said that this code was in use somewhere (I guess at FB?). Can this be used as a basis?
Aug 05 2014
prev sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Aug 05, 2014 at 10:58:08AM -0700, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 8/5/14, 10:48 AM, Sean Kelly wrote:
[...]
The original point of JSON was that it auto-converts to
Javascript data.  And since Javascript only has one numeric type,
of course JSON does too.  But I think it's important that a JSON
package for a language maps naturally to the types available in
that language.  D provides both floating point and integer types,
each with their own costs and benefits, and so the JSON package
should as well.  It ends up being a lot easier to deal with than
remembering to round from JSON.number or whatever when assigning
to an int.

In fact, JSON doesn't even impose any precision restrictions on
its numeric type, so one could argue that we should be using
BigInt and BigFloat.  But this would stink most of the time, so...
Would it make sense to wrap a JSON number in an opaque type that implicitly casts to the target built-in type?
On an unrelated note, while the default encoding for strings is
UTF-8, the RFC absolutely allows for UTF-16 surrogate pairs, and
this must be supported.  Any strings you get from Internet
Explorer will be encoded as UTF-16 surrogate pairs regardless of
content, presumably since Windows uses 16 bit wide chars for
unicode.
[...]

Wait, I thought surrogate pairs only apply to characters past U+FFFF? 
Is it even possible to encode BMP characters with surrogate pairs?? Or 
do you mean UTF-16?

T

--
Music critic: "That's an imitation fugue!"
Aug 05 2014
parent "Andrea Fontana" <nospam example.com> writes:
On Tuesday, 5 August 2014 at 18:11:21 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 On Tue, Aug 05, 2014 at 10:58:08AM -0700, Andrei Alexandrescu 
 via Digitalmars-d wrote:
 On 8/5/14, 10:48 AM, Sean Kelly wrote:
[...]
 Would it make sense to wrap a JSON number in an opaque type that
 implicitly casts to the target built-in type?
IMO we should store the original json number value as a string and then 
try to convert to whatever the user asks for. As said, it could be a 
big int, or a big floating point value without any limit.
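A sketch of the wrapper being discussed in this subthread: keep the original number token as text and convert only when a concrete type is requested. JsonNumber is a made-up name; conversion relies on Phobos' std.conv and std.bigint:

```d
// Hypothetical lazy number type; no parsing until the caller asks.
import std.bigint : BigInt;
import std.conv : to;

struct JsonNumber
{
    string text;  // untouched number text from the input

    // Convert on demand; the caller picks int, long, double, BigInt...
    T get(T)() const { return text.to!T; }
}

unittest
{
    auto big = JsonNumber("12345678901234567890123456789");
    assert(big.get!BigInt == BigInt("12345678901234567890123456789"));

    auto small = JsonNumber("42");
    assert(small.get!int == 42);
    assert(small.get!double == 42.0);
}
```

This way skipping a token costs nothing, and no precision is committed to until the user chooses a target type.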
Aug 06 2014
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-08-05 14:40, Daniel Murphy wrote:

 I guess I meant types that have an obvious mapping to json types.

 int/long -> json integer
 bool -> json bool
 string -> json string
 float/real -> json float (close enough)
 T[] -> json array
 T[string] -> json object
 struct -> json object

 This is usually enough for config and data files.  Being able to do this
 is just awesome:

 struct AppConfig
 {
     string somePath;
     bool someOption;
     string[] someList;
     string[string] someMap;
 }

 void main()
 {
     auto config =
 "config.json".readText().parseJSON().fromJson!AppConfig();
 }
I'm not saying that is a bad idea or that I don't want to be able to do 
this. I just prefer this to be handled by a generic serialization 
module. Which can of course handle the simple cases, like above, as 
well.

--
/Jacob Carlborg
Aug 05 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message news:lrqvfa$2has$1 digitalmars.com...

 I'm not saying that is a bad idea or that I don't want to be able to do 
 this. I just prefer this to be handled by a generic serialization module. 
 Which can of course handle the simple cases, like above, as well.
I know, but I don't really care if it's part of a generic serialization library or not. I just want it there. Chances are tying it to a future generic serialization library is going to make it take longer.
Aug 05 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-08-05 18:42, Daniel Murphy wrote:

 Chances are tying it to a future
 generic serialization library is going to make it take longer.
Yeah, that's the problem. But where do you draw the line? Should arrays 
of structs be supported?

--
/Jacob Carlborg
Aug 06 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message news:lrsrek$19mf$1 digitalmars.com...

 Chances are tying it to a future
 generic serialization library is going to make it take longer.
Yeah, that's the problem. But where do you draw the line. Should arrays of structs be supported?
Yes. Allow T, where T is any of:

int, float, long, etc
bool
struct { T... }
T[string]
T[]

Sure, you _can_ make a struct containing an array that contains itself, 
but you probably won't.
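The allowed-type list above could be expressed as a recursive compile-time check in D. A sketch, assuming Phobos' std.traits and std.meta; isJsonConvertible is my invention, not an existing symbol:

```d
// Compile-time predicate mirroring the list: leaf types, string-keyed
// AAs, dynamic arrays, and structs whose fields all qualify.
import std.meta : allSatisfy;
import std.traits : Fields, isAssociativeArray, isDynamicArray,
    isNumeric, isSomeString, KeyType, ValueType;

template isJsonConvertible(T)
{
    static if (is(T == bool) || isNumeric!T || isSomeString!T)
        enum isJsonConvertible = true;                        // leaf types
    else static if (isAssociativeArray!T && isSomeString!(KeyType!T))
        enum isJsonConvertible = isJsonConvertible!(ValueType!T);
    else static if (isDynamicArray!T)
        enum isJsonConvertible = isJsonConvertible!(typeof(T.init[0]));
    else static if (is(T == struct))
        enum isJsonConvertible = allSatisfy!(.isJsonConvertible, Fields!T);
    else
        enum isJsonConvertible = false;                       // everything else
}

unittest
{
    struct AppConfig
    {
        string somePath;
        bool someOption;
        string[] someList;
        string[string] someMap;
    }
    static assert(isJsonConvertible!AppConfig);
    static assert(!isJsonConvertible!(void*));
}
```

A fromJson!T could then constrain its T with this predicate and give a clear error for unsupported types.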
Aug 06 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-08-06 13:36, Daniel Murphy wrote:

 Yes.  Allow T, where T is any of

 int, float, long, etc
 bool
 struct { T... }
 T[string]
 T[]
BTW, why not classes? It's basically the same implementation as for 
structs.

--
/Jacob Carlborg
Aug 06 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jacob Carlborg"  wrote in message news:lrtf8l$22d3$1 digitalmars.com...

 BTW, why not classes? It's basically the same implementation as for 
 structs.
I guess I've just never needed to do it with classes. A lot of the time 
when I use classes I use inheritance, and this simple translation 
doesn't work out so well then...
Aug 06 2014
parent "Sean Kelly" <sean invisibleduck.org> writes:
On Wednesday, 6 August 2014 at 15:28:06 UTC, Daniel Murphy wrote:
 "Jacob Carlborg"  wrote in message 
 news:lrtf8l$22d3$1 digitalmars.com...

 BTW, why not classes? It's basically the same implementation 
 as for structs.
I guess I've just never needed to do it with classes. A lot of the time when I use classes I use inheritance, and this simple translation doesn't work out so will then...
We could do something like Jackson. I wouldn't want it as the primary interface for a JSON package, but for serializing classes it's a pretty easy design to work with from a user perspective.
Aug 06 2014
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Monday, 4 August 2014 at 07:34:19 UTC, Jacob Carlborg wrote:
 On Sunday, 3 August 2014 at 18:44:37 UTC, Dicebot wrote:

 Before going this route one needs to have a good vision how it 
 may interact with imaginary std.serialization to avoid later 
 deprecation.
I suggest only providing functions for serializing primitive types. A 
separate serialization module/package with a JSON archive type would 
use this module as its backend.
Do you consider structs primitive types? This is probably the #1 use 
case for JSON conversion.
 At the same time I have recently started to think that 
 dedicated serialization module that decouples aggregate 
 iteration from data storage format is in most cases 
 impractical for performance reasons - different serialization 
 methods imply very different efficient iteration strategies. 
 Probably it is better to define serialization compile-time 
 traits instead and require each `std.data.*` provider to 
 implement those on its own in the most effective fashion.
I'm not sure I agree with that. In my work on std.serialization I have not seen this to be a problem. What problems have you found?
http://forum.dlang.org/post/mzweposldwqdtmqoltiy forum.dlang.org
Aug 04 2014
parent reply "Jacob Carlborg" <doob me.com> writes:
On Monday, 4 August 2014 at 14:02:22 UTC, Dicebot wrote:

 Do you consider structs primitive types? This is probably #1 
 use case for JSON conversion.
No, only types that cannot be broken down into smaller pieces, i.e. 
integrals, floating points, bools and strings.
 http://forum.dlang.org/post/mzweposldwqdtmqoltiy forum.dlang.org
I don't understand exactly how that binary serialization works. I think 
I would need a code example.

--
/Jacob Carlborg
Aug 04 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Monday, 4 August 2014 at 14:18:41 UTC, Jacob Carlborg wrote:
 On Monday, 4 August 2014 at 14:02:22 UTC, Dicebot wrote:

 Do you consider structs primitive types? This is probably #1 
 use case for JSON conversion.
No, only types that cannot be broken down in to smaller pieces, i.e. integral, floating points, bool and strings.
That is exactly the problem - if `structToJson` won't be provided, 
complaints are inevitable; it is too basic a feature to wait for 
std.serialization :(
 http://forum.dlang.org/post/mzweposldwqdtmqoltiy forum.dlang.org
I don't understand exactly how that binary serialization works. I think I would need a code example.
Simplified serialization algorithm:

1) write (cast(void*) &struct)[0..struct.sizeof] to target buffer
2) write any of array content to the same buffer after the struct
3.1) if array contains structs, recursion
3.2) go back to buffer[0..struct.sizeof] slice and update array fields 
to store an index in the same buffer instead of actual ptr

Simplified deserialization algorithm:

1) recursively traverse the struct and replace array index offsets with 
real slices to the buffer

(I don't want to bother with getting copyright permissions to publish 
actual code)

I am pretty sure that this is not the only optimized serialization 
approach out there that does not fit in a content-insensitive 
primitive-based traversal scheme. And we want Phobos stuff to be 
blazingly fast, which can lead to a situation where a new data module 
will circumvent the std.serialization API to get more performance.
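The steps above can be sketched for a single non-nested struct (this is my reconstruction from the description, not the actual proprietary code; it handles only one array field and no recursion):

```d
// Sketch of the offset-fixup scheme: raw struct bytes first, array
// content appended after, slice pointer rewritten as a buffer offset.
struct S { int id; int[] data; }

ubyte[] serialize(ref S s)
{
    ubyte[] buf;
    buf ~= (cast(ubyte*) &s)[0 .. S.sizeof];   // 1) raw struct bytes
    size_t off = buf.length;
    buf ~= cast(ubyte[]) s.data;               // 2) array content after it
    auto copy = cast(S*) buf.ptr;              // 3.2) patch the slice so its
    copy.data = (cast(int*) off)[0 .. s.data.length]; // ptr holds an offset
    return buf;
}

S* deserialize(ubyte[] buf)
{
    auto s = cast(S*) buf.ptr;
    // 1) turn the stored offset back into a real slice into buf
    auto off = cast(size_t) s.data.ptr;
    s.data = (cast(int*) (buf.ptr + off))[0 .. s.data.length];
    return s;
}

unittest
{
    auto s = S(7, [1, 2, 3]);
    auto buf = serialize(s);
    auto t = deserialize(buf);
    assert(t.id == 7 && t.data == [1, 2, 3]);
}
```

After the fixup, the buffer really is a usable S, which is what makes the `cast(S*) buffer.ptr` trick work — and why it clashes with depth-first recursive traversal.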
Aug 04 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-08-04 16:55, Dicebot wrote:

 That is exactly the problem - if `structToJson` won't be provided,
 complaints are inevitable, it is too basic feature to wait for
 std.serialization :(
Hmm, yeah, that's a problem.
 Simplified serialization algorithm:

 1) write (cast(void*) &struct)[0..struct.sizeof] to target buffer
 2) write any of array content to the same buffer after the struct
 3.1) if array contains structs, recursion
 3.2) go back to buffer[0..struct.sizeof] slice and update array fields
 to store an index in the same buffer instead of actual ptr

 Simplified deserialization algorithm:

 1) recursively traverse the struct and replace array index offsets with
 real slices to the buffer
I see. I need to think a bit about this.
 (I don't want to bother with getting copyright permissions to publish
 actual code)
Fair enough. The above was quite descriptive.
 I am pretty sure that this is not the only optimized serialization
 approach out there that does not fit in a content-insensitive
 primitive-based traversal scheme. And we want Phobos stuff to be
 blazingly fast, which can lead to a situation where a new data module will
 circumvent the std.serialization API to get more performance.
I don't like the idea of having to reimplement serialization for each data type that can be generalized. -- /Jacob Carlborg
Aug 04 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 04.08.2014 20:38, Jacob Carlborg wrote:
 On 2014-08-04 16:55, Dicebot wrote:

 That is exactly the problem - if `structToJson` won't be provided,
 complaints are inevitable, it is too basic a feature to wait for
 std.serialization :(
Hmm, yeah, that's a problem.
On the other hand, a simplistic solution will inevitably result in people needing more. And when at some point a serialization module is in Phobos, there will be duplicate functionality in the library.
 I am pretty sure that this is not the only optimized serialization
 approach out there that does not fit in a content-insensitive
 primitive-based traversal scheme. And we want Phobos stuff to be
 blazingly fast, which can lead to a situation where a new data module will
 circumvent the std.serialization API to get more performance.
I don't like the idea of having to reimplement serialization for each data type that can be generalized.
I think we could also simply keep the generic default recursive descent behavior, but allow serializers to customize the process using some kind of trait. This could even be added later in a backwards compatible fashion if necessary.

BTW, how is the progress for Orange w.r.t. the conversion to a more template+allocation-less approach? Is a new std proposal within the next DMD release cycle realistic?

I quite like most of how vibe.data.serialization turned out, but it can't do any alias detection/deduplication (and I have no concrete plans to add support for that), which is why I currently wouldn't consider it as a potential Phobos candidate.
Aug 05 2014
next sibling parent "Dicebot" <public dicebot.lv> writes:
On Tuesday, 5 August 2014 at 09:54:42 UTC, Sönke Ludwig wrote:
 I think we could also simply keep the generic default recursive 
 descent behavior, but allow serializers to customize the 
 process using some kind of trait. This could even be added 
 later in a backwards compatible fashion if necessary.
A simple option is to define required serializer traits and make both the std.serialization default and any custom data-specific ones conform to them.
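A possible shape for such a trait, as a hedged sketch (all primitive names here are hypothetical, not an agreed-upon API): any type providing the expected primitives would be accepted by the generic framework.

```d
// A type qualifies as a serializer if this probe lambda compiles for it.
enum isSerializer(S) = is(typeof((ref S s) {
    s.beginObject("name");      // open a named aggregate
    s.writeField("key", 42);    // write a primitive field
    s.endObject();              // close the aggregate
}));

// Minimal conforming implementation that discards everything.
struct NullSerializer
{
    void beginObject(string name) {}
    void writeField(T)(string key, T value) {}
    void endObject() {}
}

static assert(isSerializer!NullSerializer);
```

Both the std.serialization default and a custom data-specific serializer could then be checked against the same `isSerializer` constraint.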
Aug 05 2014
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-08-05 11:54, Sönke Ludwig wrote:

 I think we could also simply keep the generic default recursive descent
 behavior, but allow serializers to customize the process using some kind
 of trait. This could even be added later in a backwards compatible
 fashion if necessary.
I have a very flexible trait-like system in place. It allows configuring the serializer based on the given archiver and user customizations, to avoid having the serializer do unnecessary work which the archiver cannot handle.
 BTW, how is the progress for Orange w.r.t. to the conversion to a more
 template+allocation-less approach
Slowly. I think the range support in the serializer is basically complete, but the deserializer isn't done yet. I would also like to provide at least one additional archiver type besides XML. BTW, std.xml doesn't make it any easier to rangify the serializer.

I've been focusing on D/Objective-C lately, which I think is in a more complete state than std.serialization. I would really like to get it done and create a pull request so I can get back to std.serialization. But I always get stuck after a merge with something breaking. With the summer and vacations I haven't been able to work that much on D at all.

 , is a new std proposal within the next
 DMD release cycle realistic?
Probably not.
 I quite like most of how vibe.data.serialization turned out, but it
 can't do any alias detection/deduplication (and I have no concrete plans
 to add support for that), which is why I currently wouldn't consider it
 as a potential Phobos candidate.
I'm quite satisfied with the feature support and flexibility of Orange/std.serialization. With the new trait-like system it will be even more flexible. -- /Jacob Carlborg
Aug 05 2014
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 * The representation is built on Algebraic,
Good. But here I'd like a little more readable type:

alias Payload = std.variant.VariantN!(16LU, typeof(null), bool, double,
string, Value[], Value[string]).VariantN;

Like:

alias Payload = std.variant.Algebraic!(typeof(null), bool, double,
string, Value[], Value[string]);

Bye,
bearophile
Aug 03 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 1:19 AM, bearophile wrote:
 Andrei Alexandrescu:

 * The representation is built on Algebraic,
Good. But here I'd like a little more readable type:

alias Payload = std.variant.VariantN!(16LU, typeof(null), bool, double,
string, Value[], Value[string]).VariantN;

Like:

alias Payload = std.variant.Algebraic!(typeof(null), bool, double,
string, Value[], Value[string]);
Yah, the latter is in the code. It's a ddoc problem. -- Andrei
Aug 03 2014
prev sibling next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
A few thoughts based on my experience with vibe.data.json:

1. No decoding of strings appears to mean that "Value" also always 
contains encoded strings. This seems to be a leaky and also error prone 
abstraction. For the token stream, performance should be top 
priority, so it's okay to not decode there, but "Value" is a high level 
abstraction of a JSON value, so it should really hide all implementation 
details of the storage format.

2. Algebraic is a good choice for its generic handling of operations on 
the contained types (which isn't exposed here, though). However, a 
tagged union type in my experience has quite some advantages for 
usability. Since adding a type tag possibly affects the interface in a 
non-backwards compatible way, this should be evaluated early on.

2.b) I'm currently working on a generic tagged union type that also 
enables operations between values in a natural generic way. This has the 
big advantage of not having to manually define operators like in 
"Value", which is error prone and often limited (I've had to make many 
fixes and additions in this part of the code over time).

3. Use of "opDispatch" for an open set of members has been criticized 
for vibe.data.json before and I agree with that criticism. The only 
advantage is saving a few keystrokes (json.key instead of json["key"]), 
but I came to the conclusion that the right approach to work with JSON 
values in D is to always directly deserialize when/if possible anyway, 
which mostly makes this a moot point.

This approach has a lot of advantages, e.g. reduction of allocations, 
performance of field access and avoiding typos when accessing fields. 
Especially the last point is interesting, because opDispatch based field 
access gives the false impression that a static field is accessed.

The decision to minimize the number of static fields within "Value" 
reduces the chance of accidentally accessing a static field instead of 
hitting opDispatch, but there are still *some* static fields/methods and 
any later addition of a symbol must now be considered a breaking change.

3.b) Bad interaction of UFCS and opDispatch: Functions like "remove" and 
"assume" certainly look like they could be used with UFCS, but 
opDispatch destroys that possibility.

4. I know the stance on this is often "The D module system has enough 
facilities to disambiguate" (which is not really a valid argument, but 
rather just the lack of a counter argument, IMO), but I highly dislike 
the choice to leave off any mention of "JSON" or "Json" in the global 
symbol names. Using the module either requires to always use a renamed 
import or a manual alias, or the resulting source code will always leave 
the reader wondering what kind of data is actually handled. Handling 
multiple "value" types in a single piece of code, which is not uncommon 
(e.g. JSON + BSON/ini value/...) would always require explicit 
disambiguation. I'd certainly include the "JSON" or "Json" part in the 
names.

5. Whatever happens, *please* let's aim for a module name of 
std.data.json (similar to std.digest.*), so that any data formats added 
later are nicely organized. All existing data format support (XML + CSV) 
doesn't follow contemporary Phobos style, so they will need to be 
deprecated at some point anyway, freeing the way for a clean and 
non-breaking transition to a more organized module hierarchy.

6. (Possibly compile time optional) support for keeping track of 
line/column numbers is often important for better error messages, so 
that would be good to have included as part of the parser and in the 
"Token" type.

Sönke
Aug 03 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 A few thoughts based on my experience with vibe.data.json:

 1. No decoding of strings appears to mean that "Value" also always
 contains encoded strings. This seems to be a leaky and also error prone
 abstraction. For the token stream, performance should be top
 priority, so it's okay to not decode there, but "Value" is a high level
 abstraction of a JSON value, so it should really hide all implementation
 details of the storage format.
Nonono. I think there's a confusion. The input strings are not UTF decoded for the simple reason that there's no need (all tokenization decisions are taken on the basis of ASCII characters/code units). The backslash-prefixed characters are indeed decoded.

An optimization I didn't implement yet is to use slices of the input wherever possible (when the input is string, immutable(byte)[], or immutable(ubyte)[]). That will reduce allocations considerably.
 2. Algebraic is a good choice for its generic handling of operations on
 the contained types (which isn't exposed here, though). However, a
 tagged union type in my experience has quite some advantages for
 usability. Since adding a type tag possibly affects the interface in a
 non-backwards compatible way, this should be evaluated early on.
There's a public opCast(Payload) that gives the end user access to the Payload inside a Value. I forgot to add documentation to it.

What advantages are there to a tagged union? (FWIW: to me Algebraic and Variant are also tagged unions, just that the tags are not 0, 1, ..., n. That can be easily fixed for Algebraic by defining operations to access the index of the currently-stored type.)
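The point that Algebraic already behaves like a tagged union can be illustrated with the existing std.variant API (`.type` and `get!T` are real primitives; the `Payload` alias and helper are just this example's):

```d
import std.variant : Algebraic;

// The JSON payload type sketched in this thread.
alias Payload = Algebraic!(typeof(null), bool, double, string);

// The "tag" is the TypeInfo of the currently held value.
bool holdsDouble(Payload p)
{
    return p.type is typeid(double);
}
```

What Algebraic does not offer out of the box is a small-integer tag to `switch` over, which is the gap discussed below.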
 2.b) I'm currently working on a generic tagged union type that also
 enables operations between values in a natural generic way. This has the
 big advantage of not having to manually define operators like in
 "Value", which is error prone and often limited (I've had to make many
 fixes and additions in this part of the code over time).
I did notice that vibe.json has quite a repetitive implementation, so reducing it would be great. The way I see it, good work on tagged unions must be either integrated within std.variant (either by modifying Variant/Algebraic or by adding new types to it). I am very strongly opposed to adding a tagged union type only for JSON purposes, which I'd consider essentially a usability bug in std.variant, the opposite of dogfooding, etc.
 3. Use of "opDispatch" for an open set of members has been criticized
 for vibe.data.json before and I agree with that criticism. The only
 advantage is saving a few keystrokes (json.key instead of json["key"]),
 but I came to the conclusion that the right approach to work with JSON
 values in D is to always directly deserialize when/if possible anyway,
 which mostly makes this a moot point.
Interesting. Well if experience with opDispatch is negative then it should probably not be used here, or only offered on an opt-in basis.
 This approach has a lot of advantages, e.g. reduction of allocations,
 performance of field access and avoiding typos when accessing fields.
 Especially the last point is interesting, because opDispatch based field
 access gives the false impression that a static field is accessed.
Good point.
 The decision to minimize the number of static fields within "Value"
 reduces the chance of accidentally accessing a static field instead of
 hitting opDispatch, but there are still *some* static fields/methods and
 any later addition of a symbol must now be considered a breaking change.
Right now the idea is that the only named member is __payload. Well then there's opXxxx as well. The idea is/was to add all other functionality as free functions.
 3.b) Bad interaction of UFCS and opDispatch: Functions like "remove" and
 "assume" certainly look like they could be used with UFCS, but
 opDispatch destroys that possibility.
Yah, agreed. The bummer is people coming from Python won't be able to continue using the same style without opDispatch.
 4. I know the stance on this is often "The D module system has enough
 facilities to disambiguate" (which is not really a valid argument, but
 rather just the lack of a counter argument, IMO), but I highly dislike
 the choice to leave off any mention of "JSON" or "Json" in the global
 symbol names. Using the module either requires to always use a renamed
 import or a manual alias, or the resulting source code will always leave
 the reader wondering what kind of data is actually handled. Handling
 multiple "value" types in a single piece of code, which is not uncommon
 (e.g. JSON + BSON/ini value/...) would always require explicit
 disambiguation. I'd certainly include the "JSON" or "Json" part in the
 names.
Good point, I agree.
 5. Whatever happens, *please* let's aim for a module name of
 std.data.json (similar to std.digest.*), so that any data formats added
 later are nicely organized. All existing data format support (XML + CSV)
 doesn't follow contemporary Phobos style, so they will need to be
 deprecated at some point anyway, freeing the way for a clean and
 non-breaking transition to a more organized module hierarchy.
I agree.
 6. (Possibly compile time optional) support for keeping track of
 line/column numbers is often important for better error messages, so
 that would be good to have included as part of the parser and in the
 "Token" type.
Yah, saw that in vibe.d but forgot about it. Thanks, Andrei
Aug 03 2014
next sibling parent "Dicebot" <public dicebot.lv> writes:
On Sunday, 3 August 2014 at 15:14:43 UTC, Andrei Alexandrescu 
wrote:
 3. Use of "opDispatch" for an open set of members has been 
 criticized
 for vibe.data.json before and I agree with that criticism. The 
 only
 advantage is saving a few keystrokes (json.key instead of 
 json["key"]),
 but I came to the conclusion that the right approach to work 
 with JSON
 values in D is to always directly deserialize when/if possible 
 anyway,
 which mostly makes this a moot point.
Interesting. Well if experience with opDispatch is negative then it should probably not be used here, or only offered on an opt-in basis.
I support this opinion. opDispatch looks cool with JSON objects when you implement it but it results in many subtle quirks when you consider something like range traits for example - most annoying to encounter and debug. It is not worth the gain.
Aug 03 2014
prev sibling next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 17:14, Andrei Alexandrescu wrote:
 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 A few thoughts based on my experience with vibe.data.json:

 1. No decoding of strings appears to mean that "Value" also always
 contains encoded strings. This seems to be a leaky and also error prone
 abstraction. For the token stream, performance should be top
 priority, so it's okay to not decode there, but "Value" is a high level
 abstraction of a JSON value, so it should really hide all implementation
 details of the storage format.
Nonono. I think there's a confusion. The input strings are not UTF decoded for the simple reason that there's no need (all tokenization decisions are taken on the basis of ASCII characters/code units). The backslash-prefixed characters are indeed decoded. An optimization I didn't implement yet is to use slices of the input wherever possible (when the input is string, immutable(byte)[], or immutable(ubyte)[]). That will reduce allocations considerably.
Ah okay, *phew* ;) But in that case I'd actually think about leaving off the backslash decoding in the low level parser, so that slices could be used for immutable inputs in all cases - maybe with a name of "rawString" for the stored data and an additional "string" property that decodes on the fly. This may come in handy when the first comparative benchmarks together with rapidjson and the like are done.
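The rawString/decoded-string split suggested here could look roughly like this (a hedged sketch: the thread proposes a property named `string`, which clashes with the type name, so `value` stands in for it, and the escape decoder is deliberately minimal):

```d
struct JSONString
{
    string rawString;   // slice of the input, escapes left as-is

    @property string value() const
    {
        import std.algorithm.searching : canFind;
        // decode only on demand; hand back the slice itself when possible
        if (!rawString.canFind('\\'))
            return rawString;               // zero-copy fast path
        return decodeEscapes(rawString);
    }
}

string decodeEscapes(string s)
{
    // minimal decoder for illustration: handles \n and \t, passes the
    // escaped character through otherwise (covers \" and \\)
    string r;
    for (size_t i = 0; i < s.length; ++i)
    {
        if (s[i] == '\\' && i + 1 < s.length)
        {
            ++i;
            switch (s[i])
            {
                case 'n': r ~= '\n'; break;
                case 't': r ~= '\t'; break;
                default:  r ~= s[i]; break;
            }
        }
        else
            r ~= s[i];
    }
    return r;
}
```

With this shape, a benchmark that never touches `value` pays nothing for escape decoding, which is exactly the property that matters against rapidjson-style parsers.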
 2. Algebraic is a good choice for its generic handling of operations on
 the contained types (which isn't exposed here, though). However, a
 tagged union type in my experience has quite some advantages for
 usability. Since adding a type tag possibly affects the interface in a
 non-backwards compatible way, this should be evaluated early on.
There's a public opCast(Payload) that gives the end user access to the Payload inside a Value. I forgot to add documentation to it.
I see. Suppose that opDispatch would be dropped, would anything speak against "alias this"ing _payload to avoid the need for the manually defined operators?
 What advantages are there to a tagged union? (FWIW: to me Algebraic and
 Variant are also tagged unions, just that the tags are not 0, 1, ..., n.
 That can be easily fixed for Algebraic by defining operations to access
 the index of the currently-stored type.)
The two major points are probably that it's possible to use "final switch" on the type tag if it's an enum, and the type id can be easily stored in both integer and string form (which is not as conveniently possible with a TypeInfo).
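The first of the two points can be made concrete: with an explicit enum tag, `final switch` makes the compiler verify that every JSON kind is handled (the enum and function names below are made up for the example, not proposed API):

```d
enum JSONType { null_, boolean, number, text, array, object }

string typeName(JSONType t)
{
    final switch (t)    // compile error if any enum member is unhandled
    {
        case JSONType.null_:   return "null";
        case JSONType.boolean: return "bool";
        case JSONType.number:  return "double";
        case JSONType.text:    return "string";
        case JSONType.array:   return "Value[]";
        case JSONType.object:  return "Value[string]";
    }
}
```

Adding a seventh JSON kind later would turn every such `final switch` into a compile error until it is handled, which a TypeInfo- or function-pointer-based tag cannot do.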
 (...)

 The way I see it, good work on tagged unions must be either integrated
 within std.variant (either by modifying Variant/Algebraic or by adding
 new types to it). I am very strongly opposed to adding a tagged union
 type only for JSON purposes, which I'd consider essentially a usability
 bug in std.variant, the opposite of dogfooding, etc.
Definitely agree there.

An enum based tagged union design also currently has the unfortunate property that the order of enum values and that of the accepted types must be defined consistently, or bad things will happen. Supporting UDAs on enum values would be a possible direction to fix this:

     enum JsonType {
          variantType!string string,
          variantType!(JsonValue[]) array,
          variantType!(JsonValue[string]) object
     }
     alias JsonValue = TaggedUnion!JsonType;

But then there are obviously still issues with cyclic type references. So, anyway, this is something that still requires some thought. It could also be designed in a way that is backwards compatible with a pure "Algebraic", so it shouldn't be a blocker for the current design.
Aug 03 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 11:03 AM, Sönke Ludwig wrote:
 Am 03.08.2014 17:14, schrieb Andrei Alexandrescu:
[snip]
 Ah okay, *phew* ;) But in that case I'd actually think about leaving off
 the backslash decoding in the low level parser, so that slices could be
 used for immutable inputs in all cases - maybe with a name of
 "rawString" for the stored data and an additional "string" property that
 decodes on the fly. This may come in handy when the first comparative
 benchmarks together with rapidjson and the like are done.
Yah, that's awesome.
 There's a public opCast(Payload) that gives the end user access to the
 Payload inside a Value. I forgot to add documentation to it.
I see. Suppose that opDispatch would be dropped, would anything speak against "alias this"ing _payload to avoid the need for the manually defined operators?
Correct. In fact the conversion was there but I removed it for the sake of opDispatch.
 What advantages are there to a tagged union? (FWIW: to me Algebraic and
 Variant are also tagged unions, just that the tags are not 0, 1, ..., n.
 That can be easily fixed for Algebraic by defining operations to access
 the index of the currently-stored type.)
The two major points are probably that it's possible to use "final switch" on the type tag if it's an enum,
So I just tried this: http://dpaste.dzfl.pl/eeadac68fac0. Sadly, the cast doesn't take. Without the cast the enum does compile, but not the switch. I submitted https://issues.dlang.org/show_bug.cgi?id=13247.
 and the type id can be easily stored in both integer and string form
 (which is not as conveniently possible with a TypeInfo).
I think here pointers to functions "win" because getting a string (or anything else for that matter) is an indirect call away.

std.variant has been among the first artifacts I wrote for D. It's a topic I've been dabbling in for a long time in a C++ context (http://goo.gl/zqUwFx), with always almost-satisfactory results. I told myself if I get to implement things in D properly, then this language has good potential. Replacing the integral tag I'd always used with a pointer to function is, I think, net progress. Things turned out fine, save for the switch matter.
 An enum based tagged union design also currently has the unfortunate
 property that the order of enum values and that of the accepted types
 must be defined consistently, or bad things will happen. Supporting UDAs
 on enum values would be a possible direction to fix this:

      enum JsonType {
           variantType!string string,
           variantType!(JsonValue[]) array,
           variantType!(JsonValue[string]) object
      }
      alias JsonValue = TaggedUnion!JsonType;

 But then there are obviously still issues with cyclic type references.
 So, anyway, this is something that still requires some thought. It could
 also be designed in a way that is backwards compatible with a pure
 "Algebraic", so it shouldn't be a blocker for the current design.
I think something can be designed along these lines if necessary. Andrei
Aug 03 2014
prev sibling parent "Wyatt" <wyatt.epp gmail.com> writes:
On Sunday, 3 August 2014 at 15:14:43 UTC, Andrei Alexandrescu 
wrote:
 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 3. Use of "opDispatch" for an open set of members has been
 criticized for vibe.data.json before and I agree with that
 criticism. The only advantage is saving a few keystrokes
 (json.key instead of json["key"]), but I came to the conclusion
 that the right approach to work with JSON values in D is to
 always directly deserialize when/if possible anyway, which
  mostly makes this a moot point.
Interesting. Well if experience with opDispatch is negative then it should probably not be used here, or only offered on an opt-in basis.
I suspect that depends on the circumstances. I've been using this style (with Adam's jsvar), and I find it quite nice for decomposing my TOML parse trees to Variant-like structures that go several levels deep. It makes reading (and, consequently, reasoning about) them much easier for me. That said, I think the ideal would be that nesting Variant[] should work predictably such that users can just write a one-line opDispatch if they want it to behave that way. -Wyatt
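Wyatt's "one-line opDispatch" idea can be sketched as an opt-in wrapper around an index-only core type (both types here are illustrative stand-ins, not the proposed std.jgrandson API):

```d
// Core type: only opIndex, no opDispatch, so range traits and UFCS
// behave predictably on it.
struct Value
{
    Value[string] object;
    int number;
    Value opIndex(string key) { return object[key]; }
}

// Users who want property-style access opt in with a thin wrapper;
// the body of opDispatch is the advertised one-liner.
struct DotAccess
{
    Value value;
    auto opDispatch(string name)() { return DotAccess(value[name]); }
}
```

This keeps the quirks of opDispatch out of the core type while still allowing `DotAccess(root).server.port`-style traversal for those who prefer it.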
Aug 04 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
[snip]

We need to address the matter of std.jgrandson competing with
vibe.data.json. Clearly at a point only one proposal will have to be
accepted so the other would be wasted work.

Following our email exchange I decided to work on this because (a) you
mentioned more work is needed and your schedule was unclear, (b) we need
this at FB sooner rather than later, (c) there were a few things I
thought can be improved in vibe.data.json. I hope that taking
std.jgrandson to proof spurs things into action.

Would you want to merge some of std.jgrandson's deltas into a new
proposal std.data.json based on vibe.data.json? Here's a few things that
I consider necessary:

1. Commit to a schedule. I can't abandon stuff in wait for the perfect 
design that may or may not come someday.

2. Avoid UTF decoding.

3. Offer a lazy token stream as a basis for a non-lazy parser. A lazy 
general parser would be considerably more difficult to write and would 
only serve a small niche. On the other hand, a lazy tokenizer is easy to 
write and make efficient, and serve as a basis for user-defined 
specialized lazy parsers if the user wants so.

4. Avoid string allocation. String allocation can be replaced with 
slices of the input when these two conditions are true: (a) input type 
is string, immutable(byte)[], or immutable(ubyte)[]; (b) there are no 
backslash-encoded sequences in the string, i.e. the input string and the 
actual string are the same.

5. Build on std.variant through and through. Again, anything that 
doesn't work is a usability bug in std.variant, which was designed for 
exactly this kind of stuff. Exposing the representation such that user 
code benefits of the Algebraic's primitives may be desirable.

6. Address w0rp's issue with undefined. In fact std.Algebraic does have 
an uninitialized state :o).
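Point 3's lazy token stream could take the shape of an ordinary input range of tokens. The sketch below (all names hypothetical) handles only structural characters, quoted strings without escapes, and undifferentiated scalars, to show the range-based API shape rather than a full tokenizer:

```d
import std.algorithm.searching : canFind;
import std.ascii : isWhite;

enum Kind { lbrace, rbrace, lbracket, rbracket, colon, comma, str, scalar }

struct Token { Kind kind; string raw; }

// Lazy input range of tokens over a string; each token is a slice of
// the input, so tokenization allocates nothing.
struct Tokenizer
{
    private string input;
    private Token current;
    private bool done;

    this(string s) { input = s; popFront(); }

    @property bool empty() const { return done; }
    @property Token front() const { return current; }

    void popFront()
    {
        while (input.length && isWhite(input[0])) input = input[1 .. $];
        if (!input.length) { done = true; return; }
        switch (input[0])
        {
            case '{': take(Kind.lbrace, 1);   break;
            case '}': take(Kind.rbrace, 1);   break;
            case '[': take(Kind.lbracket, 1); break;
            case ']': take(Kind.rbracket, 1); break;
            case ':': take(Kind.colon, 1);    break;
            case ',': take(Kind.comma, 1);    break;
            case '"':
            {
                size_t i = 1;                 // escapes ignored here
                while (i < input.length && input[i] != '"') ++i;
                take(Kind.str, i + 1);
                break;
            }
            default:                          // number, true/false/null
            {
                size_t j = 0;
                while (j < input.length && !isWhite(input[j])
                       && !"{}[]:,\"".canFind(input[j])) ++j;
                take(Kind.scalar, j);
            }
        }
    }

    private void take(Kind k, size_t n)
    {
        current = Token(k, input[0 .. n]);
        input = input[n .. $];
    }
}
```

An eager parser is then a plain loop over this range, and a user-defined partial/lazy parser can skip tokens without ever decoding them.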

Sönke, what do you think?


Andrei
Aug 03 2014
next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Sun, 03 Aug 2014 08:34:20 -0700
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 [snip]

 We need to address the matter of std.jgrandson competing with
 vibe.data.json. Clearly at a point only one proposal will have to be
 accepted so the other would be wasted work.

 [...]

 4. Avoid string allocation. String allocation can be replaced with
 slices of the input when these two conditions are true: (a) input
 type is string, immutable(byte)[], or immutable(ubyte)[]; (b) there
 are no backslash-encoded sequences in the string, i.e. the input
 string and the actual string are the same.
I think for the lowest level interface we could avoid allocation completely: The tokenizer could always return slices to the raw string, even if a string contains backslash-encoded sequences or if the token is a number. Simply expose that as token.rawValue. Then add functions Token.decodeString() and Token.decodeNumber() to actually decode the values. decodeString could additionally support decoding into a buffer.

If the input is not sliceable, read the input into an internal buffer first and slice that buffer.

The main usecase for this is if you simply stream lots of data and you only want to parse very little of it and skip over most content. Then you don't need to decode the strings. This is also true if you only write a JSON formatter: no need to decode and encode the strings.
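The rawValue/decode-on-demand interface described here could be sketched as follows (names follow the suggestion above, but the implementation is purely illustrative, with escape decoding elided):

```d
import std.conv : to;

// The tokenizer only ever stores a raw slice of the input; numeric and
// string decoding are explicit, on-demand calls.
struct Token
{
    string rawValue;                 // undecoded slice of the input

    double decodeNumber() const
    {
        return rawValue.to!double;   // pay for parsing only when asked
    }

    string decodeString() const
    {
        // strip the surrounding quotes; real escape decoding omitted
        return rawValue[1 .. $ - 1];
    }
}
```

A stream consumer that skips 95% of the document, or a formatter that copies tokens verbatim, never calls either decode function and thus never pays for decoding.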

 5. Build on std.variant through and through. Again, anything that
 doesn't work is a usability bug in std.variant, which was designed
 for exactly this kind of stuff. Exposing the representation such that
 user code benefits of the Algebraic's primitives may be desirable.

Aug 03 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 8:51 AM, Johannes Pfau wrote:
 Am Sun, 03 Aug 2014 08:34:20 -0700
 schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 [snip]

 We need to address the matter of std.jgrandson competing with
 vibe.data.json. Clearly at a point only one proposal will have to be
 accepted so the other would be wasted work.

 [...]

 4. Avoid string allocation. String allocation can be replaced with
 slices of the input when these two conditions are true: (a) input
 type is string, immutable(byte)[], or immutable(ubyte)[]; (b) there
 are no backslash-encoded sequences in the string, i.e. the input
 string and the actual string are the same.
I think for the lowest level interface we could avoid allocation completely: The tokenizer could always return slices to the raw string, even if a string contains backslash-encoded sequences or if the token is a number. Simply expose that as token.rawValue. Then add functions Token.decodeString() and Token.decodeNumber() to actually decode the values. decodeString could additionally support decoding into a buffer.
That works but not e.g. for File.byLine which reuses its internal buffer. But it's a neat idea for arrays of immutable bytes.
 If the input is not sliceable, read the input into an internal buffer
 first and slice that buffer.
At that point the cost of decoding becomes negligible.
 The main usecase for this is if you simply stream lots of data and you
 only want to parse very little of it and skip over most content. Then
 you don't need to decode the strings.
Awesome.
 This is also true if you only
 write a JSON formatter: No need to decode and encode the strings.
But wouldn't that still need to encode \n, \r, \t, \v?
 5. Build on std.variant through and through. Again, anything that
 doesn't work is a usability bug in std.variant, which was designed
 for exactly this kind of stuff. Exposing the representation such that
 user code benefits of the Algebraic's primitives may be desirable.
Variant uses TypeInfo internally, right?
No. Andrei
Aug 03 2014
parent reply Johannes Pfau <nospam example.com> writes:
Am Sun, 03 Aug 2014 09:17:57 -0700
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 8/3/14, 8:51 AM, Johannes Pfau wrote:
 Variant uses TypeInfo internally, right?
No.
https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L210
https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L371
https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L696

Also the handler function concept will always have more overhead than a simple tagged union. It is certainly useful if you want to store any type, but if you only want a limited set of types there are more efficient implementations.
Aug 03 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 11:08 AM, Johannes Pfau wrote:
 Am Sun, 03 Aug 2014 09:17:57 -0700
 schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 8/3/14, 8:51 AM, Johannes Pfau wrote:
 Variant uses TypeInfo internally, right?
No.
https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L210
That's a query for the TypeInfo.
 https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L371
That could be translated to a comparison of pointers to functions.
 https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L696
That, too, could be translated to a comparison of pointers to functions. It's a confusion; let me clarify this. What Variant does is to use pointers to functions instead of integers. The space overhead (one word) is generally the same due to alignment issues.
 Also the handler function concept will always have more overhead than a
 simple tagged union. It is certainly useful if you want to store any
 type, but if you only want a limited set of types there are more
 efficient implementations.
I'm not sure at all actually. The way I see it a pointer to a function offers most everything an integer does, plus universal functionality by actually calling the function. What it doesn't offer is ordering of small integers, but that can be easily arranged at a small cost. Andrei
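The pointer-to-function tag Andrei describes can be modeled minimally like this (a simplified sketch, not std.variant's actual implementation): the discriminator is itself a handler, so testing the tag is a pointer comparison and calling it yields per-type behavior.

```d
struct FnTagged
{
    alias Handler = string function();

    Handler tag;                     // discriminator AND behavior
    union { double num; string str; }

    static string numHandler() { return "double"; }
    static string strHandler() { return "string"; }

    this(double d) { num = d; tag = &numHandler; }
    this(string s) { str = s; tag = &strHandler; }
}

// Type query is a pointer comparison, just like an integer tag compare.
bool holds(T)(ref FnTagged t)
{
    static if (is(T == double)) return t.tag is &FnTagged.numHandler;
    else                        return t.tag is &FnTagged.strHandler;
}
```

What this model gives up, as noted above, is ordered small-integer tags (and hence `final switch`), which is exactly the trade-off under discussion.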
Aug 03 2014
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 17:34, Andrei Alexandrescu wrote:
 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 [snip]

 We need to address the matter of std.jgrandson competing with
 vibe.data.json. Clearly at a point only one proposal will have to be
 accepted so the other would be wasted work.

 Following our email exchange I decided to work on this because (a) you
 mentioned more work is needed and your schedule was unclear, (b) we need
 this at FB sooner rather than later, (c) there were a few things I
 thought can be improved in vibe.data.json. I hope that taking
 std.jgrandson to proof spurs things into action.

 Would you want to merge some of std.jgrandson's deltas into a new
 proposal std.data.json based on vibe.data.json? Here's a few things that
 I consider necessary:

 1. Commit to a schedule. I can't abandon stuff in wait for the perfect
 design that may or may not come someday.
This may be the crux w.r.t. the vibe.data.json implementation. My schedule will be very crowded this month, so I could only really start to work on it beginning of September. But apart from the mentioned points, I think your implementation is already the closest thing to what I have in mind, so I'm all for going the clean slate route (I'll have to do a lot in terms of deprecation work in vibe.d anyway).
 2. Avoid UTF decoding.

 3. Offer a lazy token stream as a basis for a non-lazy parser. A lazy
 general parser would be considerably more difficult to write and would
 only serve a small niche. On the other hand, a lazy tokenizer is easy to
 write and make efficient, and serve as a basis for user-defined
 specialized lazy parsers if the user wants so.

 4. Avoid string allocation. String allocation can be replaced with
 slices of the input when these two conditions are true: (a) input type
 is string, immutable(byte)[], or immutable(ubyte)[]; (b) there are no
 backslash-encoded sequences in the string, i.e. the input string and the
 actual string are the same.

 5. Build on std.variant through and through. Again, anything that
 doesn't work is a usability bug in std.variant, which was designed for
 exactly this kind of stuff. Exposing the representation such that user
 code benefits of the Algebraic's primitives may be desirable.

 6. Address w0rp's issue with undefined. In fact std.Algebraic does have
 an uninitialized state :o).

 Sönke, what do you think?
My requirements would be the same, except for 6. The "undefined" state in the vibe.d version was necessary due to early API decisions and it's more or less a prominent part of it (specifically because the API was designed to behave similar to JavaScript). In hindsight, I'd definitely avoid that. However, I don't think its existence (also in the form of Algebraic.init) is an issue per se, as long as such values are properly handled when converting the runtime value back to a JSON string (i.e. skipped or treated as null values).
Aug 03 2014
next sibling parent reply "w0rp" <devw0rp gmail.com> writes:
On Sunday, 3 August 2014 at 18:37:48 UTC, Sönke Ludwig wrote:
 On 03.08.2014 17:34, Andrei Alexandrescu wrote:
 6. Address w0rp's issue with undefined. In fact std.Algebraic 
 does have
 an uninitialized state :o).
My requirements would be the same, except for 6. The "undefined" state in the vibe.d version was necessary due to early API decisions and it's more or less a prominent part of it (specifically because the API was designed to behave similar to JavaScript). In hindsight, I'd definitely avoid that. However, I don't think its existence (also in the form of Algebraic.init) is an issue per se, as long as such values are properly handled when converting the runtime value back to a JSON string (i.e. skipped or treated as null values).
My issue with it is that if you ask for a key in an object which doesn't exist, you get an 'undefined' value back, just like JavaScript. I'd rather that be propagated as a RangeError, which is more consistent with associative arrays in the language and probably more correct. A minor issue is being able to create a Json object which isn't a valid Json object by itself. I'd rather the initial value was just 'null', which would match how pointers and class instances behave in the language.
Aug 03 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 20:57, w0rp wrote:
 On Sunday, 3 August 2014 at 18:37:48 UTC, Sönke Ludwig wrote:
 The "undefined" state in the vibe.d version was necessary due to early
 API decisions and it's more or less a prominent part of it
 (specifically because the API was designed to behave similar to
 JavaScript). In hindsight, I'd definitely avoid that. However, I don't
 think its existence (also in the form of Algebraic.init) is an issue
 per se, as long as such values are properly handled when converting
 the runtime value back to a JSON string (i.e. skipped or treated as
 null values).
My issue with it is that if you ask for a key in an object which doesn't exist, you get an 'undefined' value back, just like JavaScript. I'd rather that be propagated as a RangeError, which is more consistent with associative arrays in the language and probably more correct.
Yes, this is what I meant with the JavaScript part of API. In addition to opIndex(), there should of course also be a .get(key, default_value) style accessor and the "in" operator.
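A minimal sketch of those access styles on a toy object type, assuming string values for simplicity (names hypothetical, not vibe.data.json's actual API):

```d
// Toy JSON object backed by a built-in associative array, showing the
// three accessors discussed: throwing opIndex, defaulting get, and "in".
struct JsonObj
{
    string[string] fields;

    // opIndex: throws RangeError on a missing key, like built-in AAs.
    string opIndex(string key) { return fields[key]; }

    // get: returns a caller-supplied default instead of throwing.
    string get(string key, string defaultValue)
    {
        if (auto p = key in fields) return *p;
        return defaultValue;
    }

    // "in" operator: existence check without lookup-or-throw.
    const(string)* opBinaryRight(string op : "in")(string key)
    {
        return key in fields;
    }
}

void main()
{
    auto obj = JsonObj(["name": "jgrandson"]);
    assert(obj["name"] == "jgrandson");
    assert(obj.get("missing", "fallback") == "fallback");
    assert("name" in obj);
    assert("missing" !in obj);
}
```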
 A minor
 issue is being able to create a Json object which isn't a valid Json
 object by itself. I'd rather the initial value was just 'null', which
 would match how pointers and class instances behave in the language.
This is what I meant with not being an issue by itself. But having such a special value of course has its pros and cons, and I could personally definitely also live with JSON values being initialized to JSON "null", if somebody hacks Algebraic to support that kind of use case.
Aug 03 2014
parent "Marc Schütz" <schuetzm gmx.net> writes:
On Sunday, 3 August 2014 at 19:54:12 UTC, Sönke Ludwig wrote:
On 03.08.2014 20:57, w0rp wrote:
 My issue with it is that if you ask for a key in an object 
 which doesn't
 exist, you get an 'undefined' value back, just like 
 JavaScript. I'd
 rather that be propagated as a RangeError, which is more 
 consistent with
 associative arrays in the language and probably more correct.
Yes, this is what I meant with the JavaScript part of API. In addition to opIndex(), there should of course also be a .get(key, default_value) style accessor and the "in" operator.
There is a parallel discussion about the concept of associative ranges: http://forum.dlang.org/thread/jheurakujksdlrjaoncs forum.dlang.org Maybe you could also have a look there, because JSON seems to be a good candidate for an associative range.
Aug 04 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 11:37 AM, Sönke Ludwig wrote:
 On 03.08.2014 17:34, Andrei Alexandrescu wrote:
 On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
 [snip]

 We need to address the matter of std.jgrandson competing with
 vibe.data.json. Clearly at a point only one proposal will have to be
 accepted so the other would be wasted work.

 Following our email exchange I decided to work on this because (a) you
 mentioned more work is needed and your schedule was unclear, (b) we need
 this at FB sooner rather than later, (c) there were a few things I
 thought can be improved in vibe.data.json. I hope that taking
 std.jgrandson to proof spurs things into action.

 Would you want to merge some of std.jgrandson's deltas into a new
 proposal std.data.json based on vibe.data.json? Here's a few things that
 I consider necessary:

 1. Commit to a schedule. I can't abandon stuff in wait for the perfect
 design that may or may not come someday.
This may be the crux w.r.t. the vibe.data.json implementation. My schedule will be very crowded this month, so I could only really start to work on it beginning of September. But apart from the mentioned points, I think your implementation is already the closest thing to what I have in mind, so I'm all for going the clean slate route (I'll have to do a lot in terms of deprecation work in vibe.d anyway).
What would be your estimated time of finishing? Would anyone want to take vibe.data.json and std.jgrandson, put them in a crucible, and have std.data.json emerge from it in a timely manner? My understanding is that everyone involved would be cool with that. Andrei
Aug 03 2014
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2014 21:53, Andrei Alexandrescu wrote:
 What would be your estimated time of finishing?
My rough estimate would be that about two weeks of calendar time should suffice for a first candidate, since the functionality and the design are already mostly there. However, it seems that VariantN will need some work, too (currently, using opAdd results in an error for an Algebraic defined for JSON usage).
Aug 05 2014
prev sibling next sibling parent "w0rp" <devw0rp gmail.com> writes:
On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu 
wrote:
 We need a better json library at Facebook. I'd discussed with 
 Sönke the possibility of taking vibe.d's json to std but he 
 said it needs some more work. So I took std.jgrandson to proof 
 of concept state and hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html

 Here are a few differences compared to vibe.d's library. I 
 think these are desirable to have in that library as well:

 * Parsing strings is decoupled into tokenization (which is lazy 
 and only needs an input range) and parsing proper. Tokenization 
 is lazy, which allows users to create their own advanced (e.g. 
 partial/lazy) parsing if needed. The parser itself is eager.

 * There's no decoding of strings.

 * The representation is built on Algebraic, with the advantages 
 that it benefits from all of its primitives. Implementation is 
 also very compact because Algebraic obviates a bunch of 
 boilerplate. Subsequent improvements to Algebraic will also 
 reflect themselves into improvements to std.jgrandson.

 * The JSON value (called std.jgrandson.Value) has no named 
 member variables or methods except for __payload. This is so 
 there's no clash between dynamic properties exposed via 
 opDispatch.

 Well that's about it. What would it take for this to become a 
 Phobos proposal? Destroy.


 Andrei
I like it. Here's what I think about it.

* When I wrote my JSON library, the thing I wanted most was constructors and opAssign functions for creating JSON values easily:

  JSON x = "some string";

  You have this, so it's great.

* You didn't include an 'undefined' value like vibe.d's, which is a very minor detail, but something I dislike. This is good.

* I'd just name Value either 'JSON' or 'JSONValue', so you can just import the module without using aliases.

* opDispatch is kind of "meh" for JSON objects. It works until you hit a name clash with a UFCS function. I don't mind typing the extra three characters.

That's all I could think of really.
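The constructor/opAssign convenience in the first point can be sketched with a toy type on top of Algebraic (names hypothetical, not std.jgrandson's actual API):

```d
// Toy JSON value with templated constructor and opAssign, so that
// plain assignments like `JSON x = "some string";` just work.
import std.variant : Algebraic;

struct JSON
{
    Algebraic!(string, long, double, bool) payload;

    this(T)(T v) { payload = v; }
    void opAssign(T)(T v) { payload = v; }
}

void main()
{
    JSON x = "some string";   // construction directly from a string
    assert(x.payload.get!string == "some string");
    x = 42L;                  // re-assignment with a different type
    assert(x.payload.get!long == 42);
}
```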
Aug 03 2014
prev sibling next sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
On 03.08.2014 09:16, Andrei Alexandrescu wrote:
 We need a better json library at Facebook. I'd discussed with Sönke the
 possibility of taking vibe.d's json to std but he said it needs some
 more work. So I took std.jgrandson to proof of concept state and hence
 ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html
Is the name supposed to stay, or is it just a working title? "std.j*grandson*" (being the successor of "std.j*son*") is of course a funny play on words, but it's not really obvious at first sight what it does. I.e., if someone skims the std. modules in the documentation looking for json, he'd probably not think that this is the new json module. std.json2 or something like that would be more obvious. Cheers, Daniel
Aug 03 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 9:49 AM, Daniel Gibson wrote:
 On 03.08.2014 09:16, Andrei Alexandrescu wrote:
 We need a better json library at Facebook. I'd discussed with Sönke the
 possibility of taking vibe.d's json to std but he said it needs some
 more work. So I took std.jgrandson to proof of concept state and hence
 ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html
Is the name supposed to stay or just a working title?
Just a working title, but of course if it were wildly successful... but then again it's not. -- Andrei
Aug 03 2014
prev sibling next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
I don't want to pay for anything I don't use.  No allocations 
should occur within the parser and it should simply slice up the 
input.  So the lowest layer should allow me to iterate across 
symbols in some way.  When I've done this in the past it was 
SAX-style (ie. a callback per type) but with the range interface 
that shouldn't be necessary.

The parser shouldn't decode or convert anything unless I ask it 
to.  Most of the time I only care about specific values, and 
paying for conversions on everything is wasted process time.

I suggest splitting number into float and integer types.  In a 
language like D where these are distinct internal types, it can 
be valuable to know this up front.

Is there support for output?  I see the makeArray and makeObject 
routines...  Ideally, there should be a way to serialize JSON 
against an OutputRange with optional formatting.
Aug 03 2014
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/3/14, 10:19 AM, Sean Kelly wrote:
 I don't want to pay for anything I don't use.  No allocations should
 occur within the parser and it should simply slice up the input.
What to do about arrays and objects, which would naturally allocate arrays and associative arrays respectively? What about strings with backslash-encoded characters? No allocation works for tokenization, but parsing is a whole different matter.
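The escape-free case can indeed avoid allocation by aliasing the input; here is a hedged sketch of that decision (hypothetical helper, heavily simplified escaping, not std.jgrandson code):

```d
// If the input is immutable and the raw token contains no backslash,
// the string value can simply be a slice of the input (zero-copy).
// Otherwise it must be unescaped into a fresh allocation.
import std.algorithm.searching : canFind;
import std.array : replace;

string stringValue(string rawToken) // rawToken excludes the quotes
{
    if (!rawToken.canFind('\\'))
        return rawToken; // zero-copy: a slice of the original input
    // Simplified unescaping; real code handles \uXXXX, \n, \t, etc.
    return rawToken.replace(`\"`, `"`).replace(`\\`, `\`);
}

void main()
{
    string input = `plain`;
    assert(stringValue(input) is input); // same memory, no allocation
    assert(stringValue(`say \"hi\"`) == `say "hi"`);
}
```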
 So the
 lowest layer should allow me to iterate across symbols in some way.
Yah, that would be the tokenizer.
 When I've done this in the past it was SAX-style (ie. a callback per
 type) but with the range interface that shouldn't be necessary.

 The parser shouldn't decode or convert anything unless I ask it to.
 Most of the time I only care about specific values, and paying for
 conversions on everything is wasted process time.
That's tricky. Once you scan for 2 specific characters you may as well scan for a couple more, the added cost is negligible. In contrast, scanning once for finding termination and then again for decoding purposes will definitely be a lot more expensive.
 I suggest splitting number into float and integer types.  In a language
 like D where these are distinct internal types, it can be valuable to
 know this up front.
Yah, that kept sticking out like a sore thumb throughout.
 Is there support for output?  I see the makeArray and makeObject
 routines...  Ideally, there should be a way to serialize JSON against an
 OutputRange with optional formatting.
Not yet, and yah those should be in. Andrei
Aug 03 2014
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 03-Aug-2014 21:40, Andrei Alexandrescu wrote:
 On 8/3/14, 10:19 AM, Sean Kelly wrote:
 I don't want to pay for anything I don't use.  No allocations should
 occur within the parser and it should simply slice up the input.
What to do about arrays and objects, which would naturally allocate arrays and associative arrays respectively? What about strings with backslash-encoded characters?
SAX-style would imply that an array is "parsed" by calling 6 user-defined callbacks inside of a parser: startArray, endArray, startObject, endObject, id and value. A simplified pseudo-code of the JSON parser's inner loop is then:

if(cur == '[')
    startArray();
else if(cur == '{'){
    startObject();
else if(cur == '}')
    endObject();
else if(cur == ']')
    endArray();
else{
    if(expectObjectKey){
        id(parseAsIdentifier());
    }
    else
        value(parseAsValue());
}

This is as barebones as it can get and is very fast in practice, esp. in the context of searching/extracting/matching specific sub-trees of JSON documents. -- Dmitry Olshansky
Aug 03 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 03-Aug-2014 23:54, Dmitry Olshansky wrote:
 On 03-Aug-2014 21:40, Andrei Alexandrescu wrote:
 A simplified pseudo-code of JSON-parser inner loop is then:

 if(cur == '[')
         startArray();
 else if(cur == '{'){
Aw. Stray brace.. -- Dmitry Olshansky
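For reference, a runnable toy version of that loop with the brace fixed (callback names from the pseudo-code; the tokenization is deliberately naive: bare identifiers instead of quoted strings, and no nesting state, just like the original sketch):

```d
// SAX-style toy parser: six callbacks record events; the loop below is
// the pseudo-code's structure made compilable.
import std.ascii : isWhite;

string[] events; // recorded callback invocations, for inspection

void startArray()  { events ~= "["; }
void endArray()    { events ~= "]"; }
void startObject() { events ~= "{"; }
void endObject()   { events ~= "}"; }
void id(string s)    { events ~= "id:" ~ s; }
void value(string s) { events ~= "val:" ~ s; }

bool isDelim(char c)
{
    return c == '[' || c == ']' || c == '{' || c == '}'
        || c == ',' || c == ':' || isWhite(c);
}

void parse(string input)
{
    bool expectObjectKey = false;
    size_t i = 0;
    while (i < input.length)
    {
        immutable cur = input[i];
        if (cur == '[')      { startArray(); ++i; }
        else if (cur == '{') { startObject(); expectObjectKey = true; ++i; }
        else if (cur == '}') { endObject(); ++i; }
        else if (cur == ']') { endArray(); ++i; }
        else if (cur == ',') { expectObjectKey = true; ++i; }
        else if (cur == ':') { expectObjectKey = false; ++i; }
        else if (isWhite(cur)) ++i;
        else // bare token: identifier or value
        {
            immutable start = i;
            while (i < input.length && !isDelim(input[i])) ++i;
            if (expectObjectKey) id(input[start .. i]);
            else value(input[start .. i]);
        }
    }
}

void main()
{
    parse(`{a:1,b:2}`);
    assert(events == ["{", "id:a", "val:1", "id:b", "val:2", "}"]);
}
```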
Aug 03 2014
prev sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Sunday, 3 August 2014 at 17:40:48 UTC, Andrei Alexandrescu 
wrote:
 On 8/3/14, 10:19 AM, Sean Kelly wrote:
 I don't want to pay for anything I don't use.  No allocations 
 should
 occur within the parser and it should simply slice up the 
 input.
What to do about arrays and objects, which would naturally allocate arrays and associative arrays respectively? What about strings with backslash-encoded characters?
This is tricky with a range. With an event-based parser I'd have events for object and array begin / end, but with a range you end up having an element that's a token, which is pretty weird. For encoded characters (and you need to make sure you handle surrogate pairs in your decoder) I'd still provide some means of decoding on demand. If nothing else, decode lazily when the user asks for the string value. That way the user isn't paying to decode strings he isn't interested in.
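On-demand \uXXXX decoding with surrogate-pair handling might look roughly like this (hypothetical helper, simplified: assumes well-formed escapes and handles only \u, no error recovery):

```d
// Decode \uXXXX escapes in a JSON string payload, combining a high
// surrogate (D800..DBFF) with the following low surrogate into one
// code point before re-encoding as UTF-8.
import std.conv : to;
import std.utf : encode;

string decodeUnicodeEscapes(string s)
{
    string result;
    size_t i = 0;
    while (i < s.length)
    {
        if (s[i] == '\\' && i + 5 < s.length && s[i + 1] == 'u')
        {
            uint unit = to!uint(s[i + 2 .. i + 6], 16);
            i += 6;
            if (unit >= 0xD800 && unit <= 0xDBFF) // high surrogate
            {
                // Expect a low surrogate \uDC00..\uDFFF right after.
                uint low = to!uint(s[i + 2 .. i + 6], 16);
                i += 6;
                unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
            }
            char[4] buf;
            auto len = encode(buf, cast(dchar) unit);
            result ~= buf[0 .. len];
        }
        else
            result ~= s[i++];
    }
    return result;
}

void main()
{
    assert(decodeUnicodeEscapes(`\u0041`) == "A");
    // U+1F600 encoded as a surrogate pair:
    assert(decodeUnicodeEscapes(`\uD83D\uDE00`) == "\U0001F600");
}
```

Calling something like this only when the user asks for the string value is exactly the "decode lazily" idea above: unneeded strings never pay for decoding.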
 No allocation works for tokenization, but parsing is a whole 
 different matter.

 So the
 lowest layer should allow me to iterate across symbols in some 
 way.
Yah, that would be the tokenizer.
But that will halt on comma and colon and such, correct? That's a tad lower than I'd want, though I guess it would be easy enough to build a parser on top of it.
 When I've done this in the past it was SAX-style (ie. a 
 callback per
 type) but with the range interface that shouldn't be necessary.

 The parser shouldn't decode or convert anything unless I ask 
 it to.
 Most of the time I only care about specific values, and paying 
 for
 conversions on everything is wasted process time.
That's tricky. Once you scan for 2 specific characters you may as well scan for a couple more, the added cost is negligible. In contrast, scanning once for finding termination and then again for decoding purposes will definitely be a lot more expensive.
I think I'm getting a bit confused. For the JSON parser I wrote, the parser performs full validation but leaves the content as-is, then provides a routine to decode values from their string representation if the user wishes to. I'm not sure where scanning figures in here.
 Andrei
Aug 03 2014
parent "Jacob Carlborg" <doob me.com> writes:
On Sunday, 3 August 2014 at 20:40:47 UTC, Sean Kelly wrote:

 This is tricky with a range. With an event-based parser I'd 
 have events for object and array begin / end, but with a range 
 you end up having an element that's a token, which is pretty 
 weird.
Have a look at Token.Kind at the top of the module [1]. The enum has objectStart, objectEnd, arrayStart and arrayEnd. Just by looking at that, it seems it already works very similarly to an event parser, but with a range API. This is exactly like the XML pull parser in Tango. [1] http://erdani.com/d/jgrandson.d -- /Jacob Carlborg
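A consumer of such a token range can react to the start/end tokens just like SAX events; a small sketch with a mocked token type (kind names taken from the thread, everything else hypothetical):

```d
// Pull-style consumption of a token sequence: compute the maximum
// nesting depth by counting start/end tokens as they stream past.
enum Kind { objectStart, objectEnd, arrayStart, arrayEnd, str, number }

struct Token { Kind kind; string text; }

size_t maxDepth(Token[] tokens)
{
    size_t depth = 0, maxd = 0;
    foreach (t; tokens)
    {
        final switch (t.kind)
        {
        case Kind.objectStart, Kind.arrayStart:
            if (++depth > maxd) maxd = depth;
            break;
        case Kind.objectEnd, Kind.arrayEnd:
            --depth;
            break;
        case Kind.str, Kind.number:
            break; // leaf values don't change nesting
        }
    }
    return maxd;
}

void main()
{
    auto toks = [Token(Kind.objectStart), Token(Kind.str, "a"),
                 Token(Kind.arrayStart), Token(Kind.number, "1"),
                 Token(Kind.arrayEnd), Token(Kind.objectEnd)];
    assert(maxDepth(toks) == 2);
}
```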
Aug 04 2014
prev sibling parent "Jacob Carlborg" <doob me.com> writes:
On Sunday, 3 August 2014 at 17:19:04 UTC, Sean Kelly wrote:

 Is there support for output?  I see the makeArray and 
 makeObject routines...  Ideally, there should be a way to 
 serialize JSON against an OutputRange with optional formatting.
I think it should only provide very primitive functions to serialize basic data types. Then Phobos should provide a separate module/package for generic serialization where JSON is an archive type using this module as its backend. -- /Jacob Carlborg
Aug 04 2014
prev sibling next sibling parent Orvid King <blah38621 gmail.com> writes:
On 8/3/2014 2:16 AM, Andrei Alexandrescu wrote:
 We need a better json library at Facebook. I'd discussed with Sönke the
 possibility of taking vibe.d's json to std but he said it needs some
 more work. So I took std.jgrandson to proof of concept state and hence
 ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html

 Here are a few differences compared to vibe.d's library. I think these
 are desirable to have in that library as well:

 * Parsing strings is decoupled into tokenization (which is lazy and only
 needs an input range) and parsing proper. Tokenization is lazy, which
 allows users to create their own advanced (e.g. partial/lazy) parsing if
 needed. The parser itself is eager.

 * There's no decoding of strings.

 * The representation is built on Algebraic, with the advantages that it
 benefits from all of its primitives. Implementation is also very compact
 because Algebraic obviates a bunch of boilerplate. Subsequent
 improvements to Algebraic will also reflect themselves into improvements
 to std.jgrandson.

 * The JSON value (called std.jgrandson.Value) has no named member
 variables or methods except for __payload. This is so there's no clash
 between dynamic properties exposed via opDispatch.

 Well that's about it. What would it take for this to become a Phobos
 proposal? Destroy.


 Andrei
If you're looking for serialization from statically known type layouts, then I believe my JSON (de)serialization code (https://github.com/Orvid/JSONSerialization) might actually be of interest to you, as it uses no intermediate representation, nor does it allocate when it converts an object to JSON. As far as I know, even when only compiled with DMD, it's among the fastest JSON (de)serialization libraries. The exception is when it needs to convert a floating point number to a string: I suppose you could use a local buffer to write to, but at the moment it just converts it to a normal string that gets written to the output range. It also supports (de)serializing from what I called at the time "dynamic types", such as std.variant; that isn't actually well supported, because the code is only there because I needed it for something else, and I wasn't using std.variant at the time.
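The no-intermediate-representation approach can be sketched with compile-time introspection writing straight to an output range (a toy sketch in the same spirit, not Orvid's actual code; no string escaping, ints and strings only):

```d
// Serialize a statically known struct directly into an output range
// using the compile-time tuple of its fields; nothing is built in
// between, and the only allocation is the sink's own buffer growth.
import std.array : appender;
import std.conv : to;

void toJson(T, Out)(T obj, ref Out sink)
{
    sink.put('{');
    bool first = true;
    foreach (i, member; obj.tupleof) // unrolled at compile time
    {
        if (!first) sink.put(',');
        first = false;
        sink.put('"');
        sink.put(__traits(identifier, T.tupleof[i]));
        sink.put(`":`);
        static if (is(typeof(member) : string))
        {
            sink.put('"');
            sink.put(member); // NB: no escaping in this sketch
            sink.put('"');
        }
        else
            sink.put(member.to!string);
    }
    sink.put('}');
}

struct Point { int x; int y; string label; }

void main()
{
    auto sink = appender!string();
    toJson(Point(1, 2, "origin"), sink);
    assert(sink.data == `{"x":1,"y":2,"label":"origin"}`);
}
```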
Aug 03 2014
prev sibling next sibling parent reply "Andrea Fontana" <nospam example.com> writes:
On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu 
wrote:
 We need a better json library at Facebook. I'd discussed with 
 Sönke the possibility of taking vibe.d's json to std but he 
 said it needs some more work. So I took std.jgrandson to proof 
 of concept state and hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html

 Here are a few differences compared to vibe.d's library. I 
 think these are desirable to have in that library as well:

 * Parsing strings is decoupled into tokenization (which is lazy 
 and only needs an input range) and parsing proper. Tokenization 
 is lazy, which allows users to create their own advanced (e.g. 
 partial/lazy) parsing if needed. The parser itself is eager.

 * There's no decoding of strings.

 * The representation is built on Algebraic, with the advantages 
 that it benefits from all of its primitives. Implementation is 
 also very compact because Algebraic obviates a bunch of 
 boilerplate. Subsequent improvements to Algebraic will also 
 reflect themselves into improvements to std.jgrandson.

 * The JSON value (called std.jgrandson.Value) has no named 
 member variables or methods except for __payload. This is so 
 there's no clash between dynamic properties exposed via 
 opDispatch.

 Well that's about it. What would it take for this to become a 
 Phobos proposal? Destroy.


 Andrei
In my bson library I found it very useful to have some methods to know whether a field exists or not, and to get a "defaulted" value. Something like:

auto assume(T)(Value v, T defaultValue = T.init);

Another good method could be something like xpath to get a deep value:

Value v = value["/path/to/sub/object"];

Moreover, in my library I actually have three different methods to read a value:

T get(T)() // Exception if the value is not a T, is not valid, or doesn't exist
T to(T)()  // Try to convert the value to T using to!string. Exception if it doesn't exist or is not valid
BsonField!T as(T)(lazy T defaultValue = T.init) // Always returns a value

BsonField!T is an "alias this"-ed struct with two fields: T value and bool error(). T value is the aliased field, and error() tells you if value is defaulted (because of an error: the field doesn't exist or can't be converted to T). So I can write something like this:

int myvalue = json["/that/deep/property"].as!int;

or

auto myvalue = json["/that/deep/property"].as!int(10);
if (myvalue.error) writeln("Property doesn't exist, I'm using the default value");
writeln("Property value: ", myvalue);

I hope this can be useful...
Aug 04 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/4/14, 12:47 AM, Andrea Fontana wrote:
 On my bson library I found very useful to have some methods to know if a
 field exists or not, and to get a "defaulted" value. Something like:

 auto assume(T)(Value v, T default = T.init);
Nice. Probably "get" would be better, to be in keeping with built-in hashtables.
 Another good method could be something like xpath to get a deep value:

 Value v = value["/path/to/sub/object"];
Cool. Is it unlikely that a value contains an actual slash? If so, would value["path"]["to"]["sub"]["object"] be more precise?
 Moreover in my library I actually have three different methods to read a
 value:

 T get(T)() // Exception if value is not a T or not valid or value
 doesn't exist
 T to(T)()  // Try to convert value to T using to!string. Exception if
 doesn't exists or not valid

 BsonField!T as(T)(lazy T default = T.init)  // Always return a value

 BsonField!T is an "alias this"-ed struct with two fields: T value and
 bool error(). T value is the aliased field, and error() tells you if
 value is defaulted (because of an error: field not exists or can't
 convert to T)

 So I can write something like this:

 int myvalue = json["/that/deep/property"].as!int;

 or

 auto myvalue = json["/that/deep/property"].as!int(10);

 if (myvalue.error) writeln("Property doesn't exists, I'm using default
 value);

 writeln("Property value: ", myvalue);

 I hope this can be useful...
Sure is, thanks. Listen, would you want to volunteer a std.data.json proposal? Andrei
Aug 04 2014
parent reply "Andrea Fontana" <nospam example.com> writes:
On Monday, 4 August 2014 at 16:58:12 UTC, Andrei Alexandrescu 
wrote:
 On 8/4/14, 12:47 AM, Andrea Fontana wrote:
 On my bson library I found very useful to have some methods to 
 know if a
 field exists or not, and to get a "defaulted" value. Something 
 like:

 auto assume(T)(Value v, T default = T.init);
Nice. Probably "get" would be better, to be in keeping with built-in hashtables.
I wrote assume just to use proposed syntax :)
 Another good method could be something like xpath to get a 
 deep value:

 Value v = value["/path/to/sub/object"];
Cool. Is it unlikely that a value contains an actual slash? If so would be value["path"]["to"]["sub"]["object"] more precise?
A key with a slash (or dot?) inside is not common at all; I've never seen one in JSON data. In many languages there are libraries to bind JSON to structs or objects, so usually people don't use strange chars inside keys. If needed, you can still use the good old method to read a single field. value["path"]["to"]["object"] was my first choice, but I didn't like it. First: it creates a lot of temporary objects. Second: a single string is easier to implement (also on assignment). I gave value["path", "to", "index"] a try, but it's not comfortable if you need to generate your path from code.
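The single-string path lookup can be sketched by splitting on '/' and walking nested objects (toy type with hypothetical names; missing keys throw RangeError like built-in AAs):

```d
// Toy path-style opIndex: value["/path/to"] splits the string on '/'
// and descends through nested associative arrays.
import std.algorithm.iteration : splitter;

struct Node
{
    string leaf;             // payload when this node is a leaf
    Node[string] children;   // sub-objects

    Node opIndex(string path)
    {
        Node cur = this;
        foreach (seg; path.splitter('/'))
        {
            if (seg.length == 0) continue; // tolerate a leading '/'
            cur = cur.children[seg];       // throws on a missing key
        }
        return cur;
    }
}

void main()
{
    Node root;
    root.children["path"] = Node();
    root.children["path"].children["to"] = Node("value");
    assert(root["/path/to"].leaf == "value");
}
```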
 Moreover in my library I actually have three different methods 
 to read a
 value:

 T get(T)() // Exception if value is not a T or not valid or 
 value
 doesn't exist
 T to(T)()  // Try to convert value to T using to!string. 
 Exception if
 doesn't exists or not valid

 BsonField!T as(T)(lazy T default = T.init)  // Always return a 
 value

 BsonField!T is an "alias this"-ed struct with two fields: T 
 value and
 bool error(). T value is the aliased field, and error() tells 
 you if
 value is defaulted (because of an error: field not exists or 
 can't
 convert to T)

 So I can write something like this:

 int myvalue = json["/that/deep/property"].as!int;

 or

 auto myvalue = json["/that/deep/property"].as!int(10);

 if (myvalue.error) writeln("Property doesn't exists, I'm using 
 default
 value);

 writeln("Property value: ", myvalue);

 I hope this can be useful...
Sure is, thanks. Listen, would you want to volunteer a std.data.json proposal?
What does it mean? :)
 Andrei
Aug 05 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/5/14, 2:08 AM, Andrea Fontana wrote:
 Sure is, thanks. Listen, would you want to volunteer a std.data.json
 proposal?
What does it mean? :)
On one side enters vibe.data.json with the deltas prompted by std.jgrandson, plus your talent and determination; on the other side comes std.data.json with code and documentation that passes the Phobos review process. -- Andrei
Aug 05 2014
prev sibling parent reply "Jacob Carlborg" <doob me.com> writes:
On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu 
wrote:
 We need a better json library at Facebook. I'd discussed with 
 Sönke the possibility of taking vibe.d's json to std but he 
 said it needs some more work. So I took std.jgrandson to proof 
 of concept state and hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html
* Could you please put it on Github, to get syntax highlighting and all the other advantages

* It doesn't completely follow the Phobos naming conventions

* The indentation is off in some places

* The unit tests are a bit lacking for the separate parsing functions

* There are methods for getting the strings and numbers; what about booleans?

* Shouldn't it be called TokenRange?

* Shouldn't this be built using the lexer generator you have so strongly been pushing for?

* The unit tests for TokenStream are very dense. I would prefer empty newlines for grouping "assert" and "popFront" calls that belong together

-- /Jacob Carlborg
Aug 04 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/4/14, 12:56 AM, Jacob Carlborg wrote:
 On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu wrote:
 We need a better json library at Facebook. I'd discussed with Sönke
 the possibility of taking vibe.d's json to std but he said it needs
 some more work. So I took std.jgrandson to proof of concept state and
 hence ready for destruction:

 http://erdani.com/d/jgrandson.d
 http://erdani.com/d/phobos-prerelease/std_jgrandson.html
Thanks for your comments! A few responses within:
 * Could you please put it on Github to get syntax highlighting and all
 the other advantages
Quick workaround: http://dpaste.dzfl.pl/65f4dcc36ab8
 * It doesn't completely follow the Phobos naming conventions
What would be the places?
 * The indentation is off in some places
Xamarin/Mono-D is at fault here :o).
 * The unit tests are a bit lacking for the separate parsing functions
Agreed.
 * There are methods for getting the strings and numbers, what about
 booleans?
You mean for Token? Good point. Numbers and strings are somewhat special because they have a payload associated. In contrast Booleans are represented by two distinct tokens. Would be good to add a convenience method.
 * Shouldn't it be called TokenRange?
Yah.
 * Shouldn't this be built using the lexer generator you so strongly have
 been pushing for?
Of course, and in the beginning I actually pasted some code from it. Then I regressed to minimizing dependencies.
 * The unit tests for TokenStream are very dense. I would prefer empty
 newlines for grouping "assert" and calls to "popFront" belonging together
De gustibus et coloribus non est disputandum :o). Andrei
Aug 04 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-08-04 18:55, Andrei Alexandrescu wrote:

 What would be the places?
That's why it's easier with Github ;) I can comment directly on a line. I just had a quick look, but noticed "_true", "_false" and "_null" in Token.Kind. If I recall correctly, we add an underscore as a suffix for symbols with the same name as keywords.
 You mean for Token? Good point.
Yes, in Token.
 Numbers and strings are somewhat special
 because they have a payload associated. In contrast Booleans are
 represented by two distinct tokens. Would be good to add a convenience
 method.
Right.
 De gustibus et the coloribus non est disputandum :o).
Please avoid these Latin sentences, I have no idea what they mean. This is an international community, please don't make it more complicated than it already is with language barriers. -- /Jacob Carlborg
Aug 04 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/4/14, 11:46 AM, Jacob Carlborg wrote:
 De gustibus et coloribus non est disputandum :o).
Please avoid these Latin sentences, I have no idea what they mean. This is an international community, please don't make it more complicated than it already is with language barriers.
"Favorite foods and colors are not to be disputed." 51,300 results on google... and please let's end this before it becomes another Epic Debate. -- Andrei
Aug 04 2014