
digitalmars.D - RFC: std.json successor

reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Following up on the recent "std.jgrandson" thread [1], I've picked up 
the work (a lot earlier than anticipated) and finished a first version 
of a loose blend of said std.jgrandson, vibe.data.json and some changes 
that I had planned for vibe.data.json for a while. I'm quite pleased by 
the results so far, although without a serialization framework it still 
misses a very important building block.

Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/
DUB: http://code.dlang.org/packages/std_data_json

The new code contains:
  - Lazy lexer in the form of a token input range (using slices of the
    input if possible)
  - Lazy streaming parser (StAX style) in the form of a node input range
  - Eager DOM style parser returning a JSONValue
  - Range based JSON string generator taking either a token range, a
    node range, or a JSONValue
  - Opt-out location tracking (line/column) for tokens, nodes and values
  - No opDispatch() for JSONValue - this has been shown to do more harm
    than good in vibe.data.json
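
A rough sketch of how these pieces compose, using the names mentioned in
this thread and the linked docs (the exact signatures are assumptions on
my part and may differ from the actual code):

    import stdx.data.json;

    void example()
    {
        string text = `{"name": "D", "year": 2014}`;

        // lazy token range over the raw input
        auto tokens = lexJSON(text);

        // lazy StAX-style node range
        auto nodes = parseJSONStream(text);

        // eager DOM-style parsing into a JSONValue
        JSONValue value = toJSONValue(text);

        // range based string generation from a JSONValue
        string s = value.toJSONString();
    }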

The DOM style JSONValue type is based on std.variant.Algebraic. This 
currently has a few usability issues that can be solved by 
upgrading/fixing Algebraic:

  - Operator overloading only works sporadically
  - No "tag" enum is supported, so that switch()ing on the type of a
    value doesn't work and an if-else cascade is required
  - Operations and conversions between different Algebraic types are not
    conveniently supported, which becomes important once other, similar
    formats (e.g. BSON) are supported

Assuming that those points are solved, I'd like to get some early 
feedback before going for an official review. One open issue is how to 
handle unescaping of string literals. Currently it always unescapes 
immediately, which is more efficient for general input ranges when the 
unescaped result is needed, but less efficient for string inputs when 
the unescaped result is not needed. Maybe a flag could be used to 
conditionally switch behavior depending on the input range type.
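
To illustrate the trade-off, here is a hypothetical helper (not part of
the package): slicing is essentially free for string inputs, while a
generic input range forces a copy, and thus an allocation, either way.

    import std.array : array;
    import std.range : take;
    import std.traits : isSomeString;

    // Hypothetical helper, for illustration only.
    auto rawToken(Input)(Input input, size_t len)
    {
        static if (isSomeString!Input)
            return input[0 .. len];        // zero-copy slice of the input
        else
            return input.take(len).array;  // generic ranges must allocate
    }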

Destroy away! ;)

[1]: http://forum.dlang.org/thread/lrknjl$co7$1 digitalmars.com
Aug 21 2014
next sibling parent reply "Brian Schott" <briancschott gmail.com> writes:
On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 Destroy away! ;)
source/stdx/data/json/lexer.d(263:8)[warn]: 'JSONToken' has method 'opEquals', but not 'toHash'.
source/stdx/data/json/lexer.d(499:65)[warn]: Use parenthesis to clarify this expression.
source/stdx/data/json/parser.d(516:8)[warn]: 'JSONParserNode' has method 'opEquals', but not 'toHash'.
source/stdx/data/json/value.d(95:10)[warn]: Variable c is never used.
source/stdx/data/json/value.d(99:10)[warn]: Variable d is never used.
source/stdx/data/json/package.d(942:14)[warn]: Variable val is never used.

It's likely that you can ignore these, but I thought I'd post them 
anyways. (The last three are in unittest blocks, for example.)
Aug 21 2014
next sibling parent reply Justin Whear <justin economicmodeling.com> writes:
Someone needs to make a "showbrianmycode" bot: mention a D github repo 
and it runs static analysis for you.
Aug 21 2014
parent reply "Idan Arye" <GenericNPC gmail.com> writes:
On Thursday, 21 August 2014 at 23:27:28 UTC, Justin Whear wrote:
 Someone needs to make a "showbrianmycode" bot: mention a D 
 github repo
 and it runs static analysis for you.
Why bother with mentioning a GitHub repo? Just make the bot periodically scan the DUB registry.
Aug 21 2014
parent "Brian Schott" <briancschott gmail.com> writes:
On Thursday, 21 August 2014 at 23:33:35 UTC, Idan Arye wrote:
 Why bother with mentioning a GitHub repo? Just make the bot 
 periodically scan the DUB registry.
It's kind of picky. http://i.imgur.com/SHNAWnH.png
Aug 21 2014
prev sibling parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 00:48, schrieb Brian Schott:
 On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 Destroy away! ;)
source/stdx/data/json/lexer.d(263:8)[warn]: 'JSONToken' has method 'opEquals', but not 'toHash'. source/stdx/data/json/lexer.d(499:65)[warn]: Use parenthesis to clarify this expression. source/stdx/data/json/parser.d(516:8)[warn]: 'JSONParserNode' has method 'opEquals', but not 'toHash'. source/stdx/data/json/value.d(95:10)[warn]: Variable c is never used. source/stdx/data/json/value.d(99:10)[warn]: Variable d is never used. source/stdx/data/json/package.d(942:14)[warn]: Variable val is never used. It's likely that you can ignore these, but I thought I'd post them anyways. (The last three are in unittest blocks, for example.)
Fixed all of them (none of them was causing harm, but it's still nicer 
that way). Also added @safe and nothrow where possible. BTW, does anyone 
know what's holding back formattedWrite() from being @safe for simple 
types?
Aug 22 2014
prev sibling next sibling parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
On 8/21/14, 7:35 PM, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've picked up
 the work (a lot earlier than anticipated) and finished a first version
 of a loose blend of said std.jgrandson, vibe.data.json and some changes
 that I had planned for vibe.data.json for a while. I'm quite pleased by
 the results so far, although without a serialization framework it still
 misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json
Say I have a class Person with name (string) and age (int) with a 
constructor that receives both. How would I create an instance of a 
Person from a json with the json stream?

Suppose the json is this:

{"age": 10, "name": "John"}

And the class is this:

class Person {
   this(string name, int age) {
     // ...
   }
}
Aug 21 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 02:42, schrieb Ary Borenszweig:
 Say I have a class Person with name (string) and age (int) with a
 constructor that receives both. How would I create an instance of a
 Person from a json with the json stream?

 Suppose the json is this:

 {"age": 10, "name": "John"}

 And the class is this:

 class Person {
    this(string name, int age) {
      // ...
    }
 }
Without a serialization framework it would in theory work like this:

     JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
     auto p = new Person(v["name"].get!string, v["age"].get!int);

unfortunately the operator overloading doesn't work like this currently, 
so this is needed:

     JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
     auto p = new Person(
         v.get!(Json[string])["name"].get!string,
         v.get!(Json[string])["age"].get!int);

That should be solved together with the new module (it could of course 
also easily be added to JSONValue itself instead of Algebraic, but the 
value of having it in Algebraic would be much higher).
Aug 21 2014
parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
On 8/22/14, 3:33 AM, Sönke Ludwig wrote:
 Am 22.08.2014 02:42, schrieb Ary Borenszweig:
 Say I have a class Person with name (string) and age (int) with a
 constructor that receives both. How would I create an instance of a
 Person from a json with the json stream?

 Suppose the json is this:

 {"age": 10, "name": "John"}

 And the class is this:

 class Person {
    this(string name, int age) {
      // ...
    }
 }
Without a serialization framework it would in theory work like this: JSONValue v = parseJSON(`{"age": 10, "name": "John"}`); auto p = new Person(v["name"].get!string, v["age"].get!int); unfortunately the operator overloading doesn't work like this currently, so this is needed: JSONValue v = parseJSON(`{"age": 10, "name": "John"}`); auto p = new Person( v.get!(Json[string])["name"].get!string, v.get!(Json[string])["age"].get!int);
But does this parse the whole json into JSONValue? I want to create a Person without creating an intermediate JSONValue for the whole json. Can this be done?
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 16:53, schrieb Ary Borenszweig:
 On 8/22/14, 3:33 AM, Sönke Ludwig wrote:
 Without a serialization framework it would in theory work like this:

      JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
      auto p = new Person(v["name"].get!string, v["age"].get!int);

 unfortunately the operator overloading doesn't work like this currently,
 so this is needed:

      JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
      auto p = new Person(
          v.get!(Json[string])["name"].get!string,
          v.get!(Json[string])["age"].get!int);
But does this parse the whole json into JSONValue? I want to create a Person without creating an intermediate JSONValue for the whole json. Can this be done?
That would be done by the serialization framework. Instead of using 
parseJSON(), it could use parseJSONStream() to populate the Person 
instance on the fly, without putting the whole JSON into memory. But I'd 
like to leave that for a later addition, because we'd otherwise end up 
with duplicate functionality once std.serialization gets finalized.

Manually it would work similar to this:

    auto nodes = parseJSONStream(`{"age": 10, "name": "John"}`);
    with (JSONParserNode.Kind) {
        enforce(nodes.front == objectStart);
        nodes.popFront();
        while (nodes.front != objectEnd) {
            auto key = nodes.front.key;
            nodes.popFront();
            if (key == "name") person.name = nodes.front.literal.string;
            else if (key == "age") person.age = nodes.front.literal.number;
            nodes.popFront(); // advance past the value node
        }
    }
Aug 22 2014
parent Ary Borenszweig <ary esperanto.org.ar> writes:
On 8/22/14, 1:24 PM, Sönke Ludwig wrote:
 Am 22.08.2014 16:53, schrieb Ary Borenszweig:
 On 8/22/14, 3:33 AM, Sönke Ludwig wrote:
 Without a serialization framework it would in theory work like this:

      JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
      auto p = new Person(v["name"].get!string, v["age"].get!int);

 unfortunately the operator overloading doesn't work like this currently,
 so this is needed:

      JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
      auto p = new Person(
          v.get!(Json[string])["name"].get!string,
          v.get!(Json[string])["age"].get!int);
But does this parse the whole json into JSONValue? I want to create a Person without creating an intermediate JSONValue for the whole json. Can this be done?
That would be done by the serialization framework. Instead of using parseJSON(), it could use parseJSONStream() to populate the Person instance on the fly, without putting the whole JSON into memory. But I'd like to leave that for a later addition, because we'd otherwise end up with duplicate functionality once std.serialization gets finalized. Manually it would work similar to this: auto nodes = parseJSONStream(`{"age": 10, "name": "John"}`); with (JSONParserNode.Kind) { enforce(nodes.front == objectStart); nodes.popFront(); while (nodes.front != objectEnd) { auto key = nodes.front.key; nodes.popFront(); if (key == "name") person.name = nodes.front.literal.string; else if (key == "age") person.age = nodes.front.literal.number; } }
Cool, that looks good :-)
Aug 22 2014
prev sibling next sibling parent reply "Colden Cullen" <ColdenCullen gmail.com> writes:
I notice in the docs there are several references to a 
`parseJSON` and `parseJson`, but I can't seem to find where 
either of these are defined. Is this just a typo?

Hope this helps: 
https://github.com/s-ludwig/std_data_json/search?q=parseJson&type=Code
Aug 21 2014
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 04:35, schrieb Colden Cullen:
 I notice in the docs there are several references to a `parseJSON` and
 `parseJson`, but I can't seem to find where either of these are defined.
 Is this just a typo?

 Hope this helps:
 https://github.com/s-ludwig/std_data_json/search?q=parseJson&type=Code
Seems like I forgot to replace a few mentions. They are called parseJSONValue and toJSONValue now for clarity.
Aug 21 2014
prev sibling next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 00:35, schrieb Sönke Ludwig:
 The DOM style JSONValue type is based on std.variant.Algebraic. This
 currently has a few usability issues that can be solved by
 upgrading/fixing Algebraic:

   - Operator overloading only works sporadically
   - (...)
   - Operations and conversions between different Algebraic types is not
     conveniently supported, which gets important when other similar
     formats get supported (e.g. BSON)
https://github.com/D-Programming-Language/phobos/pull/2452
https://github.com/D-Programming-Language/phobos/pull/2453

Those fix the most important operators, index access and binary arithmetic.
Aug 22 2014
parent reply "matovitch" <camille.brugel laposte.net> writes:
Very nice! I had started (and dropped) a JSON module based on 
Algebraic too. So without opDispatch you plan to use a syntax 
like jPerson["age"] = 10? You didn't use stdx.d.lexer. Any 
reason why? (I am asking even though I have never used this 
module and never coded much in D, in fact.)
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 14:17, schrieb matovitch:
 Very nice ! I had started (and dropped) a json module based on Algebraic
 too. So without opDispatch you plan to use a syntax like jPerson["age"]
 = 10 ? You didn't use stdx.d.lexer. Any reason why ? (I am asking even
 if I never used this module.(never coded much in D in fact))
Exactly, that's the syntax you'd use for JSONValue. But my favorite way 
to work with most JSON data is actually to directly read the JSON string 
into a D struct using a serialization framework and then access the 
struct in a strongly typed way. This has both less syntactic and less 
runtime overhead, and it also greatly reduces the chance of field 
name/type related bugs.

The module is written against current Phobos, which is why stdx.d.lexer 
wasn't really an option. I'm also unsure if std.lexer would be able to 
handle the parsing required for JSON numbers and strings. But it would 
certainly be nice already if at least the token structure could be 
reused. However, it should also be possible to find a painless migration 
path later, when std.lexer is actually part of Phobos.
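
For reference, this is roughly what the struct-based approach looks like
today with vibe.data.json's deserializeJson (shown only to illustrate the
idea; a future std.serialization API may well look different):

    import vibe.data.json : deserializeJson;

    struct Person
    {
        string name;
        int age;
    }

    void example()
    {
        // the JSON text is mapped directly onto the strongly typed
        // struct, without going through a generic DOM value first
        auto p = deserializeJson!Person(`{"age": 10, "name": "John"}`);
        assert(p.name == "John" && p.age == 10);
    }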
Aug 22 2014
parent reply "matovitch" <camille.brugel laposte.net> writes:
On Friday, 22 August 2014 at 12:39:08 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 14:17, schrieb matovitch:
 Very nice ! I had started (and dropped) a json module based on 
 Algebraic
 too. So without opDispatch you plan to use a syntax like 
 jPerson["age"]
 = 10 ? You didn't use stdx.d.lexer. Any reason why ? (I am 
 asking even
 if I never used this module.(never coded much in D in fact))
Exactly, that's the syntax you'd use for JSONValue. But my favorite way to work with most JSON data is actually to directly read the JSON string into a D struct using a serialization framework and then access the struct in a strongly typed way. This has both, less syntactic and less runtime overhead, and also greatly reduces the chance for field name/type related bugs.
Completely agree, I am waiting for a serializer too. I would love to see something like cap'n proto in D.
 The module is written against current Phobos, which is why 
 stdx.d.lexer wasn't really an option. I'm also unsure if 
 std.lexer would be able to handle the parsing required for JSON 
 numbers and strings. But it would certainly be nice already if 
 at least the token structure could be reused. However, it 
 should also be possible to find a painless migration path 
 later, when std.lexer is actually part of Phobos.
OK. I think I remember there was a JSON parser provided as a sample for 
stdx.d.lexer.
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 14:47, schrieb matovitch:
 Ok. I think I remember there was a stdx.d.lexer's Json parser provided
 as sample.
I see, so you just have to write your own number/string parsing routines: https://github.com/Hackerpilot/lexer-demo/blob/master/jsonlexer.d
Aug 22 2014
parent "matovitch" <camille.brugel laposte.net> writes:
On Friday, 22 August 2014 at 13:00:19 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 14:47, schrieb matovitch:
 Ok. I think I remember there was a stdx.d.lexer's Json parser 
 provided
 as sample.
I see, so you just have to write your own number/string parsing routines: https://github.com/Hackerpilot/lexer-demo/blob/master/jsonlexer.d
It's kind of "low level" indeed... I don't know what kind of black magic 
all these template mixins are doing, but the code looks quite clean.

Confusing:

    // Therefore, this always returns false.
    bool isSeparating(size_t offset) pure nothrow @safe { return true; }
Aug 22 2014
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-08-22 00:35, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've picked up
 the work (a lot earlier than anticipated) and finished a first version
 of a loose blend of said std.jgrandson, vibe.data.json and some changes
 that I had planned for vibe.data.json for a while. I'm quite pleased by
 the results so far, although without a serialization framework it still
 misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json
* Opening braces should be put on their own line to follow Phobos style 
guides

* I'm wondering about the assert in lexer.d, line 160. What happens if 
two invalid tokens after each other occur?

* I think we have talked about this before, when reviewing D lexers. I'm 
thinking of how to handle invalid data. Is it the best solution to throw 
an exception? Would it be possible to return an error token and have the 
client decide what to do about? Shouldn't it be possible to build a JSON 
validator on this?

* The lexer seems to always convert JSON types to their native D types, 
is that wise to do? That's unnecessary if you're implementing syntax 
highlighting

-- 
/Jacob Carlborg
Aug 22 2014
next sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Friday, 22 August 2014 at 15:47:51 UTC, Jacob Carlborg wrote:
 * I think we have talked about this before, when reviewing D 
 lexers. I'm thinking of how to handle invalid data. Is it the 
 best solution to throw an exception? Would it be possible to 
 return an error token and have the client decide what to do 
 about?
Hmm... my initial reaction was "not as default - it should throw on error, otherwise no one will check for errors". But if it's returning an error token, maybe it would be sufficient if that token throws when its value is accessed?
Aug 22 2014
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 17:47, schrieb Jacob Carlborg:
 On 2014-08-22 00:35, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've picked up
 the work (a lot earlier than anticipated) and finished a first version
 of a loose blend of said std.jgrandson, vibe.data.json and some changes
 that I had planned for vibe.data.json for a while. I'm quite pleased by
 the results so far, although without a serialization framework it still
 misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json
* Opening braces should be put on their own line to follow Phobos style guides
Will do.
 * I'm wondering about the assert in lexer.d, line 160. What happens if
 two invalid tokens after each other occur?
There are actually no invalid tokens at all, the "invalid" enum value is only used to denote that no token is currently stored in _front. If readToken() doesn't throw, there will always be a valid token.
 * I think we have talked about this before, when reviewing D lexers. I'm
 thinking of how to handle invalid data. Is it the best solution to throw
 an exception? Would it be possible to return an error token and have the
 client decide what to do about? Shouldn't it be possible to build a JSON
 validator on this?
That would indeed be a possibility, it's how I used to handle it in my private version of std.lexer, too. It could also be made a compile time option.
 * The lexer seems to always convert JSON types to their native D types,
 is that wise to do? That's unnecessary if you're implementing syntax
 highlighting
It's basically the same trade-off as for unescaping string literals. For "string" inputs, it would be more efficient to just store a slice, but for generic input ranges it avoids the otherwise needed allocation. The proposed flag could make an improvement here, too.
Aug 22 2014
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 18:13, schrieb Sönke Ludwig:
 Am 22.08.2014 17:47, schrieb Jacob Carlborg:
 * Opening braces should be put on their own line to follow Phobos style
 guides
Will do.
 * I'm wondering about the assert in lexer.d, line 160. What happens if
 two invalid tokens after each other occur?
There are actually no invalid tokens at all, the "invalid" enum value is only used to denote that no token is currently stored in _front. If readToken() doesn't throw, there will always be a valid token.
Renamed from "invalid" to "none" now to avoid confusion ->
 * I think we have talked about this before, when reviewing D lexers. I'm
 thinking of how to handle invalid data. Is it the best solution to throw
 an exception? Would it be possible to return an error token and have the
 client decide what to do about? Shouldn't it be possible to build a JSON
 validator on this?
That would indeed be a possibility, it's how I used to handle it in my private version of std.lexer, too. It could also be made a compile time option.
and an additional "error" kind has been added, which implements the above. Enabled using LexOptions.noThrow.
 * The lexer seems to always convert JSON types to their native D types,
 is that wise to do? That's unnecessary if you're implementing syntax
 highlighting
It's basically the same trade-off as for unescaping string literals. For "string" inputs, it would be more efficient to just store a slice, but for generic input ranges it avoids the otherwise needed allocation. The proposed flag could make an improvement here, too.
Aug 22 2014
prev sibling next sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
Some thoughts about the API:

1) Instead of `parseJSONValue` and `lexJSON`, how about static 
methods `JSON.parse` and `JSON.lex`, or even a module level 
functions `std.data.json.parse` etc.? The "JSON" part of the name 
is redundant.

2) Also, `parseJSONValue` and `parseJSONStream` probably don't 
need to have different names. They can be distinguished by their 
parameter types.

3) `toJSONString` shouldn't just take a boolean as flag for 
pretty-printing. It should either use something like 
`Pretty.YES`, or the function should be called 
`toPrettyJSONString` (I believe I have seen this latter 
convention elsewhere).
We should also think about whether we can just call the functions 
`toString` and `toPrettyString`. Alternatively, `toJSON` and 
`toPrettyJSON` should be considered.
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 18:15, schrieb "Marc Schütz" <schuetzm gmx.net>":
 Some thoughts about the API:

 1) Instead of `parseJSONValue` and `lexJSON`, how about static methods
 `JSON.parse` and `JSON.lex`, or even a module level functions
 `std.data.json.parse` etc.? The "JSON" part of the name is redundant.
For those functions it may be acceptable, although I really dislike that style, because it makes the code harder to read (what exactly does this parse?) and the functions are rarely used, so typing the additional "JSON" should be no issue at all. On the other hand, if you always type "JSON.lex", it's more to type than just "lexJSON".

But for "[JSON]Value" it gets ugly really quickly, because "Value"s are such a common thing and quickly occur in multiple kinds in the same source file.
 2) Also, `parseJSONValue` and `parseJSONStream` probably don't need to
 have different names. They can be distinguished by their parameter types.
Actually they take exactly the same parameters and just differ in their return value. It would be more descriptive to name them parseAsJSONValue and parseAsJSONStream - or maybe parseJSONAsValue or parseJSONToValue? The current naming is somewhat modeled after std.conv's "to!T" and "parse!T".
 3) `toJSONString` shouldn't just take a boolean as flag for
 pretty-printing. It should either use something like `Pretty.YES`, or
 the function should be called `toPrettyJSONString` (I believe I have
 seen this latter convention elsewhere).
 We should also think about whether we can just call the functions
 `toString` and `toPrettyString`. Alternatively, `toJSON` and
 `toPrettyJSON` should be considered.
Agreed, a boolean isn't good for a public interface; renaming the current writeAsString to a private writeAsStringImpl and then adding "(writeAs/to)[Pretty]String" sounds reasonable. Actually, I've done it that way for vibe.data.json.
Aug 22 2014
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Friday, 22 August 2014 at 16:48:44 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 18:15, schrieb "Marc Schütz" <schuetzm gmx.net>":
 Some thoughts about the API:

 1) Instead of `parseJSONValue` and `lexJSON`, how about static 
 methods
 `JSON.parse` and `JSON.lex`, or even a module level functions
 `std.data.json.parse` etc.? The "JSON" part of the name is 
 redundant.
For those functions it may be acceptable, although I really dislike that style, because it makes the code harder to read (what exactly does this parse?) and the functions are rarely used, so that that typing that additional "JSON" should be no issue at all. On the other hand, if you always type "JSON.lex" it's more to type than just "lexJSON".
I'm not really concerned about the amount of typing; it just seemed a bit odd to have the redundant JSON in there, as we have module names for namespacing. Your argument about readability is true nevertheless. But...
 But for "[JSON]Value" it gets ugly really quick, because 
 "Value"s are such a common thing and quickly occur in multiple 
 kinds in the same source file.

 2) Also, `parseJSONValue` and `parseJSONStream` probably don't 
 need to
 have different names. They can be distinguished by their 
 parameter types.
Actually they take exactly the same parameters and just differ in their return value. It would be more descriptive to name them parseAsJSONValue and parseAsJSONStream - or maybe parseJSONAsValue or parseJSONToValue? The current naming is somewhat modeled after std.conv's "to!T" and "parse!T".
... why not use exactly the same convention then? => `parse!JSONValue`

It would be nice to have a "pluggable" API where you just need to specify the type in a factory method to choose the input format. Then there could be `parse!BSON`, `parse!YAML`, with the same style as `parse!(int[])`.

I know this sounds a bit like bike-shedding, but the API shouldn't stand by itself; it should fit into the "big picture", especially as there will probably be other parsers (you already named the module std._data_.json).
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 19:24, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 16:48:44 UTC, Sönke Ludwig wrote:
 Actually they take exactly the same parameters and just differ in
 their return value. It would be more descriptive to name them
 parseAsJSONValue and parseAsJSONStream - or maybe parseJSONAsValue or
 parseJSONToValue? The current naming is somewhat modeled after
 std.conv's "to!T" and "parse!T".
... why not use exactly the same convention then? => `parse!JSONValue` Would be nice to have a "pluggable" API where you just need to specify the type in a factory method to choose the input format. Then there could be `parse!BSON`, `parse!YAML`, with the same style as `parse!(int[])`. I know this sound a bit like bike-shedding, but the API shouldn't stand by itself, but fit into the "big picture", especially as there will probably be other parsers (you already named the module std._data_.json).
That would be nice, but then it should also work together with std.conv, which basically is exactly this pluggable API. Just like this it would result in an ambiguity error if both std.data.json and std.conv are imported at the same time. Is there a way to make std.conv work properly with JSONValue? I guess the only theoretical way would be to put something in JSONValue, but that would result in a slightly ugly cyclic dependency between parser.d and value.d.
Aug 22 2014
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Friday, 22 August 2014 at 17:35:20 UTC, Sönke Ludwig wrote:
 ... why not use exactly the same convention then? => 
 `parse!JSONValue`

 Would be nice to have a "pluggable" API where you just need to 
 specify
 the type in a factory method to choose the input format. Then 
 there
 could be `parse!BSON`, `parse!YAML`, with the same style as
 `parse!(int[])`.

 I know this sound a bit like bike-shedding, but the API 
 shouldn't stand
 by itself, but fit into the "big picture", especially as there 
 will
 probably be other parsers (you already named the module 
 std._data_.json).
That would be nice, but then it should also work together with std.conv, which basically is exactly this pluggable API. Just like this it would result in an ambiguity error if both std.data.json and std.conv are imported at the same time. Is there a way to make std.conv work properly with JSONValue? I guess the only theoretical way would be to put something in JSONValue, but that would result in a slightly ugly cyclic dependency between parser.d and value.d.
The easiest and cleanest way would be to add a function in std.data.json:

    auto parse(Target, Source)(Source input)
        if(is(Target == JSONValue))
    {
        return ...;
    }

The various overloads of `std.conv.parse` already have mutually exclusive 
template constraints, so they will not collide with our function.
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 19:57, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 17:35:20 UTC, Sönke Ludwig wrote:
 ... why not use exactly the same convention then? => `parse!JSONValue`

 Would be nice to have a "pluggable" API where you just need to specify
 the type in a factory method to choose the input format. Then there
 could be `parse!BSON`, `parse!YAML`, with the same style as
 `parse!(int[])`.

 I know this sound a bit like bike-shedding, but the API shouldn't stand
 by itself, but fit into the "big picture", especially as there will
 probably be other parsers (you already named the module
 std._data_.json).
That would be nice, but then it should also work together with std.conv, which basically is exactly this pluggable API. Just like this it would result in an ambiguity error if both std.data.json and std.conv are imported at the same time. Is there a way to make std.conv work properly with JSONValue? I guess the only theoretical way would be to put something in JSONValue, but that would result in a slightly ugly cyclic dependency between parser.d and value.d.
The easiest and cleanest way would be to add a function in std.data.json: auto parse(Target, Source)(Source input) if(is(Target == JSONValue)) { return ...; } The various overloads of `std.conv.parse` already have mutually exclusive template constraints, they will not collide with our function.
Okay, for parse that may work, but what about to!()?
Aug 22 2014
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:57, schrieb "Marc Schütz" <schuetzm gmx.net>":
 The easiest and cleanest way would be to add a function in 
 std.data.json:

     auto parse(Target, Source)(Source input)
         if(is(Target == JSONValue))
     {
         return ...;
     }

 The various overloads of `std.conv.parse` already have mutually
 exclusive template constraints, they will not collide with our 
 function.
Okay, for parse that may work, but what about to!()?
What's the problem with to!()?
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 21:00, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:57, schrieb "Marc Schütz" <schuetzm gmx.net>":
 The easiest and cleanest way would be to add a function in
 std.data.json:

     auto parse(Target, Source)(Source input)
         if(is(Target == JSONValue))
     {
         return ...;
     }

 The various overloads of `std.conv.parse` already have mutually
 exclusive template constraints, they will not collide with our function.
Okay, for parse that may work, but what about to!()?
What's the problem with to!()?
to!() definitely doesn't have a template constraint that excludes JSONValue. Instead, it will convert any struct type that doesn't define toString() to a D-like representation.
Aug 23 2014
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 21:00, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:57, schrieb "Marc Schütz" 
 <schuetzm gmx.net>":
 The easiest and cleanest way would be to add a function in
 std.data.json:

    auto parse(Target, Source)(Source input)
        if(is(Target == JSONValue))
    {
        return ...;
    }

 The various overloads of `std.conv.parse` already have 
 mutually
 exclusive template constraints, they will not collide with 
 our function.
Okay, for parse that may work, but what about to!()?
What's the problem with to!()?
to!() definitely doesn't have a template constraint that excludes JSONValue. Instead, it will convert any struct type that doesn't define toString() to a D-like representation.
For converting a JSONValue to a different type, JSONValue can implement `opCast`, which is the regular interface that std.conv.to uses if it's available. For converting something _to_ a JSONValue, std.conv.to will simply create an instance of it by calling the constructor.
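
A minimal standalone illustration of that mechanism (a toy type, not the
actual JSONValue code):

    import std.conv : to;

    struct Wrapped
    {
        int payload;
        // std.conv.to picks up a user-defined opCast when one is available
        T opCast(T)() const if (is(T == int)) { return payload; }
    }

    unittest
    {
        assert(Wrapped(42).to!int == 42);
    }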
Aug 23 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 23.08.2014 19:25, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 21:00, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:57, schrieb "Marc Schütz" <schuetzm gmx.net>":
 The easiest and cleanest way would be to add a function in
 std.data.json:

    auto parse(Target, Source)(Source input)
        if(is(Target == JSONValue))
    {
        return ...;
    }

 The various overloads of `std.conv.parse` already have mutually
 exclusive template constraints, they will not collide with our
 function.
Okay, for parse that may work, but what about to!()?
What's the problem with to!()?
to!() definitely doesn't have a template constraint that excludes JSONValue. Instead, it will convert any struct type that doesn't define toString() to a D-like representation.
For converting a JSONValue to a different type, JSONValue can implement `opCast`, which is the regular interface that std.conv.to uses if it's available. For converting something _to_ a JSONValue, std.conv.to will simply create an instance of it by calling the constructor.
That would just introduce the aforementioned dependency cycle between JSONValue, the parser and the lexer. Possible, but not particularly pretty. Also, using the JSONValue constructor to parse an input string would contradict the intuitive behavior of just storing the string value.
Aug 23 2014
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Saturday, 23 August 2014 at 17:32:01 UTC, Sönke Ludwig wrote:
 Am 23.08.2014 19:25, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig 
 wrote:
 Am 22.08.2014 21:00, schrieb "Marc Schütz" 
 <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig 
 wrote:
 Am 22.08.2014 19:57, schrieb "Marc Schütz" 
 <schuetzm gmx.net>":
 The easiest and cleanest way would be to add a function in
 std.data.json:

   auto parse(Target, Source)(Source input)
       if(is(Target == JSONValue))
   {
       return ...;
   }

 The various overloads of `std.conv.parse` already have 
 mutually
 exclusive template constraints, they will not collide with 
 our
 function.
Okay, for parse that may work, but what about to!()?
What's the problem with to!()?
to!() definitely doesn't have a template constraint that excludes JSONValue. Instead, it will convert any struct type that doesn't define toString() to a D-like representation.
For converting a JSONValue to a different type, JSONValue can implement `opCast`, which is the regular interface that std.conv.to uses if it's available. For converting something _to_ a JSONValue, std.conv.to will simply create an instance of it by calling the constructor.
That would just introduce the said dependency cycle between JSONValue, the parser and the lexer. Possible, but not particularly pretty. Also, using the JSONValue constructor to parse an input string would contradict the intuitive behavior to just store the string value.
That's what I expect it to do anyway. For parsing, there are already other functions. "mystring".to!JSONValue should just wrap "mystring".
Aug 23 2014
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 23.08.2014 20:31, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Saturday, 23 August 2014 at 17:32:01 UTC, Sönke Ludwig wrote:
 Am 23.08.2014 19:25, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 21:00, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:57, schrieb "Marc Schütz" <schuetzm gmx.net>":
 The easiest and cleanest way would be to add a function in
 std.data.json:

   auto parse(Target, Source)(Source input)
       if(is(Target == JSONValue))
   {
       return ...;
   }

 The various overloads of `std.conv.parse` already have mutually
 exclusive template constraints, they will not collide with our
 function.
Okay, for parse that may work, but what about to!()?
What's the problem with to!()?
to!() definitely doesn't have a template constraint that excludes JSONValue. Instead, it will convert any struct type that doesn't define toString() to a D-like representation.
For converting a JSONValue to a different type, JSONValue can implement `opCast`, which is the regular interface that std.conv.to uses if it's available. For converting something _to_ a JSONValue, std.conv.to will simply create an instance of it by calling the constructor.
That would just introduce the said dependency cycle between JSONValue, the parser and the lexer. Possible, but not particularly pretty. Also, using the JSONValue constructor to parse an input string would contradict the intuitive behavior to just store the string value.
That's what I expect it to do anyway. For parsing, there are already other functions. "mystring".to!JSONValue should just wrap "mystring".
Probably, but then to!() is inconsistent with parse!(). Usually they are both the same apart from how the tail of the input string is handled.
Aug 23 2014
prev sibling next sibling parent reply "Christian Manning" <cmanning999 gmail.com> writes:
It would be nice to have integers treated separately to doubles. 
I know it makes the number parsing simpler to just treat 
everything as double, but still, it could be annoying when you 
expect an integer type.

I'd also like to see some benchmarks, particularly against some 
of the high performance C++ parsers, i.e. rapidjson, gason, 
sajson. Or even some of the "not bad" performance parsers with 
better APIs, i.e. QJsonDocument, jsoncpp and jsoncons (slow but 
perhaps comparable interface to this proposal?).
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 18:31, schrieb Christian Manning:
 It would be nice to have integers treated separately to doubles. I know
 it makes the number parsing simpler to just treat everything as double,
 but still, it could be annoying when you expect an integer type.
That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
 I'd also like to see some benchmarks, particularly against some of the
 high performance C++ parsers, i.e. rapidjson, gason, sajson. Or even
 some of the "not bad" performance parsers with better APIs, i.e.
 QJsonDocument, jsoncpp and jsoncons (slow but perhaps comparable
 interface to this proposal?).
That would indeed be nice to have, but I'm not sure if I can manage to squeeze that in besides finishing the module itself. My time frame for working on this is quite limited.
Aug 22 2014
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 18:31, schrieb Christian Manning:
 It would be nice to have integers treated separately to 
 doubles. I know
 it makes the number parsing simpler to just treat everything 
 as double,
 but still, it could be annoying when you expect an integer 
 type.
That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 18:31, schrieb Christian Manning:
 It would be nice to have integers treated separately to doubles. I know
 it makes the number parsing simpler to just treat everything as double,
 but still, it could be annoying when you expect an integer type.
That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?
I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as would using a BigInt, of course, so the question is how we want to set up the trade-off here (or whether there is another way that is overhead-free).
Aug 22 2014
next sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 18:31, schrieb Christian Manning:
 It would be nice to have integers treated separately to 
 doubles. I know
 it makes the number parsing simpler to just treat everything 
 as double,
 but still, it could be annoying when you expect an integer 
 type.
That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?
I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).
As the functions will be templatized anyway, it should include a flags parameter. These and possible future extensions can then be selected by the user.
Aug 22 2014
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 20:01, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 18:31, schrieb Christian Manning:
 It would be nice to have integers treated separately to doubles. I
 know
 it makes the number parsing simpler to just treat everything as
 double,
 but still, it could be annoying when you expect an integer type.
That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?
I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).
As the functions will be templatized anyway, it should include a flags parameter. These and possible future extensions can then be selected by the user.
I'm actually in the process of converting the "track_location" parameter to a flags enum and adding support for an error token, so this would fit right in.
Aug 22 2014
prev sibling parent reply "Christian Manning" <cmanning999 gmail.com> writes:
On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 18:31, schrieb Christian Manning:
 It would be nice to have integers treated separately to 
 doubles. I know
 it makes the number parsing simpler to just treat everything 
 as double,
 but still, it could be annoying when you expect an integer 
 type.
That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?
I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).
You could check for a decimal point and for a 0 at the front (excluding a possible - sign); either would indicate a double, making the reasonable assumption that anything else will fit in a long.
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 21:48, schrieb Christian Manning:
 On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 18:31, schrieb Christian Manning:
 It would be nice to have integers treated separately to doubles. I
 know
 it makes the number parsing simpler to just treat everything as
 double,
 but still, it could be annoying when you expect an integer type.
That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?
I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).
You could check for a decimal point and a 0 at the front (excluding possible - sign), either would indicate a double, making the reasonable assumption that anything else will fit in a long.
Yes, no decimal point + no exponent would work without overhead to detect integers, but that wouldn't solve the proposed automatic long->double overflow, which is what I meant. My current idea is to default to double and optionally support any of long, BigInt and "Decimal" (BigInt+exponent), where integer overflow only works for long->BigInt.
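
A sketch of those two ideas (not the actual implementation; "Decimal" is
just the hypothetical lossless fallback mentioned above):

    import std.algorithm : canFind;
    import std.bigint : BigInt;

    // value = mantissa * 10 ^^ exponent
    struct Decimal
    {
        BigInt mantissa;
        int exponent;
    }

    // "no decimal point + no exponent" integer detection
    bool looksLikeInteger(const(char)[] num)
    {
        return !num.canFind('.') && !num.canFind('e') && !num.canFind('E');
    }

    unittest
    {
        assert(looksLikeInteger("-42"));
        assert(!looksLikeInteger("10.5"));
        assert(!looksLikeInteger("1e10"));
    }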
Aug 22 2014
next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 22 August 2014 at 20:02:41 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 21:48, schrieb Christian Manning:
 On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:
 Am 22.08.2014 19:27, schrieb "Marc Schütz" 
 <schuetzm gmx.net>":
 On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig 
 wrote:
 Am 22.08.2014 18:31, schrieb Christian Manning:
 It would be nice to have integers treated separately to 
 doubles. I
 know
 it makes the number parsing simpler to just treat 
 everything as
 double,
 but still, it could be annoying when you expect an integer 
 type.
That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?
It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?
I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required. But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).
You could check for a decimal point and a 0 at the front (excluding possible - sign), either would indicate a double, making the reasonable assumption that anything else will fit in a long.
Yes, no decimal point + no exponent would work without overhead to detect integers, but that wouldn't solve the proposed automatic long->double overflow, which is what I meant. My current idea is to default to double and optionally support any of long, BigInt and "Decimal" (BigInt+exponent), where integer overflow only works for long->BigInt.
It might be the right choice anyway (seeing as JSON/JS do overflow to 
double), but FWIW it's still atrocious.

    double a = long.max;
    assert(iota(1, 1000000).map!(d => (a+d)-a).until!"a != 0".walkLength == 1024);

Yuk. Floating point numbers and integers are so completely different in 
behaviour that it's just dishonest to transparently switch between the 
two. This is especially the case for overflow from long -> double, where 
by definition you're 10 bits past being able to reliably and accurately 
represent the integer in question.
Aug 22 2014
prev sibling parent "Christian Manning" <cmanning999 gmail.com> writes:
 Yes, no decimal point + no exponent would work without overhead 
 to detect integers, but that wouldn't solve the proposed 
 automatic long->double overflow, which is what I meant. My 
 current idea is to default to double and optionally support any 
 of long, BigInt and "Decimal" (BigInt+exponent), where integer 
 overflow only works for long->BigInt.
Ah I see. I have to say, if you are going to treat integers and floating 
point numbers differently, then you should store them differently: long 
should be used to store integers, double for floating point numbers.

A 64 bit signed integer (long) is a totally reasonable limitation for 
integers, but even that would lose precision stored as a double as you 
are proposing (if I'm understanding right). I don't think BigInt needs to 
be brought into this at all really. In the case of integers met in the 
parser which are too large/small to fit in a long, give an error IMO. 
Such integers should be (and are by other libs IIRC) serialised in the 
form "1.234e-123" to force double parsing, perhaps losing precision at 
that stage rather than invisibly inside the library.

The size of JSON numbers is implementation defined and the whole thing 
shouldn't be degraded in both performance and usability to cover JSON 
serialisers that go beyond common native number types.

Of course, you are free to do whatever you like :)
Aug 22 2014
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/21/2014 3:35 PM, Sönke Ludwig wrote:
 Destroy away! ;)
Thanks for taking this on! This is valuable work. On to destruction!

I'm looking at:

http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html

I anticipate this will be used a LOT and in very high speed demanding 
applications. With that in mind,

1. There's no mention of what will happen if it is passed malformed JSON 
strings. I presume an exception is thrown. Exceptions are both slow and 
consume GC memory. I suggest an alternative would be to emit an "Error" 
token instead; this would be much like how the UTF decoding algorithms 
emit a "replacement char" for invalid UTF sequences.

2. The escape sequenced strings presumably consume GC memory. This will 
be a problem for high performance code. I suggest either leaving them 
undecoded in the token stream, and letting higher level code decide what 
to do about them, or providing a hook that the user can override with 
his own allocation scheme.

If we don't make it possible to use std.json without invoking the GC, I 
believe the module will fail in the long term.
Aug 22 2014
next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 22.08.2014 20:08, schrieb Walter Bright:
 On 8/21/2014 3:35 PM, Sönke Ludwig wrote:
 Destroy away! ;)
Thanks for taking this on! This is valuable work. On to destruction! I'm looking at: http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I anticipate this will be used a LOT and in very high speed demanding applications. With that in mind, 1. There's no mention of what will happen if it is passed malformed JSON strings. I presume an exception is thrown. Exceptions are both slow and consume GC memory. I suggest an alternative would be to emit an "Error" token instead; this would be much like how the UTF decoding algorithms emit a "replacement char" for invalid UTF sequences.
The latest version now features a LexOptions.noThrow option which causes an error token to be emitted instead. After popping the error token, the range is always empty.
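
A sketch of what consuming that could look like (the option and kind
names are the ones mentioned in this thread; the exact lexJSON signature
is an assumption):

    void consume(string text)
    {
        auto tokens = lexJSON!(LexOptions.noThrow)(text);
        foreach (t; tokens)
        {
            if (t.kind == JSONToken.Kind.error)
            {
                // handle the malformed input; the range ends after this token
                break;
            }
            // ... process the valid token ...
        }
    }
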
 2. The escape sequenced strings presumably consume GC memory. This will
 be a problem for high performance code. I suggest either leaving them
 undecoded in the token stream, and letting higher level code decide what
 to do about them, or provide a hook that the user can override with his
 own allocation scheme.
The problem is that it really depends on the use case and on the type of input stream which approach is more efficient (storing the escaped version of a string might require *two* allocations if the input range cannot be sliced and if the decoded string is then requested by the parser). My current idea therefore is to simply make this configurable, too. Enabling the use of custom allocators should be easily possible as an add-on functionality later on. At least my suggestion would be to wait with this until we have a finished std.allocator module.
Aug 22 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/22/2014 2:27 PM, Sönke Ludwig wrote:
 Am 22.08.2014 20:08, schrieb Walter Bright:
 1. There's no mention of what will happen if it is passed malformed JSON
 strings. I presume an exception is thrown. Exceptions are both slow and
 consume GC memory. I suggest an alternative would be to emit an "Error"
 token instead; this would be much like how the UTF decoding algorithms
 emit a "replacement char" for invalid UTF sequences.
The latest version now features a LexOptions.noThrow option which causes an error token to be emitted instead. After popping the error token, the range is always empty.
Having a nothrow option may prevent the functions from being attributed as "nothrow". But in any case, to worship at the Altar Of Composability, the error token could always be emitted, with a separate algorithm provided that passes through all non-error tokens and throws if it sees an error token.
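A sketch of such a composable wrapper, just to illustrate the idea (the token kind name and option spelling are assumptions, not the module's actual API):

// pass non-error tokens through unchanged, throw as soon as an error token appears
auto throwOnError(R)(R tokens)
{
    import std.algorithm : map;
    import std.exception : enforce;
    return tokens.map!((t) {
        enforce(t.kind != JSONToken.Kind.error, "malformed JSON input");
        return t;
    });
}

// usage: auto checked = lexJSON!(LexOptions.noThrow)(input).throwOnError;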
 2. The escape sequenced strings presumably consume GC memory. This will
 be a problem for high performance code. I suggest either leaving them
 undecoded in the token stream, and letting higher level code decide what
 to do about them, or provide a hook that the user can override with his
 own allocation scheme.
The problem is that it really depends on the use case and on the type of input stream which approach is more efficient (storing the escaped version of a string might require *two* allocations if the input range cannot be sliced and if the decoded string is then requested by the parser). My current idea therefore is to simply make this configurable, too. Enabling the use of custom allocators should be easily possible as an add-on functionality later on. At least my suggestion would be to wait with this until we have a finished std.allocator module.
I'm worried that std.allocator is stalled and we'll be digging ourselves deeper into needing to revise things later to remove GC usage. I'd really like to find a way to abstract the allocation away from the algorithm.
Aug 22 2014
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/22/2014 6:05 PM, Walter Bright wrote:
 The problem is that it really depends on the use case and on the type of input
 stream which approach is more efficient (storing the escaped version of a
string
 might require *two* allocations if the input range cannot be sliced and if the
 decoded string is then requested by the parser). My current idea therefore is
to
 simply make this configurable, too.

 Enabling the use of custom allocators should be easily possible as an add-on
 functionality later on. At least my suggestion would be to wait with this until
 we have a finished std.allocator module.
Another possibility is to have the user pass in a resizeable buffer which then will be used to store the strings in as necessary. One example is std.internal.scopebuffer. The nice thing about that is the user can use the stack for the storage, which works out to be very, very fast.
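For reference, a minimal sketch of the kind of usage being suggested (ScopeBuffer lives in std.internal.scopebuffer and is not a public Phobos API, so treat the details as approximate):

import std.internal.scopebuffer;

void fillBuffer()
{
    char[128] stack = void;                // storage lives on the stack
    auto buf = ScopeBuffer!char(stack[]);
    scope (exit) buf.free();               // falls back to the C heap only if it outgrows 'stack'
    buf.put("unescaped string data");
    // buf[] gives a slice of what has been written so far
}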
Aug 22 2014
parent reply "Ola Fosheim Gr" <ola.fosheim.grostad+dlang gmail.com> writes:
On Saturday, 23 August 2014 at 02:30:23 UTC, Walter Bright wrote:
 Another possibility is to have the user pass in a resizeable 
 buffer which then will be used to store the strings in as 
 necessary.

 One example is std.internal.scopebuffer. The nice thing about 
 that is the user can use the stack for the storage, which works 
 out to be very, very fast.
Does this mean that D is getting resizable stack allocations in lower stack frames? That has a lot of implications for code gen.
Aug 22 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:
 On Saturday, 23 August 2014 at 02:30:23 UTC, Walter Bright wrote:
 One example is std.internal.scopebuffer. The nice thing about that is the user
 can use the stack for the storage, which works out to be very, very fast.
Does this mean that D is getting resizable stack allocations in lower stack frames? That has a lot of implications for code gen.
scopebuffer does not require resizeable stack allocations.
Aug 22 2014
parent reply "Ola Fosheim Gr" <ola.fosheim.grostad+dlang gmail.com> writes:
On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:
 On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:
 Does this mean that D is getting resizable stack allocations 
 in lower stack
 frames? That has a lot of implications for code gen.
scopebuffer does not require resizeable stack allocations.
So you cannot use the stack for resizable allocations. That would however be a nice optimization. Iff an algorithm has only one alloca, can be inlined in a way which does not extend the stack, and uses a resizable buffer that grows downwards in memory, then you can have a resizable buffer on the stack:

HIMEM
...
Algorithm stack frame vars
Inlined vars
Buffer head/book keeping vars
Buffer end
Buffer front
...add to front here...
End of stack
LOMEM
Aug 22 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/22/2014 9:48 PM, Ola Fosheim Gr wrote:
 On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:
 On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:
 Does this mean that D is getting resizable stack allocations in lower stack
 frames? That has a lot of implications for code gen.
scopebuffer does not require resizeable stack allocations.
So you cannot use the stack for resizable allocations.
Please, take a look at how scopebuffer works.
Aug 22 2014
parent reply "Ola Fosheim Gr" <ola.fosheim.grostad+dlang gmail.com> writes:
On Saturday, 23 August 2014 at 05:28:55 UTC, Walter Bright wrote:
 On 8/22/2014 9:48 PM, Ola Fosheim Gr wrote:
 On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright 
 wrote:
 On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:
 Does this mean that D is getting resizable stack allocations 
 in lower stack
 frames? That has a lot of implications for code gen.
scopebuffer does not require resizeable stack allocations.
So you cannot use the stack for resizable allocations.
Please, take a look at how scopebuffer works.
I have? It requires an upper bound to stay on the stack, which creates a big hole in the stack. I don't think wasting the stack or moving to the heap is a nice, predictable solution. It would be better to just have a couple of regions that do "reverse" stack allocations, but the most efficient solution is the one I outlined.

With JSON you might be able to create an upper bound of, say, 4-8 times the size of the source iff you know the file size. You don't if you are streaming.

(scopebuffer is too unpredictable for real time; a pure stack solution is predictable)
Aug 22 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/22/2014 11:25 PM, Ola Fosheim Gr wrote:
 On Saturday, 23 August 2014 at 05:28:55 UTC, Walter Bright wrote:
 On 8/22/2014 9:48 PM, Ola Fosheim Gr wrote:
 On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:
 On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:
 Does this mean that D is getting resizable stack allocations in lower stack
 frames? That has a lot of implications for code gen.
scopebuffer does not require resizeable stack allocations.
So you cannot use the stack for resizable allocations.
Please, take a look at how scopebuffer works.
I have? It requires an upperbound to stay on the stack, that creates a big hole in the stack. I don't think wasting the stack or moving to the heap is a nice predictable solution. It would be better to just have a couple of regions that do "reverse" stack allocations, but the most efficient solution is the one I outlined.
Scopebuffer is extensively used in Warp, and works very well. The "hole" in the stack is not a significant problem.
 With json you might be able to create an upperbound of say 4-8 times the size
of
 the source iff you know the file size. You don't if you are streaming.

 (scopebuffer is too unpredictable for real time, a pure stack solution is
 predictable)
You can always implement your own buffering system and pass it in - that's the point, it's under user control.
Aug 22 2014
parent "Ola Fosheim Gr" <ola.fosheim.grostad+dlang gmail.com> writes:
On Saturday, 23 August 2014 at 06:41:11 UTC, Walter Bright wrote:
 Scopebuffer is extensively used in Warp, and works very well. 
 The "hole" in the stack is not a significant problem.
Well, on a webserver you don't want to push out the caches for no good reason.
 You can always implement your own buffering system and pass it 
 in - that's the point, it's under user control.
My point is that you need compiler support to get good buffering options on the stack. Something like an alloca_inline:

auto buffer = alloca_inline getstuff();
process(buffer);

I think all memory allocation should be under compiler control; the library solutions are bound to be suboptimal, i.e. slower.
Aug 22 2014
prev sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 23.08.2014 03:05, schrieb Walter Bright:
 On 8/22/2014 2:27 PM, Sönke Ludwig wrote:
 Am 22.08.2014 20:08, schrieb Walter Bright:
 1. There's no mention of what will happen if it is passed malformed JSON
 strings. I presume an exception is thrown. Exceptions are both slow and
 consume GC memory. I suggest an alternative would be to emit an "Error"
 token instead; this would be much like how the UTF decoding algorithms
 emit a "replacement char" for invalid UTF sequences.
The latest version now features a LexOptions.noThrow option which causes an error token to be emitted instead. After popping the error token, the range is always empty.
Having a nothrow option may prevent the functions from being attributed as "nothrow".
It's a compile time option, so that shouldn't be an issue. There is also just a single "throw" statement in the source, so it's easy to isolate.
Aug 23 2014
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 22.08.2014 20:08, Walter Bright wrote:
 (...)
 2. The escape sequenced strings presumably consume GC memory. This will
 be a problem for high performance code. I suggest either leaving them
 undecoded in the token stream, and letting higher level code decide what
 to do about them, or provide a hook that the user can override with his
 own allocation scheme.

 If we don't make it possible to use std.json without invoking the GC, I
 believe the module will fail in the long term.
I've added two new types now to abstract away how strings and numbers are represented in memory. For string literals this means that for input types "string" and "immutable(ubyte)[]" they will always be stored as slices to the input buffer. JSONValue has a .rawValue property to access them, as well as an "alias this"ed .value property that transparently unescapes. At that place it would also be easy to provide a method that takes an arbitrary output range to unescape without allocations. Documentation and code are both updated (also added a note about exception behavior).
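As a rough usage sketch of what that reads like on the caller's side (the parse function name and indexing syntax here are my assumptions; only .rawValue and the "alias this"ed .value come from the description above):

// hypothetical entry point and indexing, for illustration only
auto v = parseJSONValue(`{"text": "line1\nline2"}`);
auto s = v["text"];
assert(s.rawValue == `line1\nline2`);   // slice of the input buffer, escapes intact
assert(s.value == "line1\nline2");      // transparently unescaped (alias this target)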
Aug 23 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2014 9:36 AM, Sönke Ludwig wrote:
 input types "string" and "immutable(ubyte)[]"
Why the immutable(ubyte)[] ?
Aug 23 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 23.08.2014 19:38, Walter Bright wrote:
 On 8/23/2014 9:36 AM, Sönke Ludwig wrote:
 input types "string" and "immutable(ubyte)[]"
Why the immutable(ubyte)[] ?
I've adopted that basically from Andrei's module. The idea is to allow processing data with arbitrary character encoding. However, the output will always be Unicode and JSON is defined to be encoded as Unicode, too, so that could probably be dropped...
Aug 23 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2014 10:42 AM, Sönke Ludwig wrote:
 Am 23.08.2014 19:38, schrieb Walter Bright:
 On 8/23/2014 9:36 AM, Sönke Ludwig wrote:
 input types "string" and "immutable(ubyte)[]"
Why the immutable(ubyte)[] ?
I've adopted that basically from Andrei's module. The idea is to allow processing data with arbitrary character encoding. However, the output will always be Unicode and JSON is defined to be encoded as Unicode, too, so that could probably be dropped...
I feel that non-UTF encodings should be handled by adapter algorithms, not embedded into the JSON lexer, so yes, I'd drop that.
Aug 23 2014
next sibling parent reply Brad Roberts via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:
 On 8/23/2014 10:42 AM, Sönke Ludwig wrote:
 Am 23.08.2014 19:38, schrieb Walter Bright:
 On 8/23/2014 9:36 AM, Sönke Ludwig wrote:
 input types "string" and "immutable(ubyte)[]"
Why the immutable(ubyte)[] ?
I've adopted that basically from Andrei's module. The idea is to allow processing data with arbitrary character encoding. However, the output will always be Unicode and JSON is defined to be encoded as Unicode, too, so that could probably be dropped...
I feel that non-UTF encodings should be handled by adapter algorithms, not embedded into the JSON lexer, so yes, I'd drop that.
For performance purposes, determining encoding during lexing is useful. You can avoid any conversion costs when you know that the original string is ascii or utf-8 or other. The cost during lexing is essentially zero. The cost of storing that state might be a concern, or it might be free in otherwise unused padding space. The cost of re-scanning strings that can be avoided is non-trivial. My past experience with this was in an http parser, where there's even more complex logic than json parsing, but the concepts still apply.
Aug 23 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Saturday, 23 August 2014 at 19:01:13 UTC, Brad Roberts via 
Digitalmars-d wrote:
 original string is ascii or utf-8 or other.  The cost during 
 lexing is essentially zero.
I am not so sure when it comes to SIMD lexing. I think the behaviour should be specified in a way which encourages later optimizations.
Aug 23 2014
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
Some baselines for performance:

https://github.com/mloskot/json_benchmark

http://chadaustin.me/2013/01/json-parser-benchmarking/
Aug 23 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2014 12:00 PM, Brad Roberts via Digitalmars-d wrote:
 On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:
 I feel that non-UTF encodings should be handled by adapter algorithms,
 not embedded into the JSON lexer, so yes, I'd drop that.
For performance purposes, determining encoding during lexing is useful.
I'm not convinced that using an adapter algorithm won't be just as fast.
Aug 23 2014
parent reply Brad Roberts via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 8/23/2014 3:20 PM, Walter Bright via Digitalmars-d wrote:
 On 8/23/2014 12:00 PM, Brad Roberts via Digitalmars-d wrote:
 On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:
 I feel that non-UTF encodings should be handled by adapter algorithms,
 not embedded into the JSON lexer, so yes, I'd drop that.
For performance purposes, determining encoding during lexing is useful.
I'm not convinced that using an adapter algorithm won't be just as fast.
Consider your own talks on optimizing the existing dmd lexer. In those talks you've talked about the evils of additional processing on every byte. That's what you're talking about here. While it's possible that the inliner and other optimizer steps might be able to integrate the two phases and remove some overhead, I'll believe it when I see the resulting assembly code.
Aug 23 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2014 6:32 PM, Brad Roberts via Digitalmars-d wrote:
 I'm not convinced that using an adapter algorithm won't be just as fast.
Consider your own talks on optimizing the existing dmd lexer. In those talks you've talked about the evils of additional processing on every byte. That's what you're talking about here. While it's possible that the inliner and other optimizer steps might be able to integrate the two phases and remove some overhead, I'll believe it when I see the resulting assembly code.
On the other hand, deadalnix demonstrated that the ldc optimizer was able to remove the extra code. I have a reasonable faith that optimization can be improved where necessary to cover this.
Aug 25 2014
parent reply simendsjo <simendsjo+dlang gmail.com> writes:
On 08/25/2014 09:35 PM, Walter Bright wrote:
 On 8/23/2014 6:32 PM, Brad Roberts via Digitalmars-d wrote:
 I'm not convinced that using an adapter algorithm won't be just as fast.
Consider your own talks on optimizing the existing dmd lexer. In those talks you've talked about the evils of additional processing on every byte. That's what you're talking about here. While it's possible that the inliner and other optimizer steps might be able to integrate the two phases and remove some overhead, I'll believe it when I see the resulting assembly code.
On the other hand, deadalnix demonstrated that the ldc optimizer was able to remove the extra code. I have a reasonable faith that optimization can be improved where necessary to cover this.
I just happened to write a very small script yesterday and tested with the compilers (with dub --build=release):

dmd: 2.8 mb
gdc: 3.3 mb
ldc: 0.5 mb

So ldc can remove quite a substantial amount of code in some cases.
Aug 25 2014
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/25/2014 12:49 PM, simendsjo wrote:
 I just happened to write a very small script yesterday and tested with
 the compilers (with dub --build=release).

 dmd: 2.8 mb
 gdc: 3.3 mb
 ldc  0.5 mb

 So ldc can remove quite a substantial amount of code in some cases.
Speed optimizations are different.
Aug 25 2014
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 25/08/14 21:49, simendsjo wrote:

 So ldc can remove quite a substantial amount of code in some cases.
It's because the latest release of LDC has the --gc-sections flag enabled by default.

-- 
/Jacob Carlborg
Aug 26 2014
parent "Entusiastic user" <cncgeneralsfan999 abv.bg> writes:
I tried using "-disable-linker-strip-dead", but it had no effect. 
 From the error messages it seems the problem is compile-time and 
not link-time...



On Tuesday, 26 August 2014 at 07:01:09 UTC, Jacob Carlborg wrote:
 On 25/08/14 21:49, simendsjo wrote:

 So ldc can remove quite a substantial amount of code in some 
 cases.
It's because the latest release of LDC has the --gc-sections falg enabled by default.
Aug 26 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/23/14, 10:46 AM, Walter Bright wrote:
 On 8/23/2014 10:42 AM, Sönke Ludwig wrote:
 Am 23.08.2014 19:38, schrieb Walter Bright:
 On 8/23/2014 9:36 AM, Sönke Ludwig wrote:
 input types "string" and "immutable(ubyte)[]"
Why the immutable(ubyte)[] ?
I've adopted that basically from Andrei's module. The idea is to allow processing data with arbitrary character encoding. However, the output will always be Unicode and JSON is defined to be encoded as Unicode, too, so that could probably be dropped...
I feel that non-UTF encodings should be handled by adapter algorithms, not embedded into the JSON lexer, so yes, I'd drop that.
I think accepting ubyte is a good idea. It means "got this stream of bytes off of the wire and it hasn't been validated as a UTF string". It also means (which is true) that the lexer does enough validation to constrain arbitrary bytes into text, and saves the caller from either a check (expensive) or a cast (unpleasant).

Reality is the JSON lexer takes ubytes and produces tokens.

Andrei
Aug 23 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2014 2:36 PM, Andrei Alexandrescu wrote:
 I think accepting ubyte it's a good idea. It means "got this stream of bytes
off
 of the wire and it hasn't been validated as a UTF string". It also means (which
 is true) that the lexer does enough validation to constrain arbitrary bytes
into
 text, and saves caller from either a check (expensive) or a cast (unpleasant).

 Reality is the JSON lexer takes ubytes and produces tokens.
Using an adapter still makes sense, because:

1. The adapter should be just as fast as wiring it in internally
2. The adapter then becomes a general purpose tool that can be used elsewhere where the encoding is unknown or suspect
3. The scope of the adapter is small, so it is easier to get it right, and being reusable means every user benefits from it
4. If we can't make adapters efficient, we've failed at the ranges+algorithms model, and I'm very unwilling to fail at that
Aug 23 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/23/14, 3:24 PM, Walter Bright wrote:
 On 8/23/2014 2:36 PM, Andrei Alexandrescu wrote:
 I think accepting ubyte it's a good idea. It means "got this stream of
 bytes off
 of the wire and it hasn't been validated as a UTF string". It also
 means (which
 is true) that the lexer does enough validation to constrain arbitrary
 bytes into
 text, and saves caller from either a check (expensive) or a cast
 (unpleasant).

 Reality is the JSON lexer takes ubytes and produces tokens.
Using an adapter still makes sense, because: 1. The adapter should be just as fast as wiring it in internally 2. The adapter then becomes a general purpose tool that can be used elsewhere where the encoding is unknown or suspect 3. The scope of the adapter is small, so it is easier to get it right, and being reusable means every user benefits from it 4. If we can't make adapters efficient, we've failed at the ranges+algorithms model, and I'm very unwilling to fail at that
An adapter would solve the wrong problem here. There's nothing to adapt from and to. An adapter would be good if e.g. the stream uses UTF-16 or some Windows encoding. Bytes are the natural input for a json parser. Andrei
Aug 23 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2014 3:51 PM, Andrei Alexandrescu wrote:
 An adapter would solve the wrong problem here. There's nothing to adapt from
and
 to.

 An adapter would be good if e.g. the stream uses UTF-16 or some Windows
 encoding. Bytes are the natural input for a json parser.
The adaptation is to take arbitrary byte input in an unknown encoding and produce valid UTF. Note that many html readers scan the bytes to see if it is ASCII, UTF, some code page encoding, Shift-JIS, etc., and translate accordingly. I do not see why that is less costly to put inside the JSON lexer than as an adapter.
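For the common "bytes off the wire, expected to be UTF-8" case, the adaptation step can be as small as the following sketch for sliceable input (the helper name is made up; a true lazy adapter range would do the same check incrementally):

import std.utf : validate;

// hypothetical helper: check the raw bytes once, then hand them to the lexer as char[]
const(char)[] asValidUTF8(const(ubyte)[] raw)
{
    auto text = cast(const(char)[]) raw;  // reinterpret only, no copy
    validate(text);                       // throws UTFException on malformed sequences
    return text;
}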
Aug 25 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 19:38:05 UTC, Walter Bright wrote:
 The adaptation is to take arbitrary byte input in an unknown 
 encoding and produce valid UTF.
I agree. For a restful HTTP service the encoding should be specified in the HTTP header and the input rejected if it isn't UTF compatible. For that use scenario you only want validation, not conversion. However, some validation is free; for example, if you only accept numbers you could just turn off parsing of strings in the template…

If files are read from storage then you can reread the file if it fails validation on the first pass. I wonder in which use scenario both of these conditions fail:

1. unspecified character-set and cannot assume UTF for JSON
2. unable to re-parse
Aug 25 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 25.08.2014 21:50, "Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 19:38:05 UTC, Walter Bright wrote:
 The adaptation is to take arbitrary byte input in an unknown encoding
 and produce valid UTF.
I agree. For a restful http service the encoding should be specified in the http header and the input rejected if it isn't UTF compatible. For that use scenario you only want validation, not conversion. However some validation is free, like if you only accept numbers you could just turn off parsing of strings in the template… If files are read from storage then you can reread the file if it fails validation on the first pass. I wonder, in which use scenario it is that both of these conditions fail? 1. unspecified character-set and cannot assume UTF for JSON 3. unable to re-parse
BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159, which is another argument for just letting the lexer assume valid UTF.
Aug 25 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 20:35:32 UTC, Sönke Ludwig wrote:
 BTW, JSON is *required* to be UTF encoded anyway as per 
 RFC-7159, which is another argument for just letting the lexer 
 assume valid UTF.
The lexer cannot assume valid UTF since the client might be a rogue, but it can just bail out if the lookahead isn't JSON? So UTF-validation is limited to strings.

You have to parse the strings because of the \uXXXX escapes of course, so some basic validation is unavoidable? But I guess full validation of string content could be another useful option, along with "ignore escapes" for the case where you want to avoid decode-encode scenarios (like for a proxy, or if you store pre-escaped unicode in a database).
Aug 25 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 25.08.2014 22:51, "Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 20:35:32 UTC, Sönke Ludwig wrote:
 BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159,
 which is another argument for just letting the lexer assume valid UTF.
The lexer cannot assume valid UTF since the client might be a rogue, but it can just bail out if the lookahead isn't jSON? So UTF-validation is limited to strings.
But why should UTF validation be the job of the lexer in the first place? D's "string" type is also defined to be UTF-8, so given that, it would of course be free to assume valid UTF-8. I agree with Walter there that validation/conversion should be added as a separate proxy range. But if we end up going for validating in the lexer, it would indeed be enough to validate inside strings, because the rest of the grammar assumes a subset of ASCII.
 You have to parse the strings because of the \uXXXX escapes of course,
 so some basic validation is unavoidable?
At least no UTF validation is needed. Since all non-ASCII characters will always be composed of bytes >0x7F, a sequence \uXXXX can be assumed to be valid wherever in the string it occurs, and all other bytes that don't belong to an escape sequence are just passed through as-is.
 But I guess full validation of
 string content could be another useful option along with "ignore
 escapes" for the case where you want to avoid decode-encode scenarios.
 (like for a proxy, or if you store pre-escaped unicode in a database)
Aug 25 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 21:27:42 UTC, Sönke Ludwig wrote:
 But why should UTF validation be the job of the lexer in the 
 first place?
Because you want to save time, it is faster to integrate validation? The most likely use scenario is to receive REST data over HTTP that needs validation. Well, so then I agree with Andrei… array of bytes it is. ;-)
 added as a separate proxy range. But if we end up going for 
 validating in the lexer, it would indeed be enough to validate 
 inside strings, because the rest of the grammar assumes a 
 subset of ASCII.
Not assumes, but defines! :-)

If you have to validate UTF before lexing then you will end up needlessly scanning lots of ascii if the file contains lots of non-strings or is from an encoder that only sends pure ascii.

If you want to have "plugin" validation of strings then you also need to differentiate strings so that the user can select which data should be just ascii, utf8, numbers, ids etc. Otherwise the user will end up doing double validation (you have to bypass >7F followed by string-end anyway).

The advantage of integrated validation is that you can use 16 byte SIMD registers on the buffer. I presume you can load 16 bytes and do a BITWISE-AND on the MSB, then match against string-end and carefully use this to boost performance of simultaneous UTF validation, escape-scanning, and string-end scanning. A bit tricky, of course.
 At least no UTF validation is needed. Since all non-ASCII 
 characters will always be composed of bytes >0x7F, a sequence 
 \uXXXX can be assumed to be valid wherever in the string it 
 occurs, and all other bytes that don't belong to an escape 
 sequence are just passed through as-is.
You cannot assume \u… to be valid if you convert it.
Aug 25 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 21:53:50 UTC, Ola Fosheim Grøstad 
wrote:
 I presume you can load 16 bytes and do BITWISE-AND on the MSB, 
 then match against string-end and carefully use this to boost 
 performance of simultanous UTF validation, escape-scanning, and 
 string-end scan. A bit tricky, of course.
I think it is doable and worth it…

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

e.g.:

__mmask16 _mm_cmpeq_epu8_mask (__m128i a, __m128i b)
__mmask32 _mm256_cmpeq_epu8_mask (__m256i a, __m256i b)
__mmask64 _mm512_cmpeq_epu8_mask (__m512i a, __m512i b)
__mmask16 _mm_test_epi8_mask (__m128i a, __m128i b)

etc.

So you can:

1. preload registers with "\\\\\\\\…", "\"\"…" and "\0\0\0…"
2. then compare signed/unsigned/equal whatever
3. then load 16, 32 or 64 bytes of data and stream until the masks trigger
4. test the masks
5. resolve any potential issues, goto 3
Aug 25 2014
parent reply "Kiith-Sa" <kiithsacmp gmail.com> writes:
On Monday, 25 August 2014 at 22:40:00 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 25 August 2014 at 21:53:50 UTC, Ola Fosheim Grøstad 
 wrote:
 I presume you can load 16 bytes and do BITWISE-AND on the MSB, 
 then match against string-end and carefully use this to boost 
 performance of simultanous UTF validation, escape-scanning, 
 and string-end scan. A bit tricky, of course.
I think it is doable and worth it… https://software.intel.com/sites/landingpage/IntrinsicsGuide/ e.g.: __mmask16 _mm_cmpeq_epu8_mask (__m128i a, __m128i b) __mmask32 _mm256_cmpeq_epu8_mask (__m256i a, __m256i b) __mmask64 _mm512_cmpeq_epu8_mask (__m512i a, __m512i b) __mmask16 _mm_test_epi8_mask (__m128i a, __m128i b) etc. So you can: 1. preload registers with "\\\\\\\\…" , "\"\"…" and "\0\0\0…" 2. then compare signed/unsigned/equal whatever. 3. then load 16,32 or 64 bytes of data and stream until the masks trigger 4. tests masks 5. resolve any potential issues, goto 3
D:YAML uses a similar approach, but with 8 bytes (plain ulong - portable) to detect how many ASCII chars there are before the first non-ASCII UTF-8 sequence, and it significantly improves performance (didn't keep any numbers unfortunately, but it decreases decoding overhead to a fraction for most inputs, since YAML (and JSON) files tend to be mostly-ASCII with non-ASCII from time to time in strings; if we know that we have e.g. 100 chars incoming that are plain ASCII, we can use a fast path for them and only consider decoding after that).

See the countASCII() function in
https://github.com/kiith-sa/D-YAML/blob/master/source/dyaml/reader.d

However, this approach is useful only if you decode the whole buffer at once, not if you do something like foreach(dchar ch; "asdsššdfáľäô") {}, which is the most obvious way to decode in D.

FWIW, decoding _was_ a significant overhead in D:YAML (again, didn't keep numbers, but at a time it was around 10% in the profiler), and I didn't like the fact that it prevented making my code @nogc - I ended up copying chunks of std.utf and making them @nogc nothrow (D:YAML as a whole is not @nogc but I use @nogc in some parts basically as "@noalloc" to ensure I don't allocate anything).
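A rough sketch of that 8-bytes-at-a-time ASCII scan (an illustrative variant, not the actual countASCII() from D:YAML's reader.d; a real implementation would read each chunk with one unaligned load instead of assembling it byte by byte):

size_t leadingASCII(const(ubyte)[] data) @safe pure nothrow @nogc
{
    enum ulong highBits = 0x8080_8080_8080_8080;
    size_t i = 0;
    // scan 8 bytes per step as long as all of them have the high bit clear
    while (i + 8 <= data.length)
    {
        ulong chunk = 0;
        foreach (j; 0 .. 8)
            chunk |= cast(ulong) data[i + j] << (8 * j);
        if (chunk & highBits) break;   // a non-ASCII byte sits somewhere in this chunk
        i += 8;
    }
    // finish the remainder (and the chunk that triggered) byte by byte
    while (i < data.length && data[i] < 0x80)
        ++i;
    return i;
}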
Aug 25 2014
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 23:24:43 UTC, Kiith-Sa wrote:
 D:YAML uses a similar approach, but with 8 bytes (plain ulong - 
 portable) to detect how many ASCII chars are there before the 
 first non-ASCII UTF-8 sequence,  and it significantly improves 
 performance (didn't keep any numbers unfortunately, but it
Cool! I think often you will have an array of numbers so you could subtract "000000000…", then parse offset-bytes and convert the mantissa/exponent using shuffles and simd. Somehow…
Aug 25 2014
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 25.08.2014 23:53, "Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 21:27:42 UTC, Sönke Ludwig wrote:
 But why should UTF validation be the job of the lexer in the first place?
Because you want to save time, it is faster to integrate validation? The most likely use scenario is to receive REST data over HTTP that needs validation. Well, so then I agree with Andrei… array of bytes it is. ;-)
 added as a separate proxy range. But if we end up going for validating
 in the lexer, it would indeed be enough to validate inside strings,
 because the rest of the grammar assumes a subset of ASCII.
Not assumes, but defines! :-)
I guess it depends on if you look at the grammar as productions or comprehensions(right term?) ;)
 If you have to validate UTF before lexing then you will end up
 needlessly scanning lots of ascii if the file contains lots of
 non-strings or is from a encoder that only sends pure ascii.
That's true. So the ideal solution would be to *assume* UTF-8 when the input is char based and to *validate* if the input is "numeric".
 If you want to have "plugin" validation of strings then you also need to
 differentiate strings so that the user can select which data should be
 just ascii, utf8, numbers, ids etc. Otherwise the user will end up doing
 double validation (you have to bypass >7F followed by string-end anyway).

 The advantage of integrated validation is that you can use 16 bytes SIMD
 registers on the buffer.

 I presume you can load 16 bytes and do BITWISE-AND on the MSB, then
 match against string-end and carefully use this to boost performance of
 simultanous UTF validation, escape-scanning, and string-end scan. A bit
 tricky, of course.
Well, that's something that's definitely out of the scope of this proposal. Definitely an interesting direction to pursue, though.
 At least no UTF validation is needed. Since all non-ASCII characters
 will always be composed of bytes >0x7F, a sequence \uXXXX can be
 assumed to be valid wherever in the string it occurs, and all other
 bytes that don't belong to an escape sequence are just passed through
 as-is.
You cannot assume \u… to be valid if you convert it.
I meant "X" to stand for a hex digit. The point was just that you don't have to worry about interacting in a bad way with UTF sequences when you find "\uXXXX".
Aug 26 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 26 August 2014 at 07:51:04 UTC, Sönke Ludwig wrote:
 That's true. So the ideal solution would be to *assume* UTF-8 
 when the input is char based and to *validate* if the input is 
 "numeric".
I think you should validate JSON-strings to be UTF-8 encoded even if 
you allow illegal unicode values. Basically ensuring that >0x7f has the 
right number of bytes after it, so you don't get >0x7f as the last byte 
in a string etc.
 Well, that's something that's definitely out of the scope of 
 this proposal. Definitely an interesting direction to pursue, 
 though.
Maybe the interface/code structure is or could be designed so that the implementation could later be version()'ed to SIMD where possible.
 You cannot assume \u… to be valid if you convert it.
I meant "X" to stand for a hex digit. The point was just that you don't have to worry about interacting in a bad way with UTF sequences when you find "\uXXXX".
When you convert "\uXXXX" to UTF-8 bytes, is it then validated as a legal code point? I guess it is not necessary. Btw, I believe rapidJSON achieves high speed by converting strings in situ, so that if the prefix is escape free it just converts in place when it hits the first escape. Thus avoiding some moving.
Aug 26 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 26.08.2014 10:24, "Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Tuesday, 26 August 2014 at 07:51:04 UTC, Sönke Ludwig wrote:
 That's true. So the ideal solution would be to *assume* UTF-8 when the
 input is char based and to *validate* if the input is "numeric".
I think you should validate JSON-strings to be UTF-8 encoded even if you allow illegal unicode values. Basically ensuring that >0x7f has the right number of bytes after it, so you don't get >0x7f as the last byte in a string etc.
I think this is a misunderstanding. What I mean is that if the input range passed to the lexer is char/wchar/dchar based, the lexer should assume that the input is well formed UTF. After all this is how D strings are defined. When on the other hand a ubyte/ushort/uint range is used, the lexer should validate all string literals.
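A small sketch of how that distinction might look inside the lexer (illustrative only, not the actual std_data_json code):

import std.range : ElementEncodingType;
import std.traits : isSomeChar;

void lexStringLiteral(R)(ref R input)
{
    static if (isSomeChar!(ElementEncodingType!R))
    {
        // char/wchar/dchar input: D defines these to hold valid UTF,
        // so scan without extra per-sequence validation
    }
    else
    {
        // ubyte/ushort/uint input: raw data off the wire,
        // validate multi-unit sequences while scanning
    }
}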
 Well, that's something that's definitely out of the scope of this
 proposal. Definitely an interesting direction to pursue, though.
Maybe the interface/code structure is or could be designed so that the implementation could later be version()'ed to SIMD where possible.
I guess that shouldn't be an issue. From the outside it's just a generic range that is passed in and internally it's always possible to add special cases for array inputs. If someone else wants to play around with this idea, we could of course also integrate it right away, it's just that I personally don't have the time to go to the extreme here.
 You cannot assume \u… to be valid if you convert it.
I meant "X" to stand for a hex digit. The point was just that you don't have to worry about interacting in a bad way with UTF sequences when you find "\uXXXX".
When you convert "\uXXXX" to UTF-8 bytes, is it then validated as a legal code point? I guess it is not necessary.
What is validated is that it forms valid UTF-16 surrogate pairs, and those are converted to a single dchar instead (if applicable). This is necessary, because otherwise the lexer would produce invalid UTF-8 for valid inputs. Apart from that, the value is used verbatim as a dchar.
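For reference, the surrogate pair combination this implies (standard UTF-16 decoding; not code quoted from the module):

// "\uD834\uDD1E" in a JSON string is a high/low surrogate pair
wchar hi = 0xD834, lo = 0xDD1E;
assert(hi >= 0xD800 && hi <= 0xDBFF);   // high surrogate range
assert(lo >= 0xDC00 && lo <= 0xDFFF);   // low surrogate range
dchar combined = cast(dchar)(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
assert(combined == 0x1D11E);            // U+1D11E MUSICAL SYMBOL G CLEF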
 Btw, I believe rapidJSON achieves high speed by converting strings in
 situ, so that if the prefix is escape free it just converts in place
 when it hits the first escape. Thus avoiding some moving.
The same is true for this lexer, at least for array inputs. It actually currently just stores a slice of the string literal in all cases and lazily decodes on the first access. While doing that, it first skips any escape sequence free prefix and returns a slice if the whole string is escape sequence free.
Aug 26 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 26 August 2014 at 09:05:05 UTC, Sönke Ludwig wrote:
 When on the other hand a ubyte/ushort/uint range is used, the 
 lexer should validate all string literals.
Yes, so this will be supported? Because this is what is most useful.
Aug 26 2014
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 26.08.2014 11:11, "Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Tuesday, 26 August 2014 at 09:05:05 UTC, Sönke Ludwig wrote:
 When on the other hand a ubyte/ushort/uint range is used, the lexer
 should validate all string literals.
Yes, so this will be supported? Because this is what is most useful.
If nobody plays a veto card, I'll implement it that way.
Aug 26 2014
prev sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
Btw, maybe it would be a good idea to take a look on the JSON 
that various browsers generates to see if there are any 
differences?

Then one could tune optimizations to what is the most common 
coding, like this:

1. start parsing assuming "browser style restricted JSON" grammar.

2. on failure jump to the slower "generic JSON"

Chrome does not seem to generate whitespace in JSON.stringify(). 
And I would not be surprised if the encoding of double is similar 
across browsers.

Ola.
Aug 25 2014
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/25/2014 1:35 PM, Sönke Ludwig wrote:
 BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159, which is
 another argument for just letting the lexer assume valid UTF.
I think that settles it.
Aug 25 2014
prev sibling next sibling parent reply Andrej Mitrovic via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 8/22/14, Sönke Ludwig <digitalmars-d puremagic.com> wrote:
 Docs: http://s-ludwig.github.io/std_data_json/
This confused me for a solid minute:

// Lex a JSON string into a lazy range of tokens
auto tokens = lexJSON(`{"name": "Peter", "age": 42}`);
with (JSONToken.Kind) {
    assert(tokens.map!(t => t.kind).equal(
        [objectStart, string, colon, string, comma,
         string, colon, number, objectEnd]));
}

Generally I'd avoid using de-facto reserved names as enum member names (e.g. string).
Aug 22 2014
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 22.08.2014 21:15, Andrej Mitrovic via Digitalmars-d wrote:
 On 8/22/14, Sönke Ludwig <digitalmars-d puremagic.com> wrote:
 Docs: http://s-ludwig.github.io/std_data_json/
This confused me for a solid minute: // Lex a JSON string into a lazy range of tokens auto tokens = lexJSON(`{"name": "Peter", "age": 42}`); with (JSONToken.Kind) { assert(tokens.map!(t => t.kind).equal( [objectStart, string, colon, string, comma, string, colon, number, objectEnd])); } Generally I'd avoid using de-facto reserved names as enum member names (e.g. string).
Hmmm, but it *is* a string. Isn't the problem more the use of with in this case? Maybe the example should just use with(JSONToken) and then Kind.string?
Aug 22 2014
parent Andrej Mitrovic via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 8/22/14, Sönke Ludwig <digitalmars-d puremagic.com> wrote:
 Hmmm, but it *is* a string. Isn't the problem more the use of with in
 this case?
Yeah, maybe so. I thought for a second it was a tuple, but then I saw the square brackets and was left scratching my head. :)
Aug 23 2014
prev sibling next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
First thank you for your work. std.json is horrible to use right 
now, so a replacement is more than welcome.

I haven't played with your code yet, so I may be asking for 
somethign that already exists, but did you had a look to jsvar by 
Adam ?

You can find it here: 
https://github.com/adamdruppe/arsd/blob/master/jsvar.d

One of the big pains when one works with a format like JSON is that 
you go from the untyped world to the typed world (the same 
problem occurs with XML and various config formats as well).

I think Adam got the right balance in jsvar. It behaves closely 
enough to javascript that it is convenient to manipulate, while 
removing the most dangerous behavior (concatenation is still done 
using ~ and not + as in JS).

If that is not already the case, I'd love for the elements I get 
out of my JSON to behave that way. If you can do that, you have a 
user.
Aug 22 2014
next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 23 Aug 2014 02:23:25 +0000
deadalnix via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I haven't played with your code yet, so I may be asking for
 somethign that already exists, but did you had a look to jsvar by
 Adam ?
jsvar uses opDispatch, and Sönke wrote:
  - No opDispatch() for JSONValue - this has shown to do more harm than
    good in vibe.data.json
Aug 22 2014
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 23.08.2014 04:23, deadalnix wrote:
 First thank you for your work. std.json is horrible to use right now, so
 a replacement is more than welcome.

 I haven't played with your code yet, so I may be asking for somethign
 that already exists, but did you had a look to jsvar by Adam ?

 You can find it here:
 https://github.com/adamdruppe/arsd/blob/master/jsvar.d

 One of the big pain when one work with format like JSON is that you go
 from the untyped world to the typed world (the same problem occurs with
 XML and various config format as well).

 I think Adam got the right balance in jsvar. It behave closely enough to
 javascript so it is convenient to manipulate, while removing the most
 dangerous behavior (concatenation is still done using ~and not + as in JS).

 If that is not already the case, I'd love that the element I get out of
 my JSON behave that way. If you can do that, you have a user.
Setting the issue of opDispatch aside, one of the goals was to use Algebraic to store values. It is probably not completely as flexible as jsvar, but still transparently enables a lot of operations (with those pull requests merged, at least). But it has another big advantage, which is that we can later define other types based on Algebraic, such as BSONValue, and those can be transparently runtime converted between each other in a generic way. A special case type on the other hand produces nasty dependencies between the formats.

Main issues of using opDispatch:

 - Prone to bugs where a normal field/method of the JSONValue struct is
   accessed instead of a JSON field
 - On top of that the var.field syntax gives the wrong impression that
   you are working with static typing, while var["field"] makes it clear
   that runtime indexing is going on
 - Every interface change of JSONValue would be a silent breaking change,
   because the whole string domain is used up for opDispatch
Aug 23 2014
next sibling parent reply "w0rp" <devw0rp gmail.com> writes:
On Saturday, 23 August 2014 at 09:22:01 UTC, Sönke Ludwig wrote:
 Main issues of using opDispatch:

  - Prone to bugs where a normal field/method of the JSONValue 
 struct is accessed instead of a JSON field
  - On top of that the var.field syntax gives the wrong 
 impression that you are working with static typing, while 
 var["field"] makes it clear that runtime indexing is going on
  - Every interface change of JSONValue would be a silent 
 breaking change, because the whole string domain is used up for 
 opDispatch
I have seen similar issues to these with simplexml in PHP. Using opDispatch to match all possible names except a few doesn't work so well. I'm not sure if you've changed it already, but I agree with the earlier comment about changing the flag for pretty printing from a boolean to an enum value. Booleans in interfaces is one of my pet peeves.
Aug 23 2014
parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
On 23.08.2014 14:19, w0rp wrote:
 I'm not sure if you've changed it already, but I agree with the earlier
 comment about changing the flag for pretty printing from a boolean to an
 enum value. Booleans in interfaces is one of my pet peeves.
It's split into two separate functions now. Having to type out a full enum value I guess would be too distracting in this case, since they will be pretty frequently used.
Aug 23 2014
prev sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 23 August 2014 at 09:22:01 UTC, Sönke Ludwig wrote:
 Main issues of using opDispatch:

  - Prone to bugs where a normal field/method of the JSONValue 
 struct is accessed instead of a JSON field
  - On top of that the var.field syntax gives the wrong 
 impression that you are working with static typing, while 
 var["field"] makes it clear that runtime indexing is going on
  - Every interface change of JSONValue would be a silent 
 breaking change, because the whole string domain is used up for 
 opDispatch
Yes, I don't mind missing that one. It looks like a false good idea.
Aug 23 2014
prev sibling next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
I've added support (compile time option [1]) for long and BigInt in the 
lexer (and parser), see [2]. JSONValue currently still only stores 
double for numbers. There are two options for extending JSONValue:

1. Add long and BigInt to the set of supported types for JSONValue. This 
preserves all features of Algebraic and would later still allow 
transparent conversion to other similar value types (e.g. BSONValue). On 
the other hand it would be necessary to always check the actual type 
before accessing a number, or the Algebraic would throw.

2. Instead of double, store a JSONNumber in the Algebraic. This enables 
all the transparent conversions of JSONNumber and would thus be more 
convenient, but blocks the way for possible automatic conversions in the 
future.

I'm leaning towards 1, because allowing generic conversion between 
different JSONValue-like types was one of my prime goals for the new module.

[1]: 
http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/LexOptions.html
[2]: 
http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/JSONNumber.html
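To make the trade-off in option 1 concrete, here is a rough sketch of the type checking that user code would have to do (the Algebraic layout below is assumed for illustration only, not taken from the module):

import std.bigint : BigInt;
import std.variant : Algebraic;

// assumed payload, for illustration only
alias Payload = Algebraic!(typeof(null), bool, double, long, BigInt, string);

double asDouble(Payload v)
{
    // with several numeric types in the Algebraic, the caller has to check
    // the stored type first, otherwise get!double would throw for a long
    if (v.type == typeid(long))
        return cast(double) v.get!long;
    if (v.type == typeid(double))
        return v.get!double;
    throw new Exception("BigInt would need an explicit (possibly lossy) conversion");
}

With option 2, JSONNumber's transparent conversions would hide that branching from the caller instead.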
Aug 25 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 11:30:15 UTC, Sönke Ludwig wrote:
 I've added support (compile time option [1]) for long and 
 BigInt in the lexer (and parser), see [2]. JSONValue currently 
 still only stores double for numbers.
It can be very useful to have a base 10 exponent representation in certain situations where you need to have the exact same results in two systems (like a third party ERP server versus a client side application). Base 2 exponents are tricky (incorrect) when you read ascii. E.g. I have resorted to using Decimal in Python just to avoid the weird round off issues when calculating prices where the price is given in fractions of the order unit. Perhaps a marginal problem, but could be important for some serious application areas where you need to integrate D with existing systems (for which you don't have the source code).
Aug 25 2014
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 25.08.2014 14:12, "Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 11:30:15 UTC, Sönke Ludwig wrote:
 I've added support (compile time option [1]) for long and BigInt in
 the lexer (and parser), see [2]. JSONValue currently still only stores
 double for numbers.
It can be very useful to have a base 10 exponent representation in certain situations where you need to have the exact same results in two systems (like a third party ERP server versus a client side application). Base 2 exponents are tricky (incorrect) when you read ascii. E.g. I have resorted to using Decimal in Python just to avoid the weird round off issues when calculating prices where the price is given in fractions of the order unit. Perhaps a marginal problem, but could be important for some serious application areas where you need to integrate D with existing systems (for which you don't have the source code).
In fact, I've already prepared the code for that, but commented it out for now, because I wanted to have an efficient algorithm for converting double to Decimal and because we should probably first add a Decimal type to Phobos instead of adding it to the JSON module.
Aug 25 2014
prev sibling next sibling parent reply "Don" <x nospam.com> writes:
On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've 
 picked up the work (a lot earlier than anticipated) and 
 finished a first version of a loose blend of said 
 std.jgrandson, vibe.data.json and some changes that I had 
 planned for vibe.data.json for a while. I'm quite pleased by 
 the results so far, although without a serialization framework 
 it still misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json

 The new code contains:
  - Lazy lexer in the form of a token input range (using slices 
 of the
    input if possible)
  - Lazy streaming parser (StAX style) in the form of a node 
 input range
  - Eager DOM style parser returning a JSONValue
  - Range based JSON string generator taking either a token 
 range, a
    node range, or a JSONValue
  - Opt-out location tracking (line/column) for tokens, nodes 
 and values
  - No opDispatch() for JSONValue - this has shown to do more 
 harm than
    good in vibe.data.json

 The DOM style JSONValue type is based on std.variant.Algebraic. 
 This currently has a few usability issues that can be solved by 
 upgrading/fixing Algebraic:

  - Operator overloading only works sporadically
  - No "tag" enum is supported, so that switch()ing on the type 
 of a
    value doesn't work and an if-else cascade is required
  - Operations and conversions between different Algebraic types 
 is not
    conveniently supported, which gets important when other 
 similar
    formats get supported (e.g. BSON)

 Assuming that those points are solved, I'd like to get some 
 early feedback before going for an official review. One open 
 issue is how to handle unescaping of string literals. Currently 
 it always unescapes immediately, which is more efficient for 
 general input ranges when the unescaped result is needed, but 
 less efficient for string inputs when the unescaped result is 
 not needed. Maybe a flag could be used to conditionally switch 
 behavior depending on the input range type.

 Destroy away! ;)

 [1]: http://forum.dlang.org/thread/lrknjl$co7$1 digitalmars.com
One missing feature (which is also missing from the existing std.json) is support for NaN and Infinity as JSON values. Although they are not part of the formal JSON spec (which is a ridiculous omission; the argument given for excluding them is fallacious), they do get generated if you use Javascript's toString to create the JSON. Many JSON libraries (eg Google's) also generate them, so they are frequently encountered in practice. So a JSON parser should at least be able to lex them.

ie this should be parsable:

{"foo": NaN, "bar": Infinity, "baz": -Infinity}

You should also put tests in for what happens when you pass NaN or infinity to toJSON. It shouldn't silently generate invalid JSON.
Aug 25 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 13:07:08 UTC, Don wrote:
 practice. So a JSON parser should at least be able to lex them.

 ie this should be parsable:

 {"foo": NaN, "bar": Infinity, "baz": -Infinity}

 You should also put tests in for what happens when you pass NaN 
 or infinity to toJSON. It shouldn't silently generate invalid 
 JSON.
I believe you are allowed to use very high exponents, though, like 1E999. So you need to decide if those should be mapped to +Infinity or to the max value…

NaNs also come in two forms with differing semantics: signalling (NaNs) and quiet (NaN). NaN is used for 0/0 and sqrt(-1), but NaNs is used for illegal values and failure.

For some reason D does not seem to support this aspect of IEEE 754? I cannot find ".nans" listed on the page http://dlang.org/property.html

The distinction is important when you do conditional branching. With NaNs you might not be able to figure out which branch to take since you might have missed out on a real value; with NaN you got the value (which is known to be not real) and you might be able to branch.
Aug 25 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/25/2014 6:23 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 13:07:08 UTC, Don wrote:
 practice. So a JSON parser should at least be able to lex them.

 ie this should be parsable:

 {"foo": NaN, "bar": Infinity, "baz": -Infinity}

 You should also put tests in for what happens when you pass NaN or infinity to
 toJSON. It shouldn't silently generate invalid JSON.
I believe you are allowed to use very high exponents, though. Like: 1E999 . So you need to decide if those should be mapped to +Infinity or to the max value…
Infinity. Mapping to max value would be a horrible bug.
 NaN also come in two forms with differing semantics: signalling(NaNs) and quiet
 (NaN).  NaN is used for 0/0 and sqrt(-1), but NaNs is used for illegal values
 and failure.

 For some reason D does not seem to support this aspect of IEEE754? I cannot
find
 ".nans" listed on the page http://dlang.org/property.html
Because I tried supporting them in C++. It doesn't work for various reasons. Nobody else supports them, either.
Aug 25 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 19:42:03 UTC, Walter Bright wrote:
 Infinity. Mapping to max value would be a horrible bug.
Yes… but then you are reading an illegal value that JSON does not support…
 For some reason D does not seem to support this aspect of 
 IEEE754? I cannot find
 ".nans" listed on the page http://dlang.org/property.html
Because I tried supporting them in C++. It doesn't work for various reasons. Nobody else supports them, either.
I haven't tested, but Python is supposed to throw on NaNs. gcc has support for nans in their documentation: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html IBM Fortran supports it… I think supporting signaling NaN is important for correctness.
Aug 25 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 20:04:10 UTC, Ola Fosheim Grøstad 
wrote:
 I think supporting signaling NaN is important for correctness.
It is defined in C++11: http://en.cppreference.com/w/cpp/types/numeric_limits/signaling_NaN
Aug 25 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/25/2014 1:21 PM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 20:04:10 UTC, Ola Fosheim Grøstad wrote:
 I think supporting signaling NaN is important for correctness.
It is defined in C++11: http://en.cppreference.com/w/cpp/types/numeric_limits/signaling_NaN
I didn't know that. But recall I did implement it in DMC++, and it turned out to simply not be useful. I'd be surprised if the new C++ support for it does anything worthwhile.
Aug 25 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 21:24:11 UTC, Walter Bright wrote:
 I didn't know that. But recall I did implement it in DMC++, and 
 it turned out to simply not be useful. I'd be surprised if the 
 new C++ support for it does anything worthwhile.
Well, one should initialize with signaling NaN. Then you get an exception if you try to compute using uninitialized values.
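
For what it's worth, D already lets you unmask the hardware 
exception that this scheme relies on via 
std.math.FloatingPointControl. A minimal sketch (0.0/0.0 stands 
in for arithmetic on an uninitialized signalling NaN, since 
both count as invalid operations; assumes a target where these 
exceptions are available):

---
import std.math : FloatingPointControl;

void main()
{
    FloatingPointControl fpctrl;
    fpctrl.enableExceptions(FloatingPointControl.invalidException);

    double x = 0.0;
    double y = x / x; // invalid operation -> hardware trap (SIGFPE)
                      // instead of quietly producing a NaN
}
---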
Aug 25 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/25/2014 4:15 PM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 21:24:11 UTC, Walter Bright wrote:
 I didn't know that. But recall I did implement it in DMC++, and it turned out
 to simply not be useful. I'd be surprised if the new C++ support for it does
 anything worthwhile.
Well, one should initialize with signaling NaN. Then you get an exception if you try to compute using uninitialized values.
That's the theory. The practice doesn't work out so well.
Aug 25 2014
parent reply "Don" <x nospam.com> writes:
On Monday, 25 August 2014 at 23:29:21 UTC, Walter Bright wrote:
 On 8/25/2014 4:15 PM, "Ola Fosheim Grøstad" 
 <ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 21:24:11 UTC, Walter Bright wrote:
 I didn't know that. But recall I did implement it in DMC++, 
 and it turned out
 to simply not be useful. I'd be surprised if the new C++ 
 support for it does
 anything worthwhile.
Well, one should initialize with signaling NaN. Then you get an exception if you try to compute using uninitialized values.
That's the theory. The practice doesn't work out so well.
To be more concrete: Processors from AMD have signalling NaN behaviour which is different from processors from Intel. And the situation is worst on most other architectures. It's a lost cause, I think.
Aug 26 2014
next sibling parent reply "Ola Fosheim Gr" <ola.fosheim.grostad+dlang gmail.com> writes:
On Tuesday, 26 August 2014 at 07:24:19 UTC, Don wrote:
 Processors from AMD have signalling NaN behaviour which is 
 different from processors from Intel.

 And the situation is worst on most other architectures. It's a 
 lost cause, I think.
I disagree. AFAIK signaling NaN was standardized in IEEE 754-2008. So it receives attention.
Aug 26 2014
parent reply "Don" <x nospam.com> writes:
On Tuesday, 26 August 2014 at 07:34:05 UTC, Ola Fosheim Gr wrote:
 On Tuesday, 26 August 2014 at 07:24:19 UTC, Don wrote:
 Processors from AMD have signalling NaN behaviour which is 
 different from processors from Intel.

 And the situation is worst on most other architectures. It's a 
 lost cause, I think.
I disagree. AFAIK signaling NaN was standardized in IEEE 754-2008. So it receives attention.
It was always in IEEE754. The decision in 754-2008 was simply 
to not remove it from the spec (a lot of people wanted to 
remove it). I don't think anything has changed.

The point is, existing hardware does not support it 
consistently. It's not possible at reasonable cost.

---
real uninitialized_var = real.snan;

void foo()
{
    real other_var = void;
    asm
    {
        fld uninitialized_var;
        fstp other_var;
    }
}
---

will signal on AMD, but not Intel. I'd love for this to work, 
but the hardware is fighting against us. I think it's useful 
only for debugging.
Aug 26 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 26 August 2014 at 10:55:20 UTC, Don wrote:
 It was always in IEEE754. The decision in 754-2008 was simply 
 to not remove it from the spec (a lot of people wanted to 
 remove it). I don't think anything has changed.
It was implementation defined before. I think they specified the bit in 2008.
      fld uninitialized_var;
      fstp other_var;
This is not SSE, but I guess MOVSS does not create exceptions either. AVX is quite complicated, but searching for "signaling" gives some hints about the semantics you can rely on. https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf Ola.
Aug 26 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 26 August 2014 at 12:37:58 UTC, Ola Fosheim Grøstad 
wrote:

 either. AVX is quite complicated, but searching for "signaling" 
 gives some hints about the semantics you can rely on.
…
 https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf
(Actually, searching for "SNAN" is better…)
Aug 26 2014
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
With the danger of being noisy, these instructions are subject to 
floating point exceptions according to my (perhaps sloppy) 
reading of Intel Architecture Instruction Set Extensions 
Programming Reference (2012):

(V)ADDPD, (V)ADDPS, (V)ADDSUBPD, (V)ADDSUBPS, (V)CMPPD, (V)CMPPS, 
(V)CVTDQ2PS, (V)CVTPD2DQ, (V)CVTPD2PS, (V)CVTPS2DQ, (V)CVTTPD2DQ, 
(V)CVTTPS2DQ, (V)DIVPD, (V)DIVPS, (V)DPPD*, (V)DPPS*, 
VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADD132PS, VFMADD213PS, 
VFMADD231PS, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD, 
VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS, VFMSUBADD132PD, 
VFMSUBADD213PD, VFMSUBADD231PD, VFMSUBADD132PS, VFMSUBADD213PS, 
VFMSUBADD231PS, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, 
VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFNMADD132PD, 
VFNMADD213PD, VFNMADD231PD, VFNMADD132PS, VFNMADD213PS, 
VFNMADD231PS, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, 
VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, (V)HADDPD, (V)HADDPS, 
(V)HSUBPD, (V)HSUBPS, (V)MAXPD, (V)MAXPS, (V)MINPD, (V)MINPS, 
(V)MULPD, (V)MULPS, (V)ROUNDPS, (V)ROUNDPS, (V)SQRTPD, (V)SQRTPS, 
(V)SUBPD, (V)SUBPS

(V)ADDSD, (V)ADDSS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, 
(V)CVTPS2PD, (V)CVTSD2SI, (V)CVTSD2SS, (V)CVTSI2SD, (V)CVTSI2SS, 
(V)CVTSS2SD, (V)CVTSS2SI, (V)CVTTSD2SI, (V)CVTTSS2SI, (V)DIVSD, 
(V)DIVSS, VFMADD132SD, VFMADD213SD, VFMADD231SD, VFMADD132SS, 
VFMADD213SS, VFMADD231SS, VFMSUB132SD, VFMSUB213SD, VFMSUB231SD, 
VFMSUB132SS, VFMSUB213SS, VFMSUB231SS, VFNMADD132SD, 
VFNMADD213SD, VFNMADD231SD, VFNMADD132SS, VFNMADD213SS, 
VFNMADD231SS, VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD, 
VFNMSUB132SS, VFNMSUB213SS, VFNMSUB231SS, (V)MAXSD, (V)MAXSS, 
(V)MINSD, (V)MINSS, (V)MULSD, (V)MULSS, (V)ROUNDSD, (V)ROUNDSS, 
(V)SQRTSD, (V)SQRTSS, (V)SUBSD, (V)SUBSS, (V)UCOMISD, (V)UCOMISS

VCVTPH2PS, VCVTPS2PH

So I guess Intel floating point exceptions trigger on 
computations, but not on moves?

Ola.
Aug 26 2014
prev sibling parent reply "Don" <x nospam.com> writes:
On Tuesday, 26 August 2014 at 12:37:58 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 26 August 2014 at 10:55:20 UTC, Don wrote:
 It was always in IEEE754. The decision in 754-2008 was simply 
 to not remove it from the spec (a lot of people wanted to 
 remove it). I don't think anything has changed.
It was implementation defined before. I think they specified the bit in 2008.
     fld uninitialized_var;
     fstp other_var;
This is not SSE, but I guess MOVSS does not create exceptions either.
No, it's more subtle. On the original x87, signalling NaNs are 
triggered for 64-bit loads, but not for 80-bit loads. You have 
to read the fine print to discover this. I don't think the 
behaviour was intentional.
Aug 26 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 26 August 2014 at 13:24:11 UTC, Don wrote:
 No, it's more subtle. On the original x87, signalling NaNs are 
 triggered for 64 bits loads, but not for 80 bit loads. You have 
 to read the fine print to discover this.
You are right, but it happens for loads from the FP-stack too: «Source operand is an SNaN. Does not occur if the source operand is in double extended-precision floating-point format (FLD m80fp or FLD ST(i)).»
 I don't think the behaviour was intentional.
It seems reasonable; you need to load/save NaNs without 
exceptions if you do a context switch? I don't think the 
extended format was meant for "end users".

Anyway, the x87 FP stack is history, even MOVSS is considered 
legacy by Intel…
Aug 26 2014
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 26 August 2014 at 13:43:56 UTC, Ola Fosheim Grøstad 
wrote:
 Anyway, the x87 FP stack is history, even MOVSS is considered 
 legacy by Intel…
Sorry for being off-topic, but MOVSS and VMOVSS on AMD don't 
throw FP exceptions either, but calculations do. So it seems 
like AMD and Intel are sufficiently close for D to support 
NaNs, IMHO. Forget the legacy…

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/26568_APM_v41.pdf

Ola.
Aug 26 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/26/2014 12:24 AM, Don wrote:
 On Monday, 25 August 2014 at 23:29:21 UTC, Walter Bright wrote:
 On 8/25/2014 4:15 PM, "Ola Fosheim Grøstad"
 <ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 21:24:11 UTC, Walter Bright wrote:
 I didn't know that. But recall I did implement it in DMC++, and it turned out
 to simply not be useful. I'd be surprised if the new C++ support for it does
 anything worthwhile.
Well, one should initialize with signaling NaN. Then you get an exception if you try to compute using uninitialized values.
That's the theory. The practice doesn't work out so well.
To be more concrete: Processors from AMD have signalling NaN behaviour which is different from processors from Intel. And the situation is worst on most other architectures. It's a lost cause, I think.
The other issues were just when the snan => qnan conversion took place. This is quite unclear given the extensive constant folding, CTFE, etc., that D does. It was also affected by how dmd generates code. Some code gen on floating point doesn't need the FPU, such as toggling the sign bit. But then what happens with snan => qnan? The whole thing is an undefined, unmanageable mess.
Aug 27 2014
parent reply "Don" <x nospam.com> writes:
On Wednesday, 27 August 2014 at 23:51:54 UTC, Walter Bright wrote:
 On 8/26/2014 12:24 AM, Don wrote:
 On Monday, 25 August 2014 at 23:29:21 UTC, Walter Bright wrote:
 On 8/25/2014 4:15 PM, "Ola Fosheim Grøstad"
 <ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Monday, 25 August 2014 at 21:24:11 UTC, Walter Bright 
 wrote:
 I didn't know that. But recall I did implement it in DMC++, 
 and it turned out
 to simply not be useful. I'd be surprised if the new C++ 
 support for it does
 anything worthwhile.
Well, one should initialize with signaling NaN. Then you get an exception if you try to compute using uninitialized values.
That's the theory. The practice doesn't work out so well.
To be more concrete: Processors from AMD have signalling NaN behaviour which is different from processors from Intel. And the situation is worst on most other architectures. It's a lost cause, I think.
The other issues were just when the snan => qnan conversion took place. This is quite unclear given the extensive constant folding, CTFE, etc., that D does. It was also affected by how dmd generates code. Some code gen on floating point doesn't need the FPU, such as toggling the sign bit. But then what happens with snan => qnan? The whole thing is an undefined, unmanageable mess.
I think the way to think of it is: to the programmer, there is 
*no such thing* as an snan value. It's an implementation detail 
that should be invisible. Semantically, a signalling nan is a 
qnan value with a hardware breakpoint on it.

An SNAN should never enter the CPU. The CPU always converts 
them to QNAN if you try. You're kind of not supposed to know 
that SNAN exists.

Because of this, I think SNAN only ever makes sense for static 
variables. Setting local variables to snan doesn't make sense, 
since the snan has to enter the CPU. Making that work without 
triggering the snan is very painful. Making it trigger the snan 
on all forms of access is even worse.

If float.init exists, it cannot be an snan, since you are 
allowed to use float.init.
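
A minimal illustration of that last point:

---
void main()
{
    float x;        // default-initialized to float.init, a quiet NaN
    float y = x;    // using .init is legal and must not trap,
                    // so it cannot be a signalling NaN
    assert(x != x); // NaN compares unequal to itself
}
---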
Aug 28 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Thursday, 28 August 2014 at 11:09:16 UTC, Don wrote:
 I think the way to think of it is, to the programmer, there is 
 *no such thing* as an snan value. It's an implementation detail 
 that should be invisible.
 Semantically, a signalling nan is a qnan value with a hardware 
 breakpoint on it.
I disagree with this view.

QNAN: there is a value, but it does not result in a real
SNAN: the value is missing for an unspecified reason

AFAIK some x86 ops such as ROUNDPD allow you to treat SNAN as 
QNAN or throw an exception. So there is a built-in test if 
needed. Other ops such as reciprocals don't throw any FP 
exceptions and will treat SNAN as QNAN.
 An SNAN should never enter the CPU. The CPU always converts 
 them to QNAN if you try. You're kind of not supposed to know 
 that SNAN exists.
I'm not sure how you reached this interpretation? The solution should be to emit a test for SNAN explicitly or implicitly if you cannot prove that SNAN is impossible.
Aug 28 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
Or to be more explicit:

If have SNAN then there is no point in trying to recompute the 
expression using a different algorithm.

If have QNAN then you might want to recompute the expression 
using a different algorithm (e.g. complex numbers or 
analytically).

?
Aug 28 2014
parent reply "Don" <x nospam.com> writes:
On Thursday, 28 August 2014 at 12:10:58 UTC, Ola Fosheim Grøstad 
wrote:
 Or to be more explicit:

 If have SNAN then there is no point in trying to recompute the 
 expression using a different algorithm.

 If have QNAN then you might want to recompute the expression 
 using a different algorithm (e.g. complex numbers or 
 analytically).

 ?
No. Once you load an SNAN, it isn't an SNAN any more! It is a 
QNAN.

You cannot have an SNAN in a floating-point register (unless 
you do a nasty hack to pass it in). It gets converted during 
loading.

const float x = snan;
x = x;

// x is now a qnan.
Aug 28 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Thursday, 28 August 2014 at 14:43:30 UTC, Don wrote:
 No. Once you load an SNAN, it isn't an SNAN any more! It is a 
 QNAN.
By which definition? It is only if you consume the SNAN with an 
fp-exception-free arithmetic op that it should be turned into a 
QNAN. If you compute with an op that throws then it should 
throw an exception. MOV should not be viewed as a computation…

It also makes sense to save SNAN to file when converting 
corrupted data-files. SNAN could then mean "corrupted" and QNAN 
could mean "absent". You should not get an exception for 
loading a file. You should get an exception if you start 
computing on the SNAN in the file.
 You cannot have an SNAN in a floating-point register (unless 
 you do a nasty hack to pass it in). It gets converted during 
 loading.
I don't understand this position. If you cannot load SNAN then why does SSE handle SNAN in arithmetic ops and compares?
 const float x = snan;
 x = x;

 // x is now a qnan.
I disagree (and why const?). Assignment does nothing; it should 
not consume the SNAN. Assignment is just "naming". It is not 
"computing".
Aug 28 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
Let me try again:

SNAN => unfortunately absent

QNAN => deliberately absent

So you can have:

compute(SNAN) => handle(exception) {
    if(can turn unfortunate situation into deliberate)
    then compute(QNAN)
    else throw
)
Aug 28 2014
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
Kahan states this in a 1997 paper:

«[…]An SNaN may be moved ( copied ) without incident, but any 
other arithmetic operation upon an SNaN is an INVALID operation ( 
and so is loading one onto the ix87's stack ) that must trap or 
else produce a new nonsignaling NaN. ( Another way to turn an 
SNaN into a NaN is to turn 0xxx...xxx into 1xxx...xxx with a 
logical OR.) Intended for, among other things, data missing from 
statistical collections, and for uninitialized variables[…]»

( http://www.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF)
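
The quiet/signalling distinction Kahan describes is a single 
bit of the fraction field; a minimal sketch of the logical-OR 
trick for double (whether the signalling bit survives a trip 
through the x87 is exactly the portability problem discussed 
above):

---
void main()
{
    // IEEE 754 double: all exponent bits set means NaN; the most
    // significant fraction bit (bit 51) is the quiet bit.
    ulong snanBits = 0x7FF0_0000_0000_0001UL; // quiet bit clear: SNaN
    ulong qnanBits = snanBits | (1UL << 51);  // OR in the quiet bit: QNaN

    double snan = *cast(double*) &snanBits;
    double qnan = *cast(double*) &qnanBits;

    assert(snan != snan && qnan != qnan); // both compare as NaN
}
---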

x87 is legacy; it predates IEEE754 by 5 years and should be 
forgotten.

Note also that the string representation for a signalling nan is 
"NANS", so it reasonable to save it to file if you need to 
represent missing data. "NAN" represents 0/0, sqrt(-1), not 
missing data.

I'm not really sure how it can be interpreted differently?

Ola.
Aug 28 2014
prev sibling parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Don"  wrote in message news:fvxmsrbicgpqkkiufdyv forum.dlang.org...

 If float.init exists, it cannot be an snan, since you are allowed to use 
 float.init.
So should we get rid of them from the language completely? 
Using them as template parameters doesn't even respect the sign 
of the NaN last time I checked, let alone the s/q bit or 
payload.

If we change float.init to be a qnan then it won't be possible 
to make one at compile time.
Aug 28 2014
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 25.08.2014 15:07, schrieb Don:
 On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've picked up
 the work (a lot earlier than anticipated) and finished a first version
 of a loose blend of said std.jgrandson, vibe.data.json and some
 changes that I had planned for vibe.data.json for a while. I'm quite
 pleased by the results so far, although without a serialization
 framework it still misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json

 The new code contains:
  - Lazy lexer in the form of a token input range (using slices of the
    input if possible)
  - Lazy streaming parser (StAX style) in the form of a node input range
  - Eager DOM style parser returning a JSONValue
  - Range based JSON string generator taking either a token range, a
    node range, or a JSONValue
  - Opt-out location tracking (line/column) for tokens, nodes and values
  - No opDispatch() for JSONValue - this has shown to do more harm than
    good in vibe.data.json

 The DOM style JSONValue type is based on std.variant.Algebraic. This
 currently has a few usability issues that can be solved by
 upgrading/fixing Algebraic:

  - Operator overloading only works sporadically
  - No "tag" enum is supported, so that switch()ing on the type of a
    value doesn't work and an if-else cascade is required
  - Operations and conversions between different Algebraic types is not
    conveniently supported, which gets important when other similar
    formats get supported (e.g. BSON)

 Assuming that those points are solved, I'd like to get some early
 feedback before going for an official review. One open issue is how to
 handle unescaping of string literals. Currently it always unescapes
 immediately, which is more efficient for general input ranges when the
 unescaped result is needed, but less efficient for string inputs when
 the unescaped result is not needed. Maybe a flag could be used to
 conditionally switch behavior depending on the input range type.

 Destroy away! ;)

 [1]: http://forum.dlang.org/thread/lrknjl$co7$1 digitalmars.com
One missing feature (which is also missing from the existing std.json) is support for NaN and Infinity as JSON values. Although they are not part of the formal JSON spec (which is a ridiculous omission, the argument given for excluding them is fallacious), they do get generated if you use Javascript's toString to create the JSON. Many JSON libraries (eg Google's) also generate them, so they are frequently encountered in practice. So a JSON parser should at least be able to lex them. ie this should be parsable: {"foo": NaN, "bar": Infinity, "baz": -Infinity}
This would probably best added as another (CT) optional feature. I think the default should strictly adhere to the JSON specification, though.
 You should also put tests in for what happens when you pass NaN or
 infinity to toJSON. It shouldn't silently generate invalid JSON.
Good point. The current solution to just use formattedWrite("%.16g") is also not ideal.
Aug 25 2014
next sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 25.08.2014 16:04, schrieb Sönke Ludwig:
 Am 25.08.2014 15:07, schrieb Don:
 On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've picked up
 the work (a lot earlier than anticipated) and finished a first version
 of a loose blend of said std.jgrandson, vibe.data.json and some
 changes that I had planned for vibe.data.json for a while. I'm quite
 pleased by the results so far, although without a serialization
 framework it still misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json

 The new code contains:
  - Lazy lexer in the form of a token input range (using slices of the
    input if possible)
  - Lazy streaming parser (StAX style) in the form of a node input range
  - Eager DOM style parser returning a JSONValue
  - Range based JSON string generator taking either a token range, a
    node range, or a JSONValue
  - Opt-out location tracking (line/column) for tokens, nodes and values
  - No opDispatch() for JSONValue - this has shown to do more harm than
    good in vibe.data.json

 The DOM style JSONValue type is based on std.variant.Algebraic. This
 currently has a few usability issues that can be solved by
 upgrading/fixing Algebraic:

  - Operator overloading only works sporadically
  - No "tag" enum is supported, so that switch()ing on the type of a
    value doesn't work and an if-else cascade is required
  - Operations and conversions between different Algebraic types is not
    conveniently supported, which gets important when other similar
    formats get supported (e.g. BSON)

 Assuming that those points are solved, I'd like to get some early
 feedback before going for an official review. One open issue is how to
 handle unescaping of string literals. Currently it always unescapes
 immediately, which is more efficient for general input ranges when the
 unescaped result is needed, but less efficient for string inputs when
 the unescaped result is not needed. Maybe a flag could be used to
 conditionally switch behavior depending on the input range type.

 Destroy away! ;)

 [1]: http://forum.dlang.org/thread/lrknjl$co7$1 digitalmars.com
One missing feature (which is also missing from the existing std.json) is support for NaN and Infinity as JSON values. Although they are not part of the formal JSON spec (which is a ridiculous omission, the argument given for excluding them is fallacious), they do get generated if you use Javascript's toString to create the JSON. Many JSON libraries (eg Google's) also generate them, so they are frequently encountered in practice. So a JSON parser should at least be able to lex them. ie this should be parsable: {"foo": NaN, "bar": Infinity, "baz": -Infinity}
This would probably best added as another (CT) optional feature. I think the default should strictly adhere to the JSON specification, though.
http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/LexOptions.specialFloatLiterals.html
 You should also put tests in for what happens when you pass NaN or
 infinity to toJSON. It shouldn't silently generate invalid JSON.
Good point. The current solution to just use formattedWrite("%.16g") is also not ideal.
By default, floating-point special values are now output as 'null', according to the ECMA-script standard. Optionally, they will be emitted as 'NaN' and 'Infinity': http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.specialFloatLiterals.html
Aug 25 2014
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 15:34:29 UTC, Sönke Ludwig wrote:
 By default, floating-point special values are now output as 
 'null', according to the ECMA-script standard. Optionally, they 
 will be emitted as 'NaN' and 'Infinity':
ECMAScript presumes double. I think one should base Phobos on 
language-independent standards. I suggest:

http://tools.ietf.org/html/rfc7159

For a web server it would be most useful to get an exception 
since you risk ending up with web-clients not working with no 
logging. It is better to have an exception and log an error so 
the problem can be fixed.
Aug 25 2014
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 15:46:12 UTC, Ola Fosheim Grøstad 
wrote:
 For a web server it would be most useful to get an exception 
 since you risk ending up with web-clients not working with no 
 logging. It is better to have an exception and log an error so 
 the problem can be fixed.
Let me expand a bit on the difference between web clients and 
servers, assuming D is used on the server:

* Web servers have to check all input and log illegal activity. 
It is either a bug or an attack.

* Web clients don't have to check input from the server (at 
most a crypto check) and should not do double work if servers 
validate anyway.

* Web servers detect errors and send the error as a response to 
the client that displays it as a warning to the user. This is 
the uncommon case so you don't want to burden the client with 
it.

From this we can infer:

- It makes more sense for ECMAScript to turn illegal values 
into null since it runs on the client.

- The server needs efficient validation of input so that it can 
have faster response.

- The more integration of validation of typedness you can have 
in the parser, the better.

Thus it would be an advantage to be able to configure the 
validation done in the parser (through template mechanisms):

1. On write: throw exception on all illegal values or values 
that cannot be represented in the format. If the values are 
illegal then the client should not receive it. It could cause 
legal problems (like wrong prices).

2. On read: add the ability to configure the validation of 
typedness on many parameters:

- no nulls, no dicts, only nesting arrays etc
- predetermined key-values and automatic mapping to structs on 
exact match.
- require all leaf arrays to be uniform (array of strings, 
array of numbers)
- match a predefined grammar etc
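
As a concrete illustration of the kind of constraint meant 
here, a minimal sketch using the existing std.json API as a 
stand-in (the stdx.data.json equivalent would look similar):

---
import std.json;

// Constraint: root is an array and all values are strings.
bool isStringArray(JSONValue v)
{
    if (v.type != JSON_TYPE.ARRAY)
        return false;
    foreach (e; v.array)
        if (e.type != JSON_TYPE.STRING)
            return false;
    return true;
}

unittest
{
    assert(isStringArray(parseJSON(`["a", "b"]`)));
    assert(!isStringArray(parseJSON(`["a", 1]`)));
}
---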
Aug 25 2014
parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
 - It makes more sense for ECMAScript to turn illegal values into null
 since it runs on the client.
Like... node.js? Sorry, just kidding. I don't think it makes sense for clients to be less strict about such things, but I do agree with your assessment about being as strict as possible on the server. I also do think that exceptions are a perfect tool especially for server applications and that instead of avoiding them because they are slow, they should better be made fast enough to not be an issue.
Aug 25 2014
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 25.08.2014 17:46, schrieb "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com>":
 On Monday, 25 August 2014 at 15:34:29 UTC, Sönke Ludwig wrote:
 By default, floating-point special values are now output as 'null',
 according to the ECMA-script standard. Optionally, they will be
 emitted as 'NaN' and 'Infinity':
ECMAScript presumes double. I think one should base Phobos on language-independent standards. I suggest: http://tools.ietf.org/html/rfc7159
Well, of course it's based on that RFC, did you seriously think something else? However, that standard has no mention of infinity or NaN, and since JSON is designed to be a subset of ECMA script, it's basically the only thing that comes close.
 For a web server it would be most useful to get an exception since you
 risk ending up with web-clients not working with no logging. It is
 better to have an exception and log an error so the problem can be fixed.
Although you have a point there of course, it's also highly unlikely that those clients would work correctly if we presume that JSON supported infinity/NaN. So it would really be just coincidence to detect a bug like that. But I generally agree, it's just that the anti-exception voices are pretty loud these days (including Walter's), so that I opted for a non-throwing solution instead. I guess it wouldn't hurt though to default to throwing an exception, while still providing the GeneratorOptions.specialFloatLiterals option to handle those values without exception overhead, but in a non standard-conforming way.
Aug 25 2014
next sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 25.08.2014 22:21, schrieb Sönke Ludwig:
 that standard has no mention of infinity or
 NaN
Sorry, to be precise, it has no suggestion of how to *handle* infinity or NaN.
Aug 25 2014
prev sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 25 August 2014 at 20:21:01 UTC, Sönke Ludwig wrote:
 Well, of course it's based on that RFC, did you seriously think 
 something else?
I made no assumptions, just responded to what you wrote :-). It would be reasonable in the context of vibe.d to assume the ECMAScript spec.
 But I generally agree, it's just that the anti-exception voices 
 are pretty loud these days (including Walter's), so that I 
 opted for a non-throwing solution instead.
Yes, the minimum requirement is to just get "did not validate" directly as a single value. One can create a wrapper to get exceptions.
 I guess it wouldn't hurt though to default to throwing an 
 exception, while still providing the 
 GeneratorOptions.specialFloatLiterals option to handle those 
 values without exception overhead, but in a non 
 standard-conforming way.
What I care most about is getting all the free validation that 
can be added with no extra cost. That will make writing web 
services easier. Like if you can define constraints like:

- root is array, values are strings.
- root is array, second level only arrays, third level is 
numbers
- root is dict, all arrays contain only numbers

What is a bit annoying about generic libs is that you have no 
idea what you are getting so you have to spend time creating 
dull validation code. But maybe StructuredJSON should be a 
separate library. It would be useful for REST services to 
specify the grammar and auto-generate both javascript and D 
structures to hold it along with validation code.

However, just turning off parsing of "true", "false", "null", 
"[", "{" etc seems like a cheap addition that also can improve 
parsing speed if the compiler can make do with two if 
statements instead of a switch.

Ola.
Aug 25 2014
prev sibling parent reply "Don" <x nospam.com> writes:
On Monday, 25 August 2014 at 14:04:12 UTC, Sönke Ludwig wrote:
 Am 25.08.2014 15:07, schrieb Don:
 On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig 
 wrote:
 Following up on the recent "std.jgrandson" thread [1], I've 
 picked up
 the work (a lot earlier than anticipated) and finished a 
 first version
 of a loose blend of said std.jgrandson, vibe.data.json and 
 some
 changes that I had planned for vibe.data.json for a while. 
 I'm quite
 pleased by the results so far, although without a 
 serialization
 framework it still misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json

 The new code contains:
 - Lazy lexer in the form of a token input range (using slices 
 of the
   input if possible)
 - Lazy streaming parser (StAX style) in the form of a node 
 input range
 - Eager DOM style parser returning a JSONValue
 - Range based JSON string generator taking either a token 
 range, a
   node range, or a JSONValue
 - Opt-out location tracking (line/column) for tokens, nodes 
 and values
 - No opDispatch() for JSONValue - this has shown to do more 
 harm than
   good in vibe.data.json

 The DOM style JSONValue type is based on 
 std.variant.Algebraic. This
 currently has a few usability issues that can be solved by
 upgrading/fixing Algebraic:

 - Operator overloading only works sporadically
 - No "tag" enum is supported, so that switch()ing on the type 
 of a
   value doesn't work and an if-else cascade is required
 - Operations and conversions between different Algebraic 
 types is not
   conveniently supported, which gets important when other 
 similar
   formats get supported (e.g. BSON)

 Assuming that those points are solved, I'd like to get some 
 early
 feedback before going for an official review. One open issue 
 is how to
 handle unescaping of string literals. Currently it always 
 unescapes
 immediately, which is more efficient for general input ranges 
 when the
 unescaped result is needed, but less efficient for string 
 inputs when
 the unescaped result is not needed. Maybe a flag could be 
 used to
 conditionally switch behavior depending on the input range 
 type.

 Destroy away! ;)

 [1]: 
 http://forum.dlang.org/thread/lrknjl$co7$1 digitalmars.com
One missing feature (which is also missing from the existing std.json) is support for NaN and Infinity as JSON values. Although they are not part of the formal JSON spec (which is a ridiculous omission, the argument given for excluding them is fallacious), they do get generated if you use Javascript's toString to create the JSON. Many JSON libraries (eg Google's) also generate them, so they are frequently encountered in practice. So a JSON parser should at least be able to lex them. ie this should be parsable: {"foo": NaN, "bar": Infinity, "baz": -Infinity}
This would probably best added as another (CT) optional feature. I think the default should strictly adhere to the JSON specification, though.
Yes, it should be optional, but not a compile-time option. I 
think it should parse it, and based on a runtime flag, throw an 
error (perhaps an OutOfRange error or something, and use the 
same thing for values that exceed the representable range). An 
app may accept these non-standard values under certain 
circumstances and not others. In real-world code, you see a 
*lot* of these guys.

Part of the reason these are important is that NaN or Infinity 
generally means some Javascript code just has an uninitialized 
variable. Any other kind of invalid JSON typically means 
something very nasty has happened. It's important to 
distinguish these.
Aug 26 2014
parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 26.08.2014 15:43, schrieb Don:
 On Monday, 25 August 2014 at 14:04:12 UTC, Sönke Ludwig wrote:
 Am 25.08.2014 15:07, schrieb Don:
 ie this should be parsable:

 {"foo": NaN, "bar": Infinity, "baz": -Infinity}
This would probably best added as another (CT) optional feature. I think the default should strictly adhere to the JSON specification, though.
Yes, it should be optional, but not a compile-time option. I think it should parse it, and based on a runtime flag, throw an error (perhaps an OutOfRange error or something, and use the same thing for values that exceed the representable range). An app may accept these non-standard values under certain circumstances and not others. In real-world code, you see a *lot* of these guys.
Why not a compile time option? That sounds to me like such an app should simply enable parsing those values and manually test for NaN at places where it matters. For all other (the majority) of applications, encountering NaN/Infinity will simply mean that there is a bug, so it makes sense to not accept those at all by default. Apart from that I don't think that it's a good idea for the lexer in general to accept non-standard input by default.
 Part of the reason these are important, is that NaN or Infinity
 generally means some Javascript code just has an uninitialized variable.
 Any other kind of invalid JSON typically means something very nasty has
 happened. It's important to distinguish these.
As far as I understood, JavaScript will output those special values as null (at least when not using external JSON libraries). But even if not, an uninitialized variable can also be very nasty, so it's hard to see why that kind of bug should be silently supported (by default).
Aug 26 2014
parent reply "Don" <x nospam.com> writes:
On Tuesday, 26 August 2014 at 14:06:42 UTC, Sönke Ludwig wrote:
 Am 26.08.2014 15:43, schrieb Don:
 On Monday, 25 August 2014 at 14:04:12 UTC, Sönke Ludwig wrote:
 Am 25.08.2014 15:07, schrieb Don:
 ie this should be parsable:

 {"foo": NaN, "bar": Infinity, "baz": -Infinity}
This would probably best added as another (CT) optional feature. I think the default should strictly adhere to the JSON specification, though.
Yes, it should be optional, but not a compile-time option. I think it should parse it, and based on a runtime flag, throw an error (perhaps an OutOfRange error or something, and use the same thing for values that exceed the representable range). An app may accept these non-standard values under certain circumstances and not others. In real-world code, you see a *lot* of these guys.
Why not a compile time option? That sounds to me like such an app should simply enable parsing those values and manually test for NaN at places where it matters. For all other (the majority) of applications, encountering NaN/Infinity will simply mean that there is a bug, so it makes sense to not accept those at all by default. Apart from that I don't think that it's a good idea for the lexer in general to accept non-standard input by default.
Please note, I've been talking about the lexer. I'm choosing my words very carefully.
 Part of the reason these are important, is that NaN or Infinity
 generally means some Javascript code just has an uninitialized 
 variable.
 Any other kind of invalid JSON typically means something very 
 nasty has
 happened. It's important to distinguish these.
As far as I understood, JavaScript will output those special values as null (at least when not using external JSON libraries).
No. Javascript generates them directly. Naive JS code generates these guys. That's why they're so important.
 But even if not, an uninitialized variable can also be very 
 nasty, so it's hard to see why that kind of bug should be 
 silently supported (by default).
I never said it should be accepted by default. I said it is a 
situation which should be *lexed*. Ideally, by default it 
should give a different error from simply 'invalid JSON'. I 
believe it should ALWAYS be lexed, even if an error is 
ultimately generated.

This is the difference: if you get NaN or Infinity, there's 
probably a straightforward bug in the Javascript code, but your 
D code is fine. Any other kind of JSON parsing error means 
you've got a garbage string that isn't JSON at all. They are 
very different errors.

It's a diagnostics issue.
Aug 26 2014
next sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 26.08.2014 16:40, schrieb Don:
 On Tuesday, 26 August 2014 at 14:06:42 UTC, Sönke Ludwig wrote:
 Am 26.08.2014 15:43, schrieb Don:
 On Monday, 25 August 2014 at 14:04:12 UTC, Sönke Ludwig wrote:
 Am 25.08.2014 15:07, schrieb Don:
 ie this should be parsable:

 {"foo": NaN, "bar": Infinity, "baz": -Infinity}
This would probably best added as another (CT) optional feature. I think the default should strictly adhere to the JSON specification, though.
Yes, it should be optional, but not a compile-time option. I think it should parse it, and based on a runtime flag, throw an error (perhaps an OutOfRange error or something, and use the same thing for values that exceed the representable range). An app may accept these non-standard values under certain circumstances and not others. In real-world code, you see a *lot* of these guys.
Why not a compile time option? That sounds to me like such an app should simply enable parsing those values and manually test for NaN at places where it matters. For all other (the majority) of applications, encountering NaN/Infinity will simply mean that there is a bug, so it makes sense to not accept those at all by default. Apart from that I don't think that it's a good idea for the lexer in general to accept non-standard input by default.
Please note, I've been talking about the lexer. I'm choosing my words very carefully.
I've been talking about the lexer, too. Sorry for the confusing use of the term "parsing" (after all, the lexer is also a parser, but anyway).
 Part of the reason these are important, is that NaN or Infinity
 generally means some Javascript code just has an uninitialized variable.
 Any other kind of invalid JSON typically means something very nasty has
 happened. It's important to distinguish these.
As far as I understood, JavaScript will output those special values as null (at least when not using external JSON libraries).
No. Javascript generates them directly. Naive JS code generates these guys. That's why they're so important.
JSON.stringify(0/0) == "null"

Holds for all browsers that I've tested.
 But even if not, an uninitialized variable can also be very nasty, so
 it's hard to see why that kind of bug should be silently supported (by
 default).
I never said it should accepted by default. I said it is a situation which should be *lexed*. Ideally, by default it should give a different error from simply 'invalid JSON'. I believe it should ALWAYS be lexed, even if an error is ultimately generated. This is the difference: if you get NaN or Infinity, there's probably a straightforward bug in the Javascript code, but your D code is fine. Any other kind of JSON parsing error means you've got a garbage string that isn't JSON at all. They are very different errors. It's a diagnostics issue.
The error will be more like "filename(line:column): Invalid token" - possibly the text following the line/column could also be displayed. Wouldn't that be sufficient?
Aug 26 2014
parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 26.08.2014 16:51, schrieb Sönke Ludwig:
 Am 26.08.2014 16:40, schrieb Don:
 This is the difference: if you get NaN or Infinity, there's probably a
 straightforward bug in the Javascript code, but your D code is fine. Any
 other kind of JSON parsing error means you've got a garbage string that
 isn't JSON at all. They are very different errors.
 It's a diagnostics issue.
The error will be more like "filename(line:column): Invalid token" - possibly the text following the line/column could also be displayed. Wouldn't that be sufficient?
One argument against supporting it in the parser is that the parser currently works without any configuration, but the user would then have to specify two sets of configuration options with this added.
Aug 26 2014
prev sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 26 August 2014 at 14:40:02 UTC, Don wrote:
 This is the difference: if you get NaN or Infinity, there's 
 probably a straightforward bug in the Javascript code, but your 
 D code is fine. Any other kind of JSON parsing error means 
 you've got a garbage string that isn't JSON at all. They are 
 very different errors.
I don't care either way, but JSON.stringify() has the following 
support:

IE8 and up
Firefox 3.5 and up
Safari 4 and up
Chrome

So not using it is very much legacy…
Aug 26 2014
prev sibling next sibling parent reply "Entusiastic user" <cncgeneralsfan999 abv.bg> writes:
Hi!

Thanks for the effort you've put in this.

I am having problems with building with LDC 0.14.0. DMD 2.066.0
seems to work fine (all unit tests pass). Do you have any ideas
why?

I am using Ubuntu 3.10 (Linux 3.11.0-15-generic x86_64).

Master was at 6a9f8e62e456c3601fe8ff2e1fbb640f38793d08.
$ dub fetch std_data_json --version=~master
$ cd std_data_json-master/
$ dub test --compiler=ldc2

Generating test runner configuration '__test__library__' for
'library' (library).
Building std_data_json ~master configuration "__test__library__",
build type unittest.
Running ldc2...
source/stdx/data/json/parser.d(77): Error: @safe function
'stdx.data.json.parser.__unittestL68_22' cannot call @system
function 'object.AssociativeArray!(string,
JSONValue).AssociativeArray.length'
source/stdx/data/json/parser.d(124): Error: @safe function
'stdx.data.json.parser.__unittestL116_24' cannot call @system
function 'object.AssociativeArray!(string,
JSONValue).AssociativeArray.length'
source/stdx/data/json/parser.d(341): Error: function
stdx.data.json.parser.JSONParserRange!(JSONLexerRange!string).JSONParserRange.opAssign
is not callable because it is annotated with @disable
source/stdx/data/json/parser.d(341): Error: @safe function
'stdx.data.json.parser.__unittestL318_32' cannot call @system
function
'stdx.data.json.parser.JSONParserRange!(JSONLexerRange!string).JSONParserRange.opAssign'
source/stdx/data/json/parser.d(633): Error: function
stdx.data.json.lexer.JSONToken.opAssign is not callable because
it is annotated with @disable
source/stdx/data/json/parser.d(633): Error:
'stdx.data.json.lexer.JSONToken.opAssign' is not nothrow
source/stdx/data/json/parser.d(630): Error: function
'stdx.data.json.parser.JSONParserNode.literal' is nothrow yet may
throw
FAIL
.dub/build/__test__library__-unittest-linux.posix-x86_64-ldc2-0F620B217010475A5A4E545A57CDD09A/
__test__library__ executable
Error executing command test: ldc2 failed with exit code 1.

Thanks
Aug 25 2014
next sibling parent "Entusiastic user" <cncgeneralsfan999 abv.bg> writes:
 ...
 I am using Ubuntu 3.10 (Linux 3.11.0-15-generic x86_64).
 ...
I meant Ubuntu 13.10 :D
Aug 25 2014
prev sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 26.08.2014 03:31, schrieb Entusiastic user:
 Hi!

 Thanks for the effort you've put in this.

 I am having problems with building with LDC 0.14.0. DMD 2.066.0
 seems to work fine (all unit tests pass). Do you have any ideas
 why?
I've fixed all errors on DMD 2.065 now. Hopefully that should also fix LDC.
Aug 26 2014
prev sibling next sibling parent "David Soria Parra" <davidsp fb.com> writes:
On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've 
 picked up the work (a lot earlier than anticipated) and 
 finished a first version of a loose blend of said 
 std.jgrandson, vibe.data.json and some changes that I had 
 planned for vibe.data.json for a while. I'm quite pleased by 
 the results so far, although without a serialization framework 
 it still misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json
Do we have any benchmarks for this yet? Note that the main 
motivation for a new json parser was that std.json is 
remarkably slow in comparison to python's json or ujson.
Aug 26 2014
prev sibling next sibling parent "Atila Neves" <atila.neves gmail.com> writes:
Been using it for a bit now, I think the only thing I have to say 
is having to insert all of those `JSONValue` everywhere is 
tiresome and I never know when I have to do it.

Atila

On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've 
 picked up the work (a lot earlier than anticipated) and 
 finished a first version of a loose blend of said 
 std.jgrandson, vibe.data.json and some changes that I had 
 planned for vibe.data.json for a while. I'm quite pleased by 
 the results so far, although without a serialization framework 
 it still misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json

 The new code contains:
  - Lazy lexer in the form of a token input range (using slices 
 of the
    input if possible)
  - Lazy streaming parser (StAX style) in the form of a node 
 input range
  - Eager DOM style parser returning a JSONValue
  - Range based JSON string generator taking either a token 
 range, a
    node range, or a JSONValue
  - Opt-out location tracking (line/column) for tokens, nodes 
 and values
  - No opDispatch() for JSONValue - this has shown to do more 
 harm than
    good in vibe.data.json

 The DOM style JSONValue type is based on std.variant.Algebraic. 
 This currently has a few usability issues that can be solved by 
 upgrading/fixing Algebraic:

  - Operator overloading only works sporadically
  - No "tag" enum is supported, so that switch()ing on the type 
 of a
    value doesn't work and an if-else cascade is required
  - Operations and conversions between different Algebraic types 
 is not
    conveniently supported, which gets important when other 
 similar
    formats get supported (e.g. BSON)

 Assuming that those points are solved, I'd like to get some 
 early feedback before going for an official review. One open 
 issue is how to handle unescaping of string literals. Currently 
 it always unescapes immediately, which is more efficient for 
 general input ranges when the unescaped result is needed, but 
 less efficient for string inputs when the unescaped result is 
 not needed. Maybe a flag could be used to conditionally switch 
 behavior depending on the input range type.

 Destroy away! ;)

 [1]: http://forum.dlang.org/thread/lrknjl$co7$1 digitalmars.com
Sep 08 2014
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Here's my destruction of std.data.json.

* lexer.d:

** Beautifully done. From what I understand, if the input is string or 
immutable(ubyte)[] then the strings are carved out as slices of the 
input, as opposed to newly allocated. Awesome.

** The string after lexing is correctly scanned and stored in raw format 
(escapes are not rewritten) and decoded on demand. Problem with decoding 
is that it may allocate memory, and it would be great (and not 
difficult) to make the lexer 100% lazy/non-allocating. To achieve that, 
lexer.d should define TWO "Kind"s of strings at the lexer level: regular 
string and undecoded string. The former is lexer.d's way of saying "I 
got lucky" in the sense that it didn't detect any '\\' so the raw and 
decoded strings are identical. No need for anyone to do any further 
processing in the majority of cases => win. The latter means the lexer 
lexed the string, saw at least one '\\', and leaves it to the caller to 
do the actual decoding.
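
A rough sketch of that split (names hypothetical, not the 
actual lexer.d API):

---
enum Kind
{
    // ...
    string_,          // raw slice == decoded value, nothing to do
    undecodedString,  // contains at least one '\\'; caller decodes
                      // on demand
}

struct Token
{
    Kind kind;
    string rawValue;  // always a slice of the input for string input
}
---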

** After moving the decoding business out of lexer.d, a way to take this 
further would be to qualify lexer methods as @nogc if the input is 
string/immutable(ubyte)[]. I wonder how to implement a conditional 
attribute. We'll probably need a language enhancement for that.

** The implementation uses manually-defined tagged unions for work. 
Could we use Algebraic instead - dogfooding and all that? I recall there 
was a comment in Sönke's original work that Algebraic has a specific 
issue (was it false pointers?) - so the question arises, should we fix 
Algebraic and use it thus helping other uses as well?

** I see the "boolean" kind, should we instead have the "true_" and 
"false_" kinds?

** Long story short I couldn't find any major issue with this module, 
and I looked! I do think the decoding logic should be moved outside of 
lexer.d or at least the JSONLexerRange.

* generator.d: looking good, no special comments. Like the consistent 
use of structs filled with options as template parameters.

* foundation.d:

** At four words per token, Location seems pretty bulky. How about 
reducing line and column to uint?
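
I.e. something along these lines (a sketch, not the current 
foundation.d definition):

---
struct Location
{
    string file;  // two words
    uint line;    // packed together
    uint column;  // into one word
}
---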

** Could JSONException create the message string in toString (i.e. 
when/if used) as opposed to in the constructor?

* parser.d:

** How about using .init instead of .defaults for options?

** I'm a bit surprised by JSONParserNode.Kind. E.g. the objectStart/End 
markers shouldn't appear as nodes. There should be an "object" node 
only. I guess that's needed for laziness.

** It's unclear where memory is being allocated in the parser. @nogc 
annotations wherever appropriate would be great.

* value.d:

** Looks like this is/may be the only place where memory is being 
managed, at least if the input is string/immutable(ubyte)[]. Right?

** Algebraic ftw.

============================

Overall: This is very close to everything I hoped! A bit more care to 
@nogc would be awesome, especially with the upcoming focus on memory 
management going forward.

After one more pass it would be great to move forward for review.


Andrei
Oct 12 2014
next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Sunday, 12 October 2014 at 18:17:29 UTC, Andrei Alexandrescu 
wrote:
 ** The string after lexing is correctly scanned and stored in 
 raw format (escapes are not rewritten) and decoded on demand. 
 Problem with decoding is that it may allocate memory, and it 
 would be great (and not difficult) to make the lexer 100% 
 lazy/non-allocating. To achieve that, lexer.d should define TWO 
 "Kind"s of strings at the lexer level: regular string and 
 undecoded string. The former is lexer.d's way of saying "I got 
 lucky" in the sense that it didn't detect any '\\' so the raw 
 and decoded strings are identical. No need for anyone to do any 
 further processing in the majority of cases => win. The latter 
 means the lexer lexed the string, saw at least one '\\', and 
 leaves it to the caller to do the actual decoding.
I'd like to see unescapeStringLiteral() made public. Then I can unescape multiple strings to the same preallocated destination, or even unescape in place (guaranteed to work since the result will always be smaller than the input).
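
Something along these lines (a simplified sketch handling only 
a few escapes; the real unescapeStringLiteral also deals with 
\uXXXX and friends):

---
// Unescape a JSON string body in place; returns the new length.
// Works because the unescaped form is never longer than the input.
size_t unescapeInPlace(char[] buf)
{
    size_t w;
    for (size_t r = 0; r < buf.length; r++)
    {
        char c = buf[r];
        if (c == '\\' && r + 1 < buf.length)
        {
            char next = buf[++r];
            switch (next)
            {
                case 'n': c = '\n'; break;
                case 't': c = '\t'; break;
                case '"': c = '"'; break;
                case '\\': c = '\\'; break;
                default:
                    // escapes not handled in this sketch are
                    // copied through verbatim
                    buf[w++] = c;
                    c = next;
                    break;
            }
        }
        buf[w++] = c;
    }
    return w;
}
---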
Oct 12 2014
next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
Oh, it looks like you aren't checking for 0x7F (DEL) as a control 
character.
Oct 12 2014
parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 12.10.2014 23:52, schrieb Sean Kelly:
 Oh, it looks like you aren't checking for 0x7F (DEL) as a control
 character.
It doesn't get mentioned in the JSON spec, so I left it out. But I guess nothing speaks against adding it anyway.
Oct 13 2014
prev sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 12.10.2014 21:04, schrieb Sean Kelly:
 I'd like to see unescapeStringLiteral() made public.  Then I can
 unescape multiple strings to the same preallocated destination, or even
 unescape in place (guaranteed to work since the result will always be
 smaller than the input).
Will do. Same for the inverse functions.
Oct 13 2014
prev sibling parent reply =?ISO-8859-15?Q?S=F6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
Am 12.10.2014 20:17, schrieb Andrei Alexandrescu:
 Here's my destruction of std.data.json.

 * lexer.d:

 ** Beautifully done. From what I understand, if the input is string or
 immutable(ubyte)[] then the strings are carved out as slices of the
 input, as opposed to newly allocated. Awesome.

 ** The string after lexing is correctly scanned and stored in raw format
 (escapes are not rewritten) and decoded on demand. Problem with decoding
 is that it may allocate memory, and it would be great (and not
 difficult) to make the lexer 100% lazy/non-allocating. To achieve that,
 lexer.d should define TWO "Kind"s of strings at the lexer level: regular
 string and undecoded string. The former is lexer.d's way of saying "I
 got lucky" in the sense that it didn't detect any '\\' so the raw and
 decoded strings are identical. No need for anyone to do any further
 processing in the majority of cases => win. The latter means the lexer
 lexed the string, saw at least one '\\', and leaves it to the caller to
 do the actual decoding.
This is actually more or less done in unescapeStringLiteral() - if it doesn't find any '\\', it just returns the original string. JSONString also allows access to its .rawValue without doing any decoding/allocations. https://github.com/s-ludwig/std_data_json/blob/master/source/stdx/data/json/lexer.d#L1421 Unfortunately .rawValue can't be @nogc, because the "raw" value might have to be constructed first when the input is not a "string" (in that case unescaping is done on the fly for efficiency reasons).
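So for string inputs the typical case is already allocation-free. A tiny illustration of the intended behavior (this assumes the plain string-to-string overload of unescapeStringLiteral once it is public):

    // No '\\' present: the input slice itself is returned, nothing is copied.
    string plain = `hello world`;
    assert(unescapeStringLiteral(plain) is plain);

    // Only strings that actually contain an escape need a decoded copy.
    string escaped = `hello\nworld`;
    assert(unescapeStringLiteral(escaped) == "hello\nworld");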
 ** After moving the decoding business out of lexer.d, a way to take this
 further would be to qualify lexer methods as @nogc if the input is
 string/immutable(ubyte)[]. I wonder how to implement a conditional
 attribute. We'll probably need a language enhancement for that.
Isn't @nogc inferred? Everything is templated, so that should be possible. Or does attribute inference only work for template functions and not for methods of templated types? Should it?
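For reference, inference does kick in for free function templates - a toy example; whether the same applies to non-template methods of templated aggregates is exactly the open question:

    // No explicit attributes on the template...
    T twice(T)(T x) { return x + x; }

    // ...yet the instantiation is usable from @nogc code, because the
    // attributes are inferred per instance.
    @nogc nothrow void caller()
    {
        auto y = twice(21);
        assert(y == 42);
    }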
 ** The implementation uses manually-defined tagged unions for work.
 Could we use Algebraic instead - dogfooding and all that? I recall there
 was a comment in Sönke's original work that Algebraic has a specific
 issue (was it false pointers?) - so the question arises, should we fix
 Algebraic and use it thus helping other uses as well?
I had started on an implementation of a type- and ID-safe TaggedAlgebraic that uses Algebraic for its internal storage. If we can get that in first, it should be no problem to use it instead (with no or minimal API breakage). However, it uses a struct instead of an enum to define the "Kind" (which is the only nice way I could think of to safely couple enum value and type at compile time), so it's not as nice in the generated documentation.
 ** I see the "boolean" kind, should we instead have the "true_" and
 "false_" kinds?
I always found it cumbersome and awkward to work like that. What would be the reason to go that route?
 ** Long story short I couldn't find any major issue with this module,
 and I looked! I do think the decoding logic should be moved outside of
 lexer.d or at least the JSONLexerRange.

 * generator.d: looking good, no special comments. Like the consistent
 use of structs filled with options as template parameters.

 * foundation.d:

 ** At four words per token, Location seems pretty bulky. How about
 reducing line and column to uint?
Single-line JSON files >64k (or line counts >64k) are not uncommon, so that would only work in a limited way. My thought about this was that it is quite unusual to actually store the tokens for most purposes (especially when serializing directly to a native D type), so the Location size should have minimal impact on performance or memory consumption.
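For scale, the saving Andrei is after is 8 bytes per Location on 64-bit targets. A quick sketch, assuming a layout of roughly a file slice plus line/column (the field names are illustrative):

    struct LocationNow  { string file; size_t line, column; } // roughly the current four-word layout
    struct LocationUint { string file; uint line, column; }   // proposed

    // 32 vs. 24 bytes on 64-bit targets; no difference on 32-bit ones.
    static assert(LocationUint.sizeof <= LocationNow.sizeof);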
 ** Could JSONException create the message string in toString (i.e.
 when/if used) as opposed to in the constructor?
That could of course be done, but then you'd not get the full error message using ex.msg, only with ex.toString(), which usually prints a call trace as well. Alternatively, it's also possible to completely avoid using exceptions with LexOptions.noThrow.
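For completeness, deferring the formatting would look roughly like this - only a sketch, not the current JSONException, and it keeps exactly the ex.msg drawback mentioned above:

    class LazyLocationException : Exception
    {
        private size_t errLine, errColumn; // taken from the offending token's location

        this(string msg, size_t errLine, size_t errColumn,
             string file = __FILE__, size_t line = __LINE__) @safe pure nothrow
        {
            super(msg, file, line);
            this.errLine = errLine;
            this.errColumn = errColumn;
        }

        // The location is only rendered when the exception is actually
        // printed; ex.msg alone stays without it.
        override void toString(scope void delegate(in char[]) sink) const
        {
            import std.format : formattedWrite;
            sink.formattedWrite("JSON error at %s:%s: %s", errLine, errColumn, msg);
        }
    }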
 * parser.d:

 ** How about using .init instead of .defaults for options?
I'd slightly tend to prefer the more explicit "defaults", especially because "init" could mean either "defaults" or "none" (currently it means "none"). But another idea would be to invert the option values so that defaults==none... any objections?
 ** I'm a bit surprised by JSONParserNode.Kind. E.g. the objectStart/End
 markers shouldn't appear as nodes. There should be an "object" node
 only. I guess that's needed for laziness.
While you could infer the end of an object in the parser range by looking for the first entry that doesn't start with a "key" node, the same would not be possible for arrays, so in general the end marker *is* required. Note that the parser range is a StAX style parser, which is still very close to the lexical structure of the document. I was also wondering if there might be a better name than "JSONParserNode". It's not really embedded into a tree or graph structure, which the name tends to suggest.
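To make the array case concrete (self-contained toy types below, not JSONParserNode itself):

    enum NodeKind { arrayStart, arrayEnd, number }

    struct Node { NodeKind kind; double value = 0; }

    // [1, [2, 3], 4] as a StAX-style node stream:
    immutable Node[] stream = [
        Node(NodeKind.arrayStart),
        Node(NodeKind.number, 1),
        Node(NodeKind.arrayStart),
        Node(NodeKind.number, 2),
        Node(NodeKind.number, 3),
        Node(NodeKind.arrayEnd),
        Node(NodeKind.number, 4),
        Node(NodeKind.arrayEnd),
    ];
    // Without the arrayEnd nodes, the flat sequence could equally well
    // describe [1, [2, 3, 4]] - unlike objects, array elements have no
    // "key" node that could mark where a nesting level stops.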
 ** It's unclear where memory is being allocated in the parser. @nogc
 annotations wherever appropriate would be great.
The problem is that the parser accesses the lexer, which in turn accesses the underlying input range, which in turn could allocate. Depending on the options passed to the lexer, it could also throw, and thus allocate, an exception. In the end only JSONParserRange.empty could generally be made @nogc. However, attribute inference should be possible here in theory (the noThrow option is compile-time).
 * value.d:

 ** Looks like this is/may be the only place where memory is being
 managed, at least if the input is string/immutable(ubyte)[]. Right?
Yes, at least when setting aside optional exceptions and lazy allocations.
 ** Algebraic ftw.

 ============================

 Overall: This is very close to everything I hoped! A bit more care to
 @nogc would be awesome, especially with the upcoming focus on memory
 management going forward.
I've tried to use @nogc (as well as nothrow) in more places, but since it's mostly unknown whether the underlying input range allocates, it hasn't really been possible. Even at lower levels (private functions), almost any Phobos function that gets called is currently not @nogc, for reasons that are not always obvious, so I gave up on that for now.
 After one more pass it would be great to move forward for review.
There is also one pending change that I haven't finished yet: the optional UTF input validation (never validate "string" inputs, but do validate "ubyte[]" inputs). Oh, and there is the open issue of how to allocate in the case of non-array inputs. Initially I wanted to postpone this until we have an allocators module, but Walter would like to have a way to do manual memory management in the initial version. However, the ideal design is still unclear to me - it would either simply resemble a general allocator interface, or could use something like a callback that returns an output range, which would probably be quite cumbersome to work with. Any ideas in this direction would be welcome. Sönke
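PS: To make the output-range idea a bit more tangible, the hook could have roughly this shape - entirely hypothetical, nothing of it exists in the package:

    import std.array : Appender, appender;

    // The lexer would call such a hook whenever it needs storage for a
    // decoded string and the input cannot simply be sliced.
    alias StringSink = Appender!(char[]) delegate(size_t sizeHint);

    void example()
    {
        StringSink gcBacked = delegate(size_t sizeHint) {
            auto app = appender!(char[])();
            app.reserve(sizeHint); // a malloc-backed variant would reserve elsewhere
            return app;
        };
        // hypothetical: lexJSON(input, gcBacked);
    }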
Oct 13 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 13/10/14 09:39, Sönke Ludwig wrote:

 ** At four words per token, Location seems pretty bulky. How about
 reducing line and column to uint?
Single-line JSON files >64k (or line counts >64k) are not uncommon
64k? -- /Jacob Carlborg
Oct 13 2014
parent reply =?ISO-8859-15?Q?S=F6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
On 13.10.2014 13:33, Jacob Carlborg wrote:
 On 13/10/14 09:39, Sönke Ludwig wrote:

 ** At four words per token, Location seems pretty bulky. How about
 reducing line and column to uint?
Single-line JSON files >64k (or line counts >64k) are not uncommon
64k?
Oh, I've read "both line and column into a single uint", because of "four words per token" - considering that "word == 16bit", but Andrei obviously meant "word == (void*).sizeof". If simply using uint instead of size_t is meant, then that's of course a different thing.
Oct 13 2014
next sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Sönke Ludwig"  wrote in message news:m1ge08$10ub$1 digitalmars.com...

 Oh, I've read "both line and column into a single uint", because of "four 
 words per token" - considering that "word == 16bit", but Andrei obviously 
 meant "word == (void*).sizeof". If simply using uint instead of size_t is 
 meant, then that's of course a different thing.
I suppose a 4GB single-line json file is still possible.
Oct 13 2014
parent reply =?ISO-8859-15?Q?S=F6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
On 13.10.2014 16:36, Daniel Murphy wrote:
 "Sönke Ludwig"  wrote in message news:m1ge08$10ub$1 digitalmars.com...

 Oh, I've read "both line and column into a single uint", because of
 "four words per token" - considering that "word == 16bit", but Andrei
 obviously meant "word == (void*).sizeof". If simply using uint instead
 of size_t is meant, then that's of course a different thing.
I suppose a 4GB single-line json file is still possible.
If we make that assumption, we'd have to change it from size_t to ulong, but my feeling is that this case (format error at >4GB && human tries to look at that place using an editor) should be rare enough that we can make the compromise in favor of a smaller struct size.
Oct 13 2014
next sibling parent reply "Kiith-Sa" <kiithsacmp gmail.com> writes:
On Monday, 13 October 2014 at 17:21:44 UTC, Sönke Ludwig wrote:
 On 13.10.2014 16:36, Daniel Murphy wrote:
 "Sönke Ludwig"  wrote in message 
 news:m1ge08$10ub$1 digitalmars.com...

 Oh, I've read "both line and column into a single uint", 
 because of
 "four words per token" - considering that "word == 16bit", 
 but Andrei
 obviously meant "word == (void*).sizeof". If simply using 
 uint instead
 of size_t is meant, then that's of course a different thing.
I suppose a 4GB single-line json file is still possible.
If we make that assumption, we'd have to change it from size_t to ulong, but my feeling is that this case (format error at >4GB && human tries to look at that place using an editor) should be rare enough that we can make the compromise in favor of a smaller struct size.
What are you using the location structs for? In D:YAML they're only used for info about errors, so I use ushorts and ushort.max means "65535 or more".
Oct 13 2014
parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
On 13.10.2014 19:40, Kiith-Sa wrote:
 On Monday, 13 October 2014 at 17:21:44 UTC, Sönke Ludwig wrote:
 On 13.10.2014 16:36, Daniel Murphy wrote:
 "Sönke Ludwig"  wrote in message news:m1ge08$10ub$1 digitalmars.com...

 Oh, I've read "both line and column into a single uint", because of
 "four words per token" - considering that "word == 16bit", but Andrei
 obviously meant "word == (void*).sizeof". If simply using uint instead
 of size_t is meant, then that's of course a different thing.
I suppose a 4GB single-line json file is still possible.
If we make that assumption, we'd have to change it from size_t to ulong, but my feeling is that this case (format error at >4GB && human tries to look at that place using an editor) should be rare enough that we can make the compromise in favor of a smaller struct size.
What are you using the location structs for? In D:YAML they're only used for info about errors, so I use ushorts and ushort.max means "65535 or more".
Within the package itself they are also only used for error information. But they are also generally available with each token/node/value, so people could do very different things with them.
Oct 13 2014
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/14, 10:21 AM, Sönke Ludwig wrote:
 On 13.10.2014 16:36, Daniel Murphy wrote:
 "Sönke Ludwig"  wrote in message news:m1ge08$10ub$1 digitalmars.com...

 Oh, I've read "both line and column into a single uint", because of
 "four words per token" - considering that "word == 16bit", but Andrei
 obviously meant "word == (void*).sizeof". If simply using uint instead
 of size_t is meant, then that's of course a different thing.
I suppose a 4GB single-line json file is still possible.
If we make that assumption, we'd have to change it from size_t to ulong, but my feeling is that this case (format error at >4GB && human tries to look at that place using an editor) should be rare enough that we can make the compromise in favor of a smaller struct size.
Agreed. -- Andrei
Oct 13 2014
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/14, 4:45 AM, Sönke Ludwig wrote:
 On 13.10.2014 13:33, Jacob Carlborg wrote:
 On 13/10/14 09:39, Sönke Ludwig wrote:

 ** At four words per token, Location seems pretty bulky. How about
 reducing line and column to uint?
 Single-line JSON files >64k (or line counts >64k) are not uncommon
64k?
Oh, I've read "both line and column into a single uint", because of "four words per token" - considering that "word == 16bit", but Andrei obviously meant "word == (void*).sizeof". If simply using uint instead of size_t is meant, then that's of course a different thing.
Yah, one uint for each. -- Andrei
Oct 13 2014
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 22/08/14 00:35, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've picked up
 the work (a lot earlier than anticipated) and finished a first version
 of a loose blend of said std.jgrandson, vibe.data.json and some changes
 that I had planned for vibe.data.json for a while. I'm quite pleased by
 the results so far, although without a serialization framework it still
 misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
JSONToken.Kind and JSONParserNode.Kind could be "ubyte" to save space. -- /Jacob Carlborg
Oct 13 2014
parent reply =?ISO-8859-15?Q?S=F6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
On 13.10.2014 13:37, Jacob Carlborg wrote:
 On 22/08/14 00:35, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've picked up
 the work (a lot earlier than anticipated) and finished a first version
 of a loose blend of said std.jgrandson, vibe.data.json and some changes
 that I had planned for vibe.data.json for a while. I'm quite pleased by
 the results so far, although without a serialization framework it still
 misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
JSONToken.Kind and JSONParserNode.Kind could be "ubyte" to save space.
But it won't save space in practice, at least on x86, due to alignment, and depending on what the compiler assumes, the access can also be slower that way.
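A quick illustration of why the ubyte doesn't actually buy anything here (toy structs, not the real token layout):

    struct TokenA { size_t kind; double number; string str; }
    struct TokenB { ubyte  kind; double number; string str; }

    // The ubyte gets padded up to the alignment of the following fields,
    // so both layouts come out at the same size (32 bytes on x86-64).
    static assert(TokenA.sizeof == TokenB.sizeof);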
Oct 13 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/13/14, 4:48 AM, Sönke Ludwig wrote:
 On 13.10.2014 13:37, Jacob Carlborg wrote:
 On 22/08/14 00:35, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've picked up
 the work (a lot earlier than anticipated) and finished a first version
 of a loose blend of said std.jgrandson, vibe.data.json and some changes
 that I had planned for vibe.data.json for a while. I'm quite pleased by
 the results so far, although without a serialization framework it still
 misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
JSONToken.Kind and JSONParserNode.Kind could be "ubyte" to save space.
But it won't save space in practice, at least on x86, due to alignment, and depending on what the compiler assumes, the access can also be slower that way.
Correct. -- Andrei
Oct 13 2014
prev sibling next sibling parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
On 8/21/14, 7:35 PM, Sönke Ludwig wrote:
 Following up on the recent "std.jgrandson" thread [1], I've picked up
 the work (a lot earlier than anticipated) and finished a first version
 of a loose blend of said std.jgrandson, vibe.data.json and some changes
 that I had planned for vibe.data.json for a while. I'm quite pleased by
 the results so far, although without a serialization framework it still
 misses a very important building block.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 DUB: http://code.dlang.org/packages/std_data_json

 Destroy away! ;)

 [1]: http://forum.dlang.org/thread/lrknjl$co7$1 digitalmars.com
Once it's done you can compare its performance against other languages with this benchmark: https://github.com/kostya/benchmarks/tree/master/json
Oct 17 2014
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 17 October 2014 at 18:27:34 UTC, Ary Borenszweig wrote:
 Once its done you can compare its performance against other 
 languages with this benchmark:

 https://github.com/kostya/benchmarks/tree/master/json
Wow, the C++ Rapid parser is really impressive. I threw together a test with my own parser for comparison, and Rapid still beat it. It's the first parser I've encountered that's faster.

Ruby
0.4995479721139979
0.49977992077421846
0.49981146157805545
7.53s, 2330.9Mb

Python
0.499547972114
0.499779920774
0.499811461578
12.01s, 1355.1Mb

C++ Rapid
0.499548
0.49978
0.499811
1.75s, 1009.0Mb

JEP (mine)
0.49954797
0.49977992
0.49981146
2.38s, 203.4Mb
Oct 18 2014
next sibling parent "Sean Kelly" <sean invisibleduck.org> writes:
On Saturday, 18 October 2014 at 19:53:23 UTC, Sean Kelly wrote:
 On Friday, 17 October 2014 at 18:27:34 UTC, Ary Borenszweig 
 wrote:
 Once its done you can compare its performance against other 
 languages with this benchmark:

 https://github.com/kostya/benchmarks/tree/master/json
Wow, the C++ Rapid parser is really impressive. I threw together a test with my own parser for comparison, and Rapid still beat it. It's the first parser I've encountered that's faster.

C++ Rapid
0.499548
0.49978
0.499811
1.75s, 1009.0Mb

JEP (mine)
0.49954797
0.49977992
0.49981146
2.38s, 203.4Mb
I just commented out the sscanf() call that was parsing the float and re-ran the test to see what the difference would be. Here's the new timing:

JEP (mine)
0.00000000
0.00000000
0.00000000
1.23s, 203.1Mb

So nearly half of the total execution time was spent simply parsing floats. For this reason, I'm starting to think that this isn't the best benchmark of JSON parser performance.

The other issue with my parser is that it's written in C, and so all of the user-defined bits are called via a bank of function pointers. If it were converted to C++ or D, where this could be done via templates, it would be much faster. Just as a test I nulled out the function pointers I'd set to see what the cost of indirection was, and here's the result:

JEP (mine)
nan
nan
nan
0.57s, 109.4Mb

The memory difference is interesting, and I can't entirely explain it other than to say that it's probably an artifact of my mapping the file in as virtual memory rather than reading it into an allocated buffer. Either way, roughly 0.60s can be attributed to indirect function calls and the bit of logic on the other side, which seems like a good candidate for optimization.
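For comparison, this is roughly the devirtualization a C++ or D port would get for free (a toy sketch in D, not JEP's actual interface):

    // Callback through a function pointer - opaque to the optimizer:
    alias NumberHandler = void function(double);
    void parseViaPointer(const(char)[] input, NumberHandler onNumber)
    {
        // ... lexing elided ...
        onNumber(42);
    }

    // Callback as an alias template parameter - a candidate for inlining:
    void parseViaTemplate(alias onNumber)(const(char)[] input)
    {
        // ... lexing elided ...
        onNumber(42);
    }

    void main()
    {
        double sum = 0;
        parseViaPointer("42", (double d) {});            // can't touch locals here
        parseViaTemplate!((double d) => sum += d)("42"); // closure passed at compile time
        assert(sum == 42);
    }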
Oct 18 2014
prev sibling next sibling parent Ary Borenszweig <ary esperanto.org.ar> writes:
On 10/18/14, 4:53 PM, Sean Kelly wrote:
 On Friday, 17 October 2014 at 18:27:34 UTC, Ary Borenszweig wrote:
 Once its done you can compare its performance against other languages
 with this benchmark:

 https://github.com/kostya/benchmarks/tree/master/json
Wow, the C++ Rapid parser is really impressive. I threw together a test with my own parser for comparison, and Rapid still beat it. It's the first parser I've encountered that's faster.

Ruby
0.4995479721139979
0.49977992077421846
0.49981146157805545
7.53s, 2330.9Mb

Python
0.499547972114
0.499779920774
0.499811461578
12.01s, 1355.1Mb

C++ Rapid
0.499548
0.49978
0.499811
1.75s, 1009.0Mb

JEP (mine)
0.49954797
0.49977992
0.49981146
2.38s, 203.4Mb
Yes, C++ Rapid seems to be really, really fast. It has some SSE2/SSE4-specific optimizations and I guess a lot more. I have to investigate more in order to do something similar :-)
Oct 19 2014
prev sibling parent "David Soria Parra" <davidsp fb.com> writes:
On Saturday, 18 October 2014 at 19:53:23 UTC, Sean Kelly wrote:

 Python
 0.499547972114
 0.499779920774
 0.499811461578
 12.01s, 1355.1Mb
I assume this is the standard json module? I'm wondering how ujson, which is considered the fastest Python JSON module, would perform here.
Oct 20 2014
prev sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 ...
Added to the review queue as a work in progress with relevant links: http://wiki.dlang.org/Review_Queue
Feb 05 2015
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/5/15 1:07 AM, Jakob Ovrum wrote:
 On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 ...
Added to the review queue as a work in progress with relevant links: http://wiki.dlang.org/Review_Queue
Yay! -- Andrei
Feb 05 2015
prev sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
On 05.02.2015 at 10:07, Jakob Ovrum wrote:
 On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
 ...
Added to the review queue as a work in progress with relevant links: http://wiki.dlang.org/Review_Queue
Thanks! I(t) should be ready for an official review in one or two weeks when my schedule relaxes a little bit.
Feb 05 2015