
digitalmars.D - std.data.json formal review

reply "Atila Neves" <atila.neves gmail.com> writes:
Start of the two week process, folks.

Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/

Atila
Jul 28 2015
next sibling parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no, unless there is some sort of proof that it will work with allocators. I have used the code from its vibe.d days, so it's not an issue of how well it works, nor nit-picking. Just: can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request than, you know, rely on the GC.
Jul 28 2015
next sibling parent "Etienne Cimon" <etcimon gmail.com> writes:
On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.
I totally agree with that, but shouldn't it be consistent across Phobos? I don't think it's possible to make an interface for custom allocators right now, because that question simply hasn't been ironed out along with std.allocator. So anything related to allocators belongs in another thread, IMO, and the review process here should be about the actual JSON interface.
Jul 28 2015
prev sibling next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no.
Just a reminder that this is the review thread, not the vote thread (in case anyone reading got confused).
 Unless there is some sort of proof that it will work with 
 allocators.

 I have used the code from vibe.d days so its not an issue of 
 how well it works nor nit picky. Just can I pass it an 
 allocator (optionally) and have it use that for all memory 
 usage?

 After all, I really would rather be able to deallocate all 
 memory allocated during a request then you know, rely on the GC.
That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
Jul 28 2015
next sibling parent reply "Etienne Cimon" <etcimon gmail.com> writes:
On Tuesday, 28 July 2015 at 15:55:04 UTC, Brad Anderson wrote:
 On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole 
 wrote:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no.
Just a reminder that this is the review thread, not the vote thread (in case anyone reading got confused).
 Unless there is some sort of proof that it will work with 
 allocators.

 I have used the code from vibe.d days so its not an issue of 
 how well it works nor nit picky. Just can I pass it an 
 allocator (optionally) and have it use that for all memory 
 usage?

 After all, I really would rather be able to deallocate all 
 memory allocated during a request then you know, rely on the 
 GC.
That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
From what I see of std.allocator, there's no Allocator interface? I think this would require changing the type to `struct JSONValue(Allocator)`, unless we see an actual interface implemented in Phobos.
Jul 28 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 4:23 a.m., Etienne Cimon wrote:
 On Tuesday, 28 July 2015 at 15:55:04 UTC, Brad Anderson wrote:
 On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no.
Just a reminder that this is the review thread, not the vote thread (in case anyone reading got confused).
 Unless there is some sort of proof that it will work with allocators.

 I have used the code from vibe.d days so its not an issue of how well
 it works nor nit picky. Just can I pass it an allocator (optionally)
 and have it use that for all memory usage?

 After all, I really would rather be able to deallocate all memory
 allocated during a request then you know, rely on the GC.
That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
From what I see from std.allocator, there's no Allocator interface? I think this would require changing the type to `struct JSONValue(Allocator)`, unless we see an actual interface implemented in phobos.
There is one: IAllocator. I use it throughout std.experimental.image. Unfortunately the site is down at the moment so I can't link the docs *grumbles*. By the way, even if an allocator is a struct, there is a type to wrap it up in a class.
Jul 28 2015
prev sibling parent reply Mathias Lang via Digitalmars-d <digitalmars-d puremagic.com> writes:
2015-07-28 17:55 GMT+02:00 Brad Anderson via Digitalmars-d <
digitalmars-d puremagic.com>:

  Unless there is some sort of proof that it will work with allocators.
 I have used the code from vibe.d days so its not an issue of how well it
 works nor nit picky. Just can I pass it an allocator (optionally) and have
 it use that for all memory usage?

 After all, I really would rather be able to deallocate all memory
 allocated during a request then you know, rely on the GC.
That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
Allocators are definitely a separate issue. std.allocator is a moving target: it's not yet part of a release, and consequently barely field-tested. We will find bugs, we might find design mistakes, we might head in a direction that turns out to be an anti-pattern (just like `opDispatch` for JSONValue ;) ). That's not to say the quality of the module isn't good - that would mean our release process is broken - but making a module's inclusion in experimental dependent on another experimental module will not improve the quality of the reviewed module.
Jul 28 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 4:25 a.m., Mathias Lang via Digitalmars-d wrote:
 2015-07-28 17:55 GMT+02:00 Brad Anderson via Digitalmars-d
 <digitalmars-d puremagic.com <mailto:digitalmars-d puremagic.com>>:


         Unless there is some sort of proof that it will work with
         allocators.

         I have used the code from vibe.d days so its not an issue of how
         well it works nor nit picky. Just can I pass it an allocator
         (optionally) and have it use that for all memory usage?

         After all, I really would rather be able to deallocate all
         memory allocated during a request then you know, rely on the GC.


     That's a good point. This is the perfect opportunity to hammer out
     how allocators are going to be integrated into other parts of Phobos.


 Allocator is definitely a separate issue. It's a moving target, it's not
 yet part of a release, and consequently barely field-tested. We will
 find bugs, we might find design mistakes, we might head in a direction
 which will turn out to be an anti-pattern (just like `opDispatch` for
 JSONValue ;) )
 It's not to say the quality of the module isn't good - that would mean
 our release process is broken -, but making a module inclusion to
 experimental dependent on another module in experimental will not
 improve the quality of the reviewed module.
Right now we just need a plan, and we're all good for std.data.json. It doesn't need to be implemented right now, but I'd rather we had a plan going forward for adding allocators to it than, you know, find out a year down the track that it needs a whole rewrite.
Jul 28 2015
prev sibling next sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.
If you pass a string or byte array as input, then there will be no allocations at all (the interface is @nogc). For other cases it supports custom allocation through an appender factory [1][2], since there is no standard allocator interface yet. But since that's the only place where memory is allocated (apart from lower-level code, such as BigInt), as soon as Appender supports custom allocators, or you write your own appender, the JSON parser will, too. Only if you use the DOM parser will there be some inevitable GC allocations, because the DOM representation uses dynamic and associative arrays.

1: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/lexer.d#L66
2: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/parser.d#L286
Jul 28 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 4:41 a.m., Sönke Ludwig wrote:
 Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.
If you pass a string or byte array as input, then there will be no allocations at all (the interface is nogc). For other cases it supports custom allocation through an appender factory [1][2], since there is no standard allocator interface, yet. But since that's the only place where memory is allocated (apart from lower level code, such as BigInt), as soon as Appender supports custom allocators, or you write your own appender, the JSON parser will, too. Only if you use the DOM parser, there will be some inevitable GC allocations, because the DOM representation uses dynamic and associative arrays. 1: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/lexer.d#L66 2: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/parser.d#L286
It was after 3am when I did my initial look, but I saw the appender usage. I'm OK with this. The DOM parser, on the other hand... ugh, this is where we do need IAllocator being used. Although by the sounds of it, we would need a map collection which supports allocators before that can be done.
Jul 28 2015
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:
 I have used the code from vibe.d days so its not an issue of how well it
 works nor nit picky.
You should still have a closer look, as it isn't very similar to the vibe.d code at all, but a rather radical evolution.
Jul 28 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 4:43 a.m., Sönke Ludwig wrote:
 Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:
 I have used the code from vibe.d days so its not an issue of how well it
 works nor nit picky.
You should still have a closer look, as it isn't very similar to the vibe.d code at all, but a rather radical evolution.
Again, it was after 3am when I first looked. I'll take a closer look and create a new thread on this post about anything I find.
Jul 28 2015
prev sibling next sibling parent reply "Etienne Cimon" <etcimon gmail.com> writes:
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
This is cool: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/value.d#L183

I was getting tired of programmatically checking for null, then checking for the object type, before moving along in the object and doing the same recursively. Not quite as intuitive as the optional chaining `?.` operator in Swift, but it gets pretty close: https://blog.sabintsev.com/optionals-in-swift-c94fd231e7a4#5622
Jul 28 2015
parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 28.07.2015 um 17:19 schrieb Etienne Cimon:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
This is cool: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/value.d#L183 I was getting tired of programmatically checking for null, then checking for object type, before moving along in the object and doing the same recursively. Not quite as intuitive as the optional chaining ?. operator in swift but it gets pretty close https://blog.sabintsev.com/optionals-in-swift-c94fd231e7a4#5622
An idea might be to support something like this:

    json_value.opt.foo.bar[2].baz

or

    opt(json_value).foo.bar[2].baz

opt (name is debatable) would return a wrapper struct around the JSONValue that supports opDispatch/opIndex and propagates a missing field to the top gracefully. It could also keep track of the complete path to give a nice error message when a non-existent value is dereferenced.
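A minimal sketch of such a wrapper, written here against Phobos' std.json JSONValue purely for illustration (the names OptValue/opt and all details are assumptions, not the reviewed module's API; path tracking for error messages is omitted):

```d
import std.json;

// Hypothetical non-throwing accessor: a missing field anywhere in the
// chain yields an empty OptValue instead of an exception.
struct OptValue
{
    private const(JSONValue)* value;

    bool exists() const { return value !is null; }

    // opt(j).foo forwards to opIndex("foo")
    OptValue opDispatch(string name)() const { return this[name]; }

    OptValue opIndex(string name) const
    {
        if (value !is null && value.type == JSONType.object)
        {
            if (auto field = name in value.object)
                return OptValue(field);
        }
        return OptValue(null); // missing field propagates gracefully
    }

    OptValue opIndex(size_t i) const
    {
        if (value !is null && value.type == JSONType.array && i < value.array.length)
            return OptValue(&value.array[i]);
        return OptValue(null);
    }
}

OptValue opt(ref const JSONValue v) { return OptValue(&v); }
```

With that, `opt(json_value).foo.bar[2].baz.exists` answers whether the whole path resolves, without any intermediate null checks.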
Jul 28 2015
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Tuesday, 28 July 2015 at 18:45:51 UTC, Sönke Ludwig wrote:
 An idea might be to support something like this:

 json_value.opt.foo.bar[2].baz
 or
 opt(json_value).foo.bar[2].baz

 opt (name is debatable) would return a wrapper struct around 
 the JSONValue that supports opDispatch/opIndex and propagates a 
 missing field to the top gracefully. It could also keep track 
 of the complete path to give a nice error message when a 
 non-existent value is dereferenced.
+1. This would solve the cumbersome access of deeply nested values that I've had to deal with when using stdx.data.json. Combine that with the Algebraic improvements you've mentioned before and it'll be just about as pleasant to use as it could be.
Jul 28 2015
prev sibling parent "Etienne Cimon" <etcimon gmail.com> writes:
On Tuesday, 28 July 2015 at 18:45:51 UTC, Sönke Ludwig wrote:
 Am 28.07.2015 um 17:19 schrieb Etienne Cimon:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
This is cool: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/value.d#L183 I was getting tired of programmatically checking for null, then checking for object type, before moving along in the object and doing the same recursively. Not quite as intuitive as the optional chaining ?. operator in swift but it gets pretty close https://blog.sabintsev.com/optionals-in-swift-c94fd231e7a4#5622
An idea might be to support something like this: json_value.opt.foo.bar[2].baz or opt(json_value).foo.bar[2].baz opt (name is debatable) would return a wrapper struct around the JSONValue that supports opDispatch/opIndex and propagates a missing field to the top gracefully. It could also keep track of the complete path to give a nice error message when a non-existent value is dereferenced.
I like it quite well. No, actually, a lot. Thinking about it some more... this could end up being the most convenient feature ever known to mankind and would likely push it towards a new age of grand discoveries, infinite fusion power and space colonization. Let's do it.
Jul 28 2015
prev sibling next sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 28.07.2015 um 16:07 schrieb Atila Neves:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Thanks for making it happen! Can you also post a quick link to this thread in D.announce?
Jul 28 2015
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 7:07 AM, Atila Neves wrote:
 Start of the two week process, folks.
Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager.

Just looking at the documentation only, some general notes:

1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs.

2. JSON is a trivial format, http://json.org/. But I count 6 files and 30 names in the public API.

3. Stepping back a bit, when I think of parsing JSON data, I think:

    auto ast = inputrange.toJSON();

where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary. To create output:

    auto r = ast.toChars();  // r is an InputRange of characters
    writeln(r);

So, we'll need:

    toJSON
    toChars
    JSONException

The possible JSON values are:

    string
    number
    object (associative arrays)
    array
    true
    false
    null

Since these are D builtin types, they can actually be a simple union of D builtin types.

There is a decision needed about whether toJSON() allocates data or returns slices into its input range. This can be 'static if' tested by: if the input range can return immutable slices. toChars() can take a compile-time argument to determine if it is 'pretty' or not.
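The "simple union of D builtin types" Walter describes can be sketched today with std.variant's Algebraic, where `This` supplies the recursive object/array cases (the alias name `JSON` is illustrative, not a proposed API):

```d
import std.variant;

// A JSON value is one of: null, boolean, number, string,
// array of values, or object (string -> value map).
alias JSON = Algebraic!(typeof(null), bool, double, string,
                        This[], This[string]);
```

A value can then be queried for its kind via `peek`/`get` overload-style dispatch, much as Walter suggests.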
Jul 28 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote:
[...]
 3. Stepping back a bit, when I think of parsing JSON data, I think:
 
     auto ast = inputrange.toJSON();
 
 where toJSON() accepts an input range and produces a container, the
 ast. The ast is just a JSON value. Then, I can just query the ast to
 see what kind of value it is (using overloading), and walk it as
 necessary.
+1. The API should be as simple as possible. Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then you can just use to() to convert between a JSON container and the value that it represents (assuming the types are compatible). OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once). I'm not sure what a good API for that would be, though.
 To create output:
 
     auto r = ast.toChars();  // r is an InputRange of characters
     writeln(r);
 
 So, we'll need:
     toJSON
     toChars
Shouldn't it just be toString()? [...]
 The possible JSON values are:
     string
     number
     object (associative arrays)
     array
     true
     false
     null
 
 Since these are D builtin types, they can actually be a simple union
 of D builtin types.
 
 There is a decision needed about whether toJSON() allocates data or
 returns slices into its inputrange. This can be 'static if' tested by:
 if inputrange can return immutable slices. toChars() can take a
 compile time argument to determine if it is 'pretty' or not.
Whether or not toJSON() allocates *data*, it will have to allocate container nodes of some sort. At a minimum, it will need to use AAs, so it cannot be @nogc.

T

-- 
Recently, our IT department hired a bug-fix engineer. He used to work for Volkswagen.
Jul 28 2015
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 3:37 PM, H. S. Teoh via Digitalmars-d wrote:
 On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d
wrote:
 Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then
 you can just use to() to convert between a JSON container and the value
 that it represents (assuming the types are compatible).
Well, I wouldn't want std.conv to be importing std.json.
 OTOH, some people might want the option of parser-driven data processing
 instead (e.g. the JSON data is very large and we don't want to store the
 whole thing in memory at once).
That is a good point.
 I'm not sure what a good API for that would be, though.
Probably simply returning an InputRange of JSON values.
 To create output:

      auto r = ast.toChars();  // r is an InputRange of characters
      writeln(r);

 So, we'll need:
      toJSON
      toChars
Shouldn't it just be toString()?
No. toString() returns a string, which has to be allocated. toChars() (an upcoming convention) would return an InputRange instead, side-stepping allocation.
 Whether or not toJSON() allocates *data*, it will have to allocate
 container nodes of some sort. At the minimum, it will need to use AA's,
 so it cannot be  nogc.
That's right. At some point the API will need to add a parameter for Andrei's allocator system.
Jul 28 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jul 28, 2015 at 03:55:22PM -0700, Walter Bright via Digitalmars-d wrote:
 On 7/28/2015 3:37 PM, H. S. Teoh via Digitalmars-d wrote:
On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote:
Ideally, I'd say hook it up to std.conv.to for maximum flexibility.
Then you can just use to() to convert between a JSON container and
the value that it represents (assuming the types are compatible).
Well, I wouldn't want std.conv to be importing std.json.
I'm pretty sure std.conv has interfaces that allow you to keep JSON-specific stuff in std.json, so that you don't get the JSON conversion capability until you actually import std.json.
OTOH, some people might want the option of parser-driven data
processing instead (e.g. the JSON data is very large and we don't
want to store the whole thing in memory at once).
That is a good point.
I'm not sure what a good API for that would be, though.
Probably simply returning an InputRange of JSON values.
But how would you capture the nesting substructures?
To create output:

     auto r = ast.toChars();  // r is an InputRange of characters
     writeln(r);

So, we'll need:
     toJSON
     toChars
Shouldn't it just be toString()?
No. toString() returns a string, which has to be allocated. toChars() (an upcoming convention) would return an InputRange instead, side-stepping allocation.
[...] ??! Surely you have heard of the non-allocating overload of toString?

    void toString(scope void delegate(const(char)[]) dg);

T

-- 
When solving a problem, take care that you do not become part of the problem.
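For reference, a type opts into that overload like this (`Point` is an illustrative example, not from the thread): std.format hands the method a sink delegate that writes straight to the output, so the type itself allocates no intermediate string.

```d
import std.format : format, formattedWrite;

struct Point
{
    int x, y;

    // The non-allocating sink overload: called by std.format with a
    // delegate that forwards chunks directly to the destination.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink.formattedWrite("(%s, %s)", x, y);
    }
}
```

`format("%s", Point(1, 2))` then routes through the sink rather than building a temporary string inside Point.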
Jul 28 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 5:15 PM, H. S. Teoh via Digitalmars-d wrote:
 Probably simply returning an InputRange of JSON values.
But how would you capture the nesting substructures?
A JSON value is a tagged union of the various types.
 ??!  Surely you have heard of the non-allocating overload of toString?
 	void toString(scope void delegate(const(char)[]) dg);
Not range friendly.
Jul 28 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-07-29 06:57, Walter Bright wrote:

 A JSON value is a tagged union of the various types.
But in most cases I think there will be one root node, of type object. In that case it would be a range with only one element? How does that help?

-- 
/Jacob Carlborg
Jul 29 2015
next sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 12:10 schrieb Jacob Carlborg:
 On 2015-07-29 06:57, Walter Bright wrote:

 A JSON value is a tagged union of the various types.
But in most cases I think there will be one root node, of type object. In that case it would be range with only one element? How does that help?
I think a better approach than adding such a special case is to add a readValue function that takes a range of parser nodes and reads it into a single JSONValue. That way one can use the pull parser to jump between array or object entries and then extract individual values, or maybe even use nodes.map!readValue to get a range of values...
Jul 29 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 3:10 AM, Jacob Carlborg wrote:
 On 2015-07-29 06:57, Walter Bright wrote:

 A JSON value is a tagged union of the various types.
But in most cases I think there will be one root node, of type object.
An object is a collection of other Values.
 In that case it would be range with only one element? How does that help?
I don't understand the question.
Jul 29 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-07-29 20:33, Walter Bright wrote:

 On 7/29/2015 3:10 AM, Jacob Carlborg wrote:
 But in most cases I think there will be one root node, of type object.
An object is a collection of other Values.

 In that case it would be range with only one element? How does that help?

I don't understand the question.
I guess I'm finding it difficult to picture a JSON structure as a range. How would the following JSON be returned as a range?

    {
        "a": 1,
        "b": [2, 3],
        "c": { "d": 4 }
    }

-- 
/Jacob Carlborg
Jul 29 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 11:51 AM, Jacob Carlborg wrote:
 I guess I'm finding it difficult to picture a JSON structure as a range. How
 would the following JSON be returned as a range?

 {
    "a": 1,
    "b": [2, 3],
    "c": { "d": 4 }
 }
If it was returned as a range of nodes, it would be:

    Object, string, number, string, array, number, number, end, string, object, string, number, end, end

If it was returned as a Value, then you could ask the value to return a range of nodes. A container is not a range, although it may offer a way to get a range that iterates over its contents.
Jul 29 2015
parent Jacob Carlborg <doob me.com> writes:
On 2015-07-30 01:34, Walter Bright wrote:

 It if was returned as a range of nodes, it would be:

     Object, string, number, string, array, number, number, end, string,
 object, string, number, end, end
Ah, that makes sense. I never thought of an "end" mark like that; pretty clever.

-- 
/Jacob Carlborg
Jul 30 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 3:55 PM, Walter Bright wrote:
 OTOH, some people might want the option of parser-driven data processing
 instead (e.g. the JSON data is very large and we don't want to store the
 whole thing in memory at once).
That is a good point.
So it appears that JSON can be in one of 3 useful states:

1. a range of characters (rc)
2. a range of nodes (rn)
3. a container of JSON values (values)

What's necessary is simply the ability to convert between these states (names are just for illustration):

    rn = rc.toNodes();
    values = rn.toValues();
    rn = values.toNodes();
    rc = rn.toChars();

So, if I wanted to simply pretty-print a JSON string s:

    s.toNodes.toChars();

I.e. it's all composable.
Jul 28 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jul 28, 2015 at 10:43:20PM -0700, Walter Bright via Digitalmars-d wrote:
 On 7/28/2015 3:55 PM, Walter Bright wrote:
OTOH, some people might want the option of parser-driven data
processing instead (e.g. the JSON data is very large and we don't
want to store the whole thing in memory at once).
That is a good point.
So it appears that JSON can be in one of 3 useful states:

1. a range of characters (rc)
2. a range of nodes (rn)
3. a container of JSON values (values)
[...] How does a linear range of nodes convey a nested structure? T -- Let's call it an accidental feature. -- Larry Wall
Jul 28 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 10:49 PM, H. S. Teoh via Digitalmars-d wrote:
 How does a linear range of nodes convey a nested structure?
You'd need to add a special node type, 'end'. So an array [1,true] would look like:

    array
    number
    true
    end
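As a sketch, the flattening Walter describes could be written against std.json's DOM type like this (`NodeKind` and `toNodes` are illustrative names, not part of any proposed API; here booleans collapse into one node kind):

```d
import std.json;

// Node kinds for a linearised JSON stream; `end` closes the most
// recently opened object or array, conveying the nesting structure.
enum NodeKind { object, array, str, number, boolean, null_, end }

NodeKind[] toNodes(const JSONValue v)
{
    switch (v.type)
    {
        case JSONType.object:
        {
            NodeKind[] r = [NodeKind.object];
            foreach (key, child; v.object)
                r ~= NodeKind.str ~ toNodes(child); // key node, then value nodes
            return r ~ NodeKind.end;
        }
        case JSONType.array:
        {
            NodeKind[] r = [NodeKind.array];
            foreach (child; v.array)
                r ~= toNodes(child);
            return r ~ NodeKind.end;
        }
        case JSONType.string:
            return [NodeKind.str];
        case JSONType.integer:
        case JSONType.uinteger:
        case JSONType.float_:
            return [NodeKind.number];
        case JSONType.true_:
        case JSONType.false_:
            return [NodeKind.boolean];
        default:
            return [NodeKind.null_];
    }
}
```

For `[1,true]` this yields array, number, boolean, end, matching Walter's example.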
Jul 29 2015
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 07:43 schrieb Walter Bright:
 On 7/28/2015 3:55 PM, Walter Bright wrote:
 OTOH, some people might want the option of parser-driven data processing
 instead (e.g. the JSON data is very large and we don't want to store the
 whole thing in memory at once).
That is a good point.
So it appears that JSON can be in one of 3 useful states: 1. a range of characters (rc) 2. a range of nodes (rn) 3. a container of JSON values (values) What's necessary is simply the ability to convert between these states: (names are just for illustration) rn = rc.toNodes(); values = rn.toValues(); rn = values.toNodes(); rc = rn.toChars(); So, if I wanted to simply pretty print a JSON string s: s.toNodes.toChars(); I.e. it's all composable.
There are actually even four levels:

1. Range of characters
2. Range of tokens
3. Range of nodes
4. DOM value

Having a special case for a range of DOM values may or may not be a worthwhile thing to optimize for handling big JSON arrays of values. But there is always the pull parser for that kind of data processing.

Currently not all, but most, conversions between the levels are implemented, and sometimes a level is skipped for efficiency. The question is if it would be worth the effort and the API complexity to implement all of them:

    lexJSON:         character range -> token range
    parseJSONStream: character range -> node range
    parseJSONStream: token range -> node range
    parseJSONValue:  character range -> DOM value
    parseJSONValue:  token range -> DOM value (same for toJSONValue)
    writeJSON:       token range -> character range (output range)
    writeJSON:       node range -> character range (output range)
    writeJSON:       DOM value -> character range (output range)
    (same for toJSON with string output)

Adding an InputStream based version of writeJSON would be an option, but the question is how performant that would be and how to go about implementing the number->InputRange functionality.
Jul 29 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 1:37 AM, Sönke Ludwig wrote:
 There are actually even four levels:
 1. Range of characters
 2. Range of tokens
 3. Range of nodes
 4. DOM value
What's the need for users to see a token stream? I don't know what the DOM value is - is that just JSON as an ast?
 Having a special case for range of DOM values may or may not be a worthwhile
 thing to optimize for handling big JSON arrays of values.
I see no point for that.
 Currently not all, but most, conversions between the levels are implemented,
and
 sometimes a level is skipped for efficiency. The question is if it would be
 worth the effort and the API complexity to implement all of them.

 lexJSON: character range -> token range
 parseJSONStream: character range -> node range
 parseJSONStream: token range -> node range
 parseJSONValue: character range -> DOM value
 parseJSONValue: token range -> DOM value (same for toJSONValue)
 writeJSON: token range -> character range (output range)
 writeJSON: node range -> character range (output range)
 writeJSON: DOM value -> character range (output range)
 writeJSON: to -> character range (output range)
 (same for toJSON with string output)
I don't see why there are more than the 3 I mentioned.
Jul 29 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 20:44, Walter Bright wrote:
 On 7/29/2015 1:37 AM, Sönke Ludwig wrote:
 There are actually even four levels:
 1. Range of characters
 2. Range of tokens
 3. Range of nodes
 4. DOM value
What's the need for users to see a token stream? I don't know what the DOM value is - is that just JSON as an ast?
Yes.
 Having a special case for range of DOM values may or may not be a
 worthwhile
 thing to optimize for handling big JSON arrays of values.
I see no point for that.
Hm, I misread "container of JSON values" as "range of JSON values". I guess you just meant JSONValue, so my comment doesn't apply.
 Currently not all, but most, conversions between the levels are
 implemented, and
 sometimes a level is skipped for efficiency. The question is if it
 would be
 worth the effort and the API complexity to implement all of them.

 lexJSON: character range -> token range
 parseJSONStream: character range -> node range
 parseJSONStream: token range -> node range
 parseJSONValue: character range -> DOM value
 parseJSONValue: token range -> DOM value (same for toJSONValue)
 writeJSON: token range -> character range (output range)
 writeJSON: node range -> character range (output range)
 writeJSON: DOM value -> character range (output range)
 writeJSON: to -> character range (output range)
 (same for toJSON with string output)
I don't see why there are more than the 3 I mentioned.
The token level is useful for reasoning about the text representation. It could be used for example to implement syntax highlighting, or for using the location information to mark errors in the source code.
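As a sketch of what the token level enables (the `kind` and `location` fields used here are assumptions about the token type, not confirmed API):

```d
import stdx.data.json;
import std.stdio : writefln;

// Walk the token stream and report each token's source location - the
// kind of information a syntax highlighter or error reporter needs.
void dumpTokens(string source)
{
    foreach (token; lexJSON(source))
        writefln("%s at line %s, column %s",
                 token.kind, token.location.line, token.location.column);
}
```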
Jul 29 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 1:41 PM, Sönke Ludwig wrote:
 The token level is useful for reasoning about the text representation. It could
 be used for example to implement syntax highlighting, or for using the location
 information to mark errors in the source code.
Ok, I see your point. The question then becomes does the node stream really add enough value to justify its existence, as it greatly overlaps the token stream.
Jul 29 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 30.07.2015 at 05:25, Walter Bright wrote:
 On 7/29/2015 1:41 PM, Sönke Ludwig wrote:
 The token level is useful for reasoning about the text representation.
 It could
 be used for example to implement syntax highlighting, or for using the
 location
 information to mark errors in the source code.
Ok, I see your point. The question then becomes does the node stream really add enough value to justify its existence, as it greatly overlaps the token stream.
I agree that in case of JSON their difference can be a bit subtle. Basically the node stream adds knowledge about the nesting of elements, as well as adding semantic meaning to special token sequences that the library users would otherwise have to parse themselves. Finally, it also guarantees a valid JSON structure, while a token range could have tokens in any order. The knowledge about nesting in particular is also a requirement for the high-level pull parser functions (skipToKey, readArray, readString etc.) that make working with that kind of pull parser interface actually bearable, outside of mechanical code like a generic serialization framework.
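A hedged sketch of how the helpers named above might be used (skipToKey, readArray and readString appear in this thread and the linked docs, but the exact signatures shown here are assumptions):

```d
import stdx.data.json;

// Pull all strings out of {"names": ["a", "b", ...], ...} without
// building a DOM. skipToKey relies on the node stream's knowledge of
// object nesting; readArray relies on the guaranteed valid structure.
string[] readNames(string json)
{
    auto nodes = parseJSONStream(json);
    string[] names;
    nodes.skipToKey("names");
    nodes.readArray!({
        names ~= nodes.readString();
    });
    return names;
}
```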
Jul 30 2015
prev sibling next sibling parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 00:37, H. S. Teoh via Digitalmars-d wrote:
 On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d
wrote:
 [...]
 3. Stepping back a bit, when I think of parsing JSON data, I think:

      auto ast = inputrange.toJSON();

 where toJSON() accepts an input range and produces a container, the
 ast. The ast is just a JSON value. Then, I can just query the ast to
 see what kind of value it is (using overloading), and walk it as
 necessary.
+1. The API should be as simple as possible.
http://s-ludwig.github.io/std_data_json/stdx/data/json/parser/toJSONValue.html
 Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then
 you can just use to() to convert between a JSON container and the value
 that it represents (assuming the types are compatible).
We could maybe do that if we keep the current JSONValue as a struct wrapper around Algebraic. But I guess this would create an ambiguity between JSONValue("...") parsing a JSON string versus being constructed as a JSON string value. Or does to! hook up to something other than the constructor?
 OTOH, some people might want the option of parser-driven data processing
 instead (e.g. the JSON data is very large and we don't want to store the
 whole thing in memory at once). I'm not sure what a good API for that
 would be, though.
See http://s-ludwig.github.io/std_data_json/stdx/data/json/parser/parseJSONStream.html and the various UFCS "read" and "skip" functions in http://s-ludwig.github.io/std_data_json/stdx/data/json/parser.html
Jul 29 2015
prev sibling parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
On 2015-07-29 at 00:37, H. S. Teoh via Digitalmars-d wrote:
 On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d
wrote:
 [...]
 3. Stepping back a bit, when I think of parsing JSON data, I think:

      auto ast = inputrange.toJSON();

 where toJSON() accepts an input range and produces a container, the
 ast. The ast is just a JSON value. Then, I can just query the ast to
 see what kind of value it is (using overloading), and walk it as
 necessary.
+1. The API should be as simple as possible. Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then you can just use to() to convert between a JSON container and the value that it represents (assuming the types are compatible). OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once). I'm not sure what a good API for that would be, though.
Here's my range-based parser; you can parse a 1 TB JSON file without a single allocation. It needs heavy polishing, but I didn't have the time/need to do it. Basically a WIP, but maybe someone will find it useful. https://github.com/pszturmaj/json-streaming-parser
Jul 29 2015
prev sibling next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 00:29, Walter Bright wrote:
 On 7/28/2015 7:07 AM, Atila Neves wrote:
 Start of the two week process, folks.
Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager. Just looking at the documentation only, some general notes: 1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs.
This is actually one of my pet peeves. Having a *readable* API that tells the reader immediately what happens is IMO one of the most important aspects (far more important than an API that allows quick typing). A number of times I've seen D code that omits part of what it actually does in its name and the result was that it was constantly necessary to scroll up to see where a particular name might come from. So I have a strong preference to keep "JSON", because it's an integral part of the semantics.
 2. JSON is a trivial format, http://json.org/. But I count 6 files and
 30 names in the public API.
The whole thing provides a stream parser with high level helpers to make it convenient to use, a DOM module, a separate lexer and a generator module that operates in various different modes (maybe two additional modes still to come!). Every single function provides real and frequently useful benefits. So if anything, there are still some little things missing. All in all, even if JSON may be a simple format, the source code is already almost 5k LOC (includes unit tests of course). But apart from maintainability they have mainly been separated to minimize the amount of code that needs to be dragged in for a particular functionality (not only other JSON modules, but also from different parts of Phobos).
 3. Stepping back a bit, when I think of parsing JSON data, I think:

      auto ast = inputrange.toJSON();

 where toJSON() accepts an input range and produces a container, the ast.
 The ast is just a JSON value. Then, I can just query the ast to see what
 kind of value it is (using overloading), and walk it as necessary.
We can drop the "Value" part of the name of course, if we expect that function to be used a lot, but there is still the parseJSONStream function, which is arguably not less important. BTW, you've only mentioned the DOM part so far, but for any code where performance is a priority, the stream based pull parser is basically the way to go. This would also be the natural entry point for any serialization library. And my prediction is, if we do it right, that working with JSON will in most cases simply mean "S s = deserializeJSON(json_input);", where S is a D struct that gets populated with the deserialized JSON data. Where that doesn't fit, performance oriented code would use the pull parser. So the DOM part of the system, which is the only thing the current JSON module has, will only be left as a niche functionality.
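The predicted serialization usage would look roughly like this (deserializeJSON as used here is the hypothetical future API described above, not a function of the reviewed package):

```d
// Hypothetical: a struct populated directly from JSON text, with no
// intermediate DOM - the pull parser fills in the fields.
struct Config
{
    string name;
    int[] values;
}

void example()
{
    string json_input = `{"name": "test", "values": [1, 2, 3]}`;
    Config c = deserializeJSON!Config(json_input);
}
```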
 To create output:

      auto r = ast.toChars();  // r is an InputRange of characters
      writeln(r);
Do we have an InputRange version of the various number-to-string conversions? It would be quite inconvenient to reinvent those (double, long, BigInt) in the JSON package. Of course, using to!string internally would be an option, but it would obviously destroy all @nogc opportunities and performance benefits.
 So, we'll need:
      toJSON
      toChars
      JSONException

 The possible JSON values are:
      string
      number
      object (associative arrays)
      array
      true
      false
      null

 Since these are D builtin types, they can actually be a simple union of
 D builtin types.
The idea is to have JSONValue be a simple alias to Algebraic!(...), just that there are currently still some workarounds for DMD < 2.067.0 on top, which means that JSONValue is a struct that "alias this" inherits from Algebraic for the time being. Those workarounds will be removed when the code is actually put into Phobos. But a simple union would obviously not be enough; it still needs a type tag of some form and needs to provide a safe interface on top of it. Algebraic is the only thing that comes close right now, but I'd really prefer to have a fully statically typed version of Algebraic that uses an enum as the type tag instead of working with delegates/typeinfo.
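A minimal sketch of the enum-tagged variant idea (simplified for illustration; it omits BigInt, safety checks, and everything else a real JSONValue would need):

```d
struct TaggedJSONValue
{
    // The enum tag replaces Algebraic's TypeInfo/delegate machinery
    // and allows fully static dispatch, e.g. via final switch.
    enum Kind { null_, boolean, number, string_, array, object }

    Kind kind;
    union
    {
        bool boolean;
        double number;
        string str;
        TaggedJSONValue[] array;
        TaggedJSONValue[string] object;
    }
}
```

A safe wrapper would check `kind` before each union access, which is what Algebraic does dynamically today.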
 There is a decision needed about whether toJSON() allocates data or
 returns slices into its inputrange. This can be 'static if' tested by:
 if inputrange can return immutable slices.
The test is currently "is(T == string) || is (T == immutable(ubyte)[])", but slicing is done in those cases and the non-DOM parser interface is even @nogc as long as exceptions are disabled.
 toChars() can take a compile
 time argument to determine if it is 'pretty' or not.
As long as JSON DOM values are stored in a generic Algebraic (which is a huge win in terms of interoperability!), toChars won't suffice as a name. It would have to be toJSON(Chars) (as it basically is now). I gave the "pretty" version a separate name simply because it's more convenient to use and pretty printing will probably be by far the most frequently used option when converting to a string.
Jul 29 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 1:18 AM, Sönke Ludwig wrote:
 On 29.07.2015 at 00:29, Walter Bright wrote:
 1. Not sure that 'JSON' needs to be embedded in the public names.
 'parseJSONStream' should just be 'parseStream', etc. Name
 disambiguation, if needed, should be ably taken care of by a number of D
 features for that purpose. Additionally, I presume that the stdx.data
 package implies a number of different formats. These formats should all
 use the same names with as similar as possible APIs - this won't work
 too well if JSON is embedded in the APIs.
This is actually one of my pet peeves. Having a *readable* API that tells the reader immediately what happens is IMO one of the most important aspects (far more important than an API that allows quick typing). A number of times I've seen D code that omits part of what it actually does in its name and the result was that it was constantly necessary to scroll up to see where a particular name might come from. So I have a strong preference to keep "JSON", because it's an integral part of the semantics.
I agree with your goal of readability. And if someone wants to write code that emphasizes it's JSON, they can write it as std.data.json.parseStream. (It's not about saving typing, it's about avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-) ) This is not a huge deal for me, but I'm not in favor of establishing a new convention that repeats the module name. It eschews one of the advantages of having module name spaces in the first place, and evokes the old C style naming conventions.
 2. JSON is a trivial format, http://json.org/. But I count 6 files and
 30 names in the public API.
The whole thing provides a stream parser with high level helpers to make it convenient to use, a DOM module, a separate lexer and a generator module that operates in various different modes (maybe two additional modes still to come!). Every single function provides real and frequently useful benefits. So if anything, there are still some little things missing.
I understand there is a purpose to each of those things, but there's also considerable value in a simpler API.
 All in all, even if JSON may be a simple format, the source code is already
 almost 5k LOC (includes unit tests of course).
I don't count unit tests as LOC :-)
 But apart from maintainability
 they have mainly been separated to minimize the amount of code that needs to be
 dragged in for a particular functionality (not only other JSON modules, but
also
 from different parts of Phobos).
They are so strongly related I don't see this as a big issue. Also, if they are templates, they don't get compiled in if not used.
 3. Stepping back a bit, when I think of parsing JSON data, I think:

      auto ast = inputrange.toJSON();

 where toJSON() accepts an input range and produces a container, the ast.
 The ast is just a JSON value. Then, I can just query the ast to see what
 kind of value it is (using overloading), and walk it as necessary.
We can drop the "Value" part of the name of course, if we expect that function to be used a lot, but there is still the parseJSONStream function which is arguably not less important. BTW, you just mentioned the DOM part so far, but for any code where performance is a priority, the stream based pull parser is basically the way to go. This would also be the natural entry point for any serialization library.
Agreed elsewhere. But still, I am not seeing a range interface on the functions. The lexer, for example, does not accept an input range of characters. Having a range interface is absolutely critical, and is the thing I am the most adamant about with all new Phobos additions. Any function that accepts arbitrarily long data should accept an input range instead, any function that generates arbitrary data should present that as an input range. Any function that builds a container should accept an input range to fill that container with. Any function that builds a container should also be an output range.
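The contract Walter describes can be sketched as a template constraint; `lexTokens` below is a toy stand-in, not the package's actual declaration:

```d
import std.range.primitives : isInputRange, ElementType;
import std.algorithm.iteration : map;

// Accepts any input range of characters (not just string) and yields
// its result lazily as another input range, so nothing is buffered.
auto lexTokens(Input)(Input input)
    if (isInputRange!Input && is(ElementType!Input : dchar))
{
    // A real lexer would produce tokens; forwarding the characters is
    // enough to show the range-in/range-out shape of the API.
    return input.map!(c => c);
}
```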
 And my prediction is, if we do it right, that working with JSON will in most
 cases simply mean "S s = deserializeJSON(json_input);", where S is a D struct
 that gets populated with the deserialized JSON data.
json_input must be a input range of characters.
 Where that doesn't fit,
 performance oriented code would use the pull parser.
I am not sure what you mean by 'pull parser'. Do you mean the parser presents an input range as its output, and incrementally parses only as the next value is requested?
 So the DOM part of the
 system, which is the only thing the current JSON module has, will only be left
 as a niche functionality.
That's ok. Is it normal practice to call the JSON data structure a Document Object Model?
 To create output:

      auto r = ast.toChars();  // r is an InputRange of characters
      writeln(r);
Do we have an InputRange version of the various number-to-string conversions?
We do now, at least for integers. I plan to do ones for floating point.
 It would be quite inconvenient to reinvent those (double, long, BigInt) in the
JSON
 package.
Right. It's been reinvented multiple times in Phobos, which is absurd. If you're reinventing them in std.data.json, then we're doing something wrong again.
 Of course, using to!string internally would be an option, but it would
 obviously destroy all  nogc opportunities and performance benefits.
That's exactly why the range versions were done.
 So, we'll need:
      toJSON
      toChars
      JSONException

 The possible JSON values are:
      string
      number
      object (associative arrays)
      array
      true
      false
      null

 Since these are D builtin types, they can actually be a simple union of
 D builtin types.
The idea is to have JSONValue be a simple alias to Algebraic!(...), just that there are currently still some workarounds for DMD < 2.067.0 on top, which means that JSONValue is a struct that "alias this" inherits from Algebraic for the time being. Those workarounds will be removed when the code is actually put into Phobos. But a simple union would obviously not be enough, it still needs a type tag of some form and needs to provide a safe interface on top of it.
Agreed.
 Algebraic is the only thing that comes close right now,
 but I'd really prefer to have a fully
 statically typed version of Algebraic that uses an enum as the type tag instead
 of working with delegates/typeinfo.
If Algebraic is not good enough for this, it is a failure and must be fixed.
 There is a decision needed about whether toJSON() allocates data or
 returns slices into its inputrange. This can be 'static if' tested by:
 if inputrange can return immutable slices.
The test is currently "is(T == string) || is (T == immutable(ubyte)[])", but slicing is done in those cases and the non-DOM parser interface is even nogc as long as exceptions are disabled.
With a range interface, you can test for 1) hasSlicing and 2) if ElementEncodingType is immutable. Why is ubyte being accepted? The ECMA-404 spec sez: "Conforming JSON text is a sequence of Unicode code points".
 toChars() can take a compile
 time argument to determine if it is 'pretty' or not.
As long as JSON DOM values are stored in a generic Algebraic (which is a huge win in terms of interoperability!), toChars won't suffice as a name.
Why not?
 It would have to be toJSON(Chars) (as it basically is now). I gave the
 "pretty" version a separate name simply because it's more convenient to
 use and pretty printing will probably be by far the most frequently used
 option when converting to a string.
So make pretty printing the default. In fact, I'm skeptical that a non-pretty-printed version is worthwhile. Note that an adapter algorithm can strip redundant whitespace.
Jul 29 2015
next sibling parent reply "Suliman" <evermind live.ru> writes:
If this implementation will be merged with phobos will vibed 
migrate to it, or it would two similar libs?
Jul 30 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 30.07.2015 at 09:27, Suliman wrote:
 If this implementation will be merged with phobos will vibed migrate to
 it, or it would two similar libs?
I'll then make the vibe.d JSON module compatible using "alias this" implicit conversions and then deprecate it over a longer period of time before it gets removed. And of course the serialization framework will be adjusted to work with the new JSON module.
Jul 30 2015
prev sibling next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 30 July 2015 at 04:41:51 UTC, Walter Bright wrote:
 I agree with your goal of readability. And if someone wants to 
 write code that emphasizes it's JSON, they can write it as 
 std.data.json.parseStream. (It's not about saving typing, it's 
 about avoiding extra redundant redundancy, I'm a big fan of 
 Strunk & White :-) ) This is not a huge deal for me, but I'm 
 not in favor of establishing a new convention that repeats the 
 module name. It eschews one of the advantages of having module 
 name spaces in the first place, and evokes the old C style 
 naming conventions.
Is there any reason why D doesn't allow json.parseStream() in this case? I remember the requirement of having the full module path being my first head-scratcher while learning D. The first example in TDPL had some source code that called split() (if memory serves) and Phobos had changed since the book was written, so you needed to disambiguate. I found it very odd that you have to type the whole thing when just the next level up would suffice to disambiguate it. The trend seems to be toward more deeply nested modules in Phobos, so having to type the full path will increasingly be a wart of D's. If we can't have the minimal necessary module paths then I'm completely in favor of parseJSONStream over the more general parseStream. I want that "json" in there one way or another (preferably by the method which makes it optional while maintaining brevity).
Jul 30 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/30/2015 9:58 AM, Brad Anderson wrote:
 If we can't have the minimal necessary module paths then I'm completely in
favor
 of parseJSONStream over the more general parseStream. I want that "json" in
 there one way or another (preferably by the method which makes it optional
while
 maintaining brevity).
I would think it unlikely to be parsing two different formats in one file. But in any case, you can always do this: import std.data.json : parseJSON = parse; Or put the import in a scope: void doNaughtyThingsWithJson() { import std.data.json; ... x.parse(); } The latter seems to be becoming the preferred D style.
Jul 30 2015
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Jul 30, 2015 at 12:43:40PM -0700, Walter Bright via Digitalmars-d wrote:
 On 7/30/2015 9:58 AM, Brad Anderson wrote:
If we can't have the minimal necessary module paths then I'm
completely in favor of parseJSONStream over the more general
parseStream. I want that "json" in there one way or another
(preferably by the method which makes it optional while maintaining
brevity).
I would think it unlikely to be parsing two different formats in one file. But in any case, you can always do this: import std.data.json : parseJSON = parse; Or put the import in a scope: void doNaughtyThingsWithJson() { import std.data.json; ... x.parse(); } The latter seems to be becoming the preferred D style.
Yeah, local imports are fast becoming my preferred D coding style, because it makes code portable -- if you move a function to a new module, you don't have to untangle its import dependencies if all imports are local. It's one of those little, overlooked things about D that contribute toward making it an awesome language. T -- Written on the window of a clothing store: No shirt, no shoes, no service.
Jul 30 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/30/2015 12:57 PM, H. S. Teoh via Digitalmars-d wrote:
 Yeah, local imports are fast becoming my preferred D coding style,
 because it makes code portable -- if you move a function to a new
 module, you don't have to untangle its import dependencies if all
 imports are local. It's one of those little, overlooked things about D
 that contribute toward making it an awesome language.
Funny how my preferred D style of writing code is steadily diverging from C++ style :-)
Jul 30 2015
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Jul 30, 2015 at 01:26:17PM -0700, Walter Bright via Digitalmars-d wrote:
 On 7/30/2015 12:57 PM, H. S. Teoh via Digitalmars-d wrote:
Yeah, local imports are fast becoming my preferred D coding style,
because it makes code portable -- if you move a function to a new
module, you don't have to untangle its import dependencies if all
imports are local. It's one of those little, overlooked things about
D that contribute toward making it an awesome language.
Funny how my preferred D style of writing code is steadily diverging from C++ style :-)
One would hope so, otherwise why are we here instead of in the C++ world? ;-) T -- This is not a sentence.
Jul 30 2015
parent reply "Suliman" <evermind live.ru> writes:
Is the current build ready for production? I am getting this error:

source\stdx\data\json\value.d(81): Error: @safe function
'stdx.data.json.value.JSONValue.this' cannot call @system function
'std.variant.VariantN!(12u, typeof(null), bool, double, long,
BigInt, string, JSONValue[],
JSONValue[string]).VariantN.__ctor!(typeof(null)).this'
Jul 31 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 31.07.2015 at 10:13, Suliman wrote:
 is the current build is ready for production? I am getting error:

 source\stdx\data\json\value.d(81): Error: safe function
 'stdx.data.json.value.JSONValue.this' cannot call system function
 'std.variant.VariantN!(12u, typeof(null), bool, double, long, BigInt,
 string, JSONValue[],
 JSONValue[string]).VariantN.__ctor!(typeof(null)).this'
2.068 "fixed" possible safety issues with VariantN by marking the interface system instead of trusted. Unfortunately that broke any safe code using Variant/Algebraic.
Jul 31 2015
parent reply "Suliman" <evermind live.ru> writes:
On Friday, 31 July 2015 at 12:16:02 UTC, Sönke Ludwig wrote:
 On 31.07.2015 at 10:13, Suliman wrote:
 is the current build is ready for production? I am getting 
 error:

 source\stdx\data\json\value.d(81): Error: safe function
 'stdx.data.json.value.JSONValue.this' cannot call system 
 function
 'std.variant.VariantN!(12u, typeof(null), bool, double, long, 
 BigInt,
 string, JSONValue[],
 JSONValue[string]).VariantN.__ctor!(typeof(null)).this'
2.068 "fixed" possible safety issues with VariantN by marking the interface system instead of trusted. Unfortunately that broke any safe code using Variant/Algebraic.
What revisions are usable? I checked some and all have an issue like:

source\App.d(5,34): Error: template stdx.data.json.parser.parseJSONValue
cannot deduce function from argument types !()(string), candidates are:
source\stdx\data\json\parser.d(105,11):
stdx.data.json.parser.parseJSONValue(LexOptions options =
LexOptions.init, Input)(ref Input input, string filename = "")
if (isStringInputRange!Input || isIntegralInputRange!Input)
Jul 31 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 31.07.2015 at 22:15, Suliman wrote:
 On Friday, 31 July 2015 at 12:16:02 UTC, Sönke Ludwig wrote:
 On 31.07.2015 at 10:13, Suliman wrote:
 is the current build is ready for production? I am getting error:

 source\stdx\data\json\value.d(81): Error: safe function
 'stdx.data.json.value.JSONValue.this' cannot call system function
 'std.variant.VariantN!(12u, typeof(null), bool, double, long, BigInt,
 string, JSONValue[],
 JSONValue[string]).VariantN.__ctor!(typeof(null)).this'
2.068 "fixed" possible safety issues with VariantN by marking the interface system instead of trusted. Unfortunately that broke any safe code using Variant/Algebraic.
Wat revision are usable? I checked some and all have issue like: source\App.d(5,34): Error: template stdx.data.json.parser.parseJSONValue cannot deduce function from argument types !()(string), candidates are: source\stdx\data\json\parser.d(105,11): stdx.data.json.parser.parseJSONVa lue(LexOptions options = LexOptions.init, Input)(ref Input input, string filenam e = "") if (isStringInputRange!Input || isIntegralInputRange!Input)
parseJSONValue takes a reference to an input range, so that it can consume the input and leave any trailing text after the JSON value in the range. For just converting a string to a JSONValue, use toJSONValue instead. I'll make this more clear in the documentation.
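The difference can be illustrated like this (a hedged sketch based on the behavior described above, not tested against the package):

```d
import stdx.data.json;

void example()
{
    // parseJSONValue consumes from a ref range and stops after the
    // first complete value, leaving any trailing text in the range...
    string two = `{"a": 1} {"b": 2}`;
    auto first = parseJSONValue(two); // `two` now holds the trailing text

    // ...while toJSONValue is the whole-string convenience form.
    auto value = toJSONValue(`{"a": 1}`);
}
```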
Aug 01 2015
parent reply "Suliman" <evermind live.ru> writes:
 parseJSONValue takes a reference to an input range, so that it 
 can consume the input and leave any trailing text after the 
 JSON value in the range. For just converting a string to a 
 JSONValue, use toJSONValue instead.

 I'll make this more clear in the documentation.
Yes please, because it's hard to understand the difference. Maybe it's possible to simplify it further? I also have trouble extracting a value:

response = toJSONValue(res.bodyReader.readAllUTF8());
writeln(to!int(response["code"]));

C:\D\dmd2\windows\bin\..\..\src\phobos\std\conv.d(295,24): Error: template
std.conv.toImpl cannot deduce function from argument types
!(int)(VariantN!20u), candidates are:
C:\D\dmd2\windows\bin\..\..\src\phobos\std\conv.d(361,3): std.conv.toImpl
(T, S)(S value) if (isImplicitlyConvertible!(S, T) && !isEnumStrToStr!(S,
T) && !isNullToStr!(S, T))

If I simply do:

writeln(response["code"]);

the code produces the right result (for example 200).

What value is stored under the key "code"? It doesn't look like a plain "200". How can I convert it to a string or an int?
Aug 01 2015
parent reply "Suliman" <evermind live.ru> writes:
Looks like it's a Variant type, so I tried to use the get! method to
extract the value from it:

writeln(get!(response["code"]));

But I get the error: Error: variable response cannot be read at
compile time
Aug 01 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 01.08.2015 at 16:15, Suliman wrote:
 Look like it's Variant type. So I tried to use method get! do extract
 value from it
 writeln(get!(response["code"]));

 But I get error: Error: variable response cannot be read at compile time
The correct syntax is: response["code"].get!int
Aug 01 2015
next sibling parent "Suliman" <evermind live.ru> writes:
On Saturday, 1 August 2015 at 14:52:55 UTC, Sönke Ludwig wrote:
 On 01.08.2015 at 16:15, Suliman wrote:
 Look like it's Variant type. So I tried to use method get! do 
 extract
 value from it
 writeln(get!(response["code"]));

 But I get error: Error: variable response cannot be read at 
 compile time
The correct syntax is: response["code"].get!int
Thanks! But how do I get access to the elements inside "result": {}? For example "name":"_system" in:

{"result":{"name":"_system","id":"76067","path":"database-6067","isSystem":true},"error":false,"code":200}

Could you also extend the docs with a code example?
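For illustration, chained indexing reaches the nested object; here is a hedged sketch using Phobos' std.json, whose interface is similar to the reviewed package (with .str/.integer accessors instead of get!):

```d
import std.json : JSONValue, parseJSON;

/// Parse the response body from the question above (illustrative only;
/// the reviewed stdx.data.json differs in the accessor names).
JSONValue parseResponse()
{
    return parseJSON(
        `{"result":{"name":"_system","id":"76067","path":"database-6067",`
        ~ `"isSystem":true},"error":false,"code":200}`);
}

void main()
{
    auto response = parseResponse();

    // chain the index operators to descend into the nested "result" object
    assert(response["result"]["name"].str == "_system");
    assert(response["code"].integer == 200);
}
```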
Aug 01 2015
prev sibling parent "Suliman" <evermind live.ru> writes:
On Saturday, 1 August 2015 at 14:52:55 UTC, Sönke Ludwig wrote:
 Am 01.08.2015 um 16:15 schrieb Suliman:
 Look like it's Variant type. So I tried to use method get! do 
 extract
 value from it
 writeln(get!(response["code"]));

 But I get error: Error: variable response cannot be read at 
 compile time
The correct syntax is: response["code"].get!int
connectInfo.statusCode = response["code"].get!int;

std.variant.VariantException std\variant.d(1445): Variant: attempting to use incompatible types stdx.data.json.value.JSONValue and int
Aug 01 2015
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-07-30 06:41, Walter Bright wrote:

 I agree with your goal of readability. And if someone wants to write
 code that emphasizes it's JSON, they can write it as
 std.data.json.parseStream. (It's not about saving typing, it's about
 avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-)
 ) This is not a huge deal for me, but I'm not in favor of establishing a
 new convention that repeats the module name. It eschews one of the
 advantages of having module name spaces in the first place, and evokes
 the old C style naming conventions.
I kind of agree with that, but at the same time, if one always needs to use the fully qualified name (or an alias) because there's a conflict, then that's quite annoying. A perfect example of that is the Path module in Tango. It has functions such as "split" and "join". Every time I use it I alias the import:

import Path = tango.io.Path;

Because otherwise it will conflict with the string manipulating functions with the same names. In Phobos the names in the path module are different compared to the string functions.

For example, I think "Value" and "parse" are too generic to not include "JSON" in their name.

-- 
/Jacob Carlborg
Jul 30 2015
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 07/30/2015 02:40 PM, Jacob Carlborg wrote:
 On 2015-07-30 06:41, Walter Bright wrote:

 I agree with your goal of readability. And if someone wants to write
 code that emphasizes it's JSON, they can write it as
 std.data.json.parseStream. (It's not about saving typing, it's about
 avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-)
 ) This is not a huge deal for me, but I'm not in favor of establishing a
 new convention that repeats the module name. It eschews one of the
 advantages of having module name spaces in the first place, and evokes
 the old C style naming conventions.
I kind of agree with that, but at the same time, if one always need to use the fully qualified name (or an alias) because there's a conflict then that's quite annoying.
It also fucks up UFCS, and I'm a huge fan of UFCS. I do agree that D's module system is awesome here and worth taking advantage of to avoid C++-style naming conventions, but I still think balance is needed. Sometimes, just because we can use a shorter potentially-conflicting name doesn't mean we necessarily should.
Aug 21 2015
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Friday, 21 August 2015 at 15:58:22 UTC, Nick Sabalausky wrote:
 It also fucks up UFCS, and I'm a huge fan of UFCS.
Are you saying that "import json : parseJSON = parse; foo.parseJSON.bar;" does not work? – David
Aug 21 2015
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 08/21/2015 12:29 PM, David Nadlinger wrote:
 On Friday, 21 August 2015 at 15:58:22 UTC, Nick Sabalausky wrote:
 It also fucks up UFCS, and I'm a huge fan of UFCS.
Are you saying that "import json : parseJSON = parse; foo.parseJSON.bar;" does not work?
Ok, fair point, although I was referring more to fully-qualified name lookups, as in the snippet I quoted from Jacob. Ie, this doesn't work:

someJsonCode.std.json.parse();

I do think, though, generally speaking, if there is much need to do a renamed import, the symbol in question probably didn't have the best name in the first place. Renamed importing is a great feature to have, but when you see it used it raises the question "*Why* is this being renamed? Why not just use its real name?" For the most part, I see two main reasons:

1. "Just because. I like this bikeshed color better." But this is merely a code smell, not a legitimate reason to even bother.

or

2. The symbol has a questionable name in the first place.

If there's reason to even bring up renamed imports as a solution, then it's probably falling into the "questionably named" category. Just because we CAN use D's module system and renamed imports and such to clear up ambiguities doesn't mean we should let ourselves take things TOO far to the opposite extreme when avoiding C/C++'s "big long ugly names as a substitute for modules".

Like Walter, I do very much dislike C/C++'s super-long, super-unambiguous names. But IMO, preferring parseStream over parseJSONStream isn't a genuine case of avoiding C/C++-style naming, it's just being overrun by fear of C/C++-style naming and thus taking things too far to the opposite extreme. We can strike a better balance than choosing between "brief and unclear-at-a-glance" and "C++-level verbosity".

Yea, we CAN do "import std.json : parseJSONStream = parseStream;", but if there's even any motivation to do so in the first place, we may as well just use the better name right from the start. Besides, those who prefer ultra-brevity are free to paint their bikesheds with renamed imports, too ;)
Aug 22 2015
prev sibling parent reply "Don" <prosthetictelevisions teletubby.medical.com> writes:
On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
 On 7/28/2015 7:07 AM, Atila Neves wrote:
 Start of the two week process, folks.
Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager.

Just looking at the documentation only, some general notes:

1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs.

2. JSON is a trivial format, http://json.org/. But I count 6 files and 30 names in the public API.

3. Stepping back a bit, when I think of parsing JSON data, I think:

    auto ast = inputrange.toJSON();

where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary. To create output:

    auto r = ast.toChars();  // r is an InputRange of characters
    writeln(r);

So, we'll need:

    toJSON
    toChars
    JSONException

The possible JSON values are:

    string
    number
    object (associative arrays)
    array
    true
    false
    null

Since these are D builtin types, they can actually be a simple union of D builtin types.
Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file).

Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in.

And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case.

It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.
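A rough sketch of this validate-but-don't-convert approach (the helper name is hypothetical, not part of the reviewed package): the lexer checks the token against the JSON number grammar and keeps the slice, deferring any numeric conversion.

```d
import std.ascii : isDigit;

/// Validate a string against the JSON number grammar without converting it.
/// A lexer could store the validated slice and convert lazily on demand.
bool isValidJSONNumber(string s)
{
    size_t i;
    if (i < s.length && s[i] == '-') i++;             // optional sign
    if (i >= s.length || !s[i].isDigit) return false; // integer part required
    if (s[i] == '0') i++;                             // a leading zero stands alone
    else while (i < s.length && s[i].isDigit) i++;
    if (i < s.length && s[i] == '.')                  // optional fraction
    {
        i++;
        if (i >= s.length || !s[i].isDigit) return false;
        while (i < s.length && s[i].isDigit) i++;
    }
    if (i < s.length && (s[i] == 'e' || s[i] == 'E')) // optional exponent
    {
        i++;
        if (i < s.length && (s[i] == '+' || s[i] == '-')) i++;
        if (i >= s.length || !s[i].isDigit) return false;
        while (i < s.length && s[i].isDigit) i++;
    }
    return i == s.length;
}

void main()
{
    assert(isValidJSONNumber("-0.5e+10"));
    assert(isValidJSONNumber("3.141592653589793238462643383279"));
    assert(!isValidJSONNumber("01"));  // leading zeros are not valid JSON
    assert(!isValidJSONNumber("1."));  // a fraction needs at least one digit
}
```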
Jul 29 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Wed, Jul 29, 2015 at 03:22:05PM +0000, Don via Digitalmars-d wrote:
 On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
[...]
The possible JSON values are:
    string
    number
    object (associative arrays)
    array
    true
    false
    null

Since these are D builtin types, they can actually be a simple union
of D builtin types.
Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file). Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in. And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case. It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.
[...]

Here's a thought: what about always storing JSON numbers as strings (albeit tagged with the "number" type, to differentiate them from actual strings in the input), and the user specifies what type to convert it to? The default type can be something handy, like int, but the user has the option to ask for size_t, or double, or even BigInt if they want. (IIRC, the BigInt ctor can initialize an instance from a digit string, so if we adopt the convention that non-built-in number-like types can be initialized from digit strings, then std.json can simply take a template parameter for the output type, and hand it the digit string. This way, we can get rid of the std.bigint dependency, except where the user actually wants to use BigInt.)

T

-- 
Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
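A minimal sketch of this idea, assuming a hypothetical RawNumber wrapper (none of these names are from the reviewed package): the value keeps the raw digit string, and conversion to a concrete numeric type only happens on request.

```d
import std.conv : to;

/// Holds a validated but unconverted JSON number as its source text.
struct RawNumber
{
    string repr;

    T get(T)() const
    {
        // Types constructible from a digit string (e.g. std.bigint.BigInt)
        // get the string directly; everything else goes through std.conv.
        static if (is(typeof(T(repr))))
            return T(repr);   // e.g. BigInt("12345678901234567890")
        else
            return repr.to!T;
    }
}

void main()
{
    auto n = RawNumber("42");
    assert(n.get!int == 42);
    assert(n.get!double == 42.0);
}
```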
Jul 29 2015
next sibling parent reply "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
 Here's a thought: what about always storing JSON numbers as 
 strings (albeit tagged with the "number" type, to differentiate 
 them from actual strings in the input), and the user specifies 
 what type to convert it to?  The default type can be something 
 handy, like int, but the user has the option to ask for size_t, 
 or double, or even BigInt if they want (IIRC, the BigInt ctor 
 can initialize an instance from a digit string, so if we adopt 
 the convention that non-built-in number-like types can be 
 initialized from digit strings, then std.json can simply take a 
 template parameter for the output type, and hand it the digit 
 string. This way, we can get rid of the std.bigint dependency, 
 except where the user actually wants to use BigInt.)
Some JSON files can be quite large... For example, I have a compressed 175 Gig of Reddit comments (one file per month) I would like to work with using D, and time + memory demands = money. Wouldn't it be a pain not to store numbers directly when parsing in those cases (if I understood you correctly)?
Jul 29 2015
parent "sigod" <sigod.mail gmail.com> writes:
On Wednesday, 29 July 2015 at 17:04:33 UTC, Laeeth Isharc wrote:
 [...]
Some JSON files can be quite large... For example, I have a compressed 175 Gig of Reddit comments (one file per month) I would like to work with using D, and time + memory demands = money. Wouldn't it be a pain not to store numbers directly when parsing in those cases (if I understood you correctly)?
I think in your case it wouldn't matter. Comments are text, mostly. There's probably just one or two fields with "number" type.
Jul 29 2015
prev sibling parent reply =?windows-1252?Q?S=F6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 18:47 schrieb H. S. Teoh via Digitalmars-d:
 On Wed, Jul 29, 2015 at 03:22:05PM +0000, Don via Digitalmars-d wrote:
 On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
[...]
 The possible JSON values are:
     string
     number
     object (associative arrays)
     array
     true
     false
     null

 Since these are D builtin types, they can actually be a simple union
 of D builtin types.
Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file). Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in. And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case. It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.
[...] Here's a thought: what about always storing JSON numbers as strings (albeit tagged with the "number" type, to differentiate them from actual strings in the input), and the user specifies what type to convert it to? The default type can be something handy, like int, but the user has the option to ask for size_t, or double, or even BigInt if they want (IIRC, the BigInt ctor can initialize an instance from a digit string, so if we adopt the convention that non-built-in number-like types can be initialized from digit strings, then std.json can simply take a template parameter for the output type, and hand it the digit string. This way, we can get rid of the std.bigint dependency, except where the user actually wants to use BigInt.) T
That means a performance hit, because the string has to be parsed twice - once for validation and once for conversion. And it means that for non-string inputs the lexer has to allocate for each number. It also doesn't know the length of the number in advance, so it can't allocate in a generally efficient way.
Jul 29 2015
parent reply "matovitch" <camille.brugel laposte.net> writes:
Hi Sonke,

Great to see your module moving towards phobos inclusion (I have 
not been following the latest progress of D sadly :() ! Just a 
small remark from the documentation example.

Maybe it would be better to replace :

     value.toJSONString!true()

by

     value.toJSONString!prettify()

using a well-named enum instead of a boolean, which could seem 
obscure. I know the Eigen C++ lib uses a similar thing for static vs 
dynamic matrices.

Thanks for the read. Regards,

matovitch
Jul 29 2015
parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 20:21 schrieb matovitch:
 Hi Sonke,

 Great to see your module moving towards phobos inclusion (I have not
 been following the latest progress of D sadly :() ! Just a small remark
 from the documentation example.

 Maybe it would be better to replace :

      value.toJSONString!true()

 by

      value.toJSONString!prettify()

 using a well-named enum instead of a boolean which could seem obscure I
 now Eigen C++ lib use a similar thing for static vs dynamic matrix.

 Thanks for the read. Regards,

 matovitch
Hm, that example is outdated, I'll fix it ASAP. Currently it uses toJSON and a separate toPrettyJSON function. An obvious alternative would be to add an entry GeneratorOptions.prettify, because toJSON already takes that as a template argument: toJSON!(GeneratorOptions.prettify)
Jul 29 2015
prev sibling next sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 17:22 schrieb Don:
 Related to this: it should not be importing std.bigint. Note that if
 std.bigint were fully implemented, it would be very heavyweight (optimal
 multiplication of enormous integers involves fast fourier transforms and
 all kinds of odd stuff, that's really bizarre to pull in if you're just
 parsing a trivial little JSON config file).

 Although it is possible for JSON to contain numbers which are larger
 than can fit into long or ulong, it's an abnormal case. Many apps
 (probably, almost all) will want to reject such numbers immediately.
 BigInt should be opt-in.
BigInt is opt-in, at least as far as the lexer goes. But why would such a number be rejected? Any of the usual floating point parsers would simply parse the number and just lose precision if it can't be represented exactly. And after all, it's still valid JSON.

But note that I've only added this due to multiple requests; it doesn't seem to be that uncommon. We *could* in theory make the JSONNumber type a template and make the bigint fields optional. That would be the only thing missing to make the import optional, too.
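A rough sketch of that template idea (the layout and names here are hypothetical): the BigInt field, and with it the std.bigint import, only exists when a user opts in at compile time.

```d
/// A number type where BigInt support is a compile-time opt-in,
/// so std.bigint is only pulled in when actually instantiated.
struct JSONNumber(bool withBigInt = false)
{
    long integer;     // fast path: fits in a machine word
    double floating;  // fast path: IEEE-754 double

    static if (withBigInt)
    {
        import std.bigint : BigInt; // only processed for this instantiation
        BigInt big;                 // slow path, opt-in
    }
}

void main()
{
    JSONNumber!() small;           // no std.bigint dependency
    small.integer = 42;
    static assert(!__traits(hasMember, typeof(small), "big"));

    JSONNumber!true wide;          // opt-in BigInt support
    wide.big = typeof(wide.big)("1000000000000000000000000");
    assert(small.integer == 42);
    assert(wide.big > 0);
}
```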
 And, it is also possible to have floating point numbers that are not
 representable in double or real. BigInt doesn't solve that case.

 It might be adequate to simply present it as a raw number (an
 unconverted string) if it isn't a built-in type. Parse it for validity,
 but don't actually convert it.
If we had a Decimal type in Phobos, I would have integrated that, too. The string representation may be an alternative, but since the weight of the import is the main argument, I'd rather choose the more comfortable/logical option - or probably rather try to avoid std.bigint being such a heavy import (e.g. by using local imports to defer secondary imports).
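A sketch of the deferred-import idea (names hypothetical): because the body of a function template is only compiled when it is instantiated, a local import inside a templated accessor means programs that never call it never process std.bigint at all.

```d
/// Raw number wrapper whose BigInt accessor defers the std.bigint import.
struct Number
{
    string repr; // raw, validated digit string

    auto getBigInt()() const        // empty template parens make this a template
    {
        import std.bigint : BigInt; // deferred, local import
        return BigInt(repr);
    }

    long getLong() const
    {
        import std.conv : to;       // cheap import, no std.bigint involved
        return repr.to!long;
    }
}

void main()
{
    auto n = Number("12345678901234567890123"); // too big for ulong
    assert(n.getBigInt() > 0);
    assert(Number("123").getLong() == 123);
}
```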
Jul 29 2015
prev sibling parent reply "Dmitry Olshansky" <dmitry.olsh gmail.com> writes:
On Wednesday, 29 July 2015 at 15:22:06 UTC, Don wrote:
 On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
[snip]
 Related to this: it should not be importing std.bigint. Note 
 that if std.bigint were fully implemented, it would be very 
 heavyweight (optimal multiplication of enormous integers 
 involves fast fourier transforms and all kinds of odd stuff, 
 that's really bizarre to pull in if you're just parsing a 
 trivial little JSON config file).

 Although it is possible for JSON to contain numbers which are 
 larger than can fit into long or ulong, it's an abnormal case. 
 Many apps (probably, almost all) will want to reject such 
 numbers immediately. BigInt should be opt-in.

 And, it is also possible to have floating point numbers that 
 are not representable in double or real. BigInt doesn't solve 
 that case.

 It might be adequate to simply present it as a raw number (an 
 unconverted string) if it isn't a built-in type. Parse it for 
 validity, but don't actually convert it.
Actually, JSON is defined as a subset of the ECMAScript-262 spec, hence it may not contain anything other than 64-bit IEEE-754 numbers, period. See:

http://www.ecma-international.org/ecma-262/6.0/index.html#sec-terms-and-definitions-number-value
http://www.ecma-international.org/ecma-262/6.0/index.html#sec-ecmascript-language-types-number-type

Anything else is, ehm, an "extension" (or simply put, a violation of the spec). I've certainly seen 64-bit integers in the wild - how often are true big ints found out there?

If no one can present some run-of-the-mill REST JSON API breaking the rules, I'd suggest demoting BigInt handling to an optional feature.
Aug 02 2015
parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 02.08.2015 um 19:14 schrieb Dmitry Olshansky:
 Actually JSON is defined as subset of EMCASCript-262 spec hence it may
 not ciontain anything other 64-bit5 IEEE-754 numbers period.
 See:
 http://www.ecma-international.org/ecma-262/6.0/index.html#sec-terms-and-definitions-number-value

 http://www.ecma-international.org/ecma-262/6.0/index.html#sec-ecmascript-language-types-number-type


 Anything else is e-hm an "extension" (or simply put - violation of
 spec), I've certainly seen 64-bit integers in the wild - how often true
 big ints are found out there?

 If no one can present some run of the mill REST JSON API breaking the
 rules I'd suggest demoting BigInt handling to optional feature.
This is not true. Quoting from ECMA-404:
JSON is a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications. JSON was inspired by the object literals of JavaScript aka ECMAScript as defined in the ECMAScript Language Specification, third Edition [1]. It does not attempt to impose ECMAScript’s internal data representations on other programming languages. Instead, it shares a small subset of ECMAScript’s textual representations with all other programming languages.

JSON is agnostic about numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.
Aug 03 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 03-Aug-2015 10:56, Sönke Ludwig wrote:
 Am 02.08.2015 um 19:14 schrieb Dmitry Olshansky:
 Actually JSON is defined as subset of EMCASCript-262 spec hence it may
 not ciontain anything other 64-bit5 IEEE-754 numbers period.
 See:
 http://www.ecma-international.org/ecma-262/6.0/index.html#sec-terms-and-definitions-number-value


 http://www.ecma-international.org/ecma-262/6.0/index.html#sec-ecmascript-language-types-number-type



 Anything else is e-hm an "extension" (or simply put - violation of
 spec), I've certainly seen 64-bit integers in the wild - how often true
 big ints are found out there?

 If no one can present some run of the mill REST JSON API breaking the
 rules I'd suggest demoting BigInt handling to optional feature.
This is not true. Quoting from ECMA-404:
 JSON is a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications. JSON was inspired by the object literals of JavaScript aka ECMAScript as defined in the ECMAScript Language Specification, third Edition [1]. It does not attempt to impose ECMAScript’s internal data representations on other programming languages. Instead, it shares a small subset of ECMAScript’s textual representations with all other programming languages.

 JSON is agnostic about numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.
Hm, about 5 solid pages, and indeed it leaves everything unspecified for extensibility, so I stand corrected. Still, I'm more inclined to put my trust in RFCs, such as the new one:

http://www.ietf.org/rfc/rfc7159.txt

Which states:

   This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

   Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.

And it implies setting limits on everything:

   9. Parsers

   A JSON parser transforms a JSON text into another representation. A JSON parser MUST accept all texts that conform to the JSON grammar. A JSON parser MAY accept non-JSON forms or extensions.

   An implementation may set limits on the size of texts that it accepts. An implementation may set limits on the maximum depth of nesting. An implementation may set limits on the range and precision of numbers. An implementation may set limits on the length and character contents of strings.

Now back to our land, let's look at, say, rapidJSON.
It MAY seem to handle big integers:
https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/biginteger.h

But it's used only to parse doubles:
https://github.com/miloyip/rapidjson/pull/137

Anyhow, the API says it all - only integers up to 64 bits and doubles:
http://rapidjson.org/md_doc_sax.html#Handler

Pretty much what I expect by default. And plz-plz don't hardcode BigInt in the JSON parser; it's slow, plus it causes epic code bloat, as Don already pointed out.

-- 
Dmitry Olshansky
Aug 03 2015
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Mon, 03 Aug 2015 12:11:14 +0300
schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 [...]

 Now back to our land let's look at say rapidJSON.
 
 It MAY seem to handle big integers:
 https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/biginteger.h
 
 But it's used only to parse doubles:
 https://github.com/miloyip/rapidjson/pull/137
 
 Anyhow the API says it all - only integers up to 64bit and doubles:
 
 http://rapidjson.org/md_doc_sax.html#Handler
 
 Pretty much what I expect by default.
 And plz-plz don't hardcode BitInteger in JSON parser, it's slow plus it 
 causes epic code bloat as Don already pointed out.
I would take RapidJSON with a grain of salt; its main goal is to be the fastest JSON parser. Nothing wrong with that, but BigInt and fast don't naturally match, and the C standard library also doesn't come with a BigInt type that could conveniently be plugged in.

Please compare again with JSON parsers in languages that provide BigInts, e.g. Ruby:
http://ruby-doc.org/stdlib-1.9.3/libdoc/json/rdoc/JSON/Ext/Generator/GeneratorMethods/Bignum.html

Optional ok, but no support at all would be so 90s. My impression is that the standard wants to allow JSON being used in environments that cannot provide BigInt support, but a modern language for PCs with a BigInt module should totally support reading long integers and be able to do proper rounding of double values.

I thought about reading two BigInts: one for the significand and one for the base-10 exponent, so you don't need a BigFloat but still have the full accuracy from the textual string as x*10^y.

-- 
Marco
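A rough sketch of that decomposition (hypothetical helper; note that the splitting itself needs no BigInt at all - only the eventual consumers do): the literal is rewritten as an integer digit string plus a base-10 exponent.

```d
import std.conv : to;
import std.string : indexOf;

/// Split a JSON number literal into an integer digit string and a base-10
/// exponent, so the exact value is significand * 10^exponent.
void splitNumber(string s, out string significand, out long exponent)
{
    exponent = 0;
    auto e = s.indexOf('e');
    if (e < 0) e = s.indexOf('E');
    if (e >= 0)
    {
        exponent = s[e + 1 .. $].to!long; // explicit exponent part
        s = s[0 .. e];
    }
    auto dot = s.indexOf('.');
    if (dot >= 0)
    {
        // dropping the decimal point shifts the value by 10^(digits after it)
        exponent -= cast(long)(s.length - dot - 1);
        s = s[0 .. dot] ~ s[dot + 1 .. $];
    }
    significand = s;
}

void main()
{
    string sig;
    long exp;
    splitNumber("3.141592653589793238462643383279", sig, exp);
    assert(sig == "3141592653589793238462643383279" && exp == -30);
    splitNumber("1E400", sig, exp);
    assert(sig == "1" && exp == 400);
    splitNumber("-2.5e3", sig, exp); // -25 * 10^2 == -2500
    assert(sig == "-25" && exp == 2);
}
```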
Sep 27 2015
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 27-Sep-2015 20:43, Marco Leise wrote:
 Am Mon, 03 Aug 2015 12:11:14 +0300
 schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 [...]

 Now back to our land let's look at say rapidJSON.

 It MAY seem to handle big integers:
 https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/biginteger.h

 But it's used only to parse doubles:
 https://github.com/miloyip/rapidjson/pull/137

 Anyhow the API says it all - only integers up to 64bit and doubles:

 http://rapidjson.org/md_doc_sax.html#Handler

 Pretty much what I expect by default.
 And plz-plz don't hardcode BitInteger in JSON parser, it's slow plus it
 causes epic code bloat as Don already pointed out.
I would take RapidJSON with a grain of salt, its main goal is to be the fastest JSON parser. Nothing wrong with that, but BigInt and fast doesn't naturally match and the C standard library also doesn't come with a BigInt type that could conveniently be plugged in.
Yes, yet support should be optional.
 Please compare again with JSON parsers in languages that
 provide BigInts, e.g. Ruby:
 http://ruby-doc.org/stdlib-1.9.3/libdoc/json/rdoc/JSON/Ext/Generator/GeneratorMethods/Bignum.html
 Optional ok, but no support at all would be so 90s.
Agreed. Still, keep in mind that the whole reason Ruby supports it is because its "integer" type is multi-precision by default. So if your native integer type is multi-precision, then indeed, why add a special case for fixnums?
 My impression is that the standard wants to allow JSON being
 used in environments that cannot provide BigInt support, but a
 modern language for PCs with a BigInt module should totally
 support reading long integers and be able to do proper
 rounding of double values. I thought about reading two
 BigInts: one for the significand and one for the
 base-10 exponent, so you don't need a BigFloat but have the
 full accuracy from the textual string still as x*10^y.
All of that is sensible ... in the slow code path. The common path must be simple and lean; bigints are certainly an exception rather than the rule. Therefore support for big ints should not come at the expense of other use cases. Also, pluggability should allow me to e.g. use my own "big" decimal floating point.

-- 
Dmitry Olshansky
Sep 27 2015
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
A speed optimization, since JSON parsing speed is critical:

If the parser is able to use slices of its input, store numbers as slices. Only 
convert them to numbers lazily, as the numeric conversion can take significant
time.
Jul 28 2015
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Tuesday, 28 July 2015 at 23:16:34 UTC, Walter Bright wrote:
 A speed optimization, since JSON parsing speed is critical:

 If the parser is able to use slices of its input, store numbers 
 as slices. Only convert them to numbers lazily, as the numeric 
 conversion can take significant time.
That's what it does (depending on which parser you use). The StAX style parser included is lazy and non-allocating.
Jul 28 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 4:24 PM, Brad Anderson wrote:
 On Tuesday, 28 July 2015 at 23:16:34 UTC, Walter Bright wrote:
 A speed optimization, since JSON parsing speed is critical:

 If the parser is able to use slices of its input, store numbers as slices.
 Only convert them to numbers lazily, as the numeric conversion can take
 significant time.
That's what it does (depending on which parser you use). The StAX style parser included is lazy and non-allocating.
Great!
Jul 28 2015
prev sibling next sibling parent reply "Andrea Fontana" <nospam example.com> writes:
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Why not add a shortcut like:

jv.opt("/this/is/a/path")

? I use it in my json/bson binding.

Anyway, opt(...).isNull returns true if that sub-object doesn't exist. How can I check instead if that sub-object is actually null? Something like: { "a" : { "b" : null } } ?

It would be nice to have a way to get a default if it doesn't exist. My library behaves in a different way. Given the object:

{ address : { number: 15 } }

// as!xxx tries to get a value of that type; if it can't, it tries to convert it using .to!xxx; if that fails again, it returns a default

// Converted as string
assert(obj["/address/number"].as!string == "15");

// This doesn't exist
assert(obj["/address/asdasd"].as!int == int.init);

// A default value is specified
assert(obj["/address/asdasd"].as!int(50) == 50);

// A default value is specified (but the value exists)
assert(obj["/address/number"].as!int(50) == 15);

// This doesn't exist
assert(!obj["address"]["number"]["this"].exists);

My library has a get!xxx too (which throws an exception if the value is not xxx) and a to!xxx that throws an exception if the value can't be converted to xxx.

Other features:

// This field doesn't exist; return the default value
auto tmpField = obj["/address/asdasd"].as!int(50);
assert(tmpField.error == true);   // Value is defaulted ...
assert(tmpField.exists == false); // ... because it doesn't exist
assert(tmpField == 50);

// This field exists, but can't be converted to int. Return the default value.
tmpField = obj["/tags/0"].as!int(50);
assert(tmpField.error == true);   // Value is defaulted ...
assert(tmpField.exists == true);  // ... but a field is actually here
assert(tmpField == 50);
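A hedged sketch of such a path lookup with a default value, illustrated here with Phobos' std.json (the reviewed stdx.data.json would differ in details; the name `opt` and the exact fallback behavior are assumptions):

```d
import std.algorithm.iteration : splitter;
import std.json : JSONValue, JSONType, parseJSON;

/// Walk a '/'-separated path through nested JSON objects, returning a
/// default when any segment is missing or the final value has the wrong type.
T opt(T)(JSONValue root, string path, T defaultValue = T.init)
{
    JSONValue cur = root;
    foreach (seg; path.splitter('/'))
    {
        if (seg.length == 0) continue;                  // tolerate leading '/'
        if (cur.type != JSONType.object) return defaultValue;
        auto p = seg in cur.object;
        if (p is null) return defaultValue;             // missing -> default
        cur = *p;
    }
    try return cur.get!T;                               // wrong type -> default
    catch (Exception) return defaultValue;
}

void main()
{
    auto obj = parseJSON(`{"address":{"number":15}}`);
    assert(obj.opt!int("/address/number") == 15);       // present
    assert(obj.opt!int("/address/asdasd", 50) == 50);   // absent -> default
    assert(obj.opt!string("/address/number", "?") == "?"); // type mismatch
}
```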
Jul 29 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 09:46, Andrea Fontana wrote:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Why don't do a shortcut like: jv.opt("/this/is/a/path") ? I use it in my json/bson binding.
That would be another possibility. What do you think about the opt(jv).foo.bar[12].baz alternative? One advantage is that it could work without parsing a string and the implications thereof (error handling?).
 Anyway, opt(...).isNull return true if that sub-obj doesn't exists.
 How can I check instead if that sub-object is actually null?

 Something like:  { "a" : { "b" : null} } ?
opt(...) == null
 It would be nice to have a way to get a default if it doesn't exists.
 On my library that behave in a different way i write:

 Object is :  { address : { number: 15 } }

 // as!xxx try to get a value of that type, if it can't it tries to
 convert it using .to!xxx if it fails again it returns default

 // Converted as string
 assert(obj["/address/number"].as!string == "15");

 // This doesn't exists
 assert(obj["/address/asdasd"].as!int == int.init);

 // A default value is specified
 assert(obj["/address/asdasd"].as!int(50) == 50);

 // A default value is specified (but value exists)
 assert(obj["/address/number"].as!int(50) == 15);

 // This doesn't exists
 assert(!obj["address"]["number"]["this"].exists);

 My library has a get!xxx string too (that throws an exception if value
 is not xxx) and to!xxx that throws an exception if value can't converted
 to xxx.
I try to build this from existing building blocks in Phobos, so opt basically returns a Nullable!Algebraic. I guess some of it could simply be implemented in Algebraic, for example by adding an overload of .get that takes a default value. Instead of .to, you already have .coerce. The other possible approach, which would be more convenient to use, would be to add a "default value" overload to "opt", for example: jv.opt("defval").foo.bar
 Other feature:
 // This field doesn't exists return default value
 auto tmpField = obj["/address/asdasd"].as!int(50);
 assert(tmpField.error == true);   // Value is defaulted ...
 assert(tmpField.exists == false); // ... because it doesn't exists
 assert(tmpField == 50);

 // This field exists, but can't be converted to int. Return default value.
 tmpField = obj["/tags/0"].as!int(50);
 assert(tmpField.error == true);   // Value is defaulted ...
 assert(tmpField.exists == true);  // ... but a field is actually here
 assert(tmpField == 50);
Jul 29 2015
parent reply "Andrea Fontana" <nospam example.com> writes:
On Wednesday, 29 July 2015 at 08:55:20 UTC, Sönke Ludwig wrote:
 That would be another possibility. What do you think about the 
 opt(jv).foo.bar[12].baz alternative? One advantage is that it 
 could work without parsing a string and the implications 
 thereof (error handling?).
I implemented it too, but I removed it. Many times field names are function names or similar, and that breaks the code. My implementation creates a lot of temporary objects (one for each sub-object); using the string instead, I just create the last one.

It's not easy for me to use assignments with that syntax. Something like:

obj.with.a.new.field = 3;

is difficult to implement. It's much easier to implement:

obj["/field/doesnt/exists"] = 3

It's much easier to write formatted-string paths, and it allows future implementation of something like xpath/jquery-style queries. If your json contains keys with "/" inside, you can still use the old plain syntax... String parsing is quite easy (at compile time too), of course. If a part of the path doesn't exist, it works just like when a part of opt("a", "b", "c") doesn't. It's just syntax sugar. :)
 Anyway, opt(...).isNull return true if that sub-obj doesn't 
 exists.
 How can I check instead if that sub-object is actually null?

 Something like:  { "a" : { "b" : null} } ?
opt(...) == null
Does that work? Anyway, it seems ambiguous:
opt(...) == null   => false
opt(...).isNull    => true
 It would be nice to have a way to get a default if it doesn't 
 exists.
 On my library that behave in a different way i write:

 Object is :  { address : { number: 15 } }

 // as!xxx try to get a value of that type, if it can't it 
 tries to
 convert it using .to!xxx if it fails again it returns default

 // Converted as string
 assert(obj["/address/number"].as!string == "15");

 // This doesn't exists
 assert(obj["/address/asdasd"].as!int == int.init);

 // A default value is specified
 assert(obj["/address/asdasd"].as!int(50) == 50);

 // A default value is specified (but value exists)
 assert(obj["/address/number"].as!int(50) == 15);

 // This doesn't exists
 assert(!obj["address"]["number"]["this"].exists);

 My library has a get!xxx string too (that throws an exception 
 if value
 is not xxx) and to!xxx that throws an exception if value can't 
 converted
 to xxx.
I try to build this from existing building blocks in Phobos, so opt basically returns a Nullable!Algebraic. I guess some of it could simply be implemented in Algebraic, for example by adding an overload of .get that takes a default value. Instead of .to, you already have .coerce. The other possible approach, which would be more convenient to use, would be to add a "default value" overload to "opt", for example: jv.opt("defval").foo.bar
Isn't jv.opt("defval") taking the value of ("defval") rather than setting a default value?
Jul 29 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 11:58, Andrea Fontana wrote:
 On Wednesday, 29 July 2015 at 08:55:20 UTC, Sönke Ludwig wrote:
 That would be another possibility. What do you think about the
 opt(jv).foo.bar[12].baz alternative? One advantage is that it could
 work without parsing a string and the implications thereof (error
 handling?).
I implemented it too, but I removed it. Many times field names are function names or similar, and that breaks the code.
In this case, since it would be a separate type, there are no static members apart from the automatically generated ones and maybe something like opIndex/opAssign. It can of course also overload opIndex with a string argument, so that there is a generic alternative in case of conflicts or runtime key names.
 My implementation creates a lot of temporary objects (one for each
 sub-object); using the string instead, I just create the last one.
If the temporary objects are cheap, I don't see an issue there. Without keeping track of the path, a simple pointer to a JSONValue should be sufficient (the temporary objects have to be made non-copyable).
 It's not easy for me to use assignments with that syntax. Something like:

 obj.with.a.new.field = 3;

 It's difficult to implement. It's much easier to implement:

 obj["/field/doesnt/exists"] = 3
Maybe more difficult, but certainly possible. If the complexity doesn't explode, I'd say that shouldn't be a primary concern, since this is all still pretty simple.
 It's much easier to write formatted-string paths.
 It allows future implementation of something like xpath/jquery style
Advanced path queries could indeed be interesting, possibly even more interesting if applied to the pull parser.
 If your json contains keys with "/" inside, you can still use old plain
 syntax...
A possible alternative would be to support some kind of escape syntax.
 String parsing it's quite easy (at compile time too) of course. If a
 part of path doesn't exists it works like a part of opt("a", "b", "c")
 doesn't. It's just syntax sugar. :)
Granted, it's not really much in this case, but you do get less static checking, which means that some things will only be caught at run time. Also, you'll get an ambiguity if you want to support array indices, too. Finally, it may even be security relevant, because an attacker might try to sneak in a key that contains slash characters to access/overwrite fields that would normally not be reachable. So every user input that may end up in a path query will have to be validated first now.
 Does it works? Anyway it seems ambiguous:
 opt(...) == null   => false
 opt(...).isNull    => true
The former gets forwarded to Algebraic, while the latter is a method of the enclosing Nullable. I've tested it and it works. But I also agree it isn't particularly pretty in this case; that's just what we have in D as basic building blocks (or do we have an Optional type somewhere yet?).
 The other possible approach, which would be more convenient to use,
 would be add a "default value" overload to "opt", for example:
 jv.opt("defval").foo.bar
Isn't jv.opt("defval") taking the value of ("defval") rather than setting a default value?
It would be an opt with different semantics, just a theoretical alternative. This behavior would be mutually exclusive to the current opt.
Jul 30 2015
prev sibling next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Looked in the docs (http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.html). I wanted to know how JSONValue can be manipulated. That is not very explicit.

First, it doesn't look like the value can embed null as a value. null is a valid json value.

Secondly, it seems that it accepts BigInt. As per the JSON spec, the only kind of numeric value you can have in there is a number, which doesn't even distinguish between floating point and integer (!) and has 53 bits of precision. By having double and long in there, we are already way over spec, so I'm not sure why we'd want to put BigInt in there.

Finally, I'd love to see JSONValue exhibit an API similar to jsvar.
Aug 03 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2015 at 23:15, deadalnix wrote:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Looked in the docs (http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.html). I wanted to know how JSONValue can be manipulated. That is not very explicit. First, it doesn't look like the value can embed null as a value. null is a valid json value.
The documentation is lacking, I'll improve that. JSONValue includes an alias this to an Algebraic, which provides the actual data API. Its type list includes typeof(null).
 Secondly, it seems that it accept bigint. As per JSON spec, the only
 kind of numeric value you can have in there is a num, which doesn't even
 make the difference between floating point and integer (!) and with 53
 bits of precision. By having double and long in there, we are already
 way over spec, so I'm not sure why we'd want to put bigint in there.
See also my reply a few posts back. JSON does not specify anything WRT the precision or length of numbers. In the ECMA standard it is mentioned explicitly that this was done so that applications are not limited in what kind of numbers can be transferred. The only thing explicitly mentioned is that implementations *may* choose to support only 64-bit floats. But large integer numbers are used in practice, so we should be able to handle those, too (one way or another).
 Finally, I'd love to see that JSONValue to exhibit a similar API than
 jsvar.
This is how it used to be in the vibe.data.json module. I consider that to be a mistake now for multiple reasons, at least on this abstraction level. My proposal would be to have a clean, "strongly typed" JSONValue and a generic jsvar like struct on top of that, which is defined independently, and could for example work on a BSONValue, too. The usage would simply be "var value = parseJSONValue(...);".
Aug 04 2015
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 4 August 2015 at 13:10:11 UTC, Sönke Ludwig wrote:
 This is how it used to be in the vibe.data.json module. I 
 consider that to be a mistake now for multiple reasons, at 
 least on this abstraction level. My proposal would be to have a 
 clean, "strongly typed" JSONValue and a generic jsvar like 
 struct on top of that, which is defined independently, and 
 could for example work on a BSONValue, too. The usage would 
 simply be "var value = parseJSONValue(...);".
That is not going to cut it. I've been working with these for ages. This is the very kind of scenario where dynamically typed languages are way more convenient.

I've used both quite extensively, and this is clear cut: you don't want what you call the strongly typed version of things. I've done it in many languages, including Java, for instance.

The jsvar interface removes the problematic parts of JS (it uses ~ instead of + for string concatenation and does not implement the opDispatch part of the API).
Aug 04 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 04.08.2015 at 19:14, deadalnix wrote:
 On Tuesday, 4 August 2015 at 13:10:11 UTC, Sönke Ludwig wrote:
 This is how it used to be in the vibe.data.json module. I consider
 that to be a mistake now for multiple reasons, at least on this
 abstraction level. My proposal would be to have a clean, "strongly
 typed" JSONValue and a generic jsvar like struct on top of that, which
 is defined independently, and could for example work on a BSONValue,
 too. The usage would simply be "var value = parseJSONValue(...);".
That is not going to cut it. I've been working with these for ages. This is the very kind of scenario where dynamically typed languages are way more convenient. I've used both quite extensively, and this is clear cut: you don't want what you call the strongly typed version of things. I've done it in many languages, including Java, for instance. The jsvar interface removes the problematic parts of JS (it uses ~ instead of + for string concatenation and does not implement the opDispatch part of the API).
I just said that jsvar should be supported (even in its full glory), so why is that not going to cut it? Also, in theory, Algebraic already does more or less exactly what you propose (forwards operators, but skips opDispatch and JS-like string operators).
Aug 11 2015
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 11 August 2015 at 21:27:48 UTC, Sönke Ludwig wrote:
 That is not going to cut it. I've been working with these for 
 ages. This
 is the very kind of scenarios where dynamically typed 
 languages are way
 more convenient.

 I've used both quite extensively and this is clear cut: you 
 don't want
 what you call the strongly typed version of things. I've done 
 it in many
 languages, including in java for instance.

 jsvar interface remove the problematic parts of JS (use ~ 
 instead of +
 for concat strings and do not implement the opDispatch part of 
 the API).
I just said that jsvar should be supported (even in its full glory), so why is that not going to cut it? Also, in theory, Algebraic already does more or less exactly what you propose (forwards operators, but skips opDispatch and JS-like string operators).
Ok, then maybe there was a misunderstanding on my part. My understanding was that there is a Node coming from the parser, and that the node could be wrapped in some facility providing a jsvar-like API. My position is that it is preferable to have whatever DOM node be jsvar-like out of the box rather than having to wrap it into something to get that.
Aug 11 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 11.08.2015 at 23:52, deadalnix wrote:
 On Tuesday, 11 August 2015 at 21:27:48 UTC, Sönke Ludwig wrote:
 That is not going to cut it. I've been working with these for ages. This
 is the very kind of scenarios where dynamically typed languages are way
 more convenient.

 I've used both quite extensively and this is clear cut: you don't want
 what you call the strongly typed version of things. I've done it in many
 languages, including in java for instance.

 jsvar interface remove the problematic parts of JS (use ~ instead of +
 for concat strings and do not implement the opDispatch part of the API).
I just said that jsvar should be supported (even in its full glory), so why is that not going to cut it? Also, in theory, Algebraic already does more or less exactly what you propose (forwards operators, but skips opDispatch and JS-like string operators).
Ok, then maybe there was a misunderstanding on my part. My understanding was that there was a Node coming from the parser, and that the node could be wrapped in some facility providing a jsvar like API.
Okay, no that's correct.
 My position is that it is preferable to have whatever DOM node be jsvar
 like out of the box rather than having to wrap it into something to get
 that.
But take into account that Algebraic already behaves much like jsvar (at least ideally), just without opDispatch and JavaScript operator emulation (which I'm strongly opposed to as a *default*). So the jsvar wrapper would really just be needed for the cases where really concise code is desired when operating on JSON objects. We also discussed an alternative approach similar to opt(n).foo.bar[1].baz, where n is a JSONValue and opt() creates a wrapper that enables safe navigation within the DOM, propagating any missing/mismatched fields to the final result instead of throwing. This could also be combined with a final type query: opt!string(n).foo.bar
Aug 12 2015
parent "Meta" <jared771 gmail.com> writes:
On Wednesday, 12 August 2015 at 07:19:05 UTC, Sönke Ludwig wrote:
 We also discussed an alternative approach similar to 
 opt(n).foo.bar[1].baz, where n is a JSONValue and opt() creates 
 a wrapper that enables safe navigation within the DOM, 
 propagating any missing/mismatched fields to the final result 
 instead of throwing. This could also be combined with a final 
 type query: opt!string(n).foo.bar
In relation to that, you may find this thread interesting: http://forum.dlang.org/post/lnsc0c$1sip$1 digitalmars.com
Aug 12 2015
prev sibling next sibling parent reply "Atila Neves" <atila.neves gmail.com> writes:
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I forgot to give warning that the two week period was about to be up, and was unsure from the comments whether this would be ready for voting, so let's give it another two days unless there are objections. Atila
Aug 11 2015
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 11 August 2015 at 17:08:39 UTC, Atila Neves wrote:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila
Ok, some actionable items.

1/ How big is a JSON struct? What is the biggest element in the union? Is that element really needed? Recurse.

2/ As far as I can see, the elements are discriminated using typeid. An enum is preferable, as the compiler would know the values ahead of time and optimize based on this. It also allows the use of things like final switch.

3/ Going from the untyped world to the typed world and providing an API to get back to the untyped world is a losing strategy. That sounds true intuitively, but also from my experience manipulating JSON in various languages. The nodes produced by this lib need to be "manipulatable" as the unstructured values they represent.
Aug 11 2015
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 11-Aug-2015 20:30, deadalnix wrote:
 On Tuesday, 11 August 2015 at 17:08:39 UTC, Atila Neves wrote:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila
Ok, some actionable items.

1/ How big is a JSON struct? What is the biggest element in the union? Is that element really needed? Recurse.
+1. Also, most JS engines use NaN-boxing to fit the type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.
 2/ As far as I can see, the elements are discriminated using typeid. An
 enum is preferable, as the compiler would know the values ahead of time and
 optimize based on this. It also allows the use of things like final switch.
 3/ Going from the untyped world to the typed world and providing an API to
 get back to the untyped world is a losing strategy. That sounds true
 intuitively, but also from my experience manipulating JSON in various
 languages. The nodes produced by this lib need to be "manipulatable" as
 the unstructured values they represent.
-- Dmitry Olshansky
Aug 11 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 11.08.2015 at 20:15, Dmitry Olshansky wrote:
 On 11-Aug-2015 20:30, deadalnix wrote:
 Ok some actionable items.

 1/ How big is a JSON struct ? What is the biggest element in the union ?
 Is that element really needed ? Recurse.
+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.
But the array field already needs 16 bytes on 64-bit systems anyway. We could surely abuse some bits there to at least not use up more for the type tag, but before we go that far, we should first tackle some other questions, such as the allocation strategy of JSONValues during parsing, the Location field, and BigInt/Decimal support.

Maybe we should first have a vote on whether BigInt/Decimal should be supported or not, because that would at least settle some of the controversial tradeoffs. I didn't have a use for those personally, but at least we had the real-world issue in vibe.d's implementation that a ulong wasn't exactly representable.

My view generally still is that the DOM representation is something for convenient manipulation of small chunks of JSON, so performance is not a priority, but feature completeness is.
Aug 11 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12-Aug-2015 00:21, Sönke Ludwig wrote:
 Am 11.08.2015 um 20:15 schrieb Dmitry Olshansky:
 On 11-Aug-2015 20:30, deadalnix wrote:
 Ok some actionable items.

 1/ How big is a JSON struct ? What is the biggest element in the union ?
 Is that element really needed ? Recurse.
+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.
But the array field already needs 16 bytes on 64-bit systems anyway. We could surely abuse some bits there to at least not use up more for the type tag, but before we go that far, we should first tackle some other questions, such as the allocation strategy of JSONValues during parsing, the Location field and BigInt/Decimal support.
A pointer to the array should work for all fields > 8 bytes. Depending on the ratio of value frequency to array frequency (which is at least ~5-10 in any practical scenario), it would make things both more compact and faster.
 Maybe we should first have a vote about whether BigInt/Decimal should be
 supported or not, because that would at least solve some of the
 controversial tradeoffs. I didn't have a use for those personally, but
 at least we had the real-world issue in vibe.d's implementation that a
 ulong wasn't exactly representable.
Well, I've stated why I think BigInt should be optional. The reason is that C++ parsers don't even bother with anything beyond ulong/double, nor would e.g. any Node.js code bother with things beyond double. Lastly, we don't have BigFloat, so supporting BigInt but not BigFloat is kind of half-way. So please make it an option. And again, add an extra indirection (that is, a BigInt*) for the BigInt field in the union, because BigInts are extremely rare.
 My view generally still is that the DOM representation is something for
 convenient manipulation of small chunks of JSON, so that performance is
 not a priority, but feature completeness is.
I'm confused - there must be some struct that represents a useful value. And more importantly - is JSONValue going to be converted to jsvar? If not, I'm fine. Otherwise, whatever inefficiency is present in JSONValue would be compounded by this conversion process.

-- 
Dmitry Olshansky
Aug 11 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 12.08.2015 at 08:28, Dmitry Olshansky wrote:
 On 12-Aug-2015 00:21, Sönke Ludwig wrote:
 Am 11.08.2015 um 20:15 schrieb Dmitry Olshansky:
 On 11-Aug-2015 20:30, deadalnix wrote:
 Ok some actionable items.

 1/ How big is a JSON struct ? What is the biggest element in the
 union ?
 Is that element really needed ? Recurse.
+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.
But the array field already needs 16 bytes on 64-bit systems anyway. We could surely abuse some bits there to at least not use up more for the type tag, but before we go that far, we should first tackle some other questions, such as the allocation strategy of JSONValues during parsing, the Location field and BigInt/Decimal support.
Pointer to array should work for all fields > 8 bytes. Depending on the ratio frequency of value vs frequency of array (which is at least an ~5-10 in any practical scenario) it would make things both more compact and faster.
 Maybe we should first have a vote about whether BigInt/Decimal should be
 supported or not, because that would at least solve some of the
 controversial tradeoffs. I didn't have a use for those personally, but
 at least we had the real-world issue in vibe.d's implementation that a
 ulong wasn't exactly representable.
Well, I've stated why I think BigInt should be optional. The reason is that C++ parsers don't even bother with anything beyond ulong/double, nor would e.g. any Node.js code bother with things beyond double.
The trouble begins with long vs. ulong, even if we leave larger numbers aside. We'd really have to support both, but choosing between the two is ambiguous, which isn't very pretty overall.
 Lastly we don't have BigFloat so supporting BigInt but not BigFloat is
 kinda half-way.
That's where Decimal would come in. There is some code for that commented out, but I really didn't want to add it without a standard Phobos implementation. But I wouldn't say that this is really an argument against BigInt, maybe more one for implementing a Decimal type.
 So please make it an option. And again add an extra indirection (that is
 BigInt*) for BigInt field in a union because they are extremely rare.
Good idea, didn't think about that.
 My view generally still is that the DOM representation is something for
 convenient manipulation of small chunks of JSON, so that performance is
 not a priority, but feature completeness is.
I'm confused - there must be some struct that represents a useful value.
There is also the lower-level JSONParserNode, which represents a single element of the JSON document. But since that struct is just part of a range, its size doesn't matter for speed or memory consumption (instances are not allocated or copied while parsing).
 And more importantly - is JSONValue going to be converted to jsvar? If
 not - I'm fine. Otherwise whatever inefficiency present in JSONValue
 would be accumulated by this conversion process.
By default and currently it isn't, but it might be an idea for the future. The jsvar struct could possibly be implemented as a wrapper around JSONValue as a whole, so that it doesn't have to perform an actual conversion of the whole document.

Generally, working with JSONValue is already rather inefficient due to all of the dynamic allocations needed to populate dynamic and associative arrays. Changing that would require switching to completely different underlying container types, which would at least make the API a lot less intuitive. We could of course also simply provide an alternative value representation that is not based on Algebraic (or an enum-tag-based alternative) and is not augmented with location information, but is optimized solely for speed and low memory consumption.
Aug 12 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/12/2015 12:44 AM, Sönke Ludwig wrote:
 That's where Decimal would come in. There is some code for that commented out,
 but I really didn't want to add it without a standard Phobos implementation. But
 I wouldn't say that this is really an argument against BigInt, maybe more one
 for implementing a Decimal type.
Make the type for storing a Number be a template parameter.
Aug 13 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 14.08.2015 at 07:11, Walter Bright wrote:
 On 8/12/2015 12:44 AM, Sönke Ludwig wrote:
 That's where Decimal would come in. There is some code for that
 commented out,
 but I really didn't want to add it without a standard Phobos
 implementation. But
 I wouldn't say that this is really an argument against BigInt, maybe
 more one
 for implementing a Decimal type.
Make the type for storing a Number be a template parameter.
Then we'd lose the ability to distinguish between integers and floating point in the same lexer instantiation, which is vital for certain input files to avoid losing precision for 64-bit integers. The only solution would be to use Decimal, but that doesn't exist yet and would be slow. But the use of BigInt is already controlled by a template parameter, only the std.bigint import is currently there unconditionally. Hm, another idea would be to store a void* (to a BigInt) instead of a BigInt and only import std.bigint locally in the accessor functions.
Aug 14 2015
next sibling parent "Ola Fosheim Grøstad" writes:
On Friday, 14 August 2015 at 07:14:34 UTC, Sönke Ludwig wrote:
 On 14.08.2015 at 07:11, Walter Bright wrote:
 Make the type for storing a Number be a template parameter.
Then we'd lose the ability to distinguish between integers and floating point in the same lexer instantiation, which is vital for certain input files to avoid losing precision for 64-bit integers. The only solution would be to use Decimal, but that doesn't exist yet and would be slow.
Why can't you specify many types? You should be able to query the range/precision of each type?
Aug 14 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 12:14 AM, Sönke Ludwig wrote:
 On 14.08.2015 at 07:11, Walter Bright wrote:
 Make the type for storing a Number be a template parameter.
Then we'd lose the ability to distinguish between integers and floating point in the same lexer instantiation, which is vital for certain input files to avoid losing precision for 64-bit integers. The only solution would be to use Decimal, but that doesn't exist yet and would be slow.
Two other solutions:

1. 'real' has enough precision to hold 64 bit integers.

2. You can use a union of 'long' and a template type T. Use the 'long' if it fits, and T if it doesn't.
Aug 14 2015
parent reply "Ola Fosheim Grøstad" writes:
On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:
 1. 'real' has enough precision to hold 64 bit integers.
Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
Aug 14 2015
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 2:20 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:
 1. 'real' has enough precision to hold 64 bit integers.
Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
You can always use T for that.
Aug 14 2015
prev sibling parent reply "Matthias Bentrup" <matthias.bentrup googlemail.com> writes:
On Friday, 14 August 2015 at 09:20:14 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:
 1. 'real' has enough precision to hold 64 bit integers.
Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
Actually the x87 format has 64 mantissa bits, although bit 63 is always '1' for normalized numbers.
Aug 14 2015
parent "Ola Fosheim Grøstad" writes:
On Friday, 14 August 2015 at 11:44:35 UTC, Matthias Bentrup wrote:
 On Friday, 14 August 2015 at 09:20:14 UTC, Ola Fosheim Grøstad 
 wrote:
 On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:
 1. 'real' has enough precision to hold 64 bit integers.
Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
actually the x87 format has 64 mantissa bits, although the bit 63 is always '1' for normalized numbers.
Yes, Walter was right. The most negative number can be represented, since it is -(2^63): a power of two, so the exponent alone covers it (only 1 bit of the mantissa is needed).
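This is easy to verify, since -(2^63) needs only 1 mantissa bit:

```d
void main()
{
    // long.min == -(2^63) is a power of two, so it is exactly
    // representable in floating point and round-trips through real.
    real r = long.min;
    assert(cast(long) r == long.min);
}
```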
Aug 14 2015
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 11.08.2015 um 19:30 schrieb deadalnix:
 Ok some actionable items.

 1/ How big is a JSON struct ? What is the biggest element in the union ?
 Is that element really needed ? Recurse.
See http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html

The question whether each field is "really" needed obviously depends on the application. However, the biggest type is BigInt which, from a quick look, contains a dynamic array + a bool field, so it's not as compact as it could be, but also not really large. There is also an additional Location field that may sometimes be important for good error messages and the like and sometimes may be totally unneeded.

However, my goal when implementing this has never been to make the DOM representation as efficient as possible. The simple reason is that a DOM representation is inherently inefficient when compared to operating on the structure using either the pull parser or a deserializer that directly converts into a static D type. IMO these should be advertised instead of trying to milk a dead cow (in terms of performance).
 2/ As far as I can see, the elements are discriminated using typeid. An
 enum is preferable as the compiler would know values ahead of time and
 optimize based on this. It also allows use of things like final switch.
Using a tagged-union-like structure is definitely what I'd like to have, too. However, the main goal was to build the DOM type upon a generic algebraic type instead of using a home-brew tagged union. The reason is that it automatically makes different DOM types with a similar structure interoperable (JSON/BSON/TOML/...).

Now Phobos unfortunately only has Algebraic, which not only doesn't have a type enum, but is currently also really bad at keeping static type information when forwarding function calls or operators. The only options were basically to resort to Algebraic for now and have something that works, or to first implement an alternative algebraic type and get it accepted into Phobos, which would delay the whole process nearly indefinitely.
 3/ Going from the untyped world to the typed world and provide an API to
 get back to the untyped word is a loser strategy. That sounds true
 intuitively, but also from my experience manipulating JSON in various
 languages. The Nodes produced by this lib need to be "manipulatable" as
 the unstructured values they represent.
It isn't really clear to me what you mean by this. What exactly about JSONValue can't be manipulated like the "unstructured values [it] represent[s]"? Or do you perhaps mean the JSON -> deserialize -> manipulate -> serialize -> JSON approach? That definitely is not a "loser strategy"*, but yes, it is limited to applications where you have a partially fixed schema. However, arguably most applications fall into that category.

* OT: My personal observation is that sadly the overall tone in the community has generally become a lot less friendly over the last months. I'm a bit worried about where this may lead in the long term.
Aug 11 2015
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:
 See 
 http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html

 The question whether each field is "really" needed obviously 
 depends on the application. However, the biggest type is BigInt 
 that, form a quick look, contains a dynamic array + a bool 
 field, so it's not as compact as it could be, but also not 
 really large. There is also an additional Location field that 
 may sometimes be important for good error messages and the like 
 and sometimes may be totally unneeded.
Urg. Looks like BigInt should steal a bit somewhere instead of having a bool like this. That is not really your lib's fault, but it's quite a heavy cost. Consider this: if the struct fits into 2 registers, it will be passed around as such rather than in memory. That is a significant difference, for BigInt itself and, by proxy, for the JSON library.

Putting the BigInt thing aside, it seems like the biggest field in there is an array of JSONValues or a string. For the string, you can artificially limit the length by 3 bits to stick a tag in. That still gives absurdly large strings. For the JSONValue case, the alignment of the pointer is such that you can steal 3 bits from there. Or, as for string, the length can be used.

It seems very realizable to me to have the JSONValue struct fit into 2 registers, granted the tag fits in 3 bits (8 different types).

I can help with that if you want to.
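The pointer variant of the bit-stealing trick could be sketched like this (illustrative names; assumes 8-byte-aligned allocations, so the three low bits of any payload pointer are free for a tag):

```d
// Sketch: an 8-byte-aligned pointer always has its three low bits
// clear, so a 3-bit type tag (up to 8 kinds) can live there.
enum Kind : size_t { integer, floating, text, array, object }

struct PackedRef
{
    private size_t bits; // pointer | tag

    this(void* p, Kind k)
    {
        assert((cast(size_t) p & 7) == 0, "pointer must be 8-byte aligned");
        bits = cast(size_t) p | k;
    }

    Kind kind() const { return cast(Kind)(bits & 7); }
    void* ptr() const { return cast(void*)(bits & ~cast(size_t) 7); }
}

void main()
{
    long* p = new long; // GC allocations are at least 8-byte aligned
    *p = 123;
    auto r = PackedRef(p, Kind.integer);
    assert(r.kind == Kind.integer);
    assert(*cast(long*) r.ptr == 123);
}
```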
 However, my goal when implementing this has never been to make 
 the DOM representation as efficient as possible. The simple 
 reason is that a DOM representation is inherently inefficient 
 when compared to operating on the structure using either the 
 pull parser or using a deserializer that directly converts into 
 a static D type. IMO these should be advertised instead of 
 trying to milk a dead cow (in terms of performance).
Indeed. Still, JSON nodes should be as lightweight as possible.
 2/ As far as I can see, the elements are discriminated using 
 typeid. An
 enum is preferable as the compiler would know values ahead of 
 time and
 optimize based on this. It also allows use of things like final 
 switch.
Using a tagged union like structure is definitely what I'd like to have, too. However, the main goal was to build the DOM type upon a generic algebraic type instead of using a home-brew tagged union. The reason is that it automatically makes different DOM types with a similar structure interoperable (JSON/BSON/TOML/...).
That is a great point that I haven't considered. I'd go the other way around about it: providing a compatible typeid-based struct from the enum-tagged one for compatibility. It can even be alias this so the transition is transparent. The transformation is not bijective, so it'd be great to get the most restrictive form (the enum) and fall back on the least restrictive one (alias this) when wanted.
 Now Phobos unfortunately only has Algebraic, which not only 
 doesn't have a type enum, but is currently also really bad at 
 keeping static type information when forwarding function calls 
 or operators. The only options were basically to resort to 
 Algebraic for now, but have something that works, or to first 
 implement an alternative algebraic type and get it accepted 
 into Phobos, which would delay the whole process nearly 
 indefinitely.
That's fine. Done is better than perfect. Still, API changes tend to be problematic, so we need to nail that part at least, and an enum with a fallback on a typeid-based solution seems like the best option.
 Or do you perhaps mean the JSON -> deserialize -> manipulate -> 
 serialize -> JSON approach? That definitely is not a "loser 
 strategy"*, but yes, it is limited to applications where you 
 have a partially fixed schema. However, arguably most 
 applications fall into that category.
Yes.
Aug 11 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 12.08.2015 um 00:21 schrieb deadalnix:
 On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:
 See
 http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html


 The question whether each field is "really" needed obviously depends
 on the application. However, the biggest type is BigInt which, from a
 quick look, contains a dynamic array + a bool field, so it's not as
 compact as it could be, but also not really large. There is also an
 additional Location field that may sometimes be important for good
 error messages and the like and sometimes may be totally unneeded.
Urg. Looks like BigInt should steal a bit somewhere instead of having a bool like this. That is not really your lib's fault, but it's quite a heavy cost. Consider this: if the struct fits into 2 registers, it will be passed around as such rather than in memory. That is a significant difference, for BigInt itself and, by proxy, for the JSON library.
Agreed, this was what I also thought. Considering that BigInt is heavy anyway, Dimitry's suggestion to store a "BigInt*" sounds like a good idea to sidestep that issue, though.
 Putting the BigInt thing aside, it seems like the biggest field in there
 is an array of JSONValues or a string. For the string, you can
 artificially limit the length by 3 bits to stick a tag. That still give
 absurdly large strings. For the JSONValue case, the alignment on the
 pointer is such as you can steal 3 bits from there. Or as for string,
 the length can be used.

 It seems very realizable to me to have the JSONValue struct fit into 2
 registers, granted the tag fit in 3 bits (8 different types).

 I can help with that if you want to.
The question is mainly just: should we decide on a single way to represent values (either speed or features), or let the library user decide, either by making JSONValue a template or by providing two separate structs optimized for each case? In the latter case, we could really optimize on all fronts and for example use custom containers that need fewer allocations and are more cache friendly than the built-in ones.
 However, my goal when implementing this has never been to make the DOM
 representation as efficient as possible. The simple reason is that a
 DOM representation is inherently inefficient when compared to
 operating on the structure using either the pull parser or using a
 deserializer that directly converts into a static D type. IMO these
 should be advertised instead of trying to milk a dead cow (in terms of
 performance).
Indeed. Still, JSON nodes should be as lightweight as possible.
 2/ As far as I can see, the elements are discriminated using typeid. An
 enum is preferable as the compiler would know values ahead of time and
 optimize based on this. It also allows use of things like final switch.
Using a tagged union like structure is definitely what I'd like to have, too. However, the main goal was to build the DOM type upon a generic algebraic type instead of using a home-brew tagged union. The reason is that it automatically makes different DOM types with a similar structure interoperable (JSON/BSON/TOML/...).
That is a great point that I haven't considered. I'd go the other way around about it: providing a compatible typeid-based struct from the enum-tagged one for compatibility. It can even be alias this so the transition is transparent. The transformation is not bijective, so it'd be great to get the most restrictive form (the enum) and fall back on the least restrictive one (alias this) when wanted.
As long as the set of types is fixed, it would even be bijective.

Anyway, I've just started to work on a generic variant of an enum based algebraic type that exploits as much static type information as possible. If that works out (compiler bugs?), it would be a great thing to have in Phobos, so maybe it's worth delaying the JSON module for that if necessary. The optimization to store the type enum in the length field of dynamic arrays could also be built into the generic type.
 Now Phobos unfortunately only has Algebraic, which not only doesn't
 have a type enum, but is currently also really bad at keeping static
 type information when forwarding function calls or operators. The only
 options were basically to resort to Algebraic for now, but have
 something that works, or to first implement an alternative algebraic
 type and get it accepted into Phobos, which would delay the whole
 process nearly indefinitely.
That's fine. Done is better than perfect. Still, API changes tend to be problematic, so we need to nail that part at least, and an enum with a fallback on a typeid-based solution seems like the best option.
Yeah, the transition is indeed problematic. Sadly the "alias this" idea wouldn't work for that either, because operators and methods of the enum based algebraic type usually have different return types.
 Or do you perhaps mean the JSON -> deserialize -> manipulate ->
 serialize -> JSON approach? That definitely is not a "loser
 strategy"*, but yes, it is limited to applications where you have a
 partially fixed schema. However, arguably most applications fall into
 that category.
Yes.
Just to state explicitly what I mean: This strategy has the most efficient in-memory storage format and profits from all the static type checking niceties of the compiler. It also means that there is a documented schema in the code that can be used for reference by the developers and that will automatically be verified by the serializer, resulting in less and better checked code. So, where applicable, I claim that this is the best strategy to work with such data.

For maximum efficiency, it can also be transparently combined with the pull parser. The pull parser can for example be used to jump between array entries and the serializer then reads each single array entry.
Aug 12 2015
next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
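For readers who don't want to follow the link, the basic shape of such an enum-tagged type can be sketched as follows (illustrative names, not the gist's actual API):

```d
// Minimal home-grown sketch of an enum-tagged union built over a
// plain union type, in the spirit of the proof of concept.
union Base
{
    long integer;
    double floating;
    string text;
}

struct Tagged
{
    enum Kind { integer, floating, text }
    Kind kind;
    Base value;

    this(long v)   { kind = Kind.integer;  value.integer  = v; }
    this(double v) { kind = Kind.floating; value.floating = v; }
    this(string v) { kind = Kind.text;     value.text     = v; }
}

// The enum tag allows exhaustive handling via final switch.
string describe(Tagged t)
{
    final switch (t.kind)
    {
        case Tagged.Kind.integer:  return "integer";
        case Tagged.Kind.floating: return "floating";
        case Tagged.Kind.text:     return "text";
    }
    assert(0); // unreachable
}

void main()
{
    assert(describe(Tagged(42L)) == "integer");
    assert(describe(Tagged("hi")) == "text");
}
```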
Aug 12 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, Andrei
Aug 14 2015
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/14/2015 01:40 PM, Andrei Alexandrescu wrote:
 On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron
No, it isn't. I believe the word you might want is "pleonasm". :o)
 as there's no untagged algebraic type).
The tag is an implementation detail. Algebraic types are actually more naturally expressed as polymorphic higher-order functions.
Aug 14 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, Andrei
Ping on this. My working hypothesis:

- If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag.

- If there's some ordering among types (e.g. all types below 16 have some property etc.), then the integral tag again has an advantage over the pointer tag.

- Other than that, the pointer tag is superior to the integral tag at everything. Where it really wins is that there is one unique tag for each type, present or future, so the universe of representable types is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag.

I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure out whether there's an advantage to integral tags or, if not, settle the misconception for good.

Andrei
Aug 17 2015
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17-Aug-2015 21:12, Andrei Alexandrescu wrote:
 On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for
 that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, Andrei
Ping on this. My working hypothesis: - If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag. - If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag. - Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag. I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception.
Actually one can combine the two:
- use integer type tag for everything built-in
- use pointer tag for what is not

In code:

union NiftyTaggedUnion
{
    // The pointer must be at least 4-byte aligned, so to discern the
    // two schemes the int tag must have the LSB == 1.
    // (This assumes little-endian, though big-endian is doable too.)
    @property bool isIntTag() { return (common.head & 1) != 0; }
    IntTagged intTagged;
    PtrTagged ptrTagged;
    CommonUnion common;
}

struct CommonUnion
{
    ubyte[size_of_max_builtin] store;
    // this is where the type tag starts - pointer or int
    uint head;
}

union IntTagged // int-tagged
{
    union // builtins go here
    {
        int ival;
        double dval;
        // ...
    }
    uint tag;
}

union PtrTagged // ptr-to-typeinfo scheme
{
    ubyte[size_of_max_builtin] payload;
    TypeInfo* pinfo;
}

It's going to be challenging, but I think I can pull off even NaN-boxing with this scheme.

-- 
Dmitry Olshansky
Aug 17 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
Aug 17 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower. -- Dmitry Olshansky
Aug 17 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 2:55 AM, Dmitry Olshansky wrote:
 On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower.
I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- Andrei
Aug 18 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 18-Aug-2015 16:19, Andrei Alexandrescu wrote:
 On 8/18/15 2:55 AM, Dmitry Olshansky wrote:
 On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower.
I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- Andrei
Integer in a small range is faster to switch on. Plus, comparing to zero is faster, so if the common type has tag == 0 it's a net gain. Strictly speaking, a pointer with a vtbl is about as fast as a switch, but when we have to switch on 2 types, the vtbl dispatch needs to be based on 2 types instead of one. So ideally we need a vtbl per pair of types to support e.g. fast binary operators on TaggedAlgebraic. -- Dmitry Olshansky
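The "switch on 2 types" point can be illustrated by packing both operand tags into one index (hypothetical types, with a single switch standing in for the per-pair vtbl):

```d
// Sketch: pack both operand tags into one small integer so one
// switch/jump table dispatches a binary operator over all type pairs.
enum Tag : ubyte { integer, floating }

struct Num
{
    Tag tag;
    double raw; // holds either an exact small integer or a float
}

double add(Num a, Num b)
{
    switch ((a.tag << 1) | b.tag)
    {
        case 0: // int + int: integer semantics
            return cast(long) a.raw + cast(long) b.raw;
        case 1: case 2: case 3: // any floating operand
            return a.raw + b.raw;
        default:
            assert(0);
    }
}

void main()
{
    auto x = Num(Tag.integer, 2);
    auto y = Num(Tag.floating, 0.5);
    assert(add(x, x) == 4);
    assert(add(x, y) == 2.5);
}
```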
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 12:31 PM, Dmitry Olshansky wrote:
 On 18-Aug-2015 16:19, Andrei Alexandrescu wrote:
 On 8/18/15 2:55 AM, Dmitry Olshansky wrote:
 On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower.
I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- Andrei
Integer in a small range is faster to switch on. Plus comparing to zero is faster, so if the common type has tag == 0 it's a net gain.
Agreed. These are small gains though unless tight loops are concerned.
 Strictly speaking pointer with vtbl is about as fast as switch but when
 we have to switch on 2 types the vtbl dispatch needs to be based on 2
 types instead of one. So ideally we need vtbl per pair of type to
 support e.g. fast binary operators on TaggedAlgebraic.
But I'm talking about using pointers for indirect calls IN ADDITION to using pointers for simple integral comparison. So the comparison is not appropriate. It's better to have both options instead of just one. Andrei
Aug 18 2015
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 18-Aug-2015 19:35, Andrei Alexandrescu wrote:
 On 8/18/15 12:31 PM, Dmitry Olshansky wrote:
 On 18-Aug-2015 16:19, Andrei Alexandrescu wrote:
 On 8/18/15 2:55 AM, Dmitry Olshansky wrote:
 On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower.
I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- Andrei
Integer in a small range is faster to switch on. Plus comparing to zero is faster, so if the common type has tag == 0 it's a net gain.
Agreed. These are small gains though unless tight loops are concerned.
 Strictly speaking pointer with vtbl is about as fast as switch but when
 we have to switch on 2 types the vtbl dispatch needs to be based on 2
 types instead of one. So ideally we need vtbl per pair of type to
 support e.g. fast binary operators on TaggedAlgebraic.
But I'm talking about using pointers for indirect calls IN ADDITION to using pointers for simple integral comparison. So the comparison is not appropriate. It's better to have both options instead of just one.
If the common-type fast path with tag == 0 is not relevant, then the only gain of the integer is being able to fit it in a couple of bytes or even reuse some vacant bits.

Another thing is that function addresses are rather sparse, so the switch statement would need some special preprocessing to make them more dense:
- subtract the start of the code segment (maybe, but this won't work with DLLs)
- shift right by 2 (4?) as functions are usually aligned

-- 
Dmitry Olshansky
Aug 18 2015
prev sibling next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Monday, 17 August 2015 at 18:12:02 UTC, Andrei Alexandrescu 
wrote:
 On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an 
 enum based
 algebraic type that exploits as much static type information 
 as
 possible. If that works out (compiler bugs?), it would be a 
 great thing
 to have in Phobos, so maybe it's worth to delay the JSON 
 module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, Andrei
Ping on this. My working hypothesis: - If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag. - If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag. - Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag. I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception. Andrei
From the compiler's perspective, the tag is much nicer; the compiler can use a jump table, for instance. It is not a good solution for Variant (which needs to be able to represent arbitrary types), but if the number of types is finite, a tag is almost always a win. In the case of JSON, using a tag and a packing trick, it is possible to pack everything into a struct the size of 2 pointers without much trouble.
Aug 17 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/17/15 2:51 PM, deadalnix wrote:
  From the compiler perspective, the tag is much nicer. Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
 It is not a good solution for Variant (which needs to be able to
 represent arbitrary types) but if the amount of types is finite, tag is
 almost always a win.
 In the case of JSON, using a tag and packing trick, it is possible to
 pack everything in a 2 pointers sized struct without much trouble.
Point taken. Question is if this is worth it. Andrei
Aug 17 2015
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu 
wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
  From the compiler perspective, the tag is much nicer. 
 Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.
 It is not a good solution for Variant (which needs to be able 
 to
 represent arbitrary types) but if the amount of types is 
 finite, tag is
 almost always a win.
 In the case of JSON, using a tag and packing trick, it is 
 possible to
 pack everything in a 2 pointers sized struct without much 
 trouble.
Point taken. Question is if this is worth it.
Anything that makes it fit in two registers instead of three (= 2 regs + memory, in practice) is most likely worth it.
Aug 18 2015
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Tue, 18 Aug 2015 09:10:25 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu=20
 wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
  From the compiler perspective, the tag is much nicer.=20
 Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.
Here's an example with an enum tag, showing what compilers can do: http://goo.gl/NUZwNo

ARM ASM is easier to read for me. Feel free to switch to X86. The jump table requires only one instruction (the cmp #4 shouldn't be necessary for a final switch, probably a GDC/GCC enhancement). All instructions/data should be in the instruction cache. There's no register save / function call overhead.

If you use a pointer: http://goo.gl/9kb0vQ

No jump table optimization. Cache should be OK as well. No call overhead. Note how both examples can also combine the code for uint/int. If you use a function pointer instead you'll call a different function.

Calling a function through a pointer: http://goo.gl/zTU3sA

You have one indirect call. Probably hard for the branch prediction, although I don't really know. Probably also worse regarding cache. I also cheated by using one pointer only for add. In reality you'll need to store one pointer per operation or use a switch inside the called function.

I think it's reasonable to expect the enum version to be faster. To be really sure we'd need some benchmarks.
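The godbolt links above may rot, so here is a self-contained D sketch of the enum-tag layout being benchmarked (`Kind` and `Tagged` are invented names for this illustration, not from the proposed module). With a dense, zero-based enum, the compiler is free to lower the final switch to a single indexed jump:

```d
// Illustrative tagged-union layout; names are made up for this sketch.
enum Kind { null_, boolean, integer, floating, text }

struct Tagged
{
    Kind kind;
    union { bool b; long i; double f; string s; }
}

// A final switch over a dense, zero-based enum needs no default branch
// and is a natural candidate for a jump table:
string describe(Tagged v)
{
    final switch (v.kind)
    {
        case Kind.null_:    return "null";
        case Kind.boolean:  return "boolean";
        case Kind.integer:  return "integer";
        case Kind.floating: return "floating";
        case Kind.text:     return "text";
    }
}
```

A pointer tag, by contrast, would force a chain of equality comparisons here, since pointer values are neither consecutive nor bounded.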
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 7:02 AM, Johannes Pfau wrote:
 On Tue, 18 Aug 2015 09:10:25 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu
 wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
   From the compiler perspective, the tag is much nicer.
 Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.
Here's an example with an enum tag, showing what compilers can do: http://goo.gl/NUZwNo

ARM ASM is easier to read for me. Feel free to switch to X86. The jump table requires only one instruction (the cmp #4 shouldn't be necessary for a final switch, probably a GDC/GCC enhancement). All instructions/data should be in the instruction cache. There's no register save / function call overhead.

If you use a pointer: http://goo.gl/9kb0vQ
That's a language issue - switch does not work with any pointers. I just submitted https://issues.dlang.org/show_bug.cgi?id=14931. -- Andrei
Aug 18 2015
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Tue, 18 Aug 2015 10:58:17 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 8/18/15 7:02 AM, Johannes Pfau wrote:
 On Tue, 18 Aug 2015 09:10:25 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu
 wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
   From the compiler perspective, the tag is much nicer.
 Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.
Here's an example with an enum tag, showing what compilers can do: http://goo.gl/NUZwNo

ARM ASM is easier to read for me. Feel free to switch to X86. The jump table requires only one instruction (the cmp #4 shouldn't be necessary for a final switch, probably a GDC/GCC enhancement). All instructions/data should be in the instruction cache. There's no register save / function call overhead.

If you use a pointer: http://goo.gl/9kb0vQ
That's a language issue - switch does not work with any pointers. I just submitted https://issues.dlang.org/show_bug.cgi?id=14931. -- Andrei
Yes, if we enable switch for pointers we get nicer D code. No, this won't improve the ASM much: Enum values start at 0 and are consecutive. With a final switch they're also bounded. All these points do not apply to pointers. They don't start at 0, are not guaranteed to be consecutive and likely can't be used with final switch. Because of that a switch on pointers can never use jump tables.
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 11:39 AM, Johannes Pfau wrote:
 No, this won't improve the ASM much: Enum values start at 0 and are
 consecutive. With a final switch they're also bounded. All these points
 do not apply to pointers. They don't start at 0, are not guaranteed to
 be consecutive and likely can't be used with final switch. Because of
 that a switch on pointers can never use jump tables.
I agree there's a margin here in favor of integers, but it's getting thin. Meanwhile, pointers maintain large advantages of principle. I suggest we pursue better use of pointers as tags instead of adding integral-tagged unions to phobos. -- Andrei
Aug 18 2015
parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 18 August 2015 at 16:22:20 UTC, Andrei Alexandrescu 
wrote:
 On 8/18/15 11:39 AM, Johannes Pfau wrote:
 No, this won't improve the ASM much: Enum values start at 0 
 and are
 consecutive. With a final switch they're also bounded. All 
 these points
 do not apply to pointers. They don't start at 0, are not 
 guaranteed to
 be consecutive and likely can't be used with final switch. 
 Because of
 that a switch on pointers can never use jump tables.
I agree there's a margin here in favor of integers, but it's getting thin. Meanwhile, pointers maintain large advantages of principle. I suggest we pursue better use of pointers as tags instead of adding integral-tagged unions to phobos. -- Andrei
No: enums can also be crammed inline into the code for cheap, they can be inserted into an existing structure cheaply using bit manipulation most of the time, and the compiler can check that all cases are handled in an exhaustive manner. It is not getting thinner.
Aug 18 2015
prev sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 18 August 2015 at 14:58:08 UTC, Andrei Alexandrescu 
wrote:
 That's a language issue - switch does not work with any 
 pointers. I just submitted 
 https://issues.dlang.org/show_bug.cgi?id=14931. -- Andrei
No, it is not. If the set of values is not compact, there is no jump table.
Aug 18 2015
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 5:10 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
 On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
  From the compiler perspective, the tag is much nicer. Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead
No, in std.variant it points to a dispatcher function. -- Andrei
Aug 18 2015
prev sibling parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
On 17.08.2015 at 20:12, Andrei Alexandrescu wrote:
 On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 struct TaggedAlgebraic(U) if (is(U == union)) { ... }

 Interesting. I think it would be best to rename it to TaggedUnion
 (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's
 no untagged algebraic type). A good place for it is straight in
 std.variant.

 What are the relative advantages of using an integral over a pointer to
 function? In other words, what's a side by side comparison of
 TaggedAlgebraic!U and Algebraic!(types inside U)?

 Thanks,

 Andrei
Ping on this. My working hypothesis:

- If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag.
- If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag.
- Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag.

I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception.

Andrei
(reposting to NG, accidentally replied by e-mail)

Some more points come to mind:

- The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code.
- It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade.
- A hypothesis is that it is faster, because there is no function call indirection involved.
- It naturally enables fully statically typed operator forwarding as far as possible (have a look at the examples of the current version). A pointer based version could do this, too, but only by jumping through hoops.
- The same type can be used multiple times with a different enum name. This can alternatively be solved using a Typedef!T, but I had several occasions where that proved useful.

They both have their place, but IMO where the pointer approach really shines is for unbounded Variant types.
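The first point can be made concrete with a tiny sketch (all names invented here): an integral tag has a stable numeric value that can be written to disk or handed across a C boundary, which a function pointer, whose value changes with every process, cannot.

```d
// A ubyte-backed tag: a stable value for disk formats and C interop.
enum Kind : ubyte { null_, boolean, number, text }

ubyte[] serializeTag(Kind k)
{
    return [cast(ubyte) k];     // the tag is its own wire format
}

Kind deserializeTag(const(ubyte)[] data)
{
    return cast(Kind) data[0];  // round-trips across processes and runs
}
```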
Aug 17 2015
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Mon, 17 Aug 2015 20:56:18 +0200, Sönke Ludwig <sludwig outerproduct.org> wrote:

 On 17.08.2015 at 20:12, Andrei Alexandrescu wrote:
 On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 struct TaggedAlgebraic(U) if (is(U == union)) { ... }

 Interesting. I think it would be best to rename it to TaggedUnion
 (instantly recognizable; also TaggedAlgebraic is an oxymoron as
 there's no untagged algebraic type). A good place for it is
 straight in std.variant.

 What are the relative advantages of using an integral over a
 pointer to function? In other words, what's a side by side
 comparison of TaggedAlgebraic!U and Algebraic!(types inside U)?

 Thanks,

 Andrei
Ping on this. My working hypothesis:

- If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag.
- If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag.
- Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag.

I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception.

Andrei
(reposting to NG, accidentally replied by e-mail)

Some more points come to mind:

- The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code.
- It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade.
- A hypothesis is that it is faster, because there is no function call indirection involved.
- It naturally enables fully statically typed operator forwarding as far as possible (have a look at the examples of the current version). A pointer based version could do this, too, but only by jumping through hoops.
- The same type can be used multiple times with a different enum name. This can alternatively be solved using a Typedef!T, but I had several occasions where that proved useful.

They both have their place, but IMO where the pointer approach really shines is for unbounded Variant types.
I think Andrei's point is that a pointer tag can do most things an integral tag could, as you don't have to dereference the pointer:

void* tag;
if (tag == &someFunc!A)

So the only benefit is that the compiler knows that the _enum_ (not simply an integral) tag is bounded. So we gain:

* easier debugging (readable type tag)
* potentially better codegen (jump tables fit perfectly: ordered values, 0-x, no gaps)
* final switch

In some cases enum tags might also be smaller than a pointer.
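The `if (tag == &someFunc!A)` fragment above, expanded into a minimal compilable sketch (`handlerOf`, `PtrTagged`, and `wrap` are invented names; std.variant's real dispatcher is more involved):

```d
// One handler instantiation per type; its address serves as the tag.
void handlerOf(T)() {}

struct PtrTagged
{
    void function() tag;
    union { int i; double d; }
}

PtrTagged wrap(T)(T value)
{
    PtrTagged r;
    r.tag = &handlerOf!T;                 // unique per instantiated type
    static if (is(T == int))         r.i = value;
    else static if (is(T == double)) r.d = value;
    return r;
}

// The tag is compared without ever calling through it:
bool holds(T)(PtrTagged v) { return v.tag == &handlerOf!T; }
```

This shows the "no dereference needed" point: the pointer works as a plain tag value, but its values are neither dense nor ordered, so a switch over it cannot become a jump table.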
Aug 17 2015
parent reply "Suliman" <evermind live.ru> writes:
Why is this not working:
JSONValue x = parseJSONValue(`{"a": true, "b": "test"}`);

but:
string str = `{"a": true, "b": "test"}`;
JSONValue x = parseJSONValue(str);

work fine?
Aug 17 2015
parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
On 17.08.2015 at 21:32, Suliman wrote:
 Why not working:
 JSONValue x = parseJSONValue(`{"a": true, "b": "test"}`);

 but:
 string str = `{"a": true, "b": "test"}`;
 JSONValue x = parseJSONValue(str);

 work fine?
toJSONValue() is the right function in this case. I've updated the docs/examples to make that clearer.
Aug 17 2015
next sibling parent reply "Suliman" <evermind live.ru> writes:
On Monday, 17 August 2015 at 20:07:24 UTC, Sönke Ludwig wrote:
 Am 17.08.2015 um 21:32 schrieb Suliman:
 Why not working:
 JSONValue x = parseJSONValue(`{"a": true, "b": "test"}`);

 but:
 string str = `{"a": true, "b": "test"}`;
 JSONValue x = parseJSONValue(str);

 work fine?
toJSONValue() is the right function in this case. I've update the docs/examples to make that clearer.
I think I'm misunderstanding the concept of ranges. I reread the docs but can't understand what I am missing. Ranges are a way to access sequences, but why can't I take input from a string? Is a string not a range?
Aug 17 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 17.08.2015 at 22:23, Suliman wrote:
 On Monday, 17 August 2015 at 20:07:24 UTC, Sönke Ludwig wrote:
 toJSONValue() is the right function in this case. I've update the
 docs/examples to make that clearer.
I think that I miss understanding conception of ranges. I reread docs but can't understand what I am missing. Ranges is way to access of sequences, but why I can't take input from string? string is not range?
String is a valid range, but parseJSONValue takes a *reference* to a range, because it directly consumes the range and leaves anything that appears after the JSON value in the range. toJSON() on the other hand assumes that the JSON value occupies the whole input range.
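To make the by-reference consumption concrete, here is a sketch against the API under review (function names per the linked docs; the exact handling of the remainder is an assumption, so no result is claimed for it). It also shows why the string literal in the earlier question failed: an rvalue cannot bind to the `ref` range parameter.

```d
import stdx.data.json;

void demo()
{
    // toJSONValue: the entire input must be exactly one JSON value.
    JSONValue whole = toJSONValue(`{"a": true, "b": "test"}`);

    // parseJSONValue: takes its range by reference, consumes only the
    // first value, and leaves the rest in `input` for the caller.
    string input = `{"a": true}[1, 2, 3]`;
    JSONValue first = parseJSONValue(input);
    // `input` now starts at the remaining `[1, 2, 3]`.

    // parseJSONValue(`{"a": true}`) would not compile: a literal is an
    // rvalue and cannot bind to the ref parameter.
}
```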
Aug 17 2015
parent reply "Suliman" <evermind live.ru> writes:
 String is a valid range, but parseJSONValue takes a *reference* 
 to a range, because it directly consumes the range and leaves 
 anything that appears after the JSON value in the range. 
 toJSON() on the other hand assumes that the JSON value occupies 
 the whole input range.
Yes, I understood, but maybe it's better to rename it (or add a warning in the docs; I've seen your changes, but I think you should extend them more, to prevent people from making the mistake I did), because I think it would be hard to understand for people who come from other languages. I have been writing D for a long time, but some things still confuse me...
Do you use DUB to build? It should automatically download the 
dependency.
Failed to download http://code.dlang.org/packages/vibe-d/0.7.24.zip: 500 Internal Server Error

Possibly it was an issue with my provider; I will check it later. The error above occurred while attempting to download the new version of vibe.d.
Aug 17 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 17.08.2015 at 22:58, Suliman wrote:
 String is a valid range, but parseJSONValue takes a *reference* to a
 range, because it directly consumes the range and leaves anything that
 appears after the JSON value in the range. toJSON() on the other hand
 assumes that the JSON value occupies the whole input range.
Yeas, I understood, but maybe it's better to rename it (or add attention in docs, I seen your changes, but I think that you should extend it more, to prevent people doing mistake that I did) , because I think that it would be hard to understand it for people who come from other languages. I am writing in D for a long time, but still some things make me confuse...
I agree that the naming can be a bit confusing at first, but I chose those names to be consistent with std.conv (to!T and parse!T). I've also just noticed that the parser module example erroneously uses parseJSONValue(). With proper examples, this should hopefully not be that big of a deal.
Aug 17 2015
prev sibling parent reply "Suliman" <evermind live.ru> writes:
Also I can't build last build from git. I am getting error:

source\stdx\data\json\value.d(25,8): Error: module 
taggedalgebraic is in file 'taggedalgebraic.d' which cannot be 
read
Aug 17 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 17.08.2015 at 22:31, Suliman wrote:
 Also I can't build last build from git. I am getting error:

 source\stdx\data\json\value.d(25,8): Error: module taggedalgebraic is in
 file 'taggedalgebraic.d' which cannot be read
Do you use DUB to build? It should automatically download the dependency. Alternatively, it's located here: https://github.com/s-ludwig/taggedalgebraic/blob/master/source/taggedalgebraic.d
Aug 17 2015
parent "Suliman" <evermind live.ru> writes:
Also could you look at theme 
http://stackoverflow.com/questions/32033817/how-to-insert-date-to-arangodb

And suggest your variant or approve on of existent.
Aug 17 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/17/15 2:56 PM, Sönke Ludwig wrote:
 - The enum is useful to be able to identify the types outside of the D
 code itself. For example when serializing the data to disk, or when
 communicating with C code.
OK.
 - It enables the use of pattern matching (final switch), which is often
 very convenient, faster, and safer than an if-else cascade.
Sounds tenuous.
 - A hypothesis is that it is faster, because there is no function call
 indirection involved.
Again: pointers do all integrals do. To compare: if (myptr == ThePtrOf!int) { ... this is an int ... } I want to make clear that this is understood.
 - It naturally enables fully statically typed operator forwarding as far
 as possible (have a look at the examples of the current version). A
 pointer based version could do this, too, but only by jumping through
 hoops.
I'm unclear on that. Could you please point me to the actual file and lines?
 - The same type can be used multiple times with a different enum name.
 This can alternatively be solved using a Typedef!T, but I had several
 occasions where that proved useful.
Unclear on this. Andrei
Aug 17 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 18.08.2015 at 00:37, Andrei Alexandrescu wrote:
 On 8/17/15 2:56 PM, Sönke Ludwig wrote:
 - The enum is useful to be able to identify the types outside of the D
 code itself. For example when serializing the data to disk, or when
 communicating with C code.
OK.
 - It enables the use of pattern matching (final switch), which is often
 very convenient, faster, and safer than an if-else cascade.
Sounds tenuous.
It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string])). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly. It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade). It's safer because of the possibility to use final switch in addition to a normal switch. I wouldn't call that tenuous.
 - A hypothesis is that it is faster, because there is no function call
 indirection involved.
Again: pointers do all integrals do. To compare: if (myptr == ThePtrOf!int) { ... this is an int ... } I want to make clear that this is understood.
Got that.
 - It naturally enables fully statically typed operator forwarding as far
 as possible (have a look at the examples of the current version). A
 pointer based version could do this, too, but only by jumping through
 hoops.
I'm unclear on that. Could you please point me to the actual file and lines?
See the operator implementation code [1] that is completely statically typed until the final "switch" happens [2]. You can of course do the same for the pointer based Algebraic, but that would just duplicate/override the code that is already implemented by the pointer method.
 - The same type can be used multiple times with a different enum name.
 This can alternatively be solved using a Typedef!T, but I had several
 occasions where that proved useful.
Unclear on this.
I'd say this is just a little perk of the representation but not a hard argument since it can be achieved in a different way relatively easily. [1]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L145 [2]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L551
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 1:21 PM, Sönke Ludwig wrote:
 On 18.08.2015 at 00:37, Andrei Alexandrescu wrote:
 On 8/17/15 2:56 PM, Sönke Ludwig wrote:
 - The enum is useful to be able to identify the types outside of the D
 code itself. For example when serializing the data to disk, or when
 communicating with C code.
OK.
 - It enables the use of pattern matching (final switch), which is often
 very convenient, faster, and safer than an if-else cascade.
Sounds tenuous.
It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string]). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly. It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade). It's safer because of the possibility to use final switch in addition to a normal switch. I wouldn't call that tenuous.
Well I guess I would, but no matter. It's something where reasonable people may disagree.
 - A hypothesis is that it is faster, because there is no function call
 indirection involved.
Again: pointers do all integrals do. To compare: if (myptr == ThePtrOf!int) { ... this is an int ... } I want to make clear that this is understood.
Got that.
 - It naturally enables fully statically typed operator forwarding as far
 as possible (have a look at the examples of the current version). A
 pointer based version could do this, too, but only by jumping through
 hoops.
I'm unclear on that. Could you please point me to the actual file and lines?
See the operator implementation code [1] that is completely statically typed until the final "switch" happens [2]. You can of course do the same for the pointer based Algebraic, but that would just duplicate/override the code that is already implemented by the pointer method.
Classic code factoring can be done to avoid duplication.
 - The same type can be used multiple times with a different enum name.
 This can alternatively be solved using a Typedef!T, but I had several
 occasions where that proved useful.
Unclear on this.
I'd say this is just a little perk of the representation but not a hard argument since it can be achieved in a different way relatively easily. [1]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L145 [2]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L551
Thanks. Andrei
Aug 21 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 21.08.2015 at 18:56, Andrei Alexandrescu wrote:
 On 8/18/15 1:21 PM, Sönke Ludwig wrote:
 On 18.08.2015 at 00:37, Andrei Alexandrescu wrote:
 On 8/17/15 2:56 PM, Sönke Ludwig wrote:
 - The enum is useful to be able to identify the types outside of the D
 code itself. For example when serializing the data to disk, or when
 communicating with C code.
OK.
 - It enables the use of pattern matching (final switch), which is often
 very convenient, faster, and safer than an if-else cascade.
Sounds tenuous.
It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string]). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly. It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade). It's safer because of the possibility to use final switch in addition to a normal switch. I wouldn't call that tenuous.
Well I guess I would, but no matter. It's something where reasonable people may disagree.
It depends on the perspective/use case, so it's surely not unreasonable to disagree here. But I'm especially not happy with the "final switch" argument getting dismissed so easily. By the same logic, we could also question the existence of "final switch", or even "switch", as a feature in the first place. Performance benefits are certainly nice, too, but that's really just an implementation detail. The important trait is that the types get a name and that they form an enumerable set. This is quite similar to comparing a struct with named members to an anonymous Tuple!(T...).
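The struct-vs-Tuple analogy in the last paragraph, made concrete in a minimal sketch: both forms carry the same data, but only the struct gives its parts names that the type system and the reader can enumerate, just as the enum names the members of the tagged union.

```d
import std.typecons : Tuple;

struct Point { int x; int y; }        // members are named and enumerable
alias AnonPoint = Tuple!(int, int);   // same layout, anonymous members

// Same data either way; only the struct documents what each slot means.
```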
Aug 22 2015
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 12 August 2015 at 08:21:41 UTC, Sönke Ludwig wrote:
 Just to state explicitly what I mean: This strategy has the 
 most efficient in-memory storage format and profits from all 
 the static type checking niceties of the compiler. It also 
 means that there is a documented schema in the code that can be 
 used for reference by the developers and that will 
 automatically be verified by the serializer, resulting in less 
 and better checked code. So where applicable I claim that this 
 is the best strategy to work with such data.

 For maximum efficiency, it can also be transparently combined 
 with the pull parser. The pull parser can for example be used 
 to jump between array entries and the serializer then reads 
 each single array entry.
Thing is, the schema is not always known perfectly. A typical case is JSON used for configuration, with diverse versions of the software adding new configuration capabilities or ignoring old ones.
Aug 12 2015
next sibling parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 12.08.2015 at 19:10, deadalnix wrote:
 On Wednesday, 12 August 2015 at 08:21:41 UTC, Sönke Ludwig wrote:
 Just to state explicitly what I mean: This strategy has the most
 efficient in-memory storage format and profits from all the static
 type checking niceties of the compiler. It also means that there is a
 documented schema in the code that can be used for reference by the
 developers and that will automatically be verified by the serializer,
 resulting in less and better checked code. So where applicable I claim
 that this is the best strategy to work with such data.

 For maximum efficiency, it can also be transparently combined with the
 pull parser. The pull parser can for example be used to jump between
 array entries and the serializer then reads each single array entry.
Thing is, the schema is not always known perfectly? Typical case is JSON used for configuration, and diverse version of the software adding new configurations capabilities, or ignoring old ones.
For example in the serialization framework of vibe.d you can have optional or Nullable fields, you can choose to ignore or error out on unknown fields, and you can have fields of type "Json" or associative arrays to match arbitrary structures. This usually gives enough flexibility, assuming that the program is just interested in fields that it knows about. Of course there are situations where you really just want to access the raw JSON structure, possibly because you are just interested in a small subset of the data. Both, the DOM or the pull parser based approaches, fit in there, based on convenience vs. performance considerations. But things like storing data as JSON in a database or implementing a JSON based protocol usually fit the schema based approach perfectly.
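A sketch of that approach with vibe.d's serialization framework (attribute and function names per vibe.data.serialization / vibe.data.json; the field set is invented for illustration):

```d
import vibe.data.json;                     // Json, deserializeJson
import vibe.data.serialization : optional;

struct Config
{
    string host;                   // required: deserialization errors if absent
    @optional ushort port = 8080;  // older config files may omit this
    @optional Json extra;          // catch-all for structure not known here
}

void demo()
{
    // `port` is simply left at its default when the field is missing.
    auto c = deserializeJson!Config(`{"host": "example.org"}`);
}
```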
Aug 12 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/12/2015 10:10 AM, deadalnix wrote:
 Thing is, the schema is not always known perfectly? Typical case is JSON used
 for configuration, and diverse version of the software adding new
configurations
 capabilities, or ignoring old ones.
Hah, I'd like to replace dmd.conf with a .json file.
Aug 12 2015
next sibling parent reply "CraigDillabaugh" <craig.dillabaugh gmail.com> writes:
On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 On 8/12/2015 10:10 AM, deadalnix wrote:
 Thing is, the schema is not always known perfectly? Typical 
 case is JSON used
 for configuration, and diverse version of the software adding 
 new configurations
 capabilities, or ignoring old ones.
Hah, I'd like to replace dmd.conf with a .json file.
Not .json! No configuration file should be in a format that doesn't support comments.
Aug 13 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support comments.
[ "comment" : "and you thought it couldn't have comments!" ]
Aug 13 2015
next sibling parent "Craig Dillabaugh" <craig.dillabaugh gmail.com> writes:
On Friday, 14 August 2015 at 00:16:47 UTC, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't 
 support comments.
[ "comment" : "and you thought it couldn't have comments!" ]
You are cheating :o) There do seem to be some ways to comment JSON files, but they all feel, and look, like hacks. I think something like YAML or even SDLang would be better. Anyway, at least you aren't proposing XML, so I won't complain too loudly.
Aug 13 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
There can't be two comments with the same key though. -- Andrei
Aug 14 2015
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- Andrei
Why not? I believe this is valid json:

{
    "comment" : "this is the first value",
    "value1" : 42,
    "comment" : "this is the second value",
    "value2" : 101
}

Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.

-Steve
Aug 14 2015
next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 13:10:53 UTC, Steven Schveighoffer 
wrote:
 On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't 
 support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- Andrei
Why not? I believe this is valid json:
http://tools.ietf.org/html/rfc7159 «The names within an object SHOULD be unique.» «An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates. JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.»
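That SHOULD-level looseness is easy to observe in practice. A quick illustration with Python's stdlib json module, which happens to be one of the "report the last name/value pair only" implementations, while its object_pairs_hook parameter exposes every pair:

```python
import json

# An object with a duplicate "comment" name:
text = '{ "comment" : "first", "value1" : 42, "comment" : "second", "value2" : 101 }'

# By default, the last name/value pair wins:
doc = json.loads(text)
assert doc["comment"] == "second"

# object_pairs_hook sees all pairs, duplicates included:
pairs = json.loads(text, object_pairs_hook=lambda p: p)
assert [k for k, _ in pairs] == ["comment", "value1", "comment", "value2"]
```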
Aug 14 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/15 9:10 AM, Steven Schveighoffer wrote:
 On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- Andrei
Why not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.
You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object -- Andrei
Aug 14 2015
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 13:30:44 UTC, Andrei Alexandrescu 
wrote:
 On 8/14/15 9:10 AM, Steven Schveighoffer wrote:
 On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't 
 support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- 
 Andrei
Why not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.
You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object -- Andrei
No, he is wrong, and even if he was right, he would still be wrong. JSON objects are unordered, so if you read then write you can get:

{
    "comment" : "this is the second value",
    "value1" : 42,
    "value2" : 101
}
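That write-back restructuring is easy to reproduce. A sketch with Python's stdlib json module, which keeps only the last duplicate on read, so a round trip silently drops the first comment:

```python
import json

text = ('{ "comment" : "this is the first value", "value1" : 42, '
        '"comment" : "this is the second value", "value2" : 101 }')

# The duplicate key collapses to its last value on read...
doc = json.loads(text)
assert doc["comment"] == "this is the second value"

# ...so a read/write round trip loses the first comment entirely.
assert json.dumps(doc) == '{"comment": "this is the second value", "value1": 42, "value2": 101}'
```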
Aug 14 2015
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/14/15 9:37 AM, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Friday, 14 August 2015 at 13:30:44 UTC, Andrei Alexandrescu wrote:
 On 8/14/15 9:10 AM, Steven Schveighoffer wrote:
 On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- Andrei
Why not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.
You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object
Yes, that's what I checked first :)
 No, he is wrong, and even if he was right, he would still be wrong. JSON
 objects are unordered so if you read then write you can get:

 {
      "comment" : "this is the second value",
      "value1" : 42,
      "value2" : 101
 }
Sure, but:

a) we aren't writing
b) comments are for the human reader, not for the program. Dmd should ignore the comments, and it doesn't matter the order.
c) it's not important; I think we all agree a format that has specific allowances for comments is better than json.

-Steve
Aug 14 2015
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 14:09:25 UTC, Steven Schveighoffer 
wrote:
 a) we aren't writing
 b) comments are for the human reader, not for the program. Dmd 
 should ignore the comments, and it doesn't matter the order.
 c) it's not important, I think we all agree a format that has 
 specific allowances for comments is better than json.
One should have a config file format for which there are standard libraries that preserve structure and comments. It is quite common to have tools that read and write config files.
Aug 14 2015
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/14/15 10:44 AM, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Friday, 14 August 2015 at 14:09:25 UTC, Steven Schveighoffer wrote:
 a) we aren't writing
 b) comments are for the human reader, not for the program. Dmd should
 ignore the comments, and it doesn't matter the order.
 c) it's not important, I think we all agree a format that has specific
 allowances for comments is better than json.
One should have a config file format for which there are standard libraries that preserve structure and comments. It is quite common to have tools that read and write config files.
And that would be possible here. JSON file format says nothing about how the data is stored in your library. But again, not important. -Steve
Aug 14 2015
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 15:11:41 UTC, Steven Schveighoffer 
wrote:
 And that would be possible here. JSON file format says nothing 
 about how the data is stored in your library. But again, not 
 important.
It isn't important since JSON is not too good as a config file format, but it is important when considering other formats. When you read a JSON file into Python or JavaScript and write it back, all dictionary objects will be restructured. For instance, when a tool reads a config file and removes attributes, it is desirable that the removed attributes are commented out. With JSON you would have to hack around it like this:

[ { "fieldname1" : "value1" }, { "fieldname2" : "value2" } ]

Which is ugly. I think it would be nice if all D tooling standardized on YAML and provided a convenient DOM for it. It is used quite a lot and editors have support for it.
Aug 14 2015
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Friday, 14 August 2015 at 15:29:12 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 14 August 2015 at 15:11:41 UTC, Steven Schveighoffer 
 wrote:
 And that would be possible here. JSON file format says nothing 
 about how the data is stored in your library. But again, not 
 important.
It isn't important since JSON is not too good as a config file format, but it is important when considering other formats. When you read a JSON file into Python or JavaScript and write it back, all dictionary objects will be restructured. For instance, when a tool reads a config file and removes attributes, it is desirable that the removed attributes are commented out. With JSON you would have to hack around it like this: [ { "fieldname1" : "value1" }, { "fieldname2" : "value2" } ] Which is ugly. I think it would be nice if all D tooling standardized on YAML and provided a convenient DOM for it. It is used quite a lot and editors have support for it.
It doesn't matter what you think of JSON. JSON is widely used and needed in the standard lib. PERIOD.
Aug 14 2015
next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 17:31:02 UTC, deadalnix wrote:
 JSON is widely used and needed in the standard lib. PERIOD.
The discussion was about suitability as a standard config file format for D, not about whether it should be in the standard lib. JSON, XML and YAML all belong in a standard lib.
Aug 14 2015
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/14/15 1:30 PM, deadalnix wrote:
 It doesn't matter what you think of JSON.

 JSON is widely used and needed in the standard lib. PERIOD.
I think you are missing that this sub-discussion is about using json to replace dmd configuration file. -Steve
Aug 14 2015
parent "rsw0x" <anonymous anonymous.com> writes:
On Friday, 14 August 2015 at 17:40:01 UTC, Steven Schveighoffer 
wrote:
 On 8/14/15 1:30 PM, deadalnix wrote:
 It doesn't matter what you think of JSON.

 JSON is widely used and needed in the standard lib. PERIOD.
I think you are missing that this sub-discussion is about using json to replace dmd configuration file. -Steve
dub uses sdlang, why not dmd?
Aug 14 2015
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 6:30 AM, Andrei Alexandrescu wrote:
 On 8/14/15 9:10 AM, Steven Schveighoffer wrote:
 Though, I would much rather see a better comment tag than "comment":.
 json isn't ideal for this.
You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object -- Andrei
When going for portability, it is not a good idea to emit duplicate keys because many json parsers fail on it. For our own json readers, such as reading a dmd.json file with our own parser, it should be fine.
Aug 14 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 5:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
Should be { }, not [ ]
 There can't be two comments with the same key though. -- Andrei
The Json spec doesn't say that - it doesn't specify any semantic meaning.
Aug 14 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 1:30 PM, Walter Bright wrote:
 On 8/14/2015 5:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
Should be { }, not [ ]
 There can't be two comments with the same key though. -- Andrei
The Json spec doesn't say that - it doesn't specify any semantic meaning.
That is, the ECMA 404 spec. There seems to be more than one JSON spec. www.ecma-international.org/.../files/.../ECMA-404.pdf
Aug 14 2015
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 08/14/2015 04:33 PM, Walter Bright wrote:
 That is, the ECMA 404 spec. There seems to be more than one JSON spec.

 www.ecma-international.org/.../files/.../ECMA-404.pdf
Amusingly, that "ECMA-404" link results in an actual HTTP 404.
Aug 21 2015
prev sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
Aug 13 2015
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Friday, 14 August 2015 at 00:18:39 UTC, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright 
 wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
Referring to TOML? https://github.com/toml-lang/toml
Aug 13 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.
Aug 13 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14-Aug-2015 03:48, Walter Bright wrote:
 On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.
YAML is (plus/minus braces) the same but supports comments and is increasingly popular for hierarchical configuration files. -- Dmitry Olshansky
Aug 13 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 11:54 PM, Dmitry Olshansky wrote:
 On 14-Aug-2015 03:48, Walter Bright wrote:
 On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.
YAML is (plus/minus braces) the same but supports comments and is increasingly popular for hierarchical configuration files.
Yes, but we (will) have a .json parser in Phobos.
Aug 14 2015
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;) -- /Jacob Carlborg
Aug 14 2015
next sibling parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 15/08/2015 12:40 a.m., Jacob Carlborg wrote:
 On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;)
Heyyy Sonke ;)
Aug 14 2015
prev sibling next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 12:40:32 UTC, Jacob Carlborg wrote:
 On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;)
I think kiith-sa has started on that: https://github.com/kiith-sa/D-YAML
Aug 14 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 5:40 AM, Jacob Carlborg wrote:
 On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;)
That's a good idea, but since dmd already emits json and requires incorporation of the json code, the fewer file formats it has to deal with, the better. Config files will work fine with json format.
Aug 14 2015
parent reply "suliman" <Evermind live.ru> writes:
On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
 On 8/14/2015 5:40 AM, Jacob Carlborg wrote:
 On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;)
That's a good idea, but since dmd already emits json and requires incorporation of the json code, the fewer file formats it has to deal with, the better. Config files will work fine with json format.
Walter, what should I do to comment out a string in the config for testing purposes? How can that be done with JSON? I really think that dmd should use the same format as dub.
Aug 14 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 9:58 PM, suliman wrote:
 On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
 Config files will work fine with json format.
 Walter, what should I do to comment out a string in the config for testing purposes? How can that be done with JSON?
{ "comment" : "this is a comment" }
 I really think that dmd should use same format as dub
json is a format that everybody understands, and dmd has json code already in it (as dmd generates json files)
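If a config reader treats "comment" keys this way, skipping them on load is a one-liner. A minimal sketch in Python (the key name and the config field shown are made up for illustration, not dmd's actual format):

```python
import json

def load_config(text):
    # Drop top-level "comment" entries; everything else is real configuration.
    return {k: v for k, v in json.loads(text).items() if k != "comment"}

cfg = load_config('{ "comment" : "testing flags", "DFLAGS" : "-I~/dmd2/src/phobos" }')
assert cfg == {"DFLAGS": "-I~/dmd2/src/phobos"}
```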
Aug 14 2015
next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Saturday, 15 August 2015 at 05:03:52 UTC, Walter Bright wrote:
 On 8/14/2015 9:58 PM, suliman wrote:
 On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
 Config files will work fine with json format.
 Walter, what should I do to comment out a string in the config for testing purposes? How can that be done with JSON?
{ "comment" : "this is a comment" }
 I really think that dmd should use same format as dub
json is a format that everybody understands, and dmd has json code already in it (as dmd generates json files)
And you end up with each D tool having their own config format… :-( http://www.json2yaml.com/
Aug 15 2015
prev sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 08/15/2015 01:03 AM, Walter Bright wrote:
 On 8/14/2015 9:58 PM, suliman wrote:
 On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
 Config files will work fine with json format.
Walter, what should I do to comment out a string in the config for testing purposes? How can that be done with JSON?
{ "comment" : "this is a comment" }
I'll take an "invented our own, rather stupid and limited, format" over comments that ugly any day. Seriously, with DUB, I've been using json for configuration files a lot lately, and dmd.conf is a way nicer config format. There's a very good reason DUB added an alternate format.
Aug 21 2015
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14-Aug-2015 11:04, Walter Bright wrote:
 On 8/13/2015 11:54 PM, Dmitry Olshansky wrote:
 On 14-Aug-2015 03:48, Walter Bright wrote:
 On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.
YAML is (plus/minus braces) the same but supports comments and is increasingly popular for hierarchical configuration files.
Yes, but we (will) have a .json parser in Phobos.
We actually have a YAML parser in the DUB repository, so that could be copied over to the compiler source in the interim. It doesn't have to be particularly fast; it just has to work reasonably well. -- Dmitry Olshansky
Aug 14 2015
prev sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:
 However, my goal when implementing this has never been to make 
 the DOM representation as efficient as possible. The simple 
 reason is that a DOM representation is inherently inefficient 
 when compared to operating on the structure using either the 
 pull parser or using a deserializer that directly converts into 
 a static D type. IMO these should be advertised instead of 
 trying to milk a dead cow (in terms of performance).
Maybe it is better to just focus on having a top-of-the-line parser and then let competing DOM implementations build on top of it. I'm personally only interested in structured JSON; I think most webapps use structured JSON informally.
Aug 12 2015
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 11.08.2015 um 19:08 schrieb Atila Neves:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila
I think we really need to have an informal pre-vote about the BigInt and DOM efficiency vs. functionality issues. Basically there are three options for each (plus a fourth one specific to BigInt):

1. Keep them: May have an impact on compile time for big DOMs (run time/memory consumption wouldn't be affected if a pointer to BigInt is stored), but provides an out-of-the-box experience for a broad set of applications.

2. Remove them: Results in a slim and clean API that is fast (to run/compile), but also one that will be less useful for certain applications.

3. Make them CT configurable: Best of both worlds in terms of speed, at the cost of a more complex API.

4. Use a string representation instead of BigInt: This has its own set of issues, but would also enable some special use cases [1] [2] ([2] is also solved by BigInt/Decimal support, though).

I'd also like to postpone the main vote, if there are no objections, until the question of using a general enum based alternative to Algebraic is answered. I've published an initial candidate for this now [3].

These were, AFAICS, the only major open issues (a decision for an opt() variant would be nice, but fortunately that's not a fundamental decision in any way). There is also the topic of avoiding any redundancy in symbol names, which I don't agree with, but I would of course change it if the inclusion depends on that.

[1]: https://github.com/rejectedsoftware/vibe.d/issues/431
[2]: http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/10098/
[3]: http://code.dlang.org/packages/taggedalgebraic
Aug 13 2015
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
 These were, AFAICS, the only major open issues (a decision for an opt() variant
 would be nice, but fortunately that's not a fundamental decision in any way).
1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last. 2. Why are integers acceptable as lexer input? The spec specifies Unicode. 3. Why are there 4 functions that do the same thing? http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html After all, there already is a http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
Aug 13 2015
parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 14.08.2015 um 02:26 schrieb Walter Bright:
 On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
 These were, AFAICS, the only major open issues (a decision for an
 opt() variant
 would be nice, but fortunately that's not a fundamental decision in
 any way).
1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last.
Hm, it *is* the first function argument, just the last template argument.
 2. Why are integers acceptable as lexer input? The spec specifies Unicode.
In this case, the lexer will perform on-the-fly UTF validation of the input. It can do so more efficiently than first validating the input using a wrapper range, because it has to check the value of most incoming code units anyway.
 3. Why are there 4 functions that do the same thing?

 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html

 After all, there already is a
 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
There are two classes of functions that are not covered by GeneratorOptions: writing to a stream or returning a string. But you are right that pretty printing should be controlled by GeneratorOptions. I'll fix that. The suggestion to use pretty printing by default also sounds good.
Aug 13 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 11:52 PM, Sönke Ludwig wrote:
 Am 14.08.2015 um 02:26 schrieb Walter Bright:
 On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
 These were, AFAICS, the only major open issues (a decision for an
 opt() variant
 would be nice, but fortunately that's not a fundamental decision in
 any way).
1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last.
Hm, it *is* the first function argument, just the last template argument.
Ok, my mistake. I didn't look at the others. I don't know what 'isStringInputRange' is. Whatever it is, it should be a 'range of char'.
 2. Why are integers acceptable as lexer input? The spec specifies Unicode.
In this case, the lexer will perform on-the-fly UTF validation of the input. It can do so more efficiently than first validating the input using a wrapper range, because it has to check the value of most incoming code units anyway.
There is no reason to validate UTF-8 input. The only place where non-ASCII code units can even legally appear is inside strings, and there they can just be copied verbatim while looking for the end of the string.
 3. Why are there 4 functions that do the same thing?

 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html

 After all, there already is a
 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
There are two classes of functions that are not covered by GeneratorOptions: writing to a stream or returning a string.
Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .array
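Transposing Walter's suggestion into Python terms: the generator lazily yields text fragments, and the caller chooses the sink, with "".join playing the role of .array. The helper name is hypothetical:

```python
def json_int_array(values):
    """Lazily yield JSON text fragments for a list of ints (illustrative only)."""
    yield "["
    for i, v in enumerate(values):
        if i:
            yield ","
        yield str(v)
    yield "]"

# The caller decides how to materialize: join into a string, write to a file, etc.
assert "".join(json_int_array([1, 2, 3])) == "[1,2,3]"
```

Because the generator never allocates the full result, the same code serves both streaming and string-building callers.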
 But you are right that pretty
 printing should be controlled by GeneratorOptions. I'll fix that. The
suggestion
 to use pretty printing by default also sounds good.
Thanks
Aug 14 2015
parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 14.08.2015 um 10:17 schrieb Walter Bright:
 On 8/13/2015 11:52 PM, Sönke Ludwig wrote:
 Am 14.08.2015 um 02:26 schrieb Walter Bright:
 On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
 These were, AFAICS, the only major open issues (a decision for an
 opt() variant
 would be nice, but fortunately that's not a fundamental decision in
 any way).
1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last.
Hm, it *is* the first function argument, just the last template argument.
Ok, my mistake. I didn't look at the others. I don't know what 'isStringInputRange' is. Whatever it is, it should be a 'range of char'.
I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?
 2. Why are integers acceptable as lexer input? The spec specifies
 Unicode.
In this case, the lexer will perform on-the-fly UTF validation of the input. It can do so more efficiently than first validating the input using a wrapper range, because it has to check the value of most incoming code units anyway.
There is no reason to validate UTF-8 input. The only place where non-ASCII code units can even legally appear is inside strings, and there they can just be copied verbatim while looking for the end of the string.
The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.
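The same split shows up in other languages. As a hedged analogy in Python (not the proposed D API): a str is valid Unicode by construction, while bytes from an unverified source must pass UTF-8 validation before use:

```python
import json

raw = b'{ "key" : "\xc3\xa9" }'        # bytes from an unverified source
doc = json.loads(raw.decode("utf-8"))  # decode() validates the UTF-8 as a side effect
assert doc["key"] == "\u00e9"

# Invalid UTF-8 is rejected up front, before any JSON parsing happens:
try:
    b'{ "key" : "\xff" }'.decode("utf-8")
except UnicodeDecodeError:
    pass
else:
    raise AssertionError("expected validation failure")
```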
 3. Why are there 4 functions that do the same thing?

 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html

 After all, there already is a
 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
There are two classes of functions that are not covered by GeneratorOptions: writing to a stream or returning a string.
Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .array
Convenience for one. The lack of number to input range conversion functions is another concern. I'm not really keen to implement an input range style floating-point to string conversion routine just for this module. Finally, I'm a little worried about performance. The output range based approach can keep a lot of state implicitly using the program counter register. But an input range would explicitly have to keep track of the current JSON element, as well as the current character/state within that element (and possibly one level deeper, for example for escape sequences). This means that it will require either multiple branches or indirection for each popFront().
Aug 15 2015
next sibling parent reply "Suliman" <evermind live.ru> writes:
I talked with a few people and they said that they prefer the 
current vibe.d JSON implementation. What's wrong with it? Why 
not stay with the old one? It looks much easier than the new 
one...

IMHO the API of the new one is much harder.
Aug 15 2015
parent "Laeeth Isharc" <spamnolaeeth nospamlaeeth.com> writes:
On Saturday, 15 August 2015 at 17:07:36 UTC, Suliman wrote:
 I talked with a few people and they said that they prefer the 
 current vibe.d JSON implementation. What's wrong with it? Why 
 not stay with the old one? It looks much easier than the new 
 one...

 IMHO the API of the new one is much harder.
New stream parser is fast! (See prior thread on benchmarks).
Aug 15 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
 I don't know what 'isStringInputRange' is. Whatever it is, it should be
 a 'range of char'.
I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?
That's right, there isn't one. But I use:

    if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))

I'm not a fan of more names for trivia; the deluge of names has its own costs.
 There is no reason to validate UTF-8 input. The only place where
 non-ASCII code units can even legally appear is inside strings, and
 there they can just be copied verbatim while looking for the end of the
 string.
The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.
The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it. There are many validation algorithms in Phobos one can tack on - having two implementations of every algorithm, one with an embedded reinvented validation and one without - is too much. The general idea with algorithms is that they do not combine things, but they enable composition.
 Why do both? Always return an input range. If the user wants a string,
 he can pipe the input range to a string generator, such as .array
Convenience for one.
Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorial explosion. The other problem, of course, is that returning a string means the algorithm has to decide how to allocate that string. As much as possible, algorithms should not be making allocation decisions.
 The lack of number to input range conversion functions is
 another concern. I'm not really keen to implement an input range style
 floating-point to string conversion routine just for this module.
Not sure what you mean. Phobos needs such routines anyway, and you still have to do something about floating point.
 Finally, I'm a little worried about performance. The output range based
approach
 can keep a lot of state implicitly using the program counter register. But an
 input range would explicitly have to keep track of the current JSON element, as
 well as the current character/state within that element (and possibly one level
 deeper, for example for escape sequences). This means that it will require
 either multiple branches or indirection for each popFront().
Often this is made up for by not needing to allocate storage. Also, that state is in the cached "hot zone" on top of the stack, which is much faster to access than a cold uninitialized array. I share your concern with performance, and I had very good results with Warp by keeping all the state on the stack in this manner.
Aug 15 2015
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 16-Aug-2015 03:50, Walter Bright wrote:
 On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
 There is no reason to validate UTF-8 input. The only place where
 non-ASCII code units can even legally appear is inside strings, and
 there they can just be copied verbatim while looking for the end of the
 string.
The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.
The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it.
Aye.
 There are many validation algorithms in Phobos one can tack on - having
 two implementations of every algorithm, one with an embedded reinvented
 validation and one without - is too much.
Actually there are next to none. `validate` that throws on failed validation is a misnomer.
 The general idea with algorithms is that they do not combine things, but
 they enable composition.
At the lower level such as tokenizers combining a couple of simple steps together makes sense because it makes things run faster. It usually eliminates the need for temporary result that must be digestible by the next range. For instance "combining" decoding and character classification one may side-step generating the codepoint value itself (because now it doesn't have to produce it for the top-level algorithm). -- Dmitry Olshansky
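A toy illustration of the fused step Dmitry describes, assuming an ASCII fast path; the multi-byte branch is elided and `asciiAlphaFastPath` is a made-up name:

```d
import std.ascii : isAlpha;

// Toy fused "decode + classify" step with an ASCII fast path: for code
// units below 0x80 no codepoint needs to be materialized at all. The
// multi-byte branch is elided for the sake of the sketch.
bool asciiAlphaFastPath(const(char)[] s, size_t i)
{
    immutable c = s[i];
    return c < 0x80 && isAlpha(c);
}

void main()
{
    assert(asciiAlphaFastPath("abc", 0));
    assert(!asciiAlphaFastPath("1bc", 0));
}
```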
Aug 15 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/15/2015 11:52 PM, Dmitry Olshansky wrote:
 For instance "combining" decoding and character classification one may
side-step
 generating the codepoint value itself (because now it doesn't have to produce
it
 for the top-level algorithm).
Perhaps, but I wouldn't be convinced without benchmarks to prove it on a case-by-case basis. But it's moot, as json lexing never needs to decode.
Aug 16 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 16-Aug-2015 11:30, Walter Bright wrote:
 On 8/15/2015 11:52 PM, Dmitry Olshansky wrote:
 For instance "combining" decoding and character classification one may
 side-step
 generating the codepoint value itself (because now it doesn't have to
 produce it
 for the top-level algorithm).
Perhaps, but I wouldn't be convinced without benchmarks to prove it on a case-by-case basis.
About x2 faster than decode + check-if-alphabetic on my stuff: https://github.com/DmitryOlshansky/gsoc-bench-2012 I haven't updated it in a while. There are nice bargraphs for decoding versions by David comparing DMD vs LDC vs GDC: Page 15 at http://dconf.org/2013/talks/nadlinger.pdf
 But it's moot, as json lexing never needs to decode.
Agreed. -- Dmitry Olshansky
Aug 16 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/16/2015 3:39 AM, Dmitry Olshansky wrote:
 About x2 faster then decode + check-if-alphabetic on my stuff:

 https://github.com/DmitryOlshansky/gsoc-bench-2012

 I haven't updated it in a while. There are nice bargraphs for decoding versions
 by David comparing DMD vs LDC vs GDC:

 Page 15 at http://dconf.org/2013/talks/nadlinger.pdf
Thank you.
Aug 16 2015
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 16.08.2015 um 02:50 schrieb Walter Bright:
 On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
 I don't know what 'isStringInputRange' is. Whatever it is, it should be
 a 'range of char'.
I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?
That's right, there isn't one. But I use: if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char)) I'm not a fan of more names for trivia, the deluge of names has its own costs.
Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R)))`. It's just used in a number of places and is quite a bit more verbose (twice as long), and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.
 There is no reason to validate UTF-8 input. The only place where
 non-ASCII code units can even legally appear is inside strings, and
 there they can just be copied verbatim while looking for the end of the
 string.
The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.
The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it.
Yes, and it won't do that if a char range is passed in. If the integral range path gets removed there are basically two possibilities left, perform the validation up-front (slower), or risk UTF exceptions in unrelated parts of the code base. I don't see why we shouldn't take the opportunity for a full and fast validation here. But I'll relay this to Andrei, it was his idea originally.
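The up-front option can be expressed with Phobos' existing std.utf.validate; a small sketch where `validated` is a hypothetical helper, not part of the reviewed package:

```d
import std.utf : validate, UTFException;
import std.exception : assertThrown;

// Sketch of the up-front option: untrusted ubyte[] input is only
// reinterpreted as a string after std.utf.validate accepts it.
string validated(immutable(ubyte)[] raw)
{
    auto s = cast(string) raw;
    validate(s);               // throws UTFException on invalid UTF-8
    return s;
}

void main()
{
    assert(validated(cast(immutable(ubyte)[]) `{"a":1}`) == `{"a":1}`);
    immutable(ubyte)[] bad = [0xFF];      // invalid UTF-8 lead byte
    assertThrown!UTFException(validated(bad));
}
```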
 There are many validation algorithms in Phobos one can tack on - having
 two implementations of every algorithm, one with an embedded reinvented
 validation and one without - is too much.
There is nothing reinvented here. It simply implicitly validates all non-string parts of a JSON document and uses validate() for the parts of JSON strings that can contain Unicode characters.
 The general idea with algorithms is that they do not combine things, but
 they enable composition.
It's just that there is no way to achieve the same performance using composition in this case.
 Why do both? Always return an input range. If the user wants a string,
 he can pipe the input range to a string generator, such as .array
Convenience for one.
Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorial explosion.
This may be a factor of two, but not a combinatorial explosion.
 The other problem, of course, is that returning a string means the
 algorithm has to decide how to allocate that string. As much as
 possible, algorithms should not be making allocation decisions.
Granted, the fact that format() and to!() support input ranges (I didn't notice that until now) makes the issue less important. But without those, it would basically mean that almost all places that generate JSON strings would have to import std.array and append .array. Nothing particularly bad if viewed in isolation, but makes the language appear a lot less clean/more verbose if it occurs often. It's also a stumbling block for language newcomers.
 The lack of number to input range conversion functions is
 another concern. I'm not really keen to implement an input range style
 floating-point to string conversion routine just for this module.
Not sure what you mean. Phobos needs such routines anyway, and you still have to do something about floating point.
There are output range and allocation based float->string conversions available, but no input range based one. But well, using an internal buffer together with formattedWrite would probably be a viable workaround...
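The workaround might look like this, using std.format.sformat (formattedWrite into a caller-provided buffer); `floatToChars` is a made-up name:

```d
import std.format : sformat;

// Possible workaround: format into a caller-provided stack buffer via
// sformat; the result is a slice of buf, with no GC allocation.
const(char)[] floatToChars(char[] buf, double v)
{
    return sformat(buf, "%.17g", v);
}

void main()
{
    char[32] buf;
    assert(floatToChars(buf[], 0.5) == "0.5");
}
```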
 Finally, I'm a little worried about performance. The output range
 based approach
 can keep a lot of state implicitly using the program counter register.
 But an
 input range would explicitly have to keep track of the current JSON
 element, as
 well as the current character/state within that element (and possibly
 one level
 deeper, for example for escape sequences). This means that it will
 require
 either multiple branches or indirection for each popFront().
Often this is made up for by not needing to allocate storage. Also, that state is in the cached "hot zone" on top of the stack, which is much faster to access than a cold uninitialized array.
Branch misprediction alone will most probably be problematic. But I think this can be made fast enough anyway by making the input range partially eager and serving chunks of strings at a time. That way, the additional branching only has to happen once per chunk. I'll have a look.
 I share your concern with performance, and I had very good results with
 Warp by keeping all the state on the stack in this manner.
Aug 16 2015
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2015-08-16 14:34, Sönke Ludwig wrote:

 Good, I'll use `if (isInputRange!R &&
 (isSomeChar!(ElementEncodingType!R) ||
 isIntegral!(ElementEncodingType!R))`. It's just used in number of places
 and quite a bit more verbose (twice as long) and I guess a large number
 of algorithms in Phobos accept char ranges, so that may actually warrant
 a name in this case.
I agree. Signatures like this are what's making std.algorithm look more complicated than it is. -- /Jacob Carlborg
Aug 16 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/16/2015 5:34 AM, Sönke Ludwig wrote:
 Am 16.08.2015 um 02:50 schrieb Walter Bright:
      if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))

 I'm not a fan of more names for trivia, the deluge of names has its own
 costs.
Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R)))`. It's just used in a number of places and is quite a bit more verbose (twice as long), and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.
Except that there is no reason to support wchar, dchar, int, ubyte, or anything other than char. The idea is not to support something just because you can, but there should be an identifiable, real use case for it first. Has anyone ever seen Json data as ulongs? I haven't either.
 The json parser will work fine without doing any validation at all. I've
 been implementing string handling code in Phobos with the idea of doing
 validation only if the algorithm requires it, and only for those parts
 that require it.
Yes, and it won't do that if a char range is passed in. If the integral range path gets removed there are basically two possibilities left, perform the validation up-front (slower), or risk UTF exceptions in unrelated parts of the code base. I don't see why we shouldn't take the opportunity for a full and fast validation here. But I'll relay this to Andrei, it was his idea originally.
That argument could be used to justify validation in every single algorithm that deals with strings.
 Why do both? Always return an input range. If the user wants a string,
 he can pipe the input range to a string generator, such as .array
Convenience for one.
Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorical explosion.
This may be a factor of two, but not a combinatorial explosion.
We're already up to validate or not, to string or not, i.e. 4 combinations.
 The other problem, of course, is that returning a string means the
 algorithm has to decide how to allocate that string. As much as
 possible, algorithms should not be making allocation decisions.
Granted, the fact that format() and to!() support input ranges (I didn't notice that until now) makes the issue less important. But without those, it would basically mean that almost all places that generate JSON strings would have to import std.array and append .array. Nothing particularly bad if viewed in isolation, but makes the language appear a lot less clean/more verbose if it occurs often. It's also a stumbling block for language newcomers.
This has been argued before, and the problem is it applies to EVERY algorithm in Phobos, and winds up with a doubling of the number of functions to deal with it. I do not view this as clean. D is going to be built around ranges as a fundamental way of coding. Users will need to learn something about them. Appending .array is not a big hill to climb.
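The composition in question is indeed short; a minimal example of a lazy range materialized with .array (`upperized` is a made-up helper for illustration):

```d
import std.array : array;
import std.algorithm.iteration : map;
import std.ascii : toUpper;

// The lazy pipeline allocates nothing; .array materializes the result
// only when an actual array is wanted.
dstring upperized(string s)
{
    return s.map!toUpper.array.idup;
}

void main()
{
    assert(upperized("json") == "JSON"d);
}
```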
 There are output range and allocation based float->string conversions
available,
 but no input range based one. But well, using an internal buffer together with
 formattedWrite would probably be a viable workaround...
I plan to fix that, so using a workaround in the meantime is appropriate.
Aug 16 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 17.08.2015 um 00:03 schrieb Walter Bright:
 On 8/16/2015 5:34 AM, Sönke Ludwig wrote:
 Am 16.08.2015 um 02:50 schrieb Walter Bright:
      if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))

 I'm not a fan of more names for trivia, the deluge of names has its own
 costs.
Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R)))`. It's just used in a number of places and is quite a bit more verbose (twice as long), and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.
Except that there is no reason to support wchar, dchar, int, ubyte, or anything other than char. The idea is not to support something just because you can, but there should be an identifiable, real use case for it first. Has anyone ever seen Json data as ulongs? I haven't either.
But you have seen ubyte[] when reading something from a file or from a network stream. But since Andrei now also wants to remove it, so be it. I'll answer some of the other points anyway:
 The json parser will work fine without doing any validation at all. I've
 been implementing string handling code in Phobos with the idea of doing
 validation only if the algorithm requires it, and only for those parts
 that require it.
Yes, and it won't do that if a char range is passed in. If the integral range path gets removed there are basically two possibilities left, perform the validation up-front (slower), or risk UTF exceptions in unrelated parts of the code base. I don't see why we shouldn't take the opportunity for a full and fast validation here. But I'll relay this to Andrei, it was his idea originally.
That argument could be used to justify validation in every single algorithm that deals with strings.
Not really for all, but indeed there are more where this could apply in theory. However, JSON is used frequently in situations where parsing speed, or performance in general, is often crucial (e.g. web services), which makes it stand out due to practical concerns. Others, such as an XML parser would apply, too, but probably none of the generic string manipulation functions.
 Why do both? Always return an input range. If the user wants a string,
 he can pipe the input range to a string generator, such as .array
Convenience for one.
Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorical explosion.
This may be a factor of two, but not a combinatorial explosion.
We're already up to validate or not, to string or not, i.e. 4 combinations.
Validation is part of the lexer and not the generator. There is no combinatorial relation between the two. Validation is also just a template parameter, so there are no two combinations in terms of implementation either. There is just a "static if" statement somewhere to decide if validate() should be called or not.
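A minimal sketch of that `static if` arrangement (the names are made up; the real lexer's parameterization is of course more involved):

```d
import std.utf : validate;

// The "static if" arrangement described above, in miniature: one
// implementation, validation compiled in or out by a template flag.
string processString(bool doValidate)(string s)
{
    static if (doValidate)
        validate(s);           // only present when requested
    return s;
}

void main()
{
    assert(processString!true("[1]") == "[1]");
    assert(processString!false("[1]") == "[1]");
}
```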
 The other problem, of course, is that returning a string means the
 algorithm has to decide how to allocate that string. As much as
 possible, algorithms should not be making allocation decisions.
Granted, the fact that format() and to!() support input ranges (I didn't notice that until now) makes the issue less important. But without those, it would basically mean that almost all places that generate JSON strings would have to import std.array and append .array. Nothing particularly bad if viewed in isolation, but makes the language appear a lot less clean/more verbose if it occurs often. It's also a stumbling block for language newcomers.
This has been argued before, and the problem is it applies to EVERY algorithm in Phobos, and winds up with a doubling of the number of functions to deal with it. I do not view this as clean. D is going to be built around ranges as a fundamental way of coding. Users will need to learn something about them. Appending .array is not a big hill to climb.
It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (a language newcomer who wants to work with JSON). It's also still an additional thing to remember, type and read, making it an additional piece of cognitive load, even for developers that are fluent with this. Have many such pieces and they add up to the point where productivity is brought to its knees.

I already personally find it quite annoying constantly having to import std.range, std.array and std.algorithm to just use some small piece of functionality in std.algorithm. It's also often not clear in which of the three modules/packages a certain function is. We need to find a better balance here if D is to keep its appeal as a language where you stay in "the zone" (a.k.a. flow), which always has been a big thing for me.
Aug 22 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/22/2015 5:21 AM, Sönke Ludwig wrote:
 Am 17.08.2015 um 00:03 schrieb Walter Bright:
 D is going to be built around ranges as a fundamental way of coding.
 Users will need to learn something about them. Appending .array is not a
 big hill to climb.
It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (language newcomer that wants to work with JSON).
Not if the illuminating example in the Json API description does it that way. Newbies will tend to copy/pasta the examples as a starting point.
 It's also still an additional thing to
 remember, type and read, making it an additional piece of cognitive load, even
 for developers that are fluent with this. Have many of such pieces and they add
 up to a point where productivity goes to its knees.
Having composable components behaving in predictable ways is not an additional piece of cognitive load, it is less of one.
 I already personally find it quite annoying constantly having to import
 std.range, std.array and std.algorithm to just use some small piece of
 functionality in std.algorithm. It's also often not clear in which of the three
 modules/packages a certain function is. We need to find a better balance here
if
 D is to keep its appeal as a language where you stay in "the zone"  (a.k.a
 flow), which always has been a big thing for me.
If I buy a toy car, I get a toy car. If I get a lego set, I can build any toy with it. I believe the composable component approach will make Phobos smaller and much more flexible and useful, as opposed to monolithic APIs.
Aug 24 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 24.08.2015 um 22:25 schrieb Walter Bright:
 On 8/22/2015 5:21 AM, Sönke Ludwig wrote:
 Am 17.08.2015 um 00:03 schrieb Walter Bright:
 D is going to be built around ranges as a fundamental way of coding.
 Users will need to learn something about them. Appending .array is not a
 big hill to climb.
It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (language newcomer that wants to work with JSON).
Not if the illuminating example in the Json API description does it that way. Newbies will tend to copy/pasta the examples as a starting point.
That's true, but then they will possibly have to understand the inner workings soon after, for example when something goes wrong and they get cryptic error messages. It makes the learning curve steeper, even if some of that can be mitigated with good documentation/tutorials.
 It's also still an additional thing to
 remember, type and read, making it an additional piece of cognitive
 load, even
 for developers that are fluent with this. Have many of such pieces and
 they add
 up to a point where productivity goes to its knees.
Having composable components behaving in predictable ways is not an additional piece of cognitive load, it is less of one.
Having to write additional things that are not part of the problem (".array", "import std.array : array;") is cognitive load and having to read such things is cognitive and visual load. Also, having to remember where those additional components reside is cognitive load, at least if they are not used really frequently. This has of course nothing to do with predictable behavior of the components, but with the API/language boundary between ranges and arrays.
 I already personally find it quite annoying constantly having to import
 std.range, std.array and std.algorithm to just use some small piece of
 functionality in std.algorithm. It's also often not clear in which of
 the three
 modules/packages a certain function is. We need to find a better
 balance here if
 D is to keep its appeal as a language where you stay in "the zone"
 (a.k.a
 flow), which always has been a big thing for me.
If I buy a toy car, I get a toy car. If I get a lego set, I can build any toy with it. I believe the composable component approach will make Phobos smaller and much more flexible and useful, as opposed to monolithic APIs.
I'm not arguing against a range based approach! It's just that such an approach ideally shouldn't come at the expense of simplicity and relevance. If I have a string variable and I want to store the upper case version of another string, the direct mental translation is "dst = toUpper(src);" - and not "dst = toUpper(src).array;". It reminds me of the unwrap() calls in Rust code. They can produce a huge amount of visual noise for dealing with errors, whereas an exception based approach lets you focus on the actual problem. Of course exceptions have their own issues, but that's a different topic. Keeping toString in addition to toChars would be enough to avoid the issue here. A possible alternative would be to let the proposed JSON text input range have an "alias this" to "std.array.array(this)". Then it wouldn't even require a rename of toString to toChars to get both worlds.
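The `alias this` idea can be sketched on a toy char range (`CharRange` and `toArray` are made-up names; the real proposal would apply this to the JSON text input range):

```d
import std.array : array;

// A toy char-producing range with an implicit conversion to string via
// std.array.array, using "alias this" as described above.
struct CharRange
{
    string data;
    bool empty() const { return data.length == 0; }
    char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }

    // range implicitly converts to string where a string is expected
    string toArray() { return this.array.idup; }
    alias toArray this;
}

void main()
{
    string s = CharRange("{}");   // conversion goes through toArray
    assert(s == "{}");
}
```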
Aug 24 2015
parent reply "Sebastiaan Koppe" <mail skoppe.eu> writes:
On Tuesday, 25 August 2015 at 06:56:23 UTC, Sönke Ludwig wrote:
 If I have a string variable and I want to store the upper case 
 version of another string, the direct mental translation is 
 "dst = toUpper(src);" - and not "dst = toUpper(src).array;".
One can also say the problem is that you have a string variable.
Aug 25 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 25.08.2015 um 14:14 schrieb Sebastiaan Koppe:
 On Tuesday, 25 August 2015 at 06:56:23 UTC, Sönke Ludwig wrote:
 If I have a string variable and I want to store the upper case version
 of another string, the direct mental translation is "dst =
 toUpper(src);" - and not "dst = toUpper(src).array;".
One can also say the problem is that you have a string variable.
But ranges are not always the right solution:

- For fields or setter properties, the exact type of the range is fixed, which is generally impractical
- If the underlying data of a range is stored on the stack or any other transient storage, it cannot be stored on the heap
- If the range is only an input range, it must be copied to an array anyway if it's going to be read multiple times
- Ranges cannot be immutable (no safe slicing or passing between threads)
- If for some reason template land needs to be left, ranges have trouble following (although there are wrapper classes available)
- Most existing APIs are string based
- Re-evaluating a computed range each time a variable is read is usually wasteful

There are probably a bunch of other problems that simply make ranges not the best answer in every situation.
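One of the listed trade-offs, repeated reads, in miniature (`materializedEvens` is a made-up helper; filter over an array is actually a forward range, but the point stands for true one-shot input ranges, where .array is the only way to re-read):

```d
import std.algorithm.iteration : filter;
import std.array : array;

// Materialize a lazy pipeline once, then read it any number of times.
int[] materializedEvens(int[] xs)
{
    auto r = xs.filter!(x => x % 2 == 0);  // lazy, nothing copied yet
    return r.array;                        // copied exactly once
}

void main()
{
    auto stored = materializedEvens([1, 2, 3, 4]);
    assert(stored == [2, 4]);
    assert(stored == [2, 4]);   // repeated reads are now free
}
```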
Aug 25 2015
prev sibling parent "Jay Norwood" <jayn prismnet.com> writes:
On Thursday, 13 August 2015 at 10:51:47 UTC, Sönke Ludwig wrote:
 I think we really need to have an informal pre-vote about the 
 BigInt and DOM efficiency vs. functionality issues. Basically 
 there are three options for each:

 1. Keep them: May have an impact on compile time for big DOMs 
 (run time/memory consumption wouldn't be affected if a pointer 
 to BigInt is stored). But provides an out-of-the-box experience 
 for a broad set of applications.

 2. Remove them: Results in a slim and clean API that is fast 
 (to run/compile), but also one that will be less useful for 
 certain applications.

 3. Make them CT configurable: Best of both worlds in terms of 
 speed, at the cost of a more complex API.
I like this #3. If I understand it correctly, this would provide the template to extend the supported data types, correct? However, I also think that you shouldn't try to make the basic storage format handle everything that might be more appropriately handled by a meta-model. Are the range operations compatible with the std.parallelism library?
Aug 15 2015
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/28/15 10:07 AM, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I'll submit a review in short order, but thought this might be of use in performance comparisons: https://www.reddit.com/r/programming/comments/3hbt4w/using_json_in_a_low_latency_environment/ -- Andrei
Aug 17 2015
prev sibling next sibling parent Sönke Ludwig <sludwig outerproduct.org> writes:
I've added some changes in the latest version (docs updated):

- Switched to TaggedAlgebraic with full static operator forwarding
- Removed toPrettyJSON (now the default), added GeneratorOptions.compact
- The bigInt field in JSONValue is now stored as a pointer
- Removed is(String/Integral)InputRange helper functions
- Added opt2() [1] as an alternative candidate to opt() [2] with a more 
natural syntax

The possible optimization to store the type tag in unused parts of the 
data fields could be implemented later directly in TaggedAlgebraic.

[1]: http://s-ludwig.github.io/std_data_json/stdx/data/json/value/opt2.html
[2]: http://s-ludwig.github.io/std_data_json/stdx/data/json/value/opt.html
Aug 17 2015
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/28/15 10:07 AM, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I'll preface my review with a general comment. This API comes at an interesting juncture; we're striving as much as possible for interfaces that abstract away lifetime management, so they can be used comfortably with GC, or at high performance (and hopefully no or only marginal loss of comfort) with client-chosen lifetime management policies. The JSON API is a great test bed for our emerging recommended "push lifetime up" idioms; it's not too complicated yet it's not trivial either, and has great usefulness. With this, here are some points:

* All new stuff should go in std.experimental. I assume "stdx" would change to that, should this work be merged.

* On the face of it, dedicating 6 modules to such a small specification as JSON seems excessive. I'm thinking one module here. (As a simple point: who would ever want to import only foundation, which in turn has one exception type and one location type in it?) I think it shouldn't be up for debate that we must aim for simple and clean APIs.

* stdx.data.json.generator: I think the API for converting in-memory JSON values to strings needs to be redone, as follows:

- JSONValue should offer a byToken range, which offers the contents of the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '[' token followed by three numeric tokens with the respective values, followed by the ']' token.

- On top of byToken it's immediate to implement a method (say toJSON or toString) that accepts an output range of characters and formatting options.

- On top of the method above with output range, implementing a toString overload that returns a string for convenience is a two-liner. However, it shouldn't return a "string"; Phobos APIs should avoid "hardcoding" the string type. Instead, it should return a user-chosen string type (including reference counting strings).

- While at it, make prettification a flag in the options, not its own part of the function name.
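The layering Andrei proposes can be modeled in miniature. Token, byToken and toJSON below are stand-ins, not the reviewed API; this toy handles only integer arrays and no formatting options:

```d
import std.array : appender;
import std.conv : to;

// Miniature model of the byToken + output-range layering.
struct Token { string text; }

Token[] byToken(int[] arr)   // stand-in for JSONValue.byToken
{
    auto toks = appender!(Token[]);
    toks.put(Token("["));
    foreach (i, v; arr)
    {
        if (i) toks.put(Token(","));
        toks.put(Token(v.to!string));
    }
    toks.put(Token("]"));
    return toks.data;
}

// On top of byToken: write into any output range of characters
// (by ref so Appender state is shared with the caller).
void toJSON(Out)(ref Out sink, int[] arr)
{
    foreach (t; byToken(arr))
        sink.put(t.text);
}

void main()
{
    auto sink = appender!string;
    toJSON(sink, [1, 2, 3]);
    assert(sink.data == "[1,2,3]");
}
```

The string-returning convenience overload is then indeed a two-liner over this.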
* stdx.data.json.lexer:

- I assume the idea was to accept ranges of integrals to mean "there's some raw input from a file". This seems to be a bit overdone, e.g. there's no need to accept signed integers or 64-bit integers. I suggest just going with the three character types.

- I see tokenization accepts input ranges. This forces the tokenizer to store its own copy of things, which is no doubt the business of appenderFactory. Here the departure of the current approach from what I think should become canonical Phobos APIs deepens for multiple reasons. First, appenderFactory does allow customization of the append operation (nice), but that's not enough to allow the user to customize the lifetime of the created strings, which is usually reflected in the string type itself. So the lexing method should be parameterized by the string type used. (By default string (as is now) should be fine.) Therefore, instead of customizing the append method, just customize the string type used in the token.

- The lexer should internally take optimization opportunities, e.g. if the string type is "string" and the lexed type is also "string", great, just use slices of the input instead of appending them to the tokens.

- As a consequence the JSONToken type also needs to be parameterized by the type of the string that holds the payload. I understand this is a complication compared to the current approach, but I don't see an out. In the grand scheme of things it seems a necessary evil: tokens may or may not need a means to manage the lifetime of their payload, and that's determined by the type of the payload. Hopefully simplifications in other areas of the API would offset this.

- At token level there should be no number parsing. Just store the payload with the token and leave it for later. Very often numbers are converted without there being a need, and the process is costly. This also nicely sidesteps the entire matter of bigints, floating point etc. at this level.
- Also, at token level strings should be stored with escapes unresolved. If the user wants a string with the escapes resolved, a lazy range does it.

- Validating UTF is tricky; I've seen some discussion in this thread about it. On the face of it JSON only accepts valid UTF characters. As such, a modularity-based argument is to pipe UTF validation before tokenization. (We need a lazy UTF validator and sanitizer stat!) An efficiency-based argument is to do validation during tokenization. I'm inclining in favor of modularization, which allows us to focus on one thing at a time and do it well, instead of duplicating validation everywhere. Note that it's easy to write routines that do JSON tokenization and leave UTF validation for later, so there's a lot of flexibility in composing validation with JSONization.

- Litmus test: if the input type is a forward range AND if the string type chosen for tokens is the same as input type, successful tokenization should allocate exactly zero memory. I think this is a simple way to make sure that the tokenization API works well.

- If noThrow is a runtime option, some functions can't be nothrow (and consequently @nogc). Not sure how important this is. Probably quite a bit because of the current gc implications of exceptions. IMHO: at lexing level a sound design might just emit error tokens (with the culprit as payload) and never throw. Clients may always throw when they see an error token.

* stdx.data.json.parser:

- Similar considerations regarding string type used apply here as well: everything should be parameterized with it - the use case to keep in mind is someone wants everything with refcounted strings.

- The JSON value does its own internal allocation (for e.g. arrays and hashtables), which should be fine as long as it's encapsulated and we can tweak it later (e.g. make it use reference counting inside).

- parseJSONStream should parameterize on string type, not on appenderFactory.

- Why both parseJSONStream and parseJSONValue?
I'm thinking parseJSONValue would be enough because then you trivially parse a stream with repeated calls to parseJSONValue.

- FWIW I think the whole thing with accommodating BigInt etc. is an exaggeration. Just stick with long and double.

- readArray suddenly introduces a distinct kind of interacting - callbacks. Why? Should be a lazy range lazy range lazy range. An adapter using callbacks is then a two-liner.

- Why is readBool even needed? Just readJSONValue and then enforce it as a bool. Same reasoning applies to readDouble and readString.

- readObject is with callbacks again - it would be nice if it were a lazy range.

- skipXxx are nice to have and useful.

* stdx.data.json.value:

- The etymology of "opt" is unclear - no word starting with "opt" or obviously abbreviating to it is in the documentation. "opt2" is awkward. How about "path" and "dyn", respectively.

- I think Algebraic should be used throughout instead of TaggedAlgebraic, or motivation be given for the latter.

- JSONValue should be more opaque and not expose representation as much as it does now. In particular, offering a built-in hashtable is bound to be problematic because those are expensive to construct, create garbage, and are not customizable. Instead, the necessary lookup and set APIs should be provided by JSONValue whilst keeping the implementation hidden. The same goes about array - a JSONValue shall not be exposed; instead, indexed access primitives should be exposed. Separate types might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary. The string type should be a type parameter of JSONValue.

==============================

So, here we are. I realize a good chunk of this is surprising ("you mean I shouldn't create strings in my APIs?"). My point here is, again, we're at a juncture. We're trying to factor garbage (heh) out of API design in ways that defer the lifetime management to the user of the API.
We could pull json into std.experimental and defer the policy decisions for later, but I think it's a great driver for them. (Thanks Sönke for doing all the work, this is a great baseline.)

I think we should use the JSON API as a guinea pig for the new era of D API design in which we have a solid set of principles, tools, and guidelines to defer lifetime management.

Please advise.

Andrei
Aug 17 2015
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-18 00:21, Andrei Alexandrescu wrote:

 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
I don't think this is excessive. We should strive to have small modules. We already have/had problems with std.algorithm and std.datetime, let's not repeat those mistakes. A module with 2000 lines is more than enough. -- /Jacob Carlborg
Aug 17 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 2:31 AM, Jacob Carlborg wrote:
 On 2015-08-18 00:21, Andrei Alexandrescu wrote:

 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
I don't think this is excessive. We should strive to have small modules. We already have/had problems with std.algorithm and std.datetime, let's not repeat those mistakes. A module with 2000 lines is more than enough.
How about a module with 20? -- Andrei
Aug 18 2015
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-18 15:18, Andrei Alexandrescu wrote:

 How about a module with 20? -- Andrei
If it's used in several other modules, I don't see a problem with it. -- /Jacob Carlborg
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 9:31 AM, Jacob Carlborg wrote:
 On 2015-08-18 15:18, Andrei Alexandrescu wrote:

 How about a module with 20? -- Andrei
If it's used in several other modules, I don't see a problem with it.
Me neither if internal. I do see a problem if it's public. -- Andrei
Aug 18 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either. -- /Jacob Carlborg
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
Aug 18 2015
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 19-Aug-2015 04:58, Andrei Alexandrescu wrote:
 On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
To catch it? Generally I agree - just merge things sensibly; there could be a traits.d/primitives.d module, should it need to define isXYZ constraints and other lightweight interface-only entities. -- Dmitry Olshansky
Aug 19 2015
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 19.08.2015 at 03:58, Andrei Alexandrescu wrote:
 On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.
Aug 19 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/19/15 4:55 AM, Sönke Ludwig wrote:
 On 19.08.2015 at 03:58, Andrei Alexandrescu wrote:
 On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.
I'm sure there are a number of better options to package things nicely. -- Andrei
Aug 21 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 21.08.2015 at 18:54, Andrei Alexandrescu wrote:
 On 8/19/15 4:55 AM, Sönke Ludwig wrote:
 On 19.08.2015 at 03:58, Andrei Alexandrescu wrote:
 On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.
I'm sure there are a number of better options to package things nicely. -- Andrei
I'm all ears ;)
Aug 22 2015
prev sibling parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 08/18/2015 09:18 AM, Andrei Alexandrescu wrote:
 On 8/18/15 2:31 AM, Jacob Carlborg wrote:
 I don't think this is excessive. We should strive to have small modules.
 We already have/had problems with std.algorithm and std.datetime, let's
 not repeat those mistakes. A module with 2000 lines is more than enough.
How about a module with 20? -- Andrei
Module boundaries should be determined by organizational grouping, not by size.
Aug 21 2015
next sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Friday, 21 August 2015 at 16:25:40 UTC, Nick Sabalausky wrote:
 Module boundaries should be determined by organizational 
 grouping, not by size.
By organizational grouping as well as encapsulation concerns. Modules are the smallest units of encapsulation in D, visibility-wise. — David
Aug 21 2015
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/21/15 12:25 PM, Nick Sabalausky wrote:
 On 08/18/2015 09:18 AM, Andrei Alexandrescu wrote:
 On 8/18/15 2:31 AM, Jacob Carlborg wrote:
 I don't think this is excessive. We should strive to have small modules.
 We already have/had problems with std.algorithm and std.datetime, let's
 not repeat those mistakes. A module with 2000 lines is more than enough.
How about a module with 20? -- Andrei
Module boundaries should be determined by organizational grouping, not by size.
Rather by usefulness. As I mentioned, nobody would ever need only JSON's exceptions and location. -- Andrei
Aug 21 2015
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2015-08-21 18:25, Nick Sabalausky wrote:

 Module boundaries should be determined by organizational grouping, not
 by size.
Well, but it depends on how you decide what should be in a group. Size is usually a part of that decision, although it might not be conscious. You wouldn't put the whole D compiler in one module ;) -- /Jacob Carlborg
Aug 23 2015
prev sibling next sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Monday, 17 August 2015 at 22:21:50 UTC, Andrei Alexandrescu 
wrote:
 * stdx.data.json.generator: I think the API for converting 
 in-memory JSON values to strings needs to be redone, as follows:

 - JSONValue should offer a byToken range, which offers the 
 contents of the value one token at a time. For example, "[ 1, 
 2, 3 ]" offers the '[' token followed by three numeric tokens 
 with the respective values followed by the ']' token.
For iterating tree-like structures, a callback-based approach seems nicer, because it can naturally use the stack for storing its state. (I assume std.concurrency.Generator is too heavy-weight for this case.)
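A tiny sketch of why callbacks compose naturally with recursion - the nesting depth simply lives on the call stack (Node and visit are illustrative, not any actual API):

```d
// Hypothetical tree-shaped value, as a JSONValue conceptually is.
struct Node { string text; Node[] children; }

// Callback traversal: no explicit stack, no fiber - recursion does it.
void visit(Node n, scope void delegate(string) sink)
{
    sink(n.text);               // emit this node's token
    foreach (child; n.children)
        visit(child, sink);     // nesting state is kept on the call stack
}
```

A range interface over the same tree has to materialize that stack explicitly in the range's state, or pay for a fiber as std.concurrency.Generator does.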
 - On top of byToken it's immediate to implement a method (say 
 toJSON or toString) that accepts an output range of characters 
 and formatting options.
If there really needs to be a range, `joiner` and `copy` should do the job.
 - On top of the method above with output range, implementing a 
 toString overload that returns a string for convenience is a 
 two-liner. However, it shouldn't return a "string"; Phobos APIs 
 should avoid "hardcoding" the string type. Instead, it should 
 return a user-chosen string type (including reference counting 
 strings).
`to!string`, for compatibility with std.conv.
 - While at it make prettyfication a flag in the options, not 
 its own part of the function name.
(That's already done.)
 * stdx.data.json.lexer:

 - I assume the idea was to accept ranges of integrals to mean 
 "there's some raw input from a file". This seems to be a bit 
 overdone, e.g. there's no need to accept signed integers or 
 64-bit integers. I suggest just going with the three character 
 types.

 - I see tokenization accepts input ranges. This forces the 
 tokenizer to store its own copy of things, which is no doubt 
 the business of appenderFactory. Here the departure of the 
 current approach from what I think should become canonical 
 Phobos APIs deepens for multiple reasons. First, 
 appenderFactory does allow customization of the append 
 operation (nice) but that's not enough to allow the user to 
 customize the lifetime of the created strings, which is usually 
 reflected in the string type itself. So the lexing method 
 should be parameterized by the string type used. (By default 
 string (as is now) should be fine.) Therefore instead of 
 customizing the append method just customize the string type 
 used in the token.

 - The lexer should internally take optimization opportunities, 
 e.g. if the string type is "string" and the lexed type is also 
 "string", great, just use slices of the input instead of 
 appending them to the tokens.

 - As a consequence the JSONToken type also needs to be 
 parameterized by the type of its string that holds the payload. 
 I understand this is a complication compared to the current 
 approach, but I don't see an out. In the grand scheme of things 
 it seems a necessary evil: tokens may or may not need a means 
 to manage lifetime of their payload, and that's determined by 
 the type of the payload. Hopefully simplifications in other 
 areas of the API would offset this.
I've never seen JSON encoded in anything other than UTF-8. Is it really necessary to complicate everything for such an infrequent niche case?
 - At token level there should be no number parsing. Just store 
 the payload with the token and leave it for later. Very often 
 numbers are converted without there being a need, and the 
 process is costly. This also nicely sidesteps the entire matter 
 of bigints, floating point etc. at this level.

 - Also, at token level strings should be stored with escapes 
 unresolved. If the user wants a string with the escapes 
 resolved, a lazy range does it.
This was already suggested, and it looks like a good idea, though there was an objection because of possible performance costs. The other objection, that it requires an allocation, is no longer valid if sliceable input is used.
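With sliceable input, the lazy escape-resolving range could look roughly like this - common single-character escapes only, with \uXXXX handling elided, so purely a sketch of the shape:

```d
// Hypothetical lazy unescaper over a raw JSON string payload:
// resolves escapes on the fly and allocates nothing.
struct Unescape
{
    string s; // raw payload with escapes still in place

    bool empty() const { return s.length == 0; }

    char front() const
    {
        if (s[0] != '\\') return s[0];
        switch (s[1])
        {
            case 'n':  return '\n';
            case 't':  return '\t';
            case '"':  return '"';
            case '\\': return '\\';
            default:   return s[1]; // e.g. "\/" -> '/'
        }
    }

    void popFront()
    {
        immutable step = (s[0] == '\\') ? 2 : 1;
        s = s[step .. $];
    }
}
```

The performance objection then reduces to the per-character branch in front/popFront, which the token can skip entirely when it knows the payload contains no backslash.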
 - Validating UTF is tricky; I've seen some discussion in this 
 thread about it. On the face of it JSON only accepts valid UTF 
 characters. As such, a modularity-based argument is to pipe UTF 
 validation before tokenization. (We need a lazy UTF validator 
 and sanitizer stat!) An efficiency-based argument is to do 
 validation during tokenization. I'm inclining in favor of 
 modularization, which allows us to focus on one thing at a time 
 and do it well, instead of duplicationg validation everywhere. 
 Note that it's easy to write routines that do JSON tokenization 
 and leave UTF validation for later, so there's a lot of 
 flexibility in composing validation with JSONization.
Well, in an ideal world, there should be no difference in performance between manually combined tokenization/validation, and composed ranges. We should practice what we preach here.
 * stdx.data.json.parser:

 - FWIW I think the whole thing with accommodating BigInt etc. 
 is an exaggeration. Just stick with long and double.
Or, as above, leave it to the end user and provide a `to(T)` method that can support built-in types and `BigInt` alike.
Aug 18 2015
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 18 Aug 2015 09:05:32 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 Or, as above, leave it to the end user and provide a `to(T)`
 method that can support built-in types and `BigInt` alike.
You mean the user should write a JSON number parsing routine on their own? Then which part is responsible for validation of JSON constraints? If it is the to!(T) function, then it is code duplication with chances of getting something wrong; if it is the JSON parser, then the number is parsed twice. Besides, there is a lot of code to be shared for every T. -- Marco
Sep 28 2015
parent reply Marc Schütz <schuetzm gmx.net> writes:
On Monday, 28 September 2015 at 07:02:35 UTC, Marco Leise wrote:
 On Tue, 18 Aug 2015 09:05:32 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 Or, as above, leave it to the end user and provide a `to(T)` 
 method that can support built-in types and `BigInt` alike.
You mean the user should write a JSON number parsing routine on their own? Then which part is responsible for validation of JSON contraints? If it is the to!(T) function, then it is code duplication with chances of getting something wrong, if it is the JSON parser, then the number is parsed twice. Besides, there is a lot of code to be shared for every T.
No, the JSON type should just store the raw unparsed token and implement:

     struct JSON {
         T to(T) if(isNumeric!T && is(typeof(T("")))) {
             return T(this.raw);
         }
     }

The end user can then call:

     auto value = json.to!BigInt;
Sep 29 2015
next sibling parent Laeeth Isharc <laeethnospam nospamlaeeth.com> writes:
On Tuesday, 29 September 2015 at 11:06:03 UTC, Marc Schütz wrote:
 On Monday, 28 September 2015 at 07:02:35 UTC, Marco Leise wrote:
 On Tue, 18 Aug 2015 09:05:32 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 Or, as above, leave it to the end user and provide a `to(T)` 
 method that can support built-in types and `BigInt` alike.
You mean the user should write a JSON number parsing routine on their own? Then which part is responsible for validation of JSON contraints? If it is the to!(T) function, then it is code duplication with chances of getting something wrong, if it is the JSON parser, then the number is parsed twice. Besides, there is a lot of code to be shared for every T.
No, the JSON type should just store the raw unparsed token and implement:

     struct JSON {
         T to(T) if(isNumeric!T && is(typeof(T("")))) {
             return T(this.raw);
         }
     }

The end user can then call:

     auto value = json.to!BigInt;
I was just speaking to Sonke about another aspect of this. It's not just numbers where this might be the case - dates are also often in a weird format (because the data comes from some ancient mainframe, for example). And similarly for enums where the field is a string but actually ought to fit in a fixed set of categories. I forgot the original context to this long thread, so hopefully this point is relevant. It's more relevant for the layer that will go on top where you want to be able to parse a json array or object as a D array/associative array of structs, as you can do in vibe.d currently. But maybe needs to be considered in lower level - I forget at this point.
Sep 29 2015
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 29 Sep 2015 11:06:01 +0000, Marc Schütz <schuetzm gmx.net> wrote:

 No, the JSON type should just store the raw unparsed token and
 implement:

      struct JSON {
          T to(T) if(isNumeric!T && is(typeof(T("")))) {
              return T(this.raw);
          }
      }

 The end user can then call:

      auto value = json.to!BigInt;
Ah, the duck typing approach of accepting any numeric type constructible from a string. Still: You need to parse the number first to know how long the digit string is that you pass to T's ctor. And then you have two sets of syntaxes for numbers: JSON and T's ctor. T could potentially parse numbers with the system locale's setting for the decimal point which may be ',' while JSON uses '.' or support hexadecimal numbers which are also invalid JSON. On the other hand, a ctor for some integral type may not support the exponential notation "2e10", which could legitimately be used by JSON writers (Ruby's uses shortest way to store numbers) to save on bandwidth. -- Marco
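The grammar mismatch is easy to demonstrate with std.conv, assuming its usual behavior (integral parsing rejects exponent notation, floating-point parsing accepts it):

```d
import std.conv : to, ConvException;

void main()
{
    // JSON allows exponent notation even for whole-number values...
    assert(to!double("2e10") == 2e10);

    // ...but an integral parse of the very same token fails:
    bool threw = false;
    try to!long("2e10");
    catch (ConvException) threw = true;
    assert(threw);
}
```

So a `to!T(raw)` convenience either inherits T's grammar (rejecting legal JSON like "2e10" for integral T) or has to re-validate against the JSON number grammar itself, which is exactly the duplication concern above.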
Sep 30 2015
prev sibling next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 18.08.2015 at 00:21, Andrei Alexandrescu wrote:
 I'll preface my review with a general comment. This API comes at an
 interesting juncture; we're striving as much as possible for interfaces
 that abstract away lifetime management, so they can be used comfortably
 with GC, or at high performance (and hopefully no or only marginal loss
 of comfort) with client-chosen lifetime management policies.

 The JSON API is a great test bed for our emerging recommended "push
 lifetime up" idioms; it's not too complicated yet it's not trivial
 either, and has great usefulness.

 With this, here are some points:

 * All new stuff should go in std.experimental. I assume "stdx" would
 change to that, should this work be merged.
Check.
 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
That would mean a single module that is >5k lines long. Spreading out certain things, such as JSONValue, into their own modules also makes sense to avoid unnecessarily large imports where other parts of the functionality aren't needed. Maybe we could move some private things to "std.internal" or similar and merge some of the modules? But I also think that grouping symbols by topic is a good thing and makes figuring out the API easier. There is also always package.d if you really want to import everything.
 * stdx.data.json.generator: I think the API for converting in-memory
 JSON values to strings needs to be redone, as follows:

 - JSONValue should offer a byToken range, which offers the contents of
 the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
 token followed by three numeric tokens with the respective values
 followed by the ']' token.
An input range style generator is on the TODO list, but would a token range be really useful for anything in practice? I would just go straight for a char range. Another thing I'd like to add is an output range that takes parser nodes and writes to a string output range. This would be the kind of interface that would be most useful for a serialization framework.
 - On top of byToken it's immediate to implement a method (say toJSON or
 toString) that accepts an output range of characters and formatting
 options.

 - On top of the method above with output range, implementing a toString
 overload that returns a string for convenience is a two-liner. However,
 it shouldn't return a "string"; Phobos APIs should avoid "hardcoding"
 the string type. Instead, it should return a user-chosen string type
 (including reference counting strings).
Without any existing code to test this against, what would this look like? Simply using an `Appender!rcstring`?
 - While at it make prettyfication a flag in the options, not its own
 part of the function name.
Already done. Pretty printing is now the default and there is GeneratorOptions.compact.
 * stdx.data.json.lexer:

 - I assume the idea was to accept ranges of integrals to mean "there's
 some raw input from a file". This seems to be a bit overdone, e.g.
 there's no need to accept signed integers or 64-bit integers. I suggest
 just going with the three character types.
It's funny you say that, because this was your own design proposal. Regarding the three character types, if we drop everything but those, I think we could also go with Walter's suggestion and just drop everything apart from "char". Putting a conversion range from dchar to char would be trivial and should be fast enough.
 - I see tokenization accepts input ranges. This forces the tokenizer to
 store its own copy of things, which is no doubt the business of
 appenderFactory.  Here the departure of the current approach from what I
 think should become canonical Phobos APIs deepens for multiple reasons.
 First, appenderFactory does allow customization of the append operation
 (nice) but that's not enough to allow the user to customize the lifetime
 of the created strings, which is usually reflected in the string type
 itself. So the lexing method should be parameterized by the string type
 used. (By default string (as is now) should be fine.) Therefore instead
 of customizing the append method just customize the string type used in
 the token.
Okay, sounds reasonable if Appender!rcstring is just going to work.
 - The lexer should internally take optimization opportunities, e.g. if
 the string type is "string" and the lexed type is also "string", great,
 just use slices of the input instead of appending them to the tokens.
It does.
 - As a consequence the JSONToken type also needs to be parameterized by
 the type of its string that holds the payload. I understand this is a
 complication compared to the current approach, but I don't see an out.
 In the grand scheme of things it seems a necessary evil: tokens may or
 may not need a means to manage lifetime of their payload, and that's
 determined by the type of the payload. Hopefully simplifications in
 other areas of the API would offset this.
It wouldn't be too bad here, because it's presumably pretty rare to store tokens or parser nodes. Worse is JSONValue.
 - At token level there should be no number parsing. Just store the
 payload with the token and leave it for later. Very often numbers are
 converted without there being a need, and the process is costly. This
 also nicely sidesteps the entire matter of bigints, floating point etc.
 at this level.
Okay, again, this was your own suggestion. The downside of always storing the string representation is that it requires allocations if no slices are used, and that the string will have to be parsed twice if the number is indeed going to be used. This can have a considerable performance impact.
 - Also, at token level strings should be stored with escapes unresolved.
 If the user wants a string with the escapes resolved, a lazy range does it.
To make things efficient, it currently stores escaped strings if slices of the input are used, but stores unescaped strings if allocations are necessary anyway.
 - Validating UTF is tricky; I've seen some discussion in this thread
 about it. On the face of it JSON only accepts valid UTF characters. As
 such, a modularity-based argument is to pipe UTF validation before
 tokenization. (We need a lazy UTF validator and sanitizer stat!) An
 efficiency-based argument is to do validation during tokenization. I'm
 inclining in favor of modularization, which allows us to focus on one
 thing at a time and do it well, instead of duplicationg validation
 everywhere. Note that it's easy to write routines that do JSON
 tokenization and leave UTF validation for later, so there's a lot of
 flexibility in composing validation with JSONization.
It's unfortunate to see this change of mind in the face of the work that already went into the implementation. I also still think that this is a good optimization opportunity that doesn't really affect the implementation complexity. Validation isn't duplicated, but reused from std.utf.
 - Litmus test: if the input type is a forward range AND if the string
 type chosen for tokens is the same as input type, successful
 tokenization should allocate exactly zero memory. I think this is a
 simple way to make sure that the tokenization API works well.
Supporting arbitrary forward ranges doesn't seem to be enough; it would at least have to be combined with something like take(), but then the type doesn't equal the string type anymore. I'd suggest keeping it to "if sliceable and input type equals string type", at least for the initial version.
 - If noThrow is a runtime option, some functions can't be nothrow (and
 consequently nogc). Not sure how important this is. Probably quite a bit
 because of the current gc implications of exceptions. IMHO: at lexing
 level a sound design might just emit error tokens (with the culprit as
 payload) and never throw. Clients may always throw when they see an
 error token.
noThrow is a compile time option and there are nothrow unit tests to make sure that the API is nothrow at least for string inputs.
 * stdx.data.json.parser:

 - Similar considerations regarding string type used apply here as well:
 everything should be parameterized with it - the use case to keep in
 mind is someone wants everything with refcounted strings.
Okay.
 - The JSON value does its own internal allocation (for e.g. arrays and
 hashtables), which should be fine as long as it's encapsulated and we
 can tweak it later (e.g. make it use reference counting inside).
Since it's based on (Tagged)Algebraic, the internal types are part of the interface. Changing them later is bound to break some code. So AFAICS this would either require making the used types parameterizable (string, array and AA types), or abstracting them away completely, i.e. only forwarding operations but denying direct access to the type. ... thinking about it, TaggedAlgebraic could do that, while Algebraic can't.
 - parseJSONStream should parameterize on string type, not on
 appenderFactory.
Okay.
 - Why both parseJSONStream and parseJSONValue? I'm thinking
 parseJSONValue would be enough because then you trivially parse a stream
 with repeated calls to parseJSONValue.
parseJSONStream is the pull parser (StAX style) interface. It returns the contents of a JSON document as individual nodes instead of storing them in a DOM. This part is vital for high-performance parsing, especially of large documents.
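For illustration, a StAX-style consumer loops over nodes in constant memory; the node types below are made up for the sketch and are not parseJSONStream's actual API:

```d
// Hypothetical parser-node kinds, standing in for the pull parser's output.
enum NodeKind { objectStart, objectEnd, arrayStart, arrayEnd, key, literal }

struct Node { NodeKind kind; string text; }

// Pull-style consumption: react to nodes as they arrive, never building
// a DOM - memory use stays constant regardless of document size.
size_t countKeys(R)(R nodes) // R: an input range of Node
{
    size_t n = 0;
    foreach (node; nodes)
        if (node.kind == NodeKind.key)
            ++n;
    return n;
}
```

This is the use case a repeated-parseJSONValue loop can't cover: the whole point is that no per-value DOM is ever allocated.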
 - FWIW I think the whole thing with accommodating BigInt etc. is an
 exaggeration. Just stick with long and double.
As mentioned earlier somewhere in this thread, there are practical needs to at least be able to handle ulong, too. Maybe the solution is indeed to just (optionally) store the string representation, so people can convert as they see fit.
 - readArray suddenly introduces a distinct kind of interacting -
 callbacks. Why? Should be a lazy range lazy range lazy range. An adapter
 using callbacks is then a two-liner.
It just has a more complicated implementation, but is already on the TODO list.
 - Why is readBool even needed? Just readJSONValue and then enforce it as
 a bool. Same reasoning applies to readDouble and readString.
This is for lower-level access; using parseJSONValue would certainly be possible, but it would have quite some unneeded overhead and would also be non-@nogc.
 - readObject is with callbacks again - it would be nice if it were a
 lazy range.
Okay, is also already on the list.
 - skipXxx are nice to have and useful.

 * stdx.data.json.value:

 - The etymology of "opt" is unclear - no word starting with "opt" or
 obviously abbreviating to it is in the documentation. "opt2" is awkward.
 How about "path" and "dyn", respectively.
The names are just placeholders currently. I think one of the two should also be enough. I've just implemented both, so that both can be tested/seen in practice. There have also been some more name suggestions in a thread mentioned by Meta with a more general suggestion for normal D member access. I'll see if I can dig those up, too.
 - I think Algebraic should be used throughout instead of
 TaggedAlgebraic, or motivation be given for the latter.
There have already been quite some arguments that I think are compelling, especially with a lack of counterarguments (maybe their consequences need to be explained better, though). TaggedAlgebraic could also (implicitly) convert to Algebraic. An additional argument is the possibility for TaggedAlgebraic to abstract away the underlying type, since it doesn't rely on a has!T and get!T API. But apart from that, Algebraic is unfortunately currently quite unsuited for this kind of abstraction, even if that can be solved in theory (with a lot of work). It requires writing things like obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just obj["foo"], because it simply returns Variant from all of its forwarded operators.
 - JSONValue should be more opaque and not expose representation as much
 as it does now. In particular, offering a built-in hashtable is bound to
 be problematic because those are expensive to construct, create garbage,
 and are not customizable. Instead, the necessary lookup and set APIs
 should be provided by JSONValue whilst keeping the implementation
 hidden. The same goes about array - a JSONValue shall not be exposed;
 instead, indexed access primitives should be exposed. Separate types
 might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary.
 The string type should be a type parameter of JSONValue.
This would unfortunately at the same time destroy almost all benefits that using (Tagged)Algebraic has, namely that it opens up the possibility of interoperability between different data formats (for example, passing a JSONValue to a BSON generator without letting the BSON generator know about JSON). This is unfortunately an area that I've also not yet properly explored, but I think it's important as we go forward with other data formats.
 ==============================

 So, here we are. I realize a good chunk of this is surprising ("you mean
 I shouldn't create strings in my APIs?"). My point here is, again, we're
 at a juncture. We're trying to factor garbage (heh) out of API design in
 ways that defer the lifetime management to the user of the API.
Most suggestions so far sound very reasonable, namely parameterizing parsing/lexing on the string type and using ranges where possible. JSONValue is a different beast that needs some more thought if we really want to keep it generic in terms of allocation/lifetime model. In terms of removing "garbage" from the API, I'm just not 100% sure if removing small but frequently used functions, such as a string conversion function (one that returns an allocated string), is really a good idea (which is what Walter suggested).
 We could pull json into std.experimental and defer the policy decisions
 for later, but I think it's a great driver for them. (Thanks Sönke for
 doing all the work, this is a great baseline.) I think we should use the
 JSON API as a guinea pig for the new era of D API design in which we
 have a solid set of principles, tools, and guidelines to defer lifetime
 management. Please advise.
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 12:54 PM, Sönke Ludwig wrote:
 Am 18.08.2015 um 00:21 schrieb Andrei Alexandrescu:
 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
That would mean a single module that is >5k lines long. Spreading out certain things, such as JSONValue into its own module, also makes sense to avoid unnecessarily large imports where other parts of the functionality aren't needed. Maybe we could move some private things to "std.internal" or similar and merge some of the modules?
That would help. My point is it's good design to make the response proportional to the problem. 5K lines is not a lot, but reducing those 5K in the first place would be a noble pursuit. And btw saving parsing time is so C++ :o).
 But I also think that grouping symbols by topic is a good thing and
 makes figuring out the API easier. There is also always package.d if you
 really want to import everything.
Figuring out the API easily is a good goal. The best way to achieve that is making the API no larger than necessary.
 * stdx.data.json.generator: I think the API for converting in-memory
 JSON values to strings needs to be redone, as follows:

 - JSONValue should offer a byToken range, which offers the contents of
 the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
 token followed by three numeric tokens with the respective values
 followed by the ']' token.
An input range style generator is on the TODO list, but would a token range be really useful for anything in practice? I would just go straight for a char range.
Sounds good.
 Another thing I'd like to add is an output range that takes parser nodes
 and writes to a string output range. This would be the kind of interface
 that would be most useful for a serialization framework.
Couldn't that be achieved trivially by e.g. using map!(t => t.toString) or similar? This is the nice thing about rangifying everything - suddenly you have a host of tools at your disposal.
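[Editor's sketch of the map-based suggestion; `value`, `byToken` and the token's `toString` are assumed from the proposal's API, not verified against it.]

```d
import std.algorithm : copy, joiner, map;
import std.array : appender;

// Hypothetical: `value` is a JSONValue whose byToken yields tokens
// that know how to render themselves as strings.
auto dst = appender!string();
value.byToken
    .map!(t => t.toString)  // render each token lazily
    .joiner                 // flatten the strings into one char range
    .copy(dst);
```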
 - On top of byToken it's immediate to implement a method (say toJSON or
 toString) that accepts an output range of characters and formatting
 options.

 - On top of the method above with output range, implementing a toString
 overload that returns a string for convenience is a two-liner. However,
 it shouldn't return a "string"; Phobos APIs should avoid "hardcoding"
 the string type. Instead, it should return a user-chosen string type
 (including reference counting strings).
Without any existing code to test this against, what would this look like? Simply using an `Appender!rcstring`?
Yes.
 - While at it make prettification a flag in the options, not its own
 part of the function name.
Already done. Pretty printing is now the default and there is GeneratorOptions.compact.
Great, thanks.
 * stdx.data.json.lexer:

 - I assume the idea was to accept ranges of integrals to mean "there's
 some raw input from a file". This seems to be a bit overdone, e.g.
 there's no need to accept signed integers or 64-bit integers. I suggest
 just going with the three character types.
It's funny you say that, because this was your own design proposal.
Ooops...
 Regarding the three character types, if we drop everything but those, I
 think we could also go with Walter's suggestion and just drop everything
 apart from "char". Putting a conversion range from dchar to char would
 be trivial and should be fast enough.
That's great, thanks.
 - I see tokenization accepts input ranges. This forces the tokenizer to
 store its own copy of things, which is no doubt the business of
 appenderFactory.  Here the departure of the current approach from what I
 think should become canonical Phobos APIs deepens for multiple reasons.
 First, appenderFactory does allow customization of the append operation
 (nice) but that's not enough to allow the user to customize the lifetime
 of the created strings, which is usually reflected in the string type
 itself. So the lexing method should be parameterized by the string type
 used. (By default string (as is now) should be fine.) Therefore instead
 of customizing the append method just customize the string type used in
 the token.
Okay, sounds reasonable if Appender!rcstring is just going to work.
Awesome, thanks.
 - The lexer should internally take optimization opportunities, e.g. if
 the string type is "string" and the lexed type is also "string", great,
 just use slices of the input instead of appending them to the tokens.
It does.
Yay to that.
 - At token level there should be no number parsing. Just store the
 payload with the token and leave it for later. Very often numbers are
 converted without there being a need, and the process is costly. This
 also nicely sidesteps the entire matter of bigints, floating point etc.
 at this level.
Okay, again, this was your own suggestion. The downside of always storing the string representation is that it requires allocations if no slices are used, and that the string will have to be parsed twice if the number is indeed going to be used. This can have a considerable performance impact.
Hmm, point taken. I'm not too worried about the parsing part but string allocation may be problematic.
 - Also, at token level strings should be stored with escapes unresolved.
 If the user wants a string with the escapes resolved, a lazy range
 does it.
To make things efficient, it currently stores escaped strings if slices of the input are used, but stores unescaped strings if allocations are necessary anyway.
That seems a good balance, and probably could be applied to numbers as well.
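[Editor's sketch of the lazy escape-resolving range mentioned above; names are invented, only single-character escapes are handled and \uXXXX is elided.]

```d
import std.range.primitives;

// A lazy forward-range wrapper that resolves simple JSON escapes on
// the fly, so tokens can keep storing the raw slice of the input.
struct Unescaped(R)
{
    R src;

    bool empty() { return src.empty; }

    dchar front()
    {
        if (src.front != '\\') return src.front;
        auto tmp = src.save;   // peek at the escape character
        tmp.popFront();
        switch (tmp.front)
        {
            case 'n': return '\n';
            case 't': return '\t';
            default:  return tmp.front; // \" \\ \/ and friends
        }
    }

    void popFront()
    {
        if (src.front == '\\') src.popFront(); // skip the backslash
        src.popFront();
    }
}

auto unescaped(R)(R raw) { return Unescaped!R(raw); }
```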
 - Validating UTF is tricky; I've seen some discussion in this thread
 about it. On the face of it JSON only accepts valid UTF characters. As
 such, a modularity-based argument is to pipe UTF validation before
 tokenization. (We need a lazy UTF validator and sanitizer stat!) An
 efficiency-based argument is to do validation during tokenization. I'm
 inclining in favor of modularization, which allows us to focus on one
 thing at a time and do it well, instead of duplicating validation
 everywhere. Note that it's easy to write routines that do JSON
 tokenization and leave UTF validation for later, so there's a lot of
 flexibility in composing validation with JSONization.
It's unfortunate to see this change of mind in the face of the work that already went into the implementation. I also still think that this is a good optimization opportunity that doesn't really affect the implementation complexity. Validation isn't duplicated, but reused from std.utf.
Well if the validation is reused from std.utf, it can't have been very much work. I maintain that separating concerns seems like a good strategy here.
 - Litmus test: if the input type is a forward range AND if the string
 type chosen for tokens is the same as input type, successful
 tokenization should allocate exactly zero memory. I think this is a
 simple way to make sure that the tokenization API works well.
Supporting arbitrary forward ranges doesn't seem to be enough; it would at least have to be combined with something like take(), but then the type doesn't equal the string type anymore. I'd suggest keeping it to "if it is sliceable and the input type equals the string type", at least for the initial version.
I had "take" in mind. Don't forget that "take" automatically uses slices wherever applicable. So if you just use typeof(take(...)), you get the best of all worlds. The more restrictive version seems reasonable for the first release.
 - If noThrow is a runtime option, some functions can't be nothrow (and
 consequently nogc). Not sure how important this is. Probably quite a bit
 because of the current gc implications of exceptions. IMHO: at lexing
 level a sound design might just emit error tokens (with the culprit as
 payload) and never throw. Clients may always throw when they see an
 error token.
noThrow is a compile time option and there are nothrow unit tests to make sure that the API is nothrow at least for string inputs.
Awesome.
 - The JSON value does its own internal allocation (for e.g. arrays and
 hashtables), which should be fine as long as it's encapsulated and we
 can tweak it later (e.g. make it use reference counting inside).
Since it's based on (Tagged)Algebraic, the internal types are part of the interface. Changing them later is bound to break some code. So AFAICS this would either require making the used types parameterized (string, array and AA types), or abstracting them away completely, i.e. only forwarding operations but denying direct access to the type. ... thinking about it, TaggedAlgebraic could do that, while Algebraic can't.
Well if you figure the general Algebraic type is better replaced by a type specialized for JSON, fine. What we shouldn't endorse is two nearly identical library types (Algebraic and TaggedAlgebraic) that are only different in subtle matters related to performance in certain use patterns. If integral tags are better for closed type universes, specialize Algebraic to use integral tags where applicable.
 - Why both parseJSONStream and parseJSONValue? I'm thinking
 parseJSONValue would be enough because then you trivially parse a stream
 with repeated calls to parseJSONValue.
parseJSONStream is the pull parser (StAX style) interface. It returns the contents of a JSON document as individual nodes instead of storing them in a DOM. This part is vital for high-performance parsing, especially of large documents.
So perhaps this is just a naming issue. The names don't suggest everything you said. What I see is "parse a JSON stream" and "parse a JSON value". So I naturally assumed we're looking at consuming a full stream vs. consuming only one value off a stream and stopping. How about better names?
 - FWIW I think the whole thing with accommodating BigInt etc. is an
 exaggeration. Just stick with long and double.
As mentioned earlier somewhere in this thread, there are practical needs to at least be able to handle ulong, too. Maybe the solution is indeed to just (optionally) store the string representation, so people can convert as they see fit.
Great. I trust you'll find the right compromise there. All I'm saying is that BigInt here sticks out like a sore thumb in the whole affair. Best to just take it out and let folks who need it build on top of the lexer.
 - readArray suddenly introduces a distinct kind of interacting -
 callbacks. Why? Should be a lazy range lazy range lazy range. An adapter
 using callbacks is then a two-liner.
It just has a more complicated implementation, but is already on the TODO list.
Great. Let me say again that with ranges you get to instantly tap into a wealth of tools. I say get rid of the callbacks and let a "tee" take care of it for whoever needs it.
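[Editor's sketch: assuming readArray grows the lazy-range interface discussed here, the existing callback style really does reduce to a small adapter. The names readArrayWithCallback and parser.readArray() are assumptions, not the proposal's actual API.]

```d
// Hypothetical adapter: wrap a lazy-range readArray back into the
// old delegate-based interface for code that prefers callbacks.
void readArrayWithCallback(R)(ref R parser, scope void delegate() onEntry)
{
    foreach (_; parser.readArray())  // lazy-range version from the proposal
        onEntry();
}
```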
 - Why is readBool even needed? Just readJSONValue and then enforce it as
 a bool. Same reasoning applies to readDouble and readString.
This is for lower-level access; using parseJSONValue would certainly be possible, but it would have quite some unneeded overhead and would also be non-@nogc.
Meh, fine. But all of this is adding weight to the API in the wrong places.
 - readObject is with callbacks again - it would be nice if it were a
 lazy range.
Okay, is also already on the list.
Awes!
 - skipXxx are nice to have and useful.

 * stdx.data.json.value:

 - The etymology of "opt" is unclear - no word starting with "opt" or
 obviously abbreviating to it is in the documentation. "opt2" is awkward.
 How about "path" and "dyn", respectively.
The names are just placeholders currently. I think one of the two should also be enough. I've just implemented both, so that both can be tested/seen in practice. There have also been some more name suggestions in a thread mentioned by Meta with a more general suggestion for normal D member access. I'll see if I can dig those up, too.
Okay.
 - I think Algebraic should be used throughout instead of
 TaggedAlgebraic, or motivation be given for the latter.
There have already been quite some arguments that I think are compelling, especially with a lack of counterarguments (maybe their consequences need to be explained better, though). TaggedAlgebraic could also (implicitly) convert to Algebraic. An additional argument is the possibility for TaggedAlgebraic to abstract away the underlying type, since it doesn't rely on a has!T and get!T API.
To reiterate the point I made above: we should not endorse two mostly equivalent types that exhibit subtle performance differences. Feel free to change Algebraic to use integrals for some/most cases when the number of types involved is bounded. Adding new methods to Algebraic should also be fine. Just don't add a new type that's 98% the same.
 But apart from that, algebraic is unfortunately currently quite unsuited
 for this kind of abstraction, even if that can be solved in theory (with
 a lot of work). It requires writing things like
 obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just
 obj["foo"], because it simply returns Variant from all of its forwarded
 operators.
Algebraic does not expose opIndex. We could add it to Algebraic such that obj["foo"] returns the same type as "this". It's easy for anyone to say that what's there is unfit for a particular purpose. It's also easy for many to define an ever-so-slightly-different new artifact that fits a particular purpose. Where you come in as a talented hacker is to operate with an understanding of the importance of making things work, and to make it work.
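[Editor's sketch: the verbosity being complained about can be reproduced with plain std.variant today; the JSON type universe is reduced here to long/string/object, and the alias name JV is invented.]

```d
import std.variant : Algebraic, This;

// Simplified JSON-like value: integer, string, or object of itself.
alias JV = Algebraic!(long, string, This[string]);

void main()
{
    JV obj = JV(["foo": JV(42L)]);
    // Today: unwrap the AA type explicitly, index, then unwrap again,
    // because the forwarded opIndex would yield a Variant, not a JV.
    assert(obj.get!(JV[string])["foo"].get!long == 42);
}
```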
 - JSONValue should be more opaque and not expose representation as much
 as it does now. In particular, offering a built-in hashtable is bound to
 be problematic because those are expensive to construct, create garbage,
 and are not customizable. Instead, the necessary lookup and set APIs
 should be provided by JSONValue whilst keeping the implementation
 hidden. The same goes about array - a JSONValue shall not be exposed;
 instead, indexed access primitives should be exposed. Separate types
 might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary.
 The string type should be a type parameter of JSONValue.
This would unfortunately at the same time destroy almost all benefits that using (Tagged)Algebraic has, namely that it opens up the possibility of interoperability between different data formats (for example, passing a JSONValue to a BSON generator without letting the BSON generator know about JSON). This is unfortunately an area that I've also not yet properly explored, but I think it's important as we go forward with other data formats.
I think we need to do it. Otherwise we're stuck with "D's JSON API cannot be used without the GC". We want to escape that gravitational pull. I know it's hard. But it's worth it.
 ==============================

 So, here we are. I realize a good chunk of this is surprising ("you mean
 I shouldn't create strings in my APIs?"). My point here is, again, we're
 at a juncture. We're trying to factor garbage (heh) out of API design in
 ways that defer the lifetime management to the user of the API.
Most suggestions so far sound very reasonable, namely parameterizing parsing/lexing on the string type and using ranges where possible. JSONValue is a different beast that needs some more thought if we really want to keep it generic in terms of allocation/lifetime model. In terms of removing "garbage" from the API, I'm just not 100% sure if removing small but frequently used functions, such as a string conversion function (one that returns an allocated string), is really a good idea (which is what Walter suggested).
We must accommodate a GC-less world. It's definitely time to acknowledge the GC as a brake that limits D adoption, and put our full thrust behind removing it. Andrei
Aug 21 2015
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/21/15 1:30 PM, Andrei Alexandrescu wrote:
 So perhaps this is just a naming issue. The names don't suggest
 everything you said. What I see is "parse a JSON stream" and "parse a
 JSON value". So I naturally assumed we're looking at consuming a full
 stream vs. consuming only one value off a stream and stopping. How about
 better names?
I should add that in parseJSONStream, "stream" refers to the input, whereas in parseJSONValue, "value" refers to the output. -- Andrei
Aug 21 2015
prev sibling next sibling parent reply "tired_eyes" <pastuhov85 gmail.com> writes:
On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu 
wrote:
 We must accommodate a GC-less world. It's definitely time to 
 acknowledge the GC as a brake that limits D adoption, and put 
 our full thrust behind removing it.


 Andrei
Wow. Just wow.
Aug 21 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/21/15 2:03 PM, tired_eyes wrote:
 On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
 We must accommodate a GC-less world. It's definitely time to
 acknowledge the GC as a brake that limits D adoption, and put our full
 thrust behind removing it.


 Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Aug 21 2015
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 8/21/15 2:03 PM, tired_eyes wrote:
On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
We must accommodate a GC-less world. It's definitely time to
acknowledge the GC as a brake that limits D adoption, and put our
full thrust behind removing it.


Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here. T -- He who sacrifices functionality for ease of use, loses both and deserves neither. -- Slashdotter
Aug 21 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/21/15 2:50 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via
Digitalmars-d wrote:
 On 8/21/15 2:03 PM, tired_eyes wrote:
 On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
 We must accommodate a GC-less world. It's definitely time to
 acknowledge the GC as a brake that limits D adoption, and put our
 full thrust behind removing it.


 Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here.
Nothing new here. We want to make it a pleasant experience to use D without a garbage collector. -- Andrei
Aug 21 2015
next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Aug 21, 2015 at 03:22:25PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 8/21/15 2:50 PM, H. S. Teoh via Digitalmars-d wrote:
On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
On 8/21/15 2:03 PM, tired_eyes wrote:
On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
We must accommodate a GC-less world. It's definitely time to
acknowledge the GC as a brake that limits D adoption, and put our
full thrust behind removing it.


Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here.
Nothing new here. We want to make it a pleasant experience to use D without a garbage collector. -- Andrei
Making it pleasant to use without a GC is not the same thing as removing the GC. Which is it? T -- Try to keep an open mind, but not so open your brain falls out. -- theboz
Aug 21 2015
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/21/15 3:22 PM, Andrei Alexandrescu wrote:
 On 8/21/15 2:50 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via
 Digitalmars-d wrote:
 On 8/21/15 2:03 PM, tired_eyes wrote:
 On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
 We must accommodate a GC-less world. It's definitely time to
 acknowledge the GC as a brake that limits D adoption, and put our
 full thrust behind removing it.


 Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here.
Nothing new here. We want to make it a pleasant experience to use D without a garbage collector. -- Andrei
Allow me to (possibly) clarify. What Andrei is saying is that you should be able to use D and Phobos *without* the GC, not that we should remove the GC. E.g. what Walter was talking about at DConf 2015: instead of converting an integer to a GC-allocated string, you return a range that produces the same characters but doesn't allocate. -Steve
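[Editor's sketch of the idea Steve describes: a lazy range of decimal digit characters; hypothetical code, not the actual DConf slide.]

```d
// The range *is* the string representation of the number, produced
// digit by digit - nothing is ever allocated.
struct Digits
{
    uint value;
    uint div;

    this(uint v)
    {
        value = v;
        div = 1;
        // Position div at the most significant decimal digit.
        while (div <= value / 10)
            div *= 10;
    }

    bool empty() const { return div == 0; }
    char front() const { return cast(char)('0' + (value / div) % 10); }
    void popFront() { div /= 10; }
}

// foreach (c; Digits(2015)) yields '2', '0', '1', '5'.
```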
Aug 21 2015
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 21.08.2015 um 19:30 schrieb Andrei Alexandrescu:
 On 8/18/15 12:54 PM, Sönke Ludwig wrote:
 Am 18.08.2015 um 00:21 schrieb Andrei Alexandrescu:
 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
That would mean a single module that is >5k lines long. Spreading out certain things, such as JSONValue into its own module, also makes sense to avoid unnecessarily large imports where other parts of the functionality aren't needed. Maybe we could move some private things to "std.internal" or similar and merge some of the modules?
That would help. My point is it's good design to make the response proportional to the problem. 5K lines is not a lot, but reducing those 5K in the first place would be a noble pursuit. And btw saving parsing time is so C++ :o).
Most lines are needed for tests and documentation. Surely dropping some functionality would make the module smaller, too. But there is not a lot to take away without making severe compromises in terms of actual functionality or usability.
 But I also think that grouping symbols by topic is a good thing and
 makes figuring out the API easier. There is also always package.d if you
 really want to import everything.
Figuring out the API easily is a good goal. The best way to achieve that is making the API no larger than necessary.
So, what's your suggestion - remove all read*/skip* functions, for example? Make them member functions of JSONParserRange instead of UFCS functions? We could of course also just use the pseudo-modules that std.algorithm had, for example, where we'd create a table in the documentation for each category of functions.
 Another thing I'd like to add is an output range that takes parser nodes
 and writes to a string output range. This would be the kind of interface
 that would be most useful for a serialization framework.
Couldn't that be achieved trivially by e.g. using map!(t => t.toString) or similar? This is the nice thing about rangifying everything - suddenly you have a host of tools at your disposal.
No, the idea is to have an output range like so:

    auto dst = appender!string();
    auto r = JSONNodeOutputRange(&dst);
    r.put(beginArray);
    r.put(1);
    r.put(2);
    r.put(endArray);

This would provide a forward interface for code that has to directly iterate over its input, which is the case for a serializer - it can't provide an input range interface in a sane way. The alternative would be to either let the serializer re-implement all of JSON, or to just provide some primitives (a writeJSON() that takes bool, number or string) and to let the serializer implement the rest of JSON (arrays/objects), which includes certain options, such as pretty-printing.
 - Also, at token level strings should be stored with escapes unresolved.
 If the user wants a string with the escapes resolved, a lazy range
 does it.
To make things efficient, it currently stores escaped strings if slices of the input are used, but stores unescaped strings if allocations are necessary anyway.
That seems a good balance, and probably could be applied to numbers as well.
With the difference that numbers stored as numbers never need to allocate, so for non-sliceable inputs the compromise is not the same. What about just offering three (CT-selectable) modes:

- Always parse as double (parse lazily if slicing can be used) (default)
- Parse as double or long (again, lazily if slicing can be used)
- Always store the string representation

The question that remains is how to handle this in JSONValue - support just double there? Or something like JSONNumber that abstracts away the differences, but makes writing generic code against JSONValue difficult? Or make it also parameterized in what it can store?
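[Editor's sketch of how the three compile-time modes could be selected; the enum and struct names are invented, not part of the proposal.]

```d
enum NumberMode
{
    alwaysDouble,  // default: parse to double (lazily if slicing works)
    longOrDouble,  // keep integers exact when they fit a long
    keepString     // store the raw representation, convert on demand
}

// The stored payload changes with the compile-time mode.
struct JSONNumber(NumberMode mode)
{
    static if (mode == NumberMode.alwaysDouble)
        double value;
    else static if (mode == NumberMode.longOrDouble)
    {
        bool isIntegral;
        union { long asLong; double asDouble; }
    }
    else
        const(char)[] repr;  // parsed by the consumer when needed
}
```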
 - Validating UTF is tricky; I've seen some discussion in this thread
 about it. On the face of it JSON only accepts valid UTF characters. As
 such, a modularity-based argument is to pipe UTF validation before
 tokenization. (We need a lazy UTF validator and sanitizer stat!) An
 efficiency-based argument is to do validation during tokenization. I'm
 inclining in favor of modularization, which allows us to focus on one
 thing at a time and do it well, instead of duplicating validation
 everywhere. Note that it's easy to write routines that do JSON
 tokenization and leave UTF validation for later, so there's a lot of
 flexibility in composing validation with JSONization.
It's unfortunate to see this change of mind in the face of the work that already went into the implementation. I also still think that this is a good optimization opportunity that doesn't really affect the implementation complexity. Validation isn't duplicated, but reused from std.utf.
Well if the validation is reused from std.utf, it can't have been very much work. I maintain that separating concerns seems like a good strategy here.
There is more than the actual call to validate(), such as writing tests and making sure the surroundings work, adjusting the interface and writing documentation. It's not *that* much work, but nonetheless wasted work. I also still think that this hasn't been a bad idea at all, because it speeds up the most important use case: parsing JSON from a non-memory source that has not yet been validated. I also very much like the idea of making it a programming error to have invalid UTF stored in a string, i.e. forcing the validation to happen before the cast from bytes to chars.
 - Litmus test: if the input type is a forward range AND if the string
 type chosen for tokens is the same as input type, successful
 tokenization should allocate exactly zero memory. I think this is a
 simple way to make sure that the tokenization API works well.
Supporting arbitrary forward ranges doesn't seem to be enough; it would at least have to be combined with something like take(), but then the type doesn't equal the string type anymore. I'd suggest keeping it to "if it is sliceable and the input type equals the string type", at least for the initial version.
I had "take" in mind. Don't forget that "take" automatically uses slices wherever applicable. So if you just use typeof(take(...)), you get the best of all worlds. The more restrictive version seems reasonable for the first release.
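For reference, a minimal sketch of the slicing behavior of take() that this relies on (the buffer contents are illustrative; the point is std.range's documented specialization for sliceable ranges):

```d
import std.range : take;

void main()
{
    // For ranges with slicing and length (such as dynamic arrays),
    // take() returns a slice of the same type instead of a Take!R
    // wrapper, so typeof(take(...)) can double as the token string
    // type without allocating:
    ubyte[] buf = [0x7b, 0x22, 0x61, 0x22, 0x7d];
    auto head = buf.take(2);
    static assert(is(typeof(head) == ubyte[]));
    assert(head is buf[0 .. 2]); // a true slice, zero allocation
}
```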
Okay.
 - The JSON value does its own internal allocation (for e.g. arrays and
 hashtables), which should be fine as long as it's encapsulated and we
 can tweak it later (e.g. make it use reference counting inside).
Since it's based on (Tagged)Algebraic, the internal types are part of the interface. Changing them later is bound to break some code. So AFAICS this would either require making the used types parameterized (string, array and AA types), or abstracting them away completely, i.e. only forwarding operations but denying direct access to the type. ... thinking about it, TaggedAlgebraic could do that, while Algebraic can't.
Well if you figure the general Algebraic type is better replaced by a type specialized for JSON, fine. What we shouldn't endorse is two nearly identical library types (Algebraic and TaggedAlgebraic) that are only different in subtle matters related to performance in certain use patterns. If integral tags are better for closed type universes, specialize Algebraic to use integral tags where applicable.
TaggedAlgebraic would not be a type specialized for JSON! It's useful for all kinds of applications and just happens to have some advantages here, too. An (imperfect) idea for merging this with the existing Algebraic name: template Algebraic(T) if (is(T == struct) || is(T == union)) { // ... implementation of TaggedAlgebraic ... } To avoid the ambiguity with a single type Algebraic, a UDA could be required for T to get the actual TaggedAlgebraic behavior. Everything else would be problematic, because TaggedAlgebraic needs to be supplied with names for the different types, so the Algebraic(T...) way of specifying allowed types doesn't really work. And, more importantly, because exploiting static type information in the generated interface means breaking code that currently is built around a Variant return value.
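A rough sketch of what this overload could look like; all names here are hypothetical, this is not actual Phobos or TaggedAlgebraic code:

```d
// Hypothetical: a union naming the allowed types, as a TaggedAlgebraic-style
// implementation expects. The member names become the tag names.
union JSONPayload
{
    typeof(null) null_;
    bool         boolean;
    double       number;
    string       text;
    // arrays/objects omitted for brevity
}

// The proposed overload would dispatch on struct/union arguments, leaving
// the existing variadic Algebraic(T...) untouched:
//
//     template Algebraic(T) if (is(T == struct) || is(T == union))
//     {
//         // ... implementation of TaggedAlgebraic ...
//     }
//
// A UDA on T could then opt in to the statically typed interface, so that
// existing Variant-returning code keeps compiling unchanged.
```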
 - Why both parseJSONStream and parseJSONValue? I'm thinking
 parseJSONValue would be enough because then you trivially parse a stream
 with repeated calls to parseJSONValue.
parseJSONStream is the pull parser (StAX style) interface. It returns the contents of a JSON document as individual nodes instead of storing them in a DOM. This part is vital for high-performance parsing, especially of large documents.
So perhaps this is just a naming issue. The names don't suggest everything you said. What I see is "parse a JSON stream" and "parse a JSON value". So I naturally assumed we're looking at consuming a full stream vs. consuming only one value off a stream and stopping. How about better names?
parseToJSONValue/parseToJSONStream? parseAsX?
 - readArray suddenly introduces a distinct kind of interaction -
 callbacks. Why? Should be a lazy range lazy range lazy range. An adapter
 using callbacks is then a two-liner.
It just has a more complicated implementation, but is already on the TODO list.
Great. Let me say again that with ranges you get to instantly tap into a wealth of tools. I say get rid of the callbacks and let a "tee" take care of it for whomever needs it.
The callbacks would surely be dropped once ranges become available. foreach() should usually be all that is needed.
 - Why is readBool even needed? Just readJSONValue and then enforce it as
 a bool. Same reasoning applies to readDouble and readString.
This is for lower level access; using parseJSONValue would certainly be possible, but it would have quite some unneeded overhead and would also be non-@nogc.
Meh, fine. But all of this is adding weight to the API in the wrong places.
Frankly, I don't think that this is even the wrong place. The pull parser interface is the single most important part of the API when we talk about allocation-less and high-performance operation. It also really has low weight, as it's just a small function that joins the other read* functions quite naturally and doesn't create any additional cognitive load.
 - readObject is with callbacks again - it would be nice if it were a
 lazy range.
Okay, is also already on the list.
Awes!
It could return a Tuple!(string, JSONNodeRange). But probably there should also be an opApply for the object field range, so that foreach (key, value; ...) becomes possible.
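A minimal sketch of such an opApply-based field range; JSONValue and the parser state here are placeholders, not the module's actual types:

```d
struct JSONValue { int dummy; } // placeholder for the real value type

struct JSONObjectRange
{
    // stand-ins for the underlying pull-parser state:
    string[]    keys;
    JSONValue[] values;

    int opApply(scope int delegate(string key, ref JSONValue value) dg)
    {
        foreach (i, k; keys)
            if (auto r = dg(k, values[i]))
                return r; // early exit from foreach
        return 0;
    }
}

void main()
{
    auto obj = JSONObjectRange(["a", "b"], [JSONValue(1), JSONValue(2)]);
    foreach (key, value; obj)
    {
        // visits ("a", 1) then ("b", 2), in document order
    }
}
```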
 But apart from that, algebraic is unfortunately currently quite unsuited
 for this kind of abstraction, even if that can be solved in theory (with
 a lot of work). It requires to write things like
 obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just
 obj["foo"], because it simply returns Variant from all of its forwarded
 operators.
Algebraic does not expose opIndex. We could add it to Algebraic such that obj["foo"] returns the same type as "this".
https://github.com/D-Programming-Language/phobos/blob/6df5d551fd8a21feef061483c226e7d9b26d6cd4/std/variant.d#L1088 https://github.com/D-Programming-Language/phobos/blob/6df5d551fd8a21feef061483c226e7d9b26d6cd4/std/variant.d#L1348
 It's easy for anyone to say that what's there is unfit for a particular
 purpose. It's also easy for many to define an ever-so-slightly-different
 new artifact that fits a particular purpose. Where you come as a
 talented hacker is to operate with the understanding of the importance
 of making things work, and make it work.
The problem is that making Algebraic exploit static type information means nothing short of a complete reimplementation, which TaggedAlgebraic is. It also means breaking existing code, if, for example, alg[0] suddenly returns a string instead of just a Variant with a string stored inside.
 - JSONValue should be more opaque and not expose representation as much
 as it does now. In particular, offering a built-in hashtable is bound to
 be problematic because those are expensive to construct, create garbage,
 and are not customizable. Instead, the necessary lookup and set APIs
 should be provided by JSONValue whilst keeping the implementation
 hidden. The same goes about array - a JSONValue shall not be exposed;
 instead, indexed access primitives should be exposed. Separate types
 might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary.
 The string type should be a type parameter of JSONValue.
This would unfortunately at the same time destroy almost all benefits that using (Tagged)Algebraic has, namely that it opens up the possibility of interoperability between different data formats (for example, passing a JSONValue to a BSON generator without letting the BSON generator know about JSON). This is unfortunately an area that I've also not yet properly explored, but I think it's important as we go forward with other data formats.
I think we need to do it. Otherwise we're stuck with "D's JSON API cannot be used without the GC". We want to escape that gravitational pull. I know it's hard. But it's worth it.
I can't fight the feeling that what Phobos currently has in terms of allocators, containers and reference counting is simply not mature enough to make a good decision here. Restricting JSONValue as much as possible would at least keep the possibility to extend it later, but I think that we can and should do better in the long term.
Aug 22 2015
parent reply "Martin Nowak" <code dawg.eu> writes:
On Saturday, 22 August 2015 at 13:41:49 UTC, Sönke Ludwig wrote:
 There is more than the actual call to validate(), such as 
 writing tests and making sure the surroundings work, adjusting 
 the interface and writing documentation. It's not *that* much 
 work, but nonetheless wasted work.

 I also still think that this hasn't been a bad idea at all. 
 Because it speeds up the most important use case, parsing JSON 
 from a non-memory source that has not yet been validated. I 
 also very much like the idea of making it a programming error 
 to have invalid UTF stored in a string, i.e. forcing the 
 validation to happen before the cast from bytes to chars.
Also see "utf/unicode should only be validated once" https://issues.dlang.org/show_bug.cgi?id=14919 If combining lexing and validation is faster (why?), then a ubyte-consuming interface should be available; though why couldn't it be done by adding a lazy ubyte->char validator range to std.utf? In any case, during lexing we should avoid autodecoding of narrow strings for redundant validation.
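No such adapter exists in std.utf today; a simplified sketch of the idea, delegating the actual checking to std.utf.decode, might look like this (names are hypothetical):

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;
import std.utf : decode, UTFException;

// Hypothetical lazy ubyte -> char validator: passes code units through
// unchanged, throwing UTFException on malformed sequences. decode() does
// the real checking (overlong encodings, surrogates, truncation).
struct ValidateUTF8(R) if (isInputRange!R && is(ElementType!R : ubyte))
{
    private R src;
    private char[4] buf;
    private size_t len, pos;

    this(R src) { this.src = src; fill(); }

    private void fill()
    {
        pos = len = 0;
        if (src.empty) return;
        // gather one code point's worth of bytes, then validate it
        while (true)
        {
            if (len == 4) throw new UTFException("invalid UTF-8 sequence");
            buf[len++] = cast(char) src.front;
            src.popFront();
            size_t i = 0;
            try { decode(buf[0 .. len], i); return; }
            catch (UTFException) { if (src.empty) throw; } // maybe just truncated
        }
    }

    bool empty() const { return len == 0; }
    char front() const { return buf[pos]; }
    void popFront() { if (++pos == len) fill(); }
}
```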
Aug 24 2015
parent reply =?UTF-8?Q?S=c3=b6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
Am 25.08.2015 um 07:55 schrieb Martin Nowak:
 On Saturday, 22 August 2015 at 13:41:49 UTC, Sönke Ludwig wrote:
 There is more than the actual call to validate(), such as writing
 tests and making sure the surroundings work, adjusting the interface
 and writing documentation. It's not *that* much work, but nonetheless
 wasted work.

 I also still think that this hasn't been a bad idea at all. Because it
 speeds up the most important use case, parsing JSON from a non-memory
 source that has not yet been validated. I also very much like the idea
 of making it a programming error to have invalid UTF stored in a
 string, i.e. forcing the validation to happen before the cast from
 bytes to chars.
Also see "utf/unicode should only be validated once" https://issues.dlang.org/show_bug.cgi?id=14919 If combining lexing and validation is faster (why?), then a ubyte-consuming interface should be available; though why couldn't it be done by adding a lazy ubyte->char validator range to std.utf? In any case, during lexing we should avoid autodecoding of narrow strings for redundant validation.
The performance benefit comes from the fact that almost all of JSON is a subset of ASCII, so that lexing the input will implicitly validate it as correct UTF. The only places where actual UTF sequences can occur are in string literals outside of escape sequences. Depending on the type of document, that can result in a lot fewer conditionals compared to a full validation of the input. Autodecoding during lexing is being avoided; everything happens on the code unit level.
Aug 25 2015
parent Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 08/25/2015 09:03 AM, Sönke Ludwig wrote:
 The performance benefit comes from the fact that almost all of JSON is a
 subset of ASCII, so that lexing the input will implicitly validate it as
 correct UTF. The only places where actual UTF sequences can occur are in
 string literals outside of escape sequences. Depending on the type of
 document, that can result in a lot fewer conditionals compared to a full
 validation of the input.
I see, then we should indeed exploit this fact and offer lexing of ubyte[]-ish ranges.
Aug 25 2015
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 - JSONValue should offer a byToken range, which offers the contents of
 the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
 token followed by three numeric tokens with the respective values
 followed by the ']' token.
What about the comma tokens?
Aug 19 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/19/15 8:42 AM, Timon Gehr wrote:
 On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 - JSONValue should offer a byToken range, which offers the contents of
 the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
 token followed by three numeric tokens with the respective values
 followed by the ']' token.
What about the comma tokens?
Forgot about those. The invariant is that byToken should return a sequence of tokens that, when parsed, produces the originating object. -- Andrei
Aug 19 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-19 19:29, Andrei Alexandrescu wrote:

 Forgot about those. The invariant is that byToken should return a
 sequence of tokens that, when parsed, produces the originating object.
That should be possible without the comma tokens in this case? -- /Jacob Carlborg
Aug 19 2015
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/19/15 1:59 PM, Jacob Carlborg wrote:
 On 2015-08-19 19:29, Andrei Alexandrescu wrote:

 Forgot about those. The invariant is that byToken should return a
 sequence of tokens that, when parsed, produces the originating object.
That should be possible without the comma tokens in this case?
That is correct, but it would do little more than confuse folks. FWIW the distinction is similar to AST vs. CST (C = Concrete). -- Andrei
Aug 19 2015
prev sibling parent reply Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 * All new stuff should go in std.experimental. I assume "stdx" would
 change to that, should this work be merged.
Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
Aug 24 2015
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/25/2015 08:18 AM, Martin Nowak wrote:
 On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 * All new stuff should go in std.experimental. I assume "stdx" would
 change to that, should this work be merged.
Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
The great thing about the experimental package is that we are actually allowed to rename it. :-)
Aug 25 2015
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/25/15 11:02 AM, Timon Gehr wrote:
 On 08/25/2015 08:18 AM, Martin Nowak wrote:
 On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 * All new stuff should go in std.experimental. I assume "stdx" would
 change to that, should this work be merged.
Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
The great thing about the experimental package is that we are actually allowed to rename it. :-)
I strongly oppose renaming it. I don't want Phobos to fall into the trap of javax, which was supposed to be "experimental" but then became unmovable. std.experimental is much more obvious that you shouldn't expect things to live there forever. -Steve
Aug 25 2015
prev sibling next sibling parent Martin Nowak <code+news.digitalmars dawg.eu> writes:
Will try to convert a piece of code I wrote a few days ago.
https://github.com/MartinNowak/rabbitmq-munin/blob/48c3e7451dec0dcb2b6dccbb9b4230b224e2e647/src/app.d
Right now working with json for trivial stuff is a pain.
Aug 25 2015
prev sibling next sibling parent reply tired_eyes <pastuhov85 gmail.com> writes:
So, what is the current status of std.data.json? This topic is 
almost two months old; what is the result of the "two week process"? 
The wiki page tells nothing except "ready for comments".
Sep 24 2015
parent Atila Neves <atila.neves gmail.com> writes:
On Thursday, 24 September 2015 at 20:44:57 UTC, tired_eyes wrote:
 So, what is the current status of std.data.json? This topic is 
 almost two months old; what is the result of the "two week process"? 
 The wiki page tells nothing except "ready for comments".
I probably should have posted here. Soenke is working on all the comments as far as I know. It'll come back. Atila
Sep 24 2015
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 28 Jul 2015 14:07:18 +0000
schrieb "Atila Neves" <atila.neves gmail.com>:

 Start of the two week process, folks.
 
 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 
 Atila
There is one thing I noticed today that I personally feel strongly about: serialized double values are not restored accurately. That is, when I send a double value via JSON and use enough digits to represent it accurately, it may not be decoded to the same value. `std.json` does not have this problem with the random values from [0..1) that I tested with. I also tried `LexOptions.useBigInt/.useLong` to no avail. Looking at the unittests, it seems the decision was deliberate, as `approxEqual` is used in parsing tests. The JSON specs don't enforce any specific accuracy, but they say that you can arrange for lossless transmission of the widely supported IEEE double values by using up to 17 significant digits. -- Marco
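The 17-digit rule Marco refers to can be checked like this, assuming a correctly rounded string-to-double conversion (which is what makes the round trip lossless):

```d
import std.conv   : to;
import std.format : format;

void main()
{
    // IEEE 754 doubles survive a text round trip when printed with
    // 17 significant digits; fewer digits (like the default %g) may not:
    double x = 0.1 + 0.2;
    string s = format("%.17g", x);   // "0.30000000000000004"
    double y = s.to!double;
    assert(y == x); // bit-exact, given correctly rounded parsing
}
```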
Oct 02 2015
prev sibling parent reply Alex <a b.c> writes:
JSON is a particular file format useful for serialising 
hierarchical data.

Given that D also has an XML module which appears to be 
deprecated, I wonder if it would be better to write a more 
abstract serialisation/persistence module that could use either 
json, xml, some binary format and future formats.

I would estimate that more than 70% of the times, the JSON data 
will only be read and written by a single D application, with 
only occasional inspection by developers etc.
In these cases it is undesirable to have code littered with types 
coming from a particular serialisation file format library.
As the software evolves that file format might become 
obsolete/slow/unfashionable etc, and it would be much nicer if 
the format could be changed without a lot of code being touched.
The other 30% of uses will genuinely need raw JSON control when 
reading/writing files written/read by other software, and this 
needs to be in Phobos to implement the backends.
It would be better for most people to not write their code in 
terms of JSON, but in terms of the more abstract concept of 
persistence/serialisation (whatever you want to call it).
Oct 06 2015
next sibling parent =?UTF-8?Q?S=c3=b6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
Am 06.10.2015 um 12:05 schrieb Alex:
 JSON is a particular file format useful for serialising hierarchical data.

 Given that D also has an XML module which appears to be deprecated, I
 wonder if it would be better to write a more abstract
 serialisation/persistence module that could use either json, xml, some
 binary format and future formats.

 I would estimate that more than 70% of the times, the JSON data will
 only be read and written by a single D application, with only occasional
 inspection by developers etc.
 In these cases it is undesirable to have code littered with types coming
 from a particular serialisation file format library.
 As the software evolves that file format might become
 obsolete/slow/unfashionable etc, and it would be much nicer if the
 format could be changed without a lot of code being touched.
 The other 30% of uses will genuinely need raw JSON control when
 reading/writing files written/read by other software, and this needs to
 be in Phobos to implement the backends.
 It would be better for most people to not write their code in terms of
 JSON, but in terms of the more abstract concept of
 persistence/serialisation (whatever you want to call it).
A generic serialization framework is definitely needed! Jacob Carlborg had once tried to get the Orange[1] serialization library into Phobos, but the amount of requested changes was quite overwhelming and it hasn't worked out so far. There is also a serialization framework in vibe.d[2], but in contrast to Orange it doesn't handle cross references (for pointers/reference types). But this is definitely outside of the scope of this particular module and will require a separate effort. It is intended to be well suited for that purpose, though. [1]: https://github.com/jacob-carlborg/orange [2]: http://vibed.org/api/vibe.data.serialization/
Oct 06 2015
prev sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Tuesday, 6 October 2015 at 10:05:46 UTC, Alex wrote:
 I wonder if it would be better to write a more abstract 
 serialisation/persistance module that could use either 
 json,xml,some binary format and future formats.
I think there are too many particulars making an abstract (de)serialization module unworkable. If that wasn't the case it would be easy to transform any format into another, by simply deserializing from format A and serializing to format B. But a little experiment will show you that it requires a lot of complexity for the non-trivial case. And the format's particulars will still show up in your code. At which point it begs the question, why not just write simple primitive (de)serialization modules that only do one format? Probably easier to build, maintain and debug. I am reminded of a binary file format I once wrote which supported referenced objects and had enough meta-data to allow garbage collection. It was a big ugly c++ template monster. Any abstract deserializer is going to stay away from that.
Oct 06 2015
parent Atila Neves <atila.neves gmail.com> writes:
On Tuesday, 6 October 2015 at 15:47:08 UTC, Sebastiaan Koppe 
wrote:
 At which point it begs the question, why not just write simple 
 primitive (de)serialization modules that only do one format? 
 Probably easier to build, maintain and debug.
The binary one is the one I care about, so that's the one I wrote: https://github.com/atilaneves/cerealed I've thinking of adding other formats. I don't know if it's worth it. Atila
Oct 06 2015