
digitalmars.D - std.data.json formal review

reply "Atila Neves" <atila.neves gmail.com> writes:
Start of the two week process, folks.

Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/

Atila
Jul 28 2015
next sibling parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no, unless there is some sort of proof that it will work with allocators. I have used the code from its vibe.d days, so it's not an issue of how well it works, nor nit-picking. Just: can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request than, you know, rely on the GC.
Jul 28 2015
next sibling parent "Etienne Cimon" <etcimon gmail.com> writes:
On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.
I totally agree with that, but shouldn't it be consistent across Phobos? I don't think it's possible to make an interface for custom allocators right now, because that question simply hasn't been ironed out along with std.allocator. So anything related to allocators belongs in another thread, IMO, and the review process here should be about the actual JSON interface.
Jul 28 2015
prev sibling next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no.
Just a reminder that this is the review thread, not the vote thread (in case anyone reading got confused).
 Unless there is some sort of proof that it will work with 
 allocators.

 I have used the code from vibe.d days so its not an issue of 
 how well it works nor nit picky. Just can I pass it an 
 allocator (optionally) and have it use that for all memory 
 usage?

 After all, I really would rather be able to deallocate all 
 memory allocated during a request then you know, rely on the GC.
That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
Jul 28 2015
next sibling parent reply "Etienne Cimon" <etcimon gmail.com> writes:
On Tuesday, 28 July 2015 at 15:55:04 UTC, Brad Anderson wrote:
 On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole 
 wrote:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no.
Just a reminder that this is the review thread, not the vote thread (in case anyone reading got confused).
 Unless there is some sort of proof that it will work with 
 allocators.

 I have used the code from vibe.d days so its not an issue of 
 how well it works nor nit picky. Just can I pass it an 
 allocator (optionally) and have it use that for all memory 
 usage?

 After all, I really would rather be able to deallocate all 
 memory allocated during a request then you know, rely on the 
 GC.
That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
From what I see of std.allocator, there's no Allocator interface? I think this would require changing the type to `struct JSONValue(Allocator)`, unless we see an actual interface implemented in Phobos.
Jul 28 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 4:23 a.m., Etienne Cimon wrote:
 On Tuesday, 28 July 2015 at 15:55:04 UTC, Brad Anderson wrote:
 On Tuesday, 28 July 2015 at 15:07:46 UTC, Rikki Cattermole wrote:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no.
Just a reminder that this is the review thread, not the vote thread (in case anyone reading got confused).
 Unless there is some sort of proof that it will work with allocators.

 I have used the code from vibe.d days so its not an issue of how well
 it works nor nit picky. Just can I pass it an allocator (optionally)
 and have it use that for all memory usage?

 After all, I really would rather be able to deallocate all memory
 allocated during a request then you know, rely on the GC.
That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
From what I see from std.allocator, there's no Allocator interface? I think this would require changing the type to `struct JSONValue(Allocator)`, unless we see an actual interface implemented in phobos.
There is one: IAllocator. I use it throughout std.experimental.image. Unfortunately the site is down at the moment so I can't link the docs *grumbles*. By the way, even if an allocator is a struct, there is a type to wrap it up in a class.
Jul 28 2015
prev sibling parent reply Mathias Lang via Digitalmars-d <digitalmars-d puremagic.com> writes:
2015-07-28 17:55 GMT+02:00 Brad Anderson via Digitalmars-d <
digitalmars-d puremagic.com>:

  Unless there is some sort of proof that it will work with allocators.
 I have used the code from vibe.d days so its not an issue of how well it
 works nor nit picky. Just can I pass it an allocator (optionally) and have
 it use that for all memory usage?

 After all, I really would rather be able to deallocate all memory
 allocated during a request then you know, rely on the GC.
That's a good point. This is the perfect opportunity to hammer out how allocators are going to be integrated into other parts of Phobos.
Allocators are definitely a separate issue. std.allocator is a moving target: it's not yet part of a release, and consequently barely field-tested. We will find bugs, we might find design mistakes, we might head in a direction that turns out to be an anti-pattern (just like `opDispatch` for JSONValue ;) ). That's not to say the quality of the module isn't good - that would mean our release process is broken - but making a module's inclusion in experimental dependent on another experimental module will not improve the quality of the reviewed module.
Jul 28 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 4:25 a.m., Mathias Lang via Digitalmars-d wrote:
 2015-07-28 17:55 GMT+02:00 Brad Anderson via Digitalmars-d
 <digitalmars-d puremagic.com <mailto:digitalmars-d puremagic.com>>:


         Unless there is some sort of proof that it will work with
         allocators.

         I have used the code from vibe.d days so its not an issue of how
         well it works nor nit picky. Just can I pass it an allocator
         (optionally) and have it use that for all memory usage?

         After all, I really would rather be able to deallocate all
         memory allocated during a request then you know, rely on the GC.


     That's a good point. This is the perfect opportunity to hammer out
     how allocators are going to be integrated into other parts of Phobos.


 Allocator is definitely a separate issue. It's a moving target, it's not
 yet part of a release, and consequently barely field-tested. We will
 find bugs, we might find design mistakes, we might head in a direction
 which will turn out to be an anti-pattern (just like `opDispatch` for
 JSONValue ;) )
 It's not to say the quality of the module isn't good - that would mean
 our release process is broken -, but making a module inclusion to
 experimental dependent on another module in experimental will not
 improve the quality of the reviewed module.
Right now we just need a plan, and we're all good for std.data.json. It doesn't need to be implemented right now, but I'd rather we had a plan going forward for adding allocators to it than, you know, find out a year down the track that it needs a whole rewrite.
Jul 28 2015
prev sibling next sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.
If you pass a string or byte array as input, then there will be no allocations at all (the interface is @nogc). For other cases it supports custom allocation through an appender factory [1][2], since there is no standard allocator interface yet. But since that's the only place where memory is allocated (apart from lower-level code, such as BigInt), as soon as Appender supports custom allocators, or you write your own appender, the JSON parser will, too. Only if you use the DOM parser will there be some inevitable GC allocations, because the DOM representation uses dynamic and associative arrays.

1: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/lexer.d#L66
2: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/parser.d#L286
Jul 28 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 4:41 a.m., Sönke Ludwig wrote:
 Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:
 On 29/07/2015 2:07 a.m., Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Right now, my view is no. Unless there is some sort of proof that it will work with allocators. I have used the code from vibe.d days so its not an issue of how well it works nor nit picky. Just can I pass it an allocator (optionally) and have it use that for all memory usage? After all, I really would rather be able to deallocate all memory allocated during a request then you know, rely on the GC.
If you pass a string or byte array as input, then there will be no allocations at all (the interface is nogc). For other cases it supports custom allocation through an appender factory [1][2], since there is no standard allocator interface, yet. But since that's the only place where memory is allocated (apart from lower level code, such as BigInt), as soon as Appender supports custom allocators, or you write your own appender, the JSON parser will, too. Only if you use the DOM parser, there will be some inevitable GC allocations, because the DOM representation uses dynamic and associative arrays. 1: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/lexer.d#L66 2: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/parser.d#L286
It was after 3am when I did my initial look, but I saw the appender usage. I'm OK with this. The DOM parser, on the other hand... ugh, this is where we do need IAllocator being used. Although by the sounds of it, we would need a map collection which supports allocators before that can be done.
Jul 28 2015
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:
 I have used the code from vibe.d days so its not an issue of how well it
 works nor nit picky.
You should still have a closer look, as it isn't very similar to the vibe.d code at all, but a rather radical evolution.
Jul 28 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 29/07/2015 4:43 a.m., Sönke Ludwig wrote:
 Am 28.07.2015 um 17:07 schrieb Rikki Cattermole:
 I have used the code from vibe.d days so its not an issue of how well it
 works nor nit picky.
You should still have a closer look, as it isn't very similar to the vibe.d code at all, but a rather radical evolution.
Again, it was after 3am when I first looked. I'll take a closer look and create a new thread on this post about anything I find.
Jul 28 2015
prev sibling next sibling parent reply "Etienne Cimon" <etcimon gmail.com> writes:
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
This is cool: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/value.d#L183

I was getting tired of programmatically checking for null, then checking for the object type, before moving along in the object and doing the same recursively. Not quite as intuitive as the optional chaining `?.` operator in Swift, but it gets pretty close: https://blog.sabintsev.com/optionals-in-swift-c94fd231e7a4#5622
Jul 28 2015
parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 28.07.2015 um 17:19 schrieb Etienne Cimon:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
This is cool: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/value.d#L183 I was getting tired of programmatically checking for null, then checking for object type, before moving along in the object and doing the same recursively. Not quite as intuitive as the optional chaining ?. operator in swift but it gets pretty close https://blog.sabintsev.com/optionals-in-swift-c94fd231e7a4#5622
An idea might be to support something like this:

    json_value.opt.foo.bar[2].baz

or

    opt(json_value).foo.bar[2].baz

opt (name is debatable) would return a wrapper struct around the JSONValue that supports opDispatch/opIndex and propagates a missing field to the top gracefully. It could also keep track of the complete path to give a nice error message when a non-existent value is dereferenced.
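A minimal sketch of such a wrapper, written here against Phobos' std.json JSONValue purely for illustration (the names OptValue/opt and all details are assumptions, not the reviewed module's API; path tracking for error messages is omitted):

```d
import std.json;

// Hypothetical non-throwing accessor: a missing field anywhere in the
// chain yields an empty OptValue instead of an exception.
struct OptValue
{
    private const(JSONValue)* value;

    bool exists() const { return value !is null; }

    // opt(j).foo forwards to opIndex("foo")
    OptValue opDispatch(string name)() const { return this[name]; }

    OptValue opIndex(string name) const
    {
        if (value !is null && value.type == JSONType.object)
        {
            if (auto field = name in value.object)
                return OptValue(field);
        }
        return OptValue(null); // missing field propagates gracefully
    }

    OptValue opIndex(size_t i) const
    {
        if (value !is null && value.type == JSONType.array && i < value.array.length)
            return OptValue(&value.array[i]);
        return OptValue(null);
    }
}

OptValue opt(ref const JSONValue v) { return OptValue(&v); }
```

With that, `opt(json_value).foo.bar[2].baz.exists` answers whether the whole path resolves, without any intermediate null checks.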
Jul 28 2015
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Tuesday, 28 July 2015 at 18:45:51 UTC, Sönke Ludwig wrote:
 An idea might be to support something like this:

 json_value.opt.foo.bar[2].baz
 or
 opt(json_value).foo.bar[2].baz

 opt (name is debatable) would return a wrapper struct around 
 the JSONValue that supports opDispatch/opIndex and propagates a 
 missing field to the top gracefully. It could also keep track 
 of the complete path to give a nice error message when a 
 non-existent value is dereferenced.
+1. This would solve the cumbersome access of deeply nested values that I've had to deal with when using stdx.data.json. Combine that with the Algebraic improvements you've mentioned before and it'll be just about as pleasant to use as it could be.
Jul 28 2015
prev sibling parent "Etienne Cimon" <etcimon gmail.com> writes:
On Tuesday, 28 July 2015 at 18:45:51 UTC, Sönke Ludwig wrote:
 Am 28.07.2015 um 17:19 schrieb Etienne Cimon:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
This is cool: https://github.com/s-ludwig/std_data_json/blob/aac6d846d596750623fd5c546343f4f9d19447fa/source/stdx/data/json/value.d#L183 I was getting tired of programmatically checking for null, then checking for object type, before moving along in the object and doing the same recursively. Not quite as intuitive as the optional chaining ?. operator in swift but it gets pretty close https://blog.sabintsev.com/optionals-in-swift-c94fd231e7a4#5622
An idea might be to support something like this: json_value.opt.foo.bar[2].baz or opt(json_value).foo.bar[2].baz opt (name is debatable) would return a wrapper struct around the JSONValue that supports opDispatch/opIndex and propagates a missing field to the top gracefully. It could also keep track of the complete path to give a nice error message when a non-existent value is dereferenced.
I like it quite well. No, actually, a lot. Thinking about it some more... this could end up being the most convenient feature ever known to mankind and would likely push it towards a new age of grand discoveries, infinite fusion power and space colonization. Let's do it.
Jul 28 2015
prev sibling next sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 28.07.2015 um 16:07 schrieb Atila Neves:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Thanks for making it happen! Can you also post a quick link to this thread in D.announce?
Jul 28 2015
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 7:07 AM, Atila Neves wrote:
 Start of the two week process, folks.
Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager.

Just looking at the documentation only, some general notes:

1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs.

2. JSON is a trivial format, http://json.org/. But I count 6 files and 30 names in the public API.

3. Stepping back a bit, when I think of parsing JSON data, I think:

    auto ast = inputrange.toJSON();

where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary. To create output:

    auto r = ast.toChars();  // r is an InputRange of characters
    writeln(r);

So, we'll need:

    toJSON
    toChars
    JSONException

The possible JSON values are:

    string
    number
    object (associative arrays)
    array
    true
    false
    null

Since these are D builtin types, they can actually be a simple union of D builtin types.

There is a decision needed about whether toJSON() allocates data or returns slices into its input range. This can be 'static if' tested by: if the input range can return immutable slices. toChars() can take a compile-time argument to determine if it is 'pretty' or not.
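The "simple union of D builtin types" Walter describes can be sketched today with std.variant's Algebraic, where `This` supplies the recursive object/array cases (the alias name `JSON` is illustrative, not a proposed API):

```d
import std.variant;

// A JSON value is one of: null, boolean, number, string,
// array of values, or object (string -> value map).
alias JSON = Algebraic!(typeof(null), bool, double, string,
                        This[], This[string]);
```

A value can then be queried for its kind via `peek`/`get` overload-style dispatch, much as Walter suggests.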
Jul 28 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote:
[...]
 3. Stepping back a bit, when I think of parsing JSON data, I think:
 
     auto ast = inputrange.toJSON();
 
 where toJSON() accepts an input range and produces a container, the
 ast. The ast is just a JSON value. Then, I can just query the ast to
 see what kind of value it is (using overloading), and walk it as
 necessary.
+1. The API should be as simple as possible. Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then you can just use to() to convert between a JSON container and the value that it represents (assuming the types are compatible). OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once). I'm not sure what a good API for that would be, though.
 To create output:
 
     auto r = ast.toChars();  // r is an InputRange of characters
     writeln(r);
 
 So, we'll need:
     toJSON
     toChars
Shouldn't it just be toString()? [...]
 The possible JSON values are:
     string
     number
     object (associative arrays)
     array
     true
     false
     null
 
 Since these are D builtin types, they can actually be a simple union
 of D builtin types.
 
 There is a decision needed about whether toJSON() allocates data or
 returns slices into its inputrange. This can be 'static if' tested by:
 if inputrange can return immutable slices. toChars() can take a
 compile time argument to determine if it is 'pretty' or not.
Whether or not toJSON() allocates *data*, it will have to allocate container nodes of some sort. At a minimum, it will need to use AAs, so it cannot be @nogc.

T

-- 
Recently, our IT department hired a bug-fix engineer. He used to work for Volkswagen.
Jul 28 2015
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 3:37 PM, H. S. Teoh via Digitalmars-d wrote:
 On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d
wrote:
 Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then
 you can just use to() to convert between a JSON container and the value
 that it represents (assuming the types are compatible).
Well, I wouldn't want std.conv to be importing std.json.
 OTOH, some people might want the option of parser-driven data processing
 instead (e.g. the JSON data is very large and we don't want to store the
 whole thing in memory at once).
That is a good point.
 I'm not sure what a good API for that would be, though.
Probably simply returning an InputRange of JSON values.
 To create output:

      auto r = ast.toChars();  // r is an InputRange of characters
      writeln(r);

 So, we'll need:
      toJSON
      toChars
Shouldn't it just be toString()?
No. toString() returns a string, which has to be allocated. toChars() (an upcoming convention) would return an InputRange instead, side-stepping allocation.
 Whether or not toJSON() allocates *data*, it will have to allocate
 container nodes of some sort. At the minimum, it will need to use AA's,
 so it cannot be  nogc.
That's right. At some point the API will need to add a parameter for Andrei's allocator system.
Jul 28 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jul 28, 2015 at 03:55:22PM -0700, Walter Bright via Digitalmars-d wrote:
 On 7/28/2015 3:37 PM, H. S. Teoh via Digitalmars-d wrote:
On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote:
Ideally, I'd say hook it up to std.conv.to for maximum flexibility.
Then you can just use to() to convert between a JSON container and
the value that it represents (assuming the types are compatible).
Well, I wouldn't want std.conv to be importing std.json.
I'm pretty sure std.conv has interfaces that allow you to keep JSON-specific stuff in std.json, so that you don't get the JSON conversion capability until you actually import std.json.
OTOH, some people might want the option of parser-driven data
processing instead (e.g. the JSON data is very large and we don't
want to store the whole thing in memory at once).
That is a good point.
I'm not sure what a good API for that would be, though.
Probably simply returning an InputRange of JSON values.
But how would you capture the nesting substructures?
To create output:

     auto r = ast.toChars();  // r is an InputRange of characters
     writeln(r);

So, we'll need:
     toJSON
     toChars
Shouldn't it just be toString()?
No. toString() returns a string, which has to be allocated. toChars() (an upcoming convention) would return an InputRange instead, side-stepping allocation.
[...] ??! Surely you have heard of the non-allocating overload of toString?

    void toString(scope void delegate(const(char)[]) dg);

T

-- 
When solving a problem, take care that you do not become part of the problem.
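For reference, a type opts into that overload like this (`Point` is an illustrative example, not from the thread): std.format hands the method a sink delegate that writes straight to the output, so the type itself allocates no intermediate string.

```d
import std.format : format, formattedWrite;

struct Point
{
    int x, y;

    // The non-allocating sink overload: called by std.format with a
    // delegate that forwards chunks directly to the destination.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink.formattedWrite("(%s, %s)", x, y);
    }
}
```

`format("%s", Point(1, 2))` then routes through the sink rather than building a temporary string inside Point.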
Jul 28 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 5:15 PM, H. S. Teoh via Digitalmars-d wrote:
 Probably simply returning an InputRange of JSON values.
But how would you capture the nesting substructures?
A JSON value is a tagged union of the various types.
 ??!  Surely you have heard of the non-allocating overload of toString?
 	void toString(scope void delegate(const(char)[]) dg);
Not range friendly.
Jul 28 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-07-29 06:57, Walter Bright wrote:

 A JSON value is a tagged union of the various types.
But in most cases I think there will be one root node, of type object. In that case it would be a range with only one element? How does that help?

-- 
/Jacob Carlborg
Jul 29 2015
next sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 12:10 schrieb Jacob Carlborg:
 On 2015-07-29 06:57, Walter Bright wrote:

 A JSON value is a tagged union of the various types.
But in most cases I think there will be one root node, of type object. In that case it would be range with only one element? How does that help?
I think a better approach than adding such a special case is to add a readValue function that takes a range of parser nodes and reads it into a single JSONValue. That way one can use the pull parser to jump between array or object entries and then extract individual values, or maybe even use nodes.map!readValue to get a range of values...
Jul 29 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 3:10 AM, Jacob Carlborg wrote:
 On 2015-07-29 06:57, Walter Bright wrote:

 A JSON value is a tagged union of the various types.
But in most cases I think there will be one root node, of type object.
An object is a collection of other Values.
 In that case it would be range with only one element? How does that help?
I don't understand the question.
Jul 29 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-07-29 20:33, Walter Bright wrote:

 On 7/29/2015 3:10 AM, Jacob Carlborg wrote:
 But in most cases I think there will be one root node, of type object.
An object is a collection of other Values.

 In that case it would be range with only one element? How does that help?

I don't understand the question.
I guess I'm finding it difficult to picture a JSON structure as a range. How would the following JSON be returned as a range?

    {
        "a": 1,
        "b": [2, 3],
        "c": { "d": 4 }
    }

-- 
/Jacob Carlborg
Jul 29 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 11:51 AM, Jacob Carlborg wrote:
 I guess I'm finding it difficult to picture a JSON structure as a range. How
 would the following JSON be returned as a range?

 {
    "a": 1,
    "b": [2, 3],
    "c": { "d": 4 }
 }
If it was returned as a range of nodes, it would be:

    Object, string, number, string, array, number, number, end, string, object, string, number, end, end

If it was returned as a Value, then you could ask the value to return a range of nodes. A container is not a range, although it may offer a way to get a range that iterates over its contents.
Jul 29 2015
parent Jacob Carlborg <doob me.com> writes:
On 2015-07-30 01:34, Walter Bright wrote:

 It if was returned as a range of nodes, it would be:

     Object, string, number, string, array, number, number, end, string,
 object, string, number, end, end
Ah, that makes sense. I never thought of an "end" mark like that; pretty clever.

-- 
/Jacob Carlborg
Jul 30 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 3:55 PM, Walter Bright wrote:
 OTOH, some people might want the option of parser-driven data processing
 instead (e.g. the JSON data is very large and we don't want to store the
 whole thing in memory at once).
That is a good point.
So it appears that JSON can be in one of 3 useful states:

1. a range of characters (rc)
2. a range of nodes (rn)
3. a container of JSON values (values)

What's necessary is simply the ability to convert between these states (names are just for illustration):

    rn = rc.toNodes();
    values = rn.toValues();
    rn = values.toNodes();
    rc = rn.toChars();

So, if I wanted to simply pretty-print a JSON string s:

    s.toNodes.toChars();

I.e. it's all composable.
Jul 28 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jul 28, 2015 at 10:43:20PM -0700, Walter Bright via Digitalmars-d wrote:
 On 7/28/2015 3:55 PM, Walter Bright wrote:
OTOH, some people might want the option of parser-driven data
processing instead (e.g. the JSON data is very large and we don't
want to store the whole thing in memory at once).
That is a good point.
So it appears that JSON can be in one of 3 useful states:

1. a range of characters (rc)
2. a range of nodes (rn)
3. a container of JSON values (values)
[...] How does a linear range of nodes convey a nested structure? T -- Let's call it an accidental feature. -- Larry Wall
Jul 28 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 10:49 PM, H. S. Teoh via Digitalmars-d wrote:
 How does a linear range of nodes convey a nested structure?
You'd need to add a special node type, 'end'. So an array [1,true] would look like:

    array
    number
    true
    end
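As a sketch, the flattening Walter describes could be written against std.json's DOM type like this (`NodeKind` and `toNodes` are illustrative names, not part of any proposed API; here booleans collapse into one node kind):

```d
import std.json;

// Node kinds for a linearised JSON stream; `end` closes the most
// recently opened object or array, conveying the nesting structure.
enum NodeKind { object, array, str, number, boolean, null_, end }

NodeKind[] toNodes(const JSONValue v)
{
    switch (v.type)
    {
        case JSONType.object:
        {
            NodeKind[] r = [NodeKind.object];
            foreach (key, child; v.object)
                r ~= NodeKind.str ~ toNodes(child); // key node, then value nodes
            return r ~ NodeKind.end;
        }
        case JSONType.array:
        {
            NodeKind[] r = [NodeKind.array];
            foreach (child; v.array)
                r ~= toNodes(child);
            return r ~ NodeKind.end;
        }
        case JSONType.string:
            return [NodeKind.str];
        case JSONType.integer:
        case JSONType.uinteger:
        case JSONType.float_:
            return [NodeKind.number];
        case JSONType.true_:
        case JSONType.false_:
            return [NodeKind.boolean];
        default:
            return [NodeKind.null_];
    }
}
```

For `[1,true]` this yields array, number, boolean, end, matching Walter's example.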
Jul 29 2015
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 07:43 schrieb Walter Bright:
 On 7/28/2015 3:55 PM, Walter Bright wrote:
 OTOH, some people might want the option of parser-driven data processing
 instead (e.g. the JSON data is very large and we don't want to store the
 whole thing in memory at once).
That is a good point.
So it appears that JSON can be in one of 3 useful states: 1. a range of characters (rc) 2. a range of nodes (rn) 3. a container of JSON values (values) What's necessary is simply the ability to convert between these states: (names are just for illustration) rn = rc.toNodes(); values = rn.toValues(); rn = values.toNodes(); rc = rn.toChars(); So, if I wanted to simply pretty print a JSON string s: s.toNodes.toChars(); I.e. it's all composable.
There are actually even four levels:

1. Range of characters
2. Range of tokens
3. Range of nodes
4. DOM value

Having a special case for a range of DOM values may or may not be a worthwhile thing to optimize for handling big JSON arrays of values. But there is always the pull parser for that kind of data processing.

Currently not all, but most, conversions between the levels are implemented, and sometimes a level is skipped for efficiency. The question is if it would be worth the effort and the API complexity to implement all of them:

    lexJSON:         character range -> token range
    parseJSONStream: character range -> node range
    parseJSONStream: token range -> node range
    parseJSONValue:  character range -> DOM value
    parseJSONValue:  token range -> DOM value (same for toJSONValue)
    writeJSON:       token range -> character range (output range)
    writeJSON:       node range -> character range (output range)
    writeJSON:       DOM value -> character range (output range)
    (same for toJSON with string output)

Adding an InputStream based version of writeJSON would be an option, but the question is how performant that would be and how to go about implementing the number->InputRange functionality.
Jul 29 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 1:37 AM, Sönke Ludwig wrote:
 There are actually even four levels:
 1. Range of characters
 2. Range of tokens
 3. Range of nodes
 4. DOM value
What's the need for users to see a token stream? I don't know what the DOM value is - is that just JSON as an ast?
 Having a special case for range of DOM values may or may not be a worthwhile
 thing to optimize for handling big JSON arrays of values.
I see no point for that.
 Currently not all, but most, conversions between the levels are implemented,
and
 sometimes a level is skipped for efficiency. The question is if it would be
 worth the effort and the API complexity to implement all of them.

 lexJSON: character range -> token range
 parseJSONStream: character range -> node range
 parseJSONStream: token range -> node range
 parseJSONValue: character range -> DOM value
 parseJSONValue: token range -> DOM value (same for toJSONValue)
 writeJSON: token range -> character range (output range)
 writeJSON: node range -> character range (output range)
 writeJSON: DOM value -> character range (output range)
 writeJSON: to -> character range (output range)
 (same for toJSON with string output)
I don't see why there are more than the 3 I mentioned.
Jul 29 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 20:44, Walter Bright wrote:
 On 7/29/2015 1:37 AM, Sönke Ludwig wrote:
 There are actually even four levels:
 1. Range of characters
 2. Range of tokens
 3. Range of nodes
 4. DOM value
What's the need for users to see a token stream? I don't know what the DOM value is - is that just JSON as an ast?
Yes.
 Having a special case for range of DOM values may or may not be a
 worthwhile
 thing to optimize for handling big JSON arrays of values.
I see no point for that.
Hm, I misread "container of JSON values" as "range of JSON values". I guess you just meant JSONValue, so my comment doesn't apply.
 Currently not all, but most, conversions between the levels are
 implemented, and
 sometimes a level is skipped for efficiency. The question is if it
 would be
 worth the effort and the API complexity to implement all of them.

 lexJSON: character range -> token range
 parseJSONStream: character range -> node range
 parseJSONStream: token range -> node range
 parseJSONValue: character range -> DOM value
 parseJSONValue: token range -> DOM value (same for toJSONValue)
 writeJSON: token range -> character range (output range)
 writeJSON: node range -> character range (output range)
 writeJSON: DOM value -> character range (output range)
 writeJSON: to -> character range (output range)
 (same for toJSON with string output)
I don't see why there are more than the 3 I mentioned.
The token level is useful for reasoning about the text representation. It could be used for example to implement syntax highlighting, or for using the location information to mark errors in the source code.
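As a sketch of what the token level enables (the `kind` and `location` fields used here are assumptions about the token type, not confirmed API):

```d
import stdx.data.json;
import std.stdio : writefln;

// Walk the token stream and report each token's source location - the
// kind of information a syntax highlighter or error reporter needs.
void dumpTokens(string source)
{
    foreach (token; lexJSON(source))
        writefln("%s at line %s, column %s",
                 token.kind, token.location.line, token.location.column);
}
```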
Jul 29 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 1:41 PM, Sönke Ludwig wrote:
 The token level is useful for reasoning about the text representation. It could
 be used for example to implement syntax highlighting, or for using the location
 information to mark errors in the source code.
Ok, I see your point. The question then becomes does the node stream really add enough value to justify its existence, as it greatly overlaps the token stream.
Jul 29 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 30.07.2015 at 05:25, Walter Bright wrote:
 On 7/29/2015 1:41 PM, Sönke Ludwig wrote:
 The token level is useful for reasoning about the text representation.
 It could
 be used for example to implement syntax highlighting, or for using the
 location
 information to mark errors in the source code.
Ok, I see your point. The question then becomes does the node stream really add enough value to justify its existence, as it greatly overlaps the token stream.
I agree that in case of JSON their difference can be a bit subtle. Basically the node stream adds knowledge about the nesting of elements, as well as adding semantic meaning to special token sequences that the library users would otherwise have to parse themselves. Finally, it also guarantees a valid JSON structure, while a token range could have tokens in any order. The knowledge about nesting in particular is also a requirement for the high-level pull parser functions (skipToKey, readArray, readString etc.) that make working with that kind of pull parser interface actually bearable, outside of mechanical code like a generic serialization framework.
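A hedged sketch of how the helpers named above might be used (skipToKey, readArray and readString appear in this thread and the linked docs, but the exact signatures shown here are assumptions):

```d
import stdx.data.json;

// Pull all strings out of {"names": ["a", "b", ...], ...} without
// building a DOM. skipToKey relies on the node stream's knowledge of
// object nesting; readArray relies on the guaranteed valid structure.
string[] readNames(string json)
{
    auto nodes = parseJSONStream(json);
    string[] names;
    nodes.skipToKey("names");
    nodes.readArray!({
        names ~= nodes.readString();
    });
    return names;
}
```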
Jul 30 2015
prev sibling next sibling parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 00:37, H. S. Teoh via Digitalmars-d wrote:
 On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d
wrote:
 [...]
 3. Stepping back a bit, when I think of parsing JSON data, I think:

      auto ast = inputrange.toJSON();

 where toJSON() accepts an input range and produces a container, the
 ast. The ast is just a JSON value. Then, I can just query the ast to
 see what kind of value it is (using overloading), and walk it as
 necessary.
+1. The API should be as simple as possible.
http://s-ludwig.github.io/std_data_json/stdx/data/json/parser/toJSONValue.html
 Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then
 you can just use to() to convert between a JSON container and the value
 that it represents (assuming the types are compatible).
We could maybe do that if we keep the current JSONValue as a struct wrapper around Algebraic. But I guess this would create an ambiguity between JSONValue("...") parsing a JSON string versus being constructed as a JSON string value. Or does to! hook up to something other than the constructor?
 OTOH, some people might want the option of parser-driven data processing
 instead (e.g. the JSON data is very large and we don't want to store the
 whole thing in memory at once). I'm not sure what a good API for that
 would be, though.
See http://s-ludwig.github.io/std_data_json/stdx/data/json/parser/parseJSONStream.html and the various UFCS "read" and "skip" functions in http://s-ludwig.github.io/std_data_json/stdx/data/json/parser.html
Jul 29 2015
prev sibling parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
On 2015-07-29 at 00:37, H. S. Teoh via Digitalmars-d wrote:
 On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d
wrote:
 [...]
 3. Stepping back a bit, when I think of parsing JSON data, I think:

      auto ast = inputrange.toJSON();

 where toJSON() accepts an input range and produces a container, the
 ast. The ast is just a JSON value. Then, I can just query the ast to
 see what kind of value it is (using overloading), and walk it as
 necessary.
+1. The API should be as simple as possible. Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then you can just use to() to convert between a JSON container and the value that it represents (assuming the types are compatible). OTOH, some people might want the option of parser-driven data processing instead (e.g. the JSON data is very large and we don't want to store the whole thing in memory at once). I'm not sure what a good API for that would be, though.
Here's my range-based parser; you can parse a 1 TB JSON file without a single allocation. It needs heavy polishing, but I didn't have the time/need to do it. Basically a WIP, but maybe someone will find it useful. https://github.com/pszturmaj/json-streaming-parser
Jul 29 2015
prev sibling next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 00:29, Walter Bright wrote:
 On 7/28/2015 7:07 AM, Atila Neves wrote:
 Start of the two week process, folks.
Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager. Just looking at the documentation only, some general notes: 1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs.
This is actually one of my pet peeves. Having a *readable* API that tells the reader immediately what happens is IMO one of the most important aspects (far more important than an API that allows quick typing). A number of times I've seen D code that omits part of what it actually does in its name and the result was that it was constantly necessary to scroll up to see where a particular name might come from. So I have a strong preference to keep "JSON", because it's an integral part of the semantics.
 2. JSON is a trivial format, http://json.org/. But I count 6 files and
 30 names in the public API.
The whole thing provides a stream parser with high level helpers to make it convenient to use, a DOM module, a separate lexer and a generator module that operates in various different modes (maybe two additional modes still to come!). Every single function provides real and frequently useful benefits. So if anything, there are still some little things missing. All in all, even if JSON may be a simple format, the source code is already almost 5k LOC (includes unit tests of course). But apart from maintainability they have mainly been separated to minimize the amount of code that needs to be dragged in for a particular functionality (not only other JSON modules, but also from different parts of Phobos).
 3. Stepping back a bit, when I think of parsing JSON data, I think:

      auto ast = inputrange.toJSON();

 where toJSON() accepts an input range and produces a container, the ast.
 The ast is just a JSON value. Then, I can just query the ast to see what
 kind of value it is (using overloading), and walk it as necessary.
We can drop the "Value" part of the name of course, if we expect that function to be used a lot, but there is still the parseJSONStream function, which is arguably not less important. BTW, you've only mentioned the DOM part so far, but for any code where performance is a priority, the stream based pull parser is basically the way to go. This would also be the natural entry point for any serialization library. And my prediction is, if we do it right, that working with JSON will in most cases simply mean "S s = deserializeJSON(json_input);", where S is a D struct that gets populated with the deserialized JSON data. Where that doesn't fit, performance oriented code would use the pull parser. So the DOM part of the system, which is the only thing the current JSON module has, will only be left as a niche functionality.
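The predicted serialization usage would look roughly like this (deserializeJSON as used here is the hypothetical future API described above, not a function of the reviewed package):

```d
// Hypothetical: a struct populated directly from JSON text, with no
// intermediate DOM - the pull parser fills in the fields.
struct Config
{
    string name;
    int[] values;
}

void example()
{
    string json_input = `{"name": "test", "values": [1, 2, 3]}`;
    Config c = deserializeJSON!Config(json_input);
}
```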
 To create output:

      auto r = ast.toChars();  // r is an InputRange of characters
      writeln(r);
Do we have an InputRange version of the various number-to-string conversions? It would be quite inconvenient to reinvent those (double, long, BigInt) in the JSON package. Of course, using to!string internally would be an option, but it would obviously destroy all @nogc opportunities and performance benefits.
 So, we'll need:
      toJSON
      toChars
      JSONException

 The possible JSON values are:
      string
      number
      object (associative arrays)
      array
      true
      false
      null

 Since these are D builtin types, they can actually be a simple union of
 D builtin types.
The idea is to have JSONValue be a simple alias to Algebraic!(...), just that there are currently still some workarounds for DMD < 2.067.0 on top, which means that JSONValue is a struct that "alias this" inherits from Algebraic for the time being. Those workarounds will be removed when the code is actually put into Phobos. But a simple union would obviously not be enough; it still needs a type tag of some form and needs to provide a safe interface on top of it. Algebraic is the only thing that comes close right now, but I'd really prefer to have a fully statically typed version of Algebraic that uses an enum as the type tag instead of working with delegates/typeinfo.
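A minimal sketch of the enum-tagged variant idea (simplified for illustration; it omits BigInt, safety checks, and everything else a real JSONValue would need):

```d
struct TaggedJSONValue
{
    // The enum tag replaces Algebraic's TypeInfo/delegate machinery
    // and allows fully static dispatch, e.g. via final switch.
    enum Kind { null_, boolean, number, string_, array, object }

    Kind kind;
    union
    {
        bool boolean;
        double number;
        string str;
        TaggedJSONValue[] array;
        TaggedJSONValue[string] object;
    }
}
```

A safe wrapper would check `kind` before each union access, which is what Algebraic does dynamically today.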
 There is a decision needed about whether toJSON() allocates data or
 returns slices into its inputrange. This can be 'static if' tested by:
 if inputrange can return immutable slices.
The test is currently "is(T == string) || is (T == immutable(ubyte)[])", but slicing is done in those cases and the non-DOM parser interface is even @nogc as long as exceptions are disabled.
 toChars() can take a compile
 time argument to determine if it is 'pretty' or not.
As long as JSON DOM values are stored in a generic Algebraic (which is a huge win in terms of interoperability!), toChars won't suffice as a name. It would have to be toJSON(Chars) (as it basically is now). I gave the "pretty" version a separate name simply because it's more convenient to use and pretty printing will probably be by far the most frequently used option when converting to a string.
Jul 29 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2015 1:18 AM, Sönke Ludwig wrote:
 On 29.07.2015 at 00:29, Walter Bright wrote:
 1. Not sure that 'JSON' needs to be embedded in the public names.
 'parseJSONStream' should just be 'parseStream', etc. Name
 disambiguation, if needed, should be ably taken care of by a number of D
 features for that purpose. Additionally, I presume that the stdx.data
 package implies a number of different formats. These formats should all
 use the same names with as similar as possible APIs - this won't work
 too well if JSON is embedded in the APIs.
This is actually one of my pet peeves. Having a *readable* API that tells the reader immediately what happens is IMO one of the most important aspects (far more important than an API that allows quick typing). A number of times I've seen D code that omits part of what it actually does in its name and the result was that it was constantly necessary to scroll up to see where a particular name might come from. So I have a strong preference to keep "JSON", because it's an integral part of the semantics.
I agree with your goal of readability. And if someone wants to write code that emphasizes it's JSON, they can write it as std.data.json.parseStream. (It's not about saving typing, it's about avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-) ) This is not a huge deal for me, but I'm not in favor of establishing a new convention that repeats the module name. It eschews one of the advantages of having module name spaces in the first place, and evokes the old C style naming conventions.
 2. JSON is a trivial format, http://json.org/. But I count 6 files and
 30 names in the public API.
The whole thing provides a stream parser with high level helpers to make it convenient to use, a DOM module, a separate lexer and a generator module that operates in various different modes (maybe two additional modes still to come!). Every single function provides real and frequently useful benefits. So if anything, there are still some little things missing.
I understand there is a purpose to each of those things, but there's also considerable value in a simpler API.
 All in all, even if JSON may be a simple format, the source code is already
 almost 5k LOC (includes unit tests of course).
I don't count unit tests as LOC :-)
 But apart from maintainability
 they have mainly been separated to minimize the amount of code that needs to be
 dragged in for a particular functionality (not only other JSON modules, but
also
 from different parts of Phobos).
They are so strongly related I don't see this as a big issue. Also, if they are templates, they don't get compiled in if not used.
 3. Stepping back a bit, when I think of parsing JSON data, I think:

      auto ast = inputrange.toJSON();

 where toJSON() accepts an input range and produces a container, the ast.
 The ast is just a JSON value. Then, I can just query the ast to see what
 kind of value it is (using overloading), and walk it as necessary.
We can drop the "Value" part of the name of course, if we expect that function to be used a lot, but there is still the parseJSONStream function which is arguably not less important. BTW, you just mentioned the DOM part so far, but for any code where performance is a priority, the stream based pull parser is basically the way to go. This would also be the natural entry point for any serialization library.
Agreed elsewhere. But still, I am not seeing a range interface on the functions. The lexer, for example, does not accept an input range of characters. Having a range interface is absolutely critical, and is the thing I am the most adamant about with all new Phobos additions. Any function that accepts arbitrarily long data should accept an input range instead, any function that generates arbitrary data should present that as an input range. Any function that builds a container should accept an input range to fill that container with. Any function that builds a container should also be an output range.
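The contract Walter describes can be sketched as a template constraint; `lexTokens` below is a toy stand-in, not the package's actual declaration:

```d
import std.range.primitives : isInputRange, ElementType;
import std.algorithm.iteration : map;

// Accepts any input range of characters (not just string) and yields
// its result lazily as another input range, so nothing is buffered.
auto lexTokens(Input)(Input input)
    if (isInputRange!Input && is(ElementType!Input : dchar))
{
    // A real lexer would produce tokens; forwarding the characters is
    // enough to show the range-in/range-out shape of the API.
    return input.map!(c => c);
}
```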
 And my prediction is, if we do it right, that working with JSON will in most
 cases simply mean "S s = deserializeJSON(json_input);", where S is a D struct
 that gets populated with the deserialized JSON data.
json_input must be a input range of characters.
 Where that doesn't fit,
 performance oriented code would use the pull parser.
I am not sure what you mean by 'pull parser'. Do you mean the parser presents an input range as its output, and incrementally parses only as the next value is requested?
 So the DOM part of the
 system, which is the only thing the current JSON module has, will only be left
 as a niche functionality.
That's ok. Is it normal practice to call the JSON data structure a Document Object Model?
 To create output:

      auto r = ast.toChars();  // r is an InputRange of characters
      writeln(r);
Do we have an InputRange version of the various number-to-string conversions?
We do now, at least for integers. I plan to do ones for floating point.
 It would be quite inconvenient to reinvent those (double, long, BigInt) in the
JSON
 package.
Right. It's been reinvented multiple times in Phobos, which is absurd. If you're reinventing them in std.data.json, then we're doing something wrong again.
 Of course, using to!string internally would be an option, but it would
 obviously destroy all  nogc opportunities and performance benefits.
That's exactly why the range versions were done.
 So, we'll need:
      toJSON
      toChars
      JSONException

 The possible JSON values are:
      string
      number
      object (associative arrays)
      array
      true
      false
      null

 Since these are D builtin types, they can actually be a simple union of
 D builtin types.
The idea is to have JSONValue be a simple alias to Algebraic!(...), just that there are currently still some workarounds for DMD < 2.067.0 on top, which means that JSONValue is a struct that "alias this" inherits from Algebraic for the time being. Those workarounds will be removed when the code is actually put into Phobos. But a simple union would obviously not be enough, it still needs a type tag of some form and needs to provide a safe interface on top of it.
Agreed.
 Algebraic is the only thing that comes close right now,
 but I'd really prefer to have a fully
 statically typed version of Algebraic that uses an enum as the type tag instead
 of working with delegates/typeinfo.
If Algebraic is not good enough for this, it is a failure and must be fixed.
 There is a decision needed about whether toJSON() allocates data or
 returns slices into its inputrange. This can be 'static if' tested by:
 if inputrange can return immutable slices.
The test is currently "is(T == string) || is (T == immutable(ubyte)[])", but slicing is done in those cases and the non-DOM parser interface is even nogc as long as exceptions are disabled.
With a range interface, you can test for 1) hasSlicing and 2) if ElementEncodingType is immutable. Why is ubyte being accepted? The ECMA-404 spec sez: "Conforming JSON text is a sequence of Unicode code points".
 toChars() can take a compile
 time argument to determine if it is 'pretty' or not.
As long as JSON DOM values are stored in a generic Algebraic (which is a huge win in terms of interoperability!), toChars won't suffice as a name.
Why not?
 It would have to be toJSON(Chars) (as it basically is now). I gave the
 "pretty" version a separate name simply because it's more convenient to
 use and pretty printing will probably be by far the most frequently used
 option when converting to a string.
So make pretty printing the default. In fact, I'm skeptical that a non-pretty-printed version is worthwhile. Note that an adapter algorithm can strip redundant whitespace.
Jul 29 2015
next sibling parent reply "Suliman" <evermind live.ru> writes:
If this implementation will be merged with phobos will vibed 
migrate to it, or it would two similar libs?
Jul 30 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 30.07.2015 at 09:27, Suliman wrote:
 If this implementation will be merged with phobos will vibed migrate to
 it, or it would two similar libs?
I'll then make the vibe.d JSON module compatible using "alias this" implicit conversions and then deprecate it over a longer period of time before it gets removed. And of course the serialization framework will be adjusted to work with the new JSON module.
Jul 30 2015
prev sibling next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 30 July 2015 at 04:41:51 UTC, Walter Bright wrote:
 I agree with your goal of readability. And if someone wants to 
 write code that emphasizes it's JSON, they can write it as 
 std.data.json.parseStream. (It's not about saving typing, it's 
 about avoiding extra redundant redundancy, I'm a big fan of 
 Strunk & White :-) ) This is not a huge deal for me, but I'm 
 not in favor of establishing a new convention that repeats the 
 module name. It eschews one of the advantages of having module 
 name spaces in the first place, and evokes the old C style 
 naming conventions.
Is there any reason why D doesn't allow json.parseStream() in this case? I remember the requirement of having the full module path being my first head-scratcher while learning D. The first example in TDPL had some source code that called split() (if memory serves) and Phobos had changed since the book was written, so you needed to disambiguate. I found it very odd that you have to type the whole thing when just the next level up would suffice to disambiguate it. The trend seems to be toward more deeply nested modules in Phobos, so having to type the full path will increasingly be a wart of D's. If we can't have the minimal necessary module paths then I'm completely in favor of parseJSONStream over the more general parseStream. I want that "json" in there one way or another (preferably by the method which makes it optional while maintaining brevity).
Jul 30 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/30/2015 9:58 AM, Brad Anderson wrote:
 If we can't have the minimal necessary module paths then I'm completely in
favor
 of parseJSONStream over the more general parseStream. I want that "json" in
 there one way or another (preferably by the method which makes it optional
while
 maintaining brevity).
I would think it unlikely to be parsing two different formats in one file. But in any case, you can always do this: import std.data.json : parseJSON = parse; Or put the import in a scope: void doNaughtyThingsWithJson() { import std.data.json; ... x.parse(); } The latter seems to be becoming the preferred D style.
Jul 30 2015
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Jul 30, 2015 at 12:43:40PM -0700, Walter Bright via Digitalmars-d wrote:
 On 7/30/2015 9:58 AM, Brad Anderson wrote:
If we can't have the minimal necessary module paths then I'm
completely in favor of parseJSONStream over the more general
parseStream. I want that "json" in there one way or another
(preferably by the method which makes it optional while maintaining
brevity).
I would think it unlikely to be parsing two different formats in one file. But in any case, you can always do this: import std.data.json : parseJSON = parse; Or put the import in a scope: void doNaughtyThingsWithJson() { import std.data.json; ... x.parse(); } The latter seems to be becoming the preferred D style.
Yeah, local imports are fast becoming my preferred D coding style, because it makes code portable -- if you move a function to a new module, you don't have to untangle its import dependencies if all imports are local. It's one of those little, overlooked things about D that contribute toward making it an awesome language. T -- Written on the window of a clothing store: No shirt, no shoes, no service.
Jul 30 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/30/2015 12:57 PM, H. S. Teoh via Digitalmars-d wrote:
 Yeah, local imports are fast becoming my preferred D coding style,
 because it makes code portable -- if you move a function to a new
 module, you don't have to untangle its import dependencies if all
 imports are local. It's one of those little, overlooked things about D
 that contribute toward making it an awesome language.
Funny how my preferred D style of writing code is steadily diverging from C++ style :-)
Jul 30 2015
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Jul 30, 2015 at 01:26:17PM -0700, Walter Bright via Digitalmars-d wrote:
 On 7/30/2015 12:57 PM, H. S. Teoh via Digitalmars-d wrote:
Yeah, local imports are fast becoming my preferred D coding style,
because it makes code portable -- if you move a function to a new
module, you don't have to untangle its import dependencies if all
imports are local. It's one of those little, overlooked things about
D that contribute toward making it an awesome language.
Funny how my preferred D style of writing code is steadily diverging from C++ style :-)
One would hope so, otherwise why are we here instead of in the C++ world? ;-) T -- This is not a sentence.
Jul 30 2015
parent reply "Suliman" <evermind live.ru> writes:
Is the current build ready for production? I am getting this error:

source\stdx\data\json\value.d(81): Error: @safe function
'stdx.data.json.value.JSONValue.this' cannot call @system function
'std.variant.VariantN!(12u, typeof(null), bool, double, long,
BigInt, string, JSONValue[],
JSONValue[string]).VariantN.__ctor!(typeof(null)).this'
Jul 31 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 31.07.2015 at 10:13, Suliman wrote:
 is the current build is ready for production? I am getting error:

 source\stdx\data\json\value.d(81): Error: safe function
 'stdx.data.json.value.JSONValue.this' cannot call system function
 'std.variant.VariantN!(12u, typeof(null), bool, double, long, BigInt,
 string, JSONValue[],
 JSONValue[string]).VariantN.__ctor!(typeof(null)).this'
2.068 "fixed" possible safety issues with VariantN by marking the interface system instead of trusted. Unfortunately that broke any safe code using Variant/Algebraic.
Jul 31 2015
parent reply "Suliman" <evermind live.ru> writes:
On Friday, 31 July 2015 at 12:16:02 UTC, Sönke Ludwig wrote:
 On 31.07.2015 at 10:13, Suliman wrote:
 is the current build is ready for production? I am getting 
 error:

 source\stdx\data\json\value.d(81): Error: safe function
 'stdx.data.json.value.JSONValue.this' cannot call system 
 function
 'std.variant.VariantN!(12u, typeof(null), bool, double, long, 
 BigInt,
 string, JSONValue[],
 JSONValue[string]).VariantN.__ctor!(typeof(null)).this'
2.068 "fixed" possible safety issues with VariantN by marking the interface system instead of trusted. Unfortunately that broke any safe code using Variant/Algebraic.
What revisions are usable? I checked some and all have an issue like:

source\App.d(5,34): Error: template stdx.data.json.parser.parseJSONValue
cannot deduce function from argument types !()(string), candidates are:
source\stdx\data\json\parser.d(105,11):
stdx.data.json.parser.parseJSONValue(LexOptions options =
LexOptions.init, Input)(ref Input input, string filename = "")
if (isStringInputRange!Input || isIntegralInputRange!Input)
Jul 31 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 31.07.2015 at 22:15, Suliman wrote:
 On Friday, 31 July 2015 at 12:16:02 UTC, Sönke Ludwig wrote:
 On 31.07.2015 at 10:13, Suliman wrote:
 is the current build is ready for production? I am getting error:

 source\stdx\data\json\value.d(81): Error: safe function
 'stdx.data.json.value.JSONValue.this' cannot call system function
 'std.variant.VariantN!(12u, typeof(null), bool, double, long, BigInt,
 string, JSONValue[],
 JSONValue[string]).VariantN.__ctor!(typeof(null)).this'
2.068 "fixed" possible safety issues with VariantN by marking the interface system instead of trusted. Unfortunately that broke any safe code using Variant/Algebraic.
Wat revision are usable? I checked some and all have issue like: source\App.d(5,34): Error: template stdx.data.json.parser.parseJSONValue cannot deduce function from argument types !()(string), candidates are: source\stdx\data\json\parser.d(105,11): stdx.data.json.parser.parseJSONVa lue(LexOptions options = LexOptions.init, Input)(ref Input input, string filenam e = "") if (isStringInputRange!Input || isIntegralInputRange!Input)
parseJSONValue takes a reference to an input range, so that it can consume the input and leave any trailing text after the JSON value in the range. For just converting a string to a JSONValue, use toJSONValue instead. I'll make this more clear in the documentation.
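The difference can be illustrated like this (a hedged sketch based on the behavior described above, not tested against the package):

```d
import stdx.data.json;

void example()
{
    // parseJSONValue consumes from a ref range and stops after the
    // first complete value, leaving any trailing text in the range...
    string two = `{"a": 1} {"b": 2}`;
    auto first = parseJSONValue(two); // `two` now holds the trailing text

    // ...while toJSONValue is the whole-string convenience form.
    auto value = toJSONValue(`{"a": 1}`);
}
```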
Aug 01 2015
parent reply "Suliman" <evermind live.ru> writes:
 parseJSONValue takes a reference to an input range, so that it 
 can consume the input and leave any trailing text after the 
 JSON value in the range. For just converting a string to a 
 JSONValue, use toJSONValue instead.

 I'll make this more clear in the documentation.
Yes please, because it's hard to understand the difference. Maybe it's possible to simplify it further? I also have trouble extracting a value:

response = toJSONValue(res.bodyReader.readAllUTF8());
writeln(to!int(response["code"]));

C:\D\dmd2\windows\bin\..\..\src\phobos\std\conv.d(295,24): Error: template
std.conv.toImpl cannot deduce function from argument types
!(int)(VariantN!20u), candidates are:
C:\D\dmd2\windows\bin\..\..\src\phobos\std\conv.d(361,3): std.conv.toImpl
(T, S)(S value) if (isImplicitlyConvertible!(S, T) && !isEnumStrToStr!(S,
T) && !isNullToStr!(S, T))

If I simply do:

writeln(response["code"]);

the code produces the right result (for example 200).

What value is stored under the key "code"? It doesn't look like a plain "200". How can I convert it to a string or an int?
Aug 01 2015
parent reply "Suliman" <evermind live.ru> writes:
Looks like it's a Variant type, so I tried to use the get! method to
extract the value from it:

writeln(get!(response["code"]));

But I get the error: Error: variable response cannot be read at
compile time
Aug 01 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 01.08.2015 at 16:15, Suliman wrote:
 Look like it's Variant type. So I tried to use method get! do extract
 value from it
 writeln(get!(response["code"]));

 But I get error: Error: variable response cannot be read at compile time
The correct syntax is: response["code"].get!int
Aug 01 2015
next sibling parent "Suliman" <evermind live.ru> writes:
On Saturday, 1 August 2015 at 14:52:55 UTC, Sönke Ludwig wrote:
 On 01.08.2015 at 16:15, Suliman wrote:
 Look like it's Variant type. So I tried to use method get! do 
 extract
 value from it
 writeln(get!(response["code"]));

 But I get error: Error: variable response cannot be read at 
 compile time
The correct syntax is: response["code"].get!int
Thanks! But how do I get access to the elements inside "result": {}? For example "name":"_system" in:

{"result":{"name":"_system","id":"76067","path":"database-6067","isSystem":true},"error":false,"code":200}

Could you also extend the docs with a code example?
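For illustration, chained indexing reaches the nested object; here is a hedged sketch using Phobos' std.json, whose interface is similar to the reviewed package (with .str/.integer accessors instead of get!):

```d
import std.json : JSONValue, parseJSON;

/// Parse the response body from the question above (illustrative only;
/// the reviewed stdx.data.json differs in the accessor names).
JSONValue parseResponse()
{
    return parseJSON(
        `{"result":{"name":"_system","id":"76067","path":"database-6067",`
        ~ `"isSystem":true},"error":false,"code":200}`);
}

void main()
{
    auto response = parseResponse();

    // chain the index operators to descend into the nested "result" object
    assert(response["result"]["name"].str == "_system");
    assert(response["code"].integer == 200);
}
```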
Aug 01 2015
prev sibling parent "Suliman" <evermind live.ru> writes:
On Saturday, 1 August 2015 at 14:52:55 UTC, Sönke Ludwig wrote:
 Am 01.08.2015 um 16:15 schrieb Suliman:
 Look like it's Variant type. So I tried to use method get! do 
 extract
 value from it
 writeln(get!(response["code"]));

 But I get error: Error: variable response cannot be read at 
 compile time
The correct syntax is: response["code"].get!int
connectInfo.statusCode = response["code"].get!int;

std.variant.VariantException std\variant.d(1445): Variant: attempting to use incompatible types stdx.data.json.value.JSONValue and int
Aug 01 2015
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-07-30 06:41, Walter Bright wrote:

 I agree with your goal of readability. And if someone wants to write
 code that emphasizes it's JSON, they can write it as
 std.data.json.parseStream. (It's not about saving typing, it's about
 avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-)
 ) This is not a huge deal for me, but I'm not in favor of establishing a
 new convention that repeats the module name. It eschews one of the
 advantages of having module name spaces in the first place, and evokes
 the old C style naming conventions.
I kind of agree with that, but at the same time, if one always needs to use the fully qualified name (or an alias) because there's a conflict, then that's quite annoying. A perfect example of that is the Path module in Tango. It has functions such as "split" and "join". Every time I use it I alias the import:

import Path = tango.io.Path;

Because otherwise it will conflict with the string manipulating functions with the same names. In Phobos the names in the path module are different compared to the string functions.

For example, I think "Value" and "parse" are too generic to not include "JSON" in their name.

-- 
/Jacob Carlborg
Jul 30 2015
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 07/30/2015 02:40 PM, Jacob Carlborg wrote:
 On 2015-07-30 06:41, Walter Bright wrote:

 I agree with your goal of readability. And if someone wants to write
 code that emphasizes it's JSON, they can write it as
 std.data.json.parseStream. (It's not about saving typing, it's about
 avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-)
 ) This is not a huge deal for me, but I'm not in favor of establishing a
 new convention that repeats the module name. It eschews one of the
 advantages of having module name spaces in the first place, and evokes
 the old C style naming conventions.
I kind of agree with that, but at the same time, if one always need to use the fully qualified name (or an alias) because there's a conflict then that's quite annoying.
It also fucks up UFCS, and I'm a huge fan of UFCS. I do agree that D's module system is awesome here and worth taking advantage of to avoid C++-style naming conventions, but I still think balance is needed. Sometimes, just because we can use a shorter potentially-conflicting name doesn't mean we necessarily should.
Aug 21 2015
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Friday, 21 August 2015 at 15:58:22 UTC, Nick Sabalausky wrote:
 It also fucks up UFCS, and I'm a huge fan of UFCS.
Are you saying that "import json : parseJSON = parse; foo.parseJSON.bar;" does not work? – David
Aug 21 2015
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 08/21/2015 12:29 PM, David Nadlinger wrote:
 On Friday, 21 August 2015 at 15:58:22 UTC, Nick Sabalausky wrote:
 It also fucks up UFCS, and I'm a huge fan of UFCS.
Are you saying that "import json : parseJSON = parse; foo.parseJSON.bar;" does not work?
Ok, fair point, although I was referring more to fully-qualified name lookups, as in the snippet I quoted from Jacob. Ie, this doesn't work:

someJsonCode.std.json.parse();

I do think, though, generally speaking, if there is much need to do a renamed import, the symbol in question probably didn't have the best name in the first place. Renamed importing is a great feature to have, but when you see it used it raises the question "*Why* is this being renamed? Why not just use its real name?" For the most part, I see two main reasons:

1. "Just because. I like this bikeshed color better." But this is merely a code smell, not a legitimate reason to even bother.

or

2. The symbol has a questionable name in the first place.

If there's reason to even bring up renamed imports as a solution, then it's probably falling into the "questionably named" category. Just because we CAN use D's module system and renamed imports and such to clear up ambiguities doesn't mean we should let ourselves take things TOO far to the opposite extreme when avoiding C/C++'s "big long ugly names as a substitute for modules".

Like Walter, I do very much dislike C/C++'s super-long, super-unambiguous names. But IMO, preferring parseStream over parseJSONStream isn't a genuine case of avoiding C/C++-style naming, it's just being overrun by fear of C/C++-style naming and thus taking things too far to the opposite extreme. We can strike a better balance than choosing between "brief and unclear-at-a-glance" and "C++-level verbosity".

Yea, we CAN do "import std.json : parseJSONStream = parseStream;", but if there's even any motivation to do so in the first place, we may as well just use the better name right from the start. Besides, those who prefer ultra-brevity are free to paint their bikesheds with renamed imports, too ;)
Aug 22 2015
prev sibling parent reply "Don" <prosthetictelevisions teletubby.medical.com> writes:
On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
 On 7/28/2015 7:07 AM, Atila Neves wrote:
 Start of the two week process, folks.
Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager.

Just looking at the documentation only, some general notes:

1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs.

2. JSON is a trivial format, http://json.org/. But I count 6 files and 30 names in the public API.

3. Stepping back a bit, when I think of parsing JSON data, I think:

    auto ast = inputrange.toJSON();

where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary. To create output:

    auto r = ast.toChars();  // r is an InputRange of characters
    writeln(r);

So, we'll need:

    toJSON
    toChars
    JSONException

The possible JSON values are:

    string
    number
    object (associative arrays)
    array
    true
    false
    null

Since these are D builtin types, they can actually be a simple union of D builtin types.
Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file).

Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in.

And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case.

It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.
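A rough sketch of this validate-but-don't-convert approach (the helper name is hypothetical, not part of the reviewed package): the lexer checks the token against the JSON number grammar and keeps the slice, deferring any numeric conversion.

```d
import std.ascii : isDigit;

/// Validate a string against the JSON number grammar without converting it.
/// A lexer could store the validated slice and convert lazily on demand.
bool isValidJSONNumber(string s)
{
    size_t i;
    if (i < s.length && s[i] == '-') i++;             // optional sign
    if (i >= s.length || !s[i].isDigit) return false; // integer part required
    if (s[i] == '0') i++;                             // a leading zero stands alone
    else while (i < s.length && s[i].isDigit) i++;
    if (i < s.length && s[i] == '.')                  // optional fraction
    {
        i++;
        if (i >= s.length || !s[i].isDigit) return false;
        while (i < s.length && s[i].isDigit) i++;
    }
    if (i < s.length && (s[i] == 'e' || s[i] == 'E')) // optional exponent
    {
        i++;
        if (i < s.length && (s[i] == '+' || s[i] == '-')) i++;
        if (i >= s.length || !s[i].isDigit) return false;
        while (i < s.length && s[i].isDigit) i++;
    }
    return i == s.length;
}

void main()
{
    assert(isValidJSONNumber("-0.5e+10"));
    assert(isValidJSONNumber("3.141592653589793238462643383279"));
    assert(!isValidJSONNumber("01"));  // leading zeros are not valid JSON
    assert(!isValidJSONNumber("1."));  // a fraction needs at least one digit
}
```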
Jul 29 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Wed, Jul 29, 2015 at 03:22:05PM +0000, Don via Digitalmars-d wrote:
 On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
[...]
The possible JSON values are:
    string
    number
    object (associative arrays)
    array
    true
    false
    null

Since these are D builtin types, they can actually be a simple union
of D builtin types.
Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file). Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in. And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case. It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.
[...]

Here's a thought: what about always storing JSON numbers as strings (albeit tagged with the "number" type, to differentiate them from actual strings in the input), and the user specifies what type to convert it to? The default type can be something handy, like int, but the user has the option to ask for size_t, or double, or even BigInt if they want. (IIRC, the BigInt ctor can initialize an instance from a digit string, so if we adopt the convention that non-built-in number-like types can be initialized from digit strings, then std.json can simply take a template parameter for the output type, and hand it the digit string. This way, we can get rid of the std.bigint dependency, except where the user actually wants to use BigInt.)

T

-- 
Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
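A minimal sketch of this idea, assuming a hypothetical RawNumber wrapper (none of these names are from the reviewed package): the value keeps the raw digit string, and conversion to a concrete numeric type only happens on request.

```d
import std.conv : to;

/// Holds a validated but unconverted JSON number as its source text.
struct RawNumber
{
    string repr;

    T get(T)() const
    {
        // Types constructible from a digit string (e.g. std.bigint.BigInt)
        // get the string directly; everything else goes through std.conv.
        static if (is(typeof(T(repr))))
            return T(repr);   // e.g. BigInt("12345678901234567890")
        else
            return repr.to!T;
    }
}

void main()
{
    auto n = RawNumber("42");
    assert(n.get!int == 42);
    assert(n.get!double == 42.0);
}
```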
Jul 29 2015
next sibling parent reply "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
 Here's a thought: what about always storing JSON numbers as 
 strings (albeit tagged with the "number" type, to differentiate 
 them from actual strings in the input), and the user specifies 
 what type to convert it to?  The default type can be something 
 handy, like int, but the user has the option to ask for size_t, 
 or double, or even BigInt if they want (IIRC, the BigInt ctor 
 can initialize an instance from a digit string, so if we adopt 
 the convention that non-built-in number-like types can be 
 initialized from digit strings, then std.json can simply take a 
 template parameter for the output type, and hand it the digit 
 string. This way, we can get rid of the std.bigint dependency, 
 except where the user actually wants to use BigInt.)
Some JSON files can be quite large... For example, I have a compressed 175 Gig of Reddit comments (one file per month) I would like to work with using D, and time + memory demands = money. Wouldn't it be a pain not to store numbers directly when parsing in those cases (if I understood you correctly)?
Jul 29 2015
parent "sigod" <sigod.mail gmail.com> writes:
On Wednesday, 29 July 2015 at 17:04:33 UTC, Laeeth Isharc wrote:
 [...]
Some JSON files can be quite large... For example, I have a compressed 175 Gig of Reddit comments (one file per month) I would like to work with using D, and time + memory demands = money. Wouldn't it be a pain not to store numbers directly when parsing in those cases (if I understood you correctly)?
I think in your case it wouldn't matter. Comments are text, mostly. There's probably just one or two fields with "number" type.
Jul 29 2015
prev sibling parent reply =?windows-1252?Q?S=F6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 18:47 schrieb H. S. Teoh via Digitalmars-d:
 On Wed, Jul 29, 2015 at 03:22:05PM +0000, Don via Digitalmars-d wrote:
 On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
[...]
 The possible JSON values are:
     string
     number
     object (associative arrays)
     array
     true
     false
     null

 Since these are D builtin types, they can actually be a simple union
 of D builtin types.
Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file). Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in. And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case. It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.
[...] Here's a thought: what about always storing JSON numbers as strings (albeit tagged with the "number" type, to differentiate them from actual strings in the input), and the user specifies what type to convert it to? The default type can be something handy, like int, but the user has the option to ask for size_t, or double, or even BigInt if they want (IIRC, the BigInt ctor can initialize an instance from a digit string, so if we adopt the convention that non-built-in number-like types can be initialized from digit strings, then std.json can simply take a template parameter for the output type, and hand it the digit string. This way, we can get rid of the std.bigint dependency, except where the user actually wants to use BigInt.) T
That means a performance hit, because the string has to be parsed twice - once for validation and once for conversion. And it means that for non-string inputs the lexer has to allocate for each number. It also doesn't know the length of the number in advance, so it can't allocate in a generally efficient way.
Jul 29 2015
parent reply "matovitch" <camille.brugel laposte.net> writes:
Hi Sonke,

Great to see your module moving towards phobos inclusion (I have 
not been following the latest progress of D sadly :() ! Just a 
small remark from the documentation example.

Maybe it would be better to replace :

     value.toJSONString!true()

by

     value.toJSONString!prettify()

using a well-named enum instead of a boolean, which could seem 
obscure. I know the Eigen C++ lib uses a similar thing for static vs 
dynamic matrices.

Thanks for the read. Regards,

matovitch
Jul 29 2015
parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 20:21 schrieb matovitch:
 Hi Sonke,

 Great to see your module moving towards phobos inclusion (I have not
 been following the latest progress of D sadly :() ! Just a small remark
 from the documentation example.

 Maybe it would be better to replace :

      value.toJSONString!true()

 by

      value.toJSONString!prettify()

 using a well-named enum instead of a boolean which could seem obscure I
 now Eigen C++ lib use a similar thing for static vs dynamic matrix.

 Thanks for the read. Regards,

 matovitch
Hm, that example is outdated, I'll fix it ASAP. Currently it uses toJSON and a separate toPrettyJSON function. An obvious alternative would be to add an entry GeneratorOptions.prettify, because toJSON already takes that as a template argument: toJSON!(GeneratorOptions.prettify)
Jul 29 2015
prev sibling next sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 29.07.2015 um 17:22 schrieb Don:
 Related to this: it should not be importing std.bigint. Note that if
 std.bigint were fully implemented, it would be very heavyweight (optimal
 multiplication of enormous integers involves fast fourier transforms and
 all kinds of odd stuff, that's really bizarre to pull in if you're just
 parsing a trivial little JSON config file).

 Although it is possible for JSON to contain numbers which are larger
 than can fit into long or ulong, it's an abnormal case. Many apps
 (probably, almost all) will want to reject such numbers immediately.
 BigInt should be opt-in.
BigInt is opt-in, at least as far as the lexer goes. But why would such a number be rejected? Any of the usual floating point parsers would simply parse the number and just lose precision if it can't be represented exactly. And after all, it's still valid JSON.

But note that I've only added this due to multiple requests; it doesn't seem to be that uncommon. We *could* in theory make the JSONNumber type a template and make the bigint fields optional. That would be the only thing missing to make the import optional, too.
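A rough sketch of that template idea (the layout and names here are hypothetical): the BigInt field, and with it the std.bigint import, only exists when a user opts in at compile time.

```d
/// A number type where BigInt support is a compile-time opt-in,
/// so std.bigint is only pulled in when actually instantiated.
struct JSONNumber(bool withBigInt = false)
{
    long integer;     // fast path: fits in a machine word
    double floating;  // fast path: IEEE-754 double

    static if (withBigInt)
    {
        import std.bigint : BigInt; // only processed for this instantiation
        BigInt big;                 // slow path, opt-in
    }
}

void main()
{
    JSONNumber!() small;           // no std.bigint dependency
    small.integer = 42;
    static assert(!__traits(hasMember, typeof(small), "big"));

    JSONNumber!true wide;          // opt-in BigInt support
    wide.big = typeof(wide.big)("1000000000000000000000000");
    assert(small.integer == 42);
    assert(wide.big > 0);
}
```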
 And, it is also possible to have floating point numbers that are not
 representable in double or real. BigInt doesn't solve that case.

 It might be adequate to simply present it as a raw number (an
 unconverted string) if it isn't a built-in type. Parse it for validity,
 but don't actually convert it.
If we had a Decimal type in Phobos, I would have integrated that, too. The string representation may be an alternative, but since the weight of the import is the main argument, I'd rather choose the more comfortable/logical option - or probably rather try to avoid std.bigint being such a heavy import (e.g. by using local imports to defer secondary imports).
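A sketch of the deferred-import idea (names hypothetical): because the body of a function template is only compiled when it is instantiated, a local import inside a templated accessor means programs that never call it never process std.bigint at all.

```d
/// Raw number wrapper whose BigInt accessor defers the std.bigint import.
struct Number
{
    string repr; // raw, validated digit string

    auto getBigInt()() const        // empty template parens make this a template
    {
        import std.bigint : BigInt; // deferred, local import
        return BigInt(repr);
    }

    long getLong() const
    {
        import std.conv : to;       // cheap import, no std.bigint involved
        return repr.to!long;
    }
}

void main()
{
    auto n = Number("12345678901234567890123"); // too big for ulong
    assert(n.getBigInt() > 0);
    assert(Number("123").getLong() == 123);
}
```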
Jul 29 2015
prev sibling parent reply "Dmitry Olshansky" <dmitry.olsh gmail.com> writes:
On Wednesday, 29 July 2015 at 15:22:06 UTC, Don wrote:
 On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
[snip]
 Related to this: it should not be importing std.bigint. Note 
 that if std.bigint were fully implemented, it would be very 
 heavyweight (optimal multiplication of enormous integers 
 involves fast fourier transforms and all kinds of odd stuff, 
 that's really bizarre to pull in if you're just parsing a 
 trivial little JSON config file).

 Although it is possible for JSON to contain numbers which are 
 larger than can fit into long or ulong, it's an abnormal case. 
 Many apps (probably, almost all) will want to reject such 
 numbers immediately. BigInt should be opt-in.

 And, it is also possible to have floating point numbers that 
 are not representable in double or real. BigInt doesn't solve 
 that case.

 It might be adequate to simply present it as a raw number (an 
 unconverted string) if it isn't a built-in type. Parse it for 
 validity, but don't actually convert it.
Actually, JSON is defined as a subset of the ECMAScript-262 spec, hence it may not contain anything other than 64-bit IEEE-754 numbers, period. See:

http://www.ecma-international.org/ecma-262/6.0/index.html#sec-terms-and-definitions-number-value
http://www.ecma-international.org/ecma-262/6.0/index.html#sec-ecmascript-language-types-number-type

Anything else is, ehm, an "extension" (or simply put, a violation of the spec). I've certainly seen 64-bit integers in the wild - how often are true big ints found out there?

If no one can present some run-of-the-mill REST JSON API breaking the rules, I'd suggest demoting BigInt handling to an optional feature.
Aug 02 2015
parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 02.08.2015 um 19:14 schrieb Dmitry Olshansky:
 Actually JSON is defined as subset of EMCASCript-262 spec hence it may
 not ciontain anything other 64-bit5 IEEE-754 numbers period.
 See:
 http://www.ecma-international.org/ecma-262/6.0/index.html#sec-terms-and-definitions-number-value

 http://www.ecma-international.org/ecma-262/6.0/index.html#sec-ecmascript-language-types-number-type


 Anything else is e-hm an "extension" (or simply put - violation of
 spec), I've certainly seen 64-bit integers in the wild - how often true
 big ints are found out there?

 If no one can present some run of the mill REST JSON API breaking the
 rules I'd suggest demoting BigInt handling to optional feature.
This is not true. Quoting from ECMA-404:
JSON is a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications. JSON was inspired by the object literals of JavaScript aka ECMAScript as defined in the ECMAScript Language Specification, third Edition [1]. It does not attempt to impose ECMAScript’s internal data representations on other programming languages. Instead, it shares a small subset of ECMAScript’s textual representations with all other programming languages.

JSON is agnostic about numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.
Aug 03 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 03-Aug-2015 10:56, Sönke Ludwig wrote:
 Am 02.08.2015 um 19:14 schrieb Dmitry Olshansky:
 Actually JSON is defined as subset of EMCASCript-262 spec hence it may
 not ciontain anything other 64-bit5 IEEE-754 numbers period.
 See:
 http://www.ecma-international.org/ecma-262/6.0/index.html#sec-terms-and-definitions-number-value


 http://www.ecma-international.org/ecma-262/6.0/index.html#sec-ecmascript-language-types-number-type



 Anything else is e-hm an "extension" (or simply put - violation of
 spec), I've certainly seen 64-bit integers in the wild - how often true
 big ints are found out there?

 If no one can present some run of the mill REST JSON API breaking the
 rules I'd suggest demoting BigInt handling to optional feature.
This is not true. Quoting from ECMA-404:
 JSON is a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications. JSON was inspired by the object literals of JavaScript aka ECMAScript as defined in the ECMAScript Language Specification, third Edition [1]. It does not attempt to impose ECMAScript’s internal data representations on other programming languages. Instead, it shares a small subset of ECMAScript’s textual representations with all other programming languages.

 JSON is agnostic about numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.
Hm, about 5 solid pages, and indeed it leaves everything unspecified for extensibility, so I stand corrected. Still, I'm more inclined to put my trust in RFCs, such as the new one:

http://www.ietf.org/rfc/rfc7159.txt

Which states:

   This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

   Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.

And it implies setting limits on everything:

   9. Parsers

   A JSON parser transforms a JSON text into another representation. A JSON parser MUST accept all texts that conform to the JSON grammar. A JSON parser MAY accept non-JSON forms or extensions.

   An implementation may set limits on the size of texts that it accepts. An implementation may set limits on the maximum depth of nesting. An implementation may set limits on the range and precision of numbers. An implementation may set limits on the length and character contents of strings.

Now back to our land, let's look at, say, rapidJSON.
It MAY seem to handle big integers:
https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/biginteger.h

But it's used only to parse doubles:
https://github.com/miloyip/rapidjson/pull/137

Anyhow, the API says it all - only integers up to 64 bits and doubles:
http://rapidjson.org/md_doc_sax.html#Handler

Pretty much what I expect by default. And plz-plz don't hardcode BigInt in the JSON parser; it's slow, plus it causes epic code bloat, as Don already pointed out.

-- 
Dmitry Olshansky
Aug 03 2015
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Mon, 03 Aug 2015 12:11:14 +0300
schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 [...]

 Now back to our land let's look at say rapidJSON.
 
 It MAY seem to handle big integers:
 https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/biginteger.h
 
 But it's used only to parse doubles:
 https://github.com/miloyip/rapidjson/pull/137
 
 Anyhow the API says it all - only integers up to 64bit and doubles:
 
 http://rapidjson.org/md_doc_sax.html#Handler
 
 Pretty much what I expect by default.
 And plz-plz don't hardcode BitInteger in JSON parser, it's slow plus it 
 causes epic code bloat as Don already pointed out.
I would take RapidJSON with a grain of salt; its main goal is to be the fastest JSON parser. Nothing wrong with that, but BigInt and fast don't naturally match, and the C standard library also doesn't come with a BigInt type that could conveniently be plugged in.

Please compare again with JSON parsers in languages that provide BigInts, e.g. Ruby:
http://ruby-doc.org/stdlib-1.9.3/libdoc/json/rdoc/JSON/Ext/Generator/GeneratorMethods/Bignum.html

Optional ok, but no support at all would be so 90s. My impression is that the standard wants to allow JSON being used in environments that cannot provide BigInt support, but a modern language for PCs with a BigInt module should totally support reading long integers and be able to do proper rounding of double values.

I thought about reading two BigInts: one for the significand and one for the base-10 exponent, so you don't need a BigFloat but still have the full accuracy from the textual string as x*10^y.

-- 
Marco
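A rough sketch of that decomposition (hypothetical helper; note that the splitting itself needs no BigInt at all - only the eventual consumers do): the literal is rewritten as an integer digit string plus a base-10 exponent.

```d
import std.conv : to;
import std.string : indexOf;

/// Split a JSON number literal into an integer digit string and a base-10
/// exponent, so the exact value is significand * 10^exponent.
void splitNumber(string s, out string significand, out long exponent)
{
    exponent = 0;
    auto e = s.indexOf('e');
    if (e < 0) e = s.indexOf('E');
    if (e >= 0)
    {
        exponent = s[e + 1 .. $].to!long; // explicit exponent part
        s = s[0 .. e];
    }
    auto dot = s.indexOf('.');
    if (dot >= 0)
    {
        // dropping the decimal point shifts the value by 10^(digits after it)
        exponent -= cast(long)(s.length - dot - 1);
        s = s[0 .. dot] ~ s[dot + 1 .. $];
    }
    significand = s;
}

void main()
{
    string sig;
    long exp;
    splitNumber("3.141592653589793238462643383279", sig, exp);
    assert(sig == "3141592653589793238462643383279" && exp == -30);
    splitNumber("1E400", sig, exp);
    assert(sig == "1" && exp == 400);
    splitNumber("-2.5e3", sig, exp); // -25 * 10^2 == -2500
    assert(sig == "-25" && exp == 2);
}
```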
Sep 27 2015
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 27-Sep-2015 20:43, Marco Leise wrote:
 Am Mon, 03 Aug 2015 12:11:14 +0300
 schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 [...]

 Now back to our land let's look at say rapidJSON.

 It MAY seem to handle big integers:
 https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/biginteger.h

 But it's used only to parse doubles:
 https://github.com/miloyip/rapidjson/pull/137

 Anyhow the API says it all - only integers up to 64bit and doubles:

 http://rapidjson.org/md_doc_sax.html#Handler

 Pretty much what I expect by default.
 And plz-plz don't hardcode BitInteger in JSON parser, it's slow plus it
 causes epic code bloat as Don already pointed out.
I would take RapidJSON with a grain of salt, its main goal is to be the fastest JSON parser. Nothing wrong with that, but BigInt and fast doesn't naturally match and the C standard library also doesn't come with a BigInt type that could conveniently be plugged in.
Yes, yet support should be optional.
 Please compare again with JSON parsers in languages that
 provide BigInts, e.g. Ruby:
 http://ruby-doc.org/stdlib-1.9.3/libdoc/json/rdoc/JSON/Ext/Generator/GeneratorMethods/Bignum.html
 Optional ok, but no support at all would be so 90s.
Agreed. Still, keep in mind that the whole reason Ruby supports it is because its "integer" type is multi-precision by default. So if your native integer type is multi-precision, then indeed, why add a special case for fixnums?
 My impression is that the standard wants to allow JSON being
 used in environments that cannot provide BigInt support, but a
 modern language for PCs with a BigInt module should totally
 support reading long integers and be able to do proper
 rounding of double values. I thought about reading two
 BigInts: one for the significand and one for the
 base-10 exponent, so you don't need a BigFloat but have the
 full accuracy from the textual string still as x*10^y.
All of that is sensible ... in the slow code path. The common path must be simple and lean; bigints are certainly an exception rather than the rule. Therefore support for big ints should not come at the expense of other use cases. Also, pluggability should allow me to e.g. use my own "big" decimal floating point.

-- 
Dmitry Olshansky
Sep 27 2015
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
A speed optimization, since JSON parsing speed is critical:

If the parser is able to use slices of its input, store numbers as slices. Only 
convert them to numbers lazily, as the numeric conversion can take significant
time.
Jul 28 2015
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Tuesday, 28 July 2015 at 23:16:34 UTC, Walter Bright wrote:
 A speed optimization, since JSON parsing speed is critical:

 If the parser is able to use slices of its input, store numbers 
 as slices. Only convert them to numbers lazily, as the numeric 
 conversion can take significant time.
That's what it does (depending on which parser you use). The StAX style parser included is lazy and non-allocating.
Jul 28 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2015 4:24 PM, Brad Anderson wrote:
 On Tuesday, 28 July 2015 at 23:16:34 UTC, Walter Bright wrote:
 A speed optimization, since JSON parsing speed is critical:

 If the parser is able to use slices of its input, store numbers as slices.
 Only convert them to numbers lazily, as the numeric conversion can take
 significant time.
That's what it does (depending on which parser you use). The StAX style parser included is lazy and non-allocating.
Great!
Jul 28 2015
prev sibling next sibling parent reply "Andrea Fontana" <nospam example.com> writes:
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Why not add a shortcut like:

jv.opt("/this/is/a/path")

? I use it in my json/bson binding.

Anyway, opt(...).isNull returns true if that sub-object doesn't exist. How can I check instead if that sub-object is actually null? Something like: { "a" : { "b" : null } } ?

It would be nice to have a way to get a default if it doesn't exist. My library behaves in a different way. Given the object:

{ address : { number: 15 } }

// as!xxx tries to get a value of that type; if it can't, it tries to convert it using .to!xxx; if that fails again, it returns a default

// Converted as string
assert(obj["/address/number"].as!string == "15");

// This doesn't exist
assert(obj["/address/asdasd"].as!int == int.init);

// A default value is specified
assert(obj["/address/asdasd"].as!int(50) == 50);

// A default value is specified (but the value exists)
assert(obj["/address/number"].as!int(50) == 15);

// This doesn't exist
assert(!obj["address"]["number"]["this"].exists);

My library has a get!xxx too (which throws an exception if the value is not xxx) and a to!xxx that throws an exception if the value can't be converted to xxx.

Other features:

// This field doesn't exist; return the default value
auto tmpField = obj["/address/asdasd"].as!int(50);
assert(tmpField.error == true);   // Value is defaulted ...
assert(tmpField.exists == false); // ... because it doesn't exist
assert(tmpField == 50);

// This field exists, but can't be converted to int. Return the default value.
tmpField = obj["/tags/0"].as!int(50);
assert(tmpField.error == true);   // Value is defaulted ...
assert(tmpField.exists == true);  // ... but a field is actually here
assert(tmpField == 50);
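A hedged sketch of such a path lookup with a default value, illustrated here with Phobos' std.json (the reviewed stdx.data.json would differ in details; the name `opt` and the exact fallback behavior are assumptions):

```d
import std.algorithm.iteration : splitter;
import std.json : JSONValue, JSONType, parseJSON;

/// Walk a '/'-separated path through nested JSON objects, returning a
/// default when any segment is missing or the final value has the wrong type.
T opt(T)(JSONValue root, string path, T defaultValue = T.init)
{
    JSONValue cur = root;
    foreach (seg; path.splitter('/'))
    {
        if (seg.length == 0) continue;                  // tolerate leading '/'
        if (cur.type != JSONType.object) return defaultValue;
        auto p = seg in cur.object;
        if (p is null) return defaultValue;             // missing -> default
        cur = *p;
    }
    try return cur.get!T;                               // wrong type -> default
    catch (Exception) return defaultValue;
}

void main()
{
    auto obj = parseJSON(`{"address":{"number":15}}`);
    assert(obj.opt!int("/address/number") == 15);       // present
    assert(obj.opt!int("/address/asdasd", 50) == 50);   // absent -> default
    assert(obj.opt!string("/address/number", "?") == "?"); // type mismatch
}
```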
Jul 29 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 09:46, Andrea Fontana wrote:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Why don't do a shortcut like: jv.opt("/this/is/a/path") ? I use it in my json/bson binding.
That would be another possibility. What do you think about the opt(jv).foo.bar[12].baz alternative? One advantage is that it could work without parsing a string and the implications thereof (error handling?).
 Anyway, opt(...).isNull return true if that sub-obj doesn't exists.
 How can I check instead if that sub-object is actually null?

 Something like:  { "a" : { "b" : null} } ?
opt(...) == null
 It would be nice to have a way to get a default if it doesn't exists.
 On my library that behave in a different way i write:

 Object is :  { address : { number: 15 } }

 // as!xxx try to get a value of that type, if it can't it tries to
 convert it using .to!xxx if it fails again it returns default

 // Converted as string
 assert(obj["/address/number"].as!string == "15");

 // This doesn't exists
 assert(obj["/address/asdasd"].as!int == int.init);

 // A default value is specified
 assert(obj["/address/asdasd"].as!int(50) == 50);

 // A default value is specified (but value exists)
 assert(obj["/address/number"].as!int(50) == 15);

 // This doesn't exists
 assert(!obj["address"]["number"]["this"].exists);

 My library has a get!xxx string too (that throws an exception if value
 is not xxx) and to!xxx that throws an exception if value can't converted
 to xxx.
I try to build this from existing building blocks in Phobos, so opt basically returns a Nullable!Algebraic. I guess some of it could simply be implemented in Algebraic, for example by adding an overload of .get that takes a default value. Instead of .to, you already have .coerce. The other possible approach, which would be more convenient to use, would be to add a "default value" overload to "opt", for example: jv.opt("defval").foo.bar
 Other feature:
 // This field doesn't exists return default value
 auto tmpField = obj["/address/asdasd"].as!int(50);
 assert(tmpField.error == true);   // Value is defaulted ...
 assert(tmpField.exists == false); // ... because it doesn't exists
 assert(tmpField == 50);

 // This field exists, but can't be converted to int. Return default value.
 tmpField = obj["/tags/0"].as!int(50);
 assert(tmpField.error == true);   // Value is defaulted ...
 assert(tmpField.exists == true);  // ... but a field is actually here
 assert(tmpField == 50);
Jul 29 2015
parent reply "Andrea Fontana" <nospam example.com> writes:
On Wednesday, 29 July 2015 at 08:55:20 UTC, Sönke Ludwig wrote:
 That would be another possibility. What do you think about the 
 opt(jv).foo.bar[12].baz alternative? One advantage is that it 
 could work without parsing a string and the implications 
 thereof (error handling?).
I implemented it too, but I removed it. Many times field names are function names or similar, and that breaks the code. My implementation creates a lot of temporary objects (one for each sub-object); using the string instead, I just create the last one.

It's not easy for me to use assignments with that syntax. Something like:

obj.with.a.new.field = 3;

is difficult to implement. It's much easier to implement:

obj["/field/doesnt/exists"] = 3

It's much easier to write formatted-string paths, and it allows future implementation of something like xpath/jquery-style queries. If your json contains keys with "/" inside, you can still use the old plain syntax... String parsing is quite easy (at compile time too), of course. If a part of the path doesn't exist, it works just like when a part of opt("a", "b", "c") doesn't. It's just syntax sugar. :)
 Anyway, opt(...).isNull return true if that sub-obj doesn't 
 exists.
 How can I check instead if that sub-object is actually null?

 Something like:  { "a" : { "b" : null} } ?
opt(...) == null
Does that work? Anyway, it seems ambiguous:
opt(...) == null   => false
opt(...).isNull    => true
 It would be nice to have a way to get a default if it doesn't 
 exists.
 On my library that behave in a different way i write:

 Object is :  { address : { number: 15 } }

 // as!xxx try to get a value of that type, if it can't it 
 tries to
 convert it using .to!xxx if it fails again it returns default

 // Converted as string
 assert(obj["/address/number"].as!string == "15");

 // This doesn't exists
 assert(obj["/address/asdasd"].as!int == int.init);

 // A default value is specified
 assert(obj["/address/asdasd"].as!int(50) == 50);

 // A default value is specified (but value exists)
 assert(obj["/address/number"].as!int(50) == 15);

 // This doesn't exists
 assert(!obj["address"]["number"]["this"].exists);

 My library has a get!xxx string too (that throws an exception 
 if value
 is not xxx) and to!xxx that throws an exception if value can't 
 converted
 to xxx.
I try to build this from existing building blocks in Phobos, so opt basically returns a Nullable!Algebraic. I guess some of it could simply be implemented in Algebraic, for example by adding an overload of .get that takes a default value. Instead of .to, you already have .coerce. The other possible approach, which would be more convenient to use, would be to add a "default value" overload to "opt", for example: jv.opt("defval").foo.bar
Isn't jv.opt("defval") taking the value of ("defval") rather than setting a default value?
Jul 29 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 29.07.2015 at 11:58, Andrea Fontana wrote:
 On Wednesday, 29 July 2015 at 08:55:20 UTC, Sönke Ludwig wrote:
 That would be another possibility. What do you think about the
 opt(jv).foo.bar[12].baz alternative? One advantage is that it could
 work without parsing a string and the implications thereof (error
 handling?).
I implemented it too, but I removed it. Many times field names are function names or similar, and that breaks the code.
In this case, since it would be a separate type, there are no static members apart from the automatically generated ones and maybe something like opIndex/opAssign. It can of course also overload opIndex with a string argument, so that there is a generic alternative in case of conflicts or runtime key names.
 My implementation creates a lot of temporary objects (one for each
 sub-object); using the string instead, I just create the last one.
If the temporary objects are cheap, I don't see an issue there. Without keeping track of the path, a simple pointer to a JSONValue should be sufficient (the temporary objects have to be made non-copyable).
 It's not easy for me to use assignments with that syntax. Something like:

 obj.with.a.new.field = 3;

 It's difficult to implement. It's much easier to implement:

 obj["/field/doesnt/exists"] = 3
Maybe more difficult, but certainly possible. If the complexity doesn't explode, I'd say that shouldn't be a primary concern, since this is all still pretty simple.
 It's much easier to write formatted-string paths.
 It allows future implementation of something like xpath/jquery style
Advanced path queries could indeed be interesting, possibly even more interesting if applied to the pull parser.
 If your json contains keys with "/" inside, you can still use old plain
 syntax...
A possible alternative would be to support some kind of escape syntax.
 String parsing it's quite easy (at compile time too) of course. If a
 part of path doesn't exists it works like a part of opt("a", "b", "c")
 doesn't. It's just syntax sugar. :)
Granted, it's not really much in this case, but you do get less static checking, which means that some things will only be caught at run time. Also, you'll get an ambiguity if you want to support array indices, too. Finally, it may even be security relevant, because an attacker might try to sneak in a key that contains slash characters to access/overwrite fields that would normally not be reachable. So every user input that may end up in a path query will have to be validated first now.
 Does it works? Anyway it seems ambiguous:
 opt(...) == null   => false
 opt(...).isNull    => true
The former gets forwarded to Algebraic, while the latter is a method of the enclosing Nullable. I've tested it and it works. But I also agree it isn't particularly pretty in this case; that's just what we have in D as basic building blocks (or do we have an Optional type somewhere yet?).
 The other possible approach, which would be more convenient to use,
 would be add a "default value" overload to "opt", for example:
 jv.opt("defval").foo.bar
Isn't jv.opt("defval") taking the value of ("defval") rather than setting a default value?
It would be an opt with different semantics, just a theoretical alternative. This behavior would be mutually exclusive to the current opt.
Jul 30 2015
prev sibling next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Looked in the docs (http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.html). I wanted to know how JSONValue can be manipulated. That is not very explicit.

First, it doesn't look like the value can embed null as a value. null is a valid json value.

Secondly, it seems that it accepts BigInt. As per the JSON spec, the only kind of numeric value you can have in there is a number, which doesn't even distinguish between floating point and integer (!) and has 53 bits of precision. By having double and long in there, we are already way over spec, so I'm not sure why we'd want to put BigInt in there.

Finally, I'd love to see JSONValue exhibit an API similar to jsvar.
Aug 03 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 03.08.2015 at 23:15, deadalnix wrote:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
Looked in the docs (http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.html). I wanted to know how JSONValue can be manipulated. That is not very explicit. First, it doesn't look like the value can embed null as a value. null is a valid json value.
The documentation is lacking, I'll improve that. JSONValue includes an alias this to an Algebraic, which provides the actual data API. Its type list includes typeof(null).
 Secondly, it seems that it accept bigint. As per JSON spec, the only
 kind of numeric value you can have in there is a num, which doesn't even
 make the difference between floating point and integer (!) and with 53
 bits of precision. By having double and long in there, we are already
 way over spec, so I'm not sure why we'd want to put bigint in there.
See also my reply a few posts back. JSON does not specify anything WRT the precision or length of numbers. In the ECMA standard it is mentioned explicitly that this was done so that applications are not limited in what kind of numbers can be transferred. The only thing explicitly mentioned is that implementations *may* choose to support only 64-bit floats. But large integer numbers are used in practice, so we should be able to handle those, too (one way or another).
 Finally, I'd love to see that JSONValue to exhibit a similar API than
 jsvar.
This is how it used to be in the vibe.data.json module. I consider that to be a mistake now for multiple reasons, at least on this abstraction level. My proposal would be to have a clean, "strongly typed" JSONValue and a generic jsvar like struct on top of that, which is defined independently, and could for example work on a BSONValue, too. The usage would simply be "var value = parseJSONValue(...);".
Aug 04 2015
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 4 August 2015 at 13:10:11 UTC, Sönke Ludwig wrote:
 This is how it used to be in the vibe.data.json module. I 
 consider that to be a mistake now for multiple reasons, at 
 least on this abstraction level. My proposal would be to have a 
 clean, "strongly typed" JSONValue and a generic jsvar like 
 struct on top of that, which is defined independently, and 
 could for example work on a BSONValue, too. The usage would 
 simply be "var value = parseJSONValue(...);".
That is not going to cut it. I've been working with these for ages. This is the very kind of scenario where dynamically typed languages are way more convenient.

I've used both quite extensively, and this is clear cut: you don't want what you call the strongly typed version of things. I've done it in many languages, including Java, for instance.

The jsvar interface removes the problematic parts of JS (it uses ~ instead of + for string concatenation and does not implement the opDispatch part of the API).
Aug 04 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 04.08.2015 at 19:14, deadalnix wrote:
 On Tuesday, 4 August 2015 at 13:10:11 UTC, Sönke Ludwig wrote:
 This is how it used to be in the vibe.data.json module. I consider
 that to be a mistake now for multiple reasons, at least on this
 abstraction level. My proposal would be to have a clean, "strongly
 typed" JSONValue and a generic jsvar like struct on top of that, which
 is defined independently, and could for example work on a BSONValue,
 too. The usage would simply be "var value = parseJSONValue(...);".
That is not going to cut it. I've been working with these for ages. This is the very kind of scenario where dynamically typed languages are way more convenient. I've used both quite extensively, and this is clear cut: you don't want what you call the strongly typed version of things. I've done it in many languages, including Java, for instance. The jsvar interface removes the problematic parts of JS (it uses ~ instead of + for string concatenation and does not implement the opDispatch part of the API).
I just said that jsvar should be supported (even in its full glory), so why is that not going to cut it? Also, in theory, Algebraic already does more or less exactly what you propose (forwards operators, but skips opDispatch and JS-like string operators).
Aug 11 2015
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 11 August 2015 at 21:27:48 UTC, Sönke Ludwig wrote:
 That is not going to cut it. I've been working with these for 
 ages. This
 is the very kind of scenarios where dynamically typed 
 languages are way
 more convenient.

 I've used both quite extensively and this is clear cut: you 
 don't want
 what you call the strongly typed version of things. I've done 
 it in many
 languages, including in java for instance.

 jsvar interface remove the problematic parts of JS (use ~ 
 instead of +
 for concat strings and do not implement the opDispatch part of 
 the API).
I just said that jsvar should be supported (even in its full glory), so why is that not going to cut it? Also, in theory, Algebraic already does more or less exactly what you propose (forwards operators, but skips opDispatch and JS-like string operators).
Ok, then maybe there was a misunderstanding on my part. My understanding was that there is a Node coming from the parser, and that the node could be wrapped in some facility providing a jsvar-like API. My position is that it is preferable to have whatever DOM node be jsvar-like out of the box rather than having to wrap it into something to get that.
Aug 11 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 11.08.2015 at 23:52, deadalnix wrote:
 On Tuesday, 11 August 2015 at 21:27:48 UTC, Sönke Ludwig wrote:
 That is not going to cut it. I've been working with these for ages. This
 is the very kind of scenarios where dynamically typed languages are way
 more convenient.

 I've used both quite extensively and this is clear cut: you don't want
 what you call the strongly typed version of things. I've done it in many
 languages, including in java for instance.

 jsvar interface remove the problematic parts of JS (use ~ instead of +
 for concat strings and do not implement the opDispatch part of the API).
I just said that jsvar should be supported (even in its full glory), so why is that not going to cut it? Also, in theory, Algebraic already does more or less exactly what you propose (forwards operators, but skips opDispatch and JS-like string operators).
Ok, then maybe there was a misunderstanding on my part. My understanding was that there was a Node coming from the parser, and that the node could be wrapped in some facility providing a jsvar like API.
Okay, no that's correct.
 My position is that it is preferable to have whatever DOM node be jsvar
 like out of the box rather than having to wrap it into something to get
 that.
But take into account that Algebraic already behaves much like jsvar (at least ideally), just without opDispatch and JavaScript operator emulation (which I'm strongly opposed to as a *default*). So the jsvar wrapper would really just be needed for the cases where really concise code is desired when operating on JSON objects. We also discussed an alternative approach similar to opt(n).foo.bar[1].baz, where n is a JSONValue and opt() creates a wrapper that enables safe navigation within the DOM, propagating any missing/mismatched fields to the final result instead of throwing. This could also be combined with a final type query: opt!string(n).foo.bar
Aug 12 2015
parent "Meta" <jared771 gmail.com> writes:
On Wednesday, 12 August 2015 at 07:19:05 UTC, Sönke Ludwig wrote:
 We also discussed an alternative approach similar to 
 opt(n).foo.bar[1].baz, where n is a JSONValue and opt() creates 
 a wrapper that enables safe navigation within the DOM, 
 propagating any missing/mismatched fields to the final result 
 instead of throwing. This could also be combined with a final 
 type query: opt!string(n).foo.bar
In relation to that, you may find this thread interesting: http://forum.dlang.org/post/lnsc0c$1sip$1 digitalmars.com
Aug 12 2015
prev sibling next sibling parent reply "Atila Neves" <atila.neves gmail.com> writes:
On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I forgot to give warning that the two week period was about to be up, and was unsure from the comments whether this would be ready for voting, so let's give it another two days unless there are objections. Atila
Aug 11 2015
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 11 August 2015 at 17:08:39 UTC, Atila Neves wrote:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila
Ok, some actionable items.

1/ How big is a JSON struct? What is the biggest element in the union? Is that element really needed? Recurse.

2/ As far as I can see, the elements are discriminated using typeid. An enum is preferable, as the compiler would know the values ahead of time and optimize based on this. It also allows the use of things like final switch.

3/ Going from the untyped world to the typed world and providing an API to get back to the untyped world is a losing strategy. That sounds true intuitively, but also from my experience manipulating JSON in various languages. The nodes produced by this lib need to be "manipulatable" as the unstructured values they represent.
Aug 11 2015
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 11-Aug-2015 20:30, deadalnix wrote:
 On Tuesday, 11 August 2015 at 17:08:39 UTC, Atila Neves wrote:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila
Ok, some actionable items.

1/ How big is a JSON struct? What is the biggest element in the union? Is that element really needed? Recurse.
+1. Also, most JS engines use NaN-boxing to fit the type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.
 2/ As far as I can see, the elements are discriminated using typeid. An
 enum is preferable, as the compiler would know the values ahead of time and
 optimize based on this. It also allows the use of things like final switch.
 3/ Going from the untyped world to the typed world and providing an API to
 get back to the untyped world is a losing strategy. That sounds true
 intuitively, but also from my experience manipulating JSON in various
 languages. The nodes produced by this lib need to be "manipulatable" as
 the unstructured values they represent.
-- Dmitry Olshansky
Aug 11 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 11.08.2015 at 20:15, Dmitry Olshansky wrote:
 On 11-Aug-2015 20:30, deadalnix wrote:
 Ok some actionable items.

 1/ How big is a JSON struct ? What is the biggest element in the union ?
 Is that element really needed ? Recurse.
+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.
But the array field already needs 16 bytes on 64-bit systems anyway. We could surely abuse some bits there to at least not use up more for the type tag, but before we go that far, we should first tackle some other questions, such as the allocation strategy of JSONValues during parsing, the Location field, and BigInt/Decimal support.

Maybe we should first have a vote on whether BigInt/Decimal should be supported or not, because that would at least settle some of the controversial tradeoffs. I didn't have a use for those personally, but at least we had the real-world issue in vibe.d's implementation that a ulong wasn't exactly representable.

My view generally still is that the DOM representation is something for convenient manipulation of small chunks of JSON, so performance is not a priority, but feature completeness is.
Aug 11 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12-Aug-2015 00:21, Sönke Ludwig wrote:
 Am 11.08.2015 um 20:15 schrieb Dmitry Olshansky:
 On 11-Aug-2015 20:30, deadalnix wrote:
 Ok some actionable items.

 1/ How big is a JSON struct ? What is the biggest element in the union ?
 Is that element really needed ? Recurse.
+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.
But the array field already needs 16 bytes on 64-bit systems anyway. We could surely abuse some bits there to at least not use up more for the type tag, but before we go that far, we should first tackle some other questions, such as the allocation strategy of JSONValues during parsing, the Location field and BigInt/Decimal support.
A pointer to the array should work for all fields > 8 bytes. Depending on the ratio of value frequency to array frequency (which is at least ~5-10 in any practical scenario), it would make things both more compact and faster.
 Maybe we should first have a vote about whether BigInt/Decimal should be
 supported or not, because that would at least solve some of the
 controversial tradeoffs. I didn't have a use for those personally, but
 at least we had the real-world issue in vibe.d's implementation that a
 ulong wasn't exactly representable.
Well, I've stated why I think BigInt should be optional. The reason is that C++ parsers don't even bother with anything beyond ulong/double, nor would e.g. any Node.js code bother with things beyond double. Lastly, we don't have BigFloat, so supporting BigInt but not BigFloat is kind of half-way. So please make it an option. And again, add an extra indirection (that is, a BigInt*) for the BigInt field in the union, because BigInts are extremely rare.
 My view generally still is that the DOM representation is something for
 convenient manipulation of small chunks of JSON, so that performance is
 not a priority, but feature completeness is.
I'm confused - there must be some struct that represents a useful value. And more importantly - is JSONValue going to be converted to jsvar? If not, I'm fine. Otherwise, whatever inefficiency is present in JSONValue would be compounded by this conversion process.

-- 
Dmitry Olshansky
Aug 11 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 12.08.2015 at 08:28, Dmitry Olshansky wrote:
 On 12-Aug-2015 00:21, Sönke Ludwig wrote:
 Am 11.08.2015 um 20:15 schrieb Dmitry Olshansky:
 On 11-Aug-2015 20:30, deadalnix wrote:
 Ok some actionable items.

 1/ How big is a JSON struct ? What is the biggest element in the
 union ?
 Is that element really needed ? Recurse.
+1 Also most JS engines use nan-boxing to fit type tag along with the payload in 8 bytes total. At least the _fast_ path of std.data.json should take advantage of similar techniques.
But the array field already needs 16 bytes on 64-bit systems anyway. We could surely abuse some bits there to at least not use up more for the type tag, but before we go that far, we should first tackle some other questions, such as the allocation strategy of JSONValues during parsing, the Location field and BigInt/Decimal support.
Pointer to array should work for all fields > 8 bytes. Depending on the ratio frequency of value vs frequency of array (which is at least an ~5-10 in any practical scenario) it would make things both more compact and faster.
 Maybe we should first have a vote about whether BigInt/Decimal should be
 supported or not, because that would at least solve some of the
 controversial tradeoffs. I didn't have a use for those personally, but
 at least we had the real-world issue in vibe.d's implementation that a
 ulong wasn't exactly representable.
Well, I've stated why I think BigInt should be optional. The reason is that C++ parsers don't even bother with anything beyond ulong/double, nor would e.g. any Node.js code bother with things beyond double.
The trouble begins with long vs. ulong, even if we leave larger numbers aside. We'd really have to support both, but choosing between the two is ambiguous, which isn't very pretty overall.
 Lastly we don't have BigFloat so supporting BigInt but not BigFloat is
 kinda half-way.
That's where Decimal would come in. There is some code for that commented out, but I really didn't want to add it without a standard Phobos implementation. But I wouldn't say that this is really an argument against BigInt, maybe more one for implementing a Decimal type.
 So please make it an option. And again add an extra indirection (that is
 BigInt*) for BigInt field in a union because they are extremely rare.
Good idea, didn't think about that.
 My view generally still is that the DOM representation is something for
 convenient manipulation of small chunks of JSON, so that performance is
 not a priority, but feature completeness is.
I'm confused - there must be some struct that represents a useful value.
There is also the lower-level JSONParserNode, which represents a single element of the JSON document. But since that struct is just part of a range, its size doesn't matter for speed or memory consumption (instances are not allocated or copied while parsing).
 And more importantly - is JSONValue going to be converted to jsvar? If
 not - I'm fine. Otherwise whatever inefficiency present in JSONValue
 would be accumulated by this conversion process.
By default and currently it isn't, but it might be an idea for the future. The jsvar struct could possibly be implemented as a wrapper around JSONValue as a whole, so that it doesn't have to perform an actual conversion of the whole document.

Generally, working with JSONValue is already rather inefficient due to all of the dynamic allocations needed to populate dynamic and associative arrays. Changing that would require switching to completely different underlying container types, which would at least make the API a lot less intuitive. We could of course also simply provide an alternative value representation that is not based on Algebraic (or an enum-tag-based alternative) and is not augmented with location information, but is optimized solely for speed and low memory consumption.
Aug 12 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/12/2015 12:44 AM, Sönke Ludwig wrote:
 That's where Decimal would come in. There is some code for that commented out,
 but I really didn't want to add it without a standard Phobos implementation. But
 I wouldn't say that this is really an argument against BigInt, maybe more one
 for implementing a Decimal type.
Make the type for storing a Number be a template parameter.
Aug 13 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 14.08.2015 at 07:11, Walter Bright wrote:
 On 8/12/2015 12:44 AM, Sönke Ludwig wrote:
 That's where Decimal would come in. There is some code for that
 commented out,
 but I really didn't want to add it without a standard Phobos
 implementation. But
 I wouldn't say that this is really an argument against BigInt, maybe
 more one
 for implementing a Decimal type.
Make the type for storing a Number be a template parameter.
Then we'd lose the ability to distinguish between integers and floating point in the same lexer instantiation, which is vital for certain input files to avoid losing precision for 64-bit integers. The only solution would be to use Decimal, but that doesn't exist yet and would be slow. But the use of BigInt is already controlled by a template parameter, only the std.bigint import is currently there unconditionally. Hm, another idea would be to store a void* (to a BigInt) instead of a BigInt and only import std.bigint locally in the accessor functions.
Aug 14 2015
next sibling parent "Ola Fosheim Grøstad" writes:
On Friday, 14 August 2015 at 07:14:34 UTC, Sönke Ludwig wrote:
 On 14.08.2015 at 07:11, Walter Bright wrote:
 Make the type for storing a Number be a template parameter.
Then we'd lose the ability to distinguish between integers and floating point in the same lexer instantiation, which is vital for certain input files to avoid losing precision for 64-bit integers. The only solution would be to use Decimal, but that doesn't exist yet and would be slow.
Why can't you specify many types? You should be able to query the range/precision of each type?
Aug 14 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 12:14 AM, Sönke Ludwig wrote:
 On 14.08.2015 at 07:11, Walter Bright wrote:
 Make the type for storing a Number be a template parameter.
Then we'd lose the ability to distinguish between integers and floating point in the same lexer instantiation, which is vital for certain input files to avoid losing precision for 64-bit integers. The only solution would be to use Decimal, but that doesn't exist yet and would be slow.
Two other solutions:

1. 'real' has enough precision to hold 64 bit integers.

2. You can use a union of 'long' and a template type T. Use the 'long' if it fits, and T if it doesn't.
Aug 14 2015
parent reply "Ola Fosheim Grøstad" writes:
On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:
 1. 'real' has enough precision to hold 64 bit integers.
Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
Aug 14 2015
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 2:20 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:
 1. 'real' has enough precision to hold 64 bit integers.
Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
You can always use T for that.
Aug 14 2015
prev sibling parent reply "Matthias Bentrup" <matthias.bentrup googlemail.com> writes:
On Friday, 14 August 2015 at 09:20:14 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:
 1. 'real' has enough precision to hold 64 bit integers.
Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
Actually the x87 format has 64 mantissa bits, although bit 63 is always '1' for normalized numbers.
Aug 14 2015
parent "Ola Fosheim Grøstad" writes:
On Friday, 14 August 2015 at 11:44:35 UTC, Matthias Bentrup wrote:
 On Friday, 14 August 2015 at 09:20:14 UTC, Ola Fosheim Grøstad 
 wrote:
 On Friday, 14 August 2015 at 08:03:34 UTC, Walter Bright wrote:
 1. 'real' has enough precision to hold 64 bit integers.
Except for the lowest negative value… (it has only 63 bits + floating point sign bit)
actually the x87 format has 64 mantissa bits, although the bit 63 is always '1' for normalized numbers.
Yes, Walter was right. The most negative number can be represented, since it is -(2^63): a power of two, so the exponent alone covers it (only 1 bit of the mantissa is needed).
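This is easy to verify, since -(2^63) needs only 1 mantissa bit:

```d
void main()
{
    // long.min == -(2^63) is a power of two, so it is exactly
    // representable in floating point and round-trips through real.
    real r = long.min;
    assert(cast(long) r == long.min);
}
```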
Aug 14 2015
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 11.08.2015 um 19:30 schrieb deadalnix:
 Ok some actionable items.

 1/ How big is a JSON struct ? What is the biggest element in the union ?
 Is that element really needed ? Recurse.
See http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html

The question whether each field is "really" needed obviously depends on the application. However, the biggest type is BigInt which, from a quick look, contains a dynamic array + a bool field, so it's not as compact as it could be, but also not really large. There is also an additional Location field that may sometimes be important for good error messages and the like and sometimes may be totally unneeded.

However, my goal when implementing this has never been to make the DOM representation as efficient as possible. The simple reason is that a DOM representation is inherently inefficient when compared to operating on the structure using either the pull parser or a deserializer that directly converts into a static D type. IMO these should be advertised instead of trying to milk a dead cow (in terms of performance).
 2/ As far as I can see, the elements are discriminated using typeid. An
 enum is preferable as the compiler would know values ahead of time and
 optimize based on this. It also allows use of things like final switch.
Using a tagged-union-like structure is definitely what I'd like to have, too. However, the main goal was to build the DOM type upon a generic algebraic type instead of using a home-brew tagged union. The reason is that it automatically makes different DOM types with a similar structure interoperable (JSON/BSON/TOML/...).

Now Phobos unfortunately only has Algebraic, which not only doesn't have a type enum, but is currently also really bad at keeping static type information when forwarding function calls or operators. The only options were basically to resort to Algebraic for now and have something that works, or to first implement an alternative algebraic type and get it accepted into Phobos, which would delay the whole process nearly indefinitely.
 3/ Going from the untyped world to the typed world and provide an API to
 get back to the untyped word is a loser strategy. That sounds true
 intuitively, but also from my experience manipulating JSON in various
 languages. The Nodes produced by this lib need to be "manipulatable" as
 the unstructured values they represent.
It isn't really clear to me what you mean by this. What exactly about JSONValue can't be manipulated like the "unstructured values [it] represent[s]"? Or do you perhaps mean the JSON -> deserialize -> manipulate -> serialize -> JSON approach? That definitely is not a "loser strategy"*, but yes, it is limited to applications where you have a partially fixed schema. However, arguably most applications fall into that category.

* OT: My personal observation is that sadly the overall tone in the community has generally become a lot less friendly over the last months. I'm a bit worried about where this may lead in the long term.
Aug 11 2015
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:
 See 
 http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html

 The question whether each field is "really" needed obviously 
 depends on the application. However, the biggest type is BigInt 
 that, form a quick look, contains a dynamic array + a bool 
 field, so it's not as compact as it could be, but also not 
 really large. There is also an additional Location field that 
 may sometimes be important for good error messages and the like 
 and sometimes may be totally unneeded.
Urg. Looks like BigInt should steal a bit somewhere instead of having a bool like this. That is not really your lib's fault, but it's quite a heavy cost. Consider this: if the struct fits into 2 registers, it will be passed around as such rather than in memory. That is a significant difference, for BigInt itself and, by proxy, for the JSON library.

Putting the BigInt thing aside, it seems like the biggest field in there is an array of JSONValues or a string. For the string, you can artificially limit the length by 3 bits to stick a tag in. That still gives absurdly large strings. For the JSONValue case, the alignment of the pointer is such that you can steal 3 bits from there. Or, as for string, the length can be used.

It seems very realizable to me to have the JSONValue struct fit into 2 registers, granted the tag fits in 3 bits (8 different types).

I can help with that if you want to.
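The pointer variant of the bit-stealing trick could be sketched like this (illustrative names; assumes 8-byte-aligned allocations, so the three low bits of any payload pointer are free for a tag):

```d
// Sketch: an 8-byte-aligned pointer always has its three low bits
// clear, so a 3-bit type tag (up to 8 kinds) can live there.
enum Kind : size_t { integer, floating, text, array, object }

struct PackedRef
{
    private size_t bits; // pointer | tag

    this(void* p, Kind k)
    {
        assert((cast(size_t) p & 7) == 0, "pointer must be 8-byte aligned");
        bits = cast(size_t) p | k;
    }

    Kind kind() const { return cast(Kind)(bits & 7); }
    void* ptr() const { return cast(void*)(bits & ~cast(size_t) 7); }
}

void main()
{
    long* p = new long; // GC allocations are at least 8-byte aligned
    *p = 123;
    auto r = PackedRef(p, Kind.integer);
    assert(r.kind == Kind.integer);
    assert(*cast(long*) r.ptr == 123);
}
```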
 However, my goal when implementing this has never been to make 
 the DOM representation as efficient as possible. The simple 
 reason is that a DOM representation is inherently inefficient 
 when compared to operating on the structure using either the 
 pull parser or using a deserializer that directly converts into 
 a static D type. IMO these should be advertised instead of 
 trying to milk a dead cow (in terms of performance).
Indeed. Still, JSON nodes should be as lightweight as possible.
 2/ As far as I can see, the elements are discriminated using 
 typeid. An
 enum is preferable as the compiler would know values ahead of 
 time and
 optimize based on this. It also allows use of things like final 
 switch.
Using a tagged union like structure is definitely what I'd like to have, too. However, the main goal was to build the DOM type upon a generic algebraic type instead of using a home-brew tagged union. The reason is that it automatically makes different DOM types with a similar structure interoperable (JSON/BSON/TOML/...).
That is a great point that I haven't considered. I'd go the other way around about it: providing a compatible typeid-based struct from the enum-tagged one for compatibility. It can even be alias this so the transition is transparent. The transformation is not bijective, so it'd be great to get the most restrictive form (the enum) and fall back on the least restrictive one (alias this) when wanted.
 Now Phobos unfortunately only has Algebraic, which not only 
 doesn't have a type enum, but is currently also really bad at 
 keeping static type information when forwarding function calls 
 or operators. The only options were basically to resort to 
 Algebraic for now, but have something that works, or to first 
 implement an alternative algebraic type and get it accepted 
 into Phobos, which would delay the whole process nearly 
 indefinitely.
That's fine. Done is better than perfect. Still, API changes tend to be problematic, so we need to nail that part at least, and an enum with a fallback on a typeid-based solution seems like the best option.
 Or do you perhaps mean the JSON -> deserialize -> manipulate -> 
 serialize -> JSON approach? That definitely is not a "loser 
 strategy"*, but yes, it is limited to applications where you 
 have a partially fixed schema. However, arguably most 
 applications fall into that category.
Yes.
Aug 11 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 12.08.2015 um 00:21 schrieb deadalnix:
 On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:
 See
 http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html


 The question whether each field is "really" needed obviously depends
 on the application. However, the biggest type is BigInt which, from a
 quick look, contains a dynamic array + a bool field, so it's not as
 compact as it could be, but also not really large. There is also an
 additional Location field that may sometimes be important for good
 error messages and the like and sometimes may be totally unneeded.
Urg. Looks like BigInt should steal a bit somewhere instead of having a bool like this. That is not really your lib's fault, but it's quite a heavy cost. Consider this: if the struct fits into 2 registers, it will be passed around as such rather than in memory. That is a significant difference, for BigInt itself and, by proxy, for the JSON library.
Agreed, this was what I also thought. Considering that BigInt is heavy anyway, Dimitry's suggestion to store a "BigInt*" sounds like a good idea to sidestep that issue, though.
 Putting the BigInt thing aside, it seems like the biggest field in there
 is an array of JSONValues or a string. For the string, you can
 artificially limit the length by 3 bits to stick a tag. That still give
 absurdly large strings. For the JSONValue case, the alignment on the
 pointer is such as you can steal 3 bits from there. Or as for string,
 the length can be used.

 It seems very realizable to me to have the JSONValue struct fit into 2
 registers, granted the tag fit in 3 bits (8 different types).

 I can help with that if you want to.
The question is mainly just: should we decide on a single way to represent values (either speed or features), or let the library user decide, either by making JSONValue a template or by providing two separate structs optimized for each case? In the latter case, we could really optimize on all fronts and for example use custom containers that need fewer allocations and are more cache friendly than the built-in ones.
 However, my goal when implementing this has never been to make the DOM
 representation as efficient as possible. The simple reason is that a
 DOM representation is inherently inefficient when compared to
 operating on the structure using either the pull parser or using a
 deserializer that directly converts into a static D type. IMO these
 should be advertised instead of trying to milk a dead cow (in terms of
 performance).
Indeed. Still, JSON nodes should be as lightweight as possible.
 2/ As far as I can see, the elements are discriminated using typeid. An
 enum is preferable as the compiler would know values ahead of time and
 optimize based on this. It also allows use of things like final switch.
Using a tagged union like structure is definitely what I'd like to have, too. However, the main goal was to build the DOM type upon a generic algebraic type instead of using a home-brew tagged union. The reason is that it automatically makes different DOM types with a similar structure interoperable (JSON/BSON/TOML/...).
That is a great point that I haven't considered. I'd go the other way around about it: providing a compatible typeid-based struct from the enum-tagged one for compatibility. It can even be alias this so the transition is transparent. The transformation is not bijective, so it'd be great to get the most restrictive form (the enum) and fall back on the least restrictive one (alias this) when wanted.
As long as the set of types is fixed, it would even be bijective.

Anyway, I've just started to work on a generic variant of an enum based algebraic type that exploits as much static type information as possible. If that works out (compiler bugs?), it would be a great thing to have in Phobos, so maybe it's worth delaying the JSON module for that if necessary. The optimization to store the type enum in the length field of dynamic arrays could also be built into the generic type.
 Now Phobos unfortunately only has Algebraic, which not only doesn't
 have a type enum, but is currently also really bad at keeping static
 type information when forwarding function calls or operators. The only
 options were basically to resort to Algebraic for now, but have
 something that works, or to first implement an alternative algebraic
 type and get it accepted into Phobos, which would delay the whole
 process nearly indefinitely.
That's fine. Done is better than perfect. Still, API changes tend to be problematic, so we need to nail that part at least, and an enum with a fallback on a typeid-based solution seems like the best option.
Yeah, the transition is indeed problematic. Sadly the "alias this" idea wouldn't work for that either, because operators and methods of the enum based algebraic type usually have different return types.
 Or do you perhaps mean the JSON -> deserialize -> manipulate ->
 serialize -> JSON approach? That definitely is not a "loser
 strategy"*, but yes, it is limited to applications where you have a
 partially fixed schema. However, arguably most applications fall into
 that category.
Yes.
Just to state explicitly what I mean: This strategy has the most efficient in-memory storage format and profits from all the static type checking niceties of the compiler. It also means that there is a documented schema in the code that can be used for reference by the developers and that will automatically be verified by the serializer, resulting in less and better checked code. So, where applicable, I claim that this is the best strategy to work with such data.

For maximum efficiency, it can also be transparently combined with the pull parser. The pull parser can for example be used to jump between array entries and the serializer then reads each single array entry.
Aug 12 2015
next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
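For readers who don't want to follow the link, the basic shape of such an enum-tagged type can be sketched as follows (illustrative names, not the gist's actual API):

```d
// Minimal home-grown sketch of an enum-tagged union built over a
// plain union type, in the spirit of the proof of concept.
union Base
{
    long integer;
    double floating;
    string text;
}

struct Tagged
{
    enum Kind { integer, floating, text }
    Kind kind;
    Base value;

    this(long v)   { kind = Kind.integer;  value.integer  = v; }
    this(double v) { kind = Kind.floating; value.floating = v; }
    this(string v) { kind = Kind.text;     value.text     = v; }
}

// The enum tag allows exhaustive handling via final switch.
string describe(Tagged t)
{
    final switch (t.kind)
    {
        case Tagged.Kind.integer:  return "integer";
        case Tagged.Kind.floating: return "floating";
        case Tagged.Kind.text:     return "text";
    }
    assert(0); // unreachable
}

void main()
{
    assert(describe(Tagged(42L)) == "integer");
    assert(describe(Tagged("hi")) == "text");
}
```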
Aug 12 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, Andrei
Aug 14 2015
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/14/2015 01:40 PM, Andrei Alexandrescu wrote:
 On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron
No, it isn't. I believe the word you might want is "pleonasm". :o)
 as there's no untagged algebraic type).
The tag is an implementation detail. Algebraic types are actually more naturally expressed as polymorphic higher-order functions.
Aug 14 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, Andrei
Ping on this. My working hypothesis:

- If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag.

- If there's some ordering among types (e.g. all types below 16 have some property etc.), then the integral tag again has an advantage over the pointer tag.

- Other than that, the pointer tag is superior to the integral tag at everything. Where it really wins is that there is one unique tag for each type, present or future, so the universe of representable types is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag.

I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure out whether there's an advantage to integral tags or, if not, settle the misconception for good.

Andrei
Aug 17 2015
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17-Aug-2015 21:12, Andrei Alexandrescu wrote:
 On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an enum based
 algebraic type that exploits as much static type information as
 possible. If that works out (compiler bugs?), it would be a great thing
 to have in Phobos, so maybe it's worth to delay the JSON module for
 that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, Andrei
Ping on this. My working hypothesis: - If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag. - If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag. - Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag. I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception.
Actually one can combine the two:
- use integer type tag for everything built-in
- use pointer tag for what is not

In code:

union NiftyTaggedUnion
{
    // The pointer must be at least 4-byte aligned, so to discern the
    // two schemes the int tag must have the LSB == 1.
    // (This assumes little-endian, though big-endian is doable too.)
    @property bool isIntTag() { return (common.head & 1) != 0; }
    IntTagged intTagged;
    PtrTagged ptrTagged;
    CommonUnion common;
}

struct CommonUnion
{
    ubyte[size_of_max_builtin] store;
    // this is where the type tag starts - pointer or int
    uint head;
}

union IntTagged // int-tagged
{
    union // builtins go here
    {
        int ival;
        double dval;
        // ...
    }
    uint tag;
}

union PtrTagged // ptr-to-typeinfo scheme
{
    ubyte[size_of_max_builtin] payload;
    TypeInfo* pinfo;
}

It's going to be challenging, but I think I can pull off even NaN-boxing with this scheme.

-- 
Dmitry Olshansky
Aug 17 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
Aug 17 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower. -- Dmitry Olshansky
Aug 17 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 2:55 AM, Dmitry Olshansky wrote:
 On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower.
I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- Andrei
Aug 18 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 18-Aug-2015 16:19, Andrei Alexandrescu wrote:
 On 8/18/15 2:55 AM, Dmitry Olshansky wrote:
 On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower.
I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- Andrei
Integer in a small range is faster to switch on. Plus, comparing to zero is faster, so if the common type has tag == 0 it's a net gain. Strictly speaking, a pointer with a vtbl is about as fast as a switch, but when we have to switch on 2 types, the vtbl dispatch needs to be based on 2 types instead of one. So ideally we need a vtbl per pair of types to support e.g. fast binary operators on TaggedAlgebraic. -- Dmitry Olshansky
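The "switch on 2 types" point can be illustrated by packing both operand tags into one index (hypothetical types, with a single switch standing in for the per-pair vtbl):

```d
// Sketch: pack both operand tags into one small integer so one
// switch/jump table dispatches a binary operator over all type pairs.
enum Tag : ubyte { integer, floating }

struct Num
{
    Tag tag;
    double raw; // holds either an exact small integer or a float
}

double add(Num a, Num b)
{
    switch ((a.tag << 1) | b.tag)
    {
        case 0: // int + int: integer semantics
            return cast(long) a.raw + cast(long) b.raw;
        case 1: case 2: case 3: // any floating operand
            return a.raw + b.raw;
        default:
            assert(0);
    }
}

void main()
{
    auto x = Num(Tag.integer, 2);
    auto y = Num(Tag.floating, 0.5);
    assert(add(x, x) == 4);
    assert(add(x, y) == 2.5);
}
```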
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 12:31 PM, Dmitry Olshansky wrote:
 On 18-Aug-2015 16:19, Andrei Alexandrescu wrote:
 On 8/18/15 2:55 AM, Dmitry Olshansky wrote:
 On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower.
I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- Andrei
Integer in a small range is faster to switch on. Plus comparing to zero is faster, so if the common type has tag == 0 it's a net gain.
Agreed. These are small gains though unless tight loops are concerned.
 Strictly speaking pointer with vtbl is about as fast as switch but when
 we have to switch on 2 types the vtbl dispatch needs to be based on 2
 types instead of one. So ideally we need vtbl per pair of type to
 support e.g. fast binary operators on TaggedAlgebraic.
But I'm talking about using pointers for indirect calls IN ADDITION to using pointers for simple integral comparison. So the comparison is not appropriate. It's better to have both options instead of just one. Andrei
Aug 18 2015
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 18-Aug-2015 19:35, Andrei Alexandrescu wrote:
 On 8/18/15 12:31 PM, Dmitry Olshansky wrote:
 On 18-Aug-2015 16:19, Andrei Alexandrescu wrote:
 On 8/18/15 2:55 AM, Dmitry Olshansky wrote:
 On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
 On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
 Actually one can combine the two:
 - use integer type tag for everything built-in
 - use pointer tag for what is not
But a pointer tag can do everything that an integer tag does. -- Andrei
albeit quite a deal slooower.
I think there's a misunderstanding. Pointers _are_ 64-bit integers and may be compared as such. You can use a pointer as an integer. -- Andrei
Integer in a small range is faster to switch on. Plus comparing to zero is faster, so if the common type has tag == 0 it's a net gain.
Agreed. These are small gains though unless tight loops are concerned.
 Strictly speaking pointer with vtbl is about as fast as switch but when
 we have to switch on 2 types the vtbl dispatch needs to be based on 2
 types instead of one. So ideally we need vtbl per pair of type to
 support e.g. fast binary operators on TaggedAlgebraic.
But I'm talking about using pointers for indirect calls IN ADDITION to using pointers for simple integral comparison. So the comparison is not appropriate. It's better to have both options instead of just one.
If the common-type fast path with tag == 0 is not relevant, then the only gain of the integer is being able to fit it in a couple of bytes or even reuse some vacant bits.

Another thing is that function addresses are rather sparse, so the switch statement would need some special preprocessing to make them more dense:
- subtract the start of the code segment (maybe, but this won't work with DLLs)
- shift right by 2 (4?) as functions are usually aligned

-- 
Dmitry Olshansky
Aug 18 2015
prev sibling next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Monday, 17 August 2015 at 18:12:02 UTC, Andrei Alexandrescu 
wrote:
 On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 On 8/12/15 5:43 AM, Sönke Ludwig wrote:
 Anyway, I've just started to work on a generic variant of an 
 enum based
 algebraic type that exploits as much static type information 
 as
 possible. If that works out (compiler bugs?), it would be a 
 great thing
 to have in Phobos, so maybe it's worth to delay the JSON 
 module for that
 if necessary.
First proof of concept: https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148 It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
struct TaggedAlgebraic(U) if (is(U == union)) { ... } Interesting. I think it would be best to rename it to TaggedUnion (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's no untagged algebraic type). A good place for it is straight in std.variant. What are the relative advantages of using an integral over a pointer to function? In other words, what's a side by side comparison of TaggedAlgebraic!U and Algebraic!(types inside U)? Thanks, Andrei
Ping on this. My working hypothesis: - If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag. - If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag. - Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag. I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception. Andrei
From the compiler's perspective, the tag is much nicer; the compiler can use a jump table, for instance. It is not a good solution for Variant (which needs to be able to represent arbitrary types), but if the number of types is finite, a tag is almost always a win. In the case of JSON, using a tag and a packing trick, it is possible to pack everything into a struct the size of 2 pointers without much trouble.
Aug 17 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/17/15 2:51 PM, deadalnix wrote:
  From the compiler perspective, the tag is much nicer. Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
 It is not a good solution for Variant (which needs to be able to
 represent arbitrary types) but if the amount of types is finite, tag is
 almost always a win.
 In the case of JSON, using a tag and packing trick, it is possible to
 pack everything in a 2 pointers sized struct without much trouble.
Point taken. Question is if this is worth it. Andrei
Aug 17 2015
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu 
wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
  From the compiler perspective, the tag is much nicer. 
 Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.
 It is not a good solution for Variant (which needs to be able 
 to
 represent arbitrary types) but if the amount of types is 
 finite, tag is
 almost always a win.
 In the case of JSON, using a tag and packing trick, it is 
 possible to
 pack everything in a 2 pointers sized struct without much 
 trouble.
Point taken. Question is if this is worth it.
Anything that makes it fit in two registers instead of three (= 2 regs + memory, in practice) is most likely worth it.
Aug 18 2015
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Tue, 18 Aug 2015 09:10:25 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu=20
 wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
  From the compiler perspective, the tag is much nicer.=20
 Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.
Here's an example with an enum tag, showing what compilers can do: http://goo.gl/NUZwNo

ARM ASM is easier to read for me. Feel free to switch to X86. The jump table requires only one instruction (the cmp #4 shouldn't be necessary for a final switch, probably a GDC/GCC enhancement). All instructions/data should be in the instruction cache. There's no register save / function call overhead.

If you use a pointer: http://goo.gl/9kb0vQ

No jump table optimization. Cache should be OK as well. No call overhead. Note how both examples can also combine the code for uint/int. If you use a function pointer instead you'll call a different function.

Calling a function through a pointer: http://goo.gl/zTU3sA

You have one indirect call. Probably hard for the branch prediction, although I don't really know. Probably also worse regarding cache. I also cheated by using one pointer only for add. In reality you'll need to store one pointer per operation or use a switch inside the called function.

I think it's reasonable to expect the enum version to be faster. To be really sure we'd need some benchmarks.
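The godbolt links above may rot, so here is a self-contained D sketch of the enum-tag layout being benchmarked (`Kind` and `Tagged` are invented names for this illustration, not from the proposed module). With a dense, zero-based enum, the compiler is free to lower the final switch to a single indexed jump:

```d
// Illustrative tagged-union layout; names are made up for this sketch.
enum Kind { null_, boolean, integer, floating, text }

struct Tagged
{
    Kind kind;
    union { bool b; long i; double f; string s; }
}

// A final switch over a dense, zero-based enum needs no default branch
// and is a natural candidate for a jump table:
string describe(Tagged v)
{
    final switch (v.kind)
    {
        case Kind.null_:    return "null";
        case Kind.boolean:  return "boolean";
        case Kind.integer:  return "integer";
        case Kind.floating: return "floating";
        case Kind.text:     return "text";
    }
}
```

A pointer tag, by contrast, would force a chain of equality comparisons here, since pointer values are neither consecutive nor bounded.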
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 7:02 AM, Johannes Pfau wrote:
 On Tue, 18 Aug 2015 09:10:25 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu
 wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
   From the compiler perspective, the tag is much nicer.
 Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.
Here's an example with an enum tag, showing what compilers can do: http://goo.gl/NUZwNo

ARM ASM is easier to read for me. Feel free to switch to X86. The jump table requires only one instruction (the cmp #4 shouldn't be necessary for a final switch, probably a GDC/GCC enhancement). All instructions/data should be in the instruction cache. There's no register save / function call overhead.

If you use a pointer: http://goo.gl/9kb0vQ
That's a language issue - switch does not work with any pointers. I just submitted https://issues.dlang.org/show_bug.cgi?id=14931. -- Andrei
Aug 18 2015
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Tue, 18 Aug 2015 10:58:17 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 8/18/15 7:02 AM, Johannes Pfau wrote:
 On Tue, 18 Aug 2015 09:10:25 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu
 wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
   From the compiler perspective, the tag is much nicer.
 Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead, which doesn't help you in a `switch` statement. Besides, you probably shouldn't compare pointers vs integers, but pointers vs enums.
Here's an example with an enum tag, showing what compilers can do: http://goo.gl/NUZwNo

ARM ASM is easier to read for me. Feel free to switch to X86. The jump table requires only one instruction (the cmp #4 shouldn't be necessary for a final switch, probably a GDC/GCC enhancement). All instructions/data should be in the instruction cache. There's no register save / function call overhead.

If you use a pointer: http://goo.gl/9kb0vQ
That's a language issue - switch does not work with any pointers. I just submitted https://issues.dlang.org/show_bug.cgi?id=14931. -- Andrei
Yes, if we enable switch for pointers we get nicer D code. No, this won't improve the ASM much: Enum values start at 0 and are consecutive. With a final switch they're also bounded. All these points do not apply to pointers. They don't start at 0, are not guaranteed to be consecutive and likely can't be used with final switch. Because of that a switch on pointers can never use jump tables.
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 11:39 AM, Johannes Pfau wrote:
 No, this won't improve the ASM much: Enum values start at 0 and are
 consecutive. With a final switch they're also bounded. All these points
 do not apply to pointers. They don't start at 0, are not guaranteed to
 be consecutive and likely can't be used with final switch. Because of
 that a switch on pointers can never use jump tables.
I agree there's a margin here in favor of integers, but it's getting thin. Meanwhile, pointers maintain large advantages of principle. I suggest we pursue better use of pointers as tags instead of adding integral-tagged unions to phobos. -- Andrei
Aug 18 2015
parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 18 August 2015 at 16:22:20 UTC, Andrei Alexandrescu 
wrote:
 On 8/18/15 11:39 AM, Johannes Pfau wrote:
 No, this won't improve the ASM much: Enum values start at 0 
 and are
 consecutive. With a final switch they're also bounded. All 
 these points
 do not apply to pointers. They don't start at 0, are not 
 guaranteed to
 be consecutive and likely can't be used with final switch. 
 Because of
 that a switch on pointers can never use jump tables.
I agree there's a margin here in favor of integers, but it's getting thin. Meanwhile, pointers maintain large advantages of principle. I suggest we pursue better use of pointers as tags instead of adding integral-tagged unions to phobos. -- Andrei
No: enums can also be crammed inline into the code for cheap, they can be inserted into an existing structure cheaply using bit manipulation most of the time, and the compiler can check that all cases are handled in an exhaustive manner. It is not getting thinner.
Aug 18 2015
prev sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 18 August 2015 at 14:58:08 UTC, Andrei Alexandrescu 
wrote:
 That's a language issue - switch does not work with any 
 pointers. I just submitted 
 https://issues.dlang.org/show_bug.cgi?id=14931. -- Andrei
No, it is not. If the set of values is not compact, there is no jump table.
Aug 18 2015
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 5:10 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
 On Monday, 17 August 2015 at 22:34:36 UTC, Andrei Alexandrescu wrote:
 On 8/17/15 2:51 PM, deadalnix wrote:
  From the compiler perspective, the tag is much nicer. Compiler can use
 jump table for instance.
The pointer is a more direct conduit to a jump table.
Not really, because it most likely doesn't point to where you need it, but to a `TypeInfo` struct instead
No, in std.variant it points to a dispatcher function. -- Andrei
Aug 18 2015
prev sibling parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
On 17.08.2015 at 20:12, Andrei Alexandrescu wrote:
 On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 struct TaggedAlgebraic(U) if (is(U == union)) { ... }

 Interesting. I think it would be best to rename it to TaggedUnion
 (instantly recognizable; also TaggedAlgebraic is an oxymoron as there's
 no untagged algebraic type). A good place for it is straight in
 std.variant.

 What are the relative advantages of using an integral over a pointer to
 function? In other words, what's a side by side comparison of
 TaggedAlgebraic!U and Algebraic!(types inside U)?

 Thanks,

 Andrei
Ping on this. My working hypothesis:

- If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag.
- If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag.
- Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag.

I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception.

Andrei
(reposting to NG, accidentally replied by e-mail)

Some more points come to mind:

- The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code.
- It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade.
- A hypothesis is that it is faster, because there is no function call indirection involved.
- It naturally enables fully statically typed operator forwarding as far as possible (have a look at the examples of the current version). A pointer based version could do this, too, but only by jumping through hoops.
- The same type can be used multiple times with a different enum name. This can alternatively be solved using a Typedef!T, but I had several occasions where that proved useful.

They both have their place, but IMO where the pointer approach really shines is for unbounded Variant types.
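The first point can be made concrete with a tiny sketch (all names invented here): an integral tag has a stable numeric value that can be written to disk or handed across a C boundary, which a function pointer, whose value changes with every process, cannot.

```d
// A ubyte-backed tag: a stable value for disk formats and C interop.
enum Kind : ubyte { null_, boolean, number, text }

ubyte[] serializeTag(Kind k)
{
    return [cast(ubyte) k];     // the tag is its own wire format
}

Kind deserializeTag(const(ubyte)[] data)
{
    return cast(Kind) data[0];  // round-trips across processes and runs
}
```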
Aug 17 2015
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Mon, 17 Aug 2015 20:56:18 +0200, Sönke Ludwig <sludwig outerproduct.org> wrote:

 On 17.08.2015 at 20:12, Andrei Alexandrescu wrote:
 On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
 struct TaggedAlgebraic(U) if (is(U == union)) { ... }

 Interesting. I think it would be best to rename it to TaggedUnion
 (instantly recognizable; also TaggedAlgebraic is an oxymoron as
 there's no untagged algebraic type). A good place for it is
 straight in std.variant.

 What are the relative advantages of using an integral over a
 pointer to function? In other words, what's a side by side
 comparison of TaggedAlgebraic!U and Algebraic!(types inside U)?

 Thanks,

 Andrei
Ping on this. My working hypothesis:

- If there's a way to make a tag smaller than one word, e.g. by using various packing tricks, then the integral tag has an advantage over the pointer tag.
- If there's some ordering among types (e.g. all types below 16 have some property etc), then the integral tag again has an advantage over the pointer tag.
- Other than that the pointer tag is superior to the integral tag at everything. Where it really wins is there is one unique tag for each type, present or future, so the universe of types representable is the total set. The pointer may be used for dispatching but also as a simple integral tag, so the pointer tag is a superset of the integral tag.

I've noticed many people are surprised by std.variant's use of a pointer instead of an integral for tagging. I'd like to either figure whether there's an advantage to integral tags, or if not settle for good a misconception.

Andrei
(reposting to NG, accidentally replied by e-mail)

Some more points come to mind:

- The enum is useful to be able to identify the types outside of the D code itself. For example when serializing the data to disk, or when communicating with C code.
- It enables the use of pattern matching (final switch), which is often very convenient, faster, and safer than an if-else cascade.
- A hypothesis is that it is faster, because there is no function call indirection involved.
- It naturally enables fully statically typed operator forwarding as far as possible (have a look at the examples of the current version). A pointer based version could do this, too, but only by jumping through hoops.
- The same type can be used multiple times with a different enum name. This can alternatively be solved using a Typedef!T, but I had several occasions where that proved useful.

They both have their place, but IMO where the pointer approach really shines is for unbounded Variant types.
I think Andrei's point is that a pointer tag can do most things an integral tag could, as you don't have to dereference the pointer:

void* tag;
if (tag == &someFunc!A)

So the only benefit is that the compiler knows that the _enum_ (not simply an integral) tag is bounded. So we gain:

* easier debugging (readable type tag)
* potentially better codegen (jump tables fit perfectly: ordered values, 0-x, no gaps)
* final switch

In some cases enum tags might also be smaller than a pointer.
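The `if (tag == &someFunc!A)` fragment above, expanded into a minimal compilable sketch (`handlerOf`, `PtrTagged`, and `wrap` are invented names; std.variant's real dispatcher is more involved):

```d
// One handler instantiation per type; its address serves as the tag.
void handlerOf(T)() {}

struct PtrTagged
{
    void function() tag;
    union { int i; double d; }
}

PtrTagged wrap(T)(T value)
{
    PtrTagged r;
    r.tag = &handlerOf!T;                 // unique per instantiated type
    static if (is(T == int))         r.i = value;
    else static if (is(T == double)) r.d = value;
    return r;
}

// The tag is compared without ever calling through it:
bool holds(T)(PtrTagged v) { return v.tag == &handlerOf!T; }
```

This shows the "no dereference needed" point: the pointer works as a plain tag value, but its values are neither dense nor ordered, so a switch over it cannot become a jump table.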
Aug 17 2015
parent reply "Suliman" <evermind live.ru> writes:
Why is this not working:
JSONValue x = parseJSONValue(`{"a": true, "b": "test"}`);

but:
string str = `{"a": true, "b": "test"}`;
JSONValue x = parseJSONValue(str);

work fine?
Aug 17 2015
parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
On 17.08.2015 at 21:32, Suliman wrote:
 Why not working:
 JSONValue x = parseJSONValue(`{"a": true, "b": "test"}`);

 but:
 string str = `{"a": true, "b": "test"}`;
 JSONValue x = parseJSONValue(str);

 work fine?
toJSONValue() is the right function in this case. I've updated the docs/examples to make that clearer.
Aug 17 2015
next sibling parent reply "Suliman" <evermind live.ru> writes:
On Monday, 17 August 2015 at 20:07:24 UTC, Sönke Ludwig wrote:
 Am 17.08.2015 um 21:32 schrieb Suliman:
 Why not working:
 JSONValue x = parseJSONValue(`{"a": true, "b": "test"}`);

 but:
 string str = `{"a": true, "b": "test"}`;
 JSONValue x = parseJSONValue(str);

 work fine?
toJSONValue() is the right function in this case. I've update the docs/examples to make that clearer.
I think I'm misunderstanding the concept of ranges. I reread the docs but can't understand what I am missing. Ranges are a way to access sequences, but why can't I take input from a string? Is a string not a range?
Aug 17 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 17.08.2015 at 22:23, Suliman wrote:
 On Monday, 17 August 2015 at 20:07:24 UTC, Sönke Ludwig wrote:
 toJSONValue() is the right function in this case. I've update the
 docs/examples to make that clearer.
I think that I miss understanding conception of ranges. I reread docs but can't understand what I am missing. Ranges is way to access of sequences, but why I can't take input from string? string is not range?
String is a valid range, but parseJSONValue takes a *reference* to a range, because it directly consumes the range and leaves anything that appears after the JSON value in the range. toJSON() on the other hand assumes that the JSON value occupies the whole input range.
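To make the by-reference consumption concrete, here is a sketch against the API under review (function names per the linked docs; the exact handling of the remainder is an assumption, so no result is claimed for it). It also shows why the string literal in the earlier question failed: an rvalue cannot bind to the `ref` range parameter.

```d
import stdx.data.json;

void demo()
{
    // toJSONValue: the entire input must be exactly one JSON value.
    JSONValue whole = toJSONValue(`{"a": true, "b": "test"}`);

    // parseJSONValue: takes its range by reference, consumes only the
    // first value, and leaves the rest in `input` for the caller.
    string input = `{"a": true}[1, 2, 3]`;
    JSONValue first = parseJSONValue(input);
    // `input` now starts at the remaining `[1, 2, 3]`.

    // parseJSONValue(`{"a": true}`) would not compile: a literal is an
    // rvalue and cannot bind to the ref parameter.
}
```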
Aug 17 2015
parent reply "Suliman" <evermind live.ru> writes:
 String is a valid range, but parseJSONValue takes a *reference* 
 to a range, because it directly consumes the range and leaves 
 anything that appears after the JSON value in the range. 
 toJSON() on the other hand assumes that the JSON value occupies 
 the whole input range.
Yes, I understood, but maybe it's better to rename it (or add a warning in the docs; I've seen your changes, but I think you should extend them more, to prevent people from making the mistake I did), because I think it would be hard to understand for people who come from other languages. I have been writing D for a long time, but some things still confuse me...
Do you use DUB to build? It should automatically download the 
dependency.
Failed to download http://code.dlang.org/packages/vibe-d/0.7.24.zip: 500 Internal Server Error

Possibly it was an issue with my provider; I will check it later. The error above occurred while attempting to download the new version of vibe.d.
Aug 17 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 17.08.2015 at 22:58, Suliman wrote:
 String is a valid range, but parseJSONValue takes a *reference* to a
 range, because it directly consumes the range and leaves anything that
 appears after the JSON value in the range. toJSON() on the other hand
 assumes that the JSON value occupies the whole input range.
Yeas, I understood, but maybe it's better to rename it (or add attention in docs, I seen your changes, but I think that you should extend it more, to prevent people doing mistake that I did) , because I think that it would be hard to understand it for people who come from other languages. I am writing in D for a long time, but still some things make me confuse...
I agree that the naming can be a bit confusing at first, but I chose those names to be consistent with std.conv (to!T and parse!T). I've also just noticed that the parser module example erroneously uses parseJSONValue(). With proper examples, this should hopefully not be that big of a deal.
Aug 17 2015
prev sibling parent reply "Suliman" <evermind live.ru> writes:
Also I can't build last build from git. I am getting error:

source\stdx\data\json\value.d(25,8): Error: module 
taggedalgebraic is in file 'taggedalgebraic.d' which cannot be 
read
Aug 17 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 17.08.2015 at 22:31, Suliman wrote:
 Also I can't build last build from git. I am getting error:

 source\stdx\data\json\value.d(25,8): Error: module taggedalgebraic is in
 file 'taggedalgebraic.d' which cannot be read
Do you use DUB to build? It should automatically download the dependency. Alternatively, it's located here: https://github.com/s-ludwig/taggedalgebraic/blob/master/source/taggedalgebraic.d
Aug 17 2015
parent "Suliman" <evermind live.ru> writes:
Also could you look at theme 
http://stackoverflow.com/questions/32033817/how-to-insert-date-to-arangodb

And suggest your variant or approve on of existent.
Aug 17 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/17/15 2:56 PM, Sönke Ludwig wrote:
 - The enum is useful to be able to identify the types outside of the D
 code itself. For example when serializing the data to disk, or when
 communicating with C code.
OK.
 - It enables the use of pattern matching (final switch), which is often
 very convenient, faster, and safer than an if-else cascade.
Sounds tenuous.
 - A hypothesis is that it is faster, because there is no function call
 indirection involved.
Again: pointers do all integrals do. To compare: if (myptr == ThePtrOf!int) { ... this is an int ... } I want to make clear that this is understood.
 - It naturally enables fully statically typed operator forwarding as far
 as possible (have a look at the examples of the current version). A
 pointer based version could do this, too, but only by jumping through
 hoops.
I'm unclear on that. Could you please point me to the actual file and lines?
 - The same type can be used multiple times with a different enum name.
 This can alternatively be solved using a Typedef!T, but I had several
 occasions where that proved useful.
Unclear on this. Andrei
Aug 17 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 18.08.2015 at 00:37, Andrei Alexandrescu wrote:
 On 8/17/15 2:56 PM, Sönke Ludwig wrote:
 - The enum is useful to be able to identify the types outside of the D
 code itself. For example when serializing the data to disk, or when
 communicating with C code.
OK.
 - It enables the use of pattern matching (final switch), which is often
 very convenient, faster, and safer than an if-else cascade.
Sounds tenuous.
It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string])). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly. It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade). It's safer because of the possibility to use final switch in addition to a normal switch. I wouldn't call that tenuous.
 - A hypothesis is that it is faster, because there is no function call
 indirection involved.
Again: pointers do all integrals do. To compare: if (myptr == ThePtrOf!int) { ... this is an int ... } I want to make clear that this is understood.
Got that.
 - It naturally enables fully statically typed operator forwarding as far
 as possible (have a look at the examples of the current version). A
 pointer based version could do this, too, but only by jumping through
 hoops.
I'm unclear on that. Could you please point me to the actual file and lines?
See the operator implementation code [1] that is completely statically typed until the final "switch" happens [2]. You can of course do the same for the pointer based Algebraic, but that would just duplicate/override the code that is already implemented by the pointer method.
 - The same type can be used multiple times with a different enum name.
 This can alternatively be solved using a Typedef!T, but I had several
 occasions where that proved useful.
Unclear on this.
I'd say this is just a little perk of the representation but not a hard argument since it can be achieved in a different way relatively easily. [1]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L145 [2]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L551
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 1:21 PM, Sönke Ludwig wrote:
 On 18.08.2015 at 00:37, Andrei Alexandrescu wrote:
 On 8/17/15 2:56 PM, Sönke Ludwig wrote:
 - The enum is useful to be able to identify the types outside of the D
 code itself. For example when serializing the data to disk, or when
 communicating with C code.
OK.
 - It enables the use of pattern matching (final switch), which is often
 very convenient, faster, and safer than an if-else cascade.
Sounds tenuous.
It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string]). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly. It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade). It's safer because of the possibility to use final switch in addition to a normal switch. I wouldn't call that tenuous.
Well I guess I would, but no matter. It's something where reasonable people may disagree.
 - A hypothesis is that it is faster, because there is no function call
 indirection involved.
Again: pointers do all integrals do. To compare: if (myptr == ThePtrOf!int) { ... this is an int ... } I want to make clear that this is understood.
Got that.
 - It naturally enables fully statically typed operator forwarding as far
 as possible (have a look at the examples of the current version). A
 pointer based version could do this, too, but only by jumping through
 hoops.
I'm unclear on that. Could you please point me to the actual file and lines?
See the operator implementation code [1] that is completely statically typed until the final "switch" happens [2]. You can of course do the same for the pointer based Algebraic, but that would just duplicate/override the code that is already implemented by the pointer method.
Classic code factoring can be done to avoid duplication.
 - The same type can be used multiple times with a different enum name.
 This can alternatively be solved using a Typedef!T, but I had several
 occasions where that proved useful.
Unclear on this.
I'd say this is just a little perk of the representation but not a hard argument since it can be achieved in a different way relatively easily. [1]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L145 [2]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L551
Thanks. Andrei
Aug 21 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 21.08.2015 at 18:56, Andrei Alexandrescu wrote:
 On 8/18/15 1:21 PM, Sönke Ludwig wrote:
 On 18.08.2015 at 00:37, Andrei Alexandrescu wrote:
 On 8/17/15 2:56 PM, Sönke Ludwig wrote:
 - The enum is useful to be able to identify the types outside of the D
 code itself. For example when serializing the data to disk, or when
 communicating with C code.
OK.
 - It enables the use of pattern matching (final switch), which is often
 very convenient, faster, and safer than an if-else cascade.
Sounds tenuous.
It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string]). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly. It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade). It's safer because of the possibility to use final switch in addition to a normal switch. I wouldn't call that tenuous.
Well I guess I would, but no matter. It's something where reasonable people may disagree.
It depends on the perspective/use case, so it's surely not unreasonable to disagree here. But I'm especially not happy with the "final switch" argument getting dismissed so easily. By the same logic, we could also question the existence of "final switch", or even "switch", as a feature in the first place. Performance benefits are certainly nice, too, but that's really just an implementation detail. The important trait is that the types get a name and that they form an enumerable set. This is quite similar to comparing a struct with named members to an anonymous Tuple!(T...).
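The struct-vs-Tuple analogy in the last paragraph, made concrete in a minimal sketch: both forms carry the same data, but only the struct gives its parts names that the type system and the reader can enumerate, just as the enum names the members of the tagged union.

```d
import std.typecons : Tuple;

struct Point { int x; int y; }        // members are named and enumerable
alias AnonPoint = Tuple!(int, int);   // same layout, anonymous members

// Same data either way; only the struct documents what each slot means.
```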
Aug 22 2015
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 12 August 2015 at 08:21:41 UTC, Sönke Ludwig wrote:
 Just to state explicitly what I mean: This strategy has the 
 most efficient in-memory storage format and profits from all 
 the static type checking niceties of the compiler. It also 
 means that there is a documented schema in the code that can be 
 used for reference by the developers and that will 
 automatically be verified by the serializer, resulting in less 
 and better checked code. So where applicable I claim that this 
 is the best strategy to work with such data.

 For maximum efficiency, it can also be transparently combined 
 with the pull parser. The pull parser can for example be used 
 to jump between array entries and the serializer then reads 
 each single array entry.
Thing is, the schema is not always known perfectly. A typical case is JSON used for configuration, with diverse versions of the software adding new configuration capabilities or ignoring old ones.
Aug 12 2015
next sibling parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 12.08.2015 at 19:10, deadalnix wrote:
 On Wednesday, 12 August 2015 at 08:21:41 UTC, Sönke Ludwig wrote:
 Just to state explicitly what I mean: This strategy has the most
 efficient in-memory storage format and profits from all the static
 type checking niceties of the compiler. It also means that there is a
 documented schema in the code that can be used for reference by the
 developers and that will automatically be verified by the serializer,
 resulting in less and better checked code. So where applicable I claim
 that this is the best strategy to work with such data.

 For maximum efficiency, it can also be transparently combined with the
 pull parser. The pull parser can for example be used to jump between
 array entries and the serializer then reads each single array entry.
Thing is, the schema is not always known perfectly? Typical case is JSON used for configuration, and diverse version of the software adding new configurations capabilities, or ignoring old ones.
For example in the serialization framework of vibe.d you can have optional or Nullable fields, you can choose to ignore or error out on unknown fields, and you can have fields of type "Json" or associative arrays to match arbitrary structures. This usually gives enough flexibility, assuming that the program is just interested in fields that it knows about. Of course there are situations where you really just want to access the raw JSON structure, possibly because you are just interested in a small subset of the data. Both, the DOM or the pull parser based approaches, fit in there, based on convenience vs. performance considerations. But things like storing data as JSON in a database or implementing a JSON based protocol usually fit the schema based approach perfectly.
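A sketch of that approach with vibe.d's serialization framework (attribute and function names per vibe.data.serialization / vibe.data.json; the field set is invented for illustration):

```d
import vibe.data.json;                     // Json, deserializeJson
import vibe.data.serialization : optional;

struct Config
{
    string host;                   // required: deserialization errors if absent
    @optional ushort port = 8080;  // older config files may omit this
    @optional Json extra;          // catch-all for structure not known here
}

void demo()
{
    // `port` is simply left at its default when the field is missing.
    auto c = deserializeJson!Config(`{"host": "example.org"}`);
}
```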
Aug 12 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/12/2015 10:10 AM, deadalnix wrote:
 Thing is, the schema is not always known perfectly? Typical case is JSON used
 for configuration, and diverse version of the software adding new
configurations
 capabilities, or ignoring old ones.
Hah, I'd like to replace dmd.conf with a .json file.
Aug 12 2015
next sibling parent reply "CraigDillabaugh" <craig.dillabaugh gmail.com> writes:
On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 On 8/12/2015 10:10 AM, deadalnix wrote:
 Thing is, the schema is not always known perfectly? Typical 
 case is JSON used
 for configuration, and diverse version of the software adding 
 new configurations
 capabilities, or ignoring old ones.
Hah, I'd like to replace dmd.conf with a .json file.
Not .json! No configuration file should be in a format that doesn't support comments.
Aug 13 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support comments.
[ "comment" : "and you thought it couldn't have comments!" ]
Aug 13 2015
next sibling parent "Craig Dillabaugh" <craig.dillabaugh gmail.com> writes:
On Friday, 14 August 2015 at 00:16:47 UTC, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't 
 support comments.
[ "comment" : "and you thought it couldn't have comments!" ]
You are cheating :o) There do seem to be some ways to comment JSON files, but they all feel, and look, like hacks. I think something like YAML or even SDLang would be better. Anyway, at least you aren't proposing XML, so I won't complain too loudly.
Aug 13 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
There can't be two comments with the same key though. -- Andrei
Aug 14 2015
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- Andrei
Why not? I believe this is valid json:

{
    "comment" : "this is the first value",
    "value1" : 42,
    "comment" : "this is the second value",
    "value2" : 101
}

Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.

-Steve
Aug 14 2015
next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 13:10:53 UTC, Steven Schveighoffer 
wrote:
 On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't 
 support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- Andrei
Why not? I believe this is valid json:
http://tools.ietf.org/html/rfc7159 «The names within an object SHOULD be unique.» «An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates. JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.»
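That SHOULD-level looseness is easy to observe in practice. A quick illustration with Python's stdlib json module, which happens to be one of the "report the last name/value pair only" implementations, while its object_pairs_hook parameter exposes every pair:

```python
import json

# An object with a duplicate "comment" name:
text = '{ "comment" : "first", "value1" : 42, "comment" : "second", "value2" : 101 }'

# By default, the last name/value pair wins:
doc = json.loads(text)
assert doc["comment"] == "second"

# object_pairs_hook sees all pairs, duplicates included:
pairs = json.loads(text, object_pairs_hook=lambda p: p)
assert [k for k, _ in pairs] == ["comment", "value1", "comment", "value2"]
```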
Aug 14 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/15 9:10 AM, Steven Schveighoffer wrote:
 On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- Andrei
Why not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.
You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object -- Andrei
Aug 14 2015
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 13:30:44 UTC, Andrei Alexandrescu 
wrote:
 On 8/14/15 9:10 AM, Steven Schveighoffer wrote:
 On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't 
 support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- 
 Andrei
Why not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.
You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object -- Andrei
No, he is wrong, and even if he was right, he would still be wrong. JSON objects are unordered, so if you read then write you can get:

{
    "comment" : "this is the second value",
    "value1" : 42,
    "value2" : 101
}
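That write-back restructuring is easy to reproduce. A sketch with Python's stdlib json module, which keeps only the last duplicate on read, so a round trip silently drops the first comment:

```python
import json

text = ('{ "comment" : "this is the first value", "value1" : 42, '
        '"comment" : "this is the second value", "value2" : 101 }')

# The duplicate key collapses to its last value on read...
doc = json.loads(text)
assert doc["comment"] == "this is the second value"

# ...so a read/write round trip loses the first comment entirely.
assert json.dumps(doc) == '{"comment": "this is the second value", "value1": 42, "value2": 101}'
```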
Aug 14 2015
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/14/15 9:37 AM, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Friday, 14 August 2015 at 13:30:44 UTC, Andrei Alexandrescu wrote:
 On 8/14/15 9:10 AM, Steven Schveighoffer wrote:
 On 8/14/15 8:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
This is invalid (though probably unintentionally). An array cannot have names for elements.
 There can't be two comments with the same key though. -- Andrei
Why not? I believe this is valid json: { "comment" : "this is the first value", "value1" : 42, "comment" : "this is the second value", "value2" : 101 } Though, I would much rather see a better comment tag than "comment":. json isn't ideal for this.
You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object
Yes, that's what I checked first :)
 No, he is wrong, and even if he was right, he would still be wrong. JSON
 objects are unordered so if you read then write you can get:

 {
      "comment" : "this is the second value",
      "value1" : 42,
      "value2" : 101
 }
Sure, but:

a) we aren't writing
b) comments are for the human reader, not for the program. Dmd should ignore the comments, and it doesn't matter the order.
c) it's not important; I think we all agree a format that has specific allowances for comments is better than json.

-Steve
Aug 14 2015
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 14:09:25 UTC, Steven Schveighoffer 
wrote:
 a) we aren't writing
 b) comments are for the human reader, not for the program. Dmd 
 should ignore the comments, and it doesn't matter the order.
 c) it's not important, I think we all agree a format that has 
 specific allowances for comments is better than json.
One should have a config file format for which there are standard libraries that preserve structure and comments. It is quite common to have tools that read and write config files.
Aug 14 2015
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/14/15 10:44 AM, "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Friday, 14 August 2015 at 14:09:25 UTC, Steven Schveighoffer wrote:
 a) we aren't writing
 b) comments are for the human reader, not for the program. Dmd should
 ignore the comments, and it doesn't matter the order.
 c) it's not important, I think we all agree a format that has specific
 allowances for comments is better than json.
One should have a config file format for which there are standard libraries that preserve structure and comments. It is quite common to have tools that read and write config files.
And that would be possible here. JSON file format says nothing about how the data is stored in your library. But again, not important. -Steve
Aug 14 2015
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 15:11:41 UTC, Steven Schveighoffer 
wrote:
 And that would be possible here. JSON file format says nothing 
 about how the data is stored in your library. But again, not 
 important.
It isn't important since JSON is not too good as a config file format, but it is important when considering other formats. When you read a JSON file into Python or JavaScript and write it back, all dictionary objects will be restructured. For instance, when a tool reads a config file and removes attributes, it is desirable that the removed attributes are commented out. With JSON you would have to hack around it like this:

[ { "fieldname1" : "value1" }, { "fieldname2" : "value2" } ]

Which is ugly. I think it would be nice if all D tooling standardized on YAML and provided a convenient DOM for it. It is used quite a lot and editors have support for it.
Aug 14 2015
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Friday, 14 August 2015 at 15:29:12 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 14 August 2015 at 15:11:41 UTC, Steven Schveighoffer 
 wrote:
 And that would be possible here. JSON file format says nothing 
 about how the data is stored in your library. But again, not 
 important.
It isn't important since JSON is not too good as a config file format, but it is important when considering other formats. When you read a JSON file into Python or JavaScript and write it back, all dictionary objects will be restructured. For instance, when a tool reads a config file and removes attributes, it is desirable that the removed attributes are commented out. With JSON you would have to hack around it like this: [ { "fieldname1" : "value1" }, { "fieldname2" : "value2" } ] Which is ugly. I think it would be nice if all D tooling standardized on YAML and provided a convenient DOM for it. It is used quite a lot and editors have support for it.
It doesn't matter what you think of JSON. JSON is widely used and needed in the standard lib. PERIOD.
Aug 14 2015
next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 17:31:02 UTC, deadalnix wrote:
 JSON is widely used and needed in the standard lib. PERIOD.
The discussion was about suitability as a standard config file format for D, not about whether it should be in the standard lib. JSON, XML and YAML all belong in a standard lib.
Aug 14 2015
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/14/15 1:30 PM, deadalnix wrote:
 It doesn't matter what you think of JSON.

 JSON is widely used and needed in the standard lib. PERIOD.
I think you are missing that this sub-discussion is about using json to replace dmd configuration file. -Steve
Aug 14 2015
parent "rsw0x" <anonymous anonymous.com> writes:
On Friday, 14 August 2015 at 17:40:01 UTC, Steven Schveighoffer 
wrote:
 On 8/14/15 1:30 PM, deadalnix wrote:
 It doesn't matter what you think of JSON.

 JSON is widely used and needed in the standard lib. PERIOD.
I think you are missing that this sub-discussion is about using json to replace dmd configuration file. -Steve
dub uses sdlang, why not dmd?
Aug 14 2015
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 6:30 AM, Andrei Alexandrescu wrote:
 On 8/14/15 9:10 AM, Steven Schveighoffer wrote:
 Though, I would much rather see a better comment tag than "comment":.
 json isn't ideal for this.
You're right. Good convo: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object -- Andrei
When going for portability, it is not a good idea to emit duplicate keys because many json parsers fail on it. For our own json readers, such as reading a dmd.json file with our own parser, it should be fine.
Aug 14 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 5:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
Should be { }, not [ ]
 There can't be two comments with the same key though. -- Andrei
The Json spec doesn't say that - it doesn't specify any semantic meaning.
Aug 14 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 1:30 PM, Walter Bright wrote:
 On 8/14/2015 5:51 AM, Andrei Alexandrescu wrote:
 On 8/13/15 8:16 PM, Walter Bright wrote:
 On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
 No configuration file should be in a format that doesn't support
 comments.
[ "comment" : "and you thought it couldn't have comments!" ]
Should be { }, not [ ]
 There can't be two comments with the same key though. -- Andrei
The Json spec doesn't say that - it doesn't specify any semantic meaning.
That is, the ECMA 404 spec. There seems to be more than one JSON spec. www.ecma-international.org/.../files/.../ECMA-404.pdf
Aug 14 2015
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 08/14/2015 04:33 PM, Walter Bright wrote:
 That is, the ECMA 404 spec. There seems to be more than one JSON spec.

 www.ecma-international.org/.../files/.../ECMA-404.pdf
Amusingly, that "ECMA-404" link results in an actual HTTP 404.
Aug 21 2015
prev sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
Aug 13 2015
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Friday, 14 August 2015 at 00:18:39 UTC, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright 
 wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
Referring to TOML? https://github.com/toml-lang/toml
Aug 13 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.
Aug 13 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14-Aug-2015 03:48, Walter Bright wrote:
 On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.
YAML is (plus/minus braces) the same but supports comments and is increasingly popular for hierarchical configuration files. -- Dmitry Olshansky
Aug 13 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 11:54 PM, Dmitry Olshansky wrote:
 On 14-Aug-2015 03:48, Walter Bright wrote:
 On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.
YAML is (plus/minus braces) the same but supports comments and is increasingly popular for hierarchical configuration files.
Yes, but we (will) have a .json parser in Phobos.
Aug 14 2015
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;) -- /Jacob Carlborg
Aug 14 2015
next sibling parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 15/08/2015 12:40 a.m., Jacob Carlborg wrote:
 On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;)
Heyyy Sonke ;)
Aug 14 2015
prev sibling next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 14 August 2015 at 12:40:32 UTC, Jacob Carlborg wrote:
 On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;)
I think kiith-sa has started on that: https://github.com/kiith-sa/D-YAML
Aug 14 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 5:40 AM, Jacob Carlborg wrote:
 On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;)
That's a good idea, but since dmd already emits json and requires incorporation of the json code, the fewer file formats it has to deal with, the better. Config files will work fine with json format.
Aug 14 2015
parent reply "suliman" <Evermind live.ru> writes:
On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
 On 8/14/2015 5:40 AM, Jacob Carlborg wrote:
 On 2015-08-14 10:04, Walter Bright wrote:

 Yes, but we (will) have a .json parser in Phobos.
Time to add a YAML parser ;)
That's a good idea, but since dmd already emits json and requires incorporation of the json code, the fewer file formats it has to deal with, the better. Config files will work fine with json format.
Walter, what should I do to comment out a string in the config for testing purposes? How can that be done with JSON? I really think that dmd should use the same format as dub.
Aug 14 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/14/2015 9:58 PM, suliman wrote:
 On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
 Config files will work fine with json format.
 Walter, what should I do to comment out a string in the config for testing purposes? How can that be done with JSON?
{ "comment" : "this is a comment" }
 I really think that dmd should use same format as dub
json is a format that everybody understands, and dmd has json code already in it (as dmd generates json files)
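If a config reader treats "comment" keys this way, skipping them on load is a one-liner. A minimal sketch in Python (the key name and the config field shown are made up for illustration, not dmd's actual format):

```python
import json

def load_config(text):
    # Drop top-level "comment" entries; everything else is real configuration.
    return {k: v for k, v in json.loads(text).items() if k != "comment"}

cfg = load_config('{ "comment" : "testing flags", "DFLAGS" : "-I~/dmd2/src/phobos" }')
assert cfg == {"DFLAGS": "-I~/dmd2/src/phobos"}
```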
Aug 14 2015
next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Saturday, 15 August 2015 at 05:03:52 UTC, Walter Bright wrote:
 On 8/14/2015 9:58 PM, suliman wrote:
 On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
 Config files will work fine with json format.
 Walter, what should I do to comment out a string in the config for testing purposes? How can that be done with JSON?
{ "comment" : "this is a comment" }
 I really think that dmd should use same format as dub
json is a format that everybody understands, and dmd has json code already in it (as dmd generates json files)
And you end up with each D tool having their own config format… :-( http://www.json2yaml.com/
Aug 15 2015
prev sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 08/15/2015 01:03 AM, Walter Bright wrote:
 On 8/14/2015 9:58 PM, suliman wrote:
 On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
 Config files will work fine with json format.
Walter, what should I do to comment out a string in the config for testing purposes? How can that be done with JSON?
{ "comment" : "this is a comment" }
I'll take an "invented our own, rather stupid and limited, format" over comments that ugly any day. Seriously, with DUB, I've been using json for configuration files a lot lately, and dmd.conf is a way nicer config format. There's a very good reason DUB added an alternate format.
Aug 21 2015
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14-Aug-2015 11:04, Walter Bright wrote:
 On 8/13/2015 11:54 PM, Dmitry Olshansky wrote:
 On 14-Aug-2015 03:48, Walter Bright wrote:
 On 8/13/2015 5:18 PM, Adam D. Ruppe wrote:
 On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
 Hah, I'd like to replace dmd.conf with a .json file.
There's an awful lot of people out there replacing json with more ini-like files....
We've currently invented our own, rather stupid and limited, format. There's no point to it over .json.
YAML is (plus/minus braces) the same but supports comments and is increasingly popular for hierarchical configuration files.
Yes, but we (will) have a .json parser in Phobos.
We actually have a YAML parser in the DUB repository, so that could be copied over to the compiler source in the interim. It doesn't have to be particularly fast; it just has to work reasonably well. -- Dmitry Olshansky
Aug 14 2015
prev sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:
 However, my goal when implementing this has never been to make 
 the DOM representation as efficient as possible. The simple 
 reason is that a DOM representation is inherently inefficient 
 when compared to operating on the structure using either the 
 pull parser or using a deserializer that directly converts into 
 a static D type. IMO these should be advertised instead of 
 trying to milk a dead cow (in terms of performance).
Maybe it is better to just focus on having a top-of-the-line parser and then let competing DOM implementations build on top of it. I'm personally only interested in structured JSON; I think most webapps use structured JSON informally.
Aug 12 2015
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 11.08.2015 um 19:08 schrieb Atila Neves:
 On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I forgot to give warnings that the two week period was about to be up, and was unsure from comments if this would be ready for voting, so let's give it another two days unless there are objections. Atila
I think we really need to have an informal pre-vote about the BigInt and DOM efficiency vs. functionality issues. Basically there are three options for each (plus a fourth one specific to BigInt):

1. Keep them: May have an impact on compile time for big DOMs (run time/memory consumption wouldn't be affected if a pointer to BigInt is stored), but provides an out-of-the-box experience for a broad set of applications.

2. Remove them: Results in a slim and clean API that is fast (to run/compile), but also one that will be less useful for certain applications.

3. Make them CT configurable: Best of both worlds in terms of speed, at the cost of a more complex API.

4. Use a string representation instead of BigInt: This has its own set of issues, but would also enable some special use cases [1] [2] ([2] is also solved by BigInt/Decimal support, though).

I'd also like to postpone the main vote, if there are no objections, until the question of using a general enum based alternative to Algebraic is answered. I've published an initial candidate for this now [3].

These were, AFAICS, the only major open issues (a decision for an opt() variant would be nice, but fortunately that's not a fundamental decision in any way). There is also the topic of avoiding any redundancy in symbol names, which I don't agree with, but I would of course change it if the inclusion depends on that.

[1]: https://github.com/rejectedsoftware/vibe.d/issues/431
[2]: http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/10098/
[3]: http://code.dlang.org/packages/taggedalgebraic
Aug 13 2015
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
 These were, AFAICS, the only major open issues (a decision for an opt() variant
 would be nice, but fortunately that's not a fundamental decision in any way).
1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last. 2. Why are integers acceptable as lexer input? The spec specifies Unicode. 3. Why are there 4 functions that do the same thing? http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html After all, there already is a http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
Aug 13 2015
parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 14.08.2015 um 02:26 schrieb Walter Bright:
 On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
 These were, AFAICS, the only major open issues (a decision for an
 opt() variant
 would be nice, but fortunately that's not a fundamental decision in
 any way).
1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last.
Hm, it *is* the first function argument, just the last template argument.
 2. Why are integers acceptable as lexer input? The spec specifies Unicode.
In this case, the lexer will perform on-the-fly UTF validation of the input. It can do so more efficiently than first validating the input using a wrapper range, because it has to check the value of most incoming code units anyway.
 3. Why are there 4 functions that do the same thing?

 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html

 After all, there already is a
 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
There are two classes of functions that are not covered by GeneratorOptions: writing to a stream or returning a string. But you are right that pretty printing should be controlled by GeneratorOptions. I'll fix that. The suggestion to use pretty printing by default also sounds good.
Aug 13 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/13/2015 11:52 PM, Sönke Ludwig wrote:
 Am 14.08.2015 um 02:26 schrieb Walter Bright:
 On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
 These were, AFAICS, the only major open issues (a decision for an
 opt() variant
 would be nice, but fortunately that's not a fundamental decision in
 any way).
1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last.
Hm, it *is* the first function argument, just the last template argument.
Ok, my mistake. I didn't look at the others. I don't know what 'isStringInputRange' is. Whatever it is, it should be a 'range of char'.
 2. Why are integers acceptable as lexer input? The spec specifies Unicode.
In this case, the lexer will perform on-the-fly UTF validation of the input. It can do so more efficiently than first validating the input using a wrapper range, because it has to check the value of most incoming code units anyway.
There is no reason to validate UTF-8 input. The only place where non-ASCII code units can even legally appear is inside strings, and there they can just be copied verbatim while looking for the end of the string.
 3. Why are there 4 functions that do the same thing?

 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html

 After all, there already is a
 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
There are two classes of functions that are not covered by GeneratorOptions: writing to a stream or returning a string.
Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .array
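Transposing Walter's suggestion into Python terms: the generator lazily yields text fragments, and the caller chooses the sink, with "".join playing the role of .array. The helper name is hypothetical:

```python
def json_int_array(values):
    """Lazily yield JSON text fragments for a list of ints (illustrative only)."""
    yield "["
    for i, v in enumerate(values):
        if i:
            yield ","
        yield str(v)
    yield "]"

# The caller decides how to materialize: join into a string, write to a file, etc.
assert "".join(json_int_array([1, 2, 3])) == "[1,2,3]"
```

Because the generator never allocates the full result, the same code serves both streaming and string-building callers.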
 But you are right that pretty
 printing should be controlled by GeneratorOptions. I'll fix that. The
suggestion
 to use pretty printing by default also sounds good.
Thanks
Aug 14 2015
parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 14.08.2015 um 10:17 schrieb Walter Bright:
 On 8/13/2015 11:52 PM, Sönke Ludwig wrote:
 Am 14.08.2015 um 02:26 schrieb Walter Bright:
 On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
 These were, AFAICS, the only major open issues (a decision for an
 opt() variant
 would be nice, but fortunately that's not a fundamental decision in
 any way).
1. What about the issue of having the API be a composable range interface? http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html I.e. the input range should be the FIRST argument, not the last.
Hm, it *is* the first function argument, just the last template argument.
Ok, my mistake. I didn't look at the others. I don't know what 'isStringInputRange' is. Whatever it is, it should be a 'range of char'.
I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?
 2. Why are integers acceptable as lexer input? The spec specifies
 Unicode.
In this case, the lexer will perform on-the-fly UTF validation of the input. It can do so more efficiently than first validating the input using a wrapper range, because it has to check the value of most incoming code units anyway.
There is no reason to validate UTF-8 input. The only place where non-ASCII code units can even legally appear is inside strings, and there they can just be copied verbatim while looking for the end of the string.
The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.
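The same split shows up in other languages. As a hedged analogy in Python (not the proposed D API): a str is valid Unicode by construction, while bytes from an unverified source must pass UTF-8 validation before use:

```python
import json

raw = b'{ "key" : "\xc3\xa9" }'        # bytes from an unverified source
doc = json.loads(raw.decode("utf-8"))  # decode() validates the UTF-8 as a side effect
assert doc["key"] == "\u00e9"

# Invalid UTF-8 is rejected up front, before any JSON parsing happens:
try:
    b'{ "key" : "\xff" }'.decode("utf-8")
except UnicodeDecodeError:
    pass
else:
    raise AssertionError("expected validation failure")
```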
 3. Why are there 4 functions that do the same thing?

 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html

 After all, there already is a
 http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
There are two classes of functions that are not covered by GeneratorOptions: writing to a stream or returning a string.
Why do both? Always return an input range. If the user wants a string, he can pipe the input range to a string generator, such as .array
Convenience for one. The lack of number to input range conversion functions is another concern. I'm not really keen to implement an input range style floating-point to string conversion routine just for this module. Finally, I'm a little worried about performance. The output range based approach can keep a lot of state implicitly using the program counter register. But an input range would explicitly have to keep track of the current JSON element, as well as the current character/state within that element (and possibly one level deeper, for example for escape sequences). This means that it will require either multiple branches or indirection for each popFront().
Aug 15 2015
next sibling parent reply "Suliman" <evermind live.ru> writes:
I talked with a few people and they said that they prefer the 
current vibe.d JSON implementation. What's wrong with it? Why 
not stay with the old one? It looks much easier than the new 
one...

IMHO the API of the new one is much harder.
Aug 15 2015
parent "Laeeth Isharc" <spamnolaeeth nospamlaeeth.com> writes:
On Saturday, 15 August 2015 at 17:07:36 UTC, Suliman wrote:
 I talked with a few people and they said that they prefer the 
 current vibe.d JSON implementation. What's wrong with it? Why 
 not stay with the old one? It looks much easier than the new 
 one...

 IMHO the API of the new one is much harder.
New stream parser is fast! (See prior thread on benchmarks).
Aug 15 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
 I don't know what 'isStringInputRange' is. Whatever it is, it should be
 a 'range of char'.
I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?
That's right, there isn't one. But I use:

    if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))

I'm not a fan of more names for trivia; the deluge of names has its own costs.
 There is no reason to validate UTF-8 input. The only place where
 non-ASCII code units can even legally appear is inside strings, and
 there they can just be copied verbatim while looking for the end of the
 string.
The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.
The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it. There are many validation algorithms in Phobos one can tack on - having two implementations of every algorithm, one with an embedded reinvented validation and one without - is too much. The general idea with algorithms is that they do not combine things, but they enable composition.
 Why do both? Always return an input range. If the user wants a string,
 he can pipe the input range to a string generator, such as .array
Convenience for one.
Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorial explosion. The other problem, of course, is that returning a string means the algorithm has to decide how to allocate that string. As much as possible, algorithms should not be making allocation decisions.
 The lack of number to input range conversion functions is
 another concern. I'm not really keen to implement an input range style
 floating-point to string conversion routine just for this module.
Not sure what you mean. Phobos needs such routines anyway, and you still have to do something about floating point.
 Finally, I'm a little worried about performance. The output range based
approach
 can keep a lot of state implicitly using the program counter register. But an
 input range would explicitly have to keep track of the current JSON element, as
 well as the current character/state within that element (and possibly one level
 deeper, for example for escape sequences). This means that it will require
 either multiple branches or indirection for each popFront().
Often this is made up for by not needing to allocate storage. Also, that state is in the cached "hot zone" on top of the stack, which is much faster to access than a cold uninitialized array. I share your concern with performance, and I had very good results with Warp by keeping all the state on the stack in this manner.
Aug 15 2015
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 16-Aug-2015 03:50, Walter Bright wrote:
 On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
 There is no reason to validate UTF-8 input. The only place where
 non-ASCII code units can even legally appear is inside strings, and
 there they can just be copied verbatim while looking for the end of the
 string.
The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.
The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it.
Aye.
 There are many validation algorithms in Phobos one can tack on - having
 two implementations of every algorithm, one with an embedded reinvented
 validation and one without - is too much.
Actually there are next to none. `validate` that throws on failed validation is a misnomer.
 The general idea with algorithms is that they do not combine things, but
 they enable composition.
At the lower level such as tokenizers combining a couple of simple steps together makes sense because it makes things run faster. It usually eliminates the need for temporary result that must be digestible by the next range. For instance "combining" decoding and character classification one may side-step generating the codepoint value itself (because now it doesn't have to produce it for the top-level algorithm). -- Dmitry Olshansky
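A toy illustration of the fused step Dmitry describes, assuming an ASCII fast path; the multi-byte branch is elided and `asciiAlphaFastPath` is a made-up name:

```d
import std.ascii : isAlpha;

// Toy fused "decode + classify" step with an ASCII fast path: for code
// units below 0x80 no codepoint needs to be materialized at all. The
// multi-byte branch is elided for the sake of the sketch.
bool asciiAlphaFastPath(const(char)[] s, size_t i)
{
    immutable c = s[i];
    return c < 0x80 && isAlpha(c);
}

void main()
{
    assert(asciiAlphaFastPath("abc", 0));
    assert(!asciiAlphaFastPath("1bc", 0));
}
```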
Aug 15 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/15/2015 11:52 PM, Dmitry Olshansky wrote:
 For instance "combining" decoding and character classification one may
side-step
 generating the codepoint value itself (because now it doesn't have to produce
it
 for the top-level algorithm).
Perhaps, but I wouldn't be convinced without benchmarks to prove it on a case-by-case basis. But it's moot, as json lexing never needs to decode.
Aug 16 2015
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 16-Aug-2015 11:30, Walter Bright wrote:
 On 8/15/2015 11:52 PM, Dmitry Olshansky wrote:
 For instance "combining" decoding and character classification one may
 side-step
 generating the codepoint value itself (because now it doesn't have to
 produce it
 for the top-level algorithm).
Perhaps, but I wouldn't be convinced without benchmarks to prove it on a case-by-case basis.
About x2 faster than decode + check-if-alphabetic on my stuff: https://github.com/DmitryOlshansky/gsoc-bench-2012 I haven't updated it in a while. There are nice bargraphs for decoding versions by David comparing DMD vs LDC vs GDC: Page 15 at http://dconf.org/2013/talks/nadlinger.pdf
 But it's moot, as json lexing never needs to decode.
Agreed. -- Dmitry Olshansky
Aug 16 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/16/2015 3:39 AM, Dmitry Olshansky wrote:
 About x2 faster then decode + check-if-alphabetic on my stuff:

 https://github.com/DmitryOlshansky/gsoc-bench-2012

 I haven't updated it in a while. There are nice bargraphs for decoding versions
 by David comparing DMD vs LDC vs GDC:

 Page 15 at http://dconf.org/2013/talks/nadlinger.pdf
Thank you.
Aug 16 2015
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 16.08.2015 um 02:50 schrieb Walter Bright:
 On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
 I don't know what 'isStringInputRange' is. Whatever it is, it should be
 a 'range of char'.
I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?
That's right, there isn't one. But I use: if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char)) I'm not a fan of more names for trivia, the deluge of names has its own costs.
Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R)))`. It's just used in a number of places and is quite a bit more verbose (twice as long), and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.
 There is no reason to validate UTF-8 input. The only place where
 non-ASCII code units can even legally appear is inside strings, and
 there they can just be copied verbatim while looking for the end of the
 string.
The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.
The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it.
Yes, and it won't do that if a char range is passed in. If the integral range path gets removed there are basically two possibilities left, perform the validation up-front (slower), or risk UTF exceptions in unrelated parts of the code base. I don't see why we shouldn't take the opportunity for a full and fast validation here. But I'll relay this to Andrei, it was his idea originally.
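The up-front option can be expressed with Phobos' existing std.utf.validate; a small sketch where `validated` is a hypothetical helper, not part of the reviewed package:

```d
import std.utf : validate, UTFException;
import std.exception : assertThrown;

// Sketch of the up-front option: untrusted ubyte[] input is only
// reinterpreted as a string after std.utf.validate accepts it.
string validated(immutable(ubyte)[] raw)
{
    auto s = cast(string) raw;
    validate(s);               // throws UTFException on invalid UTF-8
    return s;
}

void main()
{
    assert(validated(cast(immutable(ubyte)[]) `{"a":1}`) == `{"a":1}`);
    immutable(ubyte)[] bad = [0xFF];      // invalid UTF-8 lead byte
    assertThrown!UTFException(validated(bad));
}
```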
 There are many validation algorithms in Phobos one can tack on - having
 two implementations of every algorithm, one with an embedded reinvented
 validation and one without - is too much.
There is nothing reinvented here. It simply implicitly validates all non-string parts of a JSON document and uses validate() for the parts of JSON strings that can contain Unicode characters.
 The general idea with algorithms is that they do not combine things, but
 they enable composition.
It's just that there is no way to achieve the same performance using composition in this case.
 Why do both? Always return an input range. If the user wants a string,
 he can pipe the input range to a string generator, such as .array
Convenience for one.
Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorial explosion.
This may be a factor of two, but not a combinatorial explosion.
 The other problem, of course, is that returning a string means the
 algorithm has to decide how to allocate that string. As much as
 possible, algorithms should not be making allocation decisions.
Granted, the fact that format() and to!() support input ranges (I didn't notice that until now) makes the issue less important. But without those, it would basically mean that almost all places that generate JSON strings would have to import std.array and append .array. Nothing particularly bad if viewed in isolation, but makes the language appear a lot less clean/more verbose if it occurs often. It's also a stumbling block for language newcomers.
 The lack of number to input range conversion functions is
 another concern. I'm not really keen to implement an input range style
 floating-point to string conversion routine just for this module.
Not sure what you mean. Phobos needs such routines anyway, and you still have to do something about floating point.
There are output range and allocation based float->string conversions available, but no input range based one. But well, using an internal buffer together with formattedWrite would probably be a viable workaround...
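The workaround might look like this, using std.format.sformat (formattedWrite into a caller-provided buffer); `floatToChars` is a made-up name:

```d
import std.format : sformat;

// Possible workaround: format into a caller-provided stack buffer via
// sformat; the result is a slice of buf, with no GC allocation.
const(char)[] floatToChars(char[] buf, double v)
{
    return sformat(buf, "%.17g", v);
}

void main()
{
    char[32] buf;
    assert(floatToChars(buf[], 0.5) == "0.5");
}
```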
 Finally, I'm a little worried about performance. The output range
 based approach
 can keep a lot of state implicitly using the program counter register.
 But an
 input range would explicitly have to keep track of the current JSON
 element, as
 well as the current character/state within that element (and possibly
 one level
 deeper, for example for escape sequences). This means that it will
 require
 either multiple branches or indirection for each popFront().
Often this is made up for by not needing to allocate storage. Also, that state is in the cached "hot zone" on top of the stack, which is much faster to access than a cold uninitialized array.
Branch misprediction alone will most probably be problematic. But I think this can be made fast enough anyway by making the input range partially eager and serving chunks of strings at a time. That way, the additional branching only has to happen once per chunk. I'll have a look.
 I share your concern with performance, and I had very good results with
 Warp by keeping all the state on the stack in this manner.
Aug 16 2015
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2015-08-16 14:34, Sönke Ludwig wrote:

 Good, I'll use `if (isInputRange!R &&
 (isSomeChar!(ElementEncodingType!R) ||
 isIntegral!(ElementEncodingType!R))`. It's just used in number of places
 and quite a bit more verbose (twice as long) and I guess a large number
 of algorithms in Phobos accept char ranges, so that may actually warrant
 a name in this case.
I agree. Signatures like this are what's making std.algorithm look more complicated than it is. -- /Jacob Carlborg
Aug 16 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/16/2015 5:34 AM, Sönke Ludwig wrote:
 Am 16.08.2015 um 02:50 schrieb Walter Bright:
      if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))

 I'm not a fan of more names for trivia, the deluge of names has its own
 costs.
Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R)))`. It's just used in a number of places and is quite a bit more verbose (twice as long), and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.
Except that there is no reason to support wchar, dchar, int, ubyte, or anything other than char. The idea is not to support something just because you can, but there should be an identifiable, real use case for it first. Has anyone ever seen Json data as ulongs? I haven't either.
 The json parser will work fine without doing any validation at all. I've
 been implementing string handling code in Phobos with the idea of doing
 validation only if the algorithm requires it, and only for those parts
 that require it.
Yes, and it won't do that if a char range is passed in. If the integral range path gets removed there are basically two possibilities left, perform the validation up-front (slower), or risk UTF exceptions in unrelated parts of the code base. I don't see why we shouldn't take the opportunity for a full and fast validation here. But I'll relay this to Andrei, it was his idea originally.
That argument could be used to justify validation in every single algorithm that deals with strings.
 Why do both? Always return an input range. If the user wants a string,
 he can pipe the input range to a string generator, such as .array
Convenience for one.
Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorical explosion.
This may be a factor of two, but not a combinatorial explosion.
We're already up to validate or not, to string or not, i.e. 4 combinations.
 The other problem, of course, is that returning a string means the
 algorithm has to decide how to allocate that string. As much as
 possible, algorithms should not be making allocation decisions.
Granted, the fact that format() and to!() support input ranges (I didn't notice that until now) makes the issue less important. But without those, it would basically mean that almost all places that generate JSON strings would have to import std.array and append .array. Nothing particularly bad if viewed in isolation, but makes the language appear a lot less clean/more verbose if it occurs often. It's also a stumbling block for language newcomers.
This has been argued before, and the problem is it applies to EVERY algorithm in Phobos, and winds up with a doubling of the number of functions to deal with it. I do not view this as clean. D is going to be built around ranges as a fundamental way of coding. Users will need to learn something about them. Appending .array is not a big hill to climb.
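The composition in question is indeed short; a minimal example of a lazy range materialized with .array (`upperized` is a made-up helper for illustration):

```d
import std.array : array;
import std.algorithm.iteration : map;
import std.ascii : toUpper;

// The lazy pipeline allocates nothing; .array materializes the result
// only when an actual array is wanted.
dstring upperized(string s)
{
    return s.map!toUpper.array.idup;
}

void main()
{
    assert(upperized("json") == "JSON"d);
}
```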
 There are output range and allocation based float->string conversions
available,
 but no input range based one. But well, using an internal buffer together with
 formattedWrite would probably be a viable workaround...
I plan to fix that, so using a workaround in the meantime is appropriate.
Aug 16 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 17.08.2015 um 00:03 schrieb Walter Bright:
 On 8/16/2015 5:34 AM, Sönke Ludwig wrote:
 Am 16.08.2015 um 02:50 schrieb Walter Bright:
      if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))

 I'm not a fan of more names for trivia, the deluge of names has its own
 costs.
Good, I'll use `if (isInputRange!R && (isSomeChar!(ElementEncodingType!R) || isIntegral!(ElementEncodingType!R)))`. It's just used in a number of places and is quite a bit more verbose (twice as long), and I guess a large number of algorithms in Phobos accept char ranges, so that may actually warrant a name in this case.
Except that there is no reason to support wchar, dchar, int, ubyte, or anything other than char. The idea is not to support something just because you can, but there should be an identifiable, real use case for it first. Has anyone ever seen Json data as ulongs? I haven't either.
But you have seen ubyte[] when reading something from a file or from a network stream. But since Andrei now also wants to remove it, so be it. I'll answer some of the other points anyway:
 The json parser will work fine without doing any validation at all. I've
 been implementing string handling code in Phobos with the idea of doing
 validation only if the algorithm requires it, and only for those parts
 that require it.
Yes, and it won't do that if a char range is passed in. If the integral range path gets removed there are basically two possibilities left, perform the validation up-front (slower), or risk UTF exceptions in unrelated parts of the code base. I don't see why we shouldn't take the opportunity for a full and fast validation here. But I'll relay this to Andrei, it was his idea originally.
That argument could be used to justify validation in every single algorithm that deals with strings.
Not really for all, but indeed there are more where this could apply in theory. However, JSON is used frequently in situations where parsing speed, or performance in general, is often crucial (e.g. web services), which makes it stand out due to practical concerns. Others, such as an XML parser would apply, too, but probably none of the generic string manipulation functions.
 Why do both? Always return an input range. If the user wants a string,
 he can pipe the input range to a string generator, such as .array
Convenience for one.
Back to the previous point, that means that every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorical explosion.
This may be a factor of two, but not a combinatorial explosion.
We're already up to validate or not, to string or not, i.e. 4 combinations.
Validation is part of the lexer and not the generator. There is no combinatorial relation between the two. Validation is also just a template parameter, so there are no two combinations in terms of implementation either. There is just a "static if" statement somewhere to decide if validate() should be called or not.
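A minimal sketch of that `static if` arrangement (the names are made up; the real lexer's parameterization is of course more involved):

```d
import std.utf : validate;

// The "static if" arrangement described above, in miniature: one
// implementation, validation compiled in or out by a template flag.
string processString(bool doValidate)(string s)
{
    static if (doValidate)
        validate(s);           // only present when requested
    return s;
}

void main()
{
    assert(processString!true("[1]") == "[1]");
    assert(processString!false("[1]") == "[1]");
}
```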
 The other problem, of course, is that returning a string means the
 algorithm has to decide how to allocate that string. As much as
 possible, algorithms should not be making allocation decisions.
Granted, the fact that format() and to!() support input ranges (I didn't notice that until now) makes the issue less important. But without those, it would basically mean that almost all places that generate JSON strings would have to import std.array and append .array. Nothing particularly bad if viewed in isolation, but makes the language appear a lot less clean/more verbose if it occurs often. It's also a stumbling block for language newcomers.
This has been argued before, and the problem is it applies to EVERY algorithm in Phobos, and winds up with a doubling of the number of functions to deal with it. I do not view this as clean. D is going to be built around ranges as a fundamental way of coding. Users will need to learn something about them. Appending .array is not a big hill to climb.
It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (a language newcomer who wants to work with JSON). It's also still an additional thing to remember, type and read, making it an additional piece of cognitive load, even for developers that are fluent with this. Have many such pieces and they add up to the point where productivity is brought to its knees.

I already personally find it quite annoying constantly having to import std.range, std.array and std.algorithm to just use some small piece of functionality in std.algorithm. It's also often not clear in which of the three modules/packages a certain function is. We need to find a better balance here if D is to keep its appeal as a language where you stay in "the zone" (a.k.a. flow), which always has been a big thing for me.
Aug 22 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/22/2015 5:21 AM, Sönke Ludwig wrote:
 Am 17.08.2015 um 00:03 schrieb Walter Bright:
 D is going to be built around ranges as a fundamental way of coding.
 Users will need to learn something about them. Appending .array is not a
 big hill to climb.
It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (language newcomer that wants to work with JSON).
Not if the illuminating example in the Json API description does it that way. Newbies will tend to copy/pasta the examples as a starting point.
 It's also still an additional thing to
 remember, type and read, making it an additional piece of cognitive load, even
 for developers that are fluent with this. Have many of such pieces and they add
 up to a point where productivity goes to its knees.
Having composable components behaving in predictable ways is not an additional piece of cognitive load, it is less of one.
 I already personally find it quite annoying constantly having to import
 std.range, std.array and std.algorithm to just use some small piece of
 functionality in std.algorithm. It's also often not clear in which of the three
 modules/packages a certain function is. We need to find a better balance here
if
 D is to keep its appeal as a language where you stay in "the zone"  (a.k.a
 flow), which always has been a big thing for me.
If I buy a toy car, I get a toy car. If I get a lego set, I can build any toy with it. I believe the composable component approach will make Phobos smaller and much more flexible and useful, as opposed to monolithic APIs.
Aug 24 2015
parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 24.08.2015 um 22:25 schrieb Walter Bright:
 On 8/22/2015 5:21 AM, Sönke Ludwig wrote:
 Am 17.08.2015 um 00:03 schrieb Walter Bright:
 D is going to be built around ranges as a fundamental way of coding.
 Users will need to learn something about them. Appending .array is not a
 big hill to climb.
It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (language newcomer that wants to work with JSON).
Not if the illuminating example in the Json API description does it that way. Newbies will tend to copy/pasta the examples as a starting point.
That's true, but then they will possibly have to understand the inner workings soon after, for example when something goes wrong and they get cryptic error messages. It makes the learning curve steeper, even if some of that can be mitigated with good documentation/tutorials.
 It's also still an additional thing to
 remember, type and read, making it an additional piece of cognitive
 load, even
 for developers that are fluent with this. Have many of such pieces and
 they add
 up to a point where productivity goes to its knees.
Having composable components behaving in predictable ways is not an additional piece of cognitive load, it is less of one.
Having to write additional things that are not part of the problem (".array", "import std.array : array;") is cognitive load and having to read such things is cognitive and visual load. Also, having to remember where those additional components reside is cognitive load, at least if they are not used really frequently. This has of course nothing to do with predictable behavior of the components, but with the API/language boundary between ranges and arrays.
 I already personally find it quite annoying constantly having to import
 std.range, std.array and std.algorithm to just use some small piece of
 functionality in std.algorithm. It's also often not clear in which of
 the three
 modules/packages a certain function is. We need to find a better
 balance here if
 D is to keep its appeal as a language where you stay in "the zone"
 (a.k.a
 flow), which always has been a big thing for me.
If I buy a toy car, I get a toy car. If I get a lego set, I can build any toy with it. I believe the composable component approach will make Phobos smaller and much more flexible and useful, as opposed to monolithic APIs.
I'm not arguing against a range based approach! It's just that such an approach ideally shouldn't come at the expense of simplicity and relevance. If I have a string variable and I want to store the upper case version of another string, the direct mental translation is "dst = toUpper(src);" - and not "dst = toUpper(src).array;". It reminds me of the unwrap() calls in Rust code. They can produce a huge amount of visual noise for dealing with errors, whereas an exception based approach lets you focus on the actual problem. Of course exceptions have their own issues, but that's a different topic. Keeping toString in addition to toChars would be enough to avoid the issue here. A possible alternative would be to let the proposed JSON text input range have an "alias this" to "std.array.array(this)". Then it wouldn't even require a rename of toString to toChars to get both worlds.
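The `alias this` idea can be sketched on a toy char range (`CharRange` and `toArray` are made-up names; the real proposal would apply this to the JSON text input range):

```d
import std.array : array;

// A toy char-producing range with an implicit conversion to string via
// std.array.array, using "alias this" as described above.
struct CharRange
{
    string data;
    bool empty() const { return data.length == 0; }
    char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }

    // range implicitly converts to string where a string is expected
    string toArray() { return this.array.idup; }
    alias toArray this;
}

void main()
{
    string s = CharRange("{}");   // conversion goes through toArray
    assert(s == "{}");
}
```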
Aug 24 2015
parent reply "Sebastiaan Koppe" <mail skoppe.eu> writes:
On Tuesday, 25 August 2015 at 06:56:23 UTC, Sönke Ludwig wrote:
 If I have a string variable and I want to store the upper case 
 version of another string, the direct mental translation is 
 "dst = toUpper(src);" - and not "dst = toUpper(src).array;".
One can also say the problem is that you have a string variable.
Aug 25 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
Am 25.08.2015 um 14:14 schrieb Sebastiaan Koppe:
 On Tuesday, 25 August 2015 at 06:56:23 UTC, Sönke Ludwig wrote:
 If I have a string variable and I want to store the upper case version
 of another string, the direct mental translation is "dst =
 toUpper(src);" - and not "dst = toUpper(src).array;".
One can also say the problem is that you have a string variable.
But ranges are not always the right solution:

- For fields or setter properties, the exact type of the range is fixed, which is generally impractical
- If the underlying data of a range is stored on the stack or any other transient storage, it cannot be stored on the heap
- If the range is only an input range, it must be copied to an array anyway if it's going to be read multiple times
- Ranges cannot be immutable (no safe slicing or passing between threads)
- If for some reason template land needs to be left, ranges have trouble following (although there are wrapper classes available)
- Most existing APIs are string based
- Re-evaluating a computed range each time a variable is read is usually wasteful

There are probably a bunch of other problems that simply make ranges not the best answer in every situation.
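One of the listed trade-offs, repeated reads, in miniature (`materializedEvens` is a made-up helper; filter over an array is actually a forward range, but the point stands for true one-shot input ranges, where .array is the only way to re-read):

```d
import std.algorithm.iteration : filter;
import std.array : array;

// Materialize a lazy pipeline once, then read it any number of times.
int[] materializedEvens(int[] xs)
{
    auto r = xs.filter!(x => x % 2 == 0);  // lazy, nothing copied yet
    return r.array;                        // copied exactly once
}

void main()
{
    auto stored = materializedEvens([1, 2, 3, 4]);
    assert(stored == [2, 4]);
    assert(stored == [2, 4]);   // repeated reads are now free
}
```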
Aug 25 2015
prev sibling parent "Jay Norwood" <jayn prismnet.com> writes:
On Thursday, 13 August 2015 at 10:51:47 UTC, Sönke Ludwig wrote:
 I think we really need to have an informal pre-vote about the 
 BigInt and DOM efficiency vs. functionality issues. Basically 
 there are three options for each:

 1. Keep them: May have an impact on compile time for big DOMs 
 (run time/memory consumption wouldn't be affected if a pointer 
 to BigInt is stored). But provides an out-of-the-box experience 
 for a broad set of applications.

 2. Remove them: Results in a slim and clean API that is fast 
 (to run/compile), but also one that will be less useful for 
 certain applications.

 3. Make them CT configurable: Best of both worlds in terms of 
 speed, at the cost of a more complex API.
I like this #3. If I understand it correctly, this would provide the template to extend the supported data types, correct? However, I also think that you shouldn't try to make the basic storage format handle everything that might be more appropriately handled by a meta-model. Are the range operations compatible with the std.parallelism library?
Aug 15 2015
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/28/15 10:07 AM, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I'll submit a review in short order, but thought this might be of use in performance comparisons: https://www.reddit.com/r/programming/comments/3hbt4w/using_json_in_a_low_latency_environment/ -- Andrei
Aug 17 2015
prev sibling next sibling parent Sönke Ludwig <sludwig outerproduct.org> writes:
I've added some changes in the latest version (docs updated):

- Switched to TaggedAlgebraic with full static operator forwarding
- Removed toPrettyJSON (now the default), added GeneratorOptions.compact
- The bigInt field in JSONValue is now stored as a pointer
- Removed is(String/Integral)InputRange helper functions
- Added opt2() [1] as an alternative candidate to opt() [2] with a more 
natural syntax

The possible optimization to store the type tag in unused parts of the 
data fields could be implemented later directly in TaggedAlgebraic.

[1]: http://s-ludwig.github.io/std_data_json/stdx/data/json/value/opt2.html
[2]: http://s-ludwig.github.io/std_data_json/stdx/data/json/value/opt.html
Aug 17 2015
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/28/15 10:07 AM, Atila Neves wrote:
 Start of the two week process, folks.

 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/

 Atila
I'll preface my review with a general comment. This API comes at an interesting juncture; we're striving as much as possible for interfaces that abstract away lifetime management, so they can be used comfortably with GC, or at high performance (and hopefully no or only marginal loss of comfort) with client-chosen lifetime management policies. The JSON API is a great test bed for our emerging recommended "push lifetime up" idioms; it's not too complicated yet it's not trivial either, and has great usefulness. With this, here are some points:

* All new stuff should go in std.experimental. I assume "stdx" would change to that, should this work be merged.

* On the face of it, dedicating 6 modules to such a small specification as JSON seems excessive. I'm thinking one module here. (As a simple point: who would ever want to import only foundation, which in turn has one exception type and one location type in it?) I think it shouldn't be up for debate that we must aim for simple and clean APIs.

* stdx.data.json.generator: I think the API for converting in-memory JSON values to strings needs to be redone, as follows:

- JSONValue should offer a byToken range, which offers the contents of the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '[' token followed by three numeric tokens with the respective values, followed by the ']' token.

- On top of byToken it's immediate to implement a method (say toJSON or toString) that accepts an output range of characters and formatting options.

- On top of the method above with output range, implementing a toString overload that returns a string for convenience is a two-liner. However, it shouldn't return a "string"; Phobos APIs should avoid "hardcoding" the string type. Instead, it should return a user-chosen string type (including reference counting strings).

- While at it, make prettification a flag in the options, not its own part of the function name.
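The layering Andrei proposes can be modeled in miniature. Token, byToken and toJSON below are stand-ins, not the reviewed API; this toy handles only integer arrays and no formatting options:

```d
import std.array : appender;
import std.conv : to;

// Miniature model of the byToken + output-range layering.
struct Token { string text; }

Token[] byToken(int[] arr)   // stand-in for JSONValue.byToken
{
    auto toks = appender!(Token[]);
    toks.put(Token("["));
    foreach (i, v; arr)
    {
        if (i) toks.put(Token(","));
        toks.put(Token(v.to!string));
    }
    toks.put(Token("]"));
    return toks.data;
}

// On top of byToken: write into any output range of characters
// (by ref so Appender state is shared with the caller).
void toJSON(Out)(ref Out sink, int[] arr)
{
    foreach (t; byToken(arr))
        sink.put(t.text);
}

void main()
{
    auto sink = appender!string;
    toJSON(sink, [1, 2, 3]);
    assert(sink.data == "[1,2,3]");
}
```

The string-returning convenience overload is then indeed a two-liner over this.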
* stdx.data.json.lexer:

- I assume the idea was to accept ranges of integrals to mean "there's some raw input from a file". This seems to be a bit overdone, e.g. there's no need to accept signed integers or 64-bit integers. I suggest just going with the three character types.

- I see tokenization accepts input ranges. This forces the tokenizer to store its own copy of things, which is no doubt the business of appenderFactory. Here the departure of the current approach from what I think should become canonical Phobos APIs deepens for multiple reasons. First, appenderFactory does allow customization of the append operation (nice), but that's not enough to allow the user to customize the lifetime of the created strings, which is usually reflected in the string type itself. So the lexing method should be parameterized by the string type used. (By default string (as is now) should be fine.) Therefore, instead of customizing the append method, just customize the string type used in the token.

- The lexer should internally take optimization opportunities, e.g. if the string type is "string" and the lexed type is also "string", great, just use slices of the input instead of appending them to the tokens.

- As a consequence the JSONToken type also needs to be parameterized by the type of the string that holds the payload. I understand this is a complication compared to the current approach, but I don't see an out. In the grand scheme of things it seems a necessary evil: tokens may or may not need a means to manage the lifetime of their payload, and that's determined by the type of the payload. Hopefully simplifications in other areas of the API would offset this.

- At token level there should be no number parsing. Just store the payload with the token and leave it for later. Very often numbers are converted without there being a need, and the process is costly. This also nicely sidesteps the entire matter of bigints, floating point etc. at this level.
- Also, at token level strings should be stored with escapes unresolved. If the user wants a string with the escapes resolved, a lazy range does it.

- Validating UTF is tricky; I've seen some discussion in this thread about it. On the face of it JSON only accepts valid UTF characters. As such, a modularity-based argument is to pipe UTF validation before tokenization. (We need a lazy UTF validator and sanitizer stat!) An efficiency-based argument is to do validation during tokenization. I'm inclining in favor of modularization, which allows us to focus on one thing at a time and do it well, instead of duplicating validation everywhere. Note that it's easy to write routines that do JSON tokenization and leave UTF validation for later, so there's a lot of flexibility in composing validation with JSONization.

- Litmus test: if the input type is a forward range AND if the string type chosen for tokens is the same as input type, successful tokenization should allocate exactly zero memory. I think this is a simple way to make sure that the tokenization API works well.

- If noThrow is a runtime option, some functions can't be nothrow (and consequently @nogc). Not sure how important this is. Probably quite a bit because of the current gc implications of exceptions. IMHO: at lexing level a sound design might just emit error tokens (with the culprit as payload) and never throw. Clients may always throw when they see an error token.

* stdx.data.json.parser:

- Similar considerations regarding string type used apply here as well: everything should be parameterized with it - the use case to keep in mind is someone wants everything with refcounted strings.

- The JSON value does its own internal allocation (for e.g. arrays and hashtables), which should be fine as long as it's encapsulated and we can tweak it later (e.g. make it use reference counting inside).

- parseJSONStream should parameterize on string type, not on appenderFactory.

- Why both parseJSONStream and parseJSONValue?
I'm thinking parseJSONValue would be enough because then you trivially parse a stream with repeated calls to parseJSONValue.

- FWIW I think the whole thing with accommodating BigInt etc. is an exaggeration. Just stick with long and double.

- readArray suddenly introduces a distinct kind of interacting - callbacks. Why? Should be a lazy range lazy range lazy range. An adapter using callbacks is then a two-liner.

- Why is readBool even needed? Just readJSONValue and then enforce it as a bool. Same reasoning applies to readDouble and readString.

- readObject is with callbacks again - it would be nice if it were a lazy range.

- skipXxx are nice to have and useful.

* stdx.data.json.value:

- The etymology of "opt" is unclear - no word starting with "opt" or obviously abbreviating to it is in the documentation. "opt2" is awkward. How about "path" and "dyn", respectively.

- I think Algebraic should be used throughout instead of TaggedAlgebraic, or motivation be given for the latter.

- JSONValue should be more opaque and not expose representation as much as it does now. In particular, offering a built-in hashtable is bound to be problematic because those are expensive to construct, create garbage, and are not customizable. Instead, the necessary lookup and set APIs should be provided by JSONValue whilst keeping the implementation hidden. The same goes about array - a JSONValue shall not be exposed; instead, indexed access primitives should be exposed. Separate types might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary. The string type should be a type parameter of JSONValue.

==============================

So, here we are. I realize a good chunk of this is surprising ("you mean I shouldn't create strings in my APIs?"). My point here is, again, we're at a juncture. We're trying to factor garbage (heh) out of API design in ways that defer the lifetime management to the user of the API.
We could pull json into std.experimental and defer the policy decisions for later, but I think it's a great driver for them. (Thanks Sönke for doing all the work, this is a great baseline.)

I think we should use the JSON API as a guinea pig for the new era of D API design in which we have a solid set of principles, tools, and guidelines to defer lifetime management.

Please advise.

Andrei
Aug 17 2015
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-18 00:21, Andrei Alexandrescu wrote:

 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
I don't think this is excessive. We should strive to have small modules. We already have/had problems with std.algorithm and std.datetime, let's not repeat those mistakes. A module with 2000 lines is more than enough. -- /Jacob Carlborg
Aug 17 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 2:31 AM, Jacob Carlborg wrote:
 On 2015-08-18 00:21, Andrei Alexandrescu wrote:

 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
I don't think this is excessive. We should strive to have small modules. We already have/had problems with std.algorithm and std.datetime, let's not repeat those mistakes. A module with 2000 lines is more than enough.
How about a module with 20? -- Andrei
Aug 18 2015
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-18 15:18, Andrei Alexandrescu wrote:

 How about a module with 20? -- Andrei
If it's used in several other modules, I don't see a problem with it. -- /Jacob Carlborg
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 9:31 AM, Jacob Carlborg wrote:
 On 2015-08-18 15:18, Andrei Alexandrescu wrote:

 How about a module with 20? -- Andrei
If it's used in several other modules, I don't see a problem with it.
Me neither if internal. I do see a problem if it's public. -- Andrei
Aug 18 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either. -- /Jacob Carlborg
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
Aug 18 2015
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 19-Aug-2015 04:58, Andrei Alexandrescu wrote:
 On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
To catch it? Generally I agree - just merge things sensibly; there could be a traits.d/primitives.d module, should it need to define isXYZ constraints and other lightweight interface-only entities. -- Dmitry Olshansky
Aug 19 2015
prev sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 19.08.2015 at 03:58, Andrei Alexandrescu wrote:
 On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.
Aug 19 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/19/15 4:55 AM, Sönke Ludwig wrote:
 On 19.08.2015 at 03:58, Andrei Alexandrescu wrote:
 On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.
I'm sure there are a number of better options to package things nicely. -- Andrei
Aug 21 2015
parent Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 21.08.2015 at 18:54, Andrei Alexandrescu wrote:
 On 8/19/15 4:55 AM, Sönke Ludwig wrote:
 On 19.08.2015 at 03:58, Andrei Alexandrescu wrote:
 On 8/18/15 1:24 PM, Jacob Carlborg wrote:
 On 2015-08-18 17:18, Andrei Alexandrescu wrote:

 Me neither if internal. I do see a problem if it's public. -- Andrei
If it's public and those 20 lines are useful on its own, I don't see a problem with that either.
In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei
The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.
I'm sure there are a number of better options to package things nicely. -- Andrei
I'm all ears ;)
Aug 22 2015
prev sibling parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 08/18/2015 09:18 AM, Andrei Alexandrescu wrote:
 On 8/18/15 2:31 AM, Jacob Carlborg wrote:
 I don't think this is excessive. We should strive to have small modules.
 We already have/had problems with std.algorithm and std.datetime, let's
 not repeat those mistakes. A module with 2000 lines is more than enough.
How about a module with 20? -- Andrei
Module boundaries should be determined by organizational grouping, not by size.
Aug 21 2015
next sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Friday, 21 August 2015 at 16:25:40 UTC, Nick Sabalausky wrote:
 Module boundaries should be determined by organizational 
 grouping, not by size.
By organizational grouping as well as encapsulation concerns. Modules are the smallest units of encapsulation in D, visibility-wise. — David
Aug 21 2015
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/21/15 12:25 PM, Nick Sabalausky wrote:
 On 08/18/2015 09:18 AM, Andrei Alexandrescu wrote:
 On 8/18/15 2:31 AM, Jacob Carlborg wrote:
 I don't think this is excessive. We should strive to have small modules.
 We already have/had problems with std.algorithm and std.datetime, let's
 not repeat those mistakes. A module with 2000 lines is more than enough.
How about a module with 20? -- Andrei
Module boundaries should be determined by organizational grouping, not by size.
Rather by usefulness. As I mentioned, nobody would ever need only JSON's exceptions and location. -- Andrei
Aug 21 2015
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2015-08-21 18:25, Nick Sabalausky wrote:

 Module boundaries should be determined by organizational grouping, not
 by size.
Well, but it depends on how you decide what should be in a group. Size is usually a part of that decision, although it might not be conscious. You wouldn't put the whole D compiler in one module ;) -- /Jacob Carlborg
Aug 23 2015
prev sibling next sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Monday, 17 August 2015 at 22:21:50 UTC, Andrei Alexandrescu 
wrote:
 * stdx.data.json.generator: I think the API for converting 
 in-memory JSON values to strings needs to be redone, as follows:

 - JSONValue should offer a byToken range, which offers the 
 contents of the value one token at a time. For example, "[ 1, 
 2, 3 ]" offers the '[' token followed by three numeric tokens 
 with the respective values followed by the ']' token.
For iterating tree-like structures, a callback-based approach seems nicer, because it can naturally use the stack for storing its state. (I assume std.concurrency.Generator is too heavy-weight for this case.)
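A tiny sketch of why callbacks compose naturally with recursion - the nesting depth simply lives on the call stack (Node and visit are illustrative, not any actual API):

```d
// Hypothetical tree-shaped value, as a JSONValue conceptually is.
struct Node { string text; Node[] children; }

// Callback traversal: no explicit stack, no fiber - recursion does it.
void visit(Node n, scope void delegate(string) sink)
{
    sink(n.text);               // emit this node's token
    foreach (child; n.children)
        visit(child, sink);     // nesting state is kept on the call stack
}
```

A range interface over the same tree has to materialize that stack explicitly in the range's state, or pay for a fiber as std.concurrency.Generator does.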
 - On top of byToken it's immediate to implement a method (say 
 toJSON or toString) that accepts an output range of characters 
 and formatting options.
If there really needs to be a range, `joiner` and `copy` should do the job.
 - On top of the method above with output range, implementing a 
 toString overload that returns a string for convenience is a 
 two-liner. However, it shouldn't return a "string"; Phobos APIs 
 should avoid "hardcoding" the string type. Instead, it should 
 return a user-chosen string type (including reference counting 
 strings).
`to!string`, for compatibility with std.conv.
 - While at it make prettyfication a flag in the options, not 
 its own part of the function name.
(That's already done.)
 * stdx.data.json.lexer:

 - I assume the idea was to accept ranges of integrals to mean 
 "there's some raw input from a file". This seems to be a bit 
 overdone, e.g. there's no need to accept signed integers or 
 64-bit integers. I suggest just going with the three character 
 types.

 - I see tokenization accepts input ranges. This forces the 
 tokenizer to store its own copy of things, which is no doubt 
 the business of appenderFactory. Here the departure of the 
 current approach from what I think should become canonical 
 Phobos APIs deepens for multiple reasons. First, 
 appenderFactory does allow customization of the append 
 operation (nice) but that's not enough to allow the user to 
 customize the lifetime of the created strings, which is usually 
 reflected in the string type itself. So the lexing method 
 should be parameterized by the string type used. (By default 
 string (as is now) should be fine.) Therefore instead of 
 customizing the append method just customize the string type 
 used in the token.

 - The lexer should internally take optimization opportunities, 
 e.g. if the string type is "string" and the lexed type is also 
 "string", great, just use slices of the input instead of 
 appending them to the tokens.

 - As a consequence the JSONToken type also needs to be 
 parameterized by the type of its string that holds the payload. 
 I understand this is a complication compared to the current 
 approach, but I don't see an out. In the grand scheme of things 
 it seems a necessary evil: tokens may or may not need a means 
 to manage lifetime of their payload, and that's determined by 
 the type of the payload. Hopefully simplifications in other 
 areas of the API would offset this.
I've never seen JSON encoded in anything other than UTF-8. Is it really necessary to complicate everything for such an infrequent niche case?
 - At token level there should be no number parsing. Just store 
 the payload with the token and leave it for later. Very often 
 numbers are converted without there being a need, and the 
 process is costly. This also nicely sidesteps the entire matter 
 of bigints, floating point etc. at this level.

 - Also, at token level strings should be stored with escapes 
 unresolved. If the user wants a string with the escapes 
 resolved, a lazy range does it.
This was already suggested, and it looks like a good idea, though there was an objection because of possible performance costs. The other objection, that it requires an allocation, is no longer valid if sliceable input is used.
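With sliceable input, the lazy escape-resolving range could look roughly like this - common single-character escapes only, with \uXXXX handling elided, so purely a sketch of the shape:

```d
// Hypothetical lazy unescaper over a raw JSON string payload:
// resolves escapes on the fly and allocates nothing.
struct Unescape
{
    string s; // raw payload with escapes still in place

    bool empty() const { return s.length == 0; }

    char front() const
    {
        if (s[0] != '\\') return s[0];
        switch (s[1])
        {
            case 'n':  return '\n';
            case 't':  return '\t';
            case '"':  return '"';
            case '\\': return '\\';
            default:   return s[1]; // e.g. "\/" -> '/'
        }
    }

    void popFront()
    {
        immutable step = (s[0] == '\\') ? 2 : 1;
        s = s[step .. $];
    }
}
```

The performance objection then reduces to the per-character branch in front/popFront, which the token can skip entirely when it knows the payload contains no backslash.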
 - Validating UTF is tricky; I've seen some discussion in this 
 thread about it. On the face of it JSON only accepts valid UTF 
 characters. As such, a modularity-based argument is to pipe UTF 
 validation before tokenization. (We need a lazy UTF validator 
 and sanitizer stat!) An efficiency-based argument is to do 
 validation during tokenization. I'm inclining in favor of 
 modularization, which allows us to focus on one thing at a time 
 and do it well, instead of duplicationg validation everywhere. 
 Note that it's easy to write routines that do JSON tokenization 
 and leave UTF validation for later, so there's a lot of 
 flexibility in composing validation with JSONization.
Well, in an ideal world, there should be no difference in performance between manually combined tokenization/validation, and composed ranges. We should practice what we preach here.
 * stdx.data.json.parser:

 - FWIW I think the whole thing with accommodating BigInt etc. 
 is an exaggeration. Just stick with long and double.
Or, as above, leave it to the end user and provide a `to(T)` method that can support built-in types and `BigInt` alike.
Aug 18 2015
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 18 Aug 2015 09:05:32 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 Or, as above, leave it to the end user and provide a `to(T)`
 method that can support built-in types and `BigInt` alike.
You mean the user should write a JSON number parsing routine on their own? Then which part is responsible for validation of JSON constraints? If it is the to!(T) function, then it is code duplication with chances of getting something wrong; if it is the JSON parser, then the number is parsed twice. Besides, there is a lot of code to be shared for every T. -- Marco
Sep 28 2015
parent reply Marc Schütz <schuetzm gmx.net> writes:
On Monday, 28 September 2015 at 07:02:35 UTC, Marco Leise wrote:
 On Tue, 18 Aug 2015 09:05:32 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 Or, as above, leave it to the end user and provide a `to(T)` 
 method that can support built-in types and `BigInt` alike.
You mean the user should write a JSON number parsing routine on their own? Then which part is responsible for validation of JSON contraints? If it is the to!(T) function, then it is code duplication with chances of getting something wrong, if it is the JSON parser, then the number is parsed twice. Besides, there is a lot of code to be shared for every T.
No, the JSON type should just store the raw unparsed token and implement:

     struct JSON {
         T to(T) if(isNumeric!T && is(typeof(T("")))) {
             return T(this.raw);
         }
     }

The end user can then call:

     auto value = json.to!BigInt;
Sep 29 2015
next sibling parent Laeeth Isharc <laeethnospam nospamlaeeth.com> writes:
On Tuesday, 29 September 2015 at 11:06:03 UTC, Marc Schütz wrote:
 On Monday, 28 September 2015 at 07:02:35 UTC, Marco Leise wrote:
 On Tue, 18 Aug 2015 09:05:32 +0000, "Marc Schütz" <schuetzm gmx.net> wrote:

 Or, as above, leave it to the end user and provide a `to(T)` 
 method that can support built-in types and `BigInt` alike.
You mean the user should write a JSON number parsing routine on their own? Then which part is responsible for validation of JSON contraints? If it is the to!(T) function, then it is code duplication with chances of getting something wrong, if it is the JSON parser, then the number is parsed twice. Besides, there is a lot of code to be shared for every T.
No, the JSON type should just store the raw unparsed token and implement:

     struct JSON {
         T to(T) if(isNumeric!T && is(typeof(T("")))) {
             return T(this.raw);
         }
     }

The end user can then call:

     auto value = json.to!BigInt;
I was just speaking to Sonke about another aspect of this. It's not just numbers where this might be the case - dates are also often in a weird format (because the data comes from some ancient mainframe, for example). And similarly for enums where the field is a string but actually ought to fit in a fixed set of categories. I forgot the original context to this long thread, so hopefully this point is relevant. It's more relevant for the layer that will go on top where you want to be able to parse a json array or object as a D array/associative array of structs, as you can do in vibe.d currently. But maybe needs to be considered in lower level - I forget at this point.
Sep 29 2015
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 29 Sep 2015 11:06:01 +0000, Marc Schütz <schuetzm gmx.net> wrote:

 No, the JSON type should just store the raw unparsed token and
 implement:

      struct JSON {
          T to(T) if(isNumeric!T && is(typeof(T("")))) {
              return T(this.raw);
          }
      }

 The end user can then call:

      auto value = json.to!BigInt;
Ah, the duck typing approach of accepting any numeric type constructible from a string. Still: You need to parse the number first to know how long the digit string is that you pass to T's ctor. And then you have two sets of syntaxes for numbers: JSON and T's ctor. T could potentially parse numbers with the system locale's setting for the decimal point which may be ',' while JSON uses '.' or support hexadecimal numbers which are also invalid JSON. On the other hand, a ctor for some integral type may not support the exponential notation "2e10", which could legitimately be used by JSON writers (Ruby's uses shortest way to store numbers) to save on bandwidth. -- Marco
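The grammar mismatch is easy to demonstrate with std.conv, assuming its usual behavior (integral parsing rejects exponent notation, floating-point parsing accepts it):

```d
import std.conv : to, ConvException;

void main()
{
    // JSON allows exponent notation even for whole-number values...
    assert(to!double("2e10") == 2e10);

    // ...but an integral parse of the very same token fails:
    bool threw = false;
    try to!long("2e10");
    catch (ConvException) threw = true;
    assert(threw);
}
```

So a `to!T(raw)` convenience either inherits T's grammar (rejecting legal JSON like "2e10" for integral T) or has to re-validate against the JSON number grammar itself, which is exactly the duplication concern above.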
Sep 30 2015
prev sibling next sibling parent reply Sönke Ludwig <sludwig rejectedsoftware.com> writes:
On 18.08.2015 at 00:21, Andrei Alexandrescu wrote:
 I'll preface my review with a general comment. This API comes at an
 interesting juncture; we're striving as much as possible for interfaces
 that abstract away lifetime management, so they can be used comfortably
 with GC, or at high performance (and hopefully no or only marginal loss
 of comfort) with client-chosen lifetime management policies.

 The JSON API is a great test bed for our emerging recommended "push
 lifetime up" idioms; it's not too complicated yet it's not trivial
 either, and has great usefulness.

 With this, here are some points:

 * All new stuff should go in std.experimental. I assume "stdx" would
 change to that, should this work be merged.
Check.
 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
That would mean a single module that is >5k lines long. Spreading out certain things, such as JSONValue, into their own modules also makes sense to avoid unnecessarily large imports where other parts of the functionality aren't needed. Maybe we could move some private things to "std.internal" or similar and merge some of the modules? But I also think that grouping symbols by topic is a good thing and makes figuring out the API easier. There is also always package.d if you really want to import everything.
 * stdx.data.json.generator: I think the API for converting in-memory
 JSON values to strings needs to be redone, as follows:

 - JSONValue should offer a byToken range, which offers the contents of
 the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
 token followed by three numeric tokens with the respective values
 followed by the ']' token.
An input range style generator is on the TODO list, but would a token range be really useful for anything in practice? I would just go straight for a char range. Another thing I'd like to add is an output range that takes parser nodes and writes to a string output range. This would be the kind of interface that would be most useful for a serialization framework.
 - On top of byToken it's immediate to implement a method (say toJSON or
 toString) that accepts an output range of characters and formatting
 options.

 - On top of the method above with output range, implementing a toString
 overload that returns a string for convenience is a two-liner. However,
 it shouldn't return a "string"; Phobos APIs should avoid "hardcoding"
 the string type. Instead, it should return a user-chosen string type
 (including reference counting strings).
Without any existing code to test this against, what would this look like? Simply using an `Appender!rcstring`?
 - While at it make prettyfication a flag in the options, not its own
 part of the function name.
Already done. Pretty printing is now the default and there is GeneratorOptions.compact.
 * stdx.data.json.lexer:

 - I assume the idea was to accept ranges of integrals to mean "there's
 some raw input from a file". This seems to be a bit overdone, e.g.
 there's no need to accept signed integers or 64-bit integers. I suggest
 just going with the three character types.
It's funny you say that, because this was your own design proposal. Regarding the three character types, if we drop everything but those, I think we could also go with Walter's suggestion and just drop everything apart from "char". Putting a conversion range from dchar to char would be trivial and should be fast enough.
 - I see tokenization accepts input ranges. This forces the tokenizer to
 store its own copy of things, which is no doubt the business of
 appenderFactory.  Here the departure of the current approach from what I
 think should become canonical Phobos APIs deepens for multiple reasons.
 First, appenderFactory does allow customization of the append operation
 (nice) but that's not enough to allow the user to customize the lifetime
 of the created strings, which is usually reflected in the string type
 itself. So the lexing method should be parameterized by the string type
 used. (By default string (as is now) should be fine.) Therefore instead
 of customizing the append method just customize the string type used in
 the token.
Okay, sounds reasonable if Appender!rcstring is just going to work.
 - The lexer should internally take optimization opportunities, e.g. if
 the string type is "string" and the lexed type is also "string", great,
 just use slices of the input instead of appending them to the tokens.
It does.
 - As a consequence the JSONToken type also needs to be parameterized by
 the type of its string that holds the payload. I understand this is a
 complication compared to the current approach, but I don't see an out.
 In the grand scheme of things it seems a necessary evil: tokens may or
 may not need a means to manage lifetime of their payload, and that's
 determined by the type of the payload. Hopefully simplifications in
 other areas of the API would offset this.
It wouldn't be too bad here, because it's presumably pretty rare to store tokens or parser nodes. Worse is JSONValue.
 - At token level there should be no number parsing. Just store the
 payload with the token and leave it for later. Very often numbers are
 converted without there being a need, and the process is costly. This
 also nicely sidesteps the entire matter of bigints, floating point etc.
 at this level.
Okay, again, this was your own suggestion. The downside of always storing the string representation is that it requires allocations if no slices are used, and that the string will have to be parsed twice if the number is indeed going to be used. This can have a considerable performance impact.
 - Also, at token level strings should be stored with escapes unresolved.
 If the user wants a string with the escapes resolved, a lazy range does it.
To make things efficient, it currently stores escaped strings if slices of the input are used, but stores unescaped strings if allocations are necessary anyway.
 - Validating UTF is tricky; I've seen some discussion in this thread
 about it. On the face of it JSON only accepts valid UTF characters. As
 such, a modularity-based argument is to pipe UTF validation before
 tokenization. (We need a lazy UTF validator and sanitizer stat!) An
 efficiency-based argument is to do validation during tokenization. I'm
 inclining in favor of modularization, which allows us to focus on one
 thing at a time and do it well, instead of duplicationg validation
 everywhere. Note that it's easy to write routines that do JSON
 tokenization and leave UTF validation for later, so there's a lot of
 flexibility in composing validation with JSONization.
It's unfortunate to see this change of mind in the face of the work that already went into the implementation. I also still think that this is a good optimization opportunity that doesn't really affect the implementation complexity. Validation isn't duplicated, but reused from std.utf.
 - Litmus test: if the input type is a forward range AND if the string
 type chosen for tokens is the same as input type, successful
 tokenization should allocate exactly zero memory. I think this is a
 simple way to make sure that the tokenization API works well.
Supporting arbitrary forward ranges doesn't seem to be enough; it would at least have to be combined with something like take(), but then the type doesn't equal the string type anymore. I'd suggest keeping it to "if sliceable and input type equals string type", at least for the initial version.
 - If noThrow is a runtime option, some functions can't be nothrow (and
 consequently nogc). Not sure how important this is. Probably quite a bit
 because of the current gc implications of exceptions. IMHO: at lexing
 level a sound design might just emit error tokens (with the culprit as
 payload) and never throw. Clients may always throw when they see an
 error token.
noThrow is a compile time option and there are nothrow unit tests to make sure that the API is nothrow at least for string inputs.
 * stdx.data.json.parser:

 - Similar considerations regarding string type used apply here as well:
 everything should be parameterized with it - the use case to keep in
 mind is someone wants everything with refcounted strings.
Okay.
 - The JSON value does its own internal allocation (for e.g. arrays and
 hashtables), which should be fine as long as it's encapsulated and we
 can tweak it later (e.g. make it use reference counting inside).
Since it's based on (Tagged)Algebraic, the internal types are part of the interface. Changing them later is bound to break some code. So AFAICS this would either require making the used types parameterizable (string, array and AA types), or abstracting them away completely, i.e. only forwarding operations but denying direct access to the type. ... thinking about it, TaggedAlgebraic could do that, while Algebraic can't.
 - parseJSONStream should parameterize on string type, not on
 appenderFactory.
Okay.
 - Why both parseJSONStream and parseJSONValue? I'm thinking
 parseJSONValue would be enough because then you trivially parse a stream
 with repeated calls to parseJSONValue.
parseJSONStream is the pull parser (StAX style) interface. It returns the contents of a JSON document as individual nodes instead of storing them in a DOM. This part is vital for high-performance parsing, especially of large documents.
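For illustration, a StAX-style consumer loops over nodes in constant memory; the node types below are made up for the sketch and are not parseJSONStream's actual API:

```d
// Hypothetical parser-node kinds, standing in for the pull parser's output.
enum NodeKind { objectStart, objectEnd, arrayStart, arrayEnd, key, literal }

struct Node { NodeKind kind; string text; }

// Pull-style consumption: react to nodes as they arrive, never building
// a DOM - memory use stays constant regardless of document size.
size_t countKeys(R)(R nodes) // R: an input range of Node
{
    size_t n = 0;
    foreach (node; nodes)
        if (node.kind == NodeKind.key)
            ++n;
    return n;
}
```

This is the use case a repeated-parseJSONValue loop can't cover: the whole point is that no per-value DOM is ever allocated.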
 - FWIW I think the whole thing with accommodating BigInt etc. is an
 exaggeration. Just stick with long and double.
As mentioned earlier somewhere in this thread, there are practical needs to at least be able to handle ulong, too. Maybe the solution is indeed to just (optionally) store the string representation, so people can convert as they see fit.
 - readArray suddenly introduces a distinct kind of interacting -
 callbacks. Why? Should be a lazy range lazy range lazy range. An adapter
 using callbacks is then a two-liner.
It just has a more complicated implementation, but is already on the TODO list.
 - Why is readBool even needed? Just readJSONValue and then enforce it as
 a bool. Same reasoning applies to readDouble and readString.
This is for lower-level access; using parseJSONValue would certainly be possible, but it would have quite some unneeded overhead and would also be non-@nogc.
 - readObject is with callbacks again - it would be nice if it were a
 lazy range.
Okay, is also already on the list.
 - skipXxx are nice to have and useful.

 * stdx.data.json.value:

 - The etymology of "opt" is unclear - no word starting with "opt" or
 obviously abbreviating to it is in the documentation. "opt2" is awkward.
 How about "path" and "dyn", respectively.
The names are just placeholders currently. I think one of the two should also be enough. I've just implemented both, so that both can be tested/seen in practice. There have also been some more name suggestions in a thread mentioned by Meta with a more general suggestion for normal D member access. I'll see if I can dig those up, too.
 - I think Algebraic should be used throughout instead of
 TaggedAlgebraic, or motivation be given for the latter.
There have already been quite some arguments that I think are compelling, especially with a lack of counterarguments (maybe their consequences need to be explained better, though). TaggedAlgebraic could also (implicitly) convert to Algebraic. An additional argument is the possibility for TaggedAlgebraic to abstract away the underlying type, since it doesn't rely on a has!T and get!T API. But apart from that, Algebraic is unfortunately currently quite unsuited for this kind of abstraction, even if that can be solved in theory (with a lot of work). It requires writing things like obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just obj["foo"], because it simply returns Variant from all of its forwarded operators.
 - JSONValue should be more opaque and not expose representation as much
 as it does now. In particular, offering a built-in hashtable is bound to
 be problematic because those are expensive to construct, create garbage,
 and are not customizable. Instead, the necessary lookup and set APIs
 should be provided by JSONValue whilst keeping the implementation
 hidden. The same goes about array - a JSONValue shall not be exposed;
 instead, indexed access primitives should be exposed. Separate types
 might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary.
 The string type should be a type parameter of JSONValue.
This would unfortunately at the same time destroy almost all benefits that using (Tagged)Algebraic has, namely that it opens up the possibility of interoperability between different data formats (for example, passing a JSONValue to a BSON generator without letting the BSON generator know about JSON). This is unfortunately an area that I've also not yet properly explored, but I think it's important as we go forward with other data formats.
 ==============================

 So, here we are. I realize a good chunk of this is surprising ("you mean
 I shouldn't create strings in my APIs?"). My point here is, again, we're
 at a juncture. We're trying to factor garbage (heh) out of API design in
 ways that defer the lifetime management to the user of the API.
Most suggestions so far sound very reasonable, namely parameterizing parsing/lexing on the string type and using ranges where possible. JSONValue is a different beast that needs some more thought if we really want to keep it generic in terms of allocation/lifetime model. In terms of removing "garbage" from the API, I'm just not 100% sure if removing small but frequently used functions, such as a string conversion function (one that returns an allocated string), is really a good idea (which is what Walter suggested).
 We could pull json into std.experimental and defer the policy decisions
 for later, but I think it's a great driver for them. (Thanks Sönke for
 doing all the work, this is a great baseline.) I think we should use the
 JSON API as a guinea pig for the new era of D API design in which we
 have a solid set of principles, tools, and guidelines to defer lifetime
 management. Please advise.
Aug 18 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/18/15 12:54 PM, Sönke Ludwig wrote:
 Am 18.08.2015 um 00:21 schrieb Andrei Alexandrescu:
 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
That would mean a single module that is >5k lines long. Spreading out certain things, such as JSONValue into its own module, also makes sense to avoid unnecessarily large imports where other parts of the functionality aren't needed. Maybe we could move some private things to "std.internal" or similar and merge some of the modules?
That would help. My point is it's good design to make the response proportional to the problem. 5K lines is not a lot, but reducing those 5K in the first place would be a noble pursuit. And btw saving parsing time is so C++ :o).
 But I also think that grouping symbols by topic is a good thing and
 makes figuring out the API easier. There is also always package.d if you
 really want to import everything.
Figuring out the API easily is a good goal. The best way to achieve that is making the API no larger than necessary.
 * stdx.data.json.generator: I think the API for converting in-memory
 JSON values to strings needs to be redone, as follows:

 - JSONValue should offer a byToken range, which offers the contents of
 the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
 token followed by three numeric tokens with the respective values
 followed by the ']' token.
An input range style generator is on the TODO list, but would a token range be really useful for anything in practice? I would just go straight for a char range.
Sounds good.
 Another thing I'd like to add is an output range that takes parser nodes
 and writes to a string output range. This would be the kind of interface
 that would be most useful for a serialization framework.
Couldn't that be achieved trivially by e.g. using map!(t => t.toString) or similar? This is the nice thing about rangifying everything - suddenly you have a host of tools at your disposal.
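[Editor's sketch of the map-based suggestion; `value`, `byToken` and the token's `toString` are assumed from the proposal's API, not verified against it.]

```d
import std.algorithm : copy, joiner, map;
import std.array : appender;

// Hypothetical: `value` is a JSONValue whose byToken yields tokens
// that know how to render themselves as strings.
auto dst = appender!string();
value.byToken
    .map!(t => t.toString)  // render each token lazily
    .joiner                 // flatten the strings into one char range
    .copy(dst);
```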
 - On top of byToken it's immediate to implement a method (say toJSON or
 toString) that accepts an output range of characters and formatting
 options.

 - On top of the method above with output range, implementing a toString
 overload that returns a string for convenience is a two-liner. However,
 it shouldn't return a "string"; Phobos APIs should avoid "hardcoding"
 the string type. Instead, it should return a user-chosen string type
 (including reference counting strings).
Without any existing code to test this against, what would this look like? Simply using an `Appender!rcstring`?
Yes.
 - While at it make prettification a flag in the options, not its own
 part of the function name.
Already done. Pretty printing is now the default and there is GeneratorOptions.compact.
Great, thanks.
 * stdx.data.json.lexer:

 - I assume the idea was to accept ranges of integrals to mean "there's
 some raw input from a file". This seems to be a bit overdone, e.g.
 there's no need to accept signed integers or 64-bit integers. I suggest
 just going with the three character types.
It's funny you say that, because this was your own design proposal.
Ooops...
 Regarding the three character types, if we drop everything but those, I
 think we could also go with Walter's suggestion and just drop everything
 apart from "char". Putting a conversion range from dchar to char would
 be trivial and should be fast enough.
That's great, thanks.
 - I see tokenization accepts input ranges. This forces the tokenizer to
 store its own copy of things, which is no doubt the business of
 appenderFactory.  Here the departure of the current approach from what I
 think should become canonical Phobos APIs deepens for multiple reasons.
 First, appenderFactory does allow customization of the append operation
 (nice) but that's not enough to allow the user to customize the lifetime
 of the created strings, which is usually reflected in the string type
 itself. So the lexing method should be parameterized by the string type
 used. (By default string (as is now) should be fine.) Therefore instead
 of customizing the append method just customize the string type used in
 the token.
Okay, sounds reasonable if Appender!rcstring is just going to work.
Awesome, thanks.
 - The lexer should internally take optimization opportunities, e.g. if
 the string type is "string" and the lexed type is also "string", great,
 just use slices of the input instead of appending them to the tokens.
It does.
Yay to that.
 - At token level there should be no number parsing. Just store the
 payload with the token and leave it for later. Very often numbers are
 converted without there being a need, and the process is costly. This
 also nicely sidesteps the entire matter of bigints, floating point etc.
 at this level.
Okay, again, this was your own suggestion. The downside of always storing the string representation is that it requires allocations if no slices are used, and that the string will have to be parsed twice if the number is indeed going to be used. This can have a considerable performance impact.
Hmm, point taken. I'm not too worried about the parsing part but string allocation may be problematic.
 - Also, at token level strings should be stored with escapes unresolved.
 If the user wants a string with the escapes resolved, a lazy range
 does it.
To make things efficient, it currently stores escaped strings if slices of the input are used, but stores unescaped strings if allocations are necessary anyway.
That seems a good balance, and probably could be applied to numbers as well.
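[Editor's sketch of the lazy escape-resolving range mentioned above; names are invented, only single-character escapes are handled and \uXXXX is elided.]

```d
import std.range.primitives;

// A lazy forward-range wrapper that resolves simple JSON escapes on
// the fly, so tokens can keep storing the raw slice of the input.
struct Unescaped(R)
{
    R src;

    bool empty() { return src.empty; }

    dchar front()
    {
        if (src.front != '\\') return src.front;
        auto tmp = src.save;   // peek at the escape character
        tmp.popFront();
        switch (tmp.front)
        {
            case 'n': return '\n';
            case 't': return '\t';
            default:  return tmp.front; // \" \\ \/ and friends
        }
    }

    void popFront()
    {
        if (src.front == '\\') src.popFront(); // skip the backslash
        src.popFront();
    }
}

auto unescaped(R)(R raw) { return Unescaped!R(raw); }
```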
 - Validating UTF is tricky; I've seen some discussion in this thread
 about it. On the face of it JSON only accepts valid UTF characters. As
 such, a modularity-based argument is to pipe UTF validation before
 tokenization. (We need a lazy UTF validator and sanitizer stat!) An
 efficiency-based argument is to do validation during tokenization. I'm
 inclining in favor of modularization, which allows us to focus on one
 thing at a time and do it well, instead of duplicating validation
 everywhere. Note that it's easy to write routines that do JSON
 tokenization and leave UTF validation for later, so there's a lot of
 flexibility in composing validation with JSONization.
It's unfortunate to see this change of mind in the face of the work that already went into the implementation. I also still think that this is a good optimization opportunity that doesn't really affect the implementation complexity. Validation isn't duplicated, but reused from std.utf.
Well if the validation is reused from std.utf, it can't have been very much work. I maintain that separating concerns seems like a good strategy here.
 - Litmus test: if the input type is a forward range AND if the string
 type chosen for tokens is the same as input type, successful
 tokenization should allocate exactly zero memory. I think this is a
 simple way to make sure that the tokenization API works well.
Supporting arbitrary forward ranges doesn't seem to be enough; it would at least have to be combined with something like take(), but then the type doesn't equal the string type anymore. I'd suggest keeping it to "if it is sliceable and the input type equals the string type", at least for the initial version.
I had "take" in mind. Don't forget that "take" automatically uses slices wherever applicable. So if you just use typeof(take(...)), you get the best of all worlds. The more restrictive version seems reasonable for the first release.
 - If noThrow is a runtime option, some functions can't be nothrow (and
 consequently nogc). Not sure how important this is. Probably quite a bit
 because of the current gc implications of exceptions. IMHO: at lexing
 level a sound design might just emit error tokens (with the culprit as
 payload) and never throw. Clients may always throw when they see an
 error token.
noThrow is a compile time option and there are nothrow unit tests to make sure that the API is nothrow at least for string inputs.
Awesome.
 - The JSON value does its own internal allocation (for e.g. arrays and
 hashtables), which should be fine as long as it's encapsulated and we
 can tweak it later (e.g. make it use reference counting inside).
Since it's based on (Tagged)Algebraic, the internal types are part of the interface. Changing them later is bound to break some code. So AFAICS this would either require making the used types parameterized (string, array and AA types), or abstracting them away completely, i.e. only forwarding operations but denying direct access to the type. ... thinking about it, TaggedAlgebraic could do that, while Algebraic can't.
Well if you figure the general Algebraic type is better replaced by a type specialized for JSON, fine. What we shouldn't endorse is two nearly identical library types (Algebraic and TaggedAlgebraic) that are only different in subtle matters related to performance in certain use patterns. If integral tags are better for closed type universes, specialize Algebraic to use integral tags where applicable.
 - Why both parseJSONStream and parseJSONValue? I'm thinking
 parseJSONValue would be enough because then you trivially parse a stream
 with repeated calls to parseJSONValue.
parseJSONStream is the pull parser (StAX style) interface. It returns the contents of a JSON document as individual nodes instead of storing them in a DOM. This part is vital for high-performance parsing, especially of large documents.
So perhaps this is just a naming issue. The names don't suggest everything you said. What I see is "parse a JSON stream" and "parse a JSON value". So I naturally assumed we're looking at consuming a full stream vs. consuming only one value off a stream and stopping. How about better names?
 - FWIW I think the whole thing with accommodating BigInt etc. is an
 exaggeration. Just stick with long and double.
As mentioned earlier somewhere in this thread, there are practical needs to at least be able to handle ulong, too. Maybe the solution is indeed to just (optionally) store the string representation, so people can convert as they see fit.
Great. I trust you'll find the right compromise there. All I'm saying is that BigInt here sticks out like a sore thumb in the whole affair. Best to just take it out and let folks who need it build on top of the lexer.
 - readArray suddenly introduces a distinct kind of interacting -
 callbacks. Why? Should be a lazy range lazy range lazy range. An adapter
 using callbacks is then a two-liner.
It just has a more complicated implementation, but is already on the TODO list.
Great. Let me say again that with ranges you get to instantly tap into a wealth of tools. I say get rid of the callbacks and let a "tee" take care of it for whoever needs it.
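[Editor's sketch: assuming readArray grows the lazy-range interface discussed here, the existing callback style really does reduce to a small adapter. The names readArrayWithCallback and parser.readArray() are assumptions, not the proposal's actual API.]

```d
// Hypothetical adapter: wrap a lazy-range readArray back into the
// old delegate-based interface for code that prefers callbacks.
void readArrayWithCallback(R)(ref R parser, scope void delegate() onEntry)
{
    foreach (_; parser.readArray())  // lazy-range version from the proposal
        onEntry();
}
```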
 - Why is readBool even needed? Just readJSONValue and then enforce it as
 a bool. Same reasoning applies to readDouble and readString.
This is for lower-level access; using parseJSONValue would certainly be possible, but it would have quite some unneeded overhead and would also be non-@nogc.
Meh, fine. But all of this is adding weight to the API in the wrong places.
 - readObject is with callbacks again - it would be nice if it were a
 lazy range.
Okay, is also already on the list.
Awes!
 - skipXxx are nice to have and useful.

 * stdx.data.json.value:

 - The etymology of "opt" is unclear - no word starting with "opt" or
 obviously abbreviating to it is in the documentation. "opt2" is awkward.
 How about "path" and "dyn", respectively.
The names are just placeholders currently. I think one of the two should also be enough. I've just implemented both, so that both can be tested/seen in practice. There have also been some more name suggestions in a thread mentioned by Meta with a more general suggestion for normal D member access. I'll see if I can dig those up, too.
Okay.
 - I think Algebraic should be used throughout instead of
 TaggedAlgebraic, or motivation be given for the latter.
There have already been quite some arguments that I think are compelling, especially with a lack of counterarguments (maybe their consequences need to be explained better, though). TaggedAlgebraic could also (implicitly) convert to Algebraic. An additional argument is the possibility for TaggedAlgebraic to abstract away the underlying type, since it doesn't rely on a has!T and get!T API.
To reiterate the point I made above: we should not endorse two mostly equivalent types that exhibit subtle performance differences. Feel free to change Algebraic to use integrals for some/most cases when the number of types involved is bounded. Adding new methods to Algebraic should also be fine. Just don't add a new type that's 98% the same.
 But apart from that, algebraic is unfortunately currently quite unsuited
 for this kind of abstraction, even if that can be solved in theory (with
 a lot of work). It requires writing things like
 obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just
 obj["foo"], because it simply returns Variant from all of its forwarded
 operators.
Algebraic does not expose opIndex. We could add it to Algebraic such that obj["foo"] returns the same type as "this". It's easy for anyone to say that what's there is unfit for a particular purpose. It's also easy for many to define an ever-so-slightly-different new artifact that fits a particular purpose. Where you come in as a talented hacker is to operate with an understanding of the importance of making things work, and to make it work.
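[Editor's sketch: the verbosity being complained about can be reproduced with plain std.variant today; the JSON type universe is reduced here to long/string/object, and the alias name JV is invented.]

```d
import std.variant : Algebraic, This;

// Simplified JSON-like value: integer, string, or object of itself.
alias JV = Algebraic!(long, string, This[string]);

void main()
{
    JV obj = JV(["foo": JV(42L)]);
    // Today: unwrap the AA type explicitly, index, then unwrap again,
    // because the forwarded opIndex would yield a Variant, not a JV.
    assert(obj.get!(JV[string])["foo"].get!long == 42);
}
```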
 - JSONValue should be more opaque and not expose representation as much
 as it does now. In particular, offering a built-in hashtable is bound to
 be problematic because those are expensive to construct, create garbage,
 and are not customizable. Instead, the necessary lookup and set APIs
 should be provided by JSONValue whilst keeping the implementation
 hidden. The same goes about array - a JSONValue shall not be exposed;
 instead, indexed access primitives should be exposed. Separate types
 might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary.
 The string type should be a type parameter of JSONValue.
This would unfortunately at the same time destroy almost all benefits that using (Tagged)Algebraic has, namely that it opens up the possibility of interoperability between different data formats (for example, passing a JSONValue to a BSON generator without letting the BSON generator know about JSON). This is unfortunately an area that I've also not yet properly explored, but I think it's important as we go forward with other data formats.
I think we need to do it. Otherwise we're stuck with "D's JSON API cannot be used without the GC". We want to escape that gravitational pull. I know it's hard. But it's worth it.
 ==============================

 So, here we are. I realize a good chunk of this is surprising ("you mean
 I shouldn't create strings in my APIs?"). My point here is, again, we're
 at a juncture. We're trying to factor garbage (heh) out of API design in
 ways that defer the lifetime management to the user of the API.
Most suggestions so far sound very reasonable, namely parameterizing parsing/lexing on the string type and using ranges where possible. JSONValue is a different beast that needs some more thought if we really want to keep it generic in terms of allocation/lifetime model. In terms of removing "garbage" from the API, I'm just not 100% sure if removing small but frequently used functions, such as a string conversion function (one that returns an allocated string), is really a good idea (which is what Walter suggested).
We must accommodate a GC-less world. It's definitely time to acknowledge the GC as a brake that limits D adoption, and put our full thrust behind removing it. Andrei
Aug 21 2015
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/21/15 1:30 PM, Andrei Alexandrescu wrote:
 So perhaps this is just a naming issue. The names don't suggest
 everything you said. What I see is "parse a JSON stream" and "parse a
 JSON value". So I naturally assumed we're looking at consuming a full
 stream vs. consuming only one value off a stream and stopping. How about
 better names?
I should add that in parseJSONStream, "stream" refers to the input, whereas in parseJSONValue, "value" refers to the output. -- Andrei
Aug 21 2015
prev sibling next sibling parent reply "tired_eyes" <pastuhov85 gmail.com> writes:
On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu 
wrote:
 We must accommodate a GC-less world. It's definitely time to 
 acknowledge the GC as a brake that limits D adoption, and put 
 our full thrust behind removing it.


 Andrei
Wow. Just wow.
Aug 21 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/21/15 2:03 PM, tired_eyes wrote:
 On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
 We must accommodate a GC-less world. It's definitely time to
 acknowledge the GC as a brake that limits D adoption, and put our full
 thrust behind removing it.


 Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Aug 21 2015
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 8/21/15 2:03 PM, tired_eyes wrote:
On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
We must accommodate a GC-less world. It's definitely time to
acknowledge the GC as a brake that limits D adoption, and put our
full thrust behind removing it.


Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here. T -- He who sacrifices functionality for ease of use, loses both and deserves neither. -- Slashdotter
Aug 21 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/21/15 2:50 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via
Digitalmars-d wrote:
 On 8/21/15 2:03 PM, tired_eyes wrote:
 On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
 We must accommodate a GC-less world. It's definitely time to
 acknowledge the GC as a brake that limits D adoption, and put our
 full thrust behind removing it.


 Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here.
Nothing new here. We want to make it a pleasant experience to use D without a garbage collector. -- Andrei
Aug 21 2015
next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Aug 21, 2015 at 03:22:25PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 8/21/15 2:50 PM, H. S. Teoh via Digitalmars-d wrote:
On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
On 8/21/15 2:03 PM, tired_eyes wrote:
On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
We must accommodate a GC-less world. It's definitely time to
acknowledge the GC as a brake that limits D adoption, and put our
full thrust behind removing it.


Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here.
Nothing new here. We want to make it a pleasant experience to use D without a garbage collector. -- Andrei
Making it pleasant to use without a GC is not the same thing as removing the GC. Which is it? T -- Try to keep an open mind, but not so open your brain falls out. -- theboz
Aug 21 2015
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/21/15 3:22 PM, Andrei Alexandrescu wrote:
 On 8/21/15 2:50 PM, H. S. Teoh via Digitalmars-d wrote:
 On Fri, Aug 21, 2015 at 02:21:06PM -0400, Andrei Alexandrescu via
 Digitalmars-d wrote:
 On 8/21/15 2:03 PM, tired_eyes wrote:
 On Friday, 21 August 2015 at 17:30:43 UTC, Andrei Alexandrescu wrote:
 We must accommodate a GC-less world. It's definitely time to
 acknowledge the GC as a brake that limits D adoption, and put our
 full thrust behind removing it.


 Andrei
Wow. Just wow.
By "it" there I mean "the brake" :o). -- Andrei
Wait, wait. So you're saying the GC is a brake, and we should remove the brake, and therefore we should remove the GC? This is ... wow. I'm speechless here.
Nothing new here. We want to make it a pleasant experience to use D without a garbage collector. -- Andrei
Allow me to (possibly) clarify. What Andrei is saying is that you should be able to use D and Phobos *without* the GC, not that we should remove the GC. E.g. what Walter was talking about at DConf 2015: instead of converting an integer to a GC-allocated string, you return a range that produces the same characters but doesn't allocate. -Steve
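[Editor's sketch of the idea Steve describes: a lazy range of decimal digit characters; hypothetical code, not the actual DConf slide.]

```d
// The range *is* the string representation of the number, produced
// digit by digit - nothing is ever allocated.
struct Digits
{
    uint value;
    uint div;

    this(uint v)
    {
        value = v;
        div = 1;
        // Position div at the most significant decimal digit.
        while (div <= value / 10)
            div *= 10;
    }

    bool empty() const { return div == 0; }
    char front() const { return cast(char)('0' + (value / div) % 10); }
    void popFront() { div /= 10; }
}

// foreach (c; Digits(2015)) yields '2', '0', '1', '5'.
```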
Aug 21 2015
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig rejectedsoftware.com> writes:
Am 21.08.2015 um 19:30 schrieb Andrei Alexandrescu:
 On 8/18/15 12:54 PM, Sönke Ludwig wrote:
 Am 18.08.2015 um 00:21 schrieb Andrei Alexandrescu:
 * On the face of it, dedicating 6 modules to such a small specification
 as JSON seems excessive. I'm thinking one module here. (As a simple
 point: who would ever want to import only foundation, which in turn has
 one exception type and one location type in it?) I think it shouldn't be
 up for debate that we must aim for simple and clean APIs.
That would mean a single module that is >5k lines long. Spreading out certain things, such as JSONValue into its own module, also makes sense to avoid unnecessarily large imports where other parts of the functionality aren't needed. Maybe we could move some private things to "std.internal" or similar and merge some of the modules?
That would help. My point is it's good design to make the response proportional to the problem. 5K lines is not a lot, but reducing those 5K in the first place would be a noble pursuit. And btw saving parsing time is so C++ :o).
Most lines are needed for tests and documentation. Surely dropping some functionality would make the module smaller, too. But there is not a lot to take away without making severe compromises in terms of actual functionality or usability.
 But I also think that grouping symbols by topic is a good thing and
 makes figuring out the API easier. There is also always package.d if you
 really want to import everything.
Figuring out the API easily is a good goal. The best way to achieve that is making the API no larger than necessary.
So, what's your suggestion - remove all read*/skip* functions, for example? Make them member functions of JSONParserRange instead of UFCS functions? We could of course also just use the pseudo-modules that std.algorithm had, for example, where we'd create a table in the documentation for each category of functions.
 Another thing I'd like to add is an output range that takes parser nodes
 and writes to a string output range. This would be the kind of interface
 that would be most useful for a serialization framework.
Couldn't that be achieved trivially by e.g. using map!(t => t.toString) or similar? This is the nice thing about rangifying everything - suddenly you have a host of tools at your disposal.
No, the idea is to have an output range like so:

    auto dst = appender!string();
    auto r = JSONNodeOutputRange(&dst);
    r.put(beginArray);
    r.put(1);
    r.put(2);
    r.put(endArray);

This would provide a forward interface for code that has to directly iterate over its input, which is the case for a serializer - it can't provide an input range interface in a sane way. The alternative would be to either let the serializer re-implement all of JSON, or to just provide some primitives (a writeJSON() that takes bool, number or string) and to let the serializer implement the rest of JSON (arrays/objects), which includes certain options, such as pretty-printing.
 - Also, at token level strings should be stored with escapes unresolved.
 If the user wants a string with the escapes resolved, a lazy range
 does it.
To make things efficient, it currently stores escaped strings if slices of the input are used, but stores unescaped strings if allocations are necessary anyway.
That seems a good balance, and probably could be applied to numbers as well.
With the difference that numbers stored as numbers never need to allocate, so for non-sliceable inputs the compromise is not the same. What about just offering three (CT-selectable) modes:

- Always parse as double (parse lazily if slicing can be used) (default)
- Parse as double or long (again, lazily if slicing can be used)
- Always store the string representation

The question that remains is how to handle this in JSONValue - support just double there? Or something like JSONNumber that abstracts away the differences, but makes writing generic code against JSONValue difficult? Or make it also parameterized in what it can store?
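[Editor's sketch of how the three compile-time modes could be selected; the enum and struct names are invented, not part of the proposal.]

```d
enum NumberMode
{
    alwaysDouble,  // default: parse to double (lazily if slicing works)
    longOrDouble,  // keep integers exact when they fit a long
    keepString     // store the raw representation, convert on demand
}

// The stored payload changes with the compile-time mode.
struct JSONNumber(NumberMode mode)
{
    static if (mode == NumberMode.alwaysDouble)
        double value;
    else static if (mode == NumberMode.longOrDouble)
    {
        bool isIntegral;
        union { long asLong; double asDouble; }
    }
    else
        const(char)[] repr;  // parsed by the consumer when needed
}
```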
 - Validating UTF is tricky; I've seen some discussion in this thread
 about it. On the face of it JSON only accepts valid UTF characters. As
 such, a modularity-based argument is to pipe UTF validation before
 tokenization. (We need a lazy UTF validator and sanitizer stat!) An
 efficiency-based argument is to do validation during tokenization. I'm
 inclining in favor of modularization, which allows us to focus on one
 thing at a time and do it well, instead of duplicating validation
 everywhere. Note that it's easy to write routines that do JSON
 tokenization and leave UTF validation for later, so there's a lot of
 flexibility in composing validation with JSONization.
It's unfortunate to see this change of mind in the face of the work that already went into the implementation. I also still think that this is a good optimization opportunity that doesn't really affect the implementation complexity. Validation isn't duplicated, but reused from std.utf.
Well if the validation is reused from std.utf, it can't have been very much work. I maintain that separating concerns seems like a good strategy here.
There is more than the actual call to validate(), such as writing tests and making sure the surroundings work, adjusting the interface and writing documentation. It's not *that* much work, but nonetheless wasted work. I also still think that this hasn't been a bad idea at all, because it speeds up the most important use case: parsing JSON from a non-memory source that has not yet been validated. I also very much like the idea of making it a programming error to have invalid UTF stored in a string, i.e. forcing the validation to happen before the cast from bytes to chars.
 - Litmus test: if the input type is a forward range AND if the string
 type chosen for tokens is the same as input type, successful
 tokenization should allocate exactly zero memory. I think this is a
 simple way to make sure that the tokenization API works well.
Supporting arbitrary forward ranges doesn't seem to be enough; it would at least have to be combined with something like take(), but then the type doesn't equal the string type anymore. I'd suggest keeping it to "if it is sliceable and the input type equals the string type", at least for the initial version.
I had "take" in mind. Don't forget that "take" automatically uses slices wherever applicable. So if you just use typeof(take(...)), you get the best of all worlds. The more restrictive version seems reasonable for the first release.
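For reference, a minimal sketch of the slicing behavior of take() that this relies on (the buffer contents are illustrative; the point is std.range's documented specialization for sliceable ranges):

```d
import std.range : take;

void main()
{
    // For ranges with slicing and length (such as dynamic arrays),
    // take() returns a slice of the same type instead of a Take!R
    // wrapper, so typeof(take(...)) can double as the token string
    // type without allocating:
    ubyte[] buf = [0x7b, 0x22, 0x61, 0x22, 0x7d];
    auto head = buf.take(2);
    static assert(is(typeof(head) == ubyte[]));
    assert(head is buf[0 .. 2]); // a true slice, zero allocation
}
```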
Okay.
 - The JSON value does its own internal allocation (for e.g. arrays and
 hashtables), which should be fine as long as it's encapsulated and we
 can tweak it later (e.g. make it use reference counting inside).
Since it's based on (Tagged)Algebraic, the internal types are part of the interface. Changing them later is bound to break some code. So AFAICS this would either require making the used types parameterized (string, array and AA types), or abstracting them away completely, i.e. only forwarding operations but denying direct access to the type. ... thinking about it, TaggedAlgebraic could do that, while Algebraic can't.
Well if you figure the general Algebraic type is better replaced by a type specialized for JSON, fine. What we shouldn't endorse is two nearly identical library types (Algebraic and TaggedAlgebraic) that are only different in subtle matters related to performance in certain use patterns. If integral tags are better for closed type universes, specialize Algebraic to use integral tags where applicable.
TaggedAlgebraic would not be a type specialized for JSON! It's useful for all kinds of applications and just happens to have some advantages here, too. An (imperfect) idea for merging this with the existing Algebraic name: template Algebraic(T) if (is(T == struct) || is(T == union)) { // ... implementation of TaggedAlgebraic ... } To avoid the ambiguity with a single type Algebraic, a UDA could be required for T to get the actual TaggedAlgebraic behavior. Everything else would be problematic, because TaggedAlgebraic needs to be supplied with names for the different types, so the Algebraic(T...) way of specifying allowed types doesn't really work. And, more importantly, because exploiting static type information in the generated interface means breaking code that currently is built around a Variant return value.
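A rough sketch of what this overload could look like; all names here are hypothetical, this is not actual Phobos or TaggedAlgebraic code:

```d
// Hypothetical: a union naming the allowed types, as a TaggedAlgebraic-style
// implementation expects. The member names become the tag names.
union JSONPayload
{
    typeof(null) null_;
    bool         boolean;
    double       number;
    string       text;
    // arrays/objects omitted for brevity
}

// The proposed overload would dispatch on struct/union arguments, leaving
// the existing variadic Algebraic(T...) untouched:
//
//     template Algebraic(T) if (is(T == struct) || is(T == union))
//     {
//         // ... implementation of TaggedAlgebraic ...
//     }
//
// A UDA on T could then opt in to the statically typed interface, so that
// existing Variant-returning code keeps compiling unchanged.
```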
 - Why both parseJSONStream and parseJSONValue? I'm thinking
 parseJSONValue would be enough because then you trivially parse a stream
 with repeated calls to parseJSONValue.
parseJSONStream is the pull parser (StAX style) interface. It returns the contents of a JSON document as individual nodes instead of storing them in a DOM. This part is vital for high-performance parsing, especially of large documents.
So perhaps this is just a naming issue. The names don't suggest everything you said. What I see is "parse a JSON stream" and "parse a JSON value". So I naturally assumed we're looking at consuming a full stream vs. consuming only one value off a stream and stopping. How about better names?
parseToJSONValue/parseToJSONStream? parseAsX?
 - readArray suddenly introduces a distinct kind of interaction -
 callbacks. Why? Should be a lazy range lazy range lazy range. An adapter
 using callbacks is then a two-liner.
It just has a more complicated implementation, but is already on the TODO list.
Great. Let me say again that with ranges you get to instantly tap into a wealth of tools. I say get rid of the callbacks and let a "tee" take care of it for whomever needs it.
The callbacks would surely be dropped once ranges become available. foreach() should usually be all that is needed.
 - Why is readBool even needed? Just readJSONValue and then enforce it as
 a bool. Same reasoning applies to readDouble and readString.
This is for lower level access; using parseJSONValue would certainly be possible, but it would have quite some unneeded overhead and would also be non-@nogc.
Meh, fine. But all of this is adding weight to the API in the wrong places.
Frankly, I don't think that this is even the wrong place. The pull parser interface is the single most important part of the API when we talk about allocation-less and high-performance operation. It also really has low weight, as it's just a small function that joins the other read* functions quite naturally and doesn't create any additional cognitive load.
 - readObject is with callbacks again - it would be nice if it were a
 lazy range.
Okay, is also already on the list.
Awes!
It could return a Tuple!(string, JSONNodeRange). But probably there should also be an opApply for the object field range, so that foreach (key, value; ...) becomes possible.
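A minimal sketch of such an opApply-based field range; JSONValue and the parser state here are placeholders, not the module's actual types:

```d
struct JSONValue { int dummy; } // placeholder for the real value type

struct JSONObjectRange
{
    // stand-ins for the underlying pull-parser state:
    string[]    keys;
    JSONValue[] values;

    int opApply(scope int delegate(string key, ref JSONValue value) dg)
    {
        foreach (i, k; keys)
            if (auto r = dg(k, values[i]))
                return r; // early exit from foreach
        return 0;
    }
}

void main()
{
    auto obj = JSONObjectRange(["a", "b"], [JSONValue(1), JSONValue(2)]);
    foreach (key, value; obj)
    {
        // visits ("a", 1) then ("b", 2), in document order
    }
}
```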
 But apart from that, algebraic is unfortunately currently quite unsuited
 for this kind of abstraction, even if that can be solved in theory (with
 a lot of work). It requires to write things like
 obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just
 obj["foo"], because it simply returns Variant from all of its forwarded
 operators.
Algebraic does not expose opIndex. We could add it to Algebraic such that obj["foo"] returns the same type as "this".
https://github.com/D-Programming-Language/phobos/blob/6df5d551fd8a21feef061483c226e7d9b26d6cd4/std/variant.d#L1088 https://github.com/D-Programming-Language/phobos/blob/6df5d551fd8a21feef061483c226e7d9b26d6cd4/std/variant.d#L1348
 It's easy for anyone to say that what's there is unfit for a particular
 purpose. It's also easy for many to define an ever-so-slightly-different
 new artifact that fits a particular purpose. Where you come as a
 talented hacker is to operate with the understanding of the importance
 of making things work, and make it work.
The problem is that making Algebraic exploit static type information means nothing short of a complete reimplementation, which TaggedAlgebraic is. It also means breaking existing code, if, for example, alg[0] suddenly returns a string instead of just a Variant with a string stored inside.
 - JSONValue should be more opaque and not expose representation as much
 as it does now. In particular, offering a built-in hashtable is bound to
 be problematic because those are expensive to construct, create garbage,
 and are not customizable. Instead, the necessary lookup and set APIs
 should be provided by JSONValue whilst keeping the implementation
 hidden. The same goes about array - a JSONValue shall not be exposed;
 instead, indexed access primitives should be exposed. Separate types
 might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary.
 The string type should be a type parameter of JSONValue.
This would unfortunately at the same time destroy almost all benefits that using (Tagged)Algebraic has, namely that it opens up the possibility of interoperability between different data formats (for example, passing a JSONValue to a BSON generator without letting the BSON generator know about JSON). This is unfortunately an area that I've also not yet properly explored, but I think it's important as we go forward with other data formats.
I think we need to do it. Otherwise we're stuck with "D's JSON API cannot be used without the GC". We want to escape that gravitational pull. I know it's hard. But it's worth it.
I can't fight the feeling that what Phobos currently has in terms of allocators, containers and reference counting is simply not mature enough to make a good decision here. Restricting JSONValue as much as possible would at least keep the possibility to extend it later, but I think that we can and should do better in the long term.
Aug 22 2015
parent reply "Martin Nowak" <code dawg.eu> writes:
On Saturday, 22 August 2015 at 13:41:49 UTC, Sönke Ludwig wrote:
 There is more than the actual call to validate(), such as 
 writing tests and making sure the surroundings work, adjusting 
 the interface and writing documentation. It's not *that* much 
 work, but nonetheless wasted work.

 I also still think that this hasn't been a bad idea at all. 
 Because it speeds up the most important use case, parsing JSON 
 from a non-memory source that has not yet been validated. I 
 also very much like the idea of making it a programming error 
 to have invalid UTF stored in a string, i.e. forcing the 
 validation to happen before the cast from bytes to chars.
Also see "utf/unicode should only be validated once" https://issues.dlang.org/show_bug.cgi?id=14919 If combining lexing and validation is faster (why?), then a ubyte-consuming interface should be available; though why couldn't it be done by adding a lazy ubyte->char validator range to std.utf? In any case, during lexing we should avoid autodecoding of narrow strings for redundant validation.
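No such adapter exists in std.utf today; a simplified sketch of the idea, delegating the actual checking to std.utf.decode, might look like this (names are hypothetical):

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;
import std.utf : decode, UTFException;

// Hypothetical lazy ubyte -> char validator: passes code units through
// unchanged, throwing UTFException on malformed sequences. decode() does
// the real checking (overlong encodings, surrogates, truncation).
struct ValidateUTF8(R) if (isInputRange!R && is(ElementType!R : ubyte))
{
    private R src;
    private char[4] buf;
    private size_t len, pos;

    this(R src) { this.src = src; fill(); }

    private void fill()
    {
        pos = len = 0;
        if (src.empty) return;
        // gather one code point's worth of bytes, then validate it
        while (true)
        {
            if (len == 4) throw new UTFException("invalid UTF-8 sequence");
            buf[len++] = cast(char) src.front;
            src.popFront();
            size_t i = 0;
            try { decode(buf[0 .. len], i); return; }
            catch (UTFException) { if (src.empty) throw; } // maybe just truncated
        }
    }

    bool empty() const { return len == 0; }
    char front() const { return buf[pos]; }
    void popFront() { if (++pos == len) fill(); }
}
```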
Aug 24 2015
parent reply =?UTF-8?Q?S=c3=b6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
Am 25.08.2015 um 07:55 schrieb Martin Nowak:
 On Saturday, 22 August 2015 at 13:41:49 UTC, Sönke Ludwig wrote:
 There is more than the actual call to validate(), such as writing
 tests and making sure the surroundings work, adjusting the interface
 and writing documentation. It's not *that* much work, but nonetheless
 wasted work.

 I also still think that this hasn't been a bad idea at all. Because it
 speeds up the most important use case, parsing JSON from a non-memory
 source that has not yet been validated. I also very much like the idea
 of making it a programming error to have invalid UTF stored in a
 string, i.e. forcing the validation to happen before the cast from
 bytes to chars.
Also see "utf/unicode should only be validated once" https://issues.dlang.org/show_bug.cgi?id=14919 If combining lexing and validation is faster (why?), then a ubyte-consuming interface should be available; though why couldn't it be done by adding a lazy ubyte->char validator range to std.utf? In any case, during lexing we should avoid autodecoding of narrow strings for redundant validation.
The performance benefit comes from the fact that almost all of JSON is a subset of ASCII, so that lexing the input will implicitly validate it as correct UTF. The only places where actual UTF sequences can occur are in string literals outside of escape sequences. Depending on the type of document, that can result in a lot fewer conditionals compared to a full validation of the input. Autodecoding during lexing is being avoided; everything happens on the code unit level.
Aug 25 2015
parent Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 08/25/2015 09:03 AM, Sönke Ludwig wrote:
 The performance benefit comes from the fact that almost all of JSON is a
 subset of ASCII, so that lexing the input will implicitly validate it as
 correct UTF. The only places where actual UTF sequences can occur are in
 string literals outside of escape sequences. Depending on the type of
 document, that can result in a lot fewer conditionals compared to a full
 validation of the input.
I see, then we should indeed exploit this fact and offer lexing of ubyte[]-ish ranges.
Aug 25 2015
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 - JSONValue should offer a byToken range, which offers the contents of
 the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
 token followed by three numeric tokens with the respective values
 followed by the ']' token.
What about the comma tokens?
Aug 19 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/19/15 8:42 AM, Timon Gehr wrote:
 On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 - JSONValue should offer a byToken range, which offers the contents of
 the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
 token followed by three numeric tokens with the respective values
 followed by the ']' token.
What about the comma tokens?
Forgot about those. The invariant is that byToken should return a sequence of tokens that, when parsed, produces the originating object. -- Andrei
Aug 19 2015
parent reply Jacob Carlborg <doob me.com> writes:
On 2015-08-19 19:29, Andrei Alexandrescu wrote:

 Forgot about those. The invariant is that byToken should return a
 sequence of tokens that, when parsed, produces the originating object.
That should be possible without the comma tokens in this case? -- /Jacob Carlborg
Aug 19 2015
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/19/15 1:59 PM, Jacob Carlborg wrote:
 On 2015-08-19 19:29, Andrei Alexandrescu wrote:

 Forgot about those. The invariant is that byToken should return a
 sequence of tokens that, when parsed, produces the originating object.
That should be possible without the comma tokens in this case?
That is correct, but it would do little more than confuse folks. FWIW the distinction is similar to AST vs. CST (C = Concrete). -- Andrei
Aug 19 2015
prev sibling parent reply Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 * All new stuff should go in std.experimental. I assume "stdx" would
 change to that, should this work be merged.
Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
Aug 24 2015
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/25/2015 08:18 AM, Martin Nowak wrote:
 On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 * All new stuff should go in std.experimental. I assume "stdx" would
 change to that, should this work be merged.
Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
The great thing about the experimental package is that we are actually allowed to rename it. :-)
Aug 25 2015
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/25/15 11:02 AM, Timon Gehr wrote:
 On 08/25/2015 08:18 AM, Martin Nowak wrote:
 On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
 * All new stuff should go in std.experimental. I assume "stdx" would
 change to that, should this work be merged.
Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
The great thing about the experimental package is that we are actually allowed to rename it. :-)
I strongly oppose renaming it. I don't want Phobos to fall into the trap of javax, which was supposed to be "experimental" but then became unmovable. std.experimental is much more obvious that you shouldn't expect things to live there forever. -Steve
Aug 25 2015
prev sibling next sibling parent Martin Nowak <code+news.digitalmars dawg.eu> writes:
Will try to convert a piece of code I wrote a few days ago.
https://github.com/MartinNowak/rabbitmq-munin/blob/48c3e7451dec0dcb2b6dccbb9b4230b224e2e647/src/app.d
Right now working with json for trivial stuff is a pain.
Aug 25 2015
prev sibling next sibling parent reply tired_eyes <pastuhov85 gmail.com> writes:
So, what is the current status of std.data.json? This topic is 
almost two months old; what is the result of the "two week process"? 
The wiki page tells nothing except "ready for comments".
Sep 24 2015
parent Atila Neves <atila.neves gmail.com> writes:
On Thursday, 24 September 2015 at 20:44:57 UTC, tired_eyes wrote:
 So, what is the current status of std.data.json? This topic is 
 almost two months old; what is the result of the "two week process"? 
 The wiki page tells nothing except "ready for comments".
I probably should have posted here. Soenke is working on all the comments as far as I know. It'll come back. Atila
Sep 24 2015
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 28 Jul 2015 14:07:18 +0000
schrieb "Atila Neves" <atila.neves gmail.com>:

 Start of the two week process, folks.
 
 Code: https://github.com/s-ludwig/std_data_json
 Docs: http://s-ludwig.github.io/std_data_json/
 
 Atila
There is one thing I noticed today that I personally feel strongly about: serialized double values are not restored accurately. That is, when I send a double value via JSON and use enough digits to represent it accurately, it may not be decoded to the same value. `std.json` does not have this problem with the random values from [0..1) that I tested with. I also tried `LexOptions.useBigInt/.useLong` to no avail. Looking at the unittests, it seems the decision was deliberate, as `approxEqual` is used in parsing tests. The JSON specs don't enforce any specific accuracy, but they say that you can arrange for lossless transmission of the widely supported IEEE double values by using up to 17 significant digits. -- Marco
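The 17-digit rule Marco refers to can be checked like this, assuming a correctly rounded string-to-double conversion (which is what makes the round trip lossless):

```d
import std.conv   : to;
import std.format : format;

void main()
{
    // IEEE 754 doubles survive a text round trip when printed with
    // 17 significant digits; fewer digits (like the default %g) may not:
    double x = 0.1 + 0.2;
    string s = format("%.17g", x);   // "0.30000000000000004"
    double y = s.to!double;
    assert(y == x); // bit-exact, given correctly rounded parsing
}
```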
Oct 02 2015
prev sibling parent reply Alex <a b.c> writes:
JSON is a particular file format useful for serialising 
hierarchical data.

Given that D also has an XML module which appears to be 
deprecated, I wonder if it would be better to write a more 
abstract serialisation/persistence module that could use either 
json, xml, some binary format and future formats.

I would estimate that more than 70% of the times, the JSON data 
will only be read and written by a single D application, with 
only occasional inspection by developers etc.
In these cases it is undesirable to have code littered with types 
coming from a particular serialisation file format library.
As the software evolves that file format might become 
obsolete/slow/unfashionable etc, and it would be much nicer if 
the format could be changed without a lot of code being touched.
The other 30% of uses will genuinely need raw JSON control when 
reading/writing files written/read by other software, and this 
needs to be in Phobos to implement the backends.
It would be better for most people to not write their code in 
terms of JSON, but in terms of the more abstract concept of 
persistence/serialisation (whatever you want to call it).
Oct 06 2015
next sibling parent =?UTF-8?Q?S=c3=b6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
Am 06.10.2015 um 12:05 schrieb Alex:
 JSON is a particular file format useful for serialising hierarchical data.

 Given that D also has an XML module which appears to be deprecated, I
 wonder if it would be better to write a more abstract
 serialisation/persistence module that could use either json, xml, some
 binary format and future formats.

 I would estimate that more than 70% of the times, the JSON data will
 only be read and written by a single D application, with only occasional
 inspection by developers etc.
 In these cases it is undesirable to have code littered with types coming
 from a particular serialisation file format library.
 As the software evolves that file format might become
 obsolete/slow/unfashionable etc, and it would be much nicer if the
 format could be changed without a lot of code being touched.
 The other 30% of uses will genuinely need raw JSON control when
 reading/writing files written/read by other software, and this needs to
 be in Phobos to implement the backends.
 It would be better for most people to not write their code in terms of
 JSON, but in terms of the more abstract concept of
 persistence/serialisation (whatever you want to call it).
A generic serialization framework is definitely needed! Jacob Carlborg had once tried to get the Orange[1] serialization library into Phobos, but the amount of requested changes was quite overwhelming and it hasn't worked out so far. There is also a serialization framework in vibe.d[2], but in contrast to Orange it doesn't handle cross references (for pointers/reference types). But this is definitely outside of the scope of this particular module and will require a separate effort. It is intended to be well suited for that purpose, though. [1]: https://github.com/jacob-carlborg/orange [2]: http://vibed.org/api/vibe.data.serialization/
Oct 06 2015
prev sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Tuesday, 6 October 2015 at 10:05:46 UTC, Alex wrote:
 I wonder if it would be better to write a more abstract 
 serialisation/persistance module that could use either 
 json,xml,some binary format and future formats.
I think there are too many particulars making an abstract (de)serialization module unworkable. If that wasn't the case it would be easy to transform any format into another, by simply deserializing from format A and serializing to format B. But a little experiment will show you that it requires a lot of complexity for the non-trivial case. And the format's particulars will still show up in your code. At which point it begs the question, why not just write simple primitive (de)serialization modules that only do one format? Probably easier to build, maintain and debug. I am reminded of a binary file format I once wrote which supported referenced objects and had enough meta-data to allow garbage collection. It was a big ugly c++ template monster. Any abstract deserializer is going to stay away from that.
Oct 06 2015
parent Atila Neves <atila.neves gmail.com> writes:
On Tuesday, 6 October 2015 at 15:47:08 UTC, Sebastiaan Koppe 
wrote:
 At which point it begs the question, why not just write simple 
 primitive (de)serialization modules that only do one format? 
 Probably easier to build, maintain and debug.
The binary one is the one I care about, so that's the one I wrote: https://github.com/atilaneves/cerealed I've thinking of adding other formats. I don't know if it's worth it. Atila
Oct 06 2015