digitalmars.D - std.json API improvement - Request for code review

reply Brian Schott <brian-schott cox.net> writes:
Now that a few bugs are fixed in DMD (notably 4826), my improvements to
std.json compile. My primary purpose in this code change is streamlining
the process of creating JSON documents. You can now do stuff like this:

auto json = JSONValue();
json["numbers"] = [1, 3, 5, 7];
json["nullValue"] = null;
json["something"] = false;
json["vector"] = ["x": 234.231, "y" : 12.8, "z" : 35.0];
assert("vector" in json);
json["vector"]["x"] = 42.8;

More example usage:
http://www.hackerpilot.org/src/phobos/jsontest.d

The implementation of the actual JSON data structure and parsing /
writing is unchanged. Ddoc comments were added so that the documentation
page for the module won't be quite so empty.

Implementation:
http://www.hackerpilot.org/src/phobos/json.d

I'd like to get this committed back to Phobos if there's a consensus
that these changes make sense. Comments welcome. (Note: You'll need a
DMD version built from SVN to use this)
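
For anyone trying this on a current compiler, here is a minimal sketch of the same assignments. It assumes a recent Phobos release: JSONValue.emptyObject and toPrettyString come from that std.json, not from the patch linked above, and buildExample is just a name made up for this example.

```d
import std.json;
import std.stdio;

/// Builds the document from the post above (hypothetical helper name).
JSONValue buildExample()
{
    // Start from an empty object rather than a default-initialised value.
    JSONValue json = JSONValue.emptyObject;

    json["numbers"] = [1, 3, 5, 7];
    json["nullValue"] = null;
    json["something"] = false;
    json["vector"] = ["x": 234.231, "y": 12.8, "z": 35.0];

    // `in` yields a pointer to the member, or null if the key is absent.
    assert("vector" in json);
    json["vector"]["x"] = 42.8;
    return json;
}

void main()
{
    writeln(buildExample().toPrettyString());
}
```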

- Brian
Sep 12 2010
next sibling parent reply sybrandy <sybrandy gmail.com> writes:
On 09/12/2010 05:05 AM, Brian Schott wrote:
 Now that a few bugs are fixed in DMD (notably 4826), my improvements to
 std.json compile. My primary purpose in this code change is streamlining
 the process of creating JSON documents. You can now do stuff like this:

 auto json = JSONValue();
 json["numbers"] = [1, 3, 5, 7];
 json["nullValue"] = null;
 json["something"] = false;
 json["vector"] = ["x": 234.231, "y" : 12.8, "z" : 35.0];
 assert("vector" in json);
 json["vector"]["x"] = 42.8;

 More example usage:
 http://www.hackerpilot.org/src/phobos/jsontest.d

 The implementation of the actual JSON data structure and parsing /
 writing is unchanged. Ddoc comments were added so that the documentation
 page for the module won't be quite so empty.

 Implementation:
 http://www.hackerpilot.org/src/phobos/json.d

 I'd like to get this committed back to Phobos if there's a consensus
 that these changes make sense. Comments welcome. (Note: You'll need a
 DMD version built from SVN to use this)

 - Brian

Everything I've seen looks good to me, though I haven't tried to execute it. The fact that I can directly manipulate a JSONValue looks good to me. However, here's a question: if I have the following JSON document and parse it using parseJSON, will obj1 and obj2 both be JSONValues? My current project deals with this type of situation on a regular basis, so I'm curious about how easy this will be to access the data.

Casey

{
    "obj1":
    {
        "obj2":
        {
            "val1": 1,
            "val2": "a string"
        },
        "val3": [ 1, 2, 3, 4]
    }
}
Sep 12 2010
next sibling parent reply Brian Schott <brian-schott cox.net> writes:
Everything is a JSONValue. JSONValue has a union inside it that actually
holds the data. I just wrote a short program that reads your example
file. (This could be a bit more efficient if I stored obj2, but I think
it's enough to communicate the idea.)

import std.stdio;
import std.json;
import std.file;

void main(string[] args)
{
	auto jsonString = readText("../ml.json");
	JSONValue json = parseJSON(jsonString);
	writeln(json["obj1"]["obj2"]["val1"].integer);
	writeln(json["obj1"]["obj2"]["val2"].str);
	foreach(value; json["obj1"]["val3"].array)
		writeln(value.integer);
}

Output:

1
a string
1
2
3
4

Your question did remind me to document the union members so that the
HTML documentation will show how to access the actual data. I've
uploaded the new version of the file. The link is the same.
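
The "store obj2" variant mentioned above might look like the sketch below. It assumes the std.json in a recent Phobos and inlines the document instead of reading ../ml.json so that it is self-contained; loadExample is a made-up helper name.

```d
import std.json;
import std.stdio;

/// Parses the example document from the question (hypothetical helper name).
JSONValue loadExample()
{
    return parseJSON(`{ "obj1": { "obj2": { "val1": 1, "val2": "a string" },
                                  "val3": [1, 2, 3, 4] } }`);
}

void main()
{
    JSONValue json = loadExample();

    // Look each intermediate value up once and reuse it.
    auto obj1 = json["obj1"];
    auto obj2 = obj1["obj2"];

    writeln(obj2["val1"].integer);
    writeln(obj2["val2"].str);
    foreach (value; obj1["val3"].array)
        writeln(value.integer);
}
```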

On 09/12/2010 05:19 PM, sybrandy wrote:
 Everything I've seen looks good to me, though I haven't tried to execute
 it.  The fact that I can directly manipulate a JSONValue looks good to
 me.  However, here's a question: if I have the following JSON document
 and parse it using parseJSON, will obj1 and obj2 both be JSONValues?  My
 current project deals with this type of situation on a regular basis, so
 I'm curious about how easy this will be to access the data.
 
 Casey
 
 {
     "obj1":
     {
         "obj2":
         {
             "val1": 1,
             "val2": "a string"
         },
         "val3": [ 1, 2, 3, 4]
     }
 }

Sep 12 2010
parent reply sybrandy <sybrandy gmail.com> writes:
On 09/12/2010 10:53 PM, Brian Schott wrote:
 Everything is a JSONValue. JSONValue has a union inside it that actually
 holds the data. I just wrote a short program that reads your example
 file. (This could be a bit more efficient if I stored obj2, but I think
 it's enough to communicate the idea.)

 import std.stdio;
 import std.json;
 import std.file;

 void main(string[] args)
 {
 	auto jsonString = readText("../ml.json");
 	JSONValue json = parseJSON(jsonString);
 	writeln(json["obj1"]["obj2"]["val1"].integer);
 	writeln(json["obj1"]["obj2"]["val2"].str);
 	foreach(value; json["obj1"]["val3"].array)
 		writeln(value.integer);
 }

 Output:

 1
 a string
 1
 2
 3
 4

 Your question did remind me to document the union members so that the
 HTML documentation will show how to access the actual data. I've
 uploaded the new version of the file. The link is the same.

Cool. It looks very simple and easy...just the way I like it. What you have is actually quite nice as I can navigate down to a low-level element without having to store 200 different intermediate values. Probably not very common, but a nicety for when it's needed.

Thanks!

Casey
Sep 13 2010
parent sybrandy <sybrandy gmail.com> writes:
 Unfortunately, the above code is horribly broken. Here's how to read a
 number correctly:

 real x;
 if(json["vector"]["x"].type == JSON_TYPE.INTEGER) {
 x = json["vector"]["x"].integer;
 } else if(json["vector"]["x"].type == JSON_TYPE.FLOAT) {
 x = json["vector"]["x"].floating;
 } else {
 enforceEx!(JSONException)(false);
 }

 You'll notice that before any access you must check to ensure the JSON
 type is what you think it should be. As noted above, JSON does not
 differentiate between integers and reals, so you have to test both on
 access.

Understood. Even though this probably isn't the way it will end up based on some previous discussion, I like the way the indices are used to access the elements. Perhaps this means I'm more of a C guy than a D guy in some respects. Definitely not a Java guy.

As for the integers vs. floats, does the API always treat a number as a float even if it is an integer? If so, then checking for an integer vs. a float may not be a big deal in many cases.

Casey
Sep 14 2010
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 13 Sep 2010 18:59:57 -0400, sybrandy <sybrandy gmail.com> wrote:

 On 09/12/2010 10:53 PM, Brian Schott wrote:
 Everything is a JSONValue. JSONValue has a union inside it that actually
 holds the data. I just wrote a short program that reads your example
 file. (This could be a bit more efficient if I stored obj2, but I think
 it's enough to communicate the idea.)

 import std.stdio;
 import std.json;
 import std.file;

 void main(string[] args)
 {
 	auto jsonString = readText("../ml.json");
 	JSONValue json = parseJSON(jsonString);
 	writeln(json["obj1"]["obj2"]["val1"].integer);
 	writeln(json["obj1"]["obj2"]["val2"].str);
 	foreach(value; json["obj1"]["val3"].array)
 		writeln(value.integer);
 }

 Output:

 1
 a string
 1
 2
 3
 4

 Your question did remind me to document the union members so that the
 HTML documentation will show how to access the actual data. I've
 uploaded the new version of the file. The link is the same.

Cool. It looks very simple and easy...just the way I like it. What you have is actually quite nice as I can navigate down to a low-level element without having to store 200 different intermediate values. Probably not very common, but a nicety for when it's needed. Thanks! Casey

Unfortunately, the above code is horribly broken. Here's how to read a number correctly:

real x;
if(json["vector"]["x"].type == JSON_TYPE.INTEGER) {
    x = json["vector"]["x"].integer;
} else if(json["vector"]["x"].type == JSON_TYPE.FLOAT) {
    x = json["vector"]["x"].floating;
} else {
    enforceEx!(JSONException)(false);
}

You'll notice that before any access you must check to ensure the JSON type is what you think it should be. As noted above, JSON does not differentiate between integers and reals, so you have to test both on access.
Sep 13 2010
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 14 Sep 2010 17:48:25 -0400, sybrandy <sybrandy gmail.com> wrote:

 Unfortunately, the above code is horribly broken. Here's how to read a
 number correctly:

 real x;
 if(json["vector"]["x"].type == JSON_TYPE.INTEGER) {
 x = json["vector"]["x"].integer;
 } else if(json["vector"]["x"].type == JSON_TYPE.FLOAT) {
 x = json["vector"]["x"].floating;
 } else {
 enforceEx!(JSONException)(false);
 }

 You'll notice that before any access you must check to ensure the JSON
 type is what you think it should be. As noted above, JSON does not
 differentiate between integers and reals, so you have to test both on
 access.

Understood. Even though this probably isn't the way it will end up based on some previous discussion, I like the way the indices are used to access the elements. Perhaps this means I'm more of a C guy than a D guy in some respects. Definitely not a Java guy.

And the really awesome thing about D is that it's trivial to support both json.vector.x and json["vector"]["x"] syntaxes.
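
One way to get both spellings is a thin wrapper with opDispatch. The sketch below is read-only and assumes the std.json of a recent Phobos; the Dotted struct is hypothetical and is not taken from std.json or from any library mentioned in this thread.

```d
import std.json;
import std.math : fabs;

/// Hypothetical read-only wrapper around a JSONValue.
struct Dotted
{
    JSONValue value;

    /// The compiler rewrites json.vector to json.opDispatch!"vector"().
    Dotted opDispatch(string name)()
    {
        return Dotted(value[name]);
    }

    /// json["vector"] keeps working next to the dotted form.
    Dotted opIndex(string name)
    {
        return Dotted(value[name]);
    }
}

void main()
{
    auto json = Dotted(parseJSON(`{ "vector": { "x": 42.8 } }`));

    // Both spellings reach the same member.
    assert(fabs(json.vector.x.value.floating - 42.8) < 1e-9);
    assert(fabs(json["vector"]["x"].value.floating - 42.8) < 1e-9);
}
```

Each step returns a copy, so this form only covers reading; assignment through the dotted syntax would need more work than is shown here.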
 As for the integers vs. floats, does the API always treat a number as a  
 float even if it is an integer?  If so, then checking for an integer vs.  
 a float may not be a big deal in many cases.

 Casey

Nope. The current std.json dynamically checks if a number is an integer or real and stores the data accordingly. So you'd have to do the checks.
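
Those checks can be folded into one helper. A minimal sketch, assuming a recent Phobos where the type enum is spelled JSONType.integer / JSONType.uinteger / JSONType.float_ (the JSON_TYPE.INTEGER spelling quoted above is the older one); the name `number` is made up for this example.

```d
import std.json;
import std.math : fabs;

/// Hypothetical helper: returns a JSON number as real, however it was stored.
real number(JSONValue v)
{
    switch (v.type)
    {
        case JSONType.integer:  return v.integer;
        case JSONType.uinteger: return v.uinteger;
        case JSONType.float_:   return v.floating;
        default:
            throw new JSONException("expected a JSON number");
    }
}

void main()
{
    auto json = parseJSON(`{ "vector": { "x": 42.8, "n": 7 } }`);

    // The caller no longer cares whether the parser chose integer or float.
    assert(fabs(json["vector"]["x"].number - 42.8) < 1e-9);
    assert(json["vector"]["n"].number == 7);
}
```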
Sep 14 2010
prev sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 12 Sep 2010 05:05:09 -0400, Brian Schott <brian-schott cox.net>  
wrote:

 Now that a few bugs are fixed in DMD (notably 4826), my improvements to
 std.json compile. My primary purpose in this code change is streamlining
 the process of creating JSON documents. You can now do stuff like this:

 auto json = JSONValue();
 json["numbers"] = [1, 3, 5, 7];
 json["nullValue"] = null;
 json["something"] = false;
 json["vector"] = ["x": 234.231, "y" : 12.8, "z" : 35.0];
 assert("vector" in json);
 json["vector"]["x"] = 42.8;

 More example usage:
 http://www.hackerpilot.org/src/phobos/jsontest.d

 The implementation of the actual JSON data structure and parsing /
 writing is unchanged. Ddoc comments were added so that the documentation
 page for the module won't be quite so empty.

 Implementation:
 http://www.hackerpilot.org/src/phobos/json.d

 I'd like to get this committed back to Phobos if there's a consensus
 that these changes make sense. Comments welcome. (Note: You'll need a
 DMD version built from SVN to use this)

 - Brian

Hi Brian,
This really belongs on the phobos mailing list as JSON isn't ready for public consumption yet (as far as I know). I would suspect that it even has a decent chance of being dropped in favor of serialization + variant. The implementation has several bugs. First, it doesn't parse Unicode escape sequences correctly (e.g. \u0026). Second, JSON has no integer type. Third, the serializer with certain JSON value inputs will write a JSON file that cannot be read by the parser. It's also missing some key features, like output range and human readable output support. The design is very C-ish as opposed to D-ish: it's composed of a bunch of free functions / types all containing JSON in their name (e.g. parseJSON). These should all be encapsulated as member functions.

Getting more to the API itself, the reading of a JSON value is a use case that just isn't considered currently. Consider:

// It's relatively simple to write to a JSON value
json["vector"]["x"] = 42.8;

// But reading it...
real x;
if(json["vector"]["x"].type == JSON_TYPE.INTEGER) {
    x = json["vector"]["x"].integer;
} else if(json["vector"]["x"].type == JSON_TYPE.FLOAT) {
    x = json["vector"]["x"].floating;
} else {
    enforceEx!(JSONException)(false);
}

By contrast, this is the API on my personal JSON library:

json.vector.x = 42.8;
auto x = json.vector.x.number;
Sep 12 2010
next sibling parent Brian Schott <brian-schott cox.net> writes:
I just found the phobos mailing list. Why is it on a completely
different server?

Regarding the bugs, my intent was just to improve the usefulness of the
existing implementation. I was not aware of any plans to drop this
module. (There's no notice to this effect in the documentation the way
there is with std.contracts)

What do you recommend for dealing with JSON files until this is sorted out?

On 09/12/2010 08:45 PM, Robert Jacques wrote:
 Hi Brian,
 This really belongs on the phobos mailing list as JSON isn't ready for
 public consumption yet (as far as I know). I would suspect that it even
 has a decent chance of being dropped in favor of serialization +
 variant. The implementation has several bugs. First, it doesn't parse
 Unicode escape sequences correctly (e.g. \u0026). Second, JSON has no
 integer type. Third, the serializer with certain JSON value inputs will
 write a JSON file that can not be read by the parser. It's also missing
 some key features, like output range and human readable output support.
 The design is very C-ish as opposed to D-ish: its composed of a bunch of
 free functions / types all containing JSON in their name. (i.e.
 parseJSON). These should all be encapsulated as member functions.
 
 Getting more to the API itself, the reading of a JSON value is a use
 case that just isn't considered currently. Consider:
 
 // It's relatively simple to write to a JSON value
 json["vector"]["x"] = 42.8;
 
 // But reading it...
 real x;
 if(json["vector"]["x"].type == JSON_TYPE.INTEGER) {
     x = json["vector"]["x"].integer;
 } else if(json["vector"]["x"].type == JSON_TYPE.FLOAT) {
     x = json["vector"]["x"].floating;
 } else {
     enforceEx!(JSONException)(false);
 }
 
 By contrast, this is the API on my personal JSON library:
 json.vector.x = 42.8;
 auto x = json.vector.x.number;

Sep 12 2010
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 13 Sep 2010 00:18:40 -0400, Brian Schott <brian-schott cox.net>  
wrote:
 I just found the phobos mailing list. Why is it on a completely
 different server?

No clue. I've only just joined the list recently myself.
 Regarding the bugs, my intent was just to improve the usefulness of the
 existing implementation. I was not aware of any plans to drop this
 module. (There's no notice to this effect in the documentation the way
 there is with std.contracts)

There's no plan that I know of regarding std.json. The code was literally taken from a pastebin by Jeremie Pelletier and hasn't been touched (or discussed) since. It was lurking around with its documentation unlinked, like std.perf, but it appears that this is no longer the case (though this wasn't mentioned in the change logs). However, there has been a bunch of talk regarding serialization and so std.json will need to change to accommodate this. And a JSON value is really a specialized version of variant, so if certain bugs etc. were fixed in variant there'll be no need for a JSON value type.
 What do you recommend for dealing with JSON files until this is sorted  
 out?

There are two solutions, as I see it:

1) Move std.json (or an improved version) to user code land (i.e. scrapple) in the short term for people who need it. Long term, fix/improve std.variant and add a "std.serialize" module, probably based on orange.
2) Fix/improve std.json and decide later what to do with it when "std.serialize" arrives.

I've got my own Json library that I'm willing to share, but I need some basic serialization support and I know my compile-time only solution isn't going to be the final solution for phobos, so I've been reluctant to submit it as a replacement.
Sep 12 2010
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Robert Jacques Wrote:
 
 Hi Brian,
 This really belongs on the phobos mailing list as JSON isn't ready for  
 public consumption yet (as far as I know). I would suspect that it even  
 has a decent chance of being dropped in favor of serialization + variant.  
 The implementation has several bugs. First, it doesn't parse Unicode  
 escape sequences correctly (e.g. \u0026). Second, JSON has no integer  
 type. Third, the serializer with certain JSON value inputs will write a  
 JSON file that can not be read by the parser. It's also missing some key  
 features, like output range and human readable output support. The design  
 is very C-ish as opposed to D-ish: its composed of a bunch of free  
 functions / types all containing JSON in their name. (i.e. parseJSON).  
 These should all be encapsulated as member functions.
 
 Getting more to the API itself, the reading of a JSON value is a use case  
 that just isn't considered currently. Consider:
 
 // It's relatively simple to write to a JSON value
 json["vector"]["x"] = 42.8;
 
 // But reading it...
 real x;
 if(json["vector"]["x"].type == JSON_TYPE.INTEGER) {
      x = json["vector"]["x"].integer;
 } else if(json["vector"]["x"].type == JSON_TYPE.FLOAT) {
      x = json["vector"]["x"].floating;
 } else {
      enforceEx!(JSONException)(false);
 }
 
 By contrast, this is the API on my personal JSON library:
 json.vector.x = 42.8;
 auto x = json.vector.x.number;

Could all this sit atop a SAX-style API? I'm not likely to ever use an API that requires memory allocation for parsing or writing data.
Sep 13 2010
parent Sean Kelly <sean invisibleduck.org> writes:
Robert Jacques Wrote:

 On Mon, 13 Sep 2010 14:30:10 -0400, Sean Kelly <sean invisibleduck.org>  
 wrote:
 
 Could all this sit atop a SAX-style API?  I'm not likely to ever use an  
 API that requires memory allocation for parsing or writing data.

Well, writing data could be done using output ranges easily enough, so no extra memory writing troubles there. As for parsing, the biggest cost with JSON is the fact that all strings can include escape chars, so things have to be copied instead of sliced.

What I've always done is to not automatically unescape string data but rather provide a function for the user to do it so they can provide the buffer. Alternately, this behavior could be configurable. Escaping output should definitely be configurable though. In fact, I often don't even want numbers to be automatically converted from their string to real/int representation, since it's common for me to want to operate on the value as a string. So even this I like being given the original representation and calling to!int or whatever on my own.
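
A minimal sketch of that "caller provides the buffer" idea. The name unescapeInto is made up for this example, and it only handles the single-character escapes; \u sequences are deliberately left out and rejected.

```d
/// Hypothetical helper: unescapes `src` into `buf` and returns the used slice.
/// The result is never longer than the input, so `buf` needs src.length chars.
char[] unescapeInto(const(char)[] src, char[] buf)
{
    assert(buf.length >= src.length, "buffer too small");
    size_t n = 0;
    for (size_t i = 0; i < src.length; ++i)
    {
        char c = src[i];
        if (c == '\\' && i + 1 < src.length)
        {
            ++i;
            switch (src[i])
            {
                case 'n': c = '\n'; break;
                case 't': c = '\t'; break;
                case 'r': c = '\r'; break;
                case 'b': c = '\b'; break;
                case 'f': c = '\f'; break;
                case 'u': throw new Exception("\\u escapes are outside this sketch");
                default:  c = src[i]; break; // covers \" \\ and \/
            }
        }
        buf[n++] = c;
    }
    return buf[0 .. n];
}

void main()
{
    char[32] buf;
    // WYSIWYG literal: the backslashes below are real characters in the input.
    auto text = unescapeInto(`a\"b\nc`, buf[]);
    assert(text == "a\"b\nc");
}
```

The write position never gets ahead of the read position, so the same loop would also work with the output buffer being the input itself.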
 However, there's nothing preventing a  
 SAX style implementation in the format itself. Except that JSON has less  
 extra meta-data than XML so SAX becomes less informative. Instead of:
 
 object start vector
 member x
 number 42.8
 object end vector
 
 you have something like
 
 object start
 member vector
 object start
 member x
 number 42.8
 object end
 object end
 
 For myself, the files are under a mb and random access makes everything  
 much faster to program and debug.

Random access is definitely nice, it's more the performance cost of all those allocations that's an issue for me. What I'm basically looking for is a set of events like this:

alias void delegate(char[]) ParseEvent;
ParseEvent onObjectEnter, onObjectKey, onObjectLeave;
ParseEvent onArrayEnter, onArrayLeave;
ParseEvent onStringValue, onFloatValue, onIntValue;
ParseEvent onTrueValue, onFalseValue, onNullValue;

With corresponding write events on the output side so if I hooked the parser to the writer the data would all flow through and generate output identical to the input, formatting notwithstanding (though I'd add the option to write numbers as either a string, real, or int). I like the event parameter being a char[] because it allows me to unescape string data in place, etc.

There are some advanced-mode writer options I'd like as well, like the ability to dump a char[] blob directly into the destination string without translation, saving and restoring writer state, etc.

I don't know if anyone besides myself would find all this useful though. These are just some things I've found necessary for the work I do.
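
To make the shape of such an interface concrete, here is a small sketch that wires up a few of those events and fires them by hand for {"vector": {"x": 42.8}}. There is no real parser in it: Handlers, fireVectorExample and traceVectorExample are all hypothetical names, and only the events this one document needs are declared.

```d
import std.stdio;

alias ParseEvent = void delegate(char[]);

/// Hypothetical bundle of callbacks; a subset of the events listed above.
struct Handlers
{
    ParseEvent onObjectEnter, onObjectKey, onObjectLeave;
    ParseEvent onFloatValue;
}

/// Stand-in for a parser: emits the events for {"vector": {"x": 42.8}}.
void fireVectorExample(Handlers h)
{
    h.onObjectEnter(null);
    h.onObjectKey("vector".dup);
    h.onObjectEnter(null);
    h.onObjectKey("x".dup);
    h.onFloatValue("42.8".dup); // stays text; the handler decides whether to convert
    h.onObjectLeave(null);
    h.onObjectLeave(null);
}

/// Records every event as one line of text.
string[] traceVectorExample()
{
    string[] log;
    Handlers h;
    h.onObjectEnter = (char[] s) { log ~= "object start"; };
    h.onObjectKey   = (char[] s) { log ~= "member " ~ s.idup; };
    h.onFloatValue  = (char[] s) { log ~= "number " ~ s.idup; };
    h.onObjectLeave = (char[] s) { log ~= "object end"; };
    fireVectorExample(h);
    return log;
}

void main()
{
    foreach (line; traceVectorExample())
        writeln(line);
}
```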
Sep 14 2010
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 13 Sep 2010 14:30:10 -0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 Robert Jacques Wrote:
 Hi Brian,
 This really belongs on the phobos mailing list as JSON isn't ready for
 public consumption yet (as far as I know). I would suspect that it even
 has a decent chance of being dropped in favor of serialization +  
 variant.
 The implementation has several bugs. First, it doesn't parse Unicode
 escape sequences correctly (e.g. \u0026). Second, JSON has no integer
 type. Third, the serializer with certain JSON value inputs will write a
 JSON file that can not be read by the parser. It's also missing some key
 features, like output range and human readable output support. The  
 design
 is very C-ish as opposed to D-ish: its composed of a bunch of free
 functions / types all containing JSON in their name. (i.e. parseJSON).
 These should all be encapsulated as member functions.

 Getting more to the API itself, the reading of a JSON value is a use  
 case
 that just isn't considered currently. Consider:

 // It's relatively simple to write to a JSON value
 json["vector"]["x"] = 42.8;

 // But reading it...
 real x;
 if(json["vector"]["x"].type == JSON_TYPE.INTEGER) {
      x = json["vector"]["x"].integer;
 } else if(json["vector"]["x"].type == JSON_TYPE.FLOAT) {
      x = json["vector"]["x"].floating;
 } else {
      enforceEx!(JSONException)(false);
 }

 By contrast, this is the API on my personal JSON library:
 json.vector.x = 42.8;
 auto x = json.vector.x.number;

Could all this sit atop a SAX-style API? I'm not likely to ever use an API that requires memory allocation for parsing or writing data.

Well, writing data could be done using output ranges easily enough, so no extra memory writing troubles there. As for parsing, the biggest cost with JSON is the fact that all strings can include escape chars, so things have to be copied instead of sliced. However, there's nothing preventing a SAX style implementation in the format itself. Except that JSON has less extra meta-data than XML so SAX becomes less informative. Instead of:

object start vector
member x
number 42.8
object end vector

you have something like

object start
member vector
object start
member x
number 42.8
object end
object end

For myself, the files are under a MB and random access makes everything much faster to program and debug.
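
As a rough illustration of the output-range point, the sketch below writes the thread's three-component vector into any character sink without building the whole document as a string first. writeVector is a hypothetical function, the keys are fixed so no escaping is needed, and an Appender stands in for whatever sink a real writer would target.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.range.primitives : isOutputRange;
import std.stdio : writeln;

/// Hypothetical writer: emits {"x":..,"y":..,"z":..} into any char output range.
void writeVector(Sink)(ref Sink sink, double x, double y, double z)
    if (isOutputRange!(Sink, char))
{
    // formattedWrite pushes the formatted characters straight into the sink.
    sink.formattedWrite(`{"x":%s,"y":%s,"z":%s}`, x, y, z);
}

void main()
{
    auto buffer = appender!string();
    writeVector(buffer, 42.8, 12.8, 35.0);
    writeln(buffer.data);
}
```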
Sep 13 2010
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 14 Sep 2010 11:18:14 -0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 Robert Jacques Wrote:

 On Mon, 13 Sep 2010 14:30:10 -0400, Sean Kelly <sean invisibleduck.org>
 wrote:

 Could all this sit atop a SAX-style API?  I'm not likely to ever use an
 API that requires memory allocation for parsing or writing data.

Well, writing data could be done using output ranges easily enough, so no extra memory writing troubles there. As for parsing, the biggest cost with JSON is the fact that all strings can include escape chars, so things have to be copied instead of sliced.

What I've always done is to not automatically unescape string data but rather provide a function for the user to do it so they can provide the buffer. Alternately, this behavior could be configurable. Escaping output should definitely be configurable though. In fact, I often don't even want numbers to be automatically converted from their string to real/int representation, since it's common for me to want to operate on the value as a string. So even this I like being given the original representation and calling to!int or whatever on my own.
 However, there's nothing preventing a
 SAX style implementation in the format itself. Except that JSON has less
 extra meta-data than XML so SAX becomes less informative. Instead of:

 object start vector
 member x
 number 42.8
 object end vector

 you have something like

 object start
 member vector
 object start
 member x
 number 42.8
 object end
 object end

 For myself, the files are under a mb and random access makes everything
 much faster to program and debug.

Random access is definitely nice, it's more the performance cost of all those allocations that's an issue for me. What I'm basically looking for is a set of events like this:

alias void delegate(char[]) ParseEvent;
ParseEvent onObjectEnter, onObjectKey, onObjectLeave;
ParseEvent onArrayEnter, onArrayLeave;
ParseEvent onStringValue, onFloatValue, onIntValue;
ParseEvent onTrueValue, onFalseValue, onNullValue;

With corresponding write events on the output side so if I hooked the parser to the writer the data would all flow through and generate output identical to the input, formatting notwithstanding (though I'd add the option to write numbers as either a string, real, or int). I like the event parameter being a char[] because it allows me to unescape string data in place, etc.

There are some advanced-mode writer options I'd like as well, like the ability to dump a char[] blob directly into the destination string without translation, saving and restoring writer state, etc.

I don't know if anyone besides myself would find all this useful though. These are just some things I've found necessary for the work I do.

This seems pretty straightforward. Could you list the advanced-mode features you'd need?
Sep 14 2010