digitalmars.D - I wrote a JSON library

reply "w0rp" <devw0rp gmail.com> writes:
I wasn't quite satisfied with std.json or the JSON libraries in 
frameworks. The standard library doesn't make it easy enough to 
create JSON objects, and my primary objection for the framework 
solutions is that they seem to depend on other parts of the 
frameworks. (I'd rather not depend on a host of libraries I won't 
be using just to use one I will.) So, desiring an easy-to-use and 
atomic library, I took to writing my own from scratch.

https://github.com/w0rp/dson/blob/master/json.d

I would love to hear some comments on my implementation. 
Criticism is mostly what I am after. It's hard for me to 
self-criticise. Perhaps the most obvious criticism to me is that 
I seem to write too damn many unit tests.
May 07 2013
next sibling parent reply "evilrat" <evilrat666 gmail.com> writes:
On Tuesday, 7 May 2013 at 07:29:16 UTC, w0rp wrote:
 ...
 https://github.com/w0rp/dson/blob/master/json.d
 ...
It looks like you reinvented the wheel. std.json already has everything we need to read and write JSON, and it is really small. After all, I would welcome someone writing std.xml-like facilities for reading and writing that get pulled into std.json.
May 07 2013
next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 7 May 2013 at 07:52:29 UTC, evilrat wrote:
 On Tuesday, 7 May 2013 at 07:29:16 UTC, w0rp wrote:
 ...
 https://github.com/w0rp/dson/blob/master/json.d
 ...
 It looks like you reinvented the wheel. std.json already has everything we need to read and write JSON, and it is really small. After all, I would welcome someone writing std.xml-like facilities for reading and writing that get pulled into std.json.
I was always unhappy with the Phobos JSON lib, mostly because of the API. w0rp, can you provide some usage examples, to show how the lib is intended to be used? Do you have some benchmarks? Sone, same thing ?
May 07 2013
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
I'm not terribly happy with std.json so this is welcome. I do wish that there
were a SAX style parser available too though, so I could parse JSON without
allocating.

On May 7, 2013, at 12:52 AM, "evilrat" <evilrat666 gmail.com> wrote:

 On Tuesday, 7 May 2013 at 07:29:16 UTC, w0rp wrote:
 ...
 https://github.com/w0rp/dson/blob/master/json.d
 ...
 It looks like you reinvented the wheel. std.json already has everything we
 need to read and write JSON, and it is really small.
 After all, I would welcome someone writing std.xml-like facilities for
 reading and writing that get pulled into std.json.
May 07 2013
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
On 07.05.2013 16:53, Sean Kelly wrote:
 I'm not terribly happy with std.json so this is welcome. I do wish that
 there were a SAX style parser available too though, so I could parse
 JSON without allocating.
You may find this useful:

https://github.com/pszturmaj/json-streaming-parser

Don't be scared by the TODOs; they're not very relevant for normal usage. The only thing you shouldn't do is call the whole() methods more than once.
May 07 2013
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Tuesday, 7 May 2013 at 17:13:18 UTC, Piotr Szturmaj wrote:
 You may find this useful: 
 https://github.com/pszturmaj/json-streaming-parser
Thanks for the link. Unfortunately, I couldn't get it to compile out of the box. I did use the test routine you had to benchmark std.json and the JSON implementation from the OP of this thread, as well as an event-based JSON parser I implemented for work, on a single parse of this large (189MB) JSON file:

https://github.com/zeMirco/sf-city-lots-json

Here are my results for one parse, where "newJson" is the OP's JSON parser and "jepJson" is mine:

$ main
n = 1
Milliseconds to call stdJson() n times: 73054
Milliseconds to call newJson() n times: 44022
Milliseconds to call jepJson() n times: 839
newJson() is faster than stdJson() 1.66x times
jepJson() is faster than stdJson() 87.1x times

Now obviously, in many cases convenience is preferable to raw speed, but I think code in Phobos should be an option for both types of uses whenever possible. What I'd really like to see is the variant-type front-end layered on top of an event-based parser, so the user could just use parseJSON as-is to generate a tree of JSON objects or call the event-driven parser directly when performance is desired. I don't think the parser needs to be resumable either, since in most cases JSON is transported in an HTTP message, so a plain old recursive descent parser is fine.
May 07 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-05-07 20:36, Sean Kelly wrote:

 $ main
 n = 1
 Milliseconds to call stdJson() n times: 73054
 Milliseconds to call newJson() n times: 44022
 Milliseconds to call jepJson() n times: 839
 newJson() is faster than stdJson() 1.66x times
 jepJson() is faster than stdJson() 87.1x times
That's quite a big difference.

--
/Jacob Carlborg
May 07 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, May 07, 2013 20:36:19 Sean Kelly wrote:
 Now obviously, in many cases convenience is preferable to raw
 speed, but I think code in Phobos should be an option for both
 types of uses whenever possible. What I'd really like to see is
 the variant-type front-end layered on top of an event-based
 parser so the user could just use parseJSON as-is to generate a
 tree of JSON objects or call the event-driven parser directly
 when performance is desired. I don't think the parser needs to
 be resumable either, since in most cases JSON is transported in
 an HTTP message, so a plain old recursive descent parser is fine.
Yeah. For both JSON and XML, it should be quite possible to implement a low-level API which gives you raw speed and then build more convenient APIs on top of it, thereby giving users the choice. And given how slices work, parsers like this should be able to beat the pants off of most parsers in other languages, especially with the low-level API.

- Jonathan M Davis
May 07 2013
prev sibling parent reply "w0rp" <devw0rp gmail.com> writes:
On Tuesday, 7 May 2013 at 18:36:20 UTC, Sean Kelly wrote:

 $ main
 n = 1
 Milliseconds to call stdJson() n times: 73054
 Milliseconds to call newJson() n times: 44022
 Milliseconds to call jepJson() n times: 839
 newJson() is faster than stdJson() 1.66x times
 jepJson() is faster than stdJson() 87.1x times
This is very interesting. This jepJson library seems to be pretty fast. I imagine this library works very similarly to SAX, so you can save quite a bit simply by not having to allocate.

Before I read this, I went about creating my own benchmark. Here is a .zip containing the source and some nice looking bar charts comparing std.json, vibe.d's json library, and my own against various arrays of objects held in memory as a string:

http://www.mediafire.com/download.php?gabsvk8ta711q4u

For those less interested in downloading and looking at the .ods file, here are the results for the largest input size. (Array of 100,000 small objects)

std.json - 2689375370 ms
vibe.data.json - 2835431576 ms
dson - 3705095251 ms

Where 'dson' is my library. I have done my duty and made my own library look the worst in benchmarks. I think overall these are all linear time algorithms that do very similar things, and the speed difference is very minor. As always with benchmarks, mileage may vary.

Per request for examples of my library, I have produced this little snippet. http://pastebin.com/sU8heFXZ It's hard to enumerate all of the features I put in there at once, but that's a pretty good start. I also listed a few examples in a doc comment at the top of the json.d source.

The idea presented in this thread of building a nice tagged union reader (like std.json, vibe.d, and my own) on top of a recursive event (SAX-like) parser seems pretty attractive to me now. I can envision re-writing my own library to work on top of such a parser.
May 07 2013
next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Tuesday, 7 May 2013 at 20:14:20 UTC, w0rp wrote:
 On Tuesday, 7 May 2013 at 18:36:20 UTC, Sean Kelly wrote:

 $ main
 n = 1
 Milliseconds to call stdJson() n times: 73054
 Milliseconds to call newJson() n times: 44022
 Milliseconds to call jepJson() n times: 839
 newJson() is faster than stdJson() 1.66x times
 jepJson() is faster than stdJson() 87.1x times
 This is very interesting. This jepJson library seems to be pretty fast. I imagine this library works very similar to SAX, so you can save quite a bit on simply not having to allocate.
Yes, the jep parser does no allocation at all; all callbacks simply receive a slice of the value. It does full validation according to the spec, but there's no interpretation of the values beyond that either, so if you want the integer string you were passed converted to an int, for example, you'd do the conversion yourself. The same goes for unescaping of string data, and in practice I often end up unescaping the strings in-place since I typically never need to re-parse the input buffer.

In practice, it's kind of a pain to use the jep parser for arbitrary processing, so I have some functions layered on top of it that iterate across array values and object keys:

    int foreachArrayElem(char[] buf, scope int delegate(char[] value));
    int foreachObjectField(char[] buf, scope int delegate(char[] name, char[] value));

This works basically the same as opApply, so having the delegate return a nonzero value causes parsing to abort and return that value from the foreach routine. The parser is sufficiently fast that I generally just nest calls to these foreach routines to parse complex types, even though this results in multiple passes across the same data.

The only other thing I was careful to do is design the library in such a way that each parser callback could call a corresponding writer routine to simply pass through the input to an output buffer. This makes auto-reformatting a breeze because you just set a "format output" flag on the writer and implement a few one-line functions.
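
The callback contract described above (slices in, nonzero return aborts) can be sketched roughly as follows. This is not the jep parser: `foreachFlatArrayElem` and `trimSpaces` are hypothetical names, and the toy only splits a flat array of unquoted values on commas, with no nesting, strings, or validation.

```d
import std.stdio : writeln;

/// Hypothetical helper: visits each element of a flat JSON array such as
/// "[1, 22, 333]". It does not handle nesting, strings, or validation.
/// Each element is handed to `dg` as a slice of `buf`, so nothing is allocated.
/// A nonzero return from `dg` aborts the walk and is returned to the caller,
/// which is the same contract opApply uses.
int foreachFlatArrayElem(char[] buf, scope int delegate(char[] value) dg)
{
    if (buf.length < 2 || buf[0] != '[' || buf[$ - 1] != ']')
        return -1; // not a bracketed array

    char[] inner = buf[1 .. $ - 1];
    size_t start = 0;
    foreach (i; 0 .. inner.length + 1)
    {
        // An element ends at a comma or at the end of the input.
        if (i == inner.length || inner[i] == ',')
        {
            char[] elem = trimSpaces(inner[start .. i]);
            if (elem.length)
            {
                if (int result = dg(elem))
                    return result; // the callback asked to stop
            }
            start = i + 1;
        }
    }
    return 0;
}

/// Strips leading and trailing ASCII spaces by re-slicing.
char[] trimSpaces(char[] s)
{
    while (s.length && s[0] == ' ') s = s[1 .. $];
    while (s.length && s[$ - 1] == ' ') s = s[0 .. $ - 1];
    return s;
}

void main()
{
    char[] input = "[1, 22, 333]".dup;
    int seen = 0;
    int rc = foreachFlatArrayElem(input, (char[] v) {
        ++seen;
        writeln(v);
        return v == "22" ? 1 : 0; // stop after the second element
    });
    writeln("rc = ", rc, ", seen = ", seen);
}
```

Because the delegate only ever sees slices of the caller's buffer, converting a value to an int or unescaping a string is left to the caller, as described in the post.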
 Before I read this, I went about creating my own benchmark. 
 Here is a .zip containing the source and some nice looking bar 
 charts comparing std.json, vibe.d's json library, and my own 
 against various arrays of objects held in memory as a string:

 http://www.mediafire.com/download.php?gabsvk8ta711q4u

 For those less interested in downloading and looking at the 
 .ods file, here are the results for the largest input size. 
 (Array of 100,000 small objects)

 std.json - 2689375370 ms
 vibe.data.json - 2835431576 ms
 dson - 3705095251 ms
These results don't seem correct. Is this really milliseconds?
May 07 2013
parent "w0rp" <devw0rp gmail.com> writes:
 std.json - 2689375370 ms
 vibe.data.json - 2835431576 ms
 dson - 3705095251 ms
 These results don't seem correct. Is this really milliseconds?
Well, this is embarrassing. I do apologise. I appear to have printed the TickDuration object value itself instead of the milliseconds. I think I spent too much time writing the benchmark and too little looking at the actual results. I ran it again quickly with the error corrected (.msecs) and got much more reasonable looking results on a size of 1,000:

std.json : 7370 ms
vibe.data.json : 6878 ms
json : 9150 ms
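
For anyone writing a similar benchmark, here is a minimal sketch of timing with an explicit unit conversion. It uses `MonoTime` and `Duration` from `core.time` rather than the `TickDuration` mentioned above, which is an assumption about what a current compiler offers; `timeIt` is a hypothetical helper name.

```d
import core.time : Duration, MonoTime;
import std.stdio : writefln;

/// Hypothetical helper: runs `work` `runs` times and returns the total
/// elapsed time as a Duration.
Duration timeIt(scope void delegate() work, size_t runs)
{
    auto start = MonoTime.currTime;
    foreach (_; 0 .. runs)
        work();
    return MonoTime.currTime - start;
}

void main()
{
    size_t total = 0;
    Duration elapsed = timeIt(delegate() {
        foreach (i; 0 .. 1_000)
            total += i;
    }, 100);

    // Convert explicitly to the unit named in the label. Printing a raw
    // tick count next to "ms" is the mistake described in the post.
    writefln("elapsed: %s ms", elapsed.total!"msecs");
    writefln("elapsed: %s usecs", elapsed.total!"usecs");
}
```

The point is only that the number printed and the unit in the label come from the same conversion call.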
May 07 2013
prev sibling next sibling parent reply "w0rp" <devw0rp gmail.com> writes:
I completely missed something out there. Namely, my reasons why I 
just didn't like the existing implementations enough. Overall, 
the other libraries are all very similar, so I don't have major 
complaints, just little ones.

For vibe.d, it's actually pretty close to what I wanted. My big 
objection is that I don't like the 'Undefined' types. I would 
rather experience runtime errors in those cases. I also have to 
pretty much depend on Vibe to use it, rather than just a JSON 
library. Aside from that, it's not far off from what I'm after.

For Libdjson, it uses classes to represent json types. That just 
seems very awkward to use, and that shouts out "unnecessary 
garbage creation" to me.

The standard library (std.json) seems to nail the parsing of 
JSON, but lacks the ability to write a JSON string to an output 
range, and doesn't really offer any conveniences for working with 
the JSON data structure itself. std.json, vibe.d, and my own 
representation of JSON are all very similar. They are tagged 
unions implemented with union {} and an enum. What makes vibe.d 
and my own library nice is all of the operator overloads, 
properties, and convenience functions.
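
A rough sketch of what "a tagged union implemented with union {} and an enum" means in D. `Kind` and `Value` are hypothetical names, not the types from std.json, vibe.d, or dson, and arrays and objects are left out to keep it short. The checked accessors throw on a mismatch, in the spirit of preferring runtime errors over an 'Undefined' state.

```d
import std.stdio : writeln;

/// The tag that records which union member is currently valid.
enum Kind { null_, boolean, integer, floating, text }

/// A deliberately tiny tagged union (hypothetical, for illustration only).
struct Value
{
    private Kind kind_ = Kind.null_;
    private union
    {
        bool b;
        long i;
        double f;
        string s;
    }

    this(bool v)   { kind_ = Kind.boolean;  b = v; }
    this(long v)   { kind_ = Kind.integer;  i = v; }
    this(double v) { kind_ = Kind.floating; f = v; }
    this(string v) { kind_ = Kind.text;     s = v; }

    Kind kind() const { return kind_; }

    /// Checked accessor: asking for the wrong member is a runtime error.
    long integer() const
    {
        if (kind_ != Kind.integer)
            throw new Exception("value is not an integer");
        return i;
    }

    /// Checked accessor for the string member.
    string text() const
    {
        if (kind_ != Kind.text)
            throw new Exception("value is not a string");
        return s;
    }
}

void main()
{
    // The L suffix selects the long constructor unambiguously.
    auto n = Value(42L);
    auto t = Value("hello");
    writeln(n.kind, " ", n.integer);
    writeln(t.kind, " ", t.text);

    try
        writeln(t.integer());
    catch (Exception e)
        writeln("error: ", e.msg);
}
```

Operator overloads and convenience properties of the kind mentioned above would be layered on top of a struct like this one.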

Another issue with std.json is lack of pretty-printing, which 
both vibe.d and my own library address. (Mine has toJSON!4 and 
writeJSON!8 for a string indented by 4 characters and writing to 
an output range indented by 8 characters, respectively.)

So that's essentially my rationale. Overall, writing the library 
was mostly done because I found it to be a rather entertaining 
challenge for myself.
May 07 2013
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Tue, 07 May 2013 23:09:35 +0200
"w0rp" <devw0rp gmail.com> wrote:
 
 So that's essentially my rationale. Overall, writing the library 
 was mostly done because I found it to be a rather entertaining 
 challenge for myself.
Parsing a simple grammar can indeed be very fun! I did that recently, too (not JSON though), partly to try my hand at LL for a change, and had a blast. Designing and implementing a good API can actually be the hard/tedious part (well, and the unittests can be pretty tedious).
May 07 2013
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 7 May 2013 at 20:14:20 UTC, w0rp wrote:
 Per request for examples of my library, I have produced this 
 little snippet. http://pastebin.com/sU8heFXZ It's hard to 
 enumerate all of the features I put in there at once, but 
 that's a pretty good start. I also listed a few examples in a 
 doc comment at the top of the json.d source.
The API looks really nice! I'd love to see something similar in Phobos, API-wise.

But I don't like the shortcut choices. arr => array: you only win 2 chars! That is nothing and certainly not worth the confusion. Same for obj => object. With this kind of practice, everybody comes up with their own set of shortcuts and you have to remember all of them for each library! What seems like a speedup at first ends up being a slowdown.
May 08 2013
parent reply "w0rp" <devw0rp gmail.com> writes:
 The API looks really nice! I'd love to see something similar
 in Phobos, API-wise.

 But I don't like the shortcut choices. arr => array: you only
 win 2 chars! That is nothing and certainly not worth the
 confusion. Same for obj => object. With this kind of practice,
 everybody comes up with their own set of shortcuts and you have
 to remember all of them for each library! What seems like a
 speedup at first ends up being a slowdown.
I think that's a good point. I'll change them immediately and push to github.
May 08 2013
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 8 May 2013 at 21:05:55 UTC, w0rp wrote:
 The API looks really nice! I'd love to see something similar
 in Phobos, API-wise.

 But I don't like the shortcut choices. arr => array: you only
 win 2 chars! That is nothing and certainly not worth the
 confusion. Same for obj => object. With this kind of practice,
 everybody comes up with their own set of shortcuts and you have
 to remember all of them for each library! What seems like a
 speedup at first ends up being a slowdown.
 I think that's a good point. I'll change them immediately and push to github.
Awesome. Another nice thing you can do is to use alias this on a property to allow for implicit conversion to int.

Overall, the API is super nice! If performance doesn't matter, I definitely recommend using the lib.
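
A minimal sketch of the alias this suggestion, using a hypothetical `Num` struct rather than the library's JSON type. It converts to long instead of int, which is an assumption made to match a 64-bit integer payload.

```d
import std.stdio : writeln;

/// Hypothetical wrapper type used only to illustrate `alias this`.
struct Num
{
    private long value_;

    this(long v) { value_ = v; }

    /// The property that `alias this` forwards to.
    @property long asInteger() const { return value_; }

    /// Wherever a Num does not fit, the compiler tries asInteger instead.
    alias asInteger this;
}

void main()
{
    auto n = Num(41);
    long plain = n;   // implicit conversion through alias this
    long sum = n + 1; // arithmetic is also forwarded to the property
    writeln(plain, " ", sum);
}
```

For a value that can hold several types, the property would presumably need to check the tag first and throw on a mismatch, so the implicit conversion can still fail at run time.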
May 08 2013
parent reply "w0rp" <devw0rp gmail.com> writes:
On Thursday, 9 May 2013 at 01:42:41 UTC, deadalnix wrote:
 Awesome. Another nice thing you can do is to use alias this on
 a property to allow for implicit conversion to int.

 Overall, the API is super nice! If performance doesn't matter, I
 definitely recommend using the lib.
I'll have to experiment with the alias this idea. There are still a few things I need to work out. I'm missing an overload for opCmp (plus the host of math operators), and the append behaviour is perhaps strange. I had to choose between ~ meaning a JSON array is added to the LHS, [] ~ [1, 2] == [[1, 2]], or an array is concatenated, like the normal D arrays, [] ~ [1, 2] == [1, 2]. I went with the former for now, but I might have made the wrong choice. It all came about because of this:

    auto arr = jsonArray();
    arr ~= 1; // [1]
    arr ~= "foo"; // [1, "foo"]
    arr ~= jsonArray(); // Currently: [1, "foo", []]

    auto another = jsonArray();
    another ~= 3;
    arr.array ~= another.array; // Always: [1, "foo", [], 3]

I swear that I wrote a concat(JSON, JSON) function for this, but it's not there. That would have accomplished this:

    arr.concat(another)
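
To make the two candidate meanings concrete, here is a small self-contained sketch. `Node`, `leafOf`, `listOf`, `append`, and `concat` are hypothetical names for illustration and are not the library's API; the sketch only models nested lists of integers.

```d
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical nested-list type: a node is either a leaf integer or a list.
struct Node
{
    bool isList;
    long leaf;
    Node[] items;

    static Node list() { Node n; n.isList = true; return n; }
    static Node leafOf(long v) { Node n; n.leaf = v; return n; }

    /// Append semantics: `rhs` becomes ONE new element, even if it is a list.
    void append(Node rhs) { items ~= rhs; }

    /// Concatenation semantics: the elements of `rhs` are spliced in.
    void concat(Node rhs) { items ~= rhs.items; }

    string toString() const
    {
        if (!isList)
            return leaf.to!string;
        string result = "[";
        foreach (idx, ref item; items)
        {
            if (idx)
                result ~= ", ";
            result ~= item.toString();
        }
        return result ~ "]";
    }
}

/// Builds a flat list from the given integers.
Node listOf(long[] values...)
{
    auto n = Node.list();
    foreach (v; values)
        n.append(Node.leafOf(v));
    return n;
}

void main()
{
    auto appended = listOf(1);
    appended.append(listOf(2, 3));
    writeln(appended); // [1, [2, 3]]

    auto joined = listOf(1);
    joined.concat(listOf(2, 3));
    writeln(joined); // [1, 2, 3]
}
```

Keeping the two operations under separate names avoids having ~ mean different things depending on what the right-hand side happens to hold.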
May 09 2013
parent Manu <turkeyman gmail.com> writes:
This entire thread is a really good example of why all new modules should
live in exp. for a year after birth before moving to std...


On 9 May 2013 17:21, w0rp <devw0rp gmail.com> wrote:

 On Thursday, 9 May 2013 at 01:42:41 UTC, deadalnix wrote:

 Awesome. Another nice thing you can do is to use alias this on a
 property to allow for implicit conversion to int.

 Overall, the API is super nice! If performance doesn't matter, I
 definitely recommend using the lib.
 I'll have to experiment with the alias this idea. There are still a few things I need to work out. I'm missing an overload for opCmp (plus the host of math operators), and the append behaviour is perhaps strange. I had to choose between ~ meaning a JSON array is added to the LHS, [] ~ [1, 2] == [[1, 2]], or an array is concatenated, like the normal D arrays, [] ~ [1, 2] == [1, 2]. I went with the former for now, but I might have made the wrong choice. It all came about because of this:

     auto arr = jsonArray();
     arr ~= 1; // [1]
     arr ~= "foo"; // [1, "foo"]
     arr ~= jsonArray(); // Currently: [1, "foo", []]

     auto another = jsonArray();
     another ~= 3;
     arr.array ~= another.array; // Always: [1, "foo", [], 3]

 I swear that I wrote a concat(JSON, JSON) function for this, but it's not there. That would have accomplished this:

     arr.concat(another)
May 09 2013
prev sibling next sibling parent Sönke Ludwig <sludwig outerproduct.org> writes:
On 07.05.2013 09:29, w0rp wrote:
 I wasn't quite satisfied with std.json or the JSON libraries in
 frameworks. The standard library doesn't make it easy enough to create
 JSON objects, and my primary objection for the framework solutions is
 that they seem to depend on other parts of the frameworks. (I'd rather
 not depend on a host of libraries I won't be using just to use one I
 will.)
Just for reference, the vibe.d JSON implementation does not depend on other parts of the library: https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/data/json.d
May 07 2013
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-05-07 09:29, w0rp wrote:

 Perhaps the most obvious criticism to me is that I seem to write too damn many
 unit tests.
I've never heard of anyone complaining about too many unit tests. I don't see it as a problem.

--
/Jacob Carlborg
May 07 2013
prev sibling next sibling parent David <d dav1d.de> writes:
On 07.05.2013 09:29, w0rp wrote:
 I wasn't quite satisfied with std.json or the JSON libraries in
 frameworks. The standard library doesn't make it easy enough to create
 JSON objects, and my primary objection for the framework solutions is
 that they seem to depend on other parts of the frameworks. (I'd rather
 not depend on a host of libraries I won't be using just to use one I
 will.) So, desiring an easy-to-use and atomic library, I took to writing
 my own from scratch.
 
 https://github.com/w0rp/dson/blob/master/json.d
 
 I would love to hear some comments on my implementation. Criticism is
 mostly what I am after. It's hard for me to self-criticise. Perhaps the
 most obvious criticism to me is that I seem to write too damn many unit
 tests.
And there is https://256.makerslocal.org/wiki/index.php/Libdjson
May 07 2013
prev sibling next sibling parent Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
07.05.2013 11:29, w0rp wrote:
 I wasn't quite satisfied with std.json or the JSON libraries in
 frameworks. The standard library doesn't make it easy enough to create
 JSON objects, and my primary objection for the framework solutions is
 that they seem to depend on other parts of the frameworks. (I'd rather
 not depend on a host of libraries I won't be using just to use one I
 will.) So, desiring an easy-to-use and atomic library, I took to writing
 my own from scratch.

 https://github.com/w0rp/dson/blob/master/json.d

 I would love to hear some comments on my implementation. Criticism is
 mostly what I am after. It's hard for me to self-criticise. Perhaps the
 most obvious criticism to me is that I seem to write too damn many unit
 tests.
Good luck, as my personal attempt to improve std.json to at least this:

https://github.com/D-Programming-Language/phobos/pull/1206#issuecomment-14826562

got stuck on even this simple pull:

https://github.com/D-Programming-Language/phobos/pull/1263

--
Денис В. Шеломовский
Denis V. Shelomovskij
May 10 2013
prev sibling parent reply "w0rp" <devw0rp gmail.com> writes:
I have been working on a few improvements to the library. First, 
I made a few performance tweaks. Aside from very small (and 
therefore hard to describe) tweaks, I made two major improvements.

1. Manual parsing of numbers has been implemented.
2. When the input is a string, the indices and the length are 
used instead of the array InputRange functions.

The first one is a dangerous idea, but my unit tests show that at 
least what I have tested works. The reasoning behind it is that 
before, a string buffer (Appender!string) was created for 
numbers, and then one of parse!long or parse!real was chosen 
based upon whether or not the parser figured out it was an 
integer or not. Now it will read the input into actual numbers as 
it goes and then spit out an integer or a floating point number 
after it hits the end. You need to put a helmet on, but there's 
less allocation along the way. Perhaps this idea could be better 
encapsulated at some point with an std.conv function which 
accepts a range and returns a tagged union.
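
A rough sketch of the "read into actual numbers as it goes" idea. This is not the code in json.d: `Number` and `parseNumber` are hypothetical, and the sketch only handles an optional minus sign, digits, and an optional fraction. Exponents, overflow checks, and the JSON rules about leading zeros are deliberately left out, which is exactly where the helmet comes in.

```d
import std.ascii : isDigit;
import std.stdio : writeln;

/// Result of the sketch: either an integer or a floating point value.
struct Number
{
    bool isInteger;
    long integer;
    real floating;
}

/// Hypothetical single-pass parser: digits are folded into numeric
/// accumulators as they are read, so no intermediate string is built.
Number parseNumber(const(char)[] text)
{
    size_t pos = 0;
    bool negative = false;
    if (pos < text.length && text[pos] == '-')
    {
        negative = true;
        ++pos;
    }

    // Integer part.
    long whole = 0;
    while (pos < text.length && isDigit(text[pos]))
    {
        whole = whole * 10 + (text[pos] - '0');
        ++pos;
    }

    // No fraction: the result stays an integer.
    if (pos == text.length || text[pos] != '.')
        return Number(true, negative ? -whole : whole, 0);

    ++pos; // skip '.'

    // Fraction part, kept as an integer numerator and a power-of-ten divisor.
    long fractionDigits = 0;
    real divisor = 1;
    while (pos < text.length && isDigit(text[pos]))
    {
        fractionDigits = fractionDigits * 10 + (text[pos] - '0');
        divisor *= 10;
        ++pos;
    }

    real value = whole + fractionDigits / divisor;
    return Number(false, 0, negative ? -value : value);
}

void main()
{
    auto a = parseNumber("-42");
    writeln(a.isInteger, " ", a.integer);

    auto b = parseNumber("2.5");
    writeln(b.isInteger, " ", b.floating);
}
```

The integer-or-float decision falls out of whether a '.' was seen, so the tag of the resulting value is known by the time the last digit has been consumed.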

The second improvement is actually pretty nice, because I already 
wrapped the input range functions in methods anyway, so it was a 
simple matter of inserting 'static if ... else' to flip the 
string optimisation on. Perhaps it is better in general to wrap 
strings in range structs than to rely on std.array's range 
functions.
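
The 'static if ... else' switch described above might look roughly like this. `Reader` and `countDigits` are hypothetical names, not the ones in json.d. For strings the sketch walks code units by index; for any other input range it falls back to empty/front/popFront.

```d
import std.algorithm : filter;
import std.range.primitives : empty, front, isInputRange, popFront;
import std.stdio : writeln;
import std.traits : isSomeString;

/// Hypothetical wrapper: one interface, two implementations chosen at
/// compile time depending on whether the source is a string.
struct Reader(R) if (isInputRange!R)
{
    private R source;
    static if (isSomeString!R)
        private size_t index; // only strings need an index

    bool atEnd()
    {
        static if (isSomeString!R)
            return index >= source.length;
        else
            return source.empty;
    }

    auto current()
    {
        static if (isSomeString!R)
            return source[index]; // code unit, no decoding
        else
            return source.front;
    }

    void advance()
    {
        static if (isSomeString!R)
            ++index;
        else
            source.popFront();
    }
}

/// Counts ASCII digits using only the wrapper's three methods.
size_t countDigits(R)(R input)
{
    auto reader = Reader!R(input);
    size_t n;
    for (; !reader.atEnd(); reader.advance())
    {
        auto c = reader.current();
        if (c >= '0' && c <= '9')
            ++n;
    }
    return n;
}

void main()
{
    writeln(countDigits("a1b22"));                      // string path
    writeln(countDigits("a1b22".filter!(c => c != 'b'))); // generic range path
}
```

Code written against `atEnd`, `current`, and `advance` stays the same for both paths, which is what makes the optimisation cheap to switch on.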

The end result is that I can now cheat at my own benchmark.

---
Ran for 100 runs

std.json : 674 ms
vibe.data.json : 604 ms
json : 548 ms
---

Which I updated slightly to match some function renaming (plus to 
correct my earlier embarrassing omission of .msecs) here: 
http://pastebin.com/KciFit4b

It's not a complete test of speed, and as always with benchmarks, 
mileage will vary.

In addition to these things, I made a few of the property and 
function names a little nicer, and generally improved on the 
documentation, which currently looks a little like this. 
http://www.mediafire.com/?q5lwtj2cc22s1t0 I apologise for my 
current lack of hosting. (I plan to correct this at a later date, 
perhaps with a website written in D!)
May 14 2013
parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 14 May 2013 at 20:23:42 UTC, w0rp wrote:
 It's not a complete test of speed, and as always with 
 benchmarks, mileage will vary.

 In addition to these things, I made a few of the property and 
 function names a little nicer, and generally improved on the 
 documentation, which currently looks a little like this. 
 http://www.mediafire.com/?q5lwtj2cc22s1t0 I apologise for my 
 current lack of hosting. (I plan to correct this at a later 
 date, perhaps with a website written in D!)
Awesome. I want to try that lib next time I have to do some code involving JSON.
May 14 2013