digitalmars.D - Performance of std.json

"David Soria Parra" <dsp experimentalworks.net> writes:

Hi,

I have recently had to deal with large amounts of JSON data in D. 
While doing that I've found that std.json is remarkably slow in 
comparison to other languages' standard JSON implementations. I've 
created a small and simple benchmark that parses a local copy of a 
GitHub API response 
("https://api.github.com/repos/D-Programming-Language/dmd/pulls") 
100 times and writes the titles to stdout.

My results are as follows:
   ./d-test > /dev/null  3.54s user 0.02s system 99% cpu 3.560 total
   ./hs-test > /dev/null  0.02s user 0.00s system 93% cpu 0.023 total
   python test.py > /dev/null  0.77s user 0.02s system 99% cpu 0.792 total

The concrete implementations (sorry for my terrible Haskell 
implementation) can be found here:

    https://github.com/dsp/D-Json-Tests/

This compares D's std.json against Haskell's Data.Aeson and Python's 
standard library json module. I am a bit concerned about the current 
state of our JSON parser, given that a lot of applications these 
days use JSON. I personally consider a high-speed JSON implementation 
a critical part of a standard library.

Would it make sense to start thinking about using ujson4c as an 
external library, or should we come up with a better implementation? 
I know Orvid has something and might add some analysis as to why 
std.json is slow. Any ideas or pointers on how to start with 
that?
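For readers without the linked repository at hand, here is a minimal sketch of what the D side of such a benchmark could look like with std.json. It is not the code from the repository: the inline `sample` string is a hypothetical stand-in for the saved GitHub response, and the sketch assumes the document is a JSON array of objects that each carry a "title" string. Parsing happens inside the loop, so each of the 100 iterations pays the full parsing cost.

```d
import std.json : parseJSON;
import std.stdio : writeln;

// Hypothetical stand-in for the saved GitHub API response; the benchmark in
// the linked repository reads a local copy of the pulls endpoint instead.
enum sample = `[{"title": "Fix issue 1"}, {"title": "Improve std.json"}]`;

// Parse the whole document and collect each pull request's "title" field.
string[] titles(string json)
{
    string[] result;
    foreach (pr; parseJSON(json).array)
        result ~= pr["title"].str;
    return result;
}

void main()
{
    // Re-parse on every iteration so that the loop measures parsing,
    // not repeated printing of one already-parsed result.
    foreach (i; 0 .. 100)
        foreach (title; titles(sample))
            writeln(title);
}
```

Whichever languages are compared, every version needs to re-parse inside the loop in the same way for the timings to be comparable.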

Jun 01 2014

"Joshua Niehus" <jm.niehus gmail.com> writes:

On Monday, 2 June 2014 at 00:18:19 UTC, David Soria Parra wrote:
 Would it make sense to start thinking about using ujson4c as an 
 external library, or maybe come up with a better 
 implementation. I know Orvid has something and might add some 
 analysis as to why std.json is slow. Any ideas or pointers as 
 to how to start with that?

std.json is underpowered and in need of an overhaul. In the meantime, 
have you tried vibe.d's JSON implementation?

http://vibed.org/api/vibe.data.json/

Jun 01 2014

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

It's my understanding that the current design of std.json is considered
to be poor, but I haven't used it, so I don't know any of the
details. But if it's as slow as you're finding it to be, then I
think that supports the idea that it needs a redesign. The
question then is what a new std.json should look like and who would do
it. And that pretty much comes down to an interested and motivated
developer coming up with and implementing a new design and then
proposing it here. Until someone takes up that torch, we'll be
stuck with what we have. Certainly, there's no fundamental reason why
we can't have a lightning-fast std.json. With ranges and slices,
parsing in D in general should be faster than C/C++ (and definitely
faster than Haskell or Python), and if it isn't, that indicates that
the implementation (if not the whole design) of that code needs to be
redone.

I know that vibe.d uses its own json implementation, but I don't know
how much of that is part of its public API and how much of that is
simply used internally: http://vibed.org

- Jonathan M Davis

Jun 01 2014

"w0rp" <devw0rp gmail.com> writes:

I implemented a JSON library myself which parses JSON and 
generates JSON objects similar to how std.json does. I wrote 
it largely because of the poor API in the standard library at the 
time, but I think by this point nearly all of those concerns have 
been alleviated.

At the time I benchmarked it against std.json and vibe.d's 
implementation, and they were all pretty equivalent in terms of 
performance. I settled for edging just slightly ahead of 
std.json. If there are any major performance gains to make, I 
believe we will have to completely rethink how we go about 
parsing JSON. I suspect transparent character encoding and 
decoding (dchar ranges) might be one potential source of trouble.

In terms of API, I wouldn't go completely for an approach based 
on serialising to structs. Having a tagged union type is still 
helpful for situations where you just want to quickly get at some 
JSON data and do something with it. I have thought a great deal 
about writing data *to* JSON strings however, and I have an idea 
for this I would like to share.

First, you define by convention that there is a function 
writeJSON which takes some value and an OutputRange, and then 
writes the value in a JSON representation directly to an 
OutputRange. You define in the library writeJSON functions for 
standard types.

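import std.json : JSONValue;
import std.range : isInputRange, put;
import std.traits : isSomeString;

void writeJSON(OutputRange)(JSONValue value, OutputRange outRange);
void writeJSON(OutputRange)(string value, OutputRange outRange);
void writeJSON(OutputRange)(int value, OutputRange outRange);
void writeJSON(OutputRange)(bool value, OutputRange outRange);
void writeJSON(OutputRange)(typeof(null) value, OutputRange outRange);
// ...

You define one additional writeJSON function, which takes any 
InputRange with element type T and writes a JSON array of Ts. (So 
string[] will write an array of strings, int[] an array of ints, etc.)

// Strings are input ranges too, so they are excluded here and handled
// by the string overload above.
void writeJSON(InputRange, OutputRange)(InputRange inRange, OutputRange outRange)
    if (isInputRange!InputRange && !isSomeString!InputRange)
{
    put(outRange, '[');
    bool first = true;
    foreach (value; inRange) {
        if (!first) put(outRange, ',');  // comma-separate the elements
        first = false;
        writeJSON(value, outRange);
    }
    put(outRange, ']');
}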

Add a convenience function which takes variadic arguments alternating 
string, T, string, U, ... and call it, say, writeJSONObject.

You now have a decent framework for writing objects directly to 
OutputRanges.

struct Foo {
    AnotherType bar;
    string stringValue;
    int intValue;
}

void writeJSON(OutputRange)(Foo foo, OutputRange outRange) {
    // Writes {"bar":<bar_value>, ... }
    writeJSONObject(outRange,
        // writeJSONObject calls writeJSON for AnotherType, etc.
        "bar", foo.bar,
        "stringValue", foo.stringValue,
        "intValue", foo.intValue
    );
}

There are more details, and something would need to be done about 
handling stack overflows (inlining?), but that's the idea I had for 
improving JSON writing, at least. One advantage of this approach 
would be that it wouldn't depend on the GC, and scoped buffers could 
be used (an @nogc candidate, I think). You can't get this ability 
out of something like toJSON, which produces the whole string at once.
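A self-contained sketch of the writeJSON convention described above, assuming Appender!string as the output range. Only strings, bools, null, integers, input ranges and one example struct are covered, and string escaping handles only quotes, backslashes and newlines. The output range is taken by `ref` here (a choice made for this sketch, not part of the proposal) so that value-type output ranges also accumulate correctly. The `Foo` fields are hypothetical and differ from the ones in the post.

```d
import std.array : appender;
import std.conv : to;
import std.range : isInputRange, put;
import std.stdio : writeln;
import std.traits : isIntegral, isSomeString;

// Strings: only quotes, backslashes and newlines are escaped in this sketch.
void writeJSON(R)(string value, ref R outRange)
{
    put(outRange, '"');
    foreach (char c; value)
    {
        switch (c)
        {
            case '"':  put(outRange, `\"`); break;
            case '\\': put(outRange, `\\`); break;
            case '\n': put(outRange, `\n`); break;
            default:   put(outRange, c); break;
        }
    }
    put(outRange, '"');
}

void writeJSON(R)(bool value, ref R outRange)
{
    put(outRange, value ? "true" : "false");
}

void writeJSON(R)(typeof(null), ref R outRange)
{
    put(outRange, "null");
}

// Integers go through std.conv.to, which allocates a temporary string;
// an @nogc version would have to format the digits into a local buffer.
void writeJSON(T, R)(T value, ref R outRange) if (isIntegral!T)
{
    put(outRange, value.to!string);
}

// Any input range other than a string becomes a JSON array.
void writeJSON(T, R)(T range, ref R outRange)
    if (isInputRange!T && !isSomeString!T)
{
    put(outRange, '[');
    bool first = true;
    foreach (element; range)
    {
        if (!first) put(outRange, ',');
        first = false;
        writeJSON(element, outRange);
    }
    put(outRange, ']');
}

// writeJSONObject(outRange, "key1", value1, "key2", value2, ...)
void writeJSONObject(R, Args...)(ref R outRange, Args args)
    if (Args.length % 2 == 0)
{
    put(outRange, '{');
    foreach (i, arg; args)
    {
        static if (i % 2 == 0)
        {
            static assert(is(typeof(arg) : string), "object keys must be strings");
            static if (i > 0) put(outRange, ',');
            writeJSON(arg, outRange);
            put(outRange, ':');
        }
        else
            writeJSON(arg, outRange);
    }
    put(outRange, '}');
}

// Hypothetical user type: it opts in by providing its own writeJSON overload.
struct Foo
{
    string name;
    int[] numbers;
    bool active;
}

void writeJSON(R)(Foo foo, ref R outRange)
{
    writeJSONObject(outRange, "name", foo.name, "numbers", foo.numbers, "active", foo.active);
}

void main()
{
    auto app = appender!string();
    writeJSON(Foo(`say "hi"`, [1, 2, 3], true), app);
    writeln(app.data); // {"name":"say \"hi\"","numbers":[1,2,3],"active":true}
}
```

Overload selection is driven by template constraints, so adding support for a new type only means adding another writeJSON overload next to it.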

Jun 02 2014

"w0rp" <devw0rp gmail.com> writes:

It's worth noting, "pretty printing" could be configured entirely 
in an OutputRange which watches for certain syntax coming into 
the range and inserts whitespace where it believes to be 
appropriate, so writeJSON functions would not need to know 
anything about pretty printing.
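A minimal sketch of such a pretty-printing output range, assuming the JSON arriving in it is compact (no whitespace between tokens). `PrettyPrinter` and `prettify` are hypothetical names. The only state it tracks is the nesting depth and whether it is currently inside a string literal, so commas and brackets inside strings pass through untouched.

```d
import std.array : Appender, appender;
import std.stdio : writeln;

// Hypothetical output range that re-indents compact JSON while it streams
// through, so the code producing the JSON knows nothing about formatting.
struct PrettyPrinter(Sink)
{
    Sink sink;
    private int depth;
    private bool inString; // currently inside a "..." literal
    private bool escaped;  // previous character was a backslash in a string

    private void newline()
    {
        sink.put('\n');
        foreach (_; 0 .. depth)
            sink.put("  ");
    }

    void put(char c)
    {
        if (inString)
        {
            sink.put(c);
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') inString = false;
            return;
        }
        switch (c)
        {
            case '"': inString = true; sink.put(c); break;
            case '{', '[': sink.put(c); ++depth; newline(); break;
            case '}', ']': --depth; newline(); sink.put(c); break;
            case ',': sink.put(c); newline(); break;
            case ':': sink.put(": "); break;
            default: sink.put(c); break;
        }
    }

    void put(const(char)[] s)
    {
        foreach (c; s)
            put(c);
    }
}

// Feed compact JSON through the printer and return the indented text.
string prettify(const(char)[] compact)
{
    auto printer = PrettyPrinter!(Appender!string)(appender!string());
    printer.put(compact);
    return printer.sink.data;
}

void main()
{
    writeln(prettify(`{"a":[1,2],"b":"x,y"}`));
}
```

Empty containers such as {} come out with a blank indented line between the brackets; handling that would need one character of lookahead.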

Jun 02 2014

Jacob Carlborg <doob me.com> writes:

On 02/06/14 13:36, w0rp wrote:

 In terms of API, I wouldn't go completely for an approach based on
 serialising to structs. Having a tagged union type is still helpful for
 situations where you just want to quickly get at some JSON data and do
 something with it. I have thought a great deal about writing data *to*
 JSON strings however, and I have an idea for this I would like to share.

I think there should be quite a minimal API, then a proper serialization 
module can be built on top of that.

-- 
/Jacob Carlborg

Jun 02 2014

"Sean Kelly" <sean invisibleduck.org> writes:

The vibe.d parser is better, but it still creates a DOM-style
tree of objects, which isn't acceptable in some circumstances.  I
posted a performance comparison of the JSON parser I created for
work use with std.json a while back, and mine is almost 100x
faster than std.json in a simple test and allocates zero memory
to boot:

http://forum.dlang.org/thread/cyzcirslzcgnyxbyzycc forum.dlang.org#post-gxgeizjsurulklzftfqz:40forum.dlang.org

I haven't tried it vs. the vibe.d parser, but I suspect it will
still beat it by an order of magnitude or more because it doesn't
allocate.

I've said this a bunch of times, but what I want to see is a
SAX-style parser as the bottom layer with an optional DOM-style
parser built on top of it.  Then people who want the tree
generated can get it, and people who want performance or don't
want allocations can get that too.  I'm starting to wonder if I
should just try and get permission from work to open source my
parser so I can submit it.  Parsing JSON really isn't terribly
difficult though.  It shouldn't take more than a few days for one
of the more parser-oriented people here to produce something
comparable.
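To make the layering concrete, here is a minimal sketch of a SAX-style bottom layer in D: a recursive-descent parser that reports events through user-supplied callbacks and hands out slices of the input instead of copies. `Handler`, `SaxParser` and `fire` are hypothetical names, escapes are not decoded, and number validation is deliberately loose.

```d
import std.ascii : isAlphaNum, isWhite;
import std.stdio : writeln;

// Hypothetical event interface: the parser pushes events to these callbacks
// and never builds a tree. Any callback may be left null.
struct Handler
{
    void delegate() onObjectStart, onObjectEnd, onArrayStart, onArrayEnd;
    void delegate(const(char)[]) onKey, onString, onNumber, onLiteral;
}

// Invoke a callback only if one was registered.
void fire(D, Args...)(D dg, Args args) { if (dg !is null) dg(args); }

// Minimal recursive-descent parser. Keys, strings and numbers reach the
// callbacks as slices of the input (escapes are not decoded), so parsing
// itself does not allocate unless it throws on malformed input.
struct SaxParser
{
    const(char)[] s;
    Handler h;

    void parse() { parseValue(); skipWs(); if (s.length) throw new Exception("trailing data"); }
    private void skipWs() { while (s.length && isWhite(s[0])) s = s[1 .. $]; }
    private bool consume(char c)
    {
        skipWs();
        if (!s.length || s[0] != c) return false;
        s = s[1 .. $];
        return true;
    }
    private void expect(char c) { if (!consume(c)) throw new Exception("unexpected input"); }

    private const(char)[] parseString()
    {
        expect('"');
        size_t i;
        while (i < s.length && s[i] != '"') i += s[i] == '\\' ? 2 : 1;
        if (i >= s.length) throw new Exception("unterminated string");
        auto text = s[0 .. i];
        s = s[i + 1 .. $];
        return text;
    }

    // true/false/null or a number (number syntax is only loosely checked).
    private void parseScalar()
    {
        size_t i;
        while (i < s.length && (isAlphaNum(s[i]) || s[i] == '-' || s[i] == '+' || s[i] == '.')) ++i;
        auto tok = s[0 .. i];
        s = s[i .. $];
        if (tok == "true" || tok == "false" || tok == "null") fire(h.onLiteral, tok);
        else if (i > 0 && (tok[0] == '-' || (tok[0] >= '0' && tok[0] <= '9'))) fire(h.onNumber, tok);
        else throw new Exception("invalid value");
    }

    private void parseValue()
    {
        skipWs();
        if (!s.length) throw new Exception("unexpected end of input");
        if (s[0] == '{') parseObject();
        else if (s[0] == '[') parseArray();
        else if (s[0] == '"') fire(h.onString, parseString());
        else parseScalar();
    }

    private void parseObject()
    {
        expect('{');
        fire(h.onObjectStart);
        if (!consume('}'))
        {
            do { fire(h.onKey, parseString()); expect(':'); parseValue(); } while (consume(','));
            expect('}');
        }
        fire(h.onObjectEnd);
    }

    private void parseArray()
    {
        expect('[');
        fire(h.onArrayStart);
        if (!consume(']'))
        {
            do { parseValue(); } while (consume(','));
            expect(']');
        }
        fire(h.onArrayEnd);
    }
}

void main()
{
    // Collect every "title" value without building a DOM.
    const(char)[] lastKey;
    const(char)[][] titles;
    Handler h;
    h.onKey = (const(char)[] k) { lastKey = k; };
    h.onString = (const(char)[] v) { if (lastKey == "title") titles ~= v; };
    SaxParser(`[{"title": "A", "n": 1}, {"title": "B", "tags": ["x"]}]`, h).parse();
    writeln(titles); // ["A", "B"]
}
```

Under this design a DOM builder would be just another set of callbacks that allocates nodes, while code that only needs a few fields never pays for the tree.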

Jun 02 2014

Jacob Carlborg <doob me.com> writes:

On 02/06/14 21:13, Sean Kelly wrote:
 The vibe.d parser is better, but it still creates a DOM-style
 tree of objects, which isn't acceptable in some circumstances.  I
 posted a performance comparison of the JSON parser I created for
 work use with std.json a while back, and mine is almost 100x
 faster than std.json in a simple test and allocates zero memory
 to boot:

 http://forum.dlang.org/thread/cyzcirslzcgnyxbyzycc forum.dlang.org#post-gxgeizjsurulklzftfqz:40forum.dlang.org


 I haven't tried it vs. the vibe.d parser, but I suspect it will
 still beat it by an order of magnitude or more because of the not
 allocating thing.

 I've said this a bunch of times, but what I want to see is a
 SAX-style parser as the bottom layer with an optional DOM-style
 parser built on top of it.  Then people who want the tree
 generated can get it, and people who want performance or don't
 want allocations can get that too.  I'm starting to wonder if I
 should just try and get permission from work to open source my
 parser so I can submit it.  Parsing JSON really isn't terribly
 difficult though.  It shouldn't take more than a few days for one
 of the more parser-oriented people here to produce something
 comparable.

Yes, exactly.

-- 
/Jacob Carlborg

Jun 02 2014

Jacob Carlborg <doob me.com> writes:

On 02/06/14 21:13, Sean Kelly wrote:

 I'm starting to wonder if I
 should just try and get permission from work to open source my
 parser so I can submit it.

That would be awesome. Is it written in D or was it C++?

-- 
/Jacob Carlborg

Jun 02 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Tuesday, 3 June 2014 at 06:39:04 UTC, Jacob Carlborg wrote:
 On 02/06/14 21:13, Sean Kelly wrote:

 I'm starting to wonder if I
 should just try and get permission from work to open source my
 parser so I can submit it.

 That would be awesome. Is it written in D or was it C++ ?

It's written in C, and so would need an overhaul regardless.  The 
user basically assigns a bunch of function pointers for the 
callbacks.  Using the parser at this level is really kind of 
difficult because you have to create a state machine for parsing 
anything reasonably complex, so what I usually do is nest calls 
to foreachObjectField and foreachArrayElem.  I'm wondering if we 
can't do something similar here, but with corresponding 
ForwardRanges instead of the opApply style.

Jun 03 2014

Johannes Pfau <nospam example.com> writes:

Am Mon, 02 Jun 2014 19:13:07 +0000
schrieb "Sean Kelly" <sean invisibleduck.org>:

 I've said this a bunch of times, but what I want to see is a
 SAX-style parser as the bottom layer with an optional DOM-style
 parser built on top of it. 

I'd probably prefer a tokenizer/lexer as the lowest layer, then SAX and
DOM implemented using the tokenizer. This way we can provide a kind of
input range. I actually used Brian Schott's std.lexer proposal to build a
simple JSON tokenizer/lexer and it worked quite well. But I don't
think std.lexer is zero-allocation yet, so that's an important drawback.
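A minimal sketch of the tokenizer-as-input-range idea, written by hand rather than with the std.lexer proposal, whose API is not reproduced here. `JSONTokens`, `Token` and `TokenKind` are hypothetical names. Each token's text is a slice of the input, so iteration itself does not allocate. Escapes are left undecoded, and whitespace handling is more lenient than the JSON grammar (std.ascii.isWhite also accepts vertical tab and form feed).

```d
import std.ascii : isAlphaNum, isWhite;
import std.stdio : writeln;

enum TokenKind { objectStart, objectEnd, arrayStart, arrayEnd, colon, comma, string_, number, literal }

struct Token
{
    TokenKind kind;
    const(char)[] text; // slice of the input; strings exclude their quotes
}

// Hypothetical pull tokenizer exposed as an input range. Every token is a
// slice of the input, so iterating does not allocate (except when throwing);
// escapes stay undecoded and number syntax is only loosely checked.
struct JSONTokens
{
    private const(char)[] input;
    private Token current;
    private bool done;

    this(const(char)[] input)
    {
        this.input = input;
        popFront(); // lex the first token so that front is valid
    }

    @property bool empty() const { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        while (input.length && isWhite(input[0]))
            input = input[1 .. $];
        if (!input.length) { done = true; return; }

        switch (input[0])
        {
            case '{': take(TokenKind.objectStart, 1); break;
            case '}': take(TokenKind.objectEnd, 1); break;
            case '[': take(TokenKind.arrayStart, 1); break;
            case ']': take(TokenKind.arrayEnd, 1); break;
            case ':': take(TokenKind.colon, 1); break;
            case ',': take(TokenKind.comma, 1); break;
            case '"': lexString(); break;
            default: lexScalar(); break;
        }
    }

    private void take(TokenKind kind, size_t len)
    {
        current = Token(kind, input[0 .. len]);
        input = input[len .. $];
    }

    private void lexString()
    {
        size_t i = 1;
        while (i < input.length && input[i] != '"')
            i += input[i] == '\\' ? 2 : 1; // skip the escaped character
        if (i >= input.length) throw new Exception("unterminated string");
        current = Token(TokenKind.string_, input[1 .. i]);
        input = input[i + 1 .. $];
    }

    private void lexScalar()
    {
        size_t i;
        while (i < input.length && (isAlphaNum(input[i]) || input[i] == '-' || input[i] == '+' || input[i] == '.'))
            ++i;
        auto word = input[0 .. i];
        if (word == "true" || word == "false" || word == "null")
            take(TokenKind.literal, i);
        else if (i > 0 && (word[0] == '-' || (word[0] >= '0' && word[0] <= '9')))
            take(TokenKind.number, i);
        else
            throw new Exception("unexpected input");
    }
}

void main()
{
    foreach (tok; JSONTokens(`{"title": "Fix \"x\"", "n": [1, -2.5e3, true]}`))
        writeln(tok.kind, "\t", tok.text);
}
```

Because it is an ordinary input range, a SAX or DOM layer (or plain std.algorithm code) can consume it lazily, pulling one token at a time.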

Jun 03 2014

Jacob Carlborg <doob me.com> writes:

On 03/06/14 09:15, Johannes Pfau wrote:

 I'd probably prefer a tokenizer/lexer as the lowest layer, then SAX and
 DOM implemented using the tokenizer. This way we can provide a kind of
 input range. I actually used Brian Schotts std.lexer proposal to build a
 simple JSON tokenizer/lexer and it worked quite well. But I don't
 think std.lexer is zero-allocation yet so that's an important drawback.

If I recall correctly it will allocate strings instead of slicing the 
input. The strings are then reused. If the input is sliced the whole 
input is retained in memory even if not all of the input is used.
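A small illustration of that trade-off, assuming the GC's usual behaviour of keeping an entire allocation alive while any slice into it is still reachable. `findValueSlice` is a hypothetical helper that only handles the exact pattern `"key":"value"`, with no escapes and no whitespace around the colon.

```d
import std.array : replicate;
import std.string : indexOf;

// Hypothetical helper: return the string value of the first `"key":"..."`
// pair as a slice of the input. No allocation and no escape handling.
const(char)[] findValueSlice(const(char)[] json, string key)
{
    auto start = json.indexOf(`"` ~ key ~ `":"`);
    if (start < 0) return null;
    start += key.length + 4; // skip the opening quote, the key, and `":"`
    auto len = json[start .. $].indexOf('"');
    return len < 0 ? null : json[start .. start + len];
}

void main()
{
    // A large document of which we only want one small field.
    string input = `{"title":"Fix parser","body":"` ~ "x".replicate(1_000_000) ~ `"}`;

    auto slice = findValueSlice(input, "title");
    // The slice points into `input`, so while `slice` is reachable the GC
    // must keep the whole ~1 MB buffer alive as well.
    assert(slice.ptr >= input.ptr && slice.ptr < input.ptr + input.length);

    // .idup copies just the bytes we need; once `input` and `slice` are
    // unreachable, the large buffer can be collected.
    string title = slice.idup;
    assert(title == "Fix parser" && title.ptr !is slice.ptr);
}
```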

-- 
/Jacob Carlborg

Jun 03 2014

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Mon, 02 Jun 2014 19:13:07 +0000
Sean Kelly via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I've said this a bunch of times, but what I want to see is a
 SAX-style parser as the bottom layer with an optional DOM-style
 parser built on top of it.  Then people who want the tree
 generated can get it, and people who want performance or don't
 want allocations can get that too.  I'm starting to wonder if I
 should just try and get permission from work to open source my
 parser so I can submit it.  Parsing JSON really isn't terribly
 difficult though.  It shouldn't take more than a few days for one
 of the more parser-oriented people here to produce something
 comparable.

Agreed, though it might make sense to have something even lower level than a
SAX parser. Certainly, for XML, I'd implement something that just gave you a
range of the attributes without any consideration for what you might do with
them, whereas it's my understanding that a SAX parser uses callbacks which are
triggered when it finds what you're looking for. A SAX parser and DOM
parser could then be built on top of the simple range API. I'd be looking to
do something similar with JSON were I implementing a JSON parser, though since
JSON is a bit different from XML in structure, I'm not quite sure what the
lowest-level API that would still be useful would be. I'd have to think
about it. But in principle, I agree with what you're suggesting.

- Jonathan M Davis

Jun 03 2014

Sönke Ludwig <sludwig rejectedsoftware.com> writes:

Am 02.06.2014 21:13, schrieb Sean Kelly:
 The vibe.d parser is better, but it still creates a DOM-style
 tree of objects, which isn't acceptable in some circumstances.  I
 posted a performance comparison of the JSON parser I created for
 work use with std.json a while back, and mine is almost 100x
 faster than std.json in a simple test and allocates zero memory
 to boot:

 http://forum.dlang.org/thread/cyzcirslzcgnyxbyzycc forum.dlang.org#post-gxgeizjsurulklzftfqz:40forum.dlang.org


 I haven't tried it vs. the vibe.d parser, but I suspect it will
 still beat it by an order of magnitude or more because of the not
 allocating thing.

 I've said this a bunch of times, but what I want to see is a
 SAX-style parser as the bottom layer with an optional DOM-style
 parser built on top of it.  Then people who want the tree
 generated can get it, and people who want performance or don't
 want allocations can get that too.  I'm starting to wonder if I
 should just try and get permission from work to open source my
 parser so I can submit it.  Parsing JSON really isn't terribly
 difficult though.  It shouldn't take more than a few days for one
 of the more parser-oriented people here to produce something
 comparable.

For some time now, the vibe.d parser has been able to serialize directly 
from and to string form, bypassing the DOM step and avoiding unnecessary 
allocations. But I agree that an intermediate SAX layer would be nice to 
have. Maybe even an additional StAX layer.

Jun 03 2014

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

On Monday, 2 June 2014 at 00:39:48 UTC, Jonathan M Davis via 
Digitalmars-d wrote:
 I know that vibe.d uses its own json implementation, but I 
 don't know
 how much of that is part of its public API and how much of that 
 is
 simply used internally: http://vibed.org

In general, I've been pretty happy with vibe.d, and I've heard 
that the parser speed of its JSON implementation is good. But I 
must admit that I found the API to be fairly obtuse. In order to 
do much of anything, you really need to serialize/deserialize 
from structs. The JSON objects themselves are pretty much impossible 
to modify.

I haven't looked at how vibe's parser works, but any very-fast 
parser would probably need to support an input stream, so that it 
can build out data in parallel to I/O, and do a lot of manual 
memory management. E.g. you probably want a stack of reusable 
node buffers that you use to add elements to as you scan the JSON 
tree, then clone off purpose-sized nodes from the work buffers 
when you encounter the end of the definition. Whereas, the 
current implementation in std.json only accepts a complete string 
and for each node starts with no memory and has to 
allocate/reallocate for every fresh piece of information.

Having worked with JSON libraries quite a bit, the key to a good 
one is the ability to refer to paths through the data. So besides 
the JSON objects themselves, you need something like a "struct 
JPath" that represents an array of strings and size_ts, which you 
can pass into get, set, has, and count methods. I'd view the lack 
of that as the larger issue with the current JSON implementations.
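A minimal sketch of such a path type on top of std.json, assuming a current Phobos where `JSONValue.type` returns a `JSONType` (older releases used a differently named enum). `PathElem`, `field`, `at`, `resolve` and `has` are hypothetical names. Both get and set fall out of `resolve` returning a pointer into the document, and a count method could be built the same way.

```d
import std.json : JSONType, JSONValue, parseJSON;

// Hypothetical path element: an object key or an array index.
struct PathElem
{
    string key;
    size_t index;
    bool isIndex;
}

PathElem field(string key) { return PathElem(key, 0, false); }
PathElem at(size_t index) { return PathElem(null, index, true); }

// Follow `path` from `root`. Returns a pointer to the target value, or null
// if a key is missing, an index is out of range, or a type does not match.
JSONValue* resolve(ref JSONValue root, const(PathElem)[] path)
{
    JSONValue* cur = &root;
    foreach (p; path)
    {
        if (p.isIndex)
        {
            if (cur.type != JSONType.array || p.index >= cur.array.length)
                return null;
            cur = &cur.array[p.index];
        }
        else
        {
            if (cur.type != JSONType.object)
                return null;
            cur = p.key in cur.object;
            if (cur is null)
                return null;
        }
    }
    return cur;
}

bool has(ref JSONValue root, const(PathElem)[] path)
{
    return resolve(root, path) !is null;
}

void main()
{
    import std.stdio : writeln;

    auto doc = parseJSON(`{"pulls": [{"title": "A"}, {"title": "B", "user": {"login": "dsp"}}]}`);

    // get
    writeln(resolve(doc, [field("pulls"), at(1), field("user"), field("login")]).str);
    // has
    writeln(has(doc, [field("pulls"), at(5)]));
    // set (the path is known to exist here, so the pointer is non-null)
    *resolve(doc, [field("pulls"), at(0), field("title")]) = JSONValue("changed");
    writeln(doc["pulls"][0]["title"].str);
}
```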

Jun 02 2014

"David Soria Parra" <dsp experimentalworks.net> writes:

I think the main question is, given that std.json is close to being 
unusable for anything serious due to its poor performance, can 
we come up with something faster that has the same API? I am not 
sure what Phobos' take on backwards compatibility is, but I'd 
rather keep the API than break it for whoever is using 
std.json.

Jun 02 2014

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

On Monday, 2 June 2014 at 20:10:52 UTC, David Soria Parra wrote:
 I think the main question is, given that std.json is close to 
 be unusable for anything serious due to it's poor performance, 
 can we come up with something faster that has the same API. I 
 am not sure what phobos take on backwards compatibility is, but 
 I'd rather keep the API than breaking it for whoever is using 
 std.json.

std.json really only has two methods, parseJSON and toJSON. Any 
implementation is going to have those two methods, so in terms of 
not breaking anything, you're pretty safe there.

Since it doesn't have any methods except those two, it really 
comes down to the underlying data structure. Right now, you have 
to read the source and understand the structure in order to 
operate on it, which is a hassle, but is presumably what people 
are doing. So maintaining the current structure would be the key 
necessity. I think that limits the optimizations which could be 
performed, but doesn't make them impossible.

Adding a stream-based parsing method would probably be the main 
optimization. That adds to the API, but is backwards compatible.

The module has a lot of inner functions and recursion. Reducing the 
number of function calls, using manual stack management instead 
of recursion, etc. might give another significant gain. How 
parseJSON() works is irrelevant to the caller, so all of that can 
be optimized to your heart's content.
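As a sketch of what manual stack management instead of recursion can look like, here is a single pass over a JSON text that uses an explicit fixed-size stack to check that brackets nest correctly and to report the maximum depth. `maxDepth` is a hypothetical helper, not part of std.json, and it checks only structure, not full JSON syntax. The same explicit-stack pattern could, in principle, be applied inside a parser to avoid deep call chains.

```d
import std.stdio : writeln;

// Hypothetical example of replacing recursion with an explicit stack: walk a
// JSON text once, check that brackets are balanced and properly nested, and
// return the maximum nesting depth (or -1 on malformed input). The stack is a
// fixed-size local buffer, so deeply nested input cannot overflow the call
// stack and no GC allocation is needed.
int maxDepth(const(char)[] json)
{
    char[256] stack;
    size_t top;
    int deepest;
    bool inString, escaped;

    foreach (c; json)
    {
        if (inString) // brackets inside string literals are ignored
        {
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') inString = false;
            continue;
        }
        switch (c)
        {
            case '"':
                inString = true;
                break;
            case '{', '[':
                if (top == stack.length) return -1; // deeper than we support
                stack[top++] = c;
                if (cast(int) top > deepest) deepest = cast(int) top;
                break;
            case '}', ']':
                if (top == 0 || stack[top - 1] != (c == '}' ? '{' : '['))
                    return -1; // closing bracket does not match the open one
                --top;
                break;
            default:
                break;
        }
    }
    return (top == 0 && !inString) ? deepest : -1;
}

void main()
{
    writeln(maxDepth(`{"a": [1, {"b": []}]}`)); // 4
    writeln(maxDepth(`{"a": "]"}`));            // 1
    writeln(maxDepth(`[1, 2`));                 // -1
}
```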

Jun 02 2014

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

I don't know the status of other D-based JSON libraries.
If you can install the yajl library, then yajl-d is another 
candidate.

% time ./yajl_test > /dev/null
./yajl_test > /dev/null  0.42s user 0.01s system 99% cpu 0.434 total
% time python test.py > /dev/null
python test.py > /dev/null  0.65s user 0.02s system 99% cpu 0.671 total
% time ./test > /dev/null
./test > /dev/null  3.10s user 0.02s system 99% cpu 3.125 total

import yajl.yajl, std.datetime, std.file, std.stdio;

void parse() {
     foreach(elem; readText("test.json").decode.array) {
         writeln(elem.object["title"]);
     }
}
int main(string[] args) {
     for(uint i = 0; i < 100; i++) {
         parse();
     }
     return 0;
}

http://code.dlang.org/packages/yajl

NOTE: Unlike Sean's implementation, yajl-d doesn't expose yajl's 
SAX-style API.

Jun 03 2014

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

BTW, an acquaintance of mine points out that your Haskell code differs 
from the other samples.
Your Haskell code parses the JSON array only once, which is why it is 
so fast.
He uploaded a version with the same behaviour as the other samples, 
which parses the JSON array on each iteration. Please check it.

https://gist.github.com/maoe/e5f72c3cf3687610fe5c

Results in my environment:

% time ./new_test > /dev/null
./new_test > /dev/null  1.13s user 0.02s system 99% cpu 1.144 total
% time ./test > /dev/null
./test > /dev/null  0.02s user 0.00s system 91% cpu 0.023 total

Jun 03 2014
