
digitalmars.D.learn - Save JSONValue binary in file?

reply "Chopin" <robert.bue gmail.com> writes:
Hello!

I got this 109 MB JSON file that I read... and it takes over 32
seconds for parseJSON() to finish it. So I was wondering if there
was a way to save it as binary or something like that so I can
read it super fast?

Thanks for all suggestions :)
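
[Editor's note: one common answer to this question is to parse the JSON once and cache the extracted data in a flat binary file that later runs can load in a single read. The sketch below assumes the document boils down to an array of fixed-size records; the `Record` struct, its fields, and the file paths are all made-up examples, not anything from this thread.]

// Hedged sketch: binary cache for data extracted from a large JSON file.
import std.file  : exists, readText;
import std.json  : parseJSON;
import std.stdio : File;

struct Record            // hypothetical fixed-size record
{
    long   id;
    double value;
}

Record[] loadRecords(string jsonPath, string cachePath)
{
    if (cachePath.exists)
    {
        // Fast path: one bulk read, no JSON parsing at all.
        auto f = File(cachePath, "rb");
        auto recs = new Record[cast(size_t) (f.size / Record.sizeof)];
        f.rawRead(recs);
        return recs;
    }

    // Slow path: parse once, then write the cache for next time.
    auto doc = parseJSON(readText(jsonPath));
    Record[] recs;
    foreach (jv; doc.array)
        recs ~= Record(jv.object["id"].integer, jv.object["value"].floating);

    auto f = File(cachePath, "wb");
    f.rawWrite(recs);
    return recs;
}

[This only works because Record contains no indirections (no strings or nested arrays); anything holding pointers needs a real serialization library.]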
Oct 12 2012
next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Chopin wrote:
 Hello!

 I got this 109 MB JSON file that I read... and it takes over 32
 seconds for parseJSON() to finish it. So I was wondering if there
 was a way to save it as binary or something like that so I can
 read it super fast?

 Thanks for all suggestions :)

Try this implementation: https://github.com/pszturmaj/json-streaming-parser - you can parse everything into memory or do streaming-style parsing.
Oct 12 2012
parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Chopin wrote:
 Thanks! I tried using it:

 auto document = parseJSON(content).array; // this works with std.json :)

 Using json.d from the link:

 auto j = JSONReader!string(content);
 auto document = j.value.whole.array; // this doesn't.... "Error:
 undefined identifier 'array'"

If you're sure that content is an array:

auto j = JSONReader!string(content);
auto jv = j.value.whole;
assert(jv.type == JSONType.array);
auto jsonArray = jv.as!(JSONValue[]);

Alternatively, you can replace the last line with:

alias JSONValue[] JSONArray;
auto jsonArray = jv.as!JSONArray;
Oct 12 2012
prev sibling next sibling parent "Chopin" <robert.bue gmail.com> writes:
Thanks! I tried using it:

auto document = parseJSON(content).array; // this works with 
std.json :)

Using json.d from the link:

auto j = JSONReader!string(content);
auto document = j.value.whole.array; // this doesn't.... "Error: 
undefined identifier 'array'"
Oct 12 2012
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Oct 12, 2012, at 9:40 AM, Chopin <robert.bue gmail.com> wrote:
 I got this 109 MB JSON file that I read... and it takes over 32
 seconds for parseJSON() to finish it. So I was wondering if there
 was a way to save it as binary or something like that so I can
 read it super fast?

The performance problem is because std.json works like a DOM parser for XML--it allocates a node per value in the JSON stream. What we really need is something that works more like a SAX parser with the DOM version as an optional layer built on top.

Just for kicks, I grabbed the fourth (largest) JSON blob from here:

http://www.json.org/example.html

then wrapped it in array tags and duplicated the object until I had a ~350 MB input file, i.e. [ paste, paste, paste, ... ]

Then I parsed it via this test app, based on an example in a SAX-style JSON parser I wrote in C:

import core.stdc.stdlib;
import core.sys.posix.unistd;
import core.sys.posix.sys.stat;
import core.sys.posix.fcntl;
import std.json;

void main()
{
    auto filename = "input.txt\0".dup;
    stat_t st;
    stat(filename.ptr, &st);
    auto sz = st.st_size;
    auto buf = cast(char*) malloc(sz);
    auto fh = open(filename.ptr, O_RDONLY);
    read(fh, buf, sz);
    auto json = parseJSON(buf[0 .. sz]);
}

Here are my results:

$ dmd -release -inline -O dtest
$ ll input.txt
-rw-r--r--  1 sean  staff  365105313 Oct 12 15:50 input.txt
$ time dtest

real    1m36.462s
user    1m32.468s
sys     0m1.102s

Then I ran my SAX style parser example on the same input file:

$ make example
cc example.c -o example lib/release/myparser.a
$ time example

real    0m2.191s
user    0m1.944s
sys     0m0.241s

So clearly the problem isn't parsing JSON in general but rather generating an object tree for a large input stream. Note that the D app used gigabytes of memory to process this file--I believe the total VM footprint was around 3.5 GB--while my app used a fixed amount roughly equal to the size of the input file. In short, DOM style parsers are great for small data and terrible for large data.
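
[Editor's note: the SAX-style design described above can be sketched in D as an event/callback interface that the parser drives while scanning, so no per-value nodes are ever allocated. The interface below is hypothetical, for illustration only; it is not the API of the C parser mentioned in this post.]

// Hypothetical event-handler interface for a SAX-style JSON parser.
interface JsonHandler
{
    void objectStart();
    void objectEnd();
    void arrayStart();
    void arrayEnd();
    void key(const(char)[] name);
    void str(const(char)[] value);   // slices into the input buffer
    void number(double value);
    void boolean(bool value);
    void nul();
}

// A handler that merely counts values: memory use stays fixed
// no matter how large the input document is.
class CountingHandler : JsonHandler
{
    size_t count;
    void objectStart() {}
    void objectEnd() {}
    void arrayStart() {}
    void arrayEnd() {}
    void key(const(char)[] name) {}
    void str(const(char)[] value) { ++count; }
    void number(double value)     { ++count; }
    void boolean(bool value)      { ++count; }
    void nul()                    { ++count; }
}

[A handler like this keeps memory flat regardless of input size, which is exactly the difference the timings above show; building a tree becomes an optional handler layered on top.]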
Oct 12 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-10-13 01:26, Sean Kelly wrote:

 Here are my results:


 $ dmd -release -inline -O dtest
 $ ll input.txt
 -rw-r--r--  1 sean  staff  365105313 Oct 12 15:50 input.txt
 $ time dtest

 real  1m36.462s
 user 1m32.468s
 sys   0m1.102s


 Then I ran my SAX style parser example on the same input file:


 $ make example
 cc example.c -o example lib/release/myparser.a
 $ time example

 real  0m2.191s
 user 0m1.944s
 sys   0m0.241s


 So clearly the problem isn't parsing JSON in general but rather generating an
object tree for a large input stream.  Note that the D app used gigabytes of
memory to process this file--I believe the total VM footprint was around 3.5
GB--while my app used a fixed amount roughly equal to the size of the input
file.  In short, DOM style parsers are great for small data and terrible for
large data.

I tried the JSON parser in Tango, using D2. These are the results I got for a file just below 360 MB:

real    1m2.848s
user    0m58.321s
sys     0m1.423s

Since the XML parser in Tango is so fast I expected more from the JSON parser as well. But I have no idea what kind of parser the JSON parser uses.

-- 
/Jacob Carlborg
Oct 13 2012