digitalmars.D.learn - Getting underlying struct for parseJSON

Alexey H <ahaidamaka gmail.com> writes:
Hello, guys!

I'm working on a project that involves parsing huge JSON 
datasets in real time.
An example of what I'm dealing with is here:

https://gist.githubusercontent.com/gdmka/125014058bb7d7f01b867fac56300a61/raw/f0c6b5be5fb01b16dd83f07c577b72f76f72c855/data.json

I can't think of any tools other than D or Go to solve this problem.

My attempt to solve the problem with Go led me to Stack Overflow, 
where the community seemed too reluctant to help, so I assumed 
that the language cannot handle such a set of operations on 
complex data structures.

My experience with D worked like a charm. Where I had ~100 lines 
of Go code, I did the same with 12 in D. Nevertheless, I did some 
profiling (unfortunately on OS X) and saw much heavier CPU usage 
with D than with Go, probably because the Go solution was 
unpacking all the data strictly into a struct.

So, my real question is: can I, by any chance, get the 
description of the underlying struct that the call to parseJSON 
generates?

Go users have https://mholt.github.io/json-to-go/ to generate 
structs from JSON automatically.

Since D easily parses JSON by type inference, I assume it builds 
a JSONValue struct which holds all the fields and the data.

If that is possible, then I can build a similar JSON-to-D tool 
just for the sake of saving people's time and patience.
Feb 28
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
 [...]
If you really care about performance, have a look at this:
http://forum.dlang.org/post/20151014090114.60780ad6 marco-toshiba

std.json is not tuned for performance, so don't expect good results from it.
Feb 28
Alexey H <ahaidamaka gmail.com> writes:
On Tuesday, 28 February 2017 at 20:48:33 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
 [...]
 If you really care about performance, have a look at this:
 http://forum.dlang.org/post/20151014090114.60780ad6 marco-toshiba
 std.json is not tuned for performance, so don't expect good 
 results from it.
I am not expecting good results from the standard library's JSON module. For now I just need a concise way to take 1.2-1.5 MB of JSON and dump it into a struct to perform numeric computations.

Since it's not the only data source I will be parsing, I need a straightforward way to generate D structs right out of a predefined JSON schema. So I am willing to sacrifice some speed for convenience at this point.

Fastjson might be good when dealing with trusted input, but that is not my case.
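
For reference, the "dump it into a struct" step can be done by hand with std.json alone. The following is only a minimal sketch, assuming a recent compiler (it relies on the `integer`, `str` and `array` accessors of `JSONValue`). The `EventBlock` struct, its field names, the `toEventBlock` helper and the sample values are hypothetical and only illustrate the shape of the code:

```d
import std.algorithm : map, sum;
import std.array : array;
import std.json;
import std.stdio : writeln;

// Hypothetical target struct; field names are illustrative only.
struct EventBlock
{
    long eventId;
    string state;
    long[] factors;
}

// Copy one JSON object into the struct, field by field.
// The accessors throw a JSONException if a field has an unexpected type.
EventBlock toEventBlock(JSONValue v)
{
    EventBlock b;
    b.eventId = v["eventId"].integer;
    b.state   = v["state"].str;
    b.factors = v["factors"].array.map!(f => f.integer).array;
    return b;
}

void main()
{
    // Made-up sample input, not taken from the real dataset.
    auto json = parseJSON(`{"eventBlocks": [
        {"eventId": 10, "state": "unblocked", "factors": [921, 922]},
        {"eventId": 11, "state": "blocked",   "factors": [923]}
    ]}`);

    auto blocks = json["eventBlocks"].array.map!toEventBlock.array;

    // Numeric work now runs on plain D types instead of JSONValue.
    writeln(blocks.map!(b => b.factors.sum).sum);
}
```

Writing one such function per struct is tedious, which is why a generator for the struct definitions (and ideally for the loading code) would save time.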
Feb 28
Seb <seb wilzba.ch> writes:
On Tuesday, 28 February 2017 at 20:48:33 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
 [...]
 If you really care about performance, have a look at this:
 http://forum.dlang.org/post/20151014090114.60780ad6 marco-toshiba
 std.json is not tuned for performance, so don't expect good 
 results from it.
It's a bit OT, but asdf is even faster and has a simple API:
https://github.com/tamediadigital/asdf

In terms of performance:
 Reading JSON line separated values and parsing them to ASDF - 
 300+ MB per second (SSD).
 Writing ASDF range to JSON line separated values - 300+ MB per 
 second (SSD).
Another good library is std.data.json (JSON parsing extracted from vibe.d):
https://github.com/s-ludwig/std_data_json
Feb 28
Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
 So, my real question is: can I, by any chance, get the 
 description of the underlying struct that the call to parseJSON 
 generates?
It doesn't actually generate one; it just returns a tagged union (a kind of dynamic type).

But my json inspector program (never finished, btw) has something that can kinda generate structs from JSON:
https://github.com/adamdruppe/inspector

I ran your data.json through my program, and it spat this out (well, not directly; I did a few minor tweaks by hand since it doesn't output 100% valid D, but it is a good start):

struct Json_t {
    struct sports_t {
        long regionId;
        string name;
        long id;
        string sortOrder;
        long parentId;
        string kind;
    }
    sports_t[] sports;
    long siteVersion;
    struct eventBlocks_t {
        long[] factors;
        long eventId;
        string state;
    }
    eventBlocks_t[] eventBlocks;
    struct customFactors_t {
        long lo;
        long e;
        bool isLive;
        double v;
        string pt;
        long f;
        long p;
        long hi;
    }
    customFactors_t[] customFactors;
    struct announcements_t {
        long segmentId;
        string place;
        bool liveHalf;
        long regionId;
        string name;
        long[] tv;
        string segmentName;
        string segmentSortOrder;
        long id;
        long num;
        string namePrefix;
        string team1;
        long startTime;
        long sportId;
        string team2;
    }
    announcements_t[] announcements;
    struct events_t {
        long level;
        string sortOrder;
        string place;
        string name;
        long rootKind;
        long parentId;
        long id;
        long num;
        string namePrefix;
        string team1;
        long kind;
        struct state_t {
            bool inHotList;
            bool willBeLive;
            bool liveHalf;
        }
        state_t state;
        long startTime;
        long sportId;
        long priority;
        string team2;
    }
    events_t[] events;
    long packetVersion;
    struct eventMiscs_t {
        long liveDelay;
        long timerUpdateTimestamp;
        long[] tv;
        string comment;
        long score2;
        long servingTeam;
        long id;
        long timerSeconds;
        long timerDirection;
        long score1;
    }
    eventMiscs_t[] eventMiscs;
    long fromVersion;
    long factorsVersion;
}

And I *think* my jsvar.d has magic methods to load up one of those... but since my jsvar.d just uses std.json and then builds up junk on top of it, it will necessarily be even slower than what you already have :S so meh.
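
To make the "tagged union" point concrete, here is a rough sketch of how a struct description could be derived by walking the tags of a JSONValue. It is not how inspector works (I am not reproducing its code here); it assumes a recent compiler with the `JSONType` enum, assumes arrays are homogeneous, and the `typeName` and `structDecl` helpers, the `_t` suffix and the sample input are all made up for illustration:

```d
import std.json;
import std.stdio : writeln;

// Map a JSON value to a D type name by looking at its tag.
// Nested objects are given a hypothetical "<key>_t" struct name.
string typeName(JSONValue v, string key)
{
    switch (v.type)
    {
        case JSONType.string:   return "string";
        case JSONType.integer:  return "long";
        case JSONType.uinteger: return "ulong";
        case JSONType.float_:   return "double";
        case JSONType.true_:
        case JSONType.false_:   return "bool";
        case JSONType.object:   return key ~ "_t";
        case JSONType.array:
        {
            // Assumption: arrays are homogeneous, so the first element decides.
            auto items = v.array;
            return (items.length ? typeName(items[0], key) : "JSONValue") ~ "[]";
        }
        default:                return "JSONValue"; // null carries no type info
    }
}

// Emit a struct declaration for a JSON object, recursing into nested objects.
string structDecl(JSONValue obj, string name, string indent = "")
{
    auto result = indent ~ "struct " ~ name ~ " {\n";
    foreach (key, value; obj.object)
    {
        // Look through arrays to find an object that needs its own struct.
        auto inner = value;
        while (inner.type == JSONType.array && inner.array.length)
            inner = inner.array[0];
        if (inner.type == JSONType.object)
            result ~= structDecl(inner, key ~ "_t", indent ~ "    ");
        result ~= indent ~ "    " ~ typeName(value, key) ~ " " ~ key ~ ";\n";
    }
    return result ~ indent ~ "}\n";
}

void main()
{
    // Made-up sample input, not the real data.json.
    auto json = parseJSON(`{"siteVersion": 7,
                            "sports": [{"id": 1, "name": "Football"}],
                            "state": {"isLive": true, "v": 1.5}}`);
    writeln(structDecl(json, "Json_t"));
}
```

Fields come out in associative-array iteration order, which is not the order in the JSON text, and optional or mixed-type fields would still need tweaking by hand.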
Feb 28
Alexey H <ahaidamaka gmail.com> writes:
On Tuesday, 28 February 2017 at 21:21:30 UTC, Adam D. Ruppe wrote:
 On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
 [...]
 It doesn't actually generate one, it just returns a tagged 
 union (a kind of dynamic type). [...]
Superb, Adam, thank you! I need to check out inspector.

std.json will be used solely to generate proper structs. I expect to do all the heavy lifting via http://code.dlang.org/packages/jsonserialized and, since it uses vibe.d's JSON implementation, I expect it to be faster.
Mar 01