digitalmars.D - Lazily parse a JSON text file using stdx.data.json?

David Gileadi <gileadisNOSPM gmail.com> writes:
I'm a longtime fan of dlang, but haven't had a chance to do much 
in-depth dlang programming, and especially not range programming. Today 
I thought I'd use stdx.data.json to read from a text file. Since it's a 
somewhat large file, I thought I'd create a text range from the file and 
parse it that way. stdx.data.json has a great interface for lazily 
parsing text into JSON values, so all I had to do was turn a text file 
into a lazy range of UTF-8 chars that stdx.data.json's lexer could use. 
(In my best Clarkson voice:) How hard could it be?

Several hours later, I've finally given up and am just reading the whole 
file into a string. There may be a magic incantation I could use to make 
it work, but I can't find it, and frankly I can't see why I should need 
an incantation in the first place. It really ought to just be a method 
of std.stdio.File.

Apparently some of the complexity is caused by autodecoding (e.g. joiner 
returns a range of dchar from char ranges), and some of the fault may be 
in stdx.data.json, but either way I'm surprised that I couldn't do it. 
This is the kind of thing I expected to be ground level stuff.
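
To illustrate the autodecoding point about joiner, here is a minimal sketch using only Phobos. The function names joinDecoded and joinCodeUnits are made up for this illustration; they are not from the post or from stdx.data.json, and nothing here claims to be what stdx.data.json's lexer requires.

```d
import std.algorithm.iteration : joiner, map;
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical helper: joins an array of strings directly.
// Each inner string is auto-decoded, so the joined range yields dchar.
auto joinDecoded(string[] parts)
{
    return parts.joiner;
}

// Hypothetical helper: wraps each inner string in byCodeUnit first,
// so the joined range yields UTF-8 code units (char) instead.
auto joinCodeUnits(string[] parts)
{
    return parts.map!(s => s.byCodeUnit).joiner;
}

// The element types differ even though the input is the same.
static assert(is(ElementType!(typeof(joinDecoded(null))) == dchar));
static assert(is(Unqual!(ElementType!(typeof(joinCodeUnits(null)))) == char));
```

For a non-ASCII input such as ["\u00e9"], the first range would produce one dchar while the second produces the two UTF-8 code units that encode it.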
Dec 16 2017
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, December 16, 2017 21:34:22 David Gileadi via Digitalmars-d 
wrote:
 [...]
I don't know what problems specifically you were hitting, but a lot of range-based stuff (especially parsing) requires forward ranges so that there can be some amount of lookahead (having just a basic input range can be incredibly restrictive), and forward ranges and lazily reading from a file don't tend to go together very well, because it tends to require allocating buffers that then have to be copied on save. It gets to be rather difficult to do it efficiently.

std.stdio.File does support lazily reading in a file, which works well with foreach, but if you're trying to process the entire file as a range, it's usually just way easier to read in the entire file at once and operate on it as a dynamic array. The option halfway in between is to use std.mmfile so that the file gets treated as a dynamic array but the OS is reading it in piecemeal for you.

If I were seriously looking at reading in a file lazily as a forward range, I'd look at http://code.dlang.org/packages/iopipe, though as I understand it, it's very much a work in progress.

As for auto-decoding, yeah, it sucks. You can work around it with stuff like std.utf.byCodeUnit, but auto-decoding is a problem all around, and it's one that we're likely stuck with, because unfortunately, we haven't found a way to remove it without breaking everything.

- Jonathan M Davis
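
A rough sketch of two of the options mentioned above, using only Phobos. The names lazyCodeUnits and mappedText are hypothetical helpers invented for this illustration. The first yields only an input range (no save, so no lookahead), and whether a given parser accepts it is not something this sketch can promise.

```d
import std.algorithm.iteration : joiner, map;
import std.mmfile : MmFile;
import std.stdio : File;

// Hypothetical helper: lazily yields the file's bytes as UTF-8 code
// units, reading one chunk at a time. The result is an input range
// only, so it cannot be saved for lookahead.
auto lazyCodeUnits(string path, size_t chunkSize = 4096)
{
    return File(path)              // opened in "rb" mode by default
        .byChunk(chunkSize)        // range of ubyte[] chunks (buffer is reused)
        .joiner                    // flattened to a range of ubyte
        .map!(b => cast(char) b);  // reinterpreted as code units, no decoding
}

// Hypothetical helper: views a memory-mapped file as a character slice.
// The slice is only valid while the MmFile object is alive, and no
// UTF-8 validation is performed here.
const(char)[] mappedText(MmFile mmf)
{
    return cast(const(char)[]) mmf[];
}
```

With the memory-mapped variant, the caller would create the mapping with new MmFile(path) and keep that object around for as long as the slice is in use; the slice then behaves like an ordinary dynamic array, so it can be passed to code that needs a forward or random-access range.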
Dec 17 2017
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 12/17/17 4:44 AM, Jonathan M Davis wrote:

 If I were seriously looking at
 reading in a file lazily as a forward range, I'd look at
 http://code.dlang.org/packages/iopipe, though as I understand it, it's very
 much a work in progress.
There is an even more work-in-progress library built on that, but it's not yet in dub (this was the library I wrote for my DConf talk this year):

https://github.com/schveiguy/jsoniopipe

This kind of demonstrates how to parse JSON data lazily with pretty high performance. It really depends on what you are trying to do, though.
 As for auto-decoding, yeah, it sucks. You can work around it with stuff like
 std.utf.byCodeUnit, but auto-decoding is a problem all around, and it's one
 that we're likely stuck with, because unfortunately, we haven't found a way
 to remove it without breaking everything.
I think there eventually will have to be a day of reckoning for auto-decoding. But it probably will take a monumental effort to show how it can be done without being too painful for existing code. I still believe it can be done.

-Steve
Dec 17 2017
WebFreak001 <d.forum webfreak.org> writes:
On Sunday, 17 December 2017 at 04:34:22 UTC, David Gileadi wrote:
 I'm a longtime fan of dlang, but haven't had a chance to do 
 much in-depth dlang programming, and especially not range 
 programming. Today I thought I'd use stdx.data.json to read 
 from a text file. Since it's a somewhat large file, I thought 
 I'd create a text range from the file and parse it that way. 
 stdx.data.json has a great interface for lazily parsing text 
 into JSON values, so all I had to do was turn a text file into 
 a lazy range of UTF-8 chars that stdx.data.json's lexer could 
 use. (In my best Clarkson voice:) How hard could it be?

 [...]
Uh, I don't know about stdx.data.json, but if you haven't managed to succeed yet, I know that asdf[1] works really well with streaming JSON. There is also an example of how it works.

[1]: http://asdf.dub.pm
Dec 17 2017
David Gileadi <gileadisNOSPM gmail.com> writes:
On 12/17/17 3:28 AM, WebFreak001 wrote:
 On Sunday, 17 December 2017 at 04:34:22 UTC, David Gileadi wrote:
 uh I don't know about stdx.data.json but if you didn't manage to succeed 
 yet, I know that asdf[1] works really well with streaming json. There is 
 also an example how it works.
 
 [1]: http://asdf.dub.pm
Thanks, reading the whole file into memory worked fine. However, asdf looks really cool. I'll definitely look into it next time I need to deal with JSON.
Dec 17 2017
Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 17 Dec 2017 10:21:33 -0700
David Gileadi <gileadisNOSPM gmail.com> wrote:

 On 12/17/17 3:28 AM, WebFreak001 wrote:
 On Sunday, 17 December 2017 at 04:34:22 UTC, David Gileadi wrote:
 uh I don't know about stdx.data.json but if you didn't manage to succeed 
 yet, I know that asdf[1] works really well with streaming json. There is 
 also an example how it works.
 
 [1]: http://asdf.dub.pm  
 Thanks, reading the whole file into memory worked fine. However, asdf looks really cool. I'll definitely look into it next time I need to deal with JSON.
There is also the JSON parser from https://github.com/mleise/fast if you need to parse 2x faster than RapidJSON ;)

-- 
Marco
Dec 30 2017
David Gileadi <gileadisNOSPM gmail.com> writes:
On 12/30/17 8:16 PM, Marco Leise wrote:
 There is also the JSON parser from
 https://github.com/mleise/fast
 if you need to parse 2x faster than RapidJSON ;)
Nice, I'll take a look.

My original post was mainly to express how surprised I was that one of D's front-page features was, for me, impossible to get working in this context. I posted in hopes that more experienced folks might consider making fixes to help smooth future attempts by others.

I realize that compile-time ranges are not runtime interfaces like many languages provide for iteration, but right now ranges seem too hard to get right when it feels like they should just work.
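
As a small illustration of the distinction between compile-time ranges and runtime interfaces, here is a sketch using Phobos only. The function names countElems and countElemsDynamic are hypothetical; the templates and interfaces they use (isInputRange, InputRange, inputRangeObject) are from std.range.

```d
import std.range.interfaces : InputRange, inputRangeObject;
import std.range.primitives;

// Compile-time flavour: a template that accepts any type satisfying
// isInputRange. A type mismatch shows up as a template constraint
// failure at the call site.
size_t countElems(R)(R r) if (isInputRange!R)
{
    size_t n;
    for (; !r.empty; r.popFront())
        ++n;
    return n;
}

// Runtime flavour: InputRange!int is an ordinary class interface,
// closer to the iterator interfaces found in many other languages.
size_t countElemsDynamic(InputRange!int r)
{
    size_t n;
    for (; !r.empty; r.popFront())
        ++n;
    return n;
}
```

A call such as countElems([1, 2, 3]) goes through the template, while countElemsDynamic(inputRangeObject([1, 2, 3])) first wraps the array in a class object that implements the interface.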
Jan 01