
digitalmars.D.learn - What is the best way to use requests and iopipe on gzipped JSON file

reply Andrew Edwards <edwards.ac gmail.com> writes:
A bit of advice, please. I'm trying to parse a gzipped JSON file 
retrieved from the internet. The following naive implementation 
accomplishes the task:

	auto url = 
"http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
	getContent(url)
		.data
		.unzip
		.runEncoded!((input) {
			ubyte[] content;
			foreach (line; input.byLineRange!true) {
				content ~= cast(ubyte[])line;
			}
			auto json = (cast(string)content).parseJSON;
			foreach (size_t ndx, record; json) {
				if (ndx == 0) continue;
				auto title = json[ndx]["title"].str;
				auto author = json[ndx]["writer"].str;
				writefln("title: %s", title);
				writefln("author: %s\n", author);
			}
		});

However, I'm sure there is a much better way to accomplish this. 
Is there any way to accomplish something akin to:

	auto url = 
"http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
	getContent(url)
		.data
		.unzip
		.runEncoded!((input) {
			foreach (record; input.data.parseJSON[1 .. $]) {
				// use or update record as desired
			}
		});

Thanks,
Andrew
Oct 13
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 2:47 PM, Andrew Edwards wrote:
 A bit of advice, please. I'm trying to parse a gzipped JSON file 
 retrieved from the internet. The following naive implementation 
 accomplishes the task:
 
      auto url = 
 "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
      getContent(url)
          .data
          .unzip
          .runEncoded!((input) {
              ubyte[] content;
              foreach (line; input.byLineRange!true) {
                  content ~= cast(ubyte[])line;
              }
              auto json = (cast(string)content).parseJSON;
input is an iopipe of char, wchar, or dchar. There is no need to cast it 
around. Also, there is no need to split it by line, json doesn't care.

Note also that getContent returns a complete body, but unzip may not be 
so forgiving. But there definitely isn't a reason to create your own 
buffer here.

this should work (something like this really should be in iopipe):

	while(input.extend(0) != 0) {} // get data until EOF

And then:

	auto json = input.window.parseJSON;
              foreach (size_t ndx, record; json) {
                  if (ndx == 0) continue;
                  auto title = json[ndx]["title"].str;
                  auto author = json[ndx]["writer"].str;
                  writefln("title: %s", title);
                  writefln("author: %s\n", author);
              }
          });
 
 However, I'm sure there is a much better way to accomplish this. Is 
 there any way to accomplish something akin to:
 
      auto url = 
 "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
      getContent(url)
          .data
          .unzip
          .runEncoded!((input) {
              foreach (record; input.data.parseJSON[1 .. $]) {
                  // use or update record as desired
              }
          });
Eventually, something like this will be possible with jsoniopipe (I need 
to update and release this too, it's probably broken with some of the 
changes I just put into iopipe). Hopefully combined with some sort of 
networking library you could process a JSON stream without reading the 
whole thing into memory.

Right now, it works just like std.json.parseJSON: it parses an entire 
JSON message into a DOM form.

-Steve
Oct 13
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 3:17 PM, Steven Schveighoffer wrote:

 this should work (something like this really should be in iopipe):
 
 while(input.extend(0) != 0) {} // get data until EOF
This should work today, actually. Didn't think about it before.

	input.ensureElems(size_t.max);

-Steve
Oct 13
parent reply Andrew Edwards <edwards.ac gmail.com> writes:
On Friday, 13 October 2017 at 20:17:50 UTC, Steven Schveighoffer 
wrote:
 On 10/13/17 3:17 PM, Steven Schveighoffer wrote:

 this should work (something like this really should be in 
 iopipe):
 
 while(input.extend(0) != 0) {} // get data until EOF
This should work today, actually. Didn't think about it before.

	input.ensureElems(size_t.max);

-Steve
No, it errored out:

	std.json.JSONException std/json.d(1400): Unexpected end of data. (Line 1:8192)
Oct 13
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 4:30 PM, Andrew Edwards wrote:
 On Friday, 13 October 2017 at 20:17:50 UTC, Steven Schveighoffer wrote:
 On 10/13/17 3:17 PM, Steven Schveighoffer wrote:

 this should work (something like this really should be in iopipe):

 while(input.extend(0) != 0) {} // get data until EOF
This should work today, actually. Didn't think about it before.

	input.ensureElems(size_t.max);
No, it errored out:

	std.json.JSONException std/json.d(1400): Unexpected end of data. (Line 1:8192)
I reproduced, and it comes down to some sort of bug when size_t.max is 
passed to ensureElems. I will find and eradicate it.

-Steve
Oct 13
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 6:07 PM, Steven Schveighoffer wrote:
 On 10/13/17 4:30 PM, Andrew Edwards wrote:
 On Friday, 13 October 2017 at 20:17:50 UTC, Steven Schveighoffer wrote:
 On 10/13/17 3:17 PM, Steven Schveighoffer wrote:

 this should work (something like this really should be in iopipe):

 while(input.extend(0) != 0) {} // get data until EOF
This should work today, actually. Didn't think about it before.

	input.ensureElems(size_t.max);
No, it errored out:

	std.json.JSONException std/json.d(1400): Unexpected end of data. (Line 1:8192)
I reproduced, and it comes down to some sort of bug when size_t.max is passed to ensureElems. I will find and eradicate it.
I think I know, the buffered input source is attempting to allocate a 
size_t.max size buffer to hold the expected new data, and cannot do so 
(obviously). I need to figure out how to handle this properly. I 
shouldn't be prematurely extending the buffer to read all that data.

The while loop does work, I may change ensureElems(size_t.max) to do 
this. But I'm concerned about accidentally allocating huge buffers. For 
example ensureElems(1_000_000_000) works, but probably allocates a GB of 
space in order to "work"!

-Steve
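[Editor's sketch] The fix direction described above can be illustrated in miniature. This is not iopipe's actual implementation: ensureElemsLazy and MockPipe are hypothetical names, and a real iopipe source has a richer interface. The idea is to treat an oversized element request as "extend in pipe-sized steps until EOF or the goal is reached", rather than allocating the requested size up front:

```d
import std.algorithm.comparison : min;

// Hypothetical helper: grow lazily instead of allocating `elems` up front.
size_t ensureElemsLazy(Pipe)(ref Pipe p, size_t elems)
{
    while (p.window.length < elems)
    {
        if (p.extend(0) == 0)
            break; // EOF before reaching the goal
    }
    return p.window.length;
}

// Toy stand-in for a buffered iopipe source: releases its backing
// data in 4-byte chunks and exposes everything read so far as window.
struct MockPipe
{
    const(ubyte)[] source;
    size_t loaded;

    size_t extend(size_t elems)
    {
        auto n = min(4, source.length - loaded);
        loaded += n;
        return n;
    }

    const(ubyte)[] window() { return source[0 .. loaded]; }
}

void main()
{
    auto p = MockPipe([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
    // Requesting size_t.max elements just reads to EOF; no gigantic
    // up-front allocation is needed.
    assert(ensureElemsLazy(p, size_t.max) == 10);
    assert(p.window.length == 10);
}
```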
Oct 13
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 6:18 PM, Steven Schveighoffer wrote:
 On 10/13/17 6:07 PM, Steven Schveighoffer wrote:
 I reproduced, and it comes down to some sort of bug when size_t.max is 
 passed to ensureElems.

 I will find and eradicate it.
I think I know, the buffered input source is attempting to allocate a 
size_t.max size buffer to hold the expected new data, and cannot do so 
(obviously). I need to figure out how to handle this properly. I 
shouldn't be prematurely extending the buffer to read all that data.

The while loop does work, I may change ensureElems(size_t.max) to do 
this. But I'm concerned about accidentally allocating huge buffers. For 
example ensureElems(1_000_000_000) works, but probably allocates a GB of 
space in order to "work"!
This is now fixed.

https://github.com/schveiguy/iopipe/pull/12

-Steve
Oct 28
prev sibling next sibling parent reply Andrew Edwards <edwards.ac gmail.com> writes:
On Friday, 13 October 2017 at 19:17:54 UTC, Steven Schveighoffer 
wrote:
 On 10/13/17 2:47 PM, Andrew Edwards wrote:
 A bit of advice, please. I'm trying to parse a gzipped JSON 
 file retrieved from the internet. The following naive 
 implementation accomplishes the task:
 
      auto url = 
 "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
      getContent(url)
          .data
          .unzip
          .runEncoded!((input) {
              ubyte[] content;
              foreach (line; input.byLineRange!true) {
                  content ~= cast(ubyte[])line;
              }
              auto json = (cast(string)content).parseJSON;
input is an iopipe of char, wchar, or dchar. There is no need to cast it around.
In this particular case, all three types (char[], wchar[], and dchar[]) are being returned at different points in the loop. I don't know of any other way to generate a unified buffer than casting it to ubyte[].
 Also, there is no need to split it by line, json doesn't care.
I thought as much but my mind was not open enough to see the solution.
 Note also that getContent returns a complete body, but unzip 
 may not be so forgiving. But there definitely isn't a reason to 
 create your own buffer here.

 this should work (something like this really should be in 
 iopipe):

 while(input.extend(0) != 0) {} // get data until EOF
This!!! This is what I was looking for. Thank you.

I incorrectly assumed that if I didn't process the content of 
input.window, it would be overwritten on each .extend() so my 
implementation was:

	ubyte[] json;
	while(input.extend(0) != 0) {
	    json ~= input.window;
	}

This didn't work because it invalidated the Unicode data so I ended up 
splitting by line instead. Sure enough, this is trivial once one knows 
how to use it correctly, but I think it would be better to put this in 
the library as extendAll().
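[Editor's sketch] The extendAll helper proposed here could look roughly like the following. extendAll is the hypothetical name suggested above, and MockPipe is a toy stand-in for a real iopipe source (which exposes extend() and window with these semantics):

```d
// Hypothetical helper: drain the pipe to EOF, then report how many
// elements the window now holds.
size_t extendAll(Pipe)(ref Pipe p)
{
    while (p.extend(0) != 0) {} // get data until EOF
    return p.window.length;
}

// Toy source standing in for a buffered iopipe: releases the backing
// data in 3-byte chunks.
struct MockPipe
{
    const(ubyte)[] source;
    size_t loaded;

    size_t extend(size_t elems)
    {
        import std.algorithm.comparison : min;
        auto n = min(3, source.length - loaded);
        loaded += n;
        return n;
    }

    const(ubyte)[] window() { return source[0 .. loaded]; }
}

void main()
{
    auto p = MockPipe([10, 20, 30, 40, 50, 60, 70]);
    assert(extendAll(p) == 7);               // whole stream buffered
    assert(p.window[0] == 10 && p.window[6] == 70);
}
```

Because the window survives across extend() calls, no separate accumulation buffer is needed, which is the point Steve makes above.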
 And then:
 Eventually, something like this will be possible with 
 jsoniopipe (I need to update and release this too, it's 
 probably broken with some of the changes I just put into 
 iopipe). Hopefully combined with some sort of networking 
 library you could process a JSON stream without reading the 
 whole thing into memory.
That would be awesome. Again, thank you very much. Andrew
Oct 13
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 4:27 PM, Andrew Edwards wrote:
 On Friday, 13 October 2017 at 19:17:54 UTC, Steven Schveighoffer wrote:
 On 10/13/17 2:47 PM, Andrew Edwards wrote:
 A bit of advice, please. I'm trying to parse a gzipped JSON file 
 retrieved from the internet. The following naive implementation 
 accomplishes the task:

      auto url = 
 "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
      getContent(url)
          .data
          .unzip
          .runEncoded!((input) {
              ubyte[] content;
              foreach (line; input.byLineRange!true) {
                  content ~= cast(ubyte[])line;
              }
              auto json = (cast(string)content).parseJSON;
input is an iopipe of char, wchar, or dchar. There is no need to cast it around.
In this particular case, all three types (char[], wchar[], and dchar[]) are being returned at different points in the loop. I don't know of any other way to generate a unified buffer than casting it to ubyte[].
This has to be a misunderstanding. The point of runEncoded is to figure 
out the correct type (based on the BOM), and run your lambda function 
with the correct type for the whole thing.

I'm not sure actually this is even needed, as the data could be coming 
through without a BOM. Without a BOM, it assumes UTF8.
 Note also that getContent returns a complete body, but unzip may not 
 be so forgiving. But there definitely isn't a reason to create your 
 own buffer here.

 this should work (something like this really should be in iopipe):

 while(input.extend(0) != 0) {} // get data until EOF
This!!! This is what I was looking for. Thank you.

I incorrectly assumed that if I didn't process the content of 
input.window, it would be overwritten on each .extend() so my 
implementation was:

	ubyte[] json;
	while(input.extend(0) != 0) {
	    json ~= input.window;
	}

This didn't work because it invalidated the Unicode data so I ended up 
splitting by line instead. Sure enough, this is trivial once one knows 
how to use it correctly, but I think it would be better to put this in 
the library as extendAll().
ensureElems(size_t.max) should be equivalent, though I see you responded 
cryptically with something about JSON there :)

I will try and reproduce your error, and see if I can figure out why.

-Steve
Oct 13
parent reply Andrew Edwards <edwards.ac gmail.com> writes:
On Friday, 13 October 2017 at 21:53:12 UTC, Steven Schveighoffer 
wrote:
 On 10/13/17 4:27 PM, Andrew Edwards wrote:
 On Friday, 13 October 2017 at 19:17:54 UTC, Steven 
 Schveighoffer wrote:
 On 10/13/17 2:47 PM, Andrew Edwards wrote:
 A bit of advice, please. I'm trying to parse a gzipped JSON 
 file retrieved from the internet. The following naive 
 implementation accomplishes the task:

      auto url = 
 "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
      getContent(url)
          .data
          .unzip
          .runEncoded!((input) {
              ubyte[] content;
              foreach (line; input.byLineRange!true) {
                  content ~= cast(ubyte[])line;
              }
              auto json = (cast(string)content).parseJSON;
input is an iopipe of char, wchar, or dchar. There is no need to cast it around.
In this particular case, all three types (char[], wchar[], and dchar[]) are being returned at different points in the loop. I don't know of any other way to generate a unified buffer than casting it to ubyte[].
This has to be a misunderstanding. The point of runEncoded is to figure out the correct type (based on the BOM), and run your lambda function with the correct type for the whole thing.
Maybe I'm just not finding the correct words to express my thoughts. 
This is what I mean:

	// ===========
	void main() {
	    auto url = "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
	    getContent(url)
	        .data
	        .unzip
	        .runEncoded!((input) {
	            char[] content; // Line 20
	            foreach (line; input.byLineRange!true) {
	                content ~= line;
	            }
	        });
	}

output:

	source/app.d(20,13): Error: cannot append type wchar[] to type char[]

Changing line 20 to wchar[] yields:

	source/app.d(20,13): Error: cannot append type char[] to type wchar[]

And changing it to dchar[] yields:

	source/app.d(20,13): Error: cannot append type char[] to type dchar[]
 I'm not sure actually this is even needed, as the data could be 
 coming through without a BOM. Without a BOM, it assumes UTF8.

 Note also that getContent returns a complete body, but unzip 
 may not be so forgiving. But there definitely isn't a reason 
 to create your own buffer here.

 this should work (something like this really should be in 
 iopipe):

 while(input.extend(0) != 0) {} // get data until EOF
This!!! This is what I was looking for. Thank you.

I incorrectly assumed that if I didn't process the content of 
input.window, it would be overwritten on each .extend() so my 
implementation was:

	ubyte[] json;
	while(input.extend(0) != 0) {
	    json ~= input.window;
	}

This didn't work because it invalidated the Unicode data so I ended up 
splitting by line instead. Sure enough, this is trivial once one knows 
how to use it correctly, but I think it would be better to put this in 
the library as extendAll().
ensureElems(size_t.max) should be equivalent, though I see you responded cryptically with something about JSON there :)
:) I'll have to blame it on my Security+ training.

Switching out the while loop with ensureElems() in the following results 
in an error:

	void main() {
	    auto url = "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
	    getContent(url)
	        .data
	        .unzip
	        .runEncoded!((input) {
	            // while(input.extend(0) != 0){} // this works
	            input.ensureElems(size_t.max); // this doesn't

	            auto json = input.window.parseJSON;
	            foreach (size_t ndx, _; json) {
	                if (ndx == 0) continue;
	                auto title = json[ndx]["title"].str;
	                auto author = json[ndx]["writer"].str;
	                writefln("title: %s", title);
	                writefln("author: %s\n", author);
	            }
	        });
	}

output:

	Running ./uhost
	std.json.JSONException std/json.d(1400): Unexpected end of data. (Line 1:8192)
	----------------
	4   uhost   0x000000010b671112 pure safe void std.json.parseJSON!(char[]).parseJSON(char[], int, std.json.JSONOptions).error(immutable(char)[]) + 86
	[etc]
Oct 13
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 6:24 PM, Andrew Edwards wrote:
 On Friday, 13 October 2017 at 21:53:12 UTC, Steven Schveighoffer wrote:
 This has to be a misunderstanding. The point of runEncoded is to 
 figure out the correct type (based on the BOM), and run your lambda 
 function with the correct type for the whole thing.
Maybe I'm just not finding the correct words to express my thoughts. 
This is what I mean:

	// ===========
	void main() {
	    auto url = "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
	    getContent(url)
	        .data
	        .unzip
	        .runEncoded!((input) {
	            char[] content; // Line 20
	            foreach (line; input.byLineRange!true) {
	                content ~= line;
	            }
	        });
	}

output:

	source/app.d(20,13): Error: cannot append type wchar[] to type char[]

Changing line 20 to wchar[] yields:

	source/app.d(20,13): Error: cannot append type char[] to type wchar[]

And changing it to dchar[] yields:

	source/app.d(20,13): Error: cannot append type char[] to type dchar[]
Ah, OK. So the way runEncoded works is it necessarily instantiates your 
lambda with all types of iopipes that it might need. Then it decides at 
runtime which one to call. So for a single call, it may be one of those 
3, but always the same within the loop.

It might be tough to do it right, but moot point now, since it's not 
necessary anyway :)

-Steve
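[Editor's sketch] The instantiate-everything-then-dispatch pattern described here can be shown in miniature. dispatchEncoded is a made-up name, not iopipe's real internals: the lambda is compiled once per character type, and the encoding chosen at runtime selects which instantiation runs. This is also why `content ~= line` with a char[] buffer fails to compile, since the lambda body must compile for the wchar[] and dchar[] instantiations too:

```d
enum Encoding { utf8, utf16, utf32 }

// All three instantiations of fn are compiled; the runtime value of
// enc picks which one executes.
void dispatchEncoded(alias fn)(Encoding enc)
{
    final switch (enc)
    {
        case Encoding.utf8:  fn((char[]).init);  break;
        case Encoding.utf16: fn((wchar[]).init); break;
        case Encoding.utf32: fn((dchar[]).init); break;
    }
}

void main()
{
    string seen;
    dispatchEncoded!((input) {
        // Within a single instantiation the element type is fixed.
        seen = typeof(input[0]).stringof;
    })(Encoding.utf16);
    assert(seen == "wchar");
}
```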
Oct 13
parent Andrew Edwards <edwards.ac gmail.com> writes:
On Friday, 13 October 2017 at 22:29:39 UTC, Steven Schveighoffer 
wrote:
 It might be tough to do it right, but moot point now, since 
 it's not necessary anyway :)

 -Steve
Yup. Thanks again. Andrew
Oct 13
prev sibling parent reply ikod <geller.garry gmail.com> writes:
On Friday, 13 October 2017 at 19:17:54 UTC, Steven Schveighoffer 
wrote:
 On 10/13/17 2:47 PM, Andrew Edwards wrote:
 A bit of advice, please. I'm trying to parse a gzipped JSON 
 file retrieved from the internet. The following naive 
 implementation accomplishes the task:
 
      auto url = 
 "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
      getContent(url)
          .data
          .unzip
          .runEncoded!((input) {
              ubyte[] content;
              foreach (line; input.byLineRange!true) {
                  content ~= cast(ubyte[])line;
              }
              auto json = (cast(string)content).parseJSON;
input is an iopipe of char, wchar, or dchar. There is no need to cast it around. Also, there is no need to split it by line, json doesn't care. Note also that getContent returns a complete body, but unzip may not be so forgiving. But there definitely isn't a reason to create your own buffer here. this should work (something like this really should be in iopipe): while(input.extend(0) != 0) {} // get data until EOF And then: auto json = input.window.parseJSON;
              foreach (size_t ndx, record; json) {
                  if (ndx == 0) continue;
                  auto title = json[ndx]["title"].str;
                  auto author = json[ndx]["writer"].str;
                  writefln("title: %s", title);
                  writefln("author: %s\n", author);
              }
          });
 
 However, I'm sure there is a much better way to accomplish 
 this. Is there any way to accomplish something akin to:
 
      auto url = 
 "http://api.syosetu.com/novelapi/api/?out=json&lim=500&gzip=5";
      getContent(url)
          .data
          .unzip
          .runEncoded!((input) {
              foreach (record; input.data.parseJSON[1 .. $]) {
                  // use or update record as desired
              }
          });
Eventually, something like this will be possible with jsoniopipe (I need to update and release this too, it's probably broken with some of the changes I just put into iopipe). Hopefully combined with some sort of networking library you could process a JSON stream without reading the whole thing into memory.
This can be done with requests. You can ask not to load the whole 
content in memory, but instead produce an input range, which will 
continue to load data from the server when you are ready to consume it:

	auto rq = Request();
	rq.useStreaming = true;
	auto rs = rq.get("http://httpbin.org/image/jpeg");
	auto stream = rs.receiveAsRange();
	while(!stream.empty) {
	    // stream.front contains the next data portion
	    writefln("Received %d bytes, total received %d from document length %d",
	        stream.front.length, rq.contentReceived, rq.contentLength);
	    stream.popFront; // continue to load from server
	}
 Right now, it works just like std.json.parseJSON: it parses an 
 entire JSON message into a DOM form.

 -Steve
Oct 13
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 6:18 PM, ikod wrote:
 On Friday, 13 October 2017 at 19:17:54 UTC, Steven Schveighoffer wrote:
 Eventually, something like this will be possible with jsoniopipe (I 
 need to update and release this too, it's probably broken with some of 
 the changes I just put into iopipe). Hopefully combined with some sort 
 of networking library you could process a JSON stream without reading 
 the whole thing into memory.
This can be done with requests. You can ask not to load the whole 
content in memory, but instead produce an input range, which will 
continue to load data from the server when you are ready to consume it:

	auto rq = Request();
	rq.useStreaming = true;
	auto rs = rq.get("http://httpbin.org/image/jpeg");
	auto stream = rs.receiveAsRange();
	while(!stream.empty) {
	    // stream.front contains the next data portion
	    writefln("Received %d bytes, total received %d from document length %d",
	        stream.front.length, rq.contentReceived, rq.contentLength);
	    stream.popFront; // continue to load from server
	}
Very nice, I will add a component to iopipe that converts a "chunk-like" 
range like this into an iopipe source, as this is going to be needed to 
interface with existing libraries.

I still will want to skip the middle man buffer at some point though :)

Thanks!

-Steve
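[Editor's sketch] Such an adapter might look roughly like the following. RangePipe and rangePipe are hypothetical names, and a real iopipe source has a richer interface than just extend()/window; the sketch wraps a chunk-producing input range (like the one receiveAsRange returns) by appending each chunk to a growing buffer, which is exactly the "middle man" copy mentioned:

```d
import std.range : only;

struct RangePipe(R)
{
    R chunks;        // input range of ubyte[] chunks
    ubyte[] buffer;  // accumulated window

    size_t extend(size_t elems)
    {
        if (chunks.empty)
            return 0; // upstream range exhausted
        auto chunk = chunks.front;
        buffer ~= chunk; // the copy we'd like to eventually skip
        chunks.popFront();
        return chunk.length;
    }

    ubyte[] window() { return buffer; }
}

auto rangePipe(R)(R chunks) { return RangePipe!R(chunks); }

void main()
{
    ubyte[] a = [1, 2];
    ubyte[] b = [3, 4, 5];
    auto p = rangePipe(only(a, b));
    while (p.extend(0) != 0) {} // drain the range into the window
    assert(p.window.length == 5);
    assert(p.window[0] == 1 && p.window[4] == 5);
}
```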
Oct 13
parent reply ikod <geller.garry gmail.com> writes:
Hello, Steve

On Friday, 13 October 2017 at 22:22:54 UTC, Steven Schveighoffer 
wrote:
 On 10/13/17 6:18 PM, ikod wrote:
 On Friday, 13 October 2017 at 19:17:54 UTC, Steven 
 Schveighoffer wrote:
 Eventually, something like this will be possible with 
 jsoniopipe (I need to update and release this too, it's 
 probably broken with some of the changes I just put into 
 iopipe). Hopefully combined with some sort of networking 
 library you could process a JSON stream without reading the 
 whole thing into memory.
This can be done with requests. You can ask not to load the whole 
content in memory, but instead produce an input range, which will 
continue to load data from the server when you are ready to consume it:

	auto rq = Request();
	rq.useStreaming = true;
	auto rs = rq.get("http://httpbin.org/image/jpeg");
	auto stream = rs.receiveAsRange();
	while(!stream.empty) {
	    // stream.front contains the next data portion
	    writefln("Received %d bytes, total received %d from document length %d",
	        stream.front.length, rq.contentReceived, rq.contentLength);
	    stream.popFront; // continue to load from server
	}
Very nice, I will add a component to iopipe that converts a "chunk-like" 
range like this into an iopipe source, as this is going to be needed to 
interface with existing libraries.

I still will want to skip the middle man buffer at some point though :)

Thanks!

-Steve
Just in order to have a complete picture here - getContent returns not 
just ubyte[], but a richer structure (which can be converted to ubyte[] 
if needed). Basically it is an immutable(immutable(ubyte)[]) and almost 
all data there are just data received from the network without any data 
copy.

There are more details and docs on 
https://github.com/ikod/nbuff/blob/master/source/nbuff/buffer.d. The 
main goal behind Buffer is to minimize data movement, but it also 
supports many range properties, as well as some internal optimized 
methods.

Thanks,

Igor
Oct 17
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/17/17 4:33 AM, ikod wrote:
 Hello, Steve
 
 On Friday, 13 October 2017 at 22:22:54 UTC, Steven Schveighoffer wrote:
 On 10/13/17 6:18 PM, ikod wrote:
 On Friday, 13 October 2017 at 19:17:54 UTC, Steven Schveighoffer wrote:
 Eventually, something like this will be possible with jsoniopipe (I 
 need to update and release this too, it's probably broken with some 
 of the changes I just put into iopipe). Hopefully combined with some 
 sort of networking library you could process a JSON stream without 
 reading the whole thing into memory.
This can be done with requests. You can ask not to load the whole 
content in memory, but instead produce an input range, which will 
continue to load data from the server when you are ready to consume it:

	auto rq = Request();
	rq.useStreaming = true;
	auto rs = rq.get("http://httpbin.org/image/jpeg");
	auto stream = rs.receiveAsRange();
	while(!stream.empty) {
	    // stream.front contains the next data portion
	    writefln("Received %d bytes, total received %d from document length %d",
	        stream.front.length, rq.contentReceived, rq.contentLength);
	    stream.popFront; // continue to load from server
	}
Very nice, I will add a component to iopipe that converts a "chunk-like" 
range like this into an iopipe source, as this is going to be needed to 
interface with existing libraries.

I still will want to skip the middle man buffer at some point though :)

Thanks!
Just in order to have a complete picture here - getContent returns not 
just ubyte[], but a richer structure (which can be converted to ubyte[] 
if needed). Basically it is an immutable(immutable(ubyte)[]) and almost 
all data there are just data received from the network without any data 
copy.
Right, iopipe can use it just fine, without copying, as all arrays are 
also iopipes. In that case, it skips allocating a buffer, because there 
is no need.

However, I prefer to avoid allocating the whole thing in memory, which 
is why I would prefer the range interface. But in this case, iopipe 
needs to copy each chunk to its own buffer.

In terms of the most useful/least copying, direct access to the stream 
itself would be the best, which is why I said "skip the middle man". I 
feel like this won't be possible directly with requests and iopipe, 
because you need buffering to deal with parsing the headers. I think 
it's probably going to be a system built on top of iopipe, using its 
buffers, that would be the most optimal.
 There are more details and docs on 
 https://github.com/ikod/nbuff/blob/master/source/nbuff/buffer.d. Main 
 goal behind Buffer is to minimize data movement, but it also support 
 many range properties, as long as some internal optimized methods.
I will take a look when I get a chance, thanks. -Steve
Oct 17