digitalmars.D - stream interfaces

digitalmars.D - stream interfaces - with ranges

Steven Schveighoffer (64/64) May 17 2012 OK, so I had a couple partially written replies on the 'deprecating

Andrei Alexandrescu (18/71) May 17 2012 Ah, there we go :o).

Steven Schveighoffer (32/79) May 17 2012 Right. The thing is, buffered streams are good as plain ranges for one ...

kenji hara (34/97) May 17 2012 I think range interface is not useful for *efficient* IO. The expected

Dmitry Olshansky (10/23) May 18 2012 There is no problem with blocking _interface_. That is the facade. The

Artur Skawina (9/29) May 18 2012 I just took a look, and yes, that's yet another slightly different imple...

Steven Schveighoffer (33/52) May 18 2012 If all you are doing is consuming data and processing it, range interfac...

kenji hara (34/86) May 18 2012 is

Steven Schveighoffer (26/127) May 18 2012 On non-blocking i/o, why not just not support range interface at all? I...

kenji hara (6/11) May 18 2012 d

Steven Schveighoffer (19/30) May 18 2012 OK, *now* I understand what you mean by non-blocking. There are some I/...

kenji hara (30/53) May 18 2012 e:

Andrei Alexandrescu (3/6) May 18 2012 That sounds very promising.

Mehrdad (29/31) May 18 2012 The trouble is, why a slice? Why not an std.array.Array? Why not

Roman D. Boiko (5/36) May 18 2012 Provide slicing if underlying data source is compatible.
Steven Schveighoffer (19/47) May 18 2012 Well, because that's what i/o buffers are :) There isn't an OS primitiv...

Mehrdad (4/16) May 18 2012 I beg to differ..

Steven Schveighoffer (4/20) May 18 2012 It still reads into an array of buffers, which are slices. And the

Mehrdad (3/10) May 18 2012 Uh, the resulting range can be totally discontiguous...

Steven Schveighoffer (4/15) May 18 2012 So? So can a range of T[].

Mehrdad (10/13) May 18 2012 Well you mentioned "There isn't an OS primitive that reads a file

Steven Schveighoffer (4/17) May 18 2012 No, by range of T[] I mean this:

Mehrdad (11/15) May 18 2012 Yes, I believe I understood it correctly...

Martin Nowak (14/17) May 21 2012 It's a pity that iovec and T[] have switch length/ptr fields.

Andrei Alexandrescu (6/20) May 18 2012 Because T[] is the fundamental representation of a typed contiguous area...

Artur Skawina (35/65) May 18 2012 No. 'empty' has to return true only _after_ seeing EOF.

Steven Schveighoffer (8/40) May 18 2012 I think this is an example of what Kenji and I are talking about -- tryi...

kenji hara (24/89) May 18 2012 OK. If reading bytes from underlying device failed, your 'fronts' can

Andrei Alexandrescu (16/31) May 18 2012 Yah, this goes back to the fact that ranges are by definition buffered;

David Nadlinger (8/14) May 18 2012 And I think the issues you brought up some time ago regarding to

Artur Skawina (29/131) May 18 2012 Depends if your input range supports zero-copy or not. IOW you avoid
kenji hara (21/64) May 18 2012 The flaw of your design is, the memory to store read bytes/elements is
Artur Skawina (20/75) May 18 2012 If you pick just one scheme, then you will end up with an unnecessary
Artur Skawina (5/14) May 18 2012 And is apparently windows-only; missing HANDLE type, non-

Masahiro Nakagawa (4/18) May 19 2012 Current dio is PoC for new IO design.

Artur Skawina (4/8) May 18 2012 It's not exactly what i had i mind, but i tried to build it;
kenji hara (4/12) May 19 2012 Sorry, I have updated it.

Masahiro Nakagawa (8/24) May 19 2012 Please add README to top directory.

travert phare.normalesup.org (Christophe Travert) (56/111) May 21 2012 I don't have time to read the whole discussion right now, but I've

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

OK, so I had a couple partially written replies on the 'deprecating
std.stream etc' thread, then I had to go home.

But I thought about this a lot last night, and some of the things Andrei
and others are saying is starting to make sense (I know!).  Now I've
scrapped those replies and am thinking about redesigning my i/o package
(most of the code can stay intact).

I'm a little undecided on some of the details, but here is what I think
makes sense:

1. We need a buffering input stream type.  This must have additional
methods besides the range primitives, because doing one-at-a-time byte
reads is not going to cut it.
2. I realized, buffering input stream of type T is actually an input range
of type T[].  Observe:

struct /*or class*/ buffer(T)
{
      T[] buf;
      InputStream input;
      ...
       property T[] front() { return buf; }
      void popFront() {input.read(buf);} // flush existing buffer, read  
next.
       property bool empty() { return buf.length == 0;}
}

Roughly speaking, not all the details are handled, but this makes a
feasible input range that will perform quite nicely for things like
std.algorithm.copy.  I haven't checked, but copy should be able to handle
transferring a range of type T[] to an output range with element type T,
if it's not able to, it should be made to work.  I know at least, an
output stream with element type T supports putting T or T[].  What I think
really makes sense is to support:

buffer!ubyte b;
outputStream o;

o.put(b); // uses range primitives to put all the data to o, one element
(i.e. ubyte[]) of b at a time


3. An ultimate goal of the i/o streaming package should be to be able to
do this:

auto x = new XmlParser("<rootElement></rootElement>");

or at least

auto x = new XmlParser(buffered("<rootElement></rootElement>"));

So I think arrays need to be able to be treated as a buffering streams.  I
tried really hard to think of some way to make this work with my existing
system, but I don't think it will without unnecessary baggage, and losing
interoperability with existing range functions.

Where does this leave us?

1. I think we need, as Andrei says, an unbuffered streaming abstraction.
I think I have this down pretty solidly in my current std.io.
2. A definition of a buffering range, in terms of what additional
primitives the range should have.  The primitives should support buffered
input and buffered output (these are two separate algorithms), but
independently (possibly allowing switching for rw files).
3. An implementation of the above definition hooked to the unbuffered
stream abstraction, to be utilized in more specific ranges.  But by
itself, can be used as an input range or directly by code.
4. Specialization ranges for each type of input you want (i.e. byLine,
byChunk, textStream).
5. Full replacement option of File backend.  File will start out with
C-supported calls, but any "promotion" to using a more D-like range type
will result in switching to a D-based stream using the above mechanisms.
Of course, all existing code should compile that does not try to assume
the File always has a valid FILE *.

What do you all think?  I'm going to work out what the definition of 2
should be, based on what I've written and what makes sense.

Have I started to design something feasible or unworkable? :)

-Steve

May 17 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/17/12 9:02 AM, Steven Schveighoffer wrote:
 1. We need a buffering input stream type. This must have additional
 methods besides the range primitives, because doing one-at-a-time byte
 reads is not going to cut it.

I was thinking a range of T[] could be enough for a buffered input range.

 2. I realized, buffering input stream of type T is actually an input range
 of type T[]. Observe:

Ah, there we go :o).

 struct /*or class*/ buffer(T)
 {
 T[] buf;
 InputStream input;
 ...
  property T[] front() { return buf; }
 void popFront() {input.read(buf);} // flush existing buffer, read next.
  property bool empty() { return buf.length == 0;}
 }

 Roughly speaking, not all the details are handled, but this makes a
 feasible input range that will perform quite nicely for things like
 std.algorithm.copy. I haven't checked, but copy should be able to handle
 transferring a range of type T[] to an output range with element type T,
 if it's not able to, it should be made to work.

We can do this for copy, but if we need to specialize a lot of other 
algorithms, maybe we didn't strike the best design.

 I know at least, an
 output stream with element type T supports putting T or T[].

Right.

 What I think
 really makes sense is to support:

 buffer!ubyte b;
 outputStream o;

 o.put(b); // uses range primitives to put all the data to o, one element
 (i.e. ubyte[]) of b at a time

I think that makes sense.

 3. An ultimate goal of the i/o streaming package should be to be able to
 do this:

 auto x = new XmlParser("<rootElement></rootElement>");

 or at least

 auto x = new XmlParser(buffered("<rootElement></rootElement>"));

 So I think arrays need to be able to be treated as a buffering streams. I
 tried really hard to think of some way to make this work with my existing
 system, but I don't think it will without unnecessary baggage, and losing
 interoperability with existing range functions.

I think we can create a generic abstraction buffered() that layers 
buffering on top of an input range. If the input range has unbuffered 
read capability, buffered() would use those. Otherwise, it would use 
loops using empty, front, and popFront.

 Where does this leave us?

 1. I think we need, as Andrei says, an unbuffered streaming abstraction.
 I think I have this down pretty solidly in my current std.io.

Great. What are the primitives?

 2. A definition of a buffering range, in terms of what additional
 primitives the range should have. The primitives should support buffered
 input and buffered output (these are two separate algorithms), but
 independently (possibly allowing switching for rw files).

Sounds good.

 3. An implementation of the above definition hooked to the unbuffered
 stream abstraction, to be utilized in more specific ranges. But by
 itself, can be used as an input range or directly by code.

Hah, I can't believe I wrote about the same thing above (and I swear I 
didn't read yours).

 4. Specialization ranges for each type of input you want (i.e. byLine,
 byChunk, textStream).

What is the purpose? To avoid unnecessary double buffering?

 5. Full replacement option of File backend. File will start out with
 C-supported calls, but any "promotion" to using a more D-like range type
 will result in switching to a D-based stream using the above mechanisms.
 Of course, all existing code should compile that does not try to assume
 the File always has a valid FILE *.

This will be tricky but probably doable.


Andrei

May 17 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 17 May 2012 11:46:18 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/17/12 9:02 AM, Steven Schveighoffer wrote:
 Roughly speaking, not all the details are handled, but this makes a
 feasible input range that will perform quite nicely for things like
 std.algorithm.copy. I haven't checked, but copy should be able to handle
 transferring a range of type T[] to an output range with element type T,
 if it's not able to, it should be made to work.

 We can do this for copy, but if we need to specialize a lot of other  
 algorithms, maybe we didn't strike the best design.

Right.  The thing is, buffered streams are good as plain ranges for one  
thing -- forwarding data.  There probably aren't many algorithms in  
std.algorithm that are applicable.  And there is always the put idiom,  
Appender.put(buf) should work to accumulate all data into an array, which  
can then be used as a normal range.

One thing that worries me, if you did something like  
array(bufferedStream), it would accumulate N copies of the buffer  
reference, which wouldn't be what you want at all.  Of course, you could  
apply map to buffer to dup it.

 3. An ultimate goal of the i/o streaming package should be to be able to
 do this:

 auto x = new XmlParser("<rootElement></rootElement>");

 or at least

 auto x = new XmlParser(buffered("<rootElement></rootElement>"));

 So I think arrays need to be able to be treated as a buffering streams.  
 I
 tried really hard to think of some way to make this work with my  
 existing
 system, but I don't think it will without unnecessary baggage, and  
 losing
 interoperability with existing range functions.

 I think we can create a generic abstraction buffered() that layers  
 buffering on top of an input range. If the input range has unbuffered  
 read capability, buffered() would use those. Otherwise, it would use  
 loops using empty, front, and popFront.

Right, this is different from my proposed buffer implementation, which  
puts a buffer on top of an unbuffered input *stream*.  But of course, we  
can define it for both, since it will be a compile-time interface.

 Where does this leave us?

 1. I think we need, as Andrei says, an unbuffered streaming abstraction.
 I think I have this down pretty solidly in my current std.io.

 Great. What are the primitives?

See here:
https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L170

Through IODevice.  The BufferedStream type is going to be redone as a  
range.

 3. An implementation of the above definition hooked to the unbuffered
 stream abstraction, to be utilized in more specific ranges. But by
 itself, can be used as an input range or directly by code.

 Hah, I can't believe I wrote about the same thing above (and I swear I  
 didn't read yours).

Well, not quite :)  You wrote about it being supported by an underlying  
range, I need to have it supported by an underlying stream.  We probably  
need both.  But yeah, I think we are mostly on the same page here.

 4. Specialization ranges for each type of input you want (i.e. byLine,
 byChunk, textStream).

 What is the purpose? To avoid unnecessary double buffering?

No, a specialization range *uses* a buffer range as its backing.  A buffer  
range I think is necessarily going to be a reference type (probably a  
class). The specialized range won't replace the buffer range, in other  
words.

Something like byLine is going to do the work of extracting lines from the  
buffer, it will reference the buffer data directly.  But it won't  
reimplement buffering.

 5. Full replacement option of File backend. File will start out with
 C-supported calls, but any "promotion" to using a more D-like range type
 will result in switching to a D-based stream using the above mechanisms.
 Of course, all existing code should compile that does not try to assume
 the File always has a valid FILE *.

 This will be tricky but probably doable.

Doing this will unify all the i/o packages together into one interface --  
File.  I think it's a bad story for D if you have 2 ways of doing i/o (or  
at least 2 ways of doing the *same thing* with i/o).

-Steve

May 17 2012

kenji hara <k.hara.pg gmail.com> writes:

I think range interface is not useful for *efficient* IO. The expected
IO interface will be more *abstract* than range primitives.

---
If you use range I/F to read bytes from device, we will always do
blocking IO - even if the device is socket. It is not efficient.

auto sock =3D new TcpSocketDevice();
if (sock.empty) { auto e =3D sock.front; }
  // In empty primitive, we *must* wait the socket gets one or more
bytes or really disconnected.
  // If not, what exactly returns sock.front?
  // Then using range interface for socket reading enforces blocking
IO. It is *really* inefficient.
---
I think IO primitives must be distinct from range ones for the reasons
mentioned above...

I'm designing experimental IO primitives:
https://github.com/9rnsr/dio

I call the input stream "source", and call output stream "sink".
"source" has a 'pull' primitive, and sink has 'push' primitive, and
they can avoid blocking.
If you want to construct input range interface from "source", you
should use 'ranged' helper function in io.core module. 'ranged'
returns a wrapper object, and in its front method, It reads bytes from
"source", and if the read bytes not sufficient, blocks the input.

In other words, range is not almighty. We should think distinct
primitives for the IO.

Kenji Hara

2012/5/17 Steven Schveighoffer <schveiguy yahoo.com>:
 OK, so I had a couple partially written replies on the 'deprecating
 std.stream etc' thread, then I had to go home.

 But I thought about this a lot last night, and some of the things Andrei
 and others are saying is starting to make sense (I know!). =A0Now I've
 scrapped those replies and am thinking about redesigning my i/o package
 (most of the code can stay intact).

 I'm a little undecided on some of the details, but here is what I think
 makes sense:

 1. We need a buffering input stream type. =A0This must have additional
 methods besides the range primitives, because doing one-at-a-time byte
 reads is not going to cut it.
 2. I realized, buffering input stream of type T is actually an input rang=

e
 of type T[]. =A0Observe:

 struct /*or class*/ buffer(T)
 {
 =A0 =A0 T[] buf;
 =A0 =A0 InputStream input;
 =A0 =A0 ...
 =A0 =A0  property T[] front() { return buf; }
 =A0 =A0 void popFront() {input.read(buf);} // flush existing buffer, read=

 next.
 =A0 =A0  property bool empty() { return buf.length =3D=3D 0;}
 }

 Roughly speaking, not all the details are handled, but this makes a
 feasible input range that will perform quite nicely for things like
 std.algorithm.copy. =A0I haven't checked, but copy should be able to hand=

le
 transferring a range of type T[] to an output range with element type T,
 if it's not able to, it should be made to work. =A0I know at least, an
 output stream with element type T supports putting T or T[]. =A0What I th=

ink
 really makes sense is to support:

 buffer!ubyte b;
 outputStream o;

 o.put(b); // uses range primitives to put all the data to o, one element
 (i.e. ubyte[]) of b at a time


 3. An ultimate goal of the i/o streaming package should be to be able to
 do this:

 auto x =3D new XmlParser("<rootElement></rootElement>");

 or at least

 auto x =3D new XmlParser(buffered("<rootElement></rootElement>"));

 So I think arrays need to be able to be treated as a buffering streams. =

=A0I
 tried really hard to think of some way to make this work with my existing
 system, but I don't think it will without unnecessary baggage, and losing
 interoperability with existing range functions.

 Where does this leave us?

 1. I think we need, as Andrei says, an unbuffered streaming abstraction.
 I think I have this down pretty solidly in my current std.io.
 2. A definition of a buffering range, in terms of what additional
 primitives the range should have. =A0The primitives should support buffer=

ed
 input and buffered output (these are two separate algorithms), but
 independently (possibly allowing switching for rw files).
 3. An implementation of the above definition hooked to the unbuffered
 stream abstraction, to be utilized in more specific ranges. =A0But by
 itself, can be used as an input range or directly by code.
 4. Specialization ranges for each type of input you want (i.e. byLine,
 byChunk, textStream).
 5. Full replacement option of File backend. =A0File will start out with
 C-supported calls, but any "promotion" to using a more D-like range type
 will result in switching to a D-based stream using the above mechanisms.
 Of course, all existing code should compile that does not try to assume
 the File always has a valid FILE *.

 What do you all think? =A0I'm going to work out what the definition of 2
 should be, based on what I've written and what makes sense.

 Have I started to design something feasible or unworkable? :)

 -Steve

May 17 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 18.05.2012 8:19, kenji hara wrote:
 I think range interface is not useful for *efficient* IO. The expected
 IO interface will be more *abstract* than range primitives.

 ---
 If you use range I/F to read bytes from device, we will always do
 blocking IO - even if the device is socket. It is not efficient.

 auto sock = new TcpSocketDevice();
 if (sock.empty) { auto e = sock.front; }
    // In empty primitive, we *must* wait the socket gets one or more
 bytes or really disconnected.
    // If not, what exactly returns sock.front?
    // Then using range interface for socket reading enforces blocking
 IO. It is *really* inefficient.
 ---

There is no problem with blocking _interface_. That is the facade. The 
actual work can happen in background thread (and in fact it often is).
So while you work with first chunk the next one is downloaded behind the 
scenes.
Just take a look at std.net.curl all these asyncByChunk ... and then 
there is vide.d that shows that having blocking interface for 
asynchronous i/o is alright.

-- 
Dmitry Olshansky

May 18 2012

Artur Skawina <art.08.09 gmail.com> writes:

On 05/18/12 13:34, Dmitry Olshansky wrote:
On 18.05.2012 8:19, kenji hara wrote:
I think range interface is not useful for *efficient* IO. The expected
IO interface will be more *abstract* than range primitives.

---
If you use range I/F to read bytes from device, we will always do
blocking IO - even if the device is socket. It is not efficient.

auto sock = new TcpSocketDevice();
if (sock.empty) { auto e = sock.front; }
// In empty primitive, we *must* wait the socket gets one or more
bytes or really disconnected.
// If not, what exactly returns sock.front?
// Then using range interface for socket reading enforces blocking
IO. It is *really* inefficient.
---

There is no problem with blocking _interface_. That is the facade. The actual
work can happen in background thread (and in fact it often is).
So while you work with first chunk the next one is downloaded behind the
scenes.
Just take a look at std.net.curl all these asyncByChunk ... and then there is
vide.d that shows that having blocking interface for asynchronous i/o is
alright.

I just took a look, and yes, that's yet another slightly different
implementation
of the same thing with a somewhat different interface:

https://github.com/rejectedsoftware/vibe.d/blob/399b7a9d6eba9b14ea8d2215498daf53bd8d27d8/source/vibe/stream/stream.d

I thought i was exaggerating when i said 'thirteen', but there are already
more of them mentioned in this thread than i could count on one hand...

This one has an implicit flush and also this: "Finalize has to be called on
certain types of streams.". Not to mention it's class based.

artur

May 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 18 May 2012 00:19:45 -0400, kenji hara <k.hara.pg gmail.com> wrote:

 I think range interface is not useful for *efficient* IO. The expected
 IO interface will be more *abstract* than range primitives.

If all you are doing is consuming data and processing it, range interface  
is efficient.  Most streaming implementations that are synchronous use:

1. read block of data from low-level source into buffer
2. process buffer
3. If still data left, go to step 1.

1 is done via popFront, 2 is done via front.

3 is somewhat available via empty, but empty kind of depends on reading  
data.  I think it can work.

It's not the ideal interface for all aspects of i/o, but it does map to  
ranges, and for single purpose tasks (such as parse an XML file), it will  
be most efficient.

 ---
 If you use range I/F to read bytes from device, we will always do
 blocking IO - even if the device is socket. It is not efficient.

 auto sock = new TcpSocketDevice();
 if (sock.empty) { auto e = sock.front; }
   // In empty primitive, we *must* wait the socket gets one or more
 bytes or really disconnected.
   // If not, what exactly returns sock.front?
   // Then using range interface for socket reading enforces blocking
 IO. It is *really* inefficient.
 ---

sockets do not have to be blocking, and I/O does not have to use the range  
portion of the interface.

And efficient I/O has little to do with synchronicity and more to do with  
reading a large amount of data at a time instead of byte by byte.

Using multi-threads or fibers, and using OS primitives such as select or  
poll can make I/O quite efficient and allow you to do other things while  
no I/O is happening.  These will not happen with range interface, but will  
be available through other interfaces.

 I think IO primitives must be distinct from range ones for the reasons
 mentioned above...

Yes, I agree.  But ranges can be *mapped* to stream primitives.

 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio

I'll take a look.

 In other words, range is not almighty. We should think distinct
 primitives for the IO.

100% agree.  The main thing I realized that brought me to propose the  
"range-based" (if you can call it that) version is that:

1. Ranges can be readily mapped to stream primitives *if* you use the  
concept of a range of T[] vs. a range of T.  So in essence, without  
changing anything I can slap on a range interface for free.
2. Arrays make very efficient data sources, and are easy to create.  We  
need a way to hook stream-using code onto an array.

But be clear, I am *not* going to remove the existing stream I/O  
primitives I had for buffered i/o, I'm rather *adding* range primitives as  
well.

-Steve

May 18 2012

kenji hara <k.hara.pg gmail.com> writes:

2012/5/18 Steven Schveighoffer <schveiguy yahoo.com>:
 On Fri, 18 May 2012 00:19:45 -0400, kenji hara <k.hara.pg gmail.com> wrot=

e:
 I think range interface is not useful for *efficient* IO. The expected
 IO interface will be more *abstract* than range primitives.


 If all you are doing is consuming data and processing it, range interface=

 is
 efficient. =C2=A0Most streaming implementations that are synchronous use:

 1. read block of data from low-level source into buffer
 2. process buffer
 3. If still data left, go to step 1.

 1 is done via popFront, 2 is done via front.

 3 is somewhat available via empty, but empty kind of depends on reading
 data. =C2=A0I think it can work.

 It's not the ideal interface for all aspects of i/o, but it does map to
 ranges, and for single purpose tasks (such as parse an XML file), it will=

 be
 most efficient.

Almost agree. When we want to do I/O, that is synchronous or asynchronous.
Only a few people would use non-blocking interface.
But for the library implementation, non-blocking interface is still importa=
nt.
I think the non-blocking interface should be designed to avoid copying
as far as possible, and to achieve it with range interface is
impossible in general.

 ---
 If you use range I/F to read bytes from device, we will always do
 blocking IO - even if the device is socket. It is not efficient.

 auto sock =3D new TcpSocketDevice();
 if (sock.empty) { auto e =3D sock.front; }
 =C2=A0// In empty primitive, we *must* wait the socket gets one or more
 bytes or really disconnected.
 =C2=A0// If not, what exactly returns sock.front?
 =C2=A0// Then using range interface for socket reading enforces blocking
 IO. It is *really* inefficient.
 ---


 sockets do not have to be blocking, and I/O does not have to use the rang=

e
 portion of the interface.

 And efficient I/O has little to do with synchronicity and more to do with
 reading a large amount of data at a time instead of byte by byte.

 Using multi-threads or fibers, and using OS primitives such as select or
 poll can make I/O quite efficient and allow you to do other things while =

no
 I/O is happening. =C2=A0These will not happen with range interface, but w=

ill be
 available through other interfaces.

I have talked about *good I/O primitives for library implementation*.
I think range interface is one of the most useful concept for end
users, but not good one for people who want to implement efficient
libraries.

 I think IO primitives must be distinct from range ones for the reasons
 mentioned above...


 Yes, I agree. =C2=A0But ranges can be *mapped* to stream primitives.

No, we cannot map output range concept to non-blocking output. 'put'
operation always requires blocking.

 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio


 I'll take a look.

Thanks.

 In other words, range is not almighty. We should think distinct
 primitives for the IO.


 100% agree. =C2=A0The main thing I realized that brought me to propose th=

e
 "range-based" (if you can call it that) version is that:

 1. Ranges can be readily mapped to stream primitives *if* you use the
 concept of a range of T[] vs. a range of T. =C2=A0So in essence, without =

changing
 anything I can slap on a range interface for free.
 2. Arrays make very efficient data sources, and are easy to create. =C2=

=A0We need
 a way to hook stream-using code onto an array.

 But be clear, I am *not* going to remove the existing stream I/O primitiv=

es
 I had for buffered i/o, I'm rather *adding* range primitives as well.

My policy is very similar. But, as described above, I think range
cannot cover non-blocing IO.
And I think non-blocking IO interface is important for library implementati=
ons.

Then I had taken a design that provides IO specific primitives.
Additionally I have added primitives to control underlying buffers
explicitly, because it is useful for some  byte processing - e.g.
encoding, taking a string with slicing the buffer, and so on.

Kenji Hara

May 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 18 May 2012 10:39:55 -0400, kenji hara <k.hara.pg gmail.com> wrote:

 2012/5/18 Steven Schveighoffer <schveiguy yahoo.com>:
 On Fri, 18 May 2012 00:19:45 -0400, kenji hara <k.hara.pg gmail.com>  
 wrote:

 I think range interface is not useful for *efficient* IO. The expected
 IO interface will be more *abstract* than range primitives.


 If all you are doing is consuming data and processing it, range  
 interface is
 efficient.  Most streaming implementations that are synchronous use:

 1. read block of data from low-level source into buffer
 2. process buffer
 3. If still data left, go to step 1.

 1 is done via popFront, 2 is done via front.

 3 is somewhat available via empty, but empty kind of depends on reading
 data.  I think it can work.

 It's not the ideal interface for all aspects of i/o, but it does map to
 ranges, and for single purpose tasks (such as parse an XML file), it  
 will be
 most efficient.

 Almost agree. When we want to do I/O, that is synchronous or  
 asynchronous.
 Only a few people would use non-blocking interface.
 But for the library implementation, non-blocking interface is still  
 important.
 I think the non-blocking interface should be designed to avoid copying
 as far as possible, and to achieve it with range interface is
 impossible in general.

On non-blocking i/o, why not just not support range interface at all?  I  
don't have any problem with that.  In other words, if your input source is  
non-blocking, and you try to use range primitives, it simply won't work.

I admit, all of my code so far is focused on blocking i/o.  I have some  
experience with non-blocking i/o, but it was to make a blocking interface  
that supported waiting for data with a timeout.  Making a cross-platform  
(i.e. both windows and Posix) non-blocking interface is difficult because  
you use very different mechanisms on both OSes.

And a lot of times, you don't want non-blocking i/o, but rather parallel  
i/o.

 ---
 If you use range I/F to read bytes from device, we will always do
 blocking IO - even if the device is socket. It is not efficient.

 auto sock = new TcpSocketDevice();
 if (sock.empty) { auto e = sock.front; }
  // In empty primitive, we *must* wait the socket gets one or more
 bytes or really disconnected.
  // If not, what exactly returns sock.front?
  // Then using range interface for socket reading enforces blocking
 IO. It is *really* inefficient.
 ---


 sockets do not have to be blocking, and I/O does not have to use the  
 range
 portion of the interface.

 And efficient I/O has little to do with synchronicity and more to do  
 with
 reading a large amount of data at a time instead of byte by byte.

 Using multi-threads or fibers, and using OS primitives such as select or
 poll can make I/O quite efficient and allow you to do other things  
 while no
 I/O is happening.  These will not happen with range interface, but will  
 be
 available through other interfaces.

 I have talked about *good I/O primitives for library implementation*.
 I think range interface is one of the most useful concept for end
 users, but not good one for people who want to implement efficient
 libraries.

OK, I think we agree.  I am concerned about writing good library types  
that can efficiently use I/O.  The range interface will be for people who  
use the library and want to utilize existing range primitives for whatever  
purpose.

 I think IO primitives must be distinct from range ones for the reasons
 mentioned above...


 Yes, I agree.  But ranges can be *mapped* to stream primitives.

 No, we cannot map output range concept to non-blocking output. 'put'
 operation always requires blocking.

Yes, but again, put can use whatever stream primitives we have.

In other words, it's quite possible to write range primitives which  
utilize stream primitivies.  It's impossible to write good stream  
primitives which utilize range primitives.

 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio


 I'll take a look.

 Thanks.

I'm having trouble following the code, is there a place with the generated  
docs?   I'm looking for an overview to understand where to look.

Your lib is quite extensive, mine is only one file ;)

 In other words, range is not almighty. We should think distinct
 primitives for the IO.


 100% agree.  The main thing I realized that brought me to propose the
 "range-based" (if you can call it that) version is that:

 1. Ranges can be readily mapped to stream primitives *if* you use the
 concept of a range of T[] vs. a range of T.  So in essence, without  
 changing
 anything I can slap on a range interface for free.
 2. Arrays make very efficient data sources, and are easy to create.  We  
 need
 a way to hook stream-using code onto an array.

 But be clear, I am *not* going to remove the existing stream I/O  
 primitives
 I had for buffered i/o, I'm rather *adding* range primitives as well.

 My policy is very similar. But, as described above, I think range
 cannot cover non-blocing IO.
 And I think non-blocking IO interface is important for library  
 implementations.

I think you misunderstand, I'm not trying to make ranges be the base of  
i/o, I'm trying to expose a range interface *based on* stream i/o  
interface.

-Steve

May 18 2012

kenji hara <k.hara.pg gmail.com> writes:

2012/5/19 Steven Schveighoffer <schveiguy yahoo.com>:
 On Fri, 18 May 2012 10:39:55 -0400, kenji hara <k.hara.pg gmail.com> wrot=

e:
 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio



 I'm having trouble following the code, is there a place with the generate=

d
 docs? =C2=A0 I'm looking for an overview to understand where to look.

I have created gh-pages:
http://9rnsr.github.com/dio/d/io_core.html

Kenji Hara

May 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 18 May 2012 13:27:22 -0400, kenji hara <k.hara.pg gmail.com> wrote:

 2012/5/19 Steven Schveighoffer <schveiguy yahoo.com>:
 On Fri, 18 May 2012 10:39:55 -0400, kenji hara <k.hara.pg gmail.com>  
 wrote:
 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio



 I'm having trouble following the code, is there a place with the  
 generated
 docs?   I'm looking for an overview to understand where to look.

 I have created gh-pages:
 http://9rnsr.github.com/dio/d/io_core.html

OK, *now* I understand what you mean by non-blocking.  There are some I/O  
packages that use asynchronous i/o which return even before any data is  
given to the buffer.  I thought this is what you were talking about.

I'm fully on board with synchronous but non-blocking.  That's what I  
assumed we would be doing, and it's well supported by low-level OS  
routines on all OSes.

In my implementation for a buffer, I have two calls:

read(buf[]) -> read until buf.length bytes are read or EOF
readPartial(buf[]) -> read from 1 to buf.length bytes, but performs at  
most 1 low-level read.  Returns 0 bytes on EOF.

readPartial will block if no data is yet available, but obviously can be  
made to not block if the underlying OS handle is marked as non-blocking (I  
need to add some extra structure to account for this).

Typically, this is the normal mechanism that I use for reading data that  
is not always available.  First, I select on a socket until data is  
available, then use synchronous read to get whatever data exists.

continuing reading...

-Steve

May 18 2012

kenji hara <k.hara.pg gmail.com> writes:

2012/5/19 Steven Schveighoffer <schveiguy yahoo.com>:
 On Fri, 18 May 2012 10:39:55 -0400, kenji hara <k.hara.pg gmail.com> wrot=

e:
[snip]
 On non-blocking i/o, why not just not support range interface at all? =C2=

=A0I
 don't have any problem with that. =C2=A0In other words, if your input sou=

rce is
 non-blocking, and you try to use range primitives, it simply won't work.

 I admit, all of my code so far is focused on blocking i/o. =C2=A0I have s=

ome
 experience with non-blocking i/o, but it was to make a blocking interface
 that supported waiting for data with a timeout. =C2=A0Making a cross-plat=

form
 (i.e. both windows and Posix) non-blocking interface is difficult because
 you use very different mechanisms on both OSes.

 And a lot of times, you don't want non-blocking i/o, but rather parallel
 i/o.

[snip]
 No, we cannot map output range concept to non-blocking output. 'put'
 operation always requires blocking.

 Yes, but again, put can use whatever stream primitives we have.

 In other words, it's quite possible to write range primitives which utili=

ze
 stream primitivies. =C2=A0It's impossible to write good stream primitives=

 which
 utilize range primitives.

[snip]
 My policy is very similar. But, as described above, I think range
 cannot cover non-blocing IO.
 And I think non-blocking IO interface is important for library
 implementations.


 I think you misunderstand, I'm not trying to make ranges be the base of i=

/o,
 I'm trying to expose a range interface *based on* stream i/o interface.

The reasons why not use range primitives directly for stream I/O.

1. To specify a buffer for storing read bytes from upper layer.

Input range doesn't have a way to specify buffer for storing read
bytes to lower layer.
Because input range is designed as a view of underlying container.

Comparison of primitive count.
The four or more primitives: empty + front + popFront +
specifiy-buffer-for-storing-read-bytes + ...
vs.
My 'pull' primitive

Which is better?

2. To avoid confusing I/O operation/interfaces and range ones.

Yes, if you only needs blocking-io, you can use range i/f instead of
i/o specific primitives, but it is very confusable.
I think that enforcing to wrap IO objects (like File) with thin range
wrapper is better for orthogonality.

  foreach (ubyte b; RawFile(fname).ranged) { ... }

Kenji Hara

May 18 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/18/12 8:27 AM, Steven Schveighoffer wrote:
 But be clear, I am *not* going to remove the existing stream I/O
 primitives I had for buffered i/o, I'm rather *adding* range primitives
 as well.

That sounds very promising.

Andrei

May 18 2012

"Mehrdad" <wfunction hotmail.com> writes:

On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer 
wrote:
 2. I realized, buffering input stream of type T is actually an 
 input range of type T[].

The trouble is, why a slice? Why not an std.array.Array? Why not 
some other data source?
(Check/egg problem....)




Another problem I've noticed is the following:


Say you're tokenizing some input range, and it happens to just be 
a huge, gigantic string.

It *should* be possible to turn it into tokens with slices 
referring to the ORIGINAL string, which is VERY efficient because 
it doesn't require *any* heap allocations whatsoever. (You just 
tokenize with opApply() as you go, without every requiring a heap 
allocation...)

However, this is *only* possible if you don't use the concept of 
an input range!

Since you can't slice an input range, you'd be forced to use the 
front() and popFront() properties. But, as soon as you do that, 
you're gonna have to store the data somewhere... so your 
next-best option is to append it to some new gigantic array 
(instead of a bunch of small arrays, which require a lot of heap 
allocations), but even then, it's not as efficient as possible, 
because there's O(n) extra memory involved -- which defeats the 
whole purpose of working on small chunks at a time with no heap 
allocations.
(If you're going to do that, after all, you might as well read 
the entire thing into a giant string at the beginning, and work 
with an array anyway, discarding the whole idea of a range while 
doing your tokenization.)


Any ideas on how to solve this problem?

May 18 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Friday, 18 May 2012 at 07:52:57 UTC, Mehrdad wrote:
 On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer 
 wrote:
 2. I realized, buffering input stream of type T is actually an 
 input range of type T[].

 The trouble is, why a slice? Why not an std.array.Array? Why 
 not some other data source?
 (Check/egg problem....)




 Another problem I've noticed is the following:


 Say you're tokenizing some input range, and it happens to just 
 be a huge, gigantic string.

 It *should* be possible to turn it into tokens with slices 
 referring to the ORIGINAL string, which is VERY efficient 
 because it doesn't require *any* heap allocations whatsoever. 
 (You just tokenize with opApply() as you go, without every 
 requiring a heap allocation...)

 However, this is *only* possible if you don't use the concept 
 of an input range!

 Since you can't slice an input range, you'd be forced to use 
 the front() and popFront() properties. But, as soon as you do 
 that, you're gonna have to store the data somewhere... so your 
 next-best option is to append it to some new gigantic array 
 (instead of a bunch of small arrays, which require a lot of 
 heap allocations), but even then, it's not as efficient as 
 possible, because there's O(n) extra memory involved -- which 
 defeats the whole purpose of working on small chunks at a time 
 with no heap allocations.
 (If you're going to do that, after all, you might as well read 
 the entire thing into a giant string at the beginning, and work 
 with an array anyway, discarding the whole idea of a range 
 while doing your tokenization.)


 Any ideas on how to solve this problem?

Provide slicing if underlying data source is compatible.

I have the same need in my DCT, and so far I went with a custom 
implementation (not on Github yet), but plan to reuse std.io as 
soon as it will be more or less stable and usable.

May 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 18 May 2012 03:52:51 -0400, Mehrdad <wfunction hotmail.com> wrote:

 On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer wrote:
 2. I realized, buffering input stream of type T is actually an input  
 range of type T[].

 The trouble is, why a slice? Why not an std.array.Array? Why not some  
 other data source?
 (Check/egg problem....)

Well, because that's what i/o buffers are :)  There isn't an OS primitive  
that reads a file descriptor into an e.g. linked list.  Anything other  
than a slice would go through a translation.

I don't know what std.array.Array is.

 Another problem I've noticed is the following:


 Say you're tokenizing some input range, and it happens to just be a  
 huge, gigantic string.

 It *should* be possible to turn it into tokens with slices referring to  
 the ORIGINAL string, which is VERY efficient because it doesn't require  
 *any* heap allocations whatsoever. (You just tokenize with opApply() as  
 you go, without every requiring a heap allocation...)

 However, this is *only* possible if you don't use the concept of an  
 input range!

How so?  A slice is an input range, and so is a string.

 Since you can't slice an input range, you'd be forced to use the front()  
 and popFront() properties. But, as soon as you do that, you're gonna  
 have to store the data somewhere... so your next-best option is to  
 append it to some new gigantic array (instead of a bunch of small  
 arrays, which require a lot of heap allocations), but even then, it's  
 not as efficient as possible, because there's O(n) extra memory involved  
 -- which defeats the whole purpose of working on small chunks at a time  
 with no heap allocations.
 (If you're going to do that, after all, you might as well read the  
 entire thing into a giant string at the beginning, and work with an  
 array anyway, discarding the whole idea of a range while doing your  
 tokenization.)


 Any ideas on how to solve this problem?

I think I get what you are saying here -- if you are processing, say, an  
XML file, and you want to split that into tokens, you have to dup each  
token from the stream, because the buffer may be reused.

But doing the same thing for a string would be wasteful.

I think in these cases, we need two types of parsing.  One is process the  
stream as it's read into a temporary buffer.  If you need data from the  
temporary buffer beyond the scope of the processing loop, you need to dup  
it.

Other way is read the entire file/stream into a buffer, then process that  
buffer with the knowledge that it's never going to change.

We probably can have buffer identify which situation it's in, so the code  
can make a runtime decision on whether to dup or not.

-Steve

May 18 2012

"Mehrdad" <wfunction hotmail.com> writes:

On Friday, 18 May 2012 at 13:44:43 UTC, Steven Schveighoffer 
wrote:
 On Fri, 18 May 2012 03:52:51 -0400, Mehrdad 
 <wfunction hotmail.com> wrote:

 On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer 
 wrote:
 2. I realized, buffering input stream of type T is actually 
 an input range of type T[].

 The trouble is, why a slice? Why not an std.array.Array? Why 
 not some other data source?
 (Check/egg problem....)

 Well, because that's what i/o buffers are :)  There isn't an OS 
 primitive that reads a file descriptor into an e.g. linked list.


I beg to differ..

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365469.aspx

May 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 18 May 2012 11:40:24 -0400, Mehrdad <wfunction hotmail.com> wrote:

 On Friday, 18 May 2012 at 13:44:43 UTC, Steven Schveighoffer wrote:
 On Fri, 18 May 2012 03:52:51 -0400, Mehrdad <wfunction hotmail.com>  
 wrote:

 On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer wrote:
 2. I realized, buffering input stream of type T is actually an input  
 range of type T[].

 The trouble is, why a slice? Why not an std.array.Array? Why not some  
 other data source?
 (Check/egg problem....)

 Well, because that's what i/o buffers are :)  There isn't an OS  
 primitive that reads a file descriptor into an e.g. linked list.


 I beg to differ..

 http://msdn.microsoft.com/en-us/library/windows/desktop/aa365469.aspx

It still reads into an array of buffers, which are slices.  And the  
resulting "range" looks *exactly* like a range of T[].

-Steve

May 18 2012

"Mehrdad" <wfunction hotmail.com> writes:

On Friday, 18 May 2012 at 15:49:23 UTC, Steven Schveighoffer 
wrote:
 I beg to differ..

 http://msdn.microsoft.com/en-us/library/windows/desktop/aa365469.aspx

 It still reads into an array of buffers, which are slices.  And 
 the resulting "range" looks *exactly* like a range of T[].

 -Steve

Uh, the resulting range can be totally discontiguous...

May 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 18 May 2012 11:52:43 -0400, Mehrdad <wfunction hotmail.com> wrote:

 On Friday, 18 May 2012 at 15:49:23 UTC, Steven Schveighoffer wrote:
 I beg to differ..

 http://msdn.microsoft.com/en-us/library/windows/desktop/aa365469.aspx

 It still reads into an array of buffers, which are slices.  And the  
 resulting "range" looks *exactly* like a range of T[].

 -Steve

 Uh, the resulting range can be totally discontiguous...

So?  So can a range of T[].

I'm not getting your point yet...

-Steve

May 18 2012

"Mehrdad" <wfunction hotmail.com> writes:

On Friday, 18 May 2012 at 15:57:20 UTC, Steven Schveighoffer 
wrote:
 So?  So can a range of T[].

 I'm not getting your point yet...

 -Steve

Well you mentioned "There isn't an OS primitive that reads a file 
descriptor into an e.g. linked list, anything other than a slice 
would go through a translation.", but I was just pointing out 
that there is, and that it doesn't go through any translation. I 
guess you /can/ call it a "range of T[]", but then, you can 
*also* call a linked list "a range of T[]"... where each T[] has 
one element. That's not very useful tho, since their nature is 
kinda different..

May 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 18 May 2012 12:02:16 -0400, Mehrdad <wfunction hotmail.com> wrote:

 On Friday, 18 May 2012 at 15:57:20 UTC, Steven Schveighoffer wrote:
 So?  So can a range of T[].

 I'm not getting your point yet...

 -Steve

 Well you mentioned "There isn't an OS primitive that reads a file  
 descriptor into an e.g. linked list, anything other than a slice would  
 go through a translation.", but I was just pointing out that there is,  
 and that it doesn't go through any translation. I guess you /can/ call  
 it a "range of T[]", but then, you can *also* call a linked list "a  
 range of T[]"... where each T[] has one element. That's not very useful  
 tho, since their nature is kinda different..

No, by range of T[] I mean this:

static assert(isInputRange!Range && is(ElementType!Range == T[]));

-Steve

May 18 2012

"Mehrdad" <wfunction hotmail.com> writes:

On Friday, 18 May 2012 at 16:16:21 UTC, Steven Schveighoffer 
wrote:
 No, by range of T[] I mean this:

 static assert(isInputRange!Range && is(ElementType!Range == 
 T[]));

 -Steve

Yes, I believe I understood it correctly...

In the case of ReadFileScatter, each T[] has the size of (at 
most) 1 page.

In the case of a random linked list, each T[] has the size of 1 
element.

Hence you can represent both of them as "a range of T[]", but 
really, that's just trying to fit it into a mold instead of 
creating the mold based on the actual thing.

What it *really* is is just a discontiguous buffer...

May 18 2012

"Martin Nowak" <dawg dawgfoto.de> writes:

 Well, because that's what i/o buffers are :)  There isn't an OS  
 primitive that reads a file descriptor into an e.g. linked list.   
 Anything other than a slice would go through a translation.

It's a pity that iovec and T[] have switch length/ptr fields.
Otherwise one could directly map read(ubyte[] bufs...) to libc.readv.

It did wrote a buffered range that uses a linked list to promote an
input range to a forward range. This is somewhat similar to lazy
ByteStrings in haskell.
There were some issue with reference counting and the implicit copy
in foreach loops but other than that it's fairly useful.

https://gist.github.com/1257196

The trouble with block-wise primitives (T[] input ranges) like byChunk is
that they make common things like parsing very difficult because the client
has to account for buffer wraps. Things like double buffering or a  
ringbuffer
would help for this.

martin

May 21 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/18/12 2:52 AM, Mehrdad wrote:
 On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer wrote:
 2. I realized, buffering input stream of type T is actually an input
 range of type T[].

 The trouble is, why a slice? Why not an std.array.Array? Why not some
 other data source?
 (Check/egg problem....)

Because T[] is the fundamental representation of a typed contiguous area 
of storage.

 Say you're tokenizing some input range, and it happens to just be a
 huge, gigantic string.

 It *should* be possible to turn it into tokens with slices referring to
 the ORIGINAL string, which is VERY efficient because it doesn't require
 *any* heap allocations whatsoever. (You just tokenize with opApply() as
 you go, without every requiring a heap allocation...)

 However, this is *only* possible if you don't use the concept of an
 input range!

But e.g. splitter() does exactly as you say. It's a range and does not 
use memory allocation.


Andrei

May 18 2012

Artur Skawina <art.08.09 gmail.com> writes:

On 05/18/12 06:19, kenji hara wrote:
 I think range interface is not useful for *efficient* IO. The expected
 IO interface will be more *abstract* than range primitives.
 
 ---
 If you use range I/F to read bytes from device, we will always do
 blocking IO - even if the device is socket. It is not efficient.
 
 auto sock = new TcpSocketDevice();
 if (sock.empty) { auto e = sock.front; }
   // In empty primitive, we *must* wait the socket gets one or more
 bytes or really disconnected.

No. 'empty' has to return true only _after_ seeing EOF.

Something like 'available' can return the number of elements known
to be fetchable w/o blocking. [1]

   // If not, what exactly returns sock.front?

EWOULDBLOCK :^)

But, yes, it needs to block, as there's no generic way to return
EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
comes in - that one /can/ return an empty slice.
So '!r.empty && r.fronts.length==0)' is the equivalent to EAGAIN.
(and note i'm oversimplifying -- 'fronts' can return something that
/acts/ as a slice; which is what i'm in fact are doing)

   // Then using range interface for socket reading enforces blocking
 IO. It is *really* inefficient.

 I think IO primitives must be distinct from range ones for the reasons
 mentioned above...
 
 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio
 
 I call the input stream "source", and call output stream "sink".
 "source" has a 'pull' primitive, and sink has 'push' primitive, and
 they can avoid blocking.
 If you want to construct input range interface from "source", you
 should use 'ranged' helper function in io.core module. 'ranged'
 returns a wrapper object, and in its front method, It reads bytes from
 "source", and if the read bytes not sufficient, blocks the input.
 
 In other words, range is not almighty. We should think distinct
 primitives for the IO.

Well, your 'pull' and 'push' are just different names for my 'fronts'
and 'puts' (modulo the data transfer interface, which can be done both
ways using a set of overloads, hence it doesn't matter).

I don't see any reason to invent yet another abstraction, when ranges
can be made to work with some improvements.

Ranges are just a convention; not a perfect one, but having /one/, not 
two or thirteen, is valuable. If you think ranges are flawed the
discussion should be about ripping out every trace of them from the
language and libraries and replacing them with something better. If
you think that would be bad - well, having tens of different incompatible
abstractions isn't good either. (and, yes, you can provide glue so that
they can interact, but that does not scale well)

Hmm, how are 'flush()' and 'commit()' supposed to work? Is data lost
if you omit one or both of them?

artur

[1] Reminds me:

   struct S(T) {
      shared T a;
       property size_t available()() { return a; }
   }

The compiler infers length as 'pure', which, depending on the
definition of 'shared' is wrong. ('shared' /shouldn't/ imply 'volatile',
but, as it is now, it does - so omitting a call to 'available' would
be wrong)

May 18 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 18 May 2012 07:05:50 -0400, Artur Skawina <art.08.09 gmail.com>  
wrote:

 On 05/18/12 06:19, kenji hara wrote:
 I think range interface is not useful for *efficient* IO. The expected
 IO interface will be more *abstract* than range primitives.

 ---
 If you use range I/F to read bytes from device, we will always do
 blocking IO - even if the device is socket. It is not efficient.

 auto sock = new TcpSocketDevice();
 if (sock.empty) { auto e = sock.front; }
   // In empty primitive, we *must* wait the socket gets one or more
 bytes or really disconnected.

 No. 'empty' has to return true only _after_ seeing EOF.

 Something like 'available' can return the number of elements known
 to be fetchable w/o blocking. [1]

   // If not, what exactly returns sock.front?

 EWOULDBLOCK :^)

 But, yes, it needs to block, as there's no generic way to return
 EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
 comes in - that one /can/ return an empty slice.
 So '!r.empty && r.fronts.length==0)' is the equivalent to EAGAIN.
 (and note i'm oversimplifying -- 'fronts' can return something that
 /acts/ as a slice; which is what i'm in fact are doing)

I think this is an example of what Kenji and I are talking about -- trying  
to make the range interface map to *all* I/O situations.

 I don't see any reason to invent yet another abstraction, when ranges
 can be made to work with some improvements.

 Ranges are just a convention; not a perfect one, but having /one/, not
 two or thirteen, is valuable. If you think ranges are flawed the
 discussion should be about ripping out every trace of them from the
 language and libraries and replacing them with something better. If
 you think that would be bad - well, having tens of different incompatible
 abstractions isn't good either. (and, yes, you can provide glue so that
 they can interact, but that does not scale well)

My opinion is that ranges should be available for i/o when you need to  
hook them to some other range processing code, but they shouldn't be the  
preferred interface for all I/O.

-Steve

May 18 2012

kenji hara <k.hara.pg gmail.com> writes:

2012/5/18 Artur Skawina <art.08.09 gmail.com>:
 On 05/18/12 06:19, kenji hara wrote:
 I think range interface is not useful for *efficient* IO. The expected
 IO interface will be more *abstract* than range primitives.

 ---
 If you use range I/F to read bytes from device, we will always do
 blocking IO - even if the device is socket. It is not efficient.

 auto sock =3D new TcpSocketDevice();
 if (sock.empty) { auto e =3D sock.front; }
 =C2=A0 // In empty primitive, we *must* wait the socket gets one or more
 bytes or really disconnected.

 No. 'empty' has to return true only _after_ seeing EOF.

 Something like 'available' can return the number of elements known
 to be fetchable w/o blocking. [1]

 =C2=A0 // If not, what exactly returns sock.front?

 EWOULDBLOCK :^)

 But, yes, it needs to block, as there's no generic way to return
 EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
 comes in - that one /can/ return an empty slice.
 So '!r.empty && r.fronts.length=3D=3D0)' is the equivalent to EAGAIN.
 (and note i'm oversimplifying -- 'fronts' can return something that
 /acts/ as a slice; which is what i'm in fact are doing)

OK. If reading bytes from underlying device failed, your 'fronts' can
return empty slice. I understood.
But, It is still *not efficient*. The returned slice will specifies a
buffer controlled by underlying device. If you want to gather bytes
into one chunk, you must copy bytes from returned slice to your chunk.
We should reduce copying memories as much as possible.

And, 'put' primitive in output range concept doesn't support non-blocikng w=
rite.
'put' should consume *all* of given data and write it  to underlying
device, then it would block.

Therefore, whole of range concept doesn't cover non-blocking I/O.

 =C2=A0 // Then using range interface for socket reading enforces blockin=


g
 IO. It is *really* inefficient.

 I think IO primitives must be distinct from range ones for the reasons
 mentioned above...

 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio

 I call the input stream "source", and call output stream "sink".
 "source" has a 'pull' primitive, and sink has 'push' primitive, and
 they can avoid blocking.
 If you want to construct input range interface from "source", you
 should use 'ranged' helper function in io.core module. 'ranged'
 returns a wrapper object, and in its front method, It reads bytes from
 "source", and if the read bytes not sufficient, blocks the input.

 In other words, range is not almighty. We should think distinct
 primitives for the IO.

 Well, your 'pull' and 'push' are just different names for my 'fronts'
 and 'puts' (modulo the data transfer interface, which can be done both
 ways using a set of overloads, hence it doesn't matter).

 I don't see any reason to invent yet another abstraction, when ranges
 can be made to work with some improvements.

For efficiency and removing bottlenecks.
Even today, I / O is the slowest operation in the entire program.
Providing good primitives for I/O is enough value.

I have designed the 'pull' and 'push' primitives with two concepts:
1. Reduce copying memories as far as possible.
2. Control buffer memory under programer side, not device side.

 Ranges are just a convention; not a perfect one, but having /one/, not
 two or thirteen, is valuable. If you think ranges are flawed the
 discussion should be about ripping out every trace of them from the
 language and libraries and replacing them with something better. If
 you think that would be bad - well, having tens of different incompatible
 abstractions isn't good either. (and, yes, you can provide glue so that
 they can interact, but that does not scale well)

Range concept is good abstraction if underlying container controlls
ownership. But, in I/O we want to *move* ownership of bytes. Range is
not designed efficiently for the purpose, IMO.

 Hmm, how are 'flush()' and 'commit()' supposed to work? Is data lost
 if you omit one or both of them?

In my io library, BufferedSink requires three primitives, flush,
commit, and writable.

 artur

 [1] Reminds me:

 =C2=A0 struct S(T) {
 =C2=A0 =C2=A0 =C2=A0shared T a;
 =C2=A0 =C2=A0 =C2=A0 property size_t available()() { return a; }
 =C2=A0 }

 The compiler infers length as 'pure', which, depending on the
 definition of 'shared' is wrong. ('shared' /shouldn't/ imply 'volatile',
 but, as it is now, it does - so omitting a call to 'available' would
 be wrong)

May 18 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/18/12 8:51 AM, kenji hara wrote:
 OK. If reading bytes from underlying device failed, your 'fronts' can
 return empty slice. I understood.
 But, It is still *not efficient*. The returned slice will specifies a
 buffer controlled by underlying device. If you want to gather bytes
 into one chunk, you must copy bytes from returned slice to your chunk.
 We should reduce copying memories as much as possible.

Yah, this goes back to the fact that ranges are by definition buffered; 
there's no way to escape that.

So as I said, we need to add unbuffered primitives (e.g. "Here's a 
buffer, fill it with data). They would work with both inputs that have 
no buffering at all, and with ranges.

 And, 'put' primitive in output range concept doesn't support non-blocikng
write.
 'put' should consume *all* of given data and write it  to underlying
 device, then it would block.

Right.

Zero-copy I/O is a possibility, we need to define primitives for 
destructively transferring buffers to and from streams/ranges.

 Therefore, whole of range concept doesn't cover non-blocking I/O.

Correct. Doesn't cover zero-copy I/O either.

It's interesting to think what primitives we should define, and what 
algorithms can take advantage of them beyond just copy().

 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio



Had only time to skim it, looks very promising.

 Range concept is good abstraction if underlying container controlls
 ownership. But, in I/O we want to *move* ownership of bytes. Range is
 not designed efficiently for the purpose, IMO.

Yes, yes, yes. Perfect thinking. (What ranges are good at though is for 
algorithms to mess with.)



Andrei

May 18 2012

"David Nadlinger" <see klickverbot.at> writes:

On Friday, 18 May 2012 at 16:38:22 UTC, Andrei Alexandrescu wrote:
 Range concept is good abstraction if underlying container 
 controlls
 ownership. But, in I/O we want to *move* ownership of bytes. 
 Range is
 not designed efficiently for the purpose, IMO.

 Yes, yes, yes. Perfect thinking.

And I think the issues you brought up some time ago regarding to 
orphan ranges and non-GC allocators are also rooted in this fact, 
i.e. that the design of ranges is completely oblivious to data 
ownership concerns.

But as you said, it's a very convenient interface for algorithms, 
so…

David

May 18 2012

Artur Skawina <art.08.09 gmail.com> writes:

On 05/18/12 15:51, kenji hara wrote:
 2012/5/18 Artur Skawina <art.08.09 gmail.com>:
 On 05/18/12 06:19, kenji hara wrote:
 I think range interface is not useful for *efficient* IO. The expected
 IO interface will be more *abstract* than range primitives.

 ---
 If you use range I/F to read bytes from device, we will always do
 blocking IO - even if the device is socket. It is not efficient.

 auto sock = new TcpSocketDevice();
 if (sock.empty) { auto e = sock.front; }
   // In empty primitive, we *must* wait the socket gets one or more
 bytes or really disconnected.

 No. 'empty' has to return true only _after_ seeing EOF.

 Something like 'available' can return the number of elements known
 to be fetchable w/o blocking. [1]

   // If not, what exactly returns sock.front?

 EWOULDBLOCK :^)

 But, yes, it needs to block, as there's no generic way to return
 EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
 comes in - that one /can/ return an empty slice.
 So '!r.empty && r.fronts.length==0)' is the equivalent to EAGAIN.
 (and note i'm oversimplifying -- 'fronts' can return something that
 /acts/ as a slice; which is what i'm in fact are doing)

 
 OK. If reading bytes from underlying device failed, your 'fronts' can
 return empty slice. I understood.
 But, It is still *not efficient*. The returned slice will specifies a
 buffer controlled by underlying device. If you want to gather bytes
 into one chunk, you must copy bytes from returned slice to your chunk.
 We should reduce copying memories as much as possible.

Depends if your input range supports zero-copy or not. IOW you avoid
the copy iff the range can somehow write the data directly to the caller
provided buffer. This can be true eg for file reads, where you can tell
the read(2) syscall to write into the user buffer. But what if you need to
buffer the stream? An intermediate buffer can become necessary anyway.
But, as i said before, i agree that a caller-provided-buffer-interface
is useful.

   E[] fronts();
   void fronts(ref E[]);

And one can be implemented in terms of the other, ie:

  E[] fronts[] { E[] els; fronts(els); return els; }
  void fronts(ref E[] e) { e[] = fronts()[]; }

depending on which is more efficient. A range can provide

  enum bool HasBuffer = 0 || 1;

so that the user can pick the more suited alternative.

 And, 'put' primitive in output range concept doesn't support non-blocikng
write.
 'put' should consume *all* of given data and write it  to underlying
 device, then it would block.

True, a write-as-much-as-possible-but not-more primitive is needed.

   size_t puts(E[], size_t atleast=size_t.max);

or something like that. (Doing it this way allows for explicit
non-blocking 'puts', ie '(written=puts(els, 0))==0' means EAGAIN.)

 Therefore, whole of range concept doesn't cover non-blocking I/O.

See above.

   // Then using range interface for socket reading enforces blocking
 IO. It is *really* inefficient.

 I think IO primitives must be distinct from range ones for the reasons
 mentioned above...

 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio

 I call the input stream "source", and call output stream "sink".
 "source" has a 'pull' primitive, and sink has 'push' primitive, and
 they can avoid blocking.
 If you want to construct input range interface from "source", you
 should use 'ranged' helper function in io.core module. 'ranged'
 returns a wrapper object, and in its front method, It reads bytes from
 "source", and if the read bytes not sufficient, blocks the input.

 In other words, range is not almighty. We should think distinct
 primitives for the IO.

 Well, your 'pull' and 'push' are just different names for my 'fronts'
 and 'puts' (modulo the data transfer interface, which can be done both
 ways using a set of overloads, hence it doesn't matter).

 I don't see any reason to invent yet another abstraction, when ranges
 can be made to work with some improvements.

 
 For efficiency and removing bottlenecks.
 Even today, I / O is the slowest operation in the entire program.
 Providing good primitives for I/O is enough value.
 
 I have designed the 'pull' and 'push' primitives with two concepts:
 1. Reduce copying memories as far as possible.
 2. Control buffer memory under programer side, not device side.

Do you have a contained microbenchmark? It would be easy to compare
both approaches... If you do i'll write one using my scheme - so
far i only did this for inter-thread communication, there's no file
based backend.

 Ranges are just a convention; not a perfect one, but having /one/, not
 two or thirteen, is valuable. If you think ranges are flawed the
 discussion should be about ripping out every trace of them from the
 language and libraries and replacing them with something better. If
 you think that would be bad - well, having tens of different incompatible
 abstractions isn't good either. (and, yes, you can provide glue so that
 they can interact, but that does not scale well)

 
 Range concept is good abstraction if underlying container controlls
 ownership. But, in I/O we want to *move* ownership of bytes. Range is
 not designed efficiently for the purpose, IMO.
 
 Hmm, how are 'flush()' and 'commit()' supposed to work? Is data lost
 if you omit one or both of them?

 
 In my io library, BufferedSink requires three primitives, flush,
 commit, and writable.

But what happens if neither flush nor commit is called?

 [1] Reminds me:

   struct S(T) {
      shared T a;
       property size_t available()() { return a; }
   }

 The compiler infers length as 'pure', which, depending on the


                       ^^^^^^
s/length/available/'.

 definition of 'shared' is wrong. ('shared' /shouldn't/ imply 'volatile',
 but, as it is now, it does - so omitting a call to 'available' would
 be wrong)


artur

May 18 2012

kenji hara <k.hara.pg gmail.com> writes:

2012/5/19 Artur Skawina <art.08.09 gmail.com>:
 On 05/18/12 15:51, kenji hara wrote:
 OK. If reading bytes from underlying device failed, your 'fronts' can
 return empty slice. I understood.
 But, It is still *not efficient*. The returned slice will specifies a
 buffer controlled by underlying device. If you want to gather bytes
 into one chunk, you must copy bytes from returned slice to your chunk.
 We should reduce copying memories as much as possible.

 Depends if your input range supports zero-copy or not. IOW you avoid
 the copy iff the range can somehow write the data directly to the caller
 provided buffer. This can be true eg for file reads, where you can tell
 the read(2) syscall to write into the user buffer. But what if you need t=

o
 buffer the stream? An intermediate buffer can become necessary anyway.
 But, as i said before, i agree that a caller-provided-buffer-interface
 is useful.

 =C2=A0 E[] fronts();
 =C2=A0 void fronts(ref E[]);

 And one can be implemented in terms of the other, ie:

 =C2=A0E[] fronts[] { E[] els; fronts(els); return els; }
 =C2=A0void fronts(ref E[] e) { e[] =3D fronts()[]; }

The flaw of your design is, the memory to store read bytes/elements is
allocated by the lower layer.
E.g. If you want to construct linked list of some some elements, you
must copy elements from returned slice to new allocated node. I think
it is still inefficient.

 depending on which is more efficient. A range can provide

 =C2=A0enum bool HasBuffer =3D 0 || 1;

 so that the user can pick the more suited alternative.

I think fewer primitives as possible is better design than adding
extra/optional primitives.
How many primitives in your interface design?

 And, 'put' primitive in output range concept doesn't support non-blocikn=


g write.
 'put' should consume *all* of given data and write it =C2=A0to underlyin=


g
 device, then it would block.

 True, a write-as-much-as-possible-but not-more primitive is needed.

 =C2=A0 size_t puts(E[], size_t atleast=3Dsize_t.max);

 or something like that. (Doing it this way allows for explicit
 non-blocking 'puts', ie '(written=3Dputs(els, 0))=3D=3D0' means EAGAIN.)

 Therefore, whole of range concept doesn't cover non-blocking I/O.


I can agree for the signatures. but the names 'fronts' and 'puts' are
a little too similar.


 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio


 I have designed the 'pull' and 'push' primitives with two concepts:
 1. Reduce copying memories as far as possible.
 2. Control buffer memory under programer side, not device side.

 Do you have a contained microbenchmark? It would be easy to compare
 both approaches... If you do i'll write one using my scheme - so
 far i only did this for inter-thread communication, there's no file
 based backend.

It has a sample benchmark to compare performance with std.stdio for
line iteration.
In my PC, it is 2x faster in maximum.

 In my io library, BufferedSink requires three primitives, flush,
 commit, and writable.

 But what happens if neither flush nor commit is called?

If you forget to call 'commit', 0 length data will be written.
And if you forget to call 'flush', the committed data won't be written
to actual device.

Kenji Hara

May 18 2012

Artur Skawina <art.08.09 gmail.com> writes:

On 05/18/12 17:43, kenji hara wrote:
 2012/5/19 Artur Skawina <art.08.09 gmail.com>:
 On 05/18/12 15:51, kenji hara wrote:
 OK. If reading bytes from underlying device failed, your 'fronts' can
 return empty slice. I understood.
 But, It is still *not efficient*. The returned slice will specifies a
 buffer controlled by underlying device. If you want to gather bytes
 into one chunk, you must copy bytes from returned slice to your chunk.
 We should reduce copying memories as much as possible.

 Depends if your input range supports zero-copy or not. IOW you avoid
 the copy iff the range can somehow write the data directly to the caller
 provided buffer. This can be true eg for file reads, where you can tell
 the read(2) syscall to write into the user buffer. But what if you need to
 buffer the stream? An intermediate buffer can become necessary anyway.
 But, as i said before, i agree that a caller-provided-buffer-interface
 is useful.

   E[] fronts();
   void fronts(ref E[]);

 And one can be implemented in terms of the other, ie:

  E[] fronts[] { E[] els; fronts(els); return els; }
  void fronts(ref E[] e) { e[] = fronts()[]; }

 
 The flaw of your design is, the memory to store read bytes/elements is
 allocated by the lower layer.

It's a feature. :)

 E.g. If you want to construct linked list of some some elements, you
 must copy elements from returned slice to new allocated node. I think
 it is still inefficient.
 
 depending on which is more efficient. A range can provide

  enum bool HasBuffer = 0 || 1;

 so that the user can pick the more suited alternative.

 
 I think fewer primitives as possible is better design than adding
 extra/optional primitives.

If you pick just one scheme, then you will end up with an unnecessary
copy sometimes. Or using non-std APIs. Again, I'm saying *both* caller-
owned-buffer *and* range-owned-buffer interfaces should be defined.
Otherwise, code that needs decent performance will not be able to use
the pure range API, and will not interoperate well with "std" code.

 How many primitives in your interface design?

Multi-element versions of front, popFront and puts. I think this
was enough to get things working; this is the tested and proven part.

Then there's 'available' and 'free', so that it's possible to 
avoid blocking. And 'allocate' and 'release', for zero-copy output
streams. But i don't remember if i've actually used these parts, i
don't think i needed them.
This is all from memory, as the last time i worked on this was a while
ago, just before i ran into:

   http://www.digitalmars.com/d/archives/digitalmars/D/dtors_in_shared_structs_fail_to_compile_157978.html

...

 And, 'put' primitive in output range concept doesn't support non-blocikng
write.
 'put' should consume *all* of given data and write it  to underlying
 device, then it would block.

 True, a write-as-much-as-possible-but not-more primitive is needed.

   size_t puts(E[], size_t atleast=size_t.max);

 or something like that. (Doing it this way allows for explicit
 non-blocking 'puts', ie '(written=puts(els, 0))==0' means EAGAIN.)

 Therefore, whole of range concept doesn't cover non-blocking I/O.


 
 I can agree for the signatures. but the names 'fronts' and 'puts' are
 a little too similar.

The names are bad, i know... If anybody has better suggestions... (and
almost any other names would be better :) )


artur

May 18 2012

Artur Skawina <art.08.09 gmail.com> writes:

On 05/18/12 20:18, Artur Skawina wrote:
 On 05/18/12 17:43, kenji hara wrote:
 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio





 
 It has a sample benchmark to compare performance with std.stdio for
 line iteration.

 
 It's not exactly what i had i mind, but i tried to build it;
 it wants a 'io/wrapper.d' module which can not be found.

And is apparently windows-only; missing HANDLE type, non-
existent TextOutputRange. I gave up after running into:

io/file.d:263: Error: static assert  (isSource!(File)) is false

artur

May 18 2012

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

On Friday, 18 May 2012 at 19:18:21 UTC, Artur Skawina wrote:
 On 05/18/12 20:18, Artur Skawina wrote:
 On 05/18/12 17:43, kenji hara wrote:
 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio





 
 It has a sample benchmark to compare performance with 
 std.stdio for
 line iteration.

 
 It's not exactly what i had i mind, but i tried to build it;
 it wants a 'io/wrapper.d' module which can not be found.

 And is apparently windows-only; missing HANDLE type, non-
 existent TextOutputRange. I gave up after running into:

 io/file.d:263: Error: static assert  (isSource!(File)) is false

Current dio is PoC for new IO design.
If we go with such design, I will add Linux/Mac support to dio.


Masahiro

May 19 2012

Artur Skawina <art.08.09 gmail.com> writes:

On 05/18/12 17:43, kenji hara wrote:
 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio





 It has a sample benchmark to compare performance with std.stdio for
 line iteration.

It's not exactly what i had i mind, but i tried to build it;
it wants a 'io/wrapper.d' module which can not be found.

artur

May 18 2012

kenji hara <k.hara.pg gmail.com> writes:

Sorry, I have updated it.
Run 'make runbench' or 'make runbench_opt'.

Kenji Hara

2012/5/19 Artur Skawina <art.08.09 gmail.com>:
 On 05/18/12 17:43, kenji hara wrote:
 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio





 It has a sample benchmark to compare performance with std.stdio for
 line iteration.

 It's not exactly what i had i mind, but i tried to build it;
 it wants a 'io/wrapper.d' module which can not be found.

 artur

May 19 2012

"Masahiro Nakagawa" <repeatedly gmail.com> writes:

Please add README to top directory.
(Contents are benchmark command, support environment and etc)

We can see such information on web browser ;)

P.S.
I want to do pull request for supporting other environments.
But I'm busy right now...


Masahiro

On Saturday, 19 May 2012 at 15:22:37 UTC, kenji hara wrote:
 Sorry, I have updated it.
 Run 'make runbench' or 'make runbench_opt'.

 Kenji Hara

 2012/5/19 Artur Skawina <art.08.09 gmail.com>:
 On 05/18/12 17:43, kenji hara wrote:
 I'm designing experimental IO primitives:
 https://github.com/9rnsr/dio





 It has a sample benchmark to compare performance with 
 std.stdio for
 line iteration.

 It's not exactly what i had i mind, but i tried to build it;
 it wants a 'io/wrapper.d' module which can not be found.

 artur

May 19 2012

travert phare.normalesup.org (Christophe Travert) writes:

I don't have time to read the whole discussion right now, but I've 
thought since our exchange here about buffered stream. I've imagined 
something close to, but quite different from you buffered stream, where 
the length of the buffer chunk can be adapted, and the buffer be poped 
by an arbitrary amount of bytes:

I reuse the name front, popFront and empty, but it may not be such a 
good idea.

struct BufferedStream(T)
{
  T[] buf;
  size_t cursor;
  size_t decoded;
  InputStream input;

  // returns a slice to the n next elements of the input stream.
  // this slice is valid until next call to front only.
  T[] front(size_t n)
  {
    if (n <= decoded - cursor) return buf[cursor..cursor+n];
    if (n <= buffer.length)
      {
       ... // move data to the front of the buffer and read new data to 
           // fill the buffer.
        return buf[0..n];
      }
    if (n > buf.length)
     {
       ... // resize buffer and read new data to fill the buffer
       return buf[0..n];
     }
  }
  // pop the next n elements from the buffer.
  void popFront(size_t n) { cursor += n; }
  void empty() { return input.eof && cursor == buf.length; }
}

This kind of buffered stream enable you read data by varying chunk size, 
but always read data by an amount that is convenient for the input 
stream. (and front could be made to return a buffer with the size that 
is most adequate for the stream when called with size_t.max as n).

More importantly, it allows to peak at an arbitrary amount of data, use 
it, and decide how many items you want to consume. For example, if 
allows to write stuff like "ReadAWord" without double buffering: you 
get enough characters from the buffer until you find a space, and then 
you consume only the characters that are the space.

"Steven Schveighoffer" , dans le message (digitalmars.D:167733), a
 écrit :
 OK, so I had a couple partially written replies on the 'deprecating
 std.stream etc' thread, then I had to go home.
 
 But I thought about this a lot last night, and some of the things Andrei
 and others are saying is starting to make sense (I know!).  Now I've
 scrapped those replies and am thinking about redesigning my i/o package
 (most of the code can stay intact).
 
 I'm a little undecided on some of the details, but here is what I think
 makes sense:
 
 1. We need a buffering input stream type.  This must have additional
 methods besides the range primitives, because doing one-at-a-time byte
 reads is not going to cut it.
 2. I realized, buffering input stream of type T is actually an input range
 of type T[].  Observe:
 
 struct /*or class*/ buffer(T)
 {
       T[] buf;
       InputStream input;
       ...
        property T[] front() { return buf; }
       void popFront() {input.read(buf);} // flush existing buffer, read  
 next.
        property bool empty() { return buf.length == 0;}
 }
 
 Roughly speaking, not all the details are handled, but this makes a
 feasible input range that will perform quite nicely for things like
 std.algorithm.copy.  I haven't checked, but copy should be able to handle
 transferring a range of type T[] to an output range with element type T,
 if it's not able to, it should be made to work. 

Or with joiner(buffer);

 I know at least, an
 output stream with element type T supports putting T or T[].  What I think
 really makes sense is to support:
 
 buffer!ubyte b;
 outputStream o;
 
 o.put(b); // uses range primitives to put all the data to o, one element
 (i.e. ubyte[]) of b at a time

Of course, output stream should not have a consistent interface with 
input stream.

 3. An ultimate goal of the i/o streaming package should be to be able to
 do this:
 
 auto x = new XmlParser("<rootElement></rootElement>");
 
 or at least
 
 auto x = new XmlParser(buffered("<rootElement></rootElement>"));
 
 So I think arrays need to be able to be treated as a buffering streams.  I
 tried really hard to think of some way to make this work with my existing
 system, but I don't think it will without unnecessary baggage, and losing
 interoperability with existing range functions.

A simple string stream can be built on top of a string, with no 
other member than the string itself, can't it ?
With my definition of buffered stream, at least, it can, and any array 
could support:
T[] front(size_t i) { return this[0..min(i, $)]; }
void popFront(size_t i) { this = this[i..$]; }

-- 
Christophe

May 21 2012

D Programming

C/C++ Programming

Other

digitalmars.D - stream interfaces - with ranges