
digitalmars.D - std.stream replacement

reply "BLM768" <blm768 gmail.com> writes:
While working on a project, I've started to realize that I miss 
streams. If someone's not already working on bringing std.stream 
up to snuff, I think that we should start thinking about how to 
do that.
Of course, with ranges being so popular (with very good reason), 
the new stream interface would probably just be a range wrapper 
around a file; in fact, a decent amount of functionality could be 
implemented by just adding a byChars range to the standard File 
struct and leaving the parsing functionality to std.conv.parse. 
Of course, there's no reason to stop there; we could also add 
socket streams, compressed streams, and just about any other type 
of stream, all without an especially large amount of effort.
Unless someone already wants to tackle the project (or has 
already started), I'd be willing to work out at least a basic 
design and implementation.
Mar 05 2013
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, March 05, 2013 09:14:16 BLM768 wrote:
 While working on a project, I've started to realize that I miss
 streams. If someone's not already working on bringing std.stream
 up to snuff, I think that we should start thinking about to do
 that.
 Of course, with ranges being so popular (with very good reason),
 the new stream interface would probably just be a range wrapper
 around a file; in fact, a decent amount of functionality could be
 implemented by just adding a byChars range to the standard File
 struct and leaving the parsing functionality to std.conv.parse.
 Of course, there's no reason to stop there; we could also add
 socket streams, compressed streams, and just about any other type
 of stream, all without an especially large amount of effort.
 Unless someone already wants to tackle the project (or has
 already started), I'd be willing to work out at least a basic
 design and implementation.

In general, a stream _is_ a range, making a lot of "stream" stuff basically irrelevant. What's needed then is a solid, efficient range interface on top of I/O (which we're lacking at the moment). Steven Schveighoffer was working on std.io (which would be a replacement for std.stdio), and I believe that streams were supposed to be part of that, but I'm not sure. And I don't know quite what std.io's status is at this point, so I have no idea when it'll be ready for review. Steven seems to be very busy these days, so I suspect that it's been a while since much progress was made on it. - Jonathan M Davis
Mar 05 2013
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
05-Mar-2013 20:12, Steven Schveighoffer writes:
 On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:

 On Tuesday, March 05, 2013 09:14:16 BLM768 wrote:
 While working on a project, I've started to realize that I miss
 streams. If someone's not already working on bringing std.stream
 up to snuff, I think that we should start thinking about to do
 that.
 Of course, with ranges being so popular (with very good reason),
 the new stream interface would probably just be a range wrapper
 around a file; in fact, a decent amount of functionality could be
 implemented by just adding a byChars range to the standard File
 struct and leaving the parsing functionality to std.conv.parse.
 Of course, there's no reason to stop there; we could also add
 socket streams, compressed streams, and just about any other type
 of stream, all without an especially large amount of effort.
 Unless someone already wants to tackle the project (or has
 already started), I'd be willing to work out at least a basic
 design and implementation.



 Now, ranges ARE a very good interface for a high level abstraction.  But
 we need a good low-level type to perform the buffering necessary to make
 ranges functional.  std.io is a design that hopefully will fit within
 the existing File type, be compatible with C's printf, and also provides
 a replacement for C's antiquated FILE * buffering stream.  With tests I
 have done, std.io is more efficient and more flexible/powerful than C's
 version.

That's it. C's iobuf stuff and the locks around (f)getc are one reason for it being slower. In D we need no stinkin' locks, as stuff is TLS by default. Plus, as far as I understand your std.io idea, it was focused on filling user-provided buffers directly, without the obligatory double buffering somewhere inside like C does.
 Steven Schveighoffer was working on std.io (which would be a
 replacement for
 std.stdio), and I believe that streams were supposed to be part of
 that, but
 I'm not sure. And I don't know quite what std.io's status is at this
 point, so
 I have no idea when it'll be ready for review. Steven seems to be very
 busy
 these days, so I suspect that it's been a while since much progress
 was made
 on it.

Yes, very busy :) I had taken a break from D for about 3-4 months; I had to work on my side business. Still working like mad there, but I'm carving out as much time as I can for D. std.io has not had much of any progress since I last went through the wringer (and how!) on the forums. It is not dead, but it will take me some time to be able to kick-start it again (read: understand what the hell I was doing there). I do plan to try in the coming months.

Would love to see it progressing towards Phobos inclusion. It's one of the areas where D can easily beat the C runtime, no cheating. -- Dmitry Olshansky
Mar 05 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
05-Mar-2013 22:49, Steven Schveighoffer writes:
 On Tue, 05 Mar 2013 11:43:59 -0500, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:


 That's it.
 C's iobuf stuff and locks around (f)getc are one reason for it being
 slower. In D we need no stinkin' locks as stuff is TLS by default.

 Plus as far as I understand your std.io idea it was focused around
 filling up user-provided buffers directly without obligatory double
 buffering somewhere inside like C does.

You are right about the locking, though shared streams like stdout will need to be locked (this is actually one of the more difficult parts to do, and I haven't done it yet; shared is a pain to work with. The current File struct cheats with casting; I think I will have to do something like that).

But at least these are already shared :) In fact, shared is meant to be a pain in the ass (but I agree it should get some more convenience). The key point is that shared should have been the user's problem. But writeln and its ilk are too darn common, so some locking scheme has to be baked in to ease the pain.
 File does a pretty good job of locking for an
 entire operation (i.e. an entire writeln/readf).

I just hope it doesn't call internally locking C functions after that...
 C iobuf I think tries to avoid double buffering for some things (e.g.
 gcc's getline), but std.io takes that to a new level.

Yeah, AFAIK it translates calls for, say, a few megabytes of data into direct read/write OS syscalls. Hard to say how reliable their heuristics are.
 With std.io you have SAFE access directly to the buffer.  So instead of
 getline being "read directly into my buffer, or copy into my buffer",
 it's "make sure there is a complete line in the file buffer, then give
 me a slice to it".  What's great about this is, you don't need to hack
 phobos to get buffer access like you need to hack C's stream to get
 buffer access to create something like getline.  So many more
 possibilities exist.

 So things like parsing xml files need no double buffering at all, AND
 you don't even have to provide a buffer!

Slicing the internal buffer is real darn nice. Hard to stress it enough ;)

There is one abstraction I found nice while helping out on D's lexer, which I call a mark-slice range. It seems to be an extension of a forward range.

It's all about buffering and defining a position in the input such that you don't care about anything before it. Marking a point means that everything starting from it needs to be kept in the buffer, while everything prior to it can be discarded. The second operation, "slice", gets a slice of the internal buffer from the last mark to the current position.

Would be interesting to see how it correlates with buffered I/O in std.io; what you say so far fits the bill.
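[A minimal sketch of the mark-slice idea described above, not from the thread. The names "mark" and "slice" are from the post; the struct and its layout are hypothetical, and for brevity the whole input sits in one array rather than a refillable buffer.]

```d
import std.stdio;

// Hypothetical mark-slice range: everything before the mark may be
// discarded by the buffering layer; slice() returns the buffered data
// from the last mark to the current read position.
struct MarkSliceRange
{
    ubyte[] buf;      // internal buffer (here: the whole input, for simplicity)
    size_t pos;       // current read position
    size_t marked;    // last mark

    bool empty() const { return pos >= buf.length; }
    ubyte front() const { return buf[pos]; }
    void popFront() { ++pos; }

    void mark() { marked = pos; }                  // data before this may be dropped
    const(ubyte)[] slice() { return buf[marked .. pos]; } // mark .. current
}

void main()
{
    auto r = MarkSliceRange(cast(ubyte[]) "int x;".dup);
    r.mark();
    while (!r.empty && r.front != ' ')
        r.popFront();                              // lex one token
    writeln(cast(const(char)[]) r.slice());        // prints "int"
}
```

A lexer built on this never copies token text; it just marks before a token and slices after it.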
 Note that it is still possible to provide a buffer, in case that is what
 you want to do, and it will only copy any data already in the stream
 buffer.

So if I use my own buffers exclusively, there is nothing to worry about (no copy-this, copy-that)?
 Everything else is read directly in (I have some heuristics to
 try and prevent tiny reads, so if you want to say read 4 bytes, it will
 first fill the stream buffer, then copy 4 bytes).

This seems a bit like the C one, if it's a smart libc. What if instead you read more than requested into the target buffer (if it fits)? You can tweak the definition of read to say "buffer no less than X bytes, the actual amount is returned" :)

And if one wants the direct and dumb way of "get me these 4 bytes" - just let them provide a fixed buffer of 4 bytes in total; then std.io can't read more than that. (Could be useful to bench the OS I/O layer and such.)

Another consequence is that std.io wouldn't need to allocate an internal buffer eagerly for tiny reads (in case they actually show up). -- Dmitry Olshansky
Mar 05 2013
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 03/05/2013 08:12 PM, Dmitry Olshansky wrote:
 ...

 There is one thing I found a nice abstraction while helping out on D's
 lexer in D and I call it mark-slice range. An extension to forward range
 it seems.

 It's all about buffering and defining a position in input such that you
 don't care for anything up to this point. This means that starting from
 thusly marked point stuff needs to be kept in buffer, everything prior
 to it could be discarded. The 2nd operation "slice" is getting a slice
 of some internal buffer from last mark to the current position.
 ...

The lexer I built last year does something similar. It allows the parser to save and restore sorted positions in FIFO order with one size_t of memory inside the parser's current stack frame (internally, the lexer only saves the first position). The data is kept in a circular buffer that grows dynamically in case the required lookahead is too large.
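[A rough sketch of the save/restore scheme described above, not from the thread. All names are hypothetical, and a plain growable array stands in for the dynamically growing circular buffer of the real lexer.]

```d
// Hypothetical FIFO save/restore over a lexer buffer: the parser saves a
// position (one size_t in its stack frame), and the lexer only needs to
// retain data from the oldest outstanding save point onward.
struct Lookahead
{
    dchar[] buf;       // buffered input; grows if lookahead is too large
    size_t cur;        // current read position
    size_t[] saves;    // outstanding save points, oldest first (FIFO)

    size_t save()                  // costs one size_t for the caller
    {
        saves ~= cur;
        return cur;
    }

    void restore(size_t p)         // rewind to a previously saved position
    {
        cur = p;
        saves = saves[1 .. $];     // FIFO: release the oldest save point
    }

    size_t retainFrom() const      // everything before this may be dropped
    {
        return saves.length ? saves[0] : cur;
    }
}
```

The retainFrom() value is what lets the buffering layer recycle space: anything before the oldest save point can never be rewound to again.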
Mar 06 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
07-Mar-2013 00:52, Timon Gehr writes:
 On 03/05/2013 08:12 PM, Dmitry Olshansky wrote:
 ...

 There is one thing I found a nice abstraction while helping out on D's
 lexer in D and I call it mark-slice range. An extension to forward range
 it seems.

 It's all about buffering and defining a position in input such that you
 don't care for anything up to this point. This means that starting from
 thusly marked point stuff needs to be kept in buffer, everything prior
 to it could be discarded. The 2nd operation "slice" is getting a slice
 of some internal buffer from last mark to the current position.
 ...

The lexer I have built last year does something similar. It allows the parser to save and restore sorted positions in FIFO order with one size_t of memory inside the parser's current stack frame (internally, the lexer only saves the first position). The data is kept in a circular buffer that grows dynamically in case the required lookahead is too large.

Exactly. Nice to see common patterns resurface; it would be good to fit this elegantly into a native D I/O subsystem. -- Dmitry Olshansky
Mar 06 2013
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 06/03/2013 16:36, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> wrote:

 Create a range operation like "r.takeArray(n)". You can optimize it to
 take a slice of the buffer when possible.

This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.

That certain specific types of range can't implement a given operation efficiently isn't a reason to reject the idea. If somebody tries using takeArray on a range that by its very nature can only pick off elements one by one, they should expect it to be as slow as a for loop. OTOH, when used on a file, array or similar structure, it will perform much better than this.

But thinking about it now, maybe what we need is the concept of a "block input" range, which is an input range with the addition of the takeArray method. Of course, standard D arrays would be block input ranges. Then (for example) a library that reads a binary file format can be built to accept a block input range of bytes.

Stewart.
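[A sketch of what such a "block input" range might look like for an array source, not from the thread. The takeArray name comes from the discussion; the struct and its details are hypothetical.]

```d
// Hypothetical "block input" range: the usual input range primitives
// plus a takeArray(n) block operation. For an array source, takeArray
// is a zero-copy slice rather than a per-element loop.
struct ArrayBlockInput
{
    const(ubyte)[] data;

    bool empty() const { return data.length == 0; }
    ubyte front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }

    // Consume up to n elements at once; returns a slice of the source.
    const(ubyte)[] takeArray(size_t n)
    {
        auto len = n < data.length ? n : data.length;
        auto result = data[0 .. len];
        data = data[len .. $];
        return result;
    }
}
```

A binary-format reader written against this interface gets slicing speed on arrays while still working, element by element, on any plain input range wrapped to provide a fallback takeArray.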
Mar 06 2013
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 07/03/2013 12:07, Steven Schveighoffer wrote:
<snip>
 I don't really understand the need to make ranges into streams.

Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream. Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O. Stewart.
Mar 08 2013
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 09/03/2013 02:30, Jonathan M Davis wrote:
<snip>
 In general, ranges should work just fine for I/O as long as they have an
 efficient implementation which buffers underneath (and preferably makes them
 forward ranges). Aside from how its implemented internally, there's no real
 difference between operating on a range over a file and any other range. The
 trick is making it efficient internally. Doing something like reading a
 character at a time from a file every time that popFront is called would be
 horrible, but with buffering, it should be just fine.

If examining one byte at a time is what you want. I mean this at the program logic level, not just the implementation level. The fact remains that most applications want to look at bigger portions of the file at a time.

ubyte[] data;
data.length = 100;
foreach (ref b; data) {
    b = file.front;
    file.popFront();
}

Even with buffering, a block memory copy is likely to be more efficient than transferring each byte individually like this. You could provide direct memory access to the buffer, but this creates further complications if you want to read variable-size chunks. Further variables that affect the best way to do it include whether you want to keep hold of previously read chunks and whether you want to perform in-place modifications of the read-in data.
 Now, you're not going to
 get a random-access range that way, but it should work fine as a forward range,
 and std.mmfile will probably give you what you want if an RA range is what you
 really need (and that, we have already).

Yes, random-access file I/O is another thing. I was thinking primarily of cases where you want to just read the file through and process it while doing so. I imagine that most word processors, graphics editors, etc. will read the file and then generate the file afresh when you save, rather than just writing the changes to the file. And then there are web browsers, which read files of various types both from the user's local file storage and over an HTTP connection. Stewart.
Mar 09 2013
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 10/03/2013 15:48, Marco Leise wrote:
<snip>
 For most binary formats you need to deal with endianness for
 short/int/long

Endian conversion is really part of decoding the data, rather than of reading the file. As such, it should be a layer over the raw file I/O API/implementation. And probably as often as not, you want to read in or write out a struct that includes some multi-byte numerical values, e.g. an image file header which has width, height, colour type, bit depth, possibly a few other parameters such as compression or interlacing, and not all of which will be integers of the same size. ISTM the most efficient way to do this is to read the block of bytes from the file, and then do the byte-order conversions in the file-format-specific code.
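[As an illustration of that layering, not from the thread: a sketch using Phobos' std.bitmanip.read to do the byte-order conversion after the raw block has been read. The ImageHeader layout is made up for the example.]

```d
import std.bitmanip : read;
import std.system : Endian;

// Hypothetical 9-byte big-endian image header: width, height, bit depth.
struct ImageHeader
{
    uint width;
    uint height;
    ubyte bitDepth;
}

// Decode a header from a block of bytes already read from the file.
// The I/O layer just delivers raw bytes; the endian conversion happens
// here, in the format-specific code, as a layer above the stream.
ImageHeader decodeHeader(const(ubyte)[] raw)
{
    ImageHeader h;
    h.width    = raw.read!(uint, Endian.bigEndian);  // consumes 4 bytes
    h.height   = raw.read!(uint, Endian.bigEndian);  // consumes 4 bytes
    h.bitDepth = raw.read!ubyte;                     // consumes 1 byte
    return h;
}
```

The stream API only needs "give me N raw bytes"; everything about field widths and byte order stays in decodeHeader.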
 and blocks of either fixed size or with two versions (e.g. a revised
 extended bitmap header) or altogether dynamic size.

Yes, that's exactly why we have in std.stream a method that reads a number of bytes specified at runtime, and why it is a fundamental part of any stream API that is designed to work on binary files.
 Some formats may also require reading the
 last bytes first, like ID3 tags in MP3s.

Do you mean ID3 data is stored backwards in MP3 files? Still, that's half the reason that file streams tend to be seekable.
 And then there are compressed formats with data types of < 8 bits or
 dynamic bit allocations.

But:
- it's a very specialised application
- I would expect most compressed file formats to still have byte-level structure
- implementing this would be complicated given bit-order considerations and the way that the OS (and possibly even the hardware) manipulates files

As such, this should be implemented as a layer over the raw stream API.
 It's all obvious, but I had a feeling your use cases are too
 restricted.

The cases I've covered are the cases that seem to me to be what should be covered by a general-purpose stream API. Stewart.
Mar 10 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-12-14 15:53, Steven Schveighoffer wrote:

 I realize this is really old, and I sort of dropped off the D cliff
 because all of a sudden I had 0 extra time.

 But I am going to get back into working on this (if it's still an issue,
 I still need to peruse the NG completely to see what has happened in the
 last few months).

Yeah, it still needs to be replaced. In this case you can have a look at the review queue to see what's being worked on: http://wiki.dlang.org/Review_Queue -- /Jacob Carlborg
Dec 14 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Tuesday, March 05, 2013 09:14:16 BLM768 wrote:
 While working on a project, I've started to realize that I miss
 streams. If someone's not already working on bringing std.stream
 up to snuff, I think that we should start thinking about to do
 that.
 Of course, with ranges being so popular (with very good reason),
 the new stream interface would probably just be a range wrapper
 around a file; in fact, a decent amount of functionality could be
 implemented by just adding a byChars range to the standard File
 struct and leaving the parsing functionality to std.conv.parse.
 Of course, there's no reason to stop there; we could also add
 socket streams, compressed streams, and just about any other type
 of stream, all without an especially large amount of effort.
 Unless someone already wants to tackle the project (or has
 already started), I'd be willing to work out at least a basic
 design and implementation.

In general, a stream _is_ a range, making a lot of "stream" stuff basically irrelevant. What's needed then is a solid, efficient range interface on top of I/O (which we're lacking at the moment).

This is not correct. A stream is a good low-level representation of i/o. A range is a good high-level abstraction of that i/o. We need both.

Ranges make terrible streams for two reasons:

1. r.front does not have room for 'read n bytes'. Making it do that is awkward (e.g. r.nextRead = 20; r.front; // read 20 bytes)
2. ranges have separate operations for getting data and progressing data. Streams by their very nature combine the two in one operation (i.e. read)

Now, ranges ARE a very good interface for a high level abstraction. But we need a good low-level type to perform the buffering necessary to make ranges functional. std.io is a design that hopefully will fit within the existing File type, be compatible with C's printf, and also provides a replacement for C's antiquated FILE * buffering stream. With tests I have done, std.io is more efficient and more flexible/powerful than C's version.
 Steven Schveighoffer was working on std.io (which would be a replacement  
 for
 std.stdio), and I believe that streams were supposed to be part of that,  
 but
 I'm not sure. And I don't know quite what std.io's status is at this  
 point, so
 I have no idea when it'll be ready for review. Steven seems to be very  
 busy
 these days, so I suspect that it's been a while since much progress was  
 made
 on it.

Yes, very busy :) I had taken a break from D for about 3-4 months; I had to work on my side business. Still working like mad there, but I'm carving out as much time as I can for D.

std.io has not had much of any progress since I last went through the wringer (and how!) on the forums. It is not dead, but it will take me some time to be able to kick-start it again (read: understand what the hell I was doing there). I do plan to try in the coming months.

-Steve
Mar 05 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Mar 2013 11:43:59 -0500, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:


 That's it.
 C's iobuf stuff and locks around (f)getc are one reason for it being  
 slower. In D we need no stinkin' locks as stuff is TLS by default.

 Plus as far as I understand your std.io idea it was focused around  
 filling up user-provided buffers directly without obligatory double  
 buffering somewhere inside like C does.

You are right about the locking, though shared streams like stdout will need to be locked (this is actually one of the more difficult parts to do, and I haven't done it yet; shared is a pain to work with. The current File struct cheats with casting; I think I will have to do something like that). File does a pretty good job of locking for an entire operation (i.e. an entire writeln/readf).

C iobuf I think tries to avoid double buffering for some things (e.g. gcc's getline), but std.io takes that to a new level.

With std.io you have SAFE access directly to the buffer. So instead of getline being "read directly into my buffer, or copy into my buffer", it's "make sure there is a complete line in the file buffer, then give me a slice to it". What's great about this is, you don't need to hack phobos to get buffer access like you need to hack C's stream to get buffer access to create something like getline. So many more possibilities exist.

So things like parsing xml files need no double buffering at all, AND you don't even have to provide a buffer!

Note that it is still possible to provide a buffer, in case that is what you want to do, and it will only copy any data already in the stream buffer. Everything else is read directly in (I have some heuristics to try and prevent tiny reads, so if you want to say read 4 bytes, it will first fill the stream buffer, then copy 4 bytes).

-Steve
Mar 05 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Mar 2013 14:12:58 -0500, Dmitry Olshansky
<dmitry.olsh gmail.com> wrote:

 05-Mar-2013 22:49, Steven Schveighoffer writes:

 Everything else is read directly in (I have some heuristics to
 try and prevent tiny reads, so if you want to say read 4 bytes, it will
 first fill the stream buffer, then copy 4 bytes).

 This seems a bit like the C one, if it's a smart libc. What if instead you
 read more than requested into the target buffer (if it fits)? You can tweak
 the definition of read to say "buffer no less than X bytes, the actual
 amount is returned" :)

 And if one wants the direct and dumb way of "get me these 4 bytes" - just
 let them provide a fixed buffer of 4 bytes in total; then std.io can't
 read more than that. (Could be useful to bench the OS I/O layer and such.)
 Another consequence is that std.io wouldn't need to allocate an internal
 buffer eagerly for tiny reads (in case they actually show up).

The way I devised it is a "processor" delegate. Basically, you provide a delegate that says "yep, this is enough". While it's not enough, it keeps extending and filling the extended buffer.

Which buffer is used is your call; if you want it to use its internal buffer, then it will, extending as necessary (I currently only use D arrays and built-in appending/extending).

Here is a very simple readline implementation (only supports '\n', only supports UTF8; the real version supports much more):

const(char)[] readline(InputStream input)
{
    size_t checkLine(const(ubyte)[] data, size_t start)
    {
        foreach(size_t i; start..data.length)
            if(data[i] == '\n')
                return i+1; // consume this many bytes
        return size_t.max; // no eol found yet.
    }
    auto result = cast(const(char)[]) input.readUntil(&checkLine);
    if(result.length && result[$-1] == '\n')
        result = result[0..$-1];
    return result;
}

Note that I don't have to care about management of the return value; it is handled for me by the input stream. If the user intends to save it for later, he can make a copy. If not, just process it and move on to the next line.

There is also an appendUntil function which takes an already existing buffer and appends to it.

Also note that I have a shortcut for what is probably a very common requirement -- read until a delimiter is found. That version accepts either a single ubyte or a ubyte array. I just showed the above for effect.

input.readUntil('\n');

also will work (for utf-8 streams).

-Steve
Mar 05 2013
prev sibling next sibling parent "BLM768" <blm768 gmail.com> writes:
On Tuesday, 5 March 2013 at 16:12:24 UTC, Steven Schveighoffer 
wrote:
 On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis 
 <jmdavisProg gmx.com> wrote:

 In general, a stream _is_ a range, making a lot of "stream" 
 stuff basically
 irrelevant. What's needed then is a solid, efficient range 
 interface on top of
 I/O (which we're lacking at the moment).

This is not correct. A stream is a good low-level representation of i/o. A range is a good high-level abstraction of that i/o.

Ranges aren't necessarily higher- or lower-level than streams; they're completely orthogonal ways of looking at a data source. It's completely possible to build a stream interface on top of a range of characters, which is what I was suggesting. In that situation, the range is at a lower level of abstraction than the stream is.
 Ranges make terrible streams for two reasons:

 1. r.front does not have room for 'read n bytes'.  Making it do 
 that is awkward (e.g. r.nextRead = 20; r.front; // read 20 
 bytes)

Create a range operation like "r.takeArray(n)". You can optimize it to take a slice of the buffer when possible.
 2. ranges have separate operations for getting data and 
 progressing data.  Streams by their very nature combine the two 
 in one operation (i.e. read)

Range operations like std.conv.parse implicitly progress their source ranges. For example:

auto stream = file.byChars;
while(!stream.empty) {
    doSomethingWithInt(stream.parse!int);
}

Except for the extra ".byChars", it's just as concise as any other stream, and it's more flexible than something that *only* provides a stream interface. It also saves some duplication of effort; everything can lean on std.conv.parse.

Besides, streams don't necessarily progress the data; C++ iostreams have peek(), after all.

From what I see, at least in terms of the interface, a stream is basically just a generalization of a range that supports more than one type as input/output. There's no reason that such a system couldn't be built on top of a range, especially when the internal representation is of a single type: characters.
Mar 05 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> wrote:

 On Tuesday, 5 March 2013 at 16:12:24 UTC, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis  
 <jmdavisProg gmx.com> wrote:

 In general, a stream _is_ a range, making a lot of "stream" stuff  
 basically
 irrelevant. What's needed then is a solid, efficient range interface  
 on top of
 I/O (which we're lacking at the moment).

This is not correct. A stream is a good low-level representation of i/o. A range is a good high-level abstraction of that i/o.

Ranges aren't necessarily higher- or lower-level than streams; they're completely orthogonal ways of looking at a data source. It's completely possible to build a stream interface on top of a range of characters, which is what I was suggesting. In that situation, the range is at a lower level of abstraction than the stream is.

I think you misunderstand. Ranges absolutely can be a source for streams, especially if they are arrays. The point is that the range *interface* doesn't make a good stream interface. So we need to invent new methods to access streams.
 Ranges make terrible streams for two reasons:

 1. r.front does not have room for 'read n bytes'.  Making it do that is  
 awkward (e.g. r.nextRead = 20; r.front; // read 20 bytes)

Create a range operation like "r.takeArray(n)". You can optimize it to take a slice of the buffer when possible.

This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.

On top of that, in some cases the result will be a slice, and in some cases it will be a copy. Generic code will have to figure out that difference if it wants to save the data for later, or else risk double copying.
 2. ranges have separate operations for getting data and progressing  
 data.  Streams by their very nature combine the two in one operation  
 (i.e. read)

Range operations like std.conv.parse implicitly progress their source ranges.

That's not a range operation. Range operations are empty, popFront, front. Anything built on top of ranges must use ONLY these three operations, otherwise you are talking about something else. It is possible to use random-access ranges for a valid stream source. But that is not a valid stream interface, streams aren't random-access ranges.
 Besides, streams don't necessarily progress the data; C++ iostreams have  
 peek(), after all.

That is because the data is buffered. At a low-level, we have to deal with the OS, which may not support peeking.
  From what I see, at least in terms of the interface, a stream is  
 basically just a generalization of a range that supports more than one  
 type as input/output. There's no reason that such a system couldn't be  
 built on top of a range, especially when the internal representation is  
 of a single type: characters.

streams shouldn't have to support the front/popFront mechanism. empty may be the only commonality. I think that is an awkward fit for ranges. Certainly it is possible to take a *specific* range, such as an array, and add a stream-like interface to it. But not ranges in general. -Steve
Mar 06 2013
prev sibling next sibling parent "BLM768" <blm768 gmail.com> writes:
On Wednesday, 6 March 2013 at 16:36:38 UTC, Steven Schveighoffer 
wrote:
 On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> 
 wrote:
 Ranges aren't necessarily higher- or lower-level than streams; 
 they're completely orthogonal ways of looking at a data 
 source. It's completely possible to build a stream interface 
 on top of a range of characters, which is what I was 
 suggesting. In that situation, the range is at a lower level 
 of abstraction than the stream is.

I think you misunderstand. Ranges absolutely can be a source for streams, especially if they are arrays. The point is that the range *interface* doesn't make a good stream interface. So we need to invent new methods to access streams.

Although I probably didn't communicate it very well, my idea was that since we already have functions like std.conv.parse that essentially provide parts of a stream interface on top of ranges, the most convenient way to implement a stream might be to build it on top of a range interface so no code duplication is needed.
 Create a range operation like "r.takeArray(n)". You can 
 optimize it to take a slice of the buffer when possible.

This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.

If the function is optimized, it can essentially bypass the range layer and operate directly on the buffer while using the same interface it would use if it were operating on the range. As I understand it, some of the operations in Phobos do that as well when given arrays.
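As an illustration of that kind of dispatch (a sketch only; takeArray is a hypothetical name from this thread, not a Phobos function), a generic operation can special-case arrays and return a slice instead of copying element by element:

```d
import std.range.primitives : isInputRange, ElementType;
import std.algorithm.comparison : min;

// Hypothetical takeArray: generic over input ranges, but arrays get
// a zero-copy fast path by slicing the underlying buffer.
E[] takeArray(R, E = ElementType!R)(ref R r, size_t n)
    if (isInputRange!R)
{
    static if (is(R : E[]))
    {
        auto len = min(n, r.length);
        auto result = r[0 .. len];   // slice of the source, no copy
        r = r[len .. $];             // consume the taken elements
        return result;
    }
    else
    {
        E[] result;
        result.reserve(n);
        foreach (i; 0 .. n)          // element-at-a-time fallback
        {
            if (r.empty) break;
            result ~= r.front;
            r.popFront();
        }
        return result;
    }
}
```

The slow path is exactly the for loop Stewart mentions; the array path performs no copying at all.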
 On top of that, in some cases, the result will be a slice, in 
 some cases it will be a copy.  Generic code will have to figure 
 out that difference if it wants to save the data for later, or 
 else risk double copying.

That could definitely be an issue. It should be possible to enforce slicing semantics somehow, but I'd have to think about it.
 Range operations like std.conv.parse implicitly progress their 
 source ranges.

That's not a range operation. Range operations are empty, popFront, front. Anything built on top of ranges must use ONLY these three operations, otherwise you are talking about something else.

I guess that's not the right terminology for what I'm trying to express. I was thinking of "operations that act on ranges."
 From what I see, at least in terms of the interface, a stream 
 is basically just a generalization of a range that supports 
 more than one type as input/output. There's no reason that 
 such a system couldn't be built on top of a range, especially 
 when the internal representation is of a single type: 
 characters.

streams shouldn't have to support the front/popFront mechanism. empty may be the only commonality. I think that is an awkward fit for ranges. Certainly it is possible to take a *specific* range, such as an array, and add a stream-like interface to it. But not ranges in general.

I hadn't considered the case of r.front; I was only thinking about r.popFront. Looks like they're a little more different than I was thinking, but they're still very similar under certain conditions.

Ultimately, we do need some type of a traditional stream interface; I was just thinking about using ranges behind the scenes and using existing pieces of the standard library for stream operations rather than putting all of the operations into a unified data type. I'm not sure if it could really be called an "ideal" design, but I do think that it could provide a good minimalist solution with performance that would be acceptable for at least many applications.
Mar 06 2013
prev sibling next sibling parent "BLM768" <blm768 gmail.com> writes:
 That certain specific types of range can't implement a given 
 operation efficiently isn't a reason to reject the idea.

 If somebody tries using takeArray on a range that by its very 
 nature can only pick off elements one by one, they should 
 expect it to be as slow as a for loop.  OTOH, when used on a 
 file, array or similar structure, it will perform much better 
 than this.

 But thinking about it now, maybe what we need is the concept of 
 a "block input" range, which is an input range with the 
 addition of the takeArray method.  Of course, standard D arrays 
 would be block input ranges.  Then (for example) a library that 
 reads a binary file format can be built to accept a block input 
 range of bytes.

 Stewart.

That's basically what my thinking was, but you've expressed it in a better way than I think I could have. I'd definitely like to see this idea implemented; it could be useful for just about anything involving a buffer.
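A rough sketch of what such a "block input" range might look like when wrapping an array (all names hypothetical): the usual input-range primitives, plus a bulk takeArray that hands back a slice without copying.

```d
import std.algorithm.comparison : min;

// A wrapper giving an array the "block input range" interface:
// the three input-range primitives plus a bulk takeArray method.
struct BlockInput(T)
{
    T[] data;

    @property bool empty() const { return data.length == 0; }
    @property T front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }

    // Bulk operation: hand back up to n elements as a slice (no copy).
    T[] takeArray(size_t n)
    {
        auto len = min(n, data.length);
        auto result = data[0 .. len];
        data = data[len .. $];
        return result;
    }
}
```

A file-backed version would implement the same takeArray signature over its internal buffer, so a binary-format reader could accept either interchangeably.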
Mar 06 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 06 Mar 2013 19:08:40 -0500, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:

 On 06/03/2013 16:36, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> wrote:

 Create a range operation like "r.takeArray(n)". You can optimize it to
 take a slice of the buffer when possible.

This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.

That certain specific types of range can't implement a given operation efficiently isn't a reason to reject the idea.

Sorry, but that is. If we make it so streams are implicitly built out of low-performance ranges, they will be built out of low-performance ranges. There is always a mechanism to build a stream out of a range; it shouldn't be implicit. Not every range makes a good stream.
 But thinking about it now, maybe what we need is the concept of a "block  
 input" range, which is an input range with the addition of the takeArray  
 method.  Of course, standard D arrays would be block input ranges.  Then  
 (for example) a library that reads a binary file format can be built to  
 accept a block input range of bytes.

I don't really understand the need to make ranges into streams. Streams require a completely separate interface. An object can be both a range and a stream (e.g. an array), but to say a stream is a specific kind of range, when ranges have nothing significant that streams need (front, popFront), is just "range fever". Not everything is a range.

The range interface and the stream interface are orthogonal. There is no overlap.

-Steve
Mar 07 2013
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Thu, 07 Mar 2013 07:07:25 -0500
schrieb "Steven Schveighoffer" <schveiguy yahoo.com>:

 
 The range interface and the stream interface are orthogonal.  There
 is no overlap.
 
 -Steve

Exactly. There's also some precedent for this: C# enumerators (IEnumerator) are basically the same thing as input ranges (current/front, moveNext/popFront), but C# has streams nevertheless. C++ also has iterators and streams.
Mar 07 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 06 Mar 2013 20:15:31 -0500, BLM768 <blm768 gmail.com> wrote:

 On Wednesday, 6 March 2013 at 16:36:38 UTC, Steven Schveighoffer wrote:
 On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 gmail.com> wrote:
 Ranges aren't necessarily higher- or lower-level than streams; they're  
 completely orthogonal ways of looking at a data source. It's  
 completely possible to build a stream interface on top of a range of  
 characters, which is what I was suggesting. In that situation, the  
 range is at a lower level of abstraction than the stream is.

I think you misunderstand. Ranges absolutely can be a source for streams, especially if they are arrays. The point is that the range *interface* doesn't make a good stream interface. So we need to invent new methods to access streams.

Although I probably didn't communicate it very well, my idea was that since we already have functions like std.conv.parse that essentially provide parts of a stream interface on top of ranges, the most convenient way to implement a stream might be to build it on top of a range interface so no code duplication is needed.

My point is, we should not build streams from ranges. We have to establish terminology here.

A range is an API which provides a way to iterate over each element in a source using the methods front, popFront, and empty.

A basic stream provides a single function: read. This function reads N bytes into an array and advances the stream position. Not a range, an array. That is the basic building block that the OS gives us. You can make read out of front, popFront, and empty, but it's going to be horribly low-performing, and I see no benefit to having read sit alongside the range primitives.

On top of that, we provide a buffered stream which manages the array the lower-level stream outputs, and allows access to data a chunk at a time. What defines that chunk is application-specific.

At a higher level is where ranges and streams meet. front can provide access to a chunk, popFront can move on to the next chunk, and empty maps to EOF (the last read returned 0 bytes). That is a great mapping, and I expect it will be the preferred interface. What I want to provide with std.io is an easy way to build ranges on top of streams by defining a mechanism to build the chunk.

But to say that streams are ranges at heart is incorrect. Streams need the read feature; they don't need range features. Now, if you want to shoehorn a range into a stream, I certainly can see how it will be possible. Extremely slow, but possible. That should be the last resort. It shouldn't be the foundation.

There is the temptation to say "hey, arrays are ranges, and arrays make good stream sources! Why can't all ranges make good stream sources?" But arrays are good stream sources NOT because they are ranges, but because they are arrays. Reading an array into an array is a no-op.
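A sketch of that layering, with all names hypothetical: a raw read() primitive at the bottom, and a chunk range on top whose front/popFront/empty map onto buffered reads (an in-memory source stands in for a file here).

```d
import std.algorithm.comparison : min;

// Bottom layer: the one primitive the OS really gives us.
interface RawStream
{
    // Reads up to buf.length bytes, returns the number read (0 at EOF).
    size_t read(ubyte[] buf);
}

// An in-memory stand-in for a file, for illustration.
class MemoryStream : RawStream
{
    ubyte[] data;
    this(ubyte[] d) { data = d; }
    size_t read(ubyte[] buf)
    {
        auto n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Top layer: a range of chunks. empty maps to "last read returned 0".
struct ChunkRange
{
    RawStream source;
    ubyte[] buffer;
    ubyte[] current;   // the chunk front() exposes

    this(RawStream source, size_t chunkSize)
    {
        this.source = source;
        buffer = new ubyte[chunkSize];
        popFront();    // prime the first chunk
    }

    @property bool empty() const { return current.length == 0; }
    @property const(ubyte)[] front() const { return current; }

    void popFront()
    {
        auto n = source.read(buffer);
        current = buffer[0 .. n];   // empty slice at EOF
    }
}
```

Note that front returns a slice into the internal buffer, so it is invalidated by the next popFront; that is the kind of buffering policy the higher layer would have to document.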
 Create a range operation like "r.takeArray(n)". You can optimize it to  
 take a slice of the buffer when possible.

This is not a good idea. We want streams to be high performance. Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.

If the function is optimized, it can essentially bypass the range layer and operate directly on the buffer while using the same interface it would use if it were operating on the range. As I understand it, some of the operations in Phobos do that as well when given arrays.

This is the wrong track to take. There have been quite a few people in the D community who have advocated for the syntax:

    int[] arr;
    auto p = 5 in arr;

Just like AAs. It looks great! Why shouldn't we have a way to search for data with such a concise interface?

The problem is that it diminishes the value of 'in'. For AAs, this lookup is O(1) amortized; for an array, it's O(n). This means any time a coder sees x in y, he has to consider whether that is a "slow lookup" or a "quick lookup". Not only that, but generic code that uses the in operation has to insert caveats: "this function is O(n) if T is an array, otherwise it's O(1)". The situation is not something we want.

But if you still want to find 5 in arr, there is the not-as-nice, but certainly reasonable looking:

    auto p = arr.find(5).ptr;

My point is, we don't want any range to substitute for a stream. I think it might be worth considering accepting random-access ranges, or slice-assignable ranges, to be stream sources, but not just any range.

We could provide a "RangeStream" type which shoehorns any range into a stream, but I'd want it tucked in some shadowy corner of Phobos, not to be used except in emergencies when nothing else will do. It should be discouraged.
 Range operations like std.conv.parse implicitly progress their source  
 ranges.

That's not a range operation. Range operations are empty, popFront, front. Anything built on top of ranges must use ONLY these three operations, otherwise you are talking about something else.

I guess that's not the right terminology for what I'm trying to express. I was thinking of "operations that act on ranges."

What I don't want is to accept ranges as streams. For example, if we have an isInputStream trait, it should not accept ranges. But you certainly can use existing phobos functions to shoehorn ranges into a stream-like API.
 Ultimately, we do need some type of a traditional stream interface; I  
 was just thinking about using ranges behind the scenes and using  
 existing pieces of the standard library for stream operations rather  
 than putting all of the operations into a unified data type. I'm not  
 sure if it could really be called an "ideal" design, but I do think that  
 it could provide a good minimalist solution with performance that would  
 be acceptable for at least many applications.

I hope my above comments have made clear that I am not against having ranges be forcibly changed into streams. What I don't want is ranges implicitly treated as streams. Certainly, we have a lot of existing range-processing code that could be leveraged. But streams and ranges are different concepts, different APIs even. Building bridges between the two should be possible, and ranges will make great interfaces to streams. -Steve
Mar 07 2013
prev sibling next sibling parent "BLM768" <blm768 gmail.com> writes:
On Thursday, 7 March 2013 at 12:42:23 UTC, Steven Schveighoffer 
wrote:
 If the function is optimized, it can essentially bypass the 
 range layer and operate directly on the buffer while using the 
 same interface it would use if it were operating on the range. 
 As I understand it, some of the operations in Phobos do that 
 as well when given arrays.

This is the wrong track to take. There have been quite a few people in the D community who have advocated for the syntax:

    int[] arr;
    auto p = 5 in arr;

Just like AAs. It looks great! Why shouldn't we have a way to search for data with such a concise interface?

The problem is that it diminishes the value of 'in'. For AAs, this lookup is O(1) amortized; for an array, it's O(n). This means any time a coder sees x in y, he has to consider whether that is a "slow lookup" or a "quick lookup". Not only that, but generic code that uses the in operation has to insert caveats: "this function is O(n) if T is an array, otherwise it's O(1)". The situation is not something we want.

Maybe "takeArray" is a bad design, but it was just an example. The "block input"/"slice-assignable" range idea would still work well, though.
 We could provide a "RangeStream" type which shoehorns any range 
 into a stream, but I'd want it tucked in some shadowy corner of 
 Phobos, not to be used except in emergencies when nothing else 
 will do.  It should be discouraged.

One of my main reasons for wanting ranges as the input was to allow this sort of an interface. This looks like a usable solution for that need.
 I hope my above comments have made clear that I am not against 
 having ranges be forcibly changed into streams.  What I don't 
 want is ranges implicitly treated as streams.

I'd say that my idea is more about having ranges implicitly treated as stream sources rather than as true streams, but having a method to explicitly make them stream sources would still be quite usable.

Ultimately, I think that the differences between our designs boil down to having a more monolithic stream interface with an internal stream source, or having a lighter-weight but more ad-hoc stream interface with an external and more exposed stream source. At this point, I'd probably be happy with either as long as they have equivalent functionality.
Mar 07 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Mar 2013 20:52:49 -0500, BLM768 <blm768 gmail.com> wrote:


 Ultimately, I think that the differences between our designs boil down  
 to having a more monolithic stream interface with an internal stream  
 source or having a lighter-weight but more ad-hoc stream interface with  
 an external and more exposed stream source. At this point, I'd probably  
 be happy with either as long as they have equivalent functionality.

One thing to remember is that streams need to be runtime swappable. For instance, I should be able to replace stdout with a stream of my choice. This isn't possible if we only use a compile-time API (i.e. templates).

But that doesn't preclude us from having templates and ranges on TOP of those streams. When it is all finished, I think it won't be that bad to use.

-Steve
Mar 08 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 08 Mar 2013 20:59:33 -0500, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:

 On 07/03/2013 12:07, Steven Schveighoffer wrote:
 <snip>
 I don't really understand the need to make ranges into streams.

Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream.

I hope to convince Walter of the error of his ways :)

The problem with this idea is that there isn't a proven design. All designs I've seen that involve ranges don't look attractive, and end up looking like streams with an awkward range API tacked on. I could be wrong; there could be that really great range API that nobody has suggested yet. But from what I can tell, the desire to have ranges be streams is based on having all these methods that work with ranges: wouldn't it be cool if you could do that with streams too?
 Thinking about it now, a range-based interface might be good for reading  
 files of certain kinds, but isn't suited to general file I/O.

I think a range interface works great as a high-level mechanism. Like a range for xml parsing: front could be the current element, popFront could give you the next, etc. I think with the design I have, it can be done with minimal buffering, and without double-buffering.

But I see no need to use a range to feed the range data from a file.

-Steve
Mar 08 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Saturday, March 09, 2013 01:59:33 Stewart Gordon wrote:
 On 07/03/2013 12:07, Steven Schveighoffer wrote:
 <snip>
 
 I don't really understand the need to make ranges into streams.

<snip> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream. Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O.

In general, ranges should work just fine for I/O as long as they have an efficient implementation which buffers underneath (and preferably makes them forward ranges). Aside from how it's implemented internally, there's no real difference between operating on a range over a file and any other range. The trick is making it efficient internally.

Doing something like reading a character at a time from a file every time that popFront is called would be horrible, but with buffering, it should be just fine. Now, you're not going to get a random-access range that way, but it should work fine as a forward range, and std.mmfile will probably give you what you want if an RA range is what you really need (and that, we have already).

- Jonathan M Davis
Mar 08 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Mar 08, 2013 at 09:30:30PM -0500, Jonathan M Davis wrote:
 On Saturday, March 09, 2013 01:59:33 Stewart Gordon wrote:
 On 07/03/2013 12:07, Steven Schveighoffer wrote:
 <snip>
 
 I don't really understand the need to make ranges into streams.

<snip> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream. Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O.

In general, ranges should work just fine for I/O as long as they have an efficient implementation which buffers underneath (and preferably makes them forward ranges). Aside from how it's implemented internally, there's no real difference between operating on a range over a file and any other range. The trick is making it efficient internally. Doing something like reading a character at a time from a file every time that popFront is called would be horrible, but with buffering, it should be just fine. Now, you're not going to get a random-access range that way, but it should work fine as a forward range, and std.mmfile will probably give you what you want if an RA range is what you really need (and that, we have already).

I think the new std.stream should have a low-level stream API based on reading & simultaneously advancing by n bytes. This is still the most efficient approach for low-level file I/O. On top of this core, we can provide range-based APIs which are backed by buffers implemented using the stream API. Conceptually, it could be something like this:

    module std.stream;

    struct FileStream {
        File _impl;
        ...

        // Low-level stream API
        void read(T)(ref T[] buffer, size_t n);
        bool eof();
    }

    struct BufferedStream(T, SrcStream) {
        SrcStream impl;
        T[] buffer;
        size_t readPos;
        enum BufSize = ...; // some suitable value

        this() {
            buffer.length = BufSize;
        }

        // Range API
        T front() { return buffer[readPos]; }
        bool empty() { return impl.eof && readPos >= buffer.length; }
        void popFront() {
            if (++readPos >= buffer.length) {
                // Load next chunk of file into buffer
                impl.read(buffer, BufSize);
                readPos = 0;
            }
        }
    }

Suitable adaptor functions/structs/etc. can be used for automatically converting between streams and range APIs via BufferedStream, etc.

As for making ranges into streams: it could be useful for transparently substituting, say, a string buffer for file input for generic code that operates on streams. I'm not sure if ranges are the right thing to use here, though; if all you have is an input stream, then generic code that uses BufferedStream on top of that would be horribly inefficient. It may make more sense to require an array.

Another approach could be to extend the idea of a range to have, for lack of a better term, a StreamRange or something of the sort, that provides a read() method (or maybe more suitably named, like copyFrontN() or something along those lines) that is equivalent to copying .front and calling popFront n times. But we already have trouble taming the current variety of ranges, so I'm not sure if this is a good idea or not. Jonathan probably will hate the idea of introducing yet another range type to the mix. :)

T

-- 
"How are you doing?" "Doing what?"
Mar 08 2013
prev sibling next sibling parent "BLM768" <blm768 gmail.com> writes:
 One thing to remember is that streams need to be runtime 
 swappable.  For instance, I should be able to replace stdout 
 with a stream of my choice.

That does make my solution a little tougher to implement. Hmmm... It looks like a monolithic type is the easiest solution, but it definitely should have range support somewhere. Since that's already planned (at least as I understand it), I guess I don't really have any complaints about it.

Now, I wouldn't mind if you made the default source a "block-input range", since it could have very similar performance characteristics to an integrated source and would provide a useful range for other stuff, but an integrated source would be manageable and probably just a hair faster.
Mar 08 2013
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 09 Mar 2013 16:30:24 +0000
schrieb Stewart Gordon <smjg_1998 yahoo.com>:

 Yes, random-access file I/O is another thing.  I was thinking primarily 
 of cases where you want to just read the file through and process it 
 while doing so.  I imagine that most word processors, graphics editors, 
 etc. will read the file and then generate the file afresh when you save, 
 rather than just writing the changes to the file.
 
 And then there are web browsers, which read files of various types both 
 from the user's local file storage and over an HTTP connection.
 
 Stewart.

For most binary formats you need to deal with endianness for short/int/long, and blocks of either fixed size, or with two versions (e.g. a revised extended bitmap header), or altogether dynamic size. Some formats may also require reading the last bytes first, like ID3 tags in MP3s. And then there are compressed formats with data types of < 8 bits or dynamic bit allocations.

It's all obvious, but I had a feeling your use cases are too restricted. Anyway, I no longer know what the distinction between std.io and std.streams will be.

-- 
Marco
Mar 10 2013
prev sibling next sibling parent "Tyler Jameson Little" <beatgammit gmail.com> writes:
On Saturday, 9 March 2013 at 02:13:36 UTC, Steven Schveighoffer 
wrote:
 On Fri, 08 Mar 2013 20:59:33 -0500, Stewart Gordon 
 <smjg_1998 yahoo.com> wrote:

 On 07/03/2013 12:07, Steven Schveighoffer wrote:
 <snip>
 I don't really understand the need to make ranges into 
 streams.

Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream.

I hope to convince Walter of the error of his ways :)

The problem with this idea is that there isn't a proven design. All designs I've seen that involve ranges don't look attractive, and end up looking like streams with an awkward range API tacked on. I could be wrong; there could be that really great range API that nobody has suggested yet. But from what I can tell, the desire to have ranges be streams is based on having all these methods that work with ranges: wouldn't it be cool if you could do that with streams too?
 Thinking about it now, a range-based interface might be good 
 for reading files of certain kinds, but isn't suited to 
 general file I/O.

I think a range interface works great as a high-level mechanism. Like a range for xml parsing: front could be the current element, popFront could give you the next, etc. I think with the design I have, it can be done with minimal buffering, and without double-buffering.

But I see no need to use a range to feed the range data from a file.

-Steve

I agree with this 100%, but I obviously am not the one making the decision. My point in resurrecting this thread is that I'd like to start working on a few D libraries that will rely on streams, but I've been trying to hold off until this gets done. I'm sure there are plenty of others that would like to see streams get finished.

Do you have an ETA for when you'll have something for review? If not, do you have the code posted somewhere so others can help?

The projects I'm interested in working on are:

- HTTP library (probably end up pulling out some vibe.d stuff)
- SSH library (client/server)
- rsync library (built on SSH library)

You've probably already thought about this, but it would be really nice to either unread bytes or have some efficient way to get bytes without consuming them. This would help with writing an "until" function (read until either a new-line or N bytes have been read) when the exact number of bytes to read isn't known.

I'd love to help in testing things out. I'm okay with building against alpha-quality code, and I'm sure you'd like to get some feedback on the design as well. Let me know if there's any way that I can help. I'm very interested in seeing this get finished sooner rather than later.
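For the unread/peek request, one possible shape (a sketch only, assuming a putback buffer layered in front of the source; all names are hypothetical, and a plain array stands in for the underlying stream):

```d
import std.algorithm.comparison : min;

// Sketch of a putback facility: reads drain the putback buffer first,
// then fall through to the underlying data, so unread bytes come back
// on the next read in order.
struct PutbackReader
{
    ubyte[] source;   // stands in for the underlying stream
    ubyte[] putback;  // bytes pushed back, consumed before source

    size_t read(ubyte[] buf)
    {
        // Drain the putback buffer first.
        auto p = min(buf.length, putback.length);
        buf[0 .. p] = putback[0 .. p];
        putback = putback[p .. $];

        // Then fall through to the underlying data.
        auto s = min(buf.length - p, source.length);
        buf[p .. p + s] = source[0 .. s];
        source = source[s .. $];
        return p + s;
    }

    // "Unread": make bytes available to the next read again.
    void unread(const(ubyte)[] bytes)
    {
        putback = bytes.dup ~ putback;
    }
}
```

A peek is then just a read followed by an unread, which is enough to build a read-until-delimiter function on top.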
Jul 04 2013
prev sibling next sibling parent "w0rp" <devw0rp gmail.com> writes:
I think you can win with both. You can have very convenient and 
general abstractions like ranges which perform very well too. In 
addition, you can provide all of the usual range features to make 
them compatible with generic algorithms, and a few extra methods 
for extra features, like changing the block size.
Jul 05 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 04 Jul 2013 22:53:46 -0400, Tyler Jameson Little  
<beatgammit gmail.com> wrote:

 On Saturday, 9 March 2013 at 02:13:36 UTC, Steven Schveighoffer wrote:
 I think a range interface works great as a high level mechanism.  Like  
 a range for xml parsing, front could be the current element, popFront  
 could give you the next, etc.  I think with the design I have, it can  
 be done with minimal buffering, and without double-buffering.

 But I see no need to use a range to feed the range data from a file.

 -Steve

I agree with this 100%, but I obviously am not the one making the decision. My point in resurrecting this thread is that I'd like to start working on a few D libraries that will rely on streams, but I've been trying to hold off until this gets done. I'm sure there are plenty of others that would like to see streams get finished.

Do you have an ETA for when you'll have something for review? If not, do you have the code posted somewhere so others can help?

I realize this is really old, and I sort of dropped off the D cliff because all of a sudden I had 0 extra time. But I am going to get back into working on this (if it's still an issue; I still need to peruse the NG completely to see what has happened in the last few months).

I have something that is really old but was working. At this point, I wouldn't recommend reading the code, just the design, but it's in my github account here:

https://github.com/schveiguy/phobos/tree/new-io2

Wow, it's 2 years old. Time flies.
 The projects I'm interested in working on are:

 - HTTP library (probably end up pulling out some vibe.d stuff)
 - SSH library (client/server)
 - rsync library (built on SSH library)

 You've probably already thought about this, but it would be really nice  
 to either unread bytes or have some efficient way to get bytes without  
 consuming them. This would help with writing an "until" function (read  
 until either a new-line or N bytes have been read) when the exact number  
 of bytes to read isn't known.

Yes, this is part of the design.
 I'd love to help in testing things out. I'm okay with building against  
 alpha-quality code, and I'm sure you'd like to get some feedback on the  
 design as well.

At this point, the design is roughly done, and the code was working, but 2 years ago :) The new-io2 branch probably doesn't work. The new-io branch should work, but I had to rip apart the design due to objections to how I designed it. The guts will be the same though.
 Let me know if there's any way that I can help. I'm very interested in  
 seeing this get finished sooner rather than later.

At this point, maybe you have lost interest. But if not, I wouldn't mind having help on it. Send me an email if you're still interested.

-Steve
Dec 14 2013
prev sibling next sibling parent "sclytrack" <sclytrack fake.com> writes:
On Saturday, 14 December 2013 at 15:16:50 UTC, Jacob Carlborg 
wrote:
 On 2013-12-14 15:53, Steven Schveighoffer wrote:

 I realize this is really old, and I sort of dropped off the D 
 cliff
 because all of a sudden I had 0 extra time.

 But I am going to get back into working on this (if it's still 
 an issue,
 I still need to peruse the NG completely to see what has 
 happened in the
 last few months).

Yeah, it still needs to be replaced. In this case you can have a look at the review queue to see what's being worked on: http://wiki.dlang.org/Review_Queue

SINK, TAP
---------

https://github.com/schveiguy/phobos/blob/new-io/std/io.d

What about adding a single property named sink or tap, depending on how you want the chain to be? That could be either a struct or a class. Each sink would provide another interface.

    struct/class ArchiveWriter(SINK)
    {
        @property sink // pointer to sink
    }

    writer.sink.sink.sink
    arch.sink.sink.sink.open("filename");

    ArchiveReader!(InputStream)* reader;

"Warning: As usual I don't know what I'm talking about."
Apr 16 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 Apr 2014 12:09:49 -0400, sclytrack <sclytrack fake.com> wrote:

 On Saturday, 14 December 2013 at 15:16:50 UTC, Jacob Carlborg wrote:
 On 2013-12-14 15:53, Steven Schveighoffer wrote:

 I realize this is really old, and I sort of dropped off the D cliff
 because all of a sudden I had 0 extra time.

 But I am going to get back into working on this (if it's still an  
 issue,
 I still need to peruse the NG completely to see what has happened in  
 the
 last few months).

Yeah, it still need to be replaced. In this case you can have a look at the review queue to see what's being worked on: http://wiki.dlang.org/Review_Queue

SINK, TAP --------- https://github.com/schveiguy/phobos/blob/new-io/std/io.d What about adding a single property named sink or tap depending on how you want the chain to be. That could be either a struct or a class. Each sink would provide another interface.

Chaining i/o objects is something I have yet to tackle. I have ideas, but I'll wait until I have posted some updated code (hopefully soon). I want it to work like ranges/unix pipes.

The single most difficult thing is making it a drop-in replacement for std.stdio.File. But I'm close...

-Steve
Apr 17 2014
prev sibling next sibling parent "Tero" <sghtr naesaatbh.invalid> writes:
While waiting for the new stream I wrote myself a stream for file 
io only.
http://dpaste.dzfl.pl/bc470f96b357

Hope it helps your work somehow. Maybe at least the unittests are 
helpful?
May 28 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 28 May 2014 06:28:25 -0400, Tero <sghtr naesaatbh.invalid> wrote:

 While waiting for the new stream I wrote myself a stream for file io  
 only.
 http://dpaste.dzfl.pl/bc470f96b357

 Hope it helps your work somehow. Maybe at least the unittests are  
 helpful?

Cool. I actually have made some progress: I have a new-io3 branch on github. Nothing finalized yet, but it does do basic input and output in all encodings. I will probably rewrite the entire API at least twice before it's ready (in fact, I'm already doing that), but the guts will be similar.

I will take a look at your code to see if there's anything I can use, thanks!

-Steve
May 28 2014
prev sibling parent "Tero" <sfasfs didlidildied.invalid> writes:
Just noticed, the paste was screwed. Had a weird character in a 
comment which seemed to confuse dpaste.

Here's the full code:
http://dpaste.dzfl.pl/fc2073c19e7d

On Thursday, 29 May 2014 at 03:43:32 UTC, Steven Schveighoffer 
wrote:
 On Wed, 28 May 2014 06:28:25 -0400, Tero 
 <sghtr naesaatbh.invalid> wrote:

 While waiting for the new stream I wrote myself a stream for 
 file io only.
 http://dpaste.dzfl.pl/bc470f96b357

 Hope it helps your work somehow. Maybe at least the unittests 
 are helpful?

Cool. I actually have made some progress, I have a new-io3 branch on github. Nothing finalized yet, but it does do basic input and output in all encodings. I will probably rewrite the entire API at least twice before it's ready (in fact, already doing that), but the guts will be similar. I will take a look at your code to see if there's anything I can use, thanks! -Steve

May 28 2014