
digitalmars.D - deprecating std.stream, std.cstream, std.socketstream

reply Walter Bright <newshound2 digitalmars.com> writes:
This discussion started in the thread "Getting the const-correctness of Object 
sorted once and for all", but it deserved its own thread.

These modules suffer from the following problems:

1. poor documentation, dearth of examples & rationale

2. toHash(), toString(), etc., all need to be const pure nothrow, but it's 
turning out to be problematic for doing it for these classes

3. overlapping functionality with std.stdio

4. they should present a range interface, not a streaming one
May 13 2012
next sibling parent Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 13-05-2012 23:38, Walter Bright wrote:
 This discussion started in the thread "Getting the const-correctness of
 Object sorted once and for all", but it deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure nothrow, but
 it's turning out to be problematic for doing it for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

I'm all for it. I haven't used any of them, ever, and probably never will. Their APIs aren't particularly appealing, honestly.

-- Alex
May 13 2012
prev sibling next sibling parent "Nathan M. Swan" <nathanmswan gmail.com> writes:
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
 4. they should present a range interface, not a streaming one

I was just about to make a post suggesting that! You could easily integrate std.io with std.algorithm to do some pretty cool things. NMS
May 13 2012
prev sibling next sibling parent reply "Kiith-Sa" <42 theanswer.com> writes:
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
 This discussion started in the thread "Getting the 
 const-correctness of Object sorted once and for all", but it 
 deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure 
 nothrow, but it's turning out to be problematic for doing it 
 for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

My D:YAML library (YAML parser) depends on std.stream (e.g. for cross-endian compatibility and loading from memory), and I've been waiting for a replacement since the first release. I support removing std.stream, but it needs a replacement with equivalent functionality. Actually, I've postponed a 1.0 release _until_ std.stream is replaced.
May 13 2012
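On the cross-endian point: for fixed-width values, Phobos' std.bitmanip can already reinterpret raw bytes under either byte order, independent of where the bytes came from. A small sketch:

```d
import std.bitmanip : bigEndianToNative, littleEndianToNative;

void main()
{
    // Four raw bytes as they might come off any byte source.
    ubyte[4] raw = [0x00, 0x00, 0x01, 0x00];

    // The same bytes interpreted under each byte order.
    assert(bigEndianToNative!uint(raw) == 0x0000_0100);    // 256
    assert(littleEndianToNative!uint(raw) == 0x0001_0000); // 65536
}
```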
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/13/2012 3:16 PM, Nathan M. Swan wrote:
 Trying to make it read lazily is even harder, as all std.utf functions work on
 arrays, not ranges. I think this should change.

Yes, std.utf should be upgraded to present range interfaces.
May 13 2012
prev sibling next sibling parent "Nathan M. Swan" <nathanmswan gmail.com> writes:
On Sunday, 13 May 2012 at 21:53:58 UTC, Kiith-Sa wrote:
 My D:YAML library (YAML parser) depends on std.stream
 (e.g. for cross-endian compatibility and loading from memory),
 and I've been waiting for a replacement since the first release.

We also need better interfacing with UTFs in that. D is usually great at Unicode, but it doesn't interface well with I/O. For example, when working on the file-reader for SDC I had to hand-code the check-BOM-read-and-convert functions: http://bit.ly/J0QWVF . Trying to make it read lazily is even harder, as all std.utf functions work on arrays, not ranges. I think this should change. NMS
May 13 2012
prev sibling next sibling parent Robert Clipsham <robert octarineparrot.com> writes:
On 13/05/2012 22:38, Walter Bright wrote:
 This discussion started in the thread "Getting the const-correctness of
 Object sorted once and for all", but it deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure nothrow, but
 it's turning out to be problematic for doing it for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

I make use of std.stream quite a lot... It's horrible, it has to go. I'm not too bothered if replacements aren't available straight away, as it doesn't take much to drop 10 lines of replacement in for the functionality I use from it until the actual replacement appears. -- Robert http://octarineparrot.com/
May 13 2012
prev sibling next sibling parent reply "Oleg Kuporosov" <Oleg.Kuporosov gmail.com> writes:
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:

 3. overlapping functionality with std.stdio

unfortunately std.stdio under Windows can't handle UTF-16 (wchar)-based file names and text I/O, which are natural there. The root of the issue seems to lie both in the underlying DMC C stdio (something wrong with the w*-based functions?) and in std.format, which provides only UTF-8 strings. It makes sense to deprecate these modules for the stated reasons, but only after std.stdio supports UTF-16 names/streams or a good replacement (Steven's std.io?) is ready. Currently std.[c]stream is the only way to work with UTF-16 filesystems in Phobos. Or switch to Tango, which looks like it supports this too (but I have no experience there).
May 13 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/13/2012 10:22 PM, Oleg Kuporosov wrote:
 unfortunately std.stdio under Windows can't handle UTF-16 (wchar)-based file
 names and text I/O, which are natural there. The root of the issue seems to
 lie both in the underlying DMC C stdio (something wrong with the w*-based
 functions?) and in std.format, which provides only UTF-8 strings. It makes
 sense to deprecate these modules for the stated reasons, but only after
 std.stdio supports UTF-16 names/streams or a good replacement (Steven's
 std.io?) is ready. Currently std.[c]stream is the only way to work with
 UTF-16 filesystems in Phobos. Or switch to Tango, which looks like it
 supports this too (but I have no experience there).

Why not just convert the UTF16 strings to UTF8 ones? They have the same information.
May 14 2012
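For the file-name case at least, the conversion Walter suggests is mechanical with std.utf, and it round-trips losslessly. A sketch:

```d
import std.utf : toUTF16, toUTF8;

void main()
{
    // A UTF-16 file name, as the Windows APIs naturally supply it.
    wstring wname = "résumé.txt"w;

    // Re-encode as UTF-8 for APIs that take string.
    string name = toUTF8(wname);
    assert(name == "résumé.txt");

    // No information is lost: converting back yields the original.
    assert(toUTF16(name) == wname);
}
```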
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 5/13/12, Kiith-Sa <42 theanswer.com> wrote:
 My D:YAML library (YAML parser) depends on std.stream

Also ae.xml depends on it.
May 14 2012
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
 From the other thread....

On 13/05/2012 21:58, Walter Bright wrote:
 On 5/13/2012 1:48 PM, Stewart Gordon wrote:
 On 13/05/2012 20:42, Walter Bright wrote:
 <snip>
 I'd like to see std.stream dumped.  I don't see any reason for it
 to exist that std.stdio doesn't do (or should do).

So std.stdio.File is the replacement for the std.stream stuff?

Not exactly. Ranges are the replacement. std.stdio.File is merely a range that deals with files.

I don't see any of the required range methods in it. Moreover, I'm a bit confused about the means of retrieving multiple elements at once with the range API, such as a set number of bytes from a file. We have popFrontN, which advances the range but doesn't return the data from it. We have take and takeExactly, which seem to be the way to get a set number of elements from the range, but I'm confused about when/whether using these advances the original range.

If I'm writing a library to read a binary file format, I want to allow the data to come from a file, a socket or a memory image. The stream API makes this straightforward. But it seems some work is needed before std.stdio and the range API are up to it.

Stewart.
May 14 2012
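The take/takeExactly question can be checked against an in-memory range. A sketch (this shows the sliceable case; for a non-forward input range, iterating the result of take does consume the underlying source):

```d
import std.array : array;
import std.range : popFrontN, take, takeExactly;

void main()
{
    auto data = [1, 2, 3, 4, 5];

    // On a sliceable range, take returns a view; the source is not advanced.
    assert(data.take(3).array == [1, 2, 3]);
    assert(data == [1, 2, 3, 4, 5]);

    // takeExactly is the same here, but promises the elements exist.
    assert(data.takeExactly(2).array == [1, 2]);

    // Consuming is explicit: popFrontN advances but discards the data.
    auto rest = data;
    rest.popFrontN(3);
    assert(rest == [4, 5]);
}
```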
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/14/2012 4:43 AM, Stewart Gordon wrote:
 If I'm writing a library to read a binary file format, I want to allow the data
 to come from a file, a socket or a memory image. The stream API makes this
 straightforward. But it seems some work is needed before std.stdio and the
range
 API are up to it.

I agree. But that's where the effort needs to be made, not in carrying stream forward.
May 14 2012
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 13 May 2012 17:38:23 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 This discussion started in the thread "Getting the const-correctness of  
 Object sorted once and for all", but it deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure nothrow, but  
 it's turning out to be problematic for doing it for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted.

But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range.

I have no problem with getting rid of std.stream. I've never actually used it. Still, we absolutely need a non-range based low-level streaming interface to data. If nothing else, we need something we can build ranges upon, and I think my replacement does a very good job of that.

-Steve
May 14 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
 I keep trying to avoid talking about this, because I'm writing a replacement
 library for std.stream, and I don't want to step on any toes while it's still
 not accepted.

 But I have to say, ranges are *not* a good interface for generic data
providers.
 They are *very* good for structured data providers.

 In other words, a stream of bytes, not a good range (who wants to get one byte
 at a time?). A stream of UTF text broken into lines, a very good range.

 I have no problem with getting rid of std.stream. I've never actually used it.
 Still, we absolutely need a non-range based low-level streaming interface to
 data. If nothing else, we need something we can build ranges upon, and I think
 my replacement does a very good job of that.

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.

I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal.

The ability to do things like:

void main()
{
    stdin.byChunk(1024).
        map!(a => a.idup).  // one of those shortcomings
        joiner().
        stripComments().
        copy(stdout.lockingTextWriter());
}

is just kick ass.
May 14 2012
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:
 On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
 I keep trying to avoid talking about this, because I'm writing 
 a replacement
 library for std.stream, and I don't want to step on any toes 
 while it's still
 not accepted.

 But I have to say, ranges are *not* a good interface for 
 generic data providers.
 They are *very* good for structured data providers.

 In other words, a stream of bytes, not a good range (who wants 
 to get one byte
 at a time?). A stream of UTF text broken into lines, a very 
 good range.

 [...]

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based. I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. [...]

I have to say, I'm with Steve on this one. While I do believe ranges will have a very important role to play in D's future I/O paradigm, I also think there needs to be a layer beneath the ranges that more directly maps to OS primitives. And as D is a systems programming language, that layer needs to be publicly available. (Note that this is how std.stdio works now, more or less.) -Lars
May 15 2012
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Tuesday, 15 May 2012 at 15:22:03 UTC, Lars T. Kyllingstad 
wrote:
 On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:
 On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
 I keep trying to avoid talking about this, because I'm 
 writing a replacement
 library for std.stream, and I don't want to step on any toes 
 while it's still
 not accepted.

 But I have to say, ranges are *not* a good interface for 
 generic data providers.
 They are *very* good for structured data providers.

 In other words, a stream of bytes, not a good range (who 
 wants to get one byte
 at a time?). A stream of UTF text broken into lines, a very 
 good range.

 [...]

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based. I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. [...]

I have to say, I'm with Steve on this one. While I do believe ranges will have a very important role to play in D's future I/O paradigm, I also think there needs to be a layer beneath the ranges that more directly maps to OS primitives. And as D is a systems programming language, that layer needs to be publicly available. (Note that this is how std.stdio works now, more or less.)

Also, I wouldn't mind std.*stream getting deprecated. Personally, I've never used those modules -- not even once. As a first step their documentation could be removed from dlang.org, so new users aren't tempted to start using them. No functionality is better than poor functionality, IMO. -Lars
May 15 2012
prev sibling next sibling parent reply "Nathan M. Swan" <nathanmswan gmail.com> writes:
On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer 
wrote:
 In other words, a stream of bytes, not a good range (who wants 
 to get one byte at a time?).  A stream of UTF text broken into 
 lines, a very good range.

There are several cases where one would want one byte at a time; e.g. as an input to another range that produces the UTF text as an output. I do agree that, e.g., with binary data some things can't be read with ranges (when you need to read small chunks of varying size), but that doesn't mean most things shouldn't be range-based. NMS
May 15 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.
May 16 2012
next sibling parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 16/05/2012 15:38, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read? -Steve

A bit ugly but:

----
// Default to 4 byte chunks
auto range = myStream.byChunks(4);

foreach (chunk; range)
{
    // The next chunk is 3 bytes;
    // the chunk after reverts to 4 bytes
    range.nextChunkSize = 3;

    // Every chunk from here on is 5 bytes
    range.chunkSize = 5;
}
----

-- Robert http://octarineparrot.com/
May 16 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 16.05.2012 19:32, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham
 <robert octarineparrot.com> wrote:

 On 16/05/2012 15:38, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read? -Steve

 A bit ugly but:

 ----
 // Default to 4 byte chunks
 auto range = myStream.byChunks(4);

 foreach (chunk; range)
 {
     // The next chunk is 3 bytes;
     // the chunk after reverts to 4 bytes
     range.nextChunkSize = 3;

     // Every chunk from here on is 5 bytes
     range.chunkSize = 5;
 }

Yeah, I've seen this before. It's not convincing.

Yes, it's obvious that files do *not* generally follow range-of-items semantics. I mean, not even a range of various items. In the case of binary data it's most of the time a header followed by various data. Or a hierarchical structure. Or a table of links + raw data. Or whatever. I've yet to see a standard way to deal with binary formats :) -- Dmitry Olshansky
May 16 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it. In general, you can read n bytes by calling empty, front, and popFront n times.
May 16 2012
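Walter's "n calls" approach, spelled out as a generic helper over any input range of bytes (readN is a hypothetical name, not Phobos; whether this performs acceptably is exactly what is disputed elsewhere in the thread):

```d
import std.range.primitives : empty, front, popFront;

// Read up to n bytes by calling empty/front/popFront n times.
// (readN is a hypothetical helper, not part of Phobos.)
ubyte[] readN(R)(ref R r, size_t n)
{
    ubyte[] result;
    foreach (i; 0 .. n)
    {
        if (r.empty) break;
        result ~= r.front;
        r.popFront();
    }
    return result;
}

void main()
{
    ubyte[] src = [1, 2, 3, 4, 5];
    auto r = src[];
    assert(readN(r, 3) == [1, 2, 3]);
    assert(readN(r, 10) == [4, 5]); // stops at end of input
}
```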
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 16/05/2012 16:59, Walter Bright wrote:
 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

And is what you want to do with a text file in many cases.
 In general, you can read n bytes by calling empty, front, and popFront n times.

Why would anybody want to read a large binary file _one byte at a time_? Stewart.
May 16 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2012 9:41 AM, Stewart Gordon wrote:
 On 16/05/2012 16:59, Walter Bright wrote:
 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

And is what you want to do with a text file in many cases.
 In general, you can read n bytes by calling empty, front, and popFront n times.

Why would anybody want to read a large binary file _one byte at a time_?

You can have that range read from byChunk(). It's really the same thing that C's stdio does.
May 16 2012
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 16/05/2012 18:21, Walter Bright wrote:
<snip>
 You can have that range read from byChunk(). It's really the same thing that
C's stdio does.

And what if I want it to work on ranges that don't have a byChunk method? Stewart.
May 16 2012
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 16/05/2012 17:48, H. S. Teoh wrote:
 On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:

 Why would anybody want to read a large binary file _one byte at a
 time_?

import std.range;

auto readNBytes(R)(R range, size_t n)
    if (isInputRange!R && hasSlicing!R)
{
    return range[0 .. n];
}

What if I want it to work on ranges that don't have slicing? Stewart.
May 16 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 11:59:37 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

Have you looked at how std.byLine works? It certainly does not use a range interface as a source.

It presents a range interface, though. Not a streaming one.
 In general, you can read n bytes by calling empty, front, and popFront n times.

I hope you are not serious! This will make D *the worst performing* i/o language.

You can read arbitrary numbers of bytes by tacking a range on after byChunk().
May 16 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
 In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges. Andrei
May 16 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/16/12 1:00 PM, Steven Schveighoffer wrote:
 What I think we would end up with is a streaming API with range
 primitives tacked on.

 - empty is clunky, but possible to implement. However, it may become
 invalid (think of reading a file that is being appended to by another
 process).
 - popFront and front do not have any clear definition of what they refer
 to. The only valid thing I can think of is bytes, and then nobody will
 use them.

 That's hardly saying it's "range based". I refuse to believe that people
 will be thrilled by having to 'pre-configure' each front and popFront
 call in order to get work done. If you want to try and convince me, I'm
 willing to listen, but so far I haven't seen anything that looks at all
 appetizing.

Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element. That makes the range interface unsuitable for strictly UNbuffered streams. On the other hand, a range could no problem offer OPTIONAL unbuffered reads (the existence of a buffer does not preclude availability of unbuffered transfers). So to tie it all nicely I think we need: 1. A STRICTLY UNBUFFERED streaming abstraction 2. A notion of range that supports unbuffered transfers. Andrei
May 16 2012
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote:
 On 5/16/12 1:00 PM, Steven Schveighoffer wrote:
 What I think we would end up with is a streaming API with range
 primitives tacked on.

 - empty is clunky, but possible to implement. However, it may become
 invalid (think of reading a file that is being appended to by another
 process).
 - popFront and front do not have any clear definition of what they refer
 to. The only valid thing I can think of is bytes, and then nobody will
 use them.

 That's hardly saying it's "range based". I refuse to believe that people
 will be thrilled by having to 'pre-configure' each front and popFront
 call in order to get work done. If you want to try and convince me, I'm
 willing to listen, but so far I haven't seen anything that looks at all
 appetizing.

Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element.

I don't think this necessarily holds. 'front' might be computed on the fly, as it is done for std.algorithm.map.
 That makes the range interface unsuitable for strictly UNbuffered
 streams. On the other hand, a range could no problem offer OPTIONAL
 unbuffered reads (the existence of a buffer does not preclude
 availability of unbuffered transfers).

 So to tie it all nicely I think we need:

 1. A STRICTLY UNBUFFERED streaming abstraction

 2. A notion of range that supports unbuffered transfers.


 Andrei

May 16 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/16/12 4:40 PM, Timon Gehr wrote:
 On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote:
 On 5/16/12 1:00 PM, Steven Schveighoffer wrote:
 What I think we would end up with is a streaming API with range
 primitives tacked on.

 - empty is clunky, but possible to implement. However, it may become
 invalid (think of reading a file that is being appended to by another
 process).
 - popFront and front do not have any clear definition of what they refer
 to. The only valid thing I can think of is bytes, and then nobody will
 use them.

 That's hardly saying it's "range based". I refuse to believe that people
 will be thrilled by having to 'pre-configure' each front and popFront
 call in order to get work done. If you want to try and convince me, I'm
 willing to listen, but so far I haven't seen anything that looks at all
 appetizing.

Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element.

I don't think this necessarily holds. 'front' might be computed on the fly, as it is done for std.algorithm.map.

It used to be buffered in fact but that was too much trouble. The fair thing to say here is that map relies on the implicit buffering of its input. Andrei
May 16 2012
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 16, 2012 at 04:52:09PM -0400, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:
 
On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:
On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh
<hsteoh quickfur.ath.cx> wrote:

One direction that _could_ be helpful, perhaps, is to extend the
concept of range to include, let's tentatively call it, a
ChunkedRange.  Basically a ChunkedRange implements the usual
InputRange operations (empty, front, popfront) but adds the
following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has
  at least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the
  front n elements from the range: this will buffer the next n
  elements from the range if they aren't already; repeated calls
  will just return the buffer;

- void popN(R)(R range, int n) - discards the first n elements from
  the buffer, thus causing the next call to frontN() to fetch more
  data if necessary.

On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.

How so? It's still useful for implementing readByte, for example.

readByte is covered by frontN(1). Why the need for front()? Let me answer that question for you -- so it can be treated as a normal range. But nobody will want to do that. i.e. copy to appender will read one byte at a time into the array.

If this new type of range is recognized by std.range, then the relevant algorithms can be made to recognize the existence of frontN and make good use of it, instead of iterating front N times. Then front() can still be used by stuff that really only wants a single byte at a time. [...]
The idea is that by asking for N elements at a time instead of
calling front/popFront N times, the underlying implementation can
optimize the request by creating a buffer of size N and have the OS
read exactly N bytes directly into that buffer.

	// Reads 1,000,000 bytes into newly allocated buffer and returns
	// buffer.
	auto buf = stream.frontN(1_000_000);

OK, so stream is providing data via return value and allocation.
	// Since 1,000,000 bytes is already read into the buffer, this
	// simply returns a slice of the same buffer:
	auto buf2 = stream.frontN(1_000_000);

Is buf2 mutable? If so, this is no good, buf could have mutated this data. But this can be fixed by making the return value of frontN be const(ubyte)[].
	assert(buf is buf2);

	// This consumes the buffer:
	stream.popN(1_000_000);

What does "consume" mean, discard? Obviously not "reuse", due to line below...

Yes, discard. That's what popFront does right now for a single element.
	// This will read another 1,000,000 bytes into a new buffer
	auto buf3 = stream.frontN(1_000_000);

OK, you definitely lost me here, this will not fly. The whole point of buffering is to avoid having to reallocate on every read. If you have to allocate every read, "buffering" is going to have a negative impact on performance!

I thought the whole point of buffering is to avoid excessive roundtrips to disk I/O. Though you do have a point that allocating on every read is a bad idea. T -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? -- Michael Beibl
May 16 2012
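The frontN/popN primitives under discussion can be modeled in a few lines over an in-memory buffer. A sketch only: the names hasAtLeast/frontN/popN come from the proposal, not Phobos, and a real implementation would refill from the OS rather than truncate at the end of the buffer:

```d
// A minimal in-memory model of the proposed ChunkedRange primitives.
// (hasAtLeast/frontN/popN are the proposal's names, not Phobos.)
struct ChunkedRange
{
    const(ubyte)[] buf;

    bool hasAtLeast(size_t n) const { return buf.length >= n; }

    // Return up to n front elements without consuming them;
    // repeated calls yield the same data.
    const(ubyte)[] frontN(size_t n) const
    {
        return buf[0 .. n <= buf.length ? n : buf.length];
    }

    // Discard the first n elements.
    void popN(size_t n)
    {
        buf = buf[(n <= buf.length ? n : buf.length) .. $];
    }
}

void main()
{
    ubyte[] data = [1, 2, 3, 4, 5];
    auto r = ChunkedRange(data);

    assert(r.hasAtLeast(5) && !r.hasAtLeast(6));
    assert(r.frontN(3) == [1, 2, 3]);
    assert(r.frontN(3) == [1, 2, 3]); // no consumption
    r.popN(3);
    assert(r.frontN(3) == [4, 5]);    // truncated at end of data
}
```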
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/15/2012 4:43 PM, Sean Kelly wrote:
 One thing I'd like in a buffered input API is a way to perform transactional
 reads such that if the full read can't be performed, the read state remains
 unchanged. The best you can do with most APIs is to check for a desired
 length, but what if I don't want to read until a full line is available, and
 I don't know the exact length?  Typically, you end up having to double
 buffer, which stinks.

std.stdio.byLine()
May 16 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2012 7:49 AM, Sean Kelly wrote:
 On May 16, 2012, at 6:52 AM, Walter Bright<newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 4:43 PM, Sean Kelly wrote:
 One thing I'd like in a buffered input API is a way to perform
 transactional reads such that if the full read can't be performed, the
 read state remains unchanged. The best you can do with most APIs is to
 check for a desired length, but what if I don't want to read until a
 full line is available, and I don't know the exact length?  Typically,
 you end up having to double buffer, which stinks.

std.stdio.byLine()

That was just an example. What if I want to do a formatted read and I'm reading from a file that someone else is writing to? I don't want to block or get a partial result and an EOF that needs to be reset.

Then you'll need an input range that can be reset - a ForwardRange.
May 16 2012
prev sibling next sibling parent travert phare.normalesup.org (Christophe Travert) writes:
"Steven Schveighoffer" , dans le message (digitalmars.D:167548), a
 My new design supports this.  I have a function called readUntil:
 
 https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832
 
 Essentially, it reads into its buffer until the condition is satisfied.   
 Therefore, you are not double buffering.  The return value is a slice of  
 the buffer.
 
 There is a way to opt-out of reading any data if you determine you cannot  
 do a full read.  Just return 0 from the delegate.

Maybe I already told this some time ago, but I am not very comfortable with this design. The process delegate has to maintain an internal state if you want to avoid reading everything again. It will be difficult to implement those process delegates. Do you have an example of a moderately complicated reading process, to show us it is not too complicated?

To avoid this issue, the design could be reversed: a method that would like to read a certain amount of characters could take a delegate from the stream, which provides additional bytes of data. Example:

// create a T by reading from stream. returns true if the T was
// successfully created, and false otherwise.
bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

The stream delegate returns a buffer of data to read from when called with consumed == 0. It must return additional data when called repeatedly. When it is called with consumed != 0, the corresponding amount of consumed bytes can be discarded from the buffer.

This "stream" delegate (it should have a better name) should not be more difficult to implement than readUntil, but it makes things easier for the client. Did I miss some important information?

-- 
Christophe
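To make the shape of this reversed design concrete, here is a minimal, hypothetical `readFrom` for a little-endian `uint`. The delegate protocol follows the sketch above; none of this is an actual Phobos API, and for simplicity this version fails instead of waiting for more data.

```d
import std.bitmanip : littleEndianToNative;

// Create a uint by reading from the stream delegate. Returns true if the
// value was successfully read, false if not enough data is available.
// stream(0) returns the currently buffered data without consuming any;
// stream(n) marks n bytes as consumed and returns what remains.
bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out uint t)
{
    auto buf = stream(0);          // peek at the buffer, consume nothing
    if (buf.length < 4)
        return false;              // not enough data for a uint
    ubyte[4] raw = buf[0 .. 4];
    t = littleEndianToNative!uint(raw);
    stream(4);                     // discard the 4 consumed bytes
    return true;
}

void main()
{
    // A toy in-memory "stream" standing in for a buffered file or socket.
    ubyte[] data = [1, 0, 0, 0, 2, 0, 0, 0];
    size_t pos = 0;
    const(ubyte)[] stream(size_t consumed)
    {
        pos += consumed;
        return data[pos .. $];
    }

    uint v;
    assert(readFrom(&stream, v) && v == 1);
    assert(readFrom(&stream, v) && v == 2);
    assert(!readFrom(&stream, v)); // out of data
}
```

A real implementation would loop, calling the delegate again to wait for additional bytes, as the post describes.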
May 16 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:

One direction that _could_ be helpful, perhaps, is to extend the concept
of range to include, let's tentatively call it, a ChunkedRange.
Basically a ChunkedRange implements the usual InputRange operations
(empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.

On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.

How so? It's still useful for implementing readByte, for example.
 I still don't get the need to "add" this to ranges.  The streaming API
 works fine on its own.
 
 But there is an omission with your proposed API regardless --
 reading data is a mutating event.  It destructively mutates the
 underlying data stream so that you cannot get the data again.  This
 means you must double-buffer data in order to support frontN and
 popN that are not necessary with a simple read API.
 
 For example:
 
 auto buf = new ubyte[1000000];
 stream.read(buf);
 
 does not need to first buffer the data inside the stream and then
 copy it to buf, it can read it from the OS *directly* into buf.

The idea is that by asking for N elements at a time instead of calling front/popFront N times, the underlying implementation can optimize the request by creating a buffer of size N and have the OS read exactly N bytes directly into that buffer.

	// Reads 1,000,000 bytes into newly allocated buffer and returns
	// buffer.
	auto buf = stream.frontN(1_000_000);

	// Since 1,000,000 bytes is already read into the buffer, this
	// simply returns a slice of the same buffer:
	auto buf2 = stream.frontN(1_000_000);

	assert(buf is buf2);

	// This consumes the buffer:
	stream.popN(1_000_000);

	// This will read another 1,000,000 bytes into a new buffer
	auto buf3 = stream.frontN(1_000_000);

	// This returns the same buffer as buf3 since we already have
	// the data available.
	auto buf4 = stream.frontN(1_000_000);

T

-- 
English has the lovely word "defenestrate", meaning "to execute by throwing someone out a window", or more recently "to remove Windows from a computer and replace it with something useful". :-) -- John Cowan
May 16 2012
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:

One direction that _could_ be helpful, perhaps, is to extend the concept
of range to include, let's tentatively call it, a ChunkedRange.
Basically a ChunkedRange implements the usual InputRange operations
(empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.
On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.

How so? It's still useful for implementing readByte, for example.

readByte is covered by frontN(1). Why the need for front()? Let me answer that question for you -- so it can be treated as a normal range. But nobody will want to do that. i.e. copy to appender will read one byte at a time into the array.
 I still don't get the need to "add" this to ranges.  The streaming API
 works fine on its own.

 But there is an omission with your proposed API regardless --
 reading data is a mutating event.  It destructively mutates the
 underlying data stream so that you cannot get the data again.  This
 means you must double-buffer data in order to support frontN and
 popN that are not necessary with a simple read API.

 For example:

 auto buf = new ubyte[1000000];
 stream.read(buf);

 does not need to first buffer the data inside the stream and then
 copy it to buf, it can read it from the OS *directly* into buf.

The idea is that by asking for N elements at a time instead of calling front/popFront N times, the underlying implementation can optimize the request by creating a buffer of size N and have the OS read exactly N bytes directly into that buffer.

	// Reads 1,000,000 bytes into newly allocated buffer and returns
	// buffer.
	auto buf = stream.frontN(1_000_000);

OK, so stream is providing data via return value and allocation.
 	// Since 1,000,000 bytes is already read into the buffer, this
 	// simply returns a slice of the same buffer:
 	auto buf2 = stream.frontN(1_000_000);

Is buf2 mutable? If so, this is no good, buf could have mutated this data. But this can be fixed by making the return value of frontN be const(ubyte)[].
 	assert(buf is buf2);

 	// This consumes the buffer:
 	stream.popN(1_000_000);

What does "consume" mean, discard? Obviously not "reuse", due to line below...
 	// This will read another 1,000,000 bytes into a new buffer
 	auto buf3 = stream.frontN(1_000_000);

OK, you definitely lost me here, this will not fly. The whole point of buffering is to avoid having to reallocate on every read. If you have to allocate on every read, "buffering" is going to have a negative impact on performance!

-Steve
May 16 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On May 15, 2012, at 3:34 PM, "Nathan M. Swan" <nathanmswan gmail.com> wrote:

 On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer wrote:
 In other words, a stream of bytes, not a good range (who wants to get one
 byte at a time?). A stream of UTF text broken into lines, a very good range.

 There are several cases where one would want one byte at the time; e.g. as

 I do agree for e.g. with binary data some data can't be read with ranges
 (when you need to read small chunks of varying size), but most things
 shouldn't be ranged-based.

You really want both, depending on the situation. I don't see what's weird about this. C++ iostreams have input and output iterators built on top as well, for much the same reason. The annoying part is that once you've moved to a range interface it's hard to go back. Like say I want a ZipRange on top of a FileRange. But now I want to read structs as binary blobs from that uncompressed output.

One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.
May 15 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, May 15, 2012 at 04:43:05PM -0700, Sean Kelly wrote:
[...]
 One thing I'd like in a buffered input API is a way to perform
 transactional reads such that if the full read can't be performed, the
 read state remains unchanged. The best you can do with most APIs is to
 check for a desired length, but what if I don't want to read until a
 full line is available, and I don't know the exact length?  Typically,
 you end up having to double buffer, which stinks. 

This would be very nice to have, but how would you go about implementing such a thing? Wouldn't you need OS-level support for it?

T

-- 
Let's eat some disquits while we format the biskettes.
May 15 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 15 May 2012 19:43:05 -0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 One thing I'd like in a buffered input API is a way to perform  
 transactional reads such that if the full read can't be performed, the  
 read state remains unchanged. The best you can do with most APIs is to  
 check for a desired length, but what if I don't want to read until a  
 full line is available, and I don't know the exact length?  Typically,  
 you end up having to double buffer, which stinks.

My new design supports this.  I have a function called readUntil:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832

Essentially, it reads into its buffer until the condition is satisfied.  Therefore, you are not double buffering.  The return value is a slice of the buffer.

There is a way to opt out of reading any data if you determine you cannot do a full read.  Just return 0 from the delegate.

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 14 May 2012 22:56:08 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
 I keep trying to avoid talking about this, because I'm writing a  
 replacement
 library for std.stream, and I don't want to step on any toes while it's  
 still
 not accepted.

 But I have to say, ranges are *not* a good interface for generic data  
 providers.
 They are *very* good for structured data providers.

 In other words, a stream of bytes, not a good range (who wants to get  
 one byte
 at a time?). A stream of UTF text broken into lines, a very good range.

 I have no problem with getting rid of std.stream. I've never actually  
 used it.
 Still, we absolutely need a non-range based low-level streaming  
 interface to
 data. If nothing else, we need something we can build ranges upon, and  
 I think
 my replacement does a very good job of that.

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.

I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal.

The ability to do things like:

void main() {
    stdin.byChunk(1024).
        map!(a => a.idup).   // one of those shortcomings
        joiner().
        stripComments().
        copy(stdout.lockingTextWriter());
}

I think we may have a misunderstanding. My design is not range-based, but supports ranges, and actually makes them very easy to implement.

byChunk is a perfect example of a good range -- it defines a specific criteria for determining an "element" of data, appropriate for specific situations. But it must be built on top of something that allows reading arbitrary amounts of data. At the lowest level, this is the OS file descriptor/HANDLE. To be efficient, it should be based on a buffering stream. That buffering stream *does not* need to be a range, and I don't think shoehorning such a construct into a range interface makes any sense.

To make this clear, I can say that any range File supports, my design will support *as a range*. To make it even clearer, the current std.stdio.File structure, which you have shown to "kick ass" with ranges, is *NOT* range-based by my definition.

I should note, the output range idiom is directly supported, because the output range definition exactly maps to an output stream definition.

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with  
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

-Steve
May 16 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On May 16, 2012, at 6:52 AM, Walter Bright <newshound2 digitalmars.com> wrote:

 On 5/15/2012 4:43 PM, Sean Kelly wrote:
 One thing I'd like in a buffered input API is a way to perform transactional
 reads such that if the full read can't be performed, the read state remains
 unchanged. The best you can do with most APIs is to check for a desired
 length, but what if I don't want to read until a full line is available, and
 I don't know the exact length?  Typically, you end up having to double
 buffer, which stinks.

std.stdio.byLine()

That was just an example. What if I want to do a formatted read and I'm reading from a file that someone else is writing to? I don't want to block or get a partial result and an EOF that needs to be reset.
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 10:03:42 -0400, Christophe Travert  
<travert phare.normalesup.org> wrote:

 "Steven Schveighoffer" , dans le message (digitalmars.D:167548), a
 My new design supports this.  I have a function called readUntil:

 https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832

 Essentially, it reads into its buffer until the condition is satisfied.
 Therefore, you are not double buffering.  The return value is a slice of
 the buffer.

 There is a way to opt-out of reading any data if you determine you  
 cannot
 do a full read.  Just return 0 from the delegate.

Maybe I already told this some time ago, but I am not very comfortable with this design. The process delegate has to maintain an internal state, if you want to avoid reading everything again. It will be difficult to implement those process delegates.

The delegate is given which portion has already been "processed", that is the 'start' parameter.  If you can use this information, it's highly useful.  If you need more context, yes, you have to store it elsewhere, but you do have a delegate which contains a context pointer.

In a few places (take a look at TextStream's readln, https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2149) I use inner functions that have access to the function call's frame pointer in order to configure or store data.
 Do you have an example
 of moderately complicated reading process to show us it is not too
 complicated?

The most complicated I have so far is reading UTF data as a range of dchar:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2209

Note that I hand-inlined all the decoding because using std.utf or the runtime was too slow, so although it looks huge, it's pretty basic stuff and can largely be ignored for the purposes of this discussion. The interesting part is how it specifies what to consume and what not to.

I realize it's a different way of thinking about how to do I/O, but it gives more control to the buffer, so it can reason about how best to buffer things. I look at it as a way of the buffered stream saying "I'll read some data, you tell me when you see something interesting, and I'll give you a slice to it."

The alternative is to double-buffer your data. Each call to read can invalidate the previously buffered data. But readUntil guarantees the data is contiguous and consumed all at once, no need to double-buffer.
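For illustration only, here is a toy in-memory model of the readUntil pattern described above. The real signature and buffering logic live in the linked std.io branch; the names and the byte-at-a-time refill here are purely hypothetical.

```d
// Toy buffer with a readUntil in the spirit described above: the delegate
// inspects the available data, using `start` to skip bytes it has already
// examined, and returns how many bytes to consume once it finds what it
// wants (0 means "keep reading").
struct ToyBuffer
{
    const(ubyte)[] data;  // stands in for data arriving from the OS
    size_t pos;

    const(ubyte)[] readUntil(scope size_t delegate(const(ubyte)[] window, size_t start) process)
    {
        size_t start = 0;
        // Simulate data trickling in one byte per "read" call.
        foreach (avail; pos + 1 .. data.length + 1)
        {
            auto window = data[pos .. avail];
            immutable used = process(window, start);
            if (used != 0)
            {
                auto result = data[pos .. pos + used];
                pos += used;
                return result;      // a slice of the buffer, no copy
            }
            start = window.length;  // delegate has already seen this much
        }
        return null;                // condition never satisfied
    }
}

void main()
{
    auto buf = ToyBuffer(cast(const(ubyte)[])"hello\nworld\n");

    // Consume up to and including the next newline.
    size_t untilNewline(const(ubyte)[] w, size_t start)
    {
        foreach (i; start .. w.length)
            if (w[i] == '\n')
                return i + 1;
        return 0;  // need more data
    }

    assert(buf.readUntil(&untilNewline) == cast(const(ubyte)[])"hello\n");
    assert(buf.readUntil(&untilNewline) == cast(const(ubyte)[])"world\n");
}
```

Note how the delegate only scans the new portion thanks to `start`, and the returned line is a slice of the buffer, not a copy.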
 To avoid this issue, the design could be reversed: A method that would
 like to read a certain amount of character could take a delegate from
 the stream, which provides additionnal bytes of data.

 Example:
 // create a T by reading from stream. returns true if the T was
 // successfully created, and false otherwise.
 bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

 The stream delegate returns a buffer of data to read from when called
 with consumed==0. It must return additional data when called
 repeatedly. When it is called with a consumed != 0, the corresponding
 amount of consumed bytes can be discarded from the buffer.

I can see use cases for both your method and mine.  I think I can implement your idea in terms of mine.  I might just do that.  The only thing missing is, you need a way to specify to the delegate that it needs more data.  Probably using size_t.max as an argument.

In fact, I need a peek function anyway; your function will provide that ability as well.

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham  
<robert octarineparrot.com> wrote:

 On 16/05/2012 15:38, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read? -Steve

A bit ugly, but:

----
// Default to 4 byte chunks
auto range = myStream.byChunks(4);
foreach (chunk; range) {
    // Set the next chunk to 3 bytes
    // Chunk after is 4 bytes
    range.nextChunkSize = 3;

    // Next chunk is always 5 bytes
    range.chunkSize = 5;
}

Yeah, I've seen this before. It's not convincing.

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 11:48:32 -0400, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 16.05.2012 19:32, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham
 <robert octarineparrot.com> wrote:
 A bit ugly but:
 ----
 // Default to 4 byte chunks
 auto range = myStream.byChunks(4);
 foreach (chunk; range) {
 // Set the next chunk is 3 bytes
 // Chunk after is 4 bytes
 range.nextChunkSize = 3;

 // Next chunk is always 5 bytes
 range.chunkSize = 5;
 }

Yeah, I've seen this before. It's not convincing.

Yes, it's obvious that files do *not* generally follow a range-of-items semantic. I mean, not even a range of various items. In the case of binary data it's most of the time a header followed by various data. Or a hierarchical structure. Or a table of links + raw data. Or whatever. I've yet to see a standard way to deal with binary formats :)

The best solution would be a range that's specific to your format. My solution intends to support that. But that's only if your format fits within the "range of elements" model. Good old fashioned "read X bytes" needs to be supported, and insisting you do this range-style is just plain wrong IMO.

-Steve
May 16 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:
 On 16/05/2012 16:59, Walter Bright wrote:
On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
wrote:

On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
I do agree for e.g. with binary data some data can't be read with
ranges (when you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

And is what you want to do with a text file in many cases.
In general, you can read n bytes by calling empty, front, and
popFront n times.

Why would anybody want to read a large binary file _one byte at a time_?

	import std.range;

	auto readNBytes(R)(R range, size_t n)
		if (isInputRange!R && hasSlicing!R)
	{
		return range[0 .. n];
	}

T

-- 
MAS = Mana Ada Sistem?
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 11:59:37 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with  
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

Have you looked at how std.byLine works? It certainly does not use a range interface as a source.
 In general, you can read n bytes by calling empty, front, and popFront n  
 times.

I hope you are not serious! This will make D *the worst performing* i/o language. This should be evidence enough:

steves steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1 count=1000000
1000000+0 records in
1000000+0 records out
1000000 bytes (1.0 MB) copied, 0.74052 s, 1.4 MB/s

real	0m0.744s
user	0m0.176s
sys	0m0.564s

steves steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1000 count=1000
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.00194096 s, 515 MB/s

real	0m0.006s
user	0m0.000s
sys	0m0.004s

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 13:21:37 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/16/2012 9:41 AM, Stewart Gordon wrote:
 On 16/05/2012 16:59, Walter Bright wrote:
 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with  
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

And is what you want to do with a text file in many cases.
 In general, you can read n bytes by calling empty, front, and popFront  
 n times.

Why would anybody want to read a large binary file _one byte at a time_?

You can have that range read from byChunk(). It's really the same thing that C's stdio does.

This is very wrong. byChunk doesn't cut it. The number of bytes to consume from the stream can depend on any number of factors, including the actual data in the stream.

For instance, I challenge you to write an efficient (meaning no extra buffering) byLine using byChunk as a base.

-Steve
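To make the challenge concrete, here is a hypothetical byLine-over-chunks sketch (all names invented for illustration). Note how any line straddling a chunk boundary forces a copy into a second buffer, which is exactly the extra buffering a streaming readUntil avoids:

```d
import std.algorithm : countUntil;

// A hypothetical line range over fixed-size chunks. Lines that straddle
// a chunk boundary must be copied into `carry` -- the double buffering
// being objected to above.
struct LinesOverChunks
{
    const(ubyte)[][] chunks;  // stands in for byChunk's successive buffers
    const(ubyte)[] current;   // unread part of the current chunk
    ubyte[] carry;            // second buffer for straddling lines
    const(ubyte)[] front;
    bool empty;

    this(const(ubyte)[][] chunks)
    {
        this.chunks = chunks;
        popFront();
    }

    void popFront()
    {
        carry = null;
        while (true)
        {
            if (current.length == 0)
            {
                if (chunks.length == 0)
                {
                    if (carry.length) { front = carry; return; } // no trailing newline
                    empty = true;
                    return;
                }
                current = chunks[0];
                chunks = chunks[1 .. $];
            }
            auto nl = current.countUntil('\n');
            if (nl >= 0)
            {
                auto piece = current[0 .. nl + 1];
                current = current[nl + 1 .. $];
                front = carry.length ? carry ~ piece : piece;
                return;
            }
            carry ~= current;  // line straddles into the next chunk: copy
            current = null;
        }
    }
}

void main()
{
    // "hel" + "lo\nwo" + "rld\n" -- both lines straddle a chunk boundary.
    const(ubyte)[][] chunks = [
        cast(const(ubyte)[])"hel",
        cast(const(ubyte)[])"lo\nwo",
        cast(const(ubyte)[])"rld\n",
    ];
    auto lines = LinesOverChunks(chunks);
    assert(lines.front == cast(const(ubyte)[])"hello\n"); // copied via carry
    lines.popFront();
    assert(lines.front == cast(const(ubyte)[])"world\n"); // copied via carry
    lines.popFront();
    assert(lines.empty);
}
```

Every straddling line costs an allocation and a copy; a readUntil-style buffer can instead grow in place and hand back a contiguous slice.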
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 13:23:07 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 11:59:37 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with  
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

Have you looked at how std.byLine works? It certainly does not use a range interface as a source.

It presents a range interface, though. Not a streaming one.

But that is *the point*! The code deciding how much data to read (i.e. the entity I referenced above that 'tells front and popFront how many bytes to read') is *not* using a range interface.

In other words, ranges aren't enough. Ranges can be built on top of streaming interfaces. But there is *still* a need for a comprehensive streaming toolkit. And C's streaming toolkit is not as good as a native D toolkit can be.
 In general, you can read n bytes by calling empty, front, and popFront  
 n times.

I hope you are not serious! This will make D *the worst performing* i/o language.

You can read arbitrary numbers of bytes by tacking a range on after byChunk().

No, this doesn't work in most cases. See my other post. You can't get everything you want out of just byChunk and byLine. What about byMySpecificPacketProtocol?

-Steve
May 16 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
tbh, I've found byChunk to be less than worthless
in my experience; it's a liability because I still
have to wrap it somehow to read real world files.

Consider reading a series of strings in the format
<length><data>,[...].

I'd like it to be this simple (neglecting priming the loop):

string[] s;
while(!file.eof) {
     ubyte length = file.read!ubyte;
     s ~= file.read!string(length);
}


The C fgetc/fread interface can do this reasonably
well.

string[] s;
while(!feof(fp)) {
    ubyte length = cast(ubyte) fgetc(fp);
    char[] buffer;
    buffer.length = length;
    fread(buffer.ptr, 1, length, fp);
    s ~= assumeUnique(buffer);
}


But, doing it with byChunk is an exercise in pain
that I don't even feel like writing here.




Another problem is consider a network interface. You
want to handle the packets as they come in.

byChunk doesn't work at all because it blocks until it
gets the chunk of the requested size.

foreach(chunk; socket.byChunk(1024))


suppose you get a packet of length 1000 and you have
to answer it. That will block forever.

So, if you use byChunk as the underlying thing to fill
your buffer... you don't get anywhere.


I think a better input primitive is byPacket(max_size).
This works more like the read primitive on the operating
system.

Moreover, I want it to buffer, and control how much is consumed.


auto packetSource = socket.byPacket(1024);
foreach(packet; packetSource) {
    // as soon as some data comes in we can get the length
    if(packet.length < 2) continue;
    auto length = packet.peek!(ushort); // neglect endian for now
    if(packet.length < length + 2) continue; // wait for more data

    packet.consume(2);
    handle(packet.consume(length));
}



In addition to the byChunk blocking problem...
what if the length straddles the edge?



byChunk is just a huge hassle to work with for every file
format I've tried so far. byLine is a little better
(some file formats are defined as being line based)
but still a bit of a pain for anything that can spill
into two lines.
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 13:48:49 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
 In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

What I think we would end up with is a streaming API with range primitives tacked on.

- empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process).

- popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them.

That's hardly saying it's "range based". I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done.

If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing.

-Steve
May 16 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 16 May 2012 at 17:48:52 UTC, Andrei Alexandrescu 
wrote:
 This is copiously clear to me, but the way I like to think 
 about it is by extending the notion of range (with notions such 
 as e.g. BufferedRange, LookaheadRange, and such)

I tried this in cgi.d somewhat recently. It ended up only vaguely looking like a range.

/**
   A slight difference from regular ranges is you can give it the
   maximum number of bytes to consume.

   IMPORTANT NOTE: the default is to consume nothing, so if you don't
   call consume() yourself and use a regular foreach, it will
   infinitely loop!
*/
void popFront(size_t maxBytesToConsume = 0 /*size_t.max*/, size_t minBytesToSettleFor = 0) {}

I called that a "slight difference" in the comment, but it is actually a pretty major difference. In practice, it is nothing like a regular range.

If I defaulted to size_t.max, you could foreach() it, but then you don't really get to take advantage of the buffer, since it is cleared out entirely for each iteration.

If it defaults to 0, you can put it in a foreach... but you have to manually say how much of it is consumed, which no other range does, meaning it won't work with std.algorithm or anything. It sorta looks like a range, but isn't actually one at all.

I'm sure something better is possible, but I don't think the range abstraction is a good fit for this use case. Of course, providing optional ranges (like how File gives byChunk, byLine, etc.) is probably a good idea.
May 16 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:
 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.

These are all tentative names, of course. But the idea is that you can keep N elements of the range "in view" at a time, without having to individually read them out and save them in a second buffer, and you can advance this view once you're done with the current data and want to move on.

Existing range operations like popFrontN, take, takeExactly, drop, etc., can be extended to take advantage of the extra functionality of ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since they amount to the same thing.) Using a ChunkedRange allows you to write functions that parse a particular range and return a range of chunks (say, a deserializer that returns a range of objects given a range of bytes).

Thinking on it a bit further, perhaps we can call this a WindowedRange, since it somewhat resembles the sliding window protocol where you keep a "window" of sequential packet ids in an active buffer, and remove them from the buffer as they get ack'ed (consumed by popN). The buffer thus acts like a "window" into the next n elements in the range, which can be "slid forward" as data is consumed.

T

-- 
Having a smoking section in a restaurant is like having a peeing section in a swimming pool. -- Edward Burr
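The three tentative primitives above could be buffered on top of any ordinary input range, along these lines. The Chunked wrapper and its growth strategy are invented here; only the primitive names come from the post.

```d
// Minimal sketch of the tentative ChunkedRange primitives
// (hasAtLeast / frontN / popN) over any input range.
import std.array : array;
import std.range;   // isInputRange, ElementType, iota, array front/popFront/empty

struct Chunked(R) if (isInputRange!R)
{
    R source;
    ElementType!R[] buffer;

    private void fill(size_t n)
    {
        while (buffer.length < n && !source.empty)
        {
            buffer ~= source.front;
            source.popFront();
        }
    }

    bool hasAtLeast(size_t n) { fill(n); return buffer.length >= n; }

    // Returns (up to) the front n elements, buffering them if needed;
    // repeated calls just return the buffer.
    ElementType!R[] frontN(size_t n)
    {
        fill(n);
        return buffer[0 .. n < buffer.length ? n : buffer.length];
    }

    // Discards the first n buffered elements so the next frontN
    // fetches fresh data.
    void popN(size_t n)
    {
        buffer = buffer[n < buffer.length ? n : buffer.length .. $];
    }
}

void main()
{
    auto c = Chunked!(int[])(iota(0, 10).array);
    assert(c.hasAtLeast(4));
    assert(c.frontN(4) == [0, 1, 2, 3]);
    c.popN(2);
    assert(c.frontN(4) == [2, 3, 4, 5]);  // window slid forward by two
}
```

Note how repeated frontN calls return the same buffered view until popN slides the window, which is exactly the "window" behavior described above.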
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:
 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.

On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.

I still don't get the need to "add" this to ranges. The streaming API works fine on its own.

But there is an omission with your proposed API regardless -- reading data is a mutating event. It destructively mutates the underlying data stream so that you cannot get the data again. This means you must double-buffer data in order to support frontN and popN, which is not necessary with a simple read API.

For example:

auto buf = new ubyte[1000000];
stream.read(buf);

does not need to first buffer the data inside the stream and then copy it to buf; it can read it from the OS *directly* into buf.

-Steve
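The plain streaming read Steven describes can be illustrated with a minimal sketch. The InputStream interface and the MemoryStream type are invented names here, standing in for a real OS-backed stream.

```d
// Sketch of a plain streaming read: the source writes directly into
// the caller's buffer, with no internal staging copy. The interface
// name and the in-memory source are invented for illustration.
interface InputStream
{
    // Fills as much of buf as possible; returns bytes actually read.
    size_t read(ubyte[] buf);
}

class MemoryStream : InputStream
{
    const(ubyte)[] data;
    this(const(ubyte)[] d) { data = d; }

    size_t read(ubyte[] buf)
    {
        auto n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];   // one copy, straight to the caller
        data = data[n .. $];
        return n;
    }
}

void main()
{
    InputStream s = new MemoryStream(cast(const(ubyte)[])"abcdef");
    auto buf = new ubyte[4];
    assert(s.read(buf) == 4 && buf == cast(ubyte[])"abcd");
    assert(s.read(buf) == 2 && buf[0 .. 2] == cast(ubyte[])"ef");
}
```

A real implementation would issue the OS read directly into buf, which is the point: the caller owns the buffer, so no double-buffering is forced by the API.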
May 16 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
 One direction that _could_ be helpful, perhaps, is to extend 
 the concept
 of range to include, let's tentatively call it, a ChunkedRange.
 Basically a ChunkedRange implements the usual InputRange 
 operations
 (empty, front, popfront) but adds the following new primitives:

 - bool hasAtLeast(R)(R range, int n) - true if underlying range 
 has at
   least n elements left;

I think it would be better to have a function that would return the number of elements left.
 - E[] frontN(R)(R range, int n) - returns a slice containing 
 the front n
   elements from the range: this will buffer the next n elements 
 from the
   range if they aren't already; repeated calls will just return 
 the
   buffer;

 - void popN(R)(R range, int n) - discards the first n elements 
 from the
   buffer, thus causing the next call to frontN() to fetch more 
 data if
   necessary.

I like the idea of frontN and popN. But is there any reason why a type that defines those (let's call it a stream) should also be a range? I would prefer to have a type that just defines those two functions, a function that returns the number of available elements, and a function that tells whether we are at the end of stream.

If you need a range of elements with a blocking popFront, it's easy to build one on top of it. You can write a function that takes any stream and returns a range of elements. I think that's better than having to write front, popFront, and empty for every stream.
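jerro's suggestion (define only the stream primitives, then derive an element range on top once, generically) might look like this sketch; ByteStream, StreamRange, and eos are all invented names.

```d
// Sketch: a stream type defines only frontN/popN-style primitives,
// and a single generic adapter turns any such stream into an input
// range. All names here are invented for illustration.
import std.algorithm : equal, map;

struct ByteStream
{
    const(ubyte)[] data;

    const(ubyte)[] frontN(size_t n)
    {
        return data[0 .. n < data.length ? n : data.length];
    }
    void popN(size_t n) { data = data[n < data.length ? n : data.length .. $]; }
    bool eos() { return data.length == 0; }
}

// Written once: any stream with frontN/popN/eos becomes an input range,
// so no stream author has to define front/popFront/empty themselves.
struct StreamRange(S)
{
    S stream;
    bool empty() { return stream.eos; }
    ubyte front() { return stream.frontN(1)[0]; }
    void popFront() { stream.popN(1); }
}

void main()
{
    auto r = StreamRange!ByteStream(ByteStream([1, 2, 3]));
    // The adapted stream composes with std.algorithm like any range.
    assert(r.map!(x => x * 2).equal([2, 4, 6]));
}
```

The adapter is the "better than having to write front, popFront, and empty for every stream" part: it is written once and reused for every stream type.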
May 16 2012
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina <art.08.09 gmail.com>  
wrote:

 On 05/16/12 22:15, Steven Schveighoffer wrote:
 I still don't get the need to "add" this to ranges.  The streaming API  
 works fine on its own.

This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it...

But you never would want to.

Don't get me wrong, the primitives here could work for a streaming API (I haven't implemented it that way, but it could be made to work). But the idea that it must *also* be a std.range input range makes zero sense.

To me, this is as obvious as not supporting linklist[index]. Sure, it can be done, but who would ever use it?

-Steve
May 16 2012
prev sibling next sibling parent reply =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= <xtzgzorex gmail.com> writes:
On 13-05-2012 23:38, Walter Bright wrote:
 This discussion started in the thread "Getting the const-correctness of
 Object sorted once and for all", but it deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure nothrow, but
 it's turning out to be problematic for doing it for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

While we're at it, do we want to keep std.outbuffer? -- - Alex
May 14 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
 While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.
May 14 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/14/2012 9:54 PM, H. S. Teoh wrote:
 On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
 On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
 While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.

Why not just fold this into std.io?

It's not I/O.
May 14 2012
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 15.05.2012 8:54, H. S. Teoh wrote:
 On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
 On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
 While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.

Why not just fold this into std.io? I'm surprised that this is a separate module, actually. It should either be folded into std.io, or developed to be more generic (i.e., have range-based API, have more features like auto-flushing past a certain size, etc.).

It's std.array Appender. The only difference is text vs binary output form. -- Dmitry Olshansky
May 15 2012
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
 On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.

Why not just fold this into std.io? I'm surprised that this is a separate module, actually. It should either be folded into std.io, or developed to be more generic (i.e., have a range-based API, have more features like auto-flushing past a certain size, etc.).

T

-- 
Prosperity breeds contempt, and poverty breeds consent. -- Suck.com
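An auto-flushing output buffer of the kind suggested above could be sketched like this; the FlushingBuffer name, the delegate sink, and the threshold policy are all assumptions made for illustration.

```d
// Sketch of an output buffer that flushes through a sink once the
// pending data passes a size threshold. Names and the delegate-based
// sink are invented for illustration.
struct FlushingBuffer
{
    void delegate(const(ubyte)[]) sink;   // where flushed data goes
    size_t threshold;                     // flush once pending reaches this
    ubyte[] pending;

    void put(const(ubyte)[] bytes)
    {
        pending ~= bytes;
        if (pending.length >= threshold)
            flush();
    }

    void flush()
    {
        if (pending.length)
        {
            sink(pending);
            pending.length = 0;
        }
    }
}

void main()
{
    ubyte[] written;
    auto b = FlushingBuffer((const(ubyte)[] d) { written ~= d; }, 4);

    b.put([1, 2, 3]);
    assert(written.length == 0);   // below threshold, still buffered

    b.put([4, 5]);                 // crosses threshold, auto-flushes
    ubyte[] expected = [1, 2, 3, 4, 5];
    assert(written == expected);
}
```

In a real std.io-style module the sink would be a file or socket write rather than a delegate, but the threshold logic would be the same.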
May 14 2012
prev sibling next sibling parent "Jonas Drewsen" <jdrewsen nospam.com> writes:
On Sunday, 13 May 2012 at 22:26:17 UTC, Walter Bright wrote:
 On 5/13/2012 3:16 PM, Nathan M. Swan wrote:
 Trying to make it read lazily is even harder, as all std.utf 
 functions work on
 arrays, not ranges. I think this should change.

Yes, std.utf should be upgraded to present range interfaces.

+1 on that. I really needed it when doing the std.net.curl stuff and would be happy to move it to a more generic handling in std.utf.
May 15 2012
prev sibling next sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 05/16/12 21:38, H. S. Teoh wrote:
 On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:
 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
 In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.

These are all tentative names, of course. But the idea is that you can keep N elements of the range "in view" at a time, without having to individually read them out and save them in a second buffer, and you can advance this view once you're done with the current data and want to move on.

Existing range operations like popFrontN, take, takeExactly, drop, etc., can be extended to take advantage of the extra functionality of ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since they amount to the same thing.) Using a ChunkedRange allows you to write functions that parse a particular range and return a range of chunks (say, a deserializer that returns a range of objects given a range of bytes).

Thinking on it a bit further, perhaps we can call this a WindowedRange, since it somewhat resembles the sliding window protocol where you keep a "window" of sequential packet ids in an active buffer, and remove them from the buffer as they get ack'ed (consumed by popN). The buffer thus acts like a "window" into the next n elements in the range, which can be "slid forward" as data is consumed.

Right now, everybody reinvents this, with a slightly different interface... It's really obvious, needed and just has to be standardized. A few notes:

hasAtLeast is redundant as it can be better expressed as .length; what would be the point of wrapping 'r.length>=n'? An '.available' property would be useful to find out eg how much can be consumed w/o blocking, but that one should return a size_t too.

'E[] frontN' should have a version that returns all available elements; I called it '@property E[] fronts()' here. It's more efficient that way and doesn't rely on the compiler to inline and optimize the limit checks away.

popN -- well, its signature here is 'void popFronts(size_t n)', other than that, there's no difference.

Similar things are necessary for output ranges. Here, what I needed was:

void put(ref E el)
void puts(E[] els)
@property size_t free() // Not the most intuitive name w/o context;
                        // returns the number of E's that can be 'put()'
                        // w/o blocking.

Note that all of this doesn't address the consume-variable-sized-chunks issue. But that can now be efficiently handled by another layer on top.

On 05/16/12 22:15, Steven Schveighoffer wrote:
 I still don't get the need to "add" this to ranges.  The streaming API works
fine on its own.

This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it...
 But there is an omission with your proposed API regardless -- reading data is
a mutating event.  It destructively mutates the underlying data stream so that
you cannot get the data again.  This means you must double-buffer data in order
to support frontN and popN that are not necessary with a simple read API.
 
 For example:
 
 auto buf = new ubyte[1000000];
 stream.read(buf);
 
 does not need to first buffer the data inside the stream and then copy it to
buf, it can read it from the OS *directly* into buf.

Sometimes having the buffer managed by 'stream' and 'read()' returning a slice into it works (this is what 'fronts' above does). Reusing a caller-managed buffer can be useful in other cases, yes.

artur
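The output-side primitives listed earlier in this post (put, puts, free) might be realized over a fixed-capacity buffer roughly like this; the OutputBuffer type and its backing array are invented for the sketch.

```d
// Sketch of the output-side primitives (put / puts / free) over a
// fixed-capacity buffer; the backing storage is invented for
// illustration.
struct OutputBuffer(E)
{
    E[] storage;
    size_t used;

    this(size_t capacity) { storage = new E[capacity]; }

    // Number of elements that can be put() without blocking.
    @property size_t free() { return storage.length - used; }

    void put(ref E el)
    {
        assert(free >= 1);
        storage[used++] = el;
    }

    // Bulk variant: one call, one capacity check, many elements.
    void puts(E[] els)
    {
        assert(free >= els.length);
        storage[used .. used + els.length] = els[];
        used += els.length;
    }
}

void main()
{
    auto b = OutputBuffer!int(4);
    int x = 7;
    b.put(x);
    b.puts([8, 9]);
    assert(b.free == 1);
    assert(b.storage[0 .. b.used] == [7, 8, 9]);
}
```

In the inter-thread scenario described above, puts is where the win comes from: one lock-unlock sequence covers the whole slice instead of one per element.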
May 16 2012
prev sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 05/16/12 22:58, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina <art.08.09 gmail.com> wrote:
 
 On 05/16/12 22:15, Steven Schveighoffer wrote:
 I still don't get the need to "add" this to ranges.  The streaming API works
fine on its own.

This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it...

But you never would want to. Don't get me wrong, the primitives here could work for a streaming API (I haven't implemented it that way, but it could be made to work). But the idea that it must *also* be a std.range input range makes zero sense.

Well, I do want to. For example, I can pass the produced data to *any* range consumer; it may not be as efficient as mine, but it will still work reasonably (I just did a quick test and the difference seems to be about 10G/s less for a plain front+popFront consumer).

The goal here is: if we could agree on a standard interface then *any* producer and consumer, including the ones in the std lib, could take advantage of this (optional) feature.

It's not so much about function call overhead as /syscall/ and /locking/ costs. Retrieving or writing 100 elements with only one lock-unlock sequence makes a large difference.
 To me, this is as obvious as not supporting linklist[index];  Sure, it can be
done, but who would ever use it?

This is not even related.

Your 'read(ref ubyte[])' approach can actually mean that one more copy of the data is required. Think writer->range_or_stream->reader -- unless the reader is already waiting with an empty buffer, the stream has to copy the data to an internal buffer, which then has to be copied again when a reader comes around. The 'slice[] = fronts' solution avoids the second copy.

Like I said, depending on the circumstances, sometimes you want one scheme, sometimes the other. (TBH, right now I can't think of a case where I'd prefer a non-range based approach; having the same i/f is just so convenient. But I'm sure there's one ;) )

artur
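The 'slice[] = fronts' pattern from this post can be sketched as follows: the consumer copies straight out of the producer's buffer into its own slice, so no intermediate staging copy is needed. The Producer type and its pre-filled buffer are invented; a real producer would refill the buffer from a writer thread.

```d
// Sketch of the 'slice[] = fronts' pattern: the consumer reads
// directly from the producer's buffer, avoiding a second copy.
// Names and the in-memory backing are invented for illustration.
struct Producer
{
    ubyte[] buffer;           // data already buffered by the producer

    // Expose the buffered elements directly - no copy here.
    @property const(ubyte)[] fronts() { return buffer; }

    void popFronts(size_t n) { buffer = buffer[n .. $]; }
}

void main()
{
    auto p = Producer(cast(ubyte[])"stream data".dup);
    auto dest = new ubyte[6];

    auto view = p.fronts[0 .. dest.length];
    dest[] = view;            // single copy: producer buffer -> dest
    p.popFronts(dest.length);

    assert(dest == cast(ubyte[])"stream");
    assert(p.fronts == cast(const(ubyte)[])" data");
}
```

With read(buf) the same transfer would require the stream to stage the data internally first whenever the reader is not already waiting, which is the extra copy being discussed.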
May 16 2012