
digitalmars.D - deprecating std.stream, std.cstream, std.socketstream

reply Walter Bright <newshound2 digitalmars.com> writes:
This discussion started in the thread "Getting the const-correctness of Object 
sorted once and for all", but it deserved its own thread.

These modules suffer from the following problems:

1. poor documentation, dearth of examples & rationale

2. toHash(), toString(), etc., all need to be const pure nothrow, but it's 
turning out to be problematic for doing it for these classes

3. overlapping functionality with std.stdio

4. they should present a range interface, not a streaming one
May 13 2012
next sibling parent Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 13-05-2012 23:38, Walter Bright wrote:
 This discussion started in the thread "Getting the const-correctness of
 Object sorted once and for all", but it deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure nothrow, but
 it's turning out to be problematic for doing it for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

I'm all for it. I haven't used any of them, ever, and probably never will. Their APIs aren't particularly appealing, honestly.

-- Alex
May 13 2012
prev sibling next sibling parent "Nathan M. Swan" <nathanmswan gmail.com> writes:
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
 4. they should present a range interface, not a streaming one

I was just about to make a post suggesting that! You could easily integrate std.io with std.algorithm to do some pretty cool things. NMS
May 13 2012
prev sibling next sibling parent reply "Kiith-Sa" <42 theanswer.com> writes:
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:
 This discussion started in the thread "Getting the 
 const-correctness of Object sorted once and for all", but it 
 deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure 
 nothrow, but it's turning out to be problematic for doing it 
 for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

My D:YAML library (YAML parser) depends on std.stream (e.g. for cross-endian compatibility and loading from memory), and I've been waiting for a replacement since the first release. I support removing std.stream, but it needs a replacement with equivalent functionality. Actually, I've postponed a 1.0 release _until_ std.stream is replaced.
May 13 2012
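On the cross-endian point: for fixed-width values, Phobos' std.bitmanip can already reinterpret raw bytes under either byte order, independent of where the bytes came from. A small sketch:

```d
import std.bitmanip : bigEndianToNative, littleEndianToNative;

void main()
{
    // Four raw bytes as they might come off any byte source.
    ubyte[4] raw = [0x00, 0x00, 0x01, 0x00];

    // The same bytes interpreted under each byte order.
    assert(bigEndianToNative!uint(raw) == 0x0000_0100);    // 256
    assert(littleEndianToNative!uint(raw) == 0x0001_0000); // 65536
}
```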
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/13/2012 3:16 PM, Nathan M. Swan wrote:
 Trying to make it read lazily is even harder, as all std.utf functions work on
 arrays, not ranges. I think this should change.

Yes, std.utf should be upgraded to present range interfaces.
May 13 2012
prev sibling next sibling parent "Nathan M. Swan" <nathanmswan gmail.com> writes:
On Sunday, 13 May 2012 at 21:53:58 UTC, Kiith-Sa wrote:
 My D:YAML library (YAML parser) depends on std.stream
 (e.g. for cross-endian compatibility and loading from memory),
 and I've been waiting for a replacement since the first release.

We also need better interfacing with UTFs in that. D is usually great at Unicode, but it doesn't interface well with I/O. For example, when working on the file-reader for SDC I had to hand-code the check-BOM-read-and-convert functions: http://bit.ly/J0QWVF . Trying to make it read lazily is even harder, as all std.utf functions work on arrays, not ranges. I think this should change. NMS
May 13 2012
prev sibling next sibling parent Robert Clipsham <robert octarineparrot.com> writes:
On 13/05/2012 22:38, Walter Bright wrote:
 This discussion started in the thread "Getting the const-correctness of
 Object sorted once and for all", but it deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure nothrow, but
 it's turning out to be problematic for doing it for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

I make use of std.stream quite a lot... It's horrible, it has to go. I'm not too bothered if replacements aren't available straight away, as it doesn't take much to drop 10 lines of replacement in for the functionality I use from it until the actual replacement appears. -- Robert http://octarineparrot.com/
May 13 2012
prev sibling next sibling parent reply "Oleg Kuporosov" <Oleg.Kuporosov gmail.com> writes:
On Sunday, 13 May 2012 at 21:39:07 UTC, Walter Bright wrote:

 3. overlapping functionality with std.stdio

unfortunately std.stdio under Windows can't handle UTF-16 (wchar)-based file names and text I/O, which are natural there. The root of the issue seems to lie both in the underlying DMC C stdio (something wrong with the w*-based functions?) and in std.format, which provides only UTF-8 strings. It makes sense to deprecate these modules for the stated reasons, but only after std.stdio supports UTF-16 names/streams or a good replacement (Steven's std.io?) is ready. Currently std.[c]stream is the only way to work with UTF-16 filesystems in Phobos. Or switch to Tango, which looks like it supports this too (but I have no experience there).
May 13 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/13/2012 10:22 PM, Oleg Kuporosov wrote:
 unfortunately std.stdio under Windows can't handle UTF-16 (wchar)-based file
 names and text I/O, which are natural there. The root of the issue seems to
 lie both in the underlying DMC C stdio (something wrong with the w*-based
 functions?) and in std.format, which provides only UTF-8 strings. It makes
 sense to deprecate these modules for the stated reasons, but only after
 std.stdio supports UTF-16 names/streams or a good replacement (Steven's
 std.io?) is ready. Currently std.[c]stream is the only way to work with
 UTF-16 filesystems in Phobos. Or switch to Tango, which looks like it
 supports this too (but I have no experience there).

Why not just convert the UTF16 strings to UTF8 ones? They have the same information.
May 14 2012
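For the file-name case at least, the conversion Walter suggests is mechanical with std.utf, and it round-trips losslessly. A sketch:

```d
import std.utf : toUTF16, toUTF8;

void main()
{
    // A UTF-16 file name, as the Windows APIs naturally supply it.
    wstring wname = "résumé.txt"w;

    // Re-encode as UTF-8 for APIs that take string.
    string name = toUTF8(wname);
    assert(name == "résumé.txt");

    // No information is lost: converting back yields the original.
    assert(toUTF16(name) == wname);
}
```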
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 5/13/12, Kiith-Sa <42 theanswer.com> wrote:
 My D:YAML library (YAML parser) depends on std.stream

Also ae.xml depends on it.
May 14 2012
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
 From the other thread....

On 13/05/2012 21:58, Walter Bright wrote:
 On 5/13/2012 1:48 PM, Stewart Gordon wrote:
 On 13/05/2012 20:42, Walter Bright wrote:
 <snip>
 I'd like to see std.stream dumped.  I don't see any reason for it
 to exist that std.stdio doesn't do (or should do).

So std.stdio.File is the replacement for the std.stream stuff?

Not exactly. Ranges are the replacement. std.stdio.File is merely a range that deals with files.

I don't see any of the required range methods in it. Moreover, I'm a bit confused about the means of retrieving multiple elements at once with the range API, such as a set number of bytes from a file. We have popFrontN, which advances the range but doesn't return the data from it. We have take and takeExactly, which seem to be the way to get a set number of elements from the range, but I'm confused about when/whether using these advances the original range.

If I'm writing a library to read a binary file format, I want to allow the data to come from a file, a socket or a memory image. The stream API makes this straightforward. But it seems some work is needed before std.stdio and the range API are up to it.

Stewart.
May 14 2012
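The take/takeExactly question can be checked against an in-memory range. A sketch (this shows the sliceable case; for a non-forward input range, iterating the result of take does consume the underlying source):

```d
import std.array : array;
import std.range : popFrontN, take, takeExactly;

void main()
{
    auto data = [1, 2, 3, 4, 5];

    // On a sliceable range, take returns a view; the source is not advanced.
    assert(data.take(3).array == [1, 2, 3]);
    assert(data == [1, 2, 3, 4, 5]);

    // takeExactly is the same here, but promises the elements exist.
    assert(data.takeExactly(2).array == [1, 2]);

    // Consuming is explicit: popFrontN advances but discards the data.
    auto rest = data;
    rest.popFrontN(3);
    assert(rest == [4, 5]);
}
```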
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/14/2012 4:43 AM, Stewart Gordon wrote:
 If I'm writing a library to read a binary file format, I want to allow the data
 to come from a file, a socket or a memory image. The stream API makes this
 straightforward. But it seems some work is needed before std.stdio and the
range
 API are up to it.

I agree. But that's where the effort needs to be made, not in carrying stream forward.
May 14 2012
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 13 May 2012 17:38:23 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 This discussion started in the thread "Getting the const-correctness of  
 Object sorted once and for all", but it deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure nothrow, but  
 it's turning out to be problematic for doing it for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

I keep trying to avoid talking about this, because I'm writing a replacement library for std.stream, and I don't want to step on any toes while it's still not accepted.

But I have to say, ranges are *not* a good interface for generic data providers. They are *very* good for structured data providers. In other words, a stream of bytes, not a good range (who wants to get one byte at a time?). A stream of UTF text broken into lines, a very good range.

I have no problem with getting rid of std.stream. I've never actually used it. Still, we absolutely need a non-range based low-level streaming interface to data. If nothing else, we need something we can build ranges upon, and I think my replacement does a very good job of that.

-Steve
May 14 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
 I keep trying to avoid talking about this, because I'm writing a replacement
 library for std.stream, and I don't want to step on any toes while it's still
 not accepted.

 But I have to say, ranges are *not* a good interface for generic data
providers.
 They are *very* good for structured data providers.

 In other words, a stream of bytes, not a good range (who wants to get one byte
 at a time?). A stream of UTF text broken into lines, a very good range.

 I have no problem with getting rid of std.stream. I've never actually used it.
 Still, we absolutely need a non-range based low-level streaming interface to
 data. If nothing else, we need something we can build ranges upon, and I think
 my replacement does a very good job of that.

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.

I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal.

The ability to do things like:

void main()
{
    stdin.byChunk(1024).
        map!(a => a.idup).  // one of those shortcomings
        joiner().
        stripComments().
        copy(stdout.lockingTextWriter());
}

is just kick ass.
May 14 2012
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:
 On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
 I keep trying to avoid talking about this, because I'm writing 
 a replacement
 library for std.stream, and I don't want to step on any toes 
 while it's still
 not accepted.

 But I have to say, ranges are *not* a good interface for 
 generic data providers.
 They are *very* good for structured data providers.

 In other words, a stream of bytes, not a good range (who wants 
 to get one byte
 at a time?). A stream of UTF text broken into lines, a very 
 good range.

 [...]

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based. I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. [...]

I have to say, I'm with Steve on this one. While I do believe ranges will have a very important role to play in D's future I/O paradigm, I also think there needs to be a layer beneath the ranges that more directly maps to OS primitives. And as D is a systems programming language, that layer needs to be publicly available. (Note that this is how std.stdio works now, more or less.) -Lars
May 15 2012
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Tuesday, 15 May 2012 at 15:22:03 UTC, Lars T. Kyllingstad 
wrote:
 On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:
 On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
 I keep trying to avoid talking about this, because I'm 
 writing a replacement
 library for std.stream, and I don't want to step on any toes 
 while it's still
 not accepted.

 But I have to say, ranges are *not* a good interface for 
 generic data providers.
 They are *very* good for structured data providers.

 In other words, a stream of bytes, not a good range (who 
 wants to get one byte
 at a time?). A stream of UTF text broken into lines, a very 
 good range.

 [...]

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based. I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal. [...]

I have to say, I'm with Steve on this one. While I do believe ranges will have a very important role to play in D's future I/O paradigm, I also think there needs to be a layer beneath the ranges that more directly maps to OS primitives. And as D is a systems programming language, that layer needs to be publicly available. (Note that this is how std.stdio works now, more or less.)

Also, I wouldn't mind std.*stream getting deprecated. Personally, I've never used those modules -- not even once. As a first step their documentation could be removed from dlang.org, so new users aren't tempted to start using them. No functionality is better than poor functionality, IMO. -Lars
May 15 2012
prev sibling next sibling parent reply "Nathan M. Swan" <nathanmswan gmail.com> writes:
On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer 
wrote:
 In other words, a stream of bytes, not a good range (who wants 
 to get one byte at a time?).  A stream of UTF text broken into 
 lines, a very good range.

There are several cases where one would want one byte at a time; e.g. as an input to another range that produces the UTF text as an output. I do agree that, e.g., with binary data some things can't be read with ranges (when you need to read small chunks of varying size), but that doesn't mean most things shouldn't be range-based. NMS
May 15 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.
May 16 2012
next sibling parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 16/05/2012 15:38, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read? -Steve

A bit ugly but:

----
// Default to 4 byte chunks
auto range = myStream.byChunks(4);

foreach (chunk; range)
{
    // The next chunk is 3 bytes;
    // the chunk after reverts to 4 bytes
    range.nextChunkSize = 3;

    // Every chunk from here on is 5 bytes
    range.chunkSize = 5;
}
----

-- Robert http://octarineparrot.com/
May 16 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 16.05.2012 19:32, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham
 <robert octarineparrot.com> wrote:

 On 16/05/2012 15:38, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read? -Steve

 A bit ugly but:

 ----
 // Default to 4 byte chunks
 auto range = myStream.byChunks(4);

 foreach (chunk; range)
 {
     // The next chunk is 3 bytes;
     // the chunk after reverts to 4 bytes
     range.nextChunkSize = 3;

     // Every chunk from here on is 5 bytes
     range.chunkSize = 5;
 }

Yeah, I've seen this before. It's not convincing.

Yes, it's obvious that files do *not* generally follow range-of-items semantics. I mean, not even a range of various items. In the case of binary data it's most of the time a header followed by various data. Or a hierarchical structure. Or a table of links + raw data. Or whatever. I've yet to see a standard way to deal with binary formats :) -- Dmitry Olshansky
May 16 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it. In general, you can read n bytes by calling empty, front, and popFront n times.
May 16 2012
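Walter's "n calls" approach, spelled out as a generic helper over any input range of bytes (readN is a hypothetical name, not Phobos; whether this performs acceptably is exactly what is disputed elsewhere in the thread):

```d
import std.range.primitives : empty, front, popFront;

// Read up to n bytes by calling empty/front/popFront n times.
// (readN is a hypothetical helper, not part of Phobos.)
ubyte[] readN(R)(ref R r, size_t n)
{
    ubyte[] result;
    foreach (i; 0 .. n)
    {
        if (r.empty) break;
        result ~= r.front;
        r.popFront();
    }
    return result;
}

void main()
{
    ubyte[] src = [1, 2, 3, 4, 5];
    auto r = src[];
    assert(readN(r, 3) == [1, 2, 3]);
    assert(readN(r, 10) == [4, 5]); // stops at end of input
}
```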
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 16/05/2012 16:59, Walter Bright wrote:
 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

And is what you want to do with a text file in many cases.
 In general, you can read n bytes by calling empty, front, and popFront n times.

Why would anybody want to read a large binary file _one byte at a time_? Stewart.
May 16 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2012 9:41 AM, Stewart Gordon wrote:
 On 16/05/2012 16:59, Walter Bright wrote:
 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

And is what you want to do with a text file in many cases.
 In general, you can read n bytes by calling empty, front, and popFront n times.

Why would anybody want to read a large binary file _one byte at a time_?

You can have that range read from byChunk(). It's really the same thing that C's stdio does.
May 16 2012
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 16/05/2012 18:21, Walter Bright wrote:
<snip>
 You can have that range read from byChunk(). It's really the same thing that
C's stdio does.

And what if I want it to work on ranges that don't have a byChunk method? Stewart.
May 16 2012
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 16/05/2012 17:48, H. S. Teoh wrote:
 On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:

 Why would anybody want to read a large binary file _one byte at a
 time_?

import std.range;

auto readNBytes(R)(R range, size_t n)
    if (isInputRange!R && hasSlicing!R)
{
    return range[0 .. n];
}

What if I want it to work on ranges that don't have slicing? Stewart.
May 16 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 11:59:37 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

Have you looked at how std.byLine works? It certainly does not use a range interface as a source.

It presents a range interface, though. Not a streaming one.
 In general, you can read n bytes by calling empty, front, and popFront n times.

I hope you are not serious! This will make D *the worst performing* i/o language.

You can read arbitrary numbers of bytes by tacking a range on after byChunk().
May 16 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
 In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges. Andrei
May 16 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/16/12 1:00 PM, Steven Schveighoffer wrote:
 What I think we would end up with is a streaming API with range
 primitives tacked on.

 - empty is clunky, but possible to implement. However, it may become
 invalid (think of reading a file that is being appended to by another
 process).
 - popFront and front do not have any clear definition of what they refer
 to. The only valid thing I can think of is bytes, and then nobody will
 use them.

 That's hardly saying it's "range based". I refuse to believe that people
 will be thrilled by having to 'pre-configure' each front and popFront
 call in order to get work done. If you want to try and convince me, I'm
 willing to listen, but so far I haven't seen anything that looks at all
 appetizing.

Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element. That makes the range interface unsuitable for strictly UNbuffered streams. On the other hand, a range could no problem offer OPTIONAL unbuffered reads (the existence of a buffer does not preclude availability of unbuffered transfers). So to tie it all nicely I think we need: 1. A STRICTLY UNBUFFERED streaming abstraction 2. A notion of range that supports unbuffered transfers. Andrei
May 16 2012
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote:
 On 5/16/12 1:00 PM, Steven Schveighoffer wrote:
 What I think we would end up with is a streaming API with range
 primitives tacked on.

 - empty is clunky, but possible to implement. However, it may become
 invalid (think of reading a file that is being appended to by another
 process).
 - popFront and front do not have any clear definition of what they refer
 to. The only valid thing I can think of is bytes, and then nobody will
 use them.

 That's hardly saying it's "range based". I refuse to believe that people
 will be thrilled by having to 'pre-configure' each front and popFront
 call in order to get work done. If you want to try and convince me, I'm
 willing to listen, but so far I haven't seen anything that looks at all
 appetizing.

Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element.

I don't think this necessarily holds. 'front' might be computed on the fly, as it is done for std.algorithm.map.
 That makes the range interface unsuitable for strictly UNbuffered
 streams. On the other hand, a range could no problem offer OPTIONAL
 unbuffered reads (the existence of a buffer does not preclude
 availability of unbuffered transfers).

 So to tie it all nicely I think we need:

 1. A STRICTLY UNBUFFERED streaming abstraction

 2. A notion of range that supports unbuffered transfers.


 Andrei

May 16 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/16/12 4:40 PM, Timon Gehr wrote:
 On 05/16/2012 11:08 PM, Andrei Alexandrescu wrote:
 On 5/16/12 1:00 PM, Steven Schveighoffer wrote:
 What I think we would end up with is a streaming API with range
 primitives tacked on.

 - empty is clunky, but possible to implement. However, it may become
 invalid (think of reading a file that is being appended to by another
 process).
 - popFront and front do not have any clear definition of what they refer
 to. The only valid thing I can think of is bytes, and then nobody will
 use them.

 That's hardly saying it's "range based". I refuse to believe that people
 will be thrilled by having to 'pre-configure' each front and popFront
 call in order to get work done. If you want to try and convince me, I'm
 willing to listen, but so far I haven't seen anything that looks at all
 appetizing.

Where the two meet is in the notion of buffered streams. Ranges are by default buffered, i.e. user code can call front() several times without an intervening popFront() and get the same thing. So a range has by definition a buffer of at least one element.

I don't think this necessarily holds. 'front' might be computed on the fly, as it is done for std.algorithm.map.

It used to be buffered in fact but that was too much trouble. The fair thing to say here is that map relies on the implicit buffering of its input. Andrei
May 16 2012
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 16, 2012 at 04:52:09PM -0400, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:
 
On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:
On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh
<hsteoh quickfur.ath.cx> wrote:

One direction that _could_ be helpful, perhaps, is to extend the
concept of range to include, let's tentatively call it, a
ChunkedRange.  Basically a ChunkedRange implements the usual
InputRange operations (empty, front, popfront) but adds the
following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has
  at least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the
  front n elements from the range: this will buffer the next n
  elements from the range if they aren't already; repeated calls
  will just return the buffer;

- void popN(R)(R range, int n) - discards the first n elements from
  the buffer, thus causing the next call to frontN() to fetch more
  data if necessary.

On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.

How so? It's still useful for implementing readByte, for example.

readByte is covered by frontN(1). Why the need for front()? Let me answer that question for you -- so it can be treated as a normal range. But nobody will want to do that. i.e. copy to appender will read one byte at a time into the array.

If this new type of range is recognized by std.range, then the relevant algorithms can be made to recognize the existence of frontN and make good use of it, instead of iterating front N times. Then front() can still be used by stuff that really only wants a single byte at a time. [...]
The idea is that by asking for N elements at a time instead of
calling front/popFront N times, the underlying implementation can
optimize the request by creating a buffer of size N and have the OS
read exactly N bytes directly into that buffer.

	// Reads 1,000,000 bytes into newly allocated buffer and returns
	// buffer.
	auto buf = stream.frontN(1_000_000);

OK, so stream is providing data via return value and allocation.
	// Since 1,000,000 bytes is already read into the buffer, this
	// simply returns a slice of the same buffer:
	auto buf2 = stream.frontN(1_000_000);

Is buf2 mutable? If so, this is no good, buf could have mutated this data. But this can be fixed by making the return value of frontN be const(ubyte)[].
	assert(buf is buf2);

	// This consumes the buffer:
	stream.popN(1_000_000);

What does "consume" mean, discard? Obviously not "reuse", due to line below...

Yes, discard. That's what popFront does right now for a single element.
	// This will read another 1,000,000 bytes into a new buffer
	auto buf3 = stream.frontN(1_000_000);

OK, you definitely lost me here, this will not fly. The whole point of buffering is to avoid having to reallocate on every read. If you have to allocate every read, "buffering" is going to have a negative impact on performance!

I thought the whole point of buffering is to avoid excessive roundtrips to disk I/O. Though you do have a point that allocating on every read is a bad idea. T -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? -- Michael Beibl
May 16 2012
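The frontN/popN primitives under discussion can be modeled in a few lines over an in-memory buffer. A sketch only: the names hasAtLeast/frontN/popN come from the proposal, not Phobos, and a real implementation would refill from the OS rather than truncate at the end of the buffer:

```d
// A minimal in-memory model of the proposed ChunkedRange primitives.
// (hasAtLeast/frontN/popN are the proposal's names, not Phobos.)
struct ChunkedRange
{
    const(ubyte)[] buf;

    bool hasAtLeast(size_t n) const { return buf.length >= n; }

    // Return up to n front elements without consuming them;
    // repeated calls yield the same data.
    const(ubyte)[] frontN(size_t n) const
    {
        return buf[0 .. n <= buf.length ? n : buf.length];
    }

    // Discard the first n elements.
    void popN(size_t n)
    {
        buf = buf[(n <= buf.length ? n : buf.length) .. $];
    }
}

void main()
{
    ubyte[] data = [1, 2, 3, 4, 5];
    auto r = ChunkedRange(data);

    assert(r.hasAtLeast(5) && !r.hasAtLeast(6));
    assert(r.frontN(3) == [1, 2, 3]);
    assert(r.frontN(3) == [1, 2, 3]); // no consumption
    r.popN(3);
    assert(r.frontN(3) == [4, 5]);    // truncated at end of data
}
```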
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/15/2012 4:43 PM, Sean Kelly wrote:
 One thing I'd like in a buffered input API is a way to perform transactional
 reads such that if the full read can't be performed, the read state remains
 unchanged. The best you can do with most APIs is to check for a desired
 length, but what if I don't want to read until a full line is available, and
 I don't know the exact length?  Typically, you end up having to double
 buffer, which stinks.

std.stdio.byLine()
May 16 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2012 7:49 AM, Sean Kelly wrote:
 On May 16, 2012, at 6:52 AM, Walter Bright<newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 4:43 PM, Sean Kelly wrote:
 One thing I'd like in a buffered input API is a way to perform
 transactional reads such that if the full read can't be performed, the
 read state remains unchanged. The best you can do with most APIs is to
 check for a desired length, but what if I don't want to read until a
 full line is available, and I don't know the exact length?  Typically,
 you end up having to double buffer, which stinks.

std.stdio.byLine()

That was just an example. What if I want to do a formatted read and I'm reading from a file that someone else is writing to? I don't want to block or get a partial result and an EOF that needs to be reset.

Then you'll need an input range that can be reset - a ForwardRange.
May 16 2012
prev sibling next sibling parent travert phare.normalesup.org (Christophe Travert) writes:
"Steven Schveighoffer" , dans le message (digitalmars.D:167548), a
 My new design supports this.  I have a function called readUntil:
 
 https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832
 
 Essentially, it reads into its buffer until the condition is satisfied.   
 Therefore, you are not double buffering.  The return value is a slice of  
 the buffer.
 
 There is a way to opt-out of reading any data if you determine you cannot  
 do a full read.  Just return 0 from the delegate.

Maybe I already told this some time ago, but I am not very comfortable with this design. The process delegate has to maintain an internal state if you want to avoid reading everything again. It will be difficult to implement those process delegates. Do you have an example of a moderately complicated reading process, to show us it is not too complicated?

To avoid this issue, the design could be reversed: a method that would like to read a certain amount of characters could take a delegate from the stream, which provides additional bytes of data. Example:

// create a T by reading from stream. returns true if the T was
// successfully created, and false otherwise.
bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

The stream delegate returns a buffer of data to read from when called with consumed == 0. It must return additional data when called repeatedly. When it is called with consumed != 0, the corresponding amount of consumed bytes can be discarded from the buffer.

This "stream" delegate (it should have a better name) should not be more difficult to implement than readUntil, but it makes things easier for the client. Did I miss some important information?

-- 
Christophe
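To make the shape of this reversed design concrete, here is a minimal, hypothetical `readFrom` for a little-endian `uint`. The delegate protocol follows the sketch above; none of this is an actual Phobos API, and for simplicity this version fails instead of waiting for more data.

```d
import std.bitmanip : littleEndianToNative;

// Create a uint by reading from the stream delegate. Returns true if the
// value was successfully read, false if not enough data is available.
// stream(0) returns the currently buffered data without consuming any;
// stream(n) marks n bytes as consumed and returns what remains.
bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out uint t)
{
    auto buf = stream(0);          // peek at the buffer, consume nothing
    if (buf.length < 4)
        return false;              // not enough data for a uint
    ubyte[4] raw = buf[0 .. 4];
    t = littleEndianToNative!uint(raw);
    stream(4);                     // discard the 4 consumed bytes
    return true;
}

void main()
{
    // A toy in-memory "stream" standing in for a buffered file or socket.
    ubyte[] data = [1, 0, 0, 0, 2, 0, 0, 0];
    size_t pos = 0;
    const(ubyte)[] stream(size_t consumed)
    {
        pos += consumed;
        return data[pos .. $];
    }

    uint v;
    assert(readFrom(&stream, v) && v == 1);
    assert(readFrom(&stream, v) && v == 2);
    assert(!readFrom(&stream, v)); // out of data
}
```

A real implementation would loop, calling the delegate again to wait for additional bytes, as the post describes.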
May 16 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:

One direction that _could_ be helpful, perhaps, is to extend the concept
of range to include, let's tentatively call it, a ChunkedRange.
Basically a ChunkedRange implements the usual InputRange operations
(empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.

On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.

How so? It's still useful for implementing readByte, for example.
 I still don't get the need to "add" this to ranges.  The streaming API
 works fine on its own.
 
 But there is an omission with your proposed API regardless --
 reading data is a mutating event.  It destructively mutates the
 underlying data stream so that you cannot get the data again.  This
 means you must double-buffer data in order to support frontN and
 popN that are not necessary with a simple read API.
 
 For example:
 
 auto buf = new ubyte[1000000];
 stream.read(buf);
 
 does not need to first buffer the data inside the stream and then
 copy it to buf, it can read it from the OS *directly* into buf.

The idea is that by asking for N elements at a time instead of calling front/popFront N times, the underlying implementation can optimize the request by creating a buffer of size N and have the OS read exactly N bytes directly into that buffer.

	// Reads 1,000,000 bytes into newly allocated buffer and returns
	// buffer.
	auto buf = stream.frontN(1_000_000);

	// Since 1,000,000 bytes is already read into the buffer, this
	// simply returns a slice of the same buffer:
	auto buf2 = stream.frontN(1_000_000);

	assert(buf is buf2);

	// This consumes the buffer:
	stream.popN(1_000_000);

	// This will read another 1,000,000 bytes into a new buffer
	auto buf3 = stream.frontN(1_000_000);

	// This returns the same buffer as buf3 since we already have
	// the data available.
	auto buf4 = stream.frontN(1_000_000);

T

-- 
English has the lovely word "defenestrate", meaning "to execute by throwing someone out a window", or more recently "to remove Windows from a computer and replace it with something useful". :-) -- John Cowan
May 16 2012
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 16:30:43 -0400, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 On Wed, May 16, 2012 at 04:15:22PM -0400, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:

One direction that _could_ be helpful, perhaps, is to extend the concept
of range to include, let's tentatively call it, a ChunkedRange.
Basically a ChunkedRange implements the usual InputRange operations
(empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.
On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.

How so? It's still useful for implementing readByte, for example.

readByte is covered by frontN(1). Why the need for front()? Let me answer that question for you -- so it can be treated as a normal range. But nobody will want to do that. i.e. copy to appender will read one byte at a time into the array.
 I still don't get the need to "add" this to ranges.  The streaming API
 works fine on its own.

 But there is an omission with your proposed API regardless --
 reading data is a mutating event.  It destructively mutates the
 underlying data stream so that you cannot get the data again.  This
 means you must double-buffer data in order to support frontN and
 popN that are not necessary with a simple read API.

 For example:

 auto buf = new ubyte[1000000];
 stream.read(buf);

 does not need to first buffer the data inside the stream and then
 copy it to buf, it can read it from the OS *directly* into buf.

The idea is that by asking for N elements at a time instead of calling front/popFront N times, the underlying implementation can optimize the request by creating a buffer of size N and have the OS read exactly N bytes directly into that buffer.

	// Reads 1,000,000 bytes into newly allocated buffer and returns
	// buffer.
	auto buf = stream.frontN(1_000_000);

OK, so stream is providing data via return value and allocation.
 	// Since 1,000,000 bytes is already read into the buffer, this
 	// simply returns a slice of the same buffer:
 	auto buf2 = stream.frontN(1_000_000);

Is buf2 mutable? If so, this is no good, buf could have mutated this data. But this can be fixed by making the return value of frontN be const(ubyte)[].
 	assert(buf is buf2);

 	// This consumes the buffer:
 	stream.popN(1_000_000);

What does "consume" mean, discard? Obviously not "reuse", due to line below...
 	// This will read another 1,000,000 bytes into a new buffer
 	auto buf3 = stream.frontN(1_000_000);

OK, you definitely lost me here, this will not fly. The whole point of buffering is to avoid having to reallocate on every read. If you have to allocate on every read, "buffering" is going to have a negative impact on performance!

-Steve
May 16 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On May 15, 2012, at 3:34 PM, "Nathan M. Swan" <nathanmswan gmail.com> wrote:

 On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer wrote:
 In other words, a stream of bytes, not a good range (who wants to get one
 byte at a time?). A stream of UTF text broken into lines, a very good range.

 There are several cases where one would want one byte at the time; e.g. as

 I do agree for e.g. with binary data some data can't be read with ranges
 (when you need to read small chunks of varying size), but most things
 shouldn't be ranged-based.

You really want both, depending on the situation. I don't see what's weird about this. C++ iostreams have input and output iterators built on top as well, for much the same reason. The annoying part is that once you've moved to a range interface it's hard to go back. Like say I want a ZipRange on top of a FileRange. But now I want to read structs as binary blobs from that uncompressed output.

One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length? Typically, you end up having to double buffer, which stinks.
May 15 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, May 15, 2012 at 04:43:05PM -0700, Sean Kelly wrote:
[...]
 One thing I'd like in a buffered input API is a way to perform
 transactional reads such that if the full read can't be performed, the
 read state remains unchanged. The best you can do with most APIs is to
 check for a desired length, but what if I don't want to read until a
 full line is available, and I don't know the exact length?  Typically,
 you end up having to double buffer, which stinks. 

This would be very nice to have, but how would you go about implementing such a thing? Wouldn't you need OS-level support for it?

T

-- 
Let's eat some disquits while we format the biskettes.
May 15 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 15 May 2012 19:43:05 -0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 One thing I'd like in a buffered input API is a way to perform  
 transactional reads such that if the full read can't be performed, the  
 read state remains unchanged. The best you can do with most APIs is to  
 check for a desired length, but what if I don't want to read until a  
 full line is available, and I don't know the exact length?  Typically,  
 you end up having to double buffer, which stinks.

My new design supports this.  I have a function called readUntil:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832

Essentially, it reads into its buffer until the condition is satisfied.  Therefore, you are not double buffering.  The return value is a slice of the buffer.

There is a way to opt out of reading any data if you determine you cannot do a full read.  Just return 0 from the delegate.

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 14 May 2012 22:56:08 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
 I keep trying to avoid talking about this, because I'm writing a  
 replacement
 library for std.stream, and I don't want to step on any toes while it's  
 still
 not accepted.

 But I have to say, ranges are *not* a good interface for generic data  
 providers.
 They are *very* good for structured data providers.

 In other words, a stream of bytes, not a good range (who wants to get  
 one byte
 at a time?). A stream of UTF text broken into lines, a very good range.

 I have no problem with getting rid of std.stream. I've never actually  
 used it.
 Still, we absolutely need a non-range based low-level streaming  
 interface to
 data. If nothing else, we need something we can build ranges upon, and  
 I think
 my replacement does a very good job of that.

I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.

I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal.

The ability to do things like:

void main() {
    stdin.byChunk(1024).
        map!(a => a.idup).   // one of those shortcomings
        joiner().
        stripComments().
        copy(stdout.lockingTextWriter());
}

I think we may have a misunderstanding. My design is not range-based, but supports ranges, and actually makes them very easy to implement.

byChunk is a perfect example of a good range -- it defines a specific criteria for determining an "element" of data, appropriate for specific situations. But it must be built on top of something that allows reading arbitrary amounts of data. At the lowest level, this is the OS file descriptor/HANDLE. To be efficient, it should be based on a buffering stream. That buffering stream *does not* need to be a range, and I don't think shoehorning such a construct into a range interface makes any sense.

To make this clear, I can say that any range File supports, my design will support *as a range*. To make it even clearer, the current std.stdio.File structure, which you have shown to "kick ass" with ranges, is *NOT* range-based by my definition.

I should note, the output range idiom is directly supported, because the output range definition exactly maps to an output stream definition.

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with  
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

-Steve
May 16 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On May 16, 2012, at 6:52 AM, Walter Bright <newshound2 digitalmars.com> wrote:

 On 5/15/2012 4:43 PM, Sean Kelly wrote:
 One thing I'd like in a buffered input API is a way to perform transactional
 reads such that if the full read can't be performed, the read state remains
 unchanged. The best you can do with most APIs is to check for a desired
 length, but what if I don't want to read until a full line is available, and
 I don't know the exact length?  Typically, you end up having to double
 buffer, which stinks.

std.stdio.byLine()

That was just an example. What if I want to do a formatted read and I'm reading from a file that someone else is writing to? I don't want to block or get a partial result and an EOF that needs to be reset.
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 10:03:42 -0400, Christophe Travert  
<travert phare.normalesup.org> wrote:

 "Steven Schveighoffer" , dans le message (digitalmars.D:167548), a
 My new design supports this.  I have a function called readUntil:

 https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832

 Essentially, it reads into its buffer until the condition is satisfied.
 Therefore, you are not double buffering.  The return value is a slice of
 the buffer.

 There is a way to opt-out of reading any data if you determine you  
 cannot
 do a full read.  Just return 0 from the delegate.

Maybe I already told this some time ago, but I am not very comfortable with this design. The process delegate has to maintain an internal state, if you want to avoid reading everything again. It will be difficult to implement those process delegates.

The delegate is given which portion has already been "processed", that is the 'start' parameter.  If you can use this information, it's highly useful.  If you need more context, yes, you have to store it elsewhere, but you do have a delegate which contains a context pointer.

In a few places (take a look at TextStream's readln, https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2149) I use inner functions that have access to the function call's frame pointer in order to configure or store data.
 Do you have an example
 of moderately complicated reading process to show us it is not too
 complicated?

The most complicated I have so far is reading UTF data as a range of dchar:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2209

Note that I hand-inlined all the decoding because using std.utf or the runtime was too slow, so although it looks huge, it's pretty basic stuff and can largely be ignored for the purposes of this discussion. The interesting part is how it specifies what to consume and what not to.

I realize it's a different way of thinking about how to do I/O, but it gives more control to the buffer, so it can reason about how best to buffer things. I look at it as a way of the buffered stream saying "I'll read some data, you tell me when you see something interesting, and I'll give you a slice to it."

The alternative is to double-buffer your data. Each call to read can invalidate the previously buffered data. But readUntil guarantees the data is contiguous and consumed all at once, no need to double-buffer.
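For illustration only, here is a toy in-memory model of the readUntil pattern described above. The real signature and buffering logic live in the linked std.io branch; the names and the byte-at-a-time refill here are purely hypothetical.

```d
// Toy buffer with a readUntil in the spirit described above: the delegate
// inspects the available data, using `start` to skip bytes it has already
// examined, and returns how many bytes to consume once it finds what it
// wants (0 means "keep reading").
struct ToyBuffer
{
    const(ubyte)[] data;  // stands in for data arriving from the OS
    size_t pos;

    const(ubyte)[] readUntil(scope size_t delegate(const(ubyte)[] window, size_t start) process)
    {
        size_t start = 0;
        // Simulate data trickling in one byte per "read" call.
        foreach (avail; pos + 1 .. data.length + 1)
        {
            auto window = data[pos .. avail];
            immutable used = process(window, start);
            if (used != 0)
            {
                auto result = data[pos .. pos + used];
                pos += used;
                return result;      // a slice of the buffer, no copy
            }
            start = window.length;  // delegate has already seen this much
        }
        return null;                // condition never satisfied
    }
}

void main()
{
    auto buf = ToyBuffer(cast(const(ubyte)[])"hello\nworld\n");

    // Consume up to and including the next newline.
    size_t untilNewline(const(ubyte)[] w, size_t start)
    {
        foreach (i; start .. w.length)
            if (w[i] == '\n')
                return i + 1;
        return 0;  // need more data
    }

    assert(buf.readUntil(&untilNewline) == cast(const(ubyte)[])"hello\n");
    assert(buf.readUntil(&untilNewline) == cast(const(ubyte)[])"world\n");
}
```

Note how the delegate only scans the new portion thanks to `start`, and the returned line is a slice of the buffer, not a copy.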
 To avoid this issue, the design could be reversed: A method that would
 like to read a certain amount of character could take a delegate from
 the stream, which provides additionnal bytes of data.

 Example:
 // create a T by reading from stream. returns true if the T was
 // successfully created, and false otherwise.
 bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

 The stream delegate returns a buffer of data to read from when called
 with consumed==0. It must return additional data when called
 repeatedly. When it is called with a consumed != 0, the corresponding
 amount of consumed bytes can be discarded from the buffer.

I can see use cases for both your method and mine.  I think I can implement your idea in terms of mine.  I might just do that.  The only thing missing is, you need a way to specify to the delegate that it needs more data.  Probably using size_t.max as an argument.

In fact, I need a peek function anyway; your function will provide that ability as well.

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham  
<robert octarineparrot.com> wrote:

 On 16/05/2012 15:38, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read? -Steve

A bit ugly, but:

----
// Default to 4 byte chunks
auto range = myStream.byChunks(4);
foreach (chunk; range) {
    // Set the next chunk to 3 bytes
    // Chunk after is 4 bytes
    range.nextChunkSize = 3;

    // Next chunk is always 5 bytes
    range.chunkSize = 5;
}

Yeah, I've seen this before. It's not convincing.

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 11:48:32 -0400, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 16.05.2012 19:32, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham
 <robert octarineparrot.com> wrote:
 A bit ugly but:
 ----
 // Default to 4 byte chunks
 auto range = myStream.byChunks(4);
 foreach (chunk; range) {
 // Set the next chunk is 3 bytes
 // Chunk after is 4 bytes
 range.nextChunkSize = 3;

 // Next chunk is always 5 bytes
 range.chunkSize = 5;
 }

Yeah, I've seen this before. It's not convincing.

Yes, it's obvious that files do *not* generally follow a range-of-items semantic. I mean, not even a range of various items. In the case of binary data it's most of the time a header followed by various data. Or a hierarchical structure. Or a table of links + raw data. Or whatever. I've yet to see a standard way to deal with binary formats :)

The best solution would be a range that's specific to your format. My solution intends to support that. But that's only if your format fits within the "range of elements" model. Good old fashioned "read X bytes" needs to be supported, and insisting you do this range-style is just plain wrong IMO.

-Steve
May 16 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:
 On 16/05/2012 16:59, Walter Bright wrote:
On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2 digitalmars.com>
wrote:

On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
I do agree for e.g. with binary data some data can't be read with
ranges (when you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

And is what you want to do with a text file in many cases.
In general, you can read n bytes by calling empty, front, and
popFront n times.

Why would anybody want to read a large binary file _one byte at a time_?

	import std.range;

	auto readNBytes(R)(R range, size_t n)
		if (isInputRange!R && hasSlicing!R)
	{
		return range[0 .. n];
	}

T

-- 
MAS = Mana Ada Sistem?
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 11:59:37 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with  
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

Have you looked at how std.byLine works? It certainly does not use a range interface as a source.
 In general, you can read n bytes by calling empty, front, and popFront n  
 times.

I hope you are not serious! This will make D *the worst performing* i/o language. This should be evidence enough:

steves steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1 count=1000000
1000000+0 records in
1000000+0 records out
1000000 bytes (1.0 MB) copied, 0.74052 s, 1.4 MB/s

real	0m0.744s
user	0m0.176s
sys	0m0.564s

steves steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1000 count=1000
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.00194096 s, 515 MB/s

real	0m0.006s
user	0m0.000s
sys	0m0.004s

-Steve
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 13:21:37 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/16/2012 9:41 AM, Stewart Gordon wrote:
 On 16/05/2012 16:59, Walter Bright wrote:
 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with  
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

And is what you want to do with a text file in many cases.
 In general, you can read n bytes by calling empty, front, and popFront  
 n times.

Why would anybody want to read a large binary file _one byte at a time_?

You can have that range read from byChunk(). It's really the same thing that C's stdio does.

This is very wrong. byChunk doesn't cut it. The number of bytes to consume from the stream can depend on any number of factors, including the actual data in the stream.

For instance, I challenge you to write an efficient (meaning no extra buffering) byLine using byChunk as a base.

-Steve
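To make the challenge concrete, here is a hypothetical byLine-over-chunks sketch (all names invented for illustration). Note how any line straddling a chunk boundary forces a copy into a second buffer, which is exactly the extra buffering a streaming readUntil avoids:

```d
import std.algorithm : countUntil;

// A hypothetical line range over fixed-size chunks. Lines that straddle
// a chunk boundary must be copied into `carry` -- the double buffering
// being objected to above.
struct LinesOverChunks
{
    const(ubyte)[][] chunks;  // stands in for byChunk's successive buffers
    const(ubyte)[] current;   // unread part of the current chunk
    ubyte[] carry;            // second buffer for straddling lines
    const(ubyte)[] front;
    bool empty;

    this(const(ubyte)[][] chunks)
    {
        this.chunks = chunks;
        popFront();
    }

    void popFront()
    {
        carry = null;
        while (true)
        {
            if (current.length == 0)
            {
                if (chunks.length == 0)
                {
                    if (carry.length) { front = carry; return; } // no trailing newline
                    empty = true;
                    return;
                }
                current = chunks[0];
                chunks = chunks[1 .. $];
            }
            auto nl = current.countUntil('\n');
            if (nl >= 0)
            {
                auto piece = current[0 .. nl + 1];
                current = current[nl + 1 .. $];
                front = carry.length ? carry ~ piece : piece;
                return;
            }
            carry ~= current;  // line straddles into the next chunk: copy
            current = null;
        }
    }
}

void main()
{
    // "hel" + "lo\nwo" + "rld\n" -- both lines straddle a chunk boundary.
    const(ubyte)[][] chunks = [
        cast(const(ubyte)[])"hel",
        cast(const(ubyte)[])"lo\nwo",
        cast(const(ubyte)[])"rld\n",
    ];
    auto lines = LinesOverChunks(chunks);
    assert(lines.front == cast(const(ubyte)[])"hello\n"); // copied via carry
    lines.popFront();
    assert(lines.front == cast(const(ubyte)[])"world\n"); // copied via carry
    lines.popFront();
    assert(lines.empty);
}
```

Every straddling line costs an allocation and a copy; a readUntil-style buffer can instead grow in place and hand back a contiguous slice.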
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 13:23:07 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 11:59:37 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 09:50:12 -0400, Walter Bright  
 <newshound2 digitalmars.com>
 wrote:

 On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
 I do agree for e.g. with binary data some data can't be read with  
 ranges (when
 you need to read small chunks of varying size),

I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

std.byLine() does it.

Have you looked at how std.byLine works? It certainly does not use a range interface as a source.

It presents a range interface, though. Not a streaming one.

But that is *the point*! The code deciding how much data to read (i.e. the entity I referenced above that 'tells front and popFront how many bytes to read') is *not* using a range interface.

In other words, ranges aren't enough. Ranges can be built on top of streaming interfaces. But there is *still* a need for a comprehensive streaming toolkit. And C's streaming toolkit is not as good as a native D toolkit can be.
 In general, you can read n bytes by calling empty, front, and popFront  
 n times.

I hope you are not serious! This will make D *the worst performing* i/o language.

You can read arbitrary numbers of bytes by tacking a range on after byChunk().

No, this doesn't work in most cases. See my other post. You can't get everything you want out of just byChunk and byLine. What about byMySpecificPacketProtocol?

-Steve
May 16 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
tbh, I've found byChunk to be less than worthless
in my experience; it's a liability because I still
have to wrap it somehow to read real world files.

Consider reading a series of strings in the format
<length><data>,[...].

I'd like it to be this simple (neglecting priming the loop):

string[] s;
while(!file.eof) {
     ubyte length = file.read!ubyte;
     s ~= file.read!string(length);
}


The C fgetc/fread interface can do this reasonably
well.

string[] s;
while(!feof(fp)) {
    ubyte length = cast(ubyte) fgetc(fp);
    char[] buffer;
    buffer.length = length;
    fread(buffer.ptr, 1, length, fp);
    s ~= assumeUnique(buffer);
}


But, doing it with byChunk is an exercise in pain
that I don't even feel like writing here.




Another problem is consider a network interface. You
want to handle the packets as they come in.

byChunk doesn't work at all because it blocks until it
gets the chunk of the requested size.

foreach(chunk; socket.byChunk(1024))


suppose you get a packet of length 1000 and you have
to answer it. That will block forever.

So, if you use byChunk as the underlying thing to fill
your buffer... you don't get anywhere.


I think a better input primitive is byPacket(max_size).
This works more like the read primitive on the operating
system.

Moreover, I want it to buffer, and control how much is consumed.


auto packetSource = socket.byPacket(1024);
foreach(packet; packetSource) {
    // as soon as some data comes in we can get the length
    if(packet.length < 2) continue;
    auto length = packet.peek!(ushort); // neglect endian for now
    if(packet.length < length + 2) continue; // wait for more data

    packet.consume(2);
    handle(packet.consume(length));
}



In addition to the byChunk blocking problem...
what if the length straddles the edge?



byChunk is just a huge hassle to work with for every file
format I've tried so far. byLine is a little better
(some file formats are defined as being line based)
but still a bit of a pain for anything that can spill
into two lines.
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 13:48:49 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
 In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

What I think we would end up with is a streaming API with range primitives tacked on.

- empty is clunky, but possible to implement. However, it may become invalid (think of reading a file that is being appended to by another process).

- popFront and front do not have any clear definition of what they refer to. The only valid thing I can think of is bytes, and then nobody will use them.

That's hardly saying it's "range based". I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done.

If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing.

-Steve
May 16 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 16 May 2012 at 17:48:52 UTC, Andrei Alexandrescu 
wrote:
 This is copiously clear to me, but the way I like to think 
 about it is by extending the notion of range (with notions such 
 as e.g. BufferedRange, LookaheadRange, and such)

I tried this in cgi.d somewhat recently. It ended up only vaguely looking like a range.

/**
   A slight difference from regular ranges is you can give it the
   maximum number of bytes to consume.

   IMPORTANT NOTE: the default is to consume nothing, so if you don't
   call consume() yourself and use a regular foreach, it will
   infinitely loop!
*/
void popFront(size_t maxBytesToConsume = 0 /*size_t.max*/, size_t minBytesToSettleFor = 0) {}

I called that a "slight difference" in the comment, but it is actually a pretty major difference. In practice, it is nothing like a regular range.

If I defaulted to size_t.max, you could foreach() it, but then you don't really get to take advantage of the buffer, since it is cleared out entirely for each iteration.

If it defaults to 0, you can put it in a foreach... but you have to manually say how much of it is consumed, which no other range does, meaning it won't work with std.algorithm or anything. It sorta looks like a range, but isn't actually one at all.

I'm sure something better is possible, but I don't think the range abstraction is a good fit for this use case. Of course, providing optional ranges (like how File gives byChunk, byLine, etc.) is probably a good idea.
May 16 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:
 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.

These are all tentative names, of course. But the idea is that you can keep N elements of the range "in view" at a time, without having to individually read them out and save them in a second buffer, and you can advance this view once you're done with the current data and want to move on.

Existing range operations like popFrontN, take, takeExactly, drop, etc., can be extended to take advantage of the extra functionality of ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since they amount to the same thing.) Using a ChunkedRange allows you to write functions that parse a particular range and return a range of chunks (say, a deserializer that returns a range of objects given a range of bytes).

Thinking on it a bit further, perhaps we can call this a WindowedRange, since it somewhat resembles the sliding window protocol where you keep a "window" of sequential packet ids in an active buffer, and remove them from the buffer as they get ack'ed (consumed by popN). The buffer thus acts like a "window" into the next n elements in the range, which can be "slid forward" as data is consumed.

T

-- 
Having a smoking section in a restaurant is like having a peeing section in a swimming pool. -- Edward Burr
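The three tentative primitives above could be buffered on top of any ordinary input range, along these lines. The Chunked wrapper and its growth strategy are invented here; only the primitive names come from the post.

```d
// Minimal sketch of the tentative ChunkedRange primitives
// (hasAtLeast / frontN / popN) over any input range.
import std.array : array;
import std.range;   // isInputRange, ElementType, iota, array front/popFront/empty

struct Chunked(R) if (isInputRange!R)
{
    R source;
    ElementType!R[] buffer;

    private void fill(size_t n)
    {
        while (buffer.length < n && !source.empty)
        {
            buffer ~= source.front;
            source.popFront();
        }
    }

    bool hasAtLeast(size_t n) { fill(n); return buffer.length >= n; }

    // Returns (up to) the front n elements, buffering them if needed;
    // repeated calls just return the buffer.
    ElementType!R[] frontN(size_t n)
    {
        fill(n);
        return buffer[0 .. n < buffer.length ? n : buffer.length];
    }

    // Discards the first n buffered elements so the next frontN
    // fetches fresh data.
    void popN(size_t n)
    {
        buffer = buffer[n < buffer.length ? n : buffer.length .. $];
    }
}

void main()
{
    auto c = Chunked!(int[])(iota(0, 10).array);
    assert(c.hasAtLeast(4));
    assert(c.frontN(4) == [0, 1, 2, 3]);
    c.popN(2);
    assert(c.frontN(4) == [2, 3, 4, 5]);  // window slid forward by two
}
```

Note how repeated frontN calls return the same buffered view until popN slides the window, which is exactly the "window" behavior described above.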
May 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 15:38:02 -0400, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:
 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.

On such ranges, what would popFront and front do? I'm assuming since frontN and popN are referring to how many elements, and since the most logical definition for elements is bytes, that front gets the next byte, and popFront discards the next byte. This seems useless to me.

I still don't get the need to "add" this to ranges. The streaming API works fine on its own.

But there is an omission with your proposed API regardless -- reading data is a mutating event. It destructively mutates the underlying data stream so that you cannot get the data again. This means you must double-buffer data in order to support frontN and popN, which is not necessary with a simple read API.

For example:

auto buf = new ubyte[1000000];
stream.read(buf);

does not need to first buffer the data inside the stream and then copy it to buf; it can read it from the OS *directly* into buf.

-Steve
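The plain streaming read Steven describes can be illustrated with a minimal sketch. The InputStream interface and the MemoryStream type are invented names here, standing in for a real OS-backed stream.

```d
// Sketch of a plain streaming read: the source writes directly into
// the caller's buffer, with no internal staging copy. The interface
// name and the in-memory source are invented for illustration.
interface InputStream
{
    // Fills as much of buf as possible; returns bytes actually read.
    size_t read(ubyte[] buf);
}

class MemoryStream : InputStream
{
    const(ubyte)[] data;
    this(const(ubyte)[] d) { data = d; }

    size_t read(ubyte[] buf)
    {
        auto n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];   // one copy, straight to the caller
        data = data[n .. $];
        return n;
    }
}

void main()
{
    InputStream s = new MemoryStream(cast(const(ubyte)[])"abcdef");
    auto buf = new ubyte[4];
    assert(s.read(buf) == 4 && buf == cast(ubyte[])"abcd");
    assert(s.read(buf) == 2 && buf[0 .. 2] == cast(ubyte[])"ef");
}
```

A real implementation would issue the OS read directly into buf, which is the point: the caller owns the buffer, so no double-buffering is forced by the API.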
May 16 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
 One direction that _could_ be helpful, perhaps, is to extend 
 the concept
 of range to include, let's tentatively call it, a ChunkedRange.
 Basically a ChunkedRange implements the usual InputRange 
 operations
 (empty, front, popfront) but adds the following new primitives:

 - bool hasAtLeast(R)(R range, int n) - true if underlying range 
 has at
   least n elements left;

I think it would be better to have a function that would return the number of elements left.
 - E[] frontN(R)(R range, int n) - returns a slice containing 
 the front n
   elements from the range: this will buffer the next n elements 
 from the
   range if they aren't already; repeated calls will just return 
 the
   buffer;

 - void popN(R)(R range, int n) - discards the first n elements 
 from the
   buffer, thus causing the next call to frontN() to fetch more 
 data if
   necessary.

I like the idea of frontN and popN. But is there any reason why a type that defines those (let's call it a stream) should also be a range? I would prefer to have a type that just defines those two functions, a function that returns the number of available elements, and a function that tells whether we are at the end of stream.

If you need a range of elements with a blocking popFront, it's easy to build one on top of it. You can write a function that takes any stream and returns a range of elements. I think that's better than having to write front, popFront, and empty for every stream.
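jerro's suggestion (define only the stream primitives, then derive an element range on top once, generically) might look like this sketch; ByteStream, StreamRange, and eos are all invented names.

```d
// Sketch: a stream type defines only frontN/popN-style primitives,
// and a single generic adapter turns any such stream into an input
// range. All names here are invented for illustration.
import std.algorithm : equal, map;

struct ByteStream
{
    const(ubyte)[] data;

    const(ubyte)[] frontN(size_t n)
    {
        return data[0 .. n < data.length ? n : data.length];
    }
    void popN(size_t n) { data = data[n < data.length ? n : data.length .. $]; }
    bool eos() { return data.length == 0; }
}

// Written once: any stream with frontN/popN/eos becomes an input range,
// so no stream author has to define front/popFront/empty themselves.
struct StreamRange(S)
{
    S stream;
    bool empty() { return stream.eos; }
    ubyte front() { return stream.frontN(1)[0]; }
    void popFront() { stream.popN(1); }
}

void main()
{
    auto r = StreamRange!ByteStream(ByteStream([1, 2, 3]));
    // The adapted stream composes with std.algorithm like any range.
    assert(r.map!(x => x * 2).equal([2, 4, 6]));
}
```

The adapter is the "better than having to write front, popFront, and empty for every stream" part: it is written once and reused for every stream type.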
May 16 2012
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina <art.08.09 gmail.com>  
wrote:

 On 05/16/12 22:15, Steven Schveighoffer wrote:
 I still don't get the need to "add" this to ranges.  The streaming API  
 works fine on its own.

This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it...

But you never would want to.

Don't get me wrong, the primitives here could work for a streaming API (I haven't implemented it that way, but it could be made to work). But the idea that it must *also* be a std.range input range makes zero sense.

To me, this is as obvious as not supporting linklist[index]. Sure, it can be done, but who would ever use it?

-Steve
May 16 2012
prev sibling next sibling parent reply =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= <xtzgzorex gmail.com> writes:
On 13-05-2012 23:38, Walter Bright wrote:
 This discussion started in the thread "Getting the const-correctness of
 Object sorted once and for all", but it deserved its own thread.

 These modules suffer from the following problems:

 1. poor documentation, dearth of examples & rationale

 2. toHash(), toString(), etc., all need to be const pure nothrow, but
 it's turning out to be problematic for doing it for these classes

 3. overlapping functionality with std.stdio

 4. they should present a range interface, not a streaming one

While we're at it, do we want to keep std.outbuffer? -- - Alex
May 14 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
 While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.
May 14 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/14/2012 9:54 PM, H. S. Teoh wrote:
 On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
 On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
 While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.

Why not just fold this into std.io?

It's not I/O.
May 14 2012
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 15.05.2012 8:54, H. S. Teoh wrote:
 On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
 On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
 While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.

Why not just fold this into std.io? I'm surprised that this is a separate module, actually. It should either be folded into std.io, or developed to be more generic (i.e., have range-based API, have more features like auto-flushing past a certain size, etc.).

It's std.array Appender. The only difference is text vs binary output form. -- Dmitry Olshansky
May 15 2012
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, May 14, 2012 at 07:57:28PM -0700, Walter Bright wrote:
 On 5/14/2012 6:29 PM, Alex Rønne Petersen wrote:
While we're at it, do we want to keep std.outbuffer?

Since it's not range based, probably not.

Why not just fold this into std.io? I'm surprised that this is a separate module, actually. It should either be folded into std.io, or developed to be more generic (i.e., have a range-based API, have more features like auto-flushing past a certain size, etc.).

T

-- 
Prosperity breeds contempt, and poverty breeds consent. -- Suck.com
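An auto-flushing output buffer of the kind suggested above could be sketched like this; the FlushingBuffer name, the delegate sink, and the threshold policy are all assumptions made for illustration.

```d
// Sketch of an output buffer that flushes through a sink once the
// pending data passes a size threshold. Names and the delegate-based
// sink are invented for illustration.
struct FlushingBuffer
{
    void delegate(const(ubyte)[]) sink;   // where flushed data goes
    size_t threshold;                     // flush once pending reaches this
    ubyte[] pending;

    void put(const(ubyte)[] bytes)
    {
        pending ~= bytes;
        if (pending.length >= threshold)
            flush();
    }

    void flush()
    {
        if (pending.length)
        {
            sink(pending);
            pending.length = 0;
        }
    }
}

void main()
{
    ubyte[] written;
    auto b = FlushingBuffer((const(ubyte)[] d) { written ~= d; }, 4);

    b.put([1, 2, 3]);
    assert(written.length == 0);   // below threshold, still buffered

    b.put([4, 5]);                 // crosses threshold, auto-flushes
    ubyte[] expected = [1, 2, 3, 4, 5];
    assert(written == expected);
}
```

In a real std.io-style module the sink would be a file or socket write rather than a delegate, but the threshold logic would be the same.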
May 14 2012
prev sibling next sibling parent "Jonas Drewsen" <jdrewsen nospam.com> writes:
On Sunday, 13 May 2012 at 22:26:17 UTC, Walter Bright wrote:
 On 5/13/2012 3:16 PM, Nathan M. Swan wrote:
 Trying to make it read lazily is even harder, as all std.utf 
 functions work on
 arrays, not ranges. I think this should change.

Yes, std.utf should be upgraded to present range interfaces.

+1 on that. I really needed it when doing the std.net.curl stuff and would be happy to move it to a more generic handling in std.utf.
May 15 2012
prev sibling next sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 05/16/12 21:38, H. S. Teoh wrote:
 On Wed, May 16, 2012 at 12:48:49PM -0500, Andrei Alexandrescu wrote:
 On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
 In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

One direction that _could_ be helpful, perhaps, is to extend the concept of range to include, let's tentatively call it, a ChunkedRange. Basically a ChunkedRange implements the usual InputRange operations (empty, front, popfront) but adds the following new primitives:

- bool hasAtLeast(R)(R range, int n) - true if underlying range has at
  least n elements left;

- E[] frontN(R)(R range, int n) - returns a slice containing the front n
  elements from the range: this will buffer the next n elements from the
  range if they aren't already; repeated calls will just return the
  buffer;

- void popN(R)(R range, int n) - discards the first n elements from the
  buffer, thus causing the next call to frontN() to fetch more data if
  necessary.

These are all tentative names, of course. But the idea is that you can keep N elements of the range "in view" at a time, without having to individually read them out and save them in a second buffer, and you can advance this view once you're done with the current data and want to move on.

Existing range operations like popFrontN, take, takeExactly, drop, etc., can be extended to take advantage of the extra functionality of ChunkedRanges. (Perhaps popFrontN can even be merged with popN, since they amount to the same thing.) Using a ChunkedRange allows you to write functions that parse a particular range and return a range of chunks (say, a deserializer that returns a range of objects given a range of bytes).

Thinking on it a bit further, perhaps we can call this a WindowedRange, since it somewhat resembles the sliding window protocol where you keep a "window" of sequential packet ids in an active buffer, and remove them from the buffer as they get ack'ed (consumed by popN). The buffer thus acts like a "window" into the next n elements in the range, which can be "slid forward" as data is consumed.

Right now, everybody reinvents this, with a slightly different interface... It's really obvious, needed and just has to be standardized. A few notes:

hasAtLeast is redundant as it can be better expressed as .length; what would be the point of wrapping 'r.length>=n'? An '.available' property would be useful to find out eg how much can be consumed w/o blocking, but that one should return a size_t too.

'E[] frontN' should have a version that returns all available elements; I called it '@property E[] fronts()' here. It's more efficient that way and doesn't rely on the compiler to inline and optimize the limit checks away.

popN -- well, its signature here is 'void popFronts(size_t n)', other than that, there's no difference.

Similar things are necessary for output ranges. Here, what I needed was:

void put(ref E el)
void puts(E[] els)
@property size_t free() // Not the most intuitive name w/o context;
                        // returns the number of E's that can be 'put()'
                        // w/o blocking.

Note that all of this doesn't address the consume-variable-sized-chunks issue. But that can now be efficiently handled by another layer on top.

On 05/16/12 22:15, Steven Schveighoffer wrote:
 I still don't get the need to "add" this to ranges.  The streaming API works
fine on its own.

This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it...
 But there is an omission with your proposed API regardless -- reading data is
a mutating event.  It destructively mutates the underlying data stream so that
you cannot get the data again.  This means you must double-buffer data in order
to support frontN and popN that are not necessary with a simple read API.
 
 For example:
 
 auto buf = new ubyte[1000000];
 stream.read(buf);
 
 does not need to first buffer the data inside the stream and then copy it to
buf, it can read it from the OS *directly* into buf.

Sometimes having the buffer managed by 'stream' and 'read()' returning a slice into it works (this is what 'fronts' above does). Reusing a caller-managed buffer can be useful in other cases, yes.

artur
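The output-side primitives listed earlier in this post (put, puts, free) might be realized over a fixed-capacity buffer roughly like this; the OutputBuffer type and its backing array are invented for the sketch.

```d
// Sketch of the output-side primitives (put / puts / free) over a
// fixed-capacity buffer; the backing storage is invented for
// illustration.
struct OutputBuffer(E)
{
    E[] storage;
    size_t used;

    this(size_t capacity) { storage = new E[capacity]; }

    // Number of elements that can be put() without blocking.
    @property size_t free() { return storage.length - used; }

    void put(ref E el)
    {
        assert(free >= 1);
        storage[used++] = el;
    }

    // Bulk variant: one call, one capacity check, many elements.
    void puts(E[] els)
    {
        assert(free >= els.length);
        storage[used .. used + els.length] = els[];
        used += els.length;
    }
}

void main()
{
    auto b = OutputBuffer!int(4);
    int x = 7;
    b.put(x);
    b.puts([8, 9]);
    assert(b.free == 1);
    assert(b.storage[0 .. b.used] == [7, 8, 9]);
}
```

In the inter-thread scenario described above, puts is where the win comes from: one lock-unlock sequence covers the whole slice instead of one per element.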
May 16 2012
prev sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 05/16/12 22:58, Steven Schveighoffer wrote:
 On Wed, 16 May 2012 16:38:54 -0400, Artur Skawina <art.08.09 gmail.com> wrote:
 
 On 05/16/12 22:15, Steven Schveighoffer wrote:
 I still don't get the need to "add" this to ranges.  The streaming API works
fine on its own.

This is not an argument against a streaming API (at least not for me), but for efficient ranges. With the API above I can shift tens of gigabytes of data per second between threads. And still use the 'std' range API and everything that works with it...

But you never would want to. Don't get me wrong, the primitives here could work for a streaming API (I haven't implemented it that way, but it could be made to work). But the idea that it must *also* be a std.range input range makes zero sense.

Well, I do want to. For example, I can pass the produced data to *any* range consumer; it may not be as efficient as mine, but it will still work reasonably (I just did a quick test and the difference seems to be about 10G/s less for a plain front+popFront consumer).

The goal here is: if we could agree on a standard interface then *any* producer and consumer, including the ones in the std lib, could take advantage of this (optional) feature.

It's not so much about function call overhead as /syscall/ and /locking/ costs. Retrieving or writing 100 elements with only one lock-unlock sequence makes a large difference.
 To me, this is as obvious as not supporting linklist[index];  Sure, it can be
done, but who would ever use it?

This is not even related.

Your 'read(ref ubyte[])' approach can actually mean that one more copy of the data is required. Think writer->range_or_stream->reader -- unless the reader is already waiting with an empty buffer, the stream has to copy the data to an internal buffer, which then has to be copied again when a reader comes around. The 'slice[] = fronts' solution avoids the second copy.

Like I said, depending on the circumstances, sometimes you want one scheme, sometimes the other. (TBH, right now I can't think of a case where I'd prefer a non-range based approach; having the same i/f is just so convenient. But I'm sure there's one ;) )

artur
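The 'slice[] = fronts' pattern from this post can be sketched as follows: the consumer copies straight out of the producer's buffer into its own slice, so no intermediate staging copy is needed. The Producer type and its pre-filled buffer are invented; a real producer would refill the buffer from a writer thread.

```d
// Sketch of the 'slice[] = fronts' pattern: the consumer reads
// directly from the producer's buffer, avoiding a second copy.
// Names and the in-memory backing are invented for illustration.
struct Producer
{
    ubyte[] buffer;           // data already buffered by the producer

    // Expose the buffered elements directly - no copy here.
    @property const(ubyte)[] fronts() { return buffer; }

    void popFronts(size_t n) { buffer = buffer[n .. $]; }
}

void main()
{
    auto p = Producer(cast(ubyte[])"stream data".dup);
    auto dest = new ubyte[6];

    auto view = p.fronts[0 .. dest.length];
    dest[] = view;            // single copy: producer buffer -> dest
    p.popFronts(dest.length);

    assert(dest == cast(ubyte[])"stream");
    assert(p.fronts == cast(const(ubyte)[])" data");
}
```

With read(buf) the same transfer would require the stream to stage the data internally first whenever the reader is not already waiting, which is the extra copy being discussed.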
May 16 2012