
digitalmars.D - How to detect end of stdin?

reply k2 <k2_member pathlink.com> writes:
test.d
---
import std.stream;

void main()
{
    while (!stdin.eof())
        printf("%c", stdin.getc());
}

---

dmd test.d
type test.d | test.exe

{ while(!stdin.eof()) printf("%c", stdin.getc()); }
Error: not enough data in stream

Where is wrong? Windows 2000, DMD v0.125
May 25 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
It does seem weird but here's what's going on: stdin.eof returns true 
*after* eof is hit - but not before (since eof would have to do a read to 
check). So that means you have to wrap the getc in a try/catch. I am tempted 
to make getc return EOF at eof. What do people think? Returning EOF would 
get rid of some ugly try-catches but it would make reading char different 
from reading anything else (if you call read(x) with an int x then it can't 
"return" eof so it must throw). More specifically the key change would be to 
std.Stream
  void read(out char x) { readExact(&x, x.sizeof); }
would become something like
  void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
Since D uses unicode setting EOF=0xFF means it won't get confused with a 
regular character.

Does that seem like a good trade-off?
-Ben

"k2" <k2_member pathlink.com> wrote in message 
news:d71eoj$23uv$1 digitaldaemon.com...
 test.d
 ---
 import std.stream;

 void main()
 {
 while(!stdin.eof())
 printf("%c", stdin.getc());
 }

 ---

dmd test.d
type test.d | test.exe

{ while(!stdin.eof()) printf("%c", stdin.getc()); }
Error: not enough data in stream

Where is wrong? Windows 2000, DMD v0.125

May 25 2005
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Ben Hinkle wrote:
<snip>
 More specifically the key change would be to std.Stream
   void read(out char x) { readExact(&x, x.sizeof); }
 would become something like
   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
 Since D uses unicode setting EOF=0xFF means it won't get confused with a 
 regular character.

That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.

Moreover, read is designed to be called once you've already established that there should not be an EOF. We should keep intact the concepts of expected and unexpected EOF.

digitalmars.D/4085

Stewart.

-- 
My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
May 25 2005
next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message 
news:d71r04$2jsb$1 digitaldaemon.com...
 Ben Hinkle wrote:
 <snip>
 More specifically the key change would be to std.Stream
   void read(out char x) { readExact(&x, x.sizeof); }
 would become something like
   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
 Since D uses unicode setting EOF=0xFF means it won't get confused with a 
 regular character.

That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.

char, wchar and dchar imply unicode since this is D. Are you referring to the fact that D doesn't enforce unicode "char" arrays? Reading a non-unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings. I would think if one is reading a non-unicode stream one wouldn't use char[] or char or wchar[] or friends - instead one would use byte[] and such.
 Moreover, read is designed to be called once you've already established 
 that there should not be an EOF.  We should keep intact the concepts of 
 expected and unexpected EOF.

 digitalmars.D/4085

I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected. The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.
May 25 2005
next sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 25 May 2005 08:44:28 -0400, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 "Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
 news:d71r04$2jsb$1 digitaldaemon.com...
 Ben Hinkle wrote:
 <snip>
 More specifically the key change would be to std.Stream
   void read(out char x) { readExact(&x, x.sizeof); }
 would become something like
   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
 Since D uses unicode setting EOF=0xFF means it won't get confused with  
 a
 regular character.

That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.

char, wchar and dchar imply unicode since this is D. Are you referring to the fact that D doesn't enforce unicode "char" arrays? Reading a non-unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings. I would think if one is reading a non-unicode stream one wouldn't use char[] or char or wchar[] or friends - instead one would use byte[] and such.
 Moreover, read is designed to be called once you've already established
 that there should not be an EOF.  We should keep intact the concepts of
 expected and unexpected EOF.

 digitalmars.D/4085

I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected. The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof). The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.

It's a curly problem, that's for sure.

My impression is that the EOF is expected when reading one byte at a time. Maybe also when reading the first byte of a greater than 1 byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

So, for example if you try to read an 'int' and get 2 bytes then EOF, it's unexpected. But, if you're reading chars or bytes, one at a time, you expect to hit/read EOF eventually.

It could be argued that 'char' is different to 'byte' as, correct me if I am wrong, a single 'char' is a unicode fragment, possibly an incomplete character. So it's conceivable you might want to validate it, and if it's incomplete you have an unexpected EOF as opposed to an expected one.

Regan
May 25 2005
next sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
 My impression is that the EOF is expected when reading one byte at a time. 
 Maybe also when reading the first byte of a greater than 1 byte thing 
 (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is 
 unexpected when in the middle of reading something.

Good point. Half a wchar is unexpected.
 So, for example if you try to read an 'int' and get 2 bytes then EOF, it's 
 unexpected. But, if you're reading chars or bytes, one at time, you expect 
 to hit/read EOF eventually.

I had assumed reading bytes would be considered binary I/O and so hitting eof would throw. Off the top of my head I would prefer to keep bytes as numeric and chars as text.
 It could be argued that 'char' is different to 'byte' as, correct me if I 
 am wrong, a single 'char' is a unicode fragment, possibly an incomplete 
 character. So it's concievable you might want to validate it, and if it's 
 incomplete you have an un-expected EOF as opposed to an expected one.

I agree char is different than byte. The trouble with trying to validate multi-byte codepoints is that you would need to look ahead or keep state about what the previous bytes were in order to know if the current byte being read is in the middle of a codepoint or not. It seems like a lot of trouble for unclear benefit.
May 25 2005
prev sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
 My impression is that the EOF is expected when reading one byte at a time. 
 Maybe also when reading the first byte of a greater than 1 byte thing 
 (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is 
 unexpected when in the middle of reading something.

sorry for the double post, but here's a possible read(out wchar x):

  void read(out wchar x)
  {
      size_t n = readBlock(&x, x.sizeof);
      if (n == 0)
          x = wchar.init;
      else if (n == 1)
      {
          // could be a partial read
          void* buf = &x;
          if (readBlock(buf + 1, 1) == 0)
              throw new ReadException(...);
      }
  }

That way an eof with half a wchar throws but eof with no data returns EOF. The dchar read would be something similar but probably with a loop for partial reads since it can read up to four times instead of twice.
May 25 2005
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Ben Hinkle wrote:
<snip>
 char, wchar and dchar imply unicode since this is D. Are you 
 referring to the fact that D doesn't enforce unicode "char" arrays?  
 Reading a non-unicode stream using std.stream isn't possible without 
 another library like libiconv or ICU to map encodings.

std.stream doesn't care at all about the format of input to that level.
 I would think if one is reading a non-unicode stream one wouldn't use 
 char[] or char or wchar[] or friends - instead one would use byte[] 
 and such.

Up until the point where you need to do console I/O or access an external API that relies on whatever encoding the input is in.
 Moreover, read is designed to be called once you've already 
 established that there should not be an EOF.  We should keep intact 
 the concepts of expected and unexpected EOF.
 
 digitalmars.D/4085

I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected.

An expected EOF is handled by checking for EOF before attempting to read. It's part of common sense rather than of std.stream itself. I.e. you check for EOF before reading if this is part of the normal program logic.

At the moment one can rely on exceptions to catch a premature end of file. This should remain so. I refer you back to the error handling philosophy.
 The non-char reads will throw (unexpected eof). Only trying to read 
 char (or I suppose wchar or dchar) will return EOF (expected eof).  
 The idea is that in a binary file reaching eof in a read is 
 unexpected while reaching eof in a text file is expected.

That doesn't follow either.

For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.

Conversely, suppose you're writing a D compiler. A D code file is a text file. And yet it can't end abruptly in the middle of a comment or string literal. Similarly, many of my department's programs use parameter files designed to be edited directly by the user, with one parameter per line. If you're expecting the next parameter but instead reach the end of the file, then that's unexpected.

So really there is no correlation.

Stewart.
May 25 2005
next sibling parent reply Vathix <vathix dprogramming.com> writes:
What about having 2 different streams: binary and text.

Binary one will work as it does now where eof() just checks the file  
pointer.

Text one will use the unget buffer. If the unget buffer contains a  
character, it is not eof; otherwise it tries to read one into it.
May 25 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Vathix" <vathix dprogramming.com> wrote in message 
news:op.srb9qssxkcck4r esi...
 What about having 2 different streams: binary and text.

That's essentially what getc() and readLine() do. They treat the stream as a text stream and look at the unget buffer etc. The read() functions directly ask the OS for data and ignore the unget buffer.
 Binary one will work as it does now where eof() just checks the file 
 pointer.

 Text one will use the unget buffer. If the unget buffer contains a 
 character, it is not eof; otherwise it tries to read one into it.

The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tried to read one into it" one has to say what happens if that fails. Currently it throws. One can argue it should return a special "eof" character and then set the stream eof flag so that future calls to eof() will indicate that eof has been reached.
May 25 2005
parent reply Vathix <vathix dprogramming.com> writes:
 Text one will use the unget buffer. If the unget buffer contains a
 character, it is not eof; otherwise it tries to read one into it.

The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tried to read one into it" one has to say what happens if that fails. Currently it throws. One can argue it should return a special "eof" character and then set the stream eof flag so that future calls to eof() will indicate that eof has been reached.

That's why eof() would try to read into unget and if it fails, it's eof; otherwise it has a char stored for the next getc().

But this won't work right now since the different size chars use different unget buffers. If they shared an unget buffer that is just an array of bytes, you could, for example, unget a wchar and get 2 chars from it.

Removing the unget buffer from a binary stream is also desirable since it's not wise to use ungetc and readBlock on the same stream.
May 25 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Vathix" <vathix dprogramming.com> wrote in message 
news:op.srccyakakcck4r esi...
 Text one will use the unget buffer. If the unget buffer contains a
 character, it is not eof; otherwise it tries to read one into it.

The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tried to read one into it" one has to say what happens if that fails. Currently it throws. One can argue it should return a special "eof" character and then set the stream eof flag so that future calls to eof() will indicate that eof has been reached.

That's why eof() would try to read into unget and if it fails, it's eof; otherwise it has a char stored for the next getc(). But this won't work right now since the different size chars use different unget buffers. If they shared an unget buffer that is just an array of bytes, you could, for example, unget a wchar and get 2 char`s from it. Removing the unget buffer from a binary stream is also desirable since it's not wise to use ungetc and readBlock on the same stream.

I understand you now. I misunderstood that eof() would block. Would that be a problem with something like

  import std.stream;
  int main()
  {
      while (!stdin.eof())
      {
          stdout.writefln("type a line, please");
          char[] line = stdin.readLine();
          stdout.writefln("you typed: %s", line);
      }
      return 0;
  }

  type a line, please
  hello
  you typed: hello
  type a line, please
  there
  you typed: there
  type a line, please
  ^Z
  you typed:

If stdin.eof() blocked waiting for input then the writefln inside the loop wouldn't get run until after the user has typed a line and hit enter.
May 25 2005
parent Vathix <vathix dprogramming.com> writes:
 I understand you now. I misunderstood that eof() would block.

sorry, I wasn't thinking
May 25 2005
prev sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
 Moreover, read is designed to be called once you've already established 
 that there should not be an EOF.  We should keep intact the concepts of 
 expected and unexpected EOF.

 digitalmars.D/4085

I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected.

An expected EOF is handled by checking for EOF before attempting to read. It's part of common sense rather than of std.stream itself. I.e. you check for EOF before reading if this is part of the normal program logic.

But for the situation of the original post (reading stdin) the OS doesn't tell us eof has happened until you try to read and it fails. So in other words for stdin "eof" means "did the last read attempt try to read past eof".
 At the moment one can rely on exceptions to catch a premature end of file. 
 This should remain so.  I refer you back to the error handling philosophy.

That would work fine if eof could detect that stdin has ended without attempting to read.
 The non-char reads will throw (unexpected eof). Only trying to read char 
 (or I suppose wchar or dchar) will return EOF (expected eof).  The idea 
 is that in a binary file reaching eof in a read is unexpected while 
 reaching eof in a text file is expected.

That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.

I don't get you. What do you mean by "follow"? I'm not trying to chain a sequence of statements into a proof or something. I'm stating that from a practical point of view binary files should throw if a read is incomplete and text files should return EOF. I don't understand what you are arguing read() do for different situations.
 Conversely, suppose you're writing a D compiler.  A D code file is a text 
 file.  And yet it can't end abruptly in the middle of a comment or string 
 literal.  Similarly, many of my department's programs use parameter files 
 designed to be edited directly by the user, with one parameter per line. 
 If you're expecting the next parameter but instead reach the end of the 
 file, then that's unexpected.

The semantic content of the text file (eg a D source file) is independent of std.stream. You say some D source code can't end in the middle of a comment. I think such a file would be a semantically incorrect source file but there's no way std.stream can determine that. If someone writes a subclass of stream that knows about comments and throws on eof in a comment, that's fine with me. I don't see why that conflicts with returning EOF from getc.
 So really there is no correlation.

So are you arguing for throwing in getc or not throwing?
May 25 2005
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Ben Hinkle wrote:
<snip>
 But for the situation of the original post (reading stdin) the OS doesn't 
 tell us eof has happened until you try to read and it fails. So in other 
 words for stdin "eof" means "did the last read attempt try to read past 
 eof".

Actually it doesn't _mean_ this, it gives this as a possible alternative behaviour for situations where EOF can't be determined directly.

But you have a point there. Another thing for me to consider when I get round to writing text I/O classes....

<snip>
The non-char reads will throw (unexpected eof). Only trying to read char 
(or I suppose wchar or dchar) will return EOF (expected eof).  The idea 
is that in a binary file reaching eof in a read is unexpected while 
reaching eof in a text file is expected.

That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.

I don't get you. What do you mean by "follow"?

Basically that claiming any difference between binary and text files in EOF handling doesn't derive from any consistent logic.
 I'm not trying to chain a 
 sequence of statements into a proof or something. I'm stating that from a 
 practical point of view binary files should throw if a read is incomplete 
 and text files should return EOF. I don't understand what you are arguing 
 read() do for different situations.

If there's data left to read, read it. If there isn't data left to read, throw an exception.

<snip>
 The semantic content of the text file (eg a D source file) is independent of 
 std.stream. You say some D source code can't end in the middle of a comment. 
 I think such a file would be a semantically incorrect source file but 
 there's no way std.stream can determine that. I could see if someone write a 
 subclass of stream that knows about comments and throws on eof in a comment 
 then that's fine with me. I don't see why that conflicts with returning EOF 
 from getc.

Nobody said anything about std.stream knowing about comments. Just think about it. Just look at this natural way of skipping over a comment (once it's established that we're in a comment):

  char[] nextChars;
  while ((nextChars = file.readString(2)) != "*/")
  {
      file.ungetc(nextChars[1]); // *
  }

* OK, so under getc, "This is the only method that will handle ungetc properly." But you get the idea.

It doesn't check for EOF, because this isn't part of the normal program logic. Instead, it relies on exception handling to catch an input file malformed in this respect, just as we might use exception handling to catch file not found and other file access errors. And so we shouldn't be surprised to see this technique in use. Especially in quick and dirty programs, which are a significant part of the motivation for exceptions.
 So really there is no correlation.

So are you arguing for throwing in getc or not throwing?

Throwing.

Stewart.
May 26 2005
parent Ben Hinkle <Ben_member pathlink.com> writes:
The non-char reads will throw (unexpected eof). Only trying to read char 
(or I suppose wchar or dchar) will return EOF (expected eof).  The idea 
is that in a binary file reaching eof in a read is unexpected while 
reaching eof in a text file is expected.

That doesn't follow either. For example, suppose you're writing a utility that manipulates binary files in general. E.g. a hex editor or a file compression utility. At no point while reading the file can you just expect that there is or isn't more.

I don't get you. What do you mean by "follow"?

Basically that claiming any difference between binary and text files in EOF handling doesn't derive from any consistent logic.

OK. I agree. The concept of "text file" and "binary file" is context dependent. One can say from an abstract point of view that EOF shouldn't depend on text vs binary. It is practical, though, to tailor parts of the API for "text files" and for "binary files" so I think it's worth it even though it breaks the uniformity.
<snip>
 The semantic content of the text file (eg a D source file) is independent of 
 std.stream. You say some D source code can't end in the middle of a comment. 
 I think such a file would be a semantically incorrect source file but 
 there's no way std.stream can determine that. I could see if someone write a 
 subclass of stream that knows about comments and throws on eof in a comment 
 then that's fine with me. I don't see why that conflicts with returning EOF 
 from getc.

Nobody said anything about std.stream knowing about comments. Just think about it. Just look at this natural way of skipping over a comment (once it's established that we're in a comment):

  char[] nextChars;
  while ((nextChars = file.readString(2)) != "*/")
  {
      file.ungetc(nextChars[1]); // *
  }

* OK, so under getc, "This is the only method that will handle ungetc properly." But you get the idea.

It doesn't check for EOF, because this isn't part of the normal program logic. Instead, it relies on exception handling to catch an input file malformed in this respect, just as we might use exception handling to catch file not found and other file access errors. And so we shouldn't be surprised to see this technique in use. Especially in quick and dirty programs, which are a significant part of the motivation for exceptions.

Note the only functions that would no longer throw are getc and getcw. So readString(2) would continue to throw (plus readString doesn't use the unget buffer). Asking to read a fixed number of characters will throw if there aren't enough.

From a practical point of view the difference is that some code that uses getc will be able to switch from something like

  try {
      while (true) {
          ... blah blah stream.getc() blah blah ...
      }
  } catch (ReadException ex) {
      if (!stream.eof()) throw ex;
  }

to

  while (!stream.eof()) {
      ... blah blah stream.getc() blah blah ...
  }

Everything else should remain the same. Inside std.stream, when I made the change I was able to remove the try/catches from readLine/w and scanf plus some try/catches in std.socketstream.
 So really there is no correlation.

So are you arguing for throwing in getc or not throwing?

Throwing.

ok - understood.
May 26 2005
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Stewart Gordon wrote:
 Ben Hinkle wrote:
 <snip>
 
 More specifically the key change would be to std.Stream
   void read(out char x) { readExact(&x, x.sizeof); }
 would become something like
   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
 Since D uses unicode setting EOF=0xFF means it won't get confused with 
 a regular character.

<snip>

That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.

Just thinking about it, even if the program does expect UTF-8 input, this has the drawback that a malformed input file containing a 0xFF byte could cause the input to be truncated. Which probably wouldn't be desirable.

So if we're going to do this, should we make it throw an exception if it reads in 0xFF?

Stewart.
May 26 2005
parent Ben Hinkle <Ben_member pathlink.com> writes:
In article <d74p6u$2kkq$1 digitaldaemon.com>, Stewart Gordon says...
Stewart Gordon wrote:
 Ben Hinkle wrote:
 <snip>
 
 More specifically the key change would be to std.Stream
   void read(out char x) { readExact(&x, x.sizeof); }
 would become something like
   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
 Since D uses unicode setting EOF=0xFF means it won't get confused with 
 a regular character.

<snip>

That doesn't follow. The input stream might not be Unicode; moreover, it might even be a binary file.

Just thinking about it, even if the program does expect UTF-8 input, this has the drawback that a malformed input file containing a 0xFF byte could cause the input to be truncated. Which probably wouldn't be desirable.

So if we're going to do this, should we make it throw an exception if it reads in 0xFF?

True. It is fairly evil to co-opt a valid return value to mean EOF.

I think it is possible to check if the EOF was actually in the stream by checking eof() after getc, though. If eof() is true then the EOF was because of end-of-file, while if eof() is false then the EOF was read from the stream.

The one little edge case that might not work is if the EOF was the last character in the stream and the stream was seekable (since then the stream can figure out when eof is true without having to read past the end). Maybe the "readEOF" flag that indicates the last read was past the end needs to be publicly readable.
May 28 2005
prev sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
Note another option is instead of

  void read(out char x) { readExact(&x, x.sizeof); }
 would become something like
  void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }

to keep read(out char x) the same and only redo getc and getcw to not call read(ch) directly. So getc() would look something like

  char getc()
  {
      if (<unget buffer non-empty>)
          return next-char-from-unget-buffer;
      else
      {
          char ch;
          readBlock(&ch, 1); // default ch is char.init which is 0xFF
          return ch;
      }
  }

That way readLine and other user code wouldn't have to try/catch getc failures but would look for char.init instead.
May 25 2005