digitalmars.D.learn - stream.getc() doesn't recognize eof

Brian White <bcwhite pobox.com> writes:
I was looking through the std.stream code of Phobos and found this function:

   // reads and returns next character from the stream,
   // handles characters pushed back by ungetc()
   // returns char.init on eof.
   char getc() {
     char c;
     if (prevCr) {
       prevCr = false;
       c = getc();
       if (c != '\n')
         return c;
     }
     if (unget.length > 1) {
       c = cast(char)unget[unget.length - 1];
       unget.length = unget.length - 1;
     } else {
       readBlock(&c,1);
     }
     return c;
   }


Is there something I don't understand? How does it recognize EOF? The
"readBlock" function is defined as returning 0 (zero) if there is no
more data, but its return value is not checked.

-- Brian
Mar 12 2008
Regan Heath <regan netmail.co.nz> writes:
Brian White wrote:
 Is there something I don't understand? How does it recognize EOF? The
 "readBlock" function is defined as returning 0 (zero) if there is no
 more data, but its return value is not checked.

At EOF readBlock returns 0, but more importantly it does not modify 'c', whose address it is passed. The value of 'c' is char.init (due to D's automatic initialisation of variables to their init value). So, because c == char.init and nothing has modified it, the path which calls readBlock will return char.init when EOF is reached. :)

Regan
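A minimal D2 sketch of the mechanism described above, assuming a hypothetical in-memory MemSource class (not std.stream's own) whose readBlock follows the `size_t readBlock(void* buffer, size_t size)` shape used later in this thread. Its getc keeps only the one-byte read path of the quoted function, with no ungetc or CR handling.

```d
import std.stdio : writefln;

// Hypothetical in-memory source with a std.stream-like readBlock:
// copies up to `size` bytes into `buffer` and returns the count copied.
// At EOF it returns 0 and does not touch `buffer`.
class MemSource
{
    private const(ubyte)[] data;
    private size_t pos;

    this(const(ubyte)[] data) { this.data = data; }

    size_t readBlock(void* buffer, size_t size)
    {
        size_t n = data.length - pos;
        if (n > size)
            n = size;
        (cast(ubyte*) buffer)[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n;
    }

    // Mirrors the EOF path of the quoted getc(): the return value of
    // readBlock is ignored, so at EOF `c` still holds char.init (0xFF).
    char getc()
    {
        char c;
        readBlock(&c, 1);
        return c;
    }
}

void main()
{
    auto s = new MemSource(cast(const(ubyte)[]) "hi");
    foreach (_; 0 .. 3)
    {
        immutable c = s.getc();
        writefln("%s 0x%02X", c == char.init ? "EOF " : "data", cast(ubyte) c);
    }
}
```

For the input "hi", the first two calls return the two bytes and the third returns 0xFF, which is char.init, purely because `c` was never written.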
Mar 13 2008
Brian White <bcwhite pobox.com> writes:
 So, because c == char.init and nothing has modified it, the path which 
 calls readBlock will return char.init when EOF is reached.

Ah, thanks!

I must say that this technique worries me somewhat. "readBlock" is an abstract function definable by any derived class, and I don't believe that "c must remain unchanged where data is not stored" is a defined output requirement of that method.

-- Brian
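To make the concern concrete, here is a small D2 sketch with a hypothetical ScribblingSource whose readBlock zero-fills the caller's buffer and then reports 0 bytes read. Returning 0 at EOF is the only promise discussed here, yet a getc written like the quoted one then hands back '\0' instead of char.init. The class names are illustrative only.

```d
import std.stdio : writeln;

// Hypothetical base class: readBlock is abstract, and getc (like the quoted
// std.stream code) ignores its return value and relies on `c` being left
// alone at EOF.
abstract class Source
{
    abstract size_t readBlock(void* buffer, size_t size);

    char getc()
    {
        char c;            // char.init, i.e. 0xFF
        readBlock(&c, 1);  // return value not checked
        return c;
    }
}

// Hypothetical derived class that is always at EOF but zero-fills the
// buffer before returning 0.
class ScribblingSource : Source
{
    override size_t readBlock(void* buffer, size_t size)
    {
        (cast(ubyte*) buffer)[0 .. size] = 0;
        return 0;
    }
}

void main()
{
    Source s = new ScribblingSource;
    // Prints "false": the caller sees '\0', not the char.init EOF marker.
    writeln(s.getc() == char.init);
}
```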
Mar 13 2008
Regan Heath <regan netmail.co.nz> writes:
Brian White wrote:
 Ah, thanks! I must say that this technique worries me somewhat. "readBlock" is an abstract function definable by any derived class and I don't believe that "c must remain unchanged where data is not stored" is a defined output requirement of that method.

Good point, might be safer to check for the 0 return and set c to char.init explicitly.

Your comment did get me thinking... Is there some way of expressing the requirement using design by contract? I think the answer is "not easily"; you'd have to do something like:

    // the problem being that we need a global to copy the input buffer into
    // and it could be potentially huge,
    // when really all we want is some way to detect whether
    // data was written to that address _at all_
    byte* buffer_in;

    abstract size_t readBlock(void* buffer, size_t size)
    in
    {
        buffer_in = cast(byte*)malloc(size);
        memcpy(buffer_in, buffer, size);
    }
    out (result)
    {
        assert(result > 0 ||
               (result == 0 && memcmp(buffer_in, buffer, size) == 0));
        free(buffer_in);
    }
    /* note, no body, therefore function is still 'abstract' */

All that is assuming it is legal to specify in/out contracts on an abstract method without a body. It should be possible; it would simply follow the same rules given for inheritance under "In, Out and Inheritance" here:
http://www.digitalmars.com/d/1.0/dbc.html

Regan
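The first suggestion above, checking the 0 return and setting c to char.init explicitly, might look like the following D2 sketch. Source, ScribblingSource and OneByteSource are hypothetical classes, and only the one-byte read path of the quoted getc() is kept (no ungetc or CR handling). Because the EOF decision now comes from readBlock's return value, a derived class that scribbles on the buffer no longer changes the result.

```d
import std.stdio : writeln;

abstract class Source
{
    abstract size_t readBlock(void* buffer, size_t size);

    // Explicit version: decide EOF from readBlock's return value instead
    // of trusting that `c` was left untouched.
    char getc()
    {
        char c;
        if (readBlock(&c, 1) == 0)
            c = char.init;  // EOF: set the sentinel explicitly
        return c;
    }
}

// Hypothetical source that zero-fills the buffer and then reports EOF.
class ScribblingSource : Source
{
    override size_t readBlock(void* buffer, size_t size)
    {
        (cast(ubyte*) buffer)[0 .. size] = 0;
        return 0;
    }
}

// Hypothetical source that yields a single 'x' and then reports EOF.
class OneByteSource : Source
{
    private bool done;

    override size_t readBlock(void* buffer, size_t size)
    {
        if (done || size == 0)
            return 0;
        *cast(char*) buffer = 'x';
        done = true;
        return 1;
    }
}

void main()
{
    Source scribbler = new ScribblingSource;
    writeln(scribbler.getc() == char.init);  // true despite the scribbling

    Source one = new OneByteSource;
    immutable first = one.getc();
    immutable second = one.getc();
    writeln(first, " ", second == char.init);  // x true
}
```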
Mar 13 2008
Brian White <bcwhite pobox.com> writes:
 Good point, might be safer to check for the 0 return and set c to 
 char.init explicitly.

I think that makes for a better design. The current approach feels like relying on side effects, and I've spent enough time coding Perl to know that making use of side effects is a great start towards unreadable and unmaintainable code. The more obvious you make code, the less likely it is to contain bugs and the easier it will be for someone else to maintain.

A comment like "c still has .init value if readBlock failed" would also be sufficient. If I were maintaining this code, I would have (wrongly) assumed a bug and "corrected" it, possibly introducing a new bug.
         (result == 0 && memcmp(buffer_in, buffer, size) == 0));

Eee-Gad, but that's painful! Performance could easily be so bad that I'd turn off the checks, and then they're no use at all.

I've never known a "read" function to modify bytes beyond the "count" amount returned, but I don't know if it's ever explicitly stated not to do so.

-- Brian
Mar 13 2008
Regan Heath <regan netmail.co.nz> writes:
Brian White wrote:
         (result == 0 && memcmp(buffer_in, buffer, size) == 0));

 Eee-Gad, but that's painful! Performance could easily be so bad that I'd turn off the checks and then they're no use at all.

You can use -release to turn off contracts and asserts, so only non-release builds would suffer the penalty.
 I've never known a "read" function to modify bytes beyond the "count" 
 amount returned, but I don't know if it's ever explicitly stated not to 
 do so.

True. You could perhaps cheat a little and remember just the first byte of the output buffer; chances are that if the first byte hasn't changed, nothing was written to the buffer.

Regan
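The first-byte shortcut can be sketched in D2 as below. Rather than an in/out contract on a bodiless abstract readBlock (whose legality the thread leaves open), this sketch puts the check in a final wrapper method around a hypothetical protected readBlockImpl; both names are illustrative, not std.stream API. The assert is compiled out by -release, and the one-byte snapshot is cheap either way. It is only a heuristic: a readBlockImpl that writes a byte equal to the old first byte, or changes only later bytes, slips through.

```d
import std.stdio : writeln;
import core.exception : AssertError;

abstract class Source
{
    // Hypothetical hook that derived classes implement.
    protected abstract size_t readBlockImpl(void* buffer, size_t size);

    // Hypothetical public entry point: remembers the first byte of the
    // caller's buffer and, if nothing was read, checks it is unchanged.
    final size_t readBlock(void* buffer, size_t size)
    {
        auto bytes = cast(ubyte*) buffer;
        ubyte before;
        if (size > 0)
            before = bytes[0];
        immutable n = readBlockImpl(buffer, size);
        assert(n > 0 || size == 0 || bytes[0] == before,
               "readBlockImpl returned 0 but wrote to the buffer");
        return n;
    }
}

// Always at EOF and leaves the buffer alone: passes the check.
class EmptySource : Source
{
    protected override size_t readBlockImpl(void*, size_t)
    {
        return 0;
    }
}

// Always at EOF but zero-fills the buffer: caught by the check.
class ScribblingSource : Source
{
    protected override size_t readBlockImpl(void* buffer, size_t size)
    {
        (cast(ubyte*) buffer)[0 .. size] = 0;
        return 0;
    }
}

void main()
{
    char c;  // char.init, 0xFF
    writeln((new EmptySource).readBlock(&c, 1));  // 0

    try
    {
        (new ScribblingSource).readBlock(&c, 1);
        writeln("not detected (asserts disabled, e.g. -release)");
    }
    catch (AssertError e)
    {
        writeln("detected: ", e.msg);
    }
}
```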
Mar 14 2008
Brian White <bcwhite pobox.com> writes:
 You can use -release to turn off contracts and asserts, so only non-release builds would suffer the penalty.

My worry is that the test code would be such a performance hit that it would be impossible to use without -release.

 True. You could perhaps cheat a little and remember just the first byte of the output buffer; chances are that if the first byte hasn't changed, nothing was written to the buffer.

I was just thinking the exact same thing.

-- Brian
Mar 16 2008