
digitalmars.D.bugs - stream.readLine

bobef <spam sucks.com> writes:
The implementation of stream.readLine() treats char.init as EOF, which is not right, because char.init is 255 (which is ÿ in Cyrillic). I believe EOF should be 0.
Jan 23 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
bobef wrote:
> The implementation of stream.readLine() treats char.init as EOF, which is not right, because char.init is 255 (which is ÿ in Cyrillic). I believe EOF should be 0.
No, char.init is 255 (0xFF), which is an invalid byte in UTF-8 data. Codepoint 255 *is* ÿ, IIRC, but char doesn't store codepoints; it stores UTF-8 bytes (code units?). Forgive me if I got the terminology wrong.
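To make the point above concrete, here is a small sketch in present-day D. It relies only on `std.utf.encode` from current Phobos, not on anything from the 2007 std.stream, and the helper `utf8Units` is a hypothetical name made up for illustration. It checks that `char.init` is 0xFF and that ÿ (U+00FF) is stored as the two code units 0xC3 0xBF, so a real ÿ in UTF-8 text can never be mistaken for `char.init`. Something like `dmd -unittest -main -run charinit.d` should run it.

```d
import std.utf : encode;

// char.init is 0xFF. UTF-8 lead bytes stop at 0xF4, so 0xFF never
// occurs as a code unit in valid UTF-8 and cannot collide with text.
static assert(char.init == 0xFF);

/// Hypothetical helper (illustration only): returns the UTF-8 code
/// units that encode a single code point.
char[] utf8Units(dchar c)
{
    char[4] buf;
    immutable len = encode(buf, c); // number of code units written
    return buf[0 .. len].dup;
}

unittest
{
    // U+00FF (ÿ) becomes two code units, 0xC3 0xBF; neither is char.init.
    auto units = utf8Units(cast(dchar) 0x00FF);
    assert(units.length == 2);
    assert(units[0] == 0xC3 && units[1] == 0xBF);
    foreach (u; units)
        assert(u != char.init);
}
```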
Jan 23 2007
bobef <spam sucks.com> writes:
Then it is impossible to use the readLine() function to read non-UTF-8 streams? If so, this sucks ass, because I'd have to read the stream myself to convert it to UTF-8; obviously I can't force every stream out there to be UTF-8 just because D likes it :)
Jan 23 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
bobef wrote:
> Then it is impossible to use the readLine() function to read non-UTF-8 streams?
InputStream.readLine (which I presume is the one you mean) returns a UTF-8 string. The documentation doesn't say in what format the underlying data is read. If someone wants to implement it so that it reads a non-UTF string from somewhere, converts it to UTF-8 and returns that, it's a perfectly valid implementation.
> If so, this sucks ass, because I'd have to read the stream myself to convert it to UTF-8; obviously I can't force every stream out there to be UTF-8 just because D likes it :)
A conversion stream may not be so hard to implement. Just create a class implementing InputStream and pass another InputStream to its constructor. Or you can even inherit directly from std.stream.File, forward the constructors, and override only the readLine* functions.

Then, if you're reading a file in some ASCII + extended code page format, you just need a lookup table (or conversion function) that maps the upper 128 byte values (0x80-0xFF) to the corresponding Unicode codepoints, and then use std.utf.encode. For Latin-1 data it's even simpler: every byte value already equals its Unicode codepoint, so you can pass it straight to std.utf.encode. You'll probably want to use the read(inout ubyte) method to read such a file. The process for other text encodings is probably similar, perhaps using other read() overloads (for multi-byte encodings).

(Warning: I've never actually implemented a Stream, so the above may well be riddled with errors and misinformation :) )
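A rough sketch of the conversion described above, again in present-day D. To stay independent of the std.stream API it works on a `ubyte[]` buffer instead of subclassing a Stream. `toUtf8`, `readLineConverted` and the `highHalf` table parameter are hypothetical names, not library functions. `readLineConverted` only suggests what an overridden readLine might do with the bytes it reads ('\r' handling is left out), and the unittest uses a deliberately partial windows-1251 table in which only bytes 0xC0-0xFF (А..я, U+0410..U+044F) are filled in.

```d
import std.utf : encode;

/// Hypothetical helper: converts bytes in a single-byte encoding to UTF-8.
/// `highHalf`, if given, holds 128 code points for bytes 0x80..0xFF
/// (a code-page lookup table supplied by the caller); bytes below 0x80
/// are treated as ASCII. With no table the input is assumed to be
/// Latin-1, where every byte value already equals its Unicode code point.
char[] toUtf8(const(ubyte)[] data, const(dchar)[] highHalf = null)
{
    assert(highHalf is null || highHalf.length == 128);
    char[] result;
    result.reserve(data.length);
    foreach (b; data)
    {
        dchar c;
        if (b < 0x80 || highHalf is null)
            c = cast(dchar) b;       // ASCII, or Latin-1 identity mapping
        else
            c = highHalf[b - 0x80];  // code-page lookup
        encode(result, c);           // appends 1 to 4 UTF-8 code units
    }
    return result;
}

/// Hypothetical readLine-style helper: takes the bytes up to (not
/// including) the next '\n', advances `input` past that newline, and
/// returns the line converted to UTF-8. '\r' is not handled.
char[] readLineConverted(ref const(ubyte)[] input, const(dchar)[] highHalf = null)
{
    size_t i = 0;
    while (i < input.length && input[i] != '\n')
        ++i;
    auto line = toUtf8(input[0 .. i], highHalf);
    input = i < input.length ? input[i + 1 .. $] : input[$ .. $];
    return line;
}

unittest
{
    // Latin-1: "café" with é stored as the single byte 0xE9.
    immutable ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9];
    assert(toUtf8(latin1) == "caf\u00E9");

    // Partial windows-1251 table: only 0xC0..0xFF (U+0410..U+044F) are
    // real; every other entry is U+FFFD as a placeholder.
    dchar[128] cp1251;
    cp1251[] = '\uFFFD';
    foreach (i; 0x40 .. 0x80)
        cp1251[i] = cast(dchar)(0x0410 + (i - 0x40));

    immutable ubyte[] cyr = [0xFF, 0x0A, 0x41]; // "я\nA" in windows-1251
    const(ubyte)[] input = cyr;
    assert(readLineConverted(input, cp1251[]) == "\u044F");
    assert(readLineConverted(input, cp1251[]) == "A");
    assert(input.length == 0);
}
```

Passing no table treats the input as Latin-1; a real code page would need all 128 table entries filled in from its published mapping.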
Jan 23 2007